Backend response optimization: queues, async, backpressure
1) Why: Goals and SLOs
The goal is a stable, fast response even under bursts. The business expresses this as SLOs:
- API (CRUD/directories): p95 ≤ 250-400 ms, error rate < 1%.
- Payments/settlement (asynchronous): internal SLA for confirmation ≤ 2-5 minutes; the client gets an instant 202 Accepted plus a status poller/webhook.
- WS/real-time: RTT p95 ≤ 120 ms, disconnect rate ≤ 0.5%.
The key: decouple the "slow" steps (providers, databases, external APIs) from the user-facing response through queues and disciplined load limiting.
2) Basic picture: where latency comes from
Bottlenecks: the database (pools/indexes), external providers (PSP/game), blocking I/O, GC stop-the-world pauses, JSON serialization, "heavy" aggregations.
Symptoms: p99 growth, queuing for DB connections, retry bursts, retry storms.
Antidote: asynchronous pipelines + backpressure + timeouts/backoff + idempotency.
3) Asynchronous patterns: SEDA and CQRS
SEDA (staged event-driven architecture): split processing into stages (ingress → validation → write → integration → notification). Each stage gets its own queue and concurrency limits.
CQRS: separate reads and writes. Writes go to the log/database; reads come from projections/caches.
Outbox: the event is published atomically together with the database write, so messages cannot be "lost" (sketch after this list).
Saga: long business processes as compensating transactions instead of one global transaction.
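A minimal outbox write-path sketch in Go with `database/sql`, assuming Postgres and hypothetical `orders`/`outbox` tables: the business row and the event commit in one transaction, and a separate relay process later publishes from `outbox` to the broker.

```go
package outbox

import (
	"context"
	"database/sql"
)

// createOrder writes the business row and the outbox event atomically.
// A separate relay reads outbox and publishes to the broker.
func createOrder(ctx context.Context, db *sql.DB, id string, payload []byte) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit has succeeded

	if _, err := tx.ExecContext(ctx,
		`INSERT INTO orders (id, payload) VALUES ($1, $2)`, id, payload); err != nil {
		return err
	}
	// Same transaction: either both rows commit or neither does,
	// so the event cannot be "lost" relative to the record.
	if _, err := tx.ExecContext(ctx,
		`INSERT INTO outbox (event_id, topic, payload) VALUES ($1, 'order.created', $2)`,
		id, payload); err != nil {
		return err
	}
	return tx.Commit()
}
```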
4) Queues and streams: selection and tuning
RabbitMQ/NATS JetStream for commands/tasks (work queues); Kafka for events/streams with replay.
Settings that affect response time:
- Prefetch/max in-flight: limit the number of messages processed concurrently per worker (for example, 16-64) so the database/external API isn't flooded (see the consumer sketch after this list).
- Acks/retries: `ack` only after an idempotent write; retry with exponential backoff and jitter.
- DLQ/parking lot: no endless retries; after N attempts the message goes to a Dead Letter Queue.
- Partitioning (Kafka): key by entity (userId/txnId) for ordering; parallelism via the number of partitions.
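A sketch of those limits with the community Go client `github.com/rabbitmq/amqp091-go` (one possible client, not prescribed by this text); the queue name `tasks` and the `process` callback are assumptions, and the DLQ is expected to be configured via broker policy.

```go
package worker

import (
	"log"

	amqp "github.com/rabbitmq/amqp091-go"
)

func consume(process func([]byte) error) error {
	conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
	if err != nil {
		return err
	}
	defer conn.Close()

	ch, err := conn.Channel()
	if err != nil {
		return err
	}
	// Prefetch caps in-flight messages per worker so the DB/external
	// API behind the consumer is not flooded.
	if err := ch.Qos(32, 0, false); err != nil {
		return err
	}

	msgs, err := ch.Consume("tasks", "", false, false, false, false, nil)
	if err != nil {
		return err
	}
	for d := range msgs {
		if err := process(d.Body); err != nil {
			log.Printf("process failed: %v", err)
			d.Nack(false, false) // no requeue: broker DLQ policy takes over
			continue
		}
		d.Ack(false) // ack only after the idempotent write succeeded
	}
	return nil
}
```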
5) Backpressure - how not to drown
The idea: accept only as much work as you can process within the SLO latency.
Techniques:
- Admission control: limit concurrency (semaphore/worker pool) per external dependency: database, PSP, game provider.
- Traffic shaping: token-bucket/leaky-bucket at the service ingress and on critical routes.
- Bounded queues: when full, shed the tail (429/503 + `Retry-After`) or divert to a deferred batch path.
- Adaptive concurrency (AIMD): increase parallelism on success, decrease on timeouts.
- Circuit breaker: `closed → open → half-open` driven by errors/timeouts of the external API; while open, degrade (cache/stub). A breaker sketch follows the semaphore example below.
```go
var sem = make(chan struct{}, 64) // concurrency limit towards the DB/PSP

func handle(req Request) Response {
	select {
	case sem <- struct{}{}: // acquired a slot
		defer func() { <-sem }()
		ctx, cancel := context.WithTimeout(req.ctx, 300*time.Millisecond)
		defer cancel()
		res, err := db.Do(ctx, req)
		if errors.Is(err, context.DeadlineExceeded) {
			metrics.Timeouts.Inc()
			return TooSlow()
		}
		return Ok(res)
	default: // at capacity: shed load instead of queueing unboundedly
		metrics.Backpressure.Inc()
		return TooBusy(429, "Retry-After: 0.2")
	}
}
```
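For the breaker itself, one compact option in Go is `github.com/sony/gobreaker` (a library choice of this sketch, not of the text); `callPSP` and `cachedOrStub` are hypothetical placeholders.

```go
package psp

import (
	"errors"
	"time"

	"github.com/sony/gobreaker"
)

var cb = gobreaker.NewCircuitBreaker(gobreaker.Settings{
	Name:    "psp",
	Timeout: 10 * time.Second, // how long "open" lasts before a half-open probe
	ReadyToTrip: func(c gobreaker.Counts) bool {
		return c.ConsecutiveFailures >= 5 // trip on repeated errors/timeouts
	},
})

// charge calls the PSP through the breaker and degrades to a cached
// value or stub while the circuit is open.
func charge(amount int64) ([]byte, error) {
	res, err := cb.Execute(func() (interface{}, error) {
		return callPSP(amount)
	})
	if errors.Is(err, gobreaker.ErrOpenState) {
		return cachedOrStub(), nil
	}
	if err != nil {
		return nil, err
	}
	return res.([]byte), nil
}

func callPSP(amount int64) ([]byte, error) { return nil, nil } // hypothetical PSP call
func cachedOrStub() []byte                 { return nil }      // hypothetical fallback
```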
6) Timeouts, backoff, and jitter: the three pillars of survival
Timeouts shorter than the SLO: with a 400 ms SLO, set the DB/provider timeout to 250-300 ms and the total request timeout under 400-600 ms.
Retries, limited and smart: at most 1-2 attempts, only for safe (idempotent) operations, with exponential backoff and jitter.
Coalescing: collapse duplicate in-flight requests for the same key (see the singleflight sketch after the pseudocode below).
Pseudocode (exponential backoff + jitter):
```python
for attempt in range(2):  # at most 2 attempts
    try:
        return call(dep, timeout=0.3)
    except Timeout:
        # 50 ms, then 100 ms, plus up to 50 ms of jitter
        backoff = 0.05 * (2 ** attempt) + random.uniform(0, 0.05)
        sleep(backoff)
raise UpstreamUnavailable
```
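For the coalescing point, Go's `golang.org/x/sync/singleflight` collapses concurrent calls for one key into a single upstream request (a common implementation choice, assumed here; `fetchLimits` is hypothetical).

```go
package limits

import "golang.org/x/sync/singleflight"

var group singleflight.Group

// getLimits: concurrent callers with the same providerID share one
// in-flight upstream call instead of each issuing their own.
func getLimits(providerID string) (interface{}, error) {
	v, err, _ := group.Do(providerID, func() (interface{}, error) {
		return fetchLimits(providerID)
	})
	return v, err
}

func fetchLimits(id string) (interface{}, error) { return nil, nil } // hypothetical upstream call
```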
7) Idempotency and deduplication
`Idempotency-Key` on HTTP (deposits, payouts); `operation_id` in the database under a unique index.
Inbox/Outbox: incoming webhooks always pass through an immutable inbox table with dedup by `event_id`; outgoing events leave via the outbox inside the transaction.
Exactly-once "in effect": delivery/execution may repeat, but the effect happens only once.
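A sketch of `operation_id` dedup over a unique index, assuming Postgres and a hypothetical `payments` table; in production the follow-up effect would live in the same transaction as the insert.

```go
package payments

import (
	"context"
	"database/sql"
)

// applyOnce makes a repeated delivery a no-op: the unique index on
// operation_id turns the second insert into zero affected rows.
func applyOnce(ctx context.Context, db *sql.DB, opID string, amount int64) error {
	res, err := db.ExecContext(ctx,
		`INSERT INTO payments (operation_id, amount)
		 VALUES ($1, $2)
		 ON CONFLICT (operation_id) DO NOTHING`, opID, amount)
	if err != nil {
		return err
	}
	n, err := res.RowsAffected()
	if err != nil {
		return err
	}
	if n == 0 {
		return nil // duplicate: the effect already happened, just ack
	}
	return creditBalance(ctx, db, opID, amount) // hypothetical single effect
}

func creditBalance(ctx context.Context, db *sql.DB, opID string, amount int64) error {
	return nil // placeholder
}
```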
8) Fast API for slow operations
Synchronous response: 201/202 + a status URL (`/status/{id}`), an ETA, and retry hints.
Webhooks/Server-Sent Events/WS: push the state when it is ready.
Client discipline: `Retry-After`, idempotency, a polling limit.
Example response:

```http
HTTP/1.1 202 Accepted
Location: /v1/withdrawals/req_9f2/status
Retry-After: 2

{
  "request_id": "req_9f2",
  "state": "processing",
  "next_check_sec": 2
}
```
9) Minimize hot work
Put heavy things in the background: transformations, aggregations, notifications, writing to DWH.
Cache and projections: frequently read data goes to cache-aside with a short TTL and event-driven invalidation.
Batch patterns: group external calls, e.g. request provider limits once per N ms (sketch after this list).
Serialization: fast codecs (protobuf/msgpack) for service-to-service traffic; JSON only at the edge.
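A sketch of the batch pattern in Go: individual lookups are buffered and flushed as one upstream call per 50 ms window (`flushBatch` stands in for a hypothetical bulk provider call).

```go
package batch

import "time"

type request struct {
	key   string
	reply chan []byte
}

// run groups incoming requests and issues one upstream call per window
// instead of one call per request.
func run(in <-chan request, flushBatch func([]request)) {
	ticker := time.NewTicker(50 * time.Millisecond)
	defer ticker.Stop()

	var pending []request
	for {
		select {
		case r := <-in:
			pending = append(pending, r)
		case <-ticker.C:
			if len(pending) == 0 {
				continue
			}
			flushBatch(pending) // e.g. one bulk "provider limits" request
			pending = nil
		}
	}
}
```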
10) DB under control
Connection pools: upper bounds (sized by cores/IO), with queueing at the pool enabled.
Indexes and plans: EXPLAIN the p95-critical queries + regression autotests for query plans.
Statement timeouts: short, via `statement_timeout` (Postgres).
Hot rows/locks: key sharding, optimistic locking (a balance version column), sagas instead of one "monolithic" transaction.
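Pool bounds with the standard `database/sql` API; the driver and the numbers are assumptions to size by cores/IO and then measure.

```go
package db

import (
	"database/sql"
	"time"

	_ "github.com/lib/pq" // Postgres driver: one common choice
)

func open(dsn string) (*sql.DB, error) {
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		return nil, err
	}
	// Upper bounds: excess callers queue at the pool instead of
	// piling extra connections onto the database.
	db.SetMaxOpenConns(20)
	db.SetMaxIdleConns(10)
	db.SetConnMaxLifetime(30 * time.Minute)
	// Per-transaction statement timeout on the Postgres side:
	//   BEGIN; SET LOCAL statement_timeout = '250ms'; ...; COMMIT;
	return db, nil
}
```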
11) WebSocket/real-time
Broadcast limiter: batched broadcast, max msgs/sec per connection.
Internal backpressure: an outbound message queue with a cap; on overflow, drop low-priority messages (sketch below).
Sticky routing and PDBs (PodDisruptionBudgets) during releases, to avoid reconnect storms.
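A sketch of that internal backpressure in Go: a capped outbound channel per connection; low-priority messages are dropped on overflow, and a consumer too slow for critical messages is disconnected (the WS library itself is out of scope here).

```go
package ws

import (
	"sync"
	"sync/atomic"
)

var backpressureDrops atomic.Int64 // feed this into metrics

type conn struct {
	out  chan []byte   // capped outbound queue
	once sync.Once
	dead chan struct{} // closed when the connection should be dropped
}

func newConn() *conn {
	return &conn{out: make(chan []byte, 256), dead: make(chan struct{})}
}

// send never blocks the broadcaster: overflow either drops the message
// (low priority) or drops the connection (slow consumer of critical data).
func (c *conn) send(msg []byte, lowPriority bool) {
	select {
	case c.out <- msg:
	default: // queue full: apply backpressure locally
		if lowPriority {
			backpressureDrops.Add(1)
			return
		}
		c.once.Do(func() { close(c.dead) }) // let the client reconnect
	}
}
```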
12) Observability, so you don't guess
Metrics (RED/USE + backpressure):
- `request_rate`, `error_ratio`, `latency_p95/p99` per route.
- `queue_depth`, `lag_seconds`, `consumer_inflight`, `retries_total`, `dlq_rate`.
- `backpressure_drops`, `admission_rejects`, `circuit_open`.
- For the DB: `connections_in_use/max`, `locks`, `slow_queries`.
- Traces: spans `queue → worker → db/psp` with tags `operation_id`, `partition`, `retry`.
- Logs: structured, with `trace_id`, no PII; dedicated circuit open/close events.
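A sketch of the backpressure metrics with `prometheus/client_golang`; the metric and label names follow the list above and are a starting point, not a standard.

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	BackpressureDrops = promauto.NewCounter(prometheus.CounterOpts{
		Name: "backpressure_drops_total",
		Help: "Requests shed by admission control or full queues.",
	})
	QueueDepth = promauto.NewGaugeVec(prometheus.GaugeOpts{
		Name: "queue_depth",
		Help: "Current depth of each processing stage's queue.",
	}, []string{"stage"})
	RetriesTotal = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "retries_total",
		Help: "Retries per external dependency.",
	}, []string{"dependency"})
	CircuitOpen = promauto.NewGaugeVec(prometheus.GaugeOpts{
		Name: "circuit_open",
		Help: "1 while the breaker for a dependency is open.",
	}, []string{"dependency"})
)
```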
13) Load testing
Open model (arrivals/sec) for bursts; closed model (VUs) for sessions.
Profiles: short bursts of 60-120 s and soaks of 1-4 h.
Failure injection: slow the external API down by 200-500 ms and watch p99/retries/queues.
Green-zone criteria: no growth in `queue_lag`, stable p95, `dlq_rate ≈ 0`.
14) Security and reliability
TLS/mTLS on queues, message signing, schema control (Avro/Protobuf + Schema Registry).
Idempotent producer (Kafka), exactly-once transactions where justified.
Chaos mode: periodically "drop" a dependency and watch the degradation (circuit, fallback).
15) Example configuration snippets
Nginx/Envoy ingress shaping:

```nginx
limit_req_zone $binary_remote_addr zone=api:10m rate=20r/s;

server {
    location /api/ {
        limit_req zone=api burst=40 nodelay;
        proxy_read_timeout    0.6s;  # shorter than the SLO
        proxy_connect_timeout 0.2s;
    }
}
```
RabbitMQ (prefetch):

```
basic.qos(prefetch_count=32)  # CPU/IO balance
```
Kafka consumer (Java fragment):

```java
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 200);
props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 5_000_000);
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 60_000);
```
16) Implementation checklist (prod-ready)
- Critical paths are split into a synchronous response and asynchronous processing (SEDA).
- Admission control and concurrency limits per external dependency.
- Timeouts shorter than the SLO; retries ≤ 2, with exponential backoff and jitter; coalescing.
- Circuit breaker + degradation (cache/stub), half-open policy.
- Queues/streams: prefetch/in-flight limits, DLQ, key-based partitioning.
- Idempotency (operation_id/Idempotency-Key), Outbox/Inbox, deduplication.
- Cache: cache-aside, short TTL + event-driven invalidation.
- DB: pool limits, statement_timeout, indexes, anti-lock strategies.
- WS: message limits, batching, sticky routing, PDBs.
- Observability: backpressure/queue/retry metrics, end-to-end traces, dashboards.
- Load and failure tests (open + closed, burst + soak), green zone criteria.
Summary
A fast backend is not "add another cache" but a controlled flow: intake is limited, heavy work runs in the background, every stage has its own queue and limits, retries are rare and smart, and call chains are protected by circuit breakers and idempotency. Add timeout discipline, observability, and regular load tests, and your p95/p99 will stay green even under bursts and the whims of external providers.