Backend response optimization: queues, async, backpressure
1) Why: Goals and SLOs
The goal is a stable, fast response even under bursts. The business expresses this as SLOs:
- API (CRUD/directories): p95 ≤ 250-400 ms, error rate < 1%.
- Payments/settlement (asynchronous): internal SLA for confirmation ≤ 2-5 minutes; the client gets an instant 202 Accepted plus a status poller/webhook.
- WS/real-time: RTT p95 ≤ 120 ms, disconnect rate ≤ 0.5%.
The key: decouple the "slow" steps (providers, databases, external APIs) from the user-facing response through queues and disciplined load limiting.
2) Basic picture: where latency comes from
Bottlenecks: the database (pools/indexes), external providers (PSP/game), blocking I/O, GC stop-the-world pauses, JSON serialization, "heavy" aggregations.
Symptoms: p99 growth, queuing for DB connections, retry bursts, retry storms.
Antidote: asynchronous pipelines + backpressure + timeouts/backoff + idempotency.
3) Asynchronous patterns: SEDA and CQRS
SEDA (staged event-driven architecture): split processing into stages (ingress → validation → write → integration → notification). Each stage gets its own queue and concurrency limits.
CQRS: separate reads and writes. Writes go to the log/database; reads come from projections/caches.
Outbox: the event is published atomically together with the database write, so messages cannot be "lost" (sketch after this list).
Saga: long business processes as compensating transactions instead of one global transaction.
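A minimal outbox write-path sketch in Go with `database/sql`, assuming Postgres and hypothetical `orders`/`outbox` tables: the business row and the event commit in one transaction, and a separate relay process later publishes from `outbox` to the broker.

```go
package outbox

import (
	"context"
	"database/sql"
)

// createOrder writes the business row and the outbox event atomically.
// A separate relay reads outbox and publishes to the broker.
func createOrder(ctx context.Context, db *sql.DB, id string, payload []byte) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit has succeeded

	if _, err := tx.ExecContext(ctx,
		`INSERT INTO orders (id, payload) VALUES ($1, $2)`, id, payload); err != nil {
		return err
	}
	// Same transaction: either both rows commit or neither does,
	// so the event cannot be "lost" relative to the record.
	if _, err := tx.ExecContext(ctx,
		`INSERT INTO outbox (event_id, topic, payload) VALUES ($1, 'order.created', $2)`,
		id, payload); err != nil {
		return err
	}
	return tx.Commit()
}
```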
4) Queues and streams: selection and tuning
RabbitMQ/NATS JetStream for commands/tasks (work queues); Kafka for events/streams with replay.
Settings that affect response time:
- Prefetch/max in-flight: limit the number of messages processed concurrently per worker (for example, 16-64) so the database/external API isn't flooded (see the consumer sketch after this list).
- Acks/retries: `ack` only after an idempotent write; retry with exponential backoff and jitter.
- DLQ/parking lot: no endless retries; after N attempts the message goes to a Dead Letter Queue.
- Partitioning (Kafka): key by entity (userId/txnId) for ordering; parallelism via the number of partitions.
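A sketch of those limits with the community Go client `github.com/rabbitmq/amqp091-go` (one possible client, not prescribed by this text); the queue name `tasks` and the `process` callback are assumptions, and the DLQ is expected to be configured via broker policy.

```go
package worker

import (
	"log"

	amqp "github.com/rabbitmq/amqp091-go"
)

func consume(process func([]byte) error) error {
	conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
	if err != nil {
		return err
	}
	defer conn.Close()

	ch, err := conn.Channel()
	if err != nil {
		return err
	}
	// Prefetch caps in-flight messages per worker so the DB/external
	// API behind the consumer is not flooded.
	if err := ch.Qos(32, 0, false); err != nil {
		return err
	}

	msgs, err := ch.Consume("tasks", "", false, false, false, false, nil)
	if err != nil {
		return err
	}
	for d := range msgs {
		if err := process(d.Body); err != nil {
			log.Printf("process failed: %v", err)
			d.Nack(false, false) // no requeue: broker DLQ policy takes over
			continue
		}
		d.Ack(false) // ack only after the idempotent write succeeded
	}
	return nil
}
```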
5) Backpressure - how not to drown
The idea: accept only as much work as you can process within the SLO latency.
Techniques:
- Admission control: limit concurrency (semaphore/worker pool) per external dependency: database, PSP, game provider.
- Traffic shaping: token-bucket/leaky-bucket at the service ingress and on critical routes.
- Bounded queues: when full, shed the tail (429/503 + `Retry-After`) or divert to a deferred batch path.
- Adaptive concurrency (AIMD): increase parallelism on success, decrease on timeouts.
- Circuit breaker: `closed → open → half-open` driven by errors/timeouts of the external API; while open, degrade (cache/stub). A breaker sketch follows the semaphore example below.
```go
var sem = make(chan struct{}, 64) // concurrency limit towards the DB/PSP

func handle(req Request) Response {
	select {
	case sem <- struct{}{}: // acquired a slot
		defer func() { <-sem }()
		ctx, cancel := context.WithTimeout(req.ctx, 300*time.Millisecond)
		defer cancel()
		res, err := db.Do(ctx, req)
		if errors.Is(err, context.DeadlineExceeded) {
			metrics.Timeouts.Inc()
			return TooSlow()
		}
		return Ok(res)
	default: // at capacity: shed load instead of queueing unboundedly
		metrics.Backpressure.Inc()
		return TooBusy(429, "Retry-After: 0.2")
	}
}
```
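For the breaker itself, one compact option in Go is `github.com/sony/gobreaker` (a library choice of this sketch, not of the text); `callPSP` and `cachedOrStub` are hypothetical placeholders.

```go
package psp

import (
	"errors"
	"time"

	"github.com/sony/gobreaker"
)

var cb = gobreaker.NewCircuitBreaker(gobreaker.Settings{
	Name:    "psp",
	Timeout: 10 * time.Second, // how long "open" lasts before a half-open probe
	ReadyToTrip: func(c gobreaker.Counts) bool {
		return c.ConsecutiveFailures >= 5 // trip on repeated errors/timeouts
	},
})

// charge calls the PSP through the breaker and degrades to a cached
// value or stub while the circuit is open.
func charge(amount int64) ([]byte, error) {
	res, err := cb.Execute(func() (interface{}, error) {
		return callPSP(amount)
	})
	if errors.Is(err, gobreaker.ErrOpenState) {
		return cachedOrStub(), nil
	}
	if err != nil {
		return nil, err
	}
	return res.([]byte), nil
}

func callPSP(amount int64) ([]byte, error) { return nil, nil } // hypothetical PSP call
func cachedOrStub() []byte                 { return nil }      // hypothetical fallback
```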
6) Timeouts, backoff, and jitter: the three pillars of survival
Timeouts shorter than the SLO: with a 400 ms SLO, set the DB/provider timeout to 250-300 ms and the total request timeout under 400-600 ms.
Retries, limited and smart: at most 1-2 attempts, only for safe (idempotent) operations, with exponential backoff and jitter.
Coalescing: collapse duplicate in-flight requests for the same key (see the singleflight sketch after the pseudocode below).
Pseudocode (exponential backoff + jitter):
```python
for attempt in range(2):  # at most 2 attempts
    try:
        return call(dep, timeout=0.3)
    except Timeout:
        # 50 ms, then 100 ms, plus up to 50 ms of jitter
        backoff = 0.05 * (2 ** attempt) + random.uniform(0, 0.05)
        sleep(backoff)
raise UpstreamUnavailable
```
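For the coalescing point, Go's `golang.org/x/sync/singleflight` collapses concurrent calls for one key into a single upstream request (a common implementation choice, assumed here; `fetchLimits` is hypothetical).

```go
package limits

import "golang.org/x/sync/singleflight"

var group singleflight.Group

// getLimits: concurrent callers with the same providerID share one
// in-flight upstream call instead of each issuing their own.
func getLimits(providerID string) (interface{}, error) {
	v, err, _ := group.Do(providerID, func() (interface{}, error) {
		return fetchLimits(providerID)
	})
	return v, err
}

func fetchLimits(id string) (interface{}, error) { return nil, nil } // hypothetical upstream call
```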
7) Idempotency and deduplication
`Idempotency-Key` on HTTP (deposits, payouts); `operation_id` in the database under a unique index.
Inbox/Outbox: incoming webhooks always pass through an immutable inbox table with dedup by `event_id`; outgoing events leave via the outbox inside the transaction.
Exactly-once "in effect": delivery/execution may repeat, but the effect happens only once.
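A sketch of `operation_id` dedup over a unique index, assuming Postgres and a hypothetical `payments` table; in production the follow-up effect would live in the same transaction as the insert.

```go
package payments

import (
	"context"
	"database/sql"
)

// applyOnce makes a repeated delivery a no-op: the unique index on
// operation_id turns the second insert into zero affected rows.
func applyOnce(ctx context.Context, db *sql.DB, opID string, amount int64) error {
	res, err := db.ExecContext(ctx,
		`INSERT INTO payments (operation_id, amount)
		 VALUES ($1, $2)
		 ON CONFLICT (operation_id) DO NOTHING`, opID, amount)
	if err != nil {
		return err
	}
	n, err := res.RowsAffected()
	if err != nil {
		return err
	}
	if n == 0 {
		return nil // duplicate: the effect already happened, just ack
	}
	return creditBalance(ctx, db, opID, amount) // hypothetical single effect
}

func creditBalance(ctx context.Context, db *sql.DB, opID string, amount int64) error {
	return nil // placeholder
}
```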
8) Fast API for slow operations
Synchronous response: 201/202 + a status URL (`/status/{id}`), an ETA, and retry hints.
Webhooks/Server-Sent Events/WS: push the state when it is ready.
Client discipline: `Retry-After`, idempotency, a polling limit.
Example response:

```http
HTTP/1.1 202 Accepted
Location: /v1/withdrawals/req_9f2/status
Retry-After: 2

{
  "request_id": "req_9f2",
  "state": "processing",
  "next_check_sec": 2
}
```
9) Minimize hot work
Put heavy things in the background: transformations, aggregations, notifications, writing to DWH.
Cache and projections: frequently read data goes to cache-aside with a short TTL and event-driven invalidation.
Batch patterns: group external calls, e.g. request provider limits once per N ms (sketch after this list).
Serialization: fast codecs (protobuf/msgpack) for service-to-service traffic; JSON only at the edge.
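A sketch of the batch pattern in Go: individual lookups are buffered and flushed as one upstream call per 50 ms window (`flushBatch` stands in for a hypothetical bulk provider call).

```go
package batch

import "time"

type request struct {
	key   string
	reply chan []byte
}

// run groups incoming requests and issues one upstream call per window
// instead of one call per request.
func run(in <-chan request, flushBatch func([]request)) {
	ticker := time.NewTicker(50 * time.Millisecond)
	defer ticker.Stop()

	var pending []request
	for {
		select {
		case r := <-in:
			pending = append(pending, r)
		case <-ticker.C:
			if len(pending) == 0 {
				continue
			}
			flushBatch(pending) // e.g. one bulk "provider limits" request
			pending = nil
		}
	}
}
```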
10) DB under control
Connection pools: upper bounds (sized by cores/IO), with queueing at the pool enabled.
Indexes and plans: EXPLAIN the p95-critical queries + regression autotests for query plans.
Statement timeouts: short, via `statement_timeout` (Postgres).
Hot rows/locks: key sharding, optimistic locking (a balance version column), sagas instead of one "monolithic" transaction.
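Pool bounds with the standard `database/sql` API; the driver and the numbers are assumptions to size by cores/IO and then measure.

```go
package db

import (
	"database/sql"
	"time"

	_ "github.com/lib/pq" // Postgres driver: one common choice
)

func open(dsn string) (*sql.DB, error) {
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		return nil, err
	}
	// Upper bounds: excess callers queue at the pool instead of
	// piling extra connections onto the database.
	db.SetMaxOpenConns(20)
	db.SetMaxIdleConns(10)
	db.SetConnMaxLifetime(30 * time.Minute)
	// Per-transaction statement timeout on the Postgres side:
	//   BEGIN; SET LOCAL statement_timeout = '250ms'; ...; COMMIT;
	return db, nil
}
```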
11) WebSocket/real-time
Broadcast limiter: batched broadcast, max msgs/sec per connection.
Internal backpressure: an outbound message queue with a cap; on overflow, drop low-priority messages (sketch below).
Sticky routing and PDBs (PodDisruptionBudgets) during releases, to avoid reconnect storms.
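A sketch of that internal backpressure in Go: a capped outbound channel per connection; low-priority messages are dropped on overflow, and a consumer too slow for critical messages is disconnected (the WS library itself is out of scope here).

```go
package ws

import (
	"sync"
	"sync/atomic"
)

var backpressureDrops atomic.Int64 // feed this into metrics

type conn struct {
	out  chan []byte   // capped outbound queue
	once sync.Once
	dead chan struct{} // closed when the connection should be dropped
}

func newConn() *conn {
	return &conn{out: make(chan []byte, 256), dead: make(chan struct{})}
}

// send never blocks the broadcaster: overflow either drops the message
// (low priority) or drops the connection (slow consumer of critical data).
func (c *conn) send(msg []byte, lowPriority bool) {
	select {
	case c.out <- msg:
	default: // queue full: apply backpressure locally
		if lowPriority {
			backpressureDrops.Add(1)
			return
		}
		c.once.Do(func() { close(c.dead) }) // let the client reconnect
	}
}
```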
12) Observability, so you don't guess
Metrics (RED/USE + backpressure):
- `request_rate`, `error_ratio`, `latency_p95/p99` per route.
- `queue_depth`, `lag_seconds`, `consumer_inflight`, `retries_total`, `dlq_rate`.
- `backpressure_drops`, `admission_rejects`, `circuit_open`.
- For the DB: `connections_in_use/max`, `locks`, `slow_queries`.
- Traces: spans `queue → worker → db/psp` with tags `operation_id`, `partition`, `retry`.
- Logs: structured, with `trace_id`, no PII; dedicated circuit open/close events.
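A sketch of the backpressure metrics with `prometheus/client_golang`; the metric and label names follow the list above and are a starting point, not a standard.

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	BackpressureDrops = promauto.NewCounter(prometheus.CounterOpts{
		Name: "backpressure_drops_total",
		Help: "Requests shed by admission control or full queues.",
	})
	QueueDepth = promauto.NewGaugeVec(prometheus.GaugeOpts{
		Name: "queue_depth",
		Help: "Current depth of each processing stage's queue.",
	}, []string{"stage"})
	RetriesTotal = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "retries_total",
		Help: "Retries per external dependency.",
	}, []string{"dependency"})
	CircuitOpen = promauto.NewGaugeVec(prometheus.GaugeOpts{
		Name: "circuit_open",
		Help: "1 while the breaker for a dependency is open.",
	}, []string{"dependency"})
)
```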
13) Load testing
Open model (arrivals/sec) for bursts; closed model (VUs) for sessions.
Profiles: short bursts of 60-120 s and soaks of 1-4 h.
Failure injection: slow the external API down by 200-500 ms and watch p99/retries/queues.
Green-zone criteria: no growth in `queue_lag`, stable p95, `dlq_rate ≈ 0`.
14) Security and reliability
TLS/mTLS on queues, message signing, schema control (Avro/Protobuf + Schema Registry).
Idempotent producer (Kafka), exactly-once transactions where justified.
Chaos mode: periodically "drop" a dependency and watch the degradation (circuit, fallback).
15) Example configuration snippets
Nginx/Envoy ingress shaping:

```nginx
limit_req_zone $binary_remote_addr zone=api:10m rate=20r/s;

server {
    location /api/ {
        limit_req zone=api burst=40 nodelay;
        proxy_read_timeout    0.6s;  # shorter than the SLO
        proxy_connect_timeout 0.2s;
    }
}
```
RabbitMQ (prefetch):

```
basic.qos(prefetch_count=32)  # CPU/IO balance
```
Kafka consumer (Java fragment):

```java
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 200);
props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 5_000_000);
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 60_000);
```
16) Implementation checklist (prod-ready)
- Critical paths are split into a synchronous response and asynchronous processing (SEDA).
- Admission control and concurrency limits per external dependency.
- Timeouts shorter than the SLO; retries ≤ 2, with exponential backoff and jitter; coalescing.
- Circuit breaker + degradation (cache/stub), half-open policy.
- Queues/streams: prefetch/in-flight limits, DLQ, key-based partitioning.
- Idempotency (operation_id/Idempotency-Key), Outbox/Inbox, deduplication.
- Cache: cache-aside, short TTL + event-driven invalidation.
- DB: pool limits, statement_timeout, indexes, anti-lock strategies.
- WS: message limits, batching, sticky routing, PDBs.
- Observability: backpressure/queue/retry metrics, end-to-end traces, dashboards.
- Load and failure tests (open + closed, burst + soak), green zone criteria.
Summary
A fast backend is not "add another cache" but a controlled flow: intake is limited, heavy work runs in the background, every stage has its own queue and limits, retries are rare and smart, and call chains are protected by circuit breakers and idempotency. Add timeout discipline, observability, and regular load tests, and your p95/p99 will stay green even under bursts and the whims of external providers.