Why it is critical to log and trace API requests
1) Why logs and tracing matter in iGaming
Money and reputation. Any lost or duplicated settlement is a direct loss. You need proof that each operation took place exactly once.
Regulatory. Reporting, disputes, investigations - without logs you are "blind."
SLOs and incidents. Is latency growing? Is deposit conversion falling? Traces will show the bottleneck.
Security and fraud. Abnormal patterns, replays, and scripted attacks are visible in telemetry.
Conclusion: observability is part of the design of money flows, not a "final touch."
2) What exactly to trace and log
2.1 Chain-wide correlation
'trace_id' - one per request, carried from edge → domain services → bus → consumers.
'span_id' - one for each hop, with 'parent_span_id'.
Business keys: 'tenant_id/brand_id/region', 'player_id' (alias), 'session_id', 'round_id', 'bet_id', 'settlement_id', 'idempotency_key'.
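The parent/child relationship between hops can be sketched as follows. This is a minimal illustration, not a tracing library: the `SpanContext` class and both helper functions are hypothetical names, and the ID lengths (16 random bytes for the trace, 8 for the span) are an assumption chosen to match the sizes used by W3C Trace Context.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    trace_id: str                  # shared by every hop of one request
    span_id: str                   # unique per hop
    parent_span_id: Optional[str]  # None only for the root span at the edge


def new_root_span() -> SpanContext:
    """Start a trace at the edge: fresh trace_id, no parent."""
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8), None)


def child_span(parent: SpanContext) -> SpanContext:
    """Create the span for the next hop: same trace_id, new span_id."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.span_id)


edge = new_root_span()
wallet = child_span(edge)      # edge -> wallet service
consumer = child_span(wallet)  # wallet -> bus consumer
```

Because every hop copies 'trace_id' and records its parent, the full call tree can be rebuilt later from the log entries alone.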
2.2 What to write in the logs (structure)
Timestamp in ISO 8601 with timezone.
Method/path/status, duration (ms), payload size (bytes).
Outcome and error class ('business/4xx/5xx'), code ('RG_BLOCK', 'DUPLICATE', 'IDEMPOTENCY_MISMATCH').
Host/zone/build version, service name and environment ('prod/eu-west-1').
Network characteristics: IP/ASN (aggregated), user-agent (truncated/normalized).
2.3 Where to log - by layer
Edge/API gateway: authentication, rate limits, geo/bot filters.
Domains (Wallet/Bonus/RGS): commands/events, saga statuses, database/cache latency.
Bus/queues: lag, retries, DLQ, dedup.
Cashier/PSP: authorizations, 3-DS, merchant/route.
3) Formats: structured logs only
Free text is useless for search and alerts. Use JSON lines (one entry per line).
Example (truncated):
{
"ts":"2025-10-23T16:21:05.481Z", "env":"prod", "service":"wallet", "version":"1.14.3", "level":"INFO", "event":"bet.settle", "trace_id":"tr_a1b2c3", "span_id":"sp_01", "tenant_id":"brand-7", "region":"EU", "bet_id":"b_001", "round_id":"r_8c12", "idempotency_key":"settle_r_8c12_1", "latency_ms":124, "status":"credited", "win_minor":1460, "currency":"EUR"
}
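One way to emit entries of this shape is a custom formatter on top of Python's standard `logging` module. The sketch below is illustrative only: the `JsonLineFormatter` class, the `make_logger` helper, the hard-coded service name, and the convention of passing structured fields through `extra={"fields": ...}` are all assumptions, not part of any particular platform.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonLineFormatter(logging.Formatter):
    """Render each record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": "wallet",  # assumed static value for this sketch
            "event": record.getMessage(),
        }
        # Structured fields arrive via extra={"fields": {...}} (a convention of this sketch).
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry, separators=(",", ":"), sort_keys=True)


def make_logger(stream) -> logging.Logger:
    """Build a logger that writes one JSON line per entry to the given stream."""
    logger = logging.getLogger("wallet.sketch")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonLineFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


buf = io.StringIO()
log = make_logger(buf)
log.info("bet.settle", extra={"fields": {
    "trace_id": "tr_a1b2c3", "bet_id": "b_001", "latency_ms": 124,
}})
```

Keeping the event name in the message and everything else in typed fields means search and alert rules can filter on exact keys instead of parsing text.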
4) Tracing: OpenTelemetry as Standard
HTTP/gRPC/DB/cache instrumentation + custom spans on sagas ('authorize → commit → settle → credit').
Context propagation: W3C Trace Context ('traceparent', 'tracestate'); for webhooks, pass the context in headers.
Baggage: brand/region/trace flags only, no PII.
Sampling: 1-10% of total traffic by default, always 100% for monetary errors and latency above SLO, dynamic upsampling during incidents.
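The sampling rule can be sketched as a single decision function. This is a simplified illustration and not the OpenTelemetry sampler API: the function name `keep_trace`, its parameters, and the 5% default are assumptions. Note that keeping every trace with an error or an SLO breach requires deciding after the request has finished, because the outcome is not known at the start.

```python
import hashlib


def keep_trace(trace_id: str, *, is_money_error: bool, latency_ms: float,
               slo_ms: float, base_rate: float = 0.05,
               incident: bool = False) -> bool:
    """Decide, once the request has completed, whether to keep its trace."""
    if is_money_error or latency_ms > slo_ms:
        return True  # always keep monetary errors and SLO breaches
    rate = 1.0 if incident else base_rate  # upsample everything during an incident
    # Hash the trace_id so every service reaches the same decision for one trace.
    bucket = int(hashlib.sha256(trace_id.encode()).hexdigest()[:8], 16) / 2**32
    return bucket < rate
```

Deriving the decision from a hash of 'trace_id' rather than a random number keeps traces complete: a trace is either kept on every hop or dropped on every hop.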
5) WORM audit and immutability
Critical actions (changing limits, key rotation, jackpot configs, manual support operations) go to WORM storage (write once, read many).
Requirements: immutability, signatures/hashes, independent access for compliance, retention as required by law (for example, 5-7 years).
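The "signatures/hashes" requirement is often met by chaining entries, so that editing any past record invalidates everything after it. The sketch below shows only that tamper-evidence idea under assumed names (`append_entry`, `verify_chain`, an in-memory list); actual immutability still has to be enforced by the storage layer itself.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(prev_hash: str, action: dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the action."""
    body = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()


def append_entry(chain: list, action: dict) -> dict:
    """Append an audit entry linked to the previous one by its hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"action": action, "prev_hash": prev_hash,
             "hash": _digest(prev_hash, action)}
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["action"]):
            return False
        prev_hash = entry["hash"]
    return True


audit = []
append_entry(audit, {"actor": "support-42", "op": "limit.change", "new_limit_minor": 50000})
append_entry(audit, {"actor": "ops-7", "op": "key.rotate"})
```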
6) PII and log security
Do not log PAN, CVV, document IDs, or e-mail/phone in clear text. Mask or tokenize them.
In the logs, use the player's pseudo-identifier (stable hash).
Secrets/tokens never get into the log (filters at the SDK/agent level).
Data residency: logs and traces stay physically in the region (EU/UK/BR...), with separate access roles (RBAC/ABAC).
Encryption at rest and in transit, access via temporary tokens, the principle of least privilege.
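Masking and pseudonymization can be sketched as a filter applied before a line is written. Everything here is an assumption made for illustration: the regular expressions are deliberately simple and will miss edge cases, the keyed hash (HMAC) is one possible way to get a "stable hash," and keeping the last four PAN digits is a choice of this sketch rather than a requirement from the text.

```python
import hashlib
import hmac
import re

PAN_RE = re.compile(r"\b\d{13,19}\b")               # card-number-like digit runs
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # rough e-mail shape
BEARER_RE = re.compile(r"Bearer\s+[A-Za-z0-9._~+/=-]+")


def pseudonymize(player_id: str, key: bytes) -> str:
    """Stable pseudo-identifier: same input and key always give the same alias."""
    return hmac.new(key, player_id.encode(), hashlib.sha256).hexdigest()[:16]


def scrub(text: str) -> str:
    """Remove tokens, mask card numbers (keep last 4), and hide e-mail addresses."""
    text = BEARER_RE.sub("Bearer [REDACTED]", text)
    text = PAN_RE.sub(lambda m: "*" * (len(m.group()) - 4) + m.group()[-4:], text)
    return EMAIL_RE.sub("[EMAIL]", text)


raw = "card 4111111111111111 from a.b@example.com with Bearer abc.def-123"
clean = scrub(raw)
alias = pseudonymize("player-1001", key=b"demo-key-not-for-production")
```

A keyed hash is used instead of a plain one so that an attacker who obtains the logs cannot recompute aliases from guessed player IDs without the key.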
7) Metrics and SLOs that hold the platform
Latency p95/p99 for key endpoints: 'bets.authorize', 'bets.settle', 'wallet.credit', 'cashier.deposit'.
Error rate by class and code.
Queue/consumer lag (bus), % of retries and retry "storms."
Settle lag (from outcome to credit), deposit success rate by PSP/geo.
Webhook lag p99 by topic.
Alerts are driven by the SLO budget (error/latency budget exceeded for the window → incident).
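The budget rule reduces to simple arithmetic: an SLO target of 99.9% over a window of 100,000 requests allows about 100 "bad" requests, and the alert fires once more than that have been observed. The function names and the definition of "bad" (errors plus requests slower than the latency threshold) are assumptions of this sketch.

```python
def budget_consumed(total: int, bad: int, slo_target: float) -> float:
    """Fraction of the error budget used in the window (1.0 means fully spent)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, exclusive")
    if total == 0:
        return 0.0
    allowed_bad = (1.0 - slo_target) * total  # requests the SLO lets us fail
    return bad / allowed_bad


def should_open_incident(total: int, bad: int, slo_target: float) -> bool:
    """Open an incident once the window's budget is exceeded."""
    return budget_consumed(total, bad, slo_target) > 1.0
```

For example, 150 bad requests out of 100,000 at a 99.9% target consume roughly 1.5 budgets, so the window is in incident state; 50 bad requests consume about half and stay quiet.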
8) For investigations and disputes: the minimum set
Cross-reference of 'trace_id', 'event_id', 'idempotency_key', and 'settlement_id'.
Point-in-time snapshot of saga statuses.
Signature/hash of the request/webhook (for non-repudiation).
Snapshot of the configuration (version of the bonus/jackpot rules) as of 'ts'.
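A dispute record that ties the identifiers to a hash and signature of the raw payload might look like the sketch below. The helper names and the record layout are hypothetical. HMAC with a shared secret is used here only because it needs nothing beyond the standard library; it proves integrity and origin between two parties that share the key, whereas strict non-repudiation would need an asymmetric signature.

```python
import hashlib
import hmac


def sign_body(raw_body: bytes, secret: bytes) -> str:
    """HMAC-SHA256 over the exact bytes that were sent or received."""
    return hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def verify_body(raw_body: bytes, secret: bytes, signature: str) -> bool:
    """Constant-time comparison to avoid leaking the signature via timing."""
    return hmac.compare_digest(sign_body(raw_body, secret), signature)


def evidence_record(ids: dict, raw_body: bytes, secret: bytes) -> dict:
    """Bundle the correlation ids with a hash and signature of the payload."""
    return {
        **ids,
        "body_sha256": hashlib.sha256(raw_body).hexdigest(),
        "signature": sign_body(raw_body, secret),
    }


body = b'{"event_id":"ev_1","settlement_id":"st_9","win_minor":1460}'
record = evidence_record(
    {"trace_id": "tr_a1b2c3", "event_id": "ev_1",
     "idempotency_key": "settle_r_8c12_1", "settlement_id": "st_9"},
    body, secret=b"demo-secret",
)
```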
9) Storage and cost
Hot (7-14 days): search for incidents and post-mortems.
Warm (30-90 days): product analytics and fraud patterns.
Cold/archive (≥ 1 year): legal/regulatory needs.
Apply filters and sampling, store aggregates, turn on TTL and compression. Index 'trace_id', 'tenant_id', 'event', 'status_code'.
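The tiering above can be expressed as a small routing rule by record age. The cut-offs below take the upper bound of each range (14 and 90 days) as an assumption; actual values depend on the regulator and on cost targets, and the function name is hypothetical.

```python
from datetime import timedelta

HOT_MAX = timedelta(days=14)   # assumed upper bound of the hot tier
WARM_MAX = timedelta(days=90)  # assumed upper bound of the warm tier


def storage_tier(age: timedelta) -> str:
    """Pick the storage tier for a log record of the given age."""
    if age <= HOT_MAX:
        return "hot"
    if age <= WARM_MAX:
        return "warm"
    return "cold"
```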
10) Checklists
For platform/operator
- 'trace_id', 'idempotency_key', and 'tenant/brand/region' tags are present everywhere.
- Structured JSON logs; OpenTelemetry on HTTP/gRPC/DB/cache/bus.
- WORM audit of critical actions; retention per regulation.
- PII/secret masking; logs and traces kept per region.
- SLO dashboards: p95/p99, error-rate, queue-lag, settle-lag, webhook-lag.
- Alerts on the SLO budget; auto-upsampling of traces under degradation.
- DR/chaos exercises: double delivery, region failure, wallet delay.
- Access to logs and traces via RBAC/ABAC, "four eyes" for export.
For Provider (RGS/live/JP)
- Propagate 'trace_id' and 'idempotency_key' in all calls and webhooks.
- Logs are structured; the error code/class is recorded.
- Webhooks are signed; 'event_id' is stored and used for deduplication.
- Trace the outcome/settlement, measure 'settle_lag'.
- PII masking; tokens/keys are not logged.
- Sampling is reasonable (100% for errors and monetary anomalies).
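Deduplication by 'event_id', mentioned in the checklist above, can be sketched as follows. The `WebhookDeduplicator` class is a hypothetical name and the in-memory set is an assumption made to keep the example short; a real consumer would need a durable store with a uniqueness guarantee so that the check survives restarts and concurrent deliveries.

```python
from typing import Callable, Set


class WebhookDeduplicator:
    """Apply each webhook at most once, keyed by its event_id."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()  # in-memory only, for illustration

    def handle(self, event_id: str, apply: Callable[[], None]) -> bool:
        """Run `apply` for a new event_id; skip and return False for a repeat."""
        if event_id in self._seen:
            return False
        apply()
        # Mark as seen only after success, so a failed apply can be retried.
        self._seen.add(event_id)
        return True


credited = []
dedup = WebhookDeduplicator()
first = dedup.handle("ev_1", lambda: credited.append(1460))
second = dedup.handle("ev_1", lambda: credited.append(1460))  # redelivery
```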
11) Anti-patterns (red flags)
Text logs without structure and 'trace_id'.
No 'idempotency_key' in the logs of write operations.
Logging PII/secrets, writing Bearer tokens.
Logs of all regions in one bucket without separation.
No WORM for critical actions; "editable" audits.
Events published bypassing outbox/CDC → "lost" operations.
Blind 100% tracing without filters (storage costs explode, noise).
No SLO dashboards or alerts; investigations last for hours.
12) Step-by-step implementation (realistic)
1. Introduce a single 'trace_id' and 'idempotency_key' in all contracts (REST/gRPC/webhooks).
2. Convert logs to JSON; add required fields (service, version, region, timestamp, codes).
3. Connect OpenTelemetry and context propagation; at minimum, trace the money paths.
4. Configure SLO dashboards and alerts; define budgets.
5. Enable WORM audit of critical actions; determine retention.
6. Introduce PII/secret masking, log access restriction.
7. Add chaos cases and exercises, practice post-mortems.
8. Optimize storage: sampling, TTL, archives.
13) The bottom line
Logs and tracing are not a "nice to have" but a non-negotiable obligation of the iGaming platform and provider. Structured logs, end-to-end traces, WORM auditing, PII protection, and SLO observability turn incidents into manageable events and disputes into reproducible cases. On such a foundation, money moves exactly once, reporting can be reproduced at any time, and the team stays fast and calm, even at tournament peaks and during releases.