Why Infrastructure Scaling Is Important
Why do businesses need to scale?
Revenue without a ceiling. Peak events (derbies, finals, major slot releases) multiply RPS; scalability turns traffic spikes into GGR growth rather than 5xx errors.
Stable SLOs. We keep p95 latency of critical paths (bet placement, balance update, withdrawal) within targets at any level of concurrent players.
Cost under control. Elasticity means paying for "hot hours" rather than keeping capacity at a constant peak.
Regulation and brand. Availability and predictable operation of the cashier/wallet are subject to audit and underpin player trust.
Scaling types
Horizontal (scale-out)
Adding service instances. The basis for stateless APIs, provider bridges, web gateways, and workers. Pros: fault tolerance, elasticity. Cons: requires idempotency and externalized state.
Vertical (scale-up)
Increasing node resources. Suitable for databases and OLAP clusters, but it has a hard ceiling and costs more per unit of gain.
Geographical
Multi-AZ and, if necessary, multi-region: closer to the player → lower latency for bets/streams and better resilience to outages.
What exactly scales in a casino
Edge and API: gateways, WAF, GraphQL/REST, WebSocket hubs (bets/events).
Provider bridge: live/RNG adapters with HPA driven by RPS and `bet.accepted` time.
Wallet/ledger: the stateful core; scales through read replicas, sharding, and transaction optimization.
Cashier: separate pools for payment providers and crypto on/off-ramps, queues for payouts.
Queues/event bus: Kafka/NATS cluster with autoscaling consumers.
Cache/reference data: Redis/in-memory caching of hot keys, CDN for static assets (a cache-aside sketch follows this list).
Streaming: WebRTC/LL-HLS edge nodes with automatic fallback and QoS-driven autoscaling.
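A minimal cache-aside sketch for hot keys such as player balance, assuming the redis-py client; the key names, TTL, and `load_balance_from_db` stub are illustrative, and invalidation is triggered by wallet events.

```python
import json
import redis  # assumes the redis-py client is installed

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def load_balance_from_db(player_id: str) -> dict:
    # Placeholder for the real wallet read model / database query.
    return {"player_id": player_id, "balance": "100.00", "currency": "EUR"}

def get_balance(player_id: str, ttl_seconds: int = 5) -> dict:
    """Cache-aside read: try Redis first, fall back to the database."""
    key = f"balance:{player_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)            # cache hit: no database load
    value = load_balance_from_db(player_id)  # cache miss: read from the source
    r.setex(key, ttl_seconds, json.dumps(value))
    return value

def invalidate_balance(player_id: str) -> None:
    """Event-driven invalidation: call this on wallet.debit/credit events."""
    r.delete(f"balance:{player_id}")
```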
Engineering Philosophy
1. Idempotency for money. Any retry of `bet.place`/`payout.request` is processed exactly once (idempotency key); see the sketch after this list.
2. Queues and backpressure. Critical paths are never blocked: if a provider or the database slows down, requests go into a buffer with a controlled drain, and secondary features degrade first.
3. Cache first. Read-heavy queries (balance, lobby) go through caches/materialized views; invalidation is event-driven.
4. Sharding. We partition data and streams (by `playerId`, country, provider, currency).
5. Consistency where the money is. Strict ACID only for the wallet/ledger; everything else is eventually consistent via events.
6. Observability before release. Metrics and traces are part of the service contract; otherwise autoscaling flies blind.
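A minimal sketch of the exactly-once money write from item 1, assuming a relational store with a unique constraint on the idempotency key; the table layout and `place_bet` helper are illustrative, with sqlite3 standing in for the wallet database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ledger (
        idempotency_key TEXT PRIMARY KEY,  -- unique constraint enforces exactly-once
        player_id       TEXT NOT NULL,
        amount_cents    INTEGER NOT NULL,
        operation       TEXT NOT NULL
    )
""")

def place_bet(idempotency_key: str, player_id: str, amount_cents: int) -> str:
    """Apply a debit exactly once per idempotency key; retries return the prior result."""
    try:
        with conn:  # transaction: the key insert and the debit commit together
            conn.execute(
                "INSERT INTO ledger VALUES (?, ?, ?, 'bet.place')",
                (idempotency_key, player_id, amount_cents),
            )
            # ... debit the wallet balance in the same transaction ...
        return "accepted"
    except sqlite3.IntegrityError:
        # Duplicate key: this retry was already processed, do not debit again.
        return "duplicate"

print(place_bet("req-123", "player-1", 500))  # accepted
print(place_bet("req-123", "player-1", 500))  # duplicate (safe retry)
```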
Metrics and Objectives (SLO/SLA)
- p95/p99 latency: `bet.place` ≤ 150–250 ms (within a region), `wallet.debit/credit` ≤ 50–100 ms, `payout.quote/submit` ≤ 500–800 ms.
- Error rate: `5xx` < 0.1–0.3% on the API, bet `reject_rate` < 0.2% in normal operation.
- Throughput: RPS on API/bridge; events/sec on the bus.
- Queues: length and wait time (for example, payout processing ≤ 2–5 minutes at peak).
- Stream QoS: dropped frames, RTT of betting signals, aborted rounds.
- Cache: hit ratio > 85–95% on hot keys.
- Cost/revenue: infrastructure cost / GGR, cost per request (micro-dollars per call).
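A small sketch of how a check against these targets could look; the latency samples and the 250 ms `bet.place` budget (the upper bound of the range above) are illustrative.

```python
import math

def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of latency samples."""
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered))  # nearest-rank method
    return ordered[rank - 1]

# Illustrative check against the bet.place p95 budget.
BET_PLACE_P95_BUDGET_MS = 250
samples = [120, 130, 145, 160, 180, 190, 210, 230, 240, 260]
observed = p95(samples)
print(f"p95={observed} ms, within SLO: {observed <= BET_PLACE_P95_BUDGET_MS}")
```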
Domain scaling patterns
Wallet and ledger
Read replicas for reads; a single writer per shard.
CQRS: the write path (strict) is separated from the read path (materialized views).
Batch reconciliation and adjustment transactions go strictly through the append-only journal.
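A compact sketch of the append-only journal plus a derived read model (the CQRS split described above); the entry structure and names are illustrative, and in production the journal lives in the wallet database rather than in memory.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class LedgerEntry:
    player_id: str
    amount_cents: int   # positive = credit, negative = debit
    reason: str

journal: list[LedgerEntry] = []              # append-only: entries are never updated
balances: dict[str, int] = defaultdict(int)  # materialized read model

def append(entry: LedgerEntry) -> None:
    """Write path: append to the journal, then project into the read model."""
    journal.append(entry)
    balances[entry.player_id] += entry.amount_cents

def adjust(player_id: str, amount_cents: int, reason: str) -> None:
    """A reconciliation 'adjustment' is just another journal entry, never an edit."""
    append(LedgerEntry(player_id, amount_cents, f"adjustment:{reason}"))

append(LedgerEntry("player-1", 10_000, "deposit"))
append(LedgerEntry("player-1", -500, "bet.place"))
adjust("player-1", 25, "batch-reconciliation")
print(balances["player-1"])  # 9525
```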
Bridge/game integrations
Stateless adapters that autoscale on `bet.accepted` latency.
A circuit breaker per provider; on degradation, the UI degrades temporarily and the affected tables are disabled.
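A minimal per-provider circuit breaker sketch; the threshold and cooldown values are illustrative, and a real adapter would also emit the degradation event that hides the affected tables in the UI.

```python
import time

class CircuitBreaker:
    """Open the circuit after N consecutive failures; retry after a cooldown."""

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True                       # closed: calls flow normally
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            return True                       # half-open: allow probe traffic through
        return False                          # open: fail fast, degrade the UI

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()

# One breaker per game provider; an open circuit temporarily disables its tables.
breakers = {"provider-a": CircuitBreaker(), "provider-b": CircuitBreaker()}
```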
Payments/Crypto
A dedicated pool for webhooks and PSP/on-chain listeners; reprocessing guarded by idempotency.
A provider router driven by SLA/cost/country.
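A sketch of a provider router choosing by country, SLA, and cost; the provider records and the scoring rule are made-up illustrations of the idea.

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    countries: set[str]
    success_rate: float   # rolling SLA metric, 0..1
    fee_percent: float    # transaction cost
    healthy: bool         # circuit breaker / health check state

PROVIDERS = [
    Provider("psp-alpha", {"DE", "NL"}, 0.995, 1.8, True),
    Provider("psp-beta",  {"DE", "BR"}, 0.990, 1.2, True),
    Provider("crypto-gw", {"DE", "NL", "BR"}, 0.980, 0.9, True),
]

def route_payment(country: str) -> Provider:
    """Pick the healthy provider for the country with the best SLA/cost trade-off."""
    candidates = [p for p in PROVIDERS if p.healthy and country in p.countries]
    if not candidates:
        raise RuntimeError(f"no payment provider available for {country}")
    # Illustrative score: reliability first, then fee as a tie-breaker.
    return max(candidates, key=lambda p: (p.success_rate, -p.fee_percent))

print(route_payment("DE").name)
```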
Background operations
Workers/jobs (bonuses, missions, tournaments) run through queues and are scaled by queue length and deadlines.
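A sketch of the scaling rule for such workers: desired replicas are derived from queue depth and per-worker throughput rather than CPU; the numbers are illustrative.

```python
import math

def desired_workers(queue_length: int,
                    jobs_per_worker_per_min: int,
                    drain_deadline_min: int,
                    min_workers: int = 2,
                    max_workers: int = 50) -> int:
    """Scale on queue length: enough workers to drain the backlog within the deadline."""
    capacity_needed = queue_length / (jobs_per_worker_per_min * drain_deadline_min)
    replicas = max(min_workers, math.ceil(capacity_needed))
    return min(replicas, max_workers)

# 12,000 queued bonus jobs, 200 jobs/min per worker, drain within 10 minutes -> 6 workers.
print(desired_workers(12_000, 200, 10))
```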
Streaming
Regional edge pools, automatic WebRTC → LL-HLS fallback; bitrate/quality caps to preserve QoS.
Architectural solutions
HPA/VPA/Cluster Autoscaler: HPA for the API/bridge; VPA for ETL/reports; heterogeneous node pools (CPU-heavy, memory-heavy, network-optimized).
PodDisruptionBudgets and priority classes: the money core is protected from eviction.
Feature flags and canary releases: roll new features out to a percentage of traffic.
Geo-routing: Anycast/DNS and regional ingress gateways closer to the user.
Cost and efficiency
Resource profiles. Requests/limits are set to match the real usage profile (no CPU throttling on critical paths).
Spot pools for analytics/ETL and background jobs.
Auto-shutdown of test/staging environments outside working hours.
Cache instead of cores. It is cheaper to add Redis hits than to multiply CPU on the database.
Scale-out security
mTLS/mesh between services as the call graph grows.
NetworkPolicy: Money/PII domains are separate trust zones.
Secret rotation and image signing: more nodes means more places where risk lives.
Blast-radius control: sharding and request limits protect against cascading failures.
Anti-patterns
Scaling a monolith with global locks: more pods means more lock contention.
Keeping clusters permanently warm "at peak" instead of using HPA and degrading secondary features.
Mixing OLTP and OLAP on the same database: any heavy report kills bet latency.
Missing idempotency: double debits on retries (especially at peak).
Blind CPU-based autoscaling: it ignores the metrics that matter (`bet.place` latency, queue length).
A single payment provider per country: there is nothing to fail over to when it goes down.
Scaling implementation checklist
Strategy
- SLO (p95 latencies, errors, RPS) and error budget are defined.
- Domain segmentation: money/bets/cashier are separated from secondary features.
Data
- Sharding/replicas, CQRS for reads, materialized views.
- A cache layer with a clear invalidation policy.
Infrastructure
- HPA/VPA, heterogeneous node pools, PDBs, and priority classes.
- Geo-routing, multi-AZ, DR readiness.
Applications
- Idempotency keys for money/payments/webhooks.
- Circuit breakers and timeouts; backpressure/queues.
- Feature flags and canary releases.
Observability
- End-to-end traces (ingress → API → wallet → provider → webhook).
- Dashboards for RPS/latency/errors/queues/stream QoS.
- Alerts on `reject_rate` growth and `round.settle` degradation.
Cost
- Correct requests/limits, spot instances for background tasks, auto-sleep for non-prod environments.
Scaling infrastructure is not about "more servers." It is about controlled elasticity: where strict consistency is required (money), we design a sharded core and fast transactions; wherever possible, we move work to events, queues, and caches. Add observability, geography, and release discipline, and the platform withstands any peak without compromising SLOs, P&L, or player trust.