Docker and Kubernetes in iGaming: Deployment Strategies
1) iGaming Context: Platform Requirements
Real-time flows (live games, bets, tournament events) → strict p95 latency targets for API/WS.
Traffic peaks (streams/promos) → fast autoscaling without cold starts.
Money and compliance → isolation of money-handling environments, release traceability, access control and auditing.
Multi-jurisdiction/multi-brand → tenants (namespaces/projects), network and resource isolation policies.
Key SLOs: login ≥ 99.9%, deposit ≥ 99.85%, p95 API ≤ 250-400 ms, p95 WS RTT ≤ 120 ms.
2) Basic architecture on Kubernetes
Layers: Ingress/Edge → API/gateways → services (wallet, profile, promo, anti-fraud) → queues/streams → storage.
Isolation: `namespace` per brand/market or a "cell" per region; dedicated NodePools (public API / batch / ws-realtime).
Network policies: `NetworkPolicy` on a "deny by default" basis (sketch below); separate egress policies to PSP/KYC/game providers.
Storage: `StorageClass` with replication within a zone/region; operators for databases/caches (Postgres/MySQL, Redis, Kafka).
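A minimal sketch of the "deny by default" baseline, assuming a per-brand namespace named `brand-a` (the name is illustrative):

```yaml
# Deny all ingress and egress for every pod in the namespace;
# traffic is then re-allowed selectively by additional policies.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: brand-a        # illustrative tenant namespace
spec:
  podSelector: {}           # empty selector = all pods in the namespace
  policyTypes: ["Ingress", "Egress"]
```

Note that a full egress deny also blocks DNS, so an allow rule for kube-dns usually ships alongside it.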
3) Container images: quality and safety
Multi-arch images (amd64/arm64), distroless or slim bases, only the necessary binaries.
SBOM and vulnerability scanning, image signing (Cosign), admission policy (`ImagePolicyWebhook`).
Immutable tagging: release by `sha256` digest; `latest` is prohibited.
Runtime profiles: `readOnlyRootFilesystem`, `runAsNonRoot`, seccomp/AppArmor, minimal capabilities.
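A hedged sketch of that runtime profile as a pod spec (pod and image names are illustrative):

```yaml
# Hardened runtime profile: non-root, read-only root filesystem,
# default seccomp profile, all Linux capabilities dropped.
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app            # illustrative
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile: {type: RuntimeDefault}
  containers:
    - name: app
      image: registry/app@sha256:...   # immutable digest, never "latest"
      securityContext:
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
        capabilities: {drop: ["ALL"]}
```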
4) Release strategies: when and what to choose
RollingUpdate (default)
Zero downtime; the right default for most APIs.
Controlled via readiness/liveness/startup probes and maxUnavailable/maxSurge.
Blue-Green
Parallel Blue and Green stacks; traffic switches at the Ingress/Service level.
Good for large schema/config changes; fast rollback.
Canary
Gradually shift a percentage of traffic (5→10→25→50→100%).
Trigger SLO gates on p95, error rate, and anomalies in deposits/bets.
Options: Service Mesh (Istio/Linkerd), or an Ingress controller with canary annotations (sketch below).
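For the Ingress-controller option, a minimal sketch using NGINX Ingress canary annotations (host and service names are illustrative):

```yaml
# Canary Ingress: sends 10% of traffic to the v2 Service.
# Raise canary-weight stepwise (5→10→25→50→100) while SLO gates stay green.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: payments-api-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com          # illustrative
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: {name: payments-api-v2, port: {number: 80}}
```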
A/B and Shadow
Shadow: mirror part of the traffic to the new release without returning its responses to users (pure telemetry).
A/B: functional experiments with feature flags and segmentation by players/markets.
5) GitOps and Configuration Management
GitOps (Argo CD/Flux): clusters read the desired state from Git; all changes through PR and review.
Templates: Helm/Kustomize, a single chart library.
Secrets: external managers (Vault/Cloud SM), `ExternalSecrets`/`Secrets Store CSI`; KMS keys and rotation.
Pipeline (simplified):
1. CI builds and signs the image → pushes it to the registry.
2. A PR changes the image version/config → GitOps applies it.
3. Canary rollout with SLO gates → automatic promotion or auto-rollback.
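A minimal Argo CD `Application` sketch for step 2; the repository URL and paths are illustrative:

```yaml
# Argo CD watches this Git path and keeps the cluster in sync with it;
# a merged PR that bumps the image digest is applied automatically.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-api
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/deploy.git   # illustrative
    targetRevision: main
    path: apps/payments-api
  destination:
    server: https://kubernetes.default.svc
    namespace: payments
  syncPolicy:
    automated: {prune: true, selfHeal: true}
```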
6) Autoscaling for peaks and WS load
HPA by application metrics (RPS, p95 latency, queue lag), not just CPU/RAM.
KEDA for event-driven scaling (Kafka, RabbitMQ, Redis, HTTP queues); see the sketch at the end of this section.
VPA for continuous right-sizing of requests/limits.
Cluster Autoscaler + warm node pools (pre-provisioned) for the duration of promos/tournaments.
WebSocket specifics: dedicated NodePools (higher limits on open sockets/file descriptors), `PodDisruptionBudget` for graceful updates, sticky routing (session affinity) via Ingress/Mesh.
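A hedged KEDA `ScaledObject` sketch for the event-driven case; broker address, topic, and threshold are illustrative:

```yaml
# Scales a settlement worker on Kafka consumer-group lag
# instead of CPU, so queues drain before players notice.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: settlement-worker
spec:
  scaleTargetRef:
    name: settlement-worker          # target Deployment
  minReplicaCount: 2
  maxReplicaCount: 50
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092 # illustrative
        consumerGroup: settlement
        topic: bets
        lagThreshold: "500"          # desired max lag per replica
```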
7) Stateful workloads: wallet, databases, queues
Operators (Postgres/MySQL, Redis Sentinel/Cluster, Kafka Operator): declarative replication, `PITR`, automatic backups.
RPO/RTO policy: synchronous replication within a zone, asynchronous to DR regions.
Idempotency/outbox for deposits/payouts; inbox pattern for webhooks from PSPs and game providers.
`StorageClass` with fast IOPS; for the wallet, a dedicated class and nodes with local SSDs (plus replication); see the sketch below.
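A sketch of a dedicated high-IOPS class for the wallet tier, assuming the AWS EBS CSI driver (the provisioner and parameters differ per cloud):

```yaml
# Provisioned-IOPS volumes reserved for wallet databases.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: wallet-io2
provisioner: ebs.csi.aws.com         # cloud-specific
parameters:
  type: io2                          # provisioned-IOPS volume type
  iops: "16000"                      # illustrative IOPS budget
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```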
8) Network layer and gateways
Ingress (Nginx/Envoy/HAProxy/ALB) with mTLS to backends, HTTP/2/3, HSTS, rate limits.
Service Mesh: canary routes, retries/timeouts, circuit breakers, TLS inside the cluster by default (see the sketch below).
Egress gateways: whitelisting to PSP/KYC/providers, DNS and IP control.
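A hedged Istio `VirtualService` sketch for the retries/timeouts discipline; the host and values are illustrative:

```yaml
# Hard timeout plus bounded retries toward a PSP-facing service,
# so a degraded provider cannot accumulate in-flight requests.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: psp-gateway
spec:
  hosts: ["psp-gateway.payments.svc.cluster.local"]  # illustrative
  http:
    - route:
        - destination:
            host: psp-gateway.payments.svc.cluster.local
      timeout: 2s
      retries:
        attempts: 2
        perTryTimeout: 500ms
        retryOn: "5xx,connect-failure,reset"
```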
9) Observability and SLO release gates
OpenTelemetry: traces across frontend → API → payment/game provider; capture 100% of errors and "slow" spans.
RED/USE metrics + business SLIs (deposit/bet/withdrawal success).
JSON logs with `trace_id`; WORM storage for audit.
Release gates: promote only if SLOs stay green on the test traffic share; see the alert-rule sketch below.
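One way to express such a gate, assuming the Prometheus Operator's `PrometheusRule` CRD; the metric name and threshold are illustrative (the 0.15% budget mirrors the deposit SLO above):

```yaml
# Fires when the canary's 5xx share burns the deposit error budget;
# an active alert is treated as a failed gate → auto-rollback.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: payments-slo-gates
spec:
  groups:
    - name: canary-gates
      rules:
        - alert: DepositErrorBudgetBurn
          expr: |
            sum(rate(http_requests_total{app="payments-api",version="v2",code=~"5.."}[5m]))
              / sum(rate(http_requests_total{app="payments-api",version="v2"}[5m])) > 0.0015
          for: 5m
          labels: {severity: page}
```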
10) Security: from supply chain to runtime
Policy as Code: OPA/Gatekeeper/Kyverno (forbid privileged containers, require `runAsNonRoot`, enforce resource limits, verify image signatures at admission); see the sketch below.
Secrets and keys: only from a Secret Manager; minimize `envFrom`; prefer sidecar-injected secrets.
Provider webhooks: HMAC signatures, idempotency, a dedicated egress gateway.
Compliance: audit of releases, artifacts and access (RBAC/MFA), geo-isolated storage of CCP artifacts/logs.
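A minimal Kyverno sketch for the `runAsNonRoot` requirement (a real policy would also validate container-level security contexts):

```yaml
# Rejects Pods that do not declare runAsNonRoot: true.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-run-as-non-root
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-run-as-non-root
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "Pods must set securityContext.runAsNonRoot: true."
        pattern:
          spec:
            securityContext:
              runAsNonRoot: true
```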
11) Multi-region, failover and DR
Active-standby across regions (at minimum for wallet/login/payments).
Traffic routing: GSLB/Anycast; health checks driven by SLIs (login/deposit/bet).
Disaster cutover: a DR "button" (freeze writes → promote the DB → warm up caches → shift traffic in phases).
Drills: quarterly GameDays simulating the failure of a PSP, a zone, or a game provider.
12) Configuration and feature management
Feature flags (configs in a ConfigMap/external config service) to switch off heavy features during an incident.
Versioned configs (hashes, checksum annotations on Pods), canary config rollout; see the Helm sketch below.
Runtime overrides at the Mesh/Ingress level (timeouts, retry policies) without rebuilding images.
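A common Helm idiom for the checksum annotation: hashing the rendered ConfigMap into a pod-template annotation so any config change rolls the pods. A sketch, assuming the chart keeps the ConfigMap in `configmap.yaml`:

```yaml
# Deployment template fragment: the annotation value changes whenever
# the ConfigMap changes, which triggers a rolling restart of the pods.
spec:
  template:
    metadata:
      annotations:
        checksum/config: '{{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}'
```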
13) Economics and performance
NodePools by assignment: RPS-API, WS-realtime, batch/ETL.
Spot/Preemptible nodes for batch/ETL with `PodPriority` and `PodDisruptionBudget` (see the sketch after this list).
Pre-compilation and warm-up (JIT/template caches) to reduce cold starts.
Resource budgets: requests/limits, VPA recommendations, connection limits to database/PSP, connection pooling.
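A sketch of pinning batch/ETL pods to a spot NodePool; the taint key, node label, and PriorityClass name are illustrative and assumed to be pre-created:

```yaml
# Low-priority, preemptible batch pod scheduled onto tainted spot nodes.
apiVersion: v1
kind: Pod
metadata:
  name: etl-job                      # illustrative
spec:
  priorityClassName: batch-low       # assumed low-value, preemptible PriorityClass
  nodeSelector:
    pool: spot-batch                 # label on the spot NodePool
  tolerations:
    - key: "spot"                    # taint applied to spot nodes
      operator: "Exists"
      effect: "NoSchedule"
  containers:
    - name: etl
      image: registry/etl@sha256:...
```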
14) Manifest templates (fragments)
Deployment (the canary traffic split itself is applied via Ingress annotations, as in section 4):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate: {maxSurge: 2, maxUnavailable: 1}
  selector:
    matchLabels: {app: payments-api}   # required; must match template labels
  template:
    metadata:
      labels: {app: payments-api, version: v2}
    spec:
      securityContext: {runAsNonRoot: true}
      containers:
        - name: app
          image: registry/payments@sha256:...
          ports: [{containerPort: 8080}]
          resources:
            requests: {cpu: "300m", memory: "512Mi"}
            limits: {cpu: "1", memory: "1Gi"}
          readinessProbe:
            httpGet: {path: /healthz, port: 8080}
            periodSeconds: 5
```
HPA on a custom metric (RPS per pod via Prometheus Adapter):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: {name: payments-api}
spec:
  scaleTargetRef: {apiVersion: apps/v1, kind: Deployment, name: payments-api}
  minReplicas: 6
  maxReplicas: 60
  metrics:
    - type: Pods
      pods:
        metric:
          name: rps_per_pod
        target:
          type: AverageValue
          averageValue: "120"
```
NetworkPolicy (ingress from the gateway namespace only; egress only where needed):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: {name: payments-restrict}
spec:
  podSelector: {matchLabels: {app: payments-api}}
  policyTypes: ["Ingress", "Egress"]
  ingress:
    - from: [{namespaceSelector: {matchLabels: {gw: ingress}}}]
  egress:
    - to: [{ipBlock: {cidr: 10.0.0.0/8}}]  # internal services
    - to: [{namespaceSelector: {matchLabels: {svc: psp-egress}}}]
```
15) Release checklist (prod-ready)
- Image signed, SBOM collected, vulnerabilities at acceptable level.
- Manifests pass policy-check (Kyverno/OPA), minimum privileges.
- Readiness/startup probes correct; `PDB` and `PodPriority` configured.
- Canary plan: 5%→10%→25%→50%→100% with SLO gates and auto-rollback.
- HPA/KEDA + Cluster Autoscaler; warm-pool nodes for the event.
- Secrets from Vault/SM; configs versioned; feature flags ready for graceful degradation.
- End-to-end tracing enabled; alerts on SLIs (deposit/bet/withdrawal).
- DR plan and cutover "button" rehearsed in staging; backups/PITR tested.
- Documentation: how to roll back, how to switch PSP/game provider, who to call at night.
16) Anti-patterns and typical traps
Readiness grace period too short → early 5xx during rollout.
A single DB connection pool without limits → connection avalanche.
Secrets in environment variables without rotation → leaks.
Mesh without limits/timeouts → hangs on degraded providers.
HPA on CPU only → WS/API do not scale out in time.
Summary
Deployment strategies in iGaming combine reliable container practices (secure images, admission policies), smart releases (canary/blue-green with SLO gates), the right autoscaling (HPA/KEDA plus warm nodes for peaks), operators for stateful workloads, and multi-region DR. Add GitOps, tracing through payments and game providers, network policies, and cost control through specialized NodePools, and your releases become predictable, fast, and safe for both the money and the players.