API and infrastructure monitoring tools
1) Principles: from goals to tools
SLO-first: choose and configure tools around product goals (login, deposit, bet), not the other way around.
Open standards: OpenTelemetry (traces/metrics/logs), Prometheus exposition format, Loki JSON logs.
Single context: `trace_id`/`span_id` in logs and metrics; links from dashboard → trace → log.
Cost-aware: plan metric cardinality, log TTLs, and trace sampling up front.
2) Metrics: collection, storage, visualization
Collection: Prometheus or agent mode (VictoriaMetrics Agent, Grafana Agent, OpenTelemetry Collector).
Storage (TSDB): Prometheus (single node), Thanos/Cortex/Mimir (horizontal scaling), VictoriaMetrics (CPU/RAM-efficient).
Visualization: Grafana as the "single pane of glass".
What to measure for APIs (RED) and infrastructure (USE):
- RED: `rate(requests)`, `error_ratio`, `latency p95/p99` by `route`, `method`, `provider`.
- USE: CPU/memory, file descriptors, connection pools, queue lag, GC pauses.
- k8s: kube-state-metrics, node-exporter, cAdvisor, ingress/gateway exporters.
- Databases/caches: postgres_exporter, mysql_exporter, redis_exporter, kafka_exporter, rabbitmq_exporter.
- Service mesh: Envoy metrics, Istio/Linkerd dashboards.
- PSP/external: custom exporters (webhook success, PSP success ratio, callback latency).
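To make the RED signals concrete, here is a minimal sketch that derives rate, error ratio, and p95 latency from raw request records. The function name `red_summary`, the record fields, and the choice to count only 5xx responses as errors are assumptions for illustration. It uses a nearest-rank percentile over raw samples, which is not how Prometheus estimates quantiles from histogram buckets.

```python
from collections import defaultdict


def red_summary(requests, window_seconds):
    """Compute rate, error_ratio and p95 latency per (route, method).

    `requests` is a list of dicts with hypothetical keys:
    route, method, status, latency_s.
    """
    groups = defaultdict(list)
    for req in requests:
        groups[(req["route"], req["method"])].append(req)

    summary = {}
    for key, items in groups.items():
        latencies = sorted(item["latency_s"] for item in items)
        # Assumption: only 5xx responses count as errors.
        errors = sum(1 for item in items if item["status"] >= 500)
        # Nearest-rank p95: integer ceiling of 0.95 * n, at least 1.
        rank = max(1, (95 * len(latencies) + 99) // 100)
        summary[key] = {
            "rate": len(items) / window_seconds,
            "error_ratio": errors / len(items),
            "p95_s": latencies[rank - 1],
        }
    return summary
```

In production these values would normally come from counters and histograms scraped by Prometheus; the sketch only shows what each RED number means.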
```promql
# Deposit success ratio (SLI)
sum(rate(ig_payments_requests_total{route="/payments/deposit",status=~"2.."}[5m]))
/
sum(rate(ig_payments_requests_total{route="/payments/deposit"}[5m]))

# p95 API latency
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, route))

# DB connection pool saturation
db_connections_in_use / db_connections_max
```
3) Logs: search, correlation, immutability
Stack: OpenSearch/Elasticsearch + Beats/Vector/Fluent Bit, or Grafana Loki (cheaper storage, logs as streams).
Format: JSON with `ts`, `level`, `service`, `env`, `trace_id`, `user_pid`, `route`, `status`, `latency_ms`.
Practices: PII masking, WORM buckets for audit, TTL/ILM policies, partitioning by `env`/`region`/`brand`.
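The log format above could be produced with the standard `logging` module roughly as follows. This is a sketch, not a prescribed implementation: the class name `JsonFormatter`, the extra `message` field, and the sample service name and values are assumptions.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with the fields listed above."""

    # Per-request fields, passed via `extra=` when logging.
    FIELDS = ("trace_id", "user_pid", "route", "status", "latency_ms")

    def __init__(self, service, env):
        super().__init__()
        self.service = service
        self.env = env

    def format(self, record):
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,
            "env": self.env,
            "message": record.getMessage(),  # assumption: not in the field list above
        }
        for field in self.FIELDS:
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        return json.dumps(entry)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter(service="payments-api", env="prod"))
    logger = logging.getLogger("payments")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info(
        "deposit accepted",
        extra={"trace_id": "4bf92f3577b34da6", "route": "/payments/deposit",
               "status": 200, "latency_ms": 182},
    )
```

Keeping `trace_id` as a top-level key is what lets a log backend jump from a trace to the matching log lines.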
4) Tracing: where the milliseconds go
Stack: OpenTelemetry SDK/Collector → Jaeger/Tempo/Honeycomb/New Relic Traces.
Sampling policy: 100% of errors, adaptive sampling for slow requests, 1-5% of successful requests.
iGaming tags: `provider`, `psp`, `risk_decision`, `bonus_id`, `market`, `ws_table_id`.
Quick debugging recipe: red bar on the SLO graph → trace on the affected route → "fat" span at the PSP/game provider → webhook log.
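The sampling policy above can be sketched as a tail-based decision made once a trace's outcome and duration are known. The function `keep_trace`, the 400 ms slow threshold, and the 5% default are assumptions for illustration; "adaptive" sampling of slow requests is simplified here to keeping every trace above the threshold.

```python
import hashlib


def keep_trace(trace_id, is_error, duration_ms,
               slow_threshold_ms=400, success_rate=0.05):
    """Decide whether to keep a finished trace."""
    if is_error:
        return True  # keep 100% of errors
    if duration_ms >= slow_threshold_ms:
        return True  # simplification: keep every slow request
    # Hash the trace ID so every service makes the same decision
    # for the same trace, then map it to a bucket in [0, 10000).
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < round(success_rate * 10_000)
```

In practice this kind of rule usually lives in the collector's sampling configuration rather than in application code; the sketch only shows the decision logic.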
5) APM platforms: "all-in-one"
Commercial solutions (Datadog, New Relic, Dynatrace, Grafana Cloud) cover APM, logs, traces, synthetics, and RUM.
Pros: fast rollout, out-of-the-box correlation. Cons: cost and vendor lock-in.
Hybrid: keep the core on OSS (Prometheus + Grafana + Tempo + Loki) and supplement synthetics/alerting with commercial modules for critical flows.
6) Synthetics and RUM: the outside view and the player's view
Synthetics: Checkly, Grafana Synthetic Monitoring, k6 Cloud, Uptrends, Pingdom, Catchpoint, ThousandEyes.
Scenarios: login → deposit (sandbox) → game launch → webhook check.
Geo: EU/LatAm/MEA/APAC, mobile networks, ASN mix.
RUM: web SDK (TTFB/LCP/CLS), mobile SDK; segmentation by country/network/device.
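The scenario chain above (login → deposit → game launch → webhook check) can be sketched as a tool-agnostic runner. The function `run_scenario` and the step callables are hypothetical; Checkly, k6, and the other tools listed have their own scripting models.

```python
import time


def run_scenario(steps):
    """Run named steps in order and stop at the first failure.

    `steps` is a list of (name, callable) pairs; a step fails
    when its callable raises an exception.
    """
    results = []
    for name, action in steps:
        started = time.monotonic()
        try:
            action()
            ok, error = True, None
        except Exception as exc:
            ok, error = False, str(exc)
        results.append({
            "step": name,
            "ok": ok,
            "error": error,
            "duration_s": time.monotonic() - started,
        })
        if not ok:
            break  # later steps depend on earlier ones
    return results
```

Each result row maps naturally onto a per-step success metric and a per-step latency metric, labeled by region.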
7) Kubernetes: monitoring layers
Control plane: etcd, API server (`apiserver_request_total`, latency), scheduler/controller-manager.
Data plane: kubelet, CNI, ingress/gateway; `PodDisruptionBudget` and evictions.
Autoscaling: HPA/VPA/Cluster Autoscaler metrics and events; warm pools.
Network policies: deny events, DNS latency.
8) Databases, queues, caches
Postgres/MySQL: replication lag, deadlocks, bloat, WAL, checkpoint duration, timeouts.
Kafka/RabbitMQ: consumer lag, rebalances, queue depth, redeliveries.
Redis: evictions, blocked clients, latency percentiles, replica lag.
PITR/backups: backup operator jobs + a "time to restore" dashboard.
9) Network, CDN, WAF, game providers, and PSPs
CDN/edge: hit ratio, TTFB by region, shield hit, cache "miss storms".
WAF/bot manager: share of challenges/blocks, ASNs/countries, false positive rate (FPR) on login/deposit.
Game providers: table/slot launch time, failures/timeouts by studio.
PSP: success ratio/latency by method/country/BIN, 3DS/AVS error codes, webhook success and delay.
10) Alerting and on-call
Routing: Alertmanager → PagerDuty/Opsgenie/Slack.
Rules: symptom-based (SLO) + cause-based (resources).
Noise reduction: grouping, suppressing chained alerts, silence windows during releases.
SLO gates in CD: auto-pause/rollback on violations (Argo Rollouts/Flagger AnalysisRun).
Example alerts (simplified):
- `login_success_ratio < 99.9% for 10m`
- `p95 /payments/deposit > 0.4s for 10m`
- `db_connections_saturation > 0.85 for 5m`
- `kafka_consumer_lag > 30s`
- `cdn_hit_ratio drop > 15% in 10m (per region)`
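The `for 10m` / `for 5m` clauses mean the condition must hold continuously before the alert fires. A simplified sketch of that behavior follows; the function `firing_at` and the sample series are assumptions, and real rule engines evaluate at fixed intervals with more nuance.

```python
def firing_at(samples, threshold, for_seconds,
              breached=lambda value, limit: value > limit):
    """Return the timestamp at which the alert fires, or None.

    `samples` is a time-ordered list of (timestamp_s, value).
    The alert fires once the condition has held without
    interruption for at least `for_seconds`.
    """
    pending_since = None
    for ts, value in samples:
        if breached(value, threshold):
            if pending_since is None:
                pending_since = ts  # condition just became true
            if ts - pending_since >= for_seconds:
                return ts
        else:
            pending_since = None  # any recovery resets the timer
    return None


# db_connections_saturation > 0.85 for 5m, sampled every 60 s.
saturation = [0.80, 0.90, 0.90, 0.80, 0.90, 0.90, 0.90, 0.90, 0.90, 0.90]
series = [(i * 60, v) for i, v in enumerate(saturation)]
fired = firing_at(series, threshold=0.85, for_seconds=300)
```

The short breach early in the series never fires because the recovery resets the timer; that reset is what keeps brief spikes from paging anyone.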
11) Dashboards that actually help
Deposit flow: funnel, p95/p99, errors by PSP/BIN/country, webhook delay.
Live games/WS: connections, RTT, resends/reconnects, provider errors.
API health: RED by route, saturation, top slow endpoints linked to traces.
DR panel: replication lag, WAL shipping, synthetic login/deposit from the DR region.
Security: WAF, bot score, 401/403 anomalies, signed webhooks.
12) Managing telemetry cost
Metric cardinality: never put `user_id` in labels; cap the number of distinct `route` and `provider` values.
Downsampling and retention classes (hot 7-14 days, warm 30-90 days, cold archive).
Logs: during event bursts, enable sampling/dedup; store stack traces separately.
Traces: dynamic sampling on "expensive" paths (payments/withdrawals).
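The cardinality advice is easiest to see with arithmetic: the worst-case number of time series is the product of the distinct values of every label. The helper names and all counts below are hypothetical, chosen only to show the order of magnitude.

```python
from math import prod


def counter_series(label_values):
    """Worst-case series count for a counter: product of label cardinalities."""
    return prod(label_values.values())


def histogram_series(label_values, bucket_count):
    """Worst-case series for a Prometheus histogram.

    Each label combination produces one series per bucket
    (bucket_count includes +Inf) plus the _sum and _count series.
    """
    return counter_series(label_values) * (bucket_count + 2)


# Hypothetical label cardinalities.
labels = {"route": 50, "method": 4, "provider": 20}
base = counter_series(labels)                                  # 4,000
hist = histogram_series(labels, bucket_count=12)               # 56,000
with_user = counter_series({**labels, "user_id": 100_000})     # 400,000,000
```

Adding a single unbounded label multiplies everything else, which is why `user_id` belongs in logs and traces rather than in metric labels.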
13) Security and privacy in monitoring
mTLS from agents to collectors; encryption at rest.
Pseudonymize users as `user_pid`; ban emails, phone numbers, and documents in logs.
RBAC/MFA, WORM for audit; DPAs with third-party monitoring providers.
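One possible way to derive a `user_pid` and keep emails and phone numbers out of log messages is sketched below. The keyed-hash scheme, the 16-character truncation, and both regular expressions are assumptions; the patterns are deliberately coarse, and the phone pattern in particular can also match other long digit sequences.

```python
import hashlib
import hmac
import re

# Coarse patterns for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def user_pid(user_id, secret_key):
    """Derive a stable pseudonym from a user ID with a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def redact(message):
    """Replace emails and phone-like sequences before the message is logged."""
    message = EMAIL_RE.sub("[email]", message)
    return PHONE_RE.sub("[phone]", message)
```

A keyed hash means the pseudonym stays stable for correlation across logs while the key holder alone can recompute it from a known user ID.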
14) Integration with CI/CD and automatic rollback
Expose SLIs as Prometheus metrics for CD analysis.
Release labels (`version`, `rollout_step`) on metrics/logs/traces.
Automated canary gates: the deploy proceeds only while SLOs are green.
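The canary gate logic can be sketched independently of any CD tool. The function `canary_gate`, the objective format, and the SLI names are assumptions that reuse the alert thresholds from the alerting section; Argo Rollouts and Flagger express the same idea in their own configuration.

```python
def canary_gate(slis, objectives):
    """Return ("promote" | "pause", violations) for one canary step.

    `objectives` maps an SLI name to ("min" | "max", limit).
    Missing data counts as a violation, so the rollout never
    proceeds blind.
    """
    violations = []
    for name, (kind, limit) in objectives.items():
        value = slis.get(name)
        if value is None:
            violations.append(f"{name}: no data")
        elif kind == "min" and value < limit:
            violations.append(f"{name}: {value} < {limit}")
        elif kind == "max" and value > limit:
            violations.append(f"{name}: {value} > {limit}")
    return ("promote" if not violations else "pause", violations)


# Hypothetical objectives for a payments canary.
OBJECTIVES = {
    "login_success_ratio": ("min", 0.999),
    "deposit_p95_s": ("max", 0.4),
}
```

Returning the list of violations alongside the decision gives the rollout log a reason for every pause.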
15) Quick-start stack (reference)
Collection/transport: OTel Collector + Prometheus/VM Agent + Fluent Bit.
Storage: VictoriaMetrics/Thanos (metrics), Loki/OpenSearch (logs), Tempo/Jaeger (traces).
Visualization: Grafana + ready-made dashboards for k8s/Envoy/Postgres.
Synthetics & RUM: Checkly/k6 + Grafana RUM (or a commercial equivalent).
Alerting: Alertmanager → PagerDuty/Slack; runbooks linked from alerts.
16) Rollout checklist (prod-ready)
- SLOs/SLIs defined for login/deposit/bet/withdrawal.
- RED/USE metrics + business SLIs; a single label taxonomy.
- JSON logs with `trace_id`, PII masking, WORM for audit.
- OpenTelemetry end-to-end; errors sampled at 100%.
- Synthetics from key regions + RUM in production.
- Dashboards: "Deposit flow", "WS", "API health", "DR".
- Alerting: SLO symptoms + resource causes; noise reduction.
- SLO gates wired into CD; automatic rollback.
- Cost plan: retention/sampling/cardinality.
- DPA/security: mTLS, RBAC, log privacy.
Conclusion
Strong monitoring is not a collection of pretty graphs but a connected system: RED/USE metrics, logs with `trace_id`, OpenTelemetry tracing, synthetics and RUM, dashboards, alerting, and SLO gates built into CI/CD. Build the stack around open standards, keep telemetry cost under control, and standardize the label taxonomy; then API and infrastructure problems become visible early and get fixed before players notice them.
