Casino rating based on aggregate expert assessment
1) Why do you need an "expert" rating
User votes are useful, but vulnerable to vote manipulation and emotion. Experts act as a filter of professional criteria: licensing and compliance, payments, game integrity, live-stream quality, support, RG tools, UX, and reputation. An aggregate assessment makes it possible to:
- Reduce heterogeneous opinions to a single numerical metric.
- Account for each expert's competence in a specific criterion.
- Ensure repeatability and auditability of results.
2) Expert panel: how to form it
Selection criteria: ≥ 3 years of experience in the domain (regulation, payments, live technologies, support, RG/compliance) and no conflicts of interest.
Quotas: 7–12 experts covering the different domains (law/compliance, payments, live-ops, UX/A11y, data).
Declarations: NDA plus an affiliation declaration; experts with a conflict of interest rate everything except the related brands.
Calibration: jointly score 3–5 reference cases to align the scale.
3) Rubricator and weights (base model example)
Sum of weights = 1.00.
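As an illustration of the base model, a minimal configuration sketch in Python: the criterion names follow section 1, the weight values are hypothetical, and the only constraint taken from the text is that they sum to 1.00.

```python
# Illustrative rubricator: criterion -> weight. The names mirror section 1;
# the specific values are hypothetical placeholders, not the official model.
RUBRIC_WEIGHTS = {
    "licensing_compliance": 0.20,
    "payments":             0.20,
    "game_integrity":       0.15,
    "live_quality":         0.15,
    "support":              0.10,
    "rg_tools":             0.10,
    "ux_reputation":        0.10,
}

# The one hard requirement from the text: the weights must sum to 1.00.
assert abs(sum(RUBRIC_WEIGHTS.values()) - 1.00) < 1e-9
```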
4) Assessment scale and expert form
Each expert (e) assigns a score (r_{e,k} \in [0; 100]) for each criterion (k) according to a public checklist (subcriteria with prompts and thresholds).
Examples of prompts (one is sketched in code after this list):
- Payments: withdrawal p95 ≤ 24 h = 90–100; 24–72 h = 70–89; > 7 days = 0–30.
- Live: e2e latency (p95) ≤ 2.5 s = 90–100; 2.6–4.0 s = 70–89; > 6.0 s = 0–30.
- RG: limits/timeout/self-exclusion reachable in 1–2 taps = 90–100; no self-exclusion = ≤ 40.
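As a sketch, one way to encode the payments prompt as a scoring function. The band boundaries come from the checklist above; the linear interpolation inside bands and the bridge over the 72 h to 7 day gap are our assumptions, not part of the methodology.

```python
def payout_score(p95_hours: float) -> float:
    """Map the p95 withdrawal time to a score band from the public checklist.

    Bands follow the prompt in the text (<= 24 h -> 90-100, 24-72 h -> 70-89,
    > 7 days -> 0-30); interpolation inside bands and the handling of the
    72 h - 7 day gap are illustrative assumptions.
    """
    if p95_hours <= 24:
        return 100 - 10 * (p95_hours / 24)         # 90..100
    if p95_hours <= 72:
        return 89 - 19 * ((p95_hours - 24) / 48)   # 70..89
    if p95_hours <= 168:                           # assumed bridge band
        return 69 - 39 * ((p95_hours - 72) / 96)   # 30..69
    return max(0.0, 30 - (p95_hours - 168) / 24)   # 0..30
```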
5) Normalization and handling "generous/strict" experts
1. Standardization per expert (z-scores):
[
z_{e,k} = \frac{r_{e,k} - \mu_e}{\sigma_e + \epsilon}
]
where (\mu_e, \sigma_e) are the mean and standard deviation of all scores given by that expert (across all casinos/criteria).
2. Mapping to [0; 1]:
[
s_{e,k} = \Phi(z_{e,k})
]
where (\Phi) is the standard normal CDF.
3. Outlier control: winsorize at the 5th–95th percentiles before standardization. A sketch of the full pipeline follows.
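A minimal sketch of the three steps (winsorize, z-score per expert, map through Φ), assuming NumPy/SciPy; the matrix layout is an illustrative choice, not prescribed by the text.

```python
import numpy as np
from scipy.stats import norm

def normalize_expert_scores(raw: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Per-expert normalization from section 5.

    `raw` is an (experts x items) matrix of 0-100 scores, where an "item" is
    one casino-criterion cell. Steps: winsorize each expert's scores at the
    5th/95th percentiles, z-score per expert, then map to [0, 1] via the
    standard normal CDF.
    """
    out = np.empty_like(raw, dtype=float)
    for e, row in enumerate(raw):
        lo, hi = np.percentile(row, [5, 95])
        w = np.clip(row, lo, hi)                 # winsorize outliers
        z = (w - w.mean()) / (w.std() + eps)     # standardize per expert
        out[e] = norm.cdf(z)                     # Phi: z -> [0, 1]
    return out
```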
6) Weighting expert competence and reliability
The final weight of expert (e) is a mixture of:
- Competence in criterion (k): (c_{e,k} \in [0; 1]) (declared and confirmed by cases/portfolios).
- Agreement reliability: e.g., the expert's contribution via Krippendorff's α / Cohen's κ; higher agreement → higher weight.
- Activity and completeness: a penalty if more than 10% of marks are missing.
[
W_{e,k} = \lambda_1\, c_{e,k} + \lambda_2\, \text{Reliab}_e + \lambda_3\, \text{Coverage}_e
]
(typically (\lambda_1 = 0.6,\ \lambda_2 = 0.3,\ \lambda_3 = 0.1)); then normalize so that (\sum_e W_{e,k} = 1). A sketch follows.
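A sketch of the weight mixture, assuming the reliability and coverage terms arrive as precomputed per-expert values in [0, 1]; how they are derived from α/κ is outside this snippet.

```python
import numpy as np

def expert_weights(competence: np.ndarray,
                   reliability: np.ndarray,
                   coverage: np.ndarray,
                   lambdas=(0.6, 0.3, 0.1)) -> np.ndarray:
    """Mixture from section 6: W_{e,k} = l1*c_{e,k} + l2*Reliab_e + l3*Coverage_e.

    competence:  (experts x criteria) matrix in [0, 1]
    reliability: per-expert agreement score in [0, 1]
    coverage:    per-expert completeness in [0, 1]; experts missing > 10% of
                 marks should arrive here already penalized
    Returns weights normalized so that sum_e W_{e,k} = 1 for each criterion.
    """
    l1, l2, l3 = lambdas
    W = l1 * competence + l2 * reliability[:, None] + l3 * coverage[:, None]
    return W / W.sum(axis=0, keepdims=True)
```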
7) Aggregation by criterion and total casino score
1. Criterion scores:
[
S_k = \sum_{e} W_{e,k}\, s_{e,k}
]
2. Final casino score:
[
\text{Score} = \sum_{k} \omega_k\, S_k
]
where (\omega_k) are the weights from the rubricator.
3. Confidence interval (bootstrap over experts): 10,000 resamples → a p5–p95 interval for the Score. A sketch follows.
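A sketch of the aggregation and the expert bootstrap for one casino; resampling experts with replacement and renormalizing their weights is one reasonable reading of "bootstrap over experts", not the only one.

```python
import numpy as np

def casino_score(s: np.ndarray, W: np.ndarray, omega: np.ndarray,
                 n_boot: int = 10_000, seed: int = 0):
    """Section 7: criterion scores, total score, and a bootstrap interval.

    s:     (experts x criteria) normalized scores s_{e,k} for one casino
    W:     (experts x criteria) expert weights, columns summing to 1
    omega: (criteria,) rubricator weights summing to 1
    """
    S_k = (W * s).sum(axis=0)            # S_k = sum_e W_{e,k} s_{e,k}
    score = float(omega @ S_k)           # Score = sum_k omega_k S_k
    rng = np.random.default_rng(seed)
    n_e = s.shape[0]
    boots = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_e, n_e)  # resample experts with replacement
        Wb = W[idx] / W[idx].sum(axis=0, keepdims=True)
        boots.append(omega @ (Wb * s[idx]).sum(axis=0))
    p5, p95 = np.percentile(boots, [5, 95])
    return score, (float(p5), float(p95))
```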
8) Ranking: robust practices
Weighted sum (default). Simple, transparent.
Borda rule (for a pure ranking). Total points by expert-assigned positions; robust to outlier raw scores.
Bayesian smoothed estimate (sketched after this list):
[
\hat{\theta}_i = \frac{\sum_e w_e\, r_{e,i} + m\,\mu_0}{\sum_e w_e + m}
]
where (m) is the prior strength and (\mu_0) the global mean. Useful when casinos have different numbers of ratings.
Paired comparisons (BTL/Plackett–Luce). When experts rank rather than score.
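A sketch of the Bayesian smoothing with a toy comparison of a thinly rated versus a well-rated casino; the values of m and μ₀ are illustrative.

```python
import numpy as np

def shrunk_score(ratings: np.ndarray, weights: np.ndarray,
                 mu0: float, m: float = 5.0) -> float:
    """Bayesian smoothed estimate for one casino:
    theta_i = (sum_e w_e * r_{e,i} + m * mu0) / (sum_e w_e + m).
    mu0 is the global mean, m the prior strength (both illustrative here)."""
    return float((weights @ ratings + m * mu0) / (weights.sum() + m))

# A thinly rated casino is pulled toward the global mean harder than a
# well-rated one with the same raw average of 80 (global mean mu0 = 60):
few  = shrunk_score(np.array([80.0, 80.0]), np.ones(2), mu0=60.0)  # ~65.7
many = shrunk_score(np.array([80.0] * 6), np.ones(6), mu0=60.0)    # ~70.9
```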
9) Mini-calculation example (3 casinos × 3 criteria × 4 experts)
Suppose that after normalization and competence weighting we obtain the criterion scores (S_k); a hypothetical worked version is sketched below.
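A hypothetical stand-in for the worked numbers, assuming three casinos, three criteria, and already-aggregated (S_k); every figure here is invented for illustration.

```python
import numpy as np

# Hypothetical criterion scores S_k (rows: casinos A, B, C; columns: the
# three criteria) and rubricator weights. All numbers are illustrative.
S = np.array([[0.82, 0.74, 0.90],   # casino A
              [0.76, 0.88, 0.71],   # casino B
              [0.69, 0.65, 0.80]])  # casino C
omega = np.array([0.5, 0.3, 0.2])   # criterion weights, sum = 1

scores = S @ omega                   # Score_i = sum_k omega_k * S_{i,k}
for name, sc in zip("ABC", scores):
    print(f"casino {name}: {sc:.3f}")
# A: 0.5*0.82 + 0.3*0.74 + 0.2*0.90 = 0.812 -> ranked first
```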
10) Expert reliability and consistency
Krippendorff's α (universal for interval scales): ≥ 0.8 is excellent; 0.67–0.8 acceptable; below that, revise the rubric and recalibrate.
Cohen's/Fleiss's κ: if the scale is discrete.
Rater drift: compare the early and late halves of the questionnaires; on drift, recalibrate and reduce the expert's weight. An agreement-check sketch follows.
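A sketch of the agreement check, assuming the third-party krippendorff package (pip install krippendorff); the ratings matrix is invented.

```python
# Rows are experts, columns are rated items; np.nan marks a missing rating.
import numpy as np
import krippendorff

ratings = np.array([[80, 75, 90, np.nan],
                    [78, 70, 88, 60],
                    [85, 72, 91, 55]], dtype=float)

alpha = krippendorff.alpha(reliability_data=ratings,
                           level_of_measurement="interval")
print(f"alpha = {alpha:.2f}")  # >= 0.8 excellent; 0.67-0.8 acceptable
```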
11) Anti-manipulation measures
Blind assessment: experts do not see each other's scores or the "customer's" branding.
Randomization: shuffle the order of casino cards.
Conflict control: automatically exclude an expert from related brands.
Anomalies: Grubbs/ESD outlier test per criterion; sharp discrepancies → manual verification (see the sketch after this list).
Edit history: any change after the fact is recorded in the changelog with a reason.
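A sketch of the per-criterion outlier screen as a one-pass two-sided Grubbs test; repeating it after removing a flagged point gives a simple ESD-style procedure.

```python
import numpy as np
from scipy import stats

def grubbs_outlier(x: np.ndarray, alpha: float = 0.05):
    """Two-sided Grubbs test for one criterion's scores (section 11).

    Returns the index of the most extreme rating if it is a significant
    outlier at level alpha, else None. Flagged scores go to manual
    verification, not automatic removal.
    """
    n = len(x)
    if n < 3:
        return None
    z = np.abs(x - x.mean()) / x.std(ddof=1)
    G, i = z.max(), int(z.argmax())
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)   # critical t value
    G_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
    return i if G > G_crit else None
```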
12) Publication transparency
Methodology: public weights, formulas, update date, and panel composition (no personal data: roles/experience/domains only).
Casino passports: expanded cards with sources, excerpts from the rules, RG/limit screenshots, live quality metrics.
Error margins: publish confidence intervals and flag statistical ties.
Operator appeals: response SLA and a list of acceptable documents (license, regulator letters, audit reports).
13) Updates and the rating lifecycle
Frequency: full recalculation monthly; unscheduled recalculation on license changes, regulator fines, or mass payment/security incidents.
Versioning: vYYYY.MM with a public diff (what changed and why).
Deactivation: a casino is removed from publication while its license is suspended, until the situation is clarified.
14) Model extensions (when scaling up)
Regional sub-ratings: separate weights/norms for Ontario, the EU, LatAm, etc.
Multi-criteria decision analysis (MCDA): TOPSIS/MAUT as an alternative to the simple weighted sum.
Hybrid with RUM data: automatic live-quality metrics (e2e latency/startup/rebuffering) added as an "expert sensor" with its own weight.
Explainability: Shapley decomposition of each criterion's contribution to the final score.
15) Frequent mistakes and how to avoid them
Mixing jurisdictions on one scale. Make regional versions.
Opaque weights. Publish and justify them; change them only through the changelog.
Ignoring the spread. Publish confidence intervals; do not hide ties.
Skew toward one domain. Balance the panel and use competence weights.
One expert "drags" the assessment. Cap the contribution of a single rating (for example, ≤ 25% per criterion); see the sketch after this list.
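A sketch of the contribution cap, clipping and renormalizing weights within a criterion until no expert exceeds the threshold; the 25% default is the example figure from the text.

```python
import numpy as np

def cap_weights(W_k: np.ndarray, cap: float = 0.25) -> np.ndarray:
    """Cap any single expert's weight within one criterion (section 15)."""
    w = W_k / W_k.sum()
    for _ in range(len(w)):            # converges in <= n passes
        over = w > cap
        free = ~over
        if not over.any() or not free.any():
            break
        excess = (w[over] - cap).sum() # weight above the cap
        w[over] = cap
        # redistribute the excess proportionally among uncapped experts
        w[free] += excess * w[free] / w[free].sum()
    return w
```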
16) Checklists
For organizers
- Panel of 7–12 experts, roles/domains covered
- Rubricator and weights published
- Calibration on reference cases; α ≥ 0.67
- Normalization (z-scores/MAD), winsorizing, outlier screening
- Competence weights (W_{e,k}) and caps on individual contribution
- Bootstrap and confidence intervals
- Changelog, appeals, casino passports
For readers
- Update date and rating version
- Methodology and weights available
- Error margins and sources visible
- Verifying legality in your country is mandatory
17) Public casino card template (recommended)
Final score + interval (p5–p95)
Strengths: 2–3 bullets (by criterion)
Risks/limitations: 2–3 bullets
Evidence base: license (number, regulator), RG tools, payments (p95 withdrawal time), live metrics
Changes in version vYYYY.MM: what improved/degraded
An aggregate expert review is a procedure, not "editorial taste." A clear panel, transparent weights, normalization, robust aggregation methods, and published error margins turn subjective opinions into a reliable, repeatable rating. Such a rating helps players choose safely and consciously, and helps operators understand what to improve in order to raise their score honestly.