How AI predicts lead conversion
A lead conversion forecast answers two questions: who is most likely to convert, and what to do with that forecast (bid, priority, processing route). The key is not "an algorithm for its own sake" but clean events, correct attribution, and operational rules for how you use the score: in media buying, anti-fraud, lead scoring, or CRM.
1) Data and events (minimum)
Target (label): binary 'y ∈ {0,1}', whether the target conversion occurred within horizon T (for example, 'FTD in 14 days', 'purchase in 7 days', 'demo→paid in 30 days').
Raw sources:
- Marketing: UTM/channel/creative/placement, click/impression time.
- Behavior: page/screen views, depth, speed, funnel events.
- Registration/form: form fields, KYC/verification (if applicable), lags between steps.
- Payments/product: statuses, amounts, payment methods (without PII in URL).
- Technical: device/OS/browser, network/IP/ASN, latency, errors.
Time rules: all timestamps in UTC; for training, use only features from the past relative to the labeled event (no leakage).
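The label and time rules above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names, the 14-day horizon, and the event dictionaries with a 'ts' key are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

HORIZON = timedelta(days=14)  # assumed horizon T, e.g. "FTD in 14 days"


def make_label(lead_time, conversion_time, now):
    """Return 1 or 0 for a matured lead, or None while the horizon is still open.

    All arguments are timezone-aware UTC datetimes; conversion_time may be None.
    """
    if now < lead_time + HORIZON:
        return None  # outcome not final yet, so the lead is not used for training
    if conversion_time is not None and lead_time <= conversion_time <= lead_time + HORIZON:
        return 1
    return 0


def features_before(events, cutoff):
    """Keep only events that happened strictly before the scoring moment (no leakage)."""
    return [e for e in events if e["ts"] < cutoff]
```

Leads whose horizon has not elapsed are skipped on purpose: counting them as negatives would understate the conversion rate.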
2) Features (what really helps)
- Pre-conversion RFM surrogates: Recency (time since click/registration), Frequency (events/sessions), Monetary proxy (depth or value of micro-events).
- Channel/creative: 'source/medium/campaign/content/term', 'placement', 'creative_id'.
- GEO and locale: country/currency/language (categorical, with target encoding).
- Device/technical: 'device/os/browser', page speed, loading errors, form visibility.
- Funnel lags: 'time_to_reg', 'time_to_verify', 'time_to_payment_init'.
- Lead quality: completeness of the form, geo↔payment matches, behavioral anomalies.
- Anti-fraud signals: IP/ASN scoring, velocity, cookies/server-side markers.
- Season/time: day of the week, hour, campaign/promotional periods.
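As an illustration of the target encoding mentioned for GEO/locale features, here is a minimal sketch of a smoothed encoder. The function names and the 'smoothing' parameter are assumptions for the example; the table should be fitted on past (training) data only, otherwise the encoding itself leaks the label.

```python
from collections import defaultdict


def fit_target_encoding(categories, labels, smoothing=20.0):
    """Map each category to a smoothed conversion rate.

    encoded = (sum_y + smoothing * global_rate) / (n + smoothing)
    Rare categories are pulled toward the global rate.
    """
    global_rate = sum(labels) / len(labels)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, y in zip(categories, labels):
        sums[c] += y
        counts[c] += 1
    table = {
        c: (sums[c] + smoothing * global_rate) / (counts[c] + smoothing)
        for c in counts
    }
    return table, global_rate


def encode(category, table, global_rate):
    """Unseen categories fall back to the global conversion rate."""
    return table.get(category, global_rate)
```

The smoothing term is a design choice: a larger value trusts small segments less.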
3) Algorithms and when to choose them
Logistic regression is fast, interpretable, and excellent as a baseline and for production rules (monotonic constraints).
Gradient boosting (XGBoost/LightGBM/CatBoost) is the de facto standard for tabular data: it copes well with categorical features and class imbalance.
Neural networks/TabNet - justified with very large and diverse data (a combination of tabular + text/images).
Uplift models - when you want to predict the increase in conversion caused by a treatment (campaign/bonus) rather than the conversion itself.
Class imbalance: use 'class_weight' or focal loss, and make AUC-PR the primary metric; do not oversample the minority class unnecessarily.
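To make the class-weight idea concrete, the sketch below computes the common "balanced" weights (n_samples / (n_classes × class_count), the same heuristic scikit-learn uses for class_weight='balanced') and a weighted log loss. The function names are hypothetical, and the code assumes both classes are present.

```python
import math


def balanced_class_weights(labels):
    """Weights inversely proportional to class frequency (binary labels 0/1)."""
    n = len(labels)
    pos = sum(labels)
    neg = n - pos
    return {0: n / (2 * neg), 1: n / (2 * pos)}


def weighted_log_loss(labels, probs, weights, eps=1e-12):
    """Log loss where each example is weighted by the weight of its class."""
    total = 0.0
    weight_sum = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # guard against log(0)
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum
```

Reweighting shifts the predicted probabilities away from the true base rate, so a model trained this way needs calibration before its output is used as 'p(x)'.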
4) Validation: by time only
Split train/valid/test by time (rolling/forward split); otherwise the model "peeks into the future." For online evaluation, use A/B or geo-holdout: part of the traffic is handled by the model's rules, part by the baseline.
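A forward (expanding-window) split can be sketched as below. The function name and the 'min_train_frac' parameter are assumptions for the example; the only property that matters is that no validation row precedes a training row.

```python
def forward_splits(timestamps, n_folds=3, min_train_frac=0.5):
    """Return (train_idx, valid_idx) pairs with an expanding training window.

    Rows are ordered by timestamp; each fold validates on the block that
    immediately follows its training window.
    """
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    n = len(order)
    start = int(n * min_train_frac)
    fold_size = (n - start) // n_folds
    if start == 0 or fold_size == 0:
        raise ValueError("not enough rows for the requested folds")
    splits = []
    for k in range(n_folds):
        train_end = start + k * fold_size
        # the last fold absorbs any remainder rows
        valid_end = n if k == n_folds - 1 else train_end + fold_size
        splits.append((order[:train_end], order[train_end:valid_end]))
    return splits
```

If the label has a horizon T, it is usually worth leaving a gap of at least T between the end of training and the start of validation, so that training labels are fully matured.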
5) Quality metrics (and what they are for)
AUC-ROC - overall ranking potential.
AUC-PR - critical for imbalance.
LogLoss/Brier - penalize poor probability calibration.
Calibration (reliability curve, ECE) - a probability of 0.3 should mean "conversion in ~30% of cases."
Lift/KS/top-bucket hit rate - the gain in the top N% of ranked leads (shows business value).
Decision metrics: Precision@k, Recall@k, cost-aware gain (see below).
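The ranking and decision metrics above are simple to compute by hand. The sketch below is illustrative: the function names are hypothetical, and 'frac' is the share of top-ranked leads (for example 0.2 for the top 20%).

```python
def top_k_labels(labels, scores, frac):
    """Labels of the top `frac` share of leads, ranked by score descending."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    k = max(1, round(len(ranked) * frac))
    return [y for _, y in ranked[:k]]


def precision_at_k(labels, scores, frac):
    top = top_k_labels(labels, scores, frac)
    return sum(top) / len(top)


def recall_at_k(labels, scores, frac):
    return sum(top_k_labels(labels, scores, frac)) / sum(labels)


def lift_at_k(labels, scores, frac):
    """How many times the top bucket converts better than the average lead."""
    base_rate = sum(labels) / len(labels)
    return precision_at_k(labels, scores, frac) / base_rate


def brier_score(labels, probs):
    """Mean squared error between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)
```

A lift of 1.0 in the top bucket means the ranking adds nothing over random selection.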
6) Probability calibration
Most boosting models over- or underestimate probabilities. Fit Platt scaling or isotonic regression on the validation set. Check calibration by segment (channel/geo/device); shifts are common.
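One way to check calibration by segment is to compute ECE per segment. The sketch below uses equal-width probability bins; the function names and the choice of 10 bins are assumptions for the example.

```python
def expected_calibration_error(labels, probs, n_bins=10):
    """Weighted average gap between mean predicted probability and observed rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[idx].append((y, p))
    n = len(labels)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_p = sum(p for _, p in b) / len(b)
        rate = sum(y for y, _ in b) / len(b)
        ece += len(b) / n * abs(avg_p - rate)
    return ece


def ece_by_segment(labels, probs, segments, n_bins=10):
    """ECE computed separately for each segment value (channel, geo, device, ...)."""
    groups = {}
    for y, p, s in zip(labels, probs, segments):
        ys, ps = groups.setdefault(s, ([], []))
        ys.append(y)
        ps.append(p)
    return {s: expected_calibration_error(ys, ps, n_bins) for s, (ys, ps) in groups.items()}
```

A model can look well calibrated overall while one segment is far off, which is why the per-segment view is worth keeping in monitoring.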
7) How to turn the score into money (decisioning)
7.1. Value function
Let 'p(x)' be the conversion probability, 'V' the expected value (NGR/LTV) of a conversion, and 'C' the contact/bid/handling cost.
The expected margin is 'EM(x) = p(x)·V − C'.
Show ads/raise the bid/send the lead to the priority queue only if 'EM(x) > 0'. The break-even threshold is 'p = C/V'.
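The value function translates directly into code. The sketch below is a minimal illustration with hypothetical function names; 'value' and 'cost' must be in the same currency, and 'p' is assumed to be a calibrated probability.

```python
def expected_margin(p, value, cost):
    """EM(x) = p(x) * V - C."""
    return p * value - cost


def break_even_probability(value, cost):
    """Probability at which the expected margin is exactly zero: p = C / V."""
    return cost / value


def should_act(p, value, cost):
    """Act (bid, call, prioritize) only when the expected margin is positive."""
    return expected_margin(p, value, cost) > 0
```

For example, with V = 200 and C = 10 the break-even probability is 0.05: a lead with p = 0.08 has an expected margin of about 6 and is worth acting on, while a lead with p = 0.03 has an expected margin of about −4 and is not.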
7.2. Three levels of application
Media buying: 'bid ∝ p(x) × E[V]' at the specified target Payback/ROAS.
Lead scoring (call center/CRM): prioritize queues by 'p(x)' and 'EM(x)'; "cheap" leads with high 'p' → auto-processing, "expensive" leads with low 'p' → postpone/exclude.
Personalization: triggers/bonuses only where the expected uplift is positive (uplift, not "stimulating those who would have bought anyway").
8) Economic evaluation of the model
Simulate a profit curve: sort leads by 'p(x)' in descending order, move the threshold from top to bottom, and compute 'profit = Σ (p·V − C)' over the top k% of the sample. Take the threshold at the maximum of the curve. Account for contact costs (manager/call), frequency caps, and compliance constraints (age/GEO/consent).
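The profit-curve simulation can be sketched as follows. This is an illustration under simplifying assumptions: a single constant 'value' and 'cost' for all leads, calibrated probabilities, and hypothetical function names; caps and compliance constraints are not modeled.

```python
def profit_curve(probs, value, cost):
    """Cumulative expected profit when acting on the top-k leads, for k = 1..n."""
    ranked = sorted(probs, reverse=True)
    curve = []
    total = 0.0
    for p in ranked:
        total += p * value - cost
        curve.append(total)
    return ranked, curve


def best_cutoff(probs, value, cost):
    """Return (number of leads to act on, probability threshold, expected profit)."""
    if not probs:
        return 0, None, 0.0
    ranked, curve = profit_curve(probs, value, cost)
    k = max(range(len(curve)), key=curve.__getitem__)
    if curve[k] <= 0:
        return 0, None, 0.0  # no cutoff is profitable
    return k + 1, ranked[k], curve[k]
```

With constant V and C the maximum of the curve falls where 'p' crosses C/V; the simulation becomes more useful once value or cost differ by lead or capacity is limited.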
9) Dealing with leakage and bias
Leakage: exclude features that occur after the scoring point or that "give away" the outcome (for example, the fact of KYC completion, if the goal is to pass KYC).
Channel bias: different GEO/sources → different baseline conversion rates. Use stratification/cross-validation by segment + calibration.
Data drift: monitor PSI/category shares, weekly AUC/LogLoss, and out-of-range feature values.
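For drift monitoring, PSI compares the distribution of a feature (or of the score itself) between a reference period and the current one. The sketch below is a minimal version for one numeric feature; the function name, the fixed 'edges', and the 'eps' floor for empty bins are assumptions for the example.

```python
import math


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index: sum((a - e) * ln(a / e)) over bins.

    `edges` are sorted bin boundaries taken from the reference period;
    `eps` keeps empty bins from producing log(0) or division by zero.
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            idx = sum(v > e for e in edges)  # number of edges below v = bin index
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    e_shares = shares(expected)
    a_shares = shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_shares, a_shares))
```

An often-quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but alert thresholds are better tuned on your own history.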
10) Interpretation and trust
SHAP/feature importance - show the top factors at the level of the dataset and of a specific lead.
Monotonicity - for "common-sense" features (for example, the more engagement, the higher the probability), monotonic constraints can be enforced.
Decision log - log "why the lead was prioritized/excluded."
11) MLOps and operations
Pipeline: collection → cleaning → features → training → calibration → deploy (API/script) → monitoring.
Online metrics: p95 scoring latency, uptime, % errors, share of unprocessed leads.
Quality monitoring: AUC/PR, calibration, drift, business metrics (ROI/Payback by score bucket).
Model rotation: on a schedule (e.g. monthly) + alerts on degradation.
12) Examples of rules (pseudo)
Call center prioritization:
- 'p ≥ 0.6' → call within 5 minutes, experienced agent.
- '0.3 ≤ p < 0.6' → automated message + callback in 2 hours.
- 'p < 0.3' and high 'C_contact' → digital warm-up, no call.
Bidding:
- 'bid = base_bid × (p / p_target)' with 'min/max bid' limits, dayparting, and caps.
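The rules above can be written as two small functions. This is a sketch: the route names are hypothetical labels, and the fallback for a low-probability lead with a cheap contact is an assumption, since the rules do not cover that case. Dayparting and budget caps are left out.

```python
def route_lead(p, contact_cost_high=False):
    """Map a calibrated conversion probability to a processing route."""
    if p >= 0.6:
        return "call_within_5_min_senior_agent"
    if p >= 0.3:
        return "auto_message_then_callback_2h"
    if contact_cost_high:
        return "digital_warmup_no_call"
    # Assumption: the rules above do not say what to do with a low-p lead
    # whose contact is cheap, so it goes to a regular queue here.
    return "standard_queue"


def compute_bid(p, base_bid, p_target, min_bid, max_bid):
    """bid = base_bid * (p / p_target), clamped to [min_bid, max_bid]."""
    raw = base_bid * (p / p_target)
    return min(max(raw, min_bid), max_bid)
```

Keeping the thresholds as plain constants in one place makes them easy to record in the decision log and to adjust after each A/B readout.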
13) Experiments and proof of benefit
A/B by lead: measure not only conversion, but also profit/lead, processing time, lead value.
Geo-split: If the call center is limited, experiment on geographic clusters.
Sliding window: fix the metric horizon (for example, D14) and wait for it to mature, without peeking at results early.
14) Compliance, privacy and ethics
Consent/Privacy: no PII in UTM/URL; user consents are taken into account in targeting.
Fairness: do not use sensitive attributes; audit segments for skew.
Responsible Marketing: correct disclaimers, age/geo-rules, communication frequency limits.
15) Frequent errors
1. Click/EPC optimization instead of conversion and profit.
2. Incorrect split (random instead of time-based) → inflated offline score.
3. No calibration → incorrect thresholds and bad decisions.
4. Leakage in features → a "magically" high AUC, zero online effect.
5. No cost control (C_contact, caps) - the margin disappears.
6. No A/B test - the model sits "on the shelf" and the business does not trust it.
7. Unaccounted drift - the score goes stale, profits fall.
16) Implementation checklist
- Label and horizon T defined, business rules agreed.
- Time-based split and a simple baseline (logreg).
- Leakage-free features: RFM, lags, channel/creative, device/geo, technical.
- Boosting + calibration (Platt/Isotonic), AUC-PR/LogLoss/calibration metrics.
- Profit curve and threshold 'p = C/V'.
- Integration: call center/CRM/bid rules, guardrails and decision logs.
- A/B or geo-holdout, online profit metrics.
- Drift monitoring, model rotation schedule.
17) 30-60-90 plan
0-30 days - Frame and baseline
Describe the goal and horizon, collect leakage-free features, build a baseline (logreg).
Set up time-based validation, calibration, the profit curve, and an initial threshold.
Prepare the integration (API/script) and do a dry run on historical data.
31-60 days - Model in production
Enable boosting (LightGBM/CatBoost), calibration, SHAP reports.
Run A/B (or geo-holdout) on 20-30% of traffic.
Enable prioritization/bidding rules, guardrails, decision logs.
61-90 days - Scale and sustainability
Expand segments and channels, implement uplift where incentives/bonuses are available.
MLOps: drift monitoring, scoring SLA, rotation plan.
Weekly retro: adjust thresholds, update features and dictionaries.
An AI conversion forecast works when you formulate the goal correctly, build time-based validation, calibrate the probabilities, and turn the score into a monetary decision: bid, priority, route. Add MLOps, A/B confirmation, and compliance guardrails, and the model stops being a "decoration" and becomes an operational tool that speeds up the funnel, reduces the cost of sale, and increases profit.