Production Readiness — Churn Prediction v2

Deployment Status

READY (canary at 10%; full promotion blocked on High-LTV recall gate)

Shadow mode complete (7 days). Canary at 10% approved. High-LTV recall gate not yet met (0.71 vs. 0.75 required for full promotion). Deploy with segment-level monitoring and an automated rollback trigger at recall < 0.68 for 2 consecutive runs.

Serving Configuration

Serving mode	Batch daily
Inference cadence	02:00 UTC, nightly
Artifact format	ONNX (churn-xgb-v2.onnx)
Infra	AWS Batch + S3 output
Prediction volume	~42k customers/run
Output sink	Redshift predictions table

Monitoring SLOs

Monitor	Threshold	Alert status
Input feature drift (PSI)	< 0.20	Active
Prediction distribution shift	KL < 0.15	Active
Batch job success	100%	Active
High-LTV segment recall	≥ 0.68 (alert) / ≥ 0.75 (gate)	Watchlist
Null rate in key features	< 2%	Active
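The two drift monitors can be computed from binned proportions. The following is a minimal plain-Python sketch for a single feature (or for the prediction score), assuming the baseline is a fixed reference distribution binned on fixed edges. The bin edges, the EPS smoothing floor, the function names, and the KL direction (current relative to baseline) are assumptions; only the 0.20 and 0.15 thresholds come from the SLO table.

```python
import bisect
import math
from typing import Sequence

PSI_MAX = 0.20  # Input feature drift SLO: PSI < 0.20
KL_MAX = 0.15   # Prediction distribution shift SLO: KL < 0.15
EPS = 1e-6      # assumed floor for empty bins so log() stays finite


def bin_proportions(values: Sequence[float], edges: Sequence[float]) -> list[float]:
    """Share of values per bin (len(edges) - 1 bins); out-of-range values go to the end bins."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        counts[min(max(i, 0), n_bins - 1)] += 1
    total = sum(counts) or 1
    return [max(c / total, EPS) for c in counts]


def psi(baseline: Sequence[float], current: Sequence[float]) -> float:
    """Population Stability Index: sum over bins of (a - e) * ln(a / e)."""
    return sum((a - e) * math.log(a / e) for e, a in zip(baseline, current))


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q): sum over bins of p * ln(p / q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def feature_drift_alert(baseline: Sequence[float], current: Sequence[float]) -> bool:
    # Healthy while PSI < 0.20, so alert at or above the threshold.
    return psi(baseline, current) >= PSI_MAX


def prediction_shift_alert(baseline: Sequence[float], current: Sequence[float]) -> bool:
    # Direction assumed here: KL(current || baseline).
    return kl_divergence(current, baseline) >= KL_MAX


if __name__ == "__main__":
    edges = [0.0, 0.25, 0.5, 0.75, 1.0]
    baseline = bin_proportions([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9], edges)
    current = bin_proportions([0.9] * 8, edges)  # everything collapsed into the top bin
    print(feature_drift_alert(baseline, current))  # True
```

In practice PSI would be computed per key feature and each feature alerted independently; fixing the edges on the baseline (rather than re-deriving them each night) keeps runs comparable.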

Rollout Strategy

Shadow Mode — Completed

Ran v2 in parallel with v1 for 7 days. Compared prediction distributions and found no systematic bias against the v1 baseline. No rollback was triggered.

Done · 2026-06-15

Canary — 10% traffic · In progress

Route 10% of customer scoring to v2. Monitor High-LTV segment recall after every nightly run. Automated rollback if recall drops below 0.68 for 2 consecutive runs.

Active since 2026-06-20
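The rollout plan does not say how the 10% is selected. One common approach, assumed here, is a stable hash of the customer ID, so the same customers stay in the canary across nightly runs. The salt, the ID format, and the function names below are hypothetical.

```python
import hashlib

CANARY_PERCENT = 10              # rollout plan: 10% of customer scoring goes to v2
CANARY_SALT = "churn-v2-canary"  # hypothetical salt; changing it reshuffles the cohort


def canary_bucket(customer_id: str) -> int:
    """Map a customer ID to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{CANARY_SALT}:{customer_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def model_for(customer_id: str) -> str:
    """Return which artifact scores this customer during the canary."""
    if canary_bucket(customer_id) < CANARY_PERCENT:
        return "churn-xgb-v2"
    return "churn-xgb-v1"


if __name__ == "__main__":
    ids = [f"cust-{n}" for n in range(42_000)]  # synthetic IDs, roughly one nightly run
    share = sum(model_for(c) == "churn-xgb-v2" for c in ids) / len(ids)
    print(f"v2 share: {share:.3f}")  # close to 0.10, not exact
```

A fixed cohort arguably suits the recall-based rules in this plan: day-over-day recall compares the same customers rather than a freshly sampled 10% each night.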

Promote to 100% · Gated on High-LTV recall

Full promotion requires High-LTV recall ≥ 0.75 sustained for 5 days at canary. Current value: 0.71. Estimated gate pass: 1–2 sprints, contingent on the feature engineering improvements planned for Sprint 15.

Blocked on recall gate
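The promotion gate reduces to a recall calculation plus a "last N days all pass" check. The following is a minimal sketch assuming one recall value per day for High-LTV customers in the canary cohort, computed once churn labels for that scoring date are available, and reading "sustained for 5 days" as the 5 most recent daily values. The function names are hypothetical.

```python
import math
from typing import Sequence

GATE_RECALL = 0.75  # full promotion requires High-LTV recall >= 0.75
GATE_DAYS = 5       # ...sustained for 5 days at canary


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """TP / (TP + FN); NaN if the segment has no actual churners that day."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    positives = sum(1 for t in y_true if t)
    return tp / positives if positives else math.nan


def promotion_gate_passes(daily_recall: Sequence[float]) -> bool:
    """True only if each of the last GATE_DAYS daily values meets the gate.

    A NaN day fails (NaN >= x is False), so missing data never promotes.
    """
    recent = list(daily_recall)[-GATE_DAYS:]
    return len(recent) == GATE_DAYS and all(r >= GATE_RECALL for r in recent)
```

With the current value of 0.71, `promotion_gate_passes([0.71] * 5)` is False, which matches the blocked status.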

Rollback Plan

✓ Automated trigger: High-LTV recall < 0.68 for 2 consecutive nightly runs → auto-revert to churn-xgb-v1 in the Redshift pipeline
✓ Manual trigger: on-call engineer can flip the ACTIVE_MODEL env var (backed by an SSM parameter) from v2 → v1; no deploy required
✓ v1 artifact retained: churn-xgb-v1.onnx pinned in the S3 model registry for 90 days post-promotion
✓ Runbook: Churn Model Rollback Runbook (Notion), last updated 2026-06-18
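The automated trigger is a two-run streak check on nightly recall. The following is a minimal sketch of that decision only; the function names are hypothetical, and treating a missing (NaN) recall value as "does not count toward rollback" is an assumption, on the premise that failed or empty runs are caught by the batch job success monitor instead.

```python
from typing import Sequence

ROLLBACK_RECALL = 0.68  # automated trigger threshold
CONSECUTIVE_RUNS = 2    # ...for 2 consecutive nightly runs


def should_rollback(nightly_recall: Sequence[float]) -> bool:
    """True if the most recent CONSECUTIVE_RUNS values are all below the threshold.

    NaN < x is False, so a run with missing recall breaks the streak (assumed behavior).
    """
    recent = list(nightly_recall)[-CONSECUTIVE_RUNS:]
    return len(recent) == CONSECUTIVE_RUNS and all(r < ROLLBACK_RECALL for r in recent)


def next_active_model(current: str, nightly_recall: Sequence[float]) -> str:
    """Value the ACTIVE_MODEL setting should hold after tonight's run."""
    if current == "v2" and should_rollback(nightly_recall):
        return "v1"
    return current
```

Writing the result back is left out of the sketch. With boto3 it would be a `put_parameter` call with `Overwrite=True` against the SSM parameter that backs ACTIVE_MODEL, whose name this document does not give.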