AI RESEARCH
PROXIMA: A Reliability Scoring Framework for Proxy Metrics in Online Controlled Experiments
arXiv cs.LG
arXiv:2604.14352v1 Announce Type: cross
Online A/B testing at scale relies on proxy metrics: short-term, easily measured signals used in place of slow-moving long-term outcomes. When the proxy-outcome relationship is heterogeneous across user segments, aggregate correlation can mask directional failures akin to Simpson's Paradox, leading to costly ship/no-ship errors. We