Calibrate-Then-Delegate: Safety Monitoring with Risk and Budget Guarantees via Model Cascades

arXiv:2604.14251v1 Announce Type: new Monitoring LLM safety at scale requires balancing cost and accuracy: a cheap latent-space probe can screen every input, but hard cases should be escalated to an expensive expert. Existing cascades delegate based on probe uncertainty, but uncertainty is a poor proxy for delegation benefit because it ignores whether the expert would actually correct the probe's error. To address this problem, we