Correction and Corruption: A Two-Rate View of Error Flow in LLM Protocols
arXiv cs.LG
arXiv:2604.18245v1 Announce Type: new
Large language models are increasingly deployed as protocols: structured multi-call procedures that spend additional computation to transform a baseline answer into a final one. These protocols are evaluated only by end-to-end accuracy, which gives limited insight into when they help, when they hurt, and whether their behavior transfers under distribution shift or composition. We propose a paired-outcome measurement interface for auditing a single protocol step on exact-match tasks.
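The abstract does not spell out the interface, but the title's "two-rate view" suggests one plausible reading. For each item, record whether the baseline answer and the final answer are correct. Then measure a correction rate (baseline wrong, final right) and a corruption rate (baseline right, final wrong). The sketch below illustrates that reading and is not the authors' implementation. The names `PairedOutcomes`, `tally`, and `two_rates` are hypothetical, and "exact match" is assumed to mean plain string equality with no answer normalization.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class PairedOutcomes:
    """Hypothetical 2x2 table of (baseline correct?, final correct?) counts."""
    right_right: int  # baseline correct, final correct (kept)
    right_wrong: int  # baseline correct, final wrong (corrupted)
    wrong_right: int  # baseline wrong, final correct (corrected)
    wrong_wrong: int  # baseline wrong, final wrong (still wrong)


def tally(baseline: Sequence[str], final: Sequence[str],
          gold: Sequence[str]) -> PairedOutcomes:
    """Pair per-item outcomes; exact match is plain string equality (assumption)."""
    if not (len(baseline) == len(final) == len(gold)):
        raise ValueError("baseline, final, and gold must be aligned item by item")
    counts = {(True, True): 0, (True, False): 0, (False, True): 0, (False, False): 0}
    for b, f, g in zip(baseline, final, gold):
        counts[(b == g, f == g)] += 1
    return PairedOutcomes(
        right_right=counts[(True, True)],
        right_wrong=counts[(True, False)],
        wrong_right=counts[(False, True)],
        wrong_wrong=counts[(False, False)],
    )


def two_rates(o: PairedOutcomes) -> Tuple[float, float]:
    """Return (correction_rate, corruption_rate); NaN if a denominator is zero."""
    n_base_wrong = o.wrong_right + o.wrong_wrong
    n_base_right = o.right_right + o.right_wrong
    correction = o.wrong_right / n_base_wrong if n_base_wrong else float("nan")
    corruption = o.right_wrong / n_base_right if n_base_right else float("nan")
    return correction, corruption


if __name__ == "__main__":
    # Toy data: one item corrected, one corrupted, so accuracy stays at 3/5.
    o = tally(["4", "7", "9", "2", "5"], ["4", "8", "9", "3", "5"],
              ["4", "8", "1", "2", "5"])
    print(o, two_rates(o))
```

Under these assumed definitions, final accuracy decomposes exactly as a(1 − k) + (1 − a)c. Here a is baseline accuracy, c the correction rate, and k the corruption rate, so the step raises accuracy only when (1 − a)c exceeds ak. In the toy run, a = 0.6, c = 0.5, and k = 1/3, which gives 0.4 + 0.2 = 0.6. The step changed two answers yet left end-to-end accuracy unchanged, a pattern that a single accuracy number hides.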