AI RESEARCH

Bi-Predictability: A Real-Time Signal for Monitoring LLM Interaction Integrity

arXiv CS.AI

arXiv:2604.13061v1 Announce Type: cross Large language models (LLMs) are increasingly deployed in high-stakes autonomous and interactive workflows, where reliability demands continuous, multi-turn coherence. However, current evaluation methods either rely on post-hoc semantic judges, measure unidirectional token confidence (e.g., perplexity), or require compute-intensive repeated sampling (e.g., semantic entropy