AI RESEARCH
Silent Commitment Failure in Instruction-Tuned Language Models: Evidence of Governability Divergence Across Architectures
arXiv CS.AI
•
ArXi:2603.21415v1 Announce Type: new As large language models are deployed as autonomous agents with tool execution privileges, a critical assumption underpins their security architecture: that model errors are detectable at runtime. We present empirical evidence that this assumption fails for two of three instruction-following models evaluable for conflict detection. We