
Production LLM systematically violates tool schema constraints to invent UI features; observed over ~2,400 messages [D]

r/MachineLearning

Writeup of an emergent behavior I observed in production. Posting here for methodological critique and pointers to related work. Context: a conversational AI system whose tool schema contains a single tool exposing 5 enumerated action types, each with an explicit description. Across ~2,400 observed messages, the model uses the enum correctly most of the time. When it deviates, the deviation itself is the point of interest.
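To make the setup concrete, here is a minimal sketch of a single-tool schema with a 5-value action enum, plus a check that flags off-enum calls. It assumes a JSON-Schema-style parameter definition (an assumption; the post does not name the API). The tool name `ui_tool`, the action names, and the helpers `classify_call` and `deviation_report` are hypothetical placeholders, not the production system's schema.

```python
from collections import Counter

# Hypothetical placeholders: the post only says there are 5 enumerated
# action types, each with an explicit description.
ACTIONS = {
    "action_a": "Description of action A",
    "action_b": "Description of action B",
    "action_c": "Description of action C",
    "action_d": "Description of action D",
    "action_e": "Description of action E",
}

# Assumed JSON-Schema-style definition for the single tool.
TOOL_SCHEMA = {
    "name": "ui_tool",  # hypothetical tool name
    "parameters": {
        "type": "object",
        "properties": {
            "action": {
                "type": "string",
                "enum": list(ACTIONS),
                "description": "; ".join(f"{k}: {v}" for k, v in ACTIONS.items()),
            },
        },
        "required": ["action"],
    },
}


def classify_call(arguments: dict) -> str:
    """Label one tool call's arguments as in-enum or as a schema deviation."""
    allowed = TOOL_SCHEMA["parameters"]["properties"]["action"]["enum"]
    action = arguments.get("action")
    if action is None:
        return "missing_action"
    if isinstance(action, str) and action in allowed:
        return "in_enum"
    return "off_enum"


def deviation_report(calls: list[dict]) -> tuple[Counter, Counter]:
    """Count labels across logged calls and tally the invented action values."""
    labels = Counter(classify_call(c) for c in calls)
    # str() keeps non-string (unhashable) values countable.
    invented = Counter(
        str(c["action"]) for c in calls if classify_call(c) == "off_enum"
    )
    return labels, invented


if __name__ == "__main__":
    # Toy log: two valid calls, one invented action, one call missing "action".
    logged = [
        {"action": "action_a"},
        {"action": "action_c"},
        {"action": "show_carousel"},
        {},
    ]
    labels, invented = deviation_report(logged)
    print(labels["in_enum"], labels["off_enum"], labels["missing_action"])
    print(dict(invented))
```

Running this on real logs would separate ordinary enum use from the off-enum calls, and the `invented` tally shows which new "features" the model reaches for and how often.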