Autonomous agents get more reliable when you stop treating the prompt as the execution layer

One of the most common mistakes in agent system design is treating the prompt as the main control surface for execution behavior. That works fine for short tasks. It falls apart on real long-running work. I spent a significant amount of time hardening an autonomous execution engine against the failure modes that actually matter in practice: models that skip required tools, produce plausible-looking but incomplete output, and claim they cannot do things the telemetry proves they could. Here is what the failure actually looks like before you harden against it.