The Harness Problem Is Real — And the Edit Tool Is Where It Starts

The debate is framed wrong. Every week someone publishes a benchmark comparing GPT-5.x vs Claude Opus vs Gemini on SWE-bench. The implicit assumption: the model is the variable that matters. Pick the best model and your coding agent works better. But a benchmark published last month broke that assumption cleanly. Grok Code Fast went from 6.7% to 68.3% on a real-world coding task, not because the model changed, not because of a new