I benchmarked code retrieval for AI coding agents on 60 tasks

Dev.to AI

A tuned grep beat my MCP code-intelligence server on F1 by 9 points. I'm publishing the result anyway. Here's why.

Why this benchmark exists

I've spent the last six months building sverklo, a local-first MCP server that gives AI coding agents (Claude Code, Cursor, Windsurf) a real symbol graph instead of grep-based pattern matching. The product positioning has always been "stops the agent from hallucinating function names that don't exist in your codebase." That positioning is hand-wavy without numbers. Six months in, I had no public benchmark.
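For readers who want the headline metric made concrete, here is a minimal sketch of a set-based F1 score for a single retrieval task. It assumes F1 is computed over the set of items a tool returns versus the set a task actually needs; this excerpt does not spell out the exact scoring, so treat the function name `f1_score`, the file paths, and the two result sets as made-up illustrations, not data from the benchmark.

```python
def f1_score(retrieved: set[str], relevant: set[str]) -> float:
    """Harmonic mean of precision and recall over two sets of items.

    Assumption: scoring is set-based (no ranking). The real benchmark
    may score differently.
    """
    hits = len(retrieved & relevant)
    if hits == 0:
        # Covers empty inputs and the no-overlap case.
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Hypothetical task: three files are needed to answer it.
relevant = {"auth/session.py", "auth/tokens.py", "api/login.py"}

# Hypothetical tool A finds everything plus one irrelevant file.
retrieved_a = {"auth/session.py", "auth/tokens.py", "api/login.py", "tests/test_login.py"}

# Hypothetical tool B returns only correct files but misses one.
retrieved_b = {"auth/session.py", "auth/tokens.py"}

score_a = f1_score(retrieved_a, relevant)  # precision 3/4, recall 1
score_b = f1_score(retrieved_b, relevant)  # precision 1, recall 2/3

# A gap "in points" is the difference between the two scores times 100.
gap_in_points = (score_a - score_b) * 100
```

Under this reading, a "9 point" gap means the two tools' F1 scores differ by 0.09, presumably after averaging per-task scores across the benchmark.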