Auto Researching, not hyperparameter tuning: Convergence Analysis of 10,000 Experiments

arXiv:2603.15916v1 Announce Type: cross When LLM agents autonomously design ML experiments, do they perform genuine architecture search -- or do they default to hyperparameter tuning within a narrow region of the design space? We answer this question by analyzing 10,469 experiments executed by two LLM agents (Claude Opus and Gemini 2.5 Pro) across a combinatorial configuration space of 108,000 discrete cells for dashcam collision detection over 27 days.