MiroThinker H1 tops GPT 5.4, Claude 4.6 Opus on BrowseComp; its 3B param open source variant beats GPT 5 on GAIA

r/singularity
Generative AI

Was reading through the MiroThinker paper (arXiv:2603.15726) and two things jumped out at me that I think are worth discussing.

First, the BrowseComp results. MiroThinker H1 scores 88.2, beating Gemini 3.1 Pro at 85.9, Claude 4.6 Opus at 84.0, and GPT 5.4 at 82.7. On GAIA the gap is even wider: 88.5 vs GPT 5's 76.4.

These are strong results for a browsing agent, but I want to be upfront that it doesn't dominate everywhere. On SUPERChem, Gemini 3 Pro leads comfortably (63.2 vs 51.3). On Humanity's Last Exam, both Seed 2.0 Pro (54.2) and Claude 4.6 Opus (53.1) beat its 47.7.
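If you want to sanity-check the margins yourself, here is a quick sketch in Python. It uses only the numbers quoted above, comparing the MiroThinker score against the strongest competitor mentioned for each benchmark. The names `scores` and `margins` are just mine for illustration, nothing from the paper.

```python
# Scores as quoted in this post: (MiroThinker, strongest competitor mentioned).
# The dict layout and function name are illustrative, not from the paper.
scores = {
    "BrowseComp": (88.2, 85.9),            # vs Gemini 3.1 Pro
    "GAIA": (88.5, 76.4),                  # vs GPT 5
    "SUPERChem": (51.3, 63.2),             # vs Gemini 3 Pro
    "Humanity's Last Exam": (47.7, 54.2),  # vs Seed 2.0 Pro
}


def margins(table):
    """Return MiroThinker minus competitor, in points, rounded to one decimal."""
    return {name: round(ours - theirs, 1) for name, (ours, theirs) in table.items()}


if __name__ == "__main__":
    for name, delta in margins(scores).items():
        # Positive means MiroThinker leads, negative means it trails.
        print(f"{name}: {delta:+.1f}")
```

Positive numbers are leads and negative numbers are deficits, so it is easy to see at a glance that the wins are on the browsing-style benchmarks and the losses are on the other two.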