Mistral Small 4 vs Qwen3.5-9B on document understanding benchmarks: Qwen wins, but Mistral does better than GPT-4.1

Ran Mistral Small 4 through some document tasks via the Mistral API and wanted to see where it actually lands. This leaderboard does head-to-head comparisons on document tasks.

The short version: Qwen3.5-9B wins 10 out of 14 sub-benchmarks. Mistral wins 2. Two ties. Overall score: Qwen 77.0, Mistral 71.5.

OlmOCR Bench: Qwen 78.1, Mistral 69.6. Qwen wins every sub-category. The math OCR gap is the biggest, 85.5 vs 66. Absent detection is bad on both (57.2 vs 44.7), but Mistral is worse.

OmniDocBench: closest of the three, 76.7 vs 76.4.
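If you want to see the gaps side by side, here is a rough Python sketch that only uses the numbers quoted above. The dictionary keys and the `gaps` helper are my own naming, not anything from the leaderboard, and I am assuming the OmniDocBench pair follows the same Qwen-first order as the other pairs.

```python
# Scores quoted in the post, stored as (Qwen3.5-9B, Mistral Small 4).
# Assumption: the OmniDocBench pair is Qwen-first like the others.
scores = {
    "Overall": (77.0, 71.5),
    "OlmOCR Bench": (78.1, 69.6),
    "OlmOCR math OCR": (85.5, 66.0),
    "OlmOCR absent detection": (57.2, 44.7),
    "OmniDocBench": (76.7, 76.4),
}

# Win/loss/tie tally for Qwen across sub-benchmarks, as stated in the post.
tally = {"qwen_wins": 10, "mistral_wins": 2, "ties": 2}


def gaps(table):
    """Return Qwen-minus-Mistral point gaps, rounded to one decimal."""
    return {name: round(q - m, 1) for name, (q, m) in table.items()}


if __name__ == "__main__":
    # Sanity check: the tally should cover all 14 sub-benchmarks.
    assert sum(tally.values()) == 14

    # Print the gaps, largest first.
    for name, gap in sorted(gaps(scores).items(), key=lambda kv: -kv[1]):
        print(f"{name:<26} Qwen leads by {gap:.1f}")
```

Sorted this way, math OCR comes out on top and OmniDocBench at the bottom, which matches the "biggest gap" and "closest" notes in the post.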