Gemma 4 E4B vs Qwen3.5-4B on document tasks: Qwen wins the benchmarks, but the sub-scores tell a different story

Results live here:

Ran both through the IDP Leaderboard (OlmOCR Bench, OmniDocBench, IDP Core) and the headline numbers aren't the interesting part.

Top-line scores:

| Benchmark | Gemma 4 E4B | Qwen3.5-4B |
|---|---|---|
| OlmOCR | 47.0 | 75.4 |
| OmniDoc | 59.7 | 67.6 |
| IDP Core | 55.0 | 74.5 |

Qwen wins all three. On OlmOCR the gap is 28 points. Open and shut, right?

Not quite. Drill into IDP Core:

| Sub-task | Gemma 4 E4B | Qwen3.5-4B |
|---|---|---|
| OCR (raw text recognition) | 74.0 | 64.7 |
| KIE (structured extraction) | 11.1 | 86.0 |
| Table | 55.0 | 76.7 |
| VQA | 65.3 | 72.4 |

Gemma reads text from documents better than Qwen.
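If you want to see the per-sub-task gaps at a glance, here is a minimal Python sketch that computes them from the IDP Core sub-scores quoted above. The dictionary layout and the `gaps` helper are just illustrative names, not anything from the leaderboard itself; the numbers are copied straight from this post.

```python
# IDP Core sub-scores copied from this post (Gemma 4 E4B vs Qwen3.5-4B).
# The structure and names here are illustrative, not a leaderboard API.
idp_core = {
    "OCR": {"gemma": 74.0, "qwen": 64.7},
    "KIE": {"gemma": 11.1, "qwen": 86.0},
    "Table": {"gemma": 55.0, "qwen": 76.7},
    "VQA": {"gemma": 65.3, "qwen": 72.4},
}


def gaps(scores):
    """Return Gemma minus Qwen per sub-task, rounded to one decimal."""
    return {task: round(s["gemma"] - s["qwen"], 1) for task, s in scores.items()}


sub_gaps = gaps(idp_core)

# Print from Qwen's biggest lead to Gemma's biggest lead.
for task, gap in sorted(sub_gaps.items(), key=lambda kv: kv[1]):
    leader = "Gemma" if gap > 0 else "Qwen"
    print(f"{task:<6} {gap:+6.1f}  ({leader} ahead)")
```

A positive gap means Gemma is ahead. OCR is the only sub-task where that happens, and KIE is where nearly all of the overall deficit comes from.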