AI RESEARCH

[R] IDP Leaderboard: Open benchmark for document AI across 16 VLMs, 9,000+ documents, 3 benchmark suites

r/MachineLearning

We're releasing the IDP Leaderboard, an open evaluation framework for document understanding tasks. 16 models tested across OlmOCR, OmniDoc, and our own IDP Core benchmark (covering KIE, table extraction, VQA, OCR, classification, and long document processing). Key results: - Gemini 3.1 Pro leads overall (83.2) but the margin is tight. Top 5 within 2.4 points. - Cheaper model variants (Flash, Sonnet) produce nearly identical extraction quality to flagship models. The differentiation only appears on reasoning-heavy tasks like.