Fine-Grained Benchmark Generation for Comprehensive Evaluation of Foundation Models

arXiv:2605.18824v1 Announce Type: cross Evaluation of foundation models often relies on aggregate scores from benchmarks that lack the comprehensive coverage and metadata needed for fine-grained evaluation. We