AI RESEARCH
Fine-Grained Benchmark Generation for Comprehensive Evaluation of Foundation Models
arXiv CS.AI
arXiv:2605.18824v1 Announce Type: cross Evaluation of foundation models often relies on aggregate scores from benchmarks that lack the comprehensive coverage and metadata needed for fine-grained evaluation. We