SCAN: Structured Capability Assessment and Navigation for LLMs

arXiv:2505.06698v4 Announce Type: replace Evaluating Large Language Models (LLMs) has become increasingly important, with automatic evaluation benchmarks gaining prominence as alternatives to human evaluation. While existing research has focused on approximating model rankings, such benchmarks fail to provide users and developers with a comprehensive and fine-grained understanding of a specific model's capabilities.