Question Difficulty Estimation for Large Language Models via Answer Plausibility Scoring

arXiv:2605.12398v1 Announce Type: new Estimating question difficulty is a critical component in evaluating and improving large language models (LLMs) for question answering (QA). Existing approaches often rely on readability formulas, retrieval-based signals, or popularity statistics, which may not fully capture the reasoning challenges posed to modern LLMs. In this paper, we