MathDuels: Evaluating LLMs as Problem Posers and Solvers

arXiv:2604.21916v1 Announce Type: new As frontier language models attain near-ceiling performance on static mathematical benchmarks, existing evaluations are increasingly unable to differentiate model capabilities, largely because they cast models solely as solvers of fixed problem sets. We