MentalBench: A DSM-Grounded Benchmark for Evaluating Psychiatric Diagnostic Capability of Large Language Models

arXiv:2602.12871v2 Announce Type: replace Large language models (LLMs) have attracted growing interest as supportive tools for psychiatric assessment and clinical decision support. However, existing mental health benchmarks largely rely on social media data or supportive dialogue settings, limiting their ability to assess whether models can apply formal diagnostic criteria and differential diagnostic rules. In this paper, we