MentalBench: A DSM-Grounded Benchmark for Evaluating Psychiatric Diagnostic Capability of Large Language Models
arXiv cs.CL
arXiv:2602.12871v2. Large language models (LLMs) have attracted growing interest as supportive tools for psychiatric assessment and clinical decision support. However, existing mental health benchmarks largely rely on social media data or supportive dialogue settings, limiting their ability to assess whether models can apply formal diagnostic criteria and differential diagnostic rules. In this paper, we