A Benchmark Suite of Reddit-Derived Datasets for Mental Health Detection

ArXi:2604.23458v1 Announce Type: cross The growing availability of online groups has opened up new windows to study mental health through natural language processing (NLP). However, it is hindered by a lack of high-quality, well-validated datasets. Existing studies have a tendency to build task-specific corpora without collecting them into widely available resources, and this makes reproducibility as well as cross-task comparison challenging.