From Comprehension to Reasoning: A Hierarchical Benchmark for Automated Financial Research Reporting

arXiv:2603.19254v1

Large language models (LLMs) are increasingly used to generate financial research reports, shifting from auxiliary analytic tools to primary content producers. Yet recent real-world deployments reveal persistent failures (factual errors, numerical inconsistencies, fabricated references, and shallow analysis) that can distort assessments of corporate fundamentals and ultimately trigger severe economic losses.