QuantSightBench: Evaluating LLM Quantitative Forecasting with Prediction Intervals

arXiv:2604.15859v1 Announce Type: cross Forecasting has become a natural benchmark for reasoning under uncertainty. Yet existing evaluations of large language models remain limited to judgmental tasks in simple formats, such as binary or multiple-choice questions. In practice, however, forecasting spans a far broader scope. Across domains such as economics, public health, and social demographics, decisions hinge on numerical estimates over continuous quantities, a capability that current benchmarks do not capture.