ScoringBench: A Benchmark for Evaluating Tabular Foundation Models with Proper Scoring Rules

arXiv:2603.29928v1 Announce Type: new

Tabular foundation models such as TabPFN and TabICL already produce full predictive distributions, yet prevailing regression benchmarks evaluate them almost exclusively via point-estimate metrics (RMSE, R²). These aggregate measures often obscure model performance in the tails of the distribution. This is a critical deficit for high-stakes decision-making in domains like finance and clinical research, where asymmetric risk profiles are the norm. We
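The claim that point metrics hide tail behaviour can be illustrated with a minimal sketch. It assumes Gaussian predictive distributions and uses two standard proper scoring rules chosen purely for illustration: the continuous ranked probability score (CRPS, in its closed form for a Gaussian) and the pinball (quantile) loss at an upper-tail level. This is not ScoringBench's implementation. The toy data, the model names "A" and "B", and the helper functions are all hypothetical. Both models share the same predictive mean, so their RMSE is identical, yet the two scoring rules separate them.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def crps_gaussian(mu: float, sigma: float, y: float) -> float:
    """Closed-form CRPS of N(mu, sigma^2) at observation y (lower is better)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (y - mu) / sigma
    return sigma * (z * (2 * STD_NORMAL.cdf(z) - 1)
                    + 2 * STD_NORMAL.pdf(z)
                    - 1 / math.sqrt(math.pi))


def pinball_loss(q_pred: float, y: float, tau: float) -> float:
    """Quantile (pinball) loss of a predicted tau-quantile at observation y."""
    diff = y - q_pred
    # Under-prediction costs tau per unit, over-prediction costs (1 - tau) per unit.
    return tau * diff if diff >= 0 else (tau - 1) * diff


def rmse(preds, ys) -> float:
    """Root-mean-squared error of point predictions."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys))


def mean_pinball(mus, sigma, ys, tau):
    """Mean pinball loss of the Gaussian tau-quantiles N(mu, sigma^2).inv_cdf(tau)."""
    qs = [NormalDist(m, sigma).inv_cdf(tau) for m in mus]
    return sum(pinball_loss(q, y, tau) for q, y in zip(qs, ys)) / len(ys)


# Hypothetical toy data: one large "tail" observation among small ones.
ys = [0.1, -0.3, 2.5, 0.0, -0.2]
mus = [0.0] * len(ys)  # both models share the same predictive mean
sigmas = {"A (over-confident)": 0.05, "B (wider)": 1.0}
tau = 0.95  # upper-tail quantile level

for name, sigma in sigmas.items():
    point = rmse(mus, ys)  # depends only on the means, so identical for A and B
    crps = sum(crps_gaussian(m, sigma, y) for m, y in zip(mus, ys)) / len(ys)
    pin = mean_pinball(mus, sigma, ys, tau)
    print(f"{name}: RMSE={point:.3f}  mean CRPS={crps:.3f}  pinball@{tau}={pin:.3f}")
```

In this toy setup the over-confident model A scores far worse than B on the 0.95-quantile pinball loss, driven almost entirely by the single tail observation, while RMSE cannot tell the two apart. Which proper scoring rules a benchmark should report is a separate design question; CRPS and pinball loss are used here only because they have simple closed forms.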