Impermanent: A Live Benchmark for Temporal Generalization in Time Series Forecasting

arXiv:2603.08707v1 (Announce Type: new)

Recent advances in time-series forecasting increasingly rely on pre-trained foundation-style models. While these models are often claimed to generalize broadly, existing evaluation protocols provide limited evidence for this. Indeed, most current benchmarks use static train-test splits, which invite contamination: foundation models may inadvertently be trained on test data, or model selection may be performed using test scores, and either can inflate reported performance. We
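A minimal sketch of the temporal-overlap form of contamination described above. It assumes that the model's pre-training cutoff date is known and that a static test split is a list of (start, end) evaluation windows. The function `contaminated_windows`, the windows, and the cutoff date are all hypothetical illustrations and are not part of the benchmark. The function flags any test window that begins on or before the cutoff, because at least part of that window could have appeared in the pre-training data.

```python
from datetime import date


def contaminated_windows(test_windows, training_cutoff):
    """Return the test windows that start on or before the training cutoff.

    Hypothetical helper. test_windows is a list of (start, end) date pairs
    describing evaluation periods; training_cutoff is the last date of data
    the model may have seen during pre-training.
    """
    # A window starting on or before the cutoff overlaps the pre-training period.
    return [(start, end) for start, end in test_windows if start <= training_cutoff]


# Hypothetical static split: three fixed evaluation windows.
windows = [
    (date(2022, 1, 1), date(2022, 3, 31)),
    (date(2023, 6, 1), date(2023, 8, 31)),
    (date(2024, 9, 1), date(2024, 11, 30)),
]
# Hypothetical pre-training cutoff for a foundation model.
cutoff = date(2024, 1, 1)

flagged = contaminated_windows(windows, cutoff)
clean = [w for w in windows if w not in flagged]
print(len(flagged), len(clean))  # 2 1
```

This check only catches temporal overlap. The other failure mode mentioned, choosing models by their test scores, leaves no trace in the dates and cannot be detected this way.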