TML-Bench: Benchmark for Data Science Agents on Tabular ML Tasks

arXiv:2603.05764v1 Announce Type: cross

Autonomous coding agents can quickly produce strong tabular baselines on Kaggle-style tasks. However, their practical value depends on end-to-end correctness and reliability under time limits. This paper