TCL: Enabling Fast and Efficient Cross-Hardware Tensor Program Optimization via Continual Learning

arXiv:2604.12891v1

Deep learning (DL) compilers rely on cost models and auto-tuning to optimize tensor programs for target hardware. However, existing approaches depend on large offline datasets, incurring high collection costs and offering suboptimal transferability across platforms. In this paper, we