AI RESEARCH

RedFuser: An Automatic Operator Fusion Framework for Cascaded Reductions on AI Accelerators

arXiv cs.AI

arXiv:2603.10026v1. Operator fusion is a key performance optimization in the deployment of AI models: it significantly improves execution efficiency and has been widely adopted in modern AI compilers. However, for cascaded reduction operations, which involve multiple loops with inter-loop data dependencies (for example, the safe softmax followed by GEMM within attention mechanisms), existing compilers lack effective automated fusion and kernel generation capabilities.
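
To make the inter-loop dependency concrete, here is a minimal Python sketch of one attention row: a safe softmax over a score vector, followed by a weighted sum over value rows (a one-row GEMM). It is an illustration only and is not taken from RedFuser. The function names `softmax_gemv_unfused` and `softmax_gemv_fused` are hypothetical, the scores are assumed to be finite, and the fused variant uses the well-known online-softmax rescaling trick rather than whatever fusion the paper generates.

```python
import math

def softmax_gemv_unfused(scores, values):
    """Three dependent loops: row max, sum of exponentials, weighted sum."""
    m = max(scores)                                # reduction 1: row max
    denom = sum(math.exp(s - m) for s in scores)   # reduction 2: needs m
    dim = len(values[0])
    out = [0.0] * dim
    for s, v in zip(scores, values):               # reduction 3: needs m and denom
        w = math.exp(s - m) / denom
        for j in range(dim):
            out[j] += w * v[j]
    return out

def softmax_gemv_fused(scores, values):
    """One pass: keep a running max and rescale the partial results."""
    dim = len(values[0])
    m = float("-inf")     # running max
    denom = 0.0           # running sum of exponentials, relative to m
    acc = [0.0] * dim     # running unnormalized weighted sum, relative to m
    for s, v in zip(scores, values):
        m_new = max(m, s)
        scale = math.exp(m - m_new)   # 0.0 on the first iteration (m is -inf)
        p = math.exp(s - m_new)
        denom = denom * scale + p
        for j in range(dim):
            acc[j] = acc[j] * scale + p * v[j]
        m = m_new
    return [a / denom for a in acc]
```

In the unfused version each loop must finish before the next can start, because the second reduction reads the max and the third reads both the max and the denominator. The fused version removes that barrier by correcting earlier partial sums whenever the running max changes, which is the kind of algebraic rewrite that cascaded reductions require before they can share a single loop.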