CuBridge: An LLM-Based Framework for Understanding and Reconstructing High-Performance Attention Kernels

arXiv:2605.05023v1

Efficient CUDA implementations of attention mechanisms are critical to modern deep learning systems, yet supporting diverse and evolving attention variants remains challenging. Existing frameworks and compilers trade performance for flexibility, while expert-written kernels achieve high efficiency but are difficult to adapt. Recent work explores large language models (LLMs) for GPU kernel generation, but prior studies report inconsistent correctness and significant performance gaps for complex operators such as attention.