Event Tensor: A Unified Abstraction for Compiling Dynamic Megakernel
arXiv CS.LG
arXiv:2604.13327v1 Announce Type: cross Modern GPU workloads, especially large language model (LLM) inference, suffer from kernel launch overheads and coarse synchronization that limit inter-kernel parallelism. Recent megakernel techniques fuse multiple operators into a single persistent kernel to eliminate launch gaps and expose inter-kernel parallelism, but struggle to handle dynamic shapes and data-dependent computation in real workloads. We present Event Tensor, a unified compiler abstraction for dynamic megakernels.