TiledAttention: a CUDA Tile SDPA Kernel for PyTorch

arXiv cs.AI

arXiv:2603.01960v2. TiledAttention is a scaled dot-product attention (SDPA) forward operator intended for attention research on NVIDIA GPUs. Implemented in cuTile Python (TileIR) and exposed as a PyTorch-callable function, it is easier to modify than low-level CUDA templates while retaining realistic behavior via online softmax and tiled $K,V$ streaming.
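The combination of online softmax and tiled $K,V$ streaming can be illustrated with a minimal pure-Python sketch. It is not the TiledAttention kernel and uses no cuTile or PyTorch API. The function names `naive_attention` and `tiled_attention`, the `tile` parameter, and the single-query, unmasked setting are all assumptions made for illustration. The sketch shows only the recurrence: a running maximum, a running denominator, and a rescaled accumulator let the output be computed while seeing one tile of $K$ and $V$ at a time.

```python
import math


def naive_attention(q, K, V):
    """Reference: softmax(q K^T / sqrt(d)) V for a single query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    d_v = len(V[0])
    return [sum(w * v[j] for w, v in zip(weights, V)) / total for j in range(d_v)]


def tiled_attention(q, K, V, tile=2):
    """Same result, but K and V are consumed one tile at a time (illustrative only)."""
    scale = 1.0 / math.sqrt(len(q))
    d_v = len(V[0])
    m = -math.inf      # running maximum of all scores seen so far
    l = 0.0            # running softmax denominator, relative to m
    acc = [0.0] * d_v  # running unnormalized output, relative to m
    for start in range(0, len(K), tile):
        k_tile = K[start:start + tile]
        v_tile = V[start:start + tile]
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in k_tile]
        m_new = max(m, max(scores))
        # Rescale the old state to the new maximum; exp(-inf) is 0.0 on the first tile.
        alpha = math.exp(m - m_new)
        p = [math.exp(s - m_new) for s in scores]
        l = alpha * l + sum(p)
        acc = [alpha * a + sum(pi * v[j] for pi, v in zip(p, v_tile))
               for j, a in enumerate(acc)]
        m = m_new
    return [a / l for a in acc]


# Small hand-made inputs (hypothetical data); 5 keys with tile=2 leaves a ragged last tile.
q = [0.5, -1.0, 2.0]
K = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0], [-1.0, 2.0, 0.5]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
reference = naive_attention(q, K, V)
streamed = tiled_attention(q, K, V, tile=2)
```

Because the accumulator is rescaled by `alpha` whenever the running maximum changes, `streamed` agrees with `reference` up to floating-point rounding regardless of the tile size. A GPU kernel would apply the same recurrence to blocks of queries with matrix products in place of the inner Python loops.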