Registers Matter for Pixel-Space Diffusion Transformers

arXiv:2605.16147v1 Announce Type: new

Vision Transformers (ViTs) are known to exhibit high-norm patch-token outliers that degrade the quality of their feature maps, a problem that register tokens effectively mitigate. As diffusion models increasingly adopt transformer architectures and move toward pixel-space