A Geometric Perspective on Robustness in Vision Transformers [R]

Hi everyone! I'm sharing a paper I've been working on that investigates how different positional encoding schemes (learned absolute, sinusoidal, and rotary) shape the internal representations of Vision Transformers, and how these representations relate to robustness under distributional shift. Paper PDF: Positional embeddings (PEs) in Vision Transformers (ViTs) are known to impact performance and robustness, but their role in shaping internal spatial representations is not well understood.