FAR: Function-preserving Attention Replacement for IMC-friendly Inference
arXiv CS.AI
arXiv:2505.21535v4 Announce Type: replace-cross

While transformers dominate modern vision and language models, their attention mechanism remains poorly suited to in-memory computing (IMC) devices because it relies on intensive activation-to-activation multiplications and non-local memory access, which lead to substantial latency and bandwidth overhead on ReRAM-based accelerators. To address this mismatch, we propose FAR, a Function-preserving Attention Replacement framework that replaces every attention module in pretrained DeiTs with sequential modules that are inherently compatible with IMC dataflows.
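
To make the activation-to-activation point concrete, here is a minimal pure-Python sketch of single-head self-attention. It is an illustration under stated assumptions, not code from the paper: the function names (`matmul`, `softmax_rows`, `self_attention`), the identity weight matrices, and the toy inputs are all hypothetical. The three projections multiply the input by fixed weights, which a weight-stationary array could in principle keep in place, whereas the score and output products multiply two input-dependent matrices, so one operand changes with every input. The sketch shows only the problem the abstract describes; it does not attempt to reproduce FAR's replacement modules.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    """Apply a numerically stable softmax to each row."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention; also returns K^T to expose the dynamic operand."""
    # Activation x static weight: the right-hand operand never changes.
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    # Activation x activation: both operands depend on the current input.
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, k_t)]
    # Activation x activation again: attention weights times values.
    return matmul(softmax_rows(scores), v), k_t

# Hypothetical toy setup: identity projections and two different inputs.
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
x_a = [[1.0, 0.0], [0.0, 1.0]]
x_b = [[2.0, 1.0], [0.0, 3.0]]

out_a, kt_a = self_attention(x_a, IDENTITY, IDENTITY, IDENTITY)
out_b, kt_b = self_attention(x_b, IDENTITY, IDENTITY, IDENTITY)

# The projection weights are shared, but the K^T operand differs per input.
print("K^T changes with the input:", kt_a != kt_b)
```

In this toy run the weight matrices are identical for both inputs, while `kt_a` and `kt_b` differ, which is the operand that would need to be rewritten for each input on a weight-stationary device.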