Flexible and Efficient Spatio-Temporal Transformer for Sequential Visual Place Recognition

arXiv:2510.04282v2

Sequential Visual Place Recognition (Seq-VPR) leverages transformers to capture spatio-temporal features effectively. In practice, a transformer-based Seq-VPR model should be flexible with respect to the number of frames per sequence (seq-length), deliver fast inference, and have low memory usage to meet real-time constraints. However, existing approaches prioritize performance at the expense of flexibility and efficiency.