Visual Words Meet BM25: Sparse Auto-Encoder Visual Word Scoring for Image Retrieval

arXiv:2603.05781v1 Announce Type: cross Dense image retrieval is accurate but offers limited interpretability and attribution, and it can be compute-intensive at scale. We present BM25-V, which applies Okapi BM25 scoring to sparse visual-word activations from a Sparse Auto-Encoder (SAE) on Vision Transformer patch features.
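
To make the scoring idea concrete, the following is a minimal sketch of Okapi BM25 applied to sparse visual-word activations, not the authors' implementation. It assumes that each image has already been reduced to a non-negative activation vector over the SAE dictionary (for example, by pooling patch-level activations), that activations play the role of term frequencies, and that the summed activation of an image serves as its document length. The function name `bm25_scores`, the defaults `k1=1.2` and `b=0.75`, and the smoothed IDF variant are illustrative assumptions; the abstract does not specify these details.

```python
import numpy as np


def bm25_scores(query_tf, doc_tf, k1=1.2, b=0.75):
    """Score each indexed image against a query image with Okapi BM25.

    query_tf: (V,) non-negative visual-word activations of the query image.
    doc_tf:   (N, V) non-negative visual-word activations of N indexed images.
    Returns an (N,) array of scores; higher means a better match.
    """
    query_tf = np.asarray(query_tf, dtype=float)
    doc_tf = np.asarray(doc_tf, dtype=float)
    n_docs = doc_tf.shape[0]

    # Document frequency: number of images in which each visual word fires.
    df = np.count_nonzero(doc_tf > 0, axis=0)
    # Smoothed IDF variant that is never negative (an assumption of this sketch).
    idf = np.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))

    # Assumed "document length": total activation mass of each image.
    doc_len = doc_tf.sum(axis=1, keepdims=True)
    avg_len = max(float(doc_len.mean()), 1e-12)  # guard against an empty index
    norm = k1 * (1.0 - b + b * doc_len / avg_len)

    # Saturating term-frequency component with length normalization.
    tf_part = doc_tf * (k1 + 1.0) / (doc_tf + norm)

    # Only visual words that are active in the query contribute to the score.
    active = query_tf > 0
    return (tf_part[:, active] * idf[active]).sum(axis=1)
```

Because each score is a sum over the visual words shared by the query and the image, the per-word terms `tf_part[i, j] * idf[j]` can be inspected directly to see which visual words drove a match. This is one way to read the attribution benefit the abstract mentions.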