HeadRank: Decoding-Free Passage Reranking via Preference-Aligned Attention Heads

arXiv:2604.17237v1 Announce Type: cross Decoding-free reranking methods that read relevance signals directly from LLM attention weights offer significant latency advantages over autoregressive approaches, yet they suffer from attention score homogenization: middle-context documents receive near-identical scores, which destroys the fine-grained distinctions that ranking requires.
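To make the homogenization problem concrete, here is a minimal sketch of a generic decoding-free scorer. It is not the HeadRank method. It assumes that a per-token attention vector from the query to the context is already available (averaged over some chosen heads and query tokens) and that each document is scored by the mean attention its tokens receive. The helper names (`span_attention_scores`, `rank_documents`, `score_spread`) and the synthetic U-shaped attention values are hypothetical and chosen only for illustration. Head selection and preference alignment are not modeled.

```python
from typing import List, Sequence, Tuple


def span_attention_scores(attn: Sequence[float],
                          spans: Sequence[Tuple[int, int]]) -> List[float]:
    """Score each document by the mean attention its tokens receive.

    Assumption: attn[i] is the attention the query places on context token i,
    already averaged over the chosen heads and query tokens.
    spans[k] = (start, end) is the half-open token range of document k.
    """
    scores = []
    for start, end in spans:
        if end <= start:
            raise ValueError("empty document span")
        scores.append(sum(attn[start:end]) / (end - start))
    return scores


def rank_documents(scores: Sequence[float]) -> List[int]:
    # Document indices ordered by descending score; Python's sort is stable,
    # so tied documents keep their original context order.
    return sorted(range(len(scores)), key=lambda k: -scores[k])


def score_spread(scores: Sequence[float]) -> float:
    # Crude homogenization signal: a max-min gap near zero means the
    # documents are indistinguishable to the scorer.
    return max(scores) - min(scores)


if __name__ == "__main__":
    # Synthetic attention over 5 documents of 2 tokens each: the first and
    # last documents draw most of the mass, the middle three get a flat 0.05.
    attn = [0.20, 0.18, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.16, 0.16]
    spans = [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10)]
    scores = span_attention_scores(attn, spans)
    print(rank_documents(scores))        # [0, 4, 1, 2, 3]
    print(score_spread(scores[1:4]))     # middle documents: no spread at all
```

In this toy input the ordering among documents 1 to 3 is decided purely by the tie-breaking rule, not by relevance. That is the failure mode the abstract calls homogenization.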