AI RESEARCH

Hydra: Unifying Document Retrieval and Generation in a Single Vision-Language Model

arXiv CS.AI

arXiv:2603.28554v1 Announce Type: cross Visual document understanding typically requires separate retrieval and generation models, doubling memory and system complexity. We present Hydra, a dual-head approach that provides both ColBERT-style late-interaction retrieval and autoregressive generation from a single vision-language model
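For readers unfamiliar with the retrieval side, the following is a minimal sketch of ColBERT-style late-interaction (MaxSim) scoring in plain Python. It is not Hydra's implementation: the function names `normalize`, `maxsim_score`, and `rank_documents` and the toy two-dimensional token embeddings are assumptions made for illustration. In a real system the per-token embeddings would come from the model's retrieval head.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction score: for each query token embedding, take the
    maximum cosine similarity over all document token embeddings, then
    sum those maxima over the query tokens."""
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in doc_tokens]
    total = 0.0
    for qv in q:
        total += max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
    return total


def rank_documents(query_tokens, docs):
    """Return document names sorted by descending MaxSim score.
    `docs` is a hypothetical mapping: name -> list of token embeddings."""
    scores = {name: maxsim_score(query_tokens, toks) for name, toks in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings (illustrative values only, not model outputs).
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # matches both query tokens exactly
    "doc_b": [[1.0, 1.0]],                          # partial match to each query token
}
ranking = rank_documents(query, docs)
```

Because each query token is matched against document tokens independently, document embeddings can be computed and stored ahead of time, and only this lightweight scoring step runs at query time.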