ORION: ORthonormal Text Encoding for Universal VLM AdaptatION

arXiv:2602.19530v2

Vision-language models (VLMs) have demonstrated remarkable generalization across diverse tasks, yet their performance remains constrained by the quality and geometry of the textual prototypes used to represent classes. Standard zero-shot classifiers, derived from frozen text encoders and handcrafted prompts, may yield correlated or weakly separated embeddings that limit task-specific discriminability. We
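The point about correlated class prototypes can be made concrete with a small sketch of a generic zero-shot classifier. An image embedding is assigned to the class whose text prototype has the highest cosine similarity, and prototype correlation is measured as the largest pairwise |cosine| between prototypes. The vectors below are made-up toy values, not real text-encoder outputs. The `gram_schmidt` helper is a textbook (modified) Gram-Schmidt orthonormalization, included only to illustrate what "orthonormal prototypes" means geometrically. It is an assumption for illustration and is not ORION's procedure, which the truncated abstract does not describe. All function names here are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale v to unit L2 norm; raises on the zero vector."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return dot(normalize(a), normalize(b))


def zero_shot_predict(image_emb: Sequence[float], prototypes: List[Vector]) -> int:
    """Index of the class prototype with the highest cosine similarity to the image."""
    scores = [cosine(image_emb, p) for p in prototypes]
    return max(range(len(scores)), key=scores.__getitem__)


def max_offdiag_similarity(prototypes: List[Vector]) -> float:
    """Largest |cosine| between two distinct prototypes (0.0 means mutually orthogonal)."""
    k = len(prototypes)
    return max(
        abs(cosine(prototypes[i], prototypes[j]))
        for i in range(k)
        for j in range(i + 1, k)
    )


def gram_schmidt(prototypes: List[Vector]) -> List[Vector]:
    """Modified Gram-Schmidt: orthonormalize prototypes in the given order.

    Illustrative only (not ORION's method). Assumes the prototypes are
    linearly independent; otherwise normalize() raises on a zero residual.
    """
    basis: List[Vector] = []
    for p in prototypes:
        w = list(p)
        for u in basis:
            c = dot(w, u)  # component of the current residual along u
            w = [wi - c * ui for wi, ui in zip(w, u)]
        basis.append(normalize(w))
    return basis


if __name__ == "__main__":
    # Toy prototypes: classes 0 and 1 are nearly collinear (weakly separated).
    protos = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(round(max_offdiag_similarity(protos), 4))  # 0.9945
    ortho = gram_schmidt(protos)
    print(max_offdiag_similarity(ortho) < 1e-9)  # True
    print(zero_shot_predict([0.1, 0.0, 1.0], protos))  # 2
```

Gram-Schmidt is order-dependent: the first prototype keeps its direction, while later ones lose whatever they share with earlier ones. Orthogonality alone therefore says nothing about whether the resulting class directions still match the visual features, which is why this sketch only illustrates the geometry and does not stand in for a task-aware adaptation method.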