GraSP-VL: Length as a Semantic Granularity Interface for Vision-Language Representations

ArXi:2605.17727v1 Announce Type: new Frozen vision-language embeddings contain signals at multiple semantic resolutions, from object identity to attributes, relations, and full-caption meaning, but they expose these signals through a fixed-length vector interface. We study whether embedding length can be turned into a controllable semantic access interface. We propose \textbf{GraSP-VL}, which learns a shared near-orthogonal prefix transform over frozen VLM embeddings.