Parallel In-context Learning for Large Vision Language Models

arXiv:2603.16092v1 Announce Type: cross Large vision-language models (LVLMs) employ multi-modal in-context learning (MM-ICL) to adapt to new tasks by leveraging demonstration examples. While increasing the number of demonstrations boosts performance, it also incurs significant inference latency, because the computational cost of Transformer attention grows quadratically with context length. To address this trade-off, we propose Parallel In-Context Learning (Parallel-ICL), a plug-and-play inference algorithm. Parallel-ICL partitions the long demonstration context into multiple shorter, manageable chunks.
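
To see why partitioning can help, the following minimal sketch splits a list of demonstrations into contiguous chunks and compares a simple quadratic attention-cost proxy for the full context against the summed cost of the chunks. It is an illustration under stated assumptions, not the Parallel-ICL implementation: the function names, the equal-size contiguous split, the 100-tokens-per-demonstration figure, and the cost proxy (context length squared, ignoring query tokens and constant factors) are all hypothetical, and the abstract does not specify here how the chunk outputs are combined.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition_demonstrations(demos: Sequence[T], num_chunks: int) -> List[List[T]]:
    """Split demos into num_chunks contiguous chunks of near-equal size.

    Hypothetical helper: the actual chunking strategy may differ.
    """
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    base, extra = divmod(len(demos), num_chunks)
    chunks: List[List[T]] = []
    start = 0
    for i in range(num_chunks):
        # The first `extra` chunks each take one additional demonstration.
        size = base + (1 if i < extra else 0)
        chunks.append(list(demos[start:start + size]))
        start += size
    return chunks


def attention_cost(context_length: int) -> int:
    """Proxy for self-attention cost: quadratic in context length."""
    return context_length ** 2


def chunked_attention_cost(chunk_lengths: Sequence[int]) -> int:
    """Total proxy cost when each chunk is attended to independently."""
    return sum(attention_cost(n) for n in chunk_lengths)


if __name__ == "__main__":
    tokens_per_demo = 100  # assumed value, for illustration only
    demos = list(range(8))  # stand-ins for 8 image-text demonstrations
    chunks = partition_demonstrations(demos, num_chunks=4)

    full = attention_cost(len(demos) * tokens_per_demo)
    split = chunked_attention_cost([len(c) * tokens_per_demo for c in chunks])
    print(full)   # 640000
    print(split)  # 160000
```

Under this proxy, splitting a context of length n into k equal chunks reduces the total attention cost from n² to k·(n/k)² = n²/k, and the chunks can in principle be processed in parallel.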