Token-Efficient Multimodal Reasoning via Image Prompt Packaging

arXiv:2604.02492v1 Announce Type: cross Deploying large multimodal language models at scale is constrained by token-based inference costs, yet the cost-performance behavior of visual prompting strategies remains poorly characterized. We