DetPO: In-Context Learning with Multi-Modal LLMs for Few-Shot Object Detection

arXiv:2603.23455v1 Announce Type: new Multi-Modal LLMs (MLLMs) demonstrate strong visual grounding capabilities on popular object detection benchmarks like OdinW-13 and RefCOCO. However, state-of-the-art models still struggle to generalize to out-of-distribution classes, tasks and imaging modalities not typically found in their pre-