Multi-label Instance-level Generalised Visual Grounding in Agriculture

arXiv:2603.06699v1 Announce Type: new Understanding field imagery, for example detecting plants and distinguishing individual crop and weed instances, is a central challenge in precision agriculture. Despite progress in vision-language tasks such as captioning and visual question answering, Visual Grounding (VG), the task of localising objects referred to by a natural-language expression, remains unexplored in agriculture.