FOMO-3D: Using Vision Foundation Models for Long-Tailed 3D Object Detection

arXiv:2603.08611v1 Announce Type: new In order to navigate complex traffic environments, self-driving vehicles must recognize many semantic classes pertaining to vulnerable road users or traffic control devices. However, many safety-critical objects (e.g., construction worker) appear infrequently in nominal traffic conditions, leading to a severe shortage of