Uncertainty Quantification in Detection Transformers: Object-Level Calibration and Image-Level Reliability

arXiv:2412.01782v4 DETR and its variants have emerged as promising architectures for object detection, offering an end-to-end prediction pipeline. In practice, however, DETRs generate hundreds of predictions per image, far more than the number of objects actually present. This raises a critical question: which of these predictions can be trusted? The question is particularly important in safety-critical applications such as autonomous driving.