ViCrop-Det: Spatial Attention Entropy Guided Cropping for Training-Free Small-Object Detection

arXiv:2604.26806v1 Announce Type: cross

Transformer-based architectures have established a dominant paradigm in global semantic perception; however, they remain fundamentally constrained by the profound spatial heterogeneity inherent in natural images. Specifically, the imposition of a uniform global receptive field across regions of varying information density inevitably leads to local feature degradation, particularly in dense conflict zones populated by microscopic targets. To address this mechanistic limitation, we propose ViCrop-Det, a.