Enhancing Fine-Grained Spatial Grounding in 3D CT Report Generation via Discriminative Guidance

arXiv:2604.10437v1 Announce Type: new Vision-language models (VLMs) for radiology report generation (RRG) can produce long-form chest CT reports from volumetric scans and show strong potential to improve radiology workflow efficiency and consistency. However, existing methods face two key limitations: (i)