Integration of Object Detection and Small VLMs for Construction Safety Hazard Identification

arXiv:2604.05210v1 Announce Type: new Accurate and timely identification of construction hazards around workers is essential for preventing workplace accidents. While large vision-language models (VLMs) demonstrate strong contextual reasoning capabilities, their high computational requirements limit their applicability in near real-time construction hazard detection. In contrast, small vision-language models (sVLMs) with fewer than 4B parameters offer improved efficiency but often suffer from reduced accuracy and hallucination when analyzing complex construction scenes.