Vision-Language Foundation Models for Comprehensive Automated Pavement Condition Assessment

arXiv:2604.08212v1 Announce Type: new

General-purpose vision-language models demonstrate strong performance in everyday domains but struggle with specialized technical fields requiring precise terminology, structured reasoning, and adherence to engineering standards. This work investigates whether domain-specific instruction tuning can enable comprehensive pavement condition assessment through vision-language models. PaveInstruct, a dataset containing 278,889 image-instruction-response pairs spanning 32 task types, was created by unifying annotations from nine heterogeneous pavement datasets.