Leveraging Multimodal LLMs for Built Environment and Housing Attribute Assessment from Street-View Imagery

arXiv:2604.21102v1

We present a novel framework for automatically evaluating building conditions nationwide in the United States by leveraging large language models (LLMs) and Google Street View (GSV) imagery. By fine-tuning Gemma 3 27B on a modest human-labeled dataset, our approach achieves strong alignment with human mean opinion scores (MOS): measured against the MOS benchmark, it outperforms even individual raters on both the Spearman rank correlation coefficient (SRCC) and the Pearson linear correlation coefficient (PLCC).
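To make the evaluation metrics concrete, here is a minimal sketch of how MOS, SRCC, and PLCC are commonly computed. It is not the authors' code: the rating matrix, the model scores, and the helper names (`pearson`, `average_ranks`, `spearman`) are hypothetical, and the abstract does not say whether a rater's own scores are excluded from the MOS when that rater is evaluated.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC) of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: rows are human raters, columns are images.
ratings = [
    [1, 2, 4, 5, 3],
    [2, 2, 5, 4, 3],
    [1, 3, 4, 5, 2],
]

# MOS: the mean rating each image received across raters.
mos = [mean(column) for column in zip(*ratings)]

# Hypothetical model predictions for the same five images.
model_scores = [1.2, 2.4, 4.1, 4.8, 2.9]

model_srcc = spearman(model_scores, mos)
model_plcc = pearson(model_scores, mos)

# The same metrics for one individual rater, for comparison with the model.
rater_srcc = spearman(ratings[1], mos)
rater_plcc = pearson(ratings[1], mos)
```

SRCC depends only on how the scores order the images, while PLCC also rewards a linear match in magnitude, which is why the two are usually reported together.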