Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty?

arXiv:2605.12684v1 Announce Type: cross

Multimodal large language models (MLLMs) are now routinely deployed for visual understanding, generation, and curation. Many of these applications require an explicit aesthetic judgment. Most existing solutions reduce this judgment to predicting a scalar score for a single image.