AI RESEARCH
Zero-Effort Image-to-Music Generation: An Interpretable RAG-based VLM Approach
arXiv CS.AI
arXiv:2509.22378v2. Announce Type: replace-cross.

Image-to-Music (I2M) generation has recently attracted significant attention, with potential applications in gaming, advertising, and multi-modal art creation. However, because I2M tasks are ambiguous and subjective, most end-to-end methods lack interpretability, leaving users puzzled by the generated results. Even methods based on emotion mapping are contested, since emotion is only one aspect of art.