AI RESEARCH

DT2IT-MRM: Debiased Preference Construction and Iterative Training for Multimodal Reward Modeling

arXiv CS.AI

ArXi:2604.19544v1 Announce Type: new Multimodal reward models (MRMs) play a crucial role in aligning Multimodal Large Language Models (MLLMs) with human preferences