AI RESEARCH
DT2IT-MRM: Debiased Preference Construction and Iterative Training for Multimodal Reward Modeling
arXiv cs.AI
arXiv:2604.19544v1 Announce Type: new
Multimodal reward models (MRMs) play a crucial role in aligning Multimodal Large Language Models (MLLMs) with human preferences