Robust DPO with Stochastic Negatives Improves Multimodal Sequential Recommendations

New research introduces RoDPO, a method that improves recommendation ranking by stochastically sampling negatives from a dynamic candidate pool during Direct Preference Optimization (DPO) training. This addresses the false negative problem in implicit feedback, achieving NDCG@5 gains of up to 5.25% on Amazon benchmarks.

What Happened

A new research paper titled "Aligning Multimodal Sequential Recommendations via Robust Direct Preference Optimization with Sparse MoE" was posted to arXiv on March 31, 2026...
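The core idea of pairing DPO with stochastically sampled negatives can be illustrated in a few lines. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation. It assumes a standard DPO objective over per-item scores, with log-probabilities taken from a softmax over a small catalog and a frozen reference model. It also assumes the "dynamic candidate pool" is simply the top-`pool_size` non-target items under the current policy, recomputed at each step, from which one negative is drawn uniformly at random. The names `log_softmax`, `sample_negative`, and `dpo_loss`, along with the `beta` value, are hypothetical.

```python
import math
import random


def log_softmax(scores):
    """Numerically stable log-softmax over a list of item scores."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def sample_negative(policy_scores, positive, pool_size, rng):
    """Hypothetical negative sampler (assumption, not the paper's exact rule).

    Builds a pool from the `pool_size` highest-scoring non-target items
    under the *current* policy, then draws one uniformly at random
    instead of always taking the single hardest item.
    """
    candidates = [i for i in range(len(policy_scores)) if i != positive]
    candidates.sort(key=lambda i: policy_scores[i], reverse=True)
    pool = candidates[:pool_size]
    if not pool:
        raise ValueError("candidate pool is empty")
    return rng.choice(pool)


def dpo_loss(policy_scores, ref_scores, positive, negative, beta=0.1):
    """Standard DPO loss for one (preferred, dispreferred) item pair.

    loss = -log sigmoid(beta * [(log pi(y_w) - log ref(y_w))
                                - (log pi(y_l) - log ref(y_l))])
    """
    pl = log_softmax(policy_scores)
    rl = log_softmax(ref_scores)
    margin = beta * ((pl[positive] - rl[positive])
                     - (pl[negative] - rl[negative]))
    # -log sigmoid(margin) == softplus(-margin), computed without overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    rng = random.Random(0)
    ref = [1.0, 2.0, 0.5, 0.0, 1.5]      # frozen reference model scores
    policy = [1.2, 1.9, 0.4, 0.1, 1.6]   # current policy scores
    target = 0                            # the item the user interacted with
    neg = sample_negative(policy, target, pool_size=3, rng=rng)
    print("sampled negative:", neg, "loss:", dpo_loss(policy, ref, target, neg))
```

The design intuition this sketch is meant to capture: if training always picked the single highest-scoring non-target item as the negative, an item the user would actually like (a false negative in implicit feedback) could be pushed down on every step. Drawing uniformly from a pool spreads that pressure across several plausible candidates. How RoDPO actually builds and refreshes its pool is not specified here and may differ.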