TARS: MinMax Token-Adaptive Preference Strategy for Hallucination Reduction in MLLMs

arXiv:2507.21584v4

Multimodal large language models (MLLMs) are prone to hallucinations: they generate plausible but visually ungrounded outputs. This happens partly because direct preference optimization (DPO), trained under static preference supervision, overfits to superficial linguistic cues. We propose TARS, a token-adaptive preference strategy that reformulates DPO as a principled min-max optimization problem.
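For reference, the static objective that TARS reformulates is the standard DPO loss. For one preference pair, it is the negative log-sigmoid of a scaled margin. The margin measures how much more the policy, relative to a frozen reference model, prefers the chosen response over the rejected one. The sketch below shows only that standard DPO loss. It is not TARS's token-adaptive min-max formulation, which this abstract does not specify. It assumes summed per-response log-probabilities are already available. The function name `dpo_loss` and the default `beta=0.1` are illustrative choices, not values taken from the paper.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard (static) DPO loss for a single preference pair.

    Illustrative sketch only: this is plain DPO, not TARS. Each argument is
    the summed log-probability of a whole response (chosen = preferred,
    rejected = dispreferred) under the trainable policy or the frozen
    reference model.
    """
    # Implicit reward margin: how much more the policy, relative to the
    # reference, favors the chosen response over the rejected one.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp)
    z = beta * margin
    # -log(sigmoid(z)) == softplus(-z), computed in a numerically stable way.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


if __name__ == "__main__":
    # No preference margin: the loss equals log(2).
    print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
    # Policy favors the chosen response more than the reference does: lower loss.
    print(dpo_loss(-10.0, -12.0, -11.0, -11.0, beta=0.1))
```

Because this loss depends only on sequence-level log-probability differences for a fixed set of pairs, it can be lowered by exploiting surface-level textual regularities. This is the static-supervision weakness the abstract attributes to DPO.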