Towards Robust Endogenous Reasoning: Unifying Drift Adaptation in Non-Stationary Tuning

arXiv:2604.15705v1

Reinforcement Fine-Tuning (RFT) has established itself as a critical paradigm for aligning Multi-modal Large Language Models (MLLMs) with complex human values and domain-specific requirements. Nevertheless, current research primarily focuses on mitigating exogenous distribution shifts arising from data-centric factors, while the non-stationarity inherent in endogenous reasoning remains largely unexplored.