DFPO: Scaling Value Modeling via Distributional Flow towards Robust and Generalizable LLM Post-Training

arXiv cs.LG

arXiv:2602.05890v2 Announce Type: replace