Hard Negative Sample-Augmented DPO Post-Training for Small Language Models

arXiv CS.LG

arXiv:2512.19728v2 Announce Type: replace

Large language models (LLMs) continue to struggle with mathematical reasoning, and common post-