AI RESEARCH

Hard Negative Sample-Augmented DPO Post-Training for Small Language Models

arXiv cs.LG • April 15, 2026

arXiv:2512.19728v2 Announce Type: replace

Large language models (LLMs) continue to struggle with mathematical reasoning, and common post-