Hard Negative Sample-Augmented DPO Post-Training for Small Language Models
arXiv cs.LG
arXiv:2512.19728v2 Announce Type: replace
Large language models (LLMs) continue to struggle with mathematical reasoning, and common post-