AI RESEARCH
A Model Can Help Itself: Reward-Free Self-Training for LLM Reasoning
arXiv CS.AI
arXiv:2510.18814v2 Announce Type: replace-cross
Can language models improve their reasoning performance without external rewards, using only their own sampled responses for