AI RESEARCH

AMARIS: A Memory-Augmented Rubric Improvement System for Rubric-Based Reinforcement Learning

arXiv cs.LG

arXiv:2605.18592v1 (Announce Type: new)

Rubric-based reward shaping is an effective method for fine-tuning large language models (LLMs) with reinforcement learning (RL). Structured rubrics decompose a standard outcome reward into multiple dimensions, which provides a richer reward signal. Recent works make the rubrics adaptive using local signals, such as the rollouts from the current training step or pairwise comparisons. However, these methods discard the diagnostics produced during evaluation immediately after use, which prevents evaluation knowledge from accumulating over the long term and being strategically reused.
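To make "decomposing an outcome reward into multiple dimensions" concrete, here is a minimal sketch of rubric-based reward aggregation. It assumes each rubric dimension is scored in [0, 1] and the scores are combined by a normalized weighted average. The `RubricItem` class, the `rubric_reward` helper, the dimension names, and the weights are all hypothetical illustrations, not the AMARIS method or any specific library's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RubricItem:
    name: str      # hypothetical rubric dimension, e.g. "correctness"
    weight: float  # assumed non-negative importance weight


def rubric_reward(scores: dict[str, float], rubric: list[RubricItem]) -> float:
    """Combine per-dimension scores in [0, 1] into one scalar reward.

    Uses a normalized weighted average, which is an assumed aggregation
    rule chosen for illustration only.
    """
    total_weight = sum(item.weight for item in rubric)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    weighted = sum(item.weight * scores[item.name] for item in rubric)
    return weighted / total_weight


# Hypothetical three-dimension rubric for grading one rollout.
rubric = [
    RubricItem("correctness", 0.5),
    RubricItem("reasoning", 0.25),
    RubricItem("clarity", 0.25),
]
scores = {"correctness": 1.0, "reasoning": 0.5, "clarity": 0.0}
print(rubric_reward(scores, rubric))  # 0.625
```

Unlike a binary pass/fail outcome reward, the per-dimension scores show *why* a rollout earned partial credit. Those diagnostics are the kind of evaluation information that the adaptive methods described above use once and then discard.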