AI RESEARCH
RewardHarness: Self-Evolving Agentic Post-Training
arXiv CS.AI
arXiv:2605.08703v1 Announce Type: new
Evaluating instruction-guided image edits requires rewards that reflect subtle human preferences, yet current reward models typically depend on large-scale preference annotation and additional model