RewardHarness: Self-Evolving Agentic Post-Training

arXiv:2605.08703v1 Announce Type: new Evaluating instruction-guided image edits requires rewards that reflect subtle human preferences, yet current reward models typically depend on large-scale preference annotation and additional model