Explanation Quality Assessment as Ranking with Listwise Rewards

arXiv:2604.24176v1

We reformulate explanation quality assessment as a ranking problem rather than a generation problem. Instead of optimizing models to produce a single "best" explanation token by token, we train reward models to discriminate among multiple candidate explanations and to learn their relative quality.
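To illustrate what a listwise reward objective looks like, here is a minimal sketch. The abstract does not say which listwise loss is used, so this code swaps in the ListNet top-one cross-entropy as a stand-in. The reward model scores every candidate explanation for the same input. Those scores are then pushed toward a distribution induced by graded quality labels. The names `listnet_loss`, `rank_candidates`, and the example labels are hypothetical and are not the authors' code.

```python
import math
from typing import List, Sequence


def log_softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of scores."""
    m = max(xs)
    log_z = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_z for x in xs]


def listnet_loss(scores: Sequence[float], labels: Sequence[float]) -> float:
    """Hypothetical ListNet-style listwise loss for ONE group of candidates.

    scores: reward-model outputs, one per candidate explanation.
    labels: graded quality judgments for the same candidates (higher = better).
    Returns the cross-entropy between the label-induced and score-induced
    top-one distributions. Because softmax is shift-invariant, only the
    relative scores within the group matter.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal length")
    target = [math.exp(lp) for lp in log_softmax(labels)]
    pred_log = log_softmax(scores)
    return -sum(t * lp for t, lp in zip(target, pred_log))


def rank_candidates(scores: Sequence[float]) -> List[int]:
    """Candidate indices ordered from highest to lowest reward."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Usage: three candidate explanations with assumed quality labels.
labels = [3.0, 1.0, 2.0]
agreeing_scores = [2.0, -1.0, 0.5]     # orders candidates like the labels
disagreeing_scores = [-1.0, 2.0, 0.5]  # prefers the worst candidate
```

In this example, `listnet_loss(agreeing_scores, labels)` comes out lower than `listnet_loss(disagreeing_scores, labels)`, and `rank_candidates(agreeing_scores)` returns `[0, 2, 1]`. The loss compares the whole candidate list at once. It does not score each explanation in isolation, and that is the sense in which the reward is learned from relative rather than absolute quality.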