From Coarse to Fine: Benchmarking and Reward Modeling for Writing-Centric Generation Tasks

arXiv:2604.27453v1

Large language models have achieved remarkable progress in text generation but still struggle with generative writing tasks. In terms of evaluation, existing benchmarks assess writing reward models only at a coarse level and do not measure their performance against specific writing requirements. In terms of