EvolvR: Self-Evolving Pairwise Reasoning for Story Evaluation to Enhance Generation

arXiv:2508.06046v2

Although the effectiveness of Large Language Models (LLMs) as judges (LLM-as-a-judge) has been validated, their performance remains limited in open-ended tasks, particularly story evaluation. Accurate story evaluation is crucial not only for assisting human quality judgment but also for providing key signals to guide story generation.