
Autorubric: Unifying Rubric-based LLM Evaluation

arXiv cs.AI

arXiv:2603.00077v2 Announce Type: replace-cross

Techniques for reliable rubric-based LLM evaluation, such as ensemble judging, bias mitigation, and few-shot calibration, are scattered across papers that use inconsistent terminology and provide only partial implementations. We