AI RESEARCH
AdaRubric: Task-Adaptive Rubrics for LLM Agent Evaluation
arXiv cs.AI
arXiv:2603.21362v1. LLM-as-Judge evaluation fails on agent tasks because a fixed rubric cannot capture what matters for each task: code debugging demands Correctness and Error Handling, whereas web navigation demands Goal Alignment and Action Efficiency.
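To make the contrast concrete, here is a minimal sketch of selecting rubric dimensions per task instead of using one fixed list. The four dimension names come from the abstract. Everything else is an assumption made for illustration and is not taken from the paper: the task keys, the generic `FIXED_RUBRIC`, the helper names `rubric_for` and `aggregate`, and the plain averaging.

```python
from statistics import mean

# Dimension names are from the abstract; the task keys and the generic
# fallback rubric are hypothetical, chosen only for illustration.
TASK_RUBRICS = {
    "code_debugging": ["Correctness", "Error Handling"],
    "web_navigation": ["Goal Alignment", "Action Efficiency"],
}
FIXED_RUBRIC = ["Helpfulness", "Fluency"]  # assumed one-size-fits-all rubric


def rubric_for(task_type: str) -> list[str]:
    """Return the task-specific dimensions, else the fixed rubric."""
    return TASK_RUBRICS.get(task_type, FIXED_RUBRIC)


def aggregate(task_type: str, scores: dict[str, float]) -> float:
    """Average only the judge scores that belong to this task's rubric.

    Plain averaging is an assumption of this sketch, not the paper's method.
    """
    dims = rubric_for(task_type)
    missing = [d for d in dims if d not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    return mean(scores[d] for d in dims)
```

With this layout, a judge's score on an irrelevant dimension (say, Fluency for a web-navigation trajectory) is simply ignored by `aggregate`, which is the gap a single fixed rubric cannot close.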