Bias in the Loop: Auditing LLM-as-a-Judge for Software Engineering

arXiv CS.AI

arXiv:2604.16790v1 Announce Type: cross Large Language Models (LLMs) are increasingly used as judges to evaluate code artifacts when exhaustive human review or executable test coverage is unavailable. LLM-as-a-judge is also gaining relevance in agentic software engineering workflows, where it can help rank candidate solutions and guide patch selection.