Magis-Bench: Evaluating LLMs on Magistrate-Level Legal Tasks

arXiv:2605.08437v1 Announce Type: cross Existing benchmarks for legal AI focus primarily on tasks where LLMs must produce legal arguments or documents, yet the capacity to \emph{judge} such arguments -- weighing competing claims, applying doctrine to facts, and rendering reasoned decisions -- is arguably as fundamental to a well-functioning legal system as advocacy itself. We