AI RESEARCH

TriBench-Ko: Evaluating LLM Risks in Judicial Workflows

arXiv CS.CL

arXiv:2605.03792v1 Announce Type: new

Large language models (LLMs) are increasingly integrated into legal workflows. However, existing benchmarks primarily address proxy tasks, such as bar examination performance or classification, which fail to capture the performance and risks inherent in day-to-day judicial processes. To address this gap, we publicly release TriBench-Ko, a Korean benchmark designed to evaluate potential deployment risks of LLMs within the context of verified judicial task requirements.