LLM as an (Opinionated) Judge

Image made by the author using ChatGPT

The evaluation problem

Building systems around large language models has become standard practice. Use cases span a wide spectrum, from narrow, well-defined tasks like classifying spam emails to open-ended ones like generating plain-language summaries of complex legal contracts or even surfacing the key issues buried within them. On the implementation side, the barriers have dropped considerably.