LLM as an (Opinionated) Judge

Towards AI

Image made by the author using ChatGPT

The evaluation problem

Building systems around large language models has become standard practice. Use cases span a wide spectrum, from narrow, well-defined tasks like classifying spam emails to open-ended ones like generating plain-language summaries of complex legal contracts or even surfacing the key issues buried within them. On the implementation side, the barriers have dropped considerably.