OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!

arXiv:2509.26495v3 Announce Type: replace Large Language Model (LLM) safety is one of the most pressing challenges for enabling wide-scale deployment. While most studies and global discussions focus on generic harms, such as models assisting users in harming themselves or others, enterprises face a fundamental concern: whether LLM-based agents are safe for their intended use case. To address this, we