AI RESEARCH
Beyond Content Safety: Real-Time Monitoring for Reasoning Vulnerabilities in Large Language Models
arXiv CS.AI
•
ArXi:2603.25412v1 Announce Type: new Large language models (LLMs) increasingly rely on explicit chain-of-thought (CoT) reasoning to solve complex tasks, yet the safety of the reasoning process itself remains largely unaddressed. Existing work on LLM safety focuses on content safety--detecting harmful, biased, or factually incorrect outputs -- and treats the reasoning chain as an opaque intermediate artifact.