AI RESEARCH
From Refusal to Recovery: A Control-Theoretic Approach to Generative AI Guardrails
arXiv CS.AI
arXiv:2510.13727v2 Announce Type: replace Generative AI systems are increasingly assisting and acting on behalf of end users in practical settings, from digital shopping assistants to next-generation autonomous cars. In this context, safety is no longer about blocking harmful content, but about preempting downstream hazards like financial or physical harm. Yet most AI guardrails continue to rely on output classification based on labeled datasets and human-specified criteria, making them brittle to new hazardous situations.