ReGA: Model-Based Safeguard for LLMs via Representation-Guided Abstraction

arXiv:2506.01770v2 Announce Type: replace-cross Large Language Models (LLMs) have achieved tremendous success in various tasks, yet concerns about their safety and security have emerged. In particular, they risk generating harmful content and are vulnerable to jailbreaking attacks, which leaves security issues in their deployment unaddressed. Among software engineering for artificial intelligence (SE4AI) techniques, model-based analysis has demonstrated notable potential for analyzing and monitoring machine learning models, particularly stateful deep neural networks.