RACC: Representation-Aware Coverage Criteria for LLM Safety Testing

arXiv:2602.02280v2

Large Language Models (LLMs) face severe safety risks from jailbreak attacks, yet current safety testing relies largely on static datasets and lacks systematic criteria for evaluating the quality and adequacy of a test suite. Coverage criteria have proven effective for smaller neural networks, but they are impractical for LLMs because of their computational overhead and because safety-critical signals are entangled with irrelevant neuron activations.