AI RESEARCH
Towards Realistic Guarantees: A Probabilistic Certificate for SmoothLLM
arXiv cs.LG
arXiv:2511.18721v3 Announce Type: replace
The SmoothLLM defense provides a certification guarantee against jailbreaking attacks, but it relies on a strict "k-unstable" assumption that rarely holds in practice. This strong assumption can limit the trustworthiness of the provided safety certificate. In this work, we address this limitation by