AI RESEARCH

Towards Realistic Guarantees: A Probabilistic Certificate for SmoothLLM

arXiv cs.LG

arXiv:2511.18721v3 Announce Type: replace

The SmoothLLM defense provides a certified guarantee against jailbreaking attacks, but the guarantee rests on a strict "k-unstable" assumption that rarely holds in practice. Because the assumption is so strong, the resulting safety certificate can be hard to trust. In this work, we address this limitation by