SRTJ: Self-Evolving Rule-Driven Training-Free LLM Jailbreaking

arXiv:2605.00974v1 Announce Type: cross LLMs are increasingly equipped with safety alignment mechanisms, yet recent studies demonstrate that they remain vulnerable to jailbreaking attacks that elicit harmful behaviors without explicit policy violations. While a growing body of work has explored automated jailbreak strategies, existing methods face several fundamental challenges, including the lack of systematic utilization of both successful and failed attack experiences, as well as the absence of principled mechanisms for composing and selecting reusable attack rules under diverse constraints.