Faster-GCG: Efficient Discrete Optimization Jailbreak Attacks against Aligned Large Language Models

arXiv:2410.15362v2

The safety of aligned Large Language Models (LLMs) has attracted significant attention, particularly with respect to jailbreak attacks, which attempt to bypass guardrails via adversarial prompts. Among existing approaches, the Greedy Coordinate Gradient (GCG) attack pioneered automated jailbreaking through discrete token optimization; however, its low sample efficiency limits its practical applicability.