Adaptive Prompt Embedding Optimization for LLM Jailbreaking

arXiv:2604.24983v1 Announce Type: new Existing white-box jailbreak attacks against aligned LLMs typically append discrete adversarial suffixes to the user prompt, an approach that visibly alters the prompt and operates in a combinatorial token space. Prior work has avoided directly optimizing the embeddings of the original prompt tokens, presumably because perturbing them risks destroying the prompt's semantic content.