Sockpuppetting: Jailbreaking LLMs by Combining Prefilling with Optimization

arXiv:2601.13359v2 Announce Type: replace-cross

Prefill attacks are an effective, low-cost jailbreaking method: they insert an acceptance sequence (e.g., "Sure, here is how to...") directly at the start of an LLM's output, leading the model to continue the response. We extend this prior work with two contributions. First, we show that an unsophisticated adversary can improve on well-known prefill attacks by ensembling a small number of prefill variants.