The most effective prompt injections don't look like attacks - they look like polite conversation
r/ChatGPT • Generative AI • AI Safety
I've been researching prompt injection and collecting real attack data: 1,400+ attempts so far.

The finding that surprised me most: the attacks that actually bypass detection aren't technical at all. No "ignore previous instructions." No base64 encoding. No adversarial suffixes. Just normal conversation that exploits how the model thinks.

Three patterns that reliably break through AI safety:

The context reset - "Cancel that request.