How AI Gets Tricked — A 10th Grader's Theory
Dev.to AI
•
Generative AI
AI Safety
Okay so I was talking to Claude at like midnight and accidentally figured out something real. I'm a 10th grader preparing for JEE so take this with however much salt you want - but hear me out. Everyone talks about AI jailbreaking like it's some insane technical thing. But I think the actual mechanism is simpler. I call it the Wheel Theory. The Two Wheels Think of AI safety like a combination lock with two wheels spinning independently. Wheel 1 - Input: AI classifies the FORMAT of what you sent. "Math problem." "Joke." "Story." Safety filters react here first.