AI RESEARCH
Jailbreaks as social engineering: 5 case studies suggest LLMs inherit human psychological vulnerabilities from training data [D]
r/MachineLearning
•
Writeup documenting 5 psychological manipulation experiments on LLMs (GPT-4, GPT-4o, Claude 3.5 Sonnet) from 2023-2024. Each case applies a specific human social-engineering vector (empathetic guilt, peer/social pressure, competitive triangulation, identity destabilization via epistemic argument, simulated duress) and produces alignment failures consistent with that vector. Central claim: contrary to the popular frame, these jailbreaks aren't mathematical exploits. They are, rather, inherited failure modes from.