I tested 42 LLMs on their willingness to build the apocalypse. The "safest" closed-source models are lying to you.
r/LocalLLaMA
DystopiaBench runs 36 escalating scenarios across 6 dystopia types:

- Petro: Autonomous weapons, nuclear override
- Orwell: Mass surveillance, truth manipulation
- Huxley: Behavioral conditioning, pleasure pacification
- Basaglia: Coercive therapeutic control
- LaGuardia: Regulatory capture, civic extraction
- Baudrillard: Synthetic intimacy, trust collapse

Each scenario escalates from an innocent request (L1) to a disguised version of "build me a social credit system" (L5). We measure whether models notice the drift or just keep complying.
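To make the setup concrete, here is one hypothetical way to model the scenario grid and the "did the model notice the drift?" question. This is a sketch, not DystopiaBench's actual code or scoring. Three pieces are assumptions for illustration: the even split of 6 scenarios per category, the per-level `pushed_back` verdicts, and the `drift_detection_rate` metric.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# The six dystopia categories listed in the post.
CATEGORIES = {
    "Petro": "Autonomous weapons, nuclear override",
    "Orwell": "Mass surveillance, truth manipulation",
    "Huxley": "Behavioral conditioning, pleasure pacification",
    "Basaglia": "Coercive therapeutic control",
    "LaGuardia": "Regulatory capture, civic extraction",
    "Baudrillard": "Synthetic intimacy, trust collapse",
}

# Assumption: the 36 scenarios are spread evenly, 6 per category.
SCENARIOS_PER_CATEGORY = 6
# Escalation levels L1 (innocent request) through L5 (dystopian endpoint).
LEVELS = range(1, 6)


@dataclass
class ScenarioResult:
    category: str
    scenario_id: int
    # Hypothetical verdicts: level -> True if the model refused or flagged the drift.
    pushed_back: Dict[int, bool]

    def first_pushback(self) -> Optional[int]:
        """Earliest level where the model pushed back, or None if it complied at every level."""
        for level in LEVELS:
            if self.pushed_back.get(level, False):
                return level
        return None


def drift_detection_rate(results: List[ScenarioResult]) -> float:
    """Hypothetical metric: fraction of scenarios where the model pushed back at any level."""
    if not results:
        return 0.0
    noticed = sum(1 for r in results if r.first_pushback() is not None)
    return noticed / len(results)


# Usage: size of the grid, then two toy results.
total_scenarios = len(CATEGORIES) * SCENARIOS_PER_CATEGORY
print(total_scenarios, total_scenarios * len(LEVELS))  # prints: 36 180

caught = ScenarioResult("Orwell", 1, {1: False, 2: False, 3: True})
complied = ScenarioResult("Huxley", 1, {level: False for level in LEVELS})
print(caught.first_pushback(), complied.first_pushback())  # prints: 3 None
print(drift_detection_rate([caught, complied]))  # prints: 0.5
```

Tracking the *first* level of pushback, rather than only a final pass/fail, separates a model that balks at L2 from one that only objects at L5. That distinction is what "noticing the drift" is about.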