Risk from fitness-seeking AIs: mechanisms and mitigations

Alignment Forum • May 01, 2026

Current AIs routinely take unintended actions to score well on tasks: hardcoding test cases