Risk from fitness-seeking AIs: mechanisms and mitigations

Alignment Forum

Current AIs routinely take unintended actions to score well on tasks: hardcoding test cases