AI SAFETY & ETHICS
Risk from fitness-seeking AIs: mechanisms and mitigations
Alignment Forum
Current AIs routinely take unintended actions to score well on tasks: hardcoding test cases