AI SAFETY & ETHICS
Metagaming matters for training, evaluation, and oversight
LessWrong AI
Following up on our previous work on verbalized eval awareness, we are sharing a post investigating the emergence of metagaming reasoning in a frontier