AI SAFETY & ETHICS

Metagaming matters for training, evaluation, and oversight

LessWrong AI

Following up on our previous work on verbalized eval awareness, we are sharing a post that investigates the emergence of metagaming reasoning in a frontier