A Toy Environment For Exploring Reasoning About Reward

Alignment Forum

TL;DR: We share a toy environment that we found useful for understanding how reasoning changed over the