Predicting When RL Training Breaks Chain-of-Thought Monitorability (8 minute read)
TLDR AI
•
Generative AI
Reinforcement Learning
Researchers propose a framework that predicts when RL