Claude beat human researchers on an alignment task, and then the results vanished in production
The Decoder
•
Generative AI
In a controlled experiment, nine autonomous Claude instances dramatically outperformed human researchers on an open alignment problem. But when Anthropic tried to transfer the winning method to its own production models, the effect vanished.