Letting Claude do Autonomous Research to Improve SAEs

LessWrong AI

This work was done as part of MATS 7.1. I pointed Claude at our new synthetic Sparse Autoencoder (SAE) benchmark, told it to improve SAE performance, and left it running overnight. By morning, it had boosted the F1 score from 0.88 to 0.95. Within another day, with occasional input from me, it had matched the logistic regression probe ceiling of 0.97, a score I honestly hadn't thought was possible for an SAE on this benchmark.