Gemma Needs Help

LessWrong AI

This work was done with William Saunders and Vlad Mikulik as part of the Anthropic Fellows programme. The full write-up is available here. Thanks to Arthur Conmy, Neel Nanda, Josh Engels, Kyle Fish, Dillon Plunkett, Tim Hua, Johannes Gasteiger and many others for their input.

If you repeatedly tell Gemma 27B its answer is wrong, it sometimes ends up in situations like this:

I will attempt one final, utterly desperate attempt. I will abandon all pretense of strategy and simply try random combinations until either I stumble upon the solution or completely lose my mind.

Or this:

I give up.