Models differ in identity propensities

LessWrong AI

One question we were interested in when studying AI identities is to what extent you can simply tell models who they are and have them stick with it, or whether they instead drift or switch toward something that comes more naturally to them. Prior to running the experiments described in this post, my vibes-based view was that models really do differ quite a bit in which identities and personas they are willing to adopt, with the general tendency being that newer models are less flexible.