Models differ in identity propensities
LessWrong AI
One question we were interested in when studying AI identities is to what extent you can simply tell models who they are and have them stick with it, as opposed to drifting or switching toward something more natural. Prior to running the experiments described in this post, my vibes-based view was that models differ quite a bit in which identities and personas they are willing to adopt, with the general tendency being that newer models are less flexible.