The persona selection model
Alignment Forum
TL;DR We describe the persona selection model (PSM): the idea that LLMs learn to simulate diverse characters during pre-training, and post-training elicits and refines a particular such Assistant persona. Interactions with an AI assistant are then well understood as interactions with the Assistant, something roughly like a character in an LLM-generated story. We survey behavioral, generalization, and interpretability-based empirical evidence for PSM. PSM has consequences for AI development, such as recommending anthropomorphic reasoning about AI psychology and