Measure what Matters: Psychometric Evaluation of AI with Situational Judgment Tests

arXiv:2510.22170v2 Announce Type: replace Persona conditioning is widely used to steer large language model (LLM) behavior, but it is unclear whether it induces stable behavioral structure or superficial variation. We propose a framework to measure consistent behavioral tendencies using situational judgment tests (SJTs), multidimensional item response theory (MIRT), and structured synthetic personas, treating responses as observations of latent behavioral variables.
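
To make the idea of "responses as observations of latent behavioral variables" concrete, here is a minimal sketch of a generic compensatory multidimensional two-parameter logistic (M2PL) item model, a standard MIRT form. It is an illustration only and not necessarily the model used in the paper: the function name `m2pl_probability`, the two-trait setup, and all numeric values are hypothetical.

```python
import math


def m2pl_probability(theta, a, d):
    """Probability of endorsing an item under a compensatory M2PL model.

    theta: latent trait values for one respondent (e.g. one persona)
    a:     item discrimination (slope) per latent dimension
    d:     item intercept (easiness)
    """
    if len(theta) != len(a):
        raise ValueError("theta and a must have the same length")
    # Linear predictor: weighted sum of traits plus intercept.
    logit = sum(a_k * t_k for a_k, t_k in zip(a, theta)) + d
    # Logistic link maps the predictor to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical respondent with two latent traits and one hypothetical item.
theta = [1.0, -0.5]
a = [1.2, 0.4]
d = -0.3
p = m2pl_probability(theta, a, d)
```

In this form, a high value on one trait can compensate for a low value on another, which is why the model is called compensatory. Fitting such a model would estimate `a` and `d` per item and `theta` per respondent from observed responses; that estimation step is not shown here.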