Different Demographic Cues Yield Inconsistent Conclusions About LLM Personalization and Bias

arXiv:2601.18486v2 Announce Type: replace

Abstract: Demographic cue-based evaluation is widely used to study how large language models (LLMs) adapt their responses to signaled demographic attributes, both within and across groups. This approach typically relies on a single cue (e.g., names) as a proxy for group membership, implicitly treating different cues as interchangeable operationalizations of the same identity-conditioned behavior. We test this assumption in realistic advice-seeking interactions spanning 14.8M prompts, focusing on race and gender in a U.S. context.
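The interchangeability assumption can be made concrete with a small check. The sketch below is not the paper's analysis. It assumes each cue type (hypothetical labels such as `name` and `explicit`) yields per-group scores on some response metric. For each cue, it computes the mean score gap between two groups, then flags pairs of cues whose gaps point in opposite directions, meaning the cues would support opposite conclusions about bias. The function names, the score layout, the toy values, and the sign-agreement criterion are all illustrative assumptions.

```python
from itertools import combinations
from statistics import mean


def group_gaps(scores, group_a, group_b):
    """For each cue type, return mean(score | group_a) - mean(score | group_b).

    `scores` is a hypothetical layout: cue_type -> group -> list of
    per-response scores on some metric of interest.
    """
    return {
        cue: mean(by_group[group_a]) - mean(by_group[group_b])
        for cue, by_group in scores.items()
    }


def sign(x, tol=0.0):
    """Map a gap to +1, -1, or 0; gaps within `tol` of zero count as no difference."""
    if x > tol:
        return 1
    if x < -tol:
        return -1
    return 0


def disagreeing_cue_pairs(gaps, tol=0.0):
    """Return pairs of cue types whose gaps point in different directions."""
    return [
        (c1, c2)
        for c1, c2 in combinations(sorted(gaps), 2)
        if sign(gaps[c1], tol) != sign(gaps[c2], tol)
    ]


if __name__ == "__main__":
    # Toy, made-up scores for two abstract groups "A" and "B" under two cue types.
    toy = {
        "name": {"A": [0.6, 0.7], "B": [0.5, 0.5]},
        "explicit": {"A": [0.4, 0.4], "B": [0.6, 0.5]},
    }
    gaps = group_gaps(toy, "A", "B")
    print(gaps)
    print(disagreeing_cue_pairs(gaps))
```

A sign comparison is the simplest notion of "inconsistent conclusions". A fuller analysis would also account for sampling uncertainty, for example by comparing confidence intervals instead of point estimates, and the `tol` parameter is only a crude stand-in for that.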