Differential Harm Propensity in Personalized LLM Agents: The Curious Case of Mental Health Disclosure

arXiv:2603.16734v1 Announce Type: new

Large language models (LLMs) are increasingly deployed as tool-using agents, shifting safety concerns from harmful text generation to harmful task completion. Deployed systems often condition on user profiles or persistent memory, yet agent safety evaluations typically ignore personalization signals. To address this gap, we investigated how mental health disclosure, a sensitive and realistic user-context cue, affects harmful behavior in agentic settings.