AI RESEARCH
Lying to Win: Assessing LLM Deception through Human-AI Games and Parallel-World Probing
arXiv cs.CL
arXiv:2603.07202v1. Announce Type: new. As Large Language Models (LLMs) transition into autonomous agentic roles, the risk of deception (defined behaviorally as the systematic provision of false information to satisfy external incentives) poses a significant challenge to AI safety. Existing benchmarks often focus on unintentional hallucinations or unfaithful reasoning, leaving intentional deceptive strategies under-explored. In this work, we