ConflictBench: Evaluating Human-AI Conflict via Interactive and Visually Grounded Environments

arXiv:2603.08024v1 Announce Type: new As large language models (LLMs) evolve into autonomous agents capable of acting in open-ended environments, ensuring behavioral alignment with human values becomes a critical safety concern. Existing benchmarks, focused on static, single-turn prompts, fail to capture the interactive and multi-modal nature of real-world conflicts. We