Evaluating Language Models for Harmful Manipulation

arXiv:2603.25326v1 Announce Type: new Interest in the concept of AI-driven harmful manipulation is growing, yet current approaches to evaluating it are limited. This paper