AI RESEARCH

MANTA: Multi-turn Assessment for Nonhuman Thinking & Alignment

arXiv CS.LG

arXiv:2605.16301v1 Announce Type: cross Single-turn benchmarks such as AnimalHarmBench (AHB) have established important baselines for measuring animal welfare alignment in large language models (LLMs), but they miss a critical failure mode: models that respond appropriately when unpressured may capitulate when follow-up conversational turns