AI RESEARCH
MANTA: Multi-turn Assessment for Nonhuman Thinking & Alignment
arXiv cs.LG
arXiv:2605.16301v1 Announce Type: cross Single-turn benchmarks such as AnimalHarmBench (AHB) have established important baselines for measuring animal welfare alignment in large language models (LLMs), but they miss a critical failure mode: models that respond appropriately when unpressured may capitulate when follow-up conversational turns