AI RESEARCH
Automated alignment is harder than you think
arXiv CS.AI
arXiv:2605.06390v1 Announce Type: new
A leading proposal for aligning artificial superintelligence (ASI) is to use AI agents to automate an increasing fraction of alignment research as capabilities improve. We argue that, even when research agents are not scheming to deliberately sabotage alignment work, this plan could produce compelling but catastrophically misleading safety assessments, resulting in the unintentional deployment of misaligned AI.