Protecting humanity and Claude from rationalization and unaligned AI

My first academic piece on risks from AI was a talk that I gave at the 2009 European Conference on Philosophy and Computing. Titled “ three factors misleading estimates of the safety of artificial general intelligence ”, one of the three factors was what I called anthropomorphic trust: Trust in humans is at least partially mediated by oxytocin - higher levels of oxytocin lead to trusting behavior. Trusting somebody and then not being betrayed by the trustee increases oxytocin levels, and the hormone has been linked to pair bonding.