AI SAFETY & ETHICS

Not a Paper: "Frontier Lab CEOs are Capable of In-Context Scheming"

LessWrong AI

(Fragments from a research paper that will never be written, but whose existence was brought to my attention by GradientDissenter.)

Extended Abstract. The CEOs of frontier AI developers are becoming increasingly powerful and wealthy, which significantly increases the risks they could pose. One concern is executive misalignment: the case where the CEO's incentives and goals differ from those of the board of directors, or of humanity as a whole. In this work, we propose three threat models under which executive misalignment can lead to concrete harm.