AI SAFETY & ETHICS

Chat, is this sus?

LessWrong AI

A large assumption we have made in AI control is that humans will be perfect at auditing, that is, at reading a transcript and determining whether the AI was scheming in it. But it is far from certain that they will be; humans are prone to fatigue and distraction. That is why I’m releasing "Sentinel" today, an extremely high-stimulation way to audit boring transcripts. Sentinel is a revolutionary way to get juice out of your human auditors by gamifying the auditing process with a level system, perks, power-ups, and fun features. Try it now here.