Open-world evaluations for measuring frontier AI capabilities

AI Snake Oil

Introducing CRUX, a new project for evaluating AI on long, messy tasks