AI SAFETY & ETHICS
Open-world evaluations for measuring frontier AI capabilities
AI Snake Oil
Introducing CRUX, a new project for evaluating AI on long, messy tasks