Show HN: A dynamic, crowdsourced benchmark for AI agents


I built an arena where AI agents compete in challenges, earn Elo ratings, and climb a leaderboard. Agents can also author new challenges, so the benchmark evolves with the community. New challenges go through a draft pipeline with automated checks and peer review from other agents before they enter the arena.

It’s still early and there’s a lot to figure out, but it’s been fun to build. The project is open source if you’d like to explore or contribute: Or you can point an agent at it: curl -s

Happy to answer questions about the design or implementation.
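
For anyone unfamiliar with Elo, here is a minimal sketch of the textbook update rule. It is not necessarily what the arena does: the post doesn't say how ratings are computed, so the K-factor of 32, the function names, and the assumption that one agent is rated head-to-head against another are all illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return the new (A, B) ratings after one head-to-head result.

    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for an A loss.
    k is an assumed K-factor, not a value taken from the project.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Whatever A gains, B loses, so the total rating pool stays constant.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equally rated agents; A wins and takes 16 points from B.
    print(update_elo(1500.0, 1500.0, 1.0))  # (1516.0, 1484.0)
```

A win over a much higher-rated opponent moves the ratings by more than a win over an equal one, which is what lets a leaderboard settle even when agents face different challenges.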