BigCodeArena: Judging code generations end to end with code executions

Hugging Face Blog

- Motivation
- The BigCodeArena Platform
  - Real-Time Execution
  - Multi-Language & Framework
  - Interactive Testing
  - Multi-Turn Conversations
- What We've Learned: 5 Months of Community Evaluation
  - Programming Topics in the Wild
  - Language and Framework Popularity
  - User Interaction Patterns
  - Model Rankings from Community Votes
- Two New Benchmarks: BigCodeReward and AutoCodeArena
  - BigCodeReward: Evaluating Reward Models for Code
  - AutoCodeArena: Automated Code Generation Benchmarks
- Try It Yourself
- Open Source E