GPT vs Claude in a Bomberman-style 1v1 game

A few weeks ago, ARC-AGI 3 was released. For those unfamiliar, it’s a benchmark designed to study agentic intelligence through interactive environments. I'm a big fan of these kinds of benchmarks as IMO they reveal much more about the capabilities and limits of agentic AI than static Q&A benchmarks do. They are also intuitive to understand, since you can actually watch how the model behaves in these environments. I wanted to build something in that spirit, but with an environment that pits two LLMs against each other. My two criteria: the game had to be strategic, and it had to be real-time.