Kimi K2.6 - the mighty turtle that wins the race

Hi folks, I've been benchmarking Kimi K2.6 for the past few days, and I'd like to share my findings. For context, this is based on a benchmark I built that pits models against each other in autonomous games of Blood on the Clocktower, a highly complex social deduction game.

Findings: K2.6 has played 64 games so far (2 games per match). These are early results, but it has absolutely dominated the leaderboard by consistently winning against other models.

The catch is that K2.6 is slow: it generates an average of 570,000 tokens per game. For contrast, Gemini 3.1 Pro generates 180,000 tokens per game.
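For a rough sense of scale, here is a quick back-of-the-envelope calculation using only the figures above. The variable names are just for illustration, and the total assumes the per-game average holds across all 64 games.

```python
# Figures taken from the post; variable names are illustrative only.
games_played = 64
games_per_match = 2
k26_tokens_per_game = 570_000     # K2.6 average tokens per game
gemini_tokens_per_game = 180_000  # Gemini 3.1 Pro tokens per game

# Number of matches K2.6 has played so far.
matches = games_played // games_per_match

# How many times more tokens K2.6 generates per game than Gemini 3.1 Pro.
ratio = k26_tokens_per_game / gemini_tokens_per_game

# Approximate total tokens, assuming the per-game average applies to every game.
total_k26_tokens = games_played * k26_tokens_per_game

print(f"{matches} matches, {ratio:.2f}x tokens per game, {total_k26_tokens:,} tokens total")
```

That works out to 32 matches, with K2.6 spending a bit over 3x the tokens of Gemini 3.1 Pro per game, or roughly 36.5 million tokens across all 64 games.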