Qwen3.5-27B performs almost on par with 397B and GPT-5 mini in the Game Agent Coding League

Hi LocalLlama. Here are the results from the March run of the GACL. A few observations from my side: GPT-5.4 clearly leads among the major models at the moment. Qwen3.5-27B performed better than every other Qwen model except 397B, trailing it by only 0.04 points. In my opinion, it’s an outstanding model. Kimi2.5 is currently the top open-weight model, ranking globally, while GLM-5 comes next at globally. The gap between Opus and Sonnet is larger than I expected. GPT models dominate the Battleship game.