GLM 5.1 sits alongside frontier models in my social reasoning benchmark

I still need more matches for reliable data, but GLM 5.1 looks very competitive with other frontier models. This uses a benchmark I made that pits LLMs against each other in autonomous games of Blood on the Clocktower (a complex social deduction game). The last screenshot shows GLM 5.1 playing as the evil team (red). For comparison, Claude Opus 4.6 costs $3.69 per game, while GLM 5.1 costs $0.92 per game, about a quarter as much, and had a 0% tool error rate. Very impressive.

submitted by /u/cjami