Qwen3.5 family comparison on shared benchmarks

Main takeaway: the 122B, 35B, and especially the 27B models retain much of the flagship’s performance, while the 2B and 0.8B models fall off much harder in the long-context and agent categories.

Submitted by /u/Deep-Vermicelli-4591