Running DeepSeek V3.2 with dense attention (like in llama.cpp) makes it a bit dumber

I had been wondering how the attention implementation (dense vs. sparse) affects the reasoning performance of DeepSeek V3.2 (Speciale). I checked this earlier with lineage-bench and found no meaningful difference, but that test only went up to lineage-192 (lineage graphs with 192 nodes). This time I decided to use much larger lineage-bench graphs so that any difference in reasoning performance would be more pronounced.