PyTorch 2.12.0+cu132 (CUDA 13.2) — SA2/SA3 Attention Stability Benchmarks

With the release of PyTorch 2.12.0+cu132, I ran a full benchmark suite to verify that SA2 and SA3 attention backends are stable and working correctly in the new environment. Tests were conducted on the following models:

- flux1-krea-dev_fp8_scaled: 20 steps, CFG 1, 1024×1024
- flux-2-klein-base-9b-fp8: 20 steps, CFG 5, 1280×1280
- wan2.2_t2v_high/low_noise_14B_fp16 + lightx2v_4steps_lora: 2+2 steps, CFG 1, 640×640

All backends (fp8_cuda, fp8pp_cuda, triton, SA3 standard, SA3 per_block_mean) are confirmed stable. Results in the charts below.