[ComfyUI] Accelerate Z-Image (S3-DiT) by 20-30% & save 3.5GB VRAM using Triton+INT8 (No extra model downloads)

Hey everyone, I've recently started building open-source optimizations for the AI models I use heavily, and I'm excited to share my latest project with the ComfyUI community! I built a custom node that accelerates Z-Image S3-DiT (6.15B parameters) by 20-30% using Triton kernel fusion plus W8A8 INT8 quantization. The best part? It runs directly on your existing BF16 model.

GitHub:

💡 Why you might want to use this:

- No extra massive downloads: it quantizes your existing BF16 safetensors on the fly at runtime, so you don't need to download a separate GGUF or pre-quantized version.
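For anyone curious what "W8A8" means in practice, here is a minimal pure-Python sketch of the math. It is not the node's actual code; it only illustrates the general technique under common assumptions: symmetric INT8 weights with one scale per output channel, computed once when the BF16 weights are loaded, and INT8 activations with one scale per token, computed at runtime. The names `quantize_rows_int8` and `QuantizedLinear` are hypothetical. A real implementation runs the integer matmul in a fused Triton/CUDA kernel with int32 accumulation instead of Python loops.

```python
from typing import List, Tuple

Matrix = List[List[float]]
IntMatrix = List[List[int]]


def quantize_rows_int8(m: Matrix) -> Tuple[IntMatrix, List[float]]:
    """Symmetric per-row INT8 quantization (hypothetical helper).

    Each row gets scale = max|x| / 127, and values become round(x / scale),
    clamped to [-127, 127]. Python's round() is round-half-to-even.
    """
    q_rows: IntMatrix = []
    scales: List[float] = []
    for row in m:
        amax = max(abs(v) for v in row)
        scale = amax / 127.0 if amax > 0 else 1.0  # avoid divide-by-zero on all-zero rows
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


class QuantizedLinear:
    """Illustrative W8A8 linear layer computing y = x @ w.T (hypothetical class)."""

    def __init__(self, weight: Matrix) -> None:
        # weight: [out_features, in_features]; quantized ONCE, per output channel,
        # which is the "on the fly at load time" step: no pre-quantized file needed.
        self.qweight, self.wscale = quantize_rows_int8(weight)

    def __call__(self, x: Matrix) -> Matrix:
        # x: [tokens, in_features]; activations are quantized per token at runtime.
        qx, xscale = quantize_rows_int8(x)
        out: Matrix = []
        for i, xrow in enumerate(qx):
            row_out = []
            for j, wrow in enumerate(self.qweight):
                # Integer dot product (a GPU kernel would accumulate in int32).
                acc = sum(a * b for a, b in zip(xrow, wrow))
                # Dequantize with both the token scale and the channel scale.
                row_out.append(acc * xscale[i] * self.wscale[j])
            out.append(row_out)
        return out


def reference_linear(x: Matrix, w: Matrix) -> Matrix:
    """Full-precision y = x @ w.T for comparison."""
    return [[sum(a * b for a, b in zip(xr, wr)) for wr in w] for xr in x]


# Usage: compare the INT8 path against the float reference on a toy layer.
x = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.75]]
w = [[0.1, 0.2, -0.3], [-0.4, 0.5, 0.6]]
layer = QuantizedLinear(w)
y_int8 = layer(x)
y_ref = reference_linear(x, w)
max_err = max(abs(a - b) for ra, rb in zip(y_int8, y_ref) for a, b in zip(ra, rb))
print(f"max abs error vs. float reference: {max_err:.4f}")
```

Per-channel and per-token scales are used here because a single outlier value only degrades precision for its own row rather than for the whole tensor; storing weights as INT8 also halves their size compared with BF16 (1 byte vs. 2 bytes per value).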