Modular: Matrix Multiplication on Blackwell: Part 3 - The Optimizations Behind 85% of SOTA Performance

