[D] MXFP8 GEMM: Up to 99% of cuBLAS performance using CUDA + PTX

r/MachineLearning • March 30, 2026

New blog post by Daniel Vega-Myhre (Meta/PyTorch) illustrating GEMM design for FP8, including deep dives into the constraints and design challenges.
