AI RESEARCH
[D] MXFP8 GEMM: Up to 99% of cuBLAS performance using CUDA + PTX
r/MachineLearning
New blog post by Daniel Vega-Myhre (Meta/PyTorch) walking through the design of an MXFP8 GEMM kernel, with deep dives into the constraints and design challenges involved.