FeatherOps: Fast fp8 matmul on RDNA3 without native fp8
r/LocalLLaMA
I'm working on it in ComfyUI, and the kernel can also be used in LLM