Used ray tracing cores on my RTX 5070 Ti for LLM routing — 218x speedup, runs entirely on 1 consumer GPU
r/LocalLLaMA
Quick summary: I found a way to use the RT Cores (normally used for ray tracing in games) to handle expert routing in MoE models.
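
For context, expert routing in a MoE layer is conventionally a small dense computation: score the token against each expert, keep the top k, and normalize their weights. The sketch below shows only that conventional baseline in plain Python. It is not the RT Core method from this post, and the function name `route_top_k`, the weight matrix, and the toy values are all hypothetical, chosen purely for illustration.

```python
import math

def route_top_k(token, router_weights, k=2):
    """Return [(expert_index, gate_weight), ...] for the k highest-scoring experts."""
    # One logit per expert: dot product of the token with that expert's router row.
    logits = [sum(t * w for t, w in zip(token, row)) for row in router_weights]
    # Indices of the k largest logits, highest first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over only the selected logits so the gate weights sum to 1.
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Toy example: 4 experts, 3-dimensional token (all values hypothetical).
router_weights = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
chosen = route_top_k([2.0, 1.0, 0.0], router_weights, k=2)
```

The step that matters for routing cost is picking the best-matching experts for each token, which is the part the post describes handing to the RT Cores.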