Used ray tracing cores on my RTX 5070 Ti for LLM routing — 218x speedup, runs entirely on 1 consumer GPU

r/LocalLLaMA
Generative AI AI Hardware AI Tools

Quick summary: I found a way to use the RT Cores (normally used for ray tracing in games) to handle expert routing in MoE models.