vLLM on Jetson Orin - pre-built wheel with Marlin GPTQ support (3.8x prefill speedup)
Hey all, if you're running GPTQ models on a Jetson Orin (AGX, NX, or Nano), you've probably noticed that stock vLLM doesn't ship Marlin kernels for SM 8.7. It covers 8.0, 8.6, 8.9, and 9.0, but not the Orin family, which means your tensor cores just sit there doing nothing during GPTQ inference.

I ran into this while trying to serve Qwen3.5-35B-A3B-GPTQ-Int4 on an AGX Orin 64GB. Performance without Marlin was underwhelming, so I compiled vLLM 0.17.0 with the SM 8.7 target included and packaged it as a wheel.
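If you want to check whether your own board falls into this gap, here's a minimal sketch. The arch list is copied from the description above rather than read from vLLM's build files, and the names `STOCK_MARLIN_ARCHS`, `covered_by_stock_marlin`, and `describe` are made up for illustration. It uses `torch.cuda.get_device_capability` when PyTorch with CUDA is available and otherwise falls back to the Orin value of 8.7.

```python
# Compute capabilities the post says stock vLLM builds Marlin kernels for.
# Assumption: taken from the post text, not verified against vLLM's build config.
STOCK_MARLIN_ARCHS = {(8, 0), (8, 6), (8, 9), (9, 0)}

# Jetson Orin (AGX, NX, Nano) reports compute capability 8.7.
ORIN_ARCH = (8, 7)


def covered_by_stock_marlin(capability):
    """Return True if (major, minor) is in the stock Marlin arch list."""
    return tuple(capability) in STOCK_MARLIN_ARCHS


def describe(capability):
    """Build a one-line summary for a (major, minor) compute capability."""
    major, minor = capability
    if covered_by_stock_marlin(capability):
        return f"SM {major}.{minor}: in the stock Marlin arch list"
    return f"SM {major}.{minor}: not in the stock Marlin arch list, needs a custom build"


if __name__ == "__main__":
    try:
        import torch  # optional; only used to query the local GPU
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        # Returns a (major, minor) tuple for the given device index.
        print(describe(torch.cuda.get_device_capability(0)))
    else:
        # No CUDA-enabled PyTorch here, so show the Orin case instead.
        print(describe(ORIN_ARCH))
```

On an Orin this should report SM 8.7 as missing from the list, which is the case the custom wheel is meant to cover.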