2000 TPS with Qwen 3.5 27B on an RTX 5090

r/LocalLLaMA
Open Source AI

I've been tuning my settings for a specific job that classifies Markdown documents: lots of input tokens, very few output tokens, and no real prompt caching because every document is different. So these numbers are totally situational, but I thought I'd share in case anyone cares. In the last 10 minutes it processed 1,214,072 input tokens, generated 815 output tokens, and classified 320 documents. That works out to roughly 2,000 tokens per second (1,214,887 tokens / 600 s ≈ 2,025), almost all of it prompt processing. I'm pretty blown away, because my first iterations were much slower.
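For anyone wanting to compare their own runs, here is a minimal sketch of the throughput arithmetic using the counts above. The function name `throughput` and the split into prefill (input) and decode (output) rates are illustrative assumptions, not output from any particular inference server. It just divides token and document counts by a fixed time window.

```python
# Throughput arithmetic for a fixed time window (counts taken from the post).
WINDOW_SECONDS = 10 * 60          # "the last 10 minutes"
INPUT_TOKENS = 1_214_072
OUTPUT_TOKENS = 815
DOCUMENTS = 320


def throughput(window_s: float, input_toks: int, output_toks: int, docs: int) -> dict:
    """Return aggregate and per-document rates (hypothetical helper)."""
    total = input_toks + output_toks
    return {
        "total_tps": total / window_s,            # all tokens, prefill + decode
        "prefill_tps": input_toks / window_s,     # prompt tokens only
        "decode_tps": output_toks / window_s,     # generated tokens only
        "docs_per_min": docs / (window_s / 60),
        "avg_input_per_doc": input_toks / docs,
        "avg_output_per_doc": output_toks / docs,
        "seconds_per_doc": window_s / docs,
    }


if __name__ == "__main__":
    stats = throughput(WINDOW_SECONDS, INPUT_TOKENS, OUTPUT_TOKENS, DOCUMENTS)
    for name, value in stats.items():
        print(f"{name:>20}: {value:,.2f}")
```

With these numbers the workload averages about 3,800 input tokens and about 2.5 output tokens per document, at 32 documents per minute. This is why nearly all of the ~2,000 TPS is prompt processing rather than generation.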