Qwen 3.5 prompt re-processing speed-up for vLLM (settings inside)
r/LocalLLaMA
I have been reading posts around the internet, and it appears I was not the only one having this issue with Qwen3.5. It seemed to be reprocessing the ENTIRE prompt, with the wait between responses getting longer and longer as time went on. This was driving me nuts and made the model unusable at longer contexts, sometimes taking minutes to respond. However, the vLLM 0.17.0 release had some interesting updates, and I was able to test new settings that made a DRASTIC improvement in long-context conversations and coding-agent operations.