Through vibe coding, I managed to make parts of vLLM 0.17.0 run on Tesla P40
r/LocalLLaMA
Hello. I am currently running a Tesla P40 in my server and working on a personal project for real-time lecture transcription. My initial plan was to use the Qwen3 ASR 1.7B model. However, I learned that true real-time transcription is only supported through vLLM, so I briefly considered simply splitting the audio into chunks as a workaround. Before doing that, I decided to try something experimental: using Codex, I attempted to modify vLLM so it could run on the Pascal architecture, and then instructed it to run the Qwen3 ASR 1.7B model.
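
For anyone curious what the chunking fallback would look like, here is a minimal sketch in plain Python. It only covers splitting raw samples into fixed-length, slightly overlapping windows. The function name `chunk_samples`, the 5-second window, and the 0.5-second overlap are assumptions chosen for illustration, not values from this project, and the step of sending each chunk to the ASR model is deliberately left out.

```python
from typing import Iterator, List, Sequence


def chunk_samples(
    samples: Sequence[float],
    sample_rate: int,
    chunk_seconds: float = 5.0,    # assumed window length, tune as needed
    overlap_seconds: float = 0.5,  # assumed overlap so words at a boundary are not cut
) -> Iterator[List[float]]:
    """Yield fixed-length windows of audio samples with a small overlap."""
    chunk_len = int(chunk_seconds * sample_rate)
    overlap_len = int(overlap_seconds * sample_rate)
    if chunk_len <= 0:
        raise ValueError("chunk length must be at least one sample")
    if not 0 <= overlap_len < chunk_len:
        raise ValueError("overlap must be non-negative and shorter than the chunk")

    step = chunk_len - overlap_len
    for start in range(0, len(samples), step):
        # The last window may be shorter than chunk_len.
        yield list(samples[start:start + chunk_len])
        if start + chunk_len >= len(samples):
            break  # this window already reached the end of the audio
```

Each yielded chunk would then be transcribed on its own. The overlap means adjacent transcripts can repeat a few words, so some deduplication at the seams is likely needed, which is part of why this is less attractive than true real-time transcription.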