Qwen 3.5 0.8B - small enough to run on a watch. Cool enough to play DOOM.

So I went down the rabbit hole of making a VLM agent that actually plays DOOM. The concept is dead simple - take a screenshot from VizDoom, draw a numbered grid on top, send it to a vision model with two tools (shoot and move), the model decides what to do. Repeat. The wild part? It's Qwen 3.5 0.8B - a model that can run on a smartwatch, trained to generate text, but it handles the game surprisingly well. On the basic scenario it actually gets kills. Like, it sees the enemy, picks the right column, and shoots. I was genuinely surprised.