Qwen 3.6 for Claude Code in 1L

I use a P3 Tiny Gen 2 with an RTX 2000 Ada (16 GB VRAM). It gets hot, so I modeled and printed a fan hanger to keep it cool. It's dumb, but it feels like Claude Code, just unlimited.

The setup is Qwen 3.6 35B A3B, Q4_K_M quant from Unsloth, on llama.cpp: 400 t/s prompt processing, 24 t/s generation. I did have to use the change in this PR to make llama.cpp work well with Claude Code, though. With the change to let prompt prefixes cache, I'm amazed at what these newfangled tools can generate.

Have a great day folks, I just wanted to share my experience with someone <3

submitted by /u/brickinthefloor