Get 30K more context using Q8 mmproj with Gemma 4
r/LocalLLaMA
•
Open Source AI
Hey guys, quick follow up to my post yesterday about running Gemma 4 26B. I kept testing and realized you can just use the Q8_0 mmproj for vision instead of F16. There is no quality drop, and it actually performed a bit better in a few of my tests (with --image-min-tokens 300 --image-max-tokens 512). You can easily hit 60K+ total context with an FP16 cache and still keep vision enabled. Here is the Q8 mmproj I used: Link to original post (and huge thanks to this comment for the tip.