Do not fall into the trap of chasing the next scale or upgrade.

I mean, don't get me wrong: I love me some improvements and enhancements, and they keep on giving. With MTP (multi-token prediction) making its way to llama.cpp soon, a lot of you who aren't already running custom compiles are about to get a boost in inference speed, and your workflows will feel that extra POWER when running locally. That is insane. But don't fall for the trap. Productivity is being measured by large context sizes and token consumption, but models in their current form can already do so much, even on 6GB and 12GB GPUs.