FYI, Step 3.5 Flash has better perf and context is 1/4 the price in llama.cpp

r/LocalLLaMA

So I recently updated LM Studio after a long pause and updated my llama.cpp runtimes too. I was shocked. I thought maybe something like TurboQuant was enabled by default, but it just turns out this model has gotten way better.