FYI, Step 3.5 Flash has better perf and context is 1/4 the price in llama.cpp
r/LocalLLaMA
So I recently updated LM Studio after a long pause and updated my llama.cpp runtimes too. I was shocked. I thought maybe something like TurboQuant was enabled by default, but it just turns out this model has gotten way better.