Is there any way to run bigger models at 20 t/s with 24 GB VRAM + 64 GB DDR5 RAM?

r/LocalLLaMA

I know the new Qwen 27B is amazing right now for coding in general, but since a 122B is supposed to be coming as well, I guess it's expected to be better? I'm actually surprised at how well this dense model performs. I no longer use Codex at all for my C++ programming needs.

Submitted by /u/soyalemujica