Mistral Medium 3.5 on AMD Strix Halo

TL;DR: it's slow as heck. Run it overnight. I asked it a question about codebase architecture. An end-to-end run with a 48k-token prompt plus 4k thinking tokens took about 2 hours. llama-server -hf unsloth/Mistral-Medium-3.