I benchmarked Strix Halo (Ryzen AI Max+ 395) performance as context length increases

r/LocalLLaMA

Hi all, I've seen a lot of test videos and posts about how well a Strix Halo machine (GTR9 PRO) actually handles local LLMs at long context lengths. So I put together a small benchmark project to test how local llama.cpp models behave as context length increases on an AMD Strix Halo 128GB machine.