A 45-test benchmark built around my homelab use cases, run against 19 local LLMs (incl. Gemma 4 and Qwen 3.5) on a Strix Halo

Hardware: AMD Strix Halo (Ryzen AI MAX+ 395), 128GB RAM, 96GB shared VRAM, Vulkan/RADV, llama-server (kyuz0 Docker image)

Quick disclaimer: I'm not an ML researcher or a scientist. I work in tech and I'm fairly technical, but this is purely a hobby project. The methodology isn't rigorous by academic standards. I just wanted to figure out which model works best for my stuff. I posted some early results on Qwen and some people asked me to post about my specific tests on my own use cases.

TL;DR: I run local LLMs for async tasks in my homelab.