Qwen3.5-9B.Q4_K_M on RTX 3070 Mobile (8GB) with ik_llama.cpp — optimization findings + ~50 t/s gen speed, looking for tips

r/LocalLLaMA
Generative AI AI Research

Disclouse: This post partly written with the help of Claude Opus 4.6 to help with gathering the info and making it understandable for myself first and foremost. and this post etc! Hi! Been tuning local inference on my laptop and wanted to share some info reallyu because some of it surprised me. Would also love to hear what others are getting on similar hardware.