Qwen 3.6 35B A3B Q4_K_M quant evaluation

About the Model: 35B total parameters, 3B active (A3B), mixture-of-experts architecture.

Evaluation approach: We took the Q4_K_M quantized GGUF from Unsloth, ran it on CPU via llama-cpp-python, and tested it on three standard benchmarks:
- HumanEval (code generation)
- HellaSwag (commonsense reasoning)
- BFCL (function calling)

1,264 samples total.

Evaluation Results:
- HumanEval: 47.56% (78/164)
- HellaSwag: 74.30% (743/1000)
- BFCL: 46.00% (46/100)

Hardware: 32 vCPU, 125GB RAM. No.
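The reported percentages and the 1,264-sample total can be rechecked from the raw counts. The sketch below does that arithmetic and also shows one plausible way to load a GGUF file on CPU with llama-cpp-python. It is an illustration, not the harness used for this evaluation: the names `BenchmarkResult`, `total_samples`, and `load_cpu_model` are hypothetical, and the model path, the context size of 4096, and the thread count of 32 (chosen to match the 32 vCPUs) are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    """Pass count and sample count for one benchmark (hypothetical helper)."""
    name: str
    correct: int
    total: int

    @property
    def accuracy_pct(self) -> float:
        # Accuracy as a percentage of samples answered correctly.
        return 100.0 * self.correct / self.total


# Counts copied from the reported results.
RESULTS = [
    BenchmarkResult("HumanEval", 78, 164),
    BenchmarkResult("HellaSwag", 743, 1000),
    BenchmarkResult("BFCL", 46, 100),
]


def total_samples(results) -> int:
    """Sum the number of samples across all benchmarks."""
    return sum(r.total for r in results)


def load_cpu_model(model_path: str, n_threads: int = 32, n_ctx: int = 4096):
    """Load a GGUF model for CPU-only inference.

    Assumption: model_path points at a local Q4_K_M GGUF file. The import is
    done lazily so the scoring code above runs without llama-cpp-python.
    """
    from llama_cpp import Llama

    return Llama(
        model_path=model_path,
        n_ctx=n_ctx,          # assumed context window, not from the source
        n_threads=n_threads,  # assumed: one thread per vCPU
        n_gpu_layers=0,       # keep every layer on the CPU
    )


if __name__ == "__main__":
    for r in RESULTS:
        print(f"{r.name}: {r.accuracy_pct:.2f}% ({r.correct}/{r.total})")
    print(f"Total samples: {total_samples(RESULTS):,}")
```

Keeping the raw counts next to the percentages makes it easy to spot transcription errors, and each benchmark's sample size stays visible: 46/100 on BFCL carries more sampling noise than 743/1000 on HellaSwag.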