Testing MiMo-V2.5-IQ3_S with 1'048'576 context

r/LocalLLaMA

Command:

llama-server.exe --model "H:\gptmodel\AesSedai\MiMo-V2.5-GGUF\MiMo-V2.5-IQ3_S-00001-of-00004.gguf" --ctx-size 1048576 --threads 16 --host 127.0.0.1 --no-mmap --jinja --fit on --flash-attn on -sm layer --n-cpu-moe 0 --parallel 1 --temp 0.2

Load log:

load_tensors: offloaded 49/49 layers to GPU
load_tensors: Vulkan0 model buffer size = 72842.29 MiB
load_tensors: Vulkan1 model buffer size = 34524.53 MiB
load_tensors: Vulkan_Host model buffer size = 488.91 MiB

Hardware: RTX 6000 (96 GB) + W7800 (48 GB)

I started testing with the IQ3 version because the second W7800 is on another machine.
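For anyone who wants to sanity-check the split, here is a rough sketch that adds up the three `load_tensors` buffer sizes quoted in the post. The regex and the helper name `buffer_sizes` are my own, not anything from llama.cpp. These numbers cover model weights only; the KV cache and compute buffers for the 1M context are not in the quoted log, so they are not counted here.

```python
import re

# Log lines copied from the post; nothing else is assumed about the run.
LOG = """\
load_tensors: Vulkan0 model buffer size = 72842.29 MiB
load_tensors: Vulkan1 model buffer size = 34524.53 MiB
load_tensors: Vulkan_Host model buffer size = 488.91 MiB
"""

# Hypothetical pattern written for these three lines only.
PATTERN = re.compile(r"load_tensors:\s+(\S+) model buffer size =\s+([\d.]+) MiB")


def buffer_sizes(log: str) -> dict[str, float]:
    """Map each backend name to its reported model buffer size in MiB."""
    return {name: float(mib) for name, mib in PATTERN.findall(log)}


sizes = buffer_sizes(LOG)
# Everything except the host buffer lives in GPU memory.
gpu_mib = sum(mib for name, mib in sizes.items() if name != "Vulkan_Host")
total_mib = sum(sizes.values())

for name, mib in sizes.items():
    print(f"{name:12s} {mib:10.2f} MiB  ({mib / 1024:6.2f} GiB)")
print(f"GPU weights  {gpu_mib:10.2f} MiB  ({gpu_mib / 1024:6.2f} GiB)")
print(f"All weights  {total_mib:10.2f} MiB  ({total_mib / 1024:6.2f} GiB)")
```

By this count the weights take roughly 104.85 GiB across the two GPUs, against a nominal 96 GB + 48 GB of VRAM, which leaves the remainder for the KV cache and compute buffers.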