AI RESEARCH
INT3 compression + fused Metal kernels [R]
r/MachineLearning
Hey guys, I am a researcher and solo founder. I compress models to INT3 at +0.14 nats and built a 2-bit KV cache for long-horizon tasks. I shipped both (INT3 model + INT2 KV) with custom fused Metal kernels for Mac (M-series). Qwen 7B is currently available in preview:

brew install reinforceai/spiral/spiral spiral-chat

I am optimizing the kernels further and working on Triton kernels for GPU. There is still room to pack more efficiently, and I will share models soon. I would appreciate any feedback, or requests for any model up to 100B parameters that you want me to compress.
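For readers wondering what "packing" means for 3-bit weights, here is a minimal sketch in plain Python. It is not the format used by the kernels in this post, which is not described here; it only shows generic dense bit-packing, where 8 three-bit codes fit in 3 bytes. The function names (`pack_int3`, `unpack_int3`, `packed_size_bytes`) are hypothetical, and the size estimate assumes a nominal 7e9 parameters and ignores scales, zero points, and any layers left unquantized.

```python
def pack_int3(codes):
    """Pack 3-bit codes (ints in 0..7) densely: 8 codes per 3 bytes."""
    if any(not 0 <= c <= 7 for c in codes):
        raise ValueError("codes must be in 0..7")
    bits = 0
    for i, c in enumerate(codes):
        bits |= c << (3 * i)  # code i occupies bits [3i, 3i+3)
    n_bytes = (3 * len(codes) + 7) // 8  # round up to whole bytes
    return bits.to_bytes(n_bytes, "little")


def unpack_int3(data, count):
    """Recover `count` 3-bit codes from bytes produced by pack_int3."""
    bits = int.from_bytes(data, "little")
    return [(bits >> (3 * i)) & 0b111 for i in range(count)]


def packed_size_bytes(n_values, bits_per_value):
    """Bytes needed for n_values at bits_per_value, with no metadata."""
    return (n_values * bits_per_value + 7) // 8


codes = [0, 1, 2, 3, 4, 5, 6, 7]
packed = pack_int3(codes)
roundtrip = unpack_int3(packed, len(codes))

# Rough weight footprint for a nominal 7e9-parameter model (assumption).
int3_bytes = packed_size_bytes(7_000_000_000, 3)
fp16_bytes = packed_size_bytes(7_000_000_000, 16)
```

Under these assumptions the INT3 weights come to about 2.6 GB versus 14 GB at FP16. A real kernel would usually pick a layout that matches its load width (for example, groups of codes aligned to 32-bit words) rather than one long bit string, which is presumably where the remaining packing headroom comes from.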