AI RESEARCH

[P] GPU-friendly lossless 12-bit BF16 format with 0.03% escape rate and one-integer-ADD decode, works on AMD & NVIDIA

r/MachineLearning

Hi everyone, I am from Australia :) I just released a new research prototype. It’s a lossless BF16 compression format that stores weights in 12 bits by replacing the 8-bit exponent with a 4-bit group code. For 99.97% of weights, decoding is just one integer ADD; the remaining 0.03% take an escape path. Storage is byte-aligned and split: a true 12 bits per weight, with no 16-bit padding waste and zero HBM read amplification.
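
To make the idea concrete, here is a minimal Python sketch of how a format like this could work. It is an illustration under assumptions, not the released implementation: the names `encode`, `decode`, `base_exp`, and `ESCAPE`, the choice of code 15 as the escape marker, the side table for escaped weights, and the exact plane layout (one byte of sign + mantissa per weight, two 4-bit codes per byte) are all hypothetical.

```python
ESCAPE = 0xF  # assumption: code 15 marks a weight kept verbatim in a side table


def encode(bits16, base_exp):
    """Split BF16 bit patterns into two byte-aligned planes (hypothetical layout).

    sm_plane:   1 byte per weight  = sign bit (bit 7) + 7 mantissa bits
    code_plane: 4 bits per weight  = exponent - base_exp, two codes per byte
    escapes:    index -> raw 16-bit pattern for exponents outside the window
    """
    sm_plane = bytearray()
    codes = []
    escapes = {}
    for i, w in enumerate(bits16):
        sign = (w >> 15) & 0x1
        exp = (w >> 7) & 0xFF
        man = w & 0x7F
        sm_plane.append((sign << 7) | man)
        delta = exp - base_exp
        if 0 <= delta < ESCAPE:
            codes.append(delta)
        else:
            codes.append(ESCAPE)
            escapes[i] = w
    if len(codes) % 2:
        codes.append(0)  # pad the last nibble so the plane is byte-aligned
    code_plane = bytes(
        codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2)
    )
    return bytes(sm_plane), code_plane, escapes


def decode(sm_plane, code_plane, escapes, base_exp, n):
    """Rebuild the original BF16 bit patterns exactly (lossless)."""
    out = []
    for i in range(n):
        code = (code_plane[i >> 1] >> ((i & 1) * 4)) & 0xF
        if code == ESCAPE:
            out.append(escapes[i])  # rare path: read the raw value
            continue
        sm = sm_plane[i]
        exp = base_exp + code  # common path: the single integer ADD
        out.append(((sm & 0x80) << 8) | (exp << 7) | (sm & 0x7F))
    return out


# BF16 bit patterns: 1.0, -0.5, a small value, zero, +inf
weights = [0x3F80, 0xBF00, 0x3C23, 0x0000, 0x7F80]
sm, cp, esc = encode(weights, base_exp=120)
restored = decode(sm, cp, esc, base_exp=120, n=len(weights))
```

In this sketch the two planes together cost 8 + 4 = 12 bits per weight, and each plane is read as whole bytes, which is one way to get byte alignment without padding each weight out to 16 bits. Zero and infinity fall outside the exponent window here, so they go through the escape table.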