Any idea why pruning can improve perplexity?
r/LocalLLaMA
AI Research
I ran a small experiment: I combined a modified version of Wanda pruning with (data-free) quantization, HQQ to be exact. I won't lie, I may have made mistakes, and this is still just a research result, but in this specific combination it looks like pruning before quantization can improve quality. It may come down to the fact that I paired a data-free quantizer with a pruning step that does use data. Any idea why that could be? I would be happy to get feedback!

submitted by /u/ShotokanOSS
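For anyone who wants to poke at the same question on a toy layer, here is a minimal sketch of the prune-then-quantize order described above. It is not the modified Wanda variant from the post and it does not implement HQQ: the pruning step uses the plain Wanda score (weight magnitude times input activation norm), and the quantizer is a simple per-row round-to-nearest stand-in. The function names (`wanda_prune`, `rtn_quantize`, `output_error`), the layer sizes, the 50% sparsity, and the 4-bit setting are all assumptions for illustration, and the weights and activations are random.

```python
import numpy as np


def wanda_prune(W, X, sparsity=0.5):
    """Zero the lowest-scoring weights in each output row.

    W: (out_features, in_features) weights.
    X: (n_samples, in_features) calibration activations (this is the "data").
    Score is |W_ij| * ||X_j||_2, the plain Wanda metric.
    """
    act_norm = np.linalg.norm(X, axis=0)       # one norm per input feature
    score = np.abs(W) * act_norm               # broadcasts over rows
    k = int(W.shape[1] * sparsity)             # weights to drop per row
    pruned = W.copy()
    if k > 0:
        idx = np.argsort(score, axis=1)[:, :k] # k smallest scores per row
        np.put_along_axis(pruned, idx, 0.0, axis=1)
    return pruned


def rtn_quantize(W, bits=4):
    """Data-free per-row min/max round-to-nearest quantization.

    Stand-in for a data-free quantizer; this is NOT HQQ.
    Returns the dequantized weights. Pruned zeros are not guaranteed
    to stay exactly zero, since 0 may not lie on the grid.
    """
    levels = 2 ** bits - 1
    w_min = W.min(axis=1, keepdims=True)
    w_max = W.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / levels
    scale = np.where(scale == 0, 1.0, scale)   # guard constant rows
    q = np.clip(np.round((W - w_min) / scale), 0, levels)
    return q * scale + w_min


def output_error(W_ref, W_new, X):
    """Relative error of the layer output on activations X."""
    ref = X @ W_ref.T
    return np.linalg.norm(ref - X @ W_new.T) / np.linalg.norm(ref)


# Toy comparison on random data (hypothetical sizes).
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128))
X = rng.normal(size=(256, 128))

err_quant_only = output_error(W, rtn_quantize(W), X)
err_prune_then_quant = output_error(W, rtn_quantize(wanda_prune(W, X)), X)
print("quant only:        ", err_quant_only)
print("prune then quant:  ", err_prune_then_quant)
```

On random Gaussian weights this says nothing about perplexity on a real model; it only gives a place to test ideas, for example whether pruning changes the per-row min/max range that the quantizer sees, or whether the data-aware step removes weights that would otherwise add quantization noise to the output.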