Any idea why pruning can improve perplexity?

r/LocalLLaMA
AI Research

I ran a small experiment: I combined a modified version of Wanda pruning with data-free quantization, HQQ to be exact. I won't lie, I may have made mistakes, and this is still just a research result, but in this specific combination it looks like pruning before quantization can improve quality. That may come down to the fact that I paired a data-free quantizer with a pruning step that does use data. Any idea why that could be? I would be happy to get feedback!

submitted by /u/ShotokanOSS
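For anyone who wants to poke at the idea, here is a minimal toy sketch of the two pipelines being compared, in pure Python. It is not the modified Wanda variant from the post and it does not use HQQ: the pruning step is the plain Wanda-style score (weight magnitude times the L2 norm of the matching input feature over calibration data), and the quantizer is a simple data-free round-to-nearest scheme with per-group min/max scaling, swapped in as a stand-in. All function names, sizes, and the random data are assumptions made for illustration.

```python
import math
import random


def wanda_prune(W, X, sparsity=0.5):
    """Wanda-style pruning (plain version, assumed here): score each weight
    by |w_ij| * ||x_j||_2 and zero the lowest-scoring fraction per output row."""
    n_in = len(W[0])
    # L2 norm of every input feature across the calibration samples.
    norms = [math.sqrt(sum(x[j] ** 2 for x in X)) for j in range(n_in)]
    k = int(n_in * sparsity)  # number of weights dropped in each row
    pruned = []
    for row in W:
        scores = [abs(w) * norms[j] for j, w in enumerate(row)]
        drop = set(sorted(range(n_in), key=lambda j: scores[j])[:k])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


def rtn_quantize(W, bits=4, group=8):
    """Data-free round-to-nearest quantization with per-group min/max scaling.
    A simple stand-in for HQQ, NOT the HQQ algorithm itself."""
    levels = 2 ** bits - 1
    out = []
    for row in W:
        q_row = []
        for s in range(0, len(row), group):
            g = row[s:s + group]
            lo, hi = min(g), max(g)
            scale = (hi - lo) / levels or 1.0  # guard against a constant group
            q_row.extend(round((w - lo) / scale) * scale + lo for w in g)
        out.append(q_row)
    return out


def output_error(W_ref, W_hat, X):
    """Mean squared difference between the layer outputs x @ W^T."""
    total, count = 0.0, 0
    for x in X:
        for r_ref, r_hat in zip(W_ref, W_hat):
            y_ref = sum(a * b for a, b in zip(r_ref, x))
            y_hat = sum(a * b for a, b in zip(r_hat, x))
            total += (y_ref - y_hat) ** 2
            count += 1
    return total / count


# Hypothetical toy layer: 4 outputs, 16 inputs, 32 calibration samples.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(16)] for _ in range(32)]
W = [[random.gauss(0, 1) for _ in range(16)] for _ in range(4)]

W_q = rtn_quantize(W)                   # quantize only
W_pq = rtn_quantize(wanda_prune(W, X))  # prune first, then quantize

print("quant only       :", output_error(W, W_q, X))
print("prune then quant :", output_error(W, W_pq, X))
```

One mechanism worth checking with a setup like this: pruning changes the min/max range of each group before the quantizer sees it, so the quantization grid itself is different in the two pipelines. Random Gaussian weights will not necessarily reproduce the effect, so swapping in real layer weights and real calibration activations is the meaningful test.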