Reducing MP3 compression bias in music datasets via codec-aware reconstruction
r/LocalLLaMA
•
Data Science
I built a tool to improve decoding of MP3 files (LAME encoded) reducing systematic codec induced bias in audio datasets. Rather than denoising, it treats reconstruction as a disambiguation problem: MP3 encoding is non-injective, so the observed signal corresponds to a distribution of plausible originals. The model approximates this as a Bayesian inference problem induced by the compression process itself, selecting a coherent signal consistent with both codec structure and musical priors.