Corruptions of Supervised Learning Problems: Typology and Mitigations

arXiv:2307.08643v4

Corruption is notoriously widespread in data collection. Despite extensive research, the existing literature predominantly focuses on specific settings and learning scenarios, and lacks a unified view of how corruption is modeled and mitigated. In this work, we develop a general theory of corruption, which incorporates all modifications to a supervised learning problem, including changes in model class and loss. Focusing on changes to the underlying probability distributions via Markov kernels, our approach leads to three novel opportunities.