
[R] From Garbage to Gold: A Formal Proof that GIGO Fails for High-Dimensional Data with Latent Structure — with a Connection to Benign Overfitting Prerequisites

r/MachineLearning

Paper: GitHub (R simulation, Paper Summary, Audio Overview):

I'm Terry, the first author. This paper has been 2.5 years in the making, and I'd genuinely welcome technical critique from this community.

The core result: we formally prove that for data generated by a latent hierarchical structure, Y ← S¹ → S² → S'², a Breadth strategy of expanding the predictor set asymptotically dominates a Depth strategy of cleaning a fixed predictor set. The proof follows from partitioning predictor-space noise into two formally distinct components: Predictor.
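To make the Breadth vs. Depth contrast concrete, here is a minimal toy simulation in Python. It is not the paper's model or its R simulation. Everything in it is an illustrative assumption: a scalar latent S¹, Gaussian noise, unit variances, a plain average as the predictor, and the hypothetical helper name `simulate_mse`. "Depth" is modeled as driving measurement noise to zero on a small fixed set of predictors, and "Breadth" as adding many noisy predictors.

```python
import random
import statistics


def simulate_mse(n_predictors, measurement_sd, n_samples=2000, seed=0):
    """Mean squared error of predicting Y from the average of noisy proxies.

    Toy generative model (an assumption, not the paper's specification):
      S1    ~ N(0, 1)                           latent driver
      Y     = S1 + N(0, 0.1^2)                  outcome
      S2_j  = S1 + N(0, 1)                      j-th predictor, an imperfect view of S1
      S'2_j = S2_j + N(0, measurement_sd^2)     observed (noisy) version of S2_j
    """
    rng = random.Random(seed)
    squared_errors = []
    for _ in range(n_samples):
        s1 = rng.gauss(0.0, 1.0)
        y = s1 + rng.gauss(0.0, 0.1)
        observed = [
            s1 + rng.gauss(0.0, 1.0) + rng.gauss(0.0, measurement_sd)
            for _ in range(n_predictors)
        ]
        # Deliberately simple predictor: the unweighted mean of the observed proxies.
        y_hat = statistics.fmean(observed)
        squared_errors.append((y - y_hat) ** 2)
    return statistics.fmean(squared_errors)


# Depth: 5 predictors, measurement noise removed entirely.
depth_mse = simulate_mse(n_predictors=5, measurement_sd=0.0)
# Breadth: 200 predictors, each left heavily contaminated.
breadth_mse = simulate_mse(n_predictors=200, measurement_sd=2.0)

print(f"depth   (k=5,   clean): {depth_mse:.3f}")
print(f"breadth (k=200, noisy): {breadth_mse:.3f}")
```

Under these assumptions the expected error is 0.01 + (1 + measurement_sd²) / k. Perfect cleaning with fixed k leaves a floor of 0.01 + 1/k, because the spread of each S² around S¹ cannot be cleaned away. Growing k shrinks both noise terms together. This only illustrates the intuition in a toy setting and says nothing about the conditions or rates in the formal proof.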