Training data attribution in diffusion models via mirrored unlearning and noise-consistent skew

Training data attribution (TDA) should enable generative model interpretability and support a variety of related downstream tasks. Nonetheless, current TDA approaches lack reliability and robustness, which prevents their adoption in real-world settings. The idea is to fine-tune a second model with bounded mirrored gradient