Synthetic Designed Experiments for Diagnosing Vision Model Failure

ArXi:2605.00832v1 Announce Type: cross Current synthetic data pipelines for computer vision generate images without diagnosing what the downstream model actually needs. This open-loop paradigm treats synthetic data as cheap real data, randomly sampling the generator's output space and hoping to cover the model's failure modes. We argue this fundamentally misuses synthetic data's unique property: the controllable, independent variation of scene factors.