Provably avoiding over-optimization in Direct Preference Optimization without knowing the data distribution