Dynamic Noise Preference Optimization: Self-Improvement of Large Language Models with Self-Synthetic Data

arXiv:2502.05400v4 Announce Type: replace

Although LLMs have achieved significant success, their reliance on large volumes of human-annotated data has limited their potential for further scaling. As a result, self-generated synthetic data has become crucial for fine-tuning LLMs without extensive human annotation. However, current methods often fail to deliver consistent improvements across iterations, with performance stagnating after only minimal updates. To overcome these challenges, we