Data-Prompt Co-Evolution: Growing Test Sets to Refine LLM Behavior

arXiv:2510.12728v3 Announce Type: replace-cross Large Language Models (LLMs) are increasingly embedded in applications, and people can shape model behavior by editing prompt instructions. Yet encoding subtle, domain-specific policies into prompts is challenging. Although this process often benefits from concrete test cases, test data and prompt instructions are typically developed as separate artifacts, reflecting traditional machine learning practices in which model tuning was slow and test sets were static.