AI RESEARCH
LLM Output Detectability and Task Performance Can be Jointly Optimized
arXiv CS.CL
arXiv:2605.01350v1 Announce Type: new Detecting machine-generated text is essential for transparency and accountability when deploying large language models (LLMs). Among detection approaches, watermarking is a statistically reliable method by design -- it embeds detectable signals into LLM outputs by biasing their token distributions. However, it has been reported that watermarked LLMs often perform worse on downstream tasks. We propose PUPPET, a framework that fine-tunes an LLM via reinforcement learning to generate text that is both detectable and better performing on downstream tasks.
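To make "biasing their token distributions" concrete, here is a minimal sketch of one common family of watermarks, often called green-list watermarking. It is not PUPPET and not taken from the paper: the toy vocabulary, the random logits standing in for a model, and the names `green_list`, `generate`, `z_score`, `GAMMA`, and `DELTA` are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
import math
import random

# All constants are hypothetical values chosen for this toy example.
VOCAB_SIZE = 50   # size of the toy vocabulary
GAMMA = 0.5       # fraction of the vocabulary marked "green" at each step
DELTA = 4.0       # logit bias added to green tokens when watermarking


def green_list(prev_token: int) -> set[int]:
    """Pick a pseudo-random green subset of the vocabulary, keyed on the previous token."""
    digest = hashlib.sha256(str(prev_token).encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))


def sample_token(logits: list[float], rng: random.Random) -> int:
    """Sample one token id from softmax(logits)."""
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(length: int, watermark: bool, seed: int = 0) -> list[int]:
    """Generate token ids; random logits stand in for a real language model."""
    rng = random.Random(seed)
    tokens = [0]  # arbitrary start token
    for _ in range(length):
        logits = [rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]
        if watermark:
            # Bias the distribution toward the green tokens for this step.
            for tok in green_list(tokens[-1]):
                logits[tok] += DELTA
        tokens.append(sample_token(logits, rng))
    return tokens


def z_score(tokens: list[int]) -> float:
    """Detection statistic: how far the green-token count exceeds chance."""
    n = len(tokens) - 1
    hits = sum(cur in green_list(prev) for prev, cur in zip(tokens, tokens[1:]))
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


if __name__ == "__main__":
    print("watermarked z:", round(z_score(generate(200, watermark=True)), 2))
    print("plain z:      ", round(z_score(generate(200, watermark=False)), 2))
```

In this sketch the detector needs only the hashing rule, not the model: unwatermarked text lands near a z-score of zero, while biased text scores far above it. The same bias is also what can pull outputs away from the tokens the model would otherwise prefer, which is the detectability versus task-performance tension the abstract describes.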