Dataset Watermarking for Closed LLMs with Provable Detection

arXiv:2605.06865v1 Announce Type: new

Large language models (LLMs) are pre-trained and post-trained on vast amounts of loosely curated data, raising the possibility that these models may have been trained on