When Tables Leak: Attacking String Memorization in LLM-Based Tabular Data Generation

arXiv:2512.08875v2 Announce Type: replace-cross

Large Language Models (LLMs) have recently demonstrated remarkable performance in generating high-quality synthetic tabular data. In practice, two primary approaches have emerged for adapting LLMs to tabular data generation: (i) fine-tuning smaller models directly on tabular datasets, and (ii) prompting larger models with examples provided in context. In this work, we show that popular implementations from both regimes exhibit a tendency to compromise privacy by reproducing memorized patterns of numeric digits from their.