Differentially Private Language Generation and Identification in the Limit

arXiv CS.CL

arXiv:2604.08504v1 Announce Type: cross

We initiate the study of language generation in the limit, a model recently [...] We then turn to the harder problem of language identification in the limit. Here, we show that privacy creates fundamental barriers. We prove that no $\varepsilon$-DP algorithm can identify a collection containing two languages with an infinite intersection and a finite set difference, a condition far stronger than the classical non-private characterization of identification. Next, we turn to the stochastic setting where the sample strings are sampled i.i.d.
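
As a point of reference for the identification barrier above, the following sketches the standard $\varepsilon$-DP guarantee together with one toy pair of languages that satisfies the stated condition. The neighboring relation (input sequences $x, x'$ differing in a single string), the symbols $A$, $S$, $L_1$, $L_2$, and the example itself are assumptions made for illustration; the abstract does not spell them out.

$$\Pr[A(x) \in S] \le e^{\varepsilon} \, \Pr[A(x') \in S] \quad \text{for all neighboring } x, x' \text{ and all output sets } S.$$

$$L_1 = \mathbb{N}, \qquad L_2 = \mathbb{N} \setminus \{0\}, \qquad |L_1 \cap L_2| = \infty, \qquad L_1 \setminus L_2 = \{0\}.$$

Here the two languages share infinitely many strings and differ on exactly one, which is the kind of pair the impossibility result would cover, assuming "finite set difference" refers to $L_1 \setminus L_2$.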