A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage

arXiv:2504.21035v4 Announce Type: replace-cross

Sanitizing sensitive text data typically involves removing personally identifiable information (PII) or generating synthetic data, on the assumption that these methods adequately protect privacy. However, their effectiveness is often assessed only by measuring the leakage of explicit identifiers, ignoring nuanced textual markers that can still lead to re-identification. We challenge this illusion of privacy by proposing a new framework that evaluates re-identification attacks to quantify individual privacy risks upon data release.
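To make the idea of re-identification through non-identifier text concrete, the following is a minimal sketch of a toy linkage attack. It is not the framework proposed in the paper, whose details the abstract does not give. It assumes an attacker who holds auxiliary text about each candidate individual and links a sanitized record to the candidate with the highest token overlap (Jaccard similarity). The function names (`tokens`, `jaccard`, `reidentify`, `reidentification_rate`), the placeholder format, and the two example records are all hypothetical and invented for illustration.

```python
import re


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reidentify(sanitized_text, auxiliary):
    """Link a sanitized record to the most similar auxiliary record.

    `auxiliary` maps a person id to text the attacker already holds
    (an assumption of this sketch). Returns (best_id, similarity).
    """
    s = tokens(sanitized_text)
    scored = [(jaccard(s, tokens(text)), pid) for pid, text in auxiliary.items()]
    score, pid = max(scored)
    return pid, score


def reidentification_rate(sanitized_records, auxiliary):
    """Fraction of sanitized records linked back to their true owner."""
    hits = sum(
        1
        for true_id, text in sanitized_records.items()
        if reidentify(text, auxiliary)[0] == true_id
    )
    return hits / len(sanitized_records)


# Invented toy data: explicit identifiers (name, age, city) are masked,
# but occupation and medical details remain in the sanitized text.
auxiliary = {
    "p1": "Alice Smith, a 34 year old nurse from Leeds, was treated for "
          "a rare knee injury after a climbing accident",
    "p2": "Bob Jones, a 58 year old teacher from Cardiff, was treated for "
          "chest pain after a long flight",
}
sanitized = {
    "p1": "[NAME], a [AGE] year old nurse from [CITY], was treated for "
          "a rare knee injury after a climbing accident",
    "p2": "[NAME], a [AGE] year old teacher from [CITY], was treated for "
          "chest pain after a long flight",
}

rate = reidentification_rate(sanitized, auxiliary)
print(f"re-identification rate: {rate:.2f}")
```

In this toy setting every masked record is still linked to the right person, even though no explicit identifier leaks. A PII-only leakage metric would miss that kind of risk.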