Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs

arXiv:2505.16831v3 Announce Type: replace-cross Unlearning in large language models (LLMs) aims to remove specified data, but its efficacy is typically assessed with task-level metrics like accuracy and perplexity. We show that these metrics can be misleading, as models can appear to forget while their original behavior is easily restored through minimal fine-tuning. This \emph{reversibility} suggests that information is merely suppressed, not genuinely erased. To address this critical evaluation gap, we.