AI RESEARCH

Language models recognize dropout and Gaussian noise applied to their activations

arXiv CS.AI

arXiv:2604.17465v1 Announce Type: new

We provide evidence that language models can detect, localize and, to a certain degree, verbalize the difference between perturbations applied to their activations. Specifically, at a target sentence we either (a) mask activations, simulating dropout, or (b) add Gaussian noise to them. We then ask a multiple-choice question such as "Which of the previous sentences was perturbed?" or "Which of the two perturbations was applied?"
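The two perturbations and the probing questions can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes activations are a per-token list of hidden vectors, that the target sentence maps to a contiguous token span, and that "dropout" means inverted dropout (zero each element with probability p, rescale survivors by 1/(1-p)). The function names `perturb_span` and `multiple_choice`, the parameters `p` and `sigma`, and the lettered answer format are all hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]  # one hidden vector (row) per token


def perturb_span(
    acts: Matrix,
    span: Tuple[int, int],
    kind: str,
    p: float = 0.1,
    sigma: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Matrix:
    """Return a copy of `acts` with the tokens in [start, end) perturbed.

    kind="dropout":  zero each element with probability p and rescale the
                     survivors by 1/(1-p) (inverted dropout; an assumption).
    kind="gaussian": add independent N(0, sigma^2) noise to each element.
    Hypothetical helper for illustration only.
    """
    if kind not in ("dropout", "gaussian"):
        raise ValueError(f"unknown perturbation: {kind}")
    if kind == "dropout" and not 0.0 <= p < 1.0:
        raise ValueError("dropout probability must be in [0, 1)")
    rng = rng or random.Random()
    start, end = span
    out = [row[:] for row in acts]  # copy so the clean activations survive
    for t in range(start, end):
        for j, x in enumerate(out[t]):
            if kind == "dropout":
                out[t][j] = 0.0 if rng.random() < p else x / (1.0 - p)
            else:
                out[t][j] = x + rng.gauss(0.0, sigma)
    return out


def multiple_choice(question: str, options: Sequence[str]) -> str:
    """Format a hypothetical lettered multiple-choice prompt (A., B., ...)."""
    lines = [f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options)]
    return "\n".join([question, *lines, "Answer:"])


if __name__ == "__main__":
    acts = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    noisy = perturb_span(acts, (1, 2), "gaussian", sigma=0.5, rng=random.Random(0))
    print(noisy)  # only row 1 differs from `acts`
    print(multiple_choice(
        "Which of the two perturbations was applied?", ["dropout", "Gaussian noise"]
    ))
```

Inverted dropout keeps each element's expected value unchanged, whereas Gaussian noise leaves every element nonzero; the abstract does not say whether the masked activations are rescaled, so the 1/(1-p) factor here is only one plausible choice.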