Beyond Forgetting: Machine Unlearning Elicits Controllable Side Behaviors and Capabilities

arXiv:2601.21702v3 Announce Type: replace We consider Representation Misdirection (RM), a class of large language model (LLM) unlearning methods that achieve forgetting by redirecting forget-representations, that is, the latent representations of forget-samples, toward a target vector. Despite its importance, the role of the target vector in RM remains underexplored. Here, we revisit RM through the lens of the Linear Representation Hypothesis.
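
To make the idea of redirecting forget-representations toward a target vector concrete, here is a minimal toy sketch. It is not the paper's implementation. The squared-distance objective, the function names `rm_forget_loss` and `redirect_step`, and the direct update of the representations (rather than of model parameters, as an actual unlearning method would do) are all assumptions made for illustration.

```python
# Toy sketch of representation misdirection (illustrative assumptions only).
# Representations are plain lists of floats; a real method would update model
# weights so that hidden activations on forget-samples move toward the target.

def rm_forget_loss(forget_reps, target):
    """Mean squared Euclidean distance from each forget-representation to the target."""
    total = 0.0
    for rep in forget_reps:
        total += sum((r - t) ** 2 for r, t in zip(rep, target))
    return total / len(forget_reps)


def redirect_step(forget_reps, target, lr=0.1):
    """One gradient-descent step on each sample's squared distance to the target.

    The gradient of ||r - t||^2 with respect to r is 2 * (r - t).
    """
    return [
        [r - lr * 2.0 * (r - t) for r, t in zip(rep, target)]
        for rep in forget_reps
    ]


# Hypothetical 2-dimensional forget-representations and target vector.
reps = [[0.0, 0.0], [2.0, 0.0]]
target = [1.0, 1.0]

loss_before = rm_forget_loss(reps, target)
loss_after = rm_forget_loss(redirect_step(reps, target), target)
print(loss_before, loss_after)  # the loss shrinks as representations move toward the target
```

In this toy setting the choice of `target` fully determines where the forget-representations end up, which is the quantity whose role the abstract describes as underexplored.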