AI RESEARCH
Probe-Geometry Alignment: Erasing the Cross-Sequence Memorization Signature Below Chance
arXiv cs.LG
arXiv:2605.01699v1 Announce Type: new
Recent attacks show that behavioural unlearning of large language models leaves internal traces recoverable by adversarial probes. We characterise where this retention lives and show it can be surgically removed without measurable capability cost. Our central protocol is a leave-one-out cross-sequence probe that tests whether a memorisation signature generalises across held-out sequences.
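The abstract does not specify the probe architecture or the data layout, so the following is only a minimal sketch of what a leave-one-out cross-sequence probe could look like, not the authors' implementation. It assumes a difference-of-means linear probe, per-token activation vectors, and a binary "memorised" label per sequence; the function names, the synthetic data, and the `signal_strength` parameter are all hypothetical and introduced purely for illustration.

```python
import numpy as np


def fit_mean_difference_probe(X, y):
    """Fit a linear probe whose direction is the difference of class means.

    The decision threshold sits at the midpoint between the two class means.
    This probe type is an assumption; the source does not name one.
    """
    mu_pos = X[y == 1].mean(axis=0)
    mu_neg = X[y == 0].mean(axis=0)
    w = mu_pos - mu_neg
    b = -0.5 * (mu_pos + mu_neg) @ w
    return w, b


def leave_one_out_cross_sequence_accuracy(X, y, seq_ids):
    """Train on all sequences but one, then score the probe on the held-out one.

    Returns a dict mapping each held-out sequence id to its probe accuracy.
    """
    accuracies = {}
    for held_out in np.unique(seq_ids):
        test_mask = seq_ids == held_out
        train_mask = ~test_mask
        w, b = fit_mean_difference_probe(X[train_mask], y[train_mask])
        predictions = (X[test_mask] @ w + b > 0).astype(int)
        accuracies[int(held_out)] = float((predictions == y[test_mask]).mean())
    return accuracies


def make_synthetic_activations(signal_strength, n_sequences=6,
                               tokens_per_sequence=20, dim=16, seed=0):
    """Build toy activations (hypothetical data, not from the paper).

    Even-numbered sequences are labelled memorised and shifted along one
    shared unit direction; all activations carry unit Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    X_parts, y_parts, id_parts = [], [], []
    for seq in range(n_sequences):
        label = 1 if seq % 2 == 0 else 0
        acts = rng.normal(size=(tokens_per_sequence, dim))
        acts += label * signal_strength * direction
        X_parts.append(acts)
        y_parts.append(np.full(tokens_per_sequence, label))
        id_parts.append(np.full(tokens_per_sequence, seq))
    return np.concatenate(X_parts), np.concatenate(y_parts), np.concatenate(id_parts)


if __name__ == "__main__":
    # A shared signature should transfer to held-out sequences;
    # with no signature, accuracy should hover around chance.
    for strength in (6.0, 0.0):
        X, y, seq_ids = make_synthetic_activations(signal_strength=strength)
        accs = leave_one_out_cross_sequence_accuracy(X, y, seq_ids)
        print(strength, np.mean(list(accs.values())))
```

Holding out whole sequences, rather than random tokens, is what makes the test cross-sequence: a probe can only score well on the held-out sequence if the signature it learned is shared across sequences rather than specific to one. The same split is available in scikit-learn as `LeaveOneGroupOut`, should a library-based probe be preferred.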