AI RESEARCH
Few Tokens, Big Leverage: Preserving Safety Alignment by Constraining Safety Tokens during Fine-tuning
arXiv cs.LG
arXiv:2603.07445v1 Announce Type: cross
Large language models (LLMs) often require fine-tuning (FT) to perform well on downstream tasks, but FT can induce safety-alignment drift even when the