Finding and Reactivating Post-Trained LLMs' Hidden Safety Mechanisms

arXiv:2604.00012v1 Announce Type: cross Despite the impressive performance of general-purpose large language models (LLMs), they often require fine-tuning or post-