AI RESEARCH
Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs
arXiv cs.LG
arXiv:2508.00161v3 Announce Type: replace The releases of powerful open-weight large language models (LLMs) are often not accompanied by access to their full