AI RESEARCH

Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs

arXiv CS.LG

arXiv:2508.00161v3 Announce Type: replace The releases of powerful open-weight large language models (LLMs) are often not accompanied by access to their full