Activation Exposure & Feature Interpretability for GGUF via llama-server
r/LocalLLaMA
You can now capture per-layer activation vectors from llama-server during inference, train sparse autoencoders (SAEs) on them, discover which internal features correspond to specific behaviors (sycophancy, hedging, creativity, etc.), and extract those features as GGUF control vectors for real-time steering.

What this is: a C++ patch to llama-server that adds `/activations` endpoints, plus a Python pipeline for the full SAE workflow.
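The post doesn't include the pipeline's code, so here is a minimal NumPy sketch of the SAE step. It assumes you already have one layer's activations as an `(n_samples, d_model)` array; the response format of `/activations` isn't described here, so synthetic toy data stands in for captured vectors. `train_sae`, `behavior_feature`, and `steering_vector` are hypothetical names, not the pipeline's API. The model is a standard single-layer ReLU SAE with an L1 sparsity penalty and unit-norm decoder rows. Feature discovery uses a simple mean-activation difference between samples that show a behavior and samples that don't.

```python
import numpy as np


def encode(acts, params):
    """Sparse feature activations: f = ReLU(x @ W_enc + b_enc)."""
    return np.maximum(acts @ params["W_enc"] + params["b_enc"], 0.0)


def decode(f, params):
    """Reconstruction: a nonnegative mix of decoder directions plus a bias."""
    return f @ params["W_dec"] + params["b_dec"]


def train_sae(acts, n_features, l1_coeff=1e-3, lr=0.05, steps=3000, seed=0):
    """Full-batch gradient descent on mean(||x - x_hat||^2 + l1_coeff * ||f||_1)."""
    rng = np.random.default_rng(seed)
    n, d = acts.shape
    W_dec = rng.normal(size=(n_features, d))
    W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)
    params = {
        "W_enc": W_dec.T.copy(),        # (d, k), initialised as the decoder transpose
        "b_enc": np.zeros(n_features),
        "W_dec": W_dec,                 # (k, d): row i is feature i's direction
        "b_dec": acts.mean(axis=0),
    }
    for _ in range(steps):
        pre = acts @ params["W_enc"] + params["b_enc"]
        f = np.maximum(pre, 0.0)
        g_recon = 2.0 * (decode(f, params) - acts) / n              # d(loss)/d(x_hat)
        g_f = g_recon @ params["W_dec"].T + l1_coeff * (f > 0) / n  # recon + L1 terms
        g_pre = g_f * (pre > 0)                                     # ReLU gradient mask
        params["W_dec"] -= lr * (f.T @ g_recon)
        params["b_dec"] -= lr * g_recon.sum(axis=0)
        params["W_enc"] -= lr * (acts.T @ g_pre)
        params["b_enc"] -= lr * g_pre.sum(axis=0)
        # Unit-norm decoder rows stop the L1 term being dodged by rescaling.
        params["W_dec"] /= np.linalg.norm(params["W_dec"], axis=1, keepdims=True)
    return params


def behavior_feature(params, acts_with, acts_without):
    """Feature whose mean activation rises most on samples showing the behavior."""
    diff = encode(acts_with, params).mean(axis=0) - encode(acts_without, params).mean(axis=0)
    return int(np.argmax(diff))


def steering_vector(params, feature, scale=4.0):
    """Scaled decoder direction for one feature (a candidate control vector)."""
    return scale * params["W_dec"][feature]


# Toy stand-in for captured activations: sparse mixes of 4 hidden unit directions, d=16.
rng = np.random.default_rng(1)
true_dirs = rng.normal(size=(4, 16))
true_dirs /= np.linalg.norm(true_dirs, axis=1, keepdims=True)
on = rng.random((2000, 4)) < 0.25
coeffs = on * rng.uniform(0.5, 1.5, size=(2000, 4))
acts = coeffs @ true_dirs + 0.01 * rng.normal(size=(2000, 16))

params = train_sae(acts, n_features=32)
# Treat "hidden direction 0 is active" as the behavior we want to find.
feat = behavior_feature(params, acts[on[:, 0]], acts[~on[:, 0]])
vec = steering_vector(params, feat)
print(feat, vec.shape)
```

Real use would swap the toy arrays for activations captured from prompts that do and don't exhibit a behavior such as sycophancy. Writing the vector out as a GGUF control vector for llama-server is not shown, since that depends on the pipeline's own export code.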