AI SAFETY & ETHICS

vLLM-Lens: Fast Interpretability Tooling That Scales to Trillion-Parameter Models

LessWrong AI

TL;DR: vLLM-Lens is a vLLM plugin for top-down interpretability techniques such as probes, steering, and activation oracles. We benchmarked it as 8-44× faster than existing alternatives for single-GPU use, though we note a planned version of nnsight closes this gap. To our knowledge it’s also the only tool that s all four common types of parallelism (pipeline, tensor, expert, data) and dynamic batching, enabling efficient multi-GPU and multi-node work on frontier open-weights models. It is also integrated with Inspect.