85 GPU-hours comparing 5 abliteration methods on Qwen3.6-27B: benchmarks, safety, weight forensics - Abliterlitics

r/LocalLLaMA

I've been building Abliterlitics, an open-source abliteration forensics toolkit. The idea is straightforward: take one base model, collect the abliterated variants other people have published using different techniques, and measure what actually changed using benchmarks, safety evaluation, distribution shift, and weight-level analysis. This post covers Qwen3.6-27B, comparing five abliteration variants against the base model. I recovered safetensors from HauhauCS's Q8_K_P GGUF, then ran 85 GPU-hours of benchmarks, HarmBench, KL divergence, and weight forensics across all six models (the base plus the five variants).
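For anyone unfamiliar with the distribution-shift part, here is a minimal sketch of a per-token KL divergence between a base model's and a variant's next-token distributions. This is not Abliterlitics code: the function names `log_softmax`, `kl_divergence`, and `mean_kl` are hypothetical, the direction KL(base || variant) is my assumption, and the logits are plain Python lists rather than real model outputs.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def kl_divergence(base_logits, variant_logits):
    """KL(P_base || P_variant) in nats for a single token position.

    Assumption: the base model is the reference distribution.
    """
    log_p = log_softmax(base_logits)
    log_q = log_softmax(variant_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def mean_kl(base_positions, variant_positions):
    """Average the per-position KL over a sequence of token positions."""
    kls = [kl_divergence(b, v) for b, v in zip(base_positions, variant_positions)]
    return sum(kls) / len(kls)

if __name__ == "__main__":
    # Toy two-token vocabulary: base is uniform, variant prefers token 0.
    base = [[0.0, 0.0], [1.0, 2.0]]
    variant = [[math.log(3.0), 0.0], [1.0, 2.0]]
    print(mean_kl(base, variant))
```

A value near zero means the variant predicts almost the same next tokens as the base on that prompt; larger values mean the edit moved the output distribution further, whether or not the prompt was one the model used to refuse.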