d318 is almost always suppressive in Qwen-2.5-3B emotional vectors, built an emotion vector steering pipeline, positive steering collapses to a single 'preschool teacher' register regardless of emotion
r/LocalLLaMA
•
AI Safety
Open Source AI
It appears that on lower weight models, behavior converges to either be highly sycophantic or neutral with no real in between, however existentialism did seem to be somewhat present. Using some heatmaps and visualizations, the cosine similarities between emotions appears coherent with what'd be expected, and there's really interesting dimensional dominances. In Qwen-2.5-3B, d318 is almost always the greatest in magnitude and almost always suppressive. Could be interesting for interpretability research.