Shifting Perspectives: Steering Vectors for Robust Bias Mitigation in LLMs

arXiv:2503.05371v3
We present a novel approach to bias mitigation in large language models (LLMs) that applies steering vectors to modify model activations during forward passes. We compute 8 steering vectors, each corresponding to a different social bias axis, such as age, gender, or race, on a