A Representation-Level Assessment of Bias Mitigation in Foundation Models

arXiv:2604.08561v1 (Announce Type: cross)

We investigate how successful bias mitigation reshapes the embedding space of encoder-only and decoder-only foundation models, offering an internal audit of model behaviour through representational analysis. Using BERT and Llama2 as representative architectures, we assess shifts in the associations between gender and occupation terms by comparing baseline and bias-mitigated variants of each model.
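The abstract does not say which association measure is used, so the sketch below is only an illustration under an assumption: a WEAT-style score, defined as the mean cosine similarity of an occupation embedding to male terms minus its mean similarity to female terms. The shift for each occupation is the change in the absolute score from the baseline model to the mitigated one. The functions `cosine`, `gender_association` and `association_shift` are hypothetical names. The embeddings are plain vectors that would, in practice, be extracted from BERT or Llama2. That extraction step is not shown and is not taken from the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gender_association(word: Vector, male: List[Vector], female: List[Vector]) -> float:
    """Hypothetical WEAT-style score: mean similarity to male terms minus
    mean similarity to female terms. Positive means closer to male terms,
    negative means closer to female terms, 0 means equidistant."""
    m = sum(cosine(word, v) for v in male) / len(male)
    f = sum(cosine(word, v) for v in female) / len(female)
    return m - f


def association_shift(
    baseline: Dict[str, Vector],
    mitigated: Dict[str, Vector],
    occupations: List[str],
    male_terms: List[str],
    female_terms: List[str],
) -> Dict[str, float]:
    """For each occupation, |score after mitigation| - |score at baseline|.
    Negative values mean the occupation moved towards gender neutrality."""
    shifts: Dict[str, float] = {}
    for occ in occupations:
        before = gender_association(
            baseline[occ],
            [baseline[t] for t in male_terms],
            [baseline[t] for t in female_terms],
        )
        after = gender_association(
            mitigated[occ],
            [mitigated[t] for t in male_terms],
            [mitigated[t] for t in female_terms],
        )
        shifts[occ] = abs(after) - abs(before)
    return shifts
```

With toy 2-D vectors, suppose "nurse" sits near "she" in the baseline and equidistant from "he" and "she" after mitigation. Its shift is negative, which this sketch would read as reduced gender association. Real analyses would use many attribute terms, contextual embeddings and some test of significance. This example models none of those.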