Fairness Evaluation and Inference-Level Mitigation in LLMs

arXiv:2510.18914v3 Announce Type: replace-cross

Large language models often display undesirable behaviors embedded in their internal representations. These behaviors undermine fairness and give rise to inconsistency and drift, amplification of harmful content, and the propagation of unwanted patterns over extended dialogues. Although