Social Bias in LLM-Generated Code: Benchmark and Mitigation

arXiv:2605.00382v1 Announce Type: cross Large Language Models (LLMs) are increasingly deployed to generate code for human-centered applications where demographic fairness is critical. However, existing evaluations focus almost exclusively on functional correctness, leaving social bias in LLM-generated code largely unexamined. Extending our prior work on Solar, we conduct a comprehensive empirical study using SocialBias-Bench, a benchmark of 343 real-world coding tasks spanning seven demographic dimensions.