Leveraging Wikidata for Geographically Informed Sociocultural Bias Dataset Creation: Application to Latin America

arXiv:2603.10001v1 Announce Type: cross Large Language Models (LLMs) exhibit inequalities across cultural contexts. Most prominent open-weights models are trained on data from the Global North and show prejudicial behavior towards other cultures. Moreover, there is a notable lack of resources for detecting biases in non-English languages, especially those of Latin America (Latam), a region that contains diverse cultures even though they share common cultural ground.