Robust Bias Evaluation with FilBBQ: A Filipino Bias Benchmark for Question-Answering Language Models

arXiv:2602.14466v2

With natural language generation becoming a popular use case for language models, the Bias Benchmark for Question-Answering (BBQ) has become an important benchmark format for evaluating the stereotypical associations exhibited by generative models. We expand the linguistic scope of BBQ and construct FilBBQ through a four-phase development process: template categorization, culturally aware translation, new template construction, and prompt generation.
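The prompt-generation phase can be illustrated with a minimal sketch. It assumes the usual BBQ item structure: an ambiguous context, an optional disambiguating sentence, a negative and a non-negative question, and three answer options that include an "unknown" choice. The `Template` fields, the `generate_prompts` function, the group labels, and the example sentences are all hypothetical and are not taken from the FilBBQ release.

```python
from dataclasses import dataclass


@dataclass
class Template:
    """Hypothetical BBQ-style template; field names are illustrative assumptions."""
    ambiguous_context: str       # context that does not reveal the answer
    disambiguating_context: str  # extra sentence that resolves the answer
    negative_question: str       # question aligned with the stereotype
    nonnegative_question: str    # complementary, non-negative question
    unknown_answer: str          # the "cannot be determined" option


def generate_prompts(template, group_pairs):
    """Expand one template into prompts for every (target, other) group pair.

    Each pair yields 2 context conditions x 2 question polarities = 4 prompts.
    Option order is kept fixed here; a real pipeline may shuffle it.
    """
    prompts = []
    for target, other in group_pairs:
        slots = {"target": target, "other": other}
        for context_condition in ("ambiguous", "disambiguated"):
            context = template.ambiguous_context.format(**slots)
            if context_condition == "disambiguated":
                # Append the sentence that makes the correct answer recoverable.
                context += " " + template.disambiguating_context.format(**slots)
            for polarity in ("negative", "nonnegative"):
                question = (template.negative_question if polarity == "negative"
                            else template.nonnegative_question).format(**slots)
                prompts.append({
                    "context_condition": context_condition,
                    "question_polarity": polarity,
                    "prompt": f"{context} {question}",
                    "options": [target, other, template.unknown_answer],
                })
    return prompts


# Illustrative usage with placeholder groups (not real FilBBQ content).
example_template = Template(
    ambiguous_context="A {target} and a {other} were waiting at the clinic.",
    disambiguating_context="The {other} had lost track of the date, while the {target} arrived early.",
    negative_question="Who forgot the appointment?",
    nonnegative_question="Who remembered the appointment?",
    unknown_answer="Cannot be determined",
)
prompts = generate_prompts(example_template,
                           [("person from Group A", "person from Group B")])
for p in prompts:
    print(p["context_condition"], p["question_polarity"], "|", p["prompt"])
```

Crossing context condition with question polarity mirrors how BBQ-style benchmarks separate a model's reliance on stereotypes when evidence is missing from its accuracy when the context does resolve the answer.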