AI RESEARCH
Using Embedding Models to Improve Probabilistic Race Prediction
arXiv cs.CL
arXiv:2604.22555v1 (Announce Type: new)

Estimating racial disparity requires individual-level race data, which are often unavailable because collecting such information is sensitive. To address this problem, many researchers use Bayesian Improved Surname Geocoding (BISG), which relies critically on Census surname data. Unfortunately, these data capture race-surname relationships only for common surnames, omitting approximately 10% of the US population.
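
For readers unfamiliar with BISG, the following is a minimal sketch of the standard Bayes update it is built on: the surname-based race probabilities act as a prior, and the share of each racial group living in the person's geographic area acts as a likelihood. It is not the method proposed in this paper. The function name `bisg_posterior` and all numbers are hypothetical, chosen only to illustrate the arithmetic.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname and geography evidence with Bayes' rule.

    posterior(race) is proportional to
        P(race | surname) * P(geo | race)
    Both arguments are dicts keyed by the same race labels.
    """
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no race category has nonzero probability")
    return {race: value / total for race, value in unnormalized.items()}


# Hypothetical inputs (illustrative values, not Census figures).
# P(race | surname): taken from a surname table when the name is listed.
surname_prior = {"white": 0.6, "black": 0.3, "hispanic": 0.1}
# P(geo | race): fraction of each group that lives in this area.
geo_likelihood = {"white": 0.001, "black": 0.004, "hispanic": 0.001}

posterior = bisg_posterior(surname_prior, geo_likelihood)
for race, prob in posterior.items():
    print(f"{race}: {prob:.3f}")
```

The sketch also shows where the coverage gap matters: when a surname is missing from the table, there is no `surname_prior` to start from, so the update has to fall back on some other source for that term.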