The Costs of Pretending That There Are Data-Generating Probability Distributions in the Social World
arXiv CS.LG
arXiv:2407.17395v5
Machine learning research, including work promoting fair or equitable algorithms, often relies on the concept of a data-generating probability distribution. The standard presumption is that, since data points are "sampled from" such a distribution, one can learn about the distribution from observed data and thus predict future data points that are also drawn from it. We argue, however, that such true probability distributions do not exist and that the rhetoric around them is harmful in social settings.