Bias in artificial intelligence can take many forms, from racial bias and gender prejudice to recruiting inequity and age discrimination. The underlying reason for AI bias lies in human prejudice, conscious or unconscious, lurking in AI algorithms throughout their development. AI solutions adopt and scale human biases.
One potential source of this issue is prejudiced hypotheses made when designing AI models, also known as algorithmic bias. Psychologists have described roughly 180 cognitive biases, some of which may find their way into these hypotheses and influence how AI algorithms are designed.
An example of algorithmic AI bias is assuming that a model will automatically be less biased when it is not given access to protected classes, say, race. In reality, removing the protected classes from the analysis doesn't erase racial bias from AI algorithms. The model can still produce prejudiced results by relying on related non-protected factors, for example, geographic data, a phenomenon known as proxy discrimination.
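The effect is straightforward to reproduce on synthetic data. The sketch below is a hypothetical example, assuming NumPy and scikit-learn; the feature names, group sizes, and effect sizes are invented. It trains an approval model that never sees the protected attribute but does see a correlated zip-code feature, and the approval gap between groups survives the omission.

```python
# A minimal sketch of proxy discrimination on synthetic data. Feature names,
# group sizes, and effect sizes are illustrative assumptions only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Protected attribute (e.g., race), deliberately withheld from the model.
group = rng.integers(0, 2, size=n)

# A seemingly neutral geographic feature that matches the group 90% of the time.
zip_region = np.where(rng.random(n) < 0.9, group, 1 - group)

# Genuine, group-independent signal (e.g., a normalized income score).
income = rng.normal(0.0, 1.0, size=n)

# Historical approval decisions carry bias: group 1 was approved less often
# at the same income level.
approve_prob = 1 / (1 + np.exp(-income)) - 0.2 * group
label = (rng.random(n) < approve_prob).astype(int)

# Train WITHOUT the protected attribute: only income and the zip-code region.
X = np.column_stack([income, zip_region])
model = LogisticRegression().fit(X, label)

pred = model.predict(X)
print("approval rate, group 0:", pred[group == 0].mean())
print("approval rate, group 1:", pred[group == 1].mean())
# The gap persists because zip_region acts as a proxy for the omitted attribute.
```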
Another common source of AI bias is the low quality of the data on which AI models are trained. The training data may incorporate human decisions or echo societal or historical inequities. For instance, if an employer uses an AI-based recruiting tool trained on historical employee data from a predominantly male industry, chances are the AI will replicate that gender bias.
The same applies to natural language processing algorithms. When learning from real-world data, such as news reports or social media posts, AI is likely to exhibit language bias and reinforce existing prejudices. This is what happened with Google Translate, which tends to be biased against women when translating from languages with gender-neutral pronouns: the AI engine powering the app is more likely to generate translations such as “he invests” and “she takes care of the children” than the other way around.
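In a purely statistical system this behaviour follows directly from corpus frequencies. The toy sketch below uses an invented miniature corpus and resolves a gender-neutral source pronoun by picking whichever pronoun co-occurs with the phrase most often in the training text; the skewed counts alone are enough to produce the skewed output.

```python
# A toy illustration of how corpus statistics alone can yield gendered
# translations for a gender-neutral pronoun. The "corpus" is invented.
from collections import Counter

corpus = [
    "he invests", "he invests", "he invests", "she invests",
    "she takes care of the children", "she takes care of the children",
    "he takes care of the children",
]

# Count which pronoun co-occurs with each phrase in the training text.
counts = Counter()
for sentence in corpus:
    pronoun, _, phrase = sentence.partition(" ")
    counts[(phrase, pronoun)] += 1

def resolve_neutral_pronoun(phrase):
    """Pick the pronoun most often seen with this phrase, the way a purely
    statistical system resolves a gender-neutral source pronoun."""
    he, she = counts[(phrase, "he")], counts[(phrase, "she")]
    return "he" if he >= she else "she"

print(resolve_neutral_pronoun("invests"))                    # -> "he"
print(resolve_neutral_pronoun("takes care of the children")) # -> "she"
```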
AI bias can also stem from the way training data is collected and processed. The mistakes data scientists may fall prey to range from excluding valuable entries and inconsistent labeling to under- and oversampling. Undersampling, for example, can skew the class distribution and make AI models ignore minority classes completely.
Oversampling, in turn, may lead to the over-representation of certain groups or factors in the training datasets. For instance, crimes committed in locations frequented by the police are more likely to be recorded in the training dataset simply because that is where the police patrol. Consequently, the algorithms trained on such data are likely to reflect this disproportion.
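The undersampling effect is easy to demonstrate. In the hypothetical sketch below (NumPy and scikit-learn assumed, with invented data), a classifier trained on a set where the minority class makes up about one percent of the examples learns to ignore that class almost entirely, even though overall accuracy still looks impressive.

```python
# A minimal sketch of how a skewed class distribution makes a model ignore
# the minority class. Data and proportions are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

rng = np.random.default_rng(1)

def make_data(n_majority, n_minority):
    """Two overlapping Gaussian blobs: class 0 (majority) and class 1 (minority)."""
    X0 = rng.normal(0.0, 1.0, size=(n_majority, 2))
    X1 = rng.normal(1.0, 1.0, size=(n_minority, 2))
    X = np.vstack([X0, X1])
    y = np.array([0] * n_majority + [1] * n_minority)
    return X, y

# Undersampled training and test sets: roughly 1% minority class.
X_train, y_train = make_data(5_000, 50)
X_test, y_test = make_data(5_000, 50)

model = LogisticRegression().fit(X_train, y_train)
pred = model.predict(X_test)

print("overall accuracy:     ", round((pred == y_test).mean(), 3))
print("minority-class recall:", round(recall_score(y_test, pred), 3))
# Accuracy stays near 99% while recall on the minority class collapses toward
# zero: the decision boundary is dominated by the over-represented class.
```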
A no less important source of AI bias is the feedback of real-world users interacting with AI models. People may reinforce bias baked into already deployed AI models, often without realizing it. For example, a credit card company may use an AI algorithm that mildly reflects social bias to advertise its products, targeting less-educated people with offers featuring higher interest rates. These people may find themselves clicking on such ads without knowing that other social groups are shown better offers, thus scaling the existing bias.
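A feedback loop of this kind can be simulated in a few lines. In the hypothetical sketch below, with invented group names, rates, and update rule, an ad-targeting policy starts with a mild skew in which group sees the high-interest offer; each retraining round shifts exposure toward whichever group produced more recorded clicks, and the small initial gap widens steadily even though both groups click at exactly the same rate.

```python
# A toy simulation of a bias-reinforcing feedback loop in ad targeting.
# Group names, rates, and the update rule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)

n_users_per_group = 10_000
base_click_rate = 0.10          # identical for both groups
step = 0.05                     # how far exposure shifts toward the "winner"

# Initial, mildly biased probability of showing the high-interest-rate offer.
show_high_rate = {"group_a": 0.55, "group_b": 0.45}

for round_ in range(6):
    clicks = {}
    for group, p_show in show_high_rate.items():
        shown = rng.random(n_users_per_group) < p_show
        clicked = shown & (rng.random(n_users_per_group) < base_click_rate)
        clicks[group] = int(clicked.sum())

    # "Retraining": shift exposure toward the group that produced more recorded
    # clicks, mistaking the model's own skewed exposure for genuine preference.
    winner = max(clicks, key=clicks.get)
    loser = min(clicks, key=clicks.get)
    shift = min(step, show_high_rate[loser])
    show_high_rate[winner] += shift
    show_high_rate[loser] -= shift

    print(round_, {g: round(p, 2) for g, p in show_high_rate.items()})
# The initial 0.55 / 0.45 gap widens round after round, even though underlying
# click behaviour is identical across the two groups.
```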