Bias & Fairness in ML
Key Idea: What is Bias in ML?
Bias in ML (not to be confused with the bias term, denoted b or w0) refers to the intentional or inadvertent stereotyping, prejudice, or favoritism toward certain things, people, or groups over others, which may affect (1) the collection and interpretation of data, (2) the design of a system, and (3) how users interact with a system. Bias in ML may also refer to systematic error introduced by flawed sampling or reporting procedures.
Several known types of ML bias include automation bias, confirmation bias, experimenter’s bias, group attribution bias, implicit bias, in-group bias, out-group homogeneity bias, coverage bias, non-response bias, participation bias, reporting bias, sampling bias, and selection bias.
Watch & Read: ML & Human Bias
Machine Learning and Human Bias
Cite as: Machine Learning and Human Bias. Google News Initiative; 2017.
Lesson 6: Bias in Machine Learning
Cite as: Bias in Machine Learning. Google News Initiative; 2017.
Watch this video from Google's News Initiative and review the accompanying guide.
In this brief video and accompanying guide from Google's News Initiative, we're introduced to the concept of human bias in ML via a simple directive: "Picture a shoe."
Though it may not be immediately obvious, each of us is biased toward one image of what a "shoe" looks like over the others.
A few key points:
When we train models informed by developers' perceptions, we expose those models to bias
Human biases often become part of the technology we create
Decisions based on data are not always neutral
Deeper Dive: Fairness in ML
Fairness
Cite as: Mitchell, M. et al. Machine Learning Crash Course: Fairness. Google Developers; 2019.
Figure 2 from Mitchell, M. et al. Machine Learning Crash Course: Fairness. Google Developers; 2019.
Review this mini-lecture offered by Margaret Mitchell from Google's Machine Learning Crash Course.
In this mini-lecture from Google's Machine Learning Crash Course, Margaret Mitchell talks about human bias mitigation and fairness in ML.
A few key points:
When we store our ideas of the world, we store them based on what's typical or prototypical of objects (e.g., "bananas are yellow").
When the conditions of an object cohere with our stored understanding, we tend not to mention those conditions (e.g., "bananas" vs. "yellow bananas").
Categorizing information using defining characteristics that may vary (such as color) impacts how we interact with examples that don't fit that mold. This is a form of bias.
How Do Biases Affect ML Systems?
Reporting bias: The tendency for people to report things in a way that isn't a reflection of real-world frequencies, or the degree to which a property is characteristic of a class of individuals
Selection bias: Erroneous conclusions drawn from data sampled in a way that is not adequately representative of the population of interest (a small numeric sketch follows this list)
Overgeneralization: Conclusions drawn from information that is too limited or not specific enough
This is closely related to out-group homogeneity bias: The tendency to assume that "out-group" members are somehow more similar to one another (i.e., more homogeneous) and less nuanced compared to "in-group" members.
Confirmation bias: The tendency to search for, interpret, and favor information that confirms one's own preexisting beliefs and hypotheses
Automation bias: The tendency to favor suggestions and decisions from automated systems (e.g., ML systems) as if they were somehow more objective than information sourced from humans
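To make the first two of these concrete, here is a minimal Python sketch (entirely synthetic numbers, not drawn from any of the linked resources) of how selection and non-response bias can distort a simple estimate before any model is even trained:

```python
# Minimal sketch (synthetic numbers): selection / non-response bias.
# We want average patient satisfaction, but only patients who attend a
# follow-up visit answer the survey, a non-representative slice.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

attended_followup = rng.random(n) < 0.4                    # 40% respond
# Hypothetical: satisfaction is higher among those who came back.
satisfaction = rng.normal(np.where(attended_followup, 7.5, 5.0), 1.5)

print(f"Population mean satisfaction:  {satisfaction.mean():.2f}")
print(f"Mean among survey respondents: {satisfaction[attended_followup].mean():.2f}")
# Any model trained only on the respondents inherits this upward distortion.
```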
What Strategies Can We Use to Design for Fairness?
Consider the problem. What is being adequately represented in your dataset and what might be overlooked?
Solicit feedback from experts. They may provide guidance or suggest changes that may enable your project to have a more sustainable (and positive) impact.
Train models to account for bias. What do outliers look like in your dataset? How does your model handle outliers? What implicit assumptions does your system integrate, and how might you model or mitigate those? (A quick data-inspection sketch follows this list.)
Interpret outcomes. Is the ML system overgeneralizing or overlooking dynamic social context? If a human being were to perform the system's task instead, what would appropriate social behavior look like? What interpersonal cues does the system not account for?
Publish with context and disclosure. What are the appropriate scenarios in which to apply this model? What are the model's limitations?
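Before reaching for a formal mitigation, it helps simply to look. The sketch below uses a hypothetical, made-up dataframe (not drawn from any of the linked resources) to show the kind of quick inspection that surfaces rare groups and extreme values before training:

```python
# Minimal sketch (hypothetical dataframe): a quick look at which groups and
# values are rare or extreme before any model is trained.
import pandas as pd

# df stands in for your training data; the column names are made up.
df = pd.DataFrame({
    "age": [34, 71, 29, 88, 45, 23, 67, 102],
    "insurance": ["private", "medicare", "private", "medicare",
                  "uninsured", "private", "medicare", "uninsured"],
    "satisfaction": [8, 6, 9, 3, 5, 9, 7, 2],
})

print(df.describe())                                   # spot extreme numeric values
print(df["insurance"].value_counts(normalize=True))    # who is under-represented?
# Groups that barely appear here are exactly the ones a model is most
# likely to overgeneralize about.
```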
Explore: Google's Fairness in ML Glossary
Several of the definitions noted in the above section can be found in Google's Fairness in ML Glossary.
Take a moment to explore these definitions and accompanying examples.
Explore: Google's Responsible AI Practices
Google AI's Responsible AI Practices webpage houses general recommended practices for AI/ML systems and provides examples of Google's work to promote responsible AI/ML, including the use of ML model cards for model disclosures.
Explore the page and review Google's recommended practices, including:
Use a human-centered design approach
Identify multiple metrics to assess training and monitoring (see the per-group metrics sketch after this list)
When possible, directly examine raw data
Understand the limitations of your dataset and model
Test, Test, Test
Continue to monitor and update the system after deployment
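As one way to act on the "identify multiple metrics" practice, the hypothetical sketch below (synthetic labels and predictions; not Google's code) stratifies standard metrics by subgroup rather than reporting a single overall number:

```python
# Minimal sketch (synthetic labels/predictions): report several metrics
# per subgroup instead of one overall accuracy number.
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

rng = np.random.default_rng(1)
n = 2_000
group = rng.choice(["A", "B"], size=n, p=[0.8, 0.2])       # imbalanced subgroups
y_true = rng.integers(0, 2, size=n)
# Hypothetical model: the error rate is worse for the smaller group B.
flip = rng.random(n) < np.where(group == "B", 0.35, 0.10)
y_pred = np.where(flip, 1 - y_true, y_true)

print(f"Overall accuracy: {accuracy_score(y_true, y_pred):.2f}")
for g in ("A", "B"):
    m = group == g
    print(f"  group {g}: acc={accuracy_score(y_true[m], y_pred[m]):.2f}  "
          f"precision={precision_score(y_true[m], y_pred[m]):.2f}  "
          f"recall={recall_score(y_true[m], y_pred[m]):.2f}")
# A healthy-looking overall number can hide a much weaker subgroup.
```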
Read: What Happens When an Algorithm Cuts Your Health Care?
This 2018 article, written by Colin Lecher for The Verge, highlights the consequences that followed when the Arkansas Department of Human Services used a resource optimization algorithm that inadvertently cut care coverage and service hours for formerly qualifying residents, with no clear explanation why.
Rife with legal intricacies and an admission of model development errors by the vendor, the story offers a key example of how automation bias and a lack of transparent model disclosures can negatively impact people seeking care in community and home health settings.
Read the article and consider how model deployment without due disclosures may perpetuate errors and biases.
Cite as: Lecher C. What Happens When an Algorithm Cuts Your Health Care? The Verge. Published online March 21, 2018.
Read: Potential Biases in ML Algorithms Using EHR Data
Review this paper written by Milena Gianfrancesco and colleagues.
This paper underscores the potential for ML models which use EHR data to inadvertently perpetuate biases in healthcare settings.
A few key points:
While the use of ML methodologies in healthcare offers tremendous opportunity to improve the efficacy and efficiency of care, concern exists that biases and deficiencies in the data used by ML algorithms may contribute to socioeconomic disparities in care.
Key problems outlined include:
Missing Data and Patients Not Identified by Algorithms
Models may be misled when key data are unavailable in the EHR or are captured only as metadata that are not included in clinical decision support models.
Sample Size and Underestimation
Insufficient sample sizes may result in erroneous interpretations and predictions by ML models, as small sample sizes or a lack of healthcare service utilization may be misinterpreted as lower disease burden.
Misclassification and Measurement Error
Misclassification of disease and measurement errors are common sources of bias in observational studies and analyses based on EHR data. For example, quality of care may be affected by implicit biases if uninsured patients more frequently receive substandard care than those with insurance.
While the effects of measurement error and misclassification in regression models are well studied, these effects and mitigation strategies in the broader context of ML require further assessment.
Recommendations:
Review datasets to assess representation across demographic categories and care interruptions
Optimize ML models for imbalanced data sets (a minimal class-weighting sketch follows this reading)
Work with interdisciplinary teams to choose appropriate questions and settings for machine learning use, interpret findings, and conduct follow-up studies.
Test clinical decision support models for bias and discriminatory aspects throughout all stages of development to ensure that models are not misinterpreting exposure-disease associations, including associations based on sex, race/ethnicity, or insurance.
Ensure that efforts to assess the utility of ML models do not focus solely on performance metrics, but also on improvements in clinically relevant outcomes.
Cite as: Gianfrancesco MA, Tamang S, Yazdany J, Schmajuk G. Potential Biases in Machine Learning Algorithms Using Electronic Health Record Data. JAMA Intern Med. 2018 Nov 1;178(11):1544-1547. doi: 10.1001/jamainternmed.2018.3763. PMID: 30128552; PMCID: PMC6347576.
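As a deliberately simplified illustration of the "optimize ML models for imbalanced data sets" recommendation, the sketch below uses synthetic data and scikit-learn's class_weight option; it shows one mitigation worth evaluating, not the paper's own method:

```python
# Minimal sketch (synthetic data): class reweighting for an imbalanced
# outcome, and its effect on recall for the rare class.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# ~5% positive class, e.g. a rare diagnosis (hypothetical numbers).
X, y = make_classification(n_samples=5_000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for cw in (None, "balanced"):
    clf = LogisticRegression(class_weight=cw, max_iter=1_000).fit(X_tr, y_tr)
    print(f"class_weight={str(cw):>8}: "
          f"recall on rare class = {recall_score(y_te, clf.predict(X_te)):.2f}")
# Unweighted models often learn to ignore the rare class; reweighting (or
# resampling) is one mitigation to evaluate, not a guarantee of fairness.
```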
Exercise: Does Your Model Perpetuate Bias?
In this exercise, we are going to build a simple neural network using synthetic patient satisfaction data to make a crucial funding decision... But things won't go quite right.
Directions:
Make a copy of this Google Colaboratory notebook in your Google Drive account (also found on GitHub)
Follow the annotated directions to generate the model
This exercise is adapted from a bias lab developed by the awesome team at Crash Course AI.
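The Colab notebook contains the full lab. The sketch below is not the notebook's code; it is just a minimal, hypothetical illustration of the kind of setup involved: a small neural network trained on synthetic satisfaction data in which a demographic proxy is correlated with the historical labels, so the resulting "funding decisions" end up tracking the demographic as well.

```python
# Minimal sketch (hypothetical, NOT the Colab notebook's code): a small
# neural network trained on synthetic patient-satisfaction data in which a
# demographic proxy is correlated with the historical labels.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(42)
n = 4_000
clinic_quality = rng.normal(size=n)                  # what we WANT to reward
demographic = rng.integers(0, 2, size=n)             # proxy attribute (0/1)
# Historical "satisfied" labels are tainted: group 1 reports lower
# satisfaction for the same quality of care (reporting bias).
satisfied = (clinic_quality - 0.8 * demographic
             + rng.normal(scale=0.5, size=n)) > 0

X = np.column_stack([clinic_quality, demographic])
X_tr, X_te, y_tr, y_te, d_tr, d_te = train_test_split(
    X, satisfied, demographic, random_state=0)

model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1_000, random_state=0)
model.fit(X_tr, y_tr)
funded = model.predict(X_te)                         # pretend funding decisions

for g in (0, 1):
    print(f"Demographic group {g}: funded {funded[d_te == g].mean():.0%} of the time")
# The model faithfully reproduces the bias baked into the labels; detecting
# and reasoning about exactly this is the point of the exercise.
```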
Share: #BiasInML
Thought prompt: In what ways can ML models that perpetuate bias impact healthcare or public health contexts? What risks do models which exacerbate existing biases pose to the goals of achieving health equity and mitigating health disparities?
Share your thoughts on Twitter using the hashtags #MDSD4Health #BiasInML
Tag us to join the conversation! @MDSD4Health
For ideas on how to take part in the conversation, check out our Twitter Participation Guide.
Bonus Material!
Want to learn more about bias & computing?
Check out Ethical CS: a curriculum centered around ethics & computer science, developed by Evan Peck of Bucknell University.