Bias can creep in at many stages of the deep-learning process, and the standard practices in computer science aren’t designed to detect it (1).
Imagine that you’ve taken up the sport of archery. With practice, you achieve consistency, but when you aim at the center of the target, the arrows land 5 cm to the right of center. What would you do?
Naturally, you’d adjust your aim. By aiming 5 cm to the left, you overcome your systematic bias, and your arrows now find their mark. Skilled archers know to adjust their aim to account for factors such as their own bias, the distance to the target, and the presence of crosswind.
The machine learning (ML) models that have begun to show success in medical imaging require even more careful attention to address the effects of bias. It’s well known that ML models are supremely adept at recognizing patterns. Often, though, the patterns that they learn can incorporate features that the systems’ authors never intended. We’ve laughed—and despaired—over the systems that diagnosed pneumonia based on a radiographic marker (2) or those that detected pneumothorax based on confounding image features, such as an inserted chest tube (3).
Such failures are examples of “shortcut learning,” in which an ML system learns a pattern that fits its training data but fails to generalize to the complexities of the real world (4). Because ML systems are trained on historical data, they also can learn shortcuts that are not merely undesirable but that encode historical prejudice. A résumé-screening system built to select a technology company’s top job candidates was found to discriminate against women (5). A widely used algorithm for allocating health care resources was found to be racially biased (6). Given that artificial intelligence (AI) systems can discern a patient’s race and sex from a chest radiograph (7), we must be attentive to the possibility that unintended biases in radiology ML systems could interact adversely with other health care AI models.
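A toy example can make shortcut learning concrete. The sketch below uses purely synthetic data; the “chest tube” feature and all numbers are illustrative, not drawn from the cited studies. A spurious feature that tracks the label in the training data lets a classifier score well on similarly confounded test data, yet fail once the confounding is removed.

```python
# Minimal sketch of shortcut learning on synthetic data (all names and
# numbers are illustrative). A spurious "chest tube" indicator correlates
# with the pneumothorax label during training, and the model latches onto it.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, tube_label_corr):
    """Binary label y; a weak 'true' image feature; a chest-tube indicator
    that matches y with probability tube_label_corr."""
    y = rng.integers(0, 2, n)
    true_feature = y + rng.normal(0, 2.0, n)                 # weak real signal
    tube = np.where(rng.random(n) < tube_label_corr, y, 1 - y)
    return np.column_stack([true_feature, tube]), y

X_train, y_train = make_data(5000, tube_label_corr=0.95)       # confounded
X_conf, y_conf = make_data(2000, tube_label_corr=0.95)         # same shortcut
X_clean, y_clean = make_data(2000, tube_label_corr=0.50)       # de-confounded

clf = LogisticRegression().fit(X_train, y_train)
print("accuracy, confounded test set:    %.2f" % clf.score(X_conf, y_conf))
print("accuracy, de-confounded test set: %.2f" % clf.score(X_clean, y_clean))
```

The gap between the two test accuracies is the signature of a shortcut: the model looks excellent when evaluated on data that shares the confounder, and mediocre once the shortcut no longer predicts the label.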
Several proposed approaches can help make ML models—and the data from which they’re derived—more transparent. “Model cards” can describe how a model performs across a variety of demographic or phenotypic groups; they help disclose the context in which the model is intended to be used (8). “Datasheets for datasets” can document the motivation, composition, collection process, and recommended uses of a given dataset; these datasheets have potential to increase transparency and accountability (9).
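As a rough illustration, a model card can be kept as machine-readable metadata alongside the model itself. The sketch below uses hypothetical field names that loosely follow the sections proposed in the model-card paper (8); it is not an official schema, and the model name and numbers are invented.

```python
# Sketch of a machine-readable "model card" (8) with hypothetical fields;
# reporting per-subgroup metrics makes performance gaps visible up front.
model_card = {
    "model_details": {
        "name": "cxr-pneumothorax-demo",   # hypothetical model
        "version": "0.1",
        "intended_use": "Triage of adult chest radiographs; not for pediatric use.",
    },
    "training_data": "See accompanying datasheet (9) for collection process.",
    "evaluation": {
        "auc_overall": 0.91,               # illustrative numbers only
        "auc_by_group": {"female": 0.92, "male": 0.90,
                         "age<40": 0.88, "age>=40": 0.93},
    },
    "caveats": ["Performance not validated on portable radiographs."],
}

# Simple check: flag any subgroup whose AUC trails the overall figure.
for group, auc in model_card["evaluation"]["auc_by_group"].items():
    if auc < model_card["evaluation"]["auc_overall"] - 0.02:
        print(f"Review subgroup performance: {group} (AUC {auc:.2f})")
```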
To address the particular challenges of bias in medical imaging AI systems, Dr Bradley Erickson and colleagues have produced a new collection of articles on “Mitigating Bias in Radiology Machine Learning” (https://pubs.rsna.org/page/ai/mitigating_bias). The collection describes approaches to identify and reduce bias in radiology ML systems, addressing the three principal stages at which bias can affect an ML model: data handling (10), model development (11), and performance evaluation (12).
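To illustrate the evaluation stage, here is a minimal sketch of one such check: computing a discrimination metric per demographic subgroup rather than only in aggregate. The column names, scores, and grouping variable are hypothetical, for illustration only.

```python
# Sketch of a subgroup performance check: compute AUC per demographic group
# instead of a single aggregate figure. All data here are hypothetical.
import pandas as pd
from sklearn.metrics import roc_auc_score

# Per-patient results: true label, model score, and a demographic tag.
df = pd.DataFrame({
    "y_true":  [0, 1, 1, 0, 1, 0, 1, 0, 1, 0],
    "y_score": [0.2, 0.9, 0.7, 0.4, 0.6, 0.1, 0.8, 0.5, 0.3, 0.2],
    "sex":     ["F", "F", "F", "F", "F", "M", "M", "M", "M", "M"],
})

overall = roc_auc_score(df["y_true"], df["y_score"])
print(f"overall AUC: {overall:.2f}")
for group, part in df.groupby("sex"):
    auc = roc_auc_score(part["y_true"], part["y_score"])
    print(f"  {group}: AUC {auc:.2f} (n={len(part)})")
```

An aggregate metric can mask a subgroup on which the model performs poorly; breaking the metric out by group is one of the simplest ways to surface that failure before deployment.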
These articles will help us identify and mitigate potential biases in radiology ML systems, and thus better assure trustworthy results that are truly “on target” for the care of our patients.
Footnotes
The author declared no funding for this work.
Disclosures of conflicts of interest: C.E.K. Salary support from RSNA, paid to employer, for editorial role (Editor of Radiology: Artificial Intelligence).
References
- 1. Hao K. This is how AI bias really happens—and why it’s so hard to fix. MIT Technology Review. https://www.technologyreview.com/2019/02/04/137602/this-is-how-ai-bias-really-happensand-why-its-so-hard-to-fix/. Published February 4, 2019.
- 2. Zech JR, Badgeley MA, Liu M, Costa AB, Titano JJ, Oermann EK. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLoS Med 2018;15:e1002683.
- 3. Rueckel J, Trappmann L, Schachtner B, et al. Impact of confounding thoracic tubes and pleural dehiscence extent on artificial intelligence pneumothorax detection in chest radiographs. Invest Radiol 2020;55:792–798.
- 4. Geirhos R, Jacobsen JH, Michaelis C, et al. Shortcut learning in deep neural networks. Nat Mach Intell 2020;2:665–673.
- 5. Dastin J. Amazon scraps secret AI recruiting tool that showed bias against women. Reuters. https://www.reuters.com/article/idUSKCN1MK08G. Published October 10, 2018.
- 6. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 2019;366:447–453.
- 7. Gichoya JW, Banerjee I, Bhimireddy AR, et al. AI recognition of patient race in medical imaging: a modelling study. Lancet Digit Health 2022;4:e406–e414.
- 8. Mitchell M, Wu S, Zaldivar A, et al. Model cards for model reporting. Proceedings of the Conference on Fairness, Accountability, and Transparency. https://doi.org/10.1145/3287560.3287596. Published January 29, 2019.
- 9. Gebru T, Morgenstern J, Vecchione B, Vaughan JW, Wallach H, Daumé H III, Crawford K. Datasheets for datasets. Commun ACM 2021;64:86–92.
- 10. Rouzrokh P, Khosravi B, Faghani S, et al. Mitigating bias in radiology machine learning: 1. Data handling. Radiol Artif Intell 2022;4(5):e210290.
- 11. Zhang K, Khosravi B, Vahdati S, et al. Mitigating bias in radiology machine learning: 2. Model development. Radiol Artif Intell 2022;4(5):e220010.
- 12. Faghani S, Khosravi B, Zhang K, et al. Mitigating bias in radiology machine learning: 3. Performance metrics. Radiol Artif Intell 2022;4(5):e220061.
