Published in final edited form as: Am J Bioeth. 2022 May;22(5):4–7. doi: 10.1080/15265161.2022.2059206

Promoting Ethical Deployment of Artificial Intelligence and Machine Learning in Healthcare

Kayte Spector-Bagdady, Vasiliki Rahimzadeh, Kaitlyn Jaffe, and Jonathan Moreno

INTRODUCTION

The ethics of artificial intelligence (AI) and machine learning (ML) exemplify the conceptual struggle between applying familiar pathways of ethical analysis versus generating novel strategies. Melissa McCradden et al.’s “A Research Ethics Framework for the Clinical Translation of Healthcare Machine Learning” puts pressure on this tension while still attempting not to break it—trying to impute structure and epistemic consistency where it is currently lacking (McCradden et al. 2022). They highlight an “AI chasm” “generated by a clash between the…cultures of computer science and clinical science,” but argue that the “ethical norms of human subjects research” are still the right pathway to bridge this divide.

The Open Peer Commentaries included in this issue agree with this central premise while critiquing the insufficiency of current ethics and regulatory solutions to adequately protect communities at higher risk for ML bias. The current U.S. human subjects research regulations (HSRR) were developed from traditional conceptions of research ethics (Schupmann and Moreno 2020). Research ethicists have subsequently been asked to apply existing concepts to new areas that the regulatory structure does not reach. AI/ML is an excellent example of the strengths and limitations of this current default.

STAGE 0: DATASET GENERATION AND PROCUREMENT

McCradden et al. propose an evaluation framework that considers ethical issues in clinical applications of ML. An initial critique of this framework is where, exactly, that analysis should begin. The proposed framework begins with hypothesis-generating data access, moves to a ‘silent’ period of evaluation, and then ends with the post-evaluation of ML models in clinical settings. But Kadija Ferryman questions whether evaluating justice earlier in the ML lifecycle might be useful (Ferryman 2022). Doing so would likewise address concerns, which Effy Vayena and Alessandro Blasimme warn against, that ML algorithms carry forward biases reflected in the data on which they are trained (Vayena and Blasimme 2022). Nicole Martinez-Martin and Mildred Cho agree: while clinical evaluation can measure select impacts, it cannot correct for non-representative data in the learning models (Martinez-Martin and Cho 2022).

Ethics and equity evaluation is needed earlier in ML development, including during data generation and procurement, to ascertain who is (and is not) represented and why. Biases include the fact that training datasets generated from “patients” represent only people with access to healthcare, while datasets generated from “research participants” reflect only those recruited and willing to enroll. There are also known differences in consent rates between historically included and excluded populations (Spector-Bagdady et al. 2021). As Margaret Levi et al. argue, these early stages of development are critical (Levi, Bernstein, and Waeiss 2022). Far from preliminary work, dataset generation and procurement can have significant downstream implications and disparate impacts for some patients and their communities (Benjamin 2019).

LIMITS OF THE LAW

Authors also debate the usefulness and applicability of the current U.S. approach to governing research with humans. These ethical and legal frameworks are cyclical insofar as principlism initially drove the conception of the regulations, and research ethicists are then called upon to fill remaining and new gaps in regulatory oversight created by advancing technologies, perspectives, and methodologies. Closing the McCradden et al. “AI chasm” is harder because of these regulatory gaps, which may require more flexibility on the part of ethics.

One legal challenge to controlling all human subjects research in the United States is that the law must have a regulatory hook: a legally and (sometimes) socially justifiable right to curtail the free choice of individuals. One prospective source of that right derives from the researcher’s use of government funds. The Office for Human Research Protections (OHRP)-governed HSRR (the first subpart of which is referred to as the “Common Rule”) apply to research supported by the federal offices and agencies that have signed on to the Common Rule (45 CFR § 46.101(a)). But most academic medical centers require all researchers to follow the HSRR across funding types. Thus, from a practical perspective, the regulations cover much academic human subjects research.

Another potential regulatory hook is that researchers who want to submit their data in support of a Food and Drug Administration (FDA) application (e.g., for a new device) must have followed FDA’s research regulations (which are substantially similar to the OHRP ones) (21 CFR §§ 50, 56). Thus, while it is technically true that researchers prospectively must follow the regulations only if they have federal funds, practically many academic researchers must do so anyway, and retrospectively many industry researchers must have done so to market their ML device. But as Vayena and Blasimme point out, the “wicked problem” of AI/ML is that research building the models is often done with private funding, and FDA does not consider many ML systems to be devices over which it has authority. Many ML technologies, therefore, fall squarely into this legal chasm (Price et al. 2019). FDA is currently attempting to fill this gap, and, as Ho and Malpani argue, there are other promising approaches internationally (Ho and Malpani 2022), but the gap persists in the United States.

An additional legal limitation is that research involves a “human subject” under the HSRR only if it is interventional or involves identified data or biospecimens. When ML research uses only de-identified data or specimens, it is covered by neither the OHRP nor the FDA regime and carries no Institutional Review Board (IRB) review or informed consent obligations. Even if data or specimens are identified, researchers can also apply for an IRB waiver of consent if their work is low risk. This generates additional differences in the consent status of data that end up in ML training sets. Some data come from interventional or clinical trials in which participants fully consented to the primary research as well as many secondary uses (as required by the revised Common Rule), with attendant biases in who is recruited and enrolls. Other data are generated from pragmatic clinical trials or other types of low-risk research that receive an IRB waiver and require neither consent from, nor even notice to, the patients whose medical records are used (Morain and Largent 2021), but these datasets are sometimes more demographically diverse (Spector-Bagdady et al. 2021).

Under the current legal regime, increasing protections for all ML research would require: 1) the government regulating all human subjects research regardless of funding or whether it involves an FDA-marketed product, 2) changing those regulations so that de-identified data protocols must go through IRB review, and 3) trusting that individual informed consent (the sharpest tool in the IRB box) could alleviate community-level equity issues. The magnitude of these tasks brings us back to the creativity of the proposals in this issue.

ETHICS-DRIVEN SOLUTIONS

The HSRR are intended to be responsive, however imperfectly, to the risks of research. Research involving physical interventions and manipulations tops this hierarchy, research involving identified data and specimens that is otherwise low risk sits at the bottom, and research with de-identified data and specimens is not covered at all. Given the limitations of a legal structure based on traditional notions of research ethics, which, having been created in the wake of research scandals like the Public Health Service’s syphilis experiments in Tuskegee, focus on avoiding physical or psychological harm to individuals, McCradden et al. call upon us to again rethink how to approach novel tensions. A new set of characteristics upon which AI/ML should be evaluated may include non-welfare interests (De Vries et al. 2016), pernicious bias (Obermeyer et al. 2019), or, as James Shaw articulates, the “adaptive nature of data science, opaque analytic methods, and non-hypothesis driven uses of secondary data” (Shaw 2022).

As aptly explored by Tijs Vandemeulebroucke and colleagues, these risks generally accrue at the community level (Vandemeulebroucke et al. 2022). But, as Levi et al. point out, the IRB framework specifically “excludes review of risks to society…” (Levi, Bernstein, and Waeiss 2022) and gives IRBs no legal authority to require review for, or opine on the community impact of, AI/ML protocols (Bernstein et al. 2021). Also, as Danton Char argues, an ethical evaluation run by a particular group can be just as biased as the outcomes it purports to assess (Char 2022), particularly if the IRB is directly hired by commercial companies that do not have a standing, academically oriented IRB. And, even if an IRB found a protocol to be riskier on these grounds, individual informed consent cannot resolve community-level risk (Lynch, Wolf, and Barnes 2019). From this perspective, IRBs might not be the proper pathway to mitigate the risks of AI/ML: they have limited legal authority over which protocols they review in the first place, their focus is inherently on the individual, and, for high-risk protocols, their only remedies are rejection (when the risks outweigh the benefits) or individual consent.

This is a hard problem. McCradden et al., as well as others in this issue, highlight important facets of it and make novel proposals for its resolution. We defer on whether it can be resolved with existing regulatory tools or requires new ethical approaches. But whatever commonalities AI/ML research shares with previous autonomy-based case studies, its risks and impacts differ in important ways, and this kind of thoughtful deliberation is a critical step toward resolution.

ACKNOWLEDGEMENTS

The authors wish to acknowledge W. Nicholson Price II for his helpful feedback on an earlier version of this editorial.

FUNDING

This work is supported within the National Institutes of Health (NIH) Health Care Systems Research Collaboratory by the NIH Common Fund through cooperative agreement U24AT009676 from the Office of Strategic Coordination within the Office of the NIH Director. This work is also supported by the NIH through the NIH HEAL Initiative under award number U24AT010961 (KSB and VR), the National Human Genome Research Institute under award numbers K01HG010496 (KSB) and T32HG008953 (VR), and the NIH Division of Loan Repayment (VR). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or other funders.

DISCLOSURE STATEMENT

The authors declare no conflicts.

REFERENCES

1. Benjamin R. 2019. Assessing risk, automating racism. Science 366 (6464):421–2. doi: 10.1126/science.aaz3873.
2. Bernstein MS, Levi M, Magnus D, Rajala BA, Satz D, and Waeiss C. 2021. Ethics and society review: Ethics reflection as a precondition to research funding. Proceedings of the National Academy of Sciences 118 (52):e2117261118. doi: 10.1073/pnas.2117261118.
3. Char D. 2022. Challenges of local ethics review in a global healthcare AI market. The American Journal of Bioethics 22 (5):39–41. doi: 10.1080/15265161.2022.2055214.
4. De Vries RG, Tomlinson T, Kim HM, Krenz CD, Ryan KA, Lehpamer N, and Kim SYH. 2016. The moral concerns of biobank donors: The effect of nonwelfare interests on willingness to donate. Life Sciences, Society and Policy 12 (1):3. doi: 10.1186/s40504-016-0036-4.
5. Ferryman K. 2022. Rethinking the AI chasm. The American Journal of Bioethics 22 (5):29–30. doi: 10.1080/15265161.2022.2055218.
6. Ho A, and Malpani A. 2022. Scaling up the research ethics framework for healthcare machine learning as global health ethics and governance. The American Journal of Bioethics 22 (5):36–38. doi: 10.1080/15265161.2022.2055209.
7. Levi M, Bernstein M, and Waeiss C. 2022. Broadening the ethical scope: A response to McCradden et al. The American Journal of Bioethics 22 (5):26–28. doi: 10.1080/15265161.2022.2055219.
8. Lynch HF, Wolf LE, and Barnes M. 2019. Implementing regulatory broad consent under the revised Common Rule: Clarifying key points and the need for evidence. The Journal of Law, Medicine and Ethics 47 (2):213–31. doi: 10.1177/1073110519857277.
9. Martinez-Martin N, and Cho M. 2022. Bridging the AI chasm: Can EBM address representation and fairness in clinical machine learning? The American Journal of Bioethics 22 (5):30–32. doi: 10.1080/15265161.2022.2055212.
10. McCradden MD, Anderson JA, Stephenson EA, Drysdale E, Erdman L, Goldenberg A, and Zlotnik Shaul R. 2022. A research ethics framework for the clinical translation of healthcare machine learning. The American Journal of Bioethics 22 (5):8–12. doi: 10.1080/15265161.2021.2013977.
11. Morain SR, and Largent EA. 2021. When research is integrated into care — any “ought” from all the “is”. Hastings Center Report 51 (2):22–32. doi: 10.1002/hast.1242.
12. Obermeyer Z, Powers B, Vogeli C, and Mullainathan S. 2019. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366 (6464):447–53. doi: 10.1126/science.aax2342.
13. Price WN II, Sachs R, and Eisenberg RS. 2019. New innovation models in medical AI. Harvard Journal of Law and Technology 33 (1):66–116. doi: 10.2139/ssrn.3783879.
14. Schupmann W, and Moreno JD. 2020. Belmont in context. Perspectives in Biology and Medicine 63 (2):220–39. doi: 10.1353/pbm.2020.0028.
15. Shaw J. 2022. Emerging paradigms for ethical review of research using artificial intelligence. The American Journal of Bioethics 22 (5):42–44. doi: 10.1080/15265161.2022.2055206.
16. Spector-Bagdady K, Tang S, Jabbour S, Price WN, Bracic A, Creary MS, Kheterpal S, Brummett CM, and Wiens J. 2021. Respecting autonomy and enabling diversity: The effect of eligibility and enrollment on research data demographics. Health Affairs 40 (12):1892–9. doi: 10.1377/hlthaff.2021.01197.
17. Vandemeulebroucke T, Denier Y, and Gastmans C. 2022. The need for a global approach to the ethical evaluation of healthcare machine learning. The American Journal of Bioethics 22 (5):33–35. doi: 10.1080/15265161.2022.2055207.
18. Vayena E, and Blasimme A. 2022. A systemic approach to the oversight of machine learning clinical translation. The American Journal of Bioethics 22 (5):23–25. doi: 10.1080/15265161.2022.2055216.