Abstract
As nations design regulatory frameworks for medical AI, research and pilot projects are urgently needed to harness AI as a tool to enhance today’s regulatory and ethical oversight processes. Under pressure to regulate AI, policy makers may think it expedient to repurpose existing regulatory institutions to tackle the novel challenges AI presents. However, the profusion of new AI applications in biomedicine, combined with the scope, scale, complexity, and pace of innovation, threatens to overwhelm human regulators, diminishing public trust and inviting backlash. This article explores the challenge of protecting privacy while ensuring access to large, inclusive data resources to fuel safe, effective, and equitable medical AI. Informed consent for data use, as conceived in the 1970s, seems dead: it cannot ensure strong privacy protection in today’s large-scale data environments. Informed consent has an ongoing role but must evolve to nurture privacy, equity, and trust. It is crucial to develop and test alternative solutions, including those using AI itself, to help human regulators oversee safe, ethical use of biomedical AI and give people a voice in co-creating privacy standards that might make them comfortable contributing their data. Biomedical AI demands AI-powered oversight processes that let ethicists and regulators hear directly and at scale from the public they are trying to protect. Nations are not yet investing in AI tools to enhance human oversight of AI. Without such investments, we are rushing toward a future in which AI assists everyone except regulators and bioethicists, leaving them behind.
Responsible AI for Social and Ethical Healthcare (RAISE) Dialogue
Many nations have issued policies portraying AI as a problem that needs human oversight, presumably supplied through top-down regulatory models inherited from the Industrial Revolution and 50-year-old research regulations. This might work as long as human regulators remain smarter and faster than AI, but how long will that be?
There is an urgent need for research and pilot projects that harness AI as part of the solution to the AI problem: AI tools can help human regulators oversee safe, ethical use of biomedical AI and can give people a voice in co-creating privacy standards that might make them comfortable contributing their data.
CAN CONSENT PROTECT PRIVACY?
Informed consent for data use, as conceived in the 1970s,1,2 seems dead. Consent has an ongoing role to play but must evolve to nurture privacy, equity, and trust in biomedical AI. Consent displays “respect for persons,” the first Belmont principle,3 but respect and privacy are not the same. They may have seemed alike in the 1970s when the main privacy risk was leakage or misuse of sensitive data collected from individuals. By not consenting, people could protect their own privacy.
This approach grows less effective in modern, data-intensive computational settings, in which “inference will necessarily play a large role.”4 Nonconsenting individuals who refuse to share personal data still face a risk that damaging inferences might be drawn about them using insights gleaned from consenters’ data. The inferences drawn about us, and not just data that come from us, fuel modern privacy loss.
Consent “loses its power to protect privacy as algorithms grow less biased, more generalizable,”5 and individuals’ privacy grows interdependent. Unbiased, generalizable software is excellent for health care but can inflict privacy loss on nonconsenters.
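To make this interdependence concrete, here is a minimal sketch in Python using synthetic data (the scenario, features, and effect sizes are illustrative assumptions, not drawn from any cited study): a model trained only on consenters’ records still infers a sensitive attribute about a person who shared nothing.

```python
# Minimal sketch (synthetic data): a model trained only on consenters'
# records can infer a sensitive attribute about a nonconsenter.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Consenters: observable features (e.g., routine measurements) that are
# correlated with a sensitive attribute they agreed to share.
n = 1_000
sensitive = rng.integers(0, 2, size=n)             # hidden condition (0/1)
features = rng.normal(size=(n, 3)) + 1.5 * sensitive[:, None]

model = LogisticRegression().fit(features, sensitive)

# A nonconsenter never shared data, but comparable routine features are
# observable elsewhere; the model infers the sensitive attribute anyway.
nonconsenter = rng.normal(size=(1, 3)) + 1.5       # resembles the "1" group
print(model.predict_proba(nonconsenter))           # high P(sensitive = 1)
```

The point is not the particular classifier; any sufficiently generalizable model transfers what it learned from consenters to everyone with similar observable traits.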
In AI-powered health care, privacy (or lack of it) is a systemic phenomenon, a function of the types of computations biomedicine conducts rather than each individual’s consent decisions. Consent and simple deidentification remain “the most widely used privacy preservation techniques for medical datasets”6 amid growing doubts that either protects privacy.
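Simple deidentification fares no better, as the classic linkage attack illustrates. The records in the following sketch are entirely hypothetical; what is real is the well-documented pattern of joining “deidentified” rows to a public roster on quasi-identifiers.

```python
# Minimal sketch (hypothetical records): dropping direct identifiers does
# not stop re-identification when quasi-identifiers survive.
import pandas as pd

deidentified = pd.DataFrame({  # name and record number already removed
    "zip": ["32601", "32601", "90210"],
    "birth_date": ["1970-03-01", "1988-11-12", "1970-03-01"],
    "sex": ["F", "M", "F"],
    "diagnosis": ["depression", "asthma", "HIV"],
})

public_roster = pd.DataFrame({  # e.g., a voter or alumni list
    "name": ["A. Smith", "B. Jones"],
    "zip": ["32601", "90210"],
    "birth_date": ["1970-03-01", "1970-03-01"],
    "sex": ["F", "F"],
})

# Joining on quasi-identifiers re-identifies rows with no "identifiers."
linked = deidentified.merge(public_roster, on=["zip", "birth_date", "sex"])
print(linked[["name", "diagnosis"]])
```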
CONSENT’S DISPARATE IMPACT
Consent rules are neutral on their face but have discriminatory impacts because people’s willingness to consent varies by race, ethnicity, gender, and socioeconomic status.7,8 Using opt-out rather than opt-in consent is an ineffective patch. Society’s most vulnerable groups — patients in states where reproductive care invites prosecution, migrants fearing deportation, or parents worried their data might help police identify children as crime suspects — have the strongest incentives to opt out.8
In less than 50 years, biomedicine shifted away from practices that overincluded society’s most vulnerable members in research, such as the heavy use of prisoners in clinical trials before the 1970s.9 This shift was a triumph of bioethics, but the ethical frame shifted over those same 50 years. Today’s ethics scandal is not overinclusion but rather underinclusion of vulnerable groups in research. When people are missing from training data, AI/machine learning tools may not work well for them.
Data inclusion promotes safety and equity. Consent rights that were yesterday’s bioethical triumph undermine those goals if, for whatever reason, they exclude society’s most vulnerable members from AI training data. It is time to consider whether popular consent rules are part of the “system” that fuels systemic inequity.
DIMINISHED PUBLIC VOICE
Consent rights that sought to empower individuals have ultimately muted the public’s voice in privacy policy. Consent rights splinter the public’s strong collective voice, reducing it to a murmur of hushed individual consent transactions. Data handlers calibrate their privacy protections to satisfy the least-demanding consenters among us, while ignoring society’s collective yearning for privacy.
Consent rights empower people to accept or reject (but not to influence) proposed data uses and privacy standards set by others. Study sections of elite scientists decide which uses of people’s data offer enough benefits to merit public funding. Ethical review boards, staffed largely by employees of the research institutions seeking those funds, then decide which privacy protections apply. Commercial technology companies choose data uses based on what might make money and set privacy policies that can be opaque even to people who read them. Commercial norms infiltrate academic science when the only way for funded researchers to marshal the massive data, storage, and computing power that AI research requires is to partner with large technology companies.
Consent rights let people say “no” to data inclusion, but their “no” votes do not shape policy so long as enough consenters are willing to cross the picket line of data users’ abusive privacy practices. When protecting workers’ rights, policy makers recognize that “the right of employees to organize, bargain collectively, and participate through labor organizations of their own choosing in decisions which affect them … safeguards the public interest.”10
In contrast, policy makers favor consent to protect data subjects’ rights. This fractures collective action, akin to forcing laborers to make isolated yes/no decisions on whether to accept management’s wage demands, instead of unionizing workers to amplify their bargaining power. Today’s research regulations offer no option for data subjects to call a strike en masse and halt data uses that ignore the public’s valid privacy concerns.
FAILURE TO MINIMIZE PRIVACY RISKS
Modern research regulations require ethical review boards to minimize risks to human subjects, yet regulators express doubt that these boards have “appropriate expertise regarding data protections” to manage privacy risks.11 Unsure how to minimize privacy risks, ethical review boards sometimes treat informed consent as a confessional in which vaguely disclosing a risk earns absolution for privacy-related sins.
Minimizing privacy risks is difficult and expensive. Sharing data in secure enclaves costs more than letting users download data from public databases. Not letting commercial partners reuse data can make them charge more for the services they provide to researchers. Research budgets are tight, and funders prefer spending scarce funds on science rather than privacy. Stringent data use agreements to minimize risks are easy to write but hard to enforce.
There is a fear that ethical review boards, unable to minimize privacy risks, might lock data down, curtailing access crucial for advancing science and keeping nations competitive in the global AI arms race. Treating consent as a confessional is expedient and keeps data flowing without minimizing privacy risks. Yet, policies valuing expediency over public trust have limits, which, when reached, invite backlash that can freeze scientific progress.
CONSENT AND RECIPROCITY
Medical AI offers too much promise to let it stall amid public backlash. The United Kingdom’s experience with its Care.Data clinical data repository offers lessons on what not to do.
Care.Data was a large clinical data repository that Parliament approved in 2012 to gather patient-identifiable general practice data under the oversight of a newly created public agency.12,13 Data could be used for legally defined aims such as improving patient choice, customer service, health outcomes, transparency/accountability, and world-class health services research.12 Despite opt-out consent rights, patient backlash erupted in 2013 and 2014, causing the project to be terminated in 2016.13
Postmortem studies of Care.Data identified three principles — reciprocity, nonexploitation, and service of the public good — as the keys to public buy-in (“social license”). These are Belmont-style principles of data-intensive health research. Reciprocity is the first principle, implicitly incorporating the other two. Reciprocity operates on two levels, as shown in Table 1.
Table 1. Elements of Reciprocity in Large Health Data Resources.

Reciprocal duties to ensure strong data protections
- People’s contributions of data to a public data resource create reciprocal duties binding all who manage, share, and use the data.
- Data users have duties not to exploit data subjects (including duties to minimize privacy risks) and to use people’s data for socially beneficial purposes.
- Database managers have duties to set and enforce data use and privacy policies and to vet data use proposals to ensure that users are responsible, qualified, and have a credible plan to conduct high-quality, beneficial studies.

Reciprocity in communication
- Social licensing stresses a further aspect of “reciprocity, which must begin with sound two-way communication.”12
- People’s data contributions are reciprocated by granting them a voice in deciding the data protections and data uses that will serve the public good.
Care.Data postmortems stress the need for duty-based privacy protections: “The use of novel and secure data-management technologies is highlighted as key to achieving social licensing.”13 Duties that bind data handlers to treat data carefully are what protect privacy; consent alone cannot do so.
Current oversight frameworks strive to afford the first type of reciprocity in Table 1 (data protection), but reciprocity of communication is lacking. “To date, predominantly medical and scientific stakeholders have been in the position to determine the ethical boundaries of medicine, care, and medical research.”13 Community representatives on ethical review boards are rarely chosen by those they purportedly represent. Focus groups tend to produce more scholarly articles than real changes to consent forms. Reciprocity in communication “must be distinguished from more narrowly focused public relations exercises that seek to ‘capture’ the public, that is, to persuade the public of the legitimacy of decisions already taken by experts.”12
RECIPROCITY AND OPEN DATA SHARING
Reciprocity can coexist with open-access data sharing, but the meaning of “open access” must be clear. The concept is rooted in the law of essential public facilities, such as harbors in the Middle Ages, and modern law applies it to infrastructures such as subways and pipelines.14 Open access means nondiscriminatory access for any qualified user who meets specified terms and conditions designed to ensure responsible, safe resource use.14 It has never meant that anybody can download sensitive health data to use however they wish.
There are ethical ways for agencies such as the National Institutes of Health to fund and develop open-access AI data resources, but they have a duty to establish clear privacy and data use policies. Reciprocity of communication entails giving the public a voice in what those policies will be.
Biomedical AI is a collective social enterprise that will produce its greatest public benefits when it elicits the widest possible public participation in training data. Top-down, expert-led policy-setting is not the right way to attract wide participation.
Reciprocity requires giving data subjects a voice in deciding key terms of consents they will be asked to sign: For what purposes can data be used, and under what privacy policies? “In this way, patient involvement in governance facilitates co-creating what is considered as trustworthy.”13
IS CO-CREATION OF CONSENT REALISTIC?
Traditionally, there was no way to engage research participants in co-creating consent because nobody knew who they were until they came forward and signed a consent form, which necessarily had to be prepared in advance.
This is not a problem in biomedical AI research, which uses real-world clinical data. The pool of future research participants is known. They are the patients in preexisting clinical datasets whom researchers wish to study, and they are available to co-create consent for AI research. Some readers may fear that the public might make bad decisions. Such fears ignore an essential truth about governance: Trust is itself reciprocal. You cannot earn public trust if you do not trust the public.
Is engaging large patient populations in reciprocal dialogue feasible, and would the resulting policies enjoy democratic legitimacy (social license)? In democracies, laws set by elected legislatures and regulators enjoy high legitimacy, despite concerns that majoritarian laws can tyrannize minorities or fall prey to lobbyists. One solution might be to seek legislation setting privacy standards for AI research.
The Care.Data experience is cautionary: It had clear legal authorization, yet the public still considered it illegitimate. There is a further problem in the United States: Laws regulating AI research might violate the First Amendment. Ethicists today often forget that current research oversight — in particular, its reliance on private-sector ethical review boards — was shaped by fear that direct federal regulation of research might infringe on constitutionally protected freedom of scientific inquiry.15 Unfortunately, private ethical review boards struggle for legitimacy (social license) in contexts that involve trade-offs between privacy and the social benefits of data use, such as creating inclusive, open data resources for medical AI.
AI-POWERED BIOETHICS
The way to achieve legitimacy and constitutionality simultaneously is to let patients speak for themselves — not through their elected legislators or unelected ethical review boards that strangers appointed to protect their interests, but as free people speaking in their own voices. However, they need to speak collectively to be heard.
This solution was infeasible in the 1970s, when current research regulations were designed. However, thanks to AI-powered digital technology, it is feasible now.
The CrowdLaw Catalog lists more than 100 examples of governments and institutions worldwide using digital deliberation tools to engage the public in policy making.16 An example is vTaiwan17,18 — the “v” is for vision, voice, vote, and virtual public engagement — which uses various tools, including Polis, an open-source system for “gathering, analyzing and understanding what large groups of people think in their own words, enabled by advanced statistics and machine learning.”19
Unlike popular social media tools that “optimize for engagement”20 by serving content that amplifies polarization, deliberation tools detect points of consensus and nudge dialogue in directions that expand it. People propose and deliberate their own ideas, instead of reacting to ethicists’ surveys. There is more than a decade of experience using digital tools to forge public consensus on divisive issues.21
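To show the core mechanism, here is a minimal sketch of Polis-style consensus detection with synthetic votes (Polis’s production pipeline is more elaborate; the camps, noise level, and agreement threshold here are illustrative assumptions): embed a participant-by-statement vote matrix, cluster participants into opinion groups, and surface the statements that win agreement in every group rather than merely on average.

```python
# Minimal sketch of Polis-style consensus detection: cluster a
# participant-by-statement vote matrix, then surface statements that
# draw agreement across all opinion clusters (synthetic data).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# votes[i, j]: participant i on statement j (+1 agree, -1 disagree).
# Two camps disagree on statements 0-3 but share views on 4-5.
votes = np.vstack([
    np.hstack([ np.ones((40, 4)), np.ones((40, 2))]),   # camp A
    np.hstack([-np.ones((40, 4)), np.ones((40, 2))]),   # camp B
]) + rng.normal(0, 0.3, size=(80, 6))

# Embed participants in a 2-D opinion space, then cluster them.
coords = PCA(n_components=2).fit_transform(votes)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coords)

# "Group-informed consensus": statements with high mean agreement in
# every cluster, not just high agreement overall.
consensus = [
    j for j in range(votes.shape[1])
    if all(votes[labels == k, j].mean() > 0.5 for k in np.unique(labels))
]
print("cross-cluster consensus statements:", consensus)  # expect [4, 5]
```

A deliberation platform then feeds such cross-group statements back to participants, steering attention toward common ground rather than toward the divisive items that engagement-optimized feeds amplify.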
Why are research ethicists not pilot-testing these tools to engage patients in co-creating consent standards for biomedical AI research? Biomedical AI requires AI-powered bioethics that lets ethicists hear directly and at scale from the public they are trying to protect. The U.S. Congress set aside “not less than” 5% of the National Institutes of Health Human Genome Project budget to study ethical aspects of genomic research.22 There has been no similar investment in AI ethics. Without such investments, we are rushing toward a future in which AI assists everyone except regulators and bioethicists, leaving them behind.
Supplementary Material
Disclosures
Author disclosures are available at ai.nejm.org.
Dr. Evans received support for this research from the Glenn and Deborah Renwick Faculty Fellowship in AI and Ethics. Drs. Evans and Bihorac also received support from the National Institutes of Health Common Fund’s “Patient-Focused Collaborative Hospital Repository Uniting Standards (CHoRUS) for Equitable AI” project (OT2OD0327-02). The views expressed are the authors’ own and do not necessarily reflect the positions of their institution, research collaborators, or funders.
References
1. Privacy Protection Study Commission. Personal privacy in an information society: the report of the Privacy Protection Study Commission. Vol. 2. Washington, DC: Privacy Protection Study Commission, 1977.
2. Secretary’s Advisory Committee on Automated Personal Data Systems. Records, computers, and the rights of citizens: report. Vol. 10. Washington, DC: US Department of Health, Education & Welfare, 1973.
3. National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. The Belmont Report: ethical principles and guidelines for the protection of human subjects of research. 1978 (https://www.hhs.gov/ohrp/regulations-and-policy/belmont-report/index.html).
4. Advisory Committee to the Director AI Working Group. Report of the ACD AI Working Group. December 6, 2019 (https://acd.od.nih.gov/documents/presentations/12132019AI_FinalReport.pdf).
5. Evans BJ. Rules for robots, and why medical AI breaks them. J Law Biosci 2023;10:lsad001. DOI: 10.1093/jlb/lsad001.
6. Kaissis GA, Makowski MR, Rückert D, Braren RF. Secure, privacy-preserving and federated machine learning in medical imaging. Nat Mach Intell 2020;2:305–311. DOI: 10.1038/s42256-020-0186-1.
7. Jagsi R, Griffith KA, Sabolch A, et al. Perspectives of patients with cancer on the ethics of rapid-learning health systems. J Clin Oncol 2017;35:2315–2323. DOI: 10.1200/JCO.2016.72.0284.
8. Evans BJ. The HIPAA privacy rule at age 25: privacy for equitable AI. Fla State Univ Law Rev 2023;50:741–810. DOI: 10.2139/ssrn.4316211.
9. Advisory Committee on Human Radiation Experiments. Final report. Washington, DC: ACHRE, 1995.
10. Executive Office of the President. Executive order 14025 of April 26, 2021. Worker organizing and empowerment. Fed Regist 2021;86:22829–22832.
11. U.S. Department of Health and Human Services. Human subjects research protections: enhancing protections for research subjects and reducing burden, delay, and ambiguity for investigators. Fed Regist 2011;76:44512–44531.
12. Carter P, Laurie GT, Dixon-Woods M. The social licence for research: why Care.Data ran into trouble. J Med Ethics 2015;41:404–409. DOI: 10.1136/medethics-2014-102374.
13. Muller SHA, Kalkman S, van Thiel GJMW, Mostert M, van Delden JJM. The social licence for data-intensive health research: towards co-creation, public value and trust. BMC Med Ethics 2021;22:110. DOI: 10.1186/s12910-021-00677-5.
14. Phillips CF Jr. The regulation of public utilities. Arlington, VA: PUR Books, 1993.
15. National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. Protection of human subjects: institutional review boards: report and recommendations of the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. Fed Regist 1978;43:56173–56198.
16. GovLab. CrowdLaw Catalog. February 27, 2024 (https://catalog.crowd.law).
17. Horton C. The simple but ingenious system Taiwan uses to crowdsource its laws. August 21, 2018 (https://www.technologyreview.com/2018/08/21/240284/the-simple-but-ingenious-system-taiwan-uses-to-crowdsource-its-laws/).
18. Hsiao Y-T, Lin S-Y, Tang A, Narayanan D, Sarahe C. vTaiwan: an empirical study of open consultation process in Taiwan. July 4, 2018 (https://osf.io/preprints/socarxiv/xyhft). Preprint.
19. Polis. Input crowd, output meaning. February 27, 2024 (https://pol.is/home).
20. Narayanan A, Bass KG. Call for participation: optimizing for what? Algorithmic amplification and society. Knight First Amendment Institute. November 4, 2022 (https://knightcolumbia.org/blog/call-for-participation-optimizing-for-what-algorithmic-amplification-and-society).
21. Simon J, Bass T, Boelman V, Mulgan G. Digital democracy: the tools transforming political engagement. London: NESTA, 2017.
22. Public Law No. 103–43, §1521, 107 Stat 181. 1993.