Journal of the American Medical Informatics Association (JAMIA). 2023 Feb 24;30(5):819–827. doi: 10.1093/jamia/ocad022

A framework to identify ethical concerns with ML-guided care workflows: a case study of mortality prediction to guide advance care planning

Diana Cagliero 1, Natalie Deuitch 2,3, Nigam Shah 4, Chris Feudtner 5,6, Danton Char 7,8
PMCID: PMC10114055  PMID: 36826400

Abstract

Objective

Identifying ethical concerns with ML applications to healthcare (ML-HCA) before problems arise is now a stated goal of ML design oversight groups and regulatory agencies. Lack of accepted standard methodology for ethical analysis, however, presents challenges. In this case study, we evaluate use of a stakeholder “values-collision” approach to identify consequential ethical challenges associated with an ML-HCA for advanced care planning (ACP). Identification of ethical challenges could guide revision and improvement of the ML-HCA.

Materials and Methods

We conducted semistructured interviews of the designers, clinician-users, affiliated administrators, and patients, and inductive qualitative analysis of transcribed interviews using modified grounded theory.

Results

Seventeen stakeholders were interviewed. Five “values-collisions”—where stakeholders disagreed about decisions with ethical implications—were identified: (1) end-of-life workflow and how model output is introduced; (2) which stakeholders receive predictions; (3) benefit-harm trade-offs; (4) whether the ML design team has a fiduciary relationship to patients and clinicians; and, (5) how and if to protect early deployment research from external pressures, like news scrutiny, before research is completed.

Discussion

From these findings, the ML design team prioritized: (1) alternative workflow implementation strategies; (2) clarification that prediction was only evaluated for ACP need, not other mortality-related ends; and (3) shielding research from scrutiny until endpoint driven studies were completed.

Conclusion

In this case study, our ethical analysis of this ML-HCA for ACP was able to identify multiple sites of intrastakeholder disagreement that mark areas of ethical and value tension. These findings provided a useful initial ethical screening.

Keywords: machine learning, clinical, artificial intelligence, ethics, palliative care, end-of-life care

INTRODUCTION

With the rapid increase in the availability of healthcare data collected during clinical care and recent advances in machine learning (ML) tools, ML has become a clinical reality.1,2 ML tools promise to improve quality and access to healthcare. However, in nonhealthcare contexts significant ethical concerns have emerged with applications of ML tools. Identifying ethical concerns with ML applications to healthcare (ML-HCA) before problems result is now a stated goal of ML design oversight groups and regulatory agencies, such as the US F.D.A.3,4 A challenge to this goal is the lack of an accepted standard methodology for ethical analysis of ML-HCAs.

Thus far, the few examinations of ethical issues emerging with ML-HCAs have been broad and largely theoretical,5–9,14 extrapolating ethical issues that have emerged in nonhealthcare contexts. Identification of ethical issues arising in actual ML-driven applications for healthcare in situ is needed to expand understanding of the concerns emerging with ML uses and to guide approaches to address these concerns.5,7 There have been broad calls to address the lack of frameworks for operationalizing AI in healthcare, including a lack of guidance on best practices for inclusivity and equity.5 Scoping reviews have identified a dearth of literature in a variety of settings, including public health contexts and low- and middle-income countries.10 There is also the issue of limited clinical acceptance of AI decision support tools by providers.11 There have been attempts to create guidelines for ethical development and deployment of AI12,13 as well as to guide policymakers and regulators on important considerations for cost and efficiency.14,15 However, there remains a gap in methodology for identifying emerging ethical problems with a specific ML-HCA.16 Identifying specific ethical problems, grounded in an actual ML-HCA, will help clarify ethical decisions and their interconnected consequences, which will improve ethical decision-making.17

Attempts to demonstrate algorithmic fairness typically focus narrowly on model performance, demonstrating parity (or lack of bias) in model outputs for preidentified populations of interest.18,19 These analyses do not examine the consequences of taking actions based on the model’s output.20 For any ML-HCA, consequential ethical problems are likely to involve 3 interacting elements that determine how useful the ML-HCA is: (1) model characteristics, including the assumptions, underlying data, design, and risk-estimate output of the model; (2) how the model output is implemented into a given workflow, including the decision regarding what risk level warrants intervention; and (3) the benefits and harms of this intervention and the trade-offs posed.21 Key stakeholders might be guided by different values regarding each of these 3 elements, resulting in what might be called a “values-collision.”

We recently published a rigorous, broadly applicable stakeholder “values-collision” approach for identifying emerging ethical concerns with ML-HCAs, conceptually grounded in the pipeline or steps from design to clinical implementation.22 This approach relies on 6 premises: (1) multiple stakeholders are impacted by any ML-HCA and these stakeholders can be identified; (2) stakeholder groups are likely to have different values, and significantly different explicit or implicit goals for the ML-HCA can be ascertained through interviews; (3) the process of design and development of an ML-HCA involves making a series of decisions, from initially conceiving of the problem to be addressed by an ML-HCA to actually deploying and maintaining it; (4) how a stakeholder makes these decisions, or would want these decisions to be made, reflects their underlying values, as such decisions often are based on value judgments;23 (5) where stakeholder groups disagree or their values are at odds about resolving these decisions—where values collide—are where ethical problems are most likely to emerge;23,24 (6) while some of these “values-collisions” may mark novel ethical concerns, many may be resolvable by drawing on prior ethical scholarship on similar or related problems, allowing designers and users of an ML-HCA to address ethical challenges before they become consequential.

In this case study, we applied our “values-collision” approach to a machine-learning enabled workflow for identifying patients for advance care planning (ACP). Currently, a significant gap exists between how patients wish to spend their final days and how they actually spend them.25 ACP could help close this gap, as it could substantially expand patients’ access to important components of palliative care. Yet ACP remains uncommon, with less than 7.5% of Americans over 65 years of age having ACP.26 ML methods, by efficiently identifying patients likely to benefit most from ACP, could play a crucial role in focusing ACP information interventions and thus in closing the gap. Ethical concerns have been raised about using mortality predictions to guide ACP, specifically whether they might be used to reduce use of services rather than to align care with patient goals.27,28 The model evaluated here identifies individuals who might benefit from ACP by addressing a proxy problem: predicting the probability of a given patient passing away within the next 12 months.29 ML approaches to predicting need for ACP have the advantage of being scalable to large populations in ways other mortality prediction models are not.30,31

In this study, we demonstrate that we could identify consequential ethical challenges associated with implementing an ML-HCA for ACP, which subsequently guided direct revision and improvement of the ML-HCA.

The mortality prediction model

The design team20 developed a gradient boosted tree model to estimate the probability of 1-year, all-cause mortality upon inpatient admission, using a deidentified, retrospective dataset of EHR records for adult patients seen at Stanford Hospital between 2010 and 2017. This dataset comprised 907 683 admissions and had a prevalence of 17.6% for 1-year all-cause mortality. The performance of the resulting model was evaluated against clinician chart review prospectively. The design team gathered data used for validation of the model against clinician chart review over 2 months in the first half of 2019; each day in that interval they presented an experienced palliative-care advanced-practice (AP) nurse with a list of patients newly admitted to General Medicine at Stanford Hospital. These lists were in random order, and no model output was presented. The nurse performed chart reviews to answer the question, “Would you be surprised if this patient passed away in the next twelve months?” This “surprise” question is an established screening approach for identifying Palliative Care needs.32 There was strong agreement between model assessment and AP nurse response: AUROC of 0.86 (95% confidence interval, 0.8–0.919).
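The design team’s code is not part of this article. As a rough, hedged illustration of the modeling setup described above, the Python sketch below trains a gradient boosted tree classifier on a synthetic tabular cohort with the stated 17.6% outcome prevalence and evaluates discrimination with AUROC; the feature set, cohort size, and data are hypothetical stand-ins, not the Stanford dataset or the prospective nurse-review validation.

```python
# Minimal sketch of the modeling setup described above; NOT the design team's
# actual code or data. Feature values, labels, and cohort size are synthetic
# stand-ins; only the model class (gradient boosted trees), the ~17.6% outcome
# prevalence, and the AUROC metric come from the text.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_admissions, n_features = 5000, 20                 # hypothetical cohort size
X = rng.normal(size=(n_admissions, n_features))     # EHR-derived features (synthetic)
y = rng.random(n_admissions) < 0.176                # ~17.6% 1-year all-cause mortality

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = GradientBoostingClassifier(random_state=0)  # gradient boosted tree classifier
model.fit(X_train, y_train)

# Risk estimates for held-out admissions; AUROC summarizes discrimination
# (the published model reported 0.86 against nurse chart review).
risk = model.predict_proba(X_test)[:, 1]
print(f"AUROC on synthetic hold-out: {roc_auc_score(y_test, risk):.2f}")
```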

METHODS

We used a qualitative approach, interviewing ML-HCA designers, administrators, clinicians involved in ACP, and patients affected, or potentially affected, by the ML-HCA to investigate how these stakeholders anticipate and envision the impact of the ML-HCA. Our focus was: identifying affected or potentially affected stakeholder groups and their relation to each other; eliciting stakeholders’ views on practical and ethical considerations with the ML-HCA’s design, implementation, and use; and identifying where stakeholder groups disagreed about practical or ethical considerations—where their values collided.

Study setting and recruitment

The General Medicine service at Stanford Hospital is the admitting hospital service for nonintensive-care patients with nonsurgical illness. In the last year, the service admitted 3,143 patients (∼9/day). Currently, the palliative care team sees an average of 137 new patients per month, with approximately 25% of these coming from this General Medicine service. The palliative care clinicians are unable to manually screen all admitted patients; in the General Medicine wards, 5–6% of all admissions receive a palliative care consult. At the time the interviews were conducted, admitted patients were being screened by the ML-HCA for potential ACP needs at admission; the palliative care service evaluated these screening results to determine clinician agreement with the ML-HCA, but the results were not yet acted upon. The study site was planning a pilot implementation in which the palliative care service would begin to use the ML-HCA results as a screening tool for prioritizing palliative care consultations to discuss ACP with admitted patients, and this study was conducted during that pilot implementation.

Interviewees (the clinicians, administrators, and designers involved with the ML-HCA; clinicians on the palliative care service; and patient volunteers who had previously been admitted to the general medicine service) were initially approached via telephone or email. No participants dropped out of the study. Recruitment stopped after thematic saturation was reached. Not all patients or clinicians working on the general medicine or palliative care service were interviewed. Inclusion criteria were designers (ie, the bioinformatics programming team who designed the algorithm), hospital administrators, clinicians, and patients or patient representatives impacted or potentially impacted by the mortality prediction algorithm used to guide ACP. Anyone else was excluded.

Data collection and analysis

We used one-on-one interviews, a technique that has been found to be productive for discussing sensitive topics and that is well suited for exploratory research attempting to find a range of perspectives.33–39 The study was approved by the IRB of the Stanford University School of Medicine. Informed consent was verbal due to the low-risk nature of the study and was obtained from all participants.

A semistructured interview guide of open-ended questions, intended to elicit participants’ perspectives on the design, clinical use, and ethical concerns with the ML-HCA and its application to ACP, was piloted with 5 stakeholders (designers and clinicians). Following these pilot interviews, explicit questions about trust in results, data safety, bias, and equity concerns were added to the interview guide (Supplementary Material).

Selection of subsequent interviews was guided by information revealed in the coding process (ie, following leads). Interviews were conducted either in person or over video conferencing and were audio-recorded and transcribed. Though every effort was made to conduct interviews in person, video interviews were used to accommodate constraints from the COVID-19 pandemic. There were no differences noted in the nature of the responses to video interviews compared to the in-person interviews, as both interview formats resulted in similar-length discussions with coverage of all interview questions and prompts. The primary investigators provided initial contact to all potential participants and conducted all interviews.

Transcripts were uploaded into the qualitative analysis software Dedoose (www.dedoose.com), and interview data were analyzed inductively, using grounded theory.40–44 However, we modified our grounded theory approach in 2 important ways. First, we used a semistructured interview guide to ensure we elicited interviewee perspectives on decisions with value implications along the design-to-clinical-implementation pipeline, as well as perspectives on known ethical concerns with AI tools potentially stemming from such decisions (Supplementary Material). Second, codes were derived both inductively from the interviews and a priori from the decisions and ethical topics included in the interview guide. Codes were generated through a collaborative reading and analysis of a subset of interviews and then finalized through successive iterations into categories and codes. At least one primary and one secondary coder independently coded each transcript. Differences were reconciled through consensus coding. Consensus coding was also used to identify and characterize areas of disagreement between and within stakeholder groups (or “values-collisions”) around design and implementation decisions and their codes. An inter-rater reliability score was not calculated, but codes and themes were discussed to consensus. Emerging themes were identified, described, and discussed by the research group. Interviews continued until saturation was achieved.
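The coding itself was performed in Dedoose and reconciled by discussion. Purely as an illustrative sketch of the dual-coding step described above (independent primary and secondary coders, with differences resolved by consensus), the snippet below flags excerpts whose code assignments differ between two coders; the excerpt IDs, code names, and the flag_for_consensus helper are hypothetical and not part of the study’s tooling or data.

```python
# Hypothetical illustration of the dual-coding step: find excerpts where the
# primary and secondary coders' code sets differ, so those excerpts can be
# queued for a consensus discussion. Not the study's data or Dedoose workflow.
from typing import Dict, Set

primary: Dict[str, Set[str]] = {
    "excerpt_01": {"bias", "transparency"},
    "excerpt_02": {"consent"},
    "excerpt_03": {"workflow_burden"},
}
secondary: Dict[str, Set[str]] = {
    "excerpt_01": {"bias"},
    "excerpt_02": {"consent"},
    "excerpt_03": {"workflow_burden", "palliative_legitimization"},
}

def flag_for_consensus(a: Dict[str, Set[str]], b: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, per excerpt, the codes applied by only one of the two coders."""
    return {
        excerpt: a.get(excerpt, set()) ^ b.get(excerpt, set())  # symmetric difference
        for excerpt in set(a) | set(b)
        if a.get(excerpt, set()) != b.get(excerpt, set())
    }

for excerpt, disputed in sorted(flag_for_consensus(primary, secondary).items()):
    print(f"{excerpt}: discuss to consensus -> {sorted(disputed)}")
```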

Coding was also used to create a relationship map between the various stakeholder groups (Figure 1). This figure maps the relationships between stakeholders and potential “values-collisions,” demonstrating the wide range of relationships and pressures that exist between stakeholders in the context of the prediction algorithm. Finally, we presented and discussed our findings with the pilot ML design team, who prioritized several changes to the pilot ML implementation to address the identified ethical concerns.

Figure 1.

Stakeholder map. Interviews provided context to identify stakeholders affected or potentially affected by the ML-HCA for ACP and to map the relationships of these stakeholders to the ML-HCA and to other stakeholder groups. Known relationships are indicated with solid lines; potential pressures identified through interview analysis are indicated with dashed lines; and lines with question marks indicate theoretical interactions that are yet to be well defined.
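Figure 1 is not reproduced here. As a rough sketch of the kind of structure such a map encodes (per the caption above: known relationships, potential pressures, and still-theoretical interactions), one could represent it as a typed edge list; the stakeholder names and edges below are illustrative placeholders rather than a transcription of the published figure.

```python
# Illustrative encoding of a stakeholder map like Figure 1: nodes are
# stakeholder groups and edges are typed as "known" (solid lines),
# "potential_pressure" (dashed lines), or "theoretical" (question marks).
# The specific nodes and edges are placeholders, not the published figure.
from dataclasses import dataclass

@dataclass(frozen=True)
class Relationship:
    source: str
    target: str
    kind: str  # "known" | "potential_pressure" | "theoretical"

stakeholder_map = [
    Relationship("design team", "ML-HCA", "known"),
    Relationship("palliative care clinicians", "ML-HCA", "known"),
    Relationship("patients", "palliative care clinicians", "known"),
    Relationship("hospital administrators", "design team", "potential_pressure"),
    Relationship("news media", "hospital administrators", "theoretical"),
]

# Group edges by type, mirroring the solid/dashed/question-mark distinction.
for kind in ("known", "potential_pressure", "theoretical"):
    edges = [(r.source, r.target) for r in stakeholder_map if r.kind == kind]
    print(f"{kind}: {edges}")
```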

RESULTS

Participants

Study participants included designers, administrators, clinicians, and patients (Table 1). The response rate to requests for participation was 100%, likely positively influenced by the individual emails and telephone calls used to request participation. Interviews lasted from approximately 20 to 60 minutes.

Table 1.

Demographics of interviewed stakeholders

Role           n (%), n = 18    Gender (W:M)
Clinician      8 (44%)          4:4
Designer       3 (17%)          0:3
Administrator  3 (17%)          1:2
Patient        4 (22%)          2:2

Seven themes of values conflicts, or areas where stakeholders disagreed about decisions with ethical implications, emerged from the interviews. These were identified as major themes based on how frequently they were discussed throughout the interviews. We grouped the themes under the 3 interacting umbrellas affecting the usefulness of a model’s implementation: model characteristics, model implementation, and intervention benefits and harms.

Model Characteristics: Bias and Perpetuation of Bias, Perspectives on Death and End of Life Care, and Transparency and Evaluation of Efficacy

Model Implementation: Who Should Receive ML Output, Patient Consent and Involvement

Intervention Benefits and Harms: External Pressures and Study Integrity, Palliative Care Legitimization

From these 7 sets of “values-collisions” emerged key ethical considerations which are outlined in Table 2 and further explored in the discussion.

Table 2.

Key stakeholder “values-collisions”

Ethical concern | Patient value | Clinician value | Designer value
1: Model characteristics: bias and perpetuation of bias | Patients want the data to include individuals like themselves | Clinicians were concerned that ML could perpetuate further inequalities | Designers were concerned that the algorithm better identifies individuals with more complete healthcare records
2: Model characteristics: perspective on death and end-of-life care | Patients think the model would not impact end-of-life care decisions | Clinicians think the model could improve end-of-life care conversations | Designers feel that the model would guide ACP and impact end-of-life care decisions
3: Model characteristics: transparency and evaluation of efficacy | Patients felt that details were not important but would like an overall idea of how the prediction works | Clinicians would like to know how the algorithm works, with emphasis on the use of prespecified trial endpoints | Designers feel it is more important to demonstrate algorithm validation than methodology
4: Model implementation: who should receive ML output? | Patients feel it is important to get predictions from a trusted clinician such as a PCP | Clinicians’ concerns surround the algorithm further burdening the palliative care team | Designers feel the algorithm has a low pretest probability and the outcome is not harmful, making it an ideal ML “test case”
5: Model implementation: patient consent and involvement | Patients would like knowledge of the mortality prediction | Clinicians agree with patient knowledge if accompanied by a conversation | Designers feel the algorithm may not be an accurate predictor of mortality and so should not be shown to patients, raising an issue of misinterpretation
6: Intervention benefits and harms: external pressures and study integrity | No key value collision | Clinicians had concerns about media coverage | Designers had concerns about PR blowback if the ML’s use and intent were misinterpreted
7: Intervention benefits and harms: palliative care legitimization | No key value collision | Clinicians felt the model could add legitimacy to palliative care consultations, reflecting clinician concerns about the legitimacy of palliative care as an intervention | Designers desired to improve access to palliative care/ACP

Theme 1 model characteristics: Bias and perpetuation of bias

Patients were concerned with ensuring that the data included individuals like themselves. It was important for patients to feel as though the data were reflective of themselves and their loved ones in order for the model to provide accurate outputs.

Patient perspective: “I think my limited understanding of [machine learning]…is that the program learns based on inputs…so my concern would be that the inputs it’s given as it pertains to my life or my loved one’s life are similar.” -Individual 17 (Patient)

From a similar perspective, clinicians were worried about the ML technology’s impact on the system and its potential to perpetuate further inequalities. They found it important to ensure that social justice was a factor addressed in the model inputs.

“You know, social justice in clinical care is really a big issue. I worry about things like this…that it could just perpetuate the injustice and that people of color and people who are marginalized already don’t have access” -Individual 6 (Clinician)

Designers were concerned about how an algorithm could further perpetuate an issue that already exists in large part: that more fortunate individuals are the ones with better and more complete healthcare records.

“The model is better at identifying certain kinds of folks. And so, for instance, we know that the more information we have about people the better,… But for the people who have really scarce data, they might just sort of…start getting pushed into the pile of unknown unknowns…And so like it’s…exacerbating an existing inequity in the system” -Individual 2 (Designer)

Theme 2 model characteristics: Perspective on death and end of life care

Patients mentioned that knowledge of the algorithm’s prediction would not impact their end-of-life decisions. They did not think they would substantially change their lives based on these outputs, though they felt the prediction might be more helpful for individuals who were already chronically ill with poor prognoses.

“But saying to me you have five years left, is it going to change my life for me? Probably not…I think for people who are very chronically, chronically ill and have very poor prognoses even without anyone to look at it, and if a physician truly knew their patient, [the mortality prediction] might be helpful.” -Individual 15 (Patient)

Clinicians focused on how this algorithm could improve the quality of end-of-life care conversations. They believed it could help relieve some of the pressures surrounding length of patient stay in particular.

“I think that could really help, especially patients who come into the hospital these days are usually really sick, there’s this huge pressure to get them out of the hospital…. if this could be used in a way [for] th[e] patient needing a big picture conversation about where things are going.” -Individual 9 (Clinician)

These concerns presented value clashes with designers. Designers believed that most individuals would not know what to do with this information—a belief that contradicts patients, who said they would make different financial decisions or have conversations with families about end-of-life care.

“The reason people perceive it to be different is that that information is used differently. And I mean if someone tells you, you have a 10% risk of getting Alzheimer’s at age 60, most people don’t know what to do with it. Most people don’t know what to do with mortality information” -Individual 4 (Designer)

Theme 3 model characteristics: Transparency and evaluation of efficacy

Here, designers emphasized the need to demonstrate validation rather than the details of the algorithm. They did not believe that understanding the detailed workings of the model would be helpful.

“I don’t think that doctors are going to understand how the model really works…I think to be able to say…to look back a year these are all the patients that we screened last year, and this is what happened to them, to see the validation” -Individual 9 (Designer)

This contrasted with the patients and the clinicians, who wanted at least an overview or a deeper understanding of how the algorithm works.

“But just understanding the basics…What we know is patients with similar conditions, similar ages, test scores, this is how their disease progressed.” -Individual 15 (Patient)

“I’d want to know what factors it takes into account and a little bit about how they’re weighted, and I’d want to know something about the performance of the algorithm…it would be important to make sure that people were using this algorithm and had access to the numbers from it were given appropriate training in how to contextualize it” -Individual 13 (Clinician)

The end points that clinicians and patients asked for–how factors are weighted, disease progression, ages–would need to be prespecified in order to show validation. Interpretability, in the designer’s sense of how an AI model produces a result, differs from a casual interpretability, or the level of understanding needed to build trust with clinicians, although there is overlap. What patients and clinicians ask for is often not a level of interpretability that the designer can provide.

Theme 4 model implementation: Who should receive ML output?

Clinicians and patients disagreed with the direct-to-general-medicine strategy, both because of workflow implications and because of concerns about informed consent and patient wishes.

All patients interviewed expressed they would want this information from a trusted provider, not from a stranger.

“It would best be approached if a PCP or a very trusted clinician referred –ideally with the patient’s prior permission and agreement…my gut says there needs to be a bond of trust with the person that has that first conversation with the patient.” -Individual 14 (Patient)

Palliative care clinicians were concerned about the impact on their workload: if they were acting as the “human screen” for this predictor, is there any added value or differentiation between an assisted and an autonomous algorithm?

“[Advanced care planning] [is] not an unlimited resource, … to give them an additional sorting function beyond screening each of those two charts that they’re looking at…are we just adding more work and then they’re not going to be able to complete kind of the tasks of advanced care planning for the patients in the time that they’re going to have allotted to spend on this.” -Individual 1 (Clinician)

Theme 5 model implementation: Patient consent and involvement

This theme reflected the greatest value conflicts. Patients wanted to know their prediction: every patient interviewed wanted access to the information that would be available about this prediction. The differences concerned the timing and manner in which this information was relayed.

“So I would like to be brought into the loop as soon as somebody knows that I will probably die soon” -Individual 14 (Patient)

Clinicians agreed, but with the condition that the information be relayed in conversation. It was important for clinicians not to inform patients without any context as to what the prediction meant for them. The end goal for clinicians was geared more towards quality-of-life concerns than towards the timeline of death.

“I think [patients should have access to their mortality prediction] but not in a vacuum…yes with a conversation…it’s important for us to at least start having a conversation so that we understand the…goals and preferences towards the end of life” -Individual 13 (Clinician)

Designers did not agree with informing patients, because from their perspective the algorithm output is not an accurate prediction of the patient’s mortality.

“In my view, it would be very irresponsible to show this prediction to the patient directly because this tool is just a tool….it is certainly not a well-calibrated probability of the person actually dying. So presenting this as…the probability of the patient’s mortality to the patient would be wrong and also irresponsible” -Individual 8 (Designer)

Furthermore, designers thought that because most individuals could already benefit from palliative care, and these consults were seen as a nonharmful intervention, the low pretest probability made it a good “initial test case” for machine learning.

“So one of the reasons why this is attractive to us as an initial test case for using ML or machine learning in clinical medicine is because we’re really talking about providing extra care to a patient in a sense, right, just like extra attention.” – Individual 2 (Designer)

Theme 6 intervention benefits and harms: External pressures and study integrity

Both clinicians and designers had concerns about external pressures, most notably the project’s image in the popular press. These are not typical study pressures (for example, patient enrollment numbers or outcomes); they arise because the ML algorithm, and this study, sit in a more public setting.

“So let me tell you the story that you’re going to tell the media when this family comes back and says I got a bill for my dying loved one when she died in the hospital…on observation, because they wouldn’t admit her because they thought she had less than 24 hours to live.” -Individual 1 (Clinician)

“We saw this happen…eight years ago or so, with the whole death panels and how easy it is to spin a positive intervention… to twist and it became this, oh no, doctors are now deciding for themselves who gets to live and who dies.”– Individual 12 (Clinician)

“So let’s say extreme situation…[someone] happens to be a board member and they get offended that somebody talked to be about dying, we’d have a PR nightmare on our hands” -Individual 4 (Designer)

Participants posited that the model may need randomized controlled trial-like protections to prevent this blowback from happening. Such protections would need prespecified endpoints for auditability and some form of transparency with the clinicians using the information.

Theme 7 intervention benefits and harms: Palliative care legitimization

Finally, an unanticipated benefit of this predictor was what we termed “palliative care legitimization”. Palliative care clinicians and patients both mentioned that the algorithm has the unique ability, through automation, to provide added legitimacy to the profession—especially with regard to more “old school” physicians who would not usually order ACP consults.

“My experience has been the palliative team is not…they’re amazing, but they’re not held as in high esteem as the cardiac surgeons, or the neurosurgeons, or the trauma surgeons, so they don’t necessarily ask for a consult from Palliative Care” -Individual 15 (Patient)

“So maybe able to get access to patients who traditionally wouldn’t get to me because of, again, that still like human element of preconceived ideas around Palliative Care and concerns around Palliative Care…automating like the identification of patient who could benefit, we now get to see that patient who previously we would never get access to.” -Individual 1 (Clinician)

Palliative care has struggled for years as a discipline, often facing perceptions from other medical disciplines that engaging in palliative care is an abdication of care.29 While the algorithm is not responsible for this, its deployment clearly interacts with this latent value system. Theme 7 is a value collision in that it refers to intraclinician conflict, and it fits with already known concerns around the field of palliative care and the field’s efforts to achieve parity with other disciplines.

DISCUSSION

Almost certainly, ML-HCAs will have a substantial impact on healthcare processes, quality, cost, and access, and in so doing will raise specific and perhaps unique ethical considerations and concerns in the healthcare context.45–49 This has been the case in nonhealthcare contexts,5,50 where ML implementation has generated toughening scrutiny due to scandals regarding how large repositories of private data have been sold and used,51 how the ML design of algorithmic flight controls resulted in accidents,52 and how computer-assisted prison sentencing guidelines perpetuate racial bias,53 to name but a few of the growing number of examples. Specifically for ML-HCAs, a variety of ethical considerations and concerns have been cited, such as bias arising from training data,8 the privacy of personal data in business arrangements,54 ownership of the data used to train ML-HCAs55 and accountability for ML-HCA’s failings.56 Notably, no systematic approach has yet emerged regarding the identification of specific ethical concerns arising from actual ML-HCAs, an emerging, complex, cross-disciplinary technology that potentially affects many aspects of healthcare.

In this case study of an ML-HCA for ACP, the “values-collision” approach was able to identify multiple themes of disagreement between stakeholders, which mark areas of ethical and values conflict. In identifying affected stakeholder groups, eliciting stakeholders’ views, and identifying where stakeholder groups disagreed, we were able to provide actionable guidance to the ML-HCA design team regarding where consequential ethical challenges were likely to emerge. This approach differs from a checklist or an EHR analytic in that it is grounded in the analysis of stakeholder perspectives on a specific implementation.57,58 “Values-collision” screening will certainly require revision through subsequent case studies and will need to be made more efficient to screen multiple ML-HCAs in a timely fashion, but it offers an initial approach to identifying ethical challenges with an ML-HCA before such challenges become consequential.

From our findings, the design team was able to prioritize needed efforts focused on: (1) examining alternative implementation strategies for delivering mortality predictions into the workflow (ie, directly to patients or to hospitalist clinicians); (2) explicitly clarifying to clinicians and patients that the mortality prediction was only evaluated as a surrogate to predict need for ACP, and not for other mortality-related decisions, possibly renaming the prediction as “ACP recommended” rather than “mortality prediction”; and (3) shielding their ongoing research into mortality prediction from social media scrutiny until endpoint-driven studies were completed (ie, enacting protections similar to blinded clinical trials).60 These 3 measures were reported back to the design team as they were actionable and drawn from the 5 key “values-collisions” we identified. For the informatics team, these findings demonstrated the need for multiple stakeholder perspectives on tool development, as well as the need to protect design development from external pressures in practice. These takeaways can be adapted for future stakeholder meetings addressing larger sets of “values-collisions.”

From this case study, using “values-collisions” appears to help proactively and systematically identify ethical considerations with a specific ML-HCA, and to facilitate interdisciplinary dialogue and collaboration to better understand and subsequently manage the ethical implications of an ML-HCA before and during its early deployment. Where ethical concerns arose, we were able to draw on ethical scholarship and work with designers and users to address such challenges before they became consequential. This ethical scholarship includes discussion of traditional bioethical principles such as beneficence and distributive justice. For example, while the ML mortality predictions could be delivered directly to general medicine clinicians, findings from the SUPPORT trial57 suggested nonpalliative care clinicians may not act on mortality predictions for ACP. In addition, the incentive pressures to meet quality metric indicators43 were perceived by the design team as having possible unintended ethical consequences, such as guiding treatment options or decisions around admission (ie, choosing not to admit patients with a high likelihood of near-term mortality so as not to count against hospital 30-day mortality rates), rather than being used to guide ACP. Mortality predictions could also be given directly to patients, though that too could give rise to ethical concerns, particularly around patients’ abilities to understand such a prediction without contextualization, as with any screening test result,59 as well as the significant emotional effects of receiving a mortality prediction without appropriate support (such as the recent public outcry around physicians delivering poor prognoses via video-link61).

Future studies are needed to clarify when ethical analyses should be repeated as this (or any) ML-HCA is revised and deployed more broadly. Optimizing the timing of ethical analysis will also need iterative study but should occur in the “sweet spot” of the Collingridge dilemma: late enough that the ML-HCA impacts can be predicted, but not so late that the ethical problems have already become entrenched.62 Additionally, research should focus on how to streamline and make more efficient the ethical analysis process (whether questions can be delivered via survey, which questions are of highest yield, and the optimal number of stakeholder assessments needed). The inclusion of end-users in the initial development of the mortality prediction ML tool is also an important takeaway for future studies.

Limitations of this study included a small number of participants and limited demographic information. As with all qualitative studies, there are limits to generalizability. However, in this case study, the framework methodology provided a standardizable approach for identifying ethical challenges with an ML-HCA, which is needed to ensure machine learning tools fulfill their promise to improve care for patients and families.

Supplementary Material

ocad022_Supplementary_Data

ACKNOWLEDGMENTS

Thank you to our study participants for their time and thoughtful comments. Thank you to Sarah Wieten for her assistance with participant interviews.

Contributor Information

Diana Cagliero, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada.

Natalie Deuitch, Department of Genetics, Stanford University School of Medicine, Stanford, California, USA; National Institutes of Health, National Human Genome Research Institute, Bethesda, Maryland, USA.

Nigam Shah, Center for Biomedical Informatics Research, Stanford University School of Medicine, Palo Alto, California, USA.

Chris Feudtner, The Department of Medical Ethics, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, USA; Departments of Pediatrics, Medical Ethics and Healthcare Policy, The Perelman School of Medicine, The University of Pennsylvania, Philadelphia, Pennsylvania, USA.

Danton Char, Division of Pediatric Cardiac Anesthesia, Department of Anesthesiology, Stanford University School of Medicine, Stanford, California, USA; Center for Biomedical Ethics, Stanford University School of Medicine, Stanford, California, USA.

FUNDING

Stanford Human-Centered Artificial Intelligence Seed Grant.

AUTHOR CONTRIBUTIONS

Conception and design: DSC, ND, NS, and CF. Data Acquisition: ND and DSC. Data Analysis: ND, DAC, and DSC. Data Interpretation: all authors. Drafting of Manuscript: DSC, DAC, and ND. Critical revision of manuscript: all authors.

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American Medical Informatics Association online.

CONFLICT OF INTEREST STATEMENT

None declared.

DATA AVAILABILITY

The data underlying this article are available in the article and will be shared on reasonable request to the corresponding author.

REFERENCES
