Author manuscript; available in PMC: 2023 Jun 1.
Published in final edited form as: Support Care Cancer. 2022 Jan 30;30(5):4363–4372. doi: 10.1007/s00520-021-06774-w

Clinician perspectives on machine learning prognostic algorithms in the routine care of patients with cancer: a qualitative study

Ravi B Parikh 1,2,3,4, Christopher R Manz 5, Maria N Nelson 1, Chalanda N Evans 1,6, Susan H Regli 4, Nina O’Connor 1,2,4, Lynn M Schuchter 1,2,4, Lawrence N Shulman 1,2,4, Mitesh S Patel 1,2,3,4,6,7, Joanna Paladino 8, Judy A Shea 1
PMCID: PMC10232355  NIHMSID: NIHMS1894614  PMID: 35094138

Abstract

Purpose

Oncologists may overestimate prognosis for patients with cancer, leading to delayed or missed conversations about patients’ goals and subsequent low-quality end-of-life care. Machine learning algorithms may accurately predict mortality risk in cancer, but it is unclear how oncology clinicians would use such algorithms in practice.

Methods

The purpose of this qualitative study was to assess oncology clinicians’ perceptions of the utility of, and barriers to, machine learning prognostic algorithms to prompt advance care planning. Participants included medical oncology physicians and advanced practice providers (APPs) practicing in tertiary and community practices within a large academic healthcare system. Semi-structured interviews were conducted by telephone; transcripts were coded and analyzed inductively using NVivo software.

Results

The study included 29 oncology clinicians (19 physicians, 10 APPs) across 6 practice sites (1 tertiary, 5 community) in the USA. Fourteen participants had prior exposure to an automated machine learning-based prognostic algorithm as part of a pragmatic randomized trial. Clinicians believed that there was utility for algorithms in validating their own intuition about prognosis and prompting conversations about patient goals and preferences. However, this enthusiasm was tempered by concerns about algorithm accuracy, over-reliance on algorithm predictions, and the ethical implications around disclosure of an algorithm prediction. There was significant variation in tolerance for false positive vs. false negative predictions.

Conclusion

While oncologists believe there are applications for advanced prognostic algorithms in routine care of patients with cancer, they are concerned about algorithm accuracy, confirmation and automation biases, and ethical issues of prognostic disclosure.

Keywords: Predictive analytics, Machine learning, Advance care planning, Palliative care, Supportive oncology

Introduction

For patients with cancer, early conversations between clinicians and patients regarding goals of care and prognosis are a guideline-based practice and improve patient-reported mood and quality of life [1]. Early conversations are associated with well-established metrics of high-quality end-of-life care, including greater hospice enrollment and less chemotherapy near the end of life [2–6]. Despite the recognized value of early conversations, overoptimistic prognoses among oncologists may contribute to delayed or missed conversations about goals and care preferences [7, 8]. Use of prognostic algorithms may facilitate better clinician assessment of life expectancy in order to prompt earlier conversations about patients’ goals of care and treatment preferences [9] and more rational use of therapy and procedures near the end of life [10, 11].

Advances in machine learning (ML) and electronic health record (EHR) infrastructure and better availability of granular patient-level data have spurred an interest in using ML prognostic algorithms to better predict mortality risk and facilitate earlier conversations regarding goals of care between oncologists and patients with cancer [12, 13]. Recently, several ML algorithms—often based on hundreds of routinely collected real-time EHR variables such as laboratory values, comorbidities, and acute care utilization—have been shown to generate accurate predictions of mortality risk within and across several cancers [14, 15]. While most published ML algorithms offer static predictions at a single point in a patient’s trajectory (e.g., diagnosis of advanced cancer) [16], emerging automated algorithms generate updated predictions at multiple time points (e.g., at each clinical encounter) [17]. ML prognostic algorithms may outperform clinician intuition, prognostic nomograms based on relatively low numbers of variables, or clinical trial survival estimates [16, 18–26]. Despite the abundance of published ML prognostic algorithms in cancer, however, there are only fledgling examples of such tools being used in routine oncology practice [27]. In particular, limited transparency and interpretability of the inputs, algorithm, and outputs are major issues that may lead oncology clinicians to be apprehensive about use of ML prognostic algorithms in practice [28, 29].
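
As a schematic illustration only (feature names and coefficients below are invented for exposition, not drawn from any validated model in this study), the general shape of such an EHR-based mortality risk score is a function mapping routinely collected variables to a probability:

```python
import math

# Hypothetical coefficients, for illustration only -- real models in the
# literature use hundreds of EHR variables and learned weights.
WEIGHTS = {"albumin_low": 1.2, "er_visits_90d": 0.8, "age_over_75": 0.5}
INTERCEPT = -2.0

def mortality_risk(features):
    """Toy logistic risk score over routinely collected EHR flags/counts.

    `features` maps variable names to values (missing keys default to 0).
    Returns a probability in (0, 1).
    """
    z = INTERCEPT + sum(WEIGHTS[k] * features.get(k, 0) for k in WEIGHTS)
    return 1 / (1 + math.exp(-z))
```

Automated versions of such a score would simply be recomputed from fresh EHR data at each encounter, yielding the multiple-time-point predictions described above.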

Oncology clinician perspectives on use of ML prognostic algorithms in practice have not yet been fully explored. Lack of knowledge about these perspectives is a significant barrier to the routine use of ML algorithms to improve care delivery for patients with cancer [30]. The objective of this qualitative study was to assess oncology clinicians’ perceptions on the utility, barriers, and desired characteristics of ML prognostic algorithms to inform care delivery of patients with cancer, particularly discussions of prognosis and advance care planning. We chose a qualitative design because, to our knowledge, no quantitative tool to assess clinician perspectives towards ML prognostic algorithms exists, and we wished to explore the full spectrum of clinician perspectives towards a hypothetical ML algorithm in practice.

Methods

Study design

We designed a multisite qualitative interview study at practices within a large academic National Cancer Institute-designated comprehensive cancer center and its affiliated community hospitals and ambulatory cancer centers. This study aimed to learn clinicians’ perceptions on the utility, barriers, and desired characteristics of machine learning prognostic algorithms used in the clinical care of patients, through the use of semi-structured phone interviews with purposively sampled oncology clinicians (N = 29). Our study adhered to standard reporting guidelines, including the Standards for Reporting Qualitative Research (SRQR) and the Consolidated Criteria for Reporting Qualitative Research (COREQ) checklist (see Supplement).

Sample

Participants included medical oncologists and advanced practice providers (APPs), consisting of nurse practitioners and physician assistants, across six hematology/oncology practices within a large academic cancer center (see Supplemental Table 1 for a description of study sites). We included only medical oncology clinicians with active outpatient practices to ensure all participants were able to reflect on practical considerations of using advanced prognostic algorithms in practice. Some clinicians were practicing in clinical sites that had recently had access to a validated ML prognostic algorithm as part of a quality improvement initiative to identify patients with cancer who may be appropriate for an advance care planning discussion [31]. While the purpose of this study was not to compare responses between clinicians who had vs. had not been exposed to the intervention (which consisted of additional components besides algorithm exposure, including peer comparisons), we wished to capture perspectives on algorithm-based mortality predictions from both groups. Purely random sampling could, by chance, have underrepresented clinicians who were exposed to the intervention, making subsequent themes less representative of our intended population of oncologists who may have exposure to similar algorithms or prognostic tools. Thus, we stratified sampling based on practice site to obtain a 1:1 ratio of groups that did and did not have exposure to this algorithm in order to ensure better generalizability of our findings [31].

Recruitment

To recruit participants, we contacted each practice site and obtained email addresses for each site’s oncology clinicians. To ensure inclusion of perspectives that reflect the general distribution of oncologists in the USA, we generated a random order list of potential participants in a stratified fashion to obtain a 2:1 ratio of general oncology vs tertiary academic clinicians and a 2:1 ratio of physicians to APPs. A participant who refused participation was replaced by a member in that same group to maintain stratification ratios. As some clinicians at the study institutions had exposure to an existing mortality prediction tool that could influence study findings, subjects were also stratified in a 1:1 fashion for exposure/no exposure to the existing algorithm. Details regarding the algorithm and the clinical exposure are described elsewhere [9, 24, 31]. Either the two study Principal Investigators (CM or RBP) or project manager (CE) contacted clinicians by email to invite them to participate; non-responders were sent two follow-up invitations at weekly intervals. Further details on recruitment are available in the study protocol (see Supplement).
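The stratified random invitation order described above can be sketched as follows. This is a generic illustration under stated assumptions (stratum labels, candidate names, and the ratio pattern are hypothetical); the study's actual procedure is described in its protocol:

```python
import itertools
import random

def stratified_order(pools, pattern, seed=0):
    """Random invitation order honoring target stratum ratios.

    pools:   stratum label -> list of candidate clinicians (hypothetical).
    pattern: stratum labels repeated in target proportion, e.g.
             ["general", "general", "tertiary"] encodes a 2:1 ratio.
    A refusal would be replaced by the next candidate from the same
    stratum, preserving the ratios.
    """
    rng = random.Random(seed)
    # Shuffle within each stratum so order within a stratum is random.
    pools = {k: rng.sample(v, len(v)) for k, v in pools.items()}
    order = []
    for stratum in itertools.cycle(pattern):
        if not any(pools.values()):
            break  # every stratum exhausted
        if pools[stratum]:
            order.append((stratum, pools[stratum].pop()))
    return order
```

Cycling through the ratio pattern ensures that even an early cut-off of the invitation list approximates the target 2:1 and 1:1 ratios.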

Interviews

Interview guides were developed using questions based on the research team’s previous work on the application of prognostic algorithms. The guides were pilot tested with four oncology clinicians who were not included in the study. Twenty-nine semi-structured interviews were conducted by telephone. Verbal informed consent was obtained prior to all interviews. Following each interview, a short survey was conducted to obtain information regarding clinician demographics, years in practice, practice type (general vs. specialty), and training. Interview guides included open-ended questions with follow-up probes to elicit clinicians’ opinions on a hypothetical advanced algorithm to predict mortality, without an explicit mention of specific models, model characteristics or associated interventions. Interviews lasted an average of 20 min (range 10 to 25 min) and contained 8 questions, including a question that asked whether clinicians had current or prior experience using any computer-generated models or mortality prediction tools to inform clinical decision-making. The interview guide is included as a Supplement. All interviews were conducted and audio recorded between October 2019 and February 2020 by a single researcher (MN) at the Mixed Methods Research Lab (MMRL) at the University of Pennsylvania and transcribed by an external service. The interviewer reviewed all transcriptions to verify accuracy.

Data collection and analysis

Transcripts were cleaned of all identifying information and uploaded to NVivo 12 Plus, a qualitative data analysis software. Over the course of interview collection, a codebook was developed using an inductive content analysis approach that relied upon a set of a priori codes derived from interview guide question topics [32] as well as emergent themes based on a close reading of the transcripts; this codebook was iteratively refined and modified slightly as it was applied to the data. We achieved saturation by agreement between the interviewer and the coders after 25 interviews; in order to verify saturation and recurrence of themes among the general participant pool, we conducted 4 additional interviews. Two MMRL researchers coded all interview transcripts (5 double-coded, the remaining individually coded) using periodic inter-rater reliability estimation to measure agreement and facilitate analysis (final κ = 0.81 [range 0.62–0.81]). The resulting analysis is based upon the findings from this process. As exposure to existing automated mortality prediction could influence clinician perspectives, an exploratory analysis also evaluated for thematic differences in perspectives from clinicians with and without exposure to the existing algorithm.
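For reference, the inter-rater agreement statistic reported here (Cohen's κ) can be computed from two coders' labels on the same excerpts. This is a generic sketch of the formula, not the study's analysis code:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: chance-corrected agreement between two coders.

    kappa = (p_observed - p_expected) / (1 - p_expected), where
    p_expected is the agreement expected from each coder's label
    frequencies alone. Assumes agreement is not already perfect by chance
    (p_expected < 1).
    """
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)
```

A κ of 0.81, as reported above, indicates agreement well beyond chance under common interpretive benchmarks.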

Findings

Study participants

Of the 72 clinicians recruited to participate in this study, 29 were interviewed, for an overall participation rate of 40.3%. There were no significant differences between clinicians who chose vs. declined participation. Fifty-two percent of participants described themselves as general oncology clinicians, 66% were physicians, 66% were female, and the average years in practice was 15. Forty-eight percent of participants had been previously exposed to a prognostic algorithm. Additional characteristics of the study participants are described in Table 1.

Table 1.

Characteristics of oncology clinician participants

Age, years
 Mean (range) 47.6 (36–65)
Practice setting, n (%)
 General 15 (51.7)
 Tertiary 14 (48.3)
Gender, n (%)
 Male 10 (34.5)
 Female 19 (65.5)
Self-reported race, n (%)
 Non-Hispanic White 27 (93.1)
 Other 2 (6.9)
Years in practice, n (%)
 0–10 12 (41.4)
 11–20 11 (37.9)
 > 20 6 (20.7)
Clinician type, n (%)
 Physician 19 (65.5)
 Advanced practice practitioner 10 (34.5)
Prior exposure to prognostic algorithm, n (%)
 Yes 14 (48.3)
 No 15 (51.7)

Utility of prognostic algorithms in clinical practice

There were two primary applications of advanced prognostic algorithms in clinicians’ practice: (1) as a tool to complement clinician prognostic intuition, and (2) as a tool to prompt conversations with patients about goals of care and end-of-life care (see Table 2 for illustrative quotes).

Table 2.

Illustrative quotes regarding the utility of prognostic algorithms in practice

As a tool to complement clinician prognostic intuition “I wouldn’t use it solely…I’ve been in oncology [for several decades], so I tend to rely more on what I can see in my practical information versus a number… I think that it is a useful piece of information that could potentially guide you in some of your conversations.” (NP1, age 58, female, 35 years practicing, exposed to algorithm)
“I would need to validate it in my own experience. It’s sort of like getting a second opinion of what your own opinion is” (MD4, age 51, female, 15 years practicing, no exposure)
“Doctors aren’t clairvoyants and so we’re using our best guess and our experience to guide our impression of these situations and I think that’s one other obstacle to having end of life conversations, is just uncertainty yourself as a physician, and as a provider about what’s going to happen with your patients. I think [mortality predictions] would be very handy.” (MD5, age 37, female, 5 years practicing, no exposure)
Prompting end-of-life conversations “It would serve more of a reminder to me, like, ‘Hey, may be it’s time to have this conversation.’ I’m well aware that these things are impossible to predict, but to me it would be may be a prompt because sometimes us physicians can even fool ourselves to thinking that things are okay when they’re not” (MD6, age 49, male, 10 years practicing, no exposure)
“I think that that algorithm would be a factor, it would be a tool in my toolbox to use in terms of having goals of care conversations… I may use it for patients where it might be particularly difficult having them understand …what’s happening to them. I could use it as a tool to help patients understand their mortality risk, sort of in the way that we do in other things like [share survival statistics from clinical trials]” (MD3, age 42, female, 8 years practicing, no exposure)
“[T]here are some patients who you look at and they’re in a good mood and they seem to be doing well today. And so you can say, “Well, maybe today it’s a good day; let’s not ruin it by talking about death.” So knowing that that [prediction] score was there, would spur me on to actually have the conversation sooner.” (APP1, age 44, female, 16 years practicing, exposed to algorithm)
“I think that could be really helpful for those patients that don’t have a crisis, that have a very [good] functional status, because those are the more nitty gritty conversations. Even I typically have these conversations when I notice a change and sometimes after change is actually too late.” (APP2, age 37, female, 8 years practicing, no exposure)

A tool to complement clinician prognostic intuition

The first theme was that prognostic algorithms could have a role in practice, but primarily as tools that could support clinician prognostication and clinical decision-making, rather than as the primary source of estimating a patient’s prognosis. Clinicians felt that having access to an automated prognostic algorithm that flagged patients who had a high likelihood of dying in the next six months would be helpful in confirming their own instincts about a patient’s trajectory—a “second opinion on [their] own opinion” (MD4). One such clinician noted that having access to the algorithm’s information would be a helpful supplement but that they would not rely on it over their own clinical intuition. Another clinician said that they would check the accuracy of the prognostic algorithm against their own clinical judgment before integrating it into their regular patient care practice. Still, clinicians described their own limitations in accurately predicting the course of a disease, noting that “doctors aren’t clairvoyants” (MD5) and “…don’t have crystal balls” (MD10). Clinicians felt that such a tool could be helpful particularly in the care of patients whose disease course is difficult to predict, as this prognostic uncertainty often makes the decision to initiate a conversation difficult.

A tool to prompt end-of-life conversations

Distinct from being simply a tool to refine prognostication, a second theme was that automated prognostic predictions would prompt clinicians to have advance care planning discussions with their patients, particularly when they felt the predictions corroborated their clinical judgment. Such predictions could give an entrée into the conversation with patients, “a reminder…[that] maybe it’s time to have the conversation” (MD6). Some clinicians mentioned that they expected to have already had the conversation with any patient who was flagged as having a high risk of dying in the next six months, but that automated predictions would be a way to ensure no at-risk patient was missed. The algorithm predictions could prompt conversations regarding general goals of care, cessation of treatment, introduction of palliative care measures, or hospice referrals. However, if the algorithm plays such a role, some clinicians indicated a potential obligation to share this prognostic information with patients: “this is something [patients] would want to know” (MD5), raising possible ethical quandaries as discussed below.

Concerns about clinical application of advanced prognostic algorithms

Clinician concerns about integrating a prognostic algorithm into clinical care revolved around three themes: (1) algorithm accuracy, (2) over-reliance on prediction algorithms, and (3) ethical considerations of sharing mortality predictions with patients (see Table 3 for representative quotes). Our exploratory analysis revealed that clinicians who had exposure to an existing advanced prognostic algorithm expressed reservations tied to their experience with that intervention.

Table 3.

Concerns regarding advanced prognostic algorithms in practice

Accuracy of predictions “[Prognosis] is tricky because so much can happen, especially depending on the disease. Usually I tell patients that within 48 h, [a prediction] is pretty darn accurate, but anything after that is hard to tell. I would have skepticism about the algorithm given all the tools we have available to us today and [because of] different responses to therapy, I’d be a little pessimistic that it would be very accurate” (MD1, age 53, female, 25 years practicing, no exposure)
“I don’t think a model is going to tell me whether a patient of mine is going to die or not, honestly… [As someone] who takes care of a patient with acute leukemia or a bone marrow transplant that’s near relapse, I frankly think that’s just absurd [to think that the model will tell me if the patient will live or die.]” (MD7, age 65, female, 17 years practicing, no exposure)
“I just think there’s a lot of under and over-reporting of co-morbidities. I’m not sure what [the algorithm] would capture from [the electronic medical record], but that would be my biggest worry is that the data it is using to make its calculation is not accurate. I guess that would be my biggest worry.” (MD8, age 41, female, 10 years practicing, no exposure)
Over-reliance “I think the biggest [concern about clinical application of the algorithm] would be an over-reliance with an overestimate of the precision of [the algorithm] …if we didn’t put it in the appropriate context and use that as the driving, most important feature.” (MD9, age 36, male, 4 years practicing, no exposure)
“[Automated prognostic information] could make me less present or available to my patients, because [the algorithm] is doing my work for me. I worry, would that take the place of that human connection and that human relationship?” (APP3, age 49, female, 13 years practicing, exposed to algorithm)
Ethical challenges “Any information we get, we’d have an obligation to disclose it to patients. And it depends in the accuracy of the data, can the disclosure cause harm to the patient? Especially mental harm. Any data …should not be disclosed to the patient until the clinician discloses it” (MD2, age 58, male, 26 years practicing, no exposure)
“If [the algorithm] estimated a patient had a 75% chance of dying in the next six months, and I had a therapy that I thought gave them a 30% chance of responding and living longer, I wonder if that 75% chance of dying would dissuade me from offering therapy that might help people live longer” (MD9, age 36, male, 4 years practicing, no exposure)

Accuracy of predictions

Even when clinicians were enthusiastic about the possible benefits of prognostic algorithms, responses about their perceived utility were qualified by the need for accurate predictions. Concerns around accuracy reflected skepticism that any predictions outside of a days- to weeks-long time window could be accurate. Some clinicians felt that their clinical judgment based on experience treating patients would be superior to any algorithm: “I rely on… what I can see… versus a number” (NP1). Often this skepticism centered around the unpredictability of certain diseases or treatment responses: “with all the tools we have available to us today… [because of] different responses to therapy that I’ve seen, I’d be a little pessimistic that [an algorithm] would be very accurate” (MD1). This sentiment was more common among sub-specialized oncology clinicians practicing in a tertiary center, as compared to general oncologists. Algorithm inaccuracy also contributed to ethical dilemmas regarding sharing predictions with patients, as discussed below. Finally, clinicians expressed concern about the reliability of using imperfect electronic medical record data, where important prognostic variables may be present but not recorded, to generate the predictions.

Over-reliance on a predictive algorithm

The potential for the tool to become a “crutch” in clinical decision-making concerned clinicians who viewed over-reliance as a risk that could change the course of a patient’s care in an unwarranted way. One APP framed her concern about over-reliance in terms of losing the “human connection and that human relationship” with patients, potentially leading clinicians to be “less present” in their patient interactions. Drawing an analogy to Oncotype testing, which has helped refine adjuvant chemotherapy for patients with breast cancer but has required increasing precision over time to prevent over-treatment of a subset of patients, one physician emphasized the importance of contextualizing the prediction for the specific patient rather than relying broadly on algorithm results.

Ethical considerations

Another theme was the ethical challenges of sharing mortality predictions with patients. Clinicians argued that “any [prognosis] data…should not be disclosed to the patient until the clinician discloses it” (MD2) and should be presented to the patient during a regular appointment where the context of a patient’s individual condition could be more fully explored. This concern is grounded in the belief that patients often misinterpret statistical data, such as survival statistics from a clinical trial, and thus are likely to misinterpret mortality predictions. While these concerns are applicable even when an algorithm is perfectly accurate, they are further magnified when predictions may be inaccurate.

Clinicians were particularly concerned that automated predictions derived from electronic medical record data might be automatically shared with patients without clinicians being able to provide critical clinical context. One physician expressed concerns that simply having such predictions in the EHR presented an ethical tension between a perceived “requirement” to share predictions with patients and the effect of such disclosure on patient well-being. This clinician even posed the hypothetical of a patient viewing their poor prognosis online and thinking, “Oh, my God… I’m dying in three months” (MD2) and how that data could cause “mental harm.” Clinicians also raised questions about how mortality predictions should interface with decisions regarding treatment that can affect prognosis, particularly whether high mortality predictions might convince a clinician to withhold a potentially beneficial therapy.

Exploratory analysis: impact of prior exposure to advanced prognostic algorithms

Half of clinicians interviewed for this study (N = 14) were practicing at sites participating in a quality improvement study that used an automated ML prognostic algorithm to prompt clinicians to have advance care planning discussions with patients who have high predicted mortality risk. Overall, clinicians with prior exposure to an advanced prognostic algorithm found it to be “helpful,” either in reminding them to have the conversation with a specific patient or by “keeping [the conversation] in mind” for all patients. Clinicians with prior exposure to the algorithm highlighted concerns over accuracy more than concerns about over-reliance or the ethical dilemmas posed by automated mortality predictions. There were no important thematic differences observed between clinicians with and without exposure to the prognostic algorithm.

Preferences on prognostic algorithm characteristics

The performance characteristics of a prognostic algorithm may influence its clinical utility. Clinicians expressed a range of perspectives on the optimal prediction characteristics (i.e., favoring false positives vs false negatives), as well as how these variables would influence the utility of mortality predictions (see Table 4 for illustrative quotes).

Table 4.

Preferences regarding algorithm performance and specifications

False positives vs false negative preference “I’d rather make sure to have that conversation more often with people and to be able to say to them, ‘You’re living past six months. Great!’, rather than, ‘I missed my window of opportunity’” (APP2, age 37, female, 8 years practicing, no exposure)
“We get so many [EHR] prompts for drug interactions and other things that I often ignore them. My fear is if there were too many false positives, I would start to ignore them” (MD6, age 49, male, 10 years practicing, no exposure)
“Having a lot of false positives would lead clinicians to ignore the prompts to have serious illness conversations with patients…But it’s the same thing from a false negative standpoint: If it wasn’t identifying patients that the clinician really deems to be high risk for death from their disease, it causes the clinician to doubt the reliability of the system” (APP4, age 46, female, 5 years practicing, exposed to algorithm)

Preferences about algorithm performance

Faced with a choice between an algorithm biased towards false positives (effectively flagging too many patients as being at high risk of death) and one biased towards false negatives (effectively flagging too few), clinicians were evenly divided. Clinicians who preferred a bias towards false positives argued that this option would give patients more resources and time to plan their end-of-life care. Others preferred this option under the assumption that conversations regarding prognosis ought to take place regardless of a patient’s 6-month mortality risk.

Other clinicians (N = 12) felt that an algorithm biased towards more false negatives would be more productive for their practices. Under such an algorithm, more patients would pass away within a six-month period without having been flagged as having a high risk of death. One physician felt that false negatives would be preferable because “I could use my other clinical tools and my other judgment to figure out if a patient is close to end-of-life” (MD3). A key argument among clinicians who preferred more false negatives was that too many false positives would create a numbing effect for clinicians, nullifying the utility of the algorithm.
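The trade-off clinicians describe corresponds to the risk threshold at which an algorithm flags patients: lowering it produces more false positives, raising it produces more false negatives. A toy sketch (invented risk scores, not patient data):

```python
def confusion_at_threshold(risks, died_within_6mo, threshold):
    """Count false positives and false negatives at a given risk threshold.

    A patient is flagged when risk >= threshold. A false positive is a
    flagged patient who survived six months; a false negative is an
    unflagged patient who died within six months. Toy values only.
    """
    fp = sum(r >= threshold and not d for r, d in zip(risks, died_within_6mo))
    fn = sum(r < threshold and d for r, d in zip(risks, died_within_6mo))
    return fp, fn
```

On a small invented cohort, sweeping the threshold makes the trade-off concrete: a permissive threshold yields extra prompts for patients who survive, while a strict one misses patients who die unflagged.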

Discussion

In this qualitative study of practicing medical oncology physicians and advanced practice providers, we found that ML prognostic algorithms in clinical oncology practice elicited a range of positive and negative perceptions with regard to improving quality of cancer and end-of-life care. Many oncology clinicians felt that such a tool had potential utility to prompt earlier conversations about goals of care. Others, particularly those who were skeptical about the tool, raised concerns about disclosing mortality predictions to patients. Most clinicians stated that its usefulness was predicated on the accuracy of the algorithm, although clinicians were split on optimal algorithm performance metrics. Given the “black box” nature of most ML predictions, oncology clinicians expressed concerns that the rationale for a high-risk prediction would not be evident to the clinician, potentially leading to overreliance on an algorithm. Our results provide new insights into the implementation of ML prognostic tools in oncology practice to achieve high-quality end-of-life care.

There are three critical insights from this analysis. First, clinicians were open to the utility of an ML algorithm for prompting and supporting goals of care communication and discussions about prognosis. Inadequate access to these conversations for patients with cancer is a known problem [32, 33]. Smaller qualitative studies have previously suggested that improving confidence about prognosis is a key benefit of prognostic algorithms [29]. Our study extends these findings by studying an oncology-specific population and emphasizing use of an ML prognostic tool. Sub-specialized oncologists had more reservations about the accuracy of an algorithm for their specific population, compared to general oncologists.

Second, clinicians revealed concerns about biases in the application of predictions from ML prognostic algorithms. Notably, clinicians were concerned that such algorithms could reinforce confirmation bias (only applying the information when it confirms their intuition) or automation bias (overly trusting the mortality prediction while ignoring other sources of information, including their own intuition). Confirmation bias is a well-recognized consequence of clinical decision support tools, and automated prognostic algorithms run the risk of reinforcing potentially inaccurate prognoses [34, 35]. Automation bias may lead to an overreliance on the prognostic algorithm, leading to incorrect decision-making and/or weakening of the patient-doctor relationship [36]. Subspecialized oncologists were particularly worried about incorrect predictions from the algorithm. These specialists often see more complex cases of advanced cancer in a specific disease area and thus may have better prognostic acumen within their disease area. Such oncologists may be skeptical about the accuracy of a disease-agnostic “black-box” algorithm, particularly for hematologic malignancies where even heavily resistant cancers are sometimes cured (in contrast to more predictable prognoses of metastatic solid tumors). This is one of the first studies to highlight that confirmation and automation biases are risks inherent with clinicians’ use of ML prognostic algorithms.

Third, disclosure to patients of machine-generated mortality risk predictions is controversial. Many clinicians were against sharing this information with patients, despite evidence that prognostic disclosure benefits patients' psychological wellbeing without compromising the patient-doctor relationship [37, 38]. Many clinicians felt that patients may react unfavorably to a computer-generated life expectancy estimate in the absence of a clinician's contextualization. Indeed, direct disclosure of life expectancy is often not desired by patients with cancer [39]. However, patients view advance care planning much more favorably, so using ML prognostic estimates to prompt such discussions is potentially useful. Oncology clinicians felt that they, not algorithms, should be responsible for developing prognoses, although they saw value in an algorithm that validates a clinician's intuition. There was, however, little discussion of how oncology clinicians would respond to an algorithm output that challenges their intuition.

These results provide critical insights into future use of ML prognostic tools in clinical research and quality improvement in oncology. By prompting earlier conversations, ML prognostic tools could help achieve metrics of high-quality end-of-life care in oncology. Well-powered prospective studies of ML prognostic tools should prioritize studying these outcomes. Additionally, oncology clinicians' concerns regarding confirmation and automation biases highlight the need to pair ML algorithms with robust clinician training and support on how to use the information in goals of care communication and informed decision-making. Clinicians' concerns also underscore the need for further research on how algorithms affect actual clinical decision-making and on unintended consequences for patient and clinician experience and outcomes, including reinforcement of existing care disparities [40].

Our analysis has several strengths and limitations. First, we sampled opinions from a single large academic health system, and most clinician participants were non-Hispanic White and/or female. Furthermore, 40% had 10 or fewer years of experience. Thus, results may be difficult to generalize to other oncology practice settings. However, our sampling strategy ensured a diversity of perspectives across practice types and training backgrounds. Second, some individuals interviewed had prior exposure to an automated prognostic algorithm. Because our aim was to achieve a broad representation of perspectives on applied machine learning-generated mortality predictions, and not to compare perspectives of those with and without prior exposure to such models, our study was not able to assess thematic saturation for this subgroup. However, themes—including the usefulness of algorithms to validate prognostic intuition—were similar among clinicians with and without prior algorithm exposure, and our exploratory analysis did not detect large differences in perspectives. Third, while we did capture desired characteristics of prognostic algorithms, we did not assess optimal strategies for presenting algorithm-generated prognoses to clinicians to maximize utility of the system, nor did we specifically assess the optimal strategy for notifying clinicians of a patient with limited prognosis. This is particularly important, as "alert fatigue" is a growing phenomenon with increased EHR prompts. These questions should be explored in future studies.

Conclusion

Attitudes towards the clinical benefit, ethics, accuracy, and utility of ML prognostic algorithms varied significantly among oncology clinicians caring for patients with cancer. Focusing on promising use cases, such as prompting end-of-life conversations, and ensuring algorithm accuracy prior to implementation are key considerations for deploying such tools in the routine care of patients with cancer in order to achieve high-quality end-of-life care. Future prospective studies should use these principles in algorithm-based initiatives to improve palliative care and advance care planning utilization for patients with cancer. Additionally, further mixed-methods studies among oncology clinicians who have been exposed to machine learning algorithms are needed to identify which component(s) of a multipronged intervention are most beneficial.

Supplementary Material

Supplemental Table 1
Supplementary Material
Supplementary Checklist

Acknowledgements

The authors would like to acknowledge Zoe Belardo for assistance in interview coding.

Funding

This study was supported by the National Cancer Institute K08CA263541 (to R.B.P.), Penn Center for Precision Medicine Accelerator Fund (to R.B.P. and C.R.M.), and the National Palliative Care Research Center (to R.B.P.). The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Footnotes

Conflict of interest The authors declare no competing interests.

Ethics approval This study was approved by the University of Pennsylvania Institutional Review Board.

Consent to participate Informed consent was obtained from all individual participants included in this study.

Consent for publication Participants signed informed consent regarding the publishing of their data in a de-identified manner.

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s00520-021-06774-w.

Data availability

Not applicable.

References

1. Gilligan T, Coyle N, Frankel RM et al. (2017) Patient-clinician communication: American Society of Clinical Oncology consensus guideline. J Clin Oncol 35:3618–3632. 10.1200/JCO.2017.75.2311
2. Emanuel EJ, Young-Xu Y, Levinsky NG et al. (2003) Chemotherapy use among Medicare beneficiaries at the end of life. Ann Intern Med 138:639–643
3. Earle CC, Neville BA, Landrum MB et al. (2004) Trends in the aggressiveness of cancer care near the end of life. J Clin Oncol 22:315–321. 10.1200/JCO.2004.08.136
4. Earle CC, Landrum MB, Souza JM et al. (2008) Aggressiveness of cancer care near the end of life: is it a quality-of-care issue? J Clin Oncol 26:3860–3866. 10.1200/JCO.2007.15.8253
5. Chastek B, Harley C, Kallich J et al. (2012) Health care costs for patients with cancer at the end of life. J Oncol Pract 8:75s–80s. 10.1200/JOP.2011.000469
6. Wen F-H, Chen J-S, Su P-J et al. (2018) Terminally ill cancer patients' concordance between preferred life-sustaining treatment states in their last six months of life and received life-sustaining treatment states in their last month: an observational study. J Pain Symptom Manage 56:509–518.e3. 10.1016/j.jpainsymman.2018.07.003
7. Christakis NA, Lamont EB (2000) Extent and determinants of error in doctors' prognoses in terminally ill patients: prospective cohort study. BMJ 320:469–472. 10.1136/bmj.320.7233.469
8. Sborov K, Giaretta S, Koong A et al. (2019) Impact of accuracy of survival predictions on quality of end-of-life care among patients with metastatic cancer who receive radiation therapy. J Oncol Pract 18:e262–e270. 10.1200/JOP.18.00516
9. Manz CR, Parikh RB, Small DS et al. (2020) Effect of integrating machine learning mortality estimates with behavioral nudges to clinicians on serious illness conversations among patients with cancer: a stepped-wedge cluster randomized clinical trial. JAMA Oncol 2020:e204759. 10.1001/jamaoncol.2020.4759
10. Wright AA, Zhang B, Ray A et al. (2008) Associations between end-of-life discussions, patient mental health, medical care near death, and caregiver bereavement adjustment. JAMA 300:1665–1673. 10.1001/jama.300.14.1665
11. Brinkman-Stoppelenburg A, Rietjens JAC, van der Heide A (2014) The effects of advance care planning on end-of-life care: a systematic review. Palliat Med 28:1000–1025. 10.1177/0269216314526272
12. Robbins R (2020) Hospitals tap AI to nudge clinicians toward end-of-life conversations. https://www.statnews.com/2020/07/01/end-of-life-artificial-intelligence/. Accessed 6 Oct 2020
13. Huang S, Yang J, Fong S et al. (2020) Artificial intelligence in cancer diagnosis and prognosis: opportunities and challenges. Cancer Lett 471:61–71. 10.1016/j.canlet.2019.12.007
14. Kourou K, Exarchos TP, Exarchos KP et al. (2015) Machine learning applications in cancer prognosis and prediction. Comput Struct Biotechnol J 13:8–17. 10.1016/j.csbj.2014.11.005
15. Nagy M, Radakovich N, Nazha A (2020) Machine learning in oncology: what should clinicians know? JCO Clin Cancer Inform 4:799–810. 10.1200/CCI.20.00049
16. Elfiky AA, Pany MJ, Parikh RB et al. (2018) Development and application of a machine learning approach to assess short-term mortality risk among patients with cancer starting chemotherapy. JAMA Netw Open 1:e180926. 10.1001/jamanetworkopen.2018.0926
17. Thorsen-Meyer H-C, Nielsen AB, Nielsen AP et al. (2020) Dynamic and explainable machine learning prediction of mortality in patients in the intensive care unit: a retrospective study of high-frequency data in electronic patient records. Lancet Digit Health 2(4):e179–e191. 10.1016/S2589-7500(20)30018-2
18. Brajer N, Cozzi B, Gao M et al. (2020) Prospective and external evaluation of a machine learning model to predict in-hospital mortality of adults at time of admission. JAMA Netw Open 3:e1920733. 10.1001/jamanetworkopen.2019.20733
19. Desai RJ, Wang SV, Vaduganathan M et al. (2020) Comparison of machine learning methods with traditional models for use of administrative claims with electronic medical records to predict heart failure outcomes. JAMA Netw Open 3:e1918962. 10.1001/jamanetworkopen.2019.18962
20. Marafino BJ, Park M, Davies JM et al. (2018) Validation of prediction models for critical care outcomes using natural language processing of electronic health record data. JAMA Netw Open 1:e185097. 10.1001/jamanetworkopen.2018.5097
21. Sahni N, Simon G, Arora R (2018) Development and validation of machine learning models for prediction of 1-year mortality utilizing electronic medical record data available at the end of hospitalization in multicondition patients: a proof-of-concept study. J Gen Intern Med 33:921–928. 10.1007/s11606-018-4316-y
22. Rajkomar A, Oren E, Chen K et al. (2018) Scalable and accurate deep learning with electronic health records. NPJ Digit Med 1:18. 10.1038/s41746-018-0029-1
23. Bertsimas D, Dunn J, Pawlowski C et al. (2018) Applied informatics decision support tool for mortality predictions in patients with cancer. JCO Clin Cancer Inform 2:1–11. 10.1200/CCI.18.00003
24. Parikh RB, Manz C, Chivers C et al. (2019) Machine learning approaches to predict 6-month mortality among patients with cancer. JAMA Netw Open 2:e1915997. 10.1001/jamanetworkopen.2019.15997
25. Titano JJ, Badgeley M, Schefflein J et al. (2018) Automated deep-neural-network surveillance of cranial images for acute neurologic events. Nat Med 24:1337–1341. 10.1038/s41591-018-0147-y
26. Gensheimer MF, Aggarwal S, Benson KRK et al. (2020) Automated model versus treating physician for predicting survival time of patients with metastatic cancer. J Am Med Inform Assoc 2020:ocaa290. 10.1093/jamia/ocaa290
27. Parikh RB, Gdowski A, Patt DA et al. (2019) Using big data and predictive analytics to determine patient risk in oncology. Am Soc Clin Oncol Educ Book 39:e53–e58. 10.1200/EDBK_238891
28. Vollmer S, Mateen BA, Bohner G et al. (2020) Machine learning and artificial intelligence research for patient benefit: 20 critical questions on transparency, replicability, ethics, and effectiveness. BMJ 368:l6927. 10.1136/bmj.l6927
29. Hallen SAM, Hootsmans NAM, Blaisdell L, Gutheil CM, Han PKJ (2015) Physicians' perceptions of the value of prognostic models: the benefits and risks of prognostic confidence. Health Expect 18(6):2266–2277. 10.1111/hex.12196
30. Adibi A, Sadatsafavi M, Ioannidis JPA (2020) Validation and utility testing of clinical prediction models: time to change the approach. JAMA 324:235–236. 10.1001/jama.2020.1230
31. Manz CR, Chen J, Liu M et al. (2020) Validation of a machine learning algorithm to predict 180-day mortality for outpatients with cancer. JAMA Oncol 6(11):1723–1730. 10.1001/jamaoncol.2020.4331
32. Bernacki RE, Block SD, American College of Physicians High Value Care Task Force (2014) Communication about serious illness care goals: a review and synthesis of best practices. JAMA Intern Med 174:1994. 10.1001/jamainternmed.2014.5271
33. Institute of Medicine (2014) Dying in America: improving quality and honoring individual preferences near the end of life. http://nationalacademies.org/hmd/Reports/2014/Dying-In-America-Improving-Quality-and-Honoring-Individual-Preferences-Near-the-End-of-Life.aspx. Accessed 23 Apr 2019
34. Elston DM (2020) Confirmation bias in medical decision-making. J Am Acad Dermatol 82:572. 10.1016/j.jaad.2019.06.1286
35. Saposnik G, Redelmeier D, Ruff CC et al. (2016) Cognitive biases associated with medical decisions: a systematic review. BMC Med Inform Decis Mak 16:138. 10.1186/s12911-016-0377-1
36. Zerilli J, Knott A, Maclaurin J et al. (2019) Algorithmic decision-making and the control problem. Mind Mach 29:555–578
37. Chen C-H, Tang S-T (2014) Prognostic disclosure and its influence on cancer patients. J Cancer Res Pract 1:103–112. 10.6323/JCRP.2014.1.2.02
38. van der Velden NCA, Meijers MC, Han PKJ et al. (2020) The effect of prognostic communication on patient outcomes in palliative cancer care: a systematic review. Curr Treat Options Oncol 21:40. 10.1007/s11864-020-00742-y
39. Walczak A, Henselmans I, Tattersall MHN et al. (2015) A qualitative analysis of responses to a question prompt list and prognosis and end-of-life care discussion prompts delivered in a communication support program. Psychooncology 24:287–293. 10.1002/pon.3635
40. Obermeyer Z, Powers B, Vogeli C et al. (2019) Dissecting racial bias in an algorithm used to manage the health of populations. Science 366:447–453. 10.1126/science.aax2342
41. Elo S, Kyngäs H (2008) The qualitative content analysis process. J Adv Nurs 62(1):107–115. 10.1111/j.1365-2648.2007.04569.x
