Journal of the American Medical Informatics Association (JAMIA) 2020 Dec 14;28(6):1108–1116. doi: 10.1093/jamia/ocaa290

Automated model versus treating physician for predicting survival time of patients with metastatic cancer

Michael F Gensheimer 1, Sonya Aggarwal 1, Kathryn RK Benson 1, Justin N Carter 1, A Solomon Henry 2, Douglas J Wood 2, Scott G Soltys 1, Steven Hancock 1, Erqi Pollom 1, Nigam H Shah 2, Daniel T Chang 1
PMCID: PMC8200275  PMID: 33313792

Abstract

Objective

Being able to predict a patient’s life expectancy can help doctors and patients prioritize treatments and supportive care. For predicting life expectancy, physicians have been shown to outperform traditional models that use only a few predictor variables. It is possible that a machine learning model that uses many predictor variables and diverse data sources from the electronic medical record can improve on physicians’ performance. For patients with metastatic cancer, we compared accuracy of life expectancy predictions by the treating physician, a machine learning model, and a traditional model.

Materials and Methods

A machine learning model was trained on data from 14 600 patients with metastatic cancer to predict each patient’s distribution of survival time. Data sources included note text, laboratory values, and vital signs. From 2015 to 2016, 899 patients receiving radiotherapy for metastatic cancer were enrolled in a study in which their radiation oncologist estimated life expectancy. Survival predictions were also made by the machine learning model and a traditional model using only performance status. Performance was assessed with area under the curve for 1-year survival and calibration plots.

Results

The radiotherapy study included 1190 treatment courses in 899 patients. A total of 879 treatment courses in 685 patients were included in this analysis. Median overall survival was 11.7 months. The physicians, machine learning model, and traditional model had area under the curve for 1-year survival of 0.72 (95% CI 0.63–0.81), 0.77 (0.73–0.81), and 0.68 (0.65–0.71), respectively.

Conclusions

The machine learning model’s predictions were more accurate than those of the treating physician or a traditional model.

Keywords: neoplasms, prognosis, machine learning, natural language processing, radiotherapy

INTRODUCTION

Predicting patients’ life expectancy is a fundamental task for physicians, since it informs treatment decisions, advance care planning, and patients’ decisions about other aspects of their lives.1,2 Discussing prognosis with seriously ill patients can increase patient satisfaction and improve discussions of challenging topics, such as preferences for cardiopulmonary resuscitation.3 Of patients whose physicians did not discuss prognosis, many stated they would value such a discussion.3 However, oncologists and other physicians tend to perform poorly in predicting life expectancy, often overpredicting survival time,1,4–7 which may contribute to the suboptimal and overly intensive end-of-life care seen in large-scale studies.8,9

Statistical models could be a useful tool to improve these predictions. Several studies have compared accuracy of mortality predictions made by physicians versus statistical models.10–14 The largest meta-analysis showed that physicians had better performance than the models for predicting mortality risk,10 though a few other studies have found the opposite pattern.11,13 Physicians rarely use prognostic models in practice,15,16 which could be justified if the models do not actually improve on their predictions.

In this paper, we refer to such prognostic models as traditional models: they generally use few predictor variables (features) and require hand-coding of variables such as the Glasgow coma scale17 or performance status.2 These can be contrasted with machine learning (ML) models, which use routinely collected electronic medical record (EMR) data and generally use hundreds or thousands of features. There has recently been rapid development of ML prognostic models.18–25 ML models have important advantages over traditional models, including potentially higher accuracy due to the ability to use many automatically selected features, and the ability to be deployed automatically with minimal increase in provider workload.

If ML models for life expectancy prediction are to be clinically deployed to help physicians and patients make decisions, it will be important to compare their performance to that of the treating physician and traditional models. If performance is worse than that of physicians, the model would be less useful and misplaced trust in the model could even be harmful to patients.26 While there have been studies comparing physician versus ML model performance for prediction of some endpoints, such as diabetic retinopathy, we are not aware of such a comparison for life expectancy prediction.27,28

In this paper, we compare performance of the treating physician, an ML model, and a traditional model for life expectancy prediction in patients with metastatic cancer. This task is important for this patient population since many of these patients live for only a few years, raising the importance of advance care planning.29 As far as we are aware, this is the first comparison of physician vs ML model performance for survival prediction, and we hope the results will help to serve as a baseline for future studies in this area.

MATERIALS AND METHODS

Machine learning model

We trained an ML model to predict overall survival time using EMR (Epic, Verona, WI) data for patients seen for metastatic cancer in the Stanford Health Care system from 2008 to 2020. The 3813 predictor variables included text of provider notes and radiology reports (frequencies of 1–2-word phrases), laboratory values, vital signs, ICD-9 diagnosis codes, Current Procedural Terminology (CPT) codes, and medication administrations and prescriptions. For notes, labs, and vital signs, only the most recent year of data was included; for the other data sources, all past data were included. More recent data were weighted more heavily. Details of the patients and construction of predictor variables can be found in a prior publication describing an earlier version of the ML model.25 The earlier model was trained and tested using 12 588 patients with a median age at metastatic diagnosis of 63.5 years. The most common primary tumor sites were gastrointestinal (27%), thoracic (16%), and genitourinary (11.2%). For the current version of the model, we added more recent data and identified additional patients using the manually entered stage in the EMR. An additional 2822 patients were included, for a total of 15 410. Of these patients, 810 had been enrolled in the study used as the source of the physician survival predictions and were set aside. The remaining patients were randomly divided with an 80%/20% split into a training set of 11 680 patients and a validation set of 2920 patients. As in the prior paper, for training, each patient could contribute multiple observations to the design matrix, starting with the first visit after the time of metastatic diagnosis and continuing every 6 months until the last recorded visit.
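As a rough illustration of this kind of featurization, the sketch below builds 1–2-word phrase counts from note text and down-weights older notes with an exponential decay. The half-life, function names, and toy data are illustrative assumptions, not the authors' exact pipeline (which is detailed in the prior publication).

```python
# Hypothetical sketch of note-text featurization: 1-2 word phrase counts,
# with more recent notes weighted more heavily. The exponential half-life
# is an assumption, not the paper's exact recency weighting.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

def featurize_notes(notes, note_ages_days, half_life_days=90.0):
    """notes: note texts from the most recent year.
    note_ages_days: age of each note relative to the prediction date."""
    vectorizer = CountVectorizer(ngram_range=(1, 2))  # 1-2-word phrases
    counts = vectorizer.fit_transform(notes).toarray().astype(float)
    # Exponential decay: a note half_life_days old counts half as much.
    weights = 0.5 ** (np.asarray(note_ages_days) / half_life_days)
    weighted = counts * weights[:, None]
    # One feature vector per patient-timepoint: sum over that patient's notes.
    return weighted.sum(axis=0), vectorizer.get_feature_names_out()

features, names = featurize_notes(
    ["patient with metastatic lung cancer", "pain improved after radiation"],
    note_ages_days=[10, 200],
)
```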

In the earlier version of the ML model, a Cox proportional hazards model was used. In the current version, we used a discrete-time survival model that divides follow-up time from 0 to 5 years into 12 intervals and predicts a conditional survival probability for each time interval.30 We changed to the discrete-time survival model because it enabled faster training and more flexibility, for instance by allowing nonproportional hazards. L2 regularization strength was chosen by 5-fold cross-validation on the training set to maximize log likelihood. Other hyperparameters, including number of neural network layers, were fine-tuned by trying a range of values, fitting the model to the training set, and then picking the option with best performance on the validation set measured mainly by log likelihood. Once the model architecture was finalized, the final model coefficients were fit using combined training and validation sets. The Supplementary Methods contains more details of the ML model.
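As an illustration, a discrete-time survival network of this kind can be sketched in Keras as below: sigmoid outputs give a conditional hazard for each of the 12 intervals, and the loss is the discrete-time survival log likelihood. The layer size, regularization strength, and label encoding are assumptions in the spirit of the cited approach,30 not the authors' exact architecture.

```python
# Minimal sketch of a discrete-time survival network in Keras.
# Layer sizes and label encoding are illustrative assumptions.
import tensorflow as tf

N_INTERVALS = 12  # follow-up from 0 to 5 years split into 12 intervals

def discrete_time_loss(y_true, y_pred):
    """y_true[:, :12]: 1 for each interval the patient fully survived;
    y_true[:, 12:]: 1 for the interval (if any) in which death occurred.
    y_pred: per-interval conditional hazards from the sigmoid layer."""
    survived = y_true[:, :N_INTERVALS]
    died = y_true[:, N_INTERVALS:]
    eps = 1e-7
    log_lik = (survived * tf.math.log(1.0 - y_pred + eps)
               + died * tf.math.log(y_pred + eps))
    return -tf.reduce_sum(log_lik, axis=-1)  # negative log likelihood

n_features = 3813
model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(64, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    tf.keras.layers.Dense(N_INTERVALS, activation="sigmoid"),  # conditional hazards
])
model.compile(optimizer="adam", loss=discrete_time_loss)
# Predicted survival curve: cumulative product of (1 - hazard) over intervals.
```

Because each interval gets its own hazard, this formulation allows nonproportional hazards across follow-up time, one of the flexibility gains mentioned above.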

For patients with little data in the EMR, it would be impossible to create an accurate survival estimate. Therefore, predictions were only made for patients with at least 1 note, lab result, and procedure code by the evaluation date.

Physicians’ survival predictions

The physician survival predictions were taken from a previously published study.31 From March 2015 to November 2016, all patients receiving external beam radiotherapy for metastatic cancer were identified weekly by a research assistant and enrolled. Patients could be enrolled again if they received another course of radiation. Members of the care team (attending radiation oncologists, resident physicians, nurses, and radiation therapists) filled out a form in which they predicted the patient’s life expectancy, picking from 7 bins (0–3, 3.1–6, 6.1–9, 9.1–12, 12.1–18, 18.1–24, >24 months). The form was distributed to providers during the first week of radiotherapy, but forms were returned at varying times. The survey form can be found in the Supplementary Material. For the current analysis, only the attending physician’s answers were used, and treatment courses were excluded if the attending physician did not return their form.

Traditional model

We also wished to evaluate performance of a traditional prognostic model. We previously found that a simple model using only Eastern Cooperative Oncology Group (ECOG) performance status as proposed by Jang et al performed as well as a more complicated model.2,25 Therefore, we used ECOG performance status as the single feature for the traditional model; it takes integer values from 0 to 4. Because Jang et al calibrated their predictions using patients seen at a palliative care clinic, all their predicted survival times were quite short (<1 year). Therefore, we recalibrated their model to better represent our patient population by using a group of 983 patients receiving palliative radiation and calculating Kaplan-Meier survival curves for each level of ECOG performance status (levels 3 and 4 were grouped together due to only 8 patients having a level of 4).
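A minimal sketch of this recalibration step is shown below, assuming a pandas DataFrame of the palliative-radiation cohort with ecog, survival_months, and died columns; the toy rows are placeholders for the 983 patients.

```python
# Hedged sketch of the traditional-model recalibration: one Kaplan-Meier
# curve per ECOG level, with levels 3 and 4 pooled. Column names and the
# toy data are assumptions.
import pandas as pd
from lifelines import KaplanMeierFitter

df = pd.DataFrame({  # placeholder for the 983-patient recalibration cohort
    "ecog": [0, 1, 1, 2, 3, 4, 2, 1],
    "survival_months": [30, 18, 22, 9, 4, 2, 12, 25],
    "died": [1, 1, 0, 1, 1, 1, 1, 0],
})
df["ecog_grouped"] = df["ecog"].clip(upper=3)  # pool levels 3 and 4

km_by_ecog = {}
for level, grp in df.groupby("ecog_grouped"):
    kmf = KaplanMeierFitter()
    kmf.fit(grp["survival_months"], event_observed=grp["died"])
    km_by_ecog[level] = kmf
# The traditional model's prediction for a patient is simply the KM curve
# (or its median) for that patient's ECOG level.
```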

Statistical analysis

Survival outcomes were obtained using EMR data and public information, such as obituaries. The outcomes database was locked on January 31, 2019. Survival time was measured from the first day of radiotherapy. The ML model could use data up to the first day of radiotherapy. We attempted to avoid data leakage, which could occur if the model had information that was not actually available at that time. For instance, the date of each clinical note was set to the date it was last edited, not the date of the patient encounter. The ML model outputs a predicted survival curve for each patient, but to allow fair comparison with the physicians, for this analysis, we used the median predicted survival and binned it into the same 7 intervals that the physicians used.
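As an illustration of the binning step, the sketch below reads the median off a predicted discrete-time survival curve and maps it into the 7 bins; the interval boundaries and example hazards are assumptions.

```python
# Sketch of converting the ML model's predicted survival curve into the
# 7 physician bins: take the median of the curve, then digitize.
import numpy as np

def median_from_curve(surv_probs, interval_ends_months):
    """surv_probs: predicted survival probability at each interval end.
    Returns the first time the curve drops to <= 0.5 (no interpolation)."""
    below = np.where(np.asarray(surv_probs) <= 0.5)[0]
    return interval_ends_months[below[0]] if below.size else np.inf

BIN_EDGES = [3, 6, 9, 12, 18, 24]  # upper bin edges; >24 months is bin 6

def to_bin(median_months):
    # right=True so that exactly 3.0 months falls in the 0-3 month bin.
    return int(np.digitize(median_months, BIN_EDGES, right=True))

interval_ends = np.linspace(5, 60, 12)       # 12 intervals spanning 0-5 years
curve = np.cumprod(1 - np.full(12, 0.15))    # example: 15% hazard per interval
print(to_bin(median_from_curve(curve, interval_ends)))  # -> 6 (>24 months)
```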

Discrimination was measured using the area under the receiver operating characteristic curve (AUC) for 1-year overall survival and Harrell’s C-index. The 1-year time point was chosen because it is clinically relevant (symptom burden and psychological distress increase in the last year of life, and advance care planning becomes more urgent) and for comparability with prior studies.5,24,25,32–34 To compare AUC between different prediction methods, Obuchowski’s nonparametric test was used, as implemented in the fastAUC R package, with physician ID as the cluster variable.35 Within-patient clustering was not corrected for, since most patients were enrolled only once and the expected effect would be small. Calibration was assessed with calibration plots. All tests were 2-sided, and P < .05 was considered statistically significant.
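The 1-year AUC computation itself is straightforward; the clustered Obuchowski comparison was done in R with the fastAUC package, but a minimal sketch of the underlying AUC, excluding patients censored before 1 year as in Figure 2, might look like this (function and variable names are assumptions):

```python
# Sketch of the 1-year AUC: patients lost to follow-up before 12 months
# are excluded; a lower predicted-survival bin means higher risk of death.
import numpy as np
from sklearn.metrics import roc_auc_score

def one_year_auc(pred_bin, surv_months, died):
    pred_bin = np.asarray(pred_bin)
    surv_months = np.asarray(surv_months)
    died = np.asarray(died, dtype=bool)
    # Keep patients who died, or who were followed for at least 12 months.
    known = (surv_months >= 12) | died
    label_dead_1y = (surv_months[known] < 12) & died[known]
    risk_score = -pred_bin[known]  # shorter predicted survival => higher risk
    return roc_auc_score(label_dead_1y, risk_score)
```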

Discrimination of the traditional model could easily be compared to the ML model and physicians’ predictions using AUC for 1-year survival. It was more difficult to directly compare calibration between the traditional model and the other prediction methods, since there were only 4 levels of performance status in the traditional model, which could not be mapped cleanly to the 7 bins of physicians’ predictions. Therefore, we created a separate calibration plot for the traditional model.

Computer code for training the ML model and evaluating the predictions is available at https://github.com/MGensheimer/prognosis-model. This GitHub repository also contains coefficients for the first layer of the ML model’s neural network, which indicate the influence of each predictor variable on survival time. The ML model was implemented in Python 3.7.5 with the Keras library. Statistical analysis was done with R 3.6.2. Data have not been made available, as it would be challenging to fully deidentify the data. The current retrospective analysis was approved by the Stanford Institutional Review Board.

RESULTS

The study in which radiation oncologists predicted their patients’ life expectancy included 1190 treatment courses in 899 patients.31 Of these, 879 treatment courses in 685 patients were included in the current analysis (median of 1 course per patient, range 1–6). Reasons for exclusion were: attending physician did not return survey (n = 212 courses), insufficient data for the ML model to make a survival prediction (n = 98), and no follow-up information (n = 1).

Patient characteristics are shown in Table 1. Characteristics were similar among the entire patient population and analyzed patients. For analyzed patients, the most commonly treated sites were bone (n = 380) and brain (n = 335). More than 1 site could be treated in a single course.

Table 1.

Patient characteristics

Characteristic              All treatment courses (n = 1190)   Analyzed treatment courses (n = 879)
Median age, y (IQR)         64 (55–72)                         64 (54–71)
Sex
 Female                     608 (51%)                          473 (54%)
 Male                       582 (49%)                          406 (46%)
Primary site
 Breast                     196 (16%)                          141 (16%)
 Gastrointestinal           174 (15%)                          132 (15%)
 Genitourinary              135 (11%)                          88 (10%)
 Gynecologic                64 (5%)                            54 (6%)
 Head and neck              69 (6%)                            48 (5%)
 Skin                       154 (13%)                          109 (12%)
 Thorax                     369 (31%)                          286 (33%)
 Other/unknown              29 (2%)                            21 (2%)
Treatment type
 Conventional               584 (49%)                          416 (47%)
 Stereotactic               606 (51%)                          463 (53%)
Median dose, Gy (IQR)       24 (20–30)                         24 (20–30)
Median fractions (IQR)      4 (1–10)                           3 (1–10)

Abbreviation: IQR, interquartile range.

For the analyzed treatment courses, median follow-up was 10.9 months; median follow-up in surviving patients was 27.2 months. Median overall survival was 11.7 months. Survival proportion by the Kaplan-Meier method was 66% at 6 months, 49% at 12 months, and 32% at 24 months.

The treating attending physician and ML model both classified patients into 7 bins of predicted median survival time: 0–3, 3.1–6, 6.1–9, 9.1–12, 12.1–18, 18.1–24, and >24 months. The physician and ML model predictions were positively correlated (Figure 1).

Figure 1. Contingency table showing physician vs machine learning model predictions of median survival time. The 7 bins of predicted survival time were condensed to 4 to make interpretation easier.

Accuracy of survival estimates was assessed using AUC for 1-year survival (Figure 2) and Harrell’s C-index. Physician survival estimation accuracy would be expected to be correlated within each physician’s set of patients; therefore, confidence intervals and P values were adjusted to take this clustering into account. Such corrections have been described for AUC but not for the C-index. The physician’s prediction, ML model, and traditional model had AUC for 1-year survival of 0.72 (95% CI 0.63–0.81), 0.77 (0.73–0.81), and 0.68 (0.65–0.71), respectively. Combining the physician’s and ML model predictions in a simple way, by averaging the 2 bin numbers and rounding down (referred to subsequently as physician+ML), gave an AUC of 0.79 (0.73–0.84). Results of tests of AUC difference between methods were as follows: physician’s prediction vs ML model, P = .11; physician’s prediction vs physician+ML, P = .0006; ML model vs physician+ML, P = .34; ML model vs traditional model, P < .0001. C-index for the physician’s prediction, ML model, traditional model, and physician+ML was 0.67, 0.70, 0.64, and 0.72, respectively. In subgroup analyses by sex, race, ethnicity, and primary tumor site, the ML model outperformed the physicians’ predictions in 12 of 14 subgroups (Supplementary Table 1). Results for prediction of 6-month and 2-year survival can be found in the Supplementary Results.
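The physician+ML combination is simple enough to state in code; the snippet below is a direct transcription of the averaging rule described above, with the 7 bins indexed 0 through 6.

```python
# Physician+ML combination: average the two 7-level bin indices and round
# down (integer floor of the mean). Bins: 0 = 0-3 months ... 6 = >24 months.
def combine_bins(physician_bin: int, ml_bin: int) -> int:
    return (physician_bin + ml_bin) // 2

assert combine_bins(2, 5) == 3  # physician 6.1-9 mo, ML 18.1-24 mo -> 9.1-12 mo
```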

Figure 2. Receiver operating characteristic curves for 1-year survival. A total of 851 treatment courses were included; 28 were excluded because the patient was lost to follow-up before 1 year. Abbreviation: ML, machine learning model.

As seen in Table 2 and Figure 3, both ML model and physician’s prediction were well-calibrated (no systematic over- or underprediction of survival time). For each method and for each predicted survival bin, the median actual survival fell within the predicted range. The traditional model also showed fairly good calibration (Supplementary Figure 1).
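The calibration check behind this statement can be sketched as follows: within each bin of predicted median survival, compute the actual Kaplan-Meier median and ask whether it falls inside the bin's range. DataFrame column names below are assumptions.

```python
# Sketch of the per-bin calibration check behind Table 2.
from lifelines import KaplanMeierFitter

def km_median_by_bin(df, bin_col):
    """df columns (assumed): survival_months, died, plus bin_col holding
    each course's predicted-survival bin. Returns actual KM median per bin,
    to be compared against the bin's predicted range."""
    medians = {}
    for b, grp in df.groupby(bin_col):
        kmf = KaplanMeierFitter()
        kmf.fit(grp["survival_months"], event_observed=grp["died"])
        medians[b] = kmf.median_survival_time_
    return medians
```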

Table 2.

Actual median survival for patients in 4 bins of predicted median survival. Values are medians calculated with the Kaplan-Meier method, with 95% confidence intervals in parentheses. The 7 bins of predicted survival time were condensed to 4 to make interpretation easier.

Predicted median survival     0–6 months        6.1–12 months      12.1–24 months      >24 months
Treating physician            n = 255           n = 314            n = 225             n = 85
 Actual median survival       4.1 (3.2–6.0)     10.1 (8.5–12.1)    19.8 (17.5–25.0)    36.8 (28.2–∞)
Machine learning model        n = 249           n = 211            n = 203             n = 216
 Actual median survival       3.7 (2.9–4.4)     9.2 (7.6–11.7)     14.6 (12.2–18.5)    32.7 (27.2–∞)

Figure 3. Kaplan-Meier plots of actual survival time for patients in each bin of predicted survival time for physicians and ML model. Number of patients in each bin and median survival times are shown in Table 2. The 7 bins of predicted survival time were condensed to 4 for visual clarity.

The preceding analyses binned patients by predicted survival time and then showed distribution of actual survival time for patients in each bin. It is also instructive to take the opposite approach: bin patients by actual survival time and show the distribution of predicted survival time. The distribution of physician vs ML model predictions for patients living longer or shorter than 12 months is shown in Figure 4. For patients with short survival time, the physicians were slightly less likely to predict very long survival time of >24 months. Conversely, for patients with long survival time, the ML model was much more likely to correctly predict long survival. Supplementary Figure 2 shows the distribution of predicted minus actual survival time bin for 609 treatment courses in patients who died. Note that since only deceased patients are included in this figure, a well-calibrated prediction method could appear to overestimate survival time.

Figure 4. Distribution of predicted survival time for patients who lived for shorter or longer than 12 months. Twenty-eight treatment courses were excluded because the patient was lost to follow-up before 12 months.

To see if the ML model predictions added useful information to the physician predictions, we divided patients into 4 bins of physician-predicted survival (0–6, 6.1–12, 12.1–24, and >24 months). For each patient, we recorded whether the ML model agreed with the physician (placed the patient in the same bin) or disagreed and predicted shorter or longer survival time; a sketch of this analysis appears below. Results are shown in Figure 5. Within each bin of physician-predicted survival, the ML model separated patients into good- and poor-prognosis groups. For instance, when the physician predicted 0–6 months survival but the ML model predicted >6 months, actual 6-month survival was 68% (95% CI 60–76) and median survival was 10.8 months (95% CI 7.7–12.2). When the physician predicted 6.1–12 months survival but the ML model predicted shorter survival, actual 6-month survival was 45% (95% CI 36–57) and median survival was 5.1 months (95% CI 4.0–8.8).
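The sketch assumes a DataFrame with condensed (4-level) physician and ML bins; column names are placeholders.

```python
# Sketch of the Figure 5 analysis: within each physician bin, log-rank
# test across patients for whom the ML model predicted shorter, the same,
# or longer survival. Column names are assumptions.
import numpy as np
from lifelines.statistics import multivariate_logrank_test

def figure5_tests(df):
    """df columns (assumed): md_bin4, ml_bin4, survival_months, died."""
    df = df.assign(ml_vs_md=np.sign(df["ml_bin4"] - df["md_bin4"]))  # -1/0/+1
    for b, grp in df.groupby("md_bin4"):
        res = multivariate_logrank_test(
            grp["survival_months"], grp["ml_vs_md"], grp["died"])
        print(f"physician bin {b}: log-rank P = {res.p_value:.3g}")
```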

Figure 5. For patients in 4 bins of physician-predicted survival time, Kaplan-Meier survival plots based on whether the machine learning model predicted shorter, same, or longer time. P values computed with the log-rank test.

To learn about the reasons for the ML model’s predictions, we tabulated the most common features that led to longer predicted survival for good-prognosis patients, and led to shorter predicted survival for poor-prognosis patients. To find the feature $j$ increasing survival the most for treatment course $i$, the following equation was used:

$$\operatorname*{arg\,max}_{j} \; \beta_j \left( x_{ij} - \bar{x}_j \right)$$

where $i$ is the index of the treatment course, $j \in \{1, 2, \ldots, n_{\text{features}}\}$ is the index of the feature, $\beta$ is the vector of coefficients of the first layer of the neural network, $x_{ij}$ is the value of feature $j$ for treatment course $i$, and $\bar{x}_j$ is the mean value of feature $j$ in the training set. The features decreasing survival the most were found in an analogous way. The results are shown in Table 3 and generally correspond well to clinical intuition.
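This attribution rule amounts to a few lines of NumPy. The function below is an illustrative reading of the equation, assuming the first-layer coefficients form a vector with one entry per feature, oriented so that positive values lengthen predicted survival.

```python
# Illustrative implementation of the feature-attribution equation above.
import numpy as np

def most_protective_feature(beta, X, x_mean, i):
    """beta: first-layer coefficient vector (positive = longer survival);
    X: design matrix; x_mean: training-set feature means; i: course index."""
    contrib = beta * (X[i] - x_mean)
    return int(np.argmax(contrib))  # np.argmin gives the most harmful feature
```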

Table 3.

Features leading to longer predicted survival for good-prognosis patients, and shorter predicted survival for poor-prognosis patients. Good-prognosis patients were defined as those in the 2 highest bins of ML model-predicted survival (18.1–24 months, >24 months), and poor-prognosis patients were defined as those in the 2 lowest bins (0–3 months, 3.1–6 months)

Most common features increasing predicted survival time in 282 treatment courses with >18 month predicted survival (+ means a higher value increases predicted survival; count in parentheses):
 −Pulse (60)
 −Age (50)
 +Ephedrine (medication) (34)
 +Complex radiation treatment delivery (CPT 77412) (29)
 +Office consultation (CPT 99244) (24)
 +Out of bed to chair (nursing order) (23)
 +Red blood cell count (22)
 +FDG PET/CT (skull to thighs) (21)
 −Red cell distribution width (lab) (20)
 +Weight (20)

Most common features decreasing predicted survival time in 249 treatment courses with 0–6 month predicted survival:
 −Secondary malignant neoplasm of brain and spinal cord (ICD-9 198.3) (52)
 −Radiation treatment management (CPT 77427) (46)
 −Stereotactic MRI (45)
 −DNR/DNI order (43)
 −Encounter for palliative care (ICD-9 V66.7) (40)
 −Consult to palliative care (39)
 −Neoplasm-related pain (ICD-9 338.3) (34)
 −Pulse (34)
 −MRI full spine with and without contrast (27)
 −Morphine (23)

If the ML model were to be used in practice, the treating physician and ML model would occasionally strongly disagree about predicted survival time. Supplementary Table 2 shows detailed information for the patients for whom the treating physician and ML model disagreed the most (difference in median survival prediction of 5–6 bins, out of a maximum of 6 bins). In 30 of the 34 cases, the physician predicted shorter survival time. These patients’ actual survival outcomes better matched the ML model predictions. For instance, for the 30 cases when the physician predicted 0–6 months survival and the ML model predicted >18 months survival, median survival by the Kaplan-Meier method was 20.1 months (95% CI 11.6–∞), and 11 patients were still alive at last follow-up at a maximum of 35.5 months.

Twelve physicians submitted survival predictions for 20 or more treatment courses. To evaluate per-physician performance, we calculated performance metrics using only that physician’s treatment courses. For each physician, we calculated comp_improve: the ML model C-index minus the physician C-index. Interestingly, physicians who treated more patients in the study tended to have higher comp_improve, indicating poorer performance relative to the ML model (r = 0.69, 95% CI 0.18 to 0.90, P = .01, Supplementary Figure 3). There was no significant correlation between years since residency graduation and comp_improve (r = 0.21, 95% CI −0.42 to 0.70, P = .52, Supplementary Figure 4).
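A sketch of this per-physician analysis follows; comp_improve is the paper's term, while the DataFrame column names are assumptions.

```python
# Per-physician comparison: comp_improve = ML C-index minus physician
# C-index, correlated against caseload. Column names are assumptions.
from lifelines.utils import concordance_index
from scipy.stats import pearsonr

def caseload_vs_comp_improve(df, min_courses=20):
    """df columns (assumed): physician_id, md_bin, ml_bin,
    survival_months, died. Higher bin = longer predicted survival."""
    rows = []
    for _, grp in df.groupby("physician_id"):
        if len(grp) < min_courses:
            continue
        c_md = concordance_index(grp["survival_months"], grp["md_bin"], grp["died"])
        c_ml = concordance_index(grp["survival_months"], grp["ml_bin"], grp["died"])
        rows.append((len(grp), c_ml - c_md))  # (caseload, comp_improve)
    n_courses, comp_improve = zip(*rows)
    return pearsonr(n_courses, comp_improve)
```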

We defined high-intensity radiotherapy as treatment with >5 fractions or a stereotactic technique. Patients whose physicians predicted longer survival received more high-intensity treatments (P < .0001 by chi-square test). Table 4 shows the proportion of high-intensity treatment courses for patients with different combinations of physician and ML model survival predictions. In general, for patients within a given bin of physician-predicted survival, longer model-predicted survival was correlated with more use of high-intensity treatment. However, for the large group of patients with intermediate physician-predicted survival (6–24 months), the proportion of high-intensity treatment did not vary much when the physician and model predictions were within 1 bin of each other. This suggests that the model predictions could help tailor treatment intensity to each patient.

Table 4.

Proportion of treatment courses with high intensity (>5 fractions, or stereotactic technique) when treatment courses are binned by physician and machine learning model survival predictions. Within each physician prediction bin, the P value tests whether the proportion of high-intensity treatment varies between patients in different model prediction bins (chi-square test)

                                       Physician prediction (months)
ML model prediction (months)    0–6             6.1–12          12.1–24         >24
 0–6                            58% (76/130)    89% (82/92)     71% (17/24)     100% (3/3)
 6.1–12                         80% (47/59)     84% (74/88)     96% (51/53)     73% (8/11)
 12.1–24                        79% (31/39)     89% (68/76)     88% (59/67)     100% (21/21)
 >24                            93% (25/27)     98% (57/58)     88% (71/81)     82% (41/50)
 P value                        .0003           .06             .02             .11
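The per-column tests can be reproduced from Table 4 itself; the sketch below runs the chi-square test for the physician 0–6 month column.

```python
# Chi-square test for one Table 4 column: does the proportion of
# high-intensity courses differ across the 4 ML-prediction bins?
import numpy as np
from scipy.stats import chi2_contingency

# Physician-predicted 0-6 months column: (high-intensity count, total).
high = np.array([76, 47, 31, 25])
total = np.array([130, 59, 39, 27])
table = np.vstack([high, total - high])  # 2 x 4 contingency table
chi2, p, dof, _ = chi2_contingency(table)
print(p)  # reported as P = .0003 in Table 4
```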

DISCUSSION

In this analysis of patients receiving radiation therapy for metastatic cancer, an ML model’s survival predictions were more accurate than those of the treating physician and a traditional model. The difference between the physician’s prediction and the ML model was not statistically significant, possibly because within-physician correlation of results widened the confidence interval of the physicians’ AUC. However, combining the ML model and physician’s prediction resulted in a statistically significant improvement over the physician’s prediction alone. The results are notable because several factors favored the physician. First, the physician could have information about the patient that was not recorded in the EMR and was inaccessible to the computer.36 This is important for patients with metastatic cancer, for whom subjective information like performance status and symptom burden is critical.13 Combining the physician and ML predictions improved performance, suggesting that the physician and model used different sets of data to make their predictions and that the model was not simply learning from the physician’s notes. Second, the ML model was given information only up to the first day of radiation, whereas the physicians sometimes returned surveys after completion of radiation, giving them up to several weeks of additional data on the patient’s condition. Since the largest study of survival predictions by physicians vs traditional models showed that physicians had better performance, our results suggest that ML methods with many predictor variables may be needed to reach or exceed human performance in survival prediction.10 It is plausible that the use of detailed EMR data in an ML model improves performance: concepts such as leptomeningeal disease or the oligometastatic state apply to only a small proportion of metastatic cancer patients, so they would not be practical to include in most traditional models, but they have an important influence on survival time when present and are easily incorporated into an ML model. Other studies of ML models for predicting cancer patients’ life expectancy have also shown promising results but have not compared performance to that of the treating physician.19,37–39

The comparison with the treating physician’s predictions gives insights into how implementation of the ML model could affect care. For patients with both short and long physician-predicted survival, the addition of the model estimate added valuable information (Figure 5). In the context of radiotherapy, physicians could use the ML model to help tailor treatment intensity to each patient. For patients with poor prognosis, shorter, more convenient and cost-effective radiation courses could be used.40 For those with good prognosis, complex techniques such as stereotactic ablative radiation therapy could be considered to improve long-term tumor control.41,42 In the context of the patient’s overall cancer care, the predictions could help oncologists decide when to introduce discussions of goals of care.29,43 It will be important to study how physicians use the model’s predictions to update their own predictions and influence their decision-making. Some studies suggest that seeing ML model predictions can improve physicians’ own predictions, but the example of computer-aided diagnosis for mammography suggests this is not a given.26,44 The ML model could be useful even if not used directly by physicians. For instance, it could help with large-scale screening for interventions such as palliative care referral or advance care planning conversations. Instead of relying on support staff to manually screen patients using a set of rules, which is time-consuming and error-prone, survival estimates could be generated automatically.

Our study has several limitations. First, we focused on predictions made by the patient’s radiation oncologist. Other physicians such as the medical oncologist may have had a longer relationship with the patient or more detailed knowledge of disease factors, such as tumor mutations, which could result in more accurate predictions. However, studies have not consistently shown differences in prognostic accuracy by physician specialty or even profession (eg, nurses vs doctors).7 All humans, even domain experts, use heuristics and have cognitive biases that can reduce their performance in prediction and decision-making compared to statistical models.45–48 Second, the ML model was trained and tested using patients in 1 health system, and performance would likely change if applied at other medical centers.49 We have published our model-building code, but it would not be straightforward to apply it to other institutions’ data. More progress is needed in assembling multi-institutional EMR datasets that contain high-quality information about cancer staging and care. Finally, this was a retrospective analysis, and it will be important to prospectively verify model performance and study its impact.26 For initial prospective use, we are planning low-risk studies in which the ML model is used to help screen patients for advance care planning interventions recommended by guidelines.50,51 These studies will validate performance and give insight into physician acceptance of using the model.

FUNDING

This work was supported in part by National Cancer Institute (Cancer Center Support Grant number 5P30CA124435), National Institutes of Health/National Center for Research Resources (CTSA award number UL1 RR025744), and the Stanford Medicine Program for AI in Healthcare. The funders had no role in study design or the manuscript.

AUTHOR CONTRIBUTIONS

MG, EP, and DC contributed to study design, data acquisition, data analysis, and manuscript preparation. SA, KB, JC, SS, and SH contributed to data acquisition. NS contributed to study design and manuscript preparation. ASH and DW contributed to data acquisition and data analysis. All authors approved the final version of the manuscript and agree to be accountable for all aspects of the work.

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American Medical Informatics Association online.


ACKNOWLEDGMENTS

We thank Sigi Javitz and Ken Jung for informatics support.

DATA SHARING STATEMENT

Analysis code and some parameters of the ML model have been published online. The patient data are not available due to containing protected health information that would be challenging to remove.

CONFLICT OF INTEREST STATEMENT

MFG reports grant funding from Varian Medical Systems and Philips Healthcare. DTC reports grant funding and honoraria from Varian Medical Systems and stock ownership in ViewRay.

REFERENCES

1. Krishnan M, Temel J, Wright A, et al. Predicting life expectancy in patients with advanced incurable cancer: a review. J Support Oncol 2013; 11 (2): 68–74.
2. Jang RW, Caraiscos VB, Swami N, et al. Simple prognostic model for patients with advanced cancer based on performance status. JOP 2014; 10 (5): e335–41.
3. Heyland DK, Allan DE, Rocker G, et al. Discussing prognosis with patients and their families near the end of life: impact on satisfaction with end-of-life care. Open Med 2009; 3: e101–10.
4. Chow E, Davis L, Panzarella T, et al. Accuracy of survival prediction by palliative radiation oncologists. Int J Radiat Oncol 2005; 61 (3): 870–3.
5. Lakin JR, Robinson MG, Bernacki RE, et al. Estimating 1-year mortality for high-risk primary care patients using the “surprise” question. JAMA Intern Med 2016; 176 (12): 1863.
6. Hartsell WF, Desilvio M, Bruner DW, et al. Can physicians accurately predict survival time in patients with metastatic cancer? Analysis of RTOG 97-14. J Palliat Med 2008; 11 (5): 723–8.
7. White N, Reid F, Harris A, et al. A systematic review of predictions of survival in palliative care: how accurate are clinicians and who are the experts? PLoS One 2016; 11 (8): e0161407.
8. Bekelman JE, Halpern SD, Blankart CR, et al.; for the International Consortium for End-of-Life Research (ICELR). Comparison of site of death, health care utilization, and hospital expenditures for patients dying with cancer in 7 developed countries. JAMA 2016; 315 (3): 272.
9. Earle CC, Neville BA, Landrum MB, et al. Trends in the aggressiveness of cancer care near the end of life. JCO 2004; 22 (2): 315–21.
10. Sinuff T, Adhikari NKJ, Cook DJ, et al. Mortality predictions in the intensive care unit: comparing physicians with scoring systems. Crit Care Med 2006; 34 (3): 878–85.
11. Jain R, Duval S, Adabag S. How accurate is the eyeball test? A comparison of physician’s subjective assessment versus statistical methods in estimating mortality risk after cardiac surgery. Circ Cardiovasc Qual Outcomes 2014; 7 (1): 151–6.
12. Chew DP, Junbo G, Parsonage W, et al.; for the Perceived Risk of Ischemic and Bleeding Events in Acute Coronary Syndrome Patients (PREDICT) Study Investigators. Perceived risk of ischemic and bleeding events in acute coronary syndromes. Circ Cardiovasc Qual Outcomes 2013; 6 (3): 299–308.
13. Gwilliam B, Keeley V, Todd C, et al. Development of Prognosis in Palliative care Study (PiPS) predictor models to improve prognostication in advanced cancer: prospective cohort study. BMJ 2011; 343: d4920.
14. Minne L, Toma T, de Jonge E, et al. Assessing and combining repeated prognosis of physicians and temporal models in the intensive care. Artif Intell Med 2013; 57 (2): 111–7.
15. Buchan TA, Ross HJ, McDonald M, et al. Physician judgement vs model-predicted prognosis in patients with heart failure. Can J Cardiol 2020; 36 (1): 84–91.
16. McGinn T. Putting meaning into meaningful use: a roadmap to successful integration of evidence at the point of care. JMIR Med Inform 2016; 4 (2): e16.
17. Knaus WA, Draper EA, Wagner DP, et al. APACHE II: a severity of disease classification system. Crit Care Med 1985; 13 (10): 818–29.
18. Hong JC, Eclov NCW, Dalal NH, et al. System for high-intensity evaluation during radiation therapy (SHIELD-RT): a prospective randomized study of machine learning-directed clinical evaluations during radiation and chemoradiation. JCO 2020; 38 (31): 3652–61.
19. Manz CR, Chen J, Liu M, et al. Validation of a machine learning algorithm to predict 180-day mortality for outpatients with cancer. JAMA Oncol 2020. doi: 10.1001/jamaoncol.2020.4331.
20. Mahmoudi E, Kamdar N, Kim N, et al. Use of electronic medical records in development and validation of risk prediction models of hospital readmission: systematic review. BMJ 2020; m958. doi: 10.1136/bmj.m958.
21. Rajkomar A, Oren E, Chen K, et al. Scalable and accurate deep learning with electronic health records. NPJ Digit Med 2018; 1: 18.
22. Marafino BJ, Park M, Davies JM, et al. Validation of prediction models for critical care outcomes using natural language processing of electronic health record data. JAMA Netw Open 2018; 1 (8): e185097.
23. Avati A, Jung K, Harman S, et al. Improving palliative care with deep learning. BMC Med Inform Decis Mak 2018; 18 (S4): 122.
24. Wegier P, Koo E, Ansari S, et al. mHOMR: a feasibility study of an automated system for identifying inpatients having an elevated risk of 1-year mortality. BMJ Qual Saf 2019; 28 (12): 971–9.
25. Gensheimer MF, Henry AS, Wood DJ, et al. Automated survival prediction in metastatic cancer patients using high-dimensional electronic medical record data. J Natl Cancer Inst 2019; 111 (6): 568–74.
26. Char DS, Shah NH, Magnus D. Implementing machine learning in health care—addressing ethical challenges. N Engl J Med 2018; 378 (11): 981–3.
27. Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 2016; 316 (22): 2402.
28. Esteva A, Kuprel B, Novoa RA, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017; 542 (7639): 115–8.
29. Hoerger M, Greer JA, Jackson VA, et al. Defining the elements of early palliative care that are associated with patient-reported outcomes and the delivery of end-of-life care. JCO 2018; 36 (11): 1096–102.
30. Gensheimer MF, Narasimhan B. A scalable discrete-time survival model for neural networks. PeerJ 2019; 7: e6257.
31. Benson KRK, Aggarwal S, Carter JN, et al. Predicting survival for patients with metastatic disease. Int J Radiat Oncol 2020; 106 (1): 52–60.
32. McCarthy EP, Phillips RS, Zhong Z, et al. Dying with cancer: patients’ function, symptoms, and care preferences as death approaches. J Am Geriatr Soc 2000; 48 (S1): S110–21.
33. Tishelman C, Petersson L-M, Degner LF, et al. Symptom prevalence, intensity, and distress in patients with inoperable lung cancer in relation to time of death. JCO 2007; 25 (34): 5381–9.
34. Hwang SS, Chang VT, Fairclough DL, et al. Longitudinal quality of life in advanced cancer patients. J Pain Symptom Manage 2003; 25 (3): 225–35.
35. Obuchowski NA. Nonparametric analysis of clustered ROC curve data. Biometrics 1997; 53 (2): 567.
36. Weiner SJ, Wang S, Kelly B, et al. How accurate is the medical record? A comparison of the physician’s note with a concealed audio recording in unannounced standardized patient encounters. J Am Med Inform Assoc 2020; 27 (5): 770–5.
37. Zhao B, Gabriel RA, Vaida F, et al. Predicting overall survival in patients with metastatic rectal cancer: a machine learning approach. J Gastrointest Surg 2020; 24 (5): 1165–72.
38. Montazeri M, Montazeri M, Montazeri M, et al. Machine learning models in breast cancer survival prediction. THC 2016; 24 (1): 31–42.
39. Wang J, Deng F, Zeng F, et al. Predicting long-term multicategory cause of death in patients with prostate cancer: random forest versus multinomial model. Am J Cancer Res 2020; 10 (5): 1344–55.
40. Hartsell WF, Scott CB, Bruner DW, et al. Randomized trial of short- versus long-course radiotherapy for palliation of painful bone metastases. J Natl Cancer Inst 2005; 97 (11): 798–804.
41. Palma DA, Olson R, Harrow S, et al. Stereotactic ablative radiotherapy versus standard of care palliative treatment in patients with oligometastatic cancers (SABR-COMET): a randomised, phase 2, open-label trial. Lancet 2019; 393 (10185): 2051–8.
42. Nguyen Q-N, Chun SG, Chow E, et al. Single-fraction stereotactic vs conventional multifraction radiotherapy for pain relief in patients with predominantly nonspine bone metastases: a randomized phase 2 trial. JAMA Oncol 2019; 5 (6): 872.
43. Sborov K, Giaretta S, Koong A, et al. Impact of accuracy of survival predictions on quality of end-of-life care among patients with metastatic cancer who receive radiation therapy. JOP 2019; 15 (3): e262–70.
44. Oakden-Rayner L. The rebirth of CAD: how is modern AI different from the CAD we know? Radiol Artif Intell 2019; 1 (3): e180089.
45. Tversky A, Kahneman D. Judgment under uncertainty: heuristics and biases. Science 1974; 185 (4157): 1124–31.
46. Dawes R, Faust D, Meehl P. Clinical versus actuarial judgment. Science 1989; 243 (4899): 1668–74.
47. Wickens C, Hollands J, Banbury S, et al. Decision making. In: Wickens CD, Hollands JG, Banbury S, Parasuraman R, eds. Engineering Psychology and Human Performance. New York: Psychology Press; 2013: 245–83.
48. Poses RM, Anthony M. Availability, wishful thinking, and physicians’ diagnostic judgments for patients with suspected bacteremia. Med Decis Making 1991; 11 (3): 159–68.
49. Zech JR, Badgeley MA, Liu M, et al. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLoS Med 2018; 15 (11): e1002683.
50. Bernacki R, Paladino J, Neville BA, et al. Effect of the serious illness care program in outpatient oncology: a cluster randomized clinical trial. JAMA Intern Med 2019; 179 (6): 751.
51. Manz C, Parikh RB, Evans CN, et al. Effect of integrating machine learning mortality estimates with behavioral nudges to increase serious illness conversions among patients with cancer: a stepped-wedge cluster randomized trial. JCO 2020; 38 (Suppl 15): 12002.
