Key Points
Question
Can natural language processing be used to predict the survival of patients with cancer from their initial oncologist consultation document?
Findings
In this prognostic study of 47 625 patients with cancer, 6-, 36-, and 60-month survival were predicted using both traditional and neural language models with performance similar to or better than that found in prior work.
Meaning
These findings suggest it is feasible to predict survival of patients with cancer using a common cancer document without additional data and without training separate models for specific types of cancer.
This prognostic study develops and evaluates neural natural language processing models that predict the survival of patients with general cancer using their initial oncologist consultation.
Abstract
Importance
Predicting short- and long-term survival of patients with cancer may improve their care. Prior predictive models either use data with limited availability or predict the outcome of only 1 type of cancer.
Objective
To investigate whether natural language processing can predict survival of patients with general cancer from a patient’s initial oncologist consultation document.
Design, Setting, and Participants
This retrospective prognostic study used data from 47 625 of 59 800 patients who started cancer care at any of the 6 BC Cancer sites located in the province of British Columbia between April 1, 2011, and December 31, 2016. Mortality data were updated until April 6, 2022, and data were analyzed from that update until September 30, 2022. All patients with a medical or radiation oncologist consultation document generated within 180 days of diagnosis were included; patients seen for multiple cancers were excluded.
Exposures
Initial oncologist consultation documents were analyzed using traditional and neural language models.
Main Outcomes and Measures
The primary outcome was the performance of the predictive models, including balanced accuracy and receiver operating characteristics area under the curve (AUC). The secondary outcome was investigating what words the models used.
Results
Of the 47 625 patients in the sample, 25 428 (53.4%) were female and 22 197 (46.6%) were male, with a mean (SD) age of 64.9 (13.7) years. A total of 41 447 patients (87.0%) survived 6 months, 31 143 (65.4%) survived 36 months, and 27 880 (58.5%) survived 60 months, calculated from their initial oncologist consultation. The best models achieved a balanced accuracy of 0.856 (AUC, 0.928) for predicting 6-month survival, 0.842 (AUC, 0.918) for 36-month survival, and 0.837 (AUC, 0.918) for 60-month survival, on a holdout test set. Differences in what words were important for predicting 6- vs 60-month survival were found.
Conclusions and Relevance
These findings suggest that models performed comparably with or better than previous models predicting cancer survival and that they may be able to predict survival using readily available data without focusing on 1 cancer type.
Introduction
Cancer is a leading cause of death globally, with survival depending on variables including cancer site, tumor type, age, sex, and comorbidities.1 Accurately predicting an individual patient's survival could improve cancer care. For example, it might suggest earlier referral to palliative care resources or consideration of more aggressive therapies upfront. Traditionally, survival rates are calculated retrospectively and categorized by only a few factors, primarily cancer site and histology.1 Despite familiarity with these odds, oncologists can be inaccurate when predicting an individual patient's survival prospectively2; they have difficulty accounting for individual characteristics such as age.
Predictive models trained by machine learning may allow more personalized predictions by using many features of a patient’s particular characteristics and disease, and have been shown to outperform the prediction of treating oncologists.3 Some models developed to date utilize structured data, that is, data that are processed into specific features such as the presence of genetic markers, demographics, or specific aspects of clinical history.4,5,6,7,8 This may limit the widespread use of such models, as data availability varies among cancer treatment centers and between patients. It also limits what data can be used for a model, as not all clinical data are easily coded or categorized for extraction and analysis.9 The use of unstructured data, such as text within medical documents, may address these drawbacks. Almost all patients receiving treatment for cancer have an initial consultation document from their oncologist. Such documents generally have many details relevant to survival, for example, tobacco use or marital status, even if the clinic does not routinely store such data in structured data sets.
The use of machine learning with documents, known as natural language processing (NLP), has increasingly been applied throughout medicine.10,11,12 Many of the applications both in medicine generally and cancer specifically have utilized smaller, specific documents such as radiology or pathology reports.13,14 Some studies such as that by Liu et al15 have used patient encounter documents to predict the onset of 3 noncancer illnesses. They found that models using the unstructured text data in these documents outperformed using only structured data, and that adding structured data like demographics and laboratory data increased performance by only a marginal amount. Within cancer, recent work predicted survival in patients with lung cancer by extracting structured data from unstructured document data16 and used nonneural NLP on progress notes to update an individual’s prognosis.17
Neural networks are a type of machine learning modeled after the interconnectedness of neurons. When utilized in NLP, neural models can develop a more complex understanding of language, such as the presence of words with respect to each other, even when not adjacent.4 We were unable to find prior work using neural NLP methods to predict the survival of patients with general cancer using oncologist consultation documents, nor were such works identified in recent reviews of neural network applications in cancer,13 in general medical applications,11 or in a recent review of machine learning techniques used in cancer survival prediction.18
Our work sought to develop and evaluate neural NLP models that predict the survival of patients with general cancer using their initial oncologist consultation, without the use of structured data. By using this common document without structured data, we hoped to build models that would not be constrained by requiring the collection and processing of specific data. Similarly, we applied our models to the patient population with general cancer seen across a provincial cancer control system, as opposed to predicting survival for patients with 1 cancer location treated at a single center. This allowed us to investigate the performance of more generalizable models. We also did not use any feature extraction techniques for our neural methods, instead allowing the neural networks to use the text directly after only low-level preparation. We further sought to contribute to the field by comparing 2 common neural networks used in NLP with a recently developed neural network using transformers.19 We hypothesized that our models would achieve predictive performance at least in line with prior work, with accuracy, balanced accuracy (BAC), and receiver operating characteristics area under the curve (AUC) all above 0.800 when tested on a never-seen holdout set, and that the neural models would outperform the traditional nonneural bag-of-words (BoW) algorithm, with the transformer-based model performing best.
Methods
The University of British Columbia BC Cancer Research Ethics Board approved this prognostic study, including exemption from requiring informed consent from study participants due to this being infeasible. We report this study following the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guideline.
Data Source
Our study cohort was selected from the 59 800 patients who started cancer care at BC Cancer between April 1, 2011, and December 31, 2016. These patients sought care for malignant disease or for precancerous or nonmalignant pathological findings that required specialist cancer care. BC Cancer provides all radiation therapy in British Columbia, and more than 85% of medical oncologists in the province are affiliated with the organization. BC Cancer cares for patients with cancer at 6 centers located in geographically diverse settings throughout the province and oversees systemic therapy at most of the 53 smaller Community Oncology Network sites. Data were provided to us by BC Cancer, which obtained mortality data from BC Vital Statistics administrative data.
Data Selection and Preparation
We excluded patients who had more than 1 cancer diagnosis. We included all patients who had at least 1 valid document recorded as being a medical or radiation oncologist consultation generated within 180 days of diagnosis, irrespective of disease, and selected the document closest to the date of diagnosis. At BC Cancer, some patients may not see an oncologist for a few months after diagnosis, as they may first receive surgery elsewhere and then come to the organization for chemotherapy, radiation therapy, or other specialized treatments.
We applied some preprocessing to documents before they were used by our models, as outlined in eMethods in Supplement 1. When used with BoW models, words had their endings removed to become tokens. We calculated survival as the number of months between generation of the selected document and either the patient's recorded death date or April 6, 2022, when mortality data were last extracted from administrative data. We then produced binary labels for whether a patient survived 6, 36, or 60 months.
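The labeling step described above can be sketched as follows. This is an illustrative reconstruction, not the study's exact implementation: the 30.44-day mean month and the helper names are assumptions, and it relies on the study's minimum 60-month observation period so that patients alive at censoring are correctly labeled for every horizon.

```python
from datetime import date
from typing import Optional

CENSOR_DATE = date(2022, 4, 6)  # date mortality data were last extracted

def months_between(start: date, end: date) -> float:
    """Approximate months between two dates, using the mean month length."""
    return (end - start).days / 30.44

def survival_labels(document_date: date, death_date: Optional[date]) -> dict:
    """Binary labels for surviving 6, 36, and 60 months after the document date.

    Patients with no recorded death are treated as alive at the censor date;
    with at least 60 months of observation, all 3 labels are determinable.
    """
    end = death_date if death_date is not None else CENSOR_DATE
    survived = months_between(document_date, end)
    return {horizon: survived >= horizon for horizon in (6, 36, 60)}

# A hypothetical patient whose document was generated March 1, 2015,
# and who died January 15, 2016 (about 10.5 months later):
labels = survival_labels(date(2015, 3, 1), date(2016, 1, 15))
```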
Natural Language Models
Language models are used in NLP to assign probabilities to the order of words,20 such as determining whether one sequence of words is more likely than another. We can extend this understanding to other tasks, such as predicting the binary survival outcomes in this work. Herein, we compare 4 language models: the traditional nonneural method, BoW,21,22 and 3 models using neural networks, consisting of convolutional neural networks (CNN),23,24,25 long short-term memory (LSTM),26 and a more recent transformer model, bidirectional encoder representations from transformers (BERT).19 We show simplified diagrams of these models in the Figure. Neural networks allow models to have more complicated understandings of language, such as how nearby or even distant words can change each other’s meaning. We provide further description of these models, including hyperparameters, in the eMethods in Supplement 1. We implemented our models in the Python 3 programming language. We describe further details, including libraries used27,28,29,30 and techniques for dealing with imbalance between survivors and nonsurvivors, in the eMethods in Supplement 1. Our code and trained models are publicly available through a GitHub repository.
Figure. Simplified Diagrams of the Language Models Used in This Work.

A, The bag-of-words model counts word occurrences in a document, which is then used by a traditional machine learning algorithm. B, The convolutional neural network model understands a document in small adjacent clusters of words called convolutions (one is shown with black lines). The model can then learn to predict from combinations of these convolutions. C, The long short-term memory model updates the prediction by reading the document one word at a time. It has a memory cell that allows it to remember some prior context (dotted lines). D, The bidirectional encoder representations from transformers model can understand how each word is connected to all other words in the document but can only read small portions of text. One word’s possible connections are indicated by a black line.
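As a toy illustration of panel A, a bag-of-words representation reduces a document to counts of tokens; the crude suffix stripping below merely stands in for the study's actual stemming, which is described in the eMethods, and the suffix list is an invented simplification.

```python
import re
from collections import Counter

def stem(word: str) -> str:
    """Crude stand-in for stemming: strip a few common English endings."""
    for suffix in ("ation", "ative", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def bag_of_words(text: str) -> Counter:
    """Count stemmed tokens, ignoring case and punctuation."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(stem(t) for t in tokens)

# Both "palliative" and "palliation" collapse to one token,
# which is what lets a single feature capture related word forms.
counts = bag_of_words("Palliative care was discussed; palliation may help.")
```

A traditional classifier, such as the L2-regularized logistic regression used for the BoW models here, then predicts survival from these counts.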
Statistical Analysis
The primary outcome was model performance when predicting patients' survival of 6, 36, or 60 months. To avoid overfitting, in which a model performs well on its training data but not on new data,31 we randomly separated our data into training (70%), development (10%), and testing (20%) sets. We trained our neural models on the training data up to 100 times (epochs), stopping when performance evaluated on the development set did not increase for 5 epochs. We used the training and development sets to tune our model hyperparameters, then evaluated the model on the test set to ensure no tuning or development could overfit the test set. We assessed model performance by reporting the accuracy, BAC, AUC, F1 score, sensitivity, specificity, and other metrics of this test run, which we define in eTable 1 in Supplement 1.
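The headline metrics can be computed from test-set predictions as sketched below: BAC is the mean of sensitivity and specificity, and AUC is expressed here in its rank-based (Mann-Whitney) form. The toy labels and scores are invented for illustration.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 3 survivors (1) and 2 nonsurvivors (0).
y_true = [1, 1, 1, 0, 0]
bac = balanced_accuracy(y_true, [1, 1, 0, 0, 1])   # 0.5 * (2/3 + 1/2)
roc = auc(y_true, [0.9, 0.8, 0.4, 0.5, 0.2])       # 5 of 6 pairs ranked correctly
```

BAC is preferred over plain accuracy here because the classes are imbalanced (eg, 87.0% of patients survived 6 months), so always predicting survival would already score 0.870 accuracy but only 0.500 BAC.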
To better understand our BoW models, we measured a word’s importance by the absolute value of coefficient weights of the L2-regularized logistic regression model. For our neural models, we used the Captum interpretability library for PyTorch32 to implement integrated gradients.33 This attribution method allows us to visualize what words in a document positively or negatively contribute to a prediction. While interpretable, the visualization is specific to words in an individual document; a word’s or phrase’s shown importance may be different in the context of a different document. To protect privacy, we anonymized the shown text by changing dates, names, and other identifying components, ensuring this did not change the interpretation.
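Ranking BoW features by the absolute value of their coefficients, as described above, can be sketched as follows; the tokens and weights are hypothetical values chosen for illustration, not the study's fitted coefficients.

```python
def top_tokens(coefficients: dict, k: int = 3):
    """Rank tokens by |coefficient|; the sign gives the direction of the effect."""
    ranked = sorted(coefficients.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [(token, "Positive" if weight > 0 else "Negative")
            for token, weight in ranked[:k]]

# Hypothetical weights from an L2-regularized logistic regression on token counts,
# where the positive class is survival.
weights = {"palliat": -1.9, "risk": 0.8, "hospit": -0.6, "no": 0.5}
top = top_tokens(weights)
```

Note the caveat from the text applies to any such ranking: correlated features share weight under L2 regularization, so a token's coefficient reflects its role alongside the other features, not an isolated effect.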
Results
Patient and Document Selection
Of the 59 800 patients from BC Cancer, we excluded 2784 who were recorded as starting cancer care multiple times. Of the remaining 57 016 patients, 9391 did not have a consultation from a medical or radiation oncologist within 180 days following their cancer diagnosis. This left 47 625 patients fulfilling our inclusion and exclusion criteria; all had updated mortality data. This cohort consisted of 25 428 female patients (53.4%) and 22 197 male patients (46.6%) with a mean (SD) age of 64.9 (13.7) years (Table 1). We observed patients surviving a mean (SD) of 61.7 (40.3) months after diagnosis, and 59.9 (39.9) months after their initial oncologist consultation; these numbers are limited by the observation period, which was a minimum of 5 years. For our prediction targets, 41 447 (87.0%) survived 6 months, 31 143 (65.4%) survived 36 months, and 27 880 (58.5%) survived 60 months, calculated from the initial oncologist consultation.
Table 1. Characteristics of Patients in the Final Data Set^a
| Characteristic | Data (N = 47 625) |
|---|---|
| Sex | |
| Female | 25 428 (53.4) |
| Male | 22 197 (46.6) |
| Stage | |
| I | 6505 (13.7) |
| II | 8817 (18.5) |
| III | 6227 (13.1) |
| IV | 6287 (13.2) |
| Unknown | 19 789 (41.6) |
| Age at diagnosis, mean (SD), y | 64.9 (13.7) |
| Observed time survived, mean (SD), mo | |
| Since diagnosis^b | 61.7 (40.3) |
| Since document^b,c | 59.9 (39.9) |
| Time survived, mean (SD), mo | |
| Since diagnosis of those who died | 27.0 (26.9) |
| Since document of those who died^c | 25.6 (26.6) |
| Between diagnosis and document generation^c | 1.34 (1.26) |
| Survived, mo | |
| 6 | 41 447 (87.0) |
| 36 | 31 143 (65.4) |
| 60 | 27 880 (58.5) |
^a Unless otherwise indicated, data are expressed as No. (%) of patients.
^b Indicates the number of months patients survived during the study's observation period, which was at least 60 months.
^c Indicates the number of months survived since the initial oncologist consultation document used in this study was generated.
Model Performance
In Table 2, we show how well our different NLP models predict whether patients will survive 5 years (60 months) after their initial oncologist consultation when evaluated on a holdout test set. We see numerically similar performance between BoW, CNN, and LSTM, with BAC above 0.800 and AUC above 0.900. BERT performed worse across all metrics.
Table 2. Model Performance for Predicting 60-Month Survival After Patients’ Initial Oncologist Consultation Document Was Generated.
| Model | Accuracy | BAC | AUC | F1 | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| BoW | 0.823 | 0.835 | 0.915 | 0.856 | 0.798 | 0.871 |
| CNN | 0.828 | 0.837 | 0.918 | 0.862 | 0.809 | 0.866 |
| LSTM | 0.826 | 0.834 | 0.914 | 0.860 | 0.810 | 0.859 |
| BERT | 0.804 | 0.813 | 0.894 | 0.842 | 0.787 | 0.838 |
Abbreviations: AUC, receiver operating characteristics area under the curve; BAC, balanced accuracy; BERT, bidirectional encoder representations from transformers; BoW, bag-of-words; CNN, convolutional neural network; LSTM, long short-term memory.
Performance by Survival Length
Table 3 compares the performance of our BoW model with one of our best neural models, CNN, when predicting the different survival lengths. On a holdout test set, BoW performed best for predicting 6-month survival, achieving a BAC of 0.856 (AUC, 0.928); CNN performed best for predicting 36-month survival, with a BAC of 0.842 (AUC, 0.918), and 60-month survival, with a BAC of 0.837 (AUC, 0.918). The 2 models performed similarly overall. Additional metrics, and the performance of all models when predicting all durations, are described in eTable 2 in Supplement 1.
Table 3. Model Performance for Predicting the Given Numbers of Months After Patients’ Initial Oncologist Consultation Document Was Generated.
| Model^a | Survival, mo | Accuracy | BAC | AUC | F1 | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|
| BoW | 6 | 0.849 | 0.856 | 0.928 | 0.907 | 0.847 | 0.866 |
| CNN | 6 | 0.843 | 0.853 | 0.926 | 0.903 | 0.839 | 0.867 |
| BoW | 36 | 0.837 | 0.837 | 0.915 | 0.872 | 0.836 | 0.838 |
| CNN | 36 | 0.844 | 0.842 | 0.918 | 0.878 | 0.848 | 0.836 |
| BoW | 60 | 0.823 | 0.835 | 0.915 | 0.856 | 0.798 | 0.871 |
| CNN | 60 | 0.828 | 0.837 | 0.918 | 0.862 | 0.809 | 0.866 |
Abbreviations: AUC, receiver operating characteristics area under the curve; BAC, balanced accuracy; BoW, bag-of-words; CNN, convolutional neural networks.
^a Comparison of the traditional model, BoW, with our best-performing neural model, CNN.
Interpretation
When examining the importance of tokens (words with endings removed), we see that some of the same tokens rank in the top 10 for the BoW models predicting both 6- and 60-month survival (Table 4), along with some differences. For example, the token palliat, the stem of both palliative and palliation, is the most important feature for both durations. Different cancer locations or types matter for the 2 durations: breast and prostate are positive predictors for 6-month survival, while liver, glioblastoma, and lung are negative predictors for 60-month survival. The token N0 is a top 10 positive predictor for 60-month survival, although no other TNM classification tokens appear among the top 10 features for either survival duration.
Table 4. Top 10 Tokens Used by Bag-of-Words Models for 6- and 60-Month Survival Prediction^a
| Feature importance rank | 6-mo token | 6-mo coefficient direction | 60-mo token | 60-mo coefficient direction |
|---|---|---|---|---|
| 1 | Palliat | Negative | Palliat | Negative |
| 2 | Poor | Negative | Metastat | Negative |
| 3 | Risk | Positive | Risk | Positive |
| 4 | Hospit | Negative | Liver | Negative |
| 5 | No | Positive | No | Positive |
| 6 | Unfortun | Negative | Glioblastoma | Negative |
| 7 | Prostat | Positive | Poor | Negative |
| 8 | Breast | Positive | Lung | Negative |
| 9 | Metastat | Negative | N0^b | Positive |
| 10 | Stage | Positive | Whitehors | Positive |
^a Feature importance was calculated using the absolute value of coefficient weights in these L2-regularized logistic regression models. Tokens are words in which the word endings have been removed for processing.
^b N0 refers to a patient without disease in lymph nodes in TNM cancer staging.
eFigures 1 and 2 in Supplement 1 show the importance of words in a patient’s document for our CNN models, which correctly predict that this patient survives 6 but not 60 months. We again see similarities and differences between the models developed to predict surviving both lengths. For example, the patient’s age had negative importance when predicting whether they would survive 60 months, but not 6 months. Positive aspects of the surgery and pathology, such as having clear margins and no lymphatic or vascular involvement, predicted positively in both models, but more so for the 6-month survival prediction. Medical comorbidities such as congestive heart failure and Barrett esophagus were negative predictors for 60-month survival, but less so in predicting 6-month survival.
Discussion
In this study, we trained and evaluated NLP models using both traditional and neural models to predict whether a patient with cancer will survive 6, 36, and 60 months using only their initial oncologist consultation document and no other data. The performance of our models, when evaluated on a never-before-seen internal holdout set, achieved accuracy, BAC, and AUC over 0.800, with our best models achieving AUC over 0.900. This performance was similar to or better than prior work seeking to predict survival of patients with cancer,5,8,16,34,35,36 despite using data that were more generalizable and readily available. Prior studies have predicted cancer survival for specific tumor sites such as breast or lung, or with the use of structured data such as processed clinical and genetic characteristics.
Our results suggest it is possible to predict the survival of patients with cancer without having to construct structured data sets, or limiting the predictions to specific types or locations of cancer. The availability of structured data may vary. Given the widespread availability of initial oncologist consultation documents, this opens up the possibility of more easily training and using such models across cancer types at different cancer centers.
To help our models be generalizable, we did minimal text processing. We did not extract features from the text to be used by our models, instead providing text directly. The documents in our data set were generated by oncologists practicing at 6 centers located in geographically varied regions without set templates or formatting requirements. Future work will be needed to externally validate our models on documents generated in other jurisdictions.
The performance of our CNN, LSTM, and BoW models was numerically similar. BERT’s inferior performance, despite its more deeply connected network, may be due to its limitation of only being able to use the first 512 tokens. The lack of clear performance gain with neural models compared with the traditional BoW model is surprising, but in line with some prior work.37,38 This may suggest survival prediction is largely dependent on the presence of words, compared with understanding how words relate to each other in a document.
We interpreted our neural models using integrated gradients and our BoW models using coefficient values. Both methods have their limitations. Feature correlation impacts L2-regularized coefficients, while integrated gradients show word importance based on their specific context. However, our results generally supported that our models used generalizable, interpretable portions of text to make their predictions, consistent with known mortality risk and protective factors.39,40 For example, palliative care is often accessed in the last months of a patient’s life, so it is unsurprising that an oncologist mentioning palliative care would support that a patient may not survive 6 months. The 60-month BoW model included N0, that a patient has no known spread to lymph nodes, but not the 6-month BoW model. Nodal status could be expected to be less relevant in the shorter time. We found that the neural model weighed clinically relevant data such as age, positive outcomes from a surgery, and comorbidities differently when predicting 6- or 60-month survival.
There are multiple avenues for further development to build on our results. While some related work found adding structured data led to only a small, likely not clinically significant improvement in performance,15 it may still be worthwhile investigating the addition of structured data. Future work could improve on our results by using different models or configurations, including those both more and less complex. Hardware advancement will allow more complex versions of our models, and there are new transformer models such as Longformer41 and BigBird42 that can use more text. Neural models can especially benefit from training on very large amounts of data. Future work could use our methodology to train models using widely available oncologist consultation documents from multiple jurisdictions or fine-tune our models by further training them with a relatively small number of a different jurisdiction’s documents. Future research could also adapt our models to update survival prediction after the initial consultation, such as by using progress notes. Further performance improvements and external validation may allow clinicians to use this methodology for improving care, such as better facilitating palliative resources with short-term survival predictions or using long-term predictions to consider more intense initial treatment.
Limitations
This study has some limitations. We took steps to ensure the validity of our models, including the use of a holdout test set, and eliminating location-specific text that was automatically added to documents, but we did not externally validate our models. Our work calculated survival from the first oncologist consultation document, as opposed to from diagnosis, as is common. This was done because the time from diagnosis to consultation can vary from a few weeks to a few months at BC Cancer if patients first received treatment such as surgery from outside clinicians, or initial workup from family physicians. We also note our data set consisted of patients first seen between 2011 and 2016, so we could establish 5-year survival. Given the rapid advancement of cancer treatment, accuracy may be limited for patients starting treatment presently. Finally, we did not compare our models’ performance with that of oncologists, as has been done in previous work.
Conclusions
In this prognostic study of 47 625 patients with cancer, we trained and evaluated traditional and neural models to predict whether patients will survive 6, 36, or 60 months based on their initial oncologist consultation. We evaluated our model on an internal holdout set of data, and found our best models achieved accuracy and BAC above 0.800 and AUC above 0.900. This performance was comparable with or superior to that of prior work, which has predicted survival only for specific types of cancer or used data that are more difficult to obtain.
Our findings suggest that a clinically useful survival prediction model for patients with general cancer may be possible without needing specific models for different cancer types and by using data that are readily accessible without complex data processing or data mining. These results still require external validation and have multiple avenues for further improvement; regardless, they suggest that this methodology may one day assist in care by helping patients with cancer and their treatment team by providing an individualized expectation of survival.
eMethods. Obtaining Data, Text Processing, Language Models Used, Hardware Used, and Implementation
eTable 1. Definition of Evaluation Metrics Reported in This Work
eTable 2. Performance of All Models When Predicting Surviving the Given Number of Months, With Extended Metrics
eFigure 1. Visualizing Word Importance of CNN Models Used to Predict 6- and 60-Month Survival
eFigure 2. Visualizing Word Importance of CNN Models Used to Predict 6- and 60-Month Survival, Adapted for Color Blindness
eReferences
Data Sharing Statement
References
1. National Cancer Institute. SEER cancer statistics review (CSR) 1975–2016. Updated April 9, 2020. Accessed August 26, 2022. https://seer.cancer.gov/archive/csr/1975_2016/
2. Benson KRK, Aggarwal S, Carter JN, et al. Predicting survival for patients with metastatic disease. Int J Radiat Oncol Biol Phys. 2020;106(1):52-60. doi:10.1016/j.ijrobp.2019.10.032
3. Gensheimer MF, Aggarwal S, Benson KRK, et al. Automated model versus treating physician for predicting survival time of patients with metastatic cancer. J Am Med Inform Assoc. 2021;28(6):1108-1116. doi:10.1093/jamia/ocaa290
4. Zhu W, Xie L, Han J, Guo X. The application of deep learning in cancer prognosis prediction. Cancers (Basel). 2020;12(3):603. doi:10.3390/cancers12030603
5. Akcay M, Etiz D, Celik O. Prediction of survival and recurrence patterns by machine learning in gastric cancer cases undergoing radiation therapy and chemotherapy. Adv Radiat Oncol. 2020;5(6):1179-1187. doi:10.1016/j.adro.2020.07.007
6. Deng F, Zhou H, Lin Y, et al. Predict multicategory causes of death in lung cancer patients using clinicopathologic factors. Comput Biol Med. 2021;129:104161. doi:10.1016/j.compbiomed.2020.104161
7. Ferroni P, Zanzotto FM, Riondino S, Scarpato N, Guadagni F, Roselli M. Breast cancer prognosis using a machine learning approach. Cancers (Basel). 2019;11(3):328. doi:10.3390/cancers11030328
8. Kaur I, Doja MN, Ahmad T, et al. An integrated approach for cancer survival prediction using data mining techniques. Comput Intell Neurosci. 2021;2021:6342226. doi:10.1155/2021/6342226
9. Krauze A, Camphausen K. Natural language processing—finding the missing link for oncologic data, 2022. Int J Bioinforma Intell Comput. 2022;1(1):22-42.
10. Barber EL, Garg R, Persenaire C, Simon M. Natural language processing with machine learning to predict outcomes after ovarian cancer surgery. Gynecol Oncol. 2021;160(1):182-186. doi:10.1016/j.ygyno.2020.10.004
11. Wu S, Roberts K, Datta S, et al. Deep learning in clinical natural language processing: a methodical review. J Am Med Inform Assoc. 2020;27(3):457-470. doi:10.1093/jamia/ocz200
12. Kehl KL, Xu W, Lepisto E, et al. Natural language processing to ascertain cancer outcomes from medical oncologist notes. JCO Clin Cancer Inform. 2020;4(4):680-690. doi:10.1200/CCI.20.00020
13. AbuSamra AA, Al-Madhoun AMR. Applying deep learning and natural language processing in cancer: a survey. Abstract presented at: 2021 Palestinian International Conference on Information and Communication Technology (PICICT); September 28-29, 2021; Gaza, State of Palestine.
14. Li J, Zhou Z, Dong J, et al. Predicting breast cancer 5-year survival using machine learning: a systematic review. PLoS One. 2021;16(4):e0250370. doi:10.1371/journal.pone.0250370
15. Liu J, Zhang Z, Razavian N. Deep EHR: chronic disease prediction using medical notes. In: Proceedings of the 3rd Machine Learning for Healthcare Conference. Proceedings of Machine Learning Research; 2018:440-464.
16. Yuan Q, Cai T, Hong C, et al. Performance of a machine learning algorithm using electronic health record data to identify and estimate survival in a longitudinal cohort of patients with lung cancer. JAMA Netw Open. 2021;4(7):e2114723. doi:10.1001/jamanetworkopen.2021.14723
17. Morin O, Vallières M, Braunstein S, et al. An artificial intelligence framework integrating longitudinal electronic health records with real-world data enables continuous pan-cancer prognostication. Nat Cancer. 2021;2(7):709-722. doi:10.1038/s43018-021-00236-2
18. Deepa P, Gunavathi C. A systematic review on machine learning and deep learning techniques in cancer survival prediction. Prog Biophys Mol Biol. 2022;174:62-71. doi:10.1016/j.pbiomolbio.2022.07.004
19. Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv. Preprint posted online May 24, 2019. doi:10.48550/arXiv.1810.04805
20. Jurafsky D, Martin JH. Speech and Language Processing. 2nd ed. Pearson Education; 2018.
21. Zhang A, Lipton ZC, Li M, Smola AJ. Dive into deep learning. arXiv. Preprint posted online June 21, 2021. doi:10.48550/arXiv.2106.11342
22. Manning C, Raghavan P, Schuetze H. Introduction to Information Retrieval. Cambridge University Press; 2009.
23. Kim Y. Convolutional neural networks for sentence classification. In: Moschitti A, Pang B, Daelemans W, eds. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics; 2014:1746-1751.
24. Rios A, Kavuluru R. Convolutional neural networks for biomedical text classification: application in indexing biomedical articles. In: Ritz A, An L, eds. Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics. Association for Computing Machinery; 2015:258-267.
25. Rios A, Kavuluru R. Ordinal convolutional neural networks for predicting RDoC positive valence psychiatric symptom severity scores. J Biomed Inform. 2017;75S(suppl):S85-S93. doi:10.1016/j.jbi.2017.05.008
- 26.Adhikari A, Ram A, Tang R, Lin J. Rethinking complex neural network architectures for document classification. In: Burstein J, Doran C, Solorio T, eds. Proceedings of the 2019 Conference of the North American Association for Computational Linguistics. Association for Computational Linguistics; 2019:4046-4051. [Google Scholar]
- 27.Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12(Oct):2825-2830. doi: 10.5555/1953048.2078195 [DOI] [Google Scholar]
- 28.Paszke A, Gross S, Massa F, et al. PyTorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems. Vol 32. Curran Associates Inc; 2019. Accessed February 19, 2022. https://papers.nips.cc/paper/2019/hash/bdbca288fee7f92f2bfa9f7012727740-Abstract.html
- 29.Falcon W. The PyTorch Lightning. Version 1.4. March 30, 2019. Accessed January 15, 2022. https://www.pytorchlightning.ai
- 30.McKinney W. pandas: a Foundational python Library for data analysis and statistics. Python for High Performance Science Computing. 2011;14(9):1-9. Accessed January 14, 2022. https://www.dlr.de/sc/portaldata/15/resources/dokumente/pyhpc2011/submissions/pyhpc2011_submission_9.pdf
- 31.Shalev-Shwartz S, Ben-David S. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press; 2014. doi: 10.1017/CBO9781107298019 [DOI] [Google Scholar]
- 32.Kokhlikyan N, Miglani V, Martin M, et al. Captum: a unified and generic model interpretability library for PyTorch. arXiv. Preprint posted online September 16, 2020. doi: 10.48550/arXiv.2009.07896 [DOI]
- 33.Sundararajan M, Taly A, Yan Q. Axiomatic attribution for deep networks. arXiv. Preprint posted online June 12, 2017. doi: 10.48550/arXiv.1703.01365 [DOI]
- 34.Lu F, Chen Z, Yuan X, et al. MMHG: Multi-modal hypergraph learning for overall survival after d2 gastrectomy for gastric cancer. In: 2017 IEEE 15th International Conference on Dependable, Autonomic and Secure Computing, 15th International Conference on Pervasive Intelligence and Computing, 3rd International Conference on Big Data Intelligence and Computing and Cyber Science and Technology Congress. IEEE; 2017:164-169. [Google Scholar]
- 35.Arya N, Saha S. Multi-modal advanced deep learning architectures for breast cancer survival prediction. Knowl Based Syst. 2021;221:106965. doi: 10.1016/j.knosys.2021.106965 [DOI] [Google Scholar]
- 36.Doppalapudi S, Qiu RG, Badr Y. Lung cancer survival period prediction and understanding: deep learning approaches. Int J Med Inform. 2021;148:104371. doi: 10.1016/j.ijmedinf.2020.104371 [DOI] [PubMed] [Google Scholar]
- 37.Zech J, Pain M, Titano J, et al. Natural language-based machine learning models for the annotation of clinical radiology reports. Radiology. 2018;287(2):570-580. doi: 10.1148/radiol.2018171093 [DOI] [PubMed] [Google Scholar]
- 38.Ong CJ, Orfanoudaki A, Zhang R, et al. Machine learning and natural language processing methods to identify ischemic stroke, acuity and location from radiology reports. PLoS One. 2020;15(6):e0234908. doi: 10.1371/journal.pone.0234908 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Chok KSH, Law WL. Prognostic factors affecting survival and recurrence of patients with pT1 and pT2 colorectal cancer. World J Surg. 2007;31(7):1485-1490. doi: 10.1007/s00268-007-9089-0 [DOI] [PubMed] [Google Scholar]
- 40.Clemons M, Danson S, Hamilton T, Goss P. Locoregionally recurrent breast cancer: incidence, risk factors and survival. Cancer Treat Rev. 2001;27(2):67-82. doi: 10.1053/ctrv.2000.0204 [DOI] [PubMed] [Google Scholar]
- 41.Beltagy I, Peters ME, Cohan A. Longformer: the long-document transformer. arXiv. Preprint posted online December 2, 2020. doi: 10.48550/arXiv.2004.05150 [DOI]
- 42.Zaheer M, Guruganesh G, Dubey A, et al. Big Bird: transformers for longer sequences. arXiv. Preprint posted online January 8, 2021. doi: 10.48550/arXiv.2007.14062 [DOI]
Associated Data
Supplementary Materials
eMethods. Obtaining Data, Text Processing, Language Models Used, Hardware Used, and Implementation
eTable 1. Definition of Evaluation Metrics Reported in This Work
eTable 2. Performance of All Models When Predicting Surviving the Given Number of Months, With Extended Metrics
eFigure 1. Visualizing Word Importance of CNN Models Used to Predict 6- and 60-Month Survival
eFigure 2. Visualizing Word Importance of CNN Models Used to Predict 6- and 60-Month Survival, Adapted for Color Blindness
eReferences
Data Sharing Statement
