Author manuscript; available in PMC: 2024 May 1.
Published in final edited form as: J Geriatr Oncol. 2023 Apr 19;14(4):101498. doi: 10.1016/j.jgo.2023.101498

Supervised Learning Applied to Classifying Fallers versus Non-Fallers Among Older Adults with Cancer

Erika Ramsdale 1, Madhav Kunduru 2, Lisa Smith 1, Eva Culakova 1, Junchao Shen 3, Sixu Meng 4, Martin Zand 5, Ajay Anand 2
PMCID: PMC10174263  NIHMSID: NIHMS1895647  PMID: 37084629

Abstract

Introduction:

Supervised machine learning approaches are increasingly used to analyze clinical data, including in geriatric oncology. This study presents a machine learning approach to understand falls in a cohort of older adults with advanced cancer starting chemotherapy, including fall prediction and identification of contributing factors.

Materials and Methods:

This secondary analysis of prospectively collected data from the GAP 70+ Trial (NCT02054741; PI: Mohile) enrolled patients aged ≥70 with advanced cancer and ≥1 geriatric assessment domain impairment who planned to start a new cancer treatment regimen. Of ≥2,000 baseline variables (“features”) collected, 73 were selected based on clinical judgment. Machine learning models to predict falls at three months were developed, optimized, and tested using data from 522 patients. A custom data preprocessing pipeline was implemented to prepare data for analysis. Both undersampling and oversampling techniques were applied to balance the outcome measure. Ensemble feature selection was applied to identify and select the most relevant features. Four models (logistic regression [LR], k-nearest neighbor [kNN], random forest [RF], and MultiLayer Perceptron [MLP]) were trained and subsequently tested on a holdout set. Receiver operating characteristic (ROC) curves were generated and area under the curve (AUC) was calculated for each model. SHapley Additive exPlanations (SHAP) values were utilized to further understand individual feature contributions to observed predictions.

Results:

Based on the ensemble feature selection algorithm, the top eight features were selected for inclusion in the final models. Selected features aligned with clinical intuition and prior literature. The LR, kNN, and RF models performed equivalently well in predicting falls in the test set, with AUC values 0.66–0.67, and the MLP model showed AUC 0.75. Ensemble feature selection resulted in improved AUC values compared to using LASSO alone. SHAP values, a model-agnostic technique, revealed logical associations between selected features and model predictions.

Discussion:

Machine learning techniques can augment hypothesis-driven research, including in older adults for whom randomized trial data are limited. Interpretable machine learning is particularly important, as understanding which features impact predictions is a critical aspect of decision-making and intervention. Clinicians should understand the philosophy, strengths, and limitations of a machine learning approach applied to patient data.

INTRODUCTION

Increased familiarity with machine learning methods could enhance communication between clinicians, researchers, and data scientists, as well as enable readers to more critically evaluate research employing machine learning methods. The intent of this paper is to introduce and illustrate supervised learning methods to readers who may be unfamiliar with them, grounded in a topic (falls) which is familiar and interesting to geriatric oncology clinicians and researchers. Examining associations between variables and outcomes is a common analytic task for the clinician or clinical researcher. These analyses fall under the rubric of “supervised learning” problems. Regression modeling, commonly applied to investigate a wide variety of clinical questions, is an example of supervised learning. A defining characteristic of supervised machine learning is that the datasets are “labeled” – the raw data inputs to a machine learning algorithm are each tagged with an informative attribute that permits the machine to learn inference rules for classification or prediction.1 For example, labelling a patient’s CT scan to specify whether or not the patient has lung cancer can prepare (or “train”) a machine learning algorithm to independently diagnose lung cancer from unlabeled, previously unseen CT images. Similarly, discretely annotated notes in the patient’s chart indicating the presence or absence of geriatric syndromes in the patient can train a natural language processing algorithm to detect these diagnoses in unstructured clinician notes. Perhaps most commonly, a set of variables representing a patient (independent variables or features) is “labeled” with a particular outcome of interest, such as survival time, hospitalization, or treatment response rate (dependent variable). These can be input into machine learning models (“classification” models) with the goal of optimizing classification of patients or prediction of outcomes.

As for any statistical modeling, developing a supervised machine learning model involves multiple steps1 which can be overlapping and iterative rather than distinct and linear (Figure 1). Data scientists and statisticians with expertise in machine learning approach these problems similarly, starting with identification of the dataset and the key question (including specification of the “label” or outcome of interest). Data evaluation and cleaning are typically the most time-intensive activities within the process, requiring significant content expertise and scrupulousness as well as, in most cases, custom code snippets or manual adjustments unique to the dataset being analyzed. Transformation involves the preparations necessary to introduce the data to machine learning algorithms – this typically includes transforming all non-numeric data to numerically encoded vectors or tensors. “Pre-processing” pipelines are available2 or can be relatively easily developed to automate data transformation activities, requiring minimal user input to specify preferred methods. Exploratory data analysis (EDA) is the typical first analytic approach to the data; philosophically, this involves approaching the data impartially, using simple statistical methods and visualizations to reveal patterns and confirm that model assumptions will be met. Feature (variable) selection and model development and testing can happen concurrently, consecutively, or iteratively, depending upon the methods selected. Finally, selection of the appropriate data visualization(s) requires its own domain knowledge of techniques as well as a deep understanding of the audience for the results. Development of machine learning models therefore requires simultaneous expertise in statistics, computer science and programming, and domain knowledge (e.g., geriatrics, oncology) specific to the analytic question.

Figure 1. Data Science lifecycle with summary of approaches applied to this analysis. SMOTE = Synthetic Minority Over-sampling TEchnique; ROC = Receiver Operating Characteristic; SHAP = SHapley Additive exPlanations.

The example in this paper is taken from the domain of geriatric oncology. Falls are the leading cause of fatal and nonfatal injuries in older adults,3 and these injuries (for example, hip fracture) can severely impact physical function and independence.4 Falls are strongly associated with frailty in these patients,5 which in turn impacts survival, hospitalizations, quality of life, and other outcomes. Older adults with cancer have a higher rate of falls than those without a history of cancer;6 in these patients, falls may also result from or impact receipt of cancer treatment.7 Developing a model to predict whether a patient has increased risk of falling is a supervised learning task.

This secondary analysis describes the development of supervised learning models to predict likelihood of falling within three months of cancer treatment initiation, using data from a cohort (n=718) of vulnerable older adults with advanced cancer recruited to a national prospective cluster-randomized trial of geriatric assessment (GA), conducted in community-based (“real world”) oncology practices. Several considerations and methods applicable to supervised learning/classification problems are described in detail. An earlier paper served as a general introduction to machine learning for clinicians, and readers are strongly encouraged to read that manuscript first.1

MATERIALS AND METHODS

Dataset

The patient dataset (n=718) for this secondary analysis consists of baseline and three-month timepoint measures from a nationwide, multicenter, cluster-randomized study assessing whether providing information regarding GA to community oncologists reduced clinician-rated grade 3–5 chemotherapy toxicity (Geriatric Assessment for Patients [GAP70+] study; PI: Mohile; ClinicalTrials.gov identifier: NCT02054741)8 in older patients with advanced cancer starting a new cancer treatment regimen. The primary study was conducted by the University of Rochester Cancer Center (URCC) NCI Community Oncology Research Program (NCORP) Research Base and approved by the Institutional Review Boards at participating sites. All participants provided written informed consent. Eligible patients were (1) aged ≥70 years, (2) diagnosed with incurable stage III/IV solid tumor or lymphoma, (3) impaired in at least one GA domain other than polypharmacy, and (4) planning to start a new cancer treatment regimen with a high risk of adverse effects.

Each patient in the dataset was described by more than 2,000 features, encoding granular information about demographics, cancer and treatment variables, patient-reported symptoms, and geriatric assessment measures (including falls). Data collection and validation procedures have been described previously.9 Of the available features, 63 were selected based on clinician judgment, including key demographic features (age, race/ethnicity, and marital status); cancer type, stage, and treatment type; summary scores for geriatric assessment measures; and falls information. These data were a mix of categorical (string and ordinal numeric) and quantitative (continuous numeric) values. The outcome feature was whether or not the patient had a self-reported fall within the first three months of cancer treatment (“Have you had a fall since the last assessment?” asked at the prespecified data collection timepoints of four to six weeks and three months, coded as “yes” if the patient answered yes at either timepoint and “no” if the patient answered no at both). This outcome was selected to illustrate classification methods because it is clinically relevant, dichotomous, and based on a single measure, which simplifies the methodologic illustration.
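As a minimal sketch of how such a binary outcome could be derived (assuming the two follow-up fall questions are available as pandas columns; the column names `fall_4to6wk` and `fall_3mo` are hypothetical, not the actual GAP 70+ variable names):

```python
import numpy as np
import pandas as pd

def derive_fall_outcome(df: pd.DataFrame) -> pd.Series:
    """Combine the two follow-up fall questions into one binary outcome.

    "Yes" at either timepoint -> 1 (faller); "No" at both -> 0 (non-faller);
    anything else (e.g., both responses missing) -> NaN, handled later as a
    missing outcome value.
    """
    any_yes = (df["fall_4to6wk"] == "yes") | (df["fall_3mo"] == "yes")
    both_no = (df["fall_4to6wk"] == "no") & (df["fall_3mo"] == "no")
    return pd.Series(
        np.where(any_yes, 1.0, np.where(both_no, 0.0, np.nan)),
        index=df.index,
        name="fall_3mo_outcome",
    )
```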

Data preprocessing

Data missingness.

One patient’s data were discarded due to missing most feature values, leaving 717 patients. Within the remaining dataset, 0.9% (n=854) of feature (independent variable) values and 27.1% (n=195) of outcome values (i.e., falls data) were missing. When deciding how to handle missing values, it is important to follow statistical principles related to missing data.10 To evaluate whether outcome values were missing completely at random (MCAR), a dummy variable for three-month falls was created (1=present, 0=missing), and Pearson correlation coefficients between features and the dummy variable were calculated using bootstrapping. Using principal components analysis and the first two principal components, t-Stochastic Neighbor Embedding (t-SNE) plots were generated, labeled by the three-month falls variable. Missing values did not appear to be MCAR, so rows with a missing outcome variable were eliminated, leaving 522 patients for analysis. It is important to note that these rows may represent patients who “dropped out” of the study early and who may therefore be the most vulnerable, potentially biasing model predictions. Because the small percentage of missing feature values suggested little if any impact on subsequent results, these values were imputed and no further rows were eliminated.
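The missingness check described above might look roughly like the following sketch (not the authors' code; `X` is assumed to be a numeric feature DataFrame and `y` the outcome Series with NaN wherever the fall outcome is missing):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Indicator for outcome missingness: 1 = three-month falls outcome present, 0 = missing
present_flag = (~y.isna()).astype(int)

# Bootstrapped Pearson correlations between each feature and the missingness indicator
rng = np.random.default_rng(0)
boot = []
for _ in range(1000):
    idx = rng.choice(len(X), size=len(X), replace=True)
    Xb = X.iloc[idx].reset_index(drop=True)
    fb = present_flag.iloc[idx].reset_index(drop=True)
    boot.append(Xb.corrwith(fb))
corr_summary = pd.concat(boot, axis=1).T.describe()  # inspect mean correlations and spread

# t-SNE on the first two principal components, to be plotted colored by present_flag
pcs = PCA(n_components=2).fit_transform(X.fillna(X.median()))
embedding = TSNE(n_components=2, random_state=0).fit_transform(pcs)
```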

Train/test split.

Before any further analysis, 20% of the dataset was set aside as the test/validation set, with the remaining 80% as the training set. Using a custom Python class object, rows were randomly selected for the test/validation set. For other datasets with imputed outcome values, the object assigns all rows with imputed values to the training set, since test data should consist of real (not synthetic) data; this function was not needed here, as outcome variables were not imputed. The test/validation set was not utilized until the final model testing step.
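A simple stand-in for the custom split, using scikit-learn rather than the authors' class object (the `stratify` argument, which preserves the faller/non-faller ratio in both splits, is an optional choice rather than something stated in the text):

```python
from sklearn.model_selection import train_test_split

# 80/20 random split; the test/validation set is held out until final model testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y
)
```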

Preprocessing pipeline.

Use of a preprocessing pipeline automates calculations and keeps preprocessing of the training and test/validation datasets separate to prevent data leakage. For example, preprocessing the entire dataset up-front would allow the training set to “learn” information from the test set during the scaling and imputation steps, which could promote overfitting and poor performance on completely unseen data. To that end, a custom Python preprocessing package was developed, composed of class objects that sequentially performed data encoding, scaling/normalization, and imputation, allowing users multiple choices at each step and resulting in transformed data ready for input into classification models. Analysis of all combinations of preprocessing steps (n=24) showed no significant differences in unoptimized model accuracies for naïve Bayes, random forest, or logistic regression models. The final pipeline selections were one-hot encoding, MinMax scaling, and median imputation.
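A scikit-learn sketch of such a pipeline, assuming the final selections above (the categorical imputation strategy shown, most-frequent, is an assumption, since median imputation applies only to numeric columns; `sparse_output` requires scikit-learn ≥ 1.2):

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Column lists would be derived from the actual dataset.
numeric_cols = X_train.select_dtypes(include="number").columns
categorical_cols = X_train.columns.difference(numeric_cols)

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", MinMaxScaler())]), numeric_cols),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore",
                                               sparse_output=False))]), categorical_cols),
])

# Fit on the training set only, then apply the learned transforms to the test
# set, so no information leaks from test to train.
X_train_t = preprocess.fit_transform(X_train)
X_test_t = preprocess.transform(X_test)
feature_names = list(preprocess.get_feature_names_out())
```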

Exploratory Data Analysis (EDA)

Some pairs of features encoded similar information: for example, a continuous-valued summary score and a categorical variable indicating impairment based on a cut-off of that score were highly correlated. Where these dyads existed (n=14), only the continuous variable was retained. Using the training set, a correlation matrix was generated for all remaining predictive features; five additional features were eliminated for correlation coefficients >0.9 with one or more other features (indicating redundancy in information captured). Two features (baseline weight and weight at six months prior to baseline) were combined into a single feature (weight change over six months). A total of 43 predictive features remained.
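One common way to implement the correlation-based pruning described above (a sketch; the 0.9 threshold mirrors the text, but the code and variable names are illustrative and assume `X_train` is a DataFrame of candidate features):

```python
import numpy as np

# Compute pairwise absolute correlations on the training features only, then
# drop one member of every pair with |r| > 0.9.
corr = X_train.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))  # upper triangle only
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
X_train = X_train.drop(columns=to_drop)
X_test = X_test.drop(columns=to_drop)
```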

Examination of the three-month falls outcome showed class imbalance, with 89 patients sustaining a fall and 433 without falls. Because class imbalance can impair performance of classification models, two techniques were used concomitantly to create a class-balanced training set. Synthetic Minority Oversampling Technique (SMOTE) creates synthetic (interpolated) samples for the minority class, avoiding the overfitting that can result from simple oversampling with duplicated data. Tomek links, an undersampling strategy that identifies pairs of patients with similar features but opposite classes and removes the majority-class example of each pair, creates a clearer decision boundary for the classifier but has the limitation of information loss (Figure 2).11
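The imbalanced-learn package provides a combined implementation of these two steps; a minimal sketch, applied to the preprocessed training data only (never to the test set):

```python
from imblearn.combine import SMOTETomek

# SMOTE oversamples the minority (faller) class with synthetic, interpolated
# examples; Tomek links then removes majority-class members of ambiguous pairs
# near the decision boundary.
resampler = SMOTETomek(random_state=42)
X_train_bal, y_train_bal = resampler.fit_resample(X_train_t, y_train)
```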

Figure 2. Illustration of Tomek links using a synthetic (i.e., created artificially for this illustration) dataset. As part of a sampling strategy, the paired data points would be removed prior to classification to create a more defined decision boundary (i.e., the boundary at which the algorithm predicts whether a point is part of the “majority” versus “minority” class).

Feature Selection

Multiple strategies exist to select a subset of optimal features for supervised learning. Domain knowledge (clinical judgment) remains critical to inform the feature selection process, and it was used to pre-filter features for inclusion as described above. Beyond this, machine learning feature selection methods can be generally categorized as filter (e.g., correlation coefficients, information gain, variance inflation factors), wrapper (e.g., recursive feature elimination and stepwise selection algorithms), or embedded (e.g., Least Absolute Shrinkage and Selection Operator [LASSO] or decision tree) methods.12 We selected several methods from each of the three categories to compare the features selected and to evaluate the robustness of feature selection across methods. We then created a ranking algorithm to normalize the output of each method, average the feature importance scores across all methods, and rank features by their average score.
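A simplified sketch of the ranking idea using a handful of filter and embedded methods (the paper's ensemble also included wrapper methods such as RFECV; `X_train_bal`, `y_train_bal`, and `feature_names` follow the earlier sketches and are assumptions):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression

def minmax(s: pd.Series) -> pd.Series:
    """Normalize importance scores to a common 0-1 scale."""
    return (s - s.min()) / (s.max() - s.min() + 1e-12)

scores = pd.DataFrame(index=feature_names)
scores["corr"] = minmax(pd.Series(
    np.abs(np.corrcoef(X_train_bal, y_train_bal, rowvar=False)[-1, :-1]),
    index=feature_names))
scores["mutual_info"] = minmax(pd.Series(
    mutual_info_classif(X_train_bal, y_train_bal, random_state=0),
    index=feature_names))
scores["lasso"] = minmax(pd.Series(
    np.abs(LogisticRegression(penalty="l1", solver="liblinear").fit(
        X_train_bal, y_train_bal).coef_[0]),
    index=feature_names))
scores["rf"] = minmax(pd.Series(
    RandomForestClassifier(random_state=0).fit(
        X_train_bal, y_train_bal).feature_importances_,
    index=feature_names))

# Average across methods and keep the highest-ranked features
ranking = scores.mean(axis=1).sort_values(ascending=False)
top_features = ranking.head(8).index.tolist()
```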

Model Development and Testing

Model selection.

Numerous classification algorithms exist, including linear-based methods (e.g., logistic regression and support vector machines with a linear kernel), tree-based methods, neural networks, and others (e.g., k-nearest neighbor).13 Additionally, “ensemble” methods, such as bagging and boosting, can be applied to reduce variance or increase classification accuracy. Each model has advantages and disadvantages, the discussion of which falls outside the scope of this paper. However, an important consideration for clinical decision tools, beyond accuracy of model performance, is interpretability: can one understand how the model arrived at its prediction and the relative importance of each feature in that prediction? Some models, such as logistic regression, are highly interpretable, generating odds ratios for each feature; others, such as neural networks, are “black boxes” with obscured inner workings. We selected both interpretable and black box models for illustration and comparison: logistic regression, k-nearest neighbor, random forest (an ensemble decision tree model), and MultiLayer Perceptron (a neural network model). Model algorithms were implemented in the Python programming language using the scikit-learn package.2
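The four model families can be instantiated in scikit-learn along the following lines (hyperparameter values shown are illustrative defaults, not the tuned values used in the study; `X_train_sel` denotes the balanced training data restricted to the eight selected features):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

models = {
    "LR": LogisticRegression(max_iter=1000),
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "RF": RandomForestClassifier(n_estimators=200, random_state=42),
    "MLP": MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=42),
}

# Fit each model on the class-balanced training set with the selected features.
for name, model in models.items():
    model.fit(X_train_sel, y_train_bal)
```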

Model tuning.

Model tuning, or adjustment of the model’s hyperparameters (the user-specified settings that govern model behavior, such as the number of trees in a random forest or the number of hidden layers in a neural network), was initially performed using automated grid search algorithms to optimize performance on the training set. However, this method can cause overfitting to the training data at the expense of model performance on the test/validation data, which occurred in this case. Hyperparameter tuning remains highly dependent on user experience and training and lacks clear standardization in many cases. Because of the overfitting observed with automated methods, the models were tuned manually using domain knowledge and experience.
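For reference, an automated grid search of the kind first attempted might look like this (grid values are illustrative; as noted above, the final models in this study were tuned manually after this approach overfit the training data):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 5, 10],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="roc_auc",   # optimize cross-validated AUC on the training set
    cv=5,
)
search.fit(X_train_sel, y_train_bal)
best_rf = search.best_estimator_
```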

Model Results

Model performance metrics depend upon the supervised learning question. Accuracy (percentage correctly classified) is often not the most useful metric, particularly for highly imbalanced datasets where high accuracy may be obtained by simply predicting the majority class for every input. The most important function of this predictive model may be to identify everyone who sustains a fall (high sensitivity or recall), even at the risk of false positives. Alternatively, the most important function may be to correctly identify people least likely to fall (high specificity), even at the risk of misclassifying some who do fall. The trade-off between these two metrics can be evaluated using receiver operating characteristic (ROC) curves.14 As both sensitivity and specificity may be important for our example, ROC curves were generated and evaluated for the tuned and optimized models applied to the test/validation dataset.
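Generating the ROC curves and AUC values on the held-out test set is straightforward with scikit-learn and matplotlib (a sketch; `X_test_sel` and `models` follow the earlier sketches and are assumptions):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

plt.figure()
for name, model in models.items():
    proba = model.predict_proba(X_test_sel)[:, 1]  # predicted probability of "faller"
    fpr, tpr, _ = roc_curve(y_test, proba)
    plt.plot(fpr, tpr, label=f"{name} (AUC = {roc_auc_score(y_test, proba):.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", color="grey")  # chance line
plt.xlabel("False positive rate (1 - specificity)")
plt.ylabel("True positive rate (sensitivity)")
plt.legend()
plt.show()
```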

Both interpretable models (such as logistic regression) and “black box” models (such as MLP) were applied and evaluated in this study. Ongoing work in “explainable machine learning” is identifying methods to render these models more transparent, including SHapley Additive exPlanations (SHAP) values15 and Local Interpretable Model-agnostic Explanations (LIME),16 both applicable to any supervised learning algorithm. We utilized SHAP, derived from a game theory framework, to simulate and assess the marginal contribution of each feature to the model predictions.
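With the shap package, the random forest's predictions can be decomposed per feature roughly as follows (a sketch; the shape of the returned SHAP values for classifiers varies across shap versions, so the class-indexing step is an assumption):

```python
import shap

# TreeExplainer is efficient for tree ensembles such as random forest; the
# model-agnostic KernelExplainer could be used for the MLP instead.
explainer = shap.TreeExplainer(models["RF"])
shap_values = explainer.shap_values(X_test_sel)

# Older shap versions return one array per class for classifiers; index the
# positive ("faller") class before plotting.
positive_class_shap = shap_values[1] if isinstance(shap_values, list) else shap_values
shap.summary_plot(positive_class_shap, X_test_sel, feature_names=top_features)
```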

RESULTS

The characteristics of participants in the GAP trial have been reported previously.9 Here we report the results for each step of the data science lifecycle, as implemented in this analysis.

Data Preprocessing

Pearson correlation coefficients indicated some features with weak but non-zero correlation to data missingness (ρ values of 0.1 – 0.2). Inspection of t-SNE plots showed no obvious clustering of missing values (Figure 3). As described above, rows with missing outcomes values were excluded from the analysis, resulting in 522 patients included (89 with falls and 433 without falls). SMOTE and Tomek links were applied to correct the data imbalance as described above.

Figure 3. t-Stochastic Neighbor Embedding (t-SNE) plot showing data for missing versus non-missing outcome (fall) variable. t-SNE plots are a method of visualizing high-dimensional data in two dimensions. Here, we see no obvious clustering of points, suggesting that outcome data could possibly be missing at random.

Feature Selection

Visualizing the output of the ranking algorithm described above (Figure 4) and plotting relative importance to detect the “elbow” (Supplemental Figure 1), we selected the top eight features for inclusion in model development and testing.

Figure 4. Heat map showing features prioritized by different feature selection algorithms (x-axis labels) and average importance across all methods (leftmost column). Not all features are shown, and measures have been described previously for the parent study. SPPB = Short Physical Performance Battery; IADL = Instrumental Activities of Daily Living; OARS = Older Americans Resources and Services; BOMC = Blessed Orientation Memory Concentration. Corr coeff = Pearson correlation coefficient; MI = mutual information; RFECV = recursive feature elimination with cross-validation; Lasso = Least absolute shrinkage and selection operator.

Model Development and Testing

Figure 5 illustrates the different “decision boundaries” generated by each model based on the training set. Figure 6 demonstrates the ROC curves for each model applied to the test dataset. While performance was overall similar among the models, MLP, a neural network model, showed the best performance. Area under the curve (AUC) values were fair, ranging from 0.66 to 0.75. The use of an ensemble feature selection method (i.e., averaging of multiple individual methods) improved AUC substantially (by up to 0.15) compared to use of Least Absolute Shrinkage and Selection Operator (LASSO) feature selection alone.

Figure 5. Decision boundaries for each model, using the training set as input. Data are dimensionally reduced to two dimensions using principal components. Blue dots represent patients who had a fall, and red dots represent patients who did not fall. Background color represents the probability gradient (e.g., dark red is a model prediction of fall with higher probability, whereas light red represents a prediction of fall with lower probability). Mismatch between dot color and background color represents misclassification.

Figure 6. Receiver operating characteristic (ROC) curves and area under the curve (AUC) for each model. These show model performance on the test/validation dataset.

In this analysis, two of the models (logistic regression and random forest) are highly interpretable, and feature importance can be quantified and ranked. Feature importance for logistic regression, measured by the odds ratio, is particularly familiar to and readily applied by clinicians. In this example, most of the selected features align with and confirm clinical intuition (prior falls, performance status, weight loss, depression and cognitive status, etc.). By contrast, kNN and particularly MLP are less interpretable. The calculations within neural network models like MLP occur in a “black box” that cannot be reconstructed by the interpreter. However, these calculations can be approximated via other methods like SHAP values. SHAP values generated for the random forest model are shown in Figure 7. The feature value is given by the color of the individual points, and the x-axis shows the direction in which the feature “pushed” the prediction probability. For example, a low Karnofsky Performance Status (KPS) “pushes” the model’s predicted probability higher, making it more likely that the model will output a “faller” classification.
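For the logistic regression model, the familiar odds ratios can be read directly off the fitted coefficients (a sketch following the earlier variable names; because inputs were MinMax-scaled, each odds ratio reflects the change in odds across a feature's full observed range rather than per unit):

```python
import numpy as np
import pandas as pd

lr = models["LR"]  # fitted logistic regression from the earlier sketch
odds_ratios = pd.Series(np.exp(lr.coef_[0]), index=top_features)
print(odds_ratios.sort_values(ascending=False))
```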

Figure 7. SHapley Additive exPlanations (SHAP) values for Random Forest model, using some of the top features selected by the ensemble feature selection model. KPS = Karnofsky Performance Status; MNA = Mini-Nutritional Assessment; GAD-7 = Generalized Anxiety Disorder-7; GDS = Geriatric Depression Scale; MSS = Medical Social Support.

DISCUSSION

Supervised learning problems (classification and prediction) are common in clinical scenarios. The burgeoning field of artificial intelligence, including machine learning approaches, is increasingly being applied to these problems in medicine, and it is crucial that clinicians understand the terminology, basic methodology, philosophy, and limitations of these approaches. Used wisely, these approaches can significantly augment diagnostic and prognostic accuracy,17,18 provide novel insights, and suggest new interventions.19 Wielded imprudently, these models can have undesirable or even dangerous consequences for clinical decision-making20 (like any other statistical method or analytic paradigm).

This paper illustrates a data science and machine learning approach to a supervised learning problem in geriatric oncology: predicting which vulnerable older patients will fall within three months of starting a new treatment regimen for advanced cancer. Many of the steps overlap with well-established regression methods used widely across different fields in healthcare; indeed, regression is a supervised machine learning method. Aspects of this analysis not often described in regression analyses include the use of a data preprocessing pipeline, addressing class imbalance, comparison of feature selection methods (instead of reliance on one method or content expertise to select features), comparison of different models, and explainable machine learning methods (e.g., SHAP values).

In this case, prediction models showed fair performance in classifying fallers versus non-fallers as measured by the AUC, with a small feature set. The performance of these models was similar to that of other decision tools widely used in geriatric oncology.21,22 The use of multiple feature selection algorithms provided information about the robustness of the “signal” created by the features, as opposed to “noise” that may not be discerned through use of a single method. In this case, some features (such as fall history) were selected by all methods, indicating a higher likelihood of a true association, whereas others (social support, age of the physician) were chosen by one method but not others. Averaging of multiple methods (in other words, ensemble feature selection) can improve performance of the model compared to use of a single method. Use of statistical methods, as opposed to domain expertise alone, can also suggest novel insights, such as the possible association between physician age and the likelihood of falling during treatment, which could be explored in future hypothesis-driven research. Understanding which features are most important can also inform work identifying and testing new interventions to prevent falls.

In machine learning, there may be trade-offs between model performance and interpretability. Both are critical in the healthcare setting, and this analysis highlights the trade-off: the highest-performing model was also the least interpretable. Although methods are emerging to peer inside the “black box” of neural network models (such as SHAP values and LIME), it is unclear whether these methods can supplant the more intuitively applicable odds ratios of logistic regression. Neural network models like MLP are also challenging to reproduce exactly, given the mathematical methods involved in their generation.23 Moreover, all supervised machine learning methods suffer the same limitation as regression when used during the discovery phase and exploratory analyses: they can only measure associations, not causality. Whether machine learning can ever accomplish causal inference and generalized problem-solving is a matter of significant current debate bridging philosophy, neuroscience, and computer science;24 in the meantime, machine learning should be an adjunct to, not a substitute for, carefully designed prospective research. The validation of findings from exploratory research in carefully designed prospective studies is an important part of transforming new discoveries into clinical practice.

In summary, a machine learning approach to classification and prediction problems arising in clinical practice can add multiple tools to the analytic toolbox of clinical researchers, including in geriatric oncology. Using a machine learning approach within the multidisciplinary framework of geriatric oncology research, we may be able to refine existing risk stratification tools to better identify patients at high risk of adverse events like falls, triggering prospective interventions to mitigate risk. Analyses based on machine learning are increasing rapidly in the medical literature, and clinicians and researchers should, at a minimum, be aware of the terminology and statistical approaches underpinning these methods and have a basic understanding of their limitations.

Supplementary Material


Funding:

The work was funded through NCI K08CA248721 (E.R.), NIA R03AG067977 (E.R.), and U01CA233167 (PI Supriya Mohile).


CONFLICT OF INTEREST AND DISCLOSURES

Authors report no real or apparent conflicts of interest.

References

1. Ramsdale E, Snyder E, Culakova E, et al.: An introduction to machine learning for clinicians: How can machine learning augment knowledge in geriatric oncology? J Geriatr Oncol 12:1159–1163, 2021
2. Pedregosa F, Varoquaux G, Gramfort A, et al.: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12:2825–2830, 2011
3. Centers for Disease Control and Prevention: Web-based Injury Statistics Query and Reporting System (WISQARS) [Online]. National Center for Injury Prevention and Control, Centers for Disease Control and Prevention, 2003. Available from: www.cdc.gov/ncipc/wisqars (Accessed April 29, 2022)
4. Sekaran NK, Choi H, Hayward RA, et al.: Fall-associated difficulty with activities of daily living in functionally independent individuals aged 65 to 69 in the United States: a cohort study. J Am Geriatr Soc 61:96–100, 2013
5. Fried LP, Tangen CM, Walston J, et al.: Frailty in older adults: evidence for a phenotype. J Gerontol A Biol Sci Med Sci 56:M146–56, 2001
6. Mohile SG, Fan L, Reeve E, et al.: Association of cancer with geriatric syndromes in older Medicare beneficiaries. J Clin Oncol 29:1458–64, 2011
7. Sattar S, Haase K, Kuster S, et al.: Falls in older adults with cancer: an updated systematic review of prevalence, injurious falls, and impact on cancer treatment. Support Care Cancer 29:21–33, 2021
8. Mohile SG, Mohamed MR, Culakova E, et al.: A geriatric assessment (GA) intervention to reduce treatment toxicity in older patients with advanced cancer: A University of Rochester Cancer Center NCI community oncology research program cluster randomized clinical trial (CRCT). J Clin Oncol 38:12009, 2020
9. Mohile SG, Mohamed MR, Xu H, et al.: Evaluation of geriatric assessment and management on the toxic effects of cancer treatment (GAP70+): a cluster-randomised study. Lancet 398:1894–1904, 2021
10. Sterne JA, White IR, Carlin JB, et al.: Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ 338:b2393, 2009
11. Zeng M, Zou B, Wei F, et al.: Effective prediction of three common diseases by combining SMOTE with Tomek links technique for imbalanced medical data. 2016 IEEE International Conference of Online Analysis and Computing Science (ICOACS), IEEE, 2016, pp 225–228
12. Jović A, Brkić K, Bogunović N: A review of feature selection methods with applications. 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), IEEE, 2015, pp 1200–1205
13. Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning, 2nd Edition. New York, Springer, 2009
14. Zou KH, O’Malley AJ, Mauri L: Receiver-operating characteristic analysis for evaluating diagnostic tests and predictive models. Circulation 115:654–7, 2007
15. Lundberg SM, Lee S-I: A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 30, 2017
16. Ribeiro MT, Singh S, Guestrin C: “Why should I trust you?” Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp 1135–1144
17. Capper D, Jones DTW, Sill M, et al.: DNA methylation-based classification of central nervous system tumours. Nature 555:469–474, 2018
18. Yokoyama S, Hamada T, Higashi M, et al.: Predicted Prognosis of Patients with Pancreatic Cancer by Machine Learning. Clin Cancer Res 26:2411–2421, 2020
19. Vamathevan J, Clark D, Czodrowski P, et al.: Applications of machine learning in drug discovery and development. Nat Rev Drug Discov 18:463–477, 2019
20. O’Neil C: Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown Publishing Group, 2016
21. Hurria A, Mohile S, Gajra A, et al.: Validation of a Prediction Tool for Chemotherapy Toxicity in Older Adults With Cancer. J Clin Oncol 34:2366–71, 2016
22. Schonberg MA, Davis RB, McCarthy EP, et al.: Index to predict 5-year mortality of community-dwelling adults aged 65 and older using data from the National Health Interview Survey. J Gen Intern Med 24:1115–22, 2009
23. Ahmed H, Lofstead J: Managing Randomness to Enable Reproducible Machine Learning. Proceedings of the 5th International Workshop on Practical Reproducible Evaluation of Computer Systems, 2022
24. Bishop JM: Artificial Intelligence Is Stupid and Causal Reasoning Will Not Fix It. Front Psychol 11, 2021
