This study develops and validates machine learning approaches based on daily step counts collected by wearable devices in 3 prospective trials to predict hospitalizations during chemoradiotherapy.
Key Points
Question
How well can machine learning approaches using daily step counts from wearable devices predict risk of a first unplanned hospitalization event during concurrent chemoradiotherapy (CRT)?
Findings
In this secondary analysis of 3 prospective studies of 214 patients who underwent curative-intent CRT for a variety of solid tumors, machine learning approaches trained on only activity-monitoring data achieved good predictive ability in the stratified temporal validation cohort.
Meaning
Machine learning approaches using patient-generated health data can accurately predict risk for unplanned hospitalizations during CRT.
Abstract
Importance
Toxic effects of concurrent chemoradiotherapy (CRT) can cause treatment interruptions and hospitalizations, reducing treatment efficacy and increasing health care costs. Physical activity monitoring may enable early identification of patients at high risk for hospitalization who may benefit from proactive intervention.
Objective
To develop and validate machine learning (ML) approaches based on daily step counts collected by wearable devices in prospective trials to predict hospitalizations during CRT.
Design, Setting, and Participants
This study included patients with a variety of cancers enrolled from June 2015 to August 2018 on 3 prospective, single-institution trials of activity monitoring using wearable devices during CRT. Patients were followed up during and 1 month following CRT. Training and validation cohorts were generated temporally, stratifying for cancer diagnosis (70:30). Random forest, neural network, and elastic net–regularized logistic regression (EN) were trained to predict short-term hospitalization risk based on a combination of clinical characteristics and the preceding 2 weeks of activity data. To assess the contribution of activity data, models based only on activity-monitoring features and only on clinical features were also trained and evaluated. Data analysis was completed from January 2022 to March 2023.
Main Outcomes and Measures
Model performance was evaluated in terms of the receiver operating characteristic area under the curve (ROC AUC) in the stratified temporal validation cohort.
Results
Step counts from 214 patients (median [range] age, 61 [53-68] years; 113 [52.8%] male) were included. EN based on step counts and clinical features had high predictive ability (ROC AUC, 0.83; 95% CI, 0.66-0.92), outperforming random forest (ROC AUC, 0.76; 95% CI, 0.56-0.87; P = .02) and neural network (ROC AUC, 0.80; 95% CI, 0.71-0.88; P = .36). In an ablation study, the EN model based on only step counts demonstrated predictive ability similar to that of the EN model with step counts and clinical features (ROC AUC, 0.85; 95% CI, 0.70-0.93; P = .09). Both models outperformed the EN model trained on only clinical features (ROC AUC, 0.53; 95% CI, 0.31-0.66; P < .001).
Conclusions and Relevance
This study developed and validated an ML model based on activity-monitoring data collected during prospective clinical trials. Patient-generated health data have the potential to advance the predictive ability of ML approaches. The resulting model from this study will be evaluated in an upcoming multi-institutional, cooperative group randomized trial.
Introduction
Toxic effects of concurrent chemoradiotherapy (CRT) cause treatment interruptions and hospitalizations, reducing treatment efficacy and increasing health care costs.1 In a recent randomized trial, machine learning (ML) approaches were applied to electronic health records and identified patients at high risk for unplanned medical visits during radiotherapy to direct supportive care.2 Wearable devices now allow continuous collection of objective and dynamic health data outside of health care settings,3 which may be leveraged through artificial intelligence and ML to predict adverse events.4
This study uses data from 3 prospective trials in which patients with a variety of solid tumors used wearable devices during CRT.5,6,7 The objective was to develop and validate an ML approach to predict unplanned hospitalizations using daily step counts. The resulting models are being evaluated in NRGF-001,8 which randomizes patients with non–small cell lung cancer undergoing CRT to care with or without continuous activity monitoring.
Methods
Study Design
This study included patients from 3 single-institution trials of activity monitoring during CRT from June 2015 to August 2018.3,5,6,7 Patients wore study-provided devices, and daily step counts were collected approximately 1 week before CRT through treatment completion. Patients were instructed to wear devices, but compliance was not enforced, reflecting everyday use. Waterproof devices that did not require charging were selected to maximize use. Three patients were excluded due to missing clinical data (n = 1) or hospitalization before CRT (n = 2).
ML models were developed to predict hospitalization in the week after a given prediction day, using data from the preceding 2 weeks. Overlapping observations were generated by incrementing the prediction day until first hospitalization or treatment completion, enabling models to be applied on any day of treatment. Hospitalizations were tracked on trial from enrollment through at least 1 month after CRT.
Because hospitalization affects subsequent step counts and readmission risk, data following the first hospitalization were excluded. Each observation was weighted inversely by the number of observations per patient so that each patient was represented equally. On average, patients had step counts recorded on 87% of days through treatment completion or first hospitalization.
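This sliding-window observation scheme can be sketched as follows. The day indexing, the `Observation` container, and the function name are illustrative assumptions rather than the study code:

```python
from dataclasses import dataclass

WINDOW = 14  # days of step count history per observation
HORIZON = 7  # days ahead over which hospitalization is predicted

@dataclass
class Observation:
    features: list  # preceding 2 weeks of daily step counts
    label: int      # 1 if first hospitalization occurs within the next week
    weight: float   # inverse of the patient's observation count

def build_observations(steps, hosp_day=None):
    """Generate overlapping observations for one patient.

    steps: daily counts from treatment start (0-indexed days);
    hosp_day: day of first hospitalization, or None. Prediction days
    advance until first hospitalization or treatment completion, so
    data after the first hospitalization are never used.
    """
    last = min(hosp_day, len(steps)) if hosp_day is not None else len(steps)
    obs = []
    for day in range(WINDOW, last + 1):
        window = steps[day - WINDOW:day]
        label = int(hosp_day is not None and day <= hosp_day < day + HORIZON)
        obs.append(Observation(window, label, 0.0))
    # weight observations so every patient contributes equally overall
    for o in obs:
        o.weight = 1.0 / len(obs)
    return obs
```

With this convention, a patient hospitalized shortly after the 2-week run-in contributes a few positively labeled observations, while a patient who completes treatment contributes one negatively labeled observation per treatment day, all down-weighted to sum to 1.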
This study was approved by the institutional review board at the University of California, San Francisco. This study followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guidelines.
Clinical Variables and Step Count Preprocessing
Clinical features considered included Eastern Cooperative Oncology Group Performance Status (PS), primary cancer, age, sex, and elapsed days of CRT at prediction (Table). To mitigate noise and missingness, step counts were smoothed into 3-day running averages, and features were normalized using a standard scaler. To account for variation over days of the week, step counts were aggregated into weekly features (mean, median, minimum, maximum, range, and standard deviation) from each of the 2 weeks preceding the prediction day. Absolute and relative changes in those aggregate statistics between the 2 preceding weeks were determined. The 7 daily smoothed step counts from the week preceding the prediction day were included, resulting in 31 step count and 46 total features.
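To make the feature construction concrete, a minimal standard-library sketch is shown below. The function names and the trailing alignment of the 3-day running average are assumptions for illustration (the paper does not specify window alignment), and the standard-scaler normalization step is omitted:

```python
import statistics as st

def smooth(steps):
    """3-day running average (trailing window assumed) to mitigate noise."""
    return [st.mean(steps[max(0, i - 2):i + 1]) for i in range(len(steps))]

def weekly_stats(week):
    """The 6 weekly aggregate statistics used as features."""
    return {
        "mean": st.mean(week), "median": st.median(week),
        "min": min(week), "max": max(week),
        "range": max(week) - min(week), "std": st.pstdev(week),
    }

def build_features(last_14_days):
    """31 step count features for one prediction day: 12 weekly aggregates
    (6 per week), 12 week-over-week changes (6 absolute and 6 relative),
    and the 7 smoothed daily counts from the most recent week."""
    smoothed = smooth(last_14_days)
    prev, recent = weekly_stats(smoothed[:7]), weekly_stats(smoothed[7:])
    feats = {}
    for k in prev:
        feats[f"wk1_{k}"] = prev[k]
        feats[f"wk2_{k}"] = recent[k]
        feats[f"abs_change_{k}"] = recent[k] - prev[k]
        feats[f"rel_change_{k}"] = (recent[k] - prev[k]) / prev[k] if prev[k] else 0.0
    for i, s in enumerate(smoothed[7:]):
        feats[f"day_{i + 1}"] = s
    return feats
```

In the full pipeline, these 31 step count features would be concatenated with the clinical features (46 in total) and standardized before model fitting.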
Table. Patient Characteristics. Values are No. (%) unless otherwise indicated.

| Characteristic | Total patients (N = 214) | Training cohort (n = 151) | Validation cohort (n = 63) |
|---|---|---|---|
| Age, median (range), y | 61 (53-68) | 61 (53-68) | 63 (54-68) |
| Hospitalization | 59 (27.6) | 44 (29.1) | 15 (23.8) |
| ECOG Performance Status | |||
| 0 | 96 (44.8) | 60 (39.7) | 36 (57.1) |
| 1 | 102 (47.7) | 77 (51.0) | 25 (39.7) |
| 2 | 16 (7.5) | 14 (9.3) | 2 (3.2) |
| Trials | |||
| 1 | 37 (17.3) | 37 (24.5) | 0 |
| 2 | 37 (17.3) | 36 (23.8) | 1 (1.6) |
| 3 | 140 (65.4) | 78 (51.7) | 62 (98.4) |
| Sex | |||
| Female | 101 (47.2) | 72 (47.7) | 29 (46.0) |
| Male | 113 (52.8) | 79 (52.3) | 34 (54.0) |
| Primary cancer site | |||
| Central nervous system | 7 (3.3) | 5 (3.3) | 2 (3.2) |
| Cervix | 19 (8.9) | 13 (8.6) | 6 (9.5) |
| Gastrointestinal | 60 (28.0) | 42 (27.8) | 18 (28.6) |
| Anal | 10 (4.6) | 7 (4.6) | 3 (4.8) |
| Esophagus | 12 (5.6) | 8 (5.3) | 4 (6.3) |
| Gastric | 8 (3.7) | 6 (4.0) | 2 (3.2) |
| Pancreatic | 8 (3.7) | 6 (4.0) | 2 (3.2) |
| Rectal | 19 (8.9) | 13 (8.6) | 6 (9.5) |
| Head and neck | 65 (30.4) | 46 (30.5) | 19 (30.2) |
| Lung | 63 (29.4) | 45 (29.8) | 18 (28.6) |
| NSCLC | 58 (27.1) | 41 (27.2) | 17 (27.0) |
| SCLC | 5 (2.3) | 4 (2.6) | 1 (1.6) |
| Step counts, median (range) | |||
| Preceding treatment | |||
| Days recorded | 6.5 (1-7) | 7.0 (3-7) | 3.0 (0-7) |
| Steps recorded | 5082 (3335-8064) | 5090 (3335-8064) | 5072 (3414-7943) |
| During treatment | |||
| Days recorded | 39.0 (30-44) | 40.0 (33-45) | 32.0 (26-42) |
| Steps recorded | 4792 (3246-6580) | 4944 (3318-6760) | 4552 (3142-6240) |
Abbreviations: ECOG, Eastern Cooperative Oncology Group; NSCLC, non–small cell lung cancer; SCLC, small cell lung cancer.
Model Development and Assessment
Patients were split into training (151 patients [70.6%]) and validation (63 patients [29.4%]) cohorts temporally (with earlier patients allocated to training and later patients allocated to validation), stratified by cancer diagnosis.9 Elastic net–regularized logistic regression (EN), random forest (RF), and ensembled sparse-input neural network (NN)10 were trained on the training cohort and evaluated on the holdout validation cohort by receiver operating characteristic area under the curve (ROC AUC). Models were evaluated based on every prediction day across all patients, weighted by observations per patient. If any features were not computable due to missing data, the observation day was dropped. Across the training and validation cohorts, 1258 of 7041 possible prediction days (17.9%) were excluded due to missing data. Among these, 650 observations (9.2%) preceded sufficient step count collection, as patients started wearing devices late. Thus, 608 of 5783 days (10.5%) after wearable use start were missing.
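A temporal, diagnosis-stratified split of this kind might look like the following sketch; the tuple layout and the rounding rule at the 70% cut point are assumptions:

```python
from collections import defaultdict

def temporal_stratified_split(patients, train_frac=0.7):
    """Split patients into training and validation cohorts.

    Within each cancer diagnosis stratum, earlier-enrolled patients are
    allocated to training and later-enrolled patients to validation.
    patients: iterable of (patient_id, diagnosis, enroll_date) tuples.
    """
    by_dx = defaultdict(list)
    for pid, dx, date in patients:
        by_dx[dx].append((date, pid))
    train, valid = [], []
    for group in by_dx.values():
        group.sort()  # chronological order within the stratum
        cut = round(len(group) * train_frac)
        train += [pid for _, pid in group[:cut]]
        valid += [pid for _, pid in group[cut:]]
    return train, valid
```

Splitting temporally rather than randomly mimics prospective deployment: the model is validated on patients treated after those it was trained on.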
Statistical Analysis
Hyperparameters were tuned using grid search and 3-fold cross-validation (EN: L1 ratio and C [inverse of regularization strength]; RF: maximum features and minimum samples split; and NN: input, full tree penalties). For each ML approach, 3 models were trained on (1) step counts only, (2) clinical features only, and (3) combined. Calibration was assessed. Confidence intervals and P values were calculated using bootstrap resampling at the patient level to account for within-person correlation. A Youden cutoff was determined to assess the specificity, sensitivity, and accuracy of the best-performing model. Analyses were performed using Python, version 3.9.7 (Python Software Foundation). All tests were 2-sided, and P < .05 was considered statistically significant.
Results
A total of 214 patients were analyzed. The median (range) age was 61 (53-68) years, 113 patients (52.8%) were male, and 198 patients (92.5%) had a baseline PS of 0 or 1 (Table). The most common primary cancer sites were head and neck (65 [30.4%]) and lung (63 [29.4%]). Fifty-nine patients (27.6%) were hospitalized during CRT.
EN demonstrated the best predictive performance on validation (ROC AUC, 0.83; 95% CI, 0.66-0.92), followed by NN (ROC AUC, 0.80; 95% CI, 0.71-0.88; P = .36) and RF (ROC AUC, 0.76; 95% CI, 0.56-0.87; P = .02) (Figure 1). Step count EN performed comparably with the combined EN (ROC AUC, 0.85; 95% CI, 0.70-0.93; P = .09). Both outperformed the EN model trained on only clinical features (ROC AUC, 0.53; 95% CI, 0.31-0.66; P < .001).
Figure 1. Receiver Operating Characteristic Curves.

Receiver operating characteristic curves are shown for the elastic net, neural network, and random forest models, each trained on step counts and clinical features combined, on step counts only, and on clinical features only.
The step count EN model incorporated 7 aggregate weekly changes (relative changes in median, minimum, mean, and range, and absolute changes in maximum, median, and range), 3 aggregate features of the week preceding prediction day (median, minimum, and range), and 5 smoothed step counts from the week preceding prediction day (Figure 2A). Clinical features contributing to the combined EN model included anal and esophagus cancer diagnoses, and PS (Figure 2B). Hyperparameters for the best-performing step count EN were L1 ratio 0.2 and C 0.9. A Youden candidate cutoff (0.109) demonstrated 83.0% specificity, 60.7% sensitivity, and 82.1% accuracy.
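The Youden cutoff selection reported above can be reproduced with a short sketch; the convention of evaluating thresholds only at observed scores is an assumption:

```python
def youden_cutoff(y_true, y_score):
    """Return the threshold maximizing Youden's J = sensitivity + specificity - 1.

    Assumes y_true contains both classes; scores at or above the
    threshold are classified as positive.
    """
    best_j, best_t = -1.0, None
    for t in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < t)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < t)
        fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= t)
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        j = sens + spec - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j
```

Applied to the validation-cohort predicted probabilities, such a procedure yields a single operating point (here 0.109) at which sensitivity and specificity can be reported.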
Figure 2. Nonzero Coefficients for the Best Performing Elastic Net Model.
Positive and negative coefficients are correlated with increased and decreased likelihood of hospitalization in the subsequent week, respectively. ECOG indicates Eastern Cooperative Oncology Group.
Discussion
Prospective trial data were used to develop and validate ML approaches predicting hospitalization during CRT using daily step counts. EN models performed best, with no improvement from the addition of clinical variables. This finding demonstrates the potential for simple activity metrics to augment or even supplant traditional metrics requiring clinical encounters, such as PS, to characterize patients’ conditions.
There are limited data demonstrating the clinical value of wearables beyond promoting activity.11,12 Associations between step counts and hospitalizations during CRT13 may have been obscured due to approach, sample size, or study population. Computational techniques may be critical to uncovering associations between large-volume, noisy wearable data and clinical outcomes.
ML approaches using step counts may direct supportive care to reduce unplanned acute care.2 This hypothesis is being tested in NRGF-001,8 one of the first studies to leverage activity monitoring to facilitate interventions toward improving clinical outcomes rather than physical activity. Participants on the experimental arm will receive wearable devices to provide treating physicians with ML-based predictions of hospitalization risk. A planned prospective single-institution study will seek to validate this model across different devices and compare and combine it with dynamic electronic health record and natural language processing–based approaches.2,14
Strengths and Limitations
Strengths of this study include a standardized device to limit measurement variability and mitigate biases from device ownership disparities,15 use of prospective trial data with a large sample size, and advanced analytic techniques. Limitations include conduct at a single urban academic center, potentially introducing selection bias; use of a single activity metric; variability in device wear time; potential informative missingness (particularly during early treatment); and lack of the prospective, higher-dimensionality, and dynamic clinical data used in prior studies.2
Conclusions
Overall, this study and NRGF-0018 will support a growing body of evidence that predictive tools based on activity data may identify patients at risk for unplanned hospitalizations, potentially facilitating preventive interventions.2
Data Sharing Statement
References
- 1. Brooks GA, Li L, Uno H, Hassett MJ, Landon BE, Schrag D. Acute hospital care is the chief driver of regional spending variation in Medicare patients with advanced cancer. Health Aff (Millwood). 2014;33(10):1793-1800. doi:10.1377/hlthaff.2014.0280
- 2. Hong JC, Eclov NCW, Dalal NH, et al. System for High-Intensity Evaluation During Radiation Therapy (SHIELD-RT): a prospective randomized study of machine learning–directed clinical evaluations during radiation and chemoradiation. J Clin Oncol. 2020;38(31):3652-3661. doi:10.1200/JCO.20.01688
- 3. Andraos TY, Asaro AM, Garg MK, et al. Real-time Activity Monitoring to Prevent Admissions during Radiotherapy (RAMPART). Int J Radiat Oncol Biol Phys. 2019;105(1)(suppl):E463. doi:10.1016/j.ijrobp.2019.06.1424
- 4. Dunn J, Kidzinski L, Runge R, et al. Wearable sensors enable personalized predictions of clinical laboratory measurements. Nat Med. 2021;27(6):1105-1112. doi:10.1038/s41591-021-01339-0
- 5. Ohri N, Halmos B, Bodner WR, et al. Daily step counts: a new prognostic factor in locally advanced non–small cell lung cancer? Int J Radiat Oncol Biol Phys. 2019;105(4):745-751. doi:10.1016/j.ijrobp.2019.07.055
- 6. Paul S, Bodner WR, Garg M, Tang J, Ohri N. Cardiac irradiation predicts activity decline in patients receiving concurrent chemoradiation for locally advanced lung cancer. Int J Radiat Oncol Biol Phys. 2020;108(3):597-601. doi:10.1016/j.ijrobp.2020.05.042
- 7. Ohri N, Kabarriti R, Bodner WR, et al. Continuous activity monitoring during concurrent chemoradiotherapy. Int J Radiat Oncol Biol Phys. 2017;97(5):1061-1065. doi:10.1016/j.ijrobp.2016.12.030
- 8. Activity monitoring to improve patient care during chemoradiotherapy for locally advanced non–small cell lung cancer (LA-NSCLC). ClinicalTrials.gov identifier: NCT04878952. Updated January 9, 2024. Accessed February 21, 2024. https://clinicaltrials.gov/ct2/show/study/NCT04878952?term=NCT04878952&draw=2&rank=1
- 9. Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med. 2015;162(1):55-63. doi:10.7326/M14-0697
- 10. Feng J, Simon N. Ensembled sparse-input hierarchical networks for high-dimensional datasets. Stat Anal Data Min. Published online March 14, 2022. doi:10.1002/sam.11579
- 11. Lynch BM, Nguyen NH, Moore MM, et al. A randomized controlled trial of a wearable technology-based intervention for increasing moderate to vigorous physical activity and reducing sedentary behavior in breast cancer survivors: the ACTIVATE Trial. Cancer. 2019;125(16):2846-2855. doi:10.1002/cncr.32143
- 12. Mehta SJ, Hume E, Troxel AB, et al. Effect of remote monitoring on discharge to home, return to activity, and rehospitalization after hip and knee arthroplasty: a randomized clinical trial. JAMA Netw Open. 2020;3(12):e2028328. doi:10.1001/jamanetworkopen.2020.28328
- 13. Sher DJ, Radpour S, Shah JL, et al. Pilot study of a wearable activity monitor during head and neck radiotherapy to predict clinical outcomes. JCO Clin Cancer Inform. 2022;6:e2100179. doi:10.1200/CCI.21.00179
- 14. Hong JC, Fairchild AT, Tanksley JP, Palta M, Tenenbaum JD. Natural language processing for abstraction of cancer treatment toxicities: accuracy versus human experts. JAMIA Open. 2020;3(4):513-517. doi:10.1093/jamiaopen/ooaa064
- 15. Pratap A, Neto EC, Snyder P, et al. Indicators of retention in remote digital health studies: a cross-study evaluation of 100,000 participants. NPJ Digit Med. 2020;3(1):21. doi:10.1038/s41746-020-0224-8