Abstract
Background:
The recent expansion of immunotherapy for stage IIB/IIC melanoma highlights a growing clinical need to identify patients at high risk of metastatic recurrence and, therefore, most likely to benefit from this therapeutic modality.
Objective:
To develop time-to-event risk prediction models for melanoma metastatic recurrence.
Methods:
Patients diagnosed with stage I/II primary cutaneous melanoma between 2000 and 2020 at Mass General Brigham and Dana-Farber Cancer Institute were included. Melanoma recurrence date and type were determined by chart review. Thirty clinicopathologic factors were extracted from electronic health records. Three types of time-to-event machine-learning models (ensemble, linear, and deep learning) were evaluated with internal and external validation for predicting distant recurrence versus locoregional/non-recurrence.
Results:
This study included 954 melanomas (155 distant recurrences, 163 locoregional recurrences, and 636 non-recurrences matched 2:1 to recurrences). Distant recurrences were associated with worse overall survival than locoregional/non-recurrences (HR: 6.21, p<0.001) and than locoregional recurrences alone (HR: 5.79, p<0.001). The Gradient Boosting Survival model achieved the best performance in the external validation (concordance index: 0.816; time-dependent AUC: 0.842; Brier score: 0.103).
Limitations:
Retrospective design and a cohort drawn from a single geographic region.
Conclusions:
These results suggest that time-to-event machine-learning models can reliably predict metastatic recurrence of localized melanoma and help identify high-risk patients who are most likely to benefit from immunotherapy.
Keywords: stage I/II melanoma, metastatic recurrence, locoregional recurrence, time-to-event prediction, clinicopathologic factors
Capsule Summary
Clinicopathologic factors can be used to identify localized melanomas at high risk of metastatic recurrence and select patients who are most likely to benefit from adjuvant immunotherapy.
Time-to-event models achieved a concordance index of 0.816 and a time-dependent area under the receiver operating characteristic curve of 0.842 for predicting metastatic recurrence in external validation.
Introduction
Dramatic advances in cancer cell biology and immunology continue to expand therapeutic options, and treatment strategies for melanoma have evolved considerably over the past decade. Notably, pembrolizumab was approved in December 2021 as adjuvant therapy for patients with high-risk node-negative melanoma (stage IIB/IIC) following complete resection of the primary tumor.1 However, although surveillance guidelines are established for advanced (stage III/IV) melanoma, the management of patients with stage IIB/IIC disease is largely left to the physician’s discretion because limited evidence supports specific recommendations. The decision to proceed with adjuvant immunotherapy is based on recurrence risk, treatment-related toxicities, and patient preferences. This highlights a significant opportunity to develop decision-making tools that stratify patients with localized melanoma (stage I/II) by their risk of recurrence and mortality.
Previous studies have demonstrated that up to 30% of patients with localized melanoma experience disease recurrence.2,3 Adjuvant immunotherapy, if used in the appropriate population, may prevent recurrence and improve patient outcomes. However, this treatment class is associated with significant and potentially permanent toxicities and should therefore be targeted toward patients with the highest risk of recurrence.4,5,6,7,8 Furthermore, methods to identify patients at the highest risk of recurrence are lacking, underscoring the importance of re-evaluating current management and surveillance patterns to identify those whose risk profile favors systemic therapy. For example, prior studies have demonstrated that routine imaging detects only 21% of recurrences in patients with stage II disease.2 In addition, distant metastases following surgical resection of melanoma often portend a worse prognosis than locoregional recurrences.9 Distant and locoregional recurrences also require different treatment strategies: patients with distant recurrence may need systemic therapies, whereas patients with locoregional recurrence may benefit from local treatments such as re-excision and intralesional therapy.10 Thus, a better understanding of distant recurrence from localized melanoma is increasingly important.
This study builds on a recent publication from our research group that developed risk prediction models for any type of recurrence from early-stage melanoma.11 The goal of the current study is to specifically identify patients at the highest risk of distant recurrence, who are therefore most likely to benefit from adjuvant therapy. Because the timing of recurrence is an important criterion in determining the most appropriate time for systemic intervention, we focus specifically on time-to-event prediction. Prior studies of melanoma recurrence relied on conventional statistical models or concentrated on binary classification of recurrent versus non-recurrent melanomas.12,13 In contrast, our time-to-event machine-learning approach can better represent the complexity of disease progression and the reality of clinical practice. In this study, we apply time-to-event algorithms from three categories (linear, ensemble, and deep learning-based models) and compare their performance in predicting distant recurrence risk.
Methods
We developed time-to-event machine-learning models to predict distant recurrence versus locoregional/non-recurrence of localized melanomas. We also examined model performance in predicting distant versus locoregional recurrence and distant recurrence versus non-recurrence. Finally, we predicted distant, locoregional, and non-recurrence jointly using competing risk modeling.
Data Collection
This study included stage I/II melanomas diagnosed at Massachusetts General Hospital (MGH) and Brigham and Women’s Hospital/Dana-Farber Cancer Institute (DFCI) between January 2000 and February 2020. Figure 1 shows the flow diagram of how the study population was assembled. All recurrent melanomas were identified first. Non-recurrent melanomas were then matched 2:1 to recurrent melanomas on the year of primary melanoma diagnosis using the “matchControls” function in R (version 4.1.0).14 eMethod 1 describes the inclusion criteria.
Figure 1. Study Design Flowchart.
Stage I and stage II cutaneous melanomas with no evidence of nodal or distant metastasis at the time of primary diagnosis were included. Accordingly, melanomas with a positive sentinel lymph node biopsy or with microscopic satellites were excluded. Acral, mucosal, uveal, and desmoplastic melanomas were also excluded, as were melanomas without a pathology report available in the electronic health records.
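The 2:1 matching on diagnosis year can be illustrated with a short sketch. It is not a reimplementation of R's `matchControls`. It assumes a simple greedy rule: each recurrent melanoma, taken in order of diagnosis year, receives the two unused non-recurrent melanomas with the closest diagnosis year. The function name and the `(id, year)` tuple layout are hypothetical.

```python
def match_controls_2to1(cases, controls, ratio=2):
    """Greedily match `ratio` controls to each case on diagnosis year.

    cases, controls: lists of (id, diagnosis_year) tuples (hypothetical layout).
    Returns a dict {case_id: [control_id, ...]}; each control is used at most once.
    """
    available = list(controls)
    matches = {}
    # Process cases in order of diagnosis year so results are deterministic.
    for case_id, case_year in sorted(cases, key=lambda c: c[1]):
        chosen = []
        for _ in range(ratio):
            if not available:
                break  # ran out of unused controls
            # Closest unused control by diagnosis year; ties go to the earlier-listed control.
            best = min(available, key=lambda c: abs(c[1] - case_year))
            available.remove(best)
            chosen.append(best[0])
        matches[case_id] = chosen
    return matches


cases = [("r1", 2005), ("r2", 2010)]
controls = [("n1", 2004), ("n2", 2005), ("n3", 2006),
            ("n4", 2010), ("n5", 2011), ("n6", 2015)]
print(match_controls_2to1(cases, controls))
```

Because the rule is greedy, cases processed early get first pick of controls. An optimal matching algorithm could yield slightly smaller total year differences.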
A manual review of electronic health records (EHRs) was conducted to ascertain the recurrence status (locoregional, distant, non-recurrence) and date. Melanomas that were stage IV at the time of recurrence, based on the American Joint Committee on Cancer 8th edition (AJCC-8) staging guidelines, were labeled as “distant recurrence.”15 All recurrent melanomas without distant metastases were labeled as “locoregional recurrence.” All melanomas without any recurrence were labeled as “non-recurrence.” The time-to-event was defined as the duration from primary melanoma diagnosis to the first recurrence if the melanoma recurred; otherwise, the duration to the date of death or last follow-up.
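The outcome definition above can be sketched as follows. The code derives the recurrence label, the time-to-event in years, and the event indicator for the primary distant versus locoregional/non-recurrence task. The argument names are hypothetical. Treating locoregional recurrences as censored at their recurrence date is an assumption about how the binary event indicator is set.

```python
from datetime import date


def label_melanoma(diagnosis, recurrence=None, stage_at_recurrence=None,
                   death=None, last_followup=None):
    """Return (label, years_to_event, distant_event) for one melanoma.

    Argument names are hypothetical; dates are datetime.date objects.
    """
    if recurrence is not None:
        # Stage IV at recurrence (AJCC-8) -> distant; any other recurrence -> locoregional.
        label = "distant" if stage_at_recurrence == "IV" else "locoregional"
        end = recurrence  # time runs to the first recurrence
    else:
        label = "non-recurrence"
        end = death if death is not None else last_followup
    years = (end - diagnosis).days / 365.25
    # Assumption: only distant recurrence counts as an event in the primary task;
    # locoregional recurrences and non-recurrences are treated as censored.
    return label, years, label == "distant"


print(label_melanoma(date(2010, 1, 1), recurrence=date(2012, 1, 1),
                     stage_at_recurrence="IV"))
```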
We extracted variables of interest (eTable 1) from two institutional clinical databases: the Research Patient Data Registry16 and the Enterprise Data Warehouse17. eMethod 2 details the information extraction. Median household income was derived from U.S. Census data using the patient’s zip code.18 ICD codes were used to calculate the Charlson Comorbidity Score (CCS).19 According to clinical guidelines, sentinel lymph node biopsy (SLNB) is recommended for melanomas with Breslow thickness >1.00 mm and is not recommended for non-ulcerated primary melanomas <0.8 mm thick.20 SLNB may be offered for primary melanomas that are 0.8 to 1.0 mm thick, or <0.8 mm thick and ulcerated, after a clinical discussion of the procedure’s advantages and disadvantages.20 To reflect real-world clinical practice, we encoded SLNB status to capture whether SLNB was indicated and its result. When SLNB was not performed, the encoding also recorded the reason (e.g., age/comorbidity or patient deferral; Table 1).
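The SLNB guideline thresholds described above can be expressed as a small rule. This is a simplified sketch that considers only Breslow thickness and ulceration. The function name and category strings are hypothetical, and the sketch does not reproduce the full guideline.

```python
def slnb_guideline_category(breslow_mm, ulcerated):
    """Simplified SLNB guidance from Breslow thickness (mm) and ulceration status."""
    if breslow_mm > 1.0:
        return "recommended"
    if breslow_mm >= 0.8 or ulcerated:
        # 0.8-1.0 mm, or <0.8 mm with ulceration: discuss advantages and disadvantages
        return "may be offered"
    return "not recommended"  # <0.8 mm and non-ulcerated


for thickness, ulcer in [(1.5, False), (0.9, False), (0.5, True), (0.5, False)]:
    print(thickness, ulcer, slnb_guideline_category(thickness, ulcer))
```

A category like this describes only whether SLNB was guideline-indicated. The study's SLNB variable additionally records what actually happened (performed, or deferred and why).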
Time-to-Event Machine Learning Methods
We compared the predictive performance of three types of supervised time-to-event machine-learning algorithms (eMethod 3):

- ensemble models, including GradientBoostingSurvivalAnalysis (GBS)21 and RandomSurvivalForest (RSF);22
- linear models, including CoxnetSurvivalAnalysis (Coxnet)23 and CoxPHSurvivalAnalysis (CoxPH);24
- deep learning models, including DeepSurv25 and CoxTime.26

DeepHit27 was used for competing risk modeling. Models were evaluated by concordance index,28 time-dependent area under the receiver operating characteristic curve (AUC),29 and Brier score30 under two settings: (1) five-fold cross-validation on the entire cohort (internal validation) and (2) model development on the MGH cohort with independent validation on the DFCI cohort (external validation). Each experiment was repeated 50 times, and the mean and 95% confidence interval (CI) are reported. Experiments were implemented using scikit-survival (version 0.18.0)31 and pycox (version 0.2.3).32
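For readers unfamiliar with the concordance index, here is a minimal pure-Python sketch of Harrell's version for right-censored data. It is illustrative only. scikit-survival exposes a comparable metric as `sksurv.metrics.concordance_index_censored`, whose handling of tied times may differ from this sketch.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i had the event at an earlier time
    than subject j's observed time. The pair is concordant when i also has the
    higher risk score; tied risk scores count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [1, 2, 3, 4]
events = [True, True, False, True]
print(harrell_c_index(times, events, [4, 3, 2, 1]))  # higher risk, earlier event
```

A value of 1 means the risk scores order every comparable pair correctly, and 0.5 corresponds to random ordering.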
eFigure 1 presents the model development and validation pipeline. In Phase 0, we examined model performance using all extracted features. In Phase 1, we performed feature selection on the MGH cohort. We first assessed the predictive contribution of each feature using permutation importance.33 Given multicollinearity among features, we then performed hierarchical clustering on the features’ Spearman rank-order correlations and retained a single feature from each highly correlated cluster. In Phase 2, models were evaluated using the selected features, which were again ranked by permutation importance.
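The correlation-based pruning step in Phase 1 can be sketched as follows. For illustration, this sketch makes three assumptions. Single-linkage clustering on the distance 1 - |rho| (rho being Spearman's correlation) stands in for the hierarchical clustering used in the study. The cut-off of 0.3 is hypothetical. The first feature listed in each cluster is the one retained.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0  # constant feature -> 0


def prune_correlated(features, max_distance=0.3):
    """Single-linkage clusters on distance 1 - |rho|; keep the first feature of each.

    features: dict mapping feature name -> list of values (insertion order matters).
    """
    names = list(features)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if 1 - abs(spearman(features[a], features[b])) < max_distance:
                parent[find(b)] = find(a)  # merge the two clusters
    kept, seen = [], set()
    for name in names:
        root = find(name)
        if root not in seen:
            seen.add(root)
            kept.append(name)
    return kept


features = {
    "breslow": [0.3, 0.5, 1.2, 2.0, 4.1],
    "breslow_doubled": [0.6, 1.0, 2.4, 4.0, 8.2],  # perfectly rank-correlated copy
    "age": [70, 40, 55, 62, 48],
}
print(prune_correlated(features))
```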
Statistical Analyses
We used Pearson’s chi-squared test for categorical variables and the t-test for continuous variables to compare groups. Kaplan-Meier curves were used to estimate overall survival and distant recurrence-free probability. Guarantee-time bias arises when group membership depends on an event that occurs after baseline. Here, patients must survive long enough to develop a recurrence, so survival time before recurrence would otherwise be credited to the recurrence groups. To account for this potential bias, we computed hazard ratios (HRs) using time-varying Cox proportional hazards regression models, in which recurrence status can change during follow-up.34 Statistical analyses were conducted using R (version 4.1.0).14
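The following sketch makes the time-varying approach concrete. It splits one patient's follow-up into (start, stop] intervals in which recurrence status switches from 0 to 1 at the recurrence time. As a result, pre-recurrence survival time is not credited to the recurrence group. The function and field names are hypothetical. The resulting rows are in the start/stop format accepted by, for example, R's `survival::coxph(Surv(start, stop, event) ~ recurred)` or lifelines' `CoxTimeVaryingFitter`.

```python
def to_counting_process(patient_id, followup_years, died, recurrence_years=None):
    """Rows of (patient_id, start, stop, recurred, death_event) for a time-varying Cox model."""
    if recurrence_years is None or recurrence_years >= followup_years:
        # No recurrence before the end of follow-up: one unexposed interval.
        return [(patient_id, 0.0, followup_years, 0, int(died))]
    if recurrence_years <= 0:
        # Recurrence at baseline: the whole follow-up is exposed time.
        return [(patient_id, 0.0, followup_years, 1, int(died))]
    return [
        # Before recurrence: unexposed time, and no death can occur in this interval.
        (patient_id, 0.0, recurrence_years, 0, 0),
        # After recurrence: exposed time; a death, if any, is recorded here.
        (patient_id, recurrence_years, followup_years, 1, int(died)),
    ]


for row in to_counting_process("p1", 5.0, True, recurrence_years=2.0):
    print(row)
```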
Results
Participant Characteristics
Among the 954 stage I/II primary cutaneous melanomas, 155 melanomas recurred distantly, and 163 melanomas recurred locoregionally. The characteristics of the study population are described in Table 1 and eTable 3 (all variables). The comparison between distant and locoregional recurrences is presented in eTable 4.
Table 1.
Characteristics of the study population.
| Characteristic | Locoregional/Non-Recurrence (N=799) | Distant Recurrence (N=155) | P-value |
|---|---|---|---|
| Institution | |||
| DFCI | 235 (29.4%) | 53 (34.2%) | 0.275 |
| MGH | 564 (70.6%) | 102 (65.8%) | |
| Duration of follow-up (year) | |||
| Median [IQR] | 6.9 [2.8, 11.2] | 5.7 [3.3, 8.9] | <0.001 |
| Mortality status | |||
| Alive | 599 (75.0%) | 60 (38.7%) | <0.001 |
| Dead | 200 (25.0%) | 95 (61.3%) | |
| Age at diagnosis (year) | |||
| Median [IQR] | 62 [51, 72] | 63 [54, 72] | 0.105 |
| Sex | |||
| Female | 362 (45.3%) | 47 (30.3%) | <0.001 |
| Male | 437 (54.7%) | 108 (69.7%) | |
| Race | |||
| White | 789 (98.7%) | 154 (99.4%) | 0.813 |
| Unavailable/Other | 10 (1.3%) | 1 (0.6%) | |
| Ethnicity | |||
| Non-Hispanic | 786 (98.4%) | 154 (99.4%) | 0.572 |
| Unavailable/Other | 13 (1.6%) | 1 (0.6%) | |
| Histology type | |||
| Lentigo maligna melanoma | 55 (6.9%) | 14 (9.0%) | <0.001 |
| Nodular melanoma | 103 (12.9%) | 36 (23.2%) | |
| Superficial spreading melanoma | 497 (62.2%) | 62 (40.0%) | |
| Melanoma, not otherwise specified | 144 (18.0%) | 43 (27.7%) | |
| Tumor site | |||
| Skin of face | 89 (11.1%) | 24 (15.5%) | <0.001 |
| Skin of lower limb and hip | 167 (20.9%) | 18 (11.6%) | |
| Skin of scalp and neck | 49 (6.1%) | 35 (22.6%) | |
| Skin of trunk | 294 (36.8%) | 54 (34.8%) | |
| Skin of upper limb and shoulder | 200 (25.0%) | 24 (15.5%) | |
| AJCC stage | |||
| 1A | 429 (53.7%) | 19 (12.3%) | <0.001 |
| 1B | 213 (26.7%) | 45 (29.0%) | |
| 2A | 64 (8.0%) | 29 (18.7%) | |
| 2B | 55 (6.9%) | 30 (19.4%) | |
| 2C | 38 (4.8%) | 32 (20.6%) | |
| Breslow thickness (mm) | |||
| Median [IQR] | 0.7 [0.4, 1.4] | 2.1 [1.1, 4.2] | <0.001 |
| Anatomic (Clark’s) level | |||
| Median [IQR] | 4 [2, 4] | 4 [4, 4] | <0.001 |
| Laterality | |||
| Left | 363 (45.4%) | 73 (47.1%) | 0.113 |
| Midline | 79 (9.9%) | 23 (14.8%) | |
| Right | 357 (44.7%) | 59 (38.1%) | |
| Sentinel lymph node biopsy | |||
| Not indicated | 419 (52.4%) | 19 (12.3%) | <0.001 |
| All nodes negative | 312 (39.0%) | 102 (65.8%) | |
| Not performed: unknown reason | 17 (2.1%) | 5 (3.2%) | |
| Not performed: due to age/comorbidity | 41 (5.1%) | 27 (17.4%) | |
| Not performed: deferred by patient | 10 (1.3%) | 2 (1.3%) | |
| Ulceration | |||
| Absent | 701 (87.7%) | 101 (65.2%) | <0.001 |
| Present | 94 (11.8%) | 50 (32.3%) | |
| Unavailable | 4 (0.5%) | 4 (2.6%) | |
| Mitotic rate (mitoses/mm²) | |||
| Median [IQR] | 1 [0, 3] | 4 [1, 12] | <0.001 |
| Total surgical margins (cm) | |||
| Median [IQR] | 1 [1, 2] | 2 [1, 2] | <0.001 |
| Tumor infiltrating lymphocytes (TIL) | |||
| Present: brisk | 56 (7.0%) | 7 (4.5%) | 0.044 |
| Present: non-brisk | 400 (50.0%) | 98 (63.2%) | |
| Absent | 160 (20.0%) | 25 (16.1%) | |
| Unavailable | 183 (22.9%) | 25 (16.1%) | |
When the distant recurrence group was compared with the combined locoregional/non-recurrence group, there were no significant differences in age at diagnosis, race, ethnicity, marital status, CCS, history of previous melanoma (HOM), laterality, presence of regression, or perineural invasion. The proportion of males was higher in the distant recurrence group (70% versus 55%, p<0.001), and median household income was lower ($95,000 versus $101,000, p=0.04). A higher proportion of melanomas in the distant recurrence group were stage II (59% versus 20%, p<0.001). Figure 2.C presents the Kaplan-Meier curves of distant recurrence-free probability stratified by AJCC-8 stage in the study population.
Figure 2. Kaplan-Meier curves and the distribution of recurrent melanomas.
A. A Kaplan-Meier curve of overall survival comparing the distant recurrence group and the locoregional/non-recurrence group. The Hazard Ratio for overall survival was computed using a univariate time-varying Cox Proportional Hazards model.
B. A Kaplan-Meier curve of overall survival comparing the distant recurrence group and the locoregional recurrence group. The Hazard Ratio for overall survival was computed using a univariate time-varying Cox Proportional Hazards model.
C. A Kaplan-Meier curve for distant recurrence stratified by AJCC stage.
D. The number of distant recurrences within a period from diagnosis compared to the number of locoregional recurrences.
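The Kaplan-Meier curves in Figure 2 are product-limit estimates. A minimal sketch of the estimator follows. For a distant recurrence-free curve, the event flag would mark distant recurrence and all other subjects would be censored. That is an assumption mirroring the outcome definition in the Methods. The data below are toy values.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (event_time, survival_probability)."""
    data = sorted(zip(times, events))
    at_risk, survival, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        n_events = n_tied = 0
        # Group all subjects with the same observed time.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_tied += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_tied  # events and censorings at t leave the risk set after t
    return curve


# Years to distant recurrence (event=True) or censoring (event=False); toy data.
curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
print(curve)
```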
When the distant recurrence group was compared with the locoregional recurrence group (eTable 4), there were no significant differences in any tumor characteristic except tumor site, laterality, and sentinel lymph node biopsy status. The time from primary diagnosis to recurrence was longer for distant than for locoregional recurrences (2.6 versus 1.6 years, p=0.003). Figure 2.D presents a histogram of recurrence timing. A smaller proportion of distant than locoregional recurrences occurred within two years of diagnosis (40% versus 64%, p<0.001).
The median follow-up was 5.7 years (IQR: 3.3–8.9) for distant recurrences, 6.1 years (IQR: 3.2–10.8) for locoregional recurrences, and 7.0 years (IQR: 2.7–11.2) for non-recurrences (eTable 4). The distant recurrence group had the highest mortality rate (61.3%), compared with the locoregional recurrence (38.7%) and non-recurrence (21.5%) groups (p<0.001). The Kaplan-Meier curves for overall survival are presented in Figure 2.A and Figure 2.B. The hazard ratio of overall mortality for the distant recurrence group was 6.2 (95% CI: 4.8–8.0, p<0.001) compared with the combined locoregional/non-recurrence group and 5.8 (95% CI: 4.2–8.1, p<0.001) compared with the locoregional recurrence group.
Performance of Prediction Models
We built time-to-event models for distant recurrence risk prediction using ensemble, linear, and deep-learning algorithms. We evaluated model performance first using all extracted variables and then using the 27 selected variables (see below for details). The results are presented in Table 2.
Table 2.
Time-to-event distant recurrence versus locoregional/non-recurrence prediction.
| Variables | Metric | Validation | RSF (ensemble) | GBS (ensemble) | CoxPH (linear) | Coxnet (linear) | DeepSurv (deep learning) | CoxTime (deep learning) |
|---|---|---|---|---|---|---|---|---|
| All extracted variables | Concordance index a | Internal, mean (95% CI) | 0.836 (0.830–0.841) | **0.844 (0.839–0.850)** | 0.811 (0.801–0.822) | 0.805 (0.793–0.816) | 0.811 (0.799–0.822) | 0.820 (0.811–0.829) |
| | | External, mean (95% CI) | 0.798 (0.797–0.799) | **0.811 (0.811–0.812)** | 0.711 (0.710–0.712) | 0.710 (0.709–0.711) | 0.738 (0.728–0.748) | 0.749 (0.743–0.755) |
| | | P-value b | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 |
| | Time-dependent AUC a | Internal, mean (95% CI) | 0.867 (0.861–0.873) | **0.877 (0.869–0.885)** | 0.845 (0.833–0.857) | 0.846 (0.837–0.856) | 0.852 (0.839–0.865) | 0.860 (0.850–0.869) |
| | | External, mean (95% CI) | 0.811 (0.810–0.812) | **0.837 (0.836–0.838)** | 0.738 (0.737–0.739) | 0.737 (0.736–0.738) | 0.770 (0.760–0.780) | 0.779 (0.773–0.785) |
| | | P-value b | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 |
| | Integrated time-dependent Brier score a | Internal, mean (95% CI) | **0.097 (0.094–0.101)** | 0.109 (0.102–0.116) | 0.108 (0.101–0.115) | 0.109 (0.102–0.116) | 0.109 (0.103–0.115) | 0.108 (0.103–0.113) |
| | | External, mean (95% CI) | 0.107 (0.106–0.107) | **0.103 (0.103–0.104)** | 0.126 (0.126–0.127) | 0.127 (0.126–0.127) | 0.127 (0.125–0.129) | 0.126 (0.123–0.129) |
| | | P-value b | <0.001 | 0.087 | <0.001 | <0.001 | <0.001 | <0.001 |
| 27 selected variables | Concordance index a | External, mean (95% CI) | 0.810 (0.809–0.811) | **0.816 (0.815–0.816)** | 0.774 (0.773–0.774) | 0.777 (0.776–0.777) | 0.767 (0.756–0.778) | 0.772 (0.764–0.780) |
| | | P-value c | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 |
| | Time-dependent AUC a | External, mean (95% CI) | 0.822 (0.821–0.823) | **0.842 (0.842–0.843)** | 0.792 (0.791–0.793) | 0.796 (0.796–0.797) | 0.784 (0.772–0.795) | 0.786 (0.777–0.795) |
| | | P-value c | <0.001 | <0.001 | <0.001 | <0.001 | 0.079 | 0.197 |
| | Integrated time-dependent Brier score a | External, mean (95% CI) | 0.105 (0.105–0.106) | **0.103 (0.103–0.104)** | 0.112 (0.111–0.112) | 0.110 (0.108–0.112) | 0.126 (0.123–0.128) | 0.129 (0.125–0.133) |
| | | P-value c | <0.001 | 0.823 | <0.001 | <0.001 | 0.446 | 0.087 |

a The closer the concordance index and time-dependent AUC are to 1, the more accurate the model; the closer the Brier score is to 0, the more accurate the model.
b P-value: t-test comparing the internal and external validations.
c P-value: t-test comparing the external validations using all extracted variables and using the 27 selected variables.
The best performance in each row is shown in bold, separately for all extracted variables and for the 27 selected variables.
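Table 2 reports integrated time-dependent Brier scores. The sketch below computes the Brier score at a single time point using inverse probability of censoring weighting (IPCW), a common formulation. It is an illustration, not the study's implementation. Integrating this quantity over a time grid and dividing by the grid span gives an integrated score. scikit-survival provides `brier_score` and `integrated_brier_score` in `sksurv.metrics`. The function names here are hypothetical.

```python
def censoring_km(times, events):
    """Kaplan-Meier steps of the censoring survival G(t), treating censoring as the event."""
    data = sorted(zip(times, events))
    at_risk, g, steps, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        n_censored = n_tied = 0
        while i < len(data) and data[i][0] == t:
            n_censored += 0 if data[i][1] else 1
            n_tied += 1
            i += 1
        if n_censored:
            g *= 1 - n_censored / at_risk
            steps.append((t, g))
        at_risk -= n_tied
    return steps


def g_at(steps, s, left_limit=False):
    """G(s), or G(s-) when left_limit is True, from the step list."""
    value = 1.0
    for u, g in steps:
        if u < s or (u == s and not left_limit):
            value = g
        else:
            break
    return value


def brier_ipcw(times, events, surv_at_t, t):
    """IPCW Brier score at time t; surv_at_t[i] = predicted P(event-free at t) for subject i."""
    steps = censoring_km(times, events)
    total = 0.0
    for ti, ei, si in zip(times, events, surv_at_t):
        if ti <= t and ei:
            # Event observed by t: true status 0, weighted by 1 / G(ti-).
            total += si ** 2 / g_at(steps, ti, left_limit=True)
        elif ti > t:
            # Known to be event-free at t: true status 1, weighted by 1 / G(t).
            total += (1.0 - si) ** 2 / g_at(steps, t)
        # Censored at or before t: status unknown, weight 0.
    return total / len(times)


times = [1, 2, 3, 4]
events = [True, False, True, False]
print(brier_ipcw(times, events, [0.2, 0.5, 0.7, 0.9], t=2.5))
```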
With all extracted variables, the ensemble models (GBS and RSF) outperformed the linear and deep-learning models (p<0.001). In the internal validation, GBS and RSF performed similarly, with a small advantage for GBS (concordance index: 0.844 versus 0.836, p=0.026; time-dependent AUC: 0.877 versus 0.867, p=0.057). In the external validation, GBS outperformed RSF (concordance index: 0.811 versus 0.798, p<0.001; time-dependent AUC: 0.837 versus 0.811, p<0.001).
We selected features by ranking all extracted features based on permutation importance (eFigures 2-4) and conducting correlation analysis (eFigure 5). Breslow thickness and mitotic rate were the two most important features in the GBS and RSF models (eFigure 2). AJCC stage, mitotic rate, and vertical growth phase type (VGT) appeared to be important in the linear models (eFigure 3). Breslow thickness, health insurance type, and total surgical margins were the most important features in the deep-learning models (eFigure 4). Other important features included SLNB, ulceration, melanoma histology type, age at diagnosis, tumor site, first visit due to melanoma (FVM), HOM, patient sex, and laterality. We initially selected 30 one-hot encoded features and conducted a correlation analysis. “Stage:1A”, “SLNB: Not indicated”, and “Laterality: Left” were removed since they were highly correlated with the other variables (eFigure 5). The final 27 selected features are specified in eTable 2.
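Permutation importance, used above to rank features, measures how much a performance metric drops when one feature's values are shuffled. The sketch below assumes the concordance index as the metric and a generic `predict` callable that returns risk scores. The names and the toy model are hypothetical. scikit-learn's `sklearn.inspection.permutation_importance` provides a library implementation that can be applied to scikit-survival estimators, whose `score` method returns the concordance index.

```python
import random


def c_index(times, events, risks):
    """Harrell's concordance index (ties in risk count as 0.5)."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


def permutation_importance(predict, X, times, events, n_repeats=10, seed=0):
    """Mean drop in concordance index when each feature column is shuffled.

    predict: callable mapping a list of feature rows to risk scores (higher = riskier).
    """
    rng = random.Random(seed)
    baseline = c_index(times, events, predict(X))
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[col] for row in X]
            rng.shuffle(column)  # break the link between this feature and the outcome
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, column)]
            drops.append(baseline - c_index(times, events, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def predict(rows):
    """Toy risk model (hypothetical): uses column 0 only, ignores column 1."""
    return [1.0 * r[0] + 0.0 * r[1] for r in rows]


X = [[8 - i, (i * 3) % 5] for i in range(8)]  # column 0 decreases as time increases
times = list(range(1, 9))
events = [True] * 8
print(permutation_importance(predict, X, times, events))
```

A feature the model ignores gets an importance of exactly zero, while shuffling an informative feature lowers the concordance index.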
With the 27 selected features, all models achieved a higher concordance index in the external validation than with all extracted features (p<0.001). The ensemble and linear models also achieved a higher time-dependent AUC (p<0.001). The GBS model outperformed the other models in the external validation (concordance index: 0.816; time-dependent AUC: 0.842; integrated time-dependent Brier score: 0.103). The results of the internal validation are presented in eTable 5.
Figure 3 displays the importance ranking of the selected features in each model. Breslow thickness, mitotic rate, SLNB:Age.comorbidity (inability to perform SLNB due to the patient’s age or comorbidity), and Site:Scalp.neck were among the top five features in both ensemble models. Stage:2C, Stage:2B, Stage:2A, SLNB:Age.comorbidity, and Insurance:Self.Pay were the top five features in the linear models. Mitotic rate, Breslow thickness, Age at diagnosis, SLNB:Age.comorbidity, Site:Face, and Gender:Male played important roles in the deep-learning models.
Figure 3. Mean feature importance in the time-to-event distant recurrence versus locoregional/non-recurrence prediction on the MGH cohort.
First column (A and B): Ranking of the 27 selected features in the MGH cohort by the ensemble models (RSF and GBS, respectively).
Second column (C and D): Ranking of the 27 selected features in the MGH cohort by the linear models (CoxPH and Coxnet, respectively).
Third column (E and F): Ranking of the 27 selected features in the MGH cohort by the deep learning models (DeepSurv and CoxTime, respectively).
Secondary Analyses
We additionally evaluated model performance in predicting distant versus locoregional recurrence (eTable 6) and distant recurrence versus non-recurrence (eTable 7). None of the models achieved satisfactory performance in predicting distant versus locoregional recurrence; the best concordance index and time-dependent AUC in the external validation were below 0.7 (eTable 6). Models predicting distant recurrence versus non-recurrence (eTable 7) achieved a better concordance index and time-dependent AUC than models predicting distant recurrence versus locoregional/non-recurrence (Table 2 and eTable 5), especially in the internal validation (p<0.01 for all models). For competing risk modeling, the DeepHit model achieved a concordance index of 0.767 (95% CI: 0.760–0.773) and a Brier score of 0.149 (95% CI: 0.147–0.151) in the external validation.
Discussion
Overall, models trained to predict distant recurrence versus locoregional/non-recurrence had better discriminative performance than those trained to predict distant versus locoregional recurrence. While all models in the primary analysis showed satisfactory performance, ensemble models had greater discriminative power (external time-dependent AUC: 0.822–0.842) than linear and deep-learning models (external time-dependent AUC: 0.784–0.796). Because patients with distant recurrence have significantly worse survival, these models may help identify patients who would benefit most from adjuvant immunotherapy. The superior performance of the ensemble models likely reflects their aggregation of many diverse base learners and their ability to perform well with a relatively modest volume of data.
In our models, Breslow thickness, mitotic rate, tumor site, and the reason for SLNB deferral were ranked as the most important features, which is largely consistent with the literature.35,36 Similarly, our prior study found Breslow thickness and mitotic rate to be the most predictive features for overall primary melanoma recurrence.11 Furthermore, another recent study demonstrated that the binary classification power of stage-related features alone has limitations and may lead to missed recurrent cases.13 Our findings, in conjunction with prior studies, highlight that there are features beyond those included in AJCC staging criteria that can help predict recurrence with improved accuracy to inform clinical recommendations and adjuvant therapy consideration in patients with early-stage melanoma.
Compared with our prior work11, this study focuses on distant metastases and distinguishes between distant and locoregional recurrences, given the higher mortality risk and potential benefit from adjuvant immunotherapy for patients at risk of distant recurrence. In addition, we included a comprehensive set of time-to-event algorithms. This study underscores opportunities for more sophisticated risk-stratification tools that assess a patient’s risk of recurrence to inform patient selection for adjuvant therapy. By identifying those at the highest risk of distant recurrence, prediction models could help prevent overtreatment and maximize the benefit of adjuvant therapy. A recent trial showed that adjuvant treatment with pembrolizumab for up to approximately one year reduced the risk of distant metastases by two-fold in patients with stage IIB/IIC melanoma.37 Patients often describe the possibility of recurrence as a psychological and emotional burden, as rapid disease progression can occur after a melanoma recurrence.38 Management considerations, including prognosis, potential adverse events, treatment burden, and costs to the healthcare system and the individual, are complex.39,40 Although grade 3 or worse immune-related adverse events are relatively uncommon, some patients experience lifelong toxicities or toxicities that require long-term therapy, such as systemic corticosteroids or hormone replacement.37 As our study illustrates, time-to-event machine-learning algorithms show an improved ability to capture the complexities of melanoma recurrence, which are underrepresented in the current literature, and likely hold promise for future advances in precision medicine.39
Limitations of this study include its retrospective design, in which some variables were unavailable from the clinical records. Nonetheless, our machine-learning models performed satisfactorily in the external validation. In addition, our models did not include immunosuppression information and were developed using a relatively small cohort of patients from a single geographic region. Future studies should apply time-to-event algorithms to larger, more diverse cohorts with immunosuppression information to guide the development of clinically deployable prediction models. Furthermore, incorporating prognostic molecular markers may strengthen discriminatory power and advance personalized treatment.
Supplementary Material
Funding sources:
YRS is supported in part by the Department of Defense under Award Number W81XWH2110819, the National Institute of Arthritis and Musculoskeletal and Skin Diseases of the National Institutes of Health under Award Number K23AR080791, and the Melanoma Research Alliance Young Investigator Award.
Footnotes
Conflicts of Interest: YRS is an advisory board member/consultant and has received honoraria from Incyte Corporation, Castle Biosciences, Galderma, and Sanofi outside of the submitted work.
Reprint requests: Yevgeniy R. Semenov, MD, MA
IRB approval status: Reviewed and approved by Mass General Brigham Institutional Review Board (Protocol # 2020P002179).
Patient consent: Not applicable.
Supplemental Materials: https://data.mendeley.com/datasets/ft7w5xmbbv/1
Attachments: CONSORT checklist
Twitter handle: @EugeneSemenovMD
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Code Availability
The code is available at https://github.com/SemenovLab/Time2Event-MelanomaRecurrence.
Data Availability
The data generated for this study can only be shared per specific institutional review board (IRB) requirements. Upon request to the corresponding author, a data sharing agreement can be initiated following institution-specific guidelines.
References
1. Killock D. Pembrolizumab reduces recurrence risk in stage II melanoma. Nat Rev Clin Oncol. 2022;19(6):359. doi: 10.1038/s41571-022-00638-w
2. Bleicher J, Swords DS, Mali ME, et al. Recurrence patterns in patients with Stage II melanoma: The evolving role of routine imaging for surveillance. J Surg Oncol. 2020;122(8):1770–1777. doi: 10.1002/jso.26214
3. von Schuckmann LA, Hughes MCB, Ghiasvand R, et al. Risk of Melanoma Recurrence After Diagnosis of a High-Risk Primary Tumor. JAMA Dermatol. 2019;155(6):688–693. doi: 10.1001/jamadermatol.2019.0440
4. Palmieri DJ, Carlino MS. Immune Checkpoint Inhibitor Toxicity. Curr Oncol Rep. 2018;20(9):72. doi: 10.1007/s11912-018-0718-6
5. Nguyen N, Wan G, Ugwu-Dike P, et al. Influence of melanoma type on incidence and downstream implications of cutaneous immune-related adverse events in the setting of immune checkpoint inhibitor therapy. J Am Acad Dermatol. Published online February 22, 2023. doi: 10.1016/j.jaad.2023.02.014
6. Leung BW, Wan G, Nguyen N, et al. Increased risk of cutaneous immune-related adverse events in patients treated with talimogene laherparepvec and immune checkpoint inhibitors: A multi-hospital cohort study. J Am Acad Dermatol. Published online March 19, 2023. doi: 10.1016/j.jaad.2023.02.017
7. Leung B, Zhang S, Nguyen N, et al. 259 Tissue-specific homing in cutaneous immune-related adverse events. J Invest Dermatol. 2022;142(8). doi: 10.1016/j.jid.2022.05.266
8. Wan G, Leung B, DeSimone M, et al. 217 Time-to-event machine learning prediction of metastatic recurrence of localized melanoma. J Invest Dermatol. 2023;143(5):S37.
9. Soong SJ, Harrison RA, McCarthy WH, Urist MM, Balch CM. Factors affecting survival following local, regional, or distant recurrence from localized melanoma. J Surg Oncol. 1998;67(4):228–233.
10. Pasquali S, Hadjinicolaou AV, Chiarion Sileni V, Rossi CR, Mocellin S. Systemic treatments for metastatic cutaneous melanoma. Cochrane Database Syst Rev. 2018;2(2):CD011123. doi: 10.1002/14651858.CD011123.pub2
- 11.Wan G, Nguyen N, Liu F et al. Prediction of early-stage melanoma recurrence using clinical and histopathologic features. npj Precis. Onc. 6, 79 (2022). 10.1038/s41698-022-00321-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Matheson JA, Te Marvelde L, Mailer S, Speakman D, Spillane J, Henderson MA, & Gyorki DE. Prospective evaluation of prognostic indicators for early recurrence of cutaneous melanoma. Melanoma Research, 2017;27(1):43–49. [DOI] [PubMed] [Google Scholar]
- 13.Wan G, Leung B, Nguyen N, et al. The impact of stage-related features in melanoma recurrence prediction: A machine learning approach. JAAD Int. 2022;10:28–30. Published 2022 Aug 30. doi: 10.1016/j.jdin.2022.08.014 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.R: A language and environment for statistical computing. R Foundation for Statistical Computing; 2018. https://www.R-project.org/ [Google Scholar]
- 15.Gershenwald JE & Scolyer RA. Melanoma staging: American Joint Committee on Cancer (AJCC) 8th edition and beyond. Ann. Surg. Oncol. 2018;25:2105–2110. [DOI] [PubMed] [Google Scholar]
- 16.Nalichowski R, Keogh D, Chueh HC & Murphy SN. Calculating the benefits of a Research Patient Data Repository. AMIA Annu. Symp. Proc. 2006; 1044. [PMC free article] [PubMed] [Google Scholar]
- 17.The Enterprise Data Warehouse (EDW): Creating the Foundation for Effective Healthcare Improvement Analytics (Health Catalyst, 2015). [Google Scholar]
- 18.Bureau, U. S. C. Selected Income Characteristics, 2006–2020 American Community Survey 5-year Estimates (United States Census Bureau, 2020). [Google Scholar]
- 19.Roffman CE, Buchanan J & Allison GT. Charlson comorbidities index. J. Physiother. 2016; 62:171. [DOI] [PubMed] [Google Scholar]
- 20.Wong SL et al. Sentinel lymph node biopsy and management of regional lymph nodes in melanoma: American Society of Clinical Oncology and Society of Surgical Oncology Clinical Practice Guideline Update. J. Clin. Oncol. 2018; 36:399–413. [DOI] [PubMed] [Google Scholar]
- 21. Hothorn T, Bühlmann P, Dudoit S, Molinaro A, van der Laan MJ. Survival ensembles. Biostatistics. 2006 Jul;7(3):355–373. doi: 10.1093/biostatistics/kxj011
- 22. Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS. Random survival forests. Ann Appl Stat. 2008 Sep;2(3):841–860. doi: 10.1214/08-AOAS169
- 23. Simon N, Friedman J, Hastie T, Tibshirani R. Regularization paths for Cox's proportional hazards model via coordinate descent. J Stat Softw. 2011;39(5):1.
- 24. Kalbfleisch JD. The efficiency of Cox's likelihood function for censored data. Springer Series in Statistics:119–129. doi: 10.1007/978-0-387-75692-9_6
- 25. Katzman JL, Shaham U, Cloninger A, Bates J, Jiang T, Kluger Y. DeepSurv: Personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med Res Methodol. 2018;18(1). doi: 10.1186/s12874-018-0482-1
- 26. Kvamme H, Borgan Ø, Scheel I. Time-to-Event Prediction with Neural Networks and Cox Regression. J Mach Learn Res. 2019;20(129):1–30.
- 27. Lee C, Zame W, Yoon J, van der Schaar M. DeepHit: A Deep Learning Approach to Survival Analysis With Competing Risks. Proceedings of the AAAI Conference on Artificial Intelligence. 2018 Apr 26;32(1). doi: 10.1609/aaai.v32i1.11842
- 28. Antolini L, Boracchi P, Biganzoli E. A time-dependent discrimination index for survival data. Stat Med. 2005;24(24):3927–3944.
- 29. Lambert J, Chevret S. Summary measure of discrimination in survival models based on cumulative/dynamic time-dependent ROC curves. Stat Methods Med Res. 2016 Oct;25(5):2088–2102. doi: 10.1177/0962280213515571
- 30. Brier GW. Verification of forecasts expressed in terms of probability. Mon Weather Rev. 1950;78(1):1–3.
- 31. Pölsterl S. scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn. J Mach Learn Res. 2020;21(212):1–6.
- 32. Havakv. havakv/pycox: Survival analysis with PyTorch. GitHub. https://github.com/havakv/pycox. Accessed December 9, 2022.
- 33. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011;12:2825–2830.
- 34. Zhang Z, Reinikainen J, Adeleke KA, Pieterse ME, Groothuis-Oudshoorn CGM. Time-varying covariates and coefficients in Cox regression models. Ann Transl Med. 2018;6(7):121. doi: 10.21037/atm.2018.02.12
- 35. Ahmed I. Malignant melanoma: prognostic indicators. Mayo Clin Proc. 1997;72(4):356–361. doi: 10.4065/72.4.356
- 36. Azzola MF, Shaw HM, Thompson JF, et al. Tumor mitotic rate is a more powerful prognostic indicator than ulceration in patients with primary cutaneous melanoma: an analysis of 3661 patients from a single center. Cancer. 2003;97(6):1488–1498. doi: 10.1002/cncr.11196
- 37. Luke JJ, Rutkowski P, Queirolo P, et al. Pembrolizumab versus placebo as adjuvant therapy in completely resected stage IIB or IIC melanoma (KEYNOTE-716): a randomised, double-blind, phase 3 trial. Lancet. 2022;399(10336):1718–1729. doi: 10.1016/S0140-6736(22)00562-1
- 38. Livingstone A, Milne D, Dempsey K, et al. Should I Have Adjuvant Immunotherapy? An Interview Study Among Adults with Resected Stage 3 Melanoma and Their Partners. Patient. 2021;14(5):635–647. doi: 10.1007/s40271-021-00507-1
- 39. Leung B, et al. Clinical and histopathologic risk factors for early-stage melanoma recurrence. J Invest Dermatol. 2022;142:S113.
- 40. Leung BW, Collier MR, Tiu BC, Wan G, Nguyen N, Tang K, Zhang S, Chen W, Chen ST, LeBoeuf NR, Semenov YR. Patterns in utilization of health care services and medications among patients with cutaneous immune-related adverse events: A population-level cohort study. J Am Acad Dermatol. 2023 May;88(5):1215–1218. doi: 10.1016/j.jaad.2022.12.042. Epub 2023 Feb 9.
Data Availability Statement
The code is available at https://github.com/SemenovLab/Time2Event-MelanomaRecurrence.
The data generated for this study can be shared only in accordance with the relevant institutional review board (IRB) requirements. Upon request to the corresponding author, a data sharing agreement can be initiated following institution-specific guidelines.



