Abstract
Objective:
Develop a predictive model to identify patients with 1 pathologic LN (pLN) versus >1pLN using machine learning applied to gene expression profiles and clinical data as input variables.
Summary Background Data:
Standard management for clinically detected melanoma lymph node (cLN) metastases is complete therapeutic LN dissection (TLND). However, more than 40% of patients with cLN will only have 1pLN on final review. Recent data suggest that targeted excision of just the one enlarged LN may provide excellent regional control, with less morbidity than TLND. Selection of patients for less morbid surgery requires accurate identification of those with only 1pLN.
Methods:
The Cancer Genome Atlas (TCGA) database was used to identify patients who underwent TLND for melanoma. Pathology reports in TCGA were reviewed to identify the number of pLNs. Patients were included for machine learning analyses if RNA sequencing data were available from a pLN. After feature selection, the top 20 gene expression and clinical input features were used to train a ridge logistic regression (RLR) model to predict patients with 1pLN vs >1pLN using 10-fold cross validation on 80% of samples. The model was then tested on the remaining hold out samples.
Results:
A total of 153 patients met inclusion criteria: 64 with 1pLN (42%) and 89 with >1pLNs (58%). Feature selection identified 1 clinical (extranodal extension) and 19 gene expression variables used to predict patients with 1pLN versus >1pLN. The RLR model identified patient groups with an accuracy of 90% and an area under the ROC curve (AUC) of 0.97.
Conclusions:
Gene expression profiles together with clinical variables can distinguish melanoma metastasis patients with 1 pLN versus >1 pLN. Future models trained using PET/CT imaging, gene expression, and relevant clinical variables may further improve accuracy and may predict patients who can be managed with a targeted LN excision rather than a complete TLND.
Keywords: machine learning, RNA sequencing, melanoma, complete lymph node dissection
Mini-Abstract
In this retrospective study of 153 patients, 1 clinical and 19 gene expression variables were identified and used to predict patients with 1 pLN versus >1 pLN with an accuracy of 90% and a ROC AUC of 0.97. Distinguishing melanoma patients with 1 pLN versus >1 pLN may inform surgical decision making for who can safely be managed with a targeted LN excision rather than a complete TLND.
INTRODUCTION
The surgical management of melanoma lymph node (LN) metastases is evolving away from routine completion lymph node dissections (CLND)1–3, driven by efforts to reduce morbidity while preserving oncologic outcomes. Patients with clinically occult melanoma LN metastases found on sentinel LN biopsy (SLNB) now undergo nodal observation via ultrasound and adjuvant systemic therapy, as two randomized controlled trials found no overall survival (OS) benefit of immediate CLND compared to nodal observation1, 2. In contrast, complete therapeutic lymph node dissection (TLND) still remains the standard of care for patients with macroscopic, clinically detected (palpable or detected by imaging) melanoma LN metastases (cLNs)4, 5, 6. However, more than 40% of patients with a cLN will only have 1 pLN on final review5–7, and importantly, these patients are at lower risk of recurrence and have increased OS compared to those with >1 pLN7. Thus, one may question whether patients with only 1 cLN may achieve effective regional disease control with a targeted excision of just that one enlarged LN and whether that would result in less morbidity compared to a TLND.
To date, no randomized prospective trials have assessed the survival benefit of TLND compared to targeted LN excision among patients with 1 cLN. However, selective lymphadenectomy for melanoma is already performed in the head and neck8, 9. Additionally, patients with inguinal cLNs who undergo a complete groin dissection (superficial and deep inguinal LNs) have no survival benefit compared to those undergoing only a superficial inguinal lymphadenectomy10, 11. Thus, prognosis may depend more on the extent of LN involvement rather than the extent of the lymphadenectomy performed. Our recent work suggests that patients who underwent a targeted LN excision, most of whom had 1 cLN, had only a 5% risk of recurrence in the same nodal basin prior to systemic metastasis over a median follow up of 32 months (Lynch, KT et al. manuscript submitted). It may, therefore, be reasonable to consider treating selected patients with targeted LN excision. Selection of patients for less morbid surgery requires accurate identification of those with only 1 pLN. Pre-operative positron emission tomography/computed tomography (PET/CT) was able to identify 74.4% of patients with only 1 pLN7. Among those with 1 cLN on pre-operative imaging, clinical characteristics identified patients with less than a 10% risk of additional pLNs in a preliminary study7. Furthermore, gene expression profiling has been successfully used to assess the metastatic potential of melanomas12, 13 and can predict localized melanoma versus those with LN metastases with an accuracy of 89–94%13.
The goal of the present study was to assess whether gene expression profiles of pLN differ between patients with 1 pLN versus >1 pLN in order to develop predictive models to aid in personalized surgical decision making. We hypothesized that (1) gene expression profiles in patients with 1 pLN will reflect intratumoral immune activation or lower malignant potential that are otherwise not detectable with clinical data alone, and that (2) gene expression profiles can be used to predict patients with 1 pLN versus >1 pLN using a machine learning approach.
METHODS
Data Source and Patient Selection
The Cancer Genome Atlas (TCGA) Firehose Legacy dataset14, 15 was used to identify patients with melanoma LN metastases who underwent a TLND and had RNA sequencing (RNA-seq) of a tumor-involved node (TIN). Patients were included in the study if the total number of pLNs was available and RNA-seq data were available from a pLN. Patients were excluded if no pLNs were identified on TLND, if matted nodes were noted, or if visceral metastases were present (Figure 1).
Figure 1.
Identification of patients in The Cancer Genome Atlas (TCGA) database meeting inclusion criteria.
Data Acquisition
Demographic information extracted from the TCGA dataset included age and year of diagnosis, sex, race, ethnicity, Breslow depth, presence of ulceration, mitotic rate, BRAF V600E/K mutation status, primary melanoma site, pathologic stage at diagnosis (T-stage and overall stage), and location of the TLND. RNA-seq data for a tumor involved LN was obtained from the TCGA database. Deidentified pathology reports were manually reviewed to determine the number of pLNs for each patient and to corroborate that a TLND rather than a SLNB was performed. Reports with incomplete data were excluded. Patients were grouped into having only 1 or >1 pLN noted on TLND. The presence or absence of extranodal extension was also obtained from the pathology reports.
For patients in TCGA whose primary melanoma site was designated as “regional lymph nodes” and who had “no” designated for the variable known primary site, the primary melanoma site designation was labelled “unknown” for this analysis. For patients with an unknown primary who had “Tx” designated as their T-stage, the Tx was changed to T0 to reflect standard AJCC staging. This information was further corroborated with pathology reports. For those with missing T-stage, Breslow depth and ulceration status were used to determine T-stage per AJCC 8th edition. Two patients with unknown primaries who underwent TLND but were labeled as Stage IV disease in the TCGA database were reassigned as Stage III disease per AJCC guidelines for the analysis. The Stage IV listing for each of these patients is explained in the Supplemental materials.
The primary outcomes of this study were differential gene expression as well as accuracy of the best performing machine learning model at predicting patients with 1 pLN versus >1 pLN. Secondary outcomes included median OS for patients with 1 pLN versus >1 pLN.
Patient demographics along with primary and secondary outcomes were analyzed and summarized. Numerical data was summarized using interquartile range, minimum and maximum values, while categorical data was summarized using count and percentage. Statistical significance of numerical and categorical patient demographic data was determined using the Wilcoxon Rank Sum Test and Fisher’s Exact Test, respectively, using R.
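The categorical comparison described above can be illustrated with a minimal, dependency-free Python sketch of a two-sided Fisher's exact test on a 2×2 table. The study's tests were run in R, so this is an illustration rather than the code used. The function name is hypothetical, and the example counts are the extranodal extension figures quoted in the Discussion (17 of 64 patients with 1 pLN, 59 of 89 with >1 pLN); the supplemental tables may use different denominators.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_probability(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = table_probability(a)
    p_value = sum(
        p
        for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)  # small tolerance for float ties
    )
    return min(1.0, p_value)


# Rows: 1 pLN, >1 pLN. Columns: ENE present, ENE absent (counts from the Discussion).
ene_p_value = fisher_exact_two_sided(17, 47, 59, 30)
print(f"ENE, 1 pLN vs >1 pLN: p = {ene_p_value:.2e}")
```

With these counts the p-value falls well below 0.0001, in line with the significance level reported for extranodal extension.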
Statistical Analysis- Differential Gene Expression and Gene-Set Enrichment Analysis
Differential gene expression analysis (DGEA) and gene-set enrichment analysis (GSEA)16 were utilized to determine if the metastases of patients with only 1 pLN were distinct from those with >1 pLN. DGEA was conducted using DESeq217 in R. A 5% False-Discovery Rate (FDR) cutoff was applied to identify genes over- or under-expressed between those with 1 pLN versus >1 pLN. GSEA was performed using the normalized log2-fold changes derived from DESeq2 as input into the clusterProfiler package in R to identify significantly enriched pathways in the two groups at a 5% FDR.
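To make the 5% FDR cutoff concrete, the following Python sketch implements the Benjamini-Hochberg adjustment, which is the default multiple-testing correction in DESeq2. It assumes that default was used here, which the text does not state. The function name is hypothetical and the p-values are invented for illustration; they are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented raw p-values for five hypothetical genes.
example_p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
q_values = benjamini_hochberg(example_p_values)
significant = [q < 0.05 for q in q_values]
print(q_values, significant)
```

In this toy example only the first two genes pass the 5% FDR threshold, even though four of the five raw p-values are below 0.05.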
Statistical Analysis- Unsupervised and Supervised Machine learning for prediction of LN status
Unsupervised analyses were performed using principal components analysis (PCA) and hierarchical clustering to assess the extent to which gene expression profiles can separate patients with 1 pLN versus >1 pLNs. Hierarchical clustering was conducted on the top 500 most variable genes using the hclust function with Euclidean distance and “complete” clustering method after variance stabilizing transformation (VST)18, 19 of the count matrix.
Feature selection was performed on the initial dataset which consists of 54,155 genes and 6 clinical features. The clinical features with less than 15% missing data across patients included: age at diagnosis, sex, primary tumor site, TLND site, T-stage, and presence of extranodal extension. Feature selection involved the following steps:
1. Univariate logistic regression was fit to each feature individually using the scikit-learn LogisticRegression class with the lbfgs solver. The 200 features with the highest F1 scores were carried forward to the next step.
2. Correlation analysis was performed on the 200 features using a Python correlation package with the Pearson method, and clustering analysis was then performed using the fcluster function of the SciPy package with the distance criterion. Features with a correlation of less than 0.2 with the binary target vector, which indicates whether a patient had 1 pLN (0) or >1 pLN (1), were removed. Among highly co-correlated features (correlation > 0.65), only the feature with the highest correlation with the target was kept, yielding 105 features.
3. Sequential feature selection (SFS) was applied using the scikit-learn RidgeClassifier with forward selection and 10-fold cross-validation, resulting in 20 features.
Finally, these 20 features were included in the final model built by ridge logistic regression using 10-fold cross-validation. The final model was trained on 80% (N = 122) of the 153 patient samples using the R package glmnet and assessed on 20% (N = 31) hold out test data.
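The correlation-based filtering step above can be sketched in plain Python as follows. This is not the authors' pipeline: it replaces the fcluster-based grouping with a simpler greedy pass that visits features in order of decreasing target correlation and drops any feature that is too strongly correlated with one already kept. It also assumes the 0.2 and 0.65 thresholds apply to absolute Pearson correlations. The function names and the toy expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def correlation_filter(features, target, min_target_r=0.2, max_pair_r=0.65):
    """Keep features related to the target but not redundant with each other."""
    target_r = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    # Drop features weakly related to the target, strongest first.
    candidates = sorted(
        (name for name, r in target_r.items() if r >= min_target_r),
        key=lambda name: target_r[name],
        reverse=True,
    )
    kept = []
    for name in candidates:
        # Keep a feature only if it is not highly correlated with a kept one.
        if all(abs(pearson(features[name], features[k])) <= max_pair_r for k in kept):
            kept.append(name)
    return kept


# Toy data: 0 = 1 pLN, 1 = >1 pLN (target coding taken from the Methods).
toy_target = [0, 0, 0, 0, 1, 1, 1, 1]
toy_features = {
    "gene_a": [1.0, 1.2, 0.9, 1.1, 2.0, 2.2, 1.9, 2.1],
    "gene_a_copy": [1.0, 1.5, 0.8, 1.3, 1.8, 2.4, 1.6, 2.2],  # redundant with gene_a
    "gene_b": [2, 1, 2, 1, 3, 2, 3, 2],
    "gene_noise": [1, 2, 1, 2, 1, 2, 1, 2],  # unrelated to the target
}
selected = correlation_filter(toy_features, toy_target)
print(selected)
```

In the toy data, `gene_noise` is removed for lack of association with the target and `gene_a_copy` is removed as redundant with `gene_a`, leaving `gene_a` and `gene_b`.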
Statistical Analysis- Survival Curves
Kaplan-Meier curves were generated to estimate median OS with corresponding lower and upper 95% confidence intervals (CIs). A log-rank test was used to compare OS between patients with 1 pLN versus >1 pLNs and determine statistical significance. OS was defined as the time, in months, from the date of diagnosis to either death or last follow-up. Kaplan-Meier curves stratified by 1 pLN versus >1 pLN were also generated for each gene found to be predictive in our final model.
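For readers unfamiliar with the estimator, the following Python sketch shows how a Kaplan-Meier curve and its median survival time are computed from follow-up times and event indicators. It is a minimal illustration under stated assumptions, not the analysis code: the function names are hypothetical, confidence intervals and the log-rank test are omitted, and the follow-up data are invented.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up in months; events: 1 = death, 0 = censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


def median_survival(curve):
    """First time at which survival drops to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Invented follow-up data (months) for five hypothetical patients.
toy_curve = kaplan_meier([1, 2, 3, 4, 5], [1, 1, 0, 1, 1])
print(toy_curve, median_survival(toy_curve))
```

A return value of `None` corresponds to a median or confidence limit that is not reached within the available follow-up, which standard software reports as NA.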
RESULTS
Among the 479 melanoma patients initially identified, we excluded 108 samples identified as primary melanomas and 64 identified as being in visceral sites. Of the remaining 307 patients, 154 samples were found to represent metastases to one or more LNs that were treated with TLND. As such, a total of 154 patients with melanoma LN metastases who underwent a TLND and had RNA-seq of a tumor-involved node from the TCGA database were included: 64 patients (42%) with 1 pLN and 90 patients (58%) with >1 pLNs (Figure 1). These 154 patient samples were used for the DGEA and GSEA. One patient in the >1 pLN group did not have complete clinical variables available and was therefore not used for the machine learning analysis. Among the remaining 89 patients with >1 pLN, 15 (10%) had 2 pLNs, 15 (10%) had 3 pLNs, and 59 (39%) had ≥4 pLNs (Figure 2A).
Figure 2.
(A) Distribution of the number of patients with varied numbers of positive lymph nodes. (B) Kaplan-Meier curves for patients with 1 pLN (n=64) versus >1 pLN (n=89). The median OS (lower, upper 95% CI) was 167.8 months (112.5, NA) among patients with 1 pLN compared to 59.4 months (43.4, 118) among those with >1 pLN (p<0.0001). Data were truncated at 180 months since data after that time represented less than 5 patients per group.
Clinicopathologic characteristics of the patient population included in the subsequent analyses are shown in Supplemental Table 1. Breslow depth (median 1.4 for 1 pLN, 2.2 for >1pLN, p=0.033) and extranodal extension (27% vs 63%, p<0.0001) were the only demographic variables significantly different for patients with 1 pLN versus >1 pLN. The majority of melanomas were located on the extremities, with the axilla being the most common site of TLND. Six patients had preoperative systemic therapy, 5 (5.6%) in the >1pLN group and 1 (1.6%) in the 1pLN group. There was no difference in the incidence of preoperative systemic therapy between the groups (p = 0.4, Supplemental Table 2). Multivariable logistic regression was performed to identify clinicopathologic variables independently associated with >1 pLN. Extranodal extension was the only clinicopathologic feature associated with >1pLN with an odds ratio of 2.84 (95%CI 1.90–4.45, Supplemental table 3).
Kaplan-Meier curves demonstrate a significant difference in median OS between patients with 1 pLN versus >1 pLN (Log-Rank p<0.0001, Figure 2B). The median OS (lower, upper 95% CI) was 167.8 months (112.5, NA) among patients with 1 pLN compared to 59.4 months (43.4, 118) among those with >1 pLN.
Tumor Biology and LN Status
DGEA revealed 420 differentially expressed genes between patients with 1 pLN versus >1 pLN (Figure 3A). GSEA revealed 22 pathways that were significantly altered between the two groups (Figure 3B). Significantly downregulated pathways in patients with only 1 pLN compared to >1 pLN included cell cycle activation, oxidative phosphorylation, hypoxia, DNA repair, KRAS signaling, and epithelial to mesenchymal transition (EMT), among others. Significantly upregulated pathways in the group with only 1 pLN included interferon alpha signaling and coagulation. There were trends toward increased interferon-gamma signaling, allograft rejection and other immune related pathways also in those with 1 pLN.
Figure 3.
(A) Differential gene expression analysis showing 420 genes differentially expressed between patients with 1 pLN (red dots) versus >1 pLN (green dots) using DESeq2 and a 5% FDR cutoff. (B) Gene-set enrichment analysis showing 22 pathways significantly altered in patients with 1 pLN compared to those with >1 pLN. Negative and positive enrichment scores reflect downregulated and upregulated pathways in those with 1 pLN compared to >1 pLN, respectively.
Unsupervised & Supervised Machine learning for prediction of LN status
Neither PCA (Supplemental Figure 1A) nor hierarchical clustering (Supplemental Figure 1B) reliably separated patients with 1 pLN from those with >1 pLN based on gene expression profiles. Among the predictive models explored (deep neural network20, random forest21, and ridge and lasso logistic regression22, 23), the ridge logistic regression model was by far the best performing at predicting patients with either 1 pLN or >1 pLN, with an accuracy of 90% and an area under the ROC curve (AUC) of 0.97 on the test data (Figure 4A/B). The sensitivity and specificity were 91.7% and 89.5%, respectively (Figure 4B). The precision, recall, and F1 score (the harmonic mean of precision and recall) were 84.6%, 91.7%, and 88%, respectively. Sequential forward floating selection identified 20 variables, 1 clinical variable (extranodal extension) and 19 gene expression variables (Table 1), which were used to identify patients with 1 pLN versus >1 pLN. A separate model that excluded extranodal extension and included only the 19 genes performed with an accuracy of 87%. Differences in expression of the 19 genes between the two groups are shown in Figure 4C.
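The reported performance metrics can be cross-checked with a short Python sketch. The confusion-matrix counts are not stated in the text (they appear only in Figure 4B), so the values below are an assumption: they are the only whole-number counts we could find that sum to the 31 hold-out samples and reproduce the reported percentages. Which group is treated as the positive class is likewise not restated in the text. The function name is hypothetical.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)  # also called recall
    precision = tp / (tp + fp)    # also called positive predictive value
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "npv": tn / (tn + fn),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Assumed counts reconstructed from the reported percentages (N = 31).
holdout = classification_metrics(tp=11, fn=1, fp=2, tn=17)
for name, value in holdout.items():
    print(f"{name}: {value:.1%}")
```

Under these assumed counts, sensitivity and recall are the same quantity (11/12), and the positive and negative predictive values match the 85% and 94% quoted in the Limitations.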
Figure 4.
(A) The Receiver Operating Characteristic (ROC) curve for the ridge logistic regression model used on hold out test data showing an area under the curve (AUC) of 0.97 to predict patients with 1 pLN versus >1 pLN. (B) Ridge Logistic Regression Model Confusion Matrix on Hold Out Test Data. (C) Box Plot for 19 Genes in Final Machine Learning Model.
Table 1.
Final Predictive Supervised Machine Learning Model
| Variable Type | Ensembl ID | Gene Name | Gene Type | Chromosome | Model Coefficient |
|---|---|---|---|---|---|
| Gene Expression | ENSG00000177692 | DNAJC28 | Protein coding | 21 | −0.11028974 |
| Gene Expression | ENSG00000166787 | SAA3P | Transcribed unprocessed pseudogene | 11 | −0.12911111 |
| Gene Expression | ENSG00000164611 | PTTG1 | Protein coding | 5 | 0.04933153 |
| Gene Expression | ENSG00000206053 | HN1L | Protein coding | 16 | 0.14938187 |
| Gene Expression | ENSG00000119335 | SET | Protein coding | 9 | 0.06186232 |
| Gene Expression | ENSG00000064115 | TM7SF3 | Protein coding | 12 | −0.14504659 |
| Gene Expression | ENSG00000144559 | TAMM41 | Protein coding | 3 | 0.05935011 |
| Gene Expression | ENSG00000139985 | ADAM21 | Protein coding | 14 | −0.11602226 |
| Gene Expression | ENSG00000138772 | ANXA3 | Protein coding | 4 | −0.07444409 |
| Gene Expression | ENSG00000267695 | RP11-1030E3.1 | LincRNA | 18 | −0.10673755 |
| Gene Expression | ENSG00000265888 | DSCAS | Antisense | 18 | −0.13317391 |
| Gene Expression | ENSG00000265625 | RP11-68I3.11 | Sense intronic | 17 | 0.10957796 |
| Gene Expression | ENSG00000264260 | RP11-94B19.1 | LincRNA | 18 | −0.08360553 |
| Gene Expression | ENSG00000259584 | RP11-521C20.2 | LincRNA | 15 | −0.14609903 |
| Gene Expression | ENSG00000258823 | CTD-2555K7.3 | Unprocessed pseudogene | 14 | −0.08813580 |
| Gene Expression | ENSG00000258354 | MIR3180-1 | LincRNA | 16 | −0.16000250 |
| Gene Expression | ENSG00000279673 | RP11-185E8.2 | TEC | 3 | −0.12467341 |
| Gene Expression | ENSG00000279294 | RP11-274A11.3 | TEC | 16 | −0.12232569 |
| Gene Expression | ENSG00000254761 | RP11-672A2.1 | LincRNA | 11 | −0.09926241 |
| Clinical Variable: Extranodal Extension | N/A | N/A | N/A | N/A | 0.28308154 |
| Intercept | N/A | N/A | N/A | N/A | 0.34560304 |
Abbreviations: N/A, not applicable; LincRNA, long intergenic non-coding RNA; TEC, to be experimentally confirmed
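To show how the coefficients in Table 1 combine into a prediction, the following Python sketch evaluates the logistic function on a linear score built from the table values. Several points are assumptions rather than statements from the paper. Inputs are assumed to be preprocessed exactly as in model training, and the expression scale is not described in the text. The output is interpreted as the probability of >1 pLN, following the target coding in the Methods (1 pLN = 0, >1 pLN = 1). The function name and the "ENE" key are hypothetical. This is an illustration of the arithmetic, not a clinical tool.

```python
from math import exp

# Coefficients copied from Table 1, keyed by Ensembl ID (gene name in comment).
INTERCEPT = 0.34560304
COEFFICIENTS = {
    "ENSG00000177692": -0.11028974,  # DNAJC28
    "ENSG00000166787": -0.12911111,  # SAA3P
    "ENSG00000164611": 0.04933153,   # PTTG1
    "ENSG00000206053": 0.14938187,   # HN1L
    "ENSG00000119335": 0.06186232,   # SET
    "ENSG00000064115": -0.14504659,  # TM7SF3
    "ENSG00000144559": 0.05935011,   # TAMM41
    "ENSG00000139985": -0.11602226,  # ADAM21
    "ENSG00000138772": -0.07444409,  # ANXA3
    "ENSG00000267695": -0.10673755,  # RP11-1030E3.1
    "ENSG00000265888": -0.13317391,  # DSCAS
    "ENSG00000265625": 0.10957796,   # RP11-68I3.11
    "ENSG00000264260": -0.08360553,  # RP11-94B19.1
    "ENSG00000259584": -0.14609903,  # RP11-521C20.2
    "ENSG00000258823": -0.08813580,  # CTD-2555K7.3
    "ENSG00000258354": -0.16000250,  # MIR3180-1
    "ENSG00000279673": -0.12467341,  # RP11-185E8.2
    "ENSG00000279294": -0.12232569,  # RP11-274A11.3
    "ENSG00000254761": -0.09926241,  # RP11-672A2.1
    "ENE": 0.28308154,               # extranodal extension (1 = present)
}


def predicted_probability(features):
    """Logistic probability for a dict of feature name -> preprocessed value.

    Features omitted from the dict contribute 0 to the linear score;
    unknown feature names raise KeyError.
    """
    score = INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + exp(-score))


# Effect of extranodal extension with all other inputs held at 0.
baseline = predicted_probability({})
with_ene = predicted_probability({"ENE": 1})
print(f"baseline: {baseline:.3f}, with ENE: {with_ene:.3f}")
```

Because the extranodal extension coefficient is positive, its presence raises the predicted probability, which is consistent with its association with >1 pLN reported in the Results.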
DISCUSSION
These data support our hypotheses that (1) gene expression profiles in patients with 1 pLN will reflect intratumoral immune activation or lower malignant potential that are otherwise not detectable with clinical data alone, and that (2) gene expression profiles can be used to predict patients with 1 pLN versus >1 pLN using a machine learning approach. Given the limited sample size and that we constrained all models to avoid overfitting, it is not surprising that a simple ridge logistic regression model outperformed neural network and random forest models. To our knowledge, this is the first study utilizing RNA-seq data with machine learning to predict melanoma patients with only 1 pLN versus >1 pLN in an attempt to inform surgical decision making. These findings may be useful for preoperative assessment of patients with cLN to identify those who may be safely managed with only excision of the palpable LN. Until the availability of effective systemic therapies, aggressive resection of regional nodes was considered the only hope for cure, but as systemic adjuvant therapy has dramatically improved patient outcomes, and neoadjuvant therapy further improves outcomes, it is critical for surgeons to reconsider the extent of surgery needed, especially for the 40% of patients with disease confined to one LN. The present study presents an approach to identify those patients with high precision, while also identifying gene expression profiles in tumor-involved nodes that may illuminate factors associated with melanoma progression.
Genes that DGEA identified as more highly expressed in patients with only 1 pLN were often associated with enhanced immune function (e.g., RPH3A; Rabphilin 3A)24 and decreased tumor invasiveness (e.g., CDH18; Cadherin 18 gene)25. In contrast, genes more highly expressed among patients with >1 pLN were often associated with enhanced cell proliferation, cancer cell survival, and anti-inflammatory functions (e.g., ELFN2; Extracellular Leucine Rich Repeat And Fibronectin Type III Domain Containing-2)26, and with pro-metastatic potential and overall poor survival (e.g., HIF3A; Hypoxia Inducible Factor 3 Subunit Alpha)27. Our supervised machine learning algorithm identified a set of 19 genes: 8 were protein coding, 2 were pseudogenes, 1 was antisense, 1 was sense intronic, and 5 were long intergenic non-coding RNA (lncRNA) (Table 1). The remaining two genes were novel and their functions remain unknown.
Among these 19 genes (Figure 4C), upregulated expression of PTTG1 (Pituitary tumor transforming gene-1, or securin) has been implicated in the tumorigenesis and disease progression of many solid tumors28, 29, including melanoma30, 31. PTTG1 is essential in regulating sister chromatid separation during mitosis32 and induces mitogenic and angiogenic genes c-Myc32, VEGF and bFGF33. PTTG1 is included in gene signatures associated with metastasis and shorter survival in several tumor types34 and associated with the metastatic phenotype in melanoma35. Inhibition of PTTG1 expression impairs proliferation and invasiveness among melanoma cell lines resistant to dabrafenib31. Mechanisms underlying the growth- and invasion-promoting activity of PTTG1 include EMT36, DNA repair, and E2F pathways, and it is relevant that these pathways were upregulated in the GSEA of patients with >1 pLN in our analysis.
Our 19-gene set also included HN1L (Haematological and neurological expressed 1‐like) which also activates E2F pathways and is associated with tumorigenesis, metastasis, and overall poor prognosis in breast cancer via enhancing MYC activity37. Our algorithm also identified ANXA3 (Annexin A3), which is upregulated by HIF1A (Hypoxia Inducible Factor 1-Alpha) and associated with tumor progression, metastasis, and poor prognosis in colon cancer38, breast cancer39, and hepatocellular carcinoma40. From our DGEA (Figure 3A) and GSEA (Figure 3B), those with >1 pLN had significantly upregulated HIF3A and hypoxia pathway genes, respectively.
The enrichment for lncRNAs (26% (5/19)) in the list of 19 genes (Table 1) is intriguing. Dysregulated expression of lncRNAs has been implicated in the development of numerous cancers including melanoma 41, 42, 43, potentially through their role in regulating cell proliferation, apoptosis, invasion, and/or differentiation. Our findings support a potentially important role for lncRNAs in the development and/or progression of melanoma. As such, further investigation into the pathophysiological role of the 5 lncRNAs identified herein may aid in better understanding melanoma progression.
We also identified one clinical variable from feature selection: extranodal extension (ENE). ENE was present in 17 patients (27%) with 1 pLN and 59 patients (66%) with >1 pLN. ENE is associated with higher rates of regional recurrence, distant metastasis, and worse OS and is one factor currently used to guide adjuvant radiation therapy44. ENE is usually not known until after LN biopsy. Thus, a targeted LN excision could be done first and if ENE is found, this clinical factor can be used when considering CLND. Alternatively, a model using only gene expression, which we show is 87% accurate and would only require a core biopsy, could be used. There was a significant difference in Breslow depth between patients with 1 pLN versus >1 pLN, but this clinical variable was not used in the machine learning algorithm because 39 patients (25%) had a missing value, which would have greatly limited the power of our model. However, Breslow depth may be used to enhance future models if all samples have such data.
Neoadjuvant therapy is promising for patients with cLN and is being studied in various settings and combinations. A single dose of PD-1 Ab can induce pathologic complete response (pCR) or near-CR in about 30% of patients45, and combined therapy with PD-1 and CTLA-4 antibodies can induce pCR in even higher proportions. The PRADO trial therefore offers to excise only the index (largest) lymph node after two doses of ipilimumab and nivolumab, and to avoid other surgical management if no viable tumor remains in that node 46–48. Among participants with metastatic disease to regional nodes, a pCR was observed in 49%, a pathologic near-CR in 12%48, 49. Thus, a pCR or near pCR was identified in 61%, but residual viable melanoma was identified in 50%. Interestingly, 58% of the patients on the PRADO trial had only 1cLN based on PET-CT scan, and pathologic response rates were similar for 1 cLN and >1 cLN49. These findings offer a promising approach to reduce the need for TLND in patients who are eligible for IPI/NIVO neoadjuvant therapy and who have a major pathologic response. However, for those without a pCR or near pCR, TLND was still recommended. Recently, an initial report of the S1801 clinical trial showed that neoadjuvant pembrolizumab (then surgery, followed by adjuvant pembrolizumab) significantly prolonged event-free survival compared to surgery followed by adjuvant pembrolizumab50. In that trial, the pCR rate was 21%; so, 79% still had viable tumor, and the study design for that trial was that all patients with stage III melanoma were to be treated with TLND. Thus, the extent of surgery after neoadjuvant therapy can be reconsidered for this majority of patients.
We propose that if patients can be reliably identified as having melanoma in only 1 node, excision of that node should be sufficient surgical management, regardless of response to neoadjuvant therapy. Importantly, neoadjuvant therapy prolongs event-free survival over adjuvant therapy; and it is now important to reconsider whether all these patients still require TLND. Especially if those with only 1 pLN can be determined accurately at presentation, we posit that those patients can be spared unnecessary resection of all nodes. Also, neoadjuvant checkpoint blockade therapy is not an ideal option for patients with autoimmune disease, immune suppression, or those who undergo an excisional biopsy of the one enlarged node before knowing the diagnosis. Further, its effectiveness may be altered in patients who have recurred in regional nodes after receiving PD-1 antibody therapy in the adjuvant setting for stage II melanoma. Thus, we propose that accurate identification of patients with 1 pLN may support clinical trials of surgical management regardless of whether the patient undergoes, or responds to, neoadjuvant therapy.
Limitations
This study has several limitations. The data were retrospective, and clinical data were limited from TCGA. The relatively small sample size (N=153) for the predictive modeling is a limitation, especially when considering the 20% hold out test sample size (N=31). In the hold-out test sample, the NPV was 94%, and the PPV was 85%, but we acknowledge that the precision of those estimates is limited by the sample size. As such, a validation study is recommended. Nonetheless, the findings are novel and offer a chance to use molecular data to personalize surgical management. Since we have recently shown that PET/CT imaging can accurately identify patients with 1 pLN in 74% of patients7, a subsequent validation study combining gene expression and clinical variables with PET/CT may further enhance accuracy above the 90% reported in the current study. The TCGA data do not provide access to imaging data, which is a limitation, but we would anticipate that imaging data would likely enhance the accuracy of predictive modeling presented here. Overall, these data provide rationale for future prospective studies that build on the approach we have defined and include PET/CT imaging. Ideally such work will lead to assessments of clinical outcomes for patients with 1 cLN who are treated with a targeted LN excision, integrated with best available neoadjuvant therapy.
CONCLUSION
Molecular profiling is increasingly being used to inform clinical decisions for cancer therapy. Our supervised machine learning analysis reveals that gene expression profiles together with clinical variables can be used to distinguish melanoma metastasis patients with 1 pLN versus those with >1 pLN. DGEA revealed that tumors from patients with 1 pLN had lower expression of genes important in tumor progression and higher expression of genes with immune function. Future models trained using PET/CT imaging, gene expression, and relevant clinical variables may further improve accuracy and may predict patients who can safely be managed with a targeted LN excision rather than a complete TLND.
Supplementary Material
ACKNOWLEDGEMENTS
Data from this project were obtained from The Cancer Genome Atlas (cbioportal.org).
Funding Source:
Financial support was provided by the United States Public Health Services Training Grants T32CA163177 (M.O.M., R.D.V) and T32HL007849 (K.T.L.), and P30 CA044579 (Bioinformatics Core), and the Rebecca Clary Harris Memorial Fellowship from the University of Virginia (K.T.L.), and philanthropy from Hackney Family Charitable Foundation.
Conflicts of Interest:
Craig L. Slingluff, Jr. has the following disclosures, none of which are felt to represent conflicts of interests regarding the present manuscript: Research support to the University of Virginia from Celldex (funding, drug), Glaxo-Smith Kline (funding), Merck (funding, drug), 3M (drug), Theraclion (device staff support); Funding to the University of Virginia from Polynoma for PI role on the MAVIS Clinical Trial; Funding to the University of Virginia for roles on Scientific Advisory Boards for Immatics and CureVac. Also, Craig L. Slingluff, Jr. receives licensing fee payments through the UVA Licensing and Ventures Group for patents for peptides used in cancer vaccines. Stefan Bekiranov consults with Glaxo-Smith Kline on quantum computing and quantum machine learning which does not represent a conflict of interest regarding the present manuscript.