Abstract
Background
Whether patients with T1–T2 papillary thyroid cancer (PTC) and clinically negative lymph nodes (cN0) should routinely undergo prophylactic central lymph node dissection (pCLND) remains controversial. Several inflammatory indices, including the neutrophil-to-lymphocyte ratio (NLR), platelet-to-lymphocyte ratio (PLR), monocyte-to-lymphocyte ratio (MLR), and systemic immune-inflammatory index (SII), have been studied in PTC. However, the association between the systemic inflammation response index (SIRI) and the risk of central lymph node metastasis (CLNM) remains unclear.
Methods
This retrospective study included 1,394 individuals with cN0T1–T2 PTC, who were randomly allocated into training (70%) and testing (30%) subgroups. Preoperative inflammatory indices and ultrasound (US) features were used to train the models. Least absolute shrinkage and selection operator (LASSO) regression and multivariate logistic regression were used to identify predictive factors and to construct nomograms. Eight interpretable models based on machine learning (ML) algorithms were then constructed: decision tree (DT), K-nearest neighbor (KNN), support vector machine (SVM), artificial neural network (ANN), random forest (RF), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and categorical boosting (CatBoost). Model performance was evaluated using the area under the precision-recall curve (auPR) and the area under the receiver operating characteristic curve (auROC), as well as other conventional metrics. The interpretability of the optimal model was illustrated via the Shapley additive explanations (SHAP) approach.
Results
Younger age, larger tumor size, capsular invasion, location (lower and isthmus), unclear margin, microcalcifications, color Doppler flow imaging (CDFI) blood flow, and higher SIRI (≥0.77) were independent positive predictors of CLNM, whereas female sex and Hashimoto thyroiditis were independent negative predictors, and nomograms were subsequently constructed. Taking into account both the auROC and auPR, the RF algorithm showed the best performance, and superiority to XGBoost, CatBoost and ANN. In addition, the role of key variables was visualized in the SHAP plot.
Conclusions
An interpretable ML model based on the SIRI and US features can be used to predict CLNM in individuals with cN0T1–T2 PTC.
Keywords: Central lymph node metastasis (CLNM), papillary thyroid cancer (PTC), systemic inflammation response index (SIRI), machine learning (ML), Shapley additive explanations (SHAP)
Highlight box.
Key findings
• The systemic inflammation response index (SIRI) has now been identified as a risk predictor for central lymph node metastasis (CLNM).
• We constructed and verified eight machine learning models based on SIRI and ultrasound features to evaluate CLNM risk in patients with cN0T1–T2 papillary thyroid cancer (PTC); the random forest model performed the best followed by extreme gradient boosting, categorical boosting, and artificial neural network.
• The interpretability of the models was illustrated via the SHapley Additive exPlanations approach.
What is known and what is new?
• Several researchers have discovered that monocyte-to-lymphocyte ratio, platelet-to-lymphocyte ratio, neutrophil-to-lymphocyte ratio, and systemic immune-inflammatory index are predictive factors for CLNM and lateral lymph node metastasis (LLNM).
• A pioneering exploration of the predictive value of SIRI in CLNM was carried out.
What is the implication, and what should change now?
• Preoperative prediction of CLNM may benefit patients with cN0 T1–T2 PTC.
Introduction
Papillary thyroid cancer (PTC) is the most common histologic subtype of thyroid tumors (1,2). Although PTC is usually an indolent tumor, individuals with lymph node metastasis (LNM), including central lymph node metastasis (CLNM) and lateral lymph node metastasis (LLNM), have a greater likelihood of disease persistence, recurrence, and reoperation (3-7). Meanwhile, prophylactic central lymph node dissection (pCLND) is associated with an elevated incidence of postoperative complications (8,9). The 2015 American Thyroid Association (ATA) guidelines do not advise pCLND for individuals in the cN0T1–T2 subgroup, whereas guidelines in the majority of East Asian nations favor pCLND provided that the parathyroid glands and recurrent laryngeal nerve are adequately protected (10-12). Consequently, it is essential to identify potential indicators of CLNM and to establish a reliable preoperative prediction model.
Studies have revealed that the inflammatory index, including the systemic immune-inflammatory index (SII), neutrophil-to-lymphocyte ratio (NLR), platelet-to-lymphocyte ratio (PLR) and monocyte-to-lymphocyte ratio (MLR), can serve as crucial indicators regarding the malignant biological behavior that occurs in several cancers (13-17). Recently, a new blood inflammatory index, the systemic inflammation response index (SIRI), has been utilized for forecasting the prognosis of breast cancer, nasopharyngeal carcinoma, cervical cancer, pancreatic cancer, and colorectal cancer (18-22). However, the relationship between preoperative SIRI levels in peripheral blood and the risk of CLNM in PTC remains unclear.
Machine learning (ML) is a collection of algorithms that learn patterns from data and use them to make predictions. It has been extensively utilized to investigate a wide range of illnesses (23). Common ML algorithms include the decision tree (DT), random forest (RF), artificial neural network (ANN), support vector machine (SVM), and K-nearest neighbors (KNN) algorithms (24). Categorical boosting (CatBoost), light gradient boosting machine (LightGBM), and extreme gradient boosting (XGBoost) are the most recognized boosting frameworks. They stand out among alternative boosting algorithms because they combine weak classifiers in a way that minimizes the loss function. However, most ML algorithms behave as black boxes (25). The Shapley additive explanations (SHAP) approach has been proposed to address this lack of interpretability (26).
Hence, using preoperative inflammatory indices and ultrasound (US) features, we aimed to establish and validate eight interpretable ML models to assess the likelihood of CLNM in individuals with cN0T1–T2 PTC. We present this article in accordance with the TRIPOD reporting checklist (available at https://gs.amegroups.com/article/view/10.21037/gs-23-349/rc).
Methods
Study population
A retrospective analysis was carried out on individuals with cN0T1–T2 PTC between January 2020 and December 2021. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the Ethics Committee of Third Xiangya Hospital, Central South University (No. quick23472), and individual consent for this retrospective analysis was waived.
Data collection
The inclusion criteria were as follows: (I) T1–T2 tumor (≤40 mm); (II) cN0 (no suspicious lymph nodes on preoperative evaluation); (III) newly diagnosed PTC. The exclusion criteria were as follows: (I) cervical irradiation during childhood; (II) a previous diagnosis of head and neck cancer or any other tumor (n=36, including secondary metastases to the lungs, primary tumors in other sites, etc.); (III) preoperative infection or other inflammatory disease, except Hashimoto thyroiditis (n=16, including rheumatoid arthritis, Sjogren syndrome, etc.); (IV) multiple organ dysfunction such as heart failure, liver failure, or uremia (n=9); (V) primary or secondary illnesses causing abnormalities of the blood system (n=6, including aplastic anemia, etc.); (VI) incomplete medical records (n=58). In total, 125 individuals were excluded; the remaining 1,394 individuals were used to establish and assess the models and were randomly allocated into training (70%) and testing (30%) subsets.
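The random 70/30 allocation can be illustrated with a short sketch. This is not the authors' code: the function name `split_train_test`, the seed, and the use of sequential integers as patient identifiers are assumptions made for illustration, and a plain shuffle is used because the text does not state whether the split was stratified.

```python
import random


def split_train_test(ids, train_fraction=0.7, seed=2023):
    """Shuffle identifiers reproducibly and cut them into two subsets."""
    rng = random.Random(seed)  # the seed is arbitrary (assumption)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers standing in for the 1,394 included patients.
patient_ids = range(1, 1395)
train_ids, test_ids = split_train_test(patient_ids)
print(len(train_ids), len(test_ids))
```

With 1,394 patients, a 70% fraction rounds to 976 training and 418 testing cases, which matches the subset sizes reported in Table 2.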
Surgical strategy
Individuals diagnosed with PTC routinely undergo pCLND in our department. Thyroid lobectomy was accompanied by ipsilateral pCLND, and total thyroidectomy by bilateral pCLND.
Inflammatory indices and US features
The 20 features were as follows: gender, age, tumor size, capsular invasion, laterality, multifocality, location, echogenicity, solid composition (almost 100% solid component), unclear margin, shape, aspect ratio, microcalcifications, color Doppler flow imaging (CDFI) blood flow, Hashimoto thyroiditis, and five inflammatory indices (MLR, PLR, NLR, SII, and SIRI). Each inflammatory index was dichotomized into high and low subgroups at the optimal cutoff value of its receiver operating characteristic (ROC) curve for the presence of lymph node metastasis. The resulting cutoff values were as follows: NLR (low <1.83, high ≥1.83), PLR (low <146.58, high ≥146.58), MLR (low <0.27, high ≥0.27), SII (low <395.22, high ≥395.22), and SIRI (low <0.77, high ≥0.77).
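As a sketch of how these indices could be derived from a routine blood count and then dichotomized, the snippet below uses the definitions commonly given in the literature (NLR = neutrophils/lymphocytes, PLR = platelets/lymphocytes, MLR = monocytes/lymphocytes, SII = platelets × neutrophils/lymphocytes, SIRI = neutrophils × monocytes/lymphocytes). The text above does not spell out these formulas, so they are stated here as an assumption, as are the function names and the example blood counts; only the cutoff values come from the study.

```python
# Cutoffs reported in the text; a value at or above the cutoff is "high".
CUTOFFS = {"NLR": 1.83, "PLR": 146.58, "MLR": 0.27, "SII": 395.22, "SIRI": 0.77}


def inflammatory_indices(neutrophils, lymphocytes, monocytes, platelets):
    """Compute the five indices from cell counts given in 10^9/L.

    The formulas follow the usual literature definitions (assumption).
    """
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return {
        "NLR": neutrophils / lymphocytes,
        "PLR": platelets / lymphocytes,
        "MLR": monocytes / lymphocytes,
        "SII": platelets * neutrophils / lymphocytes,
        "SIRI": neutrophils * monocytes / lymphocytes,
    }


def dichotomize(indices):
    """Label each index as 'high' or 'low' using the study cutoffs."""
    return {
        name: "high" if value >= CUTOFFS[name] else "low"
        for name, value in indices.items()
    }


# Hypothetical blood count, for illustration only.
example = inflammatory_indices(
    neutrophils=4.0, lymphocytes=2.0, monocytes=0.5, platelets=250.0
)
groups = dichotomize(example)
print(example["SIRI"], groups["SIRI"])
```

For this hypothetical patient the SIRI is 1.0, which falls in the high subgroup (≥0.77).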
Construction of the nomogram
We used 10-fold cross-validation to select the optimal penalty parameter for least absolute shrinkage and selection operator (LASSO) regression, which performs dimensionality reduction, and retained the features with nonzero coefficients. Subsequently, a nomogram was drawn based on the results of the multivariate logistic regression analysis. The nomogram was evaluated with the ROC curve, calibration curve, and decision curve analysis (DCA), which are widely accepted tools for this purpose.
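For reference, the L1-penalized logistic regression that LASSO solves for a binary outcome can be written as below. This is the standard textbook formulation rather than a formula given by the authors; the symbols $n$ (number of training patients), $x_i$ (feature vector), $y_i \in \{0,1\}$ (CLNM status), and $\lambda$ (penalty parameter chosen by cross-validation) are introduced here for illustration.

```latex
\[
(\hat{\beta}_0, \hat{\beta}) =
\arg\min_{\beta_0,\,\beta}
\left\{
  -\frac{1}{n}\sum_{i=1}^{n}
  \Bigl[\, y_i\,\eta_i - \log\bigl(1 + e^{\eta_i}\bigr) \Bigr]
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\right\},
\qquad
\eta_i = \beta_0 + x_i^{\top}\beta .
\]
```

Larger values of $\lambda$ shrink more coefficients exactly to zero, which is why the features with nonzero coefficients at the selected $\lambda$ can be treated as the retained feature set.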
Development, evaluation, and visual interpretation of ML models
Eight supervised ML algorithms were used: DT, KNN, SVM, ANN, RF, CatBoost, LightGBM, and XGBoost. Fivefold cross-validation was used to reduce overfitting, and hyperparameters were tuned through repeated testing to obtain the optimal model parameters. The sensitivity, specificity, area under the ROC curve (auROC), accuracy, precision, recall, area under the precision-recall curve (auPR), F1-score, and Matthews correlation coefficient (MCC) of the ML algorithms were calculated. We used the confusion matrix for visual illustration and DCA to assess clinical usefulness. SHAP, which is grounded in cooperative game theory, was used to visualize and rank the contribution of each feature.
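Decision curve analysis compares models by their net benefit at a chosen threshold probability. The sketch below uses the standard definition, net benefit = TP/n − (FP/n) × pt/(1 − pt), and compares a model against the "treat all" strategy. The function names and all counts are hypothetical and are not taken from the study.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit of a model at a given threshold probability."""
    weight = threshold / (1 - threshold)  # harm-to-benefit exchange rate
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(n_positive, n, threshold):
    """Net benefit of treating everyone (all positives are true positives,
    all negatives are false positives)."""
    return net_benefit(tp=n_positive, fp=n - n_positive, n=n, threshold=threshold)


# Hypothetical cohort: 200 patients, 90 with CLNM; at a 50% threshold
# the model flags 60 true positives and 20 false positives.
model_nb = net_benefit(tp=60, fp=20, n=200, threshold=0.5)
treat_all_nb = net_benefit_treat_all(n_positive=90, n=200, threshold=0.5)
print(round(model_nb, 3), round(treat_all_nb, 3))
```

A model is considered clinically useful over the range of thresholds where its net benefit exceeds both the "treat all" and "treat none" (net benefit of zero) strategies.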
Statistical analysis
Analyses were performed in R (version 4.3.0) and Python (version 3.10.9, Anaconda 3). The following packages were used to implement the algorithms: “pROC”, “caret”, “glmnet”, “rms”, “ggDCA”, “ggplot2”, “tidymodels”, “fastshap”, “bonsai”, “treesnip”, and “reticulate”.
Results
Patient characteristics
Table 1 shows the baseline characteristics of the CLNM(+) and CLNM(−) groups. There were no significant differences between the training and testing subsets (P>0.05) (Table 2).
Table 1. Baseline characteristics of the whole cohort grouped by lymph node status.
Variables | Total (n=1,394) | CLNM(−) (n=718) | CLNM(+) (n=676) | P | χ2 |
---|---|---|---|---|---|
Gender | <0.001 | 37.345 | |||
Male | 330 (23.7) | 121 (16.9) | 209 (30.9) | ||
Female | – | – | – | ||
Age (years) | <0.001 | 107.798 | |||
>55 | 192 (13.8) | 139 (19.4) | 53 (7.8) | ||
40–55 | 596 (42.8) | 359 (50.0) | 237 (35.1) | ||
<40 | 606 (43.5) | 220 (30.6) | 386 (57.1) | ||
Tumor size (mm) | <0.001 | 90.250 | |||
<10 | 1,027 (73.7) | 606 (84.4) | 421 (62.3) | ||
10–20 | 299 (21.4) | 97 (13.5) | 202 (29.9) | ||
21–40 | 68 (4.9) | 15 (2.1) | 53 (7.8) | ||
Capsular invasion | <0.001 | 40.393 | |||
No | 1,269 (91.0) | 688 (95.8) | 581 (85.9) | ||
Yes | – | – | – | ||
Laterality | <0.001 | 18.010 | |||
Unilateral | 1,095 (78.6) | 597 (83.1) | 498 (73.7) | ||
Bilateral | – | – | – | ||
Multifocality | <0.001 | 15.323 | |||
Solitary tumor | 894 (64.1) | 496 (69.1) | 398 (58.9) | ||
Multifocal tumor | – | – | – | ||
Location | <0.001 | 71.795 | |||
Upper | 257 (18.4) | 170 (23.7) | 87 (12.9) | ||
Middle | 631 (45.3) | 360 (50.1) | 271 (40.1) | ||
Lower | 373 (26.8) | 141 (19.6) | 232 (34.3) | ||
Isthmus | 133 (9.5) | 47 (6.5) | 86 (12.7) | ||
Echogenicity | 0.006 | 12.584 | |||
Hyper or isoechoic | 22 (1.6) | 11 (1.5) | 11 (1.6) | ||
Mixed-echoic | 64 (4.6) | 28 (3.9) | 36 (5.3) | ||
Hypo-echoic | 1,277 (91.6) | 672 (93.6) | 605 (89.5) | ||
Very hypo-echoic | 31 (2.2) | 7 (1.0) | 24 (3.6) | ||
Solid composition | <0.001 | 20.749 | |||
No | 939 (67.4) | 524 (73.0) | 415 (61.4) | ||
Yes | – | – | – | ||
Unclear margin | <0.001 | 37.997 | |||
No | 399 (28.6) | 258 (35.9) | 141 (20.9) | ||
Yes | – | – | – | ||
Shape | <0.001 | 19.172 | |||
Regular | 932 (66.9) | 519 (72.3) | 413 (61.1) | ||
Irregular or lobulated | – | – | – | ||
Aspect ratio | 0.003 | 8.692 | |||
A/T <1 | 757 (54.3) | 362 (50.4) | 395 (58.4) | ||
A/T ≥1 | – | – | – | ||
Microcalcifications | <0.001 | 95.395 | |||
No | 458 (32.9) | 322 (44.8) | 136 (20.1) | ||
Yes | – | – | – | ||
CDFI blood flow | <0.001 | 120.559 | |||
No | 1,072 (76.9) | 639 (89.0) | 433 (64.1) | ||
Yes | – | – | – | ||
Hashimoto thyroiditis | <0.001 | 28.898 | |||
No | 924 (66.3) | 428 (59.6) | 496 (73.4) | ||
Yes | – | – | – | ||
NLR | <0.001 | 14.022 | |||
Low (<1.83) | 617 (44.3) | 353 (49.2) | 264 (39.1) | ||
High (≥1.83) | – | – | – | ||
PLR | 0.116 | 2.472 | |||
Low (<146.58) | 875 (62.8) | 436 (60.7) | 439 (64.9) | ||
High (≥146.58) | – | – | – | ||
MLR | 0.007 | 7.396 | |||
Low (<0.27) | 1,123 (80.6) | 599 (83.4) | 524 (77.5) | ||
High (≥0.27) | – | – | – | ||
SII | <0.001 | 13.736 | |||
Low (<395.22) | 528 (37.9) | 306 (42.6) | 222 (32.8) | ||
High (≥395.22) | – | – | – | ||
SIRI | <0.001 | 17.874 | |||
Low (<0.77) | 791 (56.7) | 447 (62.3) | 344 (50.9) | ||
High (≥0.77) | – | – | – |
Data are presented as N (%). CLNM, central lymph node metastasis; A/T, aspect ratio (height divided by width on transverse views); CDFI, color Doppler flow imaging; NLR, neutrophil-to-lymphocyte ratio; PLR, platelet-to-lymphocyte ratio; MLR, monocyte-to-lymphocyte ratio; SII, systemic immune-inflammatory index; SIRI, systemic inflammation response index.
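The χ2 values for the binary variables in Table 1 appear consistent with a Yates continuity-corrected chi-square test, although the text does not name the test. The sketch below checks this for gender. The female counts (597 and 467) are not printed in the table and are obtained here by subtracting the male counts from the column totals, which is an assumption about the missing row; the function name is also illustrative.

```python
def chi2_yates_2x2(a, b, c, d):
    """Yates-corrected chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Continuity correction: shrink |ad - bc| by n/2, never below zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    return n * corrected**2 / (row1 * row2 * col1 * col2)


# Rows: male, female; columns: CLNM(-), CLNM(+).
male_neg, male_pos = 121, 209            # from Table 1
female_neg, female_pos = 718 - 121, 676 - 209  # derived by subtraction (assumption)

chi2_gender = chi2_yates_2x2(male_neg, male_pos, female_neg, female_pos)
print(round(chi2_gender, 3))
```

Under these assumptions the statistic agrees with the value of 37.345 reported for gender.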
Table 2. Baseline characteristics between the training and testing sets.
Variables | Total (n=1,394) | Training set (n=976) | Testing set (n=418) | P | χ2 |
---|---|---|---|---|---|
CLNM | 0.834 | 0.044 | |||
Negative group | 718 (51.5) | 505 (51.7) | 213 (51.0) | ||
Positive group | – | – | – | ||
Gender | 0.147 | 2.104 | |||
Male | 330 (23.7) | 220 (22.5) | 110 (26.3) | ||
Female | – | – | – | ||
Age (years) | 0.408 | 1.794 | |||
>55 | 192 (13.8) | 138 (14.1) | 54 (12.9) | ||
40–55 | 596 (42.8) | 425 (43.5) | 171 (40.9) | ||
<40 | 606 (43.5) | 413 (42.3) | 193 (46.2) | ||
Tumor size (mm) | 0.963 | 0.075 | |||
<10 | 1,027 (73.7) | 718 (73.6) | 309 (73.9) | ||
10–20 | 299 (21.4) | 211 (21.6) | 88 (21.1) | ||
21–40 | 68 (4.9) | 47 (4.8) | 21 (5.0) | ||
Capsular invasion | 0.542 | 0.372 | |||
No | 1,269 (91.0) | 885 (90.7) | 384 (91.9) | ||
Yes | – | – | – | ||
Laterality | 0.554 | 0.35 | |||
Unilateral | 1,095 (78.6) | 762 (78.1) | 333 (79.7) | ||
Bilateral | – | – | – | ||
Multifocality | 0.433 | 0.614 | |||
Solitary tumor | 894 (64.1) | 619 (63.4) | 275 (65.8) | ||
Multifocal tumor | – | – | – | ||
Location | 0.177 | 4.936 | |||
Upper | 257 (18.4) | 183 (18.8) | 74 (17.7) | ||
Middle | 631 (45.3) | 448 (45.9) | 183 (43.8) | ||
Lower | 373 (26.8) | 263 (26.9) | 110 (26.3) | ||
Isthmus | 133 (9.5) | 82 (8.4) | 51 (12.2) | ||
Echogenicity | 0.763 | 1.159 | |||
Hyper or isoechoic | 22 (1.6) | 15 (1.5) | 7 (1.7) | ||
Mixed-echoic | 64 (4.6) | 42 (4.3) | 22 (5.3) | ||
Hypo-echoic | 1,277 (91.6) | 899 (92.1) | 378 (90.4) | ||
Very hypo-echoic | 31 (2.2) | 20 (2.0) | 11 (2.6) | ||
Solid composition | >0.99 | 0 | |||
No | 939 (67.4) | 657 (67.3) | 282 (67.5) | ||
Yes | – | – | – | ||
Unclear margin | 0.592 | 0.287 | |||
No | 399 (28.6) | 284 (29.1) | 115 (27.5) | ||
Yes | – | – | – | ||
Shape | 0.135 | 2.233 | |||
Regular | 932 (66.9) | 640 (65.6) | 292 (69.9) | ||
Irregular or lobulated | – | – | – | ||
Aspect ratio | 0.77 | 0.085 | |||
A/T <1 | 757 (54.3) | 533 (54.6) | 224 (53.6) | ||
A/T ≥1 | – | – | – | ||
Microcalcifications | 0.52 | 0.413 | |||
No | 458 (32.9) | 315 (32.3) | 143 (34.2) | ||
Yes | – | – | – | ||
CDFI blood flow | 0.401 | 0.705 | |||
No | 1,072 (76.9) | 744 (76.2) | 328 (78.5) | ||
Yes | – | – | – | ||
Hashimoto thyroiditis | 0.846 | 0.038 | |||
No | 924 (66.3) | 649 (66.5) | 275 (65.8) | ||
Yes | – | – | – | ||
NLR | 0.518 | 0.417 | |||
Low (<1.83) | 617 (44.3) | 426 (43.6) | 191 (45.7) | ||
High (≥1.83) | – | – | – | ||
PLR | 0.459 | 0.549 | |||
Low (<146.58) | 875 (62.8) | 606 (62.1) | 269 (64.4) | ||
High (≥146.58) | – | – | – | ||
MLR | 0.112 | 2.527 | |||
Low (<0.27) | 1,123 (80.6) | 775 (79.4) | 348 (83.3) | ||
High (≥0.27) | – | – | – | ||
SII | 0.387 | 0.748 | |||
Low (<395.22) | 528 (37.9) | 362 (37.1) | 166 (39.7) | ||
High (≥395.22) | – | – | – | ||
SIRI | 0.182 | 1.782 | |||
Low (<0.77) | 791 (56.7) | 542 (55.5) | 249 (59.6) | ||
High (≥0.77) | – | – | – |
Data are presented as N (%). CLNM, central lymph node metastasis; A/T, aspect ratio (height divided by width on transverse views); CDFI, color Doppler flow imaging; NLR, neutrophil-to-lymphocyte ratio; PLR, platelet-to-lymphocyte ratio; MLR, monocyte-to-lymphocyte ratio; SII, systemic immune-inflammatory index; SIRI, systemic inflammation response index.
LASSO regression feature selection in the training set
LASSO regression yielded 15 features with nonzero coefficients: gender, age, tumor size, capsular invasion, laterality, multifocality, location, solid composition, unclear margin, microcalcifications, CDFI blood flow, Hashimoto thyroiditis, NLR, MLR, and SIRI (Figure 1).
Construction and validation of the nomogram
Using the 15 features with nonzero coefficients, the multivariate analysis revealed that younger age, larger tumor size, capsular invasion, location (lower and isthmus), unclear margin, microcalcifications, CDFI blood flow, and higher SIRI (≥0.77) were independent positive predictors of CLNM, while female sex and Hashimoto thyroiditis were independent negative predictors (Table 3). Next, a nomogram was drawn on the basis of the multivariate analysis in the training cohort (Figure 2). The ROC curve demonstrated good discrimination, with AUCs of 0.834 and 0.803 in the training and testing cohorts, respectively (Figure 3A,3B). The calibration curves showed good agreement between predicted and observed risk, with mean absolute errors of 0.017 and 0.015 in the two cohorts, respectively (Figure 3C,3D). The DCA showed broad clinical utility when the threshold probability was between approximately 20% and 90% (Figure 3E,3F).
Table 3. Multivariate analysis.
Variables | OR | 95% CI | P |
---|---|---|---|
Gender | |||
Male | Reference | ||
Female | 0.511 | 0.347–0.752 | 0.001 |
Age (years) | |||
>55 | Reference | ||
40–55 | 1.772 | 1.073–2.926 | 0.025 |
<40 | 4.794 | 2.877–7.988 | <0.001 |
Tumor size (mm) | |||
<10 | Reference | ||
10–20 | 3.194 | 2.126–4.799 | <0.001 |
21–40 | 6.675 | 2.754–16.178 | <0.001 |
Capsular invasion | |||
No | Reference | ||
Yes | 2.862 | 1.598–5.126 | <0.001 |
Laterality | |||
Unilateral | Reference | ||
Bilateral | 1.276 | 0.748–2.177 | 0.371 |
Multifocality | |||
Solitary tumor | Reference | ||
Multifocal tumor | 1.296 | 0.825–2.037 | 0.261 |
Location | |||
Upper | Reference | ||
Middle | 1.077 | 0.7–1.658 | 0.735 |
Lower | 2.288 | 1.429–3.663 | 0.001 |
Isthmus | 3.373 | 1.738–6.545 | <0.001 |
Solid composition | |||
No | Reference | ||
Yes | 1.369 | 0.974–1.923 | 0.07 |
Unclear margin | |||
No | Reference | ||
Yes | 2.43 | 1.697–3.479 | <0.001 |
Microcalcifications | |||
No | Reference | ||
Yes | 1.851 | 1.311–2.613 | <0.001 |
CDFI blood flow | |||
No | Reference | ||
Yes | 3.47 | 2.349–5.125 | <0.001 |
Hashimoto thyroiditis | |||
No | Reference | ||
Yes | 0.409 | 0.29–0.577 | <0.001 |
NLR | |||
Low (<1.83) | Reference | ||
High (≥1.83) | 1.337 | 0.917–1.95 | 0.131 |
MLR | |||
Low (<0.27) | Reference | ||
High (≥0.27) | 1.393 | 0.894–2.169 | 0.142 |
SIRI | |||
Low (<0.77) | Reference | ||
High (≥0.77) | 1.578 | 1.046–2.379 | 0.03 |
OR, odds ratio; CI, confidence interval; CDFI, color Doppler flow imaging; NLR, neutrophil-to-lymphocyte ratio; MLR, monocyte-to-lymphocyte ratio; SIRI, systemic inflammation response index.
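The odds ratios in Table 3 relate to the underlying logistic regression coefficients through OR = exp(β). The sketch below recovers the coefficient, an approximate standard error, and a Wald P value for SIRI from the reported OR and 95% CI. It assumes a Wald-type interval with z = 1.96, which the text does not confirm, and the function name is hypothetical.

```python
import math


def coefficient_from_or(odds_ratio, ci_low, ci_high, z=1.96):
    """Back-calculate beta, its standard error, and a two-sided Wald P value.

    Assumes the CI was computed as exp(beta +/- z * SE) (assumption).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    wald_z = beta / se
    p_value = math.erfc(abs(wald_z) / math.sqrt(2))  # two-sided normal tail
    return beta, se, p_value


# SIRI (high vs. low) from Table 3: OR 1.578, 95% CI 1.046-2.379.
beta_siri, se_siri, p_siri = coefficient_from_or(1.578, 1.046, 2.379)
print(round(beta_siri, 3), round(se_siri, 3), round(p_siri, 3))

# In a model without interaction terms, odds ratios multiply: a patient with
# both high SIRI and CDFI blood flow (OR 3.47) has about 1.578 * 3.47 times
# the odds of an otherwise identical patient with neither.
combined_or = 1.578 * 3.47
```

The back-calculated P value rounds to the 0.03 reported for SIRI, which suggests the Wald assumption is reasonable for this row.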
Development and evaluation of ML models
Using the 15 candidate features selected by LASSO regression, eight ML prediction models for CLNM were developed. The RF model performed best, with an auROC of 0.8177 and an auPR of 0.8029, followed by XGBoost (0.8130 and 0.7987), CatBoost (0.8130 and 0.7985), and ANN (0.8105 and 0.7990). Detailed information on sensitivity, specificity, accuracy, precision, recall, F1-score, and MCC is summarized in Table 4 and Figure S1. The DCA plot indicated that RF had better clinical applicability (Figure S1C).
Table 4. Model capabilities of the eight ML algorithms in the testing set.
Model | ROC_AUC | PR_AUC | Sensitivity | Specificity | Accuracy | Precision | Recall | F1_score | MCC |
---|---|---|---|---|---|---|---|---|---|
DT | 0.7478 | 0.7168 | 0.6293 | 0.7465 | 0.6890 | 0.7049 | 0.6293 | 0.6649 | 0.3786 |
CatBoost | 0.8130 | 0.7985 | 0.6537 | 0.8263 | 0.7416 | 0.7836 | 0.6537 | 0.7128 | 0.4880 |
KNN | 0.7501 | 0.7311 | 0.7463* | 0.6150 | 0.6794 | 0.6511 | 0.7463* | 0.6955 | 0.3641 |
LightGBM | 0.8057 | 0.7937 | 0.7415 | 0.7371 | 0.7392 | 0.7308 | 0.7415 | 0.7361 | 0.4785 |
RF | 0.8177* | 0.8029* | 0.6732 | 0.7934 | 0.7344 | 0.7582 | 0.6732 | 0.7132 | 0.4705 |
XGBoost | 0.8130 | 0.7987 | 0.6439 | 0.8310* | 0.7392 | 0.7857* | 0.6439 | 0.7078 | 0.4842 |
SVM | 0.8088 | 0.7940 | 0.6927 | 0.7840 | 0.7392 | 0.7553 | 0.6927 | 0.7226 | 0.4791 |
ANN | 0.8105 | 0.7990 | 0.7268 | 0.7887 | 0.7584* | 0.7680 | 0.7268 | 0.7469* | 0.5168* |
*, the maximum value of the column. ML, machine learning; ROC, receiver operating characteristic; AUC, area under curve; PR, precision-recall; MCC, Matthews correlation coefficient; DT, decision tree; CatBoost, categorical boosting; KNN, K-nearest neighbors; LightGBM, light gradient boosting machine; RF, random forest; XGBoost, extreme gradient boosting; SVM, support vector machine; ANN, artificial neural network.
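The threshold-dependent metrics in Table 4 all follow from a single confusion matrix. As a worked check for the RF row, the sketch below uses counts reconstructed from the reported sensitivity and specificity together with the testing-set class sizes in Table 2 (213 CLNM-negative, hence 205 CLNM-positive out of 418). These counts (TP=138, FN=67, TN=169, FP=44) are an inference, not values stated in the text; the actual matrix is shown in Figure S2.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # identical to recall
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }


# Reconstructed RF confusion matrix for the testing set (assumption).
rf_metrics = binary_metrics(tp=138, fn=67, tn=169, fp=44)
for name, value in rf_metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, every metric agrees with the RF row of Table 4 to four decimal places, which also shows why sensitivity and recall are identical columns in the table.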
The RF model performance
Figure 4A shows the modeling process of the RF model; the model performed best with mtry = 2 and ntree = 250. From roughly the 100th tree onward, the error of the RF algorithm flattened, suggesting that the model's generalization performance had stabilized. We also used RF to explore variable importance. As illustrated in Figure 4B, the top five variables (CDFI blood flow, location, age, tumor size, and microcalcifications) were similar under both measures: the decrease in classification accuracy (mean decrease accuracy) and the decrease in node impurity (mean decrease Gini). The SIRI, which performed best among the three inflammatory indices, ranked tenth and ninth in the mean decrease accuracy and mean decrease Gini plots, respectively.
The confusion matrices of the RF model are displayed in Figure S2A,S2B. The auROCs in the training and testing cohorts were 85.48% and 81.77%, respectively, and DeLong’s test revealed no statistically significant difference between them (P>0.05) (Figure S2C). The learning curves indicate that the model fits both the training and testing sets well and remains stable (Figure S2D). Overall, these results suggest that the RF model was not substantially overfitted. Additionally, Figure S2E,S2F shows the predicted probability distribution of the RF model.
Explanation of the ML model with the SHAP method
We used the SHAP tool to interpret the RF and XGBoost models. Variable contributions were ranked by their mean absolute SHAP values (Figure 5A,5B). The top ten features in the RF model were age, CDFI blood flow, tumor size, location, microcalcifications, Hashimoto thyroiditis, unclear margin, SIRI, gender, and solid composition. In addition, we constructed SHAP summary scatter plots, which use color to visualize the relationship between feature values and predicted probabilities (Figure S3A,S3B). The larger the absolute value on the x-axis, the more the feature affects the output, with colors representing high (red) and low (blue) raw feature values. A higher SIRI pushes the prediction toward CLNM, whereas Hashimoto thyroiditis pushes it away from CLNM. To visualize the contributions of individual variable levels, we plotted SHAP values in faceted panels, one per variable (Figure S4).
Figure 6 presents two representative cases that illustrate the model’s interpretability. The CLNM-absent individual had a lower SHAP value (0.15) (Figure 6A), whereas the CLNM-present individual had a higher SHAP value (0.94) (Figure 6B).
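To make the idea behind these force plots concrete, the sketch below computes exact Shapley values for a toy risk score by averaging each feature's marginal contribution over all feature orderings. It is a simplified illustration, not the authors' pipeline: the scoring function `toy_risk`, its coefficients, and the three binary features are invented, and a single all-zero baseline replaces the background-data expectation that SHAP implementations normally use.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values relative to a single baseline input."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # switch this feature on
            updated = predict(current)
            totals[name] += updated - previous  # marginal contribution
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk(x):
    """Hypothetical risk score with one interaction term (not the RF model)."""
    return (
        0.2
        + 0.3 * x["cdfi"]
        + 0.2 * x["young"]
        + 0.1 * x["high_siri"]
        + 0.1 * x["cdfi"] * x["young"]
    )


patient = {"cdfi": 1, "young": 1, "high_siri": 1}
reference = {"cdfi": 0, "young": 0, "high_siri": 0}
phi = shapley_values(toy_risk, patient, reference)
print(phi)
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that lets a force plot decompose one prediction into per-feature pushes; the interaction term is split equally between the two features involved.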
Discussion
Principal findings
We had three major findings in this research. First, in addition to traditional US features, the inflammatory index, especially the SIRI, was found to be a risk predictor for CLNM. Second, we established and verified eight ML models to assess CLNM likelihood in individuals with cN0T1–T2 PTC. The RF model performed the best (with maximum auROC and auPR), followed by XGBoost, CatBoost, and ANN. Third, the interpretability of the models was illustrated via the SHAP approach.
Consistent with numerous previous clinical studies, younger age, presence of CDFI blood flow, larger tumor size, tumor location in the lower pole or isthmus, microcalcifications, absence of Hashimoto thyroiditis, unclear margin, male gender, and capsular invasion were all found to be risk factors for CLNM (27-29). Several researchers have reported that blood inflammatory indices such as MLR, PLR, NLR, and SII are predictive factors for CLNM and LLNM, and are even associated with poor prognosis and relapse (30-33). However, research on a newer blood inflammatory index, the SIRI, and its importance in PTC is still lacking. To the best of our knowledge, this is the first investigation of the relationship between SIRI and CLNM.
In previous studies, several ML algorithms have been used to predict CLNM and LLNM (34,35). Different algorithms have their pros and cons (36). The RF model performed the best in this research. Several newer ensemble learning algorithms, including XGBoost and CatBoost, also showed good predictive performance. Ensemble learning is mainly divided into bagging algorithms and boosting algorithms. RF is a bagging method that aggregates many DTs. The gradient boosting decision tree (GBDT) represents a broad family of algorithms built on the boosting principle of ensemble learning. LightGBM, XGBoost, and CatBoost are among the most recent and widely recognized members of the GBDT family, with enhanced capabilities (37-39).
A major flaw of most ML models is the black box problem. To address this flaw, we introduced the SHAP tool. In this study, we not only applied the global interpretations of SHAP to visualize the contribution of all features but also provided case-level interpretations using SHAP individual force plots, which show both positive and negative effects.
Limitations
First, because these findings rely on retrospective observations, larger prospective investigations are needed. Second, because no external validation cohort was available, patients were randomly allocated into training and testing cohorts to limit overfitting. In future work, we will build a multicenter database to further examine the model’s capabilities.
Conclusions
Interpretable ML models based on the SIRI and US features can be used to predict CLNM in individuals with cN0T1–T2 PTC.
Supplementary
Acknowledgments
The authors would like to thank the numerous individuals who participated in this study.
Funding: None.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the Ethics Committee of Third Xiangya Hospital, Central South University (No. quick23472) and individual consent for this retrospective analysis was waived.
Footnotes
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://gs.amegroups.com/article/view/10.21037/gs-23-349/rc
Data Sharing Statement: Available at https://gs.amegroups.com/article/view/10.21037/gs-23-349/dss
Peer Review File: Available at https://gs.amegroups.com/article/view/10.21037/gs-23-349/prf
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://gs.amegroups.com/article/view/10.21037/gs-23-349/coif). The authors have no conflicts of interest to declare.
References
- 1.Pizzato M, Li M, Vignat J, et al. The epidemiological landscape of thyroid cancer worldwide: GLOBOCAN estimates for incidence and mortality rates in 2020. Lancet Diabetes Endocrinol 2022;10:264-72. 10.1016/S2213-8587(22)00035-3 [DOI] [PubMed] [Google Scholar]
- 2.Sung H, Ferlay J, Siegel RL, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 2021;71:209-49. 10.3322/caac.21660 [DOI] [PubMed] [Google Scholar]
- 3. Pinheiro RA, Leite AK, Cavalheiro BG, et al. Incidental Node Metastasis as an Independent Factor of Worse Disease-Free Survival in Patients with Papillary Thyroid Carcinoma. Cancers (Basel) 2023;15:943. doi: 10.3390/cancers15030943
- 4. Yu ST, Ge J, Wei Z, et al. The lymph node yield in the initial lateral neck dissection predicts recurrence in the lateral neck of papillary thyroid carcinoma: a revision surgery cohort study. Int J Surg 2023;109:1264-70. doi: 10.1097/JS9.0000000000000316
- 5. Luo Z, Hei H, Qin J, et al. Lymph node ratio in lateral neck is an independent risk factor for recurrence-free survival in papillary thyroid cancer patients with positive lymph nodes. Endocrine 2022;78:484-90. doi: 10.1007/s12020-022-03173-x
- 6. Hutchinson KA, Guerra A, Payne AE, et al. Risk Factors Associated With Reoperative Surgery for Thyroid Malignancies: A Retrospective Cohort Study. Otolaryngol Head Neck Surg 2023;168:392-7. doi: 10.1177/01945998221099799
- 7. Wang W, Ding Y, Jiang W, et al. Can Cervical Lymph Node Metastasis Increase the Risk of Distant Metastasis in Papillary Thyroid Carcinoma? Front Endocrinol (Lausanne) 2022;13:917794. doi: 10.3389/fendo.2022.917794
- 8. Baud G, Jannin A, Marciniak C, et al. Impact of Lymph Node Dissection on Postoperative Complications of Total Thyroidectomy in Patients with Thyroid Carcinoma. Cancers (Basel) 2022;14:5462. doi: 10.3390/cancers14215462
- 9. Privitera F, Centonze D, La Vignera S, et al. Risk Factors for Hypoparathyroidism after Thyroid Surgery: A Single-Center Study. J Clin Med 2023;12:1956. doi: 10.3390/jcm12051956
- 10. Yang J, Han Y, Min Y, et al. Prophylactic central neck dissection for cN0 papillary thyroid carcinoma: is there any difference between western countries and China? A systematic review and meta-analysis. Front Endocrinol (Lausanne) 2023;14:1176512. doi: 10.3389/fendo.2023.1176512
- 11. Wang Y, Xiao Y, Pan Y, et al. The effectiveness and safety of prophylactic central neck dissection in clinically node-negative papillary thyroid carcinoma patients: A meta-analysis. Front Endocrinol (Lausanne) 2022;13:1094012. doi: 10.3389/fendo.2022.1094012
- 12. Feng JW, Ye J, Wu WX, et al. Management of cN0 papillary thyroid microcarcinoma patients according to risk-scoring model for central lymph node metastasis and predictors of recurrence. J Endocrinol Invest 2020;43:1807-17. doi: 10.1007/s40618-020-01326-1
- 13. Duan X, Yang B, Zhao C, et al. Prognostic value of preoperative hematological markers in patients with glioblastoma multiforme and construction of random survival forest model. BMC Cancer 2023;23:432. doi: 10.1186/s12885-023-10889-0
- 14. Huang C, Wang M, Chen L, et al. The pretherapeutic systemic inflammation score is a prognostic predictor for elderly patients with oesophageal cancer: a case control study. BMC Cancer 2023;23:505. doi: 10.1186/s12885-023-10982-4
- 15. Huai Q, Luo C, Song P, et al. Peripheral blood inflammatory biomarkers dynamics reflect treatment response and predict prognosis in non-small cell lung cancer patients with neoadjuvant immunotherapy. Cancer Sci 2023. [Epub ahead of print]. doi: 10.1111/cas.15964
- 16. Makino T, Izumi K, Iwamoto H, et al. Comparison of the Prognostic Value of Inflammatory and Nutritional Indices in Nonmetastatic Renal Cell Carcinoma. Biomedicines 2023;11:533. doi: 10.3390/biomedicines11020533
- 17. Duque-Santana V, López-Campos F, Martin-Martin M, et al. Neutrophil-to-Lymphocyte Ratio and Platelet-to-Lymphocyte Ratio as Prognostic Factors in Locally Advanced Rectal Cancer. Oncology 2023;101:349-57. doi: 10.1159/000526450
- 18. Dong J, Sun Q, Pan Y, et al. Pretreatment systemic inflammation response index is predictive of pathological complete response in patients with breast cancer receiving neoadjuvant chemotherapy. BMC Cancer 2021;21:700. doi: 10.1186/s12885-021-08458-4
- 19. Lu YF, Wu CY, Lo WC, et al. Postchemoradiotherapy systemic inflammation response index predicts treatment response and overall survival for patients with locally advanced nasopharyngeal cancer. J Formos Med Assoc 2023;122:1141-9. doi: 10.1016/j.jfma.2023.05.003
- 20. Shan M, Deng Y, Zou W, et al. Salvage radiotherapy strategy and its prognostic significance for patients with locoregional recurrent cervical cancer after radical hysterectomy: a multicenter retrospective 10-year analysis. BMC Cancer 2023;23:905. doi: 10.1186/s12885-023-11406-z
- 21. Pacheco-Barcia V, Mondéjar Solís R, France T, et al. A systemic inflammation response index (SIRI) correlates with survival and predicts oncological outcome for mFOLFIRINOX therapy in metastatic pancreatic cancer. Pancreatology 2020;20:254-64. doi: 10.1016/j.pan.2019.12.010
- 22. Cai H, Chen Y, Zhang Q, et al. High preoperative CEA and systemic inflammation response index (C-SIRI) predict unfavorable survival of resectable colorectal cancer. World J Surg Oncol 2023;21:178. doi: 10.1186/s12957-023-03056-z
- 23. Satapathy P, Pradhan KB, Rustagi S, et al. Application of machine learning in surgery research: current uses and future directions - editorial. Int J Surg 2023;109:1550-1. doi: 10.1097/JS9.0000000000000421
- 24. Kufel J, Bargieł-Łączek K, Kocot S, et al. What Is Machine Learning, Artificial Neural Networks and Deep Learning?-Examples of Practical Applications in Medicine. Diagnostics (Basel) 2023;13:2582. doi: 10.3390/diagnostics13152582
- 25. Ali S, Akhlaq F, Imran AS, et al. The enlightening role of explainable artificial intelligence in medical & healthcare domains: A systematic literature review. Comput Biol Med 2023;166:107555. doi: 10.1016/j.compbiomed.2023.107555
- 26. Nohara Y, Matsumoto K, Soejima H, et al. Explanation of machine learning models using shapley additive explanation and application for real data in hospital. Comput Methods Programs Biomed 2022;214:106584. doi: 10.1016/j.cmpb.2021.106584
- 27. Li J, Sun P, Huang T, et al. Preoperative prediction of central lymph node metastasis in cN0T1/T2 papillary thyroid carcinoma: A nomogram based on clinical and ultrasound characteristics. Eur J Surg Oncol 2022;48:1272-9. doi: 10.1016/j.ejso.2022.04.001
- 28. Wang W, Ding Y, Meng C, et al. Patient's age with papillary thyroid cancer: Is it a key factor for cervical lymph node metastasis? Eur J Surg Oncol 2023;49:1147-53. doi: 10.1016/j.ejso.2023.02.011
- 29. Meng C, Wang W, Zhang Y, et al. The influence of nodule size on the aggressiveness of thyroid carcinoma varies with patient's age. Gland Surg 2021;10:961-72. doi: 10.21037/gs-20-747
- 30. Zhao L, Zhou T, Zhang W, et al. Blood immune indexes can predict lateral lymph node metastasis of thyroid papillary carcinoma. Front Endocrinol (Lausanne) 2022;13:995630. doi: 10.3389/fendo.2022.995630
- 31. Zhang Z, Xia F, Wang W, et al. The systemic immune-inflammation index-based model is an effective biomarker on predicting central lymph node metastasis in clinically nodal-negative papillary thyroid carcinoma. Gland Surg 2021;10:1368-73. doi: 10.21037/gs-20-666
- 32. Huang Y, Liu Y, Mo G, et al. Inflammation Markers Have Important Value in Predicting Relapse in Patients with papillary thyroid carcinoma: A Long-Term Follow-Up Retrospective Study. Cancer Control 2022;29:10732748221115236. doi: 10.1177/10732748221115236
- 33. Song L, Zhu J, Li Z, et al. The prognostic value of the lymphocyte-to-monocyte ratio for high-risk papillary thyroid carcinoma. Cancer Manag Res 2019;11:8451-62. doi: 10.2147/CMAR.S219163
- 34. Wu Y, Rao K, Liu J, et al. Machine Learning Algorithms for the Prediction of Central Lymph Node Metastasis in Patients With Papillary Thyroid Cancer. Front Endocrinol (Lausanne) 2020;11:577537. doi: 10.3389/fendo.2020.577537
- 35. Feng JW, Ye J, Qi GF, et al. LASSO-based machine learning models for the prediction of central lymph node metastasis in clinically negative patients with papillary thyroid carcinoma. Front Endocrinol (Lausanne) 2022;13:1030045. doi: 10.3389/fendo.2022.1030045
- 36. Ngiam KY, Khor IW. Big data and machine learning algorithms for health-care delivery. Lancet Oncol 2019;20:e262-73. doi: 10.1016/S1470-2045(19)30149-4
- 37. Ke G, Meng Q, Finley T, et al. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In: Guyon I, von Luxburg U, Bengio S, et al. editors. Advances in Neural Information Processing Systems. Curran Associates, Inc.; 2017.
- 38. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco, California, USA: Association for Computing Machinery; 2016. doi: 10.1145/2939672.2939785
- 39. Hancock JT, Khoshgoftaar TM. CatBoost for big data: an interdisciplinary review. J Big Data 2020;7:94. doi: 10.1186/s40537-020-00369-8
Associated Data