Medicine. 2025 Mar 21;104(12):e41987. doi: 10.1097/MD.0000000000041987

Model development and validation for predicting small-cell lung cancer bone metastasis utilizing diverse machine learning algorithms based on the SEER database

Shuai Qie a,*, Xin Zhang a, Jiusong Luan a, Zhelun Song a, Jingyun Li a, Jingyu Wang a
PMCID: PMC11936617  PMID: 40128015

Abstract

The aim of this study was to develop a machine learning algorithm with superior performance in predicting bone metastasis (BM) in small cell lung cancer (SCLC) and to create a straightforward web-based predictor based on that algorithm. Demographic and clinicopathological data on patients with SCLC, including BM status, were extracted from the Surveillance, Epidemiology, and End Results database for 2010 to 2018. These data were used to develop 12 machine learning models: support vector machine, logistic regression, NaiveBayes, extreme gradient boosting (XGB), decision tree, random forest, ExtraTrees, LightGBM, GradientBoosting, AdaBoost, MLP, and k-nearest neighbor. The models, which predict the likelihood of BM from demographic and clinicopathological features, were compared using accuracy, precision, recall, F1-score, the area under the receiver operating characteristic curve (AUC), and the Brier score. The best-performing model was then chosen, and the associations between the clinicopathological characteristics and the target variable (presence or absence of BM) were interpreted on the basis of this model to provide insight into the factors that may influence the risk of BM in SCLC patients. A total of 89,366 SCLC patients were included, of whom 8269 (9.25%) developed BM. Age, T stage, N stage, liver metastasis, lung metastasis, marital status, income, M stage, American Joint Committee on Cancer stage, and brain metastasis were identified as independent risk factors for BM in SCLC. Among the models evaluated, the XGB model showed the highest performance in both internal and external validation, with AUCs of 0.965 in the training set, 0.962 in the validation set, and 0.961 in the test set. The XGB algorithm was subsequently used to develop a web-based predictor of BM risk in SCLC patients, intended to assist doctors in clinical decision-making.

Keywords: bone metastasis, machine learning, prognosis, SEER, small-cell lung cancer

1. Introduction

Small-cell lung cancer (SCLC) is rapidly emerging as a significant cancer with a high mortality rate. Among the various complications associated with SCLC, bone metastasis (BM) is a particularly devastating one.[1] This condition occurs when the cancer cells spread from the lungs to the bones, often leading to severe pain, fractures, and hypercalcemia. Annually, a significant proportion of SCLC patients are diagnosed with bone metastases, significantly impacting their quality of life and prognosis.[2] Moreover, autopsy reports reveal that the incidence of BM in SCLC patients is even higher, emphasizing the need for early detection and aggressive management of this complication.[3]

The prediction of BM in patients with SCLC has been a crucial yet challenging task in modern oncology.[4] SCLC, a highly aggressive subtype of lung cancer, often exhibits rapid progression and metastasis, including to the bones. Accurate prediction of BM is paramount in guiding clinical decision-making, patient management, and resource allocation.[5]

However, current guidelines, such as the National Comprehensive Cancer Network in the United States, lack stratified recommendations for BM screening in SCLC patients.[6] This gap in guidance can lead to unnecessary diagnostic procedures for patients with low-risk features and potential missed detections in those with high-risk profiles. Therefore, the development of a predictive model based on machine learning algorithms, utilizing the Surveillance, Epidemiology, and End Results (SEER) database, holds significant promise in addressing this clinical need.

Machine learning algorithms have demonstrated remarkable capabilities in analyzing complex datasets and extracting meaningful patterns. By leveraging the rich information contained in the SEER database, we can train a model to identify patterns predictive of BM in SCLC patients.[7] Such a model can stratify patients into different risk groups, enabling targeted bone scan screenings for high-risk individuals and avoiding unnecessary testing for those at lower risk.

The necessity of this study is underscored by the potential benefits it offers. For high-risk patients, early detection of BM can lead to timely intervention and improved outcomes. Conversely, for low-risk patients, avoiding unnecessary bone scans can reduce the burden on the healthcare system, decrease costs, and enhance the patient’s quality of life. Additionally, such a predictive model can inform the development of more tailored and effective treatment strategies, further personalizing the management of SCLC.

We aim to compare the predictive capabilities of multiple machine learning models using various assessment indicators to identify the optimal approach for analyzing the intricate relationship between SCLC-related BM and various clinicopathological features. This will enable us to gain deeper insights into the pathogenesis of SCLC, identify high-risk patients, and potentially devise more targeted and effective therapeutic strategies.

2. Methods

2.1. Data collection

The institutional review board and ethics committee required that patient information in this database be de-identified and that the de-identified information be made publicly available.

In this retrospective study, we obtained clinical data on BM in SCLC from the SEER database using SEER*Stat 8.4.0 software. The SEER database covers approximately 27.8% of the population of the United States and is sourced from 17 registries (https://seer.cancer.gov/). For the training set, we selected patients diagnosed with SCLC and BM from 2010 to 2015 with confirmed pathological diagnosis.

We applied the following exclusion criteria: (1) patients with other primary cancers or other malignant tumors; (2) patients identified through autopsy or death certificates; (3) patients with uncertain clinical data values. The validation set contains patients diagnosed with SCLC and BM in 2 hospitals from 2016 to 2017, subject to the same inclusion and exclusion criteria. The test set contains patients diagnosed with SCLC and BM in 2018.

We extracted patient characteristics, including age, sex, tumor size, primary site, grade, laterality, TNM stage, stage T, stage N, stage M, liver metastasis, brain metastasis, lung metastasis, income, and marital status. For the diagnosis of BM in lung cancer, we determined it based on pathological results from surgical resection or tumor biopsy. In this way, we constructed a complete, time-sequenced training set, validation set, and test set for research and analysis of BM in SCLC. Figure 1 shows the process of patient screening.

Figure 1.

The flow chart for the selection of the study population and the web-based model.

To adhere to the regulations set forth by the Institutional Review Board and the ethics committee, the database has undergone a de-identification process, removing any identifying patient information. Additionally, these governing bodies have mandated that the processed, non-identifiable information be made publicly accessible.

2.2. Sampling process

SMOTE sampling, also known as Synthetic Minority Over-Sampling Technique, is a sampling technique that addresses the imbalance of categories in classification problems. Its basic idea is to analyze the minority class samples and synthesize new samples artificially to increase the number of minority class samples, thus making the category distribution of the training sample set more balanced. SMOTE sampling does not directly copy minority class samples, but instead synthesizes new samples between 2 minority class samples through linear interpolation.
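The interpolation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `smote_sample`, the neighbour count `k`, and the toy data are all hypothetical.

```python
import random

def smote_sample(minority, k=2, n_new=4, seed=0):
    """Create n_new synthetic points, each on the line segment between a
    minority sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic

# Toy minority-class samples (invented for illustration only).
minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 3.0], [3.0, 2.5]]
new_points = smote_sample(minority)
```

Because every synthetic point is a convex combination of two real minority samples, it always lies between them rather than duplicating either one. Oversampling of this kind is normally applied to the training data only, so that validation and test sets keep the original class distribution.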

2.3. Feature selection

After the clinical characteristics mentioned above were encoded and transformed, features were selected using lasso regression. The principles of lasso regression are briefly outlined below.

The lasso regression, also known as the least absolute shrinkage and selection operator, is a regression analysis method that performs both regression analysis and variable selection. It adds a penalty term to the ordinary least squares loss function, resulting in a shrinkage of the coefficients towards zero. This shrinkage effectively reduces the impact of variables with minor contributions to the model, helping to identify the most important features (Fig. 2A and B).
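To make the shrinkage effect concrete, the following sketch fits a lasso model by coordinate descent on a small invented dataset. It assumes the common objective (1/2n)·Σ(y − Xβ)² + λ·Σ|β|; the function names, the penalty value, and the data are hypothetical and are not taken from the study.

```python
def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n) * sum((y - X beta)^2) + lam * sum(|beta|)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # sum of squared values of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta

# Toy data (invented): the outcome depends on feature 0 only.
X = [[1, 0.5], [2, -0.3], [3, 0.2], [4, -0.4],
     [-1, 0.1], [-2, 0.3], [-3, -0.2], [-4, -0.2]]
y = [2.0 * row[0] for row in X]
beta = lasso_coordinate_descent(X, y, lam=0.5)
```

In this toy case the coefficient of the uninformative second feature is set exactly to zero, while the coefficient of the informative first feature is shrunk slightly below its true value of 2. This is the variable-selection behaviour the paragraph above describes.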

Figure 2.

(A and B) Illustration of feature selection process via Lasso regression.

2.4. Model development

Random forest (RF): an ensemble learning method that constructs a multitude of decision trees (DTs) and outputs the mode of the individual trees' classes (classification) or their mean prediction (regression). Training and predicting with a collection of trees reduces variance and improves generalization, robustness, and accuracy.

Naive Bayes classifier (Naive Bayes): a probabilistic classification method based on Bayes' theorem with strong (naive) independence assumptions between features.

Logistic regression: a statistical method used for binary classification by modeling the probability of a certain class.

Support vector machine (SVM): a supervised learning model used for classification or regression. It works by finding a hyperplane that maximizes the margin between 2 classes.

K-nearest neighbors (KNN): a simple algorithm that assigns a class to an object based on the classes of its k nearest neighbors.

Extra trees: a variant of random forests in which the splits are chosen randomly rather than by searching for the best splits. It may give somewhat less accurate predictions but is often faster to compute.

Extreme Gradient Boosting (XGBoost): an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the gradient boosting framework.

Light Gradient Boosting Machine (LightGBM): a gradient boosting framework that uses tree-based learning algorithms. It is designed to be efficient and accurate and to handle large datasets.

Gradient Boosting DT (Gradient Boosting): an ensemble method that uses DTs and the gradient boosting technique to iteratively improve predictions by adding new models that correct the errors of the previous models.

Adaptive Boosting (AdaBoost): a boosting algorithm that fits a sequence of weak learners (usually DTs) to weighted data. Each subsequent weak learner is trained to correct the errors of the previous learners.

Multi-layer perceptron (MLP): a type of feedforward artificial neural network that consists of multiple layers of nodes. It is often used for classification and regression tasks.

The dataset was divided into 3 distinct subsets by year of diagnosis: training set (2010–2015), validation set (2016–2017), and test set (2018). The training set was used to build the prediction models, while the validation set served to fine-tune the models and assess their performance. Finally, the test set was used to evaluate the generalization ability of the models on unseen data.
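A temporal split of this kind can be expressed as a simple mapping from diagnosis year to subset. The sketch below assumes that the test set consists of patients diagnosed in 2018, as stated in the data collection section; the field name `year_of_diagnosis` and the example records are hypothetical.

```python
def assign_split(year_of_diagnosis):
    """Map a diagnosis year to the study subset described in the text."""
    if 2010 <= year_of_diagnosis <= 2015:
        return "train"
    if 2016 <= year_of_diagnosis <= 2017:
        return "validation"
    if year_of_diagnosis == 2018:
        return "test"
    return None  # outside the study window

# Invented example records; real SEER exports use different field names.
records = [
    {"id": 1, "year_of_diagnosis": 2012},
    {"id": 2, "year_of_diagnosis": 2016},
    {"id": 3, "year_of_diagnosis": 2018},
    {"id": 4, "year_of_diagnosis": 2009},
]

splits = {"train": [], "validation": [], "test": []}
for rec in records:
    name = assign_split(rec["year_of_diagnosis"])
    if name is not None:
        splits[name].append(rec["id"])
```

Splitting by calendar year rather than at random means the later subsets mimic prospective use: the model is always evaluated on patients diagnosed after those it was trained on.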

2.5. Model tests and evaluation

The test set was used to assess the machine learning models. The evaluation metrics were accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic (ROC) curve (AUC). The ROC curve is a graphical representation of the diagnostic ability of a binary classifier as its discrimination threshold varies, and the AUC derived from it is a standard metric for the overall evaluation of the models.
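For readers who want to see how these metrics relate to one another, the sketch below computes them from scratch on a tiny invented example, together with the Brier score mentioned in the abstract. The function names, the labels, the predicted probabilities, and the 0.5 threshold are assumptions for illustration and do not reproduce the study's evaluation code.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Return (tp, fp, tn, fn) for predictions made at the given threshold."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and t == 1:
            tp += 1
        elif pred == 1 and t == 0:
            fp += 1
        elif pred == 0 and t == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def auc_score(y_true, y_prob):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

# Invented labels (1 = BM) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1, 0.05]

tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold=0.5)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
auc = auc_score(y_true, y_prob)
brier = brier_score(y_true, y_prob)
```

Accuracy, precision, recall, and F1 all depend on the chosen threshold, whereas the AUC and the Brier score are computed from the predicted probabilities directly. This is one reason the AUC is convenient for comparing models that use different operating thresholds.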

2.6. Statistical analysis and software

We used IBM SPSS Statistics (Version 22) and Python (Version 3.9.7) for the statistical analyses. Continuous data were compared using independent-sample t-tests or Mann–Whitney U tests, depending on the nature of the data. Categorical data were compared using chi-square tests or Fisher's exact test. Values of P < .05 were considered statistically significant.

To precisely predict the risk of bone metastases, we harnessed various algorithms including logistic regression, Naive Bayes, SVM, KNN, RF, extra trees, XGBoost, LightGBM, GradientBoosting, AdaBoost, and MLP. These algorithms were utilized based on the identified risk factors to establish robust diagnostic models (Fig. 3).

Figure 3.

Comparison of AUC values across 11 machine learning models.

2.7. Model visualization

For patients with SCLC, we have developed risk assessment tools using web pages, specifically tailored for diagnosis and prognosis. These tools provide clinicians with a convenient platform to access and utilize the risk assessment functionalities. By logging into the designated website, clinicians can assess the risk of bone metastases in SCLC patients, thereby enabling more informed decision-making and improved patient care.

3. Results

3.1. Clinical characteristics description of training, validation, and testing sets

Table 1 summarizes the clinical features of the patients in the training, validation, and testing sets, including age, sex, tumor size, primary site, grade, laterality, TNM stage, stage T, stage N, stage M, liver metastasis, brain metastasis, lung metastasis, income, and marital status.

Table 1.

Demographic and clinicopathologic variables across the entire cohort and the training, test, and validation sets.

Clinical features ALL Train set Test set Validation set P value
Age (years) 70.57 ± 9.17 70.38 ± 9.42 70.62 ± 9.00 70.94 ± 8.72 <.001
Tumor_size (mm) 46.05 ± 28.65 46.53 ± 28.85 46.25 ± 27.94 44.86 ± 28.70 <.001
Sex <.001
 Female 34,173 (38.16) 18,439 (37.61) 6597 (38.18) 9137 (39.31)
 Male 55,373 (61.84) 30,588 (62.39) 10,680 (61.82) 14,105 (60.69)
Race .049
 White 75,071 (83.84) 41,254 (84.15) 14,391 (83.30) 19,426 (83.58)
 Black 9956 (11.12) 5400 (11.01) 1962 (11.36) 2594 (11.16)
 American Indian/Alaska Native 636 (0.71) 329 (0.67) 127 (0.74) 180 (0.77)
 Asian or Pacific Islander 3883 (4.34) 2044 (4.17) 797 (4.61) 1042 (4.48)
Primary_Site .003
 Main bronchus 4724 (5.28) 2726 (5.56) 883 (5.11) 1115 (4.80)
 Upper lobe 51,847 (57.90) 28,359 (57.84) 9947 (57.57) 13,541 (58.26)
 Middle lobe 3505 (3.91) 1949 (3.98) 665 (3.85) 891 (3.83)
 Lower lobe 28,412 (31.73) 15,415 (31.44) 5575 (32.27) 7422 (31.93)
 Overlapping lesion 1058 (1.18) 578 (1.18) 207 (1.20) 273 (1.17)
Laterality .554
 Left 39,090 (43.65) 21,426 (43.70) 7583 (43.89) 10,081 (43.37)
 Right 50,456 (56.35) 27,601 (56.30) 9694 (56.11) 13,161 (56.63)
Stage TNM <.001
 I 25,805 (28.82) 13,457 (27.45) 5156 (29.84) 7192 (30.94)
 II 12,352 (13.79) 7065 (14.41) 2220 (12.85) 3067 (13.20)
 III 26,054 (29.10) 14,096 (28.75) 5009 (28.99) 6949 (29.90)
 IV 25,335 (28.29) 14,409 (29.39) 4892 (28.32) 6034 (25.96)
Stage T <.001
 T1 22,342 (24.95) 10,933 (22.30) 4444 (25.72) 6965 (29.97)
 T2 27,749 (30.99) 16,639 (33.94) 5521 (31.96) 5589 (24.05)
 T3 19,321 (21.58) 11,345 (23.14) 3684 (21.32) 4292 (18.47)
 T4 20,134 (22.48) 10,110 (20.62) 3628 (21.00) 6396 (27.52)
Stage N <.001
 N0 44,482 (49.68) 23,647 (48.23) 8555 (49.52) 12,280 (52.84)
 N1 8956 (10.00) 4986 (10.17) 1648 (9.54) 2322 (9.99)
 N2 27,431 (30.63) 15,838 (32.30) 5282 (30.57) 6311 (27.15)
 N3 8677 (9.69) 4556 (9.29) 1792 (10.37) 2329 (10.02)
Stage M <.001
 M0 64,211 (71.71) 34,618 (70.61) 12,385 (71.68) 17,208 (74.04)
 M1A 8398 (9.38) 5119 (10.44) 1592 (9.21) 1687 (7.26)
 M1B 15,945 (17.81) 9096 (18.55) 2871 (16.62) 3978 (17.12)
 M1NOS 992 (1.11) 194 (0.40) 429 (2.48) 369 (1.59)
Brain metastases .883
 No 85,355 (95.32) 46,742 (95.34) 16,456 (95.25) 22,157 (95.33)
 Yes 4191 (4.68) 2285 (4.66) 821 (4.75) 1085 (4.67)
Liver metastases .045
 No 85,203 (95.15) 46,659 (95.17) 16,382 (94.82) 22,162 (95.35)
 Yes 4343 (4.85) 2368 (4.83) 895 (5.18) 1080 (4.65)
Lung metastases <.001
 No 81,449 (90.96) 44,479 (90.72) 15,569 (90.11) 21,401 (92.08)
 Yes 8097 (9.04) 4548 (9.28) 1708 (9.89) 1841 (7.92)
Distant lymph node metastases
 No 38,951 (43.50) Null 16,585 (95.99) 22,366 (96.23)
 Yes 1510 (1.69) Null 660 (3.82) 850 (3.66)
 Unknown 49,085 (54.82) 49,027 (100.00) 32 (0.19) 26 (0.11)
Grade
 I 1372 (1.53) 1033 (2.11) 339 (1.96) Null
 II 18,856 (21.06) 14,054 (28.67) 4802 (27.79) Null
 III 22,518 (25.15) 17,104 (34.89) 5414 (31.34) Null
 IV 370 (0.41) 272 (0.55) 98 (0.57) Null
 Unknown 46,430 (51.85) 16,564 (33.79) 6624 (38.34) 23,242 (100.00)
Marital_status <.001
 Single/unmarried 12,919 (14.43) 6488 (13.23) 2611 (15.11) 3820 (16.44)
 Divorced/separated/widowed 29,113 (32.51) 16,009 (32.65) 5683 (32.89) 7421 (31.93)
 Married 43,883 (49.01) 24,445 (49.86) 8343 (48.29) 11,095 (47.74)
 Unknown 3631 (4.05) 2085 (4.25) 640 (3.70) 906 (3.90)
Income <.001
 <40,000 5271 (5.89) 3333 (6.80) 948 (5.49) 990 (4.26)
 40,000–60,000 28,353 (31.66) 16,353 (33.36) 5492 (31.79) 6508 (28.00)
 >60,000 55,922 (62.45) 29,341 (59.85) 10,837 (62.73) 15,744 (67.74)

The feature weights derived from the feature selection analysis reflect the relative importance of each feature in contributing to the model's overall predictive performance (Fig. 4).

Figure 4.

Feature selection and coefficient.

We tested multiple algorithms, including SVM, KNN, and random forest, to determine which performed best on our clinical data. The models were judged on accuracy, precision, recall, F1-score, computational speed, and scalability. XGBoost stood out: it predicted accurately, ran quickly, and handled our complex data well, using the selected key features to capture important patterns in the data. Its predictive strength and efficiency support clinical decision-making. We therefore chose XGBoost for further study and prediction (see Table 2, Fig. 3).

Table 2.

Assessment of the efficacy of 11 machine learning models in the training, validation, and test sets.

Model_name Accuracy AUC 95% CI Sensitivity Specificity PPV NPV Precision Recall F1 Threshold Task
LR 0.895 0.953 0.9517–0.9553 0.972 0.888 0.464 0.997 0.464 0.972 0.628 0.061 Train
LR 0.902 0.954 0.9518–0.9569 0.984 0.894 0.484 0.998 0.484 0.984 0.649 0.194 Validation
LR 0.9 0.952 0.9487–0.9549 0.971 0.892 0.492 0.997 0.492 0.971 0.653 0.063 Test
NaiveBayes 0.894 0.931 0.9292–0.9337 0.973 0.887 0.462 0.997 0.462 0.973 0.626 0.98 Train
NaiveBayes 0.908 0.5 1.0000–1.0000 0 1 0 0.908 0 0 NaN 0 Validation
NaiveBayes 0.903 0.5 1.0000–1.0000 0 1 0 0.903 0 0 NaN 0 Test
SVM 0.887 0.962 0.9606–0.9639 0.96 0.879 0.443 0.996 0.443 0.96 0.606 0.103 Train
SVM 0.877 0.959 0.9566–0.9616 0.966 0.868 0.426 0.996 0.426 0.966 0.591 0.094 Validation
SVM 0.896 0.959 0.9557–0.9617 0.962 0.889 0.482 0.995 0.482 0.962 0.643 0.095 Test
KNN 0.934 0.976 0.9744–0.9768 0.902 0.938 0.591 0.99 0.591 0.902 0.714 0.2 Train
KNN 0.916 0.923 0.9170–0.9291 0.776 0.93 0.529 0.976 0.529 0.776 0.629 0.2 Validation
KNN 0.911 0.922 0.9153–0.9290 0.775 0.926 0.53 0.975 0.53 0.775 0.629 0.2 Test
RandomForest 0.964 0.996 0.9954–0.9961 0.993 0.961 0.716 0.999 0.716 0.993 0.832 0.208 Train
RandomForest 0.898 0.915 0.9088–0.9217 0.915 0.896 0.472 0.991 0.472 0.915 0.622 0.02 Validation
RandomForest 0.888 0.926 0.9191–0.9321 0.939 0.883 0.463 0.993 0.463 0.939 0.62 0.017 Test
ExtraTrees 0.979 0.998 0.9978–0.9982 1 0.977 0.812 1 0.812 1 0.896 0.111 Train
ExtraTrees 0.876 0.923 0.9170–0.9288 0.928 0.87 0.42 0.992 0.42 0.928 0.578 0.05 Validation
ExtraTrees 0.887 0.916 0.9085–0.9231 0.918 0.884 0.458 0.99 0.458 0.918 0.611 0.033 Test
XGBoost 0.901 0.965 0.9634–0.9665 0.971 0.894 0.479 0.997 0.479 0.971 0.642 0.165 Train
XGBoost 0.902 0.962 0.9602–0.9647 0.985 0.894 0.484 0.998 0.484 0.985 0.649 0.12 Validation
XGBoost 0.9 0.961 0.9586–0.9641 0.973 0.892 0.492 0.997 0.492 0.973 0.654 0.123 Test
LightGBM 0.898 0.963 0.9619–0.9650 0.974 0.89 0.47 0.997 0.47 0.974 0.634 0.114 Train
LightGBM 0.902 0.963 0.9610–0.9656 0.985 0.894 0.484 0.998 0.484 0.985 0.649 0.082 Validation
LightGBM 0.9 0.962 0.9594–0.9649 0.974 0.892 0.491 0.997 0.491 0.974 0.653 0.07 Test
GradientBoosting 0.914 0.959 0.9569–0.9602 0.907 0.915 0.516 0.99 0.516 0.907 0.658 0.176 Train
GradientBoosting 0.918 0.962 0.9595–0.9641 0.902 0.919 0.531 0.989 0.531 0.902 0.668 0.176 Validation
GradientBoosting 0.9 0.96 0.9571–0.9627 0.974 0.892 0.491 0.997 0.491 0.974 0.653 0.155 Test
AdaBoost 0.897 0.954 0.9522–0.9557 0.97 0.89 0.468 0.997 0.468 0.97 0.631 0.466 Train
AdaBoost 0.903 0.957 0.9542–0.9591 0.983 0.894 0.485 0.998 0.485 0.983 0.65 0.466 Validation
AdaBoost 0.9 0.957 0.9536–0.9595 0.97 0.893 0.493 0.996 0.493 0.97 0.654 0.466 Test
MLP 0.899 0.96 0.9586–0.9619 0.972 0.891 0.472 0.997 0.472 0.972 0.635 0.117 Train
MLP 0.903 0.961 0.9584–0.9631 0.984 0.895 0.486 0.998 0.486 0.984 0.651 0.097 Validation
MLP 0.9 0.957 0.9540–0.9599 0.972 0.892 0.491 0.997 0.491 0.972 0.653 0.059 Test

CI = confidence interval, KNN = K-nearest neighbor, lightGBM = light gradient boosting machine, LR = logistic regression, MLP = multi-layer perceptron.
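The combination of high sensitivity and specificity with a PPV near 0.5 in Table 2 follows from the low prevalence of BM. As a rough consistency check, the sketch below recomputes the PPV of the XGBoost model in the test set from its reported sensitivity and specificity using Bayes' rule. The prevalence of about 9.7% is an assumption inferred from the NaiveBayes test row, where classifying every patient as negative gives an accuracy of 0.903; the function name is hypothetical.

```python
def ppv_from_rates(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# XGBoost, test set (Table 2): sensitivity 0.973, specificity 0.892.
# Assumed prevalence: 1 - 0.903, inferred from the all-negative NaiveBayes row.
xgb_test_ppv = ppv_from_rates(0.973, 0.892, 1.0 - 0.903)
print(round(xgb_test_ppv, 3))  # prints 0.492
```

The result agrees with the PPV of 0.492 reported for XGBoost in the test set, which suggests that the modest PPV reflects class imbalance rather than poor discrimination.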

Using the aforementioned feature selection method, we identified 10 features with strong correlation: age, T stage, N stage, liver metastasis, lung metastasis, marital status, income, M stage, American Joint Committee on Cancer stage, and brain metastasis. The outcome of this feature selection is depicted in Fig. 5.

Figure 5.

Model variables and their relative influences.

3.2. Evaluation of XGBoost model using AUC, DCA, and confusion matrix

AUC evaluation: Training set AUC: 0.965, validation set AUC: 0.962, and testing set AUC: 0.961 (Fig. 6A and B).

Figure 6.

(A and B) ROC curves for estimating bone metastasis in the validation set (A) and test set (B).

The high AUC values achieved by the XGBoost model across all datasets indicate excellent discriminative performance, suggesting that the model can effectively distinguish between patients with and without BM.

DCA evaluation: We also evaluated each model using decision curve analysis (DCA), which estimates the net clinical benefit of acting on a model's predictions across a range of threshold probabilities. The decision curves are presented in Fig. 7. The net benefit of the XGBoost model was consistently high, which further supports its reliability and clinical usefulness (Fig. 7A and B).
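For reference, the quantity plotted on a decision curve is usually the net benefit, TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability at which a clinician would act. The sketch below computes it for a model and for the "treat all" strategy on a small invented dataset; the function names, the data, and the threshold of 0.2 are assumptions, not values from the study.

```python
def net_benefit(y_true, y_prob, pt):
    """Net benefit of treating patients whose predicted risk is at least pt."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_prob) if p >= pt and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_prob) if p >= pt and t == 0)
    return tp / n - fp / n * pt / (1.0 - pt)

def net_benefit_treat_all(y_true, pt):
    """Net benefit of treating every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)

# Invented labels (1 = BM) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1, 0.05]

nb_model = net_benefit(y_true, y_prob, pt=0.2)
nb_all = net_benefit_treat_all(y_true, pt=0.2)
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the "treat all" value and zero (the "treat none" strategy), as is the case for the toy data here.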

Figure 7.

Decision curves on validation set (A) and test set (B).

3.3. Confusion matrix evaluation

The confusion matrix provides a more detailed analysis of the model's classification performance. It reveals a high number of true positives and true negatives, indicating that the XGBoost model effectively identifies both positive and negative cases of BM in SCLC. However, false positives and false negatives remain: the PPV of the XGBoost model was 0.484 in the validation set and 0.492 in the test set (Table 2), so roughly half of the positive predictions were false positives. This suggests areas for potential improvement in future iterations of the model (Fig. 8A and B).

Figure 8.

Confusion matrices for the validation set (A) and test set (B).

3.4. Development of a web-based visualization tool

For easier interpretation and access, a web-based tool was created for visualizing model results. Users can interactively explore predictions and clinical feature contributions. The intuitive interface lets nonexperts understand model outputs and make decisions based on BM risk predictions for small cell lung cancer patients.

In summary, the study created a machine learning model using SEER clinical data to predict BM in small cell lung cancer. SMOTE effectively balanced classes, boosting model performance. The chosen XGBoost model showed strong prediction with high AUC and DCA scores. A web tool was also made for easier clinical decision-making (Fig. 1).

4. Discussion

To our knowledge, this is the first study dedicated to developing machine learning algorithms specifically for predicting the occurrence of BM in small cell lung cancer. Using data extracted from the SEER database, we trained and validated the XGBoost prediction model. The XGBoost model demonstrated superior classification performance, with high specificity and NPV, and outperformed the other models in identifying patients with BM in small cell lung cancer.

Bone metastases often predict reduced quality of life and shorter survival.[8,9] The resulting skeletal-related events (SREs), such as bone pain, pathological fracture, spinal cord compression, and hypercalcemia, together with the pain caused by related treatments, seriously affect patients' quality of life.[9] While controlling the primary disease, it is particularly important to actively prevent and treat the SREs of bone metastases.[10] On the basis of systematic treatment of the primary disease, a multidisciplinary treatment model is adopted for BM, and individualized, comprehensive treatment programs are developed in a planned and reasonable manner to reduce or delay complications of BM and SREs, which helps improve patients' quality of life.[11–13] Only 50% of lung cancer patients with BM have clinical symptoms.[14,15] Lung cancer BM is often accompanied by severe bone pain and SREs (pathological fracture, spinal cord compression, hypercalcemia, etc), which not only significantly affect patients' sleep, mood, and ability to carry out daily activities, but also threaten their survival.[16,17] Bone pain is the main clinical symptom of BM.[18,19] It appears when tumor growth raises the intramedullary pressure above 6.67 kPa and gradually worsens as the disease progresses. Pain mediators secreted by tumors, such as prostaglandins, interleukin-1, interleukin-2, and tumor necrosis factor, as well as tumor invasion of the periosteum, nerves, and soft tissues, all lead to severe pain (see Annex 1 for evaluation of pain degree).[20,21] Pathological fracture is often the first symptom of lung cancer with BM. About one third of patients presented with BM as the first manifestation, without symptoms from the primary cancer.[22–25] Before that, patients may have no symptoms at all and may live with tumors for months to years. Hypercalcemia is one of the potentially fatal complications of lung cancer with BM.[16,26] Fatigue, emaciation, anemia, and low fever can also occur in the late stage of BM of lung cancer.[27,28] The psychological distress of lung cancer patients manifests mainly as anxiety, depression, disappointment, and loneliness.[29,30] Patients therefore have substantial psychological needs, such as security, to love and be loved, understanding, and self-esteem. If these needs are not recognized and adequately met, relief of pain and other symptoms is difficult to achieve.

Currently, there is a paucity of published research papers reporting the utilization of machine learning algorithms for predicting BM in small cell lung cancer. Despite the significance of this clinical problem, the development of accurate prediction models remains an unmet need. Given the complexity and heterogeneity of cancer biology, the application of advanced machine learning techniques holds promise in improving our ability to predict and manage BM in small cell lung cancer patients.[31–35] Future research efforts are urgently needed to explore and validate innovative machine learning models that can enhance the prediction accuracy and clinical utility in this challenging area.

To identify lung cancer patients with a heightened risk of BM, we innovatively crafted a clinical predictor leveraging an advanced machine learning algorithm, specifically XGBoost. The application of the XGBoost algorithm in tumor prediction models has demonstrated remarkable potential. XGBoost, a powerful gradient boosting machine learning technique, excels in handling complex and high-dimensional data, making it an ideal candidate for tumor prediction tasks. By leveraging the algorithm’s ability to capture nonlinear relationships and interactions among various clinical and biological features, researchers can develop accurate and robust prediction models for tumor occurrence, progression, and response to treatment.[36,37] The integration of XGBoost into tumor prediction models has the potential to significantly improve patient outcomes by enabling earlier diagnosis, personalized treatment plans, and more effective monitoring of disease progression.[33,35] In the present study, we aimed to develop a predictive model for BM in SCLC using machine learning algorithms, with a focus on the XGBoost model. Our approach involved the utilization of 3 distinct datasets: training, validation, and testing, to ensure the robustness and generalization of our model.

Among the various clinical features considered, we employed Lasso regression to identify those that were most predictive of BM. The selected features included age, T stage, N stage, liver metastasis, lung metastasis, marital status, income, M stage, American Joint Committee on Cancer stage, and brain metastasis. These features reflect the complexity of SCLC and its potential for metastasis to various organs, including the bone.

The XGBoost model emerged as the best performer in our study, likely due to its ability to handle complex relationships and nonlinearities in the data. This is in line with previous studies that have demonstrated the superiority of XGBoost in various medical prediction tasks. The evaluation metrics used in our study, including AUC, DCA, and confusion matrix, provided a comprehensive assessment of the model’s performance.

Clinically, the ability to predict BM in SCLC patients is crucial for optimizing treatment strategies and improving patient outcomes. Early detection of metastasis can lead to timely interventions that may mitigate the severity of the condition and improve the overall quality of life for patients.[25,38,39] Our model, with its high predictive accuracy, could potentially serve as a valuable tool in the clinical decision-making process.

Compared to previous studies, our work offers several novel insights.[4,5,40] Firstly, we used a comprehensive set of clinical features, including both demographic and disease-specific variables, to predict BM. This approach allowed us to capture a wider range of potential predictors and improve the accuracy of our model. Secondly, by employing 3 separate datasets, we were able to ensure the reliability and reproducibility of our results.

However, our study also has some limitations. The sample size used in our study may have limited the generalizability of our findings. Future studies with larger sample sizes are needed to further validate our model. Additionally, while we have identified a set of predictive features, the underlying biological mechanisms that drive BM in SCLC remain unclear. Future research should focus on elucidating these mechanisms to develop more targeted and effective treatment strategies.

In conclusion, our study demonstrates the potential of machine learning algorithms, particularly XGBoost, in predicting BM in SCLC. The identified clinical features provide valuable insights into the disease process and could inform clinical decision-making. While our model shows promise, further research is needed to enhance its accuracy and applicability in real-world clinical settings.

Author contributions

Data curation: Xin Zhang.

Formal analysis: Jingyun Li.

Investigation: Jiusong Luan.

Methodology: Jiusong Luan.

Software: Zhelun Song, Jingyu Wang.

Supervision: Jingyu Wang.

Validation: Zhelun Song.

Writing – original draft: Shuai Qie.

Writing – review & editing: Shuai Qie.

Abbreviations:

AUC = area under the receiver operating characteristic curve
BM = bone metastasis
DT = decision tree
KNN = K-nearest neighbor
RF = random forest
SCLC = small-cell lung cancer
SEER = Surveillance, Epidemiology, and End Results
SVM = support vector machine
XGBoost = extreme gradient boosting

The authors have no funding or conflicts of interest to disclose.

The datasets generated during and/or analyzed during the current study are publicly available.

How to cite this article: Qie S, Zhang X, Luan J, Song Z, Li J, Wang J. Model development and validation for predicting small-cell lung cancer bone metastasis utilizing diverse machine learning algorithms based on the SEER database. Medicine 2025;104:12(e41987).

SQ, XZ, and JL contributed equally to this work.

Contributor Information

Xin Zhang, Email: 174812503@qq.com.

Jiusong Luan, Email: 519420222@qq.com.

Zhelun Song, Email: 1072978014@qq.com.

Jingyun Li, Email: 496228560@qq.com.

Jingyu Wang, Email: 120972100@qq.com.

References

  • [1]. Wu Y, Zhang J, Zhou W, Yuan Z, Wang H. Prognostic factors in extensive-stage small cell lung cancer patients with organ-specific metastasis: unveiling commonalities and disparities. J Cancer Res Clin Oncol. 2024;150:74.
  • [2]. Zhang S, Wang Y, Li S, Liu Y, Cheng Y. A retrospective analysis of prognostic factors and treatment choices in small cell lung cancer with liver metastasis. J Thorac Dis. 2023;15:6776–87.
  • [3]. Hsiao SC, Chen Y-H, Lo C-C, Lin C-I. A noteworthy treatment of metastatic small-cell lung cancer with afatinib, followed by subsequent development of rare metastatic lesions in the ascending and sigmoid colon. Cancer Rep (Hoboken). 2020;3:e1243.
  • [4]. Chen Q, Liang H, Zhou L, et al. Deep learning of bone metastasis in small cell lung cancer: a large sample-based study. Front Oncol. 2023;13:1097897.
  • [5]. Rong YT, Zhu YC, Wu Y. A novel nomogram predicting cancer-specific survival in small cell lung cancer patients with brain metastasis. Transl Cancer Res. 2022;11:4289–302.
  • [6]. Ganti AKP, Loo BW, Bassetti M, et al. Small cell lung cancer, version 2.2022, NCCN clinical practice guidelines in oncology. J Natl Compr Canc Netw. 2021;19:1441–64.
  • [7]. Shibaki R, Fujimoto D, Nozawa T, et al. Machine learning analysis of pathological images to predict 1-year progression-free survival of immunotherapy in patients with small-cell lung cancer. J ImmunoTher Cancer. 2024;12:e007987.
  • [8]. Ni J, Zhang X, Wang H, et al. Clinical characteristics and prognostic model for extensive-stage small cell lung cancer: a retrospective study over an 8-year period. Thorac Cancer. 2022;13:539–48.
  • [9]. Zhao C, Zhang Z, Hu X, et al. Hyaluronic acid correlates with bone metastasis and predicts poor prognosis in small-cell lung cancer patients. Front Endocrinol (Lausanne). 2021;12:785192.
  • [10]. Xie Z, Liu J, Wu M, et al. Real-world efficacy and safety of thoracic radiotherapy after first-line chemo-immunotherapy in extensive-stage small-cell lung cancer. J Clin Med. 2023;12:1–15.
  • [11]. He M, Chi X, Shi X, et al. Value of pretreatment serum lactate dehydrogenase as a prognostic and predictive factor for small-cell lung cancer patients treated with first-line platinum-containing chemotherapy. Thorac Cancer. 2021;12:3101–9.
  • [12]. Zhang C, Shang X, Sun J, et al. Clinicopathological difference and survival impact of patients with c-SCLC and SCLC. Int J Gen Med. 2021;14:6899–906.
  • [13]. Lee S, Shim HS, Ahn B-C, et al. Efficacy and safety of atezolizumab, in combination with etoposide and carboplatin regimen, in the first-line treatment of extensive-stage small-cell lung cancer: a single-center experience. Cancer Immunol Immunother. 2022;71:1093–101.
  • [14]. Huang L, Feng Y, Xie T, Zhu H, Tang Le, Shi Y. Incidence, survival comparison, and novel prognostic evaluation approaches for stage III–IV pulmonary large cell neuroendocrine carcinoma and small cell lung cancer. BMC Cancer. 2023;23:312.
  • [15]. Vouchara A, Karlafti E, Intzidis IT, Karakatsanis A, Michalopoulos A, Paramythiotis D. Cutaneous lesions: an unusual clinical presentation of small cell lung cancer. Am J Case Rep. 2022;23:e935313.
  • [16]. Gong L, Xu L, Yuan Z, Wang Z, Zhao L, Wang P. Clinical outcome for small cell lung cancer patients with bone metastases at the time of diagnosis. J Bone Oncol. 2019;19:100265.
  • [17]. Rades D, Motisi L, Veninga T, Conde-Moreno A, Cacicedo J, Schild SE. Predictors of outcomes and a scoring system for estimating survival in patients treated with radiotherapy for metastatic spinal cord compression from small-cell lung cancer. Clin Lung Cancer. 2019;20:322–9.
  • [18]. Hayes SM, Wiese C, Schneidewend R. Tumor lysis syndrome following a single dose of nivolumab for relapsed small-cell lung cancer. Case Rep Oncol. 2021;14:1652–9.
  • [19]. Gomi D, Fukushima T, Kobayashi T, Sekiguchi N, Koizumi T, Oguchi K. Fluorine-18-fluorodeoxyglucose-positron emission tomography evaluation in metastatic bone lesions in lung cancer: possible prediction of pain and skeletal-related events. Thorac Cancer. 2019;10:980–7.
  • [20]. Kairemo K, Rasulova N, Suslaviciute J, Alanko T. Radionuclide treatment with 153Sm-EDTMP is effective for the palliation of bone pain in the context of extensive bone marrow metastases: a case report. Asia Ocean J Nucl Med Biol. 2014;2:131–4.
  • [21]. Lin CL, Chang J-L, Lo H-C, Wu K-A. Extramedullary-intradural spinal metastasis of small cell lung cancer causing cauda equina syndrome. Am J Med Sci. 2010;339:192–4.
  • [22]. Kersting D, Sandach P, Sraieb M, et al. (68)Ga-SSO-120 PET for initial staging of small cell lung cancer patients: a single-center retrospective study. J Nucl Med. 2023;64:1540–9.
  • [23]. Qu Y, Wang Z, Feng J, et al. Pneumonitis, appendicitis, and biliary obstruction during toripalimab treatment in a patient with extensive-stage small-cell lung cancer: a case report. Ann Palliat Med. 2021;10:9267–75.
  • [24]. Ullah A, Saeed O, Karki NR, et al. Clinicopathological and treatment patterns of combined small-cell lung carcinoma with future insight to treatment: a population-based study. J Clin Med. 2023;12:991.
  • [25]. Xue M, Chen G, Chen XG, Hu JY. Predictors for survival in patients with bone metastasis of small cell lung cancer: a population-based study. Medicine (Baltimore). 2021;100:e27070.
  • [26]. Ogino H, Hanibuchi M, Kakiuchi S, et al. Analysis of the prognostic factors of extensive disease small-cell lung cancer patients in Tokushima University Hospital. J Med Invest. 2016;63:286–93.
  • [27]. Hsieh TL, Chen JJ, Chien CS, et al. Small cell lung cancer with liver and bone metastasis associated with hypercalcemia and acute pancreatitis--a case report. Changgeng Yi Xue Za Zhi. 1995;18:190–3.
  • [28]. Katakami N, Kunikane H, Takeda K, et al. Prospective study on the incidence of bone metastasis (BM) and skeletal-related events (SREs) in patients (pts) with stage IIIB and IV lung cancer-CSP-HOR 13. J Thorac Oncol. 2014;9:231–8.
  • [29]. Li RX, Li X-L, Wu G-J, et al. Analysis of risk factors leading to anxiety and depression in patients with prostate cancer after castration and the construction of a risk prediction model. World J Psychiatry. 2024;14:255–65.
  • [30]. Simões C, Julião M, Calaveiras P, Câmara P, Santos T. Ketamine subcutaneous continuous infusion for depressive symptoms at home: a case report beyond pain use [published online ahead of print May 6, 2024]. Palliat Support Care. doi: 10.1017/S1478951524000798.
  • [31]. Motohashi M, Funauchi Y, Adachi T, et al. A new deep learning algorithm for detecting spinal metastases on computed tomography images. Spine (Phila Pa 1976). 2024;49:390–7.
  • [32]. Wang H, Chen Y, Qiu J, et al. Machine learning based on SPECT/CT to differentiate bone metastasis and benign bone lesions in lung malignancy patients. Med Phys. 2024;51:2578–88.
  • [33]. Xinyang S, Shuang Z, Tianci S, et al. A machine learning radiomics model based on bpMRI to predict bone metastasis in newly diagnosed prostate cancer patients. Magn Reson Imaging. 2024;107:15–23.
  • [34]. Ye H, Shen X, Li Y, et al. Proteomic and metabolomic characterization of bone, liver, and lung metastases in plasma of breast cancer patients. Proteomics Clin Appl. 2024;18:e2300136.
  • [35]. Zhang Y, Xiao L, LYu L, Zhang L. Construction of a predictive model for bone metastasis from first primary lung adenocarcinoma within 3 cm based on machine learning algorithm: a retrospective study. PeerJ. 2024;12:e17098.
  • [36]. Lim J, Jeon H-G, Seo Y, Kim M, Moon JU, Cho SH. Survival prediction model for patients with hepatocellular carcinoma and extrahepatic metastasis based on XGBoost algorithm. J Hepatocell Carcinoma. 2023;10:2251–63.
  • [37]. Wei J, Lu S, Liu W, et al. A machine learning-based model for clinical prediction of distal metastasis in chondrosarcoma: a multicenter, retrospective study. PeerJ. 2023;11:e16485.
  • [38]. Liu L, Wei J, Teng F, et al. Clinicopathological features and prognostic analysis of 247 small cell lung cancer with limited-stage after surgery. Hum Pathol. 2021;108:84–92.
  • [39]. Fan Z, Huang Z, Tong Y, Zhu Z, Huang X, Sun H. Sites of synchronous distant metastases, prognosis, and nomogram for small cell lung cancer patients with bone metastasis: a large cohort retrospective study. J Oncol. 2021;2021:9949714.
  • [40]. Zou J, Guo S, Xiong MT, et al. Ageing as key factor for distant metastasis patterns and prognosis in patients with extensive-stage small cell lung cancer. J Cancer. 2021;12:1575–82.

Articles from Medicine are provided here courtesy of Wolters Kluwer Health
