Abstract
Background:
Chronic obstructive pulmonary disease (COPD) is a common respiratory disease. The severity of an acute exacerbation of COPD (AECOPD) is related to disease progression and risk of death. However, existing grading standards depend mainly on indicators such as respiratory rate, use of accessory respiratory muscles, and changes in consciousness, which are susceptible to subjective judgment. Radiomics (imaging omics) can extract quantitative muscle features for more complex analysis and may offer clinicians a more objective and accurate way to assess disease severity.
Objectives:
The purpose of this study is to construct a severity prediction model based on the combination of chest CT muscle imaging features and clinical data in hospitalized patients with AECOPD.
Methods:
A total of 234 hospitalized patients with AECOPD were retrospectively included: 79 with grade I, 74 with grade II, and 81 with grade III disease. Clinical data and chest CT images were collected. A clinical feature model was combined with a muscle radiomics model, and both were built with Python-based machine learning tools.
Results:
The number of hospitalizations for acute exacerbation in the last year, disease course, risk of acute exacerbation in the stable stage, white blood cell count, neutrophil count, creatinine, and N-terminal pro-B-type natriuretic peptide differed significantly among the three severity grades (all P < 0.05). With the cascade probability combination method, the best model for predicting AECOPD severity was the Xgboost model, with an AUC of 0.890.
Conclusions:
The severity prediction model for AECOPD inpatients, built on clinical data and muscle radiomics features, performed well and may help clinicians stratify the risk of AECOPD inpatients more accurately.
Keywords: Acute exacerbation of chronic obstructive pulmonary disease, Imagomics, Machine Learning, Predictive models
Introduction
Chronic obstructive pulmonary disease (COPD) is a heterogeneous lung condition characterized by chronic respiratory symptoms due to abnormalities of the airways (bronchitis, bronchiolitis) and/or alveoli (emphysema) that cause persistent, often progressive airflow obstruction [1, 2]. Studies have shown that the need for hospitalization is independently associated with mortality in acute exacerbation of chronic obstructive pulmonary disease (AECOPD), and that the risk of death rises with the frequency of acute exacerbations [3]. Current COPD treatment guidelines emphasize day-to-day management of the chronic disease to prevent AECOPD in high-risk patients.
Although several criteria exist for stratifying the severity of AECOPD, the one most commonly used to guide treatment decisions is the protocol recommended by the GOLD 2022 guideline, which divides AECOPD into three levels based on respiratory rate, use of accessory respiratory muscles, altered consciousness, hypoxemia, and hypercapnia. However, these grading indicators are affected by many factors, such as the subjective judgment of doctors and patients and the treatment already given. Imaging techniques such as CTA and duplex ultrasound provide valuable data but have limitations. CTA requires iodinated contrast agents, which carry a risk of contrast-induced nephropathy, especially in the elderly and in patients with pre-existing renal insufficiency, a common comorbidity in severe AECOPD [4, 5]. Duplex ultrasound is operator-dependent and may be limited by patient position. In contrast, non-contrast CT (NCCT) is widely performed on admission in patients with AECOPD, and radiomics analysis of NCCT provides an objective assessment without additional contrast exposure or specialized ultrasound. This makes it a safer and more broadly applicable tool for rapid risk stratification in vulnerable patients.
Sarcopenia is highly prevalent in COPD, and its prevalence increases with disease progression [6]. The severity of sarcopenia in COPD patients is also significantly negatively correlated with pulmonary function classification, disease grouping, and health-related clinical outcomes [7]. Sarcopenia in COPD arises from multiple overlapping mechanisms, including systemic inflammation, oxidative stress (e.g., reactive oxygen species), hypoxemia, hypercapnia, and glucocorticoid use [8, 9]. Studying sarcopenia can therefore partially reflect the systemic state of COPD. Quantitative analysis of the pectoral muscles on chest CT has been studied extensively in COPD, and radiomics, which extracts muscle features for more complex analysis, may reveal predictors of AECOPD severity that traditional assessment methods miss.
Therefore, this study aimed to develop and validate a machine learning model that integrates clinical data and chest CT muscle imaging omics features to predict the severity of hospitalized patients with AECOPD.
Materials and Methods
Case data collection
Patients who met the GOLD 2022 diagnostic criteria for AECOPD and were hospitalized at the Second Affiliated Hospital of Soochow University from December 2020 to December 2023 were retrospectively included.
Inclusion criteria: (1) The study population includes age years old; (2) met the GOLD 2022 diagnostic criteria for AECOPD; (3) completed relevant laboratory examinations and a chest CT scan within 24 hours before or after admission, with complete imaging data; (4) complete clinical data.
Exclusion criteria: (1) other concurrent respiratory diseases, such as asthma, pulmonary embolism, pneumothorax, lung cancer, interstitial lung disease, or active pulmonary infectious diseases; (2) concurrent consumptive or metabolic diseases, such as hyperthyroidism, malignant tumors, and diabetes; (3) severe liver or kidney dysfunction, immune system diseases, cardiovascular diseases, etc., or the acute stage of such diseases; (4) missing or incomplete clinical data; (5) image quality that did not meet requirements, for example because of motion artifacts or insufficient resolution.
A total of 602 hospitalized patients with AECOPD who met the GOLD 2022 diagnostic criteria were screened. Of these, 368 were excluded: 40 patients without a chest CT examination within 24 hours, 121 with other respiratory diseases, 101 with wasting diseases, 45 with missing or incomplete clinical data, and 61 with poor CT image quality. The remaining 234 patients with AECOPD were included in this study. According to the GOLD 2022 AECOPD classification, 79 cases were grade I, 74 were grade II, and 81 were grade III (Fig. 1).
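The inclusion arithmetic can be checked in a few lines. This is only a bookkeeping sketch: it reads 602 as the number of patients assessed, and the dictionary keys are descriptive labels chosen here, not terms from the study protocol.

```python
# Patient counts as reported in the text; the keys are illustrative labels only.
screened = 602
excluded = {
    "no chest CT within 24 hours": 40,
    "other respiratory diseases": 121,
    "wasting diseases": 101,
    "missing or incomplete clinical data": 45,
    "poor CT image quality": 61,
}
grades = {"I": 79, "II": 74, "III": 81}

total_excluded = sum(excluded.values())
included = screened - total_excluded

# The three severity grades should account for every included patient.
print(total_excluded, included, sum(grades.values()))
```

The five exclusion categories account for all patients removed, and the three grades sum to the included cohort.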
Fig. 1.
Flowchart of AECOPD patient inclusion
Clinical data were collected within 24 hours before or after admission, including: age, gender, body mass index (BMI, weight (kg)/height² (m²)), smoking index (number of cigarettes per day × number of years of smoking), number of acute exacerbations in the last year, disease course, whether drug treatment was used, risk of acute exacerbation in the stable phase (divided into low-risk and high-risk groups according to the number of acute exacerbations in the previous year and the modified British Medical Research Council (mMRC) scale or COPD Assessment Test (CAT)) [1], white blood cell (WBC) count, neutrophil (N) count, hemoglobin (Hb), eosinophil (EO) count, high-sensitivity C-reactive protein (hs-CRP), procalcitonin (PCT), creatinine (CRE), prealbumin, albumin, prothrombin time (PT), D-dimer, international normalized ratio (INR), antithrombin III (ATIII), fibrinogen, and N-terminal pro-B-type natriuretic peptide (NT-proBNP). If several examinations were performed within 24 hours before or after admission, the result of the first examination was used.
According to the severity grading method of AECOPD patients recommended by GOLD 2022 guidelines, the study subjects were divided into three groups: non-acute respiratory failure group (grade I), acute respiratory failure non-life-threatening group (grade II) and acute respiratory failure with life-threatening group (grade III), as shown in Table 1.
Table 1.
GOLD 2022 AECOPD patient severity classification
| | Grade I | Grade II | Grade III |
|---|---|---|---|
| Respiratory rate (breaths/min) | 20–30 | > 30 | > 30 |
| Use of accessory respiratory muscles | No | Yes | Yes |
| Altered state of consciousness | No | No | Yes |
| Hypoxemia | Improved by oxygen via Venturi mask at 24–35% concentration | Improved by oxygen via Venturi mask at > 35% concentration | Not improved by oxygen via Venturi mask, or requires > 40% concentration |
| Hypercapnia | No | Yes, PaCO2 increased from baseline or increased to 50–60 mmHg | Yes, PaCO2 increased from baseline or > 60 mmHg, or acidosis (pH ≤ 7.25) |
Muscle segmentation on CT images
Regions of interest (ROIs) were delineated manually in ITK-SNAP (version 4.0.0, https://sourceforge.net/projects/itk-snap/files/itk-snap/4.0.0/). The left and right muscles were identified using a predefined attenuation range of −50 to 90 HU and labeled manually. ROI1 comprised the left and right pectoralis major and pectoralis minor muscles on a single axial slice above the level of the aortic arch. ROI2 comprised the left and right erector spinae muscles on a single axial slice at the level of the twelfth thoracic vertebra. ROI3 comprised the total volume and density of the erector spinae and pectoral muscles on chest CT and the product of the two, which were included with the clinical data (volume is the volume of a single voxel multiplied by the number of voxels in the ROI; density is the mean density, that is, the sum of voxel values in the ROI divided by the number of voxels). The delineations were jointly validated by two experienced radiologists. Inter-observer agreement was assessed using the intraclass correlation coefficient (ICC). Most features had ICC values greater than 0.85, indicating good inter-observer agreement and supporting the reliability of the extracted features.
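The ROI3 quantities follow directly from the definitions above. The sketch below is illustrative only: the function name, the flat-list representation of the image and mask, and the example values are assumptions made here, not the authors' implementation.

```python
def roi_volume_and_density(hu_values, mask, voxel_spacing_mm):
    """Return (volume in mm^3, mean density in HU, volume * density) for one ROI.

    hu_values: flat sequence of voxel attenuation values (HU).
    mask: flat sequence of 0/1 flags of the same length, 1 = voxel inside the ROI.
    voxel_spacing_mm: (x, y, z) voxel size in millimetres.
    """
    sx, sy, sz = voxel_spacing_mm
    voxel_volume = sx * sy * sz
    roi = [hu for hu, inside in zip(hu_values, mask) if inside]
    if not roi:
        raise ValueError("ROI mask is empty")
    volume = voxel_volume * len(roi)   # single-voxel volume x number of voxels
    density = sum(roi) / len(roi)      # sum of voxel values / number of voxels
    return volume, density, volume * density


# Hypothetical four-voxel example: three muscle voxels and one excluded voxel.
volume, density, product = roi_volume_and_density(
    hu_values=[40, 50, 60, -200],
    mask=[1, 1, 1, 0],
    voxel_spacing_mm=(1.0, 1.0, 5.0),
)
print(volume, density, product)
```

In practice the image and mask would come from the CT volume and the ITK-SNAP segmentation; flat lists are used here only to keep the sketch free of dependencies.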
After normalization and preprocessing, radiomics features were extracted. A total of 1874 quantitative radiomics features were extracted from each region of interest.
Feature screening and model construction
The synthetic minority over-sampling technique (SMOTE) was used to balance the classes in the extracted radiomics feature set. After standardized preprocessing of the radiomics features of the erector spinae and pectoral muscles extracted from CT images, least absolute shrinkage and selection operator (LASSO) regression was applied for feature dimensionality reduction, yielding a set of radiomics features for assessing AECOPD risk classification. The 23 clinical variables were likewise screened with the LASSO algorithm to obtain the clinical variables used to evaluate AECOPD risk classification. Missing clinical values were replaced by the median of the same batch of samples. Models were built from the selected clinical variables and radiomics features. Six common machine learning classifiers were trained: Nu-support vector classification (Nu-SVC), C-support vector classification (C-SVC), logistic regression (LR), random forest (RF), adaptive boosting (Adaboost), and eXtreme Gradient Boosting (Xgboost). Experiments were run on the training and test sets using five repetitions of five-fold cross-validation.
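Two of the steps above, median imputation and repeated five-fold cross-validation, can be sketched with the standard library alone. The function names, the stratified fold assignment, and the random seed are assumptions for illustration; the text does not state whether the folds were stratified or which library produced them.

```python
import random
from statistics import median


def impute_median(column):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def repeated_stratified_folds(labels, n_splits=5, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) with similar class balance per fold."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        fold_of = [None] * len(labels)
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            # Deal the shuffled samples of each class round-robin into the folds.
            for position, i in enumerate(idx):
                fold_of[i] = position % n_splits
        for fold in range(n_splits):
            test_idx = [i for i, f in enumerate(fold_of) if f == fold]
            train_idx = [i for i, f in enumerate(fold_of) if f != fold]
            yield repeat, fold, train_idx, test_idx


# Class counts from the text: grades I-II (153 patients) versus grade III (81).
labels = [0] * 153 + [1] * 81
splits = list(repeated_stratified_folds(labels))
print(len(splits), impute_median([1.0, None, 3.0, 5.0]))
```

Five repetitions of five folds give 25 train/test partitions, each covering all 234 patients exactly once.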
In this study, a cascade classification strategy was used to establish a prediction model of AECOPD severity. First, a two-class model was constructed to distinguish non-life-threatening cases (grade I and II) from life-threatening cases (grade III), and then a respiratory failure grading model (grade I and II) was established for non-life-threatening cases. Finally, a three-class classification model of AECOPD severity was obtained by cascade probability combination method.
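One plausible reading of the cascade probability combination is sketched below. The exact combination rule is not given in the text, so the multiplication of the two stage probabilities, the function name, and the example values are all assumptions.

```python
def cascade_probabilities(p_grade3, p_grade2_given_not3):
    """Combine two binary stages into three class probabilities.

    p_grade3: stage 1 probability that the case is life-threatening (grade III).
    p_grade2_given_not3: stage 2 probability of grade II among non-life-threatening cases.
    Assumed rule: the stage 2 output is weighted by the stage 1 probability of "not grade III".
    """
    for p in (p_grade3, p_grade2_given_not3):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    p_not3 = 1.0 - p_grade3
    return {
        "I": p_not3 * (1.0 - p_grade2_given_not3),
        "II": p_not3 * p_grade2_given_not3,
        "III": p_grade3,
    }


# Hypothetical stage outputs for a single patient.
probs = cascade_probabilities(p_grade3=0.2, p_grade2_given_not3=0.75)
predicted = max(probs, key=probs.get)
print(probs, predicted)
```

Under this rule the three probabilities always sum to one, so the predicted grade is simply the class with the largest combined probability.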
Statistical analysis
SPSS 26.0 was used for statistical analysis. The basic characteristics of the sample were summarized with descriptive statistics, and continuous variables were tested for normality. Normally distributed data were expressed as mean ± standard deviation (x̄ ± s), and non-normally distributed data as median and interquartile range [M (Q1, Q3)]. Categorical variables were expressed as frequency and percentage, N (%). One-way ANOVA was used for normally distributed continuous variables, and the non-parametric Kruskal–Wallis H test for non-normally distributed continuous variables. The chi-square test or Fisher's exact test was used for categorical variables. P < 0.05 was considered statistically significant.
Receiver operating characteristic (ROC) curve analysis was used to evaluate the predictive performance of each model. The area under the curve (AUC) with 95% confidence interval (CI), accuracy, sensitivity, specificity, and precision were calculated, and the classification performance of the different algorithms was compared on the test set.
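The evaluation metrics named above have standard definitions, sketched here for a binary task. The confusion-matrix counts and the scores are hypothetical values chosen for illustration; the paper reports no confusion matrices, and the method used for the 95% CI is not stated, so it is not reproduced.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, and precision from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),     # positive predictive value
    }


def auc_from_scores(labels, scores):
    """AUC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, for illustration only.
metrics = binary_metrics(tp=12, fn=3, tn=25, fp=3)
auc = auc_from_scores([1, 1, 0, 0, 1, 0], [0.9, 0.4, 0.5, 0.2, 0.7, 0.1])
print({k: round(v, 3) for k, v in metrics.items()}, round(auc, 3))
```

The pairwise definition of AUC used here is equivalent to the area under the empirical ROC curve.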
Results
Baseline characteristics of hospitalized patients
A total of 234 hospitalized patients with AECOPD were included and divided into 79 cases with grade I, 74 cases with grade II, and 81 cases with grade III according to the GOLD 2022 AECOPD severity grading criteria. The clinical data of the patients are shown in Table 2.
Table 2.
Comparison results of clinical data among different severity degrees of hospitalized patients with AECOPD
| Indicators | Grade I (n=79) | Grade II (n=74) | Grade III (n=81) | Statistical value | P |
|---|---|---|---|---|---|
| Gender (n, %) | | | | 0.078 | 0.962 |
| Male | 66 (83.50) | 63 (85.10) | 68 (84.00) | | |
| Female | 13 (16.50) | 11 (14.90) | 13 (16.00) | | |
| Age (years, x̄ ± s) | 74.19 ± 8.48 | 77.05 ± 7.85 | 74.94 ± 8.37 | 2.461 | 0.088 |
| BMI (kg/m², x̄ ± s) | 22.66 ± 3.49 | 21.93 ± 3.77 | 22.64 ± 4.81 | 0.729 | 0.484 |
| Smoking index (cigarettes/day × years, x̄ ± s) | 593.18 ± 567.10 | 529.54 ± 468.95 | 533.22 ± 531.74 | 0.309 | 0.734 |
| Number of hospitalizations for acute exacerbations in the last year (times, x̄ ± s) | 0.47 ± 0.68 | 0.75 ± 1.08 | 1.09 ± 1.10 | 7.986 | < 0.001 |
| Duration of disease (years, x̄ ± s) | 9.85 ± 11.26 | 13.61 ± 12.01 | 13.66 ± 9.37 | 3.144 | 0.045 |
| Use of medication (n, %) | | | | 3.562 | 0.168 |
| Yes | 28 (35.40) | 36 (48.60) | 39 (48.10) | | |
| No | 51 (64.60) | 38 (51.40) | 42 (51.90) | | |
| Risk of exacerbation in stable phase (n, %) | | | | 16.592 | < 0.001 |
| Low | 41 (51.90) | 29 (39.20) | 17 (21.00) | | |
| High | 38 (48.10) | 44 (59.50) | 64 (79.00) | | |
| WBC (×10⁹/L, x̄ ± s) | 7.38 ± 2.78 | 8.93 ± 3.54 | 9.28 ± 4.28 | 6.193 | 0.002 |
| N (×10⁹/L, x̄ ± s) | 5.22 ± 2.55 | 6.81 ± 3.42 | 7.22 ± 3.65 | 8.315 | < 0.001 |
| PCT (ng/ml, x̄ ± s) | 0.08 ± 0.08 | 0.34 ± 1.55 | 0.16 ± 0.28 | 1.406 | 0.248 |
| Hb (g/L, x̄ ± s) | 132.33 ± 15.65 | 137.09 ± 15.32 | 137.63 ± 21.28 | 2.118 | 0.123 |
| EO (×10⁹/L, x̄ ± s) | 0.15 ± 0.17 | 0.15 ± 0.18 | 0.10 ± 0.19 | 1.890 | 0.153 |
| hs-CRP (mg/L, x̄ ± s) | 16.29 ± 28.84 | 29.30 ± 56.11 | 20.79 ± 26.33 | 1.964 | 0.143 |
| CRE (μmol/L, x̄ ± s) | 85.14 ± 27.16 | 76.66 ± 29.22 | 70.98 ± 23.70 | 5.684 | 0.004 |
| Albumin (g/L, x̄ ± s) | 39.11 ± 4.10 | 39.89 ± 4.60 | 38.44 ± 5.08 | 1.854 | 0.159 |
| Prealbumin (g/L, x̄ ± s) | 0.20 ± 0.07 | 0.18 ± 0.06 | 0.18 ± 0.06 | 1.215 | 0.300 |
| PT (s, x̄ ± s) | 12.54 ± 1.20 | 12.62 ± 1.15 | 12.97 ± 1.38 | 2.490 | 0.085 |
| D-dimer (μg/ml, x̄ ± s) | 0.79 ± 1.41 | 0.57 ± 0.55 | 0.84 ± 0.95 | 1.388 | 0.252 |
| INR (x̄ ± s) | 1.01 ± 0.08 | 1.03 ± 0.09 | 1.04 ± 0.13 | 1.722 | 0.181 |
| AT III (%, x̄ ± s) | 86.52 ± 10.82 | 85.75 ± 12.18 | 83.89 ± 12.91 | 0.925 | 0.398 |
| Fibrinogen (g/L, x̄ ± s) | 3.71 ± 1.47 | 4.21 ± 1.39 | 3.86 ± 1.07 | 2.621 | 0.075 |
| NT-proBNP (pg/ml, x̄ ± s) | 395.46 ± 988.56 | 424.02 ± 759.75 | 1291.92 ± 2084.92 | 9.117 | < 0.001 |
*, P < 0.05; **, P < 0.01; ***, P < 0.001. a, P < 0.05 compared with grade I; b, P < 0.05 compared with grade II; c, P < 0.05 compared with grade III
Chi-square tests and one-way analysis of variance showed that the number of hospitalizations for acute exacerbation in the last year (P < 0.001), disease course (P = 0.045), risk of exacerbation in the stable phase (P < 0.001), white blood cell count (P = 0.002), neutrophil count (P < 0.001), creatinine (P = 0.004), and NT-proBNP (P < 0.001) differed significantly among the three groups.
Construction of joint model
Grades I, II and III are classified according to whether they are life-threatening
The LASSO algorithm was used to screen the extracted radiomics features, yielding 27 features with non-zero coefficients: 18 texture features (8 GLCM, 5 NGTDM, 3 GLSZM, 2 GLDM) and 9 first-order gray-level histogram features, as shown in Fig. 2 and Table 3.
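Why LASSO leaves only some coefficients non-zero can be seen in a small coordinate-descent sketch. This is a didactic toy under stated assumptions (centered inputs, no intercept, a hand-picked penalty `alpha`, made-up data); it is not the solver or the settings used in the study.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become exactly 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/2n)*||y - Xb||^2 + alpha*||b||_1 for centered X and y."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            z = 0.0
            for i in range(n):
                # Prediction from every feature except j (partial residual).
                pred_without_j = sum(coef[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            coef[j] = soft_threshold(rho / n, alpha) / (z / n)
    return coef


# Toy data: the outcome depends on feature 0 only; feature 1 is uninformative.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
coef = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [j for j, c in enumerate(coef) if c != 0.0]
print(coef, selected)
```

The uninformative feature is driven exactly to zero while the informative one is shrunk but kept, which is the behaviour that feature screening by non-zero coefficients relies on.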
Fig. 2.
Imaging omics results after LASSO screening
Table 3.
Names and coefficients of 27 imagomics features with non-zero coefficients
| Feature Name | Image Type | Feature Type | Feature description | Weight coefficient |
|---|---|---|---|---|
| log-sigma-1-5-mm-3D_glszm_SmallAreaEmphasis | Laplacian of Gaussian | Texture features | Small area emphasis | −0.124194090 |
| squareroot_ngtdm_Busyness | Square root | Texture features | Busyness | 0.053362835 |
| wavelet-LLH_gldm_DependenceNonUniformityNormalized | Wavelet | Texture features | Dependence non-uniformity (normalized) | 0.021936254 |
| wavelet-LHH_firstorder_Median | Wavelet | First-order gray histogram features | Median | 0.171763434 |
| log-sigma-1-mm-3D_glcm_MCC | Laplacian of Gaussian | Texture features | Maximal correlation coefficient | 0.088768429 |
| logarithm_ngtdm_Strength | Logarithm | Texture features | Strength | 0.075331320 |
| original_firstorder_Mean | Original image | First-order gray histogram features | Mean | −0.060474813 |
| wavelet-LLH_glcm_MCC | Wavelet | Texture features | Maximal correlation coefficient | −0.057410093 |
| square_firstorder_InterquartileRange | Square | First-order gray histogram features | Interquartile range | −0.050552331 |
| lbp-3D-k_ngtdm_Busyness | Local binary pattern | Texture features | Busyness | −0.050211373 |
| exponential_glszm_SizeZoneNonUniformity | Exponential | Texture features | Size-zone non-uniformity | 0.048018975 |
| wavelet-LHH_glcm_ClusterShade | Wavelet | Texture features | Cluster shade | 0.047864251 |
| lbp-3D-m1_firstorder_Median | Local binary pattern | First-order gray histogram features | Median | 0.045189763 |
| wavelet-HLL_gldm_LargeDependenceHighGrayLevelEmphasis | Wavelet | Texture features | Large dependence high gray-level emphasis | −0.038146203 |
| wavelet-HHH_ngtdm_Contrast | Wavelet | Texture features | Contrast | −0.032083419 |
| gradient_glcm_Idmn | Gradient | Texture features | Inverse difference moment normalized | 0.030651755 |
| wavelet-HLL_firstorder_Mean | Wavelet | First-order gray histogram features | Mean | 0.028701025 |
| squareroot_glcm_ClusterProminence | Square root | Texture features | Cluster prominence | −0.028196745 |
| log-sigma-1-5-mm-3D_firstorder_RootMeanSquared | Laplacian of Gaussian | First-order gray histogram features | Root mean squared | −0.025216849 |
| square_glcm_MCC | Square | Texture features | Maximal correlation coefficient | 0.021900238 |
| wavelet-HLH_firstorder_Median | Wavelet | First-order gray histogram features | Median | −0.019751195 |
| wavelet-HHL_glszm_LargeAreaLowGrayLevelEmphasis | Wavelet | Texture features | Large area low gray-level emphasis | 0.016001842 |
| wavelet-LLH_glcm_Contrast | Wavelet | Texture features | Contrast | 0.015893369 |
| wavelet-HHH_firstorder_Median | Wavelet | First-order gray histogram features | Median | −0.014626083 |
| squareroot_ngtdm_Strength | Square root | Texture features | Strength | 0.014449562 |
| wavelet-LLH_firstorder_Median | Wavelet | First-order gray histogram features | Median | 0.004796186 |
| log-sigma-1-mm-3D_glcm_Idmn | Laplacian of Gaussian | Texture features | Inverse difference moment normalized | −0.004576479 |
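The feature names in Table 3 follow an `imageType_featureClass_featureName` pattern, so the class counts quoted in the text can be recomputed from the names. The parser below is a hypothetical helper written for this check, and the names are typed with plain hyphens (for example `log-sigma-1-5-mm-3D`), which is assumed to be the intended spelling.

```python
from collections import Counter


def parse_feature_name(name):
    """Split 'imageType_featureClass_featureName' into its three parts."""
    image_type, feature_class, feature = name.rsplit("_", 2)
    return image_type, feature_class, feature


# The 27 feature names listed in Table 3.
table3_features = [
    "log-sigma-1-5-mm-3D_glszm_SmallAreaEmphasis",
    "squareroot_ngtdm_Busyness",
    "wavelet-LLH_gldm_DependenceNonUniformityNormalized",
    "wavelet-LHH_firstorder_Median",
    "log-sigma-1-mm-3D_glcm_MCC",
    "logarithm_ngtdm_Strength",
    "original_firstorder_Mean",
    "wavelet-LLH_glcm_MCC",
    "square_firstorder_InterquartileRange",
    "lbp-3D-k_ngtdm_Busyness",
    "exponential_glszm_SizeZoneNonUniformity",
    "wavelet-LHH_glcm_ClusterShade",
    "lbp-3D-m1_firstorder_Median",
    "wavelet-HLL_gldm_LargeDependenceHighGrayLevelEmphasis",
    "wavelet-HHH_ngtdm_Contrast",
    "gradient_glcm_Idmn",
    "wavelet-HLL_firstorder_Mean",
    "squareroot_glcm_ClusterProminence",
    "log-sigma-1-5-mm-3D_firstorder_RootMeanSquared",
    "square_glcm_MCC",
    "wavelet-HLH_firstorder_Median",
    "wavelet-HHL_glszm_LargeAreaLowGrayLevelEmphasis",
    "wavelet-LLH_glcm_Contrast",
    "wavelet-HHH_firstorder_Median",
    "squareroot_ngtdm_Strength",
    "wavelet-LLH_firstorder_Median",
    "log-sigma-1-mm-3D_glcm_Idmn",
]

class_counts = Counter(parse_feature_name(n)[1] for n in table3_features)
texture_total = sum(c for cls, c in class_counts.items() if cls != "firstorder")
print(dict(class_counts), texture_total)
```

The tally agrees with the breakdown given in the text: 18 texture features across four matrix types plus 9 first-order features.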
The LASSO algorithm was used to screen the 23 clinical variables, and 13 were retained: NT-proBNP, antithrombin III, D-dimer, prothrombin time, albumin, prealbumin, creatinine, eosinophil count, neutrophil count, medication use, smoking index, BMI, and sex (Fig. 3 and Table 4).
Fig. 3.
Clinical data results after LASSO screening
Table 4.
Names and coefficients of 13 clinical data with non-zero coefficients
| Name | Weight coefficient |
|---|---|
| NT-proBNP | 0.086622414 |
| CRE | −0.060036175 |
| Drug | −0.0546340231 |
| PT | 0.053007572 |
| EO | −0.039905134 |
| pre-ALB | 0.030660692 |
| gender | −0.022259690 |
| N | 0.021396957 |
| BMI | 0.015585371 |
| ATIII | −0.011841793 |
| ALB | −0.007673772 |
| D-dimer | 0.002599356 |
| smoking index | −0.000825472 |
The 27 radiomics features and 13 clinical variables above, together with the total volume and density of the erector spinae and pectoral muscles on chest CT and their product, gave 43 features in total, which were input into the six classifiers to construct a combined model. In the training group, the best model was the Xgboost model, with an AUC of 0.935 (95% CI, 0.921–0.948) and accuracy, sensitivity, specificity, and precision of 0.863, 0.910, 0.816, and 0.832, respectively. In the test group, the best model was the RF model, with an AUC of 0.931 (95% CI, 0.853–1.000) and accuracy, sensitivity, specificity, and precision of 0.860, 0.800, 0.893, and 0.800, respectively (Fig. 4 and Table 5).
Fig. 4.
ROC curves of CT imaging omics combined with clinical data model for predicting life-threatening AECOPD. (a) ROC curve of the training cohort. (b) ROC curve of the test cohort.
Table 5.
Diagnostic performance of CT imaging omics combined with clinical data model in predicting whether AECOPD is life-threatening in the training and test groups
| Model | Cohort | AUC (95% CI) | Accuracy | Sensitivity | Specificity | Precision |
|---|---|---|---|---|---|---|
| CSVC | Training | 0.921 (0.905, 0.936) | 0.844 | 0.836 | 0.852 | 0.850 |
| | Test | 0.860 (0.751, 0.968) | 0.721 | 0.467 | 0.857 | 0.636 |
| LR | Training | 0.912 (0.896, 0.928) | 0.838 | 0.813 | 0.862 | 0.855 |
| | Test | 0.840 (0.705, 0.976) | 0.744 | 0.933 | 0.643 | 0.583 |
| RF | Training | 0.931 (0.917, 0.945) | 0.861 | 0.846 | 0.877 | 0.873 |
| | Test | 0.931 (0.853, 1) | 0.860 | 0.800 | 0.893 | 0.800 |
| Adaboost | Training | 0.918 (0.903, 0.934) | 0.856 | 0.854 | 0.857 | 0.857 |
| | Test | 0.9 (0.799, 1) | 0.791 | 0.800 | 0.786 | 0.667 |
| Xgboost | Training | 0.935 (0.921, 0.948) | 0.863 | 0.910 | 0.816 | 0.832 |
| | Test | 0.919 (0.840, 0.998) | 0.814 | 0.867 | 0.786 | 0.684 |
| NuSVC | Training | 0.924 (0.910, 0.939) | 0.849 | 0.849 | 0.849 | 0.849 |
| | Test | 0.858 (0.749, 0.968) | 0.767 | 0.733 | 0.786 | 0.647 |
Grade I and Grade II were classified according to the presence or absence of respiratory failure
The LASSO algorithm was used to screen the extracted radiomics features, yielding 21 features with non-zero coefficients: 14 texture features (4 GLCM, 3 GLSZM, 3 GLDM, 3 NGTDM, 1 GLRLM), 6 first-order gray-level histogram features, and 1 shape feature, as shown in Fig. 5 and Table 6.
Fig. 5.
Imagomics results after LASSO screening
Table 6.
Names and coefficients of 21 imagomics features with non-zero coefficients
| Feature Name | Image Type | Feature Type | Feature description | Weight coefficient |
|---|---|---|---|---|
| exponential_firstorder_Range | Exponential | First-order gray histogram features | Range | −0.098551556 |
| wavelet-LLL_glszm_GrayLevelNonUniformityNormalized | Wavelet | Texture features | Gray-level non-uniformity (normalized) | −0.089025666 |
| original_shape_Maximum2DDiameterRow | Original image | Shape features | Maximum 2D diameter (row) | 0.069989043 |
| squareroot_ngtdm_Busyness | Square root | Texture features | Busyness | 0.055137651 |
| log-sigma-1-5-mm-3D_glcm_Idmn | Laplacian of Gaussian | Texture features | Inverse difference moment normalized | 0.052796116 |
| gradient_ngtdm_Coarseness | Gradient | Texture features | Coarseness | −0.051926800 |
| wavelet-LLL_glcm_MCC | Wavelet | Texture features | Maximal correlation coefficient | 0.044209935 |
| wavelet-LHL_firstorder_90Percentile | Wavelet | First-order gray histogram features | 90th percentile | 0.043427594 |
| wavelet-LHH_firstorder_Kurtosis | Wavelet | First-order gray histogram features | Kurtosis | −0.043065732 |
| wavelet-LHL_glrlm_LongRunHighGrayLevelEmphasis | Wavelet | Texture features | Long-run high gray-level emphasis | 0.042273364 |
| log-sigma-0-5-mm-3D_gldm_DependenceVariance | Laplacian of Gaussian | Texture features | Dependence variance | 0.033456475 |
| log-sigma-1-mm-3D_glszm_LargeAreaHighGrayLevelEmphasis | Laplacian of Gaussian | Texture features | Large area high gray-level emphasis | 0.026504495 |
| exponential_firstorder_Kurtosis | Exponential | First-order gray histogram features | Kurtosis | −0.024836586 |
| squareroot_glszm_ZoneEntropy | Square root | Texture features | Zone entropy | 0.021905257 |
| log-sigma-1-5-mm-3D_firstorder_Skewness | Laplacian of Gaussian | First-order gray histogram features | Skewness | −0.020516155 |
| wavelet-LLH_firstorder_Skewness | Wavelet | First-order gray histogram features | Skewness | 0.014138977 |
| wavelet-LHH_ngtdm_Contrast | Wavelet | Texture features | Contrast | 0.013612707 |
| log-sigma-1-mm-3D_glcm_MCC | Laplacian of Gaussian | Texture features | Maximal correlation coefficient | 0.012286738 |
| log-sigma-1-mm-3D_glcm_Imc2 | Laplacian of Gaussian | Texture features | Informational measure of correlation 2 | 0.008612290 |
| gradient_gldm_DependenceVariance | Gradient | Texture features | Dependence variance | 0.005168462 |
| wavelet-HLL_gldm_DependenceEntropy | Wavelet | Texture features | Dependence entropy | −0.004487384 |
The LASSO algorithm was used to screen the 23 clinical variables, and 10 were retained: albumin, creatinine, procalcitonin, hemoglobin, neutrophil count, risk of exacerbation in the stable phase, medication use, disease course, number of hospitalizations for acute exacerbation in the last year, and age (Fig. 6 and Table 7).
Fig. 6.
Results of clinical data after LASSO screening
Table 7.
Names and coefficients of 10 clinical data with non-zero coefficients
| Name | Weight coefficient |
|---|---|
| age | 0.0952147161998937 |
| N | 0.0903028993339369 |
| risk of aggravation | 0.0612629371169276 |
| PCT | 0.0480171498769332 |
| CRE | −0.0449644822964078 |
| ALB | 0.0363473267703825 |
| Hb | 0.0319401213840267 |
| length of disease | 0.0239407523510268 |
| Drug | −0.0175317008883738 |
| Hospitalization frequency | 0.0160001349474592 |
The 21 radiomics features and 10 clinical variables above, together with the chest CT erector spinae volume and density and their product, gave 34 features in total, which were input into the six classifiers to construct a combined model. In the training group, the best model was the Nu-SVC model, with an AUC of 0.889 (95% CI, 0.864–0.915) and accuracy, sensitivity, specificity, and precision of 0.812, 0.821, 0.803, and 0.793, respectively. In the test group, the best model was the Xgboost model, with an AUC of 0.871 (95% CI, 0.746–0.996) and accuracy, sensitivity, specificity, and precision of 0.700, 1.000, 0.400, and 0.625, respectively (Fig. 7 and Table 8).
Fig. 7.
ROC curves of CT imaging omics combined with clinical data model for discriminating between Grade I and Grade II AECOPD. (a) ROC curve of the training cohort. (b) ROC curve of the test cohort.
Table 8.
Diagnostic performance of CT imaging omics combined with clinical data model in predicting AECOPD grades I and II in training and test groups
| Model | Cohort | AUC (95% CI) | Accuracy | Sensitivity | Specificity | Precision |
|---|---|---|---|---|---|---|
| CSVC | Training | 0.591 (0.381, 0.801) | 0.533 | 0.533 | 0.533 | 0.533 |
| | Test | 0.860 (0.751, 0.968) | 0.721 | 0.467 | 0.857 | 0.636 |
| LR | Training | 0.858 (0.827, 0.890) | 0.817 | 0.814 | 0.819 | 0.805 |
| | Test | 0.693 (0.498, 0.888) | 0.7 | 0.867 | 0.533 | 0.65 |
| RF | Training | 0.761 (0.722, 0.799) | 0.683 | 0.662 | 0.702 | 0.671 |
| | Test | 0.858 (0.725, 0.991) | 0.733 | 0.933 | 0.533 | 0.667 |
| Adaboost | Training | 0.751 (0.712, 0.790) | 0.716 | 0.686 | 0.743 | 0.711 |
| | Test | 0.684 (0.487, 0.882) | 0.667 | 0.733 | 0.6 | 0.64 |
| Xgboost | Training | 0.722 (0.682, 0.762) | 0.671 | 0.738 | 0.61 | 0.635 |
| | Test | 0.871 (0.746, 0.996) | 0.7 | 1.00 | 0.4 | 0.625 |
| NuSVC | Training | 0.889 (0.864, 0.915) | 0.812 | 0.821 | 0.803 | 0.793 |
| | Test | 0.667 (0.465, 0.868) | 0.6 | 0.667 | 0.533 | 0.588 |
Construction of three-classification model
The results of the two test sets above were combined using the cascade probability method. The best model for predicting AECOPD severity was the Xgboost model, with an AUC of 0.890 (95% CI, 0.819–0.961) and accuracy, sensitivity, specificity, and precision of 0.767, 0.933, 0.651, and 0.651, respectively (Fig. 8 and Table 9).
Fig. 8.

ROC curves for the three-class prediction of AECOPD severity (Grade I, II, and III) in the test cohort
Table 9.
Diagnostic performance of predicted AECOPD three categories in the test group
| Model | AUC (95% CI) | Accuracy | Sensitivity | Specificity | Precision |
|---|---|---|---|---|---|
| CSVC | 0.761 (0.648, 0.875) | 0.761 | 0.5 | 0.744 | 0.577 |
| LR | 0.783 (0.676, 0.900) | 0.726 | 0.900 | 0.605 | 0.614 |
| RF | 0.877 (0.798, 0.956) | 0.781 | 0.833 | 0.744 | 0.694 |
| Adaboost | 0.798 (0.696, 0.900) | 0.740 | 0.767 | 0.721 | 0.657 |
| Xgboost | 0.890 (0.819, 0.961) | 0.767 | 0.933 | 0.651 | 0.651 |
| NuSVC | 0.785 (0.674, 0.895) | 0.699 | 0.667 | 0.721 | 0.625 |
Discussion
This study constructed a machine learning model that combines chest CT muscle radiomics with clinical data to predict disease severity in hospitalized patients with AECOPD. To our knowledge, this is the first study to combine clinical data with muscle radiomics features to predict AECOPD severity. In the test cohort, the model discriminated well between life-threatening and non-life-threatening cases (best AUC 0.931), between grade I and grade II (best AUC 0.871), and across the three severity grades (best AUC 0.890).
Many previous studies have analyzed the correlation between CT imaging features and stable COPD. For example, texture analysis has proved effective in assessing the degree of emphysema. Ginsburg et al. [10] showed that a texture-based approach could classify the lungs of never smokers, smokers without emphysema, and smokers with emphysema, suggesting that early smoking-related lung injury can be identified before emphysema develops. Lafata et al. [11] showed that radiomics features extracted from CT images have the potential to quantify changes in lung function and correlate with spirometry. The same radiomics approach could be extended to other established COPD markers, such as the FEV1/FVC ratio or exacerbation frequency, to support more accurate assessment of COPD severity. Tanimura et al. [12] followed 130 male patients and 20 male smoking controls for 5–10 years to explore the relationship between the erector spinae, one of the anti-gravity muscle groups, and survival in COPD. The cross-sectional area of the erector spinae was lower in COPD patients than in smoking controls, reflected both physical activity and COPD severity, and predicted mortality in COPD patients (HR, 0.85; 95% CI, 0.79–0.92; P < 0.001).
With the rapid development of medical imaging technology, many studies have used high-resolution CT to depict lung structure, airway dilation, emphysema, and other characteristics, providing a new basis and new means for the diagnosis and grading of COPD [12–14]. As a newer research method, radiomics uses computer analysis to extract a large number of quantitative features from medical images and has shown value in the prediction, diagnosis, and treatment evaluation of many diseases [15–18]. Radiomics can reflect the morphology, texture, and other properties of lesions, help reveal latent disease patterns and pathological changes, and shows potential in the clinical management of complex diseases. Using chest CT to detect changes in muscle quantity and quality can support individualized intervention and help estimate the prognosis of COPD exacerbation [19].
Compared with previous studies, this study has several key novelties. Because sarcopenia is a common comorbidity of COPD [20], this study did not rely on indicators susceptible to subjective judgment but systematically extracted muscle imaging omics features from chest CT, which ensured the use of objective physiological information and reduced potential prediction errors. In addition, most previous prediction models addressed stable COPD, whereas this model addresses the severity of acute exacerbations, which may benefit the daily management of patients and help reduce disease burden.
In this study, machine learning showed that most of the features of the pectoralis major, pectoralis minor and erector spinae on chest CT that were strongly correlated with the severity of acute exacerbation of COPD were texture features. Among the 27 imaging omics features selected for the life-threatening classification, the wavelet-LHH first-order median feature had a high relative weight, and the remaining features were mainly wavelet-transform texture features. Among the 21 imaging omics features selected to distinguish severity grade I from grade II, the exponential first-order range feature had a higher relative weight, and the remaining features were mainly wavelet-transform and Laplacian of Gaussian texture features. This is consistent with the view of Makimoto et al. [21] that radiomics can quantify subtle tissue heterogeneity imperceptible to the naked eye. For example, features such as “small area emphasis” may indicate the fragmentation of homogeneous muscle tissue into smaller, more distinct regions [17, 18, 22], which may correspond pathophysiologically to fat infiltration or fibrosis, a common feature of sarcopenia in COPD. Features such as “busyness” and certain first-order statistics (e.g., median and intensity) may reflect changes in muscle metabolic activity and structural integrity. Positively weighted features may be associated with preserved muscle structure and higher metabolic demands, whereas negatively weighted features may suggest degenerative changes and hypofunction. In addition, indicators such as size zone non-uniformity and dependence non-uniformity may characterize the distribution of lesions, while parameters such as inverse difference moment normalized and contrast may help to evaluate the local inflammatory state. Radiomics features are obtained by applying various quantification methods to medical images, capturing characteristics that are difficult to recognize with the naked eye.
These radiomic features allow changes on the muscle cross section caused by muscle gain or loss to be quantified as radiological indicators. Image texture features reflect the gray-level distribution and imply the heterogeneous composition or spatial distribution of lesions [23]. Notably, most of the features selected for the radiomics model were high-order statistical features, that is, indicators of texture and heterogeneity that can more clearly show subtle changes in tissue morphology [24]. In this study, multiple muscle radiomics features were ultimately used to construct the prediction models, and most of them describe the spatial distribution of voxels (high-order texture features and wavelet features), which can amplify subtle differences and may allow more accurate evaluation than general radiological parameters.
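To illustrate why texture features can separate tissue patterns that first-order statistics cannot, the following minimal sketch compares two toy regions with identical gray-level histograms but different spatial arrangements. It is not the feature extraction pipeline used in this study: the function names, the toy ROIs and the simplified horizontal-neighbor co-occurrence contrast are all assumptions made for illustration.

```python
from statistics import median


def first_order_features(roi):
    """Return the median and range of all gray levels in a 2D ROI."""
    values = [v for row in roi for v in row]
    return {"median": median(values), "range": max(values) - min(values)}


def glcm_contrast(roi):
    """Contrast of a symmetric, normalized co-occurrence matrix.

    Simplification (assumption): only horizontally adjacent pixel
    pairs are counted, and gray levels are used without re-binning.
    """
    counts = {}
    total = 0
    for row in roi:
        for a, b in zip(row, row[1:]):
            for pair in ((a, b), (b, a)):  # count both directions
                counts[pair] = counts.get(pair, 0) + 1
                total += 1
    # Contrast weights each co-occurrence by its squared gray-level gap.
    return sum((i - j) ** 2 * c / total for (i, j), c in counts.items())


# Hypothetical toy ROIs: same gray levels, different spatial layout.
blocks = [[1, 1, 3, 3], [1, 1, 3, 3]]    # two homogeneous patches
checker = [[1, 3, 1, 3], [3, 1, 3, 1]]   # finely interleaved pattern

for name, roi in (("blocks", blocks), ("checker", checker)):
    print(name, first_order_features(roi), round(glcm_contrast(roi), 3))
```

Both toy regions share the same median and range, yet the interleaved pattern yields a higher contrast, which is the kind of spatial heterogeneity that texture features are intended to capture.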
COPD is a systemic inflammatory disease involving multiple systems. The severity of an acute exacerbation is related not only to respiratory rate, the degree of hypoxemia and the degree of consciousness disturbance, but also to the combined effects of the systemic inflammatory response, skeletal muscle wasting and the functional status of multiple organs. In this study, a machine learning prediction model was constructed by combining multi-dimensional data, namely patient clinical data and chest CT muscle imaging omics features, which provides a new direction for accurate stratification of acute exacerbation severity and a basis for individualized treatment strategies.
Clinical feasibility and implementation prospects
The models constructed in this study show good potential for clinical translation. The proposed clinical implementation workflow can be briefly described as follows: for patients hospitalized with AECOPD, after completing routine admission chest CT and blood tests, (1) radiologists or trained AI algorithms quickly outline the ROIs of the pectoralis major, pectoralis minor, and erector spinae on CT images; (2) automated software extracts radiomics features from the ROIs and integrates them with the patient’s clinical data; (3) the integrated data are fed into a pre-trained RF model, which outputs the probability that the patient’s severity is grade III; (4) if the severity is not grade III, the data are fed into a pre-trained Xgboost model, which outputs the probability that the patient’s severity is grade I or grade II. This provides clinicians with an objective decision support tool. The workflow uses existing clinical data without the need for additional imaging, demonstrating high feasibility and cost-effectiveness.
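The two-stage workflow above can be sketched as a simple probability cascade. This is a minimal illustration under stated assumptions, not the study's implementation: the function names are hypothetical, the two input probabilities stand in for the outputs of the pre-trained RF and Xgboost models, and the multiplication rule used to combine them is one plausible reading of a cascade probability combination.

```python
def combine_cascade(p_grade3, p_grade2_given_not3):
    """Combine two binary stage outputs into three grade probabilities.

    p_grade3: stage 1 output, probability that severity is grade III
              (assumed to come from the RF model).
    p_grade2_given_not3: stage 2 output, probability of grade II among
              non-grade-III patients (assumed to come from Xgboost).
    """
    for p in (p_grade3, p_grade2_given_not3):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    p_not3 = 1.0 - p_grade3
    return {
        "I": p_not3 * (1.0 - p_grade2_given_not3),
        "II": p_not3 * p_grade2_given_not3,
        "III": p_grade3,
    }


def predict_grade(probabilities):
    """Return the grade with the highest combined probability."""
    return max(probabilities, key=probabilities.get)


# Hypothetical stage outputs for one patient.
probs = combine_cascade(p_grade3=0.2, p_grade2_given_not3=0.75)
print(probs, predict_grade(probs))
```

A hard-gated variant, in which the second model is consulted only when the first-stage probability falls below a chosen threshold, would follow the wording of step (4) more literally; the soft combination shown here has the advantage of returning three probabilities that sum to one.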
Limitations
This study has several limitations. First, it was a single-center retrospective study with a limited sample size. Although we used rigorous methods such as LASSO regression and cross-validation to mitigate the risk of overfitting, external validation in an independent cohort from different institutions is a crucial step in assessing the generalizability and robustness of a clinical prediction model, and it constitutes a core component of our future research plan. Second, pulmonary function data were lacking. This is primarily due to the retrospective design and the clinical reality of managing AECOPD: during acute exacerbations, patients are often too ill to perform the forced expiratory maneuvers required for spirometry, so pulmonary function tests are commonly unavailable in this phase. Consequently, our model is based on CT and blood parameters that are readily available at admission, which gives it particular value for acute-phase assessment. However, the inability to correlate our model with the gold standard of pulmonary function remains a drawback. Future studies that include pulmonary function tests during the stable phase and investigate their relationship with the acute-phase radiomics model would be of great significance.
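As a brief illustration of how LASSO regression limits overfitting by discarding uninformative features, the sketch below implements a toy coordinate-descent solver whose L1 penalty shrinks weak coefficients exactly to zero. It is an assumption-laden teaching example rather than the code used in this study: the function names, the penalty value and the four-sample dataset are hypothetical, and a practical analysis would use an established library together with cross-validation to choose the penalty.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/2n) * ||y - Xw||^2 + lam * ||w||_1 without intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n)
    return w


# Hypothetical data: y depends only on the first feature (y = 2 * x0);
# the second feature is uncorrelated with the outcome.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2, -2, 2, -2]

weights = lasso_coordinate_descent(X, y, lam=0.5)
print(weights)
```

The informative feature keeps a shrunken but non-zero weight, while the uninformative feature is removed from the model, which is the feature-selection behavior that makes LASSO useful when the number of radiomics features is large relative to the sample size.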
Conclusion
This study innovatively combines muscle imaging omics with clinical data, avoids the interference of subjective factors in traditional grading methods, and provides a more objective and quantitative severity assessment tool. This model shows good application potential in assisting clinicians in risk stratification and individualized treatment of AECOPD patients.
Acknowledgements
The authors thank the patients who participated in this study, the funding bodies that supported it, and the professors who assisted with the research.
Abbreviations
- AECOPD: Acute exacerbation of chronic obstructive pulmonary disease
- AUC: Area under the curve
- CI: Confidence interval
- COPD: Chronic obstructive pulmonary disease
- CT: Computed tomography
- LASSO: Least absolute shrinkage and selection operator
- MRI: Magnetic resonance imaging
- NT-proBNP: N-terminal pro-B-type natriuretic peptide
- ROC: Receiver operating characteristic
- ROI: Region of interest
Author contributions
All authors contributed to study conception and design. Zian Liu, Shiyuan Gao and Qiong Pan formulated the research questions, screened variables, collected data and wrote the manuscript; Zhe Ye and Fengmei Li screened variables; Yiwen Huang and Jiahui Yuan collected data; Yixin Lian and Chen Geng supervised the research and revised the manuscript.
Funding
This work was supported in part by the National Science and Technology Major Project of the Ministry of Science and Technology of China (No. 2023ZD0503606); in part by the National Natural Science Foundation of China (62441114); in part by the Nuclear Medicine Technology Innovation Research Project (ZHYLZD 2025006); in part by Suzhou Science & Technology Projects (SSD2023008, SYW20240238, SKY2023223, SYW2025002); and in part by the Suzhou University Suzhou Medical College - Qilu Medical Research Fund Project (24QL200217).
Data availability
No data sets were generated or analysed during the current study.
Declarations
Ethics approval and consent to participate
This is a retrospective study and did not require further ethics committee approval because it did not involve animal experiments or human clinical trials. Patient information was anonymized before analysis. All methods were performed in accordance with the Declaration of Helsinki.
Consent for publication
Not applicable.
Competing interests
The authors declare no conflict of interest.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally to this work.
References
- 1. Agustí A, Celli BR, Criner GJ, Halpin D, Anzueto A, Barnes P, et al. Global initiative for chronic obstructive lung disease 2023 report: GOLD executive summary. J Pan Afr Thorac Soc. 2022;4(2):58–80.
- 2. Liu Y, Liu T, Ruan L, Zhu D, He Y, Jia J, et al. Cilia plays a pivotal role in the hypersecretion of airway mucus in mice. Curr Mol Pharmacol. 2024;17(1):18761429368288.
- 3. Haile SR, Guerra B, Soriano JB, Puhan MA. Multiple score comparison: a network meta-analysis approach to comparison and external validation of prognostic scores. BMC Med Res Methodol. 2017;17(1):172.
- 4. Kwok WC, Tam TC, Ho JC, Lam DC, Ip MS, Yap DY. Hospitalized acute exacerbation in chronic obstructive pulmonary disease-impact on long-term renal outcomes. Respir Res. 2024;25(1):36.
- 5. Liu Z, Ma Z, Ding C. Association between COPD and CKD: a systematic review and meta-analysis. Front Public Health. 2024. 10.3389/fpubh.2024.1494291.
- 6. Lipovec NC, Schols AM, Borst B, Beijers RJ, Kosten T, Omersa D, et al. Sarcopenia in advanced COPD affects cardiometabolic risk reduction by short-term high-intensity pulmonary rehabilitation. J Am Med Dir Assoc. 2016;17(9):814–20.
- 7. Araújo BE, Teixeira PP, Valduga K, Silva Fink J, Silva FM. Prevalence, associated factors, and prognostic value of sarcopenia in patients with acute exacerbated chronic obstructive pulmonary disease: a cohort study. Clin Nutr ESPEN. 2021;42:188–94.
- 8. Matera MG, Page C, Cazzola M. Sarcopenia as a treatable trait in COPD: from mechanisms to management. Respir Med. 2025;248:108401.
- 9. Yan Y, Hu J, Han N, Li HT, Yang X, Li LG, et al. Sorafenib-loaded metal-organic framework nanoparticles for anti-hepatocellular carcinoma effects through synergistically potentiating ferroptosis and remodeling tumor immune microenvironment. Mater Today Bio. 2025;32:101848.
- 10. Ginsburg SB, Lynch DA, Bowler RP, Schroeder JD. Automated texture-based quantification of centrilobular nodularity and centrilobular emphysema in chest CT images. Acad Radiol. 2012;19(10):1241–51.
- 11. Lafata KJ, Zhou Z, Liu J-G, Hong J, Kelsey CR, Yin F-F. An exploratory radiomics approach to quantifying pulmonary function in CT images. Sci Rep. 2019;9(1):11509.
- 12. Tanimura K, Sato S, Fuseya Y, Hasegawa K, Uemasu K, Sato A, et al. Quantitative assessment of erector spinae muscles in patients with chronic obstructive pulmonary disease. Novel chest computed tomography-derived index for prognosis. Ann Am Thorac Soc. 2016;13(3):334–41.
- 13. Pishgar F, Shabani M, Silva TQAC, Bluemke DA, Budoff M, Barr RG, et al. Quantitative analysis of adipose depots by using chest CT and associations with all-cause mortality in chronic obstructive pulmonary disease: longitudinal analysis from MESArthritis ancillary study. Radiology. 2021;299(3):703–11.
- 14. Attaway AH, Welch N, Yadav R, Bellar A, Hatipoğlu U, Meli Y, et al. Quantitative computed tomography assessment of pectoralis and erector spinae muscle area and disease severity in chronic obstructive pulmonary disease referred for lung volume reduction. COPD: Journal of Chronic Obstructive Pulmonary Disease. 2021;18(2):191–200.
- 15. Ebadi M, Bhanji RA, Dunichand-Hoedl AR, Mazurak VC, Baracos VE, Montano-Loza AJ. Sarcopenia severity based on computed tomography image analysis in patients with cirrhosis. Nutrients. 2020;12(11):3463.
- 16. Kim YJ. Machine learning models for sarcopenia identification based on radiomic features of muscles in computed tomography. Int J Environ Res Public Health. 2021;18(16):8710.
- 17. Jong EE, Sanders KJ, Deist TM, Elmpt W, Jochems A, Timmeren JE, et al. Can radiomics help to predict skeletal muscle response to chemotherapy in stage IV non-small cell lung cancer? Eur J Cancer. 2019;120:107–13.
- 18. Dong X, Dan X, Yawen A, Haibo X, Huan L, Mengqi T, et al. Identifying sarcopenia in advanced non-small cell lung cancer patients using skeletal muscle CT radiomics and machine learning. Thorac Cancer. 2020;11(9):2650–9.
- 19. Qiao X, Hou G, Kang J, Wang Q-Y, Yin Y. CT attenuation and cross-sectional area of the pectoralis are associated with clinical characteristics in chronic obstructive pulmonary disease patients. Front Physiol. 2022;13:833796.
- 20. Seymour J, Spruit M, Hopkinson N, Natanek S, Man W-C, Jackson A, et al. The prevalence of quadriceps weakness in COPD and the relationship with disease severity. Eur Respir J. 2010;36(1):81–8.
- 21. Makimoto K, Hogg J, Bourbeau J, Tan W, Kirby M. CT imaging with machine learning for predicting progression to COPD in individuals at risk. Chest. 2023. 10.1016/j.chest.2023.06.008.
- 22. Shahzadi I, Zwanenburg A, Frohwein LJ, Schramm D, Meyer HJ, Hinnerichs M, et al. Short-term mortality prediction in acute pulmonary embolism: radiomics values of skeletal muscle and intramuscular adipose tissue. J Cachexia Sarcopenia Muscle. 2024;15(4):1430–40.
- 23. Li Z, Liu L, Zhang Z, Yang X, Li X, Gao Y, et al. A novel CT-based radiomics features analysis for identification and severity staging of COPD. Acad Radiol. 2022;29(5):663–73.
- 24. Huynh E, Coroller TP, Narayan V, Agrawal V, Hou Y, Romano J, et al. CT-based radiomic analysis of stereotactic body radiation therapy patients with lung cancer. Radiother Oncol. 2016;120(2):258–66.