Biosafety and Health. 2025 Jul 11;7(4):238–244. doi: 10.1016/j.bsheal.2025.07.004

Random forest-based predictor selection and pneumonia risk probability assessment in acute respiratory infections: A cross-sectional study in Chongqing, China, 2023–2024

Yunshao Xu a,1, Yuping Duan a,1, Jule Yang b, Mingyue Jiang a, Yanxia Sun a, Yanlin Cao a, Li Qi b, Zeni Wu a, Luzhao Feng a
PMCID: PMC12412401  PMID: 40918203

Highlights

  • Scientific questions: Progression of acute respiratory infection (ARI) to a severe lower respiratory tract infection, such as pneumonia, can increase clinical severity and place a greater burden on healthcare resources. Currently, limited evidence exists on the use of machine learning to identify key pneumonia predictors from a broad range of potential factors, including demographic, clinical, and pathogen detection data.

  • Evidence before this study: Certain respiratory pathogens and clinical indicators have been identified as potential risk factors for pneumonia. However, the strength of the association between each indicator and pneumonia development remains unclear, and there is a need to identify appropriate predictors using comprehensive data analysis methods.

  • New findings: We used a random forest algorithm and a logistic regression-based nomogram to identify significant predictors of pneumonia among ARI patients. Elevated D-dimer levels and influenza A virus infection were found to be significantly associated with an increased risk of pneumonia. Other important indicators identified included age, neutrophils, lymphocytes, total protein, globulin, triglycerides, total bilirubin, and procalcitonin.

  • Significance of the study: The findings provide a basis for identifying high-risk ARI patients who may progress to pneumonia. The identified predictors can be used to develop clinical prediction models to guide early diagnosis and intervention, potentially improving patient outcomes and optimizing the use of medical resources.

Keywords: Acute respiratory infection (ARI), D-dimer, Influenza A virus (IFV-A), Pneumonia, Random forest algorithm

Abstract

Progression of acute respiratory infection (ARI) to pneumonia increases severity and healthcare burden. Limited evidence exists on using machine learning to identify predictors from demographic, clinical, and pathogen detection data. This study aimed to identify pneumonia predictors in ARI patients using machine learning methods. This observational study was conducted in Chongqing, China, from September 2023 to April 2024. Outpatients and inpatients with ARI were recruited weekly. A random forest algorithm was used for predictor selection, followed by a logistic regression-based nomogram to analyze the probability of pneumonia. Among the 1,638 patients with ARI, those with pneumonia had higher rates of influenza A virus (IFV-A) (49.2 % vs. 39.6 %), influenza B virus (26.3 % vs. 18.6 %), and respiratory syncytial virus (6.1 % vs. 1.9 %) infection than those without pneumonia. In the subgroup of 79 patients with comprehensive blood tests, patients with pneumonia had higher median levels of hemoglobin (130.00 g/L vs. 124.00 g/L), blood urea nitrogen (5.73 mmol/L vs. 4.85 mmol/L), C-reactive protein (36.10 mg/L vs. 25.25 mg/L), procalcitonin (0.11 μg/L vs. 0.07 μg/L), and D-dimer (0.95 μg/L vs. 0.80 μg/L), and lower median levels of neutrophils (4.20 × 10⁹/L vs. 4.76 × 10⁹/L), aspartate aminotransferase (22.50 U/L vs. 24.00 U/L), and uric acid (280.90 μmol/L vs. 330.00 μmol/L), than those without pneumonia. Elevated D-dimer levels (adjusted odds ratio [aOR] = 1.002, 95 % confidence interval [CI]: 1.001–1.004) and IFV-A infection (aOR = 9.308, 95 % CI: 2.433–35.606) were significantly associated with increased pneumonia probability. In future clinical practice, particular attention should be given to ARI patients with elevated D-dimer levels and IFV-A infections.

1. Introduction

Acute respiratory infection (ARI) is characterized by mild and transient symptoms; however, it has a high incidence rate in the general population. In 2019, the age-standardized disability-adjusted life years rate for respiratory infections and tuberculosis (which includes lower respiratory tract infections) was 384.9 per 100,000 population, and the mortality rate was 13.6 per 100,000 population [1]. If ARI progresses to a severe lower respiratory tract infection, such as pneumonia, it may lead to increased clinical severity and place a greater demand on medical resources, including hospitalization and treatments [2]. A multicenter study found that among 88,182 adult patients hospitalized with invasive pneumococcal disease or noninvasive all-cause pneumonia, the overall mortality rate was 8.3 %, median length of stay was 6 days, and average cost per admission was $9,791, highlighting the significant clinical and economic burden of these conditions [3].

During clinical diagnosis, physicians often rely on routine blood tests to assess ARI progression and guide clinical treatment [4]. However, biomarkers like C-reactive protein (CRP) lack specificity; elevated levels in pneumonia do not improve the predictive value of severity scores [5]. In recent years, due to improvements in various etiological detection methods and surveillance systems for respiratory pathogens, researchers have proposed that specific respiratory pathogen infections may be associated with the occurrence of pneumonia [6,7]. Nevertheless, the current clinical tests for patients with ARI are numerous and complex, and the strength of the association between each indicator and pneumonia development remains unclear. Therefore, it is necessary to employ a suitable methodology to explore and identify appropriate indicators that can serve as potential predictors for pneumonia development.

Machine learning approaches are increasingly valued for identifying pneumonia risk factors, especially their ability to manage complex factor associations and screen for strongly correlated variables [8]. In this study, we utilized a multistep machine learning approach to identify potential clinical indicators for pneumonia development and assessed their predictive power in evaluating pneumonia risk. Our research supports early detection of pneumonia risk factors and early identification of high-risk patients in clinical practice. This advancement has the potential to alleviate the disease burden of pneumonia by facilitating timely interventions and improving overall public health outcomes.

2. Materials and methods

2.1. Study design

This observational study was conducted at 23 hospitals across eight districts in Chongqing, China, between September 2023 and April 2024. Both outpatients and inpatients with ARI were recruited from relevant departments, including pediatrics, fever clinics, respiratory medicine, and infectious diseases. A random sampling approach was used to enroll 20–30 outpatients and 10–20 inpatients per week at each study site. Upper respiratory tract specimens were collected by trained healthcare workers. ARI was defined as the sudden onset of symptoms and the presence of at least one of the following respiratory symptoms: cough, sore throat, shortness of breath, or nasal congestion [9]. Individuals who had resided in Chongqing for less than half a year and those who declined to participate were excluded.

2.2. Variables and data collection

Trained healthcare workers used a questionnaire to collect demographic and clinical information, including sex, age, body mass index (BMI), location, chronic conditions, smoking status, influenza vaccination status, pneumococcal vaccination status, and the use of antiviral drugs, Chinese patent medicine, and antibiotics before admission. Hematological characteristics of ARI patients were also collected, comprising white blood cells, neutrophils, lymphocytes, platelets, hemoglobin, total protein, albumin, globulin, total cholesterol, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, triglycerides, alanine aminotransferase, aspartate aminotransferase, total bilirubin, direct bilirubin, blood urea nitrogen, creatinine, uric acid, CRP, procalcitonin, and D-dimer. Pneumonia was diagnosed by physicians based on radiological evidence obtained from chest imaging.

2.3. Nucleic acid testing

Nucleic acid testing using real-time quantitative polymerase chain reaction was conducted with nasopharyngeal swab samples to detect 16 respiratory pathogens: influenza A virus (IFV-A), influenza B virus (IFV-B), parainfluenza virus, enterovirus, respiratory syncytial virus (RSV), human metapneumovirus, coronavirus, bocavirus, human adenovirus, Haemophilus influenzae (H. influenzae), Streptococcus pneumoniae (S. pneumoniae), Pseudomonas aeruginosa (P. aeruginosa), Staphylococcus aureus (S. aureus), Klebsiella pneumoniae, Mycoplasma pneumoniae, and Bordetella pertussis. The Respiratory Pathogen Nucleic Acid Multiplex Test Kit (Jiangsu Hechuang Biotechnology Co., Suzhou, China) was used.

2.4. Statistical analyses

Categorical variables are expressed as numbers (n) and proportions (%). Continuous variables are expressed as medians with interquartile ranges (IQRs). The univariate analysis included chi-square and Kruskal-Wallis tests.

For the multivariate analysis, a random forest algorithm was initially employed to select variables with higher importance values as predictors. The random forest model was implemented using the “randomForest” package in R. We incorporated all 64 variables reflecting patients' demographic information, pathogen infection status, and hematological characteristics into the random forest model for analysis. Rows containing missing values were removed. Fig. 1 illustrates the number of samples included in the analysis after the exclusion of those with missing values at each stage of the analysis. Considering the small sample size, we constructed 500 decision trees in the random forest model. Each tree was trained on a bootstrap sample of the training data, which improves the stability and accuracy of the results. We further used 10-fold cross-validation for validation. The dataset was randomly divided into 10 subsets of approximately equal size; in each of the 10 iterations, nine subsets were used for training and the remaining one for validation. This approach ensured comprehensive model evaluation and maximized data utilization. The concordance index (C-index) was calculated for both the training and testing data to evaluate the internal validation and predictive performance of the model. The importance of each feature was calculated and visualized using the “ggplot2” package. The plot ordered features according to their mean decrease in Gini impurity, a measure of how much splits on each feature reduce node impurity across the trees of the forest.
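The two building blocks named in this paragraph, Gini impurity decrease and a 10-fold partition, can be illustrated with a minimal Python sketch. This is not the authors' R code: the function names `gini_impurity`, `gini_decrease`, and `k_fold_indices` are hypothetical, the impurity formula is the textbook definition for a binary outcome, and the fold assignment is one simple way to obtain folds of approximately equal size.

```python
import random


def gini_impurity(labels):
    """Gini impurity of a list of 0/1 labels: 1 - (p0^2 + p1^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - (p1 ** 2 + (1.0 - p1) ** 2)


def gini_decrease(parent, left, right):
    """Impurity decrease obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = (
        len(left) / n * gini_impurity(left)
        + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


if __name__ == "__main__":
    # Toy labels (hypothetical): 1 = pneumonia, 0 = no pneumonia.
    parent = [0, 0, 1, 1]
    print(gini_decrease(parent, [0, 0], [1, 1]))
    # 79 is the number of patients with complete blood records in this study.
    folds = k_fold_indices(79, k=10)
    print([len(fold) for fold in folds])
```

A random forest importance score of this kind aggregates such per-split decreases over every split that uses a given feature and over all trees; the sketch shows only the per-split quantity.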

Fig. 1. Study flow chart. Abbreviation: ARI, acute respiratory infection.

Subsequently, to further enhance the interpretability of the predictive model, a nomogram was constructed based on a logistic regression model to analyze the probability of pneumonia development across different levels of these predictors. Analyses were conducted using the “rms” package in R. This approach allowed the visualization of the contribution of each predictor to the overall pneumonia risk [10]. The top 10 predictors, ranked by importance value, in the random forest model were incorporated into a logistic regression for subsequent predictive analysis. We selected the final model by calculating the Akaike Information Criterion (AIC), which balances model fit and complexity to prevent overfitting and enhance generalizability. Additionally, we assessed the model's predictive capacity for all conceivable positive and negative pairs by calculating the C-index. The nomogram was generated using the predicted probability of pneumonia. We also presented the results of the analysis in terms of adjusted odds ratios (aOR) and 95 % confidence intervals (CI).
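As an illustration of the two model-selection quantities mentioned above, the following Python sketch computes a C-index over all case/non-case pairs and an AIC from a log-likelihood. It is a minimal sketch under stated assumptions, not the authors' implementation (they used the “rms” package in R): the function names are hypothetical, ties in predicted probability are counted as half-concordant, and the example labels and probabilities are made up.

```python
import math


def c_index(y_true, y_score):
    """Share of (case, non-case) pairs in which the case has the higher score.

    Ties in score count as 0.5. For a binary outcome this equals the area
    under the ROC curve.
    """
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def bernoulli_log_likelihood(y_true, y_prob):
    """Log-likelihood of binary outcomes under predicted probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(y_true, y_prob)
    )


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


if __name__ == "__main__":
    # Hypothetical outcomes and predicted probabilities, for illustration only.
    y = [1, 1, 0, 0]
    p = [0.9, 0.4, 0.5, 0.1]
    print(c_index(y, p))
    print(aic(bernoulli_log_likelihood(y, p), n_params=2))
```

Lower AIC indicates a better trade-off between fit and the number of estimated parameters, while a C-index closer to 1 indicates better discrimination.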

All data were analyzed using R version 4.4.2 (R Core Team, Vienna, Austria, 2024). Two-sided P-values less than 0.05 were considered to indicate statistical significance.

3. Results

A total of 1,638 patients with ARI were included in this study, of whom 179 (10.9 %) were clinically diagnosed with pneumonia (Fig. 1). The median age of patients diagnosed with pneumonia (32.00 [IQR: 6.00–69.00] years) was significantly higher than that of the non-pneumonia group (19.00 [IQR: 7.00–41.25] years). Patients with pneumonia were more likely to reside in suburban areas (27.4 % vs. 19.9 %); have chronic conditions (31.3 % vs. 15.3 %); and take antiviral drugs (26.3 % vs. 18.1 %), Chinese patent medicine (27.9 % vs. 18.2 %), and antibiotics (27.9 % vs. 14.3 %) before seeking medical care. In addition, patients without pneumonia were more likely to be non-smokers (45.6 % vs. 40.8 %). Patients diagnosed with pneumonia had significantly higher rates of IFV-A (49.2 % vs. 39.6 %), IFV-B (26.3 % vs. 18.6 %), and RSV (6.1 % vs. 1.9 %) infections than those without pneumonia.

Given that the influenza virus (IFV) was detected in 60.1 % of the patients with ARI, a further analysis was performed to investigate IFV co-infection with other pathogens. The results indicated that the predominant pathogens co-detected with IFV were bacteria; the top four were H. influenzae (n = 185), S. pneumoniae (n = 165), P. aeruginosa (n = 99), and S. aureus (n = 73). Additionally, the prevalence of pathogen co-detection was similar between patients with and without pneumonia (Table S1).

Among the 1,638 patients with ARI included in the analyses, 79 had complete records of all 22 blood examination parameters (Fig. 1). Of these, the patients diagnosed with pneumonia had significantly higher median levels of hemoglobin (130.00 g/L vs. 124.00 g/L), blood urea nitrogen (5.73 mmol/L vs. 4.85 mmol/L), C-reactive protein (36.10 mg/L vs. 25.25 mg/L), procalcitonin (0.11 μg/L vs. 0.07 μg/L), and D-dimer (0.95 μg/L vs. 0.80 μg/L) than those without pneumonia. Conversely, patients with pneumonia had significantly lower median levels of neutrophils (4.20 × 10⁹/L vs. 4.76 × 10⁹/L), aspartate aminotransferase (22.50 U/L vs. 24.00 U/L), and uric acid (280.90 μmol/L vs. 330.00 μmol/L) than those without pneumonia (Table 1).

Table 1. Baseline hematological characteristics of ARI patients in Chongqing from September 2023 to April 2024.

| Parameters | Non-pneumonia* (n = 66) | Pneumonia* (n = 13) | Overall* (n = 79) | P-value |
| --- | --- | --- | --- | --- |
| WBC (× 10⁹/L) | 7.34 (5.29, 8.68) | 6.91 (6.22, 8.43) | 7.26 (5.42, 8.64) | 0.741 |
| NEU (× 10⁹/L) | 4.76 (3.03, 6.42) | 4.20 (0.78, 5.31) | 4.62 (2.89, 5.97) | 0.013 |
| LYMPH (× 10⁹/L) | 1.10 (0.63, 1.54) | 0.53 (0.04, 0.94) | 1.02 (0.53, 1.50) | 0.815 |
| PLT (× 10⁹/L) | 216.50 (128.00, 285.00) | 187.00 (137.50, 219.50) | 195.00 (134.00, 263.00) | 0.892 |
| HGB (g/L) | 124.00 (111.00, 136.00) | 130.00 (115.50, 142.50) | 127.00 (111.00, 136.00) | < 0.001 |
| TP (g/L) | 69.00 (62.88, 75.00) | 61.30 (55.70, 67.60) | 68.20 (61.30, 73.90) | 0.156 |
| ALB (g/L) | 36.25 (31.95, 41.60) | 34.40 (32.95, 36.70) | 35.70 (32.25, 40.93) | 0.227 |
| GLB (g/L) | 30.70 (26.17, 35.25) | 28.00 (22.60, 32.90) | 30.60 (26.00, 35.00) | 0.833 |
| TC (mmol/L) | 3.97 (3.50, 4.51) | 3.66 (3.49, 4.52) | 3.92 (3.51, 4.50) | 0.055 |
| HDL-C (mmol/L) | 1.08 (0.82, 1.21) | 1.10 (0.87, 1.40) | 1.09 (0.82, 1.30) | 0.835 |
| LDL-C (mmol/L) | 2.30 (1.71, 2.61) | 2.16 (1.89, 2.36) | 2.24 (1.74, 2.55) | 0.077 |
| TG (mmol/L) | 1.16 (0.81, 1.87) | 0.99 (0.68, 1.93) | 1.09 (0.81, 1.84) | 0.936 |
| ALT (U/L) | 18.10 (14.00, 31.18) | 18.20 (12.45, 25.25) | 18.10 (14.00, 30.60) | 0.127 |
| AST (U/L) | 24.00 (18.00, 35.20) | 22.50 (19.19, 26.80) | 24.00 (18.40, 35.00) | 0.026 |
| TBIL (μmol/L) | 10.10 (6.35, 13.48) | 10.25 (6.95, 20.16) | 10.10 (6.40, 13.70) | 0.220 |
| DBIL (μmol/L) | 2.70 (2.00, 4.46) | 3.09 (2.35, 6.03) | 2.70 (2.00, 4.50) | 0.388 |
| BUN (mmol/L) | 4.85 (3.40, 6.52) | 5.73 (4.45, 6.35) | 5.02 (3.54, 6.37) | 0.037 |
| CRE (μmol/L) | 67.50 (57.823, 79.23) | 68.00 (62.25, 84.30) | 68.00 (58.40, 80.6) | 0.074 |
| UA (μmol/L) | 330.00 (254.73, 410.73) | 280.90 (255.90, 300.35) | 303.80 (255.20, 404.20) | 0.010 |
| CRP (mg/L) | 25.25 (5.76, 61.31) | 36.10 (6.69, 104.50) | 26.00 (5.78, 64.90) | < 0.001 |
| PCT (μg/L) | 0.07 (0.05, 0.17) | 0.11 (0.08, 0.39) | 0.08 (0.05, 0.18) | < 0.001 |
| D-dimer (μg/L) | 0.80 (0.31, 1.57) | 0.95 (0.27, 3.78) | 0.82 (0.31, 1.82) | 0.010 |

Abbreviations: ALB, albumin; ALT, alanine aminotransferase; ARI, acute respiratory infection; AST, aspartate aminotransferase; BUN, blood urea nitrogen; CRE, creatinine; CRP, C-reactive protein; DBIL, direct bilirubin; GLB, globulin; HDL-C, high-density lipoprotein cholesterol; HGB, hemoglobin; IQR, interquartile range; LDL-C, low-density lipoprotein cholesterol; LYMPH, lymphocytes; M, median; NEU, neutrophils; PCT, procalcitonin; PLT, platelets; TBIL, total bilirubin; TC, total cholesterol; TG, triglycerides; TP, total protein; UA, uric acid; WBC, white blood cells.

* Of the 1,638 ARI patients, 1,559 had missing values in at least one of the 22 blood parameters and were excluded from this analysis; medians and interquartile ranges were calculated for the remaining 79 patients with complete records for all 22 blood parameters. Data are expressed as medians with interquartile ranges (IQRs).

P-values reflect the Kruskal-Wallis test for comparisons between ARI patients with pneumonia and without pneumonia.

All variables were incorporated into the random forest algorithm. The internal validation of the random forest analysis was conducted using 10-fold cross-validation, with the model's predictive performance assessed by the C-index. The C-index values were 0.83 for the training data and 0.78 for the testing data. The feature importance analysis revealed that the top 10 variables associated with pneumonia among patients with ARI were age; neutrophil, lymphocyte, total protein, globulin, triglyceride, total bilirubin, procalcitonin, D-dimer levels; and IFV-A infection (Fig. 2A). Based on these 10 selected predictors, a nomogram was developed using logistic regression to visualize the probability of pneumonia across the different values of each predictor among 132 patients with complete records of the 10 selected parameters (Fig. 1). The optimal model demonstrated a C-index of 0.88 and an AIC of 96.36. The logistic regression analysis revealed that elevated D-dimer levels (aOR = 1.002, 95 % CI: 1.001–1.004) and IFV-A infection (aOR = 9.308, 95 % CI: 2.433–35.606) were significantly associated with the development of pneumonia (Fig. 2B, Table 2).
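Because the D-dimer aOR is expressed per one-unit increase, it can look negligible even when a clinically sized change matters. The following Python sketch shows how a per-unit odds ratio scales over a larger increment and how a standard error can be recovered from a reported interval. It is a hedged illustration, not part of the authors' analysis: the function names are hypothetical, the 100-unit increment is an arbitrary example, and the interval is assumed to be a Wald interval that is symmetric on the log-odds scale.

```python
import math


def odds_ratio_for_increase(or_per_unit, delta):
    """Odds ratio implied by a `delta`-unit increase, given the per-unit OR.

    On the log-odds scale the effect is linear, so OR(delta) = OR(1) ** delta.
    """
    beta = math.log(or_per_unit)
    return math.exp(beta * delta)


def log_scale_summary(lower, upper, z=1.96):
    """Point estimate and standard error implied by a Wald 95 % CI.

    Assumes the interval was built as exp(beta +/- z * se).
    """
    log_lower, log_upper = math.log(lower), math.log(upper)
    beta = (log_lower + log_upper) / 2.0
    se = (log_upper - log_lower) / (2.0 * z)
    return math.exp(beta), se


if __name__ == "__main__":
    # Reported D-dimer aOR per unit; the 100-unit step is a hypothetical example.
    print(odds_ratio_for_increase(1.002, 100))
    # Reported 95 % CI for IFV-A infection.
    print(log_scale_summary(2.433, 35.606))
```

Under the Wald assumption, the geometric midpoint of the reported IFV-A interval should fall close to the reported aOR of 9.308, which offers a quick consistency check on the table.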

Fig. 2. Feature importance for predicting pneumonia (A) and predicted pneumonia nomogram (B). In Panel A, the symbol “&” denotes the concurrent detection of two pathogens. In Panel B, age is categorized into groups. The 18–59 years age group is the control group, while the 5–17 years, 0–4 years, and ≥ 60 years age groups are designated as groups 2, 3, and 4, respectively. “Points” (red line on the top) derived from individual variables (i.e., age group, NEU, LYMPH, TP, GLB, TG, TBIL, PCT, D-dimer, and IFV-A) are summed to calculate the “Total points” (red line at the bottom). A vertical line drawn downward from “Total points” to “Probability of pneumonia” (blue line at the bottom) gives the predicted value of “Probability of pneumonia”. Abbreviations: ALB, albumin; ALT, alanine aminotransferase; AST, aspartate aminotransferase; B. pertussis, Bordetella pertussis; BKV, bocavirus; BMI, body mass index; BUN, blood urea nitrogen; CRE, creatinine; CRP, C-reactive protein; DBIL, direct bilirubin; EV, enterovirus; GLB, globulin; H. influenzae, Haemophilus influenzae; HAdv, human adenovirus; HCoV, coronavirus; HDL-C, high-density lipoprotein cholesterol; HGB, hemoglobin; HMPV, human metapneumovirus; HPIV, parainfluenza virus; HRhv, human rhinovirus; IFV-A, influenza virus A; IFV-B, influenza virus B; K. pneumoniae, Klebsiella pneumoniae; LDL-C, low-density lipoprotein cholesterol; LYMPH, lymphocytes; M. pneumoniae, Mycoplasma pneumoniae; NEU, neutrophils; P. aeruginosa, Pseudomonas aeruginosa; PCT, procalcitonin; PLT, platelets; RSV, respiratory syncytial virus; S. aureus, Staphylococcus aureus; S. pneumoniae, Streptococcus pneumoniae; TBIL, total bilirubin; TC, total cholesterol; TG, triglycerides; TP, total protein; UA, uric acid; WBC, white blood cells.
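The point-summing procedure described in the caption can be sketched in Python. This shows the general construction of a logistic-regression nomogram, assuming the common convention that the predictor with the widest contribution range spans 0 to 100 points; it is not a reproduction of the “rms” output. The intercept, coefficients, axis ranges, and function names below are all hypothetical and are not the fitted values from this study.

```python
import math


def build_nomogram(intercept, betas, ranges):
    """Precompute the scaling for a logistic-regression nomogram.

    betas:  {name: coefficient on the log-odds scale}
    ranges: {name: (low, high)} axis limits for each predictor
    """
    minima, spans = {}, {}
    for name, beta in betas.items():
        low, high = ranges[name]
        at_low, at_high = beta * low, beta * high
        minima[name] = min(at_low, at_high)   # contribution worth 0 points
        spans[name] = abs(at_high - at_low)   # contribution range on the axis
    max_span = max(spans.values())
    if max_span == 0:
        raise ValueError("all coefficients are zero")
    return {
        "intercept": intercept,
        "betas": betas,
        "minima": minima,
        "scale": max_span / 100.0,  # log-odds units per point
    }


def points(nomogram, name, value):
    """Points contributed by one predictor at a given value."""
    contribution = nomogram["betas"][name] * value
    return (contribution - nomogram["minima"][name]) / nomogram["scale"]


def probability(nomogram, total_points):
    """Map total points back to the linear predictor, then to a probability."""
    linear_predictor = (
        nomogram["intercept"]
        + sum(nomogram["minima"].values())
        + total_points * nomogram["scale"]
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


if __name__ == "__main__":
    # Hypothetical model: a binary indicator and one continuous marker.
    nomo = build_nomogram(
        intercept=-2.0,
        betas={"indicator": 2.0, "marker": -1.0},
        ranges={"indicator": (0, 1), "marker": (0, 4)},
    )
    total = points(nomo, "indicator", 1) + points(nomo, "marker", 0)
    print(total, probability(nomo, total))
```

By construction, the probability read from the total points equals the probability obtained by plugging the predictor values directly into the logistic model, so the nomogram is a graphical lookup of the fitted model rather than a separate model.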

Table 2. Performance of admission prediction model for pneumonia development in acute respiratory infection patients.

| Predictors | Non-pneumonia* (n = 110) | Pneumonia* (n = 22) | aOR (95 % CI) | P-value |
| --- | --- | --- | --- | --- |
| Age group (years), n (%) | | | | |
| <5 | 0 (0.00) | 0 (0.00) | | |
| 5–17 | 4 (3.64) | 1 (4.55) | 6.763 (0.410–111.519) | 0.181 |
| 18–59 | 42 (38.18) | 3 (13.64) | Ref. | |
| ≥60 | 64 (58.18) | 18 (81.82) | 4.185 (0.711–24.639) | 0.114 |
| Blood examination, median (IQR) | | | | |
| NEU (× 10⁹/L) | 4.78 (3.04, 6.74) | 3.55 (0.76, 5.70) | 0.987 (0.939–1.037) | 0.599 |
| LYMPH (× 10⁹/L) | 1.09 (0.59, 1.72) | 0.51 (0.05, 0.82) | 1.003 (0.739–1.362) | 0.984 |
| TP (g/L) | 69.90 (63.98, 74.97) | 61.85 (56.13, 67.55) | 0.945 (0.881–1.013) | 0.110 |
| GLB (g/L) | 30.65 (26.68, 35.00) | 27.80 (22.95, 32.33) | 0.941 (0.859–1.031) | 0.194 |
| TG (mmol/L) | 1.21 (0.84, 1.64) | 0.97 (0.65, 1.39) | 1.019 (0.777–1.338) | 0.890 |
| TBIL (μmol/L) | 10.20 (6.95, 13.70) | 10.58 (7.38, 22.28) | 1.065 (0.957–1.184) | 0.248 |
| PCT (μg/L) | 0.06 (0.04, 0.14) | 0.12 (0.08, 0.37) | 1.012 (0.971–1.054) | 0.578 |
| D-dimer (μg/L) | 0.60 (0.30, 1.43) | 2.52 (0.31, 549.75) | 1.002 (1.001–1.004) | 0.007 |
| Pathogen detection, n (%) | | | | |
| IFV-A | 26 (23.64) | 13 (59.10) | 9.308 (2.433–35.606) | 0.001 |
| Model performance | | | | |
| AIC | 96.36 | | | |
| C-index | 0.88 | | | |

Abbreviations: AIC, Akaike Information Criterion; aOR, adjusted odds ratio; C-index, concordance index; CI, confidence interval; Ref., reference; GLB, globulin; IFV-A, influenza virus A; IQR, interquartile range; LYMPH, lymphocytes; NEU, neutrophils; PCT, procalcitonin; TBIL, total bilirubin; TG, triglycerides; TP, total protein.

* Of the 1,638 ARI patients, 1,506 had missing values in at least one of the 10 selected parameters and were excluded from the analysis.

In the logistic regression model, all predictors were adjusted as potential confounders during covariate selection, with the final optimal model determined by the lowest AIC and highest C-index.

4. Discussion

IFV infection was detected in 60.1 % of the 1,638 patients with ARI included in this study. Among these IFV-infected patients, co-detection of bacterial pathogens was common. The two bacterial pathogens most frequently co-detected with IFV were H. influenzae and S. pneumoniae. Research suggests that the seasonal prevalences of IFV and H. influenzae overlap to some extent. This overlap may be related to the synergistic effects of these pathogens when host immune function is compromised. For instance, IFV infection can weaken the host respiratory mucosal barrier, thereby increasing the risk of H. influenzae infection [11]. Additionally, studies have proposed that IFV infection may enhance susceptibility to S. pneumoniae, particularly in patients with community-acquired pneumonia. Coinfection with IFV and S. pneumoniae can lead to severe clinical manifestations and complex treatment processes [12]. Bacterial superinfection in the lungs of people with influenza is a key factor that promotes severe disease and mortality [13]. These findings indicate that, in the treatment of patients with severe ARI, it is crucial to focus not only on IFV infection but also on multi-pathogen testing to identify potential coinfections, because some opportunistic pathogenic bacteria may exacerbate the clinical symptoms of patients with IFV infection, leading to more severe clinical outcomes.

Of the patients with ARI included in this study, 10.9 % were diagnosed with pneumonia. The univariate analysis revealed that the proportions of IFV-A, IFV-B, S. pneumoniae, and RSV infections were significantly higher in patients with pneumonia than in those without pneumonia. This suggests that the risk of progression to pneumonia varies depending on the respiratory pathogen type involved. Additionally, significant differences in the median values of several blood parameters were observed between patients with and without pneumonia, indicating that the elevation or reduction of certain blood parameter values may be associated with the occurrence of pneumonia in patients with ARI.

The multistep machine learning analysis revealed that IFV-A infection and elevated D-dimer levels were significantly associated with pneumonia diagnosis. Other studies have suggested that one-third of patients infected with IFV develop pneumonia, which is a significant cause of mortality [14]. Additionally, IFV infection is a common cause of severe pneumonia, with overactivation of the host immune system, including macrophages, potentially contributing to the severe pathological changes mediated by IFV infection [15]. Previous studies have demonstrated that abnormal coagulation results, especially markedly elevated D-dimer levels, are associated with poor prognosis in patients with pneumonia [16]. Pneumonia-induced systemic inflammatory response syndrome activates the coagulation system; inflammatory cells, such as neutrophils and monocytes, release inflammatory mediators that stimulate endothelial cells to express tissue factor, thereby initiating the extrinsic coagulation pathway. Additionally, inflammatory cells can directly damage the vascular endothelium, exposing the collagen fibers and activating the intrinsic coagulation pathway [17]. These mechanisms support the rationale for using the D-dimer level as a predictor of pneumonia occurrence, as suggested by the results of this study.

The primary limitation of our study is that the majority of the patients with ARI lacked comprehensive blood examination records. Consequently, the sample size available for analyses using random forest and logistic regression models was relatively small, which may have compromised the statistical power of our findings. Although some indicators did not show significant associations with pneumonia in the final logistic regression model, this might be attributed to the limited sample size. Future studies with larger populations are warranted to further explore these associations. In addition, given the large number of variables included in our study, it was not feasible to obtain sufficient data for external validation. Future studies could collect additional data to perform external validation.

By integrating demographic, clinical, and pathogen detection data using machine learning methods, we found that elevated D-dimer levels and IFV-A infection have significant predictive value for pneumonia development among patients with ARI. In addition, several other indicators, such as age, neutrophils, lymphocytes, total protein, globulin, triglycerides, total bilirubin, and procalcitonin, were identified by the random forest model as important factors for the progression of ARI to pneumonia. The associations between these indicators and the development of pneumonia in ARI patients can be further explored in future studies with larger sample sizes.

Ethics statement

Ethical approval for the study was obtained from the Ethics Committee of the Chongqing Center for Disease Control and Prevention (Identifier: 2023-KY-028-2). Informed consent was obtained from all participants or their guardians.

Acknowledgements

This work was supported by the Chinese Academy of Medical Sciences Innovation Fund for Medical Sciences (2022-12M-CoV19-004), the Science & Technology Fundamental Resources Investigation Program (2023FY100600), Special Funds for the Basic Research and Development Program of the Central Non-profit Research Institutes of China (2021-RC330-002), and China Preventive Medicine Association (CPMA2024CRBFK).

Conflict of interest statement

The authors declare that there are no conflicts of interest.

Author contributions

Yunshao Xu: Project administration, Methodology, Investigation, Data curation, Conceptualization. Yuping Duan: Writing – original draft, Visualization, Software, Methodology, Formal analysis. Jule Yang: Resources, Investigation. Mingyue Jiang: Resources, Conceptualization. Yanxia Sun: Supervision, Methodology. Yanlin Cao: Resources, Data curation. Li Qi: Supervision, Funding acquisition, Conceptualization. Zeni Wu: Writing – review & editing, Validation, Methodology, Formal analysis. Luzhao Feng: Writing – review & editing, Validation, Supervision, Funding acquisition, Conceptualization.

Footnotes

Supplementary data to this article can be found online at https://doi.org/10.1016/j.bsheal.2025.07.004.

Contributor Information

Li Qi, Email: qili19812012@126.com.

Zeni Wu, Email: zeni.wu@pumc.edu.cn.

Luzhao Feng, Email: fengluzhao@cams.cn.

Supplementary data

The following are the Supplementary data to this article:

Supplementary Data 1
mmc1.docx (38.4KB, docx)

References

  • 1. GBD 2019 Diseases and Injuries Collaborators. Global burden of 369 diseases and injuries in 204 countries and territories, 1990-2019: A systematic analysis for the Global Burden of Disease Study 2019. Lancet. 2020;396(10258):1204–1222. doi: 10.1016/s0140-6736(20)30925-9.
  • 2. Beishuizen B.H.H., Stein M.L., Buis J.S., Tostmann A., Green C., Duggan J., Connolly M.A., Rovers C.P., Timen A. A systematic literature review on public health and healthcare resources for pandemic preparedness planning. BMC Public Health. 2024;24(1):3114. doi: 10.1186/s12889-024-20629-z.
  • 3. Mohanty S., Cossrow N., Yu K.C., Ye G., White M., Gupta V. Clinical and economic burden of invasive pneumococcal disease and noninvasive all-cause pneumonia in hospitalized US adults: A multicenter analysis from 2015 to 2020. Int. J. Infect. Dis. 2024;143. doi: 10.1016/j.ijid.2024.107023.
  • 4. Capdevila J.A., Martínez-Vázquez J.M., Almirante B., Hernandez A. Liver alterations in acute pneumonia. Arch. Intern. Med. 1990;150(10):2206–2209. doi: 10.1001/archinte.150.10.2206.
  • 5. Wu J., Jin Y.U., Li H., Xie Z., Li J., Ao Y., Duan Z. Evaluation and significance of C-reactive protein in the clinical diagnosis of severe pneumonia. Exp. Ther. Med. 2015;10(1):175–180. doi: 10.3892/etm.2015.2491.
  • 6. Sun Q., Liu Z., Jiang M., Lu Q., Tu Y. The circulating characteristics of common respiratory pathogens in Ningbo, China, both before and following the cessation of COVID-19 containment measures. Sci. Rep. 2024;14(1):25876. doi: 10.1038/s41598-024-77456-w.
  • 7. Sun Y., Dai L., Shan Y., Yang Y., Wu Y., Huang X., Ma N., Huang Q., Jiang M., Jia M., Yang W., Feng L. Pathogen characteristics of respiratory infections in the season after the COVID-19 pandemic between August and December 2023: Evidence from direct-to-consumer testing-based surveillance in Guangzhou and Beijing, China. Int. J. Infect. Dis. 2024;147. doi: 10.1016/j.ijid.2024.107195.
  • 8. Sophonsri A., Lou M., Ny P., Minejima E., Nieberg P., Wong-Beringer A. Machine learning to identify risk factors associated with the development of ventilated hospital-acquired pneumonia and mortality: Implications for antibiotic therapy selection. Front. Med. 2023;10. doi: 10.3389/fmed.2023.1268488.
  • 9. Jia M., Li T., Jiang M., Dai P., Tang W., Xu Y., Wang Q., Li Q., Duan Y., Xiong Y., Han X., Li Z., Qian J., Feng L., Qi L., Yang W. Estimated number and incidence of influenza-associated acute respiratory infection cases in winter 2021/22 in Wanzhou District, China. Public Health. 2024;237:141–146. doi: 10.1016/j.puhe.2024.09.012.
  • 10. Florin T.A., Ambroggio L., Lorenz D., Kachelmeyer A., Ruddy R.M., Kuppermann N., Shah S.S. Development and internal validation of a prediction model to risk stratify children with suspected community-acquired pneumonia. Clin. Infect. Dis. 2021;73(9):e2713–e2721. doi: 10.1093/cid/ciaa1690.
  • 11. Dunning J., Thwaites R.S., Openshaw P.J.M. Seasonal and pandemic influenza: 100 years of progress, still much to learn. Mucosal Immunol. 2020;13(4):566–573. doi: 10.1038/s41385-020-0287-5.
  • 12. Luo X., Yuan Q., Li J., Wu J., Zhu B., Lv M. Alterations in the prevalence and serotypes of Streptococcus pneumoniae in elderly patients with community-acquired pneumonia: A meta-analysis and systematic review. Pneumonia (Nathan). 2025;17(1):5. doi: 10.1186/s41479-025-00156-0.
  • 13. McCullers J.A. The co-pathogenesis of influenza viruses with bacteria in the lung. Nat. Rev. Microbiol. 2014;12(4):252–262. doi: 10.1038/nrmicro3231.
  • 14. Fullana Barceló M.I., Artigues Serra F., Millan Pons A.R., Asensio Rodriguez J., Ferre Beltran A., Del Carmen Lopez M., Bilbao J.R., Prieto Jaume M.R. Analysis of viral pneumonia and risk factors associated with severity of influenza virus infection in hospitalized patients from 2012 to 2016. BMC Infect. Dis. 2024;24(1):302. doi: 10.1186/s12879-024-09173-8.
  • 15. Uematsu T., Fujita T., Nakaoka H.J., Hara T., Kobayashi N., Murakami Y., Seiki M., Sakamoto T. Mint3/Apba3 depletion ameliorates severe murine influenza pneumonia and macrophage cytokine production in response to the influenza virus. Sci. Rep. 2016;6:37815. doi: 10.1038/srep37815.
  • 16. Tang N., Li D., Wang X., Sun Z. Abnormal coagulation parameters are associated with poor prognosis in patients with novel coronavirus pneumonia. J. Thromb. Haemost. 2020;18(4):844–847. doi: 10.1111/jth.14768.
  • 17. He X., Zhang C., Ji J., Liu Y., Feng W., Luo L., Fan H., Guo L. Prognostic factors in hospitalized patients with COVID-19 pneumonia and effectiveness of prophylactic anticoagulant therapy: A single-center retrospective study. BMC Infect. Dis. 2025;25(1):303. doi: 10.1186/s12879-025-10666-3.


Articles from Biosafety and Health are provided here courtesy of Elsevier
