Abstract
Background
Heterogeneity is a defining feature of severe coronavirus disease 2019 (COVID-19) pneumonia. Integrating chest computed tomography (CT) imaging with plasma proteomics may reveal Image-Expression Axes (IEAs) that capture this heterogeneity.
Methods
Patients diagnosed with severe COVID-19 pneumonia at 12 participating hospitals between December 2022 and March 2023 were prospectively screened for eligibility. Context-aware self-supervised representation learning (CSRL) was used to extract features from chest CT images, and plasma proteins were quantified with the Olink® inflammation panel. A deep learning model was trained with CSRL features as input and proteomic data as the target, and the trained model was then used to construct IEAs representing the underlying disease heterogeneity. The prognostic and predictive enrichment of these IEAs was explored with conventional regression models.
Results
The study cohort comprised 1979 eligible patients, divided into a training set of 630 patients with proteomic data and a testing set of 1349 patients without. Three IEAs were identified: IEA1 correlated with shock, IEA2 with the systemic inflammatory response syndrome (SIRS), and IEA3 with the coagulation profile. IEA1 (odds ratio [OR] = 0.52, 95 % confidence interval [CI]: 0.40 to 0.67, P < 0.001) and IEA2 (OR = 0.74, 95 % CI: 0.62 to 0.90, P = 0.002) were significantly associated with mortality. Patients with lower IEA1 values (< −2, indicating more severe shock) had a lower mortality risk when treated with steroids, whereas patients with higher IEA2 values appeared to benefit from a more restrictive fluid infusion strategy.
Conclusions
By integrating deep learning, proteomic profiling, and clinical data, our approach revealed interdependencies among IEAs, protein abundance patterns, therapeutic interventions, and patient outcomes in severe COVID-19 pneumonia. These findings contribute to precision medicine and may inform tailored therapeutic strategies.
Keywords: COVID-19, Systemic inflammatory response syndrome, Heterogeneity, Self-supervised representation learning
Introduction
Coronavirus disease 2019 (COVID-19) has had a profound impact on global health systems and on individual well-being.[1] The rapid spread and severe consequences of the pandemic highlighted the need for thorough investigation and comprehensive solutions.[2] Although COVID-19 is no longer a pressing threat in China, studying the heterogeneity of severe COVID-19 pneumonia remains important. Understanding variability in disease presentation, progression, and response to treatment is crucial for developing effective clinical management strategies, and the insights gained may extend to pneumonia caused by other pathogens, improving the diagnosis and treatment of a broader range of pulmonary infections. COVID-19 is markedly heterogeneous, with varying clinical presentations, disease trajectories, and outcomes,[3] which makes it difficult to predict disease severity and to tailor treatment. One promising avenue is to integrate chest computed tomography (CT) with plasma proteomics. Chest CT provides detailed visualization of the lungs, enabling the detection of COVID-19-related lung abnormalities and inflammatory changes.[4,5] Plasma proteomics profiles circulating proteins, offering insight into the systemic inflammatory processes triggered by the virus.[3,6–8] Combining the two makes it possible to capture the inflammatory response at both the pulmonary and the systemic level.
Such integration may provide a more complete picture of the disease's impact, supporting more accurate risk assessment, personalized treatment decisions, and monitoring of treatment efficacy.
Combining imaging and molecular data may therefore both deepen our understanding of disease heterogeneity and give clinicians practical tools for patient care. In this study, we integrated chest CT and plasma proteomics from patients with severe COVID-19, hypothesizing that jointly training a deep learning model on both data types would reveal Image-Expression Axes (IEAs). We further hypothesized that these IEAs could serve both as prognostic indicators, associated with clinical outcomes, and as predictive indicators, identifying patients in whom the effect of a therapeutic intervention differs, in the management of severe COVID-19 pneumonia.
Methods
Study setting and population
The study was conducted at 12 medical institutions in mainland China between December 2022 and March 2023, including 4 primary/secondary healthcare facilities, 6 tertiary healthcare facilities, and 2 university-affiliated hospitals. COVID-19 infection was confirmed by nucleic acid testing.[9] Cases were classified as severe if they met one or more of the criteria issued by the Office of the National Health Commission (https://www.gov.cn/zhengce/zhengceku/2023-01/06/content_5735343.htm): (1) pulse oxygen saturation below 93 % on ambient air; (2) respiratory distress with a respiratory rate (RR) exceeding 30/min; (3) a ratio of arterial oxygen partial pressure (PaO2) to fraction of inspired oxygen (FiO2) (P/F ratio) below 300 mmHg; or (4) progressive symptoms with lung lesions advancing by more than 50 % within 24–48 h. Patients were excluded if they met any of the following criteria: (1) age < 18 years; (2) consent to a do-not-resuscitate order; (3) absence of chest CT data; (4) active malignancy; or (5) immunosuppression due to factors such as prolonged corticosteroid use, HIV infection, or autoimmune disorders. Ethical approval was obtained from all participating institutions (primary center's approval number: 2023–0048), and the study adhered to the principles of the Declaration of Helsinki. Informed consent was obtained from each patient before blood samples were collected for plasma protein tests, in accordance with ethical guidelines and regulations.
Data collection and preprocessing
Three types of data were collected for the COVID-19 cohort. Clinical data comprised demographics, laboratory tests, vital signs, fluid intake, vasopressors, and steroids. Patients were followed during their hospital stay, and the primary outcome was hospital mortality. Chest CT images acquired on day 1 after hospital admission were exported from the picture archiving and communication system (PACS) in Digital Imaging and Communications in Medicine (DICOM) format. All images were resampled to an isotropic resolution of 1 mm. Hounsfield Unit (HU) values were clipped to the window [−1024, 240] and then linearly normalized to [−1, 1]. Context-aware self-supervised representation learning (CSRL) was used to extract features from the CT images.[10] First, each 3D image was converted into a graph of patches centered at landmarks defined on an anatomical atlas; the graph structure follows the anatomical correspondences established by registering the subject's image to the atlas image. Second, to accommodate varying image dimensions, a hierarchical model learns anatomy-specific characteristics at the patch level and subject-specific traits at the graph level. At the patch level, a conditional encoder combines the texture of each local region with its anatomical location. At the graph level, a graph convolutional network (GCN), which operates on graph-structured data, integrates the relationships between anatomical regions (see Supplementary Material Methods for details).
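The intensity preprocessing and the graph-level aggregation described above can be sketched in NumPy. This is a minimal illustration, not the CSRL implementation: the HU window [−1024, 240] and the [−1, 1] target range come from the text, whereas the helper names `window_and_normalize` and `gcn_layer` and the symmetric-normalized propagation rule of Kipf and Welling are assumptions about a typical GCN layer; CSRL may use a different variant.

```python
import numpy as np

# Intensity window stated in the text (Hounsfield Units).
HU_MIN, HU_MAX = -1024.0, 240.0


def window_and_normalize(hu: np.ndarray) -> np.ndarray:
    """Clip a CT volume (HU) to [HU_MIN, HU_MAX], then rescale linearly to [-1, 1]."""
    clipped = np.clip(hu.astype(np.float32), HU_MIN, HU_MAX)
    return 2.0 * (clipped - HU_MIN) / (HU_MAX - HU_MIN) - 1.0


def gcn_layer(h: np.ndarray, adj: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph-convolution step, ReLU(D^-1/2 (A + I) D^-1/2 H W).

    ASSUMPTION: Kipf & Welling-style propagation; the CSRL GCN may differ.
    h:   (n_nodes, in_dim)   one embedding per landmark-centered patch
    adj: (n_nodes, n_nodes)  symmetric 0/1 adjacency between anatomical regions
    w:   (in_dim, out_dim)   learnable weight matrix
    """
    a_hat = adj + np.eye(adj.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # D^-1/2 as a vector
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ w, 0.0)         # ReLU


if __name__ == "__main__":
    # -392 HU is the midpoint of the window, so it maps to 0;
    # values outside the window saturate at -1 and 1.
    print(window_and_normalize(np.array([-2000.0, -1024.0, -392.0, 240.0, 1000.0])))
```

With no edges the layer reduces to ReLU(HW); with edges, each node's output mixes its neighbors' embeddings, which is how relationships between anatomical regions enter the subject-level representation.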
Proteins were quantified with the Olink® inflammation panel (Olink Proteomics AB, Uppsala, Sweden) according to the manufacturer's instructions. The Olink protocol uses proximity extension assay technology,[11] in which pairs of oligonucleotide-labeled antibody probes bind to their target protein. When a probe pair is in close proximity, the attached oligonucleotides hybridize, and the addition of a DNA polymerase triggers proximity-dependent DNA extension, producing a unique polymerase chain reaction (PCR) target sequence. This sequence is then detected and quantified on a microfluidic real-time PCR instrument (Signature Q100, LC-Bio Technology CO., Ltd., Hangzhou, China).
Deep learning integration of image features and proteomics to derive IEAs
To construct the IEAs, CSRL features were first fed into a multilayer perceptron (MLP), which generated a low-dimensional representation for each patch. We then performed supervised dimension reduction with a Product of Experts (PoE) model[12] to obtain subject-level IEAs. To encourage each IEA to capture a distinct disease process, we imposed a statistical independence constraint between IEAs based on the Hilbert–Schmidt independence criterion (HSIC).[13] A final linear layer took the IEAs as input to predict all protein levels simultaneously, and all model parameters were optimized jointly with the Adam algorithm.[14] The proteomic data from the Olink panel were chosen as the target because they provide quantitative molecular information that can be linked to the imaging features extracted from CT scans, allowing the model to relate structural and molecular information and to identify potential biomarkers for personalized treatment. During training, we also evaluated how the selection of target proteins affected the model. For this, each protein was tested for association with the top 128 principal components of the CSRL features in the training dataset, and different inclusion thresholds were considered, defined by the P-value of the F-test for each protein.[15]
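A minimal NumPy sketch of the three model ingredients named above: PoE fusion of patch-level outputs, an HSIC independence penalty, and the linear protein-prediction loss. It is forward-pass only; in practice these pieces would be written in an autodiff framework and optimized with Adam. ASSUMPTIONS (not specified in the text): each patch's MLP outputs a Gaussian mean and log-variance per axis, the PoE is the standard precision-weighted product of Gaussians, HSIC uses Gaussian kernels (bandwidth 1) with the biased estimator, and the penalty is summed over IEA pairs with a hypothetical weight `lam`; `ie_loss` is a hypothetical name.

```python
import numpy as np


def product_of_experts(mu: np.ndarray, logvar: np.ndarray):
    """Fuse patch-level Gaussian experts into one subject-level Gaussian.

    mu, logvar: (n_patches, n_axes) per-patch means and log-variances.
    Returns (mean, var), each of shape (n_axes,).
    """
    precision = np.exp(-logvar)                # 1 / sigma^2 for every patch
    var = 1.0 / precision.sum(axis=0)          # product of Gaussians: precisions add
    mean = var * (mu * precision).sum(axis=0)  # precision-weighted mean
    return mean, var


def _gaussian_gram(x: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian kernel matrix for a 1-D sample."""
    return np.exp(-((x[:, None] - x[None, :]) ** 2) / (2.0 * sigma ** 2))


def hsic(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Biased empirical HSIC, trace(K H L H) / (n - 1)^2, with Gaussian kernels."""
    n = x.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    kx, ky = _gaussian_gram(x, sigma), _gaussian_gram(y, sigma)
    return float(np.trace(kx @ h @ ky @ h)) / (n - 1) ** 2


def ie_loss(z, w, b, proteins, lam=1.0):
    """Protein-prediction MSE plus lam * sum of pairwise HSIC between IEAs.

    z: (n_subjects, n_axes) IEAs; w: (n_axes, n_proteins); b: (n_proteins,).
    """
    pred = z @ w + b                           # final linear layer
    mse = float(np.mean((pred - proteins) ** 2))
    k = z.shape[1]
    penalty = sum(hsic(z[:, i], z[:, j]) for i in range(k) for j in range(i + 1, k))
    return mse + lam * penalty
```

HSIC is zero when either variable is constant and grows with statistical dependence of any form, not only linear correlation, which is why it is a natural penalty for keeping the axes from encoding the same process.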
To determine the optimal P-value threshold and the number of proteins used for model training, we analyzed the stability of the IEAs across cross-validation (CV) folds (Supplementary Table S1). To determine the optimal number of IEAs, we analyzed the protein variance explained: the model was trained on a random training split, and the variance explained was calculated on a holdout validation set. The total variance explained increased with the number of IEAs, but the gain slowed after the third IEA; the small increment from three to four IEAs indicated diminishing returns, so three IEAs were retained in the final model.
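The holdout criterion for choosing the number of IEAs can be written as a pooled R². This sketch assumes the variance explained is pooled over all selected proteins (the text does not say whether it is pooled or averaged per protein); `variance_explained` and `marginal_gains` are hypothetical helper names.

```python
import numpy as np


def variance_explained(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Pooled fraction of protein variance explained on a holdout set.

    1 - SSE / SST, summed over all subjects and proteins; y_true and y_pred
    are (n_subjects, n_proteins) arrays of observed and predicted abundance.
    """
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - y_true.mean(axis=0)) ** 2)
    return float(1.0 - sse / sst)


def marginal_gains(ve_by_k) -> np.ndarray:
    """Gain in variance explained when moving from k - 1 to k axes (k = 1, 2, ...)."""
    ve = np.asarray(ve_by_k, dtype=float)
    return np.diff(ve, prepend=0.0)
```

Each entry of `ve_by_k` would come from a model trained with k IEAs and evaluated on the same holdout set; the number of axes is then chosen where the marginal gain levels off.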
The dataset with proteomic data (n = 630) was randomly partitioned into a training subset (n = 470) and a validation subset (n = 160). Within the training subset, models were trained with five-fold CV, yielding five models, and the final IEAs were computed as the average of the IEAs produced by these five models. Once trained, the IEA model can be applied to new chest CT images to derive IEAs for each scan. It was applied to the testing dataset (n = 1349), and the clinical implications of the resulting IEAs were examined for consistency with those in the training set (Figure 1).
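An illustrative sketch of the five-fold scheme, the fold-stability measure used to choose the protein threshold, and the averaging of fold models. ASSUMPTIONS: the text does not say how IEAs from different models are aligned before averaging; because a latent axis and its negation fit the proteins equally well, this sketch flips signs to agree with the first model and assumes the axis order already matches. The helper names are hypothetical, and the model-training step itself is not shown.

```python
import numpy as np


def kfold_indices(n: int, k: int = 5, seed: int = 0):
    """Shuffle subject indices 0..n-1 and yield (train_idx, val_idx) for k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


def fold_stability(ieas_per_model):
    """Mean absolute pairwise Pearson r of each axis across fold models.

    ieas_per_model: list of (n_subjects, n_axes) arrays for the same subjects.
    Taking the absolute value ignores sign flips between models (an assumption).
    """
    m, k = len(ieas_per_model), ieas_per_model[0].shape[1]
    out = np.empty(k)
    for a in range(k):
        rs = [abs(np.corrcoef(ieas_per_model[i][:, a], ieas_per_model[j][:, a])[0, 1])
              for i in range(m) for j in range(i + 1, m)]
        out[a] = np.mean(rs)
    return out


def ensemble_ieas(ieas_per_model):
    """Average the IEAs of several fold models after sign-aligning each axis
    to the first model (ASSUMPTION: axis order is already matched)."""
    ref = ieas_per_model[0]
    aligned = []
    for z in ieas_per_model:
        r = np.array([np.corrcoef(ref[:, a], z[:, a])[0, 1] for a in range(z.shape[1])])
        aligned.append(z * np.where(r < 0, -1.0, 1.0))  # flip anti-correlated axes
    return np.mean(aligned, axis=0)
```

With 470 training subjects and k = 5, each validation fold holds 94 subjects.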
Figure 1.
Overview of the data analysis workflow. The study included two cohorts, one with proteomic data and one without. For image analysis, CT images were partitioned into 32³ mm³ patches, and 128 features were generated for each patch with the CSRL algorithm. These features were fed into an MLP, which mapped a tensor of dimensions 575 × 128 × 630 to a patch-level representation of dimensions 575 × 3 × 630. The patch-level features were then summarized into a subject-level latent representation, the IEAs, of dimensions 3 × 630. A linear layer (3 × 92) used the IEAs as input to predict protein abundance for each subject, and independence constraints were applied to the IEAs. The overall objective was to minimize the mean-squared error of the protein-level predictions. After training, the IEA model was applied to the testing cohort to extract IEAs for each subject, and the clinical relevance of the IEAs was assessed independently in the training and testing datasets. The primary focus was the prognostic and predictive enrichment of the IEAs. Prognostic enrichment analysis assessed whether IEAs were linked to critical clinical outcomes such as mortality, vasopressor use, and MV; predictive enrichment analysis included interaction terms between treatment modalities (fluids and steroids) and IEAs.
CSRL: Context-aware self-supervised representation learning; CT: Computed tomography; IEAs: Image-Expression Axes; MLP: Multilayer perceptron; MV: Mechanical ventilation; SSL: Self-supervised representation learning.
Biological interpretation of IEAs
For each protein (identified by its OlinkID), we performed a Welch two-sample t-test at a significance level of 0.05, and P-values were adjusted for multiple comparisons with the Benjamini–Hochberg method.[16] Differential protein expression was visualized with volcano plots. The comparisons were between low and high values of each IEA and between survivors and non-survivors. Differentially expressed proteins were interpreted biologically by enrichment analysis against the Molecular Signatures Database (MSigDB), using clusterProfiler's enricher function on the statistical test results and its GSEA function on the full data.[17,18]
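A sketch of the per-protein test and adjustment described above, in Python with SciPy. `scipy.stats.ttest_ind(..., equal_var=False)` performs Welch's test; the Benjamini–Hochberg step is written out so the example does not depend on a particular SciPy version. The group matrices are assumed to hold Olink NPX values (a log2 scale) with subjects in rows and proteins in columns, and the helper names are hypothetical.

```python
import numpy as np
from scipy import stats


def benjamini_hochberg(p: np.ndarray) -> np.ndarray:
    """Benjamini-Hochberg adjusted P-values (step-up procedure)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    adj = np.empty(m)
    adj[order] = np.minimum(ranked, 1.0)
    return adj


def differential_abundance(group_a: np.ndarray, group_b: np.ndarray):
    """Per-protein Welch t-test between two groups (rows = subjects,
    columns = proteins). Returns mean difference (a - b), raw P, adjusted P."""
    _, p = stats.ttest_ind(group_a, group_b, axis=0, equal_var=False)
    diff = group_a.mean(axis=0) - group_b.mean(axis=0)
    return diff, p, benjamini_hochberg(p)
```

For a volcano plot, `diff` gives the x coordinate and `-np.log10(p_adj)` the y coordinate of each protein.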
Clinical implications of IEAs
To enhance the clinical relevance of the IEAs, we examined correlations between IEAs and relevant clinical variables to identify links between IEAs and clinical indicators. We then assessed the independent associations between IEA values and key clinical outcomes, including mortality, the use of mechanical ventilation (MV), and vasopressor use, with logistic regression models. These models were adjusted for potential confounders, including age, sex, comorbidities, and baseline severity (Sequential Organ Failure Assessment [SOFA] score), to reduce bias and isolate the association of IEAs with prognosis. To assess predictive enrichment, we added interaction terms between IEAs and treatments, namely steroid use and fluid intake. A significant interaction coefficient indicates that the treatment effect differs across values of the corresponding IEA, that is, that the IEA may identify patients who respond differently to the treatment.
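The outcome models above can be sketched as follows. This is an illustrative NumPy implementation of ordinary logistic regression with a treatment-by-IEA interaction, not the authors' code: coding steroid use as 0/1, the optional covariate matrix, and the helper names are assumptions. Odds ratios are `np.exp(beta)`, and the Wald P-value of the interaction column is the test for predictive enrichment.

```python
import math

import numpy as np


def fit_logistic(X: np.ndarray, y: np.ndarray, n_iter: int = 50, tol: float = 1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS).

    X must already include an intercept column. Returns (beta, standard errors).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        step = np.linalg.solve(info, X.T @ (y - p))   # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))


def interaction_design(iea, treated, covariates=None):
    """Columns: intercept, IEA, treatment, IEA x treatment, then any covariates."""
    cols = [np.ones_like(iea), iea, treated, iea * treated]
    if covariates is not None:
        cols.append(covariates)  # e.g. age, sex, SOFA as a 2-D array
    return np.column_stack(cols)


def wald_p(beta: float, se: float) -> float:
    """Two-sided Wald P-value, 2 * (1 - Phi(|beta / se|))."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))
```

In practice a statistics package (for example statsmodels or R's `glm`) would be used; the hand-written IRLS only makes the computation explicit.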
Results
Study population
Of the 2534 patients initially screened, 1979 remained for analysis after the exclusion criteria were applied. The training set comprised 630 patients with proteomic data, and the testing set comprised 1349 patients without proteomic data. Apart from the availability of proteomic data, no significant differences in other variables were observed between the training and testing datasets. Most patients were male (65.1 %). Clinical outcomes were similar in the two datasets; the overall mortality rate was 22.5 %, and 44.0 % of patients required MV (Supplementary Table S2).
Identification of IEAs
To determine the optimal P-value threshold and the number of proteins used for model training, we analyzed the stability of IEAs across CV folds, quantified by Pearson correlation coefficients between the IEAs identified in different folds. Subsets of proteins were selected for model inclusion using a range of thresholds on the adjusted P-value for the association between each protein and the CSRL features. A threshold of P = 0.05, which selected 55 proteins, yielded the highest mean correlation coefficients for the three IEAs and was therefore used for the primary analysis.
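The protein screening step can be sketched as an overall regression F-test of each protein on the top principal components of the CSRL features, followed by multiple-testing adjustment and a threshold. ASSUMPTIONS: subject-level CSRL features are flattened into one row per subject, PCA is computed by SVD of the centered features, and Benjamini–Hochberg is the adjustment (the text says only "adjusted P-value"); the helper names are hypothetical.

```python
import numpy as np
from scipy import stats


def top_pcs(features: np.ndarray, n_pc: int = 128) -> np.ndarray:
    """Scores on the first n_pc principal components (SVD of centered data)."""
    centered = features - features.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_pc] * s[:n_pc]


def protein_f_pvalues(pcs: np.ndarray, proteins: np.ndarray) -> np.ndarray:
    """Overall F-test P-value for each protein (column) regressed on all PCs."""
    n, k = pcs.shape
    X = np.column_stack([np.ones(n), pcs])
    coef, *_ = np.linalg.lstsq(X, proteins, rcond=None)
    sse = np.sum((proteins - X @ coef) ** 2, axis=0)
    sst = np.sum((proteins - proteins.mean(axis=0)) ** 2, axis=0)
    f = ((sst - sse) / k) / (sse / (n - k - 1))
    return stats.f.sf(f, k, n - k - 1)


def bh_adjust(p: np.ndarray) -> np.ndarray:
    """Benjamini-Hochberg adjusted P-values."""
    m = p.size
    order = np.argsort(p)
    ranked = np.minimum.accumulate((p[order] * m / np.arange(1, m + 1))[::-1])[::-1]
    adj = np.empty(m)
    adj[order] = np.minimum(ranked, 1.0)
    return adj


def select_proteins(features, proteins, threshold=0.05, n_pc=128):
    """Indices of proteins whose adjusted F-test P-value is below the threshold."""
    p_adj = bh_adjust(protein_f_pvalues(top_pcs(features, n_pc), proteins))
    return np.flatnonzero(p_adj < threshold)
```

Repeating `select_proteins` over a grid of thresholds and training a model on each subset gives the candidates whose fold stability is then compared.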
The optimal number of IEAs (n = 3) was established from the cumulative variance explained in the selected proteins (Supplementary Figures S1 and S2). IEA1 and IEA2 had slightly lower medians in the training set than in the testing set (Supplementary Table S2): the median (interquartile range [IQR]) of IEA1 was −0.06 (−0.61, 0.53) in the training set versus 0.04 (−0.49, 0.89) in the testing set (adjusted P = 0.048), and that of IEA2 was 0.49 (−0.96, 0.83) versus 0.6 (0.07, 0.87) (adjusted P = 0.048).
Clinical interpretation of IEAs
To gain clinical insight from the IEAs, we analyzed their correlations with clinical variables. IEA1 correlated significantly with shock indices, including lactate, the cardiovascular component of the SOFA score (SOFAcv), and minimum blood pressure, and was therefore interpreted as a shock axis. IEA2 correlated with markers of the inflammatory response, such as C-reactive protein (CRP), white blood cell count (WBC), body temperature, RR, and heart rate, and was designated the systemic inflammatory response syndrome (SIRS) axis. IEA3 correlated significantly with coagulation profiles and was designated the coagulation axis (Figure 2). This correlation pattern was consistent in the training and testing datasets (Supplementary Figure S3), and expanded correlation matrices with additional clinical variables are provided in Supplementary Figures S4 and S5. Representative CT images illustrate the relationship between IEA values and lung characteristics (Figure 3): lower IEA1 values were characterized by extensive lung infiltrates and higher IEA1 values by near-normal lung parenchyma (Figure 3A,B), whereas lower IEA2 values showed a ground-glass pattern of infiltrates and higher IEA2 values showed centric opacity infiltrates (Figure 3C,D).
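A sketch of how the IEA-by-clinical-variable correlation matrix and its significance marks could be computed. ASSUMPTIONS: the correlation type is not stated, so Pearson correlation (`scipy.stats.pearsonr`) is used, pairs with missing values are dropped separately for each variable pair, and P > 0.05 marks a cell as non-significant (the cross marks in the correlation matrix figure); `scipy.stats.spearmanr` could be substituted for a rank-based version. The helper name is hypothetical.

```python
import numpy as np
from scipy import stats


def correlation_matrix(ieas: np.ndarray, clinical: np.ndarray, alpha: float = 0.05):
    """Pearson r and two-sided P between each IEA and each clinical variable.

    ieas: (n, n_axes); clinical: (n, n_vars), which may contain NaN.
    Returns r, p (both n_axes x n_vars) and a boolean mask of non-significant cells.
    """
    n_a, n_c = ieas.shape[1], clinical.shape[1]
    r = np.full((n_a, n_c), np.nan)
    p = np.full((n_a, n_c), np.nan)
    for i in range(n_a):
        for j in range(n_c):
            x, y = ieas[:, i], clinical[:, j]
            ok = ~(np.isnan(x) | np.isnan(y))  # pairwise complete cases
            r[i, j], p[i, j] = stats.pearsonr(x[ok], y[ok])
    return r, p, p > alpha
```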
Figure 2.
Correlation matrix between clinical features and IEAs. Each cell shows the correlation coefficient between two variables, with color indicating the direction of the correlation (red, positive; blue, negative) and color intensity and cell size indicating its magnitude. The number in each cell is the correlation coefficient, and a cross mark indicates a non-significant correlation (P > 0.05). The matrix summarizes the interrelationships between variables and helps identify patterns and associations in the dataset.
aPTT: Activated partial thromboplastin time; CRP: C-reactive protein; Fluidin: Fluid infusion in 24 h; HRmax: Maximum heart rate; IEA: Image-Expression Axis; INR: International normalized ratio; Lac: Lactate; MAPmin: Minimum mean arterial pressure; PaCO2: Arterial carbon dioxide partial pressure; PLT: Platelet; Procal: Procalcitonin; RRmax: Maximum respiratory rate; SAPmin: Minimum systolic arterial pressure; SOFAcv: Cardiovascular component of Sequential Organ Failure Assessment score; Tmax: Maximum body temperature; TT: Thrombin time; WBC: White blood cell count.
Figure 3.
Example chest CT images for different IEA values. Lower IEA1 values were associated with extensive lung infiltrates (A), whereas higher IEA1 values indicated near-normal lung parenchyma (B). Similarly, lower IEA2 values showed a ground-glass pattern of infiltrates (C), whereas higher IEA2 values were characterized by centric opacity infiltrates (D). A higher IEA3 value was linked to infiltrates along the bronchial regions (E), whereas lower IEA3 values indicated structural changes in the lung parenchyma attributable to chronic lung disease (F).
CT: Computed tomography; IEA: Image-Expression Axis.
Biological function enrichment along the IEA axis
To characterize biological functions along each IEA, we compared protein abundance between low and high IEA groups, as well as between survivors and non-survivors, using Welch t-tests. Survivors had higher abundances of TRAIL, interleukin (IL)7, TRANCE, and CCL28 than non-survivors. The low IEA1 group had higher abundances of IL6, IL8, OSM, and MCP-3 than the high IEA1 group (Figure 4). These protein abundance patterns were enriched in specific biological pathways, such as interleukin signaling, JAK-STAT signaling, and response to oxygen-containing compounds (Supplementary Figure S6). For IEA2, differentially expressed proteins between the low and high groups included CXCL11, MCP-3, IL7, and CCL28, which were enriched in pathways including the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) innate immune response, response to oxygen-containing compounds, and response to chemokines (Supplementary Figure S6). These analyses link IEA values to key biological processes and may shed light on mechanisms underlying the observed clinical outcomes.
Figure 4.
Volcano plots of differences in protein abundance between groups. Each point represents a protein, plotted by the between-group difference in abundance (x-axis) and the negative logarithm of the adjusted P-value (y-axis). Proteins with larger differences lie farther from zero along the x-axis, with upregulated proteins on the right and downregulated proteins on the left. The dashed horizontal line marks the significance threshold for adjusted P-values: significant proteins are shown in red and non-significant proteins in blue. The plots help identify proteins of potential biological importance: (A) survivors vs. non-survivors, (B) low IEA1 vs. high IEA1, (C) low IEA2 vs. high IEA2, (D) low IEA3 vs. high IEA3.
IEA: Image-Expression Axis.
Predictive and prognostic enrichment of IEAs
Because the IEAs correlated strongly with clinical variables, logistic regression models were constructed with all three IEAs as covariates, to avoid the collinearity that could arise from additionally adjusting for those clinical variables. IEA1 (odds ratio [OR] = 0.52, 95 % confidence interval [CI]: 0.40 to 0.67, P < 0.001) and IEA2 (OR = 0.74, 95 % CI: 0.62 to 0.90, P = 0.002) were consistently associated with mortality risk. In the training set, IEA1 and IEA2 were also associated with the risk of vasopressor use (IEA1: OR = 0.41, 95 % CI: 0.32 to 0.52, P < 0.001; IEA2: OR = 0.69, 95 % CI: 0.57 to 0.82, P < 0.001) and MV (IEA1: OR = 0.33, 95 % CI: 0.26 to 0.42, P < 0.001; IEA2: OR = 0.76, 95 % CI: 0.63 to 0.90, P = 0.002). The associations of IEA3 with these outcomes were not consistent between the training and testing datasets (Table 1).
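To make the reported odds ratios concrete, assume, as is standard for logistic regression but not stated explicitly in the text, that each IEA enters the model linearly and that ORs are per one-unit increase in the IEA:

$$
\log\frac{\Pr(\text{death})}{1-\Pr(\text{death})} = \beta_0 + \beta_1\,\mathrm{IEA1} + \beta_2\,\mathrm{IEA2} + \beta_3\,\mathrm{IEA3}, \qquad \mathrm{OR}_k = e^{\beta_k}.
$$

For IEA1 in the training set, $\hat\beta_1 = \ln 0.52 \approx -0.654$, and the standard error implied by the 95 % CI is $(\ln 0.67 - \ln 0.40)/(2 \times 1.96) \approx 0.13$. Holding IEA2 and IEA3 fixed, a patient whose IEA1 is 2 units lower than another's has mortality odds multiplied by

$$
e^{-2\hat\beta_1} = 0.52^{-2} \approx 3.7.
$$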
Table 1.
Association of IEAs with important clinical outcomes in the training and testing sets.
| Outcome | Training set (n = 630) OR (95 % CI) | P-value | Testing set (n = 1349) OR (95 % CI) | P-value |
|---|---|---|---|---|
| Mortality | | | | |
| IEA1 | 0.52 (0.40 to 0.67) | <0.001 | 0.40 (0.34 to 0.48) | <0.001 |
| IEA2 | 0.74 (0.62 to 0.90) | 0.002 | 0.72 (0.62 to 0.84) | <0.001 |
| IEA3 | 0.96 (0.77 to 1.20) | 0.719 | 1.44 (1.22 to 1.71) | <0.001 |
| MV | | | | |
| IEA1 | 0.33 (0.26 to 0.42) | <0.001 | 0.42 (0.36 to 0.49) | <0.001 |
| IEA2 | 0.76 (0.63 to 0.90) | 0.002 | 0.56 (0.48 to 0.65) | <0.001 |
| IEA3 | 0.89 (0.73 to 1.08) | 0.224 | 1.09 (0.95 to 1.26) | 0.224 |
| Use of vasopressor | | | | |
| IEA1 | 0.41 (0.32 to 0.52) | <0.001 | 0.46 (0.39 to 0.53) | <0.001 |
| IEA2 | 0.69 (0.57 to 0.82) | <0.001 | 0.59 (0.51 to 0.68) | <0.001 |
| IEA3 | 0.83 (0.67 to 1.01) | 0.062 | 1.04 (0.90 to 1.20) | 0.596 |
| Use of CRRT | | | | |
| IEA1 | 0.84 (0.54 to 1.29) | 0.436 | 0.79 (0.61 to 1.02) | 0.076 |
| IEA2 | 0.79 (0.60 to 1.03) | 0.084 | 0.82 (0.67 to 1.00) | 0.044 |
| IEA3 | 0.94 (0.68 to 1.31) | 0.722 | 1.26 (1.02 to 1.56) | 0.033 |

Abbreviations: IEA = Image-Expression Axis, OR = odds ratio, CI = confidence interval, MV = mechanical ventilation, CRRT = continuous renal replacement therapy.
To assess predictive enrichment, we added interaction terms between the IEAs and fluid intake or steroid use. Significant interactions were observed between IEA1 and steroid use (P = 0.015) and between IEA2 and the volume of fluid infusion (P = 0.002). In patients with lower IEA1 values (< −2, indicating more severe shock), steroid use was associated with a reduced risk of mortality, whereas in patients with higher IEA1 values it was associated with an increased risk of death. Patients with lower IEA2 values (suggesting a more severe inflammatory response) had a lower mortality risk with greater fluid infusion, whereas patients with higher IEA2 values might benefit from a more restrictive fluid strategy (Figure 5). These results illustrate the complex relationships between IEAs, treatment modalities, and clinical outcomes and point to potential avenues for personalized therapy.
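The interaction terms can be read as follows. Assume, for illustration, a linear predictor $\eta$ with a steroid indicator $S$ (1 = treated) and an $S \times \mathrm{IEA1}$ term; the same algebra applies to the log hazard in a Cox model and to fluid volume with IEA2:

$$
\eta = \beta_0 + \beta_1\,\mathrm{IEA1} + \beta_2\,S + \beta_3\,(S \times \mathrm{IEA1}),
\qquad
\eta(S=1) - \eta(S=0) = \beta_2 + \beta_3\,\mathrm{IEA1}.
$$

The steroid effect on the log-odds (or log-hazard) scale therefore changes linearly with IEA1 and changes sign at $\mathrm{IEA1}^{*} = -\beta_2/\beta_3$. The reported pattern, lower mortality with steroids at low IEA1 and higher mortality at high IEA1, corresponds to $\beta_3 > 0$; likewise, benefit from more fluid at low IEA2 and from restriction at high IEA2 corresponds to a positive fluid × IEA2 coefficient. The fitted coefficients are not reported here, so the crossover point is not derived.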
Figure 5.
Modification of the therapeutic effects of corticosteroids and fluid infusion by IEAs. Cox proportional hazards regression models including interaction terms between treatment and IEA were fitted on the overall data. Significant interaction effects were identified for fluids × IEA2 and steroids × IEA1. (A) Risk score for subjects with and without corticosteroid use at different values of IEA1 (P = 0.015 for interaction); error bars indicate the 95 % CI. (B) Risk score across amounts of fluid infusion at different values of IEA2 (P = 0.002 for interaction); the shaded area indicates the 95 % CI. A higher risk score indicates a higher mortality rate.
CI: Confidence interval; IEA: Image-Expression Axis; N: No; Y: Yes.
Discussion
The present study provides insight into the complexity of severe COVID-19 pneumonia by integrating deep learning, proteomics, and clinical data. We uncovered relationships among IEAs, protein abundances, clinical variables, and patient outcomes that have implications for personalized therapeutic interventions and for understanding the pathophysiological mechanisms underlying disease progression. We identified three IEAs associated with distinct clinical domains. IEA1 emerged as a shock axis, correlating significantly with shock indices such as lactate levels, SOFAcv scores, and minimum blood pressure. IEA2 represented a SIRS axis, correlating with inflammatory markers such as CRP, WBC, body temperature, RR, and heart rate. IEA3 represented a coagulation axis, correlating significantly with coagulation profiles. These correlations suggest that IEAs could serve as composite biomarkers that span several clinical domains and reflect interacting biological processes during disease progression.
Our analysis of the prognostic and predictive value of IEAs yielded several findings. Mortality risk was consistently associated with IEA1 and IEA2, suggesting their potential as early prognostic markers for severe COVID-19 pneumonia. Interaction analyses also suggested therapeutic implications. In patients with lower IEA1 values, steroid administration was associated with a reduced mortality risk, indicating a potential benefit for patients in shock, whereas in patients with higher IEA1 values, steroid use was associated with an increased risk of death, highlighting the importance of tailoring treatment to individual IEA profiles. Similarly, the interaction between IEA2 and fluid infusion suggests a basis for individualized fluid management. These interactions between IEAs, therapeutic interventions, and clinical outcomes support precision medicine approaches to managing severe COVID-19. The findings are consistent with other reports in COVID-19 and sepsis populations indicating that fluid resuscitation and steroid use should be individualized to improve clinical outcomes.[3,19–21]
Disease axes provide a framework for characterizing multifaceted disease presentations as continuous traits that coexist in varying degrees within a single individual.[22] They help address disease heterogeneity by revealing how different disease components interact, and they offer a more nuanced description of clinical manifestations than discrete clusters.[23] This approach has yielded promising results in several medical contexts, notably in characterizing the heterogeneity of chronic obstructive pulmonary disease (COPD).[22,24–26] Kinney et al.[22] used factor analysis to identify five disease axes that were associated with mortality, and found a significant synergistic interaction between two of these factors that further influenced mortality. Building on this work, Chen et al.[15] identified two disease axes from chest CT scans of patients with COPD; these axes were significantly associated with long-term mortality and underscored the broader applicability of disease axes in disease characterization. Our study extends this approach to severe COVID-19 pneumonia and shows that disease axes in this setting not only carry prognostic information, being associated with clinical outcomes, but may also have predictive value that can inform tailored treatment strategies. Disease axes may thus serve not only for characterization but also for predictive precision medicine applications.
Importantly, the prognostic and predictive enrichments were not confined to the training set; they were consistently reproduced in the independent testing set.
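The contrast between continuous axes and discrete clusters can be illustrated with a toy example. The sketch assumes two hypothetical, already-estimated axis loading vectors and two cluster centroids over three standardized features. It is not the factor analysis of Kinney et al. or the IEA model of this study; it only shows how a patient with mixed features is represented under each scheme.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def axis_scores(patient: Vector, loadings: Sequence[Vector]) -> list[float]:
    """Continuous representation: one score per axis (projection onto its loadings)."""
    return [dot(patient, w) for w in loadings]


def nearest_cluster(patient: Vector, centroids: Sequence[Vector]) -> int:
    """Discrete representation: index of the closest centroid (squared Euclidean)."""
    def sq_dist(c: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(patient, c))
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))


# Hypothetical standardized features; loadings and centroids are illustrative only.
LOADINGS = [[1.0, 0.0, 0.0], [0.0, 0.7071, 0.7071]]   # two axes
CENTROIDS = [[2.0, 0.0, 0.0], [0.0, 1.5, 1.5]]        # two clusters

mixed_patient = [1.2, 0.9, 1.0]  # shows features of both patterns
print("axis scores:", [round(s, 2) for s in axis_scores(mixed_patient, LOADINGS)])
print("cluster label:", nearest_cluster(mixed_patient, CENTROIDS))
```

The cluster assignment keeps only one label for this patient, whereas the axis scores retain a substantial value on both axes, which is the extra information the disease-axis view is meant to preserve.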
A key strength of this study is its approach to deriving IEAs: a deep learning model trained with proteomic data as the target, rather than unsupervised dimension reduction. This choice has several advantages. Because the model is trained against the proteomic data, the IEAs are directly linked to underlying biological processes. Consequently, the IEAs capture the features of chest CT scans that are most relevant to these processes and are less influenced by noise arising from technical artifacts, institutional variation, and other parameter differences.[27] Furthermore, this approach provides a practical framework for translating research findings into clinical practice. Once trained, the model can be applied directly to raw chest CT data, so clinicians can use the IEA model without measuring plasma proteins. This is a marked advantage over methods that require costly and time-consuming plasma proteomic analysis. Because chest CT scans are readily obtained and often reported quickly, the IEA model could be integrated into routine clinical workflows, where timely and accurate decision-making is paramount. In short, this approach yields biologically informed IEAs while keeping their clinical implementation practical. By bridging advanced computational methods and clinical applicability, this study advances the potential for personalized care and treatment optimization in severe COVID-19 pneumonia.
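The idea of deriving axes under proteomic supervision and then applying only the image branch at inference can be sketched with a deliberately simplified model: a linear bottleneck that maps image features x to k axis scores z = Ax and maps z to protein abundances ŷ = Bz, trained by per-sample gradient descent on squared error. This linear network is a swapped-in stand-in, not the study's deep learning architecture or its CSRL features; the synthetic data, dimensions, learning rate, and function names are all hypothetical.

```python
import random

Matrix = list[list[float]]


def matvec(m: Matrix, v: list[float]) -> list[float]:
    """Matrix-vector product for small dense lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def train_bottleneck(X, Y, k, lr=0.01, epochs=300, seed=0):
    """Fit y ~ B (A x) by per-sample gradient descent on 0.5 * squared error.

    A (k x d) maps image features to k axis scores; B (p x k) maps the
    axis scores to p protein abundances. Proteins are needed only here.
    """
    rng = random.Random(seed)
    d, p = len(X[0]), len(Y[0])
    A = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(k)]
    B = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(p)]
    for _ in range(epochs):
        for x, y in zip(X, Y):
            z = matvec(A, x)  # axis scores for this sample
            err = [yh - yt for yh, yt in zip(matvec(B, z), y)]
            # Backpropagate to z before B changes: grad_z = B^T err
            grad_z = [sum(B[i][j] * err[i] for i in range(p)) for j in range(k)]
            for i in range(p):
                for j in range(k):
                    B[i][j] -= lr * err[i] * z[j]
            for j in range(k):
                for m in range(d):
                    A[j][m] -= lr * grad_z[j] * x[m]
    return A, B


def mse(X, Y, A, B) -> float:
    """Mean squared protein-reconstruction error over all samples and proteins."""
    sq = [(yh - yt) ** 2 for x, y in zip(X, Y)
          for yh, yt in zip(matvec(B, matvec(A, x)), y)]
    return sum(sq) / len(sq)


# Synthetic stand-in data (hypothetical): 6 image features, 2 latent axes, 5 proteins.
rng = random.Random(42)
W = [[rng.gauss(0.0, 0.5) for _ in range(6)] for _ in range(2)]
V = [[rng.gauss(0.0, 0.5) for _ in range(2)] for _ in range(5)]
X = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(60)]
Y = [[v + rng.gauss(0.0, 0.05) for v in matvec(V, matvec(W, x))] for x in X]

A, B = train_bottleneck(X, Y, k=2)
baseline = sum(v * v for y in Y for v in y) / (len(Y) * len(Y[0]))  # predict all zeros
print(f"baseline MSE {baseline:.3f}, trained MSE {mse(X, Y, A, B):.3f}")

# Inference needs only image features: the trained encoder A yields the axis scores.
new_scan_features = [rng.gauss(0.0, 1.0) for _ in range(6)]
print("axis scores for a new scan:", [round(s, 2) for s in matvec(A, new_scan_features)])
```

Because the sketch has no nonlinearity, its optimum amounts to reduced-rank regression of the protein panel on the image features. A deep model replaces A with a nonlinear encoder but keeps the same split: proteins are used for training, and only images are needed at inference.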
Our study has several limitations. First, although our approach offers insights into patient stratification and personalized therapeutic options, the observational design warrants caution in interpreting causal relationships. Second, although our sample size is substantial, these findings should be replicated in larger and more diverse cohorts to confirm their robustness across patient populations and demographic groups. Finally, the scope of our proteomic analysis was limited. We measured only the Olink inflammation panel, which captures a specific subset of plasma proteins related to the inflammatory response, whereas COVID-19 affects multiple organ systems.[28,29] Consequently, the current IEAs, derived from the measured proteins, provide only a partial view of the complex interplay of biological processes.
Conclusions
In conclusion, our integrated approach, combining deep learning, proteomics, and clinical data, revealed relationships between IEAs, protein abundances, clinical variables, and patient outcomes in severe COVID-19 pneumonia. These findings contribute to the growing field of precision medicine and suggest avenues for tailored therapeutic interventions. Our study highlights the potential of IEAs as comprehensive biomarkers and emphasizes the importance of personalized approaches to managing complex diseases such as severe COVID-19 pneumonia. Future research should validate these findings and translate them into clinical practice, with the ultimate goal of improving patient outcomes in the fight against this global pandemic.
CRediT authorship contribution statement
Yucai Hong: Investigation. Lin Chen: Writing – review & editing, Writing – original draft, Data curation. Yang Yu: Writing – review & editing, Methodology, Investigation, Data curation. Ziyue Zhao: Writing – review & editing, Validation, Supervision, Software. Ronghua Wu: Writing – review & editing, Visualization, Supervision, Data curation. Rui Gong: Writing – review & editing, Visualization, Supervision, Data curation. Yandong Cheng: Project administration, Formal analysis, Data curation. Lingmin Yuan: Resources, Project administration, Investigation. Shaojun Zheng: Writing – review & editing, Validation, Investigation. Cheng Zheng: Writing – review & editing, Visualization, Supervision, Formal analysis. Ronghai Lin: Writing – review & editing, Supervision, Resources. Jianping Chen: Validation, Supervision, Software, Project administration. Kangwei Sun: Writing – review & editing, Supervision, Software, Formal analysis, Data curation. Ping Xu: Writing – review & editing, Data curation, Conceptualization. Li Ye: Methodology, Investigation, Formal analysis, Data curation. Chaoting Han: Writing – review & editing, Project administration, Formal analysis, Data curation. Xihao Zhou: Resources, Investigation, Data curation. Yaqing Liu: Methodology, Investigation. Jianhua Yu: Project administration. Yaqin Zheng: Formal analysis. Jie Yang: Investigation. Jiajie Huang: Software. Juan Chen: Software. Junjie Fang: Supervision. Chensong Chen: Data curation. Bo Fan: Writing – review & editing. Honglong Fang: Validation. Baning Ye: Software. Xiyun Chen: Methodology, Conceptualization. Xiaoli Qian: Visualization. Junxiang Chen: Writing – review & editing, Software. Haitao Yu: Data curation. Jun Zhang: Supervision. Xi-Ming Pan: Methodology. Yi-Xing Zhan: Investigation. You-Hai Zheng: Visualization. Zhang-Hong Huang: Methodology. Chao Zhong: Formal analysis. Ning Liu: Resources. Hongying Ni: Supervision. Gengsheng Zhang: Funding acquisition. 
Zhongheng Zhang: Writing – original draft, Investigation, Conceptualization.
Acknowledgments
None.
Funding
The study was supported by the National Natural Science Foundation of China (grant numbers 82272180 and 82472243), the China National Key Research and Development Program (grant numbers 2023YFC3603104 and 2022YFC2504500), the Huadong Medicine Joint Funds of the Zhejiang Provincial Natural Science Foundation of China (grant number LHDMD24H150001), the Fundamental Research Funds for the Central Universities (grant number 226–2023–00123), the Project of Drug Clinical Evaluate Research of Chinese Pharmaceutical Association (grant number NO.CPA-Z06-ZC-2021–004), a collaborative scientific project co-established by the Science and Technology Department of the National Administration of Traditional Chinese Medicine and the Zhejiang Provincial Administration of Traditional Chinese Medicine (grant number GZY-ZJ-KJ-24082), and the Project of Zhejiang University Longquan Innovation Center (grant number ZJDXLQCXZCJBGS2024016).
Ethics Statement
The study was approved by the ethics committee of Sir Run Run Shaw Hospital (approval number: 2023–0048), and informed consent was obtained from participants or their family members. The research was conducted in accordance with the Declaration of Helsinki.
Conflict of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Declaration of Generative AI in Scientific Writing
During the preparation of this work, the author(s) used ChatGPT in order to improve the readability of the manuscript. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the publication.
Data Availability
Data are available upon formal approval and reasonable request to the corresponding author. The study was conducted under the CMAISE framework (CMAISE-COVID19). The data reported in this paper have been deposited in OMIX, China National Center for Bioinformation/Beijing Institute of Genomics, Chinese Academy of Sciences (ngdc.cncb.ac.cn/omix; accession no. OMIX006496). A description of the full CMAISE project can be found at https://github.com/zh-zhang1984/CMAISE/wiki.
Managing editor: Jingling Bao/Zhiyu Wang.
Footnotes
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.jointm.2024.11.001.
Contributor Information
Hongying Ni, Email: nihongying2@163.com.
Gengsheng Zhang, Email: genshengzhang@zju.edu.cn.
Zhongheng Zhang, Email: zh_zhang1984@zju.edu.cn.
Appendix. Supplementary materials
References
- 1. Li J., Huang D.Q., Zou B., Yang H., Hui W.Z., Rui F., et al. Epidemiology of COVID-19: a systematic review and meta-analysis of clinical characteristics, risk factors, and outcomes. J Med Virol. 2021;93(3):1449–1458. doi: 10.1002/jmv.26424.
- 2. Dhama K., Khan S., Tiwari R., Sircar S., Bhat S., Malik Y.S., et al. Coronavirus disease 2019-COVID-19. Clin Microbiol Rev. 2020;33(4):e00028-20. doi: 10.1128/CMR.00028-20.
- 3. Verhoef P.A., Spicer A.B., Lopez-Espina C., Bhargava A., Schmalz L., Sims M.D., et al. Analysis of protein biomarkers from hospitalized COVID-19 patients reveals severity-specific signatures and two distinct latent profiles with differential responses to corticosteroids. Crit Care Med. 2023;51(12):1697–1705. doi: 10.1097/CCM.0000000000005983.
- 4. Bocchino M., Rea G., Capitelli L., Lieto R., Bruzzese D. Chest CT lung abnormalities 1 year after COVID-19: a systematic review and meta-analysis. Radiology. 2023;308(1). doi: 10.1148/radiol.230535.
- 5. Ebrahimzadeh S., Islam N., Dawit H., Salameh J.P., Kazi S., Fabiano N., et al. Thoracic imaging tests for the diagnosis of COVID-19. Cochrane Database Syst Rev. 2022;5(5). doi: 10.1002/14651858.CD013639.pub5.
- 6. Talla A., Vasaikar S.V., Szeto G.L., Lemos M.P., Czartoski J.L., MacMillan H., et al. Persistent serum protein signatures define an inflammatory subcategory of long COVID. Nat Commun. 2023;14(1):3417. doi: 10.1038/s41467-023-38682-4.
- 7. Hu Z., van der Ploeg K., Chakraborty S., Arunachalam P.S., Mori D.A.M., Jacobson K.B., et al. Early immune markers of clinical, virological, and immunological outcomes in patients with COVID-19: a multi-omics study. Elife. 2022;11:e77943. doi: 10.7554/eLife.77943.
- 8. Vijayakumar B., Boustani K., Ogger P.P., Papadaki A., Tonkin J., Orton C.M., et al. Immuno-proteomic profiling reveals aberrant immune cell regulation in the airways of individuals with ongoing post-COVID-19 respiratory disease. Immunity. 2022;55(3):542–556.e5. doi: 10.1016/j.immuni.2022.01.017.
- 9. Jin Y.H., Cai L., Cheng Z.S., Cheng H., Deng T., Fan Y.P., et al. A rapid advice guideline for the diagnosis and treatment of 2019 novel coronavirus (2019-nCoV) infected pneumonia (standard version). Mil Med Res. 2020;7(1):4. doi: 10.1186/s40779-020-0233-6.
- 10. Sun L., Yu K., Batmanghelich K. Context matters: graph-based self-supervised representation learning for medical images. Proc AAAI Conf Artif Intell. 2021;35(6):4874–4882.
- 11. Wik L., Nordberg N., Broberg J., Björkesten J., Assarsson E., Henriksson S., et al. Proximity extension assay in combination with next-generation sequencing for high-throughput proteome-wide analysis. Mol Cell Proteomics. 2021;20. doi: 10.1016/j.mcpro.2021.100168.
- 12. Hinton G.E. Training products of experts by minimizing contrastive divergence. Neural Comput. 2002;14(8):1771–1800. doi: 10.1162/089976602760128018.
- 13. Gretton A., Bousquet O., Smola A., Schölkopf B. Measuring statistical dependence with Hilbert-Schmidt norms. In: International Conference on Algorithmic Learning Theory. 2005:63–77.
- 14. Kingma D.P., Ba J. Adam: a method for stochastic optimization. arXiv. 2017. doi: 10.48550/arXiv.1412.6980.
- 15. Chen J., Xu Z., Sun L., et al. Deep learning integration of chest computed tomography imaging and gene expression identifies novel aspects of COPD. Chronic Obstr Pulm Dis. 2023;10(4):355–368. doi: 10.15326/jcopdf.2023.0399.
- 16. Chen S.-Y., Feng Z., Yi X. A general introduction to adjustment for multiple comparisons. J Thorac Dis. 2017;9:1725–1729. doi: 10.21037/jtd.2017.05.34.
- 17. Wu T., Hu E., Xu S., Chen M., Guo P., Dai Z., et al. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innovation (Camb). 2021;2. doi: 10.1016/j.xinn.2021.100141.
- 18. Liberzon A., Birger C., Thorvaldsdóttir H., Ghandi M., Mesirov J.P., Tamayo P. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1:417–425. doi: 10.1016/j.cels.2015.12.004.
- 19. Zhang Z., Zhang G., Goyal H., Mo L., Hong Y. Identification of subclasses of sepsis that showed different clinical outcomes and responses to amount of fluid resuscitation: a latent profile analysis. Crit Care. 2018;22:347. doi: 10.1186/s13054-018-2279-3.
- 20. Zhang Z., Pan Q., Ge H., Xing L., Hong Y., Chen P. Deep learning-based clustering robustly identified two classes of sepsis with both prognostic and predictive values. EBioMedicine. 2020;62. doi: 10.1016/j.ebiom.2020.103081.
- 21. Tenda E.D., Henrina J., Samosir J., Amalia R., Yulianti M., Pitoyo C.W., et al. Machine learning-based COVID-19 acute respiratory distress syndrome phenotyping and clinical outcomes: a systematic review. Heliyon. 2023;9(6):e17276. doi: 10.1016/j.heliyon.2023.e17276.
- 22. Kinney G.L., Santorico S.A., Young K.A., Cho M.H., Castaldi P.J., San José Estépar R., et al. Identification of chronic obstructive pulmonary disease axes that predict all-cause mortality: the COPDGene study. Am J Epidemiol. 2018;187(10):2109–2116. doi: 10.1093/aje/kwy087.
- 23. Castaldi P.J., Boueiz A., Yun J., Estepar R.S.J., Ross J.C., Washko G., et al. Machine learning characterization of COPD subtypes: insights from the COPDGene study. Chest. 2020;157(5):1147–1157. doi: 10.1016/j.chest.2019.11.039.
- 24. Castaldi P.J., Benet M., Petersen H., Rafaels N., Finigan J., Paoletti M., et al. Do “COPD subtypes” really exist? COPD heterogeneity and clustering in 10 independent cohorts. Thorax. 2017;72:998–1006. doi: 10.1136/thoraxjnl-2016-209846.
- 25. Roy K., Smith J., Kolsum U., Borrill Z., Vestbo J., Singh D. COPD phenotype description using principal components analysis. Respir Res. 2009;10(1):41. doi: 10.1186/1465-9921-10-41.
- 26. Chen J., Cho M., Silverman E.K., Hokanson J.E., Kinney G.L., Crapo J.D., et al. Turning subtypes into disease axes to improve prediction of COPD progression. Thorax. 2019;74(9):906–909. doi: 10.1136/thoraxjnl-2018-213005.
- 27. Chowdhary C.L., Acharjya D.P. Segmentation and feature extraction in medical imaging: a systematic review. Procedia Comput Sci. 2020;167:26–36. doi: 10.1016/j.procs.2020.03.179.
- 28. Osuchowski M.F., Winkler M.S., Skirecki T., Cajander S., Shankar-Hari M., Lachmann G., et al. The COVID-19 puzzle: deciphering pathophysiology and phenotypes of a new disease entity. Lancet Respir Med. 2021;9(6):622–642. doi: 10.1016/S2213-2600(21)00218-6.
- 29. Lotfi M., Rezaei N. SARS-CoV-2: a comprehensive review from pathogenicity of the virus to clinical consequences. J Med Virol. 2020;92(10):1864–1874. doi: 10.1002/jmv.26123.