eBioMedicine. 2025 Jun 12;117:105800. doi: 10.1016/j.ebiom.2025.105800

Proteomic biomarkers of emphysema-predominant and non-emphysema-predominant chronic obstructive pulmonary disease

Yu-Hang Zhang a, Peter J Castaldi a, Russell P Bowler b,e, Katherine A Pratte b, Gregory L Kinney c, Kendra A Young c, Heena Rijhwani a, Sharon M Lutz f, Craig P Hersh a,d, Michael H Cho a,d, Jarrett D Morrow a, Edwin K Silverman a,d
PMCID: PMC12192512  PMID: 40505416

Summary

Background

Chronic Obstructive Pulmonary Disease (COPD) is a complex and heterogeneous disease. Emphysema-predominant and non-emphysema predominant COPD are two major disease subtypes capturing important aspects of COPD heterogeneity. Molecular differences between these COPD subtypes are unknown.

Methods

We assessed plasma proteomic associations (using SomaScan) with emphysema-predominant vs. non-emphysema predominant COPD subtypes in COPDGene; replication of significant associations was performed in SPIROMICS. We performed pathway analyses on COPD subtype plasma proteomic associations and used weighted gene correlation network analysis to find COPD subtype-associated protein correlation networks. We tested previously reported COPD genetic variants for association with COPD subtypes and COPD subtype-associated proteomic biomarkers.

Findings

One hundred and twenty-four proteins were significantly associated with COPD subtypes in COPDGene, with 64 proteins (65 SOMAmers) validated in SPIROMICS. Higher correlations were observed between proteomic biomarkers with greater expression levels in non-emphysema predominant participants with COPD. Cell adhesion, collagen-containing extracellular matrix, and epithelial mesenchymal transition were biological pathways enriched for COPD subtype proteomic associations. One COPD subtype-associated correlation network module was identified, including highly connected proteomic biomarkers like PXDN and EFNA2. We observed significant genetic effects on COPD subtypes for rs2579762 in LRMDA and on COPD subtype-associated proteomic biomarkers including sRAGE and Ganglioside GM2 Activator.

Interpretation

We identified and replicated multiple plasma proteomic biomarkers associated with emphysema-predominant vs. non-emphysema predominant COPD. Pathway analyses, correlation-based network analyses, and genetic association analyses of these proteins may provide insight into the molecular heterogeneity of COPD.

Funding

National Heart, Lung, and Blood Institute (NIH).

Keywords: COPD, Emphysema, Proteomics, Correlation network analysis, Quantitative trait locus analysis


Research in Context.

Evidence before this study

Chronic obstructive pulmonary disease (COPD) is a complex and heterogeneous condition characterised by persistent airflow limitation. COPD subtyping could lead to improvements in disease diagnosis, progression prognostication, and treatment strategy optimisation. Multiple previous approaches to examine COPD heterogeneity have converged upon a subtyping distinction based on quantitative emphysema assessed by chest CT scan. Subjects with significant airflow obstruction (FEV1 < 80% predicted with FEV1/FVC < 0.7) and greater than or equal to 10% computed tomography (CT) quantitative densitometric emphysema (percentage of low attenuation areas less than −950 Hounsfield units (LAA-950)) have been classified as emphysema predominant COPD (EPD), while non-emphysema-predominant disease (NEPD) has been defined by significant airflow obstruction with <5% LAA-950. This EPD/NEPD COPD subtyping classification is associated with multiple previously applied COPD subtype definitions and reflects COPD progression patterns.

Previous proteomic studies have identified plasma biomarkers associated with COPD affection status, lung function decline, and emphysema.

Added value of this study

We used a widely reported aptamer-based proteomics assay (SomaScan) to identify molecular differences between emphysema-predominant and non-emphysema predominant COPD in plasma. We identified 124 plasma protein biomarkers associated with EPD vs. NEPD in the COPDGene study, and we were able to replicate most of these associations in the SPIROMICS study. These proteomic biomarkers are enriched in biological functions including cell adhesion, collagen-containing extracellular matrix, and epithelial mesenchymal transition. Correlation-based network analyses identified highly connected proteomic biomarkers within a COPD subtype-associated correlation module.

Implications of all the available evidence

Multiple plasma protein biomarkers distinguish EPD and NEPD subtypes of COPD, suggesting that these COPD subtypes have different pathogenetic mechanisms. This information could eventually lead to more accurate disease diagnosis, disease progression prognostication, and treatment optimisation.

Introduction

Chronic Obstructive Pulmonary Disease (COPD) is a complex and heterogeneous disease defined by persistent airflow obstruction.1,2 The identification of subtypes within the COPD syndrome is important for more accurate disease diagnosis, disease progression prognostication, and treatment optimisation. However, current diagnostic approaches do not account for heterogeneity in COPD.3

Multiple approaches have been used to examine COPD heterogeneity using clinical, physiological, and imaging phenotypes. Castaldi et al. identified four clusters associated with clinical characteristics, including exacerbations, and COPD-associated genetic variants using K-means cluster analysis.4 Kinney et al. recognised five factors/disease axes to represent COPD pathophysiologic heterogeneity by factor analysis integrating pulmonary function testing and computed tomography imaging results.5 Ross et al. identified four lung function trajectories with significant genetic associations using a data-driven approach.6 Protein biomarkers have also been used in cluster analysis to identify COPD subtypes.7 However, there remains little consensus about COPD subtyping, and transferability of unsupervised clustering solutions across different study populations is limited.8

Emphysema is an important clinical parameter capturing COPD heterogeneity. Patients with more emphysema have, on average, lower lung function, longer duration of disease, higher mortality,9,10 and lower exercise capacity.11 Castaldi et al. assessed the heterogeneity of COPD through emphysema severity12; emphysema-predominant and non-emphysema predominant COPD were classified using spirometric and quantitative imaging criteria. In participants with COPD, emphysema-predominant disease (EPD) was defined as significant airflow obstruction (FEV1 < 80% predicted with FEV1/FVC < 0.7) with ≥10% computed tomography (CT) quantitative densitometric emphysema (percentage of low attenuation areas less than −950 Hounsfield units (LAA-950)), while non-emphysema-predominant disease (NEPD) was defined by significant airflow obstruction with <5% LAA-950. This EPD/NEPD COPD subtyping classification was strongly associated with several previously applied COPD subtype definitions and reflected different COPD progression patterns.12 However, the molecular differences between emphysema-predominant and non-emphysema predominant COPD have not been assessed.

Proteomic analysis can reveal the interactions, functions, and structures of proteins13 and provide robust molecular profiles for a systematic understanding of disease pathogenesis.14 Plasma proteomics reflects the molecular status in the peripheral circulation for systemic disease assessments, monitoring, and screening.15 Several circulating proteomic biomarkers have been previously associated with different aspects of COPD: 1) COPD affection status biomarkers including C-reactive protein16,17; 2) FEV1 biomarkers including interleukin 618; 3) FEV1/FVC ratio biomarkers including lipocalin-119; and 4) emphysema biomarkers including sRAGE.20,21

Several large-scale commercial proteomic panels have been developed; we used SomaScan, an aptamer-based proteomics assay capturing thousands of human proteins in various biological materials with high sensitivity and specificity.22 We hypothesised that plasma proteomics analysis would reveal molecular differences between emphysema-predominant and non-emphysema-predominant COPD.

Methods

Study population

The COPDGene Study is a multi-centre observational study designed to investigate the pathobiological mechanisms, heterogeneity, and progression of Chronic Obstructive Pulmonary Disease (COPD).23 The heterogeneity of COPD has been observed at different levels including physiology, imaging, clinical course, and response to therapeutic intervention.24 The COPDGene Study has enrolled more than 10,000 smokers with or without COPD across different COPD GOLD spirometry grades. Clinical phenotypes including chest computed tomography (CT) phenotypes have been characterised to evaluate emphysema, gas trapping, and airway wall thickening.23 The first participants in COPDGene were enrolled in 2007 (Phase 1), and baseline enrolment ended in 2011. Non-Hispanic White and Black subjects with at least 10 pack-years of smoking and self-reported age of 45–80 at enrolment were eligible to participate. A set of non-smoking controls was also enrolled, but they were not included in this project. Five-year follow-up data collection was obtained (Phase 2 cohort, 2013–2017) which included fresh frozen plasma collection. Subjects who participated in Phase 2 (2013–2017) of the study were selected for plasma proteomic analysis. Participants with emphysema-predominant COPD, non-emphysema predominant COPD, or controls from the COPDGene Phase 2 cohort were selected as follows:

  • I. Control population: FEV1 ≥ 80% predicted, FEV1/FVC ≥ 0.7 based on post-bronchodilator spirometry (pre-bronchodilator spirometry was used if post-bronchodilator spirometry was not available), pack-years ≥ 10 and LAA-950 < 5%

  • II. Emphysema-predominant COPD: FEV1 < 80% predicted, FEV1/FVC < 0.7 based on post-bronchodilator spirometry (pre-bronchodilator spirometry was used if post-bronchodilator spirometry was not available), pack-years ≥ 10 and LAA-950 ≥ 10%

  • III. Non-emphysema predominant COPD: FEV1 < 80% predicted, FEV1/FVC < 0.7 based on post-bronchodilator spirometry (pre-bronchodilator spirometry was used if post-bronchodilator spirometry was not available), pack-years ≥ 10 and LAA-950 < 5%
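The three group definitions above can be sketched as a small classification function. This is only an illustration: the function name, argument names, and the `min_pack_years` parameter are hypothetical, the spirometry values are assumed to be post-bronchodilator (the pre-bronchodilator fallback is not modelled), and participants with 5% ≤ LAA-950 < 10% are returned as unclassified because none of the three definitions covers them.

```python
from typing import Optional


def classify_copd_subtype(
    fev1_pct_pred: float,
    fev1_fvc: float,
    pack_years: float,
    laa950_pct: float,
    min_pack_years: float = 10.0,  # hypothetical parameter; 10 follows the COPDGene criteria
) -> Optional[str]:
    """Return 'control', 'EPD', 'NEPD', or None if no definition applies."""
    if pack_years < min_pack_years:
        return None  # every group requires the minimum smoking exposure

    obstructed = fev1_pct_pred < 80 and fev1_fvc < 0.7
    if obstructed:
        if laa950_pct >= 10:
            return "EPD"   # emphysema-predominant COPD
        if laa950_pct < 5:
            return "NEPD"  # non-emphysema predominant COPD
        return None        # intermediate emphysema (5% to <10%) is not assigned

    if fev1_pct_pred >= 80 and fev1_fvc >= 0.7 and laa950_pct < 5:
        return "control"
    return None


# Illustrative values only, not taken from any participant record.
examples = [
    classify_copd_subtype(43.0, 0.43, 48.0, 25.0),
    classify_copd_subtype(61.0, 0.60, 44.0, 2.0),
    classify_copd_subtype(97.0, 0.78, 35.0, 1.0),
    classify_copd_subtype(60.0, 0.60, 40.0, 7.0),
]
```

Passing `min_pack_years=20` would mirror the stricter smoking threshold described for SPIROMICS in the next paragraph.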

SPIROMICS (SubPopulations and InteRmediate Outcome Measures In COPD Study) supports the prospective collection and analysis of phenotypic, biomarker, genetic, genomic, and clinical data from subjects with COPD to identify subpopulations and intermediate outcome measures. We included subjects with a minimum of 20 pack-years of smoking from SPIROMICS using the same definitions of emphysema-predominant COPD, non-emphysema predominant COPD, and control subjects as in COPDGene to assess replication of proteomic associations.

The proteomics analysis workflow is shown in Fig. 1.

Fig. 1. CONSORT diagram for COPD subtype proteomics analysis.

Ethics

All study participants provided written informed consent, and studies were approved by local Institutional Review Boards. The current study was approved by the Mass General Brigham institutional review board (IRB #2007P000554).

Omics data preprocessing

Plasma samples were sent to SomaLogic for protein quantification using SomaLogic SomaScan® version 4.0 (5.0K) assay for human plasma in the COPDGene cohort and version 4.1 (7.0K) assay in SPIROMICS. For the COPDGene cohort, initially there were 5285 SOMAmers, 4979 of which assayed human proteins representing 4860 proteins or protein subunits (4727 UniProt IDs). SomaLogic standardised the SomaScan® data per their protocol using the approach in Serban et al.25 To control variability between sample batches and individual assays, median signal normalisation was performed using adaptive normalisation by maximum likelihood; plate scaling and calibration of SOMAmer analytes were also applied. Before data analysis, logarithmic transformation with base 2 was performed for normalisation. For the SPIROMICS cohort, there were 7596 SOMAmers representing 6626 human proteins or protein subunits (6401 UniProt IDs). Among all 4979 SOMAmers from SomaLogic SomaScan® version 4.0 (5.0K) assays detecting human proteins, there were 46 SOMAmers detecting protein complexes that consist of multiple proteins. As for 7596 SOMAmers from version 4.1 (7.0K) assays, there were 82 SOMAmers detecting multiple proteins in protein complexes. In 125 selected COPD subtype-associated SOMAmers, one SOMAmer (3359_11, targeting Cyclin-dependent kinase 8: Cyclin-C complex) detects a protein complex.

For downstream analyses, proteomic biomarker identification was performed at the SOMAmer level and annotated with protein name and UniProt ID. Selected proteomic biomarker numbers were summarised at both SOMAmer and UniProt ID levels. Proteomic associations for downstream pathway analyses including gene set enrichment analyses and gene ontology enrichment analyses were summarised at the unique HGNC (HUGO Gene Nomenclature Committee) symbol level to avoid redundancy.

Genotyping data in the COPDGene cohort were obtained using the Human Omni Express chip (Illumina, San Diego, CA). Genome-wide genotype imputation in the COPDGene cohort was performed with the Haplotype Reference Consortium (HRC)26 panel for non-Hispanic White participants and with 1000 Genomes Phase 1 v3 for Black participants. Eighty-two genome-wide significant COPD GWAS loci were previously reported in a genome-wide association analysis with 35,735 COPD cases and 222,076 controls from UK Biobank and the International COPD Genetics Consortium.27 We also included rs28929474 (Z allele)28 in SERPINA1, which was genotyped by Illumina HumanExome arrays when available or imputed from GWAS data if not available. We included genotyped or imputed data for 82 COPD GWAS loci and rs28929474 from control, emphysema-predominant COPD, and non-emphysema predominant COPD participants from the Phase 2 COPDGene cohort for further analysis.

Proteomic biomarker identification

We utilised linear regression models to identify proteomic biomarkers comparing emphysema-predominant vs. non-emphysema predominant COPD. Age, sex, race, pack-years, BMI, smoking status (current or former smokers), clinical centre, and cellular components (white blood cell counts and platelet counts) were introduced as covariates for linear regression models, as shown below:

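$$\text{Proteomic SOMAmer} \sim \text{Emphysema/Non-emphysema predominant COPD} + \text{age} + \text{sex} + \text{race} + \text{pack-years} + \text{BMI} + \text{current smoking status} + \text{clinical centre} + \text{cellular components (platelets, white blood cells)}$$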

To evaluate whether our COPDGene cohort is large enough to detect significant proteomic associations, we performed power calculations for the linear regression model using R package pwr v. 1.3-0. Minimal sample size was estimated with the linear regression R² fixed at 0.3 and statistical power fixed at 0.9 to evaluate the capacity to detect weak associations (variables with low explanatory levels). SPIROMICS proteomics data, generated with the SomaScan 4.1 (7.0K) assay, were used for replication. Proteomic biomarkers for each comparison were selected with false discovery rate (FDR) controlled at 0.05 in the COPDGene and SPIROMICS cohorts separately.
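The FDR control step can be illustrated with a minimal sketch of the Benjamini-Hochberg adjustment. The text does not name the FDR procedure, so the choice of Benjamini-Hochberg is an assumption here, and the function name and the toy P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy P-values (illustrative only), one per hypothetical SOMAmer.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.6]
toy_q = benjamini_hochberg(toy_p)
significant = [q < 0.05 for q in toy_q]
```

In this toy example only the first two tests pass the 0.05 threshold after adjustment, although four of the raw P-values are below 0.05.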

As a sensitivity analysis, we compared linear regression models with and without adjustment for FEV1% predicted and tested the concordance in effect direction between them.

Pair-wise protein-protein Pearson's correlations were compared between two sets of emphysema-predominant vs. non-emphysema predominant COPD proteomic biomarkers: those with higher expression levels in emphysema-predominant COPD and those with lower expression levels in emphysema-predominant COPD.

Considering the epidemiological associations between COPD and cardiovascular diseases, we searched for overlap between COPD subtype biomarkers and previously reported cardiac biomarkers in two datasets: Gene2Phenotype Cardiac Gene Panel29 and Cardiovascular Disease Atlas.30

Pathway analysis

Gene set enrichment and gene ontology enrichment analyses were conducted to explore pathway-level differences in the pathogenesis of COPD subtypes. We performed gene set enrichment analysis using R package GAGE (v.2.48.0)31 with HALLMARK,32 KEGG,33 and REACTOME34 gene sets from the Molecular Signatures Database (MSigDB).35,36 Proteins were initially ranked by the P-values from the linear regression models for emphysema-predominant vs. non-emphysema predominant COPD biomarker identification. We then combined the negative log10-transformed P-values with the sign of the beta coefficients to capture the direction of effect: positive values indicate proteins with higher expression levels in emphysema-predominant COPD, and negative values indicate proteins with higher expression levels in non-emphysema predominant COPD. Larger absolute values represent more strongly differentially detected proteins.

We also performed gene ontology enrichment analysis using R package topGO (v.2.50)37 with significant emphysema-predominant vs. non-emphysema predominant COPD proteomic biomarkers (FDR < 0.05) as genes of interest and all proteins detected by the SomaScan assay as the gene background.
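The signed ranking statistic described in this paragraph can be sketched as follows. The function name is hypothetical, the P-values are taken from Table 2, and the beta values are placeholders chosen only to carry the negative sign implied by the reported confidence intervals.

```python
import math


def signed_log_p(p_value: float, beta: float) -> float:
    """Return -log10(P) carrying the sign of the regression coefficient."""
    # copysign keeps the magnitude of the first argument and the sign of the second.
    return math.copysign(-math.log10(p_value), beta)


# (gene symbol, COPDGene P-value from Table 2, placeholder beta with the reported sign)
toy_results = [
    ("AGER", 3.2e-15, -0.29),
    ("NPPB", 7.0e-09, -0.28),
    ("PIGR", 1.2e-08, -0.16),
]

# Most negative scores come first: proteins most strongly higher in NEPD.
ranked = sorted(
    ((gene, signed_log_p(p, beta)) for gene, p, beta in toy_results),
    key=lambda item: item[1],
)
```

A ranked list of this form, one score per unique gene symbol, is the kind of input that ranking-based gene set enrichment tools expect.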

Weighted gene correlation network analysis

We calculated proteomic residuals to remove effects of covariates including age, sex, race, pack-years, BMI, clinical centre, current smoking status, and cellular components (platelets and white blood cells). Weighted gene correlation network analysis (WGCNA) was performed on all available proteins (4979 SOMAmers, 4727 proteins at UniProt ID level) in COPDGene COPD subjects (emphysema-predominant and non-emphysema predominant COPD) using the R package WGCNA (v1.70-3)38 with proteomic residuals after covariate adjustment. We set minimal module size at ten and optimised soft threshold (power) according to scale-free topology model fit and mean connectivity to evaluate connectivity between network nodes.

We estimated eigengene expression level for each identified module. The module eigengene is the first principal component of proteins from that module. Associations between module eigengenes and emphysema-predominant vs. non-emphysema predominant COPD were evaluated using linear regression models:

$$\text{Module eigengene} \sim \text{Emphysema-predominant vs. Non-emphysema predominant COPD}$$

Module membership is defined as the Pearson's correlation between module eigengene expression level and each single gene in the module, reflecting the module representativity and connectivity level of its gene components. Gene significance is defined as the association between each gene from the module and a specific trait. We evaluated the module representativity of components in the selected module by calculating Pearson's correlation coefficients between component expression levels and eigengene expression level.
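Module membership as defined above reduces to one Pearson correlation per protein. The sketch below assumes the module eigengene has already been computed (for example, as the first principal component of the module) and is passed in as a plain list; the function names, protein names, and data are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def module_membership(protein_levels, eigengene):
    """Correlate each protein's residual profile with the module eigengene."""
    return {name: pearson(values, eigengene) for name, values in protein_levels.items()}


# Toy eigengene and residual protein levels for five participants (illustrative only).
toy_eigengene = [-1.0, -0.5, 0.0, 0.5, 1.0]
toy_levels = {
    "protein_A": [0.1, 0.3, 0.5, 0.7, 0.9],     # tracks the eigengene exactly
    "protein_B": [2.0, 1.0, 0.0, -1.0, -2.0],   # mirrors the eigengene
    "protein_C": [0.2, -0.1, 0.3, 0.0, 0.1],    # weakly related
}
toy_membership = module_membership(toy_levels, toy_eigengene)
```

Proteins with membership values close to 1 or −1 are the highly connected members of the module; values near 0 indicate peripheral members.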

Gene ontology enrichment analysis was performed on genes from COPD subtype-associated modules using R package topGO (v.2.50).37

Phenotypic and proteomic genetic associations

In COPDGene, we summarised candidate SNPs including 82 COPD GWAS loci from Sakornsakolpat et al.27 and rs28929474 (Z allele) in SERPINA1. To evaluate whether the COPDGene cohort is large enough to detect significant quantitative trait loci associations, we performed power calculations using R package pwr.

We tested COPD-associated SNPs with pairwise comparisons among 1) control population, 2) emphysema-predominant COPD, and 3) non-emphysema predominant COPD using additively coded genotypes. Logistic regression models were established comparing COPD subtypes and control participants with the following covariates: age, sex, pack-years, BMI, current smoking status, clinical centre, and top ten principal components of genetic ancestry, as shown below. Models were stratified by race (non-Hispanic White and Black participants), and results were combined by fixed effect meta-analyses using METAL.39

$$\text{Disease status} \sim \text{COPD GWAS SNPs} + \text{age} + \text{sex} + \text{pack-years} + \text{BMI} + \text{current smoking status} + \text{clinical centre} + \text{population stratification PCs}$$
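Combining the race-stratified estimates by fixed-effect meta-analysis can be sketched with inverse-variance weighting. METAL offers more than one weighting scheme and the text does not say which was used, so inverse-variance weighting is an assumption here; the function name and the per-stratum estimates are hypothetical.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effect estimate, its SE, and a two-sided P-value."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided P-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p_value


# Hypothetical log-odds estimates for one SNP in two ancestry strata.
pooled_beta, pooled_se, pooled_p = fixed_effect_meta(
    betas=[-0.30, -0.20],
    standard_errors=[0.05, 0.05],
)
```

With equal standard errors the pooled estimate is the simple average of the two strata, and the pooled standard error shrinks by a factor of the square root of two.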

We performed quantitative trait locus analysis using MatrixEQTL40 to reveal the associations between COPD genetic loci and proteomic biomarkers with age, sex, pack-years, BMI, current smoking status, clinical centres, top ten principal components of genetic ancestry, and top two proteomic principal components as covariates, as shown below:

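$$\text{COPD subtype biomarkers} \sim \text{COPD GWAS SNPs} + \text{age} + \text{sex} + \text{pack-years} + \text{BMI} + \text{current smoking status} + \text{clinical centre} + \text{cellular components (platelets, white blood cells)} + \text{population stratification PCs} + \text{protein expression PCs}$$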

Genomic locations for candidate proteins and COPD GWAS loci were summarised based on GRCh37 human genome assemblies. We also combined QTL associations across ancestry groups using METAL.39

Machine learning model for COPD subtype classification

To predict emphysema-predominant and non-emphysema predominant COPD subtypes, we created a random forest model. We optimised the model in the COPDGene proteomics cohort using 10-fold cross-validation, tuning the number of trees and mtry to maximise prediction accuracy. A feature selection approach was performed for each training process using the mutual information between each protein and COPD subtypes. The final model was established in the entire COPDGene proteomics cohort with the optimised number of trees and mtry. The final optimised random forest model was tested in SPIROMICS, and the feature importance from the final optimised model was also evaluated.

Statistics

Clinical characteristics

Quantitative variables were compared among control, emphysema-predominant COPD, and non-emphysema predominant COPD participants using independent Student's t-tests, while categorical variables were analysed using Chi-square tests. Chi-square tests were used since the expected number of observations was at least 5 for each cell of the contingency table analysis.
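As a worked example of the categorical comparison, the sketch below runs a Pearson chi-square test on the chronic bronchitis counts reported in Table 1 (173/722 in NEPD and 185/726 in EPD). The function name is hypothetical, and the absence of a continuity correction is an assumption, since the text does not say whether one was applied.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (1 df, no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    expected = [[r * col / n for col in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value, expected


# Rows: NEPD, EPD. Columns: chronic bronchitis yes, no (counts from Table 1).
bronchitis_table = ((173, 722 - 173), (185, 726 - 185))
chi2, p_value, expected = chi_square_2x2(bronchitis_table)
smallest_expected = min(min(row) for row in expected)
```

The resulting P-value is close to 0.5, in line with the value reported for chronic bronchitis in Table 1, and every expected cell count is far above 5.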

Sample size determination

With linear regression R2 controlled at 0.3 (variables with low explanatory levels), we utilised R package pwr to estimate minimal sample size required for COPD subtype proteomic biomarker identification and quantitative trait loci recognition.
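For reference, power calculations for linear models in the pwr package are usually expressed through Cohen's effect size f² rather than R² directly. Assuming the regression routine `pwr.f2.test` was the function used (the text names only the package), the stated R² of 0.3 would correspond to:

```latex
f^2 = \frac{R^2}{1 - R^2} = \frac{0.3}{1 - 0.3} \approx 0.43
```

The required sample size then follows from the denominator degrees of freedom that the routine returns for the chosen power and significance level, plus the number of model parameters.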

Proteomic biomarker identification

For linear regression models, we selected covariates with known effects on SomaScan assay results (clinical centre and cellular components) together with covariates that have been used in other COPD clinical epidemiological studies (age, sex, pack-years, current smoking, and BMI). We used one-way ANOVA followed by the Student-Newman-Keuls (SNK) post-hoc test to identify differentially expressed proteins among EPD, NEPD, and control groups. This approach offers greater statistical power to detect group-specific differences compared to more conservative multiple comparison methods.

Sensitivity analyses

Sensitivity analysis was performed to assess the stability of the results by adjusting for FEV1% predicted. This helps determine whether the primary findings remain consistent after adjusting for severity of airflow obstruction.

Pathway analyses

Significant gene ontology enrichment results were selected with topGO weight01 statistics P-value < 0.001, as a stringent criterion for statistical significance. Gene set enrichment analyses results were selected with FDR controlled at 0.05.

Weighted gene correlation network analyses

The selection of the soft-thresholding power (β) is a critical step for constructing a biologically meaningful network. The soft-thresholding power controls the extent to which strong gene-gene correlations are emphasised while weak correlations are suppressed. We set minimal module size at ten and optimised soft threshold (power) according to scale-free topology model fit and mean connectivity to evaluate connectivity between network nodes.
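The effect of the soft-thresholding power can be seen in a minimal sketch of an unsigned adjacency, where each pairwise correlation is raised in absolute value to the power β. Whether the network was unsigned or signed is not stated, so the unsigned form is an assumption; the function names, the toy correlation matrix, and the choice of β = 6 are illustrative only.

```python
def soft_threshold_adjacency(correlation, power):
    """Unsigned adjacency matrix: a_ij = |cor_ij| ** power."""
    return [[abs(r) ** power for r in row] for row in correlation]


def connectivity(adjacency):
    """Sum of each node's adjacencies to all other nodes (self excluded)."""
    return [
        sum(a for j, a in enumerate(row) if j != i)
        for i, row in enumerate(adjacency)
    ]


# Toy correlation matrix for three proteins (illustrative only).
toy_correlation = [
    [1.0, 0.9, 0.3],
    [0.9, 1.0, -0.2],
    [0.3, -0.2, 1.0],
]
toy_adjacency = soft_threshold_adjacency(toy_correlation, power=6)
toy_connectivity = connectivity(toy_adjacency)
```

With β = 6, a correlation of 0.9 keeps an adjacency of about 0.53 while a correlation of 0.3 drops below 0.001, which is how strong correlations are emphasised and weak ones suppressed.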

Phenotypic and proteomic genetic associations

Population stratification covariates were added for linear and logistic regression models. Significant phenotypic genetic associations were selected with P-value < 0.05 and significant quantitative trait loci associations were selected with FDR controlled at 0.05.

Machine learning modelling

We tuned ntree (number of decision trees) from 400 to 1000 and mtry (number of variables sampled as candidates at each split) from 3 to 12 using 10-fold cross-validation, selecting the values that optimised prediction accuracy. A feature selection approach was performed for each training process using the mutual information between each protein and COPD subtypes. The number of selected features was set to the square root of the training set sample size, rounded to the nearest integer.
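The tuning setup can be sketched without fitting any model: build the parameter grid, split the cohort into ten folds, and derive the number of features to keep in each training fold. Several details are assumptions rather than reported facts: the ntree step of 100, the use of contiguous unshuffled folds, and the sample size of 1448 (the 722 NEPD plus 726 EPD participants in Table 1). All names are hypothetical.

```python
import math
from itertools import product


def kfold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous folds without shuffling."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def n_selected_features(n_train):
    """Number of features to keep: square root of the training size, rounded."""
    return round(math.sqrt(n_train))


NTREE_GRID = range(400, 1001, 100)  # step of 100 is an assumption
MTRY_GRID = range(3, 13)            # 3 to 12 inclusive
param_grid = list(product(NTREE_GRID, MTRY_GRID))

N_SAMPLES = 722 + 726  # NEPD + EPD participants in Table 1 (assumed model cohort)
features_per_fold = [
    n_selected_features(len(train)) for train, _ in kfold_indices(N_SAMPLES, k=10)
]
```

Under these assumptions each training fold holds about 1300 participants, so roughly 36 proteins would be retained by the mutual information filter before the random forest is fitted.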

Role of funders

The study sponsors had no role in the study design; in the collection, analysis, and interpretation of the data; in the writing of the report; and in the decision to submit the paper for publication.

Results

Clinical characteristics

Participants were classified into control, EPD, and NEPD groups based on spirometry and quantitative CT emphysema. Clinical and demographic characteristics are presented in Table 1 (COPDGene) and Supplemental Table S1 (SPIROMICS). Participants with EPD had a longer smoking history and worse lung function than participants with NEPD. The CONSORT diagram for COPD subtype proteomics analysis is shown in Fig. 1.

Table 1. Clinical characteristics of control participants, participants with non-emphysema predominant COPD, and participants with emphysema-predominant COPD in the COPDGene cohort.

| Characteristic | Control [b,c] | Non-emphysema predominant COPD [b,c] | Emphysema-predominant COPD [b,c] | P for emphysema/non-emphysema predominant COPD [b] |
|---|---|---|---|---|
| Number of participants | 2091 | 722 | 726 | |
| Age (Years) | 63.11 (8.28) | 65.84 (8.34) | 69.15 (7.71) | <0.0001 |
| Sex (Male/All subjects) | 0.46 (952/2091) | 0.49 (352/722) | 0.59 (426/726) | 0.0002 |
| Sex (Female/All subjects) | 0.54 (1139/2091) | 0.51 (370/722) | 0.41 (300/726) | 0.0002 |
| Race (White/All subjects) [e] | 0.65 (1364/2091) | 0.73 (524/722) | 0.79 (570/726) | 0.01 |
| Pack-years [d] | 34.80 (23.80) | 44.00 (26.20) | 48.00 (31.65) | <0.0001 |
| BMI (kg/m²) [a] | 29.36 (6.00) | 30.38 (6.68) | 25.94 (5.17) | <0.0001 |
| Smoking status (Former smokers/All subjects) | 0.59 (1235/2091) | 0.51 (366/722) | 0.79 (574/726) | <0.0001 |
| FEV1/FVC Ratio [a] | 0.78 (0.05) | 0.60 (0.08) | 0.43 (0.10) | <0.0001 |
| FEV1% predicted | 97.54 (11.58) | 61.12 (12.84) | 43.29 (16.54) | <0.0001 |
| PRM fSAD (%) [a,d] | 6.93 (6.58) | 16.26 (12.91) | 34.56 (14.84) | <0.0001 |
| Chronic bronchitis [a] (Fraction of affirmative responses) | 0.09 (185/2091) | 0.24 (173/722) | 0.26 (185/726) | 0.5 |
| Six-minute walk distance (Metres) | 433.97 (120.30) | 365.03 (122.35) | 320.66 (135.04) | <0.0001 |
| Pi10 [a] | 1.99 (0.41) | 2.74 (0.627) | 2.66 (0.49) | 0.004 |
| DLCO (% predicted) [a] | 88.70 (17.55) | 74.63 (18.46) | 48.63 (17.02) | <0.0001 |
| Total corticosteroid use [a] (User/All subjects) | 0.03 (59/2091) | 0.08 (60/722) | 0.13 (93/726) | 0.007 |
a: BMI: body mass index; PRM fSAD: Parametric Response Mapping, functional small airway disease; Pi10: square root of wall area for a hypothetical airway of 10-mm lumen perimeter on computed tomography; FEV1: amount of air forcefully exhaled in the first second after taking a deep breath; FVC: the total amount of air forcefully exhaled after a full inhalation; DLCO: the diffusing capacity of the lungs for carbon monoxide; Total corticosteroid use: inhaled and/or oral corticosteroids use reported at the COPDGene Phase 2 visit.

b: Chi-square tests were used for categorical variables and t-tests were used for continuous variables.

c: Mean (SD) for quantitative variables without skewness and proportion (n/N) for categorical variables.

d: Median (IQR) for variables with skewness: PRM fSAD (%) and Pack-years.

e: Participants were enrolled in two racial groups: non-Hispanic White and Black.

Proteomic biomarker identification

Power calculations confirming adequate sample size for proteomic biomarker detection are presented in Supplemental Figure S1. PCA plots for proteomics residuals in the COPDGene and SPIROMICS cohorts, which demonstrate overlap between participants with EPD and participants with NEPD for most proteins, are presented in Supplemental Figure S2. We identified 125 SOMAmers representing 124 proteins (UniProt ID) at FDR < 0.05 comparing EPD and NEPD COPD within COPDGene, as shown in Supplemental Table S2. sRAGE (encoded by AGER) is the most significant subtype-associated proteomic biomarker identified in COPDGene. We visualised the directions and significance of COPD subtype proteomic associations in COPDGene and SPIROMICS using volcano plots, as shown in Fig. 2 (COPDGene cohort) and Supplemental Figure S3 (SPIROMICS). The top 20 proteomic biomarkers in COPDGene (FDR < 0.05) that were at least nominally associated with COPD subtypes in SPIROMICS (P-value < 0.05) are shown in Table 2; the full list of 66 COPDGene SomaScan biomarkers (SOMAmers, representing 65 UniProt IDs) with nominal P-value < 0.05 in SPIROMICS is shown in Supplemental Table S3. Only one of these 66 associations (STOML1) had an opposite direction of effect in COPDGene and SPIROMICS. Of interest, five proteins were associated with COPD subtypes with FDR < 0.05 in both COPDGene and SPIROMICS (sRAGE, KRT20, CD93, KLK10, and WFDC1). Among 124 COPD subtype proteomic biomarkers (UniProt ID) in COPDGene, 120 had the same direction of effect in COPDGene and SPIROMICS, a highly statistically significant concordance in effect direction (P = 9.12 × 10^−31, as shown in Supplemental Table S5). For all 4727 proteins (UniProt ID) assessed in COPDGene, 59% had higher levels in participants with EPD; however, among the 124 subtype proteomic biomarkers, 88% had higher protein levels in participants with NEPD. A similar pattern was observed in SPIROMICS.
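The concordance result can be checked with a short sign test. The text does not name the test behind the reported P-value, so treating it as a two-sided exact binomial test against a 50% chance of agreement is an assumption; the function name is hypothetical, and the counts (120 concordant out of 124) come from the paragraph above.

```python
from math import comb


def sign_test_two_sided(n_concordant, n_total):
    """Two-sided exact binomial test of concordance against a null probability of 0.5."""
    # Use the larger tail count so the test is symmetric in direction.
    k = max(n_concordant, n_total - n_concordant)
    # Exact upper-tail probability using integer arithmetic for the counts.
    upper_tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    return min(1.0, 2.0 * upper_tail)


concordance_p = sign_test_two_sided(n_concordant=120, n_total=124)
```

Under this assumption the computed value is approximately 9.12e-31, which agrees with the reported P-value.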

Fig. 2. Volcano plot for proteomic biomarkers identified by comparison between emphysema-predominant and non-emphysema predominant COPD. Models were adjusted for age, sex, race, pack-years of smoking, BMI, current smoking status, clinical centre, and cellular components (white blood cell counts and platelet counts).

Table 2. Top twenty emphysema-predominant vs. non-emphysema predominant COPD proteomic biomarkers in COPDGene with nominal replication (P < 0.05) in SPIROMICS, ranked by P-value in COPDGene.

| Protein name | Gene name | UniProt ID | SOMAmer | COPDGene P | COPDGene FDR | COPDGene CIs | SPIROMICS P | SPIROMICS CIs |
|---|---|---|---|---|---|---|---|---|
| Advanced glycosylation end product-specific receptor, soluble | AGER | Q15109 | 4125_52 | 3.2E-15 | 1.6E-11 | −0.36, −0.22 | 2.8E-05 | −0.45, −0.16 |
| N-terminal pro-BNP | NPPB | P16860 | 7655_11 | 7.0E-09 | 1.3E-05 | −0.38, −0.19 | 3.0E-02 | −0.34, −0.02 |
| Polymeric immunoglobulin receptor | PIGR | P01833 | 3216_2 | 1.2E-08 | 1.5E-05 | −0.22, −0.11 | 1.1E-03 | −0.30, −0.08 |
| Collagen alpha-1(XXVIII) chain | COL28A1 | Q2UY09 | 10702_1 | 6.1E-08 | 6.0E-05 | −0.13, −0.06 | 2.2E-03 | −0.15, −0.03 |
| Peptidyl-prolyl cis-trans isomerase C | PPIC | P45877 | 18819_21 | 7.5E-08 | 6.2E-05 | −0.12, −0.06 | 4.6E-04 | −0.16, −0.05 |
| Laminin subunit gamma 2 | LAMC2 | Q13753 | 9580_5 | 2.7E-07 | 1.9E-04 | −0.16, −0.07 | 2.1E-03 | −0.21, −0.05 |
| EGF-like repeat and discoidin I-like domain-containing protein 3 | EDIL3 | O43854 | 9360_33 | 5.8E-07 | 3.6E-04 | −0.17, −0.08 | 2.2E-03 | −0.24, −0.06 |
| Peroxidasin homologue | PXDN | Q92626 | 13463_1 | 7.6E-07 | 3.8E-04 | −0.2, −0.09 | 3.3E-02 | −0.21, −0.01 |
| Neuroblastoma suppressor of tumorigenicity 1 | NBL1 | P41271 | 2944_66 | 1.6E-06 | 6.5E-04 | −0.16, −0.07 | 2.8E-03 | −0.16, −0.03 |
| Trefoil factor 2 | TFF2 | Q03403 | 9191_8 | 2.2E-06 | 8.2E-04 | −0.18, −0.07 | 2.3E-03 | −0.26, −0.06 |
| Complement component C1q receptor | CD93 | Q9NPY3 | 14136_234 | 3.2E-06 | 1.1E-03 | −0.09, −0.04 | 2.0E-08 | −0.20, −0.10 |
| Collagen alpha-3(VI) chain | COL6A3 | P12111 | 11196_31 | 3.8E-06 | 1.3E-03 | −0.11, −0.04 | 3.4E-04 | −0.16, −0.05 |
| Secreted frizzled-related protein 4 | SFRP4 | Q6FHJ7 | 17447_52 | 5.2E-06 | 1.5E-03 | −0.13, −0.05 | 6.8E-04 | −0.20, −0.05 |
| WAP four-disulfide core domain protein 1 | WFDC1 | Q9HC57 | 9316_67 | 5.6E-06 | 1.5E-03 | −0.12, −0.05 | 5.3E-05 | −0.21, −0.07 |
| Ephrin-B2: extracellular domain | EFNB2 | P52799 | 14131_37 | 5.8E-06 | 1.5E-03 | −0.14, −0.06 | 1.9E-04 | −0.23, −0.07 |
| Ephrin-A4 | EFNA4 | P52798 | 2614_28 | 8.5E-06 | 2.0E-03 | −0.09, −0.04 | 3.9E-03 | −0.12, −0.03 |
| Desmocollin-2 | DSC2 | Q02487 | 13126_52 | 1.0E-05 | 2.1E-03 | −0.10, −0.04 | 1.5E-02 | −0.15, −0.02 |
| Complement decay-accelerating factor | CD55 | P08174 | 5069_9 | 1.7E-05 | 3.4E-03 | −0.08, −0.03 | 6.6E-04 | −0.12, −0.03 |
| Netrin receptor UNC5B | UNC5B | Q8IZJ1 | 15394_79 | 1.9E-05 | 3.5E-03 | −0.12, −0.05 | 5.4E-04 | −0.21, −0.06 |
| Inactive tyrosine-protein kinase transmembrane receptor ROR1 | ROR1 | Q01973 | 2590_69 | 2.4E-05 | 4.2E-03 | −0.12, −0.04 | 7.4E-03 | −0.18, −0.03 |

P (P-value), FDR (false discovery rate), and CIs (Confidence Intervals for Beta coefficient from linear regression model).
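The replication criterion behind Table 2 can be sketched as a small check. This is a minimal illustration, not the authors' code: the function names are hypothetical, and reading the effect direction from the sign of the confidence interval is an assumption made here because the table reports intervals rather than Beta estimates. Three rows are copied from Table 2, and one clearly labelled hypothetical row shows a failure case.

```python
def direction(ci):
    """Return -1 or +1 when the interval excludes zero, otherwise 0."""
    low, high = ci
    if high < 0:
        return -1
    if low > 0:
        return 1
    return 0


def nominally_replicated(discovery_ci, replication_p, replication_ci, alpha=0.05):
    """Nominal replication: P < alpha and the same effect direction in both cohorts."""
    d = direction(discovery_ci)
    same_sign = d != 0 and d == direction(replication_ci)
    return replication_p < alpha and same_sign


# (COPDGene CI, SPIROMICS P, SPIROMICS CI), copied from Table 2.
rows = {
    "AGER": ((-0.36, -0.22), 2.8e-05, (-0.45, -0.16)),
    "NPPB": ((-0.38, -0.19), 3.0e-02, (-0.34, -0.02)),
    "CD93": ((-0.09, -0.04), 2.0e-08, (-0.20, -0.10)),
    # Hypothetical row, NOT from the paper: fails on both P and direction.
    "hypothetical_protein": ((-0.10, -0.02), 0.30, (-0.08, 0.05)),
}

replicated = {name: nominally_replicated(*row) for name, row in rows.items()}
```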

As an alternative approach to identify COPD subtype proteomic biomarkers that leverages the large number of control participants in COPDGene, we evaluated the proteomic expression differences among controls, participants with EPD, and participants with NEPD using one-way ANOVA and post-hoc SNK tests.41 Most of the proteomic target biomarkers (102/124 at UniProt ID level, 103/125 at SomaScan Probe level) identified in the linear regression models were also significant using the ANOVA/SNK approach, as shown in Supplemental Figure S4. Comparisons of protein expression levels for COPD subtype biomarkers in controls, participants with emphysema-predominant COPD, and participants with non-emphysema predominant COPD are shown in Supplemental Table S7: most proteomic biomarkers had similar levels in participants with NEPD and controls, and significantly lower values in participants with EPD.
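To make the three-group comparison concrete, the sketch below computes the one-way ANOVA F statistic for a single protein across controls, NEPD, and EPD. It is an illustration under stated assumptions: the values are invented toy numbers, the function name is hypothetical, and neither the P-value nor the post-hoc SNK step is implemented, since both need the F and studentised range distributions, which are not in the Python standard library.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy protein levels (hypothetical): similar in controls and NEPD, lower in EPD.
control = [1.0, 1.2, 0.8]
nepd = [1.1, 0.9, 1.0]
epd = [0.4, 0.6, 0.5]

f_stat, df_between, df_within = one_way_anova_f([control, nepd, epd])
```

In this toy pattern the large F statistic is driven by the EPD group alone, which is the situation a post-hoc pairwise test is meant to resolve.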

To evaluate the impact of airflow obstruction severity on proteomic biomarker detection, we identified significant COPD subtype proteomic associations with adjustment for FEV1% predicted (FDR controlled at 0.05) in Supplemental Table S4. Fifty-nine of the original 124 protein biomarkers (two biomarkers with higher expression level in EPD and 57 biomarkers with higher expression level in NEPD) were consistent after adjusting for FEV1% predicted with FDR controlled at 0.05, as shown in Supplemental Figure S5.
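"FDR controlled at 0.05" can be illustrated with the Benjamini-Hochberg step-up adjustment. The FDR procedure used is not named in this section, so Benjamini-Hochberg is an assumption here, and the P-values below are toy numbers rather than study results.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that adjusted values
    # never increase as the raw P-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


toy_p = [0.001, 0.008, 0.039, 0.041, 0.20]  # hypothetical raw P-values
toy_q = benjamini_hochberg(toy_p)
significant = [q < 0.05 for q in toy_q]
```

Two of the toy P-values fall below 0.05 before adjustment but not after it, which is the kind of loss that separates nominal from FDR-level significance.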

Proteomic expression levels for two previously reported proteins in COPD pathogenesis, sRAGE21 and PIGR,42,43 are shown in Supplemental Figure S6. Lower levels of both proteins were observed in participants with EPD than in participants with NEPD and controls.

Pair-wise protein-protein correlations were evaluated in COPD subtype proteomic biomarkers. Higher correlations were observed between biomarkers with higher expression levels in participants with NEPD, as shown in Supplemental Figure S7.

We also compared identified COPD subtype biomarkers with previously reported cardiac biomarkers (60 detectable in the COPDGene SomaScan proteomics panel) and found five overlapping biomarkers including NPPB,44 DSC2,45 CST3,46 KRT1,47 and ADH1B.48 All five proteins have higher expression levels in non-emphysema predominant COPD, and all were previously reported to be expressed more highly in cardiac diseases (Supplemental Table S8).

Pathway analysis

Since the vast majority of the 124 protein biomarkers identified in COPDGene had a similar direction of effect in SPIROMICS, we performed gene set enrichment analysis and gene ontology enrichment analysis using the 124 significant protein biomarkers from COPDGene. Significant results for HALLMARK gene sets and GO enrichment are shown in Fig. 3. Significant KEGG and REACTOME gene set enrichment results are presented in Supplemental Figure S8. COPD subtype-associated proteomic biomarkers are significantly enriched for important biological processes including cell adhesion, collagen-containing extracellular matrix, and epithelial mesenchymal transition.

Fig. 3.

HALLMARK gene sets and gene ontology enrichment analysis for COPD subtype proteomic associations. Significant gene sets were selected with q-value < 0.05, and significant gene ontology terms were selected with P-value < 0.001.

Weighted gene correlation network analysis

The power threshold for WGCNA was optimised as shown in Supplemental Figure S9. With minimum network module size controlled at 10, we identified eleven correlation network modules. Only the green module was associated with EPD vs. NEPD COPD subtypes with FDR controlled at 0.05. COPD subtype proteomic association P-values across different network modules are compared in Supplemental Figure S10. Proteins in the green module were more significantly associated with COPD subtypes than proteins in other modules. The green module contains 105 proteins (represented by 108 SOMAmers), of which 40 proteins (42 SOMAmers) are COPD subtype-associated proteomic biomarkers. Module membership is the correlation between the module eigengene and each module component and reflects the intramodular connectivity of that component.38 Gene significance (COPD subtype associations) and module membership for green module components are listed in Supplemental Table S9, and their associations are shown in Supplemental Figure S11. With COPD subtype association FDR controlled at 0.05 and module membership > 0.8, we identified thirteen COPD subtype protein biomarkers representative of the green module. These highly connected protein biomarkers were PXDN, EFNA2, EFNB2, SELENOM, UNC5B, EPHB4, LRP10, ROR1, TNFRSF1A, EPHA2, GAS1, ROR2, and SPON2.
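The hub selection rule (module membership > 0.8 and subtype association FDR < 0.05) can be sketched as follows. This is an assumption-laden illustration: the eigengene is taken as given rather than derived as the first principal component of the module, module membership is computed as a plain Pearson correlation, and all protein names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def hub_proteins(eigengene, expression, fdr, mm_cutoff=0.8, fdr_cutoff=0.05):
    """Keep proteins that are both highly connected and subtype-associated."""
    hubs = []
    for name, values in expression.items():
        module_membership = pearson(eigengene, values)
        if module_membership > mm_cutoff and fdr[name] < fdr_cutoff:
            hubs.append(name)
    return hubs


# Hypothetical module eigengene and protein levels across five participants.
eigengene = [1.0, 2.0, 3.0, 4.0, 5.0]
expression = {
    "protein_A": [1.1, 2.0, 2.9, 4.2, 5.1],   # tracks the eigengene closely
    "protein_B": [3.0, 1.0, 4.0, 1.0, 5.0],   # weakly correlated
    "protein_C": [2.0, 4.0, 6.0, 8.0, 10.0],  # correlated but not subtype-associated
}
fdr = {"protein_A": 0.01, "protein_B": 0.01, "protein_C": 0.20}

hubs = hub_proteins(eigengene, expression, fdr)
```

Both filters are needed: protein_C has high module membership but fails the FDR filter, while protein_B passes the FDR filter but is poorly connected.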

Gene ontology enrichment analysis was performed with green module components as proteins of interest and all detected human proteins as background. Proteins from the green COPD subtype-associated module were enriched in biological processes including cell adhesion and ephrin receptor signalling pathway with TopGO Fisher's P-value < 0.001, as shown in Supplemental Figure S12.
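The enrichment test can be illustrated with a one-sided Fisher exact test, written here as a hypergeometric upper tail. This is a sketch of the classic test only. TopGO's graph-aware weighting algorithms are not reproduced, the function name is hypothetical, and the counts are toy numbers rather than the study's protein sets.

```python
from math import comb


def enrichment_p(n_background, n_term, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected proteins are drawn at random
    from a background of n_background proteins, n_term of which carry the term."""
    upper = min(n_term, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy counts (hypothetical): 20 background proteins, 5 annotated to the term,
# 4 proteins of interest, 3 of which carry the annotation.
p_value = enrichment_p(n_background=20, n_term=5, n_selected=4, n_overlap=3)
```

The choice of background matters: using all detected proteins, as described above, avoids scoring a term as enriched merely because the assay panel over-represents it.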

Genetic associations for COPD subtypes and COPD subtype proteomic biomarkers

We analysed top SNPs from 82 COPD GWAS loci27 and rs28929474 in SERPINA1, as shown in Table 3. Seven nominally significant (P < 0.05) EPD vs. NEPD associations for COPD genetic loci were found including rs28929474 in SERPINA1; the only COPD-associated SNP significant at FDR < 0.05 was rs2579762 in LRMDA as shown in Supplemental Table S10. Associations of COPD SNPs with each COPD subtype/control comparison are presented in Supplemental Figure S13. SNPs near ADGRG6 and TGFB2 were nominally associated with EPD vs. Control only, while another SNP near STN1 was associated with NEPD vs. Control only.

Table 3.

Genetic variant information for 82 GWAS loci reported in Sakornsakolpat et al. and rs28929474 in SERPINA1.

RS ID Chromosome Position Risk/alternative allele Nearest gene COPD odds ratio COPD GWAS P-value
rs13140176 chr4 145489098 A/G HHIP 1.18 4.1E-59
rs34712979 chr4 106819053 A/G NPNT 1.18 3.0E-46
rs9399401 chr6 142668901 T/C ADGRG6 1.16 1.6E-40
rs10037493 chr5 147854970 C/T HTR4 1.13 2.6E-33
rs1441358 chr15 71612514 G/T THSD4 1.13 7.4E-33
rs55676755 chr15 78898932 G/C CHRNA3 1.11 2.7E-26
rs7068966 chr10 12277992 C/T CDC123 1.1 6.2E-23
rs3095329 chr6 30693816 G/A IER3 1.12 2.1E-21
rs4888379 chr16 75340231 T/A CFDP1 1.1 5.9E-21
rs16825267 chr2 229569919 C/G PID1 1.19 1.8E-20
rs7671261 chr4 89883818 A/G FAM13A 1.09 1.4E-18
rs2070600 chr6 32151443 C/T AGER 1.21 1.1E-17
rs10866659 chr5 156937043 G/A ADAM19 1.09 1.2E-16
rs2806356 chr6 109266255 C/T ARMC2 1.1 2.9E-15
rs2955083 chr3 127961178 A/T EEFSEC 1.13 3.5E-15
rs7642001 chr3 168746145 A/G MECOM 1.08 1.9E-14
rs9350191 chr6 19842661 T/C ID4 1.12 5.1E-14
rs10114763 chr9 4143749 T/A GLIS3 1.07 8.7E-13
rs62191105 chr2 239872704 C/T TWIST2 1.09 2.6E-12
rs2571445 chr2 218683154 A/G TNS1 1.07 3.4E-12
rs156394 chr9 23588684 T/C ELAVL2 1.07 3.8E-12
rs10152300 chr15 84392907 G/A ADAMTSL3 1.08 4.2E-12
rs647097 chr18 8808464 C/T MTCL1 1.08 1.0E-11
rs1529672 chr3 25520582 C/A RARB 1.09 2.5E-11
rs11118406 chr1 219924894 T/A SLC30A10 1.08 4.1E-11
rs646695 chr6 140280398 C/T CITED2 1.08 4.6E-11
rs17759204 chr3 55158224 G/A CACNA2D3 1.07 8.8E-11
rs10760580 chr9 101661650 G/A COL15A1 1.07 1.2E-10
rs955277 chr2 9290357 T/C ASAP2 1.07 1.9E-10
rs2442776 chr3 11640601 G/A VGLL4 1.09 2.0E-10
rs2579762 chr10 78318879 C/A LRMDA 1.06 2.6E-10
rs629619 chr1 111738108 T/C DENND2D 1.08 2.9E-10
rs9329170 chr8 8697658 C/G MFHAS1 1.1 3.6E-10
rs4093840 chr3 123077042 A/T ADCY5 1.06 3.9E-10
rs9617650 chr22 18488883 G/C MICAL3 1.08 4.4E-10
rs76841360 chr1 40060025 A/G PABPC4 1.08 5.0E-10
rs1551943 chr5 52195033 A/G ITGA1 1.08 5.8E-10
rs153916 chr5 95036700 T/C SPATA9 1.06 6.3E-10
rs9435731 chr1 17306029 A/C MFAP2 1.06 6.6E-10
rs117261012 chr11 86444761 G/A PRSS23 1.09 6.9E-10
rs2897075 chr7 99630342 C/T ZKSCAN1 1.07 7.3E-10
rs12185268 chr17 43923683 G/A SPPL2C 1.08 9.9E-10
rs7958945 chr12 115947901 G/A MED13L 1.06 1.0E-09
rs12519165 chr5 170901586 A/T FGF18 1.07 1.1E-09
rs13198656 chr6 22004909 T/C PRL 1.06 1.2E-09
rs72626215 chr19 46294136 G/A DMWD 1.07 1.7E-09
rs11655567 chr17 69216687 C/T SOX9 1.06 1.9E-09
rs7307510 chr12 96237570 C/T SNRPF 1.08 2.6E-09
rs9525927 chr13 44842503 G/A SERP2 1.08 2.8E-09
rs4585380 chr4 75673363 G/A BTC 1.07 3.4E-09
rs4757118 chr11 13171236 T/C ARNTL 1.06 3.8E-09
rs798565 chr7 2752152 G/A AMZ1 1.07 3.9E-09
rs72673419 chr1 60913143 T/C C1orf87 1.14 4.0E-09
rs3009947 chr1 218689155 C/T TGFB2 1.06 4.0E-09
rs72699855 chr14 93105953 G/C RIN3 1.08 4.8E-09
rs11579382 chr1 239901006 C/G CHRM3 1.06 6.5E-09
rs2040732 chr7 20418134 C/T ITGB8 1.06 6.9E-09
rs1631199 chr6 117259673 G/C RFX6 1.06 7.6E-09
rs73158393 chr22 33335386 C/G SYN3 1.07 7.7E-09
rs62065216 chr17 38218773 A/G THRA 1.06 8.1E-09
rs72731149 chr15 49984710 G/C DTWD1 1.12 8.3E-09
rs10929386 chr2 15906179 C/T DDX1 1.06 9.1E-09
rs34727469 chr17 36835079 T/C RPL23 1.09 1.1E-08
rs8044657 chr16 58022625 G/A TEPP 1.11 1.1E-08
rs1334576 chr6 7211818 A/G RREB1 1.06 1.2E-08
rs8080772 chr17 28413129 T/C EFCAB5 1.06 1.4E-08
rs979453 chr5 150595073 G/A CCDC69 1.06 1.4E-08
rs72902175 chr2 157013035 T/C NR4A2 1.09 1.6E-08
rs7866939 chr9 85126163 C/T RASEF 1.06 1.8E-08
rs13073544 chr3 29472412 C/G RBMS3 1.06 2.0E-08
rs721917 chr10 81706324 G/A SFTPD 1.06 2.2E-08
rs62375246 chr5 132439010 A/T HSPA4 1.06 2.2E-08
rs1570221 chr10 105656874 A/G STN1 1.06 2.2E-08
rs62259026 chr3 57746515 C/T SLMAP 1.07 2.4E-08
rs11049386 chr12 28320536 T/A CCDC91 1.06 2.7E-08
rs803923 chr9 119401650 A/G ASTN2 1.06 2.7E-08
rs34651 chr5 72144005 C/T TNPO1 1.11 3.0E-08
rs2096468 chr21 35661745 A/C KCNE2 1.06 4.0E-08
rs4660861 chr1 45946636 G/T TESK2 1.06 4.4E-08
rs56134392 chr16 10709013 C/T TEKT5 1.06 4.5E-08
rs12466981 chr2 42433247 C/T EML4 1.06 4.9E-08
rs7650602 chr3 141147414 C/T ZBTB38 1.06 4.9E-08
rs28929474 chr14 94844947 T/C SERPINA1 Not a GWAS Variant

Protein information for the COPD subtype proteomic biomarkers in COPDGene used for pQTL analysis is shown in Table 4. QQ plots for cis (±1 Mb with 83 COPD SNPs) and trans (all SNPs beyond ±1 Mb from a protein biomarker gene) pQTL analysis associating COPD SNPs with COPD subtype proteomic biomarkers are shown in Supplemental Figure S14. Power calculations for the QTL analysis, performed with the R package pwr, confirmed adequate sample size and are presented in Supplemental Figure S1. No significant trans-pQTL associations were found. With FDR controlled at 0.05, we identified significant cis-pQTL associations between rs2070600 and sRAGE in all participants and separately in controls, participants with EPD, and participants with NEPD. We also identified significant cis-pQTL associations between rs979453 and GM2A in all participants and separately in the control and NEPD populations. Both sets of associations are shown in Supplemental Table S11.
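The cis/trans split can be sketched with coordinates taken from Tables 3 and 4. Two choices here are assumptions rather than details confirmed by the text: the ±1 Mb window is measured from the gene start and end positions, and the SNP positions in Table 3 are treated as being on the same genome build (GRCh37) as the gene coordinates in Table 4. The function name is hypothetical.

```python
CIS_WINDOW = 1_000_000  # +/- 1 Mb


def classify_pqtl(snp_chr, snp_pos, gene_chr, gene_start, gene_end, window=CIS_WINDOW):
    """Label a SNP-protein pair as 'cis' or 'trans' by genomic distance."""
    # Table 3 writes chromosomes as 'chr6' and Table 4 as '6', so normalise first.
    if snp_chr.removeprefix("chr") != gene_chr.removeprefix("chr"):
        return "trans"
    if gene_start - window <= snp_pos <= gene_end + window:
        return "cis"
    return "trans"


# SNP positions from Table 3, gene coordinates from Table 4.
snps = {"rs2070600": ("chr6", 32151443), "rs979453": ("chr5", 150595073)}
genes = {"AGER": ("6", 32148745, 32152101), "GM2A": ("5", 150591711, 150650001)}

labels = {
    (snp, gene): classify_pqtl(*snps[snp], *genes[gene])
    for snp in snps
    for gene in genes
}
```

Under these assumptions the two reported pairs, rs2070600 with AGER (sRAGE) and rs979453 with GM2A, fall inside the cis window, and the cross pairs are trans.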

Table 4.

Protein information (GRCh37) for COPD subtype associated proteomic biomarkers in COPDGene cohort for pQTL analysis.

Protein name Gene name UniProt ID SOMAmer FDR Chr Start End
Advanced glycosylation end product-specific receptor, soluble AGER Q15109 4125_52 1.6E-11 6 32148745 32152101
Glucagon GCG P01275 4891_50 1.3E-05 2 162999392 163008914
N-terminal pro-BNP NPPB P16860 7655_11 1.3E-05 1 11917521 11918988
Polymeric immunoglobulin receptor PIGR P01833 3216_2 1.5E-05 1 207101863 207119811
Collagen alpha-1(XXVIII) chain COL28A1 Q2UY09 10702_1 6.0E-05 7 7395834 7575484
Peptidyl-prolyl cis-trans isomerase C PPIC P45877 18819_21 6.2E-05 5 122358945 122372436
Laminin subunit gamma 2 LAMC2 Q13753 9580_5 1.9E-04 1 183155373 183214035
EGF-like repeat and discoidin I-like domain-containing protein 3 EDIL3 O43854 9360_33 3.6E-04 5 83236373 83680611
Peroxidasin homologue PXDN Q92626 13463_1 3.8E-04 2 1635659 1748624
Neurotensin/neuromedin N NTS P30990 7857_22 3.8E-04 12 86268073 86276770
N-acetylglucosamine-1-phosphotransferase subunit gamma GNPTG Q9UJJ9 10666_7 3.8E-04 16 1401924 1413352
Neuroblastoma suppressor of tumorigenicity 1 NBL1 P41271 2944_66 6.5E-04 1 19967048 19984945
Trefoil factor 2 TFF2 Q03403 9191_8 8.2E-04 21 43766466 43771237
Complement component C1q receptor CD93 Q9NPY3 14136_234 1.1E-03 20 23059986 23066977
Collagen alpha-3(VI) chain COL6A3 P12111 11196_31 1.3E-03 2 238232646 238323018
RELT-like protein 1 RELL1 Q8IUW5 13399_33 1.4E-03 4 37592422 37687998
Ephrin-B2: extracellular domain EFNB2 P52799 14131_37 1.5E-03 13 107142079 107187462
Secreted frizzled-related protein 4 SFRP4 Q6FHJ7 17447_52 1.5E-03 7 37945543 38065297
WAP four-disulfide core domain protein 1 WFDC1 Q9HC57 9316_67 1.5E-03 16 84328252 84363450
Retinol-binding protein 4 RBP4 P02753 15633_6 2.0E-03 10 95351444 95361501
Ephrin-A4 EFNA4 P52798 2614_28 2.0E-03 1 155036207 155042029
Peptide YY PYY P10082 3727_35 2.1E-03 17 42030106 42081837
Soluble calcium-activated nucleotidase 1 CANT1 Q8WVQ1 6480_1 2.1E-03 17 76987799 77005949
Desmocollin-2 DSC2 Q02487 13126_52 2.1E-03 18 28645940 28682378
Complement decay-accelerating factor CD55 P08174 5069_9 3.4E-03 1 207494853 207534311
Netrin receptor UNC5B UNC5B Q8IZJ1 15394_79 3.5E-03 10 72972327 73062621
Follistatin-related protein 3 FSTL3 O95633 3438_10 3.5E-03 19 676392 683385
Inactive tyrosine-protein kinase transmembrane receptor ROR1 ROR1 Q01973 2590_69 4.2E-03 1 64239693 64647181
Ganglioside GM2 activator GM2A P17900 15441_6 5.1E-03 5 150591711 150650001
Ribonuclease T2 RNASET2 O00584 16913_8 5.1E-03 6 167342992 167370679
Neurocan core protein NCAN O14594 15573_110 5.5E-03 19 19322782 19363042
Ephrin type-A receptor 2 EPHA2 P29317 4834_61 5.9E-03 1 16450832 16482582
Tyrosine-protein kinase transmembrane receptor ROR2 ROR2 Q01974 7861_9 6.0E-03 9 94325373 94712444
Beta-2-microglobulin B2M P61769 3485_28 6.3E-03 15 45003675 45011075
Myocilin MYOC Q99972 16558_2 6.6E-03 1 171604557 171621823
Xyloside xylosyltransferase 1 XXYLT1 Q8NBI6 6375_75 6.6E-03 3 194789008 194991896
Tryptase beta 1 TPSAB1 Q15661 9409_11 6.6E-03 16 1290697 1292555
Interleukin-18 IL18 Q14116 5661_15 6.8E-03 11 112013974 112034840
Thrombospondin-2 THBS2 P35442 3339_33 7.2E-03 6 169615875 169654139
Vesicular integral-membrane protein VIP36 LMAN2 Q12907 9468_8 7.5E-03 5 176758563 176778853
Retinol-binding protein 5 RBP5 P82980 19241_31 8.4E-03 12 7276280 7281538
Prefoldin subunit 5 PFDN5 Q99471 4271_75 8.4E-03 12 53689075 53693234
Kallikrein-10 KLK10 O43240 6227_1 8.6E-03 19 51515995 51523431
ADAMTS-like protein 2 ADAMTSL2 Q86TH1 6379_62 9.1E-03 9 136397286 136440641
Tumour necrosis factor receptor superfamily member 1A TNFRSF1A P19438 2654_19 1.1E-02 12 6437923 6451280
C-type mannose receptor 2 MRC2 Q9UBG0 3041_55 1.1E-02 17 60704762 60770958
Lumican LUM P51884 13114_50 1.1E-02 12 91496406 91505608
Brain-specific serine protease 4 PRSS22 Q9GZN4 4534_10 1.1E-02 16 2902728 2908171
Matrilin-4 MATN4 O95460 7083_74 1.1E-02 20 43922085 43937169
Ribonuclease UK114 RIDA P52758 14636_25 1.2E-02 8 99114572 99129469
Protein S100-A16 S100A16 Q96FQ6 17836_17 1.2E-02 1 153579362 153585621
Out at first protein homologue OAF Q86UD1 6414_8 1.2E-02 11 120081475 120101041
Gastrokine-2 GKN2 Q86XP6 6416_8 1.2E-02 2 69172364 69180102
Left-right determination factor 2 LEFTY2 O00292 15503_15 1.3E-02 1 226124298 226129189
Cadherin-1 CDH1 P12830 18429_10 1.3E-02 16 68771128 68869451
Dual specificity mitogen-activated protein kinase kinase 2 MAP2K2 P36507 3628_3 1.3E-02 19 4090319 4124126
CMRF35-like molecule 6 CD300C Q08708 5066_134 1.3E-02 17 72537247 72542282
Alkaline phosphatase, placental-like ALPG P10696 6715_63 1.3E-02 2 233271553 233275424
Neuronal pentraxin-1 NPTX1 Q15818 9256_78 1.3E-02 17 78440948 78451643
Cadherin-17 CDH17 Q12864 16613_3 1.4E-02 8 95139399 95229531
Growth arrest-specific protein 1 GAS1 P54826 5463_22 1.4E-02 9 89559279 89562104
Natural cytotoxicity triggering receptor 1 NCR1 O76036 8360_169 1.5E-02 19 55417508 55427508
High mobility group protein B1 HMGB1 P09429 2524_56 1.6E-02 13 31032884 31191734
Tyrosine-protein phosphatase non-receptor type 1 PTPN1 P18031 3005_5 1.6E-02 20 49126891 49201299
Ribonuclease pancreatic RNASE1 P07998 7211_2 1.6E-02 14 21269387 21271437
Hepatitis A virus cellular receptor 2 HAVCR2 Q8TDQ0 5134_52 1.7E-02 5 156512843 156569880
Leucocyte immunoglobulin-like receptor subfamily A member 3 LILRA3 Q8N6C8 6391_52 1.8E-02 19 54799854 54809952
Netrin receptor UNC5B UNC5B Q8IZJ1 7776_20 1.8E-02 10 72972327 73062621
Interleukin-2 receptor subunit beta IL2RB P14784 9343_16 1.8E-02 22 37521878 37571094
Cyclin-dependent kinase 8: Cyclin-C complex CCNC P24863 3359_11 2.1E-02 6 99990256 100016849
CDK8 P49336 13 26828276 26979375
Ankyrin repeat and SOCS box protein 9 ASB9 Q96DX5 19601_15 2.1E-02 X 15253410 15288589
Amyloid-like protein 1 APLP1 P51693 7210_25 2.1E-02 19 36358801 36370693
Tumour necrosis factor receptor superfamily member 1B TNFRSF1B P20333 8368_102 2.1E-02 1 12227060 12269285
Cartilage oligomeric matrix protein COMP P49747 8043_153 2.1E-02 19 18893583 18902123
Selenoprotein M SELENOM Q8WWX9 15336_7 2.2E-02 22 31500758 31516055
Neural cell adhesion molecule 1, 120 kDa isoform NCAM1 P13591 4498_62 2.3E-02 11 112831997 113149158
Carbonyl reductase [NADPH] 1 CBR1 P16152 12381_26 2.3E-02 21 37442239 37445464
Microfibril-associated glycoprotein 4 MFAP4 P55083 5636_10 2.3E-02 17 19286755 19290553
Cystatin-C CST3 P01034 2609_59 2.4E-02 20 23608534 23619110
Pigment epithelium-derived factor SERPINF1 P36955 9211_19 2.4E-02 17 1665253 1680868
Ephrin-A2 EFNA2 O43921 14124_6 2.5E-02 19 1286153 1301430
Glutamate carboxypeptidase 2 FOLH1 Q04609 5478_50 2.5E-02 11 49168187 49230222
Aldo-keto reductase family 1 member C3 AKR1C3 P42330 17377_1 2.6E-02 10 5077546 5149878
Thrombospondin-4 THBS4 P35443 3340_53 2.8E-02 5 79287134 79379110
Matrix-remodelling-associated protein 7 MXRA7 P84157 8005_1 2.8E-02 17 74668633 74707098
SPARC-related modular calcium-binding protein 1 SMOC1 Q9H4F8 13118_5 2.8E-02 14 70320848 70499083
SPARC-related modular calcium-binding protein 2 SMOC2 Q9H3U7 15635_4 2.8E-02 6 168841831 169073984
Low-density lipoprotein receptor-related protein 10 LRP10 Q7Z4F1 16610_13 2.8E-02 14 23340822 23350789
Tryptase beta 2 TPSB2 P20231 3403_1 2.8E-02 16 1290697 1292555
IGF-like family receptor 1 IGFLR1 Q9H665 7244_16 2.8E-02 19 36230058 36233354
Keratin, type II cytoskeletal 1 KRT1 P04264 9931_20 2.8E-02 12 53068520 53074191
Netrin-G1 NTNG1 Q9Y2I2 5637_81 2.8E-02 1 107682629 108026080
Ephrin type-B receptor 4 EPHB4 P54760 15530_33 2.9E-02 7 100400187 100425121
Thrombospondin-3 THBS3 P49746 8982_65 2.9E-02 1 155165379 155178842
Keratin, type I cytoskeletal 20 KRT20 P35900 12975_11 3.2E-02 17 39032193 39041479
Cathepsin E CTSE P14091 15376_134 3.2E-02 1 206317459 206332104
Matrilin-2 MATN2 O00339 3325_2 3.2E-02 8 98881068 99048944
Procollagen galactosyltransferase 1 COLGALT1 Q8NBJ5 5638_23 3.2E-02 19 17666403 17693971
EGF-containing fibulin-like extracellular matrix protein 1 EFEMP1 Q12805 8480_29 3.2E-02 2 56093102 56151274
Polypeptide N-acetylgalactosaminyltransferase 16 GALNT16 Q8N428 8923_94 3.2E-02 14 69725994 69821183
Alcohol dehydrogenase 1B ADH1B P00325 9834_62 3.2E-02 4 100226121 100242558
Tyrosine-protein phosphatase non-receptor type 4 PTPN4 P29074 14254_27 3.3E-02 2 120517207 120741394
RGM domain family member B RGMB Q6NW40 3331_8 3.3E-02 5 98104354 98134347
Follistatin FST P19883 4132_27 3.3E-02 5 52776239 52782964
Bone morphogenetic protein 7 BMP7 P18075 2972_57 3.4E-02 20 55743804 55841685
Tumour necrosis factor receptor superfamily member 1B TNFRSF1B P20333 3152_57 3.4E-02 1 12227060 12269285
Serine protease inhibitor Kazal-type 9 SPINK9 Q5DT21 8042_88 3.4E-02 5 147700766 147719412
Stomatin-like protein 1 STOML1 Q9UBI4 17344_23 3.5E-02 15 74275547 74286963
Interleukin-18 receptor accessory protein IL18RAP O95256 2993_1 3.6E-02 2 103035149 103069025
Apolipoprotein L1 APOL1 O14791 9506_10 3.6E-02 22 36649056 36663576
Vascular endothelial growth factor D VEGFD O43915 13098_93 3.6E-02 X 15363713 15402498
Cell adhesion molecule 1 CADM1 Q9BY67 3326_58 3.6E-02 11 115039938 115375675
Alpha-1-microglobulin AMBP P02760 15453_3 3.6E-02 9 116822407 116840752
CD48 antigen CD48 P09326 3292_75 3.6E-02 1 160648536 160681641
Membrane cofactor protein CD46 P15529 17682_1 3.9E-02 1 207925402 207968858
Cathepsin Z CTSZ Q9UBR2 4971_1 3.9E-02 20 57570240 57582302
Voltage-dependent calcium channel subunit alpha-2/delta-3 CACNA2D3 Q8IZS8 8885_6 4.4E-02 3 54156574 55108584
Brorin VWC2 Q2TAL6 15308_108 4.6E-02 7 49813257 49961546
Glycine N-acyltransferase GLYAT Q6IB77 19506_6 4.6E-02 11 58407899 58499447
Calpain-2 catalytic subunit CAPN2 P17655 14684_17 4.8E-02 1 223889295 223963720
Cellular retinoic acid-binding protein 2 CRABP2 P29373 11696_7 <5.0E-02 1 156669398 156675608
Low-density lipoprotein receptor-related protein 11 LRP11 Q86VZ4 15472_16 <5.0E-02 6 150139934 150186199
Ribonuclease K6 RNASE6 Q93091 5646_20 <5.0E-02 14 21249210 21250626
Complement C1q tumour necrosis factor-related protein 5 C1QTNF5 Q9BXJ0 7810_20 <5.0E-02 11 119209652 119217383
Spondin-2 SPON2 Q9BUD6 8099_42 <5.0E-02 4 1160720 1202750

Machine learning model for COPD subtype classification

We established a random forest model for COPD subtype classification in COPDGene. We optimised the model for classification accuracy; the optimised model used 600 trees and an mtry of 8. The final model was further evaluated in SPIROMICS, with a prediction accuracy of 0.60 and an area under the curve (AUC) of 0.65 (CIs: 0.61–0.69), as shown in Supplemental Figure S15. Top optimised features selected by the machine learning model are presented in Supplemental Figure S16, including significant EPD/NEPD proteomic biomarkers AGER and PIGR.
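The two evaluation metrics reported for the classifier can be computed as in the sketch below. It does not train a random forest. The labels and predicted probabilities are toy values, the 0.5 decision threshold is an assumption, and AUC is computed through its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)


def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): 1 = EPD, 0 = NEPD, scores = predicted probability of EPD.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
predictions = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(labels, predictions)
auc = roc_auc(labels, scores)
```

Accuracy depends on the chosen threshold while AUC does not, which is one reason the two numbers can differ for the same model.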

Additional information regarding Results is provided in the Supplemental Materials.

Discussion

Participants with emphysema predominant and non-emphysema predominant COPD have differences in clinical characteristics and disease progression. Castaldi et al. recently demonstrated that multiple different approaches for COPD subtyping could be well-captured using the EPD/NEPD criteria that we adopt in these analyses.12 However, the biological mechanisms distinguishing EPD and NEPD COPD subtypes are unknown. Our analysis focused on identification of proteomic biomarkers, protein correlations, biomarker-enriched biological pathways, correlation networks, and genetic regulatory effects to reveal molecular differences between COPD subtypes.

We identified 124 COPD subtype-associated proteomic biomarkers in COPDGene and validated these associations using several approaches. Firstly, ANOVA and post-hoc SNK tests that included controls as well as participants with each COPD subtype supported most of the same proteomic biomarkers. Secondly, 64 of these 124 COPD subtype-associated proteins were nominally validated with the same effect direction in SPIROMICS. We compared the directionality of associations among the 124 EPD vs. NEPD COPD protein biomarkers in COPDGene and SPIROMICS, and 120 had the same direction of effect. Thus, we found high concordance between COPDGene and SPIROMICS, supporting the validity of these proteomic biomarkers for COPD subtypes. The identification of several significant biological pathways related to EPD vs. NEPD COPD protein biomarkers, and of a correlation network module associated with EPD vs. NEPD, suggests that these EPD vs. NEPD COPD-associated proteins are involved in shared biological processes related to COPD heterogeneity. Interestingly, most of the EPD vs. NEPD COPD protein biomarkers had higher plasma levels in participants with NEPD; the aetiology of this phenomenon is unknown but could relate to distinct pathobiological processes in EPD and NEPD.

Among five proteins with COPD subtype association FDR < 0.05 in both COPDGene and SPIROMICS, the soluble RAGE (sRAGE) protein biomarker (encoded by the AGER gene) acts as a decoy to block inflammation.49,50 In 2011, Miniati reported lower sRAGE levels in patients with COPD.50 In 2021, Keefe et al. utilised an integrative genomic approach to identify sRAGE as a causal and protective biomarker for lung function.51 Pratte et al. associated sRAGE with airflow obstruction and emphysema.52 Lower circulating sRAGE is related to severe airflow obstruction and increased emphysema in COPD,51,52 consistent with our results. CD93 is a myeloid cell biomarker associated with intercellular adhesion and inflammation53,54 by activating beta-1 integrin through the CD93/Multimerin-2/active beta-1 integrin complex.55 Plosa and Zent confirmed that beta-1 integrin regulates lung inflammation, and beta-1 integrin deletion in type 2 alveolar epithelial cells leads to the development of emphysema in mice.56 Another biomarker, KLK10, has been reported to be associated with FEV1% predicted based on blood methylation analysis in COPD.57 As for other proteomic biomarkers with nominal replication in SPIROMICS, PIGR (Polymeric immunoglobulin receptor) binds to polymeric IgA.58 Gohy et al. recognised that PIGR is downregulated in COPD, and the downregulation is associated with COPD severity.43 An airway innate immunity study presented by Richmond et al. utilised mouse models to investigate the underlying mechanisms for persistent airway inflammation in COPD.59 PIGR deficiency was shown to be associated with innate immune activation, which further drives progressive small airway remodelling and emphysema. Other nominally replicated associations for EPD vs. NEPD COPD included N-terminal pro-BNP; these findings may reflect key differences in comorbidities between participants with EPD and NEPD COPD, including heart disease.9

Gene set and gene ontology enrichment analyses revealed potential biological functions for significant COPD subtype proteomic associations, including epithelial mesenchymal transition (EMT), cell adhesion, and collagen-containing extracellular matrix. Su et al. noted that EMT is an important pathophysiological process involved in airway fibrosis, airway remodelling, and malignant transformation of COPD associated with cigarette smoking.60 Fujioka et al. identified the potential of human adipose-derived mesenchymal stem cells to ameliorate elastase-induced emphysema through mesenchymal-epithelial transition in mouse models.61 For cell adhesion, Mimae et al. reported that a cell adhesion molecule in lung tissue, CADM1 (Cell Adhesion Molecule 1), activated alveolar cell apoptosis in emphysematous lungs,62 and the apoptosis of alveolar cells can further contribute to emphysema development.63 Karakioulaki et al. reviewed the role of extracellular matrix remodelling in COPD and confirmed that collagen remodelling plays an important role in emphysema development,64 which was experimentally verified in mouse models decades ago.65

We identified thirteen COPD subtype protein biomarkers representative of a correlation network module that was associated with COPD subtypes. Highly connected module components for this COPD subtype-associated module include TNFRSF1A, EFNA2, and EFNB2. Fujita et al. recognised the critical role of TNFRSF1A in the pathogenesis of pulmonary emphysema in mouse models.66 Soluble ephrin-B2 was identified as a profibrotic mediator in lung fibrosis.67 EFNA2 has been reported to be associated with weight loss in COPD, and cachexia is a key systemic effect in some advanced patients with COPD.68 Thus, correlation network analysis revealed molecular differences between EPD vs. NEPD COPD from multiple functional perspectives.

Genetic associations for COPD subtypes and COPD subtype-associated proteomic biomarkers were also evaluated. Due to the modest sample size for a genetic association study, only previously identified COPD genetic determinants were included. A SNP in LRMDA was associated with EPD vs. NEPD COPD, and six other SNPs had nominal EPD vs. NEPD COPD associations (including the Z allele of SERPINA1). Thus, some COPD genetic determinants appear to have differential impact in EPD and NEPD COPD. In addition to the previously reported genetic control of sRAGE,20,21,27 we found that sphingolipid associated protein (GM2A) levels were associated with a local COPD GWAS SNP (rs979453) in control and NEPD participants but not in EPD COPD, suggesting that GM2A is a functional COPD susceptibility gene. Key genes in the chromosome 5q region have not been reported, but our analyses indicated that GM2A may be at least one of the functional genes in this region. Kechris and Bowler reported significant associations between sphingolipids and emphysema in plasma,69 which potentially could provide a connection between GM2A and emphysema.

Clinical characteristics including treatment with corticosteroid medications and comorbidities such as cardiovascular disease may also affect COPD proteomic associations. Corticosteroids have strong immune-modulating effects.70 We observed that corticosteroids are more frequently used in patients with emphysema-predominant COPD, which may be due to the relatively lower lung function of participants with emphysema-predominant COPD. Cardiovascular diseases frequently coexist with COPD.71 We found that five previously reported cardiac biomarkers, including NPPB, DSC2, CST3, KRT1, and ADH1B, were differentially expressed in COPD subtypes, with higher levels in participants with NEPD. This is consistent with the increased risk for cardiovascular events in participants with NEPD in relation to elevated coronary artery calcification scores that we recently reported in COPDGene.72

Although we have performed a comprehensive proteomic analysis of two COPD subtypes, our study has several limitations. One potential limitation is that we focused on plasma rather than lung proteomics; plasma proteomic biomarkers may not reflect the molecular heterogeneity within the lungs, and the cells and tissues producing these plasma proteins are uncertain. The use of plasma proteomics can capture systemic differences between COPD subtypes but may also introduce bias in pathway analyses. Plasma proteomics was evaluated using the SomaLogic SomaScan® panel; however, SomaScan is an aptamer-based proteomic platform with potential off-target cross-reactivity and limited capacity for isoform detection. Because the panel includes only a minority of human proteins, additional COPD subtype biomarkers are likely to be discovered with more comprehensive proteomic assessments. For disease subtyping, the EPD/NEPD classification does not capture all aspects of COPD heterogeneity (e.g., emphysema pathological pattern and distribution). The attenuation of some of the identified protein biomarker associations after adjusting for FEV1 (% predicted) suggests that some of these biomarkers may relate to disease severity. Another limitation is related to unmeasured confounding factors in evaluating proteomic disease associations. In our analyses, we included several known SomaScan methodological covariates (clinical centre and cellular components) and widely used COPD clinical epidemiological covariates (age, sex, self-reported race, pack-years of smoking, current smoking status, and BMI) as confounders. However, there may still be unknown methodological effects (e.g., unknown technical factors for SomaScan analysis) and biological effects (clinical/biological covariates for emphysema pathogenesis). Although individuals with recent COPD exacerbations were not included, recent plasma and systemic infections could also influence our proteomic results. Inaccurate reporting of self-reported covariates such as smoking status may further affect our results. In our study, we tried to address these limitations by replicating proteomic associations in an independent cohort, SPIROMICS. Using an independent cohort not only validates our findings on COPD subtype proteomic associations but may also limit the effects of unmeasured confounders.
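To make the covariate-adjustment reasoning above concrete, the following is a minimal sketch in Python (NumPy only) of how a subtype association can attenuate once FEV1 (% predicted) is added to a model. It is not the analysis code used in this study: the data are simulated, and the variable names (`subtype`, `fev1_pp`, `protein`), the effect sizes, the subtype coding, and the use of ordinary least squares are all assumptions chosen for illustration.

```python
import numpy as np


def ols_coefficients(design, outcome):
    """Least-squares coefficients for outcome ~ design (design holds an intercept column)."""
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 5000

# Simulated, illustrative data only (NOT COPDGene or SPIROMICS values).
subtype = rng.integers(0, 2, size=n).astype(float)        # hypothetical coding: 1 = EPD, 0 = NEPD
age = rng.normal(65, 8, size=n)
fev1_pp = 60 - 10 * subtype + rng.normal(0, 12, size=n)   # assumed lower lung function in EPD

# Assumption: the protein level depends on FEV1 and age but not directly on subtype,
# so any subtype association in this simulation is driven by disease severity.
protein = -0.05 * fev1_pp + 0.01 * age + rng.normal(0, 1, size=n)

intercept = np.ones(n)
base_design = np.column_stack([intercept, subtype, age])
fev1_design = np.column_stack([intercept, subtype, age, fev1_pp])

# Column 1 of each design matrix is the subtype indicator.
beta_base = ols_coefficients(base_design, protein)[1]
beta_adjusted = ols_coefficients(fev1_design, protein)[1]

print(f"subtype effect without FEV1 adjustment: {beta_base:.3f}")
print(f"subtype effect with FEV1 adjustment:    {beta_adjusted:.3f}")
```

In this simulation the subtype coefficient shrinks toward zero once `fev1_pp` enters the model, which is the pattern that would suggest a biomarker tracks disease severity rather than subtype. A coefficient that stays stable after adjustment would point the other way.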

In plasma proteomics analyses, multiple biomarkers associated with emphysema-predominant vs. non-emphysema predominant COPD subtypes were identified. These COPD subtype proteomic biomarkers were enriched for biological functions including epithelial mesenchymal transition, cell adhesion, and collagen-containing extracellular matrix. Correlation-based network analyses identified a network module significantly associated with EPD vs. NEPD COPD. We also observed significant genetic effects on COPD subtypes and their associated proteomic biomarkers. Further study will be required to determine whether any of these identified proteomic biomarkers trigger inciting events in COPD pathogenesis that lead to disease heterogeneity.

Contributors

Conceptualisation: Edwin K. Silverman, Yu-Hang Zhang, Peter J. Castaldi, Russell P. Bowler, Katherine A. Pratte, Gregory L. Kinney, Kendra A. Young, Sharon M. Lutz, Craig P. Hersh, Michael H. Cho, Jarrett D. Morrow, Heena Rijhwani.

Data curation: Edwin K. Silverman, Yu-Hang Zhang, Peter J. Castaldi, Russell P. Bowler, Katherine A. Pratte, Michael H. Cho, Jarrett D. Morrow.

Formal analysis: Edwin K. Silverman, Yu-Hang Zhang.

Funding acquisition: Edwin K. Silverman.

Investigation: Edwin K. Silverman, Yu-Hang Zhang.

Methodology: Edwin K. Silverman, Yu-Hang Zhang, Peter J. Castaldi, Katherine A. Pratte, Sharon M. Lutz, Michael H. Cho, Jarrett D. Morrow.

Project administration: Edwin K. Silverman.

Resources: Edwin K. Silverman, Russell P. Bowler, Katherine A. Pratte.

Software: Edwin K. Silverman, Yu-Hang Zhang, Katherine A. Pratte.

Supervision: Edwin K. Silverman.

Validation: Edwin K. Silverman, Russell P. Bowler, Yu-Hang Zhang.

Visualisation: Edwin K. Silverman, Russell P. Bowler, Yu-Hang Zhang.

Writing: Edwin K. Silverman, Yu-Hang Zhang, Peter J. Castaldi, Russell P. Bowler, Katherine A. Pratte, Gregory L. Kinney, Kendra A. Young, Heena Rijhwani, Sharon M. Lutz, Craig P. Hersh, Michael H. Cho, Jarrett D. Morrow.

Authors have directly accessed and verified data reported in the manuscript: Edwin K. Silverman, Yu-Hang Zhang, Peter J. Castaldi, Russell P. Bowler, Katherine A. Pratte, Gregory L. Kinney, Kendra A. Young, Sharon M. Lutz, Craig P. Hersh, Michael H. Cho, Jarrett D. Morrow.

All authors contributed substantially to the study design, data analysis, or interpretation. All authors reviewed the manuscript for intellectual content, contributed to revisions, and approved the final version for submission.

Data sharing statement

The COPDGene and SPIROMICS deidentified participant data sets and data dictionaries are available for monitored public access through dbGaP (Database of Genotypes and Phenotypes). More information about the study and how to access SPIROMICS data is available at www.spiromics.org.

Declaration of interests

The authors report the following conflicts of interest:

In the past three years, EKS has received grant support from Bayer and Northpond Laboratories. CPH has received grant support from Alpha-1 Foundation, Bayer, Boehringer-Ingelheim, and Vertex, and consulting fees from Chiesi, Ono, Sanofi, and Takeda. JDM has received support from an Alpha-1 Foundation Research Grant. MHC has received grant support from Bayer. PJC has received grant support from Sanofi and Bayer, and consulting fees from Verona.

Acknowledgements

EKS was supported by R01HL133135, R01HL147148, P01HL114501, and R01HL152728.

SML was supported by R01MH129337. KAP was supported by U01HL089897, U01HL089856, R01HL137995 and R01HL129937. JDM was supported by an Alpha-1 Foundation Research Grant, NHLBI TOPMed Fellowship, and K25HL136846. PJC was supported by U01HL089856 and R01HL133135. SomaScan data generation in COPDGene was funded through R01HL137995 (RPB). The COPDGene study (NCT00608764) is supported by grants from the NHLBI (U01HL089897 to National Jewish Health and U01HL089856 to Brigham and Women's Hospital), by NIH contract 75N92023D00011 to National Jewish Health, and by the COPD Foundation through contributions made to an Industry Advisory Committee that has included AstraZeneca, Bayer Pharmaceuticals, Boehringer-Ingelheim, Genentech, GlaxoSmithKline, Novartis, Pfizer, and Sunovion.

The authors thank the SPIROMICS participants and participating physicians, investigators, study coordinators, and staff for making this research possible. The authors would like to acknowledge the University of North Carolina at Chapel Hill BioSpecimen Processing Facility (http://bsp.web.unc.edu/) and Alexis Lab (https://www.med.unc.edu/cemalb/facultyresearch/alexislab/) for sample processing, storage, and sample disbursements.

SPIROMICS was supported by contracts from the NIH/NHLBI (HHSN268200900013C, HHSN268200900014C, HHSN268200900015C, HHSN268200900016C, HHSN268200900017C, HHSN268200900018C, HHSN268200900019C, HHSN268200900020C), grants from the NIH/NHLBI (U01 HL137880, U24 HL141762, R01 HL182622, and R01 HL144718), and supplemented by contributions made through the Foundation for the NIH and the COPD Foundation from Amgen; AstraZeneca/MedImmune; Bayer; Bellerophon Therapeutics; Boehringer-Ingelheim Pharmaceuticals, Inc.; Chiesi Farmaceutici S.p.A.; Forest Research Institute, Inc.; Genentech; GlaxoSmithKline; Grifols Therapeutics, Inc.; Ikaria, Inc.; MGC Diagnostics; Novartis Pharmaceuticals Corporation; Nycomed GmbH; Polarean; ProterixBio; Regeneron Pharmaceuticals, Inc.; Sanofi; Sunovion; Takeda Pharmaceutical Company; and Theravance Biopharma and Mylan/Viatris. Proteomic sample profiling was funded by Novartis.

Footnotes

Appendix A

Supplementary data related to this article can be found at https://doi.org/10.1016/j.ebiom.2025.105800.

References

1. Singh D., Agusti A., Anzueto A., et al. Global strategy for the diagnosis, management, and prevention of chronic obstructive lung disease: the GOLD science committee report 2019. Eur Respir J. 2019;53(5) doi: 10.1183/13993003.00164-2019.
2. Han M.K., Agusti A., Calverley P.M., et al. Chronic obstructive pulmonary disease phenotypes: the future of COPD. Am J Respir Crit Care Med. 2010;182(5):598–604. doi: 10.1164/rccm.200912-1843CC.
3. Vogelmeier C.F., Criner G.J., Martinez F.J., et al. Global strategy for the diagnosis, management, and prevention of chronic obstructive lung disease 2017 report. GOLD executive summary. Am J Respir Crit Care Med. 2017;195(5):557–582. doi: 10.1164/rccm.201701-0218PP.
4. Castaldi P.J., Dy J., Ross J., et al. Cluster analysis in the COPDGene study identifies subtypes of smokers with distinct patterns of airway disease and emphysema. Thorax. 2014;69(5):415–422. doi: 10.1136/thoraxjnl-2013-203601.
5. Kinney G.L., Santorico S.A., Young K.A., et al. Identification of chronic obstructive pulmonary disease axes that predict all-cause mortality: the COPDGene study. Am J Epidemiol. 2018;187(10):2109–2116. doi: 10.1093/aje/kwy087.
6. Ross J.C., Castaldi P.J., Cho M.H., et al. Longitudinal modeling of lung function trajectories in smokers with and without chronic obstructive pulmonary disease. Am J Respir Crit Care Med. 2018;198(8):1033–1042. doi: 10.1164/rccm.201707-1405OC.
7. Dagher R., Fogel P., Wang J., et al. Proteomic profiling of serum identifies a molecular signature that correlates with clinical outcomes in COPD. PLoS One. 2022;17(12) doi: 10.1371/journal.pone.0277357.
8. Castaldi P.J., Benet M., Petersen H., et al. Do COPD subtypes really exist? COPD heterogeneity and clustering in 10 independent cohorts. Thorax. 2017;72(11):998–1006. doi: 10.1136/thoraxjnl-2016-209846.
9. Hersh C.P., Make B.J., Lynch D.A., et al. Non-emphysematous chronic obstructive pulmonary disease is associated with diabetes mellitus. BMC Pulm Med. 2014;14(1):164. doi: 10.1186/1471-2466-14-164.
10. Diaz A.A., Bartholmai B., San José Estépar R., et al. Relationship of emphysema and airway disease assessed by CT to exercise capacity in COPD. Respir Med. 2010;104(8):1145–1151. doi: 10.1016/j.rmed.2010.02.023.
11. Madan M., Gupta P., Mittal R., Chhabra S. Clinical differences between emphysematous and non-emphysematous COPD. Eur Respir J. 2016;48(suppl 60).
12. Castaldi P.J., Xu Z., Young K.A., et al. Heterogeneity and progression of chronic obstructive pulmonary disease: emphysema-predominant and non–emphysema-predominant disease. Am J Epidemiol. 2023;192(10):1647–1658. doi: 10.1093/aje/kwad114.
13. Wilkins M.R., Sanchez J.C., Gooley A.A., et al. Progress with proteome projects: why all proteins expressed by a genome should be identified and how to do it. Biotechnol Genet Eng Rev. 1996;13(1):19–50. doi: 10.1080/02648725.1996.10647923.
14. Al-Amrani S., Al-Jabri Z., Al-Zaabi A., Alshekaili J., Al-Khabori M. Proteomics: concepts and applications in human medicine. World J Biol Chem. 2021;12(5):57–69. doi: 10.4331/wjbc.v12.i5.57.
15. Regan E.A., Hersh C.P., Castaldi P.J., et al. Omics and the search for blood biomarkers in chronic obstructive pulmonary disease. Insights from COPDGene. Am J Respir Cell Mol Biol. 2019;61(2):143–149. doi: 10.1165/rcmb.2018-0245PS.
16. Walter R.E., Wilk J.B., Larson M.G., et al. Systemic inflammation and COPD: the Framingham Heart Study. Chest. 2008;133(1):19–25. doi: 10.1378/chest.07-0058.
17. Chen H., Wang D., Bai C., Wang X. Proteomics-based biomarkers in chronic obstructive pulmonary disease. J Proteome Res. 2010;9(6):2798–2808. doi: 10.1021/pr100063r.
18. Bradford E., Jacobson S., Varasteh J., et al. The value of blood cytokines and chemokines in assessing COPD. Respir Res. 2017;18(1):180. doi: 10.1186/s12931-017-0662-2.
19. Nicholas B.L., Skipp P., Barton S., et al. Identification of lipocalin and apolipoprotein A1 as biomarkers of chronic obstructive pulmonary disease. Am J Respir Crit Care Med. 2010;181(10):1049–1060. doi: 10.1164/rccm.200906-0857OC.
20. Yonchuk J.G., Silverman E.K., Bowler R.P., et al. Circulating soluble receptor for advanced glycation end products (sRAGE) as a biomarker of emphysema and the RAGE axis in the lung. Am J Respir Crit Care Med. 2015;192(7):785–792. doi: 10.1164/rccm.201501-0137PP.
21. Cheng D.T., Kim D.K., Cockayne D.A., et al. Systemic soluble receptor for advanced glycation endproducts is a biomarker of emphysema and associated with AGER genetic variants in patients with chronic obstructive pulmonary disease. Am J Respir Crit Care Med. 2013;188(8):948–957. doi: 10.1164/rccm.201302-0247OC.
22. Candia J., Daya G.N., Tanaka T., Ferrucci L., Walker K.A. Assessment of variability in the plasma 7k SomaScan proteomics assay. Sci Rep. 2022;12(1) doi: 10.1038/s41598-022-22116-0.
23. Regan E.A., Hokanson J.E., Murphy J.R., et al. Genetic epidemiology of COPD (COPDGene) study design. COPD. 2010;7(1):32–43. doi: 10.3109/15412550903499522.
24. van Zelst C.M., Goossens L.M.A., Witte J.A., et al. Stratification of COPD patients towards personalized medicine: reproduction and formation of clusters. Respir Res. 2022;23(1):336. doi: 10.1186/s12931-022-02256-7.
25. Serban K.A., Pratte K.A., Strange C., et al. Unique and shared systemic biomarkers for emphysema in Alpha-1 Antitrypsin deficiency and chronic obstructive pulmonary disease. EBioMedicine. 2022;84 doi: 10.1016/j.ebiom.2022.104262.
26. McCarthy S., Das S., Kretzschmar W., et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet. 2016;48(10):1279–1283. doi: 10.1038/ng.3643.
27. Sakornsakolpat P., Prokopenko D., Lamontagne M., et al. Genetic landscape of chronic obstructive pulmonary disease identifies heterogeneous cell-type and phenotype associations. Nat Genet. 2019;51(3):494–505. doi: 10.1038/s41588-018-0342-2.
28. Silverman E.K., Sandhaus R.A. Clinical practice. Alpha1-antitrypsin deficiency. N Engl J Med. 2009;360(26):2749–2757. doi: 10.1056/NEJMcp0900449.
29. Josephs K.S., Roberts A.M., Theotokis P., et al. Beyond gene-disease validity: capturing structured data on inheritance, allelic requirement, disease-relevant variant classes, and disease mechanism for inherited cardiac conditions. Genome Med. 2023;15(1):86. doi: 10.1186/s13073-023-01246-8.
30. Qian Q., Xue R., Xu C., Wang F., Zeng J., Xiao J. CVD Atlas: a multi-omics database of cardiovascular disease. Nucleic Acids Res. 2025;53(D1):D1348–D1355. doi: 10.1093/nar/gkae848.
31. Luo W., Friedman M.S., Shedden K., Hankenson K.D., Woolf P.J. GAGE: generally applicable gene set enrichment for pathway analysis. BMC Bioinformatics. 2009;10:161. doi: 10.1186/1471-2105-10-161.
32. Liberzon A., Birger C., Thorvaldsdóttir H., Ghandi M., Mesirov J.P., Tamayo P. The Molecular Signatures Database hallmark gene set collection. Cell Syst. 2015;1(6):417–425. doi: 10.1016/j.cels.2015.12.004.
33. Kanehisa M., Sato Y., Kawashima M., Furumichi M., Tanabe M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 2016;44(D1):D457–D462. doi: 10.1093/nar/gkv1070.
34. Fabregat A., Jupe S., Matthews L., et al. The reactome pathway knowledgebase. Nucleic Acids Res. 2018;46(D1):D649–D655. doi: 10.1093/nar/gkx1132.
35. Subramanian A., Tamayo P., Mootha V.K., et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–15550. doi: 10.1073/pnas.0506580102.
36. Liberzon A., Subramanian A., Pinchback R., Thorvaldsdóttir H., Tamayo P., Mesirov J.P. Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011;27(12):1739–1740. doi: 10.1093/bioinformatics/btr260.
37. Rahnenfuhrer A., Alexa A. topGO: enrichment analysis for gene ontology. R Package Version. 2019;2(1):357.
38. Langfelder P., Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9(1):559. doi: 10.1186/1471-2105-9-559.
39. Willer C.J., Li Y., Abecasis G.R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26(17):2190–2191. doi: 10.1093/bioinformatics/btq340.
40. Shabalin A.A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics. 2012;28(10):1353–1358. doi: 10.1093/bioinformatics/bts163.
41. De Muth J. Basic Statistics and Pharmaceutical Statistical Applications. 2nd ed. Pharmacy Education Series. Taylor & Francis; 2006.
42. Ohlmeier S., Mazur W., Linja-Aho A., et al. Sputum proteomics identifies elevated PIGR levels in smokers and mild-to-moderate COPD. J Proteome Res. 2012;11(2):599–608. doi: 10.1021/pr2006395.
43. Gohy S.T., Detry B.R., Lecocq M., et al. Polymeric immunoglobulin receptor down-regulation in chronic obstructive pulmonary disease. Persistence in the cultured epithelium and role of transforming growth factor-β. Am J Respir Crit Care Med. 2014;190(5):509–521. doi: 10.1164/rccm.201311-1971OC.
44. Goetze J.P., Bruneau B.G., Ramos H.R., Ogawa T., de Bold M.K., de Bold A.J. Cardiac natriuretic peptides. Nat Rev Cardiol. 2020;17(11):698–717. doi: 10.1038/s41569-020-0381-0.
45. Sen-Chowdhry S., Prasad S.K., Syrris P., et al. Cardiovascular magnetic resonance in arrhythmogenic right ventricular cardiomyopathy revisited: comparison with task force criteria and genotype. J Am Coll Cardiol. 2006;48(10):2132–2140. doi: 10.1016/j.jacc.2006.07.045.
46. Koenig W., Twardella D., Brenner H., Rothenbacher D. Plasma concentrations of cystatin C in patients with coronary heart disease and risk for secondary cardiovascular events: more than simply a marker of glomerular filtration rate. Clin Chem. 2005;51(2):321–327. doi: 10.1373/clinchem.2004.041889.
47. Fang H.C., Wu B.Q., Hao Y.L., et al. KRT1 gene silencing ameliorates myocardial ischemia-reperfusion injury via the activation of the Notch signaling pathway in mouse models. J Cell Physiol. 2019;234(4):3634–3646. doi: 10.1002/jcp.27133.
48. Katz D.H., Tahir U.A., Bick A.G., et al. Whole genome sequence analysis of the plasma proteome in Black adults provides novel insights into cardiovascular disease. Circulation. 2022;145(5):357–370. doi: 10.1161/CIRCULATIONAHA.121.055117.
49. Smith D.J., Yerkovich S.T., Towers M.A., Carroll M.L., Thomas R., Upham J.W. Reduced soluble receptor for advanced glycation end-products in COPD. Eur Respir J. 2011;37(3):516–522. doi: 10.1183/09031936.00029310.
50. Miniati M., Monti S., Basta G., Cocci F., Fornai E., Bottai M. Soluble receptor for advanced glycation end products in COPD: relationship with emphysema and chronic cor pulmonale: a case-control study. Respir Res. 2011;12(1):37. doi: 10.1186/1465-9921-12-37.
51. Keefe J., Yao C., Hwang S.J., et al. An integrative genomic strategy identifies sRAGE as a causal and protective biomarker of lung function. Chest. 2022;161(1):76–84. doi: 10.1016/j.chest.2021.06.053.
52. Pratte K.A., Curtis J.L., Kechris K., et al. Soluble receptor for advanced glycation end products (sRAGE) as a biomarker of COPD. Respir Res. 2021;22(1):127. doi: 10.1186/s12931-021-01686-z.
53. Bohlson S.S., Silva R., Fonseca M.I., Tenner A.J. CD93 is rapidly shed from the surface of human myeloid cells and the soluble form is detected in human plasma. J Immunol. 2005;175(2):1239–1247. doi: 10.4049/jimmunol.175.2.1239.
54. Greenlee M.C., Sullivan S.A., Bohlson S.S. CD93 and related family members: their role in innate immunity. Curr Drug Targets. 2008;9(2):130–138. doi: 10.2174/138945008783502421.
55. Lugano R., Vemuri K., Yu D., et al. CD93 promotes β1 integrin activation and fibronectin fibrillogenesis during tumor angiogenesis. J Clin Invest. 2018;128(8):3280–3297. doi: 10.1172/JCI97459.
56. Plosa E.J., Benjamin J.T., Sucre J.M., et al. β1 Integrin regulates adult lung alveolar epithelial cell inflammation. JCI Insight. 2020;5(2) doi: 10.1172/jci.insight.129259.
57. Lee M.K., Hong Y., Kim S.Y., Kim W.J., London S.J. Epigenome-wide association study of chronic obstructive pulmonary disease and lung function in Koreans. Epigenomics. 2017;9(7):971–984. doi: 10.2217/epi-2017-0002.
58. Johansen F.E., Kaetzel C. Regulation of the polymeric immunoglobulin receptor and IgA transport: new advances in environmental factors that stimulate pIgR expression and its role in mucosal immunity. Mucosal Immunol. 2011;4(6):598–602. doi: 10.1038/mi.2011.37.
59. Richmond B.W., Brucker R.M., Han W., et al. Airway bacteria drive a progressive COPD-like phenotype in mice with polymeric immunoglobulin receptor deficiency. Nat Commun. 2016;7 doi: 10.1038/ncomms11240.
60. Su X., Wu W., Zhu Z., Lin X., Zeng Y. The effects of epithelial–mesenchymal transitions in COPD induced by cigarette smoke: an update. Respir Res. 2022;23(1):225. doi: 10.1186/s12931-022-02153-z.
61. Fujioka N., Kitabatake M., Ouji-Sageshima N., et al. Human adipose-derived mesenchymal stem cells ameliorate elastase-induced emphysema in mice by mesenchymal-epithelial transition. Int J Chron Obstruct Pulmon Dis. 2021;16:2783–2793. doi: 10.2147/COPD.S324952.
62. Mimae T., Hagiyama M., Inoue T., et al. Increased ectodomain shedding of lung epithelial cell adhesion molecule 1 as a cause of increased alveolar cell apoptosis in emphysema. Thorax. 2014;69(3):223–231. doi: 10.1136/thoraxjnl-2013-203867.
63. Demedts I.K., Demoor T., Bracke K.R., Joos G.F., Brusselle G.G. Role of apoptosis in the pathogenesis of COPD and pulmonary emphysema. Respir Res. 2006;7(1):53. doi: 10.1186/1465-9921-7-53.
64. Karakioulaki M., Papakonstantinou E., Stolz D. Extracellular matrix remodelling in COPD. Eur Respir Rev. 2020;29(158) doi: 10.1183/16000617.0124-2019.
65. D'Armiento J., Dalal S.S., Okada Y., Berg R.A., Chada K. Collagenase expression in the lungs of transgenic mice causes pulmonary emphysema. Cell. 1992;71(6):955–961. doi: 10.1016/0092-8674(92)90391-o.
66. Fujita M., Ouchi H., Ikegame S., et al. Critical role of tumor necrosis factor receptor 1 in the pathogenesis of pulmonary emphysema in mice. Int J Chron Obstruct Pulmon Dis. 2016;11:1705–1712. doi: 10.2147/COPD.S108919.
67. Lagares D., Ghassemi-Kakroodi P., Tremblay C., et al. ADAM10-mediated ephrin-B2 shedding promotes myofibroblast activation and organ fibrosis. Nat Med. 2017;23(12):1405–1415. doi: 10.1038/nm.4419.
68. Lakshman Kumar P., Wilson A.C., Rocco A., et al. Genetic variation in genes regulating skeletal muscle regeneration and tissue remodelling associated with weight loss in chronic obstructive pulmonary disease. J Cachexia Sarcopenia Muscle. 2021;12(6):1803–1817. doi: 10.1002/jcsm.12782.
69. Bowler R.P., Jacobson S., Cruickshank C., et al. Plasma sphingolipids associated with chronic obstructive pulmonary disease phenotypes. Am J Respir Crit Care Med. 2015;191(3):275–284. doi: 10.1164/rccm.201410-1771OC.
70. Ernst P., Saad N., Suissa S. Inhaled corticosteroids in COPD: the clinical evidence. Eur Respir J. 2015;45(2):525–537. doi: 10.1183/09031936.00128914.
71. Rabe K.F., Hurst J.R., Suissa S. Cardiovascular disease and COPD: dangerous liaisons? Eur Respir Rev. 2018;27(149) doi: 10.1183/16000617.0057-2018.
72. Yang H.M., Ryu M.H., Carey V.J., et al. COPD subtypes are differentially associated with cardiovascular events and COPD exacerbations. Chest. 2024;166(6):1360–1370. doi: 10.1016/j.chest.2024.07.148.

Articles from eBioMedicine are provided here courtesy of Elsevier
