Abstract
Rationale
Emphysema is a chronic obstructive pulmonary disease phenotype with important prognostic implications. Identifying blood-based biomarkers of emphysema will facilitate early diagnosis and development of targeted therapies.
Objectives
To discover blood omics biomarkers for chest computed tomography–quantified emphysema and develop predictive biomarker panels.
Methods
Emphysema blood biomarker discovery was performed using differential gene expression, alternative splicing, and protein association analyses in a training sample of 2,370 COPDGene participants with available blood RNA sequencing, plasma proteomics, and clinical data. Internal validation was conducted in a COPDGene testing sample (n = 1,016), and external validation was done in the ECLIPSE study (n = 526). Because low body mass index (BMI) and emphysema often co-occur, we performed a mediation analysis to quantify the effect of BMI on gene and protein associations with emphysema. Elastic net models with bootstrapping were also developed in the training sample sequentially using clinical, blood cell proportions, RNA-sequencing, and proteomic biomarkers to predict quantitative emphysema. Model accuracy was assessed by the area under the receiver operating characteristic curves for subjects stratified into tertiles of emphysema severity.
Measurements and Main Results
Totals of 3,829 genes, 942 isoforms, 260 exons, and 714 proteins were significantly associated with emphysema (false discovery rate, 5%) and yielded 11 biological pathways. Seventy-four percent of these genes and 62% of these proteins showed mediation by BMI. Our prediction models demonstrated reasonable predictive performance in both COPDGene and ECLIPSE. The highest-performing model used clinical, blood cell, and protein data (area under the receiver operating characteristic curve in COPDGene testing, 0.90; 95% confidence interval, 0.85–0.90).
Conclusions
Blood transcriptome and proteome-wide analyses revealed key biological pathways of emphysema and enhanced the prediction of emphysema.
Keywords: emphysema, biomarkers, transcriptomics, proteomics, prediction
At a Glance Commentary
Scientific Knowledge on the Subject
Differential gene expression and protein analyses have uncovered some of the molecular underpinnings of emphysema. However, no studies have assessed alternative splicing mechanisms or analyzed proteomic data from recently developed high-throughput panels. In addition, although emphysema has been associated with low body mass index (BMI), it is still unclear how BMI affects the transcriptome and proteome of the disease. Finally, the effectiveness of multiomics biomarkers in determining the severity of emphysema has not yet been investigated.
What This Study Adds to the Field
We performed whole-blood genome-wide RNA-sequencing and plasma SomaScan proteomic analyses in the large and well-phenotyped COPDGene study. In addition to confirming earlier findings, our differential gene expression, alternative splicing, and protein analyses identified novel biomarkers and pathways of chest computed tomography–quantified emphysema. Our mediation analysis detected varying degrees of transcriptomic and proteomic mediation due to BMI. Our supervised machine learning modeling suggested the potential utility of incorporating blood-based multiomics data for improving the prediction of emphysema.
Introduction
Chronic obstructive pulmonary disease (COPD) is a leading cause of morbidity and mortality (1). Emphysema, the anatomic destruction of lung parenchyma frequently observed in subjects with COPD, has been independently associated with an increased risk for cardiovascular disease, lung cancer, and mortality (2–4). Timely diagnosis calls for a blood-based predictive model because it may identify emphysema in subjects in whom computed tomography (CT) scans are not clinically indicated. Emphysema blood biomarkers would also overcome the issues of radiation exposure and false-positive findings associated with CT scans (5). In addition, early disease biomarkers and a stronger understanding of the molecular basis of emphysema are needed to develop novel personalized therapies to improve the prognosis of affected individuals (2, 6, 7).
Previous transcriptomic studies have identified emphysema-associated genes (such as COL6A1, CD19, PTX3, and RAGE) and biological processes (such as innate and adaptive immunity, inflammation, and tissue remodeling) primarily from gene expression analyses using lung tissue samples (8–12). However, fewer studies have evaluated the associations of emphysema with blood transcriptomics, alternative splicing, or proteomics. Alternative splicing, the regulatory process in which multiexon human genes are expressed in multiple transcript isoforms, has been implicated in the pathophysiology of several lung diseases, such as asthma, pulmonary fibrosis, pulmonary arterial hypertension, and COPD (13–19). Protein concentrations have also been studied for potential emphysema biomarker identification, and sRAGE, ICAM1, CCL20, and adiponectin concentrations in blood and eotaxin concentrations in BAL fluid were found to be associated with emphysema (20–23), though the protein panels used for these studies included fewer proteins than the more recently developed panels. Finally, previous research that used blood-based emphysema predictive models had small sample sizes and only tested one omics modality at a time (20, 24–27).
We hypothesized that 1) transcriptomic and proteomic characterization of smokers would elucidate emphysema pathobiology and yield novel disease biomarkers, 2) many emphysema associations with transcripts and proteins are influenced by body mass index (BMI), and 3) multiomics modeling would provide improved prediction of emphysema relative to readily available clinical variables. To test these hypotheses, we analyzed whole-blood genome-wide RNA-sequencing (RNA-seq) and plasma SomaScan proteomic data from the large and well-phenotyped COPDGene (Genetic Epidemiology of COPD) study. Given the high clinical correlation between emphysema and BMI (28), we performed mediation analyses to understand the influence of BMI on emphysema-associated genes and proteins. We also developed machine learning predictive models for emphysema using transcriptomic and proteomic biomarkers. Some of these results were previously reported as an abstract (29) and a preprint (30).
Methods
Study Descriptions
Participants were recruited from the COPDGene study (NCT00608764, www.copdgene.org), a longitudinal study investigating the genetic basis of COPD. The COPDGene population consists of 10,371 non-Hispanic White and African American subjects aged 44–90 years with an average of 44 pack-years of lifetime cigarette smoking history (31). Subjects had varying degrees of COPD severity as measured by the Global Initiative for Chronic Obstructive Lung Disease grading system. COPDGene obtained 5-year follow-up data and is currently obtaining 10-year follow-up data from available subjects. Questionnaires, chest CT scans, and spirometry were gathered at 21 clinical facilities in the United States. RNA-seq and plasma proteomic measurements were obtained from a subset of subjects at their 5-year follow-up visit (visit 2). Each center obtained institutional review board approval and written informed consent. In our analyses, we used COPDGene visit 2 data, which included 3,386 subjects with available clinical, RNA-seq, and SomaScan proteomic data. ECLIPSE (Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints), used for external validation, is a study designed to investigate the progression of COPD and predictive biomarkers associated with COPD (32).
Emphysema Quantification
Emphysema was quantified as the 15th percentile of the attenuation histogram + 1,000 Hounsfield units (Perc15 density) using Thirona software (www.thirona.eu) in COPDGene and Slicer software (www.slicer.org) in ECLIPSE. In COPDGene, Perc15 density values were corrected for the inspiratory depth variations using Multi-Ethnic Study of Atherosclerosis normative equations (predicted lung volume using baseline age, time-varying height, and BMI; adjusted Perc15 density) (33–35). This correction was made because it had been demonstrated to provide a more robust measure of longitudinal changes in emphysema (34). Lower adjusted Perc15 values correspond to more CT-quantified emphysema. Associations are reported with adjusted Perc15 density throughout the paper (i.e., upregulation and downregulation), unless otherwise specified.
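To make the Perc15 definition above concrete, the following minimal Python sketch computes the 15th percentile of a set of lung voxel attenuation values and adds 1,000 Hounsfield units. The function name, the linear-interpolation percentile rule, and the toy voxel values are assumptions made for illustration; the sketch does not reproduce the Thirona or Slicer pipelines or the lung-volume adjustment.

```python
def perc15_density(voxel_hu, percentile=15.0):
    """Return the given percentile of lung voxel attenuation (HU) plus 1,000.

    Linear interpolation between order statistics is an assumed convention;
    dedicated CT software may use a different percentile rule.
    """
    if not voxel_hu:
        raise ValueError("voxel_hu must contain at least one value")
    values = sorted(voxel_hu)
    # Fractional index of the requested percentile in the sorted list.
    position = (len(values) - 1) * percentile / 100.0
    lower = int(position)
    upper = min(lower + 1, len(values) - 1)
    fraction = position - lower
    hu_at_percentile = values[lower] + fraction * (values[upper] - values[lower])
    return hu_at_percentile + 1000.0


# Hypothetical toy lung: 101 voxels evenly spaced from -1000 HU to -800 HU.
toy_voxels = [-1000.0 + 2.0 * i for i in range(101)]
toy_perc15 = perc15_density(toy_voxels)  # 15th percentile is -970 HU, so 30.0
```

Shifting every voxel toward lower attenuation lowers the returned value, which matches the convention that lower Perc15 density corresponds to more emphysema.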
Internal and External Validation
We randomly partitioned our studied COPDGene cohort into training and testing samples comprising 70% and 30% of the subjects, respectively. All association and mediation analyses, as well as prediction model training, were conducted using the training sample. The internal validation of the identified biomarkers and constructed predictive model was performed in the testing sample. For external validation of our findings, we used blood microarray data and CT-quantified emphysema from 526 subjects with COPD in the ECLIPSE study. We also compared our results with the gene expression profiles of alveolar macrophages and bronchial epithelial cells from two previously published studies (36, 37).
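A random 70%/30% partition of this kind can be sketched as below. This is an illustration only: the function name, the seed, and the placeholder subject identifiers are assumptions, and the actual randomization procedure used in the study is not described in the text.

```python
import random


def split_train_test(subject_ids, train_fraction=0.7, seed=0):
    """Randomly partition subject IDs into training and testing lists."""
    shuffled = list(subject_ids)
    # A seeded generator keeps the (hypothetical) split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder identifiers for the 3,386 subjects with complete data.
subject_ids = [f"subject_{i:04d}" for i in range(3386)]
train_ids, test_ids = split_train_test(subject_ids)
```

With 3,386 subjects, a 70% training fraction rounds to 2,370 training and 1,016 testing subjects, the sample sizes reported for this study.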
RNA Isolation, Library Preparation, Filtering, and Normalization
Illumina sequencers were used to obtain gene, isoform, and exon counts from total blood RNA isolated from visit 2 participants. Genomic features with very low expression (average counts per million [CPM] <0.2 or number of subjects with CPM <0.5 less than 50) or extremely highly expressed genes (number of subjects with CPM >50,000 less than 50) were filtered out before applying trimmed mean of M values normalization from edgeR (version 3.24.3), which accounts for differences in sequencing depth (38). Counts were transformed to log2 CPM values and quantile normalized to further remove systematic noise from the data.
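The CPM-based filtering and log transformation described above can be illustrated with a simplified sketch. It covers only the average-CPM criterion and a log2 transform; the per-subject count criteria, trimmed mean of M values normalization, and quantile normalization are deliberately omitted. The function names, the 0.5 offset inside the logarithm, and the toy counts are assumptions, not the edgeR implementation.

```python
import math


def counts_per_million(counts_by_subject):
    """Convert raw counts (one list of feature counts per subject) to CPM."""
    cpm = []
    for counts in counts_by_subject:
        library_size = sum(counts)
        cpm.append([count / library_size * 1e6 for count in counts])
    return cpm


def keep_mask(cpm_by_subject, min_mean_cpm=0.2):
    """Return True for each feature whose average CPM reaches the threshold."""
    n_subjects = len(cpm_by_subject)
    n_features = len(cpm_by_subject[0])
    return [
        sum(row[j] for row in cpm_by_subject) / n_subjects >= min_mean_cpm
        for j in range(n_features)
    ]


def log2_cpm(cpm_value, offset=0.5):
    """log2 CPM with a small assumed offset so that zero counts stay finite."""
    return math.log2(cpm_value + offset)


# Hypothetical counts: 2 subjects x 3 features, 10 million reads per subject.
toy_counts = [
    [6_000_000, 3_999_999, 1],
    [5_000_000, 5_000_000, 0],
]
toy_cpm = counts_per_million(toy_counts)
toy_keep = keep_mask(toy_cpm)  # third feature averages 0.05 CPM and is dropped
```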
Protein Measurements and Filtering
At COPDGene visit 2, plasma samples were assayed for 4,979 proteins using the SomaScan Human Plasma 5.0K assay, a multiplex aptamer-based assay (SomaLogic) (39). The SomaScan data were standardized per the SomaLogic protocol to control for interassay variation between analytes and batch differences between plates (40). Samples with low volume, failed hybridization control, or failed dilution scale were removed. Proteomic data for 5,670 participants passed quality control. The protein counts were transformed to log2 relative fluorescence units values.
RNA-Seq Differential Expression, Use, and Protein Association Analyses
We used the limma-voom linear modeling approach (as implemented in limma version 3.38.3) to test for the associations between emphysema and whole-blood RNA transcripts (41, 42). The diffSplice function from limma was used to test for differential use of isoforms and exons. Whereas differential expression refers to the change in the absolute expression levels of a feature, differential use captures alternative splicing and refers to the change in the relative expression levels of the isoforms/exons within a given gene. We used multivariable linear modeling to assess the association of the SomaScan proteins with emphysema. In the emphysema “primary” model, we adjusted for age, race, sex, pack-years of smoking, current smoking status, forced expiratory volume in one second (FEV1), complete blood count (CBC) cell proportions, CT scanner model, and library preparation batch for RNA-seq or clinical center for proteins. The validation rate in the testing sample was determined on the basis of a threshold P value <0.05 and a consistent direction of effect in the training and testing datasets. A sensitivity analysis was performed in which the list of covariates from the primary model was expanded to include BMI. Finally, to select candidate predictors for our prediction models, we ran association analyses that adjusted only for technical factors (CT scanner model and library preparation batch for RNA-seq or clinical center for proteins). Multiple comparisons were corrected with the Benjamini-Hochberg method using a threshold of significance of a false discovery rate (FDR) of 5% (43). We generated heatmaps of the top 15 differentially expressed genes and top 15 significant proteins for the bottom 25 and top 25 subjects ranked by their adjusted Perc15 values. We also generated scatterplots for the top genes and proteins to show the relationships between the gene expression and protein levels and the adjusted Perc15 values.
We also generated splicing plots to show the differences in exon and transcript use for our top differentially used isoforms. We then generated Sashimi plots to visually illustrate the top five differential exon use (DEU) results, with each plot centered on the exon demonstrating differential use. Integrative Genomics Viewer version 2.15.4 (44) was used to produce these Sashimi plots for the 50 subjects exhibiting the most severe emphysema (lowest adjusted Perc15 density values) and the 50 subjects with the least severe emphysema (highest adjusted Perc15 density values). GTF (GENCODE release 37) (45) was imported to Integrative Genomics Viewer to ensure that the reference annotations used in the plots were consistent with those used in our differential isoform and exon use analyses. Finally, we performed sensitivity analysis using cellular deconvolution (CD) generated by CIBERSORTx, a machine learning technique that identifies the gene expression profile of different cell types for large cohorts (46). Cell fraction imputation was calculated for 22 immune cell types using CIBERSORTx Impute Cell Fraction (relative mode) and the LM22 reference signature matrix (47).
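The Benjamini-Hochberg correction applied to the association analyses in this section can be sketched in a few lines of Python. This is a generic illustration of the procedure with hypothetical P values, not the limma code path used in the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw P values for five features.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.60]
toy_fdr = benjamini_hochberg(toy_p)
toy_significant = [q <= 0.05 for q in toy_fdr]
```

In this toy example the adjusted values are 0.005, 0.02, 0.05125, 0.05125, and 0.6, so only the first two features pass a 5% FDR threshold even though four raw P values are below 0.05.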
Mediation Analysis
We conducted mediation analysis using the medflex R package (48) to distinguish how much of the effect of emphysema on gene expression or protein levels acted through BMI (referred to as the indirect effect) and how much of the effect of emphysema directly influenced gene expression or protein levels (referred to as the direct effect). The analysis was performed on the significant genes and proteins identified in the primary association analysis. A mediated proportion representing the ratio of the indirect effect over the total effect was computed for each gene with significant total effect. The P values of the direct, indirect, and total effects for each biomarker were subject to a threshold significance of 5% FDR.
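The decomposition into direct and indirect effects, and the mediated proportion derived from it, can be illustrated with a simplified product-of-coefficients sketch. This is not the natural effect modeling implemented in medflex: it assumes linear models without covariates or interactions, and the function names and simulated data are hypothetical.

```python
import random


def ols(columns, y):
    """Ordinary least squares with an intercept; returns [intercept, b1, ...]."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Normal equations (X'X) beta = X'y, stored as an augmented matrix.
    aug = []
    for a in range(k):
        lhs = [sum(r[a] * r[b] for r in design) for b in range(k)]
        rhs = sum(design[i][a] * y[i] for i in range(n))
        aug.append(lhs + [rhs])
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def mediation_effects(exposure, mediator, outcome):
    """Total, direct, and indirect effects plus the mediated proportion."""
    total = ols([exposure], outcome)[1]                # outcome ~ exposure
    a = ols([exposure], mediator)[1]                   # mediator ~ exposure
    _, direct, b = ols([exposure, mediator], outcome)  # outcome ~ both
    indirect = a * b
    return {
        "total": total,
        "direct": direct,
        "indirect": indirect,
        "proportion": indirect / total,
    }


# Simulated data: emphysema (exposure), BMI (mediator), expression (outcome).
rng = random.Random(1)
emphysema = [rng.gauss(0, 1) for _ in range(2000)]
bmi = [0.5 * x + rng.gauss(0, 1) for x in emphysema]
expression = [0.3 * x + 0.4 * m + rng.gauss(0, 1) for x, m in zip(emphysema, bmi)]
effects = mediation_effects(emphysema, bmi, expression)
```

In this linear setting the total effect equals the sum of the direct and indirect effects, so the mediated proportion is the share of the total effect that runs through BMI. The simulation is set up with a true proportion of 0.4.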
Gene Set Enrichment Analyses
The biological enrichment of the gene sets derived from the gene expression, transcript use, and protein association analyses was evaluated using the topGO (version 2.33.1) weight01 algorithm, which accounts for the dependency in the Gene Ontology (GO) topology (49). We only reported GO pathways with at least three significant genes and a 5% FDR.
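The idea behind an enrichment test can be illustrated with a plain one-sided hypergeometric test. This is a simplification: the topGO weight01 algorithm additionally accounts for the GO graph structure, so the value computed here is not expected to match the adjusted P values reported in Table 3. The function name is hypothetical, and using all 19,177 tested genes as the background universe is an assumption.

```python
from math import comb


def hypergeom_upper_tail(universe, significant, category, overlap):
    """P(X >= overlap) when `category` genes are drawn without replacement
    from `universe` genes, of which `significant` are disease associated."""
    upper = min(category, significant)
    numerator = sum(
        comb(significant, k) * comb(universe - significant, category - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / comb(universe, category)


# Counts taken from the text and Table 3: 3,829 of 19,177 tested genes were
# significant, and 72 of the 99 genes annotated to GO:0006614 were among them.
toy_enrichment_p = hypergeom_upper_tail(19177, 3829, 99, 72)
```

About 20 of the 99 category genes would be expected to be significant by chance under this simplified null, so observing 72 yields a very small P value.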
Predictive Modeling
We constructed elastic net models to predict cross-sectional emphysema (50). The outcome variable was the adjusted Perc15 density. We used clinical variables that are readily available in the primary care setting (age, race, sex, BMI, pack-years of smoking, and current smoking status), CBC (proportions of neutrophils, eosinophils, monocytes, lymphocytes, and platelets), and the RNA-seq features and proteins that reached statistical significance in the association analyses performed in the training data (adjusted only for the scanner model and library preparation batch or clinical center). To determine the top-performing RNA data type to be used in the main models, we first constructed models using clinical + gene, clinical + isoform, and clinical + exon counts. We then constructed models in the following order: clinical only, clinical + CBC, clinical + CBC + RNA-seq, clinical + CBC + proteins, and clinical + CBC + RNA-seq + proteins. The outcome and the predictors were centered and scaled. The models were trained using fivefold cross-validation, minimizing the mean squared error (51) on the left-out fold. After model training on the continuous emphysema variable, we classified subjects into tertiles of adjusted Perc15 density. We evaluated the predictive performance of the models in the testing sample using R² for continuous emphysema and the area under the receiver operating characteristic curve (AUROC) for the accuracy of the model in distinguishing subjects in the highest and lowest tertiles of emphysema severity. We compared the AUROCs with the DeLong test using the pROC R package (52). Predictors were ranked by the absolute values of their coefficients from the regression model. To enhance the reliability of our prediction models, we performed 100 bootstrap iterations on each elastic net model.
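The tertile-based AUROC evaluation described above can be sketched as follows. This is one plausible reading of the procedure, written for illustration: the function names and toy values are hypothetical, the elastic net fitting itself is not shown, and the pairwise AUROC computation stands in for the pROC implementation.

```python
def auroc(scores_positive, scores_negative):
    """AUROC as the probability that a positive case outscores a negative
    case, with ties counted as one half (rescaled Mann-Whitney U)."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


def extreme_tertile_auroc(observed, predicted):
    """AUROC for separating the lowest from the highest tertile of the
    observed outcome using the model predictions."""
    order = sorted(range(len(observed)), key=lambda i: observed[i])
    third = len(order) // 3
    lowest = order[:third]    # lowest adjusted Perc15 = most emphysema
    highest = order[-third:]  # highest adjusted Perc15 = least emphysema
    # Lower predicted Perc15 should flag more emphysema, so negate the scores.
    positives = [-predicted[i] for i in lowest]
    negatives = [-predicted[i] for i in highest]
    return auroc(positives, negatives)


# Hypothetical observed and predicted adjusted Perc15 values for 9 subjects.
toy_observed = [60, 65, 70, 80, 85, 90, 100, 105, 110]
toy_predicted = [62, 99, 68, 82, 80, 95, 98, 97, 112]
toy_auroc = extreme_tertile_auroc(toy_observed, toy_predicted)  # 7 of 9 pairs
```

In the toy data, one subject in the most severe tertile is predicted to have a high Perc15 value, so 7 of the 9 cross-tertile pairs are ranked correctly and the AUROC is 7/9.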
Statistical Analysis
Data were reported as means with SDs or counts with percentages. Continuous variables were tested with Kruskal-Wallis tests, and categorical variables were tested with chi-square tests. Upregulated versus downregulated genes as well as positive versus negative signs of the protein coefficients are provided with respect to their relationships with adjusted Perc15 density (i.e., because lower adjusted Perc15 density corresponds to more emphysema, a negative coefficient indicates that higher biomarker levels accompany a greater extent of emphysema).
Data Availability
The raw RNA-seq and proteomic data for COPDGene were made available through dbGaP (COPDGene accession number phs000765.v3.p2). Additional methods are available in the online supplement.
Results
Subject Characteristics
A total of 3,386 subjects from COPDGene visit 2 with complete clinical, RNA-seq, and protein data were included in our analyses (Figure 1). As shown in Table 1, the included subjects were mostly non-Hispanic White individuals with a balanced representation by sex, a mean age of 65 years, a mean BMI of 29 kg/m², and a mean of 41 pack-years of smoking. Subject characteristics were similar between the training and testing data, which consisted of 2,370 and 1,016 subjects, respectively; the only nominal difference was in the proportion of non-Hispanic White individuals (72.87% vs. 76.18%; P = 0.04). A comparison of subjects with and without missing data showed that the two groups were largely similar (see Table E1 in the online supplement). A schematic overview of the analyses performed is illustrated in Figure E1. As shown in Table E2, the characteristics differed between the studied COPDGene and ECLIPSE subjects. Compared with the COPDGene training sample, ECLIPSE included more males and non-Hispanic White individuals and a lower percentage of current smokers. The ECLIPSE cohort consists primarily of individuals in Global Initiative for Chronic Obstructive Lung Disease grades II–IV, whereas COPDGene subjects had more varying degrees of COPD severity. Finally, the mean adjusted Perc15 density values are lower in ECLIPSE than in COPDGene, which likely reflects the older CT technology and somewhat noisier scans in ECLIPSE.
Figure 1.
COPDGene visit 2 participant flow diagram. AA = African American; BMI = body mass index; CBC = complete blood count; DGE = differential gene expression; DIU = differential isoform use; DEU = differential exon use; NHW = non-Hispanic White; RNA-seq = RNA sequencing.
Table 1.
Characteristics of Subjects in Training and Testing Datasets in COPDGene Visit 2
| | Training (n = 2,370) | Testing (n = 1,016) | P Value |
|---|---|---|---|
| Age, yr | 65.07 (8.78) | 65.42 (8.85) | 0.28 |
| Sex, % male | 51.35% | 49.80% | 0.41 |
| Race, % NHW | 72.87% | 76.18% | 0.04 |
| BMI | 28.94 (6.33) | 28.70 (6.01) | 0.31 |
| Smoking pack-years | 41.65 (25.72) | 41.23 (25.90) | 0.66 |
| Current smoker | 858 (36.20%) | 346 (34.06%) | 0.23 |
| FEV1, L | 2.22 (0.84) | 2.21 (0.85) | 0.75 |
| FEV1, % predicted | 80.57 (24.35) | 80.52 (24.33) | 0.95 |
| FVC, L | 3.20 (0.95) | 3.19 (0.96) | 0.88 |
| Adjusted Perc15 density | 85.96 (24.8) | 85.72 (24.91) | 0.79 |
| % Segmental airway wall thickness | 49.70 (8.37) | 49.54 (8.37) | 0.62 |
| Pi10 | 2.24 (0.57) | 2.23 (0.56) | 0.54 |
| GOLD grade | | | |
| PRISm | 299 (12.62%) | 120 (11.81%) | 0.89 |
| Normal spirometry | 996 (42.03%) | 415 (40.85%) | |
| 1 | 232 (9.79%) | 100 (9.84%) | |
| 2 | 425 (17.93%) | 187 (18.41%) | |
| 3 | 209 (8.82%) | 95 (9.35%) | |
| 4 | 84 (3.54%) | 35 (3.44%) | |
| Exacerbation history, % | 14.96% | 15.03% | 0.73 |
| SGRQ score | 24.94 (24.35) | 24.40 (23.11) | 0.55 |
| mMRC dyspnea score | | | |
| 0 | 1,294 (54.60%) | 562 (55.31%) | 0.78 |
| 1 | 284 (11.98%) | 133 (13.09%) | |
| 2 | 276 (11.65%) | 117 (11.52%) | |
| 3 | 361 (15.23%) | 144 (14.17%) | |
| 4 | 155 (6.54%) | 60 (5.91%) | |
| CAD | 199 (8.40%) | 74 (7.28%) | 0.28 |
| Diabetes | 383 (16.16%) | 143 (14.07%) | 0.12 |
| Hypertension | 1,136 (47.93%) | 508 (50.00%) | 0.27 |
Definition of abbreviations: BMI = body mass index; CAD = coronary artery disease; FVC = forced vital capacity; FEV1 = forced expiratory volume in one second; GOLD = Global Initiative for Chronic Obstructive Lung Disease; mMRC = modified Medical Research Council; NHW = non-Hispanic White; Perc15 = 15th percentile of the attenuation histogram + 1,000 Hounsfield units; Pi10 = square root of airway wall area of hypothetical airway with internal perimeter of 10 mm; PRISm = preserved ratio impaired spirometry; SGRQ = St. George’s Respiratory Questionnaire.
Participant characteristics reported here are from visit 2, when omics data were obtained. Continuous variables are expressed as mean and SD. Categorical variables are expressed as absolute value and/or percentage. Adjusted Perc15 density: Hounsfield units at the 15th percentile of CT density histogram at TLC, corrected for the inspiratory depth (per convention, adjusted Perc15 density values are reported as Hounsfield units + 1,000); CAD: self-reported history of coronary artery disease; exacerbation history: at least one chronic obstructive pulmonary disease exacerbation (acute worsening of respiratory symptoms that required systemic steroids and/or antibiotics) in the previous year; GOLD 1: FEV1/FVC <0.70 and post-bronchodilator FEV1 ⩾80% predicted; GOLD 2: FEV1/FVC <0.70 and post-bronchodilator FEV1 50–79% predicted; GOLD 3: FEV1/FVC <0.70 and post-bronchodilator FEV1 30–49% predicted; GOLD 4: FEV1/FVC <0.70 and post-bronchodilator FEV1 <30% predicted; PRISm: defined as FEV1/FVC ⩾0.70 but with FEV1 <80% predicted; race: self-reports as either NHW or African American.
Differential Gene Expression Analysis
We performed differential gene expression (DGE) analysis in the 2,370 subjects of the COPDGene training dataset. Diagnostic plots indicated that the assumptions of our linear model, including linearity, were generally acceptable for these data (Figure E2). Of 19,177 genes, 3,829 reached significance at 5% FDR (Tables 2 and E3; Figures 2A and E3). A total of 1,822 genes were upregulated and 2,007 were downregulated with respect to adjusted Perc15 density (i.e., they have opposite directions for their associations with emphysema) (Figure 3A). The GO enrichment analysis identified 10 significantly enriched biological processes, including neutrophil degranulation, regulation of nuclear factor κB signaling, viral transcription, T cell proliferation, and regulation of tumor necrosis factor–mediated signaling (Table 3).
Table 2.
Top Five Differentially Expressed Genes, Differentially Used Isoforms, and Differentially Used Exons Associated with Adjusted Perc15 Density in COPDGene
| Analysis | ID | HUGO Gene Name | Log Fold Change | Average Log Expression | FDR |
|---|---|---|---|---|---|
| DGE | ENSG00000160179 | ABCG1 | −0.007 | 4.907 | 4 × 10⁻¹⁹ |
| | ENSG00000138772 | ANXA3 | 0.006 | 4.825 | 8 × 10⁻¹⁷ |
| | ENSG00000164674 | SYTL3 | −0.004 | 5.212 | 8 × 10⁻¹⁷ |
| | ENSG00000253981 | ALG1L13P | −0.006 | 2.721 | 3 × 10⁻¹⁵ |
| | ENSG00000169877 | AHSP | 0.012 | 4.573 | 6 × 10⁻¹⁵ |
| DIU | ENST00000432854 | DBNL | 0.017 | −1.701 | 1 × 10⁻²⁰ |
| | ENST00000483180 | NFKBIZ | −0.015 | −1.759 | 2 × 10⁻¹³ |
| | ENST00000357428 | USP33 | 0.013 | −2.868 | 7 × 10⁻¹³ |
| | ENST00000315939 | WNK1 | 0.012 | 2.770 | 5 × 10⁻¹² |
| | ENST00000339486 | RIOK3 | 0.008 | 8.065 | 5 × 10⁻¹² |
| DEU | 360147 | PSMA1 | −0.004 | 1.511 | 1 × 10⁻⁷ |
| | 413338 | FRY | 0.004 | 1.744 | 2 × 10⁻⁷ |
| | 450397 | CCNDBP1 | −0.004 | 2.388 | 3 × 10⁻⁷ |
| | 514701 | VMP1 | 0.002 | 4.936 | 1 × 10⁻⁶ |
| | 510631 | ATP6V0A1 | 0.003 | 2.087 | 4 × 10⁻⁶ |
Definition of abbreviations: DEU = differential exon use; DGE = differential gene expression; DIU = differential isoform use; FDR = false discovery rate.
Adjusted Perc15 density: Hounsfield units at the 15th percentile of computed tomography (CT) density histogram at TLC, corrected for the inspiratory depth (per convention, adjusted Perc15 density values are reported as Hounsfield units + 1,000). The lower the Perc15 values are, the more CT-quantified emphysema is present. For the DGE, DIU, and DEU analyses, the covariates used were age, race, sex, pack-years of smoking, current smoking status, FEV1, complete blood count cell count proportions, library preparation batch, and CT scanner model. A threshold FDR of 5% was applied. Genes and isoforms are represented by their Ensembl Gene ID and Ensembl Transcript ID, respectively. HUGO gene name corresponds to the unique gene identified by the Ensembl Gene ID (DGE) and the gene associated with the isoform or exon (DIU and DEU). Log fold change values indicate change per unit increase in adjusted Perc15. Positive log fold change values represent upregulated genes, whereas negative ones correspond to downregulated ones with respect to adjusted Perc15 density (i.e., they have opposite signs for their associations with emphysema). Average log expression is the average of the log-transformed counts of the gene in analyzed subjects.
Figure 2.
Heatmap of expression levels (high expression in red, low expression in blue) of top 15 (A) differential gene expression (DGE) and (B) proteins in COPDGene. Subjects are listed in increasing order of adjusted 15th percentile of the attenuation histogram + 1,000 Hounsfield units (Perc15). The heatmaps are scaled by row. Subjects 1–25 represent those with the lowest adjusted Perc15, whereas subjects 26–50 represent those with the highest adjusted Perc15. The DGE and proteins are sorted in decreasing order (from top to bottom) of log fold change and β-coefficients, respectively.
Figure 3.
Volcano plots of the primary model representing (A) differentially expressed genes, (B) differentially used isoforms, and (C) differentially used exons in COPDGene. Genes significantly associated with adjusted Perc15 density appear above the red line marked at false discovery rate (FDR) 5%. Upregulated genes are in blue, and downregulated genes are in red. Genes/isoforms/exons that are not differentially expressed or used are gray and appear below the threshold line. Adjusted Perc15 density = Hounsfield units at the 15th percentile of computed tomography density histogram at TLC, corrected for the inspiratory depth (per convention, adjusted Perc15 density values are reported as HU + 1,000). The lower the Perc15 values are, the more computed tomography–quantified emphysema is present. Upregulated versus downregulated genes are reported with respect to adjusted Perc15 density (i.e., they have opposite directions for their associations with emphysema).
Table 3.
Gene Ontology Biological Processes Enriched in Differentially Expressed Genes and Proteins Associated with Adjusted Perc15 Density in COPDGene
| Analysis | GO ID | GO Term | Total Number of Genes in Category | Number of Adjusted Perc15 Density-associated Genes in Category | Adjusted P Value |
|---|---|---|---|---|---|
| DGE | GO:0006614 | SRP-dependent cotranslational protein targeting to membrane | 99 | 72 | 2 × 10⁻¹⁷ |
| | GO:0006413 | Translational initiation | 185 | 97 | 2 × 10⁻¹⁵ |
| | GO:0000184 | Nuclear-transcribed mRNA catabolic process, nonsense-mediated decay | 120 | 77 | 3 × 10⁻¹⁰ |
| | GO:0019083 | Viral transcription | 174 | 87 | 1 × 10⁻⁹ |
| | GO:0043312 | Neutrophil degranulation | 466 | 212 | 3 × 10⁻⁸ |
| | GO:0002181 | Cytoplasmic translation | 98 | 44 | 2 × 10⁻³ |
| | GO:0051092 | Positive regulation of NF-κB transcription factor activity | 144 | 70 | 7 × 10⁻³ |
| | GO:0046718 | Viral entry into host cell | 111 | 58 | 1 × 10⁻² |
| | GO:0042102 | Positive regulation of T cell proliferation | 84 | 47 | 2 × 10⁻² |
| | GO:0010803 | Regulation of tumor necrosis factor–mediated signaling pathway | 51 | 25 | 3 × 10⁻² |
| Protein | GO:0007155 | Cell adhesion | 711 | 175 | 1 × 10⁻⁷ |
Definition of abbreviations: DGE = differential gene expression; GO = Gene Ontology.
Adjusted Perc15 density: Hounsfield units at the 15th percentile of computed tomography (CT) density histogram at TLC, corrected for the inspiratory depth (per convention, adjusted Perc15 density values are reported as Hounsfield units + 1,000). The lower the Perc15 values are, i.e., the closer to −1,000 Hounsfield units, the more CT-quantified emphysema is present. For the DGE and protein analyses, covariates used were age, race, sex, pack-years of smoking, current smoking status, FEV1, complete blood count cell count proportions, library preparation batch (DGE) or clinical center (protein), and CT scanner model. We only reported the GO pathways with at least three significant genes. Enriched GO terms were identified using false discovery rate (FDR), 5%. Total number of genes in category refers to all genes studied that fall under the GO term. The number of adjusted Perc15-associated genes in category refers to the genes that reached significance (FDR, 5%) in the DGE and protein analyses.
Differential Isoform and Exon Use Analyses
We next performed differential isoform use (DIU) and DEU analyses on the COPDGene training dataset to investigate the changes in relative isoform and exon levels within individual parent genes. Of 78,837 isoforms and 209,707 exons tested, 942 isoforms and 260 exons reached significance (FDR, 5%) (Table E4). The differentially used isoforms mapped to 801 individual genes, 40% of which (317 of 801) were also identified in the DGE analysis. The differentially used exons mapped to 170 genes (Table E5), 70% of which (119 of 170) were also differentially expressed. Fifty-five percent (520 of 942) of the significant isoforms and 36% (94 of 260) of the significant exons were upregulated with respect to adjusted Perc15 density (Figures 3B and 3C). The GO enrichment analyses performed on the differentially used isoforms and differentially used exons did not yield any significant pathways.
Figure E4 and Table E6 reveal the relationship between the number of differentially used isoforms per gene and the total number of isoforms expressed per gene. By definition, DIU is performed only when there are two or more expressed isoforms per gene. The total number of isoforms expressed for our DIU-tested genes ranges from 2 to 50 (mean, 10.7; SD, 8.07), whereas the number of differentially used isoforms ranges from 1 to 6 (mean, 1.27; SD, 0.62). As expected, for genes with a larger number of expressed isoforms, the number of differentially used isoforms also increases (Figure E4A). However, Figure E4B shows that in most genes where DIU is detected, only one differentially used isoform exists. In Figure E4C, we examined the isoform variability ratios (differentially used isoforms/total isoforms expressed), which demonstrate that for most genes with DIU, the isoform variability ratio is low, indicating that there are multiple non-DIU isoforms present within those genes. The splicing plots in Figure E5 demonstrate the deviation in the magnitude and direction of the coefficient of association with adjusted Perc15 density for the top differentially used isoforms and exons. These deviations provide the underlying support for identifying differential use of these isoforms and exons in our analysis. Finally, we generated Sashimi plots for the 50 subjects exhibiting the most severe emphysema and the 50 subjects with the least severe emphysema (Figure E6). In several instances, the observed junctional reads align well with the DEU results. In other cases, the changes are more challenging to observe, partly because of the high background of intronic reads. In general, splicing events can be defined in various ways at different levels of quantification, such as isoform switching based on isoform counts, DEU based on exonic reads, and differential splice site usage using junctional reads. Not surprisingly, these different approaches can yield different results, particularly for complex splicing events.
Cellular Deconvolution Analysis
The CIBERSORTx analysis revealed 11 cell types out of 22 with nonzero cell fractions: B cells naive, B cells memory, plasma cells, T cells CD4 naive, T cells CD4 memory resting, T cells CD4 memory activated, NK cells resting, monocytes, dendritic cells activated, mast cells resting, and neutrophils (Figure E7). Most associations that were significant in the analysis using blood count data remained significant in the analyses adjusting for the 11 CIBERSORTx cell type proportions. However, generally, the CD analysis revealed fewer biological pathways (Table E7).
Protein Association Analysis
We tested 4,979 SomaScan proteins measured in the training dataset using multivariable linear regression. Fourteen percent (714 of 4,979) were significantly associated with emphysema (FDR, 5%) (Table E8 and Figures 2B, E8, and E9). One significantly enriched biological process was identified: cell adhesion (Table 3). Figure 4 summarizes the overlap of the biomarkers between DGE, DIU, DEU, and protein analyses, showing that most of the significant biomarkers are unique to each analysis.
Figure 4.
Number of significant genes associated with adjusted 15th percentile of the attenuation histogram + 1,000 Hounsfield units (Perc15) density from the differential gene expression (DGE), differential isoform use (DIU), differential exon use (DEU), and protein association analyses in COPDGene. HUGO (Human Genome Organization) gene symbols were used to find the intersection of biomarkers between the DGE, DIU, DEU, and protein analyses. Multiple proteins may map to a single gene. Therefore, the diagram does not reflect the total number of proteins significantly associated with adjusted Perc15 density.
Validation of the Association Analyses
For internal validation of the emphysema biomarkers identified in the COPDGene training sample, we analyzed 1,016 subjects with RNA-seq and proteomic data in the COPDGene testing sample. We observed that the effect sizes were highly correlated between the analyses performed in the training and testing data for the DGE, DEU, and protein analyses (Pearson’s r = 0.84, 0.88, and 0.93, respectively). A lower correlation (r = 0.32) was observed in the DIU analysis. We further determined whether biomarkers were validated by using a threshold of (testing) P < 0.05 and checking if the training and testing data had a consistent direction of effect. Respectively, 41% (1,576 of 3,829), 29% (271 of 942), 60% (155 of 260), and 53% (376 of 714) of the DGE, DIU, DEU, and protein biomarkers were validated (Tables E3, E4, E5, and E8).
For external validation, we used blood microarray data and CT-quantified emphysema from 526 subjects with COPD in ECLIPSE. Of the 3,829 genes that reached significance in the COPDGene DGE analysis, 3,155 were available in the ECLIPSE microarray data. To demonstrate overall similarity in the results between COPDGene and ECLIPSE, we evaluated the Pearson correlation coefficient between the log fold changes for the emphysema differential expression analysis in both studies, which showed a strong positive correlation of 0.75. We also evaluated the validation rate of the significant COPDGene findings using a P value threshold of <0.05 in ECLIPSE, always requiring a consistent direction of effect in both studies. A total of 1,258 genes (or 39.87%) were validated. A total of 1,255 of these genes overlapped with the internally validated genes in the COPDGene testing sample.
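The validation rule used in both the internal and external analyses (nominal P < 0.05 in the validation sample plus a consistent direction of effect) can be sketched as follows. This is a minimal illustration only: the function names and the toy effect sizes are hypothetical and are not taken from the study data or code.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of effect sizes."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def is_validated(beta_discovery, beta_validation, p_validation, alpha=0.05):
    """Validated = same sign in both samples and nominal significance in validation."""
    same_direction = beta_discovery * beta_validation > 0
    return same_direction and p_validation < alpha


def validation_summary(records, alpha=0.05):
    """records: list of (beta_discovery, beta_validation, p_validation) tuples."""
    n_validated = sum(is_validated(bd, bv, p, alpha) for bd, bv, p in records)
    r = pearson_r([rec[0] for rec in records], [rec[1] for rec in records])
    return {
        "n": len(records),
        "n_validated": n_validated,
        "rate": n_validated / len(records),
        "pearson_r": r,
    }


# Hypothetical toy biomarkers (not study results).
toy = [
    (0.010, 0.008, 0.001),    # same direction, significant: validated
    (-0.006, -0.004, 0.030),  # same direction, significant: validated
    (0.005, -0.002, 0.010),   # opposite direction: not validated
    (0.007, 0.003, 0.200),    # same direction, not significant: not validated
]
summary = validation_summary(toy)
print(summary)
```

Reporting the effect-size correlation alongside the validation rate is useful because the rate depends on the power of the validation sample, whereas the correlation summarizes overall agreement across all tested biomarkers.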
Mediation Analysis
Because severe emphysema is often associated with low BMI, we performed sensitivity analyses that also adjusted for BMI. We observed that 98% (3,735 of 3,829) of the genes and 81% (576 of 714) of the proteins (Figure E10) associated with emphysema from the primary analysis were no longer significant after adjustment for BMI (Tables E9 and E10). These observations suggest that BMI mediates many of the emphysema-associated transcriptomic and proteomic changes. To investigate this, we performed mediation analysis based on the directed acyclic graph (DAG) (Figure E11) to divide each observed biomarker association into a direct pathway (emphysema directly affects gene/protein expression) and an indirect pathway (emphysema affects gene/protein expression via its effects on BMI). Of the 3,829 differentially expressed genes and 714 proteins that reached significance in the primary model (i.e., no BMI adjustment), 74% of genes (2,842 of 3,829) and 62% of proteins (444 of 714) showed evidence of mediation with a significant indirect effect and no significant direct effect (FDR, 5%) (Table 4; Tables E11 and E12).
Table 4.
Mediated Proportions and Direct, Indirect, and Total Effects of Top Five Most and Least Mediated Differentially Expressed Genes Significantly Associated with Adjusted Perc15 Density in COPDGene
| Group | Ensembl Gene ID | HUGO Gene Name | Mediated Proportion | Direct Effect β-Coefficient | Direct Effect FDR | Indirect Effect β-Coefficient | Indirect Effect FDR | Total Effect β-Coefficient | Total Effect FDR |
|---|---|---|---|---|---|---|---|---|---|
| Most mediated genes (genes with significant indirect effect) | ENSG00000160179 | ABCG1 | 0.822 | −0.001 | 0.297 | −0.005 | 8 × 10⁻³² | −0.006 | 2 × 10⁻¹⁸ |
| Most mediated genes (genes with significant indirect effect) | ENSG00000169877 | AHSP | 0.882 | 0.001 | 0.538 | 0.009 | 3 × 10⁻³¹ | 0.011 | 5 × 10⁻¹⁵ |
| Most mediated genes (genes with significant indirect effect) | ENSG00000118113 | MMP8 | 1.054 | −0.001 | 0.849 | 0.010 | 3 × 10⁻²⁶ | 0.009 | 1 × 10⁻⁹ |
| Most mediated genes (genes with significant indirect effect) | ENSG00000158578 | ALAS2 | 0.912 | 0.001 | 0.710 | 0.008 | 1 × 10⁻²⁴ | 0.009 | 2 × 10⁻¹¹ |
| Most mediated genes (genes with significant indirect effect) | ENSG00000119326 | CTNNAL1 | 0.928 | 0.000 | 0.783 | 0.006 | 2 × 10⁻²⁴ | 0.007 | 9 × 10⁻¹¹ |
| Least mediated genes (genes with significant direct effect) | ENSG00000253230 | MIR124-1HG | −0.687 | 0.008 | 1 × 10⁻¹³ | −0.003 | 1 × 10⁻⁹ | 0.005 | 2 × 10⁻⁸ |
| Least mediated genes (genes with significant direct effect) | ENSG00000154165 | GPR15 | −0.463 | 0.008 | 3 × 10⁻¹¹ | −0.002 | 9 × 10⁻⁶ | 0.005 | 4 × 10⁻⁸ |
| Least mediated genes (genes with significant direct effect) | ENSG00000198574 | SH2D1B | −0.350 | −0.005 | 3 × 10⁻⁷ | 0.001 | 3 × 10⁻³ | −0.003 | 3 × 10⁻⁶ |
| Least mediated genes (genes with significant direct effect) | ENSG00000167680 | SEMA6B | −0.929 | 0.006 | 8 × 10⁻⁶ | −0.003 | 6 × 10⁻⁷ | 0.003 | 7 × 10⁻⁴ |
| Least mediated genes (genes with significant direct effect) | ENSG00000063438 | AHRR | −1.030 | 0.008 | 2 × 10⁻⁵ | −0.003 | 3 × 10⁻⁷ | 0.004 | 2 × 10⁻³ |
Definition of abbreviation: FDR = false discovery rate.
Mediation analysis was performed to distinguish how much of the effect of emphysema on gene expression acted through body mass index (referred to as the indirect effect) and how much of the effect of emphysema directly influenced gene expression (referred to as the direct effect). Covariates: body mass index, sex, age, race, pack-years of smoking, current smoking status, and FEV1. Mediated proportions of the top five genes in each group are listed together with the coefficients and FDR of their direct, indirect, and total effects. Mediated proportion is defined as the ratio of the indirect effect to the sum of the indirect and direct effects. Within each group, genes are sorted in order of increasing FDR for the indirect effect (most mediated genes) or the direct effect (least mediated genes).
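The definition of the mediated proportion given in the table note can be written out explicitly. The symbols below are introduced here for illustration only, and the additive decomposition of the total effect is the usual assumption for linear mediation models rather than a detail stated in this section.

```latex
\beta_{\text{total}} = \beta_{\text{direct}} + \beta_{\text{indirect}},
\qquad
\text{MP} = \frac{\beta_{\text{indirect}}}{\beta_{\text{direct}} + \beta_{\text{indirect}}}

% Illustration with the rounded ABCG1 coefficients from Table 4:
\text{MP}_{ABCG1} \approx \frac{-0.005}{-0.001 + (-0.005)} \approx 0.83
```

The small difference from the tabulated value of 0.822 is presumably due to the coefficients being rounded to three decimals. When the direct and indirect effects have opposite signs, this ratio falls outside the interval from 0 to 1, which is why values such as 1.054 (MMP8) and −1.030 (AHRR) appear in the table.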
Prediction
To select blood biomarkers for inclusion in cross-sectional predictive models for emphysema, we performed association analyses in the training dataset, adjusting only for technical factors (CT scanner model and library preparation batch/clinical center), which yielded 12,104 genes, 2,967 isoforms, 1,789 exons, and 1,298 proteins that were used as candidate predictors. To evaluate whether RNA-seq is more informative at the gene, isoform, or exon level, we trained three separate models (clinical + gene, clinical + isoform, and clinical + exon). The AUROCs were 0.78, 0.80, and 0.79, respectively (Table E13). We focused on gene-level quantifications for the subsequent models because the AUROC values did not differ substantially. We next evaluated the relative contribution of CBC proportions, genes, and proteins compared with a baseline model using clinical variables alone. We used bootstrapping for every model to capture the inherent variability of the datasets. The model using only clinical variables explained 34% of the variance of emphysema in the testing data. The gene-level expression data are predictive by themselves (R² = 0.21) but less powerful than the clinical data alone (R² = 0.34). However, substantial improvement was seen from adding protein data (R² = 0.39 for clinical + CBC + protein; Table E13). The model with clinical + CBC + gene + protein data did not perform as well as the model with clinical + CBC + protein data. The same pattern was seen when we evaluated model performance for distinguishing subjects in the top versus bottom emphysema tertile; the highest-performing model was the clinical + CBC + protein model with an AUROC of up to 0.90 (95% confidence interval, 0.85–0.90). Figure 5 summarizes the model results, and Table E13 summarizes the AUROCs, α-values, and L1 ratios.
Ranked by absolute β-coefficients, the top 10 predictors of a representative all-inclusive (clinical + CBC + gene + protein) model included BMI, sRAGE, and two biomarkers that have not previously been linked to emphysema (the MIR124-1HG gene and the PSMP protein) (Figure 6).
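The tertile-based evaluation described above (AUROC for separating the top from the bottom tertile of quantitative emphysema using a model's continuous predictions) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and toy values are hypothetical, the AUROC is computed with the rank-based (Mann-Whitney) estimator, and the percentile bootstrap interval is one common choice, not necessarily the procedure used in the study.

```python
import random


def auroc(scores_pos, scores_neg):
    """Mann-Whitney AUROC: P(pos score > neg score) + 0.5 * P(tie)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def tertile_auroc(observed, predicted):
    """AUROC for separating the top from the bottom tertile of observed values.

    observed and predicted are assumed to be on the same scale, so higher
    predictions should correspond to the top tertile. The middle tertile
    is dropped. Requires at least three subjects.
    """
    order = sorted(range(len(observed)), key=lambda i: observed[i])
    k = len(order) // 3
    bottom, top = order[:k], order[-k:]
    return auroc([predicted[i] for i in top], [predicted[i] for i in bottom])


def bootstrap_ci(observed, predicted, n_boot=200, level=0.95, seed=0):
    """Percentile bootstrap interval for the tertile AUROC (an assumed choice)."""
    rng = random.Random(seed)
    n = len(observed)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample subjects with replacement
        stats.append(tertile_auroc([observed[i] for i in idx],
                                   [predicted[i] for i in idx]))
    stats.sort()
    lower = stats[int((1 - level) / 2 * n_boot)]
    upper = stats[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


# Hypothetical toy data: 30 subjects, predictions equal to truth plus noise.
noise = random.Random(1)
observed = [float(i) for i in range(30)]
predicted = [o + noise.gauss(0.0, 8.0) for o in observed]

auc = tertile_auroc(observed, predicted)
ci_low, ci_high = bootstrap_ci(observed, predicted)
print(auc, ci_low, ci_high)
```

Because the middle tertile is excluded, this AUROC reflects discrimination between the extremes of the emphysema distribution and is expected to be higher than a metric computed over the full continuous range, which is worth keeping in mind when comparing it with the R² values.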
Figure 5.
The areas under the receiver operating characteristic curve (AUROCs) for the elastic net prediction models in COPDGene: clinical only (age, race, sex, body mass index, pack-years of smoking, and current smoking status); clinical + complete blood count (CBC) proportions of neutrophils, eosinophils, monocytes, lymphocytes, and platelets; clinical + CBC + genes; clinical + CBC + proteins; and clinical + CBC + genes + proteins. The table summarizes the pairwise DeLong P values of the model comparisons. P < 0.05 values are bolded.
Figure 6.
Top 10 predictors sorted in descending order by the absolute values of their β-coefficients from the elastic net model using clinical (age, race, sex, body mass index [BMI], pack-years of smoking, and current smoking status), complete blood count (CBC) proportions of neutrophils, eosinophils, monocytes, lymphocytes, and platelets, gene, and protein data in COPDGene. The horizontal lines represent the magnitude of the coefficient for each feature. All predictors were centered and scaled.
We also used ECLIPSE to assess the accuracy of the clinical, gene, clinical + CBC, and clinical + CBC + gene prediction models. The ECLIPSE study did not have proteomic data. We trained new models to predict quantitative emphysema in the COPDGene training dataset limited to the 3,155 genes that reached significance in COPDGene and were available in both the COPDGene RNA-seq data and ECLIPSE microarray data. We compared the AUROC values for subjects stratified into tertiles of emphysema severity in the COPDGene testing sample (internal validation) and ECLIPSE (external validation). The AUROC for the prediction of adjusted Perc15 in the models involving clinical, CBC, or gene data ranged from 0.79 to 0.87 in the COPDGene testing sample and from 0.71 to 0.75 in ECLIPSE (Table E14). The trends are similar between the two studies. The slightly lower predictive performance in ECLIPSE may be accounted for by differences in the baseline clinical characteristics and CT measures between the two cohorts (Table E2), as well as differences in the blood collection methods. Figure E12 reveals a more right-skewed distribution of adjusted Perc15 values in ECLIPSE compared with COPDGene.
Replication of Previously Reported Emphysema-associated Genes in Alveolar Macrophages and Bronchial Brushes
A previously published RNA-seq study by Morrow and colleagues examined the gene expression profiles of 63 alveolar macrophage and large-airway epithelium samples from 6 COPD cases and 15 control subjects (including 4 nonsmokers) (36). They performed differential expression analysis via the R package DESeq2 to test associations between gene expression and CT measures of emphysema. Ninety-nine genes in alveolar macrophages and 19 genes in bronchial epithelium were found to be significantly associated with emphysema at an FDR of 10%. We were only able to match 91 and 12 genes in our study for the alveolar macrophages and bronchial epithelium, respectively, because the remaining genes were filtered out, given their low expression levels (CPM, <0.2). We assessed the effect sizes and P values of these 91 and 12 genes in our study for both emphysema models with CBC data and with CD. For the alveolar macrophages, we observed nominal significance (P < 0.05) for 58 and 49 out of the 91 genes (64% and 54%) for our CBC and CD models, respectively. Of these, 15 and 12 out of the 58 and 49 genes (26% and 24%) had the same direction of effect for the CBC and CD models, respectively (Table E15). For the bronchial epithelium, we observed nominal significance for 5 (42%) out of the 12 genes, of which 3 (60%) out of 5 had the same direction of effect in both the CBC and CD models (Table E15). The Pearson correlations for the log fold changes of the alveolar macrophage and bronchial epithelium genes were 0.28 and −0.61 for the CBC model and 0.25 and −0.62 for the CD model, respectively (Figure E13).
A microarray study by Rathnayake and colleagues investigated the association between the gene expression profiling of samples from bronchial brushings and emphysema levels, measured by CT-based parametric response mapping in 44 individuals (32 asymptomatic smokers and 12 with COPD) (37). Using the R package limma, 133 genes reached statistical significance; however, the authors only reported the top 20 genes in their paper. Out of these top 20 genes, we could only match 11 in our study; the remaining genes had CPM <0.2 and were filtered out. Four (36%) of the 11 genes were nominally significant (P < 0.05) in our study, of which 50% (two of four) had the same direction of effect in both our CBC and CD models (Table E16). The Pearson correlation for the log fold changes of the four bronchial brush genes was 0.63 in the CBC model and 0.60 in the CD model (Figure E13).
These findings suggest that some transcriptomic patterns are shared across tissues, though, as expected, there are also differences in the transcriptomic responses between whole blood, airway macrophages, and the bronchial epithelium. Caution should be taken when comparing the β-coefficients and measures of significance across studies, because there were differences in the study design (including emphysema measurements), sample size, and statistical methods. The log fold change values reported in both the Morrow and colleagues (36) and Rathnayake and colleagues (37) studies have the opposite direction from the log fold changes reported in our study because lower adjusted Perc15 values in our study correspond to more CT-quantified emphysema.
Discussion
In this study, we performed the largest blood transcriptomic and proteomic profiling of CT-quantified emphysema to date, including investigations into alternative splicing mechanisms, identifying thousands of validated blood biomarker associations. The biological relevance of these findings was assessed through GO pathway analyses, which mostly demonstrated enrichment for inflammatory pathways and cell differentiation. Mediation analysis revealed that 74% of the differentially expressed genes and 62% of the associated proteins are mediated through BMI, implying that blood biomarker associations with emphysema largely reflect shared biological processes with BMI. We also demonstrated the potential utility of incorporating multiomics data in predicting emphysema.
In previous biomarker studies of emphysema, the extracellular matrix, nuclear factor κB, transforming growth factor-β, B cell antigen receptor, and oxidative phosphorylation pathways were among the most frequently reported emphysema-associated pathways (8, 53, 54). However, most of these studies focused on a single omics modality (20, 24, 55). Our investigation of blood-based transcriptomic and proteomic biomarkers supported numerous established emphysema-associated pathways and enabled novel biomarker discovery. In addition, our alternative splicing analysis for the first time revealed widespread evidence of alternative splicing associated with emphysema. It is important to note that our analyses demonstrated association rather than causality with emphysema.
Most blood biomarker associations with emphysema occur through BMI, as indicated by the significant mediation of the tested genes and proteins. This suggests that some of the molecular processes identified in this analysis may be causally related to emphysema and BMI. We must keep in mind, though, that our mediation analysis is based on the following assumptions: no unmeasured confounding of the emphysema–BMI–gene expression/protein level relationship, no measurement error for the exposure or mediator, and the arrows in the DAG are correctly specified. Although our specified DAG is reasonable, based on prior knowledge, there are other plausible alternative DAGs but no currently available methods to simultaneously test these possibilities.
CT is the best noninvasive method for detecting emphysema. However, CT has several drawbacks, including increased costs, radiation exposure, and high rates of unrelated false-positive findings (5). Accurate risk prediction tools that use the best available data sources to stratify patients on the basis of their specific risk profiles could help with more efficient early and targeted interventions. Until recently, such prediction models were only created using data from a single omics type with or without standard clinical features (56–59). In this first study to use genes, alternative splicing, and proteins combined with clinical and CBC predictors, we developed models that could classify upper and lower tertiles of emphysema severity with good accuracy. Although alternative splicing predictors were worth exploring, gene data had a higher AUROC. Protein predictors, when combined with clinical + CBC predictors, yielded the best AUROC across all models. From the top 10 predictors of the clinical + CBC + gene + protein model, sRAGE, which minimizes tissue injury and inflammation, has consistently been recognized as a candidate emphysema biomarker (5, 10, 60, 61). Not previously connected to emphysema, PSMP has been implicated in inflammation and cancer development (62, 63), and MIR124-1HG has been shown to affect Wnt signaling and inflammation (64–67). Their putative roles and functions in emphysema require further investigation.
Our prediction models demonstrated reasonable predictive performance not only in internal validation (COPDGene testing sample) but also in external validation (ECLIPSE), despite some differences in data quality and data acquisition between the two studies. Furthermore, employing bootstrapping allowed more robust estimations. Additional work needs to be done in larger independent cohorts. Therefore, the predictive models from this study are not ready for clinical use but may be useful in designing emphysema clinical trials to enrich the study populations of patients who are most likely to benefit from therapeutic interventions. For clinical use, better-performing models that have been validated more extensively in multiple additional and relevant target populations are necessary.
Strengths and Limitations
This study has several strengths. The large sample size allowed us to identify many more significant associations than any previous study, and we could split our sample to allow the validation of our findings. This is the first study that has examined alternative splicing mechanisms in emphysema in addition to DGE and protein association analyses. The limma-voom method that we used in our analyses is a standard approach for RNA-seq that has been shown to perform favorably with respect to both edgeR and DESeq2 (42). DESeq2 and edgeR may underestimate FDR because of violations of the negative binomial assumption in large datasets (68), whereas limma-voom has been shown to consistently yield accurate FDR estimates by correctly modeling the mean–variance relationship rather than making specific distributional assumptions (69, 70). A comparison of software packages for detecting differential expression in RNA-seq studies has also suggested limma-voom as a robust method that performs generally well under many circumstances while being computationally fastest to run (71). The variables adjusted for as covariates in our biomarker discovery analyses were selected on the basis of their clinical relevance to emphysema and their frequent use in previous COPD studies (32, 72). Because emphysema often co-occurs with low BMI, we performed mediation analyses to better understand the relationship between molecular markers, emphysema, and BMI, providing suggestive evidence of shared biology between emphysema and BMI. Finally, we were able to improve emphysema prediction models with the use of multiomics data.
This study also has several limitations. CBC quantifications do not capture the variability of the immune cell subpopulations, which limits the ability to localize these effects to specific cell types. Future studies may address this by using single-cell transcriptomic data. Limitations to the SomaScan proteomics include the lack of SOMAmers for small molecules such as desmosine, fibrinogen degradation product (Aa-Val360, a specific product generated by elastase cleavage of fibrinogen), and sphingomyelin, which have been suggested to be emphysema biomarkers in other studies (40, 67, 73, 74). Next, the mediation analysis needs to be viewed as hypothesis generating because it is based on a number of assumptions. Last, it is important to note that the observed improvement in the AUROC when adding transcriptomic and proteomic biomarkers to clinical + CBC was relatively modest. Although blood-based biomarkers may provide some incremental benefit, the overall enhancement in predictive performance might be limited. Caution is needed when interpreting the implications and clinical significance of the predictive modeling.
Conclusions
Our transcriptomic and proteomic analyses yielded numerous inflammatory and cell differentiation pathways connected to emphysema as well as novel potential blood biomarkers of the disease. Although not yet ready to be used in clinical practice, with further validation, our prediction model might be helpful as a less invasive indicator of emphysema severity that could guide patient enrollment in clinical trials. Future research is necessary to compare blood and lung tissue biomarkers, understand how they change as emphysema progresses, and evaluate the impact of implementing these predictive models to personalize and improve patient care.
Acknowledgments
COPDGene investigators, core units:
Administrative center: James D. Crapo, M.D. (principal investigator); Edwin K. Silverman, M.D., Ph.D. (principal investigator); Barry J. Make, M.D.; Elizabeth A. Regan, M.D., Ph.D.
Genetic analysis center: Terri H. Beaty, Ph.D.; Peter J. Castaldi, M.D., M.Sc.; Michael H. Cho, M.D., M.P.H.; Dawn L. DeMeo, M.D., M.P.H.; Adel Boueiz, M.D., M.M.Sc.; Marilyn G. Foreman, M.D., M.S.; Auyon Ghosh, M.D.; Lystra P. Hayden, M.D., M.M.Sc.; Craig P. Hersh, M.D., M.P.H.; Jacqueline Hetmanski, M.S.; Brian D. Hobbs, M.D., M.M.Sc.; John E. Hokanson, M.P.H., Ph.D.; Wonji Kim, Ph.D.; Nan Laird, Ph.D.; Christoph Lange, Ph.D.; Sharon M. Lutz, Ph.D.; Merry-Lynn McDonald, Ph.D.; Dmitry Prokopenko, Ph.D.; Matthew Moll, M.D., M.P.H.; Jarrett Morrow, Ph.D.; Dandi Qiao, Ph.D.; Elizabeth A. Regan, M.D., Ph.D.; Aabida Saferali, Ph.D.; Phuwanat Sakornsakolpat, M.D.; Edwin K. Silverman, M.D., Ph.D.; Emily S. Wan, M.D.; Jeong Yun, M.D., M.P.H.
Imaging center: Juan Pablo Centeno; Jean-Paul Charbonnier, Ph.D.; Harvey O. Coxson, Ph.D.; Craig J. Galban, Ph.D.; MeiLan K. Han, M.D., M.S.; Eric A. Hoffman; Stephen Humphries, Ph.D.; Francine L. Jacobson, M.D., M.P.H.; Philip F. Judy, Ph.D.; Ella A. Kazerooni, M.D.; Alex Kluiber; David A. Lynch, M.B.; Pietro Nardelli, Ph.D.; John D. Newell, Jr., M.D.; Aleena Notary; Andrea Oh, M.D.; Elizabeth A. Regan, M.D., Ph.D.; James C. Ross, Ph.D.; Raul San Jose Estepar, Ph.D.; Joyce Schroeder, M.D.; Jered Sieren; Berend C. Stoel, Ph.D.; Juerg Tschirren, Ph.D.; Edwin Van Beek, M.D., Ph.D.; Bram van Ginneken, Ph.D.; Eva van Rikxoort, Ph.D.; Gonzalo Vegas Sanchez Ferrero, Ph.D.; Lucas Veitel; George R. Washko, M.D.; Carla G. Wilson, M.S.
Pulmonary function test quality assurance center, Salt Lake City, UT: Robert Jensen, Ph.D.
Data coordinating center and biostatistics, National Jewish Health, Denver, CO: Douglas Everett, Ph.D.; Jim Crooks, Ph.D.; Katherine Pratte, Ph.D.; Matt Strand, Ph.D.; Carla G. Wilson, M.S.
Epidemiology core, University of Colorado Anschutz Medical Campus, Aurora, CO: John E. Hokanson, M.P.H., Ph.D.; Erin Austin, Ph.D.; Gregory Kinney, M.P.H., Ph.D.; Sharon M. Lutz, Ph.D.; Kendra A. Young, Ph.D.
Mortality adjudication core: Surya P. Bhatt, M.D.; Jessica Bon, M.D.; Alejandro A. Diaz, M.D., M.P.H.; MeiLan K. Han, M.D., M.S.; Barry Make, M.D.; Susan Murray, Sc.D.; Elizabeth Regan, M.D.; Xavier Soler, M.D.; Carla G. Wilson, M.S.
Biomarker core: Russell P. Bowler, M.D., Ph.D.; Katerina Kechris, Ph.D.; Farnoush Banaei Kashani, Ph.D.
COPDGene Investigators, clinical centers:
Veterans Affairs Ann Arbor Healthcare System: Jeffrey L. Curtis, M.D.; Perry G. Pernicano, M.D.
Baylor College of Medicine, Houston, TX: Nicola Hanania, M.D., M.S.; Mustafa Atik, M.D.; Aladin Boriek, Ph.D.; Kalpatha Guntupalli, M.D.; Elizabeth Guy, M.D.; Amit Parulekar, M.D.
Brigham and Women’s Hospital, Boston, MA: Dawn L. DeMeo, M.D., M.P.H.; Craig Hersh, M.D., M.P.H.; Francine L. Jacobson, M.D., M.P.H.; George Washko, M.D.
Columbia University, New York, NY: R. Graham Barr, M.D., Dr.P.H.; John Austin, M.D.; Belinda D’Souza, M.D.; Byron Thomashow, M.D.
Duke University Medical Center, Durham, NC: Neil MacIntyre, Jr., M.D.; H. Page McAdams, M.D.; Lacey Washington, M.D.
HealthPartners Research Institute, Minneapolis, MN: Charlene McEvoy, M.D., M.P.H.; Joseph Tashjian, M.D.
Johns Hopkins University, Baltimore, MD: Robert Wise, M.D.; Robert Brown, M.D.; Nadia N. Hansel, M.D., M.P.H.; Karen Horton, M.D.; Allison Lambert, M.D., M.H.S.; Nirupama Putcha, M.D., M.H.S.
Lundquist Institute for Biomedical Innovation at Harbor UCLA Medical Center, Torrance, CA: Richard Casaburi, Ph.D., M.D.; Alessandra Adami, Ph.D.; Matthew Budoff, M.D.; Hans Fischer, M.D.; Janos Porszasz, M.D., Ph.D.; Harry Rossiter, Ph.D.; William Stringer, M.D.
Michael E. DeBakey Veterans Affairs Medical Center, Houston, TX: Amir Sharafkhaneh, M.D., Ph.D.; Charlie Lan, D.O.
Minneapolis Veterans Affairs Healthcare System: Christine Wendt, M.D.; Brian Bell, M.D.; Ken M. Kunisaki, M.D., M.S.
Morehouse School of Medicine, Atlanta, GA: Eric L. Flenaugh, M.D.; Hirut Gebrekristos, Ph.D.; Mario Ponce, M.D.; Silanath Terpenning, M.D.; Gloria Westney, M.D., M.S.
National Jewish Health, Denver, CO: Russell Bowler, M.D., Ph.D.; David A. Lynch, M.B.
Reliant Medical Group, Worcester, MA: Richard Rosiello, M.D.; David Pace, M.D.
Temple University, Philadelphia, PA: Gerard Criner, M.D.; David Ciccolella, M.D.; Francis Cordova, M.D.; Chandra Dass, M.D.; Gilbert D’Alonzo, D.O.; Parag Desai, M.D.; Michael Jacobs, Pharm.D.; Steven Kelsen, M.D., Ph.D.; Victor Kim, M.D.; A. James Mamary, M.D.; Nathaniel Marchetti, D.O.; Aditi Satti, M.D.; Kartik Shenoy, M.D.; Robert M. Steiner, M.D.; Alex Swift, M.D.; Irene Swift, M.D.; Maria Elena Vega-Sanchez, M.D.
University of Alabama, Birmingham, AL: Mark Dransfield, M.D.; William Bailey, M.D.; Surya P. Bhatt, M.D.; Anand Iyer, M.D.; Hrudaya Nath, M.D.; J. Michael Wells, M.D.
University of California, San Diego, CA: Douglas Conrad, M.D.; Xavier Soler, M.D., Ph.D.; Andrew Yen, M.D.
University of Iowa, Iowa City, IA: Alejandro P. Comellas, M.D.; Karin F. Hoth, Ph.D.; John Newell, Jr., M.D.; Brad Thompson, M.D.
University of Michigan, Ann Arbor, MI: MeiLan K. Han, M.D., M.S.; Ella Kazerooni, M.D., M.S.; Wassim Labaki, M.D., M.S.; Craig Galban, Ph.D.; Dharshan Vummidi, M.D.
University of Minnesota, Minneapolis, MN: Joanne Billings, M.D.; Abbie Begnaud, M.D.; Tadashi Allen, M.D.
University of Pittsburgh, Pittsburgh, PA: Frank Sciurba, M.D.; Jessica Bon, M.D.; Divay Chandra, M.D., M.Sc.; Joel Weissfeld, M.D., M.P.H.
University of Texas Health, San Antonio, San Antonio, TX: Antonio Anzueto, M.D.; Sandra Adams, M.D.; Diego Maselli-Caceres, M.D.; Mario E. Ruiz, M.D.; Harjinder Singh, M.D.
A complete list of COPDGene study group members may be found before the beginning of the References.
Footnotes
This work was supported by National Heart, Lung, and Blood Institute grants K08 HL141601, K08 HL146972, K08 HL136928, K01 HL157613, R01 HL167072, R01 HL124233, U01 HL089897, R01 HL147326, R01 HL133135, P01 HL114501, and U01 HL089856. The COPDGene study (NCT00608764) is also supported by the COPD Foundation through contributions made to an industry advisory committee composed of AstraZeneca, Bayer Pharmaceuticals, Boehringer-Ingelheim, Genentech, GlaxoSmithKline, Novartis, Pfizer, and Sunovion. The ECLIPSE study (NCT00292552; GSK code SCO104960) was funded by GlaxoSmithKline.
Author Contributions: A.B. and P.J.C. had full access to all the data in the study, take responsibility for the integrity of the data and the accuracy of the data analysis, and had authority over manuscript preparation and the decision to submit the manuscript for publication. Study concept and design: A.B., P.J.C., and Z.X. Acquisition, analysis, or interpretation of data: All authors. Drafting of the manuscript: R.S., A.B., and P.J.C. Critical revision of the manuscript for important intellectual content: All authors. Statistical analysis: Z.X., S.M.L., P.J.C., and A.B. Obtained funding: A.B., P.J.C., and E.K.S. Study supervision: All authors. All authors gave final approval of the version to be published.
This article has an online supplement, which is accessible from this issue’s table of contents at www.atsjournals.org.
Originally Published in Press as DOI: 10.1164/rccm.202301-0067OC on November 2, 2023
Author disclosures are available with the text of this article at www.atsjournals.org.
Data Availability Statement
The raw RNA-seq and proteomic data for COPDGene were made available through dbGaP (COPDGene accession number phs000765.v3.p2). Additional methods are available in the online supplement.






