Abstract
Rationale: Disease activity in idiopathic pulmonary fibrosis (IPF) remains highly variable, poorly understood, and difficult to predict.
Objectives: To identify a predictor using short-term longitudinal changes in gene expression that forecasts future FVC decline and to characterize involved pathways and cell types.
Methods: Seventy-four patients from the COMET (Correlating Outcomes with Biochemical Markers to Estimate Time-Progression in IPF) cohort were dichotomized as progressors (≥10% FVC decline) or stable. Blood gene-expression changes within individuals were calculated between baseline and 4 months and regressed with future FVC status, allowing determination of expression variations, sample size, and statistical power. Pathway analyses were conducted to predict downstream effects and identify new targets. An FVC predictor for progression was constructed in COMET and validated using independent cohorts. Peripheral blood mononuclear single-cell RNA-sequencing data from healthy control subjects were used as references to characterize cell type compositions from bulk peripheral blood mononuclear RNA-sequencing data that were associated with FVC decline.
Measurements and Main Results: The longitudinal model reduced gene-expression variations within stable and progressor groups, resulting in increased statistical power when compared with a cross-sectional model. The FVC predictor for progression anticipated patients with future FVC decline with 78% sensitivity and 86% specificity across independent IPF cohorts. Pattern recognition receptor pathways and mTOR pathways were downregulated and upregulated, respectively. Cellular deconvolution using single-cell RNA-sequencing data identified natural killer cells as significantly correlated with progression.
Conclusions: Serial transcriptomic change predicts future FVC decline. An analysis of cell types involved in the progressor signature supports the novel involvement of natural killer cells in IPF progression.
Keywords: longitudinal changes of blood gene expression, relative decline of FVC, idiopathic pulmonary fibrosis, multigene predictor for progression, cell type composition deconvolution
At a Glance Commentary
Scientific Knowledge on the Subject
The inability to predict short-term FVC change has impacted therapeutic development, as large sample sizes are required for this rare disease to ensure that a sufficient number of patients progress during the trial follow-up period.
What This Study Adds to the Field
Our results support a method for using longitudinal transcriptomic changes to predict disease progression in idiopathic pulmonary fibrosis. The pathways involved identify mechanisms recognized in fibrosis. The role of natural killer cells in this process is novel.
Idiopathic pulmonary fibrosis (IPF) is a deadly and progressive scarring lung disease (1). The heterogeneity of disease progression suggests that pathogenic activity varies over time and by individual. Clinically, disease activity is commonly measured by a change in physiologic measures such as the FVC and the DlCO. Longitudinal change in FVC has been successfully used as an endpoint in IPF clinical trials (2, 3). Cross-sectional–based models reliably predict long-term IPF outcomes; however, they fail to predict the short-term change in FVC. Clinical prediction models reliably predict increased mortality risk but fail to accurately predict FVC decline (4). The inability to predict short-term FVC change has impacted therapeutic development, as large sample sizes are required for this rare disease to ensure that a sufficient number of patients progress during the trial follow-up period.
We previously showed that a cross-sectional 52-gene signature from peripheral blood successfully predicted transplant-free survival in IPF cohorts around the world (5, 6) and was largely driven by adaptive immune response pathways. Like prior clinical studies, this and other cross-sectional transcriptomic signatures were less reliable in predicting baseline and short-term change in FVC (7). Similarly, cross-sectional evaluation of plasma and serum surrogates for predicting change in FVC has met with limited success (8–10).
We hypothesized that longitudinal changes in peripheral blood gene expression (GE) would be predictive of future and not past FVC decline. We used these findings to derive a longitudinal expression-based predictor of FVC decline, compared this approach to the cross-sectional model, and validated our FVC predictor in independent IPF cohorts. We then determined pathways and cell types underpinning this gene predictor of FVC decline. Some of the results of this study have previously been reported in the form of an abstract (11).
Methods
IPF Cohorts and Transcriptome Data Collection
The COMET (Correlating Outcomes with Biochemical Markers to Estimate Time-Progression in IPF) training cohort (n = 74) was a prospective observational trial with follow-up visits every 4 months for 1 year (7) (NCT01071707). Validation cohorts included prospectively enrolled patients at the University of Chicago (UChicago) (n = 27) and Imperial College London (Imperial) (n = 24) (6). The study was approved by institutional review boards and ethics committees at participating centers. All participants provided written informed consent. Patients with IPF were diagnosed according to international guidelines (1, 12). The presence of progressive disease was defined as ≥10% relative decline in FVC over 12 months in the training cohort. Patients in the independent validation cohorts were included if they had two serial blood draws, a baseline pulmonary function test (PFT), and a subsequent PFT performed after the second draw, so that prediction was possible. Each patient’s progression status was not reclassified during the follow-up period after the index progression time point if progression occurred. This was to maintain the baseline %FVC and progressor versus stable categorization. COMET and Imperial transcriptome data were generated from peripheral blood mononuclear cells (PBMCs) and whole blood, respectively, using the Affymetrix PrimeView Array and the Affymetrix Human Gene 1.1 ST Array, respectively. UChicago transcriptome data were generated from bulk PBMC RNA-sequencing (PBMC RNA-seq). Detailed clinical demographics (see Table E1 in the online supplement) and transcriptome data collection for each cohort are in the online supplement.
Comparison of Longitudinal and Cross-sectional Models
The coefficient of variation (CoV) defined as the ratio of gene-specific SD to the mean was computed using baseline GE (CoVGE) for cross-sectional model and using 0–4 months GE changes (CoVΔGE) for the longitudinal model. Intrasubject CoV was defined as:
where d is the difference between two paired measurements and m is the mean of paired measurements (13). Sample size and statistical power were calculated with the assumption of a completely randomized two-group design using R/CRAN package “sizepower” (14). Intrasubject CoV of progressor and stable patients in each COMET subset cohort with various transcriptomic sampling starting time points and intervals was also assessed for homogeneity of GE changes.
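The two dispersion measures described above can be illustrated with a minimal Python sketch. The gene-specific CoV follows the stated definition (SD divided by the mean). For the intrasubject CoV, the sketch assumes the commonly used root-mean-square estimator sqrt(Σ(d/m)² / 2n), which is consistent with the definitions of d and m given here but is an assumption about the exact equation. Function names and input values are hypothetical and are not study data.

```python
import math
import statistics


def gene_cov(values):
    """Coefficient of variation: sample SD divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CoV is undefined when the mean is zero")
    return statistics.stdev(values) / mean


def intrasubject_cov(pairs):
    """Root-mean-square within-subject CoV from paired measurements.

    For each pair, d is the difference and m is the mean of the two values.
    The form sqrt(sum((d / m) ** 2) / (2 * n)) is an assumed estimator.
    """
    ratios = []
    for first, second in pairs:
        d = first - second
        m = (first + second) / 2.0
        ratios.append((d / m) ** 2)
    return math.sqrt(sum(ratios) / (2 * len(ratios)))


# Hypothetical log-expression values for one gene in four patients.
baseline = [8.0, 8.4, 7.6, 8.2]
month_4 = [8.2, 8.3, 7.9, 8.1]

print(round(gene_cov(baseline), 4))
print(round(intrasubject_cov(list(zip(baseline, month_4))), 4))
```

A smaller value from either function indicates more homogeneous expression, which is the property the comparison of models relies on.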
Construction of an FVC Predictor for Progression in Training and Validation in Independent Cohorts
Figure 1 illustrates the steps used in constructing the FVC predictor for progression in the training cohort and subsequent validation in independent cohorts. Using the Bioconductor package “limma” to implement the empirical Bayesian-moderated t test in R (15), we derived the within-patient GE changes between the baseline and 4-month visit (ΔGE0–4-mo) for each patient in the COMET training cohort and then compared between stable and progressive groups defined by < or ⩾10% FVC decline status, which is one of the selected longitudinal features commonly observed in clinical practice associated with increased mortality in patients with IPF (1). P values were adjusted for multiple comparisons using the Benjamini-Hochberg method (16). The significant genes were defined as false discovery rate (FDR) <5% and fold change >2. Pathways and network analyses were performed using Ingenuity Pathway Analysis software with Fisher exact test. The R package “glmnet” (17–19) was used to perform logistic least absolute shrinkage and selection operator (LASSO) regression to enhance the prediction accuracy via variable selection and regularization. Tenfold cross-validation (CV) was performed in conjunction with LASSO regression to evaluate classification rate accuracy. Genes were filtered using P < 0.05 and CV support ⩾50% criteria to compile a final list of genes as the FVC predictor for progression. The equation to calculate the predicted FVC-predictor score is defined as:
where Bi is the LASSO regression coefficients derived from the training cohort, Xi is the logarithmic GE values in the test cohort, and N is the number of the matched FVC predictor genes in the test cohort. Receiver operator analysis was performed using R-CRAN package “pROC” (20) and “OptimalCutpoints” (21).
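As a hedged illustration of how a score could be built from Bi, Xi, and N, the Python sketch below assumes a plain weighted sum of coefficient times expression value over the N predictor genes that are matched in the test cohort. The exact form of the equation is an assumption. The three coefficients are copied from Table 2; the function name and the patient values are hypothetical.

```python
def fvc_predictor_score(coefficients, expression):
    """Sum B_i * X_i over predictor genes that are present in the test data.

    Returns the score and the number N of matched genes. The unnormalized
    weighted-sum form is an assumption made for illustration.
    """
    matched = [gene for gene in coefficients if gene in expression]
    score = sum(coefficients[gene] * expression[gene] for gene in matched)
    return score, len(matched)


# Three coefficients copied from Table 2; the full predictor has 25 genes.
lasso_coefficients = {"CNR2": -1.066, "ITLN1": 3.440, "MSR1": 2.157}

# Hypothetical log-scale values for one patient, not study data.
# MSR1 is left out to mimic a gene that is missing on another platform.
patient_values = {"CNR2": 0.5, "ITLN1": 0.25}

score, n_matched = fvc_predictor_score(lasso_coefficients, patient_values)
print(n_matched, round(score, 3))
```

Restricting the sum to matched genes mirrors the cross-platform situation in which only a subset of the 25 genes is available in a validation cohort.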
Figure 1.

Flowchart of development and validation of the FVC predictor for progression. (A) COMET training cohort. Steps of identifying the FVC predictor consisting of 25 genes predictive of future FVC decline status using a short-term longitudinal (0–4 mo) within-patient gene-expression changes (ΔGE0–4-mo) model are shown. (B) Independent validation cohorts and COMET subset cohorts with different transcriptome assay platforms. (B1) Longitudinal gene-expression changes in the independent validation cohorts were calculated between baseline and the median of immediate follow-up sampling time points. “Cross-GE-platform-gene-matched” step comparing varied transcriptome assay platforms retained 23 and 18 of the 25 genes in the FVC predictor from the Imperial College London and the University of Chicago (UChicago) cohorts, respectively. Scores were determined as continuous values by a least-squares multiple regression model for each matched gene in each cohort and regressed with future FVC decline. (B2) Longitudinal gene-expression changes in three COMET time point subsets (ΔGE0–8-mo, ΔGE4–8-mo, and ΔGE0–12-mo) were calculated and regressed with future FVC decline. ROC/AUC analysis was used to test the prognosis prediction efficiency for both internal and external cohorts. (C) Ingenuity pathway analyses were applied to significant genes between stable and progressors with false discovery rate (FDR) < 0.05 and fold change > 2.
AUC = area under the curve; COMET = Correlating Outcomes with Biochemical Markers to Estimate Time-Progression in Idiopathic Pulmonary Fibrosis; ΔGE0–4-mo = gene-expression changes between the baseline and 4-month visit; ΔGE0–8-mo = gene-expression changes between the baseline and 8-month visit; ΔGE0–12-mo = gene-expression changes between the baseline and 12-month visit; ΔGE4–8-mo = gene-expression changes between the 4-month and 8-month visit; LASSO = Logistic Least Absolute Shrinkage and Selection Operator; PBMC = peripheral blood mononuclear cell; ROC = receiver operating characteristic.
Deconvolution of Bulk UChicago Longitudinal PBMC RNA-seq
The PBMC single-cell RNA (scRNA)-seq data set of healthy donors was freely downloaded from Broad Institute Single Cell Portal and used as a reference to perform cell type deconvolution of bulk UChicago longitudinal PBMC RNA-seq. Method details are provided in the online supplement.
Results
The demographic and clinical features of each cohort are shown in Table 1. In the COMET training cohort, 22% (16/74) experienced ⩾10% FVC decline, suggestive of active fibrotic pathogenesis in the disease, and were characterized as progressor patients, whereas the prevalence for ⩾10% FVC decline was 30% and 63% for UChicago and Imperial cohorts, respectively. These higher rates likely reflect the effect of longer follow-up times. The median prediction time from the second blood draw to the PFT follow up was approximately 12 months in UChicago and approximately 6 months in Imperial. No significant differences in demographics or lung function were noted in the training and validation cohorts when stratified by 10% FVC decline status (Table E2).
Table 1.
Clinical Demographics in Idiopathic Pulmonary Fibrosis Study Cohorts
| Clinical Demographics | Training: COMET | Independent Validation: Imperial | Independent Validation: UChicago |
|---|---|---|---|
| Sample size | 74 | 24 | 27 |
| GE sampling from baseline to median of immediate follow-up time point (IQR) | 4 mo (n/a)* | 6 mo (n/a)* | 16.6 mo (9.3–25.9) |
| FVC follow-up median (IQR) | 12 mo (n/a)* | 12 mo (7.8–16.2) | 28 mo (20.2–35.8) |
| Sex, M/F | 52/22 | 18/6 | 19/8 |
| White | 94.6% | 94.3% | 85.2% |
| Smokers | 66.2% | 57.1% | 48.1% |
| Age, mean ± SD | 66.6 ± 7.6 | 66.8 ± 7.2 | 66.1 ± 6.5 |
| Baseline FVC % predicted, mean ± SD | 69.7 ± 18.4 | 73.3 ± 19.2 | 64.2 ± 16.2 |
| Stable/progressor (by 10% FVC) | 58/16 | 9/15 | 19/8 |
| Stable/progressor (by 15% DlCO) | 48/26 | 5/17† | 9/14† |
Definition of abbreviations: COMET = Correlating Outcomes with Biochemical Markers to Estimate Time-Progression in Idiopathic Pulmonary Fibrosis; GE = gene expression; Imperial = Imperial College London; IQR = interquartile range; n/a = not applicable; UChicago = University of Chicago.
*Fixed blood draw (i.e., gene expression) sampling time point.
†Contains missing values.
We assessed the homogeneity of GE in the cross-sectional and longitudinal models using the CoV, which expresses dispersion as the ratio of the SD to the mean. The minus versus average graphs (Figure 2) show what percentage of genes have greater expression variation in the cross-sectional model, with genes classified using CoVGE–CoVΔGE > 0. Among patients with stable status, 89% of the genes had greater variations in their cross-sectional GE than in the corresponding longitudinal GE changes (ΔGE), that is, demonstrated positive values in Figure 2A. Similarly, 67.5% of genes demonstrated greater variations in cross-sectional GE than in the longitudinal ΔGE in the progressor group (Figure 2B). Consistent with the lower gene variation observed in the longitudinal ΔGE data than in the cross-sectional GE data, our statistical power calculation demonstrated that the longitudinal model required only 16 cases to achieve 90% power (1–β) at a significance level of α = 0.05 (Figure 2C). This was in contrast to the cross-sectional model, which required 63 progressors for the same performance. These results justified our use of COMET as the training cohort (16 progressors) for constructing the predictor. We further examined the intrasubject CoV between progressor and stable patients in all three COMET time point subsets (ΔGE0–4-mo, ΔGE4–8-mo, and ΔGE8–12-mo), and the results remained consistent, with the progressor group retaining its homogeneity of ΔGE variations (34–40%) (Figure 2D). Genes may be more tightly regulated in patients experiencing progression than in those with stable disease. This is supported by the observation of fewer intrapatient variations of ΔGE in the progressor group compared with the stable group. These molecular findings provided the rationale for, and supported the feasibility of, using longitudinal GE modeling for the prediction of future FVC decline.
Because any differences in variations at different sampling time points are subtle, we retained the most clinically applicable timeframe, ΔGE0–4-mo, to develop the FVC predictor for progression.
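The link between lower variation and a smaller required sample size can be illustrated with a textbook calculation. The Python sketch below uses the normal-approximation formula for a two-sided, two-sample comparison, n = 2((z₁₋α/₂ + z₁₋β)·SD/δ)² per group. It is not the “sizepower” calculation used in the study, and the SDs and effect size are hypothetical values chosen only to show the direction of the effect.

```python
import math
from statistics import NormalDist


def per_group_sample_size(sd, delta, alpha=0.05, power=0.90):
    """Per-group n for a two-sided, two-sample z-test with equal variances.

    Textbook normal approximation: n = 2 * ((z_a + z_b) * sd / delta) ** 2.
    This is a generic illustration, not the "sizepower" method.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Hypothetical SDs and effect size on the log scale, not study estimates.
n_cross_sectional = per_group_sample_size(sd=1.0, delta=1.0)
n_longitudinal = per_group_sample_size(sd=0.5, delta=1.0)

print(n_cross_sectional, n_longitudinal)
```

Because the required n scales with the square of the SD, halving the variation cuts the sample size to roughly one quarter at the same α and power.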
Figure 2.
Coefficient of variation (CoV) analyses and power estimation of COMET (Correlating Outcomes with Biochemical Markers to Estimate Time-Progression in Idiopathic Pulmonary Fibrosis) cohort data. The CoV, defined as the ratio of gene-specific SD to the mean, was computed using baseline gene expression (CoVGE) for the cross-sectional model and 0- to 4-month gene-expression change (CoVΔGE) for the longitudinal model in stable and progressor groups of the COMET training cohort, respectively. (A and B) Minus versus average plots assess the portion of the genes with greater within-group variations, as determined by CoVGE–CoVΔGE > 0. (A) Eighty-nine percent of the genes have CoVGE–CoVΔGE > 0, indicating that variations within the stable group are greater in the cross-sectional model than in the longitudinal change model. (B) Sixty-seven and a half percent of the genes have CoVGE–CoVΔGE > 0, indicating that variations within the progressor group are greater in the cross-sectional model than in the longitudinal changes model. (C) Power estimation on the basis of postulated sample sizes of peripheral blood mononuclear transcriptome. The horizontal dotted line indicates a power of 0.9, at an α of 0.05, whereas the corresponding sample size is 63 for baseline GE and 16 for GE changes (ΔGE). (D) Intrasubject CoV of ΔGE analysis across different peripheral blood mononuclear sampling time in COMET. The black bar represents ΔGE with larger intrasubject CoV in progressor than in stable patients, and the gray bar represents ΔGE with larger intrasubject CoV in stable patients than in progressor patients. The results remained consistent with the progressor group retaining fewer variations (34–40%). GE = gene expression.
To construct the FVC predictor for progression, 19,394 annotated genes were included from the training cohort. Comparing the progressors (n = 16) with stable patients (n = 58) identified 3,906 genes at FDR < 0.05. A total of 418 of these genes had a fold change >2. Overall, 167 and 251 genes demonstrated increased or decreased expression over time, respectively, and were subjected to pathway analysis. Multiple innate immune pathways, including mTOR signaling, SAPK/JNK, CD40, IL-8, and IL-23 signaling, were upregulated (Figure 3A), whereas pattern recognition receptors of bacteria and viruses, TREM1, and IFN signaling were downregulated (Figure 3B). Genes involved in each pathway are listed in Table E3. In addition, network analysis revealed the downregulated genes with EGR1 at the hub center (Figure 3C). LASSO regression further reduced the genes to 39, which correctly classified 88% (n = 65/74) of patients in the training set. Genes for the FVC predictor were prioritized based on regression using criteria of P < 0.05 and 10-fold CV support ⩾50%, which resulted in 25 genes that compose the FVC predictor for progression (Table 2). We used a least-squares multiple regression model to summarize the prediction score for FVC decline. Multicollinearity occurs when independent variables in a multiple regression model are correlated. The robustness of prediction fitted by a multiple regression model can be highly affected or biased by the multicollinearity of the predictor genes. Therefore, we inspected the correlation matrix of the 25 genes in the COMET training cohort. The average of absolute correlation coefficients between any gene pairs was 0.19. The highest correlation coefficient of 0.45 was only observed between LINC00319 and FAM111B (Figure E1).
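The gene-filtering step (Benjamini-Hochberg adjustment followed by FDR and fold-change thresholds) can be sketched in Python as follows. This is an illustration of the general technique, not the limma workflow used in the study. The function names and toy values are hypothetical, and the sketch assumes fold change is expressed as a ratio, so that a twofold change in either direction means a ratio above 2 or below 0.5.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


def select_genes(genes, p_values, fold_changes, fdr=0.05, min_fold=2.0):
    """Keep genes with adjusted p < fdr and a fold change beyond min_fold."""
    adjusted = benjamini_hochberg(p_values)
    return [
        gene
        for gene, q, fc in zip(genes, adjusted, fold_changes)
        if q < fdr and (fc > min_fold or fc < 1.0 / min_fold)
    ]


# Hypothetical genes, p values, and fold-change ratios; not study data.
genes = ["g1", "g2", "g3", "g4"]
p_values = [0.001, 0.01, 0.03, 0.5]
fold_changes = [2.5, 0.4, 1.2, 3.0]

selected = select_genes(genes, p_values, fold_changes)
print(selected)
```

In this toy input, one gene passes the FDR threshold but fails the fold-change threshold, and another has a large fold change but fails the FDR threshold, so both filters are needed to reach the final list.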
Figure 3.
Ingenuity pathway analyses. A two-group comparison with criteria of FDR < 5% and fold change > 2 identified 167 genes with ΔGE0–4-mo higher in progressor than in stable patients and 251 genes with ΔGE0–4-mo lower in progressor than in stable patients. (A and B) Multiple innate immune pathways, including mTOR signaling, SAPK/JNK, CD40, IL-8, and IL-23 signaling, were upregulated (A), whereas pattern recognition receptors of bacteria and viruses, TREM1, and IFN signaling were downregulated (B). (C) Network analysis for the downregulated genes with EGR1 at the hub center. ΔGE0–4-mo = gene-expression changes between the baseline and 4-month visit.
Table 2.
List of 25 FVC Predictors for Progression Derived from COMET Training Cohort and Their Corresponding LASSO Regression Coefficient and Percentage of Tenfold CV Support (≥50% CV)
| Gene Symbol* | Gene Description | LASSO Regression Coefficient | % CV Support | Matched in Imperial Cohort† | Matched in UChicago Cohort† |
|---|---|---|---|---|---|
| CNR2 | Cannabinoid receptor 2 | −1.066 | 100 | § | ‖ |
| ITLN1 | Intelectin 1 | 3.440 | 100 | § | ‖ |
| LINC00319 | Long intergenic nonprotein coding RNA 319 | 4.758 | 100 | § | _ |
| PCDHB15 | Protocadherin β 15 | 3.521 | 100 | § | ‖ |
| RAB3C | RAB3C, RAS oncogene family | −2.218 | 100 | § | ‖ |
| MSR1 | Macrophage scavenger receptor 1 | 2.157 | 90 | § | ‖ |
| SSU72P8 | SSU72 pseudogene 8 | −1.367 | 90 | _ | _ |
| MAZ | MYC associated zinc finger protein | 2.188 | 80 | § | ‖ |
| NT5E | 5′-nucleotidase ecto | −3.361 | 80 | § | ‖ |
| PLA2G4A | Phospholipase A2 group IVA | −0.542 | 80 | § | ‖ |
| ATP6AP1L | ATPase H+ transporting accessory protein 1 like | −2.080 | 70 | § | _ |
| IGLC1 | Immunoglobulin lambda constant 1 | −0.0005 | 70 | _ | _ |
| TP63 | Tumor protein p63 | 1.710 | 70 | § | ‖ |
| ZNF252P | Zinc finger protein 252, pseudogene | −0.289 | 70 | § | ‖ |
| APTX | Aprataxin | −1.380 | 60 | § | ‖ |
| GYPA | Glycophorin A (MNS blood group) | 0.029 | 60 | § | ‖ |
| HBB | Hb subunit β | 0.037 | 60 | § | ‖ |
| PNMA5 | PNMA family member 5 | −0.942 | 60 | § | ‖ |
| RLBP1 | Retinaldehyde binding protein 1 | 0.038 | 60 | § | _ |
| FAM111B | Family with sequence similarity 111 member B | −0.601 | 50 | § | ‖ |
| GABRR1 | γ-aminobutyric acid type A receptor rho1 subunit | 0.058 | 50 | § | _ |
| GPR39 | G protein–coupled receptor 39 | 0.706 | 50 | § | _ |
| PAWR | Proapoptotic WT1 regulator | −0.515 | 50 | § | ‖ |
| PLCL1 | Phospholipase C like 1 (inactive) | 0.617 | 50 | § | ‖ |
| RBM43 | RNA binding motif protein 43 | 0.284 | 50 | § | ‖ |
Definition of abbreviations: COMET = Correlating Outcomes with Biochemical Markers to Estimate Time-Progression in Idiopathic Pulmonary Fibrosis; CV = cross-validation; GE = gene expression; Imperial = Imperial College London; LASSO = Logistic Least Absolute Shrinkage and Selection Operator; UChicago = University of Chicago.
*Genes in FVC predictor derived from COMET training cohort. Detailed prioritization criteria are described in Figure 1A.
†Genes in FVC predictor derived from COMET training cohort were mapped by “cross-GE-platform-gene-match.”
§Genes mapped to Imperial cohort (details are described in Figure 1B1).
‖Genes mapped to UChicago cohort (details are described in Figure 1B1).
Hierarchical clustering using the 25 genes successfully discriminated FVC decline status while having no association with DlCO decline status (Figure 4A). The principal component analysis map of the training data confirmed the distinct separation of stable and progressive patients (Figure 4B). The principal component analysis variables factor map aligned the direction of association of individual genes with these groups (Figure 4C).
Figure 4.
Classification of COMET (Correlating Outcomes with Biochemical Markers to Estimate Time-Progression in Idiopathic Pulmonary Fibrosis) training cohort using the genes constituting FVC predictor for progression. (A) Hierarchical clustering of the 74 patients with idiopathic pulmonary fibrosis in COMET. All of the 16 FVC progressor (FVC = 1) patients (in blue) were enriched in the bottom cluster, whereas the upper cluster only contained FVC stable (FVC = 0) patients (in green). Red, white, and blue colors indicate gene expression change values above, at, or below the average gene-expression changes of the corresponding gene. (B and C) Principal component analysis of COMET training cohort based on the FVC predictor using R/CRAN package “FactoMineR.” (B) Individual factor map with confidence ellipses around FVC progressor (blue) or stable (green) status. (C) Variables factor map with predictor genes. The projection of the arrowhead of each variable (i.e., gene) onto each dimension represents the component loadings of the corresponding gene. Dimension 1 (Dim 1) and Dimension 2 (Dim 2) represent the amount of the variations contained in the original data set.
To assess the generalizability of the FVC predictor for progression, we conducted a validation analysis using two independent cohorts with each using differing RNA collections and transcriptome platforms. The Imperial cohort used RNA extracted from whole blood and was previously run on a microarray platform (Affymetrix Human Gene 1.1 ST Array). However, the UChicago cohort used RNA isolated from PBMCs, and the transcriptome was determined using a bulk RNA-seq approach. Only 23 and 18 of the 25 genes in the FVC predictor were matched in the Imperial and UChicago cohorts, respectively (Table 2).
We calculated new FVC-GE predictor scores by the matched genes and assessed FVC-predictor performance. UChicago and Imperial cohorts achieved a sensitivity and specificity ranging between 0.75 and 0.78 and 0.90 and 0.78, respectively (Table E4). The positive predictive values were 0.78 and 0.86; the negative predictive values were 0.90 and 0.70. Aggregation of validation cohorts provided a sensitivity/specificity of 78.3%/85.7% with positive predictive values/negative predictive values of 0.82/0.83. The receiver operating characteristic (ROC) analysis revealed areas under the curve (AUCs) of 0.80 and 0.77 for UChicago and Imperial cohorts, respectively, using 10% FVC decline as a predicted outcome (Figure 5A). Examination of a 5% FVC decline as a cutoff was also conducted to ascertain if this smaller level of decline would maintain prediction. ROC analysis results of the independent validation cohorts showed comparable if expected slightly poorer results than the 10% FVC decline ROCs (Figure E2).
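The pooled performance figures can be checked with a short Python sketch. The counts used below (18 of 23 progressors and 24 of 28 stable patients correctly classified) are back-calculated from the reported pooled percentages and the cohort sizes in Table 1. They are an assumption made for illustration, not values taken from Table E4, and the function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard 2x2 diagnostic measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Assumed pooled counts: 23 progressors (15 Imperial + 8 UChicago) and
# 28 stable patients (9 Imperial + 19 UChicago), per Table 1.
pooled = diagnostic_metrics(tp=18, fn=5, tn=24, fp=4)

for name, value in pooled.items():
    print(name, round(value, 3))
```

Under these assumed counts, the four measures round to the pooled sensitivity, specificity, and predictive values reported in the text.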
Figure 5.

Receiver operating characteristic and area under the curve (AUC) analysis of FVC predictor for progression. AUC values with 95% confidence intervals are displayed in the bottom right of each graph. The dashed red line denotes specificity at 75%. (A) Independent validation cohorts. At anchored specificity of approximately 75%, the sensitivities are 75.0% and 80.0% for UChicago and Imperial cohorts, respectively. (B) Training and subset of the COMET (Correlating Outcomes with Biochemical Markers to Estimate Time-Progression in Idiopathic Pulmonary Fibrosis) cohort (I) with increasing transcriptome sampling durations for determination of ΔGE. At anchored specificity of approximately 75%, sensitivities are 100%, 92.3%, and 78.6% for 0–4, 0–8, and 0–12 months, respectively. FVC-predictor performance modestly diminished as the sampling interval increased from 4 months to 8 months and 12 months. (C) Training and subsets of the COMET cohort (II) with 4-month transcriptome sampling but varying baselines for ΔGE determination. At anchored specificity of approximately 75%, sensitivities are 100%, 69.2%, and 36.4% for 0–4, 4–8, and 8–12 months, respectively. Performance diminishes more dramatically, and the use of months 8–12 is ineffective. Detailed receiver operating characteristic/AUC analysis results can be found in Tables E4 and E5. ΔGE = gene-expression changes; Imperial = Imperial College London; UChicago = University of Chicago.
We then evaluated the different sampling starting time points (i.e., ΔGE0–4-mo vs. ΔGE4–8-mo and ΔGE8–12-mo) and interval differences (i.e., ΔGE0–4-mo vs. ΔGE0–8-mo and ΔGE0–12-mo) to better understand the relationship between GE changes and FVC changes. Hierarchical clustering did not reveal appreciable changes, as patients were successfully clustered by FVC decline status and continued to not have any association with DlCO decline status (Figures E3A–E3C). The new FVC predictor for progression score was computed and validated between diverse time points in the COMET cohort (Table E5). ROC analysis of the FVC predictor of progression demonstrated performance decay with an increasing duration between blood sampling, with a decrease in AUC from the training set (0–4 mo) value of 1 to an AUC = 0.9 for 0–8 months and 0.83 for 0–12 month sampling durations (Figure 5B). When the 4–8 month sampling period was used, despite a shift of baseline, the AUC remained 0.78 (Figure 5C). GE changes between 0 and 12 months or 8–12 months were not meant to measure future FVC prediction, as there was no additional follow-up time in COMET. We performed a correlation analysis of FVC predictor scores derived from different GE sampling time points to baseline to ascertain if the GE behavior was maintained in subjects over time. The coefficient and P value of FVC predictor scores were r = 0.63 and P = 9.2 × 10⁻⁹ between 0–4 months and 0–8 months, r = 0.62 and P = 1.2 × 10⁻⁶ between 0–4 months and 0–12 months, and r = 0.75 and P = 8.4 × 10⁻¹¹ between 0–8 months and 0–12 months. These findings indicated that the FVC predictor for progression was relatively conserved across the disease’s natural history.
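For readers less familiar with the AUC values quoted above, the Python sketch below computes an AUC directly from its rank interpretation: the probability that a randomly chosen progressor has a higher predictor score than a randomly chosen stable patient, with ties counted as one half. It is a generic illustration rather than the “pROC” routine used in the study, and the scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of progressor/stable pairs ranked correctly.

    labels use 1 for progressor and 0 for stable; ties count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predictor scores for six patients; not study data.
scores = [2.1, 1.4, 0.3, 0.9, -0.5, -1.2]
labels = [1, 1, 1, 0, 0, 0]

auc = roc_auc(scores, labels)
print(round(auc, 3))
```

An AUC of 1 means every progressor outscored every stable patient, as in the training set, whereas 0.5 corresponds to chance-level ranking.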
We used annotated PBMC scRNA-seq data from healthy donors as a reference to perform cell type deconvolution analysis. The heat map of the correlation matrix revealed a positive correlation of FVC predictor score with longitudinal change in monocytes and natural killer (NK) cells and a negative correlation with B cells, CD4, and CD8 T cells (Figure 6A). Pearson’s correlation test further confirmed the significance of the correlation between FVC predictor score and longitudinal changes of NK cells (Figure 6B) (r = 0.45, P = 0.02).
Figure 6.
Correlation of FVC-predictor scores with longitudinal changes in cell type abundance. Deconvolution of University of Chicago longitudinal bulk peripheral blood mononuclear RNA-sequencing data was performed using single-cell RNA-sequencing data derived from peripheral blood mononuclear cells of healthy donors. (A) Heat map of the correlation matrix of each cell type and FVC-predictor scores for progression. The red and green represent correlation and anticorrelation, respectively. (B) Pearson’s correlation of FVC predictor for progression score with each cell type change was displayed in the corresponding box (the coefficient on top and P value in parentheses). The color in each box represents the direction of correlation (red) or anticorrelation (green). The scale bar on the right indicates the degree of correlation. NK = natural killer.
Discussion
Our study demonstrates that serial sampling from the COMET cohort enabled us to determine what transcriptomic change predicts clinical outcomes in IPF. FVC decline events are highly variable in patients with IPF (22–26). This variability necessitates larger sample sizes to avoid underpowering clinical trials and studies using an FVC endpoint. For simplicity of application, cross-sectional prediction models are preferred over ones that require serial sampling. However, our longitudinal sampling model increased power substantially over one-time sampling, even in time frames as short as 4-month intervals. The resulting reduction in the required sample size helped facilitate the development of a multiple-GE predictor capable of anticipating future categorical FVC decline events, independent of DlCO decline events. The molecular pathways captured appear attributable to fibrosis and immune regulation. We further validated the prognosis FVC predictor in two independent cohorts. Cellular deconvolution of the longitudinal changes of each cell type identified a potential role for NK cells in the progression of IPF. Thus, a longitudinal model may offer a means for selecting patients more likely to progress within the context of a clinical trial. Furthermore, the gene pathways revealed may offer better intervenable targets than those from cross-sectional designs.
We took a stepwise approach in identifying an FVC predictor for progression to better understand molecular contributions in blood cells to disease progression as defined by FVC decline. This involved moving from the largest gene set to a refined 25 genes. Ingenuity pathway analyses revealed prominent immune pathways. However, CTLA4 and ICOS/CD28 pathways commonly associated with T cell lymphocyte expression noted in our mortality transcriptomic surrogate (4, 5) were conspicuously absent, suggesting that restriction of the surrogate endpoint from composite mortality markers to early lung function decline matters in model development. Instead, FVC decline was associated with downregulation of pattern recognition receptors of bacteria and viruses and IFN signaling and upregulation of CD40, IL-8, and IL-23 signaling. Interestingly, plasma levels of all three of these are associated with outcomes in ILD, including IPF (27, 28), suggesting pathogenic involvement of macrophages and NK cells. Together, our findings support the role of the innate immune system in early events preceding lung function decline, whereas our previous work reveals a role for adaptive immunity in patients with early mortality (5).
Genes in the FVC predictor, including TP63, NT5E, FAM111B, HBB, PLA2G4A, MSR1, CNR2, and ITLN1, are linked to lung fibrosis. TP63 has been reported in the abnormal reepithelialization and lung remodeling in IPF (29), whereas CD73 (NT5E) enhances radiation-induced lung fibrosis in mice (30). Though PBMCs are predominantly immune cells, our findings support that peripheral blood is capable of reflecting fibrotic pathways and signaling activity in the lungs. Similar findings have been demonstrated with longitudinal changes in circulating plasma biomarkers (31).
The consistent test performance of the FVC predictor in two independent, international IPF cohorts is a study strength. Despite differences among the cohorts in the type of blood sampled, the platforms used to capture transcriptome data, and the intervals between blood samplings, the AUCs support some flexibility in clinical application. Although the transcriptome sampling and PFT intervals were fixed in COMET, such time points varied across the IPF registry cohorts, which more closely approximates clinical practice. The ideal timeframe for blood sample acquisition remains unknown. Our data also suggest decay in predictive power as the time between samplings is increased from 4 to 8 and 12 months. This is congruent with our original hypothesis that disease activity of IPF varies and is not linear. Another explanation is that the model selects for immutable GE traits found in a specific subset of patients with a more rapid disease trajectory instead of changing disease activity. However, this would seem less likely, as many patients with IPF experience stepwise progression of the disease.
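To make the AUC comparison concrete, the following is a minimal sketch of how an AUC and a cohort-specific cutoff could be computed from predictor scores. It is an illustration only: the scores and labels are hypothetical, the helper names `auc` and `youden_cutoff` are ours, and the Youden index is an assumed cutoff criterion rather than a restatement of the study's procedure.

```python
def auc(scores, labels):
    """AUC as the probability that a progressor outscores a stable patient (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing sensitivity + specificity - 1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for c in sorted(set(scores)):
        # A patient is called a progressor when the score is at or above the cutoff
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best[1:]


# Hypothetical predictor scores; 1 = progressor (>=10% FVC decline), 0 = stable
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]

cohort_auc = auc(scores, labels)
cutoff, sensitivity, specificity = youden_cutoff(scores, labels)
print(cohort_auc, cutoff, sensitivity, specificity)
```

Because the AUC depends only on the ranking of scores, it can be compared across cohorts measured on different platforms, whereas a cutoff chosen this way is tied to the score scale of the cohort in which it was derived.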
The use of scRNA-seq data from control individuals enabled deconvolution of our bulk PBMC RNA-seq data set to determine data attributable to various cell types in the UChicago cohort. When we correlated the presumptive cell types with our predictor score, our findings suggested a novel role of NK cells in IPF. NK cells secrete cytotoxic granules containing perforin and granzyme, and as innate immune first responders, they enhance the downstream immune response via secretion of the cytokines IFN-γ and TNF-α, in turn affecting cells such as macrophages and dendritic cells (32). Among peripheral organs, the lung contains the largest percentage of NK cells, in which mature terminally differentiated NK cell subsets with high effector function dominate in health (33). Their protective role against fibrosis is believed to be mediated through IFN-γ (34). Yet, NK cells in the peripheral blood are reduced in patients with IPF compared with control subjects (35). In addition, in the BAL fluid of patients with IPF, NKG2D, an activation receptor on NK cells, is reduced, suggesting impaired function of these cells (36). Our data suggest upregulation of NK cell GE over time in patients experiencing an FVC decline event. This upregulation may reflect a dynamic increase in the number of cells or increased activation in the peripheral compartment. Either etiology could be a compensatory response to ongoing disease stimulation. This would fit with previous data demonstrating ties between viral infection and IPF progression (37) and between increased lung bacterial loads, specifically Streptococcus pneumoniae, and poor outcomes reflective of disease progression in IPF (38). Finally, NK cells have been implicated in the pathogenesis of fibrosis and regenerative repair in other organs such as the liver (39). Our cellular deconvolution of bulk RNA-seq data demonstrates the activity of NK cell GE preceding FVC decline in IPF progression.
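The idea behind reference-based deconvolution can be sketched as follows. This is a toy illustration under stated assumptions, not the deconvolution method used in this study: bulk expression is modeled as a non-negative mixture of per-cell-type reference profiles, the signature values are hypothetical, and the fractions are estimated with simple multiplicative updates for non-negative least squares.

```python
def deconvolve(signature, bulk, n_iter=500):
    """Estimate non-negative cell-type fractions f such that signature @ f ~ bulk.

    signature: one row per gene, one column per cell type (reference expression).
    bulk: bulk expression values, one per gene.
    Multiplicative updates keep every fraction non-negative at each step.
    """
    n_genes, n_types = len(signature), len(signature[0])
    f = [1.0 / n_types] * n_types
    # Precompute W^T b and W^T W for the update f <- f * (W^T b) / (W^T W f)
    wtb = [sum(signature[g][k] * bulk[g] for g in range(n_genes)) for k in range(n_types)]
    wtw = [[sum(signature[g][j] * signature[g][k] for g in range(n_genes))
            for k in range(n_types)] for j in range(n_types)]
    for _ in range(n_iter):
        denom = [sum(wtw[k][j] * f[j] for j in range(n_types)) for k in range(n_types)]
        f = [f[k] * wtb[k] / denom[k] for k in range(n_types)]
    total = sum(f)
    return [x / total for x in f]  # normalize so fractions sum to 1


# Hypothetical reference profiles (columns: T cell, monocyte, NK cell)
cell_types = ["T cell", "monocyte", "NK cell"]
signature = [
    [10.0, 0.5, 1.0],   # CD3E, T cell marker
    [0.5, 12.0, 0.5],   # CD14, monocyte marker
    [1.0, 0.5, 9.0],    # NKG7, NK cell marker
    [0.5, 0.5, 11.0],   # GNLY, NK cell marker
    [8.0, 8.0, 8.0],    # ACTB, expressed in all cell types
]

# Simulate a noise-free bulk sample from known fractions, then recover them
true_fractions = [0.6, 0.3, 0.1]
bulk = [sum(w * f for w, f in zip(row, true_fractions)) for row in signature]
estimated = deconvolve(signature, bulk)
for name, value in zip(cell_types, estimated):
    print(f"{name}: {value:.3f}")
```

In this noise-free toy case the estimated fractions recover the simulated ones. With real bulk data, estimates depend on how well the reference profiles from control subjects represent cells from patients, which is why cell-type assignments are best regarded as presumptive.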
This study has several limitations. Lack of a consistent transcriptome platform resulted in an incomplete overlap of genes in the FVC predictor and restricted us from applying a universal FVC-GE predictor score threshold. Instead, we applied a modified threshold based on ROCs using the available data. Because the number of initial predictor variables (genes) vastly exceeds the number of observations (patients), we chose LASSO regression for simultaneous gene selection and regularization to decrease the size of coefficients and enhance the prediction accuracy and interpretability of the resulting statistical model. However, other popular regularization methods such as Ridge regression may have a unique advantage in controlling potential multicollinearity between predictor genes. Several genes in the FVC predictor had very small coefficients, but they arose in our model with markedly significant P values and percentage of CV support. It may be that a shorter gene list performs equally well or that alternative genes in a more extensive ranking list ultimately perform better. GE requires normalization before downstream data analysis. Batch effects associated with RNA isolation and cDNA library preparation also prohibit uniform scoring and cutoff. Although we have validated the predictor in subsets of the COMET cohort and in external independent cohorts, each cohort’s size is still relatively small. Greater statistical power might have been achieved with FVC as a continuous variable, but we sought to preserve simplicity for assessing ROC/AUC curves. Consequently, this study is underpowered to compute a uniform cutoff criterion. None of the subjects were on U.S. Food and Drug Administration–approved therapies, preventing assessment of responses to drugs.
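The contrast between LASSO and Ridge regularization noted above can be illustrated with a minimal sketch. It is not the model fitted in this study: it assumes a continuous outcome and a linear model for simplicity, uses simulated data with more genes than patients, and the function name `coordinate_descent` and the penalty value are hypothetical choices.

```python
import random


def coordinate_descent(X, y, lam, penalty, n_sweeps=200):
    """Fit linear regression with an L1 ("lasso") or L2 ("ridge") penalty."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta starting at zero
    z = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # Covariance of gene j with the residual that excludes its own contribution
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            if penalty == "lasso":
                # Soft-thresholding sets weak coefficients exactly to zero
                new = max(abs(rho) - lam, 0.0) * (1 if rho > 0 else -1) / z[j]
            else:
                # Ridge shrinks coefficients toward zero without removing them
                new = rho / (z[j] + lam)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Simulated data (assumption): 20 patients, 50 genes, only genes 0 and 1 carry signal
random.seed(0)
n_patients, n_genes = 20, 50
X = [[random.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_patients)]
y = [2.0 * row[0] - 1.5 * row[1] + random.gauss(0, 0.1) for row in X]

lasso = coordinate_descent(X, y, lam=0.3, penalty="lasso")
ridge = coordinate_descent(X, y, lam=0.3, penalty="ridge")
n_lasso = sum(1 for b in lasso if b != 0)
n_ridge = sum(1 for b in ridge if b != 0)
print(f"nonzero coefficients: lasso={n_lasso}, ridge={n_ridge}")
```

The L1 penalty yields a short gene list, which is what makes a compact multigene predictor possible, whereas the L2 penalty retains every gene and spreads weight across correlated predictors.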
At this juncture, these limitations make it impossible to clarify the best future application of the FVC predictor for progression, whether in phase II versus phase III trials for more rapid readouts of a biologic endpoint or for sample size reduction in patient selection. There is a need for more validation in different patient populations and in the setting of approved therapies.
Our findings support the plausibility of a blood-based molecular signature for disease progression. Such a signature may provide a surrogate endpoint in future clinical trials that could improve efficiency and power in developing therapeutics for IPF. The precise role of NK cells in disease pathogenesis remains unknown at this time; however, their role is intriguing for future functional studies. We suspect that the most accurate predictor will be derived from clinical trial populations and will incorporate multiple molecular markers in one model. Further refinement of our predictor in larger cohorts on a uniform transcriptome assay platform in conjunction with therapeutic intervention is warranted, which may lead to a precision medicine approach in IPF.
Footnotes
Supported by NHLBI grants R01HL130796 and UG3HL145266 (I.N.), NHLBI grant K23HL138190 and American Lung Association Dalsemer Award (J.M.O.), NHLBI grant K23HL143135 (C.A.B.), NHLBI grant R35HL144481 (B.B.M.), NHLBI grant K23HL146942 (A.A.), and Action for Pulmonary Fibrosis Mike Bray fellowship (P.L.M.).
Author Contributions: Y.H., J.M.O., S.-F.M., and I.N. supervised the study. Y.H., A.U., and S.-Y.L. performed the analyses. Y.H., J.M.O., S.-F.M., S.-Y.L., A.J.B., C.A.B., J.S.K., A.A., N.K., and I.N. interpreted the data. J.M.O., S.-F.M., C.A.B., R.V., A.A., M.E.S., P.L.M., T.M.M., J.D.H.-M., N.K., B.B.M., F.J.M., and I.N. participated in sample and data collection. Y.H., J.M.O., S.-F.M., and I.N. wrote the draft of the manuscript. All authors participated in the critical revision and final approval of the manuscript.
This article has an online supplement, which is accessible from this issue’s table of contents at www.atsjournals.org.
Originally Published in Press as DOI: 10.1164/rccm.202008-3093OC on March 9, 2021
Author disclosures are available with the text of this article at www.atsjournals.org.
References
- 1. Raghu G, Collard HR, Egan JJ, Martinez FJ, Behr J, Brown KK, et al. ATS/ERS/JRS/ALAT Committee on Idiopathic Pulmonary Fibrosis. An official ATS/ERS/JRS/ALAT statement: idiopathic pulmonary fibrosis: evidence-based guidelines for diagnosis and management. Am J Respir Crit Care Med. 2011;183:788–824. doi: 10.1164/rccm.2009-040GL.
- 2. du Bois RM, Weycker D, Albera C, Bradford WZ, Costabel U, Kartashov A, et al. Forced vital capacity in patients with idiopathic pulmonary fibrosis: test properties and minimal clinically important difference. Am J Respir Crit Care Med. 2011;184:1382–1389. doi: 10.1164/rccm.201105-0840OC.
- 3. Karimi-Shah BA, Chowdhury BA. Forced vital capacity in idiopathic pulmonary fibrosis--FDA review of pirfenidone and nintedanib. N Engl J Med. 2015;372:1189–1191. doi: 10.1056/NEJMp1500526.
- 4. Ley B, Bradford WZ, Vittinghoff E, Weycker D, du Bois RM, Collard HR. Predictors of mortality poorly predict common measures of disease progression in idiopathic pulmonary fibrosis. Am J Respir Crit Care Med. 2016;194:711–718. doi: 10.1164/rccm.201508-1546OC.
- 5. Herazo-Maya JD, Noth I, Duncan SR, Kim S, Ma SF, Tseng GC, et al. Peripheral blood mononuclear cell gene expression profiles predict poor outcome in idiopathic pulmonary fibrosis. Sci Transl Med. 2013;5:205ra136. doi: 10.1126/scitranslmed.3005964.
- 6. Herazo-Maya JD, Sun J, Molyneaux PL, Li Q, Villalba JA, Tzouvelekis A, et al. Validation of a 52-gene risk profile for outcome prediction in patients with idiopathic pulmonary fibrosis: an international, multicentre, cohort study. Lancet Respir Med. 2017;5:857–868. doi: 10.1016/S2213-2600(17)30349-1.
- 7. Yang IV, Luna LG, Cotter J, Talbert J, Leach SM, Kidd R, et al. The peripheral blood transcriptome identifies the presence and extent of disease in idiopathic pulmonary fibrosis. PLoS One. 2012;7:e37708. doi: 10.1371/journal.pone.0037708.
- 8. Hamai K, Iwamoto H, Ishikawa N, Horimasu Y, Masuda T, Miyamoto S, et al. Comparative study of circulating MMP-7, CCL18, KL-6, SP-A, and SP-D as disease markers of idiopathic pulmonary fibrosis. Dis Markers. 2016;2016:4759040. doi: 10.1155/2016/4759040.
- 9. Neighbors M, Cabanski CR, Ramalingam TR, Sheng XR, Tew GW, Gu C, et al. Prognostic and predictive biomarkers for patients with idiopathic pulmonary fibrosis treated with pirfenidone: post-hoc assessment of the CAPACITY and ASCEND trials. Lancet Respir Med. 2018;6:615–626. doi: 10.1016/S2213-2600(18)30185-1.
- 10. Naik PK, Bozyk PD, Bentley JK, Popova AP, Birch CM, Wilke CA, et al. COMET Investigators. Periostin promotes fibrosis and predicts progression in patients with idiopathic pulmonary fibrosis. Am J Physiol Lung Cell Mol Physiol. 2012;303:L1046–L1056. doi: 10.1152/ajplung.00139.2012.
- 11. Huang Y, Oldham JM, Ma SF, Hou PC, Strek ME, Molyneaux PL, et al. Short-term longitudinal gene expression changes predict forced vital capacity decline in idiopathic pulmonary fibrosis [abstract]. Am J Respir Crit Care Med. 2020;201:A2426.
- 12. American Thoracic Society; European Respiratory Society. American Thoracic Society/European Respiratory Society international multidisciplinary consensus classification of the idiopathic interstitial pneumonias: this joint statement of the American Thoracic Society (ATS), and the European Respiratory Society (ERS) was adopted by the ATS board of directors, June 2001 and by the ERS Executive Committee, June 2001. Am J Respir Crit Care Med. 2002;165:277–304. doi: 10.1164/ajrccm.165.2.ats01.
- 13. Hyslop NP, White WH. Estimating precision using duplicate measurements. J Air Waste Manag Assoc. 2009;59:1032–1039. doi: 10.3155/1047-3289.59.9.1032.
- 14. Lee ML, Whitmore GA. Power and sample size for DNA microarray studies. Stat Med. 2002;21:3543–3570. doi: 10.1002/sim.1335.
- 15. Smyth GK. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004;3:Article3. doi: 10.2202/1544-6115.1027.
- 16. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc. 1995;57:289–300.
- 17. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33:1–22.
- 18. Simon N, Friedman J, Hastie T, Tibshirani R. Regularization paths for cox’s proportional hazards model via coordinate descent. J Stat Softw. 2011;39:1–13. doi: 10.18637/jss.v039.i05.
- 19. Tibshirani R, Bien J, Friedman J, Hastie T, Simon N, Taylor J, et al. Strong rules for discarding predictors in lasso-type problems. J R Stat Soc Series B Stat Methodol. 2012;74:245–266. doi: 10.1111/j.1467-9868.2011.01004.x.
- 20. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12:77. doi: 10.1186/1471-2105-12-77.
- 21. López-Ratón M, Rodríguez-Álvarez MX, Cadarso-Suárez C, Gude-Sampedro F. OptimalCutpoints: an R package for selecting optimal cutpoints in diagnostic tests. J Stat Softw. 2014;61:1–36.
- 22. Noth I, Anstrom KJ, Calvert SB, de Andrade J, Flaherty KR, Glazer C, et al. Idiopathic Pulmonary Fibrosis Clinical Research Network (IPFnet). A placebo-controlled randomized trial of warfarin in idiopathic pulmonary fibrosis. Am J Respir Crit Care Med. 2012;186:88–95. doi: 10.1164/rccm.201202-0314OC.
- 23. Raghu G, Anstrom KJ, King TE Jr, Lasky JA, Martinez FJ. Idiopathic Pulmonary Fibrosis Clinical Research Network. Prednisone, azathioprine, and N-acetylcysteine for pulmonary fibrosis. N Engl J Med. 2012;366:1968–1977. doi: 10.1056/NEJMoa1113354.
- 24. Zisman DA, Schwarz M, Anstrom KJ, Collard HR, Flaherty KR, Hunninghake GW. Idiopathic Pulmonary Fibrosis Clinical Research Network. A controlled trial of sildenafil in advanced idiopathic pulmonary fibrosis. N Engl J Med. 2010;363:620–628. doi: 10.1056/NEJMoa1002110.
- 25. King TE Jr, Bradford WZ, Castro-Bernardini S, Fagan EA, Glaspole I, Glassberg MK, et al. ASCEND Study Group. A phase 3 trial of pirfenidone in patients with idiopathic pulmonary fibrosis. N Engl J Med. 2014;370:2083–2092. doi: 10.1056/NEJMoa1402582.
- 26. Richeldi L, du Bois RM, Raghu G, Azuma A, Brown KK, Costabel U, et al. INPULSIS Trial Investigators. Efficacy and safety of nintedanib in idiopathic pulmonary fibrosis. N Engl J Med. 2014;370:2071–2082. doi: 10.1056/NEJMoa1402584.
- 27. Richards TJ, Kaminski N, Gibson KF. Plasma proteins for risk prediction in idiopathic pulmonary fibrosis. Am J Respir Crit Care Med. 2012;185:1329–1330. doi: 10.1164/ajrccm.185.12.1329.
- 28. Adegunsoye A, Oldham JM, Bonham C, Hrusch C, Nolan P, Klejch W, et al. Prognosticating outcomes in interstitial lung disease by mediastinal lymph node assessment: an observational cohort study with independent validation. Am J Respir Crit Care Med. 2019;199:747–759. doi: 10.1164/rccm.201804-0761OC.
- 29. Chilosi M, Poletti V, Murer B, Lestani M, Cancellieri A, Montagna L, et al. Abnormal re-epithelialization and lung remodeling in idiopathic pulmonary fibrosis: the role of deltaN-p63. Lab Invest. 2002;82:1335–1345. doi: 10.1097/01.lab.0000032380.82232.67.
- 30. Wirsdörfer F, Jendrossek V. The role of lymphocytes in radiotherapy-induced adverse late effects in the lung. Front Immunol. 2016;7:591. doi: 10.3389/fimmu.2016.00591.
- 31. Maher TM, Oballa E, Simpson JK, Porte J, Habgood A, Fahy WA, et al. An epithelial biomarker signature for idiopathic pulmonary fibrosis: an analysis from the multicentre PROFILE cohort study. Lancet Respir Med. 2017;5:946–955. doi: 10.1016/S2213-2600(17)30430-7.
- 32. Abel AM, Yang C, Thakar MS, Malarkannan S. Natural killer cells: development, maturation, and clinical utilization. Front Immunol. 2018;9:1869. doi: 10.3389/fimmu.2018.01869.
- 33. Dogra P, Rancan C, Ma W, Toth M, Senda T, Carpenter DJ, et al. Tissue determinants of human NK cell development, function, and residence. Cell. 2020;180:749–763.e13. doi: 10.1016/j.cell.2020.01.022.
- 34. Jiang D, Liang J, Hodge J, Lu B, Zhu Z, Yu S, et al. Regulation of pulmonary fibrosis by chemokine receptor CXCR3. J Clin Invest. 2004;114:291–299. doi: 10.1172/JCI16861.
- 35. Galati D, De Martino M, Trotta A, Rea G, Bruzzese D, Cicchitto G, et al. Peripheral depletion of NK cells and imbalance of the Treg/Th17 axis in idiopathic pulmonary fibrosis patients. Cytokine. 2014;66:119–126. doi: 10.1016/j.cyto.2013.12.003.
- 36. Aquino-Galvez A, Pérez-Rodríguez M, Camarena A, Falfan-Valencia R, Ruiz V, Montaño M, et al. MICA polymorphisms and decreased expression of the MICA receptor NKG2D contribute to idiopathic pulmonary fibrosis susceptibility. Hum Genet. 2009;125:639–648. doi: 10.1007/s00439-009-0666-1.
- 37. Moore BB, Moore TA. Viruses in idiopathic pulmonary fibrosis: etiology and exacerbation. Ann Am Thorac Soc. 2015;12:S186–S192. doi: 10.1513/AnnalsATS.201502-088AW.
- 38. Huang Y, Ma SF, Espindola MS, Vij R, Oldham JM, Huffnagle GB, et al. COMET-IPF Investigators. Microbes are associated with host innate immune response in idiopathic pulmonary fibrosis. Am J Respir Crit Care Med. 2017;196:208–219. doi: 10.1164/rccm.201607-1525OC.
- 39. Tosello-Trampont A, Surette FA, Ewald SE, Hahn YS. Immunoregulatory role of NK cells in tissue inflammation and regeneration. Front Immunol. 2017;8:301. doi: 10.3389/fimmu.2017.00301.




