American Journal of Respiratory and Critical Care Medicine. 2023 Oct 3;208(11):1196–1205. doi: 10.1164/rccm.202303-0395OC

Clonal Somatic Mutations in Chronic Lung Diseases Are Associated with Reduced Lung Function

Jeong H Yun 1,2,3,*, M A Wasay Khan 4,*, Auyon Ghosh 5, Brian D Hobbs 1,2,3, Peter J Castaldi 1,3, Craig P Hersh 1,2,3, Peter G Miller 3,6, Carlyne D Cool 7, Frank Sciurba 8, Lucas Barwick 9, Andrew H Limper 10, Kevin Flaherty 11, Gerard J Criner 12, Kevin Brown 13, Robert Wise 14, Fernando Martinez 15, Edwin K Silverman 1,3, Dawn DeMeo 1,3; NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, Michael H Cho 1,2,3,, Alexander G Bick 4,‡,
PMCID: PMC10868367  PMID: 37788444

Abstract

Rationale

Constantly exposed to the external environment and mutagens such as tobacco smoke, human lungs have one of the highest somatic mutation rates among all human organs. However, the relationship of these mutations to lung disease and function is not known.

Objectives

To identify the prevalence and significance of clonal somatic mutations in chronic lung diseases.

Methods

We analyzed the clonal somatic mutations from 1,251 samples of normal and diseased noncancerous lung tissue RNA sequencing with paired whole-genome sequencing from the Lung Tissue Research Consortium. We examined the associations of somatic mutations with lung function, disease status, and computationally deconvoluted cell types in two of the most common diseases represented in our dataset, chronic obstructive pulmonary disease (COPD; 29%) and idiopathic pulmonary fibrosis (IPF; 13%).

Measurements and Main Results

Clonal somatic mutational burden was associated with reduced lung function in both COPD and IPF. We identified an increased prevalence of clonal somatic mutations in individuals with IPF compared with normal control subjects and individuals with COPD, independent of age and smoking status. Clonal somatic mutations in IPF were enriched in disease-related and airway epithelial–expressed genes such as MUC5B. Patients who were MUC5B risk variant carriers had increased odds of developing somatic mutations of MUC5B, which were explained by increased expression of MUC5B.

Conclusions

Our identification of an increased prevalence of clonal somatic mutation in diseased lung that correlates with airway epithelial gene expression and disease severity highlights for the first time the role of somatic mutational processes in lung disease genetics.

Keywords: somatic mutation, chronic obstructive pulmonary disease, idiopathic pulmonary fibrosis


At a Glance Commentary

Scientific Knowledge on the Subject

Somatic mutations have been proposed as contributors to chronic lung diseases since microsatellite instability was found in patients with chronic obstructive pulmonary disease and idiopathic pulmonary fibrosis more than 20 years ago, but the prevalence and significance of somatic mutations in benign lung diseases have not been studied.

What This Study Adds to the Field

In this study we profiled high-frequency clonal somatic mutations from lung tissues. The results highlight that acquired somatic mutations are an important axis of genetic variation that may contribute to chronic lung disease pathogenesis.

Lung tissue has one of the highest somatic mutation rates across normal tissues (1). The contribution of the somatic mutations from tobacco smoking leading to lung cancer is well established, and studies have shown that tobacco smoking increases the mutational burden of normal human bronchial epithelial cells (2, 3). However, although acquired somatic mutations have been proposed as contributors to disease pathogenesis (4–6), the relationship of mutational burden to lung function and disease has not been well studied. Unlike somatic mutations in cancer that drive large clonal expansions in proliferating cells, identifying somatic mutations in nonmalignant tissues can be challenging because of lower mutation rates and a very low frequency of somatic variants in polyclonal tissue (7). Recent advances in sequencing technology and enhanced somatic variant calling have enabled the detection of somatic mutations with increased sensitivity, revealing somatic mutations to be common in both health and disease (8).

We hypothesized that lung tissue somatic mutational burden would be increased in chronic lung diseases associated with tobacco smoking, such as chronic obstructive pulmonary disease (COPD) and idiopathic pulmonary fibrosis (IPF). To test this hypothesis, we leveraged the fact that high-frequency clonal somatic mutations in DNA can be detected from the corresponding RNA, provided that the somatic mutation is expressed sufficiently. We analyzed large-scale lung tissue RNA sequencing (RNA-seq) data and paired whole-genome sequencing data to examine the patterns of clonal somatic single nucleotide variants (sSNVs) from lung specimens in 1,251 subjects in the Lung Tissue Research Consortium (LTRC) from the Trans-Omics for Precision Medicine program (9). Some of the results of these studies have been previously reported in preprint form (https://www.medrxiv.org/content/10.1101/2023.03.03.23286771v1).

Methods

LTRC Lung Tissue

Lung tissue samples were obtained through the NHLBI-sponsored LTRC using a standardized protocol that was described in the original study design. Details regarding subject recruitment were published previously (10). Briefly, LTRC lung tissue samples were collected on well-characterized individuals who underwent clinically indicated thoracic surgery, including lung transplantation, lung volume reduction surgery for emphysema, or pulmonary nodule and cancer resection. Lung tissue samples were isolated from regions without evidence of possible lung cancer and reviewed at the LTRC pathology core.

Institutional review boards approved the study at all participating institutions, and all subjects provided written informed consent per LTRC protocol. Lung tissue samples from the LTRC were sequenced at the University of Washington, Northwest Genomics Center, during phase 4 of the Trans-Omics for Precision Medicine program. See the online supplement for additional details.

Somatic Mutation Calling

We applied the RNA-MuTect pipeline (1) to identify somatic variants in RNA-seq data with paired whole-genome sequencing samples. RNA-MuTect was run with the ALLOW_N_CIGAR_READS flag on RNA-seq binary alignment map (BAM) files aligned with Spliced Transcripts Alignment to a Reference (STAR). The inputs were the HG38 reference genome and bed file, as well as the tumor BAM and BAM index files of each patient. The output was a mutation annotation format file listing the somatic variants for each patient sample. The variants were then converted to variant call format using the maf2vcf.pl script and further annotated using SnpEff to determine the gene for each variant. RNA-MuTect reported a minor allele frequency (MAF) for each variant in each sample, and we retained only variants with MAFs of 10–30% to identify somatic mutations and exclude any germline mutations. Mutations that were observed in more than 10 individuals were removed from all analyses, as were 77 individuals whose total number of mutations fell outside the interquartile range. We excluded mutations mapped to noncoding regions (introns, intergenic regions, noncoding RNAs) and regions prone to alignment errors (pseudogenes, IgG gene, HLA genes) unless otherwise specified.
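The two variant-level filters described above (the 10–30% allele-frequency window and the removal of variants recurring in more than 10 individuals) can be illustrated with a minimal sketch. The record layout (`sample`, `variant`, `maf` keys), the inclusive bounds, and the order in which the two filters are applied are assumptions made for illustration; the text does not specify them, and this is not the authors' code.

```python
from collections import Counter

# Thresholds are taken from the text; everything else is hypothetical.
MAF_MIN, MAF_MAX = 0.10, 0.30
MAX_CARRIERS = 10


def filter_somatic_calls(calls):
    """Filter calls given as dicts with assumed keys 'sample', 'variant', 'maf'."""
    # Step 1: keep calls whose allele frequency lies in the 10-30% window
    # (inclusive bounds are an assumption).
    in_window = [c for c in calls if MAF_MIN <= c["maf"] <= MAF_MAX]

    # Step 2: count the distinct individuals carrying each variant.
    carriers = Counter()
    for variant, _sample in {(c["variant"], c["sample"]) for c in in_window}:
        carriers[variant] += 1

    # Step 3: drop variants seen in more than 10 individuals,
    # which the text treats as likely recurrent artifacts.
    return [c for c in in_window if carriers[c["variant"]] <= MAX_CARRIERS]
```

With this layout, a variant present in 11 samples is dropped everywhere, while a private variant at 15% allele frequency is retained.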

Statistical Analysis

Phenotype and cellular composition association analyses were performed by fitting multivariable linear regression models. The total number of sSNVs was natural log transformed when modeled as a dependent variable. P values are uncorrected unless stated to be adjusted; adjusted P values are reported where correction for multiple hypothesis testing was necessary (genome-wide gene-level significance testing, post hoc comparison for cellular proportions and cancer driver genes). Statistical analyses were performed using R 4.0.3 (http://www.r-project.org). See the online supplement for cell-type deconvolution, sensitivity analysis, gene-level analysis, and mutational signature analyses.

Data Availability

Data are available on the National Center for Biotechnology Information database of Genotypes and Phenotypes (accession number phs001662).

Results

Clonal Somatic Mutation Calling from Lung RNA-Seq

We applied the RNA-MuTect pipeline (1) to lung tissue RNA-seq data to identify clonal sSNVs. RNA-MuTect has been validated across multiple tissue types, including the lung, to detect clonal DNA mutations with high allele fractions (>7%) in RNA (1). To avoid misclassification of germline genetic mutations as somatic mutations from the RNA-seq data, we excluded germline mutations identified using whole-genome DNA sequencing from paired blood samples and common variants with MAFs greater than 30%. To avoid recurrent sequencing artifacts, we removed mutations that were observed in more than 10 individuals. To ensure that we had high confidence in the somatic mutations, we removed variants with MAFs < 10% and variants with fewer than two supporting reads. We also removed individuals who had a total number of mutations outside the interquartile range, resulting in 113,657 high-confidence coding region mutations in 1,251 individuals (Figure 1). The majority of subjects (65%) had chronic lung diseases; the most common of these were COPD and IPF (see Table E1 in the online supplement). Four hundred seventeen (33%) subjects had lung cancer at the time of surgery. A separate analysis was performed in subgroups with smoking-associated chronic lung diseases, COPD (n = 358), IPF (n = 163), and normal control subjects (n = 29), the latter three groups confirmed by histopathology and lung function tests (Figure 1; see Table E2).

Figure 1.

Study design. From 1,341 unique subjects with paired WGS and bulk lung RNAseq, data were processed using the RNA-MuTect pipeline for clonal somatic single-nucleotide variant calling. After filtering steps, 1,251 unique subjects with 113,657 variants were analyzed, with a subgroup of normal control subjects, patients with COPD, and patients with IPF with 51,102 variants. COPD = chronic obstructive pulmonary disease; Ig = immunoglobulin; IPF = idiopathic pulmonary fibrosis; MAF = minor allele frequency; RNAseq = RNA sequencing; WGS = whole-genome sequencing. Reprinted by permission from Reference 11.

Among 1,251 subjects, missense mutations (52%) were more common than silent mutations (47%). The most common sSNV classes were C>T (47%) and T>C (26%) mutations. The median number of sSNVs per sample was 87 (46 excluding silent mutations) (see Figure E1).
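Substitution classes such as C>T and T>C are conventionally reported relative to the pyrimidine base, so that, for example, a G>A change is counted as C>T. The sketch below shows one way such class fractions could be tallied; the input format (a list of reference/alternate base pairs) is hypothetical, and the text does not describe how the authors tallied classes.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Collapse a single-base change to its pyrimidine-referenced class."""
    if ref in ("A", "G"):
        # Purine reference: report the change on the complementary strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def class_fractions(snvs):
    """Map each of the six classes observed in (ref, alt) pairs to its fraction."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in snvs)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}
```

This collapses the 12 possible single-base changes into the six classes C>A, C>G, C>T, T>A, T>C, and T>G.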

Effects of Age and Smoking on Clonal Somatic Mutations

Somatic mutation frequency linearly increases with age in normal human bronchial epithelial cells and at a higher rate in smokers (3). In our data, we did not find a significant overall association between clonal mutational burden (total number of sSNVs) and chronological age (Figure 2A), although a trend toward positive association with age was observed in the control group (Figures 2B and E2). Similarly, we found a weak inverse association with cumulative smoking (τ = −0.066; P = 0.001). However, we found that the overall clonal mutational burden was significantly elevated in patients with IPF compared with control subjects (see Table E2) and compared with published data on normal lung tissue using the same method (1). LTRC subjects overall have high cumulative tobacco smoke exposure, in the range in which the mutation frequencies level off (3). Thus, we postulate that because of the high baseline mutational burden in the diseased tissue and the significant smoking history across the cohort, age and cumulative smoking history do not correlate with mutational burden in this cohort, in contrast to what has been shown in populations without such constraints (3).

Figure 2.

Mutational burden is inversely associated with lung function in chronic obstructive pulmonary disease (COPD) and idiopathic pulmonary fibrosis (IPF). (A and B) Lack of association between total number of SNVs and age in all subjects (A) and in subgroups of normal control subjects, patients with COPD, and patients with IPF (B) using Kendall rank correlation. (C and D) Lung function (FEV1) is inversely associated with total number of SNVs in all subjects (C) and in patients with COPD and those with IPF (D). Kendall rank correlation coefficients are shown. SNV = single-nucleotide variant. Reprinted by permission from Reference 11.

Lung Function and Clonal Somatic Mutational Burden

We next investigated whether clonal mutational burden was associated with metrics of disease severity. Lung function, measured using parameters such as FEV1 and FVC, is a physiologic measure that assesses the integrated function of the respiratory system. Lung function decline normally begins after 35 years of age and accelerates with smoking or lung disease processes (12), reflecting disease severity. We detected a strong inverse association between mutational burden and lung function, as assessed using both FEV1 (τ = −0.1; P = 1.5 × 10⁻⁷) and FVC (τ = −0.16; P < 2.2 × 10⁻¹⁶) (Figures 2C, 2D, and E3A–E3D). In a multivariable linear regression model to control for known determinants of lung function, including age, sex, height, race, and smoking history, we found that lung somatic mutational burden remained a statistically significant predictor of lung function in all subjects, as well as in the disease subgroups (Tables 1 and E3).
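The τ values reported here are Kendall rank correlation coefficients. As a minimal sketch of how such a coefficient is computed, the function below implements the tie-corrected form (tau-b) by direct pair counting. Which variant the authors used is not stated in the text, so the choice of tau-b is an assumption, and the quadratic pair loop is for illustration rather than for cohort-sized data.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall tau-b for two equal-length numeric sequences."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from every term
            if dx == 0:
                ties_x += 1  # tied only in x
            elif dy == 0:
                ties_y += 1  # tied only in y
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = sqrt((concordant + discordant + ties_x) *
                 (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")
```

A negative value, as reported for FEV1 and FVC, means that pairs of subjects in which the one with more sSNVs has lower lung function outnumber pairs ordered the other way.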

Table 1.

Lung Function (FEV1) Is Associated with Lung Clonal Somatic Mutational Burden

| Outcome: FEV1 | All (n = 1,145) β (95% CI) | All P Value | COPD (n = 335) β (95% CI) | COPD P Value | IPF (n = 146) β (95% CI) | IPF P Value |
|---|---|---|---|---|---|---|
| Total sSNVs* | −0.15 (−0.20 to −0.10) | <3 × 10⁻⁸ | −0.12 (−0.20 to −0.04) | 0.003 | −0.20 (−0.29 to −0.11) | 4 × 10⁻⁵ |
| Age* | −0.065 (−0.11 to −0.02) | 0.0056 | 0.11 (0.05 to 0.18) | 0.001 | 0.08 (−0.008 to 0.16) | 0.07 |
| Male sex | 0.24 (0.11 to 0.37) | 0.0003 | 0.12 (−0.08 to 0.31) | 0.25 | 0.35 (0.08 to 0.61) | 0.01 |
| Race | | | | | | |
| Race: African American | 0.16 (−0.22 to 0.55) | 0.40 | 0.19 (−0.37 to 0.75) | 0.50 | 0.01 (−1.1 to 1.1) | 0.98 |
| Race: Hispanic | 0.06 (−0.24 to 0.36) | 0.70 | −0.45 (−0.97 to 0.06) | 0.08 | 0.049 (−0.42 to 0.52) | 0.84 |
| Race: Other | 0.019 (−0.48 to 0.52) | 0.94 | 0.07 (−0.74 to 0.88) | 0.86 | 0.02 (−0.58 to 0.61) | 0.95 |
| Height* | 0.28 (0.21 to 0.34) | <2 × 10⁻¹⁶ | 0.27 (0.17 to 0.36) | <1 × 10⁻⁷ | 0.20 (0.085 to 0.32) | 0.001 |
| Smoking* | −0.21 (−0.26 to −0.17) | <2 × 10⁻¹⁶ | −0.01 (−0.07 to 0.05) | 0.85 | 0.02 (−0.065 to 0.1) | 0.65 |

Definition of abbreviations: CI = confidence interval; COPD = chronic obstructive pulmonary disease; IPF = idiopathic pulmonary fibrosis; sSNV = somatic single-nucleotide variant. Reprinted by permission from Reference 11.

Multivariable linear regression was conducted for FEV1; covariates are specified in the table.

*Continuous variables are standardized.

We also found the FEV1:FVC ratio, an indicator of airflow obstruction, to be inversely associated with COPD mutational burden. DLCO was inversely associated with both COPD and IPF mutational burden (Figures E3C–E3F).

Disease-Specific Clonal Somatic Mutations

We compared clonal mutational burdens in control subjects, individuals with COPD, and individuals with IPF. In both univariate and multivariate analyses adjusted for age, sex, race, smoking history, and sequencing depth, we found clonal somatic mutation burden to be significantly increased in subjects with IPF compared with those with COPD and control subjects (Figure 3A; see Table E4). To determine whether specific genes are more frequently clonally mutated in disease conditions, we performed Fisher exact tests on all genes between COPD and IPF. We identified 18 genes that were more frequently clonally mutated in IPF than in COPD at a false discovery rate (FDR) <0.05, which included IPF-relevant genes such as MUC4, MUC5B, and AHNAK2 (Figure 3B). These 18 genes were significantly enriched for lung ciliated cell marker genes (MUC4, AHNAK2, SPAG17, DNAH11, CFAP74, and GRIN3B) and goblet cell marker genes (MUC4, MUC16, and MUC5B) from the Molecular Signatures Database (FDRs of 0.007 and 0.04, respectively). There was no difference between the control and COPD groups, and only MUC4 was differentially mutated between the control and IPF groups, which could be due to limited power with a small number of normal control subjects.
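The gene-level comparison can be sketched as a per-gene two-sided Fisher exact test on a 2×2 table (mutated versus not mutated, IPF versus COPD), followed by false discovery rate control. The text does not name the FDR procedure, so the Benjamini-Hochberg adjustment below is an assumption, and the table layout and any counts passed in are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted, running_min = [0.0] * m, 1.0
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this sketch, a gene would be called differentially mutated when its adjusted P value falls below 0.05, matching the FDR threshold stated in the text.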

Figure 3.

Clonal somatic mutational pattern in chronic obstructive pulmonary disease (COPD) and idiopathic pulmonary fibrosis (IPF). (A) Total number of clonal somatic SNVs by group. IPF lungs have increased mutational burden compared with control and COPD lungs. Pairwise comparisons were made using the Wilcoxon rank sum test, and P values were adjusted using the Bonferroni method. (B) Genes enriched for clonal somatic mutations in IPF compared with COPD. The forest plot depicts differentially mutated genes between COPD and IPF using the Fisher exact test with a false discovery rate <0.05. Bars indicate 95% confidence interval for the log odds ratio. Adj = adjusted; CI = confidence interval; Inf = infinite; OR = odds ratio; SNV = single-nucleotide variant.

Cell Type Associations with Clonal Somatic Mutational Burden

Lung tissue consists of heterogeneous cell types, and distinct disease processes can affect cell types differently. COPD is characterized by infiltration of inflammatory cells and loss of alveolar epithelial cells (emphysema) (13), while IPF has progressive interstitial fibrosis with profibrotic macrophages and aberrant basal-like cells (14, 15). We estimated the relative abundance of these cell types in each sample with the RNA expression data and examined how the cellular composition in the diseased lung tissue was associated with clonal mutational burden. The relative abundances of airway epithelial cells, alveolar epithelial cells, stromal cells, immune cells, and endothelial cells were quantified by deconvolution of RNA-seq data using Bisque (16), as previously described (17) (Figures 4A and 4B), and pathologic aberrant basaloid cells were separately deconvoluted with Bisque (16) using a published IPF single-cell RNA-seq reference dataset (see Table E5) (14). The clonal mutational burden was positively correlated with the proportion of airway epithelial cells and inversely correlated with the proportion of alveolar epithelial cells (Figures 4C–4E). The mutational burden was not significantly associated with stromal cells but was significantly associated with immune and endothelial cells in the IPF samples, possibly because of cellular proportion changes associated with IPF (see Figure E4). It is also notable that aberrant basaloid cells, most commonly found in IPF (14), are captured within airway epithelial cells, and their abundance correlated with mutational burden in both COPD and IPF (Figure 4F).

In multivariable regression analysis adjusted for age, sex, race, smoking history, and total mapped reads, both airway-to-alveolar epithelial ratio and lung function (FEV1, FVC, and DLCO) were statistically significantly and independently associated with clonal mutational burden, whereas FEV1:FVC ratio was associated with clonal mutational burden only in COPD (Tables 2 and E6–E8).

Figure 4.

Mutational burden is associated with the proportion of high-turnover cell types. (A) Relative abundance of deconvoluted cell types from Bisque. ***Adjusted P < 0.001 (t test) compared with control subjects. (B) Schematic of airway and alveolar epithelial cell types. (C–F) Correlation of somatic mutational burden and deconvoluted cell types. Aberrant basaloid cell was separately deconvoluted and shows relative abundance, not absolute proportions. Kendall rank correlation coefficients are shown. ATI = alveolar type 1 epithelial cells; ATII = alveolar type 2 epithelial cells; COPD = chronic obstructive pulmonary disease; epi = epithelial; IPF = idiopathic pulmonary fibrosis; SNV = single-nucleotide variant. Reprinted by permission from Reference 11.

Table 2.

Lung Somatic Mutation Burden Is Associated with Lung Function (FEV1) and Airway-to-Alveolar Epithelial Ratio

| Outcome: Total sSNV | All (n = 1,143) β (95% CI) | All P Value | COPD (n = 333) β (95% CI) | COPD P Value | IPF (n = 146) β (95% CI) | IPF P Value |
|---|---|---|---|---|---|---|
| Airway:alveolar ratio* | 0.16 (0.14 to 0.17) | <2 × 10⁻¹⁶ | 0.04 (0.006 to 0.07) | 0.018 | 0.20 (0.16 to 0.25) | 1.4 × 10⁻¹⁵ |
| FEV1* | −0.04 (−0.06 to −0.03) | 1.9 × 10⁻⁶ | −0.05 (−0.09 to −0.02) | 0.002 | −0.07 (−0.12 to −0.02) | 0.003 |
| Age* | −0.004 (−0.02 to 0.01) | 0.62 | −0.005 (−0.04 to 0.02) | 0.1 | 0.003 (−0.04 to 0.04) | 0.87 |
| Male sex | −0.02 (−0.06 to 0.01) | 0.21 | −0.05 (−0.12 to 0.02) | 0.14 | −0.01 (−0.11 to 0.09) | 0.9 |
| Race | | | | | | |
| Race: African American | 1.06 (0.94 to 1.19) | <2 × 10⁻¹⁶ | 1.11 (0.88 to 1.34) | <2 × 10⁻¹⁶ | 0.83 (0.6 to 1.3) | 0.001 |
| Race: Hispanic | 0.18 (0.07 to 0.29) | 0.01 | 0.07 (−0.19 to 0.3) | 0.66 | 0.19 (−0.02 to 0.41) | 0.08 |
| Race: Other | 0.16 (−0.02 to 0.34) | 0.08 | −0.28 (−0.68 to 0.12) | 0.17 | 0.05 (−0.24 to 0.3) | 0.74 |
| Smoking* | −0.02 (−0.04 to −0.003) | 0.02 | −0.01 (−0.05 to 0.02) | 0.4 | 0.01 (−0.03 to 0.05) | 0.3 |
| Somatic coverage* | −0.006 (−0.02 to 0.01) | 0.47 | −0.005 (−0.04 to 0.03) | 0.77 | −0.03 (−0.07 to 0.01) | 0.14 |

Definition of abbreviations: CI = confidence interval; COPD = chronic obstructive pulmonary disease; IPF = idiopathic pulmonary fibrosis; sSNV = somatic single-nucleotide variant.

The number of total sSNVs was log transformed. Reprinted by permission from Reference 11.

*Continuous variables are standardized.

Association between Germline and Somatic MUC5B Variants

As one of the top clonally mutated genes in IPF was MUC5B, we next examined the relationship between the germline MUC5B promoter variant (rs35705950) and somatic mutation of MUC5B. IPF diagnosis was associated with increased odds of developing clonal somatic mutations in MUC5B (odds ratio, 5.4 [95% confidence interval (CI), 3.05–9.87]; see Table E9A, left panel). For MUC5B promoter variant carriers, the odds of having a MUC5B clonal somatic mutation were also increased (odds ratio, 2.22 [95% CI, 1.35–3.67]; see Table E9B, left panel). The clonal somatic mutations in MUC5B occurred throughout the MUC5B gene, without a specific hotspot, regardless of IPF disease status or MUC5B promoter variant status (see Figure E5). The increase in MUC5B clonal somatic mutations in IPF was fully explained by the inferred abundance of airway epithelial cells, as these significant associations were no longer observed after adjusting for MUC5B gene expression or the deconvoluted airway epithelial proportions (see Table E9A, middle and right panels, and Table E9B, middle panel). Increased MUC5B somatic mutations associated with the MUC5B promoter variant were also explained by the increased MUC5B gene expression associated with that promoter variant (see Table E9B, right panel). Finally, we calculated the ratio of nonsynonymous to synonymous mutations (dN/dS) to test whether MUC5B is under positive selection in IPF. As nonsynonymous mutations are putatively under selection while synonymous mutations are likely neutral, a dN/dS ratio of greater than 1 identifies genes presumably under positive selection. For MUC5B, the dN/dS ratio was 0.65 (95% CI, 0.5–0.9; FDR = 0.04), suggesting that it is not under positive selection. In comparison, MUC4, the top mutated gene in IPF, was suggestive of positive selection, with a dN/dS ratio of 1.27 (95% CI, 1.5–1.8; FDR = 1.6 × 10⁻⁸).
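The logic of the dN/dS ratio can be illustrated with the simplest possible estimator: nonsynonymous mutations per nonsynonymous site divided by synonymous mutations per synonymous site. This is a sketch only. The estimator the authors used is described in the online supplement rather than here, dedicated methods additionally model mutation context, and all argument names and counts below are hypothetical.

```python
def dnds_ratio(n_nonsyn, n_syn, nonsyn_sites, syn_sites):
    """Naive dN/dS: per-site nonsynonymous rate over per-site synonymous rate.

    n_nonsyn, n_syn: observed mutation counts in a gene (hypothetical inputs).
    nonsyn_sites, syn_sites: numbers of sites at which a mutation would be
    nonsynonymous or synonymous, respectively (hypothetical inputs).
    """
    if n_syn == 0:
        raise ValueError("dN/dS is undefined without synonymous mutations")
    dn = n_nonsyn / nonsyn_sites  # nonsynonymous mutations per available site
    ds = n_syn / syn_sites        # synonymous mutations per available site
    return dn / ds
```

A value near 1 is what neutral accumulation would produce under this simple model, a value above 1 points toward positive selection, and a value below 1 points toward depletion of nonsynonymous changes.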

Sensitivity Analysis

We observed that the most commonly mutated genes encode mucin proteins, which have elevated expression in patients with IPF and advanced COPD (18) and are also frequently mutated in public exomes (frequently mutated genes [FLAGS]) (19). To rule out the possibility that differential gene expression, and thus the sensitivity of detection by RNA-seq, was the primary driver of the association among mutational burden, lung function, disease status, and cell-type proportions, we performed two sensitivity analyses. First, we excluded the top 20 FLAGS (see Table E10) and confirmed that the relationship among mutational burden, cell types, and lung function held (see Figure E6). Second, after normalizing the number of sSNVs by length-scaled gene expression, the mutational burden remained elevated in subjects with IPF (see Figure E7).

Finally, to examine the somatic mutational burden in different patient cohorts, we separately analyzed the four different study sites from the LTRC among samples that had study site information (283 of 545 disease subgroup subjects). We found the association among lung function, cell type, and somatic mutational burden to be consistent with the analysis of the combined cohort (see Figure E8).

Somatic Mutations of Cancer Driver Genes

In addition to sharing the risk factor of tobacco smoking, patients with COPD and those with IPF have an increased risk of developing lung cancer, independent of smoking exposure (20, 21). To determine whether clonal somatic mutations in chronic lung diseases are enriched for known lung cancer and cancer driver genes (see File E1) (2), we compared the lung somatic mutational patterns by history of lung cancer, smoking history, and disease states. Overall, clonal somatic mutations in lung cancer driver genes were identified in 707 (56%) subjects. There was no significant enrichment of clonal somatic mutations in known lung cancer driver genes in subgroups of individuals with cancer identified at the time of lung tissue collection or in a stratum of smoking history (see Figures E9A and E9B). COPD and IPF samples had a trend toward higher proportions of cancer driver gene mutations within individual samples (see Figure E9C; median proportions: control, 0; COPD, 1.03; IPF, 0.77; sample-level P = 0.013), as well as higher proportions of samples with any cancer driver gene compared with normal control subjects (see Figure E9D; control, 34%; COPD, 59%; IPF, 55%; population-level P = 0.03), although only COPD had significantly higher cancer driver gene mutations after multiple testing adjustment (sample-level adjusted P = 0.09, population-level adjusted P = 0.049 compared with control subjects) (see Figure E9). A history of cancer was not significantly associated with the frequency of cancer driver gene mutations in COPD.

Mutational Signature Analysis

Next, we examined whether there are mutational signatures associated with aging, smoking, or lung cancer in our data. Mutational signatures represent specific patterns of mutagenesis generated by different mutational processes, such as mutagen exposures or defective DNA repair (22). We identified two single-base substitution signatures (SBSs) from the entire set of subjects that are similar to SBS 6 (defective DNA mismatch repair) and SBS 25 (chemotherapy treatment). The normal control subjects and patients with COPD had similar signature profiles, while patients with IPF had an additional signature related to clock-like signatures SBS 1 and SBS 5 (Figure E10).

Discussion

Using large-scale sequencing data with detailed clinical and pathologic phenotyping, we found that lung samples from patients with IPF harbor increased clonal mutations compared with lung samples from normal control subjects or patients with COPD. Although the detection of somatic variants from RNA-seq data is limited to high-frequency, macroscopic clones, by comparing different disease groups, we identified pervasive and distinct mutational patterns among chronic lung diseases and normal lungs. Our observations permit several conclusions.

First, we found a relationship between clonal somatic mutation and reduced lung function. The independent association of lung function and mutational burden, despite the differences in cell types, composition, and transcript levels, implies that the function of the organ, the proximal correlate of disease severity, is closely related to the somatic mutational burden. A stronger association was found between mutational burden and disease-relevant measures than nonrelevant measures; for example, in COPD, mutational burden was associated with airflow obstruction (FEV1:FVC ratio) but not with lung capacity (FVC), whereas IPF mutational burden was strongly associated with lung capacity (FVC) and DLCO but not with FEV1:FVC ratio.

Second, we found distinct mutational patterns in IPF that suggest increased somatic clonal expansions of airway epithelial cells and an age-related signature. Across all samples and in both COPD and IPF lungs, clonal mutational burden correlated with the relative abundance of airway epithelial cells and pathologic aberrant basaloid cells inferred from deconvolution. However, this relationship was strongest in IPF. In IPF lungs, clonal somatic mutations were enriched in airway epithelial cell marker genes. Furthermore, enrichment of MUC5B clonal mutation in IPF was fully explained by airway epithelial cell abundance as well as MUC5B gene expression. The dN/dS ratio in IPF putatively suggests MUC4, another airway epithelial–expressing gene, to be a driver of positive selection.

The global association between inferred airway epithelial abundance and clonal mutational burden can be explained by cellular proliferation rate. Clonal expansions (i.e., accumulation of somatic mutations) reflect cumulative DNA damage. As the number of cell divisions is one of the major causes of base substitution mutations from replication errors, we speculate that the abundance of high-turnover cells, such as airway epithelial cells that include airway stem cells or cells undergoing aberrant repair, tracks with clonal mutational burden (Figure 4).

Clonal expansion of airway epithelial cells can also be explained by the cellular response in the diseased tissue. Notably, airway epithelial genes mutated in IPF lungs include genes that are known to be important in IPF pathogenesis. MUC4, MUC16, and AHNAK2 interact with the TGFβ1 pathway to induce fibrosis and epithelial-to-mesenchymal transition (23, 24). MUC5B is overexpressed in IPF lungs and in carriers of the MUC5B promoter variant, which impairs mucociliary clearance (25). Enrichment of somatic mutations in disease-causing genes and pathways is found in other nonmalignant diseases. In ulcerative colitis, somatic mutations are enriched and positively selected for IL-17 pathway genes, which confer resistance to the cells from IL-17A–mediated cytotoxicity (26). In a similar fashion, somatic clones contributing to IPF pathogenesis may also have a selection advantage that will perpetuate the disease. The IPF lungs also had a mutational signature associated with a clock-like signature, underscoring its association with aging. The role of somatic mutations in the pathogenesis of COPD and IPF has long been hypothesized, as both diseases are strongly associated with tobacco smoking, and evidence of DNA damage such as microsatellite instability and loss of heterozygosity has been observed (4–6, 27). Whether these somatic variants directly drive disease pathology remains to be determined given our cross-sectional design. However, even if these cells with clonal mutations arise through a selective advantage in the context of injured and inflamed tissue, their correlation with pathologic cell types and disease severity, an increased proportion of mutations of disease-relevant genes in IPF, and cancer driver gene mutations in COPD suggest that somatic mutations may contribute to disease progression, prognosis, and the risk of future malignancy, which is elevated in COPD and IPF independent of age and tobacco smoking and correlates with more severe disease (28).

Our findings are similar to somatic mutational patterns reported in liver cirrhosis, in which the number of somatic mutations correlated with known biomarkers of liver function but not with age or smoking status (29). Future investigations on the association of somatic mutation frequency with the function of the organs in other disease contexts, such as glomerular filtration rate in the kidney or ejection fraction of the heart, would be helpful to understand the global implications of somatic mutations in chronic nonmalignant diseases. Our study is limited by its cross-sectional design, limited racial and ethnic diversity in the study population, lack of validation samples, and analysis restricted to high-frequency clonal mutations from RNA-seq data. We found that self-reported African American race is consistently associated with increased somatic mutational burden. As the African American population accounted for only 2% of all subjects analyzed, studies of somatic mutations in racially diverse cohorts in the future are warranted to generalize this finding. The sensitivity of detecting somatic mutation from RNA-seq depends on the depth of sequencing, the expression amount of the genes, and the clonal diversity (1). Gene expression differences across samples will not affect mutational burden or mutational signature analysis, as we adjusted for sequencing depths, but residual confounding by gene expression may be present for gene-level enrichment analysis. Therefore, we performed a sensitivity analysis to investigate the effects of gene length and gene expression. We also acknowledge that deconvolution is a statistical inference that does not validate cellular proportions, but enrichment of clonal mutations in cell type–specific genes also suggests that airway epithelial cells are associated with clonal mutations.

Conclusions

As one of the first demonstrations of somatic mutations in lung diseases and their consequences, our study opens questions and areas for future study. What drives and selects the somatic mutations? What are the molecular consequences of somatic mutations? Is there a relationship between germline genetics and the risk of developing somatic mutations? Functional studies interrogating the impact of mutated genes would be required to determine the exact role of somatic mutations in COPD and IPF. Finally, the evaluation of somatic mutations from anatomically resolved tissue or multiple different single cell types would further advance our understanding of how somatic mutations evolve and contribute to the pathology of chronic respiratory diseases, resulting from the complex interplay of genetic susceptibility, environmental exposures, and aging.

Footnotes

Supported by NHLBI grant K08HL146972 and an Eleanor and Miles Shore Faculty Development Award at Harvard Medical School (J.H.Y.); NIH grant R01 HL162813 (B.D.H.); NIH grants R01HL124233 and R01HL147326 (P.J.C.); NIH grant P01HL114501 (C.P.H.); NIH grants P01HL114501, U01HL089856, R01HL133135, R01HL147148, and R01HL152728 (E.K.S.); NIH grants R01HG011393 and R21HL156122 (D.D.); NIH grants R01HL149861, R01HL153248, and R01HL147148 (M.H.C.); and National Institute on Aging grant DP5 OD029586, a Burroughs Wellcome Fund Career Award for Medical Scientists, and a Pew-Stewart Scholar for Cancer Research award, supported by the Pew Charitable Trusts and the Alexander and Margaret Stewart Trust (A.G.B.).

Author Contributions: J.H.Y., M.H.C., and A.G.B. designed the study and wrote the manuscript. M.W.K. and A.G.B. performed somatic variant calling analysis. J.H.Y. performed mutation profiling and phenotype association analysis. M.H.C. performed germline variant calling analysis. J.H.Y. and A.G. performed deconvolution analysis. J.H.Y. and B.D.H. performed phenotype data curation. P.J.C. and C.P.H. performed RNA sequencing data curation. P.G.M. contributed to the interpretation of phenotype association analysis. C.D.C., F.S., L.B., A.H.L., K.F., G.J.C., K.B., R.W., F.M., E.K.S., and D.D. generated data. All authors reviewed the results and approved the final version of the manuscript.

This article has an online supplement, which is accessible from this issue’s table of contents at www.atsjournals.org.

Originally Published in Press as DOI: 10.1164/rccm.202303-0395OC on October 3, 2023

Author disclosures are available with the text of this article at www.atsjournals.org.

References

  • 1. Yizhak K, Aguet F, Kim J, Hess JM, Kübler K, Grimsby J, et al. RNA sequence analysis reveals macroscopic somatic clonal expansion across normal tissues. Science. 2019;364:eaaw0726. doi: 10.1126/science.aaw0726.
  • 2. Yoshida K, Gowers KHC, Lee-Six H, Chandrasekharan DP, Coorens T, Maughan EF, et al. Tobacco smoking and somatic mutations in human bronchial epithelium. Nature. 2020;578:266–272. doi: 10.1038/s41586-020-1961-1.
  • 3. Huang Z, Sun S, Lee M, Maslov AY, Shi M, Waldman S, et al. Single-cell analysis of somatic mutations in human bronchial epithelial cells in relation to aging and smoking. Nat Genet. 2022;54:492–498. doi: 10.1038/s41588-022-01035-w.
  • 4. Anderson GP, Bozinovski S. Acquired somatic mutations in the molecular pathogenesis of COPD. Trends Pharmacol Sci. 2003;24:71–76. doi: 10.1016/S0165-6147(02)00052-4.
  • 5. Vassilakis DA, Sourvinos G, Spandidos DA, Siafakas NM, Bouros D. Frequent genetic alterations at the microsatellite level in cytologic sputum samples of patients with idiopathic pulmonary fibrosis. Am J Respir Crit Care Med. 2000;162:1115–1119. doi: 10.1164/ajrccm.162.3.9911119.
  • 6. Mori M, Kida H, Morishita H, Goya S, Matsuoka H, Arai T, et al. Microsatellite instability in transforming growth factor-beta 1 type II receptor gene in alveolar lining epithelial cells of idiopathic pulmonary fibrosis. Am J Respir Cell Mol Biol. 2001;24:398–404. doi: 10.1165/ajrcmb.24.4.4206.
  • 7. Dou Y, Gold HD, Luquette LJ, Park PJ. Detecting somatic mutations in normal cells. Trends Genet. 2018;34:545–557. doi: 10.1016/j.tig.2018.04.003.
  • 8. Mustjoki S, Young NS. Somatic mutations in “benign” disease. N Engl J Med. 2021;384:2039–2052. doi: 10.1056/NEJMra2101920.
  • 9. National Heart, Lung, and Blood Institute. 2022. https://topmed.nhlbi.nih.gov
  • 10. Yang IV, Pedersen BS, Rabinovich E, Hennessy CE, Davidson EJ, Murphy E, et al. Relationship of DNA methylation and gene expression in idiopathic pulmonary fibrosis. Am J Respir Crit Care Med. 2014;190:1263–1272. doi: 10.1164/rccm.201408-1452OC.
  • 11. Yun JH, Wasay Khan MA, Ghosh A, Hobbs BD, Castaldi PJ, Hersh CP, et al. Somatic mutations in chronic lung disease are associated with reduced lung function [preprint]. medRxiv; 2023. [accessed 2023 Aug 12]. Available from: https://www.medrxiv.org/content/10.1101/2023.03.03.23286771v1.
  • 12. Oelsner EC, Balte PP, Bhatt SP, Cassano PA, Couper D, Folsom AR, et al. Lung function decline in former smokers and low-intensity current smokers: a secondary data analysis of the NHLBI Pooled Cohorts Study. Lancet Respir Med. 2020;8:34–44. doi: 10.1016/S2213-2600(19)30276-0.
  • 13. Agustí A, Hogg JC. Update on the pathogenesis of chronic obstructive pulmonary disease. N Engl J Med. 2019;381:1248–1256. doi: 10.1056/NEJMra1900475.
  • 14. Adams TS, Schupp JC, Poli S, Ayaub EA, Neumark N, Ahangari F, et al. Single-cell RNA-seq reveals ectopic and aberrant lung-resident cell populations in idiopathic pulmonary fibrosis. Sci Adv. 2020;6:eaba1983. doi: 10.1126/sciadv.aba1983.
  • 15. Habermann AC, Gutierrez AJ, Bui LT, Yahn SL, Winters NI, Calvi CL, et al. Single-cell RNA sequencing reveals profibrotic roles of distinct epithelial and mesenchymal lineages in pulmonary fibrosis. Sci Adv. 2020;6:eaba1972. doi: 10.1126/sciadv.aba1972.
  • 16. Jew B, Alvarez M, Rahmani E, Miao Z, Ko A, Garske KM, et al. Accurate estimation of cell composition in bulk expression through robust integration of single-cell information. Nat Commun. 2020;11:1971. doi: 10.1038/s41467-020-15816-6.
  • 17. Ghosh AJ, Hobbs BD, Yun JH, Saferali A, Moll M, Xu Z, et al.; NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium. Lung tissue shows divergent gene expression between chronic obstructive pulmonary disease and idiopathic pulmonary fibrosis. Respir Res. 2022;23:97. doi: 10.1186/s12931-022-02013-w.
  • 18. Denneny E, Sahota J, Beatson R, Thornton D, Burchell J, Porter J. Mucins and their receptors in chronic lung disease. Clin Transl Immunology. 2020;9:e01120. doi: 10.1002/cti2.1120.
  • 19. Shyr C, Tarailo-Graovac M, Gottlieb M, Lee JJ, van Karnebeek C, Wasserman WW. FLAGS, frequently mutated genes in public exomes. BMC Med Genomics. 2014;7:64. doi: 10.1186/s12920-014-0064-y.
  • 20. Young RP, Hopkins RJ, Christmas T, Black PN, Metcalf P, Gamble GD. COPD prevalence is increased in lung cancer, independent of age, sex and smoking history. Eur Respir J. 2009;34:380–386. doi: 10.1183/09031936.00144208.
  • 21. Hubbard R, Venn A, Lewis S, Britton J. Lung cancer and cryptogenic fibrosing alveolitis: a population-based cohort study. Am J Respir Crit Care Med. 2000;161:5–8. doi: 10.1164/ajrccm.161.1.9906062.
  • 22. Koh G, Degasperi A, Zou X, Momen S, Nik-Zainal S. Mutational signatures: emerging concepts, caveats and clinical applications. Nat Rev Cancer. 2021;21:619–637. doi: 10.1038/s41568-021-00377-7.
  • 23. Ballester B, Milara J, Cortijo J. Mucins as a new frontier in pulmonary fibrosis. J Clin Med. 2019;8:1447. doi: 10.3390/jcm8091447.
  • 24. Zhu D, Zhang Q, Li Q, Wang G, Guo Z. Inhibition of AHNAK nucleoprotein 2 alleviates pulmonary fibrosis by downregulating the TGF-β1/Smad3 signaling pathway. J Gene Med. 2022;24:e3442. doi: 10.1002/jgm.3442.
  • 25. Hancock LA, Hennessy CE, Solomon GM, Dobrinskikh E, Estrella A, Hara N, et al. Muc5b overexpression causes mucociliary dysfunction and enhances lung fibrosis in mice. Nat Commun. 2018;9:5363. doi: 10.1038/s41467-018-07768-9.
  • 26. Nanki K, Fujii M, Shimokawa M, Matano M, Nishikori S, Date S, et al. Somatic inflammatory gene mutations in human ulcerative colitis epithelium. Nature. 2020;577:254–259. doi: 10.1038/s41586-019-1844-5.
  • 27. Siafakas NM, Tzortzaki EG, Sourvinos G, Bouros D, Tzanakis N, Kafatos A, et al. Microsatellite DNA instability in COPD. Chest. 1999;116:47–51. doi: 10.1378/chest.116.1.47.
  • 28. Calabrò E, Randi G, La Vecchia C, Sverzellati N, Marchianò A, Villani M, et al. Lung function predicts lung cancer risk in smokers: a tool for targeting screening programmes. Eur Respir J. 2010;35:146–151. doi: 10.1183/09031936.00049909.
  • 29. Zhu M, Lu T, Jia Y, Luo X, Gopal P, Li L, et al. Somatic mutations increase hepatic clonal fitness and regeneration in chronic liver disease. Cell. 2019;177:608–621.e12. doi: 10.1016/j.cell.2019.03.026.

Data Availability Statement

Data are available on the National Center for Biotechnology Information database of Genotypes and Phenotypes (accession number phs001662).


Articles from American Journal of Respiratory and Critical Care Medicine are provided here courtesy of American Thoracic Society
