Abstract
Background
Whole genome sequence (WGS) data in multi-ancestry samples supports discovery of low-frequency or population-specific genetic variants associated with chronic obstructive pulmonary disease (COPD) and lung function.
Results
We performed single variant, structural variant, and gene-based analysis of pulmonary function (FEV1, FVC and FEV1/FVC) and COPD case–control status in 44,287 multi-ancestry participants from the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program. We validated findings using the UK Biobank and assessed implicated genes using lung single-cell RNA-seq (scRNA-seq) data sets. Applying a genome-wide significance threshold (P < 5 × 10⁻⁹), we replicated known loci and identified novel associations near LY86, MAGI1, GRK7, and LINC02668. Colocalization with gene expression quantitative trait loci (eQTL) from the Lung Tissue Research Consortium highlighted known candidate genes including ADAM19, THSD4, C4B, and PSMA4, which were not identified through other eQTL sources. Multi-ancestry analysis improved fine-mapping resolution (e.g., HTR4 and RIN3). Gene-based analysis identified and replicated HMCN1. In human lung scRNA-seq data sets, implicated genes showed enriched expression in lung epithelial and immune cell types, while HMCN1 showed higher expression in fibroblasts. CRISPR targeting of HMCN1 in IMR-90 cells reduced expression of collagen genes.
Conclusions
Large-scale multi-ancestry WGS analysis improves variant discovery and fine-mapping resolution for lung function and COPD and highlights biologically relevant genes and pathways.
Supplementary Information
The online version contains supplementary material available at 10.1186/s13059-025-03921-y.
Keywords: Multi-ancestry GWAS, COPD, Lung function, Fine-mapping, Colocalization with molecular QTLs, Lung tissue single-cell RNA-seq validation, CRISPR knockdown
Background
Chronic respiratory diseases are leading causes of morbidity and mortality worldwide [1]. Chronic obstructive pulmonary disease (COPD) is defined by a reduction in forced expiratory volume to forced vital capacity ratio (FEV1/FVC), and reductions in FEV1 or FVC are associated with increased mortality from respiratory and non-respiratory causes [2–4]. Genetic variation is a major contributor to decrements in lung function and risk of COPD [5, 6], and genetic association studies have identified hundreds of related genetic loci [7], with a substantial overlap between studies of population-based lung function and COPD.
We previously published a whole genome sequence analysis of lung function and COPD for 19,996 multi-ancestry participants from TOPMed [8]. Our prior work was intended as a pilot for an eventual expanded effort, and was limited by its implementation using an earlier release of the TOPMed whole genome sequence data (TOPMed Freeze 5b). Importantly, only a subset of TOPMed participants and cohorts with measures of lung function and COPD had whole genome sequence data available at the time of our prior effort. Our current work represents whole genome sequence analysis of lung function and COPD using a substantially expanded set of participants from the NHLBI TOPMed whole genome sequencing program. Additionally, the current effort makes use of newly released transcriptome resources from the Lung Tissue Research Consortium (LTRC) for identification of molecular targets using integrative approaches, as well as leveraging state-of-the-art approaches for fine mapping and prioritization of cell types.
Most genetic studies of lung function and COPD to date have focused on common variants. Rare variants, though likely accounting for less of the heritability of these traits than common variants [5, 9], can identify pathobiology important for developing therapeutics and individual risk prediction. Deep sequencing allows identification of very rare variants, as well as comprehensive discovery of common variants and identification of structural variants. Additionally, multi-ancestry studies can reveal signals of genetic association not seen in more homogeneous populations and contribute new insights into known genetic loci. The largest published multi-ancestry GWAS, involving 580,869 participants, identified 1,020 independent loci for lung function and COPD, of which 713 were novel signals [7]. However, that study was conducted on imputed genotypes. To address this limitation, we leveraged deep whole genome sequencing of participants from diverse ancestry groups in the NHLBI Trans-Omics for Precision Medicine (TOPMed) program to perform the largest sequencing-based multi-ancestry GWAS to date, comprehensively studying the spectrum of variation and implicated genes for lung function and COPD in 44,287 individuals. We performed single variant, structural variant, and gene-based analysis for lung function and COPD. We identified novel variants by conditioning on previously reported variants and tested novel variants and genes for replication in the UK Biobank. For the identified single-variant associations, we performed fine-mapping and molecular cis-quantitative trait locus (QTL) colocalization analyses to determine the most likely causal variants and potential molecular drivers. Furthermore, we examined single-cell data sets to determine which cell types expressed selected candidate genes from the genetic analyses.
Results
Discovery of a novel locus and new conditionally significant variants associated with FEV1/FVC and moderate-to-severe COPD
Single variant-based association identified 881 genome-wide significant variants in 21 loci across all phenotypes and studies using a P-value threshold of 5 × 10⁻⁹, based on prior guidelines for multi-ancestry sequencing-based studies [10]. These variants contributed to a total of 1,560 significant variant-trait associations across all phenotypes and studies. While effect estimates were generally consistent across subgroups (Additional file 1: Fig. S1), we observed variation in direction and magnitude for a subset of loci (Additional file 1: Fig. S1 I, T, W, AA, AD, AF). However, the confidence intervals for these subgroup-specific estimates were wide and overlapped the null, and none of the subgroup differences were statistically significant. Among these findings, we identified one novel locus with no previously reported variants within ± 1 Mb of rs572516809, intronic to LINC02668 on chromosome 10, associated with FEV1/FVC (MAF 0.04%, Beta for A allele = 8.23, P-value = 3.58 × 10⁻⁹) (Table 1). The remaining 880 variants in 20 loci were all located within ± 1 Mb of a previously reported variant. To determine whether these variants were novel, we conditioned on variants from previously described GWAS (Additional file 2: Table S1). In total, 1,502 variant-trait associations involving one of these variants and falling within ± 1 Mb of a known association were tested in conditional analyses with Bonferroni correction, and three additional lead variants were conditionally significant (Table 1). As a follow-up analysis, we assessed whether these signals were in linkage disequilibrium (LD) with variants newly reported in Shrine et al. (2023) [7], which was published after the completion of our main analysis. Based on this LD comparison, a COPDmod-associated variant, rs115933414 (5’ UTR of GRK7, MAF = 0.5%, Odds ratio for A allele = 2.9, P-value = 4.74 × 10⁻⁹), remained conditionally significant and had no previously reported GWAS variants within ± 1 Mb [7].
In addition, two variants associated with FEV1/FVC were found to be in low LD with any previously reported variants within ± 1 Mb [7]: rs138244134 (intronic to MAGI1, MAF = 0.1%, Beta for T allele = −5.23, P-value = 5.44 × 10⁻¹⁰, max r² = 4.9 × 10⁻⁵) and rs912057 (intronic to LY86, MAF = 40.7%, Beta for G allele = 0.37, P-value = 5.76 × 10⁻¹⁰, max r² = 4.5 × 10⁻³). We also conducted ancestry-specific analyses, but we did not identify any novel ancestry-specific results.
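The LD values quoted above are pairwise r² statistics. As a rough illustration only, r² between two biallelic variants is often approximated as the squared Pearson correlation of their genotype dosages (0, 1, or 2 copies of a chosen allele). The sketch below assumes unphased dosages and uses made-up genotype vectors; the function name `ld_r2` is hypothetical and this is not the authors' pipeline.

```python
def ld_r2(dosages_a, dosages_b):
    """Approximate LD r^2 as the squared Pearson correlation of genotype dosages.

    Assumption: dosages are 0/1/2 allele counts for the same individuals,
    in the same order, for two biallelic variants.
    """
    if len(dosages_a) != len(dosages_b) or len(dosages_a) < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    n = len(dosages_a)
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    # Sums of squares and cross-products (the 1/n factors cancel in the ratio).
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("monomorphic variant: r^2 is undefined")
    return (cov * cov) / (var_a * var_b)


# Hypothetical genotype vectors, for illustration only.
perfect_ld = ld_r2([0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2])
no_ld = ld_r2([0, 0, 2, 2], [0, 2, 0, 2])
```

Under this definition, a maximum r² on the order of 10⁻³ or smaller indicates that a variant is essentially uncorrelated with the previously reported variants in its region.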
Table 1.
Lead novel variants representing four distinct loci
| Phenotype | Subgroup | Chr | Position (b38) | rs ID | Alleles (effect/other) | EAF | Beta/OR* (SE) | P-value (discovery) | P-value (conditional) | Shrine et al. (2023)**: # variants within 1 Mb | Shrine et al. (2023)**: maximum LD R-squared | Gene*** |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FEV1/FVC | Combined | 6 | 6,736,698 | rs912057 | G/A | 0.593 | 0.37 (0.06) | 5.76E-10 | 9.48E-06 | 7 | 0.0045 | LY86 |
| FEV1/FVC | Combined | 10 | 3,255,632 | rs572516809 | A/C | < 0.001 | 8.23 (1.4) | 3.58E-09 | - | 0 | - | LINC02668 |
| FEV1/FVC | Population-based | 3 | 65,875,486 | rs138244134 | T/A | 0.001 | −5.23 (0.84) | 5.44E-10 | 5.37E-10 | 2 | 4.85E-05 | MAGI1 |
| Moderate-to-severe COPD | Combined | 3 | 141,765,054 | rs115933414 | A/G | 0.004 | 2.94 (0.18) | 4.74E-09 | 4.48E-09 | 0 | - | GRK7 |
* Effect sizes are reported as standardized betas and standard errors (SE, see Additional file 1) for continuous traits and odds ratios (ORs) for binary traits
** Shrine et al. (2023): additional conditioning on prior variants reported in this region. Note that there is no conditional p-value for rs572516809 due to lack of known associated variants within ± 1 Mb
*** Gene name is based on OpenTargets or Ensembl
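Table 1 reports effect allele frequencies (EAF), whereas the text quotes minor allele frequencies (MAF). For a biallelic variant the two are related by MAF = min(EAF, 1 − EAF). The short sketch below illustrates this with EAF values from Table 1; the function name is hypothetical.

```python
def minor_allele_frequency(effect_allele_frequency):
    """Return the MAF of a biallelic variant given the frequency of the effect allele."""
    eaf = effect_allele_frequency
    if not 0.0 <= eaf <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    # The minor allele is whichever of the two alleles is less common.
    return min(eaf, 1.0 - eaf)


# rs912057: EAF 0.593 for the G allele, so the minor allele is A (MAF about 40.7%).
maf_rs912057 = minor_allele_frequency(0.593)
# rs138244134: EAF 0.001, so the effect allele is itself the minor allele (MAF about 0.1%).
maf_rs138244134 = minor_allele_frequency(0.001)
```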
Finally, for single variants reported at our prior genome-wide significance threshold of P = 5 × 10⁻⁸ in our previous analysis of TOPMed Freeze 5b data [8], we performed a comparison of results for the prior (Freeze 5b) and current (Freeze 8) analyses (Additional file 2: Table S2). For variants which were reported as novel at the time of publication of our Freeze 5b analysis [8], 15 out of 16 comparisons showed attenuated evidence of association in the current analysis of TOPMed Freeze 8 data, underscoring the importance of replication and more stringent P-values for novel genetic association findings using multi-ancestry whole-genome sequencing. For variants which were reported as falling within previously known genomic regions at the time of publication of our Freeze 5b analysis [8], we observed that 25 out of 36 comparisons showed stronger evidence of association in our current Freeze 8 analysis compared to our prior work. This finding demonstrates the overall greater power and resolution of our current effort, supporting its likely improved utility for statistical fine-mapping.
Pulmonary function shows association with structural variants at 17q21.31 and 9p13.3
In whole-genome analysis of structural variants with a genome-wide significance level of 5 × 10⁻⁷, we identified five structural variants (4 deletions and 1 duplication) associated with FEV1 and FVC at 17q21.31, as well as an inversion at 9p13.3 within the NFX1 gene associated with FEV1/FVC (Table 2). The most significant structural variant, chr17:45500101–45519000, falls within a ncRNA gene, LOC105369225, and partially spans a pseudogene, LRRC37A4P. This deletion is not only near the 17q21.31 inversion previously identified in association with COPD and lung function, but is also in linkage disequilibrium with rs2532349 (r² = 0.965), a variant that also tags the 17q21.31 inversion [11]. The 9p13.3 inversion has not been previously described.
Table 2.
Summary of significant structural variants identified in the analysis
| Chr | Start | End | SV type | Phenotype | P-value |
|---|---|---|---|---|---|
| 9 | 33,344,152 | 33,344,422 | Inversion | FEV1/FVC | 6.81E-08 |
| 17 | 45,494,001 | 45,536,300 | Deletion | FEV1 | 1.74E-08 |
| 17 | 45,496,201 | 45,537,300 | Deletion | FEV1 | 1.60E-08 |
| 17 | 45,496,201 | 45,537,300 | Deletion | FVC | 9.67E-08 |
| 17 | 45,500,101 | 45,519,000 | Deletion | FEV1 | 6.81E-09 |
| 17 | 45,500,101 | 45,519,000 | Deletion | FVC | 4.72E-08 |
| 17 | 46,099,041 | 46,099,365 | Deletion | FEV1 | 5.99E-08 |
| 17 | 46,254,801 | 46,277,800 | Duplication | FEV1 | 2.81E-08 |
| 17 | 46,254,801 | 46,277,800 | Duplication | FVC | 8.95E-08 |
The table reports genomic position (Chr, Start, End), structural variant type (SV type), associated phenotypes and P-value. Significance was determined using a threshold of P < 1 × 10⁻⁷
Gene-based testing identifies associations with five genomic regions
We performed SMMAT [12], a hybrid variant set-based mixed model association test, for gene-centric rare variant (MAF ≤ 5%) sets based on two functional annotations: high-confidence loss of function variants (hcLoF) and any loss of function or missense (LoF/Missense). We additionally performed a burden test. For each phenotype, study design, and aggregation criterion (hcLoF and LoF/Missense), we determined statistical significance using a trait- and subgroup-specific Bonferroni threshold that accounts for the number of valid tests (Additional file 2: Table S3). Overall, we observed 14 significant genes across all phenotypes, subgroups and aggregation methods, of which ten had not previously been reported in COPD- or lung function-specific analyses and remained significant after adjusting for single variant associations (Table 3 and Additional file 2: Table S4).
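To make the idea of a burden test concrete, the sketch below collapses the rare variants of one gene into a per-individual burden score and regresses a quantitative phenotype on that score. It is a deliberately simplified illustration: it uses ordinary least squares with no covariates, relatedness adjustment, or variant weights, so it is not SMMAT or the mixed-model burden test used here. The genotype matrix, phenotype values, and function name are all hypothetical.

```python
def burden_summary(genotypes, phenotype):
    """Collapse rare-variant genotypes into a burden score and fit y = a + beta * burden.

    genotypes: one list per individual, holding rare-allele counts per variant.
    phenotype: one quantitative trait value per individual.
    """
    if len(genotypes) != len(phenotype) or not genotypes:
        raise ValueError("genotypes and phenotype must be non-empty and equal length")
    # Burden score: total number of rare alleles an individual carries in the gene.
    burden = [sum(row) for row in genotypes]
    cmac = sum(burden)                          # cumulative minor allele count
    carriers = sum(1 for b in burden if b > 0)  # individuals with at least one rare allele
    n = len(burden)
    mean_b = sum(burden) / n
    mean_y = sum(phenotype) / n
    sxx = sum((b - mean_b) ** 2 for b in burden)
    if sxx == 0:
        raise ValueError("no variation in burden score")
    sxy = sum((b - mean_b) * (y - mean_y) for b, y in zip(burden, phenotype))
    return {"beta": sxy / sxx, "cmac": cmac, "carriers": carriers}


# Hypothetical data: 6 individuals, 3 rare variants in one gene.
genotypes = [[0, 0, 0], [1, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 1]]
phenotype = [0.5, -1.5, 0.3, -1.2, 0.4, -1.5]
result = burden_summary(genotypes, phenotype)
```

In this toy data set, carriers have lower trait values than non-carriers, so the fitted burden effect is negative, which is the same direction as a loss-of-function burden associated with a reduced trait value.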
Table 3.
Novel high-confidence loss-of-function (hcLoF) genes identified in the gene-based analysis
| Gene | Chr | start | end | gene_type | Phenotype | Subgroup | SMMAT P-value (discovery) | SMMAT P-value (conditional) | Burden Beta/OR* | Burden SE* | Burden P-value | # variants | cMAC (case/control) | # Carriers |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DMAP1 | 1 | 44,213,455 | 44,220,681 | protein_coding | Severe COPD | Combined | 1.34E-05 | 3.82E-05 | 0.942 | 0.1518 | 0.6939 | 3 | 541 (109/432) | 540 |
| TTC22 | 1 | 54,779,712 | 54,801,323 | protein_coding | FVC | Combined | 5.45E-06 | - | −0.1061 | 0.0365 | 0.0037 | 20 | 207 | 199 |
| HMCN1 | 1 | 185,734,391 | 186,190,949 | protein_coding | FEV1/FVC | Combined | 4.87E-08 | 4.76E-08 | −1.8073 | 0.5396 | 0.0008 | 62 | 237 | 235 |
| HMCN1 | 1 | 185,734,391 | 186,190,949 | protein_coding | FEV1/FVC | Population-based | 1.28E-07 | 1.32E-07 | −1.9357 | 0.5681 | 0.0007 | 40 | 164 | 164 |
| ENSG00000285868 | 5 | 157,341,596 | 157,460,100 | protein_coding | Severe COPD | COPD-enriched | 3.14E-05 | - | 3.451 | 0.2828 | 1.19E-05 | 3 | 135 (36/99) | 132 |
| GZMM | 19 | 544,034 | 549,924 | protein_coding | Moderate-to-Severe COPD | Combined | 9.05E-06 | - | 1.352 | 0.1479 | 0.0415 | 5 | 379 (118/261) | 376 |
| GZMM | 19 | 544,034 | 549,924 | protein_coding | Severe COPD | Combined | 1.52E-05 | - | 1.627 | 0.2401 | 0.0426 | 5 | 314 (53/261) | 312 |
* Effect sizes are reported as standardized betas and standard errors (SE, see Additional file 1) for continuous traits and odds ratios (ORs) for binary traits
HMCN1 hcLoF was associated with reduced FEV1/FVC in the combined study (P-value = 4.87 × 10⁻⁸, effect size of burden = −1.81) and remained significant after conditioning on single GWAS variants in the region. Additional novel genes were TTC22 (associated with reduced FVC in combined analysis, P-value = 5.45 × 10⁻⁵), ENSG00000285868 (associated with increased risk of COPDsev in COPD-enriched analysis, P-value = 3.14 × 10⁻⁵), and GZMM (associated with increased risk of COPDmod in combined analysis, P-value = 9.05 × 10⁻⁶). We also observed genome-wide significance of the aggregation units of LoF/Missense variants in four genes at previously described GWAS loci: TNXB, AGER, ADGRG6 and SERPINA1. The associations were no longer significant (Additional file 2: Table S4) after adjusting for the previously reported variants and significant single variants within 1 Mb flanks of the gene.
For the two genes reported based on gene-based analysis of rare putative loss of function variants in our prior TOPMed Freeze 5b analysis [8], we examined their evidence in the corresponding analyses from the current effort and found nominal associations for CRISP1 with FVC in African Americans (SMMAT hcLoF P-value = 0.007) and ARHGEF17 with FEV1/FVC in combined analysis (SMMAT hcLoF P-value = 0.04), again underscoring the importance of replication and large-scale studies for rare variants.
Replication analysis highlights differences in LD between European and Multi-ancestry cohorts that may underlie differences in association
We examined the significance of the four novel variants identified in the single variant-based association analysis (Table 1) using study-defined unrelated European ancestry participants and all (European ancestry and other ancestry) participants in UK Biobank, respectively. rs572516809 on chromosome 10 was not available in UK Biobank. Of the remaining three variants, two did not replicate at a Bonferroni threshold of P < 0.017. The third, rs912057, replicated in both European and multi-ancestry participants (Additional file 2: Table S5) but was no longer significant after conditioning on prior GWAS variants. To investigate this discrepancy, we compared LD in this region between the discovery and replication data sets. Among seven previously reported GWAS variants in the region, rs1294417 was the only one in LD with rs912057 in TOPMed (r² = 0.53); in the UK Biobank, the LD was stronger in both European ancestry (r² = 0.69) and multi-ancestry (r² = 0.69) participants. In both UK Biobank and TOPMed, all other previously reported variants showed low LD with rs912057 (r² < 0.07). We also compared allele frequencies of the 11 conditioned variants between TOPMed and UK Biobank (Additional file 2: Table S6); allele frequencies were highly consistent across data sets, with differences ranging from 0.0019 to 0.1125 (median 0.0228). The stronger LD in the UK Biobank could account for some of the discrepancy in the conditional analysis, but the TOPMed association could also be a false positive.
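The replication threshold of 0.017 is consistent with a Bonferroni correction for the three variants that could be tested. A minimal sketch, assuming a family-wise error rate of 0.05 (the value of alpha is an assumption, chosen because 0.05/3 rounds to the reported threshold):

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold that controls the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Three novel variants were available for replication in the UK Biobank.
threshold = bonferroni_threshold(3)
print(f"{threshold:.3f}")  # prints 0.017
```

The same rule, applied with the number of valid tests per trait and subgroup, gives trait- and subgroup-specific thresholds of the kind used in the gene-based analysis.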
Replication of gene-based rare variant association for HMCN1
To examine whether gene-based rare variant findings could be replicated, we used previously published results of gene-based phenome-wide association studies (PheWAS) in the UK Biobank (Additional file 2: Table S7) [13, 14]. The effect of HMCN1 on FEV1/FVC was consistent in replication data (Burden Beta of AZPheWAS = 0.52, overall combined P-value < 1 × 10⁻²²). Other genes identified in TOPMed did not replicate in UK Biobank.
Fine-mapping prioritizes variants for 47 genome-wide significant regions and demonstrates the value of multi-ancestry analysis
For each genome-wide significant phenotype and subgroup combination (n = 47), we performed multi-ancestry fine-mapping using the Sum of Single Effects (SuSiE) model (Methods). Most regions had one credible set (41 of 47 regions, 87.2%, including the newly described LY86 locus, consistent with lack of secondary association), and the credible sets contained 1 to 2,337 variants (median of 8), with posterior inclusion probabilities (PIPs) ranging from 0.027 to 0.998 (median of 0.282) (Fig. 1 and Additional file 2: Table S8). To examine the benefit of including the non-European ancestry participants, we compared our multi-ancestry results with fine-mapping using only European-ancestry participants (Fig. 2 and Additional file 2: Table S9). Comparison of fine-mapping resolution between multi-ancestry and European-only analyses demonstrated clear advantages of the multi-ancestry approach. The median credible set size was smaller in the multi-ancestry analysis (8 variants) than in the European-only analysis (15 variants), indicating improved precision in narrowing down candidate variants. The genomic intervals spanned by the credible sets were similar across both approaches (median ~30 kb). Notably, the maximum PIP was higher in the multi-ancestry setting (median 28.2% vs. 17.1%), suggesting stronger localization of the putative causal variant. Taken together, these findings highlight the added value of leveraging multi-ancestry data for improving fine-mapping resolution. While most fine-mapped variants did not show large differences in PIP, we found 33 fine-mapped signals with a PIP difference ≥ 0.2 (Additional file 2: Table S10). For example, in fine-mapping of a locus near HTR4 for FEV1/FVC ratio, we observed a large increase in PIP for the lead variant rs7733410 (chr5:148,476,959) when using multi-ancestry participants (PIP = 0.95) versus European ancestry participants (PIP = 0.24) (Fig. 2E, 2F).
Other signals exhibiting marked improvements in resolution of fine-mapping for multi-ancestry as compared to European ancestry were located in/near RIN3, RARB, and FAM13A, among others (Additional file 2: Table S10).
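To illustrate why a higher maximum PIP goes hand in hand with a smaller credible set, the sketch below builds a 95% credible set by ranking variants by posterior probability and adding them until the cumulative probability reaches the target coverage. This is a simplified single-signal construction with made-up probabilities, not the SuSiE procedure, which derives credible sets per single-effect component; the function and variant names are hypothetical.

```python
def credible_set(posteriors, coverage=0.95):
    """Smallest set of top-ranked variants whose posterior probabilities sum to >= coverage.

    Assumption: a single causal variant in the region, so the probabilities sum to ~1.
    If the total never reaches the coverage, all variants are returned.
    """
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    selected = []
    cumulative = 0.0
    for variant, probability in ranked:
        selected.append(variant)
        cumulative += probability
        if cumulative >= coverage:
            break
    return selected


# Hypothetical posteriors: a diffuse signal (high LD) versus a well-resolved one.
diffuse = {"var_a": 0.24, "var_b": 0.22, "var_c": 0.20,
           "var_d": 0.18, "var_e": 0.10, "var_f": 0.06}
resolved = {"var_a": 0.96, "var_b": 0.02, "var_c": 0.01, "var_d": 0.01}

diffuse_set = credible_set(diffuse)    # needs all six variants
resolved_set = credible_set(resolved)  # needs only var_a
```

When LD spreads the posterior mass over many correlated variants, more of them are needed to reach 95% coverage; when one variant carries most of the mass, the set shrinks toward a single candidate.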
Fig. 1.
Summary of the multi-ancestry fine-mapping
Fig. 2.

Comparison of multi-ancestry and European-ancestry fine-mapping
Colocalization of molecular QTL provides support for seven candidate genes
We conducted colocalization analysis of fine-mapped variants to identify causal variants shared between molecular QTLs and GWAS signals (Methods). We used gene expression cis-eQTLs from the Lung Tissue Research Consortium (LTRC), which includes COPD cases, and identified seven gene expression traits colocalizing with FEV1/FVC or FEV1 and four (two overlapping) with COPDmod and COPDsev (Table 4, Fig. 3, Additional file 1: Fig. S2). With the exception of PSMA4, genes identified in our LTRC analyses were prioritized as genes of interest in the most recent large-scale GWAS of lung function [7], though ADAM19, THSD4, and C4B were prioritized based on other (i.e., non-eQTL) evidence [7]. We also found overlap between LTRC eQTL and QTL from the Multi-Ethnic Study of Atherosclerosis (MESA) and GTEx Lung eQTL (Additional file 2: Table S11).
Table 4.
Colocalization (PP.H4 ≥ 0.8) of LTRC cis-eQTLs with fine-mapped variants
| Gene Name | Trait | Stratum | PP.H4 | Support from non-LTRC QTL Colocalization | Shrine et al. 2023: Novel Gene | Shrine et al. 2023: Prioritization Evidence |
|---|---|---|---|---|---|---|
| RARB | FEV1/FVC | Combined | 0.9971 | None | FALSE | eQTL; PoPs; nearest |
| RARB | FEV1/FVC | Population-based | 0.9954 | None | FALSE | eQTL; PoPs; nearest |
| HHIP | FEV1/FVC | Combined | 0.9408 | MESA-mQTL | FALSE | eQTL; PoPs; nearest |
| HHIP | FEV1 | Combined | 0.9325 | None | FALSE | eQTL; PoPs; nearest |
| HHIP | Moderate-to-Severe COPD | COPD-enriched | 0.9157 | None | FALSE | eQTL; PoPs; nearest |
| HHIP-AS1 | FEV1/FVC | Combined | 0.8917 | None | FALSE | eQTL; nearest |
| HHIP-AS1 | FEV1 | Combined | 0.9123 | None | FALSE | eQTL; nearest |
| HHIP-AS1 | Moderate-to-Severe COPD | Combined | 0.8889 | None | FALSE | eQTL; nearest |
| HHIP-AS1 | Moderate-to-Severe COPD | COPD-enriched | 0.9109 | None | FALSE | eQTL; nearest |
| NPNT | FEV1/FVC | Combined | 0.9999 | None | FALSE | eQTL; PoPs; credible; nearest |
| NPNT | FEV1/FVC | Population-based | 0.9997 | None | FALSE | eQTL; PoPs; credible; nearest |
| NPNT | FEV1 | Combined | 0.9993 | None | FALSE | eQTL; PoPs; credible; nearest |
| NPNT | FEV1 | Population-based | 0.9941 | None | FALSE | eQTL; PoPs; credible; nearest |
| NPNT | FEV1 | Population-based | 0.8115 | GTEx-eQTL | FALSE | eQTL; PoPs; credible; nearest |
| ADAM19 | FEV1/FVC | Combined | 0.9708 | MESA-eQTL, GTEx-eQTL | FALSE | PoPs; nearest |
| C4B | FEV1/FVC | Combined | 0.9937 | None | TRUE | rare_disease |
| C4B | FEV1/FVC | Population-based | 0.9883 | None | TRUE | rare_disease |
| RIN3 | Moderate-to-Severe COPD | Combined | 0.9098 | None | FALSE | eQTL; PoPs; nearest |
| RIN3 | Severe COPD | Combined | 0.954 | None | FALSE | eQTL; PoPs; nearest |
| THSD4 | FEV1/FVC | Combined | 0.9976 | MESA-mQTL | FALSE | PoPs; nearest |
| PSMA4 | Severe COPD | COPD-enriched | 0.9414 | None | N/A | N/A |
LTRC, the Lung Tissue Research Consortium; FEV1, forced expiratory volume in 1 s; FEV1/FVC, FEV1 to forced vital capacity ratio; PP.H4, posterior probability of colocalization; eQTL, gene expression quantitative trait loci; PoPs, the gene with the highest polygenic priority score; nearest, the nearest gene to the sentinel SNP identified in Shrine et al. 2023; functional credible, the annotation-informed credible set identified in Shrine et al. 2023; rare-disease, rare Mendelian-disease genes within ± 500 kb of a lung-function sentinel SNP identified in Shrine et al. 2023; None, no colocalization support due to either no fine-mapping with 95% credible set or PP.H4 < 0.8; N/A, unavailable prioritization evidence
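For readers unfamiliar with PP.H4, the sketch below shows how posterior probabilities for the five colocalization hypotheses (H0: no association; H1 and H2: association with one trait only; H3: distinct causal variants; H4: a shared causal variant) can be computed from per-variant log approximate Bayes factors under the standard single-causal-variant formulation. The prior values p1, p2, and p12 and the toy Bayes factors are assumptions chosen for illustration and may differ from the settings used in this study (see Methods); the function names are hypothetical.

```python
from math import exp, inf, log, log1p


def _logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + log(sum(exp(v - m) for v in values))


def _logdiffexp(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return -inf
    return a + log1p(-exp(b - a))


def coloc_posteriors(labf_trait1, labf_trait2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] from per-variant log ABFs.

    Assumptions: at most one causal variant per trait in the region, and the
    same variants in the same order in both input lists.
    """
    if not labf_trait1 or len(labf_trait1) != len(labf_trait2):
        raise ValueError("inputs must be non-empty and of equal length")
    sum1 = _logsumexp(labf_trait1)
    sum2 = _logsumexp(labf_trait2)
    sum12 = _logsumexp([a + b for a, b in zip(labf_trait1, labf_trait2)])
    log_h = [
        0.0,                                                  # H0
        log(p1) + sum1,                                       # H1
        log(p2) + sum2,                                       # H2
        log(p1) + log(p2) + _logdiffexp(sum1 + sum2, sum12),  # H3
        log(p12) + sum12,                                     # H4
    ]
    total = _logsumexp(log_h)
    return [exp(h - total) for h in log_h]


# Toy example: both traits peak at the same variant, versus at different variants.
shared = coloc_posteriors([20.0, 0.0, 0.0], [20.0, 0.0, 0.0])
distinct = coloc_posteriors([20.0, 0.0, 0.0], [0.0, 20.0, 0.0])
pp_h4_shared = shared[4]      # close to 1: supports a shared causal variant
pp_h4_distinct = distinct[4]  # close to 0: H3 dominates instead
```

Under a threshold of PP.H4 ≥ 0.8, only the first toy region would be reported as colocalized.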
Fig. 3.
Regional plots of pulmonary trait association and LTRC eQTL for colocalized gene regions
Single-cell RNA-seq to identify putative cell types for implicated genes
To determine cell type-specific expression for genes of interest identified by single variant, rare variant, or colocalization analyses, we analyzed a single-cell RNA sequencing (scRNA-seq) atlas comprising integrated data sets of healthy and diseased human lung tissue from patients with interstitial lung disease (ILD) or COPD (Fig. 4). In this joint data set [15], prioritized genes were expressed across epithelial, endothelial, immune, and stromal cells. Some genes were specifically and highly expressed in a single cell type or tissue compartment (e.g., AGER), while others were differentially expressed in multiple cell types or showed low expression across multiple cell types (e.g., GRK7). In epithelial cells, there was specific expression of AGER in alveolar type 1 cells, HHIP and HHIP-AS1 in alveolar type 2 cells, and RARB in tuft cells; expression was less specific, but still notable in relative terms, for NPNT in alveolar type 1 cells, THSD4 in basal epithelial cells, and TTC22 in multiple epithelial cell types. In immune cells, GZMM was expressed in NK cells, LY86 in antigen-presenting cells (macrophages, dendritic cells, monocytes), PSMA4 in proliferating immune cells including alveolar macrophages and T lymphocytes, and RIN3 and ADAM19 in multiple immune cell types. All cell lines used in this study were purchased from the American Type Culture Collection (ATCC). The cell lines were authenticated according to ATCC’s standard certification procedures and were routinely tested for mycoplasma contamination every six months, or additionally whenever abnormal cell morphology or growth characteristics were observed.
Fig. 4.
Summary of cell type-specific gene expression of prioritized genes based on single cell atlas data from control lung tissue samples
HMCN1 deficiency in lung fibroblasts hampers matrix gene expression
Single-cell data indicated fibroblasts as the main cell type in the lungs that expresses HMCN1. To determine the function of HMCN1 in human lung fibroblasts, we generated a stable knockout line of HMCN1 in immortalized IMR-90 cells, a human fetal lung fibroblast cell line. With HMCN1 mRNA levels reduced by over 50%, we found lower expression of collagen genes. These effects appeared to be magnified when cells were cultured under viability (non-proliferation) conditions with 0.1% FBS (Additional file 2: Table S12 and Additional file 1: Fig. S3).
Discussion
In this collaborative effort, we analyzed whole genome sequences of 44,287 multi-ancestry samples to identify the genetic risk factors for reduced lung function and COPD. Our results provide additional support for many well-characterized COPD-related genes and loci such as HHIP, NPNT, RARB, and RIN3, but also present evidence for new and emerging candidates including a single variant locus near LY86 and gene-based rare variant association for HMCN1. Through integration with eQTL from the LTRC, we also provide additional support for molecular targets, including ADAM19, THSD4, C4B, and PSMA4, that were not supported by lung eQTL from other sources in recent work [7]. Examination of the identified molecular targets in lung single-cell expression resources highlighted the expression of some of these genes in lung epithelial and immune cell subsets. Our study also illustrates some of the potential benefits of whole genome sequencing-based multi-ancestry studies for identifying causal variants via fine mapping.
Most of our associations had previously been identified in much larger, array-based genome-wide association studies. This includes the results of our structural variant analysis, where we identified an association in 17q that is likely the same signal identified by single variant analysis related to the 17q21.31 inversion [16, 17]. We identified one additional significant inversion (9:33,344,152–33,344,422) located within the NFX1 gene. While the effector gene for this structural variant is not known, NFX1 is a nuclear protein complex that can regulate inflammatory response through MHC II and IFN-gamma [18]. Our genome-wide analysis did identify an additional common variant association near LY86 and rare variants near LINC02668, MAGI1, and GRK7. LY86 (lymphocyte antigen 86, also known as MD-1 [Myeloid differentiation 1]) is implicated in innate immunity and inflammation [19, 20], and variants in the region of LY86 have been reported in a genome-wide study examining COPD-dependent effects of genetic variation on lung cancer risk [21]. LINC02668 is a long non-coding RNA with expression in many tissues including lung, previously associated with glioma survival [22]. MAGI1 (Membrane associated guanylate kinase, WW and PDZ domain containing 1) functions in cytoplasmic, cadherin-mediated, tight junction adhesion in epithelial and endothelial cells, and was found to be differentially expressed in healthy smokers versus COPD small airway epithelium [23, 24]. GRK7 is perhaps a less likely effector gene, as its expression is mostly in the eye. We note that these genetic association findings require replication and further analysis to identify the likely effector genes.
Our gene-based association identified and replicated an association of rare hcLoF variants in HMCN1 with FEV1/FVC ratio. The HMCN1 region is associated with lung function in a genome-wide association study [25], and rare variants in HMCN1 were previously identified in the UK Biobank exome analysis [13]. Hemicentins are members of the fibulin family, which are well-known in GWAS [26, 27] and Mendelian diseases [28, 29]. HMCN1 was recently identified as part of a lung fibroblast signature, and together with ITGA8, required for epithelial mesenchymal interaction that anchors NPNT [30], all genes previously implicated by GWAS [31, 32]. HMCN1 was also found to be differentially expressed in pulmonary fibrosis vs control lung; fibroblasts treated with TGFβ increased HMCN1 expression in a dose-dependent manner, and silencing HMCN1 reduced alpha-smooth muscle actin (αSMA) [33]. Additionally, our experiments found that HMCN1 loss in lung fibroblasts reduces expression of matrix genes, particularly under conditions mimicking cellular homeostasis (viability without proliferation), suggesting a mechanism by which HMCN1 may impair lung repair or regeneration through modulating the composition of the extracellular matrix.
In addition to association analysis, we performed several analyses to implicate target variants, genes and cell types. Consistent with other recent studies [7, 34, 35], we observed the benefits of multi-ancestry fine-mapping compared to analyses focused on European-ancestry participants alone. Credible sets derived from analysis of multi-ancestry participants (1) contained fewer variants overall, and (2) spanned shorter total physical distance, despite a relatively small proportion of non-European samples. This improvement in fine-mapping resolution was observed for signals in regions of high LD among European ancestry participants, but reduced LD among multi-ancestry participants, at loci including HTR4, RIN3, RARB and FAM13A. These fine-mapping results clarify the number of independent signals at each locus, and facilitate future QTL mapping and other efforts to identify the function of individual variants, for example, by the NHGRI’s Impact of Genomic Variation on Function consortium.
We also found benefit in integrating multi-omic data, which provided additional evidence for ADAM19, THSD4, PSMA4, and C4B. While other sources of evidence suggest a role for complement in COPD [36], results for C4B should be interpreted with caution as long-range linkage disequilibrium in the MHC region poses challenges to fine-mapping and colocalization [37]. To identify potential cell types, we used single-cell transcriptomic data, identifying alveolar epithelial cells (HHIP, NPNT), airway basal cells (THSD4), natural killer cells (GZMM), and antigen presenting cells (ADAM19, LY86, RIN3), adding to evidence for not only epithelial cells [38–40] but other cell types in the pathobiology of COPD [41]. Single-cell data also suggest differences in patterns of expression across disease states for genes including THSD4 and PSMA4, which may provide an explanation for the differences in colocalization results when overlapping with disease-ascertained lung tissue samples from LTRC compared to GTEx.
As the current work represents an expanded version of our prior effort [8], we highlight here the major distinctions between these two efforts. The prior work was carried out as a pilot study using whole genome sequence data from 19,996 participants from TOPMed Freeze 5b, in preparation for the current expanded effort using whole genome sequence data from 44,287 participants in TOPMed Freeze 8. While the prior effort carried out conditional analysis to identify secondary signals within the regions of variants identified at genome-wide significance, the current effort applied the newer and now widely used approach of statistical fine-mapping, allowing for reports of credible sets in addition to individual variants. Further, while our prior work performed downstream integration of WGS results with gene expression (eQTL) from GTEx, the current study presents, for the first time, integration of TOPMed WGS for lung function and COPD with eQTL from the LTRC, a valuable resource representing lung tissue obtained from a disease-enriched cohort. In the present work, we have used currently available single-cell gene expression resources to prioritize cell types for candidate genes, an approach that was not generally available at the time of our prior publication. Finally, distinct from our prior work, the current manuscript presents for the first time the analysis of structural variation called from TOPMed whole genome sequence data.
Our study has several limitations. While our use of whole genome sequencing allows a near-comprehensive assessment of genetic variation, and we included multiple ancestries, we did not identify any novel ancestry-specific associations, and the majority of our findings were identified in previously published genome-wide association studies [7, 31]. These data are consistent with our finding that most of the heritability of lung function and COPD appears to be from common variants [5], and that rare variant discovery requires large sample sizes (particularly of non-European ancestry), newer analytic methods, and better functional variant prediction. We focused on lung function and COPD using GOLD criteria; other studies use ICD-coded COPD, which has poor agreement with GOLD [42, 43]. Our rare single-variant associations require replication, and many of our gene-based rare variant sets did not replicate in the UK Biobank. Our variant-to-gene-to-cell type analysis would be strengthened by additional investigations, such as open chromatin data (e.g. scATAC-Seq), single-cell QTL, and functional perturbation.
Conclusions
In this expanded collaborative effort, we analyzed whole genome sequence data for 44,287 multi-ancestry samples from the NHLBI TOPMed Program to identify the genetic risk factors for reduced lung function and COPD. We showed the benefit of detailed sequencing data in multiple ancestries to nominate causal variants through fine mapping, and of multiple omics to nominate causal genes through QTL colocalization.
Methods
Study participants, sequencing, and phenotypes
Our overall study design is shown in Fig. 5. We included participants encompassing multiple study-reported race and ancestral backgrounds from eight population-based studies (the Atherosclerosis Risk in Communities [ARIC] Study [44], the Coronary Artery Risk Development in Young Adults [CARDIA] [45], the Cleveland Family Study [CFS] [46], the Cardiovascular Health Study [CHS] [47], the Framingham Heart Study [FHS] [48], Hispanic Community Health Study/Study of Latinos [HCHS/SOL] [49], the Jackson Heart Study [JHS] [50] and the Multi-Ethnic Study of Atherosclerosis [MESA] [51]) and four COPD-enriched studies (the Genetic Epidemiology of COPD [COPDGene] [52], Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints [ECLIPSE] [53], Boston Early-Onset COPD Study [EOCOPD] [54] and Lung Tissue Research Consortium [LTRC] [55]) participating in the NHLBI Trans-Omics for Precision Medicine (TOPMed) program. Participants were grouped by study-reported race. Outliers were excluded based on ancestry principal components (PCs), assessed through scatter plots of the first three PCs, where individuals clustering distinctly apart from the main population were flagged and removed. Groups with fewer than 100 participants per study and race were excluded. After sample quality control, we retained 44,287 participants, including 24,873 European, 11,051 African, 7,792 Latin American, and 571 East Asian participants (Additional file 2: Table S13). Details of the cohorts, their protocols, and the sample quality control process are described in Additional file 1.
Fig. 5.
Overview of the statistical modeling workflow
Genotypes were obtained from Whole Genome Sequence (WGS) data in TOPMed Freeze 8 samples (Additional file 1). After variant quality control, we retained a total of 877,506,482 non-structural variants (811,228,902 single nucleotide variants and 66,277,580 indels) and 466,455 structural variant sites (231,817 deletions, 197,412 duplications and 37,226 inversions).
To minimize heterogeneity of phenotypes across multiple TOPMed studies, we used harmonized phenotypes following the protocol of the NHLBI Pooled Cohorts Study (Additional file 1) [56]. To maximize availability of phenotypes, we analyzed pre-bronchodilator forced expiratory volume in one second (FEV1), forced vital capacity (FVC) and their ratio (FEV1/FVC). For COPD, we tested two dichotomous phenotypes using the Global Initiative for Chronic Obstructive Lung Disease (GOLD) severity grades and FEV1% predicted [57]: 1) COPDmod (FEV1% predicted < 80%) cases vs. controls (FEV1% predicted ≥ 80% and FEV1/FVC ≥ 0.7) and 2) COPDsev (FEV1% predicted < 50%) cases vs. controls.
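The case and control definitions above can be written as a small classifier. This is an illustrative Python sketch rather than the study's own code: the function name `classify_copd` and the choice to return `None` for participants who are neither case nor control are assumptions. Only the thresholds stated in the text are applied; any further FEV1/FVC requirement for cases under GOLD is not encoded here.

```python
from typing import Optional, Tuple


def classify_copd(fev1_pct_pred: float, fev1_fvc: float) -> Tuple[Optional[int], Optional[int]]:
    """Return (COPDmod, COPDsev) coded as 1 = case, 0 = control, None = not used.

    Hypothetical helper: thresholds follow the text, everything else is assumed.
    """
    # Controls are shared by both phenotypes: FEV1% predicted >= 80 and FEV1/FVC >= 0.7.
    if fev1_pct_pred >= 80.0 and fev1_fvc >= 0.7:
        return 0, 0
    # Cases are defined by FEV1% predicted alone, as written in the text.
    copd_mod = 1 if fev1_pct_pred < 80.0 else None
    copd_sev = 1 if fev1_pct_pred < 50.0 else None
    return copd_mod, copd_sev
```

Under these assumptions, a participant with FEV1% predicted of 65 is a COPDmod case but is left out of the COPDsev comparison, and a participant with FEV1% predicted of 90 and FEV1/FVC of 0.65 is excluded from both.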
Genetic association analyses for identification of novel loci
Our primary analysis for each phenotype included multi-ancestry participants. We also performed analyses stratified by ancestry, and by population-based versus COPD-enriched studies (hereafter referred to as subgroups). We performed (1) single-variant analyses of variants with a minor allele count (MAC) > 25, (2) variant set-based analyses, and (3) structural variant analyses. We identified novel variants by conditioning on variants previously reported for lung function or GOLD-criteria based COPD [7, 31], and tested novel variants and genes for replication in the UK Biobank. A detailed description of the entire analysis process can be found in Additional file 1.
Statistical fine-mapping of GWAS variants
Non-causal variants can be significantly associated with a phenotype due to high correlation with the true causal variants through linkage disequilibrium (LD). Thus, identifying the set of variants that are most likely to be causal for the phenotype within each genetic locus, so-called fine-mapping, is essential for understanding the biological mechanism of the phenotype [58]. We performed fine-mapping for all variants with MAC > 25 in a 100 kb window of each significant association region using the Sum of Single Effects (SuSiE) model with summary statistics, as implemented in the R package susieR [59, 60]. As inputs to the SuSiE model, we used Z-scores calculated from the single variant-based association analyses and an in-sample LD matrix of the variants in each significant region. We assumed at most 10 causal variants in each significant region, although SuSiE is robust to the choice of the maximum number of causal variants. Under these settings, we sought 95% credible sets, each a subset of variants with probability of 0.95 or greater of containing at least one causal variant, in all significant regions across all phenotypes and subgroups.
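To make the notion of a 95% credible set concrete, the following Python sketch builds one from a vector of per-variant posterior probabilities for a single causal effect. It is a simplified illustration and not the susieR implementation: the study itself used SuSiE, which additionally fits multiple effects and uses the LD matrix. The function names `z_scores` and `credible_set` and the example numbers are hypothetical.

```python
from typing import List, Sequence


def z_scores(betas: Sequence[float], ses: Sequence[float]) -> List[float]:
    """Z-score per variant: effect estimate divided by its standard error."""
    return [b / s for b, s in zip(betas, ses)]


def credible_set(probs: Sequence[float], coverage: float = 0.95) -> List[int]:
    """Smallest set of variant indices whose summed probability reaches `coverage`.

    `probs` is assumed to hold posterior probabilities for ONE causal effect
    (they should sum to about 1 across the variants in the region).
    """
    # Visit variants from most to least probable and accumulate probability mass.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    selected: List[int] = []
    total = 0.0
    for i in order:
        selected.append(i)
        total += probs[i]
        if total >= coverage:
            break
    return selected


# Hypothetical region with four variants.
example_set = credible_set([0.60, 0.30, 0.06, 0.04])
```

A smaller credible set means the signal is localized to fewer candidate variants, which is the sense in which multi-ancestry analysis improves fine-mapping resolution.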
Molecular cis-QTL colocalization with fine-mapped variants
To determine whether there may be identifiable molecular drivers (e.g., gene transcripts) of our genomic associations, we performed colocalization using Lung Tissue Research Consortium (LTRC), MESA omics data, and GTEx.
For LTRC, we used cis-eQTLs from TOPMed Freeze 1RNA. Freeze 1RNA TOPMed cis-eQTL results were generated in a collaboration between the TOPMed Informatics Research Center, TOPMed Multi-Omics working group, and the TOPMed parent studies contributing RNA-seq and distributed to TOPMed investigators. The LTRC cis-eQTL mapping was performed using tensorQTL [61] in 1,360 unrelated European-dominant samples with WGS data in the TOPMed Freeze 9b samples for variants with MAF ≥ 1%. The mapping window was set to ± 1 Mb of the gene transcription start site (TSS). The LTRC cis-eQTL mapping analysis was adjusted for covariates including genotypic sex, the first 15 genotype PCs, and 75 gene expression PCs.
For MESA, we used cis-quantitative trait loci (cis-QTLs) corresponding to three types of omics data: gene expression (eQTL) from RNA-seq of peripheral blood mononuclear cells (PBMCs), methylation (mQTL) from whole blood using the Illumina EPIC array [62], and plasma protein (pQTL) based on SOMAScan proteomics [63]. Both eQTL and mQTL mapping were performed using tensorQTL [61] in ~ 900 multi-ancestry samples from two time points (Exam 1 and Exam 5, 10 years apart) with WGS data in the TOPMed Freeze 8 samples for variants with MAF ≥ 1%. The mapping window was set to ± 1 Mb of the transcription start site (TSS) for eQTL and ± 500 kb of the CpG site for mQTL. To control for population stratification, the first 11 PC-AiR PCs were included in the model. The PEER factors [64] were used as covariates in the model to control for both technical and biological variation. The pQTL mapping was conducted on 971 multi-ancestry samples using SOMAscan HTS Assay 1.3 K for plasma proteins [63].
To test whether the colocalization signals from LTRC could be detected in a more general population without ascertainment for lung disease, we used GTEx V8 Lung cis-eQTL (downloaded from GTEx Portal [65]) to validate LTRC colocalized genes. The GTEx V8 Lung cis-eQTL were generated on 515 predominantly European-ancestry samples.
To examine colocalization, we first performed SuSiE fine-mapping separately on the GWAS and QTL summary statistics, each with in-sample LD, to obtain fine-mapped variants. The colocalization analysis was then conducted on the fine-mapped variants as implemented in the coloc v5.1.0 R package [66]. We reported results meeting a posterior probability of colocalization (PP.H4) threshold of 0.8. We further focused on colocalization results supporting novel GWAS variants identified in our study. The colocalization plots were generated using the LocusCompareR package [67].
Colocalization analysis
eQTL in GTEx v7
eQTL data used for the analyses described in this manuscript were obtained from the GTEx Portal on March 22, 2019 and represented sample sizes ranging from 70 to 491 per tissue, with 383 samples for lung. As the summary statistics for GTEx v7 were in human genome assembly hg19, the liftOver tool was used to map the coordinates to the genome assembly used for the TOPMed WGS data.
eQTL and mQTL in MESA
RNA-seq was performed for PBMCs from MESA and profiling of genome-wide methylation was performed for whole blood using the Illumina EPIC array. eQTL and mQTL mapping was performed using tensorQTL [61] in ~ 900 individuals from MESA Exam 1 and Exam 5 data for variants with MAF ≥ 1%. The mapping window was set to ± 1 Mb of the TSS for eQTLs and ± 500 kb of the CpG site for mQTLs. We used 11 genotype PCs as covariates to control for population stratification effects, and we used PEER factors [64] to control for both technical and biological variation. The optimal number of PEER factors to use was determined to maximize the cis-eGene and cis-mProbe discovery.
Colocalization analysis
Colocalization analysis of novel GWAS variants and eQTL/mQTL was performed using the R/coloc v3.1 package [66] with default priors (p1 = 1 × 10–4, p2 = 1 × 10–4, and p12 = 1 × 10–5). Using the molecular QTL resources described above, we conducted colocalization analysis on all variants within ± 500 kb of the novel WGS variants (reported in Table 1), as well as those identified in conditional analysis (reported in Table 1). We kept only results where the lead WGS variant was a significant e/mQTL and the posterior probability of colocalization (PP4) was > 0.5. We further focused on results where the model of a single shared causal variant driving both association signals (PP4) was strongly preferred over a model of two distinct causal variants (PP3), defined as PP4/(PP3 + PP4) ≥ 0.9. In addition, we required adequate power to detect colocalization, which we quantified using a cutoff of PP3 + PP4 ≥ 0.8.
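The three filters described above can be combined into a single check. The sketch below is illustrative Python, not the pipeline used in the study (which ran the coloc R package); the `ColocResult` container, its field names, and the example gene labels are hypothetical, while the thresholds are those stated in the text.

```python
from dataclasses import dataclass


@dataclass
class ColocResult:
    gene: str           # hypothetical label for the tested gene or probe
    pp3: float          # posterior probability of two distinct causal variants
    pp4: float          # posterior probability of one shared causal variant
    lead_is_qtl: bool   # whether the lead WGS variant is a significant e/mQTL


def passes_filters(r: ColocResult,
                   min_pp4: float = 0.5,
                   min_ratio: float = 0.9,
                   min_power: float = 0.8) -> bool:
    """Apply the filters from the text: PP4 > 0.5, PP3 + PP4 >= 0.8, PP4/(PP3 + PP4) >= 0.9."""
    if not r.lead_is_qtl or r.pp4 <= min_pp4:
        return False
    power = r.pp3 + r.pp4
    if power < min_power:
        return False
    return r.pp4 / power >= min_ratio


candidates = [
    ColocResult("geneA", pp3=0.05, pp4=0.90, lead_is_qtl=True),
    ColocResult("geneB", pp3=0.30, pp4=0.60, lead_is_qtl=True),
    ColocResult("geneC", pp3=0.02, pp4=0.60, lead_is_qtl=True),
]
kept = [r.gene for r in candidates if passes_filters(r)]
```

In this made-up example only `geneA` survives: `geneB` fails the ratio criterion because PP3 is substantial, and `geneC` fails the power criterion because PP3 + PP4 is below 0.8.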
Follow-up of selected methylation sites/probes
Those methylation sites/probes demonstrating colocalization with TOPMed WGS were examined further to determine whether measured methylation from MESA Exam 1 was associated with the corresponding lung function traits in MESA which were based on follow-up measures from MESA Exam 4. The association of measured methylation with lung function was tested in linear regression of the following form:
PFT ~ methylation + age + age^2 + sex + height + height^2 + weight (for FVC only) + current smoking + former smoking + pack-years of smoking + genetic PCs of ancestry + methylation PEER factors.
We applied inverse normal transformation on both PFT trait and methylation levels of the colocalized methylation sites/probes. We report methylation-lung function associations for those results demonstrating Bonferroni-corrected significance accounting for the number of methylation markers tested.
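As an illustration of the two steps named above, the following Python sketch applies a rank-based inverse normal transformation and a Bonferroni threshold. The text does not specify the rank offset or the family-wise error rate, so the Blom offset (c = 3/8) and alpha = 0.05 used here are assumptions, as are the function names.

```python
from statistics import NormalDist
from typing import List, Sequence


def inverse_normal_transform(values: Sequence[float], c: float = 3.0 / 8.0) -> List[float]:
    """Rank-based inverse normal transform; ties receive their average rank.

    The offset c = 3/8 (Blom) is an assumed default, not taken from the text.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


def bonferroni_significant(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag p-values below alpha divided by the number of tests (alpha is assumed)."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]
```

The transformed trait and methylation values would then enter the regression model given above in place of the raw measurements.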
Overlap with pathways previously implicated by GWAS
We examined genes in the following categories for overlap with pathways previously implicated by GWAS: (1) novel genes supported by eQTL colocalization, as indicated in Table 3 and Additional file 2: Table S3, (2) genes supported by findings in our TOPMed analysis that overlap previously reported GWAS signals in Additional file 2: Table S1, and (3) genes whose introns contain novel variants identified in our study. For examination, we selected gene ontology (GO) terms reported as enriched among genes implicated in the recent lung function GWAS by Shrine et al. [7]. We then used the database of GO term inclusion provided at http://amigo.geneontology.org/ to report the overlap of our identified genes with the selected GO terms.
Single-Cell analysis
We analyzed publicly available datasets of human lung cells from the Human Lung Cell Atlas [15] from the Chan Zuckerberg Initiative (CZI) or Gene Expression Omnibus (GEO) [68]. We included single-cell RNA sequencing studies of control lung tissue [15], as well as diseased lung tissue from patients with interstitial lung disease or COPD [68]. Data sets were analyzed using Seurat v3 [69]. Counts matrices for each study were scaled and normalized for downstream analysis. Available metadata, including published cell type identity and cell population for each manuscript, was used for annotation. Visualization of gene expression was performed using DotPlot to determine percentage and scaled expression levels of each gene across detailed cell types as well as broad cell populations (epithelial, immune, etc.). Data sets with healthy and disease states were subset according to diagnosis prior to analysis and compared.
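The two quantities displayed in a dot plot, the percentage of cells expressing a gene and its average expression per cell type, can be sketched as follows. This is a plain-Python illustration of the summary, not Seurat's DotPlot function, which was used in the study and which also scales average expression across groups. The function name and the example values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def dotplot_summary(expression: Sequence[float],
                    cell_types: Sequence[str]) -> Dict[str, Tuple[float, float]]:
    """Per cell type, return (percent of cells with expression > 0, mean expression)."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for value, label in zip(expression, cell_types):
        grouped[label].append(value)
    summary: Dict[str, Tuple[float, float]] = {}
    for label, values in grouped.items():
        pct_expressing = 100.0 * sum(v > 0 for v in values) / len(values)
        mean_expression = sum(values) / len(values)
        summary[label] = (pct_expressing, mean_expression)
    return summary


# Hypothetical normalized expression of one gene in six cells.
summary = dotplot_summary(
    [0.0, 2.0, 4.0, 0.0, 0.0, 1.0],
    ["fibroblast", "fibroblast", "fibroblast", "AT2", "AT2", "AT2"],
)
```

In a dot plot, the first value sets the dot size and the second sets its color, so a gene expressed strongly in a small fraction of cells can be distinguished from one expressed weakly in most cells.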
CRISPR knockdown of HMCN1 in lung fibroblasts
To test the effect of loss of HMCN1, we performed CRISPR knockdown of HMCN1 in lung fibroblasts (IMR90) and assessed the expression of five collagen genes in either 0.1% or 10% FBS. Further details, including cell culture, molecular cloning, lentiviral packaging, and infection, are in Additional file 1.
Supplementary Information
Additional file 1. Text, Results, Acknowledgements and Fig. S1 - S5 [70–115].
Acknowledgements
Whole-Genome Sequencing
Whole genome sequencing (WGS) for the Trans-Omics in Precision Medicine (TOPMed) program was supported by the National Heart, Lung and Blood Institute (NHLBI). WGS for “NHLBI TOPMed: Atherosclerosis Risk in Communities (ARIC)” (phs001211) was performed at the Baylor College of Medicine Human Genome Sequencing Center (HHSN268201500015C and 3U54HG003273-12S2) and the Broad Institute for MIT and Harvard (3R01HL092577-06S1). WGS for “NHLBI TOPMed: Cardiovascular Health Study (CHS)” (phs001368) was performed at the Baylor College of Medicine Human Genome Sequencing Center (HHSN268201500015C). WGS for “NHLBI TOPMed: The Cleveland Family Study” (phs000954) was performed at the University of Washington Northwest Genomics Center (3R01HL098433-05S1). WGS for “NHLBI TOPMed: Whole Genome Sequencing and Related Phenotypes in the Framingham Heart Study” (phs000974) was performed at the Broad Institute of MIT and Harvard (HHSN268201500014C and 3R01HL092577-06S1). WGS for “NHLBI TOPMed: The Jackson Heart Study” (phs000964) was performed at the University of Washington Northwest Genomics Center (HHSN268201100037C). WGS for “NHLBI TOPMed: Multi-Ethnic Study of Atherosclerosis (MESA)” (phs001416) was performed at the Broad Institute of MIT and Harvard (3U54HG003067-13S1 and HHSN268201500014C). WGS for “NHLBI TOPMed: Boston Early-Onset COPD Study in the TOPMed Program” (phs000946) was performed at the University of Washington Northwest Genomics Center (3R01HL089856-08S1). WGS for “NHLBI TOPMed: Genetic Epidemiology of COPD (COPDGene) in the TOPMed Program” (phs000951) was performed at the University of Washington Northwest Genomics Center (3R01 HL089856-08S1) and the Broad Institute of MIT and Harvard (HHSN268201500014C).
WGS for "Hispanic Community Health Study/Study of Latinos (HCHS/SOL)" (phs001395) and "Coronary Artery Risk Development in Young Adults (CARDIA)" (phs001612) was performed at the Baylor College of Medicine Human Genome Sequencing Center (HHSN268201600033I). Centralized read mapping and genotype calling, along with variant quality metrics and filtering, were provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1; contract HHSN268201800002I). Phenotype harmonization, data management, sample-identity QC, and general study coordination were provided by the TOPMed Data Coordinating Center (3R01HL-120393-02S1; contract HHSN268201800001I). The TOPMed MESA Multi-Omics project was conducted by the University of Washington and LABioMed (HHSN2682015000031/HHSN26800004). We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed. The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Heart, Lung, and Blood Institute; the National Institutes of Health; or the U.S. Department of Health and Human Services. A full list of authors for the NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium is provided at https://www.nhlbiwgs.org/topmed-banner-authorship.
Population- and Family-based Cohorts
The Atherosclerosis Risk in Communities Study
SJL is supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences (ZO1 ES043012). The Genome Sequencing Program (GSP) was funded by the National Human Genome Research Institute (NHGRI), the National Heart, Lung, and Blood Institute (NHLBI), and the National Eye Institute (NEI). The GSP Coordinating Center (U24 HG008956) contributed to cross-program scientific initiatives and provided logistical and general study coordination. The Centers for Common Disease Genomics (CCDG) program was supported by NHGRI and NHLBI, and whole genome sequencing was performed at the Baylor College of Medicine Human Genome Sequencing Center (UM1 HG008898 and R01HL059367). The Atherosclerosis Risk in Communities study has been funded in whole or in part with Federal funds from the National Heart, Lung, and Blood Institute, National Institutes of Health, Department of Health and Human Services, under Contract nos. (75N92022D00001, 75N92022D00002, 75N92022D00003, 75N92022D00004, 75N92022D00005). The authors thank the staff and participants of the ARIC study for their important contributions.
Coronary Artery Risk Development in Young Adults
The Coronary Artery Risk Development in Young Adults study is conducted at the University of Alabama at Birmingham (HHSN268201800005I & HHSN268201800007I), Northwestern University (HHSN268201800003I), University of Minnesota (HHSN268201800006I), and Kaiser Foundation Research Institute (HHSN268201800004I). CARDIA was also partially supported by the Intramural Research Program of the National Institute on Aging (NIA) and an intra‐agency agreement between NIA and NHLBI (AG0005).
The Cardiovascular Health Study
This Cardiovascular Health Study (CHS) research was supported by NHLBI contracts 75N92021D00006, HHSN268201200036C, HHSN268200800007C, HHSN268200960009C, HHSN268201800001C, N01HC55222, N01HC85079, N01HC85080, N01HC85081, N01HC85082, N01HC85083, N01HC85086; and NHLBI grants U01HL080295, U01HL130114, R01HL087652, R01HL105756, R01HL103612, R01HL085251, and R01HL120393 with additional contribution from the National Institute of Neurological Disorders and Stroke (NINDS). Additional support was provided through R01AG023629 from the National Institute on Aging (NIA). A full list of principal CHS investigators and institutions can be found at CHS-NHLBI.org.
The Cleveland Family Study
The Cleveland Family Study and SR were supported by NIH grants HL 046389, HL113338, and 1R35HL135818. BC is supported by the NIH grant K01 HL135405 and an American Thoracic Society Foundation Unrestricted Grant (Sleep) (http://foundation.thoracic.org).
The Framingham Heart Study
The Framingham Heart Study (FHS) acknowledges the support of contracts NO1-HC-25195, HHSN268201500001I and 75N92019D00031 from the National Heart, Lung and Blood Institute and grant supplement R01 HL092577-06S1 for this research. We also acknowledge the dedication of the FHS study participants without whom this research would not be possible. Dr. Vasan is supported in part by the Evans Medical Foundation and the Jay and Louis Coffman Endowment from the Department of Medicine, Boston University School of Medicine.
The Jackson Heart Study
The Jackson Heart Study (JHS) is supported and conducted in collaboration with Jackson State University (HHSN268201800013I), Tougaloo College (HHSN268201800014I), the Mississippi State Department of Health (HHSN268201800015I) and the University of Mississippi Medical Center (HHSN268201800010I, HHSN268201800011I and HHSN268201800012I) contracts from the National Heart, Lung, and Blood Institute (NHLBI) and the National Institute on Minority Health and Health Disparities (NIMHD). The authors also wish to thank the staffs and participants of the JHS.
The Multi-Ethnic Study of Atherosclerosis
The Multi-Ethnic Study of Atherosclerosis (MESA) projects are conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with MESA investigators. The MESA Lung Study is supported by R01-HL077612 and R01-HL093081. Support for MESA is provided by contracts 75N92020D00001, HHSN268201500003I, N01-HC-95159, 75N92020D00005, N01-HC-95160, 75N92020D00002, N01-HC-95161, 75N92020D00003, N01-HC-95162, 75N92020D00006, N01-HC-95163, 75N92020D00004, N01-HC-95164, 75N92020D00007, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, UL1-TR-000040, UL1-TR-001079, UL1-TR-001420, UL1TR001881, DK063491, and R01HL105756. The authors thank the other investigators, the staff, and the participants of the MESA study for their valuable contributions. A full list of participating MESA investigators and institutes can be found at http://www.mesa-nhlbi.org.
The Hispanic Community Health Study/Study of Latinos
The authors thank the staff and participants of HCHS/SOL for their important contributions. The Hispanic Community Health Study/Study of Latinos is a collaborative study supported by contracts from the National Heart, Lung, and Blood Institute (NHLBI) to the University of North Carolina (HHSN268201300001I/N01-HC-65233), University of Miami (HHSN268201300004I/N01-HC-65234), Albert Einstein College of Medicine (HHSN268201300002I/N01-HC-65235), University of Illinois at Chicago (HHSN268201300003I/N01-HC-65236 Northwestern Univ), and San Diego State University (HHSN268201300005I/N01-HC-65237). The following Institutes/Centers/Offices have contributed to the HCHS/SOL through a transfer of funds to the NHLBI: National Institute on Minority Health and Health Disparities, National Institute on Deafness and Other Communication Disorders, National Institute of Dental and Craniofacial Research, National Institute of Diabetes and Digestive and Kidney Diseases, National Institute of Neurological Disorders and Stroke, NIH Institution-Office of Dietary Supplements.
The UK Biobank
This research has been conducted using the UK Biobank Resource under application 20915.
COPD-enriched studies
Boston Early-Onset COPD Study
The Boston Early-Onset COPD Study (dbGaP accession number phs000946) was supported by the following NIH grants: R01 HL075478, U01 HL089856, and R01 HL113264.
Genetic Epidemiology of COPD (COPDGene)
The project described was supported by Award Number U01 HL089897 and Award Number U01 HL089856 from the National Heart, Lung, and Blood Institute. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Heart, Lung, and Blood Institute or the National Institutes of Health.
COPD Foundation Funding
The COPDGene® project is also supported by the COPD Foundation through contributions made to an Industry Advisory Board comprised of AstraZeneca, Boehringer Ingelheim, GlaxoSmithKline, Novartis, Pfizer, Siemens and Sunovion.
COPDGene® Investigators – Core Units
Administrative Center: James D. Crapo, MD (PI); Edwin K. Silverman, MD, PhD (PI); Barry J. Make, MD; Elizabeth A. Regan, MD, PhD.
Genetic Analysis Center: Terri Beaty, PhD; Ferdouse Begum, PhD; Peter J. Castaldi, MD, MSc; Michael Cho, MD; Dawn L. DeMeo, MD, MPH; Adel R. Boueiz, MD; Marilyn G. Foreman, MD, MS; Eitan Halper-Stromberg; Lystra P. Hayden, MD, MMSc; Craig P. Hersh, MD, MPH; Jacqueline Hetmanski, MS, MPH; Brian D. Hobbs, MD; John E. Hokanson, MPH, PhD; Nan Laird, PhD; Christoph Lange, PhD; Sharon M. Lutz, PhD; Merry-Lynn McDonald, PhD; Margaret M. Parker, PhD; Dandi Qiao, PhD; Elizabeth A. Regan, MD, PhD; Edwin K. Silverman, MD, PhD; Emily S. Wan, MD; Sungho Won, Ph.D.; Phuwanat Sakornsakolpat, M.D.; Dmitry Prokopenko, Ph.D.
Imaging Center: Mustafa Al Qaisi, MD; Harvey O. Coxson, PhD; Teresa Gray; MeiLan K. Han, MD, MS; Eric A. Hoffman, PhD; Stephen Humphries, PhD; Francine L. Jacobson, MD, MPH; Philip F. Judy, PhD; Ella A. Kazerooni, MD; Alex Kluiber; David A. Lynch, MB; John D. Newell, Jr., MD; Elizabeth A. Regan, MD, PhD; James C. Ross, PhD; Raul San Jose Estepar, PhD; Joyce Schroeder, MD; Jered Sieren; Douglas Stinson; Berend C. Stoel, PhD; Juerg Tschirren, PhD; Edwin Van Beek, MD, PhD; Bram van Ginneken, PhD; Eva van Rikxoort, PhD; George Washko, MD; Carla G. Wilson, MS.
PFT QA Center, Salt Lake City, UT: Robert Jensen, PhD.
Data Coordinating Center and Biostatistics, National Jewish Health, Denver, CO: Douglas Everett, PhD; Jim Crooks, PhD; Camille Moore, PhD; Matt Strand, PhD; Carla G. Wilson, MS.
Epidemiology Core, University of Colorado Anschutz Medical Campus, Aurora, CO: John E. Hokanson, MPH, PhD; John Hughes, PhD; Gregory Kinney, MPH, PhD; Sharon M. Lutz, PhD; Katherine Pratte, MSPH; Kendra A. Young, PhD.
Mortality Adjudication Core: Surya Bhatt, MD; Jessica Bon, MD; MeiLan K. Han, MD, MS; Barry Make, MD; Carlos Martinez, MD, MS; Susan Murray, ScD; Elizabeth Regan, MD; Xavier Soler, MD; Carla G. Wilson, MS.
Biomarker Core: Russell P. Bowler, MD, PhD; Katerina Kechris, PhD; Farnoush Banaei-Kashani, Ph.D.
COPDGene® Investigators – Clinical Centers
Ann Arbor VA: Jeffrey L. Curtis, MD; Carlos H. Martinez, MD, MPH; Perry G. Pernicano, MD.
Baylor College of Medicine, Houston, TX: Nicola Hanania, MD, MS; Philip Alapat, MD; Mustafa Atik, MD; Venkata Bandi, MD; Aladin Boriek, PhD; Kalpatha Guntupalli, MD; Elizabeth Guy, MD; Arun Nachiappan, MD; Amit Parulekar, MD.
Brigham and Women’s Hospital, Boston, MA: Dawn L. DeMeo, MD, MPH; Craig Hersh, MD, MPH; Francine L. Jacobson, MD, MPH; George Washko, MD.
Columbia University, New York, NY: R. Graham Barr, MD, DrPH; John Austin, MD; Belinda D’Souza, MD; Gregory D.N. Pearson, MD; Anna Rozenshtein, MD, MPH, FACR; Byron Thomashow, MD.
Duke University Medical Center, Durham, NC: Neil MacIntyre, Jr., MD; H. Page McAdams, MD; Lacey Washington, MD.
HealthPartners Research Institute, Minneapolis, MN: Charlene McEvoy, MD, MPH; Joseph Tashjian, MD.
Johns Hopkins University, Baltimore, MD: Robert Wise, MD; Robert Brown, MD; Nadia N. Hansel, MD, MPH; Karen Horton, MD; Allison Lambert, MD, MHS; Nirupama Putcha, MD, MHS.
Los Angeles Biomedical Research Institute at Harbor UCLA Medical Center, Torrance, CA: Richard Casaburi, PhD, MD; Alessandra Adami, PhD; Matthew Budoff, MD; Hans Fischer, MD; Janos Porszasz, MD, PhD; Harry Rossiter, PhD; William Stringer, MD.
Michael E. DeBakey VAMC, Houston, TX: Amir Sharafkhaneh, MD, PhD; Charlie Lan, DO.
Minneapolis VA: Christine Wendt, MD; Brian Bell, MD.
Morehouse School of Medicine, Atlanta, GA: Marilyn G. Foreman, MD, MS; Eugene Berkowitz, MD, PhD; Gloria Westney, MD, MS.
National Jewish Health, Denver, CO: Russell Bowler, MD, PhD; David A. Lynch, MB.
Reliant Medical Group, Worcester, MA: Richard Rosiello, MD; David Pace, MD.
Temple University, Philadelphia, PA: Gerard Criner, MD; David Ciccolella, MD; Francis Cordova, MD; Chandra Dass, MD; Gilbert D’Alonzo, DO; Parag Desai, MD; Michael Jacobs, PharmD; Steven Kelsen, MD, PhD; Victor Kim, MD; A. James Mamary, MD; Nathaniel Marchetti, DO; Aditi Satti, MD; Kartik Shenoy, MD; Robert M. Steiner, MD; Alex Swift, MD; Irene Swift, MD; Maria Elena Vega-Sanchez, MD.
University of Alabama, Birmingham, AL: Mark Dransfield, MD; William Bailey, MD; Surya Bhatt, MD; Anand Iyer, MD; Hrudaya Nath, MD; J. Michael Wells, MD.
University of California, San Diego, CA: Joe Ramsdell, MD; Paul Friedman, MD; Xavier Soler, MD, PhD; Andrew Yen, MD.
University of Iowa, Iowa City, IA: Alejandro P. Comellas, MD; Karin F. Hoth, PhD; John Newell, Jr., MD; Brad Thompson, MD.
University of Michigan, Ann Arbor, MI: MeiLan K. Han, MD, MS; Ella Kazerooni, MD; Carlos H. Martinez, MD, MPH.
University of Minnesota, Minneapolis, MN: Joanne Billings, MD; Abbie Begnaud, MD; Tadashi Allen, MD.
University of Pittsburgh, Pittsburgh, PA: Frank Sciurba, MD; Jessica Bon, MD; Divay Chandra, MD, MSc; Carl Fuhrman, MD; Joel Weissfeld, MD, MPH.
University of Texas Health Science Center at San Antonio, San Antonio, TX: Antonio Anzueto, MD; Sandra Adams, MD; Diego Maselli-Caceres, MD; Mario E. Ruiz, MD.
The ECLIPSE study (NCT00292552; GSK code SCO104960) was funded by GlaxoSmithKline.
ECLIPSE Investigators — Bulgaria: Y. Ivanov, Pleven; K. Kostov, Sofia. Canada: J. Bourbeau, Montreal; M. Fitzgerald, Vancouver, BC; P. Hernandez, Halifax, NS; K. Killian, Hamilton, ON; R. Levy, Vancouver, BC; F. Maltais, Montreal; D. O'Donnell, Kingston, ON. Czech Republic: J. Krepelka, Prague. Denmark: J. Vestbo, Hvidovre. The Netherlands: E. Wouters, Horn-Maastricht. New Zealand: D. Quinn, Wellington. Norway: P. Bakke, Bergen. Slovenia: M. Kosnik, Golnik. Spain: A. Agusti, J. Sauleda, P. de Mallorca. Ukraine: Y. Feschenko, V. Gavrisyuk, L. Yashina, Kiev; N. Monogarova, Donetsk. United Kingdom: P. Calverley, Liverpool; D. Lomas, Cambridge; W. MacNee, Edinburgh; D. Singh, Manchester; J. Wedzicha, London. United States: A. Anzueto, San Antonio, TX; S. Braman, Providence, RI; R. Casaburi, Torrance CA; B. Celli, Boston; G. Giessel, Richmond, VA; M. Gotfried, Phoenix, AZ; G. Greenwald, Rancho Mirage, CA; N. Hanania, Houston; D. Mahler, Lebanon, NH; B. Make, Denver; S. Rennard, Omaha, NE; C. Rochester, New Haven, CT; P. Scanlon, Rochester, MN; D. Schuller, Omaha, NE; F. Sciurba, Pittsburgh; A. Sharafkhaneh, Houston; T. Siler, St. Charles, MO; E. Silverman, Boston; A. Wanner, Miami; R. Wise, Baltimore; R. ZuWallack, Hartford, CT. ECLIPSE Steering Committee: H. Coxson (Canada), C. Crim (GlaxoSmithKline, USA), L. Edwards (GlaxoSmithKline, USA), D. Lomas (UK), W. MacNee (UK), E. Silverman (USA), R. Tal Singer (Co-chair, GlaxoSmithKline, USA), J. Vestbo (Co-chair, Denmark), J. Yates (GlaxoSmithKline, USA). ECLIPSE Scientific Committee: A. Agusti (Spain), P. Calverley (UK), B. Celli (USA), C. Crim (GlaxoSmithKline, USA), B. Miller (GlaxoSmithKline, USA), W. MacNee (Chair, UK), S. Rennard (USA), R. Tal-Singer (GlaxoSmithKline, USA), E. Wouters (The Netherlands), J. Yates (GlaxoSmithKline, USA).
RNA-Seq eQTL
Freeze 1RNA TOPMed cis-eQTL results were generated in a collaboration between the TOPMed Informatics Research Center, TOPMed Multi-Omics working group, and the TOPMed parent studies contributing RNA-seq and distributed to TOPMed investigators. We acknowledge the contributing cohorts, sequencing centers, and the TOPMed IRC.
Peer review information
Tim Sands was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team. The peer-review history is available in the online version of this article.
Authors’ contributions
W.K., X.H., K.K., A.M. and M.H.C. designed the study. W.K., X.H., K.K. and D.Z. analyzed data. W.K., X.H., K.K. and S.C. provided statistical support. W.K., X.H., K.K., D.Z., A.M. and M.H.C. wrote the manuscript. S.J.L., B.Y., L.R.L., A.C.M., N.G., S.B.G., R.K., L.J.R., L.H., M.F., B.C.C., S.R., S.A.G., B.M.P., T.M.B., C.M.S., I.R., M.H.C., E.K.S., H.X., G.T.O., M.L.D., R.K., T.S., L.M.R., A.P.C., S.K., F.A., T.L., S.S.R., J.I.R., P.P.B., E.C.O., R.G.B., A.M., X.H., D.M.M., G.A.M., H.D., S.D., P.O., J.D.S., S.K.D., L.A., C.F. and D.Z. acquired or interpreted data. All authors contributed to critical revision of the manuscript. All authors, the TOPMed Consortium and the TOPMed Lung Working Group provided administrative, technical or material support.
Funding
This research was supported by NIH/NHLBI R01 HL153248 (A.M. and M.H.C.); R01 HL135142, R01 HL137927, R01 HL089856, and R01 HL147148 (M.H.C.); and R01 HL131565 (A.M.). Whole genome sequencing (WGS) for the Trans-Omics for Precision Medicine (TOPMed) program was supported by the National Heart, Lung and Blood Institute (NHLBI). WGS for “NHLBI TOPMed: Atherosclerosis Risk in Communities (ARIC)” (phs001211) was performed at the Baylor College of Medicine Human Genome Sequencing Center (HHSN268201500015C and 3U54HG003273-12S2) and the Broad Institute of MIT and Harvard (3R01HL092577-06S1). WGS for “NHLBI TOPMed: Coronary Artery Risk Development in Young Adults (CARDIA)” (phs001612) was performed at the Baylor College of Medicine Human Genome Sequencing Center (HHSN268201600033I). WGS for “NHLBI TOPMed: The Cleveland Family Study” (phs000954) was performed at the University of Washington Northwest Genomics Center (3R01HL098433-05S1 and HHSN268201600032I). WGS for “NHLBI TOPMed: Cardiovascular Health Study (CHS)” (phs001368) was performed at the Baylor College of Medicine Human Genome Sequencing Center (HHSN268201500015C, 3U54HG003273-12S2 and HHSN268201600033I). WGS for “NHLBI TOPMed: Whole Genome Sequencing and Related Phenotypes in the Framingham Heart Study” (phs000974) was performed at the Broad Institute of MIT and Harvard (3U54HG003067-12S2 and 3R01HL092577-06S1). WGS for “NHLBI TOPMed—NHGRI CCDG: Hispanic Community Health Study/Study of Latinos (HCHS/SOL)” (phs001395) was performed at the Baylor College of Medicine Human Genome Sequencing Center (HHSN268201600033I). WGS for “NHLBI TOPMed: The Jackson Heart Study” (phs000964) was performed at the University of Washington Northwest Genomics Center (HHSN268201100037C). WGS for “NHLBI TOPMed: Multi-Ethnic Study of Atherosclerosis (MESA)” (phs001416) was performed at the Broad Institute of MIT and Harvard (3U54HG003067-13S1 and HHSN268201500014C).
WGS for “NHLBI TOPMed: Genetic Epidemiology of COPD (COPDGene) in the TOPMed Program” (phs000951) was performed at the University of Washington Northwest Genomics Center (3R01 HL089856-08S1) and the Broad Institute of MIT and Harvard (HHSN268201500014C). WGS for “NHLBI TOPMed: Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE)” (phs001472) was performed at the McDonnell Genome Institute (HHSN268201600037I). WGS for “NHLBI TOPMed: Boston Early-Onset COPD Study in the TOPMed Program” (phs000946) was performed at the University of Washington Northwest Genomics Center (3R01HL089856-08S1). WGS for “NHLBI TOPMed: Lung Tissue Research Consortium (LTRC)” (phs001662) was performed at the Broad Institute of MIT and Harvard (HHSN268201600034I). RNA-seq for “NHLBI TOPMed: LTRC” (phs001662) and “NHLBI TOPMed: Multi-Ethnic Study of Atherosclerosis (MESA)” (phs001416) was performed at the Broad Institute of MIT and Harvard (HHSN268201600032I and HHSN268201600034I, respectively). Centralized read mapping and genotype calling, along with variant quality metrics and filtering, were provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1; contract HHSN268201800002I). Phenotype harmonization, data management, sample-identity QC, and general study coordination were provided by the TOPMed Data Coordinating Center (3R01HL-120393-02S1, contract HHSN268201800001I), and TOPMed MESA Multi-Omics (HHSN2682015000031/HSN26800004). Phenotype harmonization for pulmonary traits was contributed by the NHLBI Pooled Cohorts Study with funding from NIH/NHLBI R21 HL121457, R21 HL129924, K23 HL130627, R01 HL077612. Study-specific acknowledgments are presented in Additional file 1. We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed.
The TOPMed Data Coordinating Center, supported by NHLBI contract HHSN26800001, established the dbGaP accession (phs001974) for sharing TOPMed Genomic Summary Results (GSR) and has curated submitted data sets. The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Heart, Lung, and Blood Institute; the National Institutes of Health; or the U.S. Department of Health and Human Services. A full list of investigators for the NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium is provided at https://topmed.nhlbi.nih.gov/topmed-banner-authorship.
Data availability
All data, including whole genome sequencing, phenotypes, and association study results, are available through dbGaP under the relevant accession numbers (see Additional file 1 and https://topmed.nhlbi.nih.gov/topmed-data-access-scientific-community). All datasets are access-controlled, and access requires approval by the relevant Data Access Committee(s) (DAC) through the dbGaP system. General information and instructions for applying can be found at https://grants.nih.gov/policy-and-compliance/policy-topics/sharing-policies/accessing-data/dbgap.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Wonji Kim, Xiaowei Hu and Kangjin Kim contributed equally to this work.
Ani Manichaikul and Michael H. Cho jointly supervised this work.
Contributor Information
Ani Manichaikul, Email: amanicha@virginia.edu.
Michael H. Cho, Email: remhc@channing.harvard.edu.
References
- 1.Kochanek KD, Murphy SL, Xu J, Arias E. Mortality in the United States, 2013. NCHS Data Brief. 2014;178:1–8. [PubMed] [Google Scholar]
- 2.Baughman P, Marott JL, Lange P, Martin CJ, Shankar A, Petsonk EL, et al. Combined effect of lung function level and decline increases morbidity and mortality risks. Eur J Epidemiol. 2012;27(12):933–43. [DOI] [PubMed] [Google Scholar]
- 3.Beaty TH, Cohen BH, Newill CA, Menkes HA, Diamond EL, Chen CJ. Impaired pulmonary function as a risk factor for mortality. Am J Epidemiol. 1982;116(1):102–13. [DOI] [PubMed] [Google Scholar]
- 4.Friedman GD, Klatsky AL, Siegelaub AB. Lung function and risk of myocardial infarction and sudden cardiac death. N Engl J Med. 1976;294(20):1071–5. [DOI] [PubMed] [Google Scholar]
- 5.Kim W, Hecker J, Barr RG, Boerwinkle E, Cade B, Correa A, et al. Assessing the contribution of rare genetic variants to phenotypes of chronic obstructive pulmonary disease using whole-genome sequence data. Hum Mol Genet. 2020; [DOI] [PMC free article] [PubMed]
- 6.Zhou JJ, Cho MH, Castaldi PJ, Hersh CP, Silverman EK, Laird NM. Heritability of chronic obstructive pulmonary disease and related phenotypes in smokers. Am J Respir Crit Care Med. 2013;188(8):941–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Shrine N, Izquierdo AG, Chen J, Packer R, Hall RJ, Guyatt AL, et al. Multi-ancestry genome-wide association analyses improve resolution of genes and pathways influencing lung function and chronic obstructive pulmonary disease risk. Nat Genet. 2023;55(3):410–22. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Zhao X, Qiao D, Yang C, Kasela S, Kim W, Ma Y, et al. Whole genome sequence analysis of pulmonary function and COPD in 19,996 multi-ethnic participants. Nat Commun. 2020;11(1):5182. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Weiner DJ, Nadig A, Jagadeesh KA, Dey KK, Neale BM, Robinson EB, et al. Polygenic architecture of rare coding variation across 394,783 exomes. Nature. 2023;614(7948):492–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Pulit SL, de With SAJ, de Bakker PIW. Resetting the bar: statistical significance in whole-genome sequencing-based association studies of global populations. Genet Epidemiol. 2017;41(2):145–51. [DOI] [PubMed] [Google Scholar]
- 11.Wain LV, Shrine N, Miller S, Jackson VE, Ntalla I, Soler Artigas M, et al. Novel insights into the genetics of smoking behaviour, lung function, and chronic obstructive pulmonary disease (UK BiLEVE): a genetic association study in UK Biobank. Lancet Respir Med. 2015;3(10):769–81. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Chen H, Huffman JE, Brody JA, Wang C, Lee S, Li Z, et al. Efficient variant set mixed model association tests for continuous and binary traits in large-scale whole-genome sequencing studies. Am J Hum Genet. 2019;104(2):260–74. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Wang Q, Dhindsa RS, Carss K, Harper AR, Nag A, Tachmazidou I, et al. Rare variant contribution to human disease in 281,104 UK Biobank exomes. Nature. 2021;597(7877):527–32. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Karczewski KJ, Solomonson M, Chao KR, Goodrich JK, Tiao G, Lu W, et al. Systematic single-variant and gene-based association testing of thousands of phenotypes in 394,841 UK Biobank exomes. Cell Genomics. 2022 Sept 14;2(9):100168. [DOI] [PMC free article] [PubMed]
- 15.Sikkema L, Ramírez-Suástegui C, Strobl DC, Gillett TE, Zappia L, Madissoon E, et al. An integrated cell atlas of the lung in health and disease. Nat Med. 2023;29(6):1563–77. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Degenhardt F, Ellinghaus D, Juzenas S, Lerga-Jaso J, Wendorff M, Maya-Miles D, et al. Detailed stratified GWAS analysis for severe COVID-19 in four European populations. Hum Mol Genet. 2022;31(23):3945–66. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Stefansson H, Helgason A, Thorleifsson G, Steinthorsdottir V, Masson G, Barnard J, et al. A common inversion under selection in Europeans. Nat Genet. 2005;37(2):129–37. [DOI] [PubMed] [Google Scholar]
- 18.Wheeler MM, Stilp AM, Rao S, Halldórsson BV, Beyter D, Wen J, et al. Whole genome sequencing identifies structural variants contributing to hematologic traits in the NHLBI TOPMed program. Nat Commun. 2022;13(1):7592. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Xiong X, Liu Y, Mei Y, Peng J, Wang Z, Kong B, et al. Novel protective role of myeloid differentiation 1 in pathological cardiac remodelling. Sci Rep. 2017;7(1):41857. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Su S, Zhu H, Xu X, Wang X, Dong Y, Kapuku G, et al. DNA methylation of the LY86 gene is associated with obesity, insulin resistance, and inflammation. Twin Res Hum Genet. 2014;17(3):183–91. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Watza D, Lusk CM, Dyson G, Purrington KS, Wenzlaff AS, Neslund-Dudas C, et al. COPD-dependent effects of genetic variation in key inflammation pathway genes on lung cancer risk. Int J Cancer. 2020;147(3):747–56. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Li X, Meng Y. Survival analysis of immune-related lncRNA in low-grade glioma. BMC Cancer. 2019;19(1):813. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Shaykhiev R, Otaki F, Bonsu P, Dang DT, Teater M, Strulovici-Barel Y, et al. Cigarette smoking reprograms apical junctional complex molecular architecture in the human airway epithelium in vivo. Cell Mol Life Sci. 2011. 10.1007/s00018-010-0500-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Wörthmüller J, Rüegg C. MAGI1, a scaffold protein with tumor suppressive and vascular functions. Cells. 2021;10(6):1494. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Shrine N, Guyatt AL, Erzurumluoglu AM, Jackson VE, Hobbs BD, Melbourne CA, et al. New genetic signals for lung function highlight pathways and chronic obstructive pulmonary disease associations across multiple ancestries. Nat Genet. 2019;51(3):481–93. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Brandsma CA, van den Berge M, Postma DS, Jonker MR, Brouwer S, Paré PD, et al. A large lung gene expression study identifying fibulin-5 as a novel player in tissue repair in COPD. Thorax. 2015;70(1):21–32. [DOI] [PubMed] [Google Scholar]
- 27.Artigas MS, Wain LV, Miller S, Kheirallah AK, Huffman JE, Ntalla I, et al. Sixteen new lung function signals identified through 1000 genomes project reference panel imputation. Nat Commun. 2015;6(1):8658. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Hucthagowder V, Sausgruber N, Kim KH, Angle B, Marmorstein LY, Urban Z. Fibulin-4: a novel gene for an autosomal recessive cutis laxa syndrome. Am J Hum Genet. 2006;78(6):1075–80. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Loeys B, Van Maldergem L, Mortier G, Coucke P, Gerniers S, Naeyaert JM, et al. Homozygosity for a missense mutation in fibulin-5 (FBLN5) results in a severe form of cutis laxa. Hum Mol Genet. 2002 Sept 1;11(18):2113–8. [DOI] [PubMed]
- 30.Eraslan G, Drokhlyansky E, Anand S, Fiskin E, Subramanian A, Slyper M, et al. Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function. Science. 2022 May 13;376(6594):eabl4290. [DOI] [PMC free article] [PubMed]
- 31.Sakornsakolpat P, Prokopenko D, Lamontagne M, Reeve NF, Guyatt AL, Jackson VE, et al. Genetic landscape of chronic obstructive pulmonary disease identifies heterogeneous cell-type and phenotype associations. Nat Genet. 2019;51(3):494–505. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Saferali A, Xu Z, Sheynkman GM, Hersh CP, Cho MH, Silverman EK, et al. Characterization of a COPD-Associated NPNT Functional Splicing Genetic Variant in Human Lung Tissue via Long-Read Sequencing. MedRxiv Prepr Serv Health Sci. 2020 Nov 3;2020.10.20.20203927. [DOI] [PMC free article] [PubMed]
- 33.Wang Q, Dhindsa RS, Carss K, Harper A, Nag A, Tachmazidou I, et al. Surveying the contribution of rare variants to the genetic architecture of human disease through exome sequencing of 177,882 UK Biobank participants. bioRxiv; 2020. p. 2020.12.13.422582. Available from: https://www.biorxiv.org/content/10.1101/2020.12.13.422582v1 . Cited 2023 Sept 1.
- 34.Robertson CC, Inshaw JRJ, Onengut-Gumuscu S, Chen WM, Santa Cruz DF, Yang H, et al. Fine-mapping, trans-ancestral and genomic analyses identify causal variants, cells, genes and drug targets for type 1 diabetes. Nat Genet. 2021;53(7):962–71. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Mahajan A, Spracklen CN, Zhang W, Ng MCY, Petty LE, Kitajima H, et al. Multi-ancestry genetic study of type 2 diabetes highlights the power of diverse populations for discovery and translation. Nat Genet. 2022;54(5):560–72. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.DiLillo KM, Norman KC, Freeman CM, Christenson SA, Alexis NE, Anderson WH, et al. A blood and bronchoalveolar lavage protein signature of rapid FEV1 decline in smoking-associated COPD. Sci Rep. 2023;13(1):8228. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Norman PJ, Norberg SJ, Guethlein LA, Nemat-Gorgani N, Royce T, Wroblewski EE, et al. Sequences of 95 human MHC haplotypes reveal extreme coding variation in genes other than highly polymorphic HLA class I and II. Genome Res. 2017;27(5):813–23. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Aghapour M, Raee P, Moghaddam SJ, Hiemstra PS, Heijink IH. Airway epithelial barrier dysfunction in chronic obstructive pulmonary disease: role of cigarette smoke exposure. Am J Respir Cell Mol Biol. 2018;58(2):157–69. [DOI] [PubMed] [Google Scholar]
- 39.Shaykhiev R, Crystal RG. Early events in the pathogenesis of chronic obstructive pulmonary disease. Smoking-induced reprogramming of airway epithelial basal progenitor cells. Ann Am Thorac Soc. 2014 Dec;11 Suppl 5(Suppl 5):S252–258. [DOI] [PMC free article] [PubMed]
- 40.Gohy ST, Hupin C, Fregimilicka C, Detry BR, Bouzin C, Gaide Chevronay H, et al. Imprinting of the COPD airway epithelium for dedifferentiation and mesenchymal transition. Eur Respir J. 2015;45(5):1258–72. [DOI] [PubMed] [Google Scholar]
- 41.Walters EH, Shukla SD, Mahmood MQ, Ward C. Fully integrating pathophysiological insights in COPD: an updated working disease model to broaden therapeutic vision. Eur Respir Rev Off J Eur Respir Soc. 2021;30(160):200364. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Prieto-Centurion V, Rolle AJ, Au DH, Carson SS, Henderson AG, Lee TA, et al. Multicenter study comparing case definitions used to identify patients with chronic obstructive pulmonary disease. Am J Respir Crit Care Med. 2014;190(9):989–95. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Joo J, Himes B. Gene-Based Analysis Reveals Sex-Specific Genetic Risk Factors of COPD. AMIA Annu Symp Proc. 2022;21(2021):601–10. [PMC free article] [PubMed] [Google Scholar]
- 44.ARIC Study Group. Atherosclerosis Risk in Communities (ARIC) Study. National Institutes of Health (NIH), dbGaP; 2010. Available from: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000280.v8.p2
- 45.CARDIA Study Group. Coronary Artery Risk Development in Young Adults (CARDIA) Study - Cohort. National Institutes of Health (NIH), dbGaP; 2010. Available from: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000285.v3.p2
- 46.CFS Study Group. NHLBI Cleveland Family Study (CFS) Candidate Gene Association Resource (CARe). National Institutes of Health (NIH), dbGaP; 2011. Available from: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000284.v2.p1
- 47.CHS Study Group. Cardiovascular Health Study (CHS) Cohort: an NHLBI-funded observational study of risk factors for cardiovascular disease in adults 65 years or older. National Institutes of Health (NIH), dbGaP; 2011. Available from: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000287.v7.p1
- 48.FHS Study Group. Framingham Cohort. National Institutes of Health (NIH), dbGaP; 2007. Available from: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000007.v35.p16
- 49.HCHS/SOL Study Group. Hispanic Community Health Study /Study of Latinos (HCHS/SOL). National Institutes of Health (NIH), dbGaP; 2015. Available from: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000810.v2.p2
- 50.JHS Study Group. Jackson Heart Study (JHS) Cohort. National Institutes of Health (NIH), dbGaP; 2011. Available from: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000286.v7.p2
- 51.MESA Study Group. Multi-Ethnic Study of Atherosclerosis (MESA) SHARe. National Institutes of Health (NIH), dbGaP; 2010. Available from: Multi-Ethnic Study of Atherosclerosis (MESA) SHARe
- 52.COPDGene Study Group. Genetic Epidemiology of COPD (COPDGene) Funded by the National Heart, Lung, and Blood Institute. National Institutes of Health (NIH), dbGaP; 2011. Available from: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000179.v6.p2
- 53.ECLIPSE Study Group. Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE). National Institutes of Health (NIH), dbGaP; 2017. Available from: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001252.v1.p1
- 54.EOCOPD Study Group. NHLBI TOPMed: Boston Early-Onset COPD Study. National Institutes of Health (NIH), dbGaP; 2014. Available from: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000946.v6.p2
- 55.LTRC Study Group. NHLBI TOPMed: Lung Tissue Research Consortium (LTRC). National Institutes of Health (NIH), dbGaP; 2021. Available from: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001662.v4.p2
- 56.Oelsner EC, Balte PP, Cassano PA, Couper D, Enright PL, Folsom AR, et al. Harmonization of respiratory data from 9 US population-based cohorts: the NHLBI Pooled Cohorts Study. Am J Epidemiol. 2018;187(11):2265–78. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Hankinson JL, Odencrantz JR, Fedan KB. Spirometric reference values from a sample of the general U.S. population. Am J Respir Crit Care Med. 1999;159(1):179–87. [DOI] [PubMed] [Google Scholar]
- 58.Schaid DJ, Chen W, Larson NB. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat Rev Genet. 2018;19(8):491–504. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Wang G, Sarkar A, Carbonetto P, Stephens M. A simple new approach to variable selection in regression, with application to genetic fine mapping. Journal of the Royal Statistical Society Series B: Statistical Methodology. 2020;82(5):1273–300. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Zou Y, Carbonetto P, Wang G, Stephens M. Fine-mapping from summary data with the “sum of single effects” model. bioRxiv. 2021. [DOI] [PMC free article] [PubMed]
- 61.Taylor-Weiner A, Aguet F, Haradhvala NJ, Gosai S, Anand S, Kim J, et al. Scaling computational genomics to millions of individuals with GPUs. Genome Biol. 2019;20(1):1–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Kasela S, Aguet F, Kim-Hellmuth S, Brown BC, Nachun DC, Tracy RP, et al. Interaction molecular QTL mapping discovers cellular and environmental modifiers of genetic regulatory effects. BioRxiv Prepr Serv Biol. 2023 June 29;2023.06.26.546528. [DOI] [PMC free article] [PubMed]
- 63.Schubert R, Geoffroy E, Gregga I, Mulford AJ, Aguet F, Ardlie K, et al. Protein prediction for trait mapping in diverse populations. PLoS One. 2022;17(2):e0264341. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Stegle O, Parts L, Durbin R, Winn J. A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. PLoS Comput Biol. 2010;6(5):e1000770. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020 Sept 11;369(6509):1318–30. [DOI] [PMC free article] [PubMed]
- 66.Wallace C. A more accurate method for colocalisation analysis allowing for multiple causal variants. PLoS Genet. 2021;17(9):e1009440. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Liu B, Gloudemans MJ, Rao AS, Ingelsson E, Montgomery SB. Abundant associations with gene expression complicate GWAS follow-up. Nat Genet. 2019;51(5):768–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Adams TS, Schupp JC, Poli S, Ayaub EA, Neumark N, Ahangari F, et al. Single-cell RNA-seq reveals ectopic and aberrant lung-resident cell populations in idiopathic pulmonary fibrosis. Sci Adv. 2020;6(28):eaba1983. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Hao Y, Hao S, Andersen-Nissen E, Mauck WM, Zheng S, Butler A, et al. Integrated analysis of multimodal single-cell data. Cell. 2021;184(13):3573-3587.e29. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70.Wright JD, Folsom AR, Coresh J, Sharrett AR, Couper D, Wagenknecht LE, et al. The ARIC (Atherosclerosis Risk In Communities) Study: JACC Focus Seminar 3/8. J Am Coll Cardiol. 2021;77(23):2939–59. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Mirabelli MC, Preisser JS, Loehr LR, Agarwal SK, Barr RG, Couper DJ, et al. Lung function decline over 25 years of follow-up among black and white adults in the ARIC study cohort. Respir Med. 2016;113:57–64. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72.Hughes GH, Cutter G, Donahue R, Friedman GD, Hulley S, Hunkeler E, et al. Recruitment in the Coronary Artery Disease Risk Development in Young Adults (Cardia) Study. Control Clin Trials. 1987;8(4 Suppl):68S-73S. [DOI] [PubMed] [Google Scholar]
- 73.Friedman GD, Cutter GR, Donahue RP, Hughes GH, Hulley SB, Jacobs DR, et al. CARDIA: study design, recruitment, and some characteristics of the examined subjects. J Clin Epidemiol. 1988;41(11):1105–16. [DOI] [PubMed] [Google Scholar]
- 74.ATS statement--Snowbird workshop on standardization of spirometry. Am Rev Respir Dis. 1979 May;119(5):831–8. [DOI] [PubMed]
- 75.Fried LP, Borhani NO, Enright P, Furberg CD, Gardin JM, Kronmal RA, et al. The cardiovascular health study: design and rationale. Ann Epidemiol. 1991;1(3):263–76. [DOI] [PubMed] [Google Scholar]
- 76.Enright PL, Kronmal RA, Higgins M, Schenker M, Haponik EF. Spirometry reference values for women and men 65 to 85 years of age: cardiovascular health study. Am Rev Respir Dis. 1993;147(1):125–33. [DOI] [PubMed] [Google Scholar]
- 77.Enright PL, Kronmal RA, Higgins MW, Schenker MB, Haponik EF. Prevalence and Correlates of Respiratory Symptoms and Disease in the Elderly. Chest. 1994 Sept 1;106(3):827–34. [DOI] [PubMed]
- 78.Larkin EK, Patel SR, Goodloe RJ, Li Y, Zhu X, Gray-McGuire C, et al. A candidate gene study of obstructive sleep apnea in European Americans and African Americans. Am J Respir Crit Care Med. 2010;182(7):947–53. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 79.Kannel WB, Feinleib M, McNamara PM, Garrison RJ, Castelli WP. An investigation of coronary heart disease in families: the Framingham offspring study. Am J Epidemiol. 1979;110(3):281–90. [DOI] [PubMed] [Google Scholar]
- 80.Sempos CT, Bild DE, Manolio TA. Overview of the Jackson Heart Study: a study of cardiovascular diseases in African American men and women. Am J Med Sci. 1999;317(3):142–6. [DOI] [PubMed] [Google Scholar]
- 81.Wilson JG, Rotimi CN, Ekunwe L, Royal CDM, Crump ME, Wyatt SB, et al. Study Design for Genetic Analysis in the Jackson Heart Study. Ethn Dis. 2005;15:30–7. [PubMed] [Google Scholar]
- 82.Bild DE, Bluemke DA, Burke GL, Detrano R, Diez Roux AV, Folsom AR, et al. Multi-Ethnic Study of Atherosclerosis: Objectives and Design. Am J Epidemiol. 2002;156(9):871–81. [DOI] [PubMed] [Google Scholar]
- 83.Hankinson JL, Kawut SM, Shahar E, Smith LJ, Stukovsky KH, Barr RG. Performance of American Thoracic Society-recommended spirometry reference values in a multiethnic sample of adults: the multi-ethnic study of atherosclerosis (MESA) lung study. Chest. 2010;137(1):138–45. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 84.LaVange LM, Kalsbeek WD, Sorlie PD, Avilés-Santa LM, Kaplan RC, Barnhart J, et al. Sample design and cohort selection in the Hispanic Community Health Study/Study of Latinos. Ann Epidemiol. 2010;20(8):642–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 85.Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562(7726):203–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 86.Loh PR, Danecek P, Palamara PF, Fuchsberger C, Reshef YA, Finucane HK, et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat Genet. 2016;48(11):1443–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 87.Loh PR, Tucker G, Bulik-Sullivan BK, Vilhjálmsson BJ, Finucane HK, Salem RM, et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat Genet. 2015;47(3):284–90. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 88.Loh PR, Kichaev G, Gazal S, Schoech AP, Price AL. Mixed-model association for biobank-scale datasets. Nat Genet. 2018;50(7):906–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 89.Silverman EK, Chapman HA, Drazen JM, Weiss ST, Rosner B, Campbell EJ, et al. Genetic epidemiology of severe, early-onset chronic obstructive pulmonary disease. Risk to relatives for airflow obstruction and chronic bronchitis. Am J Respir Crit Care Med. 1998;157(6 Pt 1):1770–8. [DOI] [PubMed] [Google Scholar]
- 90.Silverman EK, Weiss ST, Drazen JM, Chapman HA, Carey V, Campbell EJ, et al. Gender-related differences in severe, early-onset chronic obstructive pulmonary disease. Am J Respir Crit Care Med. 2000;162(6):2152–8. [DOI] [PubMed] [Google Scholar]
- 91.Kim V, Han MK, Vance GB, Make BJ, Newell JD, Hokanson JE, et al. The Chronic Bronchitic Phenotype of COPD: An Analysis of the COPDGene Study. Chest. 2011 Sept 1;140(3):626–33. [DOI] [PMC free article] [PubMed]
- 92.Vestbo J, Anderson W, Coxson HO, Crim C, Dawber F, Edwards L, et al. Evaluation of COPD longitudinally to identify predictive surrogate end-points (ECLIPSE). Eur Respir J. 2008;31(4):869–73. [DOI] [PubMed] [Google Scholar]
- 93.Cho MH, Boutaoui N, Klanderman BJ, Sylvia JS, Ziniti JP, Hersh CP, et al. Variants in FAM13A are associated with chronic obstructive pulmonary disease. Nat Genet. 2010;42(3):200–2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 94.Gogarten SM, Sofer T, Chen H, Yu C, Brody JA, Thornton TA, et al. Genetic association testing using the GENESIS R/Bioconductor package. Bioinformatics. 2019;35(24):5346–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 95.Conomos MP, Miller MB, Thornton TA. Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness. Genet Epidemiol. 2015;39(4):276–93. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 96. Conomos MP, Reiner AP, Weir BS, Thornton TA. Model-free estimation of recent genetic relatedness. Am J Hum Genet. 2016;98(1):127–48.
- 97. Sofer T, Zheng X, Gogarten SM, Laurie CA, Grinde K, Shaffer JR, et al. A fully adjusted two-stage procedure for rank-normalization in genetic association studies. Genet Epidemiol. 2019;43(3):263–75.
- 98. Tang ZZ, Lin DY. Meta-analysis for discovering rare-variant associations: statistical methods and software programs. Am J Hum Genet. 2015;97(1):35–53.
- 99. Dey R, Schmidt EM, Abecasis GR, Lee S. A fast and accurate algorithm to test for binary phenotypes and its application to PheWAS. Am J Hum Genet. 2017;101(1):37–49.
- 100. Lin DY. A simple and accurate method to determine genomewide significance for association tests in sequencing studies. Genet Epidemiol. 2019;43(4):365–72.
- 101. Strnad P, McElvaney NG, Lomas DA. Alpha1-antitrypsin deficiency. N Engl J Med. 2020;382(15):1443–55.
- 102. Cho MH, Castaldi PJ, Wan ES, Siedlinski M, Hersh CP, Demeo DL, et al. A genome-wide association study of COPD identifies a susceptibility locus on chromosome 19q13. Hum Mol Genet. 2012;21(4):947–57.
- 103. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4(1):s13742–015–0047–8.
- 104. Frankish A, Diekhans M, Jungreis I, Lagarde J, Loveland JE, Mudge JM, et al. GENCODE 2021. Nucleic Acids Res. 2021;49(D1):D916–23.
- 105. Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581(7809):434–43.
- 106. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016;17(1):1–14.
- 107. Liu X, White S, Peng B, Johnson AD, Brody JA, Li AH, et al. WGSA: an annotation pipeline for human genome sequencing studies. J Med Genet. 2016;53(2):111–2.
- 108. Consortium BC. The NHLBI BioData Catalyst. Zenodo; 2020. Date last accessed: 31 August 2020.
- 109. Morris AP, Zeggini E. An evaluation of statistical approaches to rare variant analysis in genetic association studies. Genet Epidemiol. 2010;34(2):188–93.
- 110. Madsen BE, Browning SR. A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet. 2009;5(2):e1000384.
- 111. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet. 2011;89(1):82–93.
- 112. Lee S, Emond MJ, Bamshad MJ, Barnes KC, Rieder MJ, Nickerson DA, et al. Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am J Hum Genet. 2012;91(2):224–37.
- 113. Zhou W, Nielsen JB, Fritsche LG, Dey R, Gabrielsen ME, Wolford BN, et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat Genet. 2018;50(9):1335–41.
- 114. Yates A, Beal K, Keenan S, McLaren W, Pignatelli M, Ritchie GR, et al. The Ensembl REST API: Ensembl data for any language. Bioinformatics. 2015;31(1):143–5.
- 115. Fisher RA. Statistical Methods for Research Workers. In: Kotz S, Johnson NL, editors. Breakthroughs in Statistics: Methodology and Distribution. New York, NY: Springer; 1992. p. 66–70. (Springer Series in Statistics). Available from: 10.1007/978-1-4612-4380-9_6. Cited 2023 Sept 1.
Associated Data
Supplementary Materials
Additional file 1. Text, Results, Acknowledgements, and Figs. S1–S5 [70–115].
Data Availability Statement
All data, including whole genome sequencing, phenotypes, and association study results, are available through dbGaP under the relevant accession numbers (see Additional file 1 and https://topmed.nhlbi.nih.gov/topmed-data-access-scientific-community). All datasets are access-controlled, and access requires approval from the relevant Data Access Committee(s) (DAC) through the dbGaP system. General information and instructions for applying can be found at https://grants.nih.gov/policy-and-compliance/policy-topics/sharing-policies/accessing-data/dbgap.




