Abstract
Idiopathic pulmonary fibrosis (IPF) is a chronic lung condition with poor survival times. We previously published a genome-wide meta-analysis of IPF risk across three studies with independent replication of associated variants in two additional studies. To maximise power and to generate more accurate effect size estimates, we performed a genome-wide meta-analysis across all five studies included in the previous IPF risk GWAS. We utilised the distribution of effect sizes across the five studies to assess the replicability of the results and identified five robust novel genetic association signals implicating mTOR signalling, telomere maintenance and spindle assembly genes in IPF risk.
Introduction
Idiopathic pulmonary fibrosis (IPF) is a chronic lung disease believed to result from an aberrant response to alveolar injury leading to a build-up of scar tissue. This progressive scarring is eventually fatal with half of individuals dying within 3 to 5 years of diagnosis1. The cause of IPF is unknown but genetics play an important role in how susceptible an individual is to IPF2.
Genome-wide association studies (GWAS) test genetic variants from across the genome for association with a disease. Genetic loci identified by GWAS can implicate genes important in disease pathogenesis, and drugs that target the products of these genetically supported genes are twice as likely to be successful during development. The genetic association statistics from a GWAS are also widely used to identify causal markers of disease through Mendelian randomisation, to estimate heritability and for genetic correlation analyses.
We recently published a GWAS of IPF risk2. The discovery GWAS consisted of three studies (named the UK, Chicago and Colorado studies), with replication performed in two independent studies (named the UUS [USA, UK and Spain] and Genentech studies). This analysis reported 14 genetic signals which implicated host defence, cell-cell adhesion, spindle assembly, TGF-β signalling regulation and telomere maintenance as important biological processes involved in IPF disease risk. The effect size estimates from this analysis have been widely used in other genetic analyses3–5 and have been integrated into drug target discovery pipelines.
To maximise sample sizes for detection of new genetic associations, and to generate more precise effect size estimates, we have reanalysed the data and present a meta-analysis of genome-wide data from all 5 datasets included in our previous study. The results of this analysis implicate new genetic loci in IPF pathogenesis and provide a unique resource for other studies of IPF risk and pathogenesis.
Methods
Quality control and sample selection have been previously described2. In summary, the datasets comprised unrelated European-ancestry individuals from across the USA, UK and Spain, diagnosed using ATS/ERS guidelines6,7. Individuals in the Genentech study were sequenced using the HiSeq X Ten platform (Illumina) and all other individuals had genotypes imputed from genotyping data using the HRC reference panel8. Genome-wide analyses were performed in each study separately using an additive logistic regression model adjusting for the first 10 genetic principal components to account for population stratification.
The five study-level GWAS were combined into a single GWAS using an inverse-variance weighted fixed-effect meta-analysis in METAL9. Variants were included in the meta-analysis if they were available in at least four studies. Genomic control was performed on the meta-analysis results using the LD score regression intercept to account for inflation not explained by polygenic effects10. Significant variants were defined as those with meta-analysis p<5×10−8 and conditional analyses were performed using GCTA-COJO to identify additional independent associated variants11. Independent associated variants were defined as variants remaining genome-wide significant after conditioning on the most significant variant (sentinel) in the region, with consistent effect size estimates in the conditional and non-conditional analyses. Annotation of the sentinel variants was then performed using Variant Effect Predictor12.
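The meta-analysis step for a single variant can be sketched as follows. This is a minimal illustration in Python and not the METAL implementation: the function name, the use of None for a missing study, and the choice to apply genomic control by scaling the standard error by the square root of the LD score regression intercept (with no correction when the intercept is below 1) are all assumptions made for the example.

```python
import math

GENOME_WIDE_P = 5e-8  # significance threshold stated in the Methods


def ivw_fixed_effect(betas, ses, min_studies=4, gc_intercept=1.0):
    """Inverse-variance weighted fixed-effect meta-analysis for one variant.

    betas and ses hold per-study log odds ratios and standard errors;
    None marks a study in which the variant is unavailable (assumption).
    Returns None when fewer than min_studies studies contribute.
    """
    pairs = [(b, s) for b, s in zip(betas, ses) if b is not None and s is not None]
    if len(pairs) < min_studies:
        return None

    weights = [1.0 / (s * s) for _, s in pairs]  # w_i = 1 / se_i^2
    total_w = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, pairs)) / total_w
    se = math.sqrt(1.0 / total_w)

    # Assumed form of genomic control: inflate the SE by sqrt(intercept),
    # which divides the chi-square statistic by the intercept.
    se *= math.sqrt(max(gc_intercept, 1.0))

    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return {
        "beta": beta,
        "se": se,
        "z": z,
        "p": p,
        "n_studies": len(pairs),
        "significant": p < GENOME_WIDE_P,
    }


# Hypothetical per-study estimates for one variant (not data from this study).
example = ivw_fixed_effect(
    betas=[0.25, 0.30, 0.10, 0.28, None],
    ses=[0.08, 0.09, 0.12, 0.07, None],
    gc_intercept=1.02,
)
print(example)
```

Studies with smaller standard errors receive larger weights, so the pooled estimate is dominated by the most precise studies and its standard error is smaller than that of any single study.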
To assess the robustness of novel results, we tested the strength and consistency of results across studies using MAMBA (Meta-Analysis Model-Based Assessment of replicability)13. Variants with a posterior probability of replicability (PPR)≥90% were considered robust and likely to replicate should additional independent datasets become available.
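The robustness rule is a simple threshold on the MAMBA output. The sketch below applies it to the PPR values reported for the nine novel signals in Table 1; it does not run MAMBA itself, and the function and variable names are illustrative only.

```python
# PPR values (%) for the nine novel sentinel variants, copied from Table 1.
novel_ppr = {
    "rs79684490": 94.0,
    "rs12912339": 96.5,
    "rs74614704": 99.4,
    "rs112087793": 96.8,
    "rs41308092": 99.9,
    "rs4233306": 37.4,
    "rs1214759": 21.9,
    "rs11788059": 3.1,
    "rs7100920": 32.1,
}

PPR_THRESHOLD = 90.0  # PPR >= 90% is treated as robust


def split_by_ppr(ppr_by_variant, threshold=PPR_THRESHOLD):
    """Return (robust, not_robust) lists of variant IDs, each sorted by ID."""
    robust = sorted(v for v, ppr in ppr_by_variant.items() if ppr >= threshold)
    not_robust = sorted(v for v, ppr in ppr_by_variant.items() if ppr < threshold)
    return robust, not_robust


robust, not_robust = split_by_ppr(novel_ppr)
print(len(robust), len(not_robust))  # 5 4
```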
Summary statistics (i.e. effect size estimates, standard errors, p values and basic variant information) for all variants included in the genome-wide meta-analysis can be accessed at https://github.com/genomicsITER/PFgenetics.
Results
A total of 4,125 cases, 20,464 controls and 7,554,248 genetic variants were included in the analysis (Figure 1). The UUS study included one additional case (due to resolving a sample ID issue since the previous publication) and one fewer control (where the individual has since withdrawn consent from UK Biobank) than described in the previous GWAS2.
After conditional analyses, there were 23 independent signals with p<5×10−8 in the genome-wide meta-analysis (Figure 2). These 23 signals included all 14 associations reported in the previous GWAS (Supplementary Table 1). Of the nine novel genetic associations (Table 1), five showed evidence of replicability (PPR≥90%). The sentinel variants of these five loci included variants in introns of KNL1, NPRL3, STMN3 and RTEL1, and an intergenic variant in 10q25.1. All five novel variants had consistent direction of effect across all of the individual studies and reached nominal significance (p<0.05) in at least 3 of the studies. Twelve of the 14 previously reported signals had PPR>90% (Supplementary Table 1).
Table 1:
Chr | Position | rsid | Annotation | Ref allele | Effect allele | EAF | Direction | Study p≤0.05 | OR [95% CI] | p | PPR |
---|---|---|---|---|---|---|---|---|---|---|---|
i) Novel variants with high posterior probability of replication (PPR≥90%) | |||||||||||
10 | 111229861 | rs79684490 | Intergenic (10q25.1) | G | A | 4.6% | + + + + + | YYNYY | 1.40 [1.24, 1.57] | 3.52×10−8 | 94.0% |
15 | 40931708 | rs12912339 a | Intron of KNL1 | G | A | 15.9% | + + + + + | YYNYY | 1.30 [1.21, 1.39] | 7.41×10−13 | 96.5% |
16 | 162240 | rs74614704 | Intron of NPRL3 | G | A | 5.6% | + + + + + | YNNYY | 1.49 [1.33, 1.67] | 2.57×10−12 | 99.4% |
20 | 62284170 | rs112087793 b | Intron of STMN3 | T | C | 91.5% | + + + + + | YYYYY | 1.34 [1.21, 1.48] | 1.09×10−8 | 96.8% |
20 | 62324391 | rs41308092 b | Intron of RTEL1 | G | A | 2.1% | + + + + + | YYYYN | 1.75 [1.45, 2.10] | 3.13×10−9 | 99.9% |
ii) Novel variants not reaching PPR≥90% threshold | |||||||||||
1 | 214659598 | rs4233306 | Intron of PTPN14 | T | C | 80.2% | + + + + + | YYNNN | 1.23 [1.15, 1.32] | 3.41×10−9 | 37.4% |
6 | 43352980 | rs1214759 | Intergenic (6p21.2) | A | G | 67.9% | + + + + + | NYYYN | 1.18 [1.11, 1.25] | 1.71×10−8 | 21.9% |
9 | 109480268 | rs11788059 | Regulatory region variant (9q31.2) | T | C | 34.2% | + + + + + | NYNYY | 1.17 [1.10, 1.23] | 4.85×10−8 | 3.1% |
10 | 105640978 | rs7100920 | Regulatory region of OBFC1 | C | T | 49.0% | + + − + + | NYNYY | 1.19 [1.13, 1.26] | 1.67×10−10 | 32.1% |
Novel variants are defined as those not reaching significance criteria in the previous analysis2 (the RTEL1 and OBFC1 signals have previously shown a possible association; see Discussion). Effect sizes and directions are given in terms of the allele that increases risk of IPF. Chr=Chromosome. Position is based on genome build 37. Annotation obtained from Variant Effect Predictor12. EAF=Effect allele frequency calculated across the five studies. The “Direction” column shows the direction of the beta in each of the five individual studies (+ means beta>0, − means beta<0). The “Study p≤0.05” column denotes which individual studies the variant reached nominal significance in (Y means p≤0.05, N means p>0.05). Both the “Direction” and “Study p≤0.05” columns are given in the order UK, Colorado, Chicago, UUS and then Genentech. OR=Odds ratio. CI=Confidence interval. PPR=posterior probability of replicability calculated using MAMBA13.
a The signal at KNL1 is independent of the previously reported nearby signal in the IVD gene.
b The RTEL1 and STMN3 signals are independent of each other.
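The derived columns of Table 1 follow mechanically from per-study and meta-analysis estimates. The Python sketch below shows one way to compute them, assuming effect sizes are log odds ratios with standard errors and that the 95% CI uses a normal approximation. The function names are illustrative rather than taken from the analysis pipeline, the example betas are hypothetical, and an ASCII "-" stands in for the minus sign used in the table.

```python
import math


def odds_ratio_ci(beta, se, z_crit=1.959964):
    """Convert a log odds ratio and its SE to (OR, lower 95% CI, upper 95% CI)."""
    return (
        math.exp(beta),
        math.exp(beta - z_crit * se),
        math.exp(beta + z_crit * se),
    )


def orient_to_risk_allele(beta, eaf, ref, effect):
    """Report the effect for the risk-increasing allele, so that beta > 0.

    Returns (beta, effect allele frequency, reference allele, effect allele).
    """
    if beta < 0:
        return -beta, 1.0 - eaf, effect, ref
    return beta, eaf, ref, effect


def direction_string(study_betas):
    """'+' for beta > 0 and '-' for beta < 0, one symbol per study."""
    symbols = []
    for b in study_betas:
        symbols.append("+" if b > 0 else "-" if b < 0 else "0")
    return " ".join(symbols)


def nominal_string(study_pvalues, alpha=0.05):
    """'Y' where a study reaches p <= alpha, otherwise 'N'."""
    return "".join("Y" if p <= alpha else "N" for p in study_pvalues)


# Hypothetical per-study results in the order UK, Colorado, Chicago, UUS, Genentech.
print(direction_string([0.17, 0.21, -0.03, 0.19, 0.15]))  # + + - + +
print(nominal_string([0.20, 0.01, 0.60, 0.004, 0.03]))    # NYNYY
```

Orienting every variant to its risk-increasing allele means all reported odds ratios exceed 1, which makes effect sizes directly comparable across rows.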
Discussion
By increasing the number of cases in the discovery analysis by more than 50% compared with the previous IPF risk GWAS, we identified novel genetic signals associated with IPF risk and improved the precision of estimations for previously reported signals. The five novel loci had internal evidence of replicability giving us confidence that these signals are likely to be generalisable.
The signals in RTEL1 and OBFC1 have been reported previously but did not meet the significance criteria of the previous three-way GWAS2. The new MAMBA analysis suggests that the consistency of effect across studies provides high confidence that the RTEL1 signal will replicate should an independent dataset become available. This is not the case for the OBFC1 signal where a low posterior probability of replication suggests that there may be heterogeneity in effect across the contributing studies.
The novel signals require further characterisation to determine the likely causal gene and underlying functional effect of the variants. However, some of the genes that are closest to these new signals have strong candidacy for involvement in IPF pathogenesis. NPRL3 encodes a component of the GATOR1 complex, which acts through mTORC1 signalling to inhibit mTOR kinase activity14. mTOR regulates TGF-β-induced collagen synthesis15, so reduced inhibition of mTOR would be expected to increase deposition of scar tissue. We previously reported an association implicating DEPTOR, another mTOR-inhibiting gene. We also add to the evidence that cellular ageing plays a key role in IPF pathogenesis through associations at the telomere maintenance genes TERT, TERC and RTEL1. We previously reported associations in spindle assembly genes (MAD1L1 and KIF15) and have identified a novel genetic association in another spindle assembly gene, KNL1 (Kinetochore Scaffold 1, also known as CASC5). STMN3 (Stathmin 3) implicates another cell replication process through tubulin binding14.
Our analysis also shows the benefits of including all samples in the genome-wide analysis. By utilising recent statistical methodological advances to test for the replicability of signals when all available datasets are included in the discovery GWAS13, we were able to identify five additional variants with evidence of being robustly associated with IPF risk. Additional independent replication of these signals would strengthen the evidence for their role in IPF susceptibility.
By maximising the statistical power of the analysis, we identified novel genetic associations with IPF risk. These signals may implicate biologically relevant genes and support the importance of TGF-β signalling and cell replication in disease pathogenesis.
Funding
R Allen is an Action for Pulmonary Fibrosis Mike Bray Research Fellow. L Wain holds a GSK/Asthma + Lung UK Chair in Respiratory Research (C17–1). This work was supported by Medical Research Council Programme grant number MR/V00235X/1. This work was supported by National Institutes of Health/National Heart, Lung and Blood Institute grant numbers R56HL158935 and K23HL138190 (J Oldham). This work was supported by Wellcome Trust grant number 221680/Z/20/Z (B Guillen-Guio). For the purpose of open access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission. The research was partially supported by the National Institute for Health Research (NIHR) Leicester Biomedical Research Centre; the views expressed are those of the author(s) and not necessarily those of the National Health Service (NHS), the NIHR or the Department of Health. The UK and UUS studies selected controls from UK Biobank under application 8389. This research used the SPECTRE High Performance Computing Facility at the University of Leicester.
Footnotes
Competing interests
L Wain reports collaborative research funding from GSK and Orion Pharma, and consultancy for Galapagos, outside of the submitted work. A Stockwell and B Yaspan are employees of Genentech/Roche and hold stock and stock options in Roche. J Oldham reports personal fees from Boehringer Ingelheim, Genentech, United Therapeutics, AmMax Bio and Lupin pharmaceuticals unrelated to the submitted work. RG Jenkins is a trustee of Action for Pulmonary Fibrosis and reports personal fees from AstraZeneca, Biogen, Boehringer Ingelheim, Bristol Myers Squibb, Chiesi, Daewoong, Galapagos, Galecto, GlaxoSmithKline, Heptares, NuMedii, PatientMPower, Pliant, Promedior, Redx, Resolution Therapeutics, Roche, Veracyte and Vicore. D Schwartz is the founder and chief scientific officer of Eleven P15, Inc., a company focused on the early detection and treatment of pulmonary fibrosis. TMM has received industry-academic funding from GlaxoSmithKline (GSK) R&D and UCB; and has received consultancy or speakers fees from Apellis, AstraZeneca, Bayer, Biogen Idec, Boehringer Ingelheim, Cipla, GSK R&D, InterMune, ProMetic, Roche, Sanofi-Aventis, Sanumed, and UCB.
Ethics Statement
This research was conducted using previously published work with appropriate ethics approval. The PROFILE study (which provided samples for the UK and UUS studies) had institutional ethics approval at the University of Nottingham (NCT01134822 – ethics reference 10/H0402/2) and Royal Brompton and Harefield NHS Foundation Trust (NCT01110694 – ethics reference 10/H0720/12). Spanish samples were recruited under approval from the ethics committee of the Hospital Universitario N.S. de Candelaria (approval reference: PI-19/12). The UUS study also included individuals from clinical trials with ethics approval (ACE [NCT00957242] and PANTHER [NCT00650091]). UK samples were recruited across multiple sites with individual ethics approval (University of Edinburgh Research Ethics Committee [The Edinburgh Lung Fibrosis Molecular Endotyping (ELFMEN) Study NCT04016181] 17/ES/0075, NRES Committee South West – Southmead, Yorkshire and Humber Research Ethics Committee 08/H1304/54, Nottingham Research Ethics Committee 09/H0403/59 and Royal Papworth Hospital Research Tissue Bank 18/EE/0269). For individuals recruited at the University of Chicago, consenting patients with IPF who were prospectively enrolled in the institutional review board-approved ILD registry (IRB#14163A) were included. Individuals recruited at the University of Pittsburgh Medical Centre had ethics approval from the University of Pittsburgh Human Research Protection Office (reference STUDY20030223: Genetic Polymorphisms in IPF). Individuals from the COMET (NCT01071707) and Lung Tissue Research Consortium (NCT02988388) studies were also included in the Chicago study. All subjects in the Colorado study gave written informed consent as part of IRB-approved protocols for their recruitment at each site and the GWAS was approved by the National Jewish Health IRB and Colorado Combined Institutional Review Boards (COMIRB).
Subjects in the Genentech study provided written informed consent for whole-genome sequencing of their DNA. Ethical approval was provided as per the original clinical trials (INSPIRE [NCT00075998], RIFF [NCT01872689], CAPACITY [NCT00287729 and NCT00287716] and ASCEND [NCT01366209]). For the UCSF cohort, sample and data collection were approved by the University of California San Francisco Committee on Human Research and all patients provided written informed consent. For the Vanderbilt cohort, the Institutional Review Boards from Vanderbilt University approved the study and all participants provided written informed consent before enrolment.
References
- 1. Lederer DJ, Martinez FJ. Idiopathic pulmonary fibrosis. N Engl J Med 2018;378(19):1811–23.
- 2. Allen RJ, Guillen-Guio B, Oldham JM, Ma S, Dressen A, Paynton ML, Kraven LM, Obeidat M, Li X, Ng M. Genome-wide association study of susceptibility to idiopathic pulmonary fibrosis. American Journal of Respiratory and Critical Care Medicine 2020;201(5):564–74.
- 3. Zhang Y, Zhao M, Guo P, Wang Y, Liu L, Zhao J, Gao L, Yuan Z, Xue F, Zhao J. Mendelian randomisation highlights hypothyroidism as a causal determinant of idiopathic pulmonary fibrosis. EBioMedicine 2021;73:103669.
- 4. Wang L, Balmat TJ, Antonia AL, Constantine FJ, Henao R, Burke TW, Ingham A, McClain MT, Tsalik EL, Ko ER. An atlas connecting shared genetic architecture of human diseases and molecular phenotypes provides insight into COVID-19 susceptibility. Genome Medicine 2021;13(1):1–19.
- 5. Duckworth A, Gibbons MA, Allen RJ, Almond H, Beaumont RN, Wood AR, Lunnon K, Lindsay MA, Wain LV, Tyrrell J. Telomere length and risk of idiopathic pulmonary fibrosis and chronic obstructive pulmonary disease: A mendelian randomisation study. The Lancet Respiratory Medicine 2021;9(3):285–94.
- 6. Raghu G, Collard HR, Egan JJ, Martinez FJ, Behr J, Brown KK, Colby TV, Cordier J, Flaherty KR, Lasky JA. An official ATS/ERS/JRS/ALAT statement: Idiopathic pulmonary fibrosis: Evidence-based guidelines for diagnosis and management. American Journal of Respiratory and Critical Care Medicine 2011;183(6):788–824.
- 7. Raghu G, Remy-Jardin M, Myers JL, Richeldi L, Ryerson CJ, Lederer DJ, Behr J, Cottin V, Danoff SK, Morell F. Diagnosis of idiopathic pulmonary fibrosis. An official ATS/ERS/JRS/ALAT clinical practice guideline. American Journal of Respiratory and Critical Care Medicine 2018;198(5):e44–68.
- 8. McCarthy S, Das S, Kretzschmar W, Durbin R, Abecasis G, Marchini J. A reference panel of 64,976 haplotypes for genotype imputation. bioRxiv 2016:035170.
- 9. Willer CJ, Li Y, Abecasis GR. METAL: Fast and efficient meta-analysis of genomewide association scans. Bioinformatics 2010;26(17):2190–1.
- 10. Bulik-Sullivan BK, Loh P, Finucane HK, Ripke S, Yang J, Patterson N, Daly MJ, Price AL, Neale BM, Schizophrenia Working Group of the Psychiatric Genomics Consortium. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 2015;47(3):291–5.
- 11. Yang J, Ferreira T, Morris AP, Medland SE, Genetic Investigation of ANthropometric Traits (GIANT) Consortium, DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium, Madden PA, Heath AC, Martin NG, Montgomery GW, et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet 2012. Mar 18;44(4):369–3.
- 12. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, Flicek P, Cunningham F. The ensembl variant effect predictor. Genome Biol 2016;17(1):122.
- 13. McGuire D, Jiang Y, Liu M, Weissenkampen JD, Eckert S, Yang L, Chen F, Berg A, Vrieze S, Jiang B. Model-based assessment of replicability for genome-wide association meta-analysis. Nature Communications 2021;12(1):1–14.
- 14. Stelzer G, Rosen R, Plaschkes I, Zimmerman S, Twik M, Fishilevich S, Iny Stein T, Nudel R, Lieder I, Mazor Y, et al. GeneCards – the human gene database. The GeneCards suite: From gene data mining to disease genome sequence analysis. Current Protocols in Bioinformatics 2016(54):1.30.1,1.30.33.
- 15. Woodcock HV, Eley JD, Guillotin D, Platé M, Nanthakumar CB, Martufi M, Peace S, Joberty G, Poeckel D, Good RB. The mTORC1/4E-BP1 axis represents a critical signaling node during fibrogenesis. Nature Communications 2019;10(1):6.