Skip to main content

This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

medRxiv logoLink to medRxiv
[Preprint]. 2026 Feb 19:2026.02.18.26345897. [Version 1] doi: 10.64898/2026.02.18.26345897

Contribution of dominant and recessive model effects to the genetic architecture of Idiopathic Pulmonary Fibrosis

Tamara Hernandez-Beeftink 1,2, Lauren J Donoghue 3, Abril Izquierdo 1,2, Samuel T Moss 1,2, Daniel Chin 1,2, Beatriz Guillen-Guio 1,2,4, Konain Fatima Bhatti 5, Simon Biddie 6,7, Nick Shrine 1,2, Richard Packer 1,2,8, Ayodeji Adegunsoye 9, Helen L Booth 10; CleanUP-IPF Investigators of the Pulmonary Trials Cooperative11, William A Fahy 12, Tasha E Fingerlin 13, Ian P Hall 14, Simon P Hart 15, Mike R Hill 16, Nik Hirani 17, Naftali Kaminski 18, Elena Lopez-Jimenez 5, Jose Miguel Lorenzo-Salazar 19, Shwu-Fan Ma 20, Robin J McAnulty 21, Mark I McCarthy 3, Amy D Stockwell 3, Toby M Maher 22, Ann B Millar 23, Philip L Molyneaux 24, Maria Molina-Molina 25,26,4, Vidya Navaratnam 27,28, Margaret Neighbors 3, Justin M Oldham 29, Helen Parfrey 30, Gauri Saini 31, Ian Sayers 14, X Rebecca Sheng 3, Mary E Strek 9, Iain Stewart 22, Martin D Tobin 1,2, Moira KB Whyte 17, Yingze Zhang 32, Fernando J Martinez 33, Brian L Yaspan 3, Carl J Reynolds 34, David A Schwartz 35, Carlos Flores 19,4,36,37, Imre Noth 20, Alison E John 5, R Gisli Jenkins 22, Richard J Allen 1,2, Olivia C Leavy 1,2,*, Louise V Wain 1,2,*,#
PMCID: PMC12934843  PMID: 41757189

Abstract

Rationale:

Idiopathic pulmonary fibrosis (IPF) is a rare, chronic, progressive lung disease with high mortality and few treatment options. Using an additive genetic model, genome-wide association studies (GWAS) have identified multiple risk loci highlighting new genes and pathways of interest. Since IPF risk could also be influenced by non-additive effects, we hypothesised that association analyses using alternative genetic models may provide additional mechanistic insight.

Objectives:

To perform GWAS of IPF susceptibility to detect associations where the underlying effects are consistent with recessive or dominant genetic models.

Methods:

We performed GWAS of IPF susceptibility, with logistic regression assuming dominant or recessive genetic models, including 5,159 IPF cases, from clinically-curated sources, and 27,459 controls. We functionally annotated independent signals and performed variant-to-gene mapping, applying fine-mapping to define potentially causal variants and genes. We assessed differential expression levels of genes of interest in publicly available single cell RNAseq data and in primary cells derived from IPF donors and controls.

Main Results:

We identified five genome-wide significant signals, under a recessive model, that had not been reported previously. These included exonic variants in the cell-cycle gene Polyamine-Modulated Factor 1 (PMF1) and in Epsin 3 (EPN3) genes. We also observed evidence of increased PMF1 expression in airway basal cells of IPF patients compared to controls.

Conclusions:

Using alternative genetic models in IPF susceptibility GWAS identified new signals and genes, providing new insights into IPF pathogenesis and potential future therapies.

Keywords: Idiopathic pulmonary fibrosis, Susceptibility GWAS, Non-additive, Dominant, Recessive, Genetics

INTRODUCTION

Idiopathic pulmonary fibrosis (IPF) is a rare chronic progressive illness of the respiratory system characterised by scarring of the lung interstitium (1)(2). Patients have a poor prognosis and currently there is a lack of treatments and therapeutic options that can stop or reverse the scarring of the lungs (1)(2)(3). The mean age at diagnosis is around 70 years of age, it is infrequent in individuals under 50 years old, and it is estimated to affect more than 7 per 100,000 people (4).

Multiple studies have focused on the identification of genetic risk factors associated with IPF (5)(6)(7)(8)(9)(10). The heritability of IPF due to single nucleotide polymorphisms (SNPs) has been estimated around 32% (11). Genome-wide association studies (GWAS) have identified multiple common signals of association and highlighted new genes and pathways of interest (5)(6)(7)(8)(9)(10), implicating genes involved in lung defence, telomere maintenance, cell-cell adhesion, cell proliferation, and fibrotic signalling.

All previously published IPF GWAS used an additive genetic model, which assumes an increase in risk for each copy of the effect allele. However, the genetic variation of complex traits could also be influenced by non-additive effects. GWAS using other genetic models such as the dominant model, where one or more copies of the effect allele are sufficient to increase risk, and the recessive model, where two copies are required to alter risk (12)(13), could provide new insight and a better understanding of the genetic variation in complex diseases (14), such as IPF.

Here, we hypothesised that association analyses assuming non-additive genetic models may identify new IPF susceptibility signals where the underlying effect is consistent with a recessive or dominant allelic effect.

METHODS

Study design and samples

We analysed seven independent case–control studies: Colorado (15), UUS (8), UK (7), IPF Job Exposures Study (IPF-JES) (16), Genentech (17), United States, United Kingdom, and Spain (US) (6) and Study of Clinical Efficacy of Antimicrobial Therapy Strategy Using Pragmatic Design in Idiopathic Pulmonary Fibrosis - University of California, Davis (CleanUP-UCD) (18) (Figure 1).

Figure 1. Sample selection workflow.

Figure 1.

SNPs: Single Nucleotide Polymorphisms; DM: Dominant genetic model; RM: Recessive genetic model; ADD: Additive genetic model. WGS: Whole-genome sequencing data.

All participating studies included unrelated individuals of genetically-determined European ancestry, with cases diagnosed according to the American Thoracic Society and European Respiratory Society guidelines (19). More information for each study can be found in the supplementary material. Written informed consent and ethics approval were correctly undertaken in all the studies following The Code of Ethics of the World Medical Association (Declaration of Helsinki) and approved by the appropriate institutional review or Research Ethics Committee.

Genotyping and pre-processing

Genotyping arrays (Affymetrix and Illumina) were used to genotype individuals from the Colorado, IPF-JES, UK, US, UUS, and CleanUP-UCD studies. Sample quality control included the removal of individuals with low call rate estimates, sex-mismatched data, high heterozygosity rates, non-European genetic ancestry, had high relatedness, and duplicates across studies. These six studies were imputed using the TOPMed whole-genome sequencing (WGS) reference panel (GRCh38) via the TOPMed Imputation Server (20)(21)(10). For the Genentech study, genotypes were obtained from WGS using the HiSeq X Ten platform (Illumina) to an average read depth of 30X. Related individuals and those with a call rate of less than 10% were removed from the analysis. More details can be found in previous publication (10) and are summarised in the supplementary material.

Association and statistical analysis

For six studies (Colorado, IPF-JES, UK, US, UUS, and CleanUP-UCD), genome-wide association analyses were conducted separately under dominant and recessive genetic models, using logistic regression with PLINK 2 (22) and adjusting each study for the first ten principal components to correct for population stratification. Variants for which the effect alleles were not aligned across all contributing studies were excluded from the meta-analysis. Variants with poor imputation quality (r2<0.5) were removed. In the Genentech study, dominant and recessive associations were performed using logistic regression with PLINK 1.9 (22), including sex, age, and five genetic-ancestry principal components as covariates.

The seven datasets were meta-analysed using inverse-variance weighted fixed effect meta-analysis in METAL (23), excluding variants that did not pass quality control in at least two studies. We estimated the genomic inflation factor using LDSC (24).

Signal selection

Independent signals were defined based on a p-value threshold of p<5.0×10−8 (genome-wide significance) in ±500kb window. Independence from previously reported signals was determined based on genomic distance (±500 kb) and linkage disequilibrium (r2<0.1). We excluded signals where the association in any individual study was more significant than in the meta-analysis. The forest plot R v4.1.3 package was used to visualise the per-study and meta-analysis p-values. Significant signals were compared with the corresponding results, under an additive genetic model, using the same dataset (10). Whilst our primary genome-wide significance threshold was p<5×10−8, we also highlighted signals that met more conservative thresholds: i) a Bonferroni-corrected genome-wide significance threshold for two models (p<5×10−8 / 2), and ii) a more stringent threshold of p<5×10−9, used in previous large GWAS (25). This is because each variant was tested under two different (though not independent) genetic models, and TOPMed imputation enables testing of a larger number of variants than in previous GWAS. We compared the recessive and dominant model association results for all SNPs previously reported to be associated with IPF susceptibility at genome-wide significance using the additive model (p<5×10−8).

Identification of putative causal genes and variants

Variant functional annotation

We annotated sentinel variants using Ensembl Variant Effect Predictor (VEP) v.115, and we identified potentially causal variants using the Wakefield Bayes factor method to calculate the posterior inclusion probability (PIP) of each variant to define a 95% credible set (26). To assess the regulatory potential and rank the functional roles of each SNP and its associated gene, we queried RegulomeDB (27), SpliceAI (28), and PrimateAI-3D (29). To map genetic variants within the 95% credible sets to alternative transcripts, we first annotated them to MANE-defined reference exons (Matched Annotation from NCBI and EMBL-EBI (30), as well as to Ensembl-annotated (Ensembl_EIC) and unannotated alternative exons (FLIbase_ENIC) (31). We then used FLIBase full-length transcript expression data derived from long-read mRNA sequencing (32) to assess expression of reference and alternative transcripts across tissue types. We plotted the transcripts for each related gene from FLIBase using the ‘ggtranscript’ function in R version 4.3.1.

We also annotated regions for functionality based on active chromatin marks, including DNase I hypersensitivity hotspots (DHS), transcription factor footprints, enhancer RNAs (eRNAs), Assay for Transposase-Accessible Chromatin using sequencing (ATACseq), ChIP sequencing (CHIPseq), and epigenetic modification of the DNA packaging protein histones (H3K27ac) (33). We used Open Targets to determine whether the variants had previously been associated with other phenotypes (p<0.001) (see supplementary material for further details). We conducted single-variant PheWAS analyses using a recessive model for 1,929 phenotypes in up to 393,063 European individuals for 389 quantitative traits and up to 158,054 cases and 394,247 controls for 1,540 binary traits in UK Biobank (UKBB) using the R Deep-PheWAS package (34).

Variant to gene mapping

Nearest-gene assignment was based exclusively on genomic distance, making it independent of prioritisation approaches that incorporate functional annotations. We followed the locus-to-gene (L2G) framework implemented in the Open Targets platform (https://platform-docs.opentargets.org/gentropy/locus-to-gene-l2g) (35), using the footprint distance metric to link variants to genes. According to this definition, a gene was considered the nearest one if the variant was located within its genomic footprint or if the linear distance to the transcriptional boundaries of the gene (TSS or transcription end site) was minimal. Gene coordinates were obtained from protein-coding canonical transcripts in Ensembl release 115. When a variant overlapped the coding regions of multiple canonical transcripts, all corresponding genes were reported and considered the nearest gene.

Quantitative trait loci analysis

We analysed gene expression and splicing quantitative trait loci (eQTLs and sQTLs) across 54 tissues from the Genotype-Tissue Expression Project (GTEx) V8 (36) (downloaded in July 2020 from https://www.gtexportal.org/; data is only available under the additive model). We also accessed plasma protein expression quantitative trait loci (pQTLs) statistics from UKBB for 46,836 individuals from the general population (data available only under additive model). We used coloc (37) R v4.1.3 package to investigate whether the same variant was driving both the genetic association with IPF risk and an association with gene, protein, or splice variant expression levels, and report shared signals with a colocalization probability >0.7.

Rare variant associations

We looked for exonic rare variant associations (minor allele frequency [MAF]<1%) with ‘pulmonary fibrosis’ within ±500 kb of our IPF sentinel variants using both single-variant and gene-based collapsing tests from 281,104 UKBB exomes (UKBB field IDs: 1121 & 22135; and ICD-10 code J841) accessed via the AstraZeneca PheWAS portal (https://azphewas.com/). We used a threshold of p<5×10−6 for both single-variant and gene-based tests.

Nearby Mendelian respiratory disease and mouse knockout ortholog genes

We looked for rare Mendelian disease genes from ORPHANET (https://www.orpha.net/) and human orthologs of mouse knockout genes from the International Mouse Phenotyping Consortium (https://www.mousephenotype.org/) that were within ±500 kb of each IPF-associated sentinel and were linked with respiratory-related terms (‘fibrosis’, ‘fibrotic’, ‘lung’, ‘pulmonary, ‘respiratory’, ‘immune’, ‘inflammation’, ‘inflammatory’).

Cell culture

In order to investigate the differences between the basal epithelial cells of the airways in IPF and non-IPF lungs, primary airway basal epithelial cells were isolated from fresh human lung tissues obtained from lung cancer resection (non-IPF; n=3) or from lung explant tissue (IPF; n=3). All tissue samples were collected by the Clinical Research Facility Biobank at the Royal Brompton and Harefield Hospitals under ethical approval (NRES reference 20/SC/0142). Fresh tissue samples were cut into 1–2 mm2 diameter cubes and plated at 20–30 per dish in petri dishes coated with 3 ug/ml collagen (PureCol® Bovine Collagen, Cell Systems). The biopsies were left to adhere to petri dishes for 10 mins before adding 4 ml of airway epithelial cell growth media (Merck) supplemented with Bovine Pituitary Extract (0.004 ml/ml), rhEpidermal Growth Factor (10 ng/ml), rhInsulin (5 μg/ml), Hydrocortisone (0.5 μg/ml), Epinephrine (0.5 μg/ml), Triiodo-L-thyronine (6.7 ng/ml), rhTransferrin (10 μg/ml) and Retinoic Acid (0.1 ng/ml). Tissues were maintained for 7–10 days until at least 50% of the biopsies had released basal cells before removal to reduce fibroblast contamination. Cells were grown to confluence in a humidified incubator set to 37°C with 5% CO2 and passaged using TrypLE (Life Technologies). All isolated basal cells were confirmed positive for KRT5 by immunofluorescence staining and mRNA analysis.

RNA extraction and mRNA gene expression

Basal cells at passage 2 were grown to confluence and RNA extraction was performed using a Maxwell® RSC SimplyRNA Cell Kit according to the manufacturers protocol using Maxwell® RSC Extraction System (Promega). 100 ng of basal cell RNA from each sample was reverse transcribed into complimentary DNA (cDNA) using SuperscriptIV (Thermo Fisher Scientific) according to the manufacturer’s recommendations. cDNA was subjected to quantitative RT-PCR analysis using Fast SYBR green (Thermo Fisher Scientific) and gene specific primers for PMF1 and B2M (housekeeping gene) using the QuantStudio 3. Primers sequence were as follows: PMF1-F:AGCTACCAGAGATTCACTGACT; PMF1-R:AGATTTGCTGTGTCATCGCAG; B2M-F: AATCCAAATGCGGCATCT; B2M-R: GAGTATGCCTGCCGTGTG.

Integration of publicly available scRNAseq Data

Three single-cell RNA sequencing (scRNA-seq) datasets were integrated to create a large representative IPF scRNA sequencing dataset (see supplementary material for further details). The integrated datasets were: GSE136831 (38), GSE135893 (39), and GSE128033 (40), obtained from NCBI’s public data repository. The integrated scRNA dataset included 52 IPF and 48 control samples. Non-IPF ILD samples were removed from two of the datasets (GSE136831 and GSE135893) during data processing and quality control. Cell-level filtering included removing cells with mitochondrial ratios <20% and retaining cells with >500 unique molecular identifiers per cell and >1,000 genes per cell. Differential expression of genes across different cell types at Human Lung Cell Atlas (HLCA) level 3 annotation was generated using R packages ggplot2 and Seurat. Statistical significance was calculated using the ‘t.test’ function in R (ggsignif package extension).

RESULTS

We analysed data from 5,159 IPF cases and 27,459 controls of European ancestry, and a total of 59 million and 8.7 million SNPs with high imputation quality for the dominant and recessive genetic models, respectively (Figure 2; Table S1; Figure S1). In the recessive model, the lower number of SNPs reflects the minimum number of minor allele homozygote genotypes required for analysis. We identified 57 independent significant signals (34 using the dominant model and 23 using the recessive model). Of these, five had not been previously reported, and were supported across multiple studies (Table 1; Figure 3; Table S2S3; Figure S2S3). These five signals were detected only when using a recessive genetic model. Of those, four met a more stringent threshold accounting for testing of two models (p<2.5×10−8), and two of those four also met a more stringent genome-wide threshold of p<5×10−9.

Figure 2. Mirror Manhattan plot of meta-analysis results.

Figure 2.

from the dominant (up) and recessive (down) genetic model. The y-axis shows the transformed p-values (–log10) while the x-axis represents chromosomal coordinates (GRCh38). The horizontal red line corresponds to the genome-wide threshold (p=5×10−8) and the blue line shows the suggestive significance threshold (p=5×10−5). The genomic inflation factor of the meta-analysis results from the dominant (λ=1.05) and recessive (λ=1.06) genetic models did not show major deviations from the null hypothesis of no association. The plot was truncated at p=10−60. Newly discovered annotated nearest genes are highlighted in red and bold. The most significant previously reported MUC5B signal (rs35705950) has a p<5×10−121 in both genetic models.

Table 1.

Five new signals of association with IPF detected when assuming a dominant or recessive genetic model.

Additive Genetic Model Dominant Genetic Model Recessive Genetic Model
CHR BP SNP Location Nearest gene A1/A2 A1Freq. OR (CI95%) P OR (CI95%) P OR (CI95%) P Significant model effect allele No. SNPs in credible set (largest PIP)
1 156232382 rs1052053 exonic PMF1 G/A 0.371 0.91 (0.87–0.95) 3.00×10−4 0.96 (0.90–1.02) 0.189 0.75 (0.68–0.83) 8.38×10 −9 (GG vs AA;AG) 22 (0.164)
9 38912426 rs74455724 intergenic CNTNAP3 A/G 0.190 1.05 (0.96–1.15) 0.259 1.00 (0.91–1.10) 0.962 2.08 (1.61–2.70) 2.40×10 −8 (AA vs AG;GG) 2 (0.857)
13 111161343 rs11838956 intronic ARHGEF7 T/C 0.241 1.11 (1.05–1.17) 2.97×10−4 1.06 (1.00–1.13) 0.077 1.47 (1.30–1.66) 4.40×10 −10 (TT vs TC;CC) 1 (0.997)
17 50541888 rs4794159 exonic EPN3 A/C 0.449 1.12 (1.07–1.18) 3.67×10−6 1.11 (1.04–1.19) 6.34×10−3 1.24 (1.15–1.33) 4.27×10 −8 (AA vs AC;CC) 2 (0.905)
20 5245010 rs6116709 intergenic CDS2 C/T 0.389 1.18 (1.10–1.27) 4.46×10−6 1.17 (1.05–1.29) 6.00×10−3 1.62 (1.41–1.86) 5.17×10 −12 (CC vs CT;TT) 1 (0.995)

A1: effect allele; A2: non-effect allele; Freq: Frequency; OR: Odds Ratio; PIP: posterior inclusion probability; Coordinates in GRCh38; CI: Confidence Interval.

Figure 3.

Figure 3.

Summary plot of the 5 new significant signals in the non-additive genetic models. The x-axis shows the region with nearby genes harbouring the obtained SNPs. The y-axis shows the transformed p-values (−log10). The horizontal dashed line corresponds to the genome-wide significance threshold (p=5×10−8). Each colour represents the transformed p-value obtained from the different genetic models: in blue the additive results (ADD), in red the results of the dominant genetic model (DM) and in green the recessive genetic model results (RM).

Bayesian fine-mapping was performed for each of the five loci to identify the most likely causal variants (Table S3). For the chromosome 1 (sentinel rs1052053), 22 variants formed the 95% credible set, two of which had a PIP>0.1 (rs1052053 in an exon of Polyamine-Modulated Factor 1 (PMF1), PIP=0.164, and rs35002119 in an intron of the same gene, PIP=0.150). For the other four signals, the 95% credible sets each contained a single SNP with PIP>0.85 (two credible sets contained only a single SNP with PIP=1.00). These were: an intergenic SNP between CNTNAP3 and FAM201A, an intergenic SNP between CDS2 and PROKR2, an intronic SNP in ARHGEF7, and an exonic SNP in EPN3. All sentinel variants were common (MAF>5%). None of the signals had nearby mouse ortholog knockout genes related to respiratory traits, nor did they overlap with any gene-based or rare-variant associations in the UKBB or with protein levels in blood.

PMF1 encodes polyamine modulated factor 1, which is involved in chromosome segregation (41). The sentinel exon variant (rs1052053-G) is a missense variant resulting in a glutamine to arginine substitution at position 75 (Gln75Arg), with a combined annotation-dependent depletion (CADD) score of 15.72 (implying it is amongst the 10% most harmful variants in the genome). Three variants, including the sentinel variant, in the 95% credible set (rs1052053, rs2241108, and rs2853641) overlap exons of PMF1 in the reference transcript (MANE; ENST00000359511), which is expressed in all tissues (Figure S4). This signal (chromosome 1 sentinel rs1052053) is also associated with the expression and splicing of multiple other genes across multiple tissues (Table S3). In addition, one variant with a very low contribution to the credible set (rs2072499, PIP=0.006) may affect splicing of SLC25A44 and is present in an exon of four alternative transcripts of SLC25A44: two expressed in all tissues, one expressed at low levels in 13 tissues, and one specific to pancreas and prostate (Figure S4).

The sentinel rs1052053-G had been previously associated at genome-wide significance with reduced cystatin C, serum urea, and creatinine levels (42) and increased risk of stroke (43). The PheWAS analysis using a recessive model highlighted strong associations primarily with renal biomarkers (Figure S5). Of note, rs1052053-G has a lower frequency (~30%) in individuals of African ancestry compared with other ancestries (35–40%; The 1000 Genomes Project (1KGP)). Furthermore, a rare third allele (C) has been detected at a frequency of 0.08% in European populations (source: The 1KGP) but this was not analysed in this study. Analysis by ORPHANET revealed that the LAMTOR2, LMNA and RIT1 genes, located within ±500 kb of rs1052053 were also implicated in rare Mendelian immune or respiratory diseases (Table S3). Analysis of publicly available scRNA-seq data revealed significantly increased PMF1 gene expression in fibroblasts and basal cells from IPF patients compared with controls (Figure S6), with decreased expression in Alveolar Type II (AT2) cells. We observed a potential increased PMF1 expression in basal cells from IPF patients using qPCR in primary cells isolated from fresh human lung tissue (Figure S7).

The exonic signal (rs4794159) in EPN3, encoding Epsin 3 which has a role in endocytosis, was annotated as ‘deleterious’. The risk allele (rs4794159-A) leads to a proline to threonine amino acid substitution at position 544 (Pro544Thr). This signal was also associated with the expression and splicing of several other genes across multiple tissues (Table S3). The second variant in the 95% credible set, rs7215760 (PIP=0.088), also is located in an exon in one alternative transcript that was expressed across different tissue types, including low-level expression in lung (Figure S4). At the single-cell level, EPN3 was expressed in basal cells of individuals with IPF but was not detected in controls (Figure S6). This signal (rs4794159) had not been associated with other traits at genome-wide significance, although it was nominally associated with sex hormone binding protein levels (p=5×10−6) (44). Rare variants in the nearby gene ITGA3 have previously been implicated in a multi-organ disorder interstitial lung disease-nephrotic syndrome-epidermolysis bullosa (ILNEB) syndrome (Table S3).

The third signal was an intronic variant to ARHGEF7 (Rho Guanine Nucleotide Exchange Factor 7). ARHGEF7 expression was increased in basal cells and fibroblasts from IPF patients compared with controls (Figure S6). The two new intergenic association signals were located nearest the CNTNAP3 gene on chromosome 9 and the CDS2 gene on chromosome 20. The signal on chromosome 9 was associated with expression of IGFBPL1 and TCEA1P3 in the brain (Table S3).

We also evaluated the effects of 35 signals previously reported for additive association with IPF susceptibility. All were most significant under an additive model, with the exception of the signals at the TRIM46, DSP, and FUT6 loci, which were more significant under a recessive model (padditive= 1.53×10−9 vs precessive= 9.78×10−12; padditive=4.81×10−53 vs precessive= 9.43×10−64; padditive= 1.05×10−9 vs precessive= 3.06×10−11; respectively) (Table S4; Figure S8).

DISCUSSION

We identified five new genetic signals of association with IPF susceptibility that would not have been identified when using an additive model in the same dataset and adds to accumulating evidence for altered mitotic spindle assembly in IPF, new evidence for endocytosis processes, and new evidence for cellular membrane trafficking. This highlights the value of considering alternative genetic models in GWAS for IPF.

The new exonic signal on chromosome 1 implicates the PMF1 gene both through the presence of a missense variant (Glutamine > Arginine) within the signal, and an association with PMF1 gene expression in lung tissue with increased expression observed in basal cells of IPF patients compared with controls. Interestingly, variants in linkage disequilibrium with rs1052053 have been implicated in alternative splicing of PMF1 in lung tissue and in COVID-19 severity (45). However, rs1052067, which is associated with COVID-19 severity (r2=0.630 with rs1052053), was not within our fine-mapped 95% credible set. PMF1 encodes Polyamine Modulated Factor 1, which forms part of the MIS12 complex, one of the multi-protein complexes that constitute the kinetochore that connects chromosomes to spindle microtubules to enable chromosome segregation during mitosis (46). Recent GWAS of IPF susceptibility have also implicated additional mitotic spindle assembly complex genes, including SPDL1 (47), KNL1 (9), KIF15 and MAD1L1 (46). KIF15, which encodes Kinesin Family Member 15 and promotes spindle elongation by sliding microtubules apart, has been shown to be down-regulated in replicating epithelial cells from IPF patients (48). This suggests that epithelial cells in the IPF lung experience impaired regenerative capacity, with reductions in both proliferation and appropriate differentiation (49), alongside an increase in the proportion of cells entering a senescent state (50)(51). Of note, a recent plasma genome-wide Mendelian randomization study identified methylation levels of PMF1 as a potential causal influence on IPF risk (52).

We also observed a new exonic signal on chromosome 17 implicating the EPN3 gene. This gene encodes an endocytic adaptor protein involved in clathrin-mediated endocytosis and cellular membrane trafficking (53). Studies have shown that endocytosis is disrupted in fibrosis (54), and inhibition of endocytosis alters collagen I homeostasis (55). Clathrin-mediated endocytic pathways can also regulate TGF-β receptor signalling (56). When this signalling is dysregulated, it promotes persistent fibroblast activation and differentiation into myofibroblasts, as well as excessive extracellular matrix deposition. Neither PMF1 nor EPN3 are currently under investigation as a therapeutic targets at any stage of clinical drug development.

Another new signal on chromosome 13 was an intronic variant to ARHGEF7, a guanine nucleotide exchange factor for Rho family GTPases. This is functionally similar to AKAP13 (A-kinase anchoring protein 13), another guanine nucleotide exchange factor previously associated with IPF (7). The nearest genes to the other new association signals were CNTNAP3 (Contactin Associated Protein Family Member 3), which encodes a cell recognition protein, and CDS2 (Cytidine Diphosphate Diacylglycerol Synthase 2), a key enzyme in phosphoinositide biosynthesis involved in cell signaling, migration, and angiogenesis. We were unable to confirm variant-to-gene mapping for these non-coding signals despite utilising multiple in silico approaches.

The power to detect non-additive effects in GWAS of complex traits is lower than that for additive effects (57). We maximised statistical power by including all seven available, clinically-curated IPF case-control datasets in our discovery stage. In the absence of a further dataset with adequate power for independent replication, we utilised a pragmatic approach to identify robust signals that included evaluation of per-study p-values compared to meta-analysis p-values to ensure that only signals that were supported across independent datasets, and not driven by individual studies, were reported. We also highlighted signals that met more conservative significance thresholds. However, independent replication across additional datasets remains the gold standard for assessing the robustness and generalisability of association signals. A lack of publicly available QTL data generated under recessive and dominant models limits our ability to further assess whether variant-gene expression, splicing, and protein level associations also align with a dominant or recessive model. To maintain consistency with our recessive-model GWAS, we conducted a recessive-model PheWAS, recognising that this model provides less statical power than an additive-model but may be more likely to detect a true recessive effect on gene expression. Our datasets comprised individuals of European genetic ancestry; therefore these association signals need to be assessed in other populations to understand their generalisability.

In conclusion, the use of alternative genetic models identified new genome-wide significant signals which implicate new genes in IPF development. These findings add to recent mechanistic insights from other GWAS highlighting the importance of the mitotic spindle assembly complex and impaired cell proliferation in the pathogenesis of IPF.

Supplementary Material

Supplement 1
media-1.docx (2.7MB, docx)
Supplement 2
media-2.xlsx (40.5KB, xlsx)

FUNDING

This work was partially supported by the National Institute for Health Research (NIHR) Leicester Biomedical Research Centre (NIHR203327). The views expressed are those of the author(s) and not necessarily those of the National Health Service, the NIHR, or the Department of Health and Social Care. This study was funded by a Medical Research Council Programme Grant (MR/V00235X/1) to RGJ and LVW by a Wellcome Trust PhD studentship for DC as part of the Wellcome Trust Genetic Epidemiology and Public Health Genomics Doctoral Training Programme (218505/Z/19/Z). LVW held a GlaxoSmithKline / Asthma + Lung UK Chair in Respiratory Research (C17-1). BGG is supported by Wellcome Trust grant 221680/Z/20/Z. JMO reports NIH/NHLBI grants R56HL158935 and K23HL138190, as well as institutional NIH grants R01HL169166 and R01HL166290; an institutional grant from Boehringer Ingelheim; personal consulting fees from Boehringer Ingelheim, Lupin Pharmaceuticals, Oorja Bio, Sunterra, and Mediar; travel support from Boehringer Ingelheim; service as DMC chair for GSK and DMC member for Novartis and Genentech; stock options in Gatehouse Bio; manuscript writing assistance from Boehringer Ingelheim; and leadership roles as Program Committee Chair of the American Thoracic Society and editorial board member for AJRCCM and Chest. CF is supported by the Instituto de Salud Carlos III (PI20/00876, PI23/00980, CB06/06/1088, and PMP22/00083), co-financed by the European Regional Development Funds “A way of making Europe” from the EU, and by an agreement with Instituto Tecnológico y de Energías Renovables (ITER) to strengthen scientific and technological education, training, research, development and innovation in genomics, epidemiological surveillance based on massive sequencing, personalized medicine, and biotechnology (OA23/043). CJR was supported by Wellcome grant 201291/Z/16/Z. JML-S was supported by Cabildo Insular de Tenerife and Consejería de Educación, Gobierno de Canarias (A0000014697). This research used the ALICE and SPECTRE High Performance Computing Facility at the University of Leicester. For the purpose of open access, the author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.

COMPETING INTERESTS

AA declares funding from NIH (K23HL146942); consulting fees from Genentech, Inogen, Medscape, Abbvie, PatientMpower and Boehringer Ingelheim; payment or honoraria for lectures, presentations, speakers bureaus, manuscript writing or educational events from Boehringer Ingelheim. ADS is a full-time employee of Genentech/Roche with stock and stock options in Roche. AFG was a full-time employee of PPD, Part of Thermo Fisher Scientific until June 2023. BGG declares fellowship funding from Wellcome Trust (221680/Z/20/Z). BLY is a full-time employee of Genentech/Roche with stock and stock options in Roche. CF declares funding Ministerio de Ciencia e Innovación, Instituto de Salud Carlos III and Instituto Tecnológico y de Energías Renovables; honoraria in educational events from Fundación Instituto Roche. DAS declares being the founder and chief scientific officer of Eleven P15, Inc., a company dedicated to the early diagnosis and treatment of pulmonary fibrosis; also reports support from the NHLBI (UH3HL151865, P01HL162607, R01HL158668, R01HL149836, and X01HL134585) and from the VAMC (IO1BX005295), as well as consulting fees from Vertex Pharmaceuticals. FM declares funding from NHLBI; on the steering committee for Afferent/Merck, Boehringer Ingelheim Ltd, Biogen, Bristol Myers Squibb, Chiesi, DevPro, GlaxoSmithKline, Nitto, Patara/Respivant, Roche and Vicore; consulting (not receiving fees) for AstraZeneca, Boehringer Ingelheim, Excalibur, GlaxoSmithKline, Hoffmann-LaRoche, Lung Therapeutics, RS Biotherapeutics and Two XR; payment or honoraria for lectures, presentations, speakers bureaus, manuscript writing or educational events from Boehringer Ingelheim; support from Boehringer Ingelheim and Chiesi; participation on a Data Safety Monitoring Board or Advisory Board for Endevearo biomedicine; receipt of medical writing support from Boehringer Ingelheim and Hoffmann-LaRoche. HLB reports institutional grants from Boehringer Ingelheim for a collaborative working project and for an educational regional network meeting; personal payments from Boehringer Ingelheim for consulting, lectures, and travel; personal payment from GRI BIO for consulting; and personal payment from SENISCA for advisory board participation; and holds voluntary roles as Trustee of Action for Pulmonary Fibrosis and Scientific Advisory Board member of EU-PFF. HP declares grant payment to institution from Boehringer Ingelheim Ltd; consulting fees from Boehringer Ingelheim Ltd, Roche Limited, Trevi Therapeutics, Pilant Therapeutics; speaker fees from Boehringer Ingelheim Ltd; member of TIPAL trial management group, trustee for Action for Pulmonary Fibrosis, member of scientific advisory board for European Pulmonary Fibrosis Federation. IN declares funding from National Institutes of Health (UG3HL145266) to institution and R01HL171918; grant funding to institution from Veracyte; consulting fees from Boehringer Ingelheim and Sanofi. IPH declares funding from Wellcome Trust and NIHR; vice chair Trustees for Asthma + Lung UK. IS reports research grants from the Wellcome Trust, Medical Research Council, UKRI, and Nottingham NIHR Biomedical Research Centre, all unrelated to this work. JMO declares funding from National Institutes of Health (R01HL169166 & K23HL138190); consulting fees from Boehringer Ingelheim, Lupin pharmaceuticals, AmMax Bio, Roche and Veracyte; patent for TOLLIP TT genotype for NAC use in IPF; participation on a Data Safety Monitoring Board or Advisory Board for Endeavor Biomedicines, Novartis and Genentech; Associate editor for CHEST, on Program Committee for American Thoracic Society and Editorial board for AJRCCM. LJD was a full-time employee at Genentech when performing the work in this manuscript. LVW declares funding from UK Research and Innovation (MR/V00235X/1) and GSK/Asthma + Lung UK (Professorship (C17-1)) to complete this work; funding from Orion Pharma, GSK, Genentech, AstraZeneca, Nordic Bioscience, Sysmex (OGT); Consulting fees Galapagos, Boehringer Ingelheim, GSK; support for attending meetings and/or travel Genentech; participation on Advisory Board for Galapagos; leadership or fiduciary roles as Associate Editor for European Respiratory Journal and Medical Research Council Board member and Deputy Chair. MES reports institutional research support from Boehringer Ingelheim Pharmaceuticals, Inc. and the NIH; personal consulting fees and travel support from Boehringer Ingelheim Pharmaceuticals, Inc.; medical writing support related to Boehringer Ingelheim Pharmaceuticals, Inc.; and service on an adjudication committee for FibroGen and a data safety monitoring board for Bristol Myers Squibb. MKBW declares funding from National Institutes of Health (K23HL146942); consulting fees from Genentech, Inogen, Medscape, Abbvie, PatientMpower and Boehringer Ingelheim; payment or honoraria for lectures, presentations, speakers bureaus, manuscript writing or educational events from Boehringer Ingelheim. MM reports being a full-time employee of Roche/Genentech and holding stock and stock options in Roche. MMM reports consulting fees and payments for lectures or presentations from Boehringer Ingelheim, Ferrer, Insmed, Veracyte, and Savara; travel support from Boehringer Ingelheim; and participation on Data Safety Monitoring Boards for Insmed, Boehringer Ingelheim, and Ferrer; and holds leadership roles on SAB IDISCAM and Scientific Direction boards at IDIBELL and CIBERES. MN is a full-time employee of Genentech/Roche with stock and stock options in Roche. MDT declares funding from Wellcome Trust (WT225221/Z/22/Z) and NIHR (NIHR201371 AND Leicester NIHR BRC), a research collaboration with Orion Pharma unrelated to this manuscript and a Respiratory Drug Delivery 2023 speaker fee, unrelated to the manuscript. RP declares funding from a research collaboration with Orion Pharma unrelated to this manuscript. MRH declares funding from The Wellcome Trust to University College London. NK declares grant funding from National Institutes of Health; grant funding to institution from BMS, Boehringer Ingelheim and Three Lakes Foundation; consultancy fees from Biogen Idec, Boehringer Ingelheim, Third Rock, Pliant, Samumed, NuMedii, Theravance, Three Lake Partners, Astra Zeneca, RohBar, Veracyte, Augmanity, CSL Behring, Thyron, Gilead, Galapagos, Chiesi, Arrowhead, Sofinnova, GSK and Merk; patent for new therapies for IPF (Biotech), new therapies for ARDS (Biotech) and new Biomarkers in IPF (Biotech); equity in Pilant and Thyron; reports serving as the scientific founder of Thyron. OCL reports receiving research funding from the Medical Research Council and serving as an associate editor for BMJ Open Respiratory Research, for which receive an honorarium. PLM declares grant funding to institution from AstraZeneca; consultancy fees from Hoffman-La Roche, Boehringer Ingelheim, AstraZeneca, Trevi and Qureight; speaker fees from Boehringer Ingelheim and Hoffman-La Roche; appears on editorial board for European Respiratory Journal Open Research. RGJ declares funding from UK Research and Innovation (MR/V00235X/1); that their institute received funding from Astra Zeneca, Biogen, Galecto, GlaxoSmithKline, Nordic Biosciences, RedX and Pliant; consulting fees from AstraZeneca, Brainomix, Bristol Myers Squibb, Chiesi, Cohbar, Daewoong, GlaxoSmithKline, Veracyte, Resolution Therapeutics and Pliant; payment for lectures and presentations received from Boehringer Ingelheim, Chiesi, Roche, PatientMPower, AstraZeneca; payment for expert testimony from Pinsent Masons LLP; participation on a Data Safety Monitoring Board or Advisory Board for Boehringer Ingelheim, Galapagos, Vicore; leadership or fiduciary role for NuMedii and president for Action for Pulmonary Fibrosis. SPH declares grant funding to institution from Boehringer Ingelheim; consulting fees from Trevi therapeutics; payment or honoraria for lectures, presentations, speakers bureaus, manuscript writing or educational events from Chiesi and Trevi therapeutics; support for attending meetings and/or travel from Chiesi; Participation on a Data Safety Monitoring Board or Advisory Board for Trevi therapeutics; Chair for BTS Standards of Care Committee (till November 2022), Trustee for Action for Pulmonary Fibrosis and on the editorial board for European Respiratory Journal Open Research. TEF declares grant fees from NIH paid to their institution; consulting fees from Eleven P15; patent for Methods for Predicting Risk of Interstitial Pneumonia. TMM declares consulting fees from Boehringer Ingelheim, Roche/Genentech, Astra Zeneca, Bayer, Blade Therapeutics, Bristol-Myers Squibb, CSL Behring, Galapagos, Galecto, GlaxoSmithKline, IQVIA, Pfizer, Pliant, Respivant, Sanofi, Theravance, Trevi, Veracyte and Vicore; participation on a Data Safety Monitoring Board or Advisory Board for Fibrogen, Blade Therapeutics and Nerre. VN reports consulting fees, honoraria for lectures or presentations, travel support, and participation on a Data Safety Monitoring Board from Boehringer Ingelheim. XRS is a full-time employee of Genentech/Roche with stock and stock options in Roche. All other authors have nothing to disclose.

Funding Statement

This work was partially supported by the National Institute for Health Research (NIHR) Leicester Biomedical Research Centre (NIHR203327). The views expressed are those of the author(s) and not necessarily those of the National Health Service, the NIHR, or the Department of Health and Social Care. This study was funded by a Medical Research Council Programme Grant (MR/V00235X/1) to RGJ and LVW by a Wellcome Trust PhD studentship for DC as part of the Wellcome Trust Genetic Epidemiology and Public Health Genomics Doctoral Training Programme (218505/Z/19/Z). LVW held a GlaxoSmithKline / Asthma + Lung UK Chair in Respiratory Research (C17-1). BGG is supported by Wellcome Trust grant 221680/Z/20/Z. JMO reports NIH/NHLBI grants R56HL158935 and K23HL138190, as well as institutional NIH grants R01HL169166 and R01HL166290; an institutional grant from Boehringer Ingelheim; personal consulting fees from Boehringer Ingelheim, Lupin Pharmaceuticals, Oorja Bio, Sunterra, and Mediar; travel support from Boehringer Ingelheim; service as DMC chair for GSK and DMC member for Novartis and Genentech; stock options in Gatehouse Bio; manuscript writing assistance from Boehringer Ingelheim; and leadership roles as Program Committee Chair of the American Thoracic Society and editorial board member for AJRCCM and Chest. CF is supported by the Instituto de Salud Carlos III (PI20/00876, PI23/00980, CB06/06/1088, and PMP22/00083), co-financed by the European Regional Development Funds “A way of making Europe” from the EU, and by an agreement with Instituto Tecnológico y de Energías Renovables (ITER) to strengthen scientific and technological education, training, research, development and innovation in genomics, epidemiological surveillance based on massive sequencing, personalized medicine, and biotechnology (OA23/043). CJR was supported by Wellcome grant 201291/Z/16/Z. JML-S was supported by Cabildo Insular de Tenerife and Consejería de Educación, Gobierno de Canarias (A0000014697). This research used the ALICE and SPECTRE High Performance Computing Facility at the University of Leicester. For the purpose of open access, the author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.

Footnotes

ETHICS

This research was conducted with appropriate ethics approval. The PROFILE study (which provided samples for the UK and UUS studies) had institutional ethics approval at the University of Nottingham (NCT01134822 - ethics reference 10/H0402/2) and Royal Brompton and Harefield NHS Foundation Trust (NCT01110694 - ethics reference 10/H0720/12). UK samples were recruited across multiple sites with individual ethics approval (University of Edinburgh Research Ethics Committee [The Edinburgh Lung Fibrosis Molecular Endotyping (ELFMEN) Study NCT04016181] 17/ES/0075, NRES Committee South West - Southmead, Yorkshire and Humber Research Ethics Committee 08/H1304/54 and Nottingham Research Ethics Committee 09/H0403/59). Spanish samples were recruited under ethics approval by ethics committee from the Hospital Universitario N.S. de Candelaria (reference of the approval: PI-19/12). The UUS study also included individuals from clinical trials with ethics approval (ACE [NCT00957242] and PANTHER [NCT00650091]). For the UCSF cohort, sample and data collection were approved by the University of California San Francisco Committee on Human Research and all patients provided written informed consent. For the Vanderbilt cohort, the Institutional Review Boards from Vanderbilt University approved the study and all participants provided written informed consent before enrolment. For individuals recruited at the University of Chicago, consenting patients with IPF who were prospectively enrolled in the institutional review board-approved ILD registry (IRB#14163A) were included. Individuals recruited at the University of Pittsburgh Medical Centre had ethics approval from the University of Pittsburgh Human Research Protection Office (referenceSTUDY20030223: Genetic Polymorphisms in IPF). Individuals from the COMET (NCT01071707) and Lung Tissue Research Consortium (NCT02988388) studies were also included in the Chicago study. All subjects in the Colorado study gave written informed consent as part of IRB-approved protocols for their recruitment at each site and the GWAS study was approved by the National Jewish Health IRB and Colorado Combined Institutional Review Boards (COMIRB). Subjects in the Genentech study provided written informed consent for whole-genome sequencing of their DNA. Ethical approval was provided as per the original clinical trials (INSPIRE [NCT00075998], RIFF [NCT01872689], CAPACITY [NCT00287729 and NCT00287716] and ASCEND [NCT01366209]). Individuals in the CleanUP-UCD study included individuals from clinical trials with ethics approval (NCT02759120). These samples were genotyped under University of Virginia ethics approval (IRB 20845). IPFJES involved human participants and was approved by East Midlands - Nottingham 1 Research Ethics Committee REC reference: 17/EM/0021IRAS project ID: 203355. Participants gave informed consent to participate in the study before taking part.

DATA AVAILABITILY

Summary statistics (i.e., effect size estimates, standard errors, p values and basic variant information) for all variants included in the genome-wide meta-analysis can be accessed via https://github.com/genomicsITER/PFgenetics.

REFERENCES

  • 1.Ley B, Collard HR, King TE Jr. Clinical course and prediction of survival in idiopathic pulmonary fibrosis. American journal of respiratory and critical care medicine. 2011;183(4):431–40. [DOI] [PubMed] [Google Scholar]
  • 2.Undurraga Á. Fibrosis pulmonar idiopática. Revista médica clínica las Condes. 2015;26(3):292–301. [Google Scholar]
  • 3.Janowiak P, Szymanowska-Narloch A, Siemińska A. IPF Respiratory Symptoms Management—Current Evidence. Frontiers in Medicine. 2022;9:917973. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Gupta RS, Koteci A, Morgan A, George PM, Quint JK. Incidence and prevalence of interstitial lung diseases worldwide: a systematic literature review. BMJ Open Respiratory Research. 2023;10(1):e001291. [Google Scholar]
  • 5.Mushiroda T, Wattanapokayakit S, Takahashi A, Nukiwa T, Kudoh S, Ogura T, et al. A genome-wide association study identifies an association of a common variant in TERT with susceptibility to idiopathic pulmonary fibrosis. J Med Genet. 2008;45(10):654–6. [DOI] [PubMed] [Google Scholar]
  • 6.Noth I, Zhang Y, Ma S, Flores C, Barber M, Huang Y, et al. Genetic variants associated with idiopathic pulmonary fibrosis susceptibility and mortality: a genome-wide association study. The Lancet respiratory medicine. 2013;1(4):309–17. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Allen RJ, Porte J, Braybrooke R, Flores C, Fingerlin TE, Oldham JM, et al. Genetic variants associated with susceptibility to idiopathic pulmonary fibrosis in people of European ancestry: a genome-wide association study. The Lancet respiratory medicine. 2017;5(11):869–80. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Allen RJ, Guillen-Guio B, Oldham JM, Ma S, Dressen A, Paynton ML, et al. Genome-wide association study of susceptibility to idiopathic pulmonary fibrosis. American journal of respiratory and critical care medicine. 2020;201(5):564–74. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Allen RJ, Stockwell A, Oldham JM, Guillen-Guio B, Schwartz DA, Maher TM, et al. Genome-wide association study across five cohorts identifies five novel loci associated with idiopathic pulmonary fibrosis. Thorax. 2022;77(8):829–33. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Chin D, Hernandez-Beeftink T, Donoghue L, Guillen-Guio B, Leavy OC, Adegunsoye A, et al. Genome-wide association study of Idiopathic Pulmonary Fibrosis susceptibility using clinically-curated European-ancestry datasets. medRxiv. 2025. [Google Scholar]
  • 11.Peljto AL, Blumhagen RZ, Walts AD, Cardwell J, Powers J, Corte TJ, et al. Idiopathic pulmonary fibrosis is associated with common genetic variants and limited rare variants. American journal of respiratory and critical care medicine. 2023;207(9):1194–202. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Su G, Christensen OF, Ostersen T, Henryon M, Lund MS. Estimating additive and non-additive genetic variances and predicting genetic merits using genome-wide dense single nucleotide polymorphism markers. 2012.
  • 13.Liu H, Zheng J, Yang D, Liu Z, Li Z, Hu Z, et al. Recessive/dominant model: Alternative choice in case-control-based genome-wide association studies. PLoS One. 2021;16(7):e0254947. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.O’Connor MJ, Schroeder P, Huerta-Chagoya A, Cortés-Sánchez P, Bonàs-Guarch S, Guindo-Martínez M, et al. Recessive genome-wide meta-analysis illuminates genetic architecture of type 2 diabetes. Diabetes. 2022;71(3):554–65. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Fingerlin TE, Murphy E, Zhang W, Peljto AL, Brown KK, Steele MP, et al. Genome-wide association study identifies multiple susceptibility loci for pulmonary fibrosis. Nat Genet. 2013;45(6):613–20. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Reynolds CJ, Sisodia R, Barber C, Moffatt M, Minelli C, De Matteis S, et al. What role for asbestos in idiopathic pulmonary fibrosis? Findings from the IPF job exposures case–control study. Occup Environ Med. 2023;80(2):97–103. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Donoghue LJ, Stockwell AD, Neighbors M, Sheng RX, Prabhakaran R, Wolters PJ, et al. Identification of a Genetic Susceptibility Locus for Idiopathic Pulmonary Fibrosis in the 16p Subtelomere Using Whole-Genome Sequencing. American Journal of Respiratory and Critical Care Medicine. 2023;207(7):941–4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Martinez FJ, Yow E, Flaherty KR, Snyder LD, Durheim MT, Wisniewski SR, et al. Effect of antimicrobial therapy on respiratory hospitalization or death in adults with idiopathic pulmonary fibrosis: the CleanUP-IPF randomized clinical trial. JAMA. 2021;325(18):1841–51. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Raghu G, Remy-Jardin M, Myers JL, Richeldi L, Ryerson CJ, Lederer DJ, et al. Diagnosis of idiopathic pulmonary fibrosis. An official ATS/ERS/JRS/ALAT clinical practice guideline. American journal of respiratory and critical care medicine. 2018;198(5):e44–68. [DOI] [PubMed] [Google Scholar]
  • 20.Das S, Forer L, Schönherr S, Sidore C, Locke AE, Kwong A, et al. Next-generation genotype imputation service and methods. Nat Genet. 2016;48(10):1284–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Taliun D, Harris DN, Kessler MD, Carlson J, Szpiech ZA, Torres R, et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature. 2021;590(7845):290–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4(1):s13742–8. [Google Scholar]
  • 23.Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26(17):2190–1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Bulik-Sullivan BK, Loh P, Finucane HK, Ripke S, Yang J, Schizophrenia Working Group of the Psychiatric Genomics Consortium, et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet. 2015;47(3):291–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Shrine N, Izquierdo AG, Chen J, Packer R, Hall RJ, Guyatt AL, et al. Multi-ancestry genome-wide association analyses improve resolution of genes and pathways influencing lung function and chronic obstructive pulmonary disease risk. Nat Genet. 2023;55(3):410–22. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Wakefield J. Bayes factors for genome-wide association studies: comparison with P-values. Genetic Epidemiology: The Official Publication of the International Genetic Epidemiology Society. 2009;33(1):79–86. [Google Scholar]
  • 27.Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 2012;22(9):1790–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Jaganathan K, Panagiotopoulou SK, McRae JF, Darbandi SF, Knowles D, Li YI, et al. Predicting splicing from primary sequence with deep learning. Cell. 2019;176(3):535,548. e24. [DOI] [PubMed] [Google Scholar]
  • 29.Vogan K. Improved pathogenicity prediction using primate genomics. Nat Genet. 2023;55(7):1082. [Google Scholar]
  • 30.Morales J, Pujar S, Loveland JE, Astashyn A, Bennett R, Berry A, et al. A joint NCBI and EMBL-EBI transcript set for clinical genomics and research. Nature. 2022;604(7905):310–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Weykopf G, Badonyi M, Friman ET, Nguyen JMH, Ioannou A, Livesey BJ, et al. Disease-associated genetic variants can cause mutations in tissue-specific protein isoforms. bioRxiv. 2025:2025.07. 24.666385. [Google Scholar]
  • 32.Shi Q, Li X, Liu Y, Chen Z, He X. FLIBase: a comprehensive repository of full-length isoforms across human cancers and tissues. Nucleic Acids Res. 2024;52(D1):D124–33. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Biddie SC, Weykopf G, Hird EF, Friman ET, Bickmore WA. DNA-binding factor footprints and enhancer RNAs identify functional non-coding genetic variants. Genome Biol. 2024;25(1):208. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Packer RJ, Williams AT, Hennah W, Eisenberg MT, Shrine N, Fawcett KA, et al. DeepPheWAS: an R package for phenotype generation and association analysis for phenome-wide association studies. Bioinformatics. 2023;39(4):btad073. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Mountjoy E, Schmidt EM, Carmona M, Schwartzentruber J, Peat G, Miranda A, et al. An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci. Nat Genet. 2021;53(11):1527–33. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, et al. The genotype-tissue expression (GTEx) project. Nat Genet. 2013;45(6):580–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Giambartolomei C, Vukcevic D, Schadt EE, Franke L, Hingorani AD, Wallace C, et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS genetics. 2014;10(5):e1004383. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Adams TS, Schupp JC, Poli S, Ayaub EA, Neumark N, Ahangari F, et al. Single-cell RNA-seq reveals ectopic and aberrant lung-resident cell populations in idiopathic pulmonary fibrosis. Science advances. 2020;6(28):eaba1983. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Habermann AC, Gutierrez AJ, Bui LT, Yahn SL, Winters NI, Calvi CL, et al. Single-cell RNA sequencing reveals profibrotic roles of distinct epithelial and mesenchymal lineages in pulmonary fibrosis. Science advances. 2020;6(28):eaba1972. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Morse C, Tabib T, Sembrat J, Buschur KL, Bittar HT, Valenzi E, et al. Proliferating SPP1/MERTK-expressing macrophages in idiopathic pulmonary fibrosis. European Respiratory Journal. 2019;54(2). [Google Scholar]
  • 41.Kline SL, Cheeseman IM, Hori T, Fukagawa T, Desai A. The human Mis12 complex is required for kinetochore assembly and proper chromosome segregation. J Cell Biol. 2006;173(1):9–17. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Barton AR, Sherman MA, Mukamel RE, Loh P. Whole-exome imputation within UK Biobank powers rare coding variant association and fine-mapping analyses. Nat Genet. 2021;53(8):1260–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Malik R, Chauhan G, Traylor M, Sargurupremraj M, Okada Y, Mishra A, et al. Multiancestry genome-wide association study of 520,000 subjects identifies 32 loci associated with stroke and stroke subtypes. Nat Genet. 2018;50(4):524–37. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Ruth KS, Day FR, Tyrrell J, Thompson DJ, Wood AR, Mahajan A, et al. Using human genetics to understand the disease impacts of testosterone in men and women. Nat Med. 2020;26(2):252–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Nakanishi T, Willett J, Farjoun Y, Allen RJ, Guillen-Guio B, Adra D, et al. Alternative splicing in lung influences COVID-19 severity and respiratory diseases. Nature Communications. 2023;14(1):6198. [Google Scholar]
  • 46.Petrovic A, Keller J, Liu Y, Overlack K, John J, Dimitrova YN, et al. Structure of the MIS12 complex and molecular basis of its interaction with CENP-C at human kinetochores. Cell. 2016;167(4):1028,1040. e15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Dhindsa RS, Mattsson J, Nag A, Wang Q, Wain LV, Allen R, et al. Identification of a missense variant in SPDL1 associated with idiopathic pulmonary fibrosis. Communications biology. 2021;4(1):392. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Zhang D, Povysil G, Kobeissy PH, Li Q, Wang B, Amelotte M, et al. Rare and common variants in KIF15 contribute to genetic risk of idiopathic pulmonary fibrosis. American journal of respiratory and critical care medicine. 2022;206(1):56–69. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Hewitt RJ, Pearmain L, Lyka E, Dickens J. Epithelial damage and ageing: the perfect storm. Thorax. 2025. [Google Scholar]
  • 50.Schafer MJ, White TA, Iijima K, Haak AJ, Ligresti G, Atkinson EJ, et al. Cellular senescence mediates fibrotic pulmonary disease. Nat Commun. 2017; 8: 14532. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.McCarthy CG, Wenceslau CF, Webb RC, Joe B. Novel contributors and mechanisms of cellular senescence in hypertension-associated premature vascular aging. American journal of hypertension. 2019;32(8):709–19. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Zhang K, Shi P, Li A, Zhou J, Chen M. Plasma genome-wide mendelian randomization identifies potentially causal genes in idiopathic pulmonary fibrosis. Respiratory Research. 2024;25(1):379. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Li Y, Zhang P, Tang G, Zhong J, Wang Z, Zhu B. Lowering expression of Epsin-3 inhibits migration and invasion of lung adenocarcinoma cells by inhibiting the epithelial–mesenchymal transition. Scientific Reports. 2024;14(1):17069. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Chang J, Pickard A, Herrera JA, O’Keefe S, Garva R, Hartshorn M, et al. Endocytic recycling is central to circadian collagen fibrillogenesis and disrupted in fibrosis. Elife. 2025;13:RP95842. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Di Guglielmo GM, Le Roy C, Goodfellow AF, Wrana JL. Distinct endocytic pathways regulate TGF-β receptor signalling and turnover. Nat Cell Biol. 2003;5(5):410–21. [DOI] [PubMed] [Google Scholar]
  • 56.Wang J, Lai X, Yao S, Chen H, Cai J, Luo Y, et al. Nestin promotes pulmonary fibrosis via facilitating recycling of TGF-β receptor I. European Respiratory Journal. 2022;59(5). [Google Scholar]
  • 57.Palmer DS, Zhou W, Abbott L, Wigdor EM, Baya N, Churchhouse C, et al. Analysis of genetic dominance in the UK Biobank. Science. 2023;379(6639):1341–8. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplement 1
media-1.docx (2.7MB, docx)
Supplement 2
media-2.xlsx (40.5KB, xlsx)

Data Availability Statement

Summary statistics (i.e., effect size estimates, standard errors, p values and basic variant information) for all variants included in the genome-wide meta-analysis can be accessed via https://github.com/genomicsITER/PFgenetics.


Articles from medRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints

RESOURCES