letter
ERJ Open Research. 2025 Sep 22;11(5):00305-2025. doi: 10.1183/23120541.00305-2025

Fine-scale regional frequency of the MUC5B promoter variant correlates with healthcare burden of idiopathic pulmonary fibrosis

Edmund Gilbert 1,2, Aoife Carolan 3,4, Mari Ozaki 3,4, Niamh Logan 5, Khaled Musameh 5, Helen O'Brien 6, Wan Lin Ng 3,7, Jisha Jasmin 3,8, Cormac McCarthy 6, Michael P Keane 6, David A Schwartz 9, Michael T Henry 5, Killian Hurley 3,4
PMCID: PMC12451587  PMID: 40989789

Shareable abstract

Mapping the fine-scale geographic distribution of the MUC5B promoter variant across the island of Ireland demonstrated a strong correlation with IPF hospital discharge. This type of risk variant mapping could support more precise healthcare planning. https://bit.ly/43tZYI7


To the Editor:

The gain-of-function MUC5B promoter variant n.661G>T (rs35705950) is a well-established common genetic risk factor for idiopathic pulmonary fibrosis (IPF) [1] and is present in over 50% of European-like ancestry patients with IPF. It also confers risk for other types of interstitial lung diseases (ILDs) with usual interstitial pneumonia pattern, including chronic hypersensitivity pneumonitis [2], asbestosis [3] and rheumatoid arthritis-associated ILD [4]. The MUC5B promoter variant (rs35705950) is thought to be involved in the pathogenesis of IPF but its role is only partially understood [5, 6]. In addition, the promoter variant is relatively frequent among individuals of European-like ancestry (minor allele frequency (MAF) 9–11%) and to a lesser degree South Asian-like ancestry (∼8%), but is rare or nearly absent in those with East Asian- and African-like ancestry (MAF∼1%) [7].

Our understanding of the fine-scale geographic distribution of the MUC5B promoter variant is limited, and its relationship with the healthcare burden associated with ILD has not been studied. Using a unique dataset of 2465 individuals with Irish ancestry [8, 9], we examined the geographic distribution of the MUC5B promoter variant allele across the island of Ireland. We estimated the population-weighted IPF discharge proportion per county across Ireland and demonstrated a strong correlation between the frequency of the MUC5B promoter variant and hospital discharge with a diagnosis of IPF.

To quantify regional differences in the frequency of the MUC5B promoter variant, we leveraged an expanded set of “Irish Regional Reference” genotypes combining data from the Irish DNA Atlas [8] and Trinity Student cohort [9]. The Irish DNA Atlas consists of individuals with genealogical regional ancestry [8], allowing us to map them by their averaged great-grandparental birthplace. With these genealogies and genotypes, and an updated network-based clustering methodology (see additional detail on methods provided at: https://github.com/FutureNeuroIE/IPF_Ireland), we detected several fine-scale clusters across the island of Ireland, organised over two levels of detail (see online methods for a map of these clusters). The first level assigned individuals into one of four groups that correspond well to the four provinces of Ireland. The second divided each of these four groups into fine-scale clusters, whose geographic distributions are consistent with previous observations [8], allowing us to be confident they pick up meaningful, albeit subtle, genetic distinctions on the island. Using this fine-scale geographically-placed ancestry data for 2465 individuals, we sought to determine the frequency distribution of the MUC5B promoter variant across clusters on the island of Ireland (figure 1a). Overall, these data demonstrate a MAF of 11.1% for the MUC5B promoter variant in Ireland, but with significant geographical heterogeneity. For example, we found an increased MAF in clusters S Connacht 1, 2, and 4 in the west of Ireland, in the Clare and Limerick clusters in southern Ireland, and in the N Leinster–SE Ulster cluster of the northern Ireland region. These data also show a decreased MAF in the southernmost cluster of eastern Ireland (Wexford) and in the western tip of the western Ireland region.

To better visualise the MAF across Ireland we employed Kriging surface interpolation, which estimates values at unsampled grid points (i.e. regions with no Irish Regional References); the result is shown in figure 1b. These data demonstrate the MAF across the island of Ireland and reveal three main hot spots of increased MUC5B promoter variant MAF: the first in the northern aspect of Leinster (around county Meath), the second in the north-western aspect of Munster (around county Clare) and the third in Connacht (on the border between counties Galway and Mayo; figure 1b). Furthermore, the data demonstrate a chain of population clusters of increased frequency of the MUC5B promoter variant across the southern border of Ulster and arching around the eastern border of Connacht down into county Clare. Thus, our results show a fine-scale heterogeneous distribution of the MUC5B promoter variant across Ireland.
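The per-cluster frequencies and the cluster-size-weighted island mean described above can be illustrated with a minimal Python sketch. This is not the authors' code (their analysis is in the linked repository); the function names, the dosage encoding (0, 1 or 2 copies of the minor allele per individual) and the toy data are assumptions made purely for illustration.

```python
from collections import defaultdict

def cluster_maf(dosages_by_person, cluster_by_person):
    """Return {cluster: (maf, n_individuals)} from minor-allele dosages (0, 1 or 2)."""
    allele_sum = defaultdict(float)
    counts = defaultdict(int)
    for person, dosage in dosages_by_person.items():
        cluster = cluster_by_person[person]
        allele_sum[cluster] += dosage
        counts[cluster] += 1
    # Each diploid individual carries two alleles, hence the factor of 2.
    return {c: (allele_sum[c] / (2 * counts[c]), counts[c]) for c in counts}

def weighted_mean_maf(maf_by_cluster):
    """Mean MAF across clusters, weighted by the number of individuals per cluster."""
    total_n = sum(n for _, n in maf_by_cluster.values())
    return sum(maf * n for maf, n in maf_by_cluster.values()) / total_n

# Hypothetical toy data: six individuals in two made-up clusters.
dosages = {"p1": 0, "p2": 1, "p3": 0, "p4": 2, "p5": 0, "p6": 0}
clusters = {"p1": "A", "p2": "A", "p3": "A", "p4": "B", "p5": "B", "p6": "B"}

mafs = cluster_maf(dosages, clusters)
island_mean = weighted_mean_maf(mafs)
print(mafs, island_mean)
```

Weighting by cluster size makes the island mean equal to the pooled allele frequency across all individuals, so large clusters are not under-represented relative to small ones.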

FIGURE 1.

Fine-scale distribution and frequency of the MUC5B risk allele across the island of Ireland and its relationship with idiopathic pulmonary fibrosis (IPF) hospital discharge rates. a) The imputed minor allele frequencies (MAF) of the MUC5B promoter variant in each of the Irish fine-scale clusters, separated by the four first-level groups, which correspond to the Irish provinces. Each point is given a colour and shape denoting its cluster. Solid error bars denote standard errors and dashed error bars denote standard deviations, each estimated from 100 bootstrap replicates. A single vertical solid grey line shows the island mean frequency weighted by cluster size, 11.1%. b) Kriging surface interpolation of the MUC5B promoter variant MAF across the island of Ireland. Each participant (black point) is assigned the MAF calculated for that participant's fine-scale cluster (see a), and these points were used in the surface interpolation. c) A per-county map of population-size-weighted IPF (International Classification of Diseases code J84.1) discharge rates across Ireland by MUC5B promoter variant MAF, colouring each county by its average rate. d) The relationship between average county MUC5B promoter variant (rs35705950) frequency and weighted IPF discharge rate. The value for each county is shown as a labelled point, colour coded by Irish health regions (South-West, South-East, Mid-West, Mid-East, West, Dublin, Border, Midland). A linear relationship was estimated between the two (see trendline) and a 95% confidence interval for this trend was also estimated (shaded area). All plots were generated using the statistical computing language R and the package ggplot2.
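The bootstrap error bars mentioned in the caption can be sketched as follows. This is a generic nonparametric bootstrap in Python, not the authors' implementation: resampling individuals with replacement within a cluster and taking the standard deviation of the replicate frequencies as the standard error are assumptions, as are the function name and the toy data.

```python
import random
import statistics

def bootstrap_maf_se(dosages, n_replicates=100, seed=0):
    """Bootstrap standard error of the minor allele frequency for one cluster.

    dosages: list of minor-allele counts (0, 1 or 2), one per individual.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n = len(dosages)
    replicate_mafs = []
    for _ in range(n_replicates):
        # Resample individuals with replacement, keeping the cluster size fixed.
        resample = [dosages[rng.randrange(n)] for _ in range(n)]
        replicate_mafs.append(sum(resample) / (2 * n))
    # The spread of the replicate estimates approximates the standard error.
    return statistics.stdev(replicate_mafs)

# Hypothetical cluster of 200 individuals with a MAF of 44/400 = 0.11.
toy_dosages = [0] * 158 + [1] * 40 + [2] * 2
se = bootstrap_maf_se(toy_dosages, n_replicates=100, seed=1)
print(round(se, 4))
```

With only 100 replicates the estimate is itself noisy, which is one reason small clusters (such as those in the west of Ireland) carry wide error bars.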

Next, to examine the relationship between a community's genetic risk for, and phenotypic presentation of, IPF, we tested the county-wide correlation between the geographic distribution of the MUC5B MAF and hospital attendance for IPF in Ireland. Using Irish hospital discharge rates for a diagnosis of IPF (J84.1, World Health Organization International Classification of Diseases (ICD) 10) for a 5-year period recorded in the Hospital Inpatient Enquiry system, and the Irish county population sizes from the Irish census, we estimated a population-size-weighted discharge rate for each county in the Republic of Ireland and compared this to an average MUC5B MAF for each Irish county (see online methods https://github.com/FutureNeuroIE/IPF_Ireland). We found that hospital discharge for IPF weighted by population size was elevated in county Galway, other western counties, and counties in southern Ulster and the northeast of Ireland. Graphically, there is a distinct geographical profile of weighted discharge from hospital (figure 1c), which appears to mirror the MUC5B MAF in figure 1a and b. This locally increased MAF is significantly associated with a higher rate of hospital discharge with a diagnosis of IPF in Ireland, with a strong correlation found when plotting weighted discharge rate versus the MUC5B promoter variant frequency as shown in figure 1d (Spearman's rho of 0.67, p-value=0.0001887). In addition, we performed a linear regression between weighted discharge and the MUC5B promoter variant frequency, controlling for county average age, sex, smoking status and county average measured particulate matter with aerodynamic diameter <2.5 µm (PM2.5) and <10 µm (PM10) air pollution exposure, using stepwise model Akaike information criterion selection to find the best model. In this model, promoter variant frequency was the most substantial predictor included, and significant (p-value=0.004, beta=14.0 CI 6.0–22.0). PM2.5 was significant with small effect size (p-value=0.04, beta=−0.03 CI −0.06–0.002), and M/F sex ratio was included but not significant (p-value=0.22, beta=−5.2 CI −14.4–4.0).
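The county-level comparison can be illustrated with a short Python sketch that converts discharge counts into a rate per 100 000 population and computes Spearman's rank correlation against county MAF. The exact weighting used by the authors is described in their online methods and may differ; the per-100 000 scaling, the function names and the five made-up counties below are all assumptions for illustration only.

```python
def rate_per_100k(discharges, population):
    """Discharges scaled to a common population size (assumed weighting)."""
    return 100_000 * discharges / population

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical counties; none of these numbers come from the study.
counties = {
    "County A": {"maf": 0.095, "discharges": 40, "population": 150_000},
    "County B": {"maf": 0.105, "discharges": 90, "population": 250_000},
    "County C": {"maf": 0.112, "discharges": 30, "population": 80_000},
    "County D": {"maf": 0.120, "discharges": 35, "population": 100_000},
    "County E": {"maf": 0.131, "discharges": 70, "population": 140_000},
}

maf_values = [c["maf"] for c in counties.values()]
rates = [rate_per_100k(c["discharges"], c["population"]) for c in counties.values()]
rho = spearman_rho(maf_values, rates)
print(round(rho, 3))  # approximately 0.7 for this toy data
```

A rank-based statistic is a natural choice here because it does not assume a linear relationship and is less sensitive to a single county with an extreme rate.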

In IPF, there is a complex relationship between genetic factors, environmental exposures and other patient risk factors. Data on the family history of discharged patients were not available, but Irish registry data suggest that familial pulmonary fibrosis affects 19% of patients with IPF [10]. Our analysis did take into account recognised environmental modifiers of risk for IPF, such as smoking status and per-county average air pollution exposures; however, air pollution can vary greatly across urban and rural settings, which may have limited our analysis. We are also limited by smaller sample sizes of genetic profiles in the west of Ireland (i.e. Galway), which reduces confidence in the frequency point estimates. However, even a weighted average frequency across provinces shows a higher frequency in the west compared to the east (12.4% versus 10.3%). Healthcare records have the potential to misclassify patients [11] and ICD-10 codes, J84.1 specifically, may fail to replicate known association effect sizes in large patient databases such as the UK Biobank [12]. Therefore, these potential inaccuracies in classifying patients may limit the generalisability of correlations with MUC5B MAF identified in this study. Qualified by these limitations, we conclude that there is a strong correlation between estimated high geographic frequency of the MUC5B risk allele and healthcare burden of IPF as represented by hospital discharges coded for IPF, even when taking into account non-genetic risk factors. Estimating the prevalence of a rare disease in the community is often challenging due to the resources involved in case finding, collecting data and the fragmented nature of many healthcare systems. But more countries and regions now have population-level genomic programmes such as UK Biobank and All of Us, which enable accurate assessment of risk-allele frequencies at the level of fine-scale geography. Therefore, where incomplete historical disease prevalence data exist, fine-scale risk-allele frequency may help to identify geographic locations where specific disease prevalence and hence healthcare burden are likely to be high, and thus assist in long-term planning for specialised hospital and community services.

Acknowledgements

We would like to thank Prof. Gianpiero L. Cavalleri for the use of the Irish DNA Atlas dataset and his advice on the manuscript. We would like to thank Dr Lawrence Brody and the Trinity student cohort for the use of the dataset. Hospital discharge data was extracted from the Hospital Inpatient Enquiry system supplied by The Healthcare Pricing Office, Ireland. PM2.5 and PM10 datasets were accessed from the EPA Ireland Archive of Air Quality monitoring data for Ireland.

Footnotes

Provenance: Submitted article, peer reviewed.

Ethics statement: This study was approved by the Beaumont Hospital Research Ethics Committee.

Conflict of interest: E. Gilbert reports that Science Foundation Ireland funded the original genotyping of the Irish DNA Atlas and Sequence Bioinformatics provided funds in-kind for a previous project unrelated to this manuscript. This funding was used to generate additional genotype data from the Irish DNA Atlas. This funding was separate to the project reported in this article. They report being a voluntary member of the Irish Centre for High-End Computing (ICHEC) Users Council that was set up in September 2024. This role, and the council's activity, is unrelated to the current manuscript, which did not utilise computing resources at ICHEC. They also report being a member of the Irish Society of Human Genetics board, which is a voluntary position. A. Carolan reports being a scholar on the Strategic Academic Recruitment Doctor of Medicine programme with Royal College of Surgeons in Ireland Dublin and the Bon Secours Hospital Dublin. They also report an Irish Thoracic Society–GSK educational travel bursary of EUR 1000 to travel to the European Respiratory Society Congress. M. Ozaki, N. Logan, K. Musameh, H. O'Brien, W.L. Ng, J. Jasmin, M.P. Keane and M.T. Henry have nothing to disclose. C. McCarthy reports grants or contracts from the Health Research Board, Ireland; Enterprise Ireland; The LAM Foundation; Savara Inc.; and the UCD Foundation. They report consulting fees from Savara Inc., AI Therapeutics Inc. and Boehringer Ingelheim. They report payment or honoraria for lectures, presentations, speakers’ bureaus, manuscript writing or educational events from Savara Inc. and Boehringer Ingelheim. They also report participation on a data safety monitoring or advisory board from Kinevant Inc. and Savara Inc. D.A. Schwartz reports National Heart, Lung, and Blood Institute grants UH3HL151865, P01HL162607, R01HL158668, R01HL149836, X01HL134585, and Veterans Affairs Medical Center grant IO1BX005295; consulting fees from Vertex Pharmaceuticals; and is founder and chief scientific officer of Eleven P15 Inc., a company focused on the early diagnosis and treatment of pulmonary fibrosis. K. Hurley reports support for the present study from Health Research Board, Ireland, Emerging Clinical Scientist Award (ECSA-2020-011) and the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (101078300 - STAR-TEL); grants or contracts from Moderna (payments to institution); and payment or honoraria for lectures, presentations, speakers’ bureaus, manuscript writing or educational events from Boehringer Ingelheim and patientMpower.

Support statement: This study was supported by a Health Research Board, Ireland, Emerging Clinical Scientist Award (ECSA-2020-011) and the European Research Council under the European Union's Horizon 2020 research and innovation programme (101078300 – STAR-TEL) to K. Hurley; and an Irish Research Council Starting Laureate Award (IRCLA/2022/1572) to M. Ozaki. E. Gilbert was supported by funding from Science Foundation Ireland (SFI) (grant number 16/RC/3948) and co-funded under the European Regional Development Fund and by FutureNeuro industry partners; and The National University of Ireland Post-Doctoral Fellowship in the Sciences and Engineering 2021–2023. Genotypes from the Irish DNA Atlas were part-funded by a Career Development Award (13/CDA/2223) from SFI and through industry funding from Sequence Bioinformatics Inc. (Canada). The funders had no role in study design, data interpretation or writing of this letter. Funding information for this article has been deposited with the Crossref Funder Registry.

Data availability

Datasets generated and/or analysed during the study are available indefinitely from: https://github.com/FutureNeuroIE/IPF_Ireland. All code for statistical modelling and analysis is available from the same web page. The primary data from the Trinity student cohort are available from Dr Lawrence Brody at Trinity College Dublin on reasonable request. Hospital Inpatient Enquiry system data were supplied by The Healthcare Pricing Office, Ireland and are available from them on request. Data are available to anyone who wishes to access them for any purpose immediately following publication. PM2.5 and PM10 datasets were accessed from the EPA Ireland Archive of Air Quality monitoring data for Ireland.

References

1. Seibold MA, Wise AL, Speer MC, et al. A common MUC5B promoter polymorphism and pulmonary fibrosis. N Engl J Med 2011; 364: 1503–1512. doi: 10.1056/NEJMoa1013660
2. Furusawa H, Peljto AL, Walts AD, et al. Common idiopathic pulmonary fibrosis risk variants are associated with hypersensitivity pneumonitis. Thorax 2022; 77: 508–510. doi: 10.1136/thoraxjnl-2021-217693
3. Platenburg M, Wiertz IA, van der Vis JJ, et al. The MUC5B promoter risk allele for idiopathic pulmonary fibrosis predisposes to asbestosis. Eur Respir J 2020; 55: 1902361. doi: 10.1183/13993003.02361-2019
4. Juge PA, Lee JS, Ebstein E, et al. MUC5B promoter variant and rheumatoid arthritis with interstitial lung disease. N Engl J Med 2018; 379: 2209–2219. doi: 10.1056/NEJMoa1801562
5. Hancock LA, Hennessy CE, Solomon GM, et al. Muc5b overexpression causes mucociliary dysfunction and enhances lung fibrosis in mice. Nat Commun 2018; 9: 5363. doi: 10.1038/s41467-018-07768-9
6. Stancil IT, Michalski JE, Davis-Hall D, et al. Pulmonary fibrosis distal airway epithelia are dynamically and structurally dysfunctional. Nat Commun 2021; 12: 4566. doi: 10.1038/s41467-021-24853-8
7. 1000 Genomes Project Consortium, Auton A, Brooks LD, et al. A global reference for human genetic variation. Nature 2015; 526: 68–74. doi: 10.1038/nature15393
8. Gilbert E, O'Reilly S, Merrigan M, et al. The Irish DNA atlas: revealing fine-scale population structure and history within Ireland. Sci Rep 2017; 7: 17199. doi: 10.1038/s41598-017-17124-4
9. Desch KC, Ozel AB, Siemieniak D, et al. Linkage analysis identifies a locus for plasma von Willebrand factor undetected by genome-wide association. Proc Natl Acad Sci USA 2013; 110: 588–593. doi: 10.1073/pnas.1219885110
10. Irish Thoracic Society. Irish Thoracic Society ILD Registry Annual Report 2018. Dublin, Irish Thoracic Society, 2018. https://irishthoracicsociety.com/wp-content/uploads/2018/11/ITS-ILD-Registry-Annual-Report-2018.pdf
11. Beesley LJ, Mukherjee B. Statistical inference for association studies using electronic health records: handling both selection bias and outcome misclassification. Biometrics 2022; 78: 214–226. doi: 10.1111/biom.13400
12. Leavy OC, Allen RJ, Kraven LM, et al. The use of genetic information to define idiopathic pulmonary fibrosis in UK biobank. Chest 2023; 163: 362–365. doi: 10.1016/j.chest.2022.07.027

Articles from ERJ Open Research are provided here courtesy of European Respiratory Society
