Summary
Individuals with cystic fibrosis (CF) develop complications of the gastrointestinal tract that are influenced by genetic variants outside of CFTR. Cystic fibrosis-related diabetes (CFRD) is a distinct form of diabetes with a variable age of onset that occurs frequently in individuals with CF, while meconium ileus (MI) is a severe neonatal intestinal obstruction affecting ∼20% of newborns with CF. CFRD and MI are weakly correlated traits, and previous studies have found evidence of overlap in their genetic architectures. To better understand the genetic commonality between CFRD and MI, we performed genome-wide association studies using whole-genome-sequencing data from the CF Genome Project. These analyses revealed variants at 11 loci (6 not previously identified) associated with MI and at 12 loci (5 not previously identified) associated with CFRD. Of these, variants at SLC26A9, CEBPB, and PRSS1 were associated with both traits: variants at SLC26A9 and CEBPB increased risk for both traits, whereas at PRSS1 the alleles conferring higher risk for CFRD conferred lower risk for MI. Furthermore, common and rare variants within the SLC26A9 locus were associated with MI only or with CFRD only. As expected, different loci modify risk of CFRD and MI; however, a subset exhibit pleiotropic effects, indicating etiologic and mechanistic overlap between these two otherwise distinct complications of CF.
Keywords: genetic modifier, pleiotropy, cystic fibrosis, intestinal obstruction, diabetes, CFRD, meconium ileus, cystic fibrosis-related diabetes
Genetic modifiers play a significant role in two independent complications of cystic fibrosis (CF): neonatal intestinal obstruction and diabetes. Whole-genome sequencing followed by common and rare variant association identified pleiotropic loci displaying concordant and/or discordant modification of each trait, revealing unexpected mechanistic overlap between distinct complications of CF.
Introduction
Cystic fibrosis (CF [MIM: 219700]) is an autosomal-recessive disorder affecting more than 70,000 individuals worldwide, caused by loss-of-function genetic variants in the CF transmembrane conductance regulator gene (CFTR). Diabetes and intestinal obstruction at birth are frequent complications of CF.
CF-related diabetes (CFRD) has some characteristics in common with type 1 diabetes (T1D [MIM: 222100]) and type 2 diabetes (T2D [MIM: 125853]) but is distinct from both. CFRD generally involves a slow decline in β-cell function and islet amyloid deposition,1,2 as is the case for T2D. However, unlike T2D, individuals with CFRD generally have normal insulin sensitivity (except during pulmonary disease exacerbations or glucocorticoid treatment).3 The prevalence of CFRD increases with age, affecting 19% of adolescents, 40%–50% of adults,4 and, in those with severe CFTR dysfunction, more than 90% by age 50.5 CFRD is associated with severe long-term complications including worse lung function trajectory despite ivacaftor use6 and reduced survival.4
There is overlap between the genetic architectures of T2D and CFRD. Variants at several loci (e.g., TCF7L2, CDKAL1, CDKN2A/B, and IGF2BP2) are associated with both CFRD and T2D.7,8,9 Furthermore, CFRD was strongly associated with polygenic risk scores (PRSs) for T2D, insulin secretion, postchallenge glucose concentration, and fasting plasma glucose, and less strongly with a T1D PRS. CFRD was inconsistently associated with PRSs for insulin sensitivity and was not associated with a PRS for islet autoimmunity.9
Meconium ileus (MI [MIM: 614665]) is an intestinal obstruction in the terminal ileum resulting in failure to pass the meconium at birth.10 It occurs in ∼20% of CF newborns with equal frequency in males and females11 and is fatal without enema or surgical intervention.10
CFRD and MI are both correlated with CFTR function (individuals with CF with milder CFTR dysfunction and exocrine pancreatic sufficiency have low rates of MI and CFRD), but the etiologic and mechanistic relationships between CFRD and MI are unclear. After controlling for CFTR function, MI and CFRD have been reported to be correlated in some but not all cohorts.8,12 In analyses restricted to individuals with severe CFTR dysfunction, previous genome-wide and candidate-based association studies have identified variants at several loci associated with either trait, including three loci (PRSS1, CEBPB, and SLC26A9) with variants associated with both traits.8,9,12, 13, 14 In this manuscript, to explore the extent of pleiotropy (i.e., one variant affecting multiple traits15), we analyzed whole-genome sequence (WGS) data from individuals in the CF Genome Project (CFGP).
Subjects and methods
Cohorts
Individuals with CF were recruited from five cohorts: CF Twin and Sibling Study (TSS)16 and Cystic Fibrosis-Related Diabetes (CFRD) Study8 (Johns Hopkins University); Genetic Modifier Study (GMS)17,18 and GMS of Severe CF Liver Disease (GMS CFLD)19, 20, 21 (University of North Carolina); and Early Pseudomonas Infection Control Observational (EPIC Obs) Study22 (University of Washington and Seattle Children’s Hospital). Each site participating in the TSS, CFRD, GMS, GMS CFLD, and EPIC Obs Studies obtained IRB approval, and participants or their parents/guardians provided informed consent. Most of the individuals from the TSS, GMS, and GMS CFLD studies were included in previous genome-wide association studies for MI and CFRD as part of the Gene Modifier Consortium.8,9,13,14 The remaining individuals, mostly from EPIC Obs and CFRD, were not included in those prior studies. Details of these cohorts have been described previously.23 In brief, TSS comprises twins, triplets, and siblings with CF. CFRD enrolled individuals with CF with and without CFRD. GMS enrolled individuals who had mild or severe extremes of lung disease and two pancreatic-insufficient (PI) variants. GMS CFLD aimed to define the major genetic modifiers of severe CF liver disease with portal hypertension. EPIC Obs is a longitudinal cohort study of risk factors for, and outcomes associated with, acquisition of Pseudomonas aeruginosa (Pa) infection. The EPIC Obs cohort is the youngest, and the GMS and GMS CFLD cohorts are the oldest (Table 1).
Table 1.
Characteristics of affected individuals enrolled by the studies composing the cystic fibrosis genome project
| Characteristic | GMS + GMS CFLD | TSS + CFRD | EPIC Obs | Everyone |
|---|---|---|---|---|
| Number of individualsᵃ | 1,763 | 1,582 | 1,224 | 4,569 |
| Femaleᵃ | 769 (43.6%) | 749 (47.3%) | 603 (49.3%) | 2,121 (46.4%) |
| Mean birth yearᵃ | 1982 | 1992 | 2000 | 1990 |
| Mean age (years)ᵇ | 31.1 | 23.2 | 16.8 | 24.1 |
| % with CFRD (has CFRD/non-missing) | 50.7% (720/1,419) | 31.9% (480/1,504) | 19.1% (226/1,185) | 34.7% (1,426/4,108) |
| % with MI (has MI/non-missing) | 15.7% (272/1,736) | 21.0% (324/1,544) | 26.4% (319/1,209) | 20.4% (915/4,489) |
| F508del homozygotesᵃ | 1,277 (72.4%) | 878 (55.5%) | 714 (58.3%) | 2,869 (62.8%) |
| F508del hom: % with CFRD (has CFRD/non-missing) | 51.4% (567/1,103) | 33.6% (283/843) | 19.1% (132/690) | 37.3% (982/2,636) |
| F508del hom: % with MI (has MI/non-missing) | 15.5% (195/1,260) | 23.3% (201/863) | 27.1% (191/704) | 20.8% (587/2,827) |

ᵃ Includes the full dataset of 4,569 individuals whose WGS passed sample QC and who had phenotype data available for at least one of the two key phenotypes (CFRD and MI).
ᵇ Mean age at last CFRD screen. Includes only the 4,108 individuals with non-missing CFRD status.
Cystic fibrosis-related diabetes phenotype
Cystic fibrosis-related diabetes (CFRD) status was ascertained from medical records and the 2017 CFF Patient Registry (CFFPR)24 data for all five cohorts. Diabetes was defined by clinician diagnosis and insulin treatment for at least 1 year. Age of onset was defined as the first year of treatment with oral agents, intermittent insulin, or chronic insulin. Follow-up was censored (i.e., data were not used) after the age at last diabetes screen or the age at first solid organ transplant, whichever came first. Individuals without clinician diagnostic information, with no insulin use, and with no elevated laboratory results (2-h glucose < 200 mg/dL, random glucose < 200 mg/dL, and hemoglobin A1c < 6.5%) were classified as unaffected by CFRD. Individuals with diagnoses of type 1 or type 2 diabetes were excluded. Individuals with a prior CFRD diagnosis and insulin use who did not have CFRD in at least two of the most recent screenings, and individuals with inconsistent CFRD diagnoses and insulin use, were also excluded.
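The censoring and classification rules above can be made concrete as a small function that turns one person's records into the (time, event) pair used for a time-to-onset analysis. This is a minimal sketch under stated assumptions: the `Participant` record and its field names are hypothetical, exclusion of individuals with type 1 or type 2 diabetes or inconsistent records is assumed to happen upstream, and an onset recorded after the censoring age is treated as unobserved.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Laboratory cut-offs from the text (glucose in mg/dL, HbA1c in %).
GLUCOSE_2H_LIMIT = 200.0
RANDOM_GLUCOSE_LIMIT = 200.0
HBA1C_LIMIT = 6.5


@dataclass
class Participant:
    """Hypothetical per-person summary; field names are illustrative only."""
    onset_age: Optional[float]       # age at first CFRD treatment, or None
    last_screen_age: float           # age at last diabetes screen
    transplant_age: Optional[float]  # age at first solid organ transplant, or None
    max_glucose_2h: float            # highest 2-h OGTT glucose
    max_random_glucose: float        # highest random glucose
    max_hba1c: float                 # highest HbA1c


def cfrd_time_to_event(p: Participant) -> Optional[Tuple[float, int]]:
    """Return (time, event) for a survival model, or None if excluded."""
    # Censor at the last screen or the first transplant, whichever is earlier.
    censor_age = p.last_screen_age
    if p.transplant_age is not None:
        censor_age = min(censor_age, p.transplant_age)

    if p.onset_age is not None:
        if p.onset_age <= censor_age:
            return (p.onset_age, 1)  # onset observed before censoring
        # Assumption: onset after the censoring age is treated as unobserved.
        return (censor_age, 0)

    labs_normal = (
        p.max_glucose_2h < GLUCOSE_2H_LIMIT
        and p.max_random_glucose < RANDOM_GLUCOSE_LIMIT
        and p.max_hba1c < HBA1C_LIMIT
    )
    # Unaffected only if untreated and no laboratory result is elevated.
    return (censor_age, 0) if labs_normal else None
```

For example, an untreated person with normal laboratory values, a last screen at 30, and a transplant at 25 contributes (25.0, 0).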
Neonatal intestinal obstruction/meconium ileus phenotype
Meconium ileus (MI) status was ascertained from medical records and the 2017 CFFPR data.14,25 Medical records were not available for the EPIC Obs cohort. MI was slightly more common in the EPIC Obs cohort, possibly because MI status relied solely on the 2017 CFFPR data, which have been shown to contain 7%–18% false positives and no false negatives (S.D. Wood et al., 2007, Cystic Fibrosis Conf., abstract), or because of birth cohort effects (recall bias and previously lower survival rates in the older cohorts).
Whole-genome sequencing
Genomic DNA from 5,199 samples was sequenced at the Broad Institute on the NovaSeq 6000. Joint variant calling was performed for the 5,134 samples that passed quality control (QC) at the sequencing center, using the GATK (v.4.1) Best Practices Workflows and HaplotypeCaller (see supplemental methods). Only bi-allelic single-nucleotide variants were included; insertions/deletions were excluded. Variants were filtered using variant quality score recalibration (VQSR) and hard filters (QD > 2.0; QUAL > 30.0; SOR < 3.0; FS < 60.0; MQ > 40.0; ReadPosRankSum > −8.0).
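As an illustration of how the site-level criteria combine, the sketch below keeps a site only if it is a bi-allelic SNV and every listed hard-filter inequality holds. It does not reproduce VQSR; the `passes_site_filters` name and its dict-of-annotations input are hypothetical, and treating a missing annotation as a failure is an assumption of this sketch rather than a documented GATK behavior.

```python
from typing import Callable, Dict, List, Mapping

# Hard-filter criteria from the text; a site must satisfy every one to pass.
HARD_FILTERS: Dict[str, Callable[[float], bool]] = {
    "QD": lambda v: v > 2.0,
    "QUAL": lambda v: v > 30.0,
    "SOR": lambda v: v < 3.0,
    "FS": lambda v: v < 60.0,
    "MQ": lambda v: v > 40.0,
    "ReadPosRankSum": lambda v: v > -8.0,
}


def passes_site_filters(ref: str, alts: List[str],
                        annotations: Mapping[str, float]) -> bool:
    """Hypothetical site filter: bi-allelic SNVs passing all hard filters."""
    # Bi-allelic SNV: exactly one ALT allele, and REF/ALT are single bases.
    if len(alts) != 1 or len(ref) != 1 or len(alts[0]) != 1:
        return False
    for key, passes in HARD_FILTERS.items():
        value = annotations.get(key)
        # Assumption: a missing annotation counts as a failure here.
        if value is None or not passes(value):
            return False
    return True
```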
Sample inclusion
Duplicate samples and samples with evidence of contamination (Freemix26 estimate ≥ 2%), a high chimera rate (≥5%), or low coverage (mean ≤ 29.5×, 20× coverage ≤ 85%, or 10× coverage ≤ 95%) were excluded. Sample identity was verified using sex, comparison of pedigree-based vs. empirical kinship, and concordance with prior genotyping data. Individuals with aneuploid sex chromosomes were excluded. Individuals with CF who have two PI CFTR disease-causing variants and/or exocrine pancreatic insufficiency were included.23 After excluding individuals with missing phenotype information, the final dataset comprised 4,569 individuals with at least one of the two key phenotypes known. Of these individuals, 4,028 had information available for both phenotypes.
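The sample-level exclusions can be expressed as a checklist that reports which criteria a sample fails. This is a hypothetical sketch: the `SampleMetrics` fields are illustrative names for the metrics listed above, proportions are written as fractions (0.02 = 2%), and the mean-coverage cut-off is taken as 29.5×.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SampleMetrics:
    """Hypothetical per-sample QC metrics."""
    freemix: float        # estimated contamination fraction
    chimera_rate: float   # fraction of chimeric read pairs
    mean_coverage: float  # mean sequencing depth (x)
    frac_20x: float       # fraction of the genome covered at >= 20x
    frac_10x: float       # fraction of the genome covered at >= 10x


def failed_sample_qc(m: SampleMetrics) -> List[str]:
    """Return the names of failed criteria; an empty list means the sample passes."""
    failures = []
    if m.freemix >= 0.02:
        failures.append("contamination")
    if m.chimera_rate >= 0.05:
        failures.append("chimera")
    if m.mean_coverage <= 29.5:
        failures.append("mean_coverage")
    if m.frac_20x <= 0.85:
        failures.append("coverage_20x")
    if m.frac_10x <= 0.95:
        failures.append("coverage_10x")
    return failures
```

Returning the list of failed criteria, rather than a single boolean, makes it easy to tabulate why samples were dropped.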
Correcting for population structure and relatedness
Population structure and relatedness were controlled by including a genetic relationship matrix (GRM) as a random effect and the first four principal components (PCs) as fixed effects. The GRM and PCs were estimated using the GENESIS27 package, which allows estimation of relatedness that accounts for population structure28 and robust inference of population structure in the presence of relatedness.29
We first performed linkage disequilibrium (LD) pruning on all autosomal single-nucleotide polymorphisms (SNPs) with MAF > 0.05 and missing call rate < 0.02 within the 5,105 samples that passed quality control (QC) at the sequencing center, so that all pairs of retained SNPs had r2 < 0.2 within 10 Mb sliding windows. This resulted in 118,117 SNPs. Initial relatedness and kinship coefficients (KCs) were estimated using the KING-robust procedure28 implemented in SNPRelate.30 We then used the ancestry divergence measure from KING-robust to partition samples into a “maximum unrelated subset” (n = 4,065) and a “related subset” (n = 1,040). Individuals in the maximum unrelated subset are mutually unrelated at a KC threshold of 0.044 (corresponding to 3rd-degree relatives), and the related subset contains individuals with KC > 0.044 to someone in the unrelated subset. A second round of LD pruning, with the same parameters, was performed on the maximum unrelated subset, yielding 118,109 independent SNPs for principal component analysis (PCA) in the 4,065 unrelated samples. To obtain eigenvectors for all study samples, including relatives of the unrelated set, we used the approach described by Zhu et al.31: the maximum unrelated subset was analyzed to obtain SNP eigenvectors, which were then used to calculate sample eigenvectors for (i.e., project into) the related set (Figure S1). To determine how many PCs would be useful covariates to adjust for population stratification in downstream association testing, we examined the scree plot for the PCA, which showed that the first four PCs accounted for the majority of the population structure (Figure S2). We then estimated KCs and IBD sharing (pcrelate function), accounting for population structure with the first two ancestry-representative PCs, to obtain accurate relatedness estimates reflecting recent family (pedigree) structure.
We repeated the above procedure for a second round to obtain the GRM included in the association tests.
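The projection step (PCA in unrelated samples, then projecting relatives onto the same axes) can be sketched with NumPy as below. This illustrates the general idea and is not the GENESIS or SNPRelate implementation used in the study: genotypes are assumed to be a dense samples × SNPs matrix of 0/1/2 dosages with no missing calls, every SNP is assumed polymorphic in the unrelated set, and the function names are hypothetical.

```python
import numpy as np


def standardize(genotypes, allele_freq):
    """Center and scale 0/1/2 dosages using reference allele frequencies."""
    scale = np.sqrt(2.0 * allele_freq * (1.0 - allele_freq))
    return (genotypes - 2.0 * allele_freq) / scale


def pca_unrelated_then_project(g_unrelated, g_related, n_pcs=4):
    """PCA on unrelated samples; project relatives onto the same PCs.

    Assumes every SNP is polymorphic in the unrelated set (nonzero scale).
    """
    # Reference allele frequencies come from the unrelated samples only.
    freq = g_unrelated.mean(axis=0) / 2.0
    x_unrel = standardize(g_unrelated, freq)
    # Thin SVD: x_unrel = U S V^T; columns of V are SNP eigenvectors.
    u, s, vt = np.linalg.svd(x_unrel, full_matrices=False)
    snp_loadings = vt[:n_pcs].T
    pcs_unrel = u[:, :n_pcs]
    # x V S^-1 returns U for the unrelated samples themselves, so projected
    # relatives land on the same scale as the unrelated sample eigenvectors.
    x_rel = standardize(g_related, freq)
    pcs_rel = (x_rel @ snp_loadings) / s[:n_pcs]
    # Share of variance captured by each PC, as shown in a scree plot.
    variance_share = s ** 2 / np.sum(s ** 2)
    return pcs_unrel, pcs_rel, variance_share
```

Estimating allele frequencies and axes from unrelated samples only keeps family clusters from pulling the top PCs toward pedigree structure rather than ancestry.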
Common variant association testing
MI was modeled as a binary variable via mixed-effects logistic regression (GMMAT32 implemented in the GENESIS package27), with site (JHU, UNC, UW), residual CFTR function (zero vs. nonzero), linear birth cohort (<1960: 0; [1960–1965): 1; [1965–1970): 2; [1970–1975): 3; [1975–1980): 4; [1980–1985): 5; ≥1985: 6), and the first four principal components (PCs) as fixed effects, and the genetic relatedness matrix as a random effect to account for family structure (Table S1).
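The linear birth-cohort covariate amounts to counting how many bin boundaries a birth year has reached. A minimal sketch, assuming every bin is half-open ([lower, upper)), so that, for example, a 1980 birth year is coded 5; the function name is hypothetical.

```python
import bisect

# Lower bounds of the five-year bins from 1960 to 1985.
COHORT_BREAKS = [1960, 1965, 1970, 1975, 1980, 1985]


def birth_cohort_code(birth_year: int) -> int:
    """Map a birth year to the 0-6 linear cohort covariate."""
    # bisect_right counts the breaks <= birth_year:
    # <1960 -> 0, [1960, 1965) -> 1, ..., [1980, 1985) -> 5, >=1985 -> 6.
    return bisect.bisect_right(COHORT_BREAKS, birth_year)
```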
Martingale residuals for CFRD (as a time-to-onset trait)33 were obtained with a Breslow model. The Martingale residuals for age of CFRD onset were used in a mixed-effects linear regression with the same covariates as the MI model plus sex, because sex is associated with CFRD (Table S1). Using Martingale residuals in a linear regression (rather than a Cox proportional hazards regression) allowed us to include all participants while correcting for relatedness with a genetic relationship matrix. A Cox proportional hazards regression in the unrelated-only subset yielded essentially identical results (data not shown). In addition, when CFRD was treated as a dichotomous trait in a logistic regression with an additional covariate of age at last CFRD screening, the results were very similar (Figure S3). Quantile-quantile plots and genomic inflation factors (λ; Figure S4) indicated that the observed test-statistic distributions followed the expected null distribution under these models. All analyses were conducted in R 4.0.
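For readers less familiar with Martingale residuals, the sketch below computes them from a covariate-free model: each person's residual is their event indicator minus the Breslow (Nelson-Aalen) cumulative hazard at their follow-up time, so larger values indicate onset that occurred earlier than expected. Whether the study's Breslow model included covariates is not stated, so the null model is an assumption, and the function name is hypothetical. The residuals would then serve as the outcome of the mixed-effects linear regression described above.

```python
import numpy as np


def null_martingale_residuals(time, event):
    """Martingale residuals, event_i - H(t_i), from a covariate-free model.

    H is the Breslow (Nelson-Aalen) cumulative hazard: each distinct event
    time t_j adds d_j / n_j (events at t_j over the number still at risk).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event == 1])
    # Hazard increment at each distinct event time; censored at t counts as at risk.
    increments = np.array(
        [event[time == t].sum() / np.sum(time >= t) for t in event_times],
        dtype=float,
    )
    cum_hazard = np.concatenate(([0.0], np.cumsum(increments)))
    # Number of event times <= t_i indexes the cumulative hazard at t_i.
    idx = np.searchsorted(event_times, time, side="right")
    return event - cum_hazard[idx]
```

In this null model the residuals sum to zero, and censored individuals always receive negative values.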
In adjusted models, MI or the CFRD Martingale residuals were added as a covariate to the CFRD and MI models, respectively (Table S1). Including each phenotype as a covariate for the other can be viewed as an approximate way to rule out mediation of effects on CFRD by MI.34 Effect estimates and p values were similar before and after adjustment (Figure S5), indicating that adjusting for the other phenotype did not introduce collider bias.
In a joint analysis of both traits, presence of CFRD and/or MI was modeled as a binary variable via mixed-effects logistic regression, with covariates of site (JHU, UNC, UW), residual CFTR function (zero vs. nonzero), linear birth cohort as outlined above, sex, age at last CFRD screening, and first four principal components (PCs) as fixed effects, and genetic relatedness matrix as random effect to account for family structure.
Variants with a minor allele frequency (MAF) > 0.5% were included in all single-variant association analyses. Variants associating with genome-wide significance (p value < 5e−8) or suggestive significance (p value < 5e−7) and previously published candidate variants (Table S2) with region-wide significance (p value < 0.01) were further evaluated for pleiotropy.
Pathogenicity prediction
To evaluate the potential pathogenicity of coding variants identified in this study, PROVEAN,35 SIFT,36 MetaSVM,37 MutationTaster,38 PolyPhen-2,39 FATHMM XF,40 and ClinPred41 were used. Multiple tools were used to obtain a more robust prediction of the consequences of these variants.
Rare variant association testing
The variant call format (VCF) files were annotated using VEP v.101 to obtain LOFTEE and 5 kb upstream/downstream annotations. OpenCRAVAT v.2.2.5,42 was then used to supply additional annotations for each variant based on its MANE canonical transcript. The gene-centric aggregation units were based on the GENCODE gene model.43 Within each gene unit, we retained likely deleterious variants based on a series of variant annotations. Specifically, we retained variants that were (1) high-confidence loss-of-function variants according to the Loss-Of-Function Transcript Effect Estimator,44 (2) missense variants with MetaSVM37 score > 0, (3) in-frame indels with Fathmm-XF40 score > 0.5, or (4) synonymous variants with Fathmm-XF score > 0.5. The genome-wide significance threshold was set at 0.05/(number of genes) = 1.52e−6. Gene-based variant set mixed model association tests (SMMAT)45 for CFRD Martingale residuals and MI were conducted using the same models as for common variants.
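The qualifying-variant rules and the gene-level threshold can be sketched as follows. The `Variant` record and the consequence labels are hypothetical stand-ins for VEP/LOFTEE and OpenCRAVAT output, and the MAF < 1% rare-variant cut-off is taken from the rare-variant analysis reported in the Results. The reported threshold of 1.52e−6 implies roughly 0.05/1.52e−6 ≈ 32,900 gene units tested; the exact count is not stated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    """Hypothetical annotated variant; labels are illustrative only."""
    consequence: str                 # e.g. "missense", "inframe_indel", "synonymous"
    lof_high_confidence: bool        # LOFTEE high-confidence loss-of-function call
    maf: float                       # minor allele frequency in the cohort
    metasvm: Optional[float] = None
    fathmm_xf: Optional[float] = None


def is_qualifying(v: Variant) -> bool:
    """Apply the rare, likely-deleterious filter for gene-based tests."""
    if v.maf >= 0.01:                # rare variants only (MAF < 1%)
        return False
    if v.lof_high_confidence:
        return True
    if v.consequence == "missense":
        return v.metasvm is not None and v.metasvm > 0
    if v.consequence in ("inframe_indel", "synonymous"):
        return v.fathmm_xf is not None and v.fathmm_xf > 0.5
    return False


def bonferroni_threshold(n_genes: int, alpha: float = 0.05) -> float:
    """Gene-level significance threshold: alpha divided by the number of genes."""
    return alpha / n_genes
```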
Phasing
Phasing was performed with Eagle v.2.4.1,46 using default parameters, on chromosome 1 of the post-QC CFGP data. We then extracted the bi-allelic variants previously included in the SLC26A9 haplotypes.47
Minor allele frequency
Unless otherwise specified, the minor allele frequencies (MAF) were calculated within the CFGP-affected individuals included in the adjusted analyses (n = 4,028).
Linkage disequilibrium
Linkage disequilibrium (LD) was determined on the maximally unrelated subset (n = 3,350) of the CFGP-affected population included in this analysis, using Haploview v.16.0.1.
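To make the LD summaries reported below easier to interpret, the sketch computes D, D′, and r² from haplotype and allele frequencies. It assumes the two-locus haplotype frequency is already known (for example, from phased data); it is not Haploview's estimation procedure, and `ld_stats` is a hypothetical name. The example shows why D′ can equal 1 while r² is well below 1 when allele frequencies differ, as for rs2036100 and rs1874361 in the Results.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float):
    """Return (D, |D'|, r^2) for allele A at locus 1 and allele B at locus 2.

    p_ab: frequency of haplotypes carrying both A and B
    p_a, p_b: allele frequencies of A and B
    """
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D (Lewontin's normalization).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2
```

With p_A = 0.3, p_B = 0.5, and A always on a B haplotype (p_AB = 0.3), D′ = 1 but r² = 0.0225/0.0525 ≈ 0.43: the rarer allele is fully tagged, yet the two markers are far from interchangeable.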
Colocalization analysis
Colocalization between CFRD and MI was evaluated in regions associated with both phenotypes using the coloc.abf() function of the coloc package (v.5.1.0)48 in R v.4.0.2. Summary statistics from the adjusted genome-wide association analyses for CFRD Martingale residuals and MI (n = 4,028) were used, and all common (MAF > 0.5%) single-nucleotide variants within each region were included.
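The posterior probabilities reported by coloc.abf() come from Wakefield approximate Bayes factors combined under five hypotheses (H0: no association; H1 or H2: association with one trait only; H3: two distinct causal variants; H4: one shared causal variant). The Python sketch below follows that published scheme but is not the coloc package's code: the prior probabilities shown (1e−4, 1e−4, 1e−5) and the prior effect-size standard deviations passed to `log_abf` are assumptions to be checked against the package documentation, both traits must share the same ordered list of at least two SNPs, and H3 is summed directly over pairs of distinct SNPs, which costs O(n²) in the number of SNPs.

```python
import numpy as np


def _logsum(x):
    """Numerically stable log(sum(exp(x))) over all entries."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))


def log_abf(beta, se, prior_sd):
    """Wakefield approximate Bayes factor (log scale) for each SNP."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (np.log(1 - r) + r * z ** 2)


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 from SNP-aligned log ABFs."""
    l1, l2 = _logsum(labf1), _logsum(labf2)
    # H3: causal SNPs differ, so sum ABF1_i * ABF2_j over pairs with i != j.
    pairs = labf1[:, None] + labf2[None, :]
    np.fill_diagonal(pairs, -np.inf)
    log_h = np.array([
        0.0,
        np.log(p1) + l1,
        np.log(p2) + l2,
        np.log(p1) + np.log(p2) + _logsum(pairs),
        np.log(p12) + _logsum(labf1 + labf2),  # H4: same causal SNP
    ])
    return np.exp(log_h - _logsum(log_h))
```

A posterior for H4 near 1, as reported for SLC26A9 (96.4%) and CEBPB (98.8%), favors a shared causal variant, whereas 62.6% at PRSS1 is weaker support.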
Results
Genome-wide association studies (GWASs) of two important complications of cystic fibrosis (CF), CF-related diabetes (CFRD) and neonatal intestinal obstruction (a.k.a. meconium ileus or MI), were conducted in individuals with CF enrolled in the CF Genome Project (CFGP). A total of 4,569 individuals with pancreatic-insufficient CF were studied: 4,108 had CFRD status known (1,426 with CFRD), 4,489 had MI status known (915 with MI), and 4,028 had both MI and CFRD status known (1,131 with CFRD only; 552 with MI only; 265 with both) (Table 1). Of the 4,569 individuals, 3,267 were previously included in a GWAS for MI or CFRD.8,9,13,14 The “adjusted” model included the other phenotype as a covariate, whereas the “unadjusted” model did not.
Although there was some evidence of earlier onset of CFRD in individuals with MI before key covariates were considered (Figure 1A), the evidence for association between MI and earlier age of onset of CFRD was weaker, albeit still significant, when accounting for birth cohort, site, and CFTR genotype severity (Figures 1B–1D; MI logistic regression p value 2.92e−2, beta 0.14). Similarly, Martingale residuals for age of onset of CFRD were weakly associated with MI (CFRD linear regression p value 3.02e−2, beta 0.03; Table S1). To account for any residual correlation between these traits, we also used adjusted models in which each phenotype was included as a covariate in the model for the other.
Figure 1.
Kaplan-Meier plots for age of onset of CFRD
Plots divided by history of meconium ileus and site of enrollment in (A) everyone, (B) individuals born after 1970, (C) F508del homozygotes, and (D) F508del homozygotes born after 1970.
Identification of novel modifier loci for MI and CFRD
Variants at five loci were associated with MI at genome-wide significance (nearest genes: SLC26A9, CLPS/SLC26A8, EMSY, ATP12A, and SLC6A14), and variants at five other loci had suggestive evidence for association (SLC37A3, DOCK1, MNAT1, BCAR1, and CEBPB) (Figure 2). The association of MI with variants at six of these loci (two genome-wide significant) was not previously identified, indicated by blue font (Figure 2). In addition, variants at one candidate locus (PRSS1) had modest evidence of association at the region-wide significance level (Tables S2 and S3). For CFRD, variants at two loci had genome-wide significant evidence for association (TCF7L2, chr16p12.1), while variants at six loci had suggestive evidence of association with CFRD (PRAMEF19, SLC26A9 [two independent signals; see below], chr5q34, TFAP2B, ZNF316, and CEBPB). The association of CFRD with variants at five of these loci (one genome-wide significant) was not previously identified (Figure 2, purple font). In addition, variants at four candidate loci had modest evidence of association with CFRD at the region-wide significance level (PTMA, SLC2A2, PRSS1, and CDKAL1).
Figure 2.
Manhattan plots of phase 1 + 2 combined association analyses
Association analysis was performed on all variants with minor allele frequency (MAF) > 0.5% that passed quality control criteria. The x axis indicates chromosomal position, and the y axis indicates the strength of evidence for association with meconium ileus (−log10[p value]) (top) or cystic fibrosis-related diabetes (bottom). Adjusted models include MI or CFRD as an additional covariate. The black line corresponds to the genome-wide significance threshold (p value 5e−8). Labels in bold exceed genome-wide significance, and labels in regular font exceed suggestive significance. Labels in blue were not previously identified for MI, labels in purple were not previously identified for CFRD, and labels in black were previously known. CEBPB falls below the suggestive threshold in the adjusted analyses, so it is labeled in parentheses.
Variants at SLC26A9 and CEBPB were associated with increased risk of both CFRD and MI, whereas variants at PRSS1 had inverse effects on risk for MI and CFRD
To test broadly for pleiotropic variants, we compared the p values of all variants genome-wide from the MI GWAS to the p values from the CFRD GWAS (Figure 3). Variants at SLC26A9 and CEBPB were associated with risk for both CFRD and for MI, meaning that the same alleles were associated with increased risk of both traits. In contrast, variants at PRSS1 were associated with CFRD and MI in the opposite direction, meaning that the alleles associated with increased risk for MI were protective for CFRD (and vice versa). Variants at the remaining loci did not display pleiotropy, associating with either MI or CFRD but not both (Figure 4; Figure S6).
Figure 3.
Comparison of the genetic risk architectures of CFRD and MI
Comparison of p values of each variant for association with MI and CFRD. All variants have been plotted. The MI log10(p value) was defined as positive when the risk alleles were concordant between CFRD and MI.
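The sign convention used in Figure 3 can be sketched as a transformation of per-variant summary statistics. This assumes both effect estimates refer to the same effect allele and that a positive beta means higher risk for both traits (log-odds for MI; Martingale residuals for CFRD, where larger values mean earlier onset), which matches the directions reported for rs2036100 and rs3757377 in the text. The function name is hypothetical, and the magnitude is shown as −log10(p).

```python
import numpy as np


def signed_mi_log10p(p_mi, beta_mi, beta_cfrd):
    """-log10(p) for MI, positive when MI and CFRD effects are concordant."""
    concordant = np.sign(beta_mi) == np.sign(beta_cfrd)
    magnitude = -np.log10(p_mi)
    return np.where(concordant, magnitude, -magnitude)
```

Under this convention, rs2036100 (betas −0.32 and −0.06) plots above zero and rs3757377 (betas −0.21 and 0.05) plots below zero.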
Figure 4.
Forest plot of significant loci (genome-wide significant, suggestive significant, or candidate-based significant)
Summary statistics in the adjusted analyses are shown. Genomic positions are in GRCh38. §Aksit et al.9 ‡Gong et al.13 EA, effect allele; OA, other allele; EAF: effect allele frequency.
Common variants within and upstream of SLC26A9 were associated with increased risk for both MI and CFRD in this dataset (e.g., rs2036100: MI p value 7.3e−8, beta −0.32; CFRD p value 2.7e−6, beta −0.06), as previously reported.8,14 The most significantly associated variants (MI, rs2036100; CFRD, rs1874361) are in high linkage disequilibrium (LD) with each other (r2 = 0.593, D′ = 1 in the CFGP unrelated subset) and lie within an LD block with variants previously reported to be significantly associated with MI and CFRD (Figure 5; Table S3). Colocalization analysis supported the same causal SNP underlying the association with each trait (posterior probability 96.4%; Table S4; Figure S7).
Figure 5.
LocusZoom plots for the SLC26A9 locus
(A) LocusZoom plot for association with MI in the CFGP cohort, adjusted with a covariate of CFRD Martingale residuals.
(B) LocusZoom plot for association with CFRD Martingale residuals in the CFGP cohort, adjusted with a covariate of MI.
(C) D′ linkage disequilibrium plot created on Haploview demonstrating the linkage disequilibrium between key SNPs in the region in the CFGP cohort.
Variants surrounding CEBPB were associated with increased risk for CFRD and MI (e.g., rs6095829: MI p value 5.2e−7, beta −0.29; CFRD p value 3.3e−7, beta −0.06; Figures 6A and 6B). The most significantly associated variants (MI, rs6095829; CFRD, rs2869963) are in high LD (Figure 6C). Colocalization analysis supported the same causal SNP underlying the association with each trait (posterior probability 98.8%; Table S4; Figure S8). Variants in this region have previously been reported to increase risk for both traits,9,13 and also for T2D49 (Table S3). Furthermore, effects in the new set of individuals were in the same direction, reaching nominal significance for CFRD but not for MI (rs6095829, UW only: MI n = 1,170, p value 7.8e−2, beta −0.17; CFRD p value 7.9e−4, beta −0.06). Although these variants are physically closest to CEBPB, they are significant eQTLs for a long non-coding RNA, LINC01273, in lung and adipose tissues (subcutaneous adipose p value 1.3e−13, normalized effect size 0.37; lung p value 2.0e−13, normalized effect size 0.37; GTEx v.8).50
Figure 6.
LocusZoom plots for the CEBPB locus
(A) LocusZoom plot for association with MI in the CFGP cohort, adjusted with a covariate of CFRD Martingale residuals.
(B) LocusZoom plot for association with CFRD Martingale residuals in the CFGP cohort, adjusted with a covariate of MI.
(C) R-squared linkage disequilibrium plot created on Haploview demonstrating the linkage disequilibrium between key SNPs in the region in the CFGP cohort.
The major alleles of common variants surrounding the PRSS1 locus were associated with increased risk for CFRD but decreased risk for MI (e.g., rs3757377: CFRD p value 7.4e−5, beta 0.05 [CFRD-risk allele: C]; MI p value 5.9e−4, beta −0.21 [MI-risk allele: T]; C-allele frequency 61%; Figures 7A and 7B). Variants at this locus have previously been associated with MI (rs3757377)13 and CFRD (rs1964986).12 In our analyses, the variants most significantly associated with CFRD and MI differed (CFRD, rs4726576; MI, rs4527797); however, these variants are in high LD (Figure 7C), and all were associated with both traits in this study (Table S3). Colocalization analysis provided moderate support for a shared causal variant underlying both traits (posterior probability 62.6%; Table S4; Figure S9). A GWAS reported an association of chronic pancreatitis with rs10273639,51 which is in high LD with the most significant variants in our analyses (Figure 7C). The alleles that increase risk for CFRD and are protective for MI increase risk for alcoholic pancreatitis (Table S3).
Figure 7.
LocusZoom and linkage disequilibrium plots for the PRSS1 locus
(A) LocusZoom plot for association with MI, adjusted with a covariate of CFRD Martingale residuals.
(B) LocusZoom plot for association with CFRD Martingale residuals, adjusted with a covariate of MI.
(C) R-squared linkage disequilibrium plot created on Haploview demonstrating the linkage disequilibrium between key SNPs in the region in the CFGP cohort.
In a joint analysis of association with the presence of at least one of CFRD or MI, with age at last CFRD screen as a covariate, the association became more significant for the two loci associated with CFRD and MI in the same direction (rs2036100 at SLC26A9: p value 7.35e−10, beta −0.31; rs6095829 at CEBPB: p value 6.56e−12, beta −0.33) (Figure S10). In contrast, variants with opposite directions of effect on CFRD and MI were not significant in this analysis (rs3757377 at PRSS1: p value 0.97, beta 0.002), as expected when opposing effects cancel in a combined phenotype. In this analysis, variants at SLC6A14 were associated with the joint phenotype at genome-wide significance; however, this association was driven by MI (Figure 4).
A locus downstream of SLC26A9 associates with increased risk for CFRD
A previously unidentified locus ∼260 kb downstream of SLC26A9 and ∼3.8 kb upstream of SLC45A3 had suggestive evidence of association with CFRD (e.g., rs6685188: p value 3.2e−7, beta 0.07), with no evidence of association with MI (p value 0.22, beta 0.08) (Figure 5). Variants at this second locus are in a separate LD block from the variants near and within SLC26A9 (rs2036100 and rs6685188: r2 0.02; D′ 0.17; Figure 5C). Conditional analyses supported this conclusion, showing that the two signals are independent (Figure S11).
Variants located between SLC26A9 and SLC45A3 showed modest association with CFRD (Figure 5B). These variants display residual LD with rs6685188 (Figure 5C), and, as expected, their association is not apparent after conditioning on rs6685188 (Figure S11F). Of note, common variants within this new locus have been associated with T2D (e.g., rs7538321; CFRD p value 0.86)52 and body mass index (BMI; e.g., rs708727; CFRD p value 6.1e−4).53 Intriguingly, the higher-risk allele for CFRD was associated with lower risk for T2D and lower BMI. These T2D- and BMI-associated variants have residual LD with rs6685188 (rs708727 and rs7538321: D′ 0.896 and 0.192, r2 0.087 and 0, respectively); however, rs6685188 was not significantly associated with T2D or BMI (p values 0.2849 and 0.23 [proxy variant: rs1172199],53 respectively). Published Hi-C data from the pancreas showed some chromatin interaction between chr1:205660000–205680000 (introns 1–4 of SLC45A3) and chr1:205940000–205960000 (upstream of SLC26A9),54 raising the possibility that variants near SLC45A3 might affect enhancers of SLC26A9, or vice versa.
Variants in the two chromosome 1 loci associated with CFRD are correlated with expression of PM20D1
The higher-risk allele of rs6685188 was associated with decreased expression of PM20D1 in the lung (p value 8.5e−7, beta −0.33; GTEx v.8) and pancreas (p value 1.1e−10, beta −0.52; GTEx v.8). PM20D1 is a bidirectional enzyme that catalyzes both the condensation of fatty acids and amino acids to generate N-acyl amino acids and the reverse hydrolytic reaction. This activity could be relevant to diabetes, because administration of N-acyl amino acids to mice has been shown to improve glucose homeostasis and increase energy expenditure.55 The higher-risk allele of rs6685188 was also associated with expression of other genes: decreased expression of NUCKS1 in the tibial nerve, esophagus, and thyroid; increased expression of SLC41A1 in the esophagus and pancreas; and increased expression of NUAK2 in the thyroid (GTEx v.8). rs6685188 was weakly associated with decreased expression of SLC26A9 in the pancreas (p value 0.03) but not in the lung (p value 0.28; GTEx v.8) and was not associated with expression of SLC45A3 (pancreas eQTL p value 0.97; GTEx v.8).
rs6685188 is in weak LD with rs708727 (r2: 0.19; D′: 0.90; MAF: 39.2%; synonymous variant within SLC41A1), which has been previously reported to be associated with lower PM20D1 expression.56 The less common variants within this region (e.g., rs73080402; MAF: 11.8%) were also associated with CFRD and slightly associated with PM20D1 expression in the pancreas (p value 2.5e−3, beta 0.31, GTEx v.8).
Previous studies of variants at the SLC26A9 locus (e.g., rs2036100) revealed that alleles associated with increased risk of CFRD are associated with decreased expression of SLC26A9 in the pancreas, based on experimental and eQTL analyses.47,50 These variants have also been shown to be associated with decreased expression of PM20D1 in the pancreas (rs2036100 p value 8.4e−5, beta 0.29; GTEx v.8) and lung (p value 1.4e−9, beta 0.37; GTEx v.8). The association of CFRD with the previously unidentified variants that affect PM20D1 expression suggests that variation in PM20D1 expression is likely to contribute to CFRD risk.
An uncommon missense variant in the SLC26A9 region is associated with MI but not CFRD
To identify other variants in the SLC26A9 region that may independently contribute to CFRD or MI risk, we conducted conditional analyses on the most significant variants at this locus. This revealed an uncommon variant, p.Val172Met (c.514G>A; rs146704092), that remained significantly associated with MI (MAF: 0.98% in the adjusted cohort [n = 4,028] and 0.85% in the full cohort [n = 5,469]; p value 6.57e−5, beta 1.14; conditional on rs2036100, p value 6.8e−4, beta 0.95; Figure S11). In contrast to the common variants at this locus, this variant was not associated with CFRD (p value 0.786, beta 0.016). This variant is not predicted to create an alternative Kozak sequence (ATGpr score 0.04).57 It is predicted to be deleterious by MetaSVM (score 0.324), MutationTaster (score 0.5116), SIFT (score 0.007), and PolyPhen-2 (score 0.5528) and benign by FATHMM XF (score 0.24), ClinPred (score 0.033), and PROVEAN (score 0.5422).
We tested whether the haplotype background of p.Val172Met might account for its association with MI. Eighty-four of the 93 M alleles in our full cohort (n = 5,469) occur on a common haplotype that is associated with MI but not CFRD (CFGP frequency including individuals with the V allele: 7.4%; Figure S12). Among the 624 individuals with at least one copy of this haplotype, copies carrying the p.Val172Met allele are associated with an even greater risk for MI (Fisher's exact test p value 8.7e−4). Thus, p.Val172Met remains associated with MI after accounting for haplotype background. The remaining nine individuals (one sibling pair and seven singletons) carry the M allele on four rare haplotypes (<1%; Figure S12).
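The haplotype comparison above relies on Fisher's exact test for a 2×2 table. Below is a minimal plain-Python sketch of the usual two-sided version, built on `math.comb`. It adds up the hypergeometric probabilities of every table with the observed margins that is no more likely than the observed table. The layout (for example, rows for haplotype copies with or without the M allele and columns for MI status) is only one way such a comparison could be arranged. The usage counts come from the textbook "tea tasting" example, not from this study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p value for the 2x2 table [[a, b], [c, d]]."""
    row1 = a + b        # first-row total (held fixed)
    col1 = a + c        # first-column total (held fixed)
    n = a + b + c + d   # grand total
    denom = comb(n, col1)

    def table_prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, given the margins
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = table_prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = (table_prob(x) for x in range(lo, hi + 1))
    # Small relative tolerance so tables tied with the observed one are not lost to rounding
    p = sum(q for q in probs if q <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


# Toy table: 3 of 4 cups classified correctly in each direction
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857 (= 34/70)
```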
p.Val172Met was associated with MI in the F508del homozygous subset (where F508del refers specifically to p.Phe508del in CFTR; n = 2,594; p value 4.8e−4, beta 1.08) and in individuals with one or two copies of F508del (n = 4,226; p value 8.3e−5, beta 0.99). Whether p.Val172Met affects MI in individuals with no copies of F508del is unclear because that subset is small (n = 185; p value 0.99). An interaction term analysis showed that the effect size of p.Val172Met did not differ significantly by F508del copy number (p value 0.58), so there is no evidence that F508del modifies the effect of p.Val172Met on MI. Importantly, the association of p.Val172Met with MI in the F508del homozygous subset indicates that SLC26A9 variation can influence MI risk even when little or no functional CFTR reaches the cell surface.
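The interaction analysis tests whether the odds ratio for p.Val172Met changes with F508del dosage. In the simplest setting (two strata, no covariates, no relatedness, no empty cells), the Wald test for the interaction term of a saturated logistic model is equivalent to comparing the two stratum-specific log odds ratios, each with Woolf's variance. The Python sketch below implements that simplified comparison. The study's own model (F508del copy number, plus whatever covariates its association framework included) is likely more elaborate and is not reproduced here, and the counts are hypothetical.

```python
from math import erfc, log, sqrt

# A stratum is a 2x2 table of counts, all assumed > 0:
# (carrier_cases, carrier_controls, noncarrier_cases, noncarrier_controls)
Table = tuple[int, int, int, int]


def log_odds_ratio(t: Table) -> tuple[float, float]:
    """Woolf estimate of the log odds ratio and its large-sample variance."""
    a, b, c, d = t
    return log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def interaction_test(stratum0: Table, stratum1: Table) -> tuple[float, float]:
    """Return (difference in log OR between strata, two-sided Wald p value)."""
    lor0, var0 = log_odds_ratio(stratum0)
    lor1, var1 = log_odds_ratio(stratum1)
    diff = lor1 - lor0               # equals the interaction coefficient in a saturated model
    z = diff / sqrt(var0 + var1)     # independent strata, so the variances add
    p = erfc(abs(z) / sqrt(2))       # equals 2 * (1 - Phi(|z|))
    return diff, p


# Hypothetical counts: equal odds ratios in both strata, so no interaction
print(interaction_test((10, 20, 30, 60), (12, 24, 36, 72)))  # (0.0, 1.0)
```

A large interaction p value, like the 0.58 reported above, means the data do not distinguish the stratum-specific effects. It does not prove that they are identical.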
The p.Val172Met variant was also detected independently in a separate gene-based analysis of aggregated rare variants (MAF < 1%) for association with MI (Figure S13). SLC26A9 was one of the most significantly associated genes (p value 2.77e−4). This association depended entirely on p.Val172Met and disappeared when p.Val172Met was removed (p value 0.52). Finally, preliminary data show that SLC26A9 carrying p.Val172Met has altered glycosylation when expressed in HEK293 cells. When co-expressed with wild-type CFTR (wtCFTR), it also produces less constitutive current and greater outward rectification in whole-cell patch clamp experiments (Dr. Carol Bertrand, personal communication). These preliminary findings support the conclusion that p.Val172Met alters SLC26A9 function.
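A gene-based rare-variant test collapses a gene's qualifying variants into a single score per individual. A single variant can therefore carry the entire signal, and the sensitivity check described above simply rebuilds that score without it. The Python sketch below covers only the collapsing step, implemented as a simple burden score (the count of qualifying minor alleles per individual). The association test that is then applied to the score is not shown. All variant IDs, MAFs, and genotypes are hypothetical.

```python
def burden_scores(genotypes, mafs, maf_threshold=0.01, exclude=()):
    """Per-individual count of minor alleles at qualifying rare variants.

    genotypes: dict of variant ID -> list of minor-allele counts (0, 1, 2), one per individual
    mafs: dict of variant ID -> cohort minor allele frequency
    exclude: variant IDs to drop, e.g. for a leave-one-variant-out sensitivity check
    """
    qualifying = [v for v in genotypes if mafs[v] < maf_threshold and v not in exclude]
    n = len(next(iter(genotypes.values())))
    return [sum(genotypes[v][i] for v in qualifying) for i in range(n)]


# Hypothetical toy data: six individuals, three variants in one gene
genotypes = {
    "var1": [1, 0, 0, 1, 0, 0],  # rare (MAF below 1%)
    "var2": [0, 0, 1, 0, 0, 0],  # rare
    "var3": [2, 1, 1, 0, 1, 0],  # common, so never enters the burden
}
mafs = {"var1": 0.005, "var2": 0.002, "var3": 0.30}

print(burden_scores(genotypes, mafs))                    # [1, 0, 1, 1, 0, 0]
print(burden_scores(genotypes, mafs, exclude={"var1"}))  # [0, 0, 1, 0, 0, 0]
```

When most of the nonzero burden scores come from one variant, as var1 does here, removing that variant leaves the gene-level score almost empty. A gene-level association that disappears on removal behaves the same way.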
Variants at three loci that were not previously identified as associated with MI or CFRD
Variants at two loci (CLPS/SLC26A8 and EMSY) were associated with MI, and variants at one locus (chr16p12.1) were associated with CFRD, at genome-wide significance for the first time. The most significant variant at CLPS/SLC26A8 lies in an intron of SLC26A8 (rs78387293). This variant is uncommon (MAF: 1.1%) and is not a known eQTL in any tissue (GTEx v.8). SLC26A8 is a testis-specific member of the SLC26 family,58 which also includes SLC26A9, whose locus harbored variants significantly associated with CFRD and MI. Both SLC26A8 and SLC26A9 have been shown to interact functionally with CFTR.59 Of note, CLPS p.Arg109Cys (c.325C>T; rs41270082) was the second most significantly associated variant at this locus (Figure 8). Although this variant is also not a significant eQTL in any tissue (GTEx v.8), it is predicted to be damaging by PROVEAN (score −4) and SIFT (score 0.002) and benign by MetaSVM (score −1.03), MutationTaster (score 1), PolyPhen-2 (score 0.0747), FATHMM XF (score 0.1707), and ClinPred (score 0.042).
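The in silico calls for CLPS p.Arg109Cys and for SLC26A9 p.Val172Met each follow from a tool's score and its cutoff. The Python sketch below applies commonly cited default cutoffs: SIFT < 0.05, PROVEAN ≤ −2.5, MetaSVM > 0, ClinPred ≥ 0.5, FATHMM-XF > 0.5, and PolyPhen-2 ≥ 0.447 (the HumVar "possibly damaging" lower bound). These cutoffs are assumptions and may not be the ones the study used. MutationTaster is left out because its reported score is the confidence of the predicted class rather than a graded damage score.

```python
# Assumed default cutoffs; True means the score is called damaging
CUTOFFS = {
    "SIFT": lambda s: s < 0.05,          # lower scores are more damaging
    "PROVEAN": lambda s: s <= -2.5,      # lower scores are more damaging
    "MetaSVM": lambda s: s > 0,
    "ClinPred": lambda s: s >= 0.5,
    "FATHMM-XF": lambda s: s > 0.5,
    "PolyPhen-2": lambda s: s >= 0.447,  # HumVar possibly (>= 0.447) or probably (>= 0.909) damaging
}


def call_variant(scores: dict[str, float]) -> dict[str, str]:
    """Label each tool's score as 'damaging' or 'benign' using CUTOFFS."""
    return {tool: "damaging" if CUTOFFS[tool](s) else "benign" for tool, s in scores.items()}


# Scores as reported in the text
clps_r109c = {"PROVEAN": -4, "SIFT": 0.002, "MetaSVM": -1.03,
              "PolyPhen-2": 0.0747, "FATHMM-XF": 0.1707, "ClinPred": 0.042}
slc26a9_v172m = {"MetaSVM": 0.324, "SIFT": 0.007, "PolyPhen-2": 0.5528,
                 "FATHMM-XF": 0.24, "ClinPred": 0.033, "PROVEAN": 0.5422}

for name, scores in [("CLPS p.Arg109Cys", clps_r109c), ("SLC26A9 p.Val172Met", slc26a9_v172m)]:
    print(name, call_variant(scores))
```

With these cutoffs, only PROVEAN and SIFT call p.Arg109Cys damaging. For p.Val172Met, MetaSVM, SIFT, and PolyPhen-2 call it damaging. Both results match the split reported in the text, which shows how much the verdict depends on which predictors are consulted.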
Figure 8.
LocusZoom plots for the CLPS locus
(A) LocusZoom plot for association with MI, adjusted with a covariate of CFRD Martingale residuals.
(B) LocusZoom plot for association with CFRD Martingale residuals, adjusted with a covariate of MI.
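Figure 8 adjusts the MI analysis for CFRD Martingale residuals. These residuals condense a censored time-to-CFRD outcome into one number per individual: the observed event indicator minus the number of events expected up to that individual's follow-up time. The minimal Python sketch below assumes a Cox model with no covariates. In that case the Breslow baseline cumulative hazard reduces to the Nelson-Aalen estimator. The study's actual model may include covariates, and the toy ages are hypothetical.

```python
def martingale_residuals(times, events):
    """Martingale residuals r_i = delta_i - H(t_i) for a covariate-free Cox model.

    times: follow-up time per individual (e.g., age at CFRD diagnosis or at last visit)
    events: 1 if CFRD was observed at that time, 0 if censored
    H(t) is the Nelson-Aalen cumulative hazard: the sum over event times t_j <= t of
    d_j / n_j, with d_j events at t_j and n_j individuals still at risk (time >= t_j).
    """
    increments = {}
    for tj in sorted({t for t, e in zip(times, events) if e}):
        d_j = sum(1 for t, e in zip(times, events) if e and t == tj)
        n_j = sum(1 for t in times if t >= tj)
        increments[tj] = d_j / n_j

    def cumulative_hazard(t):
        return sum(inc for tj, inc in increments.items() if tj <= t)

    # Positive residual: CFRD earlier than expected; negative: later onset or censored
    return [e - cumulative_hazard(t) for t, e in zip(times, events)]


# Hypothetical ages (years) and CFRD status for three individuals
print([round(r, 3) for r in martingale_residuals([10, 20, 30], [1, 1, 0])])  # [0.667, 0.167, -0.833]
```

In this covariate-free model the residuals always sum to zero, because the expected event counts add up to the observed total. Larger values therefore mark individuals whose diabetes began earlier than the cohort baseline would predict.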
The second genome-wide significant locus not previously identified for association with MI lies upstream of EMSY (e.g., rs2212434; MAF: 45.5%; Figure S14), which encodes a nuclear protein that forms a complex with BRCA2 and participates in the DNA damage response.60 The same variants are also associated with food allergy,61 ulcerative colitis versus rheumatoid arthritis,62 atopic dermatitis,63 and inflammatory skin disease.64 These associations raise intriguing possibilities for the otherwise poorly understood pathophysiology of MI. These variants are not significant eQTLs in any tissue (GTEx v.8).
Variants at the chr16p12.1 locus (e.g., rs71384892; MAF: 2.7%) were associated with CFRD at genome-wide significance in the unadjusted model and showed suggestive evidence of association in the adjusted model (Figure S15). Only three variants at this locus reached significance. There are no annotated protein-coding genes within 500 kb of these variants. The region does contain three non-coding RNA genes: AC009035.1 ∼45 kb downstream, AC002331.1 ∼75 kb upstream, and AC009158.1 ∼340 kb upstream. rs71384892 was not a significant eQTL in any tissue (GTEx v.8) and was not associated with T2D (p value 0.45).49
Variants at eight loci were suggestively associated with either MI or CFRD for the first time: four with MI (SLC37A3, DOCK1, MNAT1, BCAR1) and four with CFRD (PRAMEF19, chr5q34, TFAP2B, ZNF316) (Table S5; Figures S16 and S17). For each of these loci except ZNF316 (CFRD), all three cohorts contributed to the association with the same direction of effect (Table S5). Of note, the variants at BCAR1 (e.g., rs9972644) lie 68 kb downstream of variants previously associated with CFRD and T2D (e.g., rs72802342) and are not in LD with them (r2: 0.001).
Discussion
This study provides evidence that the genetic modification of CFRD involves pathways both shared with and distinct from those that modify MI. Some of the same genes and/or pathways therefore appear to be involved in these otherwise distinct complications of CF. Variants at SLC26A9 and CEBPB were associated with MI and CFRD risk in the same direction, whereas the PRSS1 variants that increase risk for MI were protective for CFRD. Furthermore, we identified loci not previously associated with each trait. Variants at three loci reached genome-wide significance (two for MI, one for CFRD), and variants at eight loci showed suggestive association (four for MI, four for CFRD).
Variants in a locus upstream of SLC26A9 and within its introns have previously been associated with both MI and CFRD. Including MI and CFRD as covariates in the association models further showed that this association is truly pleiotropic rather than a consequence of the correlation between the two traits. Interestingly, an uncommon missense variant in SLC26A9 was associated with MI only, whereas variants downstream of SLC26A9 that are eQTLs for PM20D1 were associated with CFRD only. This suggests that the variants upstream of and within SLC26A9 could influence MI by decreasing SLC26A9 expression and influence CFRD by decreasing PM20D1 expression. Hence, the pleiotropy at this locus could arise because the same variants affect adjacent but perhaps functionally distinct genes.
The candidate variants surrounding PRSS1 were associated with MI and CFRD, but with opposite directions of effect. The variants associated with higher risk of CFRD were also associated with higher risk of alcohol-related chronic pancreatitis.51 A variant in high LD with the risk variants at this locus lies in the PRSS1 promoter and affects PRSS1 expression.65 The allele that protects against pancreatitis and CFRD and increases risk of MI is associated with lower PRSS1 expression. Conversely, the allele that increases risk of CFRD and pancreatitis and protects against MI is associated with higher PRSS1 expression. Together, these associations suggest a unifying hypothesis. Increased PRSS1 expression would raise the risk of pancreatitis and of CFRD (earlier onset) through pancreatic autodigestion and/or inflammation. It would also lower the risk of MI through improved dissolution of luminal contents in the developing fetal intestine.
Variants in a locus surrounding CEBPB were previously associated with MI13 and CFRD,9 and these associations were confirmed in this study. These variants are also associated with T2D in the general population.49 CEBPB encodes a transcription factor that plays a crucial role in adipocyte differentiation, a process central to the regulation of metabolism, and hence could modify diabetes risk.66 The region also contains a non-coding gene, LINC01273, about which little is known.
Variants that are pleiotropic for both traits modify distinct complications of CF, which occur at different points in life, through at least three mechanisms: by modifying a simple biochemical pathway that leads to complex final phenotypes, by affecting adjacent but unrelated genes, or potentially by having different effects in different tissues.
Of particular interest among the newly identified loci harboring non-pleiotropic modifiers was a missense variant in CLPS that was associated with MI. CLPS encodes colipase, a protein cofactor required for optimal activity of pancreatic lipase, which is essential for dietary fat digestion.67 CLPS variation could therefore influence MI.
This study has several limitations. First, CFRD and MI status were defined from data available in source documents and/or the CF Foundation Patient Registry, which could be subject to data-entry error (S.D. Wood et al., 2007, Cystic Fibrosis Conf., abstract). Second, although this is a large sample for genetic studies of modifiers of a Mendelian disease, a much larger sample would have increased power to study these complex phenotypes and could have revealed additional loci associated with either trait. Third, many previously known loci were replicated in the individuals evaluated for association with CFRD or MI for the first time. However, many of the loci not previously identified have not yet been tested for replication in an independent population. Finally, this study focuses on single-nucleotide variants. Other classes of variation, such as structural variants, should be explored in future studies to further evaluate the genetic architectures of CFRD and MI.
Investigations of the differences and overlap between CFRD and MI hold the promise of delivering insight into overlapping and non-overlapping molecular etiologies of these two diseases.
Acknowledgments
The authors would like to thank the Cystic Fibrosis Foundation for the use of CF Foundation Patient Registry data to conduct this study. Additionally, we would like to thank the subjects, care providers, and clinic coordinators at CF centers throughout the United States for their contributions to the CF Foundation Patient Registry. The authors also thank Dr. Carol Bertrand (Department of Cell Biology, University of Pittsburgh) for sharing the results of her functional studies, Anh-Thu Ngoc Lam (McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University) for assistance in preparing the samples, and Deepti Jain (Department of Biostatistics, University of Washington) for her contributions in determining the filtering strategy for rare variant association analyses.
Support was provided by the NHLBI through the BioData Catalyst program (awards 1OT3HL142479-01, 1OT3HL142478-01, 1OT3HL142481-01, 1OT3HL142480-01, and 1OT3HL147154). Additional funding was provided by the Cystic Fibrosis Foundation (CUTTIN18XX1, BAMSHA18XX0, and KNOWLE18XX0).
Declaration of interests
The authors declare no competing interests.
Published: October 6, 2022
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2022.09.004.
Contributor Information
Scott M. Blackman, Email: sblackman@jhmi.edu.
CF Genome Project:
Melis A. Aksit, Michael J. Bamshad, Scott M. Blackman, Elizabeth Blue, Kati Buckingham, Jessica X. Chong, J. Michael Collaco, Garry R. Cutting, Hong Dang, Alice Eastman, Anna Faino, Paul J. Gallins, Ronald Gibson, Beth Godwin, William W. Gordon, Kurt Hetrick, Le Huang, Michael R. Knowles, Anh-Thu N. Lam, Hua Ling, Weifang Liu, Yun Li, Frankline Onchiri, Wanda K. O'Neal, Rhonda G. Pace, Kymberleigh Pagel, Mark Porter, Elizabeth Pugh, Karen S. Raraigh, Rebekah Mikeasky, Margaret Rosenfeld, Jonathan Rosen, Adrienne Stilp, Jaclyn R. Stonebraker, Quan Sun, Jia Wen, Fred A. Wright, Yingxi Yang, Peng Zhang, Yan Zhang, and Yi-Hui Zhou
Web resources
GTEx v.8, https://www.gtexportal.org/home/
LocusZoom, http://locuszoom.org/
Loss-Of-Function Transcript Effect Estimator, https://github.com/konradjk/loftee
OMIM, http://www.omim.org/
openCRAVAT, https://opencravat.org/
Data and code availability
The data generated during this study are available in the GWAS Catalog (https://www.ebi.ac.uk/gwas/), accession numbers GCST90104458, GCST90104459, GCST90104460, and GCST90104461.
References
- 1. Couce M., O'Brien T.D., Moran A., Roche P.C., Butler P.C. Diabetes mellitus in cystic fibrosis is characterized by islet amyloidosis. J. Clin. Endocrinol. Metab. 1996;81:1267–1272. doi: 10.1210/jcem.81.3.8772610.
- 2. Hull R.L., Westermark G.T., Westermark P., Kahn S.E. Islet amyloid: a critical entity in the pathogenesis of type 2 diabetes. J. Clin. Endocrinol. Metab. 2004;89:3629–3643. doi: 10.1210/jc.2004-0405.
- 3. Moran A., Becker D., Casella S.J., Gottlieb P.A., Kirkman M.S., Marshall B.C., Slovis B., CFRD Consensus Conference Committee. Epidemiology, pathophysiology, and prognostic implications of cystic fibrosis-related diabetes: a technical review. Diabetes Care. 2010;33:2677–2683. doi: 10.2337/dc10-1279.
- 4. Moran A., Dunitz J., Nathan B., Saeed A., Holme B., Thomas W. Cystic fibrosis-related diabetes: current trends in prevalence, incidence, and mortality. Diabetes Care. 2009;32:1626–1631. doi: 10.2337/dc09-0586.
- 5. Lewis C., Blackman S.M., Nelson A., Oberdorfer E., Wells D., Dunitz J., Thomas W., Moran A. Diabetes-related mortality in adults with cystic fibrosis. Role of genotype and sex. Am. J. Respir. Crit. Care Med. 2015;191:194–200. doi: 10.1164/rccm.201403-0576OC.
- 6. Bengtson C.D., He J., Kim M.D., Salathe M.A. Cystic fibrosis-related diabetes is associated with worse lung function trajectory despite ivacaftor use. Am. J. Respir. Crit. Care Med. 2021;204:1343–1345. doi: 10.1164/rccm.202104-1060LE.
- 7. Blackman S.M., Hsu S., Ritter S.E., Naughton K.M., Wright F.A., Drumm M.L., Knowles M.R., Cutting G.R. A susceptibility gene for type 2 diabetes confers substantial risk for diabetes complicating cystic fibrosis. Diabetologia. 2009;52:1858–1865. doi: 10.1007/s00125-009-1436-2.
- 8. Blackman S.M., Commander C.W., Watson C., Arcara K.M., Strug L.J., Stonebraker J.R., Wright F.A., Rommens J.M., Sun L., Pace R.G., et al. Genetic modifiers of cystic fibrosis-related diabetes. Diabetes. 2013;62:3627–3635. doi: 10.2337/db13-0510.
- 9. Aksit M.A., Pace R.G., Vecchio-Pagán B., Ling H., Rommens J.M., Boelle P.Y., Guillot L., Raraigh K.S., Pugh E., Zhang P., et al. Genetic modifiers of cystic fibrosis-related diabetes have extensive overlap with type 2 diabetes and related traits. J. Clin. Endocrinol. Metab. 2020;105:dgz102. doi: 10.1210/clinem/dgz102.
- 10. Carlyle B.E., Borowitz D.S., Glick P.L. A review of pathophysiology and management of fetuses and neonates with meconium ileus for the pediatric surgeon. J. Pediatr. Surg. 2012;47:772–781. doi: 10.1016/j.jpedsurg.2012.02.019.
- 11. Cystic Fibrosis Foundation. 2019. Cystic Fibrosis Foundation Patient Registry Annual Data Report.
- 12. Lin Y.C., Keenan K., Gong J., Panjwani N., Avolio J., Lin F., Adam D., Barrett P., Bégin S., Berthiaume Y., et al. Cystic fibrosis-related diabetes onset can be predicted using biomarkers measured at birth. Genet. Med. 2021;23:927–933. doi: 10.1038/s41436-020-01073-x.
- 13. Gong J., Wang F., Xiao B., Panjwani N., Lin F., Keenan K., Avolio J., Esmaeili M., Zhang L., He G., et al. Genetic association and transcriptome integration identify contributing genes and tissues at cystic fibrosis modifier loci. PLoS Genet. 2019;15:e1008007. doi: 10.1371/journal.pgen.1008007.
- 14. Sun L., Rommens J.M., Corvol H., Li W., Li X., Chiang T.A., Lin F., Dorfman R., Busson P.F., Parekh R.V., et al. Multiple apical plasma membrane constituents are associated with susceptibility to meconium ileus in individuals with cystic fibrosis. Nat. Genet. 2012;44:562–569. doi: 10.1038/ng.2221.
- 15. Hodgkin J. Seven types of pleiotropy. Int. J. Dev. Biol. 1998;42:501–505.
- 16. Vanscoy L.L., Blackman S.M., Collaco J.M., Bowers A., Lai T., Naughton K., Algire M., McWilliams R., Beck S., Hoover-Fong J., et al. Heritability of lung disease severity in cystic fibrosis. Am. J. Respir. Crit. Care Med. 2007;175:1036–1043. doi: 10.1164/rccm.200608-1164OC.
- 17. Corvol H., Blackman S.M., Boëlle P.Y., Gallins P.J., Pace R.G., Stonebraker J.R., Accurso F.J., Clement A., Collaco J.M., Dang H., et al. Genome-wide association meta-analysis identifies five modifier loci of lung disease severity in cystic fibrosis. Nat. Commun. 2015;6:8382. doi: 10.1038/ncomms9382.
- 18. Drumm M.L., Konstan M.W., Schluchter M.D., Handler A., Pace R., Zou F., Zariwala M., Fargo D., Xu A., Dunn J.M., et al. Genetic modifiers of lung disease in cystic fibrosis. N. Engl. J. Med. 2005;353:1443–1453. doi: 10.1056/NEJMoa051469.
- 19. Stonebraker J.R., Ooi C.Y., Pace R.G., Corvol H., Knowles M.R., Durie P.R., Ling S.C. Features of severe liver disease with portal hypertension in patients with cystic fibrosis. Clin. Gastroenterol. Hepatol. 2016;14:1207–1215.e3. doi: 10.1016/j.cgh.2016.03.041.
- 20. Bartlett J.R., Friedman K.J., Ling S.C., Pace R.G., Bell S.C., Bourke B., Castaldo G., Castellani C., Cipolli M., Colombo C., et al. Genetic modifiers of liver disease in cystic fibrosis. JAMA. 2009;302:1076–1083. doi: 10.1001/jama.2009.1295.
- 21. Polineni D., Piccorelli A.V., Hannah W.B., Dalrymple S.N., Pace R.G., Durie P.R., Ling S.C., Knowles M.R., Stonebraker J.R. Analysis of a large cohort of cystic fibrosis patients with severe liver disease indicates lung function decline does not significantly differ from that of the general cystic fibrosis population. PLoS One. 2018;13:e0205257. doi: 10.1371/journal.pone.0205257.
- 22. Treggiari M.M., Rosenfeld M., Mayer-Hamblett N., Retsch-Bogart G., Gibson R.L., Williams J., Emerson J., Kronmal R.A., Ramsey B.W., EPIC Study Group. Early anti-pseudomonal acquisition in young patients with cystic fibrosis: rationale and design of the EPIC clinical trial and observational study. Contemp. Clin. Trials. 2009;30:256–268. doi: 10.1016/j.cct.2009.01.003.
- 23. Raraigh K.S., Aksit M.A., Hetrick K., Pace R.G., Ling H., O'Neal W., Blue E., Zhou Y.H., Bamshad M.J., Blackman S.M., et al. Complete CFTR gene sequencing in 5,058 individuals with cystic fibrosis informs variant-specific treatment. J. Cyst. Fibros. 2021;21:463–470. doi: 10.1016/j.jcf.2021.10.011.
- 24. Knapp E.A., Fink A.K., Goss C.H., Sewall A., Ostrenga J., Dowd C., Elbert A., Petren K.M., Marshall B.C. The Cystic Fibrosis Foundation Patient Registry: design and methods of a national observational disease registry. Ann. Am. Thorac. Soc. 2016;13:1173–1179. doi: 10.1513/AnnalsATS.201511-781OC.
- 25. Blackman S.M., Deering-Brose R., McWilliams R., Naughton K., Coleman B., Lai T., Algire M., Beck S., Hoover-Fong J., Hamosh A., et al. Relative contribution of genetic and nongenetic modifiers to intestinal obstruction in cystic fibrosis. Gastroenterology. 2006;131:1030–1039. doi: 10.1053/j.gastro.2006.07.016.
- 26. Jun G., Flickinger M., Hetrick K.N., Romm J.M., Doheny K.F., Abecasis G.R., Boehnke M., Kang H.M. Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data. Am. J. Hum. Genet. 2012;91:839–848. doi: 10.1016/j.ajhg.2012.09.004.
- 27. Gogarten S.M., Sofer T., Chen H., Yu C., Brody J.A., Thornton T.A., Rice K.M., Conomos M.P. Genetic association testing using the GENESIS R/Bioconductor package. Bioinformatics. 2019;35:5346–5348. doi: 10.1093/bioinformatics/btz567.
- 28. Manichaikul A., Mychaleckyj J.C., Rich S.S., Daly K., Sale M., Chen W.M. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010;26:2867–2873. doi: 10.1093/bioinformatics/btq559.
- 29. Conomos M.P., Miller M.B., Thornton T.A. Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness. Genet. Epidemiol. 2015;39:276–293. doi: 10.1002/gepi.21896.
- 30. Zheng X., Levine D., Shen J., Gogarten S.M., Laurie C., Weir B.S. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics. 2012;28:3326–3328. doi: 10.1093/bioinformatics/bts606.
- 31. Zhu X., Li S., Cooper R.S., Elston R.C. A unified association analysis approach for family and unrelated samples correcting for stratification. Am. J. Hum. Genet. 2008;82:352–365. doi: 10.1016/j.ajhg.2007.10.009.
- 32. Chen H., Wang C., Conomos M.P., Stilp A.M., Li Z., Sofer T., Szpiro A.A., Chen W., Brehm J.M., Celedón J.C., et al. Control for population structure and relatedness for binary traits in genetic association studies via logistic mixed models. Am. J. Hum. Genet. 2016;98:653–666. doi: 10.1016/j.ajhg.2016.02.012.
- 33. Therneau T.M., Grambsch P.M., Fleming T.R. Martingale-based residuals for survival models. Biometrika. 1990;77:147–160.
- 34. Lange T., Hansen K.W., Sørensen R., Galatius S. Applied mediation analyses: a review and tutorial. Epidemiol. Health. 2017;39:e2017035. doi: 10.4178/epih.e2017035.
- 35. Choi Y., Chan A.P. PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics. 2015;31:2745–2747. doi: 10.1093/bioinformatics/btv195.
- 36. Sim N.L., Kumar P., Hu J., Henikoff S., Schneider G., Ng P.C. SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res. 2012;40:W452–W457. doi: 10.1093/nar/gks539.
- 37. Kim S., Jhong J.H., Lee J., Koo J.Y. Meta-analytic support vector machine for integrating multiple omics data. BioData Min. 2017;10:2. doi: 10.1186/s13040-017-0126-8.
- 38. Schwarz J.M., Rödelsperger C., Schuelke M., Seelow D. MutationTaster evaluates disease-causing potential of sequence alterations. Nat. Methods. 2010;7:575–576. doi: 10.1038/nmeth0810-575.
- 39. Adzhubei I.A., Schmidt S., Peshkin L., Ramensky V.E., Gerasimova A., Bork P., Kondrashov A.S., Sunyaev S.R. A method and server for predicting damaging missense mutations. Nat. Methods. 2010;7:248–249. doi: 10.1038/nmeth0410-248.
- 40. Rogers M.F., Shihab H.A., Mort M., Cooper D.N., Gaunt T.R., Campbell C. FATHMM-XF: accurate prediction of pathogenic point mutations via extended features. Bioinformatics. 2018;34:511–513. doi: 10.1093/bioinformatics/btx536.
- 41. Alirezaie N., Kernohan K.D., Hartley T., Majewski J., Hocking T.D. ClinPred: prediction tool to identify disease-relevant nonsynonymous single-nucleotide variants. Am. J. Hum. Genet. 2018;103:474–483. doi: 10.1016/j.ajhg.2018.08.005.
- 42. Pagel K.A., Kim R., Moad K., Busby B., Zheng L., Tokheim C., Ryan M., Karchin R. Integrated informatics analysis of cancer-related variants. JCO Clin. Cancer Inform. 2020;4:310–317. doi: 10.1200/CCI.19.00132.
- 43. Frankish A., Diekhans M., Ferreira A.M., Johnson R., Jungreis I., Loveland J., Mudge J.M., Sisu C., Wright J., Armstrong J., et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 2019;47:D766–D773. doi: 10.1093/nar/gky955.
- 44. Karczewski K.J., Francioli L.C., Tiao G., Cummings B.B., Alföldi J., Wang Q., Collins R.L., Laricchia K.M., Ganna A., Birnbaum D.P., et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–443. doi: 10.1038/s41586-020-2308-7.
- 45. Chen H., Huffman J.E., Brody J.A., Wang C., Lee S., Li Z., Gogarten S.M., Sofer T., Bielak L.F., Bis J.C., et al. Efficient variant set mixed model association tests for continuous and binary traits in large-scale whole-genome sequencing studies. Am. J. Hum. Genet. 2019;104:260–274. doi: 10.1016/j.ajhg.2018.12.012.
- 46. Loh P.R., Danecek P., Palamara P.F., Fuchsberger C., Reshef Y.A., Finucane H.K., Schoenherr S., Forer L., McCarthy S., Abecasis G.R., et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 2016;48:1443–1448. doi: 10.1038/ng.3679.
- 47. Lam A.T.N., Aksit M.A., Vecchio-Pagan B., Shelton C.A., Osorio D.L., Anzmann A.F., Goff L.A., Whitcomb D.C., Blackman S.M., Cutting G.R. Increased expression of anion transporter SLC26A9 delays diabetes onset in cystic fibrosis. J. Clin. Invest. 2020;130:272–286. doi: 10.1172/JCI129833.
- 48. Giambartolomei C., Vukcevic D., Schadt E.E., Franke L., Hingorani A.D., Wallace C., Plagnol V. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 2014;10:e1004383. doi: 10.1371/journal.pgen.1004383.
- 49. Mahajan A., Taliun D., Thurner M., Robertson N.R., Torres J.M., Rayner N.W., Payne A.J., Steinthorsdottir V., Scott R.A., Grarup N., et al. Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps. Nat. Genet. 2018;50:1505–1513. doi: 10.1038/s41588-018-0241-6.
- 50. GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 2013;45:580–585. doi: 10.1038/ng.2653.
- 51. Whitcomb D.C., LaRusch J., Krasinskas A.M., Klei L., Smith J.P., Brand R.E., Neoptolemos J.P., Lerch M.M., Tector M., Sandhu B.S., et al. Common genetic variants in the CLDN2 and PRSS1-PRSS2 loci alter risk for alcohol-related and sporadic pancreatitis. Nat. Genet. 2012;44:1349–1354. doi: 10.1038/ng.2466.
- 52. Vujkovic M., Keaton J.M., Lynch J.A., Miller D.R., Zhou J., Tcheandjieu C., Huffman J.E., Assimes T.L., Lorenz K., Zhu X., et al. Discovery of 318 new risk loci for type 2 diabetes and related vascular outcomes among 1.4 million participants in a multi-ancestry meta-analysis. Nat. Genet. 2020;52:680–691. doi: 10.1038/s41588-020-0637-y.
- 53. Yengo L., Sidorenko J., Kemper K.E., Zheng Z., Wood A.R., Weedon M.N., Frayling T.M., Hirschhorn J., Yang J., Visscher P.M., GIANT Consortium. Meta-analysis of genome-wide association studies for height and body mass index in ∼700000 individuals of European ancestry. Hum. Mol. Genet. 2018;27:3641–3649. doi: 10.1093/hmg/ddy271.
- 54. Schmitt A.D., Hu M., Jung I., Xu Z., Qiu Y., Tan C.L., Li Y., Lin S., Lin Y., Barr C.L., Ren B. A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Rep. 2016;17:2042–2059. doi: 10.1016/j.celrep.2016.10.061.
- 55. Long J.Z., Svensson K.J., Bateman L.A., Lin H., Kamenecka T., Lokurkar I.A., Lou J., Rao R.R., Chang M.R., Jedrychowski M.P., et al. The secreted enzyme PM20D1 regulates lipidated amino acid uncouplers of mitochondria. Cell. 2016;166:424–435. doi: 10.1016/j.cell.2016.05.071.
- 56. Kichaev G., Bhatia G., Loh P.R., Gazal S., Burch K., Freund M.K., Schoech A., Pasaniuc B., Price A.L. Leveraging polygenic functional enrichment to improve GWAS power. Am. J. Hum. Genet. 2019;104:65–75. doi: 10.1016/j.ajhg.2018.11.008.
- 57. Salamov A.A., Nishikawa T., Swindells M.B. Assessing protein coding region integrity in cDNA sequencing projects. Bioinformatics. 1998;14:384–390. doi: 10.1093/bioinformatics/14.5.384.
- 58. Lohi H., Kujala M., Makela S., Lehtonen E., Kestila M., Saarialho-Kere U., Markovich D., Kere J. Functional characterization of three novel tissue-specific anion exchangers SLC26A7, -A8, and -A9. J. Biol. Chem. 2002;277:14246–14254. doi: 10.1074/jbc.M111802200.
- 59. El Khouri E., Touré A. Functional interaction of the cystic fibrosis transmembrane conductance regulator with members of the SLC26 family of anion transporters (SLC26A8 and SLC26A9): physiological and pathophysiological relevance. Int. J. Biochem. Cell Biol. 2014;52:58–67. doi: 10.1016/j.biocel.2014.02.001.
- 60. Livingston D.M. EMSY, a BRCA-2 partner in crime. Nat. Med. 2004;10:127–128. doi: 10.1038/nm0204-127.
- 61. Marenholz I., Grosche S., Kalb B., Rüschendorf F., Blümchen K., Schlags R., Harandi N., Price M., Hansen G., Seidenberg J., et al. Genome-wide association study identifies the SERPINB gene cluster as a susceptibility locus for food allergy. Nat. Commun. 2017;8:1056. doi: 10.1038/s41467-017-01220-0.
- 62. Peyrot W.J., Price A.L. Identifying loci with different allele frequencies among cases of eight psychiatric disorders using CC-GWAS. Nat. Genet. 2021;53:445–454. doi: 10.1038/s41588-021-00787-1.
- 63. Paternoster L., Standl M., Waage J., Baurecht H., Hotze M., Strachan D.P., Curtin J.A., Bønnelykke K., Tian C., Takahashi A., et al. Multi-ancestry genome-wide association study of 21,000 cases and 95,000 controls identifies new risk loci for atopic dermatitis. Nat. Genet. 2015;47:1449–1456. doi: 10.1038/ng.3424.
- 64. Baurecht H., Hotze M., Brand S., Büning C., Cormican P., Corvin A., Ellinghaus D., Ellinghaus E., Esparza-Gordillo J., Fölster-Holst R., et al. Genome-wide comparative analysis of atopic dermatitis and psoriasis gives insight into opposing genetic mechanisms. Am. J. Hum. Genet. 2015;96:104–120. doi: 10.1016/j.ajhg.2014.12.004.
- 65. Boulling A., Sato M., Masson E., Génin E., Chen J.M., Férec C. Identification of a functional PRSS1 promoter variant in linkage disequilibrium with the chronic pancreatitis-protecting rs10273639. Gut. 2015;64:1837–1838. doi: 10.1136/gutjnl-2015-310254.
- 66. Guo L., Li X., Tang Q.Q. Transcriptional regulation of adipocyte differentiation: a central role for CCAAT/enhancer-binding protein (C/EBP) β. J. Biol. Chem. 2015;290:755–761. doi: 10.1074/jbc.R114.619957.
- 67. Lowe M.E. Structure and function of pancreatic lipase and colipase. Annu. Rev. Nutr. 1997;17:141–158. doi: 10.1146/annurev.nutr.17.1.141.