Summary
The integration of genomic data into health systems offers opportunities to identify genomic factors underlying the continuum of rare and common disease. We applied a population-scale haplotype association approach based on identity-by-descent (IBD) in a large multi-ethnic biobank to a spectrum of disease outcomes derived from electronic health records (EHRs) and uncovered a risk locus for liver disease. We used genome sequencing and in silico approaches to fine-map the signal to a non-coding variant (c.2784−12T>C) in the gene ABCB4. In vitro analysis confirmed the variant disrupted splicing of the ABCB4 pre-mRNA. Four of five homozygotes had evidence of advanced liver disease, and there was a significant association with liver disease among heterozygotes, suggesting the variant is linked to increased risk of liver disease in an allele dose-dependent manner. Population-level screening revealed the variant to be at a carrier rate of 1.95% in Puerto Rican individuals, likely as the result of a Puerto Rican founder effect. This work demonstrates that integrating EHR and genomic data at a population scale can facilitate strategies for understanding the continuum of genomic risk for common diseases, particularly in populations underrepresented in genomic medicine.
Keywords: population genetics, statistical genetics, identity-by-descent, liver disease, electronic health records, phenome-wide association studies, liver serum measures
Introduction
Genetic identification of monogenic disease historically relied on tracking the co-segregation of genomic segments and disease state through familial pedigrees, in a process known as linkage mapping.1,2 This approach is typically followed by localized sequencing to reveal the disease-causing variant and confirmatory functional studies in vitro or in animal models. This strategy was used successfully throughout the late 20th century to uncover thousands of loci underlying suspected, rare genetic disorders.3 More recently, next-generation sequencing technologies have led to the identification of the genetic etiology of disease through the direct sequencing of patient exomes and genomes in close pedigree structures.4 Genomic technologies have also been applied in health systems to uncover unknown pathogenic variants and streamline diagnosis5 and to refine our understanding of the penetrance and frequency of pathogenic variants at a population level.6 However, the preponderance of genome sequencing and genomic medicine research has been performed in populations of European descent, and there is a lag in genomic sequence data available for, and studies directed at, understanding monogenic disorders in non-European populations.7
The growth of large-scale biobanks linked to health systems data in recent years has opened avenues to uncovering the etiology of monogenic disorders.8 With some exceptions,9,10 the majority of genomic data generated in biobanks worldwide is on low-cost genotype arrays rather than genome sequencing, and many biobanks are designed for population-based recruitment rather than being disease or pedigree focused. However, by leveraging array data in population-based biobanks, it is possible to infer haplotypes of the genome that have been co-inherited identical-by-descent (IBD) from a recent common ancestor.11 Using this strategy, genealogical relationships can be captured locally along the genome among distantly or putatively unrelated members of a population; such relationships are particularly enriched in founder populations.12, 13, 14, 15 Identical-by-descent haplotypes have the potential to harbor rare alleles that are not directly ascertained on genotyping arrays, facilitating association mapping of rare variants even when they are not directly observed,12 or are too rare or population-private to be readily imputable with currently existing reference panels; this approach is known as IBD mapping (identity-by-descent mapping).15,16 This property of IBD makes it especially useful for rare variant-based associations in diverse and understudied founder populations, for which deep genome-sequencing datasets may not be available. Furthermore, previous studies have leveraged EHR data in concert with genomic data to demonstrate the ubiquity and potential under-recognition of monogenic forms of disease in patient populations.6,17 We previously applied IBD mapping to height in a Puerto Rican (PR) founder population in New York City and identified a monogenic variant underlying the skeletal disorder Steel syndrome,18 demonstrating the power of population-based strategies for elucidating monogenic disorders.
Here we expand our previous approach by systematically associating IBD haplotypes with the full spectrum of EHR-derived phenotypes in the large founder population of PR and PR-descent participants in the diverse, multi-ethnic BioMe biobank in New York City. We performed a phenome-wide association study (PheWAS)19 of identical-by-descent haplotypes under a recessive model in the PR founder population and identified a significant association between homologous IBD sharing at the locus 7q21.12 and severe liver disease. Fine-mapping of the identical-by-descent haplotypic region uncovered a rare variant in the gene ABCB4 (ABCB4: c.2784−12T>C; rs201498350; MIM: 171060), variants in which are known to play a causal role in multiple forms of hepatobiliary disease.20 In vitro analysis demonstrated that this variant disrupted splicing, leading to an ABCB4 protein product lacking exon 23. Manual chart review revealed evidence of severe liver disease in four of five homozygotes. We also investigated the impact of harboring one copy of c.2784−12T>C via a combination of PheWAS, analysis of liver function tests, and manual chart review, revealing both an elevation of serum liver enzyme levels and an increased risk of liver disease in heterozygotes. Furthermore, population-level screening revealed the variant to be common in PR (carrier rate of ∼1.9%) while rare (< 1%) in other global populations. These analyses provide a methodological framework for bridging statistical genetics and clinical genomics and demonstrate that EHR-embedded, population-level research can elucidate the continuum of genomic risk for liver disease.
Subjects and methods
BioMe Biobank
Study participants were recruited from the BioMe Biobank Program of The Charles Bronfman Institute for Personalized Medicine at Mount Sinai Medical Center from 2007 onward. The BioMe Biobank Program (Institutional Review Board 07–0529) operates under a Mount Sinai Institutional Review Board-approved research protocol. All study participants provided written informed consent.
Genotype data and quality control
Genotyping, quality control, and merging of array data across the OMNI and MEGA platforms were performed as described in detail in Vishnu et al.21 In brief, we performed standard quality control for variants based on missingness, heterozygosity, and Hardy-Weinberg equilibrium using PLINK v.1.9.22,23 We removed samples that were duplicated across both arrays and subset the data to the intersect of variants present on both platforms (n = 461,677 SNPs; n = 21,692 individuals). After subsequently removing palindromic sites with a missingness rate of >1%, this resulted in a total of 377,799 SNPs and 25,750 individuals for downstream analysis.
Haplotype phasing
Phasing was performed per chromosome with the EAGLE v.2.0.5 software24 using the genetic map (hg19) that is included in the EAGLE v.2.0.5 package.
An additional two individuals were excluded during the phasing process if they had a per chromosome level missingness rate of greater than 10% for any one autosome, leaving n = 25,748 individuals in total.
IBD Inference and quality control
Phased output from EAGLE was filtered to a MAF of ≥1% and converted to PLINK format using fcGENE.25 This was used as input for the GERMLINE algorithm.26 We ran GERMLINE over each autosome across all individuals simultaneously using the following flags: “-min_m 3 -err_hom 0 -err_het 2 -bits 25 -haploid.” For quality control, IBD segments that overlapped with low-complexity regions were excluded, along with IBD segments that fell within regions of excessive IBD sharing (which we defined as regions of the genome where the level of pairwise IBD sharing exceeded three standard deviations above the genome-wide mean).
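The excessive-sharing filter can be illustrated with a minimal sketch. It assumes IBD sharing has already been summarized as a depth (number of sharing pairs) per genomic window; the window definition, the function name `flag_excess_sharing`, and the use of the population standard deviation are illustrative assumptions rather than details given in the text.

```python
from statistics import mean, pstdev

def flag_excess_sharing(depth_by_window, n_sd=3.0):
    """Return indices of windows whose IBD sharing depth exceeds
    the genome-wide mean by more than n_sd standard deviations."""
    values = list(depth_by_window)
    cutoff = mean(values) + n_sd * pstdev(values)
    return [i for i, depth in enumerate(values) if depth > cutoff]

# Toy data (hypothetical): 50 windows with typical sharing and one extreme window.
depths = [100 + (i % 5) for i in range(50)] + [1000]
flagged = flag_excess_sharing(depths)
print(flagged)  # index of the single extreme window
```

IBD segments overlapping a flagged window would then be dropped before clustering.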
IBD-based clustering of Puerto Rican ancestry participants
We summed IBD haplotypes along the genome for all n = 25,748 participants and used these sums to construct an adjacency matrix in which each node represented a BioMe participant and each weighted edge represented the pairwise sum of IBD sharing between a given pair of individuals. After first excluding edges between pairs sharing ≥1,500 cM of their genome IBD, we employed the InfoMap algorithm27,28 as implemented in the iGraph package (R v.3.2.0) to uncover communities of individuals enriched for IBD sharing. We uncovered a community of N = 5,100 individuals who, based on self-reporting labels, we defined as the Puerto Rican ancestry identical-by-descent community going forward.
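The edge construction described above can be sketched as follows. This is a simplified illustration and not the authors' pipeline: the input format (one tuple per IBD segment) and the function name `build_ibd_graph` are assumptions, and the community-detection step (InfoMap) is not reproduced here.

```python
from collections import defaultdict

def build_ibd_graph(segments, max_total_cm=1500.0):
    """Sum IBD segment lengths (cM) per pair of individuals and
    drop pairs whose total sharing is at or above max_total_cm."""
    totals = defaultdict(float)
    for id1, id2, length_cm in segments:
        pair = tuple(sorted((id1, id2)))  # undirected edge
        totals[pair] += length_cm
    return {pair: total for pair, total in totals.items() if total < max_total_cm}

# Hypothetical segments: (individual 1, individual 2, segment length in cM).
segments = [
    ("P1", "P2", 4.0), ("P2", "P1", 6.5),      # distant relatives, 10.5 cM in total
    ("P3", "P4", 900.0), ("P3", "P4", 800.0),  # close relatives, 1,700 cM in total
    ("P1", "P3", 3.2),
]
edges = build_ibd_graph(segments)
print(edges)
```

The resulting weighted edge list is the kind of input a community-detection algorithm would consume.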
Phenome-wide association of IBD haplotypes
We first clustered IBD haplotypes inferred via GERMLINE into homologous cliques using the DASH29 advanced (dash_adv) algorithm across all BioMe participants, including the following additional parameters: “-win 250000 -r2 1.” We then extracted the Puerto Rican community (N = 5,100) from the DASH output and recoded individuals who were homozygous for a given IBD clique as “1” and those who were heterozygous or who were not members of the clique as “0.” We then used this as the primary predictor variable for an IBD-based phenome-wide association that was modeled using an implementation of the Saddle Point approximation30 (using the R package “SPAtest,” R v.3.2.0), with age and sex included as covariates. For each test, one individual from each pair of directly related individuals was excluded prior to association, preferentially excluding “controls” over “cases” for each ICD-9 code.
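The recessive recoding can be illustrated with a short sketch. It assumes clique membership is available as (sample, haplotype index) pairs, with haplotype index 0 or 1 for the two phased haplotypes of each individual; this representation and the function name `recessive_code` are hypothetical and do not reflect the actual DASH output format.

```python
def recessive_code(clique_haplotypes, sample_ids):
    """Code each sample as 1 if both of its haplotypes belong to the clique
    (homozygous), otherwise 0 (heterozygous or not a member)."""
    copies = {}
    for sample_id, haplotype_index in clique_haplotypes:
        copies.setdefault(sample_id, set()).add(haplotype_index)
    return {s: int(len(copies.get(s, ())) == 2) for s in sample_ids}

# Hypothetical clique: A carries it on both haplotypes, B on one, C on none.
members = [("A", 0), ("A", 1), ("B", 0)]
coding = recessive_code(members, ["A", "B", "C"])
print(coding)
```

Each clique yields one such 0/1 vector, which serves as the predictor in the association test.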
Whole-genome sequencing, variant calling, and annotation of identical-by-descent homozygotes
Alignment and variant calling of whole-genome sequence (WGS) data were performed using the pipeline provided by Linderman et al.31 Further variant annotation was performed using Variant Effect Predictor. These annotations were then intersected with the WGS data for the three homozygotes using an in-house Python script.
Phenome-wide association of ABCB4 c.2784−12T>C in heterozygous carriers
A phenome-wide association of ABCB4 c.2784−12T>C carrier status was conducted using the SAIGE software21,32 for a total of n = 4,903 Puerto Rican ancestry participants (homozygous individuals were excluded). ICD-9 billing codes served as the phenotypic outcome, and we included age, sex, and the first five principal components (PCs) as covariates, as well as a genetic relationship matrix (GRM) to account for relatedness. The association analysis was restricted to ICD-9 codes for which three or more affected individuals were present among carriers (n = 550 ICD-9 codes).
Association of ABCB4 c.2784−12T>C and liver enzymes
Outpatient values for nine laboratory tests for liver enzymes and liver function were extracted from EHRs. For each individual, the median value was taken for each trait. Individuals were stratified according to sex, and outliers that fell more than four standard deviations from the sex-specific population median were excluded. Sex-specific values were subsequently log-transformed and converted to z-scores (mean 0, standard deviation 1) before the data were recombined. These z-scores were then used as the phenotypic outcome in a linear model that included age as a covariate. Related individuals were excluded from the analysis, as were the five individuals who were homozygous for the ABCB4 c.2784−12T>C variant.
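The normalization steps above can be sketched in a few lines. This is an illustrative outline only: the record layout, the function name `sex_specific_z`, and the use of the population standard deviation are assumptions, and lab values are assumed to be strictly positive so that the log transform is defined.

```python
import math
from statistics import mean, median, pstdev

def sex_specific_z(records, n_sd=4.0):
    """records: iterable of (person_id, sex, lab_values); returns {person_id: z-score}."""
    medians_by_sex = {}
    for person_id, sex, values in records:
        # One summary value per person: the median of their outpatient results.
        medians_by_sex.setdefault(sex, {})[person_id] = median(values)

    z_scores = {}
    for sex, medians in medians_by_sex.items():
        vals = list(medians.values())
        centre, spread = median(vals), pstdev(vals)
        # Exclude people more than n_sd standard deviations from the sex-specific median.
        kept = {p: v for p, v in medians.items() if abs(v - centre) <= n_sd * spread}
        logs = {p: math.log(v) for p, v in kept.items()}
        log_vals = list(logs.values())
        mu, sd = mean(log_vals), pstdev(log_vals)
        # Standardize within sex, then pool both sexes in one dictionary.
        z_scores.update({p: (x - mu) / sd for p, x in logs.items()})
    return z_scores

# Hypothetical ALT-like values (U/L) for six people.
records = [
    ("a", "F", [20, 22, 24]), ("b", "F", [30]), ("c", "F", [45, 55]),
    ("d", "M", [25, 35]), ("e", "M", [40]), ("f", "M", [60, 62, 64]),
]
z = sex_specific_z(records)
```

The pooled z-scores would then be regressed on carrier status with age as a covariate.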
Association of ABCB4 c.2784−12T>C and liver disease
Manual chart review was performed by a physician blinded to the subject’s ABCB4 c.2784−12T>C carrier status. Subjects with viral hepatitis due to hepatitis C were excluded from further analyses. Text search was performed for “liver disease,” “fatty liver,” “NAFLD,” “fibrosis,” “steatosis,” “sclerosing cholangitis,” and “cirrhosis.” A review of all prior abdominal imaging was performed, specifically assessing for phrases such as “nodular” or “hyperechogenic” liver. If any of these searches yielded a positive result, then clinical notes, alcohol history, BMI, liver function tests, FibroScan results, and any liver biopsies were reviewed to establish the etiology and severity of the subject’s liver disease. A two-tailed Fisher’s exact test was performed to assess for associations between carrier status and the presence of any non-viral liver disease, and a p value of < 0.05 was considered significant.
Functional validation of ABCB4 c.2784−12T>C
We amplified (PrimeSTAR GXL DNA Polymerase, Takara Bio) and cloned a 4,340 bp ABCB4 genomic region from exon 22 to exon 24 into the pCR2.1-TOPO vector (TOPO TA cloning kit, Invitrogen) using the following forward and reverse primers: 5′-GCGATCGCC ATG GTG TCT TTG ACC CAG GAA AGA AA-3′ and 5′-ACG CGT AGA ACT GGC ATG TCC TAG AGC C-3′. Sequence-verified pCR2.1-TOPO with this fragment was used as a template to re-amplify the insert (PrimeSTAR GXL DNA Polymerase, Takara Bio) using the following forward and reverse primers: 5′-CAC TTG GCG ATC GCC ATG GTG TCT TTG ACC CAG GAA AGA A-3′ and 5′-GAT AAC ACG CGT AGA ACT GGC ATG TCC TAG AGC C-3′. The primers introduce a 5′ AsiSI/SgfI and a 3′ MluI restriction site (bold and underlined) that were used for cloning the fragment into the pCMV6-entry vector (Origene). The c.2784−12T>C variant was introduced using site-directed mutagenesis (Q5 Site-Directed Mutagenesis kit, NEB) with the following oligonucleotide primers: Q5-Fw 5′-AGTATACTGAcTTGCTTTTCAG-3′ (mutated nucleotide in lower case) and Q5-Rev 5′-TGTAACCATCTCTTCAGC-3′. The wild-type and variant pCMV6-ABCB4 were sequenced to confirm the absence and presence of the variant. Both vectors were transfected into HEK293 cells using Lipofectamine 2000. After 24 h, cells were lysed in QIAzol and RNA was isolated (RNeasy mini kit, QIAGEN). RNA was used for cDNA synthesis (SuperScript IV First-strand Synthesis System, Invitrogen), after which the splicing of exons 22–24 was studied using PCR. Because HEK293 cells express low levels of native ABCB4, we used the forward primer annealing in exon 22 used for cloning and a reverse primer on the MYC-DDK tag of the pCMV6 vector: DDK reverse 5′-CCT TAT CGT CGT CAT CCT TGT AAT CC-3′. All PCR fragments were Sanger sequenced to confirm their identity.
Results
Inference of IBD haplotypes in PR population
We previously inferred IBD sharing across BioMe and used IBD haplotypes to cluster individuals into communities linked by recent shared ancestry, as described in Belbin et al.33 By using this method, we identified a community of individuals of Puerto Rican (PR) ancestry and observed elevated IBD sharing within this group, suggestive of a founder effect. We clustered IBD haplotypes locally along the genome by homology and identified 4,526,956 homologous IBD-clusters within the PR population. Examining the frequency spectrum of these haplotypic alleles, we observed most to be rare (median haplotypic frequency = 0.04%, Figure S1). We hypothesized that we may be able to leverage these haplotypic alleles as proxies for unobserved rare variants in an association testing framework designed for discovery of monogenic recessive disorders (Figure 1).
Figure 1.
Framework for phenome-wide IBD-mapping in a health system
(A) Distant, cryptic genealogical relationships are present in large, putatively unrelated populations.
(B) This leads to the presence of genomic patterns of IBD haplotype sharing across distantly related individuals. These haplotypes have been co-inherited from a recent, shared common ancestor at some point in recent genealogical history. IBD can be inferred from phased genotype data and clustered into homologous groups.
(C) Different colored identical-by-descent haplotypes represent different groups of IBD that have each been co-inherited identically by multiple individuals and that can be clustered together according to homology.
(D) Phenome-wide IBD mapping performed via phenome-wide association (PheWAS) by extracting phenotypes from the electronic health records (EHR) in the form of ICD-9 billing codes, and systematically testing for association between all identical-by-descent haplotype clusters and all ICD-9 code outcomes.
Phenome-wide association of IBD haplotypes in Puerto Rican community
To systematically explore the relationship between haplotypic alleles and EHR-derived health outcomes, we performed a PheWAS under a recessive model implementing the Saddle Point Approximation, which accommodates rare observations and instances of extreme case-control imbalance. Because our method depends on leveraging cryptic relatedness, we applied our approach specifically within PR BioMe participants, on the basis of previous observations of a founder effect within this group.18 In our model, the haplotypic alleles served as the primary predictor variable and ICD-9 billing codes served as the outcome variable. We restricted analysis to 754 haplotypic alleles for which there were at least 3 observations of individuals that were homozygous for a shared IBD haplotype, and we systematically tested these for association against each ICD-9 code (n = 3,679,520 tests in total). Only one association achieved study-wide significance (SWS, threshold: p < 1.4 × 10⁻⁸), an association at a haplotypic allele at 7q21.12 (p < 2.9 × 10⁻⁹, haplotypic frequency = 0.7%) (Figure 2A). The significant haplotypic allele represented 3 individuals who were each homozygous for a homologous segment of IBD at the region, and all of whom had an EHR record of the rare ICD-9 code “571.6” (which encodes for biliary cirrhosis). While not study-wide significant, the haplotypic allele was also associated with the ICD-9 code “576.1” (which encodes for cholangitis; p < 9.9 × 10⁻⁸). In addition to the three individuals who were homozygous for the IBD haplotype at 7q21.12, n = 70 individuals carried the haplotype in the heterozygous state. The significant haplotypic allele spanned a large interval (minimum shared boundary: chr7:86,817,459–90,407,237) (Figure 2B) and contained 21 known genes.
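The reported number of tests and the study-wide threshold are consistent with a Bonferroni correction across every haplotype-by-code combination. The short check below assumes a family-wise alpha of 0.05 (not stated explicitly in the text) and uses the N = 4,880 ICD-9 codes given in the Figure 2 legend.

```python
# Values taken from the text; alpha = 0.05 is an assumption.
n_haplotypic_alleles = 754
n_icd9_codes = 4880
alpha = 0.05

n_tests = n_haplotypic_alleles * n_icd9_codes
threshold = alpha / n_tests  # Bonferroni-corrected significance threshold

print(n_tests)               # 3679520
print(f"{threshold:.1e}")    # 1.4e-08
```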
Figure 2.
Phenome-wide population-scale identical-by-descent mapping in Puerto Ricans
(A) PheWAS of recessive identical-by-descent haplotypes against N = 4,880 ICD-9 billing codes revealed an association between “biliary cirrhosis” and an identical-by-descent haplotype at 7q21.12. Red-dashed line represents the study-wide significance threshold. We also observed a suggestive association signal between an identical-by-descent haplotype at 9q32 and “other congenital deformity of the hip,” recapitulating the known signal for Steel syndrome in Puerto Ricans at that locus.
(B) The minimum shared boundary of the identical-by-descent haplotypes shared between all three homozygotes at 7q21.12 spans a highly genic mapping interval of 3.6 Mb.
Fine-mapping IBD-haplotypic signal uncovers a cryptic splice variant in ABCB4
To fine-map the signal, we performed whole-genome sequencing of all three homozygous carriers and characterized variants that fell within the minimum shared boundary of the haplotypic allele. Under the hypothesis that the causal variant would be rare, we filtered to retain only variants with a global minor allele frequency of <1% (in any population group from gnomAD or 1000 Genomes; Table S1). We identified a total of 195 variants that were shared in the homozygous state between all three individuals, none of which represented non-synonymous coding variation. We found 24 sites homozygous in all three individuals that were also present in ClinVar (Table S2). Intersecting this list with the allele frequency data, only one variant had a MAF of <1% across all population databases, a single nucleotide variant (rs201498350, GenBank: NM_000443.4; ABCB4: c.2784−12T>C); this variant had been asserted as “likely pathogenic” for “progressive familial intrahepatic cholestasis, type 3” (PFIC3; MIM: 602347) by a single submitter. The ABCB4 c.2784−12T>C variant has a CADD34 score of 15.9 and a SpliceAI35 score of 0.39 (interpreted as the probability of causing a splice acceptor loss). This variant occurs in a polypyrimidine tract 12 bp from the 3′ splice site of intron 22. The natural occurrence of ABCB4 mRNAs (GenBank: NM_018850.2) that lack exon 23 indicates that this splice site is weak and prone to exon skipping. This is further supported by our observation when examining HEK293 cells that ABCB4 cDNA fragments are expressed both with and without exon 23 (Figure S2). Skipping of exon 23 leads to a 141-bp deletion and likely encodes a non-functional protein due to the deletion of 47 amino acids (929 to 975), which encompasses the majority of transmembrane helix 11 and the last extracellular loop.
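The arithmetic behind the predicted consequence of exon 23 skipping can be checked directly. The sketch below uses only figures quoted in the text, Methods, and Figure 3 legend; the variable names are illustrative.

```python
# Exon 23 length and affected residues, as stated in the text.
exon23_bp = 141
first_residue, last_residue = 929, 975

in_frame = exon23_bp % 3 == 0            # a multiple of 3 keeps the reading frame
deleted_residues = exon23_bp // 3
residue_span = last_residue - first_residue + 1

# Minigene segments from the Figure 3 legend (bp): 3' end of exon 22,
# intron 22, exon 23, intron 23, and 5' end of exon 24.
construct_bp = 65 + 1583 + exon23_bp + 2500 + 51

print(in_frame, deleted_residues, residue_span, construct_bp)
```

Because the deletion is in frame, skipping removes 47 residues without shifting the downstream reading frame, and the segment lengths sum to the 4,340 bp cloned region described in the Methods.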
In vitro analysis of ABCB4 c.2784−12T>C indicates it causes increased skipping of exon 23
To test whether ABCB4 c.2784−12T>C affects splicing of exon 23, we cloned a genomic region of ABCB4 containing exons 22 to 24 in an expression vector and expressed this fragment in HEK293 cells (Figure 3). RT-PCR shows that the resulting pre-mRNA fragment is spliced into mRNA with and without exon 23. In this assay, the mRNA without exon 23 is more abundant than the mRNA with exon 23. Mutating the consensus T at the −12 position of intron 22 into the less-favored pyrimidine C further decreases splicing efficiency at this acceptor site. Mutating it to the purine G appears to prevent splicing completely. Our results show that the splice acceptor site of intron 22 is weak and that the c.2784−12T>C variant increases skipping of exon 23.
Figure 3.
ABCB4 c.2784−12T>C leads to increased skipping of exon 23
Schematic representation of the approach to study the effect of ABCB4 c.2784−12T>C (rs201498350) on ABCB4 splicing. A genomic region spanning exon 22 to exon 24 of ABCB4 was cloned into the pCMV6 expression vector using a forward and reverse primer as indicated. The construct harbors the 3′ 65 bp of exon 22, intron 22 (1,583 bp), exon 23 (141 bp), intron 23 (2,500 bp), and the 5′ 51 bp of exon 24. The location of rs201498350 in the polypyrimidine tract in the splice acceptor site is indicated. The consensus AG at the −2 and −1 position of the splice acceptor is bold and underlined. The rs201498350 variant as well as another mutation (mutant 2) were introduced into the pCMV6 vector followed by transfection into HEK293 cells (triplicate). The result of the RT-PCR with the Fw and Myc-DDK reverse primers is shown as the inverted colors of the ethidium bromide staining of a 2% agarose gel.
Clinical characterization of ABCB4 c.2784−12T>C in homozygotes
Subsequent to the discovery of the c.2784−12T>C variant, we obtained exome sequencing data for a larger dataset of unrelated BioMe participants (N = 28,344). This included N = 4,332 PR participants who were in the original discovery dataset, and N = 1,015 independent PR participants. Leveraging off-target exome sequencing reads in the independent dataset, we identified two additional participants who were homozygous for the c.2784−12T>C variant. A subject domain expert performed manual chart review of all five homozygotes. Evaluation of outpatient measures of serum liver enzyme levels and liver function tests revealed significant elevation of measures consistent with liver disease (Table 1). Four of the five homozygotes were found to have a diagnosis of cirrhosis on chart review, and the fifth had liver steatosis on imaging. The four homozygotes with cirrhosis each had a distinct etiology of their liver disease: alcohol-associated cirrhosis, primary sclerosing cholangitis, primary biliary cholangitis (with a possible component of alcohol-associated liver disease), and cryptogenic cirrhosis. Two had undergone liver transplant, and one was found to have an incidental hepatocellular carcinoma on explant.
Table 1.
Summary of outpatient liver enzymes and liver function tests for the five ABCB4 c.2784−12T>C homozygotes
| Serum measure (units, normal range) | Individual A, median (range) | A, n | Individual B, median (range) | B, n | Individual C, median (range) | C, n | Individual D, median (range) | D, n | Individual E, median (range) | E, n |
|---|---|---|---|---|---|---|---|---|---|---|
| GGT (U/L, 8–35) | 847 | 1 | 64.5 (50–127) | 8 | 265 (255–347) | 3 | N/A | 0 | 144 (81–180) | 12 |
| ALT (U/L, 1–45) | 17.5 (5–93) | 4 | 27.5 (18–52) | 18 | 43 (22–88) | 6 | N/A | 0 | 41 (21–78) | 28 |
| AST (U/L, 1–35) | 15.5 (11–186) | 4 | 68 (55–128) | 18 | 68 (55–128) | 6 | N/A | 0 | 64.5 (37–93) | 28 |
| ALP (U/L, 38–126) | 83 (66–390) | 4 | 554 (307–738) | 18 | 554 (307–738) | 6 | N/A | 0 | 161.5 (106–190) | 28 |
| Bilirubin (total, mg/dL, 0.1–1.2) | 1.25 (0.2–10.5) | 4 | 5.4 (0.2–6.6) | 18 | 5.4 (0.2–6.6) | 7 | N/A | 0 | 1.3 (0.9–2.5) | 31 |
| Bilirubin (direct, mg/dL, 0.0–0.8) | 1 (0–7.4) | 3 | 3 (1.1–4.1) | 8 | 3 (1.1–4.1) | 3 | N/A | 0 | 0.5 (0.4–1.1) | 18 |
| Platelet count (× 10³/μL, 150–450) | 199 (131–214) | 5 | 86 (53–103) | 20 | 86 (53–103) | 3 | 215 | 1 | 77 (65–103) | 21 |
| Albumin (g/dL, 3.5–4.9) | 3.35 (2.5–3.8) | 4 | 2.75 (2.1–3.2) | 18 | 2.75 (2.1–3.2) | 6 | N/A | 0 | 3.1 (2.6–3.6) | 31 |
| INR (international normalized ratio, 0.8–1.1) | 1.1 (1–1.3) | 3 | 1.85 (1.4–2.3) | 10 | 1.85 (1.4–2.3) | 2 | 1 | 1 | 1.2 (65–103) | 10 |
The median measure of each serum measure per homozygote along with 95% confidence intervals. Columns marked “n” denote the number of measures per individual available that contributed to the estimation of the median.
Clinical characterization of ABCB4 c.2784−12T>C in heterozygotes
Variation in ABCB4 is known to confer susceptibility to hepatobiliary disease via both autosomal-dominant (AD) and autosomal-recessive (AR) modes of inheritance, and with variation in severity of disease.36,37 To clinically characterize the c.2784−12T>C variant in heterozygotes, we identified via exome sequence data n = 73 PR participants in the original discovery dataset (of which n = 50 were carriers of the discovery IBD haplotype, which has 75% concordance with the causal variant; Table S3) and n = 11 in the independent dataset of PR participants, for a total of n = 84 PR heterozygotes. We compared this cohort to clinical data for n = 5,248 PR participants who did not harbor the c.2784−12T>C variant. To test for evidence of liver and other phenotypes in heterozygous carriers of ABCB4 c.2784−12T>C, we performed a PheWAS of ICD-9 codes. While no association achieved study-wide significance, the ICD-9 code “574.10,” which encodes for “calculus of gallbladder with other cholecystitis, without mention of obstruction,” was ranked second among all associations (p = 0.002; odds ratio = 7.1; SE = 1.9; Table 2). We also explored the relationship between ABCB4 c.2784−12T>C and nine outpatient serum liver enzyme levels and liver function tests in heterozygous carriers. We extracted these measures (Figure S3) and performed linear regression of ABCB4 c.2784−12T>C carrier status versus the nine laboratory measures, adjusting for age and sex (Table 3). Both alanine aminotransferase (ALT) and aspartate transaminase (AST) were significantly elevated among carriers (p < 0.0007 [beta = 0.39; SE = 0.21] and p < 0.002 [beta = 0.36; SE = 0.21], respectively) after adjusting for multiple testing (study-wide significance threshold p < 0.0056), while the association between ABCB4 c.2784−12T>C carrier status and elevated gamma-glutamyl transferase (GGT) achieved nominal significance (p < 0.03).
To follow up these findings, we further evaluated the association of ABCB4 c.2784−12T>C with liver disease phenotypes by performing manual chart review of 50 ABCB4 c.2784−12T>C carriers and 50 age-, sex-, and ancestry-matched non-carriers. We excluded 14 individuals with viral hepatitis from further analysis. Medical records from the remaining 43 carriers and 43 non-carriers were reviewed for evidence of any non-viral liver disease by a physician blinded to subjects’ ABCB4 c.2784−12T>C carrier status. A total of 18 of 43 carriers (41.9%) had evidence of liver disease, compared to 8 of 43 non-carriers (18.6%; p = 0.03, OR = 3.01). Together with the findings of advanced liver disease in homozygotes, this suggests that ABCB4 c.2784−12T>C is associated with increased risk of liver disease in an allele dose-dependent manner.
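The carrier versus non-carrier comparison can be reproduced approximately from the reported counts. The sketch below implements a two-sided Fisher's exact test in pure Python by summing the probabilities of all tables no more likely than the observed one; this is one common definition of the two-sided p value and is assumed here, since the text does not state which was used. The sample odds ratio computed this way (about 3.15) differs slightly from the reported 3.01, which may reflect a conditional maximum-likelihood estimate; that explanation is also an assumption.

```python
from math import comb

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n_tables = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k events in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / n_tables

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum every table at least as extreme (no more probable) than the observed one.
    return sum(prob(k) for k in range(lowest, highest + 1)
               if prob(k) <= p_observed * (1 + 1e-9))

a, b = 18, 25  # carriers with / without evidence of liver disease
c, d = 8, 35   # non-carriers with / without evidence of liver disease

p_value = fisher_two_sided(a, b, c, d)
sample_odds_ratio = (a * d) / (b * c)
print(f"p = {p_value:.3f}, sample OR = {sample_odds_ratio:.2f}")
```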
Table 2.
Phenome-wide association study of heterozygous carrier status of ABCB4 c.2784−12T>C
| ICD-9 Code | Beta | Standard Error | p Value | ICD-9 Translation | Disease Category |
|---|---|---|---|---|---|
| 788.62 | 5.69 | 1.73 | 0.00098 | slowing of urinary stream | genitourinary |
| 574.10 | 1.96 | 0.64 | 0.00222 | calculus of gallbladder with other cholecystitis, without mention of obstruction | digestive |
| 724.02 | 1.66 | 0.59 | 0.00465 | spinal stenosis of lumbar region | musculoskeletal |
| 396.3 | 5.20 | 1.85 | 0.00489 | mitral valve insufficiency and aortic valve insufficiency | circulatory system |
| 695.9 | 2.01 | 0.72 | 0.00537 | unspecified erythematous condition | dermatologic |
| E933.1 | 2.79 | 1.01 | 0.00547 | antineoplastic and immunosuppressive drugs causing adverse effects in therapeutic use | injuries & poisonings |
| 640.03 | 3.34 | 1.22 | 0.00607 | threatened abortion, antepartum | pregnancy complications |
| 680.9 | 3.43 | 1.25 | 0.00616 | carbuncle and furuncle of unspecified site | dermatologic |
| V81.0 | 4.63 | 1.72 | 0.00694 | screening for ischemic heart disease | N/A |
| 249.60 | 2.57 | 0.95 | 0.00704 | secondary diabetes mellitus with neurological manifestations, not stated as uncontrolled, or unspecified | endocrine/metabolic |
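The Beta column in Table 2 appears to be on the log-odds scale, so exponentiating it recovers the odds ratio quoted in the text for the gallbladder code. The short check below uses the tabulated beta and standard error for that row; the Wald 95% confidence interval is an illustrative addition and is not reported in the paper.

```python
import math

beta, se = 1.96, 0.64  # Table 2 row for calculus of gallbladder with other cholecystitis

odds_ratio = math.exp(beta)
exp_se = math.exp(se)  # matches the "SE = 1.9" quoted alongside the odds ratio

# Illustrative Wald 95% interval on the odds-ratio scale (assumption: normal approximation).
ci_low = math.exp(beta - 1.96 * se)
ci_high = math.exp(beta + 1.96 * se)

print(round(odds_ratio, 1), round(exp_se, 1))
print(round(ci_low, 1), round(ci_high, 1))
```

The wide interval is in keeping with the small number of carriers contributing to each code.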
Table 3.
Association study of heterozygous carrier status of ABCB4 c.2784−12T>C with nine outpatient serum measures
| Phenotype | Measures | Beta | p Value |
|---|---|---|---|
| Albumin | 4,309 | −0.01 | 0.956 |
| Alkaline Phosphatase | 4,347 | 0.19 | 0.091 |
| Alanine transaminase | 4,367 | 0.39 | 0.0007∗∗ |
| Aspartate transaminase | 4,288 | 0.36 | 0.002∗∗ |
| Direct bilirubin | 2,875 | 0.08 | 0.566 |
| Total bilirubin | 4,331 | 0.03 | 0.781 |
| Gamma-glutamyl transferase | 1,474 | 0.41 | 0.027∗ |
| International normalized ratio | 2,681 | −0.03 | 0.838 |
| Blood platelet count | 4,250 | −0.01 | 0.924 |
Single asterisk (∗) indicates nominal significance. Double asterisk (∗∗) indicates significance after Bonferroni correction.
Impact of PNPLA3 c.444C>G and ABCB4 c.2784−12T>C in PR populations
We also explored the joint impact of ABCB4 c.2784−12T>C and PNPLA3 c.444C>G, a globally common variant previously demonstrated to impact liver traits. The PNPLA3 c.444C>G risk allele is common in Hispanic/Latino (H/L) populations38,39 and has previously been demonstrated to be associated with the development of non-alcoholic fatty liver disease (NAFLD)40,41 as well as levels of both ALT and AST.42,43 For N = 4,253 PR ancestry individuals for whom complete phenotype and genotype information was available, the minor allele frequency of PNPLA3 c.444C>G was observed to be 33.7%. We examined Z-scores for ALT and AST stratified by genotype for both the ABCB4 and PNPLA3 variants within this subset and noted trends in increasing levels of both liver enzymes with increasing number of copies of the risk alleles at both loci (Figure S4). We also tested for an interaction between being heterozygous for ABCB4 c.2784−12T>C and the number of copies of PNPLA3 c.444C>G carried, but did not observe a significant interaction for either ALT levels (p = 0.28 for C/G and p = 0.92 for G/G) or AST levels (p = 0.25 for C/G and p = 0.82 for G/G), although this may be attributable to a lack of statistical power, given the small sample size of carriers within each stratum of PNPLA3 c.444C>G genotype status.
Population history and global distribution of ABCB4 c.2784−12T>C
Finally, to gain a better understanding of which populations may harbor the ABCB4 c.2784−12T>C risk variant, we leveraged complete survey information on ethnicity and geographical origin. One homozygote reported being born in Puerto Rico, while the remaining four self-reported being born on the US mainland. By exploring segregation based on self-reported country of birth and self-reported ethnicity, we observed that n = 44 carriers reported being born in PR (out of N = 2,251 PR-born individuals in total), suggesting a carrier rate of 1.95% in PR (Figure 4A). The remaining n = 42 carriers reported being born on the US mainland, with 40 self-identifying as Hispanic/Latino (carrier rate of 1.36%). Three carriers reported being born in the Dominican Republic, one reported being born in Barbados, and the remaining two self-identified as European American. Examination of local ancestry along the maximum shared boundary of IBD sharing between the three original homozygotes revealed all to be homozygous for European ancestry across the locus (Figure 4B). Additionally, the variant is present in 27 copies in the gnomAD (v.3.1)44,45 database, at a minor allele frequency of 0.16% among the “Latino/admixed American” population, and with a single copy being present in each of the “other,” “African/African American,” and “European (non-Finnish)” populations. We also identified a total of n = 165 carriers in the UK Biobank dataset, n = 163 of whom self-identified as “white,” while the remaining two did not report an ethnicity in the survey. Examining carriers by country of origin, we noted that the majority self-reported being born in European countries, with the highest carrier rate in Austria (0.5%) and the lowest in England (0.03%) (Table 4). Overall, this suggests that ABCB4 c.2784−12T>C is segregating at very low frequency in European populations and rose to higher frequency in the PR population due to a founder effect on the European ancestral background.46
Figure 4.
Minor allele frequency of ABCB4 c.2784−12T>C by geographic country of birth across n = 134 countries in n = 10,232 BioMe participants born outside of the United States
(A) Population screening of ABCB4 c.2784−12T>C based on self-reported country of birth reveals segregation to be geographically restricted to the Caribbean, with elevated frequency in Puerto Rico.
(B) Local ancestry across the maximum shared boundary of the three homozygotes identified through IBD sharing reveals all to be homozygous for European ancestry across the locus.
Table 4.
Minor allele count and carrier rate for n = 165 copies of ABCB4 c.2784−12T>C by country of birth in the UK Biobank
| Count | Country | Carrier rate | Total individuals |
|---|---|---|---|
| 98 | England | 0.026% (0.021%–0.032%) | 379,058 |
| 49 | Scotland | 0.125% (0.095%–0.166%) | 39,110 |
| 6 | Wales | 0.028% (0.013%–0.060%) | 21,563 |
| 3 | Republic of Ireland | 0.063% (0.021%–0.184%) | 4,785 |
| 2 | Northern Ireland | 0.067% (0.018%–0.243%) | 2,989 |
| 1 | Austria | 0.510% (0.090%–2.83%) | 196 |
| 1 | Brazil | 0.369% (0.065%–2.060%) | 271 |
| 1 | Canada | 0.137% (0.024%–0.771%) | 730 |
| 1 | France | 0.117% (0.021%–0.661%) | 853 |
| 1 | Isle of Man | N/A | 8 |
| 1 | New Zealand | 0.145% (0.025%–0.821%) | 686 |
| 1 | Poland | 0.157% (0.028%–0.885%) | 636 |
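The carrier rates in Table 4 are simple proportions (count divided by total individuals), each with a confidence interval. The interval method is not stated in this section; the sketch below assumes a 95% Wilson score interval, which gives values close to the reported ones for the rows shown. The function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(count, total, z=1.959964):
    """Return (rate, lower, upper) using the Wilson score interval (assumed method)."""
    p = count / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return p, center - half, center + half


# (country, carrier count, total individuals) copied from Table 4.
ROWS = [
    ("Scotland", 49, 39110),
    ("Republic of Ireland", 3, 4785),
    ("Poland", 1, 636),
]

for country, count, total in ROWS:
    rate, lower, upper = wilson_interval(count, total)
    print(f"{country}: {100 * rate:.3f}% ({100 * lower:.3f}%-{100 * upper:.3f}%)")
```

The Wilson interval is a reasonable choice for rows with very small counts, where the simpler normal approximation can produce a lower bound below zero. It also explains why the single-carrier rows in Table 4 have such wide upper bounds.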
Discussion
Here we demonstrate that by using IBD sharing to leverage distant genetic relationships in a patient population, and by linking this to a breadth of phenotypes derived from an EHR, it is possible to discover disease driven by monogenic variants segregating appreciably in a large founder population. By combining PheWAS, fine-mapping, and in silico approaches, we implicate a single causal splice variant in ABCB4. Individuals harboring two copies of this variant have evidence of severe liver disease, and there is evidence of both elevated serum liver enzyme levels and an increased risk of more moderate liver disease in those carrying one copy. We demonstrate that this variant is present on a background of European ancestry tracts, is rare or absent in non-PR populations, and does not appear to interact with another well-known liver trait-associated variant in PNPLA3. This work further highlights that PRs represent an understudied founder population with elevated levels of cryptic relatedness, making IBD-based approaches for genomic discovery especially powerful within this population.
We used molecular approaches to investigate the etiology of this finding. We fine-mapped the signal to a non-coding variant in ABCB4 (c.2784−12T>C). We demonstrated that this variant disrupts splicing in vitro, resulting in an mRNA lacking exon 23 and most likely encoding a non-functional protein product. ABCB4 encodes the ATP binding cassette subfamily B member 4 (ABCB4), also known as multi-drug resistance protein 3 (MDR3).47 The protein is expressed on the canalicular membrane of hepatocytes and is involved in the secretion of phosphatidylcholine,48 an essential component of bile, into the bile canaliculus. This role mitigates the potentially damaging effect of bile salt on the hepatocellular membrane.49,50 Mice homozygous for a knockout of the murine ortholog, Abcb4, exhibit hepatocellular inflammation and necrosis, as well as damage to the bile ducts.49 In humans, variation in ABCB4 has previously been implicated in numerous hepatobiliary and other liver-related phenotypes.51 Pathogenic variation in ABCB4 is causal for progressive familial intrahepatic cholestasis (PFIC) type 3 (MIM: 602347),52, 53, 54 a severe autosomal-recessive hepatobiliary disease that typically affects children and adolescents.37 Notably, ABCB4 has also previously been implicated in cryptogenic cirrhosis of the liver (MIM: 215600),55,56 as well as a range of milder phenotypes including intrahepatic cholestasis of pregnancy (MIM: 614972),57, 58, 59, 60 drug-induced liver injury,61 and low phospholipid-associated cholelithiasis (MIM: 600803).62 The range of Mendelian phenotypes associated with ABCB4 variation has been noted to follow both autosomal-dominant and autosomal-recessive modes of inheritance.
Furthermore, in large-scale population-based studies, variation in ABCB4 has been statistically associated with elevated risk for a number of liver-related phenotypes, including risk for non-alcoholic fatty liver disease,63 elevated serum liver enzyme levels,64,65 and risk for hepatobiliary carcinoma,66 suggesting that common variation in this gene may play a broader role in liver disease risk in the human population.
Our study also provides evidence of global prevalence rates and impact on health outcomes in individuals of Puerto Rican ancestry in the Mount Sinai Health System (MSHS) in New York City. A previous clinical study of PR individuals with PFIC noted a high prevalence of symptoms representative of ABCB4 deficiency and suggested an ABCB4 founder variant may contribute to the prevalence of PFIC in the PR population.67 We observe that ABCB4 c.2784−12T>C segregates at a carrier rate of ∼2% in PR ancestry individuals, while being rare or absent in non-Caribbean populations. It is plausible that this variant may be linked to the previously observed clinical symptoms, although further study would be required to demonstrate this. In the MSHS, the high carrier rate in PR suggests the existence of hundreds of homozygous individuals who may be at elevated risk for liver disease. In our study, we identified five homozygotes for ABCB4 c.2784−12T>C and found evidence of liver cirrhosis in four. The etiology of liver disease in each was noted to be different, which suggests that ABCB4 c.2784−12T>C could predispose to various forms of liver disease, and none of the four seemed to present with classic PFIC symptoms. We also noted significant elevation of liver enzyme levels in heterozygotes, as well as increased rates of liver diseases in heterozygotes compared to matched non-carriers. These findings are consistent with the known AD and AR inheritance of other ABCB4 variants and indicate 1:2,500 PR individuals may be at risk for severe liver disease.
Taken together, this work demonstrates the utility of genomic discovery in highly diverse health systems for uncovering a continuum of genomic risk for common diseases. Such approaches can be effective tools for accelerating genomic research and expanding genomic medicine applications in populations traditionally underrepresented in biomedical research. Cases like this also support the utility of early or preventive genetic testing to improve diagnosis and to enhance our understanding of prevalence and symptomatology in genomic medicine.
Acknowledgments
This work was supported in part through the computational resources and staff expertise provided by Scientific Computing at the Icahn School of Medicine at Mount Sinai. Research reported in this paper was supported by the Office of Research Infrastructure of the National Institutes of Health under award numbers R01HG011345, S10OD018522, and S10OD026880. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. We would also like to acknowledge BioMe participants for their contribution to this study.
Declaration of interests
N.S.A.-H. was previously employed by Regeneron Pharmaceuticals, has received an honorarium from Genentech, and serves on the Scientific Advisory Board for Allelica. E.E.K. has received speaker honoraria from Illumina and Regeneron Pharmaceuticals. C.R.G. owns stock in 23andMe, Inc. The remaining authors declare no competing interests.
Published: October 21, 2021
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2021.09.016.
Contributor Information
Gillian M. Belbin, Email: gillian.belbin@icahn.mssm.edu.
Eimear E. Kenny, Email: eimear.kenny@mssm.edu.
Data and code availability
The genotype and exome-sequencing datasets used in this study were generated by Regeneron and are not publicly available. The data will be made available for purposes of replicating the results upon request to the corresponding author, subject to appropriate collaboration and/or data sharing agreements.
Web resources
EAGLEv2.0.5 package, https://data.broadinstitute.org/alkesgroup/Eagle/downloads/Eagle_v2.0.tar.gz
GenBank, https://www.ncbi.nlm.nih.gov/genbank/
OMIM, https://www.omim.org/
References
- 1. Donahue R.P., Bias W.B., Renwick J.H., McKusick V.A. Probable assignment of the Duffy blood group locus to chromosome 1 in man. Proc. Natl. Acad. Sci. USA. 1968;61:949–955. doi: 10.1073/pnas.61.3.949.
- 2. McKusick V.A. Current trends in mapping human genes. FASEB J. 1991;5:12–20. doi: 10.1096/fasebj.5.1.1991580.
- 3. Claussnitzer M., Cho J.H., Collins R., Cox N.J., Dermitzakis E.T., Hurles M.E., Kathiresan S., Kenny E.E., Lindgren C.M., MacArthur D.G., et al. A brief history of human disease genetics. Nature. 2020;577:179–189. doi: 10.1038/s41586-019-1879-7.
- 4. Biesecker L.G., Green R.C. Diagnostic clinical genome and exome sequencing. N. Engl. J. Med. 2014;371:1170. doi: 10.1056/NEJMc1408914.
- 5. Turro E., Astle W.J., Megy K., Gräf S., Greene D., Shamardina O., Allen H.L., Sanchis-Juan A., Frontini M., Thys C., et al. NIHR BioResource for the 100,000 Genomes Project. Whole-genome sequencing of patients with rare diseases in a national health system. Nature. 2020;583:96–102. doi: 10.1038/s41586-020-2434-2.
- 6. Abul-Husn N.S., Manickam K., Jones L.K., Wright E.A., Hartzel D.N., Gonzaga-Jauregui C., O’Dushlaine C., Leader J.B., Lester Kirchner H., Lindbuchler D.M., et al. Genetic identification of familial hypercholesterolemia within a single U.S. health care system. Science. 2016;354:354. doi: 10.1126/science.aaf7000.
- 7. Popejoy A.B., Fullerton S.M. Genomics is failing on diversity. Nature. 2016;538:161–164. doi: 10.1038/538161a.
- 8. Abul-Husn N.S., Kenny E.E. Personalized Medicine and the Power of Electronic Health Records. Cell. 2019;177:58–69. doi: 10.1016/j.cell.2019.02.039.
- 9. Van Hout C.V., Tachmazidou I., Backman J.D., Hoffman J.D., Liu D., Pandey A.K., Gonzaga-Jauregui C., Khalid S., Ye B., Banerjee N., et al. Geisinger-Regeneron DiscovEHR Collaboration. Regeneron Genetics Center. Exome sequencing and characterization of 49,960 individuals in the UK Biobank. Nature. 2020;586:749–756. doi: 10.1038/s41586-020-2853-0.
- 10. Schwartz M.L.B., McCormick C.Z., Lazzeri A.L., Lindbuchler D.M., Hallquist M.L.G., Manickam K., Buchanan A.H., Rahm A.K., Giovanni M.A., Frisbie L., et al. A Model for Genome-First Care: Returning Secondary Genomic Findings to Participants and Their Healthcare Providers in a Large Research Cohort. Am. J. Hum. Genet. 2018;103:328–337. doi: 10.1016/j.ajhg.2018.07.009.
- 11. Browning S.R., Browning B.L. Identity by descent between distant relatives: detection and applications. Annu. Rev. Genet. 2012;46:617–633. doi: 10.1146/annurev-genet-110711-155534.
- 12. Browning S.R., Thompson E.A. Detecting rare variant associations by identity-by-descent mapping in case-control studies. Genetics. 2012;190:1521–1531. doi: 10.1534/genetics.111.136937.
- 13. Gauvin H., Moreau C., Lefebvre J.-F., Laprise C., Vézina H., Labuda D., Roy-Gagnon M.-H. Genome-wide patterns of identity-by-descent sharing in the French Canadian founder population. Eur. J. Hum. Genet. 2014;22:814–821. doi: 10.1038/ejhg.2013.227.
- 14. Thompson E.A. Identity by descent: variation in meiosis, across genomes, and in populations. Genetics. 2013;194:301–326. doi: 10.1534/genetics.112.148825.
- 15. Te Meerman G.J., Van der Meulen M.A., Sandkuijl L.A. Perspectives of identity by descent (IBD) mapping in founder populations. Clin. Exp. Allergy. 1995;25(Suppl 2):97–102. doi: 10.1111/j.1365-2222.1995.tb00433.x.
- 16. Houwen R.H., Baharloo S., Blankenship K., Raeymaekers P., Juyn J., Sandkuijl L.A., Freimer N.B. Genome screening by searching for shared segments: mapping a gene for benign recurrent intrahepatic cholestasis. Nat. Genet. 1994;8:380–386. doi: 10.1038/ng1294-380.
- 17. Bastarache L., Hughey J.J., Hebbring S., Marlo J., Zhao W., Ho W.T., Van Driest S.L., McGregor T.L., Mosley J.D., Wells Q.S., et al. Phenotype risk scores identify patients with unrecognized Mendelian disease patterns. Science. 2018;359:1233–1239. doi: 10.1126/science.aal4043.
- 18. Belbin G.M., Odgis J., Sorokin E.P., Yee M.-C., Kohli S., Glicksberg B.S., Gignoux C.R., Wojcik G.L., Van Vleck T., Jeff J.M., et al. Genetic identification of a common collagen disease in Puerto Ricans via identity-by-descent mapping in a health system. eLife. 2017;6:6. doi: 10.7554/eLife.25060.
- 19. Denny J.C., Ritchie M.D., Basford M.A., Pulley J.M., Bastarache L., Brown-Gentry K., Wang D., Masys D.R., Roden D.M., Crawford D.C. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics. 2010;26:1205–1210. doi: 10.1093/bioinformatics/btq126.
- 20. Reichert M.C., Lammert F. ABCB4 Gene Aberrations in Human Liver Disease: An Evolving Spectrum. Semin. Liver Dis. 2018;38:299–307. doi: 10.1055/s-0038-1667299.
- 21. Vishnu A., Belbin G.M., Wojcik G.L., Bottinger E.P., Gignoux C.R., Kenny E.E., Loos R.J.F. The role of country of birth, and genetic and self-identified ancestry, in obesity susceptibility among African and Hispanic Americans. Am. J. Clin. Nutr. 2019;110:16–23. doi: 10.1093/ajcn/nqz098.
- 22. Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M.A.R., Bender D., Maller J., Sklar P., de Bakker P.I.W., Daly M.J., Sham P.C. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007;81:559–575. doi: 10.1086/519795.
- 23. Chang C.C., Chow C.C., Tellier L.C., Vattikuti S., Purcell S.M., Lee J.J. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7. doi: 10.1186/s13742-015-0047-8.
- 24. Loh P.-R., Danecek P., Palamara P.F., Fuchsberger C., Reshef Y.A., Finucane H.K., Schoenherr S., Forer L., McCarthy S., Abecasis G.R., et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 2016;48:1443–1448. doi: 10.1038/ng.3679.
- 25. Roshyara N.R., Scholz M. fcGENE: a versatile tool for processing and transforming SNP datasets. PLoS ONE. 2014;9:e97589. doi: 10.1371/journal.pone.0097589.
- 26. Gusev A., Lowe J.K., Stoffel M., Daly M.J., Altshuler D., Breslow J.L., Friedman J.M., Pe’er I. Whole population, genome-wide mapping of hidden relatedness. Genome Res. 2009;19:318–326. doi: 10.1101/gr.081398.108.
- 27. Rosvall M., Bergstrom C.T. Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci. USA. 2008;105:1118–1123. doi: 10.1073/pnas.0706851105.
- 28. Rosvall M., Axelsson D., Bergstrom C.T. The map equation. Eur. Phys. J. Spec. Top. 2009;178:13–23.
- 29. Gusev A., Kenny E.E., Lowe J.K., Salit J., Saxena R., Kathiresan S., Altshuler D.M., Friedman J.M., Breslow J.L., Pe’er I. DASH: a method for identical-by-descent haplotype mapping uncovers association with recent variation. Am. J. Hum. Genet. 2011;88:706–717. doi: 10.1016/j.ajhg.2011.04.023.
- 30. Dey R., Schmidt E.M., Abecasis G.R., Lee S. A Fast and Accurate Algorithm to Test for Binary Phenotypes and Its Application to PheWAS. Am. J. Hum. Genet. 2017;101:37–49. doi: 10.1016/j.ajhg.2017.05.014.
- 31. Linderman M.D., Brandt T., Edelmann L., Jabado O., Kasai Y., Kornreich R., Mahajan M., Shah H., Kasarskis A., Schadt E.E. Analytical validation of whole exome and whole genome sequencing for clinical applications. BMC Med. Genomics. 2014;7:20. doi: 10.1186/1755-8794-7-20.
- 32. Zhou W., Nielsen J.B., Fritsche L.G., Dey R., Gabrielsen M.E., Wolford B.N., LeFaive J., VandeHaar P., Gagliano S.A., Gifford A., et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 2018;50:1335–1341. doi: 10.1038/s41588-018-0184-y.
- 33. Belbin G.M., Wenric S., Cullina S., Glicksberg B.S., Moscati A., Wojcik G.L., Shemirani R., Beckmann N.D., Cohain A., Sorokin E.P., et al. Towards a fine-scale population health monitoring system. Cell. 2021;184:2068–2083. doi: 10.1016/j.cell.2021.03.034.
- 34. Rentzsch P., Witten D., Cooper G.M., Shendure J., Kircher M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 2019;47(D1):D886–D894. doi: 10.1093/nar/gky1016.
- 35. Jaganathan K., Kyriazopoulou Panagiotopoulou S., McRae J.F., Darbandi S.F., Knowles D., Li Y.I., Kosmicki J.A., Arbelaez J., Cui W., Schwartz G.B., et al. Predicting Splicing from Primary Sequence with Deep Learning. Cell. 2019;176:535–548.e24. doi: 10.1016/j.cell.2018.12.015.
- 36. Sticova E., Jirsa M. ABCB4 disease: Many faces of one gene deficiency. Ann. Hepatol. 2020;19:126–133. doi: 10.1016/j.aohep.2019.09.010.
- 37. Stättermayer A.F., Halilbasic E., Wrba F., Ferenci P., Trauner M. Variants in ABCB4 (MDR3) across the spectrum of cholestatic liver diseases in adults. J. Hepatol. 2020;73:651–663. doi: 10.1016/j.jhep.2020.04.036.
- 38. Edelman D., Kalia H., Delio M., Alani M., Krishnamurthy K., Abd M., Auton A., Wang T., Wolkoff A.W., Morrow B.E. Genetic analysis of nonalcoholic fatty liver disease within a Caribbean-Hispanic population. Mol. Genet. Genomic Med. 2015;3:558–569. doi: 10.1002/mgg3.168.
- 39. Goran M.I., Walker R., Le K.-A., Mahurkar S., Vikman S., Davis J.N., Spruijt-Metz D., Weigensberg M.J., Allayee H. Effects of PNPLA3 on liver fat and metabolic profile in Hispanic children and adolescents. Diabetes. 2010;59:3127–3130. doi: 10.2337/db10-0554.
- 40. Kotronen A., Johansson L.E., Johansson L.M., Roos C., Westerbacka J., Hamsten A., Bergholm R., Arkkila P., Arola J., Kiviluoto T., et al. A common variant in PNPLA3, which encodes adiponutrin, is associated with liver fat content in humans. Diabetologia. 2009;52:1056–1060. doi: 10.1007/s00125-009-1285-z.
- 41. Romeo S., Kozlitina J., Xing C., Pertsemlidis A., Cox D., Pennacchio L.A., Boerwinkle E., Cohen J.C., Hobbs H.H. Genetic variation in PNPLA3 confers susceptibility to nonalcoholic fatty liver disease. Nat. Genet. 2008;40:1461–1465. doi: 10.1038/ng.257.
- 42. Romeo S., Sentinelli F., Dash S., Yeo G.S.H., Savage D.B., Leonetti F., Capoccia D., Incani M., Maglio C., Iacovino M., et al. Morbid obesity exposes the association between PNPLA3 I148M (rs738409) and indices of hepatic injury in individuals of European descent. Int. J. Obes. 2010;34:190–194. doi: 10.1038/ijo.2009.216.
- 43. Larrieta-Carrasco E., Acuña-Alonzo V., Velázquez-Cruz R., Barquera-Lozano R., León-Mimila P., Villamil-Ramírez H., Menjivar M., Romero-Hidalgo S., Méndez-Sánchez N., Cárdenas V., et al. PNPLA3 I148M polymorphism is associated with elevated alanine transaminase levels in Mexican Indigenous and Mestizo populations. Mol. Biol. Rep. 2014;41:4705–4711. doi: 10.1007/s11033-014-3341-0.
- 44. Koch L. Exploring human genomic diversity with gnomAD. Nat. Rev. Genet. 2020;21:448. doi: 10.1038/s41576-020-0255-7.
- 45. Karczewski K.J., Francioli L.C., Tiao G., Cummings B.B., Alföldi J., Wang Q., Collins R.L., Laricchia K.M., Ganna A., Birnbaum D.P., et al. Genome Aggregation Database Consortium. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–443. doi: 10.1038/s41586-020-2308-7.
- 46. Moreno-Estrada A., Gravel S., Zakharia F., McCauley J.L., Byrnes J.K., Gignoux C.R., Ortiz-Tello P.A., Martínez R.J., Hedges D.J., Morris R.W., et al. Reconstructing the population genetic history of the Caribbean. PLoS Genet. 2013;9:e1003925. doi: 10.1371/journal.pgen.1003925.
- 47. Lincke C.R., Smit J.J., van der Velde-Koerts T., Borst P. Structure of the human MDR3 gene and physical mapping of the human MDR locus. J. Biol. Chem. 1991;266:5303–5310.
- 48. van Helvoort A., Smith A.J., Sprong H., Fritzsche I., Schinkel A.H., Borst P., van Meer G. MDR1 P-glycoprotein is a lipid translocase of broad specificity, while MDR3 P-glycoprotein specifically translocates phosphatidylcholine. Cell. 1996;87:507–517. doi: 10.1016/s0092-8674(00)81370-7.
- 49. Smit J.J., Schinkel A.H., Oude Elferink R.P., Groen A.K., Wagenaar E., van Deemter L., Mol C.A., Ottenhoff R., van der Lugt N.M., van Roon M.A., et al. Homozygous disruption of the murine mdr2 P-glycoprotein gene leads to a complete absence of phospholipid from bile and to liver disease. Cell. 1993;75:451–462. doi: 10.1016/0092-8674(93)90380-9.
- 50. Morita S.-Y., Terada T. Molecular mechanisms for biliary phospholipid and drug efflux mediated by ABCB4 and bile salts. BioMed Res. Int. 2014;2014:954781. doi: 10.1155/2014/954781.
- 51. Davit-Spraul A., Gonzales E., Baussan C., Jacquemin E. The spectrum of liver diseases related to ABCB4 gene mutations: pathophysiology and clinical aspects. Semin. Liver Dis. 2010;30:134–146. doi: 10.1055/s-0030-1253223.
- 52. de Vree J.M., Jacquemin E., Sturm E., Cresteil D., Bosma P.J., Aten J., Deleuze J.F., Desrochers M., Burdelski M., Bernard O., et al. Mutations in the MDR3 gene cause progressive familial intrahepatic cholestasis. Proc. Natl. Acad. Sci. USA. 1998;95:282–287. doi: 10.1073/pnas.95.1.282.
- 53. Degiorgio D., Colombo C., Seia M., Porcaro L., Costantino L., Zazzeron L., Bordo D., Coviello D.A. Molecular characterization and structural implications of 25 new ABCB4 mutations in progressive familial intrahepatic cholestasis type 3 (PFIC3). Eur. J. Hum. Genet. 2007;15:1230–1238. doi: 10.1038/sj.ejhg.5201908.
- 54. Deleuze J.F., Jacquemin E., Dubuisson C., Cresteil D., Dumont M., Erlinger S., Bernard O., Hadchouel M. Defect of multidrug-resistance 3 gene expression in a subtype of progressive familial intrahepatic cholestasis. Hepatology. 1996;23:904–908. doi: 10.1002/hep.510230435.
- 55. Gotthardt D., Runz H., Keitel V., Fischer C., Flechtenmacher C., Wirtenberger M., Weiss K.H., Imparato S., Braun A., Hemminki K., et al. A mutation in the canalicular phospholipid transporter gene, ABCB4, is associated with cholestasis, ductopenia, and cirrhosis in adults. Hepatology. 2008;48:1157–1166. doi: 10.1002/hep.22485.
- 56. Ziol M., Barbu V., Rosmorduc O., Frassati-Biaggi A., Barget N., Hermelin B., Scheffer G.L., Bennouna S., Trinchet J.-C., Beaugrand M., Ganne-Carrié N. ABCB4 heterozygous gene mutations associated with fibrosing cholestatic liver disease in adults. Gastroenterology. 2008;135:131–141. doi: 10.1053/j.gastro.2008.03.044.
- 57. Wasmuth H.E., Glantz A., Keppeler H., Simon E., Bartz C., Rath W., Mattsson L.-A., Marschall H.-U., Lammert F. Intrahepatic cholestasis of pregnancy: the severe form is associated with common variants of the hepatobiliary phospholipid transporter ABCB4 gene. Gut. 2007;56:265–270. doi: 10.1136/gut.2006.092742.
- 58. Anzivino C., Odoardi M.R., Meschiari E., Baldelli E., Facchinetti F., Neri I., Ruggiero G., Zampino R., Bertolotti M., Loria P., Carulli L. ABCB4 and ABCB11 mutations in intrahepatic cholestasis of pregnancy in an Italian population. Dig. Liver Dis. 2013;45:226–232. doi: 10.1016/j.dld.2012.08.011.
- 59. Johnston R.C., Stephenson M.L., Nageotte M.P. Novel heterozygous ABCB4 gene mutation causing recurrent first-trimester intrahepatic cholestasis of pregnancy. J. Perinatol. 2014;34:711–712. doi: 10.1038/jp.2014.86.
- 60. Müllenbach R., Linton K.J., Wiltshire S., Weerasekera N., Chambers J., Elias E., Higgins C.F., Johnston D.G., McCarthy M.I., Williamson C. ABCB4 gene sequence variation in women with intrahepatic cholestasis of pregnancy. J. Med. Genet. 2003;40:e70. doi: 10.1136/jmg.40.5.e70.
- 61. Lang C., Meier Y., Stieger B., Beuers U., Lang T., Kerb R., Kullak-Ublick G.A., Meier P.J., Pauli-Magnus C. Mutations and polymorphisms in the bile salt export pump and the multidrug resistance protein 3 associated with drug-induced liver injury. Pharmacogenet. Genomics. 2007;17:47–60. doi: 10.1097/01.fpc.0000230418.28091.76.
- 62. Rosmorduc O., Poupon R. Low phospholipid associated cholelithiasis: association with mutation in the MDR3/ABCB4 gene. Orphanet J. Rare Dis. 2007;2:29. doi: 10.1186/1750-1172-2-29.
- 63. Vujkovic M., Ramdas S., Lorenz K.M., Schneider C.V., Park J., Lee K.M., Serper M., Carr R.M., Kaplan D.E., Haas M.E., et al. A genome-wide association study for nonalcoholic fatty liver disease identifies novel genetic loci and trait-relevant candidate genes in the Million Veteran Program. medRxiv. 2021 2020.12.26.20248491.
- 64. Kanai M., Akiyama M., Takahashi A., Matoba N., Momozawa Y., Ikeda M., Iwata N., Ikegawa S., Hirata M., Matsuda K., et al. Genetic analysis of quantitative traits in the Japanese population links cell types to complex human diseases. Nat. Genet. 2018;50:390–400. doi: 10.1038/s41588-018-0047-6.
- 65. Gudbjartsson D.F., Helgason H., Gudjonsson S.A., Zink F., Oddson A., Gylfason A., Besenbacher S., Magnusson G., Halldorsson B.V., Hjartarson E., et al. Large-scale whole-genome sequencing of the Icelandic population. Nat. Genet. 2015;47:435–444. doi: 10.1038/ng.3247.
- 66. Lammert F., Hochrath K. A letter on ABCB4 from Iceland: On the highway to liver disease. Clin. Res. Hepatol. Gastroenterol. 2015;39:655–658. doi: 10.1016/j.clinre.2015.08.004.
- 67. Soler D.M., Del Valle A.I., Fernandez-Lube D., Shneider B.L. Cross-Sectional Analysis of Progressive Familial Intrahepatic Cholestasis in Puerto Rican Children. P. R. Health Sci. J. 2016;35:220–223.