Abstract
Biobanks with exomes linked to electronic health records (EHRs) enable the study of genetic pleiotropy between rare variants and seemingly disparate diseases. We performed clinical phenotyping of carriers of rare, putatively deleterious variants (loss-of-function [LoF] and deleterious missense variants) in ERCC6, a gene implicated in inherited retinal disease. We analyzed 213,084 exomes, along with a targeted set of retinal, cardiac, and immune phenotypes, from two large-scale EHR-linked biobanks. In the primary analysis, a burden of deleterious variants in ERCC6 was strongly associated with 1) retinal disorders; 2) cardiac and electrocardiogram perturbations; and 3) immunodeficiency and decreased immunoglobulin levels. Meta-analysis of results from the BioMe Biobank and UK Biobank showed significant association of deleterious ERCC6 burden with retinal dystrophy (OR=2.6, 95% CI 1.5–4.6; P=8.7 × 10−4), atypical atrial flutter (OR=3.5, 95% CI 1.9–6.5; P=6.2 × 10−5), arrhythmia (OR=1.5, 95% CI 1.2–2.0; P=2.7 × 10−3), and lymphocyte immunodeficiency (OR=3.8, 95% CI 2.1–6.8; P=5.0 × 10−6). Carriers of ERCC6 LoF variants who lacked a diagnosis of these conditions more frequently had symptoms of them documented in the EHR than non-carriers, suggesting underdiagnosis. These results reveal a unique genetic link among retinal, cardiac, and immune disorders and underscore the value of EHR-linked biobanks in assessing the full clinical profile of carriers of rare variants.
Keywords: ERCC6, whole-exome sequencing, rare variant, pleiotropy, genotype-first diagnosis
The advent of high-throughput DNA sequencing and population biobanks has afforded an unprecedented opportunity to interrogate the impact of rare variants on disease (Cirulli et al., 2020; Dewey et al., 2016; Son et al., 2018). In biobanks, exomes of participants are linked to a rich set of phenotypes, including electronic health records (EHRs), laboratory results, and electrocardiogram (ECG) measurements (Bycroft et al., 2018; Swede et al., 2007). This provides an ideal setting for investigating the clinical consequences of rare alleles across a wide spectrum of medical phenotypes, which is not possible with conventional epidemiological case-control data (Kohane, 2011). Rare predicted loss-of-function (LoF) and deleterious missense variants in a gene can be collapsed into a single gene-level burden, which increases the statistical power to detect association with clinical traits compared with testing each rare variant individually (Lee et al., 2014; Park et al., 2020).
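As a minimal sketch of the collapsing step, assuming genotypes coded as alternate-allele counts (0, 1, 2, or None when missing) and a precomputed flag marking each variant as qualifying (rare and putatively deleterious), a per-person gene burden can be computed as the number of qualifying alleles carried, or reduced to a carrier indicator. The function names and data layout are illustrative assumptions, not the authors' code, and because the text does not state whether the burden was modeled as an allele count or a carrier flag, both forms are shown.

```python
from typing import List, Optional, Sequence


def gene_burden(
    genotypes: Sequence[Sequence[Optional[int]]],
    qualifying: Sequence[bool],
) -> List[int]:
    """Count qualifying alternate alleles per person (hypothetical helper).

    genotypes[i][j] is person i's alternate-allele count at variant j;
    missing calls (None) and non-qualifying variants add nothing.
    """
    burdens = []
    for person in genotypes:
        total = 0
        for count, keep in zip(person, qualifying):
            if keep and count is not None:
                total += count
        burdens.append(total)
    return burdens


def carrier_indicator(burdens: Sequence[int]) -> List[int]:
    """Collapse burden counts to a 0/1 flag for carrying >= 1 qualifying allele."""
    return [1 if b > 0 else 0 for b in burdens]


# Three people, four variants; the second variant is not qualifying.
geno = [[0, 1, 0, 0], [1, 2, 0, 1], [0, 0, None, 0]]
flags = [True, False, True, True]
burden = gene_burden(geno, flags)
print(burden, carrier_indicator(burden))
```

The resulting burden (or carrier flag) is then entered as the exposure variable in a regression on each clinical trait.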
Retinal disorders, such as age-related macular degeneration (AMD) and retinal dystrophies (RD), have a substantial genetic component (Inglehearn, 1998; Seddon et al., 2005). Several genes implicated in retinal disease show pleiotropy with extra-ocular disorders; for example, homozygous recessive variants in excision repair cross-complementing group 6 (ERCC6) cause RD as well as the multi-system disorder Cockayne syndrome B (CSB) (Licht et al., 2003). Yet clinical studies of rare variants implicated in retinal disease are scarce and have examined a limited range of phenotypes (Corton et al., 2013; Sardell et al., 2016). We therefore sought to uncover the genetic pleiotropy of rare variants in ERCC6 among ancestrally diverse individuals in the BioMe Biobank (BioMe) and UK Biobank (UKB). We selected a targeted set of retinal, cardiac, and immune traits to test a priori based on phenotypic overlap with CSB anomalies (Bailey et al., 2012; Muzaffar & Hussain, 2003; Nance & Berry, 1992).
METHODS
All individuals recruited to BioMe were patients from clinical practice sites in the Mount Sinai Health System. Informed consent was obtained from all participants for the storage of biological specimens, genetic sequencing, and access to EHR data. The study was approved by the Institutional Review Board of the Icahn School of Medicine at Mount Sinai and adhered to the tenets of the Declaration of Helsinki. Data from UKB, which is governed by its own ethics review process (https://www.ukbiobank.ac.uk/learn-more-about-uk-biobank/about-us/ethics), was used under Application Number 16218.
Identification of cases of retinal, cardiac, and immune diseases
Our study implemented a targeted approach to examine carriers of rare variants in ERCC6 for a set of clinical traits that phenotypically overlap with Mendelian disease. We selected eight retinal, cardiac, and immune phenotypes to test a priori based on their phenotypic intersection with inherited retinal disease and CSB (Bailey et al., 2012; Muzaffar & Hussain, 2003; Nance & Berry, 1992). These included two retinal disorders (AMD and RD), three cardiac phenotypes (arrhythmia, atypical atrial flutter, and QRS duration), and three immune phenotypes (lymphocyte immunodeficiency, IgM level, and IgG level). In BioMe, participants were identified as cases if they had a corresponding International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) diagnosis code in the EHR. In UKB, participants were identified as cases if they had a corresponding ICD-10 diagnosis code and self-report where available. A summary of cases and ICD-10 diagnosis codes for the BioMe and UKB cohorts is provided in Supp. Table S1. For quantitative traits, immunoglobulin levels were obtained for IgM (n=2,501) and IgG (n=2,489) in BioMe, while electrocardiogram (ECG) measurements of QRS duration were ascertained in both BioMe (n=15,384) and UKB (n=20,977).
Exome sequencing
The BioMe dataset comprised 28,877 individuals who underwent exome sequencing and passed quality control (QC) at the sample and variant levels. At the variant level, variant call files (VCFs) generated with Illumina v4 HiSeq 2500 sequencing contained 9,202,884 variants called in 31,250 individuals. The Goldilocks Filter (GF) was applied to the VCFs (Van Hout et al., 2020). For single nucleotide polymorphisms (SNPs), genotype calls (cells) with depth-normalized quality scores <3 or depth of coverage <7 were set to missing; for insertions and deletions (indels), calls with depth-normalized quality scores <5 or depth of coverage <10 were set to missing. Variant sites were then filtered on allele balance (AB), and sites at which no sample passed the AB cutoff were removed: SNP sites required ≥1 sample to carry an alternate AB ≥15%, and indel sites required ≥1 sample to carry an alternate AB ≥20%. Together, these filters removed 441,406 sites, leaving 8,761,478 variants after GF was applied. Next, 267,955 sites with missing genotypes for >2% of individuals were removed. AB was then calculated for biallelic SNPs, and 320,877 sites with AB <0.3 or >0.8 were removed, leaving 8,172,646 sites. Lastly, the dataset was restricted to the target regions of the exome capture platform (4,256,827 sites) and split into two file sets for biallelic and multiallelic sites (3,948,623 and 308,204 sites, respectively) because they required different QC procedures. At the sample level, 2,102 samples were removed for discordance between genetic sex and the sex listed in the manifest, relatedness to another sample up to the third degree, low coverage, contamination, low call rate, or duplication, leaving 29,148 individuals. The remaining individuals with complete demographic and clinical phenotype information (n=28,877) were included in downstream analyses.
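The variant-level filters described above can be expressed as a small set of predicates. The sketch below assumes a hypothetical per-sample record (`Call`) holding the alternate-allele count, a depth-normalized quality score, read depth, and alternate allele balance; it is not the Goldilocks Filter implementation itself. Two details the text leaves open are filled with labeled assumptions: any non-reference call may satisfy the site-level AB cutoff, and the site AB used for biallelic SNPs is taken as the mean over heterozygous calls. The target-region restriction is omitted.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Call:
    """One sample's genotype call at one site (hypothetical record layout)."""
    alt_count: Optional[int]   # alternate-allele count 0/1/2; None = missing
    qual_per_depth: float      # depth-normalized quality score
    depth: int                 # read depth for this sample at this site
    alt_balance: float         # fraction of reads supporting the alternate allele


# Genotype-level thresholds from the text: (min normalized quality, min depth)
GENOTYPE_MIN = {"snp": (3.0, 7), "indel": (5.0, 10)}
# Site-level: at least one sample must reach this alternate allele balance
SITE_MIN_AB = {"snp": 0.15, "indel": 0.20}


def mask_genotypes(calls: List[Call], kind: str) -> List[Call]:
    """Set calls below the quality or depth threshold to missing."""
    min_q, min_dp = GENOTYPE_MIN[kind]
    return [
        replace(c, alt_count=None)
        if c.qual_per_depth < min_q or c.depth < min_dp else c
        for c in calls
    ]


def passes_site_allele_balance(calls: List[Call], kind: str) -> bool:
    """Assumption: any non-reference, non-missing call can satisfy the cutoff."""
    return any(
        c.alt_count not in (None, 0) and c.alt_balance >= SITE_MIN_AB[kind]
        for c in calls
    )


def passes_missingness(calls: List[Call], max_missing: float = 0.02) -> bool:
    """Drop sites with missing genotypes for more than 2% of individuals."""
    n_missing = sum(c.alt_count is None for c in calls)
    return n_missing / len(calls) <= max_missing


def passes_het_allele_balance(calls: List[Call],
                              lo: float = 0.3, hi: float = 0.8) -> bool:
    """Biallelic SNPs: site AB (assumed mean over heterozygotes) in [lo, hi].

    Sites without heterozygous calls are kept (assumption).
    """
    hets = [c.alt_balance for c in calls if c.alt_count == 1]
    if not hets:
        return True
    return lo <= sum(hets) / len(hets) <= hi


def site_passes_qc(calls: List[Call], kind: str, biallelic_snp: bool) -> bool:
    """Apply the filters in the order described; target-region step omitted."""
    masked = mask_genotypes(calls, kind)
    return (
        passes_site_allele_balance(masked, kind)
        and passes_missingness(masked)
        and (not biallelic_snp or passes_het_allele_balance(masked))
    )
```

Keeping each filter as a separate predicate makes it straightforward to count how many sites each step removes, as reported above.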
Of the 200,643 UKB participants with exome sequences processed through the OQFE pipeline described elsewhere (Szustakowski et al., 2020), the 184,207 with genetic principal component and ICD-10 diagnosis data available were included. Briefly, exomes captured with the IDT xGen Exome Research Panel v1.0 yielded approximately 10 million variants within the target regions. SNPs with missingness >10% or a Hardy-Weinberg equilibrium test P<1 × 10−15 were removed. SNP genotypes with read depth <7 and indel genotypes with read depth <10 were set to missing. Samples with sex discordance, low coverage, contamination, low call rate, relatedness to another sample up to the third degree, duplication, or discordance between exome sequence variants and genotyping array calls were excluded.
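For illustration, the two UKB site filters can be sketched with a one-degree-of-freedom chi-square test of Hardy-Weinberg equilibrium computed from genotype counts. The text does not specify which HWE test the OQFE pipeline used (exact tests are common for rare variants), so the chi-square form and the helper names here are assumptions chosen because they need only the standard library.

```python
import math


def hwe_chisq_p(n_ref_hom: int, n_het: int, n_alt_hom: int) -> float:
    """One-df chi-square test of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_ref_hom + n_het + n_alt_hom
    p = (2 * n_ref_hom + n_het) / (2 * n)   # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic site: the test is undefined, so do not filter
    observed = (n_ref_hom, n_het, n_alt_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))


def ukb_site_fails(missing_fraction: float, hwe_p: float) -> bool:
    """Remove the site if missingness > 10% or HWE P < 1e-15 (thresholds from the text)."""
    return missing_fraction > 0.10 or hwe_p < 1e-15
```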
Annotation and identification of rare putatively deleterious variants for burden testing
In both BioMe and UKB, we identified rare (allele frequency <1%) putatively deleterious variants, comprising predicted LoF and deleterious missense variants. This allele frequency threshold has previously been used to enrich for rare, highly impactful variants (Do et al., 2015). Rare predicted LoF variants were identified using Variant Effect Predictor (VEP) annotations of frameshift, splice donor, splice acceptor, stop gained, stop lost, or start lost consequence (McLaren et al., 2016). Rare missense variants in ERCC6 were identified by a VEP missense annotation and then classified by their predicted deleterious effect on the ERCC6 protein using CADD (Rentzsch et al., 2019), MutationTaster (Schwarz et al., 2010), PolyPhen-2 HumDiv (Adzhubei et al., 2013), PolyPhen-2 HumVar (Adzhubei et al., 2013), and SIFT (Ng & Henikoff, 2003). To enrich for deleterious missense alleles, we considered only missense alleles annotated as deleterious by all five prediction algorithms (strictly deleterious) (Do et al., 2015; Purcell et al., 2014).
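The variant qualification rule can be sketched as a single classifier. The LoF set uses the VEP (Sequence Ontology) consequence terms corresponding to the categories listed above. The per-predictor cutoffs (for example, the CADD score threshold) and the source of the allele frequency are not given in the text, so each predictor's call is assumed to arrive as a precomputed boolean; the predictor keys and the rule that an LoF consequence takes precedence over a missense one are likewise assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional

# VEP consequence terms for the LoF categories named in the text
LOF_TERMS = frozenset({
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
    "stop_gained",
    "stop_lost",
    "start_lost",
})

# The five predictors named in the text; the keys are hypothetical labels
PREDICTORS = ("cadd", "mutationtaster", "polyphen2_hdiv", "polyphen2_hvar", "sift")


@dataclass
class AnnotatedVariant:
    consequences: FrozenSet[str]      # VEP consequence terms for the variant
    allele_frequency: float           # frequency used for the rarity filter
    deleterious: Dict[str, bool] = field(default_factory=dict)  # per-predictor call


def qualifying_class(v: AnnotatedVariant, max_af: float = 0.01) -> Optional[str]:
    """Return "lof", "missense", or None for a candidate burden variant."""
    if v.allele_frequency >= max_af:
        return None                   # not rare: AF must be < 1%
    if v.consequences & LOF_TERMS:
        return "lof"                  # assumption: LoF takes precedence
    if "missense_variant" in v.consequences and all(
        v.deleterious.get(name, False) for name in PREDICTORS
    ):
        return "missense"             # strictly deleterious: all five agree
    return None
```

A missing predictor call counts as not deleterious here, which keeps the missense class conservative.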
Statistical analysis and burden testing
In the primary analysis, we aggregated all rare LoF and deleterious missense variants (collectively, putatively deleterious variants) in ERCC6 for each carrier and tested this burden of putatively deleterious ERCC6 variants for association with retinal, cardiac, and immune traits. In a sensitivity analysis of variant types, a burden of LoF variants and a burden of missense variants in ERCC6 were assessed separately for association with the same phenotypes. In a subgroup analysis by ancestry, pleiotropy of a burden of putatively deleterious variants in ERCC6 was examined in four self-reported ancestry groups (European, African, Hispanic, and Other) in BioMe. For binary traits, logistic regression with Firth’s penalized likelihood was used to compute an odds ratio (OR) and 95% confidence interval (CI), adjusted for age, sex, BMI, and 10 genetic principal components (PCs). Firth’s approach reduces the small-sample bias of maximum likelihood estimates by penalizing the likelihood so that the first-order term in the asymptotic expansion of the bias is removed (Firth, 1993; Wang, 2014). For quantitative traits (i.e., ECG and immunoglobulin measurements), linear regression was used to calculate an effect size estimate (β) and standard error (SE), adjusted for age, sex, BMI, and 10 genetic PCs. A fixed-effects inverse-variance-weighted meta-analysis of regression results from BioMe and UKB was conducted using the meta R package (Schwarzer et al., 2015). To account for testing the burden of putatively deleterious ERCC6 variants against eight phenotypes in the primary analysis, we used a conservative Bonferroni-corrected significance threshold of P<6.3 × 10−3 (0.05/8). We also report nominal associations with P<0.05, as the phenotypes tested have a strong biological rationale.
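To illustrate Firth’s correction, the sketch below fits a Firth-penalized logistic regression by modified Fisher scoring with numpy, solving the modified score equations U*(β) = Σ x_i [y_i − π_i + h_i(1/2 − π_i)] = 0, where h_i are the diagonal elements of the weighted hat matrix. The source does not name the software used for this step; the step cap, tolerance, and Wald intervals are assumptions of this sketch (profile penalized-likelihood intervals are often preferred for Firth models).

```python
import numpy as np


def firth_logistic(X: np.ndarray, y: np.ndarray, max_iter: int = 100,
                   tol: float = 1e-8, max_step: float = 5.0):
    """Firth-penalized logistic regression by modified Fisher scoring.

    X: (n, p) design matrix that already contains an intercept column.
    y: (n,) array of 0/1 outcomes.
    Returns (beta, se); se are Wald standard errors from the inverse
    Fisher information at the final estimate.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = pi * (1.0 - pi)
        cov = np.linalg.inv(X.T @ (X * w[:, None]))
        # Diagonal of the weighted hat matrix: h_i = w_i * x_i^T (X^T W X)^-1 x_i
        h = w * np.einsum("ij,jk,ik->i", X, cov, X)
        # Firth's modified score adds h_i * (1/2 - pi_i) to each residual
        score = X.T @ (y - pi + h * (0.5 - pi))
        step = cov @ score
        largest = np.max(np.abs(step))
        if largest > max_step:        # cap the step for numerical stability
            step *= max_step / largest
        beta = beta + step
        if largest < tol:
            break
    pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = pi * (1.0 - pi)
    se = np.sqrt(np.diag(np.linalg.inv(X.T @ (X * w[:, None]))))
    return beta, se


def odds_ratio_ci(beta_j: float, se_j: float, z: float = 1.959964):
    """Odds ratio with a Wald 95% CI for one coefficient."""
    return np.exp(beta_j), np.exp(beta_j - z * se_j), np.exp(beta_j + z * se_j)
```

In a burden test, X would hold an intercept, the burden variable, and the covariates (age, sex, BMI, and 10 PCs), and the OR for the burden would be read from its coefficient. Unlike ordinary maximum likelihood, the estimate stays finite even when no non-carrier is a case (complete separation).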
Association of ERCC6 LoF burden with symptoms of underdiagnosed diseases
We examined the association of a burden of LoF variants in ERCC6 with clinical symptoms of three diseases (RD, atypical atrial flutter, and lymphocyte immunodeficiency) that showed strong associations in burden testing. Among individuals without a formal diagnosis of any of the three diseases, the prevalence of symptoms in carriers of ERCC6 LoF variants and in a random sample of 370 non-carriers was compared using Fisher’s exact test. We manually examined physician notes in the EHR for symptoms of the three diseases while blinded to ERCC6 carrier status: 1) RD symptoms included loss of peripheral vision (e.g., “tunnel vision”), nyctalopia, loss of color vision, photophobia (e.g., “glare” or “light sensitive”), and decreased visual acuity (e.g., “difficulty reading”); 2) atypical atrial flutter symptoms included palpitations (e.g., “heart pounding”), tachycardia (e.g., “fast heart beat”), dyspnea (e.g., “shortness of breath”), angina (e.g., “chest pain”), dizziness, light-headedness, and syncope; and 3) lymphocyte immunodeficiency symptoms included recurrent bacterial infection, pulmonary infection (e.g., pneumonia, bronchitis), sinus infection (e.g., chronic sinusitis), and gastrointestinal infection (e.g., campylobacter, giardia).
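The carrier versus non-carrier comparison is a 2×2 Fisher’s exact test. A minimal standard-library sketch of the two-sided version is shown below; it sums the hypergeometric probabilities of all tables with the observed margins that are no more probable than the observed table. The function name and row/column layout are illustrative assumptions, not the authors’ code.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: carriers, non-carriers. Columns: with symptoms, without symptoms.
    """
    row1 = a + b                 # carriers
    col1 = a + c                 # individuals with symptoms
    n = a + b + c + d
    total = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x symptomatic carriers given the margins
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # A small relative tolerance guards against floating-point ties
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

Exact integer binomial coefficients keep the probabilities accurate for the table sizes used here (a few hundred individuals).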
RESULTS
We included 28,877 individuals from BioMe and 184,207 individuals from UKB who had genotype and phenotype data and passed quality control. Both biobanks had fewer males than females (43% males in BioMe and 45% males in UKB), with a mean age of 58 years (standard deviation [SD]=18 years) in BioMe and 56 years (SD=8.1 years) in UKB (Table 1). BioMe consisted of individuals of diverse ancestries (33% Hispanic, 24% African, and 33% European) while UKB was predominantly composed of individuals of European ancestry (94%). Cases of retinal, cardiac, and immune disorders were present in both BioMe and UKB (Table 1; Supp. Table S1). A total of 985 (3.4%) individuals in BioMe and 3,694 (2.0%) individuals in UKB were carriers of putatively deleterious ERCC6 variants, of whom 12 in BioMe and 39 in UKB had more than one putatively deleterious allele, with a maximum of three putatively deleterious alleles in one carrier from UKB (Supp. Table S2 and Supp. Table S3).
Table 1.
Overview of demographic and clinical traits in the BioMe Biobank and UK Biobank.
Trait | BioMe (n=28,877) | UKB (n=184,207) |
---|---|---|
Male, n (%) | 12,338 (43) | 82,163 (45) |
Age, mean (SD) | 58 (18) | 56 (8.1) |
European, n (%) | 9,559 (33) | 173,648 (94) |
Hispanic, n (%) | 9,387 (33) | -- |
African, n (%) | 6,847 (24) | 1,904 (1.0) |
Other, n (%) | 3,023 (10) | 4,704 (2.6) |
BMI in kg/m2, mean (SD) | 28 (6.7) | 27 (4.8) |
Age-related macular degeneration, n (%) | 924 (3.2) | 3,136 (1.7) |
Retinal dystrophy, n (%) | 105 (0.36) | 62 (0.034) |
Arrhythmia, n (%) | 195 (0.68) | 1,573 (0.85) |
Atypical atrial flutter, n (%) | 62 (0.21) | 80 (0.043) |
Lymphocyte immunodeficiency, n (%) | 100 (0.35) | 58 (0.031) |
QRS duration in ms, mean (SD) | 91 (19) | 89 (14) |
IgM level in mg/dL, mean (SD) | 116 (105) | -- |
IgG level in mg/dL, mean (SD) | 1,316 (635) | -- |
Overview of demographic and clinical traits in the BioMe Biobank (BioMe) and UK Biobank (UKB). n, number; SD, standard deviation; ms, milliseconds; Other, includes Asian, Pacific Islander, Native American, and miscellaneous ancestries in BioMe and Asian and miscellaneous ancestries in UKB; --, not available (no individuals of Hispanic ancestry are in UKB, and immunoglobulin data are not available in UKB).
We performed a meta-analysis of the association of a burden of putatively deleterious variants in ERCC6 with the targeted set of pleiotropic phenotypes across BioMe and UKB. We first assessed a burden of putatively deleterious variants in ERCC6 for association with the retinal disorders RD and AMD (Figure 1). In the primary analysis, a burden of putatively deleterious ERCC6 variants was significantly associated with RD (OR=2.6, 95% CI 1.5–4.6; P=8.7 × 10−4) and nominally associated with AMD (OR=1.3, 95% CI 1.1–1.5; P=8.2 × 10−3). In the sensitivity analyses stratified by variant type, a burden of LoF ERCC6 variants was significantly and strongly associated with RD (OR=12, 95% CI 4.0–39; P=1.5 × 10−5) while a burden of deleterious missense variants in ERCC6 was significantly associated with AMD (OR=1.3, 95% CI 1.1–1.6; P=1.8 × 10−3).
Figure 1.
Meta-analysis of a burden of putatively deleterious ERCC6 variants on odds of retinal, cardiac, and immune disorders in the BioMe Biobank and UK Biobank.
Meta-analysis of a burden of putatively deleterious ERCC6 variants on odds of retinal, cardiac, and immune disorders in the BioMe Biobank and UK Biobank, adjusting for age, sex, BMI, and 10 genetic principal components. Putatively deleterious variants included loss-of-function (LoF) variants of frameshift, splice acceptor/donor, stop gained/lost, or start lost consequence, and deleterious missense (missense) variants. Only rare putatively deleterious variants with allele frequency <1% were included. Results of a fixed-effects inverse-variance-weighted meta-analysis across biobanks are shown as a forest plot, with the primary analysis of putatively deleterious variants depicted as five large boxes and the sensitivity analyses of LoF and missense variants nested underneath as pairs of smaller boxes. AMD, age-related macular degeneration; class, variant class (putatively deleterious, LoF, or missense); OR, adjusted odds ratio; 95% CI, 95% confidence interval.
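The pooled estimates in Figure 1 come from fixed-effects inverse-variance weighting, in which each cohort’s estimate is weighted by 1/SE². The authors used the meta R package; the Python sketch below only re-expresses the standard formula for illustration, and the helper that recovers a log-OR standard error from a reported OR and 95% CI is an added assumption useful when only summary statistics are available.

```python
import math


def log_or_and_se_from_ci(odds_ratio: float, lower: float, upper: float,
                          z: float = 1.959964):
    """Recover log(OR) and its standard error from a reported OR and 95% CI."""
    return math.log(odds_ratio), (math.log(upper) - math.log(lower)) / (2 * z)


def fixed_effects_meta(estimates, std_errors):
    """Inverse-variance-weighted fixed-effects pooling of per-cohort estimates.

    Estimates are log odds ratios (binary traits) or betas (quantitative traits).
    Returns the pooled estimate, its standard error, and a two-sided P value.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return pooled, pooled_se, p_value
```

For binary traits the pooled log-OR is exponentiated to give the meta-analytic OR; the larger cohort (here UKB) dominates whenever its standard errors are smaller.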
We then examined the burden of putatively deleterious ERCC6 variants for evidence of pleiotropy with two cardiac electrical disorders (Figure 1). Strikingly, a burden of putatively deleterious variants in ERCC6 was strongly associated with both atypical atrial flutter (OR=3.5, 95% CI 1.9–6.5; P=6.2 × 10−5) and arrhythmia (OR=1.5, 95% CI 1.2–2.0; P=2.7 × 10−3) in the primary analysis. One carrier of a stop gained variant in ERCC6 (NC_000010.11:g.49470632C>A) from BioMe was diagnosed with both AMD and arrhythmia; physician notes in their EHR documented unremitting blurred vision and retinal pigment epithelium (RPE) atrophy, along with multiple syncopal episodes accompanied by arrhythmia and tachycardia. We also investigated the relationship of ERCC6 with QRS duration on the ECG to quantify electrical disturbances that may contribute to these cardiac presentations. A burden of putatively deleterious ERCC6 variants was nominally associated with shorter QRS duration in the primary analysis (β=−1.7 ms, SE=0.73 ms; P=1.8 × 10−2), and a burden of deleterious missense variants in ERCC6 was significantly associated with shorter QRS duration in the sensitivity analysis (β=−1.5 ms, SE=0.52 ms; P=4.2 × 10−3) (Supp. Table S4). Publicly available expression data from Genotype-Tissue Expression (Lonsdale et al., 2013), BioGPS (Wu et al., 2009), and Human Integrated Protein Expression (Fishilevich et al., 2016) showed that ERCC6 RNA and protein are expressed in multiple tissues, including the heart (Supp. Figure S1), providing biological context for our findings of cardiac pleiotropy.
Next, we characterized the pleiotropy of a burden of putatively deleterious ERCC6 variants with immunological conditions. Immune dysfunction is another feature of CSB (Bailey et al., 2012; Muzaffar & Hussain, 2003; Nance & Berry, 1992), and lymphocytes express ERCC6 RNA and protein (Supp. Figure S1). We therefore interrogated the impact of a burden of putatively deleterious ERCC6 variants on lymphocyte immunodeficiency and observed a robust association in the primary analysis (OR=3.8, 95% CI 2.1–6.8; P=5.0 × 10−6) (Figure 1). In the sensitivity analysis, a burden of LoF variants in ERCC6 had an even stronger association with immunodeficiency (OR=24, 95% CI 9.9–56; P=9.0 × 10−13). One carrier of a frameshift variant in ERCC6 (NC_000010.11:g.49532919_49532926del) from BioMe had diagnoses of both retinal dystrophy and immunodeficiency; physician notes in their EHR documented retinal degeneration and hemorrhages, as well as recurrent and chronic infections such as systemic cellulitis and abscesses, osteomyelitis, pyelonephritis, cystitis, and tinea corporis. We then evaluated immunoglobulin levels in 2,489 participants (125 carriers of putatively deleterious ERCC6 variants; 2,364 non-carriers) from BioMe. In the primary analysis, a burden of putatively deleterious ERCC6 variants was significantly associated with lower immunoglobulin levels, including IgG (β=−204 mg/dL, SE=63 mg/dL; P=1.2 × 10−3) and IgM (β=−36 mg/dL, SE=13 mg/dL; P=3.7 × 10−3) (Supp. Table S4).
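For the quantitative traits (QRS duration, IgG, and IgM), β and SE come from covariate-adjusted linear regression. A minimal numpy sketch with classical (homoskedastic) standard errors is shown below; whether classical or robust standard errors were used is not stated in the text, so that choice is an assumption.

```python
import numpy as np


def ols_with_se(X: np.ndarray, y: np.ndarray):
    """Ordinary least squares with classical standard errors.

    X: (n, p) design matrix including an intercept column; y: (n,) outcomes.
    Returns (beta, se).
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)           # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # Var(beta_hat)
    return beta, np.sqrt(np.diag(cov))
```

With the burden in one column and age, sex, BMI, and the PCs in the others, the burden coefficient is the covariate-adjusted difference in the trait (for example, mg/dL of IgG) per unit of burden.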
We then performed a subgroup analysis in which we stratified the BioMe cohort into four self-reported ancestry groups (numbers of carriers of putatively deleterious variants: European, n=183; African, n=388; Hispanic, n=308; and Other, n=106) to examine ancestry-specific associations of putatively deleterious ERCC6 variation. The associations of a burden of putatively deleterious variants in ERCC6 with pleiotropic phenotypes in the ancestry subgroups were similar to those in the primary analysis (Supp. Table S5), albeit with smaller sample sizes and weaker statistical significance. In addition, we examined the carrier frequency of putatively deleterious ERCC6 variants in over 150 self-reported countries of origin in BioMe. The highest rates of carriers were observed in Caribbean countries, including Curacao (n=1/3 [33%] were carriers of putatively deleterious ERCC6 variants), the Bahamas (n=2/11 [18%]), Grenada (n=3/28 [11%]), and Antigua and Barbuda (n=3/33 [9.1%]). High carrier rates were also noted in African countries, including Cameroon (n=2/6 [33%]), Kenya (n=3/9 [33%]), and Mali (n=2/9 [22%]), as well as in South and Central American countries, such as Belize (n=4/40 [10%]), Uruguay (n=1/10 [10%]), and Panama (n=5/67 [7.5%]) (Figure 2; Supp. Table S6). These results underscore the importance of conducting rare variant association studies in multi-ancestry populations.
Figure 2.
Carrier rate of rare putatively deleterious variants in ERCC6 by self-reported countries of origin in the BioMe Biobank.
Carrier rate (percentage of individuals who have at least 1 copy of a variant) of rare putatively deleterious variants in ERCC6 in 62 out of 154 self-reported countries of origin in the BioMe Biobank. Putatively deleterious variants included loss-of-function variants of frameshift, splice acceptor/donor, stop gained/lost, or start lost consequence, and deleterious missense variants. Only rare putatively deleterious variants with allele frequency <1% were included. Participants with the self-reported country of origin of USA were classified further into African American, European American, Hispanic American, or Other (includes Asian, Native American, and miscellaneous ancestries). The complete list of 62 countries with carriers of putatively deleterious ERCC6 variants is tabulated in Supp. Table S6.
As clinical care does not currently consider an individual’s ERCC6 status, we assessed if carriers of ERCC6 LoF variants are underdiagnosed in BioMe. We identified individuals without a formal diagnosis of RD, atypical atrial flutter, and lymphocyte immunodeficiency (34 ERCC6 LoF carriers; 28,840 non-carriers). Symptoms of these three conditions documented in physician notes from the EHR were manually reviewed while blinded to ERCC6 carrier status and the prevalence of symptoms in carriers was compared to that in a random sample of 370 non-carriers (Figure 3). We observed a significant preponderance of symptoms for all three conditions among carriers of ERCC6 LoF variants compared to non-carriers. A total of 5/34 (15%) carriers versus 17/370 (4.6%) non-carriers exhibited symptoms of RD (P=0.031), with loss of peripheral vision and nyctalopia exclusively found in carriers. There were 11/34 (32%) carriers versus 61/370 (16%) non-carriers who showed symptoms of atypical atrial flutter (P=0.032), with palpitations and tachycardia predominant in carriers. Lastly, 11/34 (32%) carriers versus 56/370 (15%) non-carriers presented with symptoms of lymphocyte immunodeficiency (P=0.028), with recurrent bacterial infections resulting in bacteremia and sepsis primarily in carriers. Together, these data suggest significant underdiagnosis of pleiotropic conditions in carriers of rare ERCC6 LoF variants.
Figure 3.
Symptoms of retinal dystrophy, atypical atrial flutter, and lymphocyte immunodeficiency among carriers versus non-carriers of rare, predicted loss-of-function variants in ERCC6 in the BioMe Biobank.
Symptoms of retinal dystrophy (RD), atypical atrial flutter (AAF), and lymphocyte immunodeficiency (LI) among carriers versus non-carriers of rare, predicted loss-of-function variants in ERCC6 in the BioMe Biobank. Participants who lacked a formal diagnosis of RD, AAF, or LI were included in the analyses (34 ERCC6 LoF carriers; 370 non-carriers). The proportion of symptoms for each of the three diseases in carriers was compared with that in non-carriers using Fisher’s exact test. Symptoms of RD included loss of peripheral vision, nyctalopia, loss of color vision, photophobia, and decreased visual acuity. Symptoms of AAF included palpitations, tachycardia, dyspnea, angina, dizziness, light-headedness, and syncope. Symptoms of LI included recurrent bacterial infection, pulmonary infection, chronic sinusitis, and gastrointestinal infection (e.g., campylobacter, giardia). Cases (%), percent of individuals with symptom; all, individuals with any symptoms of the disease; *, Fisher’s exact test P<0.05.
DISCUSSION
Understanding how rare genetic dysfunction cascades into physiological changes and human disease is a major hurdle to delivering genomic medicine. A “genotype-first” approach can ultimately aid the diagnosis and management of individuals at elevated genetic risk for otherwise underdiagnosed diseases. Here, we found that among ancestrally diverse individuals, a burden of rare putatively deleterious variants in ERCC6 is significantly associated with a broad range of retinal, cardiac, and immune phenotypes similar to those in CSB (Bailey et al., 2012; Licht et al., 2003; Muzaffar & Hussain, 2003; Nance & Berry, 1992; Wilson et al., 2016). Our study presents a targeted strategy to robustly assess carriers of rare variants for a set of medical consequences that phenotypically overlap with Mendelian diseases.
We demonstrate that a burden of putatively deleterious variants in ERCC6 is associated with ocular, cardiac, and immune diseases, whereas past studies reported only the presentation of homozygous ERCC6 variants in CSB (Wilson et al., 2016). ERCC6 facilitates transcription, chromatin remodeling, and DNA repair in multiple tissues, and dysfunction of these processes may explain the phenotypic heterogeneity of CSB (Licht et al., 2003). Our results suggest that individuals with heterozygous ERCC6 LoF variants may likewise present with a pleiotropic constellation of clinical disorders resembling CSB. Interestingly, reports of the retinal pigment epithelium, which is central to RD, pulsating in vitro (Okubo et al., 2008) and, more recently, of an epigenetic switch of embryonic cell fates from neuroectoderm to cardiogenic mesoderm (Li et al., 2020) suggest a potential biological link between ocular and cardiac phenotypes.
With this pipeline, we draw several conclusions regarding the pleiotropy of ERCC6. First, a burden of putatively deleterious variants in ERCC6 contributes significantly to the risk of cardiac electrical disorders, namely atypical atrial flutter and arrhythmia. ERCC6 expression in the heart and shorter QRS duration in carriers of putatively deleterious ERCC6 variants indicate that ERCC6 LoF may manifest as irregular cardiac electrophysiology and rhythms. Second, a burden of putatively deleterious variants in ERCC6 significantly increases the risk of lymphocyte immunodeficiency. ERCC6 expression in T lymphocytes and plasma cells, along with lower IgM and IgG concentrations in carriers of putatively deleterious ERCC6 variants, suggests that damaging ERCC6 variation may impair lymphocyte function and antibody production. Third, Hispanic, African, and European ancestry carriers of putatively deleterious ERCC6 variants were affected by cardiac and immune diseases, with carrier rates highest among individuals from Caribbean, African, and Central and South American countries. These data emphasize the importance of including diverse populations, beyond solely European ancestries, in exome studies. Fourth, carriers of ERCC6 LoF variants who lacked a formal diagnosis of RD, atypical atrial flutter, or immunodeficiency had a higher prevalence of symptoms of these pleiotropic conditions, suggesting that they are underdiagnosed.
We note several study limitations. Multiple traits were tested; however, the traits were selected a priori on the basis of biological evidence for pleiotropy, a conservative Bonferroni correction was applied, and results from two large-scale biobanks were meta-analyzed. In addition, we cannot exclude the possibility that case misclassification from ICD-10 diagnosis codes biased our results; however, we found evidence for pleiotropy across two independent cohorts and in quantitative traits (QRS duration, IgG, and IgM). We manually reviewed symptoms common in RD, atypical atrial flutter, and immunodeficiency to assess underdiagnosis of carriers. Although some symptoms are non-specific (e.g., decreased visual acuity), carriers of ERCC6 LoF variants exhibited a higher proportion of corresponding symptoms for all three conditions, making it unlikely that non-specific symptoms alone explain the difference. We also note the difference in ascertainment between UKB, which is composed predominantly of healthy volunteers, and BioMe, which consists of individuals recruited from the Mount Sinai Health System and therefore has a higher disease prevalence (Supp. Table S1). Low case numbers for some of the diseases tested raise the chance of a spurious association; however, multiple biologically relevant pleiotropic associations were observed across two independent cohorts, making this unlikely.
Overall, these data revealed strong associations of a burden of putatively deleterious variants in ERCC6 with clinically relevant phenotypes, warranting further study of ERCC6 genotype in the clinical diagnosis and risk assessment of retinal, cardiac, and immunologic disorders. We used large-scale exome sequencing in tandem with deep EHR phenotyping to identify novel relationships between an important DNA-repair gene and clinical phenotypes spanning multiple systems. This constellation of phenotypes in ERCC6 heterozygous carriers is notably similar to that found in the syndromic disease of ERCC6 homozygous carriers. Critically, this study provides a “genotype-first” blueprint for future studies to characterize the genetic pleiotropy of rare variants in carriers from ancestrally diverse populations.
Supplementary Material
Acknowledgements
The BioMe healthcare delivery cohort at Mount Sinai was founded and maintained with a generous gift from the Andrea and Charles Bronfman Philanthropies. We thank the individuals who were involved in the quality control and/or file handling for the exome sequencing and genome-wide genotyping data, including Aayushee Jain, Kumardeep Chaudhary, Lisheng Zhou, Michael Preuss, Quingbin Song, Stephane Wenric, and Steve Ellis. We also thank the thesis advisory committee of IF, including Bruce D. Gelb, Sander Houten, Paz Polak, and Stuart Scott, for their critical feedback and expertise. Research reported in this paper was supported by the Office of Research Infrastructure of the National Institutes of Health under award numbers S10OD018522 and S10OD026880. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This work was also supported, in part, by a Challenge Grant from Research to Prevent Blindness, New York City (LP). This research has been conducted using the UK Biobank Resource under Application Number ‘16218’.
Funding Information:
IF was supported by T32GM007280, the Medical Scientist Training Program Training Grant from the National Institute of General Medical Sciences of the National Institutes of Health. RD is supported by R35GM124836 from the National Institute of General Medical Sciences of the National Institutes of Health and by R01HL139865 from the National Heart, Lung, and Blood Institute of the National Institutes of Health. LP is supported by R01EY015473 from the National Eye Institute. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Competing Interests
RD reported receiving grants from AstraZeneca and grants and nonfinancial support from Goldfinch Bio, being a scientific co-founder and equity holder for Pensieve Health, and being a consultant for Variant Bio. GN reported being a scientific co-founder, consultant, advisory board member, and equity owner of Renalytix AI; being a scientific co-founder and equity holder for Pensieve Health; being a consultant for Variant Bio; receiving grants from Goldfinch Bio; and receiving personal fees from Renalytix AI, BioVie, Reata, AstraZeneca, and GLG Consulting. LP is a consultant for Bausch+Lomb, Eyenovia, Verily, Nicox, and Emerald Bioscience.
Data availability
The UKB data may be browsed at http://biobank.ndph.ox.ac.uk/showcase/ and access to data can be requested at https://www.ukbiobank.ac.uk/register-apply/. More information about BioMe and its data can be found at https://icahn.mssm.edu/research/ipm/programs/biome-biobank/researcher-faqs.
REFERENCES
- Adzhubei I, Jordan DM, & Sunyaev SR (2013). Predicting Functional Effect of Human Missense Mutations Using PolyPhen-2. Current Protocols in Human Genetics, 76(1), 7.20.1–7.20.41. 10.1002/0471142905.hg0720s76
- Bailey AD, Gray LT, Pavelitz T, Newman JC, Horibata K, Tanaka K, & Weiner AM (2012). The conserved Cockayne syndrome B-piggyBac fusion protein (CSB-PGBD3) affects DNA repair and induces both interferon-like and innate antiviral responses in CSB-null cells. DNA Repair, 11(5), 488–501. 10.1016/j.dnarep.2012.02.004
- Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, Motyer A, Vukcevic D, Delaneau O, O’Connell J, Cortes A, Welsh S, Young A, Effingham M, McVean G, Leslie S, Allen N, Donnelly P, & Marchini J (2018). The UK Biobank resource with deep phenotyping and genomic data. Nature, 562(7726), 203–209. 10.1038/s41586-018-0579-z
- Cirulli ET, White S, Read RW, Elhanan G, Metcalf WJ, Tanudjaja F, Fath DM, Sandoval E, Isaksson M, Schlauch KA, Grzymski JJ, Lu JT, & Washington NL (2020). Genome-wide rare variant analysis for thousands of phenotypes in over 70,000 exomes from two cohorts. Nature Communications, 11(1), 1–10. 10.1038/s41467-020-14288-y
- Corton M, Nishiguchi KM, Avila-Fernández A, Nikopoulos K, Riveiro-Alvarez R, Tatu SD, Ayuso C, & Rivolta C (2013). Exome Sequencing of Index Patients with Retinal Dystrophies as a Tool for Molecular Diagnosis. PLoS ONE, 8(6), e65574. 10.1371/journal.pone.0065574
- Dewey FE, Murray MF, Overton JD, Habegger L, Leader JB, Fetterolf SN, O’Dushlaine C, Van Hout CV, Staples J, Gonzaga-Jauregui C, Metpally R, Pendergrass SA, Giovanni MA, Kirchner HL, Balasubramanian S, Abul-Husn NS, Hartzel DN, Lavage DR, Kost KA, … Carey DJ (2016). Distribution and clinical impact of functional variants in 50,726 whole-exome sequences from the DiscovEHR study. Science, 354(6319). 10.1126/science.aaf6814
- Do R, Stitziel NO, Won HH, Jørgensen AB, Duga S, Merlini PA, Kiezun A, Farrall M, Goel A, Zuk O, Guella I, Asselta R, Lange LA, Peloso GM, Auer PL, Girelli D, Martinelli N, Farlow DN, DePristo MA, … Kathiresan S (2015). Exome sequencing identifies rare LDLR and APOA5 alleles conferring risk for myocardial infarction. Nature, 518(7537), 102–106. 10.1038/nature13917
- Firth D (1993). Bias Reduction of Maximum Likelihood Estimates. Biometrika, 80(1), 27–38. 10.2307/2336755
- Fishilevich S, Zimmerman S, Kohn A, Stein TI, Olender T, Kolker E, Safran M, & Lancet D (2016). Genic insights from integrated human proteomics in GeneCards. Database, 2016. 10.1093/database/baw030
- Inglehearn CF (1998). Molecular genetics of human retinal dystrophies. Eye (Basingstoke), 12(3), 571–579. 10.1038/eye.1998.147
- Kohane IS (2011). Using electronic health records to drive discovery in disease genomics. Nature Reviews Genetics, 12(6), 417–428. 10.1038/nrg2999
- Lee S, Abecasis GR, Boehnke M, & Lin X (2014). Rare-variant association analysis: Study designs and statistical tests. American Journal of Human Genetics, 95(1), 5–23. 10.1016/j.ajhg.2014.06.009
- Li Q, Mao F, Zhou B, Huang Y, Zou Z, denDekker AD, Xu J, Hou S, Liu J, Dou Y, & Rao RC (2020). p53 Integrates Temporal WDR5 Inputs during Neuroectoderm and Mesoderm Differentiation of Mouse Embryonic Stem Cells. Cell Reports, 30(2), 465–480.e6. 10.1016/j.celrep.2019.12.039
- Licht CL, Stevnsner T, & Bohr VA (2003). Cockayne Syndrome Group B Cellular and Biochemical Functions. American Journal of Human Genetics, 73(6), 1217–1239. 10.1086/380399
- Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N, Foster B, Moser M, Karasik E, Gillard B, Ramsey K, Sullivan S, Bridge J, Magazine H, Syron J, … Moore HF (2013). The Genotype-Tissue Expression (GTEx) project. Nature Genetics, 45(6), 580–585. 10.1038/ng.2653
- McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, Flicek P, & Cunningham F (2016). The Ensembl Variant Effect Predictor. Genome Biology, 17(1), 122. 10.1186/s13059-016-0974-4
- Muzaffar F, & Hussain I (2003). Cockayne syndrome. An update. Journal of Pakistan Association of Dermatologists, 135–145. https://www.jpad.com.pk/index.php/jpad/article/view/749
- Nance MA, & Berry SA (1992). Cockayne syndrome: Review of 140 cases. American Journal of Medical Genetics, 42(1), 68–84. 10.1002/ajmg.1320420115
- Ng PC, & Henikoff S (2003). SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Research, 31(13), 3812–3814. 10.1093/nar/gkg509
- Okubo A, Hirakawa M, Ito M, Sameshima M, & Sakamoto T (2008). Clinical features of early and late stage polypoidal choroidal vasculopathy characterized by lesion size and disease duration. Graefe’s Archive for Clinical and Experimental Ophthalmology, 246(4), 491–499. 10.1007/s00417-007-0680-8
- Park J, Levin MG, Haggerty CM, Hartzel DN, Judy R, Kember RL, Reza N, Ritchie MD, Owens AT, Damrauer SM, & Rader DJ (2020). A genome-first approach to aggregating rare genetic variants in LMNA for association with electronic health record phenotypes. Genetics in Medicine, 22(1), 102–111. 10.1038/s41436-019-0625-8
- Purcell SM, Moran JL, Fromer M, Ruderfer D, Solovieff N, Roussos P, O’Dushlaine C, Chambert K, Bergen SE, Kähler A, Duncan L, Stahl E, Genovese G, Fernández E, Collins MO, Komiyama NH, Choudhary JS, Magnusson PKE, Banks E, … Sklar P (2014). A polygenic burden of rare disruptive mutations in schizophrenia. Nature, 506(7487), 185–190. 10.1038/nature12975
- Rentzsch P, Witten D, Cooper GM, Shendure J, & Kircher M (2019). CADD: Predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Research, 47(D1), D886–D894. 10.1093/nar/gky1016
- Sardell RJ, Bailey JNC, Courtenay MD, Whitehead P, Laux RA, Adams LD, Fortun JA, Brantley MA Jr., Kovach JL, Schwartz SG, Agarwal A, Scott WK, Haines JL, & Pericak-Vance MA (2016). Whole exome sequencing of extreme age-related macular degeneration phenotypes. Molecular Vision, 22, 1062.
- Schwarz JM, Rödelsperger C, Schuelke M, & Seelow D (2010). MutationTaster evaluates disease-causing potential of sequence alterations. Nature Methods, 7(8), 575–576. 10.1038/nmeth0810-575
- Schwarzer G, Carpenter JR, & Rücker G (2015). An Introduction to Meta-Analysis in R. In Meta-Analysis with R (pp. 3–17). Springer. 10.1007/978-3-319-21416-0_1
- Seddon JM, Cote J, Page WF, Aggen SH, & Neale MC (2005). The US twin study of age-related macular degeneration: Relative roles of genetic and environmental influences. Archives of Ophthalmology, 123(3), 321–327. 10.1001/archopht.123.3.321
- Son JH, Xie G, Yuan C, Ena L, Li Z, Goldstein A, Huang L, Wang L, Shen F, Liu H, Mehl K, Groopman EE, Marasa M, Kiryluk K, Gharavi AG, Chung WK, Hripcsak G, Friedman C, Weng C, & Wang K (2018). Deep Phenotyping on Electronic Health Records Facilitates Genetic Diagnosis by Clinical Exomes. American Journal of Human Genetics, 103(1), 58–73. 10.1016/j.ajhg.2018.05.010
- Swede H, Stone CL, & Norwood AR (2007). National population-based biobanks for genetic research. Genetics in Medicine, 9(3), 141–149. 10.1097/GIM.0b013e3180330039
- Szustakowski JD, Balasubramanian S, Sasson A, Khalid S, Bronson PG, Kvikstad E, Wong E, Liu D, Wade Davis J, Haefliger C, Katrina Loomis A, Mikkilineni R, Noh HJ, Wadhawan S, Bai X, Hawes A, Krasheninina O, Ulloa R, Lopez A, … Ye Z (2020). Advancing Human Genetics Research and Drug Discovery through Exome Sequencing of the UK Biobank. medRxiv, 2020.11.02.20222232. 10.1101/2020.11.02.20222232
- Van Hout CV, Tachmazidou I, Backman JD, Hoffman JD, Liu D, Pandey AK, Gonzaga-Jauregui C, Khalid S, Ye B, Banerjee N, Li AH, O’Dushlaine C, Marcketta A, Staples J, Schurmann C, Hawes A, Maxwell E, Barnard L, Lopez A, … Baras A (2020). Exome sequencing and characterization of 49,960 individuals in the UK Biobank. Nature, 586(7831), 749–756. 10.1038/s41586-020-2853-0
- Wang X (2014). Firth logistic regression for rare variant association tests. Frontiers in Genetics, 5, 187. 10.3389/fgene.2014.00187
- Wilson BT, Stark Z, Sutton RE, Danda S, Ekbote AV, Elsayed SM, Gibson L, Goodship JA, Jackson AP, Keng WT, King MD, McCann E, Motojima T, Murray JE, Omata T, Pilz D, Pope K, Sugita K, White SM, & Wilson IJ (2016). The Cockayne Syndrome Natural History (CoSyNH) study: Clinical findings in 102 individuals and recommendations for care. Genetics in Medicine, 18(5), 483–493. 10.1038/gim.2015.110
- Wu C, Orozco C, Boyer J, Leglise M, Goodale J, Batalov S, Hodge CL, Haase J, Janes J, Huss JW, & Su AI (2009). BioGPS: An extensible and customizable portal for querying and organizing gene annotation resources. Genome Biology, 10(11), R130. 10.1186/gb-2009-10-11-r130