American Journal of Human Genetics. 2017 Jul 20;101(2):206–217. doi: 10.1016/j.ajhg.2017.06.011

Computational Prediction of Position Effects of Apparently Balanced Human Chromosomal Rearrangements

Cinthya J Zepeda-Mendoza 1,2,25, Jonas Ibn-Salem 3,25, Tammy Kammin 1, David J Harris 2,4, Debra Rita 5, Karen W Gripp 6, Jennifer J MacKenzie 7, Andrea Gropman 8, Brett Graham 9, Ranad Shaheen 10, Fowzan S Alkuraya 10,11, Campbell K Brasington 12, Edward J Spence 12, Diane Masser-Frye 13, Lynne M Bird 13,14, Erica Spiegel 15, Rebecca L Sparkes 16, Zehra Ordulu 17, Michael E Talkowski 17,18,19,20,21, Miguel A Andrade-Navarro 3, Peter N Robinson 22, Cynthia C Morton 1,3,20,23,24
PMCID: PMC5544382  PMID: 28735859

Abstract

Interpretation of variants of uncertain significance, especially chromosomal rearrangements in non-coding regions of the human genome, remains one of the biggest challenges in modern molecular diagnosis. To improve our understanding and interpretation of such variants, we used high-resolution three-dimensional chromosomal structural data and transcriptional regulatory information to predict position effects and their association with pathogenic phenotypes in 17 subjects with apparently balanced chromosomal abnormalities. We found that the rearrangements predict disruption of long-range chromatin interactions between several enhancers and genes whose annotated clinical features are strongly associated with the subjects’ phenotypes. We confirm gene-expression changes for a couple of candidate genes to exemplify the utility of our analysis of position effect. These results highlight the important interplay between chromosomal structure and disease and demonstrate the need to utilize chromatin conformational data for the prediction of position effects in the clinical interpretation of non-coding chromosomal rearrangements.

Keywords: cytogenetics, long-range effect, HPO, chromatin conformation, distal effect, chromosomal translocation, chromosomal rearrangement, diagnosis, clinical genetics, balanced chromosomal rearrangement

Introduction

The importance of the integrity of chromosomal structure and its association with human disease is one of the oldest and most studied topics in clinical genetics. As early as 1959, cytogenetic studies in humans linked specific genetic or genomic disorders and intellectual disability syndromes to changes in chromosomal ploidy, translocations, and DNA duplications and deletions.1, 2, 3, 4 The discovery of copy-number variants (CNVs) by microarray and sequencing technologies expanded the catalog of genetic variation between individuals to test such associations at higher resolution.5, 6, 7, 8, 9, 10, 11, 12, 13, 14 Over the years, analysis of disease-related structural rearrangements has illuminated genes that are mutated in various human developmental disorders.15, 16, 17, 18 Such chromosomal aberrations can directly disrupt gene sequences, affect gene dosage, generate gene fusions, unmask recessive alleles, reveal imprinted genes, or result in alterations of gene expression through additional mechanisms, such as position effects.15 The latter is particularly important for the study of apparently balanced chromosomal abnormalities (BCAs), such as translocations and inversions, often found outside of the hypothesized disease-related genes.19

Position effects were first identified in Drosophila melanogaster, in which chromosomal inversions placing white+ near centric heterochromatin caused mosaic red and white eye patterns.20 In humans, BCAs can induce position effects through disruption of a gene’s long-range transcriptional control (i.e., enhancer-promoter interactions, insulator influence, etc.) or its placement in regions with different local chromatin environments, as observed in the classical Drosophila position-effect variegation.19, 21, 22 Examples of position-effect genes include paired box gene 6 (PAX6 [MIM: 607108]), for which downstream chromosomal translocations affect its cis-regulatory control and produce aniridia (MIM: 106210);23, 24 twist family bHLH transcription factor 1 (TWIST1 [MIM: 601622]), where downstream translocations and inversions are associated with Saethre-Chotzen syndrome (MIM: 101400);25 paired-like homeodomain 2 (PITX2 [MIM: 601542]), for which translocations are associated with Axenfeld-Rieger syndrome type 1 (MIM: 180500);26, 27 and SRY-box 9 (SOX9 [MIM: 608160]), where translocation breakpoints located up to 900 kb upstream and 1.3 Mb downstream are associated with campomelic dysplasia (MIM: 114290);28 as well as several others.19, 29, 30

The availability of genome sequencing in the clinical setting has generated a need for rapid prediction and interpretation of structural variants, especially those pertaining to de novo non-coding rearrangements in individual subjects. With the development and subsequent branching of the chromosome conformation capture (3C) technique,31, 32 regulatory issues such as alteration of long-range transcriptional control and position effects can now be predicted in terms of chromosome organization. The high-resolution view of chromosomal architecture in diverse human cell lines and tissues33, 34, 35, 36, 37, 38, 39, 40 has allowed molecular assessment of the disruption of regulatory chromatin contacts by pathogenic structural variants and single-nucleotide changes; examples include the study of limb malformations,41 leukemia,42 and obesity,43 among others.44, 45, 46, 47, 48, 49 These examples underscore the importance of chromatin interactions in quantitative and temporal control of gene expression, which can greatly enhance our power to predict pathologic consequences.

To test the feasibility of prediction and clinical interpretation of position effects of non-coding chromosomal rearrangements, we analyzed 17 subjects with de novo non-coding BCAs classified as variants of uncertain significance (VUSs) from the Developmental Genome Anatomy Project (DGAP).18, 50, 51, 52, 53 Using publicly available chromatin contact information, annotated and predicted regulatory elements, and correlation between phenotypes observed in DGAP subjects and those associated with neighboring genes, we reliably predicted candidate genes exhibiting misregulated expression in DGAP-derived lymphoblastoid cell lines (LCLs). These results suggest that many VUSs are likely to be further interpretable via long-range effects and warrant routine assessment and integration in clinical diagnosis.

Material and Methods

Selection of Subjects with BCAs

BCA breakpoints and clinical data were obtained from DGAP subjects for whom whole-genome sequencing had been performed according to a previously described large-insert jumping-library approach.18, 50, 51, 52, 53, 54 A total of 151 subjects were filtered to include only subjects whose translocation or inversion breakpoints fell within intergenic regions (GRCh37) and did not overlap known long intergenic non-coding RNAs (lincRNAs) or pseudogenes, given that these elements have been shown to exert functional roles.55, 56, 57 Of 151 DGAP subjects, only 17 fulfilled our selection criteria, and 12 of these had available and reportedly normal clinical array results, suggesting a lack of large duplications or deletions.
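The selection step can be illustrated with a minimal Python sketch that keeps only subjects whose breakpoints avoid every annotated interval (genes, lincRNAs, pseudogenes). The Interval type, the function names, the single-base representation of a breakpoint, and the toy coordinates are all hypothetical and are not taken from the study's pipeline.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive

def hits(chrom, pos, interval):
    """True if a single-base breakpoint position lies inside the interval."""
    return chrom == interval.chrom and interval.start <= pos < interval.end

def is_noncoding_subject(breakpoints, annotated):
    """True if none of the subject's breakpoints falls in an annotated interval."""
    return not any(hits(c, p, iv) for c, p in breakpoints for iv in annotated)

# Toy annotation (hypothetical): genes, lincRNAs, and pseudogenes pooled together
annotated = [Interval("chr1", 1000, 5000), Interval("chr2", 200, 900)]

# Toy subjects (hypothetical), each with two breakpoints
subjects = {
    "subject_A": [("chr1", 6000), ("chr3", 100)],   # both breakpoints intergenic
    "subject_B": [("chr1", 1500), ("chr2", 5000)],  # first breakpoint hits a gene
}

selected = [s for s, bps in subjects.items() if is_noncoding_subject(bps, annotated)]
print(selected)
```

A subject is dropped as soon as one of its breakpoints overlaps any annotated element, which mirrors the requirement that both breakpoints be non-coding.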

Clinical Descriptions of DGAP Subjects

The clinical presentation of the 17 subjects ranged from developmental delay to neurological conditions, offering the opportunity to assess long-range position effects in different phenotypes. Subjects’ karyotypes are presented in the main text according to the International System for Human Cytogenomic Nomenclature (ISCN2016) (Table 1). Detailed clinical descriptions, as well as nomenclature developed to describe chromosomal rearrangements by using next-generation sequencing,58 are included in the Supplemental Note: Case Reports. Reported ages of DGAP subjects are from the time of enrollment. All reported genomic coordinates reference GRCh37.

Table 1.

Description of the 17 Analyzed DGAP Subjects with Non-coding BCAs

Subject ID | Reported Karyotype | Disruption of Functional Element | Breakpoints within TADs (hESC) | Breakpoints within TADs (IMR90) | Breakpoints within TADs (GM12878) | Top-Ranking Candidates ±1 Mb
DGAP017 | 46,X,t(X;10)(p11.2;q24.3) | DHS | 2 | 2 | 1 |
DGAP111 | 46,XY,t(16;20)(q11.2;q13.2)dn | CTCF | 1 | 1 | 2 | ORC6 (a)
DGAP113 | 46,XY,t(1;3)(q32.1;q13.2)dn | | 2 | 2 | 2 | ASPM (a)
DGAP126 | 46,XX,t(5;10)(p13.3;q21.1)dn | | 2 | 1 | 2 |
DGAP138 | 46,XY,t(1;6)(q23;q13)dn | | 2 | 2 | 2 | GRIK2 (a,c)
DGAP153 | 46,X,t(X;17)(p11.23;p11.2)dn | | 1 | 1 | 1 |
DGAP163 | 46,XY,t(2;14)(p23;q13)dn | | 2 | 2 | 2 | SOS1 (c,d,e) and COCH (d,e)
DGAP176 | 46,Y,inv(X)(q13q24)mat | DHS, CTCF | 2 | 1 | 2 | ACSL4 (b,d) and COL4A5 (b,c,d,e)
DGAP249 | 46,XX,t(2;11)(q33;q23)dn | E, DHS | 2 | 2 | 2 | SATB2 (b,c,d,e) and SORL1 (e)
DGAP252 | 46,XY,t(3;18)(q13.2;q11.2)dn | | 2 | 2 | 2 | RBBP8 (a) and GATA6 (b,c,d,e)
DGAP275 | 46,XX,t(7;12)(p13;q24.33)dn | DHS | 1 | 1 | 2 | ANKLE2 (e) and POLE (e)
DGAP287 | 46,XY,t(10;14)(p13;q32.1)dn | CTCF | 2 | 2 | 2 |
DGAP288 | 46,XX,t(6;17)(q13;q21)dn | DHS | 2 | 2 | 2 | SOX9 (b,c,d)
DGAP315 | 46,XX,inv(6)(p24q11)dn | | 1 | 1 | 2 |
DGAP319 | 46,XX,t(4;13)(q31.3;q14.3)dn | | 2 | 1 | 2 |
DGAP322 | 46,XY,t(1;18)(q32.1;q22.1) | DHS | 1 | 2 | 2 | IRF6 (b,c,d)
DGAP329 | 46,XX,t(2;14)(q21;q24.3)dn | | 1 | 2 | 2 | ZEB2 (b,c,d,e)

Corresponding clinical karyotypes, including overlap between breakpoints and regulatory elements (E, enhancer; DHS, DNaseI hypersensitive site; CTCF, CTCF binding site), and TADs from H1-hESC, IMR90, and GM12878 (1, one breakpoint within the TAD; 2, both BCA breakpoints are located within the TAD) are reported. Top-ranking position-effect genes are provided for the ±1 Mb windows surrounding the BCA breakpoints; each gene is highlighted with different evidence supporting its inclusion (see footnotes).

(a) ClinGen known recessive genes.
(b) ClinGen genes with emerging and sufficient evidence suggesting that HI is associated with clinical phenotype.
(c) HI scores less than 10.
(d) Within H1-hESC TAD.
(e) Disrupted DHS/enhancer-promoter interactions.

Analysis of Genes Bordering the Rearrangement Breakpoints

The presence of annotated genes or pseudogenes and lincRNAs was assessed in ±3 and ±1 Mb windows neighboring each subject’s translocation and inversion breakpoints and within the reported H1-hESC topologically associated domains (TADs)35 where the breakpoints were located. The gene annotation file was obtained from Ensembl GRCh37,59 and we used the Human lincRNA Catalog.60 Haploinsufficiency (HI) and triplosensitivity scores were assigned according to the report by Huang et al.61 and version hg19 of ClinGen62 data downloaded on September 20, 2016.
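The window-based gene search described above can be sketched in a few lines of Python. This is only an illustration: the gene names and coordinates are invented, and the choice to count a gene when any part of it overlaps the window (rather than requiring full containment) is an assumption, because the text does not specify the rule.

```python
MB = 1_000_000

def genes_in_window(breakpoint, genes, half_width):
    """Return names of genes overlapping [pos - half_width, pos + half_width)."""
    chrom, pos = breakpoint
    lo, hi = max(0, pos - half_width), pos + half_width
    return sorted(
        name
        for name, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and g_start < hi and g_end > lo
    )

# Hypothetical gene annotation: name -> (chromosome, start, end)
genes = {
    "GENE_A": ("chr2", 10_200_000, 10_250_000),
    "GENE_B": ("chr2", 12_500_000, 12_600_000),
    "GENE_C": ("chr5", 10_000_000, 10_100_000),
}

bp = ("chr2", 10_000_000)  # hypothetical breakpoint
near_1mb = genes_in_window(bp, genes, 1 * MB)
near_3mb = genes_in_window(bp, genes, 3 * MB)
print(near_1mb, near_3mb)
```

The ±1 Mb result is always a subset of the ±3 Mb result, so the smaller window acts as a stricter filter on the same candidate list.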

Assessment of Disrupted Functional Elements and Chromatin Interactions Bordering Rearrangement Breakpoints

The disruption of regulatory elements such as enhancers, promoters, locus control regions, and insulators can lead to disease-related changes in gene expression; DNase I hypersensitive sites (DHSs) have been used as markers for the identification of such elements.63 In addition, the alteration of TAD boundaries has been previously shown to cause a rewiring of enhancers with pathological consequences;41, 46, 64 CCCTC-binding factor (CTCF) binding sites have been found to be enriched in TAD boundaries,35 and several mutations of boundary-defining sites have been associated with cancer.65, 66 On the basis of these observations, we assessed the number of regulatory elements that were potentially disrupted by the analyzed DGAP breakpoints. We compared the breakpoint positions of the selected DGAP subjects against ENCODE project67 data on CTCF binding sites, DHSs, and chromatin segmentation classifications (Broad ChromHMM) derived from a LCL (GM12878) and human stem cells (H1-hESC) accessed through the UCSC Genome Browser.68 Enhancer positions were additionally obtained from Andersson et al.69 for tissue and primary cells and from the VISTA Enhancer Browser version hg19.70 Finally, lists of transcription factor (TF) binding sites and gene promoters were obtained from Ensembl GRCh37.59 Hi-C interaction data and TAD positions for H1-hESC, GM06990, and IMR90 at 20 kb, 40 kb, 100 kb, and 1 Mb resolution were obtained from Dixon et al.35 and the WashU EpiGenome Browser.71 A high-resolution dataset of chromatin loops and domains was obtained from Rao et al.38 for IMR90 and GM12878 cells. Lastly, we used distal DHS/enhancer-promoter connections63 (DHSs that could be candidate enhancers given their association with distal promoters) to assess disrupted predicted cis-regulatory interactions by the BCAs. 
Genomic overlaps between the rearrangement breakpoints, functional elements, and disrupted chromatin interactions were calculated with custom Perl scripts, the BEDtools suite,72 and the Genomic Association Tester (GAT).73
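One simple way to think about a "disrupted" chromatin interaction is that the breakpoint lies between the two interacting anchors on the same chromosome, so the rearrangement separates them. The Python sketch below encodes that definition; it is an assumption made for illustration (the authors used custom Perl scripts, BEDtools, and GAT, whose exact criteria are not given here), and the anchor coordinates are invented.

```python
def is_disrupted(bp_pos, anchor_a, anchor_b):
    """True if the breakpoint lies between two same-chromosome anchors.

    Each anchor is a (start, end) tuple; anchors may be given in any order.
    """
    left, right = sorted([anchor_a, anchor_b])
    return left[1] <= bp_pos <= right[0]

# Hypothetical interactions on one chromosome, as pairs of anchor intervals
interactions = [
    ((100, 200), (900, 1000)),  # spans the breakpoint
    ((100, 200), (300, 400)),   # both anchors upstream of the breakpoint
    ((800, 850), (450, 480)),   # spans the breakpoint, anchors given in reverse
]

bp_pos = 500  # hypothetical breakpoint coordinate
disrupted = [pair for pair in interactions if is_disrupted(bp_pos, *pair)]
print(len(disrupted))
```

The same test applies to enhancer-promoter pairs: treating the enhancer and the promoter as the two anchors gives the set of genes separated from their predicted regulatory elements.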

Ontological Analysis of Genes Neighboring Breakpoints

We calculated phenotype similarity between potential position-effect genes and DGAP subjects by converting the phenotypes of the 17 subjects to Human Phenotype Ontology (HPO)74 terms and calculating their phenomatch score as described in Ibn-Salem et al.48 The phenomatch score quantifies the information content of the most specific HPO term that is part of or a common ancestor (more general term) of a set of phenotypes. Our set of phenotypes is constituted by the HPO terms associated with DGAP subjects and those annotated to candidate position-effect genes within windows of 3 and 1 Mb of sequence in proximity to the breakpoints. We used two background models to assess the significance of this similarity. The first was based on randomly permuting the associations of phenotypes to genes; to this effect, the phenotype-gene associations were randomly shuffled 100 times, and the similarity between these random phenotypes and the studied clinical findings was calculated. The second background control was based on shifting the breakpoint location along the chromosome; each breakpoint was shifted by −9, −6, −3, +3, +6, and +9 Mb, and the similarity of genes in proximity to the shifted breakpoints was computed.
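The idea behind an information-content similarity can be sketched as follows. This is not the published phenomatch formula from Ibn-Salem et al.; it is a simplified, Resnik-style illustration in which the information content of a term is taken as the negative log2 of its annotation frequency, and each subject term is matched to its most informative common ancestor with any gene term. The toy ontology, term names, and frequencies are hypothetical. The last function lists the shifted breakpoint positions used by the second background model.

```python
import math

MB = 1_000_000

# Toy ontology (hypothetical, not real HPO identifiers): term -> parent terms
parents = {
    "root": set(),
    "neuro": {"root"},
    "hearing": {"root"},
    "seizure": {"neuro"},
    "delay": {"neuro"},
}

# Hypothetical fraction of annotated genes carrying each term or a descendant
freq = {"root": 1.0, "neuro": 0.25, "hearing": 0.125, "seizure": 0.0625, "delay": 0.125}

def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def information_content(term):
    """Rarer terms are more specific and carry more information."""
    return -math.log2(freq[term])

def best_common_ancestor_ic(term_a, term_b):
    """Information content of the most informative shared ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(information_content(t) for t in common)

def similarity(subject_terms, gene_terms):
    """Sum, over subject terms, of the best match against any gene term."""
    return sum(
        max(best_common_ancestor_ic(s, g) for g in gene_terms)
        for s in subject_terms
    )

def shifted_positions(pos):
    """Breakpoint positions for the shift-based background (offsets in Mb)."""
    return [pos + k * MB for k in (-9, -6, -3, 3, 6, 9)]

score = similarity(["seizure", "hearing"], ["delay"])
print(score, shifted_positions(10 * MB))
```

In this toy case only the shared "neuro" ancestor contributes, because "hearing" and "delay" meet only at the uninformative root term.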

Real-Time qPCR

LCLs derived from DGAP236-02m, DGAP244-02m, and DGAP245-02m were used as karyotypically normal male control subjects. These are karyotypically normal fathers of enrolled DGAP subjects and have no history of disease. LCL 17402 (DGAP163) was used for testing differential gene expression for SOS Ras/Rac guanine nucleotide exchange factor 1 (SOS1 [MIM: 182530]), and LCL 18060 (DGAP176) was used for testing midline 2 (MID2 [MIM: 300204]), p21 (RAC1) activated kinase 3 (PAK3 [MIM: 300142]), and POU class 3 homeobox 4 (POU3F4 [MIM: 300039]) expression via qPCR. Glucuronidase beta (GUSB [MIM: 611499]) was used as a housekeeping control. qPCR experiments were performed by the Harvard Biopolymers Facility with TaqMan probes Hs00264887_s1 (POU3F4), Hs00201978_m1 (MID2), Hs00176828_m1 (PAK3), Hs00893134_m1 (SOS1), and Hs00939627_m1 (GUSB). Data were analyzed by the cycle threshold (Ct) method.
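For readers unfamiliar with Ct-based quantification, the sketch below shows the commonly used comparative Ct arithmetic: ΔCt is the target Ct minus the housekeeping (here GUSB) Ct, and a relative expression estimate is 2 raised to the negative difference between the subject ΔCt and the mean control ΔCt. The text only states that a Ct method was used, so the fold-change step, the assumption of roughly 100% amplification efficiency, and all Ct values below are illustrative assumptions rather than the study's data.

```python
from statistics import mean

def delta_ct(ct_target, ct_reference):
    """Normalize a target gene Ct to the housekeeping gene Ct."""
    return ct_target - ct_reference

def fold_change(subject_dct, control_dcts):
    """Relative expression versus controls, assuming ~100% PCR efficiency."""
    ddct = subject_dct - mean(control_dcts)
    return 2 ** (-ddct)

# Invented Ct values: (target gene, housekeeping gene)
subject_dct = delta_ct(26.0, 20.0)
control_dcts = [
    delta_ct(25.0, 20.0),
    delta_ct(25.5, 20.5),
    delta_ct(24.5, 19.5),
]

fc = fold_change(subject_dct, control_dcts)
print(fc)
```

A higher ΔCt in the subject than in controls means the target crossed the threshold later relative to the housekeeping gene, which corresponds to lower expression.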

Assessment of DGAP Breakpoints Overlapping Non-coding Structural Variants in Public Databases

To find similar subjects with non-coding structural rearrangements and compare their annotated clinical phenotypes with those observed in DGAP subjects, we searched DECIPHER75 version 2015-07-13, as well as dbVar from the NCBI Variation Viewer 1.5.76 Both databases are comprehensive community-supported repositories of clinical subjects with novel and extremely rare genomic variants.

Results

Genomic Characterization of Non-coding Breakpoints

To study the structural and evolutionary context of BCAs and their impact on nuclear architecture and gene expression, we used data generated by DGAP,18, 50, 51, 52, 53 the largest collection of sequenced balanced chromosomal rearrangements from individuals with abnormal developmental and cognitive phenotypes; many of these have yet to be investigated in detail. Each studied DGAP BCA has two breakpoint positions (because two distinct chromosome regions are involved in their generation), which we labeled with the DGAP#_A and DGAP#_B identifiers. We filtered DGAP data to select subjects with both breakpoints in non-coding regions only and exclude lincRNAs and pseudogenes; a total of 17 subjects, 15 with translocations and 2 with inversions, fulfilled our criteria (Figure 1 and Table S1). These subjects are phenotypically distinct, and most of them presented with congenital developmental and neurological conditions not recognized as a known syndrome or genomic disorder (see clinical descriptions in Supplemental Note: Case Reports).

Figure 1.

Chromosome Locations of the 17 Analyzed DGAP Subjects with Non-coding BCAs

Breakpoint positions are marked with a blue line and the corresponding DGAP number. All chromosomes are aligned by the centromere (marked in pink) and are indicated above by their corresponding chromosome number.

Further analysis revealed that BCA breakpoints were significantly depleted of overlapping annotated promoters and TF binding sites (GAT TF p = 0.0003, promoter p = 0.0001; Tables S2 and S3). Only one breakpoint (DGAP249_B) overlapped a ChromHMM enhancer in GM12878 cells (Table 1); the others had no overlap with annotated or predicted enhancers in the analyzed datasets, and this depletion was significant for VISTA (GAT p = 0.0364) and H1-hESC (GAT p = 0.0036) but not for the annotated tissue or primary cell enhancers from Andersson et al.69 (Table S4). Eight breakpoints overlapped cell-type-specific DHSs (Tables 1 and S5); these corresponded to DGAP subjects DGAP017, DGAP176, DGAP249, DGAP275, DGAP288, and DGAP322. Of these, DGAP176 and DGAP275 breakpoints overlapped DHSs at both BCA breakpoint sites. In addition, three DGAP subjects had rearrangements overlapping CTCF binding sites in H1-hESC (DGAP111, DGAP176, and DGAP287) and none in GM12878 cells (Tables 1 and S6). Except for those in two subjects in H1-hESC (DGAP017 and DGAP176) and four subjects in GM12878 (DGAP017, DGAP126, DGAP163, and DGAP176), all rearrangements fell within ChromHMM repressed chromatin regions, but this association was not significant (GAT p = 0.40 for GM12878 and p = 0.15 for H1-hESC; Table S2F). Interestingly, 22 of the 34 breakpoints (∼65%) overlapped repeated elements at a significant level (GAT p = 0.0002; Table S8), which could indicate a non-allelic homologous recombination process in their generation.77, 78

Noticeably, either one or two breakpoints from all the non-coding DGAP BCAs fell within previously reported TADs in H1-hESC and IMR90 cell lines (Tables 1 and S9).35 However, this overlap was not significant for either cell line (GAT H1-hESC p = 0.0537 and IMR90 p = 0.28). We found that the breakpoints disrupted dozens, hundreds, or even thousands of chromatin contacts when they were assessed at 20 and 40 kb resolution in Hi-C data of H1-hESC and IMR90 cells, as well as chromatin contacts at 100 kb and 1 Mb resolution in GM06990 cells (Table S11). Breakpoint DGAP111_A consistently lacked disrupted chromatin contacts, which is expected because it overlaps a repetitive satellite region, so no chromatin contacts could be mapped to the segment (Tables S9 and S11). With the availability of higher-resolution data, it is possible to detect whether BCA breakpoints disrupt smaller chromatin domains and loops not detected in previous studies. When analyzing high-resolution IMR90 and GM12878 Hi-C data,38 we discovered that 32 of 34 breakpoints were contained within GM12878 sub-compartments (Tables 1 and S10); interestingly, 28 of these were classified as members of the B compartment, which is less gene dense and less expressed than the A compartment. On the other hand, 18 and 24 breakpoints were contained within GM12878 and IMR90 arrowhead domains, respectively (Table S10), which are regions of enhanced contact frequency that tile the diagonal of each chromatin contact matrix. In addition, the breakpoints disrupted several significant short and long-range chromatin interactions in the GM12878 Hi-C data (Table S12).

Overall, the observation of breakpoint-associated DHSs suggests the alteration of underlying regulatory elements with potential pathogenic outcomes, whereas the predicted extensive disruption of chromatin contacts and the alteration of TAD boundaries by the BCAs could affect long-range regulatory interactions of neighboring genes (see the Discussion).

Identification of Genes with Potential Position Effects

To identify genes that could be generating the complex DGAP phenotypes via position effects from chromosomal rearrangements, we analyzed all annotated genes within windows of ±3 and ±1 Mb proximal and distal to the breakpoints and within the BCA-containing H1-hESC reported TAD positions. A total of 3,081 genes were contained within the ±3 and ±1 Mb windows for all subjects; 106 of these genes (∼3.4%) had an HI score of <10%, which is a predictor of HI,61 and 55 and 2 genes had ClinGen emerging evidence suggesting that dosage HI and triplosensitivity, respectively, are associated with clinical phenotypes (Table S15).

To further refine our search for genes that might exhibit position effects, we performed an unbiased correlation between DGAP subjects’ phenotypes and the clinical traits associated with genes bordering each breakpoint. To this end, we used the HPO dataset,74 which provides a standardized vocabulary of phenotypic abnormalities encountered in human disease and currently contains ∼11,000 terms and over 115,000 annotations to hereditary diseases. We translated DGAP clinical features to HPO terms (Table S16) and calculated phenotype similarity between DGAP subjects and neighboring genes by using the phenomatch score.48 The phenomatch score distinguishes between general and very specific phenotypic descriptions by quantifying the information content of the most specific HPO terms that are common to, or a common ancestor of, the phenotypes of the DGAP subject and neighboring gene. The similarity significance is then calculated on the basis of randomly permuting the associations between phenotypes and genes and on shifting the DGAP translocation and inversion breakpoint positions along the chromosome. We obtained phenomatch scores ranging from 0.003 to 91.48 for 179 genes within the ±3 and ±1 Mb windows, as well as within the TAD positions (Table S15).

In addition to obtaining information on dosage sensitivity and phenotypic similarity, we complemented our analysis with assessment of enhancer-promoter interactions to make our candidate selection more specific. A typical mechanism by which chromosomal rearrangements cause position effects is through disruptions in the association between genes and their regulatory regions.19, 29 We therefore reasoned that genes and enhancers included in predicted enhancer-promoter interactions would be strong position-effect candidates. We used the ENCODE distal DHS/enhancer-promoter connections63 to assess disrupted predicted cis-regulatory interactions by the DGAP breakpoints within a ±500 kb window. The analysis revealed 193 genes that were separated from their predicted candidate enhancers, potentially altering gene expression (Table S13). A total of 133 candidate genes were separated from <10 of their predicted enhancers, whereas 60 genes were separated from their predicted interactions with 10 or up to 91 enhancers (Table S14).

For the 17 analyzed DGAP BCAs, 645 genes had evidence of dosage sensitivity, disrupted enhancer-promoter interactions, or significant phenotypic similarity. This represents ∼21% of the genes contained within the ±3 Mb windows, clearly an undesirable number for timely clinical interpretation and functional analyses. To filter the most promising candidates, we ranked them according to their reported dosage sensitivity and disrupted regulatory interactions and selected a phenomatch cutoff value capable of detecting pathogenic and likely pathogenic genes in 57 published DGAP subjects from Redin et al.53 By accounting for the top quartile values of the reported phenomatch scores per subject and adding up their data on dosage sensitivity and disrupted regulatory interactions, we consistently ranked the reported pathogenic and likely pathogenic genes in the upper decile for 52 of the 57 DGAP control subjects (∼91%) when considering candidates within the TAD and ±1 Mb analysis windows (Table S17). 32 of these genes were the top-ranking candidates in their corresponding DGAP subject, whereas 19 of them were positioned in the second-tier rank. Only five genes could not be found in the top decile positions, because they had one or no lines of evidence supporting their inclusion.
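The ranking idea can be illustrated with a small Python sketch that counts lines of evidence per gene: a phenomatch score in the subject's top quartile, reported dosage sensitivity, and a disrupted regulatory interaction. The equal weighting, the tie-breaking by phenomatch score, the way the top quartile is taken, and the gene names and values are all hypothetical; the text does not give the exact scoring scheme.

```python
import math

def top_quartile(scores):
    """Genes whose score is among the top 25% (at least one gene)."""
    k = max(1, math.ceil(len(scores) / 4))
    cutoff = sorted(scores.values(), reverse=True)[k - 1]
    return {gene for gene, s in scores.items() if s >= cutoff}

def rank_candidates(phenomatch, dosage_sensitive, disrupted):
    """Rank genes by number of supporting evidence lines, then by phenomatch."""
    high_phenomatch = top_quartile(phenomatch)
    evidence = {
        gene: (gene in high_phenomatch)
        + (gene in dosage_sensitive)
        + (gene in disrupted)
        for gene in phenomatch
    }
    order = sorted(phenomatch, key=lambda g: (-evidence[g], -phenomatch[g]))
    return [(gene, evidence[gene]) for gene in order]

# Hypothetical data for one subject
phenomatch = {"G1": 40.0, "G2": 10.0, "G3": 2.0, "G4": 0.5}
dosage_sensitive = {"G1", "G3"}   # e.g. low HI score or ClinGen evidence
disrupted = {"G1", "G2"}          # separated from a predicted enhancer

ranking = rank_candidates(phenomatch, dosage_sensitive, disrupted)
print(ranking)
```

Genes with a single line of evidence fall toward the bottom of such a list, which is consistent with the observation that the five control genes missing from the top decile had one or no supporting lines of evidence.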

Applying this ranking strategy to the 17 non-coding BCAs, we predict 16 top-ranking candidates for 11 DGAP subjects and 102 second-tier candidates for the 17 analyzed DGAP subjects within ±1 Mb analysis windows (Tables 1 and S15). This is a significant reduction in comparison with the initial 645 possible candidates (∼3.8% of the neighboring genes in the ±3 Mb windows when top and second-tier candidates are considered and 0.5% when only top candidates are considered). Of note, only 9 of the 16 top-ranking candidates were included within the same TAD as the BCA breakpoint (H1-hESC TADs from Dixon et al.35), whereas the rest were located farther away. Nine top-ranking genes had an HI score < 10%,61 whereas ClinGen HI data revealed that 4 of these 16 genes are associated with autosomal-recessive phenotypes, and an additional seven have sufficient or some evidence of HI. Only one candidate gene for DGAP138, glutamate ionotropic receptor kainate type subunit 2 (GRIK2 [MIM: 138244]), was a confirmed triplosensitive annotated gene in ClinGen (Table S15).
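The percentages quoted in this and the preceding paragraph can be reproduced from the reported counts with a short calculation, shown here in Python. The only inputs are numbers stated in the text (3,081 genes in the analysis windows, 645 initial candidates, 16 top-ranking and 102 second-tier candidates); the variable names are illustrative.

```python
total_genes = 3081      # genes within the analysis windows
initial_candidates = 645
top_ranking = 16
second_tier = 102

def percent_of_total(n):
    """Percentage of all neighboring genes, rounded to one decimal place."""
    return round(100 * n / total_genes, 1)

initial_pct = percent_of_total(initial_candidates)
combined_pct = percent_of_total(top_ranking + second_tier)
top_pct = percent_of_total(top_ranking)
print(initial_pct, combined_pct, top_pct)
```

These values round to the ∼21%, ∼3.8%, and 0.5% figures given in the text.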

Together, these cases represent more plausible candidates in the search for position-effect genes with functional consequences in the subjects’ phenotypes. For example, GRIK2 could explain the intellectual disability observed in DGAP138; SOS1, forkhead box G1 (FOXG1 [MIM: 164874]), and cochlin (COCH [MIM: 603196]) could be related to the neurological and developmental delay and hearing loss in DGAP163; acyl-CoA synthetase long-chain family member 4 (ACSL4 [MIM: 300157]) and POU3F4 could be involved in DGAP176’s cognitive impairment and hearing loss; SATB homeobox 2 (SATB2 [MIM: 608148]) might underlie the delayed speech and language development observed in DGAP249; RB binding protein 8 endonuclease (RBBP8 [MIM: 604124]) might be involved in DGAP252’s craniofacial dysmorphic features; SOX9 most likely explains the cleft palate observed in DGAP288; DNA polymerase epsilon catalytic subunit (POLE [MIM: 174762]) might contribute to the extreme short stature observed in DGAP275; and zinc finger E-box binding homeobox 2 (ZEB2 [MIM: 605802]) could potentially explain the hypotonia and neurological features observed in DGAP329. SOX9 had been previously proposed to explain DGAP288’s phenotype, and as predicted by our method, a decrease in its expression was observed in RNA derived from DGAP288’s umbilical cord blood.49 Additional real-time qPCR analyses revealed SOS1 as having lower expression in DGAP163-derived LCLs than in three normal sex-matched control lines (Figure 2). Expression of second-tier candidates PAK3, MID2, and POU3F4 in DGAP176 LCLs did not deviate substantially from their control expression values (Figure S1); further searches into the Genotype-Tissue Expression (GTEx) project79 revealed that PAK3, MID2, and POU3F4 have low expression in LCLs, which would have made assessing changes in expression of these genes technically difficult. 
This points to the importance of the availability of tissues and cell lines relevant to the studied phenotypes or the capacity to generate cellular or animal models that reproduce the observed BCAs for further analysis.

Figure 2.

Assessment of Gene-Expression Changes for DGAP163-Derived LCLs

Each column compares the ΔCt results of three culture replicates (with four technical replicates each) with those of three sex-matched control cell lines. Error bars indicate the standard deviation calculated from the biological replicates. The Mann-Whitney U test p value is provided for the comparison between expression values of SOS1 and the control GUSB.

Identification of Subjects with Shared Non-coding Chromosomal Alterations and Phenotypes

The identification of subjects with shared non-coding chromosomal alterations and phenotypes as described herein would further support our idea that these rearrangements exert their pathogenic outcomes through long-range position effects. To identify such subjects, we searched DECIPHER75 and dbVar,76 both comprehensive community-supported repositories of clinical subjects with novel or extremely rare genomic variants.

We found 494 DECIPHER subjects whose rearrangements overlap our 34 non-coding BCA breakpoints (Table S19). Of these, 489 have rearrangements that overlap one or more annotated genes (Table S20). Only five DECIPHER subjects fulfilled our non-coding selection criteria (Table S21): subjects 1985 and 1989, whose rearrangement positions overlap one of DGAP017’s breakpoints in chromosome 10 but who have several other gene-altering genomic rearrangements; subject 289720, who has a 161.44 kb deletion in chromosome 10 described as likely benign and shares a sequence breakpoint with DGAP126; subject 289865, who, like subject 289720, has a rearrangement overlapping a breakpoint in chromosome 10 of DGAP126 but also has an additional pathogenic gene-altering rearrangement; and lastly subject 293610, in whom a pathogenic duplication of 364.43 kb in chromosome 17 shares a breakpoint with DGAP288. Only two of the five DECIPHER subjects have reported clinical phenotypes. DECIPHER subject 289720 presents with intellectual disability and psychosis, both pertaining to the superclasses of behavioral and neurodevelopmental abnormalities under the HPO classification. Interestingly, DGAP126 has abnormal aggressive, impulsive, or violent behavior and auto-aggression, as well as language and motor delays, which also fall under the classification of behavioral and neurodevelopmental abnormalities. DECIPHER subject 293610 has reported gonadal tissue discordant for external genitalia or chromosomal sex and a non-obstructive azoospermia clinical phenotype;80 neither feature was observed until puberty, and both are associated with the female-to-male sex disorder observed for CNVs altering the SOX9 genomic landscape. Although DGAP288 is still an infant, there is no report of sex reversal.

From dbVar, 675 non-coding structural rearrangements including CNVs, deletions, inversions, and translocations overlapped DGAP breakpoints (Table S22). Of these, only five variants had associated clinical information, including variant nsv534336, a 530 kb duplication overlapping the DGAP017 breakpoint in chromosome 10, classified as “uncertain significance,”81 and exhibiting a growth-delay phenotype; nsv931775, a benign ∼381.8 kb deletion overlapping the DGAP113 breakpoint on chromosome 3 and associated with developmental delay and/or other significant developmental or morphological phenotypes;81 nsv534571, an ∼639.7 kb duplication of uncertain significance associated with muscular hypotonia and overlapping the DGAP287 breakpoint on chromosome 10; and variants nsv532026 and nsv917014, two ∼613 kb duplications classified as “uncertain significance” and “likely benign,” respectively, overlapping the DGAP315 breakpoint in chromosome 6, and associated with developmental delay and/or other significant developmental or morphological phenotypes as well as autism and global developmental delay. All detected variants are associated with phenotypes observed in the DGAP subjects, especially DGAP017’s hypoplasia, the developmental delay observed in DGAP113, and DGAP315’s significant developmental or morphological phenotypes.

Strictly speaking, these phenotypes are disparate but fall under similar phenotypic categories, which could enable identification of long-range-effect genes between different subjects with similar clinical features and chromosomal rearrangements. These comparisons highlight the importance of establishing detailed, specific, and unbiased guidelines for assigning phenotypes when performing computational phenotype comparisons.

Discussion

Structural variation of the human genome, either inherited or arising from de novo germline or somatic mutations, can give rise to different phenotypes through several mechanisms. Chromosomal rearrangements can alter gene dosage, promote gene fusions, unmask recessive alleles, or disrupt associations between genes and their regulatory elements. The traditional clinical focus of studying genes disrupted by chromosomal rearrangements has shifted to also assessing regions neighboring these variants.49 This search for position effects has been particularly important in the analysis of chromosomal rearrangements associated with different clinical conditions and disrupting non-annotated genomic regions.21, 22

The study of chromatin conformation has been requisite in the analysis of such non-coding rearrangements. DNA is organized in the three-dimensional nucleus at varying hierarchical levels that are important for the regulation of gene expression,32 with primary roles in embryonic development and disease.82 Several studies have analyzed the impact of structural variants in disease-causing disruption of the regulatory chromatin environment;41, 42, 44, 45, 46, 48 these studies have set the precedent for integrative analyses of disrupted chromatin conformation to expedite functional annotations of non-coding chromosomal rearrangements.

We tested the possibility of utilizing chromatin contact information to dissect chromosomal rearrangements that disrupt non-coding chromosome regions in clinical cases. We focused on 17 DGAP subjects (12 of whom have available clinical microarray information) with different rare presentations and de novo non-coding BCAs classified as VUSs. Of these, 15 had translocations, and two had inversions. These subjects represent ∼11% of the total number of sequenced DGAP subjects, which makes our predictions even more significant for future potential treatment or management of subjects who would not otherwise obtain a clinical diagnosis. Utilizing publicly available annotated genomic and regulatory elements, chromatin conformation information, predicted enhancer-promoter interactions, phenomatch scores, and HI and triplosensitivity information for all genes surrounding the BCA breakpoints at different window sizes (±3 Mb, ±1 Mb, and BCA-containing TAD positions), we discovered 16 genes that are top-ranking position-effect candidates for 11 DGAP subjects’ clinical phenotypes (Table 1).

We observed that eight of the sequenced DGAP BCA breakpoints, corresponding to six DGAP subjects (DGAP017, DGAP176, DGAP249, DGAP275, DGAP288, and DGAP322), overlapped reported annotated and predicted enhancers and DHSs. Disruption of these regulatory elements could potentially cause improper gene expression or repression through altered enhancer-promoter interactions or interactions with other DHS-associated elements, such as insulators and locus control regions, among others. In fact, four of the breakpoints that disrupt annotated DHSs and enhancers have been shown to establish chromatin contacts with our top position-effect candidate genes in the region in Hi-C data of H1-hESC cells at 40 kb resolution (Table S18). For example, the DGAP275_B breakpoint is involved in a chromatin interaction that puts it into physical proximity with POLE and ANKLE2, DGAP288_B contacts SOX9, and DGAP176_B interacts with ACSL4. Three additional breakpoints from DGAP111, DGAP249, and DGAP287 overlap CTCF binding sites. CTCF binding sites are enriched in TAD boundaries,35 and the elimination of these binding sites could potentially induce gene expression or other functional changes through alteration of the structural regulatory landscape of the region.41

There are nine DGAP subjects (DGAP113, DGAP126, DGAP138, DGAP153, DGAP163, DGAP252, DGAP315, DGAP319, and DGAP329), six with normal arrays and two with benign CNVs, for whom no overlap with genomic or other regulatory elements was detected. These subjects thus represent events in which position effects are most likely caused by alteration of the underlying chromatin structure itself. This hypothesis is supported by detection of a vast number of disrupted chromatin contacts in four different cell lines (H1-hESC, IMR90, GM06990, and GM12878) at different Hi-C window resolutions, 32 breakpoints in H1-hESC TADs,35 and the separation of 193 genes from 1–91 of their predicted enhancers after the occurrence of the BCAs (Table S14). For example, SOS1, one of the most significant candidates in explaining DGAP163’s global developmental delay, dysmorphic and distinctive facies, and hearing loss (as observed in Noonan syndrome 1 [NS1 (MIM: 163950)]), is separated from its interaction with 88 predicted enhancers (Figure 3) and exhibited a decrease in expression in DGAP163-derived LCLs. However, NS1 is caused by autosomal-dominant mutations in SOS1. We hypothesize that the reduced expression of SOS1 might affect the RAS-MAPK signaling pathway and generate clinical features not completely overlapping those of NS1; however, this possibility remains to be functionally tested and complemented with analyses of genomic single-nucleotide variants. A similar approach could be explored for DGAP275, where we hypothesize that POLE, associated with facial dysmorphism, immunodeficiency, livedo, and short stature syndrome (MIM: 615139) in an autosomal-recessive manner,83 could contribute to the extreme short stature observed in this DGAP subject. Furthermore, ZEB2, related to Mowat-Wilson syndrome (MOWS [MIM: 235730]) in an autosomal-dominant manner, could potentially explain the hypotonia and neurological features observed in DGAP329 but not all of the dysmorphic features of MOWS. Overall, assessing the validity of our position-effect predictions and the disruption of important chromatin regulatory elements will require rigorous analysis of more candidate genes. Nonetheless, insight into the molecular pathway of disorders could be forthcoming from our approach and of value in the management of some individuals.
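The notion of a gene being "separated" from its predicted enhancers can be illustrated with a small sketch. It assumes, as a simplification that is not taken from the study, that a promoter-distal element connection counts as disrupted when the breakpoint lies strictly between the two elements on the same chromosome. The `Connection` type, the function names, and the coordinates are hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Connection(NamedTuple):
    gene: str
    chrom: str
    promoter_pos: int  # representative position of the promoter DHS
    distal_pos: int    # representative position of the distal DHS/enhancer


def is_disrupted(conn: Connection, bp_chrom: str, bp_pos: int) -> bool:
    """Assumed rule: the breakpoint falls strictly between promoter and distal element."""
    if conn.chrom != bp_chrom:
        return False
    left, right = sorted((conn.promoter_pos, conn.distal_pos))
    return left < bp_pos < right


def disrupted_per_gene(connections: Iterable[Connection],
                       bp_chrom: str, bp_pos: int) -> Counter:
    """Count, per gene, how many predicted connections the breakpoint separates."""
    return Counter(c.gene for c in connections if is_disrupted(c, bp_chrom, bp_pos))


# Hypothetical toy data; gene names and positions are placeholders.
connections = [
    Connection("GENE_X", "chr2", 39_000_000, 38_500_000),  # both on one side of the break
    Connection("GENE_X", "chr2", 39_000_000, 39_400_000),  # spans the break
    Connection("GENE_X", "chr2", 39_000_000, 39_900_000),  # spans the break
    Connection("GENE_Y", "chr2", 40_000_000, 40_100_000),  # both on one side of the break
]
counts = disrupted_per_gene(connections, "chr2", 39_200_000)
print(dict(counts))
```

Under this rule, a per-gene tally of the kind summarized in Table S14 follows directly from the counter, although the actual analysis may define disruption differently (for example, by restricting connections to a fixed window around the breakpoint).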

Figure 3.

Disrupted DHS/Enhancer-Promoter Interactions Predicted for SOS1

Gene position is indicated by an asterisk. The color-graded rectangle represents the correlation values for the interactions reported by Thurman et al.40 The dashed line indicates the translocation breakpoint position in chromosome 2. Lilac rectangles represent genes, and pink rectangles show TAD positions annotated in H1-hESC.

All predicted candidate genes have different lines of evidence supporting their selection, starting with a significant phenomatch score that correlates annotated gene phenotypes with those observed in the DGAP subjects. Evidence of HI and triplosensitivity, inclusion in TAD regions, and HI scores build upon this selection and can help laboratories and clinicians focus on candidates of their interest in subsequent analyses. As of now, the “top-ranking” candidates have the most evidence supporting their selection; however, the 17 analyzed DGAP subjects have 102 second-tier candidates within ±1 Mb analysis windows, and these could very well play a functional role. Presently, we are unable to give “weights” to any of these selection criteria (e.g., to decide whether a gene with a high phenomatch score and no evidence of HI is “more significant” than a gene with a medium phenomatch score and evidence of HI) mainly for two reasons: (1) we would need to collect more examples, which might not be easy to find and would require a tremendous curation effort, and (2) we need to understand the possibility, suggested by our results, that more than one gene could contribute to the clinical presentation of the DGAP subjects either simultaneously or throughout development. Moreover, many of the candidates have recessive inheritance modes, making it necessary to assess the mutational status of both alleles, as well as additional sequence variants not captured by our BCA breakpoint sequencing and the microarrays. Future in-depth exome, DNA, and RNA sequencing, as well as Hi-C experiments, will provide a comprehensive view of the contribution of sequence variants, disruption of chromatin contacts, and changes in gene expression in the DGAP disease etiologies, such that guidelines might be developed as to which candidates should be followed up first and further studied with comprehensive functional validation via animal models and human cell lines that reproduce the BCA breakpoints.
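Because no weights are assigned to the selection criteria, one simple way to picture the ranking is an unweighted tally in which every supporting line of evidence counts once. The sketch below illustrates that idea only; it is not the scoring scheme behind Table S15, and the `Evidence` fields, the function names, and the example genes are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Evidence:
    gene: str
    phenomatch_top: bool     # phenomatch score in a top percentile (assumed cutoff)
    disrupted_dhs: bool      # predicted DHS/enhancer-promoter connection disrupted
    same_tad: bool           # gene lies in the TAD containing the breakpoint
    haploinsufficient: bool  # evidence of haploinsufficiency
    triplosensitive: bool    # evidence of triplosensitivity


def evidence_count(e: Evidence) -> int:
    """Unweighted tally: each supporting line of evidence contributes one point."""
    return sum((e.phenomatch_top, e.disrupted_dhs, e.same_tad,
                e.haploinsufficient, e.triplosensitive))


def rank_candidates(candidates: List[Evidence]) -> List[Evidence]:
    """Sort by descending evidence count; ties are broken alphabetically by gene name."""
    return sorted(candidates, key=lambda e: (-evidence_count(e), e.gene))


# Hypothetical candidates; the flags are made up for illustration.
candidates = [
    Evidence("GENE_B", True, False, True, False, False),
    Evidence("GENE_A", True, True, True, True, False),
    Evidence("GENE_C", False, True, True, False, False),
]
ranked = rank_candidates(candidates)
print([(e.gene, evidence_count(e)) for e in ranked])
```

In this toy ranking GENE_B and GENE_C tie, which mirrors the difficulty described above: without weights, a tally cannot say whether a strong phenomatch score should outrank a disrupted regulatory connection.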

Overall, our results suggest that the integration of phenomatch scores, altered chromatin contacts, and other clinical gene annotations provides valuable interpretation to many VUSs through long-range position effects. The correct prediction of 52 of 57 known pathogenic genes in DGAP subjects used as positive control individuals supports such integration. Our computational analysis is rapid and can provide additional information to benefit the clinical assessment of both coding and non-coding genome variants. The latter is an important step toward predicting pathogenic consequences of non-coding variation observed in prenatal samples. For example, given its position and chromatin contact alterations, we correctly predicted the involvement and decreased expression of SOX9 in the cleft palate Pierre-Robin sequence (MIM: 261800) association in DGAP288.49

Lastly, we would like to note that predicting the pathogenic outcome of disrupted chromatin contacts is not a straightforward endeavor: it has been shown that a single gene promoter can be targeted by several enhancers,63 which could compensate for the interactions perturbed by the chromosomal rearrangements. In addition, rearrangements can reposition gene promoters and enhancers outside of their preferred chromatin environments, leading to improper gene activation by enhancer adoption.41 Our method currently identifies instances in which known and predicted enhancer-promoter interactions are disrupted by the rearrangement breakpoints and could thus lead to decreased candidate-gene expression. Prediction of enhancer adoption will be incorporated once mathematical models of TAD formation upon changes in genomic sequence are refined and available to the greater scientific community. Presently, our predictions are only as good as the availability of pathogenic gene annotations, chromatin conformation data, clinical phenotype information, and the presence of similar rearrangements in databases such as DECIPHER and dbVar. Although the existence of other subjects with phenotypes related to those of the DGAP subjects does not prove the involvement of neighboring genes in the etiology of these phenotypes, it is a step toward predicting pathogenic effects by starting from a simple computational analysis, pointing to a better phenotypic categorization during the clinical examination of affected individuals. By making our position-effect prediction method available to the human genetics community (see Web Resources), we hope to study additional subjects with complete phenotypic information and be able to better refine the rules for predicting position effects on gene expression and discover new mechanisms of pathogenicity.

Acknowledgments

We offer heartfelt gratitude to all DGAP research participants and their families and to countless genetic counselors, clinical geneticists, cytogeneticists, and physicians for their ongoing support of our study and for referrals to our project. This study was funded by the National Institutes of Health (GM061354 to C.C.M. and M.E.T.).

Published: July 20, 2017

Footnotes

Supplemental Data include a Supplemental Note detailing clinical case reports, a Supplemental Note detailing the karyotypes of DGAP subjects, 1 figure, and 22 tables and can be found with this article online at http://dx.doi.org/10.1016/j.ajhg.2017.06.011.

Web Resources

Supplemental Data

Document S1. Supplemental Notes and Figure S1
mmc1.pdf (245.9KB, pdf)
Table S1. 17 Subjects with Both Breakpoints in Non-coding Regions

Case identifiers are provided per studied subject (Subject ID), in addition to their karyotypes using the International System for Human Cytogenetic Nomenclature (ISCN2016) and array information reported in hg19 unless otherwise stated (hg18). Each case has two reported breakpoints (A and B), and for each we provide cytogenetic band and nucleotide locations in hg19 coordinates for the derivative chromosomes involved in their generation (der(A) and der(B)). We also report the sequencing reads by which the breakpoints were identified, and the overlap with known annotated genes (Disrupted gene 1 and Disrupted Gene 2), as well as the two nearest genes (Closest Gene 1 and Closest Gene 2) and their distance in base pairs (bp) to the breakpoint locations (Distance to gene 1 and Distance to Gene 2) in the derivative chromosomes. Negative distance numbers indicate genes upstream of the breakpoint position, while positive numbers indicate genes located downstream of the breakpoint.

mmc2.xlsx (25.7KB, xlsx)
Table S2. Overlap between Non-coding DGAP Breakpoint Positions and Gene Promoters

The number of annotated Ensembl GRCh37 gene promoters (Ensembl_GRCh37_promoters) that overlap non-coding DGAP breakpoints (DGAP id, chr, start, end).

mmc3.xlsx (15.1KB, xlsx)
Table S3. Overlap between Non-coding DGAP Breakpoint Positions and Transcription Factor Binding Sites

The number of annotated Ensembl GRCh37 transcription factor binding sites (Ensembl_GRCh37_tfbindingsites) that overlap non-coding DGAP breakpoints (DGAP id, chr, start, end).

mmc4.xlsx (13.9KB, xlsx)
Table S4. Overlap between Non-coding DGAP Breakpoint Positions and Enhancers

The number of primary cell (Primary_cell_enhancers), tissue (Tissue_enhancers), H1-ESC (ChromHMM_H1_ESC_enhancers), GM12878 (ChromHMM_GM12878_enhancers), and VISTA (VISTA_db_hg19) enhancers that overlap non-coding DGAP breakpoints (DGAP id, chr, start, end). Enhancer positions were obtained from Andersson et al.6 (see references in Document S1), ENCODE, and the VISTA enhancer database (human version hg19). Highlighted green rows indicate breakpoints that overlapped one or more of the enhancer categories analyzed.

mmc5.xlsx (13.9KB, xlsx)
Table S5. Overlap between Non-coding DGAP Breakpoint Positions and DNaseI Hypersensitive Sites

The number of DNaseI hypersensitive sites from H1-hESC, GM06990, GM12878, and the master table (a compilation of DNaseI clusters from 125 cell lines) from ENCODE that overlap non-coding DGAP breakpoints (DGAP id, chr, start, end). Highlighted green rows indicate breakpoints that overlapped one or more of the DNaseI hypersensitive sites in the different cell lines analyzed.

mmc6.xlsx (27.1KB, xlsx)
Table S6. Overlap between Non-coding DGAP Breakpoint Positions and CTCF Binding Sites

The number of ENCODE CTCF binding sites from H1-hESC and GM12878 that overlap non-coding DGAP breakpoints (DGAP id, chr, start, end). Highlighted green rows indicate breakpoints which overlapped one or more of the CTCF binding sites in the two cell lines analyzed.

mmc7.xlsx (17.4KB, xlsx)
Table S7. Overlap between Non-coding DGAP Breakpoint Positions and ENCODE Chromatin State Segments

ENCODE chromatin state segment classifications per non-coding DGAP breakpoint (DGAP id, chr, start, end) for H1-hESC and GM12878 cell lines. Chromatin state segment coordinates and other bed file information are displayed from column #bin through column itemRGB. Please refer to ENCODE’s description of bed items here: http://rohsdb.cmb.usc.edu/GBshape/cgi-bin/hgTables. Chromatin state names: CTCF = CTCF binding site, E = enhancer, WE = weak enhancer, T = transcriptionally active, R = transcriptionally repressed.

mmc8.xlsx (19.4KB, xlsx)
Table S8. Overlap between Non-coding DGAP Breakpoint Positions and Repetitive Elements

The number of repetitive elements as assessed by RepeatMasker that overlap non-coding DGAP breakpoints (DGAP id, chr, start, end). Repetitive element information such as coordinates (Rep_chr, Rep_start, Rep_end), name, class, and family is provided for each overlap.

mmc9.xlsx (21KB, xlsx)
Table S9. Overlap between Non-coding DGAP Breakpoint Positions and Topologically Associating Domains

The number of topologically associating domains (TADs) in H1-hESC and IMR90 (see Dixon et al.7 in Document S1) that overlap non-coding DGAP breakpoints (DGAP id, chr, start, end). TAD information such as coordinates (TAD_chr, TAD_start, TAD_end) are provided for each overlap.

mmc10.xlsx (17KB, xlsx)
Table S10. Overlap between Non-coding DGAP Breakpoint Positions and High-Resolution Chromatin Subcompartments and Arrowhead Domains

The number of high-resolution chromatin subcompartments and arrowhead domains in IMR90 and GM12878 (see Rao et al.8 in Document S1) that overlap non-coding DGAP breakpoints (DGAP id, chr, start, end). Chromatin subcompartments and arrowhead domains information such as coordinates and class are provided for each overlap.

mmc11.xlsx (20.6KB, xlsx)
Table S11. Disruption of Chromatin Contacts by Non-coding DGAP Breakpoint Positions

The number of chromatin contacts disrupted by non-coding DGAP breakpoint positions (DGAP id, chr, start, end) within their analysis windows (window_start, window_end) in Hi-C datasets of 20 and 40 Kb resolution of H1-hESC (see Dixon et al.7 in Document S1) (Esc_20kb_HindIII_rep1, Esc_20kb_HindIII_rep2, Esc_40kb_hindIII_combined, Esc_40kb_hindIII_rep1, Esc_40kb_hindIII_rep2), 20 and 40 Kb resolution of IMR90 (Dixon et al.7) (IMR90_20kb_hindIII_rep1, IMR90_20kb_hindIII_rep2, IMR90_40kb_hindIII_combined, IMR90_40kb_hindIII_rep1, IMR90_40kb_hindIII_rep2), 100Kb and 1Mb resolution of GM06990 (http://epigenomegateway.wustl.edu/) (GM06990_obsexp_100kb, GM06990_obsexp_1mb) and looplists from Rao et al.8 (see Document S1) for GM12878 and IMR90 (GSE63525_GM12878_primary+replicate_HiCCUPS_looplist, GSE63525_IMR90_HiCCUPS_looplist).

mmc12.xlsx (17.5KB, xlsx)
Table S12. Disruption of GM12878 Chromatin Contacts at Various Resolution Levels by Non-coding DGAP Breakpoint Positions

The number of chromatin contacts disrupted by non-coding DGAP breakpoint positions (DGAP id, chr, start, end) within their analysis windows (window_start, window_end) in the 50Kb, 100Kb, 250Kb, 500Kb and 1Mb resolution Hi-C datasets from Rao et al.8 (see Document S1) for GM12878.

mmc13.xlsx (15.2KB, xlsx)
Table S13. Disruption of Predicted ENCODE Distal DHS/Enhancer-Promoter Connections by Non-coding DGAP Breakpoint Positions

The number of predicted ENCODE distal DHS/enhancer-promoter connections (see Thurman et al.9 in Document S1) (promoter_DHS_chr, promoter_DHS_start, promoter_DHS_end, promoter_DHS_gene, distal_DHS_chr, distal_DHS_start, distal_DHS_end, promoter_distal_DHS_correlation) disrupted by non-coding DGAP breakpoint positions (DGAP id, chr, start, end) within their ±500 Kb analysis windows (window_start, window_end).

mmc14.xlsx (180.1KB, xlsx)
Table S14. Genes with Predicted Disrupted ENCODE Distal DHS/Enhancer-Promoter Connections by the Non-coding DGAP Breakpoint Positions

The names of genes (Genes) separated from their predicted enhancers (Disrupted_enh_prom_interactions) (see Thurman et al.9 in Document S1).

mmc15.xlsx (13.9KB, xlsx)
Table S15. Identification of Genes with Potential Position Effects

Candidate genes (ensembl_gene_ID, Gene_chr, Gene_start, Gene_end, Gene_name) and their various lines of selection evidence for each non-coding DGAP breakpoint position (DGAP id, chr, start, end) within their analysis windows (window_start, window_end). Evidence lines include Hi-C domain inclusion (Hi_domain, HiC_chr, HiC_start, HiC_end), haploinsufficiency (HI_chr, Gene-start, gene_end, HI_prob, Haploinsufficiency_score,), triplosensitivity (Triplosensitivity_score), phenomatch score (PhenoScore, MaxPhenoScore, Phone_percentile, count_Pheno_percentile, MaxPheno_percentile, count_MaxPheno_percentile, Percentile_final_count). All of the evidence information is summarized (6Mb, 2Mb, TAD, DHS, Count_haplo, count_triplo) and the gene rankings are presented in the PERC+DHS+TAD+HAPLO+TRIPLO and PERC+DHS+2Mb+HAPLO+TRIPLO columns which take different evidence lines into consideration. Green row highlight indicates highest ranking gene, and yellow row highlight indicates second best ranking genes.

mmc16.xlsx (583KB, xlsx)
Table S16. Translation of DGAP Clinical Features to HPO Terms

HPO identifiers per DGAP case.

mmc17.xlsx (10.2KB, xlsx)
Table S17. Identification of Genes with Potential Position Effects in Instances of Known Pathogenicity

Candidate genes (ensembl_gene_ID, Gene_chr, Gene_start, Gene_end, Gene_name) and their various lines of selection evidence for the set of known pathogenic rearrangement positive controls (DGAP id, chr, start, end) within their analysis windows (window_start, window_end) from Redin et al.10 (see Document S1). Evidence lines include Hi-C domain inclusion (Hi_domain, HiC_chr, HiC_start, HiC_end), haploinsufficiency (HI_chr, Gene-start, gene_end, HI_prob, Haploinsufficiency_score,), triplosensitivity (Triplosensitivity_score), phenomatch score (PhenoScore, MaxPhenoScore, Phone_percentile, count_Pheno_percentile, MaxPheno_percentile, count_MaxPheno_percentile, Percentile_final_count). All of the evidence information is summarized (6Mb, 2Mb, TAD, DHS, Count_haplo, count_triplo) and the gene rankings are presented in the PERC+DHS+TAD+HAPLO+TRIPLO and PERC+DHS+2Mb+HAPLO+TRIPLO columns which take different evidence lines into consideration. Yellow row highlight indicates pathogenic genes reported by Redin et al.10 (see Document S1).

mmc18.xlsx (3.9MB, xlsx)
Table S18. Identification of Disrupted Chromatin Contacts between Disrupted DHSs and Enhancers by the Non-coding DGAP Breakpoint Positions

An agnostic search revealed the existence of chromatin contacts between breakpoint-disrupted sequences of DHS sites and gene enhancers in Hi-C data of H1-hESC cells at 40 kb resolution (see Dixon et al.7 in Document S1). The reported genes are our top position effect candidate genes in the region. Table columns report the candidate gene information (Gene_chr, Gene_start, Gene_end, Gene_name), the associated DGAP case information (DGAP_ID, DGAP_chr, DGAP_start, DGAP_end) and the disrupted Hi-C chromatin interaction (HiC_1_chr, HiC_1_start, HiC_1_end, HiC_2_chr, HiC_2_start, HiC_2_end, HiC_1_interaction).

mmc19.xlsx (11.3KB, xlsx)
Table S19. Overlap between Non-coding DGAP Breakpoint Positions and DECIPHER Cases

The number of DECIPHER cases that overlap non-coding DGAP breakpoints (DGAP_ID, DGAP_chr, DGAP_start, DGAP_end). DECIPHER case information such as ID_patient, chr_start, chr_end, chr, mean_ratio, classification_type and phenotype are provided for each overlap.

mmc20.xlsx (5.3MB, xlsx)
Table S20. Genes Contained within DECIPHER Cases Overlapped by Non-coding DGAP Breakpoint Positions

The number of genes contained within overlapped DECIPHER cases by the non-coding DGAP breakpoints (DGAP_ID, DGAP_chr, DGAP_start, DGAP_end). DECIPHER case and gene information such as gene_count, DECIPHER_ID, DECIPHER_chr, DECIPHER_start, DECIPHER_end, DECIPHER_value, DECIPHER_type_rearr, DECIPHER_phenotype and HG_symbol are provided for each overlapped DECIPHER case.

mmc21.xlsx (347.7KB, xlsx)
Table S21. DECIPHER Cases Overlapped by Non-coding DGAP Breakpoint Positions That Fulfilled Non-coding Selection Criteria

The number of DECIPHER cases that have non-coding breakpoints. DGAP comparison case information (DGAP_ID, DGAP_chr, DGAP_start, DGAP_end) is provided, as well as overlapped DECIPHER case information containing id_patient, chr_start, chr_end, chr, mean_ratio, classification_type and phenotype.

mmc22.xlsx (9.4KB, xlsx)
Table S22. Overlap between Non-coding DGAP Breakpoint Positions and dbVar Cases

The number of dbVar cases that overlap non-coding DGAP breakpoints (DGAP_chr, DGAP_start, DGAP_end, DGAP_ID). dbVar case information such as dbVar ID, Start, End, Variant type, Gene, Molecular consequences, Most severe clinical significance, 1000G minor allele, 1000G MAF, GO-ESP minor allele, GO-ESP MAF, ExAC minor allele, ExAC MAF, Publications (PMIDs), Variant allele, Transcript change, RefSeq, Protein change, Molecular consequence, HGVS_c, HGVS_g, HGVS_ng, HGVS_p, Condition, Most severe clinical significance, Submitters, Highest review status and Last evaluated are provided for each overlap.

mmc23.xlsx (155.2KB, xlsx)
Document S2. Article plus Supplemental Data
mmc24.pdf (876.6KB, pdf)

References

  • 1.Lejeune J., Gautier M., Turpin R. [Study of somatic chromosomes from 9 mongoloid children] C. R. Hebd. Seances Acad. Sci. 1959;248:1721–1722. [PubMed] [Google Scholar]
  • 2.Ford C.E., Jones K.W., Polani P.E., De Almeida J.C., Briggs J.H. A sex-chromosome anomaly in a case of gonadal dysgenesis (Turner's syndrome) Lancet. 1959;1:711–713. doi: 10.1016/s0140-6736(59)91893-8. [DOI] [PubMed] [Google Scholar]
  • 3.Jacobs P.A., Strong J.A. A case of human intersexuality having a possible XXY sex-determining mechanism. Nature. 1959;183:302–303. doi: 10.1038/183302a0. [DOI] [PubMed] [Google Scholar]
  • 4.Stankiewicz P., Lupski J.R. Genome architecture, rearrangements and genomic disorders. Trends Genet. 2002;18:74–82. doi: 10.1016/s0168-9525(02)02592-1. [DOI] [PubMed] [Google Scholar]
  • 5.Iafrate A.J., Feuk L., Rivera M.N., Listewnik M.L., Donahoe P.K., Qi Y., Scherer S.W., Lee C. Detection of large-scale variation in the human genome. Nat. Genet. 2004;36:949–951. doi: 10.1038/ng1416. [DOI] [PubMed] [Google Scholar]
  • 6.Sebat J., Lakshmi B., Troge J., Alexander J., Young J., Lundin P., Månér S., Massa H., Walker M., Chi M. Large-scale copy number polymorphism in the human genome. Science. 2004;305:525–528. doi: 10.1126/science.1098918. [DOI] [PubMed] [Google Scholar]
  • 7.Hinds D.A., Kloek A.P., Jen M., Chen X., Frazer K.A. Common deletions and SNPs are in linkage disequilibrium in the human genome. Nat. Genet. 2006;38:82–85. doi: 10.1038/ng1695. [DOI] [PubMed] [Google Scholar]
  • 8.Conrad D.F., Andrews T.D., Carter N.P., Hurles M.E., Pritchard J.K. A high-resolution survey of deletion polymorphism in the human genome. Nat. Genet. 2006;38:75–81. doi: 10.1038/ng1697. [DOI] [PubMed] [Google Scholar]
  • 9.Conrad D.F., Pinto D., Redon R., Feuk L., Gokcumen O., Zhang Y., Aerts J., Andrews T.D., Barnes C., Campbell P. Origins and functional impact of copy number variation in the human genome. Nature. 2010;464:704–712. doi: 10.1038/nature08516. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Korbel J.O., Urban A.E., Affourtit J.P., Godwin B., Grubert F., Simons J.F., Kim P.M., Palejev D., Carriero N.J., Du L. Paired-end mapping reveals extensive structural variation in the human genome. Science. 2007;318:420–426. doi: 10.1126/science.1149504. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Stankiewicz P., Lupski J.R. Structural variation in the human genome and its role in disease. Annu. Rev. Med. 2010;61:437–455. doi: 10.1146/annurev-med-100708-204735. [DOI] [PubMed] [Google Scholar]
  • 12.Altshuler D.M., Gibbs R.A., Peltonen L., Altshuler D.M., Gibbs R.A., Peltonen L., Dermitzakis E., Schaffner S.F., Yu F., Peltonen L., International HapMap 3 Consortium Integrating common and rare genetic variation in diverse human populations. Nature. 2010;467:52–58. doi: 10.1038/nature09298. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.1000 Genomes Project Consortium A map of human genome variation from population-scale sequencing. Nature. 2010;467:1061–1073. doi: 10.1038/nature09534. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Carvalho C.M., Lupski J.R. Mechanisms underlying structural variant formation in genomic disorders. Nat. Rev. Genet. 2016;17:224–238. doi: 10.1038/nrg.2015.25. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Zhang F., Gu W., Hurles M.E., Lupski J.R. Copy number variation in human health, disease, and evolution. Annu. Rev. Genomics Hum. Genet. 2009;10:451–481. doi: 10.1146/annurev.genom.9.081307.164217. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Theisen A., Shaffer L.G. Disorders caused by chromosome abnormalities. Appl. Clin. Genet. 2010;3:159–174. doi: 10.2147/TACG.S8884. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Nambiar M., Raghavan S.C. How does DNA break during chromosomal translocations? Nucleic Acids Res. 2011;39:5813–5825. doi: 10.1093/nar/gkr223. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Higgins A.W., Alkuraya F.S., Bosco A.F., Brown K.K., Bruns G.A., Donovan D.J., Eisenman R., Fan Y., Farra C.G., Ferguson H.L. Characterization of apparently balanced chromosomal rearrangements from the developmental genome anatomy project. Am. J. Hum. Genet. 2008;82:712–722. doi: 10.1016/j.ajhg.2008.01.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Kleinjan D.A., van Heyningen V. Long-range control of gene expression: emerging mechanisms and disruption in disease. Am. J. Hum. Genet. 2005;76:8–32. doi: 10.1086/426833. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Weiler K.S., Wakimoto B.T. Heterochromatin and gene expression in Drosophila. Annu. Rev. Genet. 1995;29:577–605. doi: 10.1146/annurev.ge.29.120195.003045. [DOI] [PubMed] [Google Scholar]
  • 21.Zhang F., Lupski J.R. Non-coding genetic variants in human disease. Hum. Mol. Genet. 2015;24(R1):R102–R110. doi: 10.1093/hmg/ddv259. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Spielmann M., Mundlos S. Looking beyond the genes: the role of non-coding variants in human disease. Hum. Mol. Genet. 2016;25:R157–R165. doi: 10.1093/hmg/ddw205. [DOI] [PubMed] [Google Scholar]
  • 23.Fantes J., Redeker B., Breen M., Boyle S., Brown J., Fletcher J., Jones S., Bickmore W., Fukushima Y., Mannens M. Aniridia-associated cytogenetic rearrangements suggest that a position effect may cause the mutant phenotype. Hum. Mol. Genet. 1995;4:415–422. doi: 10.1093/hmg/4.3.415. [DOI] [PubMed] [Google Scholar]
  • 24.Kleinjan D.A., Seawright A., Schedl A., Quinlan R.A., Danes S., van Heyningen V. Aniridia-associated translocations, DNase hypersensitivity, sequence comparison and transgenic analysis redefine the functional domain of PAX6. Hum. Mol. Genet. 2001;10:2049–2059. doi: 10.1093/hmg/10.19.2049.
  • 25.Cai J., Goodman B.K., Patel A.S., Mulliken J.B., Van Maldergem L., Hoganson G.E., Paznekas W.A., Ben-Neriah Z., Sheffer R., Cunningham M.L. Increased risk for developmental delay in Saethre-Chotzen syndrome is associated with TWIST deletions: an improved strategy for TWIST mutation screening. Hum. Genet. 2003;114:68–76. doi: 10.1007/s00439-003-1012-7.
  • 26.Flomen R.H., Vatcheva R., Gorman P.A., Baptista P.R., Groet J., Barisić I., Ligutic I., Nizetić D. Construction and analysis of a sequence-ready map in 4q25: Rieger syndrome can be caused by haploinsufficiency of RIEG, but also by chromosome breaks approximately 90 kb upstream of this gene. Genomics. 1998;47:409–413. doi: 10.1006/geno.1997.5127.
  • 27.Trembath D.G., Semina E.V., Jones D.H., Patil S.R., Qian Q., Amendt B.A., Russo A.F., Murray J.C. Analysis of two translocation breakpoints and identification of a negative regulatory element in patients with Rieger’s syndrome. Birth Defects Res. A Clin. Mol. Teratol. 2004;70:82–91. doi: 10.1002/bdra.10154.
  • 28.Velagaleti G.V., Bien-Willner G.A., Northup J.K., Lockhart L.H., Hawkins J.C., Jalal S.M., Withers M., Lupski J.R., Stankiewicz P. Position effects due to chromosome breakpoints that map approximately 900 Kb upstream and approximately 1.3 Mb downstream of SOX9 in two patients with campomelic dysplasia. Am. J. Hum. Genet. 2005;76:652–662. doi: 10.1086/429252.
  • 29.Kleinjan D.J., van Heyningen V. Position effect in human genetic disease. Hum. Mol. Genet. 1998;7:1611–1618. doi: 10.1093/hmg/7.10.1611.
  • 30.Lupski J.R., Stankiewicz P. Genomic disorders: molecular mechanisms for rearrangements and conveyed phenotypes. PLoS Genet. 2005;1:e49. doi: 10.1371/journal.pgen.0010049.
  • 31.Dekker J., Rippe K., Dekker M., Kleckner N. Capturing chromosome conformation. Science. 2002;295:1306–1311. doi: 10.1126/science.1067799.
  • 32.de Wit E., de Laat W. A decade of 3C technologies: insights into nuclear organization. Genes Dev. 2012;26:11–24. doi: 10.1101/gad.179804.111.
  • 33.Lieberman-Aiden E., van Berkum N.L., Williams L., Imakaev M., Ragoczy T., Telling A., Amit I., Lajoie B.R., Sabo P.J., Dorschner M.O. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326:289–293. doi: 10.1126/science.1181369.
  • 34.Fullwood M.J., Liu M.H., Pan Y.F., Liu J., Xu H., Mohamed Y.B., Orlov Y.L., Velkov S., Ho A., Mei P.H. An oestrogen-receptor-alpha-bound human chromatin interactome. Nature. 2009;462:58–64. doi: 10.1038/nature08497.
  • 35.Dixon J.R., Selvaraj S., Yue F., Kim A., Li Y., Shen Y., Hu M., Liu J.S., Ren B. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485:376–380. doi: 10.1038/nature11082.
  • 36.Sanyal A., Lajoie B.R., Jain G., Dekker J. The long-range interaction landscape of gene promoters. Nature. 2012;489:109–113. doi: 10.1038/nature11279.
  • 37.Phillips-Cremins J.E., Sauria M.E., Sanyal A., Gerasimova T.I., Lajoie B.R., Bell J.S., Ong C.T., Hookway T.A., Guo C., Sun Y. Architectural protein subclasses shape 3D organization of genomes during lineage commitment. Cell. 2013;153:1281–1295. doi: 10.1016/j.cell.2013.04.053.
  • 38.Rao S.S., Huntley M.H., Durand N.C., Stamenova E.K., Bochkov I.D., Robinson J.T., Sanborn A.L., Machol I., Omer A.D., Lander E.S., Aiden E.L. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159:1665–1680. doi: 10.1016/j.cell.2014.11.021.
  • 39.Mifsud B., Tavares-Cadete F., Young A.N., Sugar R., Schoenfelder S., Ferreira L., Wingett S.W., Andrews S., Grey W., Ewels P.A. Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nat. Genet. 2015;47:598–606. doi: 10.1038/ng.3286.
  • 40.Dixon J.R., Jung I., Selvaraj S., Shen Y., Antosiewicz-Bourget J.E., Lee A.Y., Ye Z., Kim A., Rajagopal N., Xie W. Chromatin architecture reorganization during stem cell differentiation. Nature. 2015;518:331–336. doi: 10.1038/nature14222.
  • 41.Lupiáñez D.G., Kraft K., Heinrich V., Krawitz P., Brancati F., Klopocki E., Horn D., Kayserili H., Opitz J.M., Laxova R. Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. Cell. 2015;161:1012–1025. doi: 10.1016/j.cell.2015.04.004.
  • 42.Gröschel S., Sanders M.A., Hoogenboezem R., de Wit E., Bouwman B.A., Erpelinck C., van der Velden V.H., Havermans M., Avellino R., van Lom K. A single oncogenic enhancer rearrangement causes concomitant EVI1 and GATA2 deregulation in leukemia. Cell. 2014;157:369–381. doi: 10.1016/j.cell.2014.02.019.
  • 43.Claussnitzer M., Dankel S.N., Kim K.H., Quon G., Meuleman W., Haugen C., Glunk V., Sousa I.S., Beaudry J.L., Puviindran V. FTO Obesity Variant Circuitry and Adipocyte Browning in Humans. N. Engl. J. Med. 2015;373:895–907. doi: 10.1056/NEJMoa1502214.
  • 44.Visser M., Kayser M., Palstra R.J. HERC2 rs12913832 modulates human pigmentation by attenuating chromatin-loop formation between a long-range enhancer and the OCA2 promoter. Genome Res. 2012;22:446–455. doi: 10.1101/gr.128652.111.
  • 45.Roussos P., Mitchell A.C., Voloudakis G., Fullard J.F., Pothula V.M., Tsang J., Stahl E.A., Georgakopoulos A., Ruderfer D.M., Charney A. A role for noncoding variation in schizophrenia. Cell Rep. 2014;9:1417–1429. doi: 10.1016/j.celrep.2014.10.015.
  • 46.Giorgio E., Robyr D., Spielmann M., Ferrero E., Di Gregorio E., Imperiale D., Vaula G., Stamoulis G., Santoni F., Atzori C. A large genomic deletion leads to enhancer adoption by the lamin B1 gene: a second path to autosomal dominant adult-onset demyelinating leukodystrophy (ADLD). Hum. Mol. Genet. 2015;24:3143–3154. doi: 10.1093/hmg/ddv065.
  • 47.Oldridge D.A., Wood A.C., Weichert-Leahey N., Crimmins I., Sussman R., Winter C., McDaniel L.D., Diamond M., Hart L.S., Zhu S. Genetic predisposition to neuroblastoma mediated by a LMO1 super-enhancer polymorphism. Nature. 2015;528:418–421. doi: 10.1038/nature15540.
  • 48.Ibn-Salem J., Köhler S., Love M.I., Chung H.R., Huang N., Hurles M.E., Haendel M., Washington N.L., Smedley D., Mungall C.J. Deletions of chromosomal regulatory boundaries are associated with congenital disease. Genome Biol. 2014;15:423. doi: 10.1186/s13059-014-0423-1.
  • 49.Ordulu Z., Kammin T., Brand H., Pillalamarri V., Redin C.E., Collins R.L., Blumenthal I., Hanscom C., Pereira S., Bradley I. Structural Chromosomal Rearrangements Require Nucleotide-Level Resolution: Lessons from Next-Generation Sequencing in Prenatal Diagnosis. Am. J. Hum. Genet. 2016;99:1015–1033. doi: 10.1016/j.ajhg.2016.08.022.
  • 50.Ligon A.H., Moore S.D., Parisi M.A., Mealiffe M.E., Harris D.J., Ferguson H.L., Quade B.J., Morton C.C. Constitutional rearrangement of the architectural factor HMGA2: a novel human phenotype including overgrowth and lipomas. Am. J. Hum. Genet. 2005;76:340–348. doi: 10.1086/427565.
  • 51.Kim H.G., Kishikawa S., Higgins A.W., Seong I.S., Donovan D.J., Shen Y., Lally E., Weiss L.A., Najm J., Kutsche K. Disruption of neurexin 1 associated with autism spectrum disorder. Am. J. Hum. Genet. 2008;82:199–207. doi: 10.1016/j.ajhg.2007.09.011.
  • 52.Lu W., Quintero-Rivera F., Fan Y., Alkuraya F.S., Donovan D.J., Xi Q., Turbe-Doan A., Li Q.G., Campbell C.G., Shanske A.L. NFIA haploinsufficiency is associated with a CNS malformation syndrome and urinary tract defects. PLoS Genet. 2007;3:e80. doi: 10.1371/journal.pgen.0030080.
  • 53.Redin C., Brand H., Collins R.L., Kammin T., Mitchell E., Hodge J.C., Hanscom C., Pillalamarri V., Seabra C.M., Abbott M.A. The genomic landscape of balanced cytogenetic abnormalities associated with human congenital anomalies. Nat. Genet. 2017;49:36–45. doi: 10.1038/ng.3720.
  • 54.Talkowski M.E., Ernst C., Heilbut A., Chiang C., Hanscom C., Lindgren A., Kirby A., Liu S., Muddukrishna B., Ohsumi T.K. Next-generation sequencing strategies enable routine detection of balanced chromosome rearrangements for clinical diagnostics and genetic research. Am. J. Hum. Genet. 2011;88:469–481. doi: 10.1016/j.ajhg.2011.03.013.
  • 55.Quinn J.J., Chang H.Y. Unique features of long non-coding RNA biogenesis and function. Nat. Rev. Genet. 2016;17:47–62. doi: 10.1038/nrg.2015.10.
  • 56.Pink R.C., Wicks K., Caley D.P., Punch E.K., Jacobs L., Carter D.R. Pseudogenes: pseudo-functional or key regulators in health and disease? RNA. 2011;17:792–798. doi: 10.1261/rna.2658311.
  • 57.Muro E.M., Andrade-Navarro M.A. Pseudogenes as an alternative source of natural antisense transcripts. BMC Evol. Biol. 2010;10:338. doi: 10.1186/1471-2148-10-338.
  • 58.Ordulu Z., Wong K.E., Currall B.B., Ivanov A.R., Pereira S., Althari S., Gusella J.F., Talkowski M.E., Morton C.C. Describing sequencing results of structural chromosome rearrangements with a suggested next-generation cytogenetic nomenclature. Am. J. Hum. Genet. 2014;94:695–709. doi: 10.1016/j.ajhg.2014.03.020.
  • 59.Flicek P., Amode M.R., Barrell D., Beal K., Billis K., Brent S., Carvalho-Silva D., Clapham P., Coates G., Fitzgerald S. Ensembl 2014. Nucleic Acids Res. 2014;42:D749–D755. doi: 10.1093/nar/gkt1196.
  • 60.Cabili M.N., Trapnell C., Goff L., Koziol M., Tazon-Vega B., Regev A., Rinn J.L. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 2011;25:1915–1927. doi: 10.1101/gad.17446611.
  • 61.Huang N., Lee I., Marcotte E.M., Hurles M.E. Characterising and predicting haploinsufficiency in the human genome. PLoS Genet. 2010;6:e1001154. doi: 10.1371/journal.pgen.1001154.
  • 62.Rehm H.L., Berg J.S., Brooks L.D., Bustamante C.D., Evans J.P., Landrum M.J., Ledbetter D.H., Maglott D.R., Martin C.L., Nussbaum R.L., ClinGen. ClinGen--the Clinical Genome Resource. N. Engl. J. Med. 2015;372:2235–2242. doi: 10.1056/NEJMsr1406261.
  • 63.Thurman R.E., Rynes E., Humbert R., Vierstra J., Maurano M.T., Haugen E., Sheffield N.C., Stergachis A.B., Wang H., Vernot B. The accessible chromatin landscape of the human genome. Nature. 2012;489:75–82. doi: 10.1038/nature11232.
  • 64.Narendra V., Rocha P.P., An D., Raviram R., Skok J.A., Mazzoni E.O., Reinberg D. CTCF establishes discrete functional chromatin domains at the Hox clusters during differentiation. Science. 2015;347:1017–1021. doi: 10.1126/science.1262088.
  • 65.Flavahan W.A., Drier Y., Liau B.B., Gillespie S.M., Venteicher A.S., Stemmer-Rachamimov A.O., Suvà M.L., Bernstein B.E. Insulator dysfunction and oncogene activation in IDH mutant gliomas. Nature. 2016;529:110–114. doi: 10.1038/nature16490.
  • 66.Hnisz D., Weintraub A.S., Day D.S., Valton A.L., Bak R.O., Li C.H., Goldmann J., Lajoie B.R., Fan Z.P., Sigova A.A. Activation of proto-oncogenes by disruption of chromosome neighborhoods. Science. 2016;351:1454–1458. doi: 10.1126/science.aad9024.
  • 67.ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247.
  • 68.Kent W.J., Sugnet C.W., Furey T.S., Roskin K.M., Pringle T.H., Zahler A.M., Haussler D. The human genome browser at UCSC. Genome Res. 2002;12:996–1006. doi: 10.1101/gr.229102.
  • 69.Andersson R., Gebhard C., Miguel-Escalada I., Hoof I., Bornholdt J., Boyd M., Chen Y., Zhao X., Schmidl C., Suzuki T., FANTOM Consortium. An atlas of active enhancers across human cell types and tissues. Nature. 2014;507:455–461. doi: 10.1038/nature12787.
  • 70.Visel A., Minovitsky S., Dubchak I., Pennacchio L.A. VISTA Enhancer Browser--a database of tissue-specific human enhancers. Nucleic Acids Res. 2007;35:D88–D92. doi: 10.1093/nar/gkl822.
  • 71.Zhou X., Wang T. Using the Wash U Epigenome Browser to examine genome-wide sequencing data. Curr. Protoc. Bioinformatics. 2012;Chapter 10:10. doi: 10.1002/0471250953.bi1010s40.
  • 72.Quinlan A.R., Hall I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
  • 73.Heger A., Webber C., Goodson M., Ponting C.P., Lunter G. GAT: a simulation framework for testing the association of genomic intervals. Bioinformatics. 2013;29:2046–2048. doi: 10.1093/bioinformatics/btt343.
  • 74.Köhler S., Doelken S.C., Mungall C.J., Bauer S., Firth H.V., Bailleul-Forestier I., Black G.C., Brown D.L., Brudno M., Campbell J. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 2014;42:D966–D974. doi: 10.1093/nar/gkt1026.
  • 75.Firth H.V., Richards S.M., Bevan A.P., Clayton S., Corpas M., Rajan D., Van Vooren S., Moreau Y., Pettett R.M., Carter N.P. DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources. Am. J. Hum. Genet. 2009;84:524–533. doi: 10.1016/j.ajhg.2009.03.010.
  • 76.Lappalainen I., Lopez J., Skipper L., Hefferon T., Spalding J.D., Garner J., Chen C., Maguire M., Corbett M., Zhou G. DbVar and DGVa: public archives for genomic structural variation. Nucleic Acids Res. 2013;41:D936–D941. doi: 10.1093/nar/gks1213.
  • 77.Gu W., Zhang F., Lupski J.R. Mechanisms for human genomic rearrangements. PathoGenetics. 2008;1:4. doi: 10.1186/1755-8417-1-4.
  • 78.Cardoso A.R., Oliveira M., Amorim A., Azevedo L. Major influence of repetitive elements on disease-associated copy number variants (CNVs). Hum. Genomics. 2016;10:30. doi: 10.1186/s40246-016-0088-9.
  • 79.GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 2013;45:580–585. doi: 10.1038/ng.2653.
  • 80.Vetro A., Dehghani M.R., Kraoua L., Giorda R., Beri S., Cardarelli L., Merico M., Manolakos E., Parada-Bustamante A., Castro A. Testis development in the absence of SRY: chromosomal rearrangements at SOX9 and SOX3. Eur. J. Hum. Genet. 2015;23:1025–1032. doi: 10.1038/ejhg.2014.237.
  • 81.Miller D.T., Adam M.P., Aradhya S., Biesecker L.G., Brothman A.R., Carter N.P., Church D.M., Crolla J.A., Eichler E.E., Epstein C.J. Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies. Am. J. Hum. Genet. 2010;86:749–764. doi: 10.1016/j.ajhg.2010.04.006.
  • 82.Bonev B., Cavalli G. Organization and function of the 3D genome. Nat. Rev. Genet. 2016;17:661–678. doi: 10.1038/nrg.2016.112.
  • 83.Pachlopnik Schmid J., Lemoine R., Nehme N., Cormier-Daire V., Revy P., Debeurme F., Debré M., Nitschke P., Bole-Feysot C., Legeai-Mallet L. Polymerase ε1 mutation in a human syndrome with facial dysmorphism, immunodeficiency, livedo, and short stature (“FILS syndrome”). J. Exp. Med. 2012;209:2323–2330. doi: 10.1084/jem.20121303.

Associated Data

Supplementary Materials

Document S1. Supplemental Notes and Figure S1
mmc1.pdf (245.9KB, pdf)
Table S1. 17 Subjects with Both Breakpoints in Non-coding Regions

Case identifiers are provided for each studied subject (Subject ID), together with karyotypes described using the International System for Human Cytogenetic Nomenclature (ISCN2016) and array information reported in hg19 coordinates unless stated to be in hg18. Each case has two reported breakpoints (A and B); for each, we provide the cytogenetic band and nucleotide location in hg19 coordinates on the derivative chromosomes involved (der(A) and der(B)). We also report the sequencing reads by which the breakpoints were identified, the overlap with known annotated genes (Disrupted gene 1 and Disrupted Gene 2), and the two nearest genes (Closest Gene 1 and Closest Gene 2) with their distances in base pairs (bp) to the breakpoint locations (Distance to gene 1 and Distance to Gene 2) in the derivative chromosomes. Negative distances indicate genes upstream of the breakpoint position, and positive distances indicate genes downstream of the breakpoint.

mmc2.xlsx (25.7KB, xlsx)
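
The sign convention used for the distance columns of Table S1 can be illustrated with a minimal Python sketch. It assumes that "upstream" means a lower coordinate on the derivative chromosome (strand is ignored), that genes are supplied as hypothetical (name, start, end) tuples, and that a gene overlapping the breakpoint has distance 0. None of these details are specified in the table description, and the function names are illustrative only.

```python
from typing import List, Tuple

Gene = Tuple[str, int, int]  # hypothetical layout: (name, start, end)


def signed_distance(breakpoint: int, gene: Gene) -> int:
    """Distance in bp from a breakpoint to a gene.

    Negative values mean the gene lies upstream (lower coordinates),
    positive values mean it lies downstream; 0 means the gene overlaps
    the breakpoint. Strand is ignored (assumption).
    """
    _, start, end = gene
    if end < breakpoint:
        return end - breakpoint      # gene ends before the breakpoint
    if start > breakpoint:
        return start - breakpoint    # gene starts after the breakpoint
    return 0


def two_nearest_genes(breakpoint: int, genes: List[Gene]) -> List[Tuple[str, int]]:
    """Return the two closest non-overlapping genes with signed distances."""
    scored = [(g[0], signed_distance(breakpoint, g)) for g in genes]
    flanking = [s for s in scored if s[1] != 0]
    return sorted(flanking, key=lambda s: abs(s[1]))[:2]


# Toy coordinates, not taken from the study.
genes = [("GENE_A", 100, 200), ("GENE_B", 500, 600), ("GENE_C", 1000, 1100)]
print(two_nearest_genes(400, genes))  # [('GENE_B', 100), ('GENE_A', -200)]
```

In this toy case the breakpoint at position 400 has GENE_B 100 bp downstream (positive) and GENE_A 200 bp upstream (negative).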
Table S2. Overlap between Non-coding DGAP Breakpoint Positions and Gene Promoters

The number of annotated Ensembl GRCh37 gene promoters (Ensembl_GRCh37_promoters) that overlap non-coding DGAP breakpoints (DGAP id, chr, start, end).

mmc3.xlsx (15.1KB, xlsx)
Table S3. Overlap between Non-coding DGAP Breakpoint Positions and Transcription Factor Binding Sites

The number of annotated Ensembl GRCh37 transcription factor binding sites (Ensembl_GRCh37_tfbindingsites) that overlap non-coding DGAP breakpoints (DGAP id, chr, start, end).

mmc4.xlsx (13.9KB, xlsx)
Table S4. Overlap between Non-coding DGAP Breakpoint Positions and Enhancers

The number of primary cell (Primary_cell_enhancers), tissue (Tissue_enhancers), H1-ESC (ChromHMM_H1_ESC_enhancers), GM12878 (ChromHMM_GM12878_enhancers), and VISTA (VISTA_db_hg19) enhancers that overlap non-coding DGAP breakpoints (DGAP id, chr, start, end). Enhancer positions were obtained from Andersson et al.6 (see references in Document S1), ENCODE, and the human hg19 version of the VISTA enhancer database. Rows highlighted in green indicate breakpoints that overlapped one or more of the enhancer categories analyzed.

mmc5.xlsx (13.9KB, xlsx)
Table S5. Overlap between Non-coding DGAP Breakpoint Positions and DNaseI Hypersensitive Sites

The number of DNaseI hypersensitive sites from H1-hESC, GM06990, GM12878, and the ENCODE master table (a compilation of DNaseI clusters from 125 cell lines) that overlap non-coding DGAP breakpoints (DGAP id, chr, start, end). Rows highlighted in green indicate breakpoints that overlapped one or more of the DNaseI hypersensitive sites in the different cell lines analyzed.

mmc6.xlsx (27.1KB, xlsx)
Table S6. Overlap between Non-coding DGAP Breakpoint Positions and CTCF Binding Sites

The number of ENCODE CTCF binding sites from H1-hESC and GM12878 that overlap non-coding DGAP breakpoints (DGAP id, chr, start, end). Highlighted green rows indicate breakpoints which overlapped one or more of the CTCF binding sites in the two cell lines analyzed.

mmc7.xlsx (17.4KB, xlsx)
Table S7. Overlap between Non-coding DGAP Breakpoint Positions and ENCODE Chromatin State Segments

ENCODE chromatin state segment classifications per non-coding DGAP breakpoint (DGAP id, chr, start, end) for the H1-hESC and GM12878 cell lines. Chromatin state segment coordinates and other bed file information are displayed from column #bin through column itemRGB. For a description of ENCODE’s bed items, see http://rohsdb.cmb.usc.edu/GBshape/cgi-bin/hgTables. Chromatin state names: CTCF = CTCF binding site, E = enhancer, WE = weak enhancer, T = transcriptionally active, R = transcriptionally repressed.

mmc8.xlsx (19.4KB, xlsx)
Table S8. Overlap between Non-coding DGAP Breakpoint Positions and Repetitive Elements

The number of repetitive elements, as assessed by RepeatMasker, that overlap non-coding DGAP breakpoints (DGAP id, chr, start, end). Repetitive element information such as coordinates (Rep_chr, Rep_start, Rep_end), name, class, and family is provided for each overlap.

mmc9.xlsx (21KB, xlsx)
Table S9. Overlap between Non-coding DGAP Breakpoint Positions and Topologically Associating Domains

The number of topologically associating domains (TADs) in H1-hESC and IMR90 (see Dixon et al.7 in Document S1) that overlap non-coding DGAP breakpoints (DGAP id, chr, start, end). TAD information such as coordinates (TAD_chr, TAD_start, TAD_end) is provided for each overlap.

mmc10.xlsx (17KB, xlsx)
Table S10. Overlap between Non-coding DGAP Breakpoint Positions and High-Resolution Chromatin Subcompartments and Arrowhead Domains

The number of high-resolution chromatin subcompartments and arrowhead domains in IMR90 and GM12878 (see Rao et al.8 in Document S1) that overlap non-coding DGAP breakpoints (DGAP id, chr, start, end). Chromatin subcompartment and arrowhead domain information such as coordinates and class is provided for each overlap.

mmc11.xlsx (20.6KB, xlsx)
Table S11. Disruption of Chromatin Contacts by Non-coding DGAP Breakpoint Positions

The number of chromatin contacts disrupted by non-coding DGAP breakpoint positions (DGAP id, chr, start, end) within their analysis windows (window_start, window_end) in Hi-C datasets of 20 and 40 Kb resolution of H1-hESC (see Dixon et al.7 in Document S1) (Esc_20kb_HindIII_rep1, Esc_20kb_HindIII_rep2, Esc_40kb_hindIII_combined, Esc_40kb_hindIII_rep1, Esc_40kb_hindIII_rep2), 20 and 40 Kb resolution of IMR90 (Dixon et al.7) (IMR90_20kb_hindIII_rep1, IMR90_20kb_hindIII_rep2, IMR90_40kb_hindIII_combined, IMR90_40kb_hindIII_rep1, IMR90_40kb_hindIII_rep2), 100Kb and 1Mb resolution of GM06990 (http://epigenomegateway.wustl.edu/) (GM06990_obsexp_100kb, GM06990_obsexp_1mb) and looplists from Rao et al.8 (see Document S1) for GM12878 and IMR90 (GSE63525_GM12878_primary+replicate_HiCCUPS_looplist, GSE63525_IMR90_HiCCUPS_looplist).

mmc12.xlsx (17.5KB, xlsx)
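
One plausible reading of "disrupted" in Table S11 is that a breakpoint falls between the two anchors of a chromatin contact that lies entirely inside the analysis window. The Python sketch below selects contacts under that assumption. The tuple layout, the function name, and the disruption criterion itself are hypothetical and are not taken from the article.

```python
from typing import List, Tuple

# Hypothetical layout: (a_start, a_end, b_start, b_end) for two anchors
# located on the same chromosome as the breakpoint.
Contact = Tuple[int, int, int, int]


def disrupted_contacts(
    breakpoint: int,
    window_start: int,
    window_end: int,
    contacts: List[Contact],
) -> List[Contact]:
    """Return contacts whose anchors are separated by the breakpoint.

    Assumption: a contact counts as disrupted when both anchors lie inside
    the analysis window and the breakpoint sits between the end of the
    left anchor and the start of the right anchor.
    """
    hits = []
    for a_start, a_end, b_start, b_end in contacts:
        left_end = min(a_end, b_end)
        right_start = max(a_start, b_start)
        inside_window = (
            min(a_start, b_start) >= window_start
            and max(a_end, b_end) <= window_end
        )
        if inside_window and left_end <= breakpoint <= right_start:
            hits.append((a_start, a_end, b_start, b_end))
    return hits


# Toy 40 kb bins around a breakpoint at 1,000,000 (not real data).
contacts = [
    (900_000, 940_000, 1_040_000, 1_080_000),    # spans the breakpoint
    (1_040_000, 1_080_000, 1_200_000, 1_240_000),  # both anchors downstream
    (100_000, 140_000, 1_200_000, 1_240_000),    # left anchor outside window
]
print(len(disrupted_contacts(1_000_000, 500_000, 1_500_000, contacts)))  # 1
```

Only the first toy contact is counted: the second lies entirely on one side of the breakpoint, and the third has an anchor outside the window.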
Table S12. Disruption of GM12878 Chromatin Contacts at Various Resolution Levels by Non-coding DGAP Breakpoint Positions

The number of chromatin contacts disrupted by non-coding DGAP breakpoint positions (DGAP id, chr, start, end) within their analysis windows (window_start, window_end) in the 50Kb, 100Kb, 250Kb, 500Kb and 1Mb resolution Hi-C datasets from Rao et al.8 (see Document S1) for GM12878.

mmc13.xlsx (15.2KB, xlsx)
Table S13. Disruption of Predicted Disrupted ENCODE Distal DHS/Enhancer-Promoter Connections by Non-coding DGAP Breakpoint Positions

The number of predicted ENCODE distal DHS/enhancer-promoter connections (see Thurman et al.9 in Document S1) (promoter_DHS_chr, promoter_DHS_start, promoter_DHS_end, promoter_DHS_gene, distal_DHS_chr, distal_DHS_start, distal_DHS_end, promoter_distal_DHS_correlation) disrupted by non-coding DGAP breakpoint positions (DGAP id, chr, start, end) within their ± 500 Kb analysis windows (window_start, window_end).

mmc14.xlsx (180.1KB, xlsx)
Table S14. Genes with Predicted Disrupted ENCODE Distal DHS/Enhancer-Promoter Connections by the Non-coding DGAP Breakpoint Positions

The names of genes (Genes) separated from their predicted enhancers (Disrupted_enh_prom_interactions) (see Thurman et al.9 in Document S1).

mmc15.xlsx (13.9KB, xlsx)
Table S15. Identification of Genes with Potential Position Effects

Candidate genes (ensembl_gene_ID, Gene_chr, Gene_start, Gene_end, Gene_name) and their various lines of selection evidence for each non-coding DGAP breakpoint position (DGAP id, chr, start, end) within their analysis windows (window_start, window_end). Evidence lines include Hi-C domain inclusion (Hi_domain, HiC_chr, HiC_start, HiC_end), haploinsufficiency (HI_chr, Gene-start, gene_end, HI_prob, Haploinsufficiency_score), triplosensitivity (Triplosensitivity_score), and phenomatch score (PhenoScore, MaxPhenoScore, Phone_percentile, count_Pheno_percentile, MaxPheno_percentile, count_MaxPheno_percentile, Percentile_final_count). All of the evidence is summarized (6Mb, 2Mb, TAD, DHS, Count_haplo, count_triplo), and the gene rankings are presented in the PERC+DHS+TAD+HAPLO+TRIPLO and PERC+DHS+2Mb+HAPLO+TRIPLO columns, which take different evidence lines into consideration. Rows highlighted in green indicate the highest-ranking gene, and rows highlighted in yellow indicate the second-best-ranking genes.

mmc16.xlsx (583KB, xlsx)
Table S16. Translation of DGAP Clinical Features to HPO Terms

HPO identifiers per DGAP case.

mmc17.xlsx (10.2KB, xlsx)
Table S17. Identification of Genes with Potential Position Effects in Instances of Known Pathogenicity

Candidate genes (ensembl_gene_ID, Gene_chr, Gene_start, Gene_end, Gene_name) and their various lines of selection evidence for the set of known pathogenic rearrangement positive controls (DGAP id, chr, start, end) within their analysis windows (window_start, window_end) from Redin et al.10 (see Document S1). Evidence lines include Hi-C domain inclusion (Hi_domain, HiC_chr, HiC_start, HiC_end), haploinsufficiency (HI_chr, Gene-start, gene_end, HI_prob, Haploinsufficiency_score), triplosensitivity (Triplosensitivity_score), and phenomatch score (PhenoScore, MaxPhenoScore, Phone_percentile, count_Pheno_percentile, MaxPheno_percentile, count_MaxPheno_percentile, Percentile_final_count). All of the evidence is summarized (6Mb, 2Mb, TAD, DHS, Count_haplo, count_triplo), and the gene rankings are presented in the PERC+DHS+TAD+HAPLO+TRIPLO and PERC+DHS+2Mb+HAPLO+TRIPLO columns, which take different evidence lines into consideration. Rows highlighted in yellow indicate pathogenic genes reported by Redin et al.10 (see Document S1).

mmc18.xlsx (3.9MB, xlsx)
Table S18. Identification of Disrupted Chromatin Contacts between Disrupted DHSs and Enhancers by the Non-coding DGAP Breakpoint Positions

An agnostic search revealed the existence of chromatin contacts between breakpoint-disrupted sequences of DHS sites and gene enhancers in Hi-C data of H1-hESC cells at 40 kb resolution (see Dixon et al.7 in Document S1). The reported genes are our top position effect candidate genes in the region. Table columns report the candidate gene information (Gene_chr, Gene_start, Gene_end, Gene_name), the associated DGAP case information (DGAP_ID, DGAP_chr, DGAP_start, DGAP_end) and the disrupted Hi-C chromatin interaction (HiC_1_chr, HiC_1_start, HiC_1_end, HiC_2_chr, HiC_2_start, HiC_2_end, HiC_1_interaction).

mmc19.xlsx (11.3KB, xlsx)
Table S19. Overlap between Non-coding DGAP Breakpoint Positions and DECIPHER Cases

The number of DECIPHER cases that overlap non-coding DGAP breakpoints (DGAP_ID, DGAP_chr, DGAP_start, DGAP_end). DECIPHER case information such as ID_patient, chr_start, chr_end, chr, mean_ratio, classification_type, and phenotype is provided for each overlap.

mmc20.xlsx (5.3MB, xlsx)
Table S20. Genes Contained within DECIPHER Cases Overlapped by Non-coding DGAP Breakpoint Positions

The number of genes contained within DECIPHER cases overlapped by the non-coding DGAP breakpoints (DGAP_ID, DGAP_chr, DGAP_start, DGAP_end). DECIPHER case and gene information such as gene_count, DECIPHER_ID, DECIPHER_chr, DECIPHER_start, DECIPHER_end, DECIPHER_value, DECIPHER_type_rearr, DECIPHER_phenotype, and HG_symbol is provided for each overlapped DECIPHER case.

mmc21.xlsx (347.7KB, xlsx)
Table S21. DECIPHER Cases Overlapped by Non-coding DGAP Breakpoint Positions That Fulfilled Non-coding Selection Criteria

The number of DECIPHER cases that have non-coding breakpoints. DGAP comparison case information (DGAP_ID, DGAP_chr, DGAP_start, DGAP_end) is provided, as well as overlapped DECIPHER case information containing id_patient, chr_start, chr_end, chr, mean_ratio, classification_type and phenotype.

mmc22.xlsx (9.4KB, xlsx)
Table S22. Overlap between Non-coding DGAP Breakpoint Positions and dbVar Cases

The number of dbVar cases that overlap non-coding DGAP breakpoints (DGAP_chr, DGAP_start, DGAP_end, DGAP_ID). dbVar case information such as dbVar ID, Start, End, Variant type, Gene, Molecular consequences, Most severe clinical significance, 1000G minor allele, 1000G MAF, GO-ESP minor allele, GO-ESP MAF, ExAC minor allele, ExAC MAF, Publications (PMIDs), Variant allele, Transcript change, RefSeq, Protein change, Molecular consequence, HGVS_c, HGVS_g, HGVS_ng, HGVS_p, Condition, Most severe clinical significance, Submitters, Highest review status, and Last evaluated is provided for each overlap.

mmc23.xlsx (155.2KB, xlsx)
Document S2. Article plus Supplemental Data
mmc24.pdf (876.6KB, pdf)

Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics
