Abstract
Despite their clinical significance, characterization of balanced chromosomal abnormalities (BCAs) has largely been restricted to cytogenetic resolution. We explored the landscape of BCAs at nucleotide resolution in 273 subjects with a spectrum of congenital anomalies. Whole-genome sequencing revised 93% of karyotypes and revealed complexity that was cryptic to karyotyping in 21% of BCAs, highlighting the limitations of conventional cytogenetic approaches. At least 33.9% of BCAs resulted in gene disruption that likely contributed to the developmental phenotype, 5.2% were associated with pathogenic genomic imbalances, and 7.3% disrupted topologically associated domains (TADs) encompassing known syndromic loci. Remarkably, BCA breakpoints in eight subjects altered a single TAD encompassing MEF2C, a known driver of 5q14.3 microdeletion syndrome, resulting in decreased MEF2C expression. This study proposes that sequence-level resolution dramatically improves prediction of clinical outcomes for balanced rearrangements, and provides insight into novel pathogenic mechanisms such as altered regulation due to changes in chromosome topology.
Keywords: Cytogenetics, structural variation, balanced chromosomal abnormality, congenital anomaly, intellectual disability, autism, translocation, inversion, chromothripsis, topologically associated domain (TAD), Hi-C, MEF2C
Balanced chromosomal abnormalities (BCAs) are a class of structural variation involving rearrangement of chromosome structure that alters the orientation or localization of a genomic segment without a concomitant large gain or loss of DNA. This class of variation includes inversions, translocations, excisions/insertions, and more complex rearrangements consisting of combinations of such events. Cytogenetic studies of unselected newborns and control adult males estimate a prevalence of 0.2–0.5% for BCAs in the general population1–3. By contrast, an approximate five-fold increase in the prevalence of BCAs detected by karyotyping has been reported among subjects with neurodevelopmental disorders, particularly intellectual disability (1.5%)4 and autism spectrum disorder (ASD; 1.3%)5, suggesting that a meaningful fraction of BCAs may represent highly penetrant mutations in those subjects.
Delineating the breakpoints of BCAs, and the genomic regions that they disrupt, has long been a fertile area of novel gene discovery and has greatly contributed to the annotation of the morbid map of the human genome6–8. Despite their significance in human disease, the clinical detection of this unique class of rearrangements still relies upon conventional cytogenetic methods such as karyotyping that are limited to microscopic resolution (~3–10 Mb)9. The absence of gross genomic imbalances renders BCAs invisible to higher resolution techniques that currently serve as first-tier diagnostic screens for many developmental anomalies of unknown etiology: chromosomal microarray (CMA), which can detect microscopic and sub-microscopic copy number variants (CNVs), or whole-exome sequencing (WES), which surveys single nucleotide variants within coding regions. We have recently shown that innovations in genomic technologies can efficiently reveal BCA breakpoints at nucleotide resolution with a cost and timeframe comparable to clinical CMA or karyotyping; however, only a limited number of BCAs have been evaluated to date7,10–15.
In this study, we explored several fundamental but previously intractable questions regarding de novo BCAs associated with human developmental anomalies, such as the origins of their formation, the genomic properties of the sequences that they disrupt, and the mechanisms by which they can act as dominant pathogenic mutations. We evaluated 273 subjects ascertained based upon the presence of a BCA discovered by karyotyping in a proband that presented with a developmental anomaly. We mapped these BCA breakpoints at basepair resolution and created a framework to interpret their significance based on convergent genomic datasets, including CNV and WES data in tens of thousands of individuals. We also integrated data from high-resolution maps of chromosomal compartmentalization in the nucleus to predict long-range regulatory effects16,17, and confirmed those predictions with functional validation. Our findings indicate that formation of BCAs involves a variety of mechanisms, that the end-result often reflects substantial complexity invisible to cytogenetic assessment, that BCAs directly disrupt genes likely to contribute to early developmental abnormalities in at least one-third of subjects, and that BCAs can cause long-range regulatory changes due to alterations to the chromosome structure.
RESULTS
Sequencing BCAs reveals cryptic complexity
We sequenced DNA from 273 subjects originating from five primary referral sites that collectively engaged over 100 clinical investigators. Subjects harbored a BCA that was detected by karyotyping and presented with varied congenital and/or developmental anomalies. Most subjects were surveyed using large-insert whole-genome sequencing (liWGS or ‘jumping libraries’; 83%), with the remainder of subjects being analyzed by standard short-insert WGS or targeted breakpoint sequencing (see Online Methods; Supplementary Table 1). Subjects were preferentially selected with confirmed de novo BCAs based on cytogenetic studies or with rearrangements that segregated with a phenotypic anomaly within a family (72.5% of subjects); however, inheritance information was unavailable for one or both parents in the remaining 27.5% of subjects. Subjects harboring BCAs that were inherited from an unaffected parent were excluded from this study. Of interest, 62.6% of subjects received clinical CMA screening prior to enrollment to confirm the absence of a pathogenic CNV (Table 1). Subjects presented with a spectrum of clinical features: congenital anomalies ranged from organ-specific disorders to multisystem abnormalities, as well as neurodevelopmental conditions such as intellectual disability or ASD (Table 1). While no specific phenotypes were prioritized for inclusion (Supplementary Fig. 1), neurological defects were the most common feature in the cohort (80.2% of subjects when using digitized phenotypes from Human Phenotype Ontology [HPO]18; Table 1; Supplementary Table 2).
Table 1.
Characteristic | Affected subjects | Frequency in cohort |
---|---|---|
Gender | ||
Male | 159 | 58.2% |
Female | 114 | 41.8% |
Co-Segregation | ||
De novo | 184 | 67.4% |
Unknown | 75 | 27.5% |
Inherited, segregating | 14 | 5.1% |
array-CGH analyses | ||
Normal | 139 | 50.9% |
VUS | 32 | 11.7% |
Not Performed | 102 | 37.4% |
Abdomen defects | 54 | 19.8% |
Cardiovascular defects | 41 | 15.0% |
Eye defects | 54 | 19.8% |
Hearing defects | 52 | 19.0% |
Genitourinary defects | 50 | 18.3% |
Growth defects | 64 | 23.4% |
Head/Neck/Craniofacial defects | 140 | 51.3% |
Integument defects | 50 | 18.3% |
Limb defects | 57 | 20.9% |
Musculature defects | 71 | 26.0% |
Neurological defects | 219 | 80.2% |
Behavior disorders | 51 | 18.7% |
Developmental delay | 159 | 58.2% |
Epilepsy | 51 | 18.7% |
Hypotonia | 41 | 15.0% |
ASD/autistic features | 31 | 11.4% |
High functioning ASD | 4 | 1.5% |
Respiratory defects | 30 | 11.0% |
Skeletal defects | 116 | 42.4% |
Clinical description was converted for all 273 subjects into standardized terms using Human Phenotype Ontology (HPO)18, which allowed systematic association with broad phenotypic categories for each enrolled subject.
Breakpoints were identified in 248 of the 273 subjects (90.8%); all subsequent analyses were restricted to these 248 subjects. This success rate was consistent with expectations, as simulation of one million breakpoints in the genome suggested that 7.6% of breakpoints were localized within genomic segments that cannot be confidently mapped by short-read sequencing (Supplementary Fig. 2). Sequencing identified 876 breakpoints genome-wide (Fig. 1a) and revised the breakpoint localization by at least one sub-band in 93% of subjects when compared to the karyotype interpretation (breakpoint positions provided in Supplementary Table 3). Across all rearrangements, 26% (n=65) of BCAs were found to be complex (i.e., involved three or more breakpoints; Supplementary Fig. 3–65), including 5% (n=13) that were consistent with the phenomena of chromothripsis or chromoplexy (complex reorganization of the chromosomes involving extensive shattering and random ligation of fragments from one or more chromosomes)19–23. The most complex BCA involved 57 breakpoints (Supplementary Fig. 59). When analyses were restricted to the 230 subjects for which the karyotype suggested a simple chromosomal exchange, 48 (21%) were determined to harbor complexity that was cryptic to the karyotype, emphasizing the insights that are gained from nucleotide resolution. Across all BCAs, 80.7% resolved to less than ten kilobases of total genomic imbalance, although several cases harbored large cryptic imbalances (mostly deletions) of varied impact (Fig. 1b; Supplementary Table 4). Importantly, only 12.2% had imbalances of >100 kb in this study (9.3% greater than 1 Mb), representing a significantly lower fraction than previous cytogenetic estimates24. Genomic imbalances associated with BCAs were larger on average among subjects without CMA pre-screening, with 15.5% harboring imbalances >1 Mb versus 5.9% in subjects pre-screened by CMA (Fig. 1b; Supplementary Table 4). 
The total genomic imbalance generally increased with the number of breakpoints, though there were chromothripsis and chromoplexy events that were essentially balanced (e.g., subject NIJ19 involved 13 junctions across five chromosomes that resolved to a final genomic imbalance of only 631 bases).
BCA formation is mediated by multiple molecular mechanisms
Extensive mechanistic studies have been performed on breakpoints of large CNV datasets; however, the limited scale and resolution of BCA studies have precluded similar analyses for balanced rearrangements. Using precise junction sequences from 662 breakpoints, we found that nearly half displayed signatures of blunt-end ligation (45%), presumably driven by non-homologous end joining (NHEJ) (Fig. 1c). A substantial fraction (29%) involved microhomology of 2–15 bp at the breakpoint junction, indicating that template-switching coupled to DNA-replication mechanisms such as microhomology-mediated break-induced replication (MMBIR) contribute to a substantial fraction of BCAs25. A comparable fraction (25%) of junctions harbored micro-insertions of several basepairs, consistent with NHEJ or fork stalling and template switching (FoSTeS) mechanisms (Fig. 1c). Only nine junctions (1%) contained long stretches of homologous sequences (>100 bp) that would be consistent with homology-mediated repair. This is certainly an underestimate given the limitations of short-read sequencing to capture rearrangements localized within highly homologous sequences such as segmental duplications or microsatellites. BCA breakpoint signatures from this study were also compared to 8,943 deletion breakpoints identified in 1,092 samples from the 1000 Genomes Project26, revealing that BCA breakpoints were enriched for blunt-end signatures while depleted for microhomology and large homology sequences compared to deletion breakpoints (Supplementary Fig. 66).
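The junction classes described above can be expressed as a simple decision rule. The sketch below is illustrative only and is not the authors' pipeline: the function names and the toy junctions are hypothetical, the 2–15 bp and >100 bp thresholds come from the text, and the handling of 0–1 bp and 16–100 bp of homology is an assumption because the text does not define those cases.

```python
from collections import Counter


def classify_junction(homology_bp: int, insertion_bp: int) -> str:
    """Assign one breakpoint junction to a signature class.

    homology_bp:  bases of identical sequence shared by both partners at the junction
    insertion_bp: bases of inserted sequence found between the two partners
    """
    if insertion_bp > 0:
        return "micro-insertion"
    if homology_bp > 100:        # long homology, as in the text (>100 bp)
        return "long homology"
    if 2 <= homology_bp <= 15:   # microhomology range given in the text
        return "microhomology"
    if homology_bp <= 1:         # assumption: 0-1 bp is treated as blunt
        return "blunt end"
    return "unclassified"        # assumption: 16-100 bp is not described in the text


def signature_fractions(junctions):
    """Return {class: fraction} for an iterable of (homology_bp, insertion_bp) pairs."""
    counts = Counter(classify_junction(h, i) for h, i in junctions)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical junctions, for illustration only.
toy_junctions = [(0, 0), (3, 0), (0, 5), (250, 0)]
fractions = signature_fractions(toy_junctions)
```

Applied to real junction calls, the resulting fractions would correspond to the percentages reported for each class.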
Comparison of the observed breakpoints to 100,000 sets of simulated breakpoints that retained the properties of the observed dataset (see Online Methods) established nominal enrichment for repeat elements (P=0.021) and fragile sites (P=0.043), with no significant enrichment for the other genomic features tested (Supplementary Fig. 67). Incorporating Hi-C interaction data to explore the association between nuclear organization of the chromosomes and BCA formation revealed that pairs of loci comprising a BCA breakpoint did not stem from regions with significantly higher contact patterns in the nucleus17; however, these pairs displayed genome-wide interaction patterns that were more correlated than random pairings (P=0.046; Supplementary Note and Supplementary Fig. 68). These results suggest that DNA fragments involved in BCA formation are more likely to be co-localized in the same or neighboring sub-compartments prior to chromosomal reassembly, though at the sample sizes available they did not necessarily harbor increased direct interactions.
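A simulation-based enrichment test of this kind reduces to asking how often simulated breakpoint sets overlap a feature at least as often as the observed set. The following is a minimal sketch under stated assumptions, not the procedure from the Online Methods: the toy genome length, feature intervals, and breakpoint positions are hypothetical, and breakpoints are drawn uniformly at random, whereas the study's simulations retained the properties of the observed dataset.

```python
import random
from typing import Sequence, Tuple


def count_overlaps(breakpoints: Sequence[int], features: Sequence[Tuple[int, int]]) -> int:
    """Count breakpoints that fall inside any half-open feature interval [start, end)."""
    return sum(any(start <= bp < end for start, end in features) for bp in breakpoints)


def empirical_p_value(observed: int, simulated: Sequence[int]) -> float:
    """One-sided empirical P: share of simulated counts >= observed.

    The +1 in numerator and denominator keeps the estimate away from zero.
    """
    exceed = sum(1 for s in simulated if s >= observed)
    return (exceed + 1) / (len(simulated) + 1)


# Hypothetical single-chromosome toy data, for illustration only.
rng = random.Random(0)
GENOME_LENGTH = 1_000_000
features = [(100_000, 150_000), (600_000, 650_000)]
observed_breakpoints = [120_000, 130_000, 610_000, 900_000]

observed = count_overlaps(observed_breakpoints, features)
simulated = [
    count_overlaps(
        [rng.randrange(GENOME_LENGTH) for _ in observed_breakpoints], features
    )
    for _ in range(1000)  # the study used 100,000 simulated sets
]
p_value = empirical_p_value(observed, simulated)
```

With this estimator the smallest attainable P-value is 1/(N+1) for N simulated sets, which is one reason a large number of simulations is needed to resolve small P-values.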
BCA breakpoints associated with congenital anomalies are enriched for functionally relevant loci
While protein-coding sequences represent less than 2% of the human genome, the total genic space in which a structural variation can disrupt a transcript is considerable as the cumulative coverage of transcribed regions is over 60% from recent annotations27. Consistent with this expectation, 67% (589/876) of breakpoints in this study disrupted a gene, and at least one gene was truncated in most BCAs (75%, 186/248), which did not deviate from random expectations (observed n=408 genes, expected n=392±20, P=0.220; Supplementary Fig. 69). The properties of the disrupted genes, however, deviated significantly from random expectation for several key features, suggesting that the pathogenic impact of BCAs in this cohort is not a consequence of their likelihood to disrupt genes but rather a reflection of the gene(s) that they alter (disrupted genes provided in Supplementary Table 5).
We observed significant enrichment for disruption of genes highly intolerant to truncating mutations, as defined by two independent groups (P=0.027 for Petrovski et al., P=0.0009 for Samocha et al.; Fig. 2a)28,29, embryonically expressed genes (P=0.001)30, and genes previously associated with autosomal dominant disorders (P=0.002)31, whereas no enrichment was observed for genes associated with autosomal recessive disorders (P=0.294; Fig. 2a)31. The strongest enrichment at breakpoints was detected for genes previously associated with developmental disorders (≥2 de novo LoF mutations [dnLoF]) as amalgamated from independent datasets (P=2×10⁻⁵; Supplementary Table 6). Significant enrichment was also observed at breakpoints for FMRP-target genes and chromatin remodeling genes32,33, consistent with the association of genes implicated in neurodevelopmental disorders (Fig. 2b)7,30,34–37, but not CHD8 target genes38,39. When further incorporating expression data of the developing brain from BrainSpan40, truncated genes showed higher expression patterns during early developmental stages than randomly simulated datasets (Supplementary Fig. 70). By contrast, there was no significant enrichment of genes associated with schizophrenia41,42, or gene-sets associated with complex disorders that were considered as negative controls such as type-II diabetes, cancer, or height. Subgroup analyses revealed that most enrichment signals were driven by the predominance of neurological abnormalities among the subjects (Supplementary Fig. 71).
BCAs predominantly contribute to developmental anomalies by direct gene truncation
We next asked the fundamental question: “How often does a BCA represent a likely pathogenic mutation that contributes to the subject’s abnormal developmental phenotype?” We built an interpretation framework using categories comparable to those established by ClinVar and the Deciphering Developmental Disorders consortium (DDD)43; however, we restricted interpretation of potential clinical relevance to Pathogenic or Likely Pathogenic, as detailed below and in Supplementary Table 7. All other variants were interpreted as Variant of Unknown Significance (VUS; the predicted impact for each BCA is provided in Supplementary Table 8).
Pathogenic
We compared loci disrupted by BCAs to genes that had been robustly associated with dominant developmental disorders (≥3 reported cases with dnLoF in OMIM, DDD, and amalgamated large-scale sequencing studies in neurodevelopmental disorders; see Supplementary Note and Supplementary Table 6). In total, 66 subjects (26.6%) harbored Pathogenic BCAs that disrupted these previously defined developmental loci either through direct gene disruption or genomic imbalance (Fig. 2c; Table 2; Supplementary Table 9). In the majority of these subjects (53/66), the rearrangement truncated a high confidence syndromic locus. These included known drivers of recurrent microdeletion syndromes (e.g., SATB2, MBD5, EHMT1, NFIA, ZBTB20)44–48, loci associated with imprinting disorders (SNURF-SNRPN), and genes well-established as highly penetrant loci in developmental disorders (e.g., CHD7, CHD8, CDKL5, CUL3, DYRK1A, GRIN2B), as well as more recently implicated genes such as AHDC1, CTNND2 and WAC (Fig. 2c; Table 2; Supplementary Table 9)49–51. Several genes were disrupted in two or more subjects, further confirming their role in developmental anomalies (Table 2). Importantly, ten subjects harbored BCAs that disrupted genes associated with dominant disorders for which the expected phenotype was not reported in the proband (e.g. cardiovascular defects, childhood or late-onset hearing loss, neurodegenerative disorder; Supplementary Table 9). In these subjects, the rearrangements could represent pleiotropy (i.e. disruption of the same locus that can manifest in multiple distinct phenotypes) or incidental findings, and were thus interpreted as VUS. In the remaining 13 subjects with Pathogenic BCAs (13/66), genomic imbalances at the breakpoints either overlapped with known microdeletion/microduplication syndromes, or encompassed a gene associated with a dominant developmental disorder (e.g., 12p12.1-p11.22 deletion encompassing SOX5; Table 2; Fig. 2c).
Table 2.
Interpretation | Mechanism | Loci |
---|---|---|
Pathogenic | Genomic imbalances at breakpoints | 2q24.3 deletion (SCN9A); 4q34 deletion; 6q13-q14.1 deletion (PHIP)ᵃ; 6q14.1 deletion (TBX18)ᵇ; 6q22.1–22.31 deletion (GJA1); 10p15.3-p14 deletion (GATA3); 11p14.2 deletion; 12p12.1-p11.22 deletion (SOX5, PTHLH); 13q14.2 deletion; 14q12-q21.1 deletion (NFKBIA, NKX2-1)ᶜ; 18p11.32-p11.22 deletionᵈ; 19q12-q13.11 deletion; Xq25 duplication |
Pathogenic | Gene disruption | AHDC1; AUTS2(x2); CAMTA1; CDKL5; CHD7; CHD8; CTNND2; CUL3; DYRK1A; EFTUD2; EHMT1; FGFR1; FOXP1; FOXP2; GRIN2B; IL1RAPL1; KAT6B; KDM6A(x2); MBD5(x3); MEF2C; MTAP; MYT1L(x2); MYO6ᵉ; NALCN; NFIA; NFIX; NODAL; NOTCH2; NR2F1; NR5A1; NRXN1; NSD1; PAK3; PDE10A; PHF21A(x2)ᵈ; PHIPᵉ; SATB2; SCN1A; SMS; SNRPN-SNURF(x3); SOX5(x2); SPAST; TCF12; TCF4; WAC; ZBTB20(x2) |
Likely Pathogenic | Genomic imbalances at breakpoints | 2p21-p13.3 duplication (NRXN1) |
Likely Pathogenic | Gene disruption | ARIH1; BBX; CACNA2D3; CACNA1C; CADPS2ᶠ; CDK6(x2); CELSR1; EP400ᵍ; GNB1; GRM1ʰ; KCND2; MDN1; NFIB; NPAS3(x4)ᶜ,ⁱ; NRXN3; PRPF40A; PSD3ʲ; PTPRZ1(x3)ᵃ,ᶠ; ROBO2; SHROOM4ᵍ; SPTBN1; SYNCRIP(x2)ᵇ,ʲ; STXBP5ʰ; UPF2; 11p15 region |
Likely Pathogenic | Positional effect | FOXG1(x4)ⁱ; MEF2C(x7); PITX2; SATB2(x3)ʲ; SLC2A1; SOX9; SRCAP |
Details on BCA interpretation are provided in Methods and Supplementary Table 7. Genes that have been associated with dominant developmental disorders and are encompassed by genomic imbalances at breakpoints are indicated in brackets; multipliers (e.g., x2) indicate that a gene was disrupted by a BCA in multiple subjects; superscript letters mark subjects with a BCA disrupting multiple genes/loci that may each contribute to their developmental phenotype and to distinct clinical features:
ᵃ subject DGAP133;
ᵇ subject DGAP317;
ᶜ subject DGAP002;
ᵈ subject DGAP316;
ᵉ subject NIJ2;
ᶠ subject DGAP168;
ᵍ subject DGAP172;
ʰ subject DGAP196;
ⁱ subject DGAP246;
ʲ subject DGAP237.
Likely Pathogenic
Each specific rearrangement effectively represents a private event, which is a major challenge for interpretation in genomic studies. To interpret variants as Likely Pathogenic, we relied on convergent genomic evidence from large-scale datasets, postulating that candidate genes associated with congenital anomalies or early developmental defects would show evidence of intolerance to haploinsufficiency. Thirty-one subjects harbored BCAs that were considered Likely Pathogenic (Table 2; Supplementary Tables 8, 10). In 25 subjects, the rearrangement directly disrupted a gene intolerant to dnLoF, and in which dnLoF mutations had been previously reported in isolated cases (1 or 2 subjects, with an additional subject now represented by the BCA in our study; e.g. CACNA2D3, ROBO2, NFIB), some of which had strong biological support for involvement in developmental anomalies (EP400, STXBP5, NRXN3). There were also several genes disrupted in multiple subjects from the cohort (NPAS3, PTPRZ1, SYNCRIP; Table 2, Supplementary Tables 10–11). Two subjects had BCAs likely associated with genomic disorders: one involved a 2p21-p13.3 duplication encompassing NRXN1, and the other disrupted the imprinted 11p15 region associated with Silver-Russell syndrome (MIM#180860). In the remaining four subjects with Likely Pathogenic BCAs, the rearrangement truncated genes that were associated with developmental disorders, yet only activating or missense mutations had been previously reported (e.g., CACNA1C and GNB1)52,53, suggesting a dosage-sensitive model for these loci. Based on these results, we interpreted that 12.5% (31/248) of subjects harbored a BCA that likely contributed to the developmental phenotype by disrupting potentially novel candidate genes or disease mechanisms.
Collectively, these data suggest that 39.1% (97/248) of subjects have a phenotype that can be at least partially explained by haploinsufficiency or dosage alteration of an individual gene or locus (Fig. 2c; Supplementary Tables 8–10). Importantly, the overall diagnostic yield was significantly higher in subsets of the group, such as among those subjects who harbored de novo or co-segregating BCAs compared to subjects for whom inheritance was unknown (Fig. 2d), or among subjects who had not been screened clinically by CMA prior to enrollment (Fig. 2e). Despite these substantial yields, the marked increase in the frequency of BCAs associated with birth defects compared to the general population still suggests that alternative mutational mechanisms, other than direct gene disruption, may account for the developmental defects in a fraction of subjects for which the BCAs were interpreted as VUS.
Positional effects via disruption of long-range regulatory interactions
Clusters of BCA breakpoints within intergenic regions may suggest disruption of strong regulatory elements that contribute to disease manifestation via positional effects. Alternatively, this could reflect recurrent rearrangements due to fragile sites and/or recombination hotspots. To isolate genomic regions in which an unusual number of BCA breakpoints were localized, we partitioned the genome into 1 Mb bins. Remarkably, one genomic segment, localized to 5q14.3, achieved genome-wide significance and harbored breakpoints from eight independent BCAs (P=8×10⁻⁹; Fig. 3a).
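The binning step can be sketched as follows. This is an illustration under stated assumptions rather than the study's method: the subject IDs, chromosome names, positions, and total bin count are hypothetical, and the Poisson tail probability is one plausible way to score a crowded bin under a uniform background (the statistical test actually used is described in the Methods, not here).

```python
from collections import defaultdict
from math import exp

BIN_SIZE = 1_000_000  # 1 Mb bins, as in the text


def bin_subjects(breakpoints):
    """Map (chrom, bin_index) -> set of subject IDs with a breakpoint in that bin.

    Using a set counts each subject once per bin, so a complex BCA with several
    breakpoints in one bin still contributes a single independent event.
    """
    bins = defaultdict(set)
    for subject, chrom, pos in breakpoints:
        bins[(chrom, pos // BIN_SIZE)].add(subject)
    return bins


def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # advance from P(X = i) to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


# Hypothetical breakpoints (subject, chromosome, position), for illustration only.
toy_breakpoints = [
    ("S1", "chrA", 88_100_000),
    ("S2", "chrA", 88_450_000),
    ("S3", "chrA", 88_900_000),
    ("S3", "chrA", 88_950_000),  # second breakpoint from the same subject
    ("S4", "chrB", 12_300_000),
]
N_BINS = 3000  # assumption: rough number of 1 Mb bins in the toy genome

bins = bin_subjects(toy_breakpoints)
events = sum(len(subjects) for subjects in bins.values())
background_rate = events / N_BINS  # assumption: uniform background across bins
top_bin, top_subjects = max(bins.items(), key=lambda item: len(item[1]))
top_bin_p = poisson_tail(len(top_subjects), background_rate)
```

Because every bin is tested, a per-bin P-value would still need a genome-wide multiple-testing threshold before a cluster could be called significant.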
All BCA breakpoints from the 5q14.3 cluster mapped to a region overlapping with the previously described 5q14.3 microdeletion syndrome for which almost 100 subjects have been previously reported, with MEF2C as the proposed genetic driver (Fig. 3b)54–60. However, there are reported deletions that do not encompass MEF2C (Fig. 3b), and we now report seven BCAs distal to MEF2C in subjects with comparable phenotypes to those harboring direct disruption of MEF2C, challenging the hypothesis that direct disruption of MEF2C is a necessary cause of the syndrome. When combining data from the literature, a total of 11 subjects harbor balanced rearrangement breakpoints localized to the same 1 Mb region within 5q14.3 (Fig. 3b)14,54,59. One BCA directly disrupted MEF2C while the remaining 10 mapped to intergenic regions distal to MEF2C; none included a breakpoint disrupting a locus of known significance elsewhere in the genome, suggesting that an alternative mechanism to direct gene disruption was operating in the 5q14.3 region. All 10 BCAs with intergenic breakpoints were predicted to disrupt a topologically associated domain (TAD) containing MEF2C (Fig. 3b). TADs are structured chromatin domains of increased interactions that typically define a local regulatory unit bridging regulatory elements together with their target genes61. Their disruption by genomic rearrangements can lead to impaired gene regulation and therefore disease pathogenesis62–64. Correspondingly, in four subjects that harbored BCA breakpoints up to 860 kb distal to MEF2C, and for which RNA from lymphoblastoid cell lines (LCLs) was available, MEF2C expression was significantly reduced compared to controls (Fig. 3d). These analyses indicate that alteration of the TAD architecture in this genomic disorder region can disrupt normal MEF2C expression. 
When integrated with existing data, the converging clinical features suggest that multiple distinct mutational mechanisms can result in presentation of 5q14.3 microdeletion syndrome: (1) direct disruption of MEF2C via dnLoF mutations, (2) deletions including MEF2C, and (3) long-range positional effects from deletions and BCAs that do not directly truncate MEF2C yet disrupt its normal function via alteration of the TAD structure (Fig. 3c).
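The distinction between direct gene disruption and a candidate positional effect can be made concrete with a small interval check. The sketch below is hypothetical throughout: the `Interval` type, the `classify_breakpoint` function, and all coordinates are invented for illustration and do not correspond to the real MEF2C locus or to the TAD calls used in the study.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Half-open genomic interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int

    def contains(self, chrom: str, pos: int) -> bool:
        return chrom == self.chrom and self.start <= pos < self.end


def classify_breakpoint(chrom: str, pos: int, gene: Interval, tad: Interval) -> str:
    """Classify a breakpoint relative to a gene and the TAD that contains it."""
    if gene.contains(chrom, pos):
        return "direct gene disruption"
    if tad.contains(chrom, pos):
        # Breakpoint spares the gene body but splits its regulatory domain.
        return "TAD disruption (candidate positional effect)"
    return "outside TAD"


# Hypothetical coordinates, for illustration only.
gene = Interval("chrT", 1_000_000, 1_200_000)
tad = Interval("chrT", 900_000, 2_500_000)

calls = [
    classify_breakpoint("chrT", 1_100_000, gene, tad),
    classify_breakpoint("chrT", 2_000_000, gene, tad),
    classify_breakpoint("chrT", 3_000_000, gene, tad),
]
```

A call of "TAD disruption" is only a prediction; as the text notes, supporting evidence such as phenotypic overlap and reduced expression of the gene is needed before a positional effect is inferred.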
Beyond 5q14.3, three other loci were suggestive of an accumulation of BCA breakpoints (2q33.1, 6q14.3 and 14q12, each containing BCA breakpoints from four independent subjects), although they did not reach genome-wide significance (P=1×10⁻⁴; Fig. 3a). At 2q33.1, one BCA disrupted SATB2, associated with Glass syndrome and recognized as the established driver of the 2q33.1 microdeletion syndrome7,46, while the remaining three rearrangements were predicted to impact long-range interactions between SATB2 and its regulatory elements (Supplementary Fig. 72). In the 14q12 cluster, all BCA breakpoints were distal to FOXG1, which has been reported in atypical Rett syndrome65–68. The phenotypes associated with all four of these subjects were highly correlated based upon analyses of HPO reported terms (HPO-sim P-value=0.006; see Methods and Supplementary Table 11)69,70, and were consistent with the multiple previous reports of subjects with dysregulation of FOXG1 (Supplementary Fig. 73)65–68,71. At 6q14.3, four BCAs were localized in proximity to SYNCRIP, a highly constrained gene in which dnLoF had been reported in two subjects with neurodevelopmental disorders72. In one subject the BCA directly disrupted SYNCRIP, and another subject harbored a breakpoint distal to SYNCRIP that was part of a cryptic 6q14.3 deletion encompassing the full gene; the impact of the other two BCAs was unclear due to their localization to an adjacent contact domain (Supplementary Fig. 74). Finally, a systematic screen identified four additional subjects in which a TAD disruption could represent a positional effect on known syndromic loci associated with a developmental disorder that closely matched the subject’s phenotype (PITX2, SLC2A1, SOX9, SRCAP; Supplementary Fig. 75–77). In two of these regions, LCLs were available from the corresponding subjects and expression of the proposed driver gene was significantly reduced when compared to controls (SLC2A1 and SRCAP, Supplementary Fig. 75 and 76).
Collectively, 7.3% of subjects harbored a BCA predicted to alter long-range regulatory interactions involving an established syndromic locus with comparable phenotype, recurrently involving MEF2C, SATB2, and FOXG1, while an additional four subjects harbored a BCA that may represent long-range positional effects (two confirmed by expression studies). These data suggest that alterations to TAD structures likely represent a significant component of the deleterious impact of genomic rearrangements.
DISCUSSION
This characterization of BCAs at nucleotide resolution offers new insights into their mechanisms of formation, the properties connected to their rearrangement in the nucleus, and a substantial yield of potentially novel genes associated with human development. These results also emphasize that neither the mere presence of a BCA in a subject with developmental anomalies nor the number of genes it disrupts (if any) provide sufficient prognostic power, but rather that the properties of the specific genes and regions that are altered are the most informative in predicting resultant phenotypes. These data build upon recent studies on genome topology and provide further evidence that alterations to chromosome structure can lead to alternative, yet potentially predictable, pathogenic mechanisms by changing the long-range regulatory architecture of physical interactions and chromatin looping in the nucleus62–64,69. The yield of clinically meaningful results in this study, which ranged from 26.6% to 46.4% of the subjects evaluated, was substantial. Nonetheless, the relative enrichment from cytogenetic studies of BCAs in subjects with developmental abnormalities compared to controls suggests that there are yet additional alternative pathogenic mechanisms associated with de novo chromosomal rearrangements that remain to be discovered4,5.
These data provide an initial vantage of the potential utility of emerging datasets that characterize the nuclear organization of the chromosomes. They propose novel pathogenic mechanisms by which BCAs may operate, which appear to be a consequence of the disruption of long-range interactions between regulatory elements and their target gene62–64,69. Structural variants can indeed easily scramble DNA topology and contact domains with potentially dramatic regulatory consequences. TADs cover a substantial fraction of the genome; therefore, the vast majority of structural variation will perturb one of those domains and cannot constitute a predictive criterion for pathogenicity per se. However, these data propose that the recurrent disruption of a TAD encompassing a high confidence locus beyond what is expected by chance, concomitant with strong phenotypic overlap between the carrier of the variant and haploinsufficiency of the locus in independent subjects and demonstrated effect on gene expression, may represent a first step towards highlighting putative positional effects in the human genome. There is clearly a need for sensitive and specific tools to predict such positional effects caused by long-range regulatory perturbations, and to annotate further the morbid genome with more expansive knowledge of these functional interactions. The fraction of BCAs in this study that may be associated with this pathogenic mechanism is therefore just an entrée into their likely significance as a component of the unexplained genetic contribution to human birth defects.
In terms of evaluating diagnostic strategies, this study further highlights limitations of current diagnostic tools such as karyotyping or CMA in interpreting and detecting BCAs10,12–15. While the capability to visualize the chromosomes and detect de novo BCAs by traditional karyotyping represented a critical leap in genetic diagnostics, as exemplified by the seminal population cytogenetic studies performed by our late co-author, Dorothy Warburton73, the detection of gross chromosomal abnormalities provides limited prognostic capability. Our data demonstrate that karyotyping significantly underestimates complex rearrangements and is almost always revised by at least a sub-band. Karyotyping is also insensitive to genomic imbalances that cannot be directly visualized (~5–10 Mb). By comparison, CMA is generally recommended as a first-tier diagnostic screen given its sensitivity to detect submicroscopic CNVs, yet it is blind to copy-neutral events such as those described herein. This study provides critical new insights into the fraction of BCAs that can be ascertained by CMA analyses. Compared to cytogenetic estimates suggesting that up to 40% of BCAs resolved as unbalanced rearrangements and could therefore be ascertained using CMA24, whole-genome sequencing in this cohort suggests that, even at the resolution of 100 kb, only about 12% of BCAs involved a genomic imbalance. If we consider only the 102 subjects for whom no CMA was previously performed, this proportion increases to 18.8% at 100 kb resolution and 17.6% at 500 kb resolution, suggesting that 81.2–82.4% of BCAs in this study would be inaccessible to most CMA platforms routinely used in clinical diagnostics. Notably, there is still benefit to an initial CMA screen, as is illustrated by the significantly lower yield of pathogenic BCAs among subjects who had been pre-screened by CMA (19–37%) compared to those who had not (41–64%; Fig. 2e), indicating that a fraction of pathogenic variation in these genomes was captured by the CMA prescreen either in relation to or independent of the BCA.
These data strongly argue for the implementation of technologies capable of detecting both balanced and unbalanced genomic rearrangements. This could be achieved by using a conventional cytogenetic test followed by a reflex WGS analysis when an abnormality is detected, which we have previously demonstrated can provide access to all classes of structural variation in the human genome in a relatively rapid timeframe11,74. Despite its great promise, it is important to recognize the limitations of massively parallel sequencing in routine cytogenetic practice. This study used large-insert jumping libraries to maximize physical coverage and minimize cost per base of genome covered. Yet these analyses failed to reveal breakpoints in 9% of BCAs tested, and our simulations indicate that at large sample sizes, we would anticipate ~7–8% of breakpoints to be undetectable by short-read sequencing. As sequencing technologies and analytical capabilities improve, the component of the variant spectrum that is recalcitrant to short-read sequencing will become more tractable to genomic approaches, and the future implementation of long-read sequencing may revolutionize the capacity to survey currently inaccessible segments of the human genome75,76.
In conclusion, these data indicate that de novo BCAs represent a highly penetrant mutational class in human disease, and that their delineation can provide prognostic insights not available at current cytogenetic resolution. Although encouraging, this yield does not explain all of the developmental anomalies in this cohort and suggests that additional pathogenic mechanisms await discovery. A meaningful fraction may be attributable to novel genes or regulatory alterations, but additional pathogenic mechanisms remain to be explored such as recessive modes of inheritance, gene fusions, disruption of imprinted regions, enhancer adoption69,77, and more complex oligogenic models. Evaluation of extremely large cohorts will be required to resolve further such mechanisms, and characterization of BCAs in control populations would benefit annotation of the morbid human genome and interpretation of the biological and clinical consequences of its structural rearrangement.
ONLINE METHODS
Subject Ascertainment
Subjects were enrolled through cytogenetic reference centers including DGAP (the Developmental Genome Anatomy Project) of Brigham and Women’s Hospital and Massachusetts General Hospital, Boston, MA; Mayo Clinic, Rochester, MN; University Medical Center, Utrecht, NL; and Radboud University Medical Center, Nijmegen, NL. Enrollment was based on the presence of a developmental anomaly and concomitant BCA (de novo or that segregated with the abnormal phenotype) detected by karyotyping, and exclusion of clinically significant genomic copy number imbalances using chromosomal microarray analyses (SNP array or array-CGH) when possible (171/273 tested subjects; Supplementary Fig. 1). In the majority of cases the BCA was confirmed to have arisen de novo by karyotyping (184/273) or segregated with a developmental phenotype in the family (14/273). In a subset of subjects: (1) the BCA was inherited but the phenotype of the transmitting parent was not available (3/273); (2) one parent was available and did not harbor the BCA (4/273); or (3) neither parent was available for testing (68/273). Informed consent was obtained from all subjects or their legal representatives for participation in the study. All studies were approved by the respective Institutional Review Boards.
Whole-genome sequencing using large-insert jumping libraries
Blood samples were collected from all subjects and their parents when available. DNA was extracted from blood or from freshly derived LCLs. Samples were prepared using multiple sequencing methods over several years (Supplementary Table 1). Most samples were sequenced using whole-genome large-insert jumping library preparation protocols for subsequent Illumina sequencing: 149 using our 2×25-bp EcoP15I protocol11,80, 59 using a variant of our jumping library protocol in which we randomly shear circularized DNA, which enables longer reads (paired-end 50 bp, see Supplementary Note), and 19 using standard Illumina mate-pair protocols. All large-insert sequencing methods allowed generation of paired-end reads with a median insert size of 2.5–3.5 kb, as opposed to 300 bp using conventional methods. A subset of samples were prepared with standard short-insert paired-end protocols (n=12) or targeted sequencing of the breakpoints based on previous positional cloning to narrow the breakpoint regions (n=34), as previously described7,11,81. Of note, 87 BCAs had been initially reported in the literature, though many had not been mapped to sequence resolution (Supplementary Table 1).
Digitalization and homogenization of reported phenotypes
Clinical descriptions were converted for all 273 subjects into standardized terms using the Human Phenotype Ontology (HPO; Supplementary Table 2)18. Such digitalization allowed systematic comparison of phenotypes between subjects carrying BCAs that disrupted the same gene, as well as between subjects with a disrupted gene and previously described subjects using Phenomizer82. HPO-sim was used to compute phenotypic similarity scores between subjects sharing the disruption of the same gene or locus compared to random expectations (Supplementary Table 11). P-values were generated as the proportion of simulated scores greater than the observed probands’ score, as described by the authors70. HPO-digitalization also allowed the generation of heatmaps summarizing the correlation between disrupted genes and phenotypes reported in subjects. For each gene, the number of HPO terms belonging to each broad HPO category was computed18. The matrix was then z-score transformed by gene, and clustering was performed using a distance matrix of correlation coefficients and average agglomeration (Figure 4).
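The heatmap preparation described above (per-gene z-scoring, a correlation-based distance, and average agglomeration) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the toy count matrix, and the choice of 1 − Pearson r as the distance are assumptions made for illustration only.

```python
from statistics import mean, pstdev

def zscore_rows(matrix):
    """Z-score each row (one row per gene) across its HPO-category counts."""
    out = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)
        out.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return out

def correlation_distance(a, b):
    """Assumed distance: 1 - Pearson correlation between two rows."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 1.0  # undefined correlation is treated as "uncorrelated"
    return 1.0 - cov / (va * vb) ** 0.5

def average_linkage(rows):
    """Naive average-linkage agglomeration.

    Returns the merge history as (members_a, members_b, distance) tuples,
    where members are tuples of original row indices.
    """
    clusters = [(i,) for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                d = mean(correlation_distance(rows[p], rows[q])
                         for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical counts of HPO terms per broad category for three genes.
counts = [[5, 1, 0], [10, 2, 0], [0, 1, 6]]
history = average_linkage(zscore_rows(counts))
```

In this toy input the first two genes have proportional category profiles, so they merge first; the quadratic search is only suitable for small matrices such as a gene-by-category table.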
BCA discovery pipeline and breakpoint inference
All computational analyses have been previously described74,83. In brief, reads were reverse-complemented and aligned using BWA84. Anomalous read-pairs in terms of insert size, mate mapping, or mate orientation were extracted using Sambamba and clustered using ReadPairCluster, our single-linkage clustering algorithm11,85. Anomalous read-pair clusters meeting our established thresholds of structural variation were subsequently classified based on their read-pair orientation signature into the following categories: deletions, insertions, inversions, and translocations83. When no clusters were found that matched the proposed karyotype, BAM files were agnostically analyzed and manually inspected for anomalous pairs or split reads. Breakpoints were successfully identified in 248 of 273 cases, leading to an overall breakpoint fine-mapping yield of 91%. All subsequent counts and yields were computed relative to mapped cases (n=248). For the remaining 25 unmapped cases, no breakpoints were identified in proximity to the karyotype interpretation following extensive analyses and visual inspection. For the majority of these latter unresolved cases, one or more breakpoints were interpreted by the karyotype to localize near centromeres, in heterochromatic regions, or within segmental duplications, which are recognized to be blind spots for short-read alignments. All large genomic imbalances predicted to be connected to BCA breakpoints following rearrangement reconstruction were confirmed to have aberrant depth of coverage using a custom R-script (CNView: https://github.com/RCollins13/CNView).
When additional DNA was available, precise breakpoint junctions were delineated at base-pair resolution by Sanger sequencing and final breakpoint coordinates were reported; otherwise, the reported coordinates reflect the closest breakpoint estimates based on the resolution of the jumping libraries (Supplementary Table 3). A total of 82.7% (725/876) of the reported breakpoints could be tested by Sanger sequencing given DNA availability, among which 662 were confirmed, yielding a minimum estimate of 91.3% (662/725) sensitivity for our mapping method.
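The idea behind single-linkage clustering of anomalous read-pairs, as used in the discovery pipeline described in this section, can be sketched as follows. This is an illustrative approximation and not the ReadPairCluster implementation: the `ReadPair` record, the linking rule, and the `max_dist` and `min_size` thresholds are all hypothetical.

```python
from collections import namedtuple

# Hypothetical minimal record for one anomalous read-pair.
ReadPair = namedtuple("ReadPair", "chrom1 pos1 strand1 chrom2 pos2 strand2")

def linked(a, b, max_dist):
    """Assumed linking rule: same chromosomes and strands, both ends within max_dist."""
    return (a.chrom1 == b.chrom1 and a.chrom2 == b.chrom2
            and a.strand1 == b.strand1 and a.strand2 == b.strand2
            and abs(a.pos1 - b.pos1) <= max_dist
            and abs(a.pos2 - b.pos2) <= max_dist)

def single_linkage_clusters(pairs, max_dist=3000, min_size=3):
    """Group read-pairs so that any two linked pairs share a cluster (union-find)."""
    parent = list(range(len(pairs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            if linked(pairs[i], pairs[j], max_dist):
                parent[find(i)] = find(j)

    groups = {}
    for i, pair in enumerate(pairs):
        groups.setdefault(find(i), []).append(pair)
    # Keep only clusters with enough supporting pairs (hypothetical threshold).
    return [g for g in groups.values() if len(g) >= min_size]

pairs = [
    ReadPair("chr1", 1000, "+", "chr5", 50000, "-"),
    ReadPair("chr1", 3500, "+", "chr5", 52000, "-"),
    ReadPair("chr1", 6000, "+", "chr5", 54500, "-"),
    ReadPair("chr2", 100, "+", "chr2", 900000, "+"),
]
clusters = single_linkage_clusters(pairs)
```

Single linkage means the first and third pairs end up in one cluster through the second pair, even though they are more than `max_dist` apart from each other. A production tool would sort by coordinate instead of comparing all pairs.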
Molecular signature of BCA breakpoints
As previously described22, we processed all Sanger sequences from validated breakpoints with the BWA Smith-Waterman algorithm (modified parameters -z 100 -t 3 -H -T 1) to retrieve precise breakpoint coordinates as well as to infer the associated microhomology, micro-insertion, or blunt-end signature. This approach was sufficiently high-throughput to enable the direct comparison of BCA breakpoints with a large set of deletion breakpoints published by Abyzov et al.26, at the cost of not allowing concomitant microhomology and base insertions at breakpoints.
Monte-Carlo randomization tests and associated statistics
A Browser Extensible Data (BED) file containing GRCh37/hg19 genomic coordinates of all 876 breakpoints detected by WGS was used as the input. One simulation consisted of generating random coordinates based on each pair of input coordinates, conserving the size of the feature as well as the intra-chromosomal distance when several breakpoints were localized to the same chromosome in a single individual. N-masked regions were excluded from simulations for consistency, as they were excluded from the initial alignment mapping. Simulations were repeated 100,000 times. The number of unique intersections between the shuffled file and a BED file containing features of interest (gene-sets, regulatory elements, etc.) was retrieved for each simulation, and the final sets of simulations delineated the expected distribution of intersections under the null hypothesis. The observed value of intersected features in this study was compared to this expected distribution. Empirical Monte-Carlo P-values were indicated, and were calculated as follows: P-value = (r + 1)/(n + 1), where r is the number of observations within the set of simulations that are at least as extreme as the one observed, and n is the total number of simulations86. References for all functional element datasets and genesets that were used to test for enrichment at breakpoints in the cohort are detailed in Supplementary Table 12.
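The empirical P-value formula above, P = (r + 1)/(n + 1), can be illustrated with a short Python sketch. The null model here is deliberately simplified and is an assumption of this example: breakpoints are redrawn uniformly over the genome, without conserving intra-chromosomal distances or excluding N-masked regions as the study did, and "at least as extreme" is taken to mean a one-sided test for enrichment. All function names are hypothetical.

```python
import random

def empirical_p_value(observed, simulated):
    """P = (r + 1) / (n + 1), with r = simulations at least as large as observed."""
    r = sum(1 for s in simulated if s >= observed)
    return (r + 1) / (len(simulated) + 1)

def count_features_hit(breakpoints, features):
    """Count features (chrom, start, end) containing at least one breakpoint (chrom, pos)."""
    return sum(
        1 for chrom, start, end in features
        if any(c == chrom and start <= p < end for c, p in breakpoints)
    )

def random_breakpoints(n, chrom_sizes, rng):
    """Simplified null: n positions drawn uniformly over the whole genome."""
    chroms = list(chrom_sizes)
    weights = [chrom_sizes[c] for c in chroms]
    picks = rng.choices(chroms, weights=weights, k=n)
    return [(c, rng.randrange(chrom_sizes[c])) for c in picks]

def enrichment_test(breakpoints, features, chrom_sizes, n_sim=1000, seed=0):
    """Return the observed count and its empirical one-sided P-value."""
    rng = random.Random(seed)
    observed = count_features_hit(breakpoints, features)
    simulated = [
        count_features_hit(
            random_breakpoints(len(breakpoints), chrom_sizes, rng), features)
        for _ in range(n_sim)
    ]
    return observed, empirical_p_value(observed, simulated)
```

Adding 1 to both numerator and denominator keeps the estimate strictly positive, so the smallest reportable P-value with 100,000 simulations is 1/100,001.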
To isolate genomic regions in which an unusual number of BCA breakpoints were localized, we partitioned the genome into 1 Mb bins using a sliding window of 100 kb, and counted the number of BCA breakpoints coming from independent subjects. The same approach was performed for 100,000 sets of simulated breakpoints generated as detailed previously. P-values were computed by comparing observed to expected cluster sizes after 100,000 Monte Carlo randomizations, and corrected for the total number of windows interrogated. Genome-wide significance was achieved for clusters with P-values below 1.6 × 10^-6.
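A minimal sketch of the sliding-window count is given below, assuming 1 Mb windows whose start positions advance in 100 kb steps and counting each subject at most once per window. The function names and input layout are hypothetical. The α = 0.05 used in the final line is also an assumption: the text states only that P-values were corrected for the number of windows, although a threshold near 1.6 × 10^-6 is what a Bonferroni correction of 0.05 over roughly 31,000 windows would give.

```python
from collections import defaultdict

def window_counts(breakpoints, window=1_000_000, step=100_000):
    """Count independent subjects with a breakpoint in each sliding window.

    breakpoints: iterable of (subject_id, chrom, pos).
    Returns {(chrom, window_start): number_of_distinct_subjects}.
    """
    subjects = defaultdict(set)
    for subject, chrom, pos in breakpoints:
        # Window starts are multiples of `step`; a window covers [start, start + window).
        last = (pos // step) * step
        first = max(0, last - window + step)
        for start in range(first, last + 1, step):
            subjects[(chrom, start)].add(subject)
    return {key: len(ids) for key, ids in subjects.items()}

def bonferroni_threshold(alpha, n_windows):
    """Per-window significance threshold after Bonferroni correction."""
    return alpha / n_windows

# Hypothetical breakpoints: two from subject S1 and one from subject S2.
bps = [("S1", "chr5", 1_250_000), ("S1", "chr5", 1_260_000), ("S2", "chr5", 1_900_000)]
counts = window_counts(bps)
threshold = bonferroni_threshold(0.05, 31_250)  # alpha and window count are assumptions
```

Storing subject identifiers in a set is what prevents several breakpoints from one complex rearrangement from inflating a cluster.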
BCA outcome interpretation
To build reference lists of genes associated with dominant developmental disorders we amalgamated data from multiple large-scale exome sequencing, genome sequencing, or CNV studies investigating developmental (e.g. DDD consortium) and neurodevelopmental disorders (mostly intellectual disability, autism, and epilepsy cohorts; see Supplementary Note and Supplementary Table 6 for detailed references). We then built our interpretation using standard categories comparable to those established by ClinVar and the Deciphering Developmental Disorders consortium (DDD)43, as detailed below and in Supplementary Table 7.
Pathogenic: Confirmed Loci associated with developmental disorders
Any gene with three or more de novo LoF mutations (frameshift, nonsense or splice mutation, CNV, or BCA) reported from independent cases in those amalgamated studies or in OMIM was considered as high confidence for a particular phenotype, and any BCA impacting one of those loci was therefore considered to be Pathogenic (Supplementary Table 9).
Likely Pathogenic: Novel candidate genes or mechanisms
To evaluate the impact of the remaining BCAs and the genes they likely impacted, we relied on convergent genomic evidence from other large-scale datasets to prioritize which gene would most likely contribute to the subject’s phenotype. Multiple BCAs were considered to be Likely Pathogenic, based on several lines of evidence (Supplementary Table 10):
Disruption of a likely risk factor: Disruption of one copy of a gene in which one or two dnLoF mutations had been previously reported and which demonstrated significant constraint (top 10% of constrained genes)28,29
Novel mechanisms: Disruption of a gene established as associated with dominant developmental disorders yet with a distinct mutation type (e.g. activating or missense mutations while we reported LoF)
Disruption of long-range interactions: BCA breakpoints located in the vicinity of a gene associated with dominant developmental disorders in a subject with a consistent phenotype, and predicted to impact long-range regulatory interactions.
VUS
All BCAs impacting genes not fitting in any of the above-mentioned categories were considered as VUS.
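The interpretation categories above can be summarized as a simple decision rule. The sketch below is only a schematic reading of those criteria: the function and parameter names are hypothetical, and the flags for a distinct mutational mechanism or a long-range positional effect stand in for what was, in the study, a case-by-case evaluation of literature and phenotype.

```python
def classify_bca(n_dnlof, constraint_percentile=None,
                 distinct_mechanism=False, long_range_candidate=False):
    """Schematic classification of a BCA by the gene or locus it affects.

    n_dnlof: number of independent de novo LoF mutations reported for the gene.
    constraint_percentile: gene rank by constraint, where <= 10 means top 10%.
    distinct_mechanism: gene is an established dominant developmental disorder
        gene, but through a mutation type other than LoF.
    long_range_candidate: breakpoint lies near such a gene, the phenotype is
        consistent, and long-range regulatory interactions are predicted to be hit.
    """
    if n_dnlof >= 3:
        return "Pathogenic"
    is_constrained = constraint_percentile is not None and constraint_percentile <= 10
    if n_dnlof in (1, 2) and is_constrained:
        return "Likely Pathogenic"  # disruption of a likely risk factor
    if distinct_mechanism or long_range_candidate:
        return "Likely Pathogenic"  # novel mechanism or positional effect
    return "VUS"

labels = [
    classify_bca(4),
    classify_bca(1, constraint_percentile=5),
    classify_bca(0, long_range_candidate=True),
    classify_bca(1, constraint_percentile=40),
]
```

The ordering of the checks mirrors the text: confirmed loci take precedence, and anything that fits none of the listed criteria falls through to VUS.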
Predicted disruption of contact domains by BCAs
Topologically associated domains (TADs) and predicted loops for lymphoblastoid cells were retrieved from Dixon et al. and Rao et al.17,61, and genes contained within a domain for which at least one of its insulating boundaries was disrupted by a BCA were assessed. Only genes that had been previously robustly associated with dominant developmental disorders (i.e., with dnLoF reported in three or more subjects) were considered for potential positional effects. A detailed comparison of the reported phenotypes in the corresponding subjects to phenotypes associated with disrupted genes in the literature was performed. For subjects identified with a BCA of plausible positional effect, the region was visualized using Juicebox87 (Supplementary Fig. 72–77). Heatmaps represent observed intrachromosomal interactions in GM12878 lymphoblastoid cells in a specific window; previously reported contact domains (regions of increased contact, not necessarily materializing as loops) and loops (sites of increased focal contacts indicating the presence of a loop) were indicated17,61, as well as the RefSeq genes located in the region.
Measuring gene expression from lymphoblasts
In subjects for whom the BCA was suspected to result in positional effects and for whom LCLs derived from blood were available, gene expression was investigated by quantitative RT-PCR. LCLs were not tested for mycoplasma contamination. Total RNA was extracted from LCLs using TRIzol® (Invitrogen) followed by RNeasy Mini Kit (Qiagen) column purification. cDNA was synthesized from 750 ng of extracted RNA using SuperScript® II Reverse Transcriptase (ThermoFisher Scientific) with oligo(dT), random hexamers, and RNase inhibitor. Quantitative RT-PCR was performed for mRNA expression of genes of interest in the following subjects (MEF2C: DGAP131, DGAP191, DGAP218, DGAP222; SATB2: DGAP237; SLC2A1: DGAP170; SRCAP: DGAP134) using custom designed primers (see Supplementary Note). ACTB, GAPDH and POLR2A were each used as independent endogenous controls. Custom designed primers (0.75 μM final), cDNA (1:100 final) and nuclease-free water were added to the LightCycler® 480 SYBR Green I Master Mix (Roche) for a final 10 μL reaction volume. A LightCycler® 480 (Roche) was used for data acquisition. Values for each individual (subject or control) were obtained in triplicates of similar variance. Results of triplicates for each gene of interest were normalized against the average of the three endogenous controls (ACTB, GAPDH and POLR2A). Normalized expression levels were set in relation to eight age- and sex-matched controls for the genes of interest SATB2, SLC2A1 and SRCAP, or 16 (eight males, eight females) age-matched controls for the gene of interest MEF2C, using the ΔΔCt method. Results are expressed as fold-change relative to the averaged control individuals. The significance of differential gene expression from a subject in comparison to controls was tested using a two-sided Wilcoxon Mann-Whitney test. All qRT-PCR results were independently replicated twice in the laboratory.
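The ΔΔCt normalization described above can be written out as a short Python sketch. It assumes the standard form of the method, in which fold-change equals 2 raised to −ΔΔCt and amplification efficiency is taken to be close to 100%. The function names and all Ct values are hypothetical and are not data from this study.

```python
from statistics import mean

def delta_ct(target_cts, reference_cts):
    """ΔCt: mean Ct of the gene of interest minus the average of the
    endogenous controls' mean Ct values.

    target_cts: triplicate Ct values for the gene of interest.
    reference_cts: {control_gene: triplicate Ct values}.
    """
    reference_mean = mean(mean(cts) for cts in reference_cts.values())
    return mean(target_cts) - reference_mean

def fold_change(subject_dct, control_dcts):
    """Fold-change relative to the averaged control individuals: 2 ** -ΔΔCt."""
    ddct = subject_dct - mean(control_dcts)
    return 2 ** (-ddct)

# Hypothetical triplicates for one subject.
subject_dct = delta_ct(
    [26.0, 26.1, 25.9],
    {"ACTB": [18.0, 18.1, 17.9],
     "GAPDH": [20.0, 20.1, 19.9],
     "POLR2A": [22.0, 22.1, 21.9]},
)
# Hypothetical ΔCt values for control individuals.
control_dcts = [5.0, 5.1, 4.9, 5.0]
relative_expression = fold_change(subject_dct, control_dcts)
```

A subject whose ΔCt is one cycle higher than the control average has a fold-change of 0.5, that is, about half the control expression level.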
Acknowledgments
We are infinitely grateful for the seminal work led by our co-author, Prof. Dorothy Warburton, who passed away during review of this manuscript. Dr. Warburton was a pioneer in cytogenetic research and a close colleague, mentor, and friend to so many in the cytogenetics community. We wish to thank all subjects and families who have been enrolled in this study, as well as the countless genetic counselors and clinical geneticists who contributed to the ascertainment of subjects. This study was supported by: the National Institutes of Health (grant GM061354 to M.E.T., J.F.G., C.C.M. and E.L.; grants MH095867 and HD081256 to M.E.T.), the March of Dimes (6-FY15-255 to M.E.T.), the European Molecular Biology Organization and the Marie Curie Actions of the European Commission (fellowship EMBO ALTF-183-2015 to C.R.), the Bettencourt-Schueller Foundation (young investigator award to C.R.), the Philippe Foundation (award to C.R.), the Harvard Medical School–Portugal Program in Translational and Clinical Research and Health Information (Fundação para a Ciência e a Tecnologia, HMSP-ICT/0016/2013 to C.C.M. and D.D.), the National Science Foundation (NSF Graduate Research Fellowship DGE1144152 to S.L.P.S.), the Fund for Scientific Research – Flanders (B.C. and S.V. are respectively a FWO senior clinical investigator and a FWO postdoctoral researcher), Clinical Medicine Science and Technology Projects of Jiangsu Province (grant BL2013019 to Ha.L. and Ho.L.), the Suzhou Key Medical Center (grant Szzx201505 to Ha.L. and Ho.L.), and the Royal Society of New Zealand (Rutherford Discovery Fellowship to J.C.J.). This study was also supported by the Desmond and Ann Heathwood MGH Research Scholars award to M.E.T.
Footnotes
AUTHOR CONTRIBUTIONS
M.E.T., J.F.G., C.C.M., E.T., J.C.H., W.P.K., N.dL. and H.G.B. designed the study. C.R., H.B., R.L.C., V.P., I.B., C.C., J.T.G., M.R.S., M.J.vR. and W.P.K. performed computational analyses. C.H., C.M.S., R.A., M-A.An., C.A., E.C., B.B.C., J.K., W.L., P.M., L.M., T.M., D.P., J.R., M.J.W. and A.W. performed cellular, molecular or genomic experiments. T.K., E.M., J.C.H., M-A.Ab., O.A.A-R., E.A., S.L.A-E., F.S.A., Y.A., K.A-Y., J.F.A., T.B., J.A.B., E.B., E.M.H.F.B., E.H.B., C.W.B., H.T.B., B.C., K.C., H.C., T.C., D.D., M.A.D., A.D., M.D’H., B.B.A.dV., D.L.E., H.L.F., H.F., D.R.F., P.G., D.G., T.G., M.G., B.H.G., C.G., K.W.G., A.L.G., A.H-K., D.J.H., M.A.H., R.Hi., R.Ho., J.D.H., R.J.H., M.W.H., A.M.I., Mi.I., Me.I., J.C.J., S.J., T.J., J.P.J., M.C.J., S.G.K., D.A.K., P.M.K., Y.L., E.L., K.L., A.V.L., Ha.L., Ho.L., E.C.L., C.L., E.J.L., D.L., M.J.M., G.M., C.L.M., D.M.F., M.W.M., C.Z.M., B.M., S.M., L.R.M., E.M., S.M., T.M., M.E.M., G.M., A.N., Z.O., S.P., S.P.P., S.P., K.P., R.E.P.A., P.J.P., G.P., S.R., L.R., W.R., D.R., I.R., F.R., P.R., S.L.P.S., R.Sh., R.Sp., E.S., B.S., J.T., J.V.T., B.W.vB., J.vdK., I.vDB., T.vE., C.M.vR-A., S.V., C.M.L.V-T., D.P.W., S.W., M.C.A.Y., R.T.Z., B.L., H.G.B., N.dL., W.P.K., E.C.T., C.C.M., and J.F.G. ascertained and enrolled subjects and provided phenotypic information. C.R. and M.E.T. wrote the manuscript, which was approved by all authors.
COMPETING FINANCIAL INTERESTS
The authors have none to declare.
Data availability
All reported breakpoints and their clinical interpretation have been submitted to dbVar (accession number: nstd133) and ClinVar (accession numbers: SCV000320745 to SCV000320992).
References
1. Jacobs PA, Melville M, Ratcliffe S, Keay AJ, Syme J. A cytogenetic survey of 11,680 newborn infants. Ann Hum Genet. 1974;37:359–376. doi: 10.1111/j.1469-1809.1974.tb01843.x.
2. Nielsen J, Wohlert M. Chromosome abnormalities found among 34,910 newborn children: results from a 13-year incidence study in Arhus, Denmark. Hum Genet. 1991;87:81–83. doi: 10.1007/BF01213097.
3. Ravel C, Berthaut I, Bresson JL, Siffroi JP Genetics Commission of the French Federation of C. Prevalence of chromosomal abnormalities in phenotypically normal and fertile adult males: large-scale survey of over 10,000 sperm donor karyotypes. Hum Reprod. 2006;21:1484–1489. doi: 10.1093/humrep/del024.
4. Funderburk SJ, Spence MA, Sparkes RS. Mental retardation associated with “balanced” chromosome rearrangements. Am J Hum Genet. 1977;29:136–141.
5. Marshall CR, et al. Structural variation of chromosomes in autism spectrum disorder. Am J Hum Genet. 2008;82:477–488. doi: 10.1016/j.ajhg.2007.12.009.
6. McKusick VA, Amberger JS. The morbid anatomy of the human genome: chromosomal location of mutations causing disease. J Med Genet. 1993;30:1–26. doi: 10.1136/jmg.30.1.1.
7. Talkowski ME, et al. Sequencing chromosomal abnormalities reveals neurodevelopmental loci that confer risk across diagnostic boundaries. Cell. 2012;149:525–537. doi: 10.1016/j.cell.2012.03.028.
8. Weischenfeldt J, Symmons O, Spitz F, Korbel JO. Phenotypic impact of genomic structural variation: insights from and for human disease. Nat Rev Genet. 2013;14:125–138. doi: 10.1038/nrg3373.
9. Warburton D. Current techniques in chromosome analysis. Pediatr Clin North Am. 1980;27:753–769. doi: 10.1016/s0031-3955(16)33924-4.
10. Talkowski ME, et al. Next-generation sequencing strategies enable routine detection of balanced chromosome rearrangements for clinical diagnostics and genetic research. Am J Hum Genet. 2011;88:469–481. doi: 10.1016/j.ajhg.2011.03.013.
11. Talkowski ME, et al. Clinical diagnosis by whole-genome sequencing of a prenatal sample. N Engl J Med. 2012;367:2226–2232. doi: 10.1056/NEJMoa1208594.
12. Schluth-Bolard C, et al. Breakpoint mapping by next generation sequencing reveals causative gene disruption in patients carrying apparently balanced chromosome rearrangements with intellectual deficiency and/or congenital malformations. J Med Genet. 2013;50:144–150. doi: 10.1136/jmedgenet-2012-101351.
13. Utami KH, et al. Detection of chromosomal breakpoints in patients with developmental delay and speech disorders. PLoS One. 2014;9:e90852. doi: 10.1371/journal.pone.0090852.
14. Vergult S, et al. Mate pair sequencing for the detection of chromosomal aberrations in patients with intellectual disability and congenital malformations. Eur J Hum Genet. 2014;22:652–659. doi: 10.1038/ejhg.2013.220.
15. Tabet AC, et al. Complex nature of apparently balanced chromosomal rearrangements in patients with autism spectrum disorder. Mol Autism. 2015;6:19. doi: 10.1186/s13229-015-0015-2.
16. Jin F, et al. A high-resolution map of the three-dimensional chromatin interactome in human cells. Nature. 2013;503:290–294. doi: 10.1038/nature12644.
17. Rao SS, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159:1665–1680. doi: 10.1016/j.cell.2014.11.021.
18. Kohler S, et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 2014;42:D966–974. doi: 10.1093/nar/gkt1026.
19. Meyerson M, Pellman D. Cancer genomes evolve by pulverizing single chromosomes. Cell. 2011;144:9–10. doi: 10.1016/j.cell.2010.12.025.
20. Stephens PJ, et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell. 2011;144:27–40. doi: 10.1016/j.cell.2010.11.055.
21. Kloosterman WP, et al. Chromothripsis as a mechanism driving complex de novo structural rearrangements in the germline. Hum Mol Genet. 2011;20:1916–1924. doi: 10.1093/hmg/ddr073.
22. Chiang C, et al. Complex reorganization and predominant non-homologous repair following chromosomal breakage in karyotypically balanced germline rearrangements and transgenic integration. Nat Genet. 2012;44:390–397. S391. doi: 10.1038/ng.2202.
23. Baca SC, et al. Punctuated evolution of prostate cancer genomes. Cell. 2013;153:666–677. doi: 10.1016/j.cell.2013.03.021.
24. De Gregori M, et al. Cryptic deletions are a common finding in “balanced” reciprocal and complex chromosome rearrangements: a study of 59 patients. J Med Genet. 2007;44:750–762. doi: 10.1136/jmg.2007.052787.
25. Zhang F, et al. The DNA replication FoSTeS/MMBIR mechanism can generate genomic, genic and exonic complex rearrangements in humans. Nat Genet. 2009;41:849–853. doi: 10.1038/ng.399.
26. Abyzov A, et al. Analysis of deletion breakpoints from 1,092 humans reveals details of mutation mechanisms. Nat Commun. 2015;6:7256. doi: 10.1038/ncomms8256.
27. Djebali S, et al. Landscape of transcription in human cells. Nature. 2012;489:101–108. doi: 10.1038/nature11233.
28. Petrovski S, Wang Q, Heinzen EL, Allen AS, Goldstein DB. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 2013;9:e1003709. doi: 10.1371/journal.pgen.1003709.
29. Samocha KE, et al. A framework for the interpretation of de novo mutation in human disease. Nat Genet. 2014;46:944–950. doi: 10.1038/ng.3050.
30. Iossifov I, et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature. 2014;515:216–221. doi: 10.1038/nature13908.
31. Berg JS, et al. An informatics approach to analyzing the incidentalome. Genet Med. 2013;15:36–44. doi: 10.1038/gim.2012.112.
32. Darnell JC, et al. FMRP stalls ribosomal translocation on mRNAs linked to synaptic function and autism. Cell. 2011;146:247–261. doi: 10.1016/j.cell.2011.06.013.
33. Ascano M, Jr, et al. Nature. 2012;492:382–386. doi: 10.1038/nature11737.
34. Iossifov I, et al. De novo gene disruptions in children on the autistic spectrum. Neuron. 2012;74:285–299. doi: 10.1016/j.neuron.2012.04.009.
35. O’Roak BJ, et al. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature. 2012;485:246–250. doi: 10.1038/nature10989.
36. Sanders SJ, et al. De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature. 2012;485:237–241. doi: 10.1038/nature10945.
37. De Rubeis S, et al. Synaptic, transcriptional and chromatin genes disrupted in autism. Nature. 2014;515:209–215. doi: 10.1038/nature13772.
38. Cotney J, et al. The autism-associated chromatin modifier CHD8 regulates other autism risk genes during human neurodevelopment. Nat Commun. 2015;6:6404. doi: 10.1038/ncomms7404.
39. Sugathan A, et al. CHD8 regulates neurodevelopmental pathways associated with autism spectrum disorder in neural progenitors. Proc Natl Acad Sci U S A. 2014;111:E4468–4477. doi: 10.1073/pnas.1405266111.
40. Hawrylycz MJ, et al. An anatomically comprehensive atlas of the adult human brain transcriptome. Nature. 2012;489:391–399. doi: 10.1038/nature11405.
41. Fromer M, et al. De novo mutations in schizophrenia implicate synaptic networks. Nature. 2014;506:179–184. doi: 10.1038/nature12929.
42. Purcell SM, et al. A polygenic burden of rare disruptive mutations in schizophrenia. Nature. 2014;506:185–190. doi: 10.1038/nature12975.
- 43.Landrum MJ, et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 2016;44:D862–868. doi: 10.1093/nar/gkv1222. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Kleefstra T, et al. Loss-of-function mutations in euchromatin histone methyl transferase 1 (EHMT1) cause the 9q34 subtelomeric deletion syndrome. Am J Hum Genet. 2006;79:370–377. doi: 10.1086/505693. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Lu W, et al. NFIA haploinsufficiency is associated with a CNS malformation syndrome and urinary tract defects. PLoS Genet. 2007;3:e80. doi: 10.1371/journal.pgen.0030080. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Rosenfeld JA, et al. Small deletions of SATB2 cause some of the clinical features of the 2q33.1 microdeletion syndrome. PLoS One. 2009;4:e6568. doi: 10.1371/journal.pone.0006568. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Talkowski ME, et al. Assessment of 2q23.1 microdeletion syndrome implicates MBD5 as a single causal locus of intellectual disability, epilepsy, and autism spectrum disorder. Am J Hum Genet. 2011;89:551–563. doi: 10.1016/j.ajhg.2011.09.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Rasmussen MB, et al. Neurodevelopmental disorders associated with dosage imbalance of ZBTB20 correlate with the morbidity spectrum of ZBTB20 candidate target genes. J Med Genet. 2014;51:605–613. doi: 10.1136/jmedgenet-2014-102535. [DOI] [PubMed] [Google Scholar]
- 49.DeSanto C, et al. WAC loss-of-function mutations cause a recognisable syndrome characterised by dysmorphic features, developmental delay and hypotonia and recapitulate 10p11.23 microdeletion syndrome. J Med Genet. 2015;52:754–761. doi: 10.1136/jmedgenet-2015-103069. [DOI] [PubMed] [Google Scholar]
- 50.Turner TN, et al. Loss of delta-catenin function in severe autism. Nature. 2015;520:51–56. doi: 10.1038/nature14186. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Xia F, et al. De novo truncating mutations in AHDC1 in individuals with syndromic expressive language delay, hypotonia, and sleep apnea. Am J Hum Genet. 2014;94:784–789. doi: 10.1016/j.ajhg.2014.04.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Splawski I, et al. Severe arrhythmia disorder caused by cardiac L-type calcium channel mutations. Proc Natl Acad Sci U S A. 2005;102:8089–8096. doi: 10.1073/pnas.0502506102. discussion 8086–8088. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Petrovski S, et al. Germline De Novo Mutations in GNB1 Cause Severe Neurodevelopmental Disability, Hypotonia, and Seizures. Am J Hum Genet. 2016;98:1001–1010. doi: 10.1016/j.ajhg.2016.03.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Floris C, et al. Two patients with balanced translocations and autistic disorder: CSMD3 as a candidate gene for autism found in their common 8q23 breakpoint area. Eur J Hum Genet. 2008;16:696–704. doi: 10.1038/ejhg.2008.7. [DOI] [PubMed] [Google Scholar]
- 55. Cardoso C, et al. Periventricular heterotopia, mental retardation, and epilepsy associated with 5q14.3-q15 deletion. Neurology. 2009;72:784–792. doi: 10.1212/01.wnl.0000336339.08878.2d.
- 56. Engels H, et al. A novel microdeletion syndrome involving 5q14.3-q15: clinical and molecular cytogenetic characterization of three patients. Eur J Hum Genet. 2009;17:1592–1599. doi: 10.1038/ejhg.2009.90.
- 57. Le Meur N, et al. MEF2C haploinsufficiency caused by either microdeletion of the 5q14.3 region or mutation is responsible for severe mental retardation with stereotypic movements, epilepsy and/or cerebral malformations. J Med Genet. 2010;47:22–29. doi: 10.1136/jmg.2009.069732.
- 58. Zweier M, et al. Mutations in MEF2C from the 5q14.3q15 microdeletion syndrome region are a frequent cause of severe mental retardation and diminish MECP2 and CDKL5 expression. Hum Mutat. 2010;31:722–733. doi: 10.1002/humu.21253.
- 59. Saitsu H, et al. De novo 5q14.3 translocation 121.5-kb upstream of MEF2C in a patient with severe intellectual disability and early-onset epileptic encephalopathy. Am J Med Genet A. 2011;155A:2879–2884. doi: 10.1002/ajmg.a.34289.
- 60. Zweier M, Rauch A. The MEF2C-Related and 5q14.3q15 Microdeletion Syndrome. Mol Syndromol. 2012;2:164–170. doi: 10.1159/000337496.
- 61. Dixon JR, et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485:376–380. doi: 10.1038/nature11082.
- 62. Lupianez DG, et al. Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. Cell. 2015;161:1012–1025. doi: 10.1016/j.cell.2015.04.004.
- 63. Lupianez DG, Spielmann M, Mundlos S. Breaking TADs: How Alterations of Chromatin Domains Result in Disease. Trends Genet. 2016;32:225–237. doi: 10.1016/j.tig.2016.01.003.
- 64. Franke M, et al. Formation of new chromatin domains determines pathogenicity of genomic duplications. Nature. 2016. doi: 10.1038/nature19800.
- 65. Mencarelli MA, et al. 14q12 Microdeletion syndrome and congenital variant of Rett syndrome. Eur J Med Genet. 2009;52:148–152. doi: 10.1016/j.ejmg.2009.03.004.
- 66. Ellaway CJ, et al. 14q12 microdeletions excluding FOXG1 give rise to a congenital variant Rett syndrome-like phenotype. Eur J Hum Genet. 2013;21:522–527. doi: 10.1038/ejhg.2012.208.
- 67. Takagi M, et al. A 2.0 Mb microdeletion in proximal chromosome 14q12, involving regulatory elements of FOXG1, with the coding region of FOXG1 being unaffected, results in severe developmental delay, microcephaly, and hypoplasia of the corpus callosum. Eur J Med Genet. 2013;56:526–528. doi: 10.1016/j.ejmg.2013.05.012.
- 68. Perche O, et al. Dysregulation of FOXG1 pathway in a 14q12 microdeletion case. Am J Med Genet A. 2013;161A:3072–3077. doi: 10.1002/ajmg.a.36170.
- 69. Ibn-Salem J, et al. Deletions of chromosomal regulatory boundaries are associated with congenital disease. Genome Biol. 2014;15:423. doi: 10.1186/s13059-014-0423-1.
- 70. Deng Y, Gao L, Wang B, Guo X. HPOSim: an R package for phenotypic similarity measure and enrichment analysis based on the human phenotype ontology. PLoS One. 2015;10:e0115692. doi: 10.1371/journal.pone.0115692.
- 71. Brunetti-Pierri N, et al. Duplications of FOXG1 in 14q12 are associated with developmental epilepsy, mental retardation, and severe speech impairment. Eur J Hum Genet. 2011;19:102–107. doi: 10.1038/ejhg.2010.142.
- 72. McDermott SM, et al. Drosophila Syncrip modulates the expression of mRNAs encoding key synaptic proteins required for morphology at the neuromuscular junction. RNA. 2014;20:1593–1606. doi: 10.1261/rna.045849.114.
- 73. Warburton D. De novo balanced chromosome rearrangements and extra marker chromosomes identified at prenatal diagnosis: clinical significance and distribution of breakpoints. Am J Hum Genet. 1991;49:995–1013.
- 74. Brand H, et al. Cryptic and complex chromosomal aberrations in early-onset neuropsychiatric disorders. Am J Hum Genet. 2014;95:454–461. doi: 10.1016/j.ajhg.2014.09.005.
- 75. Chaisson MJ, et al. Resolving the complexity of the human genome using single-molecule sequencing. Nature. 2015;517:608–611. doi: 10.1038/nature13907.
- 76. Huddleston J, et al. Reconstructing complex regions of genomes using long-read sequencing technology. Genome Res. 2014;24:688–696. doi: 10.1101/gr.168450.113.
- 77. Lettice LA, et al. Enhancer-adoption as a mechanism of human developmental disease. Hum Mutat. 2011;32:1492–1499. doi: 10.1002/humu.21615.
- 78. Krzywinski M, et al. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19:1639–1645. doi: 10.1101/gr.092759.109.
- 79. Andersson R, et al. An atlas of active enhancers across human cell types and tissues. Nature. 2014;507:455–461. doi: 10.1038/nature12787.
- 80. Hanscom C, Talkowski M. Design of large-insert jumping libraries for structural variant detection using Illumina sequencing. Curr Protoc Hum Genet. 2014;80:7.22.1–7.22.9. doi: 10.1002/0471142905.hg0722s80.
- 81. Higgins AW, et al. Characterization of apparently balanced chromosomal rearrangements from the developmental genome anatomy project. Am J Hum Genet. 2008;82:712–722. doi: 10.1016/j.ajhg.2008.01.011.
- 82. Kohler S, et al. Clinical diagnostics in human genetics with semantic similarity searches in ontologies. Am J Hum Genet. 2009;85:457–464. doi: 10.1016/j.ajhg.2009.09.003.
- 83. Brand H, et al. Paired-Duplication Signatures Mark Cryptic Inversions and Other Complex Structural Variation. Am J Hum Genet. 2015;97:170–176. doi: 10.1016/j.ajhg.2015.05.012.
- 84. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
- 85. Tarasov A, Vilella AJ, Cuppen E, Nijman IJ, Prins P. Sambamba: fast processing of NGS alignment formats. Bioinformatics. 2015;31:2032–2034. doi: 10.1093/bioinformatics/btv098.
- 86. North BV, Curtis D, Sham PC. A note on the calculation of empirical P values from Monte Carlo procedures. Am J Hum Genet. 2002;71:439–441. doi: 10.1086/341527.
- 87. Durand NC, et al. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Syst. 2016;3:99–101. doi: 10.1016/j.cels.2015.07.012.
Associated Data