Detection of structural mosaicism from targeted and whole-genome sequencing data

Daniel A King; Alejandro Sifrim; Tomas W Fitzgerald; Raheleh Rahbari; Emma Hobson; Tessa Homfray; Sahar Mansour; Sarju G Mehta; Mohammed Shehla; Susan E Tomkins; Pradeep C Vasudevan; Matthew E Hurles; The Deciphering Developmental Disorders Study

doi:10.1101/gr.212373.116

. 2017 Oct;27(10):1704–1714. doi: 10.1101/gr.212373.116

Detection of structural mosaicism from targeted and whole-genome sequencing data

Daniel A King ¹, Alejandro Sifrim ¹, Tomas W Fitzgerald ¹, Raheleh Rahbari ¹, Emma Hobson ², Tessa Homfray ³, Sahar Mansour ³, Sarju G Mehta ⁴, Mohammed Shehla ⁵, Susan E Tomkins ⁶, Pradeep C Vasudevan ⁷, Matthew E Hurles ¹; The Deciphering Developmental Disorders Study¹

PMCID: PMC5630034 PMID: 28855261

Abstract

Structural mosaic abnormalities are large post-zygotic mutations present in a subset of cells and have been implicated in developmental disorders and cancer. Such mutations have been conventionally assessed in clinical diagnostics using cytogenetic or microarray testing. Modern disease studies rely heavily on exome sequencing, yet an adequate method for the detection of structural mosaicism using targeted sequencing data is lacking. Here, we present a method, called MrMosaic, to detect structural mosaic abnormalities using deviations in allele fraction and read coverage from next-generation sequencing data. Whole-exome sequencing (WES) and whole-genome sequencing (WGS) simulations were used to calculate detection performance across a range of mosaic event sizes, types, clonalities, and sequencing depths. The tool was applied to 4911 patients with undiagnosed developmental disorders, and 11 events among nine patients were detected. For eight of these 11 events, mosaicism was observed in saliva but not blood, suggesting that assaying blood alone would miss a large fraction, possibly >50%, of mosaic diagnostic chromosomal rearrangements.

Genetic mutations that arise post-zygotically lead to genetic heterogeneity in an organism, a phenomenon called mosaicism. The detection of mosaic mutations that are small (single-base or indel) is still a great technical challenge, but can be achieved in specific experimental setups to restrict the number of candidate mutations (e.g., matched tumor-normal samples in cancer to discover somatic mutations) (Genovese et al. 2014; Jaiswal et al. 2014). However, multi-megabase (structural) mosaic rearrangements are now routinely detected using cytogenetics and microarray technology (Miller et al. 2010; Biesecker and Spinner 2013). Recent single-nucleotide polymorphism (SNP) microarray-based studies have demonstrated that mosaic structural abnormalities are implicated in developmental disorders (Conlin et al. 2010; King et al. 2015), increase in incidence with age (Forsberg et al. 2012), and predispose to hematological malignancies in adults (Jacobs et al. 2012; Laurie et al. 2012).

Modern SNP microarray technology is well suited for detecting mosaicism, because probe density is high (often above 1 million sites per genome) and probes generate allele ratio data with high signal-to-noise ratio. SNP microarray platforms assess two metrics useful for mosaicism detection: (1) b allele frequency (BAF), the fraction of the alleles at a locus representing the less-common allele; and (2) log R ratio (LRR), a measure of copy number, based on the log ratio of signal intensity compared to a reference. These metrics are affected differently depending on the nature of the structural abnormality: whereas copy-neutral (loss of heterozygosity [LOH]) mosaicism results in a deviation of BAF alone, copy-number (gain or loss) mosaicism additionally alters the LRR. Absolute deviation from genotype-expected BAF (e.g., 0.5 for AB genotype), called B-deviation (B_dev), occurs in mosaic regions when the locus has a mixture of genotypes from wild-type and mosaic tissue. Several software tools, such as Partek Genomics Suite, Illumina cnvPartition, BAFsegmentation (Staaf et al. 2008) and Mosaic Alteration Detection (MAD) (González et al. 2011), harness this deviation as a mosaic signal. MAD is open source and has been recently used in several large SNP-based mosaicism projects (Forsberg et al. 2012, 2014; Jacobs et al. 2012); it identifies mosaic segments using aberrations in B_dev and then labels aberrant segments as copy-loss, copy-gain, or copy-neutral events based on the alteration of the LRR from baseline, a deviation referred to here as copy-deviation, or C_dev. In contrast to loss of heterozygosity, other types of balanced structural variants, notably inversions and balanced translocations, do not typically disrupt BAF or LRR, cannot typically be detected using these methods, and are not addressed in this study.

Developmental disorders (DD) are often caused by rare, small (SNV and indel) mutations, genetic variation that is not easily captured using microarray (King et al. 2014). Therefore, to achieve a more comprehensive assessment of pathogenic mutations, rare disease studies rely heavily on targeted sequencing of the protein-coding regions (“exons”) of the genome, an approach called whole-exome sequencing (WES) (Koboldt et al. 2013). Indeed, sequencing of the whole genome (WGS) offers several advantages compared to WES, including greater breadth of the genome and more consistent coverage of exons (Meynert et al. 2014). However, WGS is not currently as widely used as WES for rare disease studies due to higher costs, so this work focuses primarily on exome-sequencing data.

In addition to small-scale variation, forms of large-scale “structural variation,” including copy-number (Lee et al. 2007) and copy-neutral variation (uniparental disomy [UPD]) (Yamazawa et al. 2010), are also important causes of DD. CNV burden analysis of nearly 16,000 children with DD (Cooper et al. 2011) demonstrated that nearly all CNVs >2 Mb are likely pathogenic (odds ratios for CNVs of 1.5 and 3 Mb were 20 and 50, respectively), and deletion events are more often penetrant than duplication events. UPD events are only present in approximately one in 3500 healthy individuals (Robinson 2000), but are enriched in children with DD (King et al. 2014) and may result in highly penetrant imprinting disorders, recessive diseases, or may be associated with chromosomal mosaicism (Eggermann et al. 2015). Low-clonality mosaicism is difficult to observe in karyotyping, as inspection of at least 20 cells is required to exclude 14% mosaicism with 95% confidence (Hook 1977), and it is also difficult to observe in microarray analysis, because the detection sensitivity of mosaic duplications by SNP microarray with about 1 million probes for events of at least 2 Mb in size is limited to events of ∼10% clonality (González et al. 2011; Jacobs et al. 2012; Laurie et al. 2012; Machiela et al. 2015). The median average clonality in recent SNP-based studies of DD for mosaic aneuploidy was 40% (Conlin et al. 2010), and for mosaic structural variation (2 Mb and greater), it was 44% (King et al. 2015). With regard to frequency of mosaicism among children investigated with clinical diagnostic testing, the proportion of autosomal mosaic copy-neutral events was 0.24% (12 in 5000) (Bruno et al. 2011), whereas the proportion of autosomal mosaic copy-number events was 0.35% (36 in 10,362) (Pham et al. 2014); summing both copy-neutral and copy-number proportions yields a combined proportion of 0.59% of cases with mosaic structural variation.

The detection of large-scale genetic variation from WES data is challenging because input data are derived using sparse sampling of the genome, because targeted regions typically cover only ∼2% of the genome (Meynert et al. 2014) and sequence read depth at exons is biased by enrichment efficiency and other factors (Plagnol et al. 2012). Despite these limitations, exome-based software tools have been successfully engineered to detect large-scale constitutive mutations, including copy-number variation (Sathirapongsasuti et al. 2011; Fromer et al. 2012; Krumm et al. 2012; Magi et al. 2013; Backenroth et al. 2014) and copy-neutral variation, such as BCFtools/RoH (Narasimhan et al. 2016) and UPDio (King et al. 2014). These tools are relatively insensitive to mosaic abnormalities (post-zygotic abnormalities, i.e., “mutations”), however, because they typically rely on single metrics, such as copy-number change (rather than copy-number and allele-fraction), or on genotype, which is not well assessed in mosaic state. Specialized methods have been developed for the analysis of cancer exomes where tumor and normal tissue can be isolated (Lonigro et al. 2011; Amarasinghe et al. 2014) or, in the context of a parent-fetus trio, for fetal DNA in maternal plasma (Rampášek et al. 2014). However, a method to detect copy-number and copy-neutral mosaicism from an individual's exome (or genome) is lacking, but if available, could further extend the capacity of sequence-based analyses.

We developed MrMosaic, a method that detects structural mosaicism using joint analysis of B_dev and C_dev in targeted or whole-genome sequencing data (Fig. 1). We used simulations to demonstrate the superior performance of MrMosaic compared to the MAD algorithm. We also applied MrMosaic to analyze WES data from 4911 children with developmental disorders and identified 11 structural mosaic events in nine individuals, six of whom exhibited tissue-specific mosaicism.

Figure 1. — Detecting structural mosaicism using MrMosaic. (A) Exome data are stored in a BAM file from which allele fraction (*left*) and coverage (*right*) are measured at polymorphic positions within or near target regions. A simulated mosaic deletion is depicted. (B) The raw data, consisting of BAFs (y-axis: B allele frequency) and normalized coverage (y-axis: log ratio of normalized coverage) are plotted across chromosome space (x-axis) for a simulated mosaic deletion. (C) Absolute deviation of BAF (y-axis: B_dev) and normalized coverage (y-axis: C_dev) at heterozygous sites are analyzed. A smoothed median has been included. (D) Mann-Whitney U tests were performed separately for B_dev and C_dev, comparing the signal detected in sliding windows in this chromosome compared with randomly selected sites from other chromosomes, generating a test statistic (y-axis). A smoothed median has been included. (E) The test statistics are depicted in log scale. The P-values of the Mann-Whitney U tests were combined and segmented (black lines). Segments passing the Mscore significance threshold are plotted in blue.

Results

We developed a new computational method, MrMosaic, to detect structural mosaic abnormalities (copy number and loss of heterozyosity) from high-throughput sequence data (Methods). In summary, this method identifies chromosomal segments with elevated deviations in allelic proportion and copy number, relative to randomly selected sites on other chromosomes from the same data (Fig. 1). Initially, measures of deviation of allelic proportion (B_dev) and copy number (C_dev) are calculated from the WES/WGS data at well-covered (at least seven reads) known polymorphic SNVs. Whereas B_dev is only assessed at heterozygous sites, C_dev extracts and integrates read-depth information from flanking nonheterozygous sites to reduce noise. The statistical significance of the observed B_dev and C_dev were assessed separately, using nonparametric testing, and the resultant P-values were subsequently combined and then segmented using the GADA algorithm (Pique-Regi et al. 2008). We devised a confidence score, the Mscore, to curate putative detections of mosaic segments, by integrating metrics that discriminate between true positive and false positive mosaic detections (Methods).

Simulations

We performed simulations (Methods) to explore the performance of MrMosaic for three different classes of structural mosaicism: gains, losses, and LOH, in several contexts. The variation in performance across mosaicism of different sizes, clonalities, and sequencing coverage is summarized in Figure 2, for both WES and WGS data.

Figure 2. — Simulation performance summarized by AUC. We measured the average precision (area under the precision-recall curve [AUC]) for MrMosaic implemented on whole-exome (WE) simulations (A,C,E) and MrMosaic and MAD implemented on whole-genome (WG) simulations (B,D,F). The depth, size, and coverage measured for WES and WGS simulations were selected to accentuate informative differences in performance. AUC across size: Simulated events of 50% clonality were studied for WES (A) and WGS (B) simulations. Although for WES simulations, simulated exome depth was 75× depth, for WGS simulations, it was 30× depth. MrMosaic on whole-genome data (WG-MrM) outperforms MrMosaic on exome data (WE-MrM), which outperforms MAD on exome data (WE-MAD). AUC across clonality: Although for WES (C) simulations, the simulated size and coverage was 5 Mb and 75×, for WGS (D) simulations, it was 100 kb and 30×. AUC across average coverage: Simulated events of 50% were studied for both WES (E) and WGS (F) simulations. Although for WES simulations, simulated event size was 5 Mb, for WGS simulations, it was 100 kb.

Across all measured categories, mosaic duplications were more difficult to identify than deletion or LOH events, especially at lower (25%) clonality (Supplemental Fig. S1). We suspected that the most likely explanation for this lower sensitivity is that duplications result in the smallest deviation of B_dev compared with deletion and LOH events (Supplemental Fig. S2), and the C_dev signal is masked by sampling noise at low clonality. To further explore the effect of including C_dev in addition to B_dev, we investigated the performance of MrMosaic using B_dev alone compared with joint analysis of B_dev and C_dev. This analysis showed substantially improved detection of copy-number events above lower clonality, whereas only a marginally decreased performance of LOH detection (Supplemental Fig. S3), consistent with the intuition that C_dev yields a valuable net signal when clonality is above the C_dev noise floor.

Simulation performance increased with larger event size (Fig. 2A). WES simulation analysis demonstrated high area under the precision-recall curve (AUC) for all events at least 10 Mb in size and at least 50% in clonality, and for deletion and loss of heterozygosity (LOH) events, at least 5 Mb in size. MrMosaic performed favorably compared to MAD in all measured categories. Results for WGS simulations demonstrated an AUC of about 0.9 for 100 kb LOH and loss events and >0.95 for all megabase-size events. Larger events were assayed by more positions, and whole-genome simulations interrogated nearly 50-fold more sites than exome data (Supplemental Table S1).

Detection performance in simulations increased between 25% and 75% clonality (Fig. 2B). The WES and WGS clonality performance results were measured at 5 Mb and 100 kb sizes, respectively, as events at these sizes were most sensitive to changes in clonality (Supplemental Figs. S4, S5). Previous studies of children with DD have reported a median mosaicism of ∼40% mosaicism, and detection performance is strong for detecting mosaicism at this clonality at the studied sizes. As clonality increases, the mosaicism is present in a greater proportion of cells, resulting in a greater signal of detection.

Simulation performance increases with respect to sequencing coverage (Fig. 2C). The WES and WGS performance with respect to sequencing coverage were assessed for events of 50% clonality, using 5-Mb events for the WES simulations and 100-kb events for the WGS simulations. WES simulations demonstrated a marginal improvement of detection performance at higher coverage, which was notable for midclonality gains (Supplemental Fig. S4). Previous work suggested that 75× average coverage in WES data is sufficient for constitutive copy-number analysis (Fitzgerald et al. 2014), and these coverage simulations demonstrated that this exome coverage is also sufficient for the detection of mosaic structural abnormalities. In the WGS results, AUC rose dramatically between 15× and 20× for LOH and loss events and between 25× and 30× for gains. AUC was above about 0.9 for LOH and loss events at 30× depth, a standard sequencing depth used in WGS disease studies. Nearly all structural mosaic events of 100 kb and 50% clonality were detected (Supplemental Fig. S5), and average coverage of 20× was sufficient to detect nearly all 50% clonality deletion and LOH events at 100 kb, and detection performance of gains improved at 30× and 40× (Supplemental Fig. S6). This improved performance as coverage increases results primarily from sampling variance (“noise”) decreasing (correlation r = −0.95) (Supplemental Fig. S7), with an additional minor contribution from more sites (more signals) passing the minimal depth threshold for consideration (Supplemental Table S1).

Detections in 4911 exome samples

We generated WES data for 4911 children with undiagnosed developmental disorders. DNA was collected from either blood (n = 1652), saliva (n = 3246), or both (n = 13), and sequenced to a median average coverage of 90×. Analysis for structural mosaicism identified 11 mosaic abnormalities among nine individuals, a frequency of 0.18%. The detections consisted of five losses (median size: 13 Mb; median clonality: 46%), four gains (median size: 25 Mb; median clonality: 55%), and two LOHs (median size: 50 Mb; median clonality: 26%) (Fig. 3; Table 1; Supplemental Figs. S8–S18).

Figure 3. — Structural mosaicism detected from exome data: Structural Mosaicism Detected by MrMosaic in the Deciphering Developmental Disorders (DDD) study. Black and red dots represent copy number and allele fraction, respectively. C_dev and B_dev are plotted in black and red trend lines. The blue line represents statistically significant segmented detections passing a threshold. Different classes of events are found: (A–C) Mosaic gains; (D–F) mosaic losses; (G) mixed copy number; (H,I) loss-of-heterozygosity events.

Table 1.

Detections by exome and validation by SNP microarray

Open in a new tab

To improve our understanding of the accuracy of this sequencing-based method, we compared the results of the above analysis with the results of a prior experiment (King et al. 2015), which had analyzed high-resolution SNP data of 1303 Deciphering Developmental Disorders (DDD) samples, among which 1226 (of the 1303) had both exome and SNP data available. Among these 1226 for which the exome data could be compared with the gold-standard SNP data, detection using MrMosaic identified eight events, whereas detection using SNP microarray data of probands identified 10 events. Of the two events not detected by exome but detected by SNP microarray, one of the missed events was a 4-Mb duplication below 25% clonality. The other missed event was an LOH event with low sequencing depth (33×, one of the lowest of our study) (Supplemental Fig. S19); low depth results in higher sampling variance and lower statistical significance of deviations in allelic proportion and copy number (Supplemental Fig. S7). Given the high clonality (∼75%) of this event, it may have been detected using constitutive (genotype-based) UPD analysis; however, because paternal data were not available for this sample, it was not analyzed by our trio-based UPD detection pipeline (King et al. 2014).

Validation of the 11 mosaic abnormalities using SNP microarrays on DNA derived from both blood and saliva successfully detected all abnormalities in at least one tissue (Table 1). Notably, six of the seven mosaic copy-number mutations detected by MrMosaic in exome data had been undetected by both clinical and high-resolution aCGH investigation of the same tissue, despite most events being at least 5 Mb in size and exhibiting 50% clonality (Supplemental Table S2). Examination of the raw aCGH data in one case (Supplemental Fig. S17) showed that only small fragments of one of the events were detected, but these called segments were individually much smaller than the actual event.

Detection of the mosaic events was largely dependent on the assayed tissue, suggesting the importance of tissue-specificity (present in only a subset of tissues) in mosaicism detection. Of the 11 mosaic events, three were detected in blood and in saliva samples, whereas the remaining eight were only observed in saliva (Table 1; Supplemental Figs. S8–S18). There were two abnormalities detected from 1652 blood samples and nine detected from 3246 saliva samples, a nonsignificant proportional difference (P > 0.05, Fisher's exact test). One of the mosaic events detected in both blood and saliva was an LOH-type event, remarkable for having a gradient of increasing clonality toward the telomere (Supplemental Figs. S16, S19). This gradient of increasing clonality along the chromosome is compatible with LOH-mediated mosaic reversion, characterized by distinct cell populations carrying partially overlapping independent LOH events, as reported recently (Choate et al. 2015). Nevertheless, despite generation and analysis of high-depth (∼400×) WES data for this sample and the identification of several strong candidate genes, including CEP57 (the cause of mosaic aneuploidy syndrome) (Snape et al. 2011) in the reversion-localized region, no plausibly pathogenic rare (<1% minor allele frequency) coding sequence variants were identified (Supplemental Table S4). It may be that the gene of interest is several megabases distal to the breakpoint region.

We assessed the pathogenicity of the events detected in these nine children based on their phenotypes and known genomic disorders whose phenotypes matched those found in these children. Of the nine children presented here, four (Decipher IDs: 261373, 259003, 260462, and 257978) had been discovered and examined for pathogenicity during an earlier study (see Table 2; Supplemental Note S1; King et al. 2015). The mosaic events identified in seven of nine children were considered definitely pathogenic because they were multi-megabase CNVs that overlap known genomic-disorder regions (Supplemental Note S1). The reversion mosaic event was considered indicative of a likely pathogenic mutation, because the presence of multiple overlapping mosaic clones suggests strong and on-going negative selection against a deleterious allele. One LOH event was of uncertain pathogenicity as no rare loss of function or functional variants were detected (Supplemental Table S4).

Table 2.

Phenotypes for children with identified structural mosaicism

graphic file with name 1704tb02.jpg

Open in a new tab

Empirical evaluation of detection of mosaicism from WGS data

One sample, with three mosaic abnormalities detected on a single chromosome, which had also been detected during an earlier analysis (King et al. 2015), provided a valuable opportunity to use whole-genome sequencing data to clarify rearrangement architecture and to demonstrate MrMosaic performance on whole-genome sequence data. After the whole-genome sequencing data were generated and analyzed, MrMosaic easily detected these multi-megabase mosaic events, found with Mscores of 36, 117, and 32. The presence of three mosaic events of similar clonality on the same chromosome is suggestive of a complex chromosomal rearrangement. Analysis of the WGS read-pair data using BreakDancer (Chen et al. 2009) identified read pairs mapping across the centromere and evidence of a breakpoint spanning from the q-arm deletion to the centromere. Ring chromosomes are associated with biterminal deletions (Guilherme et al. 2011) and inverted duplications (Knijnenburg et al. 2007). Additionally, all three mosaic components arose from a single parental origin (paternal, in this case) (King et al. 2015), which would be expected in a ring chromosome. We suspected that the underlying abnormality in this child is a ring chromosome, although we were unable to access the cellular material required to generate the cytogenetic data to prove this hypothesis (Supplemental Fig. S21).

Discussion

Structural mosaic abnormalities are multi-megabase, post-zygotic mutations that have previously been associated with developmental disorders (Conlin et al. 2010; King et al. 2015). This work introduces a novel method to detect these mutations from next-generation sequencing data.

In an extensive simulation study, we show adequate discriminative ability to detect abnormalities in WES and WGS data across a large, clinically relevant range of size and clonality in different types of mosaic structural variation. We also compare our method to the popular array-based mosaic detection method, MAD, and show a substantial boost in performance, which derives primarily from the joint analysis of allelic proportion and copy-number deviations. Simulation results suggested that exome sequencing data can be used to identify many of the known clinical mosaic duplications involving chromosome-arm events, such as 12p and 18p mosaic tetrasomy, because MrMosaic easily detected events of this size. Given the dimensionality of the simulation parameter space (i.e., clonality, event size, coverage) and the computational cost of running these simulations, we restricted our analysis to parameter values in line with previous observations of structural mosaicism and reasonable experimental parameters at current sequencing costs. Additionally, we also chose more extreme parameter values for size and clonality in order to illustrate the dynamic range of the method in high-depth whole-genome sequence data (e.g., performance at <100-kb resolution and low clonality events), although few previous pathogenic variants with these characteristics have previously been described. These simulated performances only serve as illustrations for the selected parameter set and are not readily generalizable to other combinations of parameters, given the nonlinear interaction between parameters. Overall, simulation results show that MrMosaic is able to detect variants similar to previously described pathogenic variants with good performance.

We used MrMosaic to uncover pathogenic structural mosaicism in a large exome study of children with undiagnosed developmental disorders. Applying our method to the exome data of 4911 enrolled children, we identified nine individuals with structural mosaicism; the majority of these mutations were considered pathogenic. Assessment of pathogenicity was largely based on identifying substantial overlap between the known syndromic manifestations of large, well-known syndromic disorders, and the predominant phenotypes seen in each child. In one child with LOH-mosaicism, no pathogenic mutations were identified in the mosaic LOH region, suggesting that a pathogenic allele may lie outside of this mosaic region. In this WES-based analysis, we recovered eight of the 10 abnormalities previously detected in a subset of 1226 samples previously analyzed with SNP genotyping chip data, suggesting that exome-analysis alone is sensitive to detecting large-scale mosaicism. One of the missed abnormalities was likely undetected because the exome data were of low depth, which increases the variance of measured B_dev and C_dev. Most of the detected mosaic copy-number abnormalities had escaped detection by previous aCGH analysis. This demonstrates that detection of mosaic events requires assay of tissue containing the abnormality and tailored methods with sufficient sensitivity for mosaicism.

The overall frequency of mosaicism detected in this study, 0.18%, is lower and significantly different (P < 10⁻⁴, binomial test) from the 0.59% structural mosaicism frequency estimated from previous studies. One likely explanation for the discrepancy in these frequencies is ascertainment bias, as some classes of structural mosaicism (e.g., mosaic trisomies) are likely to have been diagnosed by prior diagnostic testing (e.g., karyotype or microarray) and not enrolled into the DDD study. Another component of this discordance may be due to decreased sensitivity, as mosaicism smaller than 2 Mb is challenging to detect by exome, and these small events account for ∼25% (9/36) of mosaic copy number events described previously (Pham et al. 2014). Given the low number of mosaic events in our cohort, due to the low mosaicism rate and tissue specificity, and the lack of publicly available large-scale developmental disorder data sets, this study only provides a limited estimate of the real-life performance of MrMosaic on nonsimulated data sets. As developmental disorder studies increase in sample size and scope, we envision that screening for mosaicism will provide additional explanatory power, increasing the number of diagnosed cases.

In one sample, we observed a gradient of mosaicism, a phenomenon likely associated with mosaic reversion of a de novo mutation dominantly inducing genome instability. Analysis of the mosaic LOH region with high-depth exome data did not identify a strong candidate coding variant, and a further WGS-based search for candidate pathogenic de novo mutations is on-going. Whole-genome sequencing data were generated for one individual with three mosaic abnormalities on the same chromosome. Analysis of these data recapitulated the mosaic events, and read pair analysis identified a pericentromeric inversion and supported the hypothesis of an underlying complex chromosomal rearrangement, likely a ring chromosome.

As expected, whole-genome analysis had superior performance compared to exome analysis, which was likely due to a combination of advantages of whole-genome data, including higher density of assayed sites (by nearly 50-fold) and more consistent coverage across sites, compared to exome coverage, which is subject to exome bait hybridization biases. Compared to whole-genome data, the exome data had higher average coverage (75× to 25×) for sites within targeted regions compared to the whole-genome data, and although simulation results showed increasing performance with higher depth sequence data, this effect was outweighed by the greater density of sites in whole-genome data.

Although the general performance of the method is adequate in many clinically relevant cases, some classes of event prove more difficult to detect. For example, low clonality mosaic gains generate the smallest deviation in B_dev and C_dev compared to other types of events, explaining their comparatively poor detection sensitivity in simulations and the failure to detect one mosaic duplication found using SNP data but not in exome data. More lenient detection thresholds may be preferred to increase detection sensitivity if clinical suspicion of mosaic duplication exists. Increasing the clonality of mosaicism by the biopsy of affected tissue, as is performed when pigmentary mosaicism provides evidence of underlying mosaicism (Woods et al. 1994), should also theoretically improve detection. Given the size and clonality of the two missed events and the simulation results from whole-genome sequencing, both events would likely have been detected had they been analyzed using higher depth WES or WGS, which are likely to become more common in the future.

The majority of the mosaic events we observed in saliva-derived DNA were not observed in blood. The samples with these abnormalities were recruited into our study because they remained undiagnosed after assessment by clinical laboratories of blood-derived DNA failed to detect the mosaic abnormalities we detected in saliva. DNA derived from saliva has a mixed origin, mainly lymphocytes (derived from mesoderm) and epithelium (derived from epiderm) (Endler et al. 1999); therefore, the events detected in saliva, but not blood, are believed to reflect epithelial mosaicism. There are two possible explanations for the disparity in tissue distribution we observed, first, that the epithelium-derived mutational events occurred late, i.e., after the differentiation of lymphocytes and epithelial cells; or second, that these events occurred early, i.e., prior to the split between lymphocytes and epithelial cells with subsequent removal from blood cell lineages by purifying selection. Several lines of evidence suggest the second explanation is more likely: (1) existing precedent, as the second phenomenon has been directly observed in Pallister-Killian syndrome, in which the percentage of abnormal cells decreases with age in blood but not fibroblasts (Conlin et al. 2012), and tissue-limited mosaicism has been observed in mosaic tetrasomies of Chromosomes 5p, 8p, 9p, and 18p (Choo et al. 2002); (2) the clonality of events observed in both blood and saliva is not greater than the clonality of events in only saliva, which would be expected if events seen across tissue arose earlier in development; and (3) both observed LOH events are shared between tissues but only one of nine CNV events are shared between tissues, perhaps suggesting increased pathogenicity of CNV events compared to copy-neutral events, thus, more likely to be negatively selected in blood. Given these considerations underlying the disparity in tissue type and the observation that the majority of observed abnormalities were detected in saliva but not blood, it is possible that, compared to the sampling of saliva, the sampling of blood could lead to a substantial loss of power (possibly <50% power) to detect pathogenic structural mosaicism, resulting in missed diagnoses. Studying the saliva tissue in these children permitted the identification of their mosaic abnormalities and ended for them and their families, their quest for diagnosis.

Additional work is required to investigate for which developmental disorders tissue-limited mosaicism is common. Another intriguing question regarding tissue distribution is the relationship between clonality and pathogenicity. Although mosaicism limited to a small number of cells is unlikely to cause developmental disorders, it is conceivable that low-level mosaicism present in a vulnerable tissue, such as white matter neurons, may have clinical consequences. More work is needed to address this question, including more extensive analysis of the tissue distribution of mosaicism (for example, by analyzing diverse tissues sampled from all three germ layers) and assays with improved resolution, allowing single or oligo-cell sequencing. The availability of more sensitive detection methods will improve the detection of a larger fraction of events limited to a single tissue.

Next-generation sequencing, in the form of exome and genome sequencing, can be harnessed to detect a wide range of mutations, including, as presented here, mosaic structural abnormalities. Given that sequencing costs continue to decline and because of the multifaceted detection capabilities of exome data, it may be that exome sequencing will supersede microarray technology as a first-line test for developmental disorders. Widespread incorporation of high-depth exome and whole-genome sequencing will revolutionize our understanding of the extent of mosaicism in the body and better define the relationship of mosaicism and disease.

Methods

MrMosaic

Implementing mosaic detection requires generating an input file and executing the algorithm; the latter consists of several steps: statistical testing, segmentation, filtering, and results visualization. “BAF” is used below as an alias for “nonreference proportion.” The input data for MrMosaic consist of genomic loci with measured B_dev values, C_dev values, and genotypes, stored in a tab-delimited file. The loci selected were diallelic single-nucleotide polymorphic (1%–99% MAFs among European individuals in the UK10K project) (UK10K Consortium 2015) autosomal positions. For exome analysis, only loci overlapping targeted regions of the exome design were used. At these loci, B_dev and C_dev values were calculated as described in the following two paragraphs.

B_dev values were generated using the following method: the identity of the alleles at each locus is extracted using fast_pileup function in the Perl module Bio::DB::Sam (Stajich et al. 2002), using high-quality reads (removal criteria: below base quality Q10, below mapping quality Q10, improper pairs, soft- or hard-clipped reads), and BAF was calculated as the number of reference bases divided by the total of reference bases and nonreference bases. Heterozygous sites were defined as loci with a BAF between 0.06 and 0.94, inclusive. The B_dev was calculated at heterozygous sites as the absolute difference between the BAF and 0.5. Only loci with sufficient read coverage (at least seven reads) were used for analysis.

C_dev values were generated using the following method: read depths from each target region were collected, the log₂ ratio for that target region was calculated by comparing its read depth to a reference read depth, in which the reference value was defined as the median read depth among the distribution of read depths at that target region from dozens of highly correlated samples. This log₂ ratio was normalized based on several covariates pertaining to each target region (covariates included were GC-content, hybridization melting temperature, delta free energy) (Fitzgerald et al. 2014). Lastly, using the Aberration Detection Algorithm v2 (ADM2) method by Agilent, a final error-weighted value is produced, which we used as the C_dev value.

The statistical testing step of the MrMosaic algorithm begins by data smoothing, using a rolling median (width of 5) across heterozygote and homozygous sites, to utilize the depth information in homozygous sites to reduce variance. From this point forward, only heterozygote sites were considered, as mosaic abnormalities do not affect B_dev of homozygous loci. Statistical testing assesses whether a given locus is significantly deviated from the B_dev and C_dev means given the null hypothesis of no chromosomal abnormality. At every heterozygote site, we compute two Mann-Whitney U tests, one for B_dev and one for C_dev, testing the alternative hypothesis that the distribution of the metric in the neighborhood of the chosen site is greater (has a higher median rank) than the distribution of the background. We use 10,000 randomly selected sites from all autosomes, excluding the current chromosome, as the background population. In order to account for nonuniform spacing of the data points, we applied a distance-weighted resampling scheme, to down-weight distant points from the chosen site. The tricube distance, inspired by Loess smoothing, was chosen as a decay function for the resampling weights and considered data points up to 0.5 Mb upstream of and downstream from the given position. An equal number of data points were then sampled around the chosen site and from the background (n = 100) and the Mann-Whitney U test was performed. Finally, we combined the P-values of the two statistical tests (one for B_dev and C_dev) for every position using Fisher's Omnibus test.

The segmentation step operates on the combined P-value generated above. Segmentation was performed using the GADA algorithm (González et al. 2011), using the parameters values as follows: SBL step: maxit of 1 × 10⁷; Backward Elimination step: T-value of 10 and MinSegLen value of 15. This step generates contiguous segments of putative chromosomal abnormalities. Segments in close proximity (within 1 Mb) that show the same signal direction (loss, gain, LOH) are merged to reduce oversegmentation.

The filtering step is required to assess which of the segments generated above are likely reflective of true mosaicism. While testing MrMosaic in exome simulation analyses, we observed that true-positive detections (those overlapping simulated events) tended to be larger (greater number of probes) and have stronger evidence of deviation (GADA amplification value) than putative segments that did not overlap simulated regions (i.e., false-positive, spurious calls) (Supplemental Figs. S22–S24). We captured these two features in a scoring metric calculated from the cumulative empirical distribution functions for “number of probes” and “GADA amplification value” of false-positive segments and assessed the composite probability that a given segment comes from these distributions, such that: Mscore = abs[−log₂(x) + −log₂(y)], where x and y refer to these empirical cumulative distribution functions. Thus, the Mscore is a quality-control metric derived by combining the size and signal-strength of detections. We used the Mscore to filter those events least likely to represent false positives. We selected events with an Mscore of 8 or greater for analysis, because we observed that this appeared to provide a good balance between sensitivity and specificity (Supplemental Fig. S24).

The visualization step generates a detection table and detection plots. The detection table consists of mosaic abnormalities detected and contains the following data: chromosome, start_position, end_position, log2ratio_of_segment, bdev_of_segment, clonality, type, number_of_probes, GADA_amplification, p_val_nprobes, p_val_GADA_amplification, Mscore. Event clonality was calculated by assessing the type of mosaic event based on LRR and converting the B_dev value to clonality based on the type of event (Supplemental Table S6). The detection plots are png files showing the loci and BAF and C_dev data for each chromosome in which a mosaic abnormality is detected, as well as a genome-wide lattice plot using the data for all chromosomes.

The algorithm can be used in multithreaded mode to facilitate whole-genome analysis. Analysis of a single whole exome using a single thread was completed in 15 min when tested using a single core of an Intel Xeon 2.67 Ghz processor and 500 MB of RAM. Whole-genome analysis using 24 cores required 30 GB of RAM and 7 h. Whole-genome analysis can be substantially shortened if the number of sliding windows is reduced or the window size is increased.

Simulating mosaicism

We devised a series of simulation experiments to assess MrMosaic performance for various events, across type (LOH, gains, losses), clonalities, sequencing depths, platforms (whole-exome [WE] and whole-genome [WG]) and to compare performance to the MAD method. We compared performance to a modified version of MAD that we adapted to enable more flexible execution in a parallel-computing environment, but identical with respect to statistical methods.

The simulation method consisted of the following steps: (1) loci selection, (2) calculating depth at these loci, (3) parameter space and number of trials, (4) adjusting read depth in simulated regions, (5) calculating final read depth, (6) selecting sites based on minimum depth, (7) calculating relative copy-number, (8) assigning genotypes, (9) calculating the BAF for each site, and (10) calculating performance. Steps 1–3 differed between the WES and WGS simulations and are described below. The remaining steps 4–10 were executed consistently for WES and WGS simulations and are described next.

For WES simulations, loci selection (1) was based on diallelic single-nucleotide polymorphic positions—between 1% and 99% UK10K (UK10K Consortium 2015) European minor allele frequency—in the V3 version of the target-region design. To calculate depth at these loci (2), at each locus i, baseline sequence read depth $({\tilde{D P}}_{i})$ for these sites was defined as the median of the read depth distribution among 100 parental exomes for each site, considering only high-quality reads (mapQ ≥ 10, baseQ ≥ 10, properly mapped read pairs), where parental exomes had a mean average sequencing output of 67× (calculated as the number of QC-passed and mapped reads without read-duplicates × 75 bp read length/96 Mb targeted bp). The parameter space (3) consisted of the following: target average sequencing coverage (in ×) ∈ {50, 75, 100}, event clonality m ∈ {0.25, 0.375, 0.5, 0.75}, type ∈ {loss, gain, LOH}, and size ∈ {2 × 10⁶, 5 × 10⁶, 1 × 10⁷, 2 × 10⁷}. Two hundred trials (4) were conducted per parameter combination for a total of 36,000 simulations.

For WGS simulations, the loci selection (1) was based on diallelic single-nucleotide polymorphic (1%–99% European MAFs from the May 2013 release of The 1000 Genomes Project) (The 1000 Genomes Project Consortium 2012) autosomal positions. To calculate depth at these loci (2), we calculated a scaling factor for each locus based on the median read depth of the first two median absolute deviations of the distribution of coverage for that site seen across 2500 low-coverage samples in The 1000 Genomes Project (The 1000 Genomes Project Consortium 2012). A site-specific scaling factor was calculated as the deviation of each site's read depth from the average read depth across all polymorphic positions. Simulation depth was defined at each site as the desired simulation coverage multiplied by site-specific scaling factor. The parameter space (3) consisted of two experiments: (1) average genome coverage of 25×, event clonality m ∈ {0.25, 0.375, 0.5, 0.75}, type {loss, gain, LOH}, and size (Mb) ∈ {1 × 10⁵, 2 × 10⁶, 5 × 10⁶}; and (2) a 5-Mb 50% clonality event captured at average genome coverages (in ×) ∈ {30, 40, 50, 60} for the three mosaic types {loss, gain, LOH}. One hundred trials (4) were conducted per WGS simulation.

The remaining simulations steps 4–10 described below were performed consistently for WES and WGS simulations. For each simulation, a single mosaic event was introduced into each simulation trial. The adjustment of read depth in simulated regions (4) was performed using a scaling factor based on the type and clonality of the simulated event, m, while sites not overlapping copy-number simulated events would not undergo this scaling step (Supplemental Table S6). To calculate the final simulated read depth (5) for each site i (SDP_i), we sampled from a Poisson distribution with λ_i equal to the scaled read depth. Only positions with a final read depth (6) of at least 7 were included for analysis. Relative copy-number (7) was defined as log₂ of the ratio of the final read depth to the baseline read depth.

The assignment of genotypes (8) (AA, AB, or BB) at each position, i, was randomly determined based on the site's minor allele frequency, which was used in a multinomial function with probabilities corresponding to Hardy Weinberg–assumed genotype proportions (p², 2pq, q²). For calculating the BAF for each heterozygote at site i (9), we adjusted the expected heterozygote proportion of 0.5 with respect to the chosen event type and clonality and sampled from a binomial distribution given this adjusted proportion and the simulated read depth at i. BAFs for homozygote reference (AA) and nonreference (BB) sites were chosen by sampling from a binomial distribution with P = 0.01 or P = 0.99, respectively, and the simulated read depth at i.

MrMosaic and MAD were applied on the simulated WES and WGS samples generated by the preceding procedure, and performance was measured using precision-recall metrics (10). A “success” in a trial was considered a detection overlapping the simulated mosaic event. Precision was calculated as the number of successes divided by the number of detections. Recall was defined as the proportion of trials with a success.

Description of samples and sequencing

The samples used in this analysis derived from the Deciphering Developmental Disorders study, a proband two-parent trio-based investigation of children with undiagnosed developmental disorders from the United Kingdom and Ireland (Firth and Wright 2011; Fitzgerald et al. 2014; Wright et al. 2014; King et al. 2015). DNA was extracted from blood and saliva and was processed at the Wellcome Trust Sanger Institute by array CGH and exome sequencing. There were 4926 DNA samples analyzed in this study from 4911 children, because some children were analyzed using both blood and saliva. The majority, 3260 of 4926 (66%), of the DNA samples were extracted from saliva.

DNA was enriched using an Agilent exome kit, based on the Agilent Sanger Exome V3 or V5 backbone and augmented with 5 Mb of additional custom content (Agilent Human All Exon V3+/V5+, ELID # C0338371). An “extended target region” workspace was defined by padding the 5′ and 3′ termini of each target region by 100 bp, yielding a total analyzed genome size of ∼90 Mb. Sequencing was performed using the Illumina HiSeq 2500 platform with a target of at least 50× mean coverage using paired-end sequence reads of 75-bp read length. Measured exome coverage ranged from 14× to 155× with a mean of 69× (Supplemental Fig. S24). Alignment to the reference genome GRCh37-hs37d was performed by BWA version 0.5.9 (Li and Durbin 2009) and saved in BAM-format files (Li et al. 2009).

Additionally, two exome samples were processed post hoc from saliva after SNP genotyping chip analysis showed mosaicism was present in saliva but absent in blood. These two exome samples and the exome sample with suspected revertant mosaicism were processed separately from the exome experiment described in the previous paragraph. For these three exomes, the Agilent Sanger Exome V5 target kit was used, and sequence depth ranged from 387× to 455× coverage (reads = {465,522,627, 483,098,826, 549,766,632} × 75 bp read length/(90 × 10⁶) target-region size). The sample with suspected underlying mosaic reversion had 549,224,891 QC-passed and mapped reads and 57,165,328 duplicates, therefore, a mapped read coverage of 410× [(549,224,891–57,165,328)] × 75/(90 × 10⁶).

Sequencing was performed using an Illumina X-Ten sequencing machine on the sample for which whole-genome sequencing data were generated. Library fragments of 450-bp insert size were used, and paired-end 151-bp read-length sequence reads were generated. Alignment to the reference genome GRCh37-hs37d was performed by BWA version 0.5.9 (Li and Durbin 2009) and saved in BAM-format files (Li et al. 2009). Realignment to GRCh38 was not done because this method avoids mitochondrial regions and harnesses exonic regions, whose mapping is unlikely to be affected by alternate scaffolds. Average coverage was calculated using SAMtools flagstat as the number of QC-passed mapped-reads without duplicates using 151-bp read-lengths in a 3-Gb genome: (616,151,282–124,325,581) × 151/(3 × 10⁹) = 24.8×. Rearrangement analysis was carried out using BreakDancer v1.0 (Chen et al. 2009).

Additional filtering implemented in addition to Mscore quality score

Some events with very high Mscores appeared to represent real, but constitutive, abnormalities. There were two failure modes we identified: constitutive duplications and homozygosity by descent (HBD). Constitutive duplications genuinely produce strong signals in MrMosaic, but also constitutive deletion and ROH events may produce putative detections if individual probes had mapping artifacts that resulted in spurious signals. We used BCFtools/RoH to identify and filter HBD regions, and we flagged as suspicious events with >25% reciprocal overlap with CNVs detected through constitutive copy-number detection. In addition, we observed several recurrent putative detections, especially prevalent in pericentromeric and acrocentric regions that appeared spurious on the basis of inconsistencies between BAF and LRR, and we filtered such systematic errors by filtering putative mosaic events seen in >2.5% of samples. The remaining putative detections were each manually reviewed. Tissue specificity was assessed through manual review of each detection in both saliva and blood (Supplemental Note S1).

SNP genotyping chip validation

Illumina HumanOmniExpress-24 Beadchips (713,014 markers) were used to ensure that both saliva and blood tissue were analyzed using SNP microarray. To complete dual-tissue SNP microarray for the validation experiment, SNP microarray chips were run on blood samples for IDs 261373, 273553, 259003, 260462, and 257978. Illumina GenomeStudio software was used to generate log R ratio and BAF metrics, and Illumina Gencall software was used to calculate genotypes. Structural mosaic detection was performed using MAD (González et al. 2011). Initial mosaic events were merged if events were within 1 Mb and were the same type (loss, gain, or LOH) of mosaic event. Results were plotted using custom R code.

Software availability

MrMosaic is primarily written in the R language, available as an open source tool at GitHub (https://github.com/asifrim/mrmosaic). MrMosaic source code is available in the Supplemental Material.

Data access

The complete raw exome sequencing data from this study have been submitted to the European Genome-phenome Archive (EGA; https://www.ebi.ac.uk/ega/) under accession number EGAS00001000775, and are available following Data Access Committee (DAC) approval.

Competing interest statement

M.E.H. is a cofounder, shareholder, and consultant to Congenica, Ltd., a company providing diagnostic decision support software.

Supplementary Material

Supplemental Material

supp_27_10_1704__index.html^{(4.1KB, html)}

Acknowledgments

This study relied on the generous participation of DDD patients and their parents. We are grateful to the DDD informatics and HGI pipeline staff for generating the data, DDD laboratory staff for sample handling, and Sanger genotyping core for running the validation arrays. Yanick Crow, Helen Firth, David Fitzpatrick, and Wendy Jones provided invaluable clinical expertise. We thank Jeff Barrett for his sharp, constructive feedback. Rolph Pfundt and James Lupski aided the interpretation of the revertant mosaic mutation. The DDD is supported by the Health Innovation Challenge Fund (HICF-1009-003), a parallel funding partnership between the Wellcome Trust and the Department of Health, and the Wellcome Trust Sanger Institute (WT098051).

Author contributions: D.A.K., A.S., and M.E.H. developed the algorithm and wrote the manuscript; T.W.F. generated the ADM2 exome scores; R.R. assisted with WGS analysis; E.H., T.H., S.M., S.G.M., M.S., S.E.T., and P.C.V. recruited and phenotyped the DDD patients with detected mosaicism.

Footnotes

[Supplemental material is available for this article.]

Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.212373.116.

Freely available online through the Genome Research Open Access option.

References

The 1000 Genomes Project Consortium . 2012. An integrated map of genetic variation from 1,092 human genomes. Nature 491: 56–65. [DOI] [PMC free article] [PubMed] [Google Scholar]
Amarasinghe KC, Li J, Hunter SM, Ryland GL, Cowin PA, Campbell IG, Halgamuge SK. 2014. Inferring copy number and genotype in tumour exome data. BMC Genomics 15: 732. [DOI] [PMC free article] [PubMed] [Google Scholar]
Backenroth D, Homsy J, Murillo LR, Glessner J, Lin E, Brueckner M, Lifton R, Goldmuntz E, Chung WK, Shen Y. 2014. CANOES: detecting rare copy number variants from whole exome sequencing data. Nucleic Acids Res 42: e97. [DOI] [PMC free article] [PubMed] [Google Scholar]
Biesecker LG, Spinner NB. 2013. A genomic view of mosaicism and human disease. Nat Rev Genet 14: 307–320. [DOI] [PubMed] [Google Scholar]
Bruno DL, White SM, Ganesamoorthy D, Burgess T, Butler K, Corrie S, Francis D, Hills L, Prabhakara K, Ngo C, et al. 2011. Pathogenic aberrations revealed exclusively by single nucleotide polymorphism (SNP) genotyping data in 5000 samples tested by molecular karyotyping. J Med Genet 48: 831–839. [DOI] [PubMed] [Google Scholar]
Chen K, Wallis JW, McLellan MD, Larson DE, Kalicki JM, Pohl CS, McGrath SD, Wendl MC, Zhang Q, Locke DP, et al. 2009. BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nat Methods 6: 677–681. [DOI] [PMC free article] [PubMed] [Google Scholar]
Choate KA, Lu Y, Zhou J, Elias PM, Zaidi S, Paller AS, Farhi A, Nelson-Williams C, Crumrine D, Milstone LM, et al. 2015. Frequent somatic reversion of KRT1 mutations in ichthyosis with confetti. J Clin Invest 125: 1703–1707. [DOI] [PMC free article] [PubMed] [Google Scholar]
Choo S, Teo SH, Tan M, Yong MH, Ho LY. 2002. Tissue-limited mosaicism in Pallister–Killian syndrome—a case in point. J Perinatol 22: 420–423. [DOI] [PubMed] [Google Scholar]
Conlin LK, Thiel BD, Bonnemann CG, Medne L, Ernst LM, Zackai EH, Deardorff MA, Krantz ID, Hakonarson H, Spinner NB. 2010. Mechanisms of mosaicism, chimerism and uniparental disomy identified by single nucleotide polymorphism array analysis. Hum Mol Genet 19: 1263–1275. [DOI] [PMC free article] [PubMed] [Google Scholar]
Conlin LK, Kaur M, Izumi K, Campbell L, Wilkens A, Clark D, Deardorff MA, Zackai EH, Pallister P, Hakonarson H, et al. 2012. Utility of SNP arrays in detecting, quantifying, and determining meiotic origin of tetrasomy 12p in blood from individuals with Pallister–Killian syndrome. Am J Med Genet A 158A: 3046–3053. [DOI] [PubMed] [Google Scholar]
Cooper GM, Coe BP, Girirajan S, Rosenfeld JA, Vu TH, Baker C, Williams C, Stalker H, Hamid R, Hannig V, et al. 2011. A copy number variation morbidity map of developmental delay. Nat Genet 43: 838–846. [DOI] [PMC free article] [PubMed] [Google Scholar]
Eggermann T, Soellner L, Buiting K, Kotzot D. 2015. Mosaicism and uniparental disomy in prenatal diagnosis. Trends Mol Med 21: 77–87. [DOI] [PubMed] [Google Scholar]
Endler G, Greinix H, Winkler K, Mitterbauer G, Mannhalter C. 1999. Genetic fingerprinting in mouthwashes of patients after allogeneic bone marrow transplantation. Bone Marrow Transplant 24: 95–98. [DOI] [PubMed] [Google Scholar]
Firth HV, Wright CF. 2011. The Deciphering Developmental Disorders (DDD) study. Dev Med Child Neurol 53: 702–703. [DOI] [PubMed] [Google Scholar]
Fitzgerald TW, Gerety SS, Jones WD, van Kogelenberg M, King DA, McRae J, Morley KI, Parthiban V, Al-Turki S, Ambridge K, et al. 2014. Large-scale discovery of novel genetic causes of developmental disorders. Nature 519: 223–228. [DOI] [PMC free article] [PubMed] [Google Scholar]
Forsberg LA, Rasi C, Razzaghian HR, Pakalapati G, Waite L, Thilbeault KS, Ronowicz A, Wineinger NE, Tiwari HK, Boomsma D, et al. 2012. Age-related somatic structural changes in the nuclear genome of human blood cells. Am J Hum Genet 90: 217–228. [DOI] [PMC free article] [PubMed] [Google Scholar]
Forsberg LA, Rasi C, Malmqvist N, Davies H, Pasupulati S, Pakalapati G, Sandgren J, Diaz de Ståhl T, Zaghlool A, Giedraitis V, et al. 2014. Mosaic loss of chromosome Y in peripheral blood is associated with shorter survival and higher risk of cancer. Nat Genet 46: 624–628. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fromer M, Moran JL, Chambert K, Banks E, Bergen SE, Ruderfer DM, Handsaker RE, McCarroll SA, O'Donovan MC, Owen MJ, et al. 2012. Discovery and statistical genotyping of copy-number variation from whole-exome sequencing depth. Am J Hum Genet 91: 597–607. [DOI] [PMC free article] [PubMed] [Google Scholar]
Genovese G, Kähler AK, Handsaker RE, Lindberg J, Rose SA, Bakhoum SF, Chambert K, Mick E, Neale BM, Fromer M, et al. 2014. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N Engl J Med 371: 2477–2487. [DOI] [PMC free article] [PubMed] [Google Scholar]
González JR, Rodríguez-Santiago B, Cáceres A, Pique-Regi R, Rothman N, Chanock SJ, Armengol L, Pérez-Jurado LA. 2011. A fast and accurate method to detect allelic genomic imbalances underlying mosaic rearrangements using SNP array data. BMC Bioinformatics 12: 166. [DOI] [PMC free article] [PubMed] [Google Scholar]
Guilherme RS, Meloni VF, Kim CA, Pellegrino R, Takeno SS, Spinner NB, Conlin LK, Christofolini DM, Kulikowski LD, Melaragno MI. 2011. Mechanisms of ring chromosome formation, ring instability and clinical consequences. BMC Med Genet 12: 171. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hook EB. 1977. Exclusion of chromosomal mosaicism: tables of 90%, 95% and 99% confidence limits and comments on use. Am J Hum Genet 29: 94–97. [PMC free article] [PubMed] [Google Scholar]
Jacobs KB, Yeager M, Zhou W, Wacholder S, Wang Z, Rodriguez-Santiago B, Hutchinson A, Deng X, Liu C, Horner M-J, et al. 2012. Detectable clonal mosaicism and its relationship to aging and cancer. Nat Genet 44: 651–658. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jaiswal S, Fontanillas P, Flannick J, Manning A, Grauman PV, Mar BG, Lindsley RC, Mermel CH, Burtt N, Chavez A, et al. 2014. Age-related clonal hematopoiesis associated with adverse outcomes. N Engl J Med 371: 2488–2498. [DOI] [PMC free article] [PubMed] [Google Scholar]
King DA, Fitzgerald TW, Miller R, Canham N, Clayton-Smith J, Johnson D, Mansour S, Stewart F, Vasudevan P, Hurles ME. 2014. A novel method for detecting uniparental disomy from trio genotypes identifies a significant excess in children with developmental disorders. Genome Res 24: 673–687. [DOI] [PMC free article] [PubMed] [Google Scholar]
King DA, Jones WD, Crow YJ, Dominiczak AF, Foster NA, Gaunt TR, Harris J, Hellens SW, Homfray T, Innes J, et al. 2015. Mosaic structural variation in children with developmental disorders. Hum Mol Genet 24: 2733–2745. [DOI] [PMC free article] [PubMed] [Google Scholar]
Knijnenburg J, van Haeringen A, Hansson KBM, Lankester A, Smit MJM, Belfroid RDM, Bakker E, Rosenberg C, Tanke HJ, Szuhai K. 2007. Ring chromosome formation as a novel escape mechanism in patients with inverted duplication and terminal deletion. Eur J Hum Genet 15: 548–555. [DOI] [PubMed] [Google Scholar]
Koboldt DC, Steinberg KM, Larson DE, Wilson RK, Mardis ER. 2013. The next-generation sequencing revolution and its impact on genomics. Cell 155: 27–38. [DOI] [PMC free article] [PubMed] [Google Scholar]
Krumm N, Sudmant PH, Ko A, O'Roak BJ, Malig M, Coe BP, NHLBI Exome Sequencing Project, Quinlan AR, Nickerson DA, Eichler EE. 2012. Copy number variation detection and genotyping from exome sequence data. Genome Res 22: 1525–1532. [DOI] [PMC free article] [PubMed] [Google Scholar]
Laurie CC, Laurie CA, Rice K, Doheny KF, Zelnick LR, McHugh CP, Ling H, Hetrick KN, Pugh EW, Amos C, et al. 2012. Detectable clonal mosaicism from birth to old age and its relationship to cancer. Nat Genet 44: 642–650. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lee C, Iafrate AJ, Brothman AR. 2007. Copy number variations and clinical cytogenetic diagnosis of constitutional disorders. Nat Genet 39: S48–S54. [DOI] [PubMed] [Google Scholar]
Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25: 1754–1760. [DOI] [PMC free article] [PubMed] [Google Scholar]
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lonigro RJ, Grasso CS, Robinson DR, Jing X, Wu YM, Cao X, Quist MJ, Tomlins SA, Pienta KJ, Chinnaiyan AM. 2011. Detection of somatic copy number alterations in cancer using targeted exome capture sequencing. Neoplasia 13: 1019–1025. [DOI] [PMC free article] [PubMed] [Google Scholar]
Machiela MJ, Zhou W, Sampson JN, Dean MC, Jacobs KB, Black A, Brinton LA, Chang IS, Chen C, Chen C, et al. 2015. Characterization of large structural genetic mosaicism in human autosomes. Am J Hum Genet 96: 487–497. [DOI] [PMC free article] [PubMed] [Google Scholar]
Magi A, Tattini L, Cifola I, D'Aurizio R, Benelli M, Mangano E, Battaglia C, Bonora E, Kurg A, Seri M, et al. 2013. EXCAVATOR: detecting copy number variants from whole-exome sequencing data. Genome Biol 14: R120. [DOI] [PMC free article] [PubMed] [Google Scholar]
Meynert AM, Ansari M, FitzPatrick DR, Taylor MS. 2014. Variant detection sensitivity and biases in whole genome and exome sequencing. BMC Bioinformatics 15: 247. [DOI] [PMC free article] [PubMed] [Google Scholar]
Miller DT, Adam MP, Aradhya S, Biesecker LG, Brothman AR, Carter NP, Church DM, Crolla JA, Eichler EE, Epstein CJ, et al. 2010. Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies. Am J Hum Genet 86: 749–764. [DOI] [PMC free article] [PubMed] [Google Scholar]
Narasimhan V, Danecek P, Scally A, Xue Y, Tyler-Smith C, Durbin R. 2016. BCFtools/RoH: a hidden Markov model approach for detecting autozygosity from next-generation sequencing data. Bioinformatics 32: 1749–1751. [DOI] [PMC free article] [PubMed] [Google Scholar]
Pham J, Shaw C, Pursley A, Hixson P, Sampath S, Roney E, Gambin T, Kang SH, Bi W, Lalani S, et al. 2014. Somatic mosaicism detected by exon-targeted, high-resolution aCGH in 10 362 consecutive cases. Eur J Hum Genet 22: 969–978. [DOI] [PMC free article] [PubMed] [Google Scholar]
Pique-Regi R, Monso-Varona J, Ortega A, Seeger RC, Triche TJ, Asgharzadeh S. 2008. Sparse representation and Bayesian detection of genome copy number alterations from microarray data. Bioinformatics 24: 309–318. [DOI] [PMC free article] [PubMed] [Google Scholar]
Plagnol V, Curtis J, Epstein M, Mok KY, Stebbings E, Grigoriadou S, Wood NW, Hambleton S, Burns SO, Thrasher AJ, et al. 2012. A robust model for read count data in exome sequencing experiments and implications for copy number variant calling. Bioinformatics 28: 2747–2754. [DOI] [PMC free article] [PubMed] [Google Scholar]
Rampášek L, Arbabi A, Brudno M. 2014. Probabilistic method for detecting copy number variation in a fetal genome using maternal plasma sequencing. Bioinformatics 30: i212–i218. [DOI] [PMC free article] [PubMed] [Google Scholar]
Robinson WP. 2000. Mechanisms leading to uniparental disomy and their clinical consequences. Bioessays 22: 452–459. [DOI] [PubMed] [Google Scholar]
Sathirapongsasuti JF, Lee H, Horst BAJ, Brunner G, Cochran AJ, Binder S, Quackenbush J, Nelson SF. 2011. Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV. Bioinformatics 27: 2648–2654. [DOI] [PMC free article] [PubMed] [Google Scholar]
Snape K, Hanks S, Ruark E, Barros-Núñez P, Elliott A, Murray A, Lane AH, Shannon N, Callier P, Chitayat D, et al. 2011. Mutations in CEP57 cause mosaic variegated aneuploidy syndrome. Nat Genet 43: 527–529. [DOI] [PMC free article] [PubMed] [Google Scholar]
Staaf J, Lindgren D, Vallon-Christersson J, Isaksson A, Göransson H, Juliusson G, Rosenquist R, Höglund M, Borg A, Ringnér M. 2008. Segmentation-based detection of allelic imbalance and loss-of-heterozygosity in cancer cells using whole genome SNP arrays. Genome Biol 9: R136. [DOI] [PMC free article] [PubMed] [Google Scholar]
Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JGR, Korf I, Lapp H, et al. 2002. The Bioperl toolkit: Perl modules for the life sciences. Genome Res 12: 1611–1618. [DOI] [PMC free article] [PubMed] [Google Scholar]
UK10K Consortium. 2015. The UK10K project identifies rare variants in health and disease. Nature 526: 82–90. [DOI] [PMC free article] [PubMed] [Google Scholar]
Woods CG, Bankier A, Curry J, Sheffield LJ, Slaney SF, Smith K, Voullaire L, Wellesley D. 1994. Asymmetry and skin pigmentary anomalies in chromosome mosaicism. J Med Genet 31: 694–701. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wright CF, Fitzgerald TW, Jones WD, Clayton S, McRae JF, van Kogelenberg M, King DA, Ambridge K, Barrett DM, Bayzetinova T, et al. 2014. Genetic diagnosis of developmental disorders in the DDD study: a scalable analysis of genome-wide research data. Lancet 385: 1305–1314. [DOI] [PMC free article] [PubMed] [Google Scholar]
Yamazawa K, Ogata T, Ferguson-Smith AC. 2010. Uniparental disomy and human disease: an overview. Am J Med Genet C Semin Med Genet 154C: 329–334. [DOI] [PubMed] [Google Scholar]