Highlights
• We provide an update on developments in mtDNA forensic genetics.
• We review the recent literature for the detection of heteroplasmy using next generation sequencing techniques.
• We highlight artefacts and problems in datasets.
• We discuss the implications for forensic practice.
Keywords: mtDNA, mtGenome, Heteroplasmy, Errors, Massively parallel sequencing
Abstract
Long an important and useful tool in forensic genetic investigations, mitochondrial DNA (mtDNA) typing continues to mature. Research in the last few years has demonstrated both that data from the entire molecule will have practical benefits in forensic DNA casework, and that massively parallel sequencing (MPS) methods will make full mitochondrial genome (mtGenome) sequencing of forensic specimens feasible and cost-effective. A spate of recent studies has employed these new technologies to assess intraindividual mtDNA variation. However, in several instances, contamination and other sources of mixed mtDNA data have been erroneously identified as heteroplasmy. Well vetted mtGenome datasets based on both Sanger and MPS sequences have found authentic point heteroplasmy in approximately 25% of individuals when minor component detection thresholds are in the range of 10–20%, along with positional distribution patterns in the coding region that differ from patterns of point heteroplasmy in the well-studied control region. A few recent studies that examined very low-level heteroplasmy are concordant with these observations when the data are examined at a common level of resolution. In this review we provide an overview of considerations related to the use of MPS technologies to detect mtDNA heteroplasmy. In addition, we examine published reports on point heteroplasmy to characterize features of the data that will assist in the evaluation of future mtGenome data developed by any typing method.
1. Introduction
The specific genetic markers and procedures employed for DNA testing of specimens in forensics currently depend primarily upon the quality and quantity of DNA present, and additionally on the known samples available for comparison. Though not a unique identifier, mitochondrial DNA (mtDNA) has long offered advantages for certain forensic genetic analyses. MtDNA is abundant relative to nuclear DNA in most human cells, with each cell containing hundreds to thousands of copies of the mitochondrial genome (mtGenome) [1]. For aged specimens in which the DNA may be highly fragmented and damaged [2,3], the high copy number of the mtGenome often means that mtDNA data can be reliably generated even when attempts to type nuclear DNA markers fail to produce a profile. MtDNA is also generally present in abundance in samples that may contain little or no intact nuclear DNA, such as hair shafts [4] and aged fingernails [5]. As a result, mtDNA has been the historical marker of choice for these sample types [6–10]. Additional benefits of mtDNA in forensic casework relate to the inheritance of the molecule, which permits the use of maternal relatives as references for unknown samples [6,11]. This is extremely valuable in a number of scenarios, but particularly when the direct references or close relatives required for kinship analyses based on autosomal markers are unavailable.
Standard forensic comparisons employing mtDNA are typically straightforward. Interpretation of the evidence generally entails a direct comparison of the sequences obtained from the samples of questioned origin to the sequences of known origin. When the mtDNA sequences of both questioned (e.g. crime scene) and known (e.g. suspect) samples are consistent across all positions considered for interpretation, the samples cannot be excluded as originating from the same source or same lineage [12]. Conversely, non-matching mtDNA sequences between a questioned and known sample (an exclusion) also provide useful information in a forensic context [11]. Both the International Society for Forensic Genetics (ISFG) and the U.S. Scientific Working Group on DNA Analysis Methods (SWGDAM) have published detailed guidelines on mtDNA sequence data development in the laboratory as well as on data analysis and interpretation [12–15].
Though routine mtDNA testing tends to be relatively straightforward, there are particular scenarios that introduce additional complexity from the standpoint of interpretation. The most common of these involves the observation of heteroplasmy. Heteroplasmy refers to the presence of more than one mtDNA haplotype within a single individual or tissue. Individuals may possess mtDNA molecules that differ in their length (length heteroplasmy; LHP), or at single nucleotide positions (point heteroplasmy; PHP). Heteroplasmic variation of either type is not used for exclusionary purposes in forensic match comparisons of mtDNA profiles, owing to both (1) the high mutation rate of mtDNA and the germ-line bottleneck in oogenesis, which can result in complete homoplasmic shifts between generations [16,17] and (2) the variation that has been observed between different tissue types [18–24]. However, shared PHP between maternal relatives can provide further support for non-exclusion and, indeed, has proven to increase the strength of the mtDNA evidence in a case of historical significance [25].
The average mutation rate of mtDNA exceeds that of nuclear DNA by at least an order of magnitude according to nucleotide substitution rates reported in phylogenetic studies [26–29]. However, the mutation rate estimated from pedigree studies is substantially higher, most likely due to the fine-scale picture of the mutation process provided at that level, before the changes can be masked by the longer-term effects of homoplasy and selection [30–32]. Both germline and somatic mtDNA mutations occur with relative frequency, and are often observed in mtDNA profiles as heteroplasmy.
Heteroplasmy is generally regarded as being the exception rather than the rule. Nevertheless, it has been shown to occur with appreciable frequency in the general population [19,33] and must be considered in the interpretation of forensic evidence. To date, studies based on Sanger technology have formed the basis of our understanding and led to the development of appropriate interpretation guidelines to accommodate it [14,15]. However, the sensitivity and throughput of newer massively parallel sequencing (MPS) technologies stand to further refine our knowledge of heteroplasmy.
As heteroplasmy generally presents in mtDNA sequence data as a mixture (see Fig. S1 for an example from Sanger data), distinguishing authentic mtDNA heteroplasmy (i.e. intra-individual variation) from other causes of mixed data that may result in the appearance of heteroplasmy adds a level of complexity to data interpretation regardless of whether the data are Sanger-based or MPS-based. Mixtures of mtDNA from distinct individuals, contamination by nuclear mitochondrial pseudogene (NUMT) sequences, and chemistry-based sequencing errors all have contributed to problems in the detection and reporting of homoplasmic mutations in past Sanger-based mtDNA reference population datasets [34–36]. It stands to reason that these issues have the potential to impact heteroplasmy detection in haplotypes developed by MPS techniques as well. Given the recent explosion of studies exploring the inheritance, pattern and incidence of mtDNA heteroplasmy based on MPS data [21,22,24,37–46], along with our expectation that the detection and treatment of heteroplasmy will be one of the key areas in which MPS influences current forensic mtDNA testing practices, in this paper we discuss the detection and authentication of mtDNA heteroplasmy in light of these technological advances. In addition, we review mtGenome heteroplasmy rates reported in Sanger and MPS-based studies to provide a baseline understanding for both future MPS-based studies and mtDNA casework application.
2. Detection of mtDNA heteroplasmy using MPS techniques
The past ten years have seen a dramatic advance in the methods, chemistries and detection platforms available for DNA data generation. These massively parallel technologies are rapidly replacing more traditional methods of DNA sequencing and typing; and although Sanger sequencing is still employed, it has largely become a complementary rather than standalone technology in many disciplines. MPS methods produce large volumes of sequence data at extremely low cost relative to Sanger sequencing, and over the past decade the technologies themselves, as well as their applications, have evolved quickly. MPS has revolutionized most fields of genetics and is now routinely applied to various questions in medical genetics (e.g. personalized medicine and genome-wide association studies), evolutionary biology, molecular anthropology, epidemiology, and metagenomics [47–49]. For many of these applications, MPS is being used to produce sequence data covering thousands of loci, or even entire organismal genomes in a single sequencing run. For mtDNA sequencing in the forensic context, the high throughput capacity of MPS can be harnessed to develop mtDNA data at high depths of sequence coverage for tens or hundreds of individuals. Indeed, several studies have demonstrated the clear utility of MPS for mtDNA sequencing [50–52], with the throughput and sensitivity of the technology resulting in far more efficient and cost-effective data production than can be achieved via Sanger technology.
With regard to the identification of heteroplasmy, the most substantial difference between MPS and Sanger-type sequencing is the overall sensitivity of the detection methods. With MPS, the parallel sequencing of, and subsequent detection from, individual source DNA templates can permit the discovery of very low frequency molecules (<5%). Such authentic low-level sequence variants are often imperceptible in Sanger-based capillary electrophoresis (CE) trace data, which essentially reflect mtDNA consensus sequences, and where the limit of detection is typically described as being approximately 10–20%. For example, the current GEDNAP (www.gednap.org) proficiency test program expects participants to detect and report PHP in Sanger-based data when the minor component exceeds 20% of the major component based on visual estimation of peak heights (C. Hohoff, personal communication), and a recent study detected artificially-mixed mtDNA sequences at approximately 10% and greater [24]. Experiments in the authors’ respective laboratories have demonstrated that heteroplasmic variants representing fewer than 10% of the mtDNA molecules or PCR products can be detected from CE data (H. Niederstätter, unpublished; and M. Peck, unpublished). However, detection of authentic heteroplasmy at these lower levels has been position and/or mtGenome region-specific rather than universal, and also highly dependent upon the extent of sequence background/noise. These caveats apply to the detection of any heteroplasmy in electrophoretic data, whether the minor variant represents more or less than 10% of the mtDNA molecules – and, it is important to note, likely also apply to MPS-based data.
This may be particularly true in the case of poor quality or very low DNA quantity forensic specimens, as well as in regions of the mtGenome that are more challenging to sequence (for instance, due to secondary structure or the local sequence environment) and thus exhibit strand bias and/or reduced coverage compared to the rest of the molecule [50–55]. Additionally, both the random and non-random sequencing errors that are more apparent in MPS reads than consensus Sanger trace data will likely influence the threshold at which authentic low-level heteroplasmy can be distinguished from noise [55,56].
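To make the notion of a detection threshold concrete, the following minimal sketch computes the minor variant frequency at a single position from per-base read counts and compares it to a chosen threshold. The function names and the example counts are illustrative assumptions and are not drawn from any of the cited studies; the sketch also ignores the sequencing error, strand bias and coverage effects discussed above, which in practice determine how low a threshold can reasonably be set.

```python
def minor_variant_frequency(base_counts):
    """Return (major_base, minor_base, frequency) for one position.

    base_counts maps a base ('A', 'C', 'G', 'T') to its read count.
    The frequency is the second most common base count divided by total depth.
    """
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("no reads cover this position")
    # Rank bases from most to least observed.
    ranked = sorted(base_counts.items(), key=lambda kv: kv[1], reverse=True)
    major_base = ranked[0][0]
    if len(ranked) < 2 or ranked[1][1] == 0:
        return major_base, None, 0.0
    minor_base, minor_count = ranked[1]
    return major_base, minor_base, minor_count / total


def exceeds_threshold(base_counts, threshold):
    """True when the minor variant frequency is at or above the threshold."""
    return minor_variant_frequency(base_counts)[2] >= threshold


# Hypothetical position with 10,000 reads, 600 of which carry the minor base.
example = {"T": 9400, "C": 600}
called_at_10_percent = exceeds_threshold(example, 0.10)  # minor variant is 6%
called_at_5_percent = exceeds_threshold(example, 0.05)
```

In this hypothetical case the same position would go unreported at a Sanger-like 10% threshold and be reported at a 5% threshold, which is why datasets are only comparable once a common threshold is applied.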
MPS library preparation strategies based on capture of mtDNA templates from a genomic DNA extract ([57,58], for example), otherwise known as hybridization enrichment, can recover the very short mtDNA fragments that are difficult or impossible to sequence using targeted PCR-based approaches. This technique has been used in the ancient DNA community to sequence exceptionally old specimens via MPS [57,59–61], and holds promise for some forensic applications [62,63]. However, the increased sensitivity of this sample preparation method for highly degraded specimens comes with a cost: homologous NUMTs are also more easily captured (as compared to a long-range PCR approach [64]), and this may confound heteroplasmy detection. MPS reads developed from highly degraded DNA are also, of course, very short. As a result, it may be more difficult to determine the phasing of reads and thus the overall haplotype of a low-level sequence – something that could otherwise assist in the identification of NUMT background. Though NUMT amplification does not pose a significant challenge with current mtDNA typing processes as applied to the CR even when extremely short amplicons are employed, the results of a few studies [64–66] suggest that NUMT detection should be considered in a MPS framework. Fortunately, and as with current Sanger sequence data, the identification of NUMTs as the source of very low-level mixed mtDNA data can almost certainly be addressed with laboratory-based, bioinformatic, and other data handling and review methods.
To date, the detection of mixed positions that may represent heteroplasmy in Sanger-based data has primarily relied upon repeated visual examination of properly aligned electropherogram traces by experienced examiners. This approach to heteroplasmy identification is clearly unviable with MPS data, due to both average sequence read depths that may range into the thousands or tens of thousands, and the difficulty (or impossibility) of manual manipulation of mapped sequence reads in most software programs designed for MPS data handling. As a result, the detection of mixed positions in MPS data will need to rely on bioinformatic solutions.
A related but distinct issue is the determination or authentication of heteroplasmy as the cause of the mixed mtDNA data observed at any given nucleotide position – and in this respect the treatment of Sanger and MPS data may be quite similar. With Sanger-based data, heteroplasmy may be confirmed as the source of mixed mtDNA data by the consistent application of a set of criteria – which in past studies has included conditions such as observation across PCR replicates, observation of the variant nucleotide in both sequencing directions and in excess of any noise/background in the sequence data, and the use of STR typing to exclude contamination as the cause of multiple mixed positions. In addition, but to a lesser extent due to the rarity of the issue in Sanger-based data, PCR strategies designed to avoid NUMT amplification or comparison of any mixed positions to known NUMTs, along with confirmation via an alternative typing method, have also been employed [19,23,67,68]. Thus, heteroplasmies are called in Sanger sequence data when other potential causes of mtDNA mixtures have been excluded as reasonable possibilities. A similar approach for authenticating heteroplasmy will be required with MPS. It will be a matter of (1) recognizing the potential causes of mtDNA mixtures aside from heteroplasmy and their potential for impact on the data, and (2) developing and applying criteria that address these causes and reduce the false positive heteroplasmy designation rate to an acceptable level. At its core this approach is no different from what is presently applied to Sanger data. However, it will likely require greater formalization given the more quantitative nature, and sheer volume, of MPS data.
Several of the MPS-based mtDNA publications to date have developed laboratory processing and/or data evaluation strategies to eliminate potential causes of mixed mtDNA data and thereby authenticate heteroplasmy. These measures include use of a long-range PCR strategy or primers designed to avoid NUMT amplification; data filtering by quality scores; “double-strand confirmation” (observation of the variant nucleotide in sequences from both directions); a minimum number of variant nucleotide observations; strand balance thresholds; frequency screening in population data; comparison to NUMT compendiums or alignment against the human reference genome hg19 to filter NUMT sequences; and confirmation by multiple typing methods or distinct MPS technologies [21–24,37,38,40–42,45,46,50,52,56,64,69,70]. Yet as the examinations detailed in Section 3 illustrate, in some instances the measures applied have been insufficient to flag inauthentic heteroplasmy.
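Three of the read-level criteria listed above (a minimum number of variant observations, double-strand confirmation, and a strand balance threshold) can be sketched as a simple filter. This is an illustration only: the class, the function and all default threshold values are hypothetical choices made for the example, not values recommended by the cited studies, and a real pipeline would combine such a filter with quality-score filtering, NUMT screening and contamination checks.

```python
from dataclasses import dataclass


@dataclass
class VariantObservation:
    fwd_variant: int  # variant-supporting reads on the forward strand
    rev_variant: int  # variant-supporting reads on the reverse strand
    fwd_total: int    # all reads covering the position, forward strand
    rev_total: int    # all reads covering the position, reverse strand


def passes_filters(obs, min_maf=0.02, min_variant_reads=10, min_strand_ratio=0.2):
    """Apply illustrative authentication criteria to one candidate PHP.

    All default thresholds are assumptions chosen for the example.
    """
    variant = obs.fwd_variant + obs.rev_variant
    depth = obs.fwd_total + obs.rev_total
    # Minimum number of variant observations.
    if depth == 0 or variant < min_variant_reads:
        return False
    # Minor variant frequency threshold.
    if variant / depth < min_maf:
        return False
    # Double-strand confirmation: the variant must be seen in both directions.
    if obs.fwd_variant == 0 or obs.rev_variant == 0:
        return False
    # Strand balance: ratio of the smaller to the larger strand-specific count.
    balance = min(obs.fwd_variant, obs.rev_variant) / max(obs.fwd_variant, obs.rev_variant)
    return balance >= min_strand_ratio


balanced = VariantObservation(60, 55, 2000, 2100)
one_strand_only = VariantObservation(120, 0, 2000, 2100)
```

Under these assumed thresholds, `balanced` passes while `one_strand_only` fails double-strand confirmation despite having more variant reads overall. Note that none of these read-level checks can detect a mixture of two authentic mtDNA haplotypes, which is the failure mode examined in Section 3.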
3. Incidence and pattern of human mtDNA heteroplasmy in Sanger and MPS-based studies
Examinations of population-based mtDNA datasets developed via Sanger sequencing have identified PHP in the CR in approximately 6% of individuals when buccal and blood cells were examined [19], and in up to 88% of individuals when PHPs across several tissue types from the same individual were combined [23]. The variable rates observed in different studies are primarily accounted for by differences in the (1) tissue types, and (2) populations examined. The incidence of PHP in tissues with high metabolic activity (e.g. 79% in muscle [23]) has been generally higher than in blood or blood-derived specimens (4–8% [23,67]), though a recent study that applied more sensitive detection criteria found PHP in 18% of the investigated individuals [23]. In general, the CR nucleotide positions at which PHP has been most frequently observed are consistent with the positions with the highest substitution rates [19]. There are a few notable exceptions, however. In particular, PHP at position 16,093 has often been observed at rates more than double any other CR position, despite a substitution rate that is considerably lower than other variable sites in the CR [7,23,71–73].
Though the incidence of LHP in the CR also varies by population and sample type, the picture derived from examinations of homopolymer regions of the CR across numerous studies is consistent: LHP is observed always or nearly always when tracts of nine or more identical nucleotides are present, sometimes with tracts of eight identical nucleotides, and very rarely with tracts of seven or fewer identical nucleotides (e.g. [74]). In the AC repeat region ending at position 524, LHP has been detected when the majority molecule had as few as five repeats, but more commonly when six or more repeats are present (www.empop.org [75]).
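The tract-length pattern described above lends itself to a simple screen of a consensus sequence for homopolymer runs. The sketch below is a hedged illustration: the function names, the toy sequence and the wording of the categories are assumptions made for the example, and the reported offsets are 0-based string indices rather than rCRS coordinates.

```python
from itertools import groupby


def homopolymer_tracts(seq, min_len=7):
    """Return (start_offset, base, length) for runs of identical bases."""
    tracts = []
    offset = 0
    for base, run in groupby(seq.upper()):
        length = len(list(run))
        if length >= min_len:
            tracts.append((offset, base, length))
        offset += length
    return tracts


def lhp_expectation(length):
    """Map a tract length to the qualitative LHP pattern described in the text."""
    if length >= 9:
        return "always or nearly always"
    if length == 8:
        return "sometimes"
    return "very rarely"


# Toy sequence (not real mtDNA): a 9-base C tract and an 8-base G tract.
toy = "AT" + "C" * 9 + "T" + "G" * 8 + "A"
for start, base, length in homopolymer_tracts(toy, min_len=8):
    print(start, base, length, lhp_expectation(length))
```

Such a screen only indicates where LHP might be anticipated; whether LHP is actually present still has to be established from the reads or traces themselves.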
Despite the fact that some 20,000 complete human mtGenomes are now publicly available, it is only within the past few years – and concurrent with the expanding use of MPS technologies for mtGenome sequencing – that detailed reports on heteroplasmy in the mtDNA coding region have begun to emerge. The few Sanger and MPS-based population datasets that have assumed or applied heteroplasmy detection thresholds ranging from approximately 10% to 20% have produced remarkably similar pictures of PHP across the full mtGenome (Fig. 1 and Table S1). At this level of detection, roughly 25% of individuals have at least one PHP, and no more than a handful of PHPs are observed in any one individual [38,52,67,68]. In addition, and similar to the CR, variation in PHP rates by population was also observed [68]. When the PHP data from these studies were considered in combination, they suggested that coding region PHP hotspots may be rare or nonexistent, and that patterns of PHP and substitution in the coding region may not be correlated [68] as is generally the case in the CR [19].
A number of MPS-based studies of heteroplasmy across the full mtGenome, most of them based on a substantially lower heteroplasmy detection threshold, have reported higher rates and/or different patterns of PHP than those observed in the above-referenced datasets. The first high-profile report on heteroplasmy in MPS-based mtGenome data was published in Nature in 2010 by He et al. [37]. The authors described universal presence of heteroplasmy in normal colon tissue from ten individuals when using a <2% detection threshold, with an average of four PHP per individual. In a critique of that study, Bandelt and Salas [76] found an average of five errors in the homoplasmic variants reported for those specimens – indicating some general issues with the He et al. data. Those authors also found inconsistencies in the rate and pattern of the He et al. CR heteroplasmies in comparison to the comprehensive Sanger-based study by Irwin et al. [19]. Other authors have suggested the He et al. data may not be fully reliable due to Illumina chemistry-based sequencing errors mistaken as heteroplasmy, or the use of lymphoid cell lines [45,46].
A more recent paper [39] reported a very high number of PHP in some of the 40 control DNA samples examined, along with an incidence of 65% PHP in the samples overall. These observations were based on the application of a 10% minor variant (or “minor allele frequency”, MAF) threshold for detection – the same threshold used in four other studies to establish the presence of heteroplasmy in only 25% of individuals tested. However, when the positions at which PHP was observed are examined further, it is clear that some of the mixed bases are the result of mixture between distinct mtDNA haplotypes rather than authentic heteroplasmy. For example, of the ten PHP reported for sample NA121245, eight are diagnostic for haplogroup K1b1a1 (according to Build 16 of PhyloTree [77]).
Similar problems were detected in a recent high-profile report on heteroplasmy by Ye et al. [43], which examined mtDNA data from the 1000 Genomes Project (http://www.1000genomes.org). As we [68,78] and other authors [44] have described, the presence of distinct mtDNA haplogroup motifs among the heteroplasmies listed for a number of the samples indicates that at least some of the reported PHP is contamination rather than intraindividual variation. In a reexamination of their original data, Ye et al. identified 63 samples in which mixtures may be present and removed these from their dataset [79]. While these may represent “only” 5.8% of the original sample set of 1085, the discarded samples account for more than 22% of the heteroplasmy initially reported by Ye et al. Further, they comprise all of the samples for which the authors originally described more than 15 PHP per individual. The revised Ye et al. dataset still appears to be unusual in a few respects, especially in light of other reports on mtDNA heteroplasmy that employed various detection thresholds (see Fig. 1 and Table S1). For example, the data still reflect an unusually high percentage of individuals with heteroplasmy (even when a 10% PHP filter is applied to the data), a low percentage of coding region PHPs that are unique, a high frequency of uncommon coding region variants (for example, 29 instances of PHP at position 3579) which suggests that sequencing error has likely been mistaken as heteroplasmy, and the persistence of distinct haplogroup motifs among the PHP in at least a few of the non-excluded samples.
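The haplogroup motif check applied in these reexaminations can be sketched as a set comparison between the minor variants reported as PHP in a sample and the diagnostic variants of known haplogroups. The code below is a minimal illustration under stated assumptions: the function names and the `min_shared` cut-off are hypothetical, and the motifs are invented placeholders rather than real PhyloTree definitions, which would have to be loaded from an appropriate phylogenetic resource.

```python
def motif_matches(php_variants, motifs):
    """Count, per haplogroup, how many reported PHP variants are motif variants.

    php_variants: set of (position, base) minor variants reported in one sample.
    motifs: dict mapping haplogroup name -> set of (position, base) variants.
    """
    hits = {}
    for haplogroup, motif in motifs.items():
        shared = php_variants & motif
        if shared:
            hits[haplogroup] = len(shared)
    return hits


def flag_possible_mixture(php_variants, motifs, min_shared=3):
    """Return haplogroups sharing at least min_shared variants with the PHP list."""
    hits = motif_matches(php_variants, motifs)
    return {hg: n for hg, n in hits.items() if n >= min_shared}


# Placeholder motifs (NOT real haplogroup definitions).
toy_motifs = {
    "HG_X": {(100, "T"), (200, "C"), (300, "A"), (400, "G")},
    "HG_Y": {(150, "G"), (250, "A")},
}
sample_php = {(100, "T"), (200, "C"), (300, "A"), (999, "G")}
flags = flag_possible_mixture(sample_php, toy_motifs)
```

A sample whose reported heteroplasmies largely reproduce a single haplogroup motif, as `sample_php` does for the placeholder `HG_X`, is more plausibly a mixture of two haplotypes than a carrier of several independent heteroplasmies, and would warrant further review before being reported.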
A separate study of mtDNA heteroplasmy published in 2014 by Diroma et al. also utilized data from the 1000 Genomes Project [42]. After filtering for NUMT-like sequences and PCR duplicates, the authors reported average mtGenome sequence coverage ranging from approximately 50 to 100 (variable by population), and applied an approximately 1% MAF detection threshold for at least some mtGenome positions. The study reported an average of 7.75 heteroplasmies (PHP and LHP inclusive) per sample across more than 1200 individuals, with a maximum number of PHP described for a single individual greater than 80. Examination of the position-specific incidence of PHP in the CR and the coding region reported in this study (obtained from Additional File 9 in [42], and displayed in part in Table S2) reveals some highly unusual patterns. For instance, 21 positions, including multiple coding region positions, have a higher frequency of PHP than position 16,093 – which past studies have repeatedly demonstrated has the highest rate of PHP in the CR [10,18,19,68,71,80]. As an extreme example, the number of PHP reported for position 513 was more than 8-fold greater than the incidence of PHP at 16,093; yet in an earlier, Sanger-based study of heteroplasmy in more than 5000 CR haplotypes, not a single instance of PHP at position 513 was detected [19]. Similarly, the Irwin et al. examination found zero instances of PHP at position 247, but Diroma et al. report PHP at 247 at twice the rate of 16,093. Further, among the 26 positions with the highest incidence of PHP in the Diroma et al. dataset, 18 exactly match the sequence motif for macrohaplogroup N; and several other complete haplogroup motifs can be seen among the PHP with lower reported frequencies. These observations clearly point to widespread problems in the dataset, which in the above examples are most likely the result of both mixtures/contamination and sequence alignment issues.
In sharp contrast to these problematic reports, a recent study that developed extremely high depth Illumina sequence coverage (20,000×) and a custom data analysis pipeline to examine the transmission of heteroplasmy in mother–child pairs found a substantially smaller maximum number of heteroplasmies per individual [24]. With some PHP detected at MAFs as low as 0.1% – 10-fold lower than the MAF threshold applied in the Ye et al. [43] and Diroma et al. [42] examinations – an average of 78% of individuals possessed heteroplasmy in the Rebolledo-Jaramillo et al. study, and the maximum number of PHP observed in any single sample was seven [24]. When we applied a 1% MAF cut-off to the data, an average of 63% of individuals had at least one PHP, and the maximum number of heteroplasmies per sample was five (Fig. 1 and Table S1). When a 10% MAF threshold was applied, an average of 24% of individuals possessed a PHP, and the maximum number of PHP per sample was two – a result that is strikingly consistent with previous Sanger sequencing studies in which heteroplasmy detection thresholds were roughly 10% or greater. Similarly, the recent report on mtGenome heteroplasmy from Skonieczna et al. found a maximum of just two PHP within a single individual. Overall, the authors found heteroplasmy in 32% of 50 individuals, despite including a number of heteroplasmies observed at a variant frequency of just 2% [44].
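Re-examining a dataset at a common level of resolution, as done above for the 0.1%, 1% and 10% thresholds, amounts to recounting PHP per individual after discarding variants below the chosen MAF. The following sketch shows one way this could be done; the function name and the toy frequencies are assumptions for illustration and do not reproduce any published dataset.

```python
def summarize_at_threshold(php_mafs_by_individual, threshold):
    """Return (fraction of individuals with >= 1 PHP, maximum PHP per individual).

    php_mafs_by_individual maps an individual ID to a list of minor variant
    frequencies (as fractions), one per detected PHP.
    """
    if not php_mafs_by_individual:
        raise ValueError("no individuals supplied")
    counts = {
        individual: sum(1 for maf in mafs if maf >= threshold)
        for individual, mafs in php_mafs_by_individual.items()
    }
    with_php = sum(1 for n in counts.values() if n > 0)
    return with_php / len(counts), max(counts.values())


# Toy data for four hypothetical individuals.
toy = {
    "A": [0.002, 0.015, 0.12],
    "B": [0.004],
    "C": [],
    "D": [0.03, 0.25, 0.11],
}
for threshold in (0.001, 0.01, 0.10):
    print(threshold, summarize_at_threshold(toy, threshold))
```

Reporting both summary values at several thresholds makes it straightforward to compare a new MPS dataset against the benchmarks in Fig. 1 and Table S1 at whichever resolution the earlier studies used.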
Though the heteroplasmy data from these two recent studies are based on a smaller sample size (39 mtDNA lineages and 50 individuals, respectively) than other MPS-based examinations, it is notable that the only positions at which PHP was observed as shared (detected in more than one lineage) in either study were in the CR, and that the 74 total coding region positions at which PHP was detected across the two studies do not overlap with any of the heteroplasmic positions captured from more than 1100 complete mtGenomes in earlier reliable reports (Li et al. [38], Ramos et al. [67], King et al. [52], and Just et al. [68] datasets; see Table S3). That is, when the coding region PHP data from these six studies are combined, 261 of 279 coding region PHP (93.5%) are unique across the approximately 1200 total mtGenomes, and no single coding region PHP is observed in more than two individuals. The data from these two studies are thus consistent with earlier reports in these key respects, and they further support the notion that coding region PHP hotspots (as exist in the CR) may be rare or nonexistent.
Yet another very recent and in-depth examination of mtDNA heteroplasmy, based on high coverage depth MPS data and a low PHP detection threshold, also appears to be consistent with these findings. Earlier this year, Li et al. [46] published an analysis of complete mtGenomes from 12 tissue types from 152 total individuals. Across all tissue types and individuals, PHPs present at a MAF of 0.5% or greater were observed at 179 different coding region positions. When the heteroplasmy data for a single tissue type (blood) were assessed (with individual 272 removed due to probable mixture given multiple positions specific to mtDNA haplogroup H3g1b [77]), PHPs present at a MAF of 0.5% or greater were found in 80.6% of individuals, 90% of the coding region PHP were unique to a single individual, and the maximum number of PHP found in a single individual was eight. With the application of a 10% detection threshold, 22.3% of individuals possessed at least one PHP, and no individual had more than two PHP. As Fig. 1 and Table S1 demonstrate, all of these values are highly similar to other seemingly reliable reports on mtDNA heteroplasmy.
The examination of heteroplasmy in new MPS-based reports in comparison to consensus observations from the previous reliable reports detailed above highlights an important and useful strategy for assessing mtGenome data developed using new technologies. MtDNA data that have been carefully generated, comprehensively examined with respect to well-known sources of error, and shown to be of high quality can and should be used to develop expectations regarding the prevalence and pattern of mtGenome variation. Such datasets can be used as benchmarks against which new datasets can be evaluated. This is especially important when considering very low-level heteroplasmy detection in MPS-based datasets, since comparative data from methodologies that have been fully validated for forensic use (i.e. Sanger sequencing) will be unavailable. When new MPS-based mtDNA data do not exhibit any obvious signs of errors (for instance, haplogroup-specific motifs or high rates of heteroplasmy at unusual positions) and are consistent with the expected patterns of variation at a common level of resolution (e.g. 10%), there can be greater confidence that the sequence variants detected at lower levels are authentic. Conversely, when the data appear to differ substantially from the expected patterns, more thorough investigation of the potential cause(s) for the discrepancies would be prudent.
The coding region PHPs reported in the recent Li et al. study include one such feature that is anomalous in comparison to prior reports. Among the 192 total coding region PHP the authors detected at MAFs ≥ 0.5% across all 12 tissue types (again excluding individual 272), PHP at position 12,684 was found in four individuals, and PHP at position 12,705 was identified in six individuals (see Dataset S1 in [46]). For these two positions the heteroplasmy was always evident in the blood specimens at relatively low MAFs (2.1–4.2%) with the rCRS variant as the majority type, and in all but one individual was found in at least one additional tissue type at very low MAF (0.5–1.0%). No PHP at these two positions was detected in the earlier datasets captured in Table S3, including the Rebolledo-Jaramillo et al. data that were developed from blood specimens analyzed using a 0.1% MAF [24], and both positions have low relative substitution rates according to [73]. Yet PHP at these two positions co-occurred in three individuals in the Li et al. study, in a single tissue type (blood) at a rate greater than would be expected by chance [46]. When we performed a blastn search of GenBank using a text string that included the minor sequence variant at both positions (ATTCTTCAAATATCTACTCATT), three of the 20 fully matching human results represented a NUMT on Chromosome 13. But several of the other 100% matches were to mtGenomes published in peer-reviewed studies, and both 12,684 and 12,705 are third codon positions (in the ND5 gene). Thus, though the multiple PHPs at these two positions appear to be quite unusual and are worth noting as such, it is difficult to conclusively determine whether they represent authentic intraindividual mtDNA variation or not.
When the coding region PHP from the recent Li et al. examination are combined with data from earlier reports, a greater number of PHP are observed in more than one individual, as would be expected as the total number of mtGenomes examined using low MAF thresholds and the number of PHP detected increase (compare Table S4 to Table S3). Yet still only 26 of the 437 distinct heteroplasmic positions found among the 1343 total individuals/lineages (derived from 3073 complete mtGenome sequences) were observed more than once (Table S4). When the PHP at positions 12,684 and 12,705 from Li et al. are ignored, 89.2% of the PHP were unique to a single individual, PHP at 22 positions was observed in two individuals, and just two PHP were observed in three individuals. A linear regression analysis of the combined data indicates that little of the total variance in the incidence of coding region PHP is explained by the relative substitution rates of coding region positions (r² = 0.0449; Fig. S2), and the 22 PHPs detected in two distinct individuals were found at positions that range from highly variant to invariant among the 2000+ mtGenomes examined by Soares et al. [73]. However, the two coding region positions at which PHP was observed in three individuals (positions 709 and 12,007) both have high relative substitution rates, and the proportion of PHP observed at positions variant (versus invariant) in the Soares et al. analysis is greater than would be expected by chance (two-tailed p < 0.0001). This constellation of findings is both sensible and consistent with recent analyses of heteroplasmy that seem to demonstrate a degree of functional constraint imposed on random mutation in the coding region that is short of the more extensive selection detected when complete substitutions are examined [24,46,67,68].
Additional reliable PHP data (and, ideally, a refined estimation of position-specific rates) will be needed to clarify how the likelihood of observing heteroplasmy relates to mutation and substitution rates in the coding region, and to determine whether any true heteroplasmic hotspots exist in the coding region.
LHP has been largely ignored in MPS-based mtDNA examinations to date, with the exception of two studies that developed extremely high sequence coverage depths [24,81], and one other study that reported numerous erroneous LHPs [42]. Although LHP is not routinely used in forensic match comparisons of Sanger-based CR haplotypes [14,15], accurate sequence alignment in LHP regions will likely still be desirable as the field transitions to MPS for routine mtDNA data generation. Particular regions in which LHP is encountered contain phylogenetically relevant information (for example, variation at position 16,189 and in the 9 base pair coding region repeat), and thus accurate and consistent reflection of the template in these regions will be important. In addition, improper alignment of sequence reads when LHP is present can lead to erroneous haplotypes due to miscalled indels, complete substitutions, and heteroplasmy. This issue has previously resulted in phantom mutations in Sanger-based mtDNA data [82], and appears to be the cause of some specious heteroplasmies (both point and length) described in one of the few MPS-based studies that has attempted to report LHP [42].
Regardless of the MPS chemistry or platform employed, sequence data in homopolymer regions are likely to continue to be challenging to assemble and align when LHP is present. If sequence reads are mapped with respect to a reference sequence (such as the rCRS [83]), the particular reads that are mapped or discarded will depend upon the underlying algorithm of the chosen assembler/software program as well as the stringency of the base mismatch and length similarity parameters applied (assuming these latter two features can be adjusted with the selected assembler). Clearly, the number and complexity of the variant molecules in cases of LHP may affect the difficulty of alignment and the accuracy with which the authentic template sequences are detected, due to reference bias [84–86]. That is, in cases of simple insertions of one or more nucleotides, assembly of the reads against the rCRS may be fairly uncomplicated, as the template sequences will not differ greatly from the reference sequence; but in situations in which the mtDNA molecules in the mixture are more dissimilar to the reference sequence, the reads that align will be biased toward those most similar to the reference. Additionally, a higher rate of non-random sequencing error in GC-rich regions may result in sequences with low quality scores that will be filtered out during secondary data processing [55]. This could further complicate data interpretation in C-tracts in particular.
In considering how assembly in mtGenome regions prone to LHP (as well as simple indels) may be addressed, one common-sense step would be to use only those sequence reads which span the entire homopolymeric or repeat region (i.e. reads which include portions of the 3′ and 5′ flanking sequence) to determine the template sequence(s) for the region – as sequence reads which end within the tract will not provide useful information, and indeed may confound interpretation [45]. In addition, an approach that utilizes multiple reference sequences representing known variation in the region could potentially be useful in some instances, both to capture all sequence reads for the region and to avoid reference bias. For example, indels of the repeated 9 bp motif at positions 8272–8289 are structurally similar to nuclear short tandem repeats (STRs). In this case, both (1) LHP of the 9 bp repeat itself, and (2) LHP within the 9 bp repeat region (as was observed in mtGenomes reported in [68]) may prove challenging to align to the rCRS and accordingly suffer from reference bias. One strategy that has been used to develop accurate allele calls for MPS-based STR data was the development of reference sequences for known alleles for each STR marker, and subsequent mapping of the MPS data against the set of references for each marker using stringent sequence similarity requirements (to ensure assembly to the correct reference sequence among multiple similar references) [87]. With this approach it seems that length variants in the 9 bp repeat region could be fully captured, accurately detected, and in cases of LHP, approximately quantified.
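The multiple-reference idea, combined with the requirement that reads span the whole region, can be sketched as follows. This is a minimal Python illustration and not the method used in [87]: the flanking sequences are hypothetical placeholders, the 9 bp motif is written as CCCCCTCTA as an assumption that should be verified against the rCRS, exact substring matching stands in for "stringent sequence similarity", and the function names are our own.

```python
from collections import Counter

# Hypothetical flanking sequences for illustration only; real flanks would be
# taken from the rCRS on either side of positions 8272-8289.
LEFT_FLANK = "GATTACAGGT"
RIGHT_FLANK = "TGCAACGTTA"
MOTIF = "CCCCCTCTA"  # assumed 9 bp motif; verify against the rCRS before use


def build_references(max_repeats=3):
    """Return {repeat_count: flank + motif * n + flank} for n = 1..max_repeats."""
    return {n: LEFT_FLANK + MOTIF * n + RIGHT_FLANK for n in range(1, max_repeats + 1)}


def assign_read(read, references):
    """Assign a read to a repeat-count allele, or return None.

    Stringent rule: the read must contain exactly one full reference (both
    flanks plus the repeat block), so reads ending inside the tract are rejected.
    """
    hits = [n for n, ref in references.items() if ref in read]
    return hits[0] if len(hits) == 1 else None


def quantify_alleles(reads, references):
    """Count assigned reads per allele and return (counts, fractions)."""
    counts = Counter()
    for read in reads:
        allele = assign_read(read, references)
        if allele is not None:
            counts[allele] += 1
    total = sum(counts.values())
    fractions = {n: c / total for n, c in counts.items()} if total else {}
    return counts, fractions


references = build_references()
reads = (
    ["AA" + references[2] + "CC"] * 8      # two-repeat molecules
    + ["AA" + references[1] + "CC"] * 2    # one-repeat molecules
    + ["AA" + LEFT_FLANK + MOTIF + "CCC"]  # ends inside the tract: not assigned
)
counts, fractions = quantify_alleles(reads, references)
print(dict(counts), fractions)
```

In this toy mixture the truncated read is discarded, and the remaining reads give an approximate 80:20 split between the two length variants. A real implementation would need to tolerate sequencing error and substitutions within the repeat, which exact matching does not.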
Testing would clearly be required to assess how well this multiple reference alignment method or a similar strategy could work for mtDNA (rather than STR) data, and for which types of length variants it may have utility (e.g. repeats versus homopolymeric tracts, simple indel LHP versus LHP combined with substitutions). Alternative approaches that specifically consider the size of the fragments spanning LHP and indel-prone regions may also be considered. These methods also use de novo alignment and have been shown to work for MPS-based STR analysis [88]. While this strict amplicon-size/read-length approach currently employed for STRs may not be ideal for mtDNA purposes, some variation could potentially be used for repeat and indel regions of the mtGenome. Additionally, if the specific sequence reads which span particular LHP or indel-prone regions can be identified and sequestered from the remainder of the data, a de novo assembly of only those reads would be plausible. It also seems feasible that where limitations in sequence alignment algorithms prevent reliable placement of indels according to current phylogenetic-based guidelines in forensics, but consistent results are obtained using the selected data analysis pipeline, the variant calls for the relevant regions could be transformed bioinformatically prior to reporting.
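A length-based alternative can be illustrated in a few lines of Python. The sketch below is written in the spirit of the fragment-size approach described above, but it is not the implementation of the tool in [88]; the anchor sequences, toy reads, and function names are all hypothetical. It measures the number of bases between two fixed flanking anchors in each read and tabulates the resulting lengths.

```python
from collections import Counter


def tract_length(read, left_anchor, right_anchor):
    """Return the number of bases between two anchors, or None if either is missing."""
    left_pos = read.find(left_anchor)
    if left_pos == -1:
        return None
    tract_start = left_pos + len(left_anchor)
    right_pos = read.find(right_anchor, tract_start)
    if right_pos == -1:
        return None
    return right_pos - tract_start


def length_profile(reads, left_anchor, right_anchor):
    """Tabulate tract lengths over all reads that contain both anchors."""
    lengths = (tract_length(r, left_anchor, right_anchor) for r in reads)
    return Counter(length for length in lengths if length is not None)


# Hypothetical anchors and toy reads around a C-tract of variable length.
LEFT_ANCHOR = "AGTCAAT"
RIGHT_ANCHOR = "TGGATCA"
reads = (
    [LEFT_ANCHOR + "C" * 8 + RIGHT_ANCHOR] * 6
    + [LEFT_ANCHOR + "C" * 9 + RIGHT_ANCHOR] * 3
    + [LEFT_ANCHOR + "C" * 7]  # truncated read: no right anchor, so it is skipped
)
profile = length_profile(reads, LEFT_ANCHOR, RIGHT_ANCHOR)
print(dict(profile))
```

Because only length is recorded, two molecules of equal length that differ by a substitution within the tract would be indistinguishable here, which is one reason a strictly length-based approach may not be ideal for mtDNA.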
Regardless of the particular method(s) applied, it seems evident that known LHP regions should be afforded some degree of specialized handling, distinct from the routine processes applied to either sequence alignment or variant calling in the remainder of the genome. To accomplish this, it will be essential to develop a catalog of the mtGenome regions in which LHP has been observed, and to have a background understanding of the conditions that frequently result in LHP (i.e. homopolymeric tracts of eight or more identical nucleotides). With this knowledge, we can begin to develop and apply methods to screen for, identify and accurately report LHP in mtGenome haplotypes derived from MPS data.
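As a starting point for such a catalog, candidate regions can be flagged by scanning a sequence for homopolymeric tracts at or above the length mentioned above. The Python sketch below uses a toy sequence and a hypothetical function name; it is an illustration of the screening idea, not a validated tool.

```python
import re


def find_homopolymers(sequence, min_length=8):
    """Return (start, end, base, length) for each homopolymer run.

    Coordinates are 1-based and inclusive, relative to the sequence supplied.
    """
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [
        (m.start() + 1, m.end(), m.group(1), m.end() - m.start())
        for m in pattern.finditer(sequence.upper())
    ]


# Toy sequence: a 9 bp C-tract and an 8 bp A-tract separated by other bases.
toy_sequence = "ACGT" + "C" * 9 + "TTG" + "A" * 8 + "C"
catalog = find_homopolymers(toy_sequence)
print(catalog)
```

Because a substitution can create a long tract that is absent from the reference (variation at position 16,189 is one example noted above), one option would be to scan each sample's consensus sequence rather than the reference alone.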
4. Summary and outlook
The application of highly sensitive MPS techniques to the analysis of mtDNA has the potential to improve the recovery of genetic information from difficult forensic specimens, and to increase the discrimination potential of mtDNA via the capture and comparison of full mtGenomes. Several studies to date have harnessed the generally increased sensitivity of MPS technologies to examine purported low frequency mtDNA variants observed in a heteroplasmic state, but in a number of these reports authentic mtDNA heteroplasmy has not been fully distinguished from other sources of mixed data. MPS is more sensitive than Sanger sequencing to minor sequence variants, and thus offers greater opportunity to detect low-level, non-target DNA that may give rise to the appearance of heteroplasmy; the volume of MPS data also requires different data handling. For both reasons, the reliable identification of heteroplasmy in MPS data will need to depend heavily on the development and application of bioinformatic processes. The development of such tools should be based on, among other things: (1) the mtDNA phylogeny, (2) known patterns of mtDNA heteroplasmy and mutation derived from well-vetted datasets, (3) known NUMT sequences, (4) MPS chemistry artifacts and (5) data alignment artifacts. Robust and reliable automated processes should greatly simplify the discernment of authentic intraindividual variation from other sources of mixed MPS-developed mtDNA data.
Despite substantial differences between Sanger and MPS typing methods, the approach to authenticating heteroplasmy is fundamentally the same in both cases. It involves recognition of the potential causes of mtDNA mixtures aside from heteroplasmy, an understanding of their potential for impact on the data, and systematic methods for developing and applying criteria that address these causes. With a careful and methodical approach, there would seem to be no question that MPS can dramatically improve our understanding of mtDNA heteroplasmy and mutation, leading to further refinement of its treatment and utility in forensic mtDNA casework.
Acknowledgements
The research leading to this publication was funded in part by the Austrian Science Fund (FWF) (P22880-B12) and TR L397, as well as by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 285487 (EUROFORGEN-NoE). It was also supported by Award No. 2011-MU-MU-K402, awarded by the National Institute of Justice, Office of Justice Programs, U.S. Department of Justice, and administered by the American Registry of Pathology. Neither the U.S. Department of Justice nor the American Registry of Pathology had any role in the analysis or interpretation of data; in the writing of this report; or in the decision to submit this paper for publication. The opinions or assertions presented herein are the private views of the authors and should not be construed as official or as reflecting the views of the Department of Justice, the Department of Defense, its branches, the U.S. Army Medical Research and Materiel Command, the Armed Forces Medical Examiner System, the Federal Bureau of Investigation, or the U.S. Government.
References
- 1. Robin E.D., Wong R. Mitochondrial DNA molecules and virtual number of mitochondria per cell in mammalian cells. J. Cell. Physiol. 1988;136:507–513. doi: 10.1002/jcp.1041360316.
- 2. Lindahl T., Nyberg B. Rate of depurination of native deoxyribonucleic acid. Biochemistry. 1972;11:3610–3618. doi: 10.1021/bi00769a018.
- 3. Lindahl T., Andersson A. Rate of chain breakage at apurinic sites in double-stranded deoxyribonucleic acid. Biochemistry. 1972;11:3618–3623. doi: 10.1021/bi00769a019.
- 4. Higuchi R., von Beroldingen C.H., Sensabaugh G.F., Erlich H.A. DNA typing from single hairs. Nature. 1988;332:543–546. doi: 10.1038/332543a0.
- 5. Anderson T.D., Ross J.P., Roby R.K., Lee D.A., Holland M.M. A validation study for the extraction and analysis of DNA from human nail material and its application to forensic casework. J. Forensic Sci. 1999;44:1053–1056.
- 6. Wilson M.R., DiZinno J.A., Polanskey D., Replogle J., Budowle B. Validation of mitochondrial DNA sequencing for forensic casework analysis. Int. J. Legal Med. 1995;108:68–74. doi: 10.1007/BF01369907.
- 7. Melton T., Nelson K. Forensic mitochondrial DNA analysis: two years of commercial casework experience in the United States. Croat. Med. J. 2001;42:298–303.
- 8. Linch C.A., Whiting D.A., Holland M.M. Human hair histogenesis for the mitochondrial DNA forensic scientist. J. Forensic Sci. 2001;46:844–853.
- 9. Cline R.E., Laurent N.M., Foran D.R. The fingernails of Mary Sullivan: developing reliable methods for selectively isolating endogenous and exogenous DNA from evidence. J. Forensic Sci. 2003;48:328–333.
- 10. Melton T., Dimick G., Higgins B., Lindstrom L., Nelson K. Forensic mitochondrial DNA analysis of 691 casework hairs. J. Forensic Sci. 2005;50:73–80.
- 11. Holland M., Parsons T.J. Mitochondrial DNA sequence analysis – validation and use for forensic casework. Forensic Sci. Rev. 1999;11:29.
- 12. Carracedo A., Bär W., Lincoln P., Mayr W., Morling N., Olaisen B., Schneider P., Budowle B., Brinkmann B., Gill P., Holland M., Tully G., Wilson M. DNA commission of the international society for forensic genetics: guidelines for mitochondrial DNA typing. Forensic Sci. Int. 2000;110:79–85. doi: 10.1016/s0379-0738(00)00161-4.
- 13. SWGDAM Guidelines for mitochondrial DNA (mtDNA) nucleotide sequence interpretation. Forensic Sci. 2003
- 14. SWGDAM. 2013. Interpretation Guidelines for Mitochondrial DNA Analysis by Forensic DNA Testing Laboratories. http://swgdam.org/SWGDAM%20mtDNA_Interpretation_Guidelines_APPROVED_073013.pdf
- 15. Parson W., Gusmao L., Hares D.R., Irwin J.A., Mayr W.R., Morling N. DNA Commission of the International Society for Forensic Genetics: revised and extended guidelines for mitochondrial DNA typing. Forensic Sci. Int. Genet. 2014;13:134–142. doi: 10.1016/j.fsigen.2014.07.010.
- 16. Tully G., Barritt S.M., Bender K., Brignon E., Capelli C., Dimo-Simonin N. Results of a collaborative study of the EDNAP group regarding mitochondrial DNA heteroplasmy and segregation in hair shafts. Forensic Sci. Int. 2004;140:1–11. doi: 10.1016/S0379-0738(03)00181-6.
- 17. Wilson M.R., Polanskey D., Replogle J., DiZinno J.A., Budowle B. A family exhibiting heteroplasmy in the human mitochondrial DNA control region reveals both somatic mosaicism and pronounced segregation of mitotypes. Hum. Genet. 1997;100:167–171. doi: 10.1007/s004390050485.
- 18. Calloway C.D., Reynolds R.L., Herrin G.L., Jr., Anderson W.W. The frequency of heteroplasmy in the HVII region of mtDNA differs across tissue types and increases with age. Am. J. Hum. Genet. 2000;66:1384–1397. doi: 10.1086/302844.
- 19. Irwin J.A., Saunier J.L., Niederstätter H., Strouss K.M., Sturk K.A., Diegoli T.M. Investigation of heteroplasmy in the human mitochondrial DNA control region: a synthesis of observations from more than 5000 global population samples. J. Mol. Evol. 2009;68:516–527. doi: 10.1007/s00239-009-9227-4.
- 20. Jazin E.E., Cavelier L., Eriksson I., Oreland L., Gyllensten U. Human brain contains high levels of heteroplasmy in the noncoding regions of mitochondrial DNA. Proc. Natl. Acad. Sci. USA. 1996;93:12382–12387. doi: 10.1073/pnas.93.22.12382.
- 21. Samuels D.C., Li C., Li B., Song Z., Torstenson E., Boyd Clay H. Recurrent tissue-specific mtDNA mutations are common in humans. PLoS Genet. 2013;9:e1003929. doi: 10.1371/journal.pgen.1003929.
- 22. Krjutskov K., Koltsina M., Grand K., Vosa U., Sauk M., Tonisson N. Tissue-specific mitochondrial heteroplasmy at position 16,093 within the same individual. Curr. Genet. 2014;60:11–16. doi: 10.1007/s00294-013-0398-6.
- 23. Naue J., Horer S., Sanger T., Strobl C., Hatzer-Grubwieser P., Parson W. Evidence for frequent and tissue-specific sequence heteroplasmy in human mitochondrial DNA. Mitochondrion. 2015;20:82–94. doi: 10.1016/j.mito.2014.12.002.
- 24. Rebolledo-Jaramillo B., Su M.S., Stoler N., McElhoe J.A., Dickins B., Blankenberg D. Maternal age effect and severe germ-line bottleneck in the inheritance of human mitochondrial DNA. Proc. Natl. Acad. Sci. USA. 2014;111:15474–15479. doi: 10.1073/pnas.1409328111.
- 25. Ivanov P.L., Wadhams M.J., Roby R.K., Holland M.M., Weedn V.W., Parsons T.J. Mitochondrial DNA sequence heteroplasmy in the Grand Duke of Russia Georgij Romanov establishes the authenticity of the remains of Tsar Nicholas II. Nat. Genet. 1996;12:417–420. doi: 10.1038/ng0496-417.
- 26. Brown W.M., George M., Jr., Wilson A.C. Rapid evolution of animal mitochondrial DNA. Proc. Natl. Acad. Sci. USA. 1979;76:1967–1971. doi: 10.1073/pnas.76.4.1967.
- 27. Stoneking M., Hedgecock D., Higuchi R.G., Vigilant L., Erlich H.A. Population variation of human mtDNA control region sequences detected by enzymatic amplification and sequence-specific oligonucleotide probes. Am. J. Hum. Genet. 1991;48:370–382.
- 28. Tamura K., Nei M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 1993;10:512–526. doi: 10.1093/oxfordjournals.molbev.a040023.
- 29. Hasegawa M., Di Rienzo A., Kocher T.D., Wilson A.C. Toward a more accurate time scale for the human mitochondrial DNA tree. J. Mol. Evol. 1993;37:347–354. doi: 10.1007/BF00178865.
- 30. Howell N., Kubacka I., Mackey D.A. How rapidly does the human mitochondrial genome evolve? Am. J. Hum. Genet. 1996;59:501–509.
- 31. Parsons T.J., Muniec D.S., Sullivan K., Woodyatt N., Alliston-Greiner R., Wilson M.R. A high observed substitution rate in the human mitochondrial DNA control region. Nat. Genet. 1997;15:363–368. doi: 10.1038/ng0497-363.
- 32. Howell N., Smejkal C.B., Mackey D.A., Chinnery P.F., Turnbull D.M., Herrnstadt C. The pedigree rate of sequence divergence in the human mitochondrial genome: there is a difference between phylogenetic and pedigree rates. Am. J. Hum. Genet. 2003;72:659–670. doi: 10.1086/368264.
- 33. Elliott H.R., Samuels D.C., Eden J.A., Relton C.L., Chinnery P.F. Pathogenic mitochondrial DNA mutations are common in the general population. Am. J. Hum. Genet. 2008;83:254–260. doi: 10.1016/j.ajhg.2008.07.004.
- 34. Bandelt H.J., Lahermo P., Richards M., Macaulay V. Detecting errors in mtDNA data by phylogenetic analysis. Int. J. Legal Med. 2001;115:64–69. doi: 10.1007/s004140100228.
- 35. Bandelt H.J., Quintana-Murci L., Salas A., Macaulay V. The fingerprint of phantom mutations in mitochondrial DNA data. Am. J. Hum. Genet. 2002;71:1150–1160. doi: 10.1086/344397.
- 36. Bandelt H.J., Yao Y.G., Salas A., Kivisild T., Bravi C.M. High penetrance of sequencing errors and interpretative shortcomings in mtDNA sequence analysis of LHON patients. Biochem. Biophys. Res. Commun. 2007;352:283–291. doi: 10.1016/j.bbrc.2006.10.131.
- 37. He Y., Wu J., Dressman D.C., Iacobuzio-Donahue C., Markowitz S.D., Velculescu V.E. Heteroplasmic mitochondrial DNA mutations in normal and tumour cells. Nature. 2010;464:610–614. doi: 10.1038/nature08802.
- 38. Li M., Schonberg A., Schaefer M., Schroeder R., Nasidze I., Stoneking M. Detecting heteroplasmy from high-throughput sequencing of complete human mitochondrial DNA genomes. Am. J. Hum. Genet. 2010;87:237–249. doi: 10.1016/j.ajhg.2010.07.014.
- 39. Sosa M.X., Sivakumar I.K., Maragh S., Veeramachaneni V., Hariharan R., Parulekar M. Next-generation sequencing of human mitochondrial reference genomes uncovers high heteroplasmy frequency. PLoS Comput. Biol. 2012;8:e1002737. doi: 10.1371/journal.pcbi.1002737.
- 40. Payne B.A., Wilson I.J., Yu-Wai-Man P., Coxhead J., Deehan D., Horvath R. Universal heteroplasmy of human mitochondrial DNA. Hum. Mol. Genet. 2013;22:384–390. doi: 10.1093/hmg/dds435.
- 41. Guo Y., Li C.I., Sheng Q., Winther J.F., Cai Q., Boice J.D. Very low-level heteroplasmy mtDNA variations are inherited in humans. J. Genet. Genomics. 2013;40:607–615. doi: 10.1016/j.jgg.2013.10.003.
- 42. Diroma M.A., Calabrese C., Simone D., Santorsola M., Calabrese F.M., Gasparre G. Extraction and annotation of human mitochondrial genomes from 1000 Genomes Whole Exome Sequencing data. BMC Genomics. 2014;15(Suppl. 3):S2. doi: 10.1186/1471-2164-15-S3-S2.
- 43. Ye K., Lu J., Ma F., Keinan A., Gu Z. Extensive pathogenicity of mitochondrial heteroplasmy in healthy human individuals. Proc. Natl. Acad. Sci. USA. 2014;111:10654–10659. doi: 10.1073/pnas.1403521111.
- 44. Skonieczna K., Malyarchuk B., Jawien A., Marszalek A., Banaszkiewicz Z., Jarmocik P. Heteroplasmic substitutions in the entire mitochondrial genomes of human colon cells detected by ultra-deep 454 sequencing. Forensic Sci. Int. Genet. 2015;15:16–20. doi: 10.1016/j.fsigen.2014.10.021.
- 45. Goto H., Dickins B., Afgan E., Paul I.M., Taylor J., Makova K.D., Nekrutenko A. Dynamics of mitochondrial heteroplasmy in three families investigated via a repeatable re-sequencing study. Genome Biol. 2011;12:R59. doi: 10.1186/gb-2011-12-6-r59.
- 46. Li M., Schroder R., Ni S., Madea B., Stoneking M. Extensive tissue-related and allele-related mtDNA heteroplasmy suggests positive selection for somatic mutations. Proc. Natl. Acad. Sci. USA. 2015;112:2491–2496. doi: 10.1073/pnas.1419651112.
- 47. Green R.E., Krause J., Briggs A.W., Maricic T., Stenzel U., Kircher M. A draft sequence of the Neandertal genome. Science. 2010;328:710–722. doi: 10.1126/science.1188021.
- 48. Buchanan C.C., Torstenson E.S., Bush W.S., Ritchie M.D. A comparison of cataloged variation between International HapMap Consortium and 1000 Genomes Project data. J. Am. Med. Inform. Assoc. 2012;19:289–294. doi: 10.1136/amiajnl-2011-000652.
- 49. Weber-Lehmann J., Schilling E., Gradl G., Richter D.C., Wiehler J., Rolf B. Finding the needle in the haystack: differentiating identical twins in paternity testing and forensics by ultra-deep next generation sequencing. Forensic Sci. Int. Genet. 2014;9:42–46. doi: 10.1016/j.fsigen.2013.10.015.
- 50. Parson W., Strobl C., Huber G., Zimmermann B., Gomes S.M., Souto L. Evaluation of next generation mtGenome sequencing using the Ion Torrent Personal Genome Machine (PGM). Forensic Sci. Int. Genet. 2013;7:543–549. doi: 10.1016/j.fsigen.2013.06.003.
- 51. McElhoe J.A., Holland M.M., Makova K.D., Su M.S., Paul I.M., Baker C.H. Development and assessment of an optimized next-generation DNA sequencing approach for the mtgenome using the Illumina MiSeq. Forensic Sci. Int. Genet. 2014;13:20–29. doi: 10.1016/j.fsigen.2014.05.007.
- 52. King J.L., LaRue B.L., Novroski N.M., Stoljarova M., Seo S.B., Zeng X. High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq. Forensic Sci. Int. Genet. 2014;12:128–135. doi: 10.1016/j.fsigen.2014.06.001.
- 53. Seneca S., Vancampenhout K., Van Coster R., Smet J., Lissens W., Vanlander A. Analysis of the whole mitochondrial genome: translation of the Ion Torrent Personal Genome Machine system to the diagnostic bench? Eur. J. Hum. Genet. 2015;23:41–48. doi: 10.1038/ejhg.2014.49.
- 54. Vancampenhout K., Caljon B., Spits C., Stouffs K., Jonckheere A., De Meirleir L. A bumpy ride on the diagnostic bench of massive parallel sequencing, the case of the mitochondrial genome. PLoS ONE. 2014;9:e112950. doi: 10.1371/journal.pone.0112950.
- 55. Ekblom R., Smeds L., Ellegren H. Patterns of sequencing coverage bias revealed by ultra-deep sequencing of vertebrate mitochondria. BMC Genomics. 2014;15:467. doi: 10.1186/1471-2164-15-467.
- 56. Li M., Stoneking M. A new approach for detecting low-level mutations in next-generation sequence data. Genome Biol. 2012;13:R34. doi: 10.1186/gb-2012-13-5-r34.
- 57. Briggs A.W., Good J.M., Green R.E., Krause J., Maricic T., Stenzel U. Primer extension capture: targeted sequence retrieval from heavily degraded DNA sources. J. Visual. Exp. 2009;31:1573. doi: 10.3791/1573.
- 58. Maricic T., Whitten M., Paabo S. Multiplexed DNA sequence capture of mitochondrial genomes using PCR products. PLoS ONE. 2010;5:e14004. doi: 10.1371/journal.pone.0014004.
- 59. Burbano H.A., Hodges E., Green R.E., Briggs A.W., Krause J., Meyer M. Targeted investigation of the Neandertal genome by array-based sequence capture. Science. 2010;328:723–725. doi: 10.1126/science.1188046.
- 60. Krause J., Fu Q., Good J.M., Viola B., Shunkov M.V., Derevianko A.P. The complete mitochondrial DNA genome of an unknown hominin from southern Siberia. Nature. 2010;464:894–897. doi: 10.1038/nature08976.
- 61. Meyer M., Fu Q., Aximu-Petri A., Glocke I., Nickel B., Arsuaga J.L. A mitochondrial genome sequence of a hominin from Sima de los Huesos. Nature. 2014;505:403–406. doi: 10.1038/nature12788.
- 62. Loreille O., Koshinsky H., Fofanov V.Y., Irwin J.A. Application of next generation sequencing technologies to the identification of highly degraded unknown soldiers' remains. Forensic Sci. Int. Genet. Suppl. Ser. 2011;3:e540–e541.
- 63. Templeton J.E., Brotherton P.M., Llamas B., Soubrier J., Haak W., Cooper A. DNA capture and next-generation sequencing can recover whole mitochondrial genomes from highly degraded samples for human identification. Invest. Genet. 2013;4:26. doi: 10.1186/2041-2223-4-26.
- 64. Li M., Schroeder R., Ko A., Stoneking M. Fidelity of capture-enrichment for mtDNA genome sequencing: influence of NUMTs. Nucleic Acids Res. 2012;40:e137. doi: 10.1093/nar/gks499.
- 65. Bintz B.J., Dixon G.B., Wilson M.R. Simultaneous detection of human mitochondrial DNA and nuclear-inserted mitochondrial-origin sequences (NumtS) using forensic mtDNA amplification strategies and pyrosequencing technology. J. Forensic Sci. 2014;59:1064–1073. doi: 10.1111/1556-4029.12441.
- 66. Dayama G., Emery S.B., Kidd J.M., Mills R.E. The genomic landscape of polymorphic human nuclear mitochondrial insertions. Nucleic Acids Res. 2014;42:12640–12649. doi: 10.1093/nar/gku1038.
- 67. Ramos A., Santos C., Mateiu L., Gonzalez Mdel M., Alvarez L., Azevedo L. Frequency and pattern of heteroplasmy in the complete human mitochondrial genome. PLoS ONE. 2013;8:e74636. doi: 10.1371/journal.pone.0074636.
- 68. Just R.S., Scheible M.K., Fast S.A., Sturk-Andreaggi K., Röck A.W., Bush J.M. Full mtGenome reference data: development and characterization of 588 forensic-quality haplotypes representing three U.S. populations. Forensic Sci. Int. Genet. 2015;14:141–155. doi: 10.1016/j.fsigen.2014.09.021.
- 69. Guo Y., Li J., Li C.I., Shyr Y., Samuels D.C. MitoSeek: extracting mitochondria information and performing high-throughput mitochondria sequencing analysis. Bioinformatics. 2013;29:1210–1211. doi: 10.1093/bioinformatics/btt118.
- 70. Calabrese C., Simone D., Diroma M.A., Santorsola M., Gutta C., Gasparre G. MToolBox: a highly automated pipeline for heteroplasmy annotation and prioritization analysis of human mitochondrial variants in high-throughput sequencing. Bioinformatics. 2014;30:3115–3117. doi: 10.1093/bioinformatics/btu483.
- 71. Tully L.A., Parsons T.J., Steighner R.J., Holland M.M., Marino M.A., Prenger V.L. A sensitive denaturing gradient-gel electrophoresis assay reveals a high frequency of heteroplasmy in hypervariable region 1 of the human mtDNA control region. Am. J. Hum. Genet. 2000;67:432–443. doi: 10.1086/302996.
- 72. Strouss K. George Washington University; Washington, DC: 2006. Relative Evolutionary Rate Estimation for Sites in the mtDNA Control Region [Master Thesis].
- 73. Soares P., Ermini L., Thomson N., Mormina M., Rito T., Rohl A. Correcting for purifying selection: an improved human mitochondrial molecular clock. Am. J. Hum. Genet. 2009;84:740–759. doi: 10.1016/j.ajhg.2009.05.001.
- 74. Parson W., Parsons T.J., Scheithauer R., Holland M.M. Population data for 101 Austrian Caucasian mitochondrial DNA d-loop sequences: application of mtDNA sequence analysis to a forensic case. Int. J. Legal Med. 1998;111:124–132. doi: 10.1007/s004140050132.
- 75. Parson W., Dür A. EMPOP–a forensic mtDNA database. Forensic Sci. Int. Genet. 2007;1:88–92. doi: 10.1016/j.fsigen.2007.01.018.
- 76. Bandelt H.J., Salas A. Current next generation sequencing technology may not meet forensic standards. Forensic Sci. Int. Genet. 2012;6:143–145. doi: 10.1016/j.fsigen.2011.04.004.
- 77. van Oven M., Kayser M. Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat. 2009;30:E386–E394. doi: 10.1002/humu.20921.
- 78. Just R.S., Irwin J.A., Parson W. Questioning the prevalence and reliability of human mitochondrial DNA heteroplasmy from massively parallel sequencing data. Proc. Natl. Acad. Sci. USA. 2014;111:E4546–E4547. doi: 10.1073/pnas.1413478111.
- 79. Ye K., Lu J., Ma F., Keinan A., Gu Z. Reply to Just et al.: mitochondrial DNA heteroplasmy could be reliably detected with massively parallel sequencing technologies. Proc. Natl. Acad. Sci. USA. 2014;111:E4548–E4550. doi: 10.1073/pnas.1415171111.
- 80. Forster L., Forster P., Lutz-Bonengel S., Willkomm H., Brinkmann B. Natural radioactivity and human mitochondrial DNA mutations. Proc. Natl. Acad. Sci. USA. 2002;99:13950–13954. doi: 10.1073/pnas.202400499.
- 81. Davis C., Peters D., Warshauer D., King J., Budowle B. Sequencing the hypervariable regions of human mitochondrial DNA using massively parallel sequencing: enhanced data acquisition for DNA samples encountered in forensic testing. Legal Med. 2015;17:123–127. doi: 10.1016/j.legalmed.2014.10.004.
- 82. Zimmermann B., Röck A.W., Dür A., Parson W. Improved visibility of character conflicts in quasi-median networks with the EMPOP NETWORK software. Croat. Med. J. 2014;55:115–120. doi: 10.3325/cmj.2014.55.115.
- 83. Andrews R.M., Kubacka I., Chinnery P.F., Lightowlers R.N., Turnbull D.M., Howell N. Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat. Genet. 1999;23:147. doi: 10.1038/13779.
- 84. Degner J.F., Marioni J.C., Pai A.A., Pickrell J.K., Nkadori E., Gilad Y. Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics. 2009;25:3207–3212. doi: 10.1093/bioinformatics/btp579.
- 85. Guo Y., Samuels D.C., Li J., Clark T., Li J., Shyr Y. Evaluation of allele frequency estimation using pooled sequencing data simulation. Sci. World J. 2013; Article ID 895496. doi: 10.1155/2013/895496.
- 86. Ye F., Samuels D.C., Clark T., Guo Y. High-throughput sequencing in mitochondrial DNA research. Mitochondrion. 2014;17:157–163. doi: 10.1016/j.mito.2014.05.004.
- 87. Scheible M., Loreille O., Just R., Irwin J. Short tandem repeat typing on the 454 platform: strategies and considerations for targeted sequencing of common forensic markers. Forensic Sci. Int. Genet. 2014;12:107–119. doi: 10.1016/j.fsigen.2014.04.010.
- 88. Warshauer D.H., Lin D., Hari K., Jain R., Davis C., Larue B. STRait razor: a length-based forensic STR allele-calling tool for use with second generation sequencing data. Forensic Sci. Int. Genet. 2013;7:409–417. doi: 10.1016/j.fsigen.2013.04.005.