To the Editor:
The good news is that a very large number of human mtDNA sequences from diverse populations and ethnic groups are becoming available for analysis. The bad news is that many of these sequences contain errors (Dennis 2003; Forster 2003). In at least one instance, that of the Icelandic population, it appears that mtDNA sequence errors were a contributing factor (although not the only one) to an erroneous conclusion about the genetic diversity of these people (Arnason 2003). Forster (2003) cites other examples where mtDNA sequence errors have compromised analyses of population genetics and human evolution. By contrast, a reanalysis of mtDNA sequences in the Ladin population of the Alps did not overturn the original conclusions on population diversity once more accurate sequences were used (Vernesi et al. 2002). At this point, we do not know the extent of the damage, so to speak, caused by mtDNA sequence errors. Nevertheless, it is clear that correcting such errors must be undertaken as quickly as possible.
As a result of our reduced median network analyses (Herrnstadt et al. 2002), we released a database of 560 human mtDNA coding region sequences. A small number of errors in these sequences were detected by Dr. Hans-Jürgen Bandelt, and we were able to correct these, as noted in an erratum that was published soon after our original report (Herrnstadt et al. 2002). Subsequently, a systematic approach to the detection of phantom sequence errors was published in this Journal (Bandelt et al. 2002). As defined by these investigators, phantom errors are those that arise during the sequencing process itself. Dr. Bandelt contacted us again and suggested that there were phantom mutations in our mtDNA database. Specifically, the likely errors involved G→C transversions at nt 7927 and nt 7985. Such a result was surprising to us, because we believed that our sequencing approach and quality control measures had avoided such errors. Therefore, we used Dr. Bandelt’s information as a starting point for a comprehensive reanalysis of our database.
After reanalysis, which included inspection of the electropherograms for all G→C and C→G transversions, we found that 41 of these mtDNA sequences contained at least one such phantom error. In fact, there were more such phantom errors than those suggested by Dr. Bandelt. In addition to the phantom transversions at positions 7927 and 7985, we detected other such errors, including ones at nucleotide positions 500, 14160, 14460, 14974, and 16239. However, these errors did not occur randomly throughout the database. Instead, we could “isolate” the errors to a short time period that was relatively early during our large-scale mtDNA sequencing program. With the benefit of hindsight, it appears that these errors resulted from two technical factors (see also Bandelt et al. 2002). The first was that one particular capillary array of the ABI 3700 DNA Analyzer produced suboptimal base separations, whereas the second was that the sequencing chemistry at that time utilized an early version of reagents that was optimized subsequently.
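The first step of such a reanalysis, shortlisting every G→C and C→G difference so that the corresponding electropherograms can be inspected, can be sketched as follows. This is only an illustrative sketch, not the procedure used to build the database: it assumes that each sequence is already aligned to a reference of equal length, and the function name, sample identifiers, and toy sequences are all hypothetical. The shortlist itself does not distinguish genuine transversions from phantom ones; that still requires the raw traces.

```python
from collections import defaultdict


def gc_transversions(reference, samples, offset=1):
    """Shortlist G<->C differences between each sample and the reference.

    Assumes every sample is an uppercase string already aligned to
    `reference` (same length, no gaps). `offset` is the nucleotide
    number of the first aligned base. Returns a dict mapping
    position -> list of (sample_id, reference_base, sample_base).
    """
    hits = defaultdict(list)
    for sample_id, seq in samples.items():
        if len(seq) != len(reference):
            raise ValueError(f"{sample_id} is not aligned to the reference")
        for i, (ref_base, base) in enumerate(zip(reference, seq)):
            # A G<->C transversion pairs exactly the bases G and C.
            if {ref_base, base} == {"G", "C"}:
                hits[i + offset].append((sample_id, ref_base, base))
    return dict(hits)


# Toy data (hypothetical): s1 carries G->C, s2 carries C->G,
# s3 carries a C->T transition that should not be flagged.
reference = "ACGTGCA"
samples = {"s1": "ACCTGCA", "s2": "ACGTGGA", "s3": "ATGTGCA"}
candidates = gc_transversions(reference, samples)
for position in sorted(candidates):
    print(position, candidates[position])
```

Grouping the hits by position makes it easier to see whether the same transversion recurs across unrelated sequences, which is one of the patterns that would prompt a closer look at the traces.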
Beyond these 41 sequences, we found that a further 26 mtDNA sequences contained errors that arose during data entry or editing. As a result of this reanalysis, we have corrected the database of 560 sequences, which is available through the MitoKor Web site (the URL address is given below).
Have these errors invalidated our network analyses? Not to a substantial degree. Many of the sequence errors generated private polymorphisms, which were not included in our analyses. Furthermore, a substantial proportion of the branches in these networks were established by multiple substitutions (see figs. 1–4 in Herrnstadt et al. 2002), and, so far, we have no evidence from additional network analysis that the original results need major revision. Can we now guarantee that our mtDNA database is error free? No. Although such is our goal, it is not practical, and it is probably not technically feasible.
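The point about private polymorphisms can be illustrated with a short sketch. Under the simplifying assumption that a private polymorphism is a variant carried by exactly one sequence in the set, the variants can be split into private and shared ones, and only the shared ones would contribute to a network. The function name, variant labels, and sample identifiers below are hypothetical and do not come from the database.

```python
from collections import Counter


def split_private(variants_by_sample):
    """Split variants into (private, shared) sets.

    A variant is treated as private if exactly one sample carries it;
    this is a simplifying assumption made for illustration.
    """
    counts = Counter(
        variant
        for variants in variants_by_sample.values()
        for variant in set(variants)
    )
    private = {variant for variant, n in counts.items() if n == 1}
    shared = set(counts) - private
    return private, shared


# Toy data (hypothetical variant labels).
variants_by_sample = {
    "s1": {"A100G", "G200C"},
    "s2": {"A100G"},
    "s3": {"A100G", "T300C"},
}
private, shared = split_private(variants_by_sample)
print(sorted(private), sorted(shared))
```

In this toy set, an erroneous call that appears in a single sequence falls into the private group and so would not alter the branching of a network built from the shared variants.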
It is now clear that many mtDNA databases or sequence sets contain errors (Forster 2003). The solution to this problem is further effort, both at the front end (the sequencing process itself) and at the back end (increased quality control) of mtDNA database construction.
Acknowledgments
We thank Dr. Hans-Jürgen Bandelt (University of Hamburg) for bringing the issue of mtDNA sequence errors to our attention. The expert assistance of Brian Hulihan (MitoKor) with the Web site and with the files of mtDNA sequences is gratefully acknowledged.
Electronic-Database Information
The URL for data presented herein is as follows:
- MitoKor, http://www.mitokor.com/science/560mtdnasrevision.php (for the revised 560 mtDNA coding-region sequences; “zip” and “sit” files also available)
References
- Arnason E (2003) Genetic heterogeneity of Icelanders. Ann Hum Genet 67:5–16
- Bandelt H-J, Quintana-Murci L, Salas A, Macaulay V (2002) The fingerprint of phantom mutations in mitochondrial DNA data. Am J Hum Genet 71:1150–1160
- Dennis C (2003) Error reports threaten to unravel databases of mitochondrial DNA. Nature 421:773–774
- Forster P (2003) To err is human. Ann Hum Genet 67:2–4
- Herrnstadt C, Elson JL, Fahy E, Preston G, Turnbull DM, Anderson C, Ghosh SS, Olefsky JM, Beal MF, Davis RE, Howell N (2002) Reduced-median-network analysis of complete mitochondrial DNA coding-region sequences for the major African, Asian, and European haplogroups. Am J Hum Genet 70:1152–1170 (erratum 71:448–449)
- Vernesi C, Fuselli S, Castri L, Bertorelle G, Barbujani G (2002) Mitochondrial diversity in linguistic isolates of the Alps: a reappraisal. Hum Biol 74:725–730