Forensic Sci. Int. Genet. 2011 Mar;5(2):133–137. doi: 10.1016/j.fsigen.2010.10.003

Application of a west Eurasian-specific filter for quasi-median network analysis: Sharpening the blade for mtDNA error detection

Bettina Zimmermann a, Alexander Röck b, Gabriela Huber a, Tanja Krämer c, Peter M Schneider c, Walther Parson a
PMCID: PMC3065003  PMID: 21067984

Abstract

The application of quasi-median networks provides an effective tool to check the quality of mtDNA data. Filtering of highly recurrent mutations prior to network analysis is required to simplify the data set and reduce the complexity of the network. The phylogenetic background determines which mutations need to be filtered. While the traditional EMPOPspeedy filter was based on the worldwide mtDNA phylogeny, haplogroup-specific filters can more effectively highlight potential errors in data of the respective (sub)-continental region. In this study we demonstrate the performance of a new west Eurasian filter, EMPOPspeedyWE, for the fine-tuned examination of data sets belonging to macrohaplogroup N, which constitutes the main portion of mtDNA lineages in Europe. The effects on the resulting network of different database sizes, of high-quality and flawed data, and of the examination of a phylogenetically distant data set are illustrated by examples. The analyses are based on a west Eurasian etalon data set that was carefully compiled from more than 3500 control region sequences for network purposes. Both the etalon data and the new filter file are provided through the EMPOP database (www.empop.org).

Keywords: mtDNA population data, West Eurasia, Network analysis, Filter analysis, Error detection, EMPOP

1. Introduction

Mitochondrial (mt) DNA data generation is challenging and prone to error [1]. Despite guidelines for failsafe mtDNA typing [2], serious problems are abundant and can lead to severe misunderstandings and errors in interpretation [1]. Consequently, there is a need for quality control to prevent the release of erroneous mtDNA data.

The application of quasi-median (QM) networks provides an opportunity to examine the quality of mtDNA data by graphically representing the genetic structure of the lineages in a data set [3]. Filtering of highly recurrent (i.e. homoplasic) mutations prior to network analysis is required to reduce the complexity of the resulting network, which then enables the detection of data idiosyncrasies and potential artefacts. In general, the more mutations are filtered, the simpler the resulting network torso becomes. However, in order to keep the network analysis powerful for forensic quality control, filtered mutations should be kept to a minimum and should only concern sites that display highly recurrent mutations in the given sample. We have previously presented the universal filter EMPOPspeedy [4], which was based on hotspot mutations observed in the worldwide phylogeny [5]. However, some of the mutations included there are not relevant for subsets of the west Eurasian phylogeny. Therefore, we here present the new filter EMPOPspeedyWE, which is specific to the west Eurasian phylogeny. In addition, we have compiled a high-quality set of mtDNA haplotypes that is representative of the west Eurasian mtDNA pool and serves as an etalon.

2. Materials and methods

Thirty-one DNA samples from different west Eurasian countries (Germany, France, Poland, Bosnia-Herzegovina, Romania, Switzerland, Italy, Austria, Turkey, Kazakhstan, Denmark, Portugal, Spain, and the Russian Federation) were subjected to entire mtDNA control region sequencing according to high-quality amplification and sequencing procedures [6]. Volunteer donors gave written consent. These samples were used to highlight the effect of submitting a small data set to network analysis.

A reference data set of 3673 west Eurasian control region (CR) haplotypes served as the basis for selecting the filtered mutations and for the make-up of the etalon. These data were extracted from the EMPOP database [4], including their corresponding haplogroup affiliation based on the nomenclature updated in [7, Build 7]. Samples outside macrohaplogroup N as well as haplotypes that could not be assigned to a specific haplogroup within R0 by CR polymorphisms were removed (1201 samples affected). The final west Eurasian reference data set therefore comprised 2472 defined mtDNA haplotypes of typically west Eurasian origin. From these data, 202 haplotypes were selected to compose the etalon data set (Table S1). The selection was based on the haplotypes/haplogroups frequently observed in Europe and was adapted to the requirements of network analysis. These call for a sample size of about 200 distinct haplotypes [4], which builds a reasonable network torso on its own and to which small data sets can be added so that they are depicted in a useful manner.

On the basis of the west Eurasian reference data set, the fluctuation rate of the observed mutations was determined. For this purpose, the haplotypes were clustered according to their major CR haplogroups and the relative frequency of the mutations with respect to these clusters was estimated. These ranged between 0.00% and 40.00% with an average positional log-likelihood ratio of 2.81. For deciding which mutations to include in the filter file, a fluctuation rate threshold of 0.85% was applied. Furthermore, additional mutations with lower thresholds that increased the complexity of the network were identified empirically by analyzing example data together with the etalon.
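As a rough illustration of the thresholding step, the following Python sketch estimates how often each mutation occurs within haplogroup clusters and selects those at or above a cut-off. The aggregation used here (the mean of the within-cluster relative frequencies over all clusters) is an assumption made for illustration only, because the exact definition of the fluctuation rate is not spelled out above. The function names, the toy haplotypes and the toy cut-off are likewise hypothetical.

```python
from collections import defaultdict


def fluctuation_scores(haplotypes):
    """Score each mutation by its mean within-cluster relative frequency.

    `haplotypes` is a list of (cluster_label, set_of_mutation_labels) pairs.
    ASSUMPTION: averaging over all clusters is an illustrative choice, not
    necessarily the definition used for the published filter.
    """
    cluster_sizes = defaultdict(int)
    carriers = defaultdict(lambda: defaultdict(int))
    for cluster, mutations in haplotypes:
        cluster_sizes[cluster] += 1
        for mutation in mutations:
            carriers[mutation][cluster] += 1
    n_clusters = len(cluster_sizes)
    # Clusters in which a mutation is absent contribute a frequency of zero.
    return {
        mutation: sum(n / cluster_sizes[c] for c, n in per_cluster.items()) / n_clusters
        for mutation, per_cluster in carriers.items()
    }


def select_filter_candidates(scores, threshold=0.0085):
    """Return mutations whose score reaches the threshold (default 0.85%).

    ASSUMPTION: "reaches" is read as >=; the text does not say whether the
    cut-off is inclusive.
    """
    return {mutation for mutation, score in scores.items() if score >= threshold}


# Toy data (hypothetical): two clusters with two haplotypes each.
toy = [
    ("H", {"T16519C", "T152C"}),
    ("H", {"T16519C"}),
    ("U", {"T16519C", "A16051G"}),
    ("U", {"A16051G"}),
]
scores = fluctuation_scores(toy)
candidates = select_filter_candidates(scores, threshold=0.6)  # toy cut-off
print(scores)
print(candidates)
```

With real data the default cut-off of 0.0085 would correspond to the 0.85% threshold mentioned above. The empirically added mutations with lower rates would still have to be identified by inspecting networks, which this sketch does not attempt.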

3. Results and discussion

The presented west Eurasian-specific filter EMPOPspeedyWE contains a total of 111 mutational positions (Table S2), 29 fewer than the general EMPOPspeedy filter ([4]; see www.empop.org for details). When applied to the west Eurasian etalon data set of 202 haplotypes (Table S1), the network torsi of both hypervariable segments HVS-I and HVS-II displayed typical star-like structures (Fig. 1). This combination (etalon and filter) lends itself to the addition of small data sets, whose sequence information is then displayed more clearly in the network. The following examples illustrate the application of the west Eurasian etalon in combination with the new filter to mtDNA data sets of different sizes and qualities.
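The effect of a position filter on a data set can be sketched in a few lines of Python: mutations at filtered positions are dropped from every haplotype (reduction), and haplotypes that become identical are merged and counted (condensation). This is only a minimal sketch of that preprocessing idea, not the EMPOP implementation. The function names are hypothetical, only simple substitutions written like `C16147T` are handled, and the single filtered position used below is a toy value rather than an entry taken from Table S2.

```python
import re
from collections import Counter

# Simple substitutions only, e.g. "C16147T"; indels are out of scope here.
MUTATION = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def reduce_haplotype(haplotype, filtered_positions):
    """Drop all mutations that fall on a filtered position."""
    kept = set()
    for label in haplotype:
        match = MUTATION.match(label)
        if match is None:
            raise ValueError(f"unsupported mutation label: {label!r}")
        if int(match.group(2)) not in filtered_positions:
            kept.add(label)
    return frozenset(kept)


def condense(haplotypes, filtered_positions):
    """Merge haplotypes that are identical after reduction and count them."""
    return Counter(reduce_haplotype(h, filtered_positions) for h in haplotypes)


# Toy example (hypothetical filter containing a single position).
toy_filter = {16519}
toy_haplotypes = [
    {"T16519C", "C16147T"},
    {"C16147T"},
    {"T16519C"},
    set(),
]
nodes = condense(toy_haplotypes, toy_filter)
for reduced, count in nodes.items():
    print(sorted(reduced), count)
```

In this toy case four haplotypes collapse into two nodes of two haplotypes each, which mirrors how the node labels in the network torsi report the number of condensed haplotypes.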

Fig. 1. QM network torsi of the etalon data set. (A) HVS-I: nps 16024–16569; (B) HVS-II: nps 1–576. The nodes correspond to reduced and condensed haplotypes. The most frequent haplogroup and a “+” are given if more than one haplogroup is involved. The number of condensed haplotypes is indicated below. The branches represent mutational events; transitions are shown in green, transversions in red. Small black nodes represent a quasi-median, indicating that this haplotype was not observed in the data set. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

3.1. QM network analysis of a high-quality but small data set (N = 31)

The QM network torsi of a small high-quality data set (N = 31, Table S3) are shown in Fig. 2. The small sample size and the high quality of the data led to a very simple torso with all haplotypes being condensed into one node (Fig. 2A). The addition of these 31 haplotypes to the west Eurasian etalon resulted in a more complex but still star-like torso (Fig. 2B). The three-dimensional reticulation on the right side of the HVS-I torso can be explained by the known parallel occurrence of a transition and a transversion at position 16147 in two samples (C16147T: hg U5b1; C16147A: hg N1a1).
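The situation at position 16147 can be made concrete with a small Python sketch that classifies substitutions as transitions or transversions and reports positions at which more than one variant base occurs in a data set. The helper names are hypothetical and only simple substitution labels are handled. Such multi-state positions are one possible source of reticulation, so the check is a heuristic aid and not a replacement for the network itself.

```python
import re
from collections import defaultdict

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
MUTATION = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse(label):
    """Split a label such as 'C16147T' into (reference base, position, variant base)."""
    match = MUTATION.match(label)
    if match is None or match.group(1) == match.group(3):
        raise ValueError(f"unsupported mutation label: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def classify(label):
    """Transition: purine<->purine or pyrimidine<->pyrimidine; otherwise transversion."""
    ref, _, alt = parse(label)
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def multi_state_positions(haplotypes):
    """Return {position: sorted variant bases} for positions with more than one variant."""
    variants = defaultdict(set)
    for haplotype in haplotypes:
        for label in haplotype:
            _, pos, alt = parse(label)
            variants[pos].add(alt)
    return {pos: sorted(alts) for pos, alts in variants.items() if len(alts) > 1}


print(classify("C16147T"))  # transition, as in the hg U5b1 sample
print(classify("C16147A"))  # transversion, as in the hg N1a1 sample
print(multi_state_positions([{"C16147T"}, {"C16147A"}, {"T16189C"}]))
```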

Fig. 2. QM network torsi (nps 16024–16569) of a small high-quality data set (N = 31) from west Eurasia. (A) The torso of the 31 samples. (B) The resulting torso of the 31 samples combined with the etalon (N = 233).

3.2. Effect of a small ambiguous data set on QM torsi (N = 29)

Another west Eurasian data set of similar size (N = 29; from Dagestan [8]) resulted in a different picture (Fig. 3). The torso of the 29 Darginian HVS-I haplotypes alone (Fig. 3A) already shows two three-dimensional reticulations, suggesting phantom mutations at positions 16280, 16281, 16384 and 16391. When combined with the etalon, 13 reticulations were observed in the resulting network torso (Fig. 3B). These data (among others from [8]) were at the centre of a debate [9]. Finally, the authors presented their raw data and thus provided evidence for sequence misinterpretation due to the low quality of the raw data [10].

Fig. 3. QM network torsi (nps 16024–16400) of a Darginian data set (N = 29). (A) The torso of the 29 haplotypes. (B) The resulting torso of the 29 samples combined with the etalon (N = 231).

3.3. Effect of a high-quality but oversized data set on QM network torsi (N = 786)

Oversized data sets produce crowded network torsi because the high number of condensed haplotypes creates enormous complexity. Fig. 4A shows an oversized data set obtained by joining four (high-quality) data sets retrieved from the EMPOP database [6,11,12, Lutz-Bonengel et al., unpublished], comprising a total of 786 haplotypes. The resulting torso is still star-like but overloaded and impossible to read. Data sets of such size should be partitioned and then tested independently. Hence, we recommend sample sizes between 200 and 300 haplotypes per query to obtain useful graphical representations, as shown in Fig. 4B (273 Austrian Europeans from [6]). This high-quality Austrian data set contains one phylogenetically distant sample, a C5b1 haplotype (nested in macrohaplogroup M), which can easily be identified by the three-dimensional reticulations at the bottom left of the network torso.
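The partitioning recommendation can be sketched as follows. The Python helper below splits a list of haplotypes into the smallest number of nearly equal chunks that each stay at or below an upper size limit. The function name and the even-split strategy are assumptions made for illustration; splitting by source population, for instance, would be an equally valid choice. The sketch also does not enforce the lower bound of 200, so a set of 301 haplotypes would be divided into two chunks of about 150.

```python
import math


def partition_evenly(items, max_size=300):
    """Split `items` into the fewest nearly equal chunks of at most `max_size`."""
    if max_size < 1:
        raise ValueError("max_size must be positive")
    n = len(items)
    if n == 0:
        return []
    n_chunks = math.ceil(n / max_size)
    base, extra = divmod(n, n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


# The oversized example of 786 haplotypes, represented here by placeholder IDs.
oversized = [f"hap{i}" for i in range(786)]
sizes = [len(chunk) for chunk in partition_evenly(oversized)]
print(sizes)
```

For 786 haplotypes this yields three chunks of 262, each within the recommended range, whereas a set of 273 haplotypes is left as a single query.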

Fig. 4. QM network torsi of high-quality west Eurasian samples. (A) The torso (nps 16024–16365) of an oversized sample set (N = 786). (B) The torso (nps 16024–16569) of 273 Austrians.

3.4. Application of a Southeast Asian data set to the west Eurasian filter (N = 190)

An apparent misapplication of the west Eurasian-specific filter is shown in Fig. 5. The investigated haplotypes originate from the Southeast Asian phylogeny (Thailand [13]). The complexity of the network torso in Fig. 5A points to the presence of mutational hotspots in the data that are not removed by the west Eurasian filter. For comparison, the torso of the same data after passage through the general, less specific EMPOPspeedy filter is shown in Fig. 5B and displays reduced complexity.

Fig. 5. QM network torsi of 190 haplotypes from Thailand (nps 16024–16569), constructed with the new west Eurasian-specific filter EMPOPspeedyWE (A) and the EMPOPspeedy filter (B).

4. Conclusions

QM network analysis is a useful tool for the quality control of mtDNA sequences, as data idiosyncrasies can be unmasked. The complexity of mtDNA data needs to be reduced to simplify the graphical representation of the network and to make it more powerful for the detection of errors. This is achieved by introducing a filter that targets highly recurrent mutations that would otherwise distort the network. The new filter EMPOPspeedyWE is designed to sieve out homoplasic mutations typical of the west Eurasian mtDNA phylogeny. In combination with the presented etalon data set, it is a powerful means of examining even small sample sets of west Eurasian origin. Since homoplasic mutations from other parts of the phylogeny are not filtered, this approach also allows the detection of phylogenetically distant lineages that may be present in the data.

Acknowledgement

This study was supported by the FWF Austrian Science Fund (TR397).

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.fsigen.2010.10.003.

Supplementary Table S1. Etalon data set: 202 various west Eurasian mtDNA haplotypes; investigated range: nps 16024–576.

Supplementary Table S2. EMPOPspeedyWE filter for QM network analyses. Filtered positions were ascertained from mutations in 2472 EMPOP sequences.

Supplementary Table S3. Summary of 31 west Eurasian mtDNA haplotypes (EMP00018); investigated range: nps 16024–576.

References

  1. Brandstätter A., Sänger T., Lutz-Bonengel S., Parson W., Béraud-Colomb E., Wen B., Kong Q.-P., Bravi C.M., Bandelt H.-J. Phantom mutation hotspots in human mitochondrial DNA. Electrophoresis. 2005;26:3414–3429. doi: 10.1002/elps.200500307.
  2. Salas A., Carracedo A., Macaulay V., Richards M., Bandelt H.-J. A practical guide to mitochondrial DNA error prevention in clinical, forensic, and population genetics. Biochem. Biophys. Res. Commun. 2005;335:891–899. doi: 10.1016/j.bbrc.2005.07.161.
  3. Bandelt H.-J., Dür A. Translating DNA data tables into quasi-median networks for parsimony analysis and error detection. Mol. Phylogenet. Evol. 2007;42:256–271. doi: 10.1016/j.ympev.2006.07.013.
  4. Parson W., Dür A. EMPOP—a forensic mtDNA database. Forensic Sci. Int. Genet. 2007;1:88–92. doi: 10.1016/j.fsigen.2007.01.018.
  5. Bandelt H.-J., Kong Q.-P., Richards M., Macaulay V. Estimation of mutation rates and coalescence times: some caveats. In: Bandelt H.-J., Richards M., Macaulay V., editors. Human Mitochondrial DNA and the Evolution of Homo sapiens. Springer-Verlag; Berlin/Heidelberg/New York: 2006. (Chapter 4)
  6. Brandstätter A., Niederstätter H., Pavlic M., Grubwieser P., Parson W. Generating population data for the EMPOP database—an overview of the mtDNA sequencing and data evaluation processes considering 273 Austrian control region sequences as example. Forensic Sci. Int. 2007;166:164–175. doi: 10.1016/j.forsciint.2006.05.006.
  7. van Oven M., Kayser M. Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat. 2009;30:E386–E394. doi: 10.1002/humu.20921.
  8. Nasidze I., Stoneking M. Mitochondrial DNA variation and language replacements in the Caucasus. Proc. R. Soc. Lond. B: Biol. Sci. 2001;268:1197–1206. doi: 10.1098/rspb.2001.1610.
  9. Bandelt H.-J., Kivisild T. Quality assessment of DNA sequence data: autopsy of a mis-sequenced mtDNA population sample. Ann. Hum. Genet. 2006;70:314–326. doi: 10.1111/j.1529-8817.2005.00234.x.
  10. Parson W. The art of reading sequence electropherograms. Ann. Hum. Genet. 2007;71:276–278. doi: 10.1111/j.1469-1809.2006.00319.x.
  11. Brandstätter A., Klein R., Duftner N., Wiegand P., Parson W. Application of a quasi-median network analysis for the visualization of character conflicts to a population sample of mitochondrial DNA control region sequences from southern Germany (Ulm). Int. J. Legal Med. 2006;120:310–314. doi: 10.1007/s00414-006-0114-x.
  12. Tetzlaff S., Brandstätter A., Wegener R., Parson W., Weirich V. Mitochondrial DNA population data of HVS-I and HVS-II sequences from a northeast German sample. Forensic Sci. Int. 2007;172:218–224. doi: 10.1016/j.forsciint.2006.12.016.
  13. Zimmermann B., Bodner M., Amory S., Fendt L., Röck A., Horst D., Horst B., Sanguansermsri T., Parson W., Brandstätter A. Forensic and phylogeographic characterization of mtDNA lineages from northern Thailand (Chiang Mai). Int. J. Legal Med. 2009;123:495–501. doi: 10.1007/s00414-009-0373-4.
