Abstract
The internal transcribed spacer (ITS) region of the nuclear ribosomal repeat unit is the most popular locus for species identification and subgeneric phylogenetic inference in sequence-based mycological research. The region is known to show certain variability even within species, although its intraspecific variability is often held to be limited and clearly separated from interspecific variability. The existence of such a divide between intra- and interspecific variability is implicitly assumed by automated approaches to species identification, but whether intraspecific variability indeed is negligible within the fungal kingdom remains contentious. The present study estimates the intraspecific ITS variability in all fungi presently available to the mycological community through the international sequence databases. Substantial differences were found within the kingdom, and the results are not easily correlated to the taxonomic affiliation or nutritional mode of the taxa considered. No single unifying yet stringent upper limit for intraspecific variability, such as the canonical 3% threshold, appears to be applicable with the desired outcome throughout the fungi. Our results caution against simplified approaches to automated ITS-based species delimitation and reiterate the need for taxonomic expertise in the translation of sequence data into species names.
Keywords: fungi, molecular species delimitation, intraspecific variation
Introduction
That DNA sequence information is assigned material importance in contemporary mycology is exemplified by the recent notion of fungal barcoding, which seeks to standardize DNA-borne species identification through the use of one or more gene sequences from aptly chosen and annotated reference specimens (Hebert et al. 2003; Seifert et al. 2007; Bruns et al. 2008). The need for such protocols in mycology is patently clear: the vast number of extant fungi coupled with the dwindling number of taxonomic experts and the recondite nature of fungal life jointly make a persuasive case for barcoding-type approaches to species identification in the fungi (Guarro et al. 1999; Taylor et al. 2000; Schadt et al. 2003). The most popular locus for DNA-based mycological studies at the subgeneric level, and hence for species identification, is the internal transcribed spacer (ITS) region of the nuclear ribosomal repeat unit (Horton and Bruns, 2001; Bridge et al. 2005). This multi-copy, tripartite, and roughly 550-basepair (bp) segment combines the advantages of resolution at various scales (ITS1: rapidly evolving, 5.8S: very conserved, ITS2: moderately rapid to rapid; Hillis and Dixon, 1991; Hershkovitz and Lewis, 1996) with the ease of amplification of a multi-copy region into a readily obtainable product whose variability typically reflects synapomorphies at the species level.
Genome scans and novel molecular insights have brought attention to other genes of various copy number, notably the mitochondrial cytochrome c oxidase I (COI; Hebert et al. 2003; Little and Stevenson, 2007; Seifert et al. 2007), that potentially could meet the occasional shortcomings of the multi-copy ITS region, such as pleomorphism and alignment difficulties (c.f. Álvarez and Wendel, 2003; Avis et al. 2006; Feliner and Rosselló, 2007). While the use of these new regions for purposes of species identification is certain to complement, perhaps even replace, that of the ITS region in some groups of fungi (Geiser et al. 2004), the difficulty associated with their generic primer design and amplification from low-quantity samples such as herbarium specimens suggests that the much more easily amplified ribosomal DNA will remain in frequent use for the foreseeable future (Tautz et al. 2003; Bruns and Shefferson, 2004; Blaxter et al. 2005). Many aspects of the nuclear ribosomal repeat region are only partly understood, however, and the prospects of the region as a barcode for the fungi have mainly been evaluated within limited taxonomic scopes. Using all 4185 available, fully identified fungal species represented by at least two satisfactory ITS sequences in the International Nucleotide Sequence Database (INSD: GenBank, EMBL, DDBJ; Benson et al. 2007), this study pursues the following questions:
Can intraspecific ITS variability in the fungi be captured in one generally applicable yet stringent interval, such as 0–3%?
It is often assumed, implicitly or otherwise, that fungal intraspecific variability is comparatively low and generally applicable across the kingdom such that it can be represented by a percentage interval, notably 0–3% (c.f. Cohan, 2002; Izzo et al. 2005; Ciardo et al. 2006). While this indeed seems to be the case for some groups of fungi (Druzhinina et al. 2005; Hinrikson et al. 2005; Smith et al. 2007), such a contention probably does not hold true for others (Martin et al. 2002; Edwards and Turco, 2005). As the absence of such a fungal-wide interval would be expected to compromise automated attempts at separation of intra- and interspecific variation, it would be of value to attain detailed knowledge on intraspecific ITS variability in all fungi presently available to the mycological community through INSD.
Is ITS1 always more variable than ITS2?
Much attention has been focused on ITS1 as the more variable sublocus of the two and thereby, presumably, the better species marker (Chen et al. 2001; Narutaki et al. 2002; Hinrikson et al. 2005). There are, however, observations to the contrary (Leaw et al. 2006), and knowledge of the extent of this deviance, as well as of any systematic signal in it, would add to our understanding of how ITS-based species identification efforts should best be designed.
Is the ITS a straightforward barcode region for the fungi?
The ITS region is more often advocated than cautioned against as a vector for species identification in fungi, but these reports are typically based on subsets of fungi, often at the family level or lower (Sugita et al. 1999; Henry et al. 2000; Iwen et al. 2002). The picture emerging from joint analysis of all available fungal ITS sequences should be highly indicative of the performance of the ITS as a barcode region in the fungi.
This study uses the INSD data on an as-is basis. It is well known that the taxonomic reliability in public sequence databases is less than ideal (Bridge et al. 2003; Binder et al. 2005; Nilsson et al. 2006) and that there are other compounding factors such as chimeric sequences and obsolete classification systems and synonyms (Ashelford et al. 2005; Bidartondo et al. 2008; Ryberg et al. 2008) that render difficult the extrication of true taxonomic signal from the welter of noise surrounding it. While this will always hamper automated approaches to en masse sequence analysis, the present study takes several measures to account for these complications so as to provide reasonably objective answers to the above questions.
Materials and Methods
Compilation of data
All fungal ITS sequences identified to species level in INSD as of August 6, 2007 were downloaded using emerencia 1.0 (Nilsson et al. 2005). The 2995 entries identified by Nilsson et al. (2006) as potentially misidentified or otherwise problematic were discarded from the study. Similarly, sequences with fewer than 100 bp in either ITS1 or ITS2, as well as sequences with more than 1% IUPAC DNA ambiguity symbols in any of the three ITS subregions, were excluded. Hidden Markov Models (HMMs) of the flanking nuclear small sub-unit (nSSU), 5.8S, and nuclear large sub-unit (nLSU) were constructed from the large-scope fungal alignments of Tehler et al. (2003); Larsson et al. (2004); Binder et al. (2005); and James et al. (2006) using HMMER 2.3.2 (Eddy, 1998). After calibration, the HMMs enabled in silico extraction of ITS1, 5.8S, and ITS2 from the downloaded sequences using Perl (Supplementary Document 1).
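The two exclusion criteria stated above (spacer length and proportion of ambiguity symbols) can be illustrated with a minimal Python sketch. It is not the Perl code of Supplementary Document 1; the function names `ambiguity_fraction` and `passes_filter` are hypothetical, and the sketch assumes that the three subregions have already been extracted as plain strings.

```python
# IUPAC ambiguity symbols, i.e. nucleotide codes other than A, C, G, T/U.
AMBIGUITY_CODES = set("RYSWKMBDHVN")


def ambiguity_fraction(seq):
    """Return the fraction of positions in `seq` that are ambiguity symbols."""
    if not seq:
        return 0.0
    return sum(base in AMBIGUITY_CODES for base in seq.upper()) / len(seq)


def passes_filter(its1, s58, its2, min_spacer_len=100, max_ambiguity=0.01):
    """Apply the two criteria from the text (hypothetical helper).

    A sequence is kept only if ITS1 and ITS2 are each at least
    `min_spacer_len` bp long and none of the three subregions has
    more than `max_ambiguity` (1%) ambiguity symbols.
    """
    if len(its1) < min_spacer_len or len(its2) < min_spacer_len:
        return False
    return all(ambiguity_fraction(region) <= max_ambiguity
               for region in (its1, s58, its2))
```

For example, a record whose ITS1 is only 80 bp long would be rejected regardless of its ambiguity content, whereas a 121-bp ITS1 containing a single `N` (about 0.8%) would pass.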
Data analysis
Intraspecific pairwise alignments of all loci considered (ITS1, 5.8S, ITS2, and jointly) were generated in Clustal W 1.83 (Thompson et al. 1997) for all 4185 species for which satisfactory INSD data from two or more specimens were available. Sequence similarity in the form of absolute, uncorrected (Hamming) distances (c.f. Minichini and Sciarrino, 2006) was computed in Python for all combinations of two conspecific specimens (Supplementary Document 1); from these distance matrices, the median intraspecific similarity of each species was retrieved using the statistical language R 2.5.1 (R Development Core Team, 2007; Supplementary Document 1), so as to further reduce the impact of potentially contestable records. For the 16 species represented by more than 100 ITS sequences in INSD, the estimates were based on a random sample of 100 sequences from these. To derive global values for the intraspecific variability of the kingdom Fungi and its five conceptual phyla (Ascomycota, Basidiomycota, Chytridiomycota, Glomeromycota, and Zygomycota), weighted averages with weights proportional to the number of available sequences for each species were computed; the weighting scheme employed assigns higher importance to well-sampled species without disregarding more poorly represented species.
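The chain of computations described above (uncorrected distance per aligned pair, median per species, weighted average across species) can be sketched in Python as follows. This is an illustration only and not the code of Supplementary Document 1. All function names are hypothetical, the input is assumed to be pairs of sequences that have already been aligned, the treatment of gaps is an assumption, and the weighted standard deviation uses a simple population-style formula that the text does not specify.

```python
import math
import random
from statistics import median

GAP = "-"


def p_distance(a, b):
    """Uncorrected (Hamming) distance as a proportion of compared columns.

    Assumption: columns in which both sequences have a gap are skipped,
    and a gap opposite a base counts as a difference.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper())
               if not (x == GAP and y == GAP)]
    if not columns:
        return 0.0
    return sum(x != y for x, y in columns) / len(columns)


def median_intraspecific_distance(aligned_pairs):
    """Median distance over all aligned pairs of conspecific sequences."""
    return median(p_distance(a, b) for a, b in aligned_pairs)


def subsample(sequences, max_n=100, seed=None):
    """Random sample of at most `max_n` sequences, as for richly sampled species."""
    sequences = list(sequences)
    if len(sequences) <= max_n:
        return sequences
    return random.Random(seed).sample(sequences, max_n)


def weighted_mean_sd(values, weights):
    """Weighted mean and weighted (population-style) standard deviation."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(variance)
```

Using the median rather than the mean per species means that a single deviant sequence among many conspecific records shifts the estimate only slightly, which matches the stated aim of reducing the impact of contestable records.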
Results and Discussion
The ease with which the ITS region can be amplified from a variety of fungi in various morphs and states of preservation, as well as its high level of synapomorphic variability in many groups of fungi, has given impetus to several ITS-based barcoding-type efforts for select groups of fungi (e.g. Druzhinina et al. 2005; Kopchinskiy et al. 2005; Kõljalg et al. 2005). Given the decisive role assigned to the region, it may perhaps seem curious that many of its facets remain poorly understood, and the present study seeks to provide the data needed to examine these in a critical way. The questions pursued and the results obtained are constrained by, and to some extent reflective of, the wanting state of taxonomic reliability in the public sequence databases. Attempts were made to correct for outlier sequences, thereby abating the impact of inconsistent application of species names and the vagaries of laboratory work. Even so, for some common root- and soil-associated fungi such as Rhizoctonia, Latin binomials seem little more than convenient placeholders under which specimens are subsumed in the absence of conflicting, but also confirmatory, evidence. Our results furthermore capture taxonomic complications such as the hypothesized presence of hybridisation in Tricholoma sulphureum (Comandini et al. 2004) and cryptic speciation in Laetiporus sulphureus (Rogers et al. 1999). As such, the data obtained seem to corroborate one of the corollaries arising from the barcoding debate, namely that it may not lie in the interest of the mycological community to allow open and non-validated submission of barcodes to the international sequence databases. Similarly, continuous curation of taxonomic and nomenclatural aspects of reference sequences on the part of both the sequence authors and the database in question appears a crucial element of molecular mycology.
One noticeable aspect of our assessment of fungal intraspecific variability is that the uncertainty of the estimates tends to decrease as the number of conspecific sequences available for any given species increases (Supplementary Document 2). In other words, more than a few conspecific sequences may be required to encompass the genetic variation found among populations of distinct localities. This observation calls into question the considerable number of barcoding studies based on fewer than a handful of collections per species (c.f. Little and Stevenson, 2007) and indeed the use of a single, defining sequence as arbiter of conspecificity in the first place. Other conveyors of amalgamated information, notably HMMs and multiple alignments (Eddy, 1998; Nilsson et al. 2004), appear much more suited to capture and relay such complexity.
Intraspecific ITS variability
The fungal intraspecific ITS variability as expressed in INSD does not readily lend itself to partitioning into clearly defined units. As defined in Materials and Methods, the weighted average of the intraspecific ITS variability of the kingdom Fungi is 2.51% with a standard deviation (SD) of 4.57 (Ascomycota: 1.96%, SD 3.73; Basidiomycota: 3.33%, SD 5.62; Chytridiomycota: 5.63%, SD 10.49; Glomeromycota: 7.46%, SD 4.14; Zygomycota: 3.24%, SD 6.12; Table 1). The comparatively well-studied Dikarya (Ascomycota and Basidiomycota) stands out as less variable than the basal fungal lineages, although these regions of the kingdom are rather sparsely represented by ITS sequences such that taxonomic intricacies and the deficient state of some sequence data are likely to attain a higher degree of penetration for these taxa. The canonical 3% threshold value for intraspecific variation fares surprisingly well for the fungi (Fig. 1), but it is nevertheless refuted by multiple examples from all fungal phyla (Supplementary Document 3).
Table 1.
Statistics on all species and sequences included in this study; data as of August 6, 2007. The standard deviation is shown in brackets as applicable.
Number of included species | 4185 (973 genera) |
Number of species excluded due to being represented by only one (satisfactory) sequence | 5428 |
Average number of sequences per species | 7 |
Total number of pairwise alignments of the study | ~2 million (13 Gb) |
Percentage of the estimated 1.5 million fungal species represented by at least two ITS sequences | 0.28% |
Median length of the loci (bp) | 183 (ITS1), 158 (5.8S), 173 (ITS2) |
Weighted intraspecific ITS variability in the kingdom Fungi | 2.51% [4.57] |
Weighted intraspecific ITS variability for the five conceptual phyla of the kingdom | |
Ascomycota (2509 species) | 1.96% [3.73] |
Basidiomycota (1582 species) | 3.33% [5.62] |
Chytridiomycota (11 species) | 5.63% [10.49] |
Glomeromycota (23 species) | 7.46% [4.14] |
Zygomycota (60 species) | 3.24% [6.12] |
Most abundantly represented species for each of the five conceptual phyla of the kingdom, their intraspecific variability, and the number of sequences | |
Fusarium solani (Ascomycota) | 3.1%, 542 |
Thanatephorus cucumeris (Basidiomycota) | 15.7%, 608 |
Olpidium brassicae (Chytridiomycota) | 2.0%, 18 |
Glomus intraradices (Glomeromycota) | 8.7%, 92 |
Rhizopus oryzae (Zygomycota) | 0.9%, 143 |
Correlation coefficient for variability between ITS1 and ITS2 | 0.87 (p-value less than 10⁻¹⁶) |
Percentage of species where ITS2 is more variable than ITS1 | 34% |
Percentage of species where ITS1 and ITS2 differ in variability by less than 0.5% | 91% |
Percentage of species with either ITS1 or ITS2 fully conserved and the other one at least 0.25% variable | 22% |
Percentage of species with fully conserved ITS region | 22% |
Percentage of species with intraspecific variability ≤3% | 75% (ITS1), 77% (ITS2), 80% (ITS1, 5.8S, ITS2) |
Percentage of species where the intraspecific variability of 5.8S is ≤0.5% | 80% |
Figure 1.
The proportion of fungal species in this study having intraspecific variabilities in the ranges depicted of (A) the ITS1 region, (B) the ITS2 region, and (C) the ITS1, 5.8S, and ITS2 regions combined.
Interestingly, our results also offer examples of well- and independently sampled species with low or no intraspecific variability (e.g. Boletus pinophilus and Serpula lacrymans; Supplementary Document 3). The wide spread in intraspecific variability observed testifies to the apparent futility of trying to find a single unifying yet stringent fungal-wide cut-off value to demarcate intra- from interspecific variability (Fig. 1; Supplementary Document 2). Such divides between intra- and interspecific variability (barcoding gaps) will, if at all in existence, have to be sought in different regions of variation space depending on the taxa under consideration; there is furthermore little to suggest that such divides in similarity could be deduced by taxonomic knowledge and logic alone. Even when collapsing the mushroom-forming Agaricomycetes in Supplementary Document 3 into formal orders as applicable (Hibbett et al. 2007), no such group could be classified as “easily barcoded”: all orders feature taxa that are considerably below, roughly at, and markedly above the fungal-wide average for intraspecific ITS variability. A similar pattern is observed when grouping these fungi according to putative nutritional mode, but whether these inferences will persist in the light of extended taxon sampling remains at issue.
Internal ITS variability
Our results show that the variability of ITS1, at least on average, exceeds that of ITS2 (Fig. 1). The difference in variability is noticeable at times (e.g. Hypocrea citrina and Malassezia furfur, both with >3% difference). In other cases it is less conspicuous, as in Boletus edulis and Cordyceps bassiana (less than 1% difference) and in Agaricus bisporus and Alternaria brassicae (no difference). For 34% of the fungal species compared, however, ITS2 is more variable than ITS1, which refutes the common assumption that ITS1 is always the more variable spacer of the ITS region. Indeed, it would seem likely that for certain taxa, ITS2 represents a better vector of low-level taxonomic information. We did not find evidence, however, for any phylum-wide systematic component to this observation, as comparatively higher levels of ITS2 variability could not be significantly related to phylum-wise affiliation (Fisher’s exact test, p-value >0.05). The overall variation in ITS1 and ITS2 was found to be highly correlated (0.87; Supplementary Document 2), which supports the view that the two regions do not evolve independently of one another.
The 5.8S is typically fully conserved within a species, and the variation sometimes observed is negligible (weighted average: 0.21%, SD 0.67). Counterintuitively, therefore, the region can be expected to interfere with pre-defined threshold values for intraspecific variation. Supplementary Document 3 shows that even if both the ITS1 and ITS2 are more than 3% variable within one and the same species, the inclusion of the very conserved 5.8S may serve to reduce the apparent variability of the joint region into less than 3%, thereby masking the distinctness indicated by its flanking regions (as is the case for, e.g. Rhizopogon roseolus and Xerocomus subtomentosus). Our data suggest that the 5.8S, while arguably useful in other contexts (Hershkovitz and Lewis, 1996; Larsson et al. 2004), may be best left out from such estimates.
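The masking effect described above follows from simple arithmetic, which the following Python sketch illustrates. The region lengths are the median locus lengths reported in Table 1; the variability values (3.5% for each spacer, 0% for the 5.8S) are hypothetical and chosen only for illustration. The sketch further assumes that the joint variability can be approximated as the length-weighted average of the regional values, which holds for uncorrected distances over concatenated regions.

```python
def joint_variability(regions):
    """Length-weighted average variability across regions.

    `regions` is a list of (length_bp, variability_fraction) tuples.
    """
    total_length = sum(length for length, _ in regions)
    expected_differences = sum(length * var for length, var in regions)
    return expected_differences / total_length


# Median lengths from Table 1; variability values are hypothetical.
example = [
    (183, 0.035),  # ITS1, above the 3% threshold
    (158, 0.0),    # 5.8S, fully conserved
    (173, 0.035),  # ITS2, above the 3% threshold
]

joint = joint_variability(example)
print(f"Joint ITS variability: {joint:.2%}")  # prints "Joint ITS variability: 2.42%"
```

In this hypothetical case both spacers exceed 3% on their own, yet the joint region falls below 3% because roughly a third of its columns come from the conserved 5.8S.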
The ITS region as a fungal barcode
The ever-increasing prevalence of fungal environmental samples generated in ecological studies accentuates the need for automated, high-throughput approaches to species identification, and many such initiatives are indeed centered around the ITS region. This study shows that the ITS region is not equally variable in all groups of fungi (Table 2) and that the variation does not seem to be easily correlated to the systematic affiliation or nutritional mode of the species. These disparities speak against automated species delimitation using, for example, a global 3% cut-off value. To devise efficient fungal barcodes based on the ITS region will require, it would seem, far-reaching taxonomic knowledge specific to each group of fungi; a large number of conspecific specimens from as many populations and geographical regions as can be reasonably achieved; and possibly the erection of one or more tailored, closed-submission databases for the purpose. Criticism has been raised against the barcoding community for not taking these matters seriously enough (Wheeler, 2004; Will et al. 2005; Meier et al. 2006), and the present study lends further weight to the importance of these claims.
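As a small illustration of why a global cut-off is problematic, the following Python sketch applies a 3% threshold to the intraspecific variability values reported in Table 2. The data are copied from that table; the dictionary layout and the function name `exceeding` are hypothetical. Species flagged here are those whose conspecific sequences, at the median, differ by more than the threshold and which would therefore be at risk of being split under a fixed 3% rule.

```python
# Intraspecific ITS variability (%) for the species listed in Table 2.
TABLE2 = {
    "Ascomycota": {
        "Aspergillus fumigatus": 0.2, "Candida albicans": 0.2,
        "Fusarium solani": 3.1, "Saccharomyces cerevisiae": 0.8,
        "Xanthoria parietina": 0.6, "Xylaria hypoxylon": 24.2,
    },
    "Basidiomycota": {
        "Amanita muscaria": 0.9, "Boletus edulis": 0.3,
        "Coprinopsis echinospora": 2.6, "Filobasidiella neoformans": 0.0,
        "Puccinia graminis": 2.4, "Rhizoctonia bataticola": 17.3,
        "Ustilago maydis": 0.5,
    },
    "Chytridiomycota": {
        "Olpidium brassicae": 2.0, "Blastocladiella emersonii": 2.0,
    },
    "Glomeromycota": {
        "Archaeospora leptoticha": 9.8, "Glomus intraradices": 8.7,
        "Glomus mosseae": 5.9, "Paraglomus occultum": 19.5,
    },
    "Zygomycota": {
        "Absidia corymbifera": 0.7, "Endogone pisiformis": 2.6,
        "Mucor racemosus": 8.4, "Rhizopus oryzae": 0.9,
        "Zoophthora radicans": 1.5,
    },
}


def exceeding(threshold=3.0):
    """Per phylum, list the species whose variability exceeds `threshold` (%)."""
    return {
        phylum: sorted(name for name, value in species.items() if value > threshold)
        for phylum, species in TABLE2.items()
    }


for phylum, names in exceeding().items():
    print(f"{phylum}: {len(names)} of {len(TABLE2[phylum])} above 3%")
```

Eight of the 24 species in Table 2 exceed the threshold, and they are unevenly spread across the phyla, with all four Glomeromycota above it and neither of the two Chytridiomycota.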
Table 2.
Intraspecific ITS variability of select species from each of the five conceptual phyla of the kingdom Fungi. Taxon selection was influenced by scheduled, ongoing, and completed genome projects.
Taxonomic affiliation | Sequences | Intraspecific ITS variability |
---|---|---|
Ascomycota | ||
Aspergillus fumigatus | 43 | 0.2% |
Candida albicans | 56 | 0.2% |
Fusarium solani | 542 | 3.1% |
Saccharomyces cerevisiae | 145 | 0.8% |
Xanthoria parietina | 54 | 0.6% |
Xylaria hypoxylon | 13 | 24.2% |
Basidiomycota | ||
Amanita muscaria | 45 | 0.9% |
Boletus edulis | 22 | 0.3% |
Coprinopsis echinospora | 7 | 2.6% |
Filobasidiella neoformans | 114 | 0.0% |
Puccinia graminis | 28 | 2.4% |
Rhizoctonia bataticola | 6 | 17.3% |
Ustilago maydis | 5 | 0.5% |
Chytridiomycota | ||
Olpidium brassicae | 18 | 2.0% |
Blastocladiella emersonii | 2 | 2.0% |
Glomeromycota | ||
Archaeospora leptoticha | 62 | 9.8% |
Glomus intraradices | 92 | 8.7% |
Glomus mosseae | 84 | 5.9% |
Paraglomus occultum | 12 | 19.5% |
Zygomycota | ||
Absidia corymbifera | 9 | 0.7% |
Endogone pisiformis | 3 | 2.6% |
Mucor racemosus | 9 | 8.4% |
Rhizopus oryzae | 143 | 0.9% |
Zoophthora radicans | 7 | 1.5% |
Conclusions
A plexus of pleomorphic organisms, fungi often defy assignment to genus or even family level, and it is becoming progressively apparent that molecular information will soon take over the role as the primary source for reliable species identification in all but a few groups of fungi. It is moreover clear that these methods have only begun to reveal the true face of fungal diversity in that the absolute majority of fungi still await discovery and formal description (Hawksworth, 2001; Blackwell et al. 2006; Schmit and Mueller, 2007). Much of this diversity is recovered from ecological samples such as soil and plant debris in total absence of any physical manifestation of the fungi present. The mere observation that the multi-copy ITS region can be amplified from these low-quantity samples, whereas many low- and single-copy genes currently cannot, implies that the ITS region will remain a mycological cornerstone for a long time to come. That the region typically shows variation within, and to an even larger extent among, species turns the region into a valuable vector for mycological pursuits, although one for which not all preconceived ideas and assumptions hold true. The large number of fungi for which the ITS has been generated further serves to increase the usefulness of the region for purposes of comparison, but whether it will ever be truly useful also for automated species delimitation remains an open question, and one that the present results do not seem to answer in the affirmative.
Supplementary Material
Supplementary Document 1. Perl code used for in silico extraction of ITS1, 5.8S, and ITS2 from the fungal ITS-region sequences in INSD using HMMs; Python code used for alignment and similarity comparison; and R code used for calculating the statistics. Released under the GNU-GPLv2 software license.
Supplementary Document 2. (a) A histogram of the number of fungal ITS sequences per species as included in this study, showing that the majority of species are represented by fewer than five sequences. (b) The number of conspecific sequences plotted against the median intraspecific variability for the species in question, showing a decrease in the uncertainty of the estimates with higher number of sequences. Deviant sequences attain a higher degree of penetration in sparsely sampled species than in more richly sampled ones, where the larger sample sizes lead to estimates of smaller variance. (c) The variability of ITS1 (x axis) plotted against that of ITS2 (y axis) on a logarithmic scale. The correlation coefficient is 0.87 (p-value <10⁻¹⁶). (d) A histogram of the number of species in the study with an intraspecific variability in the ranges indexed, showing the asymmetric, long-tailed distribution of intraspecific variability. Jointly with (a), the histogram gives a good overview of the present state of ITS-borne sampling of fungi.
Supplementary Document 3. Estimated intraspecific variability of all 4185 fungal species of this study; results broken down by ITS1, 5.8S, ITS2, and all combined. The number of sequences underlying the estimates, as well as the phylum-wise affiliation as given in INSD, are indicated. Extreme values are likely, but not necessarily bound, to hint at the presence of cryptic species or other unresolved taxonomic issues, laboratory artefacts, or additional compounding factors and were found to be distributed in all phyla in proportion to their size (χ² test: p-value >0.2). In the interest of completeness, no such entries were left out from the study. Only organisms annotated in INSD as belonging to the kingdom Fungi are included; organisms traditionally treated as “fungal allies” but now known to belong elsewhere were not targeted in this study.
Acknowledgements
Financial support from the foundations of Anna and Gunnar Vidfelt and Wilhelm and Martina Lundgren (RHN) and from Helge Ax:son Johnson and KVVS (MR) is gratefully acknowledged. Only freely available software was used in the making of this study.
References
- Álvarez I, Wendel JF. Ribosomal ITS sequences and plant phylogenetic inference. Mol Phylogenet Evol. 2003;29:417–34. doi: 10.1016/s1055-7903(03)00208-2. [DOI] [PubMed] [Google Scholar]
- Ashelford KE, Chuzhanova NA, Fry JC, et al. At least 1 in 20 26S rRNA sequence records currently held in public repositories is estimated to contain substantial anomalies. Appl Environ Microbiol. 2005;71:7724–36. doi: 10.1128/AEM.71.12.7724-7736.2005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Avis PG, Dickie IA, Mueller GM. A ‘dirty’ business: Testing the limitations of terminal restriction fragment length polymorphism TRFLP) analysis of soil fungi. Mol Ecol. 2006;15:873–82. doi: 10.1111/j.1365-294X.2005.02842.x. [DOI] [PubMed] [Google Scholar]
- Benson DA, Karsch-Mizrachi I, Lipman DJ, et al. GenBank. Nucleic Acids Res. 2007;35:D21–D5. doi: 10.1093/nar/gkl986. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bidartondo M, Bruns TD, Blackwell M, et al. Preserving accuracy in GenBank. Science. 2008;319:1616. doi: 10.1126/science.319.5870.1616a. [DOI] [PubMed] [Google Scholar]
- Binder M, Hibbett DS, Larsson KH, et al. The phylogenetic distribution of resupinate forms across the major clades of mushroom-forming fungi (Homobasidiomycetes) Syst Biodiv. 2005;3:113–57. [Google Scholar]
- Blackwell M, Hibbett DS, Taylor JW, et al. Research coordination networks: a phylogeny for kingdom Fungi (Deep Hypha) Mycologia. 2006;98:829–37. doi: 10.3852/mycologia.98.6.829. [DOI] [PubMed] [Google Scholar]
- Blaxter M, Mann J, Chapman T, et al. Defining operational taxonomic units using DNA barcode data. Philos Trans R Soc Lond B Biol Sci. 2005;360:1935–43. doi: 10.1098/rstb.2005.1725. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bridge PD, Roberts PJ, Spooner BM, et al. On the unreliability of published DNA sequences. New Phytol. 2003;160:43–8. doi: 10.1046/j.1469-8137.2003.00861.x. [DOI] [PubMed] [Google Scholar]
- Bridge PD, Spooner BM, Roberts PJ. The impact of molecular data in fungal systematics. Adv Bot Res. 2005;42:33–67. [Google Scholar]
- Bruns TD, Arnold AE, Hughes KW. Fungal networks made of humans: UNITE, FESIN, and frontiers in fungal ecology. New Phyt. 2008;177:586–8. doi: 10.1111/j.1469-8137.2008.02341.x. [DOI] [PubMed] [Google Scholar]
- Bruns TD, Shefferson RP. Evolutionary studies of ectomycorrhizal fungi: recent advances and future directions. Can J Bot. 2004;82:1122–32. [Google Scholar]
- Chen YC, Eisner JD, Kattar MM, et al. Polymorphic internal transcribed spacer region 1 DNA sequences identify medically important yeasts. J Clin Microbiol. 2001;39:4042–51. doi: 10.1128/JCM.39.11.4042-4051.2001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ciardo DE, Schär G, Böttger EC, et al. Internal transcribed spacer sequencing versus biochemical profiling for identification of medically important yeasts. J Clin Microbiol. 2006;44:77–84. doi: 10.1128/JCM.44.1.77-84.2006. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cohan FM. What are bacterial species. Ann Rev Microbiol. 2002;56:457–87. doi: 10.1146/annurev.micro.56.012302.160634. [DOI] [PubMed] [Google Scholar]
- Comandini O, Haug I, Rinaldi AC, et al. Uniting Tricholoma sulphureum and T. bufonium. Mycol Res. 2004;108:11620–71. doi: 10.1017/s095375620400084x. [DOI] [PubMed] [Google Scholar]
- Druzhinina IS, Kopchinskiy AG, Komon-Zelazowska M, et al. An oligonucleotide barcode for species identification in Trichoderma and Hypocrea. Fungal Genet Biol. 2005;42:813–28. doi: 10.1016/j.fgb.2005.06.007. [DOI] [PubMed] [Google Scholar]
- Eddy SR. Profile hidden Markov models. Bioinformatics. 1998;14:755–63. doi: 10.1093/bioinformatics/14.9.755. [DOI] [PubMed] [Google Scholar]
- Edwards IP, Turco RF. Inter- and intraspecific resolution of nrDNA TRFLP assessed by computer-simulated restriction analysis of a diverse collection of ectomycorrhizal fungi. Mycol Res. 2005;109:212–26. doi: 10.1017/s0953756204002151. [DOI] [PubMed] [Google Scholar]
- Feliner GN, Rosselló JA. Better the devil you know? Guidelines for insightful utilization of nrDNA ITS in species-level evolutionary studies in plants. Mol Phylogenet Evol. 2007;44:911–9. doi: 10.1016/j.ympev.2007.01.013. [DOI] [PubMed] [Google Scholar]
- Geiser DM, Jiménez-Gasco M, Kang S, et al. FUSARIUM-ID v.1.0: A DNA sequence database for identifying Fusarium. Eur J Plant Pathol. 2004;110:473–9. [Google Scholar]
- Guarro J, Gené J, Stchigel AM. Developments in fungal taxonomy. Clin Microbiol Rev. 1999;12:454–500. doi: 10.1128/cmr.12.3.454. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hawksworth DL. The magnitude of fungal diversity: the 1.5 million species estimate revisited. Mycol Res. 2001;105:1422–32. [Google Scholar]
- Hebert PDN, Cywinska A, Ball SL, et al. Biological identifications through DNA barcodes. Proc R Soc Lond B. 2003;270:313–21. doi: 10.1098/rspb.2002.2218. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Henry T, Iwen PC, Hinrichs SH. Identification of Aspergillus species using internal transcribed spacer regions 1 and 2. J Clin Microbiol. 2000;38:1510–5. doi: 10.1128/jcm.38.4.1510-1515.2000. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hershkovitz MA, Lewis LA. Deep-level diagnostic value of the rDNA-ITS region. Mol Biol Evol. 1996;13:1276–95. doi: 10.1093/oxfordjournals.molbev.a025693. [DOI] [PubMed] [Google Scholar]
- Hibbett DS, Binder M, Bischoff JF, et al. A higher-level phylogenetic classification of the Fungi. Mycol Res. 2007;111:509–47. doi: 10.1016/j.mycres.2007.03.004. [DOI] [PubMed] [Google Scholar]
- Hillis DM, Dixon MT. Ribosomal DNA: Molecular evolution and phylogenetic inference. Q Rev Biol. 1991;66:411–53. doi: 10.1086/417338. [DOI] [PubMed] [Google Scholar]
- Hinrikson HP, Hurst SF, Lott TJ, et al. Assessment of ribosomal large-subunit D1–D2, internal transcribed region spacer 1, and internal transcribed spacer 2 regions as targets for molecular identification of medically important Aspergillus species. J Clin Microbiol. 2005;43:2092–103. doi: 10.1128/JCM.43.5.2092-2103.2005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Horton TR, Bruns TD. The molecular revolution in ectomycorrhizal ecology: peeking into the black-box. Mol Ecol. 2001;10:1855–71. doi: 10.1046/j.0962-1083.2001.01333.x.
- Iwen PC, Hinrichs SH, Rupp ME. Utilization of the internal transcribed spacer regions as molecular targets to detect and identify human fungal pathogens. Med Mycol. 2002;40:87–109. doi: 10.1080/mmy.40.1.87.109.
- Izzo A, Agbowo J, Bruns TD. Detection of plot-level changes in ectomycorrhizal communities across years in an old-growth mixed-conifer forest. New Phytol. 2005;166:619–30. doi: 10.1111/j.1469-8137.2005.01354.x.
- James TY, Kauff F, Schoch CL, et al. Reconstructing the early evolution of Fungi using a six-gene phylogeny. Nature. 2006;443:818–22. doi: 10.1038/nature05110.
- Kopchinskiy A, Komo M, Kubicek CP, et al. TrichoBLAST: a multilocus database for Trichoderma and Hypocrea identifications. Mycol Res. 2005;109:658–60. doi: 10.1017/s0953756205233397.
- Kõljalg U, Larsson KH, Abarenkov K, et al. UNITE: a database providing web-based methods for the molecular identification of ectomycorrhizal fungi. New Phytol. 2005;166:1063–8. doi: 10.1111/j.1469-8137.2005.01376.x.
- Larsson KH, Larsson E, Kõljalg U. High phylogenetic diversity among corticioid homobasidiomycetes. Mycol Res. 2004;108:983–1002. doi: 10.1017/s0953756204000851.
- Leaw SN, Chang HC, Sun HF, et al. Identification of medically important yeast species by sequence analysis of the internal transcribed spacer regions. J Clin Microbiol. 2006;44:693–9. doi: 10.1128/JCM.44.3.693-699.2006.
- Little DP, Stevenson DW. A comparison of algorithms for the identification of specimens using DNA barcodes: examples from gymnosperms. Cladistics. 2007;23:1–21. doi: 10.1111/j.1096-0031.2006.00126.x.
- Martin F, Diez J, Dell B, et al. Phylogeography of the ectomycorrhizal Pisolithus species as inferred from nuclear ribosomal DNA ITS sequences. New Phytol. 2002;153:345–57.
- Meier R, Shiyang K, Vaidya G, et al. DNA barcoding and taxonomy in Diptera: a tale of high intraspecific variability and low identification success. Syst Biol. 2006;55:715–28. doi: 10.1080/10635150600969864.
- Minichini C, Sciarrino A. Mutation model for nucleotide sequences based on crystal basis. Biosystems. 2006;84:191–206. doi: 10.1016/j.biosystems.2005.11.003.
- Narutaki S, Takatori K, Nishimura H, et al. Identification of fungi based on the nucleotide sequence homology of their internal transcribed spacer 1 (ITS1) region. PDA J Pharm Sci Technol. 2002;56:90–8.
- Nilsson RH, Kristiansson E, Ryberg M, et al. Approaching the taxonomic affiliation of unidentified sequences in public databases – an example from the mycorrhizal fungi. BMC Bioinformatics. 2005;6:178. doi: 10.1186/1471-2105-6-178.
- Nilsson RH, Larsson KH, Ursing BM. galaxie – CGI scripts for sequence identification through automated phylogenetic analysis. Bioinformatics. 2004;20:1447–52. doi: 10.1093/bioinformatics/bth119.
- Nilsson RH, Ryberg M, Kristiansson E, et al. Taxonomic reliability of DNA sequences in public sequences databases: a fungal perspective. PLoS ONE. 2006;1:e59. doi: 10.1371/journal.pone.0000059.
- R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2007.
- Rogers SO, Holdenrieder O, Sieber TN. Intraspecific comparisons of Laetiporus sulphureus isolates from broadleaf and coniferous trees in Europe. Mycol Res. 1999;103:1245–51.
- Ryberg M, Nilsson RH, Kristiansson E, et al. Mining ecological metadata in GenBank: a case-study from Inocybe (Agaricales). BMC Evol Biol. 2007;8:50. doi: 10.1186/1471-2148-8-50.
- Schadt CW, Martin AP, Lipson DA, et al. Seasonal dynamics of previously unknown fungal lineages in tundra soils. Science. 2003;301:1359–61. doi: 10.1126/science.1086940.
- Schmit JP, Mueller GM. An estimate of the lower limit of global fungal diversity. Biodiversity Conserv. 2007;16:99–111.
- Seifert KA, Samson RA, deWaard JR, et al. Prospects for fungus identification using CO1 DNA barcodes, with Penicillium as a test case. Proc Natl Acad Sci USA. 2007;104:3901–6. doi: 10.1073/pnas.0611691104.
- Smith ME, Douhan GW, Rizzo DM. Intra-specific and intra-sporocarp ITS variation of ectomycorrhizal fungi as assessed by rDNA sequencing of sporocarps and pooled ectomycorrhizal roots from a Quercus woodland. Mycorrhiza. 2007;18:15–22. doi: 10.1007/s00572-007-0148-z.
- Sugita T, Nishikawa A, Ikeda R, et al. Identification of medically relevant Trichosporon species based on sequences of internal transcribed spacer regions and construction of a database for Trichosporon identification. J Clin Microbiol. 1999;37:1985–93. doi: 10.1128/jcm.37.6.1985-1993.1999.
- Tautz D, Arctander P, Minelli A, et al. A plea for DNA taxonomy. Trends Ecol Evol. 2003;18:70–4.
- Taylor JW, Jacobson DJ, Kroken S, et al. Phylogenetic species recognition and species concepts in fungi. Fungal Genet Biol. 2000;31:21–32. doi: 10.1006/fgbi.2000.1228.
- Tehler A, Little D, Farris JS. The full-length phylogenetic tree from 1551 ribosomal sequences of chitinous fungi, Fungi. Mycol Res. 2003;107:901–16. doi: 10.1017/s0953756203008128.
- Thompson JD, Gibson TJ, Plewniak F, et al. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997;25:4876–82. doi: 10.1093/nar/25.24.4876.
- Wheeler QD. Taxonomic triage and the poverty of phylogeny. Philos Trans R Soc Lond B Biol Sci. 2004;359:571–83. doi: 10.1098/rstb.2003.1452.
- Will KW, Mishler BD, Wheeler QD. The perils of DNA barcoding and the need for integrative taxonomy. Syst Biol. 2005;54:844–51. doi: 10.1080/10635150500354878.
Supplementary Materials
Perl code used for the in silico extraction of ITS1, 5.8S, and ITS2 from the fungal ITS-region sequences in INSD using HMMs; Python code used for alignment and similarity comparison; and R code used for calculating the statistics. The code is released under the GNU GPLv2 software license.
(a) A histogram of the number of fungal ITS sequences per species included in this study, showing that the majority of species are represented by fewer than five sequences. (b) The number of conspecific sequences plotted against the median intraspecific variability of the species in question, showing that the uncertainty of the estimates decreases as the number of sequences increases. Deviant sequences have a greater influence on the estimates for sparsely sampled species than for richly sampled ones, where the larger sample sizes yield estimates of smaller variance. (c) The variability of ITS1 (x axis) plotted against that of ITS2 (y axis) on a logarithmic scale. The correlation coefficient is 0.87 (p-value < 10⁻¹⁶). (d) A histogram of the number of species in the study whose intraspecific variability falls in each of the ranges indicated, showing the asymmetric, long-tailed distribution of intraspecific variability. Jointly with (a), the histogram gives a good overview of the present state of ITS-borne sampling of fungi.
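The quantities plotted in panels (b) and (c) can be illustrated with a minimal Python sketch. It is not the supplementary code itself: the function names, the input values, and the decision to skip zero-valued pairs before the logarithmic transform are assumptions made for illustration only.

```python
import math
from statistics import median


def median_pairwise_dissimilarity(pairwise):
    """Median of the pairwise dissimilarities (in percent) within one species."""
    return median(pairwise)


def log_pearson(xs, ys):
    """Pearson correlation of log10-transformed values.

    Assumption: pairs containing a zero are skipped, because log10(0) is
    undefined; the study may have handled such pairs differently.
    """
    pairs = [(math.log10(x), math.log10(y))
             for x, y in zip(xs, ys) if x > 0 and y > 0]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-species variability estimates (percent), not study data.
its1 = [0.5, 1.0, 2.0, 4.0]
its2 = [0.25, 0.5, 1.0, 2.0]
print(median_pairwise_dissimilarity([0.0, 1.0, 3.0]))
print(log_pearson(its1, its2))
```

Because the median is used as the per-species summary, a single deviant sequence shifts the estimate less than it would shift a mean, although its influence remains larger in species with few sequences.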
Estimated intraspecific variability of all 4185 fungal species of this study; results are given for ITS1, 5.8S, ITS2, and all three combined. The number of sequences underlying the estimates, as well as the phylum-wise affiliation as given in INSD, are indicated. Extreme values are likely, but not necessarily bound, to hint at the presence of cryptic species or other unresolved taxonomic issues, laboratory artefacts, or additional compounding factors; they were found to be distributed across all phyla in proportion to the size of each phylum (chi-square test: p-value > 0.2). In the interest of completeness, no such entries were omitted from the study. Only organisms annotated in INSD as belonging to the kingdom Fungi are included; organisms traditionally treated as “fungal allies” but now known to belong elsewhere were not targeted in this study.
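A test of whether extreme values are spread across phyla in proportion to phylum size can be sketched as a chi-square goodness-of-fit computation. The sketch below is illustrative only and is not the R code of the supplementary material: the function name, the counts, and the phylum sizes are hypothetical.

```python
def chi_square_proportional(observed, group_sizes):
    """Chi-square goodness-of-fit statistic against size-proportional expectations.

    observed    -- number of extreme-value species seen in each group
    group_sizes -- total number of species in each group
    Returns the test statistic and the degrees of freedom (groups - 1).
    """
    total_observed = sum(observed)
    total_size = sum(group_sizes)
    # Expected counts if extreme values were spread in proportion to group size.
    expected = [total_observed * size / total_size for size in group_sizes]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


# Hypothetical counts for three groups, not study data.
statistic, dof = chi_square_proportional([22, 13, 5], [2000, 1500, 500])
print(statistic, dof)
```

The p-value follows from the upper tail of the chi-square distribution with the returned degrees of freedom, for instance via `scipy.stats.chi2.sf(statistic, dof)` where SciPy is available. A large p-value, as reported here, gives no evidence that extreme values are concentrated in particular phyla.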