A software pipeline for processing and identification of fungal ITS sequences

R Henrik Nilsson; Gunilla Bok; Martin Ryberg; Erik Kristiansson; Nils Hallenberg

doi:10.1186/1751-0473-4-1

. 2009 Jan 15;4:1. doi: 10.1186/1751-0473-4-1

A software pipeline for processing and identification of fungal ITS sequences

Reviewed by: R Henrik Nilsson ^1,^✉, Gunilla Bok ¹, Martin Ryberg ¹, Erik Kristiansson ², Nils Hallenberg ¹

PMCID: PMC2649129 PMID: 19146660

Abstract

Background

Fungi from environmental samples are typically identified to species level through DNA sequencing of the nuclear ribosomal internal transcribed spacer (ITS) region for use in BLAST-based similarity searches in the International Nucleotide Sequence Databases. These searches are time-consuming and regularly require a significant amount of manual intervention and complementary analyses. We here present software – in the form of an identification pipeline for large sets of fungal ITS sequences – developed to automate the BLAST process and several additional analysis steps. The performance of the pipeline was evaluated on a dataset of 350 ITS sequences from fungi growing as epiphytes on building material.

Results

The pipeline was written in Perl and uses a local installation of NCBI-BLAST for the similarity searches of the query sequences. The variable subregion ITS2 of the ITS region is extracted from the sequences and used for additional searches of higher sensitivity. Multiple alignments of each query sequence and its closest matches are computed, and query sequences sharing at least 50% of their best matches are clustered to facilitate the evaluation of hypothetically conspecific groups. The pipeline proved to speed up the processing, as well as enhance the resolution, of the evaluation dataset considerably, and the fungi were found to belong chiefly to the Ascomycota, with Penicillium and Aspergillus as the two most common genera. The ITS2 was found to indicate a different taxonomic affiliation than did the complete ITS region for 10% of the query sequences, though this figure is likely to vary with the taxonomic scope of the query sequences.

Conclusion

The present software readily assigns large sets of fungal query sequences to their respective best matches in the international sequence databases and places them in a larger biological context. The output is highly structured to be easy to process, although it still needs to be inspected and possibly corrected for the impact of the incomplete and sometimes erroneously annotated fungal entries in these databases. The open source pipeline is available for UNIX-type platforms, and updated releases of the target database are made available biweekly. The pipeline is easily modified to operate on other molecular regions and organism groups.

Background

The fungal kingdom is thought to comprise upwards of 1.5 million species, a figure that contrasts sharply with the 97,000 species of fungi described so far [1,2]. The discrepancy between the estimated and the known fungal diversity is largely ascribable to the non-conspicuous nature of fungal life; most species are readily visible, and thus amenable for collection, only when they form fruiting-bodies or other propagation structures [3,4]. Yet fungi are ubiquitously present in all biota as mycelia or other, non-hyphal life stages, and studies have revealed that fungi may account for as much as 90% of the living biomass of some ecosystems [5]. The important ecological roles played by fungi in terms of nutrient recycling, symbiosis, and parasitism call for a deeper, taxonomy-oriented understanding of the mycoflora of the world, and one far more detailed than any fruiting-body based field inventory could ever bring about [6,7]. Mycologists of today thus face the daunting task of locating and identifying large sets of fungi from non-trivial substrates such as soil, decaying wood, and plant debris, often in the complete absence of any physical manifestation of the fungi present.

The last ten years have seen a surge in the use of DNA sequences as a means to seek to identify fungi from various substrates and in different life stages to species level [8,9]. The most frequently sequenced region for such purposes is the internal transcribed spacer (ITS) region of the nuclear ribosomal DNA [10]. Though it does not come without complications [11], this ~550 base-pair (bp), transcribed but non-coding region is present in upwards of 250 copies per cell [12], which makes it relatively straightforward to amplify even from scanty input material. The high level of variability in two of its three subregions (ITS1 and ITS2) coupled with the very low variability of the intercalary third subregion (5.8S) make the region an advantageous choice for assignment of a query sequence to species level or a higher fungal taxon [13,14]. Such inferences are typically achieved through similarity searches using BLAST [15] against the International Nucleotide Sequence Databases (INSD: GenBank, EMBL, DDBJ; [16]).

While BLAST is a powerful and versatile software package, its use over the web is dependent on server load and web traffic intensity. To compile and interpret its output in the case of multiple query sequences tends to be time-consuming and involve a significant amount of manual intervention. It is furthermore often necessary to re-run a certain proportion of the query sequences after the removal of overly conserved sequence segments – here the nuclear small subunit (nSSU/16S/18S), the 5.8S, and the nuclear large subunit (nLSU/25S/28S) – in order to obtain fine-grained detail [13,17-19]. It may even be necessary to compile a multiple alignment of certain query sequences and their respective best matches and analyse the alignment under more powerful optimality criteria than similarity – such as phylogenetic analysis – in order to obtain sufficient resolution [cf. [20-23]]. To process an authentic set of fungal sequences obtained through environmental sampling thus often proves a lengthy and laborious process. The present study attempts a remedy in the form of a Perl-based software pipeline for rapid BLAST processing of large sets (10–100,000 s) of fungal query sequences. The pipeline streamlines functions such as automated ITS2-based BLAST runs for finer precision, the computation of multiple alignments for subsequent use, and the generation of comprehensible and easily analysed summaries of the results. The pipeline is designed to work in any UNIX-type environment (including MacOS X, Linux, and BSD) for which NCBI-BLAST, HMMER [24], and one or more of Clustal W [25], DIALIGN-TX [26], and MAFFT [27] are available.

To evaluate the performance of the pipeline on a large, authentic dataset, 350 ITS sequences were obtained from fungi growing on buildings in Sweden. Fungal growth on building material causes significant economic loss and is frequently connected to the Sick Building Syndrome [28]. The present sampling was carried out as a part of a larger study to characterize such fungal communities; detailed knowledge of the taxonomic affiliations of these fungi is likely to form a key element in assessing health risks caused by mass occurrence of fungi in moist damaged buildings [29].

Implementation and methods

The pipeline [Additional file 1] was written in Perl 5 [30] and acts as a wrapper for NCBI-BLAST, HMMER, and Clustal W/DIALIGN-TX/MAFFT. By default, all fungal ITS sequences annotated as such in INSD are used as target database (approximately 110,000 sequences as of December 2008). Optionally, the user may choose to include only sequences identified to species level in the database. Tailored, biweekly updated versions of these databases are made available by the authors at [31].

For each query sequence, blastn of the NCBI-BLAST package is run locally, and the result is parsed to retrieve the 15 (default; adjustable) best (topmost) matches in INSD. An attempt is then made to locate and extract the ITS2 from the same query sequence using HMMER and the Markov models employed by Nilsson et al [14]. If successful, the ITS2 of the query sequence is used as the query in a new BLAST run against the same database, again with the 15 (default) best matches saved. The fifteen best matches of the first BLAST run are aligned jointly with the query sequence using Clustal W (default), DIALIGN-TX, or MAFFT to present the sequences in a more integrative context than the pairwise alignments of the BLAST output. The default value of the number of sequences to retain in the above steps was set to 15 as a product of the present focus on very similar sequences and the computational capacity at hand.

Upon completion of the BLAST runs, all query sequences that share at least 50% (default; adjustable) of their 15 best BLAST matches are clustered and aligned accordingly. Conspecific and closely related sequences are likely to be grouped into clusters in this step. Similarly, all sequences not a member of any such group are listed – in all likelihood, these singleton sequences represent species present only once among the query sequences. Ample screen output is given for each of the query sequences, and the results are summarized in a joint tab-separated file for viewing in any spreadsheet-type program (e.g., Open Office or Excel). The pipeline is released under the GNU-GPL software license version 2 and makes use of only freely available software. Although distributed over the web, the pipeline does not require an Internet connection to run.

The fungi were isolated from different kinds of building materials, e.g. wood, fibreboard, and linoleum mat, with varying extent of mould growth on the surface. The surface of the sample was scraped with a sterile needle and the scraped product was mixed with distilled water. The suspension was spread out on Petri dishes (Ø 9 cm) containing 2% malt extract agar (MEA) or dichloran 18% – glycerol agar (DG18). The samples were incubated in daylight at room temperature (20°C). Subsequently, fungal colonies were sub-cultured to new Petri dishes (Ø 6 cm) as part of the cleaning process. A total of 350 sequences were extracted using DNeasy Plant Mini Kit (QIAGEN, Hilden); amplified using READY-TO-GO PCR beads (QIAGEN) and the ITS1F and ITS4 primers; purified using the QiaQuick Spin Procedure Kit (QIAGEN); and sequenced using the ITS1F and ITS4 primers and the CEQ 2000XL DNA Analysis System (Beckman Coulter, Fullerton). Full details of the laboratory procedures are available in [32]. The sequences were pooled and fed to the pipeline using its default settings. Decisions of taxonomic affiliations were done considering the results from both the full ITS region and from the ITS2 region, including studying the alignments in non-trivial cases, with priority given to the ITS2 data where the results differed.

Results and discussion

The 350 query sequences from mould growth on building material were found to be distributed among 42 distinct groups, each composed of mutually similar sequences to the exclusion of the other groups, and 27 singletons that did not match any other query sequence closely. The absolute majority (98%) of the fungal species recovered were found to belong to the Ascomycota, with a modest 2% of the sequences belonging to the Zygomycota (Fig. 1; Additional file 2). The two most common genera, Penicillium and Aspergillus, were totally dominant among the sequences with an occurrence of 58% and 16%, respectively. The most frequently occurring Penicillium species were P. chrysogenum, P. corylophilum, P. glabrum, and P. commune. The three latter species have as a rule not been identified to species level – but rather only to genus level – in prior biodiversity studies of building material due to the considerable difficulties in telling them apart using morphological analysis [33,34], a shortcoming that molecular data and the present software package are set to address. Aspergillus versicolor, A. niger, and A. ustus were the three most common species of the genus Aspergillus, which again represents a finer view than that presented in many surveys based on traditional techniques. Indeed, the view of the building mycoflora as revealed by the software pipeline shares many overall similarities with those from similar studies [35] but stands out through the level of resolution obtained on the species level [36,37]. The taxonomic and nomenclatural issues surrounding Penicillium and Aspergillus are far from trivial, however, and the lack of inclusive modern revisions and well-identified reference sequences are likely to have influenced the results of the present and many other efforts.

**Taxonomic distribution of the species recovered from the substrates**. The taxonomic distribution of the species in the pooled evaluation dataset over (A) the two fungal phyla retrieved, (B) the most common genera retrieved, (C) the most common species of *Penicillium*, and (D) the most common species of *Aspergillus*. The sequences were found to belong to 28 different fungal genera and 49 distinct species; an additional 11 sequences could not be assigned to species level with confidence due to the lack of fully identified reference sequences.

The evaluation dataset took two hours to run on a single-core 2.3 GHz Pentium 4 workstation. This is a considerable improvement in speed over any attempt at performing the corresponding BLAST runs over the web at INSD – particularly if complementary analyses have to be performed for some subset of the query sequences – and additional speed gains are to be attained if the pipeline is run on multiple-core processors. Manual inspection of the results revealed that the function to extract, and base the BLAST run on, the ITS2 gave a different taxonomic result than did the complete ITS region for about 10% of the cases (Additional file 2). This observation highlights the usefulness of excluding the very conserved genes encoding for the ribosomal small subunit, 5.8S, and large subunit, particularly when sequences from species not present in the target database are used as query. The magnitude of the discrepancy between the taxonomic signal obtained from the ITS2 and from the entire ITS region (including any portions of the flanking genes) can be expected to vary among different groups of fungi.

The largest shortcoming of the present pipeline, as indeed with all sequence-based identification procedures, is the wanting state of taxonomic coverage and reliability in the public sequence databases. These vary among different taxonomic groups but form a tangible problem for the fungal entries. A mere 13,300 (14%) of the 97,000 described species of fungi are represented by at least one fully identified ITS sequence in INSD, a number that, in turn, pales in comparison with the estimated 1.5 million extant species of fungi. More than 10% of the fungal INSD sequences under study have, furthermore, been shown to be incorrectly identified to species level [38,39]. Adding to the complexity, 39% of the fungal ITS sequences in INSD are not identified to species level in the first place, and their sheer number often serves to mask the presence of explanatory, fully identified sequences in the BLAST hit list [40-42]. This is, however, a problem to which the present study offers a partial remedy in the form of the option to use only fully identified sequences in the BLAST reference database.

Conclusion

We see the presented pipeline as a way to speed up the processing of large numbers of query sequences and to obtain extended functionality helpful for complementary, in-depth analyses. As with BLAST itself, however, the pipeline should be used on the understanding that definite assignment of a query sequence to species level is not amenable to algorithmic interpretation other than in a trivial sense, and in consideration of the fact that additional analysis steps may be needed, particularly for problematic query sequences.

Availability and requirements

Project name: FungalITSPipeline

Project home page: http://www.emerencia.org/fungalitspipeline.html

Operating systems: UNIX-type (MacOS X, Linux, UNIX/BSD)

Programming language: Perl

Other requirements: NCBI-BLAST, HMMER, optionally at least one of Clustal W, DIALIGN-TX, and MAFFT

Licence: GNU GPL 2

Any restrictions to use by non-academics: None other than those imposed by GNU GPL 2

Notes: The pipeline is available at the above address together with its documentation, the evaluation dataset, and biweekly updated BLAST database files. All these files are also bundled as additional data files with this manuscript. The pipeline is straightforward to adapt for other DNA regions and organism groups; new BLAST databases need to be assembled and the ITS2 mode should be turned off (or HMMs for the new gene need to be constructed).

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

RHN wrote the chief part of the pipeline, to which MR and EK contributed with subroutines and suggestions. GB and NH undertook the sampling and DNA sequencing of the fungi of the evaluation dataset and were responsible for the study design. All authors contributed to the manuscript and approved its final version.

Supplementary Material

Additional file 1

The software pipeline. The software pipeline described in the paper, together with its documentation, installation instructions, and the seed of a BLAST database.

Click here for file^{(16.6MB, zip)}

Additional file 2

The evaluation dataset. The evaluation dataset (in the FASTA format) and the results of its analysis (a summary in the OpenOffice/Excel spreadsheet format with multiple alignments in the ALN (Clustal W) format and a list of the species recovered in the PDF format).

Click here for file^{(1.7MB, zip)}

Acknowledgments

Acknowledgements

Financial support from FORMAS to NH and GB, and from Kapten Carl Stenholms Donationsfond to RHN and MR is gratefully acknowledged. Figure 1 was designed in collaboration with Bomb Mediaproduktion. RHN acknowledges infrastructure support from the Fungi in Boreal Forests network. Five anonymous referees are acknowledged for valuable input on the manuscript and software. Only freely available software was used in the making of this study.

Contributor Information

R Henrik Nilsson, Email: henrik.nilsson@dpes.gu.se.

Gunilla Bok, Email: gunilla.bok@dpes.gu.se.

Martin Ryberg, Email: martin.ryberg@dpes.gu.se.

Erik Kristiansson, Email: erik.kristiansson@zool.gu.se.

Nils Hallenberg, Email: nils.hallenberg@dpes.gu.se.

References

Hawksworth DL. The magnitude of fungal diversity: the 1.5 million species estimate revisited. Mycol Res. 2001;105:1422–1432. doi: 10.1017/S0953756201004725. [DOI] [Google Scholar]
Kirk PM, Cannon PF, Minter DW, Stalpers JA. Dictionary of the Fungi. 10. Wallingford: CABI Publishing; 2008. [Google Scholar]
Horton TR, Bruns TD. The molecular revolution in ectomycorrhizal ecology: peeking in the black-box. Mol Ecol. 2001;10:1855–1871. doi: 10.1046/j.0962-1083.2001.01333.x. [DOI] [PubMed] [Google Scholar]
Hibbett DS. After the gold rush, or before the flood? Evolutionary morphology of mushroom-forming fungi (Agaricomycetes) in the early 21st century. Mycol Res. 2007;111:1001–1018. doi: 10.1016/j.mycres.2007.01.012. [DOI] [PubMed] [Google Scholar]
Moore D, Novak LA. Essential Fungal Genetics. New York: Springer-Verlag; 2002. [Google Scholar]
Blackwell M, Hibbett DS, Taylor JW, Spatafora JW. Research coordination networks: a phylogeny for kingdom Fungi (Deep Hypha) Mycologia. 2006;98:829–837. doi: 10.3852/mycologia.98.6.829. [DOI] [PubMed] [Google Scholar]
Taylor DL, McCormick MK. Internal transcribed spacer primers and sequences for improved characterization of basidiomycetous orchid mycorrhizas. New Phytol. 2008;177:1020–1033. doi: 10.1111/j.1469-8137.2007.02320.x. [DOI] [PubMed] [Google Scholar]
Kõljalg U, Larsson K-H, Abarenkov K, Nilsson RH, Alexander IJ, Eberhardt U, Erland S, Høiland K, Kjøller R, Larsson E, Pennanen T, Sen R, Taylor AFS, Tedersoo L, Vrålstad T, Ursing BM. UNITE: a database providing web-based methods for the molecular identification of ectomycorrhizal fungi. New Phytol. 2005;166:1063–1068. doi: 10.1111/j.1469-8137.2005.01376.x. [DOI] [PubMed] [Google Scholar]
Seifert KA, Samson RA, deWaard JR, Houbraken J, Levesque CA, Molcalvo J-M, Louis-Seize G, Hebert PDN. Prospects for fungus identification using CO1 DNA barcodes, with Penicillium as a test case. Proc Natl Acad Sci USA. 2007;104:3901–3906. doi: 10.1073/pnas.0611691104. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hibbett DS, Nilsson RH, Snyder M, Fonseca M, Costanzo J, Shonfeld M. Automated phylogenetic taxonomy: an example in the homobasidiomycetes (mushroom-forming fungi) Syst Biol. 2005;54:660–668. doi: 10.1080/10635150590947104. [DOI] [PubMed] [Google Scholar]
Avis PG, Dickie IA, Mueller GM. A 'dirty' business: testing the limitations of terminal restriction fragment length polymorphism (TRFLP) analysis of soil fungi. Mol Ecol. 2006;15:873–882. doi: 10.1111/j.1365-294X.2005.02842.x. [DOI] [PubMed] [Google Scholar]
Vilgalys R, Gonzalez D. Organization of ribosomal DNA in the basidiomycete Thanathephorus praticola. Curr Genet. 1990;18:277–280. doi: 10.1007/BF00318394. [DOI] [PubMed] [Google Scholar]
Bruns TD, Shefferson RP. Evolutionary studies of ectomycorrhizal fungi: recent advances and future directions. Can J Bot. 2004;82:1122–1132. doi: 10.1139/b04-021. [DOI] [Google Scholar]
Nilsson RH, Kristiansson E, Ryberg M, Hallenberg N, Larsson K-H. Intraspecific ITS variability in the kingdom Fungi as expressed in the international sequence databases and its implications for molecular species identification. Evol Bioinform Online. 2008;4:193–201. doi: 10.4137/ebo.s653. http://www.la-press.com/intraspecific-its-variability-in-the-kingdom-fungi-as-expressed-in-the-a823 [DOI] [PMC free article] [PubMed] [Google Scholar]
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389. [DOI] [PMC free article] [PubMed] [Google Scholar]
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank. Nucleic Acids Res. 2008;35:D25–D30. doi: 10.1093/nar/gkm929. [DOI] [PMC free article] [PubMed] [Google Scholar]
Leaw SN, Chang HC, Sun HF, Barton R, Bouchara J-P, Chang TC. Identification of medically important yeast species by sequence analysis of the internal transcribed spacer regions. J Clin Microbiol. 2006;44:693–699. doi: 10.1128/JCM.44.3.693-699.2006. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hau J, Muller M, Pagni M. HitKeeper, a generic software package for hit list management. Source Code Biol Med. 2007;2:2. doi: 10.1186/1751-0473-2-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gollapudi R, Revanna KV, Hemmerich C, Schaack S, Dong Q. BOV – a web-based BLAST output visualization tool. BMC Genomics. 2008;9:414. doi: 10.1186/1471-2164-9-414. [DOI] [PMC free article] [PubMed] [Google Scholar]
Nilsson RH, Larsson K-H, Ursing BM. galaxie – CGI scripts for sequence identification through automated phylogenetic analysis. Bioinformatics. 2004;20:1447–1452. doi: 10.1093/bioinformatics/bth119. [DOI] [PubMed] [Google Scholar]
Nilsson RH, Rajashekar B, Larsson KH, Ursing BM. galaxieEST: addressing EST identity through automated phylogenetic analysis. BMC Bioinformatics. 2004;5:87. doi: 10.1186/1471-2105-5-87. [DOI] [PMC free article] [PubMed] [Google Scholar]
Giancarlo R, Siragusa A, Siragusa E, Utro F. A basic analysis toolkit for biological sequences. Algorithms Mol Biol. 2007;2:10. doi: 10.1186/1748-7188-2-10. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hanekamp K, Bohnebeck U, Beszteri B, Valentin K. PhyloGena – a user-friendly system for automated phylogenetic annotation of unknown sequences. Bioinformatics. 2007;23:793–801. doi: 10.1093/bioinformatics/btm016. [DOI] [PubMed] [Google Scholar]
Eddy SR. Profile hidden Markov models. Bioinformatics. 1998;14:755–763. doi: 10.1093/bioinformatics/14.9.755. [DOI] [PubMed] [Google Scholar]
Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997;25:4876–4882. doi: 10.1093/nar/25.24.4876. [DOI] [PMC free article] [PubMed] [Google Scholar]
Subramanian AR, Kaufmann M, Morgenstern B. DIALIGN-TX: greedy and progressive approaches for segment-based multiple sequence alignment. Algorithms Mol Biol. 2008;3:6. doi: 10.1186/1748-7188-3-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
Katoh K, Kuma K, Toh H, Miyata T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005;33:511–518. doi: 10.1093/nar/gki198. [DOI] [PMC free article] [PubMed] [Google Scholar]
Skorge TD, Eagan TML, Eide GE, Gulsvik A, Bakke S. Indoor exposures and respiratory symptoms in a Norwegian community sample. Thorax. 2005;60:937–942. doi: 10.1136/thx.2004.025973. [DOI] [PMC free article] [PubMed] [Google Scholar]
Flannigan B. Air sampling for fungi in indoor environments. J Aerosol Sci. 1997;28:381–392. doi: 10.1016/S0021-8502(96)00441-7. [DOI] [Google Scholar]
Wall L, Christiansen T, Orwant J. Programming Perl. 3. Sebastopol: O'Reilly Media; 2000. [Google Scholar]
The Fungal ITS Pipeline http://www.emerencia.org/fungalitspipeline.html
Paulus B, Nilsson RH, Hallenberg N. Phylogenetic studies in Hypochnicium (Basidiomycota), with special emphasis on species from New Zealand. New Zeal J Bot. 2007;45:139–150. [Google Scholar]
Beguin H, Nolard N. Mould biodiversity in homes I. Air and surface analysis of 130 dwellings. Aerobiologia. 1994;10:157–166. doi: 10.1007/BF02459231. [DOI] [Google Scholar]
Hyvärinen A, Reponen T, Husman T, Ruuskanen J, Nevalainen A. Compostion of fungal flora in mold problem houses determined with four different methods. In: Kalliokoski P, Jantunen M, Seppänen O, editor. Proceedings of the 6th International Conference on Indoor Air Quality and Climate (Particles, microbes, radon): 4–8 July 1993; Helsinki. Vol. 4. University of Technology; 1993. pp. 273–278. [Google Scholar]
Pietarinen VM, Rintala H, Hyvärainen H, Lignell U, Karkkainen P, Nevalainen A. Quantitative PCR analysis of fungi and bacteria in building materials and comparison to culture-based analysis. J Environ Monit. 2008;10:655–663. doi: 10.1039/b716138g. [DOI] [PubMed] [Google Scholar]
Gravesen S, Nielsen PA, Iversen R, Nielsen KF. Microfungal contamination of damp buildings – examples of risk constructions and risk materials. Environ Health Perspect. 1999;107:505–508. doi: 10.1289/ehp.99107s3505. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hyvärinen A, Meklin T, Vepsäläinen A, Nevalainen A. Fungi and actinobacteria in moisture-damaged building materials – concentrations and diversity. Int Biodeterior Biodegrad. 2002;49:27–37. doi: 10.1016/S0964-8305(01)00103-2. [DOI] [Google Scholar]
Bridge PD, Roberts PJ, Spooner BM, Panchal G. On the unreliability of published DNA sequences. New Phytol. 2003;160:43–48. doi: 10.1046/j.1469-8137.2003.00861.x. [DOI] [PubMed] [Google Scholar]
Nilsson RH, Ryberg M, Kristiansson E, Abarenkov K, Larsson K-H, Kõljalg U. Taxonomic reliability of DNA sequences in public sequences databases: a fungal perspective. PLoS ONE. 2006;1:e59. doi: 10.1371/journal.pone.0000059. [DOI] [PMC free article] [PubMed] [Google Scholar]
Nilsson RH, Kristiansson E, Ryberg M, Larsson KH. Approaching the taxonomic affiliation of unidentified sequences in public databases – an example from the mycorrhizal fungi. BMC Bioinformatics. 2005;6:178. doi: 10.1186/1471-2105-6-178. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ryberg M, Nilsson RH, Kristiansson E, Jacobsson S, Larsson E. Mining metadata from unidentified ITS sequences in GenBank: a case study in Inocybe (Basidiomycota) BMC Evol Biol. 2008;8:50. doi: 10.1186/1471-2148-8-50. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ryberg M, Kristiansson E, Sjökvist E, Nilsson RH. An outlook on the fungal ITS sequences in GenBank and the introduction of a web-based tool for the exploration of fungal diversity. New Phytol. 2009;181:471–477. doi: 10.1111/j.1469-8137.2008.02667.x. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Additional file 1

The software pipeline. The software pipeline described in the paper, together with its documentation, installation instructions, and the seed of a BLAST database.

Click here for file^{(16.6MB, zip)}

Additional file 2

Click here for file^{(1.7MB, zip)}

[B1] Hawksworth DL. The magnitude of fungal diversity: the 1.5 million species estimate revisited. Mycol Res. 2001;105:1422–1432. doi: 10.1017/S0953756201004725. [DOI] [Google Scholar]

[B2] Kirk PM, Cannon PF, Minter DW, Stalpers JA. Dictionary of the Fungi. 10. Wallingford: CABI Publishing; 2008. [Google Scholar]

[B3] Horton TR, Bruns TD. The molecular revolution in ectomycorrhizal ecology: peeking in the black-box. Mol Ecol. 2001;10:1855–1871. doi: 10.1046/j.0962-1083.2001.01333.x. [DOI] [PubMed] [Google Scholar]

[B4] Hibbett DS. After the gold rush, or before the flood? Evolutionary morphology of mushroom-forming fungi (Agaricomycetes) in the early 21st century. Mycol Res. 2007;111:1001–1018. doi: 10.1016/j.mycres.2007.01.012. [DOI] [PubMed] [Google Scholar]

[B5] Moore D, Novak LA. Essential Fungal Genetics. New York: Springer-Verlag; 2002. [Google Scholar]

[B6] Blackwell M, Hibbett DS, Taylor JW, Spatafora JW. Research coordination networks: a phylogeny for kingdom Fungi (Deep Hypha) Mycologia. 2006;98:829–837. doi: 10.3852/mycologia.98.6.829. [DOI] [PubMed] [Google Scholar]

[B7] Taylor DL, McCormick MK. Internal transcribed spacer primers and sequences for improved characterization of basidiomycetous orchid mycorrhizas. New Phytol. 2008;177:1020–1033. doi: 10.1111/j.1469-8137.2007.02320.x. [DOI] [PubMed] [Google Scholar]

[B8] Kõljalg U, Larsson K-H, Abarenkov K, Nilsson RH, Alexander IJ, Eberhardt U, Erland S, Høiland K, Kjøller R, Larsson E, Pennanen T, Sen R, Taylor AFS, Tedersoo L, Vrålstad T, Ursing BM. UNITE: a database providing web-based methods for the molecular identification of ectomycorrhizal fungi. New Phytol. 2005;166:1063–1068. doi: 10.1111/j.1469-8137.2005.01376.x. [DOI] [PubMed] [Google Scholar]

[B9] Seifert KA, Samson RA, deWaard JR, Houbraken J, Levesque CA, Molcalvo J-M, Louis-Seize G, Hebert PDN. Prospects for fungus identification using CO1 DNA barcodes, with Penicillium as a test case. Proc Natl Acad Sci USA. 2007;104:3901–3906. doi: 10.1073/pnas.0611691104. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B10] Hibbett DS, Nilsson RH, Snyder M, Fonseca M, Costanzo J, Shonfeld M. Automated phylogenetic taxonomy: an example in the homobasidiomycetes (mushroom-forming fungi) Syst Biol. 2005;54:660–668. doi: 10.1080/10635150590947104. [DOI] [PubMed] [Google Scholar]

[B11] Avis PG, Dickie IA, Mueller GM. A 'dirty' business: testing the limitations of terminal restriction fragment length polymorphism (TRFLP) analysis of soil fungi. Mol Ecol. 2006;15:873–882. doi: 10.1111/j.1365-294X.2005.02842.x. [DOI] [PubMed] [Google Scholar]

[B12] Vilgalys R, Gonzalez D. Organization of ribosomal DNA in the basidiomycete Thanathephorus praticola. Curr Genet. 1990;18:277–280. doi: 10.1007/BF00318394. [DOI] [PubMed] [Google Scholar]

[B13] Bruns TD, Shefferson RP. Evolutionary studies of ectomycorrhizal fungi: recent advances and future directions. Can J Bot. 2004;82:1122–1132. doi: 10.1139/b04-021. [DOI] [Google Scholar]

[B14] Nilsson RH, Kristiansson E, Ryberg M, Hallenberg N, Larsson K-H. Intraspecific ITS variability in the kingdom Fungi as expressed in the international sequence databases and its implications for molecular species identification. Evol Bioinform Online. 2008;4:193–201. doi: 10.4137/ebo.s653. http://www.la-press.com/intraspecific-its-variability-in-the-kingdom-fungi-as-expressed-in-the-a823 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B15] Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B16] Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank. Nucleic Acids Res. 2008;35:D25–D30. doi: 10.1093/nar/gkm929. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B17] Leaw SN, Chang HC, Sun HF, Barton R, Bouchara J-P, Chang TC. Identification of medically important yeast species by sequence analysis of the internal transcribed spacer regions. J Clin Microbiol. 2006;44:693–699. doi: 10.1128/JCM.44.3.693-699.2006. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B18] Hau J, Muller M, Pagni M. HitKeeper, a generic software package for hit list management. Source Code Biol Med. 2007;2:2. doi: 10.1186/1751-0473-2-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B19] Gollapudi R, Revanna KV, Hemmerich C, Schaack S, Dong Q. BOV – a web-based BLAST output visualization tool. BMC Genomics. 2008;9:414. doi: 10.1186/1471-2164-9-414. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B20] Nilsson RH, Larsson K-H, Ursing BM. galaxie – CGI scripts for sequence identification through automated phylogenetic analysis. Bioinformatics. 2004;20:1447–1452. doi: 10.1093/bioinformatics/bth119. [DOI] [PubMed] [Google Scholar]

[B21] Nilsson RH, Rajashekar B, Larsson KH, Ursing BM. galaxieEST: addressing EST identity through automated phylogenetic analysis. BMC Bioinformatics. 2004;5:87. doi: 10.1186/1471-2105-5-87. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B22] Giancarlo R, Siragusa A, Siragusa E, Utro F. A basic analysis toolkit for biological sequences. Algorithms Mol Biol. 2007;2:10. doi: 10.1186/1748-7188-2-10. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B23] Hanekamp K, Bohnebeck U, Beszteri B, Valentin K. PhyloGena – a user-friendly system for automated phylogenetic annotation of unknown sequences. Bioinformatics. 2007;23:793–801. doi: 10.1093/bioinformatics/btm016. [DOI] [PubMed] [Google Scholar]

[B24] Eddy SR. Profile hidden Markov models. Bioinformatics. 1998;14:755–763. doi: 10.1093/bioinformatics/14.9.755. [DOI] [PubMed] [Google Scholar]

[B25] Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997;25:4876–4882. doi: 10.1093/nar/25.24.4876. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B26] Subramanian AR, Kaufmann M, Morgenstern B. DIALIGN-TX: greedy and progressive approaches for segment-based multiple sequence alignment. Algorithms Mol Biol. 2008;3:6. doi: 10.1186/1748-7188-3-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B27] Katoh K, Kuma K, Toh H, Miyata T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005;33:511–518. doi: 10.1093/nar/gki198. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B28] Skorge TD, Eagan TML, Eide GE, Gulsvik A, Bakke S. Indoor exposures and respiratory symptoms in a Norwegian community sample. Thorax. 2005;60:937–942. doi: 10.1136/thx.2004.025973. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B29] Flannigan B. Air sampling for fungi in indoor environments. J Aerosol Sci. 1997;28:381–392. doi: 10.1016/S0021-8502(96)00441-7. [DOI] [Google Scholar]

[B30] Wall L, Christiansen T, Orwant J. Programming Perl. 3. Sebastopol: O'Reilly Media; 2000. [Google Scholar]

[B31] The Fungal ITS Pipeline http://www.emerencia.org/fungalitspipeline.html

[B32] Paulus B, Nilsson RH, Hallenberg N. Phylogenetic studies in Hypochnicium (Basidiomycota), with special emphasis on species from New Zealand. New Zeal J Bot. 2007;45:139–150. [Google Scholar]

[B33] Beguin H, Nolard N. Mould biodiversity in homes I. Air and surface analysis of 130 dwellings. Aerobiologia. 1994;10:157–166. doi: 10.1007/BF02459231. [DOI] [Google Scholar]

[B34] Hyvärinen A, Reponen T, Husman T, Ruuskanen J, Nevalainen A. Compostion of fungal flora in mold problem houses determined with four different methods. In: Kalliokoski P, Jantunen M, Seppänen O, editor. Proceedings of the 6th International Conference on Indoor Air Quality and Climate (Particles, microbes, radon): 4–8 July 1993; Helsinki. Vol. 4. University of Technology; 1993. pp. 273–278. [Google Scholar]

[B35] Pietarinen VM, Rintala H, Hyvärainen H, Lignell U, Karkkainen P, Nevalainen A. Quantitative PCR analysis of fungi and bacteria in building materials and comparison to culture-based analysis. J Environ Monit. 2008;10:655–663. doi: 10.1039/b716138g. [DOI] [PubMed] [Google Scholar]

[B36] Gravesen S, Nielsen PA, Iversen R, Nielsen KF. Microfungal contamination of damp buildings – examples of risk constructions and risk materials. Environ Health Perspect. 1999;107:505–508. doi: 10.1289/ehp.99107s3505. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B37] Hyvärinen A, Meklin T, Vepsäläinen A, Nevalainen A. Fungi and actinobacteria in moisture-damaged building materials – concentrations and diversity. Int Biodeterior Biodegrad. 2002;49:27–37. doi: 10.1016/S0964-8305(01)00103-2. [DOI] [Google Scholar]

[B38] Bridge PD, Roberts PJ, Spooner BM, Panchal G. On the unreliability of published DNA sequences. New Phytol. 2003;160:43–48. doi: 10.1046/j.1469-8137.2003.00861.x. [DOI] [PubMed] [Google Scholar]

[B39] Nilsson RH, Ryberg M, Kristiansson E, Abarenkov K, Larsson K-H, Kõljalg U. Taxonomic reliability of DNA sequences in public sequences databases: a fungal perspective. PLoS ONE. 2006;1:e59. doi: 10.1371/journal.pone.0000059. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B40] Nilsson RH, Kristiansson E, Ryberg M, Larsson KH. Approaching the taxonomic affiliation of unidentified sequences in public databases – an example from the mycorrhizal fungi. BMC Bioinformatics. 2005;6:178. doi: 10.1186/1471-2105-6-178. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B41] Ryberg M, Nilsson RH, Kristiansson E, Jacobsson S, Larsson E. Mining metadata from unidentified ITS sequences in GenBank: a case study in Inocybe (Basidiomycota) BMC Evol Biol. 2008;8:50. doi: 10.1186/1471-2148-8-50. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B42] Ryberg M, Kristiansson E, Sjökvist E, Nilsson RH. An outlook on the fungal ITS sequences in GenBank and the introduction of a web-based tool for the exploration of fungal diversity. New Phytol. 2009;181:471–477. doi: 10.1111/j.1469-8137.2008.02667.x. [DOI] [PubMed] [Google Scholar]

PERMALINK

A software pipeline for processing and identification of fungal ITS sequences

R Henrik Nilsson

Gunilla Bok

Martin Ryberg

Erik Kristiansson

Nils Hallenberg

Abstract

Background

Results

Conclusion

Background

Implementation and methods

Results and discussion

Figure 1.

Conclusion

Availability and requirements

Competing interests

Authors' contributions

Supplementary Material

Acknowledgments

Acknowledgements

Contributor Information

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Cite

Add to Collections

PERMALINK

A software pipeline for processing and identification of fungal ITS sequences

R Henrik Nilsson

Gunilla Bok

Martin Ryberg

Erik Kristiansson

Nils Hallenberg

Abstract

Background

Results

Conclusion

Background

Implementation and methods

Results and discussion

Figure 1.

Conclusion

Availability and requirements

Competing interests

Authors' contributions

Supplementary Material

Acknowledgments

Acknowledgements

Contributor Information

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases