Table 2.
Brief description of Genome Relative Abundance (GRA) estimation tools: advantages and disadvantages.
Tool | Brief description | Advantages | Disadvantages |
---|---|---|---|
TETRA | Pioneering classifier that uses tetranucleotide-derived z-score correlations to taxonomically classify genomic fragments. Composition-based. | Provides statistical analysis of tetranucleotide usage patterns in genomic fragments. Available as a web service or as a stand-alone program. | Genus-level accuracy requires long reads (>1 kb). Tends to split reads from highly abundant species into multiple clusters when species abundances in the sample vary widely. |
CompostBin | DNA composition-based algorithm that adopts a weighted Principal Component Analysis (PCA) strategy. Composition-based. | Reduces the dimensionality of the compositional space. Bins raw sequence reads without the need for assembly or training. | Genus-level accuracy requires long reads (>1 kb). Tends to split reads from highly abundant species into multiple clusters when species abundances in the sample vary widely. |
TACOA | Multi-class taxonomic classifier that combines the k-nearest neighbor approach with strategies from kernel-based learning. Composition-based. | Easily installed and run on a desktop computer. Its reference set can be easily updated with newly sequenced genomes. | Genus-level accuracy requires long reads (>1 kb). |
AbundanceBin | Binning tool based on the l-tuple content of reads, developed on the assumption that reads are sampled from genomes following a Poisson distribution. Composition-based. | Returns accurate results even when reads are very short (~75 bp). | Binning efficiency decreases for samples in which species abundances tend to be uniformly distributed. |
MEGAN | Stand-alone computer program allowing the analysis of large metagenomic data sets. It uses BLAST or other comparison tools to assign a species to each read, and then places the assignments on the NCBI taxonomy. Alignment-based. | Allows large data sets to be dissected without the need for assembly or the targeting of specific phylogenetic markers. Provides statistical and graphical output. Quantifies accuracy and specificity. | Uses the bit-score of individual hits as the sole parameter for judging significance, which affects the specificity and accuracy of taxonomic assignments in different scenarios. |
GRAMMy | Probabilistic framework developed for GRA estimation. It is based on mixture model theory. | Can be used with mapping-, alignment- and composition-based tools. Handles very short reads while still producing accurate results. | Accuracy of the estimated abundance decreases for closely related microbes whose genomic sequences are highly similar. |
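
The mixture-model idea listed for GRAMMy can be illustrated with a minimal sketch. This is not GRAMMy's implementation: the function names, the `read_probs` matrix of precomputed read-to-genome probabilities, and the genome-length normalization step are assumptions introduced here for illustration only. The sketch fits the mixing proportions of a finite mixture by expectation-maximization (EM) and then converts them into relative abundances.

```python
def estimate_mixing_proportions(read_probs, n_iter=200, tol=1e-9):
    """EM for the mixing proportions of a mixture with fixed components.

    read_probs[i][j] is a hypothetical, precomputed probability that read i
    was generated by genome j (e.g. derived from alignment or k-mer scores).
    """
    n_genomes = len(read_probs[0])
    pi = [1.0 / n_genomes] * n_genomes  # start from a uniform mixture
    for _ in range(n_iter):
        # E-step: sum each genome's posterior responsibility over all reads.
        totals = [0.0] * n_genomes
        for row in read_probs:
            weighted = [p * w for p, w in zip(row, pi)]
            norm = sum(weighted)
            if norm == 0.0:
                continue  # read is compatible with no genome; ignore it
            for j, w in enumerate(weighted):
                totals[j] += w / norm
        assigned = sum(totals)
        if assigned == 0.0:
            raise ValueError("no read is compatible with any genome")
        # M-step: new proportions are the normalized responsibilities.
        new_pi = [t / assigned for t in totals]
        converged = max(abs(a - b) for a, b in zip(new_pi, pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


def relative_abundance(pi, genome_lengths):
    """Assumed normalization: divide read shares by genome length, rescale to 1."""
    per_base = [p / length for p, length in zip(pi, genome_lengths)]
    total = sum(per_base)
    return [x / total for x in per_base]


# Toy input: three reads match only genome 0, one read matches only genome 1.
toy_probs = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
pi = estimate_mixing_proportions(toy_probs)
gra = relative_abundance(pi, genome_lengths=[3_000_000, 1_000_000])
```

In this toy case genome 0 receives three quarters of the reads but is three times longer, so after length normalization both genomes end up with equal relative abundance. When two genomes are nearly identical, their columns in `read_probs` become nearly equal and the proportions are poorly determined, which is consistent with the disadvantage noted in the table.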