Table 1.
| | Matched metagenome | Unmatched metagenome | Unrestricted reference database | Restricted database (amplicon sequencing) | Restricted database (defined community) |
---|---|---|---|---|---|
Monetary cost | Sample-type dependent, $100-$2,000 per sample or per pool of samples | Free | Free | $50-$100/sample | Free |
Time cost (labor and computation) | A month to a year if genome-resolved, otherwise weeks | Days | Days | Weeks | Days |
Presence of sequences representing proteins not actually in the sample | Low, sequences are derived from sample | Medium, sequences are derived from system but not specific sample | High, sequences represent all of sequenced life | Medium, sequences are derived from same taxa as the sample, but not the same genomes | Low, exact composition is known and reference database is used |
Likelihood of sequences missing | Low to medium, dependent on depth of sequencing and inclusion of unbinned sequences. | Medium to high, dependent on similarity between previously sequenced samples and samples measured by metaproteomics. | Medium to high; even if relatives of community members are present in public repositories, closely related strains differ significantly in gene content. | Medium to high; even if representative genomes for identified taxa are available, closely related strains differ significantly in gene content. | None to low |
Potential sources for redundant (highly similar or identical) sequences | Artificial: bringing together sequences from sequential gene prediction and multiple assemblies. Biological: similar genes in different strains from the same species or genus. | Artificial: bringing together sequences from sequential gene prediction and multiple assemblies. Biological: similar genes in different strains from the same species or genus. | Artificial: bringing together sequences from multiple sources. Biological: similar genes in different strains from the same species or genus. | Artificial: bringing together sequences from multiple sources. Biological: similar genes in different strains from the same species or genus. | Biological: similar genes in different strains from the same species or genus. |
Taxonomic resolution | Subspecies to species if genome-resolved, otherwise genus to phylum based on lowest common ancestor (LCA) to reference databases | Subspecies to species if genome-resolved, otherwise genus to phylum based on LCA to reference databases | Genus to phylum based on LCA of all matches in the reference databases | Genus to phylum based on LCA to reference databases | Subspecies to species |
Likelihood of misidentifying taxa | Low | Medium, dependent on relevance of metagenome to sample | High, many sequences are missing from the database and many sequences in the database are not in the sample | Medium, dependent on relevance of selected reference genomes to actual genomes in sample | Low |
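
The "Taxonomic resolution" row relies on LCA assignment. The following is a minimal Python sketch of that idea, not the method of any particular tool: a sequence that matches several database entries is assigned to the deepest taxon shared by all of them. The function names, the six-rank scheme in `RANKS`, and the example lineages are illustrative assumptions. Real pipelines typically work from a full taxonomy tree rather than fixed-length lineages.

```python
from typing import Sequence, Tuple

# Assumed, simplified rank scheme (hypothetical; real taxonomies have more ranks).
RANKS = ("phylum", "class", "order", "family", "genus", "species")


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> Tuple[str, ...]:
    """Return the longest shared prefix of root-to-leaf lineages."""
    if not lineages:
        return ()
    shared = []
    # zip(*lineages) walks all lineages rank by rank, starting at the root.
    for names_at_rank in zip(*lineages):
        if len(set(names_at_rank)) != 1:
            break  # lineages diverge at this rank
        shared.append(names_at_rank[0])
    return tuple(shared)


def resolution_rank(lca: Sequence[str]) -> str:
    """Name the rank at which an LCA assignment is resolved."""
    return RANKS[len(lca) - 1] if lca else "unresolved"


# Illustrative lineages for two database entries matched by the same sequence.
e_coli = ("Pseudomonadota", "Gammaproteobacteria", "Enterobacterales",
          "Enterobacteriaceae", "Escherichia", "Escherichia coli")
s_enterica = ("Pseudomonadota", "Gammaproteobacteria", "Enterobacterales",
              "Enterobacteriaceae", "Salmonella", "Salmonella enterica")

lca = lowest_common_ancestor([e_coli, s_enterica])
print(resolution_rank(lca), lca[-1])  # family Enterobacteriaceae
```

In this sketch, a sequence shared by two genera can only be resolved to family level. The more unrelated entries a database holds, the more often matches collapse to a higher rank in this way, which is consistent with the coarser resolution listed for reference-database approaches.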