metagenome |
total genomic potential of a microbial community (in this work, we use this term interchangeably with “sample” and “metagenomic sample”) |
singleton gene |
a gene detected in only one metagenomic sample across a defined collection of samples |
non-singleton gene |
a gene detected in more than one metagenomic sample across a defined collection of samples |
ORFan gene |
genes that have no detectable homologs in other species and are distinct from all open reading frames (ORFs) in the genome |
universe of genes |
the set of all non-redundant genetic elements across all communities of organisms in a given niche |
gene rarefaction curve |
a curve tracking the accumulation of new genes as samples are incrementally added |
gene discovery curve |
the derivative of the rarefaction curve (It estimates the rate at which new genes are added to the catalog when samples are added incrementally, and it can be used to estimate the size and burden of sampling of the universe of genes.) |
singleton fraction curve |
a curve estimating the fraction of a gene catalog that consists of singletons vs. non-singletons as samples are added incrementally (It is used to estimate the total number of samples that would be required for all singletons to be seen twice and thus no longer be singletons.) |
mixture contig |
A contig from de novo assembly consisting of both singletons and non-singletons |
singleton contig |
A contig from de novo assembly consisting of only singletons |
non-singleton contig |
A contig from de novo assembly consisting of only non-singletons |