Fig. 2.
Cladogram of 800 validated MarFERReT entries and summary of reference metadata. (a) Cladogram of the hierarchical taxonomic ranks of marine eukaryotes within the NCBI Taxonomy framework293, generated with the NCBI CommonTree tool. Each tip is a unique taxon included in MarFERReT, defined by its NCBI taxID. Branches are colored by taxonomic lineage, and the size of the closed circle at each tip is proportional to the number of validated entries for that taxon. Concentric rings show metadata and statistics for each taxon. From the innermost ring outward: year of publication or data release of the sequence data (averaged across entries for taxa with multiple entries); number of clustered sequences in the taxon; raw input format of the sequence data (transcriptome, transcriptome shotgun assembly; genome, genome-derived gene models; SAG, single-cell amplified genome; SAT, single-cell amplified transcriptome; or mixed, a combination of types); and source of the data (NCBI293, METdb7, JGI PhycoCosm300, or MMETSP3). (b) Number of clustered sequences in the MarFERReT build by year of data release. (c) Histogram showing the distribution of clustered sequence counts across the 453 taxa in the final build.