Figure 3.
Updating species clusters with each GTDB release. (A) Workflow for updating GTDB species clusters, with results for the most recent GTDB release, R06-RS202, given below each step. There were 90 368 new genomes in this release, 987 genomes where the assembly at NCBI was updated, and 158 genomes where the assembly was suppressed at NCBI and thus not used in this release. All genomes were subjected to quality control, which resulted in 26 407 (9.3%) genomes being removed from consideration. There were 2458 species where multiple genomes were identified as being assembled from the type strain of the species. Of these, 130 species had genomes that were sufficiently divergent to warrant manual inspection to establish the genome most likely to be from the type strain. The 31 910 representatives from the previous GTDB release, R05-RS95, were examined and 1131 (3.5%) were updated to a new genome. In addition, 6 species defined in R05-RS95 were retired because the sole genome representing the species was suppressed at NCBI. (B) Illustrative example of a GTDB species cluster with previous and new genomes. Genomes are depicted by shapes and the distance between genomes scales with their ANI divergence. The large red circle indicates the ANI circumscription radius for assigning genomes to the current species cluster. The new/updated genome (blue triangle) will only replace the existing GTDB species representative (red circle) if the ANI between these genomes is sufficiently high and the new/updated genome is of sufficient quality (see Table 1). This decision is made quantitatively using the balanced ANI score (see main text). (C) Updating the Macrococcus equipercicus species cluster from GTDB R05-RS95 to R06-RS202. The M. equipercicus genome assembly, GCF_004359525.1, was updated to GCF_004359525.2 and found to be distinct from the previous assembly (ANI = 80.6%). Consequently, the updated genome formed a new species cluster and the genome GCF_004359515.1 was promoted to a species representative.
GCF_004359525.2 is assembled from the type strain of M. equipercicus and GCF_004359515.1 is assembled from the type strain of M. carouselicus, indicating that the M. equipercicus cluster in GTDB R05-RS95 actually represented the species M. carouselicus and was misclassified because the GCF_004359525.1 assembly was incorrect.