Figure S1. Persistence of Sequence-Based (SB) clusters.
Cluster persistence is defined as the fraction of genomes that have at least one protein assigned to the cluster. The plots show the frequency of SB clusters as a function of their persistence. For each taxon, publicly available complete genome sequences were selected so as to preserve the phylogenetic diversity within the taxon, as described in Materials and Methods. Sixty distinct genome sequences were considered for each taxon shown.
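The persistence definition above can be illustrated with a minimal sketch. It assumes the protein-to-cluster assignments are available as (genome, cluster) pairs; the function names, the input layout, and the toy data are hypothetical and are not taken from the original analysis.

```python
from collections import Counter


def cluster_persistence(assignments, n_genomes):
    """Return {cluster_id: persistence} for the given assignments.

    assignments: iterable of (genome_id, cluster_id) pairs, one per protein
                 (assumed input layout, for illustration only).
    n_genomes:   total number of genomes considered for the taxon.
    """
    genomes_per_cluster = {}
    for genome_id, cluster_id in assignments:
        # A genome counts once per cluster, however many proteins it contributes.
        genomes_per_cluster.setdefault(cluster_id, set()).add(genome_id)
    return {c: len(g) / n_genomes for c, g in genomes_per_cluster.items()}


def persistence_histogram(persistence, n_genomes):
    """Count clusters by the number of genomes (1..n_genomes) they occur in."""
    counts = Counter(round(p * n_genomes) for p in persistence.values())
    return [counts.get(k, 0) for k in range(1, n_genomes + 1)]


# Toy data: 4 genomes, 3 clusters (made up for this example).
assignments = [
    ("g1", "c1"), ("g2", "c1"), ("g3", "c1"), ("g4", "c1"),
    ("g1", "c2"), ("g1", "c2"), ("g2", "c2"),
    ("g3", "c3"),
]
persistence = cluster_persistence(assignments, n_genomes=4)
histogram = persistence_histogram(persistence, n_genomes=4)
```

With 60 genomes per taxon, as in the figure, `n_genomes` would be 60 and the histogram would have 60 bins, one per possible number of genomes containing the cluster.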