2023 Aug 22;14:5098. doi: 10.1038/s41467-023-40726-8

Fig. 4. High proportion and diversity of COG4948 proteins in the SAR202 clade.

a Proportion of COG4948 proteins among the species cluster-representative genomes of GTDB (release 202; 47,894 genomes). Only the 200 genomes with the highest proportions are shown. Genomes of o__UBA1151 (SAR202 group I) are indicated with a distinct color. All 48 o__UBA1151 genomes appear in this figure owing to their high proportions of COG4948 proteins. The black arrow on the y-axis indicates the proportion (~2.8%) in the four SAR202 genomes sequenced in this study. Source data are provided as a Source data file. b SSN (sequence similarity network) analysis of diverse COG4948 proteins predicted in the JH545 genome. UniProt proteins containing both Pfam domains, PF02746 and PF13378, were included in the analysis together with the 80 COG4948 proteins of strain JH545. After applying an alignment score cutoff (≥150), only the protein clusters that include JH545 proteins were retained in this figure. Nodes (proteins) are colored by their phylum-level taxonomy, following the color codes at the bottom. The 80 COG4948 proteins of the JH545 genome are indicated by red triangles. The four proteins that have been studied experimentally are indicated with black rectangles (UniProt IDs: D8ADB5, C9A1P5, C6CBG9, and A8RQK7).
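
The filtering described for panel b (keep edges with an alignment score of at least 150, then retain only clusters containing JH545 proteins) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `clusters_with_query`, the edge-list input format, and the protein identifiers in the usage note are all hypothetical, and a cluster is assumed here to mean a connected component of the thresholded network.

```python
from collections import defaultdict, deque


def clusters_with_query(edges, query_ids, min_score=150):
    """Return the connected components that contain at least one query protein.

    edges     : iterable of (protein_a, protein_b, alignment_score) tuples
                (hypothetical input format)
    query_ids : set of protein IDs of interest, e.g. the JH545 proteins
    min_score : edges scoring below this value are discarded
    """
    # Build an undirected adjacency map from edges that pass the cutoff.
    adjacency = defaultdict(set)
    for a, b, score in edges:
        if score >= min_score:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    kept = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        # Keep the component only if it includes a query protein.
        if component & query_ids:
            kept.append(component)
    return kept
```

For example, with the made-up edges `[("JH545_001", "P1", 200), ("P1", "P2", 160), ("P3", "P4", 300), ("JH545_002", "P3", 100)]` and query set `{"JH545_001", "JH545_002"}`, only the cluster containing `JH545_001`, `P1`, and `P2` is returned. In this sketch, a query protein whose edges all fall below the cutoff is dropped rather than kept as a singleton.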