Extended Data Fig. 3. Many unannotated proteins have structural similarity to annotated protein clusters.
A. Many protein clusters contain a mix of annotated and unannotated sequence clusters. Each “wheel” of nodes represents a protein cluster, and each node within it represents an individual sequence cluster. Each sequence cluster node is colored according to whether it is annotated (gray) or unannotated (red). All protein clusters with at least one annotated and one unannotated sequence cluster are shown. Numbers below each wheel indicate the cluster ID. B-G. (Left) A network of the sequence clusters that belong to each protein cluster, with unannotated nodes in red and annotated nodes in gray. The centroid is the protein cluster representative. (Right) Members of annotated and unannotated sequence clusters are highlighted, with the structure of an annotated protein (left) compared to the structure of an unannotated protein (right). Proteins are colored by pLDDT, with red indicating higher pLDDT and blue indicating lower pLDDT. The RMSD between the two structures is indicated.