2021 May 31;118(23):e2101349118. doi: 10.1073/pnas.2101349118

Fig. 4.

Sequence clustering of long, highly identical tandem repeats identified across proteins of the NCTC3000 genomes. The largest clusters are annotated using Pfam (Pfam IDs are listed in Dataset S1). Clusters with unknown Pfam classification are marked with “?”. New Pfam protein domain families built from these sequence clusters are highlighted in blue.