Figure 4:
Distribution of GOS and MetaHIT protein clusters. The x-axis is the cluster size X. In the left panels, the y-axis is the number of clusters of size at least X; in the right panels, it is the percentage of all sequences that fall in clusters of size at least X. Panels (A) and (B) cover all GOS and MetaHIT sequences. Panels (C) and (D) cover MetaHIT sequences only, grouped into Known and Novel clusters. In addition, two separate lines are plotted for NR sequences (i.e. the 3 076 514 representative sequences clustered at 90% identity).
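
The two cumulative quantities plotted on the y-axes can be illustrated with a minimal sketch. It assumes only that a list of cluster sizes is available; the function name `cumulative_cluster_stats` and the toy input are hypothetical and are not taken from the original analysis.

```python
from collections import Counter


def cumulative_cluster_stats(cluster_sizes):
    """Map each observed size X to (clusters of size >= X, % of sequences in them).

    Hypothetical helper for illustration; not the authors' code.
    """
    total_sequences = sum(cluster_sizes)
    size_counts = Counter(cluster_sizes)

    stats = {}
    clusters_so_far = 0
    sequences_so_far = 0
    # Walk from the largest size down so the running totals cover "size >= X".
    for size in sorted(size_counts, reverse=True):
        clusters_so_far += size_counts[size]
        sequences_so_far += size * size_counts[size]
        stats[size] = (clusters_so_far, 100.0 * sequences_so_far / total_sequences)
    return stats


# Toy input: five clusters holding 19 sequences in total.
example = cumulative_cluster_stats([1, 1, 2, 5, 10])
```

In this toy input, every cluster has size at least 1, so the entry for X = 1 accounts for all five clusters and 100% of the sequences.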