Fig. 1.

Venn diagram of the same-fold and different-fold datasets. Set-I contains 746 420 same-Fold domain pairs generated from 11 239 protein domains in SCOP. Set-II contains 2 769 868 same-Topology domain pairs generated from 14 830 protein domains in CATH. Set-III is the overlap of Set-I and Set-II and includes 186 359 pairs from 5105 consensus domains. Set-IV contains the 13 027 960 all-to-all pairs among the 5105 consensus domains. Set-I′ is the different-fold set for SCOP, generated by subtracting a subset of Set-I from Set-IV. Set-II′ is the different-fold set for CATH, generated by subtracting a subset of Set-II from Set-IV. Set-III′ is the different-fold set for Set-III, obtained by subtracting subsets of both Set-I and Set-II from Set-IV.
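
The set relationships in this caption can be illustrated with a small sketch. The size of Set-IV follows from counting unordered pairs, n(n − 1)/2 with n = 5105. The rest is a toy example under stated assumptions: the four domain names and their fold/topology labels are invented for illustration, the "subset" subtracted from Set-IV is assumed to be the same-fold pairs that fall within the consensus domains, and subtracting subsets of both Set-I and Set-II is read as removing their union. None of this is the authors' actual pipeline.

```python
from itertools import combinations


def all_pairs_count(n: int) -> int:
    """Number of unordered pairs among n items: n(n - 1)/2."""
    return n * (n - 1) // 2


# Size of Set-IV for the 5105 consensus domains quoted in the caption.
set_iv_size = all_pairs_count(5105)

# Hypothetical toy data: four consensus domains with invented labels.
domains = ["d1", "d2", "d3", "d4"]
scop_fold = {"d1": "a", "d2": "a", "d3": "b", "d4": "b"}      # SCOP Fold (made up)
cath_topology = {"d1": "x", "d2": "x", "d3": "x", "d4": "y"}  # CATH Topology (made up)


def same_label_pairs(items, label):
    """Unordered pairs whose two members share the same label."""
    return {
        frozenset(pair)
        for pair in combinations(items, 2)
        if label[pair[0]] == label[pair[1]]
    }


# Set-IV analogue: all-to-all pairs among the consensus domains.
set_iv = {frozenset(pair) for pair in combinations(domains, 2)}

# Same-fold pairs restricted to the consensus domains (assumed "subsets").
same_scop = same_label_pairs(domains, scop_fold)
same_cath = same_label_pairs(domains, cath_topology)

# Set-III analogue: pairs judged same-fold by both classifications.
set_iii = same_scop & same_cath

# Different-fold sets, obtained by subtraction from Set-IV.
set_i_prime = set_iv - same_scop                  # different fold per SCOP
set_ii_prime = set_iv - same_cath                 # different fold per CATH
set_iii_prime = set_iv - (same_scop | same_cath)  # different fold per both

print(set_iv_size)
print(len(set_i_prime), len(set_ii_prime), len(set_iii_prime))
```

Under this reading, Set-III′ contains only pairs that neither SCOP nor CATH places in the same fold, which makes it the strictest of the three different-fold sets.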