Table 1. Robustness of the automatic classification.
Set | Ali | Score | Norm | Cl. Al. | N.Clu. | Clus.co. | T.V. | WKSS | WKSF | WKCS | WKCF |
SCOP 2890 | MM | Cont. | Gauss | AL | 779 | 0.90 | 0.072 | 0.54 | 0.69 | 0.58 | 0.32 |
SCOP 2890 | MM | TM | No | AL | 740 | 0.87 | 0.101 | 0.59 | 0.60 | 0.55 | 0.22 |
SCOP 2890 | MM | PSI4-p | EV | AL | 768 | 0.88 | 0.088 | 0.51 | 0.57 | 0.51 | 0.24 |
SCOP 2890 | MM | PSI6-p | EV | AL | 855 | 0.87 | 0.113 | 0.54 | 0.58 | 0.52 | 0.27 |
SCOP 2890 | MM | PSI4-t | EV | AL | 788 | 0.88 | 0.084 | 0.49 | 0.60 | 0.48 | 0.26 |
SCOP 2890 | MM | Cont. | No | AL | 883 | 0.88 | 0.069 | 0.57 | 0.50 | 0.53 | 0.27 |
SCOP 2890 | MP | Cont. | No | AL | 950 | 0.86 | 0.070 | 0.51 | 0.54 | 0.53 | 0.23 |
SCOP 2890 | MP | PSI4-p | EV | AL | 797 | 0.77 | 0.089 | 0.47 | 0.44 | 0.49 | 0.19 |
SCOP 2890 | MP | PSI4-t | EV | AL | 758 | 0.88 | 0.085 | 0.51 | 0.54 | 0.51 | 0.25 |
SCOP 2890 | MM | Cont. | Gauss | SL | 876 | 0.90 | 0.167 | 0.24 | 0.48 | 0.54 | 0.69 |
SCOP 2890 | MM | Cont. | Gauss | CL | 730 | 0.90 | 0.080 | 0.26 | 0.47 | 0.43 | 0.10 |
CATH 2890 | MM | Cont. | Gauss | AL | 776 | 0.90 | 0.079 | 0.50 | 0.71 | 0.54 | 0.36 |
SCOP 5041 | MM | Cont. | Gauss | AL | 1353 | 0.92 | 0.063 | 0.61 | 0.52 | - | - |
CATH 7073 | MM | Cont. | Gauss | AL | 2287 | 0.91 | 0.068 | - | - | 0.51 | 0.14 |
The qualitative features of the classification at the cross-over point are robust with respect to different methodological choices. 1st column, set of domains at less than 40 percent sequence identity: either 2890 domains from SCOP, or the corresponding 2890 domains from CATH, or 5041 domains from SCOP, or 7073 domains from CATH. The number of superfamilies and folds is, respectively: SCOP 2890: 779, 466; CATH 2890: 873, 473; SCOP 5041: 1094, 660; CATH 7073: 995, 1852. 2nd column, alignment algorithm: either the multiple structure alignment algorithm MAMMOTH multiple (MM) or its pairwise version (MP), which is faster but much less accurate. 3rd column, similarity measure: either Contact Overlap (Cont.), TM score (TM), or percentage of structure identity (PSI). The PSI can have a tolerance of either 4 Å or 6 Å, and it can be normalized either with respect to the length of the shortest domain, PSI partial (PSI-p), or with respect to the geometric average of the two lengths, PSI total (PSI-t). 4th column, normalization with respect to length: either none (No), Gaussian statistics (Gauss), or extreme value statistics (EV). 5th column, clustering algorithm: either average linkage (AL), single linkage (SL), or complete linkage (CL). The results presented are the following: number of clusters at the cross-over point (6th column), clustering coefficient averaged up to the cross-over similarity (7th column), mean transitivity violations (8th column), and weighted kappa with respect to SCOP superfamilies (9th column), SCOP folds (10th column), CATH superfamilies (11th column), and CATH topologies (12th column). The first row, in bold face, corresponds to the reference choices used for the results presented. In the following rows, the variables that differ from the reference are highlighted in bold face.
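The two PSI normalizations named in the caption can be illustrated with a minimal sketch. It assumes that the input `n_aligned` is the number of residue pairs superimposed within the chosen tolerance (4 Å or 6 Å) and that the score is expressed as a percentage. The function and parameter names are hypothetical and are not taken from MAMMOTH.

```python
import math


def psi_partial(n_aligned: int, len_a: int, len_b: int) -> float:
    """PSI-p: aligned pairs as a percentage of the shorter domain's length."""
    return 100.0 * n_aligned / min(len_a, len_b)


def psi_total(n_aligned: int, len_a: int, len_b: int) -> float:
    """PSI-t: aligned pairs as a percentage of the geometric mean of the lengths."""
    return 100.0 * n_aligned / math.sqrt(len_a * len_b)


# Hypothetical example: 50 aligned pairs between domains of 100 and 400 residues.
partial = psi_partial(50, 100, 400)
total = psi_total(50, 100, 400)
print(partial, total)
```

Under these assumptions, PSI-p is never smaller than PSI-t for the same pair, because the shorter length is at most the geometric mean; the two coincide only when both domains have the same length.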