Table 2. Accuracy of clustering in Training and Test-18 sets.
Set † | Clustering | True Positive (TP) | False Positive (FP) | False Negative (FN) | True Negative (TN) | Accuracy | MCC |
Training | 1st clustering | 49 | 1 | 51 | 59 | 67.5% | 0.49 |
Training | 2nd clustering | 31 | 17 | 20 | 42 | 66.4% | 0.32 |
Training | Total | 80 | 18 | 20 | 42 | 76.3% | 0.50 |
Test-18 | 1st clustering | 3 | 0 | 10 | 5 | 44.4% | 0.28 |
Test-18 | 2nd clustering | 6 | 0 | 4 | 5 | 73.3% | 0.58 |
Test-18 | Total | 9 | 0 | 4 | 5 | 77.8% | 0.62 |
†) TP: FLIP found in Cluster 1; TN: FUNC found in Cluster 2;
FP: FUNC found in Cluster 1; FN: FLIP found in Cluster 2.
The accuracy and Matthews correlation coefficient (MCC, a measure of the quality of a binary classification) of the clusterings shown in Figure 4 are indicated. The overall accuracy is 76% and 78% for the training and Test-18 sets, respectively. TPs are readily identified in both the training and Test-18 sets (80% and 69% sensitivity, respectively). The majority of TPs are enzymes and immunoglobulin heavy chain-light chain interactions. TNs are less well identified (68% and 56% negative predictive values, respectively, as computed from the Total rows). MCCs of 0.50 and 0.62 indicate that our simple two-category approach is generally appropriate.
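The summary statistics quoted above can be recomputed from the four counts in each row of Table 2. The following is a minimal sketch, not the authors' code, using the standard definitions of accuracy, MCC, sensitivity, and negative predictive value; the function name `binary_metrics` and the helper `safe_div` are hypothetical names chosen for illustration.

```python
from math import sqrt


def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration; the counts follow the table
    footnote (TP: FLIP in Cluster 1, TN: FUNC in Cluster 2, and so on).
    """
    total = tp + fp + fn + tn
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, total),
        "mcc": safe_div(tp * tn - fp * fn, mcc_denominator),
        "sensitivity": safe_div(tp, tp + fn),  # TP / (TP + FN)
        "npv": safe_div(tn, tn + fn),          # TN / (TN + FN)
    }


# Counts taken from the "Total" rows of Table 2.
training_total = binary_metrics(tp=80, fp=18, fn=20, tn=42)
test18_total = binary_metrics(tp=9, fp=0, fn=4, tn=5)

for name, metrics in (("Training", training_total), ("Test-18", test18_total)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Applied to the Total rows, the accuracy and MCC values agree with the table to within rounding. The same function can be applied to the 1st and 2nd clustering rows individually.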