Table 4.
Data set | Iteration | False positive rate (%) | Sensitivity (%) | Specificity (%) | Accuracy (%) | AUC (%) | MCC |
---|---|---|---|---|---|---|---|
Disease negative set | Iter. 1 | 7.0 | 53.4 | 93.0 | 73.2 | 75.2 | 0.45 |
Disease negative set | Iter. 2 | 7.0 | 52.5 | 93.0 | 72.8 | 75.9 | 0.44 |
Disease negative set | Iter. 3 | 4.4 | 55.0 | 95.6 | 75.3 | 77.1 | 0.49 |
SNP negative set | Iter. 1 | 36.8 | 73.1 | 63.2 | 68.1 | 76.4 | 0.35 |
SNP negative set | Iter. 2 | 36.8 | 72.3 | 63.2 | 67.7 | 76.8 | 0.34 |
SNP negative set | Iter. 3 | 34.2 | 71.0 | 65.8 | 68.4 | 78.3 | 0.35 |
Mixed negative set | Iter. 1 | 7.9 | 56.3 | 92.1 | 74.2 | 78.8 | 0.46 |
Mixed negative set | Iter. 2 | 7.9 | 56.7 | 92.1 | 74.4 | 78.6 | 0.46 |
Mixed negative set | Iter. 3 | **7.0** | **64.7** | **93.0** | **78.8** | **83.5** | **0.54** |
Random SNP set | Iter. 1 | 0.0 | 1.3 | 100.0 | 50.6 | 50.6 | 0.06 |
Random SNP set | Iter. 2 | 0.9 | 1.7 | 99.1 | 50.4 | 45.2 | 0.03 |
Random SNP set | Iter. 3 | 29.8 | 31.1 | 70.2 | 50.6 | 50.3 | 0.01 |
Classification models were built using RF with 1,000 trees. The unseen test set was experimentally characterized with respect to the splicing phenotype. Performance benchmarks for the final classification model (Mixed negative set; Iter. 3) are highlighted in bold. Where appropriate, performance metrics were calculated using a probability threshold (general score) of ≥0.60. The Random SNP set is a control set. MCC, Matthews correlation coefficient.
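
As an illustration of how the threshold-dependent columns relate to one another, the following is a minimal sketch, not the authors' code, that derives false positive rate, sensitivity, specificity, accuracy, and MCC from classifier scores at a cutoff of 0.60. The function name `threshold_metrics`, the label encoding (1 = positive class, 0 = negative class), and the toy inputs are assumptions made for this example. AUC is threshold-independent and is not computed here.

```python
import math


def threshold_metrics(scores, labels, threshold=0.60):
    """Confusion-matrix metrics at a fixed score cutoff.

    Hypothetical helper: `scores` are classifier probabilities and
    `labels` use 1 for the positive class and 0 for the negative class
    (an assumed encoding, not taken from the source).
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold  # cutoff is inclusive (>=)
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
        else:
            tn += 1

    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0

    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "false_positive_rate": false_positive_rate,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Toy data for illustration only; these are not values from the study.
example = threshold_metrics(
    scores=[0.9, 0.7, 0.5, 0.65, 0.2, 0.1],
    labels=[1, 1, 1, 0, 0, 0],
)
```

By construction the false positive rate equals 100% minus the specificity, which is the relationship visible in every row of the table.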