Table 2. Performance on the VEST-indel test set for frameshift variations.
Method | MCC<sup>b</sup> | Sensitivity | Specificity | F-score<sup>c</sup> | False positive rate | False discovery rate |
---|---|---|---|---|---|---|
Evaluation on the consensus subset<sup>a</sup> | ||||||
ENTPRISE-X | 0.626 (0.749) | 0.943 | 0.916 | 0.620 (0.767) | 8.4% | 54% |
VEST-indel | 0.440 (0.585) | 0.914 | 0.814 | 0.421 (0.615) | 18.6% | 73% |
DDIG-in | 0.321 (0.441) | 0.943 | 0.663 | 0.297 (0.439) | 33.7% | 82% |
Evaluation on the full test set | ||||||
ENTPRISE-X | 0.586 | 0.878 | 0.912 | 0.590 | 8.8% | 55% |
Baseline<sup>d</sup> | 0.323 | 0.988 | 0.621 | 0.294 | 37.9% | 83% |
Baseline<sup>e</sup> | 0.224 | 0.598 | 0.775 | 0.271 | 22.5% | 83% |
ENTPRISE-X_1<sup>f</sup> | 0.570 | 0.878 | 0.905 | 0.574 | 9.5% | 57% |
ENTPRISE-X_2<sup>f</sup> | 0.555 | 0.854 | 0.905 | 0.562 | 9.5% | 58% |
ENTPRISE-X_10alt<sup>f</sup> | 0.587±0.006 | 0.887±0.006 | 0.910±0.003 | 0.590±0.006 | 9.0%±0.3% | 55.8%±0.7% |
ENTPRISE-X-nolocal | 0.481 | 0.707 | 0.914 | 0.509 | 8.6% | 60% |
ENTPRISE-X-nonew | 0.099 | 0.793 | 0.390 | 0.168 | 61.0% | 90% |
ENTPRISE-X-noratio | 0.513 | 0.890 | 0.871 | 0.509 | 12.9% | 64% |
ENTPRISE-X-noessential | 0.574 | 0.890 | 0.903 | 0.575 | 9.7% | 58% |
ENTPRISE-X-nopathogen | 0.543 | 0.866 | 0.896 | 0.546 | 10.4% | 60% |
ENTPRISE-X-nodisease | 0.368 | 0.683 | 0.859 | 0.396 | 14.1% | 72% |
ENTPRISE-X-nointeract | 0.586 | 0.890 | 0.909 | 0.588 | 9.1% | 56% |
a For a fair comparison with the other methods, only the consensus mutations, i.e. those common to all three methods, are evaluated.
b Matthews correlation coefficient. The numbers in parentheses are the maximal possible values.
c F-score = 2(precision × recall)/(precision + recall), where precision = (true positives)/(true positives + false positives) and recall = (true positives)/(true positives + false negatives). The numbers in parentheses are the maximal possible values.
d Baseline using only the feature that indicates whether the gene is disease-associated.
e Baseline using only the feature that indicates whether the gene is essential.
f ENTPRISE-X_1 and ENTPRISE-X_2 each use one of the two models trained on one half of the pathogenic data; ENTPRISE-X_10alt gives the results of training ENTPRISE-X on 10 different random partitions of the pathogenic part of the training set.
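
The columns of Table 2 can all be derived from the four counts of a confusion matrix. The following is a minimal sketch of those definitions, not the authors' evaluation code. The function name `classification_metrics` and the example counts are hypothetical and do not come from the paper. MCC is computed with its standard definition, and sensitivity is the same quantity as the recall in footnote c.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the Table 2 metrics from confusion-matrix counts.

    Hypothetical helper for illustration; a ratio with a zero
    denominator is returned as 0.0.
    """

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall, true positive rate
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)
    f_score = ratio(2 * precision * sensitivity, precision + sensitivity)
    false_positive_rate = ratio(fp, fp + tn)    # equals 1 - specificity
    false_discovery_rate = ratio(fp, tp + fp)   # equals 1 - precision
    mcc = ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "mcc": mcc,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
        "false_positive_rate": false_positive_rate,
        "false_discovery_rate": false_discovery_rate,
    }


# Made-up counts, chosen only to exercise the function.
metrics = classification_metrics(tp=8, fp=2, tn=88, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the false positive rate is 1 − specificity, the two columns carry the same information; for example, the consensus-subset ENTPRISE-X row pairs a specificity of 0.916 with a false positive rate of 8.4%. Likewise, the false discovery rate is 1 − precision.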