2017 Jul 12;33(14):i83–i91. doi: 10.1093/bioinformatics/btx231

Table 2.

Summary of the text-level prediction with different combinations of text types, language models and classifiers

Text Type              Language Model  Classifiers
                                       LR      RF      SVM     GBM
Titles                 TFIDF           0.7774  0.7942  0.8751  0.7218
                       LDA             0.6128  0.6829  0.6584  0.7065
                       DEEP            0.7696  0.7402  0.8429  0.8029
                       PDEEP           0.6262  0.5482  0.4836  0.6445
Abstracts              TFIDF           0.9220  0.8682  0.9371  0.8396
                       LDA             0.6419  0.6936  0.6512  0.7349
                       DEEP            0.7775  0.8119  0.8480  0.7987
                       PDEEP
Function Descriptions  TFIDF           0.7412  0.7439  0.7715  0.6947
                       LDA             0.6128  0.6829  0.6582  0.7065
                       DEEP            0.8929  0.8962  0.9184  0.8788
                       PDEEP           0.7017  0.7211  0.3474  0.6917

The reported metric is the two-class weighted F-score: the F-scores for the MP and non-MP classes are calculated separately and then averaged, with each class weighted by its number of data points. The values shown are averages over the test sets of five-fold cross-validation. LR, Logistic Regression; RF, Random Forest; SVM, Support Vector Machine; GBM, Gradient Boosted Machine.
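
The weighted F-score described in the footnote can be illustrated with a minimal sketch. It assumes the F-score is the balanced F1 (harmonic mean of precision and recall), which the table does not state explicitly. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def f_score(y_true, y_pred, positive):
    """F1 score for one class, treating `positive` as the class of interest.

    Assumption: F-score means F1 (beta = 1).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f_score(y_true, y_pred):
    """Average the per-class F-scores, weighted by each class's true count."""
    support = Counter(y_true)
    return sum(f_score(y_true, y_pred, c) * n for c, n in support.items()) / len(y_true)


def mean_over_folds(folds):
    """Average the weighted F-score over (y_true, y_pred) pairs, one per test fold."""
    scores = [weighted_f_score(t, p) for t, p in folds]
    return sum(scores) / len(scores)


# Hypothetical toy labels: three MP examples and one non-MP example.
y_true = ["MP", "MP", "MP", "non-MP"]
y_pred = ["MP", "MP", "non-MP", "non-MP"]
print(round(weighted_f_score(y_true, y_pred), 4))  # 0.7667
```

In the toy example the MP class has an F-score of 0.8 and the non-MP class 0.6667, so the weighted average is (3 × 0.8 + 1 × 0.6667) / 4 ≈ 0.7667. Passing one (true, predicted) pair per test fold to `mean_over_folds` gives the kind of five-fold average the table reports.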