Bioinformatics. 2017 Jul 12;33(14):i83–i91. doi: 10.1093/bioinformatics/btx231

Table 4.

Summary of the protein-level prediction

| Text Type | Language Model | LR | RF | SVM | GBM |
|---|---|---|---|---|---|
| Titles | TFIDF | 0.7703 | 0.7474 | 0.8330 | 0.6901 |
| | LDA | 0.5654 | 0.5723 | 0.5836 | 0.6227 |
| | DEEP | 0.6651 | 0.6698 | 0.7557 | 0.6826 |
| | PDEEP | 0.6611 | 0.5278 | 0.4314 | 0.6021 |
| Abstracts | TFIDF | 0.8132 | 0.8225 | 0.8208 | 0.7833 |
| | LDA | 0.5459 | 0.5739 | 0.5342 | 0.5713 |
| | DEEP | 0.7650 | 0.8105 | 0.7747 | 0.7909 |
| | PDEEP | | | | |
| Function Descriptions | TFIDF | 0.7412 | 0.7439 | 0.7715 | 0.6947 |
| | LDA | 0.6128 | 0.6829 | 0.6582 | 0.7065 |
| | DEEP | 0.8929 | 0.8962 | 0.9184 | 0.8788 |
| | PDEEP | 0.7017 | 0.7211 | 0.3474 | 0.6917 |

Values are F-scores, averaged over the test sets of the five-fold cross-validation. LR, Logistic Regression; RF, Random Forest; SVM, Support Vector Machine; GBM, Gradient Boosted Machine. For each text type (titles, abstracts and function descriptions), the best performing language model under the four classifiers is highlighted in bold.
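As a rough illustration of how a single cell of such a table could be computed, the sketch below averages per-fold F-scores over five test folds. It assumes the binary F1 variant (the harmonic mean of precision and recall), which the table note does not specify. The function names `f1_score` and `mean_cv_f1` and the confusion counts are hypothetical and are not taken from the paper.

```python
from statistics import mean


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Binary F1 (assumed variant): harmonic mean of precision and recall."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_cv_f1(fold_counts):
    """Average the F1 scores computed separately on each test fold."""
    return mean(f1_score(tp, fp, fn) for tp, fp, fn in fold_counts)


# Hypothetical (tp, fp, fn) counts for five test folds; NOT data from the paper.
folds = [(40, 10, 10), (45, 5, 10), (38, 12, 8), (42, 8, 12), (44, 6, 9)]
print(round(mean_cv_f1(folds), 4))
```

Averaging the per-fold scores, rather than pooling all predictions into one confusion matrix, matches the note's wording ("average of the test sets"). The two approaches can give slightly different values.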