Table 2.
Text Type | Language Model | LR | RF | SVM | GBM |
---|---|---|---|---|---|
Titles | TFIDF | 0.7774 | 0.7942 | 0.8751 | 0.7218 |
Titles | LDA | 0.6128 | 0.6829 | 0.6584 | 0.7065 |
Titles | DEEP | 0.7696 | 0.7402 | 0.8429 | 0.8029 |
Titles | PDEEP | 0.6262 | 0.5482 | 0.4836 | 0.6445 |
Abstracts | TFIDF | 0.9220 | 0.8682 | 0.9371 | 0.8396 |
Abstracts | LDA | 0.6419 | 0.6936 | 0.6512 | 0.7349 |
Abstracts | DEEP | 0.7775 | 0.8119 | 0.8480 | 0.7987 |
Abstracts | PDEEP | – | – | – | – |
Function Descriptions | TFIDF | 0.7412 | 0.7439 | 0.7715 | 0.6947 |
Function Descriptions | LDA | 0.6128 | 0.6829 | 0.6582 | 0.7065 |
Function Descriptions | DEEP | 0.8929 | 0.8962 | 0.9184 | 0.8788 |
Function Descriptions | PDEEP | 0.7017 | 0.7211 | 0.3474 | 0.6917 |
Values are two-class weighted F-scores: the F-score was calculated separately for the MP and non-MP classes, and the two were averaged with weights equal to the number of data points in each class. Each value is the mean over the test sets of five-fold cross-validation. Classifiers: LR, Logistic Regression; RF, Random Forest; SVM, Support Vector Machine; GBM, Gradient Boosted Machine.
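
The weighted F-score described in the table note can be illustrated with a minimal sketch. It assumes string class labels such as "MP" and "non-MP"; the function names `f1_for_class`, `weighted_f1`, and `mean_over_folds` are hypothetical and chosen for illustration only, since the tooling used to produce the reported values is not stated here.

```python
from collections import Counter


def f1_for_class(y_true, y_pred, label):
    """F-score for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # Convention assumed here: a class with no true or predicted members scores 0.
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true, y_pred):
    """Average the per-class F-scores, weighted by the number of true examples per class."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        f1_for_class(y_true, y_pred, label) * count / total
        for label, count in support.items()
    )


def mean_over_folds(fold_scores):
    """Mean of the per-fold test scores, as reported for five-fold cross-validation."""
    return sum(fold_scores) / len(fold_scores)


# Toy example (made-up labels, not data from the study).
y_true = ["MP", "MP", "MP", "non-MP"]
y_pred = ["MP", "MP", "non-MP", "non-MP"]
score = weighted_f1(y_true, y_pred)
```

In this toy example the MP class has an F-score of 0.8 with weight 3/4, and the non-MP class has an F-score of 2/3 with weight 1/4, so the larger class dominates the average.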