Table 4.
| Text Type | Language Model | LR | RF | SVM | GBM |
|---|---|---|---|---|---|
| Titles | TFIDF | 0.7703 | 0.7474 | 0.8330 | 0.6901 |
| Titles | LDA | 0.5654 | 0.5723 | 0.5836 | 0.6227 |
| Titles | DEEP | 0.6651 | 0.6698 | 0.7557 | 0.6826 |
| Titles | PDEEP | 0.6611 | 0.5278 | 0.4314 | 0.6021 |
| Abstracts | TFIDF | 0.8132 | 0.8225 | 0.8208 | 0.7833 |
| Abstracts | LDA | 0.5459 | 0.5739 | 0.5342 | 0.5713 |
| Abstracts | DEEP | 0.7650 | 0.8105 | 0.7747 | 0.7909 |
| Abstracts | PDEEP | – | – | – | – |
| Function Descriptions | TFIDF | 0.7412 | 0.7439 | 0.7715 | 0.6947 |
| Function Descriptions | LDA | 0.6128 | 0.6829 | 0.6582 | 0.7065 |
| Function Descriptions | DEEP | 0.8929 | 0.8962 | 0.9184 | 0.8788 |
| Function Descriptions | PDEEP | 0.7017 | 0.7211 | 0.3474 | 0.6917 |
Values are F-scores, averaged over the test sets of the five-fold cross-validation. The four right-hand columns are classifiers: LR, Logistic Regression; RF, Random Forest; SVM, Support Vector Machine; GBM, Gradient Boosted Machine. For each text type (titles, abstracts, and function descriptions), the best-performing language model under the four classifiers is highlighted in bold.
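
As an illustration of how a single cell of the table could be obtained, the following minimal sketch computes an F-score on each test fold and averages the results. It assumes a binary F1 score with a single positive class, which the table note does not specify; the function names `f_score` and `mean_cv_f_score` and the toy labels are hypothetical and are not taken from the study.

```python
from statistics import mean


def f_score(y_true, y_pred, positive=1):
    """Binary F1 score for one test fold (assumed variant; see lead-in)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_cv_f_score(folds):
    """Average the per-fold F-scores; `folds` is a list of (y_true, y_pred)."""
    return mean(f_score(y_true, y_pred) for y_true, y_pred in folds)


# Toy labels for two folds, for illustration only (not data from the study).
toy_folds = [
    ([1, 1, 0, 0], [1, 0, 0, 1]),
    ([1, 0], [1, 0]),
]
print(round(mean_cv_f_score(toy_folds), 4))
```

With five folds, `folds` would hold five such pairs, one per held-out test set.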