2021 Jan 25;2020:442–451.

Table 2:

K-fold cross-validation of different machine learning classifiers: MLPClassifier (MLP), XGBClassifier (XGB), KNeighborsClassifier (KNN), RandomForestClassifier (RF), and DecisionTreeClassifier (DT), using two feature sets: (1) tf-idf and word2vec trained on Twitter and (2) tf-idf and word2vec trained on Wikipedia, PubMed, and PMC. Based on the reported precision or positive predictive value (PPV), recall or true positive rate (TPR), and F1 score (F1), MLPClassifier was the best-performing classifier, using tf-idf and word2vec trained on Twitter.

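Using tf-idf and word2vec trained on Twitter

| Classifier | 3-fold PPV | 3-fold TPR | 3-fold F1 | 5-fold PPV | 5-fold TPR | 5-fold F1 | 10-fold PPV | 10-fold TPR | 10-fold F1 |
|---|---|---|---|---|---|---|---|---|---|
| MLP | 0.62 | 0.71 | 0.66 | 0.76 | 0.75 | 0.76 | 0.80 | 0.82 | 0.80 |
| XGB | 0.58 | 0.68 | 0.62 | 0.71 | 0.73 | 0.72 | 0.78 | 0.81 | 0.79 |
| KNN | 0.59 | 0.55 | 0.57 | 0.67 | 0.59 | 0.62 | 0.70 | 0.62 | 0.66 |
| RF | 0.62 | 0.63 | 0.62 | 0.72 | 0.64 | 0.66 | 0.74 | 0.67 | 0.70 |
| DT | 0.54 | 0.64 | 0.59 | 0.64 | 0.68 | 0.65 | 0.69 | 0.70 | 0.69 |

Using tf-idf and word2vec trained on Wikipedia, PubMed and PMC

| Classifier | 3-fold PPV | 3-fold TPR | 3-fold F1 | 5-fold PPV | 5-fold TPR | 5-fold F1 | 10-fold PPV | 10-fold TPR | 10-fold F1 |
|---|---|---|---|---|---|---|---|---|---|
| MLP | 0.63 | 0.71 | 0.66 | 0.74 | 0.75 | 0.75 | 0.79 | 0.81 | 0.79 |
| XGB | 0.58 | 0.70 | 0.63 | 0.72 | 0.73 | 0.72 | 0.78 | 0.79 | 0.78 |
| KNN | 0.59 | 0.66 | 0.62 | 0.65 | 0.68 | 0.66 | 0.69 | 0.71 | 0.70 |
| RF | 0.61 | 0.62 | 0.63 | 0.72 | 0.63 | 0.67 | 0.74 | 0.68 | 0.69 |
| DT | 0.56 | 0.64 | 0.60 | 0.63 | 0.66 | 0.65 | 0.67 | 0.71 | 0.68 |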
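
For readers unfamiliar with how figures like those in Table 2 are typically produced, the following is a minimal, library-free sketch of k-fold cross-validation that reports PPV, TPR and F1 averaged over folds. It is not the authors' pipeline: the contiguous, unshuffled fold assignment, the binary positive-class scoring, the simple per-fold averaging, and the names `k_fold_indices`, `ppv_tpr_f1`, `cross_validate` and `fit_predict` are all assumptions made for illustration. The source does not state how folds were drawn or how scores were averaged.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs over contiguous, unshuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more sample
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def ppv_tpr_f1(y_true: Sequence[int], y_pred: Sequence[int],
               positive: int = 1) -> Tuple[float, float, float]:
    """Precision (PPV), recall (TPR) and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * ppv * tpr / (ppv + tpr) if ppv + tpr else 0.0
    return ppv, tpr, f1


def cross_validate(X: Sequence, y: Sequence[int], k: int,
                   fit_predict: Callable) -> Tuple[float, float, float]:
    """Average (PPV, TPR, F1) over k folds.

    `fit_predict(X_train, y_train, X_test)` is a hypothetical callback that
    trains any classifier and returns predictions for X_test.
    """
    scores = []
    for train, test in k_fold_indices(len(y), k):
        y_pred = fit_predict([X[i] for i in train],
                             [y[i] for i in train],
                             [X[i] for i in test])
        scores.append(ppv_tpr_f1([y[i] for i in test], y_pred))
    return tuple(sum(s[j] for s in scores) / k for j in range(3))


def threshold_classifier(X_train, y_train, X_test):
    """Toy stand-in for a real model: predict 1 above the training mean."""
    cut = sum(X_train) / len(X_train)
    return [1 if x > cut else 0 for x in X_test]


if __name__ == "__main__":
    X = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7]  # made-up feature values
    y = [0, 1, 0, 1, 0, 1]              # made-up labels
    print(cross_validate(X, y, k=3, fit_predict=threshold_classifier))
```

In practice, a stratified and shuffled split is usually preferable for imbalanced text data, and any of the classifiers named in the caption could be wrapped in the `fit_predict` callback. Because the per-fold scores are averaged, the mean F1 need not equal the harmonic mean of the mean PPV and mean TPR.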