Table 2.
Method | Tested functions and parameters |
---|---|
Text vectorization | Function: TfidfVectorizer(); ngram_range: [(1,1), (1,2), (1,3)]; max_df: [0.70, 0.80, 0.90, 0.95, 1.0]; min_df: [2, 10, 50]; binary: [False, True]; use_idf: [False, True]; norm: ['l1', 'l2', None] |
Logistic regression | Function: LogisticRegression(); penalty: 'none'; class_weight: 'balanced'; max_iter: 1e4; solver: 'saga' |
Support vector machine | Function: SVC(); kernel: 'linear'; class_weight: 'balanced'; max_iter: 1e4 |
Random forest | Function: RandomForestClassifier(); class_weight: 'balanced' |
Adaptive boosting | Function: AdaBoostClassifier() |
Neural networks | Function: Sequential(); Layers: Dense(units = number of variables, activation = 'relu'), Dropout(dropout = 0.2), Dense(units = 1, activation = 'sigmoid'); optimizer: 'adam'; loss: 'binary_crossentropy'; metrics: 'binary_accuracy'; epochs: 1000; callbacks: EarlyStopping(monitor = 'val_loss', min_delta = 0.01); output threshold: [0.01, 0.02, …, 0.98, 0.99] |
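
As an illustration of the size of the search space in Table 2, the text vectorization settings can be enumerated with scikit-learn's `ParameterGrid`. This is a minimal sketch, not the authors' pipeline: the variable names (`tfidf_grid`, `vectorizers`, `thresholds`) are hypothetical, and the table does not state how each setting was paired with the classifiers, which scoring metric was used, or how the data were split, so none of that is shown. The threshold list assumes the elided values in the neural network row follow the 0.01 step implied by its endpoints.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import ParameterGrid

# TfidfVectorizer settings as listed in Table 2.
tfidf_grid = {
    "ngram_range": [(1, 1), (1, 2), (1, 3)],
    "max_df": [0.70, 0.80, 0.90, 0.95, 1.0],
    "min_df": [2, 10, 50],
    "binary": [False, True],
    "use_idf": [False, True],
    "norm": ["l1", "l2", None],
}

# Every combination of the settings above: 3 * 5 * 3 * 2 * 2 * 3 = 540.
grid = ParameterGrid(tfidf_grid)
n_combinations = len(grid)

# One unfitted vectorizer per combination. Fitting is left out because
# the corpus and the evaluation procedure are not part of the table.
vectorizers = [TfidfVectorizer(**params) for params in grid]

# Candidate output thresholds for the neural network: 0.01, 0.02, ..., 0.99
# (assumed step of 0.01, matching the endpoints given in the table).
thresholds = [i / 100 for i in range(1, 100)]

print(n_combinations, len(thresholds))
```

Enumerating the grid up front makes the cost explicit: each classifier in the table would be trained once per vectorizer setting, multiplied by however many resampling folds are used.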