J Am Med Inform Assoc. 2020 Nov 17;28(3):541–548. doi: 10.1093/jamia/ocaa263

Table 2.

Tested methods, functions, and parameters

Method | Tested functions and parameters
Text vectorization

Function: TfidfVectorizer()

ngram_range: [(1,1), (1,2), (1,3)]

max_df: [0.70, 0.80, 0.90, 0.95, 1.0]

min_df: [2, 10, 50]

binary: [False, True]

use_idf: [False, True]

norm: ['l1', 'l2', None]

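A minimal sketch of how the TfidfVectorizer settings listed above could be enumerated with scikit-learn's ParameterGrid; the variable names (tfidf_grid, vectorizers, train_texts) are illustrative and not part of the original table:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import ParameterGrid

# Grid of the TfidfVectorizer settings listed in the table.
tfidf_grid = ParameterGrid({
    "ngram_range": [(1, 1), (1, 2), (1, 3)],
    "max_df": [0.70, 0.80, 0.90, 0.95, 1.0],
    "min_df": [2, 10, 50],
    "binary": [False, True],
    "use_idf": [False, True],
    "norm": ["l1", "l2", None],
})

# One vectorizer per parameter combination.
vectorizers = [TfidfVectorizer(**params) for params in tfidf_grid]

# Each vectorizer would then be fit on the training notes, e.g.:
#   X_train = vectorizers[0].fit_transform(train_texts)
# where train_texts is a hypothetical list of documents.
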
Logistic regression

Function: LogisticRegression()

penalty: 'none'

class_weight: 'balanced'

max_iter: 1e4

solver: 'saga'

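A minimal sketch of the logistic regression configuration above in scikit-learn; X_train and y_train are hypothetical placeholders for the TF-IDF features and labels, so the fit call is commented out:

from sklearn.linear_model import LogisticRegression

# Unpenalized logistic regression with the settings listed above.
# Note: penalty='none' is the spelling accepted by scikit-learn < 1.2;
# newer releases expect penalty=None.
log_reg = LogisticRegression(
    penalty="none",
    class_weight="balanced",
    max_iter=int(1e4),
    solver="saga",
)
# log_reg.fit(X_train, y_train)
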
Support vector machine

Function: SVC()

kernel: 'linear'

class_weight: 'balanced'

max_iter: 1e4

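A minimal sketch of the support vector machine configuration above in scikit-learn; again, X_train and y_train are hypothetical:

from sklearn.svm import SVC

# Linear-kernel SVM with balanced class weights and the iteration cap above.
svm = SVC(
    kernel="linear",
    class_weight="balanced",
    max_iter=int(1e4),
)
# svm.fit(X_train, y_train)
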
Random forest

Function: RandomForestClassifier()

class_weight: 'balanced'

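A minimal sketch of the random forest configuration above; all parameters other than class_weight are left at scikit-learn defaults, as in the table:

from sklearn.ensemble import RandomForestClassifier

# Random forest with balanced class weights; other parameters at defaults.
rf = RandomForestClassifier(class_weight="balanced")
# rf.fit(X_train, y_train)  # hypothetical training data
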
Adaptive boosting

Function: AdaBoostClassifier()
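
A minimal sketch of the adaptive boosting configuration above, using scikit-learn defaults as listed in the table:

from sklearn.ensemble import AdaBoostClassifier

# AdaBoost with scikit-learn default parameters.
ada = AdaBoostClassifier()
# ada.fit(X_train, y_train)  # hypothetical training data
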
Neural networks

Function: Sequential()

Layers:

Dense(units = number of variables, activation = 'relu')

Dropout(rate = 0.2)

Dense(units = 1, activation = 'sigmoid')

optimizer: 'adam'

loss: 'binary_crossentropy'

metrics: 'binary_accuracy'

epochs: 1000

callbacks: EarlyStopping(monitor = 'val_loss', min_delta = 0.01)

output threshold: [0.01, 0.02, …, 0.98, 0.99]
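
A minimal sketch of the neural network configuration above in Keras; the input width (n_features), the training and validation arrays, and the use of validation_data are assumptions added for illustration, so the fit and predict calls are commented out:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping

n_features = 5000  # hypothetical: the number of TF-IDF variables

# Architecture listed above: one hidden ReLU layer as wide as the input,
# 20% dropout, and a single sigmoid output unit.
model = Sequential([
    Dense(units=n_features, activation="relu", input_shape=(n_features,)),
    Dropout(rate=0.2),
    Dense(units=1, activation="sigmoid"),
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["binary_accuracy"])

early_stop = EarlyStopping(monitor="val_loss", min_delta=0.01)

# X_train, y_train, X_val, y_val are hypothetical dense arrays of TF-IDF
# features and binary labels.
# model.fit(X_train, y_train,
#           validation_data=(X_val, y_val),
#           epochs=1000,
#           callbacks=[early_stop])

# The sigmoid output is then binarized over the listed grid of cut-offs.
thresholds = np.arange(0.01, 1.00, 0.01)
# y_prob = model.predict(X_val).ravel()
# predictions = {t: (y_prob >= t).astype(int) for t in thresholds}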