Table 1.
Hyperparameter | Values checked | Chosen value |
---|---|---|
For all models | ||
Sampling ratio (non-CRT:CRT) | (1411:589), (2411:589), (3411:589), (4411:589) | 3411:589 |
Class weights (non-CRT:CRT) | (1:1), (1:5), (0.59:3.4), (1:17), (1:20) | 0.59:3.4 |
Metric | AUROC | AUROC |
Convolutional neural network—Word2Vec | ||
Max length of each abstract | 100, 150, 200, 250, 300, 350 | 300 |
Batch size (distribution) | Uniform distribution (10, 30) | 11 |
Learning rate (distribution) | Uniform distribution (0.0005, 0.005) | 0.0047 |
Dropout rate (distribution) | Uniform distribution (0.1, 0.5) | 0.29 |
Number of filters (distribution) | Uniform distribution (64, 1526) | 923 |
Kernel size (distribution) | Uniform distribution (3, 12) | 8 |
Number of epochs (distribution) | Uniform distribution (3, 20) | 7 |
Constraint applied to the kernel matrix (distribution) | 1, 1.5, 2, 2.5, 3 | 2 |
Optimizer (distribution) | Adadelta, Adam | Adam |
Embedding | Skip-gram; CBOW | Skip-gram |
Embedding dimensions | 50, 100, 200, 300 | 100 |
Number of embedding iterations | 5, 10, 15, 20 | 10 |
Loss | Binary cross-entropy | Binary cross-entropy |
Convolutional neural network—FastText | ||
Max length of each abstract | 100, 150, 200, 250, 300, 350 | 300 |
Batch size (distribution) | Uniform distribution (10, 30) | 16 |
Learning rate (distribution) | Uniform distribution (0.0005, 0.005) | 0.0026 |
Dropout rate (distribution) | Uniform distribution (0.1, 0.5) | 0.47 |
Number of filters (distribution) | Uniform distribution (64, 1526) | 532 |
Kernel size (distribution) | Uniform distribution (3, 12) | 11 |
Number of epochs (distribution) | Uniform distribution (3, 20) | 14 |
Constraint applied to the kernel matrix (distribution) | 1, 1.5, 2, 2.5, 3 | 2 |
Optimizer (distribution) | Adadelta, Adam | Adam |
Embedding | Skip-gram; CBOW | Skip-gram |
Embedding dimensions | 50, 100, 200, 300 | 100 |
Number of embedding iterations | 5, 10, 15, 20 | 10 |
Loss | Binary cross-entropy | Binary cross-entropy |
Support vector machines | ||
Kernel | Linear, polynomial, sigmoid, or radial basis function | Radial basis function |
Kernel coefficient | 1, 0.1, 0.01, 0.001, 0.0001 | 0.001 |
Regularization parameter | 1, 10, 100, 1000 | 100 |
Ngrams | 1, 1 to 2, 1 to 3, 1 to 4 | Unigrams and bigrams (1 to 2) |
Word vectorization | Bag of words, TF-IDF | TF-IDF |
CRT: cluster randomized trial; Ngrams: a sequence of n words from a text document; TF-IDF: term frequency-inverse document frequency; CBOW: continuous bag of words model; AUROC: area under the receiver operating characteristic curve
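
The chosen class weights (0.59:3.4) appear consistent with inverse-frequency ("balanced") weighting applied to the chosen sampling ratio (3411:589), that is, weight = total samples / (number of classes × samples in that class). The table does not state how the weights were derived, so this is an assumption; the sketch below only checks the arithmetic, and the function name `balanced_class_weights` is hypothetical.

```python
def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (n_classes * class_count).

    Assumption: the table's 0.59:3.4 weights follow this formula;
    the source does not say how they were computed.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Chosen sampling ratio from Table 1 (non-CRT:CRT = 3411:589).
counts = {"non-CRT": 3411, "CRT": 589}

weights = balanced_class_weights(counts)
rounded = {label: round(w, 2) for label, w in weights.items()}
print(rounded)  # {'non-CRT': 0.59, 'CRT': 3.4}
```

Under this formula the minority class (CRT) is weighted about 5.8 times more heavily than the majority class, which matches the 3411:589 class imbalance.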