Table 2. Summary of performance measures of STEED compared with manual human ascertainment.
Item | Specificity | Sensitivity | Precision | Accuracy | F1-score |
---|---|---|---|---|---|
Training corpus (motor neuron diseases, n = 45) | | | | | |
Species | NA | 96 | 100 | 96 | 0.98 |
Sex | 67 | 85 | 94 | 82 | 0.89 |
Disease model | NA | 96 | 100 | 96 | 0.98 |
Outcome histology | 89 | 92 | 97 | 91 | 0.94 |
Outcome behaviour | 50 | 97 | 84 | 84 | 0.90 |
Outcome imaging | 96 | NA | NA | 96 | NA |
Randomization | 84 | 96 | 89 | 91 | 0.93 |
Blinding | 95 | 92 | 96 | 93 | 0.94 |
Animal welfare | NA | 86 | 97 | 84 | 0.92 |
Conflict of interest | 100 | 98 | 100 | 97 | 0.99 |
Sample size calculation | 78 | 92 | 63 | 82 | 0.75 |
ARRIVE guidelines | 100 | 100 | 100 | 100 | 1.00 |
Data availability | 85 | 94 | 94 | 91 | 0.94 |
Validation corpus 1 (motor neuron diseases, n = 31) | | | | | |
Species | NA | 100 | 100 | 100 | 1.00 |
Sex | 100 | 74 | 100 | 84 | 0.85 |
Disease model | NA | 90 | 100 | 90 | 0.95 |
Outcome histology | 100 | 96 | 100 | 97 | 0.98 |
Outcome behaviour | 78 | 85 | 76 | 81 | 0.79 |
Outcome imaging | NA | 100 | 100 | 100 | 1.00 |
Randomization | 100 | 86 | 100 | 97 | 0.92 |
Blinding | 100 | 89 | 100 | 97 | 0.94 |
Animal welfare | 100 | 89 | 100 | 90 | 0.94 |
Conflict of interest | 92 | 94 | 94 | 94 | 0.94 |
Sample size calculation | 81 | 80 | 44 | 81 | 0.57 |
ARRIVE guidelines | 100 | NA | NA | 100 | NA |
Data availability | 96 | 83 | 83 | 94 | 0.83 |
Validation corpus 2 (multiple sclerosis, n = 244) | | | | | |
Species | NA | 75 | 100 | 75 | 0.86 |
Sex | 76 | 83 | 93 | 82 | 0.88 |
Disease model | NA | 87 | 100 | 88 | 0.93 |
Outcome histology | 64 | 96 | 93 | 91 | 0.95 |
Outcome behaviour | 66 | 91 | 81 | 82 | 0.86 |
Outcome imaging | NA | 94 | 100 | 94 | 0.97 |
Randomization | 93 | 81 | 75 | 90 | 0.78 |
Blinding | 98 | 85 | 96 | 93 | 0.90 |
Animal welfare | 86 | 80 | 95 | 82 | 0.87 |
Conflict of interest | 96 | 97 | 90 | 97 | 0.93 |
Sample size calculation | 94 | 100 | 27 | 97 | 0.43 |
ARRIVE guidelines | 100 | 100 | 100 | 100 | 1.00 |
Data availability | 100 | 80 | 80 | 100 | 0.80 |
Specificity, sensitivity, precision, and accuracy are given as percentages; the F1-score is given as a proportion between 0 and 1. For details on these measures, see the Materials and Methods section. Items reaching or exceeding our predefined thresholds (a sensitivity of 85% and a specificity of 80%) are printed in bold font.
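
For readers who want to reproduce measures of this kind, the following is a minimal sketch using the standard confusion-matrix definitions, with manual human ascertainment taken as the reference. The function names, the example counts, and the handling of undefined values are assumptions made for illustration; they are not taken from the STEED implementation. A measure whose denominator is zero (for example, specificity when the reference contains no negative cases) is returned as `None`, which is one plausible reading of the NA entries in the table.

```python
from typing import Dict, Optional


def safe_ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


def performance_measures(tp: int, fp: int, tn: int, fn: int) -> Dict[str, Optional[float]]:
    """Compute the five measures from counts of true/false positives/negatives.

    All values are proportions in [0, 1]; multiply by 100 for percentages.
    """
    sensitivity = safe_ratio(tp, tp + fn)   # recall: share of true items found
    specificity = safe_ratio(tn, tn + fp)   # share of absent items correctly rejected
    precision = safe_ratio(tp, tp + fp)     # share of tool-positive calls that are correct
    accuracy = safe_ratio(tp + tn, tp + fp + tn + fn)

    # F1 is the harmonic mean of precision and sensitivity.
    if sensitivity is None or precision is None or sensitivity + precision == 0:
        f1 = None
    else:
        f1 = 2 * precision * sensitivity / (precision + sensitivity)

    return {
        "specificity": specificity,
        "sensitivity": sensitivity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


def meets_thresholds(
    measures: Dict[str, Optional[float]],
    min_sensitivity: float = 0.85,
    min_specificity: float = 0.80,
) -> Optional[bool]:
    """Check an item against the thresholds stated in the table footnote.

    Assumption: returns None when either measure is undefined, because the
    treatment of NA entries is not specified in this table.
    """
    sens, spec = measures["sensitivity"], measures["specificity"]
    if sens is None or spec is None:
        return None
    return sens >= min_sensitivity and spec >= min_specificity


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study data.
    example = performance_measures(tp=40, fp=5, tn=20, fn=10)
    for name, value in example.items():
        print(name, None if value is None else round(value, 2))
    print("meets thresholds:", meets_thresholds(example))
```

Returning `None` rather than 0 for an undefined ratio keeps "could not be computed" distinct from "performed poorly", which matters when summarising items that every publication in a corpus reports.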