Table 2.
Performance comparison of 4 machine learning models evaluated using area under the receiver operating characteristic curve (AUC) and recall. Metrics were averaged across multiple runs with different random seeds to ensure robustness.
| Category and subcategory | AUC | Recall |
| --- | --- | --- |
| **Model selection, mean (SD)** | | |
| &emsp;Held-out patients | | |
| &emsp;&emsp;LightGBM<sup>a</sup> | 0.51 (0.004) | 0.02 (0.009) |
| &emsp;&emsp;FNN<sup>b</sup> | 0.52 (0.005) | 0.05 (0.011) |
| &emsp;&emsp;LSTM<sup>c</sup> | 0.87 (0.002) | 0.75 (0.02) |
| &emsp;&emsp;LSTM+attention | 0.87 (0.002) | 0.74 (0.04) |
| &emsp;Out-of-time patients | | |
| &emsp;&emsp;LightGBM | 0.52 (0.002) | 0.04 (0.003) |
| &emsp;&emsp;FNN | 0.54 (0.01) | 0.08 (0.03) |
| &emsp;&emsp;LSTM | 0.84 (0.01) | 0.72 (0.01) |
| &emsp;&emsp;LSTM+attention | 0.85 (0.01) | 0.75 (0.02) |
| **PRIME’s<sup>d</sup> performance** | | |
| &emsp;Sex | | |
| &emsp;&emsp;Male | 0.83 | 0.36 |
| &emsp;&emsp;Female | 0.84 | 0.29 |
| &emsp;&emsp;Intersex | 0.87 | 0.23 |
| &emsp;Race | | |
| &emsp;&emsp;Black | *0.69*<sup>e</sup> | 0.16 |
| &emsp;&emsp;First Nations | 0.8 | 0.16 |
| &emsp;&emsp;White | 0.84 | 0.38 |
| &emsp;&emsp;Other racial identities | 0.81 | 0.27 |
| &emsp;Sexual orientation | | |
| &emsp;&emsp;Heterosexual | 0.82 | 0.34 |
| &emsp;&emsp;Other | 0.84 | 0.34 |
| &emsp;Program type | | |
| &emsp;&emsp;Regional (nonforensic) | 0.83 | 0.34 |
| &emsp;&emsp;Provincial (forensic) | 0.8 | 0.27 |
| &emsp;Age group (years) | | |
| &emsp;&emsp;18-65 | 0.81 | 0.32 |
| &emsp;&emsp;≥65 | 0.81 | 0.38 |
| &emsp;All | 0.81 | 0.3 |
<sup>a</sup>LightGBM: light gradient boosting machine.
<sup>b</sup>FNN: feedforward neural network.
<sup>c</sup>LSTM: long short-term memory.
<sup>d</sup>PRIME: Predictive Risk Identification for Mental Health Events.
<sup>e</sup>Italicization indicates significance.
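
The "mean (SD)" entries in the model selection rows summarize AUC and recall over several runs with different random seeds. As a minimal sketch of how such a summary could be computed, the Python below uses only the standard library. Several things here are assumptions rather than details from the study: the toy labels and scores are hypothetical, recall is computed at an assumed decision threshold of 0.5, and the SD is the sample standard deviation across seeds. The helper names `auc`, `recall`, and `summarize` are illustrative only.

```python
from statistics import mean, stdev


def auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def recall(y_true, y_score, threshold=0.5):
    """Recall = true positives / all actual positives, at an assumed threshold."""
    pos_scores = [s for y, s in zip(y_true, y_score) if y == 1]
    if not pos_scores:
        raise ValueError("recall needs at least one positive label")
    return sum(s >= threshold for s in pos_scores) / len(pos_scores)


def summarize(values):
    """Format per-seed metric values as 'mean (SD)' using the sample SD."""
    return f"{mean(values):.2f} ({stdev(values):.3f})"


# Hypothetical toy data: one label vector and the scores produced under three seeds.
y_true = [1, 1, 0, 0, 1, 0]
scores_by_seed = {
    0: [0.9, 0.6, 0.4, 0.2, 0.3, 0.1],
    1: [0.8, 0.4, 0.5, 0.1, 0.6, 0.7],
    2: [0.7, 0.3, 0.6, 0.2, 0.8, 0.1],
}

aucs = [auc(y_true, s) for s in scores_by_seed.values()]
recalls = [recall(y_true, s) for s in scores_by_seed.values()]

print("AUC, mean (SD):", summarize(aucs))
print("Recall, mean (SD):", summarize(recalls))
```

The pairwise AUC is quadratic in the number of samples, which is fine for illustration; a library routine such as scikit-learn's `roc_auc_score` would be the usual choice on real data. Because recall depends on the chosen threshold while AUC does not, the two columns can diverge substantially for the same model.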