2026 Feb 6;13:e84318. doi: 10.2196/84318

Table 2.

Performance comparison of 4 machine learning models, evaluated using area under the receiver operating characteristic curve (AUC) and recall, and PRIME’s performance across patient subgroups. Model selection metrics are means (SD) over multiple runs with different random seeds, so that the results do not depend on a single run.

| Category | Subcategory | Model or group | AUC | Recall |
|---|---|---|---|---|
| Model selection, mean (SD) | Held-out patients | LightGBM^a | 0.51 (0.004) | 0.02 (0.009) |
| | | FNN^b | 0.52 (0.005) | 0.05 (0.011) |
| | | LSTM^c | 0.87 (0.002) | 0.75 (0.02) |
| | | LSTM+attention | 0.87 (0.002) | 0.74 (0.04) |
| | Out-of-time patients | LightGBM | 0.52 (0.002) | 0.04 (0.003) |
| | | FNN | 0.54 (0.01) | 0.08 (0.03) |
| | | LSTM | 0.84 (0.01) | 0.72 (0.01) |
| | | LSTM+attention | 0.85 (0.01) | 0.75 (0.02) |
| PRIME’s^d performance | Sex | Male | 0.83 | 0.36 |
| | | Female | 0.84 | 0.29 |
| | | Intersex | 0.87 | 0.23 |
| | Race | Black | *0.69*^e | 0.16 |
| | | First Nations | 0.80 | 0.16 |
| | | White | 0.84 | 0.38 |
| | | Other racial identities | 0.81 | 0.27 |
| | Sexual orientation | Heterosexual | 0.82 | 0.34 |
| | | Other | 0.84 | 0.34 |
| | Program type | Regional (nonforensic) | 0.83 | 0.34 |
| | | Provincial (forensic) | 0.80 | 0.27 |
| | Age group (years) | 18-65 | 0.81 | 0.32 |
| | | ≥65 | 0.81 | 0.38 |
| | All | | 0.81 | 0.30 |

^a LightGBM: light gradient boosting machine.

^b FNN: feedforward neural network.

^c LSTM: long short-term memory.

^d PRIME: Predictive Risk Identification for Mental Health Events.

^e Italicization indicates statistical significance.
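The model selection rows report each metric as a mean (SD) over several seeded runs. The table does not state how predictions were thresholded for recall or whether the SD is a sample or population value, so the following is only a minimal sketch, assuming each run produces binary outcome labels and predicted risk scores, recall is taken at a hypothetical fixed threshold of 0.5, and the sample SD (`statistics.stdev`) is used. The function names `auc`, `recall`, and `summarize_runs` are illustrative, not the authors' code.

```python
from statistics import mean, stdev

# Hypothetical decision threshold for recall; the table does not report one.
THRESHOLD = 0.5


def auc(y_true, y_score):
    """Area under the ROC curve, computed as the probability that a random
    positive case scores higher than a random negative case (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def recall(y_true, y_score, threshold=THRESHOLD):
    """Recall (sensitivity): fraction of true events scored at or above threshold."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    if not positives:
        raise ValueError("recall is undefined without positive cases")
    return sum(1 for s in positives if s >= threshold) / len(positives)


def summarize_runs(runs, threshold=THRESHOLD):
    """Summarize one (y_true, y_score) pair per random seed as (mean, SD).

    Uses the sample standard deviation, so at least two runs are required.
    """
    aucs = [auc(y, s) for y, s in runs]
    recalls = [recall(y, s, threshold) for y, s in runs]
    return {
        "AUC": (mean(aucs), stdev(aucs)),
        "Recall": (mean(recalls), stdev(recalls)),
    }
```

For instance, two toy runs with AUCs of 0.75 and 1.0 summarize to a mean AUC of 0.875 with a sample SD of about 0.18. The pairwise AUC loop is quadratic in the number of cases, which keeps the definition visible but would be slow on a full cohort.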
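The PRIME rows report AUC and recall within demographic and program subgroups. The source does not describe the stratification procedure, so this is a minimal sketch under stated assumptions: each patient record is a hypothetical dict carrying a binary `label`, a predicted `score`, and subgroup attributes such as `sex`, and recall uses a hypothetical 0.5 threshold. AUC here uses the rank-sum (Mann-Whitney) formulation with average ranks for tied scores.

```python
from collections import defaultdict


def rank_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic, with average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied scores so they share one rank.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def subgroup_metrics(records, attribute, threshold=0.5):
    """Group hypothetical patient records by `attribute`; return {group: (AUC, recall)}.

    A subgroup lacking either outcome class gets None for AUC, and one with
    no positive cases gets None for recall, instead of raising an error.
    """
    groups = defaultdict(list)
    for r in records:
        groups[r[attribute]].append(r)
    out = {}
    for group, rows in groups.items():
        labels = [r["label"] for r in rows]
        scores = [r["score"] for r in rows]
        n_pos = sum(labels)
        group_auc = rank_auc(labels, scores) if 0 < n_pos < len(labels) else None
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        out[group] = (group_auc, tp / n_pos if n_pos else None)
    return out
```

Returning None for subgroups with a single outcome class lets a report cover every subgroup without one small group halting the computation.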