Table 2.
Mean AUC (95% confidence interval) of the best-performing models for predicting suicide outcomes, by study design, data source, sample size, and machine learning method.
| | Suicidal thoughts | Suicide attempt | Death by suicide | Overall | Best-performing algorithms |
|---|---|---|---|---|---|
| Study design | | | | | |
| Cross-sectional | 0.884 (0.859 - 0.909) | 0.866 (0.822 - 0.911) | 0.815 (0.014 - 1.615) | 0.862 (0.835 - 0.889) | Regularized regressions |
| Longitudinal | 0.846 (0.799 - 0.892) | 0.836 (0.813 - 0.858) | 0.827 (0.798 - 0.854) | 0.829 (0.813 - 0.846) | Support vector machine |
| Data source | | | | | |
| Administrative | 0.885 (0.801 - 0.968) | 0.829 (0.783 - 0.875) | 0.826 (0.780 - 0.854) | 0.838 (0.816 - 0.861) | Support vector machine |
| Survey | 0.849 (0.819 - 0.878) | 0.857 (0.830 - 0.884) | 0.815 (0.014 - 1.615) | 0.842 (0.822 - 0.862) | Regularized regressions |
| Administrative & Survey | – | 0.822 (0.695 - 0.950) | – | 0.822 (0.695 - 0.950) | Regularized regressions |
| Total sample size | | | | | |
| ≤1,000 | 0.882 (0.818 - 0.947) | 0.847 (0.799 - 0.894) | – | 0.840 (0.804 - 0.792) | Regularized regressions |
| 1,001-10,000 | 0.874 (0.845 - 0.904) | 0.826 (0.801 - 0.851) | 0.824 (0.669 - 0.979) | 0.841 (0.819 - 0.863) | Support vector machine |
| >10,000 | 0.771 (0.640 - 0.903) | 0.874 (0.828 - 0.921) | 0.825 (0.795 - 0.856) | 0.839 (0.815 - 0.862) | Regularized regressions |
| Target sample size | | | | | |
| ≤200 | 0.910 (0.873 - 0.947) | 0.838 (0.810 - 0.866) | 0.819 (0.645 - 0.993) | 0.843 (0.819 - 0.867) | Support vector machine |
| 201-1,000 | 0.845 (0.822 - 0.868) | 0.859 (0.811 - 0.906) | 0.831 (0.787 - 0.874) | 0.844 (0.821 - 0.866) | Regularized regressions |
| >1,000 | 0.771 (0.640 - 0.901) | 0.862 (0.793 - 0.930) | 0.822 (0.771 - 0.873) | 0.829 (0.800 - 0.859) | Gradient boosting |
| Machine learning method | | | | | |
| Bayesian algorithms | – | – | – | 0.764 (0.698 - 0.829) | – |
| Boosting algorithms | – | – | 0.864 (0.678 - 1.050) | 0.864 (0.827 - 0.900) | – |
| Cox regression | – | – | 0.789 (-0.491 - 2.069) | 0.762 (0.731 - 0.793) | – |
| Decision tree | – | 0.760 (0.252 - 1.268) | – | 0.729 (0.682 - 0.777) | – |
| K-nearest neighbors | – | – | – | – | – |
| Linear discriminant analysis | – | – | – | – | – |
| Logistic regression | 0.812 (0.569 - 1.054) | 0.823 (0.701 - 0.945) | 0.788 (0.630 - 0.945) | 0.789 (0.737 - 0.841) | – |
| Neural network | 0.823 (0.676 - 0.970) | 0.858 (0.750 - 0.965) | 0.838 (0.741 - 0.935) | 0.841 (0.803 - 0.879) | – |
| Random forest | 0.874 (0.846 - 0.901) | 0.879 (0.848 - 0.909) | 0.841 (0.801 - 0.881) | 0.870 (0.852 - 0.889) | – |
| Regularized regressions | 0.795 (-0.412 - 2.002) | 0.851 (0.807 - 0.894) | 0.805 (0.100 - 1.511) | 0.841 (0.801 - 0.879) | – |
| Super learner | 0.860 (0.720 - 1.005) | 0.802 (0.708 - 0.896) | – | 0.835 (0.796 - 0.875) | – |
| Support vector machine | 0.930 (0.040 - 1.819) | 0.712 (0.616 - 0.808) | – | 0.877 (0.589 - 1.164) | – |
AUC, Area under the receiver operating characteristic curve.
The symbol "–" means that no data were available to compute the summary statistic in that cell.
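Several intervals in Table 2 extend below 0 or above 1, although an AUC itself lies between 0 and 1. The table does not state how the intervals were computed. One possible explanation, offered here only as an assumption, is a Student's t interval around the mean of a very small number of study-level AUCs. The sketch below illustrates that calculation; the function name `mean_ci` and the input values are hypothetical and are not taken from the source.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t for 1 to 10 degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def mean_ci(aucs):
    """Return (mean, lower, upper) for a t-based 95% CI of the mean AUC.

    Assumption: the interval is mean +/- t * (sample SD / sqrt(n)).
    The source table does not confirm that this method was used.
    """
    n = len(aucs)
    if n < 2:
        raise ValueError("need at least two AUC values")
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError("lookup table only covers 2 to 11 values")
    mean = statistics.mean(aucs)
    std_err = statistics.stdev(aucs) / math.sqrt(n)
    half_width = T_CRIT_95[df] * std_err
    return mean, mean - half_width, mean + half_width


# Hypothetical AUCs from two studies (illustrative only, not from the source).
two_studies = [0.75, 0.88]
mean, lower, upper = mean_ci(two_studies)
print(f"{mean:.3f} ({lower:.3f} - {upper:.3f})")
```

With only two values there is a single degree of freedom, so the critical value is 12.706 and the interval becomes very wide, which is how a bound can fall outside the 0 to 1 range. Such cells are better read as "too few studies to estimate precisely" than as meaningful bounds.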