Abstract
Machine learning models trained on real-world data have shown promise in predicting suicide attempts in adolescents. However, their transportability, i.e., the performance of a model trained on one dataset and applied to different data, is largely unknown, hindering the clinical adoption of these models. Here we developed machine learning-based suicide prediction models on real-world data collected in different contexts (inpatient, outpatient, and all encounters) and for varying purposes (administrative claims and electronic health records), and compared their cross-data performance. The three datasets used were the All-Payer Claims Database in Connecticut, the Hospital Inpatient Discharge Database in Connecticut, and the Electronic Health Records data provided by the Kansas Health Information Network. We included 285,320 patients, among whom we identified 3389 (1.2%) suicide attempters; 66% of the suicide attempters were female. Different machine learning models were evaluated on the source datasets on which they were trained and then applied to the target datasets. More complex models, particularly deep long short-term memory neural network models, did not outperform simpler regularized logistic regression models in either local or transported performance. Transported models exhibited varying performance, showing drops or even improvements relative to their source performance. While transported models can achieve satisfactory performance, they are usually upper-bounded by the best performance of locally developed models, and they can identify additional new cases in target data. Our study uncovers complex transportability patterns and could facilitate the development of suicide prediction models with better performance and generalizability.
Subject terms: Psychiatric disorders, Scientific community
Introduction
Youth suicide is a major public health threat. Suicide is the second most common cause of death among adolescents and young adults [1, 2], with recent data from the CDC indicating that death by suicide among children and young adults ages 10-24 in the US increased by 57% between 2007 and 2018, from 6.8 to 10.7 per 100,000 [2]. Abundant data indicate that individuals have contact with both their primary care and mental health care providers before suicide [3]. In a longitudinal study of eight Mental Health Research Network healthcare systems in the US, Ahmedani et al. found that 83% of individuals who died of suicide had received healthcare during the year before death [4]. In a recently published study, 62% of pediatric patients treated for suicide attempts in an urban pediatric hospital had a non-suicide-related visit within 90 days before the attempt [5]. These data point to the enormous opportunity for suicide prevention through improved surveillance, detection, and intervention in the healthcare system.
As a result of NIMH’s prioritization of suicide risk prediction, there are now several published algorithms using data mining and machine learning (ML) approaches with real-world clinical data (RWD) to predict suicidal behavior and suicide mortality among patients in large healthcare systems [5–13]. Follow-up of patients completing suicide risk assessments has found that predictive models achieved higher sensitivity and specificity in identifying suicidal behavior than clinical assessments [14]. As part of the Food and Drug Administration’s Mini-Sentinel pilot program, a systematic review examined five validated algorithms for predicting suicide attempts and completed suicide and found that the sensitivity of the algorithms ranged up to 65% and the positive predictive value ranged up to 100% [15, 16]. Moreover, recent studies indicate that such models are both generalizable and durable: they can be used effectively outside of the specific clinical settings in which they were produced [17], and they maintain predictive accuracy over short to intermediate timeframes [18].
With the exception of the two aforementioned studies, however, barriers to the use of ML models developed for suicide risk prediction have not been adequately investigated. Doing so would require a more comprehensive investigation of the transportability of different ML models across different data types and a subsequent examination of cross-data performance and cross-data risk factors. For example, discussions of transportability in ML-based prediction models often assume that complex prediction models with many variables outperform simpler models [19]. Since complex models may require more resources to estimate, validate, and implement, it is crucial to investigate whether they offer a substantial improvement over simpler ones, given these resource constraints. In addition, before ML-based prediction models can be implemented in clinical practice, their transportability must be evaluated, i.e., the models should produce accurate predictions on new sets of patients from different settings, e.g., clinical settings, geographic locations, or time periods. However, there are many barriers to independent validation of these models, including patient heterogeneity, clinical process variability, and EHR configurations and non-interoperable databases across hospitals [20–23]. Furthermore, prediction model performance can vary with changing patient populations and shifts in clinical practice.
To fill this research gap, we compared the performance and transportability of different ML-based suicide prediction models [including regularized logistic regression (LR), gradient boosting machine (GBM), and the deep long short-term memory neural network (LSTM)] for children and adolescents across three RWD sets of different data types (claims data and EHR data) collected at different touchpoints (inpatient, outpatient, and all encounters). The local and transported performance of the different models was compared across all enumerated source-target data pairs. We observed complex transportability patterns: a) transported models exhibited not only performance drops but also improvements on target data compared to their performance on source data; b) the LSTM model did not necessarily outperform simpler models, such as LR and GBM, in either the local or the transported setting; c) while transported models could achieve satisfactory performance, it was generally upper-bounded by the best performance of locally developed models; and d) transported models could identify new cases missed by locally developed models. Our analyses can help evaluate the readiness of these models for application or transport in different clinical settings, and facilitate the development of suicide prediction models with better performance and generalizability.
Method
Datasets
The following three real-world patient databases were used in our study:
The All-Payer Claims Database (APCD) from Connecticut [24] included medical and pharmacy claims for Connecticut residents from January 1, 2012, to December 31, 2017. The APCD contains both inpatient and outpatient encounters from approximately 35% of the commercially insured Connecticut population.
The Hospital Inpatient Discharge Database (HIDD) from Connecticut [25] contained inpatient hospitalization encounters from all acute care hospitals in the state from October 1, 2005, to September 30, 2017.
The Electronic Health Records (EHR) data provided by the Kansas Health Information Network (KHIN) [26] included EHRs collected from a patient population in Kansas, covering all encounter types (e.g., outpatient, inpatient, and emergency room) from 2013 to 2018.
This study was approved by the University of Connecticut Health Center Institutional Review Board, the Weill Cornell Medical College Institutional Review Board, the CT Department of Public Health Human Investigations Committee, and the CT APCD Data Release Committee. All EHRs used in this study were appropriately deidentified; thus, no informed consent from patients was obtained. The study was performed in accordance with the ethical standards of human experimentation established in the Declaration of Helsinki and its subsequent amendments [27].
Study cohorts
Study cohorts consisted of children, adolescents, and young adults aged 10 to 24 from the three datasets (APCD, HIDD, and KHIN) who had at least one non-suicidal diagnosis in the recruiting window. Specifically, the recruiting windows were January 1, 2014, to December 31, 2015, for APCD; January 1, 2012, to September 30, 2015, for HIDD; and January 1, 2014, to December 31, 2015, for KHIN. Each recruited patient was followed for two years. We excluded patients whose first EHR-documented visit included any suicide attempt. Suicide attempts were identified using ICD-9 codes, with detailed rules listed in Supplementary Table S1. Figure 1 summarizes the inclusion-exclusion cascades for the three databases. Detailed descriptions of the cohorts can be found in our previous analyses [13].
Fig. 1. Cohort selection.
Cohort selection from three datasets: a APCD, the All-Payer Claims Database from Connecticut; b HIDD, the Hospital Inpatient Discharge Database from Connecticut; and c KHIN, the Electronic Health Records data from the Kansas Health Information Network.
Follow-up and outcome of interest
The outcome of interest was the first documented suicide attempt (SA) in the follow-up period. For each qualified patient, the follow-up period started from the first non-suicide-related hospital encounter within the recruiting window and continued until a suicide attempt, two years of follow-up, or the end of the study period (the end of 2017), whichever came first. The index time was defined as the time of the last non-suicide visit, namely the visit just before the outcome of interest for cases, or the last visit before the end of follow-up for non-cases.
Baseline covariates
Predictor covariates included self-reported demographic information (age and gender) and diagnosis information (ICD-9/10 codes) from each medical encounter. Age was categorized into three groups (10–14, 15–19, 20–24) and self-reported gender into two groups (female and male). We reported age at the last recorded non-suicidal visit; for suicide attempters, we also reported age at the first recorded suicide attempt visit. Diagnoses in both APCD and HIDD were encoded with ICD-9 codes, whereas KHIN used both ICD-9 and ICD-10 codes. For consistency, we used the R package “touch” [28] to convert ICD-10 codes to ICD-9 codes in the KHIN data only. Diagnostic codes were aggregated by their first three digits, and the 300 most prevalent codes from each site were selected and combined into a shared feature space. For each patient, we used all available historical information up to the last non-suicide visit and required at least one non-suicide visit. We considered both aggregated features and sequences of features for the different ML models, as described below.
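As a minimal sketch of this feature construction, assuming a long-format diagnosis table with hypothetical `patient_id` and `icd9` columns (the study's actual schema is not specified), the three-digit aggregation and top-300 selection could look like:

```python
import pandas as pd

def build_feature_space(diagnoses: pd.DataFrame, top_k: int = 300) -> list:
    """Aggregate ICD-9 codes to their first three digits and keep the
    top_k most prevalent codes (by number of distinct patients)."""
    dx = diagnoses.copy()
    dx["code3"] = dx["icd9"].str[:3]
    prevalence = dx.groupby("code3")["patient_id"].nunique()
    return prevalence.sort_values(ascending=False).head(top_k).index.tolist()

def shared_feature_space(per_site_codes) -> list:
    """The shared feature space is the union of each site's selected codes."""
    return sorted(set().union(*map(set, per_site_codes)))

# Toy example with hypothetical data:
dx = pd.DataFrame({"patient_id": [1, 1, 2, 3],
                   "icd9": ["29620", "3000", "29600", "311"]})
site_codes = build_feature_space(dx, top_k=2)
```

Patients would then be represented over this shared vocabulary either as aggregated count/indicator vectors (for LR and GBM) or as per-visit code sequences (for the LSTM).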
Machine learning predictive models
Three machine learning (ML) models were used for predictive modeling. The first was regularized logistic regression (LR); we adopted both L1-norm and L2-norm penalties and grid-searched the inverse of the regularization strength over a logarithmic grid with 0.2 as the sampling step size for the exponent. The second was the gradient boosting machine (GBM) with random forest as the base learner; we grid-searched the maximum tree depth (3, 4, 5), the maximum number of leaves per tree (5, 15, 25, 35, 45), and the minimum number of samples per leaf (50, 100, 150, 200, 250). The third was the long short-term memory (LSTM) neural network with temporal attention mechanisms [29–32]; we grid-searched configurations of a two-layer bidirectional LSTM over hidden dimensions (32, 64), learning rates (1e-3, 1e-4), and weight decays (1e-3, 1e-4, 1e-5). The LR and GBM used features aggregated before the index visit, while the deep LSTM model used a sequence of features.
All datasets were first divided randomly into training and testing sets at a ratio of 80:20. We used the training set to select and learn the best model parameters via a five-fold cross-validation pipeline, and evaluated final performance on the held-out testing set in the following evaluations.
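A sketch of this split-and-tune pipeline for the LR model using scikit-learn, with synthetic data standing in for the real feature matrix; the exact endpoints of the exponent grid for the regularization strength are not given in the text, so the -2 to 2 range below is purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical data: X stands in for the aggregated code matrix, y for SA labels.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 50)).astype(float)
y = rng.integers(0, 2, size=500)

# 80:20 split, then 5-fold CV on the training portion to pick hyperparameters.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# L1/L2 penalties; inverse regularization strength C on a log-spaced grid
# with exponent step 0.2 (endpoints assumed, not from the source).
param_grid = {
    "penalty": ["l1", "l2"],
    "C": (10.0 ** np.arange(-2, 2.01, 0.2)).tolist(),
}
search = GridSearchCV(
    LogisticRegression(solver="liblinear", max_iter=1000),
    param_grid, cv=5, scoring="roc_auc",
)
search.fit(X_tr, y_tr)
test_auc = search.score(X_te, y_te)  # final AUROC on the hold-out test set
```

With random labels as here the test AUROC hovers near 0.5; on real data the same pipeline yields the cross-validated model whose hold-out performance is reported.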
Model evaluation
Both local and transported performance were evaluated for all investigated ML predictive models. For local performance, models were trained and tested on a single data source. For the transported setting, models were trained on one source dataset and then tested on another target dataset. We compared the performance of the transported model on the target data with (a) its original performance on the source data, and (b) the performance of the locally trained model on the target data. The evaluation metrics were the area under the receiver operating characteristic curve (AUROC), and the sensitivity and positive predictive value (PPV) at specificity levels of 90% and 95%. Bootstrapped 95% confidence intervals or standardized variances were calculated by repeating the above splitting, training, and testing 20 times. For predictor importance, we likewise averaged the logistic regression coefficients over the 20 repetitions. We used the standardized mean difference (SMD) [33] to measure the difference in mean prevalence of a particular feature between two datasets, and assumed a significant difference in the feature value across two datasets if its corresponding SMD fell outside the range of -0.2 to 0.2 [34].
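The thresholded metrics can be computed directly from predicted risk scores: fix the threshold that achieves the target specificity on the negatives, then read off sensitivity and PPV. The helper below is an illustrative implementation, not the study's code:

```python
import numpy as np

def sens_ppv_at_specificity(y_true, scores, specificity=0.90):
    """Pick the score threshold achieving the target specificity on the
    negatives, then report sensitivity and PPV at that threshold."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    neg_scores = scores[y_true == 0]
    # Threshold above which at most (1 - specificity) of negatives fall.
    thresh = np.quantile(neg_scores, specificity)
    pred_pos = scores > thresh
    tp = np.sum(pred_pos & (y_true == 1))
    fp = np.sum(pred_pos & (y_true == 0))
    fn = np.sum(~pred_pos & (y_true == 1))
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, ppv
```

For the transported setting, the same function would simply be applied to scores produced by a source-trained model on the target dataset's labels.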
Results
Population characteristics
As shown in Table 1, basic population characteristics varied across the three study cohorts, APCD (claims), HIDD (inpatient EHR), and KHIN (all-settings EHR). APCD and HIDD had similar suicide attempt rates (1.4% and 1.5%, respectively), while the KHIN cohort had a lower suicide attempt rate of 0.9%. We observed more female suicide attempters in all three datasets; however, the proportion of female attempters in KHIN (70.2%) was higher than in APCD (61.4%) and HIDD (62.4%). We also observed different index age distributions across the three cohorts: in the younger 10-14 index age group, there were proportionally more suicide attempters in KHIN (18.3%) than in APCD (12.1%) and HIDD (14.7%), whereas in the older 20-24 age group, APCD (38.7%) had more than HIDD (32.7%) and KHIN (30.6%). Regarding suicide attempt methods, we observed more poisoning in HIDD (70.0%) than in APCD (63.5%) and KHIN (64.2%).
Table 1.
Population characteristics of three datasets.
| Demographics | APCD (Claims) Case | APCD (Claims) Non-case | HIDD (EHR Inpatient) Case | HIDD (EHR Inpatient) Non-case | KHIN (EHR all settings) Case | KHIN (EHR all settings) Non-case |
|---|---|---|---|---|---|---|
| N (%) | 2163 (1.4%) | 153,318 (98.6%) | 434 (1.5%) | 28,407 (98.5%) | 906 (0.9%) | 100,047 (99.1%) |
| Sex - no. (%) | ||||||
| Female | 1329 (61.4) | 76,483 (49.9) | 271 (62.4) | 16,907 (59.5) | 636 (70.2) | 52,917 (52.9) |
| Male | 834 (38.6) | 76,835 (50.1) | 163 (37.6) | 11,500 (40.5) | 270 (29.8) | 47,130 (47.1) |
| Age - no. (%)* | ||||||
| 10-14 | 262 (12.1) | 19,943 (13.0) | 64 (14.7) | 4481 (15.8) | 166 (18.3) | 28,099 (28.1) |
| 15-19 | 1065 (49.2) | 52,166 (34.0) | 228 (52.5) | 9375 (33.0) | 463 (51.1) | 34,519 (34.5) |
| 20-24 | 836 (38.7) | 81,209 (53.0) | 142 (32.7) | 14,551 (51.2) | 277 (30.6) | 37,429 (37.4) |
| Suicide attempt methods - no. (%) | ||||||
| Poisoning | 1373 (63.5) | — | 304 (70.0) | — | 582 (64.2) | — |
| Cutting | 578 (26.7) | — | 89 (20.5) | — | 189 (20.9) | — |
| Other | 150 (6.9) | — | 25 (5.8) | — | 49 (5.4) | — |
| Two or more methods | 62 (2.9) | — | 16 (3.7) | — | 86 (9.5) | — |
*Age at index, namely the age at the last non-suicidal visit. See the age at the time of the event of interest in Supplementary Table S4. EHR, electronic health records; APCD, the All-Payer Claims Database from Connecticut; HIDD, the Hospital Inpatient Discharge Database from Connecticut; KHIN, the Electronic Health Records (EHR) data from the Kansas Health Information Network.
Local performance
Overall, the performance of the regularized logistic regression (LR) for suicide prediction was good across all three datasets when trained and tested on the same data source, with the test data held out and not used for training. Specifically, as shown in Table 2, the average AUROCs by LR were 0.879 (95% CI [0.876, 0.882]) on APCD, 0.831 (95% CI [0.821, 0.840]) on HIDD, and 0.787 (95% CI [0.781, 0.794]) on KHIN.
Table 2.
Local and transported performance of the regularized logistic regression across three datasets.
| Source Dataset | Target Dataset | AUROC (95% CI) | Sensitivity at 90% specificity (95% CI) | PPV at 90% specificity (95% CI) | Sensitivity at 95% specificity (95% CI) | PPV at 95% specificity (95% CI) |
|---|---|---|---|---|---|---|
| APCD | APCD (local) | 0.879 (0.876, 0.882) | 0.662 (0.653, 0.669) | 0.086 (0.084, 0.088) | 0.538 (0.527, 0.548) | 0.132 (0.129, 0.135) |
| APCD | HIDD | 0.797 (0.785, 0.809) | 0.464 (0.444, 0.482) | 0.066 (0.061, 0.071) | 0.282 (0.260, 0.304) | 0.079 (0.071, 0.087) |
| APCD | KHIN | 0.749 (0.739, 0.757) | 0.419 (0.403, 0.433) | 0.036 (0.035, 0.037) | 0.259 (0.246, 0.271) | 0.044 (0.042, 0.046) |
| HIDD | HIDD (local) | 0.831 (0.821, 0.840) | 0.456 (0.434, 0.479) | 0.065 (0.061, 0.069) | 0.279 (0.259, 0.299) | 0.078 (0.073, 0.084) |
| HIDD | APCD | 0.802 (0.795, 0.808) | 0.501 (0.489, 0.514) | 0.066 (0.064, 0.067) | 0.331 (0.319, 0.343) | 0.085 (0.082, 0.088) |
| HIDD | KHIN | 0.735 (0.726, 0.744) | 0.410 (0.392, 0.427) | 0.035 (0.034, 0.037) | 0.259 (0.245, 0.274) | 0.044 (0.042, 0.047) |
| KHIN | KHIN (local) | 0.787 (0.781, 0.794) | 0.478 (0.460, 0.494) | 0.041 (0.040, 0.043) | 0.348 (0.335, 0.362) | 0.059 (0.056, 0.061) |
| KHIN | APCD | 0.845 (0.841, 0.849) | 0.581 (0.570, 0.591) | 0.075 (0.073, 0.076) | 0.423 (0.412, 0.433) | 0.105 (0.103, 0.107) |
| KHIN | HIDD | 0.829 (0.821, 0.838) | 0.452 (0.431, 0.472) | 0.063 (0.060, 0.067) | 0.276 (0.259, 0.294) | 0.076 (0.071, 0.081) |
AUROC, area under the receiver operating characteristic curve. PPV, Positive predictive value. CI, confidence intervals. APCD, the All-Payer Claims Database from Connecticut; HIDD, the Hospital Inpatient Discharge Database from Connecticut; KHIN, the Electronic Health Records (EHR) data from the Kansas Health Information Network.
However, more complex ML models, particularly the LSTM, did not necessarily outperform the simpler regularized logistic regression. As shown in Fig. 2a, LR achieved an average AUROC of 0.879 (95% CI [0.876, 0.882]) on the APCD data, similar to the AUROCs achieved by GBM at 0.887 ([0.884, 0.890]) and LSTM at 0.901 ([0.896, 0.905]). The average AUROCs on HIDD were 0.831 ([0.821, 0.840]), 0.827 ([0.818, 0.837]), and 0.811 ([0.799, 0.824]) for LR, GBM, and LSTM, respectively. On the KHIN data, the LSTM showed inferior performance to the other models, with an average AUROC of 0.759 ([0.745, 0.771]), compared to 0.787 ([0.781, 0.794]) for LR and 0.793 ([0.788, 0.799]) for GBM.
Fig. 2. Local and transported AUROC performance of three ML-based predictive models across different source-target data pairs.
The three panels show the same set of experiments ordered differently to highlight different comparisons: a local and transported performance of different ML models, b transported performance compared to performance on the source data, and c transported performance compared to locally developed models on the target data. The three datasets are APCD, the All-Payer Claims Database from Connecticut; HIDD, the Hospital Inpatient Discharge Database from Connecticut; and KHIN, the Electronic Health Records data from the Kansas Health Information Network. The three ML-based suicide prediction models are regularized logistic regression (LR), gradient boosting machine (GBM), and the deep long short-term memory neural network (LSTM). The arrows indicate transporting models developed on the source data to the target data. AUROC, area under the receiver operating characteristic curve.
Transported performance
When models developed on source data were transported to target data, their performance did not necessarily drop, showing data-dependent transportability. Specifically, as shown in Fig. 2b, for the LR models developed on the APCD or HIDD data, we observed performance drops when applying them to the other datasets. As shown in Table 2, the locally learned LR model on APCD achieved an average AUROC of 0.879 ([0.876, 0.882]), with transported performance of 0.797 ([0.785, 0.809]) on HIDD and 0.749 ([0.739, 0.757]) on KHIN. Similar drop patterns were observed for the LR model trained on HIDD and transported to the other datasets. However, increased transported performance was observed for the LR model developed on KHIN, with an average AUROC of 0.787 ([0.781, 0.794]) on KHIN and transported performance of 0.845 ([0.841, 0.849]) on APCD and 0.829 ([0.821, 0.838]) on HIDD. As shown in Fig. 2b, these data-dependent transportability patterns were also observed for the other machine learning models, including GBM (details in Supplementary Table 2) and LSTM (details in Supplementary Table 3).
All transported models showed performance inferior or comparable to the locally developed models, implying that while transported performance was sometimes still good, it was generally upper-bounded by the best performance of the locally developed models. For example, as shown in Table 2 and Fig. 2c, the LR model trained on APCD achieved an average AUROC of 0.879 ([0.876, 0.882]) on the APCD testing data, whereas the transported models trained on HIDD and KHIN achieved average AUROCs of 0.802 ([0.795, 0.808]) and 0.845 ([0.841, 0.849]), respectively, when tested on the same APCD data. As shown in Fig. 2c, similarly bounded transported performance was observed for the other machine learning models, including GBM (details in Supplementary Table 2) and LSTM (details in Supplementary Table 3), across all three datasets. Though upper-bounded, the good transported performance in several cases suggests the potential usefulness of transported models, particularly when locally trained models are not available.
The relatively more complex LSTM model usually exhibited inferior transported performance compared to the simpler LR model, as shown in Fig. 2a. For example, the LSTM showed very poor transported performance from HIDD to KHIN, with an AUROC of 0.595 ([0.572, 0.616]), and from HIDD to APCD, with an AUROC of 0.654 ([0.641, 0.668]) (see Supplementary Table 3). Only when transported from APCD to HIDD did the LSTM show performance comparable to the other models.
Predictor importance
We identified common features consistently associated with increased suicide attempt risk across all three datasets, as shown in Table 3 and Fig. 3, including episodic mood disorders (with average rankings of 1.1, 1.8, and 2.1 in APCD, HIDD, and KHIN, respectively), other psychosocial circumstances (8.4, 5.0, and 2.4), drug abuse (7.7, 16.5, and 3.7), and female sex (3.6, 12.7, and 5.5). In addition, depressive disorders, anxiety, drug dependence, poisoning by analgesics, antipyretics, and antirheumatics, and being in the 15-19 age group were also associated with increased risk of suicide attempt across all three datasets. Overall, the most predictive features identified by our LR model across the three datasets align with existing knowledge of suicide risk factors, e.g., episodic mood disorder, anxiety, female sex, and depressive disorder. See the features identified by the gradient boosting machine in Supplementary Fig. S2.
Table 3.
Top 30 Diagnostic codes (ICD-9 codes) associated with increased risk of suicide attempt in three datasets.
| APCD Features | APCD Rank* | HIDD Features | HIDD Rank* | KHIN Features | KHIN Rank* |
|---|---|---|---|---|---|
| 296, Episodic mood disorders | 1.1 | 296, Episodic mood disorders | 1.8 | 296, Episodic mood disorders | 2.1 |
| 300, Anxiety, dissociative and somatoform disorders | 3.5 | 311, Depressive disorder, not elsewhere classified | 4.4 | V62, Other psychosocial circumstances | 2.4 |
| Sex Female | 3.6 | V62, Other psychosocial circumstances | 5.0 | 305, Nondependent abuse of drugs | 3.7 |
| 311, Depressive disorder, not elsewhere classified | 4.5 | 301, Personality disorders | 10.0 | Sex Female | 5.5 |
| 309, Adjustment reaction | 6.2 | Sex Female | 12.7 | 300, Anxiety, dissociative and somatoform disorders | 13.8 |
| 305, Nondependent abuse of drugs | 7.7 | 304, Drug dependence | 13.1 | 787, Symptoms involving digestive system | 14.7 |
| V62, Other psychosocial circumstances | 8.4 | 300, Anxiety, dissociative and somatoform disorders | 14.0 | 789, Other symptoms involving abdomen and pelvis | 15.8 |
| 298, Other nonorganic psychoses | 11.9 | 305, Nondependent abuse of drugs | 16.5 | 965, Poisoning by analgesics, antipyretics, and antirheumatics | 18.6 |
| 304, Drug dependence | 13.7 | 314, Hyperkinetic syndrome of childhood | 19.9 | 625, Pain and other symptoms associated with female genital organs | 23.3 |
| 299, Pervasive developmental disorders | 14.6 | 309, Adjustment reaction | 23.7 | 277, Other and unspecified disorders of metabolism | 27.5 |
| 995, Certain adverse effects not elsewhere classified | 15.1 | 965, Poisoning by analgesics, antipyretics, and antirheumatics | 30.0 | 307, Special symptoms or syndromes, not elsewhere classified | 31.8 |
| 293, Transient mental disorders due to conditions classified elsewhere | 15.4 | 724, Other and unspecified disorders of back | 33.5 | 301, Personality disorders | 32.8 |
| 319, Unspecified intellectual disabilities | 18.7 | 303, Alcohol dependence syndrome | 34.4 | 311, Depressive disorder, not elsewhere classified | 34.8 |
| 312, Disturbance of conduct, not elsewhere classified | 20.2 | V60, Housing, household, and economic circumstances | 35.0 | 729, Other disorders of soft tissues | 43.7 |
| 295, Schizophrenic disorders | 24.7 | E939, Psychotropic agents | 38.9 | 295, Schizophrenic disorders | 46.4 |
| 310, Specific nonpsychotic mental disorders due to brain damage | 28.9 | 307, Special symptoms or syndromes, not elsewhere classified | 40.6 | 536, Disorders of function of stomach | 47.6 |
| 599, Other disorders of urethra and urinary tract | 30.4 | 298, Other nonorganic psychoses | 44.7 | 969, Poisoning by psychotropic agents | 48.3 |
| 794, Nonspecific abnormal results of function studies | 33.4 | 728, Disorders of muscle, ligament, and fascia | 46.9 | E006, Activities involving other sports and athletics played individually | 55.1 |
| 250, Diabetes mellitus | 34.9 | 368, Visual disturbances | 50.5 | 780, General symptoms | 57.4 |
| V71, Observation and evaluation for suspected conditions not found | 35.5 | 308, Acute reaction to stress | 51.6 | 308, Acute reaction to stress | 59.5 |
| 620, Noninflammatory disorders of ovary, fallopian tube, and broad ligament | 36.3 | 788, Symptoms involving urinary system | 58.9 | 796, Other nonspecific abnormal findings | 61.5 |
| E000, External cause status | 37.3 | Age15-19 | 60.2 | 923, Contusion of upper limb | 62.3 |
| 920, Contusion of face, scalp, and neck except eye(s) | 38.6 | V61, Other family circumstances | 62.5 | 401, Essential hypertension | 67.5 |
| 314, Hyperkinetic syndrome of childhood | 38.6 | 295, Schizophrenic disorders | 63.6 | 338, Pain, not elsewhere classified | 67.9 |
| 493, Asthma | 39.4 | 473, Chronic sinusitis | 65.5 | 599, Other disorders of urethra and urinary tract | 71.8 |
| 965, Poisoning by analgesics, antipyretics, and antirheumatics | 39.6 | 881, Open wound of elbow, forearm, and wrist | 66.1 | 784, Symptoms involving head and neck | 74.3 |
| 307, Special symptoms or syndromes, not elsewhere classified | 42.9 | 277, Other and unspecified disorders of metabolism | 66.4 | V15, Other personal history presenting hazards to health | 74.9 |
| E960, Fight, brawl, rape | 43.1 | 253, Disorders of the pituitary gland and its hypothalamic control | 66.6 | V69, Problems related to lifestyle | 76.1 |
| 112, Candidiasis | 49.7 | V12, Personal history of certain other diseases | 66.9 | V60, Housing, household, and economic circumstances | 77.1 |
| 682, Other cellulitis and abscess | 51.1 | E850, Accidental poisoning by analgesics, antipyretics, and antirheumatics | 67.6 | V64, Persons encountering health services for specific procedures, not carried out | 78.1 |
* The average ranks were determined by the absolute value of the coefficient of the regularized logistic regression models over 20 repetitions. APCD, the All-Payer Claims Database from Connecticut; HIDD, the Hospital Inpatient Discharge Database from Connecticut; KHIN, the Electronic Health Records (EHR) data from the Kansas Health Information Network.
Fig. 3. Estimated predictor importance by LR and source data difference across three datasets.
The feature importance of the regularized logistic regression models locally estimated from APCD, HIDD, and KHIN is presented as bar plots with 95% confidence intervals as error bars. The standardized mean differences (SMD) of feature prevalence between datasets are shown as dots or triangles. A significant difference in feature prevalence was assumed if the SMD fell outside the range of -0.2 to 0.2. The top 30 features are shown; the dashed lines are guides for the eye.
On the other hand, we also observed heterogeneous predictor importance across datasets. As shown in Fig. 3, ‘Health supervision of infant or child’ and ‘General medical examination’, for example, were associated with decreased risk of suicide attempt in the APCD and KHIN models but not in the HIDD model. Indeed, these codes were far more prevalent in APCD (general claims data) and KHIN (all-setting EHR) than in the inpatient EHR data of HIDD (see the standardized mean differences of feature prevalence in Fig. 3). In contrast, “outcome of delivery” and “acute appendicitis” were more strongly associated with decreased risk of suicide attempt in HIDD than in APCD or KHIN.
Overall, the same ML-based suicide prediction model may identify different sets of important predictors when trained on different datasets; together with the underlying data differences, this helps explain the varying transported performance. However, the features consistently identified across datasets also suggest that transporting models is feasible.
Sensitivity analysis
To assess the robustness of our results, we conducted several sensitivity analyses. First, the observed patterns were not due to idiosyncrasies of the metrics: the performance patterns remained consistent when evaluated using other metrics, including sensitivity and positive predictive value (PPV) at either 90% or 95% specificity (see Table 2, Supplementary Tables S2 and S3). Second, we investigated how modeling performance and transportability change when removing the ICD-10-to-ICD-9 crosswalk in the KHIN dataset. Specifically, we used only the ICD-9 portion of the KHIN data, yielding 62,636 eligible patients in the new KHIN cohort with 156 identified suicide attempters. We replicated our primary analyses on the newly built cohort and feature space; as shown in Supplementary Fig. S3 and Supplementary Tables S5, S6, and S7, similar local and transported results were observed. Third, to make a more “apples-to-apples” comparison, we replicated our primary analyses in a shared feature space built only from inpatient encounters across all three datasets. We observed similar local and transported performance patterns, as shown in Supplementary Fig. S4 and Supplementary Tables S8, S9, and S10.
Discussion
In this study, we investigated three ML-based suicide prediction models (regularized logistic regression, gradient boosting machine, and LSTM) on three real-world datasets collected in different contexts (inpatient, outpatient, and all encounters) and for varying purposes (administrative claims and electronic health records), and compared their local and transported performance across datasets. Regarding local performance, where models were trained and tested on the same data source, we observed similarly good performance across the three models; moreover, the relatively more complex models, e.g., LSTM or GBM, did not necessarily outperform the relatively simpler regularized logistic regression model. We observed that as an ML model becomes more complex, it can overfit the training data more (see Supplementary Tables S11–S13); however, we did not observe superior performance for the more complex models on the unseen test data, aligning with one recent work [19].
The transported performance of ML-based suicide prediction models exhibited more complex patterns. First, when compared to performance in the source data where models were developed, we observed both performance drops and increases across transportation scenarios using different data pairs, suggesting data-dependent transportability. Second, when compared to the locally developed model on the target data, the transported performance was generally upper-bounded by the best performance of the locally developed model. Third, relatively more complex models, particularly the deep learning LSTM model, which excels at capturing sequential data, showed inferior transported performance compared with simpler LR models, suggesting model-dependent transportability. Although models developed on source data may demonstrate lower AUROC when transported to target data and are expected to be inferior to models developed on the target data, in several cases the transported performance on target data was still good, suggesting the potential utility of transporting suicide prediction models, especially when target models are unavailable. The utility of a transported model is also evident when the target setting has an insufficient sample size (e.g., rare events like suicide or a small population), which makes it difficult to develop a robust model within the target setting. In such cases, models developed on a much larger yet comparable sample from another setting may perform better. In addition, the simpler LR model is a viable choice for suicide prediction using EHR/claims data, whether in local or transported settings.
Differences across the three datasets might account for the complex patterns of transported performance. Specifically, the APCD dataset contained administrative claims covering both inpatient and outpatient encounters, the HIDD dataset contained data from inpatient hospitalizations only, and the KHIN dataset included EHRs from all encounters (e.g., inpatient, outpatient, and emergency room settings), suggesting potentially remarkable differences in the clinical information captured. Taking the difference in the prevalence of ‘Acute appendicitis’ between APCD and HIDD as an example, the standardized mean difference (SMD) of its prevalence between the two datasets is -0.7, indicating that, compared to HIDD, the APCD captured this potential inpatient event far less often. On the other hand, the SMD values of ‘Health supervision of infant or child’, ‘General medical examination’, ‘Special investigation and examinations’, and ‘Special screening for malignant neoplasms’ between APCD and HIDD were 1.7, 1.4, and 1.8, respectively, implying different capture of potential outpatient events. Moreover, the SMD of the ‘Poisoning by psychotropic agents’ diagnostic event between KHIN and the other datasets, as shown in Fig. 3, indicates that KHIN potentially captured more emergency encounters than the other two datasets. Performance drops are usually anticipated, considering that such differing capture of clinical events under different clinical contexts may limit the transportability of the investigated ML-based risk prediction models [19]. However, the increased performance from KHIN to the other two datasets, compared to modeling performance on KHIN, suggests potentially generalizable use of suicide prediction models developed from all-encounter EHRs in other settings (administrative claims and inpatient EHRs).
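The SMD for the prevalence of a binary variable follows Austin [33]: the difference in proportions divided by the pooled standard deviation. A minimal sketch (function name is ours, for illustration only):

```python
import math

def smd_binary(p1, p2):
    """Standardized mean difference between two prevalences of a binary
    variable (Austin, 2009): difference in proportions divided by the
    pooled standard deviation of the two Bernoulli variables."""
    pooled_var = (p1 * (1 - p1) + p2 * (1 - p2)) / 2.0
    return (p1 - p2) / math.sqrt(pooled_var)
```

The sign indicates direction: a negative SMD (as for ‘Acute appendicitis’ with APCD as the first group) means the first dataset captures the event less often.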
These increases in transported performance differ from the findings of one recent work in the schizophrenia domain [19], suggesting disease-specific transportability of ML predictive models.
In addition, the transported models identified different sets of patients, including new cases that were missed by the locally trained models. We illustrate a Venn diagram of suicide attempters correctly identified by different models in Supplementary Fig. S1. Taking the target APCD data as an example, the APCD local model correctly identified 235 true positive patients (among 440), whereas the transported HIDD model correctly identified 141. Of the cases identified by the APCD model, 115 were not identified by the HIDD model, while the HIDD model identified an additional 21 new cases that the APCD model did not. We also examined several true positive patient records to explore the differences between models. For instance, the predictor ‘Other disorders of the urethra and urinary tract’, a potential risk factor reported by other studies [35, 36], was selected only by the APCD model due to differences in the data (mean feature ranking: 23.3 in APCD, 371.7 in HIDD, and 47.6 in KHIN). Thus, a suicide attempter from the APCD dataset whose profile contained a diagnosis code for disorders of the urethra and urinary tract was identified only by the model developed on APCD. Indeed, the transported HIDD model identified fewer cases on the APCD data, and thus yielded worse AUROC performance; nevertheless, it was able to identify novel cases.
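The case-overlap arithmetic above can be verified with a small set computation; the patient IDs below are synthetic placeholders chosen only so that the set sizes reproduce the reported counts:

```python
# Synthetic IDs: 235 true positives for the APCD local model, 141 for the
# transported HIDD model, overlapping on 120 shared cases.
apcd_tp = set(range(235))
hidd_tp = set(range(115, 235)) | set(range(235, 256))

only_apcd = apcd_tp - hidd_tp   # cases lost when transporting the HIDD model
only_hidd = hidd_tp - apcd_tp   # new cases found only by the transported model
print(len(apcd_tp), len(hidd_tp), len(only_apcd), len(only_hidd))
# → 235 141 115 21
```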
By demonstrating good local performance of simpler LR models, complex transported performance patterns, and potentially different transportability of three different ML-based suicide prediction models across three different patient populations with varying clinical settings, this study contributes to a small but developing literature on the challenges of deploying suicide risk models in clinical practice. Recent studies suggest that suicide risk models have both generalizability and durability: they can be used effectively in clinical settings outside of but similar to those in which they were produced [17], and maintain predictive accuracy over short to intermediate time frames [18]. Our study uncovers more complex patterns in transporting suicide risk models derived from particular patient populations to populations derived from other clinical settings, by comprehensively comparing transported performance with both source and target modeling performance.
Based on our results, we recommend that future research focus on fusing knowledge learned from different populations and settings [13], potentially leading to better performance and generalizability. Through our exploration, we found that different models can identify suicide attempters with different characteristics. If prior knowledge from different models, such as their feature importance weights, can be obtained and multiple models integrated, the integrated model might correctly identify more suicide attempters and thus improve predictive performance. Alternatively, prediction performance may be improved by providing a model pool, namely a set of models pre-trained on different data sources, together with a designed metric to select the best-fit model from the pool for the prediction task at hand; such a metric can also provide statistical evidence, allowing an optimal prediction result to be achieved.
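A model-pool selection scheme of this kind could be sketched as follows, assuming a small labeled validation slice of the target data is available and using AUROC as an illustrative selection metric; the function and variable names are ours, not from the study's code:

```python
from sklearn.metrics import roc_auc_score

def select_from_pool(model_pool, X_val, y_val):
    """Score each pre-trained model on a labeled validation slice of the
    target data and return the best-fit model by validation AUROC.

    model_pool: dict mapping a source-dataset name to a fitted classifier
    that exposes predict_proba (e.g., scikit-learn estimators).
    """
    scored = []
    for name, model in model_pool.items():
        auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
        scored.append((auc, name, model))
    scored.sort(key=lambda t: t[0], reverse=True)
    best_auc, best_name, best_model = scored[0]
    return best_name, best_model, best_auc
```

The validation AUROCs themselves serve as the "statistical evidence" for the choice; bootstrap confidence intervals on these AUROCs would be a natural refinement.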
This study also has limitations. First, the APCD database relies on claims for commercially insured residents, and recruited patients can lose their follow-up records by losing their commercial insurance or moving out of state. Similar issues are relevant for the KHIN and HIDD as well. To mitigate this limitation, we selected patients with continuous insurance coverage and valid enrollment during the recruiting window in our experiment design. Second, any eligibility criteria that select patients based on information after time zero, e.g., complete follow-up information, might introduce selection bias; we therefore required such eligibility criteria only in the recruiting windows before time zero, not after it. In addition, we followed each patient for up to their most recent 2 years (rather than 5 years) to minimize loss-to-follow-up bias as much as possible in our analysis. Third, studying the factors that account for transported performance and fusing models learned from different sources are promising future directions. Fourth, we did not investigate model calibration, an important consideration for transportability and a feature that deserves dedicated study in the future.
Conclusion
This study investigated different ML-based suicide prediction models on three real-world datasets collected in different contexts (inpatient, outpatient, and all encounters) with varying purposes (administrative claims and electronic health records), and compared their local and transported performance. The relatively more complex models (e.g., LSTM and GBM) did not necessarily outperform relatively simpler models (e.g., LR) in either local or transported performance. Transported performance is data-dependent, model-dependent, and upper-bounded by the performance of locally developed models. Transported models can achieve good performance and identify additional new cases on target data, suggesting that fusing knowledge learned from different datasets might improve performance. Our analyses could facilitate the development of ML-based suicide prediction models with better performance and generalizability.
Supplementary information
Acknowledgements
The work was supported by NIH grant numbers R01MH124740 and R01MH112148. The authors are grateful to the Editor, the Associate Editor, and the referees for their valuable comments and suggestions, which have led to significant improvement of the article.
Author contributions
CZ, FW, KC, and RA designed the study. RA, JJ, and SS contributed to the data acquisition and processing. CZ, YH, and DL undertook the experiments and interpreted the data. CZ, YH, and FW drafted the manuscript. All the authors took critical revisions of the manuscript and approved the final version to be published.
Data availability
The data used in this analysis were obtained from the Connecticut Department of Public Health (DPH) and the Connecticut Office of Health Strategy (OHS). Neither agency endorses nor assumes any responsibility for any analyses, interpretations, or conclusions based on the data. The authors assume full responsibility for all such analyses, interpretations, and conclusions. The data use agreements governing access to and use of these datasets do not permit the authors to re-release the datasets or make the data publicly available. However, these data can be obtained by other researchers using the same application processes used by the authors.
Code availability
For reproducibility, our code is available at https://github.com/houyurain/2022_Suicide. We implemented the regularized logistic regression (LR) with the Scikit-learn package 1.0.2 [37], the GBM with the LightGBM package 3.3.2 [38], and the LSTM with PyTorch 1.12 with GPU acceleration.
Competing interests
The authors declare no competing interests.
Ethics declarations
Our study was approved by the University of Connecticut Health Center Institutional Review Board and Weill Cornell Medical College Institutional Review Board, the CT Department of Public Health Human Investigations Committee, and the CT APCD Data Release Committee. All EHRs and claims used in this study were appropriately deidentified and thus no informed consent from patients was obtained.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Chengxi Zang, Yu Hou.
Contributor Information
Kun Chen, Email: kun.chen@uconn.edu.
Robert Aseltine, Email: aseltine@uchc.edu.
Fei Wang, Email: few2001@med.cornell.edu.
Supplementary information
The online version contains supplementary material available at 10.1038/s41398-024-03034-3.
References
- 1.Curtin SC. State suicide rates among adolescents and young adults aged 10–24: United States, 2000–2018. Natl Vital Stat Rep. 2020;69:1–10. [PubMed] [Google Scholar]
- 2.Leading Causes of Death and Injury - PDFs|Injury Center|CDC. https://www.cdc.gov/injury/wisqars/LeadingCauses.html (2022).
- 3.Luoma JB, Martin CE, Pearson JL. Contact with mental health and primary care providers before suicide: a review of the evidence. Am J Psychiatry. 2002;159:909–16. 10.1176/appi.ajp.159.6.909 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Ahmedani BK, Simon GE, Stewart C, Beck A, Waitzfelder BE, Rossom R, et al. Health care contacts in the year before suicide death. J Gen Intern Med. 2014;29:870–7. 10.1007/s11606-014-2767-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Su C, Aseltine R, Doshi R, Chen K, Rogers SC, Wang F. Machine learning for suicide risk prediction in children and adolescents with electronic health records. Transl Psychiatry. 2020;10:1–10. 10.1038/s41398-020-01100-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Kessler RC, Warner CH, Ivany C, Petukhova MV, Rose S, Bromet EJ, et al. Predicting suicides after psychiatric hospitalization in US army soldiers: the army study to assess risk and resilience in servicemembers (Army STARRS). JAMA Psychiatry. 2015;72:49–57. 10.1001/jamapsychiatry.2014.1754 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Barak-Corren Y, Castro VM, Javitt S, Hoffnagle AG, Dai Y, Perlis RH, et al. Predicting suicidal behavior from longitudinal electronic health records. AJP. 2017;174:154–62. 10.1176/appi.ajp.2016.16010077 [DOI] [PubMed] [Google Scholar]
- 8.Simon GE, Johnson E, Lawrence JM, Rossom RC, Ahmedani B, Lynch FL, et al. Predicting suicide attempts and suicide deaths following outpatient visits using electronic health records. AJP. 2018;175:951–60. 10.1176/appi.ajp.2018.17101167 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Poulin C, Shiner B, Thompson P, Vepstas L, Young-Xu Y, Goertzel B, et al. Predicting the risk of suicide by analyzing the text of clinical notes. PLoS ONE. 2014;9:e85733. 10.1371/journal.pone.0085733 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.McCarthy JF, Bossarte RM, Katz IR, Thompson C, Kemp J, Hannemann CM, et al. Predictive modeling and concentration of the risk of suicide: implications for preventive interventions in the US department of veterans affairs. Am J Public Health. 2015;105:1935–42. 10.2105/AJPH.2015.302737 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Sanderson M, Bulloch AG, Wang J, Williamson T, Patten SB. Predicting death by suicide using administrative health care system data: Can recurrent neural network, one-dimensional convolutional neural network, and gradient boosted trees models improve prediction performance? J Affect Disord. 2020;264:107–14. 10.1016/j.jad.2019.12.024 [DOI] [PubMed] [Google Scholar]
- 12.Doshi RP, Chen K, Wang F, Schwartz H, Herzog A, Aseltine RH. Identifying risk factors for mortality among patients previously hospitalized for a suicide attempt. Sci Rep. 2020;10:15223. 10.1038/s41598-020-71320-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Xu W, Su C, Li Y, Rogers S, Wang F, Chen K, et al. Improving suicide risk prediction via targeted data fusion: proof of concept using medical claims data. J Am Med Inf Assoc. 2022;29:500–11. 10.1093/jamia/ocab209 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Tran T, Luo W, Phung D, Harvey R, Berk M, Kennedy RL, et al. Risk stratification using data from electronic medical records better predicts suicide risks than clinician assessments. BMC Psychiatry. 2014;14:76. 10.1186/1471-244X-14-76 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Walkup JT, Townsend L, Crystal S, Olfson M. A systematic review of validated methods for identifying suicide or suicidal ideation using administrative or claims data. Pharmacoepidemiol Drug Saf. 2012;21:174–82. 10.1002/pds.2335 [DOI] [PubMed] [Google Scholar]
- 16.Platt R, Carnahan RM, Brown JS, Chrischilles E, Curtis LH, Hennessy S, et al. The U.S. Food and Drug Administration’s Mini-Sentinel program: status and direction. Pharmacoepidemiol Drug Safety. 2012;21:1–8. [DOI] [PubMed] [Google Scholar]
- 17.Barak-Corren Y, Castro VM, Nock MK, Mandl KD, Madsen EM, Seiger A, et al. Validation of an electronic health record–based suicide risk prediction modeling approach across multiple health care systems. JAMA Network Open. 2020;3:e201262. 10.1001/jamanetworkopen.2020.1262 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Walker RL, Shortreed SM, Ziebell RA, Johnson E, Boggs JM, Lynch FL, et al. Evaluation of electronic health record-based suicide risk prediction models on contemporary data. Appl Clin Inform. 2021;12:778–87. 10.1055/s-0041-1733908 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Chekroud AM, Hawrilenko M, Loho H, Bondar J, Gueorguieva R, Hasan A, et al. Illusory generalizability of clinical prediction models. Science. 2024;383:164–7. 10.1126/science.adg8538 [DOI] [PubMed] [Google Scholar]
- 20.Subbaswamy A, Saria S. From development to deployment: dataset shift, causality, and shift-stable models in health AI. Biostatistics. 2020;21:345–52. [DOI] [PubMed] [Google Scholar]
- 21.Shilo S, Rossman H, Segal E. Axes of a revolution: challenges and promises of big data in healthcare. Nat Med. 2020;26:29–38. 10.1038/s41591-019-0727-5 [DOI] [PubMed] [Google Scholar]
- 22.Song X, Yu ASL, Kellum JA, Waitman LR, Matheny ME, Simpson SQ, et al. Cross-site transportability of an explainable artificial intelligence model for acute kidney injury prediction. Nat Commun. 2020;11:5668. 10.1038/s41467-020-19551-w [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Finlayson SG, Subbaswamy A, Singh K, Bowers J, Kupke A, Zittrain J, et al. The clinician and dataset shift in artificial intelligence. N Engl J Med. 2021;385:283–6. [DOI] [PMC free article] [PubMed]
- 24.All-Payer Claims Database. CT.gov - Connecticut’s Official State Website. https://portal.ct.gov/OHS/Services/Data-and-Reports/To-Access-Data/All-Payer-Claims-Database.
- 25.Hospital Patient Data. CT.gov - Connecticut’s Official State Website. https://portal.ct.gov/OHS/Services/Data-and-Reports/To-File-Data/Patient-Data.
- 26.KHIN - Health Information Network. https://www.khinonline.org/Product-Sevices/HEALTH-INFORMATION-NETWORK.aspx.
- 27.World Medical Association. World medical Association declaration of Helsinki: ethical principles for medical research involving human subjects. JAMA. 2013;310:2191–4. 10.1001/jama.2013.281053 [DOI] [PubMed] [Google Scholar]
- 28.Wang W, Li Y, Yan J. touch: Tools of Utilization and Cost in Healthcare. 2022.
- 29.Choi E, Bahadori MT, Kulas JA, Schuetz A, Stewart WF, Sun J. RETAIN: an interpretable predictive model for healthcare using reverse time attention mechanism. arXiv:1608.05745 [cs]. 2017.
- 30.Liu R, Wei L, Zhang P. A deep learning framework for drug repurposing via emulating clinical trials on real-world patient data. Nature Machine Intelligence. 2021;3:68–75. 10.1038/s42256-020-00276-w [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9:1735–80. 10.1162/neco.1997.9.8.1735 [DOI] [PubMed] [Google Scholar]
- 32.Zang C, Zhang H, Xu J, Zhang H, Fouladvand S, Havaldar S, et al. High-throughput target trial emulation for Alzheimer’s disease drug repurposing with real-world data. Nat Commun. 2023;14:1–16. 10.1038/s41467-023-43929-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Austin PC. Using the standardized difference to compare the prevalence of a binary variable between two groups in observational research. Communications in Statistics - Simulation and Computation. 2009;38:1228–34. 10.1080/03610910902859574 [DOI] [Google Scholar]
- 34.Zhang Z, Kim HJ, Lonjon G, Zhu Y. Balance diagnostics after propensity score matching. Ann Transl Med. 2019;7:16. 10.21037/atm.2018.12.10 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Braga AANM, Veiga MLT, Ferreira MGCDS, Santana HM, Barroso U. Association between stress and lower urinary tract symptoms in children and adolescents. Int Braz J Urol. 2019;45:1167–79. 10.1590/s1677-5538.ibju.2019.0128 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Carson CM, Phillip N, Miller BJ. Urinary tract infections in children and adolescents with acute psychosis. Schizophr Res. 2017;183:36–40. 10.1016/j.schres.2016.11.004 [DOI] [PubMed] [Google Scholar]
- 37.Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30. [Google Scholar]
- 38.Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: a highly efficient gradient boosting decision tree. In: Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc.; 2017.