Predictive structured–unstructured interactions in EHR models: A case study of suicide prediction

Ilkin Bayramli; Victor Castro; Yuval Barak-Corren; Emily M Madsen; Matthew K Nock; Jordan W Smoller; Ben Y Reis

doi:10.1038/s41746-022-00558-0

2022 Jan 27;5:15. doi: 10.1038/s41746-022-00558-0

PMCID: PMC8795240 PMID: 35087182

Abstract

Clinical risk prediction models powered by electronic health records (EHRs) are becoming increasingly widespread in clinical practice. With suicide-related mortality rates rising in recent years, it is becoming increasingly urgent to understand, predict, and prevent suicidal behavior. Here, we compare the predictive value of structured and unstructured EHR data for predicting suicide risk. We find that Naive Bayes Classifier (NBC) and Random Forest (RF) models trained on structured EHR data perform better than those based on unstructured EHR data. An NBC model trained on both structured and unstructured data yields similar performance (AUC = 0.743) to an NBC model trained on structured data alone (0.742, p = 0.668), while an RF model trained on both data types yields significantly better results (AUC = 0.903) than an RF model trained on structured data alone (0.887, p < 0.001), likely due to the RF model’s ability to capture interactions between the two data types. To investigate these interactions, we propose and implement a general framework for identifying specific structured-unstructured feature pairs whose interactions differ between case and non-case cohorts, and thus have the potential to improve predictive performance and increase understanding of clinical risk. We find that such feature pairs tend to capture heterogeneous pairs of general concepts, rather than homogeneous pairs of specific concepts. These findings and this framework can be used to improve current and future EHR-based clinical modeling efforts.

Subject terms: Translational research, Epidemiology

Introduction

In recent years there has been a proliferation of clinical prediction models powered by electronic health records (EHRs). Many prediction models rely primarily on structured data from the EHR, which typically includes diagnostic, laboratory, medication, and procedure codes. Yet most EHRs also contain unstructured data such as clinician notes, which may include information already captured in the structured data, as well as information not present in the structured data (Fig. 1). Unstructured EHR data have been used for clinical predictive tasks, both as a standalone feature-set and in combination with structured data^1–4.

Fig. 1 — Electronic health records contain both structured and unstructured data. These two types of data contain both unique and overlapping information.

In order to optimally integrate both structured and unstructured data and improve predictive performance, it is important to understand the predictive value of each data type. It is also important to understand the interactions between these two data types and identify instances where the nature of these interactions differs between case and non-case populations. Such differences can be valuable for deepening our understanding of clinical risk and for improving clinical risk prediction in models that are able to capture these interactions.

As a case study, we focus on suicide prediction. Approximately 800,000 people die by suicide every year worldwide, accounting for 1.5% of all deaths⁵. Suicide is the tenth leading cause of death in North America and a leading cause of death globally among persons 15–24 years of age⁶. With suicide-related mortality rates rising in recent years⁷, it is becoming increasingly urgent to understand, predict, and prevent suicidal behavior. Early and accurate identification of individuals with elevated risk for suicide attempts is critical for developing effective suicide prevention strategies. Predicting suicide risk, however, is a complex challenge. The intuition of clinicians for detecting at-risk individuals is no better than random chance⁸, underscoring the potential value of algorithmic approaches to this challenge.

In recent years, rapidly growing quantities of electronic health data along with advancements in statistical learning methods have enabled the development of suicide risk prediction models. We recently developed one such model using data from over 1.7 million patients in a large healthcare system (Mass General Brigham)⁹; the model detected 45% of suicide attempts an average of 3–4 years in advance, with a specificity of 90% and an area under the receiver operating characteristic curve (AUC) of 0.77. Since structured EHR data capture only some elements of clinical presentation, in the present study, we seek to improve upon this prediction accuracy by examining features extracted using natural language processing (NLP) from unstructured clinician notes. (For simplicity, we refer to these as “unstructured features”.)

The goals of this study are threefold: (1) to compare the predictive value of structured and unstructured EHR data as standalone datasets for predicting suicide risk; (2) to evaluate the increase in prediction performance when integrating both structured and unstructured data using various models; and (3) to identify structured-unstructured feature pairs in which the interaction between the two features differs substantially between case and non-case populations, and which may thus have the potential to improve predictive performance. To achieve the latter, we propose a framework for identifying structured-unstructured feature pairs in which the interaction between the two features differs significantly between case and non-case cohorts.

Results

Study population

Approximately 2,303,376 individuals did not have a sufficient number of visits to be included in the analysis. Of these, 11,316 had a suicide attempt (0.5%). Many of the excluded individuals had a single emergency department visit, hospitalization, or medical concept recorded over the 20-year course of the dataset. Applying the inclusion and exclusion criteria to the extracted data yielded 1,625,350 training subjects for the NBC models, which included 1,608,806 non-cases (99%) and 16,544 cases (1%) (Fig. 2). The testing set consisted of 697,411 subjects, including 7,155 cases. For the BRFC models, the dataset included 140,000 subjects for each of the training and testing populations, with the former having 16,538 cases (12%, due to the sampling approach mentioned above) and the latter having 1,384 (1%, reflecting the prevalence in the clinical population). For case subjects, the median time between the penultimate hospital visit and first suicide attempt was 35 days and the mean was 286 days. Figure 3 shows the distribution of time between the penultimate visit and first suicide attempt.

Fig. 2 — The diagram describes the filtering and processing steps taken to arrive at the final datasets used for training and testing different models described in this paper. STR—Structured Data; NLP—Unstructured data processed by Natural Language Processing; NBC—Naïve Bayesian Classifier; BRFC—Balanced Random Forest Classifier.

Fig. 3 — As the distribution was highly skewed, the x-axis was capped at 100 days for clarity. A few patients had several years between their last recorded visit and suicide attempt.

For both experiments, we had the same set of 45,808 features which included 43,435 structured features (95%) and 2,373 features derived from unstructured data using NLP (5%). Table 1 shows the correspondence between structured and unstructured codes for several sample concepts.

Table 1.

Correspondence between structured and unstructured codes.

Concept	Struct.	Unstruct.	Both	Total
Impulse control disorder	145 (19%)	688 (86%)	37 (5%)	796
Unspecified bipolar disorder	1,322 (30%)	4,053 (94%)	1,051 (24%)	4,324
Schizo-affective disorder	250 (42%)	522 (88%)	177 (30%)	595
Opioid dependence or abuse	1,183 (27%)	3,893 (90%)	761 (17%)	4,315

The number of patients who have a structured EHR code for a given concept (first column), an NLP code (based on a free-text mention of that concept in their unstructured clinician notes; second column), and both a structured code and an NLP code for the given concept (third column). Since NLP concepts are more general, each row includes one NLP code but several structured codes with similar descriptions. Furthermore, “opioid dependence” and “opioid abuse” codes were merged into one code since many EHR codes mention both opioid dependence and abuse.
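The overlap in Table 1 can be examined with simple set arithmetic. The following is a minimal sketch, not part of the original analysis: the function name `coverage` is hypothetical, and it assumes that "Total" counts patients with either kind of code (structured + unstructured - both), which is consistent with every row of the table.

```python
def coverage(structured: int, unstructured: int, both: int) -> dict:
    """Summarize the overlap between structured and NLP codes for one concept."""
    # Inclusion-exclusion: patients with at least one of the two code types.
    total = structured + unstructured - both
    return {
        "total": total,
        "structured_only": structured - both,
        "nlp_only": unstructured - both,
        # Share of patients whose concept is visible only in clinician notes.
        "nlp_only_fraction": (unstructured - both) / total,
    }


# "Impulse control disorder" row of Table 1.
row = coverage(structured=145, unstructured=688, both=37)
print(row)
```

Under this assumption, most patients with an impulse control disorder mention have it recorded only in their notes, which is the kind of information a structured-data-only model would not see.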

Model performance

The results of training and testing are presented in Tables 2 and 3 and Fig. 4. We found that for both NBC and BRFC modeling approaches, training on structured data features resulted in higher predictive performance than training on features derived from unstructured data, with an improvement in AUC of 2–3% (p < 0.001).

Table 2.

Performance of NBC models on the test set.

	Unstructured		Structured		Both
Specificity	PPV	Sensitivity	PPV	Sensitivity	PPV	Sensitivity
0.99	0.070	0.079	0.072	0.076	0.088	0.092
0.95	0.046	0.254	0.047	0.239	0.051	0.260
0.90	0.035	0.378	0.036	0.365	0.039	0.391
0.80	0.024	0.520	0.026	0.530	0.027	0.540
AUC	0.714		0.742		0.743

There is no significant increase (p = 0.668) in AUC between the model based on structured-data-only and the model based on both structured and unstructured data.

Table 3.

Performance of BRFC models on the test set.

	Unstructured		Structured		Both
Specificity	PPV	Sensitivity	PPV	Sensitivity	PPV	Sensitivity
0.99	0.142	0.168	0.191	0.246	0.219	0.267
0.95	0.082	0.447	0.092	0.507	0.097	0.545
0.90	0.057	0.608	0.063	0.657	0.066	0.697
0.80	0.037	0.766	0.040	0.820	0.041	0.845
AUC	0.868		0.887		0.902

There is a significant increase (p < 0.001) in AUC between the model based on structured-data-only and the model based on both structured and unstructured data. There are also substantial increases in sensitivity.

Fig. 4 — BRFC models perform considerably better than NBC models in terms of AUC across all three datasets. Combining structured and unstructured data yields better performance than using structured data alone, which itself performs better than using unstructured data only.

For the NBC model, training on both structured and unstructured data yielded no significant improvement over training on structured data alone (p-value = 0.67). However, for BRFCs, training with both structured and unstructured data led to a moderate but significant 1.6% increase in AUC relative to training on structured data alone (p-value < 0.001). The combined structured and unstructured BRFC model also exhibited moderate increases in PPV and sensitivity from the structured-data-only model across all specificity thresholds, with a 4% increase in sensitivity at both 0.90 and 0.95 specificity, in addition to increases in PPV.
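The low PPV values in Tables 2 and 3 follow largely from the roughly 1% prevalence of cases in the test sets. As a rough illustration (not the authors' code), the sketch below applies Bayes' rule to a sensitivity, specificity, and prevalence; the function name `ppv` is hypothetical, and because the tabulated specificities are rounded operating points, the result should only approximate the reported PPV.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence                  # P(flagged and case)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # P(flagged and non-case)
    return true_pos / (true_pos + false_pos)


# NBC test set: 7,155 cases among 697,411 subjects.
prevalence = 7155 / 697411

# Combined structured + unstructured NBC model at 0.99 specificity (Table 2).
estimate = ppv(sensitivity=0.092, specificity=0.99, prevalence=prevalence)
print(round(estimate, 3))
```

With these inputs the estimate comes out near 0.087, close to the 0.088 reported in Table 2. Even at 99% specificity, false positives from the much larger non-case population outnumber true positives by roughly ten to one.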

Contingency analysis

Tables 4 and 5 show structured-unstructured feature pairs in which the relationship between the two features differed most between case and non-case cohorts, namely those with the highest interaction heterogeneity. Table 4 shows feature pairs in which the structured feature A was associated with greater suicide risk (i.e., feature A occurred more frequently in the case cohort than in the non-case cohort). These include drug and opioid use, suicidal ideation, and borderline personality disorder, which are associated with various high-risk NLP features including schizophrenia, self-reported suicide attempts, imprisonment, and homelessness.

Table 4.

Structured-unstructured feature pairs AB with high interaction heterogeneity (IH), where A is a strong risk factor for suicide attempt.

Features		Cases				Non-cases
Structured (A)	Unstructured (B)	A	B	AB expected	AB actual	A	B	AB expected	AB actual	IH
Other, mixed, or unsp. Drug abuse, unsp. Use	Suicide attempts	2356	3741	374.53	1003	148	563	3.54	53	77.55
Other, mixed, or unsp. Drug abuse, unsp. Use	Section XII	2356	3045	304.85	849	148	403	2.53	43	74.72
Other, mixed, or unsp. Drug abuse, unsp. Use	Living on the street	2356	1113	111.43	532	148	154	0.97	36	66.66
Other, mixed, or unsp. Drug abuse, unsp. Use	Prison	2356	2043	204.53	825	148	358	2.25	51	62.57
Other, mixed, or unsp. Drug abuse, unsp. Use	Intoxications	2356	2663	266.61	889	148	462	2.91	50	60.56
Suicidal ideation	Section XII	1820	3045	235.49	1299	127	403	2.17	81	54.69
Other, mixed, or unsp. Drug abuse, unsp. Use	Undomiciled	2356	2357	235.97	964	148	408	2.57	55	54.50
Other, mixed, or unsp. Drug abuse, unsp. Use	Opioid dependence	2356	1625	162.69	841	148	195	1.23	44	53.86
Suicidal ideation	Schizoaffective schizophrenia	1820	676	52.28	223	127	118	0.64	21	52.75
Other, mixed, or unsp. Drug abuse, unsp. Use	Sober	2356	3667	367.12	1329	148	723	4.55	76	52.29
Other, mixed, or unsp. Drug abuse, unsp. Use	Unspecified bipolar disorder	2356	3488	349.20	932	148	699	4.40	49	48.53
Other, mixed, or unsp. Drug abuse, unsp. Use	Schizoaffective schizophrenia	2356	676	67.68	172	148	118	0.74	15	46.44
Opioid abuse, unspec. Use	Sober	1305	3667	203.35	710	78	723	2.40	42	46.09
Other, mixed, or unsp. Drug abuse, unsp. Use	Methadone	2356	2992	299.54	1165	148	653	4.11	69	45.55
Borderline personality	Methadone	582	2992	74.00	139	35	653	0.97	14	43.59
Opioid abuse, unspec. Use	Living on the street	1305	1113	61.72	293	78	154	0.51	18	43.28
Opioid type dependence, continuous use	Drug seeking	710	463	13.97	96	50	51	0.11	9	37.61
Suicidal ideation	Suicidality	1820	2546	196.90	1057	127	380	2.05	58	35.84
Other, mixed, or unsp. Drug abuse, unsp. Use	Cluster b	2356	495	49.56	175	148	43	0.27	10	35.70
Unspec. Neurotic disorder	Opioid dependence	1003	1625	69.26	191	72	195	0.60	12	35.48

A high IH value indicates that the relationship between A and B changes significantly between case and non-case populations.

Table 5.

Structured-unstructured feature pairs AB with high interaction heterogeneity (IH), where A is a strong protective factor against suicide.

Features		Cases				Non-cases
Structured (A)	Unstructured (B)	A	B	AB expected	AB actual	A	B	AB expected	AB actual	IH
Screening mammogram for malignant neoplasm of breast	Imp. Cont. Dis.	89	661	2.50	51	2091	3658	325.03	875	110.08
Annual Exam	Imp. Cont. Dis.	171	661	4.80	81	2596	3658	403.53	1249	94.20
Screening mammogram for malignant neoplasm of breast	vacuuming	89	231	0.87	25	2091	1546	137.37	374	93.77
Screening digital breast tomosynthesis, bilateral	Imp. Cont. Dis	103	661	2.89	46	1656	3658	257.41	730	71.63
Encounter for screening, unspec.	Imp. Cont. Dis	55	661	1.54	30	809	3658	125.75	344	66.36
Screening digital breast tomosynthesis, bilateral	vacuuming	103	231	1.01	23	1656	1546	108.79	332	62.36
Encounter for screening for malignant neoplasm of colon	Imp. Cont. Dis	61	661	1.71	31	1399	3658	217.46	620	57.69
Screening mammogram for malignant neoplasm of breast	Imp. Cont. Dis	89	2019	7.64	80	2091	10987	976.24	1765	53.97
Pure hypercholesterolemia, unsp.	Imp. Cont. Dis	64	661	1.80	30	1328	3658	206.43	596	49.89
Screening digital breast tomosynthesis, bilateral	Imp. Cont. Dis	103	2019	8.84	82	1656	10987	773.15	1422	44.84
Annual Exam	vacuuming	171	231	1.68	23	2596	1546	170.54	423	44.53
Physical therapy evaluation low complex 20 mins	Imp. Cont. Dis	36	661	1.01	22	678	3658	105.39	325	44.29
Screening, malig. neopl. colon	vacuuming	61	231	0.60	14	1399	1546	91.91	269	43.32
Screening, malig. neopl. breast	Imp. Cont. Dis	30	661	0.84	18	571	3658	88.76	272	36.53
Other hemorrhoids	Imp. Cont. Dis	37	661	1.04	17	559	3658	86.89	236	33.29
Age-related osteoporosis without current pathological fracture	Imp. Cont. Dis	32	661	0.90	18	549	3658	85.34	271	32.33
Asymptomatic menopausal state	vacuuming	20	231	0.20	7	387	1546	25.42	81	29.70
Other melanin hyperpigmentation	vacuuming	25	231	0.25	8	699	1546	45.92	156	29.59
Screening, unspecified	Imp. Cont. Dis	55	2019	4.72	46	809	10987	377.70	692	29.58
Mod sed same phys/qhp each addl 15 mins	Imp. Cont. Dis	28	661	0.79	13	822	3658	127.77	329	28.45

A high IH value indicates that the relationship between A and B changes significantly between case and non-case populations. Among the unstructured concepts, “Imp. Cont. Dis” refers to impulse-control disorder, and “vacuuming” refers to use of hallucinogenic and psychoactive drugs derived from psilocybin mushrooms.

Table 5 shows feature pairs in which the structured feature A was associated with lower suicide risk (i.e., A occurred less frequently in the case cohort than in the non-case cohort). These include concepts such as annual exams, mammograms, and tumor screenings that are associated with NLP concepts such as impulse-control disorder and use of hallucinogenic and psychoactive drugs derived from psilocybin mushrooms (referred to as “vacuuming” in informal parlance). In many cases, structured codes such as mammograms and tumor screenings are confounded with older age, which is protective against suicide attempt risk. Hence, the lower suicide attempt risk associated with the interaction of these structured variables with high-risk concepts such as impulse-control disorder and hallucinogenic drug use is to be expected. (In Tables 4 and 5, “AB Expected” corresponds to E[a_i] used in computation of the T_i statistic defined above.)
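The "AB expected" columns in Tables 4 and 5 are numerically consistent with the count that would be expected if A and B occurred independently within a cohort, i.e. n_A × n_B / N. The sketch below illustrates that calculation only; it does not reproduce the IH statistic. The function name is hypothetical, and the cohort size of 23,566 is taken from the captions of Tables 6 and 7 and assumed to apply to each cohort, so the result may differ slightly from the tabulated value.

```python
def expected_joint_count(n_a: int, n_b: int, n_total: int) -> float:
    """Expected number of subjects with both A and B if A and B were independent."""
    return n_a * n_b / n_total


# Assumption: cohort size as stated in the captions of Tables 6 and 7.
N_COHORT = 23_566

# First row of Table 4, case cohort: A = drug abuse code, B = "suicide attempts" NLP concept.
expected = expected_joint_count(n_a=2356, n_b=3741, n_total=N_COHORT)
observed = 1003

print(round(expected, 1), round(observed / expected, 2))
```

The observed co-occurrence among cases is well above the independence expectation; what IH then summarizes is how differently this excess behaves in the case and non-case cohorts.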

As described above, interaction heterogeneity (IH) provides a summary measure of the difference in the overall shape of the contingency tables between case and non-case populations. In order to provide a more intuitive understanding of IH, Tables 6 and 7 provide illustrative examples of contingency tables for two structured-unstructured feature pairs AB: One with a high IH value of 77.55 (“Other, mixed, or unspecified drug abuse, unspecified use” & “suicide attempts”), and the other with a low IH value of 3.95 (“Opioid abuse, unspecified use” & “junk (heroin)”). For simplicity, we refer to the number of individuals who had both A and B in the cases cohort as AB_cases, and to the number of people who had A but did not have B in the cases cohort as A~B_cases, and so forth.

Table 6.

Contingency tables for the structured-unstructured pair “Other, mixed, or unspecified drug abuse, unspecified use” (A) and “suicide attempts” (B).

Cases			Non-cases
Concept	B: 1	B: 0	Concept	B: 1	B: 0
A: 1	0.0401	0.0541	A: 1	0.0021	0.0040
A: 0	0.1095	0.7376	A: 0	0.0204	0.9150

This feature pair has a high interaction heterogeneity (IH) value of 77.55. Values shown are proportions of the total number of samples (23,566) for each bin.

Table 7.

Contingency tables for the structured-unstructured pair “Opioid abuse, unspecified use” (A) and “junk (heroin)” (B).

Cases			Non-cases
Concept	B: 1	B: 0	Concept	B: 1	B: 0
A: 1	0.0443	0.0079	A: 1	0.0022	0.0010
A: 0	0.1071	0.7820	A: 0	0.0297	0.9085

This feature pair has a low IH value of 3.95. Values shown are proportions of the total number of samples (23,566) for each bin. The differences between the two distributions are smaller in Table 7 than in Table 6, resulting in a lower IH value.

The values for AB_cases and AB_non-cases are similar for both pairs of contingency tables (Tables 6 and 7), as are the values for ~AB_cases and ~AB_non-cases. However, the differences between A~B_cases and A~B_non-cases, and the differences between ~A~B_cases and ~A~B_non-cases, are greater in Table 6 than in Table 7. Thus, the overall shape of the contingency table in Table 6 changes more between case and non-case populations than the contingency table in Table 7. This yields a larger IH value for Table 6 and a smaller IH value for Table 7, indicating that the interaction of concepts in Table 6 is more strongly associated with the suicide-attempt outcome.
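One familiar way to put a number on the "shape" of a 2×2 table is its odds ratio. The sketch below compares the log odds ratios of the case and non-case tables for each feature pair. This is a swapped-in illustration, not the IH statistic used in the paper, and the function names are hypothetical. The cell proportions are copied from Tables 6 and 7; because they are rounded to four decimals, the smallest cells carry noticeable rounding error.

```python
import math


def odds_ratio(ab: float, a_not_b: float, b_not_a: float, neither: float) -> float:
    """Odds ratio of a 2x2 table given its four cell proportions."""
    return (ab * neither) / (a_not_b * b_not_a)


def log_or_gap(cases: tuple, non_cases: tuple) -> float:
    """Absolute difference in log odds ratio between the two cohorts."""
    return abs(math.log(odds_ratio(*cases)) - math.log(odds_ratio(*non_cases)))


# Cell order: (A and B, A and not B, not A and B, neither); (cases, non-cases).
table6 = ((0.0401, 0.0541, 0.1095, 0.7376), (0.0021, 0.004, 0.0204, 0.9150))
table7 = ((0.0443, 0.0079, 0.1071, 0.7820), (0.0022, 0.0010, 0.0297, 0.9085))

gap6 = log_or_gap(*table6)
gap7 = log_or_gap(*table7)
print(round(gap6, 2), round(gap7, 2))
```

On this simple measure the association between A and B also shifts more between cohorts for the pair in Table 6 than for the pair in Table 7, in line with the ordering of their IH values.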

In order to study the difference between IH and more traditional measures of risk, Fig. 5 plots IH versus the joint suicide attempt risk of features A and B (defined as the log of the ratio of the expected joint occurrences of AB in the case vs. non-case cohorts). As mentioned, IH is a measure of whether the interaction between features A and B differs significantly between case and non-case cohorts. The joint suicide attempt risk provides a summary measure of association between the features and the outcome, reflecting the difference in the number of occurrences of A and B between case and non-case cohorts. (To reduce noise, we only included feature pairs AB with at least 10 joint occurrences in either case or non-case cohorts.) Figure 5 shows that many feature pairs with similar joint suicide risk have a large variation in IH—highlighting the fact that IH can reveal variation in feature interactions that the ratio of expected occurrences does not capture.
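The joint suicide attempt risk can be recomputed from the "AB expected" columns of Table 4. The text does not state the base of the logarithm; base 10 is assumed in the sketch below because it reproduces the values listed in Table 8. The function name `joint_risk` is hypothetical.

```python
import math


def joint_risk(expected_cases: float, expected_non_cases: float) -> float:
    """Log (base 10, assumed) ratio of expected joint occurrences, cases vs. non-cases."""
    return math.log10(expected_cases / expected_non_cases)


# "AB expected" values (cases, non-cases) copied from the first two rows of Table 4.
pairs = {
    "drug abuse & suicide attempts": (374.53, 3.54),
    "drug abuse & Section XII": (304.85, 2.53),
}

risks = {name: round(joint_risk(c, n), 2) for name, (c, n) in pairs.items()}
print(risks)
```

Both results match the joint suicide attempt risk reported for these pairs in Table 8 (2.02 and 2.08). Because the measure depends only on the marginal frequencies of A and B, it cannot reflect how often the two actually co-occur, which is the information IH adds.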

This is illustrated further in Tables 8 and 9, which present interactions that correspond to the rightmost cluster in Fig. 5 (i.e., feature pairs with joint suicide risk between 1.7 and 2.3). Within this cluster, Table 8 presents the 20 feature interactions with the highest values of IH, and Table 9 presents the 20 feature interactions with the lowest values of IH. Although the joint suicide risk values are approximately the same in both tables, we see that the nature of interactions is different between Tables 8 and 9. Table 8 contains mostly general substance-abuse structured features (e.g., “Other, mixed, or unspecified drug abuse, unspecified use”), while Table 9 includes specific substance-abuse structured features such as cocaine, methadone, barbiturate, and opioid consumption. Furthermore, the substance abuse codes in Table 8 interact mostly with non-substance-abuse unstructured features such as “lack of domicile”, “schizophrenia” and “imprisonment”, while the substance-abuse codes in Table 9 interact mostly with other substance-abuse-related unstructured features, most prominently heroin and thioridazine. Thus, interactions between features that are near-synonyms show less difference between case and non-case cohorts than interactions between features that are more heterogeneous.

Table 8.

Structured-unstructured feature pairs A-B with high interaction heterogeneity (IH) values.

Structured feature (A)	Unstructured feature (B)	Joint suicide attempt risk	IH
Other, mixed, or unspecified drug abuse, unspecified use	Suicide attempts	2.02	77.55
Other, mixed, or unspecified drug abuse, unspecified use	Section XII	2.08	74.72
Other, mixed, or unspecified drug abuse, unspecified use	Living on the street	2.06	66.66
Other, mixed, or unspecified drug abuse, unspecified use	Prison	1.96	62.57
Other, mixed, or unspecified drug abuse, unspecified use	Undomiciled	2.02	61.18
Other, mixed, or unspecified drug abuse, unspecified use	Intoxications	1.96	60.56
Suicidal ideation	Section XII	2.03	54.69
Other, mixed, or unspecified drug abuse, unspecified use	Undomiciled	1.96	54.50
Other, mixed, or unspecified drug abuse, unspecified use	Opioid dependence	2.12	53.86
Suicidal ideation	Schizoaffective schizophrenia	1.91	52.75
Other, mixed, or unspecified drug abuse, unspecified use	Sober	1.91	52.29
Opioid abuse, unspecified use	Methadone	2.02	48.85
Other, mixed, or unspecified drug abuse, unspecified use	Unspecified bipolar disorder	1.90	48.53
Suicidal ideation	Delusions	1.86	48.32
Other, mixed, or unspecified drug abuse, unspecified use	Methadone	2.00	46.72
Other, mixed, or unspecified drug abuse, unspecified use	Schizoaffective schizophrenia	1.96	46.44
Opioid abuse, unspecified use	Sober	1.93	46.09
Other, mixed, or unspecified drug abuse, unspecified use	Methadone	1.86	45.55
Cocaine abuse, unspecified use	Methadone	1.97	43.78
Borderline personality	Methadone	1.88	43.59

The joint suicide attempt risk of features A and B is defined as the log of the ratio of the expected joint occurrences of AB in the case vs. non-case cohorts.

Table 9.

Structured-unstructured feature pairs A-B with low interaction heterogeneity (IH) values.

Structured feature (A)	Unstructured feature (B)	Joint suicide attempt risk	IH
Opioid type dependence, continuous use	Hearing voices	2.03	0.05
Opioid type dependence, continuous use	Suicidality	1.98	0.05
Methadone tab 40 mg	Junk (heroin)	1.73	0.05
Barbiturate and similarly acting sedative or hypnotic abuse, unspecified use	Mugged (assault)	1.96	0.04
Unspecified neurotic disorder	VH (visual hallucinations)	1.89	0.04
Other, mixed, or unspecified drug abuse, unspecified use	Judgment impaired	2.12	0.03
Barbiturate and similarly acting sedative or hypnotic abuse, unspecified use	Prison	2.04	0.03
Opioid type dependence, continuous use	Junk (heroin)	1.83	0.02
Cocaine abuse, unspecified use	Blackouts	1.88	0.02
Methadone tab 40 mg	Junk (heroin)	1.83	0.02
Opioid type dependence, continuous use	Thioridazine	1.99	0.02
Barbiturate and similarly acting sedative or hypnotic abuse, unspecified use	Junk (heroin)	2.11	0.01
Acute alcoholic intoxication in alcoholism, continuous drinking behavior	Hallucinosis	1.99	0.01
Suicidal ideation	Crack	2.02	0.01
Methadone tab 40 mg	Stolen	1.73	0.01
Unspecified neurotic disorder	Sexual assaults	1.81	0.01
Depressive Neuroses (MS v24)	Sober	1.96	0.00
Depressive Neuroses (MS v24)	Prison	2.01	0.00
Unspecified neurotic disorder	VH	1.85	0.00
Cocaine abuse, continuous use	VH	1.95	0.00

The joint suicide attempt risk of features A and B is defined as the log of the ratio of the expected joint occurrences of AB in the case vs. non-case cohorts.

Discussion

We found that models trained only on features derived from structured data perform better than models trained only on features derived from unstructured data. The performance gap between models trained with structured data and those trained with unstructured data is quite small, considering the compact size of the unstructured data.

Combining unstructured data with structured data provided almost no performance benefit with the NBC model, whereas the BRFC model showed a significant increase in AUC. The fact that the NBC model benefitted only negligibly from the addition of NLP concepts is not surprising; while interactions between structured and unstructured features could contain useful signals, NBCs assume conditional independence among features, and so cannot exploit these interactions to improve predictive performance. On the other hand, BRFCs are designed to capture interactions between features, and are thus able to deliver a significant improvement in predictive performance. Indeed, examining trees in the BRFC model, we found many examples where splits based on NLP concepts were either preceded or followed by structured-data-based splits, providing evidence that the BRFC models captured useful structured-unstructured interactions.
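The effect of the conditional-independence assumption can be seen in a deliberately extreme toy case. The sketch below uses synthetic data, not data from this study: the outcome depends only on the interaction of two binary features (an XOR pattern), so each feature on its own carries no signal. A hand-rolled Naive Bayes posterior is compared with the empirical rate in each joint cell, which stands in for what an interaction-aware model such as a tree that splits on one feature and then the other can represent. All names are hypothetical.

```python
from itertools import product

# Synthetic data: y = 1 exactly when one, but not both, of the features is present.
data = [(a, b, a ^ b) for a, b in product([0, 1], repeat=2)] * 25


def naive_bayes_posterior(a: int, b: int) -> float:
    """P(y=1 | a, b) assuming a and b are conditionally independent given y."""
    scores = {}
    for y in (0, 1):
        rows = [r for r in data if r[2] == y]
        prior = len(rows) / len(data)
        p_a = sum(r[0] == a for r in rows) / len(rows)
        p_b = sum(r[1] == b for r in rows) / len(rows)
        scores[y] = prior * p_a * p_b
    return scores[1] / (scores[0] + scores[1])


def empirical_posterior(a: int, b: int) -> float:
    """P(y=1 | a, b) read directly from the joint cell, interaction included."""
    rows = [r for r in data if r[0] == a and r[1] == b]
    return sum(r[2] for r in rows) / len(rows)


for a, b in product([0, 1], repeat=2):
    print(a, b, naive_bayes_posterior(a, b), empirical_posterior(a, b))
```

Naive Bayes assigns the same probability to every feature combination here, whereas the joint-cell estimate separates them perfectly. Real EHR interactions are far weaker than this, but the mechanism is the same.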

Structured-unstructured feature pairs whose interactions differed most between suicidal and non-suicidal populations were those that described heterogeneous pairs of general concepts, rather than pairs of similar concepts. In particular, interactions between concepts related to mental health issues, drug abuse, excessive alcohol consumption, and psychiatric disorders were predictive of suicide risk. Although links between these concepts have been previously established, features derived from unstructured data further strengthen these associations. Unstructured data also help capture complementary information about the well-being of patients that structured data may not provide: interactions of structured concepts with concepts derived from unstructured data for environmental risk factors (e.g., “living on the street”, “undomiciled”, “prison”), services that occur outside the health system (e.g., “methadone maintenance”), and diagnoses such as “delusions” and “impulse-control disorder” were found to be highly predictive of suicide. Similarly, interactions between medical screenings and general examinations were observed to be protective against suicide risk, although it is unclear whether it was the examinations themselves or other confounding factors related to the examinations that were protective. As seen from the examples in Table 1, many patients who had NLP codes recorded for some concepts did not have the corresponding structured codes in their EHRs, which shows that unstructured data can help capture information about a patient that structured data miss. Such insights into the changing nature of feature interactions between case and non-case cohorts can help to improve predictive performance and provide a deeper understanding of clinical risk.

This study is subject to a number of limitations. We analyzed 20 years of longitudinal healthcare data from a single healthcare system including hospital admissions, observational stays, emergency department visits, and outpatient encounters. Visits outside this geographical setting, time period, and network of hospitals were not included, and therefore the study dataset may be missing some encounters that could have been useful for predicting suicide attempts. Moreover, some of these excluded visits may have been for suicidal behavior, meaning that some patients may have been incorrectly identified as non-case subjects or correctly identified as case subjects but given incorrect onset times. For patient diagnoses, we included both ICD-9 and ICD-10 codes since both encoding standards were used in the RPDR during the last 20 years. As a result, there are some concepts for which both ICD-9 and ICD-10 definitions have been included in the dataset, adding extra computational burden. Since the goal of this research was to investigate properties of structured and unstructured data, we compared predictive performance of NBCs and BRFCs, which are relatively easy to interpret. To achieve a potentially superior predictive model, it would also be worthwhile to consider other modeling approaches such as XGBoost, neural networks, and support vector machines, as well as dimensionality reduction techniques such as PCA and t-SNE. However, these methods are more difficult to interpret, making them less suitable for the present study. They are potential avenues for future work.

Another limitation is that suicide attempt risk predictions were performed only on the penultimate visits prior to a suicide attempt. This was done to reduce the complexity and computational burden of the prediction task while allowing us to focus on differences between structured and unstructured features. As a result, the specific models developed here are designed to predict risk in later visits of patients and may not predict suicide risk sufficiently in advance if used in earlier visits. Predictive models trained for practical purposes would be designed for predicting at any point during the patient’s longitudinal history. One approach for doing this with random forests is to sample random visits in the patient’s medical timeline and include cumulative feature history up until that visit as “snapshots.” We have explored such multi-temporal suicide risk predictions with random forests in a separate study¹⁰.
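The snapshot idea can be sketched as follows. This is an illustrative reading of the description above, not the implementation used in the separate study: the `Visit` representation, the function name `sample_snapshots`, and the example codes are all hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple

Visit = Tuple[int, Set[str]]  # (day index, codes recorded at that visit)


def sample_snapshots(visits: List[Visit], k: int, seed: int = 0) -> List[Dict]:
    """Sample up to k visits and return the cumulative features as of each one."""
    rng = random.Random(seed)
    ordered = sorted(visits, key=lambda v: v[0])
    chosen = sorted(rng.sample(range(len(ordered)), min(k, len(ordered))))
    snapshots = []
    for idx in chosen:
        history: Set[str] = set()
        # Accumulate every code seen up to and including the sampled visit.
        for _, codes in ordered[: idx + 1]:
            history |= codes
        snapshots.append({"as_of_day": ordered[idx][0], "features": sorted(history)})
    return snapshots


# Hypothetical patient timeline mixing structured codes and NLP concepts.
timeline = [
    (0, {"annual_exam"}),
    (40, {"opioid_abuse"}),
    (95, {"nlp:sober"}),
    (130, {"suicidal_ideation"}),
]
snaps = sample_snapshots(timeline, k=2)
print(snaps)
```

Each snapshot would become one training row, so a single patient can contribute examples from several points in their history rather than only the penultimate visit.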

It is likely that some patients who attempted suicide did not have their suicide attempts recorded in the electronic medical record, either due to failures in diagnosis, errors in recording the suicide attempt, or simply because the patient did not have an encounter with the health system around the suicide attempt. Therefore, the actual misclassification rate of case subjects by our models could be higher than reported. This limitation with case labeling is inherent to any study on suicide risk prediction based on EHR data. Although in this study only the structured codes were used in case definitions, future work can also consider using suicide attempt concepts in clinician notes when labeling cases. This way, NLP concepts would not only expand the feature set of the patients, but also expand the list of cases for training and testing. Given the imbalanced 1:99 case to non-case ratio, this could improve the predictive accuracy of trained models. Future work could also compare the performance of these models on different demographic, socio-economic, and time cohort subgroups. It would also be worthwhile to consider stratifying models by variables known to be confounded with suicide attempt risk such as age.

When developing the models for this study, we constrained ourselves to methods that are relatively easy to interpret. We also tried to keep the complexity of our models and NLP pipeline relatively low for better clinical generalizability. As a result, our rule-based NLP pipeline and NBC models are highly interpretable. Although the training process of RF models can be obscure due to the randomness and ensembling involved, individual trees can be easily visualized to justify decisions made by the model during inference. However, if the interpretability constraint were waived, it would be worthwhile to explore other modeling approaches, including deep learning artificial neural network (ANN) models, for both NLP feature extraction and predictive modeling. Although such models are typically harder to interpret, deep learning models have exhibited superior predictive performance compared to statistical and machine learning models in a number of computational tasks. In particular, recent advances in recurrent neural networks (RNNs) and attention-based transformer networks have shown great potential in many types of natural language tasks. ANNs are able to extract abstract representations from different modes of input data without any feature engineering, and to fuse them efficiently for optimal predictive performance. ANNs have already been employed for many medical prediction tasks involving structured and unstructured data. In the suicide prediction workflow described in this study, both our NLP feature extraction pipeline and RF-based risk modeling could be replaced with deep-learning approaches, potentially yielding greater predictive performance. Future work could explore the application of deep learning to NLP feature extraction and prediction of suicide attempt risk^11–16.

Previous studies have examined the use of unstructured EHR data in clinical prediction models in general, and in suicide prediction models in particular. Tsui et al.¹ showed that the use of NLP features extracted from clinician notes significantly improved the AUC of an ensemble of extreme gradient boosting models and of a Lasso model over a structured-data-only baseline model. Poulin et al. used keywords extracted from unstructured clinician notes to predict suicide risk among US veterans with an accuracy of 65%⁴. Carson et al. constructed a random forest model trained on structured and unstructured EHR data of psychiatrically hospitalized adolescents to predict suicidal behavior with an AUC of 0.68¹⁷.

In the present study, we examined the integration of features derived from unstructured clinician notes into structured-data-based suicide risk prediction models. We showed that a model that assumes independence among variables (NBC) does not significantly benefit from the addition of unstructured features, whereas models such as Balanced Random Forest Classifiers that explicitly capture interactions exhibit performance increases when unstructured features are added. We also proposed and implemented a framework for identifying specific structured-unstructured feature pairs whose interaction patterns differ with respect to a patient’s suicide risk, and thus have the potential to improve predictive performance and increase understanding of clinical risk. Many of the interactions identified are expected, which serves to validate our proposed approach for identifying meaningful interactions that can help further elucidate the risk factors of clinical conditions. These findings and this framework can be used to improve current and future EHR-based clinical prediction models, which are becoming increasingly widespread in clinical settings.

Methods

Data processing

We analyzed data from the Mass General Brigham Research Patient Data Registry (RPDR)¹⁸, an EHR data warehouse covering 4.6 million patients from two large academic medical centers in Boston, MA, USA (Massachusetts General Hospital and Brigham and Women’s Hospital), as well as their affiliated community and specialty hospitals in the Boston area. The RPDR was queried for all inpatient and outpatient visits occurring from 1998 through 2018 by individuals who met the following inclusion criteria: three or more total visits recorded in the EHR, 30 days or more between the first and last visits, and at least one encounter after age 10 and before age 90. For each patient, we analyzed all demographic, diagnostic, procedure, laboratory, and medication data recorded at each visit, as well as the unstructured clinician notes.
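
As an illustration of the inclusion criteria, the following sketch checks a single patient's visit history. The function name and the reading of "after age 10 and before age 90" as age of at least 10 and under 90 in whole years are assumptions made for this example, not details confirmed by the text.

```python
from datetime import date

def meets_inclusion_criteria(birth_date, visit_dates):
    """Return True if the visit history satisfies the three stated criteria."""
    # Criterion 1: three or more total visits
    if len(visit_dates) < 3:
        return False
    visits = sorted(visit_dates)
    # Criterion 2: 30 days or more between the first and last visits
    if (visits[-1] - visits[0]).days < 30:
        return False

    def age_on(d):
        # Age in whole years on date d
        had_birthday = (d.month, d.day) >= (birth_date.month, birth_date.day)
        return d.year - birth_date.year - (0 if had_birthday else 1)

    # Criterion 3 (assumed reading): at least one visit at age >= 10 and < 90
    return any(10 <= age_on(d) < 90 for d in visits)
```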

The structured data extracted from the RPDR was retrieved in a format wherein each row corresponded to a specific feature recorded during a specific encounter. All datasets had the following columns: de-identified subject number, encounter number, concept code, concept start date, and site of encounter. Lab results data had a “valueflag” column representing the recorded outcome of a given test (low, normal, high, abnormal, undetermined), which was merged into the lab features column. This transformed each lab feature into multiple “dummy” features. For example, instead of a feature “test_1”, we had five features: “test_1|L”, “test_1|N”, “test_1|H”, “test_1|U”, “test_1|A”. Each NLP concept was appended with the mention type of the concept (positive; negative; family history; negative family history), which quadrupled the number of NLP features. The “concept start date” column was used, along with demographics data, for filtering patients by age, and for removing data recorded following the first suicide attempt for cases.
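
The feature expansion described above can be sketched as simple string encoding. The lab format follows the “test_1|L” example in the text; the mention-type labels (`POS`, `NEG`, `FH`, `NFH`) and the use of the same `|` separator for NLP concepts are assumptions made for illustration.

```python
LAB_FLAGS = {"L", "N", "H", "U", "A"}        # low, normal, high, undetermined, abnormal
MENTION_TYPES = {"POS", "NEG", "FH", "NFH"}  # assumed labels for the four mention types

def encode_lab_feature(concept_code, valueflag):
    """Merge a lab result flag into the lab concept code, e.g. 'test_1|H'."""
    if valueflag not in LAB_FLAGS:
        raise ValueError(f"unknown valueflag: {valueflag!r}")
    return f"{concept_code}|{valueflag}"

def encode_nlp_feature(concept, mention_type):
    """Append the mention type to an NLP concept (separator is an assumption)."""
    if mention_type not in MENTION_TYPES:
        raise ValueError(f"unknown mention type: {mention_type!r}")
    return f"{concept}|{mention_type}"
```

Under this encoding, one lab concept yields up to five distinct features and one NLP concept yields up to four, matching the counts given in the text.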

Natural language processing

In order to derive features from the unstructured clinician notes, we created a custom lexicon of suicide-relevant and psychiatric concepts using a variety of approaches including: (1) selecting signs and symptoms, and mental and behavioral process semantic types from the Unified Medical Language System (UMLS)¹⁹; (2) mapping DSM symptoms and concepts from structured instruments²⁰; (3) automatically extracting features from public sources including Wikipedia and MedScape; (4) incorporating RDoC domain matrix terms²⁰; (5) selecting predictive features from coded suicide attempt prediction models²¹; and (6) manual annotation of terms by expert clinicians. This lexicon was linked to UMLS concepts and included 480 distinct semantic concepts and 1,273 tokens or phrases. Using this lexicon, we ran the HiTex²² NLP named-entity extraction pipeline to identify concepts in over 120 million clinical notes. For each note, we identified the presence of a concept (e.g., symptom, disease, mental process) and further tagged concepts as negated (NEG), family history mention (FH) or negated family history (NFH). For negation and family history pipeline components, we utilized the ConText algorithm²³.

Case definition

We have previously described the development of an EHR-based case definition for suicide⁹. In summary, with the help of three expert clinicians, we identified codes from International Classification of Diseases, Ninth Revision (ICD-9) and International Classification of Diseases, Tenth Revision (ICD-10) that reliably captured suicide attempts with a positive predictive value (PPV) of greater than 0.70. Subjects having at least one of these codes were included in the case population. For cases, we also removed all data following the first suicide attempt (the index event) and made predictions at the penultimate visit prior to the index event. For the purpose of this study, the case definition was based solely on structured diagnostic information and did not include information derived from the clinician notes when classifying individuals as cases versus non-cases.
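
The truncation step can be sketched as follows, assuming a chronologically sorted list of visits and a set of qualifying diagnostic codes. The function names, the data layout, and the placeholder code values are hypothetical; the sketch reads the prediction visit as the last visit preceding the index visit.

```python
def truncate_at_index_event(visits, attempt_codes):
    """Return the visits preceding the first visit that carries an attempt code.

    visits: chronologically sorted list of (visit_id, set_of_codes) (assumed layout).
    Returns None when no visit meets the case definition (a non-case).
    """
    for position, (_, codes) in enumerate(visits):
        if codes & attempt_codes:
            # Drop the index visit and everything recorded after it
            return visits[:position]
    return None

def prediction_visit(visits, attempt_codes):
    """Return the last visit before the index event, or None if unavailable."""
    history = truncate_at_index_event(visits, attempt_codes)
    if not history:
        return None
    return history[-1]
```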

Model training

We split our data into training and testing sets with a 70/30 ratio, respectively. For each individual, we included all visits available in that patient’s EHR. For individuals labeled as cases, we included only visits up to and including the penultimate visit prior to the visit on which the individual first met the case definition. This restriction was not applied to non-cases. We applied two modeling approaches for suicide prediction. The first was a Naive Bayes Classifier (NBC) model, described in detail elsewhere²⁴. NBCs are a subclass of Bayesian networks that assume strong conditional independence of all input features, greatly reducing model complexity²⁵. NBCs have been shown to be well-suited for clinical decision support tasks and are highly scalable and interpretable; they compute a risk score for each concept using the odds ratios of its prevalence in case and non-case populations, ignoring interactions with other variables. During validation, the NBC risk scores for each concept in a patient’s visit history were added together to compute a cumulative suicide risk measure for the subject. If a patient had multiple instances of the same predictor over multiple visits, that predictor was counted multiple times at different visits of the patient. The NBC model was trained using R version 3.6.0 and the R packages pROC and tidyverse.
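
A minimal sketch of this additive scoring scheme is given below. The exact scoring formula is described in the cited reference; here we assume, for illustration only, that each concept's score is the log odds ratio of its prevalence in cases versus non-cases, with a 0.5 continuity correction to avoid division by zero. The function names and data layout are hypothetical.

```python
import math
from collections import Counter

def concept_risk_scores(case_histories, noncase_histories, smoothing=0.5):
    """Score each concept by a smoothed log odds ratio (assumed formula).

    Each history is a list of concept occurrences for one subject.
    """
    n_case, n_non = len(case_histories), len(noncase_histories)
    # Prevalence counts: a subject contributes at most once per concept
    case_counts = Counter(c for h in case_histories for c in set(h))
    non_counts = Counter(c for h in noncase_histories for c in set(h))
    scores = {}
    for concept in set(case_counts) | set(non_counts):
        a = case_counts[concept] + smoothing            # cases with the concept
        b = n_case - case_counts[concept] + smoothing   # cases without it
        c = non_counts[concept] + smoothing             # non-cases with the concept
        d = n_non - non_counts[concept] + smoothing     # non-cases without it
        scores[concept] = math.log((a * d) / (b * c))
    return scores

def patient_risk(history, scores):
    """Sum concept scores over the history; repeated occurrences count each time."""
    return sum(scores.get(concept, 0.0) for concept in history)
```

Because the scores are simply added, the model has no term through which a structured feature and an NLP feature can modify each other's contribution.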

The second modeling approach was a Balanced Random Forest Classifier (BRFC)²⁶, which, unlike NBCs, is capable of capturing interactions between features. Balanced Random Forests are an extension of Random Forest²⁷ models designed to work well with label-imbalanced datasets. Due to computational constraints, the BRFCs were trained and tested on a smaller subset of 140,000 subjects from the RPDR data. The occurrence rate of suicide attempts in our dataset is very low, at about 1%, resulting in low positive predictive values (PPV) on test sets with regular Random Forests. BRFCs balance the classes by either downsampling the majority class, upsampling the minority class, or resampling both classes with replacement during bootstrap draws until a specified ratio of classes is met. During the sampling of training data, we ensured that the proportion of cases was lifted from 1% to around 12%. The test set was left intact with the natural 1% suicide attempt rate. The data pipeline for arriving at training and testing sets for all described models is illustrated in Fig. 2.
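
The text does not specify how the training prevalence was raised from 1% to around 12%. One simple possibility, shown here purely as an assumption, is to keep all training cases and randomly downsample non-cases until the target proportion is reached; the function name is hypothetical.

```python
import random

def lift_case_prevalence(cases, noncases, target=0.12, seed=0):
    """Downsample non-cases so that cases make up about `target` of the result."""
    # Number of non-cases needed so that cases / (cases + non-cases) ~= target
    n_non = round(len(cases) * (1 - target) / target)
    n_non = min(n_non, len(noncases))
    rng = random.Random(seed)
    return list(cases), rng.sample(list(noncases), n_non)
```

Only the training set would be resampled this way; the test set keeps its natural prevalence so that PPV estimates remain realistic.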

To select the parameters of the model, we performed a grid search with 5-fold cross-validation over the BRFC parameter space. Based on the grid search results, we arrived at a model with 30 trees, 50% of all features sampled for each tree, a bootstrap sample size equal to the total number of samples, and a 1:4 ratio of cases to non-cases in every bootstrap sample, achieved by random undersampling of the majority class. Even after undersampling non-cases to a 1:4 case:non-case ratio, the bootstrap samples remained sufficiently large due to the relatively high case prevalence (12%) in the training data. We used Python version 3.6.9 with the libraries scikit-learn, imblearn, numpy, pandas, and matplotlib. The packages imblearn and scikit-learn were used for training and testing balanced random forests. The libraries numpy and pandas were used for data transformations and analyses. Paper visualizations were produced using matplotlib.

We used area under the receiver operating characteristic curve (AUC) as the primary predictive performance metric. In order to create confidence intervals and enable comparison of AUC values of different models, we used the percentile bootstrapping method with a simulation size of 1,000. We also measured PPV and sensitivity over a range of specificities. Since the primary goal of our work was to investigate properties of the NLP dataset rather than to build an optimal predictive model, we maximized simplicity in the study design: All predictions were made at the visit prior to the first suicide attempt for cases, and the last visit recorded for non-cases.
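
The percentile bootstrap for AUC can be sketched in plain Python as follows. This is an illustrative implementation, not the code used in the study; the pairwise AUC computation is quadratic in sample size and is intended only for small examples.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random case outscores a random non-case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a case and a non-case count as half a win
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:                          # skip single-class resamples
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

To compare two models, the same resampled indices can be applied to both sets of scores and the distribution of the AUC difference examined.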

Contingency analysis

In order to better understand the interactions between structured and unstructured data, we performed a separate contingency analysis to identify interactions between structured and unstructured features that differed substantially between case and non-case populations. To account for possible effects of sample size differences between case and non-case populations, we randomly sampled two equal cohorts—one with 23,566 cases and the other with 23,566 non-cases. (These cohorts were sampled from the original dataset before training and testing splits were made.) To simplify analysis, we counted each feature only at its first occurrence for each subject.

For simplicity in the following discussion, we will refer to a feature derived from structured data as A, and a feature derived from NLP of unstructured data as B. For each feature pair A-B, we computed contingency tables for both case and non-case populations (Table 10). To measure the strength of association between feature A and feature B within each cohort, we performed a Chi-squared test of independence. The null hypothesis was that A is independent of B, while the alternative hypothesis was that there is an association between A and B. Equation (1) shows the computation of the statistic T_i for both case and non-case populations:

\begin{matrix} T_{i} = \sum \frac{(a_{i} - E[a_{i}])^{2}}{E[a_{i}]}, \quad E[a_{i}] = \frac{(a_{i} + c_{i})(a_{i} + b_{i})}{n}, \quad n = a_{i} + b_{i} + c_{i} + d_{i} \\ T_{i} \sim \chi_{1}^{2} \end{matrix}

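where a, b, c, and d are as defined in Table 10, and the sum runs over the four cells of the table, with the expected count of each cell computed analogously to E[a_i] from its row and column totals. Under the null hypothesis, T_i follows a Chi-squared distribution with one degree of freedom, and p-values can be computed from the upper tail of this distribution.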
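
As a worked illustration, the following sketch computes Pearson's Chi-squared statistic for a single 2×2 table by summing squared deviations from the expected counts over the four cells. The function name is hypothetical; the p-value uses the identity that, for one degree of freedom, the upper-tail probability equals erfc(√(T/2)).

```python
import math

def chi2_independence(a, b, c, d):
    """Pearson Chi-squared test of independence for a 2x2 table (1 df).

    Cell layout follows Table 10: a = (A:1, B:1), b = (A:1, B:0),
    c = (A:0, B:1), d = (A:0, B:0). Returns (statistic, p_value).
    """
    n = a + b + c + d
    observed = [a, b, c, d]
    # Expected count of each cell = row total * column total / n
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    t = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a Chi-squared variable with 1 df
    p = math.erfc(math.sqrt(t / 2))
    return t, p
```

The test would be run once on the case table and once on the non-case table, giving T_1 and T_0.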

Table 10.

Contingency tables of structured-unstructured concept pairs A-B, for case and non-case cohorts.

Cases			Non-cases
Concept	B: 1	B: 0	Concept	B: 1	B: 0
A: 1	a₁	b₁	A: 1	a₀	b₀
A: 0	c₁	d₁	A: 0	c₀	d₀


In order to determine whether the interactions between feature A and feature B differed between case and non-case populations, we used Woolf’s method for testing for homogeneity²⁸. The null hypothesis was that the odds ratios computed on each of the case and non-case populations were equal, while the alternative hypothesis was that these differed significantly. We calculated Woolf’s test statistic (X²_HOM) as shown in Eq. 2:

\begin{matrix} \log(\widehat{OR}_{i}) = \log\left(\frac{a_{i} d_{i}}{b_{i} c_{i}}\right) \\ \left[\mathrm{Var}(\log(\widehat{OR}_{i}))\right]^{-1} = w_{i} = \left(\frac{1}{a_{i}} + \frac{1}{b_{i}} + \frac{1}{c_{i}} + \frac{1}{d_{i}}\right)^{-1} \\ \overline{\log OR} = \frac{\sum_{i=1}^{k} w_{i} \log(\widehat{OR}_{i})}{\sum_{i=1}^{k} w_{i}} \\ X_{HOM}^{2} = \sum_{i=1}^{k} w_{i} \left(\log \widehat{OR}_{i} - \overline{\log OR}\right)^{2} \\ X_{HOM}^{2} \overset{asym}{\sim} \chi_{k-1}^{2} \end{matrix}

For k = 2 strata (the case and non-case cohorts), under the null hypothesis, X²_HOM asymptotically follows a Chi-squared distribution with one degree of freedom. For clarity, we will refer to Woolf’s test statistic X²_HOM as Interaction Heterogeneity (IH). Interaction heterogeneity provides a summary measure of the difference in the overall shape of the contingency table between case and non-case populations.
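
Woolf’s statistic can be computed directly from the two contingency tables. The sketch below follows the weighted log odds ratio formulation given in Eq. 2; the function name is hypothetical, and the code assumes all cell counts are non-zero (no continuity correction is applied).

```python
import math

def woolf_homogeneity(tables):
    """Woolf's test statistic for homogeneity of odds ratios across strata.

    tables: list of (a, b, c, d) tuples, one per stratum, all cells > 0.
    """
    log_ors = [math.log((a * d) / (b * c)) for a, b, c, d in tables]
    # Weight = inverse of the estimated variance of each log odds ratio
    weights = [1 / (1 / a + 1 / b + 1 / c + 1 / d) for a, b, c, d in tables]
    pooled = sum(w * l for w, l in zip(weights, log_ors)) / sum(weights)
    return sum(w * (l - pooled) ** 2 for w, l in zip(weights, log_ors))

def ih_p_value(x2_hom):
    """Upper-tail p-value for two strata (Chi-squared with 1 df)."""
    return math.erfc(math.sqrt(x2_hom / 2))
```

For example, `woolf_homogeneity([case_table, noncase_table])` returns the IH value for one feature pair; identical odds ratios in both cohorts give a value of zero.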

Next, we examined the joint distribution p(AB|Y), conditional on the case variable Y (suicide vs. non-suicide). Using Bayes’ rule, this distribution can be used to derive the more clinically interesting distribution p(Y|AB)—specifically P(Y = 1|A = 1, B = 1)—which is the probability of the patient attempting suicide in the future given that the patient has both features A and B:

\begin{matrix} P(Y \mid A, B) = \frac{P(A \mid Y, B) P(Y \mid B)}{P(A \mid B)} \\ P(A = 1 \mid Y = 1, B = 1) = \frac{a_{1}}{a_{1} + c_{1}} \\ P(Y = 1 \mid B = 1) = \frac{a_{1} + c_{1}}{a_{0} + c_{0} + a_{1} + c_{1}} \\ P(A = 1 \mid B = 1) = \frac{a_{1} + a_{0}}{a_{1} + c_{1} + a_{0} + c_{0}} \\ P(Y = 1 \mid A = 1, B = 1) = \frac{\left(\frac{a_{1}}{a_{1} + c_{1}}\right)\left(\frac{a_{1} + c_{1}}{a_{0} + c_{0} + a_{1} + c_{1}}\right)}{\frac{a_{1} + a_{0}}{a_{1} + c_{1} + a_{0} + c_{0}}} = \frac{a_{1}}{a_{1} + a_{0}} \end{matrix}

The variables a_i, b_i, c_i, d_i shown in Eq. 3 are as in Table 10, except that the entries in the contingency table of cases have been divided by 100 to reflect the 1/99 case-non-case ratio encountered in the clinical population. Thus, using Woolf’s method, we are able to identify specific structured-unstructured feature interactions that are most different between case and non-case cohorts, and thus have the most potential for improving predictive performance.
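
After cancellation of the common terms, the expression for P(Y = 1|A = 1, B = 1) reduces to the rescaled count of cases with both features divided by the total rescaled count of subjects with both features. A minimal sketch, with a hypothetical function name and made-up counts in the usage example:

```python
def joint_risk(a_case, a_noncase, case_scale=100):
    """Estimate P(Y=1 | A=1, B=1) from balanced-cohort counts.

    a_case: number of sampled cases with both A and B (cell a of the case table).
    a_noncase: number of sampled non-cases with both A and B.
    case_scale: divisor applied to case counts to reflect the 1/99
        case-to-non-case ratio in the clinical population.
    """
    a1 = a_case / case_scale
    return a1 / (a1 + a_noncase)
```

For instance, if 500 sampled cases and 95 sampled non-cases had both features, the rescaled estimate would be 5 / (5 + 95) = 0.05.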

Combining the above methods, we assembled a list of structured-unstructured feature pairs AB in which: (1) both A and B were among the top 200 most important features as ranked by the absolute value of the NBC feature risk scores; (2) the joint occurrence of A and B was significantly different from the expected value under the null within both case and non-case cohorts, as measured using the Chi-squared statistic T_i; and (3) the interaction between A and B was significantly different (heterogeneous) between the case population and the non-case population, as measured by interaction heterogeneity (IH). For ease of interpretation, we included only unstructured features that were either “positive” or “positive family history” mentions, and excluded “negative” and “negative family history” mentions.
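
The three selection criteria can be combined into a single filter once the per-pair statistics have been computed. The sketch below assumes each candidate pair is stored as a dictionary of precomputed p-values; the key names, the mention-type labels, and the significance threshold `alpha` are all assumptions, since the threshold is not stated in this section.

```python
def select_feature_pairs(pairs, top_features, alpha=0.05):
    """Keep structured-unstructured pairs that satisfy all selection criteria.

    pairs: iterable of dicts with keys 'A', 'B', 'mention', 'p_case',
        'p_noncase', and 'p_ih' (hypothetical layout).
    top_features: set of the top-ranked features by absolute NBC risk score.
    """
    selected = []
    for pair in pairs:
        # (1) Both features must be among the top-ranked NBC features
        if pair["A"] not in top_features or pair["B"] not in top_features:
            continue
        # Only positive and positive family history mentions are retained
        if pair["mention"] not in {"POS", "FH"}:
            continue
        # (2) Association within each cohort, and (3) heterogeneity between cohorts
        if pair["p_case"] < alpha and pair["p_noncase"] < alpha and pair["p_ih"] < alpha:
            selected.append(pair)
    return selected
```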

Since the goal of this analysis was not to simply find meaningful interactions in the dataset, but rather to identify meaningful interactions between structured and unstructured features, we performed the contingency analysis on structured-unstructured feature pairs, but not on structured–structured or unstructured–unstructured feature pairs.

Ethics

This research was approved by the Mass General Brigham Institutional Review Board, along with an IRB reliance agreement from the Boston Children’s Hospital Institutional Review Board.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Acknowledgements

This work was supported in part by a gift from the Tommy Fuss Fund and R01MH117599 (Drs. Smoller and Reis) from the National Institute of Mental Health.

Author contributions

I.B. and B.Y.R. conceived and planned the research, together with input from all other authors. I.B. conducted the principal data analysis and modeling work. V.C. conducted the NLP analysis and prepared the datasets for analysis. Y.B.C. also contributed to the modeling work. B.Y.R. supervised the research. I.B., V.C., Y.B.C., E.M., M.K.N., J.W.S., and B.Y.R. all discussed the results and contributed to the final manuscript.

Data availability

The data used in this study cannot be made available due to restrictions relating to the use of EHR data. This restriction also prevents the data from being made available upon request from the authors.

Code availability

The code used in this study is available from the authors upon request.

Competing interests

Dr. Smoller reported serving as an unpaid member of the Bipolar/Depression Research Community Advisory Panel of 23andMe and a member of the Leon Levy Foundation Neuroscience Advisory Board, and receiving an honorarium for an internal seminar at Biogen Inc. Dr. Nock receives textbook royalties from Macmillan and Pearson publishers and has been a paid consultant in the past year for Microsoft and for a legal case regarding a death by suicide. He is an unpaid scientific advisor for TalkLife and Empatica. The remaining authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Jordan W. Smoller, Ben Y. Reis.

Supplementary information

The online version contains supplementary material available at 10.1038/s41746-022-00558-0.

References

1. Tsui FR, et al. Natural language processing and machine learning of electronic health records for prediction of first-time suicide attempts. JAMIA Open. 2021;4:ooab011. doi: 10.1093/jamiaopen/ooab011.
2. McCoy TH, Jr., Castro VM, Roberson AM, Snapper LA, Perlis RH. Improving prediction of suicide and accidental death after discharge from general hospitals with natural language processing. JAMA Psychiatry. 2016;73:1064–1071. doi: 10.1001/jamapsychiatry.2016.2172.
3. Glenn CR, Nock MK. Improving the short-term prediction of suicidal behavior. Am. J. Prev. Med. 2014;47:S176–S180. doi: 10.1016/j.amepre.2014.06.004.
4. Poulin C, et al. Predicting the risk of suicide by analyzing the text of clinical notes. PLoS One. 2014;9:e85733. doi: 10.1371/journal.pone.0085733.
5. Gulati, G., Cullen, W. & Kelly, B. Psychiatry Algorithms for Primary Care (John Wiley & Sons, 2021).
6. Fazel S, Runeson B. Suicide. N. Engl. J. Med. 2020;382:266–274. doi: 10.1056/NEJMra1902944.
7. Hedegaard, H., Curtin, S. C. & Warner, M. Suicide rates in the United States continue to increase. NCHS Data Brief (309) 1–8 (2018).
8. Nock MK, et al. Measuring the suicidal mind: Implicit cognition predicts suicidal behavior. Psychol. Sci. 2010;21:511–517. doi: 10.1177/0956797610364762.
9. Barak-Corren Y, et al. Predicting suicidal behavior from longitudinal electronic health records. Am. J. Psychiatry. 2017;174:154–162. doi: 10.1176/appi.ajp.2016.16010077.
10. Bayramli I, Castro V, Barak-Corren Y, Madsen EM, Nock MK, Smoller JW, Reis BY. Temporally informed random forests for suicide risk prediction. J. Am. Med. Inform. Assoc. 2021;29:62–71. doi: 10.1093/jamia/ocab225.
11. Xu Z, et al. Identifying sub-phenotypes of acute kidney injury using structured and unstructured electronic health record data with memory networks. J. Biomed. Inform. 2020;102:103361. doi: 10.1016/j.jbi.2019.103361.
12. Su C, Xu Z, Pathak J, Wang F. Deep learning in mental health outcome research: a scoping review. Transl. Psychiatry. 2020;10:116. doi: 10.1038/s41398-020-0780-3.
13. Sheikhalishahi S, et al. Natural language processing of clinical notes on chronic diseases: Systematic review. JMIR Med Inf. 2019;7:e12239. doi: 10.2196/12239.
14. Landi I, et al. Deep representation learning of electronic health records to unlock patient stratification at scale. npj Digit. Med. 2020;3:96. doi: 10.1038/s41746-020-0301-z.
15. Miotto R, Li L, Kidd BA, Dudley JT. Deep patient: An unsupervised representation to predict the future of patients from the electronic health records. Sci. Rep. 2016;6:26094. doi: 10.1038/srep26094.
16. Rajkomar A, et al. Scalable and accurate deep learning with electronic health records. npj Digit. Med. 2018;1:18. doi: 10.1038/s41746-018-0029-1.
17. Carson NJ, et al. Identification of suicidal behavior among psychiatrically hospitalized adolescents using natural language processing and machine learning of electronic health records. PLoS One. 2019;14:e0211116. doi: 10.1371/journal.pone.0211116.
18. Nalichowski R, Keogh D, Chueh HC, Murphy SN. Calculating the benefits of a research patient data repository. AMIA Annu. Symp. Proc. 2006;2006:1044.
19. Bodenreider O. The Unified Medical Language System (UMLS): Integrating biomedical terminology. Nucleic Acids Res. 2004;32:D267–D270. doi: 10.1093/nar/gkh061.
20. Ross, J. Psychiatric Phenotyping Using Symptom Profiles: Can Self-Report Symptoms Inform a New Psychiatric Taxonomy? (UCSF, 2018).
21. McCoy TH, Jr., et al. High throughput phenotyping for dimensional psychopathology in electronic health records. Biol. Psychiatry. 2018;83:997–1004. doi: 10.1016/j.biopsych.2018.01.011.
22. Zeng QT, et al. Extracting principal diagnosis, co-morbidity and smoking status for asthma research: evaluation of a natural language processing system. BMC Med. Inform. Decis. Mak. 2006;6:30. doi: 10.1186/1472-6947-6-30.
23. Chapman, W., Dowling, J. & Chu, D. ConText: An algorithm for identifying contextual features from clinical text. In Biological, Translational, and Clinical Language Processing 81–88 (Association for Computational Linguistics, 2007).
24. Barak-Corren Y, et al. Validation of an electronic health record-based suicide risk prediction modeling approach across multiple health care systems. JAMA Netw. Open. 2020;3:e201262. doi: 10.1001/jamanetworkopen.2020.1262.
25. Reis BY, Kohane IS, Mandl KD. Longitudinal histories as predictors of future diagnoses of domestic abuse: Modelling study. BMJ. 2009;339:b3677. doi: 10.1136/bmj.b3677.
26. Chao, C., Liaw, A. & Breiman, L. Using random forest to learn imbalanced data. Berkeley Statistics Report No. 666. 1–12 (University of California Berkeley, USA, 2004). https://statistics.berkeley.edu/tech-reports/666
27. Breiman L. Random forests. Mach. Learn. 2001;45:5–32. doi: 10.1023/A:1010933404324.
28. Woolf B. On estimating the relation between blood group and disease. Ann. Hum. Genet. 1955;19:251–253. doi: 10.1111/j.1469-1809.1955.tb01348.x.

