Abstract
Introduction:
Ecological Momentary Assessment (EMA) holds promise for providing insights into daily life experiences when studying mental health phenomena. However, commonly used mixed-effects linear statistical models do not fully utilize the richness of the multidimensional time-varying data that EMA yields. Recurrent Neural Networks (RNNs) provide an alternative data analytic method to leverage more information and potentially improve prediction, particularly for non-normally distributed outcomes.
Methods:
As part of a broader research study of suicidal thoughts and behavior in people with borderline personality disorder (BPD), eighty-four participants engaged in EMA data collection over one week, answering questions multiple times each day about suicidal ideation (SI), stressful events, coping strategy use, and affect. RNNs and mixed-effects linear regression models (MEMs) were trained and used to predict SI. Root mean squared error (RMSE), mean absolute percent error (MAPE), and a pseudo-R2 accuracy metric were used to compare SI prediction accuracy between the two modeling methods.
Results:
RNNs had superior accuracy metrics (full model: RMSE=3.41, MAPE=42%, pseudo-R2=26%) compared with MEMs (full model: RMSE=3.84, MAPE=56%, pseudo-R2=16%). Importantly, RNNs showed significantly more accurate prediction at higher values of SI. Additionally, RNNs predicted, with significantly higher accuracy, the SI scores of participants with depression diagnoses and of participants with higher depression scores at baseline.
Conclusion:
In this EMA study with a moderately sized sample, RNNs were better able to learn and predict daily SI compared with mixed-effects models. RNNs should be considered as an option for EMA analysis.
I. Introduction
Ecological Momentary Assessment (EMA) is a method of data collection that involves repeated real-world sampling of human subject data. EMA has several clear advantages over more traditional data collection methodologies such as retrospective self-report, and has seen rapid growth in implementation in recent years, facilitated by the use of mobile devices [Koppe et al., 2019]. Real-time reporting and the relatively minimal disruption to subjects’ everyday actions serve to reduce recall bias and preserve ecological validity, allowing for more robust inference [Shiffman, 2014]. Additionally, EMA data can encompass many repeated observations for each subject under a range of different lived circumstances. These multivariate time-varying measures allow for more precise effects estimation and the exploration of transient predictors specific to the moment of reporting.
Machine-learning approaches, particularly recurrent neural networks (RNNs), offer a promising avenue to capture nuanced aspects of lived experiences that may elude more conventional analyses of EMA data, such as mixed-effects models (MEMs) or data aggregation [Borrett et al., 1993; Oquendo et al., 2021; Parker et al., 2021]. RNNs are specifically designed to account for the temporal structure of data and possess the flexibility to incorporate non-linear effects and complex interactions without requiring pre-specification [Kriegeskorte and Golan, 2019]. The ability of RNNs to incorporate interactions and non-linear effects, and to pare down inputs automatically without user direction, is a notable advantage over MEMs, for which a vast and unwieldy number of model variations could be considered, with limited methods of automating such decisions. These qualities make RNNs a potentially powerful tool for analyzing EMA data, particularly for time-sensitive events that may require urgent clinical intervention, such as suicidal ideation (SI) ratings. Prior research into SI collected using EMA has found that SI can vary widely both between and within individuals, with rapid severity changes possible in short periods of time [Gratch et al., 2020]. Moreover, SI has a complex range of presentations, which can have many determinants and can develop in multiple steps over time [Carretero et al., 2020; Howarth et al., 2020]. Prior EMA research has found that interpersonal stress can be an important factor in precipitating suicidal thoughts and behaviors, and that coping strategies may differ in how effective they are at reducing EMA-reported SI [Stanley et al., 2021; Stewart et al., 2019]. As longitudinally measured SI scores are often skewed in distribution, with low-severity ideation predominating, punctuated by short periods of high severity, non-linear associations with predictors may be important considerations.
Use of machine-learning analyses, such as RNNs, to model EMA data has been relatively uncommon thus far. Algorithms applied to date include network analysis, random forest models, component-wise gradient boosting, and boosted tree models [Barrigón et al., 2017; Rath et al., 2019]. Prior studies using RNNs to model EMA data include modeling daily mood ratings, end-of-day stress, and binary daily SI [Mikus et al., 2017; Peis et al., 2019; Rozet et al., 2019]. In each of these cases, all EMA variables used in the RNNs, both outcomes and predictors, were aggregated to the daily level rather than retained as separate within-day measures. The datasets for these prior studies either did not have repeated measures per subject within individual days or did not utilize these measures as separate observations. In the present study, all EMA observations, including within-day observations, were incorporated into the RNNs.
The present study aims to add to the current state of knowledge in several ways. First, we demonstrate the use of RNNs to model suicidal ideation using EMA data with multiple observations per subject per day in a high suicide risk sample of individuals with borderline personality disorder (BPD). For this study, we predict future EMA-collected SI from RNNs trained on EMA data from the same subjects from earlier in a week-long period, using MEMs to provide a benchmark for evaluating the success of these RNN predictions, since MEMs are a commonly used method of analysis for such data [Schwarz and Stone, 1998]. We hypothesized that RNNs would outperform MEMs by capturing more of the within-subject information that would be unexplained by MEMs. Second, because there is no gold standard metric for comparing accuracy, particularly for longitudinal data like EMA, we compare different metrics for assessing prediction accuracy. Third, we examine whether some observations are more amenable to one prediction method than the other, and attempt to characterize those within-subject observations for which RNNs exhibited superior prediction over MEMs, and those for which MEM predictions were superior. Lastly, as a post-hoc analysis, we attempt to demonstrate three ways of improving RNN predictions: censoring the rare, extreme values on the upper range of SI, re-parameterizing the input features, and re-parameterizing the outcome.
II. Methods
Sample.
As part of the baseline assessment in a larger clinical trial examining treatment of borderline personality disorder (BPD) and prior suicidal behavior (trial registration: “Treating Suicidal Behavior and Self-Mutilation in People With Borderline Personality Disorder”, #NCT00533117), eighty-four participants with BPD participated in one week of EMA data collection, though 4 subjects were excluded due to too little EMA data (N=80). Eligibility criteria for the study included a diagnosis of borderline personality disorder and a history of either suicidal behavior or non-suicidal self-injury (at least one episode in the past 6 months and another within the past 2 years). Participants with bipolar I disorder, a psychotic disorder, mental retardation, or an acute condition that required priority treatment, such as anorexia nervosa or severe substance dependence, were excluded. No comorbid Axis II diagnoses were excluded [Chaudhury et al., 2017].
Measures.
Predictor measures.
Participants were prompted six times per day to complete a series of approximately 90 questions using mobile devices. Prompts occurred randomly during a 12-hr period in a given day, though never within half an hour of each other. The 12-hour period was selected by each participant, at the start of the study, to fit within their daily schedules. Participants could, however, ignore prompts or provide responses anytime without a prompt if they wished, and so could, and did, provide fewer or even more than 6 responses each day – there were 51 days, across 37 participants, when a participant responded more than the expected 6 times, with a maximum of 12 responses in one day. Overall, though, participants provided an average of 29 responses over the 7 days of EMA data collection, or 69% of the anticipated 42 prompts. An SI scale based on a subset of items from the Beck Scale for Suicidal Ideation (9 questions with Likert responses: 0=very slightly or not at all, 1=a little, 2=moderately, 3=quite a bit, 4=extremely; total score range: 0–36) was included in these 90 questions. Questions also assessed stressful events (9 binary yes/no items), stress coping strategies (7 Likert items, same 0–4 response scale), and 40 items from the Profile of Mood States questionnaire (Likert items, same 0–4 response scale), and asked about any suicidal or self-harm behaviors undertaken by the subjects. All questions (accounting for skip patterns) were presented in the same order each time a participant responded to a prompt. Table 1 contains lists of the SI, stressful events, coping strategies, affect items, and suicidal behavior questions asked each time. Ideation and affect questions were asked in reference to the past 15 minutes to assess participants’ recent state of mind, while behaviors were asked about for the period since the last prompt.
Table 1.
EMA Items
| Question Set: | Items: |
|---|---|
| Suicidal Ideation | In the past 15 minutes, how strongly have you felt or experienced the following: 1. A wish to live, 2. A wish to die, 3. A wish to escape, 4. Thoughts about dying, 5. Thoughts about suicide, 6. Urge to commit suicide, 7. Thoughts about hurting yourself, 8. An urge to hurt yourself, 9. Like there were reasons for living |
| Stressful Events | Since the last prompt, have you: 1. Had a disagreement with someone, 2. Been rejected by someone, 3. Been complimented or praised by someone, 4. Been disappointed by someone, 5. Felt neglected by someone, 6. Experienced a loss of some sort, 7. Received good news, 8. Received bad news, 9. Been reminded of something painful from the past |
| Coping Strategies: | To what degree have you used the following strategies to manage any of the negative thoughts, feelings, or experiences you’ve had since the last prompt?: 1. Kept myself busy, 2. Socialized with others, 3. Focused on positive thoughts, 4. Did something good for myself, 5. Calmed myself down, 6. Tried to find perspective, 7. Sat with feelings until they passed |
| Affect: | In the past 15 minutes, how strongly have you felt the following: 1. Interested, 2. Distressed, 3. Frightened, 4. Excited, 5. Upset, 6. Shaky, 7. Strong, 8. Guilty, 9. Angry, 10. Enthusiastic, 11. Scared, 12. Hostile, 13. Proud, 14. Irritable, 15. Disgusted, 16. Alert, 17. Ashamed, 18. Scornful, 19. Inspired, 20. Nervous, 21. Loathing, 22. Determined, 23. Sad, 24. Confused, 25. Attentive, 26. Jittery, 27. Lonely, 28. Active, 29. Afraid, 30. Downhearted, 31. Good about yourself, 32. Alone, 33. Rejected, 34. Like you can cope, 35. Paralyzed, 36. Blue, 37. Numb, 38. Agitated, 39. Anxious, 40. Overwhelmed by feelings |
| Suicidal Behavior *: | Since the last prompt, have you done any of the following: 1. Cut yourself, 2. Scratched yourself, 3. Burned yourself, 4. Hit yourself, 5. Taken more medication than prescribed, 6. Taken more OTC medication than recommended, 7. Tried to jump off something or walked into traffic, 8. Tried to hang or choke yourself, 9. Ingested a poisonous substance |
Each of these items, if answered affirmatively, was followed by the question “Did you have any intent to die, no matter how small?”
Other variables were collected at baseline prior to the start of the EMA period. These data were used in SI modeling and included participant sex, any prior suicide attempt (yes/no), baseline Beck Depression Inventory (range: 0–63), Affective Lability Scale (range: 0–72), and MDD diagnosis (current/lifetime yes/no) [Beck et al., 1961; Harvey et al., 1989]. Additionally, the Beck Scale for Suicidal Ideation (SSI), the Barratt Impulsivity Scale, the Childhood Trauma Questionnaire, and the Hamilton Depression Rating Scale (HDRS) were also measured at baseline [Beck et al., 1979; Patton et al., 1995; Hamilton, 1960].
Outcome measures.
The target of this study was prediction of EMA-measured suicidal ideation scores. While primary analyses focused on the score as a continuous measure over the full range of the scale (0–36), subsequent analysis additionally considered prediction of values over a censored range (all values greater than 20 were capped at 20, not removed). This was undertaken because higher values were relatively rare in the sample (<7%) and both MEM and RNN predictions were less accurate in predicting these observations, likely because of the relatively low base rate of these values. Moreover, because a prior, unpublished mixed-effects logistic regression analysis of this dataset found that having an SI of 20 or greater was associated with markedly higher odds of self-injurious behavior (suicidal and non-suicidal) in the same epoch, compared to having an SI of less than 20 (OR=38.3, 95% CI=23.3–63.0), we considered SI values greater than 20 to indicate a clinically high degree of distress. As reported previously, variability in SI severity differed substantially between subjects, with some subjects staying within a narrow range of SI values, and others varying over the full or nearly full range of possible scores [Rizk et al., 2019]. The Root Mean Square of Successive Differences (RMSSD) for SI values was calculated for each subject as a measure of SI variability.
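The RMSSD used here as the within-subject variability measure can be sketched as follows. This is an illustrative Python sketch (the study's analyses were done in R and SAS); the function and variable names are our own.

```python
import math

def rmssd(values):
    """Root Mean Square of Successive Differences for one subject's SI series."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# A subject oscillating over a wide SI range yields a larger RMSSD
# than a subject staying within a narrow band (hypothetical series).
stable = rmssd([5, 6, 5, 6, 5])
volatile = rmssd([0, 20, 3, 25, 1])
```

Note that RMSSD captures moment-to-moment change rather than overall spread, which is why it is a natural benchmark for one-step-ahead prediction error.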
Data Analysis.
Prior to analysis, data were examined for validity and indications of non-compliance. Only observations with SI values were included in the analytic dataset. Initial EMA data cleaning identified that four of 84 subjects did not provide a sufficient number of observations across the week (i.e., <10 total responses, or responses on <2 days, out of the possible 42 responses across 7 days). These subjects were excluded from further analyses. Descriptive statistics for all baseline input features and aggregates of EMA variables were computed, and distributions were examined prior to analysis. Mean, median, and maximum SI severity over the EMA week were calculated for each subject, as was RMSSD.
Modeling EMA SI:
Following best practices, we split the data into training and testing sets, and only the training set was used to fit the models described below. Model performance was then assessed using the testing set. Specifically, each subject’s data were divided into the first 80% of observations (training data) and the last 20% (testing data). This non-random split was consistent with the goal of predicting future SI based on participants’ past data. Cross-validation based on random splits of participants was also used for tuning model hyperparameters, as described below.
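The chronological per-subject split can be sketched as follows (illustrative Python; names and the toy data are our own, not from the study's code):

```python
def split_subject(observations, train_frac=0.8):
    """Chronological split: first 80% of a subject's observations train, last 20% test."""
    cut = int(len(observations) * train_frac)
    return observations[:cut], observations[cut:]

# Applied independently to each subject's time-ordered observations,
# so the test set is always the most recent portion of each subject's week.
data = {"s01": list(range(10)), "s02": list(range(25))}
train = {s: split_subject(obs)[0] for s, obs in data.items()}
test = {s: split_subject(obs)[1] for s, obs in data.items()}
```

Splitting by time within subject, rather than at random, prevents future observations from leaking into the training data, which matters for a forecasting-style evaluation.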
Mixed-Effects models.
MEMs were fit using SAS Proc Glimmix. To establish comparison values for predictive performance, linear MEMs were fit with EMA SI as the outcome. Models featured random intercepts for subject and an AR(1) correlation structure for within-subject observations. Five different MEMs were fit to the training data. The first model included only baseline time-invariant variables (sex, prior suicide attempt, Beck Depression Inventory, Affective Lability Scale, and MDD diagnosis). The second model included time-varying predictors collected during the EMA. Affect items from the prior epoch were included only if that epoch occurred earlier the same day. Thus, this model included all EMA-measured variables in Table 1, as well as time variables (time from EMA period start, time from last epoch, time of day (categorized into 6-hour periods), and weekday/weekend). The third model included both the time-invariant and the time-varying predictor variables from the first two models. Parameter estimates from the MEMs fitted to the training data were then used to make predictions for SI in the testing dataset. As a sensitivity analysis, to see if MEM modifications could lead to better predictions, two additional MEMs were fit. First, a least absolute shrinkage and selection operator (LASSO) procedure was implemented to reduce the predictors in the full MEM to only those most important; then, random slopes were added to the full model for the 4 most significant predictors [Tibshirani, 1996]. Predicted values were produced for both models. Power analysis was performed for both between-subject and within-subject factors using the “longpower” package in R and G*Power 3.1, assuming a moderate 0.5 correlation for within-subject observations (estimated from the data). For a single binary between-subject predictor (i.e., a baseline predictor) in the mixed-effects models, analysis indicated >80% power to detect small effect sizes of Cohen’s d=0.22 in the training data and d=0.25 in the testing dataset. For a within-subject (longitudinal) predictor, the minimal detectable effect size with 80% power was Cohen’s f=0.08, corresponding to d=0.16, in the training data, and f=0.10, corresponding to d=0.20, in the testing data.
RNN analysis.
RNNs were fit as Long Short-Term Memory models, using the Keras and TensorFlow libraries in R [Hochreiter and Schmidhuber, 1997; Abadi et al., 2015]. Prior to training the RNNs on the complete training data, hyperparameters for the RNNs (including learning rate, number of algorithm iterations, and network structure) were tuned using 5-fold cross-validation by subject on the training dataset only. Hyperparameter tuning was performed separately for each of the first 3 sets of predictors mentioned above. Cross-validation was done by splitting subjects into five groups. Hyperparameters were systematically altered, and RNNs were trained on each combination of 4 of the 5 subject groups and tested on the remaining group, using only the training data. The fit of the RNNs was then compared using the root mean squared error (RMSE), and the hyperparameters yielding the best RMSE were selected.
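The subject-level fold assignment used for this cross-validation can be sketched as follows (illustrative Python, since the study's code was written in R; the grouping-by-subject is the essential point, so no subject's data appears in both a tuning-train and tuning-validation split):

```python
import random

def subject_folds(subject_ids, k=5, seed=0):
    """Partition subjects (not observations) into k cross-validation folds."""
    ids = sorted(subject_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for reproducibility
    return [ids[i::k] for i in range(k)]

# 80 analytic subjects split into 5 folds of 16 subjects each.
folds = subject_folds([f"s{i:02d}" for i in range(80)], k=5)
```

Grouping folds by subject, rather than by observation, keeps each subject's correlated repeated measures together and gives an honest estimate of performance on unseen individuals.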
Using the selected hyperparameters, separate RNNs were trained for each of the same three main sets of predictors described for the MEMs. RNNs were trained on the full training data, and then tested on the unused testing data. Due to the stochastic nature of RNN training, networks were trained separately 20 times. Predictions for SI in the testing dataset were produced from the fitted RNNs and averaged over the 20 replications.
Prediction accuracy metrics:
Using the averaged predictions from the RNNs, as well as the MEM predictions for each observation, several measures were computed to summarize the prediction accuracy for each method. RMSE and mean absolute percent error (MAPE) are common methods of assessing prediction error, producing an estimate of the average prediction error in raw or percent terms across all observation estimates. For the MAPE calculation, percent error was calculated as (observed-predicted)/(observed+1), to account for the observed zero values. RMSEs were computed for each subject separately in order to explore the between-subject variability in prediction accuracy [Zhou et al, 2018].
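The two error metrics can be sketched as follows (illustrative Python; note the +1 in the MAPE denominator, which is the adjustment described above for observed zero values):

```python
import math

def rmse(observed, predicted):
    """Root mean squared error across observations."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def mape(observed, predicted):
    """Mean absolute percent error, with (observed + 1) in the denominator
    so that observed SI values of 0 do not cause division by zero."""
    return sum(abs(o - p) / (o + 1) for o, p in zip(observed, predicted)) / len(observed)

# Toy example: two observations, one of them zero.
obs, pred = [0, 10], [1, 8]
```

RMSE is reported in raw SI points, while MAPE expresses error relative to the (shifted) observed value, which is why small observed values inflate MAPE.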
A pseudo-R2 estimate was calculated to describe the proportion of within-subject SI variance explained by the model. Average semi-variance (ASV) was calculated for SI, and for the MEM and RNN prediction errors. Semi-variance was calculated as the average of the squared differences between each pair of observations for an individual subject [Piepho, 2019]. The semi-variances were computed and then averaged across subjects, and the ASV estimates were used to produce pseudo-R2 values, calculated as the squared covariance between the predicted and observed values divided by the product of their variances (i.e., the squared correlation).
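A minimal sketch of a squared-correlation pseudo-R2 of this form (illustrative Python using population variances; the study's semi-variance-based averaging across subjects is not reproduced here):

```python
def mean(xs):
    return sum(xs) / len(xs)

def pseudo_r2(observed, predicted):
    """Squared Pearson correlation: cov(obs, pred)^2 / (var(obs) * var(pred))."""
    mo, mp = mean(observed), mean(predicted)
    cov = mean([(o - mo) * (p - mp) for o, p in zip(observed, predicted)])
    var_o = mean([(o - mo) ** 2 for o in observed])
    var_p = mean([(p - mp) ** 2 for p in predicted])
    return cov ** 2 / (var_o * var_p)
```

A perfectly linear prediction yields a pseudo-R2 of 1 regardless of scale or offset; noisier predictions fall toward 0.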
To further assess the usefulness of the MEM and RNN predictions, we additionally dichotomized the EMA SI outcome at several cut points (≥8, ≥10, ≥15, ≥20), and computed the area under the Receiver Operating Characteristic curve (AUC) for the associations of the MEM and RNN predictions with the dichotomized outcomes.
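This AUC computation can be sketched as follows (illustrative Python; the AUC is computed here via its rank-based interpretation, and the data values are hypothetical):

```python
def auc(scores, labels):
    """Probability that a randomly chosen positive case receives a higher
    predicted score than a randomly chosen negative case (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Dichotomize observed SI at a cut point, then score the continuous predictions:
observed = [2, 6, 12, 25, 18, 4]
predicted = [3.0, 11.0, 10.2, 21.7, 14.1, 6.0]
labels = [1 if o >= 10 else 0 for o in observed]
```

An AUC of 0.5 reflects chance-level discrimination, while 1.0 reflects perfect separation of the dichotomized outcome by the continuous predictions.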
Methods for examining individual prediction differences.
Individual RMSEs were computed for each subject, for the full RNN and MEM models, and examined for distribution across subjects. The correlation between subjects’ RNN and MEM RMSEs was also computed. Individual RMSEs for the full models were modelled on baseline variables, adjusted for mean observed SI in the testing data, to determine if any subject characteristics were associated with greater error. Difference scores, computed by subtracting the RMSE for RNN predictions from the RMSE for MEM predictions, were calculated for each subject, as a measure of the degree to which the fit favored one or the other method for each individual subject. These differences were then modelled on baseline variables, to see if any baseline variables were associated with better prediction by RNN or by MEM.
The absolute values of RNN and MEM prediction errors were modelled separately using MEMs with random intercepts, with time since the start of the EMA period as the predictor, to determine if later observations were more poorly predicted than earlier ones (occurring closer to training data), and with observed SI as the predictor, to determine if higher values were less accurately predicted. Further, differences in absolute prediction error between the RNN and MEM predictions were computed for each predicted observation of the testing data, and were modelled, adjusting for SI value, on variables utilized in the prediction models and several additional baseline variables (HDRS, Barratt Impulsivity Scale, SSI, Childhood Trauma Questionnaire), to assess the circumstances where RNNs outperformed MEMs, and vice versa. Differences in error magnitude were additionally modelled on stressful events and coping strategy use, and then on interactions between prior timepoint SI and stressful events or coping strategy use.
Additional models considered.
Three approaches were taken to improve predictions in MEMs and RNNs. First, due to the sparsity of high SI scores (>20), models were re-fit using a modified SI scale where high values were censored at 20. Specifically, the form of the models remained the same, but the outcome was edited such that values >20 were fixed at 20. Second, EMA predictors for stressors, coping, and affect were aggregated over selected time windows rather than using each separate epoch. Time windows were selected as: 0–6 hrs earlier, 6–12 hrs earlier, 12–20 hrs earlier, and 20–24 hrs earlier. For each time window, variables were aggregated for any instances of each of the 9 stressors, for the average amount of use reported for each of the 7 coping strategies, and for the average rating for each of the 40 affect items. These predictors were substituted for the epoch-based EMA predictors, and the RNNs and MEMs were re-fit. The rationale for these models was to investigate whether SI was better modeled as a function of longer-term context rather than of immediate circumstances.
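The window-based aggregation can be sketched as follows (illustrative Python; a simplified single-stressor, single-coping-item version of the aggregation, with names of our own choosing):

```python
# Look-back windows in hours before the current prompt, as defined above.
WINDOWS = [(0, 6), (6, 12), (12, 20), (20, 24)]

def window_features(epochs):
    """epochs: list of (hours_before_prompt, stressor_flag, coping_rating).
    Returns, per window: any stressor occurrence, and mean coping rating."""
    feats = []
    for lo, hi in WINDOWS:
        in_win = [(s, c) for h, s, c in epochs if lo <= h < hi]
        any_stress = int(any(s for s, _ in in_win))
        mean_cope = sum(c for _, c in in_win) / len(in_win) if in_win else 0.0
        feats.append((any_stress, mean_cope))
    return feats

# Three prior epochs: 2, 4, and 8 hours before the current prompt.
feats = window_features([(2, 1, 3), (4, 0, 1), (8, 0, 2)])
```

In the full analysis this aggregation would be repeated for all 9 stressors, 7 coping strategies, and the affect items, producing one fixed-length feature vector per window.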
In the last methodological improvement approach, the EMA SI outcome was manipulated by decomposing SI into three outcome variables. One variable is the censored SI scale described above. Another quantifies the SI level above 20 (i.e. 21–36). The last is a binary indicator of whether the SI value is above 20. The three outcome variables were then modelled simultaneously in the same RNN. Predictions from the trained RNN then produced three variables for each observation, which were then recombined to produce a single set of SI predictions. The rationale for this approach was to allow the RNNs to focus more on modelling the higher values, which can be overlooked due to infrequency, while maintaining the prediction for lower SI values. No equivalent method was attempted for the MEMs.
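The decomposition and one plausible recombination rule can be sketched as follows (illustrative Python; the excess-above-20 parameterization and the 0.5 threshold gating are our assumptions for illustration, as the exact recombination used with the RNN outputs is not detailed here):

```python
def decompose(si):
    """Split an SI score (0-36) into (censored 0-20 part, excess above 20,
    binary indicator of SI > 20)."""
    return min(si, 20), max(si - 20, 0), int(si > 20)

def recombine(censored_pred, excess_pred, prob_above_20, threshold=0.5):
    # Assumed rule: add the excess component only when the binary head
    # predicts the observation is above 20.
    if prob_above_20 >= threshold:
        return censored_pred + excess_pred
    return censored_pred
```

Splitting the outcome this way lets the network devote a dedicated output to the rare high-severity range while the censored head continues to carry the bulk of the lower-range prediction.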
III. Results
The 80 subjects in the analytic sample provided an average of 28.9 responses (SD=8.6, range 10–58, total=2314) over an average of 6.1 days (SD=1.0, range 2.1–9.9), during the baseline EMA week. Average mean SI was 8.5 (SD=5.8), while average median SI and average maximum SI per subject were 7.5 (SD=5.8) and 19.7 (SD=9.8), respectively. Very little missingness (<1%) was present for any of the predictors used in analysis. Subjects reported a stressful event in 73% of the epochs, with an average of 2.1 (SD=1.10) stressful events reported per epoch.
Time-invariant baseline predictors model:
Predictions made from MEMs fit with only baseline variables had an RMSE of 5.31, indicating predicted values deviated from observed SI by around 5 points on average. MAPE for these predictions was 98%, indicating predicted values deviated from observed SI by about 98% on average, largely because errors in predicting the smaller values can produce large percent errors. Further, pseudo-R2 using average semi-variance was <1%, indicating very little of the variance in SI was explained by the predictors included in the model (Table 2). In contrast, the RNN predictions had an RMSE of 4.89, a MAPE of 64%, and a pseudo-R2 of 8%. Each of these indicators suggests better prediction fit (smaller error from the observed SI) for the RNN predictions as compared with the MEM results.
Table 2.
Full Scale SI Prediction Results
| Accuracy Metric | MEM: Baseline Predictors Only | MEM: EMA Predictors Only | MEM: Full Model | MEM: LASSO-Reduced Model | MEM: Full Model + Random Slopes | RNN: Baseline Predictors Only | RNN: EMA Predictors Only | RNN: Full Model |
|---|---|---|---|---|---|---|---|---|
| RMSE (smaller is better) | 5.31 | 4.11 | 3.84 | 3.91 | 4.20 | 4.89 | 3.73 | 3.41 |
| MAPE (smaller is better) | 98% | 58% | 56% | 63% | 70% | 64% | 57% | 42% |
| Pseudo-R2 (larger is better) | <1% | 6% | 16% | 17% | 7% | 8% | 19% | 26% |
Time-varying EMA predictors models:
For models with stressful events, coping strategy use, and prior timepoint state affect items, as well as time variables, predictions were improved for both MEMs and RNNs over the baseline-variable model. RMSE and MAPE indicated that MEM predictions were off by 4.11 points and 58% on average, respectively, and pseudo-R2 indicated that 6% of the variance in observed SI was explained; for RNNs, RMSE=3.73, MAPE=57%, and pseudo-R2=19%. While these metrics indicate a better fit than the baseline-variable models for both MEM and RNN, the RNNs again had closer predictions, except for MAPE, which was similar.
Time invariant baseline and time-varying EMA predictors models:
For models with all variables, MEM and RNN predictions were superior to the previous two models. RMSE and MAPE were 3.84 and 56%, respectively, and pseudo-R2 was 16% for MEMs, while for RNNs, RMSE=3.41, MAPE=42%, and pseudo-R2=26%. Again, the RNN predictions fit the observed data better. For the additional MEM using a LASSO to reduce the number of predictors, RMSE and MAPE were both slightly larger compared to the MEM with all predictors, though the pseudo-R2 was slightly improved (RMSE=3.91, MAPE=63%, pseudo-R2=17%). On the other hand, for the additional MEM adding random slopes for the 4 strongest predictors (Disagreement, Feeling Neglected, Painful Reminder, Focus on Positive Thoughts), all accuracy metrics were worse compared to the model with all predictors (RMSE=4.20, MAPE=70%, pseudo-R2=7%).
Contextualizing the accuracy of MEM and RNN models:
While the primary aim of this analysis was to compare the accuracies of RNN and MEM predictions, it is helpful to contextualize how well either method performed in predicting later EMA SI. Using heuristics set out by Cohen in 1988, we can interpret the full MEM pseudo-R2 of 16% as a moderately large effect size, and the full RNN pseudo-R2 of 26% as a large effect size. The RMSEs of 3.84 and 3.41 can be compared to the RMSSD for the EMA SI of 5.14, indicating that the average prediction error is substantially less than the average variation in EMA SI from one observation to the next. Finally, AUCs for the MEM predictions’ associations with dichotomized EMA SI outcomes ranged from 85% to 89%, depending on the value at which the outcome was dichotomized, indicating strong concordance, while the AUCs for the RNN predictions ranged from 92% to 95%, indicating excellent concordance.
Comparing individual level prediction errors.
RMSEs for individual subjects showed no visual evidence of distinguishable groups of subjects for either MEMs or RNNs. However, there was a wide range in the individual prediction accuracies, spanning from 0.47 to 11.33 for MEMs and 0.63 to 12.13 for RNNs, with right-skewed distributions for both. Spearman’s correlation between MEM and RNN RMSE (r=0.68) indicated that subjects who were more difficult to predict with one method were largely more difficult for the other method.
MEM prediction error was found to be positively correlated with subjects’ Beck depression (b (se)=.05 (.02), p=0.0173), with SI for more depressed subjects predicted less well by MEMs. On the other hand, higher RNN prediction error was seen for subjects with lower baseline SSI (b (se)=−.07 (.03), p=0.0163) and for those with higher baseline Barratt Impulsivity Scale scores (b (se)=−.02 (.01), p=0.0398). Further, we calculated the difference between MEM and RNN RMSE for each subject, to explore the characteristics which might make subjects more predictable by one method than the other. The difference was found to be associated with MDD diagnosis, Beck Depression, and HDRS, such that RNN prediction had greater superiority over MEM prediction for subjects with an MDD diagnosis (b(se)=0.92(0.38), p=0.0155), or with higher clinician-rated (HDRS: b(se)=0.06(0.03), p=0.0201) or self-reported depression (Beck Depression Inventory: b(se)=0.05(0.02), p=0.0035) at baseline. No significant associations were found for sex, prior suicide attempt, or Affective Lability Scale.
Modeling prediction errors revealed no significant time trend, as errors did not increase for later observations (MEM: b (se)=−0.04(0.08), p=0.6121; RNN: b(se)=−0.13(0.08), p=0.0876).
To describe which observations were better predicted using RNNs, and thus which observations drove the superior RNN prediction accuracy, differences in the absolute value of the errors for the two prediction methods for each observation were modelled, adjusted for observed SI value. Several variables were significantly associated with superior RNN prediction. Specifically, the difference between MEM and RNN prediction error magnitude was greater for observations where the subject reported a disagreement or made greater use of the “did something good for myself” or “tried to find perspective” coping strategies. Additionally, high SI values (SI>20) were particularly poorly predicted by MEMs relative to RNNs. However, visual inspection showed prediction errors were substantially worse for observed SI values >20 for both methods (Fig 2). Furthermore, since more complex development of SI was theorized to be something RNNs would be better suited to model, we explored whether differences in absolute error magnitude between MEM and RNN predictions might be associated with the interaction of stressful events or coping strategy use with prior timepoint SI. Several of these interactions were found to have significant effects on the error differences between RNN and MEM prediction, including the stressful events of rejection, disappointment, neglect, and painful reminders, and use of the socializing, positive thoughts, or calming self coping strategies. In prior analyses, each of these stressors was found to be associated with higher levels of SI, while socializing and positive thoughts (but not calming self) were found to be associated with lower levels of SI [Chaudhury et al., 2017; Stanley et al., 2021].
Figure 2.
Scatterplots of SI vs MEM and RNN predictions showing failure to predict higher SI values by MEMs, with better prediction for RNNs.
Model variants and extensions to improve prediction of suicidal ideation.
Censored SI scale models.
To address the poor prediction of larger SI values, the MEM and RNN models using the full variable sets were refit with SI censored at 20 (all observed values >20 were recoded to 20). As expected, predictions on this censored scale improved for both the MEM (RMSE=3.66, MAPE=39%, pseudo-R2=20%) and the RNN (RMSE=3.13, MAPE=34%, pseudo-R2=35%) (Table 3).
TABLE 3.
Censored Scale SI Results
| Accuracy Metric | MEM | RNN |
|---|---|---|
| RMSE | 3.66 | 3.13 |
| MAPE | 39% | 34% |
| Pseudo-R2 | 20% | 35% |
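The censoring step and the three accuracy metrics can be sketched as follows (a minimal Python illustration under stated assumptions: the eps shift in MAPE for zero SI values is our workaround, as the paper does not specify its handling of zeros, and the pseudo-R2 here is the simple 1 - SSE/SST form, not necessarily the Piepho (2019) variant cited in the study):

```python
import math

def censor(si, cap=20):
    """Recode all observed values above the cap to the cap itself."""
    return [min(x, cap) for x in si]

def rmse(observed, predicted):
    """Root mean squared prediction error."""
    n = len(observed)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(observed, predicted)) / n)

def mape(observed, predicted, eps=1.0):
    """Mean absolute percent error; eps shifts the denominator so SI=0
    observations do not divide by zero (an assumption, see lead-in)."""
    n = len(observed)
    return sum(abs(y - p) / (y + eps) for y, p in zip(observed, predicted)) / n

def pseudo_r2(observed, predicted):
    """Proportional reduction in squared error versus the observed mean."""
    mean = sum(observed) / len(observed)
    sse = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    sst = sum((y - mean) ** 2 for y in observed)
    return 1 - sse / sst
```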
Collapsing across EMA prompts.
The models fit up to this point used every available prompt as a separate piece of information. Prior work analyzing EMA data often collapses across observations. To investigate whether there is any possible advantage for this data reduction approach, perhaps due to elimination of measurement noise, the MEM and RNN models were re-fit using collapsed timespan-based EMA predictors in place of the epoch-based EMA predictors. For MEMs, the timespan for the EMA predictors which produced the most accurate results was the 6–12hr span. The MEM using this timespan produced predictions with an RMSE of 4.81. For RNNs, the timespan for the EMA predictors which produced the most accurate results was the 0–2hr span (Supplemental Table 1). The RNN using this timespan produced predictions with an RMSE of 4.72. As these prediction metrics are worse than those produced by the models using epoch-based predictors, timespan-based predictors were not pursued further.
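A collapsed, timespan-based predictor of the kind compared here can be computed by averaging the prompts falling in a window before the prediction target (a hypothetical sketch; the study’s exact windowing and alignment conventions are not specified in this section):

```python
def collapse_timespan(times_hr, values, lo, hi):
    """Average the EMA responses whose timestamps (in hours before the
    prediction target) fall in the half-open window [lo, hi).

    Returns None when the window contains no prompts, leaving the
    missing-data handling to the downstream model.
    """
    window = [v for t, v in zip(times_hr, values) if lo <= t < hi]
    return sum(window) / len(window) if window else None
```

For instance, with prompts at 1, 3, and 8 hours before the target, the 0–2hr window averages only the first prompt, while the 6–12hr window averages only the last.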
Modeling the EMA SI outcome in three parts.
As described above, SI was further re-parameterized into a three-part variable: one part was a censored scale from 0 to 20, another was a censored scale from 21 to 36, and the last was an indicator for SI being >20. RNNs were fit for this re-parameterized SI, including the epoch-based EMA variables as input features. No equivalent MEM was attempted. Predictions made using this RNN had an RMSE of 3.29, a MAPE of 34%, and a pseudo-R2 of 38%. These metrics represent modest improvements over the RNN predictions using the untransformed SI score; notably, this model’s predictions have substantially better RMSE for SI values of 20 and higher (3.93 vs 5.13).
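One plausible encoding of the three-part SI outcome, consistent with the description above (the censoring conventions for out-of-range values in each part are our assumptions):

```python
def split_si(si_values, cutoff=20):
    """Encode each SI value as three parts: a low scale censored above
    at the cutoff, a high scale censored below at cutoff+1, and an
    indicator for SI exceeding the cutoff."""
    low = [min(s, cutoff) for s in si_values]
    high = [max(s, cutoff + 1) for s in si_values]
    above = [int(s > cutoff) for s in si_values]
    return low, high, above

def combine_si(low, high, above):
    """Reassemble a single SI prediction from the three parts, using the
    indicator to choose between the low and high scales."""
    return [h if a else l for l, h, a in zip(low, high, above)]
```

Under this encoding, an observed SI of 25 becomes (20, 25, 1) and an SI of 5 becomes (5, 21, 0); the original values are recoverable exactly.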
IV. Discussion
In the present study, we found RNN predictions outperformed MEM predictions of SI using several accuracy metrics, especially when the full set of time-varying and baseline predictors was incorporated. Using RMSE to quantify prediction error, our results indicate that, on average, RNN predictions are about half a point more accurate than predictions from MEMs. While this may seem a modest difference for a 36-point scale, it can substantially affect classification errors when using ideation cutoffs. For a predicted SI score split into high vs. low values with a cut-off of 8 (the 3rd quartile of all SI values), the MEM predictions would misclassify an additional 7% of observations, compared to RNN predictions (24% vs 17% misclassifications relative to the observed values), which could substantially bias analysis [Flegal et al., 1991].
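The misclassification comparison at the cutoff of 8 can be reproduced from a set of predictions as follows (a minimal sketch; names are illustrative):

```python
def misclassification_rate(observed, predicted, cutoff=8):
    """Fraction of observations whose predicted high/low class
    (SI > cutoff) disagrees with the observed class."""
    mismatches = sum((y > cutoff) != (p > cutoff)
                     for y, p in zip(observed, predicted))
    return mismatches / len(observed)
```

Applying this to the MEM and RNN predictions separately gives the 24% vs 17% figures quoted above.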
Despite the improvements in predictive performance, RNNs have some limitations due to the relative opaqueness of the RNN modeling process. It remains challenging to understand the specific reasons why the RNN predictions were superior to MEMs. Still, some ways in which RNNs could be better equipped to model determinants of EMA SI can be proposed. Since RNNs are not constrained by distribution form (as MEMs are through assumption of normally distributed errors), it is likely that RNNs are better equipped to model non-normally distributed outcomes. Additionally, since RNNs are specifically designed for time series data, they allow for predictors to have effects from multiple prior epochs and for prior “states” to be integrated in predictions. This could allow RNNs to capture multifaceted temporal effects, which would be important as some determinants of SI have been found to have immediate or short-term effects, while others affect SI over hours or even days [Ben-Zeev et al., 2012]. This may help to explain why RNN predictions have better accuracy even when only baseline variables are used as predictors, since the MEMs produce only constant SI estimates in this case, whereas the RNN predictions will vary over time.
Another factor contributing to the better performance of the RNNs over MEMs may be that predictors have non-linear effects on SI, which would be in keeping with the nonlinear progression of SI generally [Bryan et al., 2019]. While these could be modeled using MEMs, including them without a priori knowledge of the transformations may be impractical. Similarly, any number of interaction effects between predictors could influence SI predictions. There may be some support for this supposition in our exploration of the factors associated with differences in prediction error magnitude between the two methods: even among the relatively small set of interactions examined, several were found to have substantial influence. Again, these could be incorporated in MEMs, but they would have to be prespecified, and exploring all possible interactions could be prohibitive. Interactions are incorporated implicitly (through data-driven learning) in the RNN. Similarly, many additional modifications to the MEMs could theoretically help to improve predictions, such as adding different random effects or re-parameterizing predictors, but the sheer number of possibilities, and the lack of methods to easily automate the decision making, make these modifications difficult to undertake.
Interaction effects could be an important reason why RNNs were particularly more accurate than MEMs in predicting higher observed SI values. Several theories of progression into higher SI and into suicidal actions suggest a confluence of multiple factors as the key driver of worsening conditions, which could be modeled with complex interactions between predictors [Klonsky et al., 2018]. Prior research has also found that active SI, compared to passive SI, may require more complex determinants to explain, and that subjects with higher SI variability, likely requiring more complex modeling to predict, can have distinctive neurobiological and clinical characteristics [Forkmann et al., 2018; Herzog et al., 2023].
While re-parameterizing the EMA predictors by collapsing across time did not improve RNN prediction accuracy, re-parameterizing the SI outcome did produce more accurate results. That is, rather than treating SI as a single continuous outcome, using a three-part outcome incorporating thresholds worked better. This may not be the optimal formulation for SI modeling (a comprehensive exploration of different ways to parameterize SI as an RNN output signal was not undertaken, and the use of 20 as the cutoff was somewhat arbitrary), and the predictive success of RNNs, as with any machine learning method, can depend greatly on how the outcome is formulated and modeled [Bengio et al., 2013]. However, this formulation may allow for more precise predictions for less common outcome values and may open the door to greater efficiency in RNN predictions, particularly for non-normally distributed outcomes.
One interesting avenue of investigation that warrants further work is the finding that subjects with baseline depression symptoms had SI that was better predicted by RNNs than by MEMs. This may in part be explained by RNNs being superior in predicting higher SI, which is more likely for depressed subjects [Peters et al., 2022]. However, it may also be that depressed subjects (or a subtype of depressed subject) have SI that varies in a way that MEMs have difficulty capturing. There is a clinical need for better prediction of suicidal ideation in more depressed patients, since this is a higher-risk group [Mann et al., 2021]. Similarly, we found that certain time-varying characteristics distinguished observations that were better predicted by RNNs from those that were not. While we do not comment here on the implications of the specific differences found, it is notable that differential prediction accuracy is not uniform across observations, which implies an interplay between time-varying predictors and the predictable elements of the data.
Several additional strengths to our analysis should be noted. Firstly, our EMA dataset allowed us to utilize a set of predictors which have been demonstrated as being among the most important for SI prediction [Czyz et al., 2023; Thompson et al., 2014]. It should also be noted that short-term fluctuations, like those examined here, have been identified as key for future efforts for suicide prevention [Glenn and Nock, 2014; Mann et al., 2021]. A strength of our approach is also that we used multiple metrics to measure the accuracy of predictions made from MEMs and RNNs. Though we acknowledge that this could also have resulted in inconsistent findings, the measures we did use were largely consistent with each other, and the comparison between MEMs and RNNs is not ambiguous in this case. However, partly due to the skewed distribution of the outcome, and the temporal interdependence of prediction errors within subjects, it would be difficult to settle on a comprehensive concept of “accuracy”, which naturally diminishes the clarity of inferences drawn from the results [Armstrong and Collopy, 1992; De Gooijer and Hyndman, 2006].
There are of course limitations to these findings. Firstly, the sample size used for these analyses is smaller than in many other machine learning explorations [Balki et al., 2019; Baum and Haussler, 1989], and, although other studies employing RNNs have used datasets of comparable size to ours (3,000–7,000 total observations) [Rozet et al., 2019; Peis et al., 2019], we are not aware of any established guidelines for the sample size requirements for optimal machine learning efficiency. Nonetheless, it is notable that the prediction accuracy of the RNNs was reasonable, and superior to MEM predictions, even with this limitation. However, it cannot be ignored that a larger sample (either more subjects or more observations per subject) would likely have allowed even better performance for both methods. Also, the skewed distribution of SI may have exacerbated the problem of small sample size, since our sample contains rather few instances of SI values above 20. As noted above, the reparameterization of SI, while based on prior analysis of suicidal behavior rates at higher SI, was somewhat arbitrary and does not necessarily represent the optimal formulation of the variable. It should also be noted that numerous variables, such as those describing substance use or medications, were not included in this analysis but could be highly important in understanding and predicting SI.
Additionally, this sample is restricted to a particular population of patients with BPD participating in a specific EMA study, for which interpersonal variables known to be related to BPD suicidality were collected [Brodsky et al., 2006]. Generalizability, therefore, from these results may be limited both in terms of generalizing for SI in other clinical and non-clinical populations, as well as for generalizing to other outcomes collected using EMA techniques. RNNs are also limited by the opacity of how input features influence predictions, making estimations of predictor importance and assessments of causal inference complex, as compared with MEMs. When the goal of analysis is to make inferences regarding the effects of predictors on a particular outcome (as opposed to prediction), MEMs would more readily provide such insights. Further work on this topic would be an important avenue of research to augment the value of using RNNs in EMA data analysis.
Supplementary Material
Figure 1.
Individual subject examples of MEM and RNN predictions over time, chosen by high and low RMSE.
Highlights.
Recurrent Neural Networks were found to have superior predictive ability for Ecological Momentary Assessment-measured Suicidal Ideation compared to Mixed-Effects Models
Both methods were less accurate in the higher range of Suicidal Ideation values
However, Recurrent Neural Networks showed significantly better accuracy than Mixed-Effects Models for high Suicidal Ideation values, and for subjects with baseline depression diagnoses or depression symptoms
Proposed extensions of Recurrent Neural Network modeling further improved prediction accuracy in the high ideation range
Acknowledgments:
This research was supported in part by grants from the National Institute of Mental Health: R01 MH61017 (PI: B. Stanley), R01 MH062665 (PI: B. Stanley)
Funding:
This research was supported by grants from the National Institute of Mental Health: R01 MH61017 (PI: B. Stanley), R01 MH062665 (PI: B. Stanley)
Dr. Galfalvy discloses that she and her family own stock in IBM Inc. Dr. Mann has received grant support from the National Institute of Mental Health, and royalties from the New York State Research Foundation for Mental Hygiene for commercial use of the Columbia Suicide Severity Rating Scale. Dr. Wall also reports grants from the National Institutes of Health. Dr. Stanley is being included as an author posthumously.
Footnotes
Authorship statement:
Drs. Stanley and Brodsky conceived and planned the study. Dr. Galfalvy and Mr. Choo planned and carried out the analyses; Drs. Stanley, Mann, and Galfalvy and Mr. Choo interpreted the findings. All authors, except Dr. Stanley (posthumous), contributed to the writing of the manuscript and to the presentation and understanding of the results.
Conflicts of Interest:
Dr. Herzog, Dr. Brodsky, and Mr. Choo report no conflicts of interest.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
References:
- [1].Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jozefowicz R, Jia Y, Kaiser L, Kudlur M, Levenberg J, Mané D, Schuster M, Monga R, Moore S, Murray D, Olah C, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Viégas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X. TensorFlow: Large-scale machine learning on heterogeneous systems [Software] (2015). Available from https://www.tensorflow.org/
- [2].Armstrong JS, Collopy F. Error measures for generalizing about forecasting methods: Empirical comparisons. International Journal of Forecasting. 1992;8(1):69–80.
- [3].Balki I, Amirabadi A, Levman J, Martel AL, Emersic Z, Meden B, … Tyrrell PN. Sample-size determination methodologies for machine learning in medical imaging research: a systematic review. Canadian Association of Radiologists Journal 2019, 70(4), 344–353.
- [4].Barrigón ML, Berrouiguet S, Carballo JJ, Bonal-Giménez C, Fernández-Navarro P, Pfang B, Delgado-Gómez D, Courtet P, Aroca F, Lopez-Castroman J, Artés-Rodríguez A, Baca-García E; MEmind study group. User profiles of an electronic mental health tool for ecological momentary assessment: MEmind. Int J Methods Psychiatr Res. 2017 Mar;26(1):e1554. doi: 10.1002/mpr.1554. Epub 2017 Mar 9.
- [5].Baum EB, Haussler D. What size net gives valid generalization? Neural Computation 1989, 1(1), 151–160.
- [6].Beck AT, Ward CH, Mendelson M, Mock J, Erbaugh J. An inventory for measuring depression. Archives of General Psychiatry (1961), 4(6), 561–571.
- [7].Beck AT, Kovacs M, Weissman A. Assessment of suicidal intention: The Scale for Suicide Ideation. Journal of Consulting and Clinical Psychology (1979), 47(2), 343–352.
- [8].Ben-Zeev D, Young MA, Depp CA. Real-time predictors of suicidal ideation: mobile assessment of hospitalized depressed patients. Psychiatry Res. 2012 May 15;197(1–2):55–9. doi: 10.1016/j.psychres.2011.11.025. Epub 2012 Mar 6.
- [9].Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell. 2013 Aug;35(8):1798–828. doi: 10.1109/TPAMI.2013.50.
- [10].Borrett DS, Yeap TH, Kwan HC. Neural networks and Parkinson’s disease. Can J Neurol Sci. 1993 May;20(2):107–13. doi: 10.1017/s0317167100047648.
- [11].Brodsky BS, Groves SA, Oquendo MA, Mann JJ, Stanley B. Interpersonal precipitants and suicide attempts in borderline personality disorder. Suicide Life Threat Behav. 2006 Jun;36(3):313–22. doi: 10.1521/suli.2006.36.3.313.
- [12].Bryan CJ, Rozek DC, Butner J, Rudd MD. Patterns of change in suicide ideation signal the recurrence of suicide attempts among high-risk psychiatric outpatients. Behav Res Ther. 2019 Sep;120:103392. doi: 10.1016/j.brat.2019.04.001. Epub 2019 Apr 9.
- [13].Carretero P, Campana-Montes JJ, Artes-Rodriguez A. Ecological Momentary Assessment for Monitoring Risk of Suicide Behavior. Curr Top Behav Neurosci. 2020;46:229–245. doi: 10.1007/7854_2020_170.
- [14].Chaudhury SR, Galfalvy H, Biggs E, Choo TH, Mann JJ, Stanley B. Affect in response to stressors and coping strategies: an ecological momentary assessment study of borderline personality disorder. Borderline Personal Disord Emot Dysregul. 2017 May 21;4:8. doi: 10.1186/s40479-017-0059-3.
- [15].Cohen J. Statistical Power Analysis for the Behavioral Sciences, 2nd Ed (1988). Hillsdale, NJ: Lawrence Erlbaum Associates.
- [16].Czyz EK, King CA, Al-Dajani N, Zimmermann L, Hong V, Nahum-Shani I. Ecological Momentary Assessments and Passive Sensing in the Prediction of Short-Term Suicidal Ideation in Young Adults. JAMA Netw Open. 2023 Aug 1;6(8):e2328005. doi: 10.1001/jamanetworkopen.2023.28005.
- [17].De Gooijer JG, Hyndman RJ. 25 years of time series forecasting. International Journal of Forecasting. 2006;22(3):443–473.
- [18].Flegal KM, Keyl PM, Nieto FJ. Differential misclassification arising from nondifferential errors in exposure measurement. Am J Epidemiol. 1991 Nov 15;134(10):1233–44. doi: 10.1093/oxfordjournals.aje.a116026.
- [19].Forkmann T, Spangenberg L, Rath D, Hallensleben N, Hegerl U, Kersting A, Glaesmer H. Assessing suicidality in real time: A psychometric evaluation of self-report items for the assessment of suicidal ideation and its proximal risk factors using ecological momentary assessments. J Abnorm Psychol. 2018 Nov;127(8):758–769. doi: 10.1037/abn0000381. Epub 2018 Oct 8.
- [20].Glenn CR, Nock MK. Improving the short-term prediction of suicidal behavior. Am J Prev Med. 2014 Sep;47(3 Suppl 2):S176–80. doi: 10.1016/j.amepre.2014.06.004.
- [21].Gratch I, Choo T, Galfalvy H, Keilp J, Itzhaky L, Mann JJ, Oquendo M, Stanley B. Detecting suicidal thoughts: The power of ecological momentary assessment. Depress Anxiety. 2021 Jan;38(1):8–16. doi: 10.1002/da.23043. Epub 2020 May 22.
- [22].Hamilton M. A rating scale for depression. Journal of Neurology, Neurosurgery, and Psychiatry (1960), 23(1), 56–62.
- [23].Harvey PD, Greenberg BR, Serper MR. The affective lability scales: Development, reliability, and validity. Journal of Clinical Psychology (1989), 45(5), 786–793.
- [24].Herzog S, Keilp JG, Galfalvy H, Mann JJ, Stanley BH. Attentional control deficits and suicidal ideation variability: An ecological momentary assessment study in major depression. J Affect Disord. 2023 Feb 15;323:819–825. doi: 10.1016/j.jad.2022.12.053. Epub 2022 Dec 19.
- [25].Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation (1997), 9(8), 1735–1780.
- [26].Howarth EJ, O’Connor DB, Panagioti M, Hodkinson A, Wilding S, Johnson J. Are stressful life events prospectively associated with increased suicidal ideation and behaviour? A systematic review and meta-analysis. Journal of Affective Disorders, Vol 266, 2020, p731–742. doi: 10.1016/j.jad.2020.01.171.
- [27].Klonsky ED, Saffer BY, Bryan CJ. Ideation-to-action theories of suicide: a conceptual and empirical update. Curr Opin Psychol. 2018 Aug;22:38–43. doi: 10.1016/j.copsyc.2017.07.020. Epub 2017 Jul 24.
- [28].Koppe G, Guloksuz S, Reininghaus U, Durstewitz D. Recurrent Neural Networks in Mobile Sampling and Intervention. Schizophr Bull. 2019 Mar 7;45(2):272–276. doi: 10.1093/schbul/sby171.
- [29].Kriegeskorte N, Golan T. Neural network models and deep learning. Current Biology. Volume 29, Issue 7, 1 April 2019, Pages R231–R236.
- [30].Mann JJ, Michel CA, Auerbach RP. Improving Suicide Prevention Through Evidence-Based Strategies: A Systematic Review. Am J Psychiatry. 2021 Jul;178(7):611–624. doi: 10.1176/appi.ajp.2020.20060864. Epub 2021 Feb 18.
- [31].Mikus A, Hoogendoorn M, Rocha A, Gama J, Ruwaard J, Riper H. Predicting short term mood developments among depressed patients using adherence and ecological momentary assessment data. Internet Interv. 2017 Oct 7;12:105–110. doi: 10.1016/j.invent.2017.10.001.
- [32].Oquendo M, Galfalvy H, Choo T, Kandlur R, Burke A, Sublette ME, Miller J, Mann JJ, Stanley B. Highly variable suicidal ideation: a phenotypic marker for stress induced suicide risk. Mol Psychiatry. 2021 Sep;26(9):5079–5086. doi: 10.1038/s41380-020-0819-0. Epub 2020 Jun 23.
- [33].Parker M, LeMay-Russell S, Schvey N, Crosby R, Ramirez E, Kelly N, Shank L, Byrne M, Engel S, Swanson T, Djan K, Kwarteng E, Faulkner L, Zenno A, Brady S, Yanovski S, Tanofsky-Kraff M, Yanovski J. Associations of sleep with food cravings and loss-of-control eating in youth: An ecological momentary assessment study. Pediatric Obesity, 08 September 2021. doi: 10.1111/ijpo.12851.
- [34].Patton JH, Stanford MS, Barratt ES. Factor structure of the Barratt impulsiveness scale. Journal of Clinical Psychology (1995), 51(6), 768–774.
- [35].Peis I, Olmos P, Vera-Varela C, Barrigon ML, Courtet P, Baca-Garcia E, Artes-Rodriguez A. Deep Sequential Models for Suicidal Ideation From Multiple Source Data. IEEE J Biomed Health Inform. 2019 Nov;23(6):2286–2293. doi: 10.1109/JBHI.2019.2919270. Epub 2019 May 27.
- [36].Peters EM, Dong LY, Thomas T, Khalaj S, Balbuena L, Baetz M, Osgood N, Bowen R. Instability of Suicidal Ideation in Patients Hospitalized for Depression: An Exploratory Study Using Smartphone Ecological Momentary Assessment. Arch Suicide Res. 2022 Jan-Mar;26(1):56–69. doi: 10.1080/13811118.2020.1783410. Epub 2020 Jul 11.
- [37].Piepho HP. A coefficient of determination (R2) for generalized linear mixed models. Biometrical Journal. 08 April 2019. doi: 10.1002/bimj.201800270.
- [38].Rath D, de Beurs D, Hallensleben N, Spangenberg L, Glaesmer H, Forkmann T. Modelling suicide ideation from beep to beep: Application of network analysis to ecological momentary assessment data. Internet Interv. 2019 Nov 20;18:100292. doi: 10.1016/j.invent.2019.100292.
- [39].Rizk MM, Choo TH, Galfalvy H, Biggs E, Brodsky BS, Oquendo MA, Mann JJ, Stanley B. Variability in Suicidal Ideation is Associated with Affective Instability in Suicide Attempters with Borderline Personality Disorder. Psychiatry. 2019 Summer;82(2):173–178. doi: 10.1080/00332747.2019.1600219. Epub 2019 Apr 23.
- [40].Rozet A, Kronish I, Schwartz J, Davidson K. Using Machine Learning to Derive Just-In-Time and Personalized Predictors of Stress: Observational Study Bridging the Gap Between Nomothetic and Ideographic Approaches. J Med Internet Res. 2019 Apr 26;21(4):e12910. doi: 10.2196/12910.
- [41].Schwarz JE, Stone AA. Strategies for analyzing ecological momentary assessment data. Health Psychol. 1998 Jan;17(1):6–16. doi: 10.1037//0278-6133.17.1.6.
- [42].Shiffman S. Conceptualizing Analyses of Ecological Momentary Assessment Data. Nicotine Tob Res. 2014 May;16(Suppl 2):S76–S87.
- [43].Stanley B, Martínez-Alés G, Gratch I, Rizk M, Galfalvy H, Choo T, Mann JJ. Coping strategies that reduce suicidal ideation: An ecological momentary assessment study. Journal of Psychiatric Research, Volume 133, January 2021, Pages 32–37.
- [44].Stewart JG, Shields GS, Esposito EC, Cosby EA, Allen NB, Slavich GM, Auerbach RP. Life Stress and Suicide in Adolescents. J Abnorm Child Psychol. 2019 Oct;47(10):1707–1722. doi: 10.1007/s10802-019-00534-5.
- [45].Thompson WK, Gershon A, O’Hara R, Bernert RA, Depp CA. The prediction of study-emergent suicidal ideation in bipolar disorder: a pilot study using ecological momentary assessment data. Bipolar Disord. 2014 Nov;16(7):669–77. doi: 10.1111/bdi.12218. Epub 2014 Jun 5.
- [46].Tibshirani R. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological) 1996, 58(1), 267–288.
- [47].Zhou L, Zhao P, Wu D, Cheng C, Huang H. Time series model for forecasting the number of new admission inpatients. BMC Med Inform Decis Mak. 2018 Jun 15;18(1):39. doi: 10.1186/s12911-018-0616-8.