Author manuscript; available in PMC: 2022 Feb 1.
Published in final edited form as: Suicide Life Threat Behav. 2021 Feb;51(1):76–87. doi: 10.1111/sltb.12670

Using categorical data analyses in suicide research: Considering clinical utility and practicality

Sean M Mitchell 1,2, Ian Cero 2, Andrew K Littlefield 1, Sarah L Brown 1,3
PMCID: PMC7995491  NIHMSID: NIHMS1677209  PMID: 33624878

Abstract

Objective:

Categorical data analysis is relevant to suicide risk and prevention research that focuses on discrete outcomes (e.g., suicide attempt status). Unfortunately, results from these analyses are often misinterpreted and not presented in a clinically tangible manner. We aimed to address these issues and highlight the relevance and utility of categorical methods in suicide research and clinical assessment. Additionally, we introduce relevant basic machine learning concepts and address the distinct utility of the methods discussed here.

Method:

We review relevant background concepts and pertinent issues with references to helpful resources. We also provide non-technical descriptions and tutorials of how to convey categorical statistical results (logistic regression, receiver operating characteristic [ROC] curves, area under the curve [AUC] statistics, clinical cutoff scores) for clinical context and more intuitive use.

Results:

We provide comprehensive examples, using simulated data, and interpret results. We also note important considerations for conducting and interpreting these analyses. We provide a walk-through demonstrating how to convert logistic regression estimates into predicted probability values, which is accompanied by Appendices demonstrating how to produce publication-ready figures in R and Microsoft Excel.

Conclusion:

Improving the translation of statistical estimates to practical, clinically tangible information may narrow the divide between research and clinical practice.

Keywords: Categorical methods, clinical utility, logistic regression, probability, tutorial

1 ∣. INTRODUCTION

Many outcomes of interest in suicidology can be thought of as discrete categories rather than continuous dimensions. For example, binary suicide outcomes, such as suicide attempts and suicide deaths, have recently been examined in studies using a range of modern analytic tools designed for the classification of discrete outcomes (see Burke, Ammerman, & Jacobucci, 2019). What is often missing in the literature, however, is a precise and clinically relevant translation of statistical coefficients from categorical data analyses into information that can be more easily understood and applied in clinical settings. The current article aims to address this problem by presenting the information and tools necessary to clarify the meaning of estimates obtained from various classification models and reviewing their potential relevance to clinical applications.

In this article, we provide an overview of foundational categorical data analysis methods, how to correctly interpret estimates from these models, and, most importantly, how to give the most clinically useful information possible from model-derived estimates. This overview and tutorial paper is intended for individuals with an understanding of regression and basic knowledge of logistic regression assumptions and implementation. We aim to provide a non-technical (equation-light) description and tutorial of these methods for interpretation and translation into clinical context. If readers are interested in greater detail regarding the underlying assumptions and mathematics of these types of models, Agresti (2013) is an excellent resource.

1.1 ∣. A note on machine learning (ML)

The recent success of machine learning (ML) models in enhancing classification accuracy for the prediction of discrete suicide-related outcomes (see Burke et al., 2019) has made these approaches popular in suicide research. Although we do not provide a comprehensive overview of these various approaches (see Hastie, Tibshirani, & Friedman, 2009; Kuhn & Johnson, 2013), we emphasize to readers interested in ML techniques that the methods (e.g., logistic regression, Receiver Operating Characteristic [ROC] curve analyses) and concepts (e.g., sensitivity, specificity, positive predictive values, negative predictive values) described in this paper are foundational for models involving the classification of categorical outcomes. Indeed, texts about ML, such as Hastie et al. (2009) and Kuhn and Johnson (2013), provide dedicated coverage of some of the models (e.g., logistic regression) and terminology (e.g., ROC Curves) included in this manuscript. It is our hope that the current manuscript can be used as a helpful tool for readers who are interested in better digesting current work that utilizes ML within the field of suicidology or those who are seeking to implement these approaches in their own work.

2 ∣. REVIEW OF LOGISTIC REGRESSION AND ITS CLINICAL UTILITY

In this section, we provide an overview of binary logistic regression analyses with a focus on how to interpret findings from logistic regression models and translate findings from logistic regression models into clinically tangible information. We also identify common coefficient interpretation errors and provide correct interpretations. Lastly, we provide examples of how to convert logistic regression estimates into probabilities and review examples that are particularly relevant to suicide research. Other resources are available for an in-depth dive into the underlying assumptions and mathematics of these analyses (see Agresti, 2013, pp. 163–251).

2.1 ∣. Binary logistic regression

Binary logistic regression is used when a researcher hypothesizes that a variable or set of variables will predict a dichotomous or binary outcome variable. Here, x signifies a predictor variable (in most of our examples, we use a continuous depression score as our x), and y signifies an outcome variable (we use suicide attempt status as our y variable example). Throughout this paper, assume that suicide attempt status in our examples and descriptions was coded as no suicide attempt = 0 and suicide attempt = 1. Most commonly, researchers report (at times incorrectly) odds ratio (OR) values from these models; however, as described below, it is difficult to interpret an OR clinically. Therefore, we are proponents of converting ORs into predicted probability values to enhance clinical utility, and we provide a review and demonstration of this conversion.

2.1.1 ∣. Link functions, odds, and log-odds

As we begin, it is essential to remember that logistic regression is still a regression (i.e., a generalized linear model). Similar to linear regression models with continuous y variables, logistic regression uses the same linear equation (ŷ = b0 + b1[x]; or the predicted suicide attempt status = intercept + slope of depression[depression score]), but y first requires a transformation. When our y is binary, we know the probability of being in each category (e.g., the probability of suicide attempt status). However, here we encounter a problem with the use of a binary y in a linear model: a probability scale is bounded between zero and one, which prevents the use of a linear model. Therefore, we must apply a transformation to y so that it is unbounded and can be analyzed as a continuous variable in a generalized linear model. These transformations are termed "link functions," and logistic regression uses the logit (log-odds) link function. This link function allows a linear relationship to be estimated between x (depression) and a binary y by modeling the log-odds of y (suicide attempt vs. no suicide attempt) across levels of x (depression). For this paper, we focus solely on how this link function impacts the interpretation of estimates from statistical program output for logistic regression models.

2.1.2 ∣. Odds and log-odds

What does a "log-odds" mean clinically, one might ask? We anticipate this is a common question that may contribute to inaccurate interpretation of results and difficulties translating statistical findings to clinical application. To understand log-odds, one must be familiar with both odds and logs (in logistic regression, the natural log). Odds are the ratio of experiencing the event (e.g., a suicide attempt) to not experiencing the event (e.g., no suicide attempt). For example, if a sample of 50 participants includes 10 participants who made a suicide attempt over a given timeframe (e.g., over their lifetime) and 40 participants who have not made a suicide attempt, the odds of having a suicide attempt are 10/40, which reduces to ¼, or 0.25. Log-odds are the natural log (commonly abbreviated as ln) of the odds. For example, the natural log of 0.25 = −1.386. That is, the natural number (Euler's number; roughly 2.71828…), e, would need to be raised to the −1.386 power to obtain 0.25. This conversion allows for a generalized linear model when y is binary.
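As a minimal illustration (ours, not part of the article's appendices), this arithmetic can be verified in a few lines of R:

```r
# Odds and log-odds for the hypothetical 50-person sample described above:
# 10 participants with a suicide attempt, 40 without.
attempts     <- 10
non_attempts <- 40

odds <- attempts / non_attempts   # 10/40 = 0.25
log_odds <- log(odds)             # R's log() is the natural log: -1.386

exp(log_odds)                     # exponentiating recovers the odds: 0.25
```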

2.1.3 ∣. Logits, odds ratios, and their interpretation

See Figure 1 for example SPSS output from a logistic regression analysis using simulated data (see Appendix S1 for details and instructions for how to simulate the data we used), where we labeled x as depression (a continuous score) and y as suicide attempt status (0 = no suicide attempt; 1 = suicide attempt). In these simulated data, depression was set to be normally distributed with M = 0 and SD = 1; therefore, this variable is z-score transformed and can be thought of as standardized (i.e., a one-point increase in depression reflects a standard deviation increase). Logistic regression analyses produce a positive or negative coefficient estimate in log-odds for the intercept and slope (often referred to as a logit) of the model. A positive logit indicates that an increase in depression (x) is associated with greater log-odds of being in the suicide attempt category than the no suicide attempt category (the y categories), whereas a negative logit indicates that an increase in depression is associated with lower log-odds of being in the suicide attempt category. Therefore, the interpretation of the logit estimate ("B") in Figure 1 is: for every one-point (1 SD) increase in depression, there is a 0.62 increase in the log-odds of being in the suicide attempt category. Despite the mathematical utility of this link function, we believe a log-odds scale and interpreting logits are not clinically useful or intuitive. Thus, researchers can convert the logit to an OR.

FIGURE 1. SPSS logistic regression output from simulated data where the continuous predictor variable (x) is depression and the binary outcome variable (y) is suicide attempt status (1 = suicide attempt; 0 = no suicide attempt).

ORs are commonly presented in logistic regression analyses in suicide research and are evaluated as an effect size of the relation between the x variable(s) and the binary y. If p is the probability of y (e.g., 10 out of 50 people have a suicide attempt, p = 0.20 or 20%), the odds of y are p/(1 − p) (e.g., 0.20/(1 − 0.20) = 0.20/0.80 = 0.25). An OR is, therefore, the ratio of the odds of being in a given suicide attempt category (y) across levels of a given covariate, x. This is perhaps best demonstrated assuming a binary x variable (e.g., sex). The numbers we use in our example are intended to make the following calculations easier; this example is not directly related to the example above despite similar numbers. Imagine, for example, that x (sex) = 0 for women and x = 1 for men, and a given sample includes 50 women and 50 men. Of the 50 women, 10 (20%) have a suicide attempt (and thus odds = 0.25), whereas 25 (50%) of the men have a suicide attempt (and thus odds = 1). Hence, the OR comparing men to women would be 1/0.25, or 4. That is, men's odds of having a suicide attempt (odds = 1) are four times the odds for women (odds = 0.25). Based on probabilities, notice that men, compared to women, are only 50%/20% = 2.50 times as likely to have a positive attempt history. However, in terms of odds, men have four times higher odds of a suicide attempt than women. Interpreting the OR as a ratio of probabilities, rather than a ratio of odds, is a common mistake in the literature (see more details below). To understand the relation between this OR and the corresponding logit when interpreting results, taking the natural log of OR = 4, ln(4), provides the logit of 1.386. That is, raising e to the 1.386 power equals (with rounding error) 4. To convert the logit back to an OR, the logit can be exponentiated: exp(1.386) = (with rounding error) 4. Exponentiating a number means taking e to a given power (e.g., ~2.71828 to the 1.386 power = 4 with rounding error). As seen in the Figure 1 SPSS output, the OR is labeled "EXP(B)." This is because, to calculate an OR, one must exponentiate ("EXP") the value of the logit ("B"; see the description above and Agresti, 2013, pp. 164–165, for more information).
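For readers who want to see these quantities emerge from software, the following R sketch (ours; the variable names are illustrative) reproduces the sex example above and recovers the logit and OR from a fitted model:

```r
# 50 women (10 attempts) and 50 men (25 attempts), as in the example above.
sex     <- rep(c(0, 1), each = 50)               # 0 = women, 1 = men
attempt <- c(rep(c(1, 0), times = c(10, 40)),    # women: 10 attempts, 40 none
             rep(c(1, 0), times = c(25, 25)))    # men: 25 attempts, 25 none

fit <- glm(attempt ~ sex, family = binomial)
coef(fit)["sex"]        # logit for sex: ln(4) = 1.386
exp(coef(fit)["sex"])   # odds ratio: 4
```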

2.2 ∣. Avoiding errors when interpreting odds ratios

Regarding the interpretation of an OR, an OR greater than one indicates a positive association and corresponds with a positive logit estimate, whereas an OR less than one corresponds with a negative logit estimate. An OR representing a negative association between an x variable (e.g., positive affect) and suicide attempt status (y) is restricted to the range between 0 and 1, whereas an OR representing a positive association between an x variable (e.g., depression) and suicide attempt status (y) can take any value from 1 to infinity; therefore, as with any ratio, ORs for positive effects can appear much more substantial than ORs for negative effects. For example, our simulated depression scores produce a positive relation, OR = 1.86 (0.86 above 1), when predicting 1 = suicide attempt and 0 = no suicide attempt; however, the same simulated depression scores produce a negative relation, OR = 0.54 (0.46 below 1), when 1 = no suicide attempt and 0 = suicide attempt. Therefore, when depression has a positive association with suicide attempt status, the OR appears farther from a null relation (OR = 1.0 indicates a null relation) than when depression has a negative relation with suicide attempt status. One may consider recoding y so that the association is positive for ease of interpretation. This is also an important consideration for clinicians when they are evaluating effect sizes provided in the literature.

Certain errors in the interpretation of ORs occur relatively often in the literature, so we provide phrasing for the correct interpretation of ORs. As previously mentioned, one of the most frequent errors we have seen is interpreting the OR in terms of probability rather than odds. While the logistic model allows the log-odds of a suicide attempt (y) to increase linearly with increasing values of depression (x), the relations between log-odds, ORs, and probability values are all monotonic (increasing or decreasing in one direction) but non-linear. Thus, except under some unusual and very specific cases, claiming that an OR of 1.86 (the Figure 1 example) implies the risk of a suicide attempt (y) is 1.86 times higher, or 1.86 times more likely, for each 1-point increase in depression (x) would be incorrect. Because the OR indicates the ratio of the odds of reporting a suicide attempt or not, the correct phrasing would be: for every one-unit increase in depression, there are 1.86 times the odds of having a suicide attempt. Although this wording difference might seem minor, the former implies different underlying mathematics (i.e., measuring the outcome in the metric of probability rather than odds) that are inconsistent with an OR.

One should also note that, in these descriptions, we write "for every one-unit change in depression"; this interpretation applies to linear and generalized linear models alike. Thus, the units, metric, or scaling of a given x can impact the magnitude and interpretation of ORs. For example, even if we assume that depression (x1) and hopelessness (x2) are equally good predictors of suicide attempt status (y), hopelessness will produce a smaller OR than depression if hopelessness is continuous with a wide range (e.g., 0–100) and depression is categorical (e.g., depression diagnosis status) or has a narrower range (e.g., 0–10). To demonstrate this with our simulated data, the depression scores (M = 0, SD = 1) predict suicide attempt status with OR = 1.86 (see Figure 1). When we transformed the depression scores to have M = 50 and SD = 10, OR = 1.06. The transformed depression scores produce a smaller OR than the standardized depression scores because a 1-point increase in the transformed scores corresponds to only a 0.10 SD increase in x, rather than a full 1 SD increase. One should see that a mere transformation changing the scaling of x can impact the magnitude of the OR. Therefore, it is challenging to compare effect sizes across variables with different scaling using ORs, because the scaling of x impacts the size of the OR.

One solution to this problem is to standardize (i.e., z-score transform) continuous x variables, such that every one-unit change in x is one standard deviation. The resulting standardized x variables are in the same metric, and the ORs would be standardized on the x variables. However, reporting unstandardized scores may translate better to clinical settings, where unstandardized assessment scores are often used. Therefore, we suggest reporting both unstandardized and standardized OR values and/or reporting an additional effect size measure, such as the area under the curve (AUC) statistic (described below), which does not change based on the scaling of the x variable(s).
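The following R sketch (our own simulation, loosely in the spirit of the article's Appendix S1, with coefficient values taken from the Figure 1 example) demonstrates how rescaling x changes the OR without changing the underlying relation:

```r
set.seed(1)
n <- 1000
dep_z <- rnorm(n)                                     # depression, M = 0, SD = 1
attempt <- rbinom(n, 1, plogis(0.95 + 0.62 * dep_z))  # true model from Figure 1

dep_t <- 50 + 10 * dep_z                              # same scores, M = 50, SD = 10

exp(coef(glm(attempt ~ dep_z, family = binomial))["dep_z"])  # OR ~ 1.86
exp(coef(glm(attempt ~ dep_t, family = binomial))["dep_t"])  # OR ~ 1.06
```

Both models fit the data identically; only the meaning of a "one-unit increase" differs.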

2.3 ∣. Converting odds ratios to probability to improve clinical interpretation

Now, imagine that a clinician has an OR value from an empirical publication in front of them; how are they going to use this OR to inform their practice or use of an assessment? Likely, they will not. This is because ORs are not intuitive to most people and are often misunderstood as estimates of “X times more likely.” However, estimates from logistic models can be used to calculate predicted probabilities of the category membership of the y (suicide attempt status) as a function of x (depression). For example, these conversions can address questions such as, “What is the predicted probability of reporting a suicide attempt (y) for a given score on depression (x)?” Importantly, after converting estimates from a logistic regression (both the intercept [or “constant” in Figure 1] and logit [“B” in Figure 1] are required) from predictions based in log-odds to predictions based in probability, the relation between depression and suicide attempt status is no longer linear (probabilistic associations are curved and s-shaped; see Figure 2). Thus, the interpretation is no longer as straightforward as "for every one-unit change in depression, there is a (number) change in the probability of a suicide attempt." After converting to a probability scale, the change in the probability of a suicide attempt does not occur at the same rate across all values of depression. Therefore, we recommend plotting predicted probabilities of y category membership (suicide attempt status) across the possible values of x (depression) to allow for a depiction of the non-linear relation between x and y. Appendix S1, Appendix S2, and Appendix S3 demonstrate how to convert intercepts and logits from logistic regression models to probabilities, and how to plot the predicted probabilities.

FIGURE 2. The predicted probability of suicide attempt status (y; 1 = suicide attempt; 0 = no suicide attempt) given a depression (x) score from simulated data. See Appendix S2 for equations and instructions for developing this figure using Microsoft Excel. A similar graph is produced using the R code in Appendix S1.

2.4 ∣. Demonstrations and examples

To convert logistic regression estimates into predicted probabilities, all one needs are fundamental regression skills and one other simple formula. For example, let us calculate the predicted probability of a suicide attempt given a depression score of 1 (which is 1 SD above the mean in our simulated dataset). Using the linear regression formula, we can estimate the predicted log-odds of a suicide attempt (ŷ_log-odds) from the output logit values ("B") in Figure 1. That is, ŷ_log-odds = b0_log-odds + b1_log-odds(x), where b0_log-odds is the intercept of the model (0.95; i.e., the log-odds of the outcome when depression = 0; when x is standardized as a z-score, this is the log-odds of a suicide attempt for those with mean levels of depression), b1_log-odds is the slope of depression in log-odds (0.62), and x is any value in the range of depression scores (here, 1). Therefore, with a depression score of 1, this equation would be 1.57 = 0.95 + 0.62(1). Subsequently, the predicted log-odds of a suicide attempt (1.57) is exponentiated to produce the predicted odds of a suicide attempt (ŷ_odds = exp[1.57] = 4.79). Next, the predicted odds of a suicide attempt when the depression score is 1 (4.79) are converted to a probability using the simple formula p = odds/(1 + odds), or ŷ_probability = ŷ_odds/(1 + ŷ_odds), or 0.83 = 4.79/(1 + 4.79). This calculation produces the predicted probability of a suicide attempt when the depression score is 1. To plot the predicted probabilities, this equation and these conversions would need to be completed for each possible value of depression, which can be done easily (using Microsoft Excel or similar programs; see Appendix S1, Appendix S2, and Appendix S3). To find the probability of not having a suicide attempt when the depression score is 1, one would calculate 1 − ŷ_probability, or 1 − 0.83 = 0.17. Again, this would be calculated for each possible value of depression.
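In R, the same conversion can be done with the built-in inverse-logit function, plogis(); the following sketch (ours) mirrors the worked example and draws a curve like Figure 2:

```r
b0 <- 0.95   # intercept ("Constant" in Figure 1), in log-odds
b1 <- 0.62   # depression slope ("B" in Figure 1), in log-odds
x  <- 1      # depression score 1 SD above the mean

log_odds <- b0 + b1 * x        # 1.57
odds     <- exp(log_odds)      # ~4.79
odds / (1 + odds)              # ~0.83
plogis(log_odds)               # same answer in one step (inverse logit)

# Predicted probabilities across the range of depression, as in Figure 2:
dep <- seq(-3, 3, by = 0.1)
plot(dep, plogis(b0 + b1 * dep), type = "l", ylim = c(0, 1),
     xlab = "Depression (z-score)",
     ylab = "Predicted probability of suicide attempt")
```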

It is important to remember that if one changes the scale of x using a transformation (e.g., z-score standardization or mean-centering) and the analysis estimates (i.e., intercept, logit) are based on the transformed x scores, one should use the transformed x scores in the logistic regression equation when converting to probabilities. Otherwise, the ŷ_log-odds values will be incorrect, which will, in turn, produce incorrect ŷ_odds and ŷ_probability values. After one calculates ŷ_probability for each value of x, one can plot ŷ_probability across the values of x, where the y-axis of the figure ranges from 0 to 1 (the possible range of probability values) and the x-axis ranges across the possible values of x (or transformed values of x). Developing a predicted probability plot in this way can provide convenient and easy-to-interpret results for clinicians. Clinicians can use these plots to see the expected probability of a patient's suicide attempt status (y) based on that patient's depression assessment score (x), for example.

Using the example output in Figure 1, Appendix S1 and Appendix S2 demonstrate these conversions for all possible depression scores, and Figure 2 depicts the predicted probability figure developed from the results from the simulated dataset. This procedure can be modified for logistic regression models with multiple x variables (e.g., depression and hopelessness predicting suicide attempt status; see Appendix S3). For ease of manipulating the logistic regression equations and plotting the predicted probabilities with multiple x variables, standardizing or mean-centering the continuous x variables can be helpful, as we did in Appendix S3 with z-score transformed hopelessness and depression scores. This transforms the x variables to have a mean of zero; therefore, when using the regression equations to calculate ŷ_log-odds for the x of interest (e.g., depression), the other covariate(s) (e.g., hopelessness) drop out of the equation because they are held at their mean (which is zero). The predicted ŷ_log-odds would still indicate the prediction for the plotted x of interest (depression) when the other x variable(s) (hopelessness) are held constant at their mean.
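As a small sketch of the multi-predictor case (the coefficients here are hypothetical placeholders, not the Appendix S3 estimates), holding a z-scored covariate at its mean of zero simply drops its term:

```r
b0     <- -1.20   # intercept (hypothetical)
b_dep  <-  0.60   # slope for z-scored depression (hypothetical)
b_hope <-  0.40   # slope for z-scored hopelessness (hypothetical)

dep <- seq(-3, 3, by = 0.1)
# Hopelessness held at its mean (0), so its term contributes nothing:
p_hat <- plogis(b0 + b_dep * dep + b_hope * 0)
plot(dep, p_hat, type = "l", ylim = c(0, 1),
     xlab = "Depression (z-score)",
     ylab = "Predicted probability of suicide attempt")
```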

2.5 ∣. An example from the literature

Next, we provide an example of these categorical methods from the suicide literature. Mitchell et al. (2020) tested thwarted belonging and perceived burden (continuous x variables) as predictors of desire for death and desire for suicide (y; coded 0 = no desire, 1 = some level of desire) among 318 psychiatric outpatients. To partially describe their results as they pertain to this paper (see Mitchell et al., 2020, for a complete description), they found a positive relation between thwarted belonging and desire for suicide (logit = 0.08, OR = 1.08, p < 0.001); however, to allow a more direct comparison of the ORs for thwarted belonging and perceived burden, they also conducted their analyses with z-score transformed thwarted belonging and perceived burden scores. Thwarted belonging had a standardized OR of 2.76. Note that this standardized OR (x in SD units) is larger than the unstandardized OR (x in the scale of the measure) because of the change in the meaning of a one-unit increase when standardizing x scores. To provide practical results that would allow clinicians to calculate and interpret the predicted probability of some level of suicidal desire from patients' thwarted belonging scores, Mitchell et al. (2020) provided predicted probability figures and noted that these figures were derived from the unstandardized logistic regression estimates because the unstandardized assessment scores would likely be used in clinical settings. They also offered sample interpretations of the predicted probability figures. For example, a thwarted belonging score of 63 (the highest score) corresponded with a 66% chance of desire for suicide. A clinician could use the figures and this information to inform their clinical decisions, which may not have been possible if only logits or ORs were presented (see Mitchell et al., 2020, for further discussion).

3 ∣. A NOTE ON MULTI-CATEGORICAL AND COUNT MODELS

Much of what has been reviewed so far can also be applied to models with a multi-categorical y (i.e., multinomial logistic regression and ordinal logistic regression). That is, one can use standardized x variables for standardized estimates, convert logits to ORs, and convert ORs to predicted probabilities. Multi-categorical models do become more complex due to multiple intercepts (the number of intercepts is the number of categories in y minus one [k − 1 intercepts]) and, in multinomial logistic regression only, multiple slope coefficients for each x variable (again, k − 1 slopes).

Ordinal logistic regression is used when a y variable is ordinal with more than two categories, such as Likert-type items (e.g., frequency of suicide ideation where 0 = never, 1 = less than once per week, 2 = a few times per week, 3 = more days than not). This model produces k − 1 intercepts (3 intercepts in this example) and one logit slope coefficient. A single slope coefficient is possible when the proportional odds assumption is satisfied; this assumption states that the same slope describes the distance between the lowest category and all higher categories, the distance between the next lowest category and all higher categories, and so on.

If the y variable is nominal, or the proportional odds assumption of ordinal logistic regression is not satisfied, then multinomial logistic regression is the appropriate analysis. Using the same Likert-type suicide ideation item as the y variable in a multinomial logistic regression, the model would still produce k − 1 intercepts (3 in this example); however, each group is now compared to a meaningful reference group (chosen by the researcher). For this example, one might set the reference group for frequency of suicide ideation to 0 = never; the "never" group would then be compared to each of the other three frequency-of-ideation groups (i.e., 1 = less than once per week, 2 = a few times per week, 3 = more days than not). Because of these reference-group comparisons, multinomial logistic models produce multiple logit slope coefficients (k − 1 per x variable); in our current example, the model would produce 3 intercepts and 3 logit slope coefficients. As with logistic regression, we can calculate and graph predicted probabilities; given the multiple intercepts and slope coefficients, however, this process is somewhat more complex. We cannot go into further detail about these models in this paper, but Agresti (2013, pp. 293–338) is a wonderful reference for more information.
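For readers who want to try these models, standard R tools cover both cases; the following is a hedged sketch with simulated, illustrative data (MASS::polr for the proportional-odds model and nnet::multinom for the multinomial model):

```r
library(MASS)   # polr: proportional-odds (ordinal) logistic regression
library(nnet)   # multinom: multinomial logistic regression

set.seed(2)
n <- 500
depression <- rnorm(n)
# A 4-category ideation-frequency outcome built from a latent logistic score
latent <- 0.8 * depression + rlogis(n)
ideation <- cut(latent, breaks = c(-Inf, 0, 1, 2, Inf),
                labels = c("never", "lt_weekly", "few_weekly", "most_days"),
                ordered_result = TRUE)

ord_fit <- polr(ideation ~ depression, Hess = TRUE)  # 3 intercepts, 1 slope
multi_fit <- multinom(ideation ~ depression)         # 3 intercepts, 3 slopes
summary(ord_fit)
summary(multi_fit)   # "never" is the reference category by default
```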

Count regression models are also relevant to suicide research. Count regression models include a y variable that is a rate or count variable (e.g., number of suicide attempts) or a continuous measure (e.g., scores on a self-report measure of suicide ideation) that fits a Poisson or negative binomial distribution (see Agresti, 2013, pp. 122–130, for more detail). Much like the previous models we have discussed, Poisson or negative binomial regression produces an intercept and a log-count slope estimate (conceptually similar to a logit). Therefore, if we exponentiate the log-count slope estimate (just as we would exponentiate a logit), we obtain a rate ratio, the multiplicative change in the expected count for a one-unit increase in x, which is easier to interpret. Given the low base rate of suicide ideation and attempts, zero-inflated Poisson or negative binomial regression is especially relevant to suicide research. Zero-inflated Poisson or negative binomial regression is used when there is a large number of zeros in one's data (e.g., many people report zero suicide attempts or a score of zero on self-reported suicide ideation). Zero-inflated models have been implemented with greater frequency in suicide research, especially to identify underreporting of suicide ideation (e.g., Anestis, Mohn, Dorminey, & Green, 2019; Ansell et al., 2015; Cukrowicz, Jahn, Graham, Poindexter, & Williams, 2013). Hu, Pavlicova, and Nunes (2011) provide a superb and non-technical review of zero-inflated count regression models for those interested.
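As a hedged sketch (simulated data, illustrative names), a zero-inflated negative binomial model can be fit with the pscl package, one common choice in R:

```r
library(pscl)   # zeroinfl: zero-inflated count regression

set.seed(3)
n <- 500
depression <- rnorm(n)
# Excess zeros: a structural-zero group reports zero attempts regardless of x
structural_zero <- rbinom(n, 1, 0.6)
counts <- rnbinom(n, size = 1, mu = exp(0.2 + 0.5 * depression))
attempts <- ifelse(structural_zero == 1, 0, counts)

# Count part and zero-inflation part each get predictors: y ~ count | zero
zinb_fit <- zeroinfl(attempts ~ depression | depression, dist = "negbin")
summary(zinb_fit)
exp(coef(zinb_fit, model = "count"))   # rate ratios for the count part
```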

4 ∣. RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE ANALYSES AND CLINICAL CUTOFF SCORES

ROC curve analyses are another way to describe the classification performance of a logistic model. That is, ROC curve analyses are focused on the predictive accuracy of x variable(s), such as a depression assessment, for a y of interest (e.g., those with or without a disease, those with or without a suicide attempt). Inherently, ROC curve analyses have tremendous clinical utility. These analyses can demonstrate to a clinician the strength and accuracy of an assessment that they can use in practice. In addition, these analyses can be used to derive clinical cutoff scores (i.e., the point above which scores would indicate a clinically significant problem) for an assessment to further guide clinicians' interpretations of their patients' assessment scores and guide treatment decisions. Some statistical programs (e.g., SPSS) only allow a single x in their ROC curve models (although multiple x variables may be depicted in one figure), and other programs (e.g., SAS, R) allow an ROC curve model with a single x variable or multiple x variables.

Before moving into greater detail about ROC curve analyses and output, several relevant terms should be described:

  1. Sensitivity (sometimes called recall) is the true positive rate or the proportion of people in our target category that our test(s) (i.e., predictor variable[s]) actually labeled as being in that category. For example, given a person is positive for a suicide attempt, what is the probability our model classifies this person as positive? The sensitivity value subtracted from one yields the false-negative rate (e.g., the probability that the test[s] indicates that an individual does not have a suicide attempt when they do).

  2. Specificity is the true negative rate or the proportion of people not in our target category that our test(s) correctly labeled as outside our target category. For example, given a person is negative for a suicide attempt, what is the probability our model classifies this person as negative? The specificity value subtracted from one yields the false-positive rate (e.g., the probability the test indicates that an individual has a suicide attempt when, in fact, they do not).

  3. Positive Predictive Value (PPV), which has also been referred to as precision, is the probability that a person who was labeled as being in the target category (e.g., someone with a suicide attempt) was actually in that category (this is usually a clinician's highest priority). For example, given our model classifies a person as positive for a suicide attempt, what is the probability that a person is positive for a suicide attempt?

  4. Negative Predictive Value (NPV) is the probability that a person who was labeled outside our target category (e.g., someone without a suicide attempt) was, in fact, outside that category. For example, given our model classifies a person as negative for a suicide attempt, what is the probability that a person is negative for a suicide attempt?

4.1 ∣. Sensitivity, specificity, PPV, and NPV in suicide research and clinical decision making

It is important to note that sensitivity and specificity are not affected by the base rate (i.e., the percent of the sample that is positive for y, a suicide attempt), whereas PPV and NPV are very sensitive to the base rate. This has significant implications for suicide risk assessment, screening, and clinical decision making. For example, say we have a model where sensitivity and specificity are both 90%. If the base rate of a suicide attempt is 10% (e.g., 10 out of 100 people have a suicide attempt), then the NPV is over 98%. However, the PPV will only be 50% (i.e., there is a 50% chance that a person is positive for a suicide attempt when the model has classified them as such; see Table 1), given that PPV is dependent on whether one is positive on y (has a suicide attempt), for which our base rate is only 10%. Conversely, if the base rate of a suicide attempt is 90%, then the PPV is over 90%, but the NPV is only 50%, despite maintaining a model where sensitivity and specificity are both 90% (see Table 2). Therefore, in suicide research, where the base rate of being positive for a suicide attempt or death is very low, it is common for PPV to be low (low accuracy of our test or predictor for a positive suicide attempt status) despite high sensitivity and specificity. Furthermore, the NPV is likely to be high (high accuracy of our test or predictor for a negative suicide attempt status) in suicide research; however, this is not particularly meaningful, because a model is naturally more likely to accurately predict a negative suicide attempt status when that status is highly prevalent in the sample.

TABLE 1.

Example (N = 100) where the base rate for a positive outcome (suicide attempt) is 10%, and the test has a sensitivity and specificity of 90%

                Positive for outcome    Negative for outcome
Positive test   9                       9
Negative test   1                       81

TABLE 2.

Example (N = 100) where the base rate for a positive outcome (suicide attempt) is 90%, and the test has a sensitivity and specificity of 90%

                Positive for outcome    Negative for outcome
Positive test   81                      1
Negative test   9                       9
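The PPV and NPV values in Tables 1 and 2 follow directly from sensitivity, specificity, and the base rate; this small R sketch (ours, not from the article's appendices) verifies them:

```r
# PPV and NPV as functions of sensitivity, specificity, and base rate
ppv <- function(sens, spec, base) {
  (sens * base) / (sens * base + (1 - spec) * (1 - base))
}
npv <- function(sens, spec, base) {
  (spec * (1 - base)) / (spec * (1 - base) + (1 - sens) * base)
}

ppv(0.90, 0.90, 0.10)   # 0.50   (Table 1)
npv(0.90, 0.90, 0.10)   # ~0.988 (over 98%)
ppv(0.90, 0.90, 0.90)   # ~0.988 (Table 2; over 90%)
npv(0.90, 0.90, 0.90)   # 0.50
```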

Given this information, when developing and conducting suicide risk assessments and screening tools, it is important to recognize that one could develop or be using a tool with high sensitivity and specificity, yet, due to the low base rate of suicidal thoughts and behaviors, PPV will be low, and the tool will not accurately predict the outcome of interest (e.g., suicidal thoughts or behaviors). A low PPV and inaccurate determination of suicide risk could lead to high-stakes errors in clinical decision making. Let us consider the Columbia-Suicide Severity Rating Scale (C-SSRS), a commonly used suicide risk assessment, as an example. Some studies report sensitivity and specificity but not PPV or NPV (e.g., Lindh et al., 2018; Madan et al., 2016), which makes it difficult to fully assess the accuracy of the C-SSRS in identifying the suicide-related outcome. Others have reported sensitivity, specificity, PPV, and NPV for the C-SSRS; however, their findings follow the pattern we explained above when predicting low base rate behaviors. For example, one study demonstrated the C-SSRS has a sensitivity of 0.95, specificity of 0.95, PPV of 0.22, and NPV of 0.99 (Viguera et al., 2015), and another study indicated the C-SSRS has a sensitivity of 0.56, specificity of 0.98, PPV of 0.30, and NPV of 0.99 (Katz, Barry, Cooper, Kasprow, & Hoff, 2019). Although these statistics differ due to different suicide-related outcomes, the pattern holds that PPV is low for suicide-related outcomes despite good sensitivity and specificity.

As another example, Nock et al. (2010) reported a PPV of 0.32 for an implicit association test (IAT)-based prediction of suicide attempts over 6 months. Their sample was a clinical population of recent suicide attempters with an extremely high re-attempt rate of 15.2% over the course of the study. If we keep the sensitivity (0.50) and specificity (0.81) values from their model but apply them to a population with a much lower 6-month attempt rate of 1%, the PPV would drop to roughly 0.03. Thus, anyone using the exact same test (in this case, the IAT) in a population with a low base rate of suicide attempts would have substantially less accurate prediction than what would be expected in a population with a higher base rate. This phenomenon is why other metrics of model performance, such as the AUC statistic calculated using ROC curve analyses, are often used to evaluate the performance of a model; but, as discussed below, this does not address the importance of considering the base rate. Thus, clinicians should incorporate many sources of clinical data into suicide risk determinations and clinical decision making rather than relying on any one tool (e.g., Homaifar, Matarazzo, & Wortzel, 2013; Wortzel, Matarazzo, & Homaifar, 2013).
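Applying the ppv() helper defined above to this scenario reproduces the drop:

```r
# Nock et al. (2010) sensitivity/specificity applied at a 1% base rate
ppv(0.50, 0.81, 0.01)   # ~0.03
```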

4.2 ∣. Area under the curve (AUC) statistics

ROC curve analyses produce an ROC figure and an AUC statistic. The ROC curve plots sensitivity (the true-positive rate) against 1 − specificity (the false-positive rate) across possible cutoffs, and a larger AUC statistic indicates a higher true-positive rate at a lower false-positive rate. An AUC value of 1.0 indicates perfect sensitivity and specificity, whereas an AUC value of 0.50 indicates that the x variable(s) do not identify individuals with the outcome any better than chance. See Figure 3 for examples of AUC figures when the AUC value is 0.49 (very poor prediction), 0.81 (good prediction), and 0.95 (excellent prediction), which were produced from simulated data. Figure 4 depicts the AUC statistic (0.66) and figure using the simulated data we presented in the Figure 1 example, where depression (x) was predicting suicide attempt status (y).
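In R, one common way to produce these quantities is the pROC package; this hedged sketch uses our own simulation in the spirit of the Figure 1 example, so the AUC will only be approximately 0.66:

```r
library(pROC)

set.seed(4)
n <- 1000
depression <- rnorm(n)
attempt <- rbinom(n, 1, plogis(0.95 + 0.62 * depression))

roc_obj <- roc(response = attempt, predictor = depression)
auc(roc_obj)    # area under the ROC curve
plot(roc_obj)   # ROC curve, as in Figures 3 and 4
```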

FIGURE 3. Receiver Operating Characteristic (ROC) curve figures developed using a simulated dataset in SPSS. The dashed diagonal line indicates an Area Under the Curve (AUC; the solid black line) value of 0.50, which is a chance-level prediction (i.e., no predictive value). Larger AUCs indicate a higher true-positive rate (sensitivity; y-axis) and a lower false-positive rate (1 − specificity; x-axis). If the curve reached the top left corner of the figure (i.e., AUC = 1.0), that would indicate a 0 false-positive rate and a perfect (100%) true-positive rate. Panel A has an AUC of 0.49 (very poor prediction), Panel B has an AUC of 0.81 (good prediction), and Panel C has an AUC of 0.95 (excellent prediction) from simulated data.

FIGURE 4. The AUC statistic for this plot (i.e., the area under the curved solid black line) is 0.66 using the simulated data example of depression predicting suicide attempt status (y; 0 = no suicide attempt, 1 = suicide attempt) presented in Figure 1. The dashed line indicates an AUC of 0.50, or chance-level prediction of the y variable.

One way to understand the AUC is to note that it is mathematically equivalent to an estimate referred to as the concordance statistic or c-statistic (see Agresti, 2013, p. 224). This statistic considers all pairs of observations where one observation is positive on y (has a suicide attempt) and the other is negative on y (does not have a suicide attempt). To simplify, consider the following situation with three individuals: Person A (with a suicide attempt), Person B (with no suicide attempt), and Person C (with no suicide attempt). Thus, there would be two pairs of positive vs. negative suicide attempt status: Person A with Person B ("Pair 1") and Person A with Person C ("Pair 2"). The c-statistic is the proportion of these pairs for which the model predicts a greater probability of the outcome for the individual with the outcome than for the individual without the outcome. For example, say we have a logistic model that suggests Person A has a 30% chance of having a suicide attempt, Person B has a 20% chance of having a suicide attempt, and Person C has a 40% chance of having a suicide attempt (i.e., "Situation 1"). In this model, Pair 1 would be concordant (because Person A's 30% > Person B's 20%), whereas Pair 2 would not be concordant (because Person A's 30% < Person C's 40%). Thus, the c-statistic would be 50% (c = the number of concordant pairs / the total number of pairs; in this example, c = 1/2 = 50%). That is, the model does no better than flipping a coin for each pair to determine which observation should be classified as positive vs. negative on y (suicide attempt status). Instead, say our model suggested Person C had a 15% chance of having a suicide attempt (i.e., "Situation 2"). In this situation, Pair 2 now becomes concordant (because Person A's 30% > Person C's 15%), and the c-statistic is now 100%. An equally valid way of expressing these results is that the AUC for Situation 1 = 50% (no better than chance) and the AUC for Situation 2 = 100% (perfect classification). Thus, per Figure 4, the AUC statistic of 0.66 would indicate that there is a 66% chance that an individual with a suicide attempt would evidence a higher predicted probability of a suicide attempt (i.e., given the univariate context and the positive logit, a higher depression score) than a randomly paired individual who did not have a suicide attempt.
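The three-person example can be computed by brute force over all positive-negative pairs (a minimal sketch; ties are counted as non-concordant here):

```r
c_statistic <- function(prob, y) {
  pos <- prob[y == 1]
  neg <- prob[y == 0]
  pairs <- expand.grid(pos = pos, neg = neg)  # all positive-negative pairs
  mean(pairs$pos > pairs$neg)                 # proportion concordant
}

y    <- c(1, 0, 0)            # Person A (attempt), B and C (no attempt)
sit1 <- c(0.30, 0.20, 0.40)   # Situation 1
sit2 <- c(0.30, 0.20, 0.15)   # Situation 2

c_statistic(sit1, y)   # 0.50
c_statistic(sit2, y)   # 1.00
```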

4.2.1 ∣. An example from the literature

The following is an example from the suicide literature where ROC curve analyses and AUC statistics were examined. Mitchell et al. (2020) provided ROC curve analyses and AUC statistics for thwarted belonging and perceived burden as predictors of desire for death and desire for suicide among psychiatric outpatients, in addition to their logistic regression analyses and predicted probability figures. For example, the ROC curve analyses provided information on the sensitivity and specificity of thwarted belonging as a predictor of desire for suicide. Additionally, the AUC statistics quantified the effect sizes of thwarted belonging and perceived burden as predictors of desire for suicide. They found that thwarted belonging in relation to desire for suicide had an AUC of 0.74. Note that the AUC does not change regardless of whether the standardized or unstandardized predictor variable is used. Mitchell et al. also converted the AUC into a Cohen's d effect size (see Ruscio, 2008, for conversion equations) and compared the magnitudes of the AUC statistics of the different predictors in the model (note: some programs will not allow such a comparison; these analyses were conducted in SAS). This information is relevant when determining clinical cutoff scores.

4.3 ∣. Determining clinical cutoff scores

ROC curve analyses can be used to determine optimal cutoff scores given some criterion. For example, when sensitivity and specificity are given equal weight (i.e., identifying true positives is equally important as identifying true negatives), Youden's J index (sometimes referred to as Youden's statistic or Youden's index) is often used. This index is (sensitivity + specificity) − 1. Using this index, the optimal cutoff score for a continuous x (e.g., depression) can be determined by maximizing Youden's index for y (e.g., a suicide attempt). This is equivalent to identifying an optimal probability threshold, that is, the predicted probability above which an individual would be classified as positive (a suicide attempt) vs. negative (no suicide attempt) on y. For example, the default probability threshold is typically 50%: those with a greater than 50% chance of having a suicide attempt would be coded positive, whereas those with a 50% chance or less would be coded as having no suicide attempt. In situations where only one x is considered, the cutoff score is usually highlighted, given its clinical utility. Clinicians are probably familiar with scales that use cutoff scores for classification (e.g., Beck Depression Inventory-II cutoff scores as indicating depression).
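A minimal R sketch (ours; simulated data) makes the procedure concrete by scanning candidate cutoffs on x for the one that maximizes Youden's J (in practice, package tools such as pROC::coords with best.method = "youden" can do this for you):

```r
set.seed(5)
n <- 1000
depression <- rnorm(n)
attempt <- rbinom(n, 1, plogis(0.95 + 0.62 * depression))

cutoffs <- sort(unique(depression))
youden <- sapply(cutoffs, function(cut) {
  pred <- as.numeric(depression >= cut)   # classify at this cutoff
  sens <- sum(pred == 1 & attempt == 1) / sum(attempt == 1)
  spec <- sum(pred == 0 & attempt == 0) / sum(attempt == 0)
  sens + spec - 1                         # Youden's J at this cutoff
})
cutoffs[which.max(youden)]   # optimal cutoff on the depression scale
```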

There may be situations where true positives are deemed more important than true negatives, and vice versa. For example, within suicidology research, the cost of a false negative (e.g., a model classifies someone as not having a suicide attempt despite having a suicide attempt) may be far greater than the cost of a false positive (e.g., a model classifies someone as having a suicide attempt despite not having a suicide attempt). There are various methods to identify optimal cutoffs given the relative importance of sensitivity vs. specificity. For example, the InformationValue package (Prabhakaran, 2016) in R determines optimal cutoffs for four different types of optimization: "misclasserror" (the cutoff that gives the minimum misclassification error, i.e., 1 − accuracy, where accuracy is the total number of correctly classified individuals divided by the total number of individuals), "Ones" (the cutoff that optimizes the detection of positive cases, those who have a suicide attempt), "Zeros" (the cutoff that optimizes the detection of negative cases, those who do not have a suicide attempt), and "Both" (the cutoff that maximizes Youden's J). Determining the optimal cutoff (referred to as "threshold tuning" within the ML literature) is also often utilized when class imbalance is apparent (i.e., a preponderance of cases are of one type, such as a sample with far fewer cases having a suicide attempt than not having one). Given the outcomes examined by suicidology researchers, issues such as differential costs for false positives vs. false negatives and class imbalance are highly relevant.
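A hedged usage sketch of that interface follows (the InformationValue package has since been archived on CRAN, so this mirrors its documented API rather than a guaranteed-current one; the data are simulated as in the sketches above):

```r
library(InformationValue)

set.seed(6)
n <- 1000
depression <- rnorm(n)
attempt <- rbinom(n, 1, plogis(0.95 + 0.62 * depression))

fit <- glm(attempt ~ depression, family = binomial)
pred_prob <- predict(fit, type = "response")   # predicted probabilities

optimalCutoff(actuals = attempt, predictedScores = pred_prob,
              optimiseFor = "Both")            # maximizes Youden's J
optimalCutoff(attempt, pred_prob, optimiseFor = "misclasserror")
```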

4.3.1 ∣. An example from the literature

As an applied example, Mitchell et al. (2017, 2020) used identical methods to determine clinical cutoff scores for thwarted belonging and perceived burden when predicting their suicide-related outcomes. For instance, Mitchell et al. (2020) used Youden's J index to identify cutoff scores for thwarted belonging and perceived burden when predicting desire for suicide. Youden's J index, as mentioned above, places equal emphasis on balancing the false-positive and false-negative rates. Using this method, Mitchell et al. (2020) identified a cutoff score of 50 for thwarted belonging as a predictor of desire for suicide. At this cutoff, sensitivity was 0.95 and specificity was 0.46 (AUC = 0.74). These results indicated that a cut score of 50 best balanced the false-positive and false-negative rates. A thwarted belonging cut score of 50 has a 95% chance of identifying a patient with some level of suicidal desire; however, it has only a 46% chance of correctly identifying a patient with no suicidal desire, or, equivalently, a 54% (1 − specificity) chance of falsely identifying desire for suicide in an individual who does not have (or has not reported) suicidal desire. These results indicated that the thwarted belonging score of 50 might do well at identifying those who do experience suicidal desire but is likely to misclassify someone without suicidal desire (or without disclosed suicidal desire).

4.3.2 ∣. The utility of clinical cutoff scores

The high possibility of misclassification may raise the question, "then are these measures and methods really even helpful in suicide risk assessment?" Although these results indicate imperfect prediction of suicidal desire, and such instruments should not be used as the sole means of assessing suicidal desire, these assessments can provide valuable clinical insight into a patient's experiences and raise a discussion point to be addressed during sessions with a clinician. As discussed in detail by Mitchell et al. (2020), the cutoff scores presented should not be used in isolation from other clinical data. Instead, they should be used as a reminder to the clinician to investigate suicide risk and relationship dysfunction further when scores are elevated. Furthermore, it is important to note that clinical cutoff scores are likely not generalizable to populations for which they were not developed. For example, a measure such as the Beck Depression Inventory-II will produce different clinical cutoff scores when predicting suicide attempts depending on the proportion of those with and without suicide attempts in the sample. Therefore, more clinically severe samples with higher suicide attempt rates will have different clinical cutoff scores than less clinically severe samples with lower suicide attempt rates. This is an important consideration when interpreting and applying clinical cutoff scores. There are excellent resources on incorporating multiple sources of clinical data into suicide risk determinations (e.g., Homaifar et al., 2013; Wortzel et al., 2013). The purpose of these methods is not to provide perfect prediction and clinical absolutes, which would be an unlikely outcome and a challenging pursuit in suicide research. Rather, issues with prediction and misclassification are notable problems that should be considered in suicide research, which we discuss in the next section of this paper.

5 ∣. WHY IS PREDICTION DIFFICULT IN SUICIDE RESEARCH?

The models we describe in this paper are well established, and they have been successfully deployed across multiple domains of science and medicine (there are many examples presented in Agresti, 2013). However, they have had less historic success in the prediction of suicide and related behavior (e.g., suicide ideation and attempts). Many explanations emphasize the role of poor study design as the source of this slow progress in suicide prediction and prevention (Franklin et al., 2017; Prinstein, 2008; Ribeiro et al., 2016); however, a more fundamental roadblock comes from the population base rate (referred to as the prior probability from a Bayesian perspective) of suicidal thoughts and behaviors. When base rates are low, false positives are high even for highly accurate models. For example, even a hypothetical "super predictor" with sensitivity and specificity of 0.90 applied to a population with a 1% suicide prevalence (OR = 81) would still have a PPV under 10%; that is, PPV = (0.90 × 0.01)/(0.90 × 0.01 + 0.10 × 0.99) = 0.009/0.108 ≈ 0.083, so more than 90% of its positive predictions would be false (Cohen, 1986; also see the previous section on ROC curve analyses). This problem is more than just mathematical theorizing and has now been borne out in the empirical literature on suicide prediction. For example, a recent systematic review found that most suicide prediction models would produce unfavorably low PPVs across a variety of populations and assessment contexts (Belsher et al., 2019). Although some populations are more vulnerable than others, the problem just outlined affects all statistical models that seek to predict low base rate phenomena. For example, Meehl (1973) noted (p. 232):

“One should always keep in mind that there is a relationship between prior probability (e.g., the base rate P of a certain diagnosis or dynamic configuration in the particular clinic population) and the increment in probability contributed by a certain diagnostic symptom or sign. If the prior probability is extremely low, you don’t get very much mileage out of a moderately strong sign or symptom. On the other hand, when the prior probability is extremely high, you get mileage out of an additional fact, but you don’t really ‘need it much,’ so to speak.”

All statistical models are bounded in direct proportion to the base rate of the phenomena they are trying to predict. For example, when the base rate is very low, simply guessing that no individual has the condition (the accuracy of this guess is referred to as the no-information rate) will produce a model with high accuracy, which alternative analytic models will struggle to outperform. This consideration will always be relevant to suicide research, as in other areas of science studying low base rate events (e.g., cancer). These considerations are also not unique to the methods described in this paper; they apply to ML methods as well.

6 ∣. CONCLUSIONS

We believe the way forward in suicide risk and prevention research and its application is twofold: to utilize the current methods discussed in this paper to their full potential and to develop additional methods to address the limitations of these models (examples listed below). Given the low base rate of suicide and related behaviors, traditional categorical methods are not ideally positioned to address the issues with prediction that have been increasingly emphasized in the literature (e.g., Franklin et al., 2017). However, this does not eliminate the utility of any of these models. In fact, we hope this article highlights the inherent clinical utility of these models and can serve as a guide for how researchers and clinicians can maximize their potential for real-world application.

Categorical methods can be used to clarify the role of specific risk factors, as well as the relative importance of risk factors in various populations. For example, reporting standardized ORs allows comparisons to be drawn between at-risk populations. Additionally, continued use of these methods and consistent reporting of effect sizes could facilitate future meta-analyses that can inform suicide science and theory. We have provided information and guidelines to aid researchers and clinicians in the accurate reporting and interpretation of results from studies that used categorical methods. We also encourage researchers to use the helpful referenced resources on categorical methods (e.g., Agresti, 2013), which also cover additional methods that we could not thoroughly discuss in the current paper (e.g., multi-categorical logistic regression, count regression models) and that may be particularly useful for the study of suicide and related outcomes.

Potentially more important is how categorical analyses can directly inform suicide risk assessment. Empirically derived clinical cutoff scores are a tool that clinicians can use to aid clinical decision making. Mitchell et al. (2017, 2020) provide examples of this application and highlight the need for replication with larger and more diverse samples. Considering that a clinical cutoff score does not provide perfectly accurate information and may differ between samples with different rates of the outcome of interest (e.g., suicide attempts), the use of multiple well-validated measures with clinical cutoff scores may be invaluable in assessing our patients' risk to the best of our ability. It is our hope that researchers carefully consider how they may best present their findings in a clinically tangible manner. To narrow the gap between research and clinical practice, we encourage more research aimed at deriving clinical cutoffs for theory-based measures for various at-risk populations. We hope our review of this technique and of how to make predicted probability figures, as well as the calculators and R code provided (see Appendices), will aid in this endeavor.

Another potentially useful application of these methods is to use them within a specific setting (e.g., an outpatient or inpatient clinic) to allow clinicians to better tailor assessments to their specific patient population. A return to a focus on the individual patient mitigates much of the concern over false-positive predictions from a classification model at the population level (see Carter & Spittal, 2018, for a summary). Rather than focusing solely on prediction, clinicians can use the methods in this paper to inform clinical assessment with the individual patient. A clinician can establish an assessment plan, evaluate an individual patient's needs, and then modify known risk factors for suicide. The kinds of models described in this paper can aid in this process by highlighting to a clinician which kinds of factors might be relevant for a particular patient and can cue the clinician into potential assessment targets for follow-up inquiry. Additionally, these approaches can be applied idiographically, such that repeated assessment of risk factors and outcomes can be used to determine predicted probabilities of future outcomes for an individual.

As suicide science continues to progress, there is also potential for new solutions to these old problems, and the development of new approaches aimed at these prediction issues will be important. For example, within the field of ML, several approaches have been suggested as remedies for severe class imbalance: tuning models to maximize the accuracy of the minority class; using alternative cutoffs, as discussed above; adjusting prior probabilities in Bayesian models, such as naive Bayes; using unequal case weights; using sampling methods, such as up-sampling and down-sampling; and cost-sensitive training (see Kuhn & Johnson, 2013, Chapter 16, for more details).
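Two of the remedies listed above, unequal case weights and down-sampling, can be sketched in base R as follows (simulated data; this is an illustration of the general ideas rather than code from Kuhn and Johnson, 2013):

    # Simulated data with a rare outcome (severe class imbalance)
    set.seed(101)
    n <- 2000
    x <- rnorm(n)
    y <- rbinom(n, size = 1, prob = plogis(-4 + 1.2 * x))

    # (a) Unequal case weights: up-weight the minority class inversely to its
    # frequency (quasibinomial avoids the non-integer-weights warning)
    w <- ifelse(y == 1, sum(y == 0) / sum(y == 1), 1)
    fit_weighted <- glm(y ~ x, family = quasibinomial, weights = w)

    # (b) Down-sampling: randomly thin the majority class to the minority's size
    keep <- c(which(y == 1), sample(which(y == 0), sum(y == 1)))
    fit_down <- glm(y[keep] ~ x[keep], family = binomial)

Note that both adjustments shift the model's intercept, and therefore its predicted probabilities, so probabilities from such models should be recalibrated to the clinical population's actual base rate before being interpreted as risk estimates.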

We hope that our review of categorical data analysis methods, as they apply to suicide risk and prevention research, has equipped researchers and clinicians with the background and resources needed to implement and interpret these methods effectively. Integrating research and clinical work to strengthen intervention and clinical assessment can lead to improved evidence-based care for suicidal individuals.

Supplementary Material

Appendix S1
Appendix S2

ACKNOWLEDGEMENT

This work was partially supported by grants from the National Institute of Mental Health (T32 MH020061; L30 MH120575; L30 MH120727; R01 MH115922).

Funding information

The National Institute of Mental Health, Grant/Award Numbers: T32 MH020061, L30 MH120575, L30 MH120727, and R01 MH115922

Footnotes

SUPPORTING INFORMATION

Additional supporting information may be found online in the Supporting Information section.

REFERENCES

  1. Agresti A (2013). Categorical data analysis (3rd ed.). New York, NY: Wiley.
  2. Anestis MD, Mohn RS, Dorminey JW, & Green BA (2019). Detecting potential underreporting of suicide ideation among U.S. military personnel. Suicide & Life-Threatening Behavior, 49, 210–220. 10.1111/sltb.12425
  3. Ansell EB, Wright AGC, Markowitz JC, Sanislow CA, Hopwood CJ, Zanarini MC, et al. (2015). Personality disorder risk factors for suicide attempts over 10 years of follow-up. Personality Disorders: Theory, Research, and Treatment, 6, 161–167. 10.1037/per0000089
  4. Belsher BE, Smolenski DJ, Pruitt LD, Bush NE, Beech EH, Workman DE, et al. (2019). Prediction models for suicide attempts and deaths: A systematic review and simulation. JAMA Psychiatry, 76, 642–651. 10.1001/jamapsychiatry.2019.0174
  5. Burke TA, Ammerman BA, & Jacobucci R (2019). The use of machine learning in the study of suicidal and non-suicidal self-injurious thoughts and behaviors: A systematic review. Journal of Affective Disorders, 245, 869–884. 10.1016/j.jad.2018.11.073
  6. Carter G, & Spittal MJ (2018). Suicide risk assessment: Risk stratification is not accurate enough to be clinically useful and alternative approaches are needed. Crisis: The Journal of Crisis Intervention and Suicide Prevention, 39, 229–234. 10.1027/0227-5910/a000558
  7. Cohen J (1986). Statistical approaches to suicidal risk factor analysis. Annals of the New York Academy of Sciences, 487, 34–41. 10.1111/j.1749-6632.1986.tb27883.x
  8. Cukrowicz KC, Jahn DR, Graham RD, Poindexter EK, & Williams RB (2013). Suicide risk in older adults: Evaluating models of risk and predicting excess zeros in a primary care sample. Journal of Abnormal Psychology, 122, 1021–1030. 10.1037/a0034953
  9. Franklin JC, Ribeiro JD, Fox KR, Bentley KH, Kleiman EM, Huang X, et al. (2017). Risk factors for suicidal thoughts and behaviors: A meta-analysis of 50 years of research. Psychological Bulletin, 143, 187–232. 10.1037/bul0000084
  10. Hastie T, Tibshirani R, & Friedman JH (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). New York, NY: Springer.
  11. Homaifar B, Matarazzo B, & Wortzel HS (2013). Therapeutic risk management of the suicidal patient: Augmenting clinical suicide risk assessment with structured instruments. Journal of Psychiatric Practice, 19, 406–409. 10.1097/01.pra.0000435039.68179.70
  12. Hu MC, Pavlicova M, & Nunes EV (2011). Zero-inflated and hurdle models of count data with extra zeros: Examples from an HIV-risk reduction intervention trial. American Journal of Drug & Alcohol Abuse, 37, 367–375. 10.3109/00952990.2011.597280
  13. Katz I, Barry CN, Cooper SA, Kasprow WJ, & Hoff RA (2019). Use of the Columbia-Suicide Severity Rating Scale (C-SSRS) in a large sample of veterans receiving mental health services in the Veterans Health Administration. Suicide & Life-Threatening Behavior, 50, 111–121. 10.1111/sltb.12584
  14. Kuhn M, & Johnson K (2013). Applied predictive modeling. New York, NY: Springer.
  15. Lindh ÅU, Waern M, Beckman K, Renberg ES, Dahlin M, & Runeson B (2018). Short term risk of non-fatal and fatal suicidal behaviours: The predictive validity of the Columbia-Suicide Severity Rating Scale in a Swedish adult psychiatric population with a recent episode of self-harm. BMC Psychiatry, 18, 319. 10.1186/s12888-018-1883-8
  16. Madan A, Frueh BC, Allen JG, Ellis TE, Rufino KA, Oldham JM, et al. (2016). Psychometric reevaluation of the Columbia-Suicide Severity Rating Scale: Findings from a prospective, inpatient cohort of severely mentally ill adults. The Journal of Clinical Psychiatry, 77, e867–e873. 10.4088/JCP.15m10069
  17. Meehl PE (1973). Why I do not attend case conferences. In Meehl PE (Ed.), Psychodiagnosis: Selected papers (pp. 225–302). Minneapolis, MN: University of Minnesota Press.
  18. Mitchell SM, Brown SL, Roush JF, Bolaños AD, Littlefield AK, Marshall AJ, et al. (2017). The clinical application of suicide risk assessment: A theory-driven approach. Clinical Psychology & Psychotherapy, 24, 1406–1420. 10.1002/cpp.2086
  19. Mitchell SM, Brown SL, Roush JF, Tucker RP, Cukrowicz KC, & Joiner TE (2020). The Interpersonal Needs Questionnaire: Statistical considerations for improved clinical application. Assessment, 27(3), 621–637. 10.1177/1073191118824660
  20. Nock MK, Park JM, Finn CT, Deliberto TL, Dour HJ, & Banaji MR (2010). Measuring the suicidal mind: Implicit cognition predicts suicidal behavior. Psychological Science, 21, 511–517. 10.1177/0956797610364762
  21. Prabhakaran S (2016). Performance analysis and companion functions for binary classification models. Retrieved from http://r-statistics.co/Information-Value-With-R.html
  22. Prinstein MJ (2008). Introduction to the special section on suicide and nonsuicidal self-injury: A review of unique challenges and important directions for self-injury science. Journal of Consulting and Clinical Psychology, 76, 1–8. 10.1037/0022-006X.76.1.1
  23. Ribeiro JD, Franklin JC, Fox KR, Bentley KH, Kleiman EM, Chang BP, et al. (2016). Self-injurious thoughts and behaviors as risk factors for future suicide ideation, attempts, and death: A meta-analysis of longitudinal studies. Psychological Medicine, 46, 225–236. 10.1017/S0033291715001804
  24. Ruscio J (2008). A probability-based measure of effect size: Robustness to base rates and other factors. Psychological Methods, 13, 19–30. 10.1037/1082-989X.13.1.19
  25. Viguera AC, Milano N, Ralston L, Thompson NR, Griffith SD, Baldessarini RJ, et al. (2015). Comparison of electronic screening for suicidal risk with the Patient Health Questionnaire Item 9 and the Columbia Suicide Severity Rating Scale in an outpatient psychiatric clinic. Psychosomatics: Journal of Consultation and Liaison Psychiatry, 56(5), 460–469. 10.1016/j.psym.2015.04.005
  26. Wortzel HS, Matarazzo B, & Homaifar B (2013). A model for therapeutic risk management of the suicidal patient. Journal of Psychiatric Practice, 19, 323–326. 10.1097/01.pra.0000432603.99211.e8
