Author manuscript; available in PMC: 2022 Sep 6.
Published in final edited form as: Assessment. 2020 Apr 21;28(6):1624–1634. doi: 10.1177/1073191120913626

Mokken Scale Analysis of Lifetime Responses on the Columbia Suicide Severity Rating Scale’s Severity of Ideation Subscale

Jeffrey V Tabares 1,2, Jonathan E Butner 1, Craig J Bryan 1,2, Julia A Harris 1,2
PMCID: PMC9447347  NIHMSID: NIHMS1827296  PMID: 32316747

Abstract

Suicide risk screening assumes that suicidal thoughts and behaviors exist on a continuous, hierarchical spectrum, with some suicidal thoughts implicated in greater risk for suicidal behaviors. However, screening measures based on the hierarchical model may not capture the suicide risk construct. This study assessed psychometric properties of the Columbia Suicide Severity Rating Scale (CSSRS) for (a) between- and within-person measurement dimensions, (b) item utility in capturing the suicide risk construct, and (c) tenability of a hierarchical risk model. We found that the CSSRS functions differently between and within individuals, that some CSSRS items capture more of the suicide risk construct than others, and that the order of CSSRS items in current practice is likely correct. The current CSSRS reasonably represents within-person suicide risk, but not between-person risk. Scale norms or alternate scoring could facilitate functional equivalence and utility for between- and within-person CSSRS dimensions.

Keywords: suicide, suicide risk, suicide assessment, suicidal behavior, suicidal ideation, CSSRS, military personnel


A central assumption of suicide risk screening is that suicidal thoughts and behaviors exist on a continuous spectrum. This spectrum has traditionally been organized hierarchically, such that some types of suicidal thoughts are presumed to be located higher on the spectrum than others, implicating greater risk for suicidal behaviors. For example, “passive” suicidal thoughts (e.g., thinking about death, wishing that one was dead) are often assumed to be lower risk than “active” suicidal thoughts (e.g., thinking about ending one’s life). Suicide planning, in turn, is considered to be an even higher risk form of suicidal thinking. This assumed hierarchy of suicide risk provides the conceptual foundation for many suicide risk screening and assessment tools and approaches.

Although the hierarchical model of suicide risk has played an important role in suicide research and clinical practice for some time (e.g., Andrews & Lewinsohn, 1992; McKeown et al., 1998; Paykel et al., 1974), several lines of evidence appear to contradict this notion. For example, multiple studies suggest that up to two thirds of individuals who attempt suicide deny experiencing severe suicidal thinking and/or planning in the time leading up to the attempt (Borges et al., 2000; Conner et al., 2006; Conner et al., 2007; Jeon et al., 2010; Jiang et al., 2010; Kessler et al., 1999; Nock et al., 2014; Ursano et al., 2015). Furthermore, individuals who endorse more severe or “active” levels of suicidal ideation do not necessarily endorse less severe or “passive” levels of suicidal ideation (Millner et al., 2016). Studies have further shown that many individuals who have attempted suicide report “skipping” more severe forms of suicidal ideation (i.e., planning), describing a process whereby they jumped from relatively mild levels of suicidal thinking directly to suicidal attempts (Shelef et al., 2019; Wyder & De Leo, 2007). Taken together, these findings suggest that suicide risk screening and assessment scales based on the hierarchical continuum model of suicide risk may not always reflect the true nature of the underlying construct.

One such scale, the Columbia Suicide Severity Rating Scale (CSSRS; Posner et al., 2011), has emerged as a popular suicide risk–screening tool across health care systems including the U.S. Armed Forces (Military Health System, 2017) and Department of Veterans Affairs (e.g., Peterson et al., 2018). Central to the CSSRS’s design is a multistep assessment approach with built-in “skip logic” that is based on an assumed hierarchy of suicide risk. Specifically, the CSSRS begins with two items that assess the respondent’s wish to be dead (e.g., “I wish I was dead”) and nonspecific active suicidal thoughts (e.g., “I’ve thought about killing myself”). If the respondent responds affirmatively to either of these two items, they are presented with three additional items that assess active suicidal ideation with any method but no plan or no intent to act, active suicidal ideation with some intent to act but no plan, and finally active suicidal ideation with specific plan and intent. The assessment of active suicidal ideation is therefore conditioned on the individual’s endorsement of the wish to be dead and/or nonspecific active suicidal thoughts, based on the assumption that more severe forms of suicidal ideation subsume less severe forms of suicidal ideation.

In combination, these five binary items make up the “Severity of Suicidal Ideation” subscale, with overall severity of suicidal ideation being determined by the respondent’s highest endorsed item, resulting in a possible score ranging from 0 (no suicidal thoughts) to 5 (active suicidal ideation with specific plan and intent; Nilsson et al., 2013). When used as a suicide risk–screening tool, higher scores are therefore assumed to reflect incremental increases in the risk for later suicidal behavior. Conversely, when used as a clinical outcome tool, decreasing scores over time are assumed to reflect clinical improvement, whereas increasing scores are assumed to reflect clinical worsening. Based on this assumption of a hierarchical continuum, clinical decision-making algorithms have also been developed wherein scores of 1 or 2 are considered low risk, a score of 3 is considered moderate risk, and scores of 4 or 5 are considered high risk (Columbia Lighthouse Project, 2017).
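The highest-endorsed-item scoring rule and the triage bands described above can be sketched as follows. This is a minimal illustration, assuming responses are stored as five booleans in item order; the function names are hypothetical, and labeling a score of 0 as "none" is an assumption, because the paragraph above only describes bands for scores of 1 to 5.

```python
from typing import Sequence

# Item order (hypothetical encoding): 1 wish to be dead, 2 nonspecific active
# suicidal thoughts, 3 methods without intent, 4 intent without plan,
# 5 specific plan and intent.


def severity_score(responses: Sequence[bool]) -> int:
    """Return the number of the highest endorsed item, or 0 if none is endorsed."""
    if len(responses) != 5:
        raise ValueError("expected exactly five yes/no responses")
    score = 0
    for item_number, endorsed in enumerate(responses, start=1):
        if endorsed:
            score = item_number  # later (higher) items overwrite earlier ones
    return score


def risk_category(score: int) -> str:
    """Map a 0-5 severity score onto the triage bands described in the text."""
    if not 0 <= score <= 5:
        raise ValueError("score must be between 0 and 5")
    if score == 0:
        return "none"  # assumption: the text defines bands only for 1-5
    if score <= 2:
        return "low"
    if score == 3:
        return "moderate"
    return "high"


if __name__ == "__main__":
    # Only the highest endorsed item counts, so a respondent who endorses
    # Item 5 alone receives the same score as one who endorses all five.
    print(severity_score([False, False, False, False, True]))  # 5
    print(severity_score([True, True, True, True, True]))  # 5
    print(risk_category(severity_score([True, True, False, False, False])))  # low
```

Because the score keeps only the highest endorsed item, two respondents with different response patterns can share the same score.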

The design of the CSSRS’s Severity of Suicidal Ideation subscale is akin to a Mokken (1971) scale, whereby individuals can answer yes or no to any of the items, with the assumed hierarchy implying that the probability of answering “yes” to each successive item should decrease, consistent with the form outlined by Guttman (1950). This pattern of decreasing probability can be examined by comparing the probabilities of endorsing successive CSSRS items. Mokken scales can be thought of as a special case of Rasch models from item response theory, but with the added constraint of an assumed ordering of response probabilities. Such scales are assumed to be unidimensional (i.e., to capture a single latent construct), although the relative importance of each item as a measure of the underlying latent construct may differ: relatively more important items show larger factor loadings than others, denoting a greater degree of information about the latent construct. Scales in which each item possesses differential importance are known as two-parameter Rasch models, whereas scales in which each item possesses equivalent (or additive) importance are known as one-parameter Rasch models. The probability hierarchy for Mokken scales is then an additional test on the item difficulties (if conducted in an item response theory package) or thresholds (if conducted in a structural equation modeling package). These notions of Mokken scales can be expanded to a multilevel structure that includes both between-person properties (i.e., the scale’s ability to distinguish between different individuals with different levels of the latent construct) and within-person properties (i.e., the scale’s ability to distinguish between different levels of the latent construct within a given individual).
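To illustrate why equal (one-parameter) versus freely estimated (two-parameter) loadings matter for a Mokken-style hierarchy, the sketch below computes endorsement probabilities for five binary items. It assumes a logit link, P(yes | η) = 1 / (1 + exp(-(λη - τ))), with hypothetical loadings λ and ordered thresholds τ; these are not the parameters estimated in this study. With equal loadings, the ordering of endorsement probabilities follows the threshold ordering at every level of the latent trait η. With unequal loadings, item response curves can cross, so the ordering may hold at some trait levels and not at others.

```python
import math
from typing import List, Sequence


def endorsement_probability(eta: float, loading: float, threshold: float) -> float:
    """P(yes | eta) for a binary item under an assumed logit link."""
    return 1.0 / (1.0 + math.exp(-(loading * eta - threshold)))


def item_probabilities(eta: float, loadings: Sequence[float],
                       thresholds: Sequence[float]) -> List[float]:
    """Endorsement probability for each item at one latent trait level."""
    return [endorsement_probability(eta, lam, tau)
            for lam, tau in zip(loadings, thresholds)]


def is_decreasing(values: Sequence[float]) -> bool:
    """True if each successive item is endorsed no more often than the previous one."""
    return all(a >= b for a, b in zip(values, values[1:]))


# Hypothetical parameters: thresholds ordered from easiest (Item 1) to hardest (Item 5).
THRESHOLDS = [-1.0, -0.5, 0.5, 1.0, 2.0]
EQUAL_LOADINGS = [1.0] * 5                    # one-parameter (additive) case
UNEQUAL_LOADINGS = [1.0, 1.2, 3.0, 2.8, 3.0]  # two-parameter (differential) case

if __name__ == "__main__":
    for eta in (-2.0, 0.0, 2.0):
        equal_ok = is_decreasing(item_probabilities(eta, EQUAL_LOADINGS, THRESHOLDS))
        unequal_ok = is_decreasing(item_probabilities(eta, UNEQUAL_LOADINGS, THRESHOLDS))
        print(eta, equal_ok, unequal_ok)
```

With these made-up values, the equal-loading items stay ordered at all three trait levels, whereas the unequal-loading items lose their ordering at η = 2 because the steep Item 3 curve overtakes Items 1 and 2.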

Although the CSSRS has become a widely used method for suicide risk screening and assessment in clinical and research settings, these particular aspects of the scale’s design and measurement model have received little to no empirical investigation. As a result, little is known about the Severity of Suicidal Ideation subscale’s construct validity, specifically with respect to the properties implied by treating items as consecutively indicative of increasing suicide risk severity. To fill this knowledge gap, the primary aim of the present study was to conduct a preliminary psychometric evaluation of the CSSRS’s Severity of Suicidal Ideation subscale in a sample of U.S. military personnel and veterans endorsing a lifetime history of suicidal thoughts and/or behaviors. To achieve this aim, we examined three fundamental properties of the lifetime version of the CSSRS relevant to how the scale is currently used in both research and clinical settings.

First, we tested whether the lifetime CSSRS distinguishes between various levels of suicide risk both between people and within people over time. Differences in lifetime CSSRS responses between unique individuals may not mean the same thing as differences in lifetime CSSRS responses within a given individual over time. If supported, this would suggest that the scale may have differential utility as a method for distinguishing individuals with varying levels of risk versus a method for tracking or monitoring fluctuations in risk over time. Second, we tested whether lifetime CSSRS items are better suited to an additive scale, meaning that responses to each item are equally informative, or whether the items have differential importance, meaning that responses to some items are more discriminative than others. Note that, in line with Rasch modeling, item responses can still inform higher and lower levels of suicidal ideation regardless of item discrimination; the key difference lies in how items are weighted according to their contribution to the underlying ideation dimension. In practical terms, this analysis may suggest the need for a scoring and interpretation scheme other than the 0 to 5 scale currently in widespread use. Third, we tested whether CSSRS items tended to be endorsed with decreasing frequency, as would be expected for a scale intended to represent a hierarchical construct. If unsupported, this would suggest that treating the CSSRS as a Mokken scale misrepresents the underlying latent construct (i.e., suicide risk). These three notions are nearly orthogonal, meaning that none depends on either of the other two. When considered in combination, however, they inform best practices for scale use.

Method

Participants

Participants were 193 military personnel and veterans (76.2% male; age in years M = 41.6, SD = 10.3) from all conflict eras (40.4% Operation Enduring Freedom/Operation Iraqi Freedom) recruited across the United States through online advertisements posted in Internet communities, social media websites (e.g., Facebook, Reddit), or Craigslist. Participants were 77.7% Caucasian, 11.9% Black/African American, 3.6% Hispanic/Latino, 1.0% Asian, 1.0% Native American, and 4.7% selected “other race.” All branches of the military were represented, with a breakdown of 67.9% Army, 13.0% Air Force, 7.3% Navy, 10.9% Marines, and 1.0% National Guard and/or Reserves. For education, 14.5% obtained a graduate-level degree, 21.2% obtained a degree from a 4-year college program, 19.2% completed a 2-year college degree program, 32.1% completed some college, 10.9% had a high school diploma or graduate equivalency degree, and 1% completed some high school. Combat exposure during time in service was endorsed by 58.5% of the sample. Additionally, 59.5% of the sample scored above 33 on the Posttraumatic Stress Disorder Checklist for the Diagnostic and Statistical Manual of Mental Disorders–Fifth edition (Weathers et al., 2013), the recommended threshold for a probable diagnosis of posttraumatic stress disorder.

All participants completed a 26-item screening questionnaire to assess for the presence of lifetime suicidal thoughts and behaviors. Individual items for the screening questionnaire were extracted from the Self-Injurious Thoughts and Behaviors Interview (Nock et al., 2007). These questions assessed for a lifetime history of suicide ideation, plans, gestures, attempts, and nonsuicidal self-injury. Inclusion criteria were (a) current or previous service in the U.S. military, (b) ability to speak and understand the English language, (c) 18 years of age or older, (d) ability to correctly answer verification questions about military job codes, and (e) willingness to provide contact information for the purposes of repeated assessment. No one who completed the study screener was excluded for any of these criteria. Participants who completed the survey at only one of the two time points of assessment were included in the analyses.

Procedures

Once eligibility was established with the screening questionnaire, participants received an e-mail from a trained researcher at the University of Utah with a link to an online survey. The survey was completed at two time points separated by 1 month. At both time points, participants were asked to complete the CSSRS and other self-report surveys. Surveys were counterbalanced to reduce ordering effects. Participants were compensated $15 in Amazon.com gift cards for their participation: $5 for completing the first survey and $10 for completing the second survey. The average number of days between completion of the first and second surveys was 31.1 (SD = 38.7).

Measures

The CSSRS (Posner et al., 2011) assesses the severity and intensity of suicidal ideation and the occurrence of suicidal behavior during the patient’s lifetime. Although the scale was originally developed as a semistructured clinical interview, the scale has been widely used as a self-report instrument. Multiple versions of the scale have been used to assess suicide risk during different timeframes (e.g., lifetime history, past month, past 3 months, and time since last assessment). In the present study, we administered the scale as a self-report instrument and used the lifetime assessment timeframe. This version of the scale is often used during intakes in clinical settings and at baseline assessments in research settings to assess a respondent’s history of suicidal thoughts and behaviors. The focus of the present study was the Severity of Suicidal Ideation subscale items, which are presented in Table 1. All five items were administered to all participants, regardless of response to the first two items.

Table 1.

Suicidal Ideation Subsection Items From the Columbia Suicide Severity Rating Scale.

1. Wish to be dead
 Have you wished you were dead or wished you could go to sleep and not wake up?
2. Nonspecific active suicidal thoughts
 Have you actually had any thoughts of killing yourself?
3. Active suicidal ideation with any methods (not plan) without intent to act
 Have you been thinking about how you might do this?
4. Active suicidal ideation with some intent to act, without specific plan
 Have you had these thoughts and had some intention of acting on them?
5. Active suicidal ideation with specific plan and intent
 Have you started to work out or worked out the details of how to kill yourself? Do you intend to carry out this plan?

Multilevel composite reliability (all five items): within-person ω = 0.806 (standard error = 0.029); between-person ω = 0.949 (standard error = 0.014).

Note. Multilevel composite reliability (ω) estimates are for the two-level multilevel confirmatory factor analysis model retained from the first analytic step (Table 4). Composite reliability is computed from the factor loadings and allows items to have varying strengths of association with the latent construct; alpha (α), by comparison, provides consistent estimates when item-latent relationships are strong (Geldhof et al., 2014).

Data Analytic Approach

The Mokken structure of the CSSRS Severity of Suicidal Ideation subscale was assessed in a series of iterative confirmatory factor analyses (CFAs) in Mplus version 7. Each of the three properties was examined by comparing models, with the best-fitting (i.e., “preferred”) model taken as indicative of the CSSRS’s measurement qualities. Each model used maximum likelihood estimation with robust standard errors (i.e., the Mplus “MLR” estimator) to account for nonnormality. Furthermore, items were modeled as latent response variables to account for their dichotomous nature (L. K. Muthén & Muthén, 1998–2017). For all models, items loaded onto a single latent construct; the loading of the first item at each level (i.e., CSSRS Item 1) was fixed to one and factor variances were estimated (i.e., the “marker variable” method of latent variable identification; Kline, 2005). When models were nested (i.e., a restricted model is a special case of a larger, unrestricted model, and the two share common parameters), an adjusted likelihood ratio test (also known as a chi-square difference test with a scaling correction) was used to identify the preferred model; a nonsignificant test favored the more parsimonious (restricted) model. When models were not nested (i.e., neither model is a special case of the other), the model with the smallest Bayesian information criterion (BIC) was preferred. Multilevel CFA (MCFA) models additionally allowed for a within-person factor and a between-person factor. Including multilevel models in the analytic approach showed whether the CSSRS factor structures at the within- and between-person levels were consistent or inconsistent with the hierarchical continuum model of suicide risk. We assessed the CSSRS properties in three phases of model comparisons.
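The nested comparisons rely on a likelihood ratio test adjusted for the scaling correction that accompanies robust maximum likelihood. The Python sketch below assumes the scaled loglikelihood difference formula given in the Mplus documentation, cd = (p0*c0 - p1*c1)/(p0 - p1) and TRd = -2(L0 - L1)/cd, where L, p, and c are each model's loglikelihood, number of free parameters, and scaling correction factor (subscript 0 for the restricted model). The function names and example inputs are hypothetical. To keep the example dependency-free, the chi-square tail probability uses a standard closed-form recurrence for integer degrees of freedom. This is not the study's code.

```python
import math
from typing import Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    # Start from the closed forms for df = 1 or df = 2 ...
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2.0)), 1
    else:
        q, k = math.exp(-x / 2.0), 2
    # ... then step up two degrees of freedom at a time:
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def scaled_lr_test(loglik_restricted: float, npar_restricted: int, scf_restricted: float,
                   loglik_full: float, npar_full: int, scf_full: float) -> Tuple[float, int, float]:
    """Scaled loglikelihood difference test for robust (MLR-type) estimates.

    scf_* are the scaling correction factors reported with each loglikelihood.
    Returns (test statistic, degrees of freedom, p value).
    """
    df = npar_full - npar_restricted
    if df <= 0:
        raise ValueError("the unrestricted model must have more free parameters")
    cd = (npar_restricted * scf_restricted - npar_full * scf_full) / (npar_restricted - npar_full)
    if cd <= 0:
        # A nonpositive correction can occur in small samples; the test is then undefined.
        raise ValueError("nonpositive difference-test scaling correction")
    stat = -2.0 * (loglik_restricted - loglik_full) / cd
    return stat, df, chi2_sf(stat, df)


def prefer_by_bic(bic_a: float, bic_b: float) -> str:
    """For nonnested models, prefer the model with the smaller BIC."""
    return "a" if bic_a <= bic_b else "b"
```

As a check, chi2_sf(4.184, 4) returns approximately .382, the p value reported in the Results for the within-person loading comparison.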

In the first step, we assessed the equivalence of the between- and within-person factor structures by comparing a model in which the within- and between-person factor structures were equated (i.e., nonhierarchical) with a model in which these structures differed (i.e., two-level MCFA). In Mplus, this step can be carried out by comparing a model estimated with the analysis type “complex” against a model estimated with the analysis type “two-level.” The “complex” specification partials out the dependency created by clustering (i.e., individual responses “within” drawn from the sample “between”) by adjusting the standard errors for clustered/multilevel data, producing parameter estimates akin to a single-level (nonhierarchical) model (B. O. Muthén & Satorra, 1995). The “two-level” specification builds one latent construct “within” and a second latent construct “between.” Of note, the nonhierarchical model is nested within the two-level MCFA model, from which it is obtained by equating the loadings.

The second step determined whether items had additive or differential utility, and its preferred model became the baseline for the next step. Specifically, for this step, we examined the Mokken discrimination parameter by comparing the model retained from the first step, which allowed factor loadings to differ across items, against a model that forced all factor loadings to equal one. Freely estimated factor loadings allowed for tests of differential item utility, whereas factor loadings forced to one (i.e., for all items in a comparator nonhierarchical model or for an entire level of the two-level MCFA) formed a one-parameter Rasch model that indicated equal discrimination of each CSSRS item and allowed for tests of additive item utility. This discrimination feature is a Mokken property shared with Rasch models (Van Schuur, 2003). If the two-level MCFA model was preferred in the first step, separate comparator models would be created to test additive and differential item utility at each level. The first model would force factor loadings to one at the within-person level (i.e., additive within) and allow free estimation of factor loadings at the between-person level (i.e., differential between). The second model would force factor loadings to one at the between-person level (i.e., additive between) and allow free estimation of factor loadings at the within-person level (i.e., differential within). Each of these level-additive models would be nested within the two-level MCFA model and assessed separately (i.e., additive within–differential between vs. two-level MCFA; additive between–differential within vs. two-level MCFA). A preferred level-additive model would identify the corresponding item utilities for those levels; otherwise, a preferred two-level MCFA model would suggest differential item utility at both the within- and between-person levels.
Alternatively, if the nonhierarchical CFA model was preferred from the first step, comparison against a model with factor loadings forced to one for each item would take place. The preferred model from this comparison would indicate either differential (i.e., nonhierarchical model with freely estimated factor loadings) or additive item utility (i.e., nonhierarchical model with forced factor loadings of one). Last, after identifying the preferred model, we examined factor loadings for any freely estimated items to determine if the assumed hierarchy of CSSRS items was supported empirically (i.e., later CSSRS items have progressively larger factor loadings relative to earlier items; Figure 1 summarizes the sequence of model comparisons and all possible models).
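The Step 2 branching can be written as a small decision function. This is a hypothetical sketch of the comparison logic only: model labels are descriptive strings rather than fitted models, p values are supplied by the user, and the handling of the case in which both level-additive models are retained is an assumption, because the text does not specify it.

```python
from typing import Dict, List, Tuple

TWO_LEVEL = "two-level MCFA"
NONHIERARCHICAL = "nonhierarchical CFA"


def step2_comparisons(step1_preferred: str) -> List[Tuple[str, str]]:
    """(restricted, unrestricted) model pairs compared in Step 2."""
    if step1_preferred == TWO_LEVEL:
        return [
            ("additive within, differential between", TWO_LEVEL),
            ("additive between, differential within", TWO_LEVEL),
        ]
    if step1_preferred == NONHIERARCHICAL:
        return [("nonhierarchical, loadings fixed to 1", NONHIERARCHICAL)]
    raise ValueError(f"unknown Step 1 model: {step1_preferred}")


def step2_preferred(step1_preferred: str, p_values: Dict[str, float],
                    alpha: float = 0.05) -> str:
    """Retain a restricted model when its nested test is nonsignificant."""
    pairs = step2_comparisons(step1_preferred)
    retained = [restricted for restricted, _ in pairs if p_values[restricted] >= alpha]
    if not retained:
        return pairs[0][1]  # keep the unrestricted (differential) model
    if len(retained) == 1:
        return retained[0]
    # Assumption: the text does not say what happens if both are retained.
    return "follow-up needed: " + "; ".join(retained)


if __name__ == "__main__":
    # p values from the Results; 0.0001 is a stand-in for the reported p < .001.
    reported = {
        "additive within, differential between": 0.382,
        "additive between, differential within": 0.0001,
    }
    print(step2_preferred(TWO_LEVEL, reported))
```

With the reported p values, the function returns the "additive within, differential between" model, the model the study retained for its third step.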

Figure 1.


Full range of possible model comparisons.

Note. MCFA = multilevel confirmatory factor analyses. Arrows indicate the progression of the preferred model across analytic steps for each model type; the overall “preferred” model was retained after Step 3. Nested models share alphanumeric superscripts within each model type and step. Among nested models, unrestricted (full) models have solid-line borders, and restricted (sub) models, which are nested within the unrestricted models, have hashed-line borders. Nonnested models have gray fill.

Finally, we compared the best-fitting model from the previous step with a model where the thresholds (z value representing the probability of a response of yes on an observed item) were forced to indicate consecutively smaller probabilities of occurrences. This was done through the addition of a nonlinear constraint and thus did not generate a nested test (we therefore relied on the BIC for this assessment). This final step allowed testing of the monotonic-equivalent model simulating Mokken scaling by forcing the order of the dichotomous endorsement items according to increasing difficulty. Specifically, Item 1 was assumed to be the “easiest” item, meaning it would be the most frequently endorsed item; Item 2 was assumed to be the next easiest item, meaning it would be the next most frequently endorsed item; and so on up to Item 5, which was assumed to be the most difficult item, meaning it would be the least frequently endorsed. This model fit comparison enabled us to test the assumption of conditional ordering.
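A simple way to see what the ordering constraint requires is to inspect the estimated thresholds directly. The sketch below is not the Mplus constraint itself; it only flags adjacent items whose thresholds decrease, using the between-person threshold estimates reported in Tables 4 and 5. Ties are allowed (a non-strict ordering), an assumption consistent with the equal thresholds for Items 1 and 2 in Table 5; the text does not state whether the original constraint was strict.

```python
from typing import List, Sequence, Tuple


def ordering_violations(thresholds: Sequence[float]) -> List[Tuple[int, int]]:
    """1-based item pairs (i, i + 1) where the later item has a lower threshold,
    i.e., where the later item would be the easier (more often endorsed) one."""
    return [(i + 1, i + 2)
            for i in range(len(thresholds) - 1)
            if thresholds[i + 1] < thresholds[i]]


# Between-person thresholds for Items 1-5 as reported in the tables.
UNCONSTRAINED = [-0.54, -1.19, 1.33, 2.04, 5.45]  # Table 4 (first-step model)
CONSTRAINED = [-0.76, -0.76, 1.26, 2.14, 3.52]    # Table 5 (final model)

if __name__ == "__main__":
    print(ordering_violations(UNCONSTRAINED))  # [(1, 2)]
    print(ordering_violations(CONSTRAINED))    # []
```

The only violation in the first-step estimates involves Items 1 and 2, in line with Item 2 being endorsed slightly more often than Item 1 (54% vs. 50% combined; Table 2). The forced-order estimates appear to resolve it by placing the two thresholds at the same value.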

Results

Overall, respondents most frequently endorsed lifetime CSSRS Items 1 (50%) and 2 (54%; see Table 2). Test–retest reliabilities for lifetime CSSRS items assessed at Times 1 and 2 ranged from κ = .350 to κ = .608 (Table 3), which corresponds to fair to moderate agreement according to Landis and Koch’s (1977) criteria.
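The test–retest coefficients in Table 3 are Cohen's kappa values. Below is a minimal sketch of the computation for one item cross-tabulated at Times 1 and 2, together with the Landis and Koch (1977) agreement labels. The 2 × 2 counts in the example are hypothetical, not study data, and the rule for values that fall between the two-decimal published bands is an assumption of this sketch.

```python
from typing import Sequence


def cohens_kappa(table: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa for a square agreement table (rows: Time 1, columns: Time 2)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    if n == 0 or any(len(row) != k for row in table):
        raise ValueError("expected a non-empty square table")
    observed = sum(table[i][i] for i in range(k)) / n
    row_margins = [sum(row) / n for row in table]
    col_margins = [sum(table[r][c] for r in range(k)) / n for c in range(k)]
    expected = sum(r * c for r, c in zip(row_margins, col_margins))
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa: float) -> str:
    """Agreement labels from Landis and Koch (1977).

    The published bands are given to two decimals (e.g., .41-.60 moderate);
    values falling between bands are assigned to the lower band here.
    """
    if kappa < 0:
        return "poor"
    label = "slight"
    for lower, name in [(0.21, "fair"), (0.41, "moderate"),
                        (0.61, "substantial"), (0.81, "almost perfect")]:
        if kappa >= lower:
            label = name
    return label


if __name__ == "__main__":
    # Hypothetical yes/no counts for one item: [[yes-yes, yes-no], [no-yes, no-no]].
    counts = [[20, 5], [10, 15]]
    kappa = cohens_kappa(counts)
    print(round(kappa, 3), landis_koch_label(kappa))  # 0.4 fair
```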

Table 2.

Response Frequencies (%), Central Tendency, and Dispersion for Columbia Suicide Severity Rating Scale Items at Time 1, 2, and Combined.

Item  Yes, n (%)  No, n (%)  Missing, n (%)

Time 1
Wish to be dead  112 (58%)  74 (38%)  7 (4%)
Nonspecific active suicidal thoughts  119 (62%)  67 (35%)  7 (4%)
Active suicidal ideation with any methods (not plan) without intent to act  71 (37%)  48 (25%)  74 (38%)
Active suicidal ideation with some intent to act, without specific plan  59 (31%)  60 (31%)  74 (38%)
Active suicidal ideation with specific plan and intent  48 (25%)  71 (37%)  74 (38%)
Median = 2; 1st quartile (25%) = 1; 3rd quartile (75%) = 4; interquartile range = 3

Time 2
Wish to be dead  82 (42%)  65 (34%)  46 (24%)
Nonspecific active suicidal thoughts  90 (47%)  58 (30%)  45 (23%)
Active suicidal ideation with any methods (not plan) without intent to act  50 (26%)  40 (21%)  103 (53%)
Active suicidal ideation with some intent to act, without specific plan  42 (22%)  48 (25%)  103 (53%)
Active suicidal ideation with specific plan and intent  30 (16%)  60 (31%)  103 (53%)
Median = 2; 1st quartile (25%) = 1; 3rd quartile (75%) = 3; interquartile range = 2

Combined time
Wish to be dead  194 (50%)  139 (36%)  53 (14%)
Nonspecific active suicidal thoughts  209 (54%)  125 (32%)  52 (13%)
Active suicidal ideation with any methods (not plan) without intent to act  121 (31%)  88 (23%)  177 (46%)
Active suicidal ideation with some intent to act, without specific plan  101 (26%)  108 (28%)  177 (46%)
Active suicidal ideation with specific plan and intent  78 (20%)  131 (34%)  177 (46%)
Median = 2; 1st quartile (25%) = 1; 3rd quartile (75%) = 4; interquartile range = 3

Note. Interquartile ranges calculated as the difference between the third (75%) and first (25%) quartiles. “Combined time” consolidates reported endorsements at Times 1 and 2.
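The “Combined time” rows can be reproduced by summing the Time 1 and Time 2 counts and dividing by the 386 pooled responses per item (193 participants at each of two time points). The short check below uses the counts from Table 2; the shortened item labels are abbreviations introduced for this sketch.

```python
# Yes / no / missing counts per item, copied from Table 2 (labels abbreviated).
TIME1 = {
    "Wish to be dead": (112, 74, 7),
    "Nonspecific active suicidal thoughts": (119, 67, 7),
    "Methods without intent": (71, 48, 74),
    "Some intent, no specific plan": (59, 60, 74),
    "Specific plan and intent": (48, 71, 74),
}
TIME2 = {
    "Wish to be dead": (82, 65, 46),
    "Nonspecific active suicidal thoughts": (90, 58, 45),
    "Methods without intent": (50, 40, 103),
    "Some intent, no specific plan": (42, 48, 103),
    "Specific plan and intent": (30, 60, 103),
}


def combine(t1, t2):
    """Sum yes/no/missing counts across the two time points, item by item."""
    return {item: tuple(a + b for a, b in zip(t1[item], t2[item])) for item in t1}


if __name__ == "__main__":
    for item, counts in combine(TIME1, TIME2).items():
        total = sum(counts)  # 386 pooled responses per item
        percents = [round(100 * c / total) for c in counts]
        print(item, counts, percents)
```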

Table 3.

Test–Retest Reliabilities (Cohen’s Kappa) Between Columbia Suicide Severity Rating Scale Items Assessed at Times 1 and 2.

Rows are Time 1 items; columns are Time 2 items: (1) wish to be dead, (2) nonspecific active suicidal thoughts, (3) active suicidal ideation with any methods (not plan) without intent to act, (4) active suicidal ideation with some intent to act, without specific plan, and (5) active suicidal ideation with specific plan and intent.

Time 1 item  Time 2: (1)  (2)  (3)  (4)  (5)
(1) Wish to be dead  .445 (p < .001)  .281 (p = .001)  .132 (p = .199)  .200 (p = .040)  .226 (p = .007)
(2) Nonspecific active suicidal thoughts  .336 (p < .001)  .350 (p < .001)  .238 (p = .013)  .231 (p = .010)  .204 (p = .006)
(3) Active suicidal ideation with any methods (not plan) without intent to act  .221 (p = .034)  .200 (p = .045)  .608 (p < .001)  .346 (p = .004)  .311 (p = .004)
(4) Active suicidal ideation with some intent to act, without specific plan  .045 (p = .647)  .182 (p = .049)  .376 (p = .002)  .540 (p < .001)  .407 (p = .001)
(5) Active suicidal ideation with specific plan and intent  .132 (p = .130)  .120 (p = .125)  .303 (p = .007)  .434 (p < .001)  .561 (p < .001)

Note. Values on the diagonal (the same Columbia Suicide Severity Rating Scale item assessed at Times 1 and 2) are the test–retest reliability coefficients.

Our first analytic step assessed the equivalence of the lifetime CSSRS factor structure between- and within-person under two conditions: (a) where the between- and within-person structures were equal and (b) where the between- and within-person structures were different. The former model (nonhierarchical model) allowed the five CSSRS items to load freely onto a single latent structure for suicidal ideation whereas the latter model (two-level MCFA) allowed the five CSSRS items to load freely onto two separate levels (i.e., between- and within-person). The two-level MCFA model structure provided better fit, χ2(5, N = 193) = 41.28, p < .001, which suggests that the between- and within-person item contributions differed (Table 4). In light of this finding, the two-level MCFA model was retained for the second analysis.

Table 4.

Properties of the Preferred Two-Level Model Retained From the First Analytic Step.

Parameter  Estimate  Standard error  p

Within-person level
Factor loadings
 CSSRS1  1.00a  0.00  n/a
 CSSRS2  1.79  3.31  .59
 CSSRS3  1.34  1.45  .36
 CSSRS4  1.29  1.80  .47
 CSSRS5  3.13  5.61  .58
Factor variance
 CSSRSW  1.18  2.01  .56

Between-person level
Factor loadings
 CSSRS1  1.00a  0.00  n/a
 CSSRS2  1.74  1.18  .14
 CSSRS3  3.29*  1.61  .04
 CSSRS4  2.83*  0.98  .00
 CSSRS5  4.93  4.33  .25
Thresholds
 CSSRS1$1  −0.54  0.23  .02
 CSSRS2$1  −1.19  0.77  .12
 CSSRS3$1  1.33  0.76  .08
 CSSRS4$1  2.04  0.62  .00
 CSSRS5$1  5.45  4.87  .26
Factor variance
 CSSRSB  2.02  0.92  .03

Note. CSSRS = Columbia Suicide Severity Rating Scale. CSSRSW and CSSRSB are the within- and between-person factors; $1 denotes an item's threshold; n/a = not applicable (fixed parameter).
a Indicates item loading equated to “1” (not estimated) for the “marker variable” method of latent variable identification (Kline, 2005).
* Denotes significant loading at p < .05.

In our second analytic step, comparisons of two-level models indicated that the within-person loadings were equivalent, χ2(4, N = 193) = 4.184, p = .382, but the between-person loadings were not, χ2(4, N = 193) = 23.414, p < .001. This suggests that CSSRS items were equally important within people over time. Items were differentially discriminatory between people, however, suggesting that some items were more useful than others for distinguishing between individuals with varying levels of suicide risk. As a follow-up, we examined a model in which only the between-person loadings were fixed to one, but this resulted in model misspecification, indicating a problem with sparseness under this strict measurement assumption. The two-level MCFA “additive within–differential between” model was therefore retained for the third analysis.

In our third analytic step, we used the model with within-person loadings fixed to one and between-person loadings free to differ, and added the constraint that the thresholds must be in order, such that the probability of endorsing each successive item must decline. Imposing this monotonicity requirement yielded a nearly equivalent BIC even using liberal criteria (1480.218 vs. 1477.422, respectively), suggesting that forced ordering did not decrease fit and that the response probabilities for CSSRS items do maintain a Mokken-style ordering. Table 5 contains the loadings, factor variances, and thresholds from the final model, in which monotonicity of the thresholds was forced.

Table 5.

Properties of the Final Model: Two-Level Structure, Between-Person Estimated (Within-Person Constrained), and Forced-Order (Monotonic) Thresholds Per Item.

Parameter  Estimate  Standard error  p

Within-person level
Factor loadings
 CSSRS1  1.00a  0.00  n/a
 CSSRS2  1.00a  0.00  n/a
 CSSRS3  1.00a  0.00  n/a
 CSSRS4  1.00a  0.00  n/a
 CSSRS5  1.00a  0.00  n/a
Factor variance
 CSSRSW  2.13  0.78  .01

Between-person level
Factor loadings
 CSSRS1  1.00b  0.00  n/a
 CSSRS2  1.23*  0.34  .00
 CSSRS3  2.89*  1.05  .01
 CSSRS4  2.75*  1.01  .01
 CSSRS5  3.05*  0.80  .00
Thresholds
 CSSRS1$1  −0.76  0.21  .00
 CSSRS2$1  −0.76  0.21  .00
 CSSRS3$1  1.26  0.60  .04
 CSSRS4$1  2.14  0.64  .00
 CSSRS5$1  3.52  0.72  .00
Factor variance
 CSSRSB  2.50  1.17  .03

Note. CSSRS = Columbia Suicide Severity Rating Scale. CSSRSW and CSSRSB are the within- and between-person factors; $1 denotes an item's threshold; n/a = not applicable (fixed parameter).
a Identifies variables with fixed factor loadings to assess additive item utility.
b Indicates item loading equated to “1” (not estimated) for the “marker variable” method of latent variable identification (Kline, 2005).
* Denotes significant loading at p < .05.

Discussion

The present findings provide new insight into the structure of the lifetime CSSRS Suicidal Ideation subscale, which is a commonly used scale in clinical and research settings. First, our results suggest that a two-level structure fit the lifetime CSSRS data better than a between-within equivalent structure, suggesting that the scale functions differently within and between individuals. Second, lifetime CSSRS items possess differential importance when making comparisons across multiple people, but may be of equal utility when considering response stability and change within people over time. When considered in combination, these findings suggest that respondents generally answer lifetime CSSRS Suicidal Ideation items in a consistent manner over time, but comparing lifetime CSSRS scores between people may not be so simple. Specifically, some items appear to provide more information about suicide risk than others, suggesting that the 0 to 5 scale commonly used to distinguish lower from higher risk patients (or research subjects) may not be meaningful.

Consistent with a hierarchical structure, we observed that lower level items captured less information about respondent suicidal ideation than higher level items. However, accounting for this hierarchical structure simultaneously with observed differences in scale functionality within and between respondents suggests that variation across people on the lower end of the scale is more error prone than variation on the higher end of the scale. In other words, the lifetime CSSRS items that are often administered to all respondents as an initial screener are the items that are most problematic. This may be due to ambiguous phrasing that may contribute to differential interpretations and wide response ranges among respondents (Giddens et al., 2014). Our results further suggest that, consistent with Mokken scales, the 0 to 5 scoring approach provided a reasonable representation of suicide risk within individuals, but not between individuals. Specifically, lower level items provided less useful information about suicide risk than higher level items. The difference between a score of 0 and a score of 1 was therefore less meaningful than the difference between a score of 3 and a score of 4. Furthermore, because the CSSRS employs a conditional response design wherein higher level items are typically presented only if the lower level items are positively endorsed, the present results suggest that the scale could result in larger than expected measurement error when used for the purposes of triaging patients and/or drawing conclusions about differences in suicide risk across multiple individuals or research participants.

The observed between-person thresholds associated with each lifetime CSSRS item further suggest that 1-point increases in scale score do not necessarily represent equivalent increases in suicide risk. A 1-point increase in score from 1 to 2, for instance, represented a negligible difference in suicide risk. By comparison, a 1-point increase from 2 to 3 represented an increase of 2 standard deviations in suicide risk, a 1-point increase from 3 to 4 represented an increase of approximately 0.9 standard deviations, and a 1-point increase from 4 to 5 represented an increase of 1.4 standard deviations.
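The uneven spacing can be made explicit with simple arithmetic: summing the between-person increments reported above gives the implied distance, in standard deviations, between any two scores from 1 to 5. This Python sketch codes the "negligible" 1-to-2 step as 0.0, which is an assumption, and omits the 0-to-1 step because it is not reported in this paragraph.

```python
# Between-person increments reported in the text, in SD units.
# ASSUMPTION: the "negligible" 1 -> 2 step is coded as 0.0; the 0 -> 1 step
# is not reported here and is therefore left out.
STEP_SD = {
    (1, 2): 0.0,
    (2, 3): 2.0,
    (3, 4): 0.9,  # reported as "approximately 0.9"
    (4, 5): 1.4,
}


def latent_distance(low: int, high: int) -> float:
    """Sum the SD increments between two CSSRS severity scores (1-5)."""
    if not 1 <= low <= high <= 5:
        raise ValueError("scores must satisfy 1 <= low <= high <= 5")
    return sum(STEP_SD[(s, s + 1)] for s in range(low, high))


for s in range(1, 5):
    print(f"{s} -> {s + 1}: {latent_distance(s, s + 1):.1f} SD")
print(f"1 -> 3: {latent_distance(1, 3):.1f} SD; 3 -> 5: {latent_distance(3, 5):.1f} SD")
```

Under these assumptions, scores of 1 and 3 differ by about 2.0 SD while scores of 3 and 5 differ by about 2.3 SD, even though both pairs are 2 points apart.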

From a clinical perspective, our results have several implications. First, between-person measurement of lifetime suicide risk may be improved by eliminating the scale's conditional response design, such that all five Severity of Ideation items are administered routinely, or by eliminating the scale's first two items, such that only the three highest ranking items are administered routinely. Additional research is needed to determine whether either or both of these options could sufficiently address the measurement issues identified in this study. Second, an alternative scoring and scaling method should be developed and tested, as the present findings suggest that conclusions drawn from comparing scores between patients or research participants may be less meaningful than desired and even prone to error. In particular, caution may be warranted when clinical decisions are based solely on the CSSRS Severity of Ideation score: patients with different scores may have similar levels of suicide risk, and patients with the same score may have meaningfully different levels of suicide risk. With larger samples, for instance, it would be possible to develop scale norms in which the item loadings serve as weights, yielding a more accurate ranking of scores that could function both within and between people over time. Future studies should investigate this possibility to better understand its implications for clinical practice and research.
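One way the proposed loading-based weighting could work is sketched below in Python. It assumes a simple weighted sum of endorsed items followed by standardization against a norm sample; the loadings, the norm sample, and the function names are hypothetical placeholders rather than values or procedures from this study.

```python
from statistics import mean, pstdev
from typing import Sequence

# HYPOTHETICAL loadings for items 1-5 (lower level items weighted less);
# these are illustrative values, not estimates from the present study.
HYPOTHETICAL_LOADINGS = (0.40, 0.55, 0.80, 0.90, 0.95)


def weighted_score(pattern: Sequence[int],
                   loadings: Sequence[float] = HYPOTHETICAL_LOADINGS) -> float:
    """Sum the loadings of endorsed items (pattern holds five 0/1 values)."""
    if len(pattern) != len(loadings):
        raise ValueError("pattern and loadings must have the same length")
    return sum(w * x for w, x in zip(loadings, pattern))


def normed_score(score: float, norm_sample: Sequence[float]) -> float:
    """Express a weighted score as a z-score relative to a norm sample."""
    sd = pstdev(norm_sample)
    if sd == 0:
        raise ValueError("norm sample has no variability")
    return (score - mean(norm_sample)) / sd


# Two hypothetical respondents who both receive a conventional score of 3
# (item 3 is the highest item endorsed) but differ on item 1.
a = [1, 1, 1, 0, 0]
b = [0, 1, 1, 0, 0]
print(round(weighted_score(a), 2), round(weighted_score(b), 2))  # 1.75 1.35
```

With these illustrative weights, respondents who share a conventional score receive different weighted scores; in practice, the weights would come from a fitted model and the norms from a large reference sample.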

Several limitations of the present study warrant discussion. First, confirming the item order of a true Mokken scale requires fitting two homogeneity models. The final model presented in the current study includes features indicating differential importance (i.e., loadings) and endorsement probability (i.e., thresholds), which correspond, respectively, to the discrimination and difficulty parameters of the monotone homogeneity (MH) model of Mokken scaling (Niemoller & Van Schuur, 1983; Van Schuur, 2003). However, Van Schuur (2003) indicated that the MH model alone is not sufficient to confirm that all participants perceive the same item ordering. The double homogeneity (DH) model typically provides this confirmation after the MH model has been fit, but DH model procedures have not been adapted for multilevel-structured data (i.e., individual responses over time nested within sample group in the present study). Second, our sample was modest in size, which could have inflated the Type 1 error rate for the between- and within-person CFA models. Type 1 error rates for CFA have been found to stabilize near 5% with sample sizes of 300 to 500 participants (Bandalos, 2014).
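A simple descriptive check related to this limitation is to count Guttman errors, that is, item pairs in which a respondent endorses a higher ranking item but not a lower ranking one. The Python sketch below assumes the five items are listed in the hypothesized order; it is not the DH model, it ignores the multilevel structure of the data, and the response patterns are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def guttman_errors(pattern: Sequence[int]) -> int:
    """Count item pairs in which a harder item is endorsed but an easier one is not.

    ASSUMPTION: items are listed from easiest (most often endorsed) to hardest
    (least often endorsed), i.e., in the hypothesized hierarchical order.
    """
    return sum(1 for i, j in combinations(range(len(pattern)), 2)
               if pattern[j] and not pattern[i])


def error_free_share(patterns: Sequence[Sequence[int]]) -> float:
    """Proportion of response patterns fully consistent with the item order."""
    if not patterns:
        raise ValueError("no response patterns supplied")
    return sum(guttman_errors(p) == 0 for p in patterns) / len(patterns)


# Hypothetical response patterns for items 1-5 (1 = endorsed).
patterns = [[1, 1, 1, 0, 0], [0, 1, 1, 0, 0], [1, 1, 1, 1, 1], [0, 0, 0, 0, 0]]
print([guttman_errors(p) for p in patterns])  # [0, 2, 0, 0]
print(error_free_share(patterns))             # 0.75
```

A high share of error-free patterns is consistent with a common item ordering but does not by itself confirm that all respondents perceive the same order.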

Third, the present study included only two assessment points, which could raise questions about our use of CFA model fit comparisons rather than a measurement invariance model (e.g., comparing CSSRS properties at Time Point 1 vs. Time Point 2) to assess the psychometric properties of the lifetime CSSRS. The analytic approach of the present study is intended to serve as a prototype for later psychometric studies that use between- and within-level data collected at many more than two time points. Fourth, our study administered only the lifetime version of the CSSRS rather than the past-week or since-last-assessment versions, which are more commonly used for tracking and monitoring within-person change in suicide risk. Our findings regarding the lifetime CSSRS's within-person performance therefore likely reflect consistency in responding over time rather than actual change in acute suicide risk. Additional research using these alternative versions of the CSSRS is needed to provide a more definitive test of the scale's within-person properties. Last, generalizability may be limited by the nature of the study sample, which was predominantly male and composed entirely of military personnel and veterans. Future studies using more diverse samples are needed to replicate these findings. Generalizability to other groups with varying levels of suicide risk (e.g., psychiatric inpatients, who have higher suicide risk on average, and community samples, which have lower suicide risk on average) may also be limited, because different groups may interpret and respond to CSSRS items in different ways. Relatedly, CSSRS items may perform differently in clinical and nonclinical settings because of differing demand characteristics. Additional studies designed to test various aspects of measurement invariance are needed to examine these possibilities.

Overall, additional research with larger samples and more repeated assessment points per respondent is needed to further test the psychometric properties of all versions of the CSSRS. Such studies could provide further psychometric insight into the utility of CSSRS Suicidal Ideation subscale items at the between- and within-person levels and facilitate scale norming. Further development of measures like the CSSRS could improve extant screening procedures (e.g., the Veterans Affairs Universal Suicide Screen; Peterson et al., 2018) by enhancing detection of high-risk respondents.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Funding for this work was made possible by the National Institute of Mental Health of the National Institutes of Health under award R01MH117600 (PI: Craig Bryan). The views expressed herein are solely those of the authors and do not reflect an endorsement by or the official policy or position of the National Institutes of Health or the U.S. Government.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

  1. Andrews JA, & Lewinsohn PM (1992). Suicidal attempts among older adolescents: Prevalence and co-occurrence with psychiatric disorders. Journal of the American Academy of Child & Adolescent Psychiatry, 31(4), 655–662. 10.1097/00004583-199207000-00012
  2. Bandalos DL (2014). Relative performance of categorical diagonally weighted least squares and robust maximum likelihood estimation. Structural Equation Modeling, 21(1), 102–116. 10.1080/10705511.2014.859510
  3. Borges G, Walters EE, & Kessler RC (2000). Associations of substance use, abuse, and dependence with subsequent suicidal behavior. American Journal of Epidemiology, 151(8), 781–789. 10.1093/oxfordjournals.aje.a010278
  4. Columbia Lighthouse Project. (2017). Guidelines for triage using the C-SSRS: Examples for using the scale for triage in a clinical setting. https://cssrs.columbia.edu/documents/clinical-triage-guidelines-using-c-ssrs/
  5. Conner KR, Duberstein PR, Beckman A, Heisel MJ, Hirsch JK, Gamble S, & Conwell Y (2007). Planning of suicide attempts among depressed inpatients ages 50 and over. Journal of Affective Disorders, 97(1–3), 123–128. 10.1016/j.jad.2006.06.003
  6. Conner KR, Hesselbrock VM, Schuckit MA, Hirsch JK, Knox KL, Meldrum S, Bucholz KK, Kramer J, Kuperman S, Preuss U, & Soyka M (2006). Precontemplated and impulsive suicide attempts among individuals with alcohol dependence. Journal of Studies on Alcohol and Drugs, 67(1), 95–101. 10.15288/jsa.2006.67.95
  7. Geldhof GJ, Preacher KJ, & Zyphur MJ (2014). Reliability estimation in a multilevel confirmatory factor analysis framework. Psychological Methods, 19(1), 72–91. 10.1037/a0032138
  8. Giddens JM, Sheehan KH, & Sheehan DV (2014). The Columbia-Suicide Severity Rating Scale (C-SSRS): Has the “Gold Standard” become a liability? Innovations in Clinical Neuroscience, 11(9–10), 66–80. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4267801/
  9. Guttman L (1950). The basis for scalogram analysis. In Stouffer SA, et al. (Eds.), Measurement and prediction (Studies in social psychology in World War II, Vol. 4, pp. 60–90). Princeton University Press.
  10. Jeon HJ, Lee JY, Lee YM, Hong JP, Won SH, Cho SJ, Kim JY, Chang SM, Lee HW, & Cho MJ (2010). Unplanned versus planned suicide attempters, precipitants, methods, and an association with mental disorders in a Korea-based community sample. Journal of Affective Disorders, 127(1–3), 274–280. 10.1016/j.jad.2010.05.027
  11. Jiang Y, Perry DK, & Hesser JE (2010). Suicide patterns and association with predictors among Rhode Island Public High School students: A latent class analysis. American Journal of Public Health, 100(9), 1701–1707. 10.2105/AJPH.2009.183483
  12. Kessler RC, Borges G, & Walters EE (1999). Prevalence of and risk factors for lifetime suicide attempts in the National Comorbidity Survey. Archives of General Psychiatry, 56(7), 617–626. 10.1001/archpsyc.56.7.617
  13. Kline RB (2005). Principles and practice of structural equation modeling (3rd ed.). Guilford Press.
  14. Landis JR, & Koch GG (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174. 10.2307/2529310
  15. McKeown RE, Garrison CZ, Cuffe SP, Waller JL, Jackson KL, & Addy CL (1998). Incidence and predictors of suicidal behaviors in a longitudinal sample of young adolescents. Journal of the American Academy of Child & Adolescent Psychiatry, 37(6), 612–619. 10.1097/00004583-199806000-00011
  16. Military Health System. (2017, September 5). Suicide prevention: Each of us has an important role to play. Health.mil. https://health.mil/News/Articles/2017/09/05/Suicide-Prevention-Each-of-us-has-an-important-role-to-play
  17. Millner AJ, Lee MD, & Nock MK (2016). Describing and measuring the pathway to suicide attempts: A preliminary study. Suicide and Life-Threatening Behavior, 47(3), 353–369. 10.1111/sltb.12284
  18. Mokken RJ (1971). A theory and procedure of scale analysis with applications in political research. De Gruyter. 10.1515/9783110813203
  19. Muthén BO, & Satorra A (1995). Complex sample data in structural equation modeling. Sociological Methodology, 25, 267–316. 10.2307/271070
  20. Muthén LK, & Muthén BO (1998–2017). Mplus user’s guide (8th ed.). Muthén & Muthén.
  21. Niemoller B, & Van Schuur WH (1983). Stochastic models for unidimensional scaling: Mokken and Rasch. In McKay D, Schofield N, & Whiteley P (Eds.), Data analysis and the social sciences (pp. 120–170). Frances Pinter.
  22. Nilsson ME, Suryawanshi S, Gassmann-Mayer C, Dubrava S, McSorley P, & Jiang K (2013). Columbia–Suicide Severity Rating Scale scoring and data analysis guide. https://cssrs.columbia.edu/wp-content/uploads/ScoringandDataAnalysisGuide-for-Clinical-Trials-1.pdf
  23. Nock MK, Holmberg EB, Photos VI, & Michel BD (2007). The self-injurious thoughts and behaviors interview: Development, reliability, and validity in an adolescent sample. Psychological Assessment, 19(3), 309–317. 10.1037/1040-3590.19.3.309
  24. Nock MK, Stein MB, Heeringa SG, Ursano RJ, Colpe LJ, Fullerton CS, Hwang I, Naifeh JA, Sampson NA, Schoenbaum M, Zaslavsky AM, & Kessler RC (2014). Prevalence and correlates of suicidal behavior among soldiers: Results from the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS). JAMA Psychiatry, 71(5), 514–522. 10.1001/jamapsychiatry.2014.30
  25. Paykel ES, Myers JK, Lindenthal JJ, & Tanner J (1974). Suicidal feelings in the general population: A prevalence study. British Journal of Psychiatry, 124(582), 460–469. 10.1192/bjp.124.5.460
  26. Peterson K, Anderson J, & Bourne D (2018). Evidence brief: Suicide prevention in veterans (VA ESP Project #09–199). https://www.ncbi.nlm.nih.gov/books/NBK535971/pdf/Bookshelf_NBK535971.pdf
  27. Posner K, Brown GK, Stanley B, Brent DA, Yershova KV, Oquendo MA, Currier GW, Melvin GA, Greenhill L, Shen S, & Mann JJ (2011). The Columbia-Suicide Severity Rating Scale: Initial validity and internal consistency findings from three multisite studies with adolescents and adults. American Journal of Psychiatry, 168(12), 1266–1277. 10.1176/appi.ajp.2011.10111704
  28. Shelef L, Klomek AB, Fruchter E, Kedem R, Mann JJ, & Zalsman G (2019). Suicide ideation severity is associated with severe suicide attempts in a military setting. European Psychiatry, 61, 49–55. 10.1016/j.eurpsy.2019.06.005
  29. Ursano RJ, Kessler RC, Stein MB, Naifeh JA, Aliaga PA, Fullerton CS, Sampson NA, Kao T-C, Colpe LJ, Schoenbaum M, Cox KL, & Heeringa SG (2015). Suicide attempts in the US Army during the wars in Afghanistan and Iraq, 2004 to 2009. JAMA Psychiatry, 72(9), 917–926. 10.1001/jamapsychiatry.2015.0987
  30. Van Schuur WH (2003). Mokken scale analysis: Between the Guttman Scale and parametric item response theory. Political Analysis, 11(2), 139–163. 10.1093/pan/mpg002
  31. Weathers FW, Litz BT, Keane TM, Palmieri PA, Marx BP, & Schnurr PP (2013). The PTSD Checklist for DSM-5 (PCL-5) [Scale]. www.ptsd.va.gov
  32. Wyder M, & De Leo D (2007). Behind impulsive suicide attempts: Indications from a community study. Journal of Affective Disorders, 104(1–3), 163–173. 10.1016/j.jad.2007.02.015
