Abstract
Background:
Although several short-forms of the PTSD Checklist (PCL) exist, all were developed using heuristic methods. This report presents results of analyses designed to create an optimal short-form PCL for DSM-5 (PCL-5) using both machine learning and conventional scale development methods.
Methods:
The short-form scales were developed using independent datasets collected by the Army Study to Assess Risk and Resilience among Servicemembers. We began by using a training dataset (n = 8,917) to fit short-form scales with between 1 and 8 items using different statistical methods (exploratory factor analysis, stepwise logistic regression, and a new machine learning method to find an optimal integer-scored short-form scale) to predict dichotomous PTSD diagnoses determined using the full PCL-5. A smaller subset of best short-form scales was then evaluated in an independent validation sample (n = 11,728) to select one optimal short-form scale based on multiple operating characteristics (AUC, calibration, sensitivity, specificity, net benefit).
Results:
Inspection of AUCs in the training sample and replication in the validation sample led to a focus on 4-item integer-scored short-form scales selected with stepwise regression. Brier scores in the validation sample showed that a number of these scales had comparable calibration (.015-.032) and AUC (.984-.994), but that one had consistently highest net benefit across a plausible range of decision thresholds.
Conclusions:
The recommended 4-item integer-scored short-form PCL-5 generates diagnoses that closely parallel those of the full PCL-5, making it well-suited for screening.
Keywords: Diagnosis, Military personnel, Psychological tests/Psychometrics, Trauma and stressor related disorders
Introduction
Posttraumatic stress disorder (PTSD) is a commonly-occurring and seriously impairing disorder (Koenen et al., 2017) with a low treatment rate (Thornicroft et al., 2018). Given that screening is effective in detecting PTSD (Warner, Warner, Appenzeller, & Hoge, 2013), several validated screening scales have been developed for this purpose (Gates et al., 2012; Parker-Guilbert, Moshier, Marx, & Keane, 2018; Wisco, Marx, & Keane, 2012). The PTSD Checklist (PCL; Weathers, Litz, Herman, Huska, & Keane, 1993; Weathers et al., 2013) is one of the most widely-used of these scales (Elhai, Gray, Kashdan, & Franklin, 2005; Hoge, Riviere, Wilk, Herrell, & Weathers, 2014). The DSM-5 version of the PCL (PCL-5) assesses each of the 20 DSM-5 (American Psychiatric Association, 2013) Criteria B-E symptoms of PTSD and is recommended for screening and monitoring PTSD symptoms throughout treatment in the Department of Veterans Affairs and Department of Defense (VA/DoD; Department of Veterans Affairs, 2017).
Although the PCL-5 has excellent psychometric properties (Blevins, Weathers, Davis, Witte, & Domino, 2015; Bovin et al., 2016; Keane et al., 2014; Keane et al., 2015; Wortmann et al., 2016), one weakness is the scale’s length (5–10 minutes completion time; National Center for PTSD), which is problematic given that VA/DoD also recommend screening for many other psychiatric disorders (U.S. Department of Veterans Affairs). To reduce respondent burden, several short-form (2–6 item) versions of the DSM-IV PCL (Bliese et al., 2008; Lang & Stein, 2005) and PCL-5 (Price, Szafranski, van Stolk-Cooke, & Gros, 2016) have been created along with a computer-adaptive version of the PCL-5 (Finkelman et al., 2017, 2018). These short-forms are limited, though, either because they were developed using heuristic methods or, in the case of computer-adaptive testing, cannot be used with paper and pencil administration. Furthermore, research on comparative performance of the different short-form PCLs is limited (Tiet et al., 2013), creating uncertainty about the optimal number and content of items (Bressler, Erford, & Dean, 2018).
We carried out a secondary analysis of the Army Study to Assess Risk and Resilience among Servicemembers (Army STARRS; Ursano et al., 2014) to develop an optimal short-form PCL-5 using machine learning methods and conventional statistical methods like those used to develop earlier short-forms. Scale development and validation were based on separate subsamples of respondents. The results of these analyses are reported in this paper.
Materials and Methods
Samples
Army STARRS was a 2009–2015 epidemiological-neurobiological study of risk-protective factors for suicidal behaviors among U.S. Army soldiers (Ursano et al., 2014). We used data from several Army STARRS surveys to create two independent samples for analysis: one in which our models were developed (Training Sample) and the other in which these models were tested (Validation Sample):
We used data from the Army STARRS Pre-Post Deployment Study (PPDS) for model development. The PPDS was a four-wave panel survey of three Brigade Combat Teams initially surveyed before deployment to Afghanistan (T0; October 2011-February 2012; n=8,558), then shortly after returning from Afghanistan (T1; September 2012-February 2013), 1–2 months later (T2; October 2012-March 2013) and 9–15 months later (T3; June 2013-May 2014). Because PCL-5 only became available for PPDS T2-T3, these waves were our training sample (n=8,365 in T2 and n=552 in T3 but not T2).
The validation sample consisted of respondents to the Army STARRS Longitudinal Survey (LS), an ongoing follow-up study of Army STARRS survey respondents, who were not in PPDS T2-T3 (n=11,728; including n=6,280 ever-deployed and n=5,448 never-deployed). The two Army STARRS surveys in this segment of STARRS-LS included: (1) the New Soldier Study (NSS; January 2011–November 2012) of new soldiers interviewed within 48 hours of reporting for Basic Combat Training (BCT; n=39,132) and (2) the All Army Study (AAS; January 2011-March 2013) of active duty soldiers not in basic training nor deployed to a combat theatre (n=24,894).
The recruitment and consent procedures for all these surveys, which are discussed in more detail elsewhere (Heeringa et al., 2013; Kessler, Colpe, et al., 2013), were approved by the Human Subjects Committees of all Army STARRS collaborating organizations.
Measures
PCL-5.
The PCL-5 includes 20 questions to evaluate the presence and severity of the 20 DSM-5 Criteria B-E symptoms of PTSD over the past month (0=not at all to 4=extremely). Probable clinical diagnoses of DSM-5 PTSD were assigned based on PCL-5 responses using four PTSD diagnostic thresholds validated against DSM-IV PCL cutoffs in prior work (e.g., Hoge et al., 2014): one threshold based on DSM-5 scoring (i.e., at least one PCL-5 item for Criteria B and C and two for Criteria D and E endorsed at a score of 2=moderately or higher) and three thresholds based on total PCL scores >=28, >=32, and >=38. We aimed to create short-form PCL-5 scales that would reproduce each of these diagnoses derived from the full PCL-5 using responses to a subset of the 20 questions.
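The four dichotomous outcomes described above can be sketched as follows. This is a minimal illustration rather than the study's scoring code: the function name `pcl5_outcomes` and the output keys are hypothetical, and the item-to-criterion layout (items 1–5 for Criterion B, 6–7 for C, 8–14 for D, 15–20 for E) is assumed from the standard PCL-5 ordering, which the text does not spell out.

```python
# Item-to-criterion layout of the 20-item PCL-5 (standard ordering; an
# assumption here, since the text does not list item numbers).
CRITERIA = {
    "B": range(0, 5),    # items 1-5: intrusion
    "C": range(5, 7),    # items 6-7: avoidance
    "D": range(7, 14),   # items 8-14: negative alterations in cognition/mood
    "E": range(14, 20),  # items 15-20: arousal and reactivity
}
REQUIRED = {"B": 1, "C": 1, "D": 2, "E": 2}  # endorsed symptoms needed per criterion
TOTAL_CUTOFFS = (28, 32, 38)


def pcl5_outcomes(responses):
    """Return the four dichotomous outcomes for one respondent.

    responses: list of 20 integers, each 0 (not at all) to 4 (extremely).
    """
    if len(responses) != 20 or any(r not in range(5) for r in responses):
        raise ValueError("expected 20 responses scored 0-4")
    # A symptom counts as endorsed at 2 = moderately or higher.
    dsm5 = all(
        sum(responses[i] >= 2 for i in items) >= REQUIRED[criterion]
        for criterion, items in CRITERIA.items()
    )
    total = sum(responses)
    outcomes = {"dsm5_b_e": dsm5}
    for cutoff in TOTAL_CUTOFFS:
        outcomes[f"total_ge_{cutoff}"] = total >= cutoff
    return outcomes
```

A respondent can meet the DSM-5 rule with a low total score (six items at 2 give a total of 12), which is why the four outcomes are treated as distinct prediction targets.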
Psychopathological correlates.
We evaluated the convergent and discriminant validity of our short-form measure compared to the full PCL-5 by comparing their associations with known correlates that have been examined in prior psychometric work on the PCL-5 (Bovin et al., 2016) in the validation sample. The correlates considered were measures of DSM-IV major depressive episode, generalized anxiety disorder, panic disorder, and intermittent explosive disorder in the 30 days before the survey based on the self-administered version of the Composite International Diagnostic Interview Screening Scales (CIDI-SC; Kessler, Calabrese, et al., 2013). Good concordance exists between CIDI-SC diagnoses and diagnoses based on blinded clinical reappraisal interviews with the Structured Clinical Interview for DSM-IV (Kessler, Santiago, et al., 2013). Suicide ideation in the 30 days before the LS1 survey was assessed with a modified version of the Columbia-Suicide Severity Rating Scale (C-SSRS; Posner et al., 2011) that asked about lifetime history of active (i.e., "Did you ever in your life have thoughts of killing yourself?") and passive (i.e., "Did you ever wish you were dead or would go to sleep and never wake up?") ideation and recency in the 30 days before the survey to create a single dichotomous variable of presence/absence of recent suicide ideation.
Socio-demographic correlates.
We also compared associations of diagnoses based on our final short-form PCL-5 and full PCL-5 with several socio-demographic variables, including sex, low education (no education beyond high school graduation or GED), junior enlisted rank (E1-E4), and history of multiple combat deployments (2 vs. 0–1), all assessed with administrative records, and self-reported minority status (Non-Hispanic Black or Hispanic).
Analysis methods
We created short-form PCL-5 scales using five statistical methods: three methods that aimed to produce the same integer scoring system as the full PCL-5 (which can be scored without a computer or a calculator) and two methods that used weighted scoring.
The first integer-scored method used RiskSLIM (Risk-Calibrated Supersparse Linear Integer Model; Ustun & Rudin, 2017), which is a machine learning algorithm to efficiently find the best-fitting logistic regression model that has small integer weights and obeys custom constraints. RiskSLIM optimized prediction of dichotomized PTSD diagnostic outcomes in the full PCL-5 (see Measures section) from responses to between one and eight PCL-5 questions. Similar to prior work (Ustun et al., 2017; Ustun & Rudin, 2017), each model was required to obey constraints so that it would use a fixed number of questions (1–8) and produce a positive integer-valued score that was monotonic across response levels. One possible RiskSLIM integer scoring of the 0–4 PCL-5 response categories is 0,1,1,1,1. This is equivalent to dichotomous yes-no scoring, as in the Primary Care PTSD Screen for DSM-5 (PC-PTSD-5), a short screening scale often used in VA settings rather than a short-form PCL-5 (Prins et al., 2016).
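To make the scoring constraint concrete, the sketch below represents an integer scoring system as a map from item to the points assigned to each of the five response levels and checks that the points never decrease across levels. It does not call or reproduce RiskSLIM itself; the function names, the item indices, and the example point values are all hypothetical, and non-negative points are assumed because the 0,1,1,1,1 example in the text assigns 0 to the lowest response.

```python
def is_monotone_scoring(points):
    """True if points for responses 0..4 are non-negative integers that never decrease."""
    return (
        len(points) == 5
        and all(isinstance(p, int) and p >= 0 for p in points)
        and all(a <= b for a, b in zip(points, points[1:]))
    )


def integer_score(responses, scoring):
    """Sum item points.

    responses: list of 0-4 answers indexed by item position.
    scoring: dict mapping item position -> 5-tuple of points for responses 0..4.
    """
    for item, points in scoring.items():
        if not is_monotone_scoring(points):
            raise ValueError(f"non-monotone scoring for item {item}")
    return sum(points[responses[item]] for item, points in scoring.items())


# Hypothetical two-item scale: one item scored yes/no, one scored 0-4.
example_scoring = {2: (0, 1, 1, 1, 1), 6: (0, 1, 2, 3, 4)}
```

Under this representation, the standard PCL-5 scoring is the special case in which every selected item uses the points (0, 1, 2, 3, 4).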
In addition to RiskSLIM, we used two other statistical methods, each generating both integer-scored and weighted short-form scales. The first was forward stepwise logistic regression to select between one and eight items to predict the same dichotomous PTSD diagnostic outcomes as in RiskSLIM. We summed the 0–4 responses to the selected items to create integer-scored versions and created the weighted versions by multiplying the regression coefficients by the 0–4 responses, summing, and transforming the logit to create predicted probabilities of the diagnostic outcome. The second statistical method was to select between one and eight items based on strength of loadings in a unidimensional exploratory factor analysis of all PCL-5 questions. Integer-scored and weighted versions were created as in the stepwise scales by summing the 0–4 response scores (integer-scored) and estimating logistic regression equations to generate weighted versions with logit-transformed predicted probabilities.
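The difference between the integer-scored and weighted versions of a scale can be illustrated as follows. The item positions, intercept, and coefficients passed to these functions are placeholders chosen by the caller, not estimates from the study.

```python
import math


def integer_scored(responses, items):
    """Integer-scored version: sum the 0-4 responses to the selected items."""
    return sum(responses[i] for i in items)


def weighted_probability(responses, intercept, coefs):
    """Weighted version: logistic-regression predicted probability.

    coefs: dict mapping item position -> regression coefficient (hypothetical values).
    """
    logit = intercept + sum(b * responses[i] for i, b in coefs.items())
    return 1.0 / (1.0 + math.exp(-logit))
```

When the coefficients are nearly equal across items, the weighted logit is close to a linear function of the integer score, so the two versions rank respondents almost identically.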
We considered 160 short-form scales (5 × 8 × 4): each scale was built using one of the five statistical methods, included between 1 and 8 PCL-5 items, and was designed to predict one of the four dichotomous diagnostic outcomes defined by the full PCL-5. We compared these scales on the area under the receiver operating characteristic (ROC) curve (AUC), which reflects the probability that a randomly-selected case on the dichotomous diagnostic outcome will have a higher short-form score than a randomly-selected non-case.
We inspected the AUCs across models to narrow the range of short-form scales carried forward to the validation sample (Cortes & Mohri, 2004). We then evaluated the operating characteristics of each remaining scale using the following standard calibration and performance metrics:
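The probabilistic reading of the AUC given above can be computed directly by comparing every case with every non-case. The sketch below does this by brute force; counting ties as one half is the usual convention and is an assumption here, since the text refers only to a "higher" score.

```python
def auc(scores, labels):
    """Probability that a random case outscores a random non-case (ties count 1/2)."""
    cases = [s for s, y in zip(scores, labels) if y]
    noncases = [s for s, y in zip(scores, labels) if not y]
    if not cases or not noncases:
        raise ValueError("need at least one case and one non-case")
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in noncases
    )
    return wins / (len(cases) * len(noncases))
```

Ties matter more for short integer-scored scales than for the full PCL-5 because a 4-item scale can take only 17 distinct values.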
Brier Score: the mean-squared difference between predicted probabilities of case designations and observed designations based on the full PCL-5 to assess calibration,
Sensitivity (SN): the proportion of respondents defined as cases by the full PCL-5 that are classified correctly as being cases on the short-form scale,
Specificity (SP): the proportion of respondents defined as non-cases by the full PCL-5 that are classified correctly as being non-cases on the short-form scale,
Positive predictive value (PPV): the proportion of respondents at or above a given screening threshold on the short-form scale that are defined as cases by the full PCL-5,
Net Benefit (NB): the number of true positives at or above the screening threshold minus the discounted number of false positives at or above the threshold, where the discount rate is defined as PPV/[1-PPV] at the threshold for each logically possible threshold on each scale.
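The first four metrics in this list can be sketched as below, treating a short-form score at or above a cutoff as screening positive and the full PCL-5 diagnosis as the reference label. The function names are hypothetical, and undefined ratios (for example, PPV when nobody screens positive) are returned as NaN by choice.

```python
def confusion_counts(scores, labels, cutoff):
    """Count (tp, fp, fn, tn) when score >= cutoff screens positive."""
    tp = fp = fn = tn = 0
    for score, is_case in zip(scores, labels):
        positive = score >= cutoff
        if positive and is_case:
            tp += 1
        elif positive:
            fp += 1
        elif is_case:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def operating_characteristics(scores, labels, cutoff):
    """Sensitivity, specificity, and positive predictive value at one cutoff."""
    tp, fp, fn, tn = confusion_counts(scores, labels, cutoff)
    nan = float("nan")
    return {
        "SN": tp / (tp + fn) if tp + fn else nan,
        "SP": tn / (tn + fp) if tn + fp else nan,
        "PPV": tp / (tp + fp) if tp + fp else nan,
    }


def brier_score(probabilities, labels):
    """Mean squared difference between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, labels)) / len(labels)
```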
Although seldom included in evaluations of screening scales, NB provides more intuitive and clinically useful information than SN, SP, and PPV in comparing scales because it accounts for between-clinician variation in the relative valuations of correctly detecting a true positive and correctly excluding a true negative (Van Calster et al., 2018). NB is typically evaluated through decision curves (Vickers & Elkin, 2006), which plot the minimum PPV the clinician would require to designate a patient as screening positive (x-axis) against the NB of the screening scale at that threshold (y-axis). Comparing decision curves for different screening scales shows the range of PPV over which each scale is optimal and the magnitude of this benefit.
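A decision curve of this kind might be computed as in the sketch below. Two choices here are assumptions rather than details taken from the text: net benefit is expressed per respondent (true and false positives divided by sample size, the usual Vickers and Elkin convention), and the discount is computed from a clinician-specified minimum PPV rather than from the PPV observed at the cutoff. Taking the best cutoff at each required PPV in `decision_curve` is likewise only one plausible way to summarize an integer-scored scale.

```python
def net_benefit(scores, labels, cutoff, required_ppv):
    """Per-respondent net benefit of screening positive at score >= cutoff."""
    if not 0 < required_ppv < 1:
        raise ValueError("required_ppv must be strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and not y)
    # Each false positive is discounted by the odds implied by the required PPV.
    discount = required_ppv / (1 - required_ppv)
    return tp / n - discount * fp / n


def decision_curve(scores, labels, cutoffs, ppv_grid):
    """For each required PPV, the best net benefit over the candidate cutoffs."""
    return [
        (ppv, max(net_benefit(scores, labels, c, ppv) for c in cutoffs))
        for ppv in ppv_grid
    ]
```

Plotting the returned pairs for two scales on the same axes shows where one scale's curve lies above the other's, which is the comparison the decision-curve approach is designed to support.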
The validation sample data were weighted when we calculated short-form scale operating characteristics to adjust for the over-sampling in LS1 of respondents who reported mental disorders or suicidality in their baseline survey.
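The effect of weighting on an operating characteristic can be sketched as follows. The weights are supplied by the caller; how the study constructed its weights is not described here, so treating each weight as the number of population members a respondent represents is an assumption. The function assumes at least one case and one non-case.

```python
def weighted_sn_sp(scores, labels, weights, cutoff):
    """Weighted sensitivity and specificity at score >= cutoff."""
    tp = fn = tn = fp = 0.0
    for score, is_case, w in zip(scores, labels, weights):
        positive = score >= cutoff
        if is_case:
            if positive:
                tp += w
            else:
                fn += w
        else:
            if positive:
                fp += w
            else:
                tn += w
    return tp / (tp + fn), tn / (tn + fp)
```

With all weights equal to 1 the function reduces to the unweighted definitions; unequal weights shift the estimates toward the respondents who were under-sampled.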
Results
Socio-demographic distribution of the samples
The unweighted socio-demographic distributions in the training sample and validation sample (including ever-deployed and never-deployed subsamples) were 6.3–24.3% female, 69.0–83.1% with no education beyond high school, 23.6–28.4% Non-Hispanic Black or Hispanic, and 34.9–82.3% junior enlisted rank. (Table 1) The much higher proportion of respondents with junior enlisted rank in the never-deployed validation sample (82.3%) than other samples (34.9–50.1%) reflects the high proportion of validation sample respondents from the NSS, virtually none of whom (other than the few who were in another branch of service prior to their recent Army enlistment) previously deployed. Roughly half of the training and ever-deployed validation samples (49.5–51.2%) had a history of multiple combat deployments.
Table 1.
Characteristic | Training sample, % | (SE) | Validation sample: total, % | (SE) | Validation sample: ever deployed, % | (SE) | Validation sample: never deployed, % | (SE)
---|---|---|---|---|---|---|---|---
Female | 6.3 | (0.3) | 18.0 | (0.4) | 12.6 | (0.4) | 24.3 | (0.6)
Low education (No college) | 83.1 | (0.4) | 74.2 | (0.4) | 69.0 | (0.6) | 80.3 | (0.5)
Minority status (Non-Hispanic Black/Hispanic)‡ | 23.6 | (0.4) | 26.0 | (0.4) | 23.9 | (0.5) | 28.4 | (0.6)
Junior enlisted rank (E1-E4) | 50.1 | (0.5) | 56.9 | (0.5) | 34.9 | (0.6) | 82.3 | (0.5)
History of multiple combat deployments | 49.5 | (0.5) | 27.4 | (0.4) | 51.2 | (0.6) | 0.0 | (0.0)
(n) | (8,917) | | (11,728) | | (6,280) | | (5,448) |
The training sample consisted of all T2 and T3 respondents to the Army STARRS Pre-Post Deployment Study. The validation sample consisted of all participants in the STARRS Longitudinal Study T1 survey who were not in the training sample. See the text for more detail on the samples.
Thirty-day prevalence estimates of DSM-5 PTSD based on the full PCL-5
Unweighted 30-day prevalence estimates of DSM-5 PTSD, determined by applying the aforementioned four diagnostic thresholds to the full PCL-5, were consistently highest in the ever-deployed validation sample (12.8–17.9%), lowest in the training sample (5.2–9.2%), and intermediate in the never-deployed validation sample (7.8–11.6%). (Table 2) Prevalence estimates within sample were consistently highest using the liberal PCL-5 >=28 scoring rule (9.2–17.9%), lowest using the conservative >=38 scoring rule (5.2–12.8%), and intermediate using the >=32 (7.0–15.9%) and DSM-5 Criteria B-E (6.2–15.6%) scoring rules.
Table 2.
Sample | DSM-5 Criteria B-E, % | (SE) | PCL-5 >=28, % | (SE) | PCL-5 >=32, % | (SE) | PCL-5 >=38, % | (SE) | (n)
---|---|---|---|---|---|---|---|---|---
Training sample | 6.2 | (0.3) | 9.2 | (0.3) | 7.0 | (0.3) | 5.2 | (0.2) | (8,917)
Validation sample: total | 13.1 | (0.3) | 15.0 | (0.3) | 13.2 | (0.3) | 10.5 | (0.3) | (11,728)
Validation sample: ever deployed | 15.6 | (0.5) | 17.9 | (0.5) | 15.9 | (0.5) | 12.8 | (0.4) | (6,280)
Validation sample: never deployed | 10.1 | (0.4) | 11.6 | (0.4) | 9.9 | (0.4) | 7.8 | (0.4) | (5,448)
The PCL-5 items selected for the short-form scales
Given that integer-scored and weighted versions of the short-form scales have the same items, we considered a total of 96 (8 × 3 × 4) different short-form item sets: each contained between 1 and 8 items, was created using one of three different statistical methods to select the subset of items (RiskSLIM, stepwise regression, factor analysis), and was used to predict one of four different dichotomous PTSD outcomes.
Inspection of items in each set shows that those based on factor analysis were different from those based on RiskSLIM and stepwise regression (see Supplemental Tables 1–4). For example, the RiskSLIM and stepwise sets for the scales with 6 items (the minimum number required to determine PTSD diagnostic status based on DSM-5 diagnostic rules) included an average of 2 items from Criterion B (intrusive symptoms, compared to 1 required in DSM-5), 1 from Criterion C (avoidance, compared with at least 1 required in DSM-5), 2 from Criterion D (negative alterations in cognition and mood, compared with at least 2 required in DSM-5), and 1 from Criterion E (alterations in arousal and reactivity, compared with at least 2 required in DSM-5). In contrast, the factor analysis set included 4 symptoms from Criterion B, 1 symptom each from Criteria C and D, and none from Criterion E. These differences occurred because RiskSLIM and stepwise regression both select items to optimize explained variance in the outcomes, leading to selection of minimally redundant items, whereas factor analysis optimizes part-whole associations among the items, leading to selection of items with maximum redundancy.
The implications of these differences can be seen by inspecting AUCs in the training sample. (Figure 1a–1d) Four observations are noteworthy. First, short-form scales built using RiskSLIM and stepwise regression consistently outperformed those built via factor analysis. Second, although the AUCs continued to rise as the number of questions increased, the marginal gain in performance of including a question became negligible after four questions, given that the AUC either approached or exceeded .99 for all scales predicting all diagnostic outcomes. Third, although we would expect scales built with weighted stepwise regression to outperform those built with unweighted stepwise regression (as the weights capture differences in relative importance of questions), the two methods yielded similar values of AUC (differences only in the third decimal place; see Supplemental Table 5). Fourth, although we would expect performance of scales based on RiskSLIM to be better than performance of unweighted stepwise regression because the optimal integer scoring in RiskSLIM allows question-specific nonlinearities to be detected, these differences were small. The latter two observations tell us that optimal weights were similar across questions and that the original PCL linear scoring assumption was consistent with optimal scoring across response categories.
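The tendency of outcome-driven selection to skip redundant items can be seen in a toy example. The sketch below is not the stepwise logistic regression used in the study: as a simpler stand-in, it greedily adds whichever item most improves the AUC of the summed score. The data are invented so that the second item is an exact copy of the first and the third carries complementary information.

```python
def auc(scores, labels):
    """Probability that a random case outscores a random non-case (ties count 1/2)."""
    cases = [s for s, y in zip(scores, labels) if y]
    noncases = [s for s, y in zip(scores, labels) if not y]
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in noncases
    )
    return wins / (len(cases) * len(noncases))


def forward_select(rows, labels, n_items):
    """Greedily add the item that most improves the AUC of the summed score."""
    n_candidates = len(rows[0])
    selected = []
    while len(selected) < n_items:
        remaining = [j for j in range(n_candidates) if j not in selected]

        def gain(j):
            return auc([sum(r[i] for i in selected + [j]) for r in rows], labels)

        selected.append(max(remaining, key=gain))  # ties go to the lowest index
    return selected


# Invented toy data: column 1 duplicates column 0; column 2 is complementary.
toy_rows = [
    [4, 4, 0], [0, 0, 4], [2, 2, 2], [4, 4, 4],  # cases
    [0, 0, 0], [2, 2, 0], [0, 0, 2], [1, 1, 1],  # non-cases
]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
```

On these data `forward_select(toy_rows, toy_labels, 2)` picks the first and third columns: adding the duplicate leaves the ranking of respondents unchanged, whereas adding the complementary item separates cases from non-cases. A loading-based selection would instead favor the two perfectly correlated columns.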
Validation of short-form PCL-5 scales based on unweighted stepwise regression
Narrowing the focus to four-item short-form scales:
Based on the aforementioned results, we focused further analysis on the integer-scored short-form PCL-5 scales built with stepwise regression. We considered scales with between four and six items given that the incremental benefit of including more than six items was minimal. We expanded the analysis to consider 144 associations: each of 12 integer-scored short-form scales (4- to 6-item scales selected to predict four different dichotomous PCL-5 diagnostic outcomes in the training sample) with the same outcomes in the validation sample and subsamples. AUCs of all 12 scales either approached or exceeded .99 predicting all outcomes in the validation sample and subsamples (see Supplemental Table 6). We consequently focused subsequent analyses on the 4-item scales.
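The count of 144 associations can be reproduced by enumeration, on the reading that each of the 12 scales was evaluated against each of the four outcomes in each of three samples (total, ever-deployed, and never-deployed validation samples). That decomposition is inferred from the sentence above rather than stated explicitly.

```python
from itertools import product

item_counts = (4, 5, 6)
outcomes = ("DSM-5 Criteria B-E", "PCL-5 >=28", "PCL-5 >=32", "PCL-5 >=38")
samples = ("total", "ever deployed", "never deployed")

# One scale per (number of items, outcome it was trained to predict).
scales = list(product(item_counts, outcomes))
# Each scale is then evaluated against every outcome in every sample.
associations = list(product(scales, outcomes, samples))

print(len(scales), len(associations))
```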
Operating characteristics at clinically useful screening thresholds:
Brier scores of all 4-item scales were consistently low in the total validation sample (.019-.028) and subsamples (.015-.032), indicating good calibration of all scales (see Supplemental Table 7). Inspection of ROC curves was of little help in distinguishing among the different 4-item scales, as none was consistently higher than the others (see Supplemental Figures 1a–1d, 2a–2d, 3a–3d) and all had excellent performance. For example, when SP was fixed at .9, SN was consistently greater than .9 in predicting each outcome.
Stronger discrimination between 4-item scales was found when examining NB. We focused on PPV in the range .25-.75, although we examined the full range of PPV, based on the assumptions that: (1) clinicians would not want to carry out further evaluations with more than three false positives for every one true positive (PPV = .25), noting that the vast majority of true positives would be screened in across scales and samples at that level of PPV (SN = .92-.98); and (2) clinicians would not want to require more than three true positives for every one false positive (PPV = .75), noting that such a stringent rule would miss 20–30% of true cases across scales and samples. The decision curves in the total sample (Figures 2a–2d) showed that the 4-item short-form scale designed to optimize prediction of the most liberal outcome (i.e., PCL-5 >=28) had marginally higher NB than the other 4-item scales when PPV was in the specified range for three of the four diagnostic outcomes and equivalent to the other 4-item scales for the other outcome (the DSM-5 Criteria B-E outcome). This pattern was more pronounced in the never-deployed subsample (Supplemental Figures 4a–4d), whereas all 4-item short-form scales had equivalent NB in the .25-.75 PPV range in the ever-deployed subsample (Supplemental Figures 5a–5d). Based on these results, we selected the 4-item short-form scale designed to optimize prediction of the most liberal outcome (i.e., PCL-5 >=28) as our recommended scale. (Appendix Table 1) We note that even outside this PPV range (<.25 and >.75), this pattern of results remains the same.
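The two PPV bounds correspond to ratios of false to true positives, which follow directly from the definition of PPV as true positives divided by all screened positives. A small helper (hypothetical name) makes the conversion explicit.

```python
def false_positives_per_true_positive(ppv):
    """Expected number of false positives for each true positive at a given PPV."""
    if not 0 < ppv <= 1:
        raise ValueError("ppv must be in (0, 1]")
    return (1 - ppv) / ppv
```

At PPV = .25 this gives three false positives per true positive, and at PPV = .75 it gives one false positive per three true positives, matching the two assumptions stated above.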
Characteristics of the optimal short-form scale:
The optimal 4-item short-form PCL-5 scale includes one item assessing each of DSM-5 Criteria B-E: B3 (suddenly feeling or acting as if the stressful experience were actually happening again), C2 (avoidance of external reminders of the stressful experience), D6 (feeling distant or cut off from other people), and E1 (irritable or aggressive behavior). We do not recommend a single diagnostic threshold for this 0–16 integer-scored scale, as the appropriate threshold will depend on whether the user wants to use a conservative (PCL-5>=38), liberal (PCL-5>=28), or intermediate (PCL-5>=32 or DSM-5 Criteria B-E) definition of PTSD as well as the relative value to the user of correctly detecting true positives versus correctly excluding true negatives. However, full information in online supplemental materials (Supplemental Tables 8–10) allows users to select the appropriate threshold based on these considerations.
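Scoring the recommended scale amounts to summing four 0–4 responses. In the sketch below the function names are hypothetical, and the cutoff is deliberately a required argument because no single threshold is recommended.

```python
ITEMS = ("B3", "C2", "D6", "E1")  # symptom labels as given in the text


def short_form_score(answers):
    """Sum the four 0-4 responses; the result ranges from 0 to 16.

    answers: dict mapping each label in ITEMS to an integer 0-4.
    """
    missing = [k for k in ITEMS if k not in answers]
    if missing:
        raise ValueError(f"missing responses: {missing}")
    if any(answers[k] not in range(5) for k in ITEMS):
        raise ValueError("responses must be integers from 0 to 4")
    return sum(answers[k] for k in ITEMS)


def screens_positive(answers, cutoff):
    """True if the short-form score is at or above the user-chosen cutoff."""
    return short_form_score(answers) >= cutoff
```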
Comparing correlates of diagnoses based on full PCL-5 and short-form scales:
We compared socio-demographic and psychopathological correlates of PTSD diagnoses based on our recommended 4-item short-form PCL-5 scale with those of diagnoses based on the full PCL-5 in the validation sample. (Table 3) Thresholds in the short-form scale were selected to make prevalence estimates equivalent to those using the full PCL-5. Odds-ratios of correlates with the two diagnoses were very similar for all correlates across all diagnostic scoring systems.
Table 3.
Correlate | DSM-5 Criteria B-E: short-form OR | (95% CI) | DSM-5 Criteria B-E: full PCL-5 OR | (95% CI) | PCL-5>=28: short-form OR | (95% CI) | PCL-5>=28: full PCL-5 OR | (95% CI) | PCL-5>=32: short-form OR | (95% CI) | PCL-5>=32: full PCL-5 OR | (95% CI) | PCL-5>=38: short-form OR | (95% CI) | PCL-5>=38: full PCL-5 OR | (95% CI)
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
I. Socio-demographics and Army Career | | | | | | | | | | | | | | | | |
Female | 1.5* | (1.4–1.8) | 1.5* | (1.3–1.7) | 1.5* | (1.3–1.7) | 1.5* | (1.3–1.7) | 1.5* | (1.3–1.7) | 1.5* | (1.3–1.7) | 1.5* | (1.3–1.7) | 1.5* | (1.3–1.8)
Low education | 1.7* | (1.5–1.9) | 1.8* | (1.5–2.0) | 1.7* | (1.5–1.9) | 1.6* | (1.5–1.9) | 1.7* | (1.5–1.9) | 1.7* | (1.5–2.0) | 1.6* | (1.4–1.8) | 1.6* | (1.4–1.9)
Low family income | 1.5* | (1.3–1.7) | 1.4* | (1.2–1.5) | 1.4* | (1.3–1.6) | 1.3* | (1.2–1.5) | 1.4* | (1.3–1.6) | 1.4* | (1.2–1.5) | 1.5* | (1.4–1.7) | 1.5* | (1.3–1.7)
Minority status | 1.3* | (1.2–1.5) | 1.2* | (1.1–1.4) | 1.3* | (1.1–1.4) | 1.2* | (1.1–1.3) | 1.3* | (1.1–1.4) | 1.3* | (1.1–1.4) | 1.3* | (1.2–1.5) | 1.3* | (1.1–1.5)
Junior enlisted rank (E1-E4) | 1.2* | (1.1–1.3) | 1.2* | (1.1–1.4) | 1.2* | (1.0–1.3) | 1.2 | (1.0–1.3) | 1.2 | (1.0–1.3) | 1.1* | (1.0–1.2) | 1.2* | (1.0–1.3) | 1.1 | (1.0–1.2)
History of multiple combat deployments | 1.3* | (1.2–1.5) | 1.4* | (1.2–1.6) | 1.4* | (1.3–1.6) | 1.4* | (1.3–1.6) | 1.4* | (1.3–1.6) | 1.5* | (1.3–1.6) | 1.4* | (1.3–1.6) | 1.5* | (1.4–1.7)
II. Psychopathology | | | | | | | | | | | | | | | | |
Major depressive episode | 39.4* | (34.3–45.3) | 35.3* | (30.8–40.5) | 35.5* | (31.0–40.6) | 42.5* | (37.0–48.8) | 35.5* | (31.0–40.6) | 42.1* | (36.6–48.4) | 44.3* | (38.2–41.4) | 46.4* | (39.8–53.9)
Generalized anxiety disorder | 41.9* | (36.3–48.3) | 39.1* | (34.0–45.0) | 39.7* | (34.5–45.7) | 45.7* | (39.6–52.8) | 39.7* | (34.5–45.7) | 47.7* | (41.3–55.2) | 48.3* | (41.5–56.1) | 56.5* | (48.4–66.0)
Panic disorder | 14.5* | (11.7–17.8) | 16.1* | (13.0–20.0) | 16.3* | (13.1–20.4) | 18.8* | (15.0–23.6) | 16.3* | (13.1–20.4) | 18.8* | (15.1–23.4) | 13.5* | (11.0–16.5) | 15.3* | (12.4–18.8)
Intermittent explosive disorder | 10.4* | (9.1–11.8) | 9.4* | (8.3–10.7) | 10.5* | (9.2–11.8) | 10.2* | (9.0–11.6) | 10.5* | (9.2–11.9) | 10.1* | (8.9–11.5) | 11.1* | (9.8–12.7) | 10.7* | (9.3–12.2)
Suicide ideation | 9.9* | (8.4–11.5) | 8.6* | (7.4–10.0) | 9.8* | (8.4–11.4) | 10.7* | (9.2–12.5) | 9.8* | (8.4–11.4) | 10.9* | (9.3–12.7) | 10.2* | (8.7–11.9) | 10.2* | (8.7–11.9)
*Significant at the .05 level, two-sided test
The final 4-item short-form PCL scale was optimized to predict PCL>=28. See text for more detail.
PTSD diagnoses using the short-form scale were dichotomized at the threshold with the lowest McNemar test statistic (i.e., with a prevalence estimate closest to the prevalence estimate in the full PCL-5) for each of the four PTSD outcomes.
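The threshold-matching rule in the note above can be sketched as follows. The McNemar statistic compares the two kinds of discordant classifications, so it is smallest when the short form and the full PCL-5 flag similar numbers of respondents. The uncorrected chi-square form is used here; whether the original analysis applied a continuity correction or survey weights is not stated, so this is an assumption, and the function names are hypothetical.

```python
def mcnemar_statistic(short_pos, full_pos):
    """Uncorrected McNemar chi-square from paired positive/negative classifications."""
    b = sum(1 for s, f in zip(short_pos, full_pos) if s and not f)  # short form only
    c = sum(1 for s, f in zip(short_pos, full_pos) if f and not s)  # full PCL-5 only
    return 0.0 if b + c == 0 else (b - c) ** 2 / (b + c)


def prevalence_matched_cutoff(short_scores, full_diagnoses, cutoffs=range(1, 17)):
    """Short-form cutoff whose classifications minimize the McNemar statistic.

    If several cutoffs tie, the lowest one is returned.
    """
    return min(
        cutoffs,
        key=lambda c: mcnemar_statistic([s >= c for s in short_scores], full_diagnoses),
    )
```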
Discussion
In this study, we sought to develop a short-form of the PCL with two goals in mind: (1) building a clinically useful brief PTSD screener to reduce respondent burden and (2) improving upon statistical methods used to create such a screener, given existing short-form PCLs were created using heuristic methods. To do so, we investigated empirically which PCL-5 items should be used in an optimal short-form version of the scale. Comparing several statistical methods, we found that regression-based short-form PCL-5 scales outperform factor analysis-based short-form scales but that the advantages of weighting (either unrestricted with logistic regression or restricted integer-score weighting with RiskSLIM) are minimal. The latter result indicates that the optimal logistic regression weights are very similar across PCL-5 questions and that the 0–4 scoring assumption is consistent with optimal scoring. One implication of the latter finding is that 0–4 scoring is superior to the 0–1 scoring used in the PC-PTSD-5. We also found that performance does not improve meaningfully with the addition of more than four items, leading us to recommend a 4-item short-form scale. This short-form PCL generates diagnoses that closely parallel those of the full PCL-5 and demonstrates similar psychometric properties (e.g., convergent and discriminant validity), making it well-suited for screening.
It is important to note that this study is not an attempt to ascertain which symptoms do or do not belong in the PTSD diagnostic criteria. Our results should not be interpreted as speaking to this question. Given the very strong associations among DSM-5 Criteria B-E symptoms of PTSD and the strong psychometric properties of the PCL-5, numerous 4-item short-form PCL-5 scales could be created that have operating characteristics close to those of our recommended short-form scale. The four items in our recommended scale are somewhat better than these others, though, in being the minimally redundant set of the 20 PCL-5 items distinguishing cases from non-cases according to previously identified PCL-5 PTSD diagnostic thresholds (Hoge et al., 2014). This differs from the content-driven item selection methods used in other PTSD screeners (e.g., the PC-PTSD-5; Prins et al., 2016). As in any stepwise regression scheme, the optimal items included on our short-form should be interpreted broadly as capturing the variance due to all scale items with which they are correlated rather than representing unique effects of specific symptoms. Like other PTSD screeners (i.e., PC-PTSD-5 and the 4-item PCL-5 developed by Price et al., 2016), however, our final short-form includes items assessing for at least one symptom from each DSM-5 PTSD criterion, though the individual items are mostly different (e.g., only one overlapping item between our 4-item short-form and Price et al.’s).
Screening scales should not be used to render clinical diagnoses (McDonald & Calhoun, 2010) but rather to focus attention on individuals most likely to warrant clinical evaluation. As shown in the supplemental materials, our recommended 4-item short-form scale would be well-suited to screen for PTSD in contexts where administration of the full PCL is not possible. At a threshold of 5+, for example, the scale would detect virtually all cases defined by the full PCL-5 as meeting DSM-5 criteria (SN = .976) while screening in only a small proportion of PCL-5 non-cases (1-SP = .066). At a threshold of 6+, the scale would detect an even higher proportion of cases using the conservative PCL-5>=38 threshold (SN = .982) with an even lower false positive rate (1-SP = .059).
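Sensitivity and the false positive rate do not by themselves tell a clinician what share of screen-positive respondents are cases; that also depends on prevalence. The helper below (hypothetical name) applies Bayes' rule to convert the two rates into a PPV for a prevalence supplied by the user.

```python
def ppv_from_rates(sensitivity, false_positive_rate, prevalence):
    """Positive predictive value implied by SN, 1-SP, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Rates quoted above for the 5+ threshold; the prevalence argument is an
# illustrative input that the user should replace with a setting-specific value.
example = ppv_from_rates(0.976, 0.066, prevalence=0.10)
```

Because the operating characteristics reported here were weighted, any prevalence plugged in should be on the same footing; the value in the example is a placeholder, not an estimate from the study.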
Our findings should be interpreted in the context of several limitations. First, although our samples were large, they consisted entirely of U.S. Army soldiers and recently-separated Veterans. It would be useful to evaluate our recommended short-form scale in other populations, including civilian populations, given that past research has highlighted population-specific variation in PCL operating characteristics (Wilkins, Lang, & Norman, 2011). Such differences may be due to exposure to different traumatic event types between populations (e.g., experience of military-specific traumatic events such as combat) or time since event exposure. These factors may affect likelihood of experiencing a given PTSD symptom, which may necessitate development of additional short-form PCL-5 scales that are population-specific. Second, we did not evaluate the test-retest reliability of our recommended scale. This would be useful given the use of short-form scales for symptom tracking as part of measurement-based care (Fortney et al., 2017). Third, we did not have access to clinical interviews to validate PTSD diagnoses, instead using probable diagnoses based on the full PCL-5 as the outcomes. Although diagnoses based on the PCL have been shown to correlate highly with diagnoses based on blinded clinical interviews, including the ‘gold standard’ Clinician-Administered PTSD Scale (Keen, Kutter, Niles, & Krinsley, 2008), additional testing of our short-form scale in predicting interview-based PTSD diagnoses would be useful.
Conclusions
With the increased emphasis on screening for common mental disorders, the development and use of psychometrically sound and efficient screening tools is critical. To this end, we derived short-form PCL-5 scales using several statistical methods and found that the optimal one is a 4-item scale created using stepwise regression. Instead of a single diagnostic threshold, we offer clinicians the opportunity to select cutoffs on this short-form scale based on clinical setting and judgment using the detailed information provided in our Supplemental Materials. Given its brevity and excellent operating characteristics, this short-form PCL-5 could have great utility for case-finding in a variety of settings, particularly where screening time is a concern.
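One way to compare candidate cutoffs for a particular setting is the net benefit statistic from decision curve analysis (Vickers & Elkin, 2006), which weighs true positives against false positives at a chosen threshold probability. The sketch below assumes the standard formula, net benefit = TP/n - (FP/n) x p_t/(1 - p_t). The function name, the counts, and the threshold probabilities are hypothetical and are not taken from the study.

```python
def net_benefit(true_pos: int, false_pos: int, n: int, threshold_prob: float) -> float:
    """Net benefit of a screening rule at a given threshold probability."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < threshold_prob < 1.0:
        raise ValueError("threshold_prob must be strictly between 0 and 1")
    # Odds of the threshold probability: the relative cost of a false positive.
    odds = threshold_prob / (1.0 - threshold_prob)
    return true_pos / n - (false_pos / n) * odds


# Hypothetical counts per 1,000 people screened: (true positives, false positives).
candidates = {"5+": (95, 60), "6+": (92, 40)}

for threshold_prob in (0.05, 0.20):
    best = max(
        candidates,
        key=lambda name: net_benefit(*candidates[name], 1000, threshold_prob),
    )
    print(f"p_t = {threshold_prob:.2f}: highest net benefit at cutoff {best}")
```

In this toy example the preferred cutoff changes with the threshold probability, illustrating why a single universal cutoff may not suit every clinical setting.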
Supplementary Material
Acknowledgement and Disclaimer:
This work was supported, in part, by the STARRS-LS study. STARRS-LS is sponsored and funded by the Department of Defense (Grant number HU0001-15-2-0004). Dr. Zuromski was additionally supported by the Military Suicide Research Consortium, an effort supported by the Office of the Assistant Secretary of Defense for Health Affairs (Award No. W81XWH-16-2-0003; W81XWH-16-2-0004). The contents are solely the responsibility of the authors and do not necessarily represent the views of the Veterans Health Administration, Military Suicide Research Consortium, Department of the Army, or Department of Defense.
Role of the Funder/Sponsor:
Although a draft of this manuscript was submitted to the Army and NIMH for review and comment prior to submission, this was with the understanding that comments would be no more than advisory.
Footnotes
Conflict of Interest Statement:
In the past 3 years, Dr. Kessler received support for his epidemiological studies from Sanofi Aventis; was a consultant for Johnson & Johnson Wellness and Prevention, Sage Pharmaceuticals, Shire, Takeda; and served on an advisory board for the Johnson & Johnson Services Inc. Lake Nona Life Project. Dr. Kessler is a co-owner of DataStat, Inc., a market research firm that carries out healthcare research. Dr. Stein has in the past three years been a consultant for Actelion, Alkermes, Aptinyx, Bionomics, Dart Neuroscience, Healthcare Management Technologies, Janssen, Neurocrine Biosciences, Oxeia Biopharmaceuticals, Pfizer, and Resilience Therapeutics. Dr. Stein has stock options in Oxeia Biopharmaceuticals. The remaining authors report nothing to disclose.
Data Availability Statement:
The Army STARRS datasets, including the NSS, generated and analyzed during the current study are available in the Inter-university Consortium for Political and Social Research (ICPSR) repository at the University of Michigan, https://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/35197.
References
- American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Washington, D.C.: American Psychiatric Publishing.
- Blevins CA, Weathers FW, Davis MT, Witte TK, & Domino JL (2015). The Posttraumatic Stress Disorder Checklist for DSM-5 (PCL-5): Development and initial psychometric evaluation. Journal of Traumatic Stress, 28, 489–498. doi: 10.1002/jts.22059
- Bliese PD, Wright KM, Adler AB, Cabrera O, Castro CA, & Hoge CW (2008). Validating the Primary Care Posttraumatic Stress Disorder Screen and the Posttraumatic Stress Disorder Checklist with soldiers returning from combat. Journal of Consulting and Clinical Psychology, 76, 272–281. doi: 10.1037/0022-006x.76.2.272
- Bovin MJ, Marx BP, Weathers FW, Gallagher MW, Rodriguez P, Schnurr PP, & Keane TM (2016). Psychometric properties of the PTSD Checklist for Diagnostic and Statistical Manual of Mental Disorders-Fifth Edition (PCL-5) in veterans. Psychological Assessment, 28, 1379–1391. doi: 10.1037/pas0000254
- Bressler R, Erford BT, & Dean S (2018). A systematic review of the Posttraumatic Stress Disorder Checklist (PCL). Journal of Counseling & Development, 96, 167–186. doi: 10.1002/jcad.12190
- Cortez C, & Mohri M (2004). AUC optimization vs. error rate minimization. In Saul LK, Weiss Y, & Bottou L (Eds.), Advances in neural information processing systems 17: Proceedings of the 2004 Conference. Cambridge, MA: MIT Press.
- Department of Veterans Affairs. (2017). VA/DoD clinical practice guideline for the management of posttraumatic stress disorder and acute stress disorder. (Version 3). Retrieved from https://www.healthquality.va.gov/guidelines/MH/ptsd/
- Elhai JD, Gray MJ, Kashdan TB, & Franklin CL (2005). Which instruments are most commonly used to assess traumatic event exposure and posttraumatic effects?: A survey of traumatic stress professionals. Journal of Traumatic Stress, 18, 541–545. doi: 10.1002/jts.20062
- Finkelman MD, Lowe SR, Kim W, Gruebner O, Smits N, & Galea S (2017). Customized computer-based administration of the PCL-5 for the efficient assessment of PTSD: A proof-of-principle study. Psychological Trauma: Theory, Research, Practice and Policy, 9, 379–389. doi: 10.1037/tra0000226
- Finkelman MD, Lowe SR, Kim W, Gruebner O, Smits N, & Galea S (2018). Item ordering and computerized classification tests with cluster-based scoring: An investigation of the countdown method. Psychological Assessment, 30, 204–219. doi: 10.1037/pas0000470
- Fortney JC, Unutzer J, Wrenn G, Pyne JM, Smith GR, Schoenbaum M, & Harbin HT (2017). A tipping point for measurement-based care. Psychiatric Services, 68, 179–188. doi: 10.1176/appi.ps.201500439
- Gates MA, Holowka DW, Vasterling JJ, Keane TM, Marx BP, & Rosen RC (2012). Posttraumatic stress disorder in veterans and military personnel: epidemiology, screening, and case recognition. Psychological Services, 9, 361–382. doi: 10.1037/a0027649
- Heeringa SG, Gebler N, Colpe LJ, Fullerton CS, Hwang I, Kessler RC, … Ursano RJ (2013). Field procedures in the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS). International Journal of Methods in Psychiatric Research, 22, 276–287. doi: 10.1002/mpr.1400
- Hirschel MJ, & Schulenberg SE (2010). On the viability of PTSD Checklist (PCL) short form use: Analyses from Mississippi Gulf Coast Hurricane Katrina survivors. Psychological Assessment, 22, 460–464. doi: 10.1037/a0018336
- Hoge CW, Riviere LA, Wilk JE, Herrell RK, & Weathers FW (2014). The prevalence of post-traumatic stress disorder (PTSD) in US combat soldiers: A head-to-head comparison of DSM-5 versus DSM-IV-TR symptom criteria with the PTSD Checklist. The Lancet Psychiatry, 1, 269–277. doi: 10.1016/s2215-0366(14)70235-4
- Keane TM, Rubin A, Lachowicz M, Brief DJ, Enggasser JL, Roy M, … Rosenbloom D (2014). Temporal stability of DSM-5 posttraumatic stress disorder criteria in a problem-drinking sample. Psychological Assessment, 26, 1138–1145. doi: 10.1037/a0037133
- Keen SM, Kutter CJ, Niles BL, & Krinsley KE (2008). Psychometric properties of PTSD Checklist in sample of male veterans. Journal of Rehabilitation Research & Development, 45, 465–474. doi: 10.1682/jrrd.2007.09.0138
- Kessler RC, Calabrese JR, Farley PA, Gruber MJ, Jewell MA, Katon W, … Wittchen HU (2013). Composite International Diagnostic Interview Screening Scales for DSM-IV anxiety and mood disorders. Psychological Medicine, 43, 1625–1637. doi: 10.1017/s0033291712002334
- Kessler RC, Colpe LJ, Fullerton CS, Gebler N, Naifeh JA, Nock MK, … Heeringa SG (2013). Design of the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS). International Journal of Methods in Psychiatric Research, 22, 267–275. doi: 10.1002/mpr.1401
- Kessler RC, Santiago PN, Colpe LJ, Dempsey CL, First MB, Heeringa SG, … Ursano RJ (2013). Clinical reappraisal of the Composite International Diagnostic Interview Screening Scales (CIDI-SC) in the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS). International Journal of Methods in Psychiatric Research, 22, 303–321. doi: 10.1002/mpr.1398
- Koenen KC, Ratanatharathorn A, Ng L, McLaughlin KA, Bromet EJ, Stein DJ, … Kessler RC (2017). Posttraumatic stress disorder in the World Mental Health surveys. Psychological Medicine, 47, 2260–2274. doi: 10.1017/s0033291717000708
- Lang AJ, & Stein MB (2005). An abbreviated PTSD Checklist for use as a screening instrument in primary care. Behaviour Research and Therapy, 43, 585–594. doi: 10.1016/j.brat.2004.04.005
- McDonald SD, & Calhoun PS (2010). The diagnostic accuracy of the PTSD Checklist: A critical review. Clinical Psychology Review, 30, 976–987. doi: 10.1016/j.cpr.2010.06.012
- National Center for PTSD. Using the PTSD Checklist for DSM-5 (PCL-5). Retrieved from https://www.ptsd.va.gov/professional/assessment/documents/using-PCL5.pdf
- Parker-Guilbert K, Moshier SJ, Marx BP, & Keane TM (2018). Measures of PTSD symptom severity. In Nemeroff CR & Marmar CB (Eds.), Post-traumatic stress disorder. New York, NY: Oxford University Press.
- Posner K, Brown GK, Stanley B, Brent DA, Yershova KV, Oquendo MA, … Mann JJ (2011). The Columbia-Suicide Severity Rating Scale: Initial validity and internal consistency findings from three multisite studies with adolescents and adults. American Journal of Psychiatry, 168, 1266–1277. doi: 10.1176/appi.ajp.2011.10111704
- Price M, Szafranski DD, van Stolk-Cooke K, & Gros DF (2016). Investigation of abbreviated 4 and 8 item versions of the PTSD Checklist 5. Psychiatry Research, 239, 124–130. doi: 10.1016/j.psychres.2016.03.014
- Prins A, Bovin MJ, Smolenski DJ, Marx BP, Kimerling R, Jenkins-Guarnieri MA, … Tiet QQ (2016). The primary care PTSD screen for DSM-5 (PC-PTSD-5): Development and evaluation within a veteran primary care sample. Journal of General Internal Medicine, 31, 1206–1211. doi: 10.1007/s11606-016-3703-5
- Steele NM, Benassi HP, Chesney CJ, Nicholson C, & Fogarty GJ (2014). Evaluating the merits of using brief measures of PTSD or general mental health measures in two-stage PTSD screening. Military Medicine, 179, 1497–1502. doi: 10.7205/milmed-d-14-00183
- Thornicroft G, Evans-Lacko S, Koenen KC, Kovess-Masfety V, Williams DR, & Kessler RC (2018). Patterns of treatment and barriers to care in posttraumatic stress disorder. In Bromet EJ, Karam EG, Koenen KC & Stein DJ (Eds.), Trauma and posttraumatic stress disorder: Global perspectives from the WHO World Mental Health surveys (pp. 137–152). New York: Cambridge University Press.
- Tiet QQ, Schutte KK, & Leyva YE (2013). Diagnostic accuracy of brief PTSD screening instruments in military veterans. Journal of Substance Abuse Treatment, 45, 134–142. doi: 10.1016/j.jsat.2013.01.010
- Ursano RJ, Colpe LJ, Heeringa SG, Kessler RC, Schoenbaum M, & Stein MB (2014). The Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS). Psychiatry, 77, 107–119. doi: 10.1521/psyc.2014.77.2.107
- U.S. Department of Veterans Affairs. VA/DoD Clinical Practice Guidelines. Retrieved from https://www.healthquality.va.gov/
- Ustun B, Adler LA, Rudin C, Faraone SV, Spencer TJ, Berglund P, … Kessler RC (2017). The World Health Organization adult attention-deficit/hyperactivity disorder self-report screening scale for DSM-5. JAMA Psychiatry, 74, 520–526. doi: 10.1001/jamapsychiatry.2017.0298
- Ustun B, & Rudin C (2017). Learning optimized risk scores on large-scale datasets. arXiv.org, arXiv:1610.00168. Retrieved from http://arxiv.org/abs/1610.00168
- Van Calster B, Wynants L, Verbeek JFM, Verbakel JY, Christodoulou E, Vickers AJ, … Steyerberg EW (2018). Reporting and interpreting decision curve analysis: A guide for investigators. European Urology, 74, 796–804. doi: 10.1016/j.eururo.2018.08.038
- Vickers AJ, & Elkin EB (2006). Decision curve analysis: A novel method for evaluating prediction models. Medical Decision Making, 26, 565–574.
- Warner CH, Warner CM, Appenzeller GN, & Hoge CW (2013). Identifying and managing posttraumatic stress disorder. American Family Physician, 88, 827–834.
- Weathers F, Litz B, Herman D, Huska JA, & Keane T (1993). The PTSD Checklist (PCL): Reliability, validity, and diagnostic utility. Paper presented at the Annual Convention of the International Society for Traumatic Stress Studies, San Antonio, TX.
- Weathers FW, Litz BT, Keane TM, Palmieri PA, Marx BP, & Schnurr PP (2013). The PTSD Checklist for DSM-5 (PCL-5). Retrieved from https://www.ptsd.va.gov/professional/assessment/adult-sr/ptsd-checklist.asp
- Wilkins KC, Lang AJ, & Norman SB (2011). Synthesis of the psychometric properties of the PTSD Checklist (PCL) military, civilian, and specific versions. Depression and Anxiety, 28, 596. doi: 10.1002/da.20837
- Wisco BE, Marx BP, & Keane TM (2012). Screening, diagnosis, and treatment of post-traumatic stress disorder. Military Medicine, 177, 7–13.
- Wortmann JH, Jordan AH, Weathers FW, Resick PA, Dondanville KA, Hall-Clark B, … Litz BT (2016). Psychometric analysis of the PTSD Checklist-5 (PCL-5) among treatment-seeking military service members. Psychological Assessment, 28, 1392–1403. doi: 10.1037/pas0000260
- Zhang Y, & Yang Y (2015). Cross-validation for selecting a model selection procedure. Journal of Econometrics, 187, 95–112. doi: 10.1016/j.jeconom.2015.02.006