Abstract
The present study evaluated the psychometric properties of the Patient Health Questionnaire-4 (PHQ-4), a screener of psychological distress, in English- and Spanish-speaking Hispanic Americans. Hispanic American adults (N = 436) completed the PHQ-4, which yields two subscales (anxiety and depression) that can be summed to create a total score. Multiple-group confirmatory factor analysis was used to evaluate structural validity. The two-factor structure was the best fit to the data for both English- and Spanish-speaking Hispanic Americans and items loaded equivalently across groups, demonstrating measurement invariance. Internal consistency reliability was good as measured by coefficient alpha. Construct validity was evidenced by significant expected relationships with perceived stress. These findings provide support for the reliability and validity of the PHQ-4 as a brief measure of psychological distress for English- or Spanish-speaking Hispanic Americans.
Keywords: Patient Health Questionnaire-4, Hispanic Americans, psychometrics, anxiety, depression
The Patient Health Questionnaire-4 (PHQ-4) is a four-item screener of psychological distress (Kroenke, Spitzer, Williams, & Lowe, 2009). The PHQ-4 is a composite of the Patient Health Questionnaire-2 (PHQ-2; Kroenke, Spitzer, & Williams, 2003) and Generalized Anxiety Disorder-2 scales (GAD-2; Kroenke, Spitzer, Williams, Monahan, & Lowe, 2007), two-item scales designed to screen for depression and anxiety over the prior two weeks, respectively. Kroenke et al. (2009) noted that depression and anxiety are the two most common mental health disorders, and are often comorbid with each other. Furthermore, disability has been found to be most severe when depression and anxiety co-occur. Thus, the PHQ-4 was developed, combining the PHQ-2 and GAD-2 into a four-item scale to permit efficient screening of both depression and anxiety using the same, very brief measure.
The PHQ-4 was developed and validated in a United States (U.S.) sample of 2,149 patients from 15 primary care sites. Patients had a mean age of 47.2 years (SD = 15.4), and were predominantly female and non-Hispanic White. A principal component analysis (PCA) of the four PHQ-4 items indicated that 84% of the total variance was explained by two factors. As expected, the two anxiety items had the highest factor loadings on one factor, and the two depression items had the highest factor loadings on the other. Kroenke et al. (2009) noted that two subscale scores can be calculated, one each for anxiety and depression, along with a total score reflecting psychological distress. The mean PHQ-4 score for the total sample was 2.5 (SD = 2.8), described by Kroenke et al. (2009) as indicating normal to mild levels of psychological distress, although determination of cut-off scores was not described. Internal consistency reliability was good (αs > 0.80) for the total score and subscales. The measure also demonstrated strong construct validity. Higher scores on the PHQ-4 were associated with increasing scores in all six domains of the Medical Outcomes Study Short-Form General Health Survey (SF-20; Wells et al., 1987).
The PHQ-4 has also been cross-validated in a large sample (N = 5,036) from the general population in Germany (Lowe et al., 2010). Participants had a mean age of 48.4 years (SD = 18) and were predominantly female. Information on the race/ethnicity of participants was not provided. Mean PHQ-4 scores were 1.76 (SD = 2.06) and internal consistency reliability for the total score and subscales was good. A two-factor model fit the data well. Scores on the PHQ-4 correlated at expected magnitudes and directions with the Rosenberg Self-Esteem Scale (Rosenberg, 1965), the Questionnaire on Life Satisfaction (Henrich & Herschbach, 2000), and the Resilience Scale (Schumacher et al., 2005), demonstrating convergent validity.
The psychometric properties of the PHQ-4 were recently examined in a sample of surgical patients attending preoperative anesthesiological assessment clinics (N = 2,852) in Germany (Kerper et al., 2014). Approximately half of the sample was female, and the mean age was 47 years. Race/ethnicity data were not provided. Clinically significant psychological distress (T-score ≥ 63) on the Brief Symptom Inventory (BSI; Derogatis, 1993) was reported by 14.6% of the sample. A two-factor model was examined using PCA, and the factors explained 83% of the total variance. The four items, however, did not load onto the two factors as expected. The two GAD-2 items had the highest loadings on Factor 1, as expected, and Item 1 from the PHQ-2 had the highest loading on Factor 2 (assessing depression). However, Item 2 from the PHQ-2 loaded highest on Factor 1, along with the two GAD-2 items. Convergent validity was demonstrated by correlations with the total score and select subscales of the BSI, a measure of perceived distress.
The purpose of the present study was to evaluate the psychometric properties (reliability, structural validity, construct validity) of the PHQ-4 in a sample of English- and Spanish-speaking Hispanic Americans. The PHQ-4 has yet to be psychometrically evaluated in the Hispanic American population, despite being used extensively in research on diverse samples that include Hispanic Americans. Furthermore, the structural invariance of the English and Spanish versions of the measure across Hispanic American language groups has yet to be examined. Evidence of structural invariance across these language groups is a critical prerequisite for cross-group comparisons (Floyd & Widaman, 1995).
Method
Participants and Procedures
Hispanic American adults (N = 436) were recruited as part of a larger community study validating health-related measures. To be eligible for inclusion, individuals must have self-identified as Hispanic American, been at least 21 years old, resided in the U.S., and been literate in either English or Spanish. The sponsoring universities’ Institutional Review Boards approved all study procedures and materials, and participants provided informed consent prior to participation.
Measures
Demographics
Participant demographic information was provided by self-report.
PHQ-4
As described above, the PHQ-4 (Kroenke et al., 2009) is a four-item measure of psychological distress. Total scores range from 0 to 12, with higher scores indicating greater psychological distress. GAD-2 and PHQ-2 scores can also be calculated by summing the first two items and the last two items of the measure, respectively, with subscale scores ranging from 0 to 6. The Spanish versions of the GAD-2 items were previously translated by García-Campayo et al. (2010). The Spanish versions of the PHQ-2 items were translated by the Stanford Patient Education Research Center (Spanish Personal Health Questionnaire (PHQ-8) Depression, n.d.). See Table 1 for individual items of the PHQ-4 in English and Spanish.
Table 1.
Items from the Patient Health Questionnaire-4 in English and Spanish
Item | English | Spanish |
---|---|---|
Generalized Anxiety Disorder-2 | ||
Stem | Over the last 2 weeks, how often have you been bothered by the following problems? | Señale con qué frecuencia ha sufrido los siguientes problemas en los últimos 15 días |
1 | Feeling nervous, anxious or on edge | Se ha sentido nervioso, ansioso o muy alterado |
2 | Not being able to stop or control worrying | No ha podido dejar de preocuparse |
Patient Health Questionnaire-2 | ||
Stem | Over the last 2 weeks, how often have you been bothered by the following problems? | Durante las últimas 2 semanas, ¿con qué frecuencia le han molestado los siguientes problemas? |
1 | Little interest or pleasure in doing things | Tener poco interés o placer en hacer las cosas |
2 | Feeling down, depressed, or hopeless | Sentirse desanimada, deprimida, o sin esperanza |
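The scoring rule described for the PHQ-4 can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' scoring procedure: the function name `score_phq4` is hypothetical, and the 0 to 3 coding of each item is an assumption inferred from the stated score ranges (four items, total 0 to 12).

```python
def score_phq4(gad_items, phq_items):
    """Return GAD-2, PHQ-2, and PHQ-4 total scores for one respondent.

    gad_items: the two anxiety item responses (first two PHQ-4 items).
    phq_items: the two depression item responses (last two PHQ-4 items).
    Each response is assumed to be coded 0-3 (inferred from the 0-12 range).
    """
    if len(gad_items) != 2 or len(phq_items) != 2:
        raise ValueError("expected two GAD-2 items and two PHQ-2 items")
    for response in list(gad_items) + list(phq_items):
        if response not in (0, 1, 2, 3):
            raise ValueError("item responses must be integers from 0 to 3")
    gad2 = sum(gad_items)   # anxiety subscale, range 0-6
    phq2 = sum(phq_items)   # depression subscale, range 0-6
    return {"gad2": gad2, "phq2": phq2, "total": gad2 + phq2}


example = score_phq4(gad_items=[1, 2], phq_items=[0, 3])
```

Rejecting out-of-range or missing responses, rather than silently summing them, keeps a total score from being computed on incomplete data.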
Perceived Stress Scale
The PSS is a self-report measure of perceived stress developed by Cohen, Kamarck, and Mermelstein (1983); the Spanish version of the measure was drawn from Cohen’s website (PSS Translations, n.d.). Total scores range from 0 to 40, and higher scores reflect greater perceived stress. In the present study, internal consistency reliability was good (α = 0.82) for the total sample and acceptable to good when language preference groups were examined separately (English: α = 0.87; Spanish: α = 0.78).
Data Analysis
PHQ-4 total scores were computed for the English and Spanish language groups separately, and group means were compared with an independent samples t-test. Internal consistency reliability was evaluated using Cronbach’s alpha.
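For readers unfamiliar with coefficient alpha, the following is a minimal sketch of the standard formula, α = (k / (k − 1)) × (1 − Σ item variances / variance of the total score). It is not the SPSS routine used in the study; the function name `cronbach_alpha` and the small response matrix are hypothetical and serve only to illustrate the calculation.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha for complete-case data.

    rows: list of respondents, each a list of k item scores.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != k for row in rows):
        raise ValueError("every respondent must answer all k items")
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical four-item responses (not study data), coded 0-3.
demo_rows = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [2, 2, 1, 2],
    [3, 3, 2, 3],
    [1, 0, 1, 1],
    [2, 3, 2, 2],
]
demo_alpha = cronbach_alpha(demo_rows)
```

With only two items per subscale, alpha depends entirely on the single inter-item correlation, which is worth keeping in mind when reading the subscale values.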
Multiple-group confirmatory factor analysis (MCFA) was used to evaluate the comparability of the factor structure of the PHQ-4 across language groups. Prior investigations have found evidence for a two-factor structure; thus, a two-factor solution was expected for both language groups in the present analysis. The MCFA was conducted in accordance with the approach recommended by Vandenberg and Lance (2000), in which three increasingly restrictive models are examined in sequence. To establish configural invariance, a model is examined in which the number of factors and the items contributing to each factor are constrained across groups, but all other parameters are freely estimated. Once configural invariance is established, the metric invariance of the structure between groups is examined to determine whether each item loads equivalently onto the same factor in both groups. To establish metric invariance, a model is examined in which the loading of each item onto its respective factor is constrained across groups, but factor variances, factor covariances, and error variances are freely estimated. Finally, once metric invariance is established and the metric invariance model is preferred over the configural invariance model, the factor variance/covariance invariance of the structure is examined across groups. In this most restrictive model, the loading of each item onto its respective factor is again constrained across groups, as are the variances and covariances of each factor; error variances are freely estimated. To determine which model is the optimal fit to the observed data, each model that is deemed to adequately fit the data is statistically compared to the preceding, less restrictive model using a chi-squared difference test.
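The final step of a nested-model comparison, converting a chi-squared difference and its degrees of freedom into a p-value, can be sketched as follows. This is a hedged illustration rather than the Mplus procedure used in the study: it does not compute the Satorra-Bentler scaled difference itself (that requires the scaling correction factors of both models, which are not reported here) and only evaluates a difference value that has already been scaled. The function names and the 0.05 significance level are assumptions.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variate with integer df."""
    if not isinstance(df, int) or df < 1 or x < 0:
        raise ValueError("x must be >= 0 and df a positive integer")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        return math.exp(-half) * sum(
            half ** k / math.factorial(k) for k in range(df // 2)
        )
    # Odd df: complementary error function plus a finite correction sum.
    correction = sum(
        half ** (k - 0.5) / math.gamma(k + 0.5)
        for k in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * correction


def compare_nested(delta_chi2, delta_df, alpha=0.05):
    """Return (p, retain_constrained); alpha = 0.05 is an assumed threshold."""
    p = chi2_sf(delta_chi2, delta_df)
    # A non-significant difference means the added constraints do not
    # significantly worsen fit, so the more constrained model is retained.
    return p, p >= alpha


# Difference values as reported in Table 3 of this paper.
p_metric, keep_metric = compare_nested(2.777, 2)
p_factor, keep_factor = compare_nested(2.785, 3)
```

Applied to the difference values in Table 3, the sketch gives p-values that agree with the reported 0.250 and 0.426 to within rounding.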
Multiple indicators of overall model fit were examined, including (a) the Comparative Fit Index (CFI; Bentler, 1990), an incremental index of model fit; (b) the Root Mean Square Error of Approximation (RMSEA; Steiger, 1990), a parsimony-adjusted index of model fit; and (c) the Standardized Root Mean Square Residual (SRMR; Hu & Bentler, 1999), an absolute index of model fit. For the CFI, values > 0.95 indicated good model fit and values > 0.90 indicated acceptable model fit. For the RMSEA and SRMR, values < 0.08 indicated acceptable model fit and values < 0.05 indicated good model fit. A model was determined to adequately fit the observed data if at least two of the three descriptive fit indices met the criteria for acceptable model fit. The likelihood ratio chi-squared was also reported; however, it did not serve as the only indicator of model fit because it is highly influenced by sample size and does not indicate degree of fit (Gerbing & Anderson, 1992).
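The two-of-three decision rule above can be written out explicitly. The sketch below applies the stated acceptable-fit cutoffs (CFI > 0.90, SRMR < 0.08, RMSEA < 0.08) to the fit indices reported in Table 3; the function and variable names are hypothetical.

```python
def acceptable_fit(cfi, srmr, rmsea):
    """Apply the paper's rule: at least two of three indices acceptable."""
    checks = {
        "CFI": cfi > 0.90,
        "SRMR": srmr < 0.08,
        "RMSEA": rmsea < 0.08,
    }
    return sum(checks.values()) >= 2, checks


# (CFI, SRMR, RMSEA) as reported in Table 3.
table_3 = {
    "configural": (0.966, 0.035, 0.138),
    "metric": (0.968, 0.044, 0.110),
    "factor variance/covariance": (0.973, 0.071, 0.082),
}

fit_results = {
    model: acceptable_fit(*indices) for model, indices in table_3.items()
}
```

Under this rule all three models qualify as adequate on the strength of the CFI and SRMR, while the RMSEA stays at or above 0.08 in every model.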
The construct validity of PHQ-4 total and subscale scores was evaluated by examining Pearson product-moment correlations with scores on the PSS. The MCFA was conducted using Mplus version 7.11 (Muthén & Muthén, 2010). All other analyses were completed in SPSS version 20 (IBM Corp., 2011).
Results
Descriptive Statistics
Descriptive statistics can be found in Table 2. Spanish language group PHQ-4 scores (M = 2.94, SD = 2.94) were significantly higher than English language group scores (M = 2.07, SD = 2.59), t(432) = −3.26, p = 0.001. The majority of the sample (61.5%) had PHQ-4 scores indicative of normal levels of psychological distress (≤ 2), while 8.6% of the sample had scores indicative of severe levels (≥ 9). Overall, the Spanish language group had lower income, was less educated, and was less likely to be employed than the English language group.
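The group comparison can be approximately reproduced from the reported summary statistics. The sketch below computes a pooled-variance independent samples t statistic; the integer degrees of freedom in the reported result suggest that equal variances were assumed, but that is an inference. The group sizes (210 and 226) are taken from Table 2 and are an assumption for this calculation, since the reported df of 432 implies that two participants lacked PHQ-4 totals.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent samples t statistic assuming equal variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# English group first, Spanish group second, so a negative t means
# the Spanish group mean is higher. Group sizes are assumed from Table 2.
t_stat, t_df = pooled_t(2.07, 2.59, 210, 2.94, 2.94, 226)
```

With these inputs the statistic is approximately −3.27 on 434 degrees of freedom, close to the reported t(432) = −3.26; the small gap is consistent with the two missing cases and with rounding of the means and standard deviations.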
Table 2.
Sample Characteristics
Characteristic | English (n = 210) | Spanish (n = 226) |
---|---|---|
Age*a | 38.50 (13.74) | 46.24 (13.37) |
Genderb | ||
Female | 107 (51.0%) | 112 (49.6%) |
Male | 103 (49.0%) | 114 (50.4%) |
Education*b | ||
Less than Bachelor’s degree | ||
Less than High School | 13 (6.2%) | 108 (47.7%) |
High school/Trade School | 39 (18.6%) | 48 (21.2%) |
Some college/Associates degree | 81 (38.5%) | 41 (18.2%) |
Bachelor’s degree or higher | ||
Bachelor’s degree | 57 (27.1%) | 17 (7.5%) |
Postgraduate | 18 (8.6%) | 7 (3.1%) |
Missing/Don’t Know | 2 (1.0%) | 5 (5.3%) |
Employment status*b | ||
Employed | 141 (68.1%) | 106 (46.5%) |
Not Employed for Wages | ||
Unemployed | 30 (14.2%) | 42 (18.6%) |
Homemaker | 6 (2.9%) | 30 (13.3%) |
Student/retired/disabled | 19 (9.0%) | 29 (12.7%) |
Social Security/SSI | 4 (1.9%) | 9 (4.0%) |
Missing/Don’t Know | 10 (3.9%) | 10 (4.9%) |
Marital status b | ||
Married | 95 (45.2%) | 116 (51.3%) |
Not Married | ||
Single | 65 (31.0%) | 59 (26.1%) |
Living with partner | 15 (7.1%) | 14 (6.2%) |
Divorced/Separated | 32 (15.2%) | 27 (11.9%) |
Widowed | 3 (1.4%) | 9 (4.0%) |
Missing | 0 (0.0%) | 1 (0.5%) |
Income*b | ||
$0 – $24,999 | 61 (29%) | 121 (53.5%) |
$25,000- $49,999 | 59 (28.1%) | 60 (26.5%) |
$50,000 – $74,999 | 41 (19.5%) | 11 (4.9%) |
> $75,000 | 34 (16.2%) | 9 (4%) |
Missing/Don’t Know | 15 (7.2%) | 25 (11.1%) |
Note. SSI = supplemental security income.
a M (SD);
b n (%).
* Independent samples t-tests resulted in a significant difference at p < .01 (two-tailed) between language preference groups.
Reliability
For the total sample, internal consistency reliability was good for the PHQ-4 (α = 0.86) and its subscale scores (PHQ-2: α = 0.80; GAD-2: α = 0.81). For the English language group, internal consistency reliability was good for the PHQ-4 (α = 0.85) and acceptable to good for its subscale scores (PHQ-2: α = 0.81; GAD-2: α = 0.77). For the Spanish language group, internal consistency reliability was good for the PHQ-4 (α = 0.86) and its subscale scores (PHQ-2: α = 0.80; GAD-2: α = 0.82).
Multiple Group Confirmatory Factor Analysis Models
Preliminary analyses demonstrated that the data were significantly multivariately non-normal. Therefore, the Satorra-Bentler chi-square test statistic (S-Bχ2; Satorra & Bentler, 2001) was evaluated.
Configural Invariance
The baseline configural invariance model demonstrated that the two-factor structure fit the observed data well for both language groups (Table 3). For the English language sample, all estimated unstandardized factor loadings for both the PHQ-2 (Item 1: .89; Item 2: 1.13, p < .05) and the GAD-2 (Item 1: .74; Item 2: 1.35, p < .05) subscales were statistically significant, as were the variances for both factors (σ² PHQ-2 = 0.35, σ² GAD-2 = 0.29, all ps < 0.01). The covariance between the two factors was also statistically significant (r = 0.27, p < 0.01), indicating that the two dimensions of psychological distress were positively related to one another.
Table 3.
Fit Statistics for Configural Invariance, Metric Invariance, and Factor Variance/Covariance Invariance Models of the PHQ-4
Model | S-Bχ2 | df | p | CFIa | SRMRb | RMSEAb | Reference Model # | ΔS-Bχ2 | Δdf | Δp |
---|---|---|---|---|---|---|---|---|---|---|
1. Configural | 20.590 | 4 | < .01 | 0.966 | 0.035 | 0.138 | ||||
2. Metric | 21.821 | 6 | < .01 | 0.968 | 0.044 | 0.110 | 1 | 2.7774 | 2 | 0.250 |
3. Factor | 22.286 | 9 | < .01 | 0.973 | 0.071 | 0.082 | 2 | 2.7851 | 3 | 0.426 |
Note. CFI = comparative fit index; SRMR = standardized root mean square residual; RMSEA = root mean square error of approximation.
a Plausible fit > .90, Good fit > .95;
b Plausible fit < .08, Good fit < .05.
For the Spanish language sample, all estimated unstandardized factor loadings for both the PHQ-2 (Item 1: .93; Item 2: 1.08, p < .05) and the GAD-2 (Item 1: .93; Item 2: 1.08, p < .05) subscales were statistically significant, and the variances for both factors were also significant (σ² PHQ-2 = 0.43, σ² GAD-2 = 0.55, all ps < 0.01). Furthermore, the covariance between the two factors was again statistically significant (r = 0.41, p < 0.01).
Metric Invariance
The metric invariance model fit the data well (Table 3). When this model was statistically compared to the configural invariance model, constraining the factor loadings did not significantly worsen model fit (ΔS-Bχ2 = 2.777, Δdf = 2, p = 0.250), so the more parsimonious metric invariance model was retained.
Factor Variance/Covariance Invariance
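This most restrictive model fit the data well (Table 3). When this model was compared to the metric invariance model, constraining the factor variances and covariances did not significantly worsen model fit (ΔS-Bχ2 = 2.785, Δdf = 3, p = 0.426); the factor variance/covariance invariance model was therefore retained as the best-fitting model.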
Construct Validity
As expected, there was a strong positive correlation between PHQ-4 total and subscale scores and scores on the PSS (PHQ-4: r = 0.63, p < 0.01; PHQ-2: r = 0.57, p < 0.01; GAD-2: r = 0.60, p < 0.01).
Discussion
These findings suggest that the PHQ-4 is a reliable and valid measure of psychological distress for use with English- and Spanish-speaking Hispanic Americans. Internal consistency reliability was strong. Results from the MCFA indicate that the PHQ-4 consists of two factors, one reflecting symptoms of anxiety and the other symptoms of depression. The MCFA indicated that the factor variance/covariance invariance model was the best fit to the data. Thus, these results suggest that the PHQ-4 can be used to measure the construct of psychological distress equivalently across English- and Spanish-speaking Hispanic Americans, and that scores on the PHQ-4 can be compared across these language groups. In addition, higher PHQ-4 total and subscale scores were strongly associated with higher perceived stress, demonstrating construct validity.
These results should be interpreted in light of the study's limitations. The present sample did not include a large number of participants with moderate or severe levels of distress. Additionally, the sample was predominantly Mexican American and lived in a metropolitan border city, further limiting the generalizability of study findings. Despite these limitations, the results support the PHQ-4 as a good choice for researchers and healthcare professionals who wish to quickly screen for psychological distress in Hispanic Americans.
References
- Bentler PM. Comparative fit indexes in structural models. Psychological Bulletin. 1990;107:238–246. doi: 10.1037/0033-2909.107.2.238.
- Cohen S, Kamarck T, Mermelstein R. A global measure of perceived stress. Journal of Health and Social Behavior. 1983;24:385–396.
- Derogatis LR. The Brief Symptom Inventory (BSI): Administration, scoring and procedures manual. 3rd ed. Minneapolis, MN: National Computer Systems; 1993.
- Floyd FJ, Widaman KF. Factor analysis in the development and refinement of clinical assessment instruments. Psychological Assessment. 1995;7:286–299.
- García-Campayo J, Zamorano E, Ruiz MA, Pardo A, Pérez-Páramo M, López-Gómez V, Freire O, Rejas J. Cultural adaptation into Spanish of the Generalized Anxiety Disorder-7 (GAD-7) scale as a screening tool. Health and Quality of Life Outcomes. 2010;8:8. doi: 10.1186/1477-7525-8-8.
- Gerbing DW, Anderson JC. Monte Carlo evaluations of goodness-of-fit indices for structural equation models. Sociological Methods & Research. 1992;21:132–160.
- Henrich G, Herschbach P. Questions of life satisfaction: A short measure for assessing quality of life. European Journal of Psychological Assessment. 2000;16:150–159.
- Hu LT, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling. 1999;6:1–55.
- IBM Corp. IBM SPSS Statistics for Windows, version 20.0. Armonk, NY: IBM Corp; 2011.
- Kerper L, Spies CD, Tillinger J, Wegscheider K, Salz A, Weiss-Garlach E, Neumann T, Krampe H. Screening for depression, anxiety, and general psychological distress in pre-operative surgical patients: A psychometric analysis of the Patient Health Questionnaire 4 (PHQ-4). Clinical Health Promotion. 2014;4:5–14.
- Kroenke K, Spitzer RL, Williams JBW. The Patient Health Questionnaire-2: Validity of a two-item depression screener. Medical Care. 2003;41:1284–1292. doi: 10.1097/01.MLR.0000093487.78664.3C.
- Kroenke K, Spitzer RL, Williams JB, Lowe B. An ultra-brief screening scale for anxiety and depression: The PHQ-4. Psychosomatics. 2009;50:613–621. doi: 10.1176/appi.psy.50.6.613.
- Kroenke K, Spitzer RL, Williams JB, Monahan PO, Lowe B. Anxiety disorders in primary care: Prevalence, impairment, comorbidity, and detection. Annals of Internal Medicine. 2007;146:317–325. doi: 10.7326/0003-4819-146-5-200703060-00004.
- Lowe B, Wahl I, Rose M, Spitzer C, Glaesmer H, Wingenfeld K, Schneider A, Brahler E. A 4-item measure of depression and anxiety: Validation and standardization of the Patient Health Questionnaire-4 (PHQ-4) in the general population. Journal of Affective Disorders. 2010;122:86–95. doi: 10.1016/j.jad.2009.06.019.
- Muthén LK, Muthén BO. Mplus User’s Guide. 6th ed. Los Angeles, CA: Muthén & Muthén; 1998–2010.
- Spanish Personal Health Questionnaire (PHQ-8) Depression. Retrieved from http://patienteducation.stanford.edu/research/phqesp.html.
- PSS Translations. Retrieved from http://www.psy.cmu.edu/~scohen/scales.html.
- Rosenberg M. Society and the adolescent self-image. Princeton, NJ: Princeton University Press; 1965.
- Satorra A, Bentler PM. A scaled difference chi-square test statistic for moment structure analysis. Psychometrika. 2001;66:507–514. doi: 10.1007/s11336-009-9135-y.
- Schumacher J, Leppert K, Gunzelmann T, Stauß B, Brahler E. Resilienzskala — Ein Fragebogen zur Erfassung der psychischen Widerstandsfähigkeit als Personenmerkmal. [Resilience Scale — a questionnaire to assess psychological resilience as personality trait] Z f Klinische Psychologie, Psychiatrie und Psychotherapie. 2005;53:16–39.
- Steiger JH. Structural model evaluation and modification: An interval estimation approach. Multivariate Behavioral Research. 1990;25:173–180. doi: 10.1207/s15327906mbr2502_4.
- Vandenberg RJ, Lance CE. A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods. 2000;3:4–70.
- Wells KB, Stewart AL, Hays RD, Burnam MA, Rogers W, Daniels M, Berry S, Greenfield S, Ware J. The functioning and well-being of depressed patients: Results from the Medical Outcomes Study. JAMA. 1987;262:914–919.