2019 May 7;6:e7. doi: 10.1017/gmh.2019.5

Table 2.

Psychometric properties of assessments

Author(s) (year) Assessment measure Administration mode Psychometric testing results
Abeyasinghe et al. (2012) Peradeniya Depression Scale Interviewer administered
  1. Internal consistency: N/A

  2. Validity: N/A

  3. Inter-rater reliability: N/A

  4. Test–retest reliability: N/A

  5. Other:

    The area under the ROC curve was 0.95 (95% CI 0.91–0.99) meaning the PDS can be considered highly accurate.

    The highest sensitivity and specificity were 87.5% and 88% respectively.

Bass et al. (2008) Adapted Hopkins Symptom Checklist (HSCL) and Edinburgh Post-partum Depression Scale (EPDS); Screener containing locally sourced items Interviewer administered
  1. Internal consistency (α):

    All scales and the composite scale showed good internal consistency – the adapted HSCL, EPDS screener and local screener were 0.86, 0.76 and 0.88 respectively. For the mental health symptoms scale (i.e. HSCL  +  EPDS) α =  0.92.

    All scales showed good specificity and sensitivity: area under the curve for the detection of the local depression-like syndrome ranged from 0.83 to 0.87, depending on the scale used. At the optimal cut-off score for each scale, sensitivity and specificity were all 80% or greater, except for the specificity of the EPDS cut-off.

  2. Validity:

    Convergence between the mental health symptoms instrument and a locally developed function scale showed evidence of convergent validity [0.34 (p  =  0.0001)]. There was a consistent increase in depression severity scores with each incremental increase in dysfunction which provides additional evidence of convergent validity for these measures.

    Discriminant validity testing showed that the mean for cases, 34.9 points (s.e. 1.8), and non-cases, 16.9 points (s.e. 2.4) (identified by Key Informants and self-identification), were substantially and statistically significantly different (p < 0.0001).

  3. Inter-rater reliability: N/A

  4. Test–retest reliability tests showed adequate reliability: The correlations between the scale scores from first interview and the re-interview were: 0.59 (p < 0.001), 0.53 (p  =  0.003), and 0.42 (p  =  0.02), for the HSCL, EPDS and total scores respectively.

  5. Other: N/A

Betancourt et al. (2009a, b) Acholi Psychosocial Assessment Instrument (APAI); Subscales: Ma lwor (anxiety); Kwo Maraco (conduct); Pro social; Depression (combination of two tam, kumu and par) Interviewer administered
  1. Internal consistency (α)

    For the scales and combinations of scales was adequate or strong: Two tam  =  0.87; Kumu  =  0.87; Par =  0.84; Ma lwor(anxiety)  =  0.70; Kwo Maraco (conduct)  =  0.83; Pro social  =  0.70; Total depression (combination of two tam, kumu and par) =  0.93; Total APAI problems  =  0.93

  2. Validity

    Concurrent validity – significant mean differences across case status confirmed for all three depression-like syndromes: Case/Non-case Mean(s.d.): Two tam 21.56(8.06)/15.76(8.29); Kumu 16.52 (7.15)/9.33(6.67); Par 17.24(7.69)/11.91(7.08)

    The mean scores for the corresponding scale scores of ‘cases’ of the anxiety syndrome [ma lwor 10.35(5.61)/9.97(5.92)] and the conduct problem syndrome [kwo maraco 5.46(6.48)/2.45(3.09)] were not significantly different from ‘non-cases’ as identified by adolescents and caregivers.

  3. Inter-rater reliability (r)

    Good for all the APAI depression-like problem scales (Two tam = 0.86; Par = 0.78; Kumu = 0.92)

    Inter-rater reliability was less strong for the anxiety problem scale (ma lwor) (0.62)

    Poor for the conduct problem scale (kwo maraco) and the prosocial scale (0.25 and 0.35 respectively).

  4. Test–retest reliability (r)

    Good for all the APAI depression-like problem scales (Two tam = 0.79; Par = 0.79; Kumu = 0.89)

    Less strong for the anxiety problem scale (0.68)

Bolton et al. (2007) APAI Depression symptom scale (combination of two tam, kumu and par) Interviewer administered
  1. Internal consistency (α)

    In adolescents was 0.92

  2. Validity:

    Concurrent validity was confirmed by substantially and significantly higher scores on the APAI for cases [mean (s.d.): 45.3 (13.6)] compared with non-cases [15.6 (11.2)], as identified by caregivers and adolescents.

  3. Inter-rater reliability: N/A

  4. Test–retest reliability (r) for the depression symptom scale was 0.84.

  5. Other: N/A

Fernando (2008) Sri Lankan Index of Psychosocial Status Adult Version (SLIPSS-A) Self-report
  1. Internal consistency was high (α =  0.92).

  2. Validity

    Content validity was assessed by reviewing the SLIPSS-A items for consistency with the narrative data. 7/12 most frequently endorsed SLIPSS-A items were the most frequently mentioned in narratives.

    Convergent validity: scores on the SLIPSS-A were significantly correlated with scores on the PCL-C, r(99) = 0.75, p < 0.000.

    Predictive validity: the model successfully distinguished between those who had not been exposed to the tsunami and those who had, even after controlling for sample type, χ2(3) =  28.7, p < 0.000. This was repeated for the community sample alone, and again it significantly predicted trauma exposure χ2(1) =  7.72, p < 0.005. The model demonstrated adequate ability to predict correctly participants' trauma exposure status, with an overall prediction success rate of 61.6%.

    Predictive validity: The scores on the SLIPSS-A and an item assessing life satisfaction were strongly negatively correlated, r(132) = 0.51, p < 0.000.

  3. Inter-rater reliability: N/A

  4. Test–retest reliability: N/A

  5. Other: The western developed PCL-C could not distinguish between trauma groups.

Green et al. (2018) Perinatal Depression Screening (PDEPS), blending EPDS, Patient Health Questionnaire (PHQ-9) items with locally developed items Interviewer administered
  1. Internal consistency: α  =  0.81

  2. Validity:

    Discriminant validity: mean PDEPS score was twice as large for DSM cases than non-cases based on SCID-5-RV diagnosis (13.6 v. 6.3) and for cases based on ‘local’ diagnosis (In your clinical judgement, do you think that this woman is ‘depressed’?; 13.3 v. 6.2).

    Convergent validity: Correlation with counsellor rated social and occupational functioning scale (SOFAS)  =  −0.32 and with self-report rating of wellbeing  =  −0.25.

    Construct validity: associated with wealth (  =  −2.7) and work (  =  2.3)

  3. Inter-rater reliability: N/A

  4. Test–retest reliability: r  =  0.62 (enumerator) and r  =  0.36 (automated phone)

  5. Other:

    Compared with SCID-5-RV diagnosis: sensitivity  =  0.90, specificity  =  0.90, AUC  =  0.89, LR+  =  8.62, LR−  =  0.11. Compared with ‘local’ diagnosis: sensitivity  =  0.58, specificity  =  0.88, AUC  =  0.86, LR+  =  5.00, LR−  =  0.47. PDEPS outperformed the full PHQ-9 and EPDS in terms of classification accuracy (0.90 v. 0.72 and 0.73, respectively).
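
Note on the likelihood ratios reported for the PDEPS: LR+ and LR− follow directly from sensitivity and specificity, so the reported values can be approximately reproduced. The sketch below is illustrative only and is not taken from Green et al.; the function name is hypothetical, and because it starts from the rounded sensitivity and specificity shown in the table, its results differ slightly from the published ratios (for example 9.0 rather than 8.62).

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity (both in 0..1)."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Rounded values from the table (SCID-5-RV diagnosis as the reference)
scid_lr_pos, scid_lr_neg = likelihood_ratios(0.90, 0.90)

# Rounded values from the table ('local' diagnosis as the reference)
local_lr_pos, local_lr_neg = likelihood_ratios(0.58, 0.88)

print(round(scid_lr_pos, 2), round(scid_lr_neg, 2))
print(round(local_lr_pos, 2), round(local_lr_neg, 2))
```

The small gaps between these figures and the published ones are what one would expect if the authors computed the ratios from unrounded sensitivity and specificity.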

Hinton et al. (2012) Cambodian Somatic Symptom and Syndrome Inventory SSI (abbreviated) Interviewer administered
  1. Internal consistency: N/A

  2. Validity:

    Convergent validity: Correlation with PTSD Check-list (PCL), r  =  0.69; Harvard Trauma Questionnaire (HTQ), r  =  0.51; Short Form Health Survey–3 (SF-3), r  =  0.49

    Discriminant validity: Symptom severity for all SSI items increases with increasing PTSD severity.

  3. Inter-rater reliability: N/A

  4. Test–retest reliability: N/A

  5. Other: SSI was more strongly correlated than PCL with HTQ and SF-3.

Hinton et al. (2013) Cambodian SSI (full) Interviewer administered
  1. Internal consistency: Somatic scale, α =  0.91; syndrome scale, α =  0.88; multi-item syndrome subscales, all αs >0.84

  2. Validity:

    Convergent validity: Correlation with PTSD Check-list (PCL), CSSI total, r  =  0.67, CSSI somatic scale, r  =  0.71, CSSI syndrome scale, r  =  0.63; Short Form Health Survey–12 (SF-12), r  =  0.7

    Discriminant validity: Mean (s.d.) = 2.0 (0.8) for PTSD group and 0.6 (0.5) for non-PTSD group. Symptom severity for CSSI total and both CSSI scales increases with increasing PTSD severity.

  3. Inter-rater reliability: N/A

  4. Test–retest reliability: N/A

  5. Other: SSI was more strongly correlated than PCL with SF-12.

Hinton et al. (2018) Vietnamese Symptom and Cultural Symptom Addendum (VN SSA) Interviewer administered
  1. Internal consistency: N/A

  2. Validity:

    Convergent validity: All items were correlated with standardised and summed scale combining Generalised Anxiety Disorder-7 (GAD-7), Posttraumatic Diagnostic Scale (PDS), Patient Health Questionnaire-9 (PHQ-9), r  =  0.15 to 0.54

  3. Inter-rater reliability: N/A

  4. Test–retest reliability: N/A

  5. Other: N/A

Ice & Yogo (2005) Luo Perceived Stress Scale (LPSS) Interviewer administered
  1. Internal consistency: α =  0.75

  2. Validity

    Criterion validity: caregiving, social networks, depression, and cortisol were all associated with LPSS as predicted with the exception of caregiving.

    Known group validity was examined through comparisons of caregiving groups, genders, marital status, and participation in social groups. While they were generally associated with LPSS in the predicted direction, factor analysis suggested that the LPSS did not represent a single domain. The LPSS requires additional development.

  3. Inter-rater reliability: N/A

  4. Test–retest reliability: N/A

  5. Other: N/A

Kaaya et al. (2008) Dar-es-Salaam Symptom Questionnaire (DSQ) Interviewer administered
  1. Internal consistency: α =  0.84

  2. Validity:

    Construct validity: Items loaded as expected for depression and anxiety symptoms in principal components analysis.

    Convergent validity: Correlation with Short-Form Health Survey-36 (SF-36), r  =  −0.57 to −0.37

    Criterion validity: Significant predictors were economic provisions, control over decisions on household matters, marital status, and education (β  =  −0.21 to −0.09).

  3. Inter-rater reliability: r  =  0.89

  4. Test–retest reliability: r  =  0.82

  5. Other: N/A

Kaiser et al. (2013) Kreyòl Distress Idioms (KDI) Interviewer administered
  1. Internal consistency: α = 0.89; mean (s.d.) score = 17.02 (9.6)

  2. Validity: N/A

  3. Inter-rater reliability: N/A

  4. Test–retest reliability: N/A

  5. Other: N/A

Kaiser et al. (2015) Kreyòl Distress Idioms (KDI) Interviewer administered
  1. Internal consistency of the KDI was high (α =  0.86).

  2. Validity

    Correlations with other scales – BAI (r  =  0.67) and BDI (r  =  0.52) – support convergent and content validity.

    External validity confirmed by correlation with known risk factors: Number of traumatic events experienced and having a household member with mental distress were both statistically significantly associated with higher KDI score (  =  1.5 and 5.8, respectively; p ⩽ 0.003). Two perceived causes of mental distress were associated with KDI score: those endorsing that relationships can cause mental distress scored on average 3.2 points lower (p < 0.001), while those stating that disasters can cause mental distress scored 3.68 points higher (p < 0.001).

  3. Inter-rater reliability: N/A

  4. Test–retest reliability: N/A

  5. Other: N/A

Kohrt et al. (2016) PHQ-9; heart-mind/brain-mind Interviewer administered
  1. The internal consistency (α) for the PHQ-9 was 0.84.

  2. Validity:

    Validated by comparing the researcher-administered Nepali PHQ-9 against the CIDI, which gave an area under the curve (AUC) of 0.94 (95% CI 0.87–0.99).

    Discriminant validity: All PHQ-9 item means were significantly different when comparing non-depressed (CIDI negative) and depressed (CIDI positive) participants

  3. Inter-rater reliability: N/A

  4. Test–retest reliability: N/A

  5. Other:

    Sensitivity of PHQ-9: For a PHQ-9 score of 10 or greater, the sensitivity was 0.94 (95% CI 0.73–0.99), specificity was 0.80 (95% CI 0.71–0.86), PPV was 0.42 (95% CI 0.27–0.59), and NPV was 0.99 (95% CI 0.93–1.00), with a positive likelihood ratio of 4.62 (95% CI 3.12–6.83), and negative likelihood ratio of 0.07 (95% CI 0.01–0.47).

    Sensitivity of other idiom/CCD: Heart-mind problems had a sensitivity of 0.94 (95% CI 0.69–1.00), specificity of 0.27 (95% CI 0.19–0.36), PPV of 0.17 (95% CI 0.10–0.26), and NPV of 0.97 (95% CI 0.81–1.00). Brain-mind problems had low sensitivity for CIDI positive status (sensitivity  =  0.47, 95% CI 0.25–0.71).
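
Note on the predictive values reported for the PHQ-9: unlike sensitivity and specificity, PPV and NPV depend on how common the condition is in the sample. The sketch below is illustrative and not part of Kohrt et al.; the function name is hypothetical, and the prevalence of 0.13 is an assumption back-calculated from the reported sensitivity, specificity and PPV, not a figure given in the table.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) via Bayes' rule for a given prevalence (all in 0..1)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity from the table; prevalence is an ASSUMED value
ppv, npv = predictive_values(0.94, 0.80, 0.13)
print(round(ppv, 2), round(npv, 2))
```

Under that assumed prevalence the results land close to the reported PPV of 0.42 and NPV of 0.99, which shows why a screener with high sensitivity can still have a modest PPV when cases are uncommon.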

McMullen et al. (2012) APAI (depression symptom scale) Interviewer administered
  1. Internal consistency (α): high  =  0.93

  2. Validity: N/A

  3. Inter-rater reliability: N/A

  4. Test–retest reliability was strong (r = 0.835, p < 0.001)

  5. Other: N/A

Miller et al. (2006) Afghan Symptom Checklist (ASCL) Interviewer administered
  1. Internal consistency (α): High 0.93

  2. Validity: Good construct validity, correlating strongly with a measure of exposure to war-related violence and loss (Afghan War Experiences Scale, AWES) (r = 0.70, p < 0.001).

  3. Inter-rater reliability: N/A

  4. Test–retest reliability: N/A

  5. Other: The indigenous items were among the most frequently endorsed symptoms of distress on the ASCL.

Mumford et al. (2005) Pakistani Anxiety and Depression Questionnaire (PAD-Q) (with 2 subscales, ‘AD’ and ‘D’) Either self-report or interviewer administered
  1. The internal consistency (α) of the ‘AD’ scale was 0.92 (95% CI 0.90–0.94) and for the ‘D’ scale was 0.91 (95% CI 0.89–0.93).

  2. Validity

    Discriminant validity: A histogram of ‘AD’ (anxiety/depressive disorders) scale scores showed a clear bimodal distribution between controls and cases. The histogram of ‘D’ scale (depressive disorders) scores was weakly bimodal.

  3. Inter-rater reliability: N/A

  4. Test–retest reliability: N/A

  5. Other: Sensitivity and specificity were all >80% apart from the specificity of the ‘D’ scale in a sample of depressive v. anxiety patients

Patel et al. (1997) Shona Symptom Questionnaire (SSQ) Interviewer administered
  1. Internal consistency (α)  =  0.85

  2. Validity:

    Discriminant validity: Cases had significantly higher scores than non-cases (mean score, 8.6; 95% CI 7.9–9.2 v. mean score, 4.1; 95% CI 3.6–4.5; p < 0.001)

    Divergent validity: Positive Mental Health Items were all significantly more common among non-cases (r  =  −0.54, p < 0.001).

    The total score correlated strongly with patients' self-assessment of the emotional nature of their illness.

  3. Inter-rater reliability: N/A

  4. Test–retest reliability: N/A

  5. Other: Validity coefficients: ROC curves suggested an optimal cut-off point of 7/8 (of 14) (area under curve, 0.88; s.e., 0.02).

Phan et al. (2004) Phan Vietnamese Psychiatric Scale (PVPS) Interviewer administered
  1. Internal consistency (α) for the subscales ranged from 0.87 to 0.95

  2. Validity

    Construct validity: Factor analysis – the proposed four-factor structure of the PVPS appears to represent the best four-factor arrangement of the items

    Multitrait – multimeasure analysis also supported the construct validity of the scale

    Of all measures, the PVPS showed the most consistent evidence of discriminant validity

    The PVPS demonstrated good criterion validity against case assignments by psychiatrists, naturalist healers, and structured diagnostic measures.

  3. Inter-rater reliability: N/A

  4. Test–retest reliability: Test–retest correlations coefficients were 0.89 for the depression scale (0.88 for affective subscale, 0.89 for the psychovegetative subscale); 0.81 for the anxiety scale, and 0.84 for the somatisation scale.

  5. Other: The PVPS was rated by patients as more acceptable in comparison with other related measures. A larger proportion of patients assessed the PVPS as being more culturally sensitive than other measures.

Rasmussen et al. (2014) ASCL Interviewer administered
  1. Internal consistency (α) of the subscales was as follows: Fisher = −0.23 (unstable); Jigar Khun = 0.91; Aggression = 0.66 (both satisfactory)

  2. Validity

    Construct validity of the scale was assessed by comparing the items' face validity and using exploratory and confirmatory factor analysis: The 3-factor model suggested by the EFA fit the two confirmatory samples adequately (CFI = 0.943, TLI = 0.974, RMSEA = 0.087 in confirmatory sample 1; CFI = 0.920, TLI = 0.959, RMSEA = 0.099 in confirmatory sample 2)

    Only the second two subscales were used for determining the external validity by association with traumatic exposure and wealth indices. Trauma exposure and wealth were significantly correlated across subscales for women (trauma exposure: Jigar Khun =  0.28, Aggression =  0.25; wealth: Jigar Khun = −0.28, Aggression = −0.27), but inconsistently so for men (trauma exposure: Jigar Khun =  0.18, Aggression =  0.01; wealth: Jigar Khun = −0.11, Aggression =  0.06). This suggests that external validity of the scale was gender dependent.

  3. Inter-rater reliability: N/A

  4. Test–retest reliability: N/A

  5. Other: The ASCL was a better measure of distress than the SRQ-20 for women, while the two measures were similar for men.

Rasmussen et al. (2015) Zanmi Lasante Depression Symptom Inventory (ZLDSI) Interviewer administered
  1. Internal consistency: N/A

  2. Validity

    Discriminant validity: Depressed participants (M  =  21.43, s.d.  =  8.31) scored statistically significantly higher than not depressed participants (M  =  14.05, s.d.  =  9.60; t(103)  =  4.17, p < 0.001), with a large effect (Cohen's d  =  0.82).

    Convergent validity: Total scores were strongly associated with functional impairment (WHODAS-II scores), r  =  0.71 (p < 0.001).

  3. Inter-rater reliability: N/A

  4. Test–retest reliability: N/A

  5. Other:

    ROC analysis predicting clinical diagnoses from total scores suggested moderate predictive accuracy. The AUC was 0.71, 95% CI 0.61–0.81.

    The scale had acceptable sensitivity but did less well with specificity which could not be improved without unacceptable losses in sensitivity
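
Note on the effect size reported for the ZLDSI: Cohen's d can be approximated from the group means and standard deviations given above. The sketch below is illustrative and not from Rasmussen et al.; the function name is hypothetical, and because the group sizes are not given in the table it assumes an equal-weight pooled standard deviation, which may differ from the pooling the authors used.

```python
import math


def cohens_d_equal_weight(mean_1, sd_1, mean_2, sd_2):
    """Cohen's d using the root mean square of the two SDs (ASSUMES equal weighting)."""
    pooled_sd = math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2.0)
    return (mean_1 - mean_2) / pooled_sd


# Means and SDs from the table: depressed vs. not depressed participants
d = cohens_d_equal_weight(21.43, 8.31, 14.05, 9.60)
print(round(d, 2))
```

Under this assumption the result rounds to the reported d of 0.82.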

Roberts et al. (2006) Hwa-Byung (HB) scale Self-report
  1. Internal Consistency: N/A

  2. Validity

    Convergent validity: Correlation with the somatic MMPI-2 scales: Hs, which measures somatic complaints (0.75); D, which measures symptoms of depression including feelings of sadness, pessimism, and psychomotor retardation (0.52); and Hy, which measures the development of physical symptoms in response to stress (0.47). The MMPI-2 psychological and clinical content scales that were hypothesised to correlate best with HB (i.e. HEA, ANX and DEP) yielded correlations of 0.60 or greater.

    Correlation between the HB scale and the Peer Rating Form (developed by the authors to serve as a measure of external validation by identifying 13 peer-rating items that appeared to address the somatic and psychiatric symptoms associated with HB) was moderate (0.21)

  3. Inter-rater reliability: N/A

  4. Test–retest reliability: N/A

  5. Other: N/A

Silove et al. (2009) Index of explosive anger Interviewer administered
  1. Internal consistency: N/A

  2. Construct validity: Theoretically driven predictors were associated with explosive anger, such as exposure to past trauma events where there was an increase in odds according to the number of events endorsed, with a major increase for the highest trauma endorsement group (odds, 95% CI): for 6–10 trauma categories; 3.4 (1.6–7.0); for 11–15 categories: 4.9 (2.2–10.8); and for 16+ categories: 10.7 (4.1–27.3) (Wald: 45.58, p  =  0.000).

  3. Inter-rater reliability: N/A

  4. Test–retest reliability: N/A

  5. Other: N/A

Snodgrass et al. (2017) Positive and Negative Affect Scale (PANAS) Interviewer administered
  1. Internal consistency: α  =  0.87

  2. Validity: (Note: higher score indicates more positive emotions and less negative emotions unless otherwise indicated)

    Convergent validity: Correlation with Hopkins Symptom Checklist (HSCL-10), r  =  −0.49; Bradford Somatic Index (BSI), r  =  −0.34; Correlation with Physical Illness Scale, r  =  −0.30; 4-item stress scale, r  =  −0.50; 4-item subjective wellbeing, r  =  0.63

    Construct validity: Associated as expected with gender, education, income, and household size; Mean scores for negative emotion scale were greater in villages exposed to greater stressors (e.g. relocation, deforestation).

    Content validity: Items loaded onto factors in a way that matched ethnographic data and literature

  3. Inter-rater reliability: N/A

  4. Test–retest reliability: N/A

  5. Other: N/A

Weaver (2017) Tension scale Interviewer administered
  1. Internal consistency:

  2. Validity:

    Convergent validity: correlation with HSCL-25, r = 0.778, p < 0.01 for women without diabetes; r = 0.807, p < 0.01 for women with diabetes; correlations with the HSCL-25 depression and anxiety components were all above 0.7, p < 0.01.

    Discriminant validity: those who endorsed no experience of a ‘tension’ scale item scored significantly lower on the HSCL

  3. Inter-rater reliability: N/A

  4. Test–retest reliability: N/A

  5. Other: N/A

Weaver & Hadley (2011) Tension scale Interviewer administered
  1. Internal consistency: α  =  0.93

  2. Validity:

    Construct validity: Factor analysis revealed one dominant factor in the ‘tension’ scale

    Convergent validity: HSCL and ‘tension’ scores were moderately correlated (r  =  0.56, p < 0.01)

  3. Inter-rater reliability: N/A

  4. Test–retest reliability: N/A

  5. Other: N/A