Author manuscript; available in PMC: 2022 Dec 1.
Published in final edited form as: Ann N Y Acad Sci. 2021 Jul 26;1505(1):23–39. doi: 10.1111/nyas.14667

Table 2.

The psychometric assessment for self-report and observational tools (PAT)

| # | Property | Definition | Score | Criteria |
|---|---|---|---|---|
| 1 | Sample size: the number of items | The size of the sample per item being tested | 2 | ≥10:1 |
| | | | 1 | 5:1–10:1 |
| | | | 0 | <5:1 |
| 2 | Homogeneity: internal consistency | The extent to which items in a (sub)scale are intercorrelated, thus measuring the same construct/concept | 2 | If 0.70 ≤ Cronbach’s alpha ≤ 0.90 |
| | | | 1 | If Cronbach’s alpha > 0.90 (indicates potential redundancy) or 0.60 ≤ Cronbach’s alpha < 0.70 |
| | | | 0 | If Cronbach’s alpha < 0.60 or no information is provided |
| | | | - | Not applicable (e.g., instruments with only 1 item) |
| 3 | Test–retest reliability | The consistency of item responses over time (applicable for instruments administered through self- or proxy report) | 2 | If reliability coefficient (e.g., ICC, Kappa, r, and rs) ≥ 0.80 |
| | | | 1 | If 0.60 ≤ reliability coefficient < 0.80 or some coefficients are ≥ 0.80 but others are < 0.60 |
| | | | 0 | If reliability coefficient < 0.60 or no information is provided |
| | | | - | Not applicable (e.g., instruments administered by raters) |
| 4 | Intra-rater reliability | The consistency of item responses between one rater’s two assessments over time (applicable for instruments administered by raters) | 2 | If reliability coefficient (e.g., ICC, Kappa, r, and rs) ≥ 0.80 |
| | | | 1 | If 0.60 ≤ reliability coefficient < 0.80 or some coefficients are ≥ 0.80 but others are < 0.60 |
| | | | 0 | If reliability coefficient < 0.60 or no information is provided |
| | | | - | Not applicable (e.g., instruments administered through self- or proxy report) |
| 5 | Inter-rater reliability | The level of agreement on item responses between two or more raters at the same time (applicable for instruments administered by raters) | 2 | If reliability coefficient (e.g., ICC, Kappa, r, and rs) ≥ 0.80 |
| | | | 1 | If 0.60 ≤ reliability coefficient < 0.80 or some coefficients are ≥ 0.80 but others are < 0.60 |
| | | | 0 | If reliability coefficient < 0.60 or no information is provided |
| | | | - | Not applicable (e.g., instruments administered through self- or proxy report) |
| 6 | Content validity | The degree to which elements of a measure are relevant to and representative of the targeted construct for a particular assessment purpose | 2 | All aspects, including the instrument aim, target population, measured constructs, AND the item selection process, involved review by the target population or experts (e.g., the developer), AND were clearly described (in reviewer’s opinion), with evidence of excellent CVI scores (i.e., I-CVI ≥ 0.78, S-CVI/UA ≥ 0.80, and S-CVI/Ave ≥ 0.90) |
| | | | 1 | Most or all aspects, including the instrument aim, target population, measured constructs, AND the item selection process, involved review by the target population or experts (e.g., the developer), AND were described with moderate clarity (in reviewer’s opinion), with fair to excellent CVI scores (I-CVI = 0.67–0.78, 0.70 ≤ S-CVI/Ave < 0.90, 0.70 ≤ S-CVI/UA < 0.80) |
| | | | 0 | Some or all aspects, including the instrument aim, target population, measured constructs, and the item selection process involving review by the target population or experts (e.g., the developer), were poorly described or not described at all (in reviewer’s opinion), with unacceptable CVI scores (I-CVI ≤ 0.67, S-CVI/Ave < 0.70, S-CVI/UA < 0.70) or without CVI scores |
| 7 | Criterion validity: concurrent validity | The extent to which the construct measure under development/testing and a criterion measure collected simultaneously or concurrently are correlated | 2 | If correlation is acceptable to high [correlation coefficient (i.e., r and rs) ≥ 0.60, all P’s < 0.05 based on t-test, ANOVA, or chi-square test, or all 95% CIs are in the significant range], according to the “gold standard” or acceptable according to a “silver standard” and sensitivity/specificity is determined to be acceptable |
| | | | 1 | If correlation is moderate to acceptable [0.40 ≤ correlation coefficient (i.e., r and rs) < 0.60, all P values range from 0.05 to 0.10, or some P values/95% CIs are significant, but others are not significant] according to the “gold standard” or acceptable according to a “silver standard” |
| | | | 0 | If correlation is low [correlation coefficient (i.e., r and rs) < 0.40, all P’s > 0.10, all 95% CIs are not in the significant range, or no information is provided] |
| | | | - | Not applicable (e.g., no comparator is identified by authors from literature as criterion for the instrument being tested) |
| 8 | Criterion validity: predictive validity | The ability of a measure to effectively predict some subsequent and temporally ordered criterion | 2 | If correlation is acceptable to high (\|correlation coefficient\| ≥ 0.60, all P’s < 0.05 based on t-test, ANOVA, or chi-square test, or all 95% CIs are in the significant range), according to the “gold standard” or acceptable according to a “silver standard” and sensitivity/specificity is determined to be acceptable |
| | | | 1 | If correlation is moderate to acceptable (0.40 ≤ \|correlation coefficient\| < 0.60, all P values range from 0.05 to 0.10, or some P values/95% CIs are significant, but others are not significant) according to the “gold standard” or acceptable according to a “silver standard” |
| | | | 0 | If correlation is low (\|correlation coefficient\| < 0.40, all P’s > 0.10, or all 95% CIs are not in the significant range), or no information is provided |
| | | | - | Not applicable (e.g., no comparator is identified by authors from literature as criterion for the instrument being tested) |
| 9 | Construct validity: convergent validity | The extent to which independent measures of theoretically related constructs converge or are highly correlated | 2 | If correlation is acceptable to high (\|correlation coefficient\| ≥ 0.60, all P’s < 0.05 based on t-test, ANOVA, or chi-square test, or all 95% CIs are in the significant range) |
| | | | 1 | If correlation is moderate to acceptable (0.40 ≤ \|correlation coefficient\| < 0.60, all P values range from 0.05 to 0.10, or some P values/95% CIs are significant, but others are not significant) |
| | | | 0 | If correlation is low (\|correlation coefficient\| < 0.40, all P’s > 0.10, or all 95% CIs are not in the significant range), or no information is provided |
| | | | - | Not applicable (e.g., no comparator is identified by authors from literature for this type of validity for the instrument being tested) |
| 10 | Construct validity: divergent validity | The extent to which independent measures of theoretically unrelated or distinct constructs diverge or are not correlated | 2 | If correlation is low (\|correlation coefficient\| < 0.40, all P’s > 0.10, or all 95% CIs are not in the significant range) |
| | | | 1 | If correlation is moderate to acceptable (0.40 ≤ \|correlation coefficient\| < 0.60, all P values range from 0.05 to 0.10, or some P values/95% CIs are significant, but others are not significant) |
| | | | 0 | If correlation is acceptable to high (\|correlation coefficient\| ≥ 0.60, all P’s < 0.05 based on t-test, ANOVA, or chi-square/Fisher’s exact test, or all 95% CIs are in the significant range), or no information is provided |
| | | | - | Not applicable (e.g., no comparator is identified by authors from literature for this type of validity for the instrument being tested) |
| 11 | Construct validity: known different groups | The extent to which a measure differs as predicted/hypothesized between groups with different levels of the trait being measured | 2 | If the scale differentiates very well (all P’s < 0.05 based on t-test, ANOVA, or chi-square/Fisher’s exact test, or all 95% CIs are in the significant range) between different groups on the level of measured construct |
| | | | 1 | If the scale differentiates moderately well (all P values range from 0.05 to 0.10, or some P values/95% CIs are significant, but others are not significant) between different groups on the level of measured construct |
| | | | 0 | If the scale does not differentiate (all P’s > 0.10 or all 95% CIs are not in the significant range), or no information on P values is provided |
| | | | - | Not applicable (e.g., no group difference is identified by authors from literature for the instrument being tested) |
| 12 | Construct validity: structural validity | Whether a measure assesses a unidimensional construct or multiple domains/factors of a construct | 2 | Both Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA) were done, providing a confirmed factor structure of the instrument with acceptable model fit |
| | | | 1 | EFA was done, resulting in an explored factor structure; CFA was not done to confirm the explored factor structure in the population of interest |
| | | | 0 | Neither EFA nor CFA was performed (principal component analysis is not considered equivalent to factor analysis) |
| | | | - | Not applicable (e.g., instruments with fewer than three items that may not allow for factor analysis) |
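The numeric cut-offs in the table can be applied mechanically. The sketch below is one illustrative way to do so in Python and is not part of the PAT itself. All function names are hypothetical. Treating a ratio of exactly 10:1 as score 2, and scoring any unlisted mix of reliability coefficients as 1, are assumptions made here because the table does not settle those cases. The correlation helper looks only at the coefficient and ignores the P-value, confidence-interval, and gold/silver-standard conditions.

```python
from typing import Optional, Sequence


def score_sample_size(n_participants: int, n_items: int) -> int:
    """Item 1: participants-per-item ratio (assumption: exactly 10:1 scores 2)."""
    ratio = n_participants / n_items
    if ratio >= 10:
        return 2
    if ratio >= 5:
        return 1
    return 0


def score_internal_consistency(alpha: Optional[float]) -> int:
    """Item 2: Cronbach's alpha; None means no information was provided."""
    if alpha is None or alpha < 0.60:
        return 0
    if 0.70 <= alpha <= 0.90:
        return 2
    return 1  # 0.60 <= alpha < 0.70, or alpha > 0.90 (potential redundancy)


def score_reliability(coefficients: Sequence[float]) -> int:
    """Items 3-5: ICC, Kappa, r, or rs; an empty sequence means no information."""
    if not coefficients:
        return 0
    if all(c >= 0.80 for c in coefficients):
        return 2
    if all(c < 0.60 for c in coefficients):
        return 0
    return 1  # assumption: every other mix of coefficients scores 1


def score_correlation(r: Optional[float], divergent: bool = False) -> int:
    """Items 8-10, coefficient only: |r| bands, reversed for divergent validity."""
    if r is None:
        return 0
    magnitude = abs(r)
    if magnitude >= 0.60:
        band = 2
    elif magnitude >= 0.40:
        band = 1
    else:
        band = 0
    return 2 - band if divergent else band
```

For example, `score_internal_consistency(0.95)` gives 1 because an alpha above 0.90 is flagged as potentially redundant, and `score_correlation(0.30, divergent=True)` gives 2 because a low correlation is the desired outcome for divergent validity.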

Note: CVI, content validity index; I-CVI, item-content validity index; ICC, intraclass correlation; r, Pearson correlation coefficient; rs, Spearman’s rank correlation coefficient; S-CVI/UA, scale-content validity index/universal agreement; S-CVI/Ave, scale-content validity index/average.
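The table does not say how the CVI values are computed. The sketch below assumes the widely used convention in which each expert rates each item on a 4-point relevance scale and ratings of 3 or 4 count as relevant. Under that assumption, I-CVI is the proportion of experts rating an item relevant, S-CVI/UA is the proportion of items rated relevant by every expert, and S-CVI/Ave is the mean of the I-CVIs. The function name and input layout are hypothetical.

```python
from typing import List, Sequence, Tuple


def content_validity_indices(
    ratings: Sequence[Sequence[int]],
) -> Tuple[List[float], float, float]:
    """Compute (I-CVI per item, S-CVI/UA, S-CVI/Ave).

    ratings holds one row per item; each row holds one rating per expert
    on an assumed 1-4 relevance scale, where 3 or 4 means "relevant".
    """
    # I-CVI: share of experts who rated the item as relevant.
    i_cvi = [sum(r >= 3 for r in item) / len(item) for item in ratings]
    # S-CVI/UA: share of items that every expert rated as relevant.
    s_cvi_ua = sum(all(r >= 3 for r in item) for item in ratings) / len(ratings)
    # S-CVI/Ave: mean of the item-level indices.
    s_cvi_ave = sum(i_cvi) / len(i_cvi)
    return i_cvi, s_cvi_ua, s_cvi_ave
```

The three values returned can then be compared against the item 6 thresholds (I-CVI ≥ 0.78, S-CVI/UA ≥ 0.80, S-CVI/Ave ≥ 0.90 for a score of 2).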

Reproduced from Liu, Kim, et al., 2021 with automatic permission following the copyright and permission guidelines of Elsevier (publisher of the International Journal of Nursing Studies where Liu, Kim, et al., 2021 was published). Both Elsevier and John Wiley & Sons Inc. (publisher of Annals of the New York Academy of Sciences) are on the updated list of STM publishers (International Association of Scientific, Technical and Medical Publishers), who have opted out of notifications for permission requests within the specified limits (use no more than three figures or tables from a journal article published by STM publishers).