2012 Sep 26;10:120. doi: 10.1186/1477-7525-10-120

Table 1.

Summary of psychometric methods

Psychometric property
Definition/criteria for acceptability
Acceptability
Assessed by data quality and targeting. Data quality refers to the completeness of item- and scale-level data; criterion for missing data <10% [20]. Targeting is the extent to which the range of the variable measured by a scale matches the range of that variable in the study sample. Assessed by: maximum endorsement frequencies <80% [17]; aggregate endorsement frequencies^a >10% [17]; skewness statistic −1 to +1 [35-37]; proximity of scale mean score to scale midpoint^b (no fixed criterion, but closer matches indicate better targeting) [38]; and distribution of ACTS Burdens scores^c (no fixed criterion, but closer to 100% indicates better targeting) [39]
Scaling assumptions
Tests of scaling assumptions assess the extent to which it is legitimate to sum a set of items, without weighting or standardisation, to produce a single total score. This criterion is satisfied when items have adequate corrected item-total correlations (≥0.30) [38,40] and the proposed grouping of items in each subscale is correct. Grouping is assessed using two complementary approaches: principal components analysis (factor loadings >0.30, cross-loadings <0.20) and item convergent and discriminant validity (item own-scale correlations >0.30 and exceeding correlations with other scales by >2 standard errors)
Reliability
Reliability is the extent to which scale scores are not associated with random error
Internal consistency reliability
The precision of the scale based on the homogeneity (intercorrelations) of items at a single point in time. Assessed using Cronbach’s alpha ≥0.80 [41,42], mean item-item correlations (known as the homogeneity coefficient) ≥0.30 [37] and item-total correlations ≥0.30 [42]
Test-retest reproducibility
This is based on the agreement between people's scores at screening and baseline, and estimates the ability of components and scales to produce stable scores [34]. For adequate test-retest reproducibility, scale-level intraclass correlation coefficients ≥0.80 [40] and item-level intraclass correlation coefficients ≥0.50 [43] should be achieved
Validity
Validity is the extent to which a scale measures the construct that it is intended to measure
Validity (within scale)
Evidence that a scale measures a single construct, and that items can be combined to form a summary score. Assessed on the basis of internal consistency reliability (Cronbach’s alpha ≥ 0.80) and factor analysis (factor loadings >0.30, cross-loadings < 0.20)
Validity (correlations between scales)
Correlations between ACTS scales: moderate correlations (0.30–0.70) expected. Correlations between TSQM II^d [44] and ACTS scales: low correlations (<0.30) expected between TSQM II Effectiveness and ACTS Burdens/ACTS Benefits; low correlations (<0.30) expected between TSQM II Side-effects and ACTS Benefits; moderate correlations (0.30–0.70) expected between TSQM II Side-effects and ACTS Burdens; moderate correlations (0.30–0.70) expected between TSQM II Convenience and ACTS Burdens/ACTS Benefits; moderate correlations (0.30–0.70) expected between TSQM II Global Satisfaction and ACTS Burdens/ACTS Benefits
Discriminant validity
Evidence that a scale is not correlated with other measures of different constructs. Assessed on the basis of correlations between the ACTS and age and gender; low correlations (<0.30) expected between ACTS scores and age and gender
Known-groups validity/hypothesis testing
Ability of a scale to detect hypothesised differences between known subgroups. Assessed by testing the hypothesis that known groups defined on the basis of high vs low ACTS global scores for: i) Burdens (Q13) and ii) Benefits (Q17) will differ significantly (in the expected direction) on ACTS Burdens and Benefits scale scores; based on ANOVA (p<0.05)
Responsiveness
The ability of the ACTS Burdens and Benefits scales to detect significant change over time. Assessed by examining scores at two or more time points of surgery and calculating an effect size statistic: the mean change in scores from time point 1 to time point 2 divided by the standard deviation of the time point 1 scores [44]. Clinically, increasing moderate effect sizes over time would be expected, reflecting improved treatment satisfaction. Effect sizes were interpreted as follows: 0.20 (small change), 0.50 (moderate change) and >0.80 (large change) [45]
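
The effect size described in the Responsiveness row can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the sample standard deviation (n − 1 denominator) is an assumption because the table does not say which form was used, and the banding of values that fall between the three quoted thresholds is likewise assumed.

```python
from statistics import mean, stdev


def effect_size(time1, time2):
    """Mean change (time 2 minus time 1) divided by the SD of time 1 scores.

    Scores must be paired: position i in both lists is the same respondent.
    Assumption: sample SD (n - 1 denominator).
    """
    if len(time1) != len(time2):
        raise ValueError("time1 and time2 must hold paired scores")
    changes = [after - before for before, after in zip(time1, time2)]
    return mean(changes) / stdev(time1)


def interpret_effect_size(es):
    """Label an effect size using the 0.20 / 0.50 / 0.80 thresholds.

    Assumption: values between thresholds take the lower label, and the
    sign is ignored so that only the magnitude of change is classified.
    """
    magnitude = abs(es)
    if magnitude >= 0.80:
        return "large"
    if magnitude >= 0.50:
        return "moderate"
    if magnitude >= 0.20:
        return "small"
    return "negligible"


# Made-up scores for three respondents at two time points.
baseline = [40, 50, 60]
follow_up = [45, 55, 65]
es = effect_size(baseline, follow_up)
print(es, interpret_effect_size(es))
```

Dividing by the baseline standard deviation rather than the standard deviation of the change scores follows the definition in the row above; other responsiveness indices use different denominators and are not interchangeable with this one.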

^a Calculated as the sum of response frequencies for any two adjacent response categories (e.g. if responses to ‘not at all’ = 2% and ‘a little’ = 7%, aggregate endorsement frequency = 9%, which fails the criterion).

^b Calculated as possible scale midpoint minus actual scale mean score.

^c Calculated as actual scale range divided by possible scale range, multiplied by 100.

^d TSQM II was designed as a general measure of treatment satisfaction with medication, and includes 11 items in 4 sub-scales (Effectiveness, Side-effects, Convenience, Global Satisfaction).
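
The three targeting calculations in footnotes a to c reduce to simple arithmetic, sketched below. The function names, the response percentages, and the 12 to 60 scoring range are illustrative assumptions and are not taken from the table.

```python
def aggregate_endorsement(frequencies):
    """Footnote a: sum the endorsement percentages of each adjacent pair
    of response categories, in category order."""
    return [left + right for left, right in zip(frequencies, frequencies[1:])]


def passes_endorsement_criteria(frequencies):
    """Apply both endorsement criteria from the Acceptability row:
    maximum endorsement <80% and every aggregate endorsement >10%."""
    return max(frequencies) < 80 and min(aggregate_endorsement(frequencies)) > 10


def midpoint_gap(possible_min, possible_max, mean_score):
    """Footnote b: possible scale midpoint minus observed scale mean."""
    return (possible_min + possible_max) / 2 - mean_score


def range_coverage(observed_min, observed_max, possible_min, possible_max):
    """Footnote c: observed score range as a percentage of the possible range."""
    return (observed_max - observed_min) / (possible_max - possible_min) * 100


# Hypothetical item with five response categories (percent endorsing each).
item = [2, 7, 31, 40, 20]
print(aggregate_endorsement(item))        # first pair is 2 + 7 = 9
print(passes_endorsement_criteria(item))  # False: 9% is not above 10%

# Hypothetical scale scored from 12 to 60 (assumed range, for illustration).
print(midpoint_gap(12, 60, mean_score=50))
print(range_coverage(20, 60, 12, 60))
```

A midpoint gap near zero and a range coverage near 100% both point to better targeting; neither has a fixed pass mark in the table.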

ACTS, Anti-Clot Treatment Scale; ANOVA, analysis of variance; TSQM II, Treatment Satisfaction Questionnaire for Medication version 2.
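
The two statistics used most often in the table, Cronbach's alpha (criterion ≥0.80) and corrected item-total correlations (criterion ≥0.30), can be sketched in plain Python as below. This is a hedged illustration with hypothetical function names and made-up data; it assumes complete data (no missing responses) and uses the standard textbook formulas, which the table cites but does not spell out.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    `items` is a list of per-item score lists; position i in every list
    is the same respondent. Uses sample variances (n - 1 denominator).
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_variances = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


def corrected_item_total(items):
    """Correlate each item with the total of the remaining items, so an
    item's own score does not inflate its correlation with the scale."""
    totals = [sum(scores) for scores in zip(*items)]
    return [
        pearson(item, [total - score for total, score in zip(totals, item)])
        for item in items
    ]


# Made-up responses: three items, five respondents.
responses = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2), alpha >= 0.80)
print([r >= 0.30 for r in corrected_item_total(responses)])
```

The correction step (subtracting the item's own score from the total) is what distinguishes a corrected item-total correlation from a plain item-total correlation.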