2014 Dec 19;2(12):2325967114562191. doi: 10.1177/2325967114562191

Table 1.

Multiple Indicators of Validities and Reliabilities Examined^a

Psychometric Property: Description/Interpretation
Validities
 Item fit: Validity evidence for the 3 instruments (the LE CAT, HOS, and mHHS) was gathered from multiple perspectives. We initially examined whether the data fit the Rasch partial credit model, using the outfit mean square (MNSQ) statistic to measure the fit of the data to the model. An MNSQ <1.5 indicates that the data fit the Rasch model well.2,9,35 If the data do not fit the Rasch partial credit model, it would not be appropriate to proceed to further analyses using this model, as the instrument likely does not conform to the axioms of quantitative measurement.27
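As an illustrative sketch (not part of the article's analysis), the outfit MNSQ can be computed for the simpler dichotomous Rasch model, assuming person measures `theta` and item difficulties `b` have already been estimated:

```python
import numpy as np

def outfit_mnsq(responses, theta, b):
    """Outfit mean-square per item for a dichotomous Rasch model.

    responses : (n_persons, n_items) 0/1 matrix
    theta     : person ability measures (assumed already estimated)
    b         : item difficulty measures (assumed already estimated)
    """
    # Model-expected score and variance for each person-item cell
    expected = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    variance = expected * (1.0 - expected)
    # Squared standardized residuals; outfit is their mean over persons
    z_squared = (responses - expected) ** 2 / variance
    return z_squared.mean(axis=0)  # near 1 = good fit; <1.5 = acceptable
```

The partial credit model used in the article generalizes the expected score and variance to polytomous categories, but the outfit statistic is formed from squared standardized residuals in the same way.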
 Dimensionality: The dimensionality of each instrument was investigated to determine whether it was unidimensional (measuring a single dimension, eg, construct, idea, phenomenon, factor) or multidimensional. Principal component analyses of residuals were conducted to determine the dimensionality of each instrument. After controlling for the first dimension, if the unexplained variance of the residuals in the first dimension was <5%, the instrument was viewed as unidimensional.35
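A rough sketch of this residual screen (illustrative only, again simplified to the dichotomous case with known `theta` and `b`): after removing the Rasch dimension, the standardized residuals are examined with a principal component analysis, and the share of residual variance on the first component is checked.

```python
import numpy as np

def first_contrast_share(responses, theta, b):
    """Share of residual variance on the first principal component of the
    standardized Rasch residuals; a small share supports unidimensionality."""
    expected = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    # Standardized residuals: what the Rasch dimension does not explain
    z = (responses - expected) / np.sqrt(expected * (1.0 - expected))
    # Eigen-decomposition of the item residual correlation matrix
    eigvals = np.sort(np.linalg.eigvalsh(np.corrcoef(z, rowvar=False)))[::-1]
    return eigvals[0] / eigvals.sum()
```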
 Monotonicity: Monotonicity refers to whether item response categories work as intended, in increasing or decreasing hierarchical order. An item lacks monotonicity if its response categories are not correctly ordered (eg, 0 = never, 1 = always, 2 = sometimes). Response categories that are not in the correct order are also referred to as disordered thresholds. A properly working instrument should not contain any items with disordered thresholds.
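In practice this check reduces to verifying that an item's estimated step thresholds increase strictly; a minimal sketch (the threshold values in the usage example are hypothetical):

```python
def thresholds_ordered(thresholds):
    """True if step thresholds increase strictly, ie, the response
    categories are in the intended order (no disordered thresholds)."""
    return all(lower < upper for lower, upper in zip(thresholds, thresholds[1:]))
```

For example, `thresholds_ordered([-1.2, 0.3, 1.8])` returns `True`, whereas a reversal such as `thresholds_ordered([-0.5, 1.1, 0.4])` returns `False`, flagging a disordered threshold.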
 Local independence: Local independence occurs when the response to one item is independent of the response to another item, after taking into account the first dimension. When local independence is violated, the response to one item determines the response to another item. Local independence was determined by investigating the item residual correlations (residuals are the part of the data not explained by the first dimension). We considered items with residual correlations >0.8 as substantially departing from local independence.
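The residual-correlation screen can be sketched as follows (the 0.8 cutoff is the one stated above; the function and its layout are illustrative assumptions):

```python
import numpy as np

def locally_dependent_pairs(residuals, cutoff=0.8):
    """Item pairs whose residual correlation exceeds the cutoff,
    flagging possible violations of local independence.

    residuals : (n_persons, n_items) matrix of Rasch residuals
    """
    corr = np.corrcoef(residuals, rowvar=False)
    n_items = corr.shape[0]
    return [(i, j) for i in range(n_items) for j in range(i + 1, n_items)
            if abs(corr[i, j]) > cutoff]
```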
 Differential item functioning (DIF): DIF measures item bias. A properly constructed instrument should not vary greatly when administered to various subgroups within a population (eg, sex, age, ethnicity, race, socioeconomic status), at different time points, or when employing assorted modes of instrument administration. DIF was assessed on an item-by-item basis using the Mantel-Haenszel chi-square test. We examined age (<65 years or ≥65 years) and sex (male or female) DIF in this study and considered items with a Mantel-Haenszel chi-square test P < .05 as having significant DIF.
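A minimal sketch of the Mantel-Haenszel chi-square for a single item, stratified by ability level (the 2x2 layout per stratum is an assumption for illustration; with df = 1, a statistic above 3.84 corresponds to P < .05):

```python
import numpy as np

def mantel_haenszel_chi2(tables):
    """Continuity-corrected Mantel-Haenszel chi-square across strata.

    tables : iterable of 2x2 counts, one per ability stratum, laid out as
             [[reference_correct, reference_incorrect],
              [focal_correct,     focal_incorrect]]
    With df = 1, a value above 3.84 corresponds to P < .05.
    """
    observed = expected = variance = 0.0
    for t in tables:
        t = np.asarray(t, dtype=float)
        row1, row2 = t[0].sum(), t[1].sum()        # group totals
        col1, col2 = t[:, 0].sum(), t[:, 1].sum()  # score totals
        total = t.sum()
        observed += t[0, 0]
        expected += row1 * col1 / total
        variance += row1 * row2 * col1 * col2 / (total ** 2 * (total - 1))
    return (abs(observed - expected) - 0.5) ** 2 / variance
```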
 Raw score to measure correlation: Person raw scores for each of the 3 instruments are on an ordinal scale. Generally, raw scores are not useful for parametric statistics unless they are on an interval scale. Interval-scale scores are called measures. A low correlation between raw scores and measures indicates that it is not appropriate to use common statistical procedures such as the sum, mean, standard deviation, and t test. We considered raw score-to-measure correlations <0.4 as low and >0.8 as high.
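This check is a simple Pearson correlation between raw sum scores and the interval-scale person measures; a sketch:

```python
import numpy as np

def raw_to_measure_correlation(responses, measures):
    """Pearson correlation between ordinal raw sum scores and
    interval-scale (Rasch) person measures; <0.4 is low, >0.8 is high."""
    raw = np.asarray(responses).sum(axis=1)  # ordinal raw score per person
    return np.corrcoef(raw, np.asarray(measures))[0, 1]
```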
 Instrument coverage: Instrument coverage, or targeting, is the extent to which items in an instrument adequately measure the entire range of the sample’s trait levels (eg, ability levels, functioning levels, pain levels). If the items are not able to sufficiently cover people’s upper or lower levels of the trait, the instrument is said to have ceiling effects or floor effects, respectively. Instruments with high ceiling or floor effects are not useful for longitudinal or comparative effectiveness studies, as they lack the ability to detect changes. Coverage is computed by taking the item and person score distributions (both in interval-scale measures) and calculating the percentage of persons at the upper (ceiling) and lower (floor) ends of the person score distribution who are not aligned with the item score distribution. Instruments with >15% ceiling or floor effects are considered problematic.
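One simple way to operationalize this (an illustrative simplification, not necessarily the article's exact computation) is the percentage of person measures falling beyond the range spanned by the item measures:

```python
import numpy as np

def ceiling_floor_percentages(person_measures, item_measures):
    """Percentage of persons above the hardest item (ceiling) and below the
    easiest item (floor); >15% on either end is considered problematic."""
    persons = np.asarray(person_measures, dtype=float)
    ceiling = 100.0 * np.mean(persons > np.max(item_measures))
    floor = 100.0 * np.mean(persons < np.min(item_measures))
    return ceiling, floor
```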
Reliabilities
 Internal consistency: Internal consistency reliability is the extent to which all of the items within an instrument measure the same construct. We examined the internal consistency of the instruments using the Cronbach alpha, which ranges from 0 to 1; a value ≥0.70 is generally regarded as adequate.
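Cronbach alpha has a simple closed form, sketched here:

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of
    the total score); >=0.70 is generally regarded as adequate."""
    scores = np.asarray(item_scores, dtype=float)
    k = scores.shape[1]                              # number of items
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of sum score
    return k / (k - 1) * (1.0 - item_variances / total_variance)
```

With perfectly parallel items (every item giving the same score for each person), alpha reaches its maximum of 1.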
 Person separation: We also calculated the person separation index (PSI) of the LE CAT, HOS-ADL, HOS-sports, and mHHS. The PSI is similar to the conventional Cronbach alpha except that there is no upper bound to the PSI; the PSI is on a ratio scale and ranges from 0 to infinity. In other words, as opposed to the Cronbach alpha, the PSI has no ceiling in measuring reliability. The higher the PSI, the more reliable the instrument.35 An instrument with a PSI <1 is undesirable, as it is not sensitive enough to distinguish at least 2 strata within the sample (such as high and low functioning abilities), and thus more items should be added to the instrument.
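The PSI can be sketched as the ratio of the "true" (error-corrected) spread of the person measures to the average measurement error, assuming each person measure comes with a standard error:

```python
import numpy as np

def person_separation_index(measures, standard_errors):
    """Person separation index: sqrt(true variance / error variance).
    PSI < 1 suggests the instrument cannot distinguish even 2 strata."""
    observed_variance = np.var(measures, ddof=1)
    error_variance = np.mean(np.square(standard_errors))  # mean squared SE
    # 'True' variance = observed spread minus measurement error
    true_variance = max(observed_variance - error_variance, 0.0)
    return float(np.sqrt(true_variance / error_variance))
```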

^a HOS-ADL, Hip Outcome Score–activities of daily living subscale; HOS-sports, Hip Outcome Score–sports subscale; LE CAT, Lower Extremity Computerized Adaptive Test; mHHS, modified Harris Hip Score.