Author manuscript; available in PMC: 2017 Mar 10.
Published in final edited form as: J Rehabil Res Dev. 2016;53(6):797–812. doi: 10.1682/JRRD.2015.12.0228

Psychometric evaluation of self-report outcome measures for prosthetic applications

Brian J Hafner 1, Sara J Morgan 1, Robert L Askew 2, Rana Salem 1
PMCID: PMC5345485  NIHMSID: NIHMS787296  PMID: 28273329

Abstract

Documentation of clinical outcomes is increasingly expected in delivery of prosthetic services and devices. However, many outcome measures suitable for use in clinical care and research have not been psychometrically tested with prosthesis users. The aim of this study was to determine test-retest reliability, mode-of-administration (MoA) equivalence, standard error of measurement (SEM), and minimal detectable change (MDC) of standardized, self-report instruments that assess constructs of importance to people with lower limb loss. Prosthesis users (n=201) were randomly assigned to groups based on MoA (i.e., paper, electronic, or mixed-mode). Participants completed two surveys 2-3 days apart. Instruments included the Prosthetic Limb Users Survey of Mobility, Prosthesis Evaluation Questionnaire–Mobility Subscale, Activities-Specific Balance Confidence Scale, Quality of Life in Neurological Conditions–Applied Cognition/General Concerns, Patient-Reported Outcomes Measurement Information System Profile, and Socket Comfort Score. Intraclass correlation coefficients indicated all instruments are appropriate for group-level comparisons and select instruments are suitable for individual-level applications. Several instruments showed evidence of possible floor and ceiling effects. All were equivalent across MoAs. SEM and MDC were quantified to facilitate interpretation of outcomes and change scores. These results can enhance clinicians' and researchers' ability to select, apply, and interpret scores from instruments administered to prosthesis users.

Keywords: Amputees, artificial limbs, health surveys, outcomes research, outcome assessment (health care), questionnaires, rehabilitation, reproducibility of results

Introduction

Prosthetists and other health care professionals are increasingly encouraged or required to document the effects of the care they provide using valid and reliable outcome measures.1-3 Self-report instruments (i.e., surveys answered directly by a patient) are well suited to clinical applications because they are often brief, easy to complete, and require little time to administer. Further, information derived from self-report is often distinct and essential to understanding the impact of health interventions on the lives of those who receive them. In spite of these benefits, use of standardized outcome measures in clinical practice remains limited.3-6 Formulation of recommendations for instruments suited to clinical care and research involving people with lower limb loss, similar to those that exist for other rehabilitation populations,7,8 may address barriers to outcome measure use and facilitate improved understanding of prosthetic outcomes. However, to develop formal recommendations, evidence of each instrument's performance in the population of interest (e.g., persons with lower limb loss) is needed. Specifically, evidence of key psychometric properties (e.g., reliability, mode of administration equivalence, measurement error, and detectable change) is required to adequately formulate recommendations for how each may be applied.

Evidence of reliability (i.e., reproducibility) within the population of interest is critical for determining an instrument's utility or the applications for which it can be recommended.9 It is generally accepted that an instrument must demonstrate test-retest reliability of 0.7 or greater to be recommended for group-level comparisons.10-15 Group-level comparisons are important in clinical trials, observational research studies, and clinical quality-improvement programs. For applications that involve decisions about individual patients or research participants, an instrument must possess much higher (i.e., 0.9 or greater) reliability.12,15-17 Evidence of test-retest reliability therefore becomes a key factor in distinguishing between those instruments that can be recommended for individual-level decisions and those that can be recommended only for group-level comparisons.

Evidence of mode-of-administration (MoA) equivalence, or consistent performance across different forms of the same instrument, is needed to demonstrate that scores obtained from different MoAs are directly comparable.18,19 Electronic administration via computer or tablet offers numerous benefits compared to paper surveys, including reduced respondent burden, automated and accurate scoring, and direct import into a medical or research record. Equivalence of paper and electronic MoAs would allow administrators to reap these benefits while retaining the flexibility to choose the format most appropriate for the respondent (e.g., paper surveys can be given to patients who may not be comfortable with technology). MoA equivalence requires that score variation in single-mode (e.g., paper-paper or electronic-electronic) and mixed-mode (e.g., paper-electronic) administrations be comparable.19 Evidence of MoA equivalence is needed to guide recommendations for applications that may benefit from use of multiple administration methods. For example, clinics may want to offer patients either paper surveys or computerized surveys administered on tablet computers.

Lastly, estimates of measurement error and detectable change are required to evaluate and interpret differences or changes in scores observed when using self-report instruments.15 Estimates of measurement error, such as the standard error of measurement (SEM), quantify uncertainty in scores or in differences between scores obtained from individuals or groups of individuals. Estimates of detectable change, such as minimal detectable change (MDC), describe a statistical threshold (e.g., a 90% or 95% confidence interval) that a score difference must exceed to be considered "true" change in the context of repeated assessments (i.e., a change in outcome above and beyond that expected from measurement error).20,21 Estimates of change are critical in longitudinal applications, as observed changes that do not exceed the MDC may not indicate a true change in outcome and should be interpreted with caution. Estimates of measurement error and detectable change are therefore essential to formulating recommendations based on an instrument's potential to assess changes or differences in outcomes.
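In the notation used throughout this article, these quantities are related by standard formulas (restated here for reference, following the transformations cited in the Methods43):

$$\mathrm{SEM} = \mathrm{SD}\sqrt{1-\mathrm{ICC}}, \qquad \mathrm{MDC} = z \cdot \sqrt{2} \cdot \mathrm{SEM},$$

where SD is the standard deviation of observed scores, ICC is the test-retest reliability coefficient, and z is the standard normal quantile for the chosen confidence level (1.645 for MDC at 90% confidence, 1.96 at 95%). The factor of √2 reflects that a change score carries measurement error from both the first and the second assessment.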

The importance of the aforementioned psychometric properties cannot be overstated. If self-report outcome measures are to be used with confidence in clinical practice or research, evidence of their performance is needed to justify their selection, use, and interpretation. Few self-report instruments have been evaluated for evidence of test-retest reliability or measurement error in large samples of prosthetic limb users.1,22 None, to date, have been evaluated for MoA equivalence. Thus, there is a scarcity of the evidence required to formulate use recommendations for patients or research participants with lower limb loss. The aim of this research was to acquire the evidence needed to formulate initial recommendations for use of self-report outcome measures in prosthetic clinical care and research. Specifically, we (1) assessed test-retest reliability, (2) evaluated equivalence between paper and electronic MoAs, and (3) derived estimates of SEM and MDC for several self-report measures that are well suited to quick and efficient assessment of prosthetic outcomes. Results were also used to develop recommendations about the measures most appropriate for clinical and/or research applications (e.g., measuring changes in patients over time in clinic settings or measuring differences between groups in research studies).

Methods

Participants

Participants with lower limb loss were recruited through the University of Washington (UW) Department of Rehabilitation Participant Pool, a national registry of individuals interested in participating in rehabilitation research. Individuals in the Participant Pool with limb amputation were invited to participate in the study via their preferred method of communication (i.e., mail or email). Study investigators screened interested individuals by phone, enrolled them into one of three study arms, and scheduled appointments for two survey sessions. Participants were assigned to a study arm (i.e., Arm 1a, 1b, 2, or 3, where Arms 1a and 1b denote the two survey orders within the mixed-mode arm) using simple randomization.23 Eligibility criteria included: (1) 18 years of age or older, (2) lower limb amputation between the hip and ankle, (3) amputation as the result of trauma, dysvascular complications (e.g., diabetes), infection, or tumor, (4) no other amputations (e.g., of the other leg or an arm), (5) use of a lower limb prosthesis to transfer or walk, (6) access to an electronic device with an internet connection, and (7) ability to read, write, and understand English. Approval of study procedures was obtained from a UW institutional review board (IRB), and participants were provided with an information statement prior to beginning the study.

Study Design

We employed a three-arm randomized design (Figure 1) to compare scores from standardized outcome measures administered to participants with lower limb loss at different times and by different MoAs. Each participant was scheduled for a "test" and a "retest" (i.e., follow-up) survey approximately 2-3 days apart. A minimum period of 2 days between surveys was targeted to mitigate the potential for recollection bias.12 Similarly, the maximum duration between surveys was limited to minimize natural changes in the selected outcomes (e.g., mobility, physical function, balance). At retest, participants were asked to indicate whether they had experienced any changes in health status since the test survey. Those who indicated their health had changed were excluded from the final dataset. Participants in Arm 1 received one paper survey and one electronic survey. The order of the surveys (i.e., paper-electronic or electronic-paper) was assigned randomly, as recommended by Coons et al.24 Participants in Arm 2 were administered two paper surveys, and participants in Arm 3 were administered two electronic surveys.

Figure 1. Study design overview. Participants were assigned to arms based on survey MoA and completed surveys twice within 2-3 days.


Sample Size

Minimum sample size was calculated using the methodology outlined by Walter et al.,25 with α=0.05, β=0.20, ρ0=0.50, ρ1=0.70, and n=2 assessments. The lower bound of ρ0=0.50 was selected because the suitability of measures with this level of reliability is questionable, even for group-level comparisons. The sample size target was increased from 63 to 70 per arm (from n=189 to n=210 across all arms) to account for possible attrition of study participants or changes in health status between the test and retest surveys.
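For illustration, the per-arm minimum reported above can be reproduced with a short script (a sketch of the Walter et al. approximation,25 assuming a one-sided test; scipy is used only for the normal quantiles):

```python
from math import ceil, log
from scipy.stats import norm

def reliability_sample_size(rho0, rho1, k=2, alpha=0.05, beta=0.20):
    """Subjects needed to distinguish reliability rho1 from null value rho0
    with k repeated assessments per subject (Walter, Eliasziw, and Donner)."""
    z_a = norm.ppf(1 - alpha)  # one-sided type I error
    z_b = norm.ppf(1 - beta)   # power = 1 - beta
    c0 = 1 + k * rho0 / (1 - rho0)
    c1 = 1 + k * rho1 / (1 - rho1)
    n = 1 + 2 * (z_a + z_b) ** 2 * k / ((k - 1) * log(c0 / c1) ** 2)
    return ceil(n)

print(reliability_sample_size(0.50, 0.70, k=2))  # -> 63, the per-arm minimum
```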

Surveys

Paper surveys were printed on standard letter paper, sealed in individual envelopes, and mailed to participants with instructions and a self-addressed return envelope. Instructions indicated that participants were to open the envelope only at the time of their scheduled appointment. Electronic surveys were created and administered using the Assessment Center (Northwestern University, Chicago, IL).26 Electronic questions were identical to paper questions, but formatting differed slightly to facilitate item-level computerized administration (e.g., response options were presented vertically on computer and horizontally on paper). Electronic surveys were uniquely coded to each participant and sent via an email link with instructions similar to the paper surveys (i.e., that the survey was not to be started until the time of the scheduled appointment). Paper survey responses were double-entered by research staff to minimize data-entry errors.27 Electronic responses were exported directly from the Assessment Center for analysis. Responses from all participants were screened for missing and/or potentially invalid responses. Participants were contacted to clarify responses, as needed.

Measures

Test and retest surveys included questions on demographics and participant characteristics in addition to the standardized self-report measures described below. An ad hoc question was also included in the retest survey to solicit any changes in respondents' health between survey time points. In addition, participants were asked to record the time that they began and ended the paper surveys to calculate the time required to complete each survey. Time to complete electronic surveys was recorded by the Assessment Center administration system.

Demographics and participant characteristics

Demographic information (e.g., gender, ethnicity, race, employment status) and participant characteristics (e.g., height, weight) were collected to describe the study sample. In addition, participants answered questions related to their amputation (e.g., date, cause, and level of amputation) and health (e.g., presence of comorbidities) to characterize their general health.

Outcome Measures

Six self-report outcome measures suited to prosthetic applications were assessed in this study. The Prosthetic Limb Users Survey of Mobility (PLUS-M) is an item bank developed to measure perceived mobility in people with lower limb amputation.28-30 The PLUS-M 12- and 7-item short forms were both administered in this study. Additionally, the PLUS-M computerized adaptive test (CAT) was administered to participants in Arm 3 (i.e., electronic-electronic). The Prosthesis Evaluation Questionnaire–Mobility Subscale (PEQ-MS) is a 12-item self-report measure assessing the ability to perform mobility tasks while using a lower limb prosthesis.31 The Activities-Specific Balance Confidence Scale (ABC) is a 16-item instrument that measures respondents' confidence in performing basic ambulatory activities.32 Recent Rasch analyses of the PEQ-MS33 and ABC34 resulted in similar recommendations to reduce the instruments' original visual analog scale (PEQ-MS) and 0-100 (ABC) response options to 5-point ordinal scales. These recommended modifications33,34 were incorporated into the instruments administered in this study. The Quality of Life in Neurological Conditions–Applied Cognition/General Concerns v1.0 (NQ-ACGC) is an item bank developed to measure general cognitive abilities, including memory, attention, and decision-making.35 The 8-item NQ-ACGC short form was administered in this study. The Patient-Reported Outcomes Measurement Information System (PROMIS) is a compilation of self-report instruments that measure eight symptom and quality of life constructs across patient populations: physical function, anxiety, depression, fatigue, sleep disturbance, social role satisfaction, pain interference, and pain intensity.36,37 The PROMIS 29-Item Profile (PROMIS-29) was administered to participants in this study. The Socket Comfort Score (SCS) is a one-item measure of prosthetic socket comfort.38 Participants' scores were calculated according to the developers' instructions and used to evaluate test-retest reliability, MoA equivalence, SEM, and MDC. The ABC and PEQ-MS are scored from 0 to 4 (i.e., the average score of all items), and the SCS is scored from 0 to 10 (i.e., the score of the single SCS item). PLUS-M, PROMIS-29, and NQ-ACGC are scored on a T-score metric, which has a mean of 50 and standard deviation of 10.39
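As a brief aside for readers less familiar with the T-score metric: a T-score is a linear rescaling of a standardized score z (mean 0, standard deviation 1 in the instrument's reference population), such that

$$T = 50 + 10z,$$

so a T-score of 60, for example, lies one standard deviation above the reference-population mean.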

Statistical Analysis

Differences in participant demographics (e.g., sex, race/ethnicity, employment status, income, education, Veteran status, amputation level, amputation etiology) by study arm were assessed using χ2 or Fisher's exact tests. Descriptive statistics (e.g., means, standard deviations) were calculated for participants' scores at both time points (i.e., test and retest). The distribution of each measure was evaluated for problematic departures from normality using traditional statistical tests and histogram inspections.40,41 Mixed-effects linear regression modeling was employed to test differences in mean scores by MoA and time (i.e., test-retest) because of its recognized advantages (e.g., flexibility and robustness) over traditional analysis of variance (ANOVA). Test-retest reliability was evaluated using the intraclass correlation coefficient (ICC) model (3,1)42,43 for individual scores, with a fixed effect for time (i.e., test-retest) and a random effect for individuals. Confidence intervals for ICCs were derived using the F-distribution.44 MoA equivalence was similarly evaluated by tests of statistically significant differences using an F-distribution.45,46 The a priori alpha level (α=0.05) was adjusted for multiple comparisons using a Bonferroni correction.47 Given that statistically significant differences in ICCs may not always affect recommendations regarding a measure's suitability for clinical or research applications, we also subjectively assessed the ICCs across modes against the recommended thresholds (i.e., 0.7 for group-level comparisons and 0.9 for intra-individual comparisons).10-17 Accordingly, when ICCs for each outcome measure (across MoAs) were similar (i.e., above 0.9, between 0.7 and 0.9, or below 0.7), global test-retest reliability estimates were computed for participants across all modes. Estimates of measurement error and detectable change (i.e., SEM and MDC) were derived using established algebraic transformations based on the calculated ICCs and z-scores for the 90% and 95% confidence intervals.43
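The core reliability and error computations described above can be sketched as follows (an illustrative implementation, not the study's analysis code; test and retest scores are assumed to be arranged as an n-participants × 2-occasions array):

```python
import numpy as np

def icc_3_1(x):
    """ICC(3,1) of Shrout and Fleiss: two-way mixed effects, single measure.
    x is an (n_subjects, k_occasions) array; here k = 2 (test, retest)."""
    n, k = x.shape
    grand = x.mean()
    ss_subjects = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_occasions = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_error = ((x - grand) ** 2).sum() - ss_subjects - ss_occasions
    ms_subjects = ss_subjects / (n - 1)        # between-subjects mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)

def sem_and_mdc(x, icc):
    """SEM = SD * sqrt(1 - ICC); MDC = z * sqrt(2) * SEM."""
    sem = x.std(ddof=1) * np.sqrt(1 - icc)  # SD pooled over both occasions
    return sem, 1.645 * np.sqrt(2) * sem, 1.960 * np.sqrt(2) * sem  # SEM, MDC90, MDC95
```

Applied to the values reported below for the ABC (SD ≈ 0.9, ICC = 0.95), these formulas reproduce the SEM and MDC estimates in Table 4 to within rounding of the inputs.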

Results

Two hundred nineteen participants completed all study procedures (Figure 2). Eighteen participants reported changes in health between the test and retest time points and were excluded from the final dataset to avoid biasing reliability estimates. A variety of changes were reported, ranging from temporary socket discomfort to hospitalization. There were no significant differences between the participants who reported a change in health status over the test-retest period and those who were included in the final dataset. Similarly, there were no significant differences among study arms in terms of participants' gender, race, ethnicity, employment status, income, veteran status, education, amputation level, amputation etiology, age, age at amputation, time since amputation, or average time of prosthesis use per day. Participants (n=201, Table 1) in the final dataset were, on average, 60.2 (SD=11.4) years of age at the time of the survey, 41.8 (SD=17.3) years of age at the time of their amputation, and 18.4 (SD=17.2) years post-amputation. Participants reported wearing their prosthesis, on average, 13.4 (SD=3.8) hours per day. Retest surveys were taken, on average, 48.9 (SD=5.2) hours after the test survey. The average time to complete the test and retest surveys was 12.3 (SD=7.8) minutes and 10.0 (SD=5.8) minutes, respectively. Paper surveys, on average, took longer to complete (13.8 minutes) than electronic surveys (8.3 minutes).

Figure 2. Study flow diagram.


Table 1.

Participant demographics by Arm. Number and percent of participants (in parentheses) who reported each characteristic are denoted. There were no significant differences in demographic characteristics between study arms (p>0.05).

Characteristic / Arm  Arm 1 (Mixed-mode), n=65  Arm 2 (Paper-only), n=72  Arm 3 (Electronic-only), n=64  All Arms, n=201
Sex
 Male 37 (56.9) 55 (76.4) 43 (67.2) 135 (67.2)
 Female 28 (43.1) 17 (23.6) 21 (32.8) 66 (32.8)
Race/Ethnicity
 Non-Hispanic White 57 (87.7) 66 (91.7) 60 (93.8) 183 (91.0)
 Non-Hispanic Black 3 (4.6) 4 (5.6) 2 (3.1) 9 (4.5)
 Other 5 (7.7) 2 (2.8) 2 (3.1) 9 (4.5)
Employment Status
 Employed 24 (36.9) 23 (31.9) 26 (40.6) 73 (36.3)
 Homemaker 2 (3.1) 1 (1.4) 2 (3.1) 5 (2.5)
 Retired 20 (30.8) 25 (34.7) 24 (37.5) 69 (34.3)
 On disability 16 (24.6) 18 (25.0) 10 (15.6) 44 (21.9)
 Unemployed 1 (1.5) 5 (6.9) 2 (3.1) 8 (4.0)
 Student 2 (3.1) 0 (0.0) 0 (0.0) 2 (1.0)
Individual Income
 <$25,000 30 (46.2) 21 (29.2) 19 (29.7) 70 (34.8)
 $25,000-$39,999 8 (12.3) 13 (18.1) 12 (18.8) 33 (16.4)
 $40,000-$54,999 5 (7.7) 8 (11.1) 6 (9.4) 19 (9.5)
 $55,000-$69,999 9 (13.8) 7 (9.7) 8 (12.5) 24 (11.9)
 $70,000-$84,999 3 (4.6) 6 (8.3) 5 (7.8) 14 (7.0)
 $85,000-$99,999 3 (4.6) 3 (4.2) 5 (7.8) 11 (5.5)
 $100,000+ 6 (9.2) 13 (18.1) 7 (10.9) 26 (12.9)
  Not reported 1 (1.5) 1 (1.4) 2 (3.1) 4 (2.0)
Veteran Status
 Not a veteran 50 (76.9) 51 (70.8) 48 (75.0) 149 (74.1)
 Active/veteran 14 (21.5) 21 (29.2) 16 (25.0) 51 (25.4)
   Not reported 1 (1.5) 0 (0.0) 0 (0.0) 1 (0.5)
Education
 High school graduate or less 7 (10.8) 12 (16.7) 6 (9.4) 25 (12.4)
 Some college or tech school 21 (32.3) 31 (43.1) 20 (31.3) 72 (35.8)
 College graduate 20 (30.8) 17 (23.6) 16 (25.0) 53 (26.4)
 Advanced degree 17 (26.2) 12 (16.7) 22 (34.4) 51 (25.4)
Amputation level
 Above knee 26 (40.0) 24 (33.3) 20 (31.3) 70 (34.8)
 Below knee 39 (60.0) 48 (66.7) 44 (68.8) 131 (65.2)
Amputation Etiology
 Dysvascular 17 (26.2) 13 (18.1) 16 (25.0) 46 (22.9)
 Trauma 36 (55.4) 49 (68.1) 36 (56.3) 121 (60.2)
 Infection 9 (13.8) 8 (11.1) 8 (12.5) 25 (12.4)
 Tumor 3 (4.6) 1 (1.4) 4 (6.3) 8 (4.0)
 Congenital 0 (0.0) 1 (1.4) 0 (0.0) 1 (0.5)

Observed score distributions were approximately normal for PLUS-M and PROMIS Physical Function, Fatigue, and Sleep Disturbance. Evidence of potential floor effects was observed for PROMIS Depression, Anxiety, Pain Interference, and Pain Intensity (42%, 34%, 28%, and 12% of respondents obtained the minimum score on each instrument, respectively). Similarly, potential ceiling effects were observed for the SCS, PROMIS Physical Function, PROMIS Satisfaction with Social Roles, and NQ-ACGC (14%, 14%, 16%, and 17% of respondents obtained the maximum score on each instrument, respectively). No evidence of floor or ceiling effects was present for the ABC, PEQ-MS, or PLUS-M. No statistically significant differences in mean scores (i.e., retest – test) were present between MoA groups (Table 2). Statistically significant effects of time (p<0.05) were observed for five PROMIS measures (i.e., Anxiety, Depression, Fatigue, Pain Intensity, and Sleep Disturbance) and the NQ-ACGC. However, differences between test and retest scores were negligible (i.e., -1.9 to 0.2) and below the minimal important difference estimates (i.e., 2.5 to 6.0) reported for PROMIS measures in other clinical populations.48 No statistically significant time by MoA interactions were observed.

Table 2.

Mean scores by mode for test and retest administrations. Higher scores indicate better (desirable) reported health for the ABC, NQ-ACGC, PEQ-MS, PLUS-M (all versions), PROMIS Physical Function, PROMIS Social Role Satisfaction, and SCS. Higher scores indicate worse (undesirable) reported health for the PROMIS Anxiety, PROMIS Depression, PROMIS Fatigue, PROMIS Pain Intensity, PROMIS Pain Interference, and PROMIS Sleep Disturbance.

Measure  Mode of Admin  n  Test (Mean, SD, Min, Max)  Retest (Mean, SD, Min, Max)
ABC All modes 201 2.7 0.9 0.4 4.0 2.7 0.9 0.2 4.0
Mixed mode 65 2.6 0.9 0.6 4.0 2.6 0.9 0.3 4.0
Paper only 72 2.7 0.9 0.7 4.0 2.7 0.9 0.2 4.0
Electronic only 64 2.7 0.9 0.4 4.0 2.6 0.9 0.4 4.0
NQ-ACGC All modes 201 45.6 8.2 20.0 59.3 46.2 8.6 20.0 59.3
Mixed mode 65 44.5 8.2 29.5 59.3 44.7 8.4 31.1 59.3
Paper only 72 45.4 8.5 27.8 59.3 46.5 9.2 26.9 59.3
Electronic only 64 46.9 8.0 20.0 59.3 47.2 8.0 20.0 59.3
PEQ-MS All modes 201 2.7 0.9 0.0 4.0 2.7 0.9 0.0 4.0
Mixed mode 65 2.6 0.9 0.0 4.0 2.6 0.8 0.0 4.0
Paper only 72 2.8 0.9 0.3 4.0 2.8 0.9 0.3 4.0
Electronic only 64 2.7 0.9 0.7 4.0 2.7 0.9 0.8 4.0
PLUS-M CAT All modes 64 52.5 10.0 28.8 76.6 51.9 9.1 31.3 70.6
Mixed mode - - - - - - - -
Paper only - - - - - - - -
Electronic only 64 52.5 10.0 28.8 76.6 51.9 9.1 31.3 70.6
PLUS-M 12-Item Short Form All modes 201 51.8 9.3 25.2 71.4 51.5 9.7 21.8 71.4
Mixed mode 65 51.1 9.1 28.7 71.4 51.1 9.7 25.2 71.4
Paper only 72 52.7 9.3 25.2 71.4 52.6 9.5 25.2 71.4
Electronic only 64 51.4 9.6 30.0 71.4 50.9 9.9 21.8 71.4
PLUS-M 7-Item Short Form All modes 201 51.9 9.2 27.0 69.9 51.7 9.4 23.3 69.9
Mixed mode 65 51.3 9.3 31.1 69.9 51.4 9.8 27.0 69.9
Paper only 72 52.8 9.1 27.0 69.9 52.5 9.1 27.0 69.9
Electronic only 64 51.5 9.4 31.1 69.9 51.1 9.3 23.3 69.9
PROMIS Anxiety All modes 201 50.6 9.1 40.3 81.6 49.1 8.7 40.3 71.2
Mixed mode 65 50.4 9.2 40.3 67.3 49.2 8.4 40.3 65.3
Paper only 72 50.3 9.1 40.3 67.3 49.0 9.3 40.3 69.3
Electronic only 64 51.2 9.1 40.3 81.6 49.2 8.3 40.3 71.2
PROMIS Depression All modes 201 49.5 8.4 41.0 73.3 48.5 8.3 41.0 73.3
Mixed mode 65 49.6 8.5 41.0 69.4 48.9 9.3 41.0 71.2
Paper only 72 49.9 9.0 41.0 73.3 48.9 8.5 41.0 73.3
Electronic only 64 49.0 7.9 41.0 67.5 47.5 7.2 41.0 69.4
PROMIS Fatigue All modes 201 50.3 8.4 33.7 69.0 48.3 8.4 33.7 69.0
Mixed mode 65 50.4 7.5 33.7 64.6 48.8 7.4 33.7 64.6
Paper only 72 49.6 9.4 33.7 69.0 47.7 9.6 33.7 64.6
Electronic only 64 51.0 8.0 33.7 69.0 48.6 7.9 33.7 69.0
PROMIS Pain Intensity All modes 201 3.1 2.3 0.0 9.0 3.3 2.5 0.0 9.0
Mixed mode 65 3.0 2.2 0.0 8.0 3.0 2.3 0.0 8.0
Paper only 72 3.2 2.6 0.0 9.0 3.3 2.7 0.0 9.0
Electronic only 64 3.1 2.1 0.0 8.0 3.5 2.3 0.0 9.0
PROMIS Pain Interference All modes 201 53.7 8.8 41.6 75.6 53.2 8.3 41.6 75.6
Mixed mode 65 52.2 9.2 41.6 68.0 52.6 8.6 41.6 69.7
Paper only 72 53.5 9.4 41.6 75.6 53.0 9.0 41.6 75.6
Electronic only 64 55.4 7.6 41.6 69.7 54.1 7.3 41.6 75.6
PROMIS Physical Function All modes 201 42.3 7.6 26.9 56.9 42.5 7.7 26.9 56.9
Mixed mode 65 41.7 7.7 26.9 56.9 41.7 7.6 26.9 56.9
Paper only 72 43.0 6.9 26.9 56.9 43.4 7.4 26.9 56.9
Electronic only 64 42.0 8.4 26.9 56.9 42.3 8.0 29.1 56.9
PROMIS Sleep Disturbance All modes 201 49.3 8.4 32.0 73.3 48.2 8.7 32.0 68.8
Mixed mode 65 49.1 7.2 32.0 68.8 47.8 8.1 32.0 68.8
Paper only 72 49.7 10.1 32.0 73.3 48.1 10.0 32.0 66.0
Electronic only 64 49.2 7.5 32.0 68.8 48.6 7.9 32.0 63.8
PROMIS Social Role Satisfaction All modes 201 50.1 8.8 29.0 64.1 50.9 9.1 29.0 64.1
Mixed mode 65 48.9 8.2 29.0 64.1 49.6 8.9 29.0 64.1
Paper only 72 51.0 9.6 29.0 64.1 52.3 9.6 29.0 64.1
Electronic only 64 50.4 8.6 35.7 64.1 50.8 8.7 33.6 64.1
SCS All modes 201 7.2 2.3 0.0 10.0 7.2 2.3 1.0 10.0
Mixed mode 65 7.6 2.2 1.0 10.0 7.7 2.1 1.0 10.0
Paper only 72 7.1 2.6 0.0 10.0 7.1 2.5 1.0 10.0
Electronic only 64 6.8 2.1 3.0 10.0 6.8 2.2 2.0 10.0

Test-Retest Reliability

Test-retest reliability ICCs varied by instrument (Table 3). PROMIS and Neuro-QoL measures exhibited ICCs between 0.7 and 0.9, indicating they are appropriate for group-level comparisons, irrespective of MoA. ICCs for PLUS-M, PEQ-MS, and ABC were above 0.9, indicating they are appropriate for individual-level monitoring and decision-making. SCS test-retest ICCs ranged from 0.63 to 0.79, depending on MoA, indicating that it is appropriate for group-level comparisons, but only when administered in a single mode (e.g., either paper only or electronic only).

Table 3. Reliability of self-report instruments by MoA (mixed-mode, paper-only, electronic-only) in people with lower limb loss.

Measure Mode of Admin ICC 95%LB 95%UB
ABC Mixed mode 0.94 0.90 0.96
Paper only 0.94 0.91 0.96
Electronic only 0.96 0.94 0.98
NQ-ACGC Mixed mode 0.88 0.82 0.93
Paper only 0.90 0.85 0.94
Electronic only 0.86 0.77 0.91
PEQ-MS Mixed mode 0.90 0.84 0.94
Paper only 0.95 0.93 0.97
Electronic only 0.91 0.86 0.95
PLUS-M CAT Mixed mode - - -
Paper only - - -
Electronic only 0.92 0.87 0.95
PLUS-M 12-item Short Form Mixed mode 0.95 0.92 0.97
Paper only 0.97 0.95 0.98
Electronic only 0.95 0.92 0.97
PLUS-M 7-item Short Form Mixed mode 0.94 0.91 0.97
Paper only 0.97 0.95 0.98
Electronic only 0.94 0.91 0.97
PROMIS Anxiety Mixed mode 0.80 0.70 0.87
Paper only 0.89 0.83 0.93
Electronic only 0.88 0.80 0.92
PROMIS Depression Mixed mode 0.89 0.83 0.93
Paper only 0.90 0.85 0.94
Electronic only 0.84 0.74 0.90
PROMIS Fatigue Mixed mode 0.83 0.73 0.89
Paper only 0.88 0.82 0.93
Electronic only 0.79 0.68 0.87
PROMIS Pain Intensity Mixed mode 0.87 0.80 0.92
Paper only 0.89 0.83 0.93
Electronic only 0.85 0.77 0.91
PROMIS Pain Interference Mixed mode 0.81 0.71 0.88
Paper only 0.86 0.78 0.91
Electronic only 0.77 0.64 0.85
PROMIS Physical Function Mixed mode 0.86 0.79 0.92
Paper only 0.90 0.85 0.94
Electronic only 0.87 0.80 0.92
PROMIS Sleep Disturbance Mixed mode 0.78 0.67 0.86
Paper only 0.89 0.83 0.93
Electronic only 0.85 0.77 0.91
PROMIS Social Role Satisfaction Mixed mode 0.82 0.72 0.89
Paper only 0.75 0.63 0.84
Electronic only 0.81 0.71 0.88
SCS Mixed mode 0.63 0.45 0.75
Paper only 0.77 0.66 0.85
Electronic only 0.79 0.67 0.86

Mode of Administration Equivalence

ICCs varied slightly by MoA for each measure (Table 3), but were generally consistent relative to the established reliability thresholds (i.e., below 0.7, between 0.7 and 0.9, or above 0.9). No significant differences in ICCs were observed by MoA for any of the measures, with the exception of the PEQ-MS (p=0.04). However, all PEQ-MS ICCs exceeded the 0.9 threshold recommended for individual-level applications.

SEM and MDC

As most measures were determined to be equivalent across MoAs or were consistently within the ranges established by the reliability thresholds, SEM and MDC estimates were derived from combined ICCs (Table 4). For the SCS, SEM and MDC were derived separately for each mode, given that instrument's low (and variable) reliability by MoA. PLUS-M showed the lowest MDC (4.50) of all instruments scored using the T-score metric (i.e., NQ-ACGC, PLUS-M, and PROMIS instruments). MDC estimates for measures that use an average score (i.e., ABC and PEQ-MS) were comparable (i.e., 0.49 and 0.55, respectively). Interestingly, MDC for the SCS (2.73) was larger than that for the PROMIS Pain Intensity scale (1.97), although both instruments are scored similarly (i.e., on a 0 to 10 scale).

Table 4.

Test-retest reliability, SEM, and MDC of self-report instruments in people with lower limb loss. Reliability, SEM, and MDC are presented separately by MoA when differences were observed.

Measure ICC 95%LB 95%UB SEM MDC(90) MDC(95)
ABC 0.95 0.93 0.96 0.21 0.49 0.58
NQ-ACGC 0.88 0.85 0.91 2.87 6.67 7.94
PEQ-MS 0.92 0.90 0.94 0.24 0.55 0.65
PLUS-M
 CAT 0.92 0.87 0.95 2.79 6.42 7.65
 12-Item Short Form 0.96 0.95 0.97 1.93 4.50 5.36
 7-Item Short Form 0.95 0.94 0.96 2.02 4.69 5.59
PROMIS
 Anxiety 0.86 0.82 0.89 3.36 7.81 9.31
 Depression 0.88 0.85 0.91 2.89 6.71 8.00
 Fatigue 0.84 0.80 0.88 3.33 7.74 9.22
 Pain Intensity 0.87 0.84 0.90 0.85 1.97 2.35
 Pain Interference 0.82 0.77 0.86 3.66 8.51 10.14
 Physical Function 0.88 0.85 0.91 2.64 6.13 7.31
 Sleep Disturbance 0.85 0.81 0.89 3.27 7.61 9.07
 Social Role Satisfaction 0.79 0.73 0.84 4.10 9.53 11.36
SCS
 All modes 0.74 0.67 0.80 1.18 2.73 3.26
 Mixed mode 0.63 0.45 0.75 1.30 3.03 3.61
 Paper only 0.77 0.66 0.85 1.21 2.82 3.36
 Electronic only 0.79 0.67 0.86 0.99 2.31 2.75

Discussion

The aim of this study was to evaluate key psychometric properties of six self-report measures suitable for use in prosthetic clinical care and research. The estimates of reliability, MoA equivalence, SEM, and MDC provided here can help clinicians and researchers to select, use, and interpret information provided by the studied self-report outcome measures. To our knowledge, this is the first study to assess reliability, MoA equivalence, SEM, and MDC of these instruments in people with lower limb loss. Our sample (n=201) was large relative to similar studies that assessed reliability of self-report instruments in people with lower limb loss,49,50 and exceeded the minimum threshold (i.e., n=100) recommended by the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) criterion for an 'excellent' rating in studies of instrument reliability.51 Demographics of participants included in this study were similar to those reported in a large national survey of 935 people with limb loss.52 Sex and ethnicity in our study sample (i.e., 67% male and 91% non-Hispanic white) were nearly identical to those in the large national sample. Participants here were slightly older (i.e., 60 years), and there was a larger proportion of participants with traumatic amputation (i.e., 60%), compared to the prior study (i.e., 50 years and 39%, respectively). However, the overall similarities between samples suggest the results obtained here generalize well to people with lower limb loss.

Test-Retest Reliability

Test-retest reliability (i.e., reproducibility) provides critical information about an instrument's stability of measurement when respondents are not experiencing change, and estimates of reliability obtained in these situations indicate the applications for which the instrument may be appropriate. Measures with test-retest reliability estimates below 0.7 have an unacceptable amount of error variance (e.g., intra-individual variation, measurement error) and thus are typically not appropriate for use in clinical care or research. Measures with test-retest reliability estimates above 0.7 have an acceptable level of error variance for assessment of differences between groups.10-15 However, when the goal of measurement is to assess true change within an individual over time, only very low error variance is acceptable. Thus, a minimum test-retest reliability estimate of 0.9 has been suggested as the cutoff for intra-individual measurement.15-17

All measures assessed in this study, with the exception of the SCS, were found to have test-retest reliability ICCs acceptable for group-level comparisons (i.e., ICC ≥ 0.7), irrespective of MoA. The SCS was found to have test-retest ICCs over 0.7 when administered by either paper-only or electronic-only versions, and thus can be recommended for group-level comparisons only when a single administration method is used. Three measures (i.e., PLUS-M, PEQ-MS, and ABC) were estimated to have test-retest reliability ICCs suitable for use in individual-level comparisons (i.e., ICC ≥ 0.9). Reliability estimates in this study for the PEQ-MS and the ABC are slightly higher than those published for the 7-level response scale version of the PEQ-MS (ICC=0.85)49 and the 101-level response scale version of the ABC (ICC=0.91),53 suggesting that the 5-level response scales recommended in subsequent studies33,34 (and used in the present study) may provide greater stability than the original versions of these instruments.

Mode of Administration Equivalence

MoA equivalence is essential if data are collected via different methods (i.e., paper surveys, electronic surveys) and scores are to be compared or aggregated across modes (e.g., if a practitioner were to administer a paper survey in clinic and then send a patient an email survey at follow-up). Recent meta-analyses have concluded that paper and electronic administrations of self-report instruments are equivalent in both healthy individuals and those with a variety of medical conditions.19,54,55 However, none of the studies included in the meta-analyses targeted (or, to our knowledge, included) people with lower limb loss. This study addresses that gap and contributes evidence to the body of knowledge regarding MoA equivalence.

Results of our study provided evidence of statistical MoA equivalence for five of the six measures, suggesting that paper and electronic forms of these measures are directly comparable. The PEQ-MS was not statistically equivalent across modes. However, the test-retest ICCs for this instrument were high and similar across modes (i.e., all ICCs ≥ 0.9), suggesting reliability of the PEQ-MS does not meaningfully differ across modes. Further, although the statistical analysis for the SCS demonstrated equivalence between modes, mixed-mode administration of this measure resulted in test-retest ICCs below the cut-off for group-level comparisons (i.e., ICC=0.63), while within-mode administration resulted in ICCs above 0.7 (i.e., paper-only ICC=0.77, electronic-only ICC=0.79). Because mixing MoAs appears to adversely affect reliability of the SCS, we recommend it be administered using only a single method (i.e., paper-only or electronic-only) across all individuals whose scores are to be combined or compared. Our results are largely consistent with prior findings19,54,55 and indicate that paper and electronic surveys can generally be used interchangeably in people with lower limb loss, with the notable exception of the SCS. The format in which the SCS was presented to respondents in the paper and electronic surveys (i.e., an 11-point horizontal ordered response scale and an 11-point drop-down menu, respectively) may have affected the cross-modal reliability of the instrument. Response options for other measures were generally more similar across modes (i.e., ABC, PEQ-MS, PLUS-M, and PROMIS all used horizontal 5-point ordered response scales in the paper mode and vertical 5-point ordered response scales in the electronic mode), which may have improved stability across modes compared to the enumerated scale used by the SCS.56 However, as the SCS and PROMIS Pain Intensity instruments are constructed and administered similarly, the disparate MoA equivalence results for these two instruments were unexpected. Further research is needed to ascertain the source of mixed-mode measurement error (or variation) with the SCS.

Measurement Error and Detectable Change

Estimates of measurement error (and detectable change) can be used to evaluate individuals' scores with respect to threshold (i.e., cutoff) values or previous measurements.43 They can also inform sample size estimates in group research.12 SEM and MDC values obtained in this study were derived in a manner similar to that used in a recent study by Resnik et al.,49 which examined reliability of performance-based and self-report instruments in 44 people with lower limb loss. Direct comparison of SEM and MDC values is difficult, as instruments differed between studies. However, the values we obtained were similar to those they derived when evaluated as a percentage of each instrument's overall range. For example, MDC of the PEQ-MS in our study (0.55) was 13.7% of the scale range (4.0), and MDC of the 7-level response PEQ-MS used in the prior study (0.8) was 13.3% of the scale range (6.0).49
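Written out explicitly, this comparison uses the ratio of MDC to the instrument's total score range; for the PEQ-MS in our study,

$$\frac{\mathrm{MDC}_{90}}{\text{scale range}} = \frac{0.55}{4.0 - 0.0} \approx 13.7\%.$$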

Instruments in our study that included more than 10 items (i.e., ABC, PEQ-MS, PLUS-M 12-item short form) generally had lower measurement error than instruments with fewer than 10 items (e.g., PROMIS-29 instruments, NQ-ACGC) or single-item instruments (PROMIS Pain Intensity and SCS). This may be expected, as the relationship between instrument length and measurement error is well established.57,58 Estimates varied, however, even among instruments of similar length. For example, the PLUS-M 7-item short form had a lower MDC (11.2% of the scale range) than the 8-item NQ-ACGC (17.0%). There was also variation in MDC estimates for the 4-item PROMIS instruments included in the PROMIS-29 Profile (17.5%-27.2%). All three versions of PLUS-M (i.e., CAT, 12-item short form, and 7-item short form) had slightly lower estimates of MDC (9.1-11.2% of the scale range) than the 16-item ABC (12.2%) and the 12-item PEQ-MS (13.7%), which measure similar constructs. Thus, for measurement of mobility and balance in longitudinal applications (e.g., monitoring patients or participants over time), the PLUS-M 12-item short form and ABC are recommended.

Limitations

This study included a number of health status instruments designed to measure constructs of importance to prosthesis users, care providers, and researchers. PROMIS and Neuro-QoL instruments included in this study are available in lengths other than those tested. We used the 4-item versions of the PROMIS instruments included in the PROMIS-29 Profile and the 8-item version of the NQ-ACGC. Estimates of reliability, MoA equivalence, measurement error, and detectable change derived here may therefore not apply to longer versions of these instruments (e.g., PROMIS Physical Function short forms are also available in 10- and 20-item lengths59). For example, evidence of potential floor and ceiling effects observed in this study may be due to the administered versions' limited range of measurement. Although scores obtained with instruments from the same item bank are comparable,60 additional research will be required to determine whether different lengths of these instruments function similarly in people with lower limb loss.

The time between test and retest administrations in this study was relatively short (mean=48.9 hours). As such, respondents may have recalled responses to select questions. However, each survey included a large number of questions (n=78), and participants were not allowed to take retest surveys until 2-3 days had passed. Although test-retest periods of up to 2 weeks may be advocated for self-report measures,11 evidence suggests that reliability of health status surveys is unaffected by test-retest periods of 2 days to 2 weeks.61 Thus, we believe it is unlikely that memory effects significantly affected results in the present study.

We evaluated MoA equivalence by comparing reliability estimates among three distinct administration modes (i.e., electronic, paper, and mixed). While significant differences would indicate lack of MoA equivalence, a more thorough evaluation of MoA would require multi-group confirmatory factor analysis (MGCFA) or, for measures developed within an item response theory (IRT) framework, an assessment of differential item functioning (DIF). MGCFA may provide evidence of equivalent factor structures across MoAs, whereas DIF analyses may provide evidence of MoA equivalence at the item level. However, MGCFA and DIF analyses require significantly larger sample sizes than those in this study (i.e., 200-500 people in each arm).24,62 Given that the sample size in our study would likely bias results in favor of population invariance (i.e., MoA equivalence), we limited the scope of our evaluations to differences in estimates of reliability.

Only paper and electronic administration modes were included in this study. Although we determined that most instruments were equivalent by MoA, results obtained here may not apply to other MoAs, such as face-to-face administration. Face-to-face administration introduces possible measurement biases (e.g., social desirability63) that may disproportionately affect responses, relative to other MoAs. The setting of the interview may also affect responses. Evidence to date regarding equivalence of assisted (e.g., face-to-face interview) and self-administered (e.g., paper or electronic survey) modes is limited,54 and further research in people with limb loss is required before equivalence can be verified.

Results of this study provide valuable evidence of test-retest reliability, MoA equivalence, measurement error, and detectable change for instruments suited to measuring outcomes in prosthetic limb users. However, evidence of other measurement properties (e.g., validity and responsiveness) in this population may also guide how these instruments can and should be applied in clinical practice or research. Establishing evidence of these properties in people with lower limb loss was beyond the scope of this study, but may be considered a priority for future research.

Conclusions

The estimates of test-retest reliability, mode-of-administration equivalence, measurement error, and detectable change reported in this study can help clinicians and researchers better select, administer, and interpret outcomes from the studied self-report instruments. Reliability estimates showed that all of the studied measures are suited to group-level applications and that select instruments (i.e., ABC, PEQ-MS, and PLUS-M) are suited to individual-level applications, based on thresholds established in the literature. Several instruments (i.e., PROMIS-29, NQ-ACGC, and SCS) showed evidence of potential floor or ceiling effects. SEM values derived in this study allow users to calculate confidence intervals around individual scores. Similarly, the derived MDC estimates can be used to assess whether differences in scores across repeated assessments represent true change or measurement error.

Acknowledgments

Funding/Support: This research is supported by the Orthotic and Prosthetic Education and Research Foundation (OPERF grant number 2014-SGA-1), the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NIH grant number HD-065340), and the Department of Education (NIDRR grant number H133P080006). The content is solely the responsibility of the authors and does not necessarily represent the official views of the Orthotic and Prosthetic Education and Research Foundation, the National Institutes of Health, or the Department of Education.

Acronyms

ABC: Activities-Specific Balance Confidence Scale

ICC: intraclass correlation coefficient

LB: lower bound

MDC: minimal detectable change

MoA: mode of administration

NQ-ACGC: Quality of Life in Neurological Conditions–Applied Cognition/General Concerns

PEQ-MS: Prosthesis Evaluation Questionnaire–Mobility Subscale

PLUS-M: Prosthetic Limb Users Survey of Mobility

PROMIS: Patient-Reported Outcomes Measurement Information System

SCS: Socket Comfort Score

SD: standard deviation

SEM: standard error of measurement

UB: upper bound

Footnotes

Clinical Trial Registration: Not required

Author Contributions: Study concept and design: Brian Hafner, Sara Morgan, Robert Askew

Analysis and interpretation of data: Robert Askew, Rana Salem, Brian Hafner, Sara Morgan

Drafting of manuscript: Brian Hafner, Sara Morgan

Critical revision of manuscript for important intellectual content: Robert Askew, Rana Salem

Additional Contributions: The authors gratefully acknowledge Andre Kajlich, Meighan Rasley, and Olga Kildisheva for their assistance with participant recruitment and data collection.

Financial Disclosures: The authors have declared that no competing interests exist.

Institutional Review: Human subject approval was received from a University of Washington Institutional Review Board, and informed consent was obtained from subjects before study procedures were initiated.

Participant Follow-Up: The authors do not plan to notify participants of the publication of this study directly. This publication will be listed and linked on the research center's public website, which has been provided to all study participants.

Obtained funding: Brian Hafner

References

1. Heinemann AW, Connelly L, Ehrlich-Jones L, Fatone S. Outcome instruments for prosthetics: clinical applications. Phys Med Rehabil Clin N Am. 2014;25(1):179–98. doi: 10.1016/j.pmr.2013.09.002.
2. Wedge FM, Braswell-Christy J, Brown CJ, Foley KT, Graham C, Shaw S. Factors influencing the use of outcome measures in physical therapy practice. Physiother Theory Pract. 2012;28(2):119–33. doi: 10.3109/09593985.2011.578706.
3. Jette DU, Halbert J, Iverson C, Miceli E, Shah P. Use of standardized outcome measures in physical therapist practice: perceptions and applications. Phys Ther. 2009;89(2):125–35. doi: 10.2522/ptj.20080234.
4. Gaunaurd I, Spaulding SE, Amtmann D, Salem R, Gailey R, Morgan SJ, Hafner BJ. Use of and confidence in administering outcome measures among clinical prosthetists: results from a national survey and mixed-methods training program. Prosthet Orthot Int. 2015;39(4):314–21. doi: 10.1177/0309364614532865.
5. Stapleton T, McBrearty C. Use of standardised assessments and outcome measures among a sample of Irish occupational therapists working with adults with physical disabilities. Br J Occup Ther. 2009;72(2):55–64.
6. Hatfield DR, Ogles BM. The use of outcome measures by psychologists in clinical practice. Prof Psychol Res Pract. 2004;35(5):485–91.
7. Sullivan JE, Crowner BE, Kluding PM, Nichols D, Rose DK, Yoshida R, Pinto Zipp G. Outcome measures for individuals with stroke: process and recommendations from the American Physical Therapy Association neurology section task force. Phys Ther. 2013;93(10):1383–96. doi: 10.2522/ptj.20120492.
8. Potter K, Cohen ET, Allen DD, Bennett SE, Brandfass KG, Widener GL, Yorke AM. Outcome measures for individuals with multiple sclerosis: recommendations from the American Physical Therapy Association Neurology Section task force. Phys Ther. 2014;94(5):593–608. doi: 10.2522/ptj.20130149.
9. Roach KE. Measurement of health outcomes: reliability, validity and responsiveness. J Prosthet Orthot. 2006;18(1S):8–12.
10. Reeve BB, Wyrwich KW, Wu AW, Velikova G, Terwee CB, Snyder CF, Schwartz C, Revicki DA, Moinpour CM, McLeod LD, Lyons JC, Lenderking WR, Hinds PS, Hays RD, Greenhalgh J, Gershon R, Feeny D, Fayers PM, Cella D, Brundage M, Ahmed S, Aaronson NK, Butt Z. ISOQOL recommends minimum standards for patient-reported outcome measures used in patient-centered outcomes and comparative effectiveness research. Qual Life Res. 2013;22(8):1889–905. doi: 10.1007/s11136-012-0344-y.
11. Terwee CB, Bot SD, de Boer MR, van der Windt DA, Knol DL, Dekker J, Bouter LM, de Vet HC. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60(1):34–42. doi: 10.1016/j.jclinepi.2006.03.012.
12. Frost MH, Reeve BB, Liepa AM, Stauffer JW, Hays RD. What is sufficient evidence for the reliability and validity of patient-reported outcome measures? Value Health. 2007;10(Suppl 2):S94–S105. doi: 10.1111/j.1524-4733.2007.00272.x.
13. Lohr KN. Rating the strength of scientific evidence: relevance for quality improvement programs. Int J Qual Health Care. 2004;16(1):9–18. doi: 10.1093/intqhc/mzh005.
14. Revicki DA, Osoba D, Fairclough D, Barofsky I, Berzon R, Leidy NK, Rothman M. Recommendations on health-related quality of life research to support labeling and promotional claims in the United States. Qual Life Res. 2000;9(8):887–900. doi: 10.1023/a:1008996223999.
15. Fitzpatrick R, Davey C, Buxton MJ, Jones DR. Evaluating patient-based outcome measures for use in clinical trials. Health Technol Assess. 1998;2(14):i–iv, 1–74.
16. Hopkins WG. Measures of reliability in sports medicine and science. Sports Med. 2000;30(1):1–15. doi: 10.2165/00007256-200030010-00001.
17. Nunnally JC, Bernstein IH. Psychometric theory. 3rd ed. New York: McGraw-Hill; 1994.
18. Hood K, Robling M, Ingledew D, Gillespie D, Greene G, Ivins R, Russell I, Sayers A, Shaw C, Williams J. Mode of data elicitation, acquisition and response to surveys: a systematic review. Health Technol Assess. 2012;16(27). doi: 10.3310/hta16270.
19. Gwaltney CJ, Shields AL, Shiffman S. Equivalence of electronic and paper-and-pencil administration of patient-reported outcome measures: a meta-analytic review. Value Health. 2008;11(2):322–33. doi: 10.1111/j.1524-4733.2007.00231.x.
20. Schmitt JS, Di Fabio RP. Reliable change and minimum important difference (MID) proportions facilitated group responsiveness comparisons using individual threshold criteria. J Clin Epidemiol. 2004;57(10):1008–18. doi: 10.1016/j.jclinepi.2004.02.007.
21. Ottenbacher KJ, Johnson MB, Hojem M. The significance of clinical change and clinical change of significance: issues and methods. Am J Occup Ther. 1988;42(3):156–63. doi: 10.5014/ajot.42.3.156.
22. Condie E, Scott H, Treweek S. Lower limb prosthetic outcome measures: a review of the literature 1995 to 2005. J Prosthet Orthot. 2006;18(1S):13–45.
23. Suresh K. An overview of randomization techniques: an unbiased assessment of outcome in clinical research. J Hum Reprod Sci. 2011;4(1):8–11. doi: 10.4103/0974-1208.82352. (Retracted)
24. Coons SJ, Gwaltney CJ, Hays RD, Lundy JJ, Sloan JA, Revicki DA, Lenderking WR, Cella D, Basch E. Recommendations on evidence needed to support measurement equivalence between electronic and paper-based patient-reported outcome (PRO) measures: ISPOR ePRO Good Research Practices Task Force report. Value Health. 2009;12(4):419–29. doi: 10.1111/j.1524-4733.2008.00470.x.
25. Walter SD, Eliasziw M, Donner A. Sample size and optimal designs for reliability studies. Stat Med. 1998;17(1):101–10.
26. Gershon R, Rothrock NE, Hanrahan RT, Jansky LJ, Harniss M, Riley W. The development of a clinical outcomes survey research application: Assessment Center. Qual Life Res. 2010;19(5):677–85. doi: 10.1007/s11136-010-9634-4.
27. Paulsen A, Overgaard S, Lauritsen JM. Quality of data entry using single entry, double entry and automated forms processing: an example based on a study of patient-reported outcomes. PLoS One. 2012;7(4):e35087. doi: 10.1371/journal.pone.0035087.
28. Hafner BJ, Spaulding SE, Salem R, Morgan SJ, Gaunaurd IA, Gailey RS. Prosthetists' perceptions and use of outcome measures in clinical practice: long-term effects of focused continuing education. Prosthet Orthot Int. 2016. doi: 10.1177/0309364616664152. (under review)
29. Amtmann D, Abrahamson D, Morgan S, Salem R, Askew R, Gailey R, Gaunaurd I, Kajlich A, Hafner B. The PLUS-M: item bank of mobility for prosthetic limb users. Qual Life Res. 2014;23:39–40.
30. Morgan SJ, Amtmann D, Abrahamson DC, Kajlich AJ, Hafner BJ. Use of cognitive interviews in the development of the PLUS-M item bank. Qual Life Res. 2014;23(6):1767–75. doi: 10.1007/s11136-013-0618-z.
31. Legro MW, Reiber GD, Smith DG, del Aguila M, Larsen J, Boone D. Prosthesis evaluation questionnaire for persons with lower limb amputations: assessing prosthesis-related quality of life. Arch Phys Med Rehabil. 1998;79(8):931–8. doi: 10.1016/s0003-9993(98)90090-9.
32. Powell LE, Myers AM. The Activities-specific Balance Confidence (ABC) Scale. J Gerontol A Biol Sci Med Sci. 1995;50A(1):M28–34. doi: 10.1093/gerona/50a.1.m28.
33. Franchignoni F, Giordano A, Ferriero G, Orlandini D, Amoresano A, Perucca L. Measuring mobility in people with lower limb amputation: Rasch analysis of the mobility section of the prosthesis evaluation questionnaire. J Rehabil Med. 2007;39(2):138–44. doi: 10.2340/16501977-0033.
34. Sakakibara BM, Miller WC, Backman CL. Rasch analyses of the Activities-specific Balance Confidence Scale with individuals 50 years and older with lower-limb amputations. Arch Phys Med Rehabil. 2011;92(8):1257–63. doi: 10.1016/j.apmr.2011.03.013.
35. Cella D, Nowinski C, Peterman A, Victorson D, Miller D, Lai J-S, Moy C. The Neurology Quality-of-Life Measurement Initiative. Arch Phys Med Rehabil. 2011;92(10 Suppl):S28–S36. doi: 10.1016/j.apmr.2011.01.025.
36. Cella D, Riley W, Stone A, Rothrock N, Reeve B, Yount S, Amtmann D, Bode R, Buysse D, Choi S, Cook K, DeVellis R, DeWalt D, Fries JF, Gershon R, Hahn EA, Lai J-S, Pilkonis P, Revicki D, Rose M, Weinfurt K, Hays R. The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005–2008. J Clin Epidemiol. 2010;63(11):1179–94. doi: 10.1016/j.jclinepi.2010.04.011.
37. Cella D, Yount S, Rothrock N, Gershon R, Cook K, Reeve B, Ader D, Fries JF, Bruce B, Rose M. The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years. Med Care. 2007;45(5 Suppl 1):S3–S11. doi: 10.1097/01.mlr.0000258615.42478.55.
38. Hanspal RS, Fisher K, Nieveen R. Prosthetic socket fit comfort score. Disabil Rehabil. 2003;25(22):1278–80. doi: 10.1080/09638280310001603983.
39. Cohen RJ, Swerdlik ME, Phillips SM. Psychological testing and assessment: an introduction to tests and measurement. Mayfield Publishing Co; 1996.
40. D'Agostino RB, Belanger A, D'Agostino RB Jr. A suggestion for using powerful and informative tests of normality. Am Stat. 1990;44(4):316–21.
41. Royston JP. sg3.5: Comment on sg3.4 and an improved D'Agostino test. Stata Technical Bulletin. 1991;3:23–4.
42. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86(2):420–8. doi: 10.1037//0033-2909.86.2.420.
43. Weir JP. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res. 2005;19(1):231–40. doi: 10.1519/15184.1.
44. McGraw KO, Wong SP. Forming inferences about some intraclass correlation coefficients. Psychol Methods. 1996;1(1):30.
45. Feldt LS, Woodruff DJ, Salih FA. Statistical inference for coefficient alpha. Appl Psychol Meas. 1987;11(1):93–103.
46. Kraemer HC. Extension of Feldt's approach to testing homogeneity of coefficients of reliability. Psychometrika. 1981;46(1):41–5.
47. Bonferroni CE. Teoria statistica delle classi e calcolo delle probabilità. Libreria Internazionale Seeber; 1936.
48. Yost KJ, Eton DT, Garcia SF, Cella D. Minimally important differences were estimated for six Patient-Reported Outcomes Measurement Information System-Cancer scales in advanced-stage cancer patients. J Clin Epidemiol. 2011;64(5):507–16. doi: 10.1016/j.jclinepi.2010.11.018.
49. Resnik L, Borgia M. Reliability of outcome measures for people with lower-limb amputations: distinguishing true change from statistical error. Phys Ther. 2011;91(4):555–65. doi: 10.2522/ptj.20100287.
50. de Laat FA, Rommers GM, Geertzen JH, Roorda LD. Construct validity and test-retest reliability of the Questionnaire Rising and Sitting Down in lower-limb amputees. Arch Phys Med Rehabil. 2011;92(8):1305–10. doi: 10.1016/j.apmr.2011.03.016.
51. Terwee CB, Mokkink LB, Knol DL, Ostelo RW, Bouter LM, de Vet HC. Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist. Qual Life Res. 2012;21(4):651–7. doi: 10.1007/s11136-011-9960-1.
52. Pezzin LE, Dillingham TR, Mackenzie EJ, Ephraim P, Rossbach P. Use and satisfaction with prosthetic limb devices and related services. Arch Phys Med Rehabil. 2004;85(5):723–9. doi: 10.1016/j.apmr.2003.06.002.
53. Miller WC, Deathe AB, Speechley M. Psychometric properties of the Activities-specific Balance Confidence Scale among individuals with a lower-limb amputation. Arch Phys Med Rehabil. 2003;84(5):656–61. doi: 10.1016/s0003-9993(02)04807-4.
54. Rutherford C, Costa D, Mercieca-Bebber R, Rice H, Gabb L, King M. Mode of administration does not cause bias in patient-reported outcome results: a meta-analysis. Qual Life Res. 2015. doi: 10.1007/s11136-015-1110-8.
55. Muehlhausen W, Doll H, Quadri N, Fordham B, O'Donohoe P, Dogar N, Wild DJ. Equivalence of electronic and paper administration of patient-reported outcome measures: a systematic review and meta-analysis of studies conducted between 2007 and 2013. Health Qual Life Outcomes. 2015;13(1):167. doi: 10.1186/s12955-015-0362-x.
56. Toepoel V, Das M, van Soest A. Design of web questionnaires: the effect of layout in rating scales. J Off Stat. 2009;25(4):509.
57. Lord FM. Tests of the same length do have the same standard error of measurement. Educ Psychol Meas. 1959;19(2):233–9.
58. Gardner PL. Test length and the standard error of measurement. J Educ Meas. 1970;7(4):271–3.
59. Fries JF, Witter J, Rose M, Cella D, Khanna D, Morgan-DeWitt E. Item response theory, computerized adaptive testing, and PROMIS: assessment of physical function. J Rheumatol. 2014;41(1):153–8. doi: 10.3899/jrheum.130813.
60. Cella D, Gershon R, Lai JS, Choi S. The future of outcomes measurement: item banking, tailored short-forms, and computerized adaptive assessment. Qual Life Res. 2007;16(Suppl 1):133–41. doi: 10.1007/s11136-007-9204-6.
61. Marx RG, Menezes A, Horovitz L, Jones EC, Warren RF. A comparison of two time intervals for test-retest reliability of health status instruments. J Clin Epidemiol. 2003;56(8):730–5. doi: 10.1016/s0895-4356(03)00084-2.
62. Comrey AL, Lee HB. A first course in factor analysis. Psychology Press; 2013.
63. Podsakoff PM, MacKenzie SB, Lee JY, Podsakoff NP. Common method biases in behavioral research: a critical review of the literature and recommended remedies. J Appl Psychol. 2003;88(5):879–903. doi: 10.1037/0021-9010.88.5.879.
