Author manuscript; available in PMC: 2013 Jan 1.
Published in final edited form as: Value Health. 2011 Oct 22;15(1):128–134. doi: 10.1016/j.jval.2011.08.006

Feasibility and Construct Validity of PROMIS and Legacy Instruments in an Academic Scleroderma Clinic—Analysis from the UCLA Scleroderma Quality of Life Study

Dinesh Khanna 1, Paul Maranian 1, Nan Rothrock 4, David Cella 4, Richard Gershon 4, Puja P Khanna 1, Brennan Spiegel 3,5, Daniel E Furst 1, Phil J Clements 1, Amber Bechtel 1, Ron D Hays 2,6
PMCID: PMC3457915  NIHMSID: NIHMS379739  PMID: 22264980

Abstract

Background

The NIH Patient-Reported Outcomes Measurement Information System (PROMIS) Roadmap initiative is a cooperative group program of research designed to develop, evaluate, and standardize item banks to measure patient-reported outcomes relevant across medical conditions. For adults, 11 domains have been developed in physical, mental, and social health.

Purpose

The objective of the current study was to assess the feasibility and construct validity of PROMIS item banks versus legacy measures in an observational study in systemic sclerosis (SSc).

Methods

Patients with SSc in a single academic center completed PROMIS item banks administered by computerized adaptive testing (CAT) during the clinic visit and legacy measures (using paper and pencil). The construct validity of PROMIS items was evaluated by examining correlations with corresponding legacy measures using multitrait-multimethod analysis.

Results

Participants consisted of 143 SSc patients with an average age of 51.5 years; 71% were female and 68% were Caucasian. The average number of items completed for each CAT-administered item bank ranged from 5 to 8 (69 CAT items per patient), and the average time to complete each CAT-administered item bank ranged from 48 seconds to 1.9 minutes per patient (average time = 11.9 minutes per patient for 11 banks). All correlations between PROMIS domains and respective legacy measures were large and in the hypothesized direction (range .61 to .82).

Conclusion

Our study supports the construct validity of the CAT-administered PROMIS item banks and shows that they can be administered successfully in a clinic with support staff. Future studies should assess the feasibility of PROMIS item banks in a busy clinical practice.

Keywords: Systemic sclerosis, PROMIS, health-related quality of life, construct validity

Background

The National Institutes of Health (NIH) Patient-Reported Outcomes Measurement Information System (PROMIS) Roadmap initiative (www.nihpromis.org) is a cooperative research program designed to develop, evaluate, and standardize item banks to measure patient-reported outcomes (PROs) across different medical conditions as well as the US population(1). The goal of PROMIS is to develop reliable and valid item banks using item response theory (IRT) that can be administered as computerized adaptive tests (CAT)(1–3). CAT selects the most informative questions from an item bank on the basis of a person’s previous responses; this process determines an individualized score using a minimum number of questions while preserving precision. Eleven adult domains have been developed to date as short forms and CATs in physical, mental, and social health(1). We tested the 11 PROMIS domains in patients with systemic sclerosis (scleroderma; SSc) in a single-center observational study.
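The adaptive selection described above can be illustrated with a minimal sketch. It assumes a two-parameter logistic (2PL) model with yes/no items, which is a simplification: PROMIS items have five response options, and the item parameters, the `run_cat` function, and the simulated respondent below are all hypothetical. The default stopping values mirror the rule reported in the Measures section of this study (at least 5 items, SE < 0.30, at most 20 items).

```python
import math

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item (assumed model, for illustration only)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)

# Grid of trait values from -4 to +4 used for the posterior calculation.
GRID = [-4.0 + 0.1 * k for k in range(81)]

def eap_estimate(responses):
    """Posterior mean and SD of theta under a standard normal prior."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for (a, b), endorsed in responses:
            p = p_endorse(theta, a, b)
            w *= p if endorsed else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, min_items=5, max_items=20, se_target=0.30):
    """Administer the most informative unused item until a stopping rule is met."""
    remaining = list(bank)
    responses = []
    theta, se = 0.0, 1.0  # start at the prior mean and SD
    while remaining and len(responses) < max_items:
        if len(responses) >= min_items and se < se_target:
            break
        item = max(remaining, key=lambda ab: item_information(theta, *ab))
        remaining.remove(item)
        responses.append((item, answer(item)))
        theta, se = eap_estimate(responses)
    return theta, se, len(responses)

# Hypothetical 30-item bank: (discrimination a, difficulty b) pairs.
bank = [(1.5 + 0.1 * (k % 5), -3.0 + 0.2 * k) for k in range(30)]
true_theta = 0.8  # hypothetical respondent who endorses items easier than 0.8
theta, se, n = run_cat(bank, lambda item: item[1] < true_theta)
print(f"items={n} theta={theta:.2f} se={se:.2f}")
```

Because each item is chosen where it is most informative for the current score estimate, the loop typically stops well before the bank is exhausted, which is the source of the reduced respondent burden.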

Scleroderma, meaning thickened skin, is a rare disease that affects 300 to 700 people per million population(4). Scleroderma manifests itself in several forms, including localized disease, overlap syndromes, scleroderma-like diseases, and systemic sclerosis (SSc)(5). SSc is an autoimmune disease that includes thickening of the skin and internal organ involvement (heart, lung, gastrointestinal, and kidney involvement). People with SSc have both skin hardening and internal involvement; depending on the extent of skin involvement, SSc is divided into limited SSc and diffuse SSc(5). Patients with limited SSc generally have a more favorable outcome, with a 5-year survival as high as 86%(6). Diffuse SSc is characterized by rapid skin thickening and potentially severe pulmonary, cardiac, renal, and gastrointestinal involvement occurring in the first 3–5 years of disease and may be associated with poor survival(5). SSc is a chronic rheumatic disease with no effective treatment or cure, in which patients cope with pain, disfigurement, disability, and feelings of helplessness, each of which can impair health-related quality of life (HRQoL)(7–9). In this study we sought to assess the feasibility of administering PROMIS item banks in an academic clinical setting and the construct validity of PROMIS domains versus legacy measures in an observational study of patients with SSc. We hypothesized that the PROMIS item banks could be administered in a clinical setting with adequate staff support without disrupting the flow of the clinic, and that the item banks would be highly correlated with corresponding legacy instruments.

Methods

Patients

We recruited SSc patients receiving care in the UCLA Scleroderma Program to serve as participants in the UCLA Scleroderma Quality of Life Study. The original objective of the study was to assess minimally important differences for OMERACT-endorsed outcome measures in SSc. We added PROMIS item banks to the study. Adult patients (≥18 years) with a diagnosis of SSc were included in the study(10). Patients with SSc were further divided into limited SSc, diffuse SSc, and overlap syndrome. Limited SSc is defined as skin thickening distal, but not proximal, to the knees and elbows, with or without facial involvement; diffuse SSc is defined as skin thickening distal and proximal to the knees and elbows, with or without facial involvement; and overlap syndrome is defined as SSc together with another rheumatic disease (such as inflammatory myositis or rheumatoid arthritis).

Study Protocol

This study is a single-center observational study in which patients with SSc are invited to participate during their clinic visits. The UCLA Scleroderma clinic is a weekly rheumatology clinic where patients with SSc are seen by 3 scleroderma experts (D.K., D.E.F., and P.J.C.). Each clinician has 2 dedicated rooms and is assigned 10–12 patients. The current analysis reports the baseline data. SSc patients with new (60-minute time slot) or follow-up (usually 30–45 minute time slot) consultations are approached at the time of their scheduled clinic visit by the front desk staff or the nurse checking in the patient and invited to participate in the study. If a patient is interested, s/he is handed a UCLA Institutional Review Board-approved written consent and HIPAA forms. The physician completes the clinical visit and then discusses the study in detail. Because one of the objectives of the study is to assess the feasibility of administering PROMIS item banks in a clinical setting, we invite all patients irrespective of their disabilities. If the patient is interested, s/he signs the consent and HIPAA forms and the study coordinator directs the patient to the PROMIS Assessment Center (www.assessmentcenter.net) to complete the 11 item banks (discussed below)(11). Assessment Center is an online research management tool supported by PROMIS that administers the item banks as CATs. The patient completes the PROMIS domains in the examination room and the physician uses the other assigned examination room to examine the next patient. The item banks are completed using a dedicated desktop computer in each examining room, and the patient has complete privacy. After that, the patient is asked to complete the legacy instruments in the room. The majority of the legacy measures were completed using paper and pencil during the clinic visit; in rare instances they were completed at home and returned within 1 week using pre-stamped envelopes.

Measures

PROMIS version 1.0 item banks including anger, anxiety, depression, fatigue, pain behavior, pain impact, physical function, sleep disturbance, wake disturbance, satisfaction with participation in social roles and satisfaction with participation in discretionary social activities were administered as CATs (available at www.nihpromis.org). With the exception of physical function which does not include a time frame and the social health banks that reference “lately,” all item banks reference the past 7 days.

All banks other than pain behavior use five response options that most commonly reflect intensity (e.g., not at all, a little bit, somewhat, quite a bit, very much) or frequency (e.g., never, rarely, sometimes, often, always). Pain Behavior includes a “Had no pain” response option as well. All PROMIS instruments are scored using a T score metric so that the mean in the U.S. general population is 50 with a standard deviation of 10. Higher scores reflect more of what is being measured. Therefore, high scores for physical and social function are desirable, whereas high symptom scores are undesirable. CATs were set to administer enough items to achieve a standard error (SE) <0.30 (corresponding to reliability >.90) after a minimum of 5 items were administered per bank. Each CAT stopped after 20 items were administered even if the SE criterion was not met. Additional information about the banks is available at www.nihpromis.org. Legacy instruments were also included in this study. Legacy instruments are the most widely used survey instruments for assessing a particular patient-reported outcome and were considered the state of the science prior to PROMIS. Legacy instruments included the SF-36® version 2(12), Health Assessment Questionnaire-Disability Index (HAQ-DI)(13), 10-item Center for Epidemiologic Studies Depression Scale (CES-D)(14), Functional Assessment of Chronic Illness Therapy (FACIT)-Fatigue(15), and Medical Outcomes Study (MOS) Sleep scale(16). These legacy instruments were chosen because they have been endorsed by the Outcomes Measures in Rheumatology (OMERACT)(17, 18) and/or recently evaluated in SSc(19, 20).
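The link between the SE criterion and reliability, and the T score metric itself, can be made concrete with a short sketch. It assumes that the SE of 0.30 is expressed on a standardized metric (mean 0, SD 1), which the text does not state explicitly, and uses the standard relation reliability = 1 − SE²/SD². The function names are illustrative only.

```python
def reliability_from_se(se, sd=1.0):
    """Reliability implied by a standard error on a metric with the given SD."""
    return 1.0 - (se / sd) ** 2

def to_t_score(z):
    """Convert a standardized score to the T score metric (mean 50, SD 10)."""
    return 50.0 + 10.0 * z

# An SE of 0.30 on the standardized metric implies reliability just above .90.
print(round(reliability_from_se(0.30), 2))

# A score one SD below the U.S. general population mean is a T score of 40.
print(to_t_score(-1.0))

# The same SE expressed in T score units (assumption: SE scales with the SD).
print(round(0.30 * 10.0, 1))
```

Under these assumptions, an SE of 0.30 corresponds to a reliability of about .91, consistent with the ">.90" stated above, and to roughly 3 points on the T score metric.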

The SF-36 version 2 is a generic health status measure consisting of 36 items assessing 8 scales.(12, 21) The 8 scales are summarized into Physical Component Summary (PCS) and Mental Component Summary (MCS) scores. The scales and summary scores are normalized to the U.S. general population, for whom the mean score is 50 and the standard deviation is 10. We used the 4-week recall period version of the SF-36 v.2.

The Health Assessment Questionnaire-Disability Index (HAQ-DI) is an arthritis-targeted measure intended for assessing functional ability in arthritis(13). It is a self-administered 20-question instrument that assesses a patient’s level of functional ability and includes questions that involve both upper and lower extremities. The HAQ-DI score ranges from 0 (no disability) to 3 (severe disability). It has a 7-day recall period.

Depressive symptoms were measured with the 10-item Center for Epidemiologic Studies Depression (CESD-10) Scale(14). The CESD-10 uses a 4-point categorical response scale (total score range 0 to 30), with higher scores representing greater depressive symptoms. A score ≥10 on the CESD-10 represents depressive symptoms. It has a 7-day recall period.

The Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-Fatigue) is a 13-item questionnaire that assesses self-reported fatigue and its impact upon daily activities and function over the past 7 days. The range of possible scores is 0–52, with lower scores reflecting more fatigue.

The Medical Outcomes Study (MOS) Sleep scale(16) yields a sleep problems index and 6 scale scores. Answers were based on a retrospective assessment over the past 4 weeks. Quantity of sleep is scored as the average hours slept per night. The other scales and 9-item sleep problem index are scored on a 0–100 possible range, and higher scores indicate more of the concept being measured.

Analysis

Mean scores, standard deviations (SD), ranges, and percentages of respondents scoring the minimum (floor) and maximum (ceiling) possible scores were calculated to evaluate scale score distributions for PROMIS and legacy instruments. For easy interpretability, floor effect is presented as “worst” possible score and ceiling as “best” possible irrespective of the direction of the scale.
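As a minimal sketch of the floor and ceiling calculation described above, the function below counts respondents at the worst and best possible scores. The function name and the example scores are hypothetical and are not study data; the `worst` and `best` arguments encode the direction of the scale, following the convention that floor means the worst possible score.

```python
def floor_ceiling(scores, worst, best):
    """Percentage of respondents at the worst (floor) and best (ceiling) scores."""
    n = len(scores)
    floor_n = sum(1 for s in scores if s == worst)
    ceiling_n = sum(1 for s in scores if s == best)
    return {
        "floor_n": floor_n,
        "floor_pct": 100.0 * floor_n / n,
        "ceiling_n": ceiling_n,
        "ceiling_pct": 100.0 * ceiling_n / n,
    }

# Hypothetical CESD-10 totals (0-30, higher = more depressive symptoms),
# so the worst possible score is 30 and the best possible score is 0.
example_scores = [0, 0, 5, 12, 30, 8, 0, 3, 9, 10]
result = floor_ceiling(example_scores, worst=30, best=0)
print(result)
```

For a scale where higher scores are better, such as FACIT-Fatigue (0–52), the same function would be called with `worst=0` and `best=52`.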

Internal consistency reliability for legacy items was estimated using Cronbach’s alpha(22). An alpha ≥ 0.70 is considered satisfactory for group comparisons(23). We also assessed if there were differences in socio-demographics, type of SSc or disease duration in patients falling in the 1st versus 4th quartiles for total CAT-items completed.
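Cronbach's alpha can be computed from item-level responses as sketched below, using the usual formula alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The response matrix is hypothetical and is not study data.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha; rows are respondents, columns are items."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)

# Hypothetical responses: 4 respondents answering 3 items.
responses = [
    [1, 2, 1],
    [2, 1, 3],
    [3, 4, 2],
    [4, 3, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

With these illustrative data alpha is about 0.72, which would just meet the ≥ 0.70 threshold for group comparisons.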

Multitrait-Multimethod Analysis

The construct validity of PROMIS measures was evaluated by examining correlations with corresponding legacy measures using a computer program for analyzing a multitrait-multimethod (MTMM) matrix. Construct validity is supported in MTMM analyses when the highest correlations are found for different methods of assessing the same domain (validity diagonals) and weaker correlations are found among measures of different domains(21). The 6 PROMIS domains selected for analysis were depression, fatigue, pain behavior, physical function, sleep disturbance, and satisfaction with participation in discretionary social activities, because only these 6 PROMIS domains had corresponding legacy scales administered in the study. The corresponding legacy scales were the CESD-10, FACIT-Fatigue, SF-36 bodily pain, SF-36 physical functioning, MOS 9-item sleep problem index, and SF-36 social functioning, respectively. Analyses were also repeated substituting the HAQ-DI for SF-36 physical functioning and SF-36 vitality for FACIT-Fatigue. We hypothesized correlation coefficients for validity diagonals of ≥0.50 (a large effect size) and that these would be significantly larger than off-diagonal correlations.

Results

We recruited 143 patients with SSc. The average (SD) age was 51.5 (14.7) years; 117 (71%) were female and 94 (68%) were Caucasian. The mean (SD) disease duration was 7.5 (8.2) years. Seventy-six (55%) had limited SSc, 55 (39%) had diffuse SSc, and 9 (6%) had overlap syndrome. On average, patients had moderate functional disability as defined by HAQ-DI of ≥1.0 (24). Forty-five (32%) had depressed mood (CESD > 10) and SF-36 PCS (mean score=38.2) and MCS (mean score= 47.8) scores were 1.2 and 0.2 SD below the U.S. population means, respectively (Table 1). The average fatigue level (FACIT-F) was 31.6, a standard deviation below the general population and very close to the cut score of 30 indicating clinically significant fatigue. Ceiling effects (proportion of patients who reported no impairment) were seen in 6% of CESD scores, 10% of HAQ-DI scores, and 2% of FACIT-F scores.
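The statement that the SF-36 summary scores fall 1.2 and 0.2 SD below the U.S. population means follows directly from the norm-based scoring (mean 50, SD 10). A brief sketch of that arithmetic, with an illustrative function name:

```python
def sd_units_below_norm(mean_score, norm_mean=50.0, norm_sd=10.0):
    """Distance below the population mean, expressed in population SD units."""
    return (norm_mean - mean_score) / norm_sd

# Sample means reported in the text for the two SF-36 summary scores.
for label, sample_mean in [("SF-36 PCS", 38.2), ("SF-36 MCS", 47.8)]:
    print(label, round(sd_units_below_norm(sample_mean), 1))
```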

Table 1.

Descriptive statistics of the legacy instruments including reliability and floor/ceiling effects

| Legacy domain | N | Mean | Median | Min | Max | Cronbach’s alpha | Floor N | Floor % | Ceiling N | Ceiling % |
|---|---|---|---|---|---|---|---|---|---|---|
| CESD (0–30)* | 142 | 8.7 | 8 | 0 | 27 | 0.84 | 8 | 5.6 | 0 | 0.0 |
| HAQ-DI (0–3)* | 142 | 0.9 | 0.9 | 0 | 2.6 | 0.87 | 14 | 9.9 | 0 | 0.0 |
| SF-36v.2 Physical Functioning | 143 | 35.4 | 36 | 14.9 | 57 | 0.92 | 5 | 3.5 | 6 | 4.2 |
| SF-36v.2 Role Limitations-Physical | 141 | 37.8 | 37.3 | 17.7 | 56.9 | 0.96 | 13 | 9.2 | 15 | 10.6 |
| SF-36v.2 Bodily Pain | 142 | 45.2 | 46.1 | 19.9 | 62.1 | 0.92 | 1 | 0.7 | 19 | 13.4 |
| SF-36v.2 General Health | 143 | 43.6 | 43.4 | 31.5 | 55.3 | 0.73 | 0 | 0.0 | 0 | 0.0 |
| SF-36v.2 Vitality | 143 | 43.8 | 45.8 | 20.9 | 70.8 | 0.88 | 1 | 0.7 | 3 | 2.1 |
| SF-36v.2 Role Limitations-Emotional | 141 | 43.7 | 48.1 | 9.2 | 55.9 | 0.94 | 9 | 6.4 | 58 | 41.1 |
| SF-36v.2 Social Functioning | 143 | 42.6 | 45.9 | 13.2 | 56.8 | 0.89 | 1 | 0.7 | 38 | 26.6 |
| SF-36v.2 Emotional Well-being | 143 | 47.2 | 50 | 7.8 | 64.1 | 0.87 | 2 | 1.4 | 1 | 0.7 |
| SF-36v.2 PCS (7–72) | 140 | 38.2 | 37.7 | 12.9 | 58.9 | 0.97 | 0 | 0 | 0 | 0 |
| SF-36v.2 MCS (6–71) | 140 | 47.8 | 50.2 | 6.2 | 69.4 | 0.97 | 0 | 0 | 0 | 0 |
| FACIT Fatigue (0–52) | 143 | 31.6 | 31 | 4 | 52 | 0.95 | 0 | 0.0 | 3 | 2.1 |
| MOS Sleep Problem Index (9 items)* | 142 | 39.4 | 40.3 | 0 | 84.4 | 0.84 | 1 | 0.7 | 0 | 0.0 |

* Higher scores indicate poor HRQOL (for other domains, a higher score indicates better HRQOL). Floor effect is presented as “worst” possible score and ceiling as “best” possible score irrespective of the direction of the scale.

Reliability, as assessed by Cronbach’s alpha, for all legacy domains was >0.70 and ranged from 0.73 (SF-36 general health) to 0.96 (SF-36 role limitations-physical).

Scores on the PROMIS physical functioning domain in this sample were about 1.0 SD worse than scores from a sample representing the US general population(25) (Table 2). Other scales were 0.2 to 0.66 SD below the general population. Ceiling effects were seen in the pain domains (15–20%), suggesting no pain in these patients. Also, in 8 of 11 banks, no patient answered the maximum of 20 questions in the item bank (Table 2). In the remaining 3 banks (Social Satisfaction Discretionary, Social-Satisfaction Roles, Wake Disturbances), 5–12% of patients completed all questions in the item bank.

Table 2.

Descriptive statistics of the PROMIS domains including reliability, floor/ceiling effects, and total number of items answered

| PROMIS domain | N | Mean | Median | Min | Max | Floor N | Floor % | Ceiling N | Ceiling % | Answered all questions in item bank, N | Answered all questions in item bank, % |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Anger 1.0* | 143 | 52.1 | 52 | 33 | 74 | 0 | 0 | 0 | 0 | 0 | 0 |
| Anxiety 1.0* | 143 | 55.6 | 56 | 34 | 74 | 0 | 0 | 0 | 0 | 0 | 0 |
| Depression 1.0* | 143 | 52.2 | 52 | 33 | 77 | 0 | 0 | 4 | 3 | 0 | 0 |
| Fatigue 1.0* | 143 | 54.6 | 55 | 23 | 72.5 | 0 | 0 | 1 | 1 | 0 | 0 |
| Pain Behavior 1.0* | 143 | 52.4 | 56 | 34 | 68 | 0 | 0 | 28 | 20 | 0 | 0 |
| Pain Impact 1.0* | 143 | 55.1 | 56 | 37 | 74 | 0 | 0 | 22 | 15 | 0 | 0 |
| Physical Function 1.0 | 143 | 39.8 | 39 | 22.7 | 65 | 0 | 0 | 0 | 0 | 0 | 0 |
| Sleep Disturbances 1.0* | 143 | 51.7 | 52 | 28 | 76.1 | 1 | 1 | 1 | 1 | 0 | 0 |
| Social Sat Discretionary 1.0 | 143 | 47.1 | 47 | 26 | 68 | 6 | 4 | 2 | 1 | 7 | 5 |
| Social-Satisfaction Roles 1.0 | 142 | 43.4 | 43 | 24 | 67.9 | 6 | 4 | 8 | 6 | 10 | 7 |
| Wake Disturbances 1.0* | 142 | 51.7 | 53 | 29.2 | 71 | 2 | 1 | 2 | 1 | 17 | 12 |

* Higher scores indicate poor HRQOL (for other domains, a higher score indicates better HRQOL). Floor effect is presented as “worst” score and ceiling as “best” irrespective of the direction of the scale.

As discussed in the Methods section, CATs were set to administer a minimum of 5 items and enough items to achieve a standard error (SE) <0.30. The average number of items completed for each CAT-administered item bank ranged from 5 to 8 (69 CAT items per patient for all 11 CAT item banks), and the average time to complete each CAT-administered item bank ranged from 48 seconds to 1.9 minutes per patient (average time = 11.9 minutes per patient for 11 CAT item banks; Table 3). Some patients’ physical impairments required modifications to study administration, while other patients had difficulty understanding the questions and required additional clarification and time to complete the study questionnaire. For example, some patients had one or more digital amputations and/or severe hand contractures related to their SSc and were physically unable to operate a mouse or computer keyboard (n=27 subjects). In these instances, the patients read the questions on the screen themselves and gave their answer to the coordinator, who then selected the given response in Assessment Center. The same procedure was followed for patients who complained of poor vision and could not read the questions even with the font set to its largest size. The mean time to administer PROMIS items did not differ between patients who required additional help and those who did not (p=.0993). There were no differences in education level (p=0.3) or depressed mood (p=0.5) between patients who required additional help and those who did not. Ten patients (7%) required >25 minutes to complete their assessment. Of these, 5 patients took an average of 9.0 minutes to complete the anger item bank. Only 2 of these 10 patients belonged to the physical impairment group.

Table 3.

Test length per item bank

| PROMIS item bank | Items administered: N | Items: mean | Items: median | Items: min | Items: max | Time (minutes): N | Time: mean | Time: median | Time: min | Time: max |
|---|---|---|---|---|---|---|---|---|---|---|
| Anger 1.0* | 143 | 6.6 | 6 | 5 | 20 | 143 | 1.9 | 1.0 | 0.4 | 13.9 |
| Anxiety 1.0* | 143 | 5.5 | 5 | 5 | 20 | 143 | 0.9 | 1.0 | 0.3 | 8.1 |
| Depression 1.0* | 143 | 6.2 | 5 | 5 | 20 | 143 | 0.8 | 1.0 | 0.3 | 8.4 |
| Fatigue 1.0* | 143 | 5.2 | 5 | 5 | 20 | 143 | 0.9 | 1.0 | 0.3 | 5.9 |
| Pain Behavior 1.0* | 143 | 8 | 5 | 5 | 20 | 143 | 1.2 | 1.0 | 0.4 | 4.6 |
| Pain Impact 1.0* | 143 | 8.1 | 5 | 5 | 20 | 143 | 1.1 | 1.0 | 0.4 | 4.6 |
| Physical Function 1.0 | 143 | 5.3 | 5 | 5 | 20 | 143 | 1.2 | 1.0 | 0.4 | 5.8 |
| Sleep Disturbance 1.0* | 143 | 6.3 | 5 | 5 | 20 | 142 | 0.9 | 1.0 | 0.3 | 4.6 |
| Social Sat Discretionary 1.0 | 143 | 5.5 | 5 | 5 | 12 | 142 | 1.0 | 1.0 | 0.3 | 7.1 |
| Social-Satisfaction Roles 1.0 | 142 | 5.9 | 5 | 5 | 14 | 141 | 1.0 | 1.0 | 0.3 | 10.4 |
| Wake Disturb 1.0* | 142 | 6.9 | 5 | 5 | 16 | 141 | 1.0 | 1.0 | 0.3 | 6.3 |
| All PROMIS item banks above | | 69.5 | 56 | 52 | 202 | | 11.9 | 9.0 | 3.7 | 79.7 |

* Higher scores indicate poor HRQOL (for other domains, a higher score indicates better HRQOL).
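As a quick consistency check on Table 3, the per-bank means can be summed and compared with the totals reported for all 11 banks. The values below are copied from the table; the variable names are illustrative.

```python
# Mean number of items administered per bank, in Table 3 row order.
mean_items = [6.6, 5.5, 6.2, 5.2, 8.0, 8.1, 5.3, 6.3, 5.5, 5.9, 6.9]

# Mean completion time per bank in minutes, in the same order.
mean_minutes = [1.9, 0.9, 0.8, 0.9, 1.2, 1.1, 1.2, 0.9, 1.0, 1.0, 1.0]

total_items = round(sum(mean_items), 1)
total_minutes = round(sum(mean_minutes), 1)

# Both sums match the "All PROMIS item banks" row (69.5 items, 11.9 minutes).
print(total_items, total_minutes)
```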

We also explored whether there were any differences in the demographics of patients who completed ≤ 56 items (1st quartile) vs. ≥ 81 items (4th quartile) when combining all 11 item banks. There were no statistically significant differences between these groups with regard to socio-demographic characteristics (Table 4).

Table 4.

Patient Demographics by Item Count

| Characteristic | All patients (n = 143) | Total item count ≤ 56 (n = 42) | Total item count ≥ 81 (n = 36) | p-value* |
|---|---|---|---|---|
| Age, mean (SD) | 51.5 (14.7) | 51.1 (13.5) | 51.9 (15.0) | 0.7987 |
| Female, N (%) | 117 (81.8) | 36 (85.7) | 30 (83.3) | 0.771 |
| Race, N (%) | | | | 0.515 |
| Caucasian | 94 (67.6) | 30 (75.0) | 23 (65.7) | |
| African-American | 9 (6.5) | 1 (2.5) | 3 (8.6) | |
| Asian | 19 (13.7) | 4 (10.0) | 6 (17.1) | |
| Others | 17 (12.2) | 5 (12.5) | 3 (8.6) | |
| Education, N (%) | | | | 0.12 |
| High school or less | 25 (17.7) | 6 (14.3) | 1 (2.8) | |
| Some college | 50 (35.5) | 13 (31.0) | 13 (36.1) | |
| College graduate | 30 (21.3) | 8 (19.1) | 13 (36.1) | |
| One or more years post-college | 36 (25.5) | 15 (35.7) | 9 (25.0) | |
| Marital status, N (%) | | | | 0.772 |
| Married | 83 (58.5) | 27 (64.3) | 22 (61.1) | |
| Separated/divorced/widowed/single | 59 (41.6) | 15 (35.7) | 14 (38.9) | |
| Disease duration (years), median (IQR) | 5 (7.25) | 4 (7) | 5 (6) | 0.4891 |
| Type of SSc, N (%) | | | | 0.124 |
| Diffuse | 55 (39.3) | 15 (36.6) | 10 (28.6) | |
| Limited | 75 (53.6) | 20 (48.8) | 24 (68.6) | |
| Overlap syndrome | 9 (6.4) | 6 (14.3) | 1 (2.9) | |

The correlations in the MTMM matrix are provided in Table 5. Validity diagonals (correlations among different methods of measuring the same domain) were the largest correlations across the row and column in every case with one exception: the PROMIS scale for satisfaction with participation in discretionary social activities had about the same size correlation with the legacy FACIT-Fatigue scale (r=.62) as with its SF-36 social functioning counterpart (r=.61). Eighty-three percent of the paired t-tests showed validity diagonal correlations that were statistically significantly larger than the relevant off-diagonal correlations in the MTMM matrix, providing substantial support for the construct validity of the measures. The correlation between HAQ-DI and the PROMIS physical functioning CAT was 0.71, and that between SF-36 vitality and the PROMIS fatigue CAT was 0.75. Other MTMM matrices were analyzed replacing SF-36 physical functioning with HAQ-DI and FACIT-Fatigue with SF-36 vitality. Similarly, validity diagonals were the largest correlations across row and column, with at most one exception, in these matrices. The matrix with SF-36 physical functioning and FACIT-Fatigue is presented (Table 5) because it had the highest average convergent validity correlation.

Table 5.

MTMM Correlations Matrix

| PROMIS CAT | CES-D | FACIT-Fatigue | SF-BP | SF-PF | Sleep index | SF-SF |
|---|---|---|---|---|---|---|
| Depression 1.0 | 0.67 | 0.44 | 0.31 | 0.20 | 0.33 | 0.46 |
| Fatigue 1.0 | 0.59 | 0.76 | 0.59 | 0.51 | 0.49 | 0.59 |
| Pain Behavior 1.0 | 0.44 | 0.53 | 0.66 | 0.38 | 0.37 | 0.47 |
| Phys. Function 1.0 | 0.46 | 0.72 | 0.56 | 0.82 | 0.43 | 0.55 |
| Sleep Disturb 1.0 | 0.50 | 0.37 | 0.23 | 0.24 | 0.75 | 0.28 |
| Social Sat Discretionary 1.0 | 0.56 | 0.62 | 0.48 | 0.54 | 0.46 | 0.61 |

Validity diagonal correlations are not highlighted. SF= SF-36 subscales, SLP9: MOS Sleep Problem Index (9-item).
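The MTMM criterion, that each validity diagonal should be the largest correlation in its row and column, can be checked directly against the Table 5 values. The sketch below is illustrative only and is not the MTMM program used in the study; it does not reproduce the paired significance tests.

```python
promis = ["Depression", "Fatigue", "Pain Behavior", "Phys. Function",
          "Sleep Disturb", "Social Sat Discretionary"]

# Rows are PROMIS CATs; columns are CES-D, FACIT-Fatigue, SF-BP, SF-PF,
# Sleep index, SF-SF. Values are copied from Table 5.
r = [
    [0.67, 0.44, 0.31, 0.20, 0.33, 0.46],
    [0.59, 0.76, 0.59, 0.51, 0.49, 0.59],
    [0.44, 0.53, 0.66, 0.38, 0.37, 0.47],
    [0.46, 0.72, 0.56, 0.82, 0.43, 0.55],
    [0.50, 0.37, 0.23, 0.24, 0.75, 0.28],
    [0.56, 0.62, 0.48, 0.54, 0.46, 0.61],
]

def diagonal_exceptions(matrix, labels):
    """Labels whose validity diagonal is not the largest value in its row and column."""
    exceptions = []
    for i, label in enumerate(labels):
        diag = matrix[i][i]
        row_others = [v for j, v in enumerate(matrix[i]) if j != i]
        col_others = [matrix[k][i] for k in range(len(matrix)) if k != i]
        if any(v >= diag for v in row_others + col_others):
            exceptions.append(label)
    return exceptions

diagonal = [r[i][i] for i in range(len(r))]
exceptions = diagonal_exceptions(r, promis)
print("validity diagonal range:", min(diagonal), "to", max(diagonal))
print("mean validity diagonal:", round(sum(diagonal) / len(diagonal), 2))
print("exceptions:", exceptions)
```

The only exception flagged is the social satisfaction bank, whose correlation with FACIT-Fatigue (.62) slightly exceeds its validity diagonal (.61), in line with the text.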

Discussion

The PROMIS initiative aims to create reliable and valid item banks that can be used across chronic conditions. This study expands on the original PROMIS effort by testing the performance of the PROMIS CATs in systemic sclerosis (SSc). We have demonstrated the feasibility of administering CATs in patients with SSc seen in an academic center without interrupting the flow of the clinic. In addition, the PROMIS domains showed construct validity against the appropriate legacy instruments.

One of the objectives of PROMIS is to develop meaningful and precise instruments while reducing respondent burden(1, 2). In our patients with SSc, the mean/median time to complete the 11 CAT-administered domains was 11.9/9.0 minutes. Patients and providers can therefore anticipate approximately one minute per concept measured in clinical practice settings where PROMIS CATs are employed. In comparison, there were 91 items in the 5 legacy instruments that captured 6 health constructs (physical functioning, mental health, bodily pain, social functioning, sleep, and fatigue). Applying the rule of thumb of completing 3–5 items per minute(26), these would take approximately 18.2 to 30.3 minutes for 6 health constructs.
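The burden comparison above is simple arithmetic, sketched here with the figures given in the text (91 legacy items, a rule of thumb of 3–5 items per minute, and a mean of 11.9 minutes for 11 CAT banks). The variable names are illustrative.

```python
legacy_items = 91          # items across the 5 legacy instruments
fast_rate, slow_rate = 5, 3  # items completed per minute (rule of thumb)

# Estimated time to complete the legacy instruments, in minutes.
legacy_min = round(legacy_items / fast_rate, 1)
legacy_max = round(legacy_items / slow_rate, 1)
print(legacy_min, legacy_max)

# Mean CAT time per bank: total mean time divided by the number of banks.
cat_minutes_per_bank = round(11.9 / 11, 2)
print(cat_minutes_per_bank)
```

Note that the legacy estimate covers 6 health constructs, whereas the CAT time covers 11 domains.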

The time estimate is important given that the PROMIS domains were administered during the office visit with a dedicated study coordinator and were completed by patients with moderate physical disability. Data collection occurred in the clinic, usually after the physician visit. On rare occasions, patients initiated the study visit before seeing the physician. This may explain the time of 79.7 minutes for a single patient, who likely stopped in the middle to be interviewed and examined by the physician but did not log off.

Administration of the PROMIS domains by the study coordinator in patients with disability did not significantly increase the time to complete the items. The study coordinator was able to read the questions aloud or enter patients’ chosen responses, especially for patients with disabilities. However, this study was performed in a research setting. If these measures are to be used in routine clinical practice, that task would fall to clinical staff (e.g., a nurse or receptionist), which could substantially affect clinic flow and should be assessed in future studies.

By using an item bank for a particular domain that covers the continuum of the construct, PROMIS aims to reduce the floor and ceiling effects in PROs(2). As an example, Rose et al(27) used IRT to construct and evaluate a preliminary item bank for physical function (N=17,726). In simulations, a 10-item CAT eliminated floor effects and decreased ceiling effects. When using the 9-item HAQ-DI or 10-item SF-36 physical function measure, there were significant floor and ceiling effects. In our study, floor effects were minimal, but a higher ceiling effect was seen in the assessment of pain impact and pain behavior (15–20%; Table 2), suggesting that a large proportion of patients reported no pain. The low proportion of patients at the floor or ceiling on other measures reflects the moderate-to-severe physical, mental, and social impact of SSc.

We assessed the construct validity of the PROMIS domains against legacy measures. All correlations between PROMIS domains and corresponding legacy measures were large and in the hypothesized direction; the absolute value of these correlations ranged from .61 to .82. The PROMIS scale for satisfaction with participation in discretionary social activities correlated as highly with the legacy FACIT-Fatigue scale (r=.62) as it did with its SF-36 social functioning counterpart (r=.61). This is probably because fatigue plays a large role in the ability to participate in social activities.

With the exception of physical function, which does not include a time frame, and the social health banks, which reference “lately,” all PROMIS item banks reference the past 7 days. In comparison, the SF-36 bodily pain and social functioning scales and the MOS Sleep scale use a 4-week recall period. In a previous study, researchers compared acute (one-week recall) and standard (four-week recall) versions of the SF-36 scale scores in 142 patients with asthma(28). The acute form scores were reliable, the scales conformed to the assumptions underlying their scoring and scaling, and the scales had factor content similar to the standard version.

Our study has several strengths. First, to our knowledge, this is the first application of CAT-administered PROMIS items in a clinical setting without disrupting the flow of the clinic. Second, this study administered both legacy instruments and PROMIS domains, allowing us to assess the construct validity of PROMIS within SSc, a disease not specifically targeted in the instrument development.

Our study has limitations. First, we did not assess responsiveness of the PROMIS banks and legacy instruments. This will require longitudinal data. Second, we only included 6 legacy instruments in this study whereas we included all 11 PROMIS item banks. This is because the original objective of the study was to assess minimally important differences for OMERACT-endorsed outcome measures in SSc. Third, we only evaluated the feasibility of PROMIS item banks in a research clinical setting. We had a dedicated study coordinator who was responsible for helping patients complete the PROMIS domains and providing legacy instruments. We had a dedicated clinic area, and the administrators were supportive of our research objectives. The feasibility of administering PROMIS domains in routine clinical practice still needs to be determined. Fourth, we did not evaluate differential item functioning (DIF) because we administered the item banks as CATs in this study. We assumed that item performance was similar in the US population and our sample. Previous analyses of DIF in PROMIS have shown that the items are generally robust to age, gender, and education. Ongoing work is evaluating DIF by other subgroups (e.g., by disease group).

Finally, this study did not assess the clinical utility of the measures. As an example, legacy instruments such as HAQ-DI and CESD are used for clinical decision-making(19, 24). Rather, the objective of the study was to assess construct validity; other studies are needed to determine appropriate score cut points and linkage of legacy instrument scores with PROMIS item banks. In conclusion, our study provides support for the construct validity of CAT-administered PROMIS item banks and shows that it is feasible to administer them in a clinical practice. Studies are underway to assess clinical utility and responsiveness of the PROMIS item banks.

Acknowledgments

D. Khanna, Maranian, Rothrock, Cella, Gershon, Furst, PP Khanna, Spiegel, Bechtel and Hays were supported by a National Institutes of Health Award (NIH/NIAMS U01 AR057936A) and by the National Institutes of Health through the NIH Roadmap for Medical Research Grant (AR052177). D. Khanna is also supported by NIAMS (K23 AR053858-04) and the Scleroderma Foundation (New Investigator Award). Dr. PP Khanna was also supported by a National Institutes of Health Award (T32 AR 053463). Hays was also supported by the UCLA Resource Center for Minority Aging Research/Center for Health Improvement in Minority Elderly (RCMAR/CHIME), NIH/NIA Grant Award Number P30AG021684, and the UCLA/Drew Project EXPORT, NCMHD, 2P20MD000182.

Reference List

  • 1. Cella D, Yount S, Rothrock N, Gershon R, Cook K, Reeve B, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years. Med Care. 2007 May;45(5 Suppl 1):S3–S11. doi: 10.1097/01.mlr.0000258615.42478.55.
  • 2. Reeve BB, Hays RD, Bjorner JB, Cook KF, Crane PK, Teresi JA, et al. Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Med Care. 2007 May;45(5 Suppl 1):S22–S31. doi: 10.1097/01.mlr.0000250483.85507.04.
  • 3. Hays RD, Liu H, Spritzer K, Cella D. Item response theory analyses of physical functioning items in the medical outcomes study. Med Care. 2007 May;45(5 Suppl 1):S32–S38. doi: 10.1097/01.mlr.0000246649.43232.82.
  • 4. Mayes MD, Lacey JV, Jr, Beebe-Dimmer J, Gillespie BW, Cooper B, Laing TJ, et al. Prevalence, incidence, survival, and disease characteristics of systemic sclerosis in a large US population. Arthritis Rheum. 2003 Aug;48(8):2246–55. doi: 10.1002/art.11073.
  • 5. Clements PJ. Systemic sclerosis (scleroderma) and related disorders: clinical aspects. Baillieres Best Pract Res Clin Rheumatol. 2000 Mar;14(1):1–16. doi: 10.1053/berh.1999.0074.
  • 6. Charles C, Clements P, Furst DE. Systemic sclerosis: hypothesis-driven treatment strategies. Lancet. 2006 May 20;367(9523):1683–91. doi: 10.1016/S0140-6736(06)68737-0.
  • 7. Khanna D, Clements PJ, Furst DE, Chon Y, Elashoff R, Roth MD, et al. Correlation of the degree of dyspnea with health-related quality of life, functional abilities, and diffusing capacity for carbon monoxide in patients with systemic sclerosis and active alveolitis: results from the Scleroderma Lung Study. Arthritis Rheum. 2005 Feb;52(2):592–600. doi: 10.1002/art.20787.
  • 8. Haythornthwaite JA, Heinberg LJ, McGuire L. Psychologic factors in scleroderma. Rheum Dis Clin North Am. 2003 May;29(2):427–39. doi: 10.1016/s0889-857x(03)00020-6.
  • 9. Benrud-Larson LM, Heinberg LJ, Boling C, Reed J, White B, Wigley FM, et al. Body image dissatisfaction among women with scleroderma: extent and relationship to psychosocial function. Health Psychol. 2003 Mar;22(2):130–9.
  • 10. Preliminary criteria for the classification of systemic sclerosis (scleroderma). Subcommittee for scleroderma criteria of the American Rheumatism Association Diagnostic and Therapeutic Criteria Committee. Arthritis Rheum. 1980 May;23(5):581–90. doi: 10.1002/art.1780230510.
  • 11. Gershon R, Rothrock NE, Hanrahan RT, Jansky LJ, Harniss M, Riley W. The development of a clinical outcomes survey research application: Assessment Center. Qual Life Res. 2010 Jun;19(5):677–85. doi: 10.1007/s11136-010-9634-4.
  • 12. Ware J, Kosinski M, Dewey J. How to Score Version Two of the SF-36 Health Survey. Lincoln, RI: QualityMetric Incorporated; 2000.
  • 13. Fries JF, Spitz P, Kraines RG, Holman HR. Measurement of patient outcome in arthritis. Arthritis Rheum. 1980 Feb;23(2):137–45. doi: 10.1002/art.1780230202.
  • 14. Andresen EM, Malmgren JA, Carter WB, Patrick DL. Screening for depression in well older adults: evaluation of a short form of the CES-D (Center for Epidemiologic Studies Depression Scale). Am J Prev Med. 1994 Mar;10(2):77–84.
  • 15. Cella D, Lai JS, Chang CH, Peterman A, Slavin M. Fatigue in cancer patients compared with fatigue in the general United States population. Cancer. 2002 Jan 15;94(2):528–38. doi: 10.1002/cncr.10245.
  • 16. Hays RD, Martin SA, Sesti AM, Spritzer KL. Psychometric properties of the Medical Outcomes Study Sleep measure. Sleep Med. 2005 Jan;6(1):41–4. doi: 10.1016/j.sleep.2004.07.006.
  • 17. Furst DE, Khanna D, Matucci-Cerinic M, Clements P, Steen V, Pope J, et al. Systemic Sclerosis - Continuing Progress in Developing Clinical Measures of Response, OMERACT 2006. J Rheumatol. 2007 (in press).
  • 18. Khanna D, Distler O, Avouac J, Behrens F, Clements PJ, Denton C, et al. Measures of response in clinical trials of systemic sclerosis: the combined response index for systemic sclerosis (CRISS) and Outcome Measures in Pulmonary Arterial Hypertension related to Systemic Sclerosis (EPOSS). J Rheumatol. 2009 Oct;36(10):2356–61. doi: 10.3899/jrheum.090372.
  • 19. Thombs BD, Hudson M, Schieir O, Taillefer SS, Baron M. Reliability and validity of the center for epidemiologic studies depression scale in patients with systemic sclerosis. Arthritis Rheum. 2008 Mar 15;59(3):438–43. doi: 10.1002/art.23329.
  • 20. Thombs BD, van LW, Bassel M, Baron M, Buzza R, Haslam S, et al. Psychological health and well-being in systemic sclerosis: State of the science and consensus research agenda. Arthritis Care Res (Hoboken). 2010 Mar 16. doi: 10.1002/acr.20187.
  • 21. Ware JE, Jr, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care. 1992 Jun;30(6):473–83.
  • 22. Cronbach L. Coefficient alpha and the internal structure of tests. Psychometrika. 1951;16:297–334.
  • 23. Hays RD. Reliability and validity (including responsiveness). In: Fayers P, Hays RD, editors. Assessing quality of life in clinical trials. 2. New York: Oxford; 2005. pp. 25–39.
  • 24. Khanna D, Clements PJ, Postlethwaite AE, Furst DE. Does Incorporation of Aids and Devices Make a Difference in the Score of the Health Assessment Questionnaire-Disability Index? Analysis from a Scleroderma Clinical Trial. J Rheumatol. 2008 Mar;35(3):466–8.
  • 25. Liu H, Cella D, Gershon R, Shen J, Morales LS, Riley WT, et al. Representativeness of the PROMIS internet panel. Journal of Clinical Epidemiology. 2010. doi: 10.1016/j.jclinepi.2009.11.021. In press.
  • 26. Hays RD, Reeve B. Measurement and Modeling of Health-Related Quality of Life. In: Heggenhougen K, Quah S, editors. International Encyclopedia of Public Health. Academic Press; 2008. pp. 241–52.
  • 27. Rose M, Bjorner JB, Becker J, Fries JF, Ware JE. Evaluation of a preliminary physical function item bank supported the expected advantages of the Patient-Reported Outcomes Measurement Information System (PROMIS). J Clin Epidemiol. 2008 Jan;61(1):17–33. doi: 10.1016/j.jclinepi.2006.06.025.
  • 28. Keller SD, Bayliss MS, Ware JE, Jr, Hsu MA, Damiano AM, Goss TF. Comparison of responses to SF-36 Health Survey questions with one-week and four-week recall periods. Health Serv Res. 1997 Aug;32(3):367–84.
