J Pain. Author manuscript; available in PMC: 2011 Nov 1.
Published in final edited form as: J Pain. 2010 Jun 2;11(11):1109–1119. doi: 10.1016/j.jpain.2010.02.005

PROMIS Pediatric Pain Interference Scale: An Item Response Theory Analysis of the Pediatric Pain Item Bank

James W Varni 1, Brian D Stucky 2, David Thissen 2, Esi Morgan DeWitt 3, Debra E Irwin 4, Jin-Shei Lai 5, Karin Yeatts 4, Darren A DeWalt 6
PMCID: PMC3129595  NIHMSID: NIHMS304877  PMID: 20627819

Abstract

An aim of the National Institutes of Health (NIH) Patient Reported Outcomes Measurement Information System (PROMIS) initiative is to develop item banks and computerized adaptive tests (CAT) that are applicable across a wide variety of chronic disorders. The PROMIS Pediatric Cooperative Group has concentrated on the development of pediatric self-report item banks for ages 8-17 years. The objective of the present study is to describe the Item Response Theory (IRT) analysis of the NIH PROMIS pediatric pain item bank and the measurement properties of the new unidimensional PROMIS Pediatric Pain Interference Scale. Test forms containing pediatric pain items were completed by a total of 3,048 respondents. IRT analyses regarding scale dimensionality, item local dependence, and differential item functioning were conducted. A pain item pool was developed to yield scores on a T-score scale with a mean of 50 and standard deviation of 10. The recommended 8-item unidimensional short form for the PROMIS Pediatric Pain Interference Scale contains the item set which provides the maximum test information at the mean (50) on the T-score metric. A simulated CAT was computed that provides the most information at five possible score locations (30, 40, 50, 60, and 70 on the T-score metric).

Keywords: Pain, pediatrics, PROMIS, pain interference, Item Response Theory

Introduction

The Patient Reported Outcomes Measurement Information System (PROMIS) is a National Institutes of Health (NIH) Roadmap Initiative, created to advance the assessment of patient-reported outcomes (PRO) in chronic diseases. To achieve this goal, self-report items are evaluated using modern measurement theory [“Item Response Theory” (IRT)] in order to derive assessments that are maximally reliable, valid, and generalizable for individuals falling along the full spectrum of the trait being measured [1]. A primary objective is to develop a group of item banks and computerized adaptive tests across a wide variety of chronic disorders [29]. During the past 5 years, the PROMIS Pediatric Cooperative Group has concentrated on the development of pediatric self-report PRO item banks for ages 8-17 years across five generic health domains (physical function, pain, fatigue, emotional health, social health) from the patient perspective, consistent with the larger PROMIS network [4]. It was anticipated that measures of these five health domains would be applicable across numerous pediatric chronic health conditions, and hence were developed as generic or nondisease-specific scales.

Given the widespread occurrence of chronic and recurrent pain in pediatric populations [12], particularly in pediatric chronic diseases [41], an item bank focused on pediatric pain items was an essential component of the PROMIS Pediatric Cooperative Group’s efforts. The measurement of pain intensity using visual analogue scales [20; 40], rating scales [39; 36; 11], and pictorial scales [22; 21] has received empirical attention in pediatric populations over the past two decades, as evidenced by recent comprehensive reviews [8; 32; 44; 5; 23; 26]. The pain interference construct has received less empirical attention, and it was therefore an important focus in the development of the PROMIS pediatric pain item bank [46; 17]. For the purposes of this study, the a priori operational definition of “pain interference” was the interference by pain with daily activities during the past 7 days (interference with physical, psychological, and social functioning). Each item stem ended with the phrase “…when I had pain” to explicitly identify the items as measuring pain-specific interference rather than generic functioning.

While other scales have been developed that measure physical activities in pediatric patients, including some that used Rasch or IRT analyses [48; 14], these scales typically contain generic items (i.e., not pain-specific content) or have been used predominantly in specific populations [27]. In contrast, the Child Activity Limitations Interview (CALI) was designed to assess functional impairment in activities of daily living secondary to pediatric chronic and recurrent pain [28; 27]. However, the CALI and CALI-21 were developed using Classical Test Theory rather than IRT. Early research with the CALI-21 demonstrated two factors, described as representing “Active and Routine activities”; such detailed factor analysis was an advance over earlier pain measures [27; 26]. Additional analyses of CALI-21 data would be helpful to investigate possible local dependence and gender differential item functioning (DIF). The larger sample sizes and IRT analytic techniques used in PROMIS item development permit these more detailed levels of psychometric scrutiny.

Thus, the majority of pediatric pain functional impairment scales, consistent with other pediatric assessment instruments, have utilized Classical Test Theory and have rarely taken advantage of IRT analysis in the scale development process [15; 19]. By utilizing IRT analysis, the resulting item bank can be the basis of a more customizable measure for meeting a researcher’s or clinician’s needs. Depending on the desired level of precision, the evaluator can then select the number of items to administer and obtain scores on the same metric as all other users of this item bank [10].

Consequently, the objective of the present study is to address this measurement gap in the pediatric pain literature by describing the IRT analysis of the PROMIS pediatric pain item bank and the measurement properties of the new PROMIS Pediatric Pain Interference Scale, including investigations of scale dimensionality and sources of local dependence and differential item functioning.

Methods

Sampling Plan

Participants were recruited in hospital-based outpatient general pediatrics and subspecialty clinics and in public school settings in North Carolina and Texas between January 2007 and May 2008. The sample was designed to include a broad range of experiences from both healthy children and children with chronic illnesses. Children completed questionnaires that included items across several domains of health, including physical function, pain, fatigue, emotional distress, and social health. North Carolina and Texas were chosen as recruitment sites because of the diversity of cultural experience and population characteristics in those areas.

To be eligible to participate in the large-scale testing survey, subjects were required to meet the following inclusion criteria: age 8 to 17 years; able to speak and read English; and able to see and interact with a computer screen, keyboard, and mouse. Children provided informed assent prior to study entry, and a parent or guardian provided informed consent. Both the informed assent and the informed consent were administered in English, so parents were also required to read and speak English. Parent reports were used to determine whether the child had any limitations (e.g., physical or cognitive) that would make it too difficult to complete a computer-administered survey.

Potential clinic participants were identified through methods such as review of pediatric clinic appointment rosters or identification in clinic waiting rooms, according to protocols approved by the institutional review boards (IRBs) of The Children’s Hospital at Scott and White (S&W) in Texas, the University of North Carolina (UNC), and Duke University pediatrics clinics. The UNC, Duke, and S&W general pediatric clinics were representative of the health issues for which children have physician office visits (e.g., well-child visits, acute illnesses, and some chronic illnesses). The specialty clinics included Pulmonology, Allergy, Gastroenterology, Rheumatology, Nephrology, Obesity, and Endocrinology, and primarily saw children with more serious chronic illnesses. Children with asthma were oversampled during recruitment because asthma-specific items were tested. Based on previous literature [39; 16; 45], pediatric patients in Rheumatology, Gastroenterology, and General Pediatrics were expected to manifest recurrent or chronic pain.

School-based participants were recruited through the Chapel Hill-Carrboro (NC) Public School System, including elementary after-school programs and required middle and high school health classes. An informational packet about the study, including informed consent documents and a sociodemographic form, was mailed to the parents of all children enrolled in the health classes, to be completed and returned to the school.

Parents signed an informed consent document and children signed an informed assent document that outlined the purpose of the study, participation requirements, potential benefits and risks of participation, and measures implemented to protect participant privacy. Child participants received a $10 gift card in return for their time and effort. The study protocols were approved by the institutional review boards at each institution.

To limit respondent burden, no respondent was administered more than 76 items from the entire pool of 293 PROMIS items and the legacy questionnaires. The items were written to accommodate low literacy levels [8]. Based on the experience of the research team, it was estimated that younger children would be able to complete the survey in about 25 minutes and adolescents in about 15 minutes. The 293 PROMIS items were divided among 4 testing forms, and one additional form contained only general ‘legacy’ scales (see Table 1). The legacy scales were administered on a separate test form to characterize the population, but were not administered together with the PROMIS items. As such, this data collection does not allow us to compare individual responses on the legacy instruments with responses to the PROMIS items. Some items were administered on more than one form; these overlapping items permit an evaluation of the associations between domains. Each PROMIS item from the non-disease-specific banks was administered to at least 754 respondents across four forms.

Table 1.

Survey participants’ demographic and background information

Characteristic Form 1 n=759 (%) Form 2 n=770 (%) Form 3 n=754 (%) Form 4 n=765 (%)
Child’s Gender
 Male 382 (50.3) 351 (45.6) 355 (47.1) 382 (49.9)
 Female 377 (49.7) 419 (54.4) 399 (52.9) 383 (50.1)
 Missing 0 0 0 0
Child’s Age (yrs)
 8-12 446 (58.8) 441 (56.4) 303 (40.2) 426 (55.7)
 13-17 312 (41.1) 326 (42.3) 451 (59.8) 337 (44.0)
 Missing 1 (0.1) 3 (0.3) 0 2 (0.3)
Child’s Race
 White 457 (60.2) 452 (58.7) 457 (60.6) 462 (60.4)
 Black or African-American 154 (20.2) 168 (21.8) 172 (22.8) 150 (19.6)
 American Indian/Alaska Native 5 (0.6) 10 (1.3) 7 (0.9) 10 (1.3)
 Asian 12 (1.6) 13 (1.7) 6 (0.8) 10 (1.3)
 Native Hawaiian/Pacific Is. 0 1 (0.1) 2 (0.3) 2 (0.3)
 Other 58 (7.6) 50 (6.5) 58 (7.7) 64 (8.4)
 Multiple Races 47 (6.2) 54 (7.0) 27 (3.6) 43 (5.6)
 Missing 26 (3.4) 22 (2.9) 25 (3.3) 24 (3.1)
Child’s Ethnicity
 Non Hispanic 614 (80.9) 641 (83.2) 617 (81.8) 619 (80.9)
 Hispanic 141 (18.6) 121 (15.7) 131 (17.4) 141 (18.4)
 Missing 4 (0.5) 8 (1.1) 6 (0.8) 5 (0.7)
Child’s Chronic Conditions - 6 mo
 No 600 (79.0) 580 (75.3) 569 (75.5) 592 (77.4)
 Yes = 1 Chronic Condition 113 (14.9) 145 (18.8) 134 (17.8) 120 (15.7)
 Yes >= 2 Chronic Conditions 44 (5.8) 42 (5.5) 46 (6.1) 49 (6.4)
 Missing 2 (0.3) 3 (0.4) 5 (0.6) 4 (0.5)
Most Common Conditions Diagnosed or Treated within 6 Months prior to Enrollment*
 Asthma 19 (2.5) 18 (2.3) 22 (2.9) 23 (3.0)
 ADD/ADHD 27 (3.6) 39 (5.1) 32 (4.2) 36 (4.7)
 Arthritis 23 (3.0) 28 (3.6) 25 (3.3) 24 (3.1)
 Gastrointestinal Disorders 24 (3.2) 21 (2.7) 15 (2.0) 15 (2.0)
 Mental Disorders 12 (1.6) 18 (2.3) 13 (1.7) 12 (1.6)
 Immune Disorders 11 (1.5) 18 (2.3) 18 (2.4) 16 (2.1)
Guardian’s Relationship to Child
 Parent 696 (91.7) 717 (93.1) 695 (92.2) 708 (92.6)
 Grandparent 32 (4.2) 30 (3.9) 32 (4.2) 43 (5.6)
 Guardian or Other 31 (4.1) 21 (2.7) 26 (3.5) 13 (1.7)
 Missing 0 2 (0.3) 1 (0.1) 1 (0.1)
Guardian’s Education Level
 <= 8th grade 12 (1.6) 16 (2.3) 13 (1.8) 16 (2.1)
 Some high school 39 (5.1) 34 (4.4) 54 (7.2) 55 (7.2)
 High school degree/GED 151 (19.9) 153 (19.7) 163 (21.6) 159 (20.8)
 Some college/technical degree 255 (33.6) 245 (31.8) 251 (33.2) 260 (34.0)
 College degree 179 (23.6) 214 (27.8) 183 (24.3) 180 (23.5)
 Advanced degree 121 (15.9) 105 (13.6) 86 (11.4) 95 (12.4)
 Missing 2 (0.3) 3 (0.4) 4 (0.5) 0
Data Collection Site
 Schools – NC 57 (7.5) 57 (7.4) 49 (6.5) 51 (6.7)
 Clinics – NC 349 (46.0) 350 (45.5) 343 (45.5) 351 (45.9)
 Clinics – TX 353 (46.5) 363 (47.1) 362 (48.0) 363 (47.4)

* Parents reported more than 1 condition for some children; many other conditions were reported at lower frequency (<1.5%) than those listed.

Children without asthma were assigned sequentially to 1 of 5 forms (4 forms with PROMIS items and a few legacy general items and 1 form containing only legacy scales). This sampling plan was developed for collecting responses to the candidate items from the targeted PROMIS domains and was designed to accommodate multiple objectives: (1) confirm the factor structure of the domains; (2) evaluate items for local dependence (LD) and differential item functioning (DIF); and (3) calibrate the items for each domain using Item Response Theory.

We developed the PROMIS Pediatric item banks using a strategic item generation methodology adopted by the PROMIS Network [6]. Six phases of item development were implemented: identification of existing items, item classification and selection, item review and revision, focus group input on domain coverage, cognitive interviews with individual items, and final revision before field testing. Identification of items refers to the systematic search for existing items in currently available pediatric scales; this search yielded an initial item pool of over 3345 items. Expert item review and revision was conducted by trained professionals who reviewed the wording of each item and revised it as appropriate to conform to conventions adopted by the PROMIS network [4; 6]. Focus groups were used to confirm domain definitions and to identify new areas of item development for future PROMIS item banks [46]. Cognitive interviews were used to examine and refine the wording of individual items [17]. The pediatric items were written in the past tense with a seven-day recall period, and most used a standard set of response options [17]. Items that passed the cognitive interview screening were sent to field testing. The final item set contained 293 items across 6 domains (Physical Function, Emotional Distress, Social Role Relationships, Fatigue, Pain, Asthma) [17].

Most pain items had a 7-day recall period and used standardized 5-point response options (never, almost never, sometimes, often, almost always). A few items instead used an 11-point pain intensity scale (0 through 10) or a response scale counting the number of days (0 through 7). A complete list of items may be found in the Tables and Appendix.

Statistical and Psychometric Methods

The PROMIS methods used for the psychometric evaluation and calibration of the pain items have been previously described [29]. First, traditional descriptive statistics were computed to verify that there were no empty (zero frequency) response categories for any item, and as preliminary checks on the validity of the data. Included in these checks were tables of marginal frequencies of item responses and the correlations of item scores with the total summed score.
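
The two preliminary checks described above (no empty response categories, and item scores correlated with the total summed score) are simple to compute. The sketch below is a minimal illustration, assuming responses are stored as one list per respondent with each item coded 0 to 4 (never to almost always). The function names are hypothetical, and the toy data are invented for illustration, not study data.

```python
import math
from collections import Counter

def empty_categories(responses, n_categories=5):
    """For each item (column), list the categories 0..n_categories-1 that no respondent chose."""
    n_items = len(responses[0])
    empty = {}
    for j in range(n_items):
        counts = Counter(row[j] for row in responses)  # Counter returns 0 for unseen keys
        unused = [k for k in range(n_categories) if counts[k] == 0]
        if unused:
            empty[j] = unused
    return empty

def pearson(x, y):
    """Pearson correlation; assumes neither sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def item_total_correlations(responses):
    """Correlation of each item score with the total summed score (item included in the total)."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [pearson([row[j] for row in responses], totals) for j in range(n_items)]

# Hypothetical toy data: 5 respondents x 3 items, categories 0-4
toy = [
    [0, 1, 2],
    [1, 2, 3],
    [2, 3, 4],
    [3, 3, 4],
    [4, 0, 1],
]
print(empty_categories(toy))  # {1: [4], 2: [0]}
print(item_total_correlations(toy))
```

An empty category would leave a GRM threshold unestimable, which is why this check precedes calibration.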

The IRT model used here for item analysis and scoring assumes that responses to the items reflect individual differences on a single underlying, or latent, variable (here, pain interference). To check that assumption, the second phase of the data analysis used confirmatory factor analysis (CFA) of the interitem polychoric correlation matrix to verify that the latent variable underlying the item responses was unidimensional. These analyses were performed using the diagonally weighted least squares (DWLS) algorithm as implemented in the software LISREL [18]; this approach takes into account the categorical nature of the item responses, in a way that corresponds with the IRT model that is subsequently used.

Beyond the single-factor model, the need for additional factors and/or error covariances served as an indication of local dependence (LD) among pairs or small numbers of items. LD describes any violation of the local independence assumption of unidimensional IRT [15]; that assumption is that all of the observed covariation among the item responses is accounted for by the single latent variable being measured by the scale. If a pair of items is more highly correlated than the latent variable underlying the responses to all of the (other) items can account for, responses to those items behave to some extent as though the same question had been asked twice (which would produce perfect LD). If an additional factor appears for a small subset of items, those items as a cluster measure some other aspect of individual difference variation, and the data analyst must decide whether to measure that additional aspect separately or set it aside. In constructing the pain interference scale, items were set aside from subsets that exhibited LD, so that only one item from each such subset was retained.

Third, item sets that the CFA indicated were unidimensional were calibrated by fitting Samejima’s Graded Response Model (GRM; [30]) using the software Multilog [7] (the GRM has also been selected for other PROMIS scales [29]). Calibration, as that term is used in IRT analysis, refers to the estimation of a set of parameters for each item that characterize the relation of the item responses with the latent variable (here, pain interference) being measured. For each item, the GRM estimates a slope or discrimination parameter (a), reflecting the degree of association of the item responses with the latent construct, and threshold parameters (bk) (four for items with five response options, seven for items with eight response options, and ten for the 11-point intensity items). Each threshold is the level of pain interference at which a response in a given category or higher has a probability of 0.5. In item analysis, the item parameters are used to compute an information function for each item. Item information adds across items to give the precision of measurement of the scale: if one has five items that each have information equal to 2.0 at some value of the latent variable, then the information for the five-item scale is 10. The error variance of the scale score at that value of the latent variable is the inverse of the information, so it would be 0.1 in standard-score units. Classical Test Theory is based on algebra that assumes that the error variance has the same value for all scores; in the classical theory, for scores in standard units, reliability is one minus the error variance, so for an error variance of 0.1, reliability is 0.9. IRT more realistically represents error variance as a quantity that varies as a function of the latent variable; error variance is small for levels of the latent variable where the items provide information, and larger elsewhere.
Nevertheless, because 0.9 has often been considered a useful value of reliability, an information value of 10 is a useful benchmark in IRT analysis. The item parameters computed during calibration can be used to identify the levels of the latent variable for which the items provide information, and items can be selected until aggregate information exceeds some desired value, such as 10.
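
The relation between item parameters, information, error variance, and the reliability analogue described above can be made concrete. The sketch below assumes the logistic form of the GRM without a 1.7 scaling constant, P*(X ≥ k | θ) = 1 / (1 + exp(−a(θ − bk))), which is our assumption about the parameterization. It computes item information as the sum over categories of (dPk/dθ)² / Pk. Function names are illustrative; the example parameters are the Table 4 estimates for “I had trouble sleeping when I had pain.”

```python
import math

def boundary_probs(theta, a, b):
    """P*(X >= k | theta) for k = 0..m+1, with P*(X >= 0) = 1 and P*(X >= m+1) = 0."""
    inner = [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b]
    return [1.0] + inner + [0.0]

def category_probs(theta, a, b):
    """P(X = k | theta) as differences of adjacent boundary curves."""
    s = boundary_probs(theta, a, b)
    return [s[k] - s[k + 1] for k in range(len(b) + 1)]

def item_information(theta, a, b):
    """Fisher information of one GRM item: sum over k of (dP_k/dtheta)^2 / P_k."""
    s = boundary_probs(theta, a, b)
    ds = [a * p * (1.0 - p) for p in s]  # slope of each boundary curve (0 at the fixed ends)
    info = 0.0
    for k in range(len(b) + 1):
        p_k = s[k] - s[k + 1]
        if p_k > 0.0:
            info += (ds[k] - ds[k + 1]) ** 2 / p_k
    return info

def scale_information(theta, items):
    """Information is additive over locally independent items."""
    return sum(item_information(theta, a, b) for a, b in items)

def standard_error(info):
    """Standard error in standard-score units: 1 / sqrt(information)."""
    return 1.0 / math.sqrt(info)

def reliability_analog(info):
    """One minus the error variance, for scores in standard units (meaningful when info > 1)."""
    return 1.0 - 1.0 / info

sleeping = (2.35, [-0.23, 0.31, 1.17, 1.69])  # a, b1..b4 from Table 4
print(round(item_information(0.0, *sleeping), 2))
# Five items with information 2.0 each: scale information 10, SE about 0.316, reliability 0.9
print(standard_error(10.0), reliability_analog(10.0))
```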

The item parameters obtained in the calibration phase are also used to compute IRT scale scores, either for a summed score for a fixed set of items, or for response patterns for any arbitrary subset of items in a pool. IRT scale scores are estimates of the value of the latent variable (pain interference, here) for which the observed item responses are likely. As a consequence of the assumption of unidimensionality, the IRT scale scores are on a single continuum, and comparable, even if respondents are measured using different subsets of items. This aspect of IRT represents one of its most important advantages over the classical theory, which can provide comparable scores only for a fixed set of items. One use of this feature of IRT is to assemble alternate short forms that yield comparable scores. Another more extreme use is to administer CATs, which adaptively select a customized set of items for each respondent, to provide maximum information at the level of that person. When using a CAT, each person may respond to a different set of questions; nevertheless, their IRT scale scores are quantitatively comparable.
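
To illustrate how respondents who answer different subsets of items still receive comparable scores, the sketch below computes response-pattern scores by expected a posteriori (EAP) estimation with a standard normal prior on a quadrature grid. EAP is a common choice for IRT scale scores, but the text above does not name the estimator, so treat it as an assumption. We also assume responses are coded 0 to 4 and the logistic GRM without a 1.7 constant. The four items use Table 4 parameters; the short item keys and the function names are hypothetical.

```python
import math

def category_probs(theta, a, b):
    """GRM category probabilities (logistic form without a 1.7 constant, assumed)."""
    s = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    return [s[k] - s[k + 1] for k in range(len(b) + 1)]

def eap_score(responses, items, n_points=81, lo=-4.0, hi=4.0):
    """EAP estimate and posterior SD of theta for one respondent.

    responses: {item_key: category 0..m}; items not administered are simply absent.
    items: {item_key: (a, [b1, ..., bm])}
    """
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    post = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior
        for key, x in responses.items():
            a, b = items[key]
            w *= category_probs(t, a, b)[x]  # likelihood of the observed category
        post.append(w)
    total = sum(post)
    mean = sum(t * w for t, w in zip(grid, post)) / total
    sd = math.sqrt(sum((t - mean) ** 2 * w for t, w in zip(grid, post)) / total)
    return mean, sd

def to_t(theta):
    """Convert a standard-score estimate to the T-score metric (mean 50, SD 10)."""
    return 50.0 + 10.0 * theta

# Table 4 parameters for four final-pool items (short keys are hypothetical labels)
ITEMS = {
    "sleeping": (2.35, [-0.23, 0.31, 1.17, 1.69]),
    "attention": (2.35, [-0.25, 0.32, 1.33, 2.03]),
    "standing": (2.35, [-0.18, 0.44, 1.40, 1.97]),
    "fun": (2.31, [-0.49, 0.00, 1.02, 1.71]),
}
# Two respondents answer different item subsets ("sometimes" coded as 2),
# yet both estimates lie on the same theta (and T-score) metric.
for pattern in ({"sleeping": 2, "fun": 2}, {"attention": 2, "standing": 2}):
    m, s = eap_score(pattern, ITEMS)
    print(pattern, round(to_t(m), 1), round(10 * s, 1))
```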

The goodness of fit of the IRT model to each item was examined using the S-X2 statistic [24-25], as generalized by Bjorner et al. [3]. A nonsignificant S-X2 value suggests adequate fit of the model to the item’s responses.

Fourth, as part of item selection for the final pool, differential item functioning (DIF) between males and females was investigated using the IRT-LR DIF detection procedure [35] as implemented in the software IRTLRDIF [33]. In this case, DIF indicates that the relation of item responses with the latent variable differs between boys and girls. Such a difference suggests that some other factor, related to gender but different from the construct being measured, influences item responses, which is a violation of the assumption of unidimensionality. Here again, a nonsignificant χ2 indicates no evidence of DIF. Because DIF detection involves a large number of significance tests, the Benjamini-Hochberg procedure [2; 47] was used to control the false discovery rate across these multiple comparisons. In addition to χ2 statistics, graphical methods, as suggested by Steinberg & Thissen [31], were used to evaluate effect sizes when significant DIF was detected. After the item pool was selected, we also evaluated DIF between younger (ages 8-11) and older (ages 12-17) respondents. Because we do not expect the scale to be used to compare pain interference across age groups, we did not include these results among the item selection criteria, but we report them here.
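
The Benjamini-Hochberg step-up rule can be stated compactly. With m p-values sorted in ascending order, find the largest rank k such that p(k) ≤ (k/m)q, and reject the hypotheses with the k smallest p-values. The sketch below is a generic illustration. The p-values are made up, and q = 0.05 is an assumed false discovery rate, not a value taken from the study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Step-up BH procedure: True where the null hypothesis is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest to largest p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank  # keep the largest qualifying rank (step-up)
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= k_max
    return rejected

# Hypothetical p-values for four DIF tests
print(benjamini_hochberg([0.013, 0.02, 0.5, 0.6]))  # [True, True, False, False]
```

The example shows the step-up behavior. On its own, 0.013 exceeds its cutoff of 0.0125, but it is still rejected because the larger p-value 0.02 meets its cutoff of 0.025.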

Finally, although IRT scale scores may be computed from either item response patterns or summed scores, we expect scale scores based on summed scores to be used more often. Thus, Appendix Table A1 provides a table for translating summed scores into IRT scale scores [34]. The IRT scale scores reported here use the North Carolina sample as the reference group.
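
A summed-score translation table of this kind is commonly built with the Lord-Wingersky recursion. The recursion gives the probability of each summed score at each value of θ, and the EAP estimate for each summed score then follows by quadrature. The sketch below assumes that approach, a standard normal prior (matching a reference group with mean 0 and variance 1), items coded 0 to 4, and the logistic GRM without a 1.7 constant. It is not presented as the procedure used to produce Appendix Table A1, and the function names are hypothetical. The four example items use Table 4 parameters.

```python
import math

def category_probs(theta, a, b):
    """GRM category probabilities (logistic form without a 1.7 constant, assumed)."""
    s = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    return [s[k] - s[k + 1] for k in range(len(b) + 1)]

def summed_score_probs(theta, items):
    """Lord-Wingersky recursion: P(summed score = s | theta) for s = 0..max score."""
    dist = [1.0]  # with zero items, the summed score is 0 with probability 1
    for a, b in items:
        probs = category_probs(theta, a, b)
        new = [0.0] * (len(dist) + len(probs) - 1)
        for s, p_s in enumerate(dist):
            for k, p_k in enumerate(probs):
                new[s + k] += p_s * p_k  # adding category k moves score s to s + k
        dist = new
    return dist

def summed_score_table(items, n_points=81, lo=-4.0, hi=4.0):
    """Rows of (summed score, EAP T-score, posterior SD in T units) under a N(0, 1) prior."""
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    prior = [math.exp(-0.5 * t * t) for t in grid]
    like = [summed_score_probs(t, items) for t in grid]
    rows = []
    for s in range(len(like[0])):
        w = [prior[i] * like[i][s] for i in range(n_points)]
        total = sum(w)
        mean = sum(t * wi for t, wi in zip(grid, w)) / total
        sd = math.sqrt(sum((t - mean) ** 2 * wi for t, wi in zip(grid, w)) / total)
        rows.append((s, 50.0 + 10.0 * mean, 10.0 * sd))
    return rows

# Four final-pool items from Table 4, giving summed scores 0-16
FOUR_ITEMS = [
    (2.35, [-0.23, 0.31, 1.17, 1.69]),
    (2.35, [-0.25, 0.32, 1.33, 2.03]),
    (2.35, [-0.18, 0.44, 1.40, 1.97]),
    (2.31, [-0.49, 0.00, 1.02, 1.71]),
]
for s, t, se in summed_score_table(FOUR_ITEMS):
    print(f"{s:2d}  T = {t:5.1f}  SE = {se:4.1f}")
```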

Results

Test forms containing PROMIS pediatric pain items were completed by a total of 3,048 respondents. The sample was about 52% female, and about 53% of the children were aged 8 to 12 years. Sixty percent were Caucasian, 21% Black, 6% multi-racial, and 13% other races (Asian/Pacific Islanders, Native Americans and Other Races). Eighteen percent of the sample was of Hispanic ethnicity. The vast majority of the adults providing informed consent for the children were parents of the child (92%) or grandparents (4%). The educational attainment of these parents or guardians ranged from less than high school (8%) to advanced degree (13%), with 25% reporting a college degree, 33% some college, and 21% a high school diploma. Approximately 23% of the children participating in the survey had a chronic illness diagnosis during the past 6 months. Participant characteristics are summarized in Table 1.

There were adequate numbers of pain items on each of the four forms to permit a separate factor analysis of each. Tables 2 and 3 provide the factor loadings from models that fit well. The models indicate that the items on separate forms are generally unidimensional, though with some evidence of local dependence. Local dependence, or nuisance multidimensionality, is modeled in Forms 1, 2, and 4 (Table 2) by error covariances (in this case between two items, or “doublets”). Form 3 (Table 3) contains three items (a “triplet”) pertaining to the physical limitations caused by pain, which were therefore modeled as a second factor (with a correlation between the general pain interference factor and the “difficulty moving” subfactor). Indicators of goodness of fit suggest that all four models fit the data well, using indices suggested by Reeve et al. [29]: for Form 1 (Table 2), χ2(7) = 9, CFI = 1.00, TLI = 1.00, RMSEA = 0.02; Form 2 (Table 2), χ2(12) = 10, CFI = 1.00, TLI = 1.00, RMSEA = 0.00; Form 3 (Table 3), χ2(10) = 8, CFI = 1.00, TLI = 1.00, RMSEA = 0.00; and Form 4 (Table 2), χ2(13) = 21, CFI = 1.00, TLI = 1.00, RMSEA = 0.03.
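
As a rough consistency check on the reported indices, the RMSEA point estimate can be recomputed from χ2, its degrees of freedom, and the sample size with the common formula RMSEA = sqrt(max(χ2 − df, 0) / (df(N − 1))). This sketch assumes that formula and takes N as each form’s total n from Table 1, although the effective N for the pain items may differ. It also uses the rounded χ2 values reported above, and LISREL’s DWLS-based computation may differ in detail.

```python
import math

def rmsea(chi2, df, n):
    """Point estimate of RMSEA; 0 when chi2 does not exceed its degrees of freedom."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# form: (chi2, df, N from Table 1, RMSEA as reported in the text)
REPORTED = {
    "Form 1": (9, 7, 759, 0.02),
    "Form 2": (10, 12, 770, 0.00),
    "Form 3": (8, 10, 754, 0.00),
    "Form 4": (21, 13, 765, 0.03),
}
recomputed = {form: round(rmsea(c, d, n), 2) for form, (c, d, n, _) in REPORTED.items()}
print(recomputed)  # {'Form 1': 0.02, 'Form 2': 0.0, 'Form 3': 0.0, 'Form 4': 0.03}
```

With these inputs, the recomputed values agree with the reported RMSEAs to two decimals. For Forms 2 and 3, χ2 falls below its degrees of freedom, so the estimate is truncated at zero.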

Table 2.

Factor Loadings and Error Covariances for Pain Interference Items on Forms 1, 2, and 4

Form 1 Items Pain Doublet Error Covariances
How many days were you free of pain (no pain)? 0.40 0.14
How bad is your pain right now? 0.52
It was hard for me to think when I had pain. 0.85
It was hard for me to ride in a car when I had pain. 0.72 0.13
It was hard for me to walk one block when I had pain. 0.71
I felt grumpy when I had pain. 0.61
Form 2 Items
How bad was your worst pain? 0.55 0.33
How bad was your pain on average? 0.51
I felt sad when I had pain. 0.72
I had trouble doing schoolwork when I had pain. 0.72 0.15
I had trouble watching TV when I had pain. 0.62
It was hard for me to run when I had pain. 0.76
I had trouble sleeping when I had pain. 0.80
Form 4 Items
It was hard to get along with other people when I had pain. 0.62 0.21
I wanted to be alone when I had pain. 0.54
I hurt a lot. 0.60
It was hard for me to remember things when I had pain. 0.63
It was hard to do sports or exercise when I had pain. 0.67
I missed school when I had pain. 0.56
It was hard to stay standing when I had pain. 0.80

All factor loadings and error covariances are significantly different from zero at p < .05.

Table 3.

Factor Loadings and Error Covariances for Pain Interference Items on Form 3.

Form 3 Items Pain Difficulty Moving
I had trouble moving around when I had pain. 0.30 0.61
It was hard to have fun when I had pain. 0.40 0.49
It was hard for me to walk up a flight of stairs when I had pain. 0.50 0.34
How many days did you have pain? 0.67
It was hard for me to pay attention when I had pain. 0.81
I hurt all over my body. 0.73
I felt angry when I had pain. 0.67

All factor loadings are significantly different from zero at p < .05.

The correlation between the general pain interference and difficulty moving factors is r = 0.79.

The local dependence in Forms 1 through 4 occurs primarily because items share similar wording, or share content that differs from the content of the scale’s other items. As an example of shared item content, Form 3 contains a “triplet,” or three items whose responses are more strongly related than expected given the items’ relationship with the pain interference dimension; in this case the triplet measures physical limitations caused by pain. In other instances, local dependence may result from similar wording or a shared response scale. Form 2 contains two items measuring pain intensity on a 0 to 10 scale. In addition to being similarly worded and assessed on a distinct response scale, these items measure pain intensity, whereas the scale’s other items assess interference with daily activities caused by pain. To ensure unidimensionality of the final scales, only one item from each doublet or triplet was included in the final item pool.

Following the factor analyses, locally independent sets of items from Forms 1 through 4 were calibrated using the GRM. To control for the local dependence identified in the item factor analyses, a separate calibration was completed for each unidimensional set of items, which resulted in two calibrations for each form (three for Form 3). Where an item had estimates from more than one calibration, we conservatively selected the estimates with the lower slope, to avoid capitalizing on chance. Table 4 shows the item parameter estimates, item fit statistics (S-X2), and DIF statistics (LR χ2) for the items comprising the final pool (sorted in order of magnitude of the slope parameters), and for the items set aside.

Table 4.

Item Parameters, Fit Indices, and DIF Statistics for the Pain Interference Items

Columns: item parameters (a, b1 to b4); S-X2 fit index (X2, d.f., p); LR DIF (X2, d.f., p)
Item Stem a b1 b2 b3 b4 X2 d.f. p X2 d.f. p
Final item pool:
I had trouble sleeping when I had pain. 2.35 −0.23 0.31 1.17 1.69 57 45 0.108 13.1 5 0.022
It was hard for me to pay attention when I had pain. 2.35 −0.25 0.32 1.33 2.03 37 34 0.332 8.9 5 0.113
It was hard to stay standing when I had pain. 2.35 −0.18 0.44 1.40 1.97 47 33 0.054 1.0 5 0.963
It was hard to have fun when I had pain. 2.31 −0.49 0.00 1.02 1.71 56 36 0.018 2.2 5 0.821
It was hard for me to walk one block when I had pain. 2.14 0.28 0.79 1.50 1.97 33 30 0.323 7.4 5 0.193
I had trouble doing schoolwork when I had pain. 1.94 −0.23 0.46 1.47 2.16 46 48 0.555 5.6 5 0.347
It was hard for me to run when I had pain. 1.89 −0.85 −0.25 0.85 1.63 55 47 0.198 6.3 5 0.278
I hurt all over my body. 1.82 0.49 1.19 2.05 2.72 44 32 0.077 4.8 5 0.441
I felt angry when I had pain. 1.62 −0.01 0.66 1.56 2.24 48 37 0.106 9.6 5 0.087
It was hard for me to remember things when I had pain. 1.50 0.29 1.08 2.12 3.55 38 33 0.252 5.3 5 0.380
I hurt a lot. 1.41 −0.48 0.76 2.17 3.04 47 35 0.085 4.2 5 0.521
It was hard to get along with other people when I had pain. 1.34 −0.24 0.60 1.77 2.74 46 36 0.123 7.1 5 0.213
I missed school when I had pain. 1.26 0.13 0.93 2.30 3.02 46 37 0.147 7.7 5 0.174
Items set aside due to LD:
It was hard for me to ride in a car when I had pain. 2.02 0.63 1.11 1.92 2.71 34 31 0.325 3.8 5 0.579
I had trouble watching TV when I had pain. 1.45 0.63 1.43 2.49 3.13 45 39 0.235 7.1 5 0.213
I had trouble moving around when I had pain. 2.16 −0.56 0.10 1.12 1.85 42 37 0.263 7.1 5 0.213
It was hard for me to walk up a flight of stairs when I had pain. 2.22 −0.04 0.54 1.33 1.82 46 37 0.147 8.9 5 0.113
I wanted to be alone when I had pain. 1.13 −0.44 0.38 1.65 2.69 61 40 0.018 3.7 5 0.593
Items set aside due to DIF:
It was hard for me to think when I had pain. 2.61 −0.19 0.46 1.35 1.84 55 33 0.009 20.9 5 0.001
I felt sad when I had pain. 1.85 −0.14 0.50 1.51 2.08 45 41 0.308 18.4 5 0.002
It was hard to do sports or exercise when I had pain. 1.65 −0.79 0.00 1.20 2.16 37 36 0.423 16.2 5 0.006
I felt grumpy when I had pain. 1.30 −0.19 0.62 1.92 2.90 54 33 0.012 20.1 5 0.001
Items set aside due to low discrimination:
How bad is your pain right now? 1.08 58 41 0.041
How many days were you free of pain? 0.79 122 45 0.000 7.4 5 0.193
How bad was your worst pain? 1.16 70 69 0.444
How bad was your pain on average? 1.31 66 63 0.374
How many days did you have pain? 1.50 41 42 0.515 16.1 5 0.007

Due to the high number of response categories for items measuring pain intensity, threshold parameters are not reported. In addition, items with 10 threshold parameters were not analyzed for DIF.

The North Carolina sample is set as the scale for the item calibrations (mean of 0 and variance of 1).

The Benjamini-Hochberg correction for multiplicity was applied to the fit and DIF statistics. Two items in the final pool had either significant DIF or significant lack of fit as indicated by the S-X2 statistic; these items were nonetheless retained, given the relatively good fit of the items comprising the final pool. As indicated in Table 4, 14 items were set aside. Five were set aside from locally dependent item sets. An additional five were set aside due to low discrimination parameters; notably, these items measured pain intensity, and as such discriminate poorly between levels of pain interference. Finally, four items were set aside for DIF (both threshold and slope DIF). As an interpretive example of threshold DIF, boys were less likely to endorse the item “It was hard to do sports or exercise when I had pain,” after controlling for mean and variance differences between boys and girls. Slope DIF occurred for the item “I felt grumpy when I had pain,” indicating that “feeling grumpy” is a poor indicator of pain interference for boys. The remaining 13 items comprise the final pain item pool.

In the analysis of DIF by age, five of the 13 items in the pool exhibited significant DIF. For three of those items, the aggregate effect size of the DIF is very small: for the items “It was hard for me to pay attention when I had pain,” “I had trouble doing schoolwork when I had pain,” and “I felt angry when I had pain,” the difference between older and younger children in the expected value of the item response on the 0-4 scale is much less than half a point across the entire range of the latent variable, pain interference. Largely, these three items tend to be slightly more discriminating for older than for younger children. For the item “It was hard to remember things when I had pain,” younger children tend to give slightly higher responses than older children; the difference, which varies as a function of the latent variable, is around half a point on the 0-4 scale. For “It was hard to get along with other people when I had pain,” older children tend to select slightly higher responses than younger children (again, the difference is a fraction of a point on the 0-4 scale and is observed only for respondents at high levels of pain interference).
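The effect size described above compares the expected 0-4 item response between age groups across the latent variable. A minimal Python sketch, assuming the items follow Samejima's graded response model [30] (consistent with the one slope and four thresholds reported per item): the expected score is the sum of the cumulative probabilities P(X ≥ k). The group-specific parameters and the helper names are hypothetical, not estimates from this study.

```python
import math
from typing import List, Sequence, Tuple


def expected_item_score(theta: float, a: float, thresholds: Sequence[float]) -> float:
    """Expected 0-4 response to one graded-response-model item at theta.

    For an item scored 0..K, E[X | theta] is the sum over k = 1..K of
    P(X >= k | theta) = 1 / (1 + exp(-a * (theta - b_k))).
    """
    return sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds)


def max_expected_score_gap(group1: Tuple[float, List[float]],
                           group2: Tuple[float, List[float]],
                           lo: float = -3.0, hi: float = 3.0, steps: int = 61) -> float:
    """Largest absolute difference in expected item score over a grid of theta values."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(abs(expected_item_score(t, *group1) - expected_item_score(t, *group2))
               for t in grid)


# Hypothetical age-group-specific parameters for a single item.
younger = (1.6, [-0.5, 0.3, 1.2, 2.0])
older = (1.9, [-0.4, 0.3, 1.1, 1.9])
print(f"largest gap: {max_expected_score_gap(younger, older):.2f} points on the 0-4 scale")
```

A gap well under 0.5 points everywhere on the grid would correspond to the "very small" aggregate DIF described for three of the items.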

Figure 1 shows test information functions for the pain item pool and four potential short forms on a T-score scale with a mean of 50 and standard deviation of 10 (the metric on which all PROMIS scales are reported). The standard error of measurement is approximately the inverse square root of test information, so test information indicates the precision of scores at each point on the scaled metric. A test information value of 10 corresponds to a standard error of measurement of approximately 0.32 on the standardized metric (3.2 on the T-score metric) and hence to a reliability coefficient of approximately 0.90. Three 8-item short forms provide test information greater than 10 for scores between approximately 45 and 70 on the T-score scale. The recommended 8-item short form in the Appendix contains the item set that provides the maximum test information at the mean (50) on the T-score metric. However, if greater (or broader) score precision is required, the complete item pool is contained in Table 2 and may be used to compute IRT response pattern scores or IRT-scaled scores from summed scores.
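The link between test information, standard error, and reliability can be made concrete. A minimal Python sketch, assuming graded response model items [30] and a latent variance fixed at 1; the function names and the 8-item form's parameters are hypothetical, and this is not the software used in the study.

```python
import math
from typing import Sequence, Tuple

# (slope a, ordered thresholds b_1..b_K) on the standardized metric
Item = Tuple[float, Sequence[float]]


def item_information(theta: float, a: float, thresholds: Sequence[float]) -> float:
    """Fisher information of one graded-response-model item at theta."""
    # Cumulative probabilities P(X >= k), padded with 1.0 for k = 0 and 0.0 for k = K + 1.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    info = 0.0
    for k in range(len(cum) - 1):
        p_k = cum[k] - cum[k + 1]  # probability of responding in category k
        # dP(X >= k)/dtheta = a * P * (1 - P), which is 0 at the padded ends.
        dp_k = a * (cum[k] * (1.0 - cum[k]) - cum[k + 1] * (1.0 - cum[k + 1]))
        if p_k > 0.0:
            info += dp_k ** 2 / p_k
    return info


def total_information(theta: float, items: Sequence[Item]) -> float:
    """Test information: the sum of the item information functions."""
    return sum(item_information(theta, a, b) for a, b in items)


def precision_summary(info: float) -> Tuple[float, float, float]:
    """(SE on the standardized metric, SE in T-score points, reliability)."""
    se = 1.0 / math.sqrt(info)
    # Reliability = 1 - error variance / latent variance, with latent variance fixed at 1.
    return se, 10.0 * se, 1.0 - se ** 2


se, se_t, rel = precision_summary(10.0)
print(f"I = 10 -> SE = {se:.2f}, SE_T = {se_t:.1f}, reliability = {rel:.2f}")
# prints: I = 10 -> SE = 0.32, SE_T = 3.2, reliability = 0.90

# Hypothetical 8-item form (parameters invented for illustration only).
form = [(2.0, [-0.5 + 0.1 * i, 0.3 + 0.1 * i, 1.1 + 0.1 * i, 1.9 + 0.1 * i]) for i in range(8)]
for t in (40, 50, 60, 70):
    theta = (t - 50) / 10
    print(t, round(total_information(theta, form), 1))
```

Plotting `total_information` against T = 50 + 10θ gives a curve of the kind shown in Figure 1; where it exceeds 10, reliability is roughly 0.90 or better.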

Figure 1. Test information functions for Pediatric Pain Interference Scale.

Figure 1 also serves as a simulated computerized adaptive test (CAT). A CAT selects items based on an individual’s responses to previous items. As such, a CAT can, in principle, choose the most informative items for an individual depending on their level of the trait being measured, in this case pain interference. For this simulation, separate test information functions are computed from the 8 items that provide the most information at each of five possible score locations (30, 40, 50, 60, and 70 on the T-score metric). In other words, the items used to generate the test information function at T = 50 are those that a perfect CAT would select for an individual at the mean of pain interference. To judge the usefulness of a CAT given these items, one may compare both the range and the magnitude of score precision across the separate potential short forms. In this case, because the items in the final pool generally discriminate in the same range, little score precision is gained by choosing among the four potential short forms. Nevertheless, the PROMIS Assessment Center contains the item pool and can administer these items as a CAT if a researcher wishes to do so.
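The selection rule used for the simulated CAT (for each target score, keep the 8 items with the most information there) can be sketched in Python as follows. The item bank, its parameters, and the function name `best_items_at` are hypothetical, graded response model items are assumed, and a real CAT would also re-estimate the score after each response, which this static sketch does not do.

```python
import math
from typing import Dict, List, Sequence, Tuple


def item_information(theta: float, a: float, thresholds: Sequence[float]) -> float:
    """Fisher information of a graded-response-model item at theta."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    info = 0.0
    for k in range(len(cum) - 1):
        p_k = cum[k] - cum[k + 1]  # category probability
        dp_k = a * (cum[k] * (1.0 - cum[k]) - cum[k + 1] * (1.0 - cum[k + 1]))
        if p_k > 0.0:
            info += dp_k ** 2 / p_k
    return info


def best_items_at(t_score: float, bank: Dict[str, Tuple[float, List[float]]],
                  n: int = 8) -> List[str]:
    """Names of the n items with the most information at a given T-score."""
    theta = (t_score - 50.0) / 10.0  # T = 50 + 10 * theta
    ranked = sorted(bank, key=lambda name: item_information(theta, *bank[name]),
                    reverse=True)
    return ranked[:n]


# Hypothetical 12-item bank: (slope, four ordered thresholds on the standardized metric).
bank = {
    f"item{i:02d}": (1.5 + 0.1 * (i % 5),
                     [-0.8 + 0.1 * i, 0.2 + 0.1 * i, 1.0 + 0.1 * i, 1.8 + 0.1 * i])
    for i in range(12)
}

for t in (30, 40, 50, 60, 70):
    chosen = best_items_at(t, bank)
    theta = (t - 50.0) / 10.0
    info = sum(item_information(theta, *bank[name]) for name in chosen)
    print(t, round(info, 2), chosen)
```

When the selected item sets barely change from one target to the next, as the text reports for the final pool, an adaptive test gains little over a fixed short form.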

Discussion

Recent recommendations from the Pediatric Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (PedIMMPACT) indicated that investigators conducting clinical trials in pediatric chronic and recurrent pain “should consider assessing outcomes in pain intensity; physical functioning; emotional functioning; role functioning; symptoms and adverse events; global judgment of satisfaction with treatment; sleep; and economic factors” [23]. However, the consensus of the PedIMMPACT group was that measures of “pain-related functional impairment” still require further research. The PROMIS Pediatric Pain Interference Scale partly addresses this gap in the empirical literature, with the added advantages of IRT analyses in the instrument development process.

The present study describes the development of the new NIH PROMIS Pediatric Pain Interference Scale through an iterative series of IRT analyses of scale dimensionality, item local dependence, and differential item functioning. After scale dimensionality was determined, items with local dependence and differential item functioning were identified and removed, resulting in the final unidimensional PROMIS Pediatric Pain Interference Scale. Several scoring methods are presented that can be tailored to the objectives of a particular clinical research endeavor. To our knowledge, this is the first pediatric pain interference scale developed through IRT analyses.

The vast majority of generic pediatric pain measures in the empirical literature were developed with Classical Test Theory and generally have not taken full advantage of IRT analysis in the scale development process. A potential advantage of IRT analysis in item and scale development is greater flexibility in selecting items from the existing pediatric pain item bank to suit the objectives of a particular clinical research investigation. Further, scales developed with Classical Test Theory often have gaps in their ability to measure the full spectrum of the latent construct; in contrast, with IRT-calibrated items one can construct a measure that is useful across the full continuum of the latent variable [10]. This analytic methodology thus gives clinical researchers the opportunity to select the most meaningful items for their study design and hypotheses. In the present study, we proposed a short form measuring pediatric pain interference; however, a smaller subset of items from the item bank can also be used and scored on the same metric as the larger set using a more dynamic CAT algorithm.

Because the pain items were spread over several test forms, we were unable to perform factor analyses across the entire bank. This limitation makes it impossible to ensure that pain items on different forms do not exhibit local dependence. Additionally, the factor analyses might have turned out differently had the pain items been analyzed as a single set. Instead, factor analysis was conducted over the subgroup of pain items tested on each form. Because the pain items were written to cover content identified in qualitative work and were then randomly allocated to the test forms, the different test forms can be viewed as replications. Because our impressions of multidimensionality were repeated across these replicated factor analyses, we have greater confidence in the factor analytic results. We are currently performing cross-sectional testing using the entire bank to verify these results.

We recruited children from clinics in Texas and North Carolina and from schools in North Carolina to achieve a sample with diverse experiences in terms of health outcomes as well as cultural and ethnic influences. This study does not report on the use of the items in languages other than English or in children living in other countries; as such, we cannot assume that the scales would have the same test characteristics in those populations.

Using the current sample, we were able to determine that two of the items in the pool, “It was hard to remember things when I had pain” and “It was hard to get along with other people when I had pain,” exhibit sufficient DIF between younger and older children that it would not be wise to use them in an instrument meant to compare pain interference levels across ages. However, for comparisons within age groups based on other variables, such as treatments, those items are discriminating and useful, so they remain in the pool. Future research with other samples may reveal other sources of DIF for other items; an advantage of IRT as a method is that it can detect item-level DIF (a concept completely ignored by Classical Test Theory) and “flag” items to be used only with caution for comparisons across levels of a variable for which DIF exists. Because comparisons across gender are ubiquitous, items exhibiting substantial DIF between boys and girls have been set aside from the item pool. Although careful analysis of DIF, as performed in this study, led to a smaller item bank, we believe this approach will ultimately yield a more broadly applicable measure for comparing results across important populations.

The PROMIS Pediatric Items use a 7-day recall period. The appropriate recall period for pain and other symptoms and functions is a topic of considerable debate, with no firm conclusions as to the “best” way to construct a measure, particularly for children. Almost certainly, recall-period effects would be more pronounced for pain severity or frequency than for pain interference. The pain interference items ask respondents to assess how pain affected their activities, which anchors their pain experience in other activities.

The PROMIS pediatric pain item bank was developed to provide accurate and efficient assessment of this important domain using IRT item calibrations, in anticipation of its use in pediatric patients with chronic and recurrent pain. We are currently testing this item bank, along with other PROMIS pediatric scales, in children with rheumatic disease, sickle cell disease, cancer, chronic kidney disease, and obesity, and in a rehabilitation population, to further evaluate aspects of construct validity. In conclusion, the present study provides initial IRT calibrations of the PROMIS pediatric pain interference item bank and the creation of the NIH PROMIS Pediatric Pain Interference Scale, which addresses an important gap in the current literature. Further research is indicated on construct validity, including hypothesized associations with emotional distress [38; 36; 9], fatigue [37; 11], functional status [43; 13], pediatric pain coping strategies [42; 45], and generic health-related quality of life [39; 16; 11], as well as on the responsiveness of this new scale and item banks in larger samples of pediatric patients with chronic and recurrent pain.

Perspective.

The present study provides initial calibrations of the National Institutes of Health (NIH) Patient Reported Outcomes Measurement Information System (PROMIS) pediatric pain item bank and the creation of the PROMIS Pediatric Pain Interference Scale. It is anticipated that this new scale will have application in pediatric chronic and recurrent pain.

Acknowledgements

This work was funded by the National Institutes of Health through the NIH Roadmap for Medical Research, Grant 1U01AR052181-01. Information on the Patient-Reported Outcomes Measurement Information System (PROMIS) can be found at http://nihroadmap.nih.gov/ and http://www.nihpromis.org. We would like to acknowledge the contribution of Harry A. Guess, MD, PhD to the conceptualization and operationalization of this research prior to his death. We thank Jolynn Pek, Guillaume Filteau, and James McGinley for assistance with the data analysis.

Appendix

Listed below are the item stems for the recommended eight-item short form for the PROMIS Pediatric Pain Interference Scale. All items use a 7-day recall period (the preface is “In the past seven days”) and a 5-point response scale with the options never (0), almost never (1), sometimes (2), often (3), and almost always (4).

PROMIS Pediatric Pain Interference Scale Items

  • I had trouble sleeping when I had pain.

  • It was hard for me to pay attention when I had pain.

  • It was hard to stay standing when I had pain.

  • It was hard to have fun when I had pain.

  • I had trouble doing schoolwork when I had pain.

  • It was hard for me to walk one block when I had pain.

  • It was hard for me to run when I had pain.

  • I felt angry when I had pain.

The summed score to scale score translation for this short form is given in Table A1.

Table A1.

Summed Score to Scale Score Translation Table for the Recommended Short Form

Summed Score | Scale Score | SD
0 | 34 | 6
1 | 39 | 4
2 | 41 | 4
3 | 43 | 4
4 | 44 | 4
5 | 46 | 3
6 | 47 | 3
7 | 48 | 3
8 | 50 | 3
9 | 51 | 3
10 | 52 | 3
11 | 53 | 3
12 | 54 | 3
13 | 55 | 3
14 | 56 | 3
15 | 57 | 3
16 | 58 | 3
17 | 59 | 3
18 | 60 | 3
19 | 60 | 3
20 | 61 | 3
21 | 62 | 3
22 | 63 | 3
23 | 64 | 3
24 | 65 | 3
25 | 67 | 3
26 | 68 | 3
27 | 69 | 3
28 | 70 | 3
29 | 72 | 3
30 | 73 | 4
31 | 75 | 4
32 | 78 | 5

Scale scores are on a T-score scale; the values of SD are reported as conditional standard errors of measurement.
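Scoring the recommended short form with Table A1 reduces to a lookup on the summed score. A minimal Python sketch, assuming all eight items are answered (the handling of missing responses is not covered here); the names `SUMMED_TO_T` and `score_short_form` are hypothetical, while the values are copied from Table A1.

```python
from typing import Sequence, Tuple

# Table A1: summed score -> (T-score, conditional SE) for the recommended 8-item short form.
SUMMED_TO_T = {
    0: (34, 6), 1: (39, 4), 2: (41, 4), 3: (43, 4), 4: (44, 4),
    5: (46, 3), 6: (47, 3), 7: (48, 3), 8: (50, 3), 9: (51, 3),
    10: (52, 3), 11: (53, 3), 12: (54, 3), 13: (55, 3), 14: (56, 3),
    15: (57, 3), 16: (58, 3), 17: (59, 3), 18: (60, 3), 19: (60, 3),
    20: (61, 3), 21: (62, 3), 22: (63, 3), 23: (64, 3), 24: (65, 3),
    25: (67, 3), 26: (68, 3), 27: (69, 3), 28: (70, 3), 29: (72, 3),
    30: (73, 4), 31: (75, 4), 32: (78, 5),
}


def score_short_form(responses: Sequence[int]) -> Tuple[int, int]:
    """Convert eight 0-4 item responses to (T-score, conditional SE) via Table A1."""
    if len(responses) != 8:
        raise ValueError("the short form has exactly 8 items")
    if any(r not in (0, 1, 2, 3, 4) for r in responses):
        raise ValueError("each response must be coded 0 (never) to 4 (almost always)")
    return SUMMED_TO_T[sum(responses)]


print(score_short_form([2, 1, 0, 3, 1, 0, 0, 1]))
# prints: (50, 3)   (summed score 8)
```

The validation checks are a design choice for the sketch: a summed score is only meaningful when every item has a valid 0-4 code.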

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  • [1]. Ader DN. Developing the Patient-Reported Outcomes Measurement Information System (PROMIS). Medical Care. 2007;45(Suppl 1):S1–S2. doi: 10.1097/01.mlr.0000258615.42478.55.
  • [2]. Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B. 1995;57:289–300.
  • [3]. Bjorner JB, Smith KJ, Edelen MO, Stone C, Thissen D, Sun X. IRTFIT: A Macro for Item Fit and Local Dependence Tests under IRT Models. QualityMetric Incorporated; Lincoln, RI: 2007.
  • [4]. Cella D, Yount S, Rothrock N, Gershon R, Cook K, Reeve B, Ader DN, Fries JF, Bruce B, Rose M. The Patient-Reported Outcomes Measurement Information System (PROMIS): Progress of an NIH Roadmap Cooperative Group during its first two years. Medical Care. 2007;45(Suppl 1):S3–S11. doi: 10.1097/01.mlr.0000258615.42478.55.
  • [5]. Cohen LL, Lemanek K, Blount RL, Dahlquist LM, Lim CS, Palermo TM, McKenna KD, Weiss KE. Evidence-based assessment of pediatric pain. Journal of Pediatric Psychology. 2008;33:939–955. doi: 10.1093/jpepsy/jsm103.
  • [6]. DeWalt DA, Rothrock N, Yount S, Stone AA. Evaluation of item candidates: The PROMIS qualitative item review. Medical Care. 2007;45(Suppl 1):S12–S21. doi: 10.1097/01.mlr.0000254567.79743.e2.
  • [7]. du Toit M. IRT from SSI. Scientific Software International; Lincolnwood, IL: 2003.
  • [8]. Eccleston C, Jordan AL, Crombez G. The impact of chronic pain on adolescents: A review of previously used measures. Journal of Pediatric Psychology. 2006;31:684–697. doi: 10.1093/jpepsy/jsj061.
  • [9]. Eccleston C, McCracken LM, Jordan A, Sleed M. Development and preliminary psychometric evaluation of the parent report version of the Bath Adolescent Pain Questionnaire (BAPQ-P): A multidimensional parent report instrument to assess the impact of chronic pain on adolescents. Pain. 2007;131:48–56. doi: 10.1016/j.pain.2006.12.010.
  • [10]. Embretson SE, Reise SP. Item Response Theory for Psychologists. Erlbaum; Mahwah, NJ: 2000.
  • [11]. Gold JI, Mahrer NE, Yee J, Palermo TM. Pain, fatigue, and health-related quality of life in children and adolescents with chronic pain. Clinical Journal of Pain. 2009;25:407–412. doi: 10.1097/AJP.0b013e318192bfb1.
  • [12]. Goodman JE, McGrath PJ. The epidemiology of pain in children and adolescents. Pain. 1991;46:247–264. doi: 10.1016/0304-3959(91)90108-A.
  • [13]. Hainsworth KR, Davies WH, Khan KA, Weisman SJ. Development and preliminary validation of the Child Activity Limitations Questionnaire: Flexible and efficient assessment of pain-related functional disability. Journal of Pain. 2007;8:746–752. doi: 10.1016/j.jpain.2007.05.005.
  • [14]. Haley SM, Fragala-Pinkham MA, Dumas HM, Ni P, Gorton GE, Watson K, Montpetit K, Bilodeau N, Hambleton RK, Tucker CA. Evaluation of an item bank for a computerized adaptive test of activity in children with cerebral palsy. Physical Therapy. 2009;89:589–600. doi: 10.2522/ptj.20090007.
  • [15]. Hill CD, Edwards MC, Thissen D, Langer MM, Wirth RJ, Burwinkle TM, Varni JW. Practical issues in the application of item response theory: A demonstration using items from the Pediatric Quality of Life Inventory™ (PedsQL™) 4.0 Generic Core Scales. Medical Care. 2007;45(Suppl 1):S39–S47. doi: 10.1097/01.mlr.0000259879.05499.eb.
  • [16]. Huguet A, Miró J. The severity of chronic pediatric pain: An epidemiological study. Journal of Pain. 2008;9:226–236. doi: 10.1016/j.jpain.2007.10.015.
  • [17]. Irwin DE, Varni JW, Yeatts K, DeWalt DA. Cognitive interviewing methodology in the development of a pediatric item bank: a patient reported outcomes measurement information system (PROMIS) study. Health and Quality of Life Outcomes. 2009;7(3):1–10. doi: 10.1186/1477-7525-7-3.
  • [18]. Joreskog KG, Sorbom D. LISREL 8.5. Scientific Software International, Inc.; Lincolnwood, IL: 2003.
  • [19]. Langer MM, Hill CD, Thissen D, Burwinkle TM, Varni JW, DeWalt DA. Item response theory detects differential item functioning between healthy and ill children in quality-of-life measures. Journal of Clinical Epidemiology. 2008;61:268–276. doi: 10.1016/j.jclinepi.2007.05.002.
  • [20]. McGrath PA. The measurement of human pain. Endodontics and Dental Traumatology. 1986;2:124–129. doi: 10.1111/j.1600-9657.1986.tb00598.x.
  • [21]. McGrath PA. Pain in children: Nature, assessment, and treatment. Guilford; New York: 1990.
  • [22]. McGrath PJ. The clinical measurement of pain in children: A review. Clinical Journal of Pain. 1986;1:221–227.
  • [23]. McGrath PJ, Walco GA, Turk DC, Dworkin RH, Brown MT, Davidson K, Eccleston C, Finley GA, Goldschneider K, Haverkos L, Hertz SH, Ljungman G, Palermo T, Rappaport BA, Rhodes T, Schechter N, Scott J, Sethna N, Svensson OK, Stinson J, von Baeyer CL, Walker L, Weisman S, White RE, Zajicek A, Zeltzer L. Core outcome domains and measures for pediatric acute and chronic/recurrent pain clinical trials: PedIMMPACT recommendations. Journal of Pain. 2008;9:771–783. doi: 10.1016/j.jpain.2008.04.007.
  • [24]. Orlando M, Thissen D. Likelihood-based item-fit indices for dichotomous item response theory models. Applied Psychological Measurement. 2000;24:50–64.
  • [25]. Orlando M, Thissen D. Further examination of the performance of S-X2, an item fit index for dichotomous item response theory models. Applied Psychological Measurement. 2003;27:289–298.
  • [26]. Palermo TM. Assessment of chronic pain in children: Current status and emerging topics. Pain Research and Management. 2009;14:21–26. doi: 10.1155/2009/236426.
  • [27]. Palermo TM, Lewandowski AS, Long AC, Burant CJ. Validation of a self-report questionnaire version of the Child Activity Limitations Interview (CALI): The CALI-21. Pain. 2008;139:644–652. doi: 10.1016/j.pain.2008.06.022.
  • [28]. Palermo TM, Witherspoon D, Valenzuela D, Drotar DD. Development and validation of the Child Activity Limitations Interview: a measure of pain-related functional impairment in school-age children and adolescents. Pain. 2004;109:461–470. doi: 10.1016/j.pain.2004.02.023.
  • [29]. Reeve BB, Hays RD, Bjorner JB, Cook KF, Crane PK, Teresi JA, Thissen D, Revicki DA, Weiss DJ, Hambleton RK, Liu H, Gershon R, Reise SP, Lai JS, Cella D. Psychometric evaluation and calibration of health-related quality of life item banks: Plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Medical Care. 2007;45(Suppl 1):S22–S31. doi: 10.1097/01.mlr.0000250483.85507.04.
  • [30]. Samejima F. Graded response model. In: van der Linden WJ, Hambleton RK, editors. Handbook of Item Response Theory. Springer-Verlag; New York: 1997. pp. 85–100.
  • [31]. Steinberg L, Thissen D. Using effect sizes for research reporting: Examples using item response theory to analyze differential item functioning. Psychological Methods. 2006;11:402–415. doi: 10.1037/1082-989X.11.4.402.
  • [32]. Stinson JN, Kavanagh T, Yamada J, Gill N, Stevens B. Systematic review of the psychometric properties, interpretability and feasibility of self-report pain intensity measures for use in clinical trials in children and adolescents. Pain. 2006;125:143–157. doi: 10.1016/j.pain.2006.05.006.
  • [33]. Thissen D. IRTLRDIF: Software for the computation of the statistics involved in item response theory likelihood-ratio tests for differential item functioning. L.L. Thurstone Psychometric Laboratory, The University of North Carolina at Chapel Hill; Chapel Hill, NC: 2001.
  • [34]. Thissen D, Nelson L, Rosa K, McLeod LD. Item response theory for items scored in more than two categories. In: Thissen D, Wainer H, editors. Test Scoring. Lawrence Erlbaum Associates; Mahwah, NJ: 2001. pp. 141–186.
  • [35]. Thissen D, Steinberg L, Wainer H. Detection of differential item functioning using the parameters of item response models. In: Holland PW, Wainer H, editors. Differential Item Functioning. Lawrence Erlbaum Associates; Hillsdale, NJ: 1993. pp. 67–113.
  • [36]. Varni JW, Burwinkle TM, Katz ER. The PedsQL™ in pediatric cancer pain: A prospective longitudinal analysis of pain and emotional distress. Journal of Developmental and Behavioral Pediatrics. 2004;25:1–8. doi: 10.1097/00004703-200408000-00003.
  • [37]. Varni JW, Burwinkle TM, Limbers CA, Szer IS. The PedsQL™ as a patient-reported outcome in children and adolescents with fibromyalgia: An analysis of OMERACT domains. Health and Quality of Life Outcomes. 2007;5(9):1–12. doi: 10.1186/1477-7525-5-9.
  • [38]. Varni JW, Rapoff M, Waldron SA, Gragg RA, Bernstein BH, Lindsley CB. Chronic pain and emotional distress in children and adolescents. Journal of Developmental and Behavioral Pediatrics. 1996;17:154–161.
  • [39]. Varni JW, Seid M, Knight TS, Burwinkle TM, Brown J, Szer IS. The PedsQL™ in pediatric rheumatology: Reliability, validity, and responsiveness of the Pediatric Quality of Life Inventory™ Generic Core Scales and Rheumatology Module. Arthritis and Rheumatism. 2002;46:714–725. doi: 10.1002/art.10095.
  • [40]. Varni JW, Thompson KL, Hanson V. The Varni/Thompson Pediatric Pain Questionnaire: I. Chronic musculoskeletal pain in juvenile rheumatoid arthritis. Pain. 1987;28:27–38. doi: 10.1016/0304-3959(87)91056-6.
  • [41]. Varni JW, Walco GA, Katz ER. Assessment and management of chronic and recurrent pain in children with chronic diseases. Pediatrician. 1989;16:56–63.
  • [42]. Varni JW, Waldron SA, Gragg RA, Rapoff MA, Bernstein BH, Lindsley CB, Newcomb MD. Development of the Waldron/Varni Pediatric Pain Coping Inventory. Pain. 1996;67:141–150. doi: 10.1016/0304-3959(96)03077-1.
  • [43]. Varni JW, Wilcox KT, Hanson V, Brik R. Chronic musculoskeletal pain and functional status in juvenile rheumatoid arthritis: An empirical model. Pain. 1988;32:1–7. doi: 10.1016/0304-3959(88)90016-4.
  • [44]. von Baeyer CL, Spagrud LJ. Systematic review of observational (behavioral) measures of pain for children and adolescents aged 3 to 18 years. Pain. 2007;127:140–150. doi: 10.1016/j.pain.2006.08.014.
  • [45]. Walker LS, Baber KF, Garber J, Smith CA. A typology of pain coping strategies in pediatric patients with chronic abdominal pain. Pain. 2008;137:266–275. doi: 10.1016/j.pain.2007.08.038.
  • [46]. Walsh TR, Irwin DE, Meier A, Varni JW, DeWalt DA. The use of focus groups in the development of the PROMIS pediatrics item bank. Quality of Life Research. 2008;17:725–735. doi: 10.1007/s11136-008-9338-1.
  • [47]. Williams V, Jones LV, Tukey JW. Controlling error in multiple comparisons, with examples from state-to-state differences in educational achievement. Journal of Educational and Behavioral Statistics. 1999;24:42–69.
  • [48]. Young NL, Williams JI, Yoshida KK, Wright JG. Measurement properties of the Activities Scale for Kids. Journal of Clinical Epidemiology. 2000;53:125–137. doi: 10.1016/s0895-4356(99)00113-4.
