Author manuscript; available in PMC: 2020 May 1.
Published in final edited form as: Health Psychol. 2019 May;38(5):435–444. doi: 10.1037/hea0000685

Interviewer- vs. Self-administration of PROMIS Measures for Adults with Traumatic Injury

Pamela A Kisala 1, Aaron J Boulton 1, Matthew L Cohen 2, Mary D Slavin 3, Alan M Jette 4, Susan Charlifue 5, Robin Hanks 6, MJ Mulcahey 7, David Cella 8, David S Tulsky 1,9
PMCID: PMC6506178; NIHMSID: NIHMS1022692; PMID: 31045427

Abstract

Objective:

To assess differential item functioning (DIF) and observed mean differences across two modes of administration for PROMIS measure scores in a sample of adults with traumatic injury.

Methods:

Items from 7 PROMIS adult measures (v1.0 Physical Function, Fatigue, Pain Interference, Anger, Anxiety, and Depression, and v2.0 Social Health - Emotional Support) were administered as fixed-length short forms in random order to a cross-sectional sample. Participants were randomly assigned to interviewer-administered (phone or in-person) or self-administered (via the Assessment Center℠ website) conditions.

The research was conducted at five medical rehabilitation institutions across the U.S. Participants included 277 adults with spinal cord injury (SCI; n = 148) or traumatic brain injury (TBI; n = 129).

Results:

DIF analyses indicated that all items were invariant to mode of administration. There was no significant effect of mode of administration for the majority of PROMIS measures tested. Regarding observed scores, there were small but significant effects of mode of administration on the Emotional Support and Depression measures, with participants in the interview condition reporting better support/fewer symptoms.

Conclusions:

PROMIS instruments demonstrated measurement equivalence across interviewer-administered and self-administered conditions. These findings are particularly important for research or clinical applications where independent, web- or tablet-based administration of PROMIS measures is not ideal, for example, with individuals with physical or cognitive disabilities or individuals who lack computer and/or internet access. PROMIS v1.0 Depression and PROMIS v2.0 Emotional Support scores displayed a tendency toward social desirability, which should be considered when these measures are interviewer-administered.

Keywords: spinal cord injury, traumatic brain injury, patient outcomes assessment, PROMIS, mode of administration


A patient reported outcome (PRO) is an individual’s report of his or her own health status that has not been interpreted by a clinician or anyone else (U.S. Food and Drug Administration, 2006). Over the past decade, the science and application of PROs have advanced dramatically. Today, the patient’s perspective of his or her own health is a core pillar of both evidence-based practice (Sackett, Rosenberg, Gray, Haynes, & Richardson, 1996) and patient- and family-centered medicine (Frank, Basch, & Selby, 2014). The Patient Reported Outcomes Measurement Information System® (PROMIS®) was supported by the NIH Roadmap for Medical Research Initiative (now the NIH Common Fund) and is a comprehensive set of PROs that assess health-related quality of life (HRQOL), including physical, emotional, and social components of health (Cella et al., 2010a). PROMIS typifies patient-centeredness, is one of the NIH Common Data Elements (Grinnon et al., 2012), and is becoming widely adopted for clinical research as well as clinical practice (Bingham et al., 2016; Wagner et al., 2015; Witter, 2016).

Other PROMIS-based measurement systems have recently been developed to assess HRQOL in the context of specific disorders. The Neurological Quality of Life (Neuro-QOL) measurement system (Cella et al., 2011) was developed to assess HRQOL in the context of neurological illness broadly. Neuro-QOL instruments were created and validated among stakeholders (patients, families, and clinicians) representing five neurological illnesses: epilepsy, Parkinson’s disease, amyotrophic lateral sclerosis, stroke, and multiple sclerosis (Gershon et al., 2012). The Neuro-QOL instruments are PROMIS instruments that were tailored to better assess HRQOL in these populations.

The Spinal Cord Injury-Quality of Life (SCI-QOL; Tulsky, Kisala, Victorson, Choi, et al., 2015; Tulsky, Kisala, Victorson, Tate, et al., 2015) and Traumatic Brain Injury – Quality of Life (TBI-QOL; Tulsky et al., 2016) measurement systems further tailored PROMIS and Neuro-QOL instruments, to create instruments to assess HRQOL among individuals with SCI or TBI. SCI-QOL and TBI-QOL include item banks linked to PROMIS where there is a PROMIS item bank of the same construct (e.g., Depression, Anxiety). When there is not a PROMIS item bank of the same construct, TBI-QOL and SCI-QOL item banks are linked with a Neuro-QOL item bank of the same construct, if available (e.g., Positive Affect and Well-being, Stigma). The TBI-QOL and SCI-QOL measurement systems also include item banks that assess constructs that are not assessed by PROMIS or Neuro-QOL (e.g., Resilience, Grief/Loss, Independence).

PROMIS and its derivatives (Neuro-QOL, SCI-QOL, TBI-QOL, and others) are founded on item response theory (IRT) and can therefore produce reliable scores from administration of relatively few items from a larger item bank (Baylor et al., 2011; Cella, Gershon, Lai, & Choi, 2007; Hays, Morales, & Reise, 2000). These subsets of items can be administered as brief, fixed-length “short forms” or by computer adaptive testing (CAT; Bjorner, Chang, Thissen, & Reeve, 2007). Respondents typically complete PROMIS and related measures by independent, electronic self-administration via NIH’s Assessment Center platform (www.assessmentcenter.net) or alternate platforms such as REDCap (Harris et al., 2009) or the NIH Toolbox (Gershon et al., 2013) iPad application. However, poor visual acuity, cognitive or reading limitations, low literacy (Hahn et al., 2004), and/or poor fine motor functioning may preclude self-administration of PRO instruments, whether by paper-and-pencil or electronically. One accommodation for individuals with these challenges is to have an interviewer assist the administration by reading the items aloud and/or marking the respondent’s answers. Interviewer-administration has proved extremely valuable for enhancing the accessibility of PROMIS and PROMIS-derived measures for individuals with SCI (Tulsky et al., 2012; Tulsky, Kisala, Victorson, Choi, et al., 2015; Tulsky, Kisala, Victorson, Tate, et al., 2015) and TBI (Sherer et al., 2017; Tulsky et al., 2016). For example, the pace of the interview can be individually tailored to accommodate individuals with cognitive or communication limitations. Furthermore, the presence of an interviewer (either by phone or in person) can motivate respondents to complete the entire set of items, and interviewer administration typically results in reduced rates of missing data (McHorney, Kosinski, & Ware, 1994). However, the presence of a live administrator may bias a person’s responses.
For example, a respondent might under-report symptoms (“fake good”) to a live person because of the desire to maximize positive attributes and minimize negative attributes (social desirability; Nederhof, 1985). Alternatively, a person may over-report symptoms (“fake bad”) to a live person, for example, to solicit compassion or resources (Fayers & Machin, 2016, p. 515).

The definition of a PRO necessitates that the patient’s perspective is not interpreted by anyone else. It is equally important that the patient’s perspective not be biased during completion of the assessment by the presence of an interviewer. While use of an interviewer to administer PRO measures allows researchers and clinicians to include individuals who may be unable to self-administer the assessment, it is important to empirically examine the psychometric equivalence of responses obtained through interviewer-administration as compared to self-administration.

Two recent studies by Bjorner et al. (2014a, 2014b) reported no differences among administration by paper-and-pencil, personal digital assistant, or interactive voice response. A recent and comprehensive meta-analysis by Rutherford et al. (2016) of 56 studies also found no systematic bias for self-administered versus interviewer-administered PRO measures. However, one limitation of this meta-analysis was that most of the included studies assessed only a single construct (e.g., depression) or a single measure of global functioning in a single type of participant (e.g., college students). The meta-analysis may therefore have had insufficient ability to detect mode-of-administration biases across domains of functioning because of the limited combinations of participant types and constructs. For example, the magnitude of difference between modes of administration might differ for healthy college students reporting on anxiety and pain versus patients with cancer.

Using factor analytic methods, Wang, Chen, Usinger, and Reeve (2017) recently studied the measurement invariance of PROMIS items across modes of administration (phone interview and computer self-administered survey) in a sample of men with prostate cancer approximately two years after treatment. Seven of the fourteen PROMIS measures (fatigue, depression, anxiety, diarrhea, bowel incontinence, physical function, and illness impact-negative) demonstrated strong invariance, which implies an equivalent factor structure and equivalent item characteristics (i.e., factor loadings and intercepts), such that test score means and variances could be safely compared across modes. Four PROMIS measures (sleep disturbance, sleep-related impairment, illness impact-positive, and erectile function) achieved partial strong invariance, with only a single item on each measure not exhibiting equivalent measurement characteristics. Finally, three PROMIS measures (pain interference, interest in sexual activity, and global satisfaction with sex life) failed to establish configural invariance, implying that item responses from the two modes of administration exhibited different factor structures and therefore different metrics. However, the fact that the order of administration was not counterbalanced (computer self-assessment always preceded telephone assessment) and the sample was relatively homogeneous (predominantly white male college graduates with prostate cancer living in North Carolina) must be considered when interpreting the findings from this study.

In sum, previous research has generally indicated weak effects of mode of administration, with some variability of results depending on the sample population, specific measures, and study design. Given the substantial amount of work that has been conducted to evaluate and optimize PROMIS and Neuro-QOL for the SCI and TBI populations, and the fact that the interview format has been used consistently with these groups but not with the original PROMIS and Neuro-QOL calibration samples, it is imperative to examine the effects of mode of administration on PROMIS items in these populations specifically. Accordingly, this study sought to evaluate mode-of-administration effects in a sample of participants with SCI or TBI, for whom interviewer-administered measures may be an especially important option to accommodate physical (e.g., visual or fine-motor), cognitive (e.g., reading comprehension), or economic (e.g., lack of accessible transportation to a location with public internet access) limitations.

In sum, here we report on a study that randomized adults with SCI or TBI to complete interviewer-administered or self-administered PROMIS measures in order to test the measurement equivalence of administration modes, and, should equivalence hold, examine any differences in scale score means.

Method

Participants

Participants were adults with TBI or SCI who were recruited as part of a PROMIS II network grant that examined the validity of using PROMIS in rehabilitation populations. Participating centers were 5 medical rehabilitation institutions in the United States, and this study was approved by the Institutional Review Board at each site: University of Michigan, Rehabilitation Institute of Michigan, Craig Hospital, Shriners Hospital for Children – Philadelphia, and Shriners Hospital for Children – Chicago. While the Shriners Hospitals are primarily pediatric facilities, they serve youth up to age 21. Inclusion criteria were being age 18 or older at the time of study participation and a diagnosis of traumatic SCI or complicated mild (Williams, Levin, & Eisenberg, 1990), moderate, or severe TBI.

The sample was stratified by injury type (SCI vs. TBI) to ensure that there were sufficient participants in each diagnosis group and, especially, that there were sufficient participants in each diagnosis × mode of administration subgroup. Participants within each stratification group were randomly assigned to one of two PROMIS administration conditions: interviewer-administered or self-administered.

Modes of Administration

Following completion of informed consent, the simple random assignment feature of the Assessment Center online platform was used to randomly assign participants (with equal probability) to one of the two study conditions. Participants assigned to the self-administered condition were given login information to complete the study measures remotely and independently via the Assessment Center web interface. Interviewer-administered measures were completed primarily by phone, though participants were given the option to complete the measures in person in a clinic. All interviewers were trained and certified in a standardized protocol that included the use of pre-printed (and emailed or mailed) response cards. A detailed demographics form was included at the beginning of the Assessment Center study and was completed by each participant in the same format as the PROMIS measures (i.e., either by interview or independent self-report).

PROMIS Measures

Participants completed fixed-length (i.e., all participants saw the identical set of items) PROMIS short forms (SF) representing several HRQOL domains, as follows: PROMIS v1.0 Physical Function, Fatigue, Pain Interference, Anger, Anxiety, and Depression, and the PROMIS v2.0 domain of Social Health - Emotional Support. A complete list of included PROMIS measures and sample items, by domain, is included as Table 1.

Table 1.

Included PROMIS Domains and Measures

Domain Included PROMIS Measure(s) Sample Item Stem Scoring Direction
(Higher scores indicate…)

Physical Function v1.0 Physical Function SF10a Are you able to do chores such as vacuuming or yard work? Better Function
Physical Symptoms v1.0 Fatigue SF8a How fatigued were you on average? Severe Symptoms
v1.0 Pain Interference SF8a How much did pain interfere with your day to day activities? Severe Symptoms
Emotional Distress v1.0 Anger SF8a I felt angry. Severe Symptoms
v1.0 Anxiety SF8a My worries overwhelmed me. Severe Symptoms
v1.0 Depression SF8b I felt that I had nothing to look forward to. Severe Symptoms
Social Health v2.0 Emotional Support SF8a I have someone who will listen to me when I need to talk. Better Function

Note: The name of the short form reflects the number of component items (e.g., “SF10a” indicates a 10-item form).

Scoring.

All PROMIS measures were scored automatically by the Assessment Center platform using IRT-based expected a posteriori (EAP) or “response pattern” scoring. All PROMIS scores are reported as standard scores on a T-metric (M = 50, SD = 10). Higher scores indicate a greater degree of the trait being measured; for example, for physical functioning, higher scores represent better functioning, whereas for depression, higher scores represent more depression. Scoring direction for each PROMIS measure is also included in Table 1.
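The T-metric is a simple linear transformation of the IRT latent trait estimate, which is on a standard normal metric in the calibration sample. A minimal Python sketch (the function name is ours, not part of any PROMIS tooling):

```python
def theta_to_t(theta: float) -> float:
    """Convert an IRT theta estimate (M = 0, SD = 1 in the
    calibration sample) to the PROMIS T-metric (M = 50, SD = 10)."""
    return 50.0 + 10.0 * theta

# A respondent at the calibration mean scores T = 50;
# one SD above the mean scores T = 60.
```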

Data Analysis

In order to test whether PROMIS scores were equivalent between the two modes of administration, differential item functioning (DIF) analyses were conducted. DIF analyses allow researchers to examine whether respondents in two or more groups respond to questionnaire items in different ways. Differential responding can reflect differences in how individuals in each group interpret a given item, ultimately impacting the meaning of any summary score derived using that item.

DIF analyses were conducted in R (version 3.3.2) using a hybrid item-response theory/ordinal logistic regression (IRT/OLR) iterative approach as programmed in the lordif R package (version 0.3-3; Choi, Gibbons, & Crane, 2011). This procedure tests for uniform and non-uniform DIF. For polytomous (i.e., ordinal) items, uniform DIF occurs when the probabilities of participants responding to certain categories differ systematically between groups across the entire range of the trait being measured. Non-uniform DIF occurs when there are differences in the probability of responding to certain categories across groups, but the direction of the difference depends on whether trait levels are low or high.

The hybrid IRT/OLR procedure was applied to each measure separately. This procedure involved two steps. In the first step, trait scores were estimated using the graded-response model (GRM; Samejima, 1969), the same model used for the development of PROMIS measures (Cella et al., 2010b). Assumptions underlying the GRM, unidimensionality and local independence, were checked by estimating categorical confirmatory factor analysis models (CCFA) in Mplus (version 8.0; Muthen & Muthen, 1998–2017) using the WLSMV estimator. To test for unidimensionality, fit indices were inspected and compared to traditional cutoff values (CFI >.95, RMSEA < .06, WRMR < 1.0; Hu & Bentler, 1999; Yu, 2002). Additionally, indices of factor “strength”, the degree to which item variance is partitioned into a general factor underlying all items in a scale versus other factors thought to reflect extraneous dimensions, were computed (Reise, Scheines, Widaman, & Haviland, 2013). These indices included the percentages of uncontaminated correlations (PUC), explained common variance (ECV), and omega hierarchical (omegaH). Preliminary guidelines suggest that if PUC < .80, then ECV > .60 and omegaH > .70 roughly indicate that estimated latent trait scores will not produce biased estimates of relations with other variables in structural models (Reise et al., 2013). Local independence was evaluated by inspecting residual item correlations; |r’s| greater than .20 were considered indicative of local item dependence.
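The three bifactor strength indices can be computed directly from standardized bifactor loadings. The following Python sketch is illustrative only (the study's models were estimated in Mplus), and the loadings in the example are hypothetical, not values from this study:

```python
def bifactor_indices(general, groups):
    """Compute ECV, omegaH, and PUC from standardized bifactor loadings.

    general: general-factor loadings, one per item.
    groups:  lists of group-factor loadings partitioning the items,
             in the same item order as `general`.
    """
    specific = [loading for group in groups for loading in group]
    gen_var = sum(l ** 2 for l in general)
    spec_var = sum(l ** 2 for l in specific)
    ecv = gen_var / (gen_var + spec_var)  # explained common variance
    # omegaH: variance attributable to the general factor relative to
    # total score variance (unit item variances assumed)
    error = sum(1 - g ** 2 - s ** 2 for g, s in zip(general, specific))
    omega_h = sum(general) ** 2 / (
        sum(general) ** 2 + sum(sum(g) ** 2 for g in groups) + error)
    # PUC: proportion of item pairs whose correlation reflects only the
    # general factor (i.e., pairs of items in different group factors)
    n_items = len(general)
    within = sum(len(g) * (len(g) - 1) // 2 for g in groups)
    puc = 1 - within / (n_items * (n_items - 1) / 2)
    return ecv, omega_h, puc

# Hypothetical example: 4 items, all general loadings .8,
# two group factors each with two loadings of .3
ecv, omega_h, puc = bifactor_indices([0.8] * 4, [[0.3, 0.3], [0.3, 0.3]])
```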

In the second step, a DIF detection procedure was initiated. Prior to initiation, response categories were collapsed for specific items if any cells in the two-way contingency table between response format condition and item response categories had less than 5 observations. At each iteration of the DIF detection procedure, items were analyzed using three univariate OLR models: (a) a baseline model (predictor: PROMIS trait scores), (b) a uniform DIF model (predictors: PROMIS trait scores + response format condition), and (c) a non-uniform DIF model (predictors: PROMIS trait scores + response format condition + interaction). Items were flagged as having DIF if significant (p < .01) likelihood ratio test (LRT) statistics emerged when comparing the baseline model to the uniform DIF model, the baseline model to the non-uniform DIF model, or the uniform DIF model to the non-uniform DIF model. At the conclusion of each iteration, PROMIS trait scores were recalculated using the GRM and accounted for DIF items identified from the OLR models - a process referred to as latent trait “purification” (Choi et al., 2011). The “purified” trait scores were then used in the next DIF detection iteration. The procedure concluded if the same items were flagged as containing DIF in successive iterations or the maximum number of iterations was reached (10).

In order to ensure this hybrid IRT/OLR procedure did not rely exclusively on significance tests for flagging DIF items, the procedure was repeated for each measure using McFadden’s pseudo-R2 as the DIF item flagging criterion. Specifically, an item was flagged for DIF if a difference in R2 (ΔR2) greater than .035 was observed between models (Jodoin & Gierl, 2001). Furthermore, empirically-derived cutoff values for the LRT statistic and ΔR2, against which results from the final iteration of each IRT/OLR procedure (using both flagging criteria) were compared, were obtained via Monte Carlo simulation. More details regarding the IRT/OLR procedure can be found in Choi et al. (2011).
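The two flagging criteria (LRT significance and ΔR2 magnitude) can be illustrated from the fitted models' log-likelihoods. The Python sketch below is illustrative only, not the lordif implementation; the function names and example log-likelihood values are hypothetical. The chi-square degrees of freedom are 1 for the M1-vs-M2 and M2-vs-M3 comparisons (one added parameter with a binary grouping variable) and 2 for M1 vs M3:

```python
import math

def chi2_sf(x, df):
    """Chi-square survival function for df = 1 or 2, the degrees of
    freedom arising in the nested OLR comparisons."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("only df = 1 or 2 supported in this sketch")

def flag_dif(ll_null, ll_base, ll_uniform, ll_nonuniform,
             alpha=0.01, r2_cut=0.035):
    """Apply both flagging criteria to one item, given log-likelihoods
    of the intercept-only (null), baseline (M1), uniform-DIF (M2), and
    non-uniform-DIF (M3) OLR models."""
    # Likelihood ratio tests for the three nested comparisons
    p12 = chi2_sf(2 * (ll_uniform - ll_base), df=1)        # M1 vs M2
    p13 = chi2_sf(2 * (ll_nonuniform - ll_base), df=2)     # M1 vs M3
    p23 = chi2_sf(2 * (ll_nonuniform - ll_uniform), df=1)  # M2 vs M3
    # McFadden's pseudo-R2 relative to the intercept-only model
    r2 = lambda ll: 1 - ll / ll_null
    delta_r2 = r2(ll_nonuniform) - r2(ll_base)
    return {"p_values": (p12, p13, p23),
            "delta_r2": delta_r2,
            "flag_lrt": min(p12, p13, p23) < alpha,
            "flag_r2": delta_r2 > r2_cut}

# Hypothetical log-likelihoods for one item: neither criterion flags DIF
res = flag_dif(ll_null=-100.0, ll_base=-90.0,
               ll_uniform=-89.5, ll_nonuniform=-89.0)
```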

Primary analyses were conducted on the combined sample (SCI+TBI). Secondary DIF analyses were conducted on each subgroup separately, although due to the smaller sample sizes standard OLR DIF analyses using observed short-form trait scores were used. Missing data rates at the item level were low. Across measures, the maximum proportion of missing observations for an item was as follows: Physical Function (2.9%), Fatigue (1.1%), Pain Interference (1.1%), Anger (0.7%), Anxiety (0.7%), Depression (1.4%), Emotional Support (1.1%). Therefore, analyses in the IRT/OLR procedure and standalone OLR subsample analyses were completed using pairwise deletion.

Results

Participants

A total of 149 adults with SCI completed the assessment. However, 1 individual originally assigned to the self-report condition broke randomization (due to difficulty navigating the assessment independently) and was excluded from analyses. The final SCI subsample therefore included 148 individuals ages 18–77 (M = 40.1, SD = 14.8): 67 assigned to the independent self-report condition and 81 assigned to the interviewer-administered condition. Table 2 presents the demographic characteristics of the SCI group by study mode. The administration mode groups did not significantly differ on any demographic, functional, or injury-related characteristics. Participants had been living with SCI for 0.25–42.5 years, with an average of 9.4 (SD = 8.2) years, and were roughly evenly split between para- and tetraplegia levels of injury.

Overall, 132 adults with TBI completed the assessment. However, 3 broke randomization due to an unanticipated lack of internet access. The final TBI subsample thus included 129 individuals ages 18–77 (M = 40.05, SD = 14.8): 58 in the self-administered condition and 71 in the interviewer-administered condition. Table 2 also presents the demographic characteristics of both TBI conditions. The two conditions did not differ on any demographic or injury-related characteristics. Participants had been living with their injury from 0.3 to 24.5 years (M = 7.9, SD = 6.0). A range of TBI severity was represented in the sample: 23.3% complicated-mild, 18.6% moderate, 44.2% severe, and 14% unknown. More detailed information on participant characteristics is provided in Table 2.

Table 2.

Demographic Characteristics of the SCI and TBI Samples by Mode of Administration

SCI subsample TBI subsample

Self-report Interview Self-report Interview

N 67 81 58 71
Age(Years)
M (SD) 39.2 (14.7) 40.7 (15.0) 36.7 (14.6) 38.3 (14.9)
Years Injured
M (SD) 8.7 (7.7) 10.0 (8.6) 7.4 (6.2) 8.2 (6.5)
Gender(%)
 Male 67.2 69.1 62.1 74.6
 Female 32.8 30.9 36.2 25.4
 Not reported 0.0 0.0 1.7 0.0
Race (%)
 Caucasian 68.7 74.1 55.2 53.5
 African American 23.9 21.0 39.7 32.4
 Other 4.5 4.9 5.2 14.1
 Unknown 3.0 0.0 0.0 0.0
Ethnicity (%)
 Not Hispanic or Latino 81.5 85.9 93.4 93.0
 Hispanic or Latino 6.2 8.1 1.7 4.2
 Not Provided 12.3 6.1 4.9 2.8
Education (%)
 8th Grade or Less 3.0 2.5 0.0 0.0
 Some High School 4.5 12.3 15.5 8.5
 High School Diploma 29.9 24.7 20.7 26.8
 Some College 38.8 37.0 43.1 42.3
 College Degree 9.0 16.0 15.5 14.1
 Some Grad School 9.0 2.5 0.0 1.4
 Graduate Degree 6.0 4.9 3.4 7.0
 Not Provided 0.0 0.0 1.7 0.0
Work Status (%)
 Working 23.9 13.6 25.9 33.8
 Out of work < 1 year 4.5 2.5 6.9 2.8
 Out of work > 1 year 19.4 17.3 8.6 1.4
 Homemaker 3.0 0.0 0.0 1.4
 Student 14.9 19.8 24.1 15.5
 Retired 7.5 3.7 10.3 4.2
 Unable to Work 26.9 43.2 22.4 40.8
 Not Provided 0.0 0.0 1.7 0.0
SCI Diagnosis (%)
 Paraplegia 40.3 39.5 n/a n/a
 Tetraplegia 52.2 50.6 n/a n/a
 Missing/Unknown 7.5 9.9 n/a n/a
Completeness (%)
 Complete 47.8 48.1 n/a n/a
 Incomplete 41.8 40.7 n/a n/a
 Not Known 10.4 11.1 n/a n/a
TBI Severity (%)
 Mild n/a n/a 24.1 22.5
 Moderate n/a n/a 15.5 21.1
 Severe n/a n/a 46.6 38.0
 Unknown/Missing n/a n/a 13.8 18.3

Differential Item Functioning

Assessment of GRM Unidimensionality and Local Independence.

The CCFA models provided conflicting results for unidimensionality analyses. Factor loadings were high, ranging from a minimum of .755 (Anger) to a maximum of .973 (Physical Function) across all PROMIS banks. Chi-square test statistics ranged between 62.489 (Fatigue) and 207.457 (Physical Function) and were all significantly different from zero (based on 20 degrees of freedom). CFI values were all greater than .984 whereas RMSEA values ranged between .088 (Fatigue) and .179 (Depression). WRMR values ranged between .589 (Fatigue) and 1.646 (Physical Function), with two banks exhibiting values greater than 1.0 (Physical Function, Depression). Thus, according to the CFI, all of the banks exhibited unidimensionality, whereas the RMSEA and WRMR values indicated violations in all (RMSEA) or two (WRMR) of the banks.

Other researchers have encountered similar discrepancies between the CFI and RMSEA when using traditional structural equation modeling fit indices for unidimensionality tests in PROMIS short forms (Cook, Kallen, & Amtmann, 2009; Jones, Tommet, Ramirez, Jensen, & Teresi, 2016). This has led some authors to question the use of fit indices, particularly RMSEA, for tests of unidimensionality (Cook et al., 2009; Reise et al., 2013). As an alternative, indices based on bi-factor models - the PUC, ECV, and omegaH - have been suggested as alternative indicators of whether scales are “unidimensional enough” for subsequent analyses with other variables (Reise et al., 2013). PUC values were greater than .80 for three of the seven banks (Fatigue, Pain, Depression). Physical Function and Emotional Support also had high PUC values (.786), whereas the Anger and Anxiety short forms exhibited lower values (.571 and .464, respectively). All ECV values were greater than .72 and all omegaH values were greater than .81. Therefore, given the strong factor loadings and high values for the CFI and bi-factor strength indices, we concluded unidimensionality was supported. Regarding local independence, only two item pairs across all of the banks (Physical Function PROMIS item #26 with items #36 and #37) exhibited residual correlations greater than .20 (#26/#36 r = −.262; #26/#37 r = −.244), with the majority of item pairs showing residual correlations less than .05. Thus, for the Physical Function item bank, we conducted a sensitivity analysis by running the DIF analyses with and without item #26.

DIF Results.

Results from the DIF analyses are shown in Table 3, which presents the ranges of LRT p-values and ΔR2 statistics across all items in a given measure for the OLR model comparisons. No items were flagged for DIF in the full sample. Although some p-values associated with LRT statistics were close to the .01 threshold, all of the ΔR2 statistics were well below the .035 threshold. Additionally, none of the items were flagged according to the Monte Carlo-based thresholds. Similarly, no items were flagged for DIF in either of the diagnosis-specific subsamples (Table 3). Sensitivity analyses further confirmed no indications of DIF in the Physical Function measure after removing item #26.

Table 3.

Differential Item Functioning Analysis Results

LRT p-value ΔR2

M1 vs. M2 M2 vs. M3 M1 vs. M2 M2 vs. M3

Full sample
 Physical Function .013–.814 .249–.978 .000–.010 .001–.002
 Fatigue .197–.938 .096–.816 .000–.002 .000–.003
 Pain Interference .501–.966 .132–.945 .000–.006 .000–.003
 Anger .168–.990 .037–.779 .000–.003 .000–.006
 Anxiety .075–.618 .117–.721 .000–.004 .000–.004
 Depression .029–.456 .111–.928 .001–.008 .000–.004
 Emotional Support .041–.890 .094–.932 .000–.005 .000–.004
SCI subsample
 Physical Function .056–.768 .025–.885 .000–.027 .000–.013
 Fatigue .072–.960 .164–.846 .000–.008 .000–.005
 Pain Interference .004–.892 .032–.929 .000–.019 .000–.010
 Anger .033–.989 .009–.971 .000–.015 .000–.017
 Anxiety .021–.855 .199–.926 .000–.018 .000–.004
 Depression .140–.952 .042–.924 .000–.008 .000–.013
 Emotional Support .039–.980 .058–.873 .000–.011 .000–.010
TBI subsample
 Physical Function .148–.873 .050–.998 .000–.013 .000–.022
 Fatigue .043–.835 .007–.777 .000–.015 .000–.022
 Pain Interference .064–.893 .305–.888 .000–.018 .000–.004
 Anger .191–.954 .016–.834 .000–.005 .000–.021
 Anxiety .120–.987 .021–.882 .000–.009 .000–.020
 Depression .001–.544 .348–.683 .002–.031 .001–.004
 Emotional Support .077–.880 .333–.879 .000–.012 .000–.003

Note: All values correspond to ranges across all items for a given PROMIS measure. M1 = Baseline OLR model. M2 = Uniform DIF OLR model. M3 = Non-uniform DIF OLR model. LRT p-value = p-value for LRT comparing two models. ΔR2 = difference in McFadden’s R2 between two models.

Differences in mean trait scores for the two response format conditions are shown in Table 4. Overall, t-tests revealed that participants in the interviewer-administered condition exhibited higher Emotional Support scores (higher positive trait) on average than participants in the self-administered condition. This pattern was observed in the SCI subsample, but not the TBI subsample. In the SCI subsample only, participants in the interviewer-administered condition also reported lower scores on average on the Depression short form (lower negative trait) than participants in the self-administered condition.

Table 4.

Differences in PROMIS Measures by Mode of Administration and Diagnosis

Self-Report Interview

n M SD n M SD D t df p

Full sample
  Physical Function 119 38.87 13.70 150 39.09 14.06 −0.22 −0.13 256.04 0.90
  Fatigue 123 51.68 9.87 152 50.58 9.89 1.10 0.91 261.38 0.36
  Pain Interference 120 54.37 9.58 151 53.78 9.81 0.60 0.50 257.88 0.62
  Anger 123 50.66 9.24 152 50.46 9.53 0.20 0.18 264.27 0.86
  Anxiety 123 52.67 8.85 152 52.52 8.99 0.15 0.14 262.76 0.89
  Depression 123 52.61 9.15 151 50.84 9.38 1.78 1.58 263.31 0.11
  Emotional Support 121 48.56 8.12 152 51.13 9.18 −2.57 −2.45 267.93 0.01
SCI subsample
  Physical Function 62 28.66 8.39 79 29.69 10.04 −1.03 −0.67 138.42 0.51
  Fatigue 66 53.31 9.12 81 50.71 9.60 2.60 1.68 141.58 0.10
  Pain Interference 66 56.36 10.49 80 54.94 10.39 1.42 0.82 138.26 0.41
  Anger 66 51.70 8.80 81 49.71 9.55 1.99 1.31 142.79 0.19
  Anxiety 66 53.14 8.86 81 51.87 8.68 1.27 0.87 137.89 0.39
  Depression 66 53.62 8.62 81 50.53 9.35 3.08 2.08 142.75 0.04
  Emotional Support 65 46.98 7.46 81 51.68 8.89 −4.69 −3.47 143.70 0.00
TBI subsample
  Physical Function 57 49.78 10.45 71 50.43 10.29 −0.64 −0.35 119.30 0.73
  Fatigue 54 51.95 7.77 71 52.47 9.01 −0.53 −0.35 121.04 0.73
  Pain Interference 57 49.44 9.65 71 51.30 9.49 −1.86 −1.09 119.25 0.28
  Anger 57 52.13 8.89 71 53.27 9.34 −1.14 −0.71 122.36 0.48
  Anxiety 57 51.45 9.67 70 51.18 9.47 0.27 0.16 118.77 0.87
  Depression 56 50.39 8.54 71 50.51 9.52 −0.12 −0.08 122.90 0.94
  Emotional Support 57 49.97 8.83 71 49.54 9.91 0.43 0.26 124.59 0.80

Note: D = Differences in means between administration format conditions; positive values reflect higher means in the self-report condition whereas negative values reflect higher means in the interview condition. Degrees of freedom calculated using Satterthwaite approximation.
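The Satterthwaite-approximated degrees of freedom in Table 4 correspond to Welch's unequal-variances t-test, which can be reproduced from the summary statistics alone. A Python sketch (the function name is ours), applied here to the full-sample Emotional Support row:

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Satterthwaite-approximated degrees of
    freedom for two independent groups summarized by mean, SD, and n."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2      # squared standard errors
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Full-sample Emotional Support comparison (self-report vs. interview)
t, df = welch_t(48.56, 8.12, 121, 51.13, 9.18, 152)  # t ≈ -2.45, df ≈ 268
```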

Discussion

PROMIS measures were calibrated and validated with participants responding to items via self-administration on a computer screen. However, this mode of administration may be inaccessible to individuals with physical limitations (e.g., in upper limb and/or fine motor functioning, or visual acuity) or cognitive limitations (e.g., in reading comprehension). Furthermore, individuals with significant mobility limitations (e.g., individuals with SCI who use power wheelchairs) may face additional barriers, such as a lack of accessible transportation to the lab or clinic (to complete PROMIS measures on a study or clinic tablet) or to facilities with public internet access (e.g., libraries). One way of making PROMIS measures more accessible to these individuals is through interviewer administration. However, it is important to ensure that these two modes of administration are truly equivalent. Here, we report on a study that randomly assigned 277 individuals with TBI or SCI (populations for whom interviewer-assisted administration may be necessary) to one of the two administration modes to assess whether measurement equivalence holds and whether observed mean differences emerge across modes.

The current results indicate that the PROMIS measures demonstrate measurement equivalence across modes of administration in individuals with SCI or TBI, both collectively and in each subsample separately. However, with regard to observed mean scores in the full sample, participants in the interviewer-administered group had significantly higher Emotional Support scores than the self-administered group. Significant differences also emerged between the interviewer-administered and self-administered conditions in the SCI subsample, with interviewed participants reporting higher Emotional Support and lower Depression scores. These findings are consistent with a CDC finding of elevated mean scores when the PROMIS global health measure was interviewer administered: Weaver et al. (2012) reported means of 53.3 and 53.9 for the Physical and Mental Health subscales, respectively, in contrast to self-administered means of 49.8 (Physical) and 49.0 (Mental) in the PROMIS validation sample. Taken together with the current results, this suggests that some PROMIS measures may elicit more "positive" responses (i.e., indicating better functioning) when interviewer administered. Because DIF was not detected in any of the PROMIS measures, respondents with similar levels of a given latent trait would be expected to have similar probabilities of selecting specific response categories regardless of the mode of administration (i.e., participants interpreted items the same way and were measured on the same trait). Therefore, the few observed differences in estimated latent trait scores may reflect socially desirable responding in the presence of an interviewer.
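The DIF procedure underlying these conclusions (the lordif approach; Choi, Gibbons, & Crane, 2011) compares nested ordinal logistic regression models — trait only vs. trait plus group and trait-by-group terms — via likelihood-ratio chi-square tests, and uses the change in McFadden's pseudo-R² as an effect size, with ΔR² < 0.035 conventionally treated as negligible DIF (Jodoin & Gierl, 2001). A toy Python sketch of that model comparison, using hypothetical log-likelihood values rather than the study's data (the chi-square tail is computed in closed form, valid for even df only):

```python
import math

def chi2_sf_even_df(x, df):
    """Chi-square survival function for even df, via the closed form
    exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    assert df > 0 and df % 2 == 0
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))

def dif_lr_test(ll_trait_only, ll_full, ll_null, df=2):
    """Likelihood-ratio DIF test: trait-only ordinal regression vs. a model
    adding group and trait-by-group terms (df = 2), plus the McFadden
    pseudo-R-squared change used as a DIF effect size."""
    g2 = 2.0 * (ll_full - ll_trait_only)            # LR chi-square statistic
    p = chi2_sf_even_df(g2, df)
    delta_r2 = (ll_trait_only - ll_full) / ll_null  # R2(full) - R2(trait-only)
    return g2, p, delta_r2

# Hypothetical log-likelihoods for one item (illustrative values only):
g2, p, dr2 = dif_lr_test(ll_trait_only=-250.0, ll_full=-247.9, ll_null=-255.0)
print(round(g2, 2), round(p, 3), round(dr2, 4))
# dr2 < 0.035 would be classified as negligible DIF under the Jodoin-Gierl cutoff.
```

In lordif itself the log-likelihoods come from fitted ordinal logistic regressions and the cutoffs can additionally be derived from Monte Carlo simulation, as the article notes; this sketch shows only the test statistic and effect-size arithmetic.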

Overall, these results indicate that interviewer administration is equivalent to self-administration and can be used with individuals who may need the accommodation, although some caution is warranted when interpreting Depression and Emotional Support scores for persons with SCI.

These findings, while noteworthy, must be considered within the context of this study's limitations. First, the study was cross-sectional and did not involve within-person comparisons; the findings reported here could be bolstered in future studies using between- and within-person longitudinal designs. Second, while we were able to enroll a sample comprising two diagnostic groups with varying injury characteristics for which interview accommodations can be particularly important, our results are nevertheless restricted by the demographic makeup of these individuals. Of note, our sample may have overrepresented persons with high educational attainment and underrepresented specific racial and ethnic groups, notably individuals who are Hispanic or Latino/a. Future studies investigating PROMIS mode of administration should continue to recruit samples representing a wide range of health conditions and demographic categories. Third, the SCI and TBI sample sizes fell below the rule of thumb suggested for DIF testing with logistic regression and likelihood ratio tests (n = 200 per group; Scott et al., 2009). Although we attempted to mitigate any potential power loss by using effect size measures and Monte Carlo-derived cutoffs in addition to likelihood ratio tests, future research should replicate these results with larger samples. Moreover, diagnosis-specific sample sizes were particularly modest, and thus results pertaining exclusively to SCI or TBI should be interpreted with extreme caution until further replication. Finally, the lordif package does not permit tests of the proportional odds assumption of ordinal logistic regression (OLR); while we have no reason to believe this assumption was violated, the DIF results are predicated on equal logistic coefficients across response categories.

Conclusion

Consistent with other recent findings, PROMIS measures appear to be DIF-free across the two modes of administration studied here, although some individuals with SCI may under-report Depression, and some individuals with SCI or TBI may over-report Emotional Support, on interviewer-administered measures (perhaps reflecting social desirability effects). The flexibility to use an alternate administration mode is critical when conducting research with special populations, such as individuals with physical or cognitive disabilities or individuals who cannot easily access internet-connected devices.

Acknowledgments

PROMIS® was funded with cooperative agreements from the National Institutes of Health (NIH) Common Fund Initiative (Northwestern University, PI: David Cella, PhD, U54AR057951, U01AR052177; Northwestern University, PI: Richard C. Gershon, PhD, U54AR057943; American Institutes for Research, PI: Susan (San) D. Keller, PhD, U54AR057926; State University of New York, Stony Brook, PIs: Joan E. Broderick, PhD and Arthur A. Stone, PhD, U01AR057948, U01AR052170; University of Washington, Seattle, PIs: Heidi M. Crane, MD, MPH, Paul K. Crane, MD, MPH, and Donald L. Patrick, PhD, U01AR057954; University of Washington, Seattle, PI: Dagmar Amtmann, PhD, U01AR052171; University of North Carolina, Chapel Hill, PI: Harry A. Guess, MD, PhD (deceased), Darren A. DeWalt, MD, MPH, U01AR052181; Children’s Hospital of Philadelphia, PI: Christopher B. Forrest, MD, PhD, U01AR057956; Stanford University, PI: James F. Fries, MD, U01AR052158; Boston University, PIs: Alan Jette, PT, PhD, Stephen M. Haley, PhD (deceased), and David Scott Tulsky, PhD (University of Michigan, Ann Arbor), U01AR057929; University of California, Los Angeles, PIs: Dinesh Khanna, MD (University of Michigan, Ann Arbor) and Brennan Spiegel, MD, MSHS, U01AR057936; University of Pittsburgh, PI: Paul A. Pilkonis, PhD, U01AR052155; Georgetown University, PIs: Carol. M. Moinpour, PhD (Fred Hutchinson Cancer Research Center, Seattle) and Arnold L. Potosky, PhD, U01AR057971; Children’s Hospital Medical Center, Cincinnati, PI: Esi M. Morgan DeWitt, MD, MSCE, U01AR057940; University of Maryland, Baltimore, PI: Lisa M. Shulman, MD, U01AR057967; and Duke University, PI: Kevin P. Weinfurt, PhD, U01AR052186). 
NIH Science Officers on this project have included Deborah Ader, PhD, Vanessa Ameen, MD (deceased), Susan Czajkowski, PhD, Basil Eldadah, MD, PhD, Lawrence Fine, MD, DrPH, Lawrence Fox, MD, PhD, Lynne Haverkos, MD, MPH, Thomas Hilton, PhD, Laura Lee Johnson, PhD, Michael Kozak, PhD, Peter Lyster, PhD, Donald Mattison, MD, Claudia Moy, PhD, Louis Quatrano, PhD, Bryce Reeve, PhD, William Riley, PhD, Peter Scheidt, MD, Ashley Wilder Smith, PhD, MPH, Susana Serrate-Sztein, MD, William Phillip Tonkins, DrPH, Ellen Werner, PhD, Tisha Wiley, PhD, and James Witter, MD, PhD. We thank Catriona Mowbray, PhD, RN at the Children's National Health System site, as well as Susan Massengill at Levine Children's Hospital and Rasheed Gbadegesin at Duke University for important contributions, and Karon Cook, PhD, and Ron Hays, PhD, for very helpful comments on an earlier draft. This article uses data developed under PROMIS; its contents do not necessarily represent an endorsement by the US Federal Government or PROMIS. See www.nihpromis.org for additional information on the PROMIS® initiative.

We would also like to acknowledge an Institutional Development Award (IDeA) from the National Institute of General Medical Sciences of the National Institutes of Health under grant number U54-GM104941 (PI: Binder-Macleod) which provided salary support for Dr. Cohen.

Footnotes

All authors have contributed significantly to the design, analysis and writing of this manuscript. The contents represent original work and have not been published elsewhere. No commercial party having a direct financial interest in the results of the research supporting this article has or will confer a benefit upon the authors or upon any organization with which the authors are associated.

Disclosure: Authors have no financial disclosures related to this project.

References

  1. Baylor C, Hula W, Donovan NJ, Doyle PJ, Kendall D, & Yorkston K (2011). An introduction to item response theory and Rasch models for speech-language pathologists. Am J Speech Lang Pathol, 20(3), 243–259. doi: 10.1044/1058-0360(2011/10-0079)
  2. Bingham CO 3rd, Bartlett SJ, Merkel PA, Mielenz TJ, Pilkonis PA, Edmundson L, . . . Sabharwal RK (2016). Using patient-reported outcomes and PROMIS in research and clinical applications: experiences from the PCORI pilot projects. Qual Life Res. doi: 10.1007/s11136-016-1246-1
  3. Bjorner JB, Chang CH, Thissen D, & Reeve BB (2007). Developing tailored instruments: item banking and computerized adaptive assessment. Quality of Life Research, 16(Suppl 1), 95–108. doi: 10.1007/s11136-007-9168-6
  4. Bjorner JB, Rose M, Gandek B, Stone AA, Junghaenel DU, & Ware JE (2014a). Method of administration of PROMIS scales did not significantly impact score level, reliability, or validity. Journal of Clinical Epidemiology, 67(1), 108–113.
  5. Bjorner JB, Rose M, Gandek B, Stone AA, Junghaenel DU, & Ware JE Jr. (2014b). Difference in method of administration did not significantly impact item response: an IRT-based analysis from the Patient-Reported Outcomes Measurement Information System (PROMIS) initiative. Qual Life Res, 23(1), 217–227. doi: 10.1007/s11136-013-0451-4
  6. Cella D, Gershon R, Lai JS, & Choi S (2007). The future of outcomes measurement: item banking, tailored short-forms, and computerized adaptive assessment. Qual Life Res, 16(Suppl 1), 133–141. doi: 10.1007/s11136-007-9204-6
  7. Cella D, Nowinski C, Peterman A, Victorson D, Miller D, Lai J-S, & Moy C (2011). The Neurology Quality-of-Life Measurement Initiative. Archives of Physical Medicine and Rehabilitation, 92(10, Suppl), S28–S36. doi: 10.1016/j.apmr.2011.01.025
  8. Cella D, Riley W, Stone A, Rothrock N, Reeve B, Yount S, . . . Hays R (2010a). The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005–2008. J Clin Epidemiol, 63(11), 1179–1194. doi: 10.1016/j.jclinepi.2010.04.011
  9. Cella D, Riley W, Stone A, Rothrock N, Reeve B, Yount S, . . . Hays R (2010b). The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005–2008. J Clin Epidemiol, 63(11), 1179–1194.
  10. Choi SW, Gibbons LE, & Crane PK (2011). lordif: An R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and Monte Carlo simulations. Journal of Statistical Software, 39(8), 1.
  11. Cook KF, Kallen MA, & Amtmann D (2009). Having a fit: impact of number of items and distribution of data on traditional criteria for assessing IRT's unidimensionality assumption. Quality of Life Research, 18(4), 447–460.
  12. Fayers PM, & Machin D (2016). Quality of Life (3rd ed.). Chichester: John Wiley & Sons, Ltd.
  13. Frank L, Basch E, & Selby JV (2014). The PCORI perspective on patient-centered outcomes research. JAMA, 312(15), 1513–1514.
  14. Gershon RC, Lai JS, Bode R, Choi S, Moy C, Bleck T, . . . Cella D (2012). Neuro-QOL: quality of life item banks for adults with neurological disorders: item development and calibrations based upon clinical and general population testing. Qual Life Res, 21(3), 475–486. doi: 10.1007/s11136-011-9958-8
  15. Gershon RC, Wagster MV, Hendrie HC, Fox NA, Cook KF, & Nowinski CJ (2013). NIH toolbox for assessment of neurological and behavioral function. Neurology, 80(11 Suppl 3), S2–6. doi: 10.1212/WNL.0b013e3182872e5f
  16. Grinnon ST, Miller K, Marler JR, Lu Y, Stout A, Odenkirchen J, & Kunitz S (2012). National Institute of Neurological Disorders and Stroke Common Data Element Project - approach and methods. Clin Trials, 9(3), 322–329. doi: 10.1177/1740774512438980
  17. Hahn EA, Cella D, Dobrez D, Shiomoto G, Marcus E, Taylor SG, . . . Webster K (2004). The talking touchscreen: a new approach to outcomes assessment in low literacy. Psychooncology, 13(2), 86–95. doi: 10.1002/pon.719
  18. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, & Conde JG (2009). Research electronic data capture (REDCap)—A metadata-driven methodology and workflow process for providing translational research informatics support. Journal of Biomedical Informatics, 42(2), 377–381. doi: 10.1016/j.jbi.2008.08.010
  19. Hays RD, Morales LS, & Reise SP (2000). Item response theory and health outcomes measurement in the 21st century. Medical Care, 38(9 Suppl), II28.
  20. Hu L-t, & Bentler PM (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55.
  21. Jodoin MG, & Gierl MJ (2001). Evaluating type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education, 14(4), 329–349.
  22. Jones RN, Tommet D, Ramirez M, Jensen R, & Teresi JA (2016). Differential item functioning in Patient Reported Outcomes Measurement Information System (PROMIS®) Physical Functioning short forms: Analyses across ethnically diverse groups. Psychological Test and Assessment Modeling, 58(2), 371–402.
  23. McHorney CA, Kosinski M, & Ware JE Jr. (1994). Comparisons of the costs and quality of norms for the SF-36 health survey collected by mail versus telephone interview: results from a national survey. Med Care, 32(6), 551–567.
  24. Muthén LK, & Muthén BO (1998–2017). Mplus User's Guide (8th ed.). Los Angeles, CA: Muthén & Muthén.
  25. Nederhof AJ (1985). Methods of coping with social desirability bias: A review. European Journal of Social Psychology, 15(3), 263–280.
  26. Reise SP, Scheines R, Widaman KF, & Haviland MG (2013). Multidimensionality and structural coefficient bias in structural equation modeling: A bifactor perspective. Educational and Psychological Measurement, 73(1), 5–26.
  27. Rutherford C, Costa D, Mercieca-Bebber R, Rice H, Gabb L, & King M (2016). Mode of administration does not cause bias in patient-reported outcome results: a meta-analysis. Qual Life Res, 25(3), 559–574. doi: 10.1007/s11136-015-1110-8
  28. Sackett DL, Rosenberg WM, Gray JM, Haynes RB, & Richardson WS (1996). Evidence based medicine: what it is and what it isn't. BMJ, 312(7023), 71–72.
  29. Samejima F (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement.
  30. Scott NW, Fayers PM, Aaronson NK, Bottomley A, de Graeff A, Groenvold M, . . . Sprangers MA (2009). A simulation study provided sample size guidance for differential item functioning (DIF) studies using short scales. Journal of Clinical Epidemiology, 62(3), 288–295.
  31. Sherer M, Nick TG, Sander AM, Melguizo M, Hanks R, Novack TA, . . . Tang X (2017). Groupings of persons with traumatic brain injury: A new approach to classifying traumatic brain injury in the post-acute period. J Head Trauma Rehabil, 32(2), 125–133. doi: 10.1097/HTR.0000000000000207
  32. Tulsky DS, Jette AM, Kisala PA, Kalpakjian C, Dijkers MP, Whiteneck G, . . . Zanca J (2012). Spinal cord injury-functional index: item banks to measure physical functioning in individuals with spinal cord injury. Arch Phys Med Rehabil, 93(10), 1722–1732. doi: 10.1016/j.apmr.2012.05.007
  33. Tulsky DS, Kisala PA, Victorson D, Carlozzi N, Bushnik T, Sherer M, . . . Cella D (2016). TBI-QOL: Development and calibration of item banks to measure patient reported outcomes following traumatic brain injury. J Head Trauma Rehabil, 31(1), 40–51. doi: 10.1097/HTR.0000000000000131
  34. Tulsky DS, Kisala PA, Victorson D, Choi SW, Gershon R, Heinemann AW, & Cella D (2015). Methodology for the development and calibration of the SCI-QOL item banks. J Spinal Cord Med, 38(3), 270–287. doi: 10.1179/2045772315Y.0000000034
  35. Tulsky DS, Kisala PA, Victorson D, Tate DG, Heinemann AW, Charlifue S, . . . Cella D (2015). Overview of the Spinal Cord Injury - Quality of Life (SCI-QOL) measurement system. J Spinal Cord Med, 38(3), 257–269. doi: 10.1179/2045772315Y.0000000023
  36. U.S. Food and Drug Administration (2006). Guidance for Industry: Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims. Retrieved from http://www.Fda.Gov/Cder/Guidance/5460dft.Pdf
  37. Wagner LI, Schink J, Bass M, Patel S, Diaz MV, Rothrock N, . . . Cella D (2015). Bringing PROMIS to practice: Brief and precise symptom screening in ambulatory cancer care. Cancer, 121(6), 927–934. doi: 10.1002/cncr.29104
  38. Wang M, Chen RC, Usinger DS, & Reeve BB (2017). Evaluating measurement invariance across assessment modes of phone interview and computer self-administered survey for the PROMIS measures in a population-based cohort of localized prostate cancer survivors. Qual Life Res. doi: 10.1007/s11136-017-1640-3
  39. Weaver KE, Forsythe LP, Reeve BB, Alfano CM, Rodriguez JL, Sabatino SA, . . . Rowland JH (2012). Mental and physical health-related quality of life among U.S. cancer survivors: population estimates from the 2010 National Health Interview Survey. Cancer Epidemiol Biomarkers Prev, 21(11), 2108–2117. doi: 10.1158/1055-9965.EPI-12-0740
  40. Williams DH, Levin HS, & Eisenberg HM (1990). Mild head-injury classification. Neurosurgery, 27(3), 422–428.
  41. Witter JP (2016). The promise of Patient-Reported Outcomes Measurement Information System-turning theory into reality: A uniform approach to patient-reported outcomes across rheumatic diseases. Rheum Dis Clin North Am, 42(2), 377–394. doi: 10.1016/j.rdc.2016.01.007
  42. Yu C-Y (2002). Evaluating cutoff criteria of model fit indices for latent variable models with binary and continuous outcomes. Los Angeles: University of California, Los Angeles.
