Archives of Rehabilitation Research and Clinical Translation. 2024 Jan 5;6(1):100320. doi: 10.1016/j.arrct.2024.100320

New Dizziness Impact Measures of Positional, Functional, and Emotional Status Were Supported for Reliability, Validity, and Efficiency

Daniel Deutscher a,b, Deanna Hayes a, Michael A Kallen c
PMCID: PMC10928300  PMID: 38482099

Highlights

  • Developed by applying modern methods to Dizziness Handicap Inventory items, the Dizziness Functional Status CAT and short form, the 4-item Dizziness Positional Status, and the 6-item Dizziness Emotional Status are new item-response theory-based patient-reported outcome measures.

  • The new measures were reliable and valid for assessing perceived dizziness effect on positional, functional, and emotional status.

  • This study supports a transition to patient-reported outcome measures that are based on modern measurement approaches to achieve the combined benefits of high score precision and efficiency by reducing response and administration burdens for patients and care providers.

KEYWORDS: Computerized Adaptive Testing, Dizziness, Dizziness Handicap Inventory, Functional status, Item response theory, Patient-reported outcome measures, Vertigo, Vestibular

Abstract

Objective

To calibrate the 25 items from the Dizziness Handicap Inventory (DHI) patient-reported outcome measure (PROM), using item response theory (IRT), into 1 or more item banks, and assess reliability, validity, and administration efficiency of scores derived from computerized adaptive test (CAT) or short form (SF) administration modes.

Design

Retrospective cohort study.

Setting

Outpatient rehabilitation clinics.

Participants

Patients (N=28,815; women=69%; mean age [SD]=60 [18]) included in a large national dataset and assessed for dizziness-related conditions who responded to all DHI items at intake.

Interventions

Not applicable.

Main Outcome Measures

IRT model assumptions of unidimensionality, local item independence, item fit, and presence of differential item functioning (DIF) were evaluated. Generated scores were assessed for reliability, validity, and administration efficiency.

Results

Patients were treated in 976 clinics from 49 US states for either vestibular-, brain injury-, or neck-related impairments. Three unidimensional item banks were calibrated, creating 3 distinct PROMs for Dizziness Functional Status (DFS, 13 items), Dizziness Positional Status (DPS, 4 items), and Dizziness Emotional Status (DES, 6 items). Two items did not fit into any domain. A DFS-CAT and a DFS 7-item SF were developed. Except for 2 items by age groups and 1 item by main impairment, no items were flagged for DIF; DIF impact was negligible. Median reliability estimates were 0.91, 0.72, and 0.79 for the DFS, DPS, and DES, respectively. Scores discriminated between patient groups in clinically logical ways and had a large effect size (>0.8), with acceptable floor and ceiling effects (<15%), except for a floor effect for DPS (20.4%). DFS-CAT scores were generated using a median of 8 items; they correlated highly with full-bank scores (r=0.99).

Conclusion

The 3 dizziness impact PROMs demonstrated moderate to high reliability, were valid, and highly responsive to change; thus, they are suitable for research and routine clinical administration.


Dizziness is a common health concern affecting roughly 15%-20% of adults.1 Vestibular disorders, such as benign paroxysmal positional vertigo and vestibular hypofunction, account for about 25% of dizziness cases.1,2 Vestibular and other dizziness-related conditions significantly affect quality of life,3 are costly,4,5 and render notably increased risk for falls,6, 7, 8 which are a growing public health problem with tremendous economic impact.9,10

The 25-item Dizziness Handicap Inventory (DHI) is a widely used patient-reported outcome measure (PROM).11,12 It was developed using classical test theory methods to quantify perceived levels of handicap due to dizziness, with higher scores indicating greater handicap (0-100 scale). Functional, emotional, and physical subdomains were developed using non-empirical methods,11 and subsequent research has been unsupportive of their validity.13, 14, 15, 16 Two short versions of the DHI have been developed: a version using 10 items, shown to have the highest item-total correlations,17 and a 13-item version,18 shown to lack essential unidimensionality.19

This study sought to apply modern psychometric methods to DHI items with the main objective of developing unidimensional measures of patient-perceived impact of dizziness. We evaluated the suitability of DHI items for item response theory (IRT)-based item banks and assessed the reliability, validity, and efficiency of the new PROMs, including the development of computerized adaptive testing (CAT) and short-form (SF) administration modes.

Methods

Design, data collection, and patient selection

This study assessed retrospective cohort data collected from outpatient rehabilitation clinics participating in a large national patient database system in the United States, the Focus on Therapeutic Outcomes (FOTO; Net Health Systems, Inc, Pittsburgh PA, USA) Patient Outcomes system.20 The study was exempt from informed consent after review by Solutions IRB, a private institutional review board located in Yarnell, Arizona.

Patients were 14 years old or older, which is consistent with multiple PROMs developed recently for assessing functional status related to different condition-specific impairments including musculoskeletal,21 edema,22,23 stroke,24 and balance confidence.25 Patients started an episode of care between January 2017 and December 2020, were discharged from therapy, completed all DHI items at intake (initial evaluation), and responded to a set of demographic and health questions described below as part of their standard assessment. Intake data served for measure development and reliability testing. Data from patients who completed DHI candidate items at both intake and discharge were used to assess known-groups validity, longitudinal validity, and score coverage.

Candidate items

The candidate item pool was composed of all 25 DHI items. DHI items ask patients about difficulty with aspects of functional status due to dizziness or unsteadiness. Original response categories were reversed to be coded as 1=yes, 2=sometimes, and 3=no to reflect lower dizziness impact (higher status) with higher scores. Established methods were used for Spanish translations of items.26

Measure development and reliability estimates

Evaluation of IRT model assumptions and fit of an IRT model

IRT is a method for calibrating and scoring items that considers 1 or more parameters on which items are characterized, including the “difficulty” of items (eg, 1-parameter/Rasch model). Additional parameters, such as item discrimination between different levels of ability, can be considered (2-parameter model). The IRT model estimates the probabilities that people with different levels of function would respond to each item response category. For example, the item “Does walking down a sidewalk increase your problem?” represents a relatively low-level activity for patients with dizziness. The probability of responding “yes” is high for those with very poor function but very low for those with excellent function. Similarly, the item “Because of your problem, is it difficult for you to do strenuous housework or yard work?” represents high-level activities; thus, the probability of responding “no” is low for those with poor function but high for those with excellent function. IRT calibration and scoring are based on such probabilities.27, 28, 29
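To make the sidewalk example concrete, the following minimal Python sketch computes response-category probabilities under a graded response model. It is an illustration, not the calibration code used in the study: the function name is hypothetical, the logistic form without a scaling constant is an assumption, and the SIDEWALK parameters (slope 1.78, thresholds -1.3 and -0.1) are the values reported in table 4.

```python
import math

def grm_category_probs(theta, slope, thresholds):
    """Return [P(X=1), ..., P(X=K)] for one item under a graded response model.

    theta: latent status (higher = better status, per the reversed coding).
    slope: item discrimination (a).
    thresholds: ordered thresholds (b1, b2, ...), one fewer than categories.
    Assumes the plain logistic metric (no 1.7 scaling constant).
    """
    # Cumulative probabilities of responding in category k or higher.
    cumulative = [1.0 / (1.0 + math.exp(-slope * (theta - b))) for b in thresholds]
    # P(X >= 1) is 1 and P(X >= K+1) is 0; adjacent differences give categories.
    bounds = [1.0] + cumulative + [0.0]
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]

# SIDEWALK item, parameters as reported in table 4 (1=yes, 2=sometimes, 3=no).
for theta in (-2.0, 0.0, 2.0):
    p_yes, p_sometimes, p_no = grm_category_probs(theta, 1.78, [-1.3, -0.1])
    print(f"theta={theta:+.1f}  yes={p_yes:.2f}  sometimes={p_sometimes:.2f}  no={p_no:.2f}")
```

Under these assumptions, a patient with very poor status (theta of -2) is most likely to answer "yes," and a patient with excellent status (theta of 2) is most likely to answer "no," which mirrors the pattern described above.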

We used the 2-parameter graded response model, which considers both the level of difficulty represented by each item and the item's discrimination.30 Responses to the 25 candidate items were assessed for critical assumptions of IRT modeling including monotonicity, unidimensionality, and local item independence.31, 32, 33 Each episode was analyzed separately; therefore, we refer to episodes of care as patients. Two samples of n=10,000 were randomly selected for analyses. A “developmental” sample was used for exploratory factor analysis (EFA). The second “testing” sample was used for confirmatory factor analysis (CFA), IRT calibration, reliability testing, DIF investigations, and CAT simulation. EFA and CFA were used to test whether the candidate items assessed 1 dominant trait of dizziness impact and thus were unidimensional enough to not disturb score estimates. Items that met these criteria were calibrated into an item bank using the IRTPRO software. Patients with extreme scores who responded in only the highest or lowest response categories to all items were excluded from the calibration data set to reduce item parameter estimation bias.34 Item fit to the IRT model was tested using 3 random samples of n=2000, n=1000, and n=500 to account for potentially overpowering the test when using the full CFA sample.35, 36, 37 Item misfit was defined as a ratio of the item fit χ2 value to degrees of freedom greater than 3 (χ2/df >3).37 Next, items were evaluated for DIF to assess if they elicited responses differently between patient subgroups (ie, displayed item bias).38 CAT administration was simulated, and a corresponding SF was constructed, only for item banks with more than 10 items. Reliability of scores based on each full item bank, as well as on CATs and SFs when relevant, was evaluated.

Criteria used to assess item monotonicity, the degree of multidimensionality, local item independence, and the overall fit of the unidimensional CFA model were the same as detailed in recent measure development studies.21,23,25 Briefly, we used a minimum of 10 observations per response category for item calibration.39,40 The software program Mplus was used for EFA and CFA model estimation.41 Items with item-rest correlations of <0.4 or factor loadings of <0.5 were considered for removal.42,43 Local item dependency was identified by flagging item pairs with residual correlations of >0.20 and then removing them iteratively until all remaining residual correlations were ≤0.20.44 Acceptable fit of the unidimensional CFA model was evaluated using a root mean square error of approximation (RMSEA) of <0.1, a comparative fit index (CFI) of >0.95, a Tucker-Lewis index (TLI) of >0.95, and a standardized root mean square residual (SRMR) of <0.08.45, 46, 47, 48, 49 When those standards were not fully met, correlated error modification indices were used to identify additional items showing local item dependence that could be removed to improve model fit. All score estimates and score-related statistics, unless specifically noted otherwise, are reported in (or based on) the theta metric (mean = 0 [SD = 1]).
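The 4 CFA fit thresholds can be expressed as a simple check. The sketch below is illustrative only; the function and variable names are hypothetical, and the fit values are those reported for the DPS domain in table 3.

```python
# Fit thresholds as stated in the Methods (RMSEA <0.1, CFI >0.95, TLI >0.95, SRMR <0.08).
FIT_CRITERIA = {
    "RMSEA": lambda value: value < 0.10,
    "CFI": lambda value: value > 0.95,
    "TLI": lambda value: value > 0.95,
    "SRMR": lambda value: value < 0.08,
}

def fit_criteria_met(fit_indices):
    """Map each fit index name to True if its threshold is met."""
    return {name: check(fit_indices[name]) for name, check in FIT_CRITERIA.items()}

# DPS values as reported in table 3 (before and after removing the item BED).
dps_iteration_1 = {"RMSEA": 0.150, "CFI": 0.937, "TLI": 0.873, "SRMR": 0.057}
dps_iteration_2 = {"RMSEA": 0.027, "CFI": 0.999, "TLI": 0.997, "SRMR": 0.007}

for label, indices in (("DPS iteration #1", dps_iteration_1),
                       ("DPS iteration #2", dps_iteration_2)):
    met = fit_criteria_met(indices)
    print(label, f"{sum(met.values())} of {len(met)} criteria met", met)
```

With the reported values, the first DPS iteration meets only the SRMR criterion, and the second meets all 4.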

DIF

Items were evaluated for DIF using the lordif R package, version 0.3-3. This package iteratively applies a hybrid logistic ordinal regression (LOR) and IRT approach, flagging items with a McFadden pseudo-R2 change of ≥0.02.50,51 DIF factors studied were sex (men, women), age group (14-55, 56-70, 71-89), acuity of symptoms (calendar days between onset of symptoms and intake; 0-21 days, 22-90 days, over 3 months), exercise history (seldom or never, 1-2 × /week, 3 × /week or more), and main impairment (vestibular, brain injury, or neck). DIF by main impairment was assessed both for the CFA sample and for the full study sample because of the low count of patients with neck impairments in the CFA sample (n=173; 1.7%), which may have underpowered the CFA test compared with the full sample (n=468; 1.6%).50 Items flagged for DIF were assessed for practical impact by quantifying changes in individual scores before and after adjustment for DIF and calculating the percentage of score differences greater than their associated unadjusted score standard error (SE).21 These and other descriptive statistics were calculated using the Stata software.
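The flagging rule can be sketched as follows. This is not the lordif implementation; it only illustrates the McFadden pseudo-R2 change criterion, under the assumption that the change is taken between a model conditioning on the trait alone and a model that also includes the group term and its interaction with the trait. The function names and the log-likelihood values are hypothetical.

```python
def mcfadden_pseudo_r2(loglik_model, loglik_null):
    """McFadden pseudo-R2: 1 minus the ratio of model to null log-likelihood."""
    return 1.0 - loglik_model / loglik_null

def dif_flag(loglik_null, loglik_trait_only, loglik_with_group, threshold=0.02):
    """Return (pseudo-R2 change, flagged) for one item and one DIF factor."""
    change = (mcfadden_pseudo_r2(loglik_with_group, loglik_null)
              - mcfadden_pseudo_r2(loglik_trait_only, loglik_null))
    return change, change >= threshold

# Hypothetical log-likelihoods, for illustration only.
change, flagged = dif_flag(-10000.0, -7000.0, -6950.0)
print(f"change={change:.3f} flagged={flagged}")   # small change, not flagged
change, flagged = dif_flag(-10000.0, -7000.0, -6700.0)
print(f"change={change:.3f} flagged={flagged}")   # larger change, flagged
```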

Reliability

Internal consistency reliability was assessed using Cronbach's alpha as well as McDonald's omega, which does not assume equal factor loadings across items.52 To provide more informative reliability estimates, we also calculated and plotted score-level-specific reliability estimates derived from IRT-based item and test information values.53 Additionally, traditional-type IRT-based overall test reliability estimates are reported, representing the frequency-weighted summary of the observed score-level-specific reliabilities, accounting for both score levels and the specific score distribution of the sample tested.54
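As a brief illustration of the first of these statistics, the sketch below computes Cronbach's alpha from a respondents-by-items table using the standard formula. The function name and the toy responses are hypothetical, and omega is not shown because it requires factor loadings from a fitted model.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-respondent lists (one value per item)."""
    n_items = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(n_items)]
    # Sample variance of the summed score across respondents.
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1.0 - sum(item_variances) / total_variance)

# Toy responses coded 1=yes, 2=sometimes, 3=no (hypothetical data).
toy_responses = [
    [1, 1, 2],
    [2, 2, 2],
    [3, 3, 3],
    [1, 2, 1],
    [3, 2, 3],
    [2, 3, 3],
]
print(round(cronbach_alpha(toy_responses), 2))
```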

CAT and SF development

CAT administration has been described in detail elsewhere.55,56 We used the R-based program Firestar to simulate CAT performance using the following stopping parameters: score SE, mean absolute change in score estimates over the last 3 items, and a minimum and maximum number of items administered.57 Mean and median number of items administered were assessed. The correlation between CAT scores and those based on the full item bank was calculated and plotted for visual inspection. CAT-based overall reliability was calculated as: reliability = 1 − (median SE²/SD²).58 Score-level-specific reliabilities and SEs were plotted across score levels. SF items were selected if they had the highest sum of item information (ie, highest reliability) across the expected theta metric score range and by reviewing items’ clinical content. Overall and score-level-specific reliabilities were calculated.
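The reliability formula above can be checked with a few lines of Python. This is a sketch with a hypothetical function name; it assumes the same relationship is applied to the overall SEs of the full banks as to the median CAT SE, and it uses the T score metric (SD=10) with the SE values reported in the Results.

```python
def reliability_from_se(se, sd):
    """Reliability = 1 - SE^2 / SD^2, with SE and SD in the same metric."""
    return 1.0 - (se ** 2) / (sd ** 2)

T_SCORE_SD = 10.0

# Overall SEs in T score points as reported in the Results.
reported_se = {"DFS": 2.8, "DPS": 5.3, "DES": 4.6, "DFS-CAT": 3.0}
for measure, se in reported_se.items():
    print(measure, round(reliability_from_se(se, T_SCORE_SD), 2))
```

Under these assumptions the values reproduce the reported reliabilities of 0.92, 0.72, 0.79, and 0.91.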

Known-groups construct validity

Known-groups construct validity is supported if scores can discriminate among groups of patients in expected clinical patterns.59 Better outcomes (more change) were expected for patients who were younger, had greater acuity of symptoms, exercised more, and had fewer comorbidities.21,59,60 Theta scores were linearly transformed to the T score metric (mean=50, SD=10). We conducted 1-way analysis of covariance for each known-groups variable, using T score change as the dependent variable and intake T score as the covariate. Post hoc Scheffe analyses were used for significant main factors. Group differences observed in the expected direction were interpreted as supporting construct validity.
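The linear transformation from the theta metric to the T score metric can be sketched as follows; the function names are hypothetical. The same scaling applies to standard errors, so an SE of 0.3 theta points corresponds to about 3 T score points.

```python
def theta_to_t_score(theta):
    """Convert a theta score (mean 0, SD 1) to a T score (mean 50, SD 10)."""
    return 50.0 + 10.0 * theta

def theta_se_to_t_se(se_theta):
    """Standard errors scale by the SD ratio only (no shift)."""
    return 10.0 * se_theta

print(theta_to_t_score(-1.5))   # one and a half SDs below the mean
print(theta_se_to_t_se(0.3))
```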

Change effect size and score coverage

Longitudinal validity was assessed as the magnitude of score change over time by calculating an effect-size statistic, along with its 95% CI, using T scores: (mean discharge score − mean intake score)/intake SD.61 We considered effect sizes of 0.2-0.49, 0.5-0.79, and ≥0.8 as representing small, moderate, and high levels of responsiveness, respectively; responsiveness is 1 aspect of longitudinal validity.62,63 We defined maximally acceptable floor and ceiling effects as 15% of scores at the minimum or maximum of the score range, respectively.64,65
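A minimal sketch of these definitions follows. The function names and toy scores are hypothetical, the use of the sample SD is an assumption, the label for effect sizes below 0.2 is not defined in the text, and the CI calculation is omitted because its method is not described here.

```python
from statistics import mean, stdev

def change_effect_size(intake_scores, discharge_scores):
    """(mean discharge - mean intake) / intake SD, using the sample SD."""
    return (mean(discharge_scores) - mean(intake_scores)) / stdev(intake_scores)

def responsiveness_label(effect_size):
    """Classify responsiveness using the cut points stated in the text."""
    if effect_size >= 0.8:
        return "high"
    if effect_size >= 0.5:
        return "moderate"
    if effect_size >= 0.2:
        return "small"
    return "below small"  # not labeled in the text; placeholder only

def exceeds_acceptable_limit(percent_at_extreme, limit=15.0):
    """True if a floor or ceiling effect is above the 15% limit."""
    return percent_at_extreme > limit

# Toy T scores (hypothetical data).
print(change_effect_size([40.0, 50.0, 60.0], [50.0, 60.0, 70.0]))
print(responsiveness_label(0.97), exceeds_acceptable_limit(20.4))
```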

Results

Patients

The total patient cohort (N=28,815) and the cohort of discharged patients who completed all 25 candidate DHI items at intake and discharge (table 1) were managed in either hospital-based outpatient clinics (61%) or private practice settings (39%) by 2178 clinicians from 976 clinics throughout 49 US states. Patients had vestibular- (88.6%), brain injury- (9.8%), or neck-related (1.6%) impairments. Other comorbidities reported by patients at intake are described in table 1. Because the data available to us were not linked to electronic health records, we could not identify specific diagnostic codes.

Table 1.

Health and demographic patient characteristics for the full cohort (N=28,815), and the cohort used for validity testing (n=17,308)*

Characteristic Value for:
Full Cohort Cohort With Responses at Both Intake and Discharge
Age
 Mean (SD) 60.0 (17.9) 60.8 (17.9)
 Minimum/maximum 14/89 14/89
Age groups, y
 14–<18 814 (2.8) 497 (2.9)
 18–<45 4697 (16.3) 2603 (15.0)
 45–<65 9346 (32.4) 5355 (30.9)
 65–89 13,958 (48.4) 8853 (51.1)
Sex: women 12,196 (59.9)
Acuity of symptoms
 0–7 d 3291 (11.4) 1877 (10.8)
 8–14 d 2992 (10.4) 1720 (9.9)
 15–21 d 2976 (10.3) 1779 (10.3)
 22–90 d 7726 (26.8) 4709 (27.2)
 91 d–6 mo 3531 (12.3) 2203 (12.7)
 >6 mo 8299 (28.8) 5020 (29.0)
Payer source
 Indemnity insurance 669 (2.3) 415 (2.4)
 Medicaid 1256 (4.4) 728 (4.2)
 Medicare B under age 65 857 (3.0) 494 (2.9)
 Medicare B age 65 or above 9796 (34.0) 6419 (37.1)
 Worker's compensation 556 (1.9) 367 (2.1)
 No fault/Auto insurance 437 (1.5) 276 (1.6)
 HMO, preferred provider 11,567 (40.1) 6528 (37.7)
 Other (litigation, Medicare A, Medicare C, patient, early intervention, school, no charge, commercial insurance) 3677 (12.8) 2081 (12.0)
Surgeries for problem being treated
 None 27,621 (95.9) 16,592 (95.9)
 1 714 (2.5) 439 (2.5)
 2 220 (0.8) 131 (0.8)
 ≥3 260 (0.9) 146 (0.8)
Exercise history
 At least 3 times/wk 10,032 (34.8) 6094 (35.2)
 1 or 2 times/wk 6912 (24.0) 4124 (23.8)
 Seldom or never 11,871 (41.2) 7090 (41.0)
Medication for main condition 10,259 (35.6) 6202 (35.8)
Previous treatment 18,413 (63.9) 11,048 (63.8)
Language used to respond to the PROM
 English 28,624 (99.3) 17,176 (99.2)
 Spanish 191 (0.7) 132 (0.8)
No. of comorbidities
 Mean (SD) 5.3 (3.4) 5.4 (3.4)
 Median (IQR) 5 (4) 5 (4)
Specific comorbidities
 Allergy 8922 (31.0) 5428 (31.4)
 Angina 517 (1.8) 301 (1.7)
 Anxiety or panic disorders 7028 (24.4) 4071 (23.5)
 Arthritis 11,365 (39.4) 7013 (40.5)
 Asthma 3424 (11.9) 2062 (11.9)
 Back pain (neck pain, low back pain, degenerative disk disease) 14,659 (50.9) 8835 (51.0)
 Cancer 3198 (11.1) 1990 (11.5)
 Chronic obstructive pulmonary disease (COPD) 1302 (4.5) 747 (4.3)
 Congestive heart failure 1976 (6.9) 1225 (7.1)
 Depression 6638 (23.0) 3844 (22.2)
 Diabetes type I or II 4509 (15.6) 2726 (15.7)
 Gastrointestinal 6065 (21.0) 3671 (21.2)
 Headaches 12,657 (43.9) 7525 (43.5)
 Hearing 3747 (13.0) 2346 (13.6)
 Hepatitis/HIV-AIDS 303 (1.1) 185 (1.1)
 High blood pressure 11,773 (40.9) 7290 (42.1)
 Heart attack (myocardial infarction) 1020 (3.5) 599 (3.5)
 Incontinence 2653 (9.2) 1620 (9.4)
 Kidney, bladder, prostate, or urination problems 3840 (13.3) 2316 (13.4)
 Neurological disease 617 (2.1) 365 (2.1)
 Obesity (BMI≥30) 11,418 (39.6) 6805 (39.3)
 Osteoporosis 2851 (9.9) 1760 (10.2)
 Other disorders 1045 (3.6) 606 (3.5)
 Peripheral vascular disease (or claudication) 571 (2.0) 351 (2.0)
 Previous collisions (motor vehicle, work, or other collision) 3983 (13.8) 2401 (13.9)
 Previous surgery 10,825 (37.6) 6616 (38.2)
 Prosthesis/implants 2206 (7.7) 1369 (7.9)
 Sleep dysfunction 6342 (22.0) 3800 (22.0)
 Stroke or transient ischemic attack 1958 (6.8) 1212 (7.0)
 Visual impairment 4623 (16.0) 2897 (16.7)
 Pacemaker 762 (2.6) 481 (2.8)
 Seizures 657 (2.3) 376 (2.2)

Abbreviations: BMI, body mass index; HMO, health maintenance organization; IQR, interquartile range.

Values are presented as numbers (percentages) of patients unless noted otherwise. Total percentages for categorical variables may range from 99.9 to 100.1 because of rounding.

Median and IQR are reported for the number of comorbidities because of the skewed distribution.

Measure development and reliability estimates

IRT model assumptions and fit of an IRT model

All items had 1690 or more observations per response category. Means of total summed scores for each item response category increased monotonically. EFA results supported 3 dominant factors, with the first, second, and third factor eigenvalues explaining 45%, 9%, and 6% of the model's variance, respectively. Item content for factors 1-3 was identified as representing Dizziness Functional Status (DFS, 13 items), Dizziness Positional Status (DPS, 5 items), and Dizziness Emotional Status (DES, 7 items) (appendix 1). For each factor, CFA results displayed no item pairs with residual correlations above 0.2. Factor loadings were all 0.57 or higher (appendix 2). For each domain, item-rest correlations were all above 0.4. The DHI items that served as the item pool, along with their assigned domains, are described in table 2.

Table 2.

Dizziness impact item pool

Domain Label Item Description
DFS ALONE Because of your problem, are you afraid to stay home alone?
SIDEWALK Does walking down a sidewalk increase your problem?
HANDICAPPED Because of your problem, do you feel handicapped?
AFRAID Because of your problem, are you afraid to leave your home without having someone accompany you?
WALKAISL* Does walking down an aisle in a supermarket increase your problem?
WALKSELF* Because of your problem, is it difficult for you to go for a walk by yourself?
WALKDARK Because of your problem, is it difficult for you to walk around your house in the dark?
SOCIAL* Does your problem significantly restrict your participation in social activities such as going out to dinner, going to movies, dancing, or to parties?
INTERFERE* Does your problem interfere with your job or household responsibilities?
TRAVEL*,† Because of your problem, do you restrict your travel for business or recreation?
SPORTS* Does performing more ambitious activities such as sports, dancing, or household chores such as sweeping or putting away dishes increase your problem?
HEIGHTS Because of your problem, do you avoid heights?
HOUSE* Because of your problem, is it difficult for you to do strenuous housework or yard work?
DPS TURNOVER Does turning over in bed increase your problem?
LOOKUP Does looking up increase your problem?
BENDING Does bending over increase your problem?
QUICK Do quick movements of your head increase your problem?
DES INTOXICATED Because of your problem, are you afraid that people may think you are intoxicated?
STRESS Has your problem placed stress on your relation with members of your family or friends?
DEPRESSED Because of your problem, are you depressed?
EMBARRASSED Because of your problem, have you been embarrassed in front of others?
CONCENTRATE Because of your problem, is it difficult for you to concentrate?
FRUSTRATED Because of your problem, do you feel frustrated?
- BED§ Because of your problem, do you have difficulty getting into or out of bed?
- READINGǁ Because of your problem, do you have difficulty reading?

NOTE. Response categories: 1=yes, 2=sometimes, and 3=no.

Abbreviations: DES, dizziness emotional status; DFS, dizziness functional status; DPS, dizziness positional status.

* Items selected for the DFS 7-item short form.

† DFS computerized adaptive test starting item.

‡ DES screening items.

§ The item BED loaded on the DPS domain and was removed because of a high modification index with TURNOVER.

ǁ The item READING loaded on the DES domain and was removed because of a high modification index with CONCENTRATE.

For DFS, all unidimensional fit criteria were met, allowing retention of all 13 items. For DPS and DES, only 1 of the 4 CFA fit criteria was met initially. After removing the item BED from DPS and the item READING from DES, all 4 CFA fit criteria were met for both domains. CFA fit criteria by iteration are presented in table 3. For the 13 DFS and 6 DES items, no items demonstrated misfit to the IRT model; all ratios of χ2 to degrees of freedom were 2.2 or less for the 3 random samples tested. For the remaining 4 DPS items, the ratio of χ2 to degrees of freedom was 3 or less for all items in the random sample of n=500 but exceeded 3 for 3 of the 4 items at n=1000 and for all 4 items at n=2000, a pattern consistent with the sensitivity of the fit test to sample size. Overall, these results supported the 3 item banks for assessing 3 unique unidimensional constructs of dizziness impact. The final item fit to the 3 IRT models, as well as average item difficulty and item discrimination, are displayed in table 4.

Table 3.

CFA item selection process

Domain # of Items RMSEA CFI TLI SRMR
DFS iteration #1 13 0.093 0.964 0.956 0.049
DPS iteration #1* 5 0.150 0.937 0.873 0.057
DPS iteration #2 4 0.027 0.999 0.997 0.007
DES iteration #1† 7 0.131 0.930 0.895 0.064
DES iteration #2 6 0.088 0.975 0.958 0.039
Recommended fit of the unidimensional CFA model.39, 40, 41, 42, 43 <0.1 >0.95 >0.95 <0.08

NOTE. The selected criteria were not met for RMSEA, CFI, and TLI in DPS iteration #1 and DES iteration #1.

Abbreviations: CFI, comparative fit index; DES, dizziness emotional status; DFS, dizziness functional status; DPS, dizziness positional status; RMSEA, root mean square error of approximation; SRMR, standardized root mean square residual; TLI, Tucker-Lewis index.

* The item BED had a high modification index with the item TURNOVER, and the lowest factor loading, and was removed for the subsequent iteration.

† The item READING had a high modification index with CONCENTRATE and was removed for the subsequent iteration.

Table 4.

Dizziness impact domains item parameters and fit

Domain Label Slope (a) Location Threshold b1 Threshold b2 Item Fit χ2/d.f. (n=2000) Item Fit χ2/d.f. (n=1000) Item Fit χ2/d.f. (n=500)
DFS ALONE 1.52 -1.92 -2.4 -1.4 1.6 1.0 1.1
SIDEWALK 1.78 -0.70 -1.3 -0.1 0.9 1.2 1.1
HANDICAPPED 1.79 -0.69 -1.2 -0.2 1.0 1.3 0.8
AFRAID 2.24 -0.68 -1.1 -0.3 1.3 1.1 0.6
WALKAISL 1.97 -0.64 -1.2 -0.1 1.1 1.0 0.8
WALKSELF 2.48 -0.34 -0.7 0.0 1.0 1.0 0.9
WALKDARK 1.54 -0.15 -0.6 0.3 1.4 1.6 1.0
SOCIAL 2.49 -0.05 -0.5 0.4 1.5 1.4 1.5
INTERFERE 2.70 0.13 -0.3 0.6 1.4 0.9 1.3
TRAVEL 2.85 0.14 -0.2 0.5 1.4 0.9 0.8
SPORTS 2.16 0.27 -0.3 0.8 1.7 1.0 2.1
HEIGHTS 1.14 0.33 0.0 0.7 1.1 0.7 1.0
HOUSE 2.59 0.38 0.0 0.8 2.0 0.7 1.3
DPS TURNOVER 1.19 0.17 -0.4 0.7 6.0 1.5 1.1
LOOKUP 1.81 0.47 0.0 1.0 8.8 3.7 2.0
BENDING 1.72 0.76 0.1 1.4 5.1 3.2 1.2
QUICK 2.76 1.09 0.6 1.6 7.8 5.7 3.0
DES INTOXICATED 1.35 -1.04 -1.4 -0.7 1.2 1.0 1.1
STRESS 2.08 -0.98 -1.3 -0.6 1.9 1.2 1.4
DEPRESSED 2.10 -0.88 -1.4 -0.4 2.0 2.0 1.6
EMBARRASSED 2.11 -0.85 -1.1 -0.6 2.1 0.9 1.5
CONCENTRATE 1.62 -0.05 -0.6 0.5 2.2 1.5 1.3
FRUSTRATED 2.06 0.85 0.3 1.4 1.5 1.8 0.9

NOTES. Slope: The discrimination parameter. Items with larger slopes are better able to distinguish between individuals with higher and lower levels of the latent trait being measured.

Location: The average of threshold parameters. Represents the average item difficulty level. Items are sorted from low to high location.

Threshold parameters: Also named location parameters. Each parameter represents the points along the latent trait at which the corresponding response categories are the most discriminating or informative, that is, the location where the respondent has a 50% probability of endorsing either of the consecutive response categories. For example, b1 represents the threshold between response category 1 and 2, b2 represents the threshold between response category 2 and 3, and so on.

Item Fit: Represents the ratio between the item chi-squared and the degrees of freedom for the chi-squared test. A ratio >3 suggests item misfit. Fit indices above 3 are marked in bold.

Abbreviations: DES, dizziness emotional status; DFS, dizziness functional status; DPS, dizziness positional status.
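The item misfit rule (χ2/d.f. ratio >3) can be applied directly to the DPS rows of table 4. The sketch below is illustrative; the function and variable names are hypothetical, and the ratios are copied from the table.

```python
MISFIT_CUTOFF = 3.0

# χ2/d.f. ratios for the 4 DPS items by random sample size, from table 4.
DPS_ITEM_FIT = {
    "TURNOVER": {2000: 6.0, 1000: 1.5, 500: 1.1},
    "LOOKUP": {2000: 8.8, 1000: 3.7, 500: 2.0},
    "BENDING": {2000: 5.1, 1000: 3.2, 500: 1.2},
    "QUICK": {2000: 7.8, 1000: 5.7, 500: 3.0},
}

def misfitting_items(fit_table, sample_size, cutoff=MISFIT_CUTOFF):
    """Return labels of items whose ratio exceeds the cutoff at a sample size."""
    return [label for label, ratios in fit_table.items()
            if ratios[sample_size] > cutoff]

for n in (2000, 1000, 500):
    print(n, misfitting_items(DPS_ITEM_FIT, n))
```

With these values, all 4 DPS items exceed the cutoff at n=2000, 3 items exceed it at n=1000, and none exceeds it at n=500.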

DIF

No items demonstrated DIF for sex, acuity of symptoms, or exercise history. Two items were flagged for DIF by age group (INTERFERE for DFS and CONCENTRATE for DES). The item CONCENTRATE was also flagged for DIF by main impairment for DES. Differences between adjusted and unadjusted scores for age group and main impairment all were below the individual scores’ unadjusted SE, suggesting no practical impact of statistically significant DIF.

Reliability

Cronbach's alpha, coefficient omega, and IRT-based overall reliability were 0.90, 0.90, and 0.92 for DFS; 0.69, 0.70, and 0.72 for DPS; and 0.77, 0.78, and 0.79 for DES item banks, respectively. Corresponding IRT-based SEs for DFS, DPS, and DES were 2.8, 5.3, and 4.6 T score points, respectively. For DFS, a 7-item SF demonstrated an overall reliability estimate of 0.89 (overall SE=3.4 T score points). Score-level-specific reliability levels for the DFS-CAT, DPS, and DES are displayed in figure 1. Conversion tables that associate DFS SF, DPS, and DES raw summed scores with IRT-based T scores and corresponding individual-score SEs are available upon request.

Fig 1.


(A-C) Reliability and standard error by T score level. Score reliability (left axis) and the standard error (right axis) are shown as a function of DFS (A), DPS (B), and DES (C) scores using the T score metric. Each dot represents 1 or more patients (n=10,000). The horizontal dashed lines relate only to the left vertical axis and mark the 0.9 reliability threshold recommended for individual score discrimination and the 0.7 reliability threshold recommended for group score discrimination.66

DFS-CAT score equivalency, precision, and efficiency

For the DFS-CAT, the item TRAVEL was selected as the starting item based on its high discrimination parameter and moderate item difficulty (table 4). CAT stopping criteria were SE <0.3 theta points, mean absolute change in the score estimate <0.05 theta points over the last 3 items, and a minimum of 5 to a maximum of 10 items administered. DFS-CAT item usage ranged from 12% to 100%. CAT and full-bank scores correlated highly (Pearson correlation coefficient=0.989; fig 2). The overall CAT score median reliability was 0.91 (SE=3.0 T score points). The mean and median number of items administered by the CAT were 8.03 and 8, respectively. CATs administering 5, 6, 7, and 8-10 items accounted for 11.7%, 10.9%, 13.5%, and 63.9% of administrations, respectively; 69.6% of CATs administered fewer than 10 items.
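The stopping criteria can be sketched as a single decision function. This is not the Firestar implementation. How the criteria are combined is an assumption: here the CAT stops at the maximum number of items, or once the minimum is reached and either precision criterion is satisfied. The function and parameter names are hypothetical.

```python
def cat_should_stop(theta_estimates, se_estimates,
                    min_items=5, max_items=10,
                    se_target=0.3, change_target=0.05, window=3):
    """Decide whether to stop after the items administered so far.

    theta_estimates / se_estimates: one value per administered item, in order.
    Assumption: either precision criterion is sufficient once min_items is met.
    """
    n_items = len(theta_estimates)
    if n_items >= max_items:
        return True
    if n_items < min_items:
        return False
    if se_estimates[-1] < se_target:
        return True
    # Mean absolute change over the last `window` items needs window+1 estimates.
    recent = theta_estimates[-(window + 1):]
    changes = [abs(later - earlier) for earlier, later in zip(recent, recent[1:])]
    return len(changes) == window and sum(changes) / window < change_target

# Hypothetical trajectory: estimates stabilize although SE stays above 0.3.
thetas = [0.00, 0.50, 0.52, 0.53, 0.55, 0.56]
ses = [0.90, 0.60, 0.45, 0.40, 0.37, 0.35]
print(cat_should_stop(thetas, ses))
```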

Fig 2.


Correlation between DFS CAT and the full 13-item bank T scores. Pearson correlation coefficient =0.989. The red diagonal line represents perfect agreement between scores derived from the full item bank and the CAT. Each dot represents 1 or more patients (N=10,000).

Known-groups construct validity

T score changes by patient groups when controlling for scores at intake for the DFS, DPS, and DES are displayed in table 5. For DFS, change scores discriminated between patient groups in expected ways for all variables tested, with more change (better outcomes) for patients who were younger, had greater acuity of symptoms, exercised more, and had fewer comorbidities (P<.001). For the DPS and DES, these trends were also observed by acuity of symptoms and number of comorbidities (P<.001), but not clearly observed for age group (P=.027 and P=.007, respectively) and prior exercise history (P=.752 and P=.044, respectively).

Table 5.

Known-groups construct validity; dizziness impact change* by patient group

Variable Groups No. (%) of Patients DFS P (F test) DFS ANCOVA Marginal Mean B (95% CI) DPS P (F test) DPS ANCOVA Marginal Mean B (95% CI) DES P (F test) DES ANCOVA Marginal Mean B (95% CI)
Age 14-55 y old 5517 (31.9) P<.001 9.3 (9.0-9.5) P=.027 10.3 (10.1-10.6) P=.007 7.0 (6.8-7.2)
56-70 y old 5737 (33.1) 9.4 (9.2-9.6) 10.7 (10.5-11.0) 7.2 (7.0-7.4)
71-89 y old 6054 (35.0) 8.3 (8.1-8.5) 10.6 (10.4-10.9) 6.8 (6.6-7.0)
Acuity of symptoms 0-21 d 5376 (31.1) P<.001 11.9 (11.7-12.1) P<.001 12.8 (12.6-13.0) P<.001 8.9 (8.7-9.1)
22-90 d 4709 (27.2) 9.3 (9.0-9.5) 11.0 (10.8-11.3) 7.3 (7.1-7.5)
3 or more months 7223 (41.7) 6.6 (6.4-6.8) 8.6 (8.4-8.8) 5.3 (5.2-5.5)
Prior exercise history At least 3 × /week 6094 (35.2) P<.001 9.3 (9.1-9.5) P=.752 10.6 (10.4-10.8) P=.044 7.0 (6.8-7.2)
1-2 × /week 4124 (23.8) 9.1 (8.9-9.4) 10.6 (10.4-10.9) 7.2 (7.0-7.4)
Seldom or never 7090 (41.0) 8.6 (8.4-8.8) 10.5 (10.3-10.7) 6.8 (6.7-7.0)
Number of comorbidities 0-3 5862 (33.9) P<.001 10.4 (10.2-10.6) P<.001 12.0 (11.7-12.2) P<.001 7.8 (7.6-8.0)
4-6 5771 (33.3) 9.1 (8.9-9.3) 10.6 (10.4-10.8) 7.1 (6.9-7.3)
7 or more 5675 (32.8) 7.3 (7.1-7.5) 9.1 (8.9-9.3) 6.1 (5.9-6.3)

Abbreviations: ANCOVA, analyses of covariance; DES, dizziness emotional status; DFS, dizziness functional status; DPS, dizziness positional status.

Positive change demonstrates improved status (less impact) of dizziness on positional, functional, or emotional status.

n=17,308.

The marginal mean quantifies the expected change in dizziness effect for the mean dizziness effect at intake of 50 on the T score scale. B quantifies the unstandardized change in dizziness effect for each patient group while controlling for dizziness effect at intake.

Change effect size and score coverage

Change effect sizes (95% CI) for the DFS, DPS, and DES were 0.97 (0.95-0.99), 1.29 (1.26-1.31), and 0.81 (0.78-0.83), respectively, all considered high. At intake, DFS, DPS, and DES floor effects were 1.3%, 20.4%, and 2.4%, respectively, and ceiling effects were 5.1%, 2.8%, and 9.2%, respectively.
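The two kinds of summary reported here can be sketched as follows. The example assumes the change effect size is the mean change divided by the standard deviation of intake scores (the standardized effect size described by Kazis et al.61), and that floor and ceiling effects are the percentages of patients at the lowest and highest possible scores. The scores and score limits below are made up for illustration.

```python
import numpy as np

def change_effect_size(intake, discharge):
    """Mean change divided by the SD of intake scores (assumed definition)."""
    intake = np.asarray(intake, dtype=float)
    discharge = np.asarray(discharge, dtype=float)
    return float(np.mean(discharge - intake) / np.std(intake, ddof=1))

def floor_ceiling_pct(scores, min_score, max_score):
    """Percentage of scores at the lowest and at the highest possible score."""
    scores = np.asarray(scores, dtype=float)
    floor_pct = 100.0 * np.mean(np.isclose(scores, min_score))
    ceiling_pct = 100.0 * np.mean(np.isclose(scores, max_score))
    return float(floor_pct), float(ceiling_pct)

# Hypothetical T scores for five patients; 30 and 60 stand in for the
# lowest and highest attainable scores of an imaginary short form.
intake = [30, 40, 50, 60, 30]
discharge = [45, 50, 55, 70, 40]

effect_size = change_effect_size(intake, discharge)
floor_pct, ceiling_pct = floor_ceiling_pct(intake, min_score=30, max_score=60)
print(f"effect size = {effect_size:.2f}")
print(f"floor = {floor_pct:.0f}%, ceiling = {ceiling_pct:.0f}%")
```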

Discussion

The new IRT-based DFS, DPS, and DES PROMs that assess patient-perceived impact of dizziness on physical functional, positional, and emotional status were supported for reliability, validity, and efficiency. Strengths of this work include use of a large and diverse sample of patients with predominantly vestibular conditions managed by numerous clinicians and clinics across 49 states, supporting the PROMs’ external validity. Both CAT (for DFS) and static versions were developed, offering simplified as well as advanced administration modes and increasing their feasibility for use in diverse clinical settings. The lack of DIF impact on score estimates for all factors investigated supports the ability of the 3 PROMs to provide unbiased estimates across multiple patient groups. Longitudinal validity was supported by strong change effect sizes (>0.8). EFA and CFA results confirmed prior findings,13-15,67 that is, that the DHI represents multiple domains that differ from those originally hypothesized,11 and, thus, that using a single score derived from all 25 DHI items poses a threat to validity.

The 13 DFS items include content pertinent to patients with vestibular hypofunction, an observation that needs to be validated using combined outcomes and diagnostics. The DFS-CAT demonstrated improved efficiency (an average of only 8 items administered) vs administering the full item bank while maintaining high reliability estimates and nearly perfect correlation with scores from the full item bank. These results support a successful balance between low response burden and high precision. Future research could examine whether additional items representing higher and lower levels of dizziness impact would result in further improvements at the ends of the measurement scale.
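The efficiency gain of a CAT comes from choosing, at each step, the item that is most informative at the current score estimate and stopping once the estimate is precise enough. The sketch below simulates this for a graded response model with maximum-information item selection, expected a posteriori scoring, and a standard-error stopping rule. The item parameters, the 3-category response format, the stopping value of 0.4, and all function names are assumptions made for illustration; they are not the DFS item bank parameters or the settings used in this study.

```python
import numpy as np

GRID = np.linspace(-4.0, 4.0, 161)  # theta grid used for the posterior

def _cumulative(theta, a, b):
    """P(response >= k) for k = 0..K, padded with 1 and 0 at the ends."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    b = np.asarray(b, dtype=float)
    inner = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    ones = np.ones((theta.size, 1))
    return np.hstack([ones, inner, 0.0 * ones])

def category_probs(theta, a, b):
    """Graded response model category probabilities, shape (n_theta, K)."""
    cum = _cumulative(theta, a, b)
    return cum[:, :-1] - cum[:, 1:]

def item_information(theta, a, b):
    """Fisher information of one graded response item at each theta."""
    cum = _cumulative(theta, a, b)
    dcum = a * cum * (1.0 - cum)  # derivative; zero at the padded ends
    probs = cum[:, :-1] - cum[:, 1:]
    dprobs = dcum[:, :-1] - dcum[:, 1:]
    return np.sum(dprobs ** 2 / np.clip(probs, 1e-12, None), axis=1)

def simulate_cat(true_theta, bank, rng, se_stop=0.4):
    """Administer items until the posterior SD drops to se_stop or the bank is used up."""
    posterior = np.exp(-0.5 * GRID ** 2)  # standard normal prior (unnormalized)
    remaining = list(range(len(bank)))
    administered = []
    theta_hat, se = 0.0, 1.0
    while remaining:
        # Pick the unused item with the most information at the current estimate.
        infos = [item_information(theta_hat, *bank[i])[0] for i in remaining]
        item = remaining.pop(int(np.argmax(infos)))
        a, b = bank[item]
        # Simulate the patient's response from the true theta.
        response = rng.choice(len(b) + 1, p=category_probs(true_theta, a, b)[0])
        # Update the posterior and the EAP estimate with its posterior SD.
        posterior = posterior * category_probs(GRID, a, b)[:, response]
        posterior = posterior / posterior.sum()
        theta_hat = float(np.sum(GRID * posterior))
        se = float(np.sqrt(np.sum((GRID - theta_hat) ** 2 * posterior)))
        administered.append(item)
        if se <= se_stop:
            break
    return theta_hat, se, administered

# Hypothetical 13-item bank: (discrimination, ordered thresholds) per item.
rng = np.random.default_rng(1)
bank = [(rng.uniform(1.5, 3.0), np.sort(rng.normal(0.0, 1.0, size=2)))
        for _ in range(13)]

theta_hat, se, used = simulate_cat(0.5, bank, np.random.default_rng(2))
print(f"items administered: {len(used)} of {len(bank)}")
print(f"estimated T score: {50 + 10 * theta_hat:.1f} (SE {10 * se:.1f})")
```

A stricter stopping value trades more items for more precision, which is the balance between response burden and precision described above.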

The 4 DPS items were representative of problems frequently expressed by patients with benign paroxysmal positional vertigo. While we did not have access to diagnostic codes or physical exam data, this clinical observation is supported by past research in which such data were available.68,69

Given that DPS and DES contain only 4 and 6 items, respectively, we were not surprised by reliability estimates meeting only the 0.7 threshold recommended for group score discrimination.66 This can be addressed by calibrating more items that represent low and high positional and emotional status, possibly also reducing the 20% DPS floor effect below the recommended 15% threshold.
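A rough sense of how much reliability might improve with more items can be obtained from the Spearman-Brown prophecy formula. This is a classical test theory approximation that assumes the added items behave like the existing ones; IRT-based reliability depends on where along the scale the new items are informative, so the numbers below are only an illustrative guide. The function names are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor."""
    return length_factor * reliability / (1.0 + (length_factor - 1.0) * reliability)

def length_factor_needed(reliability, target):
    """Factor by which test length must grow to reach the target reliability."""
    return target * (1.0 - reliability) / (reliability * (1.0 - target))

# Starting from a reliability of 0.7 with 4 items (values chosen for illustration).
doubled = spearman_brown(0.7, 2.0)
factor_for_090 = length_factor_needed(0.7, 0.9)
print(f"reliability with 8 comparable items: {doubled:.2f}")
print(f"items needed for 0.90: about {4 * factor_for_090:.0f}")
```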

Construct validity of each PROM score was supported: score discrimination between patient groups was clinically logical overall, except that younger age and more frequent prior exercise did not clearly show the expected patterns for DPS and DES. These results may reflect the fact that the original hypotheses for construct validity testing were based on studies of physical function outcomes, which corresponded well with the DFS construct validity results.

Limitations

A noted limitation of this study is the use of retrospective data from patients who were selected to complete the full set of candidate items, possibly introducing a risk of patient-selection bias. If such bias existed, it may have been mitigated by the large cohort (N>28,000). Additionally, we were unable to validate that DFS and DPS items pertain to vestibular hypofunction and benign paroxysmal positional vertigo, respectively, because the dataset lacked diagnostic codes. Future studies that integrate PROM data with electronic medical record data would address this limitation. Finally, this study aimed to identify 1 or more unidimensional domains. Future research is needed to assess whether multidimensional IRT modeling structures can be supported, in which the 3 unidimensional domains identified are modeled under 1 multidimensional domain, considering the potential impact on clinical decision making for patients with different clinical presentations.70

Conclusions

In conclusion, the DFS, DPS, and DES demonstrated moderate to high reliability, were valid, and were highly responsive to change. They are suitable for use in routine clinical practice and further research.

Suppliers

  • a. IRTPRO 5.2. 2020 Vector Psychometric Group, LLC, Chapel Hill, NC, USA.

  • b. Muthén, L.K. and Muthén, B.O. (1998-2017). Mplus User's Guide. Eighth Edition. Los Angeles, CA: Muthén & Muthén. 3463 Stoner Avenue, Los Angeles, CA 90066.

  • c. R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

  • d. StataCorp. 2023. Stata Statistical Software: Release 18. College Station, TX: StataCorp LLC.

  • e. choi-phd/Firestar: Computerized Adaptive Testing (CAT) Simulation Program for Polytomous IRT Models. https://rdrr.io/github/choi-phd/Firestar/.

Acknowledgments

The authors thank the many physical therapists, who provided ongoing input to help ensure these measures are meaningful to patients and clinical practice, and Jerry Mioduski, MS, for data curation and support.

Footnotes

This study was performed at Net Health Systems, Inc, Pittsburgh, PA.

This material was presented as a poster presentation at the 2022 American Congress of Rehabilitation Medicine conference.

This study did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Disclosures: Dr Kallen provided consulting services to Net Health Systems, Inc, the company that owns the Focus on Therapeutic Outcomes Patient Outcomes system that gathers and manages the data analyzed in this manuscript. The authors have no other conflicts to declare.

Appendix 1. Exploratory Factor Analysis; Factor Loadings(a) for All 25 DHI Items

| Item | Factor 1 | Factor 2 | Factor 3 | Domain |
| --- | --- | --- | --- | --- |
| WALKSELF | 0.995 | -0.014 | -0.178 | DFS |
| SIDEWALK | 0.831 | 0.136 | -0.161 | DFS |
| AFRAID | 0.791 | -0.039 | 0.039 | DFS |
| WALKAISL | 0.754 | 0.086 | -0.015 | DFS |
| HOUSE | 0.672 | 0.087 | 0.140 | DFS |
| TRAVEL | 0.623 | -0.045 | 0.287 | DFS |
| WALKDARK | 0.616 | 0.187 | 0.008 | DFS |
| SPORTS | 0.541 | 0.177 | 0.207 | DFS |
| SOCIAL | 0.534 | -0.006 | 0.349 | DFS |
| INTERFERE | 0.512 | 0.023 | 0.372 | DFS |
| ALONE | 0.507 | 0.013 | 0.218 | DFS |
| HEIGHTS | 0.477 | 0.207 | 0.032 | DFS |
| HANDICAPPED | 0.448 | -0.061 | 0.387 | DFS |
| TURNOVER | 0.064 | 0.739 | -0.131 | DPS |
| QUICK | -0.031 | 0.677 | 0.242 | DPS |
| LOOKUP | 0.071 | 0.607 | 0.114 | DPS |
| BENDING | 0.316 | 0.555 | 0.005 | DPS |
| BED | 0.22 | 0.534 | -0.006 | DPS |
| CONCENTRATE | -0.105 | 0.057 | 0.841 | DES |
| STRESS | 0.035 | -0.153 | 0.759 | DES |
| DEPRESSED | 0.02 | -0.087 | 0.715 | DES |
| EMBARRASSED | 0.074 | -0.057 | 0.659 | DES |
| READING | -0.028 | 0.089 | 0.651 | DES |
| FRUSTRATED | 0.102 | 0.017 | 0.644 | DES |
| INTOXICATED | 0.09 | 0.061 | 0.497 | DES |

Abbreviations: DES, dizziness emotional status; DFS, dizziness functional status; DHI, Dizziness Handicap Inventory; DPS, dizziness positional status.

(a) Values represent oblique rotated factor loadings.
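The Domain column in Appendix 1 matches the factor on which each item loads most strongly. The sketch below applies that rule to a few rows copied from the table and also flags items whose two largest loadings are close together. The 0.2 gap used for the flag and the function name `assign_domain` are hypothetical choices made for illustration, not criteria reported in this study.

```python
# Loadings on (Factor 1, Factor 2, Factor 3), copied from Appendix 1.
loadings = {
    "WALKSELF": (0.995, -0.014, -0.178),
    "HANDICAPPED": (0.448, -0.061, 0.387),
    "TURNOVER": (0.064, 0.739, -0.131),
    "BENDING": (0.316, 0.555, 0.005),
    "CONCENTRATE": (-0.105, 0.057, 0.841),
    "INTOXICATED": (0.09, 0.061, 0.497),
}

# Factor order follows the Domain column of Appendix 1.
FACTOR_DOMAINS = ("DFS", "DPS", "DES")

def assign_domain(row, cross_loading_gap=0.2):
    """Return the domain of the largest absolute loading and a flag that is
    True when the second-largest loading is within cross_loading_gap of it."""
    order = sorted(range(len(row)), key=lambda i: abs(row[i]), reverse=True)
    primary, secondary = order[0], order[1]
    gap = abs(row[primary]) - abs(row[secondary])
    return FACTOR_DOMAINS[primary], gap < cross_loading_gap

for item, row in loadings.items():
    domain, cross_loads = assign_domain(row)
    note = " (possible cross-loading)" if cross_loads else ""
    print(f"{item}: {domain}{note}")
```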

Appendix 2. Confirmatory Factor Analysis; Factor Loadings(a)

| Domain | Item | Loading |
| --- | --- | --- |
| DFS | TRAVEL | 0.854 |
| | INTERFERE | 0.839 |
| | HOUSE | 0.825 |
| | WALKSELF | 0.825 |
| | SOCIAL | 0.82 |
| | AFRAID | 0.799 |
| | SPORTS | 0.776 |
| | WALKAISL | 0.764 |
| | SIDEWALK | 0.742 |
| | HANDICAPPED | 0.707 |
| | WALKDARK | 0.670 |
| | ALONE | 0.653 |
| | HEIGHTS | 0.566 |
| DPS | QUICK | 0.827 |
| | LOOKUP | 0.727 |
| | BENDING | 0.701 |
| | TURNOVER | 0.582 |
| DES | EMBARRASSED | 0.784 |
| | DEPRESSED | 0.767 |
| | FRUSTRATED | 0.760 |
| | STRESS | 0.756 |
| | CONCENTRATE | 0.686 |
| | INTOXICATED | 0.638 |

Abbreviations: DES, dizziness emotional status; DFS, dizziness functional status; DPS, dizziness positional status.

(a) Values represent oblique rotated factor loadings.

References

  • 1. Neuhauser HK. The epidemiology of dizziness and vertigo. Handb Clin Neurol. 2016;137:67–82. doi: 10.1016/B978-0-444-63437-5.00005-4.
  • 2. Hall CD, Herdman SJ, Whitney SL, et al. Vestibular rehabilitation for peripheral vestibular hypofunction: an updated clinical practice guideline from the Academy of Neurologic Physical Therapy of the American Physical Therapy Association. J Neurol Phys Ther. 2022;46:118–177. doi: 10.1097/NPT.0000000000000382.
  • 3. Weidt S, Bruehl AB, Straumann D, Hegemann SC, Krautstrunk G, Rufer M. Health-related quality of life and emotional distress in patients with dizziness: a cross-sectional approach to disentangle their relationship. BMC Health Serv Res. 2014;14:317. doi: 10.1186/1472-6963-14-317.
  • 4. Kovacs E, Wang X, Grill E. Economic burden of vertigo: a systematic review. Health Econ Rev. 2019;9:37. doi: 10.1186/s13561-019-0258-2.
  • 5. Wang X, Strobl R, Holle R, Seidl H, Peters A, Grill E. Vertigo and dizziness cause considerable more health care resource use and costs: results from the KORA FF4 study. J Neurol. 2019;266:2120–2128. doi: 10.1007/s00415-019-09386-x.
  • 6. Agrawal Y, Carey JP, Della Santina CC, Schubert MC, Minor LB. Disorders of balance and vestibular function in US adults: data from the National Health and Nutrition Examination Survey, 2001-2004. Arch Intern Med. 2009;169:938–944. doi: 10.1001/archinternmed.2009.66.
  • 7. Herdman SJ, Blatt P, Schubert MC, Tusa RJ. Falls in patients with vestibular deficits. Am J Otol. 2000;21:847–851.
  • 8. Schlick C, Schniepp R, Loidl V, Wuehr M, Hesselbarth K, Jahn K. Falls and fear of falling in vertigo and balance disorders: a controlled cross-sectional study. J Vestib Res. 2016;25:241–251. doi: 10.3233/VES-150564.
  • 9. Burns ER, Stevens JA, Lee R. The direct costs of fatal and non-fatal falls among older adults - United States. J Safety Res. 2016;58:99–103. doi: 10.1016/j.jsr.2016.05.001.
  • 10. Florence CS, Bergen G, Atherly A, Burns E, Stevens J, Drake C. Medical costs of fatal and nonfatal falls in older adults. J Am Geriatr Soc. 2018;66:693–698. doi: 10.1111/jgs.15304.
  • 11. Jacobson GP, Newman CW. The development of the Dizziness Handicap Inventory. Arch Otolaryngol Head Neck Surg. 1990;116:424–427. doi: 10.1001/archotol.1990.01870040046011.
  • 12. Mutlu B, Serbetcioglu B. Discussion of the Dizziness Handicap Inventory. J Vestib Res. 2013;23:271–277. doi: 10.3233/VES-130488.
  • 13. Asmundson GJ, Stein MB, Ireland D. A factor analytic study of the Dizziness Handicap Inventory: does it assess phobic avoidance in vestibular referrals? J Vestib Res. 1999;9:63–68.
  • 14. Kurre A, Bastiaenen CH, van Gool CJ, Gloor-Juzi T, de Bruin ED, Straumann D. Exploratory factor analysis of the Dizziness Handicap Inventory (German version). BMC Ear Nose Throat Disord. 2010;10:3. doi: 10.1186/1472-6815-10-3.
  • 15. Perez N, Garmendia I, Garcia-Granero M, Martin E, Garcia-Tapia R. Factor analysis and correlation between Dizziness Handicap Inventory and dizziness characteristics and impact on Quality of Life scales. Acta Otolaryngol Suppl. 2001;545:145–154. doi: 10.1080/000164801750388333.
  • 16. Van De Wyngaerde KM, Lee MK, Jacobson GP, Pasupathy K, Romero-Brufau S, McCaslin DL. The component structure of the Dizziness Handicap Inventory (DHI): a reappraisal. Otol Neurotol. 2019;40:1217–1223. doi: 10.1097/MAO.0000000000002365.
  • 17. Jacobson GP, Calder JH. A screening version of the Dizziness Handicap Inventory (DHI-S). Am J Otol. 1998;19:804–808.
  • 18. Tesio L, Alpini D, Cesarani A, Perucca L. Short form of the Dizziness Handicap Inventory: construction and validation through Rasch analysis. Am J Phys Med Rehabil. 1999;78:233–241. doi: 10.1097/00002060-199905000-00009.
  • 19. Ardic FN, Tumkaya F, Akdag B, Senol H. The subscales and short forms of the Dizziness Handicap Inventory: are they useful for comparison of the patient groups? Disabil Rehabil. 2017;39:2119–2122. doi: 10.1080/09638288.2016.1219923.
  • 20. Werneke MW, Deutscher D, Grigsby D, Tucker CA, Mioduski JE, Hayes D. Telerehabilitation during the COVID-19 pandemic in outpatient rehabilitation settings: a descriptive study. Phys Ther. 2021;101:pzab110. doi: 10.1093/ptj/pzab110.
  • 21. Deutscher D, Kallen MA, Hayes D, et al. The lower extremity physical function patient-reported outcome measure was reliable, valid, and efficient for patients with musculoskeletal impairments. Arch Phys Med Rehabil. 2021;102:1576–1587. doi: 10.1016/j.apmr.2021.02.005.
  • 22. Deutscher D, Kallen MA, Hayes D, et al. Lower quadrant edema patient-reported outcome measure is reliable, valid, and efficient for patients with lymphatic and venous disorders. Phys Ther. 2023;103:pzad083. doi: 10.1093/ptj/pzad083.
  • 23. Deutscher D, Hayes D, Cook KF, et al. Upper quadrant edema patient-reported outcome measure is reliable, valid, and efficient for patients with lymphatic and venous disorders. Phys Ther. 2021;101. doi: 10.1093/ptj/pzab219.
  • 24. Deutscher D, Kallen MA, Hayes D, et al. The stroke upper and lower extremity physical function measures were supported for score reliability, validity, and administration efficiency for patients poststroke. Phys Ther. 2023;103:pzad107. doi: 10.1093/ptj/pzad107.
  • 25. Deutscher D, Kallen MA, Werneke MW, Mioduski JE, Hayes D. Reliability, validity, and efficiency of an item response theory-based balance confidence patient-reported outcome measure. Phys Ther. 2023;103:pzad058. doi: 10.1093/ptj/pzad058.
  • 26. Lewin-Epstein N, Sagiv-Schifter T, Shabtai EL, Shmueli A. Validation of the 36-item short-form Health Survey (Hebrew version) in the adult population of Israel. Med Care. 1998;36:1361–1370. doi: 10.1097/00005650-199809000-00008.
  • 27. Cook KF. A conceptual introduction to item response theory. 2013. Available at: https://www.youtube.com/watch?v=SrdbllMYq8M. Accessed January 25, 2024.
  • 28. Cook KF, O'Malley KJ, Roddey TS. Dynamic assessment of health outcomes: time to let the CAT out of the bag? Health Serv Res. 2005;40(5 Pt 2):1694–1711. doi: 10.1111/j.1475-6773.2005.00446.x.
  • 29. Reeve BB. Item response theory modeling in health outcomes measurement. Expert Rev Pharmacoecon Outcomes Res. 2003;3:131–145. doi: 10.1586/14737167.3.2.131.
  • 30. Samejima F. Estimation of ability using a response pattern of graded responses. Psychometrika. 1969; Monograph 17.
  • 31. Edelen MO, Reeve BB. Applying item response theory (IRT) modeling to questionnaire development, evaluation, and refinement. Qual Life Res. 2007;16(Suppl 1):5–18. doi: 10.1007/s11136-007-9198-0.
  • 32. Hays RD, Morales LS, Reise SP. Item response theory and health outcomes measurement in the 21st century. Med Care. 2000;38(9 Suppl):II28–II42. doi: 10.1097/00005650-200009002-00007.
  • 33. Reise SP, Ainsworth AT, Haviland MG. Item response theory: fundamentals, applications, and promise in psychological research. Curr Dir Psychol Sci. 2005;14:95–101.
  • 34. Reise SP, Rodriguez A, Spritzer KL, Hays RD. Alternative approaches to addressing non-normal distributions in the application of IRT models to personality measures. J Pers Assess. 2018;100:363–374. doi: 10.1080/00223891.2017.1381969.
  • 35. Crisan DR, Tendeiro JN, Meijer RR. Investigating the practical consequences of model misfit in unidimensional IRT models. Appl Psychol Meas. 2017;41:439–455. doi: 10.1177/0146621617695522.
  • 36. Drasgow F, Levine MV, Tsien S, Williams B, Mead AD. Fitting polytomous item response theory models to multiple-choice tests. Appl Psychol Meas. 1995;19:143–166.
  • 37. Stark S, Chernyshenko OS, Drasgow F, Williams BA. Examining assumptions about item responding in personality assessment: should ideal point methods be considered for scale development and scoring? J Appl Psychol. 2006;91:25–39. doi: 10.1037/0021-9010.91.1.25.
  • 38. Kleinman M, Teresi JA. Differential item functioning magnitude and impact measures from item response theory models. Psychol Test Assess Model. 2016;58:79–98.
  • 39. Choi SW, Cook KF, Dodd BG. Parameter recovery for the partial credit model using MULTILOG. J Outcome Meas. 1997;1:114–142.
  • 40. Linacre JM. Optimizing rating scale category effectiveness. J Appl Meas. 2002;3:85–106.
  • 41. Muthén LK, Muthén BO. Mplus User's Guide. 8th ed. Los Angeles, CA: Muthén & Muthén; 1998-2017.
  • 42. Zijlmans EAO, Tijmstra J, van der Ark LA, Sijtsma K. Item-score reliability in empirical-data sets and its relationship with other item indices. Educ Psychol Meas. 2018;78:998–1020. doi: 10.1177/0013164417728358.
  • 43. Cutillo L. Parametric and multivariate methods. In: Ranganathan S, Gribskov M, Nakai K, Schönbach C, editors. Encyclopedia of bioinformatics and computational biology. Oxford: Academic Press; 2019. pp. 738–746.
  • 44. Cella D, Yount S, Rothrock N, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH roadmap cooperative group during its first two years. Med Care. 2007;45(5 Suppl 1):S3–11. doi: 10.1097/01.mlr.0000258615.42478.55.
  • 45. Bentler PM. Comparative fit indexes in structural models. Psychol Bull. 1990;107:238–246. doi: 10.1037/0033-2909.107.2.238.
  • 46. Browne MW, Cudeck R. Alternative ways of assessing model fit. In: Bollen KA, Long JA, editors. Testing structural equation models. Newbury Park, CA: Sage Publications; 1993. pp. 136–162.
  • 47. Hu LT, Bentler P. Cutoff criteria for fit indices in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Modeling. 1999;6:1–55.
  • 48. Kline RB. Principles and practice of structural equation modeling. 2nd ed. New York: Guilford Press; 2005.
  • 49. Reeve BB, Hays RD, Bjorner JB, et al. Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Med Care. 2007;45(5 Suppl 1):S22–S31. doi: 10.1097/01.mlr.0000250483.85507.04.
  • 50. Choi SW, Gibbons LE, Crane PK. lordif: an R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and Monte Carlo simulations. J Stat Softw. 2011;39:1–30. doi: 10.18637/jss.v039.i08.
  • 51. Choi SW, Gibbons LE, Crane PK. lordif: logistic ordinal regression differential item functioning using IRT. R package version 0.3-3. 2016. Available at: https://CRAN.R-project.org/package=lordif. Accessed January 25, 2024.
  • 52. Deng L, Chan W. Testing the difference between reliability coefficients alpha and omega. Educ Psychol Meas. 2017;77:185–203. doi: 10.1177/0013164416658325.
  • 53. Cappelleri JC, Jason Lundy J, Hays RD. Overview of classical test theory and item response theory for the quantitative assessment of items in developing patient-reported outcomes measures. Clin Ther. 2014;36:648–662. doi: 10.1016/j.clinthera.2014.04.006.
  • 54. Green BF, Bock RD, Humphreys LG, Linn RL, Reckase MD. Technical guidelines for assessing computerized adaptive tests. J Educ Meas. 1984;21:347–360.
  • 55. Chakravarty EF, Bjorner JB, Fries JF. Improving patient reported outcomes using item response theory and computerized adaptive testing. J Rheumatol. 2007;34:1426–1431.
  • 56. Hart DL, Deutscher D, Werneke MW, Holder J, Wang YC. Implementing computerized adaptive tests in routine clinical practice: experience implementing CATs. J Appl Meas. 2010;11:288–303.
  • 57. Choi SW. Firestar: computerized adaptive testing simulation program for polytomous IRT models. Appl Psychol Meas. 2009;33:644–645.
  • 58. Pilkonis PA, Yu L, Dodds NE, Johnston KL, Maihoefer CC, Lawrence SM. Validation of the depression item bank from the Patient-Reported Outcomes Measurement Information System (PROMIS) in a three-month observational study. J Psychiatr Res. 2014;56:112–119. doi: 10.1016/j.jpsychires.2014.05.010.
  • 59. Deutscher D, Hart DL, Stratford PW, Dickstein R. Construct validation of a knee-specific functional status measure: a comparative study between the United States and Israel. Phys Ther. 2011;91:1072–1084. doi: 10.2522/ptj.20100175.
  • 60. Deutscher D, Werneke MW, Hayes D, et al. Impact of risk adjustment on provider ranking for patients with low back pain receiving physical therapy. J Orthop Sports Phys Ther. 2018;48:637–648. doi: 10.2519/jospt.2018.7981.
  • 61. Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care. 1989;27(3 Suppl):S178–S189. doi: 10.1097/00005650-198903001-00015.
  • 62. Jette DU, Jette AM. Physical therapy and health outcomes in patients with spinal impairments. Phys Ther. 1996;76:930–941; discussion 942-5. doi: 10.1093/ptj/76.9.930.
  • 63. Jette DU, Jette AM. Physical therapy and health outcomes in patients with knee impairments. Phys Ther. 1996;76:1178–1187. doi: 10.1093/ptj/76.11.1178.
  • 64. Terwee CB, Bot SD, de Boer MR, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60:34–42. doi: 10.1016/j.jclinepi.2006.03.012.
  • 65. Wamper KE, Sierevelt IN, Poolman RW, Bhandari M, Haverkamp D. The Harris hip score: do ceiling effects limit its usefulness in orthopedics? Acta Orthop. 2010;81:703–707. doi: 10.3109/17453674.2010.537808.
  • 66. Bland JM, Altman DG. Cronbach's alpha. BMJ. 1997;314:572. doi: 10.1136/bmj.314.7080.572.
  • 67. Valancius D, Ulyte A, Masiliunas R, et al. Validation and factor analysis of the Lithuanian version of the Dizziness Handicap Inventory. J Int Adv Otol. 2019;15:447–453. doi: 10.5152/iao.2019.6977.
  • 68. Whitney SL, Marchetti GF, Morris LO. Usefulness of the Dizziness Handicap Inventory in the screening for benign paroxysmal positional vertigo. Otol Neurotol. 2005;26:1027–1033. doi: 10.1097/01.mao.0000185066.04834.4e.
  • 69. Zamyslowska-Szmytke E, Politanski P, Jozefowicz-Korczynska M. Dizziness Handicap Inventory in clinical evaluation of dizzy patients. Int J Environ Res Public Health. 2021;18:2210. doi: 10.3390/ijerph18052210.
  • 70. Immekus JC, Snyder KE, Ralston PA. Multidimensional item response theory for factor structure assessment in educational psychology research. Front Educ. 2019;4.

Articles from Archives of Rehabilitation Research and Clinical Translation are provided here courtesy of Elsevier
