J Affect Disord. Author manuscript; available in PMC: 2022 Oct 1.
Published in final edited form as: J Affect Disord. 2021 Jun 12;293:36–42. doi: 10.1016/j.jad.2021.06.005

Differential Item Functioning of the Beck Anxiety Inventory in a Rural, Multi-Ethnic Cohort

Joshua M Garcia a, Matthew W Gallagher a, Sid E O’Bryant b, Luis D Medina a,*
PMCID: PMC8349838  NIHMSID: NIHMS1716080  PMID: 34166907

Abstract

Background:

Evaluating measurement bias is vital to ensure equivalent assessment across diverse groups. One approach for evaluating test bias, differential item functioning (DIF), assesses item-level bias across specified groups by comparing item-level responses between groups that have the same overall score. Previous DIF studies of the Beck Anxiety Inventory (BAI) have only assessed bias across age, sex, and disease duration in monolingual samples. We expand this literature through DIF analysis of the BAI across age, sex, education, ethnicity, cognitive status, and test language.

Methods:

BAI data from a sample (n = 527, mean age = 61.4 ± 12.7, mean education = 10.9 ± 4.3, 69.3% female, 41.9% Hispanic/Latin American) from rural communities in West Texas, USA, were analyzed. Hybrid item response theory (IRT)/ordinal logistic regression DIF analyses were conducted across dichotomized demographic grouping factors. Mann-Whitney U tests and Hedges’ g standardized mean differences were calculated before and after adjusting for the impact of DIF.

Results:

Significant DIF was demonstrated in 10/21 items. An adverse impact of DIF was not identified when demographics were assessed individually. Adverse DIF was identified for only one participant (1/527, 0.2%) when all demographics were aggregated.

Limitations:

These results might not be generalizable to a sample with broader racial representation, more severe cognitive impairment, and higher levels of anxiety.

Conclusions:

Minimal item-level bias was identified across the demographic factors considered. These results support prior evidence that the BAI is valid for assessing anxiety across age and sex, while contributing new evidence of its clinical relevance across education, ethnicity, cognitive status, and English/Spanish test language.

Keywords: psychometrics, bias, ethnic groups, measurement, anxiety disorders, rural population

Introduction

Health behavior and psychological measures are subject to psychometric bias and variable interpretation by respondents on the basis of cultural context (Azocar et al., 2001; Ellis, 1989; Ramirez Gomez et al., 2017; Statucka and Cohn, 2019). Self-report measures of psychiatric symptoms rely on the assumption that the items function similarly across groups of people, but measurement equivalence is often not evaluated or reported. This has implications for providing adequate screening, diagnosis, and treatment in culturally and linguistically diverse (CALD) populations. It is therefore imperative to routinely evaluate health behavior assessments to ensure they are not systematically biased against different groups of individuals (Martinková et al., 2017; Teresi and Fleishman, 2007). One approach for evaluating test bias is differential item functioning (DIF). DIF assesses item-level bias across specified groups by comparing item-level responses between groups that have the same overall score on the instrument. DIF analysis has been employed to determine item- and test-level bias in assessment tools, allowing for adjustments that can mitigate or eliminate this bias (Teresi et al., 2012).

The Beck Anxiety Inventory (BAI) is a valid and reliable tool, supported in both clinical and research settings, that is used to screen for 21 anxiety symptoms on a scale of 0–3 (None, Mild, Moderate, Severe) (Beck et al., 1988). A recent psychometric meta-analysis across 192 scholarly works demonstrated strong evidence of high internal consistency and test-retest reliability of the measure, but highlighted the need for further work on cross-cultural considerations (Bardhoshi et al., 2016). Various studies on the psychometric properties of the Spanish-translated BAI have assessed its factor structure. While a single factor has been supported in some samples in the United States and Mexico (Benuto et al., 2020; Toledano-Toledano et al., 2020), other research has suggested varied dimensionality ranging from a two-factor structure in Spanish samples (Magán et al., 2008; Sanz et al., 2012; Sanz and Navarro, 2003; Vázquez Morejón et al., 2014) to a four-factor structure in some Mexican samples (Galindo Vázquez et al., 2015; Guillén Díaz-Barriga and González-Celis Rangel, 2018). In relation to test bias, the only previous DIF analysis of the Spanish-language BAI was completed in a sample from Spain and only assessed bias by sex; DIF was reported on four items (Magán et al., 2008). Notably, similar DIF analyses completed in other samples only assessed bias related to age and sex (Portugal; Quintão et al., 2013) or age, sex, and Parkinson’s disease duration (Netherlands; Forjaz et al., 2013). DIF was not detected in these other studies. To our knowledge, no research to date has assessed the cross-lingual properties of the measure within a sample of English and Spanish speaking Hispanic/Latin Americans in the United States. 
Given that this group is the largest and one of the fastest growing CALD groups in the United States (Vespa et al., 2018), anxiety symptoms occur in nearly 15% of this group’s members (Terlizzi and Villarroel, 2020), and the BAI is one of the most commonly used measures to assess anxiety symptoms (Bardhoshi et al., 2016), there is a critical need to examine the instrument’s psychometric properties in this population.

While the need to investigate DIF of the BAI related to language is clear, various demographic characteristics are often omitted from descriptions of samples despite evidence of associations between these factors and brain-behavior relationships, which limits the interpretability and generalizability of prior research (Medina et al., 2020). These sample characteristics may also influence item-level endorsement on a measure like the BAI. Despite a lower prevalence of anxiety disorders among older adults compared with younger adults (Wolitzky-Taylor et al., 2010), physical changes associated with typical aging (e.g., mobility, muscle strength, bone strength, joint functioning) could influence the endorsement of certain items, because somatic anxiety symptoms might be associated or confused with changes in the physical body (Whitbourne, 1998). In the context of age-related cognitive decline, a relationship between clinically relevant cognitive impairment and anxiety symptoms has been reported: anxiety symptoms are endorsed more frequently among older adults with more severe cognitive impairment (Beaudreau and O’Hara, 2008). To address these potential relationships, age and cognitive status were investigated as sources of DIF.

Other sample characteristics that may influence DIF include sex, education, and ethnicity. In relation to sex, prior research suggests that females tend to demonstrate higher levels of fear and anxiety than males (McLean and Anderson, 2009). This warrants consideration of whether any anxiety items have significantly higher probabilities of endorsement for either sex, because some sex differences in anxiety may be partially explained by specific symptoms that are more often experienced by one sex. Investigating DIF across education is necessary because the validity of self-report measures depends on comprehension of the instructions and items. Special attention to education in the development and validation of measures has been recommended to improve the generalizability of measurement (McHugh et al., 2011). Although the BAI has been demonstrated to be an effective measure of anxiety symptoms among Hispanic/Latin Americans (Carter et al., 2012; Hirai et al., 2006) and Non-Hispanic/Latin Americans (Bardhoshi et al., 2016) alike, further validation of its utility across cultures would advance measurement.

The current study aimed to extend psychometric evidence and understanding of anxiety assessment. Specifically, this study examined the presence and impact of measurement bias, as measured by DIF analysis, across age, sex, education, cognitive status, ethnicity, and test language (English, Spanish). We also examined how adjusting for DIF affects observed group differences.

Method

Sample

Participants.

Sample demographics are summarized in Table 1. Because this analysis required complete responses, only participants with complete BAI and demographic data (n = 527) were drawn from a larger epidemiological study of cognitive aging among rural-dwelling individuals in West Texas (Project FRONTIER). Participants with missing data were excluded by listwise deletion. The drawbacks of this method are a reduced sample size and potentially biased parameter estimates (Banks, 2015). However, as prior DIF studies have shown (Dmitrieva et al., 2015; Fieo et al., 2015), a sample of 527 is large enough for this analysis. The Project FRONTIER cohort has been described elsewhere (O’Bryant et al., 2009; Torres et al., 2020). Briefly, to be eligible for Project FRONTIER, participants had to be aged 40 or above and reside in either Cochran County or Parmer County, Texas. The protocol included a standardized medical examination, clinical labs, and neuropsychological testing, as well as an interview with the participant and a brief interview with an informant. Data were collected by bilingual research personnel in the language (English or Spanish) in which the participant reported feeling most comfortable. Data were collected over a two-year period between 2009 and 2011. The study was approved by the University of North Texas (UNT) Health Science Center Institutional Review Board, and all participants provided written informed consent. All research was conducted in accordance with the Declaration of Helsinki.

Table 1.

Sample Demographics

N 527
Mean Age in Years ± SD (Range) 61.4 ± 12.7 (40–96)
Mean Education in Years ± SD (Range) 10.9 ± 4.3 (0–20)
Mean Beck Anxiety Inventory Score ± SD (Range) 6 ± 6.9 (0–41)
Characteristic N %
Age
 Younger (<60) 249 47.2
 Older (60+) 278 52.8
Sex
 Male 162 30.7
 Female 365 69.3
Education
 <12 years 215 40.8
 12+ years 312 59.2
Current Household Income
 Less than $10,000 84 15.9
 $10,001 to $20,000 139 26.4
 $20,001 to $30,000 81 15.4
 $30,001 to $40,000 48 9.1
 $40,001 to $50,000 50 9.5
 $50,001 to $60,000 22 4.2
 $60,001 to $70,000 25 4.7
 $70,001 or Higher 64 12.1
 Refused to Answer/Don’t Know 14 2.7
Ethnicity/Race
 Hispanic/Latin American 221 41.9
  American Indian/Alaskan Native 2 0.4
  Mexican American/Chicano 206 39.1
  Puerto Rican 1 0.2
  White/Caucasian 199 37.8
  Other Hispanic Origin 14 2.7
 Non-Hispanic/Latin American 306 58.1
  American Indian/Alaskan Native 13 2.5
  Black/African American 27 5.1
  White/Caucasian 278 52.8
Primary Language
 English Monolingual 318 60.3
 Spanish Monolingual 78 14.8
 Bilingual (English and Spanish) 131 24.9
Clinical Dementia Rating Global Score
 Normal (0.0) 379 71.9
 Questionable (0.5) 146 27.7
 Mild (1.0) 2 0.4
Beck Anxiety Inventory
 Minimal (0–7) 374 71.0
 Mild (8–15) 105 19.9
 Moderate (16–25) 32 6.1
 Severe (26+) 16 3.0
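The totals and severity bands in Table 1 follow from the BAI's structure: 21 items, each scored 0 (None) to 3 (Severe), summed to a total between 0 and 63. The Python sketch below reproduces that scoring and the band cut-points shown in the table. It is illustrative only: the function and constant names are hypothetical, and it assumes complete responses, consistent with the listwise deletion used for this sample.

```python
from typing import Sequence

N_ITEMS = 21               # BAI symptom items
ITEM_MIN, ITEM_MAX = 0, 3  # None = 0, Mild = 1, Moderate = 2, Severe = 3

# Severity bands from Table 1, as (inclusive upper bound, label).
SEVERITY_BANDS = [(7, "Minimal"), (15, "Mild"), (25, "Moderate"), (63, "Severe")]


def bai_total(responses: Sequence[int]) -> int:
    """Sum the 21 item responses, rejecting incomplete or out-of-range data."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    for r in responses:
        if not ITEM_MIN <= r <= ITEM_MAX:
            raise ValueError(f"item response {r} is outside {ITEM_MIN}-{ITEM_MAX}")
    return sum(responses)


def bai_severity(total: int) -> str:
    """Map a total score (0-63) to the severity band used in Table 1."""
    if total < 0:
        raise ValueError("total cannot be negative")
    for upper, label in SEVERITY_BANDS:
        if total <= upper:
            return label
    raise ValueError(f"total {total} exceeds the maximum of {N_ITEMS * ITEM_MAX}")


# A respondent rating nine items "Mild" and the remaining twelve "None".
example_total = bai_total([1] * 9 + [0] * 12)
print(example_total, bai_severity(example_total))  # 9 Mild
```

Validating item ranges before summing keeps an out-of-range entry from silently shifting a respondent into a different band.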

Demographic grouping.

Age was dichotomized into middle-aged and older adult groups (40–59 years old, n=272; 60–96 years old, n=302). Education was dichotomized into less than high school education (<12 years, n=243) and high school education or more (12+ years, n=337). Sex was split into male (n=180) and female (n=400); ethnicity was categorized as Non-Hispanic/Latin American (NH; n=330) or Hispanic/Latin American (H/L; n=249); and language of administration was categorized as either English (n=457) or Spanish (n=123). Cognitive status was dichotomized based on the Clinical Dementia Rating – Global Cognition Scale (CDR-GS; Morris, 1997). Cognitively normal participants (CDR-GS: 0; n=379) were compared with participants with evidence of cognitive impairment (CDR-GS: 0.5, n=146; CDR-GS: 1.0, n=2). The CDR-GS is a valid and reliable measure for rating stages of dementia severity. A global score of 0 indicates normal cognition, and a score of 0.5 indicates questionable functioning; scores of 1, 2, and 3 indicate mild, moderate, and severe dementia, respectively. A score of 0.5 is also used to denote mild cognitive impairment, which is inconsistent with normal cognitive status but not severe enough to warrant a dementia diagnosis. A recent meta-analysis pooled the diagnostic accuracy of the CDR-GS across 13 studies, yielding 93% sensitivity and 97% specificity for mild cognitive impairment, and 87% sensitivity and 99% specificity for dementia (Huang et al., 2021).
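The grouping rules above reduce to simple cut-points. A minimal Python sketch follows, assuming one record per participant with the hypothetical field names shown; sex, ethnicity, and test language are already binary in this sample and pass through unchanged.

```python
from dataclasses import dataclass

VALID_CDR_GLOBAL = {0.0, 0.5, 1.0, 2.0, 3.0}


@dataclass
class Participant:
    age: int            # years
    education: int      # years of formal education
    cdr_global: float   # CDR global score
    sex: str            # "Male" or "Female"
    ethnicity: str      # "Non-Hispanic/Latin American" or "Hispanic/Latin American"
    test_language: str  # "English" or "Spanish"


def dif_groups(p: Participant) -> dict:
    """Return the binary grouping factors used for DIF (hypothetical helper)."""
    if p.cdr_global not in VALID_CDR_GLOBAL:
        raise ValueError(f"invalid CDR global score: {p.cdr_global}")
    return {
        "age": "Older (60+)" if p.age >= 60 else "Younger (<60)",
        "education": "12+ years" if p.education >= 12 else "<12 years",
        # Any global score above 0 (0.5 or 1.0 in this sample) counts as impaired.
        "cognitive_status": "Impaired" if p.cdr_global > 0 else "Normal",
        "sex": p.sex,
        "ethnicity": p.ethnicity,
        "language": p.test_language,
    }


example = Participant(age=59, education=12, cdr_global=0.5, sex="Female",
                      ethnicity="Hispanic/Latin American", test_language="Spanish")
print(dif_groups(example))
```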

Analyses

Analyses were completed using R (R Core Team, 2013). The distribution of BAI scores was assessed with the psych package (Revelle, 2011), and the unidimensionality assumption of item response theory (IRT) DIF was assessed with a single-factor model in the lavaan package (Rosseel, 2012). Given that the score distribution was positively skewed (Supplementary Table 1; Supplementary Figures 1 and 2), a diagonally weighted least squares (DWLS) estimator was used for confirmatory factor analysis (Li, 2016; Mîndrilă, 2010; Zhao, 2015). Conventional criteria for acceptable model fit (CFI > 0.95, TLI > 0.95, RMSEA < 0.06, and SRMR < 0.08; Reeve et al., 2007) were considered in addition to factor loadings (Stevens, 2012). The semTools package (Jorgensen et al., 2020) was used to estimate composite reliability with polychoric correlations from the single-factor model. Conventional criteria for acceptable reliability (Cronbach’s α > 0.80, McDonald’s ω > 0.75, and average variance extracted > 0.50) were considered (Fornell and Larcker, 1981; Reise et al., 2013; Thorndike and Thorndike-Christ, 2010).
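The reliability indices named above have short textbook formulas. The sketch below computes Cronbach’s α from item scores, and McDonald’s ω and average variance extracted (AVE) from the standardized loadings of a single-factor model. It is an illustration only: semTools computes these indices from the fitted lavaan model (here with polychoric correlations), so its values need not match these simplified formulas. The loadings in the usage line are invented, not taken from this study.

```python
import statistics
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """items[j][i] is person i's score on item j; every item has the same length."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("all items must have the same number of respondents")
    totals = [sum(item[i] for item in items) for i in range(n)]
    sum_item_var = sum(statistics.variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / statistics.variance(totals))


def omega_single_factor(loadings: Sequence[float]) -> float:
    """McDonald's omega from standardized loadings, residual variance 1 - loading**2."""
    s = sum(loadings)
    residual = sum(1 - lam ** 2 for lam in loadings)
    return s ** 2 / (s ** 2 + residual)


def average_variance_extracted(loadings: Sequence[float]) -> float:
    """Fornell-Larcker AVE with standardized loadings: the mean squared loading."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)


# Hypothetical loadings: 21 items all loading 0.7.
lams = [0.7] * 21
print(round(omega_single_factor(lams), 3), round(average_variance_extracted(lams), 3))
```

With standardized loadings, the residual variances sum to k minus the sum of squared loadings, which is why AVE reduces to the mean squared loading.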

DIF detection.

A hybrid ordinal logistic regression/IRT approach based on the likelihood ratio χ² test, implemented in the lordif package (Choi et al., 2011), was used to assess the presence of DIF in the BAI. For each source of DIF, IRT was used to estimate the underlying level of anxiety (theta) with Samejima’s Graded Response Model for ordinal variables (Samejima, 1968). The theta value (i.e., the latent anxiety variable) was then used as a predictor in ordinal logistic regression analyses that formed three nested models for each item. The first model related the probability of endorsing an item to the latent anxiety variable. The second model added a term for group membership, and the third model added an interaction term between the latent anxiety variable and group membership (Fieo et al., 2015; Juhel and Gaillot, 2012). DIF was detected by comparing log-likelihood values between models. Uniform DIF (a constant effect across all levels of anxiety) was identified by comparing Models 1 and 2; non-uniform DIF (an effect that varies with the level of anxiety) by comparing Models 2 and 3. The overall test of the “total DIF effect” compared Models 1 and 3, which detects both uniform and non-uniform DIF and controls the overall Type I error rate (see Choi et al., 2011 for further details). Therefore, DIF was considered present if Model 3 fit significantly better than Model 1. DIF was considered significant if the likelihood ratio (LR) χ² p-value was less than 0.01 and the McFadden R² was greater than 0.02.
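The model comparisons above can be made concrete. lordif fits the three ordinal logistic models itself; the Python sketch below only shows how fitted log-likelihoods turn into the uniform, non-uniform, and total DIF tests, the McFadden R² changes, and the flagging rule described above (total-test p < 0.01 and R² change > 0.02). The `NestedFits` container, the use of a thresholds-only model as the McFadden reference, and the numbers in the example are assumptions for illustration. The chi-square tail uses closed forms valid only for 1 and 2 degrees of freedom, which are the degrees of freedom of these comparisons for a single binary grouping factor.

```python
import math
from dataclasses import dataclass


@dataclass
class NestedFits:
    """Log-likelihoods of one item's fitted models (hypothetical container)."""
    ll_null: float  # thresholds only; reference for McFadden R^2
    ll1: float      # Model 1: theta
    ll2: float      # Model 2: theta + group
    ll3: float      # Model 3: theta + group + theta x group


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability, closed form for df = 1 or 2 only."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("closed form implemented only for df = 1 or 2")


def mcfadden_r2(ll_model: float, ll_null: float) -> float:
    return 1 - ll_model / ll_null


def dif_tests(f: NestedFits, alpha: float = 0.01, r2_min: float = 0.02) -> dict:
    # (reduced model, full model, degrees of freedom) for a binary grouping factor
    comparisons = {
        "uniform": (f.ll1, f.ll2, 1),      # Model 1 vs Model 2
        "non_uniform": (f.ll2, f.ll3, 1),  # Model 2 vs Model 3
        "total": (f.ll1, f.ll3, 2),        # Model 1 vs Model 3
    }
    results = {}
    for name, (ll_reduced, ll_full, df) in comparisons.items():
        lr = 2 * (ll_full - ll_reduced)
        delta_r2 = mcfadden_r2(ll_full, f.ll_null) - mcfadden_r2(ll_reduced, f.ll_null)
        results[name] = {"chi2": lr, "p": chi2_sf(lr, df), "delta_r2": delta_r2}
    total = results["total"]
    results["flagged"] = total["p"] < alpha and total["delta_r2"] > r2_min
    return results


# Hypothetical log-likelihoods for one item and one grouping factor.
res = dif_tests(NestedFits(ll_null=-500.0, ll1=-400.0, ll2=-390.0, ll3=-389.0))
print(res["total"], res["flagged"])
```

Requiring both a small p-value and a minimum R² change guards against flagging trivially small effects that reach significance only because of sample size.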

Group-level impact.

Item-level impact of DIF for each demographic factor is shown through group-specific IRT parameters (Table 2). McFadden R² values indicate DIF magnitude. A high discrimination parameter indicates that the item differentiates well between respondents at different levels of latent anxiety. Category threshold parameters represent the point along the latent anxiety scale at which a respondent has a 50% probability of endorsing the indicated response category or higher. At a given level of latent anxiety, the comparison group with the lower category threshold has the higher probability of endorsing that response category. Missing category thresholds indicate that response categories were collapsed and recoded by lordif because of missing item-level responses for at least one comparison group. This reflects a strength of the graded response model: it does not require items to have the same number of response categories (Choi et al., 2011; Hays et al., 2000). Mann-Whitney U tests and Hedges’ g were computed separately on the original (unadjusted) and mean DIF-adjusted IRT scores to examine standardized latent mean differences between demographic groups. The Mann-Whitney U test was used in place of Student’s t test because the data were non-normal according to the Shapiro-Wilk test. The jamovi and compute.es packages were used for group difference calculations (Del Re, 2010; Şahin and Aybek, 2019).
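The two group-comparison statistics above can be sketched in a few lines. The authors used jamovi and compute.es; the following is an independent, standard-library illustration rather than those packages' code. Hedges’ g is the pooled-SD standardized mean difference multiplied by the small-sample correction J = 1 − 3 / (4(n₁ + n₂ − 2) − 1). The Mann-Whitney U sketch uses average ranks for ties and a two-sided normal approximation with the usual tie correction and no continuity correction; these are assumptions of this sketch.

```python
import math
import statistics
from typing import Sequence, Tuple


def hedges_g(x: Sequence[float], y: Sequence[float]) -> float:
    """Standardized mean difference (x minus y) with Hedges' small-sample correction."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    d = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled_var)
    correction = 1 - 3 / (4 * (n1 + n2) - 9)  # J = 1 - 3 / (4 * df - 1), df = n1 + n2 - 2
    return correction * d


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """U statistic for x and a two-sided normal-approximation p-value."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, grp) in zip(ranks, pooled) if grp == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u, 1.0  # every value tied: no evidence of a difference
    z = (u - n1 * n2 / 2) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))


print(hedges_g([1, 2, 3], [2, 3, 4]), mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

For production use, `scipy.stats.mannwhitneyu` is a maintained alternative; depending on its method setting it may report an exact rather than a normal-approximation p-value, so small-sample results can differ from this sketch.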

Individual-level impact.

The impact of DIF on individual-level test scores was evaluated by comparing the original IRT scores with bias-adjusted scores that accounted for each source of DIF. Using IRT, we estimated both the anxiety level and the standard error of measurement for every participant. These estimates were produced for each source of DIF, yielding both original and bias-adjusted scores. We then subtracted the original IRT anxiety scores from the bias-adjusted scores. Salient DIF was identified when the absolute difference between the scores exceeded the median standard error of measurement (MSEM) of the original scores. The identification of salient DIF was used to determine the presence of measurement bias (Crane et al., 2010; Dmitrieva et al., 2015; Fieo et al., 2015; Gibbons et al., 2009; Kleinman and Teresi, 2016).
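The salient-DIF rule reduces to a comparison against a single number, the median standard error of measurement. A minimal sketch follows, assuming per-participant lists of original scores, DIF-adjusted scores, and standard errors; the function name is hypothetical. The toy values mirror the MSEM (0.398) and the largest aggregated shift (0.401) reported in the Results.

```python
import statistics
from typing import List, Sequence, Tuple


def salient_dif(original: Sequence[float], adjusted: Sequence[float],
                sem: Sequence[float]) -> Tuple[float, List[int]]:
    """Return the median SEM of the original scores and the indices of
    participants whose absolute score change exceeds it."""
    if not (len(original) == len(adjusted) == len(sem)):
        raise ValueError("inputs must have the same length")
    msem = statistics.median(sem)
    flagged = [i for i, (o, a) in enumerate(zip(original, adjusted))
               if abs(a - o) > msem]
    return msem, flagged


# Toy data: two participants whose scores shift by 0.20 and 0.401.
msem, flagged = salient_dif(original=[0.0, 0.0], adjusted=[0.20, 0.401],
                            sem=[0.398, 0.398])
print(msem, flagged)  # 0.398 [1]
```

Using the median rather than each participant's own SEM gives one fixed threshold for the whole sample, which is what makes a single cut-line in the box plot possible.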

Results

The mean raw BAI score was 6 ± 6.9 (observed range = 0 to 41). The single-factor model had marginally adequate fit (CFI = 0.91, TLI = 0.90, RMSEA [95% CI] = 0.06 [0.06, 0.07], SRMR = 0.09). All items loaded significantly onto the single factor (p ≤ 0.001), with factor loadings ranging from 0.516 to 0.879 (Supplementary Table 2). Cronbach’s α (0.94) and McDonald’s ω (0.78) met the criteria for acceptable reliability, although average variance extracted (0.46) fell slightly below the 0.50 criterion.

Across all assessed demographic factors, significant DIF was present in 10 of 21 items (Table 2). The MSEM of the BAI was 0.398. However, the impact of DIF was not salient for any demographic factor considered independently: the maximum score shifts were 0.13 for age, 0.20 for sex, 0.14 for cognitive status, 0.15 for education, 0.20 for ethnicity, and 0.18 for test language, all below the MSEM. When all sources were aggregated, only one participant had a score change (0.401) exceeding the MSEM, indicating a salient impact of DIF for this individual (Figure 1).

Table 2.

Significant DIF Item Parameters

Item Factor Group Discrimination Category thresholds McFadden R²
Mild Moderate Severe Total Uniform Non-Uniform
2 Feeling hot
Age Younger (<60) 0.981 0.428 1.656 3.524 0.009** 0.008** 0.001
Older (60+) 1.173 0.921 1.911 3.901
Sex Male 1.435 1.270 2.154 - 0.033*** 0.031*** 0.002
Female 0.973 0.413 1.649 -
3 Wobbliness in legs
Age Younger (<60) 1.529 1.320 2.091 3.277 0.019*** 0.019*** 0.0001
Older (60+) 1.690 0.689 1.761 2.753
Cognitive Status Normal 1.553 1.246 2.048 3.544 0.024*** 0.021*** 0.002
Impaired 1.262 0.513 2.013 2.965
6 Dizzy or lightheaded
Age Younger (<60) 1.415 1.354 2.641 - 0.024*** 0.021*** 0.003
Older (60+) 1.764 0.723 1.846 -
7 Heart pounding or racing
Cognitive Status Normal 2.453 0.907 1.783 - 0.019** 0.006* 0.012**
Impaired 1.418 1.373 2.650 -
8 Unsteady
Age Younger (<60) 1.769 1.148 2.423 - 0.049*** 0.047*** 0.002
Older (60+) 2.084 0.430 1.624 -
13 Shaky
Age Younger (<60) 1.974 1.266 2.507 - 0.017** 0.015** 0.002
Older (60+) 2.297 0.896 1.977 -
16 Fear of dying
Ethnicity Non-Hispanic/Latin American 2.495 1.920 2.314 - 0.051*** 0.002 0.049***
Hispanic/Latin American 0.776 3.473 5.021 -
Language English 1.961 2.164 2.708 - 0.046** 0.012 0.034**
Spanish 0.566 3.999 5.950 -
18 Indigestion or discomfort in abdomen
Education <12 years 1.469 0.831 1.828 2.987 0.018*** 0.006* 0.012***
12+ years 0.773 0.482 2.972 5.429
19 Faint
Education <12 years 0.997 2.405 3.866 - 0.028** 0.024** 0.004
12+ years 1.454 2.605 3.614 -
Ethnicity Non-Hispanic/Latin American 2.495 1.920 2.314 - 0.051*** 0.002 0.049***
Hispanic/Latin American 0.776 3.473 5.021 -
Language English 1.328 2.761 3.798 - 0.061*** 0.061*** 0.001
Spanish 1.142 1.610 3.057 -
21 Sweating (not due to heat)
Sex Male 1.828 1.479 2.659 - 0.016** 0.013** 0.002
Female 1.133 1.351 2.250 -

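Note: McFadden R² values indicate DIF magnitude; asterisks indicate likelihood ratio χ² p-values (* p < 0.05, ** p < 0.01, *** p < 0.001). A dash (-) marks a category threshold that is missing because lordif collapsed response categories.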
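The threshold interpretation described in the Methods can be checked numerically with the item 8 (“Unsteady”) parameters for the age groups in Table 2. The Python sketch below assumes the standard logistic form of Samejima’s graded response model without a 1.7 scaling constant, so at θ equal to a threshold the boundary probability is exactly 0.5. Because the Severe threshold for this item is missing, only the None/Mild and Mild/Moderate boundaries are used. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def cumulative_prob(theta: float, a: float, b: float) -> float:
    """GRM boundary curve: P(response at or above the category with threshold b | theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def category_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """Probabilities of each response category (0 = None, 1 = Mild, ...)."""
    cum = [1.0] + [cumulative_prob(theta, a, b) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Item 8 ("Unsteady"), age groups, discrimination and thresholds from Table 2.
younger = {"a": 1.769, "b": [1.148, 2.423]}
older = {"a": 2.084, "b": [0.430, 1.624]}

theta = 0.5  # the same latent anxiety level for both groups
for label, item in [("Younger (<60)", younger), ("Older (60+)", older)]:
    p_mild_or_higher = cumulative_prob(theta, item["a"], item["b"][0])
    print(label, round(p_mild_or_higher, 3), category_probs(theta, item["a"], item["b"]))
```

At θ = 0.5, the probability of endorsing Mild or higher is roughly 0.24 for younger adults and 0.54 for older adults, consistent with older adults’ higher endorsement of this somatic item at the same level of latent anxiety.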

Figure 1.

Box plot of changes in IRT-anxiety scores after accounting for DIF

Box plot of changes in IRT-anxiety scores after adjusting for DIF (Wickham, 2016). The plot shows the difference between DIF-adjusted and unadjusted scores for each source of DIF. If DIF had no impact for an individual, their anxiety score would be unchanged and the difference would be zero. The boxes represent the interquartile range, and the whiskers mark the upper and lower adjacent values as defined by Tukey (McGill et al., 1978); observations more extreme than the adjacent values are plotted as outliers. Dotted vertical lines are placed at ± the median standard error of measurement (0.398); observations outside these lines indicate salient DIF.
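Tukey’s adjacent values in the caption can be computed directly. The Python sketch below assumes the standard 1.5 × IQR fences. Quartile conventions differ between software, and this uses the standard library’s "inclusive" method, so results may differ slightly from the plotting software used for the figure. The score changes are hypothetical, not study data; the last line applies the 0.398 MSEM cut-off from the caption.

```python
import statistics
from typing import List, Sequence, Tuple


def tukey_adjacent_values(values: Sequence[float]) -> Tuple[float, float, List[float]]:
    """Return (lower adjacent value, upper adjacent value, outliers)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Adjacent values: the most extreme observations still inside the fences.
    lower_adj = min(v for v in values if v >= lower_fence)
    upper_adj = max(v for v in values if v <= upper_fence)
    outliers = [v for v in values if v < lower_adj or v > upper_adj]
    return lower_adj, upper_adj, outliers


# Hypothetical score changes after DIF adjustment.
changes = [0.0, 0.0, 0.0, 0.0, 0.01, 0.02, 0.03, 0.4]
low, high, out = tukey_adjacent_values(changes)
salient = [c for c in changes if abs(c) > 0.398]  # beyond the MSEM lines
print(low, high, out, salient)
```

A box-plot outlier is not the same as salient DIF: a change can fall outside the whiskers yet stay within the MSEM lines, which is why the figure marks both.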

Mann-Whitney U tests on the original, unadjusted scores indicated small differences between groups by age (p = 0.010, g = 0.24), sex (p = 0.030, g = 0.20), and cognitive status (p = 0.001, g = 0.31). After adjusting for all assessed sources of bias, differences by age (p = 0.004, g = 0.26) and cognitive status (p = 0.003, g = 0.31) remained significant, whereas the difference by sex did not (p = 0.110, g = 0.19). Compared with the original scores, the adjusted group differences increased slightly for age, decreased slightly for sex, and were practically identical for cognitive status. Overall, adults 40–59 years of age endorsed greater anxiety than adults 60–96 years of age, females endorsed greater anxiety than males, and those with cognitive impairment endorsed greater anxiety than those without impairment (Figure 2; Table 3).

Figure 2.

Latent Trait Distribution

Grouped kernel density distributions (Wickham, 2016) of original IRT-anxiety scores by a) age, b) sex, c) education, d) cognitive status, e) ethnicity, f) language. Vertical lines indicate mean values.

Table 3.

Item Response Theory Latent Mean Differences

Group Original Mean (SD) Original Hedges’ g [95% CI] Adjusted Mean (SD) Adjusted Hedges’ g [95% CI]
Age
 40–59 years 0.11 (0.94) 0.24 [0.07, 0.41] * 0.12 (0.95) 0.26 [0.09, 0.43] **
 60–96 years −0.10 (0.84) −0.11 (0.83)
Sex
 Male −0.13 (0.85) 0.20 [0.02, 0.39] * −0.12 (0.85) 0.19 [0.01, 0.38]
 Female 0.06 (0.91) 0.05 (0.91)
Education
 <12 years −0.03 (0.95) 0.05 [−0.12, 0.22] −0.03 (0.95) 0.05 [−0.13, 0.22]
 12+ years 0.02 (0.85) 0.02 (0.86)
Cognitive Status
 Normal −0.08 (0.88) 0.31 [0.12, 0.50] ** −0.08 (0.88) 0.31 [0.12, 0.50] **
 Impaired 0.20 (0.91) 0.20 (0.91)
Ethnicity
 Non-Hispanic/Latin American 0.02 (0.84) 0.04 [−0.13, 0.21] 0.01 (0.84) 0.03 [−0.14, 0.21]
 Hispanic/Latin American −0.02 (0.97) −0.02 (0.97)
Language
 English 0.02 (0.88) 0.09 [−0.12, 0.30] 0.02 (0.89) 0.09 [−0.12, 0.31]
 Spanish −0.06 (0.94) −0.07 (0.93)

Note: asterisks indicate Mann-Whitney U test p-values (* p < 0.05, ** p < 0.01).

Unadjusted IRT anxiety scores ranged from −1.21 to 2.88 (mean ± SD = 0.0004 ± 0.8942); mean DIF-adjusted scores ranged from −1.22 to 2.78 (mean ± SD = 0.0006 ± 0.8945).

Limitations

Several limitations of the current study should be noted. In the current logistic regression/IRT DIF framework, covariance between the demographic characteristics of interest is addressed only by aggregating the results of sequential DIF tests. Statistical advances such as automated moderated nonlinear factor analysis (aMNLFA; Gottfredson et al., 2019) and generalized partial credit models (GPCM; Schauberger and Mair, 2020) address covariance by estimating models with simultaneous demographic predictors. R packages implementing these methods are available for future research testing hypotheses about measurement invariance and DIF.

Although this was a CALD sample from rural West Texas, limited racial diversity precluded evaluation of DIF by race. The majority of the Hispanic/Latin American sample identified as Mexican American or Chicano, so these results may not generalize to other Latin American heritage groups. The cognitively impaired group generally had mild symptoms, and it is unclear whether these results would hold in individuals with more severe cognitive impairment. Similarly, most of the sample (about 91%) endorsed symptoms suggestive of minimal to mild anxiety. Low endorsement of higher anxiety responses required lordif to collapse and recode many item-level responses across groups so that the models could be estimated. The number of response categories ranged from two (None, Mild) to four (None, Mild, Moderate, Severe), depending on the demographic factor of interest (Supplementary Table 6). Further work may be needed to confirm these findings in a clinically anxious sample.
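The category collapsing described above can be illustrated with a simple rule. lordif performs this recoding internally and its exact rule is not described in this article, so the merge-downward rule and the `min_count` parameter below are assumptions for illustration only: any category that has fewer than `min_count` responses in some group is merged into the next lower category, and the check repeats until every remaining category is populated in every group. The lowest category is never merged in this sketch.

```python
from collections import Counter
from typing import List, Sequence


def collapse_sparse_categories(responses: Sequence[int], groups: Sequence[str],
                               min_count: int = 1) -> List[int]:
    """Hypothetical recoding: merge sparse categories downward, then renumber 0..m-1."""
    if len(responses) != len(groups):
        raise ValueError("responses and groups must have the same length")
    recoded = list(responses)
    group_labels = set(groups)
    while True:
        cats = sorted(set(recoded))
        counts = {g: Counter(r for r, grp in zip(recoded, groups) if grp == g)
                  for g in group_labels}
        sparse = [c for c in cats[1:]
                  if any(counts[g][c] < min_count for g in group_labels)]
        if not sparse:
            break
        top = max(sparse)
        below = max(c for c in cats if c < top)
        recoded = [below if r == top else r for r in recoded]
    mapping = {c: i for i, c in enumerate(sorted(set(recoded)))}
    return [mapping[r] for r in recoded]


# Group B never uses Moderate (2) or Severe (3), so both merge into Mild (1).
resp = [0, 1, 2, 3, 0, 1, 1, 0]
grp = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(collapse_sparse_categories(resp, grp))
```

Merging downward keeps the ordering of categories intact, which the graded response model requires; it also explains why the highest thresholds are the ones most often missing in Table 2.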

Discussion

We sought to examine the psychometric properties of the Beck Anxiety Inventory as they relate to potential test bias across several demographic variables in a multiethnic cohort. The results of the DIF analyses reflect minor trends in the ways some groups may endorse certain items relative to others (e.g., adults 60 years of age and older were more likely than adults younger than 60 to endorse the somatic symptoms “wobbliness in legs,” “dizzy or lightheaded,” “unsteady,” and “shaky”). Small differences, robust to DIF adjustment, were observed in the grouped distributions of latent anxiety scores: adults 40–59 years of age and those with cognitive impairment endorsed significantly higher anxiety than their comparison groups. A significant difference by sex was detected with the original anxiety scores but was not robust to DIF adjustment, indicating that the difference in endorsed anxiety between females and males was mitigated when all sources of DIF were accounted for.

Overall, negligible item-level bias was identified across the demographic factors considered. This is evidenced in two ways. First, only 1 of 527 participants presented with salient DIF when all sources of bias were considered together. A single individual with salient DIF reflects benign test bias; a similar study using the same analysis technique found a higher proportion of salient DIF and still interpreted it as having little overall impact on assessment (Dmitrieva et al., 2015). Second, standardized latent mean differences between demographic groups were practically identical when original and DIF-adjusted scores were compared. These results build on existing psychometric evidence suggesting that the BAI is a valid tool for assessing anxiety across age, sex, education, ethnicity, cognitive status, and test language (English or Spanish).

This study provides additional support for the valid use of the BAI in a culturally and linguistically diverse population. To enhance confidence in the reliability and validity of health measurement in CALD populations, measurement equivalence should be evaluated thoroughly and continuously with varied methods. Trends in the way some groups respond to clinical measures must be acknowledged when screening individuals whose scores may be susceptible to bias, because differences in the interpretation of measurement items have considerable implications for diagnosis and other clinical decision making. Notably, the DIF analyses showed that older individuals and those with evidence of cognitive impairment may be prone to endorsing somatic symptoms on the BAI that may reflect cognitive aging rather than manifest anxiety. Although no substantial bias due to ethnicity or test language was demonstrated on the BAI in the current sample, this finding may not generalize to more anxious groups or other CALD populations, and future research and clinical decision making must consider factors that may influence the reporting of anxiety symptoms in diverse populations.

Supplementary Material

Highlights.

  • Trends in anxiety symptom endorsement were detected across demographic factors

  • Differential item functioning partially explained differences in anxiety by sex

  • The impact of detected differential item functioning was not adverse

  • The Beck Anxiety Inventory is a valid measure of anxiety for diverse groups

Acknowledgements

We thank the participants who made this research possible.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Conflict of Interest

The authors have no relevant disclosures.

References

  1. Azocar F, Areán P, Miranda J, Muñoz RF, 2001. Differential item functioning in a Spanish translation of the Beck Depression Inventory: Item Bias in a Spanish Translation of the BDI. J. Clin. Psychol 57, 355–365. 10.1002/jclp.1017
  2. Banks K, 2015. An Introduction to Missing Data in the Context of Differential Item Functioning. Pract. Assess. Res. Eval 20. 10.7275/FPG0-5079
  3. Bardhoshi G, Duncan K, Erford BT, 2016. Psychometric Meta-Analysis of the English Version of the Beck Anxiety Inventory. J. Couns. Dev 94, 356–373. 10.1002/jcad.12090
  4. Beaudreau SA, O’Hara R, 2008. Late-Life Anxiety and Cognitive Impairment: A Review. Am. J. Geriatr. Psychiatry 16, 790–803. 10.1097/JGP.0b013e31817945c3
  5. Beck AT, Brown G, Epstein N, Steer RA, 1988. An Inventory for Measuring Clinical Anxiety: Psychometric Properties. J. Consult. Clin. Psychol 56, 893–897.
  6. Benuto LT, Zimmermann M, Gonzalez FR, Corral Rodríguez A, 2020. A confirmatory factor analysis of the beck anxiety inventory in Latinx primary care patients. Int. J. Ment. Health 1–21. 10.1080/00207411.2020.1812833
  7. Carter MM, Mitchell FE, Sbrocco T, 2012. Treating ethnic minority adults with anxiety disorders: Current status and future recommendations. J. Anxiety Disord 26, 488–501. 10.1016/j.janxdis.2012.02.002
  8. Choi SW, Gibbons LE, Crane PK, 2011. lordif: An R Package for Detecting Differential Item Functioning Using Iterative Hybrid Ordinal Logistic Regression/Item Response Theory and Monte Carlo Simulations. J. Stat. Softw 39, 1–30. 10.18637/jss.v039.i08
  9. Crane PK, Gibbons LE, Willig JH, Mugavero MJ, Lawrence ST, Schumacher JE, Saag MS, Kitahata MM, Crane HM, 2010. Measuring depression levels in HIV-infected patients as part of routine clinical care using the nine-item Patient Health Questionnaire (PHQ-9). AIDS Care 22, 874–885. 10.1080/09540120903483034
  10. Del Re A, 2010. compute.es: Compute Effect Sizes. The Comprehensive R Archive Network.
  11. Dmitrieva NO, Fyffe D, Mukherjee S, Fieo R, Zahodne LB, Hamilton J, Potter GG, Manly JJ, Romero HR, Mungas D, Gibbons LE, 2015. Demographic characteristics do not decrease the utility of depressive symptoms assessments: examining the practical impact of item bias in four heterogeneous samples of older adults: Differential item function in depressive symptoms. Int. J. Geriatr. Psychiatry 30, 88–96. 10.1002/gps.4121
  12. Ellis BB, 1989. Differential Item Functioning: Implications for Test Translations. J. Appl. Psychol 74, 912–921. 10.1037/0021-9010.74.6.912
  13. Fieo R, Mukherjee S, Dmitrieva NO, Fyffe DC, Gross AL, Sanders ER, Romero HR, Potter GG, Manly JJ, Mungas DM, Gibbons LE, 2015. Differential item functioning due to cognitive status does not impact depressive symptom measures in four heterogeneous samples of older adults: DIF due to cognitive status in depression scales. Int. J. Geriatr. Psychiatry 30, 911–918. 10.1002/gps.4234
  14. Forjaz MJ, Martinez-Martin P, Dujardin K, Marsh L, Richard IH, Starkstein SE, Leentjens AFG, 2013. Rasch analysis of anxiety scales in Parkinson’s disease. J. Psychosom. Res 74, 414–419. 10.1016/j.jpsychores.2013.02.009
  15. Fornell C, Larcker DF, 1981. Evaluating Structural Equation Models with Unobservable Variables and Measurement Error. J. Mark. Res 18, 39–50. 10.1177/002224378101800104
  16. Galindo Vázquez O, Rojas Castillo E, Meneses García A, Aguilar Ponce JL, Álvarez Avitia MÁ, Alvarado Aguilar S, 2015. Propiedades psicométricas del inventario de ansiedad de beck (BAI) en pacientes con cáncer. Psicooncología 12, 51–58. 10.5209/rev_PSIC.2015.v12.n1.48903
  17. Gibbons LE, McCurry S, Rhoads K, Masaki K, White L, Borenstein AR, Larson EB, Crane PK, 2009. Japanese–English language equivalence of the Cognitive Abilities Screening Instrument among Japanese-Americans. Int. Psychogeriatr 21, 129–137. 10.1017/S1041610208007862
  18. Gottfredson NC, Cole VT, Giordano ML, Bauer DJ, Hussong AM, Ennett ST, 2019. Simplifying the implementation of modern scale scoring methods with an automated R package: Automated moderated nonlinear factor analysis (aMNLFA). Addict. Behav 94, 65–73. 10.1016/j.addbeh.2018.10.031
  19. Guillén Díaz-Barriga C, González-Celis Rangel AL, 2018. Propiedades psicométricas del Inventario de Ansiedad de Beck en adultos asmáticos mexicanos. Psicol. Salud 29, 5–16. 10.25009/pys.v29i1.2563
  20. Hays RD, Morales LS, Reise SP, 2000. Item Response Theory and Health Outcomes Measurement in the 21st Century. Med. Care 38, II-28–II-42. 10.1097/00005650-200009002-00007
  21. Hirai M, Stanley MA, Novy DM, 2006. Generalized Anxiety Disorder in Hispanics: Symptom Characteristics and Prediction of Severity. J. Psychopathol. Behav. Assess 28, 49–56. 10.1007/s10862-006-4541-2
  22. Huang H, Tseng Y, Chen Y, Chen P, Chiu H, 2021. Diagnostic accuracy of the Clinical Dementia Rating Scale for detecting mild cognitive impairment and dementia: A bivariate meta-analysis. Int. J. Geriatr. Psychiatry 36, 239–251. 10.1002/gps.5436
  23. Jorgensen TD, Pornprasertmanit S, Schoemann AM, Rosseel Y, 2020. semTools. The Comprehensive R Archive Network.
  24. Juhel J, Gaillot A-C, 2012. Structural validity and age-based differential item functioning of the French Nottingham Health Profile in a sample of surgery patients. Adv Psychol Stud 1, 14–21.
  25. Kleinman M, Teresi JA, 2016. Differential item functioning magnitude and impact measures from item response theory models. Psychol Test Assess Model 58, 79–98. https://www.ncbi.nlm.nih.gov/pubmed/28706769
  26. Li C-H, 2016. Confirmatory factor analysis with ordinal data: Comparing robust maximum likelihood and diagonally weighted least squares. Behav. Res. Methods 48, 936–949. 10.3758/s13428-015-0619-7
  27. Magán I, Sanz J, García-Vera MP, 2008. Psychometric Properties of a Spanish Version of the Beck Anxiety Inventory (BAI) in General Population. Span. J. Psychol 11, 626–640. 10.1017/S1138741600004637
  28. Martinková P, Drabinová A, Liaw Y-L, Sanders EA, McFarland JL, Price RM, 2017. Checking Equity: Why Differential Item Functioning Analysis Should Be a Routine Part of Developing Conceptual Assessments. CBE—Life Sci. Educ 16. 10.1187/cbe.16-10-0307
  29. McGill R, Tukey JW, Larsen WA, 1978. Variations of Box Plots. Am. Stat 32, 12–16. 10.1080/00031305.1978.10479236
  30. McHugh RK, Rasmussen JL, Otto MW, 2011. Comprehension of self-report evidence-based measures of anxiety. Depress. Anxiety 28, 607–614. 10.1002/da.20827
  31. McLean CP, Anderson ER, 2009. Brave men and timid women? A review of the gender differences in fear and anxiety. Clin. Psychol. Rev 29, 496–505. 10.1016/j.cpr.2009.05.003
  32. Medina LD, Torres S, Gioia A, Lopez AO, Wang J, Cirino PT, 2020. Reporting of Demographic Variables in Neuropsychological Research: An Update of O’Bryant et al.’s Trends in the Current Literature. J. Int. Neuropsychol. Soc 1–11. 10.1017/S1355617720001083
  33. Mîndrilă D, 2010. Maximum Likelihood (ML) and Diagonally Weighted Least Squares (DWLS) Estimation Procedures: A Comparison of Estimation Bias with Ordinal and Multivariate Non-Normal Data. Int. J. Digit. Soc 1, 60–66. 10.20533/ijds.2040.2570.2010.0010
  34. Morris JC, 1997. Clinical Dementia Rating: A Reliable and Valid Diagnostic and Staging Measure for Dementia of the Alzheimer Type. Int. Psychogeriatr 9, 173–176. 10.1017/S1041610297004870
  35. O’Bryant SE, Zhang Y, Owen D, Cherry B, Ramirez V, Silva M, Hudson C, Hobson V, Grammas P, Schiffer RB, Manning G, Schrimsher GW, Lucas JA, Sutker PB, 2009. The Cochran County Aging Study: Methodology and Descriptive Statistics. Tex. Public Health J 61, 5–7.
  36. Quintão S, Delgado AR, Prieto G, 2013. Validity study of the Beck Anxiety Inventory (Portuguese version) by the Rasch Rating Scale model. Psicol. Reflex. E Crítica 26, 305–310. 10.1590/S0102-79722013000200010
  37. R Core Team, 2013. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
  38. Ramirez Gomez L, Jain FA, D’Orazio LM, 2017. Assessment of the Hispanic Cognitively Impaired Elderly Patient. Neurol. Clin 35, 207–229. 10.1016/j.ncl.2017.01.003
  39. Reeve BB, Hays RD, Bjorner JB, Cook KF, Crane PK, Teresi JA, Thissen D, Revicki DA, Weiss DJ, Hambleton RK, Liu H, Gershon R, Reise SP, Lai J, Cella D, 2007. Psychometric Evaluation and Calibration of Health-Related Quality of Life Item Banks: Plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Med. Care 45, S22–S31. 10.1097/01.mlr.0000250483.85507.04
  40. Reise SP, Bonifay WE, Haviland MG, 2013. Scoring and Modeling Psychological Measures in the Presence of Multidimensionality. J. Pers. Assess 95, 129–140. 10.1080/00223891.2012.725437
  41. Revelle W, 2011. psych: Procedures for Psychological, Psychometric, and Personality Research. The Comprehensive R Archive Network.
  42. Rosseel Y, 2012. lavaan: An R Package for Structural Equation Modeling. J. Stat. Softw 48. 10.18637/jss.v048.i02
  43. Şahin M, Aybek E, 2019. Jamovi: An Easy to Use Statistical Software for the Social Scientists. Int. J. Assess. Tools Educ 6, 670–692. 10.21449/ijate.661803
  44. Samejima F, 1968. Estimation of Latent Ability Using a Response Pattern of Graded Scores. ETS Res. Bull. Ser 1968. 10.1002/j.2333-8504.1968.tb00153.x
  45. Sanz J, García-Vera MP, Fortún M, 2012. The Beck Anxiety Inventory: Psychometric properties of the Spanish version in patients with psychological disorders. Behav. Psychol. Psicol. Conduct. Rev. Int. Clínica Salud 20, 563–583. https://psycnet.apa.org/record/2012-34357-005
  46. Sanz J, Navarro ME, 2003. The Psychometric Properties of a Spanish Version of the Beck Anxiety Inventory (BAI) in a University Students Sample. Ansiedad Estrés 9, 59–84. https://psycnet.apa.org/record/2003-99798-006
  47. Schauberger G, Mair P, 2020. A regularization approach for the detection of differential item functioning in generalized partial credit models. Behav. Res. Methods 52, 279–294. 10.3758/s13428-019-01224-2
  48. Statucka M, Cohn M, 2019. Origins Matter: Culture Impacts Cognitive Testing in Parkinson’s Disease. Front. Hum. Neurosci 13. 10.3389/fnhum.2019.00269
  49. Stevens JP, 2012. Applied Multivariate Statistics for the Social Sciences. Routledge.
  50. Teresi JA, Fleishman JA, 2007. Differential item functioning and health assessment. Qual. Life Res 16, 33–42. 10.1007/s11136-007-9184-6
  51. Teresi JA, Ramirez M, Jones RN, Choi S, Crane PK, 2012. Modifying Measures Based on Differential Item Functioning (DIF) Impact Analyses. J. Aging Health 24, 1044–1076. 10.1177/0898264312436877
  52. Terlizzi EP, Villarroel MA, 2020. Symptoms of Generalized Anxiety Disorder Among Adults: United States, 2019 (NCHS Data Brief No. 378). National Center for Health Statistics, National Health Interview Survey.
  53. Thorndike RM, Thorndike-Christ TM, 2010. Measurement and evaluation in psychology and education, 8th ed. Pearson, New York, NY.
  54. Toledano-Toledano F, Moral de la Rubia J, Domínguez-Guedea MT, Nabors LA, Barcelata-Eguiarte BE, Rocha-Pérez E, Luna D, Leyva-López A, Rivera-Rivera L, 2020. Validity and Reliability of the Beck Anxiety Inventory (BAI) for Family Caregivers of Children with Cancer. Int. J. Environ. Res. Public. Health 17, 7765. 10.3390/ijerph17217765
  55. Torres S, Alexander A, O’Bryant S, Medina LD, 2020. Cognition and the Predictive Utility of Three Risk Scores in an Ethnically Diverse Sample. J. Alzheimers Dis 75, 1049–1059. 10.3233/JAD-191284
  56. Vázquez Morejón AJ, Vázquez-Morejón Jiménez R, Zanin GB, 2014. Beck Anxiety Inventory: Psychometric Characteristics in a Sample from the Clinical Spanish Population. Span. J. Psychol 17, E76. 10.1017/sjp.2014.76
  57. Vespa J, Medina L, Armstrong DM, 2018. Demographic turning points for the United States: Population projections for 2020 to 2060. US Department of Commerce, Economics and Statistics Administration, US; ….
  58. Whitbourne SK, 1998. Physical changes in the aging individual: Clinical implications, in: Clinical Geropsychology. American Psychological Association, Washington, DC, US, pp. 79–108. 10.1037/10295-006
  59. Wickham H, 2016. ggplot2: Elegant Graphics for Data Analysis. Springer.
  60. Wolitzky-Taylor KB, Castriotta N, Lenze EJ, Stanley MA, Craske MG, 2010. Anxiety disorders in older adults: a comprehensive review. Depress. Anxiety 27, 190–211. 10.1002/da.20653
  61. Zhao Y, 2015. The Performance of Model Fit Measures by Robust Weighted Least Squares Estimators in Confirmatory Factor Analysis. Pennsylvania State University.
