Author manuscript; available in PMC: 2020 Sep 1.
Published in final edited form as: Int J Eat Disord. 2019 Jun 24;52(9):1047–1051. doi: 10.1002/eat.23126

Gender-based differential item functioning in measures of eating pathology

Lauren M Schaefer 1, Lisa M Anderson 2, Melissa Simone 2, Shannon M O’Connor 3, Hana Zickgraf 3, Drew A Anderson 4, Rachel F Rodgers 5,6, J Kevin Thompson 7
PMCID: PMC6815513  NIHMSID: NIHMS1054698  PMID: 31233228

Abstract

Objective:

Eating disorder (ED) symptoms are common and impairing in males, despite their perception as “female” disorders. As existing self-report symptom measures were developed and primarily validated in women, there is a need to establish the utility of these measures in men. The present study used differential item functioning (DIF) analyses to explore whether item endorsement differed by gender for three commonly used ED symptom measures.

Method:

Participants were undergraduate men (n = 1,083) and women (n = 2,424) from three universities in the United States. Global scores on the Eating Attitudes Test-26 (EAT-26), Eating Disorder Examination Questionnaire (EDEQ), and Eating Disorder Diagnostic Scale for DSM-IV (EDDS) were examined. Tests of DIF were conducted by regressing each item on its composite scale score, and then comparing fit and variance explained (R²) to a model that added gender and the composite score × gender interaction. Clinically significant DIF was defined as ΔR² ≥ .13.

Results:

There was no evidence of clinically significant DIF within the EAT-26, EDEQ, or EDDS.

Discussion:

Findings suggest that the examined measures perform similarly for undergraduate men and women, supporting their use in nonclinical male samples. However, development and testing of items reflecting ED symptoms that more commonly occur in males (e.g., muscularity-oriented behaviors) is encouraged.

Keywords: assessment, disordered eating, gender differences, measurement, men, psychometric

1 | INTRODUCTION

Although men have historically been underrepresented in eating disorder (ED) research, males constitute a substantial and increasing portion of ED cases (Murray et al., 2017), highlighting a critical need for additional research examining ED pathology in this population. As accumulating research suggests gender-based differences in body image concerns (Murray, Griffiths, & Mond, 2016) and the specific behaviors enacted to achieve gendered body ideals (Mitchison & Mond, 2015), investigators have begun to question the appropriateness of existing measures for assessing ED concerns in men (Schaefer et al., 2018). Despite regular use in male samples, measures of ED symptomatology have typically been developed and validated within female samples (e.g., Eating Attitudes Test-26 [EAT-26]; Garner, Olmsted, Bohr, & Garfinkel, 1982), introducing substantial potential for gender-based measurement bias in the resulting scales (Mitchison & Mond, 2015). For example, existing measures primarily assess thinness-oriented concerns (e.g., dieting behavior, fears of fatness, and weight gain), which may not fully reflect the muscularity-oriented body image and eating concerns that are more common among males (Murray et al., 2017). Prior psychometric examinations of self-report ED measures suggest the possibility of gender bias in these scales. For example, clinical and community samples of men consistently demonstrate lower mean scores on ED measures than clinical and community samples of women (e.g., Eating Disorder Examination Questionnaire [EDEQ]; Lavender, De Young, & Anderson, 2010; Smith et al., 2017). Further, empirically derived clinical cutoffs, which are used to identify probable ED cases, are lower for men than women (Rø, Reas, & Stedal, 2015; Schaefer et al., 2018), suggesting that men with ED pathology may be less likely to endorse certain items (e.g., those relating to predominantly female body ideals) than women with ED pathology.

An implicit assumption underlying the comparison of means across groups is that a given measure assesses the same construct in the same way within each group (Thielemann et al., in press). That is, given the same level of eating pathology, members of distinct subgroups should exhibit the same likelihood of endorsing an item (Meulders & Xie, 2004). If the probability of item endorsement is influenced by other variables (e.g., gender), the item demonstrates differential item functioning (DIF). A small number of studies have examined gender-based DIF in measures of body image and ED pathology. Thielemann et al. (in press) found evidence of clinically significant DIF on three items of the Eating Attitudes Test-8 (EAT-8) indexing consumption of diet foods, discomfort after consuming sweets, and fear of being overweight. Reilly, Anderson, Schaumberg, and Anderson (2014) found no evidence of clinically significant DIF on the Weight and Shape Concern subscales of the EDEQ, but did observe DIF on a Body Shape Questionnaire item related to laxative abuse.

Given concerns regarding the potential for gender bias in measures of ED symptomatology, the goal of the current study was to examine gender-based DIF in three commonly used measures of ED pathology. As the field’s ability to successfully identify EDs among men depends on the valid assessment of ED symptoms within this group, evidence of gender-based DIF within these measures would have important implications for both clinical practice and scientific research.

2 | METHOD

2.1 | Participants

Participants were undergraduates at the University of Pennsylvania (Penn; n = 1,125), University at Albany, State University of New York (Albany; n = 914), and the University of South Florida (USF; n = 1,480). Sample-specific demographics are presented in Table 1.

TABLE 1. Demographic characteristics for each analytic sample

| Demographic characteristics | EDEQ analytic sample (USF, n = 1,425) | EAT-26 analytic sample (Penn, n = 1,125) | EDDS analytic sample (Albany, n = 902) |
| --- | --- | --- | --- |
| Age, M (SD) | 21.51 (4.9) | 19.9 (2.2) | 19.2 (3.8) |
| BMI [kg/m²], M (SD) | 24.22 (5.3) | 22.6 (4.4) | 22.8 (3.9) |
| Gender, n (valid %) | | | |
| Men | 301 (21.1) | 367 (32.5) | 365 (40.5) |
| Women | 1,124 (78.9) | 760 (67.4) | 537 (59.5) |
| Race/ethnicity, n (valid %) a | | | |
| African American or black | 196 (13.8) | 73 (6.5) | 89 (9.9) |
| Asian | 106 (7.4) | 202 (17.9) | 94 (10.4) |
| Hispanic or Latinx | 228 (16.0) | 134 (11.9) | 26 (2.9) |
| White | 752 (52.8) | 596 (52.9) | 599 (66.4) |
| Other or mixed heritage | 142 (10.0) | 115 (10.2) | 94 (10.4) |

Note. EDEQ, Eating Disorder Examination Questionnaire; EAT-26, Eating Attitudes Test-26; EDDS, Eating Disorder Diagnostic Scale for DSM-IV; USF, sample from University of South Florida; Penn, sample from University of Pennsylvania; Albany, sample from University at Albany, State University of New York; NA, race/ethnicity data for this category was not available in the particular study sample.

a Percentages do not add up to 100% due to missing self-report data.

2.2 | Measures

2.2.1 | Demographics

Participants indicated their gender, age, and race/ethnicity via self-report questionnaire. Self-reported height and weight were used to calculate body mass index (BMI; kg/m²).

2.2.2 | Eating Disorder Examination Questionnaire (EDEQ)

The EDEQ (Fairburn & Beglin, 1994) is a 28-item, self-report questionnaire that assesses ED attitudes and behaviors over the past 28 days. Twenty-two Likert-type items are rated on a scale ranging from 0 to 6. The EDEQ yields a global score reflecting respondents’ overall level of shape concern, weight concern, eating concern, and restraint. Higher global scores indicate greater ED pathology. Previous work has demonstrated the reliability and validity of EDEQ global scores within undergraduate men and women (Lavender et al., 2010; Luce & Crowther, 1999). In the current study, internal consistency for the global score was excellent in women (α = .94) and men (α = .95).

2.2.3 | Eating Disorder Diagnostic Scale

The Eating Disorder Diagnostic Scale (EDDS) (Stice, Telch, & Rizvi, 2000) is a 22-item self-report questionnaire assessing symptoms of anorexia nervosa (AN), bulimia nervosa (BN), and binge-eating disorder (BED) according to DSM-IV criteria. The scale contains Likert-type items, dichotomous items, frequency items, and open-ended items. A symptom composite score was computed by summing raw scores for items 1–18 and 21. Composite scores range from 0 to 112, with higher scores indicating greater ED pathology. Past studies have demonstrated the reliability and validity of composite scores in women and men (Arditte Hall, Bartlett, Iverson, & Mitchell, 2017; Stice et al., 2000). The composite score within the present study exhibited good internal consistency for both women (α = .82) and men (α = .79).

2.2.4 | Eating Attitudes Test-26

The EAT-26 (Garner et al., 1982) is a 26-item measure of behaviors and cognitions associated with AN, BN, and BED. Items are scored on a 0–3 scale and summed for the composite symptom score. The EAT-26 has demonstrated good reliability and validity in women and men (Gleaves, Pearson, Ambwani, & Morey, 2014). Within the present study, the composite score exhibited good internal consistency for women (α = .89) and men (α = .81).
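The internal consistency values reported for each measure are Cronbach's α coefficients. As an illustrative sketch only (not the authors' code), α can be computed from item-level responses as shown below; the function name `cronbach_alpha` and the toy response matrix are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of participants' item-score rows.

    responses: list of equal-length lists, one row per participant,
    one column per item (hypothetical input layout).
    """
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("alpha requires at least two items")
    # Sample variance of each item (column) across participants.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of the summed composite score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("composite scores have zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Toy data (made up): four participants, three items scored 0-3.
toy_responses = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 2],
    [3, 3, 3],
]
print(round(cronbach_alpha(toy_responses), 3))
```

Values closer to 1 indicate that the items covary strongly relative to their individual variances.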

2.3 | Procedure

EDEQ and EAT-26 respondents were undergraduate students recruited from the research participant pools at USF and Penn, respectively. Participants from both samples responded to questionnaires through a secure online portal in a single session. EDDS respondents were undergraduate students at Albany recruited through the research participant pool and flyers posted throughout the university campus. Participants completed questionnaires online during a single in-laboratory appointment. All participants provided informed consent and received a debriefing. Study procedures were approved by each recruiting university’s institutional review board.

2.4 | Data analyses

Analyses used ordinal logistic regression models to assess DIF. Consistent with recommendations (Zumbo, 1999), predictors of item scores were entered into the regression model in sequential steps. Specifically, predictors included the composite score for the ED measure (Step 1), gender (Step 2), and the interaction term (ED measure composite score × gender; Step 3). Statistically significant DIF was indicated by a significant chi-square difference test comparing fit at Steps 1 and 3. To account for the high number of statistical tests performed, the criterion for a statistically significant 2-df χ²-difference test was set at .01. Further, because large sample sizes boost statistical power to detect small effects, the clinical significance of the DIF was also evaluated. Specifically, the Zumbo-Thomas effect size standard, which requires the change in R² from Step 1 to Step 3 to meet or exceed .13, was used to identify clinically significant DIF (Zumbo, 1999).
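The two-part decision rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: it assumes the Step 1 and Step 3 models have already been fitted elsewhere, that the χ² difference is computed as twice the difference in log-likelihoods, and that a pseudo-R² is available for each step. The function names and the example inputs are hypothetical. For 2 degrees of freedom, the upper-tail χ² probability has the closed form exp(−x/2), so no statistics library is needed.

```python
import math


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square variate with 2 df (closed form)."""
    return math.exp(-x / 2.0)


def classify_dif(loglik_step1, loglik_step3, r2_step1, r2_step3,
                 alpha=0.01, effect_threshold=0.13):
    """Apply the statistical and clinical DIF criteria to one item.

    loglik_step1 / r2_step1: fit of the model with the composite score only.
    loglik_step3 / r2_step3: fit after adding gender and composite x gender.
    """
    # Likelihood-ratio chi-square for the two added terms (2 df).
    chi2_diff = 2.0 * (loglik_step3 - loglik_step1)
    p_value = chi2_sf_2df(max(chi2_diff, 0.0))
    delta_r2 = r2_step3 - r2_step1
    return {
        "chi2_diff": chi2_diff,
        "p_value": p_value,
        "delta_r2": delta_r2,
        "statistical_dif": p_value < alpha,
        "clinical_dif": delta_r2 >= effect_threshold,
    }


# Hypothetical fit statistics for a single item (not taken from the study).
result = classify_dif(-1500.0, -1480.0, 0.60, 0.62)
print(result["statistical_dif"], result["clinical_dif"])
```

With these made-up inputs the item would be flagged for statistically significant DIF but not for clinically significant DIF, which is the kind of pattern the effect size criterion is meant to separate out in large samples.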

2.5 | Missing data

Across samples, there were no missing responses on the composite scale scores. Item-level missingness was low (0–0.7%). Composite scores were, therefore, calculated with all available information for each participant. Missingness on gender items was very low (<0.1%). Participants who did not respond to the gender identity question or reported a nonbinary gender identity (<0.1%) were excluded from analyses.

3 | RESULTS

3.1 | Eating Disorder Examination Questionnaire

Women evidenced higher EDEQ global scores (M = 1.77, SD = 1.41) than men (M = 1.21, SD = 1.13; t[1,423] = 6.38, p < .001). Statistically significant DIF was observed for all Likert-type EDEQ items relative to the global score. However, no EDEQ items achieved clinically significant DIF. Results from all DIF analyses are presented in Data S1.
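As a rough check on the arithmetic, the reported t statistic can be approximately reproduced from the means and standard deviations above together with the group sizes in Table 1 (1,124 women and 301 men). The sketch below assumes a pooled-variance (Student's) t-test, which the reported degrees of freedom (n − 2) suggest but the text does not state; the function name `pooled_t` is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic and df from summary statistics,
    assuming equal variances (pooled-variance Student's t-test)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error, df


# EDEQ global scores: women vs. men (M and SD from the text, n from Table 1).
t_value, df = pooled_t(1.77, 1.41, 1124, 1.21, 1.13, 301)
print(round(t_value, 2), df)
```

The result lands close to the reported t[1,423] = 6.38; a small discrepancy is expected because the published means and standard deviations are rounded to two decimals.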

3.2 | Eating Disorder Diagnostic Scale

Women reported greater EDDS symptom composite scores (M = 18.77, SD = 13.85) than men (M = 9.47, SD = 11.74; t[912] = 10.59, p < .001). Because only women were asked to complete item 21 (assessing menses), only items 1–18 of the EDDS symptom composite score were evaluated for DIF. All of these items demonstrated statistically significant DIF except item 14 (“During these episodes of overeating and loss of control did you feel very upset with yourself, depressed, or very guilty after overeating?”). However, no items met criteria for clinically significant DIF.

3.3 | Eating Attitudes Test-26

EAT-26 composite scores were greater for women (M = 10.67, SD = 9.62) than for men (M = 7.10, SD = 6.68; t[1,175] = 6.67, p < .001). Analyses indicated that all EAT-26 items met criteria for statistically significant DIF, except for item 7 [“Particularly avoid food with a high carbohydrate content (i.e., bread, rice, potatoes, etc.)”] and item 22 (“Feel uncomfortable after eating sweets”). No items met clinically significant DIF criteria.

4 | DISCUSSION

The current study examined gender-based DIF in three commonly used measures of ED symptomatology (i.e., EDEQ, EAT-26, EDDS). Across all measures, there was no evidence of clinically significant gender-based DIF on any item. Results from analyses examining EDEQ items are generally consistent with previous work indicating an absence of clinically significant DIF within the Weight and Shape Concern subscales (Reilly et al., 2014), and extend prior findings by demonstrating an absence of clinically significant gender-based DIF on items from the global score. While previous work examining DIF in the EAT-8 indicated the presence of gender-based DIF for three items (Thielemann et al., in press), the current study did not replicate this finding in analyses using the larger EAT-26 item pool. The current study is the first to examine DIF within the EDDS, demonstrating an absence of clinically significant gender-based DIF within this scale.

Findings from the present study offer a degree of comfort to researchers and clinicians interested in using existing measures of thinness-oriented ED pathology among men, as results suggest that the items contained within the EDEQ, EDDS, and EAT-26 operate similarly among men and women. However, experts have also raised concerns that existing measures may not assess the full range of ED-relevant attitudes and behaviors that more commonly manifest in men (Murray et al., 2016). For example, male appearance ideals emphasize low body fat and heightened muscularity (e.g., Cafri, Belvins, & Thompson, 2006; Murray et al., 2016). While existing measures of ED pathology may be able to capture behaviors intended to reduce body fat, none of the examined measures contain items to assess pathological behaviors intended to increase muscularity. If such behaviors are determined to represent clinically relevant ED pathology, improved assessment of these experiences within ED measures would be an important avenue for future work.

Strengths of the current study include the use of large samples of men and women, and the comprehensive exploration of gender-based DIF (including statistically and clinically significant DIF) within three of the most frequently used measures of ED pathology. However, the study is not without limitations. As all included samples were drawn from nonclinical populations, findings should be interpreted with caution as they may not generalize to clinical samples. Future research should examine gender-based DIF among individuals with clinically significant ED pathology. In addition, the present study was not able to examine DIF related to diverse gender identities (e.g., transgender). Previous work has highlighted health disparities in ED symptoms among transgender college students (Diemer, Grant, Munn-Chernoff, Patterson, & Duncan, 2015). Therefore, future research should consider DIF across additional gender identities. Similarly, future research should consider DIF across intersecting social identities for which items may differentially relate to global scores (e.g., sexual orientation or race) within a range of genders. Finally, DIF analyses assume that an individual’s score on a measure is a valid indicator of the intended latent trait. However, if this assumption is not met, DIF analyses would not yield valid results.

In sum, the current results indicate that three commonly used dimensional ED symptom inventories do not show clinically significant differential item functioning for undergraduate men and women, despite having been developed and validated for women. Results suggest that the EDEQ, EDDS, and EAT-26 do not differ by gender in their ability to assess thinness-oriented ED symptoms, and therefore support the use of these measures to assess thinness-oriented ED pathology in nonclinical samples of undergraduate men. However, as growing evidence highlights the centrality of muscularity-oriented body image and eating concerns among men (Murray et al., 2016), clinicians are also encouraged to assess for problematic muscularity-oriented eating behaviors in this group. Further, the development and testing of measures to capture these concerns is encouraged.

Funding information

National Institute of Mental Health, Grant/Award Number: T32 MH082761

Footnotes

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are available on request from the corresponding author.

SUPPORTING INFORMATION

Additional supporting information may be found online in the Supporting Information section at the end of this article.

REFERENCES

  1. Arditte Hall KA, Bartlett BA, Iverson KM, & Mitchell KS (2017). Military-related trauma is associated with eating disorder symptoms in male veterans. International Journal of Eating Disorders, 50(11), 1328–1331.
  2. Cafri G, Belvins N, & Thompson JK (2006). The drive for muscle leanness: A complex case with features of muscle dysmorphia and eating disorder not otherwise specified. Eating and Weight Disorders, 11(4), 117–118.
  3. Diemer EW, Grant JD, Munn-Chernoff MA, Patterson DA, & Duncan AE (2015). Gender identity, sexual orientation, and eating-related pathology in a national sample of college students. Journal of Adolescent Health, 57(2), 144–149.
  4. Fairburn CG, & Beglin SJ (1994). Assessment of eating disorders: Interview or self-report questionnaire? International Journal of Eating Disorders, 16, 363–370.
  5. Garner DM, Olmsted MP, Bohr Y, & Garfinkel PE (1982). The eating attitudes test: Psychometric features and clinical correlates. Psychological Medicine, 12, 871–878.
  6. Gleaves DH, Pearson CA, Ambwani S, & Morey LC (2014). Measuring eating disorder attitudes and behaviors: A reliability generalization study. Journal of Eating Disorders, 2(6).
  7. Lavender JM, De Young KP, & Anderson DA (2010). Eating disorder examination questionnaire (EDE-Q): Norms for undergraduate men. Eating Behaviors, 11(2), 119–121.
  8. Luce KH, & Crowther JH (1999). The reliability of the eating disorder examination—Self report questionnaire version (EDE-Q). International Journal of Eating Disorders, 25, 349–351.
  9. Meulders M, & Xie Y (2004). Person-by-item predictors. In De Boeck P & Wilson M (Eds.), Explanatory item response models: A generalized linear and nonlinear approach (Vol. 1, pp. 213–240). New York, NY: Springer Science + Business Media New York.
  10. Mitchison D, & Mond J (2015). Epidemiology of eating disorders, eating disordered behaviour, and body image disturbance in males: A narrative review. Journal of Eating Disorders, 3(20), 1–9.
  11. Murray SB, Griffiths S, & Mond JM (2016). Evolving eating disorder psychopathology: Conceptualizing muscularity-oriented disordered eating. The British Journal of Psychiatry, 208(5), 414–415.
  12. Murray SB, Nagata JM, Griffiths S, Calzo JP, Brown TA, Mitchison D, … Mond JM (2017). The enigma of male eating disorders: A critical review and synthesis. Clinical Psychology Review, 57, 1–11.
  13. Reilly EE, Anderson LM, Schaumberg K, & Anderson DA (2014). Gender-based differential item functioning in common measures of body dissatisfaction. Body Image, 11, 206–209.
  14. Rø O, Reas DL, & Stedal K (2015). Eating disorder examination questionnaire (EDE-Q) in Norwegian adults: Discrimination between female controls and eating disorder patients. European Eating Disorders Review, 23(5), 408–412.
  15. Schaefer LM, Smith KE, Leonard R, Wetterneck C, Smith B, Farrell N, … Thompson JK (2018). Identifying a male clinical cutoff on the eating disorder examination-questionnaire (EDE-Q). International Journal of Eating Disorders, 51, 1357–1360.
  16. Smith KE, Mason TB, Murray SB, Griffiths S, Leonard RC, Wetterneck CT, … Lavender JM (2017). Male clinical norms and sex differences on the eating disorder inventory (EDI) and eating disorder examination questionnaire (EDE-Q). International Journal of Eating Disorders, 50, 769–775.
  17. Stice E, Telch CF, & Rizvi SL (2000). A psychometric evaluation of the eating disorder diagnostic screen: A brief self-report measure for anorexia, bulimia, and binge eating disorder. Psychological Assessment, 12, 123–131.
  18. Thielemann D, Richter F, Strauss B, Braehler E, Altmann U, & Berger U (2018). Differential item functioning in brief instruments of disordered eating. European Journal of Psychological Assessment, 1–11. doi: 10.1027/1015-5759/a000472
  19. Zumbo BD (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type item scores. Ottawa, Canada: Directorate of Human Resources Research and Evaluation, Department of National Defense.
