BMC Psychology. 2025 Sep 26;13:1042. doi: 10.1186/s40359-025-03400-w

Screening for eating disorders in adolescents: psychometric evaluation of the eating disorder examination questionnaire short version (EDE-QS) in a community sample

Camilla Lindvall Dahlgren, Lasse Bang, Ellie Bastos Degobi
PMCID: PMC12465705  PMID: 41013701

Abstract

Background

This study aimed to evaluate the psychometric properties of the Eating Disorder Examination Questionnaire Short version (EDE-QS) in a community sample of adolescents and to establish an optimal clinical cut-off score for screening purposes.

Method

Clinical interview and self-report data were collected as part of a broader epidemiological study conducted across six upper secondary schools in Norway. The sample included 1,430 adolescents (744 females, 686 males) aged 16–19 years (M = 17.03, SD = 0.90). Data were collected between November 2020 and May 2021. Psychometric evaluation included confirmatory factor analysis (CFA), assessment of convergent validity, and receiver operating characteristic (ROC) analysis to determine the optimal clinical cut-off score in females.

Results

The EDE-QS demonstrated good structural validity and excellent internal consistency. Findings also supported its convergent and criterion validity. The measure appears particularly suitable for epidemiological and clinical research contexts, where brief yet valid screening tools are essential. ROC analysis using the Youden criterion identified a cut-off score of 20 as providing the optimal balance of sensitivity (0.64) and specificity (0.81). This cut-off was derived from female participants only.

Conclusions

The EDE-QS combines robust psychometric properties with brevity, making it especially appropriate for use in large-scale epidemiological and clinical studies, where minimizing participant burden is critical. A limitation of the current study is that the clinical cut-off was established based solely on diagnostic data from female participants. Further validation is needed to assess the utility of the EDE-QS as a screening tool in male and gender-diverse adolescents.

Clinical trial number

Not applicable.

Supplementary Information

The online version contains supplementary material available at 10.1186/s40359-025-03400-w.

Keywords: Eating disorders, Diagnosis, Screening, Adolescents, EDE-QS, Psychometrics

Background

Adolescence is a peak period for the development of eating disorders (EDs), a group of serious and often debilitating mental health conditions marked by pathological concerns about body shape and weight and by disordered eating and weight-control behaviors. Current diagnostic frameworks recognize three primary EDs, namely anorexia nervosa (AN), bulimia nervosa (BN), and binge eating disorder (BED), as well as three feeding disorders: Pica, Rumination Disorder, and Avoidant/Restrictive Food Intake Disorder (ARFID). Additionally, the category Other Specified Feeding or Eating Disorder (OSFED) captures clinically significant presentations that do not meet full diagnostic criteria for the main EDs. Despite being considered subthreshold, OSFED is associated with distress and impairment comparable to those of full-syndrome EDs [1]. EDs disproportionately affect females, with the most pronounced gender disparity seen in AN (male-to-female ratio 1:10); the gender gap narrows in BN (1:5) [2] and BED [3]. Epidemiological studies in adults suggest that OSFED is up to six times more prevalent than AN, BN, or BED, with approximately 30% of individuals seeking treatment falling into this diagnostic group. However, less is known about the prevalence and characteristics of OSFED in adolescent populations [4].

The aetiology of EDs is complex, involving a dynamic interplay of biological, psychological, and sociocultural factors. A substantial body of research highlights the role of body image disturbance and sociocultural influences, such as appearance-related teasing, low self-esteem, and exposure to idealized body images in the media, in the development and maintenance of EDs [5]. EDs also significantly reduce quality of life (QoL), often to levels comparable with other severe psychiatric conditions, particularly anxiety disorders [6]. Impairments are most notable in the psychological and social domains, negatively affecting self-perception and interpersonal relationships [7, 8].

Although less prevalent than many other psychiatric conditions, EDs have disproportionately severe consequences. AN is associated with serious medical complications, including cardiac arrhythmias, electrolyte imbalances, and osteoporosis, and has the highest mortality rate of all psychiatric disorders, with suicide being the leading cause of death [9]. BN involves recurrent binge eating episodes followed by compensatory behaviors such as purging or excessive exercise and is linked to severe metabolic and gastrointestinal problems [10]. Individuals with BED are often overweight or obese, which exposes them to weight-related health problems and stigma [11] and places them at higher risk of developing diabetes and other metabolic dysfunction [12]. All EDs are associated with elevated rates of comorbid psychiatric disorders, such as anxiety, depression, and substance misuse [2]. As a result, EDs impose substantial direct healthcare costs and indirect societal costs, including lost productivity [13]. Early diagnosis and intervention not only improve individual outcomes but also reduce the overall economic burden [14].

Epidemiological data on the prevalence and distribution of EDs are vital for informing public health strategies. However, the gold-standard approach, a two-stage design involving initial self-report screening followed by clinical interviews, is resource-intensive, limiting the feasibility of large-scale studies. Existing prevalence studies often rely on clinical case registers, which miss undiagnosed individuals and therefore underestimate true prevalence in the general population. Moreover, apparent increases in prevalence over time may reflect better case identification rather than true increases in incidence [9]. A promising solution is to use self-reported symptom scores as proxies for ED diagnoses in large-scale research. Self-report tools used for this purpose must strike a balance between sensitivity (correctly identifying likely cases) and specificity (correctly excluding individuals without EDs), while remaining brief enough for widespread use, often alongside other questionnaires.

One of the most widely used gold-standard tools for diagnosing EDs is the Eating Disorder Examination (EDE), a semi-structured interview based on DSM-5 criteria [15]. Although the EDE is freely available, its use requires extensive specialized training, limiting accessibility for many clinicians and researchers. The interview is also time-intensive, typically requiring around 90 min to complete. While the EDE provides rich, detailed information and is well suited to clinical settings, its resource demands make it impractical for large-scale epidemiological studies. To address this limitation, the current study aims to validate a brief self-report alternative, the Eating Disorder Examination Questionnaire Short (EDE-QS) [16], as a potential proxy for a clinical DSM-5 ED diagnosis in population-level research.

Previous, albeit limited, research has shown that the EDE-QS has excellent internal consistency and test–retest reliability, as well as strong convergent validity with the full EDE-Q, in samples comprising individuals both with and without EDs. Its brevity, combined with these psychometric properties, offers preliminary support for its use as a screening tool for EDs in community settings [16]. However, only two studies [16, 17] have examined the psychometric properties of the EDE-QS in samples that include both individuals with suspected EDs and those unlikely to meet diagnostic criteria. Consequently, the optimal clinical cut-off score for identifying individuals with a confirmed ED diagnosis remains undetermined. Furthermore, the EDE-QS has not yet been validated in adolescent populations, a critical gap in the literature. Establishing its validity in this age group could support its use as a practical and efficient instrument for identifying EDs in large adolescent cohorts, facilitating earlier detection, informing public health strategies, and guiding the allocation of resources for prevention and intervention.

Methods

Sample

The sample initially consisted of 1546 adolescents (825 females, 721 males) aged 16–19 (Mean age = 17.01, SD = 0.89) recruited from six upper secondary schools in Norway. After data quality checks, the cleaned data consisted of 1430 adolescents (744 females, 686 males) aged 16–19 (Mean age = 17.03, SD = 0.90).

Procedure

The data for this study were collected as part of a broader investigation into the prevalence and correlates of EDs among Norwegian adolescents [18]. Participant recruitment was concentrated in major urban areas of Norway. Between November 2020 and May 2021, an online survey distributed via school channels assessed dietary habits, body and weight concerns, loneliness, appearance-related attitudes and pressures, QoL, social media use, and non-suicidal self-injury. The study employed a two-phase design consisting of web-based screening (Phase 1) followed by clinical telephone interviews (Phase 2). In Phase 1, students completed the survey in class or at home using individual electronic devices. In Phase 2, students who met or exceeded the ED risk cut-off and consented to further contact were invited to a diagnostic telephone interview. The mean interval between the initial screening and the diagnostic interview was 7.5 days (SD = 3.8, range 2–22).

As a general rule in Norway, minors aged 16 or older can consent to participate in research without parental approval. In this study, participation was entirely voluntary, and eligibility was restricted to students aged 16 years or older. Participants provided digital informed consent in compliance with Norwegian regulations on individuals’ privacy, using Services for Sensitive Data (TSD). We used the adolescent template for subject information and consent from the Norwegian Regional Committee for Medical and Health Research Ethics (Reference ID 116178), which, together with the Norwegian Data Protection Authority at Oslo University Hospital, approved the project. The study is registered in the Open Science Framework (OSF) (Identifier: DOI 10.17605/OSF.IO/5RB6P).

Assessment

Sociodemographic information

The online survey solicited self-reported data on weight and height to determine Body Mass Index (BMI in kg/m²), along with participants’ age, gender, ethnic background, educational institution, academic level, and study program.

The eating disorder examination questionnaire short (EDE-QS)

The EDE-QS [16] is a short version of the Eating Disorder Examination Questionnaire (EDE-Q) [19] designed to evaluate ED-related thoughts and behaviors over the previous week (e.g. “Have you been deliberately trying to limit the amount of food you eat to influence your weight or shape (whether or not you have succeeded)?”, “Have you tried to control your weight or shape by making yourself sick (vomit) or taking laxatives?”, and “Have you had a sense of having lost control over your eating (at the time that you were eating)?”). The EDE-QS comprises 12 items rated on a 4-point Likert scale (0 = 0 days, 1 = 1–2 days, 2 = 3–5 days, 3 = 6–7 days). Total scores range from 0 to 36, with higher totals indicating greater symptom severity. The measure has demonstrated good internal consistency, test–retest reliability, and convergent validity in individuals with probable and non-probable EDs [16]. Its brevity and psychometric properties underpin its potential as an effective screening instrument for EDs in non-clinical samples [17]. The optimal balance of sensitivity and specificity has been reported at a cut-off score of 15, although lower cut-offs may be used to capture individuals at significant risk of developing EDs [17]. The English EDE-QS was translated into Norwegian in line with the Norwegian version of the EDE-Q [20].
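The scoring rule described above is simple enough to sketch directly. The Python below is a minimal illustration, not an official scoring script: the function names are hypothetical, it assumes complete responses with integer item scores (the text does not say how missing items are handled, so incomplete responses are simply rejected), and the cut-offs used in the example (13 for this study's screening phase, 15 from [17], and 20 from the ROC analysis reported in the Results) are taken from the article.

```python
EDEQS_N_ITEMS = 12
EDEQS_MIN, EDEQS_MAX = 0, 3  # 0 = 0 days, 1 = 1-2 days, 2 = 3-5 days, 3 = 6-7 days


def edeqs_total(responses):
    """Sum the 12 EDE-QS item scores (each 0-3) into a 0-36 total.

    Assumption: incomplete or out-of-range responses are rejected rather than imputed.
    """
    if len(responses) != EDEQS_N_ITEMS:
        raise ValueError(f"expected {EDEQS_N_ITEMS} item scores, got {len(responses)}")
    for r in responses:
        if not (isinstance(r, int) and EDEQS_MIN <= r <= EDEQS_MAX):
            raise ValueError(f"item score outside 0-3: {r!r}")
    return sum(responses)


def screens_positive(total, cutoff):
    """A total at or above the cut-off counts as a positive screen."""
    return total >= cutoff


# Hypothetical respondent (not study data)
answers = [2, 1, 1, 2, 3, 3, 0, 1, 1, 0, 2, 2]
total = edeqs_total(answers)  # 18
for cutoff in (13, 15, 20):
    print(cutoff, screens_positive(total, cutoff))
```

The same total of 18 is positive at cut-offs of 13 and 15 but negative at 20, which is why the choice of cut-off matters for prevalence estimates.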

The eating disorder assessment for DSM-5 (EDA-5)

The EDA-5 [21] is a semi-structured, web-based diagnostic interview designed to assess feeding and eating disorders by identifying symptoms present over the preceding three months. The operationalization of diagnostic criteria within the EDA-5 is described in detail by Walsh et al. [22] and Dahlgren et al. [18] and is also available in Supplementary Materials 1. Previous research has demonstrated high concordance between EDA-5 diagnoses and those obtained through traditional clinical interviews, supporting its diagnostic validity [21, 23]. Recent findings from the Norwegian validation study [23] support the EDA-5’s ability to generate ED diagnoses efficiently (i.e. requiring less time than traditional ED interviews) without compromising diagnostic accuracy. In this validation study, diagnoses assigned using the current gold-standard clinical interview, the EDE [15], and those assigned using the EDA-5 were identical in 75 of the 91 cases (82.4%). Across individual diagnostic categories, interrater Cohen’s kappa coefficients ranged from moderate (0.49) to perfect (1.00) agreement. These kappa values are comparable to those reported in the original validation study by Sysko et al. [21], in which kappa coefficients ranged from 0.56 for OSFED or unspecified feeding or eating disorders (UFED) to 0.94 for BN. The EDA-5 test–retest kappa coefficient was 0.87 across diagnoses. Detailed results on kappa values, sensitivity, specificity, positive and negative predictive values, and overall agreement accuracy in a Norwegian adolescent sample have been reported previously in Dahlgren et al. [18]. No test–retest reliability data are currently available for the Norwegian version. In the current study, the EDA-5 was administered to a subsample of 99 participants (87 girls and 12 boys) who scored at or above the EDE-QS screening cut-off score of 13.

Sociocultural attitudes towards appearance Questionnaire-4 revised (SATAQ-4R)

The SATAQ-4R [24] assesses individuals’ internalization of societal standards of appearance and attractiveness (e.g. “I think a lot about my appearance”, “It is important to me to look muscular”, and “I want my body to look very thin”) as well as the perceived pressure to conform to these standards from various social influences, including family, peers, partners, and media (e.g. “I feel pressure from family members to look thinner”, “I feel pressure from my peers to look in better shape”, and “I feel pressure from the media to improve my appearance”). The instrument has gender-specific versions, consisting of 31 items for females and 28 for males. It uses a 5-point Likert response scale ranging from “Strongly Disagree” to “Strongly Agree” and is structured into seven subscales: Internalization—Thin/low body fat, Internalization—Muscular, Internalization—General attractiveness, Pressures—Family, Pressures—Media, Pressures—Peers, and Pressures—Significant others. A mean score is calculated for each subscale, with higher scores indicating greater internalization and perceived pressure. The Norwegian adaptation of the SATAQ-4R has been validated and shows strong psychometric properties [5].

Quality of life in youth, 9 items (LivUng9; Livskvalitet Hos Ungdom-9)

There is a lack of well-validated, brief Norwegian QoL measures specifically designed for adolescents. Applying adult measures to adolescent populations is problematic due to linguistic differences and substantial variations in life circumstances. National health surveys targeting this age group in Norway, such as the national youth survey UngData and the Health Behavior in School-aged Children (HEVAS) survey, commonly rely on single-item questions rather than validated psychometric instruments to assess QoL. To address this gap, we used LivUng9, a nine-item QoL questionnaire originally developed for a separate epidemiological study on Norwegian adolescents [18]. The nine items (see Supplementary Materials 2) cover key QoL domains, including physiological (e.g. “I am satisfied with my health”), psychological (e.g. “I feel positive about the future”), social (e.g. “I am happy with my friends”), and environmental (e.g. “I have fun or interesting things to do in my free time”) aspects. They were derived from single-item QoL measures used in UngData, the Youth Quality of Life Instrument – Short Form (YQoL-SF) [25], and additional items developed by the research group at the Regional Department for Eating Disorders at Oslo University Hospital, Ullevål. Each item is rated on a five-point Likert scale ranging from “Completely agree” to “Completely disagree”, with higher scores indicating higher QoL. The LivUng9 has not yet been validated in a Norwegian sample.

Data analysis

Quality checks for psychometric properties

Careless responding occurs when participants fail to engage adequately with survey items, for example by not reading them thoroughly or not paying sufficient attention to their content. Such responses compromise data accuracy because they no longer reflect respondents’ actual levels of the constructs being assessed [26, 27]. Careless responses were screened for by detecting multivariate outliers with the careless R package [28]. To assess the integrity of the item response data, we used the Mahalanobis distance, a multivariate measure of how far an observation lies from the centroid (mean vector) of the overall distribution. Unlike Euclidean distance, which treats variables as independent, the Mahalanobis distance accounts for correlations among variables and adjusts for their variances by incorporating the covariance matrix, so that distances are expressed in standardized units that reflect the structure of the data. This makes it particularly useful for detecting multivariate outliers: observations with larger Mahalanobis distances are more atypical relative to the rest of the sample. Outlying responses on the EDE-QS were identified and excluded prior to the psychometric analyses. Owing to the limited sample size, data quality checks were not conducted for the ROC analysis.
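The screening step described above can be sketched in Python. This is an independent illustration, not the careless package's implementation: it computes each respondent's squared Mahalanobis distance D² = (x − x̄)ᵀ S⁻¹ (x − x̄) from the item means x̄ and the sample covariance S, and flags rows whose D² exceeds the chi-square quantile at the chosen confidence level (0.999, as reported in the Results) with degrees of freedom equal to the number of items, the usual reference distribution for D² under multivariate normality. It assumes complete cases and uses NumPy; the chi-square quantile is computed from a series expansion to avoid a further dependency. Function names and the synthetic data are hypothetical.

```python
import math

import numpy as np


def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-17 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_quantile(p, df):
    """Invert chi2_cdf by bisection (adequate for the small df used here)."""
    lo, hi = 0.0, 10.0 * max(df, 1)
    while chi2_cdf(hi, df) < p:  # widen the bracket if needed
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mahalanobis_outliers(X, confidence=0.999):
    """Squared Mahalanobis distance of each respondent (row) from the item means.

    Rows whose D^2 exceeds the chi-square quantile at `confidence`
    (df = number of items) are flagged. Assumes complete cases.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    inv_cov = np.linalg.pinv(np.cov(X, rowvar=False))  # pinv tolerates a near-singular covariance
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    cutoff = chi2_quantile(confidence, X.shape[1])
    return d2, d2 > cutoff, cutoff


# Synthetic illustration only: 500 respondents x 12 items, one planted outlier
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))
X[0] = 8.0
d2, flagged, cutoff = mahalanobis_outliers(X)
print(round(cutoff, 3), int(flagged.sum()), bool(flagged[0]))
```

With 12 items and a confidence level of 0.999, the cut-off is about 32.91. Real EDE-QS responses are ordinal (0–3), so the chi-square reference is an approximation there, as it is in any D²-based screen of Likert data.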

EDE-QS psychometric properties

To evaluate the structural validity of the EDE-QS, a confirmatory factor analysis (CFA) was conducted to test a unidimensional model of the instrument’s 12 items. The analysis was performed in R (R Core Team, 2024) using the lavaan package [29]. The model was estimated with the ULSMV estimator, which is appropriate for ordinal response data. Prior research has shown that ULSMV yields parameter estimates and standard errors comparable to, or slightly more accurate than, those produced by WLSMV under various conditions [30, 31]. Additionally, ULSMV has shown superior performance in maintaining Type I error control and statistical power through its robust chi-square statistic [32]. Model fit was evaluated using chi-square (χ²), the Tucker–Lewis Index (TLI), the Comparative Fit Index (CFI), the Root Mean Square Error of Approximation (RMSEA), and the Standardized Root Mean Square Residual (SRMR). Following Hu and Bentler’s (1999) guidelines, values above 0.95 for CFI and TLI, below 0.06 for RMSEA, and below 0.08 for SRMR were taken to indicate good model fit. To assess the reliability of the unidimensional model, we calculated Cronbach’s alpha (α) and McDonald’s omega (ωₜ). Reliability coefficients of 0.70 or higher were considered indicative of acceptable internal consistency [33].
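The fit criteria and the alpha coefficient described above can be expressed compactly. The sketch below uses hypothetical helper names and plain Python. It checks fit indices against the cut-offs stated in the text and computes Cronbach's alpha from complete item-score rows using α = k/(k − 1) × (1 − Σσ²ᵢ / σ²_total). It does not replace lavaan's estimation, and it omits McDonald's omega, which requires a fitted factor model.

```python
from statistics import variance


def meets_hu_bentler(cfi, tli, rmsea, srmr):
    """Good-fit rule as stated in the text: CFI and TLI > 0.95, RMSEA < 0.06, SRMR < 0.08."""
    return cfi > 0.95 and tli > 0.95 and rmsea < 0.06 and srmr < 0.08


def cronbach_alpha(rows):
    """Cronbach's alpha from complete cases; rows = one list of item scores per respondent."""
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need at least two items and equal-length rows")
    item_columns = list(zip(*rows))  # transpose respondents x items into item columns
    sum_item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(r) for r in rows])  # variance of respondents' total scores
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Fit indices as reported in the Results section
fit_ok = meets_hu_bentler(cfi=0.977, tli=0.972, rmsea=0.045, srmr=0.063)
# Toy two-item data (not study data): item variances 1 and 1, total variance 3
toy_alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
print(fit_ok, round(toy_alpha, 4))
```

Applied to the indices reported in the Results, `meets_hu_bentler` returns True. The toy data give α = 2 × (1 − 2/3) = 2/3.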

We also compared EDE-QS scores between males and females. To assess measurement invariance across gender, we conducted a Differential Item Functioning (DIF) analysis using the logistic regression method for polytomous ordinal items [34, 35]. This analysis compared three nested models: (a) Model 1 included only the latent trait score as a predictor of item endorsement; (b) Model 2 added the group variable (i.e., gender); and (c) Model 3 added an interaction term between gender and trait score. For each model comparison, effect sizes were estimated using pseudo R² [35], following the recommendations of Jodoin and Gierl [36]. An item was considered to exhibit DIF if the likelihood ratio test was significant (p < 0.05) and the change in pseudo R² was ≥ 0.035. The DIF analysis proceeded as follows: (1) item calibration and parameter estimation using the graded response model; (2) estimation of trait scores using the Expected A Posteriori (EAP) method; (3) identification of items exhibiting DIF; and (4) where DIF was detected, re-estimation excluding the non-invariant items. All DIF analyses were performed using the mirt [37] and lordif [34] packages in R. After items exhibiting DIF were identified, participants’ EAP scores were recalculated to account for DIF-adjusted item functioning. These corrected scores were then used to assess mean gender differences via a Welch t-test conducted in R [38]. Effect sizes were calculated as Cohen’s d and Hedges’ g using the effectsize package in R [39].
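The DIF decision rule described above (a significant likelihood ratio test combined with a pseudo-R² change of at least 0.035) can be sketched as follows. This is an illustration, not lordif's code. It assumes the log-likelihoods of the nested ordinal logistic models for one item are already available, and it uses McFadden's pseudo-R². That choice is an assumption, since the text does not say which pseudo-R² definition was used. `ll_null` is the log-likelihood of a thresholds-only (intercept-only) model, which McFadden's definition requires. The chi-square tail is computed in closed form, which covers 1 degree of freedom (Model 1 vs 2 and Model 2 vs 3, with a binary gender variable) and 2 degrees of freedom (Model 1 vs 3). Function names and log-likelihood values are hypothetical.

```python
import math


def chi2_sf_small_df(x, df):
    """Upper-tail chi-square probability in closed form for df = 1 or df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("closed form implemented only for df = 1 or 2")


def dif_flag(ll_null, ll_restricted, ll_full, df, alpha=0.05, r2_threshold=0.035):
    """Flag DIF for one nested-model comparison of a single item.

    ll_null:       log-likelihood of the thresholds-only model
    ll_restricted: log-likelihood of the smaller model (e.g., Model 1)
    ll_full:       log-likelihood of the larger model (e.g., Model 2)
    Returns (p_value, delta_mcfadden_r2, flagged).
    """
    lr = 2.0 * (ll_full - ll_restricted)  # likelihood ratio statistic
    p = chi2_sf_small_df(max(lr, 0.0), df)
    delta_r2 = (ll_restricted - ll_full) / ll_null  # McFadden R2(full) - R2(restricted)
    return p, delta_r2, (p < alpha and delta_r2 >= r2_threshold)


# Hypothetical log-likelihoods for one item, Model 1 vs Model 2 (df = 1)
p_a, dr2_a, flag_a = dif_flag(-1000.0, -700.0, -660.0, df=1)  # delta R2 = 0.04: flagged
p_b, dr2_b, flag_b = dif_flag(-1000.0, -700.0, -690.0, df=1)  # significant, delta R2 = 0.01: not flagged
print(flag_a, flag_b)
```

The second comparison shows why the effect-size criterion matters in a sample of this size: the likelihood ratio test is significant, but the change in pseudo-R² is too small to count as DIF.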

Convergent validity of the EDE-QS was evaluated using Spearman correlations between EDE-QS sum scores and scores on the LivUng9 and SATAQ measures. Correlations were considered statistically significant at p < 0.05. A visual representation of the correlation matrix was generated using the corrplot package in R [40].
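Spearman's correlation is Pearson's correlation computed on ranks, with tied values given the average of the ranks they span. The dependency-free Python sketch below illustrates this. The helper names are hypothetical, and significance testing, which the article performed in R, is omitted.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of 1-based positions i+1 .. j+1
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: one swapped pair at each end lowers rho from 1.0 to 0.8
print(spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```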

EDE-QS Cut-Off

Receiver operating characteristic (ROC) analysis was conducted on a female subsample (n = 82) drawn from the larger study sample (N = 1546). These participants had completed the EDE-QS and participated in a diagnostic interview using the EDA-5. Individuals diagnosed with any ED were coded as “1” (positive for ED), while those without a diagnosis were coded as “0” (negative for ED). The ROC analysis was used to evaluate the sensitivity and specificity of the EDE-QS in identifying clinical ED cases. Sensitivity (true positive rate) reflects the proportion of correctly identified cases among those with an ED, whereas specificity (true negative rate) reflects the proportion of correctly identified non-cases [41].

The optimal cut-off point was determined using Youden’s J statistic (J = sensitivity + specificity − 1) [42], which identifies the threshold that maximizes the vertical distance between the ROC curve and the identity line (i.e., the diagonal). The optimal EDE-QS cut-off score was calculated using the cutpointr package in R [43]. To evaluate the performance of the binary classification, accuracy was used as a summary statistic, defined as the proportion of correct classifications (true positives and true negatives) relative to the total number of cases analysed [44].
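The Youden search can be sketched as a scan over every observed score used as a cut-off, where a score at or above the cut-off counts as a positive screen and the cut-off with the largest J is kept. This is a simplified stand-in for cutpointr, with hypothetical helper names and toy data. Ties in J are resolved here by keeping the lowest cut-off, which may differ from cutpointr's tie handling.

```python
def sens_spec_at(scores, labels, cutoff):
    """Sensitivity and specificity when score >= cutoff is a positive screen (label 1 = ED)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(scores, labels):
    """Scan observed scores as candidate cut-offs; return (cutoff, J, sens, spec) maximizing J."""
    best = None
    for c in sorted(set(scores)):
        sens, spec = sens_spec_at(scores, labels, c)
        j = sens + spec - 1
        if best is None or j > best[1]:  # strict '>' keeps the lowest cut-off on ties
            best = (c, j, sens, spec)
    return best


# Toy data (not study data): EDE-QS totals and interview diagnoses (1 = any ED, 0 = none)
scores = [5, 8, 10, 12, 21, 14, 18, 22, 25, 30]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
cutoff, j, sens, spec = youden_cutoff(scores, labels)  # cut-off 14: sens 1.0, spec 0.8, J = 0.8
print(cutoff, round(j, 3), sens, spec)
```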

A two-tailed p-value was calculated using the z-distribution, along with the confidence interval for the Area Under the Curve (AUC). According to Streiner and Cairney [45], test accuracy is considered low when AUC values range from 0.50 to 0.70, moderate between 0.70 and 0.90, and high when exceeding 0.90. Consequently, only AUC values approaching or exceeding 0.70 were interpreted as indicative of acceptable classification accuracy. The ROC analysis was conducted on a subsample of 82 female participants. While this approach allowed for initial evaluation of classification accuracy, the limited sample size and gender-specific composition may affect the generalizability and precision of the results. In particular, the estimated cut-off score may not extend to male participants or more diverse populations, and the small sample may contribute to reduced stability of the ROC estimates. These limitations highlight the need for replication in larger and more representative samples.
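The AUC equals the probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as one half. This is the Mann–Whitney U statistic divided by the number of case/non-case pairs. The sketch below computes the AUC directly and maps it onto the Streiner and Cairney bands quoted above. Helper names and toy data are hypothetical, the handling of values exactly at 0.70 and 0.90 is an assumption, and the z-test p-value and confidence interval reported in the article are not reproduced.

```python
def auc_mann_whitney(scores, labels):
    """P(score of a random case > score of a random non-case), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def streiner_cairney_band(auc):
    """Bands as quoted in the text: 0.50-0.70 low, 0.70-0.90 moderate, > 0.90 high.

    Assumption: 0.70 and 0.90 themselves are treated as 'moderate'.
    """
    if auc > 0.90:
        return "high"
    if auc >= 0.70:
        return "moderate"
    if auc >= 0.50:
        return "low"
    return "below 0.50"


scores = [5, 8, 10, 12, 21, 14, 18, 22, 25, 30]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
auc = auc_mann_whitney(scores, labels)  # 23 of 25 case/non-case pairs correctly ordered: 0.92
print(auc, streiner_cairney_band(auc), streiner_cairney_band(0.7501))
```

The pairwise loop is quadratic in sample size, which is unproblematic for a subsample of 82 but would be replaced by a rank-based computation for large samples.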

Results

Data quality

Data quality was assessed through multivariate outlier detection using Mahalanobis distance analysis. A conservative criterion was applied, with the confidence level set at 0.999 (p = 0.001), which led to the exclusion of 116 participants from subsequent analyses. This decision was informed by the Q–Q plot (Fig. 1), in which several observations in the tails of the distribution fell well beyond the diagonal line, reflecting heavy-tailed behavior or extreme values. While this step helped preserve the integrity of the multivariate analyses by mitigating the influence of extreme or potentially anomalous response patterns, it may have slightly reduced sample representativeness.

Fig. 1. Mahalanobis distance Q–Q plot (N = 1546)

Table 1 presents the summary statistics for the excluded sample on the EDE-QS items. The excluded cases exhibited a heterogeneous response pattern, with median scores of 3 on items EDE-QS5 and EDE-QS6, and median scores of 0 and 1 on items EDE-QS7 and EDE-QS8, respectively. Such variability is consistent with the nature of the Mahalanobis Distance, which identifies multivariate outliers based on deviations across multiple variables, rather than within individual items.

Table 1. Summary statistics of the excluded sample

Items N Mean SD Median min max range skew kurtosis se
edeqs1 115 1.88 1.26 2.0 0 3 3 -0.55 -1.42 0.12
edeqs2 113 1.06 1.21 1.0 0 3 3 0.57 -1.32 0.11
edeqs3 114 1.47 1.23 1.5 0 3 3 0.02 -1.60 0.12
edeqs4 114 1.54 1.26 2.0 0 3 3 -0.05 -1.67 0.12
edeqs5 114 1.89 1.34 3.0 0 3 3 -0.53 -1.57 0.13
edeqs6 114 1.94 1.27 3.0 0 3 3 -0.58 -1.42 0.12
edeqs7 112 0.56 0.91 0.0 0 3 3 1.32 0.38 0.09
edeqs8 113 1.26 1.18 1.0 0 3 3 0.28 -1.46 0.11
edeqs9 112 1.31 1.13 1.0 0 3 3 0.15 -1.42 0.11
edeqs10 111 0.97 1.13 1.0 0 3 3 0.69 -1.04 0.11
edeqs11 115 2.14 1.01 2.0 0 3 3 -0.84 -0.54 0.09
edeqs12 115 2.16 0.94 2.0 0 3 3 -0.69 -0.77 0.09

EDE-QS psychometric properties

The EDE-QS showed good fit in this Norwegian sample: χ² (66) = 6,665.693, p < 0.001; TLI = 0.972; CFI = 0.977; RMSEA [90% CI] = 0.045 [0.038–0.051]; SRMR = 0.063, indicating that the single-factor model is plausible for the EDE-QS. Table 2 shows the factor loadings of each item under the single-factor model for complete cases; all items loaded above 0.50 (range 0.592–0.897). Reliability estimates were also good in this sample (α = 0.90; ωₜ = 0.92).

Table 2. Factor loadings of the 12 EDE-QS items

Item Factor Loading (Single Factor)
edeqs1 0.795
edeqs2 0.770
edeqs3 0.806
edeqs4 0.844
edeqs5 0.897
edeqs6 0.888
edeqs7 0.758
edeqs8 0.592
edeqs9 0.778
edeqs10 0.668
edeqs11 0.780
edeqs12 0.812

Note. All factor loadings had a p-value < 0.001; N = 1391

The DIF and t-test analyses required each item to have responses in at least some categories of the response scale. Because all items met this criterion, none were excluded from these analyses. Nevertheless, Table 3 shows that neither EDE-QS item 2 (edeqs2) nor item 10 (edeqs10) had any endorsements in the final category, and item 7 (edeqs7) had no endorsements in the last two categories.
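Empty categories matter for an ordinal model such as the graded response model, which estimates a threshold between each pair of adjacent categories. A category that no respondent endorsed leaves its threshold without data. The short Python check below lists each item's unused or sparse categories, using counts transcribed from Table 3. The dictionary and function names are hypothetical, and the `min_count=5` example is only an illustrative sparsity threshold, not one used in the article.

```python
# Counts per response category (0, 1, 2, 3) transcribed from Table 3
TABLE3_COUNTS = {
    "edeqs1": [852, 274, 153, 144],
    "edeqs2": [1240, 146, 34, 0],
    "edeqs3": [1102, 221, 74, 30],
    "edeqs4": [1053, 242, 97, 31],
    "edeqs5": [864, 276, 118, 165],
    "edeqs6": [887, 214, 121, 202],
    "edeqs7": [1393, 30, 0, 0],
    "edeqs8": [1086, 236, 88, 14],
    "edeqs9": [1138, 209, 73, 2],
    "edeqs10": [1232, 160, 25, 0],
    "edeqs11": [437, 659, 205, 126],
    "edeqs12": [371, 676, 255, 123],
}


def sparse_categories(counts, min_count=1):
    """Map each item to the categories endorsed fewer than `min_count` times."""
    return {
        item: [cat for cat, n in enumerate(row) if n < min_count]
        for item, row in counts.items()
        if any(n < min_count for n in row)
    }


print(sparse_categories(TABLE3_COUNTS))               # empty categories only
print(sparse_categories(TABLE3_COUNTS, min_count=5))  # also catches edeqs9, category 3 (n = 2)
```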

Table 3. Frequency of responses for each item of the EDE-QS response scale (N = 1430)

EDE-QS Item EDE-QS Response Scale
0 (0 days) 1 (1–2 days) 2 (3–5 days) 3 (6–7 days)
edeqs1 852 274 153 144
edeqs2 1240 146 34 0
edeqs3 1102 221 74 30
edeqs4 1053 242 97 31
edeqs5 864 276 118 165
edeqs6 887 214 121 202
edeqs7 1393 30 0 0
edeqs8 1086 236 88 14
edeqs9 1138 209 73 2
edeqs10 1232 160 25 0
edeqs11 437 659 205 126
edeqs12 371 676 255 123

In the gender DIF analysis of the EDE-QS, most items were flagged by the significance criterion. However, in the comparison between Model 1 and Model 2, only items edeqs5 and edeqs6 exhibited both a significant p-value and a pseudo-R² change ≥ 0.035, and the same pattern was observed in the comparison between Model 1 and Model 3. No items exhibited DIF in the comparison between Model 2 and Model 3. Because some items exhibited DIF in specific model comparisons, individuals’ scores were calibrated as EAP scores to enable the subsequent t-test analyses.

A Welch t-test was conducted on the EAP scores to investigate mean differences between females and males. Figure 2 presents a raincloud plot illustrating the distribution of EAP scores by gender. Females (M = 0.041, SD = 0.957) scored significantly higher on the EDE-QS than males (M = −0.962, SD = 0.683), t(1,345.2) = 22.95, p < 0.001, Cohen’s d = 1.207 [95% CI 1.093–1.320], Hedges’ g = 1.206 [95% CI 1.093–1.319]. The plot was generated using JASP version 0.19.3.
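The Welch statistic, its Welch–Satterthwaite degrees of freedom, and the standardized effect sizes can be recomputed from the summary statistics above. In the sketch below, the group sizes (744 females, 686 males) are an assumption taken from the cleaned sample, since the exact analysis N for this test is not stated. Hedges' g uses the common small-sample approximation J = 1 − 3/(4(n₁ + n₂) − 9). `cohens_d` offers both a pooled standardizer and the non-pooled root mean of the two variances. The helper names are hypothetical, and this is not the effectsize package's code.

```python
import math


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom from summary statistics."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def cohens_d(m1, s1, n1, m2, s2, n2, pooled=True):
    """Standardized mean difference using the pooled SD or the root mean of the two variances."""
    if pooled:
        sd = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    else:
        sd = math.sqrt((s1 ** 2 + s2 ** 2) / 2)
    return (m1 - m2) / sd


def hedges_g(d, n1, n2):
    """Apply the approximate small-sample correction J = 1 - 3 / (4(n1 + n2) - 9)."""
    return d * (1 - 3 / (4 * (n1 + n2) - 9))


females = (0.041, 0.957, 744)  # mean, SD, n (n assumed from the cleaned sample)
males = (-0.962, 0.683, 686)
t, df = welch_t(*females, *males)
d_pooled = cohens_d(*females, *males, pooled=True)
d_unpooled = cohens_d(*females, *males, pooled=False)
g_unpooled = hedges_g(d_unpooled, females[2], males[2])
print(round(t, 2), round(df, 1), round(d_pooled, 3), round(d_unpooled, 3), round(g_unpooled, 3))
```

With these rounded inputs, the sketch gives t ≈ 22.94 with df ≈ 1345, in line with the reported t(1,345.2) = 22.95. The non-pooled standardizer gives d ≈ 1.206 and g ≈ 1.206, close to the reported 1.207 and 1.206, whereas the pooled SD gives d ≈ 1.199. The text does not state which standardizer was used, so this comparison is only suggestive.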

Fig. 2. Dot plot, box plot, and distributions of EAP scores for females and males

For comparison, we also ran t-tests on the mean scores of the full scale (i.e., without correcting item scores for DIF) and of a partial scale (i.e., excluding the items flagged with DIF). Table 4 shows that although all scoring methods reveal gender differences, the magnitude of these differences depends on the metric used. The EAP scores, which are derived from item response theory and account for item difficulty and discrimination, indicate a larger gender difference than both the full-scale and the partial-scale mean scores. Within the same metric (raw mean scores), the partial scale shows a smaller gender difference than the full scale, suggesting that the excluded items, which showed differential functioning, amplify the gender difference when included. This pattern highlights the importance of the scoring method: metrics that weight items by their psychometric parameters (such as EAP scores) may magnify latent differences, whereas metrics that weight all items equally (such as raw means) may under- or overestimate those differences depending on the item set used. In short, whether all items are assumed to be equally informative or their psychometric properties are taken into account can lead to very different conclusions about group differences.

Table 4. Sample size, t-test, and effect size statistics for EAP scores, mean scores of the full scale, and mean scores of the partial scale

Dependent Variables N t df p Cohen’s d Hedges’ g
EAP Scores 1,430 20.309 1373.177 < 0.001 1.069 [95% CI 0.958–1.180] 1.068 [95% CI 0.957–1.179]
Mean Scores Full Scale 1,430 18.332 1113.183 < 0.001 0.958 [95% CI 0.847–1.069] 0.958 [95% CI 0.847–1.069]
Mean Scores Partial Scale (without edeqs5 and edeqs6) 1,429 16.169 1154.205 < 0.001 0.846 [95% CI 0.737–0.956] 0.846 [95% CI 0.736–0.955]

For both females and males, the EDE-QS correlated significantly with the LivUng9 and SATAQ-4R measures. Correlations between the EDE-QS and the other measures were higher in females (Fig. 3), ranging from 0.25 to 0.69, than in males (Fig. 4), ranging from 0.28 to 0.49.

Fig. 3. Correlations between EDE-QS, LivUng9, and SATAQ-4R measures in females

Note. All correlations were significant (p < 0.05); N = 742

Fig. 4. Correlations between EDE-QS, LivUng9, and SATAQ-4R measures in males

Note. All correlations were significant (p < 0.05) except that between LivUng9 and Male Internalization: Muscularity; N = 683

Receiver operating characteristic (ROC) and cutpoint analysis

Figure 5 shows that, among all possible cutpoints for the scale, the Youden criterion identified a cut-off score of 20 as yielding the highest combined sensitivity and specificity. This cutpoint demonstrated acceptable accuracy (Accuracy = 0.7317), with sensitivity = 0.641 and specificity = 0.814. In addition, the positive predictive value (PPV) was 0.76, indicating that 76% of individuals who screened positive were true cases, and the negative predictive value (NPV) was 0.71, indicating that 71% of those who screened negative were correctly identified as not having the condition. Accordingly, a sum score greater than or equal to 20 is considered indicative of a potential ED.

Fig. 5.


Optimal cutpoints based on the combined sensitivity and specificity

As shown in Fig. 6 (ROC curve; upper right panel), the AUC exceeds the 0.70 threshold (AUC = 0.7501, 95% CI 0.636–0.864, p < 0.001), indicating acceptable overall discriminative ability of the EDE-QS. The upper left panel of Fig. 6 (Independent Variable) shows that a cut-off score of 20 captures the majority of individuals classified as having an ED while misclassifying relatively few of those without the condition. Specifically, this cut-off yielded 25 true positives, 14 false negatives, 8 false positives, and 35 true negatives. Across 1,000 bootstrap samples, the most frequently selected optimal cut-off was 20, followed by 22 (lower left panel of Fig. 6). The relatively small female subsample (N = 82) used for the ROC analysis limits the generalizability of these findings; in particular, the cut-off score may not apply to broader populations, including males or more diverse clinical groups. The small sample size may also reduce the precision and stability of the ROC estimates, as reflected in the wide confidence interval for the AUC, underscoring the need for replication in larger and more diverse samples.
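To make the AUC and bootstrap procedure concrete, the following Python sketch computes the AUC in its Mann-Whitney form (the probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as one half), searches all observed scores for the Youden-optimal cut-off, and tallies the cut-off selected in each bootstrap resample. It is a simplified stand-in under stated assumptions, not the cutpointr implementation [43] cited in the reference list: it resamples the whole sample with replacement, treats scores at or above a cut-off as positive, breaks ties in J by keeping the lowest cut-off, and skips resamples that lack either cases or non-cases. All function names are hypothetical.

```python
import random
from collections import Counter


def auc(scores, labels):
    """Mann-Whitney AUC: share of (case, non-case) pairs in which the case
    has the higher score, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutpoint(scores, labels):
    """Return (cut-off, J) maximizing J = sensitivity + specificity - 1,
    where a score counts as screen-positive when score >= cut-off."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_c, best_j = None, float("-inf")
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict ">" keeps the lowest cut-off on ties
            best_c, best_j = c, j
    return best_c, best_j


def bootstrap_cutpoints(scores, labels, n_boot=1000, seed=1):
    """Count how often each cut-off is Youden-optimal across bootstrap
    resamples drawn with replacement from the full sample."""
    rng = random.Random(seed)
    n = len(scores)
    counts = Counter()
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # J is undefined without both cases and non-cases
            s = [scores[i] for i in idx]
            counts[youden_cutpoint(s, y)[0]] += 1
    return counts
```

Applied to sum scores and case labels from a subsample like the one described here (39 cases among 82 participants), `bootstrap_cutpoints` would return a frequency table analogous in spirit to the lower left panel of Fig. 6; the spread of that table is a direct view of the instability discussed above.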

Fig. 6.


ROC Curve and optimal cutpoints for the EDE-QS scale in females based on bootstrap analysis

Note. Number of bootstraps = 1,000

Discussion

The current study supports the psychometric soundness of the EDE-QS for assessing ED symptoms in adolescent community samples. The EDE-QS demonstrated good structural validity and excellent internal consistency. Results also supported its convergent and criterion validity. It appears particularly well suited to epidemiological and clinical studies, where brief and valid screening tools are in demand. To our knowledge, this is the largest study of the EDE-QS’s psychometric properties and the first conducted in an adolescent population. However, measurement non-invariance was identified for some items, highlighting the need for caution when comparing raw scores between genders.

The EDE-QS showed good fit to a unidimensional model, supporting its structural validity. This suggests that a single-factor solution is a plausible model for the scale, consistent with a prior study [16]. The proposed factor structure of the original EDE-Q has been contested in numerous studies [46]. It is therefore noteworthy that the EDE-QS, as well as other brief forms of the EDE-Q [46, 47, 48, 49], shows stronger structural validity, suggesting that these abbreviated versions may have a more stable factor structure. This may reflect their brevity and targeted item selection: a concise format may help minimize measurement error, improve conceptual coherence, and reduce respondent burden, an important consideration given adolescents’ greater susceptibility to attentional fatigue. Additionally, the EDE-QS demonstrated excellent internal consistency, supporting its reliability. While we were unable to assess other forms of reliability in this study, previous research has found its test-retest reliability to be satisfactory [16].
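Internal consistency is commonly summarized with Cronbach’s alpha, and the reference list includes a primer on it [33]. The Python sketch below shows the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), for k items. It assumes a complete respondents-by-items matrix with no missing data, uses a hypothetical function name, and is not claimed to reproduce the exact coefficient computed in this study.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondent rows, each holding k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Population variances are used throughout; the n vs n - 1 choice
    # cancels in the ratio as long as it is applied consistently.
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Perfectly parallel items give alpha = 1, and mutually uncorrelated items give alpha = 0, which brackets what "excellent" internal consistency is being compared against.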

The findings also provided evidence for the convergent validity of the EDE-QS, based on its associations with theoretically related measures. Specifically, the EDE-QS was negatively correlated with a quality-of-life measure (LivUng9) and positively correlated with a measure of sociocultural attitudes toward appearance (the SATAQ-4R). Correlations were generally stronger among females than males. However, we lacked a direct comparison with another established ED symptom measure, which would have offered stronger support for convergent validity. Nonetheless, strong associations between the EDE-QS and other ED symptom measures have previously been demonstrated in adult samples [16].

The EDE-QS distinguished between female ED cases and non-cases with acceptable accuracy, supporting its criterion validity. The optimal cut-off value, balancing sensitivity and specificity, was 20. To our knowledge, only one previous study [17] has evaluated optimal screening thresholds for the EDE-QS, identifying a slightly lower cut-off of 15. Classification performance was also somewhat lower in our study (AUC = 0.75) than in the previous one (AUC = 0.89). These discrepancies are likely due in part to differences in sample characteristics, as our sample was considerably younger (16–23 years) than that of the earlier study (18–34 years). Self-reported ED symptoms are typically higher in younger populations [50, 51], which may make it more difficult to discriminate cases from non-cases and may require higher cut-off values to achieve adequate classification. While these findings support the EDE-QS as a useful screening tool, its performance should be further validated in independent or clinical samples. The numbers of false positives and false negatives were not negligible, underscoring the need for caution when interpreting screening results. Unfortunately, owing to the small number of male cases, we were unable to perform ROC analyses for males.

Results from the current study also revealed item-level, gender-based DIF, indicating measurement non-invariance. Specifically, DIF analyses indicated that males and females with similar underlying levels of ED symptoms may respond differently to certain EDE-QS items, which raises concerns that some items are interpreted or endorsed differently across genders. This has practical implications, as direct comparison of raw scores across genders may be misleading. Future research should explore the sources of these differences and assess whether item revisions are warranted; future adaptations of the scale could consider modifying or replacing items with significant DIF, or implementing gender-normed scoring, to improve the interpretability and gender fairness of the assessment. That said, it is often acknowledged that ED symptom levels differ markedly between females and males, such that direct score comparisons may be of limited practical value. In practical terms, therefore, these findings may be less critical.

Together, these findings support the psychometric soundness of the EDE-QS in adolescent community samples and its viability as a tool for ED assessment and screening. Brief versions of ED symptom scales, particularly of the EDE-Q, are needed to allow ED symptom assessment in large studies. Although many such short forms have been proposed, none has achieved the widespread use of the original full EDE-Q. The EDE-QS was developed in response to known limitations of the original EDE-Q and offers a promising alternative. The combination of sound psychometric properties and brevity makes it particularly well suited to epidemiological and clinical studies, where response burden is a critical issue. A further advantage is that, unlike some other brief forms (e.g., the EDE-Q7 [47]), the EDE-QS includes items assessing key behavioural symptoms such as binge eating and purging, which may be important in certain contexts. However, as a brief measure, it does not capture the full range of ED psychopathology covered by the original full EDE-Q. Finally, the EDE-QS is a relatively new measure and requires further psychometric evaluation in diverse populations and settings.

Strengths of the current study include the use of a large adolescent community sample, a balanced gender distribution, and structured diagnostic interviews to determine ED case status. To our knowledge, this is the largest study evaluating the psychometric properties of the EDE-QS, and the only one conducted in an adolescent population. However, a number of limitations should be noted. Only 44% of eligible adolescents participated in the study, raising concerns about nonresponse bias and the representativeness of the sample. The study was also conducted in Norway, where country-specific factors may influence the generalizability of the findings to other populations. Moreover, an additional ED symptom measure was not included in the assessment battery, which limits the strength of the convergent validity evidence presented. However, the EDE-QS is derived from the original EDE-Q, for which good convergent validity has previously been demonstrated [52].

It should also be noted that the QoL measure used, the LivUng9, has not yet been validated in a Norwegian adolescent population; however, a formal validation study is planned. The observed associations between the LivUng9 and both internalization of appearance ideals and perceived pressure may provide preliminary evidence of its convergent validity. Nonetheless, further studies using established QoL instruments are needed to support a more comprehensive validation of the LivUng9. Finally, weight and height, used to calculate BMI, were self-reported, and relatively few participants met the criteria for clinical case status, limiting statistical power. We were also unable to test the discriminatory ability of the EDE-QS in a diagnostically naïve sample. Future studies should also aim to improve the representation of boys and gender-diverse adolescents, who remain markedly underrepresented in psychometric validation studies in the field of EDs.

Conclusion

To conclude, the EDE-QS demonstrated good validity and reliability, and shows promise as a brief measure of ED symptoms. Findings from the current study suggest that the EDE-QS is a viable tool for ED symptom assessment in adolescent males and females, and for ED screening in young females. Further validation is needed to assess the utility of the EDE-QS as a screening tool in male and gender-diverse adolescents.

Electronic Supplementary Material

Below is the link to the electronic supplementary material.

40359_2025_3400_MOESM1_ESM.docx (41.7KB, docx)

Supplementary Materials 1: Operationalization of the diagnostic criteria as applied in the EDA-5 and used in this study. Supplementary Materials 2: LivUng9 (Quality of Life in Youth – 9 items)

Acknowledgements

We sincerely thank all participating schools, which so generously contributed their time and enthusiasm to this project. ChatGPT-4o was used for language improvements.

Abbreviations

AN

Anorexia Nervosa

AUC

Area Under the Curve

ARFID

Avoidant Restrictive Food Intake Disorder

BED

Binge Eating Disorder

BN

Bulimia Nervosa

BMI

Body Mass Index

CFA

Confirmatory Factor Analysis

CFI

Comparative Fit Index

DIF

Differential Item Functioning

DSM-5

Diagnostic and Statistical Manual of Mental Disorders, 5th edition

ED

Eating Disorder

EDA-5

Eating Disorder Assessment for DSM-5

EDE

Eating Disorder Examination

EDE-QS

Eating Disorder Examination Questionnaire Short version

EAP

Expected A Posteriori

HEVAS

Health Behavior in School-aged Children

LivUng9

Livskvalitet hos Ungdom-9 (Quality of Life in Youth – 9 items)

M

Mean

OSFED

Other Specified Feeding or Eating Disorders

QoL

Quality of Life

RMSEA

Root Mean Square Error of Approximation

ROC

Receiver Operating Characteristic

SATAQ-4R

Sociocultural Attitudes Towards Appearance Questionnaire – 4th Revision

SD

Standard Deviation

SRMR

Standardized Root Mean Square Residual

TLI

Tucker–Lewis Index

UFED

Unspecified Feeding or Eating Disorder

YQoL-SF

Youth Quality of Life Instrument - Short Form

Author contributions

CLD served as the principal investigator for the study, oversaw data collection, and drafted the original version of the manuscript, including the background and methods sections. EBD conducted all statistical analyses and prepared the results section. LB authored the discussion section. All authors reviewed, contributed to, and approved the final version of the manuscript.

Funding

The study was supported by the Dam Foundation (Project number 353509) in collaboration with the member organization Mental Health (Mental Helse). Open access funding provided by Oslo New University College (ONH) through the Sikt consortia.

Data availability

Materials and analysis code for this study are available by emailing the corresponding author. The data are not publicly available because they contain information that could compromise the privacy of research participants.

Declarations

Ethics approval and consent to participate

Ethical approval was secured from the Norwegian Committee for Medical and Health Research Ethics (Ref ID 116178) and the local Data Protection Authority. All participants provided written informed consent. The procedures used in this study are in line with the Declaration of Helsinki.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Conflict of interest

The authors declare no conflicts of interest.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Forbush KT, Hagan KE, Kite BA, Chapa DAN, Bohrer BK, Gould SR. Understanding eating disorders within internalizing psychopathology: A novel transdiagnostic, hierarchical-dimensional model. Compr Psychiatry. 2017;79:40–52.
  • 2. Hudson JI, Hiripi E, Pope HG, Kessler RC. The prevalence and correlates of eating disorders in the National Comorbidity Survey Replication. Biol Psychiatry. 2007;61(3):348–58.
  • 3. Keski-Rahkonen A. Epidemiology of binge eating disorder: prevalence, course, comorbidity, and risk factors. Curr Opin Psychiatry. 2021;34(6):525–31.
  • 4. Mitchison D, Mond J, Bussey K, Griffiths S, Trompeter N, Lonergan A, et al. DSM-5 full syndrome, other specified, and unspecified eating disorders in Australian adolescents: prevalence and clinical significance. Psychol Med. 2020;50(6):981–90.
  • 5. Lie SØ, Bastos RVS, Sundgot-Borgen C, Wisting L, Dahlgren CL. Sociocultural attitudes towards appearance questionnaire-4-revised (SATAQ-4R): validation in a community sample of Norwegian adolescents. J Eat Disord. 2024;12(1):195.
  • 6. De La Rie SM, Noordenbos G, Van Furth EF. Quality of life and eating disorders. Qual Life Res. 2005;14(6):1511–21.
  • 7. Engel SG, Adair CE, Hayas CL, Abraham S. Health-related quality of life and eating disorders: A review and update. Int J Eat Disord. 2009;42(2):179–87.
  • 8. Jenkins PE, Hoste RR, Meyer C, Blissett JM. Eating disorders and quality of life: A review of the literature. Clin Psychol Rev. 2011;31(1):113–21.
  • 9. Smink FRE, Van Hoeken D, Hoek HW. Epidemiology, course, and outcome of eating disorders. Curr Opin Psychiatry. 2013;26(6):543–8.
  • 10. Nitsch A, Dlugosz H, Gibson D, Mehler PS. Medical complications of bulimia nervosa. Cleve Clin J Med. 2021;88(6):333–43.
  • 11. Hollett KB, Carter JC. Separating binge-eating disorder stigma and weight stigma: A vignette study. Int J Eat Disord. 2021;54(5):755–63.
  • 12. McCuen-Wurst C, Ruggieri M, Allison KC. Disordered eating and obesity: associations between binge-eating disorder, night-eating syndrome, and weight-related comorbidities. Ann N Y Acad Sci. 2018;1411(1):96–105.
  • 13. Ágh T, Kovács G, Supina D, Pawaskar M, Herman BK, Vokó Z, et al. A systematic review of the health-related quality of life and economic burdens of anorexia nervosa, bulimia nervosa, and binge eating disorder. Eat Weight Disord - Stud Anorex Bulim Obes. 2016;21(3):353–64.
  • 14. Allen KL, Mountford VA, Elwyn R, Flynn M, Fursland A, Obeid N, et al. A framework for conceptualising early intervention for eating disorders. Eur Eat Disord Rev. 2023;31(2):320–34.
  • 15. Cooper Z, Fairburn C. The eating disorder examination: A semi-structured interview for the assessment of the specific psychopathology of eating disorders. Int J Eat Disord. 1987;6(1):1–8.
  • 16. Gideon N, Hawkes N, Mond J, Saunders R, Tchanturia K, Serpell L. Development and psychometric validation of the EDE-QS, a 12 item short form of the eating disorder examination questionnaire (EDE-Q). Takei N, editor. PLoS ONE. 2016;11(5):e0152744.
  • 17. Prnjak K, Mitchison D, Griffiths S, Mond J, Gideon N, Serpell L, et al. Further development of the 12-item EDE-QS: identifying a cut-off for screening purposes. BMC Psychiatry. 2020;20(1):146.
  • 18. Dahlgren CL, Reneflot A, Brunborg C, Wennersberg A, Wisting L. Estimated prevalence of DSM-5 eating disorders in Norwegian adolescents: A community based two-phase study. Int J Eat Disord. 2023;56(11):2062–73.
  • 19. Fairburn CG, Beglin SJ. Assessment of eating disorders: interview or self-report questionnaire? Int J Eat Disord. 1994;16(4):363–70.
  • 20. Dahlgren CL, Stedal K, Rø Ø. Eating disorder examination questionnaire (EDE-Q) and clinical impairment assessment (CIA): clinical norms and functional impairment in male and female adults with eating disorders. Nord J Psychiatry. 2017;71(4):256–61.
  • 21. Sysko R, Glasofer DR, Hildebrandt T, Klimek P, Mitchell JE, Berg KC, et al. The eating disorder assessment for DSM-5 (EDA-5): development and validation of a structured interview for feeding and eating disorders. Int J Eat Disord. 2015;48(5):452–63.
  • 22. Walsh BT, Attia E, Glasofer DR, Sysko R. Handbook of the assessment and treatment of eating disorders. American Psychiatric Association Publishing; 2016.
  • 23. Dahlgren CL, Walsh BT, Vrabel K, Siegwarth C, Rø Ø. Eating disorder diagnostics in the digital era: validation of the Norwegian version of the eating disorder assessment for DSM-5 (EDA-5). J Eat Disord. 2020;8(1):30.
  • 24. Schaefer LM, Harriger JA, Heinberg LJ, Soderberg T, Thompson JK. Development and validation of the Sociocultural attitudes towards appearance questionnaire-4-revised (SATAQ-4R). Int J Eat Disord. 2017;50(2):104–17.
  • 25. Patrick DL, Edwards TC, Topolski TD. Adolescent quality of life, part II: initial validation of a new instrument. J Adolesc. 2002;25(3):287–300.
  • 26. Meade AW, Craig SB. Identifying careless responses in survey data. Psychol Methods. 2012;17(3):437–55.
  • 27. Ward MK, Meade AW. Applying social psychology to prevent careless responding during online surveys. Appl Psychol. 2018;67(2):231–63.
  • 28. Yentes R, Wilhelm F. careless: Procedures for computing indices of careless responding. R package version 1.2.2. 2023.
  • 29. Rosseel Y. lavaan: An R Package for Structural Equation Modeling. J Stat Softw. 2012 [cited 2025 Feb 2];48(2). Available from: http://www.jstatsoft.org/v48/i02/
  • 30. Forero CG, Maydeu-Olivares A, Gallardo-Pujol D. Factor analysis with ordinal indicators: A Monte Carlo study comparing DWLS and ULS Estimation. Struct Equ Model Multidiscip J. 2009;16(4):625–41.
  • 31. Yang-Wallentin F, Joreskog K, Luo H. Confirmatory factor analysis of ordinal variables with misspecified models. Struct Equ Model Multidiscip J. 2010;17(3):392–423.
  • 32. Savalei V, Rhemtulla M. The performance of robust test statistics with categorical data. Br J Math Stat Psychol. 2013;66(2):201–23.
  • 33. Tavakol M, Dennick R. Making sense of Cronbach’s alpha. Int J Med Educ. 2011;2:53–5.
  • 34. Choi SW, Gibbons LE, Crane PK. lordif: An R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and Monte Carlo simulations. J Stat Softw. 2011;39(8):1.
  • 35. Zumbo BD. A Handbook on the Theory and Methods of Differential Item Functioning (DIF): Logistic Regression Modeling as a Unitary Framework for Binary and Likert-Type (Ordinal) Item Scores. 1999. Available from: https://api.semanticscholar.org/CorpusID:41969422
  • 36. Jodoin MG, Gierl MJ. Evaluating Type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Appl Meas Educ. 2001;14(4):329–49.
  • 37. Chalmers RP. mirt: A Multidimensional Item Response Theory Package for the R Environment. J Stat Softw. 2012 [cited 2025 Feb 2];48(6). Available from: http://www.jstatsoft.org/v48/i06/
  • 38. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2024. Available from: https://www.R-project.org/
  • 39. Ben-Shachar M, Lüdecke D, Makowski D. effectsize: Estimation of effect size indices and standardized parameters. J Open Source Softw. 2020;5(56):2815.
  • 40. Wei T, Simko V. R package ‘corrplot’: Visualization of a Correlation Matrix (Version 0.95). 2024. Available from: https://github.com/taiyun/corrplot
  • 41. Fawcett T. An introduction to ROC analysis. Pattern Recognit Lett. 2006;27(8):861–74.
  • 42. Youden WJ. Index for rating diagnostic tests. Cancer. 1950;3(1):32–5.
  • 43. Thiele C, Hirschfeld G. cutpointr: Improved Estimation and Validation of Optimal Cutpoints in R. J Stat Softw. 2021 [cited 2025 May 19];98(11). Available from: http://www.jstatsoft.org/v98/i11/
  • 44. Metz CE. Basic principles of ROC analysis. Semin Nucl Med. 1978;8(4):283–98.
  • 45. Streiner DL, Cairney J. What’s under the ROC? An introduction to receiver operating characteristics curves. Can J Psychiatry. 2007;52(2):121–8.
  • 46. Jenkins PE, Rienecke RD. Structural validity of the eating disorder Examination-Questionnaire: A systematic review. Int J Eat Disord. 2022;55(8):1012–30.
  • 47. Grilo CM, Reas DL, Hopwood CJ, Crosby RD. Factor structure and construct validity of the eating disorder examination-questionnaire in college students: further support for a modified brief version. Int J Eat Disord. 2015;48(3):284–9.
  • 48. Kliem S, Mößle T, Zenger M, Strauß B, Brähler E, Hilbert A. The eating disorder examination-questionnaire 8: A brief measure of eating disorder psychopathology (EDE-Q8). Int J Eat Disord. 2016;49(6):613–6.
  • 49. Lev-Ari L, Bachner-Melman R, Zohar AH. Eating Disorder Examination Questionnaire (EDE-Q-13): expanding on the short form. J Eat Disord. 2021 Dec [cited 2025 May 23];9(1). Available from: https://jeatdisord.biomedcentral.com/articles/10.1186/s40337-021-00403-x
  • 50. Rø Ø, Reas DL, Rosenvinge J. The impact of age and BMI on eating disorder examination questionnaire (EDE-Q) scores in a community sample. Eat Behav. 2012;13(2):158–61.
  • 51. Rø Ø, Bang L, Reas DL, Rosenvinge JH. The impact of age and BMI on impairment due to disordered eating in a large female community sample. Eat Behav. 2012;13(4):342–6.
  • 52. Berg KC. Psychometric evaluation of the eating disorder examination and eating disorder examination-questionnaire: a systematic review of the literature. Int J Eat Disord. 2012;45(3):428–38.


Articles from BMC Psychology are provided here courtesy of BMC
