Author manuscript; available in PMC: 2026 Apr 18.
Published before final editing as: Psychol Violence. 2025 Apr 13:10.1037/vio0000674. doi: 10.1037/vio0000674

Measurement structure, invariance, and item functioning of intimate partner violence scales in high-income countries: Results from 30 population-based surveys

Irina Bergenfeld 1,*, Regine Haardörfer 2, Angela M Bengtson 3, Timothy L Lash 3, Cari Jo Clark 1
PMCID: PMC13089963  NIHMSID: NIHMS2154064  PMID: 42006846

Abstract

Objective.

Intimate partner violence (IPV) is a serious and prevalent threat to the health of women worldwide, yet limitations in measurement hinder efforts to monitor, prevent, and respond to this global challenge. These include variability in definitions and assessment tools, small item sets, lack of comparability across countries, and variability in outcome construction in IPV studies.

Methods.

We performed exploratory and confirmatory factor analysis, measurement invariance testing, and item response theory analysis on IPV data from publicly available, population-based surveys conducted in 28 EU countries and two US states.

Results.

We found that a unidimensional model of IPV based on 30 dichotomous indicators in the EU and 22 dichotomous indicators in the US described IPV well in all countries and states and was strictly invariant across countries and states. Additionally, we identified items that might be removed from the two scales without substantive information loss and high-information items that might be added to IPV scales used in other countries to improve content validity. Finally, we found that outcome construction (dichotomized/count of acts/factor scores) impacted the magnitude, but not the significance, of associations between IPV experience and general health measures.

Conclusions.

Our findings highlight opportunities to improve commonly used IPV scales and suggest that psychological IPV and controlling behaviors are salient aspects of women’s IPV experience that should be incorporated into national and international reporting mechanisms.

Keywords: intimate partner violence, women’s health, high-income countries, measurement, assessment

Introduction

Intimate partner violence (IPV), the most common form of violence worldwide, is a major global health issue that negatively impacts the health and well-being of individuals, families, and communities (Krug et al., 2002). However, challenges in conceptualizing and measuring IPV across countries limit our current understanding of the problem and our ability to evaluate potential solutions. Chief among the limitations of current IPV measurement are a lack of consensus about the core domains and acts that constitute IPV, weak content validity in the brief item sets used to assess IPV, uncertainty about the comparability of these item sets across countries, and variability in IPV outcome construction across studies.

First, the variety of IPV definitions and domains complicates global efforts to assess IPV prevalence (Walby, 2005). Although the World Health Organization (WHO) recognizes that IPV includes “acts of physical aggression, sexual coercion, psychological abuse, and controlling behaviors,” global and regional meta-analyses published by the WHO incorporate only physical and sexual forms of IPV (Sardinha et al., 2022). Indeed, the commonly cited statistic that over one quarter of women and girls worldwide have experienced IPV refers only to physical and/or sexual IPV (Sardinha et al., 2022). Psychological abuse, while receiving less attention, is known to be highly prevalent globally and to have pernicious impacts on victims’ health (Adams & Beeble, 2019; Começanha et al., 2017; Martín-Fernández et al., 2019). The inclusion of controlling behaviors in many scales has further complicated efforts to conceptualize IPV, with some studies recommending that controlling behaviors be measured as part of psychological IPV (Martín-Fernández et al., 2019) and others arguing that it is a distinct form of IPV that should be measured separately (Heise et al., 2019). Still other studies characterize controlling behaviors as an antecedent or risk factor for IPV (Aizpurua et al., 2021). Until the full breadth of behaviors that comprise IPV can be incorporated into global monitoring and reporting mechanisms, our understanding of its scope and impact may be incomplete.

In addition to a lack of agreement on the core domains of IPV that should be measured, it is unclear how well common item sets capture the breadth of these domains. Most existing large-scale monitoring of IPV is based on items from the Revised Conflict Tactics Scales (CTS2), considered a “gold standard” in the field (Alexander, Backes & Johnson, 2022). The CTS2 has also informed the understanding of distinct physical, sexual, and psychological domains of IPV (Chapman & Gillespie, 2019; Straus et al., 1996). IPV prevalence is increasingly monitored, especially in low- and middle-income countries (LMICs), via standardized population-based surveys adapted from the CTS2, such as the Demographic and Health Surveys (DHS) and Multiple Indicator Cluster Surveys (MICS) (Corsi et al., 2012; Khan & Hancioglu, 2019). However, researchers have questioned the content validity of these scales, given that sexual and psychological IPV are measured using three items or fewer (Clark, Bergenfeld, Cheong, Najera, et al., 2023; Follingstad & Rogers, 2013). Because surveys in high-income countries (HICs) tend to be administered at the country level, survey methodology across countries is not always aligned in terms of the items and domains of IPV measured, making comparisons across surveys difficult. However, these differences allow us to assess how surveys with different item sets may cover the latent construct of IPV differently and offer ideas for the refinement of scales. Item response theory may be useful in identifying gaps and opportunities to improve content validity of common IPV measures used in both HICs and LMICs.

Moreover, scales used to assess IPV may not function similarly across countries. As with global reporting mechanisms, invariance studies of IPV scales often focus narrowly on specific forms of IPV. For example, a study of DHS items measuring physical IPV found these items to be approximately invariant across 36 LMICs (Yount et al., 2022), while a separate analysis of DHS controlling behaviors items found that these items were not invariant across 19 LMICs (Yount et al., 2024). A recent analysis of 46 LMICs found that when all forms of IPV are modeled together, the latent construct of IPV is largely unidimensional and generally invariant across regions, with considerable variation in model fit across countries (Bergenfeld et al., 2025). Less is known about cross-country invariance in HICs, partly due to a lack of standardization of item sets in different countries. However, measurement invariance across European Union (EU) states has been examined using the 2014 European Agency for Fundamental Rights (FRA) Survey on Violence Against Women (European Agency for Fundamental Rights, 2014). Two studies using the FRA data analyzed physical/sexual IPV and psychological IPV/controlling behaviors separately, finding that a one-factor model of psychological IPV/controlling behaviors and a two-factor model of physical and sexual IPV were strongly invariant across all 28 countries (Martín-Fernández et al., 2019, 2020). Outside of the EU, the diversity of scales and populations used to measure IPV in HICs is mirrored in the diversity of factor structures proposed to model IPV. For example, factor analyses of IPV scales have described models with anywhere from one (Aiquipa-Tello & Ponce-Díaz, 2025; Peitzmeier et al., 2021) to five factors (Borjesson et al., 2003; Shorey et al., 2019). Even when the same items are used, similar factor structures are not always confirmed in different populations. 
While the CTS2 was theorized to measure psychological aggression, physical assault, and sexual coercion (as well as two additional domains of injury and negotiation) as distinct factors, this structure has not been universally replicated across studies (Chapman & Gillespie, 2019). Moreover, existing evidence suggests that common IPV measures used in HICs may not be invariant across gender (O’Hara et al., 2018; Wareham et al., 2022), sexual minority status (Palm Reed et al., 2022), or language (Connelly et al., 2005).

Finally, IPV measures (either holistic or domain-specific) derived from multi-item scales are often dichotomized in both descriptive and analytic studies, which can lead to a loss of information and statistical power. Incorporating frequency and severity of IPV acts (usually only measured in more recent time frames) into variable construction can recover some of this missing information, elucidate some of the more nuanced impacts of IPV interventions (Chatterji et al., 2023), and distinguish between interventions that prevent incident IPV from those that reduce ongoing IPV (Chatterji et al., 2020). Latent modeling techniques, which treat IPV as a continuous construct derived from multiple observed indicators, offer another alternative to binary IPV measures. The choice of outcome construction, whether binary, categorical, or continuous, can have important implications for study inference (Clark, Bergenfeld, Cheong, Kaslow, et al., 2023), yet few studies to date have examined how outcome construction impacts the findings of IPV studies.

Valid, comprehensive, and comparable quantitative measures are needed for global monitoring of IPV and for evaluating programs and policies intended to reduce or prevent IPV. Previous studies have psychometrically assessed the more standardized item sets commonly used in LMICs. Although making comparisons between studies in HICs is difficult, differences in item sets used to measure IPV in HICs also provide opportunities to examine item functioning across surveys to inform future refinement of IPV scales. In this study, we aimed to investigate four research questions: (1) What is the measurement structure of IPV, as measured in two standardized scale measures, across HICs? (2) Are these two scales measuring an equivalent construct across HICs? (3) How informative are the items used to measure IPV in HICs in these two surveys? (4) Does the method of outcome construction impact the relative prevalence of IPV across countries and/or associations between IPV and general physical and mental health outcomes? The findings of these analyses may be used to inform valid use of these data sets and to refine existing measures of IPV for future data collection.

Methods

This study was deemed exempt by the Institutional Review Board of Emory University. Our inclusion criteria were publicly available, population-based data on IPV from HICs. Additionally, we sought data that spanned multiple states or countries to enable measurement invariance testing across these broad geographical regions. We were able to find and access two violence-focused surveys from 28 European Union (EU) countries and two US states. We also located one dataset that met our criteria, the National Intimate Partner and Sexual Violence Survey, which is maintained by the National Archive of Criminal Justice Data under the National Institute of Justice. However, we were unable to finalize a data use agreement between Emory University and the National Institute of Justice in a timely manner. We hope to gain access to this dataset in the future to repeat the analyses described in this study using national data.

Data and sample

In 2014, the European Agency for Fundamental Rights (FRA) conducted a survey on violence against women across 28 EU countries (European Agency for Fundamental Rights, 2014) (https://datacatalogue.ukdataservice.ac.uk/studies/study/7730#details). The FRA survey used population-based sampling to recruit approximately 1500 women aged 18-74 in each country via a multistage sampling procedure, resulting in a total sample of 42,002 women. Of these, 40,192 women were ever-partnered and were asked IPV questions either over the phone or in person. Women were intentionally sampled to achieve roughly equal numbers of women in each 10-year age group (European Agency for Fundamental Rights, 2014). The FRA provides weights to adjust sample demographics to be representative of both national and EU populations. About 97% of women surveyed identified as heterosexual and about 97% were citizens of their country of residence. While precise individual data on age were not available in the FRA survey, the median age category was 40-49 years.

The California Violence Experiences Survey (CalVEX) (https://www.openicpsr.org/openicpsr/project/199087/version/V1/view) and Louisiana Violence Experiences Survey (LaVEX) (https://www.openicpsr.org/openicpsr/project/199088/version/V1/view), henceforth referred to collectively as the Violence Experiences Surveys (VEX), measure different forms of violence experienced by women and men 18 years and older, including IPV and non-partner violence and discrimination (Raj, Johns, Closson, Mahoney et al., 2023). The LaVEX was conducted once, in 2023, with a sample of 1081 individuals (Raj, Johns, Yore, Closson et al., 2023). The CalVEX has collected several waves of data; we used only the 2023 wave, composed of 3560 individuals, for comparability with LaVEX data. For comparability with the FRA survey, we only included women’s data in our analytic sample (n=2796). Both VEX surveys were administered in person, over the phone, and on the web. The VEX provides sampling weights to adjust these surveys for national representativeness across the entire US population. Overall, about 96% of female VEX participants were US citizens and 88% identified as heterosexual. The mean age of female participants was 48 years. For both the FRA and the VEX datasets, we removed all individuals with missing data on all IPV items (n = 185 [0.5%] and n = 17 [0.6%], respectively), resulting in an analytic sample of 40,007 for the FRA and 2779 for the VEX.

Measures

The FRA survey asked participants 30 questions about experiences of IPV, including controlling behaviors, threats, physical aggression, psychological abuse, and sexual violence, separately for current and former partners. For each item, we combined violence from current and former partners into a single item to better align with the way IPV is measured in other surveys, including the VEX. Due to very low (and in many cases zero) frequencies for some response options, as well as variation in the response options for different forms of IPV and time periods, we dichotomized each item to ever/never, as has been done in previous factor analyses using these data (Martín-Fernández et al., 2019, 2020). For example, psychological IPV and controlling behaviors responses were never, sometimes, often, and all the time for current partner but ever/never for former partners; physical and sexual IPV responses were 0 times, 1 time, 2-5 times, and 6 or more times regardless of whether acts were perpetrated by current or former partners. In some cases, the highest frequency response option was endorsed by fewer than 10 individuals across the entire sample of 40,007, suggesting that three or fewer response options would effectively capture variation in responses without substantive loss of information.
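The combination and dichotomization step described above can be illustrated with a minimal sketch. The function name, the string response labels, and the handling of missing responses are assumptions made for illustration and are not taken from the FRA codebook or the study's Stata code.

```python
def ever_experienced(current_response, former_response):
    """Combine current- and former-partner reports into one ever/never item.

    Assumptions (hypothetical, for illustration only):
    - any response other than a "never"-type option counts as endorsement;
    - None stands for a missing or not-applicable response;
    - the combined item is missing only if both reports are missing.
    """
    never_options = {"never", "0 times"}
    responses = [r for r in (current_response, former_response) if r is not None]
    if not responses:
        return None
    return int(any(r not in never_options for r in responses))


# Psychological item: frequency scale for current partner, ever/never for former partner.
print(ever_experienced("sometimes", "never"))
# Physical item: count scale for both current and former partners.
print(ever_experienced("0 times", "2-5 times"))
```

Under these assumptions, a respondent is coded 1 on an item if either the current-partner or the former-partner report indicates that the act ever occurred.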

The CalVEX and LaVEX asked participants about lifetime and past year violence from any intimate partner (current or former) using 22 dichotomous items, including controlling behaviors, threats, physical aggression, psychological abuse, and sexual violence. To correspond to the FRA data, we retained data only on lifetime experiences of IPV, i.e. each item is a dichotomous measure of ever or never having experienced the act. Table 1 juxtaposes the IPV acts covered in the FRA and VEX with the most commonly used “gold standard” measure of IPV, the adapted CTS2.

Table 1.

Comparison of IPV acts covered in the FRA, VEX, and Adapted CTS2

Act  Adapted CTS2 (“gold standard”)  VEX  FRA
Slap X X X
Hit/Punch/Beat X X X
Push/shove X X X
Shake X
Burn X X X
Kick X X
Drag X
Grab X
Choke/suffocate X X X
Throw something X
Beat head on something X
Slam X
Pull hair X X X
Twist arm X
Threaten/attack with weapon X X X
Forced sex X X X
Coerced sex X X
Other forced sex acts X X X
Attempted forced sex X X
Consent from fear X
Forced to watch pornography X
Hurt children X
Insult/Belittle X X X
Humiliate/Make fun of X X X
Threats of harm X X X
Threats to pets X
Threat to harm children X
Threat to take away children X
Threats to others X X X
Threat to harm self X
Scare/intimidate X
Destroy property X
Accuse infidelity X X
Forbid to work outside X
Location tracking X X X
Restrict movement X
Become jealous/angry if speaks to other men X X
Harass X
Control decisions X X
Control finances X X X
Isolate from friends/family X X X

Mental health symptoms in the VEX surveys were assessed using the Patient Health Questionnaire for Depression and Anxiety (PHQ-4), a validated, four-item screener for anxiety and depressive symptoms (Kroenke et al., 2009).

Analysis

Exploratory factor analysis (EFA) & Confirmatory factor analysis (CFA).

Because the item sets used in the EU and US surveys differed from other IPV scales, we did not assume a factor structure a priori. Instead, we conducted EFA on a random split-half sample of each country, considering one to four factors corresponding to the four theoretical domains of IPV: physical, sexual, psychological, and controlling behaviors. A priori criteria for model fit were: Root Mean Square Error of Approximation (RMSEA) ≤0.05 (≤0.08 adequate), Comparative Fit Index (CFI) and Tucker-Lewis Index (TLI) ≥0.95 (≥0.90 adequate) (Hu & Bentler, 1995). We reserved the other random split half of the sample in each country for CFA to independently confirm the structure elucidated during the EFA, using the same cutoffs.
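As a minimal sketch, the a priori fit criteria above can be written as a simple decision rule. The function name is hypothetical, and the requirement that all three indices meet a cutoff before a label applies is an assumption; the text does not state how the indices were combined.

```python
def classify_fit(rmsea: float, cfi: float, tli: float) -> str:
    """Label model fit as 'good', 'adequate', or 'poor' from RMSEA, CFI, and TLI."""
    # Assumption: a label applies only when all three indices meet its cutoff.
    if rmsea <= 0.05 and cfi >= 0.95 and tli >= 0.95:
        return "good"
    if rmsea <= 0.08 and cfi >= 0.90 and tli >= 0.90:
        return "adequate"
    return "poor"


# Illustrative values only (not output from a fitted model).
print(classify_fit(0.030, 0.985, 0.983))
print(classify_fit(0.060, 0.930, 0.920))
print(classify_fit(0.090, 0.880, 0.870))
```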

Measurement invariance testing.

To determine whether the IPV models in each country were comparable, we tested the measurement invariance (MI) of the IPV model across EU countries, and separately across US states, using multi-group CFA (MGCFA). Among countries/states that demonstrated the same measurement model (from the CFA), we sequentially constrained parameters (first loadings and thresholds, then residual variances) and assessed the difference in fit (RMSEA, CFI, TLI) from the less to the more constrained model. Equivalent loadings and thresholds suggest comparable interpretation of items and response options, respectively, across groups; equivalent residual variances allow for valid comparison of average latent IPV across groups. Due to the greater risk of bias using dichotomous indicators (Tse et al., 2024), MI was defined as ≤0.007 increase in RMSEA and ≤0.002 decrease in CFI from the configural (all parameters free to vary across groups) to the scalar (loadings and thresholds fixed across groups) model and from the scalar to the strict model (loadings, thresholds, and residual variances fixed across groups) (Meade et al., 2008). Small changes in fit indicate that the models are invariant, or comparable, across groups.
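The invariance criterion can be sketched as a comparison of fit indices between a less constrained and a more constrained model. The function and dictionary keys below are hypothetical; the example values are the European Union fit indices reported in Table 3. Differences are rounded to three decimals, an implementation choice made here because the indices are reported to three decimals.

```python
def is_invariant(less_constrained: dict, more_constrained: dict,
                 max_rmsea_increase: float = 0.007,
                 max_cfi_decrease: float = 0.002) -> bool:
    """Return True if adding constraints worsens fit by no more than the cutoffs."""
    # Round to three decimals so floating-point subtraction does not overshoot a cutoff.
    rmsea_increase = round(more_constrained["rmsea"] - less_constrained["rmsea"], 3)
    cfi_decrease = round(less_constrained["cfi"] - more_constrained["cfi"], 3)
    return rmsea_increase <= max_rmsea_increase and cfi_decrease <= max_cfi_decrease


# European Union fit indices as reported in Table 3.
configural = {"rmsea": 0.037, "cfi": 0.979}
scalar = {"rmsea": 0.037, "cfi": 0.977}
strict = {"rmsea": 0.036, "cfi": 0.977}

print(is_invariant(configural, scalar))  # configural vs. scalar
print(is_invariant(scalar, strict))      # scalar vs. strict
```

Both comparisons satisfy the rule for these values, which is consistent with the conclusion of strict invariance for the EU data.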

Item response theory (IRT) analysis.

To evaluate which items contribute the most information about the severity of the underlying construct of IPV as measured by the FRA and VEX scales, we used IRT methods. We modeled each item set using the two-parameter logistic (2PL) model (Toland, 2014). For each item, we estimated a discrimination parameter α (typically between 0.5 and 2.0), which describes the item’s ability to differentiate between levels of IPV along a continuum, and a difficulty parameter β, representing the level of latent IPV severity at which half of respondents would be expected to endorse the item. Ideally, items on a scale should have higher discrimination parameters; items with lower discrimination parameters may be candidates for dropping from the scale as they can less reliably distinguish between more and less severe cases of IPV. Items should also cover a wide range of difficulties to measure the full breadth of the underlying construct with accuracy. In the FRA and VEX scales, items with lower difficulty reflect less severe acts of IPV, while those with higher difficulty reflect more severe IPV. We also visualized item information curves and total information curves for each country (Toland, 2014). These curves represent the level of the latent IPV construct at which each item and the full scale, respectively, provide the most information (ability to discriminate between less versus more severe IPV).
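For readers less familiar with IRT, the following sketch shows the textbook 2PL response function and the item and total information functions that follow from it. It is an illustration rather than the estimation procedure used in Mplus, which may rely on a different link function and scaling; the function names and the two example items are hypothetical.

```python
import math


def p_endorse(theta: float, alpha: float, beta: float) -> float:
    """2PL probability of endorsing an item at latent severity theta."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))


def item_information(theta: float, alpha: float, beta: float) -> float:
    """Item information under the 2PL model: alpha^2 * P * (1 - P)."""
    p = p_endorse(theta, alpha, beta)
    return alpha ** 2 * p * (1.0 - p)


def total_information(theta: float, items: list) -> float:
    """Total (scale) information is the sum of the item information values."""
    return sum(item_information(theta, a, b) for a, b in items)


# Hypothetical items given as (alpha, beta) pairs: one highly discriminating item of
# moderate difficulty and one weakly discriminating item of high difficulty.
items = [(2.0, 1.0), (1.0, 3.0)]

print(p_endorse(1.0, 2.0, 1.0))         # probability is 0.5 when theta equals beta
print(item_information(1.0, 2.0, 1.0))  # information peaks at theta = beta, at alpha^2 / 4
print(total_information(1.0, items))
```

Because item information peaks at θ = β with height α²/4, highly discriminating items contribute sharply peaked information near their difficulty, while the spread of difficulties determines the range of the latent construct that the full scale measures well.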

Outcome construction.

To assess whether outcome construction resulted in notable differences in measures of prevalence and association, we compared three ways of measuring overall IPV in each country based on the model selected during EFA/CFA: (1) the proportion of individuals endorsing at least one of the dichotomized IPV variables, (2) the average count of all dichotomized items endorsed, and (3) IPV factor scores extracted from the latent model. We then ranked each country and state and compared the results across the three methods of outcome construction. Finally, we regressed general health (measured using a single ordinal item), as well as mental health symptoms for the VEX data (measured using the PHQ-4) separately on the three IPV variables. For comparability, all variables were standardized.
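The first two outcome constructions can be sketched directly from a matrix of dichotomous items. The toy data and function name below are hypothetical, and sampling weights are ignored for simplicity. Factor scores are not computed here because they require a fitted latent variable model.

```python
from statistics import mean, stdev

# Hypothetical toy data: each row is a respondent, each column a dichotomous IPV item.
items = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

# (2) Count of acts endorsed by each respondent.
counts = [sum(row) for row in items]
# (1) Dichotomous indicator: 1 if the respondent endorsed at least one act.
any_ipv = [1 if c > 0 else 0 for c in counts]


def standardize(values):
    """Return z-scores (mean 0, sample standard deviation 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


print(mean(any_ipv))        # unweighted proportion endorsing at least one act
print(mean(counts))         # unweighted average count of acts
print(standardize(counts))  # standardized count, as used for comparable coefficients
```

The dichotomous indicator treats the respondent who endorsed one act and the respondent who endorsed all four identically, whereas the count retains that variation.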

Data cleaning, variable creation, and regression analysis were performed in Stata (version 18). EFA, CFA, MGCFA, and IRT analyses were performed in Mplus (version 8.1) using the mean- and variance-adjusted weighted least squares estimator and the theta parameterization, as is appropriate for dichotomous items. For all analyses, we used the sampling weights provided by the FRA and VEX.

Results

Item coverage

Table 1 details 41 acts of IPV that are considered in the VEX, the FRA, or the adapted CTS2 commonly used in large-scale IPV monitoring in LMICs. While all three scales captured a minimum of nine acts of physical IPV, other forms of IPV, particularly sexual forms, were more sparsely covered in the CTS2 and VEX. The FRA, with the largest number of items, unsurprisingly covered the widest range of acts. It was also the only scale that considered threats and harm directed at children as a form of abuse, and the only scale that included more than three acts of sexual abuse. The VEX scale uniquely measured threats to pets, perpetrator threats to harm themselves, and harassment.

EFA/CFA

In every country and state except Luxembourg, one- through four-factor models showed good or adequate fit to the data in EFA (Table 2). In Luxembourg, the one-factor model did not show adequate fit, and one item (“suffocate or strangle”) loaded negatively due to very low frequency. However, two-, three-, and four-factor models were implausible due to having only one item loading on one or more factors. We selected the one-factor model to proceed to CFA because multifactorial models in all countries had numerous cross-loadings (items loading at >0.400 on more than one factor), and in three- and four-factor models in many countries, factors emerged onto which no items or only a single item loaded strongly. In all countries, including Luxembourg, a unidimensional model demonstrated adequate fit to the data in CFA. Standardized factor loadings in all 30 surveys exceeded 0.400 in CFA, except in Portugal where extremely low frequency on a single item (“forced you to watch pornography”) resulted in a negative loading (see Supplemental Material for item frequencies by country). In Greece, this item’s frequency was so low in the CFA sample that it interfered with model convergence and had to be removed.

Table 2.

Unidimensional model fit statistics from EFA and CFA

Country/State  Full Sample (n)  Exploratory Sample: RMSEA  CFI  TLI  Range of Loadings  Confirmatory Sample: RMSEA  CFI  TLI  Range of Loadings
Austria 1420 0.029 0.984 0.982 0.677-0.984 0.031 0.986 0.985 0.548-0.967
Belgium 1467 0.043 0.978 0.976 0.609-0.962 0.040 0.974 0.973 0.581-0.943
Bulgaria 1467 0.034 0.991 0.990 0.470-0.974 0.033 0.991 0.990 0.665-0.970
Croatia 1463 0.057 0.972 0.970 0.543-0.985 0.054 0.971 0.969 0.418-0.974
Cyprus 1392 0.030 0.984 0.983 0.616-0.971 0.028 0.986 0.985 0.700-0.989
Czech Rep. 1577 0.036 0.978 0.976 0.734-0.951 0.047 0.954 0.950 0.682-0.932
Denmark 1467 0.041 0.952 0.949 0.437-0.914 0.049 0.951 0.948 0.493-0.969
Estonia 1428 0.033 0.980 0.979 0.698-0.966 0.035 0.977 0.975 0.651-0.969
Finland 1473 0.039 0.972 0.970 0.443-0.918 0.038 0.976 0.975 0.531-0.930
France 1414 0.035 0.979 0.977 0.520-0.927 0.040 0.974 0.972 0.688-0.931
Germany 1487 0.041 0.969 0.966 0.549-0.912 0.038 0.970 0.968 0.503-0.947
Greece* 1443 0.028 0.989 0.989 0.632-0.970 0.029 0.987 0.986 0.614-0.978
Hungary 1477 0.048 0.955 0.951 0.700-0.974 0.049 0.959 0.956 0.649-0.981
Ireland 1438 0.036 0.989 0.988 0.437-0.968 0.031 0.994 0.993 0.670-0.970
Italy 1487 0.019 0.992 0.992 0.862-0.994 0.018 0.992 0.991 0.808-1.004
Latvia 1445 0.036 0.974 0.972 0.616-0.928 0.038 0.976 0.974 0.497-0.950
Lithuania 1362 0.037 0.981 0.980 0.466-0.979 0.040 0.980 0.978 0.572-0.953
Luxembourg 895 0.087 0.875 0.866 −0.143-0.945 0.033 0.987 0.986 0.566-0.945
Malta 1373 0.024 0.989 0.988 0.555-0.952 0.021 0.989 0.988 0.664-0.931
Netherlands 1459 0.042 0.963 0.961 0.420-0.917 0.040 0.973 0.972 0.639-0.938
Poland 1429 0.034 0.986 0.985 0.690-0.980 0.033 0.979 0.977 0.641-0.948
Portugal 1438 0.037 0.977 0.975 0.563-0.947 0.057 0.954 0.950 −0.386-0.979
Romania 1512 0.035 0.990 0.989 0.720-0.988 0.032 0.985 0.984 0.741-0.946
Slovakia 1407 0.054 0.955 0.952 0.342-0.946 0.046 0.974 0.972 0.513-0.953
Slovenia 1374 0.042 0.974 0.972 0.287-0.956 0.030 0.980 0.979 0.660-0.969
Spain 1452 0.026 0.988 0.987 0.557-0.950 0.023 0.988 0.987 0.575-0.976
Sweden 1487 0.033 0.960 0.957 0.403-0.933 0.030 0.975 0.973 0.589-0.941
UK 1474 0.034 0.987 0.986 0.644-0.953 0.035 0.987 0.986 0.578-0.952
California 2034 0.023 0.993 0.992 0.644-0.959 0.024 0.986 0.984 0.572-0.963
Louisiana 745 0.024 0.980 0.978 0.457-0.925 0.019 0.996 0.996 0.577-0.975
* Suffocated/strangled was removed due to extremely low frequency in the CFA sample.

MGCFA

The configural, scalar, and strict models from both the VEX and FRA items showed good fit to the data (Table 3). Strict invariance held for both the US and the EU, with negligible changes in fit with the addition of equality constraints on loadings, thresholds, and residual variances.

Table 3.

Multi-group confirmatory factor analysis for the European Union and United States

European Union
RMSEA CFI TLI
Configural 0.037 0.979 0.977
Scalar 0.037 0.977 0.977
Strict 0.036 0.977 0.979
Configural vs. Scalar (change in fit) <0.001 −0.002 <0.001
Scalar vs. Strict (change in fit) −0.001 <0.001 0.002
United States
RMSEA CFI TLI
Configural 0.021 0.990 0.989
Scalar 0.021 0.990 0.990
Strict 0.019 0.991 0.991
Configural vs. Scalar (change in fit) <0.001 <0.001 0.001
Scalar vs. Strict (change in fit) −0.002 0.001 0.001

IRT

VEX items measured the latent construct of IPV with high information between roughly 0 and 2 standard deviations above the mean, while FRA items measured a broader range of the latent construct between roughly 0 and 2.5 standard deviations above the mean (Figure 1). Item discriminations for the VEX ranged from 0.83 (forced or attempted forced sexual acts) to 2.62 (slammed you against something) (Table 4). The items with the lowest difficulties were “insulted, humiliated, or made fun of you in front of others” (0.54) and “pushed or shoved you” (0.69), while the items with the highest difficulties were “used a knife on you” (3.10), “used a gun on you” (2.94), and “burned you on purpose” (3.00). For the FRA items, discriminations ranged from 0.94 (forced you to watch pornographic material) to 2.23 (pushed or shoved you). The items with the lowest difficulties in the FRA were “belittle or humiliate you in private” (0.74), “insist on knowing where you are at all times” (0.81), and “get angry if you speak to another man/woman” (0.84), while “made you watch pornography” (3.00), “cut, stab, or shoot you” (3.05), and “beat or kicked you” (2.93) had the highest difficulties.

Figure 1. Total information curves for the Violence Experiences Survey (22 items; left) and the Agency for Fundamental Rights Violence Against Women Survey (30 items; right).

Table 4.

Discrimination (alpha) and difficulty (beta) parameters for VEX and FRA items

VEX Alpha SE Beta SE
1. Insulted, humiliated, or made fun of you in front of others 1.43 0.12 0.54 0.05
2. Kept you from having your own money 1.21 0.12 1.59 0.10
3. Tried to keep you from seeing or talking to your family or friends 1.83 0.15 0.89 0.06
4. Kept track of you by demanding to know where you were and what you were doing 1.63 0.14 0.83 0.06
5. Made threats to physically harm you 2.05 0.17 1.02 0.06
6. Made threats to harm someone close to you 1.28 0.13 1.93 0.12
7. Threatened to hurt themselves or commit suicide because they were upset with you 0.93 0.08 1.63 0.12
8. Made decisions for you that should have been yours to make 1.14 0.09 1.26 0.08
9. Destroyed something that was important to you 1.73 0.14 0.96 0.06
10. Intentionally hurt or threatened to use violence against your pet(s) 1.09 0.11 2.05 0.14
11. Harassed you by phone, text, email or using social media 1.19 0.10 1.09 0.07
12. Slapped you 2.13 0.19 1.00 0.05
13. Pushed or shoved you 2.44 0.20 0.69 0.05
14. Hit you with a fist or something hard 2.30 0.22 1.23 0.06
15. Hurt you by pulling your hair 2.02 0.21 1.36 0.07
16. Slammed you against something 2.62 0.27 1.04 0.05
17. Tried to hurt you by choking or suffocating you 1.80 0.17 1.30 0.07
18. Beaten you 2.40 0.27 1.49 0.07
19. Burned you on purpose 1.14 0.22 3.00 0.31
20. Used a knife on you 0.95 0.10 3.10 0.20
21. Used a gun on you 0.93 0.14 2.94 0.30
22. Forced or tried to force you to have sex, or made you perform sexual acts that you did not want to perform 0.83 0.08 1.67 0.13
FRA Alpha SE Beta SE
1. Try to keep you from seeing friends 1.98 0.06 1.00 0.02
2. Try to restrict contact with relatives 1.61 0.05 1.38 0.02
3. Insist on knowing where you are 1.99 0.05 0.81 0.02
4. Get angry if you speak to another man/woman 1.78 0.04 0.84 0.02
5. Suspicious that you are unfaithful 1.65 0.04 0.92 0.02
6. Prevent you from making financial decisions 1.31 0.04 1.56 0.03
7. Forbid work outside the home 1.21 0.05 2.16 0.05
8. Forbid you to leave the house 1.55 0.06 1.94 0.04
9. Belittle or humiliate in front of others 1.67 0.04 1.04 0.02
10. Belittle or humiliate in private 2.16 0.07 0.74 0.01
11. Scare or intimidate 2.11 0.06 0.99 0.02
12. Made you watch pornography 0.94 0.05 2.99 0.11
13. Threatened to take children 1.29 0.05 1.76 0.04
14. Threatened to hurt children 1.72 0.09 2.14 0.05
15. Hurt children 1.45 0.06 2.24 0.05
16. Threatened others 1.39 0.05 2.08 0.04
17. Threatened to hurt you 2.04 0.06 1.19 0.02
18. Pushed or shoved 2.24 0.07 1.10 0.02
19. Slapped 1.86 0.05 1.30 0.02
20. Thrown a hard object at you 1.78 0.06 1.73 0.03
21. Grabbed you or pulled hair 2.12 0.07 1.54 0.02
22. Beat or kicked you 1.06 0.05 2.93 0.10
23. Burned 1.47 0.06 2.22 0.05
24. Suffocate or strangle 2.06 0.08 1.71 0.03
25. Cut, stab, or shoot 1.23 0.09 3.05 0.12
26. Beat your head on something 1.93 0.08 2.02 0.04
27. Physically forced sex 2.01 0.08 1.97 0.03
28. Attempted forced sex 1.97 0.08 1.99 0.03
29. Unwanted sex acts 1.57 0.06 2.08 0.04
30. Consent out of fear 1.73 0.06 1.89 0.03

Impact of outcome construction formats on rank and health outcomes

In the FRA data, the proportion of individuals who experienced at least one act ranged from 0.32 in Ireland to 0.63 in Denmark, and the mean count of acts experienced ranged from 1.96 in Slovenia to 4.04 in Latvia. Factor scores ranged from <0.01 in Spain to 0.33 in Denmark. In both the FRA and VEX data, many confidence intervals for different countries’ IPV prevalence overlapped, regardless of the method of outcome construction (proportion experiencing at least one act, count of acts, factor scores) (Figure 2). In the VEX data, proportions of individuals experiencing at least one act of IPV were similar, but factor scores and counts were statistically different, showing higher levels of IPV in Louisiana. In general, country ranks were largely similar across the three methods with some exceptions, with the largest similarities between the proportion endorsing at least one item and factor scores (7/28 exactly the same rank, 24/28 within two ranks) (Table 5).

Figure 2. Estimates and 95% confidence intervals for IPV with three methods of outcome construction (proportion of individuals experiencing at least one act, average count of acts, factor score) by EU country.

Table 5.

Rank of proportion endorsing one or more items, count, and factor scores by country and state

Country Rank proportion Rank count Rank factor score
Austria 20 25 21
Belgium 15 9 13
Bulgaria 18 9 16
Croatia 16 17 18
Cyprus 21 19 20
Czech Rep. 13 15 14
Denmark 1 5 1
Estonia 7 11 10
Finland 3 6 4
France 11 10 15
Germany 8 11 8
Greece 25 17 25
Hungary 10 6 12
Ireland 28 14 26
Italy 19 11 19
Latvia 2 3 2
Lithuania 4 4 3
Luxembourg 9 6 11
Malta 22 10 24
Netherlands 6 5 7
Poland 23 8 23
Portugal 24 7 22
Romania 17 6 17
Slovakia 12 4 9
Slovenia 26 6 27
Spain 27 5 28
Sweden 5 4 6
UK 14 3 5
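
The agreement figures quoted for Table 5 (identical ranks, and ranks within two positions) can be recomputed from the table. The following is a minimal sketch, not the authors' code: the rank pairs are copied from the "Rank proportion" and "Rank factor score" columns above, and the helper name `rank_agreement` is hypothetical.

```python
# Ranks copied from Table 5: (rank by proportion, rank by factor score).
ranks = {
    "Austria": (20, 21), "Belgium": (15, 13), "Bulgaria": (18, 16),
    "Croatia": (16, 18), "Cyprus": (21, 20), "Czech Rep.": (13, 14),
    "Denmark": (1, 1), "Estonia": (7, 10), "Finland": (3, 4),
    "France": (11, 15), "Germany": (8, 8), "Greece": (25, 25),
    "Hungary": (10, 12), "Ireland": (28, 26), "Italy": (19, 19),
    "Latvia": (2, 2), "Lithuania": (4, 3), "Luxembourg": (9, 11),
    "Malta": (22, 24), "Netherlands": (6, 7), "Poland": (23, 23),
    "Portugal": (24, 22), "Romania": (17, 17), "Slovakia": (12, 9),
    "Slovenia": (26, 27), "Spain": (27, 28), "Sweden": (5, 6),
    "UK": (14, 5),
}


def rank_agreement(pairs, tolerance=0):
    """Count entries whose two ranks differ by at most `tolerance` (hypothetical helper)."""
    return sum(abs(a - b) <= tolerance for a, b in pairs)


exact = rank_agreement(ranks.values())
within_two = rank_agreement(ranks.values(), tolerance=2)
print(f"{exact}/{len(ranks)} identical, {within_two}/{len(ranks)} within two ranks")
```

With the ranks as printed in Table 5, this gives 7/28 identical and 24/28 within two ranks, matching the figures reported in the text.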

All three types of IPV outcome construction were significantly associated with worse reported general health in the EU and with lower physical health and more mental health symptoms in the US. Regression coefficients were larger in magnitude for the standardized count and factor score variables, which were very similar to each other in scale, than for the dichotomous variables; however, the choice of outcome construction did not change the statistical significance of the association between IPV and health outcomes (Table 6).

Table 6.

Standardized association of dichotomous, count, and factor scores with ordinal general health measures

Coefficient [95% confidence interval]
European Union Dichotomous Count of acts Factor score
General Health (n=39980) −0.17 [−0.20, −0.14] −0.23 [−0.27, −0.20] −0.23 [−0.26, −0.20]
United States Dichotomous Count of acts Factor score
Physical Health (n=2772) −0.11 [−0.22, −0.01] −0.22 [−0.33, −0.10] −0.18 [−0.30, −0.07]
PHQ-4 (n=2694) 0.44 [0.22, 0.55] 0.57 [0.44, 0.70] 0.57 [0.45, 0.70]
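
To make the outcome constructions compared in Table 6 concrete, the sketch below derives two of them, the dichotomous "at least one act" indicator and the count of acts, from a matrix of 0/1 item responses, and standardizes the count. It is an illustration under stated assumptions: the function name `construct_outcomes` and the toy data are hypothetical, the use of the population standard deviation is an assumption, and factor scores are omitted because they require a fitted measurement model.

```python
from statistics import mean, pstdev


def construct_outcomes(responses):
    """Build simple IPV outcomes from 0/1 item responses (hypothetical helper).

    responses: one list of 0/1 endorsements per respondent.
    Returns (any_act, counts, z_counts).
    """
    counts = [sum(r) for r in responses]          # count of acts endorsed
    any_act = [int(c > 0) for c in counts]        # dichotomous: at least one act
    mu, sigma = mean(counts), pstdev(counts)      # population SD: an assumption
    if sigma == 0:
        raise ValueError("counts have zero variance; cannot standardize")
    z_counts = [(c - mu) / sigma for c in counts]  # standardized count
    return any_act, counts, z_counts


# Toy data: five respondents, four items (illustrative only, not survey data).
toy = [[0, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 1], [0, 0, 0, 0], [1, 1, 1, 1]]
any_act, counts, z_counts = construct_outcomes(toy)
print(any_act, counts)
```

The dichotomous indicator treats the respondents with one, three, and four acts identically, whereas the count keeps them apart; this is the information that the count and factor score constructions retain.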

Discussion

This study contributes to knowledge on global measurement of IPV by assessing the invariance and item functioning of a comprehensive model of IPV, including physical, sexual, psychological, and controlling behavior domains, in population-based surveys from HICs. We found that a unidimensional model of IPV, using dichotomous items and incorporating all four domains, fit the data well in all states and countries in the sample and demonstrated strict invariance across states and countries. Evidence of strict invariance suggests that the FRA and VEX scales and indicators can be used to validly and accurately assess cross-country or cross-state differences in IPV in the EU and US, respectively. Additionally, our findings suggest that, for measures of prevalence, the proportion of women in each EU country who experience any IPV produces broadly similar results to factor scores extracted from latent modelling. However, measuring IPV exposure as a simple dichotomy of at least one act experienced may lack power to detect associations in smaller samples compared to count or factor-score derived IPV measures. Taken together, our findings indicate that the FRA items, which have been used in subsequent research in the EU, capture a broad range of IPV well in this context. While the VEX items should be tested in more states before drawing similar conclusions, preliminary evidence from this study indicates that they are functioning well in the two states where they have been administered.

This study also suggests that psychological IPV and controlling behaviors in HICs are part of a broader IPV construct along with physical and sexual forms and underscores the need to incorporate psychological IPV and controlling behaviors into global and regional prevalence estimates (Heise et al., 2019). Current estimates, based only on physical and sexual IPV, may be underestimates that risk failing to capture the full scope of IPV and its negative downstream impacts. It is worth noting that a unidimensional model of IPV does not necessarily preclude the consideration of different forms of IPV in studies where examining specific domains may be informative. For example, recent evidence from LMICs suggests that psychological IPV may be on the rise in countries where physical forms of IPV are decreasing (Ma et al., 2023).

IRT analyses confirm that the FRA and VEX item sets are functioning well and capturing a broad range of the latent IPV construct in the EU and US, respectively. The FRA measures a slightly broader range of the construct, likely due to the larger number of items. IRT also provides some evidence of items that may be dropped from the FRA or VEX without losing information or range and suggests domains where measurement gaps exist. In particular, the FRA item about being forced to watch pornography is a low-discrimination item with similar difficulty to other more severe behaviors. Other studies have also noted that this item does not function well in certain LMICs (Clark, Bergenfeld, Cheong, Kaslow, et al., 2023). The only VEX item measuring sexual IPV had the lowest discrimination of any item in that scale, likely due to insufficient coverage of sexual behaviors. The item referencing partner threats of self-harm is another low-discrimination item that overlaps considerably with other items; it could be dropped without any substantive loss of information and replaced with one or more sexual IPV items from the FRA scale to improve content validity.
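
Why a low-discrimination item adds little can be illustrated with the item information function. The sketch below assumes a two-parameter logistic (2PL) model, which this section does not state explicitly, and assumes that the first and third numeric columns of the FRA item table are discrimination (a) and difficulty (b). Under those assumptions, information at latent level θ is a²·P(θ)·(1 − P(θ)), which peaks at a²/4 when θ = b. The function names are hypothetical; the parameter values are taken from FRA items 12, 25, and 18.

```python
import math


def p_endorse(theta, a, b):
    """2PL probability of endorsing an item at latent IPV level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """2PL item information: a^2 * P * (1 - P)."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


# (discrimination a, difficulty b), read from the FRA item table (assumed column order).
items = {
    "Made you watch pornography": (0.94, 2.99),
    "Cut, stab, or shoot": (1.23, 3.05),
    "Pushed or shoved": (2.24, 1.10),
}

for name, (a, b) in items.items():
    peak = item_information(b, a, b)  # information is maximal at theta = b
    print(f"{name}: peak information {peak:.2f} at theta = {b}")
```

Because peak information scales with a², the pornography item contributes less information at its own difficulty level than the similarly difficult "cut, stab, or shoot" item, which is consistent with treating it as a candidate for removal.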

Limitations

The conclusions that can be drawn from this study are limited by the inherent limitations of the data and the analytic approach. First, the VEX data are available from only two states in the US, although these are weighted to be nationally representative on key demographic variables. Therefore, these data are less generalizable than the FRA data, which were collected from all 28 EU states (as of 2014). Second, IPV is a sensitive topic that is often underreported (Cullen, 2023), and estimates derived from self-report surveys can therefore be viewed as conservative estimates of the true prevalence of IPV. Third, these findings are limited to the construct of IPV against women in HICs. IPV against men is an understudied public health issue that is beyond the scope of the current study. Finally, our decision to dichotomize the FRA items to eliminate sparse data and resolve model convergence issues may also have resulted in some loss of information from the original four-level items.

Future research directions

While this study represents an advancement in establishing the validity of IPV scales in HICs, there remains much work to be done in terms of refining and standardizing IPV scales internationally. Multi-country qualitative research, including cognitive interviewing, will be needed to ensure that any new items are valid and function comparably in diverse populations. In the short term, tools such as AI and emerging techniques such as exploratory graph analysis may be useful in preliminary psychometric testing of revised scales (Golino & Epskamp, 2017). Such approaches have the additional benefit of eliminating risks of retraumatizing survivors of IPV through intensive qualitative interviewing. Finally, additional analyses should explore the psychometric properties of the VEX scale among men as well as the scale’s invariance across gender. Given the unique nature of the FRA items referencing threats of harm to children, invariance testing of the scale using separate models for women with and without children is needed.

Prevention and policy implications

The evidence for strict invariance of a unidimensional model of IPV against women in HICs suggests that psychological IPV and controlling behaviors can be incorporated into regional estimates of IPV in HICs without substantive risk of bias due to measurement differences across countries. Furthermore, prevalence surveys, as well as interventions to prevent and reduce IPV, should measure and report on psychological IPV and controlling behaviors in addition to physical and sexual IPV to ensure that the full scope of IPV is being captured. Including non-physical forms of IPV will bring international reporting more into alignment with Sustainable Development Goal Indicator 5.2, which specifically references psychological IPV. To improve standardization of estimates, “gold standard” IPV item sets commonly used in LMICs to measure IPV, such as the adapted CTS2 (Table 1), may benefit from expanding the scope of psychological and sexual behaviors by considering high discrimination items from the FRA and/or VEX, including consenting to sex out of fear and threats to take away or harm children. Finally, measuring IPV in a more comprehensive way will allow for more accurate evaluation of policy and programmatic interventions to reduce and prevent IPV. For example, if a wider range of IPV acts are assessed, it may enable researchers and practitioners to identify at-risk individuals before abuse becomes more severe as well as individuals whose abusers have adopted more covert tactics to avoid detection.

In conclusion, IPV is one of the leading threats to women’s wellbeing across the life course, yet remains inadequately measured in many contexts. IPV is identified not only by sexual and physical abuse but by a range of both physical and non-physical behaviors that result in an environment of fear and control (Heise et al., 2019). Continued refinement and standardization of existing measures based on rigorous psychometric testing is the first step to understanding the full scope of this pervasive issue and to developing effective interventions to address it.

Supplementary Material

Supplemental Material

Public Significance Statement.

Intimate partner violence is the most common form of violence and a major threat to the health of individuals. Challenges in accurate measurement of intimate partner violence across countries limit global knowledge of its prevalence and downstream impacts. Our study suggests the need to incorporate psychological abuse and controlling behaviors into common measures of intimate partner violence used in high-income countries.

References

  1. Adams AE, & Beeble ML (2019). Intimate partner violence and psychological well-being: Examining the effect of economic abuse on women’s quality of life. Psychology of Violence, 9(5), 517.
  2. Aiquipa-Tello JJ, & Ponce-Díaz CR (2025). Evidence of Internal Structure Validity, Measurement Invariance, Convergent Validity, Clinical Validity, and Reliability of the Woman Abuse Screening Tool in Peruvian Women. Violence and Victims, 40(1), 3–18.
  3. Aizpurua E, Copp J, Ricarte JJ, & Vázquez D (2021). Controlling behaviors and intimate partner violence among women in Spain: An examination of individual, partner, and relationship risk factors for physical and psychological abuse. Journal of Interpersonal Violence, 36(1-2), 231–254.
  4. Alexander EF, Backes BL, & Johnson MD (2022). Evaluating measures of intimate partner violence using consensus-based standards of validity. Trauma, Violence, & Abuse, 23(5), 1549–1567.
  5. Bergenfeld I, Clark CJ, Bengtson AM, & Haardörfer R (2025). Measurement Structure and Regional Invariance of the Demographic and Health Survey Intimate Partner Violence Items: A Comparative Confirmatory Factor Analysis. Assessment, 10731911251340847.
  6. Borjesson WI, Aarons GA, & Dunn ME (2003). Development and confirmatory factor analysis of the abuse within intimate relationships scale. Journal of Interpersonal Violence, 18(3), 295–309.
  7. Chapman H, & Gillespie SM (2019). The Revised Conflict Tactics Scales (CTS2): A review of the properties, reliability, and validity of the CTS2 as a measure of partner abuse in community and clinical samples. Aggression and Violent Behavior, 44, 27–35.
  8. Chatterji S, Boyer C, Sharma V, Abramsky T, Levtov R, Doyle K, Harvey S, & Heise L (2023). Optimizing the construction of outcome measures for impact evaluations of intimate partner violence prevention interventions. Journal of Interpersonal Violence, 38(15-16), 9105–9131.
  9. Chatterji S, Heise L, Gibbs A, & Dunkle K (2020). Exploring differential impacts of interventions to reduce and prevent intimate partner violence (IPV) on sub-groups of women and men: A case study using impact evaluations from Rwanda and South Africa. SSM-Population Health, 11, 100635.
  10. Clark CJ, Bergenfeld I, Cheong YF, Kaslow NJ, & Yount KM (2023). Impact of measurement variability on study inference in partner violence prevention trials in low-and middle-income countries. Assessment, 30(5), 1339–1353.
  11. Clark CJ, Bergenfeld I, Cheong YF, Najera H, Sardinha L, García-Moreno C, & Heise L (2023). Patterns of Women’s exposure to psychological violence: A global examination of low-and middle-income countries. SSM-Population Health, 24, 101500.
  12. Connelly CD, Newton RR, & Aarons GA (2005). A psychometric examination of English and Spanish versions of the Revised Conflict Tactics Scales. Journal of Interpersonal Violence, 20(12), 1560–1579.
  13. Corsi DJ, Neuman M, Finlay JE, & Subramanian S (2012). Demographic and health surveys: a profile. International Journal of Epidemiology, 41(6), 1602–1613.
  14. Cullen C (2023). Method matters: The underreporting of intimate partner violence. The World Bank Economic Review, 37(1), 49–73.
  15. European Agency for Fundamental Rights. (2014). Violence against women: an EU-wide survey. Luxembourg: Publications Office of the European Union, 358.
  16. Follingstad DR, & Rogers MJ (2013). Validity concerns in the measurement of women’s and men’s report of intimate partner violence. Sex Roles, 69, 149–167.
  17. Golino HF, & Epskamp S (2017). Exploratory graph analysis: A new approach for estimating the number of dimensions in psychological research. PLoS ONE, 12(6), e0174035.
  18. Heise L, Pallitto C, García-Moreno C, & Clark CJ (2019). Measuring psychological abuse by intimate partners: Constructing a cross-cultural indicator for the Sustainable Development Goals. SSM-Population Health, 9, 100377.
  19. Hu L-T, & Bentler PM (1995). Evaluating model fit.
  20. Khan S, & Hancioglu A (2019). Multiple indicator cluster surveys: delivering robust data on children and women across the globe. Studies in Family Planning, 50(3), 279–286.
  21. Kroenke K, Spitzer RL, Williams JB, & Löwe B (2009). An ultra-brief screening scale for anxiety and depression: the PHQ–4. Psychosomatics, 50(6), 613–621.
  22. Krug EG, Mercy JA, Dahlberg LL, & Zwi AB (2002). The world report on violence and health. The Lancet, 360(9339), 1083–1088.
  23. Ma N, Chen S, Kong Y, Chen Z, Geldsetzer P, Zeng H, Wu L, Wehrmeister FC, Lu C, & Subramanian S (2023). Prevalence and changes of intimate partner violence against women aged 15 to 49 years in 53 low-income and middle-income countries from 2000 to 2021: a secondary analysis of population-based surveys. The Lancet Global Health, 11(12), e1863–e1873.
  24. Martín-Fernández M, Gracia E, & Lila M (2019). Psychological intimate partner violence against women in the European Union: a cross-national invariance study. BMC Public Health, 19, 1–11.
  25. Martín-Fernández M, Gracia E, & Lila M (2020). Ensuring the comparability of cross-national survey data on intimate partner violence against women: a cross-sectional, population-based study in the European Union. BMJ Open, 10(3), e032231.
  26. Meade AW, Johnson EC, & Braddy PW (2008). Power and sensitivity of alternative fit indices in tests of measurement invariance. Journal of Applied Psychology, 93(3), 568.
  27. O’Hara KL, Perkins AB, Tehee M, & Beck CJ (2018). Measurement invariance across sexes in intimate partner abuse research. Psychology of Violence, 8(5), 560.
  28. Palm Reed KM, Kline NK, Benz M, Cabrera K, & Hines DA (2022). Measurement Invariance in the Assessment of Intimate Partner Abuse Among Sexual Minority and Non-Sexual Minority Individuals. Partner Abuse, 13(4).
  29. Peitzmeier SM, Wirtz AL, Humes E, Hughto JM, Cooney E, Reisner SL, & Women AT (2021). The transgender-specific intimate partner violence scale for research and practice: Validation in a sample of transgender women. Social Science & Medicine, 291, 114495.
  30. Raj A, Johns N, Closson K, Mahoney A, Yore J, Kully G, LaVeist T, & Theall K (2023). Louisiana Violence Experiences Survey (LaVEX) 2023. Newcomb Institute, Tulane University and Center on Gender Equity and Health, University of California San Diego.
  31. Raj A, Johns N, Yore J, Closson K, Kully G, & Thomas J (2023). California Violence Experiences Survey (CalVEX) 2023. Center on Gender Equity and Health, University of California San Diego and Newcomb Institute, Tulane University.
  32. Sardinha L, Maheu-Giroux M, Stöckl H, Meyer SR, & García-Moreno C (2022). Global, regional, and national prevalence estimates of physical or sexual, or both, intimate partner violence against women in 2018. The Lancet, 399(10327), 803–813.
  33. Shorey RC, Allan NP, Cohen JR, Fite PJ, Stuart GL, & Temple JR (2019). Testing the factor structure and measurement invariance of the conflict in Adolescent Dating Relationship Inventory. Psychological Assessment, 31(3), 410.
  34. Straus MA, Hamby SL, Boney-McCoy S, & Sugarman DB (1996). The revised conflict tactics scales (CTS2) development and preliminary psychometric data. Journal of Family Issues, 17(3), 283–316.
  35. Toland MD (2014). Practical guide to conducting an item response theory analysis. The Journal of Early Adolescence, 34(1), 120–151.
  36. Tse WW-Y, Lai MH, & Zhang Y (2024). Does strict invariance matter? Valid group mean comparisons with ordered-categorical items. Behavior Research Methods, 56(4), 3117–3139.
  37. Walby S (2005). Improving the statistics on violence against women. Statistical Journal of the United Nations Economic Commission for Europe, 22(3-4), 193–216.
  38. Wareham J, Wagers SM, Rodriguez LM, & Neighbors C (2022). An exploration of measurement invariance across sex in intimate partner violence perpetration. Victims & Offenders, 17(2), 161–181.
  39. Yount KM, Cheong YF, Khan Z, Bergenfeld I, Kaslow N, & Clark CJ (2022). Global measurement of intimate partner violence to monitor Sustainable Development Goal 5. BMC Public Health, 22(1), 465.
  40. Yount KM, Johnson E, Kaslow N, & Cheong YF (2024). Are “Global Measures” of Psychological Intimate Partner Violence Against Women Really Comparable? A Measurement Invariance Analysis of Controlling Behaviors in 19 Low-and Middle-Income Countries. Research Square, rs. 3. rs-4963461.
