Abstract
In lower- and middle-income countries (LMICs), studies of interventions to reduce intimate partner violence (IPV) are expanding, yet measurement equivalence of the IPV construct—the primary outcome in these investigations—has not been established. We assessed the measurement equivalence of physical and sexual IPV item sets used in recent trials in LMICs and tested the impact of noninvariance on study inference. With data from four intervention trials (N = 3,545) completed before 2020, we used multiple-group confirmatory factor analysis to assess invariance across arms, over time, and across studies. We also calculated average treatment effects adjusting for covariate imbalance to assess concordance with published results. Most items functioned equivalently within studies at baseline and end line. Some evidence of longitudinal noninvariance was observed in at least one study arm in three studies, but did not meaningfully affect latent means or effect-size estimates. Evidence of partial invariance across studies at baseline and strict invariance over time was observed. Common measures of physical and sexual IPV were valid for measuring intervention impact in these samples. The study highlights the need for harmonized use of the tested scale, content validity assessments, and routine measurement equivalence testing to ensure valid inferences about intervention effectiveness.
Keywords: intimate partner violence, measurement invariance, measurement equivalence, prevention trials, lower- and middle-income countries
Intimate partner violence (IPV) is a major global problem, recognized first by feminist activists who helped shift global norms and national laws. Efforts to measure IPV aimed to fill the epidemiologic gap in knowledge. These studies generated rich evidence documenting that this public health problem affects over one in four women globally (World Health Organization, 2021), is accompanied by health and social consequences (Bacchus et al., 2018), and costs 5% of the world’s gross domestic product (Hoeffler & Fearon, 2014).
The field has expanded to include tests of a growing number of prevention interventions. Lagging behind, however, are basic tests of the equivalence of the IPV scale (i.e., Is it measuring the same construct across intervention and control groups?). To date, IPV-prevention researchers have compared measures of IPV across study arms and over time, assuming, but not testing for, measurement equivalence. Measurement nonequivalence, however, renders comparisons of IPV estimates across study arms and time invalid. Other fields, such as psychology, more routinely conduct these tests or have the benefit of widely validated gold-standard measures. The field of international violence prevention has not yet attained this level of rigor, including assessment of measurement equivalence. As evidence from randomized-controlled IPV-prevention trials grows, including in lower- and middle-income countries (LMICs) where the prevalence is disproportionately high and resources for prevention are scarce, the assumptions underlying the field’s primary outcome must be tested to ensure valid measurement and inferences about the efficacy and effectiveness of interventions. To fill this gap, we assessed the cross-arm and cross-time measurement invariance of IPV outcomes and effects of noninvariance on study inference in four LMICs.
We addressed these questions using data from recent trials fielded as part of the What Works to Prevent Violence Against Women and Girls Program (2014-2020), which is the largest effort to date (£25 million, US$34 million) to concurrently test different interventions to prevent violence against women and girls in LMICs (Crawford et al., 2020). The four studies included in this novel test of measurement equivalence represent different LMIC settings and different interventions, but have a common measurement timeframe (past 12 months) and content (physical and sexual IPV). We hypothesized as follows:
Hypothesis 1 (H1): All IPV items would exhibit cross-arm and cross-time measurement equivalence in a subset of the studies.
Hypothesis 2 (H2): Items worded more objectively (physical IPV) would exhibit equivalence more often than more subjectively worded items (sexual IPV).
Hypothesis 3 (H3): A lack of measurement equivalence would bias the treatment effect on IPV, with the direction and magnitude depending on the type of nonequivalence that was observed.
Method
Overview
We assessed within-trial cross-arm and cross-time measurement equivalence of scales to measure physical and sexual IPV (as separate or combined scales) in four studies. We conducted a pooled cross-trial analysis of the subset of four physical IPV items that were identically worded in the English translations across studies. We assessed the impact of any identified nonequivalence on intervention effect size and direction.
Sample
Studies measuring IPV victimization among women as a primary outcome with experimental and quasi-experimental designs were identified through the What Works to Prevent Violence Against Women and Girls Program (Crawford et al., 2020). While studies participating in What Works were encouraged to use standard outcome measures for IPV, a diversity of study designs and programming types were represented. Of the publicly available What Works data sets with IPV as an outcome (N = 12, Supplemental Table A), four met the inclusion criteria for the present analysis: (a) a panel design, (b) individual-level measurement of IPV victimization, and (c) a control arm.
Table 1 summarizes design characteristics of the four studies: two were cluster randomized (Indashyikirwa; Stern et al., 2018, and Stepping Stones and Creating Futures [SSCF]; Gibbs et al., 2017), one was individually randomized (Women for Women [WW]; Gibbs et al., 2018), and one was quasi-experimental (HERrespect; Al Mamun et al., 2018). Indashyikirwa, a couples-based intervention, focused on gender, violence, conflict resolution, power dynamics, social norms, economic empowerment, and community change. SSCF involved group work with men and women on gender norms, violence, sexual health, communication and conflict resolution, and livelihood development. HERrespect, a workplace-based intervention, focused on communication, gender norms, relationships power, violence, stress, conflict management, factory policies, goal setting, and change making. Women for Women was the only intervention that did not explicitly incorporate gender, but emphasized vocational skills training and financial support to empower participants economically. Intervention programs ranged from 18 to 70 hours.
Table 1.
Summary of Trials Selected for the Analysis (n = 4).
Trial and country | Study period in months | Design | Analytic sample | Eligibility criteria | Participant age (years) | Intervention | IPV domains and number of items |
---|---|---|---|---|---|---|---|
Indashyikirwa (Stern et al., 2018), Rwanda | 24 | Cluster randomized (28 clusters) | 815 Tx; 801 Ctl | Member or partner of a member of an active village savings and loan association; 18–49 years old; living with or married to current partner for at least 6 months; willing and able to give informed consent; willing to give contact information of three friends, neighbors, or family members; living in study area for next 2.5 years | Mean age 32.74 (range 19–49) | Couples work on types and uses of personal and interpersonal power, gender, gender-based violence, triggers for IPV, conflict management, economic development, sexuality, alcohol, social norms, and community change (60 hours) | Physical (five items); Sexual (three items) |
Stepping Stones and Creating Futures (Gibbs et al., 2017), South Africa | 24 | Cluster randomized (34 clusters) | 260 Tx; 285 Ctl | Cluster level: informal settlement where community partner stated trial was safe. Participant level: normal resident of informal settlement; aged 18–30 years; not formally employed; able to communicate in main languages of study; no significant disability or currently drunk, drugged, or psychotic | Mean age 23.98 (range 18–35) | Group work on gender norms, gender-based violence, sexual health, communication and conflict resolution skills; livelihood development (63 hours) | Physical (five items); Sexual (four items) |
HERrespect (Al Mamun et al., 2018), Bangladesh | 12 | Cluster nonrandomized (eight clusters) | 303 Tx; 304 Ctl | Female garment workers, currently married, and living with husband, worked in factory for at least 1 year | Mean age 27.76 (range 17–57) | Gender-transformative group sessions including communication skills, gender norms, relationships power, gender-based violence, stress and conflict management, factory policy analysis, goal setting, and being a change maker (18 hours) | Physical (five items); Sexual (five items) |
Women for Women (Gibbs et al., 2018), Afghanistan | 24 | Individually randomized | 434 Tx; 343 Ctl | Women earning less than US$1.25/day, unemployed, not in school, not in similar program, no mental illness or very severe disability, aged 18–45 years, married | Mean age 32.20 (range 18–46) | Group sessions on information; skill building (numeracy, basic vocational), conditional cash transfer, savings, social support (70 hours) | Physical (five items) |
Note. IPV = intimate partner violence; Tx = treatment arm; Ctl = control arm.
Except for HERrespect, studies invited participants to volunteer and, therefore, did not report baseline participation rates. While HERrespect randomly sampled 100 participants per factory from a list of eligible participants, the HERrespect study team did not report participation rates (Naved et al., 2021). The four data sets originally included 4,597 individuals (study range 677 to 1,659) enrolled at baseline (Figure 1). Retention rates ranged from 76% to 97% across studies. Data for the present analysis were subset to include the individual women who completed end line data collection (3,978), had a current husband/partner (3,559), and were not missing outcomes of interest at baseline (3,545). After restricting on these criteria, the final sample totaled 3,545 (study range 545 to 1,616).
Figure 1.
CONSORT Diagram.
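The sample restrictions described above can be retraced with simple arithmetic. The following is a minimal sketch in Python using only the pooled counts reported in the text; the variable and function names are illustrative and do not come from the study's analysis code.

```python
# Pooled counts reported in the text for the four trials.
flow = [
    ("enrolled at baseline", 4597),
    ("completed end line", 3978),
    ("had a current husband/partner", 3559),
    ("not missing baseline outcomes", 3545),
]

def exclusions(steps):
    """Return (label, n_remaining, n_dropped_at_this_step) for each step after the first."""
    rows = []
    for (_, prev_n), (label, n) in zip(steps, steps[1:]):
        rows.append((label, n, prev_n - n))
    return rows

for label, n, dropped in exclusions(flow):
    print(f"{label}: n = {n} (excluded {dropped})")

# Share of the enrolled sample retained in the analytic sample.
analytic_share = flow[-1][1] / flow[0][1]
print(f"analytic sample as share of enrolled: {analytic_share:.1%}")
```

Under these counts, most exclusions occur at the end line and current-partner steps, while missing baseline outcomes remove only 14 additional women.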
Data
This study assessed measurement equivalence of items developed to measure women’s experience of violence perpetrated by a male partner. A focus on women’s victimization is justified by research identifying men as the predominant, although not the only, perpetrators of IPV, especially in LMICs (Kishor & Bradley, 2012; Kishor & Johnson, 2004). It is also supported by evidence that violence perpetrated by men is of greater severity and accompanied by more severe health impact than partner violence perpetrated by women (Kishor & Bradley, 2012). The What Works items measuring women’s experience of IPV were adapted from the World Health Organization’s (WHO) Multicountry Study on Women’s Health and Domestic Violence Against Women. Adaptations included the combination of two of the severe WHO physical IPV items into one and some additional minor wording adjustments. The WHO scale itself was adapted from behaviorally based items originating from the Conflicts Tactics Scale (CTS) Revised version (Straus et al., 1996). The revised CTS items were modified based on evidence available at the time and subsequently pilot tested before administration (Garcia-Moreno et al., 2005).
All four studies measured physical IPV in the past 12 months using five items that assessed the frequency with which women’s intimate partners (a) slapped or threw an object at them; (b) pushed or shoved them; (c) hit them with a fist or object; (d) kicked, dragged, beat, choked, or burnt them; and (e) threatened with or actually attacked them with a weapon. Three studies included at least three items measuring sexual IPV in the past 12 months, including the following experiences: (a) forced to have sex, (b) threatened or intimidated into having sex, (c) forced to do other sexual acts, (d) forced to view pornography, and (e) engaged in unwanted sex due to fear of the consequences of refusing. Table 2 compares English translations of item wording across studies.
Table 2.
English Wording of Physical and Sexual IPV Items, by Included Trial.
Women for Women International | HERrespect | Stepping Stones and Creating Futures | Indashyikirwa |
---|---|---|---|
Physical IPV | Physical IPV | Physical IPV | Physical IPV |
In the past 12 months, how many times has your husband slapped you or thrown something at you which could hurt you? | In the past 12 months, how many times has your husband slapped you or thrown something at you which could hurt you? | In the past 12 months, how many times has a current or previous husband or boyfriend ever slapped you or thrown something at you which could hurt you? | In the past 12 months, how many times has your current husband slapped you or thrown something at you which could hurt you? |
In the past 12 months, how many times has your husband pushed or shoved you? a | In the past 12 months, how many times has your husband pushed you or shoved you or pulled your hair? a | In the past 12 months, how many times has a current or previous husband or boyfriend ever pushed or shoved you? a | In the past 12 months, how many times has your current husband pushed or shoved you? a |
In the past 12 months, how many times has your husband hit you with a fist or with something else which could hurt you? | In the past 12 months, how many times has your husband hit you with a fist or with something else which could hurt you? | In the past 12 months, how many times has a current or previous husband or boyfriend ever hit you with a fist or with something else which could hurt you? | In the past 12 months, how many times has your current husband hit you with a fist or with something else which could hurt you? |
In the past 12 months, how many times has your husband kicked, dragged, beaten, choked, or burnt you? | In the past 12 months, how many times has your husband kicked, dragged, beaten, choked, or burnt you? | In the past 12 months, how many times has a current or previous husband or boyfriend ever kicked, dragged, beaten, choked, or burnt you? | In the past 12 months, how many times has your current husband kicked, dragged, beaten, choked, or burnt you? |
In the past 12 months, how many times has your husband threatened to use or actually used a gun, knife, or other weapon against you? | In the past 12 months, how many times has your husband threatened to use or actually used a gun, knife, or other weapon against you? | In the past 12 months, how many times has a current or previous husband or boyfriend ever threatened to use or actually used a gun, knife, or other weapon against you? | In the past 12 months, how many times has your current husband threatened to use or actually used a gun, knife, or other weapon against you? |
Sexual IPV | Sexual IPV | Sexual IPV | Sexual IPV |
NA | In the past 12 months, how many times has your current husband physically forced you to have sex when you did not want to? | In the past 12 months, how many times has a current or previous husband or boyfriend ever physically forced you to have sex when you did not want to? | In the past 12 months, how many times has your husband physically forced you to have sex with him when you didn’t want to? |
NA | In the past 12 months, how many times has your current husband used threats or intimidation to get you to have sex when you did not want to? | In the past 12 months, how many times has your current or previous boyfriend, husband or partner used threats or intimidation to get you to have sex when you did not want to? c | In the past 12 months, how many times has your husband used threats or intimidation to make you have sex when you did not want to? |
NA | In the last 12 months, how many times current husband has forced you to do something sexual that you found degrading or humiliating? b | In the past 12 months, how many times has a current or previous husband or boyfriend ever forced you to do something else sexual that you did not want to do? | In the past 12 months, how many times has your husband used physical force or threats to make you do something else sexual that you did not want to do? |
NA | In the last 12 months, how many times your current husband has forced you to watch pornography when you did not want to? b | In the past 12 months, how many times has a current or previous husband or boyfriend ever forced you to watch pornography when you did not want to? | NA |
NA | In the past 12 months, how many times did you have sexual intercourse you did not want to because you were afraid of what your husband might do? | NA | NA |
Note. IPV = intimate partner violence.
a Dropped from cross-trial analysis due to wording differences. b Dropped at exploratory factor analysis (EFA) stage. c Dropped at multiple-group confirmatory factor analysis (MGCFA) stage due to model convergence issues.
Pooled analyses were restricted to items worded identically (in English), which included four of the five physical IPV acts: slap, hit, kick, and attack or threaten with a weapon. The “pushed or shoved” item was excluded because one study also included hair pulling in the item wording. Item response categories for all studies provided ordinal options (never, once, few, many) for partner perpetration of each act. Given the scarcity of data in the higher frequency categories, response options were collapsed to be dichotomous (ever in past 12 months/never).
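The collapsing of response options can be sketched as follows. This is a minimal illustration in Python, not the authors' code; the function name `dichotomize`, the label strings, and the handling of missing values are assumptions based on the response options listed above.

```python
# Hypothetical labels for the four ordinal response options described in the text.
ORDINAL_OPTIONS = ("never", "once", "few", "many")

def dichotomize(response):
    """Collapse an ordinal frequency response to ever (1) / never (0).

    Missing responses (None) are passed through unchanged (an assumption).
    """
    if response is None:
        return None
    if response not in ORDINAL_OPTIONS:
        raise ValueError(f"unexpected response: {response!r}")
    return 0 if response == "never" else 1

responses = ["never", "once", "many", None, "few"]
binary = [dichotomize(r) for r in responses]
print(binary)  # [0, 1, 1, None, 1]
```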
Descriptive Analyses
All investigative team members were blinded to the trial from which data sets originated except the primary analyst, who could not be blinded due to familiarity based on acquiring and managing the data sets. Univariate analyses were conducted for each trial across study wave and arm, examining prevalence and missingness for all IPV items. Tetrachoric correlations were estimated to assess the association between dichotomous items measuring specific acts of IPV and dichotomous items measuring exposure to each general domain of IPV (e.g., physical or sexual).
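For readers unfamiliar with tetrachoric correlations, the sketch below shows one common way to estimate them from a 2 × 2 table of counts: thresholds are taken from the marginal proportions, and the correlation of an assumed underlying bivariate normal distribution is chosen so that the model reproduces the observed joint proportion. It is a standard-library Python illustration, not the routine used in the study; all function names are hypothetical, and the numerical settings (bisection bounds, Simpson steps) are assumptions.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_ppf(p, lo=-10.0, hi=10.0):
    """Inverse standard normal CDF, found by bisection."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def bvn_density(h, k, r):
    """Standard bivariate normal density at (h, k) with correlation r."""
    q = (h * h - 2.0 * r * h * k + k * k) / (1.0 - r * r)
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(1.0 - r * r))

def bvn_cdf(h, k, rho, steps=200):
    """P(X <= h, Y <= k) = Phi(h)Phi(k) + integral of the density over r in [0, rho]."""
    width = rho / steps  # Simpson's rule; steps must be even
    total = bvn_density(h, k, 0.0) + bvn_density(h, k, rho)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * bvn_density(h, k, i * width)
    return norm_cdf(h) * norm_cdf(k) + total * width / 3.0

def tetrachoric(n00, n01, n10, n11):
    """Tetrachoric correlation from counts (first index: item A = 0/1; second: item B = 0/1)."""
    n = n00 + n01 + n10 + n11
    h = norm_ppf((n00 + n01) / n)  # threshold for item A, from P(A = 0)
    k = norm_ppf((n00 + n10) / n)  # threshold for item B, from P(B = 0)
    target = n00 / n               # observed P(A = 0, B = 0)
    lo, hi = -0.999, 0.999
    for _ in range(60):            # bvn_cdf increases with rho, so bisection applies
        mid = (lo + hi) / 2.0
        if bvn_cdf(h, k, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical counts: both items endorsed by half the sample.
r = tetrachoric(200, 100, 100, 200)
print(round(r, 3))
```

With both thresholds at zero, the joint probability equals 1/4 + arcsin(ρ)/(2π), so the hypothetical table above (joint proportion 1/3) corresponds to a correlation of about .50. Tables with empty cells would need additional handling that this sketch omits.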
Exploratory and Confirmatory Factor Analysis (EFA/CFA)
Unless specified, models were estimated in Mplus 8.0 using mean and variance-adjusted weighted least squares (WLSMV), an estimation approach suitable for dichotomous data (Muthén & Muthén, 2018). For each study, incremental EFA was performed using a random split-half sample, accounting for the complex sampling design, where applicable, based on the theorized number of constructs being measured. Model fit was assessed using the Root Mean Square Error of Approximation (RMSEA < 0.06), Comparative Fit Index (CFI > 0.95), and Tucker–Lewis Index (TLI > 0.95; Hu & Bentler, 1995). We chose the various model fit indices based on results from simulation studies (e.g., Cheung & Rensvold, 2002; Hu & Bentler, 1999) and reviews of current practices (Putnick & Bornstein, 2016; Vandenberg & Lance, 2000).
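The fit criteria named above can be expressed as a simple decision rule. The sketch below is illustrative Python, not part of the Mplus workflow; the function name is hypothetical, and requiring all three cutoffs to hold jointly is an assumption about how the criteria were combined.

```python
def meets_fit_criteria(rmsea, cfi, tli):
    """Return True when all three cutoffs stated in the text are satisfied."""
    return rmsea < 0.06 and cfi > 0.95 and tli > 0.95

# Illustrative values only: one well-fitting model and one that misses the RMSEA cutoff.
good_fit = meets_fit_criteria(rmsea=0.028, cfi=0.996, tli=0.995)
poor_fit = meets_fit_criteria(rmsea=0.080, cfi=0.960, tli=0.955)
print(good_fit, poor_fit)  # True False
```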
Cross-Arm, Cross-Study, and Longitudinal Invariance
We performed measurement invariance testing in stages of nested models through the application of constraints using multiple-group confirmatory factor analysis (MGCFA) for dichotomous indicators (Millsap, 2012). Adequate fit was assessed using the criteria outlined above for EFA/CFA. When investigating measurement invariance, we assessed changes in the goodness-of-fit indices between the more and less stringent models, using a guide of ΔRMSEA/CFI ≤ 0.01 to determine an acceptable worsening in fit in more stringent models (Liu et al., 2017). We compared the fit of models with and without factor loadings and category thresholds constrained (Davidov et al., 2012). Constraints were applied across arms and across time separately. We performed stratified tests across time points in each study arm to identify differential noninvariance over time.
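The change-in-fit guide can be sketched as a comparison of two nested models. This Python illustration is an assumption-laden simplification: the function name and dictionary keys are hypothetical, and treating only a worsening of fit (lower CFI, higher RMSEA) beyond 0.01 as problematic is our reading of the criterion rather than a detail stated in the text.

```python
def invariance_supported(less_constrained, more_constrained, tol=0.01):
    """Compare fit of a more constrained model against a less constrained one.

    Each argument is a dict with 'cfi' and 'rmsea'. Values are rounded to three
    decimals, as reported, to avoid floating-point noise at the cutoff.
    """
    cfi_drop = round(less_constrained["cfi"] - more_constrained["cfi"], 3)
    rmsea_rise = round(more_constrained["rmsea"] - less_constrained["rmsea"], 3)
    return cfi_drop <= tol and rmsea_rise <= tol

# Hypothetical fit statistics for a configural and a scalar model.
configural = {"cfi": 0.996, "rmsea": 0.028}
scalar = {"cfi": 0.996, "rmsea": 0.028}
print(invariance_supported(configural, scalar))  # True

# A hypothetical case in which constraints clearly worsen fit.
worse_scalar = {"cfi": 0.970, "rmsea": 0.045}
print(invariance_supported(configural, worse_scalar))  # False
```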
For items showing any noninvariance, we used maximum likelihood estimation to determine whether noninvariance arose primarily from loadings or thresholds. Where there was a lack of support for invariance through comparison of configural versus metric models, modification indices from WLSMV estimation were used to identify potential constraints to relax. For these studies, we then assessed partial measurement invariance by relaxing constraints on some items and studying the fit of the revised models, and estimated the severity of the impact of noninvariance on meaningful cross-arm comparisons by the expected parameter changes (standardized differences of latent IPV means) with the aforementioned equality restrictions lifted. For studies with suggested noninvariance in one of the two arms, we calculated the change in the difference-in-difference estimate as a proportion of the standard deviation.
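The last step can be illustrated with a short calculation. The Python sketch below is hypothetical throughout: the latent means are invented, the function names are ours, and the choice of which standard deviation to divide by is left as a parameter because the text does not specify it.

```python
def difference_in_difference(t_base, t_end, c_base, c_end):
    """Change over time in the treatment arm minus change over time in the control arm."""
    return (t_end - t_base) - (c_end - c_base)

def did_shift_in_sd_units(did_constrained, did_partial, sd):
    """Change in the estimate when noninvariant constraints are relaxed, as a proportion of one SD."""
    return (did_partial - did_constrained) / sd

# Hypothetical latent means under full constraints and under partial invariance.
did_full = difference_in_difference(0.00, -0.30, 0.00, -0.10)
did_partial = difference_in_difference(0.00, -0.32, 0.00, -0.10)
shift = did_shift_in_sd_units(did_full, did_partial, sd=1.0)
print(round(did_full, 2), round(did_partial, 2), round(shift, 2))
```

A shift close to zero, as in this invented example, would indicate that relaxing the constraints has little practical effect on the estimated intervention effect.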
Concordance With Published Trial Results
The four studies in the sample used regression-based methods to calculate the effect of their respective interventions on observed binary physical and sexual IPV outcomes. To assess concordance of our latent IPV findings with published trial results (Dunkle et al., 2020; Gibbs, Corboz, et al., 2020; Gibbs, Washington, et al., 2020; Naved et al., 2021), MGCFA was used to estimate the average treatment effect on the treated (ATT) for each of the four trials. Because all four trials controlled for baseline outcome measures and for theoretically relevant demographic covariates, we constructed propensity score (PS) weights using baseline physical and sexual IPV as latent inputs and age, household home ownership and land ownership (where available), food/financial insecurity, savings and earnings in the last month (where available), and education to account for baseline differences in treatment arms (Leite et al., 2019). We then compared standardized differences in physical and sexual IPV factor score means between control and intervention groups at end line in PS weighted versus unweighted models (Leite et al., 2019). Finally, we performed sensitivity analyses using quadratic and cubic PS weights (Leite et al., 2019).
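As a rough illustration of propensity score weighting for the ATT, the sketch below applies the conventional odds weights (1 for treated units, p/(1 − p) for control units) and compares a weighted with an unweighted mean difference. This is a simplified Python example under stated assumptions: the propensity scores and outcome values are invented, the function names are hypothetical, and the latent-variable procedure of Leite et al. (2019) used in the study is not reproduced here.

```python
def att_weights(treated, propensity):
    """ATT weights: 1 for treated units, p / (1 - p) for control units."""
    weights = []
    for t, p in zip(treated, propensity):
        if not 0.0 < p < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 if t == 1 else p / (1.0 - p))
    return weights

def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def mean_difference(outcome, treated, weights):
    """Treated mean minus control mean, both computed with the supplied weights."""
    t = [(y, w) for y, g, w in zip(outcome, treated, weights) if g == 1]
    c = [(y, w) for y, g, w in zip(outcome, treated, weights) if g == 0]
    return weighted_mean(*zip(*t)) - weighted_mean(*zip(*c))

# Hypothetical data: treatment indicator, estimated propensity score, end line factor score.
treated = [1, 1, 0, 0, 0]
propensity = [0.6, 0.5, 0.5, 0.2, 0.75]
outcome = [0.1, -0.2, 0.3, 0.0, 0.4]

w = att_weights(treated, propensity)
unweighted = mean_difference(outcome, treated, [1.0] * len(treated))
weighted = mean_difference(outcome, treated, w)
print(w, round(unweighted, 3), round(weighted, 3))
```

Control units that resemble treated units (higher propensity scores) receive more weight, so the weighted comparison approximates the effect among those who received the intervention.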
Role of the Funding Source
The funder had no role in study design; data collection, analysis, or interpretation; report writing; or decision to submit the article for publication.
Results
Descriptive Statistics
Table 3 provides univariate data on IPV across arms and time points. The overall prevalence of physical IPV at baseline ranged from 24% to 38%, with sexual IPV ranging from 31% to 42%. Individual interitem correlations ranged from .25 to .60 (across domains) and from .54 to .98 (within domains). In all studies measuring sexual IPV, correlations between any physical and any sexual IPV item were between .51 and .61.
Table 3.
Prevalence of IPV Act and Domain, by Included Study Trial.
Indashyikirwa | Treatment | n = 815 | Control | n = 801 | Overall | n = 1,616 |
---|---|---|---|---|---|---|
Physical IPV | Baseline % | End line % | Baseline % | End line % | Baseline % | End line % |
Slapped or thrown something | 31.90 | 15.71 | 21.97 | 21.20 | 26.98 | 18.38 |
Pushed or shoved | 30.92 | 19.63 | 23.35 | 23.60 | 27.17 | 21.60 |
Hit with fist or other object | 22.21 | 11.66 | 16.48 | 16.85 | 19.37 | 14.23 |
Kicked/dragged/beaten/choked/burnt | 13.01 | 5.64 | 8.74 | 6.99 | 10.89 | 6.31 |
Attacked/threatened with weapon | 5.28 | 3.19 | 3.12 | 3.75 | 4.21 | 3.47 |
Any physical | 42.45 | 25.28 | 33.08 | 32.08 | 37.81 | 28.65 |
Sexual IPV | Baseline % | End line % | Baseline % | End line % | Baseline % | End line % |
Physically forced to have sex | 33.25 | 21.35 | 27.34 | 29.09 | 30.32 | 25.19 |
Threatened/intimidated to have sex a | 37.30 | 26.75 | 31.21 | 32.71 | 34.28 | 29.70 |
Forced other sex act | 5.77 | 3.19 | 3.50 | 2.87 | 4.64 | 3.03 |
Any sexual | 44.91 | 32.02 | 37.83 | 37.83 | 41.40 | 34.90 |
Stepping Stones | Treatment | n = 260 | Control | n = 285 | Overall | n = 545 |
Physical IPV | Baseline % | End line % | Baseline % | End line % | Baseline % | End line % |
Slapped or had object thrown at you | 42.31 | 39.62 | 47.02 | 39.65 | 44.77 | 39.63 |
Pushed or shoved | 36.15 | 31.92 | 43.16 | 42.11 | 39.82 | 37.25 |
Hit with fist or other object | 30.77 | 32.31 | 35.09 | 32.98 | 33.03 | 32.66 |
Kicked/dragged/beaten/choked/burnt | 23.85 | 23.85 | 27.37 | 25.61 | 25.69 | 24.77 |
Attacked/threatened with weapon | 15.38 | 19.62 | 17.89 | 22.46 | 16.70 | 21.10 |
Any physical | 32.15 | 51.54 | 32.84 | 54.39 | 32.50 | 53.03 |
Sexual IPV | Baseline % | End line % | Baseline % | End line % | Baseline % | End line % |
Physically forced to have sex | 20.77 | 20.38 | 21.05 | 27.72 | 20.92 | 24.22 |
Threatened/intimidated to have sex a | 18.85 | 25.77 | 18.95 | 23.16 | 18.90 | 24.40 |
Forced other sex act | 17.69 | 20.38 | 19.65 | 21.75 | 18.72 | 21.10 |
Forced to watch pornography | 11.15 | 13.85 | 12.63 | 14.74 | 11.93 | 14.31 |
Any sexual | 30.38 | 35.77 | 31.23 | 37.54 | 30.83 | 36.70 |
HERrespect | Treatment | n = 303 | Control | n = 304 | Overall | n = 607 |
Physical IPV | Baseline % | End line % | Baseline % | End line % | Baseline % | End line % |
Slapped or had object thrown at you | 24.09 | 22.11 | 34.54 | 22.70 | 29.32 | 22.41 |
Pushed or shoved | 12.54 | 9.24 | 17.76 | 10.53 | 15.16 | 9.88 |
Hit with fist or other object | 8.25 | 5.94 | 10.20 | 10.20 | 9.23 | 8.07 |
Kicked/dragged/beaten/choked/burnt | 5.61 | 3.96 | 9.21 | 7.24 | 7.41 | 5.60 |
Attacked/threatened with weapon | 0.66 | 0.33 | 3.29 | 2.30 | 1.98 | 1.32 |
Any physical | 27.72 | 24.09 | 38.82 | 25.99 | 33.28 | 25.04 |
Sexual IPV | Baseline % | End line % | Baseline % | End line % | Baseline % | End line % |
Physically forced to have sex | 22.44 | 15.18 | 38.82 | 20.07 | 30.64 | 17.63 |
Threatened/intimidated to have sex | 8.91 | 5.61 | 21.05 | 8.88 | 14.99 | 7.25 |
Had sex out of fear | 20.46 | 13.20 | 42.43 | 14.47 | 31.47 | 13.84 |
Forced other sex act a | 5.28 | 5.94 | 8.22 | 6.25 | 6.75 | 6.10 |
Forced to watch pornography a | 1.65 | 3.30 | 5.26 | 3.62 | 3.46 | 3.46 |
Any sexual | 30.69 | 22.11 | 52.63 | 26.97 | 41.68 | 24.55 |
Women for Women | Intervention | n = 434 | Control | n = 343 | Overall | n = 777 |
Physical IPV | Baseline % | End line % | Baseline % | End line % | Baseline % | End line % |
Slapped or had object thrown at you | 19.08 | 17.24 | 21.51 | 20.93 | 20.15 | 18.87 |
Pushed or shoved | 18.62 | 18.62 | 22.38 | 20.64 | 20.28 | 19.51 |
Hit with fist or other object | 14.94 | 18.39 | 19.19 | 20.64 | 16.82 | 19.38 |
Kicked/dragged/beaten/choked/burnt | 11.03 | 8.05 | 12.79 | 9.01 | 11.81 | 8.47 |
Attacked/threatened with weapon | 5.75 | 3.68 | 5.81 | 5.23 | 5.78 | 4.36 |
Any physical | 22.07 | 23.22 | 26.16 | 25.58 | 23.88 | 24.26 |
Note. IPV = intimate partner violence.
a Item not used in measurement invariance testing.
Factor Structure
For the three studies measuring physical and sexual IPV, EFA/CFA of the full item set revealed a two-factor structure: physical IPV items loaded on Factor 1, and sexual IPV items loaded on Factor 2 (fit statistics and item loading ranges for all models are summarized in Supplemental Table B). In HERrespect, two sexual IPV items, “forced you to watch pornography” and “forced you to perform other sexual acts,” loaded onto the physical IPV factor and were subsequently dropped. Due to high interitem correlations (.89–.92) interfering with model convergence, the item referencing sex through threats or intimidation was dropped from the SSCF item set. In the Women for Women study, which measured only physical IPV, the EFA/CFA supported a single-factor model encompassing the five physical IPV items. Difference testing confirmed a better fit for the two-factor over the one-factor models for studies measuring physical and sexual IPV. Overall, EFA/CFA supported modeling of IPV according to the theorized domains of physical and sexual IPV, and all models showed good fit.
Within-Study Cross-Arm and Cross-Time Results
Tables 4 to 7 summarize results of invariance testing. For all studies, cross-arm tests for baseline and end line data showed equivalence based on change in fit statistics and on chi-square difference tests comparing configural and scalar models. Configural and scalar models showed good fit to the data, and ΔCFI and ΔRMSEA were within acceptable limits in all studies.
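When reading Tables 4 to 7, note that the Δχ2 column appears to be the simple difference between the scalar and configural χ2 values, whereas the χ2 shown in each “Scalar vs. configural” row differs from it. A plausible reading, which is our assumption rather than something stated in the text, is that the latter comes from a scaled difference test, because under WLSMV estimation a χ2 difference test cannot be obtained by subtraction. The simple difference can be reproduced as follows (the function name is hypothetical).

```python
def naive_delta_chi2(chi2_configural, chi2_scalar):
    """Simple difference between scalar and configural chi-square values, to three decimals."""
    return round(chi2_scalar - chi2_configural, 3)

# Indashyikirwa, cross-arm baseline models (values from Table 4).
delta = naive_delta_chi2(62.535, 68.067)
print(delta)  # 5.532
```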
Table 4.
Comparison of Models in Indashyikirwa.
Model | χ2 | df | p | RMSEA | 90% CI LL | 90% CI UL | CFI | TLI | Δχ2 | ΔCFI | ΔRMSEA |
---|---|---|---|---|---|---|---|---|---|---|---|
Cross-arm baseline (n = 1,616) | |||||||||||
Configural | 62.535 | 38 | .0073 | 0.028 | 0.015 | 0.040 | 0.996 | 0.995 | |||
Scalar | 68.067 | 42 | .0067 | 0.028 | 0.015 | 0.039 | 0.996 | 0.995 | |||
Scalar vs. configural | 7.220 | 4 | .1247 | | | | | | 5.532 | 0.000 | 0.000 |
Cross-arm end line (n = 1,616) | |||||||||||
Configural | 71.207 | 38 | .0009 | 0.033 | 0.021 | 0.045 | 0.996 | 0.995 | |||
Scalar | 74.863 | 42 | .0014 | 0.031 | 0.019 | 0.043 | 0.996 | 0.995 | |||
Scalar vs. configural | 4.111 | 4 | .3912 | | | | | | 3.656 | 0.000 | 0.002 |
Cross-time (n = 1,616) | |||||||||||
Configural | 141.735 | 98 | .0026 | 0.017 | 0.010 | 0.022 | 0.996 | 0.995 | |||
Scalar | 146.478 | 102 | .0026 | 0.016 | 0.010 | 0.022 | 0.996 | 0.995 | |||
Scalar vs. configural | 6.455 | 4 | .1676 | | | | | | 4.743 | 0.000 | 0.001 |
Treatment cross-time (n = 815) | |||||||||||
Configural | 118.339 | 98 | .0793 | 0.016 | 0.000 | 0.025 | 0.996 | 0.995 | |||
Scalar | 123.630 | 102 | .0715 | 0.016 | 0.000 | 0.025 | 0.995 | 0.995 | |||
Scalar vs. configural | 9.924 | 4 | .0417 | | | | | | 5.291 | 0.001 | 0.000 |
Control cross-time (n = 801) | |||||||||||
Configural | 123.040 | 98 | .0444 | 0.018 | 0.003 | 0.027 | 0.995 | 0.994 | |||
Scalar | 127.835 | 102 | .0426 | 0.018 | 0.004 | 0.027 | 0.995 | 0.994 | |||
Scalar vs. configural | 6.705 | 4 | .1523 | | | | | | 4.795 | 0.000 | 0.000 |
Note. RMSEA = root mean square error of approximation; CI = confidence interval; LL = lower limit; UL = upper limit; CFI = comparative fit index; TLI = Tucker–Lewis index.
Table 7.
Comparison of Models in Women for Women.
Model | χ2 | df | p | RMSEA | 90% CI LL | 90% CI UL | CFI | TLI | Δχ2 | ΔCFI | ΔRMSEA |
---|---|---|---|---|---|---|---|---|---|---|---|
Cross-arm baseline (n = 777) | |||||||||||
Configural | 12.444 | 10 | .2564 | 0.025 | 0.000 | 0.063 | 1.000 | 1.000 | |||
Scalar | 17.868 | 13 | .1626 | 0.031 | 0.000 | 0.063 | 1.000 | 1.000 | – | – | – |
Scalar vs. configural | 7.329 | 3 | .0621 | | | | | | 5.424 | 0.000 | 0.006 |
Cross-arm end line (n = 755) | |||||||||||
Configural | 23.246 | 10 | .0099 | 0.059 | 0.028 | 0.091 | 0.999 | 0.999 | |||
Scalar | 24.070 | 13 | .0305 | 0.047 | 0.014 | 0.077 | 0.999 | 0.999 | |||
Scalar vs. configural | 1.414 | 3 | .7021 | | | | | | 0.824 | 0.000 | 0.012 |
Cross-time (n = 751) a | |||||||||||
Configural | 60.254 | 34 | .0036 | 0.032 | 0.018 | 0.045 | 0.999 | 0.999 | |||
Scalar | 67.127 | 37 | .0018 | 0.033 | 0.020 | 0.045 | 0.999 | 0.999 | |||
Scalar vs. configural | 9.135 | 3 | .0275 | | | | | | 6.873 | 0.000 | 0.001 |
Treatment cross-time (n = 421) a | |||||||||||
Configural | 49.889 | 34 | .0386 | 0.033 | 0.008 | 0.052 | 0.999 | 0.999 | |||
Scalar | 55.627 | 37 | .0252 | 0.035 | 0.013 | 0.052 | 0.999 | 0.999 | |||
Scalar vs. configural | 7.700 | 3 | .0526 | | | | | | 5.738 | 0.000 | 0.002 |
Control cross-time (n = 330) a | |||||||||||
Configural | 43.602 | 34 | .1253 | 0.029 | 0.000 | 0.052 | 0.999 | 0.999 | |||
Scalar | 46.554 | 37 | .1349 | 0.028 | 0.019 | 0.051 | 0.999 | 0.999 | |||
Scalar vs. configural | 2.358 | 3 | .5016 | | | | | | 2.952 | 0.000 | 0.001 |
Note. RMSEA = root mean square error of approximation; CI = confidence interval; LL = lower limit; UL = upper limit; CFI = comparative fit index; TLI = Tucker–Lewis index.
a Twenty-four participants were missing outcome data at end line; two additional records were missing due to ID mismatch.
Table 5.
Comparison of Models in Stepping Stones.
Model | χ2 | df | p | RMSEA | 90% CI LL | 90% CI UL | CFI | TLI | Δχ2 | ΔCFI | ΔRMSEA |
---|---|---|---|---|---|---|---|---|---|---|---|
Cross-arm baseline (n = 545) | | | | | | | | | | | |
Configural | 84.168 | 38 | <.0001 | 0.067 | 0.048 | 0.086 | 0.985 | 0.977 | | | |
Scalar | 84.565 | 42 | <.0001 | 0.061 | 0.042 | 0.080 | 0.986 | 0.981 | | | |
Scalar vs. configural | 0.914 | 4 | .9225 | | | | | | 0.396 | 0.001 | 0.006 |
Cross-arm end line (n = 545) | | | | | | | | | | | |
Configural | 49.216 | 38 | .1051 | 0.033 | 0.000 | 0.057 | 0.997 | 0.996 | | | |
Scalar | 54.936 | 42 | .0870 | 0.034 | 0.000 | 0.056 | 0.997 | 0.996 | | | |
Scalar vs. configural | | 4 | .1252 | | | | | | 5.720 | 0.000 | 0.001 |
Cross-time (n = 545) | | | | | | | | | | | |
Configural | 132.721 | 98 | .0112 | 0.025 | 0.013 | 0.036 | 0.993 | 0.992 | | | |
Scalar | 138.311 | 102 | .0097 | 0.026 | 0.013 | 0.036 | 0.993 | 0.992 | | | |
Scalar vs. configural | 8.970 | 4 | .0618 | | | | | | 5.590 | 0.000 | 0.001 |
Treatment cross-time (n = 260) | | | | | | | | | | | |
Configural | 115.607 | 98 | .1082 | 0.026 | 0.000 | 0.044 | 0.992 | 0.991 | | | |
Scalar | 119.589 | 102 | .1125 | 0.026 | 0.000 | 0.043 | 0.992 | 0.991 | | | |
Scalar vs. configural | 4.440 | 4 | .3497 | | | | | | 3.982 | 0.000 | 0.000 |
Control cross-time (n = 285) | | | | | | | | | | | |
Configural | 111.889 | 98 | .1597 | 0.022 | 0.000 | 0.040 | 0.994 | 0.993 | | | |
Scalar | 117.242 | 102 | .1436 | 0.023 | 0.000 | 0.040 | 0.993 | 0.992 | | | |
Scalar vs. configural | 8.829 | 4 | .0655 | | | | | | 5.353 | 0.001 | 0.001 |
Note. RMSEA = root mean square error of approximation; CI = confidence interval; LL = lower limit; UL = upper limit; CFI = comparative fit index; TLI = Tucker–Lewis index.
Table 6.
Comparison of Models in HERrespect.
Model | χ2 | df | p | RMSEA | 90% CI LL | 90% CI UL | CFI | TLI | Δχ2 | ΔCFI | ΔRMSEA |
---|---|---|---|---|---|---|---|---|---|---|---|
Cross-arm baseline (n = 607) | | | | | | | | | | | |
Configural | 37.908 | 38 | .4737 | 0.000 | 0.000 | 0.040 | 1.000 | 1.000 | | | |
Scalar | 45.459 | 42 | .3300 | 0.016 | 0.000 | 0.043 | 0.999 | 0.999 | | | |
Scalar vs. configural | 8.071 | 4 | .0890 | | | | | | 7.998 | 0.001 | 0.016 |
Cross-arm end line (n = 607) | | | | | | | | | | | |
Configural | 36.980 | 38 | .5165 | 0.000 | 0.000 | 0.039 | 1.000 | 1.000 | | | |
Scalar | 44.055 | 42 | .3847 | 0.013 | 0.000 | 0.042 | 1.000 | 0.999 | | | |
Scalar vs. configural | 8.776 | 4 | .0670 | | | | | | 7.075 | 0.000 | 0.013 |
Cross-time (n = 607) | | | | | | | | | | | |
Configural | 93.001 | 98 | .6238 | 0.000 | 0.000 | 0.019 | 1.000 | 1.002 | | | |
Scalar | 97.840 | 102 | .5981 | 0.000 | 0.000 | 0.019 | 1.000 | 1.001 | | | |
Scalar vs. configural | 8.657 | 4 | .0703 | | | | | | 4.839 | 0.000 | 0.000 |
Treatment cross-time (n = 303) | | | | | | | | | | | |
Configural | 160.404 | 98 | .0001 | 0.046 | 0.033 | 0.058 | 0.983 | 0.980 | | | |
Scalar | 163.386 | 102 | .0001 | 0.045 | 0.031 | 0.057 | 0.984 | 0.981 | | | |
Scalar vs. configural | 2.282 | 4 | .6840 | | | | | | 2.982 | 0.001 | 0.001 |
Control cross-time (n = 304) | | | | | | | | | | | |
Configural | 109.368 | 98 | .2033 | 0.020 | 0.000 | 0.037 | 0.994 | 0.993 | | | |
Scalar | 114.575 | 102 | .1860 | 0.020 | 0.000 | 0.037 | 0.994 | 0.993 | | | |
Scalar vs. configural | 9.956 | 4 | .0412 | | | | | | 5.207 | 0.000 | 0.000 |
Note. RMSEA = root mean square error of approximation; CI = confidence interval; LL = lower limit; UL = upper limit; CFI = comparative fit index; TLI = Tucker–Lewis index.
Based on small changes in fit statistics, the items administered in all studies showed evidence of invariance over time; however, only SSCF showed unambiguous evidence based on the chi-square difference test in the full sample and in arm-stratified subsamples. Although the cross-time chi-square difference test for the full Women for Women sample was significant (suggesting potential lack of invariance), no differences emerged when the analysis was stratified by arm. Moreover, the change in fit statistics between configural and scalar cross-time models for this study remained negligible, at 0.001 for RMSEA and <0.001 for CFI. Removal of loading and factor constraints from the item “hit” in the scalar model resulted in a nonsignificant change in the chi-square (Table 8). While full-sample cross-time chi-square tests suggested equivalence, cross-time chi-square tests for the treatment arm in Indashyikirwa and the control arm in HERrespect were significant. Additional testing using maximum likelihood estimation confirmed that cross-time noninvariance in these arms arose primarily from differences in thresholds, or likelihood of endorsing the item, based on nonsignificant chi-square difference tests between configural and metric models and significant chi-square difference tests between metric and scalar models (Supplemental Table C). Freeing the threshold for “slap” in Indashyikirwa and “hit” in HERrespect resulted in nonsignificant chi-square difference tests between configural and scalar models. Freeing these constraints resulted in small changes in difference-in-difference estimates (Indashyikirwa: 0.078; HERrespect: 0.055), which, judged against Cohen’s d (Cohen, 1992), are well below the threshold of a small effect (0.20). When considered in relation to a standard deviation of 1.00, the size of the difference equates to 0.128 standard deviations (Indashyikirwa) and 0.182 standard deviations (HERrespect), confirming the small impact of freeing the constraints.
The lack of full invariance did not affect the direction or significance of mean differences in IPV over time (Table 9).
Table 8.
Partial Invariance Testing for Trials Showing Scalar Noninvariance Over Time.
Trial/arm | Parameters freed | Model | χ2 | df | p | RMSEA | 90% CI LL | 90% CI UL | CFI | TLI | Δχ2 | ΔCFI | ΔRMSEA |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Women for Women full sample | “Hit” loading | Configural | 60.254 | 34 | .0036 | 0.032 | 0.018 | 0.045 | 0.999 | 0.999 | | | |
Women for Women full sample | “Hit” loading | Scalar | 61.715 | 36 | .0048 | 0.031 | 0.017 | 0.044 | 0.999 | 0.999 | | | |
Women for Women full sample | “Hit” loading | Scalar vs. configural | 0.477 | 2 | .7876 | | | | | | 1.461 | 0.000 | 0.001 |
Indashyikirwa treatment | “Slap” threshold | Configural | 118.339 | 98 | .0793 | 0.016 | 0.000 | 0.025 | 0.996 | 0.995 | | | |
Indashyikirwa treatment | “Slap” threshold | Scalar | 121.607 | 101 | .0796 | 0.016 | 0.000 | 0.025 | 0.996 | 0.995 | | | |
Indashyikirwa treatment | “Slap” threshold | Scalar vs. configural | 4.316 | 3 | .2293 | | | | | | 3.268 | 0.000 | 0.000 |
HERrespect control | “Hit” threshold | Configural | 109.368 | 98 | .2033 | 0.020 | 0.000 | 0.037 | 0.994 | 0.993 | | | |
HERrespect control | “Hit” threshold | Scalar | 112.786 | 101 | .1989 | 0.020 | 0.000 | 0.037 | 0.994 | 0.993 | | | |
HERrespect control | “Hit” threshold | Scalar vs. configural | 5.986 | 3 | .1123 | | | | | | 3.418 | 0.000 | 0.000 |
Note. RMSEA = root mean square error of approximation; CI = confidence interval; LL = lower limit; UL = upper limit; CFI = comparative fit index; TLI = Tucker–Lewis index.
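The p values in the “Scalar vs. configural” rows can be checked by hand. The following is a minimal sketch in Python, assuming that each reported difference-test statistic is referred to a chi-square distribution with the listed degrees of freedom. The function name `chi2_sf` is introduced here for illustration only, and the closed-form series is used simply to avoid third-party dependencies. The sketch takes the reported test statistics as given and does not reproduce how they were estimated.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x < 0 or df < 1:
        raise ValueError("x must be non-negative and df a positive integer")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df = 2m + 1:
    # erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{k=0}^{m-1} x^k / (1*3*...*(2k+1))
    m = (df - 1) // 2
    tail = math.erfc(math.sqrt(half))
    if m == 0:
        return tail
    term, total = 1.0, 1.0
    for k in range(1, m):
        term *= x / (2 * k + 1)
        total += term
    return tail + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total

# (label, test statistic, df, reported p) copied from Tables 6 and 8.
reported = [
    ("HERrespect control, all constrained", 9.956, 4, 0.0412),
    ("HERrespect control, 'hit' threshold free", 5.986, 3, 0.1123),
    ("Indashyikirwa treatment, 'slap' threshold free", 4.316, 3, 0.2293),
]

for label, stat, df, p_reported in reported:
    print(f"{label}: computed p = {chi2_sf(stat, df):.4f}, reported p = {p_reported:.4f}")
```

Under this assumption, the computed values can be compared directly with the reported p column; small discrepancies in the last digit would be expected from rounding of the test statistics.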
Table 9.
Change in Latent Mean Differences for Partial Invariance Over Time.
Trial/arm | Model | Standardized physical mean difference | SD | Difference-in-difference | Difference-in-difference change (%) | Standardized sexual mean difference |
---|---|---|---|---|---|---|
Women for Women full sample | All parameters constrained | 0.670* | 0.806 | NA | NA | NA |
Women for Women full sample | “Hit” loading free | 0.509* | 0.821 | NA | NA | NA |
Indashyikirwa treatment | All parameters constrained | −0.594* | 1.019 | −0.385 | 0.078 (20.26%) | −0.077 |
Indashyikirwa treatment | “Slap” threshold free | −0.516* | 0.977 | −0.307 | | −0.077 |
Indashyikirwa control | All parameters constrained | −0.209 | 0.998 | | | |
HERrespect control | All parameters constrained | −0.545* | 1.107 | 0.445 | 0.055 (12.36%) | −0.919* |
HERrespect control | “Hit” threshold free | −0.490* | 0.962 | 0.390 | | −0.922* |
HERrespect treatment | All parameters constrained | 0.100 | 0.817 | | | |
*Significant at p < .05.
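The “Difference-in-difference change (%)” column in Table 9 can be reproduced from the difference-in-difference estimates in the same table. The short Python sketch below does this arithmetic; the helper name `did_shift` and the constant `COHEN_SMALL` are illustrative names rather than anything used in the analysis, and the 0.20 benchmark is the small-effect threshold cited in the text (Cohen, 1992).

```python
COHEN_SMALL = 0.20  # small-effect benchmark cited in the text (Cohen, 1992)

def did_shift(did_constrained, did_freed):
    """Return (absolute change, percent change) in a difference-in-difference estimate."""
    change = abs(did_freed - did_constrained)
    percent = 100.0 * change / abs(did_constrained)
    return round(change, 3), round(percent, 2)

# Difference-in-difference estimates as reported in Table 9:
# (all parameters constrained, one threshold freed).
table9 = {
    "Indashyikirwa": (-0.385, -0.307),
    "HERrespect": (0.445, 0.390),
}

for trial, (constrained, freed) in table9.items():
    change, percent = did_shift(constrained, freed)
    below = change < COHEN_SMALL
    print(f"{trial}: change = {change:.3f} ({percent:.2f}%), below small-effect benchmark: {below}")
```

Both changes (0.078 and 0.055) fall well under 0.20, which matches the interpretation given in the text.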
Impact of Covariate Imbalance on Treatment Effect
The greatest source of imbalance across trials was in baseline outcome measures, although age and education also differed significantly across (nonrandomized) study arms in HERrespect. In Indashyikirwa, the unweighted physical IPV mean difference from baseline to end line was −0.146 and was nonsignificant (Table 10). With PS weights incorporated into the model, the mean difference increased to −0.253 and became significant. The mean difference for sexual IPV increased from −0.170 to −0.310 with PS weights, remaining significant. In Stepping Stones, physical (−0.155) and sexual IPV (−0.385) showed a nonsignificant change that was unaffected by PS weights (to −0.148 and −0.370, respectively). In HERrespect, the physical IPV mean difference changed slightly from 0.598 to 0.589, remaining nonsignificant. The sexual IPV mean difference increased from −0.044 to 0.673, remaining nonsignificant. In WW, nonsignificant decreases in IPV diminished in magnitude with PS weighting (−0.177 to −0.056) and remained nonsignificant. In all models, sensitivity analyses using squared and cubic PSs produced similar results.
Table 10.
Average Treatment Effect Among Treated in Indashyikirwa and HERrespect With and Without Propensity Score Weights.
Trial | End line physical IPV: unweighted | End line physical IPV: PS weighted | End line physical IPV: squared PS | End line physical IPV: cubic PS | End line sexual IPV: unweighted | End line sexual IPV: PS weighted | End line sexual IPV: squared PS | End line sexual IPV: cubic PS |
---|---|---|---|---|---|---|---|---|
Indashyikirwa | −0.146 | −0.253* | −0.291* | −0.336* | −0.170* | −0.310* | −0.357* | −0.411* |
Stepping Stones | −0.155 | −0.148 | −0.146 | −0.139 | −0.385 | −0.370 | −0.321 | −0.245 |
HERrespect | 0.598 | 0.589 | 0.516 | 0.456 | −0.440 | 0.673 | 0.807 | 0.818 |
Women for Women | −0.177 | −0.056 | 0.041 | 0.26 | NA | NA | NA | NA |
Note. IPV = intimate partner violence; PS = propensity score.
*Significant at p < .05.
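To illustrate how propensity score weighting can shift an estimated treatment effect among the treated, the Python sketch below applies a common weighting scheme to a toy data set. Several things here are assumptions rather than details taken from this section: the weighting rule (treated units weighted 1, control units weighted by the odds PS / (1 − PS)), the function names, and all data values, which are hypothetical. The sketch also works on an observed score, whereas the analyses reported in Table 10 concern latent means.

```python
def att_weights(treated, ps):
    """Weights targeting the treated population: 1 for treated, ps / (1 - ps) for controls."""
    weights = []
    for t, p in zip(treated, ps):
        if not 0.0 < p < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 if t == 1 else p / (1.0 - p))
    return weights

def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def att_estimate(outcome, treated, ps):
    """Weighted treated-minus-control difference in mean outcome."""
    w = att_weights(treated, ps)
    t_idx = [i for i, t in enumerate(treated) if t == 1]
    c_idx = [i for i, t in enumerate(treated) if t == 0]
    mean_t = weighted_mean([outcome[i] for i in t_idx], [w[i] for i in t_idx])
    mean_c = weighted_mean([outcome[i] for i in c_idx], [w[i] for i in c_idx])
    return mean_t - mean_c

# Hypothetical toy data (not from any of the trials).
outcome = [1.0, 3.0, 2.0, 6.0]   # end line score
treated = [1, 1, 0, 0]           # 1 = intervention arm, 0 = control arm
ps = [0.5, 0.5, 0.5, 0.2]        # estimated probability of being in the intervention arm

unweighted = (1.0 + 3.0) / 2 - (2.0 + 6.0) / 2
print(f"unweighted difference: {unweighted:.3f}")
print(f"PS-weighted difference: {att_estimate(outcome, treated, ps):.3f}")
```

In this toy example the control unit least similar to the treated group (PS = 0.2) is down-weighted, so the weighted contrast differs from the unweighted one. This mirrors, in simplified form, how adjustment for baseline imbalance can change the size of an estimated effect.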
Pooled Analyses
Analysis of the baseline pooled sample using the common physical IPV items showed good fit for the configural model but suggested scalar nonequivalence across the four studies (Table 11), based on the chi-square test for the scalar versus configural model and a large change in RMSEA (0.033). Freeing the threshold on “hit” led to a nonsignificant p value and an RMSEA change of less than 0.01. In cross-time model comparisons, the chi-square difference test was significant, but an improvement in RMSEA and TLI from the configural to the scalar model suggested equivalence.
Table 11.
Comparison of Models in Pooled Sample Across Trials and Partial Invariance Testing.
Model | χ2 | df | p | RMSEA | 90% CI LL | 90% CI UL | CFI | TLI | Δχ2 | ΔCFI | ΔRMSEA |
---|---|---|---|---|---|---|---|---|---|---|---|
Cross-trial baseline | | | | | | | | | | | |
Configural | 8.220 | 8 | .4123 | 0.006 | 0.000 | 0.040 | 1.000 | 1.000 | | | |
Scalar | 33.316 | 14 | .0026 | 0.039 | 0.022 | 0.057 | 0.998 | 0.996 | | | |
Scalar vs. configural | 35.032 | 6 | <.0001 | | | | | | 22.096 | 0.002 | 0.033 |
Scalar with “hit” threshold free | 10.526 | 11 | .4838 | 0.000 | 0.000 | 0.034 | 1.000 | 1.000 | | | |
Configural vs. scalar with “hit” threshold free | 2.117 | 3 | .5484 | | | | | | 2.306 | 0.000 | 0.006 |
Cross-time pooled | | | | | | | | | | | |
Configural | 70.090 | 19 | <.0001 | 0.028 | 0.021 | 0.035 | 0.995 | 0.992 | | | |
Scalar | 74.580 | 21 | <.0001 | 0.027 | 0.020 | 0.034 | 0.995 | 0.993 | | | |
Scalar vs. configural | 6.271 | 2 | .0435 | | | | | | 4.490 | 0.000 | 0.001 |
Note. RMSEA = root mean square error of approximation; CI = confidence interval; LL = lower limit; UL = upper limit; CFI = comparative fit index; TLI = Tucker–Lewis index.
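The ΔCFI and ΔRMSEA columns in Table 11 are simple differences between the fit indices of the constrained and configural models. As a hedged illustration, the Python sketch below recomputes them for the cross-trial baseline models. The function name `fit_change` is illustrative, and the default cutoff of 0.01 for the RMSEA change is an assumption taken from the comparison value mentioned in the text, not a definitive decision rule.

```python
def fit_change(configural, constrained, rmsea_cutoff=0.01):
    """Absolute change in RMSEA and CFI between a constrained and a configural model."""
    d_rmsea = round(abs(constrained["rmsea"] - configural["rmsea"]), 3)
    d_cfi = round(abs(constrained["cfi"] - configural["cfi"]), 3)
    # Flag the comparison when the RMSEA change reaches the (assumed) cutoff.
    return {"d_rmsea": d_rmsea, "d_cfi": d_cfi, "rmsea_flag": d_rmsea >= rmsea_cutoff}

# Fit indices for the cross-trial baseline models, copied from Table 11.
configural = {"rmsea": 0.006, "cfi": 1.000}
scalar = {"rmsea": 0.039, "cfi": 0.998}
partial = {"rmsea": 0.000, "cfi": 1.000}  # scalar with the "hit" threshold free

print("scalar vs. configural:", fit_change(configural, scalar))
print("partial scalar vs. configural:", fit_change(configural, partial))
```

With these inputs, the full scalar model shows an RMSEA change of 0.033, while the model with the “hit” threshold freed shows a change of 0.006, in line with the values reported in the table.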
Discussion
This study is the first to test measurement equivalence of commonly used items to assess IPV in prevention studies in LMICs. The study leverages our experience assessing cross-group, cross-context, and cross-time equivalence of measures for IPV (Miedema et al., n.d.; Yount et al., 2021) and empowerment (Cheong et al., 2017; Miedema et al., 2018; Yount et al., 2020) in observational and intervention studies. Findings suggest that, despite the lack of prior formal measurement testing, the items that were not dropped in the EFA/CFA performed well within the studies and the presence of some noninvariance in a single item did not impact study inferences. The results are important, as they emanate from the largest flagship project to identify what works to prevent violence across geographic and cultural settings and a range of interventions (“What Works to Prevent Violence Global Program,” 2015) and bode well for the prior and subsequent use of the scale. The consistency of the findings suggests that the core items are robust to these differences, although not without challenges that should be rectified in future research.
We hypothesized that a subset of studies would demonstrate measurement equivalence. Instead, we found that all studies exhibited equivalence, and in the few instances of partial equivalence, its presence was inconsequential to study inference. When pooled, the four tested physical IPV items demonstrated partial invariance across studies at baseline and strict invariance over time. Also contrary to our hypothesis, the sexual IPV items demonstrated measurement equivalence while “hit” demonstrated potential nonequivalence over time in two of the four studies and in the cross-study analysis. While freeing this item from the constraints imposed by measurement equivalence testing did not impact study inferences, it is the most commonly reported severe item in the physical IPV construct, reported by 9% to 33% of women in the studies, and by large percentages of women in multicountry studies (Garcia-Moreno et al., 2005). Finally, dropping some suboptimally functioning items during EFA/CFA and modeling the remaining items as a latent IPV construct did not substantively impact trial inference, and concordance with published, regression-based trial results was high.
Limitations and Strengths
The findings must be considered in light of study limitations. Midline data issues with two of the studies, including loss to follow-up and uncertainty regarding participant IDs, led us to exclude midline data from this analysis. Published reports of HERrespect and WW highlighted other data quality issues including civil unrest interfering with participant follow-up (Gibbs, Corboz, et al., 2020) and interference from factory ownership (Naved et al., 2021). HERrespect was quasi-experimental and had a 12-month follow-up. Study findings cannot speak to equivalence beyond 24 months. Although we selected items for similar wording in English, we cannot assess whether the quality of the translation was comparable across countries. Finally, a small number of studies met our eligibility criteria and used similar item sets, limiting the size and generalizability of our findings and our ability to assess measurement equivalence in pooled samples.
Despite these limitations, the four studies were diverse in terms of geography and intervention programming and used items that are among the most commonly used in LMICs for surveillance and Sustainable Development Goal (United Nations, 2015) reporting. Furthermore, the study was completed independently and, in two cases, before the release of study results. Our findings regarding the interventions’ impact on women’s experience of IPV are concordant except for one study, despite different outcome modeling and analysis strategies.
Implications for Research and IPV Prevention Practice
Need for a Common Set of Valid IPV Items to Assess Effectiveness of Prevention Trials
Our study highlights the need for a valid set of identically worded (Yount et al., 2011) “core” items to measure IPV as a primary outcome across prevention trials. The WW study excluded questions about sexual IPV due to their sensitivity (Gibbs et al., 2018). Indashyikirwa included only three sexual IPV items, of which one (“forced you to perform other sexual acts”) had a low loading on the sexual IPV factor. In HERrespect, two sexual IPV items, “forced you to watch pornography” and “forced you to perform sexual acts you found degrading or humiliating,” had to be excluded because they did not load on the sexual IPV factor. While these items functioned well in the SSCF trial, it is likely that they were either too sensitive or poorly understood in more sexually conservative contexts such as Bangladesh (Rashid, 2000). An item referencing forced sex through coercion also had to be dropped from the SSCF trial due to very high correlations with other items. Notably, dropping these items did not impact trial inference in tests of concordance with published results. While physical IPV items were largely consistent in wording across studies, one study altered the wording on the “push” item; as a result, this item was dropped from the cross-study analysis. The lack of consistently worded item sets highlights the compromises made in the diverse and logistically challenging settings in which prevention trials are implemented. Though understandable, these limitations represent a lost opportunity for assessing the equivalence of all items within and across studies.
More Items and More Measurement Testing Are Needed to Assure Valid Measurement
A dearth of items was available to assess the two core IPV constructs serving as the primary outcome. While this study demonstrates the equivalence of items within studies, the limited number of items, especially for sexual IPV, calls into question the scale’s construct validity. IPV is a complex phenomenon, comprised of multiple, related domains (Follingstad & Rogers, 2013). It is unlikely that three to five items can capture the multifaceted construct. Furthermore, the use of valid scales is a prerequisite of valid inference. Ensuring that a scale passes multiple validity tests would facilitate valid inference, minimize underreporting due to limited measurement of the construct, enable assessment of impacts on subdomains of the construct, and facilitate meta-analytic assessments of effect size (Nugent, 2009). Given the limited scope of the study subset and the one scale tested, measurement equivalence testing should be done and reported routinely to ensure valid study inference and enable further research on the impact of nonequivalence on study inference.
A Lack of Covariate Balance at Baseline of Intervention and Control Groups Had Stronger Impact on Study Inference Than Identified Non-Equivalence
In this study, we used PS weighting to compare the impact of nonequivalence to that of imbalanced arms. Some degree of baseline, cross-arm imbalance on covariates, particularly in outcome measures, was identified in all studies. The presence of imbalance had a potentially larger effect on study outcome than did any identified measurement nonequivalence. The HERrespect study, the only study whose published inference differed from our own, reported significantly higher relative risk of IPV in the treatment arm (Naved et al., 2021). In our analyses, with PS adjustment, we found no significant latent mean differences at end line, suggesting that the treatment did not significantly increase women’s IPV risk. The importance of exchangeability was replicated in our analysis of the Indashyikirwa study, which only demonstrated significant impacts of the intervention on physical IPV after adjustment for imbalanced covariates. While randomization of sufficient numbers of individuals or clusters often is difficult in the challenging circumstances that prevention studies in LMICs face, researchers should consider use of PS adjustment in addition to randomization.
Conclusion
Most items measuring the physical and sexual IPV constructs were equivalent within studies, suggesting that these items are valid for evaluating the impact of prevention interventions pending confirmation on a wider set of studies. Where we found evidence of potential nonequivalence, the impact on study inference was negligible. Bias in estimates of effect size is more likely due to baseline covariate imbalance across study arms. The measurement of physical and sexual IPV would benefit from more extensive validity testing, starting with content validity, as not all sexual IPV items measured the constructs they are intended to represent. While we were able to drop these items and retain enough items for measurement equivalence testing, we reached the minimum three (Pett et al., 2003) for sexual IPV within, but not across, studies, precluding our ability to test cross-study equivalence of sexual IPV. The partial baseline and full cross-time equivalence of the common physical IPV items bodes well for future meta-analyses, but is only a preliminary step as the number of items available for this test was limited. Future studies including a wider range of IPV scales, intervention types, and geographic regions may elucidate the scope of measurement equivalence or nonequivalence under specific prevention trial conditions.
Supplemental Material
Supplemental material, sj-docx-1-asm-10.1177_10731911221095599 for Impact of Measurement Variability on Study Inference in Partner Violence Prevention Trials in Low- and Middle-Income Countries by Cari Jo Clark, Irina Bergenfeld, Yuk Fai Cheong, Nadine J. Kaslow and Kathryn M. Yount in Assessment
Footnotes
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (R01HD099224; Clark PI / Yount MPI).
Ethics Approval: The current study was determined exempt by the Emory University Institutional Review Board because it is not research with “human subjects” nor is it a “clinical investigation” as defined by federal regulations in Ethics Approval.
Consent: Informed consent was obtained from all individual participants included in the study.
ORCID iD: Irina Bergenfeld https://orcid.org/0000-0003-2601-2854
Supplemental Material: Supplemental material for this article is available online.
Data Availability: The data sets generated and analyzed during the current study are publicly available from the What Works to Prevent Violence Consortium at http://medat.samrc.ac.za/index.php/catalog/WW.
References
- Al Mamun M., Parvin K., Yu M., Wan J., Willan S., Gibbs A., Jewkes R., Naved R. T. (2018). The HERrespect intervention to address violence against female garment workers in Bangladesh: Study protocol for a quasi-experimental trial. BMC Public Health, 18(1), 1–16.
- Bacchus L. J., Ranganathan M., Watts C., Devries K. (2018). Recent intimate partner violence against women and health: A systematic review and meta-analysis of cohort studies. BMJ Open, 8(7), e019995. 10.1136/bmjopen-2017-019995
- Cheong Y. F., Yount K. M., Crandall A. A. (2017). Longitudinal measurement invariance of the women’s agency scale. Bulletin of Sociological Methodology/Bulletin de Méthodologie Sociologique, 134(1), 24–36.
- Cheung G., Rensvold R. (2002). Evaluating goodness-of-fit indices for testing measurement invariance. Structural Equation Modeling, 9, 233–255.
- Cohen J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159.
- Crawford D., Lloyd-Laney M., Bradley T., Atherton L., Byrne G. (2020). Final performance evaluation of DFID’s what works to prevent violence against women and girls programme (DFID’s what works to prevent VAWG programme). IMC Worldwide.
- Davidov E., Datler G., Schmidt P., Schwartz S. H. (2012). Testing the invariance of values in the Benelux countries with the European Social Survey: Accounting for ordinality. In E. Davidov, P. Schmidt, & J. Billiet (Eds.), Cross-cultural analysis (pp. 171–194). Routledge.
- Dunkle K., Stern E., Chatterji S., Heise L. (2020). Effective prevention of intimate partner violence through couples training: A randomised controlled trial of Indashyikirwa in Rwanda. BMJ Global Health, 5(12), e002439.
- Follingstad D. R., Rogers M. J. (2013). Validity concerns in the measurement of women’s and men’s report of intimate partner violence. Sex Roles, 69(3–4), 149–167.
- Garcia-Moreno C., Jansen H. A. F. M., Ellsberg M., Heise L., Watts C. (2005). WHO multi-country study on women’s health and domestic violence against women: Initial results on prevalence, health outcomes and women’s responses. World Health Organization.
- Gibbs A., Corboz J., Chirwa E., Mann C., Karim F., Shafiq M., Mecagni A., Maxwell-Jones C., Noble E., Jewkes R. (2020). The impacts of combined social and economic empowerment training on intimate partner violence, depression, gender norms and livelihoods among women: An individually randomised controlled trial and qualitative study in Afghanistan. BMJ Global Health, 5(3), e001946.
- Gibbs A., Corboz J., Shafiq M., Marofi F., Mecagni A., Mann C., Karim F., Chirwa E., Maxwell-Jones C., Jewkes R. (2018). An individually randomized controlled trial to determine the effectiveness of the women for women international programme in reducing intimate partner violence and strengthening livelihoods amongst women in Afghanistan: Trial design, methods and baseline findings. BMC Public Health, 18(1), 1–13.
- Gibbs A., Washington L., Abdelatif N., Chirwa E., Willan S., Shai N., Sikweyiya Y., Mkhwanazi S., Ntini N., Jewkes R. (2020). Stepping stones and creating futures intervention to prevent intimate partner violence among young people: Cluster randomized controlled trial. Journal of Adolescent Health, 66(3), 323–335.
- Gibbs A., Washington L., Willan S., Ntini N., Khumalo T., Mbatha N., Sikweyiya Y., Shai N., Chirwa E., Strauss M. (2017). The Stepping Stones and Creating Futures intervention to prevent intimate partner violence and HIV-risk behaviours in Durban, South Africa: Study protocol for a cluster randomized control trial, and baseline characteristics. BMC Public Health, 17(1), 1–15.
- Hoeffler A., Fearon J. (2014). Benefits and costs of the conflict and violence targets for the post-2015 development agenda: Post-2015 consensus (Conflict and Violence Assessment Paper, Issue). Copenhagen Consensus Center.
- Hu L.-T., Bentler P. M. (1995). Evaluating model fit. In Hoyle R. H. (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 76–99). SAGE.
- Hu L.-T., Bentler P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55.
- Kishor S., Bradley S. E. K. (2012). Women’s and men’s experience of spousal violence in two African countries: Does gender matter? (DHS analytical studies no. 27). ICF International.
- Kishor S., Johnson K. (2004). Profiling domestic violence: A multi-country study. ORC Macro.
- Leite W. L., Stapleton L. M., Bettini E. F. (2019). Propensity score analysis of complex survey data with structural equation modeling: A tutorial with Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 26(3), 448–469.
- Liu Y., Millsap R. E., West S. G., Tein J.-Y., Tanaka R., Grimm K. J. (2017). Testing measurement invariance in longitudinal data with ordered-categorical measures. Psychological Methods, 22(3), 486–506.
- Miedema S. S., Cheong Y. F., Naved R. T., Yount K. M. (2022). Development and Validation of the Economic Coercion Scale-20 (ECS-20): A Short-Form of the ECS-36. Available at SSRN 4081469.
- Miedema S. S., Haardörfer R., Girard A. W., Yount K. M. (2018). Women’s empowerment in East Africa: Development of a cross-country comparable measure. World Development, 110, 453–464.
- Millsap R. E. (2012). Statistical approaches to measurement invariance. Routledge.
- Muthén L., Muthén B. (2018). Mplus user’s guide (Version 8). Muthén & Muthén.
- Naved R. T., Mamun M. A., Parvin K., Willan S., Gibbs A., Jewkes R. (2021). Learnings from the evaluation of HERrespect: A factory-based intervention to prevent intimate partner and workplace violence against female garment workers in Bangladesh. Global Health Action, 14(1), 1868960.
- Nugent W. R. (2009). Construct validity invariance and discrepancies in meta-analytic effect sizes based on different measures: A simulation study. Educational and Psychological Measurement, 69(1), 62–78.
- Pett M. A., Lackey N. R., Sullivan J. J. (2003). Making sense of factor analysis: The use of factor analysis for instrument development in health care research. SAGE.
- Putnick D. L., Bornstein M. H. (2016). Measurement invariance conventions and reporting: The state of the art and future directions for psychological research. Developmental Review, 41, 71–90. 10.1016/j.dr.2016.06.004
- Rashid S. F. (2000). Providing sex education to adolescents in rural Bangladesh: Experiences from BRAC. Gender & Development, 8(2), 28–37.
- Stern E., Heise L., McLean L. (2018). Working with couples to prevent IPV: Indashyikirwa in Rwanda. UK Aid.
- Straus M. A., Hamby S. L., Boney-McCoy S., Sugarman D. B. (1996). The revised Conflict Tactics Scales (CTS2): Development and preliminary psychometric data. Journal of Family Issues, 17(3), 283–316.
- United Nations. (2015). Transforming our world: The 2030 agenda for sustainable development. United Nations.
- Vandenberg R. J., Lance C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3(1), 4–70. 10.1177/109442810031002
- What Works to Prevent Violence Global Program. (2015). Standard outcomes for assessment of intimate partner violence. Author.
- World Health Organization. (2021). Violence against women prevalence estimates, 2018: Global, regional and national prevalence estimates for intimate partner violence against women and global and regional prevalence estimates for non-partner sexual violence against women. Author.
- Yount K. M., Cheong Y. F., Miedema S., Naved R. T. (2021). Development and validation of the Economic Coercion Scale 36 (ECS-36) in rural Bangladesh. Journal of Interpersonal Violence. Advance online publication. 10.1177/0886260520987812
- Yount K. M., Halim N., Hynes M., Hillman E. R. (2011). Response effects to attitudinal questions about domestic violence against women: A comparative perspective. Social Science Research, 40(3), 873–884. 10.1016/j.ssresearch.2010.12.009
- Yount K. M., James-Hawkins L., Abdul Rahim H. F. (2020). The Reproductive Agency Scale (RAS-17): Development and validation in a cross-sectional study of pregnant Qatari and non-Qatari Arab Women. BMC Pregnancy and Childbirth, 20(1), Article 503. 10.1186/s12884-020-03205-2