Abstract
We respond to commentaries on our 2021 paper “Concerns and recommendations for using Amazon MTurk for eating disorder research.” The commentators raised many thoughtful and nuanced points regarding data validity and ethical means of online data collection. We echo concerns about the ethics of recruiting via platforms such as MTurk, and highlight tensions between recommendations for ethical data collection and those for ensuring data integrity. In particular, we highlight the consistent finding that MTurk workers display elevated (often remarkably so) rates of psychopathology, and argue such findings merit further scrutiny to ensure both that data are valid and that workers are not exploited.
Keywords: mturk, validity, ethics, eating disorders, methodology
We thank the Editor of the International Journal of Eating Disorders for inviting commentaries to our paper, “Concerns and recommendations for using Amazon MTurk for eating disorder research.” We feel it is the best-case scenario if sharing our story generates thoughtful, constructive dialogue to ensure the electronic data we collect and ultimately publish are ethically obtained and valid. We are grateful to the commentators, whose diverse perspectives and expertise with online data collection elaborated and enriched the issues we posed in our original article.
Our authorship team did not intend to assert our original study as the best-designed example of mitigating low-quality data. On the contrary, our design was clearly not rigorous enough, given the abundant inconsistencies and improbable responses we received. However, because we did consult the literature when designing our study and still encountered insurmountable doubt about our data quality, we felt it was important to share our experience. Although other fields were engaged in dialogue around these issues, we found no papers in the eating disorder (ED) field specifically, despite proliferating MTurk research. From our experience and the additional suggestions offered in the following commentaries, we continue to encourage ongoing development and empirical evaluation of data validation and security strategies for online, crowdsourced research platforms. Evidence-based strategies are likely to change over time, as threats to online data integrity are ever-evolving and, as Vogel et al. noted, may include participant cohort effects and sociopolitical events.
We want to emphasize and echo the important ethical issues raised by Gleibs et al. Our perceptions of MTurk research profoundly changed because of our original project. The authorship team is composed of psychologists, whose ethics code explicitly prohibits doing harm. We believe a higher bar must be set to ensure no harm comes to MTurk workers participating in psychological research, potentially through the creation of specific ethical guidance related to online data collection. Additionally, we argue research designs should be developed in collaboration with the community itself.
A related point meriting consideration is the tension we see between recommendations for ethical research and those for ensuring data integrity. For instance, existing literature recommends researchers “avoid over-incentivizing” MTurk tasks (Yarrish et al., 2019, p. 238) by paying rates similar to other studies on the platform; however, most studies on MTurk offer pay significantly below minimum wage. It is imperative that we pay participants fair wages and ensure these participants are from the intended population. As researchers, we must determine how to reconcile these priorities if we are to continue using online data collection methods. We welcome continued dialogue around these issues.
De Young and Kambanis recommended collecting MTurk worker IDs externally. Although this appears to be common practice, we want to add caution to this recommendation, as MTurk considers worker IDs to be private information. Indeed, worker IDs can be tied to public Amazon profiles, which reveal product reviews and ratings and may threaten confidentiality. CloudResearch’s TurkPrime includes a feature that sends worker IDs to Qualtrics; yet, given these IDs are potentially identifiable, we strongly recommend researchers (1) carefully consider whether gathering such IDs is necessary; and (2) receive explicit ethics approval for such procedures.
Both Moeck et al. and De Young and Kambanis caution that our concerns were overstated, and we respectfully disagree. Even with the highest degree of protections, MTurk studies consistently show signs of sampling bias, with sometimes remarkably high rates of psychopathology. In a rigorously designed study conducted by two of the commentary authors, 29.2% of the MTurk sample met likely criteria for posttraumatic stress disorder (PTSD; Bridgland et al., 2022). This prevalence is noteworthy considering a study of US combat soldiers found 18% met DSM-5 PTSD criteria (Hoge et al., 2014), with global population prevalence estimates ranging from 2% to 10% (Atwoli et al., 2015). Bridgland et al. (2022) were forced to make changes to their preregistered study design after issues with MTurk data quality, and ultimately took an action they recommend against in their commentary (i.e., removing participants who failed all attention checks) to protect data integrity. It is highly plausible that participants who self-select into a study advertised as assessing feelings and beliefs about traumatic experiences would have higher-than-average PTSD rates. However, Kambanis et al. (2021) made the laudable decision to assess for self-selection bias and found no evidence it contributed to the striking extent of ED psychopathology observed in their MTurk survey (i.e., 40% met ED criteria, 45.2% reported self-induced vomiting, and 48.7% reported laxative abuse). Higher-than-average prevalence of psychopathology does not inherently mean data are invalid; yet, with current technology, the tools available to verify the quality and source of data collected online are limited. Thus, we caution against confident assertions about MTurk data reliability. We continue to believe the stakes of publishing questionable or invalid data are extremely high, particularly when attempting to eliminate disparities in scientific knowledge. Moreover, the aforementioned tension between ethics and data integrity is relevant here.
If these data are valid and prevalence estimates of EDs among MTurk workers are potentially hundreds of times higher than in the general population, then we must consider how we can ethically gather data from this population to minimize harm and avoid outright exploitation. Thus, regardless of the data’s validity, we believe such unusual results merit further scrutiny.
We hope as researchers in this field we are all working towards the common goals of reducing the number of individuals affected by EDs, increasing representation in research and treatment, and improving treatment access and outcomes. We hope we can collectively engage in ongoing reflexivity and dialogue to ensure those goals are front and center and that our research practices are aligned to achieve these goals.
Acknowledgements:
We wish to thank Dr. Shelby Martin and Lisa Calderwood for their contributions to our original manuscript. The first author’s time on this manuscript was supported by Grant Number T32HL150452 from the National Heart, Lung, and Blood Institute (PI: Dianne Neumark-Sztainer). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Heart, Lung, and Blood Institute or the National Institutes of Health.
Footnotes
The authors have no conflicts of interest to declare.
References
- Atwoli L, Stein DJ, Koenen KC, & McLaughlin KA (2015). Epidemiology of posttraumatic stress disorder: Prevalence, correlates and consequences. Current Opinion in Psychiatry, 28, 311. 10.1097/YCO.0000000000000167
- Bridgland VME, Barnard JF, & Takarangi MKT (2022). Unprepared: Thinking of a trigger warning does not prompt preparation for trauma-related content. Journal of Behavior Therapy and Experimental Psychiatry, 75, 101708. 10.1016/j.jbtep.2021.101708
- Hoge CW, Riviere LA, Wilk JE, Herrell RK, & Weathers FW (2014). The prevalence of post-traumatic stress disorder (PTSD) in US combat soldiers: A head-to-head comparison of DSM-5 versus DSM-IV-TR symptom criteria with the PTSD checklist. The Lancet Psychiatry, 1, 269–277. 10.1016/S2215-0366(14)70235-4
- Kambanis PE, Bottera AR, & De Young KP (2021). Eating disorder prevalence among Amazon MTurk workers assessed using a rigorous online, self-report anthropometric assessment. Eating Behaviors, 41, 101481. 10.1016/j.eatbeh.2021.101481
- Yarrish C, Groshon L, Mitchell JD, Appelbaum A, Klock S, Winternitz T, & Friedman-Wheeler DG (2019). Finding the signal in the noise: Minimizing responses from bots and inattentive humans in online research. The Behavior Therapist, 42, 235–242.
