Health Expectations: An International Journal of Public Participation in Health Care and Health Policy
Letter. 2024 Jun 6;27(3):e14099. doi: 10.1111/hex.14099

Fraudulent Online Survey Respondents May Disproportionately Threaten Validity of Research in Small Target Populations

Joshua H Gordon 1, Kiyono Fujinaga‐Gordon 2,, Christian Sherwin 2
PMCID: PMC11156680  PMID: 38845165

To the Editor,

We would like to share our recent experiences with fraudulent study participants in response to a previous letter published in Health Expectations entitled ‘“Imposter participants” in online qualitative research, a new and increasing threat to data integrity?’ [1].

The widespread nature of the internet and the accessibility of online surveys have reduced barriers to collecting data from large samples of individuals. Nevertheless, it has become increasingly apparent that fraudulent activity is a critical threat to the validity of online data collection. We appreciate HEX's engagement with this issue and wish to highlight that, in studies of relatively small target populations, bad‐faith respondents may have a proportionally greater impact on study validity. In the case of our study, we believe there was an extreme preponderance of fraudulent responses that exceeded many previous reports in the literature.

Between December 2023 and January 2024, we conducted a nationwide ethnographic questionnaire survey in the United States to characterize communities of Japanese speakers. We aimed to assess demographics, language attitudes, language use patterns, experiences of discrimination, health care access and mental health in this population. We recruited participants through 'quota sampling', the 'snowball technique' [2] and 'network sampling' [3], with the PI reaching out to people through her professional networks and friends in the Japanese heritage community.

We opened the survey on 15 December 2023 using Qualtrics as the survey platform. In the first 13 days of the study, we had 69 respondents. On 28 December 2023, we approved an organization to reach out to its constituents via its social media accounts (Facebook and Instagram). That day, we received 1774 survey responses. After our research team reviewed the results, our suspicions of fraud were quickly raised by repetitive answers, answers inconsistent with the questions asked, numerous emails to the study PI asking about survey compensation and many completely blank emails, consistent with the concerning features reported by the authors in HEX. We contacted our IRB about our concerns and closed the survey at 10 pm on the night of 28 December. We reopened the survey after obtaining IRB approval to adjust its wording, adding a statement that we would be evaluating responses for authenticity.

During our data analysis, we used the following criteria to classify responses as suspected fraud or authentic: (1) responses that appeared to be duplicate submissions from the same individual (i.e., multiple submissions with identical answers or with the same name listed); (2) low‐effort, repetitive submissions (e.g., answering 1 for all Likert scale questions or 'n/a' for open‐ended responses); (3) nonsensical submissions; (4) submissions inconsistent with the eligibility criteria (i.e., not listing Japanese as a spoken language, since the survey targeted Japanese speakers); and (5) responses that were not internally consistent (e.g., a mismatch between reported age and birth year).
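
To illustrate how screening rules of this kind might be operationalized, the following is a minimal, hypothetical Python sketch using pandas. The column names, example data and thresholds are illustrative assumptions only and do not reflect our actual survey instrument or analysis code.

    import pandas as pd

    # Hypothetical response table; column names and values are illustrative only.
    responses = pd.DataFrame({
        "name": ["A. Tanaka", "A. Tanaka", "B. Sato", "C. Lee"],
        "likert_1": [3, 3, 1, 4],
        "likert_2": [2, 2, 1, 5],
        "open_ended": ["Detailed answer...", "Detailed answer...", "n/a", "Another answer"],
        "languages": ["Japanese, English", "Japanese, English", "English", "Japanese"],
        "age": [34, 34, 29, 52],
        "birth_year": [1989, 1989, 2001, 1972],
        "survey_year": [2023, 2023, 2023, 2023],
    })

    def flag_suspected_fraud(df: pd.DataFrame) -> pd.Series:
        """Return a boolean Series marking rows that trip any screening rule."""
        # (1) Duplicate submissions: identical answers or a repeated name.
        duplicate = df.duplicated(keep="first") | df["name"].duplicated(keep="first")

        # (2) Low-effort submissions: uniform Likert answers plus 'n/a' free text.
        uniform_likert = df[["likert_1", "likert_2"]].nunique(axis=1).eq(1)
        na_open_ended = df["open_ended"].str.strip().str.lower().eq("n/a")
        low_effort = uniform_likert & na_open_ended

        # (4) Ineligible: Japanese not listed among spoken languages.
        ineligible = ~df["languages"].str.contains("Japanese", case=False, na=False)

        # (5) Internal inconsistency: reported age disagrees with birth year by >1 year.
        implied_age = df["survey_year"] - df["birth_year"]
        inconsistent = (implied_age - df["age"]).abs() > 1

        return duplicate | low_effort | ineligible | inconsistent

    responses["suspected_fraud"] = flag_suspected_fraud(responses)
    print(responses[["name", "suspected_fraud"]])

Criterion (3), nonsensical submissions, resists simple automation of this kind and generally requires manual review.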

Based on these criteria, we classified 356 responses as authentic and 1487 as suspected fraud. During the first 12 days of the survey being open, 12/69 responses (17.4%) were classified as suspected fraud. After the survey was posted to social media, we classified 1475/1774 (83.1%) as suspected fraud. Nevertheless, there is a significant likelihood that our criteria did not correctly classify all fraudulent responses. Considering fraud classification in terms of test sensitivity and specificity, it is apparent that the higher the proportion of fraudulent responses, the more likely it is that fraudulent responses will be misclassified as authentic, and the greater the impact these misclassified responses will have on study results. To illustrate this point, consider a hypothetical classification strategy with 90% sensitivity and 90% specificity for detecting fraudulent responses: assuming a prevalence of 50% fraud, approximately 90% of the responses we determine to be authentic will truly be authentic. However, if the prevalence of fraud is 80%, only approximately 69% of responses classified as authentic would be truly authentic; at 90% fraud, the proportion of truly authentic responses decreases to 50%. As such, we conjecture that the potential impact of fraud is a yet greater threat to validity in studies in which the population of interest is small.
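
To make this arithmetic explicit, the short sketch below (an illustrative calculation only, not part of our analysis) computes the proportion of responses classified as authentic that are truly authentic, given an assumed sensitivity and specificity for detecting fraud.

    def authentic_among_classified_authentic(prevalence: float,
                                             sensitivity: float = 0.9,
                                             specificity: float = 0.9) -> float:
        """Proportion of responses classified as authentic that are truly authentic.

        prevalence  : proportion of all responses that are fraudulent
        sensitivity : P(classified fraudulent | truly fraudulent)
        specificity : P(classified authentic  | truly authentic)
        """
        kept_authentic = specificity * (1 - prevalence)   # authentic responses retained
        missed_fraud = (1 - sensitivity) * prevalence      # fraudulent responses retained
        return kept_authentic / (kept_authentic + missed_fraud)

    for prevalence in (0.5, 0.8, 0.9):
        print(f"fraud prevalence {prevalence:.0%}: "
              f"{authentic_among_classified_authentic(prevalence):.0%} of "
              "'authentic' responses are truly authentic")
    # Output: 90%, 69%, 50%, matching the figures cited above.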

In addition to the threats to study validity, we wish to acknowledge the impact of this experience on investigators. It is disheartening to encounter fraud during research, and our team experienced significant demoralization as a result. Although we would hope that this is not something any researcher would encounter, we are concerned that this issue is here to stay. Moreover, as AI‐based language processing technology continues to improve and becomes more widely accessible, fraudulent respondents may become even more sophisticated and difficult to identify. Successful strategies have been described elsewhere, such as providing region‐based compensation, requiring a physical mailing address and conducting interviews over the phone [1, 4]. Additional strategies are needed to continue to address these issues and minimize the impact of fraudulent responses on future studies.

Author Contributions

Joshua H. Gordon: conceptualization, formal analysis, investigation, writing–original draft, writing–review and editing, methodology. Kiyono Fujinaga‐Gordon: conceptualization, data curation, investigation, methodology, project administration, writing–review and editing, funding acquisition. Christian Sherwin: formal analysis, investigation, writing–review and editing.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgements

This study was supported by the Japan Foundation, Los Angeles (to K.F.‐G.) and the National Institute of Mental Health (Grant R25MH119043 to J.H.G.).

Data Availability Statement

Research data are not shared.

References

1. Ridge D., Bullock L., Causer H., et al., “‘Imposter Participants’ in Online Qualitative Research, a New and Increasing Threat to Data Integrity?,” Health Expectations 26, no. 3 (June 2023): 941–944, 10.1111/hex.13724.
2. Milroy L. and Gordon M. J., Sociolinguistics: Method and Interpretation (Oxford, UK: Blackwell Publishers, 2003).
3. Hammersley M. and Atkinson P., Ethnography: Principles in Practice (London, UK: Routledge, 1995).
4. Davies M. R., Monssen D., Sharpe H., et al., “Management of Fraudulent Participants in Online Research: Practical Recommendations From a Randomized Controlled Feasibility Trial,” International Journal of Eating Disorders (2023): 1–11, 10.1002/eat.24085.
