Abstract
With the increasing sophistication of online survey tools and the necessity of distanced research during the COVID-19 pandemic, the use of online questionnaires for research purposes has proliferated. Still, many researchers undertake online survey research without knowing how prevalent or likely survey questionnaire fraud is, or how to identify fraud once it has occurred. This research note is based on the experience of researchers across four sites who implemented an online survey of families’ experiences with COVID-19 in the U.S. that was subject to substantial fraud. By the end of data collection, over 70% of responses were flagged as fraudulent, with duplicate IP addresses and concurrent start/end times being the most common indicators of fraud observed. We offer lessons learned to illustrate the sophisticated nature of fraud in online research and the importance of multi-pronged strategies to detect and limit online survey questionnaire fraud.
Keywords: COVID-19, families, online survey, fraud detection, lessons learned
Introduction
During autumn 2020, Children, COVID-19, and its Consequences (“Triple C”) launched its survey to assess the economic and educational consequences of the COVID-19 pandemic on a non-representative but targeted sample of elementary-age children and their families living in Durham (NC), New Brunswick (NJ), Pittsburgh (PA), and Seattle (WA). The four cities represented places in the United States with varying policy responses to the pandemic and infection rates. Because of the pandemic, the study team reached out to potential participants through online recruitment via existing, curated study lists, Facebook pages, and school administrators, rather than through in-person visits to schools or parent-teacher associations. Despite our recruitment strategy and ties to school communities in each of the locales, our study experienced an unusual amount of survey questionnaire fraud, which we estimate at over 70% of responses. Our survey offered remuneration for completed questionnaires, which likely contributed to our exposure to fraud. Several early indications of fraudulent responses led us to suspend data collection, add more fraud checks to our instrument, and modify our consent form to clarify that respondents who completed the questionnaire more than once would not be compensated. Subsequently, one of our sites was informed by its institution that multiple project gift cards had been deposited to a single bank account, indicating yet another instance of fraud. Our experience offers insights about the extent of potential survey questionnaire fraud and lessons learned for detecting and preventing it during online data collection.
Fraudulent responses are a well-known phenomenon, especially when instruments are administered online (Bauermeister et al., 2012; Bosnjak & Tuten, 2003; Bowen et al., 2008; Pozzar et al., 2020; Teitcher et al., 2015). In recent years, bots, or automated software simulating human responses, have been of particular concern, and a number of studies have developed approaches to identify them or limit their access to a survey (Perkel & Simone, 2020; Shanahan, 2018; Storozuk et al., 2020). However, a growing number of studies point to the many types of fraudulent respondents, from bots to bad actors (Ballard et al., 2019; Dewitt et al., 2018; Howell, n.d.; Pozzar et al., 2020). Importantly, screening questions, reCAPTCHA tests, innovative payment methods, and checks for duplicate IP addresses are not, on their own, enough to comprehensively identify fraudulent respondents (Ahler et al., 2019). Instead, current protocols recommend incorporating multiple fraud detection techniques (Ballard et al., 2019; Buchanan & Scofield, 2018; Dewitt et al., 2018; Pozzar et al., 2020; Simone, 2019).
We adopted many of these techniques but learned that no standard exists for adjudicating among different fraud checks to identify invalid responses, nor is there a consistent set of standards for monitoring fraud during the implementation of an online questionnaire (Perkel & Simone, 2020). In this research note, we describe our survey, our experience with fraud, our approach to identifying fraudulent cases, and the distribution of fraud indicators, and we conclude with recommendations for prevention and identification.
Data and Methods
We sought to recruit 2,000 respondents to a 3-wave panel survey administered online to participants in Durham, NC; New Brunswick, NJ; Seattle, WA; and Pittsburgh, PA. The first wave occurred between October 2020 and March 2021. Eligible respondents had to live in one of the four U.S. cities, be over age 18, and be the primary caregiver for a child between ages 5 and 12. The sample was stratified by race/ethnicity to ensure sufficient samples of non-Hispanic Black, non-Hispanic White, and Hispanic respondents. Respondents received a $25 e-gift card for completing the 30-minute questionnaire in English or Spanish.
Participant Recruitment
Recruitment was done online by circulating an advertisement to school-based parent Facebook groups, community organizations, and curated email listservs of parents with children. Interested individuals could follow a link in the advertisement to the screening questions. If respondents were eligible, they were then shown a link to the questionnaire. The screener also included a reCAPTCHA test to help distinguish humans from bots.
To reduce barriers to completing the questionnaire, the survey link was not unique and could be used multiple times. Providing a unique link to each respondent would have required potential respondents to contact us to receive the link, an additional step that we felt could inhibit participation.
Detecting Patterns of Fraud
We suspected problems with the survey almost as soon as it went live in October 2020. The questionnaire had launched only in Durham, yet it immediately began receiving responses from individuals reporting that they were from the other three cities. Based on suspicious patterns in responses and email addresses, we quickly became concerned about impersonation (pretending to qualify in order to receive remuneration) as well as respondent duplication (completing the questionnaire more than once).
Because of concerns about fraudulent data, we halted data collection after a month. We then altered our questionnaire so that we could better identify fraud, both by screening out possible fraudulent respondents before they completed the questionnaire and by including additional checks when we analyzed the survey data.
In total, using both pre-screening and post-hoc data checks, we implemented 13 validity checks:

1. reCAPTCHA test;
2. reported in the questionnaire that the child was less than 5 or greater than 12 years old (after indicating in the screener that the child was between ages 5 and 12);
3. child’s age and grade were inconsistent by more than 2 years;
4. completed the questionnaire in less than 15 minutes;
5. did not complete two-thirds of the questionnaire;
6. duplicate email;
7. invalid email;
8. no email;
9. duplicate IP address;
10. invalid phone number (not implemented in Seattle; phone numbers in the other sites were verified through a text messaging service);
11. no phone number;
12. did not pass attention check questions (e.g., did you swim with sharks this past weekend?); and
13. concurrent questionnaire start/end time.
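To make these checks concrete, the sketch below shows how several of them could be applied to a response-level data table. It is a minimal illustration under assumed field names (e.g., `child_age`, `swam_with_sharks`, `items_answered`), not our actual processing pipeline, and the kindergarten-at-age-5 offset used for the age/grade comparison is an assumption.

```python
# Illustrative sketch only: applying a subset of the validity checks described
# above to a pandas DataFrame of survey responses. All column names are
# hypothetical placeholders for whatever the survey platform exports.
import pandas as pd

def flag_responses(df: pd.DataFrame) -> pd.DataFrame:
    flags = pd.DataFrame(index=df.index)

    # Consistency checks
    flags["invalid_child_age"] = ~df["child_age"].between(5, 12)
    # Assumes kindergarten (grade 0) begins around age 5, so age and grade
    # should not diverge from that offset by more than 2 years
    flags["inconsistent_age_grade"] = (df["child_age"] - (df["child_grade"] + 5)).abs() > 2

    # Effort checks (start_time and end_time assumed to be parsed datetimes)
    minutes = (df["end_time"] - df["start_time"]).dt.total_seconds() / 60
    flags["under_15_minutes"] = minutes < 15
    flags["under_two_thirds_answered"] = df["items_answered"] / df["items_total"] < 2 / 3
    flags["failed_attention_check"] = df["swam_with_sharks"].eq("yes")

    # Duplication checks
    flags["duplicate_email"] = df["email"].duplicated(keep=False) & df["email"].notna()
    flags["duplicate_ip"] = df["ip_address"].duplicated(keep=False)
    flags["missing_email"] = df["email"].isna()

    flags["n_flags"] = flags.sum(axis=1)
    flags["any_flag"] = flags["n_flags"] > 0
    return flags
```

Screener-level checks, such as the reCAPTCHA test and phone verification, happen upstream of this kind of post-hoc flagging and are not reproducible from the response file alone.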
Closing the Survey
By the end of February 2021, we had received 6,536 responses to the online questionnaire. Of these, 1,912 responses (29.3%) passed the above validity checks and were compensated. Subsequent analysis of these 1,912 cases indicated that the fraction of invalid responses might have been even higher. Descriptive statistics, for example, indicated that 90% of our respondents were married, which far exceeds the marriage rate found among low-income households in the general population. This, as well as other data anomalies (e.g., respondents who reported low incomes but low levels of material hardship), raised serious concerns.
To further verify the quality of the data collected, in April 2021 the 1,912 respondents who were compensated for their participation were sent a follow-up verification questionnaire to confirm their race, birth year, education, and contact information. Of the 502 responses received, only 33% provided information consistent with their original responses. Roughly 25% had one major inconsistency in their demographic or contact information across the questionnaires, 26% had two, and the remainder had three or more major inconsistencies.
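In essence, this verification step is a field-by-field comparison between the original and follow-up responses. A minimal sketch is below, assuming hypothetical field names and that both tables are indexed by a shared respondent identifier.

```python
# Illustrative sketch: count "major inconsistencies" between the original and
# follow-up verification responses. Field names and the respondent index are
# hypothetical stand-ins for whatever identifiers a study actually uses.
import pandas as pd

FIELDS = ["race", "birth_year", "education", "phone", "email"]

def count_inconsistencies(original: pd.DataFrame, verification: pd.DataFrame) -> pd.Series:
    merged = original[FIELDS].join(
        verification[FIELDS], lsuffix="_orig", rsuffix="_verify", how="inner"
    )
    mismatches = pd.DataFrame(
        {f: merged[f + "_orig"] != merged[f + "_verify"] for f in FIELDS}
    )
    # 0 = fully consistent; counts of 1, 2, and 3+ map onto the groups reported above
    return mismatches.sum(axis=1)
```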
Based on this series of data validity checks, as well as the follow-up questionnaire, we estimate that of the 6,536 responses we received, only roughly 7% were definitively valid.
Analysis of Results
Our team conducted analyses to verify the quality of our data and understand the patterns of fraud. We used descriptive analysis to assess how frequently we observed each fraud measure and the extent to which these measures co-occurred, indicating different “types” of fraud. Detection measures were categorized into three groups: consistency checks, effort checks, and duplication checks. Table 1 presents the proportion of fraud observed by indicator, across respondents, after removing the 786 responses received in October 2020 (analytic n = 5,750). As we mentioned earlier, when the survey launched in October we noticed suspicious patterns (i.e., impersonation and respondent duplication) and decided to modify our screening and fraud checks. Thus, we did not include those initial responses in the remainder of our analysis.
Table 1.
Proportion of fraud indicators observed among respondents for the pooled samples from all study sites. There were 6,536 responses overall; these analyses exclude the 786 responses received during October 2020, leaving N = 5,750.
| Fraud Indicator | Not Observed | Only Fraud Observed | Observed with Other Fraud | Overall Fraud Observed |
| --- | --- | --- | --- | --- |
| Consistency Checks | | | | |
| Invalid Child Age | 0.983 | 0.002 | 0.014 | 0.017 |
| Inconsistent Child Age | 0.972 | 0.008 | 0.020 | 0.028 |
| Inconsistent Child Grade | 0.931 | 0.018 | 0.052 | 0.069 |
| Effort Checks | | | | |
| Attention Check Question | 0.978 | 0.008 | 0.015 | 0.022 |
| Completed in Under 15 Minutes | 0.745 | 0.064 | 0.191 | 0.255 |
| Answered Less than Two Thirds | 0.999 | 0.000 | 0.001 | 0.001 |
| Duplication Checks | | | | |
| Duplicate Emails | 0.945 | 0.014 | 0.041 | 0.055 |
| Duplicate IP Addresses | 0.699 | 0.164 | 0.137 | 0.301 |
| Concurrent Start/End Time | 0.616 | 0.113 | 0.270 | 0.384 |
| Duplicate Module Time | 0.951 | 0.002 | 0.047 | 0.049 |
| Overall Fraud | | | | |
| Any Fraud Measure | 1,495 (0.260) | 2,248 (0.391) | 2,007 (0.349) | 4,255 (0.740) |
| Average Number of Fraud Measures | 1.172 | | | |
| N | 5,750 | | | |
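For readers who want to produce a summary in the format of Table 1 from their own data, the sketch below assumes a boolean matrix of flags with one column per fraud indicator and one row per response (as in the earlier sketch). It is illustrative only, not the exact code behind the table.

```python
# Illustrative sketch: tabulate Table 1-style columns from a boolean flag matrix.
import pandas as pd

def summarize_flags(flags: pd.DataFrame) -> pd.DataFrame:
    # flags: boolean DataFrame, one column per fraud indicator, one row per response
    n_per_response = flags.sum(axis=1)
    summary = {}
    for indicator in flags.columns:
        observed = flags[indicator]
        summary[indicator] = {
            "Not Observed": (~observed).mean(),
            "Only Fraud Observed": (observed & (n_per_response == 1)).mean(),
            "Observed with Other Fraud": (observed & (n_per_response > 1)).mean(),
            "Overall Fraud Observed": observed.mean(),
        }
    table = pd.DataFrame(summary).T
    # Bottom row: responses with no flags, exactly one flag, multiple flags, any flag
    table.loc["Any Fraud Measure"] = [
        (n_per_response == 0).mean(),
        (n_per_response == 1).mean(),
        (n_per_response > 1).mean(),
        (n_per_response > 0).mean(),
    ]
    return table.round(3)
```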
Consistency checks were implemented to ensure that the information participants provided aligned with our eligibility criteria and with other corresponding questionnaire responses. These checks flagged cases of invalid child age, inconsistent child age, and inconsistent child grade. Inconsistent child grade was the most commonly observed (0.069) and most frequently co-occurring (0.052) of the consistency checks. When more than one fraud measure was observed, responses flagged for an inconsistent child grade were most likely to co-occur with reports of an invalid child age (i.e., a child not between the ages of 5 and 12).
Effort checks captured questionnaire completion and coherence in response to an attention check. Fraud was flagged if participants responded unrealistically to an attention check question, completed the questionnaire in less than 15 minutes, or answered less than two-thirds of the questions. The main survey questionnaire included 225 questions across 23 modules, making it unrealistic for respondents to finish in less than 15 minutes. Yet spending less than 15 minutes was the most commonly observed (0.255) and most frequently co-occurring (0.191) of the effort checks. This measure most frequently co-occurred with concurrent start/end time, which flagged responses that started and ended within one minute of another respondent on the same day. The length of the questionnaire and the co-occurrence of these two time-related fraud checks suggest that more sophisticated bot-based software may have been involved.
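As an illustration of the concurrent start/end-time check, the sketch below flags a response when both its start and end timestamps fall within one minute of another response submitted on the same day. It is a simple quadratic-time sketch under assumed column names (`start_time`, `end_time`), not a reproduction of our code.

```python
# Illustrative sketch of the concurrent start/end-time check: flag responses
# whose start AND end times are each within `window` of another response
# submitted on the same calendar day. Column names are assumed.
import pandas as pd

def flag_concurrent_times(df: pd.DataFrame, window: str = "1min") -> pd.Series:
    tol = pd.Timedelta(window)
    flagged = pd.Series(False, index=df.index)
    for _, group in df.groupby(df["start_time"].dt.date):
        for idx in group.index:
            others = group.drop(index=idx)
            close = (
                (others["start_time"] - group.loc[idx, "start_time"]).abs().le(tol)
                & (others["end_time"] - group.loc[idx, "end_time"]).abs().le(tol)
            )
            if close.any():
                flagged.loc[idx] = True
    return flagged
```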
Finally, duplication checks were implemented to capture multiple responses from the same participant. These included participants who completed the questionnaire using the same email address, from the same IP address, or within the same amount of time (overall, or across at least 10 of the 13 questionnaire modules) as another respondent. The most commonly observed fraud measures among duplication checks were duplicate IP addresses (0.301) and concurrent start/end time (0.384); these were also the most frequently observed across all fraud indicators. Duplicate IP addresses frequently co-occurred with inconsistent child grades, which may correspond to a human-like pattern of manually completing the questionnaire multiple times from the same device. This differs from concurrent start/end time, which often co-occurred with duplicate module time and short completion time, indicating more automated, bot-like patterns among responses flagged with multiple fraud indicators.
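Similarly, the duplicate module time check can be sketched as a pairwise comparison of per-module timings, flagging any pair of responses whose timings match on at least 10 modules. The module-timing table and the exact-match rule below are assumptions for illustration.

```python
# Illustrative sketch of the duplicate module time check. `module_seconds` is a
# hypothetical table with one row per response and one column per questionnaire
# module, holding the seconds spent in that module.
import pandas as pd

def flag_duplicate_module_times(module_seconds: pd.DataFrame, min_matches: int = 10) -> pd.Series:
    flagged = pd.Series(False, index=module_seconds.index)
    ids = list(module_seconds.index)
    values = module_seconds.to_numpy()
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            # Count modules where the two responses spent exactly the same time
            if (values[a] == values[b]).sum() >= min_matches:
                flagged.loc[ids[a]] = True
                flagged.loc[ids[b]] = True
    return flagged
```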
By the end of data collection, over 70% of responses were flagged as fraudulent, with an average of 1.172 fraud measures observed across all respondents included in the analysis. The proportion of observations flagged with multiple fraud indicators peaked in December, not long after we relaunched the survey questionnaire with new detection measures in place. This timing corresponds to the holiday season and may reflect the increased economic pressures faced by families, especially during the first year of the COVID-19 pandemic. Fraudulent activity appeared to decline at the start of the new year and eventually plateaued in February, shortly before data collection ended.
While we could not definitively determine which responses were fraudulent or valid, our preliminary results indicate that fraud can be extremely prevalent in online questionnaires. Flagging responses based on suspicious patterns in start and stop times may be especially effective for identifying fraud. We also found that certain fraud checks tended to co-occur in distinct combinations, suggesting that different clusters of indicators capture different kinds of fraud and highlighting the value of a multi-pronged approach. These results suggest that fraud may be best understood as operating on a continuum, with different types posing distinct challenges depending on study design and the fraud detection measures in place.
Lessons Learned
The Triple C project came to a premature end because of clear evidence of rampant fraud. Even among “non-fraudulent” cases, the study team was suspicious about the validity of the data because of inconsistent demographic patterns when compared to nationally representative samples. Ultimately, the investigative team decided these concerns were too grave to continue the study. Nevertheless, the study provides several important lessons for researchers interested in conducting online survey questionnaires in the future.
First, single fraud checks are not enough to protect against invalid responses. In the Triple C Study, commonly used fraud checks, such as duplicate IP and email addresses, did not capture the vast majority of the fraud observed. Instead, multiple types of fraud checks must be implemented to protect against fraud. These include more common approaches in the field, such as attention checks and screening for duplicate email and IP addresses, as well as consistency checks of the data. Additionally, survey metadata, such as time of completion or the temporal adjacency of respondents, provide useful information for identifying potentially fraudulent data.
Multiple fraud checks are necessary because different sources of fraud exist in online questionnaires. In the Triple C study, we observed fraud that we suspect came from bots used to hack the questionnaire for financial gain. Metadata were helpful in detecting these cases, which often had identical questionnaire durations or fell in clusters of temporally adjacent responses that failed multiple consistency checks. Another source of fraud came from dishonest or careless individuals, who were revealed through consistency and effort checks. The prevalence of these sources of fraud may depend on key aspects of study design, such as participant recruitment. For example, studies that rely on social media for recruitment and do not distribute unique survey links to participants may fall prey to malicious computer programs that trawl social media for online questionnaires. Other recruitment methods, such as collecting names through research registries, may encounter fewer bots but still be plagued by dishonest reporting.
Second, online studies should consider structuring participant payments in a way that reduces the financial rewards of fraudulent responses without undermining overall study participation. The Triple C Study provided moderate financial payments to participants for completing the questionnaire. The investigative team thought the remuneration was appropriate given the length of the questionnaire; however, the level of compensation was high compared to other online studies and thus seemed to incentivize fraudulent responses. Lower levels of compensation or paying a subset of participants (e.g., entering all who complete the questionnaire into a lottery) could help deter fraudulent respondents.
A third lesson for future studies is that direct communication between the study team and potential participants may be crucial for enhancing the validity of responses. Even the most extensive fraud checks are unlikely to substitute for these one-on-one interactions. Such interactions may include live phone screenings when participants are enrolled, followed by an email containing a unique questionnaire link for each respondent. This approach would eliminate fraud from computer programs that hack survey questionnaires. Screening and enrollment over the phone could also help detect participants who may be lying about their eligibility. Moreover, one-on-one conversations with participants may help the study team build rapport with respondents, increasing the likelihood that they provide honest, complete responses. These methods greatly increase the cost of online studies; however, there may be less costly alternatives that could serve a similar purpose.
Certain elements of our study design likely made us more susceptible to fraud; however, our approach was necessitated by the constraints on in-person data collection during the pandemic, the need to compensate participants during a period of great financial precarity, and our interest in quickly characterizing hardships disproportionately affecting socioeconomically vulnerable populations. Furthermore, sampling frames with verified respondents who met our study criteria were not readily available in any of our sites. Instead, we chose to target our online advertising at communities likely to include families with school-age children and allowed the survey to be open access, with extensive pre-screening questions. Given the high costs of in-person surveys and the challenges of finding localized lists of verified respondents, we suspect our approach will continue to find purchase among online survey researchers.
Requiring some point of human contact during screening and enrollment can mitigate the risk of fraud, but we lacked the resources to do so. And while, in retrospect, we provided relatively large incentives, we felt strongly that it was unethical to ask economically vulnerable families to participate without compensating them for their time. Additionally, researchers working to sample hard-to-reach populations may make similar trade-offs in the interest of finding participants who meet their eligibility criteria and recruiting them at relatively low cost. In such instances, they must carefully consider their sampling frame, how much capacity can be dedicated to additional identity verification, and the types of incentives offered, particularly in online research, where the length or complexity of a questionnaire is an insufficient deterrent to fraud.
No strategy is perfect, and it is important that researchers weigh the costs and benefits of applying different fraud detection measures to their studies. More randomized controlled trials are needed to test strategies such as lottery systems, individualized links, and different combinations of fraud checks. The measures outlined in this research note simply illustrate the importance of multi-pronged strategies, both proactive and reactive, for detecting and minimizing online survey questionnaire fraud more effectively.
Funding
Funding for our project comes from two Eunice Kennedy Shriver National Institute of Child Health and Human Development research infrastructure grants: P2C HD042828, to the Center for Studies in Demography & Ecology (CSDE) at the University of Washington and P2C HD065563 to the Duke Population Research Center (DuPri) at Duke University. Additionally, support for research assistance came from a Shanahan Endowment Fellowship and a Eunice Kennedy Shriver National Institute of Child Health and Human Development training grant, T32 HD101442-01, to the UW’s Center for Studies in Demography & Ecology. Additional funding was received from seed grants from the Population Health Initiative at the University of Washington, the Sanford School of Public Policy at Duke University, and the Department of Psychology and the Learning Research and Development Center at University of Pittsburgh. Funds also came from the UW’s Department of Sociology and UW’s Department of Epidemiology.
Author Biographies
Aasli Abdi Nur, MPH, is a PhD candidate in the Department of Sociology and fellow with the Center for Studies in Demography and Ecology at the University of Washington. Her current research focuses on gender, fertility, and demographic methods, specifically the methodological approaches used to measure fertility change and family planning behavior as well as the challenges with their application. Aasli received her MPH from the Rollins School of Public Health at Emory University with a graduate certificate in maternal and child health. She has published research on women’s health, mental health, and behavior adoption approaches during the COVID-19 pandemic.
Christine Leibbrand, PhD, is an Institutional Analyst with the Office of Planning and Budgeting at University of Washington. Christine received her MA and PhD in Sociology from University of Washington, with concentrations in Demography and Social Statistics. Her current research focuses on assessing student and institutional outcomes in order to inform policy decisions at the university level and beyond. She has also published on segregation, neighborhood outcomes, the health impacts of gun violence on mothers and children, and internal migration within the United States.
Sara R. Curran, PhD, is a professor of sociology, international studies, and public policy and governance at the University of Washington. She is also the Director of the UW’s Center for Studies in Demography & Ecology. She is a demographer who studies population dynamics domestically and internationally, gender, climate change, and research methods. Her research is supported by NIH, NSF, and private foundations and has been published widely.
Elizabeth Votruba-Drzal, PhD, is a professor of psychology and Senior Scientist at the Learning Research and Development Center at the University of Pittsburgh. She is a developmental scientist who studies how socioeconomic circumstances relate to opportunities for healthy growth and development. Her research examines key contexts including families, schools, early care and education settings, neighborhoods, and public policies. Her research involves both primary data collection as well as the analysis of large, publicly-available databases. She has published extensively in leading journals in psychology and education. Her research program has been supported by grants from NIH, NSF, and several private foundations.
Christina Gibson-Davis, PhD, is a professor of public policy and sociology at Duke University. She is a family demographer who studies the health and well-being of low-income families and their children, concentrating on factors that determine familial and child flourishing, including economic and policy inputs and family structure. She has extensive expertise in using large, administrative data sets and has been the PI or co-PI on several foundation-, NSF-, and EPA-funded grants. Her work has been published in top demography, psychology, and medical journals.
References
- Ahler DJ, Roush CE, & Sood G. (2019). The micro-task market for lemons: Data quality on Amazon’s Mechanical Turk. Political Science Research and Methods, 1–20.
- Ballard AM, Cardwell T, & Young AM (2019). Fraud detection protocol for web-based research among men who have sex with men: Development and descriptive evaluation. JMIR Public Health and Surveillance, 5(1), e12344.
- Bauermeister J, Pingel E, Zimmerman M, Couper M, Carballo-Diéguez A, & Strecher VJ (2012). Data quality in web-based HIV/AIDS research: Handling invalid and suspicious data. Field Methods, 24(3), 272–291.
- Bosnjak M, & Tuten TL (2003). Prepaid and promised incentives in web surveys: An experiment. Social Science Computer Review, 21(2), 208–217.
- Bowen AM, Daniel CM, Williams ML, & Baird GL (2008). Identifying multiple submissions in Internet research: Preserving data integrity. AIDS and Behavior, 12(6), 964–973.
- Buchanan EM, & Scofield JE (2018). Methods to detect low quality data and its implication for psychological research. Behavior Research Methods, 50(6), 2586–2596.
- Dewitt J, Capistrant B, Kohli N, Rosser BS, Mitteldorf D, Merengwa E, & West W. (2018). Addressing participant validity in a small internet health survey (The Restore Study): Protocol and recommendations for survey response validation. JMIR Research Protocols, 7(4), e7655.
- Howell B. (n.d.). Dealing with bots, randoms and satisficing in online research. Retrieved March 21, 2022, from https://www.psychstudio.com/articles/bots-randoms-satisficing/
- Perkel JM, & Simone M. (2020). Melissa Simone: Survey sleuth. Nature, 579(7799), 461.
- Pozzar R, Hammer MJ, Underhill-Blazey M, Wright AA, Tulsky JA, Hong F, Gundersen DA, & Berry DL (2020). Threats of bots and other bad actors to data quality following research participant recruitment through social media: Cross-sectional questionnaire. Journal of Medical Internet Research, 22(10), e23021.
- Shanahan T. (2018). Are you paying bots to take your online survey? Fors Marsh Group. https://www.forsmarshgroup.com/knowledge/news-blog/posts/2018/march/are-you-paying-bots-to-take-your-online-survey/
- Simone M. (2019). Bots started sabotaging my online research. I fought back. STAT. https://www.statnews.com/2019/11/21/bots-started-sabotaging-my-online-research-i-fought-back/
- Storozuk A, Ashley M, Delage V, & Maloney E. (2020). Got bots? Practical recommendations to protect online survey data from bot attacks. The Quantitative Methods for Psychology, 16, 472–481.
- Teitcher JEF, Bockting WO, Bauermeister JA, Hoefer CJ, Miner MH, & Klitzman RL (2015). Detecting, preventing, and responding to “fraudsters” in internet research: Ethics and tradeoffs. The Journal of Law, Medicine & Ethics, 43(1), 116–133.