Author manuscript; available in PMC 2022 Jun 17.
Published in final edited form as: JAMA Psychiatry. 2021 Nov 1;78(11):1185–1186. doi:10.1001/jamapsychiatry.2021.2315

Navigating the benefits and pitfalls of online psychiatric data collection

Brittany Quagan 1, Scott W Woods 1, Albert R Powers 1,*
PMCID: PMC9205608  NIHMSID: NIHMS1816074  PMID: 34431976

Behavioral and epidemiological investigators have moved toward gathering data online in recent months: the number of PubMed hits for “online data collection” in 2021 already exceeds the average yearly number of hits for the decade preceding the COVID-19 pandemic. Online data collection opens up great opportunities to gather large amounts of data inexpensively and quickly. However, it also leaves researchers susceptible to malingering and fraud, which have been shown to be common in some settings [1]. Psychiatric research, because it relies on behavior and subjective reports of symptomatology, may be especially susceptible to fraud and poor data quality in these settings. Here, we offer a perspective on strategies to maximize the benefits of online data collection while minimizing the attendant risks.

Without geographic limitations, researchers can draw on a pool of participants potentially far from research hubs. This enhances representation of rural groups, mitigates the generalizability problems that come with repeated participation by the same small set of individuals, and supports a broader understanding of experiences across ethnic backgrounds and of differences in care across these groups. Online data collection can also be far less resource-intensive than in-person work. Adequate recruitment and data collection from large samples can take many years in person; online, scheduling and meeting with participants become unnecessary, and participants can perform research-related tasks at any time of day with little to no assistance from investigators, freeing staff to focus on other tasks.

Despite its conveniences, however, online data collection is not without risks. Without appropriate safeguards, it can yield data of poor or unknown quality. Fortunately, simple, low-cost quality-control strategies can substantially mitigate this risk.

Risk mitigation starts with recruitment. Online platforms can certainly attract attention to a study, but they are also accessed by individuals whose primary motivation may be financial compensation and who may offer inaccurate information, either to maximize the likelihood of meeting inclusion criteria or to minimize the time spent on a study. Engaging existing networks may offer a safer alternative, particularly if those networks provide clinical support, social support, or other specialized services; in essence, the more specialized the group, the less likely it is to have been discovered by individuals outside the target population who might be driven to feign group membership and the accompanying symptoms. Anecdotally, our recent study showed a marked uptick in fraudulent or repeated participation after broad engagement with social media, whereas recruitment through existing participant networks produced no such incidents. This may be because the ease of joining online groups, and the frequency with which they are engaged for research, makes them attractive targets for fraud. Beyond these benefits, engaging with an established network also aids participant retention, helping to compensate for the absence of personal contact in online settings.

The infrastructure in place for online data collection should also allow for real-time tracking of data quality. HIPAA-compliant databases such as REDCap [2] or Qualtrics track data acquired over multiple time points; because all engagement is logged automatically, researchers can quickly pinpoint the time and location of participation and identify potentially fraudulent activity. These databases can also analyze a participant’s browser, operating system, and location to detect possible bots or duplicate participation. Repeated data fields (such as email addresses and phone numbers) can be flagged to trigger additional identity-verification procedures. Of course, low-level requirements such as a unique email address per participant are easily circumvented, because email addresses are easy to obtain. For this reason, short message service (SMS) verification of participants’ phone numbers may offer a more secure method: SMS verification programs can be set to automatically disallow repeat phone numbers or voice-over-internet-protocol (VoIP) numbers, offering the safeguard of requiring a real cellular number. Internet protocol (IP) address tracking is already common in research and readily available; device-fingerprinting programs go further, looking up IP addresses and scoring the potential for fraud on the basis of other activity, behaviors, and location [3]. Device-fingerprinting functionalities that flag suspicious activity are now commonly included as add-on services in several online data-collection platforms. Another way to ensure that participants are located within the required parameters of the study is to mail physical items to a participant’s mailing address, such as assessments that are typically completed in person, or a physical gift card rather than an electronic one. Addresses can be requested at the start of the study so that those who may be dishonest about their location cannot proceed.
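As a concrete illustration of the duplicate-field flagging described above, the minimal sketch below checks enrollment records for reused email addresses, phone numbers, and IP addresses. The field names, record structure, and function are our own hypothetical assumptions rather than the built-in tooling of REDCap or Qualtrics; a real deployment would run such a check against the platform’s exported enrollment data.

```python
# Minimal sketch (hypothetical field names; NOT the REDCap/Qualtrics API):
# flag enrollment records that reuse an email address, phone number, or IP
# address, so staff can trigger additional identity-verification steps.
from collections import defaultdict

def flag_duplicates(records, fields=("email", "phone", "ip_address")):
    """Return the IDs of all records sharing any tracked field value."""
    seen = {field: defaultdict(list) for field in fields}
    flagged = set()
    for rec in records:
        for field in fields:
            value = rec.get(field)
            if not value:
                continue
            value = value.strip().lower()  # normalize before comparing
            if seen[field][value]:  # value already used by earlier record(s)
                flagged.add(rec["id"])
                flagged.update(seen[field][value])
            seen[field][value].append(rec["id"])
    return flagged

records = [
    {"id": "P001", "email": "a@example.com", "phone": "5550101", "ip_address": "203.0.113.5"},
    {"id": "P002", "email": "b@example.com", "phone": "5550102", "ip_address": "203.0.113.9"},
    {"id": "P003", "email": "A@example.com", "phone": "5550103", "ip_address": "198.51.100.7"},
]
print(flag_duplicates(records))  # {'P001', 'P003'}: same email after normalization
```

A flag from a check like this need not exclude a participant outright; consistent with the approach above, it can simply route the record to SMS verification or a video visit.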

Participants may also be required to take part in a video visit before proceeding. This gives researchers the opportunity to cross-check information already provided. In addition, mandatory video meetings discourage repeated participation and provide a valuable opportunity to modify study procedures when flags produce false hits. Individuals unwilling to participate in video meetings can be excluded. This approach may discourage some legitimate participants from completing some studies; however, we find that most participants are willing to comply if this possibility is made clear at consent. Webcams that capture facial recognition data may also aid in detecting repeat participation, although the quality and cost of currently available options may limit their use at this time.

Beyond fraudulent participation, assuring the quality of clinical data is also important when using online-only methods: study candidates can reveal their true identity yet mislead the study about their symptoms. Fortunately, several tools from forensic psychiatry were built with malingering detection in mind, and most are adaptable to online assessment. The Miller Forensic Assessment of Symptoms Test (M-FAST) [4] assesses for malingering by providing individual scores for inconsistency as well as for unusual, rare, or extreme symptoms, and elements of the Structured Interview of Reported Symptoms (SIRS) [5] may be used for consistency checks across a broad range of psychiatric symptoms. A positive flag can trigger a more in-depth clinical assessment.
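To make the consistency-check idea concrete, the sketch below scores disagreement across paired items that probe the same symptom in different wording, in the spirit of SIRS-style consistency items. The item identifiers, pairings, and cutoff are our own illustrative assumptions, not the scoring rules of the copyrighted M-FAST or SIRS instruments.

```python
# Illustrative consistency check in the spirit of SIRS-style consistency
# items (hypothetical items and thresholds; NOT the actual scoring of the
# copyrighted M-FAST or SIRS-2). Each pair re-asks one symptom in different
# wording; large disagreement across several pairs flags the record.

# Hypothetical paired item IDs; responses are 0-4 Likert ratings.
ITEM_PAIRS = [
    ("hears_voices_a", "hears_voices_b"),
    ("low_mood_a", "low_mood_b"),
    ("poor_sleep_a", "poor_sleep_b"),
]

def inconsistency_score(responses, pairs=ITEM_PAIRS, tolerance=1):
    """Count item pairs whose ratings differ by more than `tolerance`."""
    return sum(
        abs(responses[a] - responses[b]) > tolerance
        for a, b in pairs
        if a in responses and b in responses
    )

responses = {"hears_voices_a": 4, "hears_voices_b": 0,
             "low_mood_a": 2, "low_mood_b": 2,
             "poor_sleep_a": 3, "poor_sleep_b": 1}
if inconsistency_score(responses) >= 2:  # illustrative cutoff
    print("Flag: route to in-depth clinical assessment")
```

As in the text, a positive flag here serves only as a trigger for clinician follow-up, not as an automatic exclusion.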

Many online studies also gather behavioral measures, which come with the challenge of ensuring interpretable data in the face of sometimes vast differences in hardware and in the fidelity of stimulus presentation. This challenge can be addressed by minimizing heterogeneity in hardware implementation. We created qualifying tasks that use participant responses to confirm the use of headphones [6] and adequate audio volume and screen brightness. Additionally, rather than presenting stimuli at fixed intensities, we used participant responses to calibrate stimulus intensity with highly efficient adaptive procedures [7]. The use of graded stimuli in the tasks themselves provides a built-in structural check that responses are not random, and attentional and consistency checks can confirm sustained attention. Although still in development, methods that use webcams for eye tracking offer an additional level of quality assurance in online tasks. Flags for inconsistencies can trigger a repeat of the tasks in question and/or personalized assistance via videoconferencing.
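One simple form of the structural check described above, written under our own assumptions about task design rather than as the authors’ exact procedure, is to verify that detection rates climb with stimulus intensity: a participant responding at random will endorse high- and low-intensity catch stimuli at roughly equal rates.

```python
# Minimal sketch of a structural non-randomness check (an illustrative
# assumption, not the study's exact procedure): with graded stimuli,
# genuine responding should detect high-intensity stimuli far more often
# than near-threshold ones, whereas random responding will not.

def passes_structural_check(trials, min_gap=0.4):
    """trials: (intensity_bin, detected) pairs, with bins 'low' or 'high'."""
    def yes_rate(bin_name):
        hits = [detected for bin_, detected in trials if bin_ == bin_name]
        return sum(hits) / len(hits) if hits else 0.0
    # Require the detection rate to climb with intensity by at least min_gap.
    return yes_rate("high") - yes_rate("low") >= min_gap

trials = [("low", False), ("low", True), ("low", False), ("low", False),
          ("high", True), ("high", True), ("high", True), ("high", False)]
print(passes_structural_check(trials))  # True: 0.75 high vs 0.25 low
```

A failed check of this kind would then feed the same pathway as other flags: a repeat of the task in question or personalized assistance via videoconferencing.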

As a field, we expect authors to explain their methods in enough detail to allow for judgment of data quality and replication of results. Papers that include data collected online should likewise describe the steps taken to ensure the integrity of those data. With adequate safeguards in place, it is possible to retain the benefits of online data collection while minimizing the factors that endanger data quality. We hope that the suggestions outlined here will provide an impetus for the field-wide establishment of best practices for using these important capabilities.

References

1. Teitcher JEF, Bockting WO, Bauermeister JA, Hoefer CJ, Miner MH, Klitzman RL. Detecting, preventing, and responding to “fraudsters” in internet research: ethics and tradeoffs. Journal of Law, Medicine & Ethics. 2015;43(1):116–133. doi:10.1111/jlme.12200
2. Harris PA, Taylor R, Minor BL, et al. The REDCap consortium: building an international community of software platform partners. J Biomed Inform. 2019;95:103208.
3. Moalosi M, Hlomani H, Phefo OSD. Combating credit card fraud with online behavioural targeting and device fingerprinting. International Journal of Electronic Security and Digital Forensics. 2019;11(1):46. doi:10.1504/ijesdf.2019.10016642
4. Miller HA. Miller Forensic Assessment of Symptoms Test (M-FAST). In: Encyclopedia of Psychology and Law. doi:10.4135/9781412959537.n195
5. Rogers R, Sewell KW, Gillard ND. SIRS-2: Structured Interview of Reported Symptoms: Professional Manual. 2010.
6. Woods KJP, Siegel MH, Traer J, McDermott JH. Headphone screening to facilitate web-based auditory experiments. Atten Percept Psychophys. 2017;79(7):2064–2072.
7. Watson AB, Pelli DG. QUEST: a Bayesian adaptive psychometric method. Percept Psychophys. 1983;33(2):113–120.
