Van der Stigchel et al. (1) rightly extend concerns about AI contamination beyond surveys to behavioral experiments. Their findings suggest that even millisecond-level behavioral data may now be compromised. Their letter prompted us to reconsider the original paper (2), which we now believe understated the problem. Here, we present direct evidence that AI contamination of online research panels is not “potential” but a measurable present reality: Over 4% of respondents in a Prolific sample show clear signs of AI assistance. Critically, this figure captures only the least sophisticated AI users and thus represents a lower bound on true prevalence.
We audited Prolific by embedding code to monitor keystroke behavior during open-ended responses (n = 2,898). Our method flagged respondents showing copy-paste events or typing speeds exceeding 800 characters per minute—four times the human average. Among flagged respondents, median effective speed exceeded 2,400 CPM. Our method flagged 3.1% of respondents; Prolific’s independent detection tool flagged 3.0%. However, only 41% of flagged cases were caught by both methods, indicating nonoverlapping detection. Flagged responses often exhibit characteristic AI formatting: structured headers (e.g., “Simplicity and Clarity”), balanced multipoint arguments, and formal prose misaligned with reported educational attainment.
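The flagging rule described above can be sketched as follows. This is a minimal illustration, not the audit code: the `ResponseLog` record, its field names, and the choice to compute speed as submitted characters over elapsed typing time are assumptions. Only the 800 CPM threshold and the copy-paste criterion come from the text.

```python
from dataclasses import dataclass

CPM_THRESHOLD = 800  # characters per minute, the cutoff stated in the text


@dataclass
class ResponseLog:
    """Hypothetical per-response keystroke summary (all field names are assumptions)."""
    n_chars: int           # characters in the submitted open-ended answer
    typing_seconds: float  # elapsed time between first and last keystroke
    paste_events: int      # paste events captured in the text box


def effective_cpm(log: ResponseLog) -> float:
    """Characters per minute implied by answer length and elapsed typing time."""
    if log.typing_seconds <= 0:
        # Text that appears with no measurable typing time is treated as infinitely fast.
        return float("inf") if log.n_chars > 0 else 0.0
    return log.n_chars * 60.0 / log.typing_seconds


def is_flagged(log: ResponseLog) -> bool:
    """Flag a response if it contains any paste event or exceeds the speed cutoff."""
    return log.paste_events > 0 or effective_cpm(log) > CPM_THRESHOLD
```

Under these assumptions, a 400-character answer that appears within 10 seconds implies 2,400 CPM and is flagged, whereas a 200-character answer typed over a minute with no paste events is not.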
Combined, 4.4% (128 respondents, 95% CI: 3.7 to 5.2%) were flagged by at least one method (Fig. 1)—a contamination rate exceeding the margin of error (±0.7 pp). Simulations show that if 4.4% of respondents made random selections, this would bias treatment effects downward and reduce power by up to 3.5 percentage points. However, since synthetic respondents are not random (2), 4.4% AI contamination would potentially inflate average treatment effects by 2.3 percentage points.
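The downward bias under random responding follows from a simple mixture argument, sketched below. This is an illustrative calculation and not the simulation reported here: it assumes that random responders answer identically in expectation across treatment and control, so they contribute no difference between arms, and the 10-percentage-point true effect is a hypothetical value chosen for the example.

```python
def observed_effect(true_effect: float, contamination: float) -> float:
    """Expected difference in means when a share of respondents answers at random.

    Genuine respondents (share 1 - contamination) show the true effect;
    random responders (share contamination) show zero difference between arms.
    """
    return (1.0 - contamination) * true_effect + contamination * 0.0


true_effect = 0.10  # hypothetical true effect (10 percentage points)
obs = observed_effect(true_effect, 0.044)  # 4.4% contamination, from the text
shrinkage = true_effect - obs
print(f"observed effect: {obs:.4f}, attenuation: {shrinkage:.4f}")
```

The attenuation scales with both the contamination share and the size of the true effect. The inflation scenario in the text requires a model of nonrandom synthetic responses, which this sketch does not attempt.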
Fig. 1.
Agreement between two independent AI detection methods in a Prolific audit (n = 2,898). Combined, 128 respondents (4.4%, 95% CI: 3.7 to 5.2%) were flagged by at least one method—a contamination rate exceeding the survey margin of error. Only 41% of flagged cases were caught by both methods, suggesting the two approaches detect partially distinct populations of AI-assisted respondents.
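The reported interval can be checked with a normal-approximation (Wald) confidence interval for a proportion. The text does not state which interval was used, so the Wald form is an assumption; it does reproduce the reported bounds and the ±0.7 pp margin from the counts given (128 of 2,898).

```python
import math


def wald_ci(k: int, n: int, z: float = 1.96):
    """Return (proportion, lower, upper) for a Wald interval on k successes out of n."""
    p = k / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p, p - half_width, p + half_width


# Counts taken from the text: 128 flagged respondents out of 2,898.
p, lower, upper = wald_ci(128, 2898)
margin = upper - p
print(f"{100 * p:.1f}% (95% CI: {100 * lower:.1f} to {100 * upper:.1f}%), "
      f"margin ±{100 * margin:.1f} pp")
```

Other interval types (for example, Wilson) would give slightly different bounds at this sample size.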
These estimates represent a lower bound on true AI prevalence. Typing detection captures only respondents using AI chat tools or browser automation assistants. It cannot detect purpose-built automated survey tools (2). Some argue that mouse movements and user agents can identify such tools (3). This assumption is incorrect. Using PyAutoGUI, we built a system simulating realistic keystrokes with typos and corrections, typing speeds drawn from empirically calibrated distributions, and human-like mouse movements with hesitation, variable acceleration/deceleration, overshooting, and off-center clicks. Our detections represent only the most careless AI respondents.
This undermines vendor detection claims. CloudResearch (3) reports 100% detection rates via mouse movement analysis and browser profiling but has released no replication data or methods. Moreover, most respondents use mobile devices, where mouse movements are meaningless, and desktop tools can masquerade as mobile devices to bypass this detection. Detection based on behavioral tells faces the same obsolescence as attention checks (2): Once tells are documented, they can be simulated.
Van der Stigchel et al. (1) are correct that science needs methods that verify human presence, not merely plausible signals. At minimum, every paper using survey data should disclose its AI detection efforts and observed flagging rates as a peer review requirement. In the longer term, research may need to shift toward mobile-only platforms, where automation is harder, or adopt invasive verification such as GPS geolocation, gyroscope monitoring, biometric authentication, or camera-based presence confirmation, each of which carries significant privacy and accessibility tradeoffs.
Acknowledgments
Author contributions
S.J.W. designed research; S.J.W. and S.F. performed research; S.J.W. and S.F. analyzed data; and S.J.W. wrote the paper.
Competing interests
The authors declare no competing interest.
References
- 1. S. Van der Stigchel, C. Strauch, B. de Zwart, L. Van Maanen, Will online behavioral research follow the fate of online survey research? Proc. Natl. Acad. Sci. U.S.A. 123, e2535585123 (2025).
- 2. S. J. Westwood, The potential existential threat of large language models to online survey research. Proc. Natl. Acad. Sci. U.S.A. 122, e2518075122 (2025).
- 3. CloudResearch, AI agent detection (2025). https://www.cloudresearch.com/resources/blog/ai-agent-detection/. Accessed 12 December 2025.

