2023 Aug 29;18(8):e0290691. doi: 10.1371/journal.pone.0290691

Table 4. Blinded guess of question writer (i.e. AI vs human).

| Assessor | AI (total = 50): correct guesses, % | Human (total = 50): correct guesses, % | Correlation | p |
|---|---|---|---|---|
| Assessor A | 24, 48% | 23, 46% | -0.14–0.26 | 0.55 |
| Assessor B | 14, 28% | 41, 82% | -0.38–0.10 | 0.24 |
| Assessor C | 33, 66% | 24, 48% | -0.35–0.06 | 0.16 |
| Assessor D | 27, 53% | 26, 52% | -0.26–0.14 | 0.55 |
| Assessor E | 26, 52% | 32, 64% | -0.36–0.04 | 0.11 |
| GPT-2 Output Detector | 7, 14% | 45, 90% | -0.40–0.21 | 0.54 |