
Table 6. Results of detecting ChatGPT-generated medical text in the medical abstract and radiology report data sets. Values are reported as mean (SD).

Model / data set          Accuracy        Precision       Recall          F1 score
Perplexity-CLS (a)
  Medical abstract        0.847 (0.014)   0.849 (0.015)   0.847 (0.014)   0.847 (0.014)
  Radiology report        0.743 (0.011)   0.756 (0.015)   0.743 (0.011)   0.740 (0.011)
CART (b)
  Medical abstract        0.869 (0.019)   0.888 (0.012)   0.867 (0.019)   0.867 (0.020)
  Radiology report        0.831 (0.004)   0.837 (0.007)   0.831 (0.004)   0.830 (0.005)
XGBoost
  Medical abstract        0.957 (0.007)   0.958 (0.006)   0.957 (0.007)   0.957 (0.007)
  Radiology report        0.924 (0.007)   0.925 (0.006)   0.924 (0.007)   0.924 (0.007)
BERT (c)
  Medical abstract        0.982 (0.003)   0.982 (0.003)   0.982 (0.003)   0.982 (0.003)
  Radiology report        0.956 (0.033)   0.957 (0.032)   0.956 (0.033)   0.956 (0.033)

(a) Perplexity-CLS: perplexity classification.

(b) CART: classification and regression trees.

(c) BERT: bidirectional encoder representations from transformers.
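Accuracy, precision, recall, and F1 score are standard classification metrics, reported here as mean (SD) over repeated evaluations. As a minimal illustrative sketch (not the authors' code), the snippet below shows how such summary statistics could be computed with scikit-learn; the example run data, the weighted averaging choice, and all variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical per-run results: each tuple holds the true labels and the labels
# predicted by one detector (e.g., XGBoost) on one evaluation split.
# Label convention (assumed): 1 = ChatGPT-generated, 0 = human-written.
runs = [
    (np.array([0, 1, 1, 0, 1, 0]), np.array([0, 1, 0, 0, 1, 0])),
    (np.array([1, 0, 1, 1, 0, 1]), np.array([1, 0, 1, 0, 0, 1])),
]

scores = {"accuracy": [], "precision": [], "recall": [], "f1": []}
for y_true, y_pred in runs:
    scores["accuracy"].append(accuracy_score(y_true, y_pred))
    # Weighted averaging is one plausible choice; the paper may use another.
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0
    )
    scores["precision"].append(p)
    scores["recall"].append(r)
    scores["f1"].append(f1)

# Report each metric as mean (SD) across the runs, matching the table format.
for name, values in scores.items():
    print(f"{name}: {np.mean(values):.3f} ({np.std(values):.3f})")
```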