2023 Dec 28;9:e48904. doi: 10.2196/48904

Table 6.

Results of detecting ChatGPT-generated medical text in the medical abstract and radiology data sets.

All values are mean (SD).

Model	Data set	Accuracy	Precision	Recall	F₁ score
Perplexity-CLS^a	Medical abstract	0.847 (0.014)	0.849 (0.015)	0.847 (0.014)	0.847 (0.014)
Perplexity-CLS^a	Radiology report	0.743 (0.011)	0.756 (0.015)	0.743 (0.011)	0.74 (0.011)
CART^b	Medical abstract	0.869 (0.019)	0.888 (0.012)	0.867 (0.019)	0.867 (0.02)
CART^b	Radiology report	0.831 (0.004)	0.837 (0.007)	0.831 (0.004)	0.83 (0.005)
XGBoost	Medical abstract	0.957 (0.007)	0.958 (0.006)	0.957 (0.007)	0.957 (0.007)
XGBoost	Radiology report	0.924 (0.007)	0.925 (0.006)	0.924 (0.007)	0.924 (0.007)
BERT^c	Medical abstract	0.982 (0.003)	0.982 (0.003)	0.982 (0.003)	0.982 (0.003)
BERT^c	Radiology report	0.956 (0.033)	0.957 (0.032)	0.956 (0.033)	0.956 (0.033)

^aPerplexity-CLS: Perplexity-classification.

^bCART: classification and regression trees.

^cBERT: bidirectional encoder representations from transformers.
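
For readers who want to see how figures of this kind are typically produced, the following is a minimal sketch, not the authors' evaluation code. It assumes binary labels (1 for ChatGPT-generated text, 0 for human-written text), support-weighted averaging of the per-class metrics, and a sample SD taken over repeated runs; the table does not state any of these choices. The function names `weighted_metrics` and `summarize` are hypothetical.

```python
from statistics import mean, stdev


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1.

    Assumption: weighted averaging over classes; the table does not
    say which averaging scheme was used.
    """
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / n
    precision = recall = f1 = 0.0
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = (tp + fn) / n  # class support as a fraction of all samples
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def summarize(runs):
    """Format each metric as 'mean (SD)' across runs, to three decimals."""
    summary = {}
    for key in runs[0]:
        values = [run[key] for run in runs]
        summary[key] = f"{mean(values):.3f} ({stdev(values):.3f})"
    return summary


# Toy data (hypothetical, not taken from the study): two evaluation runs.
y_true = [1, 1, 1, 0, 0, 0]
runs = [
    weighted_metrics(y_true, [1, 1, 0, 0, 0, 0]),  # one missed positive
    weighted_metrics(y_true, [1, 1, 1, 0, 0, 0]),  # all correct
]
print(summarize(runs))
```

Under weighted averaging, recall is always equal to accuracy, because each class's recall is weighted by its share of the samples. That would be consistent with most rows above, where the two columns match, but it remains an assumption about how the table was computed.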