Author manuscript; available in PMC: 2023 Nov 16.

Published in final edited form as: J Med Syst. 2022 Nov 16;46(12):96. doi: 10.1007/s10916-022-01880-6

Table 3:

Data utility of the original, sifted, and naively suppressed CDC text (n = 86,666), measured with the constructed BERT model f(·) that maps CDC injury records to 5-class OIICS labels. Label prediction accuracy is $\frac{1}{n} \sum_{i = 1}^{n} I [f (W_{i}^{*}) = L_{i}]$ and label prediction agreement is $\frac{1}{n} \sum_{i = 1}^{n} I [f (W_{i}) = f (W_{i}^{*})]$, where, as the two formulas imply, $W_{i}$ is the original text of record $i$, $W_{i}^{*}$ is its obfuscated version, and $L_{i}$ is its OIICS label.

Obfuscation Level	DataSifterText Accuracy	Naive Suppressing Accuracy	DataSifterText Agreement
Original	98.5%	-	-
Sifted low obfuscation	86.5%	85.7%	87.3%
Sifted medium obfuscation	74.5%	71.1%	75.0%
Sifted high obfuscation	60.9%	48.8%	61.1%
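
The two metrics defined in the caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `label_accuracy` and `label_agreement` and the toy label lists are hypothetical, and the sketch assumes the model predictions $f(W_{i})$ and $f(W_{i}^{*})$ have already been computed and stored as integer class labels.

```python
from typing import Sequence


def _match_rate(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean of the indicator I[a_i == b_i] over all records."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def label_accuracy(pred_obfuscated: Sequence[int],
                   true_labels: Sequence[int]) -> float:
    """(1/n) * sum I[f(W_i*) = L_i]: predictions on obfuscated text vs. labels."""
    return _match_rate(pred_obfuscated, true_labels)


def label_agreement(pred_original: Sequence[int],
                    pred_obfuscated: Sequence[int]) -> float:
    """(1/n) * sum I[f(W_i) = f(W_i*)]: predictions on original vs. obfuscated text."""
    return _match_rate(pred_original, pred_obfuscated)


# Toy data (made up for illustration, not taken from the study).
true_labels = [0, 1, 2, 3, 4, 0]        # L_i
pred_original = [0, 1, 2, 3, 4, 1]      # f(W_i)
pred_obfuscated = [0, 1, 2, 0, 4, 1]    # f(W_i*)

accuracy = label_accuracy(pred_obfuscated, true_labels)
agreement = label_agreement(pred_original, pred_obfuscated)
print(f"accuracy = {accuracy:.3f}, agreement = {agreement:.3f}")
```

In this toy example agreement exceeds accuracy because the last record is misclassified in the same way before and after obfuscation, so it counts toward agreement but not toward accuracy. Agreement is also slightly higher than accuracy in every sifted row of the table.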