Author manuscript; available in PMC: 2023 Nov 16.
Published in final edited form as: J Med Syst. 2022 Nov 16;46(12):96. doi: 10.1007/s10916-022-01880-6

Table 3:

Data utility for original, sifted, and naively suppressed CDC text (n = 86,666) according to the constructed BERT model $f(\cdot)$ that maps the CDC injury records to 5-class OIICS labels. The label prediction accuracy is $\frac{1}{n}\sum_{i=1}^{n} I[f(W_i^*) = L_i]$ and the label prediction agreement is $\frac{1}{n}\sum_{i=1}^{n} I[f(W_i) = f(W_i^*)]$.

| Obfuscation Level | DataSifterText Accuracy | Naive Suppressing Accuracy | DataSifterText Agreement |
|---|---|---|---|
| Original | 98.5% | - | - |
| Sifted low obfuscation | 86.5% | 85.7% | 87.3% |
| Sifted medium obfuscation | 74.5% | 71.1% | 75.0% |
| Sifted high obfuscation | 60.9% | 48.8% | 61.1% |
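
The two metrics in the caption can be illustrated with a minimal Python sketch. It assumes, since the caption does not say so explicitly, that W_i is the original record, W_i* its obfuscated counterpart, and L_i the true OIICS label. The function names and the keyword-based `toy_predict` classifier are hypothetical stand-ins for the BERT model f(·), and the records are invented toy strings, not CDC data.

```python
from typing import Callable, Sequence


def label_accuracy(
    predict: Callable[[str], int],
    obfuscated_texts: Sequence[str],
    labels: Sequence[int],
) -> float:
    """Fraction of obfuscated records whose predicted label equals the true label."""
    if not obfuscated_texts or len(obfuscated_texts) != len(labels):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(predict(w_star) == l for w_star, l in zip(obfuscated_texts, labels))
    return hits / len(labels)


def label_agreement(
    predict: Callable[[str], int],
    original_texts: Sequence[str],
    obfuscated_texts: Sequence[str],
) -> float:
    """Fraction of records where predictions on original and obfuscated text match."""
    if not original_texts or len(original_texts) != len(obfuscated_texts):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        predict(w) == predict(w_star)
        for w, w_star in zip(original_texts, obfuscated_texts)
    )
    return hits / len(original_texts)


def toy_predict(text: str) -> int:
    # Hypothetical two-class stand-in for the 5-class BERT classifier f(.)
    return 1 if "fall" in text.lower() else 0


# Invented toy records; the first one loses its key word after obfuscation.
originals = ["Worker had a fall from ladder", "Cut finger on saw",
             "Fall on wet floor", "Burn from hot oil"]
obfuscated = ["Worker had a slip from ladder", "Cut finger on saw",
              "Fall on wet floor", "Burn from hot oil"]
labels = [1, 0, 1, 0]

accuracy = label_accuracy(toy_predict, obfuscated, labels)
agreement = label_agreement(toy_predict, originals, obfuscated)
print(f"accuracy={accuracy:.2f}, agreement={agreement:.2f}")
```

Accuracy compares predictions on obfuscated text against the true labels, whereas agreement compares them against the model's own predictions on the original text, so agreement does not require labels at all.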