Table 3:
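Data utility for original, sifted, and naively suppressed CDC text (n=86,666) according to the constructed BERT model (f(·)) that maps the CDC injury records to 5-class OIICS labels. The label prediction accuracy records the proportion of records for which the label predicted by f(·) from the (obfuscated) text matches the true label, and the label prediction agreement records the proportion for which that prediction matches the label f(·) predicts from the original text.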
| Obfuscation Level | DataSifterText Accuracy | Naive Suppressing Accuracy | DataSifterText Agreement |
|---|---|---|---|
| Original | 98.5% | - | - |
| Sifted low obfuscation | 86.5% | 85.7% | 87.3% |
| Sifted medium obfuscation | 74.5% | 71.1% | 75.0% |
| Sifted high obfuscation | 60.9% | 48.8% | 61.1% |
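
The two metrics reported in Table 3 can be illustrated with a minimal sketch. It assumes that accuracy compares the model's predictions on obfuscated text with the true labels, and that agreement compares them with the model's predictions on the original text; this reading is an assumption, as are the function names and the toy label vectors below, none of which come from the study.

```python
from typing import Sequence


def match_rate(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions at which two label sequences coincide."""
    if len(a) != len(b):
        raise ValueError("label sequences must have the same length")
    if len(a) == 0:
        raise ValueError("label sequences must be non-empty")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def accuracy(pred_obfuscated: Sequence[int], true_labels: Sequence[int]) -> float:
    # Assumed definition: share of records where f(obfuscated text) equals the true label.
    return match_rate(pred_obfuscated, true_labels)


def agreement(pred_obfuscated: Sequence[int], pred_original: Sequence[int]) -> float:
    # Assumed definition: share of records where f(obfuscated text) equals f(original text).
    return match_rate(pred_obfuscated, pred_original)


# Hypothetical toy labels over 5 classes (0..4); not data from the study.
true_labels = [0, 1, 2, 3, 4, 0]
pred_original = [0, 1, 2, 3, 4, 1]    # f applied to the original text
pred_obfuscated = [0, 1, 2, 0, 4, 1]  # f applied to the obfuscated text

print(f"accuracy:  {accuracy(pred_obfuscated, true_labels):.3f}")
print(f"agreement: {agreement(pred_obfuscated, pred_original):.3f}")
```

Under these definitions, agreement can exceed accuracy whenever the model repeats on obfuscated text a mistake it already made on the original text, which is consistent with the slightly higher agreement values in the table.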