Table 4:
Data utility for original and sifted MIMIC III text (n=44,423) according to the constructed BERT model (f(·)) that maps a patient discharge summary to 3-class CCIs. Original text was preprocessed by truncating long documents to 512 tokens. Summarized text was preprocessed by applying the TextRank algorithm and truncating the ranked summaries to 512 tokens per document. The label prediction accuracy records how often f(·) predicts the true label, and the label prediction agreement records how often the prediction on obfuscated text matches the prediction on the corresponding text without obfuscation.
| Preprocessing | Original text | | | Summarized text | | |
|---|---|---|---|---|---|---|
| Obfuscation level | None | Low | High | None | Low | High |
| Accuracy | 63.1% | 57.9% | 50.5% | 62.4% | 58.8% | 53.3% |
| Agreement | - | 75.6% | 61.9% | - | 80.9% | 71.6% |
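
The two rows of Table 4 can be read as match rates between label sequences. The following is a minimal sketch, not the authors' evaluation code: it assumes that accuracy compares the predictions of f(·) with the true labels, and that agreement compares predictions on obfuscated text with predictions on the same documents without obfuscation (an interpretation suggested by the "-" entries in the "None" columns). The helper `match_rate` and the toy labels are hypothetical.

```python
from typing import Sequence


def match_rate(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions at which two equal-length label sequences agree."""
    if len(a) != len(b):
        raise ValueError("label sequences must have the same length")
    if not a:
        raise ValueError("label sequences must be non-empty")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Toy 3-class labels, invented for illustration only (not MIMIC III data).
true_labels = [0, 1, 2, 1, 0, 2]
pred_unobfuscated = [0, 1, 2, 0, 0, 1]  # f applied to text with no obfuscation
pred_obfuscated = [0, 1, 1, 0, 2, 1]    # f applied to the obfuscated text

# Accuracy: predictions on obfuscated text versus the true labels.
accuracy = match_rate(pred_obfuscated, true_labels)

# Agreement: predictions on obfuscated text versus predictions without obfuscation.
agreement = match_rate(pred_obfuscated, pred_unobfuscated)
```

Under this reading, agreement can stay high even when accuracy falls, because it measures how stable the model's output is under obfuscation and not whether that output is correct.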