Author manuscript; available in PMC: 2023 Nov 16.

Published in final edited form as: J Med Syst. 2022 Nov 16;46(12):96. doi: 10.1007/s10916-022-01880-6

Table 4:

Data utility for original and sifted MIMIC III text (n=44,423) according to the constructed BERT model (f(·)) that maps patient discharge summaries to 3-class CCIs. Original text was preprocessed by truncating long documents to 512 tokens. Summarized text was preprocessed by applying the TextRank algorithm and truncating the ranked summaries to 512 tokens per document. The label prediction accuracy records $\frac{1}{n} \sum_{i=1}^{n} I[f(W_{i}^{*}) = L_{i}]$ and the label prediction agreement records $\frac{1}{n} \sum_{i=1}^{n} I[f(W_{i}) = f(W_{i}^{*})]$.

Preprocessing	Original text			Summarized text
Obfuscation level	None	Low	High	None	Low	High
Accuracy	63.1%	57.9%	50.5%	62.4%	58.8%	53.3%
Agreement	-	75.6%	61.9%	-	80.9%	71.6%
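
The two metrics defined in the caption can be illustrated with a minimal sketch. It assumes that W_i denotes the unobfuscated document, W_i^* its obfuscated counterpart, and L_i the reference label, which is inferred from the caption rather than stated in it. The function names and the toy predictions are hypothetical and are not taken from the paper; the model f(·) is replaced by precomputed prediction lists.

```python
from typing import Sequence


def label_accuracy(pred_obfuscated: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of documents where f(W_i^*) equals the reference label L_i."""
    if len(pred_obfuscated) != len(labels):
        raise ValueError("inputs must have the same length")
    if not labels:
        raise ValueError("inputs must be non-empty")
    return sum(p == l for p, l in zip(pred_obfuscated, labels)) / len(labels)


def label_agreement(pred_original: Sequence[int], pred_obfuscated: Sequence[int]) -> float:
    """Fraction of documents where f(W_i) equals f(W_i^*)."""
    if len(pred_original) != len(pred_obfuscated):
        raise ValueError("inputs must have the same length")
    if not pred_original:
        raise ValueError("inputs must be non-empty")
    return sum(a == b for a, b in zip(pred_original, pred_obfuscated)) / len(pred_original)


# Toy example with three classes (0, 1, 2); values are illustrative only.
labels = [0, 1, 2, 1]            # L_i (assumed: reference labels)
pred_original = [0, 1, 1, 1]     # f(W_i) (assumed: predictions on unobfuscated text)
pred_obfuscated = [0, 2, 1, 1]   # f(W_i^*) (assumed: predictions on obfuscated text)

accuracy = label_accuracy(pred_obfuscated, labels)            # 0.5
agreement = label_agreement(pred_original, pred_obfuscated)   # 0.75
print(f"accuracy={accuracy:.3f} agreement={agreement:.3f}")
```

Under this reading, agreement is undefined at obfuscation level "None" because W_i^* coincides with W_i, which would be consistent with the "-" entries in the table.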