Author manuscript; available in PMC: 2023 Nov 16.
Published in final edited form as: J Med Syst. 2022 Nov 16;46(12):96. doi: 10.1007/s10916-022-01880-6

Table 4:

Data utility for original and sifted MIMIC III text (n=44,423) according to the constructed BERT model $f(\cdot)$, which maps patient discharge summaries to 3-class CCIs. Original text was preprocessed by truncating long documents to 512 tokens. Summarized text was preprocessed by applying the TextRank algorithm and truncating the ranked summaries to 512 tokens per document. The label prediction accuracy records $\frac{1}{n}\sum_{i=1}^{n} I[f(W_i^*) = L_i]$ and the label prediction agreement records $\frac{1}{n}\sum_{i=1}^{n} I[f(W_i) = f(W_i^*)]$.

Preprocessing     | Original text         | Summarized text
Obfuscation level | None  | Low   | High  | None  | Low   | High
Accuracy          | 63.1% | 57.9% | 50.5% | 62.4% | 58.8% | 53.3%
Agreement         | -     | 75.6% | 61.9% | -     | 80.9% | 71.6%
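
The two metrics defined in the caption can be illustrated with a minimal sketch. It assumes that W_i is the original document, W_i* its obfuscated counterpart, and L_i the true label; the caption does not spell these out. The classifier `toy_f` and the sample documents are hypothetical stand-ins, not the BERT model or the MIMIC III data.

```python
from typing import Callable, Sequence


def prediction_accuracy(f: Callable[[str], int],
                        sifted_docs: Sequence[str],
                        labels: Sequence[int]) -> float:
    """Fraction of documents where f(W_i*) equals the true label L_i."""
    hits = sum(1 for doc, label in zip(sifted_docs, labels) if f(doc) == label)
    return hits / len(labels)


def prediction_agreement(f: Callable[[str], int],
                         original_docs: Sequence[str],
                         sifted_docs: Sequence[str]) -> float:
    """Fraction of documents where f(W_i) equals f(W_i*)."""
    hits = sum(1 for w, w_star in zip(original_docs, sifted_docs)
               if f(w) == f(w_star))
    return hits / len(original_docs)


def toy_f(doc: str) -> int:
    """Hypothetical 3-class classifier: buckets a document by word count."""
    return min(len(doc.split()) // 3, 2)


# Hypothetical data: four original documents, their obfuscated versions,
# and made-up labels in {0, 1, 2}.
originals = ["a b c d e f g", "a b c d", "a b", "a b c d e"]
sifted = ["a b c d e f", "a b", "a b", "a b c"]
labels = [2, 1, 1, 1]

accuracy = prediction_accuracy(toy_f, sifted, labels)
agreement = prediction_agreement(toy_f, originals, sifted)
print(f"accuracy={accuracy:.2f}, agreement={agreement:.2f}")
```

Agreement needs no labels, since it compares the model's own predictions before and after obfuscation. This matches the "None" columns of the table, where agreement is not reported because the two inputs coincide.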