2013 Jan 16;14:10. doi: 10.1186/1471-2105-14-10

Table 5.

Descriptive statistics of the patient notes corpora

| Corpus | # Notes | # Words | # Concepts |
|---|---|---|---|
| All Informative (input) | 8,557 | 6,131,879 | 599,847 |
| Last Informative Note (baseline) | 1,247 | 435,387 | 44,145 |
| Selective-Fingerprinting, maximum similarity 0.33 | 4,524 | 3,614,409 | 337,034 |
| Selective-Fingerprinting, maximum similarity 0.25 | 3,970 | 3,283,558 | 302,159 |
| Selective-Fingerprinting, maximum similarity 0.20 | 3,645 | 3,061,854 | 278,644 |

All Informative is the input corpus; Last Informative Note is the corpus obtained by the redundancy reduction baseline; the Selective-Fingerprinting corpora are those produced by the fingerprinting redundancy reduction strategy at different maximum similarity levels.
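
To put the corpus sizes in proportion, the following minimal sketch expresses each corpus as a fraction of the All Informative input corpus. The counts are copied from Table 5; the names `TABLE_5` and `retained_fractions` are illustrative assumptions and do not come from the paper, and the sketch does not reproduce the fingerprinting method itself.

```python
# Counts copied from Table 5: (notes, words, concepts) per corpus.
TABLE_5 = {
    "All Informative (input)": (8557, 6131879, 599847),
    "Last Informative Note (baseline)": (1247, 435387, 44145),
    "Selective-Fingerprinting, max similarity 0.33": (4524, 3614409, 337034),
    "Selective-Fingerprinting, max similarity 0.25": (3970, 3283558, 302159),
    "Selective-Fingerprinting, max similarity 0.20": (3645, 3061854, 278644),
}


def retained_fractions(table, reference="All Informative (input)"):
    """Return each corpus's counts as fractions of the reference corpus.

    Hypothetical helper for illustration only; not part of the paper.
    """
    ref = table[reference]
    return {
        name: tuple(count / ref_count for count, ref_count in zip(counts, ref))
        for name, counts in table.items()
    }


if __name__ == "__main__":
    # Print the share of notes, words, and concepts each corpus retains.
    for name, (notes, words, concepts) in retained_fractions(TABLE_5).items():
        print(f"{name}: notes {notes:.1%}, words {words:.1%}, concepts {concepts:.1%}")
```

Dividing by the input corpus rather than by the baseline keeps all rows on one scale, so the effect of lowering the maximum similarity can be read directly across the three fingerprinting rows.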