2013 Jan 16;14:10. doi: 10.1186/1471-2105-14-10

Table 5.

Descriptive statistics of the patient notes corpora

Corpus	# Notes	# Words	# Concepts
All Informative (input)	8,557	6,131,879	599,847
Last Informative Note (baseline)	1,247	435,387	44,145
Selective-Fingerprinting maximum similarity 0.33	4,524	3,614,409	337,034
Selective-Fingerprinting maximum similarity 0.25	3,970	3,283,558	302,159
Selective-Fingerprinting maximum similarity 0.20	3,645	3,061,854	278,644

The input corpus (All Informative), the corpus obtained by the redundancy reduction baseline (Last Informative Note), and the corpora produced by the fingerprinting redundancy reduction strategy at different maximum similarity levels.