Table 5.
Corpus | # Notes | # Words | # Concepts |
---|---|---|---|
All Informative (input) | 8,557 | 6,131,879 | 599,847 |
Last Informative Note (baseline) | 1,247 | 435,387 | 44,145 |
Selective-Fingerprinting maximum similarity 0.33 | 4,524 | 3,614,409 | 337,034 |
Selective-Fingerprinting maximum similarity 0.25 | 3,970 | 3,283,558 | 302,159 |
Selective-Fingerprinting maximum similarity 0.20 | 3,645 | 3,061,854 | 278,644 |
Sizes of the All Informative input corpus, the corpus obtained by the redundancy reduction baseline (Last Informative Note), and the corpora produced by the fingerprinting redundancy reduction strategy at different maximum similarity levels.