Table 4.
Name | Complexity | N | \|L\| | d | d/N | LCavg | PUNIQ | Pmax |
---|---|---|---|---|---|---|---|---|
StimModAbs<sup>a</sup> | 4,449,705 | 247 | 5 | 3603 | 14.59 | 1.15 | 0.036 | 0.008 |
CogParaAll<sup>b</sup> | 46,451,808 | 247 | 48 | 3919 | 15.87 | 1.13 | 0.336 | 0.004 |
Medical | 63,770,490 | 978 | 45 | 1449 | 1.48 | 1.25 | 0.096 | 0.158 |
Slashdot | 89,777,116 | 3782 | 22 | 1079 | 0.29 | 1.18 | 0.041 | 0.139 |
Enron | 90,296,206 | 1702 | 53 | 1001 | 0.59 | 3.38 | 0.442 | 0.096 |
Values taken from Read et al. (2011); see that paper for details and sources. For notation, see sections 2.1.2 and 2.2. The table also includes the values for the least and the most complex of the data sets used in this paper.
<sup>a</sup> Abstract-only corpus; stimulus modality labels.
<sup>b</sup> Abstract, title, and keyword corpus; cognitive paradigm class labels.
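The Complexity column appears to be the product N × |L| × d: multiplying those three columns reproduces the listed value exactly for StimModAbs, Medical, Slashdot, and Enron. This is an inference from the numbers, not a definition stated in this table. The minimal Python sketch below recomputes that product and the rounded ratio d/N for every row. The function names `complexity` and `check_rows` are hypothetical.

```python
# Rows of Table 4: (name, listed Complexity, N, |L|, d, listed d/N)
ROWS = [
    ("StimModAbs", 4_449_705, 247, 5, 3603, 14.59),
    ("CogParaAll", 46_451_808, 247, 48, 3919, 15.87),
    ("Medical", 63_770_490, 978, 45, 1449, 1.48),
    ("Slashdot", 89_777_116, 3782, 22, 1079, 0.29),
    ("Enron", 90_296_206, 1702, 53, 1001, 0.59),
]


def complexity(n: int, num_labels: int, d: int) -> int:
    """Assumed definition (inferred from the table): N * |L| * d."""
    return n * num_labels * d


def check_rows(rows):
    """Map each name to (product matches listed Complexity, rounded d/N matches listed d/N)."""
    results = {}
    for name, listed, n, num_labels, d, listed_ratio in rows:
        product_ok = complexity(n, num_labels, d) == listed
        ratio_ok = round(d / n, 2) == listed_ratio  # table reports d/N to two decimals
        results[name] = (product_ok, ratio_ok)
    return results


if __name__ == "__main__":
    for name, (product_ok, ratio_ok) in check_rows(ROWS).items():
        print(f"{name:12s} N*|L|*d matches: {product_ok}  d/N matches: {ratio_ok}")
```

Run as a script, the sketch reports that every listed d/N matches its rounded ratio. CogParaAll is the one row where the product (247 × 48 × 3919 = 46,463,664) differs from the listed Complexity of 46,451,808.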
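The last three columns summarise how labels are distributed over the N examples. The sketch below shows one way to compute such statistics from a list of per-example label sets. It assumes the following definitions, which are not stated in this table: LCavg is the mean number of labels per example, PUNIQ is the fraction of the N examples whose label set occurs only once, and Pmax is the fraction of examples that carry the most frequent label set. The definitions actually used are those in sections 2.1.2 and 2.2, and they may differ in detail. The function `label_stats` and the toy modality labels are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, List


def label_stats(label_sets: List[FrozenSet[str]]) -> dict:
    """Assumed definitions of LCavg, PUNIQ and Pmax for a multi-label data set."""
    n = len(label_sets)
    if n == 0:
        raise ValueError("need at least one example")
    counts = Counter(label_sets)  # frequency of each distinct label set
    lc_avg = sum(len(s) for s in label_sets) / n  # mean number of labels per example
    p_uniq = sum(1 for c in counts.values() if c == 1) / n  # label sets seen exactly once, over N
    p_max = max(counts.values()) / n  # share of examples with the most frequent label set
    return {"LCavg": lc_avg, "PUNIQ": p_uniq, "Pmax": p_max}


# Hypothetical toy data: four documents with stimulus-modality-style labels.
TOY = [
    frozenset({"visual"}),
    frozenset({"visual"}),
    frozenset({"auditory", "visual"}),
    frozenset({"tactile"}),
]

if __name__ == "__main__":
    print(label_stats(TOY))  # prints {'LCavg': 1.25, 'PUNIQ': 0.5, 'Pmax': 0.5}
```

Label sets are stored as `frozenset` so that identical combinations hash to the same `Counter` key regardless of label order.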