2019 Jun 24;26(11):1279–1285. doi: 10.1093/jamia/ocz085

Table 1.

Descriptors and basic statistics for 3 text classification tasks

                                                Number of tokens in document     Number of sentences in document  Number of labels in document
Task No.  Number of unique labels  Corpus size  Mean     Max   Min  SD           Mean     Max   Min  SD           Mean    Max  Min  SD
1         10                       1580         209.29   638   44   58.32        9.44     27    2    2.87         1.56    5    1    0.78
2         32                       3661         233.66   622   49   60.41        9.88     34    1    2.81         2.05    8    0    1.30
3         7042                     22,815       1039.73  5882  8    623.21       165.71   904   4    95.86        36.68   127  5    16.16

Note. Task 1: Hallmarks of cancer classification; Task 2: Chemical exposure assessments; Task 3: Diagnosis code assignment (data after label augmentation).

Abbreviation: SD, standard deviation.