Table 1.
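Task No. | Number of unique labels | Corpus size | Tokens per document (Mean) | Tokens per document (Max) | Tokens per document (Min) | Tokens per document (SD) | Sentences per document (Mean) | Sentences per document (Max) | Sentences per document (Min) | Sentences per document (SD) | Labels per document (Mean) | Labels per document (Max) | Labels per document (Min) | Labels per document (SD) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|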
1 | 10 | 1580 | 209.29 | 638 | 44 | 58.32 | 9.44 | 27 | 2 | 2.87 | 1.56 | 5 | 1 | 0.78 |
2 | 32 | 3661 | 233.66 | 622 | 49 | 60.41 | 9.88 | 34 | 1 | 2.81 | 2.05 | 8 | 0 | 1.30 |
3 | 7042 | 22 815 | 1039.73 | 5882 | 8 | 623.21 | 165.71 | 904 | 4 | 95.86 | 36.68 | 127 | 5 | 16.16 |
Note. Task 1: Hallmarks of cancer classification; Task 2: Chemical exposure assessments; Task 3: Diagnosis code assignment (data after label augmentation).
Abbreviation: SD, standard deviation.
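
As an illustration of how per-document summary statistics like those in Table 1 can be derived, the following is a minimal sketch, not the authors' procedure. The toy corpus, the whitespace tokenization, the `summarize` helper, and the use of the sample standard deviation are all assumptions made for this example; the source does not state how tokens were counted or which SD formula was used.

```python
import statistics


def summarize(counts):
    """Return mean, max, min, and SD of a list of per-document counts."""
    return {
        "mean": statistics.mean(counts),
        "max": max(counts),
        "min": min(counts),
        # Assumption: sample SD (n - 1 denominator); the source does not
        # say whether sample or population SD was reported.
        "sd": statistics.stdev(counts),
    }


# Hypothetical toy corpus of (text, labels) pairs; not data from Table 1.
corpus = [
    ("tumour cells evade growth suppressors", ["label_a"]),
    ("exposure was measured in urine samples from workers", ["label_b", "label_c"]),
    ("no relevant finding", []),
]

# Assumption: tokens are whitespace-separated strings.
token_stats = summarize([len(text.split()) for text, _ in corpus])
label_stats = summarize([len(labels) for _, labels in corpus])

print(token_stats)
print(label_stats)
```

The same helper could be applied to sentence counts once a sentence splitter has been chosen; that choice would affect the reported values.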