Table 3.
Statistic | GSC+ | EHR |
---|---|---|
Number of documents | 228 | 100 |
Number of annotations | 2773 | 1815 |
Unique HPO concepts | 461 | 252 |
Total number of tokens in the corpus | 5724 | 59 470 |
Unique tokens in the corpus | 1035 | 5672 |
Annotations containing “canonical” tokens | 2362 (85.2%) | 1450 (79.9%) |
Total “canonical” tokens | 3685 (64.4%) | 1095 (19.3%) |
Unique “canonical” tokens | 571 (55.2%) | 260 (23.7%) |