Bioinformatics. 2023 Nov 24;39(12):btad716. doi: 10.1093/bioinformatics/btad716

Table 1. Clinical token corpus statistics.

Statistic                                           Count
Number of clinical notes                            2 692 451
Number of tokens                                    159 236 294
Unique tokens                                       700 475
(GS) Unique “canonical” tokens (length > 4)         256 005
(GS) Unique typographical errors                    51 337
Unique tokens in ontology concepts                  6862
Ontology tokens in gold standard                    4769
Ontology tokens with typographical errors           3858

GS: gold standard.