Table 3. Overview of extraction statistics.
 | Abstracts | Full text | Total | Entrez Gene | Ensembl Genomes |
Articles | 6.4M | 384K | 6.8M | – | – |
Sentences | 54.8M | 66.9M | 121.6M | – | – |
Gene/protein mentions | 43.3M | 33.3M | 76.5M | 28.8M (37.6%) | 47.9M (62.6%) |
Events | 23.5M | 16.7M | 40.2M | 16.3M (40.5%) | 26.0M (64.7%) |
Extraction statistics for PubMed abstracts and PubMed Central full texts with at least one identified gene/protein mention. The last two columns give the number of mentions/events that could be fully normalized to Entrez Gene identifiers or Ensembl Genomes families; the percentages in parentheses are relative to the Total column.
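The normalization percentages can be recomputed from the rounded counts in the table. The sketch below assumes each percentage is the normalized count divided by the Total column for the same row, which is consistent with the published figures; the names `TABLE3`, `normalized_share`, and `shares` are illustrative and not part of the original work.

```python
# Values copied from Table 3, in millions, rounded as published.
TABLE3 = {
    "Gene/protein mentions": {
        "total": 76.5,
        "entrez_gene": 28.8,
        "ensembl_genomes": 47.9,
    },
    "Events": {
        "total": 40.2,
        "entrez_gene": 16.3,
        "ensembl_genomes": 26.0,
    },
}

NORMALIZED_COLUMNS = ("entrez_gene", "ensembl_genomes")


def normalized_share(row, column):
    """Percentage of the row's total covered by `column`, to one decimal.

    Assumption: percentages are taken relative to the Total column.
    """
    return round(100.0 * row[column] / row["total"], 1)


def shares(table):
    """Compute the normalized share for every row and normalized column."""
    return {
        name: {col: normalized_share(row, col) for col in NORMALIZED_COLUMNS}
        for name, row in table.items()
    }


if __name__ == "__main__":
    for name, cols in shares(TABLE3).items():
        print(name, cols)
```

Because the inputs are themselves rounded to one decimal place, the recomputed shares can only be expected to agree with the table to within rounding.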