2021 Feb 1;49(4):1859–1871. doi: 10.1093/nar/gkab012

Table 1.

Data availability per resource and organism (human, mouse, rat and pig)

| Resource/annotation | Human | Mouse | Rat | Pig |
|---|---|---|---|---|
| Ensembl genome annotations | | | | |
| Coding genes | 20 418 | 22 619 | 22 250 | 22 452 |
| Non-coding genes | 22 107 | 15 795 | 8 934 | 3 250 |
| eggNOG mammalian orthology | | | | |
| Coding genes assigned to orthologous groups | 86.7% | 84.3% | 76.4% | 82.2% |
| Mentions in biomedical literature | | | | |
| Organism 1 824 080 1 629 280 133 937 | | | | |
| Gene 1 304 170 734 243 57 230 | | | | |
| Gene Ontology annotations | | | | |
| Experimental | 107 301 | 89 360 | 49 281 | 817 |
| Author statement | 48 894 | 4 760 | 3 396 | 27 |
| Inferred | 86 785 | 170 033 | 188 718 | 47 225 |
| Electronic | 74 049 | 44 022 | 40 559 | 101 074 |
| High-scoring STRING protein-protein interactions | | | | |
| Experimental | 18 069 | 1 304 | 920 | 1 266 |
| Experimental transferred | 12 713 | 22 030 | 39 381 | 32 312 |
| TISSUES expression data | | | | |
| Experimental datasets | 4 | 4 | 3 | 3 |
| Tissues covered by experimental data | 20 | 20 | 12 | 20 |

The numbers of coding and non-coding genes in the human, mouse, rat and pig assemblies in Ensembl release 95 are reported. From the eggNOG v4.5 orthology database, we report the percentage of coding genes from each organism that are assigned to a mammalian orthologous group. Text mining of all PubMed abstracts and a subset of full-text articles available from PMC provided the number of publications that mention each organism and its genes. We grouped the most recent Gene Ontology annotations into four categories based on their evidence codes and counted the number of annotations in each category for each organism. High-scoring protein–protein interactions from the STRING v10.5 database (overall confidence score above 0.7) were counted. For the TISSUES 2.0 database of mammalian expression, the number of experimental datasets supporting the 21 main tissues is reported together with the number of tissues covered by these datasets. See Supplementary Table S1 as well as Methods for more details.
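The two counting steps described in this footnote (grouping Gene Ontology annotations by evidence code and counting interactions above a confidence cutoff) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The evidence codes are standard GO codes, but their assignment to the four categories is an assumption made here for illustration; the exact grouping is given in the Methods, not in this footnote. The function names, record layouts and toy data are hypothetical.

```python
from collections import Counter

# ASSUMPTION: illustrative mapping of standard GO evidence codes to the four
# table categories. The paper's exact grouping is not stated in this footnote.
EVIDENCE_CATEGORY = {
    **dict.fromkeys(
        ["EXP", "IDA", "IPI", "IMP", "IGI", "IEP",
         "HTP", "HDA", "HMP", "HGI", "HEP"],
        "Experimental",
    ),
    **dict.fromkeys(["TAS", "NAS"], "Author statement"),
    **dict.fromkeys(["IEA"], "Electronic"),
}


def categorize(evidence_code):
    """Map one evidence code to a category.

    ASSUMPTION: any code not listed above is counted as "Inferred".
    """
    return EVIDENCE_CATEGORY.get(evidence_code, "Inferred")


def count_by_category(annotations):
    """Count (gene, go_term, evidence_code) tuples per category."""
    return Counter(categorize(code) for _gene, _term, code in annotations)


def count_high_scoring(interactions, threshold=0.7):
    """Count (protein_a, protein_b, score) tuples with score above threshold.

    Scores are assumed to be on a 0-1 scale, as the footnote states them.
    """
    return sum(1 for _a, _b, score in interactions if score > threshold)


# Hypothetical toy records, not data from the paper.
annotations = [
    ("geneA", "GO:0005515", "IPI"),
    ("geneA", "GO:0005634", "IEA"),
    ("geneB", "GO:0006915", "TAS"),
    ("geneB", "GO:0008150", "ISS"),
    ("geneC", "GO:0003674", "IDA"),
]
interactions = [("p1", "p2", 0.95), ("p1", "p3", 0.70), ("p2", "p3", 0.40)]

go_counts = count_by_category(annotations)
n_high = count_high_scoring(interactions)
print(dict(go_counts))
print(n_high)
```

The cutoff is applied as a strict inequality because the footnote says "above 0.7", so the toy interaction scored exactly 0.70 is not counted.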