mSystems. 2021 Nov 2;6(6):e00673-21. doi: 10.1128/mSystems.00673-21

TABLE 1.

Comparison among frequently used annotation resourcesᵃ

Metric                                TubercuList  PATRIC  RefSeq  Mtb Network Portal  UniProt  KEGG   BioCyc
Coding sequences                      4,038        4,367   3,989   4,038               3,997    3,906  4,031
Proteins with functional assignments  2,815        3,007   2,341   2,853               2,906    1,750  2,571
Hypothetical proteins                 1,223        1,360   1,648   1,185               1,091    2,156  1,460
Proteins with ≥1 GO term              2,629        969     0       2,460               3,305    0      3,557
Proteins with EC no.(s) assigned      1,293        1,074   1,081   1,003               1,138    1,050  1,018
ᵃ “Functional assignments” are annotations that describe protein function; they exclude hypothetical, unknown/uncharacterized, and PE/PPE family proteins. Counts reflect database content on 17 May 2017 for RefSeq (36) (https://www.ncbi.nlm.nih.gov/refseq/), PATRIC (6) (https://www.patricbrc.org/), and Mtb Network Portal (9) (http://networks.systemsbiology.net/mtb/), and on 23 June 2017 for KEGG (120) (https://www.kegg.jp/kegg/genome/pathogen.html) and UniProt (116) (https://www.uniprot.org/uniprot/). The number of CDS in KEGG is reported as 3,906 because KEGG counts only protein-coding genes. KEGG takes its annotations for M. tuberculosis protein-coding genes from TubercuList (131).
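As a quick consistency check, the first three rows of Table 1 can be transcribed into a short Python sketch. In every column, proteins with functional assignments plus hypothetical proteins add up exactly to the number of coding sequences, which suggests (as an inference from the arithmetic, not a statement in the source) that the "Hypothetical proteins" row counts every CDS that is not a functional assignment, including unknown/uncharacterized and PE/PPE proteins. The function names `inconsistent_resources` and `functional_share` are illustrative only.

```python
# Counts transcribed from Table 1; every list uses the same column order.
RESOURCES = ["TubercuList", "PATRIC", "RefSeq", "Mtb Network Portal",
             "UniProt", "KEGG", "BioCyc"]
CODING_SEQUENCES = [4038, 4367, 3989, 4038, 3997, 3906, 4031]
FUNCTIONAL = [2815, 3007, 2341, 2853, 2906, 1750, 2571]
HYPOTHETICAL = [1223, 1360, 1648, 1185, 1091, 2156, 1460]


def inconsistent_resources() -> list[str]:
    """Resources whose functional + hypothetical counts do not sum to the CDS count."""
    return [
        name
        for name, cds, func, hypo in zip(RESOURCES, CODING_SEQUENCES,
                                         FUNCTIONAL, HYPOTHETICAL)
        if func + hypo != cds
    ]


def functional_share() -> dict[str, float]:
    """Percentage of coding sequences with a functional assignment, to one decimal."""
    return {
        name: round(100 * func / cds, 1)
        for name, cds, func in zip(RESOURCES, CODING_SEQUENCES, FUNCTIONAL)
    }


if __name__ == "__main__":
    print(inconsistent_resources())  # []
    for name, pct in functional_share().items():
        print(f"{name}: {pct}%")
```

Run as a script, the check returns an empty list, and the functional share ranges from 44.8% (KEGG) to 72.7% (UniProt).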
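The footnote's definition of a functional assignment amounts to an exclusion rule over protein product descriptions. The Python sketch below approximates that rule with keyword matching. The regular expressions, the helper names `is_functional` and `count_assignments`, and the example product strings are all hypothetical; the table does not say how each resource actually decides what counts as functional. Plain keyword matching will misclassify some descriptions (for example, a genuine functional annotation that mentions an "unknown" substrate), so treat it as an illustration of the rule, not a way to reproduce the counts above.

```python
import re

# Hypothetical keyword rules approximating the footnote's exclusions; the
# resources' actual curation criteria are not specified in Table 1.
NON_FUNCTIONAL_PATTERNS = [
    re.compile(r"\bhypothetical\b", re.IGNORECASE),
    re.compile(r"\b(?:unknown|uncharacteri[sz]ed)\b", re.IGNORECASE),
    # PE, PPE, and PE-PGRS/PE_PGRS family members (case-sensitive on purpose).
    re.compile(r"\b(?:PE|PPE|PE[-_]PGRS) family\b"),
]


def is_functional(product: str) -> bool:
    """True if no exclusion pattern matches the product description."""
    return not any(pattern.search(product) for pattern in NON_FUNCTIONAL_PATTERNS)


def count_assignments(products: list[str]) -> tuple[int, int]:
    """Return (functional, non_functional) counts for a list of product names."""
    functional = sum(1 for p in products if is_functional(p))
    return functional, len(products) - functional


if __name__ == "__main__":
    products = [
        "DNA gyrase subunit B",
        "hypothetical protein",
        "conserved protein of unknown function",
        "PPE family protein PPE1",
        "PE-PGRS family protein",
    ]
    print(count_assignments(products))  # (1, 4)
```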