TABLE 1.
Metric | TubercuList | PATRIC | RefSeq | Mtb Network Portal | UniProt | KEGG | BioCyc |
---|---|---|---|---|---|---|---|
Coding sequences | 4,038 | 4,367 | 3,989 | 4,038 | 3,997 | 3,906 | 4,031 |
Proteins with functional assignments | 2,815 | 3,007 | 2,341 | 2,853 | 2,906 | 1,750 | 2,571 |
Hypothetical proteins | 1,223 | 1,360 | 1,648 | 1,185 | 1,091 | 2,156 | 1,460 |
Proteins with ≥1 GO term | 2,629 | 969 | 0 | 2,460 | 3,305 | 0 | 3,557 |
Proteins with EC number(s) assigned | 1,293 | 1,074 | 1,081 | 1,003 | 1,138 | 1,050 | 1,018 |
“Functional assignments” are annotations that describe protein function; they exclude hypothetical, unknown/uncharacterized, and PE/PPE family proteins. Counts reflect database content on 17 May 2017 for RefSeq (36) (https://www.ncbi.nlm.nih.gov/refseq/), PATRIC (6) (https://www.patricbrc.org/), and Mtb Network Portal (9) (http://networks.systemsbiology.net/mtb/), and on 23 June 2017 for KEGG (120) (https://www.kegg.jp/kegg/genome/pathogen.html) and UniProt (116) (https://www.uniprot.org/uniprot/). KEGG reports 3,906 CDS because it counts only protein-coding genes. The source of KEGG's annotations for M. tuberculosis protein-coding genes is TubercuList (131).
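The exclusion rule in the footnote can be illustrated with a minimal Python sketch that sorts product descriptions into "functional" and "non-functional". The keyword patterns and the names `has_functional_assignment` and `summarize` are assumptions made for illustration. The source names the excluded categories but does not state the matching rules that were actually used, so counts produced this way need not reproduce the table.

```python
import re

# Assumed keyword patterns: the footnote lists the excluded categories
# (hypothetical, unknown/uncharacterized, PE/PPE family) but not how they
# were matched, so these regular expressions are illustrative only.
_UNINFORMATIVE = re.compile(
    r"\b(hypothetical|unknown|uncharacteri[sz]ed)\b", re.IGNORECASE
)
# Case-sensitive so that only the family names PE, PPE, and PE-PGRS match.
_PE_PPE = re.compile(r"\b(PE|PPE)(-PGRS)?\s+family\b")


def has_functional_assignment(product: str) -> bool:
    """Return True if a product description counts as a functional assignment."""
    text = product.strip()
    if not text:
        # An empty description carries no functional information.
        return False
    return not (_UNINFORMATIVE.search(text) or _PE_PPE.search(text))


def summarize(products):
    """Return (functional, non_functional) counts for an iterable of descriptions."""
    products = list(products)
    functional = sum(1 for p in products if has_functional_assignment(p))
    return functional, len(products) - functional


if __name__ == "__main__":
    # Made-up example descriptions, not records taken from any database.
    examples = [
        "DNA gyrase subunit A",
        "hypothetical protein",
        "Uncharacterized protein",
        "PE-PGRS family protein",
        "PPE family protein",
    ]
    print(summarize(examples))
```

Under this reading, every coding sequence falls into exactly one of the two groups, which matches the table: in each column, the "functional assignments" and "hypothetical proteins" rows sum to the number of coding sequences (for example, 2,815 + 1,223 = 4,038 for TubercuList).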