2017 Sep 2;12:50. doi: 10.1186/s40793-017-0266-y

Table 3.

Genome statistics

| Attribute | Value | % of total |
|---|---|---|
| Genome size (bp)^a | 3,088,407 | 100 |
| DNA coding (bp) | 2,621,999 | 84.9 |
| DNA G + C (bp) | 1,164,329 | 37.7 |
| DNA scaffolds | 1281 | 100 |
| Total genes | 3097 | 100 |
| Protein-coding genes | 3045 | 98.3 |
| RNA genes | 46 | 1.5 |
| Pseudo genes | 6 | 0.2 |
| Genes in internal clusters | - | - |
| Genes with function prediction^b | 2051 | 67.4 |
| Genes assigned to COGs | 1659 | 54.5 |
| Genes with Pfam domains | 1984 | 65.2 |
| Genes with signal peptides^c | 337 | 11.1 |
| Genes with transmembrane helices | 626 | 20.6 |
| CRISPR repeats | 10 | |

a All 1281 scaffolds are >200 bp. Of these, 478 (37.3%) are >1000 bp and together comprise 2,726,561 bp (88.3% of all base pairs).

b Genes with function prediction are the 3045 protein-coding genes minus the 994 genes annotated as “hypothetical proteins” that either have no COG category or fall into the COG categories “unknown function” or “general function prediction only”, and that either have no Pfam domain or have a Pfam “domain of unknown function”.

c Includes genes for which a signal peptide was predicted by at least two of the three tools used. Percentages of genes with function prediction, COG assignments, Pfam domains, signal peptides, and transmembrane helices were calculated against the total of 3045 protein-coding genes.
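The percentages in Table 3 use three different denominators: genome size for the base-pair rows, total genes for the gene-type rows, and protein-coding genes for the annotation rows. The following is a minimal Python sketch that recomputes them from the counts given in the table and footnotes. The variable names and the helper `pct` are illustrative assumptions and are not part of the original analysis.

```python
# Counts copied from Table 3 and its footnotes; all names are illustrative.
GENOME_SIZE_BP = 3_088_407
TOTAL_GENES = 3097
PROTEIN_CODING = 3045
HYPOTHETICAL = 994        # footnote b: genes without a function prediction
TOTAL_SCAFFOLDS = 1281


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Footnote b: function-predicted genes = protein-coding minus hypothetical.
function_prediction = PROTEIN_CODING - HYPOTHETICAL

percentages = {
    # Base-pair rows are relative to genome size.
    "DNA coding (bp)": pct(2_621_999, GENOME_SIZE_BP),
    "DNA G + C (bp)": pct(1_164_329, GENOME_SIZE_BP),
    # Gene-type rows are relative to total genes.
    "Protein-coding genes": pct(PROTEIN_CODING, TOTAL_GENES),
    "RNA genes": pct(46, TOTAL_GENES),
    "Pseudo genes": pct(6, TOTAL_GENES),
    # Annotation rows are relative to protein-coding genes (footnote c).
    "Genes with function prediction": pct(function_prediction, PROTEIN_CODING),
    "Genes assigned to COGs": pct(1659, PROTEIN_CODING),
    "Genes with Pfam domains": pct(1984, PROTEIN_CODING),
    "Genes with signal peptides": pct(337, PROTEIN_CODING),
    "Genes with transmembrane helices": pct(626, PROTEIN_CODING),
    # Footnote a: scaffolds >1000 bp, by count and by base pairs.
    "Scaffolds >1000 bp (count)": pct(478, TOTAL_SCAFFOLDS),
    "Scaffolds >1000 bp (bp)": pct(2_726_561, GENOME_SIZE_BP),
}

for attribute, value in percentages.items():
    print(f"{attribute}: {value}")
```

Choosing the denominator per row group matters: the Pfam count of 1984, for example, gives 65.2% only when divided by the 3045 protein-coding genes, not by the 3097 total genes.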