2013 Jun 22;14:419. doi: 10.1186/1471-2164-14-419

Table 1.

Gene families used in simulations

| Gene name | Median length | Rate of amino acid evolution | Number of sequences | Source of sequences | Source of profile |
|---|---|---|---|---|---|
| 16S rRNA | 1535 bp | NA (Highly conserved) | 427 | RDP | RDP (INFERNAL) |
| rpoB | 1296 aa | 73.51 | 460 | AMPHORA + GenBank | AMPHORA (HMMER) |
| rpsB | 226 aa | 51.96 | 411 | AMPHORA + GenBank | AMPHORA (HMMER) |
| dnaG | 395 aa | 112.53 | 456 | AMPHORA + GenBank | AMPHORA (HMMER) |
| lolC | 411 aa | 184.04 | 442 | UniProt + GenBank | PhyloFacts (HMMER) |

Each family of gene sequences was limited to its unique representatives among AMPHORA taxa (see Methods). Rate of amino acid evolution was determined by summing all branch lengths in a phylogenetic tree inferred via RAxML from the protein sequences; smaller values indicate fewer substitutions and greater conservation. The 16S rRNA gene requires a nucleotide model of evolution, so its rate is not comparable with the protein values and is reported as NA; the gene is well known to be highly conserved, although it contains variable regions. 16S rRNA sequences were obtained from the Ribosomal Database Project (RDP) [20]. A larger set of 1,071 16S rRNA sequences was used only for the Fast UniFrac analysis (see Additional file 1: Table S1). Amino acid sequences for the rpoB, rpsB, and dnaG families were obtained via AMPHORA [14], while the corresponding DNA sequences were downloaded from NCBI GenBank [21]. For lolC, family members were determined by PhyloFacts [22] (family accession bpg052966 as of February 16, 2011); amino acid sequences were downloaded from UniProt [23], and the corresponding DNA sequences were downloaded from EMBL-EBI [24]. Additional file 1: Table S1 provides download dates and sequence accession numbers.
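
The rate measure described in this footnote, the sum of all branch lengths in a tree, can be illustrated with a minimal sketch. It assumes the inferred tree is available as a plain Newick string with unquoted labels; the function name total_tree_length and the four-taxon example tree are hypothetical and are not taken from the study or from RAxML itself.

```python
import re

# In Newick, a branch length is the number that follows a ':'.
# Assumption: labels are unquoted and contain no ':' characters.
_BRANCH_LENGTH = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][+-]?[0-9]+)?)")


def total_tree_length(newick: str) -> float:
    """Return the sum of all branch lengths found in a Newick tree string."""
    return sum(float(value) for value in _BRANCH_LENGTH.findall(newick))


# Hypothetical four-taxon tree, used only to show the calculation.
example_tree = "((A:0.10,B:0.20):0.05,(C:0.30,D:0.40):0.15);"
print(total_tree_length(example_tree))
```

Applied to one tree per gene family, a total of this kind would give values comparable across families only if the trees cover the same set of taxa, which may be why each family was restricted to its representatives among AMPHORA taxa.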