Table 1.
Gene name | Median length | Rate of amino acid evolution | Number of sequences | Source of sequences | Source of profile |
---|---|---|---|---|---|
16S rRNA | 1535 bp | NA (Highly conserved) | 427 | RDP | RDP (INFERNAL) |
rpoB | 1296 aa | 73.51 | 460 | AMPHORA + GenBank | AMPHORA (HMMER) |
rpsB | 226 aa | 51.96 | 411 | AMPHORA + GenBank | AMPHORA (HMMER) |
dnaG | 395 aa | 112.53 | 456 | AMPHORA + GenBank | AMPHORA (HMMER) |
lolC | 411 aa | 184.04 | 442 | UniProt + GenBank | PhyloFacts (HMMER) |
Each family of gene sequences was limited to its unique representatives among AMPHORA taxa (see Methods). The rate of amino acid evolution was determined by summing all branch lengths in a phylogenetic tree inferred with RAxML from the protein sequences; smaller values indicate fewer substitutions and greater conservation. Because the 16S rRNA gene requires a nucleotide model of evolution, its rate is not comparable to the amino acid rates and is listed as NA; the gene is well known to be highly conserved overall, with interspersed variable regions. 16S rRNA sequences were obtained from the Ribosomal Database Project (RDP) [20]. A larger set of 1,071 16S rRNA sequences was used only for the Fast UniFrac analysis (see Additional file 1: Table S1). Amino acid sequences for the rpoB, rpsB, and dnaG families were obtained via AMPHORA [14], and the corresponding DNA sequences were downloaded from NCBI GenBank [21]. For lolC, family members were determined by PhyloFacts [22] (family accession bpg052966 as of February 16, 2011); amino acid sequences were downloaded from UniProt [23], and the corresponding DNA sequences were downloaded from EMBL-EBI [24]. Additional file 1: Table S1 provides download dates and sequence accession numbers.
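
As an illustration of the rate measure described in the caption, the total tree length can be approximated by summing every branch length in a Newick-formatted tree. The sketch below is not the authors' script: the function name `total_tree_length`, the regular expression, and the example tree are all hypothetical, and it assumes a plain Newick string with numeric branch lengths after each colon and no colons inside quoted labels.

```python
import re

# Matches a Newick branch length: a colon followed by a decimal number,
# optionally in scientific notation (e.g. ":0.12" or ":1.5e-3").
# Assumption: no taxon label contains a colon.
_BRANCH_LENGTH = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def total_tree_length(newick: str) -> float:
    """Sum every branch length found in a Newick tree string."""
    return sum(float(value) for value in _BRANCH_LENGTH.findall(newick))


# Hypothetical four-taxon tree; the labels and lengths are made up.
example = "((A:0.10,B:0.20):0.05,(C:0.30,D:0.40):0.15);"
print(round(total_tree_length(example), 2))  # 1.2
```

A tree with a larger sum has accumulated more substitutions across its branches, which is how the table distinguishes the slowly evolving rpsB family from the faster lolC family.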