Journal of Bacteriology. 2009 Jan 5;191(6):1974–1978. doi: 10.1128/JB.01448-08

The Genome of Thermosipho africanus TCF52B: Lateral Genetic Connections to the Firmicutes and Archaea

Camilla L Nesbø 1,*, Eric Bapteste 2, Bruce Curtis 1, Håkon Dahle 3, Philippe Lopez 2, Dave Macleod 1, Marlena Dlutek 1, Sharen Bowman 4, Olga Zhaxybayeva 1, Nils-Kåre Birkeland 3,5, W Ford Doolittle 1
PMCID: PMC2648366  PMID: 19124572

Abstract

Lateral gene transfer (LGT; also called horizontal gene transfer) has been a major force shaping the genome of Thermosipho africanus TCF52B, whose sequence we describe here. Firmicutes emerge as the principal LGT partner: 26% of phylogenetic trees suggest LGT with this group, while 13% of the open reading frames indicate LGT with Archaea.


Thermosipho africanus TCF52B was isolated from produced fluids of a high-temperature oil reservoir in the North Sea, with fish waste as the only substrate (4). Phylogenetic analysis of its 16S rRNA gene sequence, together with DNA-DNA hybridization, identified it as a strain of Thermosipho africanus, a species first isolated from a shallow marine hydrothermal system in Djibouti, Africa (8, 21).

The complete genome sequence of this strain was determined by a conventional whole-genome shotgun strategy. Genomic libraries with 1- to 4-kb and 40-kb inserts were constructed, and sequence chromatograms were produced on a MegaBACE 1000 capillary DNA sequencer (GE Healthcare). Nucleotide skews were computed as described previously (11). Automated open reading frame (ORF) identification and annotation were performed with the annotation software Manatee, made available by TIGR (23). Pseudogenes were identified by BLAST searches of neighboring ORFs with the same or similar annotations and with the program Psi-Phi (9, 10). Clustered regularly interspaced short palindromic repeat loci (CRISPRs) were identified with default parameters using the web site http://crispr.u-psud.fr/crispr/CRISPRHomePage.php (6). Maximum-likelihood (ML) trees (WAG+Γ+I model, four rate categories) were constructed from protein-coding ORFs using PHYML and the PhyloGenie package (5). Several Thermotogales genomes have recently become available in GenBank. Because these genomes had not yet been published, we did not include them in any "genome-scale" analyses (i.e., the phylogenetic analyses); we did, however, include them in the BLAST analyses of mobile Thermosipho africanus genes.
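The neighbor-based part of the pseudogene search can be illustrated with a minimal Python sketch. It assumes a hypothetical `Orf` record (name, coordinates, strand, annotation) and flags adjacent ORFs on the same strand whose annotations are identical and whose separating gap is small, the usual appearance of a single gene broken by a frameshift or premature stop codon. The 300-bp gap cutoff is an arbitrary illustrative value, only identical (not merely similar) annotations are matched, and each flagged pair would still need BLAST confirmation; this is not how Manatee or Psi-Phi work internally.

```python
from dataclasses import dataclass


@dataclass
class Orf:
    """Hypothetical minimal ORF record (1-based, inclusive coordinates)."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"
    annotation: str


def candidate_split_genes(orfs, max_gap=300):
    """Return name pairs of neighbouring ORFs that may be fragments of one gene.

    A pair is flagged when the two ORFs are adjacent in coordinate order, lie
    on the same strand, carry the same annotation (ignoring case and
    surrounding whitespace), and are separated by at most `max_gap` bp.
    """
    ordered = sorted(orfs, key=lambda o: o.start)
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        gap = right.start - left.end - 1
        same_annotation = (
            left.annotation.strip().lower() == right.annotation.strip().lower()
        )
        if left.strand == right.strand and same_annotation and gap <= max_gap:
            pairs.append((left.name, right.name))
    return pairs


if __name__ == "__main__":
    orfs = [
        Orf("orf1", 100, 700, "+", "transposase"),
        Orf("orf2", 760, 1300, "+", "Transposase"),
        Orf("orf3", 1400, 2000, "-", "ABC transporter"),
    ]
    print(candidate_split_genes(orfs))  # [('orf1', 'orf2')]
```

The ORF names and coordinates in the usage example are invented; orf2 and orf3 are not paired because they lie on opposite strands.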

The genome of Thermosipho africanus strain TCF52B consists of a single circular chromosome of 2,016,657 bp with an average G+C content of 30.8%. Strand asymmetries, such as GC skew and tetramer skews, are pronounced and show two clear singularity points, located roughly 8 kb and 1,033 kb from the +1 site (see Fig. S1 in the supplemental material). Because these two points are diametrically opposed on the circular chromosome, dividing it into two halves with opposite compositional skews, they are good candidates for the putative origin and terminus of replication. The point at 1,033 kb is the more likely origin, since GC skew becomes positive past this location, as in most bacterial genomes with a known origin.
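The origin/terminus reasoning can be made concrete with a cumulative GC skew, a common way to locate these switch points: the windowed skew (G - C)/(G + C) is summed along the sequence, so the running total is lowest where the skew turns from negative to positive (putative origin) and highest where it turns back (putative terminus). The Python sketch below assumes non-overlapping windows, a sequence that starts at the +1 site, and a toy input; it ignores the tetramer skews that were also used here and is not the procedure of reference 11.

```python
def window_gc_skew(seq, window=1000):
    """GC skew (G - C) / (G + C) of consecutive, non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for i in range(0, len(seq), window):
        chunk = seq[i:i + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def predict_ori_ter(seq, window=1000):
    """Return (origin, terminus) as 0-based positions at window resolution.

    The cumulative skew is lowest in the last window before the skew turns
    positive (putative origin) and highest in the last window before it turns
    negative (putative terminus); each reported position is the end of that
    extreme window.
    """
    cumulative, running = [], 0.0
    for skew in window_gc_skew(seq, window):
        running += skew
        cumulative.append(running)
    ori_idx = min(range(len(cumulative)), key=cumulative.__getitem__)
    ter_idx = max(range(len(cumulative)), key=cumulative.__getitem__)

    def to_position(idx):
        # End of window `idx`, wrapped because the chromosome is circular.
        return min((idx + 1) * window, len(seq)) % len(seq)

    return to_position(ori_idx), to_position(ter_idx)


if __name__ == "__main__":
    # Toy 8-kb "chromosome": C-rich (negative skew) on both sides of a
    # G-rich (positive skew) half, so ori = 2,000 and ter = 6,000.
    negative = "ACCGT" * 400   # 2,000 bp, skew -1/3
    positive = "AGGCT" * 800   # 4,000 bp, skew +1/3
    genome = negative + positive + negative
    print(predict_ori_ter(genome))  # (2000, 6000)
```

On a real 2-Mb chromosome the same two functions would be applied to the full sequence; the resolution of the estimate is one window.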

The genome contains 2,000 potential coding sequences, of which 1,913 are putative protein-coding ORFs, 30 are putative pseudogenes, and 57 encode RNAs. A comparison with the genome of Thermotoga maritima is given in Table 1. The Thermosipho africanus genome is about 156 kb larger than that of Thermotoga maritima and carries 36 more ORFs. It contains duplicated regions comprising paralogous gene copies, CRISPRs, and mobile genetic elements, which collectively provide considerable indirect evidence for genomic instability and acquisition of exogenous genetic information.

TABLE 1.

General features of the Thermosipho africanus genome, with a comparison to Thermotoga maritima

Feature | Thermosipho africanus | Thermotoga maritima
Length of sequence (bp) | 2,016,657 | 1,860,725
G+C content (%) | 30.8 | 46
No. of:
    ORFs | 1,913 | 1,877
    Pseudogenes (disrupted reading frame) | 30 (17 transposases and integrases) | 3 (1 transposase) (28 according to http://www-bio3d-igbmc.u-strasbg.fr/ICDS/)
    rRNA operons (16S-23S-5S) | 3 | 1
    tRNAs | 48 (11 clusters, 19 single genes) | 46 (10 clusters, 19 single genes)
CRISPR direct repeats
    CRISPR 1, 2, 4 | GTTTAGAATCTACCTATGAGGAATGAAAAC | TTTCCATACCTCTAAGGAATTATTGAAACA
    CRISPR 3, 5, 6, 7, 11 | GTTTTCATTCCTCATAGGTAGATTCTAAAC | -
    CRISPR 8, 9, 12 | RTTTCAATTCCTRCAAGGTAAGGTACAAAC | -
    CRISPR 10 | GTTTCAATCCCTAATAGGTATGCTAAAAAC | -
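In the repeat listed for CRISPRs 8, 9, and 12, R is the standard IUPAC ambiguity code for a purine (A or G), so this consensus stands for four concrete 30-bp sequences. The minimal Python sketch below expands such a consensus and tests whether an observed repeat copy is compatible with it; the example copy is constructed for illustration and is not taken from the genome.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(consensus):
    """Return every concrete sequence encoded by an IUPAC consensus, sorted."""
    choices = [IUPAC[code] for code in consensus.upper()]
    return sorted("".join(bases) for bases in product(*choices))


def matches(consensus, sequence):
    """True if `sequence` is one of the sequences encoded by `consensus`."""
    return len(consensus) == len(sequence) and all(
        base in IUPAC[code]
        for code, base in zip(consensus.upper(), sequence.upper())
    )


# Direct repeat of CRISPRs 8, 9, and 12 (Table 1); R = A or G.
CRISPR_8_9_12 = "RTTTCAATTCCTRCAAGGTAAGGTACAAAC"

if __name__ == "__main__":
    print(len(expand(CRISPR_8_9_12)))  # 4
    print(matches(CRISPR_8_9_12, "ATTTCAATTCCTGCAAGGTAAGGTACAAAC"))  # True
```

`matches` checks position by position instead of building the full expansion, which keeps it cheap even for consensus sequences with many ambiguous positions.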

CRISPR structures comprise direct genomic repeats 24 to 47 bp in length separated by variable-length spacers (1, 13, 22) and are thought to function as a prokaryotic “immune system.” Because of their patchy distribution in prokaryotes, CRISPRs are often assumed to undergo frequent lateral transfer. Thermosipho africanus has 12 CRISPRs spread over its chromosome (Fig. 1), compared with 8 such loci in Thermotoga maritima (15). These 12 CRISPRs fall into four groups based on the sequences of their direct repeats (Table 1). CRISPR-associated proteins, encoded by CRISPR-associated (Cas) genes near the CRISPR repeats, play a still poorly understood role in CRISPR biology, and Cas gene phylogenies provide some of the most compelling evidence for CRISPR mobility (7). In Thermosipho africanus the phylogenetic origins of the Cas genes appear to be especially complex. Most interestingly, they show no strong affinity with other Thermotogales sequences: although Thermotoga maritima MSB8 harbors many Cas genes (26 according to reference 7), the Cas genes of the two species almost never branch together in ML trees and are sisters in only 3 of 25 trees (Thermosipho africanus has 30 Cas genes).
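The repeat-spacer layout described above can be illustrated with a minimal Python sketch that, given one known direct repeat, finds its exact, non-overlapping copies, groups consecutive copies into arrays, and returns the spacers between them. The `max_spacer` cutoff and the toy genome are assumptions for illustration; tools such as CRISPRFinder (6) detect repeats de novo and tolerate mismatches, which this sketch does not.

```python
def find_crispr_arrays(genome, repeat, max_spacer=80, min_copies=2):
    """Group exact, non-overlapping copies of `repeat` into repeat-spacer arrays.

    Consecutive copies belong to the same array when the spacer between them
    is at most `max_spacer` bp; arrays with fewer than `min_copies` repeats
    are dropped. Returns a list of (repeat_start_positions, spacer_sequences).
    """
    k = len(repeat)
    starts = []
    pos = genome.find(repeat)
    while pos != -1:
        starts.append(pos)
        pos = genome.find(repeat, pos + k)  # continue after this copy

    groups, current = [], []
    for start in starts:
        if current and start - (current[-1] + k) > max_spacer:
            groups.append(current)
            current = []
        current.append(start)
    if current:
        groups.append(current)

    arrays = []
    for copies in groups:
        if len(copies) >= min_copies:
            spacers = [genome[a + k:b] for a, b in zip(copies, copies[1:])]
            arrays.append((copies, spacers))
    return arrays


if __name__ == "__main__":
    repeat = "GTTTAGAATCTACCTATGAGGAATGAAAAC"  # CRISPR 1, 2, 4 repeat (Table 1)
    toy = ("T" * 50 + repeat + "ACGT" * 9 + repeat + "CCAG" * 10 + repeat
           + "T" * 200 + repeat + "T" * 10)
    for copies, spacers in find_crispr_arrays(toy, repeat):
        print(copies, [len(s) for s in spacers])  # [50, 116, 186] [36, 40]
```

The isolated fourth copy lies 200 bp downstream, beyond `max_spacer`, so it forms a one-copy group that is discarded.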

FIG. 1.

Distribution of CRISPR loci and mobile elements along the Thermosipho africanus genome, as well as phylogenetic “affiliation” of genes along the chromosome and the GC contents of genes. Outer circle, phylogenetic affiliation of the sister of Thermosipho africanus in phylogenetic trees estimated from predicted ORFs. The following color coding for the sister in the phylogenetic tree was used: green, self; red, Thermotogales; yellow, Firmicutes; blue, Archaea; orange, “others” as defined in Fig. 2; pink, complex; gray, complex including Thermotogales; light blue, no tree. Second and third circles, distribution of the mobile elements along the Thermosipho africanus chromosome. Mobile elements in forward orientation are indicated in red, and mobile elements in reverse orientation are indicated in blue. Fourth circle, distribution of CRISPRs and Cas genes along the genome. CRISPR repeats are in green, and Cas genes are in purple. Innermost circle, distribution of gene GC content. Genes with a GC content above the mean are in red, and those with a GC content below the mean are in green. The three spikes in GC content correspond to rRNA operons.

Seventy-eight ORFs were annotated as encoding transposases or integrases, and at least 61 of these are likely to be active genes (Fig. 1). (In contrast, the Thermotoga maritima genome contains only 12 ORFs annotated as encoding transposases.) All 78 fall into one of eight groups of highly similar sequences, and each of the 78 has another of these ORFs as its sister (see Table S1 in the supplemental material), indicating recent intragenomic transposition and/or lateral gene transfer (LGT) from a closely related lineage. Remarkably, only four of these eight families have homologs in other Thermotogales genomes, and there are no homologs in the closest relative of Thermosipho africanus, Thermosipho melanesiensis (see Table S1 in the supplemental material). We did, however, detect likely inactive homologs in Thermosipho melanesiensis for three of the groups (see Table S1 in the supplemental material).
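Sorting the 78 transposase and integrase ORFs into groups of highly similar sequences is a clustering problem. As a minimal sketch, assuming precomputed pairwise identities (for example, from an all-against-all BLAST) and an arbitrary 90% threshold, single-linkage clustering with a union-find structure places two ORFs in the same family whenever a chain of sufficiently similar pairs connects them. The text does not state which grouping criterion was used, so the threshold, ORF names, and identity values below are hypothetical.

```python
def single_linkage_families(names, identities, threshold=0.9):
    """Group sequences into families by single-linkage clustering.

    `identities` maps (name_a, name_b) to a fractional identity. Two
    sequences end up in the same family whenever a chain of pairs with
    identity >= `threshold` connects them. Returns the families as sorted
    lists, themselves sorted.
    """
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the family representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), identity in identities.items():
        if identity >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two families

    families = {}
    for name in names:
        families.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in families.values())


if __name__ == "__main__":
    names = ["tp1", "tp2", "tp3", "tp4", "tp5"]  # hypothetical ORF names
    identities = {
        ("tp1", "tp2"): 0.98, ("tp2", "tp3"): 0.95, ("tp1", "tp3"): 0.85,
        ("tp4", "tp5"): 0.99, ("tp3", "tp4"): 0.40,
    }
    print(single_linkage_families(names, identities))
    # [['tp1', 'tp2', 'tp3'], ['tp4', 'tp5']]
```

Note that tp1 and tp3 join the same family through tp2 even though their own identity (0.85) is below the threshold; that chaining is the defining property of single linkage. A family of size one would mark an ORF with no similar copy elsewhere in the genome.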

We attempted to construct ML phylogenetic trees for each of the 1,913 ORFs using the PhyloGenie package and obtained trees for 1,578 (82%). The distribution of the “immediate sisters” (nearest neighbors) of Thermosipho africanus in these trees is shown in Fig. 2. In 60% of the trees the sister was another Thermotogales bacterium, in most cases Thermotoga maritima, since this was the only other complete Thermotogales genome included in the analysis. For 9% of the treeable ORFs, the sister was another gene from the Thermosipho africanus genome itself.

FIG. 2.

Distribution of Thermosipho africanus sister taxon or clade in 1,578 phylogenetic trees for potentially protein-coding ORFs. “Other group” means that the organism(s) in the sister group belonged to a taxonomic group that was not Thermotogales, Firmicutes, or Archaea. “Complex” means that the sister clade was composed of organisms from several different taxonomic groups, and “complex including Thermotogales” means that another Thermotogales sequence was included in this clade.
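The sister categories used in Fig. 2 can be expressed as a small classification rule. The Python sketch below assumes that each tree is already rooted (ML trees are unrooted, and how rooting was handled is not described here) and written as nested tuples of leaf labels, and that a hypothetical lookup table maps each leaf to a taxonomic group, with "self" marking other Thermosipho africanus genes. The sister is taken to be everything else below the parent node of the query leaf.

```python
MAJOR = {"self", "Thermotogales", "Firmicutes", "Archaea"}


def leaves(node):
    """All leaf labels below a node (a leaf is a str, an internal node a tuple)."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def find_sister(node, query):
    """Return the other children of the query's parent node, or None."""
    if isinstance(node, str):
        return None
    for i, child in enumerate(node):
        if child == query:
            return [c for j, c in enumerate(node) if j != i]
        found = find_sister(child, query)
        if found is not None:
            return found
    return None


def classify_sister(tree, query, group_of):
    """Assign the sister of `query` to one of the Fig. 2 categories."""
    sister = find_sister(tree, query)
    if not sister:
        raise ValueError(f"no sister found for {query!r}")
    groups = {group_of[leaf] for clade in sister for leaf in leaves(clade)}
    if len(groups) == 1:
        (group,) = groups
        return group if group in MAJOR else "Other group"
    if "Thermotogales" in groups:
        return "Complex including Thermotogales"
    return "Complex"


if __name__ == "__main__":
    # Hypothetical leaf labels and group assignments.
    group_of = {"Tafr_1": "self", "Tmar_1": "Thermotogales",
                "Cthe_1": "Firmicutes", "Ecol_1": "Proteobacteria"}
    tree = ((("Tafr_1", "Tmar_1"), "Cthe_1"), "Ecol_1")
    print(classify_sister(tree, "Tafr_1", group_of))  # Thermotogales
```

Tallying the returned labels over all trees gives a distribution of the kind plotted in Fig. 2.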

The phylogenetic analysis revealed that 58 ORFs (3.7%) had Archaea as the immediate sister in the tree. This is considerably lower than the 24% first reported for the Thermotoga maritima genome (16). A lower value was to be expected, for two reasons. First, bacterial gene and genome data have accumulated faster than archaeal data, so for patchily distributed genes with ambiguous phylogenetic signals, bacterial best hits have become disproportionately more likely. Second, Thermotoga maritima will itself be the sister of all or most Thermosipho africanus genes that were acquired by transfer before the two lineages diverged and are still present in both.

We therefore visually inspected each of the trees to also obtain information on LGT events that predate the split between Thermosipho and Thermotoga (see Fig. S2 in the supplemental material). This also allowed us to detect transfers in which the genes involved were later duplicated in the Thermosipho africanus genome (so that the sister in the tree was another Thermosipho africanus gene). This analysis suggested that a total of 202 ORFs (∼13%) have been involved in LGT with Archaea, including both ancient and recent events. Of these, 125 (∼62%) also involve Thermotoga maritima, while 77 (∼38%) have no close homolog in Thermotoga maritima. The latter number is of course an overestimate of the number of recent transfers, as Thermotoga maritima MSB8 might have lost many of the transferred genes, but these numbers do suggest that LGT between the Thermotogales and the Archaea is still an ongoing process. Thermophilic Archaea such as members of the genera Archaeoglobus (2) and Thermococcus (3, 14) are among the few other organisms considered native to oil reservoirs, the habitat from which this strain was isolated (4). Moreover, a recent reanalysis of the Thermotoga maritima genome reported 11.3% archaeal genes in that genome, consistent with our findings (20).
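The tree inspection described above can be approximated as an upward walk from the query leaf that looks past Thermosipho africanus paralogs and Thermotogales sequences until it reaches a clade containing some other group, which is reported as the putative LGT partner; the transfer is scored as predating the Thermosipho/Thermotoga split if a Thermotogales sequence is also inside that clade. This Python sketch assumes rooted nested-tuple trees and a hypothetical leaf-to-group table, and such an automated rule only approximates the visual inspection that was actually performed.

```python
IGNORED = {"self", "Thermotogales"}  # groups looked past during the walk


def leaves(node):
    """All leaf labels below a node (a leaf is a str, an internal node a tuple)."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def path_to(node, query):
    """Internal nodes from the root down to the query's parent, or None."""
    if isinstance(node, str):
        return [] if node == query else None
    for child in node:
        sub = path_to(child, query)
        if sub is not None:
            return [node] + sub
    return None


def lgt_partner(tree, query, group_of):
    """Return (partner_groups, predates_split) for the query leaf."""
    path = path_to(tree, query)
    if path is None:
        raise ValueError(f"{query!r} not found in tree")
    for ancestor in reversed(path):  # from the query's parent toward the root
        groups = {group_of[leaf] for leaf in leaves(ancestor) if leaf != query}
        partners = groups - IGNORED
        if partners:
            return partners, "Thermotogales" in groups
    return set(), False


if __name__ == "__main__":
    # Hypothetical labels: a Thermosipho paralog pair nested next to an archaeon.
    group_of = {"Tafr_1": "self", "Tafr_2": "self",
                "Tkod_1": "Archaea", "Cthe_1": "Firmicutes"}
    tree = ((("Tafr_1", "Tafr_2"), "Tkod_1"), "Cthe_1")
    print(lgt_partner(tree, "Tafr_1", group_of))  # ({'Archaea'}, False)
```

In the usage example the immediate sister is a paralog ("self"), yet the walk still recovers Archaea as the partner, which is the case the visual inspection was designed to catch.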

A large proportion of the ORFs show a close phylogenetic relationship with Firmicutes, with 8% of the ORFs having Firmicutes as the sister in the tree (Fig. 2). This connection has also been observed in earlier phylogenetic analyses (17, 19, 20). To investigate it further, we analyzed the trees in which Thermosipho africanus clusters with Firmicutes in the same way as we did for Archaea (see Fig. S3 in the supplemental material). In total, 417 trees (26%) suggest LGT between these lineages. For 244 (58.5%) of these trees the LGT predated the Thermosipho/Thermotoga split, as there was also a close homolog in Thermotoga maritima MSB8, while there was no close Thermotoga maritima homolog in the remaining 173 (41.5%). Moreover, Thermotogales and Firmicutes were sisters, rather than one being nested within the other, in 62 (3.9%) of the trees. One could interpret this as evidence either that these two phyla are indeed sisters or that there has been substantial transfer between them, even though the true phylogenetic position of the Thermotogales is elsewhere (likely deeper) in the tree. Alternatively, of course, the notion of a unique “true” phylogenetic position could be questioned.
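The percentages for both LGT partners follow directly from the reported counts, with the 1,578 trees as the denominator for genome-wide fractions and the number of LGT cases as the denominator for the split-timing fractions. A short Python check recomputes them; the function and key names are illustrative only.

```python
TREES = 1578  # ORFs for which an ML tree was obtained


def pct(part, whole):
    """Percentage of `part` in `whole`."""
    return 100 * part / whole


def summarize(total, with_tmar, trees=TREES):
    """Percentages for one LGT partner group, rounded to one decimal place.

    total     -- trees suggesting LGT with the partner group
    with_tmar -- those that also contain a close Thermotoga maritima homolog
    """
    without_tmar = total - with_tmar
    return {
        "pct_of_trees": round(pct(total, trees), 1),
        "pct_with_tmar": round(pct(with_tmar, total), 1),
        "pct_without_tmar": round(pct(without_tmar, total), 1),
    }


if __name__ == "__main__":
    # Counts as reported in the text.
    for partner, total, with_tmar in [("Archaea", 202, 125),
                                      ("Firmicutes", 417, 244)]:
        print(partner, summarize(total, with_tmar))
    print("Archaea as immediate sister:", round(pct(58, TREES), 1))  # 3.7
    print("Thermotogales/Firmicutes sisters:", round(pct(62, TREES), 1))  # 3.9
```

The output reproduces the rounded values given in the text: 12.8%, 61.9%, and 38.1% for Archaea, and 26.4%, 58.5%, and 41.5% for Firmicutes.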

A high level of LGT between Thermotogales and Firmicutes might in any case be expected, since some members of the Firmicutes, e.g., the Thermoanaerobales, frequently cohabit with Thermotogales in natural environments. For instance, Thermotogales and the Firmicutes genera Thermoanaerobacter and Desulfotomaculum are the only bacteria thought to be indigenous to oil reservoirs (4, 12, 18). Moreover, most of the mobile elements found scattered in the Thermosipho africanus genome seem to have recently originated from Firmicutes, further supporting the importance of LGT between these lineages.

Nucleotide sequence accession number.

The genome sequence of Thermosipho africanus strain TCF52B has been submitted to GenBank under accession number CP001185.

Supplementary Material

[Supplemental material]

Acknowledgments

This work was supported by funds from the Canadian Institutes for Health Research (MOP 4467) and Genome Atlantic (ACOA) to W.F.D. and by funds from the Norwegian Research Council (grant no. 145854/110 to N.K.B.). C.L.N. is supported by a Young Scientist grant from the Norwegian Research Council (180444/V40).

Sequencing and assembly were performed at The Atlantic Genome Centre (Halifax, Canada). We thank TIGR (now JCVI) for the TIGR Annotation Service, which supplied us with automatic annotation data and the manual annotation tool Manatee. We also thank Peter Cordes and Sebastien Halary for help with the data analysis and Angie Lewis for help with sequencing and assembly.

Footnotes

Published ahead of print on 5 January 2009.

Supplemental material for this article may be found at http://jb.asm.org/.

REFERENCES

1. Barrangou, R., C. Fremaux, H. Deveau, M. Richards, P. Boyaval, S. Moineau, D. A. Romero, and P. Horvath. 2007. CRISPR provides acquired resistance against viruses in prokaryotes. Science 315:1709-1712.
2. Beeder, J., R. K. Nilsen, J. T. Rosnes, T. Torsvik, and T. Lien. 1994. Archaeoglobus fulgidus isolated from hot North Sea oil field waters. Appl. Environ. Microbiol. 60:1227-1231.
3. Bonch-Osmolovskaya, E. A., M. L. Miroshnichenko, A. V. Lebedinsky, N. A. Chernyh, T. N. Nazina, V. S. Ivoilov, S. S. Belyaev, E. S. Boulygina, Y. P. Lysov, A. N. Perov, A. D. Mirzabekov, H. Hippe, E. Stackebrandt, S. L'Haridon, and C. Jeanthon. 2003. Radioisotopic, culture-based, and oligonucleotide microchip analyses of thermophilic microbial communities in a continental high-temperature petroleum reservoir. Appl. Environ. Microbiol. 69:6143-6151.
4. Dahle, H., F. Garshol, M. Madsen, and N. K. Birkeland. 2008. Microbial community structure analysis of produced water from a high-temperature North Sea oil-field. Antonie van Leeuwenhoek 93:37-49.
5. Frickey, T., and A. N. Lupas. 2004. PhyloGenie: automated phylome generation and analysis. Nucleic Acids Res. 32:5231-5238.
6. Grissa, I., G. Vergnaud, and C. Pourcel. 2007. CRISPRFinder: a web tool to identify clustered regularly interspaced short palindromic repeats. Nucleic Acids Res. 35:W52-W57.
7. Haft, D. H., J. Selengut, E. F. Mongodin, and K. E. Nelson. 2005. A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes. PLoS Comput. Biol. 1:e60.
8. Huber, R., C. R. Woese, T. A. Langworthy, H. Fricke, and K. O. Stetter. 1989. Thermosipho africanus gen. nov., represents a new genus of thermophilic eubacteria within the “Thermotogales.” Syst. Appl. Microbiol. 12:32-37.
9. Lerat, E., and H. Ochman. 2004. Psi-Phi: exploring the outer limits of bacterial pseudogenes. Genome Res. 14:2273-2278.
10. Lerat, E., and H. Ochman. 2005. Recognizing the pseudogenes in bacterial genomes. Nucleic Acids Res. 33:3125-3132.
11. Lopez, P., P. Forterre, H. le Guyader, and H. Philippe. 2000. Origin of replication of Thermotoga maritima. Trends Genet. 16:59-60.
12. Magot, M., B. Ollivier, and B. Patel. 2000. Microbiology of oil reservoirs. Antonie van Leeuwenhoek 77:103-116.
13. Makarova, K. S., N. V. Grishin, S. A. Shabalina, Y. I. Wolf, and E. V. Koonin. 2006. A putative RNA-interference-based immune system in prokaryotes: computational analysis of the predicted enzymatic machinery, functional analogies with eukaryotic RNAi, and hypothetical mechanisms of action. Biol. Direct 1:7.
14. Miroshnichenko, M. L., H. Hippe, E. Stackebrandt, N. A. Kostrikina, N. A. Chernyh, C. Jeanthon, T. N. Nazina, S. S. Belyaev, and E. A. Bonch-Osmolovskaya. 2001. Isolation and characterization of Thermococcus sibiricus sp. nov. from a Western Siberia high-temperature oil reservoir. Extremophiles 5:85-91.
15. Mongodin, E. F., I. R. Hance, R. T. Deboy, S. R. Gill, S. Daugherty, R. Huber, C. M. Fraser, K. Stetter, and K. E. Nelson. 2005. Gene transfer and genome plasticity in Thermotoga maritima, a model hyperthermophilic species. J. Bacteriol. 187:4935-4944.
16. Nelson, K. E., R. A. Clayton, S. R. Gill, M. L. Gwinn, R. J. Dodson, D. H. Haft, E. K. Hickey, J. D. Peterson, W. C. Nelson, K. A. Ketchum, L. McDonald, T. R. Utterback, J. A. Malek, K. D. Linher, M. M. Garrett, A. M. Stewart, M. D. Cotton, M. S. Pratt, C. A. Phillips, D. Richardson, J. Heidelberg, G. G. Sutton, R. D. Fleischmann, J. A. Eisen, O. White, S. L. Salzberg, H. O. Smith, J. C. Venter, and C. M. Fraser. 1999. Evidence for lateral gene transfer between Archaea and bacteria from genome sequence of Thermotoga maritima. Nature 399:323-329.
17. Nesbø, C. L., M. Dlutek, O. Zhaxybayeva, and W. F. Doolittle. 2006. Evidence for existence of “mesotogas,” members of the order Thermotogales adapted to low-temperature environments. Appl. Environ. Microbiol. 72:5061-5068.
18. Ollivier, B., and J. Cayol. 2005. Fermentative, iron-reducing, and nitrogen-reducing microorganisms, p. 71-88. In B. Ollivier and M. Magot (ed.), Petroleum microbiology. ASM Press, Washington, DC.
19. Omelchenko, M. V., Y. I. Wolf, E. K. Gaidamakova, V. Y. Matrosova, A. Vasilenko, M. Zhai, M. J. Daly, E. V. Koonin, and K. S. Makarova. 2005. Comparative genomics of Thermus thermophilus and Deinococcus radiodurans: divergent routes of adaptation to thermophily and radiation resistance. BMC Evol. Biol. 5:57.
20. Podell, S., and T. Gaasterland. 2007. DarkHorse: a method for genome-wide prediction of horizontal gene transfer. Genome Biol. 8:R16.
21. Ravot, G., B. Ollivier, B. K. Patel, M. Magot, and J. L. Garcia. 1996. Emended description of Thermosipho africanus as a carbohydrate-fermenting species using thiosulfate as an electron acceptor. Int. J. Syst. Bacteriol. 46:321-323.
22. Sorek, R., V. Kunin, and P. Hugenholtz. 2008. CRISPR—a widespread system that provides acquired resistance against phages in bacteria and archaea. Nat. Rev. Microbiol. 6:181-186.
23. White, O. 2004. Bacterial genome annotation at TIGR. In C. M. Fraser, T. D. Read, and K. E. Nelson (ed.), Microbial genomes. Humana Press, Totowa, NJ.
