Abstract
Legionella tunisiensis is a gammaproteobacterium of the family Legionellaceae that grows in amoebae. We sequenced the genome of strain LegMT. It comprises 3,508,121 bp and contains 4,747 protein-coding genes and 38 RNA genes, including 3 rRNA genes.
GENOME ANNOUNCEMENT
Legionella tunisiensis was first isolated from a water sample collected from a hypersaline lake, named Lake Sabka, located in Tunisia. The isolation procedure, which consisted of amoebal coculture, is described elsewhere (4). It is a Gram-negative, Gimenez-positive bacillus classified in the genus Legionella. Based on the complete sequence of the 16S rRNA gene, the species most closely related to L. tunisiensis is Legionella feeleii (GenBank accession number X73406), with 98% similarity using BLASTN (1). Strain L. tunisiensis LegMT is unable to grow on Columbia agar with 5% sheep blood and grows in 3 days on buffered charcoal yeast extract (BCYE) medium under a 5% CO2 atmosphere. It has a pathogenic effect on amoebae within 2 days, and it has the same metabolic characteristics as other Legionellaceae, except for the capability to hydrolyze gelatin.
The genome was pyrosequenced using the 454 GS FLX Titanium platform (Roche, Branford, CT) (9) and assembled using Newbler software v2.5.3 (Roche). A total of 213,119 reads were obtained using paired-end sequencing. Scaffolding was improved using Opera software v1.1 (5) combined with GapFiller v1.10 (3).
The draft genome of L. tunisiensis consists of 328 contigs arranged in 13 scaffolds, containing 3,374,639 bp, with an estimated size, including gaps, of 3,508,121 bp. The G+C content of this genome is 38.39%, which is similar to that of closely related Legionella species. Using BLASTN, Aragorn (8), and RNAmmer (7), the genome was shown to contain 38 RNA genes, including 3 rRNAs in a single operon and 35 tRNAs.
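The two sizes reported above differ by 133,482 bp (3,508,121 minus 3,374,639), which corresponds to the estimated gaps between contigs. As a minimal sketch of how such figures can be derived from scaffold sequences, the Python snippet below counts total length, sequenced (non-gap) bases, and G+C content. It assumes that gap positions are written as runs of N and that G+C is computed over sequenced bases only; the function name scaffold_stats and the toy input are hypothetical and are not taken from the assembly pipeline used here.

```python
def scaffold_stats(scaffolds):
    """Return (total_bp, sequenced_bp, gc_percent) for a list of scaffold strings.

    Assumption of this sketch: gaps are encoded as 'N' characters, and the
    G+C percentage is computed over non-gap bases only.
    """
    total_bp = 0
    sequenced_bp = 0
    gc_bp = 0
    for seq in scaffolds:
        seq = seq.upper()
        total_bp += len(seq)
        sequenced_bp += len(seq) - seq.count("N")
        gc_bp += seq.count("G") + seq.count("C")
    gc_percent = 100.0 * gc_bp / sequenced_bp if sequenced_bp else 0.0
    return total_bp, sequenced_bp, gc_percent


# Toy input for illustration only; these are not L. tunisiensis sequences.
toy_scaffolds = ["ATGCNNNNGC", "ggccAATT"]
total, sequenced, gc = scaffold_stats(toy_scaffolds)
print(f"total={total} bp, sequenced={sequenced} bp, G+C={gc:.2f}%")
```

In practice the scaffolds would be read from a FASTA file, and the gap lengths are themselves estimates produced by the scaffolder from paired-end information.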
Potential coding sequences (CDSs) were predicted using Prodigal software (6), and predicted open reading frames (ORFs) were excluded if they spanned a sequencing gap region. Assignment of protein functions was performed by searching against the GenBank, Clusters of Orthologous Groups (COG), and Pfam databases using BLASTP (2, 10, 11). A total of 4,782 CDSs were identified, representing 4,747 genes and a coding capacity of 2,758,416 bp (82.7% of the sequenced genome). The number of CDSs in L. tunisiensis is higher than that found in the common Legionella species (e.g., L. longbeachae NSW150, 3,739 genes; L. pneumophila 130b, 3,141 genes). However, the gaps remaining in this draft genome may inflate this number by splitting CDSs.
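The exclusion of ORFs that span a gap region can be illustrated with a short interval-overlap check. The following Python sketch is not the procedure actually used for this genome; it assumes that gaps appear as runs of N in the scaffold and that ORFs are given as half-open (start, end) coordinates. The function names find_gaps and drop_gap_spanning and the toy data are hypothetical.

```python
def find_gaps(seq, gap_char="N"):
    """Return half-open (start, end) intervals for each run of gap characters."""
    gaps = []
    start = None
    for i, base in enumerate(seq.upper()):
        if base == gap_char:
            if start is None:
                start = i  # a new gap run begins here
        elif start is not None:
            gaps.append((start, i))  # the gap run ended at the previous base
            start = None
    if start is not None:
        gaps.append((start, len(seq)))  # gap run reaches the scaffold end
    return gaps


def drop_gap_spanning(orfs, gaps):
    """Keep only ORFs whose half-open interval overlaps no gap interval."""
    return [
        (s, e)
        for (s, e) in orfs
        if not any(s < gap_end and gap_start < e for (gap_start, gap_end) in gaps)
    ]


# Toy scaffold and ORF coordinates for illustration only.
scaffold = "ATGAAANNNNTTTATGCCCTAA"
orfs = [(0, 6), (3, 12), (13, 22)]
gaps = find_gaps(scaffold)
print(gaps)
print(drop_gap_spanning(orfs, gaps))
```

Here the second ORF is discarded because it overlaps the run of N, while the ORFs lying entirely on either side of the gap are kept. This also shows why unresolved gaps can raise the CDS count: a gene interrupted by a gap may be predicted as two separate ORFs on its flanks.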
Among these genes, 212 (4.4%) were assigned to putative proteins, 2,236 (46.7%) were assigned to hypothetical proteins, and 49 (1%) encode proteins of unknown function. Moreover, 4,645 genes matched at least one sequence in the COG database (12) with BLASTP default parameters. Interestingly, L. tunisiensis has many genes annotated as encoding resistance proteins (40 genes), far more than L. pneumophila 130b, in which only 3 genes are annotated as drug resistance genes.
Nucleotide sequence accession numbers.
The L. tunisiensis genome, consisting of 328 contigs arranged in 13 scaffolds, was deposited in the EMBL nucleotide sequence database under the accession numbers CALJ01000001 to CALJ01000340.
ACKNOWLEDGMENT
This work had no funding source.
REFERENCES
- 1. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403–410.
- 2. Altschul SF, et al. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.
- 3. Boetzer M, Pirovano W. 2012. Toward almost closed genomes with GapFiller. Genome Biol. 13:R56.
- 4. Campocasso A, Boughalmi M, Fournous G, Raoult D, La Scola B. 3 February 2012. Description of two new Legionella species isolated from environmental water samples. Int. J. Syst. Evol. Microbiol. [Epub ahead of print.] doi:10.1099/ijs.0.037853-0.
- 5. Gao S, Sung WK, Nagarajan N. 2011. Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences. J. Comput. Biol. 18:1681–1691.
- 6. Hyatt D, et al. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11:119.
- 7. Lagesen K, et al. 2007. RNAmmer: consistent annotation of rRNA genes in genomic sequences. Nucleic Acids Res. 35:3100–3108.
- 8. Laslett D, Canback B. 2004. ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic Acids Res. 32:11–16.
- 9. Margulies M, et al. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437:376–380.
- 10. Punta M, et al. 2012. The Pfam protein families database. Nucleic Acids Res. 40:D290–D301.
- 11. Tatusov RL, Koonin EV, Lipman DJ. 1997. A genomic perspective on protein families. Science 278:631–637.
- 12. Tatusov RL, Galperin MY, Natale DA, Koonin EV. 2000. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 28:33–36.
