Abstract
Pseudomonas aeruginosa is an important cause of disease in hospitalized and immunocompromised patients. Its genome is among the largest of any bacterial pathogen of humans. We present the draft genome sequence of P. aeruginosa strain PABL056, a human bloodstream isolate with the largest genome yet reported for P. aeruginosa.
GENOME ANNOUNCEMENT
The Gram-negative bacterium Pseudomonas aeruginosa is capable of infecting plants, animals, and humans. In hospitalized patients, P. aeruginosa infections of the lungs, bloodstream, urinary tract, and skin and soft tissues are responsible for considerable morbidity and mortality (6). The genome of P. aeruginosa is one of the largest among bacterial pathogens that infect humans, with reported sizes ranging from 6.1 to 6.9 Mbp (Pseudomonas aeruginosa Genome Sequencing Projects, National Center for Biotechnology Information; http://www.ncbi.nlm.nih.gov/genome/genomes/187). Here we report the genome sequence of P. aeruginosa strain PABL056, a clinical isolate from the bloodstream of a human patient with an intravascular catheter infection (7). PABL056 has the largest genome yet reported among sequenced P. aeruginosa strains.
A total of 1.22 billion bases of paired-end reads were generated on the Illumina HiSeq 2000 platform at the University of Maryland Institute for Genome Sciences. The read set was randomly downsampled to approximately 100× coverage. The Ray de novo assembler v2.0.0-rc8 (3) produced an assembly of 401 contigs in 391 scaffolds, with a total scaffold length of 7,283,157 bp and an average G+C content of 65.5%. Sixty-three tRNA genes were predicted by tRNAscan-SE (4). BLAST alignment of the predicted 16S rRNA gene sequence against the Greengenes database (http://greengenes.lbl.gov) showed 100% identity to P. aeruginosa 16S rRNA gene sequences. GeneMarkS (2) identified 7,146 open reading frames (ORFs) longer than 100 bp, compared with 6,191 ORFs predicted from the sequence of the next largest P. aeruginosa genome, that of strain PA2192 (5) (GenBank accession no. NZ_CH482384.1). PABL056 ORF lengths ranged from 102 bp to 14,487 bp, with an average length of 901 bp and a coding intensity of 88.29%. The ORF amino acid sequences were searched against the COG, KEGG, Swiss-Prot, TrEMBL, and NR databases, yielding hits against 6,862, 3,370, 6,697, 7,088, and 7,100 records, respectively. A total of 33 coding sequences (CDSs) had no hits above a score cutoff of 30 bits in any of these databases and are therefore candidate novel ORFs. Alignment of reads against four plasmids found in P. aeruginosa (GenBank accession no. NC_008357, NC_009739, NC_010722, and NC_007100) gave sequence coverage of between 0% and 7.3%.
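Three of the quantities in this paragraph (the downsampling fraction, the coding intensity, and the "no hit above 30 bits" filter for novel ORFs) are simple enough to sketch in code. This is an illustrative sketch under stated assumptions, not the pipeline used for PABL056: the downsampling fraction assumes the assembly length stands in for the genome size; coding intensity is assumed to mean the fraction of assembly positions covered by at least one predicted ORF (overlaps counted once), which the text does not define; and the per-database best-bit-score table is a hypothetical summary of the COG/KEGG/Swiss-Prot/TrEMBL/NR searches rather than the output format of any particular tool. All function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Mapping


def downsample_fraction(total_bases: float, genome_size: int, target_coverage: float) -> float:
    """Fraction of reads to keep so the retained bases give ~target_coverage x genome_size.

    Assumption: genome_size is approximated by the assembled scaffold length.
    """
    return min(1.0, target_coverage * genome_size / total_bases)


def coding_intensity(orfs: Iterable[tuple[str, int, int]], assembly_length: int) -> float:
    """Fraction of assembly positions covered by at least one ORF.

    Each ORF is (scaffold_id, start, end) with 1-based inclusive coordinates;
    start > end (reverse strand) is allowed. Overlapping ORFs are counted once.
    """
    by_scaffold: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for scaffold, start, end in orfs:
        by_scaffold[scaffold].append((min(start, end), max(start, end)))

    covered = 0
    for intervals in by_scaffold.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start > cur_end + 1:  # gap: close the current merged block
                covered += cur_end - cur_start + 1
                cur_start, cur_end = start, end
            else:  # overlapping or adjacent: extend the block
                cur_end = max(cur_end, end)
        covered += cur_end - cur_start + 1
    return covered / assembly_length


def novel_orfs(
    orf_ids: Iterable[str],
    best_bits: Mapping[str, Mapping[str, float]],
    cutoff: float = 30.0,
) -> list[str]:
    """ORFs with no hit scoring above `cutoff` bits in any database.

    best_bits (hypothetical layout) maps ORF id -> {database name -> best bit score};
    ORFs absent from the mapping are treated as having no hits at all.
    """
    return [
        orf for orf in orf_ids
        if all(score <= cutoff for score in best_bits.get(orf, {}).values())
    ]


if __name__ == "__main__":
    # Figures from the text: 1.22 Gb of reads, 7,283,157-bp assembly, ~100x target.
    print(f"{downsample_fraction(1.22e9, 7_283_157, 100):.3f}")  # prints 0.597
```

Under these assumptions, roughly 60% of the reads would have been retained to reach ~100× coverage. Merging intervals per scaffold before summing keeps overlapping ORFs (common on opposite strands) from inflating the coding fraction.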
As this strain appears to have the largest genome yet reported for P. aeruginosa, we sought to verify that its size was not an artifact of read misassembly. To test for artifactual duplications that would inflate the total scaffold length, nucleotide sequences of the predicted ORFs were aligned against each other with BLAST (1). Fifty-eight ORFs had at least 98% identity with one or more ORFs on another scaffold, and many of these were predicted to be phage or transposon sequences. Together, these ORFs constituted 28,128 bp, representing either true genomic duplication or assembly artifacts. Thus, even if all 28 kb were due to misassembly, the PABL056 genome would still measure about 7.26 Mbp (7,283,157 − 28,128 = 7,255,029 bp), substantially larger than the 6.1 to 6.9 Mbp reported for other P. aeruginosa strains sequenced to date.
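The duplication screen described above (all-vs-all BLAST of ORF nucleotide sequences, keeping pairs with at least 98% identity that lie on different scaffolds) can be sketched as follows, followed by the worst-case size check. This is a minimal sketch, not the authors' code. It assumes tabular BLAST output such as BLAST+ `-outfmt 6`, whose first three columns are the query identifier, subject identifier, and percent identity, and a hypothetical mapping from ORF identifier to scaffold. The text states no alignment-length or coverage threshold, so none is applied here. Function names are hypothetical.

```python
from __future__ import annotations

import csv
import io
from typing import Iterable, Mapping, TextIO


def cross_scaffold_duplicates(
    blast_tsv: TextIO,
    orf_scaffold: Mapping[str, str],
    min_identity: float = 98.0,
) -> set[str]:
    """Return ORFs with >= min_identity % identity to an ORF on another scaffold.

    Reads tabular BLAST output, using only the first three columns:
    query id, subject id, percent identity.
    """
    duplicated: set[str] = set()
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject, pident = row[0], row[1], float(row[2])
        if query == subject or pident < min_identity:
            continue  # self-hit or too divergent
        if orf_scaffold[query] != orf_scaffold[subject]:
            duplicated.update((query, subject))
    return duplicated


def size_without(assembly_length: int, orfs: Iterable[str], orf_length: Mapping[str, int]) -> int:
    """Assembly length after discounting every listed ORF as an artifact."""
    return assembly_length - sum(orf_length[orf] for orf in orfs)


if __name__ == "__main__":
    hits = io.StringIO(
        "orfA\torfB\t99.1\t900\n"  # different scaffolds, >= 98%: flagged
        "orfC\torfD\t99.5\t500\n"  # same scaffold: ignored
        "orfE\torfF\t90.0\t400\n"  # below 98%: ignored
    )
    scaffolds = {"orfA": "s1", "orfB": "s2", "orfC": "s3", "orfD": "s3", "orfE": "s1", "orfF": "s4"}
    print(sorted(cross_scaffold_duplicates(hits, scaffolds)))  # ['orfA', 'orfB']
    # Worst case from the text: all 28,128 bp of duplicated ORFs are artifacts.
    print(size_without(7_283_157, ["dups"], {"dups": 28_128}))  # 7255029
```

Subtracting every flagged ORF is deliberately conservative: if a pair were a true misassembly, only one copy would be spurious, so the real correction would be smaller.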
Nucleotide sequence accession numbers.
This Whole Genome Shotgun project has been deposited in DDBJ/EMBL/GenBank under accession no. ALPS00000000. The version described in this article is the first version, ALPS01000000.
ACKNOWLEDGMENTS
This work was supported by National Institute of Allergy and Infectious Diseases (NIAID) grants F32AI089068 (E.A.O.), T32AI007476 (E.A.O.), R01AI075191 (A.R.H.), R01AI053674 (A.R.H.), and R44AI068185 (A.R.H.).
We thank Lisa DeShong Sadzewicz, Luke Tallon, and staff at the University of Maryland School of Medicine Institute for Genome Sciences for next-generation sequencing. We also thank Mark Mandel and Sudhir Penugonda for discussions and guidance.
REFERENCES
1. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403–410.
2. Besemer J, Lomsadze A, Borodovsky M. 2001. GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res. 29:2607–2618.
3. Boisvert S, Laviolette F, Corbeil J. 2010. Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies. J. Comput. Biol. 17:1519–1533.
4. Lowe TM, Eddy SR. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25:955–964.
5. Mathee K, et al. 2008. Dynamics of Pseudomonas aeruginosa genome evolution. Proc. Natl. Acad. Sci. U. S. A. 105:3100–3105.
6. Pier GB, Ramphal R. 2010. Pseudomonas aeruginosa, p 2835–2860. In Mandell GL, Bennett JE, Dolin R (ed), Principles and practice of infectious diseases, 7th ed. Elsevier Churchill Livingstone, Philadelphia, PA.
7. Scheetz MH, et al. 2009. Morbidity associated with Pseudomonas aeruginosa bloodstream infections. Diagn. Microbiol. Infect. Dis. 64:311–319.
