Abstract
Previously, we demonstrated that the recently evolved PCR-ribotype 027 hypervirulent Clostridium difficile strain (R20291) has acquired five genetic regions compared to the historic 027 counterpart strain (CD196), which may in part explain phenotypic traits relating to survival, antimicrobial resistance and virulence. Closer scrutiny of the three genome sequences reveals that, in addition to gene gain/loss, point mutations and inversions appear to have accumulated. Inversions are located upstream of potential coding sequences and could affect their expression. C. difficile has a highly fluid genome with multiple mechanisms to modify its genetic content and is continuing to evolve in our hospitals, influenced by environmental changes and human activity.
Key words: Clostridium difficile, 027 ribotype, point mutations, inversions, hypervirulence
Introduction
Clostridium difficile is a Gram-positive, anaerobic, spore-forming bacillus that is the leading cause of nosocomial diarrhea worldwide.1 C. difficile is a unique pathogen that often predominates in the bowel microflora as a result of the microbial compositional changes which follow antibiotic treatment. The hospital environment and patients undergoing antibiotic treatment provide a discrete ecosystem where C. difficile persists and virulent clones thrive. The continued rise of C. difficile infection (CDI) worldwide has been accompanied by the rapid emergence and transcontinental dissemination of a highly virulent clone, designated PCR-ribotype 027.2 These strains have risen from obscurity to become the most frequently isolated C. difficile strain types. Additionally, patients infected with these strains often experience more severe diarrhea, more recurrent episodes and higher mortality.3–7 The emergence of 027 strains partly explains the 35-fold increase in reported incidence of CDI in the United Kingdom in the last decade.
In a recent study, we compared the genomes of a historic 027 strain (CD196, isolated in France in 1984) with a modern hypervirulent strain (R20291, isolated in 2006 and the index case of epidemic 027 infection in the UK) and showed that this modern strain has five additional genetic regions compared to its historic counterpart. Furthermore, both 027 strains have an additional 234 genes compared to C. difficile 630 (a PCR-ribotype 012 strain, isolated from a patient in Zurich, Switzerland in 1982), the only other C. difficile strain with a reported full genome sequence.8 The implication of these studies is that the additional genes may account for the marked increase in disease capability (gain-of-trait-function). However, in bacteria there are other mechanisms of genetic variation and, perhaps counterintuitively, gene re-arrangements and gene loss can be equally important in the evolution of virulence.9 In this addendum we take a closer look at the 027 genome sequence data to reveal potential point mutations and inversions, which could contribute to 027 hypervirulence.
C. difficile Point Mutations
Prior to our sequence analysis it was known that some important genes in C. difficile have undergone point mutations. These include multiple mutations in the actin-specific ADP-ribosylating toxin in several strains, point mutations in the negative toxin regulator, particularly in 027 strains, and point mutations in the gyrA gene of fluoroquinolone-resistant strains.10 Re-analysis of the 630, R20291 and CD196 genomes identified 39 coding sequences (CDSs) present in all three isolates, in which at least one orthologue contained an apparent inactivating mutation (Table 1). Thirteen CDSs contained interruptions in all three isolates (e.g., CD1809, putative MDR efflux pump). Twelve inactivating point mutations were specific to 630 [e.g., CD1388, putative transcriptional regulator (Fig. 1 and Suppl. Fig. 1)], four were specific to R20291 (e.g., CDR20291_2368, putative competence membrane protein) and four were specific to CD196 (e.g., CD196_0481, chemotaxis protein methyltransferase cheR). Five interruptions were conserved in both 027 isolates [e.g., CDR20291_1656/CD196_1681, putative membrane protein precursor (Fig. 2 and Suppl. Fig. 2)]. Interruptions were due to either frame shifts (23/39) or point mutations that introduced stop codons (four ochre, nine opal and two amber). A single CDS was functional in R20291, contained a frame shift due to an additional adenosine in CD196 (AAT GCA to AAT AGC A) and was disrupted by a copy of transposase-like protein B in 630 (Fig. 3). Interestingly, bclA1, which is fully functional in 630, contained both a point mutation and loss of repeats in both 027 isolates. To confirm that this was not an error of sequence assembly, PCR and sequencing analyses were performed (unpublished).
Table 1.
630 | Interruption | R20291 | Interruption | CD196 | Interruption | Function/annotation |
CD0541 | F | CDR20291_0466 | F | CD196_0481 | FS | chemotaxis protein methyltransferase (cheR) |
CD1181 | F | CDR20291_1019 | F | CD196_1041 | FS | malonyl coa-acyl carrier protein transacylase (fabD) |
CD2501 | F | CDR20291_2393 | F | CD196_2346 | FS | putative hydrolase |
CD2541 | F | CDR20291_2428 | F | CD196_2381 | FS | sodium:dicarboxylate symporter family protein |
CD0182 | F | CDR20291_0183 | FS | CD196_0195A | F | putative membrane protein precursor |
CD1045 | F | CDR20291_0901 | FS | CD196_0922 | F | putative membrane protein |
CD1678A | F | CDR20291_1576 | FS | CD196_1601 | F | hypothetical protein |
CD2475 | F | CDR20291_2368 | FS | CD196_2321 | F | putative competence membrane protein |
CD0682 | F | CDR20291_0608 | FS (1) | CD196_0626 | FS (1) | putative sodium:solute symporter |
CD2126 | F | CDR20291_2033 | FS | CD196_1990 | FS | putative membrane protein precursor |
CD0332 | F [Q/caa] | CDR20291_0337 | Ochre (1) [taa] | CD196_0351 | Ochre (1) [taa] | putative exosporium glycoprotein |
CD1761 | F [Q/caa] | CDR20291_1656 | Ochre [taa] | CD196_1681 | Ochre [taa] | conserved hypothetical protein |
CD0672 | F [G/gga] | CDR20291_0595 | Opal [tga] | CD196_0613 | Opal [tga] | putative uncharacterized protein |
CD0157 | FS | CDR20291_0156 | F | CD196_0169 | F | putative membrane protein |
CD0348 | FS | CDR20291_0353 | F | CD196_0367 | F | conserved hypothetical protein |
CD0525 | FS | CDR20291_0451 | F | CD196_0465 | F | putative aminobenzoyl-glutamate transporter |
CD1388 | FS | CDR20291_1234 | F | CD196_1257 | F | putative transcriptional regulator |
CD1426 | FS | CDR20291_1273 | F | CD196_1296 | F | putative isochorismatase |
CD1982 | FS | CDR20291_1907 | F | CD196_1864 | F | conserved hypothetical protein |
CD2267 | FS | CDR20291_2166 | F | CD196_2123 | F | putative membrane-associated caaX amino terminal protease |
CD3020 | FS | CDR20291_2856 | F | CD196_2809 | F | conserved hypothetical protein |
CD3156A | FS | CDR20291_3008 | F | CD196_2961 | F | conserved hypothetical protein |
CD3185 | FS | CDR20291_3041 | F | CD196_2995 | F | conserved hypothetical protein |
CD3674 | FS | CDR20291_3534 | F | CD196_3488 | F | methyltransferase (putative glucose inhibited division protein B) |
CD0196 | FS | CDR20291_0197 | FS | CD196_0209 | FS | conserved hypothetical protein |
CD0440A | FS | Not annotated | (FS) | Not annotated | (FS) | regulatory protein (partial) |
CD1718 | I | CDR20291_1617 | F | CD196_1642 | FS | putative hydantoinase |
CD1990A | Amber | Not annotated | (Amber) | Not annotated | (Amber) | putative regulatory protein |
CD0857 | Amber [tag] | CDR20291_0787 | F [S/tcg] | CD196_0806 | F [S/tcg] | oligopeptide ABC transporter, ATP-binding protein |
CD3611 | Ochre [taa] | CDR20291_3450 | F [Q/caa] | CD196_3404 | F [Q/caa] | putative multidrug resistance protein |
CD0858 | Ochre | CDR20291_0788 | Ochre | CD196_0807 | Ochre | putative transcription antiterminator |
CD1741 | Opal | CDR20291_1638 | (Opal) | CD196_1663 | (Opal) | sarcosine reductase complex component b beta subunit. |
CD1809 | Opal | CDR20291_1704 | (Opal) | CD196_1729 | (Opal) | putative multi-drug resistance efflux pump |
CD2351 | Opal | CDR20291_2239 | (Opal) | CD196_2193 | (Opal) | glycine reductase complex component B gamma subunit. |
CD2352 | Opal | CDR20291_2240 | Opal | CD196_2194 | Opal | glycine/sarcosine/betaine reductase complex component A. |
CD2362 | Opal | CDR20291_2249 | Opal | CD196_2203 | Opal | putative aliphatic sulfonates ABC transporter, permease protein |
CD2496 | Opal | CDR20291_2388 | (Opal) | CD196_2341 | (Opal) | selenide, water dikinase |
CD3241 | Opal | CDR20291_3101 | Opal | CD196_3055 | Opal | proline reductase |
CD3317 | Opal | CDR20291_3179 | Opal | CD196_3133 | Opal | formate dehydrogenase H (fdhF) |
Homologous CDSs between 630 (PCR-ribotype 012) and two PCR-ribotype 027s; CD196 (historic) and R20291 (epidemic). F, Uninterrupted CDS; FS, frame shift; Amber, point mutation (pm) resulting in a TAG stop codon; Ochre, TAA stop codon point mutation; Opal, TGA selenocysteine incorporation codon; I, interruption due to insertion of transposase-like protein B; (), different/missing CDS annotation but >98% amino acid identity (including interruption) was present in genome sequence, [X/xxx] indicates amino acid (X) and DNA sequence (xxx) present in uninterrupted CDS, [xxx] indicates sequence in interrupted CDS; (1), also truncated due to loss of repeats.
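The stop-codon labels in the legend and the adenosine insertion quoted in the text can be illustrated with a minimal Python sketch. The function names are hypothetical and are not part of the original analysis; the codon pairs are taken from the bracketed entries in Table 1, and the two short sequences are the ones quoted for the CD196 frame shift.

```python
# Stop-codon names as used in the Table 1 legend (DNA, coding strand).
STOP_CODON_NAMES = {"TAG": "amber", "TAA": "ochre", "TGA": "opal"}


def classify_codon(codon):
    """Return 'amber', 'ochre' or 'opal' for a stop codon, else 'sense'."""
    return STOP_CODON_NAMES.get(codon.upper(), "sense")


def point_differences(codon_a, codon_b):
    """Count the positions at which two codons differ."""
    if len(codon_a) != 3 or len(codon_b) != 3:
        raise ValueError("codons must be three bases long")
    return sum(a != b for a, b in zip(codon_a.upper(), codon_b.upper()))


def split_codons(seq):
    """Split a sequence into full codons plus any trailing partial codon."""
    seq = seq.replace(" ", "").upper()
    full = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, full, 3)], seq[full:]


# (uninterrupted codon, interrupted codon), keyed by the 630 locus tag,
# copied from the bracketed entries in Table 1.
table1_pairs = {
    "CD0332": ("caa", "taa"),
    "CD0672": ("gga", "tga"),
    "CD0857": ("tcg", "tag"),
}
for cds, (intact, interrupted) in table1_pairs.items():
    print(cds, classify_codon(interrupted), point_differences(intact, interrupted))

# The adenosine insertion quoted in the text: AAT GCA to AAT AGC A.
print(split_codons("AAT GCA"))
print(split_codons("AAT AGC A"))
```

Each pair differs at a single base, which is what distinguishes these entries from the frame shifts, where an inserted or deleted base changes every downstream codon.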
C. difficile Putative Phase Variation
Recently, the first example of phase variation in C. difficile was demonstrated (strain 630).11 Expression of cwpV (CD0514), which encodes a surface protein (CwpV), is switched on or off via DNA inversion by a site-specific recombinase.11 Comparative analysis of the three genomes revealed putative inversions. Three intergenic inversions, including the cwpV inversion (Fig. 4), were detected in all three strains, present in both orientations (Table 2). Interestingly, C. difficile inversion (Cdi) 1 was annotated in the ‘off’ position in 630 and R20291 but ‘on’ in CD196 (Fig. 4). The two additional inversions were located upstream of putative signaling proteins. Cdi2 was located 872 bp upstream of CD0757/CDR20291_0685/CD196_0704 (Fig. 5), in the same orientation in 630 and R20291, but inverted in CD196. However, neither a left inverted repeat (LIR) nor a right inverted repeat (RIR) was identified. Cdi3 was located 64 bp upstream of CD1616/CDR20291_1514/CD196_1539 (Fig. 6). Cdi3 was inverted in both R20291 and CD196 compared to 630. The presence of these inversions indicates the possibility that phase variation is an important mode of genetic regulation. The absence of similarity between the LIR/RIR of the two inversions that have identifiable repeats (Cdi1 and Cdi3) suggests that at least two invertases are responsible for these inversions, with potentially a different mechanism for Cdi2. In addition to CD1167, the recombinase responsible for inverting cwpV,11 and a number of unconserved transposon- or phage-related recombinases, there are at least three other tyrosine recombinases conserved in the three genomes (CD1222, CD1333 and CD1932). Tyrosine recombinases have previously been shown to be associated with phase-variable inversions in Bacteroides fragilis.12
Table 2.
Name | Gene ID 630 | Gene ID R20291 | Gene ID CD196 | LIR/RIR | Spacer | Gene function | Inversion |
Cdi1 | CD0514 | CDR20291_0440 | CD196_0454 | 5′-TTTTAATTCTAAAGGcTACTT 5′-AAGTAtCCTTTAGAATTAGAA | 195 bp | Cell surface protein (cwpV) | Inverted in CD196 |
Cdi2 | CD0757 | CDR20291_0685 | CD196_0704 | none | 178 bp | Putative signaling protein | Inverted in CD196 |
Cdi3 | CD1616 | CDR20291_1514 | CD196_1539 | 5′-CATTTCTTGTAAAATGGATAGTTT 5′-AAACTATCCATTTACAAGAAATG | 215 bp | Putative signaling protein | Inverted in R20291 & CD196 |
Analysis of three C. difficile genomes identified three conserved intergenic inversions. Two of the three inversion sites are flanked by inverted repeats. Cdi1 has the LIR (left inverted repeat) and RIR (right inverted repeat) described by Emerson et al. upstream of cwpV. Cdi3 contains a novel set of inverted repeats, but Cdi2 has no identifiable repeats. Inversion = orientation relative to the C. difficile 630 genome sequence.
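One simple way to check how closely a pair of flanking repeats matches is to compare the reverse complement of the LIR with the RIR. The Python sketch below does this for the sequences as transcribed in Table 2; the function names are hypothetical, and the ungapped comparison is an assumption made for illustration rather than the method used in the original analysis.

```python
# Translation table for complementing an uppercase DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (case-insensitive)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def repeat_mismatches(lir, rir):
    """Count mismatches between the reverse complement of the LIR and the RIR.

    Returns None when the two sequences differ in length, because an
    ungapped comparison is not meaningful in that case.
    """
    rc = reverse_complement(lir)
    rir = rir.upper()
    if len(rc) != len(rir):
        return None
    return sum(a != b for a, b in zip(rc, rir))


# Sequences copied from Table 2 (lowercase letters kept as printed there).
CDI1_LIR = "TTTTAATTCTAAAGGcTACTT"
CDI1_RIR = "AAGTAtCCTTTAGAATTAGAA"
CDI3_LIR = "CATTTCTTGTAAAATGGATAGTTT"
CDI3_RIR = "AAACTATCCATTTACAAGAAATG"

print("Cdi1:", reverse_complement(CDI1_LIR), repeat_mismatches(CDI1_LIR, CDI1_RIR))
print("Cdi3:", reverse_complement(CDI3_LIR), repeat_mismatches(CDI3_LIR, CDI3_RIR))
```

As transcribed here, the Cdi3 LIR and RIR differ in length by one base, so the function returns None for that pair rather than guessing an alignment; a gapped alignment would be needed to compare them properly.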
Discussion
Genetic changes, such as inversions and point mutations, are key mechanisms for genetic variability in bacteria. Because they are difficult to detect, these mechanisms are often underestimated compared to horizontal gene transfer and gain-of-trait functions. We show that inversions are located upstream of genes encoding putative signaling and cell surface proteins. The Cdi1 inversion is located downstream of the promoter (PcwpV), suggesting that phase variation in this instance is based on intrinsic terminator formation, resulting in switching off transcription.11 For Cdi3 the position of the promoter for the upstream CDS is unknown. This inversion may therefore affect transcription of the downstream genes in a number of different ways: it could function in the classical way to flip a promoter contained within the LIR/RIR, such as in E. coli (reviewed in ref. 13), or it may function in a similar way to cwpV by forming a transcriptional terminator.11
Closer analysis of the genomes has revealed putative disruptions of many genes through small mutations that have resulted in either frame shifts or premature stop codons. Opal stop codons may alternatively encode the insertion of selenocysteine and may therefore result in fully functional proteins; however, this also requires a selenocysteine incorporation sequence (SECIS) element in close proximity, and our analysis did not identify any SECIS signatures. In E. coli, selenocysteine incorporation requires constitutively expressed selABCD, with selC encoding a unique tRNA species.14,15 DNA BLAST using selC did not identify any homologues in the three C. difficile strains. However, selABD are present (e.g., CD2495, CD2493 and CD2496, respectively, in 630), and all opal stop codons are conserved in all three strains, suggesting that these are indeed functional proteins with the insertion of selenocysteine. This indicates that the selC homologue and SECIS signatures in C. difficile are too divergent from the E. coli sequences to be recognized by DNA comparisons. The majority of interruptions are due to frame shifts (23/39), with 11 occurring in homopolymeric adenosine tracts. Although both 027 sequences were derived using 454 sequencing technology, which can introduce errors in homopolymeric tracts, the sequence data were confirmed using Solexa sequencing technology, which usually negates the problems associated with 454 sequencing technology.16 Furthermore, seven of the frame shifts identified in homopolymeric tracts occur in 630, which was sequenced using the dye terminator method and does not have this problem with homopolymeric tracts. Given the dual approach to sequencing these genomes, these variations may be genuine, and C. difficile may undergo phase variation by slipped-strand mispairing. However, experimental validation is required.
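Homopolymeric adenosine tracts of the kind discussed above can be located with a short scan. The Python sketch below is illustrative only: the function name, the minimum tract length of six bases and the example sequence are assumptions, not values or data from the original analysis.

```python
import re


def homopolymer_tracts(seq, base="A", min_len=6):
    """Return (start, length) for each run of `base` at least `min_len` long.

    Start positions are 0-based. The default threshold of six bases is an
    arbitrary choice for illustration.
    """
    pattern = re.compile(f"{re.escape(base.upper())}{{{min_len},}}")
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq.upper())]


# Hypothetical sequence with two long adenosine runs and one short run.
example = "GCT" + "A" * 7 + "TGC" + "AAA" + "TTT" + "A" * 8 + "C"
for start, length in homopolymer_tracts(example):
    print(f"tract of {length} A at position {start}")
```

Frame shifts that fall inside tracts reported by such a scan are the ones most worth confirming with a second sequencing technology, since these are the positions where both 454 read errors and genuine slipped-strand mispairing are expected to occur.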
The re-analysis of the three genome sequences suggests that C. difficile has altered its gene content and functionality to significantly affect adaptation and emergence of hypervirulent strains.
Acknowledgements
We acknowledge the Wellcome Trust for funding this research.
Footnotes
Previously published online: www.landesbioscience.com/journals/gutmicrobes/article/11870
References
- 1. Bartlett JG. Clostridium difficile: history of its role as an enteric pathogen and the current state of knowledge about the organism. Clin Infect Dis. 1994;18:265–272. doi: 10.1093/clinids/18.supplement_4.s265.
- 2. O'Connor JR, Johnson S, Gerding DN. Clostridium difficile infection caused by the epidemic BI/NAP1/027 strain. Gastroenterology. 2009;136:1913–1924. doi: 10.1053/j.gastro.2009.02.073.
- 3. Goorhuis A, Van der Kooi T, Vaessen N, Dekker FW, Van den Berg R, Harmanus C, et al. Spread and epidemiology of Clostridium difficile polymerase chain reaction ribotype 027/toxinotype III in The Netherlands. Clin Infect Dis. 2007;45:695–703. doi: 10.1086/520984.
- 4. Hubert B, Loo VG, Bourgault AM, Poirier L, Dascal A, Fortin E, et al. A portrait of the geographic dissemination of the Clostridium difficile North American pulsed-field type 1 strain and the epidemiology of C. difficile-associated disease in Quebec. Clin Infect Dis. 2007;44:238–244. doi: 10.1086/510391.
- 5. Loo VG, Poirier L, Miller MA, Oughton M, Libman MD, Michaud S, et al. A predominantly clonal multi-institutional outbreak of Clostridium difficile-associated diarrhea with high morbidity and mortality. N Engl J Med. 2005;353:2442–2449. doi: 10.1056/NEJMoa051639.
- 6. Mooney H. Annual incidence of MRSA falls in England, but C. difficile continues to rise. BMJ. 2007;335:958. doi: 10.1136/bmj.39388.597569.DB.
- 7. Redelings MD, Sorvillo F, Mascola L. Increase in Clostridium difficile-related mortality rates, United States, 1999-2004. Emerg Infect Dis. 2007;13:1417–1419. doi: 10.3201/eid1309.061116.
- 8. Sebaihia M, Wren BW, Mullany P, Fairweather NF, Minton N, Stabler R, et al. The multidrug-resistant human pathogen Clostridium difficile has a highly mobile, mosaic genome. Nat Genet. 2006;38:779–786. doi: 10.1038/ng1830.
- 9. Pallen MJ, Wren BW. Bacterial pathogenomics. Nature. 2007;449:835–842. doi: 10.1038/nature06248.
- 10. Drudy D, Kyne L, O'Mahony R, Fanning S. gyrA mutations in fluoroquinolone-resistant Clostridium difficile PCR-027. Emerg Infect Dis. 2007;13:504–505. doi: 10.3201/eid1303.060771.
- 11. Emerson JE, Reynolds CB, Fagan RP, Shaw HA, Goulding D, Fairweather NF. A novel genetic switch controls phase variable expression of CwpV, a Clostridium difficile cell wall protein. Mol Microbiol. 2009;74:541–556. doi: 10.1111/j.1365-2958.2009.06812.x.
- 12. Weinacht KG, Roche H, Krinos CM, Coyne MJ, Parkhill J, Comstock LE. Tyrosine site-specific recombinases mediate DNA inversions affecting the expression of outer surface proteins of Bacteroides fragilis. Mol Microbiol. 2004;53:1319–1330. doi: 10.1111/j.1365-2958.2004.04219.x.
- 13. Henderson IR, Owen P, Nataro JP. Molecular switches—the ON and OFF of bacterial phase variation. Mol Microbiol. 1999;33:919–932. doi: 10.1046/j.1365-2958.1999.01555.x.
- 14. Sandman KE, Tardiff DF, Neely LA, Noren CJ. Revised Escherichia coli selenocysteine insertion requirements determined by in vivo screening of combinatorial libraries of SECIS variants. Nucleic Acids Res. 2003;31:2234–2241. doi: 10.1093/nar/gkg304.
- 15. Sandman KE, Noren CJ. The efficiency of Escherichia coli selenocysteine insertion is influenced by the immediate downstream nucleotide. Nucleic Acids Res. 2000;28:755–761. doi: 10.1093/nar/28.3.755.
- 16. Aury JM, Cruaud C, Barbe V, Rogier O, Mangenot S, Samson G, et al. High quality draft sequences for prokaryotic genomes using a mix of new sequencing technologies. BMC Genomics. 2008;9:603. doi: 10.1186/1471-2164-9-603.