Journal of Bacteriology. 2002 Jul;184(14):3886–3897. doi: 10.1128/JB.184.14.3886-3897.2002

Patterns of Sequence Conservation in the S-Layer Proteins and Related Sequences in Clostridium difficile

Emanuela Calabi, Neil Fairweather
PMCID: PMC135169  PMID: 12081960

Abstract

Clostridium difficile is the etiological agent of antibiotic-associated diarrhea. Among the factors that may play a role in infection are S-layer proteins (SLPs). Previous work has shown these to consist mainly of two components, resulting from the cleavage of a precursor encoded by the slpA gene. The high-molecular-weight (MW) subunit is related both to amidases from Bacillus subtilis and to at least 28 other gene products in C. difficile strain 630. To gain insight into the functions of the SLPs and related proteins, we have further investigated the pattern of variability both at the slpA locus and at six nearby paralogs. Sequencing of the slpA gene from an S-layer group II strain and a variant S-layer group strain confirms a high degree of divergence in the low-MW SLP, which may result from diversifying selection. A highly conserved motif, however, is found at the C terminus of all low-MW subunits and may be essential for SlpA precursor cleavage. In strain 167, a variant cleavage product is present, suggesting a secondary processing site. Southern blotting analysis shows slpA-like open reading frames (ORFs) 2 to 7 to be conserved in all nine strains tested, with one exception: ORF2, which encodes a 66-kDa polypeptide coextracted at low pH with the main SLPs in strain 630, may be partially deleted in strain 167. Polymorphism within the slpA-ORF7 cluster may be more pronounced in the region proximal to the slpA gene. Unexpectedly, a high-MW subunit probe cross-hybridizes to sequences outside the slpA locus, which appear to vary in number among strains.


Clostridium difficile, a gram-positive, spore-forming anaerobe, is now recognized as the etiological agent of antibiotic-associated diarrhea, which can cause significant morbidity in hospitalized patients (1, 4, 15). Infection by C. difficile is believed to be facilitated by antimicrobial therapy, which disrupts the normal colonic flora, and disease is associated with the production of two potent toxins, A and B (29). However, a number of additional factors have been suggested to play a role in infection, including flagella (27), fimbriae (2), degradative enzymes (25), and cell surface proteins that may mediate adhesion or interactions with the immune system (10, 30). One of these cell wall components is the S-layer, a predominant surface-associated structure found in many prokaryotes and characterized primarily by its regularly ordered paracrystalline arrays, which are readily apparent by electron microscopy (24).

Biochemical studies have shown that the C. difficile S-layer consists of two main protein components, one of 32 to 38 kDa (low-molecular-weight S-layer protein [low-MW SLP]) and a second of 42 to 48 kDa (high-MW SLP) (6, 7, 14). Strains have been divided into two groups: group I, producing SLPs of 32 and 45 to 48 kDa, and group II, producing SLPs of 38 and 42 kDa (14). The low-MW SLP is the antigen most consistently recognized by patients with C. difficile disease (7). Recent cloning and sequencing analysis has shown that both subunits are encoded by a single gene, termed slpA (5, 13). The SlpA precursor polypeptide starts with a typical signal sequence that targets the nascent chain for translocation across the cytoplasmic membrane. Cleavage of the precursor to yield the two mature SLPs must therefore occur either within or external to the cytoplasmic membrane. DNA sequence analysis of slpA genes from a collection of strains has yielded two notable results (5, 13). First, the low-MW subunit displays a remarkable degree of variability among isolates. Second, a search of the incomplete genome sequence (http://www.sanger.ac.uk/Projects/C_difficile) has identified a large family of open reading frames (ORFs) (paralogs) in C. difficile strain 630 that are related to the amino acid sequence of the high-MW subunit. This amino acid sequence is ∼45% similar (counting conservative replacements) to two cell wall-bound proteins of Bacillus subtilis, an N-acetylmuramoyl-L-alanine amidase (CWLB/LytC) and its enhancer (CWBA/LytB). The sequence homology has a functional correlate, since the C. difficile high-MW SLP subunit shows amidase activity (5). By analogy with B. subtilis, it has been suggested that the homology domain mediates anchoring to the cell wall and therefore identifies a class of cell wall components (13). Consistent with this, many slpA paralogs encode a typical signal sequence, indicating that they are secreted or membrane bound.
Of the 29 slpA-related genes identified so far (slpA and its 28 paralogs), 12 map in a densely arranged cluster at the slpA locus and are all transcribed in the same direction, suggesting the possibility of coordinated regulation and related functions. We have shown that the six slpA-like genes immediately 3′ of slpA (ORFs 2 to 7) are transcribed during vegetative growth (5). One of these genes corresponds to Cwp66, a putative adhesin (13), while the functions of the others remain uncharacterized.

In this paper, we have further investigated the pattern of sequence conservation among C. difficile isolates both at the slpA locus and over the gene cluster 3′ of slpA. Our results raise new questions as to the role of the low-MW SLP. In addition, the conservation of slpA paralogs supports the notion that they play a fundamental role in C. difficile physiology.

MATERIALS AND METHODS

Bacterial strains and growth.

C. difficile strains 630, 17, 1, and Y were described previously (5). Strains 101, 167, 291, 371, and 959 are clinical isolates positive for cytotoxin and were obtained from C. Kelly, Harvard Medical School, Boston, Mass. The strains were cultured under anaerobic conditions in brain heart infusion broth and were maintained as described previously (5).

PCR amplification, cloning, and DNA sequencing.

The slpA genes from strains 167 and Y were PCR amplified using the forward primer 5′-ATTCTATGTACATAATAAAGAGATGT-3′ in combination either with the reverse primer 5′-ATTAACTCCACCAGCTAAATAAAC-3′ or the reverse primer 5′-ACCTTCACCAGTTTTCAT-3′ under the conditions previously described (5). The products were cloned into Escherichia coli TG1 using the M13 vector tg131 (16) and were sequenced by MWG Biotech. Recombinant techniques were performed as described previously (23).

N-terminal protein sequencing.

SLPs were acid extracted as previously described (5), fractionated on sodium dodecyl sulfate (SDS)-10% polyacrylamide gel electrophoresis (PAGE), and blotted onto a polyvinylidene difluoride membrane (Immobilon-P; Amersham) in 10 mM CAPS (3[cyclohexylamino]-1-propanesulfonic acid; pH 11)-10% methanol for 1 h at 70 V. The blots were stained in 0.1% Ponceau S-1% acetic acid. N-terminal sequencing was carried out at the Protein and Nucleic Acid Chemistry Facility, Department of Biochemistry, University of Cambridge.

Southern blotting analysis.

Probes were generated by PCR on genomic DNA from strain 630. The primers were as follows: (i) for the slpA low-MW subunit, 5′-TAAGCCATGGCAACTACTGGAACA-3′ and 5′-AGGCTCGAGTGATTTAGTTTCTAATC-3′; (ii) for the slpA high-MW subunit, 5′-TAAGCCATGGCAAATGATACAA-3′ and 5′-AGGCTCGAGCATATCTAATAAA-3′; (iii) for ORF2 to -7, previously described ORF-specific primers (5); (iv) for the 3′ region of ORF3, 5′-ATGCAGAAATAGAAGGTGGA-3′ and 5′-TTCAAGAAATGGCTCTTCAT-3′.

PCRs were carried out in 30-μl volumes with ∼1 μg of genomic DNA, 0.1 μM each primer, 2.5 mM MgCl2, and 0.2 U of Taq DNA polymerase using the following cycle: 45 s at 95°C, 60 s at a temperature 5°C below the calculated melting temperature for the primers, and 60 s at 72°C, repeated 30 times. The expected PCR product was purified using the PCR purification kit from Qiagen according to the manufacturer's instructions.
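The annealing step above depends on a calculated primer melting temperature, but the formula used is not stated. As a minimal sketch, assuming the simple Wallace rule (2°C per A or T plus 4°C per G or C, which is only a rough approximation for primers longer than about 14 nt) and assuming the lower Tm of the pair sets the annealing temperature, the cycle temperature could be derived as follows. The function names are hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Approximate Tm (deg C) by the Wallace rule: 2 per A/T, 4 per G/C."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def annealing_temp(forward: str, reverse: str, offset: int = 5) -> int:
    """Anneal `offset` degrees below the lower Tm of the pair (assumed convention)."""
    return min(wallace_tm(forward), wallace_tm(reverse)) - offset


if __name__ == "__main__":
    # slpA primers quoted in the Methods (designed on the strain 630 sequence).
    fwd = "ATTCTATGTACATAATAAAGAGATGT"
    for rev in ("ATTAACTCCACCAGCTAAATAAAC", "ACCTTCACCAGTTTTCAT"):
        print(rev, wallace_tm(fwd), wallace_tm(rev), annealing_temp(fwd, rev))
```

For more accurate work, a nearest-neighbor thermodynamic model would usually replace the Wallace rule.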

Probes were labeled using the ECL direct labeling and detection system (Amersham) according to the manufacturer's instructions.

Approximately 5 μg of genomic DNA was digested in a 20-μl total reaction volume and fractionated on 0.7% agarose gels in Tris-acetate-EDTA buffer. The gels were acid treated, denatured, neutralized, and blotted to Hybond-N+ membranes. After UV cross-linking, they were prehybridized for >1 h at 42°C in Gold Buffer (Amersham) prior to the addition of probe. Hybridizations were for ∼18 h at 42°C and were followed by three washes in 1× SSC-0.1% SDS (20 min per wash). Detection was by the enhanced-chemiluminescence method.

Nucleotide sequence accession numbers.

The nucleotide sequences of the slpA genes from strains 167 and Y were deposited in the GenBank database under accession numbers AF478570 and AF478571.

RESULTS

Structural analysis of the SLPs from strains Y and 167.

Acid-extracted SLPs from C. difficile strain Y contain two main proteins with molecular masses of ∼41 and 37 kDa in approximately equimolar amounts (Fig. 1A), which corresponds to a group II pattern (14). In contrast, strain 167 yields one major band at ∼39 kDa and a minor band at ∼43 kDa, in a ratio of approximately 4:1 based on Coomassie blue staining. Additional weaker bands are visible at 20 and 22 kDa and in the 33- to 35-kDa region (Fig. 1B). However, there is no substantial component with a mobility comparable to those of the low-MW subunits from other strains (Fig. 1A). To clarify the nature of the variant SLPs, the slpA genes from strains 167 and Y were further characterized. PCR amplification was carried out using primers based on the strain 630 sequence and spanning the whole slpA coding sequence, from 59 bp 5′ of the ATG to 559 bp 3′ of the stop codon. Products were obtained from both strains, indicating that the primer sequences are conserved. Strain Y yielded a fragment of ∼2,900 bp, and strain 167 yielded a fragment of ∼2,400 bp (data not shown), compared with 2,778 bp for the PCR product from strain 630. To estimate the size of the low-MW subunit coding sequence, a second PCR was carried out using the same forward primer and a reverse primer (NF129) that maps immediately downstream of the sequence encoding the N terminus of the high-MW subunit. Fragments of ∼1,600 and ∼1,100 bp were obtained from strains Y and 167, respectively, compared with 1,523 bp from strain 630. These preliminary data are consistent with the observation that the low-MW SLP is larger in strain Y than in S-layer group I strains, whereas the high-MW SLPs are of similar sizes. In strain 167, they suggest a substantial deletion within the region encoding the low-MW SLP.

FIG. 1.

SDS-PAGE analysis of SLPs from the strains characterized in this paper. Samples were prepared by low-pH extraction and run on 10% (A and C) or 12.5% (B) gels. The solid arrowheads point to the 66-kDa SLP-like ORF2 product in strain 630 (A) and to two polypeptides close to the molecular mass predicted for the low-MW subunit in 167 (B). The open arrowheads (A and B) point to the positions of the 43-kDa polypeptide from strain 167. The positions of molecular mass standards (in kilodaltons) are indicated on the right.

To determine the structures of the slpA genes from both strains in greater detail, PCR fragments were cloned and completely sequenced. The slpA ORF spans 2,301 bp in Y and 1,830 bp in 167. The predicted amino acid sequences are shown in Fig. 2, aligned with the previously reported SlpA precursor sequences (5, 13). In both strains, a signal peptide is evident by comparison with the previously characterized sequences. The processing sites of the precursor were investigated by N-terminal sequencing of the 41- and 37-kDa proteins from strain Y and of the major 39-kDa and minor 43-kDa proteins from 167. Unequivocal assignments were made for 13 to 15 residues in each case (Fig. 2), and each determined sequence matched the predicted amino acid sequence perfectly. These data show the low-MW (N-terminal) and high-MW (C-terminal) SLPs from strain Y to be 351 and 392 amino acids long, respectively, with estimated masses of 37,833 and 41,272 Da, in close agreement with the values determined by SDS-PAGE (Fig. 1).

FIG. 2.

Alignment of deduced SlpA precursors from the indicated strains. The alignment was produced with ClustalX (version 1.81), using the Blosum 30 matrix, a gap opening penalty of 10, and a gap extension penalty of 0.1. Shading was applied with the program GeneDoc, using the default similarity groups. Solid backgrounds indicate 100% conservation, and shaded backgrounds indicate >60% conservation. The determined N-terminal sequences of the low-MW and high-MW subunits from strain Y and of the high-MW subunit and the 43-kDa protein from 167 are boxed. The brackets underline three imperfect-repeat units in the high-MW subunit sequences. The cleavage sites of the precursor in strains 1, 17, 630, and Y are indicated by arrowheads. Asterisks and numbers above the sequences mark every 10th residue.

In strain 167, the N terminus of the predominant 39-kDa band mapped to position 207, consistent with this protein being the high-MW SLP. The protein is predicted to contain 404 amino acids, with a mass of 42,390 Da. A protein of this mass would be expected to migrate appreciably more slowly in SDS-PAGE than was observed, suggesting either anomalous electrophoretic behavior or a posttranslational modification. Cleavage of the SlpA precursor at residue 207 to release the 39-kDa protein would be expected to yield a small N-terminal protein of ∼20 kDa. A weak band of ∼20 kDa that might correspond to this protein is visible in the SLP preparation from strain 167 (Fig. 1B), but it is present in much smaller amounts than the 39-kDa protein. Thus, although the strain 167 SlpA precursor contains a sequence predicted to correspond to a low-MW subunit of 182 amino acids (approximately 20 kDa) and is processed to release the 39-kDa high-MW subunit, either the low-MW subunit is largely degraded or it is not efficiently extracted under the conditions that we have used.

N-terminal sequencing of the minor 43-kDa protein showed that it matches residues 25 to 37 of the SlpA precursor exactly, indicating that this protein derives from the N terminus of the precursor after removal of the signal sequence. However, the 43-kDa protein cannot be the uncleaved precursor, which has a predicted mass of 62,312 Da.
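The ORF lengths and mature subunit sizes reported above can be cross-checked with simple arithmetic. The sketch below assumes that the quoted ORF lengths (2,301 and 1,830 bp) exclude the stop codon and that the full-length amplicon spans 59 bp upstream of the ATG and 559 bp downstream of the stop codon, as described for the first PCR; the helper names are hypothetical. Under these assumptions, subtracting both mature subunits leaves a 24-residue signal peptide in each strain, which agrees with the 43-kDa protein of strain 167 starting at residue 25.

```python
UPSTREAM_BP = 59     # forward primer 5' end to the ATG (from the text)
DOWNSTREAM_BP = 559  # stop codon to the reverse primer 5' end (from the text)
STOP_CODON_BP = 3


def expected_amplicon_bp(orf_bp: int) -> int:
    """Full-length slpA PCR product, assuming orf_bp excludes the stop codon."""
    return UPSTREAM_BP + orf_bp + STOP_CODON_BP + DOWNSTREAM_BP


def signal_peptide_len(orf_bp: int, low_mw_aa: int, high_mw_aa: int) -> int:
    """Residues left after removing both mature subunits from the precursor."""
    if orf_bp % 3:
        raise ValueError("ORF length is not a whole number of codons")
    precursor_aa = orf_bp // 3
    return precursor_aa - low_mw_aa - high_mw_aa


if __name__ == "__main__":
    # Strain Y: 2,301-bp ORF; low-MW 351 aa, high-MW 392 aa.
    print(expected_amplicon_bp(2301), signal_peptide_len(2301, 351, 392))
    # Strain 167: 1,830-bp ORF; hypothetical low-MW 182 aa, high-MW 404 aa.
    print(expected_amplicon_bp(1830), signal_peptide_len(1830, 182, 404))
```

The predicted products (2,922 and 2,451 bp under these assumptions) are close to the ∼2,900- and ∼2,400-bp fragments estimated from gels.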

Based on the primary sequence alone, the C terminus of the 43-kDa component is estimated to lie around position 450 in the SlpA precursor. The sequences surrounding this region and the N termini of high-MW subunits were compared to identify patterns suggestive of a conserved cleavage site. Little conservation is apparent among the N termini of high-MW subunits, except that five of the six have Ala as the N-terminal residue. However, conservation is readily apparent in the sequences immediately upstream of the cleavage sites. A highly conserved motif is found in all low-MW subunits (positions 328 to 345 of strain 630 [Fig. 2 and 3A]). In strains 167 and Y, this motif extends for a further three residues, which are conserved between these two strains. These motifs may be recognized in an enzymatic cleavage step. If so, variant processing enzymes that have coevolved with the slpA gene may exist in different strains. No such motif is found at the predicted C terminus of the 43-kDa protein from strain 167. However, in strain 167, a different conserved motif is found ∼50 residues C terminal to the predicted processing sites of both the 43-kDa and the 39-kDa proteins. This second, 17-amino-acid-long motif is part of the first and third of three ∼120-amino-acid repeat units shared between the SLP high-MW subunit and the products of the B. subtilis CWLB and CWBA genes (Fig. 2). The copy found in the second repeat unit is more distantly related. Of the other five sequenced SlpA precursors, only that of strain Y conserves this motif to a similar degree between the first and third repeats (Fig. 3B).
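The estimate of the C terminus of the 43-kDa component can be illustrated by walking along the precursor from its mature N terminus (residue 25) and summing residue masses until the running total reaches the apparent mass. The sketch below uses standard average residue masses; the function names are hypothetical, and the actual strain 167 sequence (GenBank AF478570) would have to be supplied. Because SDS-PAGE mobility need not track true mass, as the 39-kDa band of strain 167 shows, such an estimate is only approximate.

```python
# Average residue masses (Da): free amino acid mass minus one water.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once per chain for the free N and C termini


def polypeptide_mass(seq: str) -> float:
    """Average mass (Da) of a linear polypeptide: residue masses plus one water."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def estimate_c_terminus(precursor: str, start: int, target_da: float):
    """1-based position at which a chain beginning at `start` first reaches target_da.

    Returns None if the precursor ends before the target mass is reached.
    """
    mass = WATER
    for pos in range(start, len(precursor) + 1):
        mass += AVG_RESIDUE_MASS[precursor[pos - 1].upper()]
        if mass >= target_da:
            return pos
    return None


# Hypothetical usage with the real precursor loaded elsewhere:
# estimate_c_terminus(precursor_167, 25, 43_000)
```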

FIG. 3.

Sequence conservation among SlpA precursors. (A) Regions around the cleavage site between low-MW and high-MW subunits of six strains. (B) Alignment of a 17-amino-acid-long motif derived from each of the three imperfect repeats spanning the high-MW SLP sequence (Fig. 2). The most N-terminal of these motifs lies ∼50 residues C-terminal of the cleavage site between high-MW and low-MW subunits. The motif G(D/E)DRXXTXXX(L/I/V)XXXYY is shared among repeats I from all strains and repeats III from 167 and Y but not with repeat III from other strains or with repeat II from any strain. The numbers on the left of each line refer to strains, and those on the right refer to residue positions relative to the initiation codon. Solid backgrounds highlight positions conserved in all sequences, and shaded backgrounds highlight other conserved positions.
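The consensus notation used for the repeat motif translates directly into a regular expression, which makes it straightforward to scan other sequences for the same pattern. A minimal sketch, assuming X stands for any residue and (A/B) for a choice among residues; the helper names and the test sequence are hypothetical.

```python
import re


def consensus_to_regex(consensus: str) -> str:
    """Translate e.g. G(D/E)DRXXTXXX(L/I/V)XXXYY into a regular expression.

    X matches any residue; (A/B/...) matches one of the listed residues.
    """
    parts = []
    i = 0
    while i < len(consensus):
        ch = consensus[i]
        if ch == "(":
            close = consensus.index(")", i)
            parts.append("[" + "".join(consensus[i + 1:close].split("/")) + "]")
            i = close + 1
        else:
            parts.append("." if ch == "X" else ch)
            i += 1
    return "".join(parts)


def find_motif(protein: str, consensus: str):
    """Return (1-based start, matched text) for every hit, including overlaps."""
    # A lookahead with a capturing group lets finditer report overlapping hits.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein.upper())]


MOTIF = "G(D/E)DRXXTXXX(L/I/V)XXXYY"

if __name__ == "__main__":
    print(consensus_to_regex(MOTIF))
    print(find_motif("AAGDDRKKTAAALAAAYYAA", MOTIF))
```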

Identities of other surface-associated proteins.

Most S-layer preparations isolated from various strains by the low-pH method contain an ∼66-kDa component (Fig. 1A), although this is not found in strain 167 (see below). For strain 630, we have determined the N-terminal sequence of this protein to be AETTQVKKET. A BLASTX search of the whole unfinished C. difficile genome showed an exact match to the gene product encoded by ORF2, an slpA paralog located ∼3 kbp downstream of slpA and transcribed during vegetative growth (5). The gene product starts with a putative 23-amino-acid-long signal sequence that is closely related to that of the SlpA precursor. The ORF2 product contains 623 amino acids and, like the SlpA precursor, is 50% similar in amino acid sequence over its C-terminal region (residues 284 to 623) to a domain present in the B. subtilis N-acetylmuramoyl-L-alanine amidase CWLB/LytC and its enhancer, CWBA/LytB (18) (Fig. 4A). Consistent with this, we have previously shown in a zymogram assay that this protein has amidase activity (5). A BLASTP search with the upstream, N-terminal region identifies significant homology (39% amino acid similarity) to both the ∼125-kDa SLP of Bacillus sphaericus (3) and the transducer of rhodopsin (Htr) II from the archaebacterium Natronobacterium pharaonis (26) (Fig. 4B and C). Different sets of amino acids are conserved in the two pairwise comparisons. Of the 129 positions shared with N. pharaonis Htr-II, 18 are also conserved in the signal domain of bacterial transducers such as Tsr, including methylation sites.

FIG. 4.

Sequence homology between the predicted product of slpA-like ORF2 and the SlpA high-MW subunits from strain 630, CWLB (N-acetylmuramoyl-L-alanine amidase) from B. subtilis (BS-Ami) (GenBank accession number Q02114) and CWBA (amidase enhancer) from B. subtilis (BS-Mod) (GenBank accession number Q02113) (A), the ∼125-kDa SLP of B. sphaericus (BacSph) (GenBank accession number P38537) (B), and Htr-II from N. pharaonis (GenBank accession number P42259) (C). The alignments in panel A were produced with ClustalX (version 1.81), using the Blosum 30 matrix, a gap opening penalty of 10, and a gap extension penalty of 0.1. The alignments in panels B and C were produced with pairwise BLASTP on the National Center for Biotechnology Information server (http://www.ncbi.nlm.nih.gov/blast/bl2seq/bl2.html), using a gap opening penalty of 11, a gap extension penalty of 1, and no filter. Solid backgrounds highlight positions conserved in all sequences, and shaded backgrounds highlight other conserved positions. #, residues shared with the signal domain of bacterial Tsr transducers (26). Asterisks and numbers above the sequences mark every 10th residue.

Pattern of variability in slpA.

The degree of relatedness between the SlpA protein sequences was estimated by calculating distance scores for each pairwise comparison with the program GeneDoc, using the Blosum 62 matrix (Table 1). Even excluding the hypothetical strain 167 component, the low-MW subunits are poorly conserved, except over their C termini (data not shown). It has been argued that a 78-amino-acid sequence at the N termini of the low-MW subunits in strains C253 (whose SlpA sequence is identical to that of 630 [13]) and 79-685 is related to the SLH (S-layer homology) domain found in SLPs from unrelated species (13). However, this sequence is not well conserved in different strains (Fig. 2). In contrast, in the high-MW subunits, 213 (81%) of the 263 positions at which all six sequences can be aligned are highly conserved. Based on the close relatedness of the high-MW subunit sequences, a dendrogram was constructed by the neighbor-joining method, which suggests a tentative evolutionary relationship (Fig. 5). As expected, S-layer group I sequences are more closely related to each other than to the Y or 167 sequences. Strain 167 is marginally more closely related than Y to the S-layer group I sequences.
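The count of gap-free, conserved alignment positions in the high-MW subunits can be computed from any multiple alignment. The text does not define "highly conserved", so the sketch below assumes a column qualifies when all of its residues are identical or fall into one illustrative similarity group. These groups and the function names are hypothetical and are not GeneDoc's defaults, so counts may differ from the 213 of 263 positions reported.

```python
# Illustrative similarity groups (an assumption; not GeneDoc's default groups).
SIMILARITY_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def column_conserved(column: str) -> bool:
    """True if all residues are identical or fall within one similarity group."""
    residues = set(column)
    if len(residues) == 1:
        return True
    return any(residues <= set(group) for group in SIMILARITY_GROUPS)


def conservation_summary(alignment):
    """Return (gap-free columns, conserved gap-free columns) for an alignment."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    columns = ["".join(seq[i] for seq in alignment) for i in range(length)]
    gap_free = [col for col in columns if "-" not in col]
    return len(gap_free), sum(column_conserved(col) for col in gap_free)


if __name__ == "__main__":
    # Hypothetical three-sequence alignment; real input would be Fig. 2 rows.
    print(conservation_summary(["ACD-EW", "ACE-EK", "GCDKEW"]))
    print(f"{213 / 263:.0%}")  # fraction reported in the text
```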

TABLE 1.

Distance scores in pairwise comparisons between SlpA signal sequences and low-MW and high-MW subunits from six C. difficile strains

          Distance score
Strain    630     17      1       79-685  167     Y
Signal sequence
  630     209
  17      247     207
  1       209     247     209
  79-685  331     361     331     127
  167     257     221     257     359     207
  Y       245     209     245     359     219     207
Low-MW subunit
  630     2,807
  17      4,097   2,779
  1       4,944   4,910   2,798
  79-685  5,011   5,028   4,804   2,726
  167     5,738   5,689   5,636   5,545   1,597
  Y       5,923   5,960   5,760   6,056   6,347   3,054
High-MW subunit
  630     3,301
  17      3,404   3,292
  1       4,795   4,776   3,654
  79-685  4,289   4,283   4,604   3,441
  167     5,175   5,178   5,583   5,367   3,584
  Y       4,965   4,964   5,421   5,086   5,475   3,474

FIG. 5.

Dendrogram depicting the relationships among SlpA high-MW subunits. The diagram was constructed by the neighbor-joining method provided with ClustalX (version 1.81) based on the alignment shown in Fig. 2.

Although the SLP low-MW subunits are highly variable in amino acid sequence, they are conserved in overall length, ranging between 316 and 321 residues. The only exception is the hypothetical low-MW subunit of strain 167, which spans only 182 amino acids. In contrast, the high-MW subunits vary more in length, ranging from 373 (strain 17) to 416 (strain 1) residues. Most of the differences in length among high-MW subunits map to one of four regions, between positions 446 and 449, 595 and 604, 684 and 701, and 732 and 761 in the multiple alignment shown in Fig. 2.

Variability within the slpA gene cluster.

The genome of C. difficile strain 630 contains at least 28 paralogs encoding polypeptides with ∼45% amino acid similarity to the SlpA precursor (5, 13), which we have provisionally named ORFs 2 to 29, pending annotation of the genome sequence (http://www.sanger.ac.uk/Projects/C_difficile). A number of these are closely linked to the slpA locus: slpA-like ORFs 2 to 7 lie within 21 kbp 3′ of slpA, and slpA-like ORFs 8 to 12 lie within 17 kbp 5′ of slpA. We have also shown that ORFs 2 to 7 are transcribed during vegetative growth (5). The functions of these slpA-like genes are presently unknown. As a first step toward investigating them, we examined the extent of their variability by Southern blotting analysis.

ORF-specific probes were designed from the region of each ORF immediately upstream of the amidase homology domain (AHD), minimizing overlap with the domain. DNA sequence comparison showed no significant cross-homology among the probes. Additional probes were made that spanned the whole slpA high-MW and low-MW subunit coding sequences and the sequence encoding the C-terminal region of ORF3.

Each probe was hybridized to a panel of nine strains, consisting of 630, 17, 1, Y, and 167 and four additional strains: strains 101 and 371, isolated from patients with C. difficile-associated diarrhea, and strains 291 and 959, isolated from asymptomatic carriers (17). SDS-PAGE analysis showed all four of these strains to belong to S-layer group I based on their SLP patterns (Fig. 1C). For each strain, two different restriction enzyme combinations were used, chosen on the basis of the strain 630 sequence to give the best compromise between a convenient size range of fragments and maximum resolution of individual ORFs.

Figure 6A shows the positions of restriction sites within the slpA locus of each strain, deduced from the DNA sequence. Because, except for strain 630, the sequence has been determined only for the coding sequence and short flanking regions, in most cases the lengths of the restriction fragments produced in each digestion are not known, although lower limits can be predicted. Regions showing >70% homology, and therefore expected to hybridize to the strain 630 probes, are indicated in the figure. Figure 7A shows a complete restriction map of the region spanning from slpA to ORF7 in strain 630. The results of the Southern analysis are shown in Fig. 6B and 7B. In general, fragments hybridizing strongly with each probe closely match the sizes expected from the corresponding locus. There are, however, a few notable exceptions. (i) In strain 630, the low-MW subunit probe strongly hybridizes to an additional 9-kbp HincII/PvuII fragment, which is only very weakly positive with the high-MW subunit probe on long exposures (data not shown). This fragment cannot be accounted for on the basis of the existing sequence. (ii) In strain 17, no hybridization to the predicted 0.4-kbp HindIII/PvuII fragment is observed with the low-MW subunit probe. This is most likely due to a modification of the upstream PvuII site (position 287; see below). (iii) In strain 17, a 5.5-kbp HindIII/PvuII fragment and a 9-kbp HincII/PvuII fragment can be assigned to the 5′ end of the slpA gene (upstream of the PvuII site at position 287). While a fragment of the same size as the former is positive with the high-MW subunit probe, the 9-kbp HincII/PvuII fragment is not. Thus, the 5.5-kbp HindIII/PvuII fragments hybridizing to the high-MW and low-MW probes are likely derived from different loci. (iv) In strain Y, a 0.8-kbp HincII/PvuII fragment is observed instead of the expected 0.6-kbp fragment. This may be explained by the upstream PvuII site (position 1497) being resistant to digestion under the high-salt conditions used for mixed HincII/PvuII digestions. (v) In strain 630, larger-than-expected fragments are positive with the ORF4 probe because the PvuII site downstream of the probe resists digestion, probably owing to methylation. Accordingly, identical positive fragments are observed in digests from which PvuII has been omitted, and they match the sizes predicted for HindIII or HincII fragments spanning the probe (Fig. 7B). (vi) The lack of detection of some fragments is explained by low homology to the strain 630 sequence used as a probe. This applies to the fragment upstream of the HindIII site at position 847 in strain 1 (∼50% homology), to the 0.4-kbp HindIII/PvuII fragment in strain Y (∼45% homology), and to the 0.8-kbp PvuII/PvuII fragment in strain 167 (∼43% homology). Apart from 630, fragments hybridizing to the low-MW subunit probe are found only in strains 17, 1, and 371, the last two giving identical patterns.
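Deducing restriction maps such as those in Fig. 6A amounts to locating each enzyme's recognition site in the sequence and measuring the spacing between cuts. The sketch below assumes a linear sequence and the standard sites HindIII (A^AGCTT), HincII (GTY^RAC), and PvuII (CAG^CTG), all of which are palindromic, so scanning one strand suffices. It does not model blocked sites, such as the digestion-resistant PvuII sites described above, and the dictionary and function names are hypothetical.

```python
import re

# Recognition site (IUPAC codes) and top-strand cut offset from the site's 5' end.
ENZYMES = {
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "HincII": ("GTYRAC", 3),   # GTY^RAC (blunt)
    "PvuII": ("CAGCTG", 3),    # CAG^CTG (blunt)
}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]", "R": "[AG]"}


def cut_positions(seq, enzymes):
    """Sorted cut coordinates (0-based, between bases) for the chosen enzymes."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        # Lookahead so that overlapping sites are all reported.
        pattern = "(?=" + "".join(IUPAC[b] for b in site) + ")"
        cuts.update(m.start() + offset for m in re.finditer(pattern, seq))
    return sorted(cuts)


def fragment_lengths(seq, enzymes):
    """Fragment lengths (bp) from a complete digest of a linear sequence."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


if __name__ == "__main__":
    # Hypothetical test sequence with one HindIII and one PvuII site.
    demo = "AAAA" + "AAGCTT" + "CCCC" + "CAGCTG" + "GG"
    print(fragment_lengths(demo, ["HindIII", "PvuII"]))
```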

FIG. 6.

(A) Deduced restriction maps of the slpA locus from the five strains for which DNA sequences have been determined. The horizontal lines represent the DNA sequence, with the thick bar corresponding to the slpA ORF. The vertical lines extending above the sequence indicate HindIII sites, those below the sequence indicate HincII sites, and those both above and below the sequence indicate PvuII sites. The solid horizontal bars indicate >70% homology to the high-MW subunit probe, and the shaded horizontal bars indicate >70% homology to the low-MW subunit probe. The numbers over the arrows indicate the size of the relevant restriction fragment in base pairs. (B) Southern blotting analysis of genomic DNA from C. difficile strains with an slpA low-MW subunit probe (left) and with an slpA high-MW subunit probe (right). Pv, PvuII; Hc, HincII; Hd, HindIII.

FIG. 7.

(A) Deduced restriction maps of the region spanning the slpA gene and slpA-like ORF2 to -7 in strain 630. The thin horizontal line represents the DNA sequence, with thick bars corresponding to the ORFs. The solid segments indicate the AHD. The vertical lines extending above the sequence indicate HindIII sites, those below the sequence indicate HincII sites, and those both above and below the sequence indicate PvuII sites. The solid horizontal bars under the sequence indicate the position of each probe. The numbers over the arrows indicate the size of the relevant restriction fragment in kilobase pairs. (B) Southern blotting analysis of genomic DNAs from C. difficile strains with probes from the indicated slpA-like ORFs. Independent blots were used for each probe, except for ORF3-5′ and ORF3-3′. Pv, PvuII; Hc, HincII; Hd, HindIII.

In the case of the high-MW subunit probe, strongly hybridizing fragments are found in all strains. With the few exceptions mentioned above, these fragments can be fully accounted for by the slpA gene in the five strains for which sequences are available. By analogy, they must also be derived from the slpA locus in the remaining four strains. Fragments hybridizing more weakly to the same probe must correspond to less homologous sequences mapping outside the slpA locus. The number of these fragments varies between one and four. Six different patterns are observed among the nine strains: one shared by three strains (101, 291, and 959), one shared by two strains (1 and 371), and four each found in a single strain. A similar grouping is also apparent from the results with the probes for ORF2 to ORF6, whereas the probe for ORF7 gives similar patterns in all strains. The nature of the sequences weakly homologous to the high-MW subunit probe remains to be established. In strain 630, the 4.3-kbp HincII/PvuII fragment and the 2.5-kbp HindIII/PvuII fragment may correspond to ORF2.

The probe for ORF2 does not detect any homologous sequence in strain 167. This may indicate a gene deletion, consistent with the absence of the 66-kDa polypeptide encoded by this ORF from SLP preparations of strain 167 (Fig. 1A).

The probe for the 5′ region of ORF3 (ORF3-5′) detects multiple fragments. A major band is detected in each digest of every strain, which in 630 matches the size predicted for ORF3. The additional bands are stronger in 630, 101, 291, and 959 than in the other strains, and strain Y gives an overall weaker signal. The basis for this pattern is unclear, since a BLAST search of the whole unfinished 630 genome does not reveal any significant homology outside ORF3. Although the probe spans a HincII site in 630, the portion upstream of the site is only 39 bp long and is unlikely to contribute appreciably to the signal. When the same blot was hybridized to a probe corresponding to the 3′ end of ORF3 (ORF3-3′), strong, unique signals were detected only in 630 and 17. These results are consistent with previously reported dot blot hybridization data showing the 3′ end of the Cwp66/ORF3 gene to be less conserved than the 5′ end in a panel of 36 strains (30).

Probes for each of the slpA-like ORFs 2 to 6 identified polymorphic variants. Allelic variability seems to be restricted to the 5′ end of the region under investigation: no polymorphism was mapped in or close to ORF7. In addition, probes for ORF5 and -6 give invariant PvuII/HindIII fragments in all strains. While size polymorphism is detected with these two probes on PvuII/HincII digests, both probes hybridize to the same PvuII/HincII fragment in each strain. The PvuII/HincII fragment polymorphism detected by the ORF5 and -6 probes is therefore probably due to sequence variation at the 5′ restriction site.

In all ORFs, sequences immediately 5′ of the AHD are conserved across strains. However, like the slpA low-MW subunit coding sequence, the ORF3-3′ sequence shows very limited DNA conservation. It remains to be seen whether this is also reflected in lower conservation at the amino acid sequence level.

DISCUSSION

Both the high- and low-MW C. difficile SLPs show considerable variation in molecular mass. An initial comparison of eight strains identified two patterns: group I, with high- and low-MW subunits of 45 to 47 kDa and 32 kDa, respectively, and group II, with subunits of 42 and 38 kDa, respectively. All of the slpA genes sequenced so far derive from group I strains (5, 13). To determine the structural constraints on the SLPs, we have extended the sequence analysis to two further strains: strain Y, a group II strain, and strain 167, which gives a variant pattern consisting of a single major SLP component and additional minor bands.

Our data show that the group II SLPs are less closely related to any of the previously sequenced group I SLPs than the group I SLPs are to one another, which supports the hypothesis that group II represents a distinct class. Divergence is particularly striking in the low-MW SLPs. While sequence conservation is readily apparent in the high-MW SLPs and, to a lesser extent, over the C-terminal ∼50 amino acids of the low-MW SLPs, the remaining portions of the low-MW SLPs show little conservation. In pairwise comparisons between Y and group I strains, identities over the low-MW SLPs range from 34 to 40%, whereas the high-MW SLPs are 70 to 71% identical. This pattern of variability in the low-MW SLPs closely parallels that of flagellins, in which a highly variable central segment corresponds to surface-exposed regions and gives rise to a well-known class of antigenic determinants, the H antigens (11, 22, 28).
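The identity figures quoted above (34 to 40% for the low-MW and 70 to 71% for the high-MW SLPs) depend on the alignment and on how gap columns are counted, which this section does not specify. A minimal sketch of one common convention follows, assuming the two sequences have already been aligned by an external tool; the function name `percent_identity` is hypothetical.

```python
def percent_identity(aln1: str, aln2: str, gap: str = "-") -> float:
    """Percent identity between the two rows of a pairwise alignment.

    Assumed convention: identical residue pairs divided by the number of
    columns in which at least one row has a residue (gap-gap columns ignored).
    """
    if len(aln1) != len(aln2):
        raise ValueError("aligned rows must have equal length")
    columns = [(a, b) for a, b in zip(aln1.upper(), aln2.upper())
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment contains no residues")
    identical = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * identical / len(columns)
```

Other tools divide by the shorter sequence length or by gap-free columns only, so identity values are comparable only when computed under the same convention.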

The sequence variability of the low-MW SLPs may simply reflect a lack of functional constraints. Alternatively, it may confer an evolutionary advantage by mediating escape from immune recognition and thus allowing the bacterium to reinfect hosts. Consistent with the latter model, evidence has been presented that the variability of flagellins results from positive selection for amino acid replacements, i.e., diversifying selection, rather than from the absence of negative selection (19, 22). It is tempting to speculate that similar mechanisms underlie the high variability of the C. difficile low-MW SLPs. This subunit carries the dominant antigenic epitopes, as shown by Western blotting analysis of S-layer preparations with human sera (21). The divergence between the low-MW subunit coding regions of the group II strain and those of any group I strain is too great to allow meaningful rates of nucleotide change to be calculated. Within group I, however, rates of nonsynonymous change are much higher in the low-MW than in the high-MW subunit sequences and, in some strain pairs, exceed the rates of synonymous change. By analogy with flagellins, this may reflect diversifying selection. As for the mechanisms generating this diversity, work with gram-negative bacteria (Campylobacter jejuni, Salmonella enterica, and Escherichia coli) has provided evidence for horizontal gene transfer and multiple recombination events (11, 19, 22). We have searched the genome of C. difficile strain 630 for evidence of intragenomic recombination events that might give rise to one of the known SLP variants. None was found, but the possibility remains that sequences from other strains, or even other species, may act as donors.
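The comparison of nonsynonymous and synonymous rates above is the usual test for diversifying selection, but the estimation method is not described in this section. As an illustration only, the sketch below implements one standard approach, the Nei-Gojobori (1986) counting method with a Jukes-Cantor correction; it is not necessarily what the authors used. Assumptions: gap-free, codon-aligned sequences; the standard genetic code (bacterial translation table 11 assigns the same amino acids); codons that are stops in either sequence are skipped; mutations to stop codons count as nonsynonymous when counting sites; and differences are averaged over mutational pathways that avoid intermediate stop codons. All function names are hypothetical.

```python
import itertools
import math

BASES = "TCAG"
# Standard genetic code in TCAG order (NCBI table 1); table 11 has the same amino acids.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AMINO[i]
        for i, c in enumerate(itertools.product(BASES, repeat=3))}


def synonymous_sites(codon: str) -> float:
    """Synonymous sites in one codon (0 to 3); changes to stops count as nonsynonymous."""
    aa, s = CODE[codon], 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos] and CODE[codon[:pos] + base + codon[pos + 1:]] == aa:
                s += 1 / 3
    return s


def codon_differences(c1: str, c2: str) -> tuple[float, float]:
    """(synonymous, nonsynonymous) differences averaged over stop-free pathways."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    if not diff:
        return 0.0, 0.0
    syn = non = 0.0
    n_paths = 0
    for order in itertools.permutations(diff):
        cur, s, n = c1, 0, 0
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if nxt != c2 and CODE[nxt] == "*":
                break  # pathway passes through a stop codon: discard it
            if CODE[nxt] == CODE[cur]:
                s += 1
            else:
                n += 1
            cur = nxt
        else:  # runs only if the pathway was not discarded
            syn, non, n_paths = syn + s, non + n, n_paths + 1
    if n_paths == 0:
        return 0.0, float(len(diff))  # simplifying fallback: all nonsynonymous
    return syn / n_paths, non / n_paths


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance; infinite (saturated) when p >= 0.75."""
    return math.inf if p >= 0.75 else -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (dN, dS) for two gap-free coding sequences aligned codon by codon."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be equal-length and codon-aligned")
    sites_s = sites_n = diff_s = diff_n = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if CODE[c1] == "*" or CODE[c2] == "*":
            continue  # skip stop codons
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        sites_s += s
        sites_n += 3 - s
        ds, dn = codon_differences(c1, c2)
        diff_s += ds
        diff_n += dn
    if sites_s == 0 or sites_n == 0:
        raise ValueError("no informative codons")
    return jukes_cantor(diff_n / sites_n), jukes_cantor(diff_s / sites_s)
```

A dN/dS ratio above 1 is the classical signature of diversifying selection. When the uncorrected proportion of differences reaches 0.75, the correction is undefined and the sketch returns infinity; this saturation is one reason meaningful rates cannot be estimated between very divergent sequences such as the group I and group II low-MW subunit genes.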

The hypervariable sequence within the low-MW subunit spans a stretch of ∼80 amino acids at the N terminus that in group I strains shows a distant relationship to the SLH domain (13). This domain has been implicated in anchoring S-layers to the cell wall in several species by binding either to the peptidoglycan or to secondary cell wall polymers (8, 12, 20). Since this sequence is not conserved in other C. difficile strains, it is unlikely to mediate binding of the low-MW SLPs to the cell surface.

The C-terminal region of the low-MW subunit does show a recognizable pattern of conserved residues, at least part of which may reflect a requirement for proteolytic processing. Although the mechanism by which the high- and low-MW SLPs are released from the SlpA precursor has not been elucidated, cleavage is unlikely to be an artifact of the isolation procedure, since identical subunit sizes, yields, and stoichiometries are obtained with different extraction methods. Furthermore, cleavage does not occur when the SlpA precursor is expressed in a heterologous host such as Lactococcus lactis or E. coli (E. Calabi and N. Fairweather, unpublished data). An amino acid motif N-terminal to the cleavage site is conserved in all strains. Interestingly, the position of cleavage relative to this motif differs slightly between the group I strains and strains Y and 167. The motif may therefore reflect the binding constraints of the catalytic site of a specific peptidase, which may have evolved in different strains in parallel with the SLP sequences.

In strain 167, a different pattern of SlpA precursor cleavage is evident. The predominant protein, of 39 kDa, represents the high-MW SLP. Very little, if any, of the predicted 20-kDa low-MW SLP is found at the cell surface. However, an additional 43-kDa component is present that extends from the N terminus of the mature SlpA precursor to a putative secondary cleavage site within the precursor. The low levels of the 20- and 43-kDa proteins may be due either to instability and consequent rapid degradation of these polypeptides or to their inability to be anchored to the cell wall. In either case, their presence at the cell surface in the same molar ratio as the 39-kDa protein is clearly not required for either cellular viability or pathogenicity. It remains to be established whether one or both of the S-layer lattices identified in C. difficile are missing in this strain.

One of the most remarkable outcomes of the identification of the slpA gene has been the finding in strain 630 of a large family of related genes, many of which cluster around the slpA locus. Before analyzing this family further, we considered it important to establish whether its members are conserved across strains. To this end, we have analyzed the slpA-like gene cluster downstream of slpA by Southern blotting. While polymorphic variants have been identified for ORFs 2 to 6, allelic variability seems to be restricted to the 5′ end of the region. Since most of the polymorphisms that we have examined lie within coding sequences in strain 630, these variants are expected to be expressed at the protein level. The patterns of polymorphism seem to be linked across loci. Two further significant conclusions emerge from our data. First, all seven slpA-like ORFs that we tested were represented in every strain, with one exception: no signal was obtained with ORF2 in strain 167, consistent with the absence of a 66-kDa polypeptide from a low-pH extract of this strain, which suggests that the corresponding region is deleted. In all other cases, the signals obtained in different strains show minimal variation in strength, suggesting comparable degrees of homology with 630. Second, a probe corresponding to the high-MW subunit coding sequence cross hybridizes to a limited number of sequences outside the slpA locus. This result was unexpected: although slpA-like family members are related at the protein level, their nucleotide identities fall below the threshold (∼65%) required for significant hybridization under our experimental conditions. It remains to be established whether the sequences cross hybridizing to the slpA high-MW probe correspond to any of the known slpA-like family members or to as yet undiscovered chromosomal or extrachromosomal loci. Interestingly, however, their number appears to vary between strains.
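The argument above relies on a nucleotide-identity threshold (about 65% under the stated conditions) below which no significant hybridization is expected. The sketch below is a deliberately crude proxy for that reasoning, not a model of hybridization: it takes the best ungapped identity of a probe over any window of a target, on either strand, and compares it with a threshold. Real hybridization also depends on probe length, GC content, mismatch distribution, and wash stringency. The function names and the default threshold (taken from the ∼65% figure in the text) are assumptions for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase-convertible DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def best_window_identity(probe: str, target: str) -> float:
    """Highest fraction of identical bases over all ungapped placements of the
    probe on either strand of the target."""
    probe, target = probe.upper(), target.upper()
    n = len(probe)
    if n == 0 or n > len(target):
        raise ValueError("probe must be non-empty and no longer than the target")
    best = 0.0
    # Matching the probe's reverse complement to the top strand is equivalent
    # to matching the probe itself to the bottom strand.
    for strand in (probe, reverse_complement(probe)):
        for start in range(len(target) - n + 1):
            window = target[start:start + n]
            matches = sum(a == b for a, b in zip(strand, window))
            best = max(best, matches / n)
    return best


def may_cross_hybridize(probe: str, target: str, threshold: float = 0.65) -> bool:
    """Hypothetical screen: flag targets whose best window identity reaches the threshold."""
    return best_window_identity(probe, target) >= threshold
```

Because the scan is ungapped, it will underestimate identity for homologues that differ by insertions or deletions, so a negative result is weaker evidence than a positive one.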

The occurrence of up to nine SLP-encoding ORFs that share a short (∼180-nucleotide) homology domain and are arranged in a tight cluster has been reported in the genome of Campylobacter fetus (9). In that case, however, there is only a single slp promoter, and only one sequence is expressed at any given time, the others serving as a reservoir for recombination. It remains to be seen whether recombination can occur between different slpA-like loci in C. difficile. In C. difficile, by contrast, all six slpA homologues studied here are transcribed, and the product of at least one is also present at the cell surface. These genes must be transcribed from separate promoters, since reverse transcription-PCR with intergenic primers gives no product (data not shown).

Acknowledgments

We thank C. Kelly and L. Kyne for supplying some of the strains in this study, F. Calabi for helpful discussions, and the Sanger Centre for release of genome sequences prior to publication.

E.C. was supported by a fellowship from the Blanceflor Boncompagni Ludovisi Foundation.

REFERENCES

1. Bartlett, J. G. 1992. Antibiotic-associated diarrhea. Clin. Infect. Dis. 15:573-581.
2. Borriello, S. P., H. A. Davies, S. Kamiya, P. J. Reed, and S. Seddon. 1990. Virulence factors of Clostridium difficile. Rev. Infect. Dis. 12:S185-S191.
3. Bowditch, R. D., P. Baumann, and A. A. Yousten. 1989. Cloning and sequencing of the gene encoding a 125-kilodalton surface-layer protein from Bacillus sphaericus 2362 and of a related cryptic gene. J. Bacteriol. 171:4178-4188.
4. Brazier, J. S., and S. P. Borriello. 2000. Microbiology, epidemiology and diagnosis of Clostridium difficile infection. Curr. Top. Microbiol. Immunol. 250:1-33.
5. Calabi, E., S. Ward, B. Wren, T. Paxton, M. Panico, H. Morris, A. Dell, G. Dougan, and N. Fairweather. 2001. Molecular characterization of the surface layer proteins from Clostridium difficile. Mol. Microbiol. 40:1187-1199.
6. Cerquetti, M., A. Molinari, A. Sebastianelli, M. Diociaiuti, R. Petruzzelli, C. Capo, and P. Mastrantonio. 2000. Characterization of surface layer proteins from different Clostridium difficile clinical isolates. Microb. Pathog. 28:363-372.
7. Cerquetti, M., A. Pantosti, P. Stefanelli, and P. Mastrantonio. 1992. Purification and characterization of an immunodominant 36-kDa antigen present on the cell-surface of Clostridium difficile. Microb. Pathog. 13:271-279.
8. Chauvaux, S., M. Matuschek, and P. Beguin. 1999. Distinct affinity of binding sites for S-layer homologous domains in Clostridium thermocellum and Bacillus anthracis cell envelopes. J. Bacteriol. 181:2455-2458.
9. Dworkin, J., and M. J. Blaser. 1997. Molecular mechanisms of Campylobacter fetus surface layer protein expression. Mol. Microbiol. 26:433-440.
10. Eveillard, M., V. Fourel, M. C. Barc, S. Kerneis, M. H. Coconnier, T. Karjalainen, P. Bourlioux, and A. L. Servin. 1993. Identification and characterization of adhesive factors of Clostridium difficile involved in adhesion to human colonic enterocyte-like Caco-2 and mucus-secreting HT29 cells in culture. Mol. Microbiol. 7:371-381.
11. Harrington, C. S., F. M. Thomson-Carter, and P. E. Carter. 1997. Evidence for recombination in the flagellin locus of Campylobacter jejuni: implications for the flagellin gene typing scheme. J. Clin. Microbiol. 35:2386-2392.
12. Ilk, N., P. Kosma, M. Puchberger, E. M. Egelseer, H. F. Mayer, U. B. Sleytr, and M. Sara. 1999. Structural and functional analyses of the secondary cell wall polymer of Bacillus sphaericus CCM 2177 that serves as an S-layer-specific anchor. J. Bacteriol. 181:7643-7646.
13. Karjalainen, T., A. J. Waligora-Dupriet, M. Cerquetti, P. Spigaglia, A. Maggioni, P. Mauri, and P. Mastrantonio. 2001. Molecular and genomic analysis of genes encoding surface-anchored proteins from Clostridium difficile. Infect. Immun. 69:3442-3446.
14. Kawata, T., A. Takeoka, K. Takumi, and K. Masuda. 1984. Demonstration and preliminary characterization of a regular array in the cell-wall of Clostridium difficile. FEMS Microbiol. Lett. 24:323-328.
15. Kelly, C. P., C. Pothoulakis, and J. T. LaMont. 1994. Clostridium difficile colitis. N. Engl. J. Med. 330:257-262.
16. Kieny, M. P., R. Lathe, and J. P. Lecocq. 1983. New versatile cloning and sequencing vectors based on bacteriophage M13. Gene 26:91-99.
17. Kyne, L., M. Warny, A. Qamar, and C. P. Kelly. 2000. Asymptomatic carriage of Clostridium difficile and serum levels of IgG antibody against toxin A. N. Engl. J. Med. 342:390-397.
18. Lazarevic, V., P. Margot, B. Soldo, and D. Karamata. 1992. Sequencing and analysis of the Bacillus subtilis LytRABC divergon—a regulatory unit encompassing the structural genes of the N-acetylmuramoyl-l-alanine amidase and its modifier. J. Gen. Microbiol. 138:1949-1961.
19. Li, J., K. Nelson, A. C. McWhorter, T. S. Whittam, and R. K. Selander. 1994. Recombinational basis of serovar diversity in Salmonella enterica. Proc. Natl. Acad. Sci. USA 91:2552-2556.
20. Mesnage, S., E. Tosi-Couture, and A. Fouet. 1999. Production and cell surface anchoring of functional fusions between the SLH motifs of the Bacillus anthracis S-layer proteins and the Bacillus subtilis levansucrase. Mol. Microbiol. 31:927-936.
21. Pantosti, A., M. Cerquetti, F. Viti, G. Ortisi, and P. Mastrantonio. 1989. Antibody response to Clostridium difficile determined by western blot analysis. Microecol. Ther. 18:303-309.
22. Reid, S. D., R. K. Selander, and T. S. Whittam. 1999. Sequence diversity of flagellin (fliC) alleles in pathogenic Escherichia coli. J. Bacteriol. 181:153-160.
23. Sambrook, J., E. F. Fritsch, and T. Maniatis. 1989. Molecular cloning: a laboratory manual, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
24. Sara, M., and U. B. Sleytr. 2000. S-layer proteins. J. Bacteriol. 182:859-868.
25. Seddon, S. V., and S. P. Borriello. 1992. Proteolytic activity of Clostridium difficile. J. Med. Microbiol. 36:307-311.
26. Seidel, R., B. Scharf, M. Gautel, K. Kleine, D. Oesterhelt, and M. Engelhard. 1995. The primary structure of sensory rhodopsin-II—a member of an additional retinal protein subgroup is coexpressed with its transducer, the halobacterial transducer of rhodopsin-II. Proc. Natl. Acad. Sci. USA 92:3036-3040.
27. Tasteyre, A., M. C. Barc, T. Karjalainen, P. Dodson, S. Hyde, P. Bourlioux, and P. Borriello. 2000. A Clostridium difficile gene encoding flagellin. Microbiology 146:957-966.
28. Tasteyre, A., T. Karjalainen, V. Avesani, M. Delmee, A. Collignon, P. Bourlioux, and M. C. Barc. 2000. Phenotypic and genotypic diversity of the flagellin gene (fliC) among Clostridium difficile isolates from different serogroups. J. Clin. Microbiol. 38:3179-3186.
29. von Eichel-Streiber, C., P. Boquet, M. Sauerborn, and M. Thelestam. 1996. Large clostridial cytotoxins—a family of glycosyltransferases modifying small GTP-binding proteins. Trends Microbiol. 4:375-382.
30. Waligora, A. J., C. Hennequin, P. Mullany, P. Bourlioux, A. Collignon, and T. Karjalainen. 2001. Characterization of a cell surface protein of Clostridium difficile with adhesive properties. Infect. Immun. 69:2144-2153.

Articles from Journal of Bacteriology are provided here courtesy of American Society for Microbiology (ASM)
