Journal of Bacteriology. 2000 Jul;182(14):3989–3997. doi: 10.1128/jb.182.14.3989-3997.2000

vrrB, a Hypervariable Open Reading Frame in Bacillus anthracis

James M Schupp 1, Alexandra M Klevytska 1, Guenevier Zinser 1, Lance B Price 1, Paul Keim 1,*
PMCID: PMC94584  PMID: 10869077

Abstract

Bacillus anthracis appears to be the most molecularly homogeneous bacterial species known. Extensive surveys of worldwide isolates have revealed vanishingly small amounts of genomic variation. The biological importance of the resting-stage spore may lead to very low evolutionary rates and, perhaps, to the lack of potentially adaptive genetic variation. In contrast to the overall homogeneity, some gene coding regions contain hypervariability that is translated into protein variation. During marker analysis of diverse strains, we have discovered a novel ca. 750-nucleotide open reading frame (ORF) that contains in-frame, variable-number tandem-repeat sequences. Four distinct variable regions exist within vrrB, giving rise to 11 distinct alleles in eight different length categories among B. anthracis strains. This ORF putatively codes for a 241- to 265-amino-acid protein, rich in glutamine (13.2%), glycine (23.4%), and histidine (23.0%). The variable-region amino acids of the vrrB ORF are strongly hydrophilic. Coupled with putative transmembrane domains flanking the variable regions, this suggests a membrane-anchored cytosolic or extracellular location for the putative protein. Sequence analysis of the complete ORFs from three Bacillus cereus strains shows maintenance of the ORF across species boundaries, including strong conservation of the amino acid sequence and the capacity to vary among strains. The presence of 11 different alleles of the vrrB locus is in stark contrast to the near homogeneity of B. anthracis. Evolution of hypervariable genes can negate the lack of genetic variability in species such as B. anthracis and provide select rapid evolution in other more variable species.


Bacillus anthracis is found throughout the world in a wide range of environments and in a wide variety of large mammalian hosts. The pathogen is thought to have evolved very recently, in the last 10,000 to 20,000 years, from a Bacillus cereus or Bacillus thuringiensis strain that fortuitously acquired the two anthrax virulence plasmids. There is limited evidence suggesting that B. anthracis can replicate as a free-living soil bacterium under optimal conditions of high soil moisture, alkaline pH, and sufficient nutrient availability, although this has yet to be demonstrated conclusively (25). In most environments, B. anthracis more likely lies dormant in the soil as a spore between deadly infections. However, upon infection, rapid vegetative clonal expansion occurs within the host until the host dies or overcomes the microbe. Once the host is dead, its body fluids, containing high numbers of the offending bacilli, leak into the surrounding soil, setting up the next infection cycle, whether it be within hours, days, or years.

The evolutionary change of organisms usually entails three distinct processes: mutation, recombination, and selection. Generation time is an important parameter for mutation and selection. Mutations, which provide raw genetic variation, generally occur during DNA replication and are frequently measured on a per-generation basis. Bacterial genetic recombination occurs via the exchange of genetic material through phages and other mobile genetic elements, creating novel gene combinations. Selection acts on mutational and recombinational changes to influence differential propagation of genetic types and is greatly influenced by the number of generations involved. In general, evolutionary change will increase with an increasing number of generations.

In B. anthracis, the period of vegetative expansion within the host represents the most likely time for evolutionary change, as mutation rates, recombination, and selective pressure will be greatly reduced during the resting spore stage. Because no propagation occurs in the spore stage, selection there can act only through differential loss or mortality. Environmental selection is certainly present at all B. anthracis growth stages, but it must have genetic variation upon which to act. Genetic recombination among B. anthracis strains, or even with other species, must be relatively rare given the explosive and short nature of the B. anthracis vegetative growth stage. Phylogenetic analysis of plasmid and chromosomal sequences found no evidence of horizontal transfer (an obvious form of recombination) of a virulence plasmid (pX01) among diverse strains (20). Genetic variation generated by mutation would appear to be a limiting factor for B. anthracis evolution and adaptation.

In many bacteria, it has been shown that variable-number tandem repeats (VNTRs) contained within genes and nongenic regions are extremely diverse (26). VNTRs have been found to affect regulation and product function in genes associated with pathogenesis in a variety of bacterial pathogens. Intragenic VNTRs have been found to affect lipopolysaccharide (LPS) phase variation in Haemophilus influenzae (28). LPS phase variation has been shown to function in immune evasion and translocation in H. influenzae (27). A pentameric VNTR causing independent translational frameshifts within the members of a family of outer membrane protein genes associated with epithelial invasion has been discovered and characterized in Neisseria gonorrhoeae. This VNTR provides a mechanism for antigenic variation or shifting (15, 19, 23). N. gonorrhoeae also exhibits LPS variation, and while a mechanism of variation has yet to be discovered, a VNTR may be involved. The M protein genes in group A streptococci have been shown to contain variable repetitive DNA elements, resulting in differential protection against phagocytosis (2). Variable repetitive DNA elements in the alpha C protein genes of group B streptococci have been shown to affect differential protection from antibody-mediated killing (14).

In B. anthracis, such VNTR gene variation has been documented previously at the vrrA locus (1, 6). Insertion and deletion events accounted for half of the 30 marker differences in an extensive survey of strains by using amplified fragment length polymorphism (AFLP) markers (10). In this report, we demonstrate that at least some of this rare AFLP variation is due to VNTRs, as the vrrB locus was first detected as a four-allele AFLP marker (10). Upon sequence characterization, a complex repetitive region was found within a large open reading frame (ORF). A total of 11 alleles were found within eight different size classes, resulting from combinations of 9-nucleotide insertion-deletion polymorphisms that maintain the translational reading frame. This VNTR variation is of great use in B. anthracis typing and may also provide a source of genetic differences for evolutionary change in this highly homogeneous species.

MATERIALS AND METHODS

B. anthracis isolates and DNA extraction and purification.

Isolates were obtained from different sources (Table 1). The isolates were cultured, and DNA was extracted and purified as previously described (7).

TABLE 1. B. anthracis isolates (d)

| vrrB allele | Public name | Phylogenetic distribution (c) | Geographic distribution |
|---|---|---|---|
| A | 2PT (a) | A1.a | Italy |
| A | 93/197 (b) | A3.a | Namibia |
| A | Pak-2 (a) | NA | Pakistan |
| B | MOZ-3 (b) | B2 | Mozambique |
| C | A46 (b) | B1 | Germany |
| D | BA0015 (b) | A2.a | Argentina |
| E | Vollum (a) | A4 | England |
| E | Ames (a) | A3.b | Iowa |
| E | #70 (b) | A2.b | Scotland |
| E | ASC-3 (a) | A4 | United Kingdom |
| F | A24 (a) | B1 | Slovakia |
| G | BA0052 (a) | A2.b | Jamaica |
| G | Zim69 (a) | A3.a | Zimbabwe |
| G | 93/33 (a) | A3.a | Namibia |
| G | #4/6 (a) | A3.a | Turkey |
| G | H-1 (a) | A3.a | South Korea |
| G | K8 (a) | A3.c | Kruger, South Africa |
| G | 93-212c-2 (b) | A1.a | Canada |
| G | F1 (a) | A3.a | South Korea |
| H | BA1015 (a) | A3.a | Maryland |
| H | #2/6 (a) | A1.a | Turkey |
| I | J611 (b) | A3.b | Indonesia |
| I | Dompu (b) | A3.b | Indonesia |
| J | #109 (b) | B2 | North Carolina |
| K | BA1035 (a) | B2 | South Africa |
| K | B286/76 (a) | B2 | Norway |
| K | BA1018 (a) | B2 | South Africa |
| K | #83 (b) | B2 | South Africa |

a. The hypervariable regions were sequenced from these isolates.

b. The entire vrrB ORF was sequenced for these isolates.

c. According to the system in reference 12.

d. All isolates were kindly provided by M. E. Hugh-Jones, Department of Epidemiology and Community Health, College of Veterinary Medicine, Louisiana State University, Baton Rouge.

AFLP fragment extraction and sequencing.

All reagents were obtained from Life Technologies, Inc., Gaithersburg, Md., unless otherwise noted. AFLP analysis was performed as previously described (10). Polymorphic AFLP EcoRI-MseI C/T +1/+1 fragments (10) corresponding to four different alleles, ca. 600 bp in size, were extracted from a dried 6% polyacrylamide gel, amplified, and sequenced as previously described (21).

Isolation of entire vrrB ORF and flanking regions.

Ligation-mediated suppression PCR was used to obtain the entire vrrB ORF and surrounding regions as previously described (21). The upstream-oriented locus-specific primer was CT600u1 (5′-CCCATTGATGTAGGCATTCCTG-3′), and the downstream-oriented primer was CT600d1 (5′-ATCAACAACAATCTTCACCTTGGG-3′).

PCR amplification.

The hypervariable regions from 24 isolates were amplified as follows. Five nanograms of genomic DNA, 40 pmol of primer vrrBHR1f (5′-ATAGGTGGTTTTCCGCAAGTTATTC-3′), 40 pmol of primer vrrBHR2r (5′-CCCAAGGTGAAGATTGTTGTTGA-3′), 100 μM (each) deoxynucleoside triphosphate (dNTP), 2 mM MgCl2, 10 μl of 10× PCR buffer, 5 U of Taq DNA polymerase, and double-distilled H2O (ddH2O) were added to a final volume of 100 μl. The reaction mixtures were incubated at 94°C for 3 min and then cycled at 94°C for 30 s, 65°C for 20 s, and 72°C for 20 s for 35 cycles, with a final 72°C incubation for 2 min. The amplification products were purified using a Qiaquick PCR purification kit (Qiagen Inc., Valencia, Calif.) and sequenced on an ABI 377 fluorescent sequencer using the PCR amplification primers.
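The reverse primer vrrBHR2r appears to anneal at the same site as the suppression-PCR primer CT600d1, but on the opposite strand. The following minimal Python sketch checks this using only the two primer sequences printed in the text; the helper name `reverse_complement` is illustrative and is not part of the original methods.

```python
# Primer sequences (5'->3') copied from the Materials and Methods text.
CT600D1 = "ATCAACAACAATCTTCACCTTGGG"
VRRBHR2R = "CCCAAGGTGAAGATTGTTGTTGA"

# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


rc = reverse_complement(CT600D1)
# True when vrrBHR2r matches the 5' end of the reverse complement of CT600d1,
# i.e. when the two primers cover the same site on opposite strands.
same_site = rc.startswith(VRRBHR2R)
print(rc, same_site)
```

If the check holds, the hypervariable-region amplicon ends at the same position from which the downstream-oriented suppression-PCR walk began.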

The entire ORF from each of 10 diverse isolates (Table 1) was amplified by PCR as follows. Twenty picomoles of primer VRRBCODF1 (5′-ACTTCCGAAAGAATATGTAGAAGGTT-3′), 20 pmol of primer VRRBCODR1 (5′-GAGTTTTATGCAAGAAGAGCTAGAAGA-3′), 5 ng of genomic DNA, 200 μM (each) dNTP, 2 mM MgCl2, 5 μl of 10× PCR buffer, 2 U of Taq DNA polymerase, and ddH2O were added to a final volume of 50 μl. The reaction mixtures were incubated at 94°C for 3 min and then cycled at 94°C for 30 s, 63°C for 30 s, and 72°C for 60 s for 45 cycles, with a final 72°C incubation for 5 min. The amplification products were purified and sequenced as described above with the use of an additional internal forward primer (VRRBF2; 5′-AGAAGCGGAATTCCAATACAGAC-3′).

A large portion of each of the vrrB ORFs from three American Type Culture Collection (ATCC) strains of B. cereus (B. cereus 11778, B. cereus 31293, and B. cereus 43881) was amplified as follows. Twenty picomoles of primer vrbF3 (5′-GGATGGACAATAATGCACCAC-3′), 20 pmol of primer vrbR3-1 (5′-AACGTCAACGCCAAGAAGC-3′), 5 ng of genomic DNA, 200 μM (each) dNTP, 2 mM MgCl2, 5 μl of 10× PCR buffer, 2 U of Taq DNA polymerase, and ddH2O were added to a final volume of 50 μl. The reaction mixtures were incubated at 94°C for 3 min and then cycled at 94°C for 30 s, 55°C for 30 s, and 72°C for 60 s for 45 cycles, with a final 72°C incubation for 5 min. The amplification products were purified and sequenced as described above.

DNA sequence analysis.

Multiple sequence alignment analysis (CLUSTAL) and dot plot similarity analysis were performed with the MEGALIGN subroutine in the LASERGENE software package (DNASTAR Inc., Madison, Wis.). The multiple sequence alignment analysis was optimized to conserve nucleic acid residues. Dot plot similarity analysis was performed with a 9-nucleotide or 3-amino-acid window with an 80 or 100% similarity requirement, respectively.
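The windowed dot plot criterion can be illustrated with a short Python sketch. This is not the MEGALIGN implementation, whose internals are not described here; it is a plain sliding-window comparison under the stated window and similarity settings, and the function name `dot_plot` and the toy sequences are assumptions made for illustration.

```python
def dot_plot(seq_a, seq_b, window, min_percent):
    """Return (i, j) start positions whose windows reach the identity threshold.

    A dot is recorded when the windows seq_a[i:i+window] and
    seq_b[j:j+window] are identical at >= min_percent percent of positions.
    """
    hits = []
    for i in range(len(seq_a) - window + 1):
        win_a = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            win_b = seq_b[j:j + window]
            matches = sum(x == y for x, y in zip(win_a, win_b))
            # Integer arithmetic avoids floating-point edge cases at the threshold.
            if matches * 100 >= min_percent * window:
                hits.append((i, j))
    return hits


# Toy inputs (hypothetical, not the vrrB sequence): two copies of one repeat unit.
nt_hits = dot_plot("CACCATGGT" * 2, "CACCATGGT" * 2, window=9, min_percent=80)
aa_hits = dot_plot("HHGHHG", "HHGHHG", window=3, min_percent=100)
print(nt_hits)
print(aa_hits)
```

With a 9-nucleotide window, the 80% requirement means at least 8 of 9 positions must match, so repeats that differ only at a single third-codon position still produce off-diagonal dots.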

Predictive structure of the encoded protein.

The structure of the putative vrrB protein encoded by the ORF containing the VNTRs was predicted using the PROTEAN subroutine in the LASERGENE software package. This subroutine uses the Chou-Fasman (3) and the Garnier-Robson (5) algorithms for predicting alpha, beta, and turn regions, the Garnier-Robson algorithm for predicting coil regions, the Kyte-Doolittle (13) algorithm for predicting hydrophilicity, the Emini et al. (4) algorithm for predicting surface probability, the Karplus-Schultz (9) algorithm for predicting flexibility, and the Jameson-Wolf (8) algorithm for predicting antigenicity. The TMPRED algorithm (http://www.ch.embnet.org/software/TMPRED_form.html) was used to predict putative transmembrane regions.
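As an illustration of the Kyte-Doolittle approach, the sketch below computes a sliding-window mean of the published hydropathy values. The window length and the two toy peptides are assumptions; the PROTEAN settings actually used are not stated in the text, and the function name `hydropathy_profile` is hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=7):
    """Return the mean hydropathy of every full window along the protein."""
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the protein length")
    scores = [KYTE_DOOLITTLE[aa] for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Toy peptides: the HHG repeat motif named in the text, and a hypothetical
# hydrophobic stretch standing in for a membrane-spanning segment.
repeat_profile = hydropathy_profile("HHGHHGHHG", window=3)
hydrophobic_profile = hydropathy_profile("LIVLIVLIV", window=3)
print(repeat_profile[0], hydrophobic_profile[0])
```

On this scale the HHG motif scores negative in every window, which is consistent with the hydrophilic character reported for the variable regions.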

Nucleotide sequence accession numbers.

The GenBank accession number for the complete vrrB ORF and surrounding sequence from B. anthracis isolate 83 is AF238885. Accession numbers for the vrrB hypervariable-region sequences from select B. anthracis isolates shown in Table 1 are AF238889 (2PT), AF238890 (MOZ-3), AF238891 (A46), AF238892 (BA0015), AF238893 (Vollum), AF238894 (A24), AF238895 (Zim69), AF238896 (BA1015), AF238897 (J611), AF238898 (109), and AF238899 (BA1035). The accession numbers for the partial vrrB ORF sequences from B. cereus strains ATCC 11778, ATCC 31293, and ATCC 43881 are AF238886, AF238887, and AF238888, respectively.

RESULTS

Identification and characterization of vrrB.

A highly polymorphic and apparently allelic set of AFLP DNA fragments (fragment ECO-C/MSE-T) identified in a strain diversity study (10) was characterized by DNA sequencing. Figure 1 shows a 795-nucleotide ORF (vrrB) from isolate 83 (Table 1), which contains a hypervariable region of tandemly repeated sequences (see below), as well as the conserved surrounding sequences. Comparison of the complete ORFs from 10 diverse isolates revealed three single-nucleotide polymorphisms (SNPs), two of which resulted in amino acid changes: alanine-75 to threonine-75 and histidine-123 to glutamine-123 (Fig. 1). All three SNPs were found to be in linkage disequilibrium in the 10 strains examined and hence defined only two alleles. However, even three SNPs represent highly significant variation in B. anthracis, as the 5′ portion of vrrB and the 5′ intergenic regions did not vary among 25 diverse strains (L. Price, unpublished data).

FIG. 1.


Nucleotide sequence of the B. anthracis vrrB ORF. The entire nucleotide sequence for the largest vrrB allele (K) is presented along with flanking regions containing the leuS gene (5′) and an unidentified ORF (3′). The deduced amino acid sequence of the vrrB ORF is shown below the nucleotide sequence. Known nucleotide differences among B. anthracis strains are shown above the nucleotide sequence. Putative amino acid changes within the vrrB ORF are shown below the amino acid sequence. The opposed arrows beneath the nucleotide sequence indicate regions of possible stem-loop structures. The arrows above the nucleotide sequence indicate ORFs. Putative RBS as well as start and stop codons are indicated by boxes.

Putative ribosome binding sites (RBS) are found 8 and 9 bases upstream of the first and second methionine residues of the vrrB ORF, respectively. No other possible start codons were found to have putative RBS. None of a number of Bacillus subtilis promoter sequences associated with different sigma factors, including housekeeping genes and sporulation genes (16), were found upstream of the vrrB ORF. A strong stem-loop structure just downstream of the vrrB ORF stop codon may be a ρ-independent transcription termination signal. Another possible stem-loop structure 48 nucleotides upstream of the first possible vrrB start codon may act as an expression regulation signal. A BLAST search of GenBank with the deduced vrrB amino acid sequence found no similarity with any publicly available protein sequence, including the complete B. subtilis genome.

We discovered ORFs both upstream and downstream from the vrrB ORF (Fig. 1). The upstream flanking ORF is oriented in the same direction as the vrrB ORF. In this case, GenBank BLAST searches revealed very strong deduced amino acid sequence similarity with the 3′ end of the B. subtilis leucyl-tRNA synthetase gene (leuS; GenBank accession no. M88581). A strong stem-loop structure just downstream of the B. anthracis leuS gene is presumably the ρ-independent transcription termination signal. The downstream ORF was found in the orientation opposite that of the vrrB ORF (Fig. 1). A BLAST search revealed no sequence similarity with any known protein or ORF. Transcription of this ORF may be terminated by the same putative vrrB ORF ρ-independent transcription termination signal, but from the opposite direction.

Comparison of hypervariable regions among B. anthracis strains.

We have found that the hypervariable region is due to VNTRs. This region was amplified from 24 diverse B. anthracis strains and then sequenced. A CLUSTAL multiple-sequence alignment optimized for base conservation revealed eight different size classes and multiple alleles within three of the size classes (Fig. 2). Altogether, we have identified the 11 alleles shown in Fig. 2. The vrrB ORF sizes in the different alleles range from 723 to 795 nucleotides. All allele sizes are in increments of 9 nucleotides except for alleles I and J, which are separated in size by 18 nucleotides. The hypervariable region appears to consist of a series of 9-nucleotide degenerate repeat units with a consensus sequence of CA(C8/T11)CA(A8/C7/T4)GG(A3/C3/G1/T12) or CA(A3/C1/T5)CA(A1/C4/G2/T2)CA(A2/C6/T1), with some minor variation, leading to primarily HHG or QQY amino acid repeat units. Note that the variable positions within the consensus 9-nucleotide repeat represent the third codon position.
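The reading-frame arithmetic behind these size classes can be sketched in a few lines of Python. The ORF size range and 9-nucleotide step come from the text; the two example repeat units are hypothetical instances chosen to fit the first consensus, and the function name `translate` is illustrative.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame DNA string whose length is a multiple of 3."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


ORF_MIN, ORF_MAX, REPEAT = 723, 795, 9  # values reported in the text

# Every ORF size reachable in 9-nucleotide steps, and the protein length it encodes.
possible_sizes = list(range(ORF_MIN, ORF_MAX + 1, REPEAT))
protein_lengths = [size // 3 for size in possible_sizes]
print(possible_sizes)
print(protein_lengths)

# Two hypothetical repeat units that differ only at third codon positions.
print(translate("CACCATGGT"), translate("CATCACGGA"))
```

The range 723 to 795 nucleotides spans nine possible sizes in 9-nucleotide steps, so eight observed size classes with a single 18-nucleotide gap is what one unobserved intermediate size would produce. Because 9 is a multiple of 3, each step adds or removes exactly three codons, leaving the downstream reading frame intact.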

FIG. 2.


Multiple alignment of the hypervariable regions of the vrrB ORFs from all observed B. anthracis allele types. Nucleic acid sequences from the hypervariable regions of the vrrB ORFs from all 11 observed allele types were aligned, optimizing for nucleotide conservation. The shaded rectangles indicate every other degenerate 9-bp repeat. The arrows indicate the positions of primers used to amplify the vrrB1 and vrrB2 regions. The asterisks indicate the positions of point mutations. The consensus deduced amino acid sequence is shown below the nucleic acid alignment. The table at the bottom provides the amplicon size and observed frequency of each vrrB1 and vrrB2 allele. The DIs for individual and combined regions are shown at the bottom of the table. Note that the H and I combined alleles are indistinguishable by PCR analysis and that the frequency of 0.04 is for both alleles taken together.

Just upstream of the first insertion-deletion region in the hypervariable region are two of the three observed SNPs, an A-C and a C-T at positions 1017 and 1023, respectively (Fig. 1). Both of these mutations seem to be linked to particular insertion-deletion patterns. vrrB alleles A, D, E, G, H, and I have similar insertion-deletion patterns (Fig. 2) and share the A and C SNPs. vrrB alleles B, C, F, J, and K also have similar insertion-deletion patterns (Fig. 2) and share the C and T mutations. It would seem that the random generation of SNPs and insertions and deletions are correlated in the evolutionary history of these alleles.

Two related but distinct variable regions.

Repeated sequence variation occurs in two or more semiautonomous regions in the vrrB ORF. A short nonrepetitive sequence in the center of the hypervariable region was used as a PCR priming site to independently amplify each variable region (Fig. 2). PCR fragment size analysis revealed five size classes for hypervariable region one (vrrB1) and four size classes for hypervariable region two (vrrB2). The vrrB1 amplicon sizes were 183, 192, 219, 228, and 255 bp. The vrrB2 amplicon sizes were 142, 151, 160, and 169 bp. Fluorescently labeled PCR primers were used to examine 425 B. anthracis isolates for vrrB variation based upon amplicon size (12). Allele frequencies based upon this large survey for each sublocus and the entire vrrB locus are shown in Fig. 2. The 228-nucleotide vrrB1 allele and the 160-nucleotide vrrB2 allele are very common in B. anthracis strains, making up ∼83 and ∼77% of all alleles observed, respectively.

The sublocus vrrB1 contains two insertion-deletion regions, with the upstreammost 63 bp in size and the downstreammost 36 bp in size (Fig. 2). The primary variable amino acid repeat motif is HHG for both. The diversity index (DI), calculated by subtracting the sum of the squared allele frequencies from 1, is 0.30 for the vrrB1 sublocus.

The sublocus vrrB2 contains three insertion-deletion regions that are smaller than those found in vrrB1 and that range in size from 9 to 36 bp (Fig. 2). The first part of the vrrB2 region contains an HHG repeat, the second part contains a QQH, and the third part contains a QQY, indicating greater complexity than vrrB1. The vrrB2 sublocus diversity is 0.38, slightly greater than that observed for vrrB1. We have compared the repeated sequence array sizes of vrrB1 and vrrB2 and found no positive or negative correlation among insertion-deletion patterns. The diversity value for the combined regions is 0.51. Hence, the comparison of any two B. anthracis strains at the vrrB ORF has a 51% probability of revealing an allele difference.
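The diversity index described above, one minus the sum of squared allele frequencies, can be written as a small Python function. The example frequencies below are placeholders: only the value of about 0.83 for the most common vrrB1 allele comes from the text, and the remaining four are hypothetical numbers chosen so that the five size classes sum to 1. The per-allele frequencies are given in Fig. 2 and are not reproduced here.

```python
import math


def diversity_index(frequencies):
    """Return 1 minus the sum of squared allele frequencies."""
    if not math.isclose(sum(frequencies), 1.0, abs_tol=1e-9):
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in frequencies)


# Hypothetical five-allele sublocus: 0.83 is the reported frequency of the
# commonest vrrB1 allele; the other four values are illustrative placeholders.
example_frequencies = [0.83, 0.10, 0.04, 0.02, 0.01]
example_di = diversity_index(example_frequencies)
print(round(example_di, 3))
```

The index is the probability that two alleles drawn at random, with replacement, differ from each other, which is why a combined value of 0.51 corresponds to a 51% chance that two strains differ at vrrB.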

The repetitive nature of the vrrB hypervariable region is modular and can be further broken down into four distinct regions. The subloci defined above (vrrB1 and vrrB2) are based upon our ability to design unique PCR primers, but structure within these subloci is also apparent (Fig. 2). The vrrB2 substructure is most obvious, with low similarity (5 of 9 nucleotides matching; 55%) between the regions. vrrB1, in contrast, has two regions with high similarity that are separated by a low-similarity region that is only obvious in the 100% homology dot plot analysis (Fig. 3).

FIG. 3.


Dot plot homology analysis of the entire vrrB ORF from B. anthracis. Dot plot homology analysis was performed on both the nucleotide sequence (top right; 9-bp window; 80% similarity) and the deduced amino acid sequence (bottom left; 3-amino-acid window; 100% similarity). Sequence homology among repeat regions is indicated by corresponding diagonal lines. The boxes indicate regions where insertion-deletion events are found. The hypervariable regions vrrB1 and vrrB2 are indicated by brackets.

Predicted protein structure.

The dramatic size differences among vrrB alleles are translated into different sized proteins with potentially great differences in structure. The putative vrrB protein size ranges from 241 to 265 amino acids, with predicted molecular masses ranging from 24.9 to 27.8 kDa. The predicted charge, at pH 7.0, ranges from 4.6 to 7.0, with the isoelectric point varying from 7.4 to 7.5 for the different alleles. Common amino acids in the predicted vrrB protein include ∼23% glycine, 19.1 to 23% histidine, and 13% glutamine, all of which are found in the variable-repeat region. Significant codon usage bias was seen with six of seven of the most abundant amino acids. No codon bias was observed with respect to histidine. The protein secondary-structure predictions using PROTEAN (Fig. 4) suggest that the two vrrB subloci code for highly hydrophilic regions, separated by a small hydrophobic region. The 3′ subregion within vrrB2 has a slightly different amino acid composition that is evident in the hydrophilicity plot and especially in the surface probability prediction, suggesting it is the most likely portion of the protein to be exposed. The entire hypervariable region is flanked by hydrophobic regions that show strong similarity to known transmembrane regions of other proteins (by TMPRED analysis).

FIG. 4.


Protein structure predictions for a putative vrrB K allele protein. Structural characteristics for the largest vrrB ORF putative protein (K) were predicted using the PROTEAN subroutine of the Lasergene software package. Putative transmembrane regions were predicted by TMPRED (http://www.ch.embnet.org/software/TMPRED_form.html). The hypervariable regions are indicated above the scale.

vrrB in B. cereus.

We have examined the phylogenetic distribution of the vrrB gene and found it only in bacteria closely related to B. anthracis. No signal was observed from Southern hybridization of vrrB probes to genomic DNAs of B. subtilis and Bacillus megaterium (data not presented). Likewise, BLAST searches of the complete B. subtilis genome found no significant similarity to vrrB. vrrB probes did, however, hybridize to B. thuringiensis, Bacillus mycoides, and B. cereus DNAs during Southern analysis (data not presented). vrrB PCR challenges successfully amplified only three B. cereus samples under our PCR conditions. We determined these B. cereus vrrB ORF sequences, minus the 5′-most 45 to 60 bases, and found essentially the same ORF with no stop codons. Strong deduced amino acid sequence conservation was also observed, with most of the nucleotide differences occurring in the third codon position (Table 2). The three B. cereus strains were nearly as different from each other (29.5% difference) as they were from B. anthracis (30.4%). In all comparisons, synonymous differences were equal to or predominant over nonsynonymous ones (Table 2). In addition, amino acid differences were generally within chemical residue categories (e.g., a neutral amino acid for a neutral amino acid). The conservation of this ORF among strains and across species boundaries argues strongly that it is a gene with a functional protein product. In addition to the sequence conservation, the hypervariable-repeat feature was also conserved across species boundaries. Many in-frame insertion-deletion events were found between B. anthracis and B. cereus, and even among the B. cereus strains. All but two were in association with the hypervariable regions observed within B. anthracis (Fig. 5). Also note that highly conserved deduced amino acid sequences are found at the 5′ end of the putative protein, at the predicted transmembrane regions, and at the short hydrophobic region separating the two hypervariable regions.
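The distinction between synonymous and nonsynonymous differences can be sketched with a codon-by-codon comparison of two aligned coding sequences. This is a simplified illustration under stated assumptions, not the authors' tallying procedure, which is not described in the text: it counts differing codons rather than individual mutations, skips codons containing alignment gaps, and uses hypothetical toy sequences. The function name `classify_codon_differences` is illustrative.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_differences(seq_a, seq_b):
    """Count differing codons as (synonymous, nonsynonymous).

    Both inputs must be in-frame, aligned, and of equal length; codons that
    contain a gap character '-' in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b or "-" in codon_a or "-" in codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Toy alignment (hypothetical): CAC/CAT and GGT/GGA are silent, CAT/CAA is not.
counts = classify_codon_differences("CACCATGGT", "CATCAAGGA")
print(counts)
```

In this toy case two of the three changes fall at third codon positions without altering the amino acid, the same pattern the text reports for most of the differences between B. anthracis and B. cereus.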

TABLE 2. Summary of vrrB mutations in B. anthracis and three B. cereus strains

| Parameter | B. anthracis vs. B. cereus: conserved region (a) | B. anthracis vs. B. cereus: variable region | B. anthracis vs. B. cereus: total | B. cereus vs. B. cereus: conserved region | B. cereus vs. B. cereus: variable region | B. cereus vs. B. cereus: total |
|---|---|---|---|---|---|---|
| Synonymous mutations (b) | 52 (31/21) | 51 (45/6) | 103 (76/27) | 38 (20/18) | 33 (28/5) | 71 (48/23) |
| Nonsynonymous mutations (b) | 32 (19/13) | 43 (14/29) | 75 (31/44) | 36 (21/15) | 33 (12/21) | 69 (33/36) |
| No. (%) identical amino acids (c) | 113/139 (81.3) | 59/108 (54.6) | 172/247 (69.6) | 128/158 (81.0) | 80/137 (58.4) | 208/295 (70.5) |
| No. (%) similar amino acids (c, d) | 22/139 (15.8) | 25/108 (23.1) | 47/247 (19.0) | 22/158 (13.9) | 22/137 (16.1) | 44/295 (14.9) |
| No. (%) different amino acids (a) | 4/139 (2.9) | 7/108 (6.5) | 11/247 (4.4) | 5/158 (3.2) | 5/137 (3.6) | 11/295 (3.7) |

a. Hypervariable regions are those indicated by horizontal arrows in Fig. 5.

b. Total (transitions/transversions).

c. Denominator indicates number of amino acids compared.

d. Similar versus different amino acid determinations were based on physiochemical properties (24).

FIG. 5.


Comparison of vrrB amino acid sequences in B. anthracis and B. cereus. The nucleic acid sequence from the largest B. anthracis vrrB allele and from all but the 5′-most ends of the vrrB sequences from three B. cereus ATCC strains were translated into the predicted amino acid sequence and aligned using the CLUSTAL method. The shaded areas indicate similarity among the sequences. The number of asterisks above the B. anthracis sequence positions indicates the number of synonymous nucleic acid mutations found at that position. The horizontal arrows above the B. anthracis sequence indicate the hypervariable-repeat regions. The vertical arrows indicate the positions of dissimilar amino acid substitutions based upon physiochemical properties (24). The numbers at the 3′ ends of the sequences indicate the number of deduced amino acids in each sequence shown.

DISCUSSION

vrrB, the gene.

The evolutionary pattern of nucleotide differences among strains and across species is consistent with the vrrB ORF being an expressed gene. The ORF remains intact across 11 alleles within B. anthracis and three strains of B. cereus, which suggests that this gene encodes a beneficial, if not an essential, gene product. Conserved amino acid sequences, particularly at the predicted transmembrane regions, suggest important roles for these regions of the putative protein. Interestingly, we have evidence for this gene only in the type I bacilli and not in other related bacteria, such as B. subtilis. It would seem that this gene has a unique role in this group of bacteria. Likewise, it is not restricted to B. anthracis and is not likely to have a function unique to pathogens.

Assuming the vrrB ORF is an expressed gene, a number of the features further suggest a contingency role as opposed to a housekeeping role for the vrrB ORF. None of a number of B. subtilis promoter sequences associated with different sigma factors, including housekeeping genes and sporulation genes (16), were found upstream of the vrrB ORF, suggesting a nonconstitutive and nonsporulation role for this gene. A possible stem-loop structure upstream of a weak RBS could be a trans-acting regulatory target for altering the expression level of the vrrB ORF. The highly biased codon usage of the putative vrrB gene product is consistent with a highly expressed gene (22). While the previous statements are certainly very speculative, the discussed characteristics of the vrrB gene and surrounding sequence, taken together, suggest a gene that is turned on and off in response, or contingent, to an environmental cue that requires a rapid response. In addition, the presence of the vrrB gene in some Bacillus strains but not others argues against an essential or housekeeping role for vrrB. Rather, it seems possible that vrrB has a unique and adaptive role.

vrrB evolution.

The presence of insertion-deletion polymorphisms among B. anthracis isolates is indicative of the homogenization mechanisms that create and maintain directly repeated sequences. It has been suggested that the homogenization mechanism is slip strand repair (26), although unequal recombination could also be acting. All of the vrrB repeats are related to some extent, though adjacent repeats tend to have greater similarity. This is consistent with a cis-dependent homogenization mechanism, which could be either slip strand repair or recombination. The triplet sequence CAX is the dominant trinucleotide pattern, which suggests that the present complex repeat structure evolved from a relatively simple (CAX)n trinucleotide array. We believe that point mutations in the third codon positions may have occurred, because of their minor effect on protein structure, and that homogenization then expanded these to adjacent repeats. Eventually, distinct 9-nucleotide repeats were created by successive rounds of mutation and homogenization. The current strong 9-nucleotide repeat structure and 9-nucleotide insertion-deletion differences among strains indicate that homogenization is currently acting on this repeat.

The distinctive repeat sequences found between and within vrrB1 and vrrB2 illustrate the highly cis-dependent nature of the homogenization process. As the repeated region grows, different parts appear to become autonomous and diverge. The vrrB1 and vrrB2 subloci are the extreme examples where a nonrepeated sequence now separates these repeat regions. A number of observations suggest that the vrrB1 and vrrB2 subloci are now independently evolving. As stated previously, the sizes of vrrB1 alleles do not covary with vrrB2 allele sizes. The DI of vrrB2 is slightly greater than that of vrrB1. A recent phylogenetic analysis of over 400 B. anthracis isolates using a battery of VNTR markers shows the vrrB1 alleles clustering in two exclusive genetic groupings, whereas at least two vrrB2 alleles are found distributed across groups (12). The latter observation is suggestive of convergent evolution within the vrrB2 sublocus. Global geographical dispersal of identical vrrB alleles within a phylogenetic group could be due to human transport in the form of contaminated bone meal or animal hides (Table 1).

Even within each sublocus, two clusters of more similar repeats are observed (Fig. 3). It appears that there is an optimal size for homogenization and that array expansion beyond this limit leads to separate dynamic repeats. The relationship between the rates of homogenization and mutation is crucial to the resulting structure but may be difficult to determine, because little is currently known about the rates of the homogenization mechanisms.

Adaptive variation?

VNTRs have been shown to have a variety of functions in bacterial genomes, from gene expression regulation to antigenic shifting (for a review, see reference 26), and to result in variation among a number of virulence-associated genes (2, 14, 15, 19, 23, 27, 28). It has been proposed that variation within contingency genes enables genetic flexibility while maintaining overall genome integrity (17). Many bacterial contingency genes containing VNTRs have high mutation rates, which may allow for rapid adaptation to changing environmental conditions (18). Indeed, most of the variation seen to date in B. anthracis is due to VNTRs in ORFs (11, 12). The presence of VNTRs in an otherwise highly genetically monomorphic pathogen such as B. anthracis may be important for generating variation essential for adapting to various hosts and environments. Frequently, VNTRs have been discovered from the analysis of genes associated with particular phenotypic changes (e.g., antigenic shifts), but in this age of high-volume genomics, many more VNTRs are likely to be identified from genomic analysis without accompanying phenotypic phenomena. We are conducting an extensive survey of the newly available B. anthracis genome sequence for additional VNTRs.

While it is reasonable to assume that in some cases this variation will be effectively neutral, there will be many examples of biologically important changes associated with VNTRs. The future challenge will be to discover what effect these hypervariable regions have upon the biology of the organism.

ACKNOWLEDGMENTS

This work was supported by funds from NIH (RO1-GM60795), DOE (FG03-00NN20102), and The E. Raymond and Ruth Reed Cowden Endowment for Microbiology.

We thank M. E. Hugh-Jones (Department of Epidemiology and Community Health, College of Veterinary Medicine, Louisiana State University, Baton Rouge) for providing all the B. anthracis isolates used in this study.

REFERENCES

  • 1. Andersen G L, Simchock J M, Wilson K H. Identification of a region of genetic variability among Bacillus anthracis strains and related species. J Bacteriol. 1996;178:377–384. doi: 10.1128/jb.178.2.377-384.1996.
  • 2. Bessen D, Jones K F, Fischetti V A. Evidence for two distinct classes of streptococcal M proteins and their relationship to rheumatic fever. J Exp Med. 1989;169:269–283. doi: 10.1084/jem.169.1.269.
  • 3. Chou P Y, Fasman G D. Prediction of the secondary structure of proteins from their amino acid sequence. Adv Enzymol Relat Areas Mol Biol. 1978;47:45–148. doi: 10.1002/9780470122921.ch2.
  • 4. Emini E A, Hughes J V, Perlow D S, Boger J. Induction of hepatitis A virus-neutralizing antibody by a virus-specific synthetic peptide. J Virol. 1985;55:836–839. doi: 10.1128/jvi.55.3.836-839.1985.
  • 5. Garnier J, Osguthorpe D J, Robson B. Analysis of the accuracy and implications of a simple method for predicting the secondary structure of globular proteins. J Mol Biol. 1978;120:97–120. doi: 10.1016/0022-2836(78)90297-8.
  • 6. Harrell L J, Andersen G L, Wilson K H. Genetic variability of Bacillus anthracis and related species. J Clin Microbiol. 1995;33:1847–1850. doi: 10.1128/jcm.33.7.1847-1850.1995.
  • 7. Jackson P J, Walthers E A, Kalif A S, Richmond K L, Adair D M, Hill K K, Kuske C R, Andersen G L, Wilson K H, Hugh-Jones M E, Keim P. Characterization of the variable-number tandem repeats in vrrA from different Bacillus anthracis isolates. Appl Environ Microbiol. 1997;63:1400–1405. doi: 10.1128/aem.63.4.1400-1405.1997.
  • 8. Jameson B A, Wolf H. The antigenic index: a novel algorithm for predicting antigenic determinants. Comput Appl Biosci. 1988;4:181–186. doi: 10.1093/bioinformatics/4.1.181.
  • 9. Karplus P A, Schultz G E. Prediction of chain flexibility in proteins. Naturwissenschaften. 1985;72:212–213.
  • 10. Keim P, Kalif A, Schupp J, Hill K, Travis S E, Richmond K, Adair D M, Hugh-Jones M E, Kuske C R, Jackson P. Molecular evolution and diversity in Bacillus anthracis as detected by amplified fragment length polymorphism markers. J Bacteriol. 1997;179:818–824. doi: 10.1128/jb.179.3.818-824.1997.
  • 11. Keim P, Klevytska A M, Price L B, Schupp J M, Zinser G, Smith K L, Hugh-Jones M E, Okinaka R, Hill K K, Jackson P J. Molecular diversity in Bacillus anthracis. J Appl Microbiol. 1999;87:215–217. doi: 10.1046/j.1365-2672.1999.00873.x.
  • 12. Keim P, Price L B, Klevytska A M, Smith K L, Schupp J M, Okinaka R, Jackson P, Hugh-Jones M E. Multiple-locus variable-number tandem repeat analysis reveals genetic relationships within Bacillus anthracis. J Bacteriol. 2000;182:2928–2936. doi: 10.1128/jb.182.10.2928-2936.2000.
  • 13. Kyte J, Doolittle R F. A simple method for displaying the hydropathic character of a protein. J Mol Biol. 1982;157:105–132. doi: 10.1016/0022-2836(82)90515-0.
  • 14. Madoff L C, Michel J L, Gong E W, Kling D E, Kasper D L. Group B streptococci escape host immunity by deletion of tandem repeat elements of the alpha C protein. Proc Natl Acad Sci USA. 1996;93:4131–4136. doi: 10.1073/pnas.93.9.4131.
  • 15. Makino S, Van Putten J P M, Meyer T F. Phase variation of the opacity outer membrane protein controls invasion by N. gonorrhoeae into human epithelial cells. EMBO J. 1991;10:1307–1315. doi: 10.1002/j.1460-2075.1991.tb07649.x.
  • 16. Moran C P. RNA polymerase and transcription factors. In: Sonenshein A L, Hoch J A, Losick R, editors. Bacillus subtilis and other gram-positive bacteria: biochemistry, physiology, and molecular genetics—1993. Washington, D.C.: American Society for Microbiology; 1993. pp. 653–667.
  • 17. Moxon E R, Rainey P B, Nowak M A, Lenski R E. Adaptive evolution of highly mutable loci in pathogenic bacteria. Curr Biol. 1994;4:24–33. doi: 10.1016/s0960-9822(00)00005-1.
  • 18. Moxon E R, Rainey P B. Pathogenic bacteria: the wisdom of their genes. In: Van der Zeijst B A M, Van Alphen L, Hoekstra W P M, van Embden J D A, editors. Ecology of pathogenic bacteria. Royal Dutch Academy of Sciences, second series, no. 96. Amsterdam, The Netherlands: Royal Dutch Academy of Sciences; 1995. pp. 255–268.
  • 19. Murphy G L, Connell T D, Barritt D S, Koomeyh M, Cannon J G. Phase variation of gonococcal protein. II. Regulation of gene expression by slipped strand mispairing of a repetitive DNA sequence. Cell. 1989;56:539–547. doi: 10.1016/0092-8674(89)90577-1.
  • 20. Price L B, Hugh-Jones M, Jackson P J, Keim P. Genetic diversity in the protective antigen gene of Bacillus anthracis. J Bacteriol. 1999;181:2358–2362. doi: 10.1128/jb.181.8.2358-2362.1999.
  • 21. Schupp J M, Price L B, Klevytska A, Keim P. Internal and flanking sequence from AFLP fragments using ligation-mediated suppression PCR. BioTechniques. 1999;26:905–912. doi: 10.2144/99265st04.
  • 22. Shields D C, Sharp P M. Synonymous codon usage in Bacillus subtilis reflects both translational selection and mutational biases. Nucleic Acids Res. 1987;15:8023–8040. doi: 10.1093/nar/15.19.8023.
  • 23. Stern A, Meyer T F. Common mechanism controlling phase and antigenic variation in pathogenic neisseriae. Mol Microbiol. 1987;1:5–12. doi: 10.1111/j.1365-2958.1987.tb00520.x.
  • 24. Taylor W R. The classification of amino acid conservation. J Theor Biol. 1986;119:205–218. doi: 10.1016/s0022-5193(86)80075-3.
  • 25. Titball R W, Turnbull P C B, Hutson R A. The monitoring and detection of Bacillus anthracis in the environment. J Appl Bacteriol Symp Suppl. 1991;70:9S–18S.
  • 26. Van Belkum A, Scherer S, Van Alphen L, Verbrugh H. Short-sequence DNA repeats in prokaryotic genomes. Microbiol Mol Biol Rev. 1998;62:275–293. doi: 10.1128/mmbr.62.2.275-293.1998.
  • 27. Van Putten J P M. Phase variation of lipopolysaccharide directs interconversion of invasive and immuno-resistant phenotypes of N. gonorrhoeae. EMBO J. 1993;12:4043–4051. doi: 10.1002/j.1460-2075.1993.tb06088.x.
  • 28. Weiser J N, Maskell D J, Butler P D, Lindberg A A, Moxon E R. Characterization of repetitive sequences controlling phase variation of Haemophilus influenzae lipopolysaccharide. J Bacteriol. 1990;172:3304–3309. doi: 10.1128/jb.172.6.3304-3309.1990.

Articles from Journal of Bacteriology are provided here courtesy of American Society for Microbiology (ASM)
