Nucleic Acids Research. 2000 Jul 15;28(14):2831–2838. doi: 10.1093/nar/28.14.2831

Mutational analyses of dinucleotide and tetranucleotide microsatellites in Escherichia coli: influence of sequence on expansion mutagenesis

Kristin A Eckert 1,a, Guang Yan 1
PMCID: PMC102660  PMID: 10908342

Abstract

Mutagenesis at [GT/CA]10, [TC/AG]11 and [TTCC/AAGG]9 microsatellite sequences inserted in the herpes simplex virus thymidine kinase (HSV-tk) gene was analyzed in isogenic mutL+ and mutL Escherichia coli. In both strains, the proportion of expansion relative to deletion mutations was significantly greater at the [TTCC/AAGG]9 motif than at either dinucleotide motif. As the HSV-tk coding sequence contains an endogenous [G/C]7 mononucleotide repeat and ~1000 bp of unique sequence, we were able to compare mutagenesis among various sequence motifs. We observed that the relative risk of mutation in E.coli is: [TTCC/AAGG]9 > [GT/CA]10 ~ [TC/AG]11 > unique ~ [G/C]7. The mutation frequency varied 1400-fold in mutL+ cells between the tetranucleotide motif and the mononucleotide motif, but only 50-fold in mutL cells. The [G/C]7 sequence was destabilized the most and the tetranucleotide motif the least by loss of mismatch repair. These results demonstrate that the quantitative risk of mutation at various microsatellites greatly depends on the DNA sequence composition. We suggest alternative models for the production of expansion mutations during lagging strand replication of the [TTCC/AAGG]9 microsatellite.

INTRODUCTION

Microsatellite sequences of 1–4 or 5 nt per repeat unit are ubiquitous throughout the human genome (1–4). These repetitive sequences can be found flanking coding sequences and within introns, as transcribed but untranslated genomic regions (1,5,6). Unfortunately, our knowledge as to the exact number, sequence composition and genomic location of microsatellites is biased by the large proportion of cDNA sequences that constitute the current genomic sequence databases (2,5,6), and a full appreciation of this class of repetitive sequence must await completion of the various genome projects. Nevertheless, evidence exists supporting a role for [GT/CA]n and [TC/AG]n sequences in the regulation of gene expression (7–9) and in modulating chromatin structure (10). Moreover, within the past decade, a direct involvement of microsatellite sequences in human disease has been demonstrated (11).

Microsatellite sequences influence the local geometry of DNA due to the potential for adopting non-B DNA conformations (10). Repeats of alternating purine·pyrimidine bases (e.g., GT/CA) can form Z-DNA, and repeats of polypurine and polypyrimidine tracts (e.g., TC/AG) can form triplex DNA. In addition, particular trinucleotide sequences (e.g., CGG/GCC) have the potential to form stable hairpin structures (12). The effect of non-B DNA forms on DNA metabolism, including replicative, repair and recombination processes, has not been studied rigorously. The ability of long microsatellite sequences that are capable of forming both triplex and hairpin structures to arrest DNA synthesis in vitro (13,14) forms the basis of current models for preferential genetic expansion of these alleles in vivo (15,16).

A large base of knowledge exists in several model systems, including Escherichia coli, yeast and human cells, regarding the genetic stability of mono-, di- and trinucleotide microsatellite sequences (16–21). The favored mechanism to explain alterations in microsatellite allele size is slipped strand mispairing between repeat units during replicative or repair DNA synthesis (22,23). Consistent with this model, loss of DNA polymerase proofreading activity or post-replication mismatch repair (MMR) greatly enhances the rate of [A]n and [GT/CA]n microsatellite tract alterations (16–18,21,24). However, little is known about the genetic factors controlling tetranucleotide sequence stability. In yeast, the mutation rate for a [CAGT/GTCA]n allele was similar to that of a [GT/CA]n allele, and was increased in an MMR-deficient strain (25). In human cell lines, direct measurements of microsatellite alleles have yielded estimated mutation rates for [GATA/CTAT]n sequences that are significantly higher than rates for [GT/CA]n sequences (26), and the mutation rate for an [AAAG/TTTC]n microsatellite is one of the highest measured for microsatellites in human cells (27). Nevertheless, mathematical modeling of mutation rates at various microsatellites in the genome databases has failed to show a significant difference in mutability between di- and tetranucleotide sequences (28,29).

In this study, we compared the stability of the di- and tetranucleotide sequences [TC/AG]n and [TTCC/AAGG]n in MMR-proficient and deficient E.coli strains. Our strategy quantitated the mutability of the microsatellite sequences relative to coding sequences within the same genetic target, the herpes simplex virus type 1 thymidine kinase (HSV-tk) gene. We observed a significantly greater incidence of expansion mutations at [TC/AG]n and [TTCC/AAGG]n alleles, relative to a [GT/CA]n allele. The frequency of mutation at the tetranucleotide locus was up to 40-fold higher than the mutation frequencies at both dinucleotide loci, and MMR affected tetranucleotide stability to only a minor extent.

MATERIALS AND METHODS

Escherichia coli strains

Strain FT334 is a derivative of HB101 (30) with the following genotype: tdk, upp, thi1, hsd20, supE44, lacY1, proA2, ara14, galK2, xyl5, mtl1, leuB6, rpsL20, recA13. Strain PP102 is isogenic to strain FT334, except for the following alleles: recA306 srl::Tn10, mutL::Tn5 (P.Prince and R.Monnat, University of Washington, personal communication).

Reagents

Oligonucleotides used to construct the microsatellite sequences were synthesized by Biosynthesis, Inc (Lewisville, TX) or the Macromolecular Core Facility, Penn State College of Medicine (Hershey, PA). All restriction endonucleases were supplied by Gibco BRL Life Technologies (Gaithersburg, MD) and used according to manufacturer’s instructions. 5-Fluoro-2′-deoxyuridine (FUdR) and chloramphenicol were purchased from Sigma Chemical Co. (St Louis, MO).

Construction of artificial-microsatellite-containing vectors

All artificial microsatellite sequences were inserted in-frame between bases 111 and 112 of the HSV-tk gene, in the sequence context [GT (insert) TCTC]. In the unidirectional vectors described, the first sequence listed serves as the template for the leading strand of replication, and the second sequence serves as the template for the lagging strand. Construction of [GT/CA]10 and [TC/AG]11 microsatellite-containing plasmids has been described (20), and the same method was used to construct the [TTCC/AAGG]3 vector. The [TTCC/AAGG]9 and [TC/AG]18 microsatellite inserts were synthesized by an in vitro DNA polymerase reaction. A 111 base oligonucleotide, corresponding to the HSV-tk sense strand (nucleotides 73–147) and containing the microsatellite sequence to be inserted, was primed by hybridization of a 15mer oligonucleotide at a 1:1 molar ratio. This substrate was used as a DNA template for native T7 DNA polymerase in a reaction containing 20 mM Tris–HCl (pH 7.5), 10 mM MgCl2, 2 mM DTT, 500 µM dNTPs, 40 nM template DNA and 4 U of T7 DNA polymerase. After second strand synthesis, the polymerase was heat-inactivated, and the double-stranded DNA was digested with MluI and BsiWI restriction enzymes to generate cohesive termini. After removal of the small fragments using a Microcon-30 ultrafiltration device, the recovered large fragment (86 bp) was inserted into plasmid pGTK4 and subcloned into plasmid pND123, as described (20). The integrity of the DNA was confirmed by DNA sequence analyses of independent clones.

Mutation frequency analyses

Escherichia coli strains were transformed with each plasmid by electroporation. After the expression period, an aliquot of the transformation was used to inoculate an LB culture containing 50 µg/ml chloramphenicol. A population of plasmid-bearing bacteria was selected by overnight growth of the culture. The frequency of chromosomal mutations was estimated by selective plating of the bacteria on LB agar plates in the absence and presence of 150 µg/ml rifampicin. The frequency of plasmid mutations was estimated by selective plating of the bacteria on VBA plates + 50 µg/ml chloramphenicol in the absence and presence of 40 µM FUdR (31). The presence of FUdR selects for bacteria with a plasmid-derived HSV-tk-deficient phenotype. These mutations include deletions or expansions of any number of repeat units within the microsatellite motifs that are not a multiple of three (20), as well as base substitutions, frameshifts, deletions and rearrangements within the 1350 bp HSV-tk gene and promoter sequence (31,32).
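The selection logic for repeat-unit changes can be illustrated with a short sketch. It assumes, as a simplification that the text does not state in this form, that a length change which is a multiple of 3 bp keeps the HSV-tk reading frame intact and therefore escapes FUdR selection. The function name is hypothetical.

```python
def is_detected_by_fudr(unit_length_bp: int, units_changed: int) -> bool:
    """Return True if gaining or losing `units_changed` repeat units
    shifts the reading frame (length change not a multiple of 3 bp).

    Assumption for illustration only: in-frame changes leave HSV-tk
    functional, so only frameshifting changes give FUdR resistance.
    """
    if units_changed == 0:
        return False  # no change in allele length
    return (unit_length_bp * abs(units_changed)) % 3 != 0


# List the allele length changes (in units) that would be recovered
# for a dinucleotide and a tetranucleotide repeat.
for unit_length, label in [(2, "dinucleotide"), (4, "tetranucleotide")]:
    detected = [n for n in range(-5, 6) if is_detected_by_fudr(unit_length, n)]
    print(label, detected)
```

Under this assumption, changes of three units are silent for both motif sizes, which is consistent with the allele length changes reported in Table 2 (1, 2, 4 and 5 units).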

HSV-tk mutational specificity analyses

Each vector preparation was introduced into E.coli by electroporation, followed by plating on VBA selective media to generate independent mutants, as described previously (31). The DNA sequence of the HSV-tk gene in the 5′ microsatellite-containing region of each mutant was determined by manual dideoxy DNA sequence analysis of plasmid DNA using Sequenase 2.0, according to manufacturer’s instructions (Amersham Life Sciences, Inc., Arlington Heights, IL). The DNA sequence changes of mutant plasmids containing no changes at the artificial microsatellite loci were determined using an automated ABI Prism 377XL DNA Sequencer (PE Biosystems, Foster City, CA). DNA sequence reactions were carried out using BigDye terminator cycle sequencing and AmpliTaq DNA polymerase FS, according to the manufacturer’s directions. Differences in proportions of specific types of mutations between different vectors or between different strains were analyzed statistically using Fisher’s exact test (two-tailed).

RESULTS

Effect of MMR on mutation frequency

Plasmid vectors containing artificial microsatellite loci were introduced into isogenic mutL+ and mutL E.coli strains. The cultures were plated simultaneously on two types of selective media: rifampicin, to determine the mutation frequency at the chromosomal rif locus, and FUdR to determine mutation frequency at the HSV-tk plasmid locus. As shown in Table 1, the loss of MMR resulted in a 500–1500-fold increased Rifr frequency, independent of the plasmid vector, in agreement with earlier mutagenesis studies of mutL-deficient E.coli strains (33). In contrast, the HSV-tk mutation frequency of the control vector, pJY1, was elevated ~10-fold in the mutL-deficient strain, relative to the mutL+ strain (Table 1). This differential susceptibility of the two loci to a mutator phenotype most likely reflects differences in the mutagenic target sizes and copy number and/or the types of mutational events detected in each selection scheme.

Table 1. HSV-tk mutation frequencies of vectors containing artificial microsatellite sequences in mutL+ and mutL E.coli.

| Plasmid [microsatellite] | Chromosomal locus: Rifr frequency × 10–9, mutL+ | Chromosomal locus: Rifr frequency × 10–9, mutL | Plasmid locus: FUdRr frequency^a × 10–5, mutL+ | Plasmid locus: FUdRr frequency^a × 10–5, mutL |
|---|---|---|---|---|
| pJY1 [none] | 9.3 | 7100 | 0.99 ± 0.56 | 12 ± 1.7 |
| pJY2 [GT/CA]10 | 6.5 | 3100 | 4.0 ± 1.1 (4.0)^b | 85 ± 37 (7.1) |
| pJY4 [TC/AG]11 | 10 | 15 000 | 8.2 ± 1.4 (8.3) | 180 ± 95 (15) |
| pJY5 [TTCC/AAGG]3 | n.d. | n.d. | 1.6 ± 0.60 (1.6) | 17 ± 6.4 (1.4) |
| pJY5.1 [TTCC/AAGG]9 | n.d. | n.d. | 460 ± 86 (460) | 320 ± 57 (27) |

n.d., not determined.

aData are means of three to five (± SD) independent, selective platings of overnight cultures.

bIncreased mutation frequency, relative to pJY1 control.

Introduction of the [GT/CA]10 and [TC/AG]11 microsatellites resulted in a 4 and 8-fold elevated mutation frequency, respectively, relative to the control vector in the mutL+ strain, and a 7- and 15-fold increased frequency, respectively, in the mutL strain (Table 1). Introduction of the tetranucleotide sequence [TTCC/AAGG]3 had no significant effect on the HSV-tk mutation frequency, whereas increasing the allele length to [TTCC/AAGG]9 resulted in >400-fold increased mutation frequency in the repair-proficient strain (Table 1). In the mutL-deficient strain, however, the mutation frequency for the [TTCC/AAGG]9 vector was increased only ~30-fold over that of the parental vector.
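The fold increases quoted above follow directly from the mean frequencies in Table 1. As a minimal sketch, the calculation can be reproduced as follows; the dictionary layout and function name are illustrative choices, and only the mean values (without their standard deviations) are used.

```python
# Mean FUdR-resistant frequencies (x 10^-5), copied from Table 1.
TABLE1_FUDR = {
    "pJY1":   {"mutL+": 0.99, "mutL": 12},   # control, no microsatellite
    "pJY2":   {"mutL+": 4.0,  "mutL": 85},   # [GT/CA]10
    "pJY4":   {"mutL+": 8.2,  "mutL": 180},  # [TC/AG]11
    "pJY5":   {"mutL+": 1.6,  "mutL": 17},   # [TTCC/AAGG]3
    "pJY5.1": {"mutL+": 460,  "mutL": 320},  # [TTCC/AAGG]9
}


def fold_increase(plasmid: str, strain: str, control: str = "pJY1") -> float:
    """Mutation frequency of `plasmid` relative to the control vector
    in the same strain (hypothetical helper for illustration)."""
    return TABLE1_FUDR[plasmid][strain] / TABLE1_FUDR[control][strain]


for plasmid in ("pJY2", "pJY4", "pJY5", "pJY5.1"):
    for strain in ("mutL+", "mutL"):
        print(f"{plasmid} {strain}: {fold_increase(plasmid, strain):.2g}-fold")
```

Because the control frequency is itself about 12-fold higher in the mutL strain, a vector can show a higher absolute frequency in mutL cells yet a smaller fold increase over pJY1.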

Mutational specificity of microsatellite-containing vectors in the presence and absence of MMR

In order to generate a mutational spectrum that accurately reflects events occurring at the HSV-tk locus, we isolated independent FUdRr mutants by selective plating 2 h after plasmid transformation. The HSV-tk mutation frequencies determined under these conditions for the various microsatellite-containing vectors, relative to the parental vector, were elevated 10–360-fold in the mutL+ strain and 5–80-fold in the mutL strain (Table 2). In the mutL+ strain, the mutation frequencies observed after the limited number of cell generations in this approach (Table 2) were similar to those observed after the more extensive number of cell generations in the overnight culture (Table 1). Interestingly, in the mutL strain, the mutation frequencies for the dinucleotide vectors were 2–4-fold higher after the overnight culture, as compared to the mutation frequencies determined by direct plating whereas the mutation frequency for the tetranucleotide vector was not affected by the number of cell generations (Tables 1 and 2). As the mutation frequency for the [TTCC/AAGG]3 vector was increased <2-fold over the pJY1 control in either strain, the mutational specificity of this construct was not examined.

Table 2. Mutational specificity of vectors containing artificial microsatellites in mutL+ and mutL E.coli.

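Values are the number of mutational events observed (proportion of total).

| Mutation class | Allele length change (units) | pJY2 [GT/CA]10, mutL+ | pJY2 [GT/CA]10, mutL | pJY4 [TC/AG]11, mutL+ | pJY4 [TC/AG]11, mutL | pJY5.1 [TTCC/AAGG]9, mutL+ | pJY5.1 [TTCC/AAGG]9, mutL |
|---|---|---|---|---|---|---|---|
| Artificial microsatellite | | | | | | | |
| Expansion | +5 | 0 | 0 | 1 | 0 | 0 | 0 |
| | +4 | 4 | 0 | 0 | 2 | 0 | 0 |
| | +2 | 3 | 0 | 6 | 2 | 2 | 4 |
| | +1 | 2 | 10 | 15 | 8 | 32 | 16 |
| Deletion | –1 | 25 | 19 | 22 | 26 | 14 | 8 |
| | –2 | 12 | 6 | 12 | 2 | 3 | 0 |
| | –4 | 4 | 0 | 0 | 0 | 1 | 0 |
| | –5 | 0 | 0 | 1 | 0 | 0 | 0 |
| Subtotal | | 50 (0.91) | 35 (0.59) | 57 (0.93) | 40 (0.67) | 52 (1.0) | 28 (0.76) |
| HSV-tk coding region | | 5 (0.09) | 24 (0.41) | 4 (0.07) | 20 (0.33) | 0 | 9 (0.24) |
| Total mutants sequenced | | 55 | 59 | 61 | 60 | 52 | 37 |
| Overall HSV-tk mutation frequency ± SD^a | | 12 ± 3.3 × 10–5 | 45 ± 5.0 × 10–5 | 8.2 ± 0.4 × 10–5 | 41 ± 5.9 × 10–5 | 310 ± 68 × 10–5 | 620 ± 300 × 10–5 |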

aData are means of three to seven (± SD) independent experiments, in which the frequency was determined by selective plating 2h after transformation. Corresponding control values for pJY1 parent vector are 0.85 ± 0.26 × 10–5 for the mutL+ strain and 8.2 ± 0.31 × 10–5 for the mutL strain.

Mutational specificities were generated by combined manual and automated DNA sequence analyses of the HSV-tk sequences. This approach allowed us to categorize mutants into either those with alterations in the artificial microsatellite or those with a normal microsatellite locus and alterations in the coding region of the HSV-tk target gene. The HSV-tk coding region (34) contains several endogenous microsatellite sequences: 20 G/C mononucleotide motifs of four to seven units, two A/T mononucleotide motifs of four units, and five dinucleotide motifs of three units. As all mutational events were detected using the same target sequence and selection protocol, we can quantitate directly the frequency of mutational events at microsatellite sequences of varying base composition, size and length. In the mutL+ strain, 91–100% of mutational events occurred at the artificial microsatellite loci in the three vectors analyzed (Table 2). In contrast, only 59–76% of mutational events occurred at the artificial microsatellite loci in the mutL-deficient strain. The majority of microsatellite mutational events in all cases were changes in allele length of one unit; i.e., 2 bp for the dinucleotide motifs and 4 bp for the tetranucleotide motif. However, the gain or loss of up to five units was observed for the dinucleotide motifs (Table 2).

Effect of microsatellite composition and MMR on expansion mutations

In mutL+ E.coli, a strong bias in favor of deletions (82%) over expansions (18%) was observed at the [GT/CA]10 locus (Table 3), with the overall frequency of deletion events (9.0 × 10–5) occurring approximately four times as frequently as expansions (2.0 × 10–5). At the [TC/AG]11 locus, the proportion of expansions (39%) was somewhat greater than that observed at the [GT/CA]10 locus (Table 3), such that the overall frequencies of deletions (4.7 × 10–5) and expansions (3.0 × 10–5) are similar for the [TC/AG]11 microsatellite sequence. A strong bias in favor of expansion mutations (65%) was observed at the tetranucleotide microsatellite sequence, a proportion that differs significantly from that at either the [TC/AG]11 locus or the [GT/CA]10 locus (Table 3). For the [TTCC/AAGG]9 allele, the overall frequency of expansion mutations (200 × 10–5) was greater than that of deletion mutations (110 × 10–5). Because of the unidirectional mode of DNA replication for these pBR322-derived vectors, we were able to test for the effects of DNA replication asymmetry on expansion mutagenesis by cloning the HSV-tk gene into the vector in the opposite orientation. In a reverse-orientation [AAGG/TTCC]9 vector, our preliminary results demonstrate that the frequency of expansion mutations is the same for this replication mode. However, both the overall mutation and the microsatellite deletion frequencies of this vector were increased, relative to the [TTCC/AAGG]9 vector (data not shown).

Table 3. Expansions of various microsatellite sequences in mutL+ and mutL E.coli.

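Values are the proportion of microsatellite mutations (no. observed).

| Mutational event | mutL+ [GT/CA]10 | mutL+ [TTCC/AAGG]9 | mutL+ [TC/AG]11 | mutL [GT/CA]10 | mutL [TTCC/AAGG]9 | mutL [TC/AG]11 |
|---|---|---|---|---|---|---|
| Expansion | 0.18 (9) | 0.65 (34) | 0.39 (22) | 0.29 (10) | 0.71 (20) | 0.30 (12) |
| Deletion | 0.82 (41) | 0.35 (18) | 0.61 (35) | 0.71 (25) | 0.29 (8) | 0.70 (28) |
| P-value^a ([GT/CA]10 versus [TTCC/AAGG]9) | <0.0001 | | | 0.001 | | |
| P-value^a ([TTCC/AAGG]9 versus [TC/AG]11) | | 0.007 | | | 0.001 | |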

aFisher’s exact test, two-sided.
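For readers who wish to check the comparisons in Table 3, the following is a minimal sketch of a two-sided Fisher's exact test using only the Python standard library. It assumes the common convention of summing the probabilities of all tables that are no more probable than the observed one; the source does not state which software or two-sided convention was used, so small numerical differences from the published values are possible.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1 = a + b
    col1 = a + c
    n = a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = prob(a)
    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    # Sum every table that is no more probable than the observed one
    # (small relative tolerance guards against floating-point ties).
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


# (expansions, deletions) per locus, copied from Table 3.
MUTL_PLUS = {"GT/CA": (9, 41), "TTCC/AAGG": (34, 18), "TC/AG": (22, 35)}
MUTL_MINUS = {"GT/CA": (10, 25), "TTCC/AAGG": (20, 8), "TC/AG": (12, 28)}

for name, counts in (("mutL+", MUTL_PLUS), ("mutL", MUTL_MINUS)):
    for other in ("GT/CA", "TC/AG"):
        p = fisher_exact_two_sided(*counts["TTCC/AAGG"], *counts[other])
        print(f"{name}: [TTCC/AAGG]9 vs [{other}]: P = {p:.2g}")
```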

To test the effect of allele length on expansion mutagenesis, we determined the mutation frequency and specificity of a [TC/AG]20 microsatellite-containing vector in the mutL+ strain. The observed HSV-tk mutation frequency for this allele was 84 × 10–5, 10-fold higher than that of the [TC/AG]11 allele, and 4-fold lower than that of the [TTCC/AAGG]9 allele. DNA sequence analyses of 19 mutants revealed that all of the mutations occurred at the artificial microsatellite motif. These mutations consisted of 11 expansion events (58%) and eight deletion events (42%). Therefore, a bias in favor of expansion mutations is a feature of microsatellite alleles >36 bp in length and of the general sequence polypyrimidine/polypurine.

In the mutL strain, the proportion of deletion to expansion mutations at the dinucleotide microsatellites was not significantly different from that observed for the mutL+ strain (Table 3). At both dinucleotide microsatellite sequences, the frequency of deletion mutations was 2–3-fold greater than the frequency of expansion mutations. Moreover, a mutational bias continued to be observed for the tetranucleotide locus in the mutL strain: the proportion of expansions to deletions was significantly different for the [TTCC/AAGG]9 vector than for either the [GT/CA]10 or the [TC/AG]11 vectors (Table 3). The frequency of expansions at the tetranucleotide locus (340 × 10–5) was ∼3-fold greater than that of deletions (130 × 10–5) in the mutL strain.
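The expansion and deletion frequencies quoted in this section can be recovered by splitting each locus-specific mutation frequency (Table 5) according to the expansion proportion (Table 3). The sketch below shows that arithmetic; the data layout and function name are illustrative, and small differences from the quoted values reflect rounding.

```python
# (locus mutation frequency x 10^-5 from Table 5,
#  proportion of expansions from Table 3)
LOCI = {
    ("mutL+", "[GT/CA]10"):    (11,  0.18),
    ("mutL+", "[TC/AG]11"):    (7.7, 0.39),
    ("mutL+", "[TTCC/AAGG]9"): (310, 0.65),
    ("mutL",  "[GT/CA]10"):    (27,  0.29),
    ("mutL",  "[TC/AG]11"):    (27,  0.30),
    ("mutL",  "[TTCC/AAGG]9"): (470, 0.71),
}


def split_frequency(locus_freq: float, expansion_fraction: float):
    """Return (expansion frequency, deletion frequency) for one locus."""
    expansions = locus_freq * expansion_fraction
    deletions = locus_freq - expansions
    return expansions, deletions


for (strain, locus), (freq, frac) in LOCI.items():
    exp_freq, del_freq = split_frequency(freq, frac)
    print(f"{strain} {locus}: expansions {exp_freq:.2g}, "
          f"deletions {del_freq:.2g} (x 10^-5)")
```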

Relative risk of mutation at various sequence motifs

We observed a statistically significant difference between mutL+ and mutL strains in the proportion of mutational events arising outside the artificial microsatellite loci, within the HSV-tk coding region (P = 0.0003 to P < 0.0001, Table 2). In the mutL strain, single base frameshifts accounted for ∼90% of the observed HSV-tk coding region mutations (Table 4). A major hotspot corresponding to a one-base insertion within an endogenous, [G/C]7 mononucleotide microsatellite sequence was observed among mutants derived from all plasmids. The average frameshift mutation frequency at this site is 9.4 × 10–5 in the mutL strain, 40-fold greater than the mutL+ strain (0.22 × 10–5). The other frameshift events in the mutL strain occurred at mononucleotide sequences of 4–6 units in length (Table 4). Interestingly, one-base insertions were only observed at microsatellites of ≥5 bp in length. Base substitutions were recovered for the mutL strain only in the pJY4 spectrum, corresponding to a frequency of 1.3 × 10–5, ∼5-fold higher than the base substitution frequency observed for pJY4 replication in the mutL+ strain (0.27 × 10–5).

Table 4. DNA sequence analyses of HSV-tk coding region mutations.

| Strain | Plasmid | Mutational event | HSV-tk position^a | Sequence context^b | Number |
|---|---|---|---|---|---|
| mutL+ | pJY2 | G:C→T:A | 541 | TTC G ACC | 1 |
| | | A:T→G:C | 658 | GGC A CCA | 1 |
| | | A:T→C:G | 1172 | AGA T GGG | 1 |
| | | +1G:C insertion | 487–493 | TC GGGGGGG AG | 1 |
| | | –1G:C deletion | –181^c | TCT C ATG | 1 |
| | pJY4 | G:C→T:A | –98^d | TAT G AAA | 1 |
| | | | 220 | ATA G ACG | 1 |
| | | Complex | 277–282 | TTCGCGCGAC→T ACGTA GAC | 2 |
| mutL | pJY2 | +1G:C insertion | 487–493 | TC GGGGGGG AG | 13 |
| | | | 605–610 | GA CCCCCC AG | 2 |
| | | | 517–521 | CG CCCCC GG | 1 |
| | | –1G:C deletion | 938–942 | GG CCCCC GA | 3 |
| | | | 833–836 | CT GGGG AC | 1 |
| | | | 723–726 | CG CCCC GG | 1 |
| | | Complex | 277–282 | TTCGCGCGAC→T ACGTA GAC | 1 |
| | | Tandem | 497–501 | CTGGG→GTGCT | 1 |
| | | Deletion | 377–393 | GCCTCGA....ATCGGC | 1 |
| | pJY4 | +1G:C insertion | 487–493 | TC GGGGGGG AG | 13 |
| | | | 330–334 | CT GGGGG CT | 1 |
| | | –1G:C deletion | 938–942 | GG CCCCC GA | 3 |
| | | | 605–610 | GA CCCCCC AG | 1 |
| | | G:C→A:T | 730 | GGC G AGC | 1 |
| | | | 758 | CTG C GAT | 1 |
| | pJY5.1 | +1G:C insertion | 487–493 | TC GGGGGGG AG | 3 |
| | | Complex | 492–497 | GGGGGGGAGGC→GGGGGTGAGGGC | 1 |

aNucleotide position of the mutated base. Numbering of the HSV-tk sequence begins at the G residue within the BglII recognition site (32).

bWild-type sequence of sense strand (5′ to 3′ direction). Underlined base(s) is (are) mutated as indicated.

cPromoter region (–35) mutation.

dStart codon mutation.

An extreme variability in the inherent mutability of the different sequence motifs was observed in the mutL+ strain (Table 5). Within the HSV-tk gene, unique sequence mutations (including base substitution, single base frameshifts and complex mutations) occurred about four times as frequently as mutations within the endogenous [G/C]7 microsatellite motif. Among microsatellite sequences, the [TTCC/AAGG]9 allele was 30-fold more mutable than the dinucleotide alleles of similar unit length, which, in turn, were 40-fold more mutable than the mononucleotide allele. Overall, we observed a 1400-fold range of mutation frequencies among the various sequence motifs in the presence of MMR.

Table 5. Mutation frequencies at mono, di and tetranucleotide microsatellite sequences, relative to unique sequences, in mutL+ and mutL E.coli.

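| DNA sequence motif | Plasmid | HSV-tk mutation frequency^a × 10–5, mutL | HSV-tk mutation frequency^a × 10–5, mutL+ | Ratio mutL/mutL+ |
|---|---|---|---|---|
| Unique^b | pJY2 | 8.4 | 0.87 | 9.7 |
| | pJY4 | 4.8 | 0.54 | 8.9 |
| Mononucleotide^c [G/C]7 | pJY2 | 9.9 | 0.22 | 45 |
| | pJY4 | 8.9 | <0.13 | >68 |
| Dinucleotide | | | | |
| [GT/CA]10 | pJY2 | 27 | 11 | 2.4 |
| [TC/AG]11 | pJY4 | 27 | 7.7 | 3.5 |
| [TC/AG]20 | pJY4.3.1 | n.d. | 84 | n.d. |
| Tetranucleotide | | | | |
| [TTCC/AAGG]9 | pJY5.1 | 470 | 310 | 1.5 |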

n.d., not determined.

aData generated by multiplying the proportion of total mutants observed at each sequence motif (Tables 2 and 4) by the overall HSV-tk mutation frequency measured 2 h after transformation (Table 2).

bMutations within the HSV-tk coding sequence and promoter region, exclusive of frameshift mutations at positions 487–493.

cOne-base insertion mutations at the microsatellite sequence endogenous to HSV-tk (positions 487–493).
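The calculation described in footnote a can be sketched as follows, using the pJY2 vector as a worked example. The counts come from Tables 2 and 4 and the overall frequencies from Table 2; the function name is a hypothetical helper, and results are compared with the rounded values shown in Table 5.

```python
def motif_frequency(events_at_motif: int, total_sequenced: int,
                    overall_frequency: float) -> float:
    """Motif-specific mutation frequency: the proportion of sequenced
    mutants with an event at the motif, multiplied by the overall
    HSV-tk mutation frequency (same units as `overall_frequency`)."""
    return events_at_motif / total_sequenced * overall_frequency


# pJY2 in the mutL strain: 59 mutants sequenced, overall 45 x 10^-5.
mono_mutl = motif_frequency(13, 59, 45)       # events at positions 487-493
unique_mutl = motif_frequency(24 - 13, 59, 45)  # other coding-region events
dinuc_mutl = motif_frequency(35, 59, 45)      # [GT/CA]10 events

# pJY2 in the mutL+ strain: 55 mutants sequenced, overall 12 x 10^-5.
mono_wt = motif_frequency(1, 55, 12)
unique_wt = motif_frequency(5 - 1, 55, 12)
dinuc_wt = motif_frequency(50, 55, 12)

for label, mutl, wt in [("unique", unique_mutl, unique_wt),
                        ("[G/C]7", mono_mutl, mono_wt),
                        ("[GT/CA]10", dinuc_mutl, dinuc_wt)]:
    print(f"{label}: mutL {mutl:.2g}, mutL+ {wt:.2g}, ratio {mutl / wt:.2g}")
```

The ratio column of Table 5 is then the mutL value divided by the mutL+ value for the same motif and plasmid.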

Loss of mutL-dependent MMR had the net effect of narrowing the range of mutation frequencies among the various motifs to 50-fold. The greatest consequence of MMR deficiency was on mutation frequency at the [G/C]7 endogenous HSV-tk microsatellite (Table 5). Thus, in the mutL strain, both the unique sequence mutations and the mononucleotide mutations occurred with approximately equal frequency. Moreover, the frequency of mutagenesis at the dinucleotide motifs was only 2–3-fold higher than that at the mononucleotide microsatellite. Despite these changes, however, loss of MMR had little effect on the frequency of mutation within the tetranucleotide microsatellite sequence (Table 5).

DISCUSSION

We have compared the genetic stability in E.coli of three microsatellite sequences, [GT/CA]10, [TC/AG]11 and [TTCC/AAGG]9, in the absence and presence of MMR. Our data demonstrate that the frequency of expansion mutations is dependent upon the DNA sequence composition of the microsatellite. In particular, we observed a greater incidence of expansion mutations at the [TC/AG]20 and [TTCC/AAGG]9 microsatellites, relative to the [GT/CA]10 and [TC/AG]11 microsatellites (Table 3). We have shown that the bias in favor of expansions at the tetranucleotide allele is not due to mutL-dependent MMR. In our analyses, the mutation frequency at the tetranucleotide locus was ~20–30-fold greater than at the dinucleotide loci of similar unit size in both MMR-proficient and deficient strains (Table 5).

The microsatellite sequences analyzed have the potential to form different non-B DNA structures (10). While [GT/CA] sequences and other alternating purine·pyrimidine tracts can adopt left-handed helices (Z-DNA), polypurine/polypyrimidine tracts can form triple helices (10). Plasmids containing [TC/AG]12 and [TTCC/AAGG]6 repeat motifs have been demonstrated by others to form triplex DNA structures in vitro (35); whether triplex structures are formed in our vectors remains to be tested. The different expansion frequencies of the [TC/AG]11 and [TTCC/AAGG]9 vectors reported in this study (Table 3) could reflect either the total length of the potential triplex sequence (22 bp for the dinucleotide repeat versus 40 bp for the tetranucleotide repeat) or the inherent potential for slipped strand mispairing by DNA polymerases at di- versus tetranucleotide repeat units. The observation that the proportion of expansion mutations observed for the [TC/AG]20 allele (total length = 40 bp) was increased relative to the [TC/AG]11 allele and similar to the [TTCC/AAGG]9 allele is consistent with the former hypothesis.

Several investigators have proposed that polymerase pausing at non-B DNA structures during DNA synthesis, together with biochemical differences between leading and lagging strand DNA replication, are responsible for expansion biases at microsatellite sequences (11,12,15,16) as well as for asymmetric deletion mutation frequencies (36). DNA triplexes may be formed during DNA synthesis from a single-stranded DNA molecule, provided by the newly formed (nascent) DNA strand, and a homopurine or homopyrimidine mirror repeat sequence, provided by the template strand (9). When the nascent strand has achieved a great enough length, the template DNA that is yet to be synthesized may fold back so as to provide the third strand and form a triplex structure with the newly synthesized duplex DNA (13). In E.coli, DNA synthesis on the discontinuous (lagging) template strand is co-ordinated with the continuous (leading) template strand through a multiprotein DNA replication machine containing DNA polymerase III holoenzyme (Pol III HE). Pol III HE is directly associated with DNA helicase, such that duplex unwinding and DNA synthesis are tightly co-ordinated events (37). Therefore, we posit that the architecture of proteins at the replication fork prevents triplex DNA formation during DNA synthesis on the leading strand, as the template DNA ahead of the polymerase is too highly constrained to form a fold-back hairpin structure. However, the extensive single-stranded nature of the lagging strand, together with the opposite polarity of DNA synthesis relative to fork progression (helicase unwinding), may be permissive to triplex formation during DNA synthesis. Triplex DNA formed by [TC]27 and [GA]27 repeats has been shown to cause DNA polymerase pausing during DNA synthesis in vitro (13) and DNA polymerase pausing is correlated with misalignment-mediated errors (38). 
Thus, the preferential expansions observed for [TC/AG]20 and [TTCC/AAGG]9 microsatellites in our system may involve the formation of triplex DNA structures, and we speculate that resolution of the stalled polymerase complex at triplex structures on the lagging strand results in misalignment errors. In our pJY vectors, the lagging strand template of the microsatellite sequences is polypurine [(AG) or (AAGG)]; therefore, the postulated triplex formed would be of the RRY type, in which the nascent strand is involved only in Watson–Crick base pairing (9). In this scenario, polymerase resolution that involves backwards slippage of the nascent strand to resume synthesis occurs more frequently than slippage of the template strand, resulting in a bias toward expansion mutations. This model is consistent with the observation that DNA polymerase inhibition is dependent upon the length of the polypurine or polypyrimidine tract (39), and that polymerase pausing occurs in the middle of the tract (13). We have observed that flipping the HSV-tk gene such that the lagging strand template is polypyrimidine resulted in a similar frequency of expansion errors concomitant with an increased frequency of deletion errors. One explanation for this altered mutational specificity is that the YRY-type triplex is resolved differentially, perhaps because the nascent DNA strand in this structure is involved in both Watson–Crick and Hoogsteen base pairing (9). An alternative model to explain the observed mutational biases at microsatellite sequences does not require the formation of triplex DNA structures. The pol III α-subunit has been reported to produce a low frequency of frameshift errors at homopolymeric repeat sequences, with a bias towards misalignment errors at template purine over template pyrimidine sequences (40).
The potential for DNA misalignments of template and nascent strands may be greater for the lagging DNA strand, as described above; in this case, the differential ability of Pol III HE to continue synthesis on misaligned DNA structures of polypurine versus polypyrimidine sequences would lead to a mutational bias. We point out that the two models are not mutually exclusive, and both mechanisms may be working at the lagging strand. The models proposed can be tested directly in a variation of our in vitro polymerase assay (31), an approach that will allow us to definitively determine whether polymerase error rates differ depending on template sequences as well as whether triplex DNA formation is involved in microsatellite mutagenesis.

Recently, GAA/CTT repeated sequences associated with Friedreich’s ataxia have been shown to expand possibly via formation of triplex DNA structures (15). In Friedreich’s ataxia, the unstable microsatellite alleles are large (>79 units) and the expansions observed involve several units (15). In contrast, our alleles are relatively small (9–20 units), and the mutational events are primarily one unit (Table 2). Thus, the biochemical mechanism underlying the observed expansions is not necessarily the same in the two cases.

Defective MMR has been shown previously to increase the frequencies of mutation at both unique and microsatellite loci in E.coli, yeast and human cells (16–18,21,24,25,41–43). Using a forward mutation assay, Levinson and Gutman reported that the LacZ- mutation frequency for wild-type E.coli carrying an M13-based (GT/CA)11 insert was 15-fold higher than for bacteria carrying no microsatellite insert (17). In our HSV-tk forward mutation assay, the presence of a (GT/CA)10 insert increased the mutation frequency 14-fold over the parent vector (Tables 1 and 2). Moreover, the mutational specificity observed for wild-type E.coli is concordant between these two assays: 88% deletions (LacZ) versus 82% deletions (HSV-tk). The presence of a strong endogenous hotspot target sequence in the HSV-tk gene (see Table 4) precludes a direct comparison, for MMR-deficient bacteria, of our results from the forward mutation assay with those previously reported using a specific reversion assay (17). Our results also agree qualitatively with, but differ quantitatively from, previous studies of the effects of MMR loss on (GT/CA) sequence stability in yeast (43). In this case, the differences in magnitude can be explained on the basis of different target sequences. Using the URA3 gene as a mutational target, >90% of the mutational events in MMR-deficient Saccharomyces cerevisiae strains were found at the (GT/CA) microsatellite sequence (43). This is in contrast to the HSV-tk gene, where only 59% of mutational events for a similar vector in mutL E.coli were found at the (GT/CA) sequence, the remainder being found at endogenous microsatellites (Tables 2 and 4). Qualitatively, however, our results for MMR-deficient cells are in concordance: in yeast, 72% of mutations at a (GT/CA)14 sequence were deletions (44), as compared to 71% in bacteria at a (GT/CA)10 sequence (Table 2).

In this study, we observed that MMR correction is greatly influenced by the sequence composition of the microsatellite. MMR in E.coli does not contribute to the genetic control of mutational events at the [TTCC/AAGG]9 locus (Table 5). In comparison, MMR contributed by a factor of two to three to the genetic stability of both the [GT/CA]10 and the [TC/AG]11 dinucleotide microsatellites. Finally, a correction factor of at least 40 was measured for the endogenous HSV-tk [G/C]7 mononucleotide sequence (Table 5). These data are consistent with a previous E.coli study demonstrating effective mismatch correction of one- and two-base deletions but inefficient repair of four-base deletions (45). Studies of MMR-deficient yeast cells demonstrated that the rate of mutation at a [G/C]18 allele was ~20-fold greater than the rate of change at a [GT/CA]16.5 allele, which, in turn, was 7-fold greater than that of a [CAGT/GTCA]16 allele (25). This relative order of mutational risk is similar to what we have observed in E.coli. However, in contrast to the yeast study, we observed no significant increase in the frequency of mutational events at the [TTCC/AAGG]9 tetranucleotide allele in MMR-deficient bacteria, relative to repair-proficient bacteria. This discrepancy may reflect either sequence composition differences of the tetranucleotide alleles or the fact that, in our study, the tetranucleotide allele was in mutagenic ‘competition’ with the endogenous [G/C]7 mononucleotide sequence present within the HSV-tk coding region. In other words, an intermolecular competition may exist in cells between a finite number of MMR proteins and DNA substrates containing mutational intermediates. In this situation, the DNA substrates with the highest affinity for the MMR proteins will be preferentially repaired. Previous studies have demonstrated that MutS binds with higher affinity to one-base than to four-base frameshift intermediates (45). Therefore, at the HSV-tk locus containing an artificial tetranucleotide allele and an endogenous mononucleotide allele, the endogenous allele will ‘win’ the competition for MutS binding and be preferentially repaired, such that no repair is observed at the tetranucleotide locus. Our observations at the HSV-tk locus in E.coli also support the proposal that large, endogenous mononucleotide sequences within genes are a major source of genetic change in MMR-deficient cells (46).
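The MMR correction factors compared in this paragraph can be illustrated with a short calculation. The sketch below assumes that a correction factor is the ratio of the mutation frequency at a motif in MMR-deficient (mutL) cells to the frequency at the same motif in MMR-proficient (mutL+) cells; this definition is an assumption here, since the exact calculation behind Table 5 is not restated in this section. The function name and all input frequencies are hypothetical placeholders, chosen only to reproduce the qualitative pattern described (about 1 for the tetranucleotide, two to three for a dinucleotide, 40 for the mononucleotide); they are not the measured values.

```python
def mmr_correction_factor(freq_mmr_deficient, freq_mmr_proficient):
    """Fold increase in mutation frequency when MMR is lost.

    Assumed definition (not taken from the paper's methods):
    frequency in mutL cells divided by frequency in mutL+ cells.
    """
    if freq_mmr_proficient <= 0:
        raise ValueError("MMR-proficient frequency must be positive")
    return freq_mmr_deficient / freq_mmr_proficient


# Hypothetical (mutL, mutL+) mutation frequencies per motif.
# These are illustrative placeholders, NOT the data of Table 5.
example_frequencies = {
    "tetranucleotide": (2.0e-3, 2.0e-3),
    "dinucleotide": (5.0e-4, 2.0e-4),
    "mononucleotide": (8.0e-5, 2.0e-6),
}

factors = {
    motif: mmr_correction_factor(deficient, proficient)
    for motif, (deficient, proficient) in example_frequencies.items()
}

for motif, factor in factors.items():
    print(f"{motif}: {factor:.1f}-fold")
```

Under this definition, a factor near 1 means that loss of MMR leaves the mutation frequency at that motif essentially unchanged, which is how the tetranucleotide result is interpreted in the text.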

Some tumor cells have been reported to exhibit a low level of microsatellite instability in which genetic alterations are observed at a limited number or type of microsatellite marker (47,48). This instability is associated with allele length changes at tri- and tetranucleotide markers rather than at dinucleotide markers, a pattern that is distinct from what is observed in MMR-deficient tumors, such as hereditary non-polyposis colorectal carcinomas (47,48). It is currently debated whether these genetic alterations reflect clonal expansion of a mutant precursor population or functional genomic instability via an unidentified mechanism (49,50). This controversy stems directly from our ignorance concerning the inherent degree of genetic variation among the various microsatellites represented in the human genome and the mechanisms that operate to stabilize repetitive sequences. In this communication, we have documented that the mutation frequency of microsatellite sequences in E.coli is highly dependent upon the sequence composition, and we postulate that the sequence dependence reflects formation of non-B form DNA structures. We also demonstrate that MMR contributes little to the stability of the tetranucleotide locus examined. The system we describe (20) should be useful in future studies aimed at establishing the baseline stability of microsatellite sequences of distinct types and the host factors that influence that stability.

ACKNOWLEDGEMENTS

We thank Suzanne Hile and Jill Stahl for critical reading of the manuscript, Lisa Scheifele for technical assistance and Ray Monnat for providing strain PP102. This work is supported by PHS/NIH grant CA73649. We gratefully acknowledge the generous contributions made to the Jake Gittlen Cancer Research Institute.

REFERENCES


Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
