Proceedings of the National Academy of Sciences of the United States of America. 2004 Sep 1;101(37):13448–13453. doi: 10.1073/pnas.0405116101

Naturally occurring H-DNA-forming sequences are mutagenic in mammalian cells

Guliang Wang 1, Karen M Vasquez 1,*
PMCID: PMC518777  PMID: 15342911

Abstract

Naturally occurring DNA sequences can form noncanonical structures such as H-DNA, which are abundant and regulate the expression of several disease-linked genes. Here, we show that H-DNA-forming sequences are intrinsically mutagenic in mammalian cells. This finding suggests that DNA is a causative factor in mutagenesis and not just the end product. By using the endogenous H-DNA-forming sequence found in the human c-myc promoter, mutation frequencies in a reporter gene were increased ≈20-fold over background in COS-7 cells. H-DNA-induced double-strand breaks (DSBs) were detected near the H-DNA locus. The structures of the mutants revealed microhomologies at the breakpoints, consistent with a nonhomologous end-joining repair of the DSBs. These results implicate H-DNA-induced DSBs in c-myc gene translocations in diseases such as Burkitt's lymphoma and t(12;15) BALB/c plasmacytomas, where most breakpoints are found near the H-DNA-forming site. Thus, our findings suggest that H-DNA is a source of genetic instability resulting from DSBs and demonstrate that naturally occurring DNA sequences are mutagenic in mammals, perhaps contributing to genetic evolution and disease.


DNA can adopt a number of different structures in addition to the canonical B-form DNA structure described by Watson and Crick over fifty years ago (1). These noncanonical DNA secondary structures form on tracts of repeat sequences, have a range of biological functions, and are possible causative factors in a number of human diseases, such as Fragile X syndrome, Huntington's disease, Friedreich's ataxia, and myotonic dystrophy (2, 3). Although the biology of these special DNA structures is not yet fully understood, it is known that some of these structures are involved in critical DNA metabolic processes, including DNA replication, gene transcription, recombination, and genetic instability (4).

One type of noncanonical DNA structure, an intramolecular triplex or H-DNA, forms at regions containing mirror repeat symmetry, where one half of the tract can dissociate into single strands by using the energy provided by supercoiling. One of the single strands can then swivel its backbone parallel to the purine-rich strand in the remaining duplex, forming a three-stranded helix in this half of the region, leaving its complementary strand unpaired (5, 6). Naturally occurring mirror repeat sequences capable of adopting H-DNA structures are very abundant in mammalian genomes and are found as frequently as 1 in every 50,000 bp in the human genome (7). The conformation of H-DNA has been probed in vivo by using both triplex antibodies and fluorescently labeled single-stranded DNA oligonucleotides (8, 9). Unlike other DNA secondary structure-forming sequences, which are typically located in intergenic or intronic regions, H-DNA-forming sequences are found most frequently in promoters and exons and have been found to be involved in regulating the expression of several disease-linked genes (10, 11). For example, the human c-myc gene, which is often translocated and overexpressed in tumors, contains an H-DNA-forming sequence in its promoter (10, 12). Because of their preferential distribution in promoter regions, most studies have focused on H-DNA's regulatory role in transcription, whereas its potential contribution to genetic instability has not been examined.
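The mirror repeat symmetry described above can be made concrete with a short sketch. A mirror repeat reads the same in both directions on a single strand, optionally around a short central spacer; this differs from an inverted repeat, which involves the complementary strand. The two sequences below are hypothetical illustrations, not the inserts shown in Fig. 1A, and the function name and the `max_spacer` and `min_arm` parameters are assumptions introduced only for this example.

```python
def is_mirror_repeat(seq: str, max_spacer: int = 7, min_arm: int = 8) -> bool:
    """Return True if seq is two equal-length arms that mirror each other on
    the same strand, separated by a central spacer of 0..max_spacer bases.
    Arms shorter than min_arm are ignored to avoid trivial matches."""
    seq = seq.upper()
    for spacer in range(max_spacer + 1):
        remainder = len(seq) - spacer
        if remainder < 2 * min_arm or remainder % 2:
            continue  # arms must be equal in length and long enough
        arm = remainder // 2
        left, right = seq[:arm], seq[arm + spacer:]
        if left == right[::-1]:  # mirror symmetry: reversed, NOT complemented
            return True
    return False


# Hypothetical purine-rich mirror repeat with a 4-nt center (not from Fig. 1A).
perfect = "AAGGGAGGGAGG" + "TTTT" + "GGAGGGAGGGAA"
# Same sequence with an adjacent A and G swapped in one arm (hypothetical).
disrupted = "AAGGGAGGGAGG" + "TTTT" + "GGGAGGAGGGAA"

print(is_mirror_repeat(perfect))    # True
print(is_mirror_repeat(disrupted))  # False
```

Swapping two bases in one arm is enough to break the symmetry in this toy case, which is the same logic behind using a 2-bp substitution as a negative control.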

In addition to an intramolecular triplex, an intermolecular triplex can form in a sequence-specific manner by means of Hoogsteen hydrogen bonding of a third strand of DNA to purine-rich regions of the duplex DNA (13). The sequence-specific DNA recognition and binding characteristics of synthetic triplex-forming oligonucleotides (TFOs) have been extensively studied because they have potential applications in genome modification and therapy (14). The formation of TFO-induced triplex structures can directly inhibit transcription, can induce site-directed mutagenesis both in cells and in animals, can enhance homologous recombination in a sequence-specific manner, and can inhibit proliferation and induce apoptosis in cultured cells (14–19). Because the triplex region found in H-DNA is similar in structure to intermolecular triplexes formed by TFOs, which are known to be mutagenic, and the remaining single-stranded region of the H-DNA structure is more susceptible to damage, we hypothesized that H-DNA structures are mutagenic in mammalian cells and cause genetic instability. Supporting this hypothesis, in Burkitt's lymphoma (20) and t(12;15) BALB/c plasmacytomas (21), many breakpoints on the translocated c-myc gene are clustered around the H-DNA-forming sequences in the promoter regions, immediately 5′ of exon 1 in each case. This clustering suggests that the H-DNA structures create fragile sites or mutation hotspots that lead to double-strand breaks (DSBs) and to the subsequent translocation of the gene. Such a role would not be unexpected because some other DNA secondary structures (e.g., cruciforms, hairpins, and Z-DNA) are found in mutagenic hotspots (4).

Because we believe that DNA double-strand breaks are the most likely intermediates for c-myc translocations and triplex-forming oligonucleotide-induced modification, and because it is well known that nonhomologous end-joining (NHEJ) is the major DSB repair pathway in mammalian cells, often leading to mutagenic repair (22), we measured the mutagenic potential of H-DNA-forming sequences in mammalian cells by using a supF mutation reporter system. This loss-of-function mutation reporter system allows detection of a variety of mutations, including those generated during NHEJ repair of DSBs. We tested the mutagenic potential of the endogenous H-DNA-forming sequence from the human c-myc gene and three model H-DNA-forming sequences, GG32, AG32, and GA32, in COS-7 cells. A scrambled sequence, CON, and a mutated c-myc sequence containing a 2-bp substitution to disrupt the mirror symmetry, MycAG, served as controls (Fig. 1A). Of the three model sequences, GG32 has been shown to form either a py·pu·py or pu·py·pu H-DNA structure with the highest efficiency relative to the other model sequences, whereas AG32 forms either type of H-DNA with lower propensity than GG32, and GA32 is able to adopt only a py·pu·py H-DNA structure with much lower efficiency than either GG32 or AG32. Thus, the different capabilities of GG32, AG32, and GA32 to adopt H-DNA structures afforded the opportunity to evaluate the relationship between the H-DNA conformation and their mutagenic potential in our system.

Fig. 1.

H-DNA-induced mutagenesis in mammalian cells. (A) Structure of the H-DNA-forming plasmids. H-DNA-forming sequences, GG32, AG32, GA32, cMyc, and the control sequences MycAG and CON, were inserted 4 bp upstream of the supF gene in shuttle vector pSP189 at the XhoI site. (B) Identification of the H-DNA conformation at the insertion site. Plasmids were digested with S1 nuclease at pH 4.5 (Upper) or P1 nuclease at pH 7.1 (Lower), treated with phosphatase, and radiolabeled. The 169- to 177-bp fragments containing the H-DNA-forming region were released by EcoRI and BamHI digestion and were separated on a native 8% polyacrylamide gel. The radioactivity incorporated into the fragments indicates that a single-stranded region was formed in the H-DNA region. The shorter fragments seen on pcMyc indicate that cleavages occurred on both strands leading to DSBs.

Here, we show that the endogenous H-DNA-forming sequence from the human c-myc promoter and the model H-DNA-forming sequences increased mutation frequencies in the supF reporter gene in line with their abilities to form H-DNA structures. Additionally, H-DNA-induced DSBs were found near the H-DNA locus. Thus, our findings implicate H-DNA as a source of genetic instability resulting from DSBs and demonstrate that naturally occurring DNA sequences are mutagenic in mammalian cells, perhaps contributing to genetic evolution and disease.

Materials and Methods

Construction of H-DNA-Forming Plasmids. Oligonucleotides listed in Fig. 1A and their complementary strands were synthesized and purified by polyacrylamide gel electrophoresis by the Midland Certified Reagent Company (Midland, TX). The XhoI-digested shuttle vector pSP189 backbone was ligated with phosphorylated H-DNA-forming duplex DNA with XhoI overhangs. After transformation into MBM7070 indicator bacterial cells, mixtures were plated onto LB-Amp plates with 5-bromo-4-chloro-3-indolyl β-D-galactoside (X-gal) (20 mg/ml) and isopropyl β-D-thiogalactoside (IPTG) (50 mg/ml). Blue colonies, which carry an intact supF gene, were screened by EcoRI and BamHI restriction digestion. Fragments from recombinants were 37–45 bp larger than the 132-bp fragment from the parental pSP189 plasmid. Plasmids containing inserts of the proper length were further characterized by sequencing analysis using an upstream primer (5′-CAAAAAAGGGAATAAGGGCG-3′). Recombinants containing the correct insert were given the names pCON, pGG32, pAG32, pGA32, pcMyc, and pMycAG, according to their inserts.
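The size screen above is simple arithmetic, sketched here for convenience. The two input values are taken from the paragraph above; the function and constant names are assumptions made for this illustration only.

```python
PARENTAL_FRAGMENT_BP = 132   # EcoRI-BamHI fragment of parental pSP189 (from the text)
INSERT_GAIN_BP = (37, 45)    # size increase reported for recombinants (from the text)


def expected_fragment_range(parental=PARENTAL_FRAGMENT_BP, gain=INSERT_GAIN_BP):
    """Return the (min, max) EcoRI-BamHI fragment size expected for recombinants."""
    return parental + gain[0], parental + gain[1]


def has_expected_insert(fragment_bp: int) -> bool:
    """True if a measured fragment size falls in the recombinant range."""
    low, high = expected_fragment_range()
    return low <= fragment_bp <= high


print(expected_fragment_range())  # (169, 177)
print(has_expected_insert(132))   # False: parental plasmid without insert
```

The resulting 169 to 177 bp range is the same fragment size range quoted for the EcoRI-BamHI fragments in the Fig. 1 legend.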

Analysis of H-DNA Conformations in the Modified Plasmids. All H-DNA-forming sequences used have been shown to adopt H-DNA conformations with different efficiencies in previous studies (12, 23). To ensure that these sequences were able to adopt H-DNA structures in the newly constructed plasmids, we examined the formation of H-DNA structures by using S1 and P1 nuclease sensitivity assays at pH 4.5 and 7.1, respectively, as described (24), with slight modification. After nuclease digestion, the 5′ ends produced at the single-stranded region in the H-DNA loci were radiolabeled with T4 polynucleotide kinase and [γ-32P]ATP. The 169- to 177-bp fragments containing the H-DNA-forming region, released by EcoRI and BamHI digestion, were separated on a native 8% polyacrylamide gel and subjected to autoradiography for visualization. The radioactivity incorporated into the fragments indicated that a single-stranded region was formed in the H-DNA region.

H-DNA-Induced Mutagenesis of the supF Gene in Mammalian COS-7 Cells. Plasmids were transfected into COS-7 cells by using Gene PORTER (GTS Inc., San Diego) according to the manufacturer's recommendations. After 48 h, the amplified plasmids were recovered by the method of Hirt (25). After treatment with DpnI to remove those plasmids that were not replicated in the COS-7 cells, the plasmids were transfected into MBM7070 cells to detect the supF mutants by using a blue/white screen.

Linker-Mediated PCR (LM-PCR) Analysis of H-DNA-Induced DSBs. LM-PCR analysis was performed as described (26) with slight modification. After replication in COS-7 cells, plasmids were recovered, purified, and treated with Pol I Klenow fragment (New England Biolabs) according to the supplier's recommendations to blunt the DNA ends. Linkers containing one blunt end and one 3′ overhang were ligated with the Klenow-treated plasmids. Ligation products served as templates in PCR reactions performed at 95°C for 30 s for denaturation, 54°C for 30 s for annealing, and 72°C for 20 s for extension, for 28 cycles. Regions between the upstream primer (5′-AGATCCAGTTCGATGTAACC-3′), 181–201 bp upstream of the H-DNA site, and the linkers were PCR amplified and separated on a 2% agarose gel. The DNA fragments separated on the gel were transferred to Zeta-Probe GT membranes (Bio-Rad). We interrogated the blots with a single-stranded probe (5′-CACTCGTGCACCCAACTGATCTTCAGCATCTTTTAC-3′) near the upstream primer site, to confirm that the fragments on the gel were generated from the upstream primer and the downstream H-DNA-induced DSBs. The length of each specific band (L) was measured on a KODAK 1D Imaging Station. The positions of the breakpoints relative to the H-DNA loci (N) were calculated according to the length (L) of the PCR fragments: N = L – 20 (length of linker) – 201 (distance between the primer and the 5′ XhoI site).
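The breakpoint formula above can be written as a small helper. This is a minimal sketch: the two constants come from the text, while the function names and the sign convention (negative N read as 5′ of the XhoI site, positive N as 3′) are interpretive assumptions. That convention agrees with the positions reported in Results, where a 220-bp product maps to 1 bp 5′ and a 240-bp product to 19 bp 3′ of the XhoI site. Band lengths read from a gel are approximate, so computed positions carry the same uncertainty.

```python
LINKER_BP = 20            # linker length (from the text)
PRIMER_TO_XHOI_BP = 201   # distance between the primer and the 5' XhoI site (from the text)


def breakpoint_position(product_length_bp: int) -> int:
    """N = L - 20 - 201, relative to the 5' XhoI site.
    Negative N is read here as 5' (upstream), positive N as 3' (downstream)."""
    return product_length_bp - LINKER_BP - PRIMER_TO_XHOI_BP


def describe(product_length_bp: int) -> str:
    """Human-readable position for one LM-PCR product length."""
    n = breakpoint_position(product_length_bp)
    side = "5'" if n < 0 else "3'"
    return f"{product_length_bp}-bp product: {abs(n)} bp {side} of the XhoI site"


for length in (220, 240, 255):
    print(describe(length))
```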

Results

Construction of H-DNA-Forming Mutation Reporter Plasmids. To determine whether H-DNA-forming sequences are mutation hotspots in mammalian cells, we constructed four H-DNA-forming plasmids and two control plasmids by inserting test sequences 4 bp upstream of the supF reporter gene in the shuttle vector pSP189 (Fig. 1A). The H-DNA-forming plasmids contained either the endogenous H-DNA-forming sequence from the human c-myc promoter (pcMyc) or one of three model H-DNA-forming sequences with different H-DNA-forming potentials (pGG32, pAG32, and pGA32). The control plasmids contained either a scrambled non-H-DNA-forming sequence (pCON) or a mutated c-myc sequence (pMycAG). In the H-DNA-forming region, half of the tract exists as single-stranded DNA and can serve as a substrate for S1 or P1 nuclease digestion. As shown in Fig. 1B Upper, under the conditions of our S1 assay (pH 4.5), strong radioactive signals were detected on the EcoRI-BamHI fragments released from those plasmids containing the H-DNA-forming sequences (pGG32, pAG32, pGA32, and pcMyc), but not from the plasmid pCON, which contains the non-H-DNA-forming sequence, nor from the mutated c-myc control plasmid, pMycAG. Notably, the fragment from pGA32 contained less radioactivity than the fragments from pGG32, pAG32, and pcMyc, suggesting that this sequence adopted an H-DNA structure with much lower efficiency, as expected from previous reports (23) (Fig. 1B). These results indicate that the H-DNA structures were formed in the expected locations. However, at pH 7.1, only fragments from pGG32 and pcMyc have visible radioactive signals (Fig. 1B Lower), indicating that both pAG32 and pGA32 form H-DNA structures with much lower efficiencies under conditions of neutral pH. Interestingly, unlike the other H-DNA-forming plasmids (which have cleavage on the non-hydrogen-bonded strand only, as indicated by radiolabeled full-length EcoRI-BamHI fragments), nuclease digestion of pcMyc produced three fragments, suggesting the production of DSBs (Fig. 1B). One possible explanation for this difference is that the center segment between the two symmetrical arms in the cMyc sequence is longer than those in the others (7 bp vs. 4 bp), allowing nuclease access to the central region and cleavage resulting in DSBs.

H-DNA-Forming Sequences Are Mutagenic in Mammalian COS-7 Cells. To determine the H-DNA-induced mutation frequencies in mammalian cells, we transfected COS-7 cells with the H-DNA-forming plasmids and screened for supF mutations 48 h after transfection. Before the plasmids were introduced into COS-7 cells, the stability of the plasmids in Escherichia coli and the heterogeneity of the initial plasmids used in the mutagenesis assay were examined. The background mutation frequencies generated in MBM7070 indicator bacterial cells were <1 × 10⁻⁵ (data not shown), indicating that either the H-DNA-forming sequences were relatively stable, or they were repaired without impacting the adjacent supF gene in bacteria. This result excluded the possibility that the mutants screened in the following steps were generated in the bacterial cells. To determine whether H-DNA was mutagenic in mammalian cells, we screened the mutants generated on the plasmids pGG32, pAG32, pGA32, pcMyc, pMycAG, and pCON, after allowing them to replicate for 48 h in COS-7 cells. All H-DNA-forming sequences induced high levels of mutations in the mammalian COS-7 cells. Compared with the control plasmid, pCON (3.5 × 10⁻⁴), the frequency of mutagenesis was ≈17-fold greater for pGG32 (58.9 × 10⁻⁴), ≈13-fold greater for pAG32 (47.1 × 10⁻⁴), ≈4.7-fold greater for pGA32 (16.5 × 10⁻⁴), and ≈19-fold greater for pcMyc (65.4 × 10⁻⁴) (P < 0.0005, Fisher's exact test; Fig. 2). The pMycAG plasmid, which differs from pcMyc only by switching the position of an A and a G (to destroy the mirror symmetry), had a significantly lower mutation frequency than pcMyc (6.9 × 10⁻⁴, P < 0.0005, Fisher's exact test). The mutation frequency induced by pAG32 was significantly higher than that of pCON but lower than those of pGG32 and pcMyc, and the mutation frequency induced by pGA32 was significantly lower than that of pAG32 (P < 0.001, Fisher's exact test). These results are consistent with the finding that the GA32 and AG32 sequences adopt H-DNA intramolecular triplex conformations at lower frequencies than GG32 (23).
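The fold increases quoted above follow directly from the reported frequencies. The sketch below recomputes them relative to pCON; the frequencies are copied from the text, and the dictionary and function names are assumptions for illustration. It does not reproduce the Fisher's exact tests, which would require the underlying colony counts that are not given here.

```python
# supF mutation frequencies reported in the text (mutants per colony screened).
SUPF_MUTATION_FREQUENCY = {
    "pCON": 3.5e-4,
    "pMycAG": 6.9e-4,
    "pGA32": 16.5e-4,
    "pAG32": 47.1e-4,
    "pGG32": 58.9e-4,
    "pcMyc": 65.4e-4,
}


def fold_over_control(freqs, control="pCON"):
    """Return each plasmid's mutation frequency divided by the control's."""
    base = freqs[control]
    return {name: value / base for name, value in freqs.items()}


folds = fold_over_control(SUPF_MUTATION_FREQUENCY)
ranking = sorted(folds, key=folds.get, reverse=True)
for name in ranking:
    print(f"{name}: {folds[name]:.1f}-fold over pCON")
```

Sorting by fold change gives pcMyc, pGG32, pAG32, pGA32, pMycAG, pCON, the same order discussed later in the paper.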

Fig. 2.

supF mutation frequencies in H-DNA-forming plasmids after replication in COS-7 cells. Plasmids were transfected into COS-7 cells, replicated for 48 h, and subjected to a blue/white screen for supF mutants. Error bars show the SEM.

Characterization of supF Mutations Generated in COS-7 Cells. Inactivating mutations in the supF gene were sequenced by using a primer located 220 bp downstream of the H-DNA site. The frequencies of different classes of H-DNA-induced mutations generated after replication in mammalian cells are displayed in Table 1. The majority of spontaneous mutations generated in the control plasmid in the mammalian cells were point mutations (40%) and deletions (50%), consistent with results of previous studies (27). As shown in Table 1, the mutation spectra of pGG32, pAG32, and pcMyc are very similar, even though the inserted H-DNA-forming sequences and the induced mutation frequencies differed for the three plasmids. As expected, the majority of mutations induced by the H-DNA-forming sequences were deletions. Forty-one multibase deletion mutants were identified near the H-DNA site (Fig. 3). Among them, 7 had 1- to 3-bp additions in the junctions; for another 7, the left ends of the deletions are shown, but the right ends are too close to the primer in the pBR327 ori to sequence and therefore are not shown. Interestingly, in the remaining 27 characterized junctions, ≈80% (22/27) had 1–6 bp of homology. Seventeen “rearranged” mutants were more complicated and apparently resulted from processes other than simple cutting and rejoining. Sequencing and restriction analyses indicated that, in addition to deletions surrounding the H-DNA-containing region, the primer-binding site was inverted, and >58% (10/17) of the rearranged mutants contained an additional large-scale deletion or duplication spanning the region from bases 1165–3118 (Table 2).

Table 1. Mutation spectra of plasmids replicated in COS-7 cells

                         H-DNA-forming plasmids
Mutation                 pAG32        pGG32        pcMyc        Total        pCON
Point mutation           0 (0%)       1* (5.0%)    0 (0%)       1 (1.6%)     4† (40%)
Large-scale insertion    1‡ (4.8%)    0 (0%)       1§ (5.0%)    2 (3.2%)     0 (0%)
Large-scale deletion     15 (71.4%)   15 (75.0%)   11 (55.0%)   41 (67.3%)   5 (50%)
Rearrangement            5 (23.8%)    4 (20.0%)    8 (40%)      17 (27.9%)   1 (10%)
Total                    21 (100%)    20 (100%)    20 (100%)    61 (100%)    10 (100%)

* G43→A43.
† G4→A4, G24→A24, G25→T25, and G66→A66.
‡ A fragment from 1187–1399 on the original plasmid was duplicated and inserted in the supF gene between A32 and C33.
§ A fragment from 1550–2354 was duplicated and inserted in the supF gene between C10 and C11.

Details of large-scale deletions and rearrangements are shown in Fig. 3 and Table 2.

Fig. 3.

Diagram of deletion mutants. Sequences surrounding the inserted H-DNA-forming sequences are shown at the top of the figure. The blank areas between the lines indicate deletions. Boxed letters represent additional bases inserted. Letters at the ends of the blank areas indicate microhomologies at the junctions. Because the microhomologies cannot be assigned unambiguously to the left or right of the deleted sequence, we have listed them on both ends, although only one copy is present in the sequence of the junction. Broken lines indicate deletions that extended near the primed region. The number at the right of the line indicates the number of identical mutants identified.
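The ambiguity noted in this legend (a microhomology cannot be assigned to the left or right side of a deletion) suggests one way to count it: the number of positions by which the deleted window can be shifted without changing the resulting junction sequence. The sketch below implements that counting rule on a toy sequence. The reference sequence, coordinates, and function names are hypothetical, and this is an illustration of the idea rather than the procedure used to annotate Fig. 3.

```python
def apply_deletion(reference: str, start: int, end: int) -> str:
    """Return the sequence left after removing reference[start:end]."""
    return reference[:start] + reference[end:]


def junction_microhomology(reference: str, start: int, end: int) -> int:
    """Count bases of microhomology at a deletion junction.

    Counts how far the deleted window reference[start:end] can slide right
    or left while producing the identical product; that total equals the
    number of junction bases that could have come from either side."""
    right = 0
    while (end + right < len(reference)
           and reference[start + right] == reference[end + right]):
        right += 1
    left = 0
    while (start - left - 1 >= 0
           and reference[start - left - 1] == reference[end - left - 1]):
        left += 1
    return left + right


# Hypothetical toy reference: two copies of AGG flank the deleted segment.
ref = "TTGAT" + "AGG" + "CTTCC" + "AGG" + "TCAAT"

# Two equivalent descriptions of the same deletion (one AGG copy is retained).
print(apply_deletion(ref, 5, 13) == apply_deletion(ref, 8, 16))  # True
print(junction_microhomology(ref, 5, 13))  # 3
print(junction_microhomology(ref, 8, 16))  # 3
```

Both descriptions of the deletion give the same product and the same 3-bp count, which is why the figure lists the shared bases at both ends of the gap.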

Table 2. Restriction analysis of complex rearrangements

Mutant    Size, kb  EcoRI (4926)  Eco0109I (1165)  BsgI (2698)  BstXI (3118)  BsaI (4162)  ScaI (4581)  XmnI (4700)
pAG32-1   5         1             1                1            1             1            1            1
pAG32-10  5         1             1                1            1             1            1            1
pAG32-16  5.1       1             1                1            1             1            1            1
pAG32-24  2.5       1             0                0            0             1            1            1
pAG32-25  2.5       1             0                0            0             1            1            1
pGG32-6   3.3       1             0                0            1             1            1            1
pGG32-8   5.3       1             1                1            1             2            1            1
pGG32-12  3.3       1             0                0            1             1            1            1
pGG32-17  3.3       1             0                0            1             1            1            1
pGG32-24  2.5       1             0                0            0             1            1            1
pGG32-25  6.7       1             1                2            2             2            1            2
pcMyc-3   2.5       1             0                0            0             1            1            1
pcMyc-4   5.8       1             1                1            1             1            1            1
pcMyc-8   2.5       1             0                0            0             1            1            1
pcMyc-13  2.5       1             0                0            0             1            1            1
pcMyc-18  2.5       1             0                0            0             1            1            1
pcMyc-23  2.5       1             0                0            0             1            1            1
pcMyc-24  4         0             1                1            1             1            0            0
pcMyc-27  6.5       1             1                2            2             1            1            1
pcMyc-20  5         1             1                1            1             1            1            1

Entries indicate the number of bands generated by restriction digestions. 0, Zero fragments indicate deletion of the analyzed region; 1, one restriction fragment indicates wild type; 2, two restriction fragments indicate duplication of the analyzed region.
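The band-count code in this legend maps naturally onto a lookup table. As a small sketch, the snippet below interprets a few rows copied from Table 2; the site order follows the table header, and the variable and function names are assumptions for illustration.

```python
# Restriction sites in the column order of Table 2.
SITES = ("EcoRI", "Eco0109I", "BsgI", "BstXI", "BsaI", "ScaI", "XmnI")

# Interpretation of band counts, as given in the table legend.
BAND_MEANING = {0: "deleted", 1: "wild type", 2: "duplicated"}

# A few rows copied from Table 2 (band counts per site, in SITES order).
TABLE2_ROWS = {
    "pAG32-1": (1, 1, 1, 1, 1, 1, 1),
    "pAG32-24": (1, 0, 0, 0, 1, 1, 1),
    "pGG32-6": (1, 0, 0, 1, 1, 1, 1),
    "pGG32-25": (1, 1, 2, 2, 2, 1, 2),
}


def altered_sites(bands):
    """Return {site: interpretation} for every site that is not wild type."""
    return {site: BAND_MEANING[n] for site, n in zip(SITES, bands) if n != 1}


for mutant, bands in TABLE2_ROWS.items():
    print(mutant, altered_sites(bands) or "no altered sites")
```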

H-DNA-Forming Sequences Induce DNA DSBs in Mammalian Cells. The structures of the junctions suggested that the H-DNA-forming plasmids had undergone DSBs and were repaired by means of an NHEJ pathway. To evaluate whether the H-DNA structure could act as a fragile site resulting in DSBs in mammalian cells, we mapped DSBs near the c-myc H-DNA locus on pcMyc recovered from COS-7 cells. Using LM-PCR to amplify the regions between a specific primer 181–201 bp upstream of the H-DNA sequence and the DNA breakpoints, we identified several unique products in reactions with the H-DNA-containing plasmids (Fig. 4A, compare lanes 9–12 to control lane 7). The lengths of the specific bands were calculated as 90, 110, 120, 138, 146, 166, 191, 200, 220, 240, and 255 bp, as listed in Fig. 4A. We further identified the products resulting from the amplification of the region between the specific primer and the downstream linker by Southern blot analysis. We found that these sequences were located near the H-DNA site (Fig. 4B). The 41- and 62-bp bands were not detected by Southern blotting, likely because the DSBs are too close to the upstream primer. Thus, the H-DNA-induced DNA double-strand breakpoints were located at 132, 121, 101, 83, 75, 55, 30, 21, and 1 bp 5′ to the first XhoI site (4 bp from the H-DNA locus), and 19 and 34 bp 3′ to the XhoI site. In contrast, we did not detect any clear hotspots for H-DNA-induced DSBs with plasmids that had not undergone replication in the mammalian cells (Fig. 4A, lane 8). Interestingly, we found that the spectrum of DNA breakpoints on pcMyc varied as a function of replication time within the mammalian cells. Products of lengths 240, 120, and 90 bp, which can be seen in Fig. 4A, lane 9 (pcMyc, replicated in COS-7 cells for 4 h), decreased with time of incubation, whereas products of lengths 255, 200, 191, and 138 bp, which were weak or undetectable after 4 h of incubation (Fig. 4A, lane 9), increased as the time of incubation increased from 4 h to 48 h (Fig. 4A, lanes 9–12).

Fig. 4.

Mapping of H-DNA-induced DSBs. (A) Products from LM-PCR analysis of the H-DNA-induced DSBs were separated on a 2% agarose gel. pcMyc was recovered from COS-7 cells after incubation for 0 (lanes 2 and 8), 4 (lanes 3 and 9), 8 (lanes 4 and 10), 24 (lanes 5 and 11), and 48 (lanes 6 and 12) h; pCON was replicated for 48 h (lanes 1 and 7). The primer on the linker amplified all plasmid fragments (lanes 1–6). Specific bands were produced between the specific primer and the linker primer (lanes 7–12), indicating the DSBs were generated at 132, 121, 101, 83, 75, 55, 30, 21 and 1 bp 5′ to the first XhoI site, and 19 and 34 bp 3′ to the XhoI site. (B) Southern blot analysis of the fragments amplified by LM-PCR. Fragments in each lane indicate the specific PCR amplification between the primer and DNA breakpoint.

Discussion

The discovery that abnormalities in the DNA structure itself can drive genetic instability has changed our understanding of the genetics and mechanisms of some human diseases (2, 28). Long tracts of triplet repeats (such as CAG and CGG), which cause genetic instability, are found in the DNA of patients afflicted with any one of >15 genetic diseases. These repeats may be expanded from the normal length of ≈15 units to up to hundreds or thousands of repeats in the diseased state (29, 30). Secondary structures such as slipped DNA or hairpins that can form at these repeats are a possible cause for the repeat instability (31). The length of the repeat is a critical factor in driving instability. For example, CAG tracts of 28 repeats are very stable, whereas expansions of >50–60 repeats can lead to genetic instability as seen in Huntington's disease (32).

Long GAA repeats found in Friedreich's ataxia patients, which have the potential to adopt triplex conformations, lead to high levels of expansions and deletions of the repeats, likely resulting from slippage events during replication (33). Wells and colleagues (34) found that the instability in this region was initiated by hexa-stranded “sticky” DNA formed by the interaction of two triplex structures. If the GAA tracts were too short (10–20 repeats) to form sticky DNA, no heterogeneous fragments were detected, suggesting that the sticky DNA structure was necessary for the genetic instability seen. However, in our system, the H-DNA-forming sequences tested are short (23–32 bp) and are not capable of adopting secondary structures other than H-DNA. This study (Fig. 1B) and previous reports (10) have suggested that the 23-bp sequence from the promoter region of the c-myc gene in the human genome can adopt an H-DNA structure. Its control derivative, pMycAG, which differs from pcMyc by 2 bp, did not adopt H-DNA structures under the conditions used in the S1 and P1 nuclease assays, as expected. Among the three model H-DNA palindrome motifs used in this study, the perfect mirror repeat GG32 has been shown to form H-DNA with the highest efficiency, followed by the AG32 sequence, whereas the GA32 sequence was nearly deficient in its ability to adopt H-DNA (23). We show here that all H-DNA-forming sequences tested induced high levels of mutagenesis on plasmids introduced into mammalian cells in the order pcMyc > pGG32 > pAG32 > pGA32 > pMycAG > pCON, correlating with their relative abilities to adopt H-DNA conformations and providing evidence for the relationship between the DNA structural conformation and the induced mutagenesis. Interestingly, in our study, pGA32 adopts H-DNA at low efficiency under acidic pH (Fig. 1B), consistent with the relatively low but significant mutagenesis induced (≈4.7-fold, Fig. 2). We speculate that, under certain conditions in vivo, this sequence might adopt an H-u3 structure (4) where a protonated A pairs to a G (35). Supporting this notion, Rooney and Moore (36) have shown that the GA32 sequence stimulated recombination to similar levels as GG32 on plasmids in mammalian cells, suggesting that cellular conditions can allow for H-DNA formation by this sequence.

Our results support the hypothesis that H-DNA structures can induce mutagenesis in mammalian cells. We measured the mutation frequencies induced by H-DNA by using a mutation reporter shuttle vector-based assay. Thus, the mutation frequencies measured may actually be an underestimate of the total number of mutations that occurred in vivo for the following reasons: (i) the H-DNA-forming sequences were inserted near the supF gene, so only the mutations extending into the supF gene can be detected by blue-white screening; and (ii) mutations resulting in a dysfunctional replication origin or ampicillin resistance gene will not survive the selection and will be excluded in the subsequent mutant screening steps.

Restriction analyses of the mutants revealed that the majority of the rearrangements resulted in loss of the area spanning the Eco0109I (1165), BsgI (2698) and BstXI (3118) restriction sites. We found no altered DNA structures in this region (data not shown); however, the center of this region (2512–2688) was very A+T rich (71% vs. 56% for the entire plasmid). Such A+T-rich regions have been described as DNA-unwinding sites and recombination hot spots (4). It is plausible that the two fragile sites (H-DNA and the A+T-rich region) in our plasmids led to the high level of rearrangements detected. Similarly, Pluciennik et al. (37) reported that CAG repeats increase the recombination frequency in A+T-rich regions ≈17-fold greater than in G+C-rich sequences.
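The A+T comparison above (71% for the central region vs. 56% for the whole plasmid) is a simple base-composition measure. The sketch below shows how such a fraction, and a sliding-window scan for A+T-rich stretches, could be computed. The sequences are short hypothetical strings, not the plasmid sequence, and the function names, window size, and threshold are assumptions.

```python
def at_fraction(seq: str) -> float:
    """Fraction of bases in seq that are A or T."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("A") + seq.count("T")) / len(seq)


def at_rich_windows(seq: str, window: int, threshold: float) -> list:
    """Start indices of windows whose A+T fraction is at least threshold."""
    return [
        i
        for i in range(len(seq) - window + 1)
        if at_fraction(seq[i:i + window]) >= threshold
    ]


# Hypothetical toy sequences for illustration only.
print(at_fraction("ATATTTAAGC"))              # 0.8
print(at_rich_windows("GGCCATATAT", 4, 1.0))  # [4, 5, 6]
```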

The formation of DNA secondary structures plays a role in genetic instability by interrupting replication, transcription, and/or DNA repair activities. For example, secondary structures formed on tracts of CAG and CGG repeats block DNA polymerase progression (38). The temporary pausing of polymerases may result in slippage errors during DNA synthesis, leading to large expansions or deletions of the repeats. However, polymerase arrest or stalling at DNA replication forks may result in DSBs (39), and H-DNA has also been shown to cause arrest of DNA polymerase (40, 41). Interestingly, in our study, the yields of H-DNA-forming plasmids isolated from COS-7 cells were much lower than the yields of control plasmids isolated. This result may be due to H-DNA-induced DNA polymerase stalling or to mutations in the origin of replication or the ampicillin resistance gene.

DNA repair proteins may also be involved in the genetic instability resulting from noncanonical DNA structures. For example, mismatch repair was found to be involved in CAG and CGG repeat instability in E. coli (42). We and others have shown that intermolecular triplexes are recognized by the nucleotide excision repair (NER) factors XPA and RPA, followed by an error-prone repair resulting in site-specific mutagenesis, both in cells and in animals (19, 43–45). Bacolla et al. (46) showed recently that introduction of the DNA sequence of intron 21 of the PKD1 gene, in which H-DNA and other secondary structure-forming sequences are found, induced an SOS response in E. coli through the NER pathway. These results suggest that naturally occurring DNA can adopt noncanonical DNA structures that are recognized as substrates for mutagenic repair. Here, by using both genetic and physical methods, we have shown that H-DNA can induce DSBs in the surrounding sequences, invoking an error-prone repair process in mammalian cells. Studies are needed to address the mechanisms of DSB induction and repair of H-DNA structures, including the roles of NHEJ, mismatch repair (MMR), NER, and the polymerases involved.

NHEJ is the major pathway to repair DSBs in mammalian cells and is an error-prone process. Analysis of the H-DNA-induced mutant structures in this study revealed that 80% of the junctions had 1–6 bp of homology. This finding is consistent with previous reports demonstrating that NHEJ repair of DNA ends in mammalian cells (47) generally (80–90%) results in junctions containing microhomologies, implicating NHEJ in the repair of the H-DNA-induced DSBs and in the subsequent mutagenesis.

In diseases associated with chromosomal translocations and dysregulation of the c-myc gene, such as Burkitt's lymphoma and BALB/c plasmacytomas, the c-myc gene fuses to an Ig gene (48). The breakpoints on the Ig loci are thought to be generated during V(D)J recombination, with the RAG endonucleases creating the DSBs (49). However, little is known about the mechanisms that induce the DSBs in c-myc. The sequences surrounding the breakpoints have neither homologies to V(D)J nor recombinase recognition sequences, indicating they are not likely the result of V(D)J recombination (50). Our results suggest an involvement of H-DNA-induced DSBs in at least part of the c-myc translocations such as those found in Burkitt's lymphoma and t(12;15) BALB/c plasmacytomas, where many breakpoints in the translocated c-myc gene are found near the H-DNA-forming region. Thus, our findings implicate H-DNA as an impetus for disease resulting from DSBs, and more generally, provide evidence that naturally occurring DNA sequences can serve as a source of genetic instability in mammalian cells, contributing to genetic evolution and/or genetic disease.

Acknowledgments

We acknowledge Sarah Henninger for manuscript preparation and Dr. Howard Thames for assistance with statistical analyses. This work was supported by National Cancer Institute Grant CA93729 (to K.M.V.) and National Institute on Environmental Health Sciences Center Grant ES07784.

This paper was submitted directly (Track II) to the PNAS office.

Abbreviations: DSB, double-strand break; LM-PCR, linker-mediated PCR; NHEJ, nonhomologous end-joining.

References


Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
