The Journal of Biological Chemistry. 2009 Oct 12;284(49):33883–33893. doi: 10.1074/jbc.M109.055368

A Recombination Hot Spot in HIV-1 Contains Guanosine Runs That Can Form a G-quartet Structure and Promote Strand Transfer in Vitro*

Wen Shen, Lu Gao, Mini Balakrishnan, Robert A. Bambara
PMCID: PMC2797159  PMID: 19822521

Abstract

The co-packaged RNA genomes of human immunodeficiency virus-1 recombine at a high rate. Recombination can combine mutations to generate viruses that escape the immune response. A cell culture-based system was designed previously to map recombination events in a 459-bp region spanning the primer binding site through a portion of the gag protein coding region. Strikingly, a strong preferential site for recombination in vivo was identified within a 112-nucleotide-long region near the beginning of gag. In vitro strand transfer assays revealed that three pause bands in the gag hot spot each corresponded to a run of guanosine (G) residues. Pausing of reverse transcriptase is known to promote recombination by strand transfer both in vivo and in vitro. To assess the significance of the G runs, we altered them by base substitution. Disruption of the G runs eliminated both the associated pausing and strand transfer. Some G-rich sequences can form G-quartet structures, which were first proposed to form in telomeric DNA. G-quartet formation is highly dependent on the presence of specific cations. Incubation with cations that disfavor G-quartets altered the gel mobility of the gag template, consistent with breakdown of G-quartet structure. The same cations diminished the G-run pauses but did not affect pauses caused by hairpins, indicating that the quartet structure causes pausing. Moreover, gel analysis with cations that favor G-quartet structure showed no such structure in the mutated templates. Overall, the results point to reverse transcriptase pausing at G runs that can form quartets as a unique feature of the gag recombination hot spot.

Introduction

Human immunodeficiency virus type 1 (HIV-1) evolves rapidly to escape the host immune response. Retroviral diversity is generated by the low fidelity of reverse transcriptase (RT) (1–4) and by the mutagenic action of host factors such as APOBEC3G (5), which introduce mutations during reverse transcription. Recombination is an additional, medically important source of diversity in HIV-1. Like other retroviruses, HIV-1 packages two copies of its single-stranded RNA genome within the virion (6). RT is able to switch templates during reverse transcription (7). When the two co-packaged RNA templates are non-identical, template switching results in genetic recombination. In HIV-1, recombination occurs 3 to 30 times per replication cycle, depending on the host cell type (8–10). By dispersing mutations, recombination promotes the generation of new variants of the virus, such as the drug-resistant strains that arise in response to antiviral treatment (11).

Several factors that contribute to efficient strand transfer, and hence recombination, have been examined. Sequences and structures that cause RT pausing were considered principally responsible (12). Our group previously proposed that stalling of RT at pause sites would induce the RNase H activity of RT to make a series of adjacent cuts in the first RNA template (donor). The degradation of the donor RNA was proposed to create an invasion site at which the second RNA template (acceptor) interacts with the newly synthesized DNA, initiating the strand transfer process (13–17). In a related study, the effects of pausing on retroviral recombination were examined in a murine leukemia virus-based system (18). There was a direct correlation between pausing in vitro and an increased rate of recombination in vivo, supporting the idea that template secondary structure promotes recombination. In addition, using RNA templates containing the HIV-1 dimerization initiation sequence (DIS), our group showed in vitro that factors increasing proximity between the donor and acceptor templates also promote recombination (19). Moreover, the viral nucleocapsid (NC) protein, an RNA chaperone that coats the viral genome, enhances strand transfer by rearranging RNA secondary structure and by promoting the annealing of complementary strands during reverse transcription (20). Other factors, such as low dNTP concentrations (21) and mutant RTs with a reduced ability to incorporate dNTPs (22), were also reported to promote strand transfer in vitro.

To understand how recombination contributes to HIV-1 evolution, it is important to learn whether recombination occurs randomly throughout the genome or at preferred sites. Recently, a putative hot spot for recombination within the C2 portion of the gp120 env gene was identified in vivo (23, 24). The crossovers mapped to the loop region of a hairpin structure, and destabilizing this hairpin reduced the rate of recombination to the background level (23, 24). The HIV-1 repeat sequence R was also reported to be a preferred site for recombination in vitro (25). The two R regions, located at the 5′- and 3′-ends of the HIV-1 genome, are the donor and acceptor sites for minus strand transfer (26). Frequent crossovers have also been observed within R (25, 27).

We previously reported a recombination hot spot in vivo within a 112-nt-long region encompassing the gag AUG codon (28). Two HIV-1 strains, NL4-3 and JRCSF, which differ at roughly one position in every 25 bases from DIS to vpr, were used to locate recombination events in a cell culture-based system. A 459-bp region from DIS to gag was sequenced. About 64% of the recombination events were found around the beginning of gag, a region representing only 20% of the sequenced length. The recombination hot spot starts 30 bases upstream of the gag AUG and continues 82 bases into the sequence that encodes the matrix (MA) domain of the HIV-1 Gag protein. The HIV-1 Gag protein is initially synthesized as a polyprotein precursor and then cleaved by the virus-encoded protease into the mature Gag proteins p17 MA, p24 capsid (CA), p7 nucleocapsid (NC), and p6 (29). MA plays a critical role in viral assembly at the plasma membrane through its myristoylated N-terminal Gly residue and a highly basic domain between MA residues 17 and 31 (30). In a recent study of a drug-resistant NL4-3/SPL2923 strain of HIV-1, several amino acid substitutions were observed in the MA of the strain selected in the presence of SPL2923, an inhibitor that blocks HIV-1 binding to its target cell (31). Significantly, three of the four substitutions, E12K, K26R, and V35I, were located in or around the gag hot spot region (31). Amino acid substitutions were also observed in the gp120 gene, another recombination hot spot in HIV-1 (23). In addition, the MA mutations E12K and V35I were reported to confer resistance to the HIV-1 protease inhibitor amprenavir (32). The overlap between recombination hot spots and HIV-1 drug resistance sites suggests that facile recombination in gag could be advantageous in the development of drug resistance.
Moreover, it follows that understanding the mechanisms responsible for the gag recombination hot spot could suggest approaches to suppress drug resistance.

Here, we report in vitro analyses of the gag recombination hot spot in HIV-1 that we identified previously using a cell culture-based system. This hot spot is G-rich (38%) compared with the rest of the genome (24%) (33), and it contains several evenly spaced runs of G or A/G residues that correlate with synthesis pauses (28). For strand transfer assays, we used a substrate RNA containing the sequence from the primer-binding site to a portion of gag. The effect of altering or weakening the pause sites on crossover efficiency was determined by creating mutant substrates with disrupted G runs. We also generated profiles of transfer positions to locate and compare crossover distributions in substrates with or without intact G runs. Experiments were designed to define the connection, if any, between G runs, unique structures in the RNA, pausing, and high-efficiency strand transfer.

EXPERIMENTAL PROCEDURES

Materials

HIV-1 reverse transcriptase (p66/p51 heterodimer) was purified as described previously (34), and the expression plasmids pKK-p66(His)6 and pKK-p51(His)6 (35) were generous gifts from Dr. Neerja Kaushik. HIV-1 nucleocapsid protein NCp9 (72 amino acids) was generously provided by Dr. Robert J. Gorelick. DNA oligonucleotides were obtained from Integrated DNA Technologies, Inc. (Coralville, IA). Pfu Turbo DNA polymerase and the QuikChange II site-directed mutagenesis kit were purchased from Stratagene (La Jolla, CA). Platinum Taq DNA polymerase and TOPO TA cloning kits were purchased from Invitrogen. The MEGAshortscript high yield transcription kit was purchased from Ambion, Inc. (Austin, TX). [32P]ATP was purchased from PerkinElmer Life Sciences. P-30 Micro BioSpin 30 columns in RNase-free Tris were purchased from Bio-Rad. The TempliPhi amplification kit was purchased from GE Healthcare. AccuGel 19:1 and AccuGel 29:1 were purchased from National Diagnostics (Atlanta, GA). All other reagents were purchased from Roche Applied Science.

Construction of Substrates

Genomic sequences from the NL4-3 and JRCSF strains of HIV-1 were amplified by PCR and cloned into pBluescript II KS(+) to generate the pNL182–520 (WT donor) and pJRC150–502 (WT acceptor) constructs as described previously (28). Mutant constructs were generated from the wild type constructs with the QuikChange II site-directed mutagenesis kit. For each mutant substrate, two DNA primers, both containing the desired mutations, were designed to be complementary to the WT double-stranded plasmid template. Mutant plasmids were generated by primer extension with Pfu Turbo DNA polymerase. The desired mutations in each construct are underlined. For the A mutant donor construct, the paired primers containing the desired mutations are 5′-GAA TTA GAT AAA TGG GAA CCA ATT CGG TTA AGG CC-3′ and its complementary strand. For the A mutant acceptor construct, the paired primers containing the desired mutations are 5′-GAA TTG GAT AGG TGG GAA CCA ATT CGG TTA AGG CC-3′ and its complementary strand. For the G mutant donor construct, the paired DNA primers containing the desired mutations are 5′-GCG TCG GTA TTA AGC TGC GGA GAA TTA GAT AAA TGG G-3′ and its complementary strand. For the G mutant acceptor construct, the paired primers containing the desired mutations are 5′-GCG TCA GTA TTA AGC TGC GGA GAA TTG GAT AGG TGG G-3′ and its complementary strand. For the AG mutant donor construct, the paired primers containing the desired mutations are 5′-GGA ACC AAT TCG GTT AAG GCC ACG GCG AAA GAA AC-3′ and its complementary strand. For the AG mutant acceptor construct, the paired primers containing the desired mutations are 5′-GGG AAC CAA TTC GGT TAA GGC CAC GAC GAA AGA AAC-3′ and its complementary strand. To generate the GAG mutant donor and acceptor constructs, the primers used to make the G mutants were applied to the AG mutant constructs as templates.
For the GGG mutant donor construct, the paired primers containing the desired mutations, used with the GAG mutant donor construct as template, are 5′-GAA TTA GAT AAA TGC GAA AAA ATT CGG TTA AGG CC-3′ and its complementary strand.

Generation of RNA Templates

Donor and acceptor DNA templates were generated from pNL182–520 and pJRC150–502, respectively (19). PCR primers AL(143)-gl-minus (5′-CTC TAA TAC GAC TCA CTA TAG G-3′) and MB24 (5′-CCC AGT ATT TGT CTA CAG CC-3′) were used to amplify donor DNA templates, whereas primers AL(143)-gl-minus and lg50 (5′-CTT CTG ATG ATT CTA ACA GGC C-3′) were used to amplify acceptor DNA templates. To amplify the short wild-type (sWT) donor DNA, primers ws9 (5′-CCT CTA ATA CGA CTC ACT ATA GGC ATA TTA AGC GGG GGA GAA-3′) and lg66 (5′-AGC TCC CTG CTT GCC CAT AC-3′) were used. To amplify the sGGG-mut donor DNA, primers ws10 (5′-CCT CTA ATA CGA CTC ACT ATA GGC ATA TTA AGC TGC GGA GAA-3′) and lg66 were used. To amplify the sWT acceptor DNA template, primers ws8 (5′-CCT CTA ATA CGA CTC ACT ATA GGA GAG AGA TGG GTG CG-3′) and lg69 (5′-TAT ATG TTT TAA TCT ATA TTT TTT C-3′) were used. RNA templates were produced by in vitro run-off transcription with the MEGAshortscript kit. Full-length transcripts were purified on 6% denaturing polyacrylamide gels (AccuGel 19:1) and checked for integrity.
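The forward primers above (AL(143)-gl-minus, ws8, ws9, and ws10) carry the consensus T7 promoter sequence TAATACGACTCACTATAG upstream of the HIV-1 sequence, which is what makes run-off transcription of the PCR products possible. The Python sketch below uses hypothetical helper names and is not part of the original protocol. It assumes transcription starts at the promoter's +1 G (the final G of that consensus), predicts the 5′ end of the resulting transcript, and lists the base differences between ws9 (sWT) and ws10 (sGGG-mut).

```python
# Hypothetical helpers (not from the paper): inspect T7-tailed PCR primers.
T7_CORE = "TAATACGACTCACTATAG"  # T7 promoter, -17..+1; the final G is +1


def predicted_transcript_5prime(primer: str) -> str:
    """RNA encoded by the primer from the +1 G onward (assumes +1 start)."""
    seq = primer.replace(" ", "").upper()
    start = seq.find(T7_CORE)
    if start < 0:
        raise ValueError("no T7 promoter found in primer")
    plus_one = start + len(T7_CORE) - 1  # index of the +1 G
    return seq[plus_one:].replace("T", "U")


def substitutions(a: str, b: str) -> list[tuple[int, str, str]]:
    """(0-based index, base in a, base in b) for two equal-length sequences."""
    a, b = a.replace(" ", ""), b.replace(" ", "")
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


WS9 = "CCT CTA ATA CGA CTC ACT ATA GGC ATA TTA AGC GGG GGA GAA"   # sWT donor
WS10 = "CCT CTA ATA CGA CTC ACT ATA GGC ATA TTA AGC TGC GGA GAA"  # sGGG-mut donor

print(predicted_transcript_5prime(WS9))  # GGCAUAUUAAGCGGGGGAGAA
print(substitutions(WS9, WS10))          # [(33, 'G', 'T'), (35, 'G', 'C')]
```

Under these assumptions, ws10 differs from ws9 at two positions, both inside the GCGGGGGAG stretch, which breaks the run of five Gs into shorter G pairs.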

Strand Transfer Assay

Annealing reactions were performed in a 20-μl final volume. The donor RNA (1 pmol) was incubated with 5′-end-labeled DNA primer MB24 (2 pmol) in 100 mM KCl at 95 °C for 5 min and slowly cooled to room temperature to anneal the primer to the donor RNA. Transfer assays were performed in a 12.5-μl final volume. Nucleocapsid protein at a concentration sufficient for 200% substrate coating (100% coating was defined as 1 NC molecule per 7 nucleotides) was preincubated at 37 °C for 5 min with acceptor RNA (100 fmol) and 0.5 μl of the annealing reaction (containing 50 fmol of primer and 25 fmol of donor) in 50 mM Tris-HCl, pH 8.0, 50 mM KCl, 1 mM dithiothreitol, and 1 mM EDTA. For donor extension assays, acceptor RNA was omitted. RT was added to a final concentration of 35 nM, and the mixture was incubated at 37 °C for 5 min. Reactions were initiated by adding MgCl2 and dNTPs to final concentrations of 6 mM and 50 μM, respectively. After incubation at 37 °C for the indicated times, reactions were terminated by adding 12.5 μl of termination dye (90% formamide, 10 mM EDTA, pH 8.0, and 1% each of xylene cyanol and bromphenol blue). Reaction products were resolved on 6% denaturing polyacrylamide gels (AccuGel 19:1), visualized with a Storm PhosphorImager (GE Healthcare), and analyzed with ImageQuant software version 1.2 (GE Healthcare). To test the effect of cations on pause profiles, 100 mM KCl, LiCl, or NaCl was included in the annealing reaction, and 50 mM of the corresponding salt was included in the transfer reaction.
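The NC dose follows from the coating definition above: 100% coating is one NC molecule per 7 nucleotides, so 200% coating is two molecules per 7 nucleotides. The arithmetic sketch below assumes that coating is computed over all nucleotides of the nucleic acids supplied to the reaction. The function names are hypothetical, and the RNA length in the example is a placeholder, since exact transcript lengths are not given here.

```python
def nc_required_fmol(species, coating_percent=200.0, nt_per_nc=7):
    """Return fmol of NC needed for the requested coating.

    species: iterable of (amount_fmol, length_nt) pairs, one per nucleic
    acid in the reaction. 100% coating = one NC per `nt_per_nc` nucleotides.
    """
    total_nt_fmol = sum(amount * length for amount, length in species)
    return total_nt_fmol / nt_per_nc * coating_percent / 100.0


def final_conc_nM(amount_fmol, volume_ul):
    """fmol per microliter is numerically equal to nM."""
    return amount_fmol / volume_ul


# Hypothetical example: 100 fmol of a 350-nt RNA in a 12.5-ul reaction.
nc = nc_required_fmol([(100, 350)])
print(nc, final_conc_nM(nc, 12.5))  # 10000.0 fmol -> 800.0 nM
```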

Transfer Distribution Assay

To locate the transfer events, transfer assays were performed, and the gels were analyzed by autoradiography while still wet. Bands corresponding to transfer products were excised. The DNA was recovered by elution, amplified by PCR using primers lg61 (5′-ACC AGA GCT CTA GGC GAT TGG AGC TCC-3′) and lg62 (5′-ACT GGG ATC CTG CCC AGT GTT TGT CTA C-3′), and cloned into the pCR2.1-TOPO vector (Invitrogen). Individual clones representing transfer products were first amplified with the TempliPhi amplification kit and then sequenced using the M13 (−20) forward primer (5′-GTA AAA CGA CGG CCA GT-3′) to identify the recombination sites. To address the possibility that false recombinant products were generated during PCR amplification, WT donor extension and acceptor extension products were mixed in equal amounts and subjected to the same PCR amplification as the transfer products. Because of natural sequence differences between the donor and acceptor templates at the 3′-end, equal amounts of reverse primers ws7 (for donor template) (5′-CTT CTG ATG TCT CTA AAA GGC C-3′) and lg40 (for acceptor template) (5′-CTT CTG ATG ATT CTA ACA GGC C-3′) were mixed with forward primer ws2 (5′-ACC AGA GCT CTA CGC CCG AAC AGG G-3′) for these PCR reactions. The PCR products were cloned and sequenced to determine the percentage of recombinants contributed by PCR.

Native Gel Assay

The assays were performed in a 12.5-μl reaction volume. 5′-32P-labeled sWT or sGGG-mut RNA in 50 mM Tris-HCl, pH 8.0, 1 mM dithiothreitol, and 1 mM EDTA was incubated with or without 50 mM salt (KCl, LiCl, or NaCl). The samples were heated at 95 °C for 1 min, cooled slowly to room temperature, and then separated by electrophoresis on 8% nondenaturing polyacrylamide gels (AccuGel 29:1). The gels were dried, visualized with a Storm PhosphorImager, and analyzed with ImageQuant software version 1.2.

RESULTS

Multiple Pauses in Vitro Correlate with the Peak of Transfer in Vivo

To understand the factors that contribute to the peak of transfers in HIV-1 gag in vivo, we used a previously described in vitro strand transfer assay in which the donor RNA (NL4-3) and acceptor RNA (JRCSF) harbor the same naturally occurring periodic sequence differences as in the cell culture-based system (28). These differences act as markers to identify the positions of crossovers in the transfer distribution assay. The donor and acceptor templates share 320 nt of homology spanning the primer-binding site through the beginning of gag (Fig. 1A). In this system, the primer is annealed to and extended on the donor template. The primer can switch from the donor onto the acceptor and complete synthesis on the acceptor to yield a 385-nt transfer product (TP), or it can complete synthesis on the donor without transfer to yield a 353-nt donor extension product (DE) (Fig. 1A).

FIGURE 1.

A series of regularly spaced pauses correlates with runs of G or A/G residues in the strand transfer assay in vitro. A, shown is a schematic of the WT RNA templates. Bold lines represent the NL4-3 donor RNA (black) and JRCSF acceptor RNA (gray), whereas the hatched lines at the 5′-ends of both RNA templates represent plasmid-derived sequences. Relevant regions and nucleotide positions in the genomic RNA corresponding to the donor and acceptor RNAs are indicated. 32P-Labeled DNA primer (indicated by the star and arrow) was initially annealed to and extended on the donor RNA. Full-length synthesis on the donor RNA resulted in the donor extension product (DE), whereas template switching resulted in the transfer product (TP). PBS, primer binding site. B, shown is a strand transfer assay using WT templates. Reactions were terminated at 2, 5, 10, 15, 30, 45, and 60 min. The transfer product (385 nt), donor extension (353 nt), and major pause sites are indicated on the right side of the gel, whereas stars along with the numbers on the left indicate approximate marker positions in the newly synthesized DNA. The arrow on the left side of the gel indicates the direction of DNA synthesis. Lane L, 25-base DNA ladder. C, shown is the WT donor RNA sequence around the gag hot spot. Three G or A/G runs, which correspond to the regularly spaced pauses in gag hot spot, are underlined. The positions of markers are indicated on the top of the sequence.

Fig. 1B shows a typical gel of products from a strand transfer reaction using wild type donor and acceptor templates. Marker locations are indicated on the left of the gel, and the major RT pause sites on the right. The arrow indicates the direction of DNA synthesis. According to the in vivo study, the peak of transfers falls within the 112-nt region between markers 4 and 9 (28). Examining the pause profile, we noticed a series of regularly spaced pause bands in and around the peak of transfers. RT pausing is known to occur at hairpin structures in the donor RNA (36) and can also occur at homopolymeric runs of nucleotides (37, 38). Several pause bands located downstream of the gag hot spot correlated with known hairpin structures, such as the DIS and splice donor (SD) hairpins. Remarkably, the locations of three pause sites (designated G1, A/G, and G2) in the hot spot region correspond to three runs of G or A/G residues (Fig. 1C). Compared with the HIV-1 genome as a whole (24% G) (33), the 112-nt sequence of the gag hot spot region is unusually G-rich (38%). The hot spot contains 43 G residues, including four separate runs of three or more consecutive Gs. We previously reported that pauses promote strand transfer (36). Based on these observations, we hypothesized that the runs of G or A/G residues in the gag hot spot pause RT in a way that is particularly favorable for promoting strand transfer.
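The G-content and G-run figures above are simple counts. The Python sketch below shows one way to make them; the helper names are hypothetical. Because the full 112-nt hot spot sequence is not reproduced in this section, the example uses the reported counts and the short sequence shown in Fig. 5A.

```python
import re


def g_fraction(seq: str) -> float:
    """Fraction of G residues in a sequence."""
    seq = seq.upper()
    return seq.count("G") / len(seq)


def g_runs(seq: str, min_len: int = 3) -> list[tuple[int, int]]:
    """0-based, half-open (start, end) spans of runs of >= min_len consecutive Gs."""
    return [m.span() for m in re.finditer(f"G{{{min_len},}}", seq.upper())]


# Counts reported in the text: 43 G residues in the 112-nt hot spot.
print(round(100 * 43 / 112))  # 38 (% G)

# The Fig. 5A sequence around G1 (5'-GGU UAA GGC CAG GGG G-3').
fig5a = "GGU UAA GGC CAG GGG G".replace(" ", "")
print(g_fraction(fig5a), g_runs(fig5a))  # 0.5625 [(11, 16)]
```

Here only the terminal GGGGG qualifies as a run of three or more Gs; the two GG pairs upstream fall below the `min_len` threshold.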

Base Substitutions in the Runs of G Residues Eliminate Corresponding Pauses

To test whether the regularly spaced G or A/G runs can pause RT and promote strand transfer, we made several base substitutions to disrupt the natural runs in the gag hot spot region (Fig. 2A). Strand transfer assays were performed with wild type and mutant templates to examine the pause profiles (Fig. 2B) and transfer efficiency (Fig. 2C). “A-mut” disrupts the run of A residues in the A/G run (nt 382–390) with two A-to-C base substitutions. The pause profile in and around the A-mut sequence change was the same as that of the wild type template, suggesting that the run of A residues on the donor RNA does not cause RT pausing and that the pause band corresponding to this A/G run may result from the three consecutive G residues within it. “G-mut” alters the G run between markers 5 and 6, the second G run encountered by RT (G2: nt 363–367). Significantly, this mutation eliminated the corresponding G2 pause, providing further evidence that runs of G residues stall RT. “AG-mut” contains base substitutions in the G run at marker 8 (G1: nt 405–409) in addition to those present in A-mut. Again, the A/G pause was not affected, but the G1 pause disappeared completely. “GAG-mut,” in which all three G and A/G runs are disrupted, exhibited the fewest pauses in the hot spot region, as expected. The only remaining significant pause band was in the A/G region, corresponding to its intact G run. These results highlight the importance of runs of G residues, rather than A residues, in causing RT pauses in the hot spot region.

FIGURE 2.

Strand transfer assay using mutant templates. A, sequences of WT and mutant donors are shown. Base substitutions made to disrupt the three G or A/G runs (underlined) in the gag hot spot are indicated in bold. B, strand transfer assays using mutant templates are shown. Different donor RNAs are indicated on the top of the gel; the acceptor RNA used in this assay is WT acceptor RNA. Reactions were terminated at 2, 5, 10, 20, and 40 min. Major pause sites are indicated on the left of the gel. DE, donor extension product; TP, transfer product. C, shown are relative transfer efficiencies of mutant templates. Transfer efficiency is calculated as: 100 × transfer products/(transfer products + full-length donor extension products). Relative transfer efficiency is calculated as: transfer efficiency of the mutant template/transfer efficiency of WT template.

Transfer Efficiencies Do Not Decrease Significantly in Mutant Substrates

Our group previously reported that pausing of RT on donor templates concentrates RT RNase H cuts in a way expected to create short gaps. These would be sites on the nascent DNA accessible for acceptor invasion in vivo or in vitro, facilitating transfer (13–15, 17). Given the evidence linking RT pausing with transfer, we hypothesized that the mutant templates, which lack some RT pauses, would be less effective than the wild type template in promoting transfer. Transfer efficiency was calculated as 100 × transfer products/(transfer products + full-length donor extension products). To our surprise, a comparison of wild type with mutant substrates, in which both donor and acceptor templates carried the mutations described above, showed that the mutations did not decrease transfer efficiency. With the G-mut, AG-mut, and GAG-mut sequence changes, transfer efficiency was almost the same as with the WT sequence even though pauses were eliminated (data not shown). In considering how to explain this, we realized that disruption of the G runs might have altered the folding of the mutant acceptor templates and thereby greatly increased their accessibility. Viewed in this manner, the mutations had two counteracting effects: a negative effect on the donor and a positive effect on the acceptor. To focus on the effect of pausing on the donor template, we modified the system by replacing the mutant acceptors with the wild type acceptor. The pattern of pauses was similar to that in the previous system (Fig. 2B), as expected because the primer-donor complexes were the same. This time we observed a small decrease in transfer efficiency with the G-mut (10.72%) and GAG-mut (22.23%) substrates (Fig. 2C); however, the transfer efficiency of the AG-mut substrate was similar to that of the wild type substrate, even though AG-mut had distinctly reduced pausing (Fig. 2C).
The transfer efficiency of A-mut was also similar to that of the wild type, which was expected, because the A-mut and wild type substrates shared a similar pause pattern.
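The transfer efficiency and relative transfer efficiency defined in the Fig. 2 legend can be written as two small functions. This sketch assumes that TP and DE are background-corrected band intensities taken from the same lane. The function names are hypothetical, and the numbers in the example are invented for illustration, not data from this study.

```python
def transfer_efficiency(tp: float, de: float) -> float:
    """100 x TP / (TP + full-length DE), from band intensities of one lane."""
    total = tp + de
    if total <= 0:
        raise ValueError("no full-length products quantified")
    return 100.0 * tp / total


def relative_efficiency(tp_mut: float, de_mut: float,
                        tp_wt: float, de_wt: float) -> float:
    """Transfer efficiency of a mutant template divided by that of WT."""
    return transfer_efficiency(tp_mut, de_mut) / transfer_efficiency(tp_wt, de_wt)


# Hypothetical intensities (arbitrary units), not measurements from the paper.
print(transfer_efficiency(40, 60))          # 40.0
print(relative_efficiency(10, 90, 40, 60))  # 0.25
```

Because full-length DE appears in the denominator, a change that increases full-length synthesis on the donor lowers the computed efficiency even if the absolute amount of TP is unchanged.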

An Increase of Transfer Events at Downstream Pausing Sites Compensates for the Decrease of Transfer Events at the Hot Spot with the GAG-mut Substrate

The finding that disruption of the runs of G residues eliminated the corresponding pauses without altering transfer seems to conflict with the currently held idea that pauses promote transfer. A likely explanation was suggested by an additional effect of the mutations: pausing was noticeably enhanced in the mutant substrates at pause sites downstream in the direction of DNA synthesis, including the DIS and SD, suggesting that transfer events might have shifted to these downstream pause sites (Fig. 2B). To explore this possibility, we compared the transfer profile of the GAG-mut substrate, which exhibited the fewest pauses, with that of the wild type substrate. There are 15 single base pair differences between the donor (NL4-3) and acceptor (JRCSF) in the 320-bp homology region. These naturally occurring sequence variations were used as markers to map the crossover sites within the transfer products. Briefly, transfer products were isolated, cloned, and sequenced. Each clone was aligned against the donor and acceptor, and every marker on the transfer product was classified as characteristic of either NL4-3 or JRCSF to locate the crossover sites. Fig. 3 presents the distribution of transfer events determined from three independent experiments. The arrows on the right of the charts indicate the direction of DNA synthesis. Most transfer products resulted from a single template-switching event. Approximately 1 in 20 clones resulted from triple crossovers, and a product with five crossovers was even observed.
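The marker-based mapping described above reduces to finding adjacent markers whose origin changes. The minimal sketch below assumes each marker on a clone has already been called as donor-like or acceptor-like from the sequence alignment. The function name and the clone data are hypothetical.

```python
def crossover_intervals(calls):
    """Locate template switches in one sequenced transfer-product clone.

    calls: (marker, origin) pairs listed in order along the clone, where
    origin is "D" (donor, NL4-3) or "A" (acceptor, JRCSF).
    Returns (marker_before, marker_after) pairs bracketing each switch.
    """
    return [(m1, m2)
            for (m1, o1), (m2, o2) in zip(calls, calls[1:])
            if o1 != o2]


# Hypothetical clones (marker calls invented for illustration).
single = [("9", "D"), ("8", "D"), ("7", "A"), ("6", "A")]
triple = [("9", "D"), ("8", "A"), ("7", "D"), ("6", "A")]
print(crossover_intervals(single))       # [('8', '7')]
print(len(crossover_intervals(triple)))  # 3
```

A clone with several switches, like the triple crossovers mentioned above, yields several intervals. The same check could flag PCR-generated chimeras in the control in which donor and acceptor extension products were mixed, since any switch there must arise during amplification.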

FIGURE 3.

Transfer profiles of WT (A) and GAG-mut (B) templates in vitro. The template descriptions are the same as in Fig. 2A. The x axis indicates the markers for the recombination events, numbered 1a–14. The y axis indicates the number of transfer events between markers divided by the total number of transfer events over the entire homology region, corrected for the number of bases between markers. Data from three independent experiments are shown in black, white, and gray. The arrows on the right side of the charts indicate the direction of DNA synthesis.
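The y-axis quantity in Fig. 3 can be sketched as follows. The sketch assumes that "corrected for the number of bases between markers" means dividing each interval's share of events by its length in nucleotides; the legend does not state any further scaling. The function name, counts, and interval lengths are hypothetical.

```python
def per_base_distribution(events, interval_nt):
    """Fraction of all transfer events in each marker interval, per base.

    events[k]: number of transfer events mapped between two adjacent markers.
    interval_nt[k]: number of bases separating those markers.
    """
    if len(events) != len(interval_nt):
        raise ValueError("one interval length is needed per event count")
    total = sum(events)
    if total == 0:
        raise ValueError("no transfer events")
    return [(n / total) / nt for n, nt in zip(events, interval_nt)]


# Hypothetical counts: equal event numbers in a short and a long interval.
print(per_base_distribution([10, 10], [20, 40]))  # [0.025, 0.0125]
```

Without the per-base correction, a long interval would look like a hot spot simply because it offers more positions at which a crossover can occur.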

We noticed a striking difference in transfer profile between the WT and GAG-mut templates from markers 5 to 9, the region including the three runs of G or A/G residues. With the wild type template, 32.61% of all transfer events occurred within this region (Fig. 3A), whereas only 8.64% of crossovers occurred within the same region of the GAG-mut template (Fig. 3B). This 3.8-fold decrease in crossover events, resulting from mutation of only 6 nucleotides in a 65-nt-long region, highlights the importance of these G runs. Significantly, the transfer frequency in the region downstream of the G runs (before marker 5) increased substantially with the GAG-mut substrate (81.49%) compared with the wild type substrate (63.51%). This result is consistent with the enhancement of the downstream pauses around SD and DIS in the pause profile (Fig. 2B), confirming the shift of recombination events from the hot spot region to the enhanced downstream pause sites. Similarly, the recombination frequency upstream of the G runs (after marker 9) was higher with the GAG-mut substrate (9.86%) than with the wild type substrate (3.88%). These results not only highlight the contribution of runs of G residues to recombination in and around the hot spot region but also demonstrate that, although the overall transfer efficiency decreased only slightly as a result of the mutations (Fig. 2C), eliminating pauses by disrupting G runs substantially decreased transfer efficiency in the hot spot region.
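The three regions compared above (downstream of marker 5, markers 5–9, and upstream of marker 9) partition the homology region, so each profile's shares should add up to about 100%. The fold change in the hot spot is the ratio of the two markers 5–9 shares. A short Python check using only the percentages given in this paragraph:

```python
# Percentages of all crossovers in each region, as reported in the text.
wt = {"downstream (before marker 5)": 63.51,
      "markers 5-9": 32.61,
      "upstream (after marker 9)": 3.88}
gag_mut = {"downstream (before marker 5)": 81.49,
           "markers 5-9": 8.64,
           "upstream (after marker 9)": 9.86}

for name, profile in (("WT", wt), ("GAG-mut", gag_mut)):
    print(name, round(sum(profile.values()), 2))  # each profile sums to ~100%

fold = wt["markers 5-9"] / gag_mut["markers 5-9"]
print(round(fold, 1))  # 3.8
```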

Transfer Efficiency of a Substrate with Disrupted G Runs Decreased Substantially in Short Templates without Downstream Pause Sites

To avoid effects of downstream pause sites on the overall rate of strand transfer and to concentrate on the significance of the runs of G residues in the hot spot region, we designed short wild type and mutant templates lacking the downstream pause sites. Based on our experience with the longer templates, we created a short mutant template in which all three G runs were disrupted (sGGG-mut) in an attempt to minimize pausing in the hot spot region (Fig. 4A). This design was based on the observation that disruption of the A run affected neither pausing nor transfer efficiency (Fig. 2, B and C). The sequences around the hot spot region of the sWT and sGGG-mut templates are shown in Fig. 4A. The short donor (NL4-3) and acceptor (JRCSF) RNA templates share an 84-nt homologous sequence (Fig. 4C). The runs of G residues are located near the 5′-end of the donor RNA template, within the homology region. Strand transfer assays were performed under the same conditions as in the previous experiments but with sWT or sGGG-mut donor and sWT acceptor templates. As expected, three pause bands corresponding to the G runs appeared clearly with sWT, whereas these bands were almost completely eliminated with sGGG-mut (Fig. 4B). Strikingly, the transfer efficiency of sGGG-mut was about 72% lower than that of sWT (Fig. 4D). This result demonstrates that the G runs are indeed responsible for the high transfer efficiency of the hot spot region, and that disrupting these G runs both eliminated the pauses and suppressed strand transfer.

FIGURE 4.

Strand transfer assay using short templates that did not include downstream pause sites. A, sequences of sWT and sGGG-mut donor RNAs are shown. Base substitutions made to disrupt the three G or A/G runs (underlined) in the gag hot spot are indicated in bold. B, strand transfer assay of sWT and sGGG-mut templates is shown. Different donor RNAs are indicated on the top of the gel; the acceptor RNA used in this assay was sWT acceptor RNA. Reactions were terminated at 2, 5, 10, 20, and 40 min. The lane labeled Donor extension (DE) shows an sWT donor extension reaction in the absence of acceptor template, terminated at 40 min. C, shown is a schematic of the short RNA templates. Bold lines represent the NL4-3 donor RNA (black) and JRCSF acceptor RNA (gray), whereas the hatched line at the 5′-end of the donor RNA represents several nucleotides added to prevent end transfer. The nucleotide positions in the HIV-1 genomic RNA corresponding to the donor and acceptor RNAs are indicated. Three G or A/G runs are located near the 5′-end of the donor template. D, relative transfer efficiencies of sWT and sGGG-mut templates are shown. DE, donor extension product; TP, transfer product.

Guanosine-rich Sequences in the Hot Spot Region Can Form an Intramolecular G-quartet

Some G-rich sequences can form a unique G-quartet structure in vitro, a structure first identified in telomeric DNA (39–41). G-rich sequences can form either intermolecular or intramolecular G-quartets (42, 43). Intermolecular G-quartets form by the association of either two or four parallel strands (42). For example, to form an intermolecular G-quartet from four parallel strands, each strand must contain a run of G residues (usually 3–5 Gs). The first G residues of these runs, one from each strand, form a square planar array (the core structural element) through Hoogsteen hydrogen bonds. The next G residues in each run form the same core element and stack on the first, and so on (40). An intramolecular G-quartet is formed by repeated folding of a single polynucleotide containing four evenly spaced runs of G residues, in which the G runs form the core structural elements and the intervening sequences are placed in loops (40). Fig. 5A shows an example of an intramolecular G-quartet structure formed by the sequence in the gag hot spot. G-quartet formation is favored by specific monovalent cations (44); in general, the order of preference is K+ > Na+ > Cs+ > Li+ (44). Our study focused only on potential intramolecular G-quartets, because these form more readily under physiological conditions (45, 46).

FIGURE 5.

Formation of an intramolecular G-quartet by the sequence around G1. A, shown is a schematic representation of the intramolecular G-quartet structure. The structure is formed by 5′-GGU UAA GGC CAG GGG G-3′. B, native gel assay of sWT and sGGG-mut templates is shown. Different templates and cations are indicated on the top of the gel. Bands representing monomer and intramolecular G-quartet are indicated.

We first used QGRS Mapper, a web-based program for G-quartet prediction, to examine whether the guanosine-rich sequence in the gag hot spot region is likely to form intramolecular G-quartets (47). According to QGRS Mapper, the sWT donor RNA is capable of forming an intramolecular G-quartet, whereas the sGGG-mut donor RNA, in which all the G runs were disrupted, cannot form this structure. The predicted G-quartet-forming sequence is located around G1 and contains all the G residues of G1 plus four other G residues between G1 and A/G (Fig. 5A). To confirm actual structure formation, a native gel assay was performed using labeled sWT and sGGG-mut donor RNA templates (Fig. 5B). In the absence of cations or in the presence of Li+, sWT and sGGG-mut donor RNAs migrated strictly according to length. However, in the presence of K+ or Na+, sWT formed a more compact structure that migrated faster than sWT in the presence of Li+, consistent with intramolecular folding. This cation preference agrees with the known formation conditions for G-quartets. Moreover, sGGG-mut RNA did not show the sharply cation-dependent mobility indicative of compact structure formation, suggesting that intact G runs are required to form the structure. Thus, the native gel assay demonstrated the ability of the G runs in the gag hot spot to form the unique intramolecular G-quartet structure.
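QGRS Mapper searches for candidate motifs of the form G_x N_y1 G_x N_y2 G_x N_y3 G_x, four equal G runs separated by three loops. The sketch below is a minimal brute-force scanner in that spirit and is not QGRS Mapper itself: it assumes at least two G residues per run, a maximum motif length of 30 nt, and at most one zero-length loop, and it omits QGRS Mapper's G-score ranking and its filtering of overlapping motifs. The function names are hypothetical. Applied to the 16-nt sequence shown in Fig. 5A, one of its candidates is a two-tetrad arrangement with loops of 4, 3, and 1 nt spanning the whole sequence.

```python
from itertools import product
from typing import List, Tuple

# (start, end_exclusive, tetrads, (loop1, loop2, loop3)), 0-based coordinates
Motif = Tuple[int, int, int, Tuple[int, int, int]]


def is_g_run(seq: str, pos: int, x: int) -> bool:
    """True if seq has x consecutive G residues starting at pos."""
    return pos + x <= len(seq) and seq[pos:pos + x] == "G" * x


def find_g4_motifs(seq: str, min_tetrads: int = 2, max_len: int = 30) -> List[Motif]:
    """Enumerate G_x N_y1 G_x N_y2 G_x N_y3 G_x candidates (assumed rules, see lead-in)."""
    seq = seq.upper().replace("T", "U")
    hits: List[Motif] = []
    for x in range(min_tetrads, max_len // 4 + 1):
        loop_budget = max_len - 4 * x  # nucleotides left for the three loops
        for start in range(len(seq)):
            if not is_g_run(seq, start, x):
                continue
            for loops in product(range(loop_budget + 1), repeat=3):
                if sum(loops) > loop_budget or loops.count(0) > 1:
                    continue
                y1, y2, y3 = loops
                p2 = start + x + y1
                p3 = p2 + x + y2
                p4 = p3 + x + y3
                if all(is_g_run(seq, p, x) for p in (p2, p3, p4)):
                    hits.append((start, p4 + x, x, loops))
    return hits


for motif in find_g4_motifs("GGUUAAGGCCAGGGGG"):
    print(motif)
```

Because loops may themselves contain G residues, several overlapping candidates are reported for the same G-rich stretch; choosing among them is what a scoring scheme such as QGRS Mapper's G-score is for.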

The Cation-dependent Pause Profile Suggests That an Intramolecular G-quartet Caused RT Pausing and Facilitated Transfers

To investigate the link between the special structural motif of the gag hot spot and its high frequency of crossovers, we performed strand transfer assays using WT and GAG-mut RNA substrates with different cations (K+, Li+, or Na+) in the reactions (Fig. 6). Monovalent cations added to the transfer reaction do not significantly affect the enzymatic activity of RT (48, 49). In the presence of K+, the wild-type substrate showed pause sites caused either by known hairpin structures, such as DIS, SD, ψ, and AUG, or by runs of G residues (G1, A/G, and G2). In the presence of Li+ or Na+, the pauses caused by hairpin structures on the donor template were not affected, whereas the pauses caused by G runs, especially the G1 pause, were greatly reduced. This cation-dependent pause profile suggests that intramolecular G-quartets represent a distinct mechanism for pausing HIV-1 RT. A similar result was observed with the short templates lacking downstream pause sites (data not shown).

FIGURE 6.

Strand transfer assay of WT and GAG-mut templates under different cation conditions. The template descriptions are the same as in Fig. 2A. Different templates and cations are indicated at the top of the gel. The acceptor RNA used in this assay is WT acceptor RNA. Reactions were terminated at 5, 10, 20, and 40 min. Major pause sites are indicated on the left side of the gel. DE, donor extension product; TP, transfer product.

With the GAG-mut template, the G1 and G2 pause bands were eliminated even in the presence of K+, whereas the A/G pause was still present because the G residues in A/G were not mutated (Fig. 6). Interestingly, this A/G pause was weakened significantly when Li+ or Na+ was present, consistent with its participation in cation-dependent G-quartet formation. Because the nearby G runs had already been disrupted in the GAG-mut template, the A/G run might have participated in the formation of an intermolecular G-quartet tetramer.

It surprised us that the G1 pause disappeared in the WT template when Na+ was present, as the formation of the intramolecular G-quartet was observed in the native gel analysis. A possible explanation is that the intramolecular G-quartet prefers K+ to Na+, and although the quartet structure can form in the presence of either K+ or Na+, this structure is more stable with K+ (50). Moreover, the quantitation of band density using ImageQuant software indicated a weak but detectable G1 pause band in the presence of Na+. Overall, these results demonstrate that intramolecular G-quartet structure formation at the runs of G caused RT pausing on the donor template. This pausing was presumably effective in promoting transfer by favoring RT RNase H cleavages as occur with pausing at hairpins. Of course, transfer was observed under all conditions in this experiment, because this template allows hairpin-based pausing.
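The weak G1 band was detected by densitometry with ImageQuant. As a hypothetical illustration of that kind of comparison, the sketch below expresses one pause band as a fraction of total lane signal, assuming background-subtracted band intensities have already been exported from the gel image. The band names and numbers are invented for the example and are not data from Fig. 6.

```python
from typing import Dict


def pause_fraction(lane: Dict[str, float], band: str) -> float:
    """Fraction of the total lane signal found in one band.

    lane: background-subtracted intensities for every quantitated band in one lane
    band: name of the pause band of interest (hypothetical labels)
    """
    total = sum(lane.values())
    if total <= 0:
        raise ValueError("lane has no signal")
    return lane.get(band, 0.0) / total


# Hypothetical WT-template lanes from the same time point
k_lane = {"G1": 120.0, "DIS": 300.0, "full_length": 580.0}
na_lane = {"G1": 15.0, "DIS": 310.0, "full_length": 675.0}

k_g1 = pause_fraction(k_lane, "G1")
na_g1 = pause_fraction(na_lane, "G1")
print(f"G1 pause with Na+ relative to K+: {na_g1 / k_g1:.3f}")
```

Normalizing to total lane signal keeps a band that is weak but nonzero distinguishable from an absent one even when lanes differ in loading.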

DISCUSSION

It is important to study the mechanism of strand transfer using natural HIV-1 hot spot sequences in vitro, because this will further our understanding of template features that promote recombination in vivo. The gag recombination hot spot was identified in assays designed to test whether recombination during HIV-1 replication is more frequent near DIS. DIS dimerizes the donor and acceptor RNAs by forming a kissing-loop complex between hairpins in the two RNAs (51). Strand transfer in vitro with templates containing DIS was highly efficient (19). We attributed this result to a “proximity effect,” in which dimerization produces a high local concentration of acceptor, an environment that promotes strand transfer (52). Surprisingly, using a cell culture-based HIV-1 recombination assay system, we found a strikingly prominent peak of recombination in the nearby 5′-end region of the gag gene, whereas only a small peak of crossovers occurred at the DIS (28). Strand transfer assays in vitro showed regularly spaced pauses that correlated with three G runs in the gag hot spot. We propose that these G residues have a special ability to pause RT in a way that is particularly favorable for promoting transfer. Here, we present evidence that RT pausing is critical for highly efficient transfer in the gag hot spot and that intramolecular G-quartet formation is involved in stalling RT in the same region.

We first examined the RT pause profile of the WT template spanning the primer-binding site through the beginning of the gag (Fig. 1B). Notably, three pauses between markers 5 and 9 were not derived from hairpin structure but instead correlated with three runs of G or A/G residues. This correlation was reasonable, because homopolymeric nucleotide runs had been shown to promote RT pausing during reverse transcription (37, 38). Significantly, homopolymeric nucleotide runs were also reported to affect the rate and the location of the recombination events in a spleen necrosis virus recombination system (53). These connections encouraged us to further investigate the relationship between these runs of G residues and the rate of recombination in the gag hot spot.

To test how these G or A/G residues affect RT pausing and strand transfer in our system, we disrupted them by base substitutions. Surprisingly, although the corresponding pauses were eliminated by mutating G residues, the mutations did not substantively change the predicted probabilities of folded structure at the pause sites, according to the folding analysis program RNAstructure (54). This result indicated that the elimination of the pauses at the gag hot spot in mutant substrates was not caused by alteration of hairpin structures in this region. Moreover, although many studies have demonstrated that pauses promote transfer, we did not observe any decrease in transfer frequency using mutant donor and mutant acceptor templates. We reasoned that the mutant acceptors present in our strand transfer assay complicated our system. On one hand, hairpin structures on the acceptor RNA were reported to strongly promote transfer events (55). On the other hand, later observations indicated that folding in the acceptor template would block the invasion process, a critical step in the transfer mechanism (17, 36). To avoid these complex influences of acceptor structure on transfer efficiency, we simplified our assays by using WT acceptor in the reactions.

The self-complementary sequence in the DIS loop is known to dimerize HIV-1 RNAs by forming a kissing-loop dimeric structure (51, 56). In addition, the runs of G residues in the gag hot spot of HIV-1 have been studied extensively in vitro for their potential role in facilitating dimerization of the two genomic RNAs by forming intermolecular G-quartet structures (57–59). This role has also been proposed for G residues in Moloney murine sarcoma virus genomic RNAs, and it was further concluded that the kissing loop interaction is driven by the initial formation of intermolecular G-quartet dimers (60, 61). Multiple dimerization sites from the DIS to the beginning of gag would keep the two RNA genomes aligned and in close proximity, in a way expected to promote recombination. However, our results indicate that these proximity and alignment mechanisms may not be the only G run-derived processes that contribute to the gag hot spot. The intramolecular G-quartet formed in the same region as the intermolecular G-quartets is also highly effective in promoting transfer events. The ability of the beginning of the gag region to form both intermolecular and intramolecular G-quartets distinguishes this region from the surrounding sequences, including DIS, as a hot spot for recombination in vivo.

It was evident from our results that the distribution of transfer events with the WT template in vitro did not exactly match that measured in vivo (Fig. 3). The transfer profile in vivo showed more than 60% of recombination events within a 112-nt-long region between marker 4 and marker 9 and only about 4% before marker 4 (28). In the transfer profile in vitro, however, only 45.21% of crossovers were located in the 112-nt-long hot spot region, whereas 50.91% of transfer events occurred downstream, in a 123-nt-long region containing the strong pause sites DIS and SD. We designed the RNA substrates carefully to avoid donor end transfer, so end transfer cannot account for the discrepancy. We also excluded PCR-generated recombinants from the transfer profile in vitro using methods described previously (28). Results from independent experiments revealed that 20–35% of the PCR amplification products contained markers from both donor (NL4-3) and acceptor (JRCSF) RNAs. We mapped the distribution of PCR-generated recombination (corrected for the observed ∼27.5% transfer frequency) and obtained the crossovers resulting only from RT-mediated transfer by subtracting the PCR-generated crossover frequency from the observed crossover frequency at each marker. Nonetheless, a very high relative frequency of crossovers was still observed at the downstream pause sites in vitro. It appears that although the G sequences promote transfer in the hot spot region, they are not strikingly more effective than the hairpins at downstream pause sites in vitro. Given the dual roles of G runs at the hot spot, we reasoned that the assay conditions, including the salt or template concentration, might not favor both forms of G-quartet structure. For example, formation of the intermolecular G-quartet could be RNA concentration-dependent (50). In the study that identified dimerization of HIV-1 genomic RNAs by intermolecular G-quartets, the authors detected dimer formation at an RNA strand concentration of 1.5 mM (58), almost 10-fold higher than the concentration we used to detect the intramolecular structure. It is also possible that other features of transfer in vivo, such as the presence of protein chaperones, make the G region more effective and are not represented accurately in our assay in vitro. These are important parameters, and we will continue to search for them.
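The PCR correction described above amounts to a per-marker subtraction. A minimal sketch, assuming the PCR-generated crossover frequencies have already been scaled as described previously (28) (that scaling is not reproduced here), that negative differences are clipped to zero, and that the corrected values are then renormalized to a percentage distribution across marker intervals. The function names and example frequencies are hypothetical.

```python
from typing import List


def rt_mediated_crossovers(observed: List[float], pcr: List[float]) -> List[float]:
    """Subtract the PCR-generated crossover frequency from the observed one at each marker interval."""
    if len(observed) != len(pcr):
        raise ValueError("observed and pcr must cover the same marker intervals")
    # Clipping at zero is an assumption of this sketch, not a documented step.
    return [max(o - p, 0.0) for o, p in zip(observed, pcr)]


def to_percent_distribution(freqs: List[float]) -> List[float]:
    """Express each interval's crossovers as a percentage of all RT-mediated crossovers."""
    total = sum(freqs)
    if total <= 0:
        raise ValueError("no crossovers remain after correction")
    return [100.0 * f / total for f in freqs]


observed = [0.10, 0.30, 0.40, 0.20]  # hypothetical, one value per marker interval
pcr = [0.05, 0.10, 0.10, 0.05]       # hypothetical, already-scaled PCR crossovers
dist = to_percent_distribution(rt_mediated_crossovers(observed, pcr))
print([round(d, 1) for d in dist])
```

Renormalizing after the subtraction is what allows statements such as "45.21% of crossovers in the hot spot region" to refer to RT-mediated events only.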

Although not equivalent to the profile in vivo, the transfer profile in vitro was striking in two ways. First, disruption of the G runs greatly decreased the frequency of strand transfer in the hot spot region, especially between markers 5 and 9, the sequence that overlaps the G runs. Second, a higher frequency of recombination was observed at downstream pause sites with the GAG-mut template, indicating that the effect of the base substitutions on transfer frequency extended beyond the region where the point mutations were made. Importantly, the transfer profiles before and after mutation remained consistent with the pause profiles. That is, stronger pausing was observed at DIS and SD in the GAG-mut template, where the transfer profile shifted, demonstrating the correlation between pauses and transfers. Results from later experiments using short substrates without the downstream pause sites supported the conclusion that G runs are responsible for the high rate of recombination in the gag hot spot.

Our RT pausing assays and computational predictions both indicated that the G1 run, together with four G residues between A/G and G1, could form an intramolecular G-quartet under conditions (cation concentration and buffer) similar to those used in the strand transfer assay. We propose that this intramolecular G-quartet is the factor that pauses RT, at least at the G1 site. Strand transfer assays in which this G-quartet was disrupted, either by base substitutions in G1 or by cation conditions unfavorable for G-quartet formation, showed elimination or near elimination of the G1 pause, as expected.

Although these results explain one possible cause of RT pausing at G residues, how the G2 and A/G runs stall RT remains unresolved. G2 and A/G are not predicted to be directly involved in the intramolecular G-quartet, yet they caused RT to pause, and these pauses were nearly eliminated in buffer containing Li+. One explanation is that the G runs have an intrinsic ability to produce pauses; if this were true, however, their sensitivity to cations would be hard to explain. Alternatively, A/G and G2 may form a structure other than an intramolecular G-quartet that both causes RT pausing and is sensitive to cation conditions. This structure may be the intermolecular G-quartet. According to a model proposed by Dipankar Sen and Walter Gilbert (50), dimerization through G-quartets first requires a single strand containing at least two G runs to fold back on itself. Hoogsteen hydrogen bonds then form between each pair of G residues from the two G runs, producing a hairpin-like structure. Two strands, each containing one of these hairpin-like structures, then connect to each other between corresponding G residues, producing a dimer (50). If this model is accurate, the hairpin-like structure on the donor RNA should be able to cause RT pausing just as other hairpins do, but its formation would be influenced by specific cations.

It is not surprising that an intramolecular G-quartet stalls HIV-1 RT, because previous studies demonstrated that this structure can pause Taq DNA polymerase, Escherichia coli DNA polymerase, and avian myeloblastosis virus RT (62). Moreover, a computational study found that potential intramolecular G-quartet-forming sequences are enriched within recombination hot spots in the human genome (63): about 37% of recombination hot spots contained at least one G-quartet motif, compared with 13.8% of cold spots. This computational study, in agreement with our data, points to a positive relationship between intramolecular G-quartets and the rate of recombination.

Although direct evidence for G-quartet structures in vivo is still lacking, the coincident positioning of intramolecular G-quartet motifs with cellular processes other than recombination, such as transcription and replication, supports the significance of these structures (64). For example, intramolecular G-quartet formation in a promoter sequence has been linked to the transcriptional activity of the corresponding gene. According to a genome-wide computational analysis, 43% of human protein-coding genes contain at least one G-quartet sequence in their promoter region (65). Furthermore, studies of Werner syndrome suggest that intramolecular G-quartets also affect DNA replication (66). Werner syndrome is a premature aging disorder; Werner syndrome cells show a slow growth rate and accelerated telomere shortening (67). The disease is caused by loss of the Werner syndrome helicase (WRN), which is responsible for resolving G-quartet structures on the lagging strand of telomeric DNA (68). Without WRN, these intramolecular G-quartets stall the replication fork and result in telomere loss (69, 70).

In conclusion, our work implicates intramolecular G-quartet structure in strand transfer in HIV-1. Strand transfer assays indicate that formation of this structure pauses RT synthesis and that these pauses could initiate acceptor invasion-driven transfer events. Taken together, our results imply that the activity of the gag hot spot for recombination is facilitated by its unusually high content of G runs, their ability to form G-quartet structure, and the capacity of that structure to promote high-efficiency transfer.

Acknowledgments

We thank Dr. Dorota Piekna-Przybylska, Sean T. Rigby, Min Song, and Shuya Kyu for helpful discussions and critical reading of the manuscript. We also thank Dr. Robert J. Gorelick for providing NCp9.

*

This work was supported, in whole or in part, by National Institutes of Health Grant GM049573 (to R. A. B.).

4
The abbreviations used are: HIV-1, human immunodeficiency virus, type 1; RT, reverse transcriptase; DIS, dimerization initiation sequence; NC, nucleocapsid; nt, nucleotide(s); R, repeat; MA, matrix; CA, capsid; SD, splice donor; WT, wild type; sWT, short WT.

REFERENCES


Articles from The Journal of Biological Chemistry are provided here courtesy of American Society for Biochemistry and Molecular Biology