The Journal of Biological Chemistry. 2008 Jun 6;283(23):15965–15974. doi: 10.1074/jbc.M801354200

The GP(Y/F) Domain of TF1 Integrase Multimerizes when Present in a Fragment, and Substitutions in This Domain Reduce Enzymatic Activity of the Full-length Protein

Hirotaka Ebina 1, Atreyi Ghatak Chatterjee 1,1, Robert L Judson 1,1, Henry L Levin 1,2
PMCID: PMC2414268  PMID: 18397885

Abstract

Integrases (INs) of retroviruses and long terminal repeat retrotransposons possess a C-terminal domain with DNA binding activity. Other than this binding activity, little is known about how the C-terminal domain contributes to integration. A stretch of conserved amino acids called the GP(Y/F) domain has been identified within the C-terminal IN domains of two distantly related families, the γ-retroviruses and the metavirus retrotransposons. To enhance understanding of the C-terminal domain, we examined the function of the GP(Y/F) domain in the IN of Tf1, a long terminal repeat retrotransposon of Schizosaccharomyces pombe. The activities of recombinant IN were measured with an assay that modeled the reverse of integration called disintegration. Although deletion of the entire C-terminal domain disrupted disintegration activity, an alanine substitution (P365A) in a conserved amino acid of the GP(Y/F) domain did not significantly reduce disintegration. When assayed for the ability to join two molecules of DNA in a reaction that modeled forward integration, the P365A substitution disrupted activity. UV cross-linking experiments detected DNA binding activity in the C-terminal domain and found that this activity was not reduced by substitutions in two conserved amino acids of the GP(Y/F) domain, G364A and P365A. Gel filtration and cross-linking of a 71-amino acid fragment containing the GP(Y/F) domain revealed a surprising ability to form dimers, trimers, and tetramers that was disrupted by the G364A and P365A substitutions. These results suggest that the GP(Y/F) residues may play roles in promoting multimerization and intermolecular strand joining.


Retroviruses and long terminal repeat (LTR) retrotransposons are closely related elements that depend on integrase (IN) to insert their cDNA into the genome of host cells. IN proteins are composed of three structurally distinct domains (1). The N-terminal domains contain an HHCC motif, and the catalytic core domains in the center of INs possess the DDE residues that mediate catalysis. The C-terminal domains of INs have little sequence conservation but possess nonspecific DNA binding activity.

INs expressed as recombinant proteins possess a variety of catalytic activities. Oligonucleotides that model the termini of LTRs are trimmed by IN in a processing reaction that removes terminal nucleotides 3′ of the conserved CA. Once the processing reaction is complete, strand transfer occurs. In this reaction, the 3′-hydroxyls of the terminal A serve as nucleophiles in transesterification reactions that cleave phosphodiester bonds in the target DNA and make covalent bonds between the 3′-ends of the viral DNA and the 5′-ends of the target DNA (2, 3). Under highly specific conditions, IN can catalyze concerted integration, the simultaneous insertion of two ends of donor DNA into the same site of target DNA (4–7). INs also catalyze disintegration, the reverse of integration, which is assayed with model substrates that mimic one end of an LTR inserted into target DNA (8, 9). This assay is a particularly sensitive method for measuring catalytic activity of INs, perhaps because it detects strand breaking and joining within a single substrate molecule, an intramolecular reaction.

Currently, no molecular structures exist of an IN that possesses all three domains. The structures that do exist are of individual domains and do not include bound DNA. As a result, the function of the C-terminal domain in integration is not clear. An additional difficulty in determining the function of the C-terminal domains stems from their low levels of sequence conservation. However, close examination of C termini did identify two separate sequence modules that exist either alone or in combination in a wide variety of INs (10). One module termed the GP(Y/F) domain is present in the INs of a diverse set of LTR retrotransposons of the Metaviridae family (formerly called the Ty3/gypsy family) (10, 11). In addition, the GP(Y/F) domain is also present in the distantly related genus of γ-retroviruses that includes the Moloney murine leukemia virus (M-MuLV). The function of the GP(Y/F) domain has not been studied. The chromodomain (CHD) is another module discovered in the C termini of INs and is present in the metavirus genus of LTR retrotransposons. The CHD is similar to the domains in HP1 proteins that mediate the formation of heterochromatin by binding histone H3 when methylated at Lys-9. Recent results indicate that CHDs in some INs do have interactions with histone H3 methylated at Lys-9 (12).

Tf1 is an LTR retrotransposon of Schizosaccharomyces pombe that integrates specifically upstream of polymerase II-transcribed genes (13–15). The IN of Tf1 possesses the HHCC motif near the N terminus and the DDE motif in the central region. Interestingly, the C-terminal portion of the Tf1 IN possesses both the GP(Y/F) domain and the CHD (10). Recent experiments revealed that Tf1 IN purified as a recombinant protein possesses significant activity in assays that measure 3′ processing, strand transfer, and disintegration (16). Assays of Tf1 IN without the CHD revealed the surprising result that the CHD restricts catalytic activity by as much as 8-fold (16).

The experiments reported here use the IN of Tf1 as a model in order to study the function of the GP(Y/F) domain. A series of deletions in recombinant IN revealed that the C-terminal domain was required for disintegration activity. However, a single amino acid substitution in a conserved amino acid of the GP(Y/F) domain (P365A) did not significantly reduce disintegration. Assays for strand transfer activity revealed the P365A substitution significantly reduced activity. The results of gel filtration and chemical cross-linking indicated that a 71-aa fragment containing the GP(Y/F) domain formed dimers, trimers, and tetramers. Single amino acid substitutions in conserved residues of the GP(Y/F) domain, G364A and P365A, abrogated this multimerization. These data suggest that the GP(Y/F) residues may promote multimerization and strand transfer activity.

EXPERIMENTAL PROCEDURES

Plasmids—To generate versions of Tf1 with substitutions in the GP(Y/F) domain, fusion-PCR fragments with mutations were inserted into the NarI and BsrGI sites of pHL414-2 (Wt Tf1-neo). The primers are described in Table S1. Transposition assays were performed as previously described (17).

The construction of pHL2468, the plasmid for the expression of the full-length Tf1 IN and the plasmid expressing IN lacking the CHD (CH-), pHL2469, have been described previously (16). The plasmids expressing the fragments of IN were generated by similar methods. The DNA fragments coding for each protein were amplified by PCR with the Pfu Ultra Hotstart 2× Master Mix (Stratagene) and primer pairs as indicated in the supplemental data (Table S1). The DNA generated was cleaved with NdeI and BamHI and cloned into the vector pET15b cut with NdeI and BamHI. Each insert was sequenced. All plasmids are listed in Table S2.

The Purification of His-tagged Recombinant Proteins—BL21 cells containing the expression plasmids were grown at 32 °C until they reached an A600 of ∼0.6. Inductions were performed at 16 °C with 1 mm isopropyl 1-thio-β-d-galactopyranoside for 24 h. The cells were harvested, and pellets were stored at -80 °C.

The methods of protein purification were based on our previous report but contained the following modifications (16). The volumes of the cultures were 250 ml, and the sonication was performed in 30 ml of buffer. The bed volumes of Sepharose-Co2+ (BD Talon, BD Biosciences) for the columns were 2.0 ml. All column treatments were by gravity flow. The column washes were 50 ml with no imidazole followed by 100 ml with 25 mm imidazole. The proteins were eluted with a sequence of 4-ml steps that contained 35, 50, 75, 100, and 150 mm imidazole.

The fractions eluted from the column were dialyzed into a storage buffer (50 mm NaH2PO4, pH 7.5, 10% glycerol, 1 mm EDTA, 1 mm dithiothreitol, and 0.5 m NaCl) and frozen at -80 °C.

Purification of IN Lacking the His Tag and Partial Trypsin Digestion—This purification was based on the previous report with modifications that differed from the purification of the His tag-containing IN described above (16). Harvested cells with pHL2468 were incubated for 15 min on ice in 50 mm HEPES, pH 7.5, buffer, 0.1 m NaCl, 1 mm phenylmethylsulfonyl fluoride, 2 mm β-mercaptoethanol, 0.5 mg/ml lysozyme, and 1× Complete EDTA-free protease inhibitor mixture (Roche Applied Science). The cells were sonicated, CHAPS was added to the lysate to a final concentration of 0.05%, and the sample was inverted six times. The sample was then spun in an SW28 at 25,000 rpm for 1.5 h. After ultracentrifugation, the supernatant was loaded onto a 5-ml HIS Trap FF column (Amersham Biosciences). At 1 ml/min, the column was washed with the lysis buffer plus the CHAPS and 0.5 m NaCl and 20 mm imidazole. The protein was eluted into 2-ml fractions with an imidazole gradient of 40–400 mm.

IN-containing fractions were dialyzed into 75 mm NaCl with 50 mm HEPES-NaOH (pH 7.5), 10 mm MgSO4, 10% glycerol, 1 mm EDTA, 2 mm dithiothreitol, and 0.1% CHAPS and loaded onto a 5-ml heparin-Sepharose FF column (Amersham Biosciences). The protein was eluted with 0.5 m NaCl. IN-containing fractions were combined and dialyzed into thrombin (Sigma)-containing buffer to remove the His tag. Thrombin was removed with a benzamidine column (Amersham Biosciences). The IN was stored in 50 mm HEPES, pH 7.5, buffer, 0.5 m NaCl, 1 mm EDTA, 2 mm dithiothreitol, 10% glycerol, and 0.1% CHAPS at a final concentration of 0.73 μg/μl.

Either 50 or 500 ng of trypsin (Roche Applied Science) was added to 18 μg of Tf1 IN in the above storage buffer and incubated at 30 °C. Reactions were stopped and collected at 0.5, 2, 4, 8.5, and 21.5 h by the addition of 2 mm phenylmethylsulfonyl fluoride. A mock reaction, containing no trypsin, was incubated for 21.5 h at 30 °C. Reactions were divided and loaded on both 20% Tris-glycine SDS-polyacrylamide gels (Invitrogen) for band visualization and 4–12% NuPAGE BisTris gels (Invitrogen) for N terminus analysis. Tris-glycine gels were stained with Coomassie Brilliant Blue R250. The protein from the NuPAGE gels was electrotransferred to the Immobilon P membrane (Millipore). The membrane was Coomassie-stained, and bands were cut out and sent for N terminus sequencing at the Food and Drug Administration Center for Biologics Evaluation and Research by Dr. Nga Nguyen.

Disintegration and Strand Transfer Assays—The disintegration and strand transfer assays were conducted as described (16). However, in this work, the reactions were stopped by adding 10 μl of loading buffer (50% glycerol, 250 mm EDTA, 0.5 mg/ml bromphenol blue, and 0.05 mg/ml xylene cyanole) and heating at 95 °C for 3 min. The samples were loaded on an 8 m urea, 14% (w/v) polyacrylamide sequencing gel and electrophoresed in Tris-borate EDTA buffer.

HL1127 and HL1034 were the DNA oligonucleotides that were 5′-end-labeled for use in the disintegration and strand transfer assays, respectively (Table S1) (16). Aliquots containing 0.4 μg of each oligonucleotide were 32P-end-labeled using 20 units of T4 polynucleotide kinase (New England Biolabs) and 100 μCi of [γ-32P]ATP in a volume of 20 μl of the buffer from the supplier. After an incubation for 1 h at 37 °C, 1 μl of 0.25 m EDTA was added to each sample, and the mixtures were treated for 5 min at 95 °C to inactivate the enzyme. After adding NaCl to a final concentration of 0.1 m and increasing the volume to 40 μl, the labeled DNA was annealed to a 3-fold molar excess of the appropriate nonlabeled DNA. After 5 min at 95 °C, the mixtures were allowed to cool to room temperature for about 2 h. The duplexes generated were then stored at -20 °C and were used after freezing and thawing for up to 1 month. One pmol of substrate was added to each reaction.

Pull-down Experiment—Binding of His-tagged INs to full-length IN lacking a His tag was assayed in a binding buffer containing 50 mm HEPES (pH 7.5), 0.5 m NaCl, 0.1% Triton X-100. Six micrograms of recombinant His-tagged IN was incubated with 6 μg of full-length IN lacking the tag in 500 μl of binding buffer. Following a 30-min incubation at 4 °C, the reactions were supplemented with 40 μl of prewashed Ni2+-nitrilotriacetic acid-agarose (Qiagen) and stirred for an additional 60 min at 4 °C. The agarose beads were recovered by centrifugation for 1 min at 960 × g at 4 °C and washed with 500 μl of binding buffer two times and then washed three times with 500 μl of binding buffer supplemented with 25 mm imidazole. Bound proteins were eluted in 40 μl of binding buffer supplemented with 400 mm imidazole and analyzed on a 10–20% SDS-polyacrylamide gel. The proteins in the gel were transferred to Immobilon-P membranes (Millipore). The membrane was probed with anti-IN rabbit antibody (1:10,000) (41). The secondary antibody was horseradish peroxidase-conjugated donkey anti-rabbit Ig, whole antibody (1:10,000; Amersham Biosciences). ECL Plus was used to detect the protein signals (Amersham Biosciences).

Size Exclusion Chromatography—Size exclusion chromatography was performed using a Superdex 75 10/300 GL column or Superdex 200 PC 3.2/30 column on an AKTA FPLC system (GE Healthcare). A flow rate of 0.5 ml/min with a mobile phase of 50 mm HEPES, pH 7.5, 0.5 m NaCl, 1% (v/v) glycerol, 1 mm EDTA, 1 mm dithiothreitol was used for the Superdex 75HR 10/300 GL column. A flow rate of 0.25 ml/min was used for the Superdex 200 PC 3.2/30 column. Typically, 100 μg of purified IN polypeptides at a concentration of 1 μg/μl were injected. The concentrations of the proteins and fragments were as follows: IN, 17.4 μm; core, 32.7 μm; ΔC-(1–334), 24.3 μm; GP(Y/F) fragment, 94.3 μm; CHD, 105.3 μm. Samples were subjected to centrifugation for 5 min at 10,000 × g prior to injection on the column. Absorbance of the column eluate was monitored at 280 nm. Samples from peak fractions were monitored by SDS-PAGE for the presence of the expected protein species. The column was calibrated using five different globular proteins as molecular weight standards (Gel Filtration Calibration Kits, High Molecular Weight and Low Molecular Weight; Amersham Biosciences), and the apparent molecular weight of each sample peak was determined using linear regression of the log of known molecular weight versus the elution behavior (Kav or elution time).
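The calibration step described above can be sketched numerically. This is a minimal illustration under stated assumptions, not the authors' analysis: it fits log10(molecular weight) against Kav by ordinary least squares and then converts the Kav of a sample peak into an apparent mass. The molecular weights are those of the five standards named in the legend to Fig. 7; every Kav value below is a hypothetical placeholder, because the measured elution data are not given in the text.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Molecular weights (kDa) of the standards listed in the legend to Fig. 7.
STANDARDS_KDA = {
    "aldolase": 158.0,
    "albumin": 67.0,
    "ovalbumin": 43.0,
    "chymotrypsinogen A": 25.0,
    "ribonuclease A": 13.7,
}

# HYPOTHETICAL Kav values, chosen only to make the example run.
HYPOTHETICAL_KAV = {
    "aldolase": 0.30,
    "albumin": 0.40,
    "ovalbumin": 0.46,
    "chymotrypsinogen A": 0.54,
    "ribonuclease A": 0.62,
}

def calibrate(standards_kda, kav):
    """Fit log10(MW) as a linear function of Kav for the standards."""
    names = sorted(standards_kda)
    xs = [kav[name] for name in names]
    ys = [math.log10(standards_kda[name]) for name in names]
    return fit_line(xs, ys)

def apparent_mass_kda(sample_kav, slope, intercept):
    """Convert a sample Kav into an apparent molecular mass in kDa."""
    return 10 ** (slope * sample_kav + intercept)

slope, intercept = calibrate(STANDARDS_KDA, HYPOTHETICAL_KAV)
# 0.34 is a hypothetical sample Kav, not a value from the paper.
print(f"apparent mass: {apparent_mass_kda(0.34, slope, intercept):.1f} kDa")
```

Because the fit is made on a logarithmic scale, small shifts in elution position translate into proportionally large changes in apparent mass, which is one reason such estimates are usually treated as approximate.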

Chemical Cross-linking—Cross-linking of the GP(Y/F) fragment was conducted using bis(sulfosuccinimidyl)suberate (Pierce) at concentrations of 0.2, 1.0, and 2.0 mm in reactions (100 mm HEPES, pH 7.5, 0.5 m NaCl) that were incubated at room temperature for 60 min. The GP(Y/F) fragment was at a concentration of 25 μm. The reactions were quenched by adding Tris-HCl to a final concentration of 60 mm and incubating at room temperature for 15 min. One-half volume of 2× sample buffer without 2-mercaptoethanol (125 mm Tris-HCl, 5% SDS, 20% glycerol, 0.5 mg/ml bromphenol blue) was added to the reactions, and the samples were heated at 95 °C for 10 min. Covalently linked multimers were detected by separation in 10–20% SDS-polyacrylamide gels and silver staining.

DNA Binding Assay—Reaction mixtures for UV cross-linking contained the same buffer components as the disintegration reactions, except the concentration of glycerol was 1% (w/v). Unless otherwise stated, 1 μg of protein was combined with 1 pmol of DNA per reaction. The following were the molar quantities of protein added to 1 pmol of DNA: IN, 17.4 pmol; CH-, 20.2 pmol; N-terminal domain (NTD), 66.2 pmol; core, 30.6 pmol; GP(Y/F), 94.3 pmol; CHD, 105.3 pmol. The mixtures were incubated over ice for 40 min. They were then spotted onto parafilm placed over ice and exposed to UV light for 1.5 min using a UV Stratalinker 2400 (Stratagene) set on time mode. Seven microliters of 4× NuPAGE LDS sample buffer (Invitrogen) were added to 20 μl of each reaction, and the samples were boiled for 5 min and electrophoresed in 4–12% BisTris NuPAGE gels (Invitrogen). Gels were subjected to autoradiography. For the quantitative UV cross-linking assays, 1.2, 6, and 30 pmol of proteins were combined with 1 pmol of DNA.
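The molar quantities listed above follow from simple mass-to-mole arithmetic, sketched here. This is an illustrative calculation, not part of the published methods: the function names are hypothetical, and the molecular masses it prints are back-calculated from the reported pmol per 1 μg rather than taken from the paper. The same arithmetic relates the 1 μg/μl protein stocks to the micromolar concentrations quoted for size exclusion chromatography.

```python
def picomoles(mass_ug, mw_kda):
    """Picomoles in mass_ug micrograms of a protein of mw_kda kilodaltons.

    1 ug divided by 1 kDa (1000 g/mol) is 1e-9 mol, i.e. 1000 pmol.
    """
    return mass_ug / mw_kda * 1000.0

def implied_mw_kda(mass_ug, pmol):
    """Molecular mass (kDa) implied by a reported mass and molar amount."""
    return mass_ug / pmol * 1000.0

def micromolar(conc_ug_per_ul, mw_kda):
    """Micromolar concentration of a stock given in ug/ul (equal to g/l)."""
    return conc_ug_per_ul / mw_kda * 1000.0

# pmol of each protein per 1 ug, as reported in the text.
REPORTED_PMOL_PER_UG = {
    "IN": 17.4,
    "CH-": 20.2,
    "NTD": 66.2,
    "core": 30.6,
    "GP(Y/F)": 94.3,
    "CHD": 105.3,
}

for name, pmol in REPORTED_PMOL_PER_UG.items():
    # Back-calculated masses; these are inferences, not published values.
    print(f"{name}: ~{implied_mw_kda(1.0, pmol):.1f} kDa implied")
```

For example, 17.4 pmol of full-length IN per microgram implies a mass of roughly 57.5 kDa for the tagged protein, and a 1 μg/μl stock of that protein corresponds to about 17.4 μm.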

RESULTS

The Domain Structure of Tf1 IN—The GP(Y/F) domain was reported to be a conserved module in the INs of the Metaviridae family of LTR retrotransposons and in the γ family of retroviruses (10, 11). The alignment of INs in Fig. 1A shows the broad conservation of the GP(Y/F) domain among these distantly related families. The amino acids of Tf1 IN that encompass the GP(Y/F) domain are aa 339–368 (10), and this is the definition we use in this report. The alignment also indicates that the INs of the lenti and β families of retroviruses lack a significant portion of the domain, including the N-terminal amino acids and the Y/F that defines the GP(Y/F) domain.

FIGURE 1.


The conserved domains of retrovirus and LTR retrotransposon INs. A, this alignment of GP(Y/F) domains is a composite of previously published alignments with minor modifications (10, 35). The black bar indicates the position of the GP(Y/F) domain, and the conserved residues are shown in yellow. The position of the trypsin cleavage in the GP(Y/F) domain of Tf1 is marked in brown with a double arrow. The sequences from the Metaviridae family were Tf1 (M38526), Sushi (AAC33526), Aspergillus flavus (AAR29046), Maggy (T18348), Skippy (S60179), Skipper (AAC39021), Ty3 (AAA98435), Blastopia (CAA81643), Mdg (AAD14015), Athila (AAD37020), and 412 (P10394). Sequences from the γ family of retroviruses were BaEV (BAA89659), M-MuLV (NC_001501), GaLV (P21414), PERV (AAY28928), avian REV (ABC26818), HERV-E (M10976 K02168 K02169), and HERV-H (D11078). Sequences from the lenti retroviruses were BLV (ABB90626), HTLV-1 (P14078), HIV-1 (AAC61700), and EIAV (P32542). Sequences from the β-retroviruses were MMTV (AAC24859) and MPMV (NP_954565). B, the INs of HIV-1, M-MuLV, and Tf1 were drawn to scale. The positions of conserved residues are indicated. The GP(Y/F) domain is shown in yellow, and the CHD is colored green.

Tf1 belongs to the Metaviridae family and was one of the initial members reported to have an IN with the GP(Y/F) domain (10). The IN of Tf1, as a recombinant protein, is highly soluble and possesses robust catalytic activity (16). These properties motivated us to examine the IN of Tf1 as a model to study the function of the GP(Y/F) domain. Since little was known about the structure of Tf1 IN, our initial experiments tested whether it possessed the three-domain architecture typical of INs. A scaled drawing of Tf1 IN indicates that the positions of its HHCC and DDE motifs are quite similar to those of M-MuLV IN (Fig. 1B). To map the position of its domains, Tf1 IN purified from bacteria was subjected to partial proteolysis with trypsin, and the N termini of the products were sequenced. The results confirmed that the IN does have a three-domain structure with an N-terminal domain (aa 1–110), a catalytic core (aa 111–354), and a C-terminal domain (aa 355–477) (Fig. 1B and supplemental Fig. S1). Interestingly, the cleavage between the central and C-terminal domains occurred in the middle of the GP(Y/F) domain. This indicated that the GP(Y/F) domain assembles into two stable segments split by protease-accessible residues.

Substitutions in the GP(Y/F) Domain Significantly Reduce Transposition Activity—To test whether the amino acids of the GP(Y/F) domain possessed an important function, we made substitutions in the GPF residues of the transposon and measured the resulting transposition activity in vivo (Fig. 2). The assay for transposition activity consisted of expressing neo-containing copies of Tf1 in S. pombe and measuring the resistance to G418 that results from integration (17). The transposition frequencies of elements with the substitutions G364A, P365A, F366A, and G364A/P365A/F366A ((364–366)AAA) were significantly reduced (Fig. 2A). When the proportion of the cells with resistance to G418 was quantified, we found the substitutions G364A, P365A, F366A, and G364A/P365A/F366A reduced transposition by 59-, 7.6-, 26-, and 72-fold, respectively (supplemental Table S3). In order to analyze which steps of transposition were disrupted by the substitutions, cDNA synthesis and levels of IN were analyzed. Immunoblots revealed that the substitutions significantly reduced the levels of IN expressed in S. pombe (Fig. 2B). These reduced levels of IN made it difficult to identify a specific function of the GP(Y/F) domain in in vivo assays. DNA blots of cells expressing Tf1-neo revealed that the substitutions caused no more than a 2-fold defect in cDNA production (supplemental Fig. S2A). In addition, the substitutions did not reduce levels of reverse transcriptase (supplemental Fig. S2B).

FIGURE 2.


Analysis of Tf1 transposons with substitutions in the GP(Y/F) residues. A, activities of the transposons with the GP(Y/F) substitutions, G364A, P365A, F366A, and G364A/P365A/F366A ((364–366)AAA), were measured in each of four independent transformants (T1–T4). The control strains included wild-type Tf1, Tf1 with a frameshift in IN, and Tf1 with a frameshift in protease. B, the levels of IN and GAG produced by the altered versions of Tf1 were measured on immunoblots probed with anti Gag (number 660) and anti IN (number 657) rabbit antibodies. WT, wild type.

The Role of the GP(Y/F) Domain in Disintegration and Strand Transfer—To determine whether the amino acids in the GP(Y/F) domain contributed to the catalytic activity of IN and to compare its function to other domains, we purified a set of Tf1 INs that contained various truncations (Fig. 3A). Each of the protein preparations exhibited a high degree of purity (Fig. 3B).

FIGURE 3.


The sections of Tf1 IN expressed as recombinant proteins. A, the regions of IN expressed are illustrated relative to the zinc finger-like motif, the DDE catalytic residues, the GP(Y/F) domain, and the CHD. The His tag and the residues expressed are shown for each protein. The coordinates of the trypsin cleavages (triangles), the GP(Y/F) domain (horizontal lines), and the CHD (shaded) are shown. B, the purified proteins were run on an SDS-polyacrylamide gel stained with Coomassie Blue.

The disintegration assay is a sensitive method for identifying the core structures necessary for catalyzing strand breakage and joining. For example, the catalytic core domains of HIV-1 and Rous sarcoma virus INs are by themselves able to catalyze disintegration (18, 19). Each of the recombinant Tf1 proteins was assayed for disintegration activity with a previously established substrate that mimics the U3 end of the LTR inserted in a target site (Fig. 4A) (16). By itself, the catalytic core domain of Tf1 IN lacked activity (Fig. 4B, lanes 9–12). As observed previously, IN lacking the CHD (aa 1–406) was considerably more active than the full-length IN (Fig. 4B, lanes 1–8) (16). Truncations revealed that the N-terminal domain was required for catalytic activity (Fig. 4C, lanes 4–8). Other deletions revealed that the C-terminal domain was also necessary for activity (Fig. 4C, lanes 9 and 10). This requirement of the C-terminal domain is consistent with studies of the M-MuLV IN that found a similar requirement for the C terminus using disintegration assays (20). However, the same study found that the N-terminal domain of M-MuLV IN was not required for activity. Thus, Tf1 IN is distinct from the IN of M-MuLV in that it requires both N-terminal and C-terminal domains for disintegration activity.

FIGURE 4.


The disintegration activity of the purified INs. A, the substrate for the disintegration assays consisted of a 76-nt DNA annealed to a 20-nt DNA that was labeled with 32P on its 5′-end. The DNA in red models the end of the transposon sequence, and the DNA in blue represents the target. The arrows represent the 3′-ends. The product of the reaction produced a 63-nt DNA that contained the 32P label on its 5′-end. B, the disintegration products of full-length IN-(1–477), CH--(1–406), and the central core domain (aa 110–354) are shown. The reaction mixture in lanes 1, 5, and 9 contained 0.5 μg of IN protein. The reactions in lanes 2, 6, and 10 contained 1.0 μg of IN protein. The reactions in lanes 3, 7, and 11 contained 3.0 μg of IN protein. The reactions in lanes 4, 8, and 12 contained 9.0 μg of IN protein. IN was omitted from the reaction mixture in lane 13. The substrates (filled arrowheads) and the products of disintegration (open arrowheads) are shown on the left. C, the disintegration activities of IN with N-terminal or GP(Y/F) domain truncations are shown. The activities of 0.5 μg of IN proteins were measured as described in B.

The C-terminal deletions removed the GP(Y/F) domain as well as downstream amino acids. To examine the role of the GP(Y/F) domain specifically, we substituted each of the GPF residues with alanine (G364A, P365A, and F366A). Unfortunately, full-length INs with the substitutions G364A and F366A were insoluble and could not be purified. Nevertheless, we were able to isolate full-length IN with the P365A substitution (Fig. 3B, bottom left).

The isolation of full-length IN with the P365A substitution provided the opportunity to ask whether a single amino acid substitution in the GP(Y/F) domain would affect the catalytic activity of IN. When assayed with the disintegration substrate, the P365A substitution caused a moderate reduction in activity (Fig. 5A). However, the disintegration assay monitors strand joining in an intramolecular reaction that has modest substrate specificity. The strand transfer assay with oligonucleotides that mimic the double-stranded end of an LTR is a stringent test of an IN's ability to complete strand joining of two molecules of substrate DNA. We therefore used the strand transfer assay to determine whether the P365A substitution reduced this integration activity. Relative to wild-type IN, the IN with the P365A substitution had significantly less strand transfer activity (Fig. 5B).

FIGURE 5.


Catalytic activities of full-length IN with a substitution in the GP(Y/F) domain. A, wild-type IN and IN with the P365A substitution were assayed for disintegration activity, as described in the legend to Fig. 4. The disintegration activities of 1.2 μm, 0.6 μm, 0.3 μm, 0.15 μm, and 0.08 μm IN protein were measured. B, wild-type IN and IN with the P365A substitution were assayed for strand transfer activity using U3 substrates. The protein concentrations were the same as in A. C, the levels of reaction products for the disintegration and strand transfer assays were measured by phosphorimaging. nt, nucleotides.

Quantification of the disintegration assays revealed that the P365A substitution caused a 3.6-fold reduction in product when measured at the protein concentration that gave the highest activity of wild-type IN, 0.3 μm (Fig. 5C, left). Surprisingly, the P365A substitution caused the IN concentration with maximum activity to increase 4-fold to 1.2 μm. At the optimal concentrations of both INs, the protein with the P365A substitution had 55% of the activity of wild-type IN. In contrast to this modest reduction in disintegration activity, P365A caused a 13.4-fold decrease in strand transfer activity when measured at an IN concentration of 0.15 μm, the optimal concentration of both INs.

The GPF Residues Do Not Contribute to the DNA Binding Activity in the C Terminus of Tf1 IN—To determine how the GP(Y/F) domain contributed to strand transfer activity, we tested how substitutions in the GP(Y/F) residues altered properties associated with IN. The C-terminal domains of INs are known to bind DNA without any sequence specificity (1). To determine whether the GP(Y/F) domain contributed to DNA binding, we first mapped which sections of Tf1 IN interacted with DNA. Labeled oligonucleotides with sequence from the U5 end of the LTR were mixed with 1 μg of our various proteins, and the mixtures were cross-linked with UV. DNA binding activity was observed with the full-length IN, CH-, NTD, core, CHD, and the fragment consisting of amino acids 335–406 (Fig. 6A). This fragment (aa 335–406) is referred to here as the GP(Y/F) fragment, because it included the entire GP(Y/F) domain. The proteins with the strongest DNA binding were CH-, the catalytic core, and the GP(Y/F) fragment. The CHD of the retrotransposon Maggy was used as a negative control, because it is similar to the CHD of HP1 and binds specifically to histone H3 methylated at lysine 9 (12). The Tf1 CHD exhibited background levels of DNA binding equal to that of the Maggy chromodomain. Experiments with a different DNA sequence revealed that all of the DNA binding activities lacked sequence specificity (supplemental Fig. S3). A quantitative comparison of DNA binding using equimolar amounts of protein demonstrated that the GP(Y/F) fragment had significantly greater binding activity than the CHD of Tf1 or the chromodomain of Maggy (Fig. 6B, lanes 2–4 and lanes 14–16). This indicated that the GP(Y/F) fragment contained the principal DNA binding activity in the C-terminal domain. Interestingly, the GP(Y/F) fragment with single amino acid substitutions in the GPF residues (G364A and P365A) retained full DNA binding activity (Fig. 6B, lanes 5–10). These results indicate that the GP(Y/F) residues did not contribute to DNA binding.

FIGURE 6.


The binding of DNA to IN as detected by UV cross-linking. A, domains of IN were tested for binding to a double-stranded DNA with sequence from the U5 region of the Tf1 LTR. The reaction mixtures contained the DNA substrate and 1 μg of the proteins. After exposure to UV, the mixtures were run on SDS-polyacrylamide gels. IN was omitted from the reaction mixture in lane 1. B, increasing amounts of the C-terminal domains were tested for binding to the DNA. 0.06, 0.3, and 1.5 μm proteins were tested for DNA binding. The reaction mixture in lanes 2–4 contained the GP(Y/F) fragment (aa 335–406) with wild-type sequence. Lanes 5–7 contained the GP(Y/F) fragment with G364A. Lanes 8–10 contained the GP(Y/F) fragment with P365A. Lanes 11–13 contained the CHD of Maggy. Lanes 14–16 contained the CHD (aa 407–477) of Tf1.

The GP(Y/F) Domain Promotes the Formation of Dimers, Trimers, and Tetramers—Biochemical analyses reveal that the INs of HIV-1, M-MuLV, and avian sarcoma virus form dimers and tetramers in solution (21–25). The individual domains of INs promote multimerization by self-dimerizing (1, 26, 27). We conducted a series of experiments to determine whether the GP(Y/F) domain forms multimers and to compare its ability to multimerize to that of the other domains. In initial experiments to test the domains of Tf1 IN for the propensity to multimerize, we tested full-length IN for interactions with the individual portions of the protein. Using our set of recombinant proteins in precipitation experiments, we found that the N-terminal domain, the central core, and the 71-amino acid GP(Y/F) fragment (aa 335–406) all bound to the full-length IN (supplemental Fig. S4). Interestingly, removing 20 amino acids from the N terminus of the fragment with the GP(Y/F) domain resulted in a protein (GP(Y/F)ΔN) that did not bind IN. This suggests that the N-terminal half of the GP(Y/F) domain played an important role in binding IN.

In order to test whether the GP(Y/F) domain contributes to multimerization, we subjected the IN proteins to gel filtration on Superdex 200 using protein concentrations of 1 mg/ml and a buffer of 50 mm HEPES, pH 7.5, 0.5 m NaCl, and 1% (v/v) glycerol. Full-length wild-type IN eluted with an apparent mass of 119.5 kDa, indicating that it formed a stable dimer (Fig. 7A). In efforts to detect a tetrameric form, gel filtration was performed in 1 m NaCl with protein concentrations of 1.0 and 2.0 mg/ml. Under these conditions, HIV-1 IN is in equilibrium between dimers and tetramers (28). However, only dimer species of Tf1 IN were observed (data not shown).

FIGURE 7.

Size exclusion chromatography of full-length IN and truncated IN proteins. Full-length IN-(1–477) (A) and the catalytic core and ΔC-(1–334) (B) were analyzed by size exclusion chromatography on a Superdex 200 column. The chromatograms display the absorbance at 280 nm as a function of elution volume. The molecular size of each monomer is indicated in parentheses, and the estimated molecular masses based on elution volumes are listed above the peaks. The reference standards were aldolase (158 kDa), albumin (67 kDa), ovalbumin (43 kDa), chymotrypsinogen A (25.0 kDa), and ribonuclease A (13.7 kDa).

We tested whether the GP(Y/F) domain was required for dimerization by subjecting the full-length IN with P365A to gel filtration. The altered protein eluted with an apparent mass of 103.8 kDa, indicating that the P365A substitution did not reduce the formation of dimers (Fig. 7A, bottom).
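The reasoning that an apparent mass of 119.5 or 103.8 kDa corresponds to a dimer can be illustrated with a short calculation. The sketch below is not part of the original analysis. It assumes a rule-of-thumb average residue mass of about 110 Da to estimate the monomer mass of the 477-residue IN (the actual monomer masses are given in Fig. 7 and would include any purification tag), and the function names are hypothetical.

```python
# Rough assignment of oligomeric state from an apparent gel filtration mass.
# ASSUMPTION: an average residue mass of ~0.110 kDa is used to estimate the
# monomer mass; the masses of the actual recombinant proteins may differ.
AVG_RESIDUE_MASS_KDA = 0.110


def estimated_monomer_mass(n_residues):
    """Estimate a monomer mass (kDa) from the number of residues."""
    return n_residues * AVG_RESIDUE_MASS_KDA


def nearest_oligomer(apparent_kda, monomer_kda):
    """Return (nearest whole number of subunits, raw mass ratio)."""
    ratio = apparent_kda / monomer_kda
    return max(1, round(ratio)), ratio


if __name__ == "__main__":
    monomer = estimated_monomer_mass(477)  # full-length IN-(1-477)
    # Apparent masses reported in the text for wild-type and P365A IN.
    for label, apparent in [("wild type", 119.5), ("P365A", 103.8)]:
        n, ratio = nearest_oligomer(apparent, monomer)
        print(f"{label}: ratio {ratio:.2f}, nearest state {n}-mer")
```

Under this assumption both proteins give a ratio close to 2, in line with the assignment of a dimer. Apparent masses from gel filtration also depend on molecular shape, so a ratio of this kind is only a coarse guide.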

Since the N-terminal, catalytic core, and C-terminal domains of INs each form stable dimers when tested as individual fragments (1, 26, 27), such interactions had the potential to mask any contribution the GP(Y/F) domain may have made to the dimerization of the full-length IN. To determine whether this was possible with Tf1 IN, we performed gel filtration on the catalytic core (aa 110–354) and the IN lacking the C-terminal domain, ΔC (Fig. 7B). Because these experiments were run separately from those in Fig. 7A, a set of molecular weight standards and full-length IN were run to calibrate the column. Both the core and the ΔC had apparent weights indicative of stable dimers. As a result, any contribution made by the GP(Y/F) domain to multimerization could have been masked by the core and N-terminal domains.

To test directly whether the individual fragments of the C-terminal domain promoted multimerization, gel filtration with Superdex 75 was performed. At a concentration of 1.0 mg/ml, the CHD (aa 407–477) eluted as a monomer (Fig. 8A). Surprisingly, the profile produced by the GP(Y/F) fragment (aa 335–406) included three major peaks (Fig. 8B). The apparent sizes of these species indicated the presence of monomer, dimer, and trimer. These results were interesting because they indicated that the small GP(Y/F) domain itself was capable of forming multimers larger than dimers. To test whether the highly conserved GP residues of the GP(Y/F) domain contributed to this multimerization, GP(Y/F) fragments with single amino acid substitutions were analyzed. Both substitutions, G364A and P365A, disrupted all multimerization of the GP(Y/F) fragment (Fig. 8, C and D). These data indicate that the conserved GP residues played an important role in promoting multimerization of the GP(Y/F) fragment.

FIGURE 8.

Size exclusion chromatography of individual domains. The CHD (A), the GP(Y/F) fragment (B), and the GP(Y/F) fragment with substitutions G364A (C) and P365A (D) were analyzed by size exclusion chromatography on a Superdex 75 column. The chromatograms display the absorbance at 280 nm as a function of elution volume. The molecular size of each monomer is indicated in parentheses, and the estimated molecular weights based on elution volumes are listed above the peaks. The reference standards were albumin (67 kDa), ovalbumin (43 kDa), chymotrypsinogen A (25.0 kDa), ribonuclease A (13.7 kDa), and aprotinin (6.5 kDa).
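Estimated molecular weights of this kind are commonly obtained by fitting the logarithm of the mass of each reference standard against its elution volume and then reading unknown peaks off the fitted line. The following is a minimal sketch of that general approach, not the authors' actual procedure. The standards and their masses are those listed above, but the elution volumes are hypothetical placeholders, and the function names are illustrative.

```python
import math

# Reference standards from the legend (name, mass in kDa, elution volume in ml).
# ASSUMPTION: the elution volumes are HYPOTHETICAL placeholders for
# illustration; they are not measurements from this study.
STANDARDS = [
    ("albumin", 67.0, 9.5),
    ("ovalbumin", 43.0, 10.5),
    ("chymotrypsinogen A", 25.0, 12.0),
    ("ribonuclease A", 13.7, 13.5),
    ("aprotinin", 6.5, 15.5),
]


def fit_calibration(standards):
    """Least-squares fit of log10(mass) = slope * volume + intercept."""
    xs = [volume for _, _, volume in standards]
    ys = [math.log10(mass) for _, mass, _ in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apparent_mass(volume_ml, slope, intercept):
    """Convert an elution volume to an apparent mass (kDa) using the fit."""
    return 10 ** (slope * volume_ml + intercept)


if __name__ == "__main__":
    slope, intercept = fit_calibration(STANDARDS)
    for peak_volume in (11.0, 13.0, 15.0):  # hypothetical peak positions
        mass = apparent_mass(peak_volume, slope, intercept)
        print(f"peak at {peak_volume:.1f} ml -> about {mass:.1f} kDa")
```

Because larger species elute earlier, the fitted slope is negative. A separate calibration is needed for each column and run, which is why standards were run alongside the samples.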

Gel filtration of the GP(Y/F) fragment did not resolve multimers larger than trimers. To test for larger multimers, we subjected the GP(Y/F) fragment to the chemical cross-linker bis(sulfosuccinimidyl)suberate. Gel electrophoresis of the cross-linked samples indicated that the protein at a concentration of 25 μM formed an equilibrium of monomers, dimers, trimers, and tetramers (Fig. 9). Thus, this 71-amino acid fragment containing the GP(Y/F) domain was able to form multimers as large as tetramers.

FIGURE 9.

The multimerization of the GP(Y/F) fragment (aa 335–406) as monitored by chemical cross-linking. The GP(Y/F) fragment was incubated with bis(sulfosuccinimidyl)suberate (BS3), and the products were separated on 10–20% SDS-polyacrylamide gels and detected by silver staining. The triangles indicate the positions of species predicted on the basis of molecular weights to be monomer, dimer, trimer, and tetramer.

DISCUSSION

Trypsin proteolysis revealed that Tf1 IN possessed the three-domain architecture typical of retrotransposon and retrovirus INs. In addition to this general property of INs, the IN of Tf1 has specific features that are similar to the IN of M-MuLV. These INs have HHCC and DDE motifs in positions that are closely matched between the two proteins. Of particular interest is that both Tf1 and M-MuLV INs have GP(Y/F) domains in very similar positions. These observations indicate that the activities of the GP(Y/F) domain in Tf1 IN may reflect the function of the GP(Y/F) domain of M-MuLV as well as other γ-retroviruses.

Our in vivo assays of transposon function revealed that substitutions of the Gly, Pro, and Phe residues in the GP(Y/F) domain caused IN to become unstable. Although the reduction in the levels of IN made it difficult to evaluate the function of these amino acids, it did suggest that the GP(Y/F) domain was an important structural feature needed for the protein to fold correctly. Regardless of their effect on integration, the substitutions in the GP(Y/F) domain had little impact on the stability of reverse transcriptase and the production of cDNA. This finding reflects previous observations that reverse transcriptase produces normal levels of cDNA even when Tf1 lacks IN expression (Fig. S2, IN fs) (29).

For in vitro analysis, the disintegration assay is a sensitive method for identifying the minimum structure of an IN that is capable of performing strand breaking and joining. With this assay, it was shown that the central core domains of HIV-1 and Rous sarcoma virus INs possess all of the active site and substrate binding residues necessary for completing catalysis (18, 19). In the case of the M-MuLV and Tf1 INs, these enzymes have larger C-terminal domains, and these domains are required to catalyze disintegration; for Tf1, the N-terminal domain was also necessary. The requirement for these additional domains is not understood, but it is possible that they stabilize binding to the disintegration substrate.

The segment of the C terminus in Tf1 IN that was required for disintegration activity was amino acids 354–406. This portion of the C terminus contained the GP(Y/F) domain as well as most of the residues in the DNA binding GP(Y/F) fragment. Since the P365A substitution in the full-length IN caused only a modest reduction in disintegration, we propose the possibility that the GP(Y/F) residues themselves were not the component of the C-terminal domain that was essential for disintegration activity. Instead, its DNA binding activity may have provided the critical function removed by the C-terminal truncation of amino acids 354–406. This model is consistent with our finding that the DNA binding activity in the C-terminal domain functioned independently of the GP(Y/F) residues.

Although the P365A substitution did not cause a substantial reduction in disintegration activity, it did increase the concentration of IN required for maximal activity. It is not known what is responsible for the inhibition of activity caused by high concentrations of IN. Nevertheless, it is possible that P365A weakened subunit interactions necessary for disintegration. The higher concentrations of IN would be required to compensate for the defects in multimerization. However, other explanations are also possible.
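The idea that a higher protein concentration can compensate for weakened subunit interactions can be illustrated with a simple monomer-dimer equilibrium. The sketch below is purely illustrative and is not a model fitted to the data in this study. No dissociation constants were measured here, so the two values used are hypothetical, and concentrations are in arbitrary units. With a dissociation constant K_d = [M]^2/[D] and a total subunit concentration C = [M] + 2[D], the free monomer concentration is [M] = K_d(sqrt(1 + 8C/K_d) - 1)/4.

```python
import math


def monomer_conc(total, kd):
    """Free monomer concentration for 2M <-> D with Kd = [M]^2 / [D]."""
    return kd * (math.sqrt(1.0 + 8.0 * total / kd) - 1.0) / 4.0


def fraction_in_dimer(total, kd):
    """Fraction of all subunits that are present in dimers."""
    return 1.0 - monomer_conc(total, kd) / total


if __name__ == "__main__":
    # ASSUMPTION: hypothetical dissociation constants (arbitrary units) for a
    # tighter and a weaker interaction; neither was measured in this study.
    for label, kd in [("tighter interaction", 1.0), ("weaker interaction", 10.0)]:
        for total in (1.0, 10.0, 100.0):
            frac = fraction_in_dimer(total, kd)
            print(f"{label}: total {total:6.1f} -> fraction in dimer {frac:.2f}")
```

In this toy model a tenfold weaker interaction requires a tenfold higher total concentration to reach the same fraction of subunits in dimers. As noted above, this is only one of several possible explanations for the shift in the concentration optimum.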

In strand transfer assays, the P365A substitution caused a much greater defect in activity than was observed in disintegration reactions. This suggests that the GP(Y/F) domain provided a key function in strand transfer that was less important in disintegration assays. Functions of IN that contribute more significantly to strand transfer than disintegration include recognition of the terminal sequences of donor DNA as well as the ability to join two separate DNA substrates (30). It may be that Pro-365 played a disproportionate role in one of these functions.

To distinguish between the possible roles of the GP(Y/F) domain, we studied the biochemistry of a 71-amino acid fragment containing the GP(Y/F) domain. In our UV cross-linking assays with oligonucleotides, the GP(Y/F) fragment bound DNA with high efficiency. The finding that a region of the C-terminal domain bound DNA and that it was a nonspecific DNA binding activity paralleled the activities of other INs (31–36). However, our study of single amino acid substitutions indicated that the GP(Y/F) residues within the GP(Y/F) fragment did not contribute to this DNA binding. Also of interest was our finding that IN lacking the CHD (CH-) bound DNA more efficiently than the full-length IN (Fig. 6A). This observation suggests that the CHD can interfere with DNA binding. Such a reduction in DNA binding could explain why CH- has greater catalytic activity than the full-length IN. This is consistent with the possibility that the CHD performs a regulatory function during integration.

Another property of INs important for activity is multimerization. Sedimentation and kinetic experiments indicate that the INs of avian sarcoma virus and HIV-1 must multimerize to be active (21, 37). Complementation studies with two defective forms of HIV-1 IN revealed that subunits can multimerize to become active (38, 39). Recent studies indicate that the IN of HIV-1 is a tetramer in its synaptic complex and that multimerization of the C-terminal domain plays an important role in concerted integration (7, 40). Our results of gel filtration indicated that full-length Tf1 IN formed a stable dimer. Despite a significant effort, we could not identify conditions that allowed IN to form tetramers. Nevertheless, complementation analyses with defective versions of Tf1 IN revealed that multimeric complexes were highly active (footnote 4).

In experiments to test whether the GP(Y/F) domain contributed to multimerization, we found the substitution P365A in full-length IN did not diminish dimerization. We also found that stable dimers were formed by IN lacking the C-terminal domain (aa 1–334) and with the core domain (aa 110–354) itself. Thus, we do not have direct evidence that the GP(Y/F) domain contributes to multimerization of the full-length protein. It is possible that the GP(Y/F) domain does not promote dimerization of IN. However, any contribution that the GP(Y/F) domain might make to dimerization could have been masked by self-association interactions in other regions of the protein.

Our direct examination of the GP(Y/F) fragment by gel filtration and chemical cross-linking revealed high levels of dimers, trimers, and tetramers. This ability to form trimers and tetramers was unique among the INs and IN fragments we studied. The multimerization of the GP(Y/F) fragment was disrupted by the single amino acid substitutions in Gly-364 and Pro-365, indicating that the GP(Y/F) domain played a significant role in multimerization. It was interesting that these same substitutions did not compromise the DNA binding activity of the GP(Y/F) fragment. We therefore concluded that multimerization of the GP(Y/F) fragment was not necessary for the DNA binding activity.

The lack of any high resolution structure of Tf1 IN and our inability to detect its tetramerization make it difficult to speculate about the role of the GP(Y/F) domain in the multimerization of the full-length IN. It is reasonable to propose that the dimerization of the GP(Y/F) domain contributes to the multimerization of the full-length protein. It is tempting to speculate that, like the IN of HIV-1, Tf1 IN may form higher multimers in its synaptic complex. If this is true, the GP(Y/F) domain could mediate the higher multimerization. This speculation is consistent with the increase in IN concentration necessary for peak activity that was caused by the P365A substitution. Whether the drop in strand transfer activity caused by P365A resulted from a defect in tetramerization is not known. It is possible that the substitution resulted in other structural perturbations. Nevertheless, the significant drop in strand transfer activity caused by P365A suggests that the GP(Y/F) domain plays an important function in integration.

Although little sequence conservation of the GP(Y/F) domain is observed in the IN of HIV-1, the C termini of Tf1, avian sarcoma virus, and HIV-1 INs all bind DNA. Thus, it is possible that the C terminus of Tf1 IN adopts the Src homology 3-like folds present in the INs of HIV-1 and avian sarcoma virus. Nevertheless, the conservation of the GP(Y/F) domain in the Metavirus family of retrotransposons and in the diverse family of γ-retroviruses indicates that its key function is broadly conserved.

Supplementary Material

Supplemental Data

Acknowledgments

We thank Dr. Nga Nguyen (Food and Drug Administration Center for Biologics and Evaluation) for assistance with N-terminal sequencing. We are grateful to Amnon Hizi for providing antibodies against reverse transcriptase.

*

This work was supported, in whole or in part, by the National Institutes of Health Intramural Research Program, NICHD, and the National Institutes of Health Intramural AIDS Targeted Antiviral Program. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The on-line version of this article (available at http://www.jbc.org) contains supplemental Tables S1–S3 and Figs. S1–S4.

Footnotes

3

The abbreviations used are: LTR, long terminal repeat; IN, integrase; M-MuLV, Moloney murine leukemia virus; CHD, chromodomain; CHAPS, 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonic acid; BisTris, 2-[bis(2-hydroxyethyl)amino]-2-(hydroxymethyl)propane-1,3-diol; aa, amino acid(s); HIV-1, human immunodeficiency virus, type 1.

4

H. Ebina and H. Levin, unpublished results.

References

Articles from The Journal of Biological Chemistry are provided here courtesy of American Society for Biochemistry and Molecular Biology
