Abstract
Retroviruses and retrotransposons insert into the host genome with no obvious sequence specificity. We examined the target sites of the retroelement ZAM by sequencing each host-ZAM junction in the genome of Drosophila melanogaster. Our data provide compelling evidence that the ZAM integration machinery recognizes the sequence 5′-GCGCGCg-3′ and directs ZAM insertion into it. This unique property of ZAM will facilitate the development of new tools to study the integration process of retroelements.
Integration of proviral DNA into host chromosomes is an essential part of the retroviral life cycle. This step in the replication cycle is mediated by integrase, a key enzyme encoded by the element itself (10).
During the early stages of infection, the integrase protein processes the termini of viral long terminal repeats (LTRs) by endonucleolytic cleavage of the 3′-terminal dinucleotides. Integration of this processed proviral DNA occurs by staggered transesterification of the recessed 3′ ends and the phosphodiester backbone of the target DNA. It has been demonstrated for some retroviruses that the cleavage reaction is performed within a high-molecular-weight preintegration complex (2, 18). Preintegration complexes are transported to the nucleus, and the proviral DNA is integrated into the host chromosome.
Retrotransposons and retroviruses are structurally related and have similar modes of integration into host DNA. They display various degrees of bias in selecting target sites for integration in vivo. The mechanisms and factors governing these preferences are not fully understood, but host proteins are known to affect selection of the insertion site through changes in DNA structure or by interacting directly with the integration machinery. For example, yeast retroelements such as Ty1 and Ty3 target genomic regions lying immediately upstream of genes transcribed by polymerase III, and Ty5 targets transcriptionally silent regions of the yeast genome (6, 14–16). Furthermore, sequence data from large numbers of integration sites for Ty1 (12), human immunodeficiency virus (HIV) (3), and Gypsy (4) have revealed weak consensus sequences, but no retroelement that displays strict sequence specificity in its target has been described.
The ZAM retroelement is an 8,435-bp retrovirus-like element present within the genome of Drosophila melanogaster (17). On the basis of sequence, structural, and functional similarities, ZAM displays a striking resemblance to vertebrate retroviruses (for a review, see reference 7). It has three open reading frames analogous to the retroviral gag, pol, and env genes surrounded by LTRs. The ZAM pol gene is subdivided into three regions, which encode typical retrovirus-like enzymes: protease, reverse transcriptase-RNase H, and integrase (IN). The latter polypeptide displays all the characteristics of canonical retroviral IN (10). It contains three domains: a zinc finger amino-terminal motif (HHCC), a core or catalytic domain characterized by the DD35E motif, and a carboxy-terminal part of the protein which displays a high basicity similar to the DNA binding domain of retroviral integrases. All of these characteristics suggest that the mechanism of ZAM integration resembles that of retroviruses.
In this paper, we report that ZAM is highly sequence specific in its integration. By exhaustive analyses of ZAM insertions in a defined strain of flies, we show that the target sequence chosen by nearly every ZAM element is 5′-GCGCGCg-3′ (lowercase “g” indicates a <50% occurrence of that base).
In a previous work, we identified two independent ZAM insertions flanked by the duplicated sequence CGCGC (17), suggesting that ZAM could require a specific target site for integration. Interestingly, this sequence is also recognized by two restriction endonucleases, HhaI and ThaI, which cleave the sequences GCGC and CGCG, respectively. Thus, we postulated that if ZAM copies present in a genome were indeed inserted into a CGCGC sequence, then HhaI and ThaI endonucleases must cleave within the 5′ and 3′ end duplications of ZAM insertions.
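The overlap between the duplicated sequence and the two recognition sites can be checked with a short Python sketch. It is only an illustration of the reasoning above: the helper name `sites_in` is hypothetical, and the recognition sequences are the ones quoted in the text.

```python
# Recognition sequences as quoted in the text.
RECOGNITION_SITES = {"HhaI": "GCGC", "ThaI": "CGCG"}


def sites_in(sequence: str) -> dict:
    """Return, for each enzyme, the 0-based offsets at which its site occurs."""
    hits = {}
    for enzyme, site in RECOGNITION_SITES.items():
        hits[enzyme] = [
            i
            for i in range(len(sequence) - len(site) + 1)
            if sequence[i:i + len(site)] == site
        ]
    return hits


# Duplicated sequence flanking the two previously identified ZAM insertions.
print(sites_in("CGCGC"))  # {'HhaI': [1], 'ThaI': [0]}
```

Both sites fit within the 5-bp duplication, which is why either enzyme is expected to cut at each end of a ZAM copy inserted into CGCGC.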
Since the ZAM genome itself contains three HhaI restriction sites (Fig. 1 [restriction map below the autoradiograph]), cleavage of genomic DNA with HhaI should reveal bands of 475 bp and 3.2 kb when probed with sequences complementary to the ZAM LTR (Fig. 1A). Additionally, bands of 1.4 and 3.3 kb are expected after hybridization with the internal E6 probe (Fig. 1B). This is indeed what is seen when DNA from strains bearing a high copy number of ZAM elements, such as Charolles and wIR6Rev, or strains bearing a low copy number of ZAM elements, such as Canton S and wIR6, is subjected to HhaI digestion (Fig. 1). Most of the restriction fragments produced were of the expected sizes in all of the strains studied. By using a Bio-Rad GS-525 phosphorimager, we estimated the copy numbers of ZAM elements detected in this experiment. We found that the 3.2-kb or 475-bp bands obtained in wIR6Rev and Charolles contain 6 to 8 more copies of ZAM than any faint band visualized on the blot or a band revealed by a probe of the white gene, taken as an internal standard to estimate the intensity of a sequence that is unique in the genome (data not shown). If a single copy of ZAM is present in faint bands, then we estimate the total copy number to be 15 to 20, which is in good agreement with previous estimates from in situ and Southern blot analyses (5, 17). Identical results were obtained when a similar experiment was performed with the ThaI endonuclease (Fig. 2).
FIG. 1.
(Upper panel) ZAM target duplication sequences contain an HhaI restriction site. Genomic DNAs of high- and low-copy-number strains were treated with the HhaI restriction endonuclease and hybridized to the LTR (A) or E6 (B) probe. (Lower panel) ZAM HhaI restriction map. Probes used for Southern analysis (LTR and E6 probes) and expected digested fragments are depicted below the restriction map. The HhaI restriction sites in shaded boxes are located in ZAM target duplication sequences.
FIG. 2.
(Upper panel) ZAM target duplication sequences contain a ThaI restriction site. Genomic DNAs of high- and low-copy-number strains were treated with the ThaI restriction endonuclease and hybridized to the LTR (A) or E6 (B) probe. (Lower panel) ZAM ThaI restriction map. Probes used for Southern analysis (LTR and E6 probes) and expected digested fragments are depicted below the restriction map. The ThaI restriction sites in shaded boxes are located in ZAM target duplication sequences.
Several conclusions can be drawn from these experiments: (i) most ZAM retroelements contain both an HhaI and a ThaI restriction site at each end; (ii) sequence-specific integration is found in all strains tested, independently of ZAM copy number; and (iii) internal HhaI and ThaI restriction sites are well conserved in all members of the ZAM family.
In order to confirm these data, we performed systematic sequencing of each host-ZAM junction in the wIR6Rev genome. This strain was chosen because its genome has been subjected to a recent and massive invasion by ZAM elements (5). To collect as many independent integration sites as possible, we used an inverse PCR approach that allowed cloning of integrated copies of ZAM together with their flanking cellular sequences. Total wIR6Rev genomic DNA of one fly was extracted according to the protocol of Gloor and Engels (8, 9). Inverse PCR experiments were performed on ligated TaqI-cut DNA by using pairs of backward-oriented primers within the 3′ LTR (ZAMLTR [8340 to 8361], 5′ AAT TCT CCC AAG ACG ACC GTG 3′; ZAMLTRic [8302 to 8323], 5′ ACG TCT ACA AGT GTT TGC TGC 3′). PCR amplification was as follows: 2 min at 95°C for one cycle; 45 s at 94°C, 45 s at 60°C, and 45 s at 72°C for 35 cycles; and 10 min at 72°C for one cycle. Fragments of variable sizes (200 to 500 bp) were amplified, cloned into the pGEMT vector (Promega), and sequenced on an ABI 377 sequencer (Perkin-Elmer). Sequences of 60 clones were determined, and 16 of these appeared to be unique. This sample size of 16 clones fits well with the ZAM copy number in the wIR6Rev genome previously estimated through Southern blots and in situ experiments (1, 5, 17). Sequences of independent ZAM integration sites are shown in Table 1. Alignment allowed us to derive a consensus sequence, 5′-CGCGCg-3′, bordering ZAM insertions (Table 2). As expected, this consensus contained the two restriction sites recognized by the HhaI and ThaI restriction endonucleases. It is noteworthy that the strict consensus sequence CGCGC is represented in as many as 6 of the 16 ZAM insertions sequenced. If the stringency is relaxed by allowing a single mismatch to this consensus, then more than 93% (15 of 16) of the insertions conform (the remaining one displays two mismatches).
These small sequence variations in the targets of ZAM insertions may explain the presence of the few additional bands visualized on Southern blots (Fig. 1 and 2). However, some of these extra bands may have also been generated by enzymatic restriction site polymorphisms within the sequences of numerous ZAM copies. Polymorphisms were indeed detected when blots were probed with an internal ZAM sequence (probe E6 [Fig. 1B and 2B]).
TABLE 1.
Alignment of independent ZAM integration sites from the genome of strain wIR6Rev
| Sequence | Inverted repeat | Integration site (+1 to +28) |
|---|---|---|
| 1 | AGGTAACT | CGCGCGTGCTGTAAGCTAACGCCGACTT |
| 2 | AGGTAACT | CGCGCGGTAGTTGGTTCGGCCACAACAG |
| 3 | AGGTAACT | CGCGCTCATTATTGGCCATGTTAGTTAG |
| 4 | AGGTAACT | CCCGCGGCGCACATCAATTGATTTACCG |
| 5 | AGGTAACT | CCTGCGACAGCCACATCAAAAGGATCCA |
| 6 | AGGTAACT | CGTGCCCTGACCAATCCAAAGAGCTCAC |
| 7 | AGGTAACT | CGCGCAGTTGAGAGCATAGCTCCTGTGC |
| 8 | AGGTAACT | CACGCGGCCTTCAGCCGCCTCATCAGCT |
| 9 | AGGTAACT | CACGCCCTCCGGCTGCCGTAGCACATTG |
| 10 | AGGTAACT | CGCGCTTAACAAAGGTCCTGGAAACCGA |
| 11 | AGGTAACT | CGCCCCAACCCGAGACCCAGAATCCCCA |
| 12 | AGGTAACT | CGTGCGTTGGTGTGTGTGCCGCCCGCTC |
| 13 | AGGTAACT | CCCGCAAGGAGGACACACGGCTAAGTCC |
| 14 | AGGTAACT | CGCTCGTGTACAAATTACGAAGAACTAC |
| 15 | AGGTAACT | CGCGGCAACAGCGCACAACAAGCTGAGA |
| 16 | AGGTAACT | CGCGCACCTCCGCCGGAGAGAATAGCCA |
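The counts quoted in the text for the strict consensus CGCGC (6 exact matches, 15 of 16 sites within one mismatch) can be reproduced from Table 1 with the minimal Python sketch below. The function name `mismatches` is hypothetical; the 5-base strings are positions +1 to +5 of the 16 integration sites in Table 1.

```python
# Positions +1 to +5 of the 16 integration sites listed in Table 1 (in order).
SITES = [
    "CGCGC", "CGCGC", "CGCGC", "CCCGC", "CCTGC", "CGTGC", "CGCGC", "CACGC",
    "CACGC", "CGCGC", "CGCCC", "CGTGC", "CCCGC", "CGCTC", "CGCGG", "CGCGC",
]


def mismatches(site: str, consensus: str = "CGCGC") -> int:
    """Count positions at which the site differs from the consensus."""
    return sum(a != b for a, b in zip(site, consensus))


counts = [mismatches(site) for site in SITES]
exact = counts.count(0)                      # sites matching CGCGC exactly
within_one = sum(c <= 1 for c in counts)     # sites with at most one mismatch

print(exact, within_one, max(counts))                       # 6 15 2
print(f"{100 * within_one / len(SITES):.2f}% within one")   # 93.75% within one
```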
TABLE 2.
Consensus target of ZAM integration site derived from frequencies of bases at the sites selected
| Base (% at position) | +1 | +2 | +3 | +4 | +5 | +6 | +7 | +8 | +9 | +10 |
|---|---|---|---|---|---|---|---|---|---|---|
| C | 100 | 18.8 | 81.2 | 6.3 | 93.8 | 25 | 25 | 25 | 31.2 | 31.2 |
| T | 0 | 0 | 18.8 | 6.3 | 0 | 12.5 | 25 | 31.2 | 25 | 18.8 |
| A | 0 | 12.5 | 0 | 0 | 0 | 18.8 | 25 | 25 | 18.8 | 25 |
| G | 0 | 68.8 | 0 | 87.5 | 6.2 | 43.8 | 25 | 18.8 | 25 | 25 |
| Consensus | C | G | C | G | C | g | N | N | N | N |
Capital letters indicate a >50% occurrence of the indicated base. Lowercase letters indicate a <50% occurrence of the indicated base. The letter N symbolizes C, T, A, or G.
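As an illustration of how the frequencies and consensus line in Table 2 follow from Table 1, the Python sketch below tabulates base frequencies over positions +1 to +10. The names `percent` and `consensus` are hypothetical. The table note defines capital letters as >50% occurrence but gives no explicit cutoff between a lowercase letter and N, so the 40% lower threshold used here is an assumption chosen only because it reproduces the published consensus line.

```python
from collections import Counter

# Positions +1 to +10 of the 16 integration sites listed in Table 1 (in order).
SITES = [
    "CGCGCGTGCT", "CGCGCGGTAG", "CGCGCTCATT", "CCCGCGGCGC",
    "CCTGCGACAG", "CGTGCCCTGA", "CGCGCAGTTG", "CACGCGGCCT",
    "CACGCCCTCC", "CGCGCTTAAC", "CGCCCCAACC", "CGTGCGTTGG",
    "CCCGCAAGGA", "CGCTCGTGTA", "CGCGGCAACA", "CGCGCACCTC",
]


def percent(sites, base, position):
    """Percentage of sites carrying `base` at 1-based `position`."""
    column = [site[position - 1] for site in sites]
    return 100.0 * column.count(base) / len(column)


def consensus(sites, upper=50.0, lower=40.0):
    """Capital if >upper %, lowercase if >=lower % (assumed cutoff), else N."""
    letters = []
    for column in zip(*sites):
        base, n = Counter(column).most_common(1)[0]
        pct = 100.0 * n / len(column)
        if pct > upper:
            letters.append(base)
        elif pct >= lower:
            letters.append(base.lower())
        else:
            letters.append("N")
    return "".join(letters)


print(percent(SITES, "G", 2))   # 68.75
print(percent(SITES, "G", 6))   # 43.75
print(consensus(SITES))         # CGCGCgNNNN
```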
In order to determine the genomic regions where ZAM copies were inserted, the 16 flanking sequences in Table 1 were compared with sequences present in data banks. We found that seven of these sites display 100% homology to cosmids sequenced in the course of the D. melanogaster sequencing project, two identify different ZAM insertions within the retroelement R1Dm, and another corresponds to the ZAM element integrated upstream of the transcriptional start site of the white gene. Identification of the latter insertion was a good indirect control of our screen because it allowed us to isolate an insertion previously identified in this strain by a different method (17). Alignment of all of the ZAM empty target sites confirmed that the sequence recognized upon ZAM insertion spans the 6-bp target site identified through the inverse PCR approach. An additional G present in all of the empty sites and located at the 5′ end of the CGCGCg sequence was also detected (Table 3). These data suggest that the target site recognized upon ZAM insertions is the consensus sequence GCGCGCg, which is longer than the duplicated sequence generated upon its insertion and previously reported for its insertion at the white locus. It is interesting to note that this consensus is palindromic, as are other consensus sequences described for Ty1 (12) and Tn10 transposition (11). In addition, identification of these genomic loci confirmed previous results indicating that ZAM copies are dispersed on chromosomal arms of the strains and are not clustered in heterochromatic or telomeric regions of the genome (Table 3) (1, 5).
TABLE 3.
Localization of ZAM integrations
| Sequence | Genomic ZAM target (10 upstream bases followed by the 28-base flank from Table 1) | Accession no. | Location |
|---|---|---|---|
| 1 | TTATGAGACGCGCGCGTGCTGTAAGCTAACGCCGACTT | DMWHITE | ChrX |
| 2 | CTCAACTGAGCGCGCGGTAGTTGGTTCGGCCACAACAG | DMRER1DM | HET |
| 3 | TCATGATGAGCGCGCTCATTATTGGCCATGTTAGTTAG | DMBH61I5 | ChrX |
| 4 | TTTATAAGCGCCCGCGGCGCACATCAATTGATTTACCG | DMC190E7 | ChrX |
| 5 | TGGGGCTTAGCCTGCGACAGCCACATCAAAAGGATCCA | DS04362 | Chr2L |
| 6 | CCAGGAGACGCGTGCCCTGACCAATCCAAAGAGCTCAC | DS02972 | Chr2R |
| 7 | AACTACCGCGCGCGCAGTTGAGAGCATAGCTCCTGTGC | DMRER1DM | HET |
| 9 | GGGTGTGCCGCACGCCCTCCGGCTGCCGTAGCACATTG | AI294665 | ? |
| 12 | TCCTTCTAGGCGTGCGTTGGTGTGTGTGCCGCCCGCTC | DMC25D2 | ChrX |
| 15 | TATCTAGACGCGCGGCAACAGCGCACAACAAGCTGAGA | DS07721 | Chr2L |
Chr, chromosome; HET, heterochromatin.
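Two observations from the paragraph preceding Table 3 can be checked directly: the extra G found immediately 5′ of the CGCGCg site in every empty target, and the palindromic nature of the capitalized core GCGCGC. The Python sketch below is illustrative only. The helper `reverse_complement` is a hypothetical name, the strings are the first 16 bases of each genomic target in Table 3, and the offset of 10 bases is an assumption deduced by aligning Table 3 with the flanks in Table 1.

```python
# First 16 bases of each genomic ZAM target in Table 3 (sequences 1-7, 9, 12, 15).
TARGETS = [
    "TTATGAGACGCGCGCG", "CTCAACTGAGCGCGCG", "TCATGATGAGCGCGCT",
    "TTTATAAGCGCCCGCG", "TGGGGCTTAGCCTGCG", "CCAGGAGACGCGTGCC",
    "AACTACCGCGCGCGCA", "GGGTGTGCCGCACGCC", "TCCTTCTAGGCGTGCG",
    "TATCTAGACGCGCGGC",
]

# Assumption: the Table 1 flank starts after the first 10 bases of each target.
FLANK_OFFSET = 10

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (uppercase ACGT only)."""
    return seq.translate(COMPLEMENT)[::-1]


# Base immediately 5' of position +1 in every empty site.
upstream = {target[FLANK_OFFSET - 1] for target in TARGETS}
print(upstream)                                   # {'G'}

# The 6-bp core reads the same on both strands.
print(reverse_complement("GCGCGC") == "GCGCGC")   # True
```

The palindrome check applies to the six capitalized bases; the trailing lowercase g, which occurs in fewer than half of the sites, is not part of the symmetric core.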
The capacity of the ZAM integration machinery to select specific sequences for integration makes ZAM unique among the retroelements described so far. Indeed, available studies, including a systematic sequencing of retroviral integration targets, are often limited and do not reveal consensus sequences with such a strict arrangement of bases (3, 4, 12, 20). The specificity of the ZAM integration machinery for the sequence 5′-GCGCGCg-3′ is more reminiscent of the well-characterized property of the sequence-specific restriction endonucleases that trigger DNA cleavage in bacteria. Like ZAM, these enzymes recognize and cleave DNA at specific sites and, under certain conditions, may extend their cleavage to noncanonical (“star”) sites.
The development of molecular and genetic approaches for understanding ZAM integrase activity is a worthwhile goal. Analysis of integrase activity will identify functional domains and their roles in sequence recognition. Furthermore, it is essential to identify ZAM partners involved in preintegration complexes as well as cellular factors present on the host chromosome at the insertion sites. Indeed, the alternating pyrimidine and purine residues in the GCGCGCg target sequence of ZAM form a DNA sequence which is able to adopt a particular conformation potentially facilitating recognition by specific host factors. Cellular factors such as Ini1 and HMGI(Y) have been shown to interact with HIV and Moloney murine leukemia virus proteins and enhance integration (13, 18, 19). The genetic potential offered by the Drosophila genome will provide an invaluable tool for beginning to answer these questions. Because of the high degree of similarity between ZAM integrase and integrases encoded by retroviruses, such a development will be of general interest and extend our understanding of retroviral insertion. It is tempting to predict that it would then be possible to engineer retroelements and drive their insertion into preexisting genome targets.
Acknowledgments
We thank J. M. Heard, S. Ronsseray, and J. Miller for critical reading of the manuscript and Valérie Calco for technical assistance.
This work was supported by INSERM grants and by a grant from Ministère de l’Enseignement Supérieur et de la Recherche (MESR). P.L. received a graduate grant from MESR and a grant from the Conseil Régional d’Auvergne (Bourse Régionale d’Excellence 1998).
REFERENCES
1. Baldrich E, Dimitri P, Desset S, Leblanc P, Codipietro D, Vaury C. Genomic distribution of the retrovirus-like element ZAM in Drosophila melanogaster. Genetica. 1997;100:131–140.
2. Bukrinsky M I, Sharova N, McDonald T L, Pushkarskaya T, Tarpley G W, Stevenson M. Association of integrase, matrix, and reverse transcriptase antigens of human immunodeficiency virus type 1 with viral nucleic acids following acute infection. Proc Natl Acad Sci USA. 1993;90:6125–6129. doi: 10.1073/pnas.90.13.6125.
3. Carteau S, Hoffmann C, Bushman F. Chromosome structure and human immunodeficiency virus type 1 cDNA integration: centromeric alphoid repeats are a disfavored target. J Virol. 1998;72:4005–4014. doi: 10.1128/jvi.72.5.4005-4014.1998.
4. Dej K J, Gerasimova T, Corces V G, Boeke J D. A hotspot for the Drosophila gypsy retroelement in the ovo locus. Nucleic Acids Res. 1998;26:4019–4024. doi: 10.1093/nar/26.17.4019.
5. Desset S, Conte C, Dimitri P, Calco V, Dastugue B, Vaury C. Mobilization of two retroelements, ZAM and Idefix, in a novel unstable line of Drosophila melanogaster. Mol Biol Evol. 1999;16:54–66. doi: 10.1093/oxfordjournals.molbev.a026038.
6. Devine S E, Boeke J D. Regionally-specific, targeted integration of the yeast retrotransposon upstream of genes transcribed by RNA polymerase III. Genes Dev. 1996;10:620–633. doi: 10.1101/gad.10.5.620.
7. Finnegan D J. Retroviruses and transposons. Wandering retroviruses. Curr Biol. 1994;4:641–643. doi: 10.1016/s0960-9822(00)00142-1.
8. Gloor G B, Preston C R, Johnson-Schlitz D M, Nassif N A, Phillis R W, Benz W K, Robertson H M, Engels W R. Type I repressors of P element mobility. Genetics. 1993;135:81–95. doi: 10.1093/genetics/135.1.81.
9. Gloor G B, Engels W R. Single fly DNA preps for PCR. Dros Inf Ser. 1991;71:148–149.
10. Goff S P. Genetics of retroviral integration. Annu Rev Genet. 1992;26:527–544. doi: 10.1146/annurev.ge.26.120192.002523.
11. Halling S M, Kleckner N. A symmetrical six-base-pair target site sequence determines Tn10 insertion specificity. Cell. 1982;28:155–163. doi: 10.1016/0092-8674(82)90385-3.
12. Ji H, Moore D P, Blomberg M A, Braiterman L T, Voytas D F, Natsoulis G, Boeke J D. Hotspots for unselected Ty1 transposition events on yeast chromosome III are near tRNA genes and LTR sequences. Cell. 1993;73:1007–1018. doi: 10.1016/0092-8674(93)90278-x.
13. Kalpana G V, Marmon S, Wang W, Crabtree G R, Goff S P. Binding and stimulation of HIV-1 integrase by a human homolog of yeast transcription factor SNF5. Science. 1994;266:2002–2006. doi: 10.1126/science.7801128.
14. Ke N, Irwin P A, Voytas D F. The pheromone response pathway activates transcription of Ty5 retrotransposons located within silent chromatin of Saccharomyces cerevisiae. EMBO J. 1997;16:6272–6280. doi: 10.1093/emboj/16.20.6272.
15. Kim J M, Vanguri S, Boeke J D, Gabriel A, Voytas D F. Transposable elements and genome organization: a comprehensive survey of retrotransposons revealed by the complete Saccharomyces cerevisiae genome sequence. Genome Res. 1998;8:464–478. doi: 10.1101/gr.8.5.464.
16. Kirchner J, Connolly C M, Sandmeyer S B. Requirement of RNA polymerase III transcription factors for in vitro position-specific integration of a retrovirus-like element. Science. 1995;267:1488–1491. doi: 10.1126/science.7878467.
17. Leblanc P, Desset S, Dastugue B, Vaury C. Invertebrate retroviruses: ZAM, a new candidate in D. melanogaster. EMBO J. 1997;16:7521–7531. doi: 10.1093/emboj/16.24.7521.
18. Li L, Farnet C M, French Anderson W, Bushman F D. Modulation of activity of Moloney murine leukemia virus preintegration complexes by host factors in vitro. J Virol. 1998;72:2125–2131. doi: 10.1128/jvi.72.3.2125-2131.1998.
19. Miller M D, Farnet C M, Bushman F D. Human immunodeficiency virus type 1 preintegration complexes: studies of organization and composition. J Virol. 1997;71:5382–5390. doi: 10.1128/jvi.71.7.5382-5390.1997.
20. Shih C-C, Stoye J P, Coffin J M. Highly preferred targets for retrovirus integration. Cell. 1988;53:531–537. doi: 10.1016/0092-8674(88)90569-7.


