Integration of the retrovirus DNA genome into host chromosomes is essential for viral replication. After virus infection of the cell, synthesis of viral cDNA occurs by reverse transcription of the viral RNA genome. Integration of the viral DNA genome into cellular DNA is catalyzed by the viral integrase (IN) encapsulated within the infecting virus particle. Viral DNA integration has different consequences for the host. Integration of HIV type 1 (HIV-1) results in virus spread, giving rise to the pathology of HIV-1/AIDS, which can be suppressed by combination drug therapies. In animal model systems, the integrated proviral DNA of murine leukemia virus (MLV) and avian sarcoma-leukemia virus (ASLV) is either transcribed to produce virus particles or silenced by host cellular mechanisms or, on rare occasions, may modify cellular protooncogenes, which can result in different forms of cancer. In this issue of PNAS, Holman and Coffin (1) studied DNA sequences at the sites of integration of HIV-1, MLV, and ASLV viral genomes within the human genome. The evaluation of this collection of nucleotide sequences was made possible by coupling the known human genome sequence with the recent cloning and sequencing of several thousand viral DNA integration sites resulting from infection of human cells by these three retroviruses (2-5).
Integration of the retrovirus DNA genome by IN occurs in the context of viral nucleoprotein complexes termed preintegration complexes (PICs) (6). Do PICs of HIV-1, MLV, and ASLV select distinct chromosomal regions in the human genome (2-5) for integration of their viral genomes? All of the chromosomes represented nearly equal targets for viral DNA integration with all three viruses. HIV-1 PICs favored integrating the viral genome inside genes (2-4). MLV PICs preferred integrating into or near transcription start sites (3, 4), although they also showed a slight preference for genes over other DNA regions. These HIV-1 and MLV preferences were ≈2-fold higher than for insertion of the viral genomes into other regions of a chromosome. ASLV PICs displayed a slight preference for genes and no preference for transcription start sites (4, 5). HIV-1 PICs preferred to integrate into actively transcribed genes over less active genes (2, 4), whereas ASLV PICs did not prefer active genes in human cells (4, 5). In fact, studies in ASLV-infected quail cells demonstrated that integration of the viral genome into an actively transcribed metallothionein gene is significantly inhibited relative to the same gene that had not been induced (7). These studies suggest that HIV-1 and MLV PICs may interact with chromatin-associated factors and/or transcriptional cofactors that help guide these nucleoprotein complexes to integrate into the gene itself for HIV-1 and into or near the transcriptional start sites for MLV (8).
Does IN have a preference for specific target sequences in the host DNA even when the PIC integrates the retroviral genome into many regions of every human chromosome? Holman and Coffin (1) evaluated the base frequencies directly at and adjacent to the cloned HIV-1, MLV, and ASLV integration sites. Sufficient numbers of sites were scored for each retrovirus (HIV-1, n = 1,795; MLV, n = 939; and ASLV, n = 620), allowing a statistical analysis to demonstrate strong preferences for nucleotide sequences at the sites of insertion with all three viruses. Human genome sequences were aligned at each determined viral DNA integration site and numbered relative to the integration sites. IN inserts the two 3′ OH ends of the viral linear DNA into the two strands of the cell DNA at a distance staggered by 4-6 bp (depending on the infecting virus), resulting in short duplications of cell DNA. The authors studied the cellular sequences as though IN was searching for the optimal target DNA sequences for integration just before insertion (Fig. 1). Sequences that span 500 bases on either side of the integration sites were examined. The five independent HIV-1 sequence sets revealed significant base preferences at and near the sites of insertion relative to randomly selected mock integration sites (n = 881). Some smaller base preferences extending from ≈15 bp to 17 bp on either side of the insertion sites were also evident. Analysis of MLV and ASLV data sets also showed significant base preferences at and near their integration sites. Among these three viruses, the selected sites and the sequence patterns at the sites of insertion differed in both sequence and the positions of preferred and avoided bases (Fig. 1).
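The kind of tabulation described above, aligning host sequence windows on each integration site and comparing per-position base frequencies with those at mock sites, can be sketched as follows. This is only an illustrative sketch, not the procedure used by Holman and Coffin (1): the function names (`window`, `position_frequencies`, `preference`), the toy genome, and the use of a simple frequency difference in place of a formal statistical test are all assumptions made for the example.

```python
from collections import Counter

BASES = "ACGT"

def window(genome, site, flank):
    """Return the host sequence from site-flank up to site+flank,
    or None if the window would run off either end of the sequence."""
    start, end = site - flank, site + flank
    if start < 0 or end > len(genome):
        return None
    return genome[start:end].upper()

def position_frequencies(windows):
    """Per-position base frequencies over equal-length, aligned windows."""
    if not windows:
        raise ValueError("need at least one window")
    freqs = []
    for i in range(len(windows[0])):
        counts = Counter(w[i] for w in windows)
        total = sum(counts[b] for b in BASES)  # ignore N or other symbols
        freqs.append({b: (counts[b] / total if total else 0.0) for b in BASES})
    return freqs

def preference(site_freqs, mock_freqs):
    """Observed minus mock frequency at each position (hypothetical measure):
    positive values suggest a preferred base, negative an avoided one."""
    return [{b: s[b] - m[b] for b in BASES}
            for s, m in zip(site_freqs, mock_freqs)]

# Toy data (not real integration sites).
genome = "ACGTACGTACGTACGT"
real_windows = [window(genome, s, 2) for s in (4, 8, 12)]
mock_windows = [window(genome, s, 2) for s in (3, 7, 11)]
diff = preference(position_frequencies(real_windows),
                  position_frequencies(mock_windows))
print(diff[0])
```

With real data, the windows would come from the human genome sequence at the mapped sites, and the comparison with mock sites would need an appropriate significance test rather than a raw difference.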
Fig. 1.
Cellular DNA target site selection by integrase before integration. The ends of the viral 10-kbp linear DNA (angled) are not connected in this drawing. The host DNA is represented by the horizontal DNA structure and does not take into account the chromatin structure of DNA in vivo. The potential insertion sites are indicated by two red dots on the host DNA. The 2-bp 3′ OH recessed viral DNA ends are adjacent to the host DNA, with the 2-bp viral DNA overhang shown as 5′. The four black dashed ovals (labeled IN) represent two dimers of IN (tetramer) bound to the viral and host DNA. The host nucleotides bound by IN at and outside of the insertion sites (sequence pattern) vary for each virus, and their approximate locations are bracketed. In this model, the two ends of the viral DNA bound with IN are partners for symmetrical recognition of the host target site.
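The staggered joining of the two viral DNA ends described in the text, and the short duplication of host DNA it leaves behind, can be illustrated with a small string model. This is a minimal sketch under stated assumptions: the function name `integrate`, the toy host and provirus strings, and the lowercase marking of viral sequence are hypothetical; only the 4-6 bp stagger range and the 5-bp HIV-1 duplication come from the article.

```python
def integrate(host, provirus, site, stagger):
    """Model insertion with a staggered cut.

    The two viral ends join the two host strands `stagger` bp apart,
    so after repair the host bases host[site:site+stagger] appear on
    both flanks of the provirus. Returns (integrated, duplicated).
    """
    if not 4 <= stagger <= 6:
        raise ValueError("the article describes a 4-6 bp stagger")
    if site < 0 or site + stagger > len(host):
        raise ValueError("insertion site outside host sequence")
    duplicated = host[site:site + stagger]
    integrated = host[:site + stagger] + provirus + host[site:]
    return integrated, duplicated

# Toy example with a 5-bp stagger, as for HIV-1; viral DNA in lowercase.
integrated, dup = integrate("AAACCGGTTT", "tgnnnca", site=3, stagger=5)
print(integrated)
print(dup)
```

The duplicated host bases flank the lowercase provirus on both sides, which is why the target-site sequence can be read from a cloned junction.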
The Holman and Coffin study (1) also yielded major insights into how IN within the PIC recognizes cellular sequences for integration in vivo. Models of IN-viral DNA complexes have suggested that a symmetrical complex of four IN subunits associated with two viral DNA ends is responsible for integration (Fig. 1), although other arrangements with different numbers of subunits have not been entirely ruled out. The base preferences at the sites of integration and in the proximal sequences for HIV-1, MLV, and ASLV demonstrated a significant symmetry, strongly suggesting that IN recognizes a sequence pattern upon integration (Fig. 1). This symmetrical recognition of bases was apparent for both the preferred and avoided bases, producing a specific pattern of sequences for each virus. The observed sequence patterns at the sites of integration differed among the three viruses, possibly reflecting the target-binding properties of each IN. The particular target-binding sites for each IN would reflect the sequence and flexibility of DNA that result in a unique tertiary structure compatible with binding.
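One way to make the notion of a symmetrical sequence pattern concrete is to compare a per-position base-frequency profile with its own reverse complement: for a pattern recognized symmetrically on the two strands, the frequency of a base at one position should match the frequency of its complement at the mirrored position. The sketch below is an illustration only, not the analysis used in the study. The function names, the `asymmetry` score, and the toy profiles are assumptions, and the profile is assumed to be centered on the midpoint between the two staggered insertion points.

```python
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement_profile(freqs):
    """Profile of the opposite strand: reverse the positions and swap
    each base's frequency with that of its complement."""
    return [{b: col[COMPLEMENT[b]] for b in BASES} for col in reversed(freqs)]

def asymmetry(freqs):
    """Mean absolute difference between a profile and its reverse
    complement (hypothetical score): 0.0 means perfectly symmetrical."""
    rc = reverse_complement_profile(freqs)
    diffs = [abs(col[b] - rc_col[b])
             for col, rc_col in zip(freqs, rc) for b in BASES]
    return sum(diffs) / len(diffs)

def one_hot(seq):
    """Toy profile in which each position has a single fixed base."""
    return [{b: 1.0 if b == base else 0.0 for b in BASES} for base in seq]

# "GTAC" equals its own reverse complement, so its profile is symmetrical;
# "AAAA" is not (its reverse complement is "TTTT").
print(asymmetry(one_hot("GTAC")))
print(asymmetry(one_hot("AAAA")))
```

A score near zero on real integration-site profiles would be consistent with the two viral DNA ends acting as partners in recognizing the target, as drawn in Fig. 1.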
Selection of host target sites by some retrovirus PICs may be influenced by cellular transcription factors or chromatin-associated factors (8, 9). Host cell factors have been shown to direct nucleoprotein complexes of intracellular retrotransposons (possessing replication pathways similar to that of retroviruses) to mediate integration into specific sites in yeast genomes (10, 11). Genetic approaches such as efficient gene depletion and knockout strategies will be necessary to show whether cellular host factors either influence or direct retrovirus PICs to certain target sites for integration.
The symmetrical sequence patterns at and surrounding the host target sites suggest that IN plays the major role in positioning the PIC on the target DNA (Fig. 1). The ability of HIV-1 and ASLV IN to bind target sequences for insertion of the viral DNA ends has been mapped to the central catalytic core domain of IN (12-14). In vitro integration assays with model viral DNA substrates and naked target DNA have been used to analyze the sequences surrounding the insertion sites (n = 93) that possess the HIV-1 5-bp host site duplications (15). Few base preferences (except for a G at the attachment sites) and few sequence patterns were observed at the sites of insertion when nonionic detergent-lysed HIV-1 virions were used as the source of IN. In support of the Holman and Coffin study (1), analysis of integration sites (n = 300) on naked DNA using purified avian myeloblastosis virus IN demonstrated a symmetrical pattern of G/C and A/T sequence biases at and surrounding the insertion sites (16), similar but not identical to the symmetrical pattern observed in the ASLV PIC sequence studies (1).
The studies of Holman and Coffin (1) suggest that the PICs of HIV-1, MLV, and ASLV (2-5) may have different interactions with cellular DNA for integration. Understanding the molecular structure of nucleoprotein complexes comprising purified IN-viral DNA, a task not yet accomplished, would be beneficial for understanding the symmetrical sequence patterns observed upon integration. Target binding by IN played a role in the development of drugs against HIV-1 IN. Potentially clinically important drugs directed against HIV-1 IN prevent viral DNA integration into the host genome (17). These drugs bind to IN only in the context of an IN-viral DNA complex and inhibit integration by competing with target binding. Target binding by PICs is also critical for human gene therapy. MLV-mediated insertional mutagenesis of the LMO2 protooncogene occurred in 2 of the 12 children being treated for severe combined immune deficiency and resulted in leukemia (18). Understanding IN target site selection is important for the future development of safe retrovirus vectors for gene therapy.
See companion article on page 6103.
References
- 1. Holman, A. G. & Coffin, J. M. (2005) Proc. Natl. Acad. Sci. USA 102, 6103-6107.
- 2. Schroder, A. R., Shinn, P., Chen, H., Berry, C., Ecker, J. R. & Bushman, F. (2002) Cell 110, 521-529.
- 3. Wu, X., Li, Y., Crise, B. & Burgess, S. M. (2003) Science 300, 1749-1751.
- 4. Mitchell, R. S., Beitzel, B. F., Schroder, A. R., Shinn, P., Chen, H., Berry, C. C., Ecker, J. R. & Bushman, F. D. (2004) PLoS Biol. 2, E234.
- 5. Narezkina, A., Taganov, K. D., Litwin, S., Stoyanova, R., Hayashi, J., Seeger, C., Skalka, A. M. & Katz, R. A. (2004) J. Virol. 78, 11656-11663.
- 6. Brown, P. O., Bowerman, B., Varmus, H. E. & Bishop, J. M. (1987) Cell 49, 347-356.
- 7. Maxfield, L. F., Fraize, C. D. & Coffin, J. M. (2005) Proc. Natl. Acad. Sci. USA 102, 1436-1441.
- 8. Engelman, A. (2005) Proc. Natl. Acad. Sci. USA 102, 1275-1276.
- 9. Bushman, F. D. (2002) Curr. Top. Microbiol. Immunol. 261, 165-177.
- 10. Sandmeyer, S. (2003) Proc. Natl. Acad. Sci. USA 100, 5586-5588.
- 11. Zhu, Y., Dai, J., Fuerst, P. G. & Voytas, D. F. (2003) Proc. Natl. Acad. Sci. USA 100, 5891-5895.
- 12. Katzman, M. & Sudol, M. (1998) J. Virol. 72, 1744-1753.
- 13. Appa, R. S., Shin, C. G., Lee, P. & Chow, S. A. (2001) J. Biol. Chem. 276, 45848-45855.
- 14. Wang, J. Y., Ling, H., Yang, W. & Craigie, R. (2001) EMBO J. 20, 7333-7343.
- 15. Goodarzi, G., Chiu, R., Brackmann, K., Kohn, K., Pommier, Y. & Grandgenett, D. P. (1997) Virology 231, 210-217.
- 16. Fitzgerald, M. L. & Grandgenett, D. P. (1994) J. Virol. 68, 4314-4321.
- 17. Hazuda, D. J., Anthony, N. J., Gomez, R. P., Jolly, S. M., Wai, J. S., Zhuang, L., Fisher, T. E., Embrey, M., Guare, J. P., Jr., Egbertson, M. S., et al. (2004) Proc. Natl. Acad. Sci. USA 101, 11233-11238.
- 18. Hacein-Bey-Abina, S., Von Kalle, C., Schmidt, M., McCormack, M. P., Wulffraat, N., Leboulch, P., Lim, A., Osborne, C. S., Pawliuk, R., Morillon, E., et al. (2003) Science 302, 415-419.

