Abstract
A coding region homologous to the sequence of the essential eukaryotic enzyme dUTPase has been identified in different genomic regions of several viral lineages. Unlike the nonprimate lentiviruses (caprine arthritis-encephalitis virus, equine infectious anemia virus, feline immunodeficiency virus, and visna virus), where dUTPase is integrated into the pol coding region, this enzyme has never been demonstrated to be present in the primate lentivirus genomes (human immunodeficiency virus type 1 [HIV-1], HIV-2, or the related simian immunodeficiency virus). A novel approach allowed us to identify a weak but significant sequence similarity between HIV-1 gp120 and the human dUTPase. This finding was then extended to all of the primate lentivirus lineages. Together with the recently reported fragmentary structural similarity between the V3 loop region and the Escherichia coli dUTPase (P. D. Kwong, R. Wyatt, J. Robinson, R. W. Sweet, J. Sodroski, and W. A. Hendrickson, Nature 393:648–659, 1998), our results strongly suggest that an ancestral dUTPase gene has evolved into the present primate lentivirus CD4 and cytokine receptor interacting region of gp120.
dUTPase hydrolyzes dUTP to dUMP, thereby lowering the intracellular concentration of dUTP so that uracil is not misincorporated into DNA (12). This enzyme, essential in eukaryotes (4), has been acquired by multiple viral lineages (11). dUTPase sequences are highly variable, and no position is strictly conserved in an alignment (2) of sequences from mammals, yeast, plants, Escherichia coli, and viruses, making it impossible to derive a suitable consensus sequence to retrieve all dUTPases from sequence databases. On the other hand, dUTPases from the same subfamily (e.g., from lentiviruses) are too closely related to build a motif capable of retrieving distant relatives by using current methods.
Kwong et al. (6) have recently determined the structure of the human immunodeficiency virus type 1 (HIV-1) envelope glycoprotein gp120. Although this structure has no precedent, the authors noticed fragmentary structural similarities between the outer domain of gp120 and unspecified segments of the E. coli dUTPase (7). However, they were unable to find evidence of any sequence relationship to confirm this. Given that most nonprimate lentiviruses are known to encode a dUTPase, we further investigated this coincidence and were able to detect a subtle but significant sequence similarity between human dUTPase and the gp120 of primate lentiviruses.
A new approach that takes advantage of the high variability between dUTPases was used to capture some invariant properties of the dUTPase sequences. Starting from an alignment of lentivirus dUTPases (from caprine arthritis-encephalitis virus, equine infectious anemia virus, feline immunodeficiency virus, and visna virus; SwissProt accession no. P33459, P03371, P16088, Q84809, and P35956, respectively), we discarded all strictly conserved positions as noninformative and concentrated instead on the positions for which the contrast between residue variability and invariance of the hydropathy index (5) was the strongest. A regular expression motif, spanning 113 positions and allowing one gap of up to four residues, was then designed to capture these most informative (hydropathy-wise) positions:
[VCA][PAQ]<.3>[MTHFS]<.25>[TAIG]<.25>[QGN]<.2>[CMLI]<.3>[GNST]<.2>[NASGE]<.27>[NSVIT]<.15><.?4>[FYI]
where <.n> denotes a fixed spacing of n successive positions occupied by any of the 20 amino acids and <.?n> denotes a variable spacing (i.e., gap) of 0 to n positions. Brackets enclose the choices of residues allowed at a given position.
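The notation defined above maps directly onto standard regular-expression syntax. The following is a minimal sketch of that translation in Python, not the LookFor program (9) itself: the function names `motif_to_regex` and `find_motif` are hypothetical, and reporting overlapping matches through a lookahead is our assumption about how occurrences might be enumerated.

```python
import re

# The motif exactly as printed in the text (the stray space removed).
MOTIF = ("[VCA][PAQ]<.3>[MTHFS]<.25>[TAIG]<.25>[QGN]<.2>[CMLI]"
         "<.3>[GNST]<.2>[NASGE]<.27>[NSVIT]<.15><.?4>[FYI]")

# "Any of the 20 amino acids", written as an explicit character class.
ANY_RESIDUE = "[ACDEFGHIKLMNPQRSTVWY]"


def motif_to_regex(motif: str) -> str:
    """Translate <.n> to a fixed spacer and <.?n> to a 0..n gap.

    Bracketed residue choices are already valid regex character classes
    and pass through unchanged.
    """
    regex = re.sub(r"<\.\?(\d+)>", ANY_RESIDUE + r"{0,\1}", motif)
    regex = re.sub(r"<\.(\d+)>", ANY_RESIDUE + r"{\1}", regex)
    return regex


def find_motif(sequence: str, motif: str = MOTIF) -> list:
    """Return 0-based start offsets of all (possibly overlapping) matches."""
    # A lookahead makes each match zero-width, so overlapping hits are kept.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]
```

Calling `find_motif(protein_sequence)` on a translated env sequence would return the offsets at which the degenerate pattern occurs; an empty list means no occurrence.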
This highly degenerate pattern was used to scan (9) the viral section of the GenBank database and located a putative dUTPase similarity in frame with the env gene product of two different HIV-1 strains (AL and Z3; GenBank accession no. U95476 and K03347, respectively). To assess the statistical significance of this finding, a synthetic database of one million sequences was generated by repetitive randomization of the gp120 sequence of the HIV-1 subtype B strain HXB2 (GenBank accession no. K03455). Three hundred forty-one occurrences of the pattern shown above were found, leading to an estimated P value of 3.4 × 10⁻⁴.
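The randomization test described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: we assume that "randomization" means shuffling the residues of the sequence (which preserves its composition) and that an occurrence is counted once per shuffled sequence; the function names are hypothetical.

```python
import random
import re


def shuffled_copies(sequence: str, n_copies: int, seed: int = 0):
    """Yield n_copies residue-shuffled versions of the sequence.

    Shuffling preserves amino acid composition, so each synthetic
    sequence has the same length and residue counts as the original.
    """
    rng = random.Random(seed)
    residues = list(sequence)
    for _ in range(n_copies):
        rng.shuffle(residues)
        yield "".join(residues)


def empirical_p_value(sequence: str, regex: str, n_copies: int, seed: int = 0):
    """Return (hits, hits / n_copies) for a regex over shuffled copies.

    A hit is a shuffled sequence containing at least one match
    (an assumption; the text does not say how occurrences were counted).
    """
    pattern = re.compile(regex)
    hits = sum(
        1 for copy in shuffled_copies(sequence, n_copies, seed)
        if pattern.search(copy)
    )
    return hits, hits / n_copies
```

With the counts reported in the text, the estimate is simply 341 / 1,000,000 ≈ 3.4 × 10⁻⁴. Running the sketch at that scale takes a while in pure Python, so a smaller `n_copies` is advisable for a first check.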
The two identified HIV-1 env-encoded amino acid sequences were then aligned with all known dUTPases (2) to determine their closest relative. Interestingly, the highest similarity was obtained with the human dUTPase (GenBank accession no. P33316).
The human dUTPase sequence was then aligned with the gp120 consensus sequences (10) for HIV-1 groups O and M (subtypes A to H). In pairwise comparisons, each HIV-1 subtype shared 21 to 25 identical residues with the human dUTPase sequence (Fig. 1). Despite the variability of HIV-1 sequences in this region, the combined multiple alignment exhibited 16 strictly conserved positions and spanned 140 of 145 residues of the human dUTPase (Fig. 1). Such a 16-residue consensus pattern was not found to occur in a synthetic database of 10 million randomized gp120 sequences, a finding that corresponds to a high statistical significance (P < 10⁻⁷). Its biological relevance is confirmed by the fact that when the viral section of the GenBank database was scanned, 775 sequences (all corresponding to HIV-1 gp120) were found to exhibit this consensus.
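The two quantities reported here, identical residues in a pairwise alignment and strictly conserved positions in a multiple alignment, can be counted from gapped alignment rows as in the minimal sketch below. The function names and the `-` gap character are assumptions, and the rows used to try it out should be real alignment output; nothing here reproduces the alignments of Fig. 1.

```python
def count_identities(aligned_a: str, aligned_b: str, gap: str = "-") -> int:
    """Count columns where two aligned rows carry the same residue.

    Columns where either row has a gap are never counted as identities.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have equal length")
    return sum(
        1 for a, b in zip(aligned_a, aligned_b)
        if a == b and a != gap
    )


def strictly_conserved_columns(alignment: list, gap: str = "-") -> list:
    """Return 0-based indices of columns identical (and ungapped) in all rows."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows must have equal length")
    reference = alignment[0]
    return [
        i for i in range(length)
        if reference[i] != gap and all(row[i] == reference[i] for row in alignment)
    ]
```

On the significance bound: because the consensus was never observed among 10 million randomized sequences, the observed frequency is below 1 in 10⁷, which is the basis for P < 10⁻⁷.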
The gp120 sequences of the various primate lentivirus lineages are very divergent, and their simultaneous alignment introduces numerous insertions and deletions, obscuring their individual relationship with the dUTPase sequence. Indeed, the pairwise alignments of human dUTPase with gp120 of various HIV-1, HIV-2, and simian immunodeficiency virus (SIV) sequences exhibit comparable levels of similarity (22 to 28 identical residues) (Fig. 2). Thus, all primate lentivirus gp120 sequences appear evolutionarily equidistant from the human dUTPase sequence.
In contrast to other lentiviruses, primate lentiviruses lack dUTPase activity (3, 18). Other viruses are known to have incorporated the dUTPase gene at different locations (mostly in the pol region) in their genomes (2, 3). Thus, it is tempting to postulate that the central portions of HIV and SIV gp120 envelope proteins (residues 218 to 380 in Fig. 1) may have originated from the insertion of a mammalian dUTPase. Based on the HIV-1 gp120 three-dimensional structure (6, 14), this area involves mostly β-sheets and the V3 loop. Similarly, dUTPase structures show a predominance of β-sheets (7, 13). Nine of the strictly conserved residues (Leu 33, Lys 52, Ile 55, Pro 70, Ile 80, Gly 83, Gly 85, Gly 98, and Leu 120; numbered according to the E. coli dUTPase sequence [7]) cluster in the core of the dUTPase β-barrel or are located in the conserved β-sheets of the known dUTPase structures (7, 13). This suggests that their conservation is important in maintaining the overall fold. CD4 binding involves three critical residues in gp120: Asp 368, Glu 370, and Trp 427 (6). Asp 368 and Glu 370 are among those conserved in our alignment (Fig. 1) and in that with the other primate lentiviruses (Fig. 2). The V3 loop, involved in chemokine receptor binding (14), is also remarkably central to the putatively dUTPase-derived region.
By itself, the sequence homology identified here is not sufficient to discriminate between convergent and divergent evolution. However, the fact that the human dUTPase is the closest relative to primate lentivirus gp120, and appears equidistant from these now-divergent sequences, argues that the gene encoding it was acquired by an ancestral lentivirus from its host, before the separation of the main primate lentivirus lineages. Presumably, the enzymatic role of dUTPase evolved to the current receptor function, accelerating the loss of sequence similarity while retaining part of the original fold (6). The loss of dUTPase activity may bear some evolutionary advantages for the virus, such as increasing its mutability (8, 18) and inducing its latency in nondividing infected macrophages and T cells (16, 17). Alternatively, the loss of dUTPase might have been compensated by the function of the vpr gene in primate lentiviruses (1).
Acknowledgments
We acknowledge the helpful comments of an anonymous referee.
This work was partially supported by a grant from Incyte Pharmaceuticals, Inc. D.L.R. is funded by the Agence Nationale de Recherches sur le SIDA.
REFERENCES
1. Bouhamdan M, Benichou S, Rey F, Navarro J-M, Agostini I, Spire B, Camonis J, Slupphaug G, Vigne R, Benarous R, Sire J. Human immunodeficiency virus type 1 Vpr protein binds to the uracil DNA glycosylase DNA repair enzyme. J Virol. 1996;70:697–704. doi: 10.1128/jvi.70.2.697-704.1996.
2. Corpet F, Gouzy J, Kahn D. The ProDom database of protein domain families. Nucleic Acids Res. 1998;26:323–326. doi: 10.1093/nar/26.1.323.
3. Elder J H, Lerner D L, Hasselkuslight C S, Fontenot D J, Hunter E, Luciw P A, Montelaro R C, Phillips T R. Distinct subsets of retroviruses encode dUTPase. J Virol. 1992;66:1791–1794. doi: 10.1128/jvi.66.3.1791-1794.1992.
4. Gadsen M H, McIntosh E M, Game J C, Wilson P J, Haynes R H. dUTP pyrophosphatase is an essential enzyme in Saccharomyces cerevisiae. EMBO J. 1993;12:4425–4431. doi: 10.1002/j.1460-2075.1993.tb06127.x.
5. Hopp T P, Woods K R. Prediction of protein antigenic determinants from amino acid sequences. Proc Natl Acad Sci USA. 1981;78:3824–3828. doi: 10.1073/pnas.78.6.3824.
6. Kwong P D, Wyatt R, Robinson J, Sweet R W, Sodroski J, Hendrickson W A. Structure of an HIV gp120 envelope glycoprotein in complex with the CD4 receptor and a neutralizing antibody. Nature. 1998;393:648–659. doi: 10.1038/31405.
7. Larsson G, Svensson L A, Nyman P O. Crystal-structure of the Escherichia coli dUTPase in complex with a substrate analog. Nat Struct Biol. 1996;3:532–538. doi: 10.1038/nsb0696-532.
8. Lerner D L, Wagaman P C, Phillips T R, Prosperogarcia O, Henriksen S J, Fox H S, Bloom F E, Elder J H. Increased mutation frequency of feline immunodeficiency virus lacking functional deoxyuridine-triphosphatase. Proc Natl Acad Sci USA. 1995;92:7480–7484. doi: 10.1073/pnas.92.16.7480.
9. Lopez F, Abergel C, Claverie J-M. The LookFor program. http://igs-server.cnrs-mrs.fr/lookfor.
10. Los Alamos HIV Sequence database. http://hiv-web.lanl.gov/.
11. McGeoch D J. Protein sequence comparisons show that the ‘pseudoproteases’ encoded by poxviruses and certain retroviruses belong to the deoxyuridine triphosphatase family. Nucleic Acids Res. 1990;18:4105–4110. doi: 10.1093/nar/18.14.4105.
12. McIntosh E M, Ager D D, Gadsen M H, Haynes R H. Human dUTP pyrophosphatase: cDNA sequence and potential biological importance of the enzyme. Proc Natl Acad Sci USA. 1992;89:8020–8024. doi: 10.1073/pnas.89.17.8020.
13. Prasad G S, Stura E A, McRee D E, Laco G S, Hasselkuslight C, Elder J H, Stout C D. Crystal structure of dUTP pyrophosphatase from feline immunodeficiency virus. Protein Sci. 1996;5:2429–2437. doi: 10.1002/pro.5560051205.
14. Rizzuto C D, Wyatt R, Hernandez-Ramos N, Sun Y, Kwong D, Hendrickson W A, Sodroski J. A conserved HIV gp120 glycoprotein structure involved in chemokine receptor binding. Science. 1998;280:1949–1953. doi: 10.1126/science.280.5371.1949.
15. Sharp P M, Robertson D L, Gao F, Hahn B H. Origins and diversity of human immunodeficiency viruses. AIDS. 1994;8:S27–S42.
16. Steagall W K, Robek M D, Perry S T, Fuller F J, Payne S L. Incorporation of uracil into viral-DNA correlates with reduced replication of EIAV in macrophages. Virology. 1995;210:302–313. doi: 10.1006/viro.1995.1347.
17. Strahler J R, Zhu X X, Hora N, Wang Y K, Andrews P C, Roseman N A, Neel J V, Turka L, Hanash S M. Maturation stage and proliferation-dependent expression of dUTPase in human T-cell. Proc Natl Acad Sci USA. 1993;90:4991–4995. doi: 10.1073/pnas.90.11.4991.
18. Turelli P, Guigen F, Mornex J-F, Vigne R, Quérat G. dUTPase-minus caprine arthritis-encephalitis virus is attenuated for pathogenesis and accumulates G-to-A substitutions. J Virol. 1997;71:4522–4530. doi: 10.1128/jvi.71.6.4522-4530.1997.