Abstract
The human immunodeficiency virus type 1 (HIV-1) epidemic in Southeast Asia has been largely due to the emergence of clade E (HIV-1E). It has been suggested that HIV-1E is derived from a recombinant lineage of subtype A (HIV-1A) and subtype E, with multiple breakpoints along the E genome. We obtained complete genome sequences of clade E viruses from Thailand (93TH057 and 93TH065) and from the Central African Republic (90CF11697 and 90CF4071), increasing the total number of HIV-1E complete genome sequences available to seven. Phylogenetic analysis of complete genomes showed that subtypes A and E are themselves monophyletic, although together they also form a larger monophyletic group. The apparent phylogenetic incongruence at different regions of the genome that was previously taken as evidence of recombination is shown to be not statistically significant. Furthermore, simulations indicate that bootscanning and pairwise distance results, previously used as evidence for recombination, can be misleading, particularly when there are differences in substitution or evolutionary rates across the genomes of different subtypes. Taken jointly, our analyses suggest that there is inadequate support for the hypothesis that subtype E variants are derived from a recombinant lineage. In contrast, many other HIV strains claimed to have a recombinant origin, including viruses for which only a single parental strain was employed for analysis, do indeed satisfy the statistical criteria we propose. Thus, while intersubtype recombinant HIV strains are indeed circulating, the criteria for assigning a recombinant origin to viral structures should include statistical testing of alternative hypotheses to avoid inappropriate assignments that would obscure the true evolutionary properties of these viruses.
Viruses involved in the human immunodeficiency virus type 1 (HIV-1) pandemic are grouped into the main (M), the outlier (O), and the non-M, non-O (N) groups. Phylogenetic analysis of the env and gag genes of the M group has established 10 distinct subtypes, or clades (A through H, J, and K) (11, 26, 33, 60; information found in the HIV Molecular Immunology database [http://hiv-web.lanl.gov/immuno/ctl]). A high degree of genetic diversity has developed among and within these clades through nucleotide substitution, duplication, deletion, and recombination of closely related or divergent viral strains (1, 6, 18, 27, 42, 43, 47, 49). The relatively high level of genetic divergence between the M group clades has led to the hypothesis that multiple vaccines against HIV-1 may have to be made against the different subtypes of the virus (20, 35). Sequence information on most of the 10 subtypes is currently limited, suggesting that more information will be needed if subtype-specific vaccines are to be produced.
Previous studies of clade E viruses from both Thailand and the Central African Republic suggest that HIV-1E originated in Africa and then spread through a single introduction into Southeast Asia (12, 35, 38, 39). HIV-1E predominates in a growing epidemic in Southeast Asia and is expected to represent a major proportion of new HIV infections in the coming decades (65). The number of HIV-1-infected individuals in Thailand is estimated to be 750,000, with 90% of the sexually transmitted viruses belonging to subtype E (61, 64) and over half of the recently infected intravenous drug users infected with subtype E (24, 32, 57).
Phylogenetic analysis of HIV-1 group M viruses has led to the discovery of intersubtype recombinants, each having genome regions that are evolutionarily associated with different subtypes (4, 5, 9, 12, 22, 30, 37, 43, 46, 49). Typically, unique recombinants are represented by individual strains. In some instances, however, entire groups appear to be descended from a recombinant lineage: viruses in subtype E, those ascribed to the circulating recombinant form IbNG (3, 36), and those in recent outbreaks in Russia (31) and in China (51, 52) are examples. The clade E viruses are unusual in that only a single parental strain has been identified. These viruses, recently designated HIV-1 subtype A/E, are described as recombinant lineages, with regions of the gag and pol genes derived from the A subtype and regions of the env and vpu genes derived from an unknown, subtype E parental strain (4, 12).
Several techniques have been designed to detect recombination events. Some examples include Stephens' method, based on incompatible sites (56); Sawyer's method, based on imbalances in the distribution of sequence segments (50); Smith's chi-square method (55); Jakobsen and Easteal's method of displaying compatibility matrices (21); Grassly and Holmes' sliding window likelihood approach (14); Weiller's graphical method, based on character partitions (63); and the RIP program of Siepel et al. (54). The techniques originally used in identifying recombination events within the present HIV-1E genome, which is thought to have had one of its parental lineages either die out or go undetected, include a combination of bootscanning (48) and pairwise distance analyses (12).
The bootscanning analysis uses bootstrapped phylogenetic analyses (7) on a sliding window of sequential and overlapping segments of the HIV-1 genome alignment. Previous HIV-1 subtype E recombination analyses have relied on the three HIV-1 subtype E complete genomes available in the literature. We have sequenced four more subtype E complete genomes, two from Thailand and two from the Central African Republic. In order to locate putative regions of intersubtype recombination, we performed bootscanning and pairwise distance analyses that yielded results similar to those obtained by other researchers (4, 12). However, using more-stringent statistical techniques (described below), we found no support for the hypothesis that HIV-1E is derived from a recombinant ancestor. Using simulations, we also found that variations in substitution and/or evolutionary rates across the HIV-1 genome can appear as putative recombination events with bootscanning and pairwise distance analyses. Our analyses indicate that the evolutionary relationship of subtype A and E viruses is similar to that of subtype B and D viruses, the latter of which have never been classified as recombinants. Our analyses also suggest that subtype E viruses were not derived from a recombinant lineage and in fact form an independent monophyletic clade that, similar to the relationship described for clades B and D, maintains a relatively close evolutionary affinity to HIV-1A.
MATERIALS AND METHODS
Terminology.
Throughout this article, all previously defined HIV-1 subtypes are assumed to form monophyletic groups. All variants that have recently been termed HIV-1 subtype A/E are referred to as “HIV-1 subtype E” variants. The term “evolutionary rate” refers to the measure of nucleotide divergence over time from a common ancestor: we assume that two contemporary lineages that share a common ancestor have different evolutionary rates if the branch lengths are different. The term “substitution rate” refers to the expected rate of substitution of one nucleotide for another. A “recombinant lineage” is broadly defined as a phylogenetic subset of variants that share a common ancestor that acquired different parts of its genome from two or more parental strains. In the case of “intersubtype recombination,” the founder for the recombinant lineage would be formed by a recombination event between viruses from two different viral subtypes.
Sample acquisition, viral gene amplification, cloning, and sequencing.
HIV-1 subtype E viral isolates used in this study were obtained from individuals in the Central African Republic (90CF11697 and 90CF4071) (38) and Thailand (93TH057 and 93TH065) (62). A nested set of PCRs was used to amplify either half-genome (∼5,000 bp)- or quarter-genome (∼2,500 bp)-sized fragments from end point-diluted proviral samples (Expand High Fidelity PCR System; Boehringer Mannheim, Indianapolis, Ind.) according to the manufacturer's instructions (44).
Either overlapping PCR fragments from each patient were subcloned into pAMP1 (Gibco BRL, Grand Island, N.Y.) and sequenced or the PCR product was directly sequenced using dye terminator chemistry on an automated DNA sequencer model 373A or 377 (Applied Biosystems, Foster City, Calif.) according to the manufacturer's instructions. Sequence data were assembled using the computer program Sequencher (Gene Codes Corp., Ann Arbor, Mich.).
Sequence alignment.
Newly determined clade E sequences 93TH057, 93TH065, 90CF4071, and 90CF11697 were aligned, using the program ClustalW (59), with the complete genomes of 42 subtype reference sequences that are representative of the variability of the current HIV-1 pandemic (2, 25) (Table 1). The initial ClustalW-generated multiple sequence alignment was manually refined to maximize the positional homology of nucleotides while minimizing the number of gaps introduced into the alignment. Amino acid coding information for each sequence was also used in the manual refinement of the sequences. Regions of the alignment where positional homology was undeterminable were excluded, but the alignment was not completely gap-stripped. Thus, 8,650 sites were included in the final alignment. Since some analyses require that the reference alignment does not include recombinants, a second alignment (alignment 2) was made with the newly determined clade E sequences and the complete genomes of 32 of the 42 subtype reference sequences. This second alignment included only a single outgroup sequence, CPZGAB, and excluded the sequences of subtypes AG and AGI, which are thought to be derived from recombinant lineages.
TABLE 1.
Complete genome subtype reference sequences
Sequence name | Accession no. | Subtype |
---|---|---|
92UG037 | U51190 | A |
Q23 | AF004885 | A |
SE7253 | AF069670 | A |
IBNG | L39106 | AG |
DJ264 | AF063224 | AG |
DJ263 | AF063223 | AG |
CY032 | AF049337 | AGI |
PVMY | AF119819 | AGI |
PVCH | AF119820 | AGI |
BRU | K02013 | B |
MN | M17449 | B |
AD8 | AF004394 | B |
JRFL | U63632 | B |
RF | M17451 | B |
WEAU | U21135 | B |
ETH2220 | U46016 | C |
92BR025 | U52953 | C |
IN21068 | AF067155 | C |
96BW05 | AF110967 | C |
NDK | M27323 | D |
ELI | K03454 | D |
94UG114 | U88824 | D |
84ZR085 | U88822 | D |
90CF402 | U51188 | E |
93TH253 | U51189 | E |
CM240 | U54771 | E |
93BR020 | AF005494 | F |
VI850 | AF077336 | F |
FIN9363 | AF075703 | F |
SE6165 | AF061642 | G |
HH8793 | AF061640 | G |
DRCBL | AF084936 | G |
90CF056 | AF005496 | H |
VI991 | AF190127 | H |
VI997 | AF190128 | H |
SE9280 | AF082394 | J |
SE9173 | AF082395 | J |
YBF30 | AJ006022 | Group N |
MVP5180 | L20571 | Group O |
ANT70 | L20587 | Group O |
CPZGAB | X52154 | SIVCPZ |
CPZUS | AF103818 | SIVCPZ |
Bootscan analysis.
For bootscan analysis (48), alignment 2 was divided into sequential overlapping segments of 500 nucleotides, with 100 nucleotide steps between each segment. The window size of 500 nucleotides was chosen to correspond with previous studies that describe subtype E variants as recombinants (4, 12). Bootstrapping using 100 replicates was performed on each segment using the program PAUP* 4.0 (58). The bootstrapping analysis used a general time-reversible (GTR) model (28) of nucleotide substitution and a gamma-distributed site-to-site heterogeneity of rates with invariant sites, all estimated from the original data set. The bootstrap values that supported the monophyly of all HIV-1E and HIV-1A sequences were then plotted.
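The windowing step described above can be sketched as follows. This is a minimal illustration, not the procedure actually run in PAUP*: the function names `sliding_windows` and `slice_alignment` are hypothetical, the alignment is assumed to be a dictionary of equal-length strings, and the handling of a trailing partial window (dropped here) is an assumption, since the text does not specify it.

```python
def sliding_windows(alignment_length, window=500, step=100):
    """Return half-open (start, end) column ranges of overlapping windows."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # Assumption: only full-length windows are kept; a trailing partial
    # window is dropped (the source text does not say how it was handled).
    last_start = max(alignment_length - window, 0)
    return [(start, min(start + window, alignment_length))
            for start in range(0, last_start + 1, step)]


def slice_alignment(alignment, start, end):
    """Cut every aligned sequence down to columns [start, end)."""
    return {name: seq[start:end] for name, seq in alignment.items()}


# Window coordinates for an alignment of 8,650 sites.
windows = sliding_windows(8650)
print(len(windows), windows[0], windows[-1])
```

With an 8,650-site alignment this yields 82 windows, the last spanning columns 8100 to 8599 (0-based), so under this convention the final 50 columns are not covered by any full window. Each slice would then be passed to a separate bootstrap analysis.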
Distance analysis.
The pairwise distance analysis performed was similar to that described by Gao et al. (12), with our sequence alignment divided into sequential overlapping segments of 500 nucleotides and moving in steps of 100 nucleotides between each segment. Maximum likelihood distances were calculated under the Felsenstein 84 model of evolution for each segment using the program DNADIST from the PHYLIP package (8). The intersubtype and intrasubtype pairwise distances were calculated and plotted across the HIV-1 genome.
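The windowed comparison of intra- and intersubtype distances can be illustrated with a short sketch. Note the substitution: the analysis above used maximum likelihood distances under the Felsenstein 84 model from DNADIST, whereas this example uses the uncorrected proportion of differing sites (p-distance) purely to keep the code self-contained. The names `p_distance`, `mean_distances_by_window`, and `subtype_of`, and the toy sequences, are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping columns gapped in either sequence."""
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else float("nan")


def _mean(values):
    return sum(values) / len(values) if values else float("nan")


def mean_distances_by_window(alignment, subtype_of, window=500, step=100):
    """Return (start, mean intrasubtype, mean intersubtype) for each window."""
    length = len(next(iter(alignment.values())))
    rows = []
    for start in range(0, max(length - window, 0) + 1, step):
        intra, inter = [], []
        for name_a, name_b in combinations(sorted(alignment), 2):
            d = p_distance(alignment[name_a][start:start + window],
                           alignment[name_b][start:start + window])
            same = subtype_of[name_a] == subtype_of[name_b]
            (intra if same else inter).append(d)
        rows.append((start, _mean(intra), _mean(inter)))
    return rows


# Toy data (made up for illustration only).
toy = {"A1": "ACGTACGT", "A2": "ACGTACGA", "E1": "ACGAACTT", "E2": "ACGAACTA"}
subtypes = {"A1": "A", "A2": "A", "E1": "E", "E2": "E"}
for row in mean_distances_by_window(toy, subtypes, window=4, step=2):
    print(row)
```

Plotting the second and third columns against the window start gives the kind of intra- versus intersubtype profile described above; a model-corrected distance would change the values but not the bookkeeping.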
Kishino-Hasegawa testing.
Likelihood ratio tests (19) were performed on substitution rate models to identify the model that produced significantly better likelihood scores for the given data set. The program Modeltest was used to compare the different nested models of DNA substitution in a hierarchical hypothesis-testing framework (40). The nested nature of this group of nucleotide substitution models also allows measurement of the effects that different model parameters have on the goodness of fit.
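For reference, the standard form of the likelihood ratio statistic for a pair of nested models is sketched below. The symbols are introduced here for illustration and are not taken from the original text: L_0 and L_1 are the maximized likelihoods of the simpler and the more general model, and k is the difference in their numbers of free parameters.

```latex
\delta = 2\left(\ln L_1 - \ln L_0\right),
\qquad
\delta \;\overset{\text{approx.}}{\sim}\; \chi^2_{k}
\quad \text{under the simpler model}
```

In a hierarchical framework, the more general model is retained only when δ exceeds the chosen critical value of the χ² distribution with k degrees of freedom; the approximation assumes the usual regularity conditions.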
Kishino-Hasegawa (KH) tests were conducted using the PAUP* 4.0 program (58). The KH test compares the likelihood scores of two phylogenetic trees defined a priori (e.g., trees that correspond to hypotheses derived independently of the observed data) over a given region of an alignment and tests the null hypothesis that there is no statistical difference in the likelihoods of the two trees. The KH test is designed to compare only tree topologies and thus does not use fixed branch lengths in the calculation of likelihood scores. Therefore, two identical topologies will always produce identical likelihood scores and will satisfy the null hypothesis of no difference. We were interested in knowing if there is a statistically significant difference between topologies for which HIV-1E sequences share a common node with HIV-1A sequences and those for which the two subtypes are separate monophyletic groups that do not exclusively share a common node. The complete genome reference sequence alignment was divided into nine regions that corresponded to variations in the tree topology, as indicated by the bootscan results. The maximum likelihood (ML) tree topology for each region was constructed. Constrained tree topologies, which forced clade E sequences to share a common node with the clade A sequences, were also constructed for regions where the best topology grouped clade E viruses evolutionarily closer to subtypes other than HIV-1A. The KH test was used to test the null hypothesis that there is no statistical difference in the likelihoods of the best-tree topology and the constrained-tree topology.
Novel likelihood-based statistical tests of topologies have been developed to correctly compare tree topologies that are derived from the observed data (i.e., the ML tree topology) (13, 53). These tests are computationally intensive, limiting the size of sample sets to 10 or fewer sequences. The KH test, however, can be used with larger sample sets, although using the KH test to compare prederived topologies may force the KH test to be too conservative because the test was designed to statistically compare the topologies of two trees defined a priori (13).
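The core calculation behind a KH comparison can be sketched as follows. This is a minimal illustration of the normal-approximation form of the test, assuming that per-site log-likelihoods for the two topologies have already been exported from an ML program; it does not reproduce the PAUP* implementation, and the function name `kh_test` and the example values are hypothetical.

```python
import math


def kh_test(site_lnl_tree1, site_lnl_tree2):
    """Two-tailed KH test (normal approximation) from per-site log-likelihoods.

    Returns (delta, z, p_value), where delta = lnL(tree1) - lnL(tree2).
    """
    if len(site_lnl_tree1) != len(site_lnl_tree2):
        raise ValueError("both trees must be scored on the same sites")
    n = len(site_lnl_tree1)
    if n < 2:
        raise ValueError("need at least two sites")
    diffs = [a - b for a, b in zip(site_lnl_tree1, site_lnl_tree2)]
    delta = sum(diffs)
    mean = delta / n
    # Variance of the summed difference, estimated from site-to-site variation.
    variance_of_sum = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    if variance_of_sum == 0:
        # Degenerate case, e.g. identical topologies with identical site scores.
        return delta, 0.0, (1.0 if delta == 0 else 0.0)
    z = delta / math.sqrt(variance_of_sum)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal P value
    return delta, z, p_value


# Made-up per-site scores: tree 1 is consistently better than tree 2.
tree1 = [-1.0, -1.1] * 25
tree2 = [-2.0, -2.0] * 25
print(kh_test(tree1, tree2))
```

Two identical sets of site likelihoods give a difference of zero and a P value of 1, matching the statement above that identical topologies satisfy the null hypothesis.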
Simulated data.
In evaluating recombination, it is instructive to ask whether nonrecombining sequences can show similar bootscan and pairwise distances as those obtained previously with subtype E sequences. Differences in the rates of evolution or substitution across the genome can potentially confound phylogenetic signal (15, 16). To address this issue, we simulated multiple nucleotide alignments corresponding to the 36 complete genome sequences (alignment 2) used in our bootscanning and pairwise distance analyses. Since we were interested in knowing whether an equivalent pattern can be produced in the absence of recombination, we simulated data using a single identical phylogeny across the genome, broken into nine segments (see Fig. 1). The nine contiguous segments corresponded to regions that had previously been identified by bootscanning as having potentially different phylogenetic affinities (possibly as a result of recombination). This segmenting of the genome allowed us to vary the rate of evolution along the genome to correspond with the observed data while maintaining a consistent tree topology. The nine regions were then concatenated to form a simulated complete genome alignment that maintained variations in rates of evolution along the genome. The complete genome phylogeny for the 36 isolates included in alignment 2 was used as the tree topology for all nine regions. In the absence of recombination, the complete genome ML topology should provide an accurate reconstruction of relationships among lineages (17). This topology places E and A viral isolates as separate monophyletic groups that together form a larger monophyletic group. These simulations were performed using the program Seq-Gen, version 1.1 (41). To produce the simulated data sets, Seq-Gen used a GTR model (28) of nucleotide substitution, gamma-distributed site-to-site heterogeneity of rates, and nucleotide frequencies that were estimated from the observed sequence data (Table 2). 
Therefore, the evolutionary rate differences along the simulated genome reflect the differences seen in the observed data. The program also used a phylogenetic tree as the true tree for each data set that it produced. The true trees for each of the nine regions were produced by building a constrained tree for each region based on the whole genome ML tree topology, which maintains clades A and E as two separate monophyletic groups. Therefore, the same tree topology is maintained along the entire genome, thus ensuring that recombination has not taken place in the simulated data but accounting for variations in the evolutionary rate by allowing for variations in the branch lengths for each of the nine simulated regions.
FIG. 1.
Plot showing bootstrap clustering values comparing subtypes A and E along the genome. The x axis shows the relative position across the HIV-1 genome. Numbered sections along the HIV-1 gene map identify the nine contiguous genomic regions that were used in later analyses.
TABLE 2.
Models of evolution used by Seq-Gen to simulate multiple sequence alignments
Nucleotide substitution rates used in the GTR model of evolution, base frequencies, and the gamma distributed site-specific rate heterogeneity are indicated for each of the nine contiguous regions along the genome.
The complete genome ML topology was used as the model tree to generate 100 independent simulations. The model of substitution and the branch lengths of the model tree varied along the length of the genome corresponding to regional variations in the observed data. Bootscan and pairwise distance analyses were performed on the simulated data, as with the original data.
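The design of the simulation, one fixed topology with region-specific rates and the regions concatenated into a single genome, can be illustrated with a toy sketch. Several simplifications are assumptions made here for brevity: the Jukes-Cantor model stands in for the GTR model with gamma-distributed rates used with Seq-Gen, a single scale factor per region stands in for region-specific branch lengths, and the tree, region sizes, and function names are invented for the example.

```python
import math
import random

BASES = "ACGT"


def evolve(seq, branch_length, rng):
    """Evolve a sequence along one branch under Jukes-Cantor (JC69)."""
    # Probability that a site ends the branch in a different base.
    p_change = 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate(tree, seq, scale, rng, leaves):
    """Evolve seq down a tree given as (name, branch_length, children)."""
    name, length, children = tree
    seq = evolve(seq, length * scale, rng)
    if not children:
        leaves[name] = seq
    for child in children:
        simulate(child, seq, scale, rng, leaves)
    return leaves


def simulate_genome(tree, regions, seed=0):
    """Concatenate regions simulated on ONE topology with region-specific rates."""
    rng = random.Random(seed)
    genome = {}
    for n_sites, scale in regions:
        root = "".join(rng.choice(BASES) for _ in range(n_sites))
        for name, seq in simulate(tree, root, scale, rng, {}).items():
            genome[name] = genome.get(name, "") + seq
    return genome


# Hypothetical tree: A and E are sister lineages, B is more distant.
toy_tree = ("root", 0.0, [("AE", 0.02, [("A", 0.05, []), ("E", 0.05, [])]),
                          ("B", 0.10, [])])
# Hypothetical regions: (number of sites, rate multiplier).
toy_genome = simulate_genome(toy_tree, [(300, 1.0), (200, 3.0)], seed=1)
print({name: len(seq) for name, seq in toy_genome.items()})
```

Because every region is generated on the same topology, any change in apparent A/E affinity along such a simulated genome reflects rate variation and sampling noise, not recombination, which is the point of the comparison with the observed data.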
Nucleotide sequence accession numbers.
GenBank accession numbers for the full-length HIV-1 sequences reported in this study are AF197338 (93TH057), AF197339 (93TH065), AF197340 (90CF11697), and AF197341 (90CF4071).
RESULTS
Bootscan analysis.
Complete genome phylogenetic analysis can show the broad relationships among viral populations, but a more detailed analysis that breaks up the genome into smaller sections may reveal relationships that are more intricate. Bootscan analysis, which breaks the genome into small sections and then analyzes each section independently, has been used to identify areas of recombination within an HIV-1 genome (48). Ideally, bootscan analysis of nonrecombining genomes would show consistently high bootstrap support (i.e., >70%) across the genome for major clades containing the same sets of sequences. If some sequences were recombinants, then their phylogenetic placement at different regions of the genome would vary, grouping with one parental lineage for one region of the genome and with another lineage for a different region (48). However, it is unclear whether inconsistent bootstrap support for clade membership or phylogenetic incongruence in and of itself is sufficient evidence of the presence of recombinant lineages.
The bootscan plot we obtained using the full set of 36 genomes was comparable to previous clade E bootscanning results produced by Carr et al. (4) and Gao et al. (12). Sequences from clades A and E group with a high bootstrap value over most of the genome, with regions of lower bootstrap values located in env, nef, and vif (Fig. 1).
Pairwise distance analysis.
Pairwise distance analyses have been used in conjunction with bootscan analyses to support the idea of recombination within the clade E genome. Previous pairwise distance analyses have shown that clade E and clade A viruses are relatively close to one another in gag but more distant in env (4, 12). Also, there is no apparent parental lineage for the subtype E viruses for part of the genome (i.e., there are no known extant lineages that appear to be more closely related phylogenetically to subtype E viruses over regions of the vif, env, and nef genes). Without a parental clade E available, the env region of HIV-1E variants does not show close genetic affinity with any subtype. Our own pairwise distance analyses show similar results, with HIV-1E sequences sharing great similarity to HIV-1A sequences in gag and pol but diverging in env. Interestingly, the same pattern is also seen with HIV-1B and HIV-1D sequences, but it has never been suggested that subtypes B and D are recombinant lineages (Fig. 2).
FIG. 2.
Nucleotide divergence measurements across the HIV-1 M group genome for alignment 2. (a) Pairwise distance plot showing intra- and intersubtype maximum likelihood distances. The gray lines indicate the intersubtype distances across the genome for all subtype pairs except A versus E and B versus D. The teal lines indicate intrasubtype distances across the genome. (b) Pairwise distance plot showing intra- and intersubtype maximum likelihood distances as described for panel a, with A/E and B/D intersubtype distances included. The x axis shows the relative position across the HIV-1 genome.
Complete genome phylogenetic analysis of subtype E.
A phylogenetic tree that included all available complete clade E genomes, three from Gao et al. (12) and Carr et al. (4) and four determined in this study, along with the complete genomes from the 39 other subtype reference sequences, was produced (Fig. 3). HIV-1E and HIV-1A sequences were found to form separate monophyletic groups on this tree. Larger monophyletic groups can also be formed with the A and E clades, and with the A, AG, AGI, and E clades. Viruses from the AG and AGI clades are reported to be derived from recombinant lineages, with portions of their genomes closely associating but forming separate monophyletic groups with subtype A viruses (3, 10) (Fig. 4). The clade E viruses from Thailand cluster separately from the CAR viruses, the latter having a greater overall level of divergence (12). The lower diversity among the Thailand clade E viruses supports the concept that the Thailand clade E epidemic is relatively recent. The clustering of the Thailand viruses within the more diverse CAR clade E subgroup supports the hypothesis that the clade E viruses in Southeast Asia were introduced from Africa (12, 38).
FIG. 3.
Maximum likelihood phylogenetic relationships of newly derived HIV-1 clade E viral genomes and complete genomes representative of other HIV-1 group M clades, with bootstrap values of 70% or greater indicated. The tree was constructed using the ML method and the GTR substitution model as described in Materials and Methods and reference 29.
FIG. 4.
Locations of the nine adjacent regions used in the analysis of clade E and A variants (numbered 1 through 9), along with the inferred mosaic structures for subtypes AG and AGI (3, 10). Approximate genomic coordinates are indicated by the position within the HXB2 reference sequence. Regions of different subtypes located in the AG and AGI sequences are indicated by the single letters A, G, and I. U, unknown. Dashed lines indicate the breakpoints for the nine analyzed regions superimposed onto the AG, AGI, and HIV-1 gene maps.
Kishino-Hasegawa tests.
Nine ML phylogenetic tree topologies were constructed across the genome to identify regions of topological incongruence (Fig. 5). The model of evolution used for each region is shown in Table 3. Each of the nine topologies was constructed from a genomic region that had previously been identified by bootscanning as having a potentially different phylogenetic affinity to subtype E sequences (Fig. 1), and together the nine regions spanned the entire HIV-1 genome. In six of the nine regions, the best ML tree topology grouped clade E viruses evolutionarily more closely to subtypes other than HIV-1A (Fig. 5). For these, constrained topologies were constructed which forced HIV-1A and HIV-1E viruses to exclusively share a common node on the tree. The constrained topology in regions 4, 5, 8, and 9 forced clade E viruses to group with clade A viruses, while the constrained topology in regions 6 and 7 forced the clade E viruses to group with viruses from clades A and AG (the mosaic subtype AG has been reported to be subtype A in these regions [3]) (Fig. 4). To test for significant differences between the best-ML tree topologies and the constrained topologies, the Kishino-Hasegawa test was used (23). In every set tested, we found that a topology that maintained clades A and E as separate but closely related monophyletic groups never produced a likelihood score statistically worse than those produced by topologies that group clade E with other subtypes (Fig. 6). Therefore, the KH test could not reject the hypothesis that, across the genome, clades A and E are maintained as separate monophyletic groups that share a close association to one another. These results indicate that no significant topological incongruence occurs along the genome and that thus no recombination is required to account for these topologies.
FIG. 5.
Maximum likelihood trees for each of the nine genome segments (shown in Fig. 4), showing the phylogenetic relationships of clade E viruses in comparison to representative sequences of other HIV-1 group M clades. The coordinates of each segment are given relative to the HXB2 genome. Subtype designations are indicated in the names of the sequences (i.e., J_ indicates subtype J).
TABLE 3.
Models of evolution used in constructing tree topologies
The model of evolution, nucleotide substitution rates, base frequencies, gamma distributed site-specific rate heterogeneity, and the proportion of invariable sites are indicated for each of the nine regions along the genome.
FIG. 6.
Kishino-Hasegawa test of topological incongruence across the entire genome. For each region indicated, the best-tree topology (shown in Fig. 5) was compared to the constrained-tree topology shown that forced clade E sequences to group either with clade A exclusively or with clades A and AG. In regions 6 and 7, where clade E was forced to group with clades A and AG, clade AG has been reported to be of subtype A (3). Likelihood scores for the best and constrained topologies are indicated, along with P values for each regional comparison (statistical significance was set at α = 0.05). A P value of <0.05 indicates that the best topology is significantly better than the constrained topology.
To evaluate the significance of separate monophyletic groupings of clades A and E, we constructed two sets of constrained topologies for each of the nine regions. These constrained topologies forced the most basal clade E sample to be grouped within clade A or vice versa. In all regions, these constrained topologies, which did not maintain A and E as separate monophyletic groups, produced significantly worse likelihood scores when compared to the best-ML tree topology (data not shown).
Simulated data.
Bootscan analyses showed a consistently low level of bootstrap support for grouping clades A and E over portions of the env and vif regions (Fig. 1). Intersubtype pairwise distance analyses also showed that the env regions of clades A and E are more dissimilar than the gag and pol regions (Fig. 2). To determine if factors other than recombination can cause such changes in the bootscan and distance measures, we performed bootscanning and pairwise distance analysis on alignment simulations. Interestingly, the simulated data produced bootscan patterns similar to those obtained with the real sequence data (Fig. 7a). Since the true tree (i.e., the tree used to construct the simulated sequences) maintained the same topology over the entire genome, it is interesting to note that this topology is not consistently supported by bootstrap analysis as one moves along the genome in 500-nucleotide windows. Likewise, results of pairwise distance measurements show similar patterns with simulations as those obtained with the real data (Fig. 7b).
FIG. 7.
Bootscan and pairwise distance analyses of real and simulated nucleotide sequence alignment data sets. (a) Bootscan plot showing the bootstrap value along the genome of subtypes A and E forming a monophyletic group for the real data (black line) and 10 independent simulations (gray lines) across the genome. The x axis shows the relative position across the HIV-1 genome, with the nine contiguous regions indicated by the alternately shaded sections. (b) Pairwise distance plots showing intersubtype maximum likelihood distances of the observed data (black lines) and 10 independent simulations (gray lines) across the genome. The x axis shows the relative position across the HIV-1 genome, with the nine contiguous regions indicated by the alternately shaded sections.
Our studies suggest a clear framework for the evaluation of potential recombinant genomes, in which topological relationships are evaluated with bootscan or other techniques, followed by KH testing for the statistical significance of these relationships. To evaluate the general utility of the KH test to identify recombinant structures, we also applied it to the analysis of other reported recombinant HIV-1 genomes (Table 4). In each of seven cases evaluated, we found statistical support for the hypothesis of a recombinant origin. In all but one of the 16 regions tested, the KH test found statistical support for recombination. The single region that did not produce a significant result was a putative subtype G region in the isolate 94CY032 (Table 4). As a further test of the robustness of this test, we evaluated the reported B/F recombinant virus 93BR029 (11), without the benefit of one parental sequence (Fig. 8), thus simulating the situation with the subtype E viruses, in which only an A-like parental sequence has been identified. In this instance too, statistical support for the recombination hypothesis was evident (Fig. 8). Thus, the E genome analysis uniquely stands out as one that fails to support a recombinational origin.
TABLE 4.
Analysis of other intersubtype recombinants
Isolate | Gene | Region (HXB2) | Putative subtype | Constrained to subtype | −ln L | Difference in −ln L | P value | Significant |
---|---|---|---|---|---|---|---|---|
ZAM184 | pol | 3173-3973 | C | C | 6690.35 | (best) | | |
ZAM184 | pol | 3173-3973 | C | A | 6747.78 | 57.43 | 0.0032 | Yes |
ZAM184 | env-nef | 6511-9463 | A | A | 32524.71 | (best) | | |
ZAM184 | env-nef | 6511-9463 | A | C | 32619.66 | 94.94 | 0.0001 | Yes |
92RW009 | pol-vpr | 2573-6071 | C | C | 28378.06 | (best) | | |
92RW009 | pol-vpr | 2573-6071 | C | A | 28806.18 | 428.12 | <0.0001 | Yes |
92RW009 | env | 6109-8195 | A | A | 22831.00 | (best) | | |
92RW009 | env | 6109-8195 | A | C | 22967.94 | 136.94 | <0.0001 | Yes |
MAL | gag | 789-1883 | A | A | 9571.50 | (best) | | |
MAL | gag | 789-1883 | A | D | 9713.38 | 141.87 | <0.0001 | Yes |
MAL | vpr-env | 5559-8751 | D | D | 33543.66 | (best) | | |
MAL | vpr-env | 5559-8751 | D | A | 33725.95 | 182.29 | <0.0001 | Yes |
IBNG | gag | 789-2292 | A | A | 13962.19 | (best) | | |
IBNG | gag | 789-2292 | A | G | 14049.00 | 86.80 | <0.0001 | Yes |
IBNG | pol | 2293-3173 | G | G | 6442.77 | (best) | | |
IBNG | pol | 2293-3173 | G | A | 6468.69 | 25.92 | 0.0096 | Yes |
93BR029 | pol | 2085-5096 | B | B | 23128.62 | (best) | | |
93BR029 | pol | 2085-5096 | B | F | 23276.56 | 147.94 | <0.0001 | Yes |
93BR029 | env | 6220-8795 | F | F | 27885.77 | (best) | | |
93BR029 | env | 6220-8795 | F | B | 28273.97 | 388.19 | <0.0001 | Yes |
BFP90 | gag | 789-1583 | A | A | 7309.43 | (best) | | |
BFP90 | gag | 789-1583 | A | J | 7346.98 | 37.54 | 0.0261 | Yes |
BFP90 | gag | 789-1583 | A | G | 7338.20 | 28.77 | 0.0133 | Yes |
BFP90 | env | 6261-7312 | G | G | 12646.18 | (best) | | |
BFP90 | env | 6261-7312 | G | J | 12675.35 | 29.17 | 0.0225 | Yes |
BFP90 | env | 6261-7312 | G | A | 12749.43 | 103.24 | <0.0001 | Yes |
BFP90 | nef | 8424-9463 | J | J | 12280.21 | (best) | | |
BFP90 | nef | 8424-9463 | J | A | 12335.76 | 55.55 | 0.0003 | Yes |
BFP90 | nef | 8424-9463 | J | G | 12327.69 | 47.47 | 0.005 | Yes |
94CY032 | gag | 789-1883 | A | A | 9611.10 | (best) | | |
94CY032 | gag | 789-1883 | A | G | 9682.93 | 71.82 | <0.0001 | Yes |
94CY032 | gag | 789-1883 | A | I | 9635.65 | 24.55 | 0.012 | Yes |
94CY032 | pol | 2189-4673 | I | I | 20009.92 | (best) | | |
94CY032 | pol | 2189-4673 | I | A | 20140.33 | 130.41 | <0.0001 | Yes |
94CY032 | pol | 2189-4673 | I | G | 20036.07 | 26.14 | 0.0423 | Yes |
94CY032 | env | 6311-6773 | G | G | 4386.51 | (best) | | |
94CY032 | env | 6311-6773 | G | A | 4404.26 | 17.75 | 0.0605 | No |
94CY032 | env | 6311-6773 | G | I | 4387.39 | 0.88 | 0.9251 | No |
The Kishino-Hasegawa test was used to test the hypothesis of recombination in seven isolates thought to have been formed by intersubtype recombination events. Genomic regions corresponding to different subtypes were analyzed separately. The ML tree for each region was compared to the best constrained topology that grouped the recombinant isolate with the subtypes that formed the variant. The region tested is indicated by its relative position in HXB2 (nucleotide positions are given). For each topology, the negative log likelihood score (−ln L) is shown along with the difference in scores between the various topologies and the P value for each Kishino-Hasegawa test. Significance is assessed at an α of 0.05.
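The comparisons summarized in Table 4 can be illustrated with a minimal sketch of a Kishino-Hasegawa-style test. It assumes that per-site log likelihoods are already available for the best topology and for one constrained topology, and it uses the normal approximation, in which the standard error of the total difference is estimated from the variance of the site-wise differences. The function name `kh_test` and the input values are hypothetical and are not taken from this study; the software used for the published analyses may compute the test differently (for example, by resampling).

```python
import math


def kh_test(site_lnl_best, site_lnl_alt):
    """Normal-approximation KH-style test from per-site log likelihoods.

    Returns (delta, se, p): the total log-likelihood difference, its
    estimated standard error, and a two-sided P value.
    """
    if len(site_lnl_best) != len(site_lnl_alt):
        raise ValueError("both topologies must be scored on the same sites")
    n = len(site_lnl_best)
    if n < 2:
        raise ValueError("at least two sites are required")

    # Site-wise differences in log likelihood between the two topologies.
    diffs = [a - b for a, b in zip(site_lnl_best, site_lnl_alt)]
    delta = sum(diffs)
    mean = delta / n

    # Sample variance of the site-wise differences, scaled to the total.
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    se = math.sqrt(n * var)
    if se == 0.0:
        raise ValueError("site-wise differences have zero variance")

    z = delta / se
    # Two-sided P value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return delta, se, p


if __name__ == "__main__":
    # Made-up per-site values for illustration only.
    best = [-1.0] * 100
    alt = [-1.1 if i % 2 == 0 else -1.3 for i in range(100)]
    delta, se, p = kh_test(best, alt)
    print(f"delta lnL = {delta:.2f}, SE = {se:.2f}, P = {p:.3g}")
```

In this formulation, a constrained topology is rejected at an α of 0.05 when the returned P value falls below 0.05, which corresponds to the "Significant" column of Table 4.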
FIG. 8.
Evaluation of the B/F recombinant 93BR029 in analyses that included only one of the two parental strains. For these analyses, subtype F sequences were removed from the sequence alignments, creating a situation in which subtype B was the only parental sequence studied. (a) Bootscan plot showing bootstrap clustering values comparing the variant 93BR029 and subtype B variants along the genome. The x axis shows the relative position across the HIV-1 genome. Numbered sections along the HIV-1 gene map identify the 10 contiguous genomic regions that were used for Kishino-Hasegawa testing. (b) Kishino-Hasegawa test of topological incongruence across the entire genome. For each region indicated, the best-tree topology that grouped 93BR029 with the subtype B sequences was compared to the best-tree topology that placed 93BR029 outside of the subtype B clade. The tree topology that produced the highest likelihood score was labeled “Best” and was compared to the remaining topology using the Kishino-Hasegawa test. P values for each regional comparison are indicated, with statistical significance assessed at an α of 0.05 (statistically significant results are marked with an asterisk). A P value of <0.05 indicates that the “Best” topology is significantly better than the other topology.
DISCUSSION
Our initial bootscanning and pairwise distance analyses yielded results consistent with those obtained by other investigators (4, 12): they indicated that a recombination event might have occurred in the history of the HIV epidemic, giving rise to a lineage of HIV-1A/E variants. Bootscan analysis showed that gag and pol regions of the genome group HIV-1E and HIV-1A together with strong bootstrap support, whereas analyses of portions of vif, env, and nef failed to provide strong evidence for this same grouping (Fig. 1).
The intersubtype pairwise distance analyses comparing clades A and E have also been used to support the idea of recombination within the clade E genome (4, 12). The distance between clades A and E in gag and pol is within the intrasubtype distance range, while the distance in env, which is thought to be descended from a nonrecombinant proto-E parental virus, reaches the intersubtype distance range (Fig. 2). The close distances in gag and pol and the more extreme distances in env seem to support the recombination hypothesis that clades A and E share an intrasubtype relationship in gag and pol while they are separate subtypes in the env region. However, the analysis of clades B and D also shows a pattern of close distances in gag and pol, with a more divergent region located in env (Fig. 2). Therefore, a hypothesis of recombination taking place between clades B and D would also be supported by these pairwise distance analyses, although this suggestion has not been put forward. In fact, these extreme changes in pairwise distances over different regions of the genome may not be due to recombination but may be the result of variations in evolutionary rates across the genome (43). If variations in evolutionary rate can dramatically affect the pairwise distance analysis, then the ability of pairwise distance analysis to determine or confirm putative areas of recombination is questionable.
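The kind of regional comparison described above can be sketched as a sliding-window distance scan over two aligned sequences. This is a minimal illustration that uses the uncorrected proportion of differing sites (p-distance); the function names, window size, and step size are assumptions made for the example and do not reproduce the distance model or parameters used for Fig. 2.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping gaps and ambiguity codes."""
    valid = set("ACGT")
    compared = 0
    differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in valid and y in valid:
            compared += 1
            if x != y:
                differences += 1
    return differences / compared if compared else float("nan")


def windowed_distances(seq_a, seq_b, window=500, step=100):
    """Return (start, p-distance) pairs for windows along an alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    last_start = max(len(seq_a) - window, 0)
    for start in range(0, last_start + 1, step):
        end = start + window
        results.append((start, p_distance(seq_a[start:end], seq_b[start:end])))
    return results


if __name__ == "__main__":
    # Toy alignment: identical first half, fully divergent second half.
    a = "A" * 20 + "C" * 20
    b = "A" * 20 + "G" * 20
    print(windowed_distances(a, b, window=20, step=20))
    # → [(0, 0.0), (20, 1.0)]
```

A profile of this kind shows where two lineages are close or distant, but as argued above it cannot by itself distinguish recombination from regional differences in evolutionary rate.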
Since regional variations in the evolutionary rate may cause extreme changes in the patterns of pairwise distances, it is conceivable that these same variations may also affect the bootscan results. To test this hypothesis, simulations were performed to determine the effect of rate variation on bootscan and pairwise distance analyses. Our results indicate that despite simulating data using a nonrecombining and consistent phylogenetic topology across the genome, the bootscan plots of the simulated data erroneously showed support for a recombination event in the env and vif regions (Fig. 7a). The simulations also showed that clades A and E maintained a pairwise distance similar to that of most intersubtype pairings in the env region but resembling an intrasubtype relationship in the gag and pol regions (Fig. 7b), all without a recombination event having taken place in the generation of these data sets. These simulations highlight the fact that changes in the evolutionary rate across the genome can have profound effects on the distance analysis and can support the hypothesis of recombination when none has occurred. In particular, by sampling relatively small (500-nucleotide) sections of this region, bootscanning may be prone to sitewise heterogeneity in rates of substitutions and poor support (by virtue of a short length of sampled sequence) for short interior branches on a topology. Thus, simulations that invoke differences in the evolutionary rate across the genome can provide a reasonable explanation for the bootscan and pairwise distance results without invoking an intersubtype recombination event.
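The effect of rate variation on pairwise distances can be illustrated with a toy simulation. The sketch below is not the simulation procedure used for Fig. 7; it assumes a Jukes-Cantor substitution model, two descendant lineages of one ancestor, and two hypothetical regions with different relative rates. All function names and parameter values are illustrative assumptions. No recombination is simulated: every site shares the same two-taxon history.

```python
import math
import random

BASES = "ACGT"


def evolve(seq, site_branch_lengths, rng):
    """Evolve a sequence along one branch under the Jukes-Cantor model.

    site_branch_lengths[i] is the expected number of substitutions at site i.
    """
    out = []
    for base, t in zip(seq, site_branch_lengths):
        # Jukes-Cantor probability that the site ends in a different base.
        p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
        if rng.random() < p_change:
            base = rng.choice([b for b in BASES if b != base])
        out.append(base)
    return "".join(out)


def simulate_pair(region_lengths, region_rates, branch=0.05, seed=1):
    """Simulate two descendants of one ancestor with region-specific rates."""
    rng = random.Random(seed)
    site_t = [
        branch * rate
        for length, rate in zip(region_lengths, region_rates)
        for _ in range(length)
    ]
    ancestor = "".join(rng.choice(BASES) for _ in site_t)
    return evolve(ancestor, site_t, rng), evolve(ancestor, site_t, rng)


def p_distance(seq_a, seq_b):
    """Proportion of sites that differ between two equal-length sequences."""
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


if __name__ == "__main__":
    # Hypothetical slow region followed by a hypothetical fast region.
    lengths, rates = [3000, 3000], [0.5, 3.0]
    a, b = simulate_pair(lengths, rates)
    print("slow region distance:", round(p_distance(a[:3000], b[:3000]), 3))
    print("fast region distance:", round(p_distance(a[3000:], b[3000:]), 3))
```

Even in this simple setting, the faster region shows a markedly larger pairwise distance than the slower region, although both regions share a single nonrecombinant history.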
A likely explanation for the differences in the evolutionary rates across the genome is that different regions of the genome are under different selective pressures. Random mutations in gag or pol are more likely than those in env to adversely affect the fitness of the viral progeny. Furthermore, the rate of nonsynonymous substitutions equals or exceeds that for synonymous substitutions in the gp120 encoding region of env, indicative of diversifying selection (29, 45). On the other hand, purifying selection appears to predominate in gag and pol, as evidenced by a relatively low rate of nonsynonymous substitutions (29, 45). Therefore, gag and pol reside in a more constrained sequence space than env. As the HIV-1A and HIV-1E variants diverge from their most recent common ancestor, the gag and pol regions evolve more slowly into regions of sequence space that maintain fit viruses while the less constrained env regions are diverging more rapidly.
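The distinction between synonymous and nonsynonymous change can be made concrete with a crude codon-by-codon count. The sketch below assumes two aligned, in-frame coding sequences and the standard genetic code; it only classifies each differing codon pair by whether the encoded amino acid changes. It does not normalize by the number of synonymous and nonsynonymous sites, as a proper rate estimate would (for example, the Nei-Gojobori method), and it is not the method used in the cited studies. The function name is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def count_codon_differences(seq_a, seq_b):
    """Count differing codon pairs as synonymous or nonsynonymous.

    Codons containing gaps or ambiguity codes are skipped. A codon pair that
    differs at several positions is still counted once.
    """
    synonymous = 0
    nonsynonymous = 0
    usable = min(len(seq_a), len(seq_b))
    for i in range(0, usable - 2, 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


if __name__ == "__main__":
    # TTT/TTC both encode Phe (synonymous); AAA (Lys) vs GAA (Glu) does not.
    print(count_codon_differences("TTTGCTAAA", "TTCGCTGAA"))
    # → (1, 1)
```

Under purifying selection, as proposed for gag and pol, nonsynonymous differences would be expected to be relatively rare; under diversifying selection, as proposed for the gp120-encoding region of env, they would be expected to be relatively common.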
The last piece of evidence cited for the hypothesis that clade E viruses are derived from a recombinant lineage is the topological incongruence seen in different regions of the genome. In the gag and pol regions, clade E and clade A viruses are more closely related to each other than to any other subtype, with HIV-1E and HIV-1A each forming a separate monophyletic group (Fig. 5). The env region, on the other hand, groups clade E viruses closer to subtypes other than HIV-1A, supporting the idea that HIV-1E is derived from a recombinant lineage (Fig. 5). It is important, however, to test whether the sequence data in different regions provide statistically greater support for the recombination hypothesis than for alternative hypotheses of nonrecombinant phylogenetic divergence. The Kishino-Hasegawa test (23) was used to test these hypotheses, and it revealed that the hypothesis that HIV-1E and HIV-1A are separate monophyletic assemblages with a close association to each other is not significantly less supportable than an HIV-1E recombination hypothesis (Fig. 6).
The KH test showed that the constrained topologies that forced clade E viruses to group with clade A viruses were not significantly different from the ML topology across the genome, as long as the clades remained monophyletic. However, these constrained topologies in the vif and env regions may group the HIV-1A and E clades on long branches that are approximately equidistant from every other subtype. These longer branches may still support the hypothesis of recombination in these regions. To address this issue, we constructed further constrained-tree topologies in regions 6 and 7 (Fig. 9) that placed the common node of HIV-1A and HIV-1E above that of HIV-1A and HIV-1AG, the latter pair having been classified as the same subtype in these regions (Fig. 4). A KH test comparing these new constrained topologies to the best-ML tree topologies indicated that the difference between the best topology and the constrained topology in either region was not statistically significant (Fig. 9), although the constrained topology in region 7 came close to the significance threshold. Therefore, the long clade E branch seen in these regions does not provide significant evidence of a recombination event. On the weight of this evidence, we find it more reasonable to believe that HIV-1E variants are not the descendants of a recombinant lineage but are monophyletic, although with relatively close evolutionary affinities to HIV-1A. We contend that HIV-1A and HIV-1E maintain separate monophyletic lineages, with a relationship similar to that of HIV-1B and HIV-1D. Finally, we conclude that the use of bootscan and pairwise distance analyses without other statistical tests to locate areas of recombination may result in false identification due to variations in evolutionary rates across the genome.
FIG. 9.
Kishino-Hasegawa test results over the vif and env regions of the HIV-1 genome. The best-tree topology (shown in Fig. 5) was compared to the constrained-tree topology shown here, which forced clade E sequences to form a common node with clade A sequences that was evolutionarily closer to clade A than clade AG was to clade A. In these regions clade AG is reported to be of subtype A, and thus the constrained tree places clade E sequences within the A/AG cluster. Likelihood scores and P values were derived as described in the legend to Fig. 6.
We have shown that more-stringent statistical analyses need to be performed to achieve a greater understanding of the intricate processes involved in HIV-1 evolution. Indeed, these tests confirmed the recombinational origin of several other viruses reported in the literature, and this confirmation was not dependent upon the presence of both parental strains (Table 4).
HIV-1 nomenclature is primarily based on the evolutionary relationships of viral variants. The point can be made that if the clade E variants are indeed evolutionarily similar to clade A variants across the entire genome, then clade E may in fact be just a subclade of a larger A subgroup that would include present clade A sequences along with the clade E sequences and portions of clade AG and AGI sequences. Under the current nomenclature requirements (D. L. Robertson, J. P. Anderson, J. A. Bradac, J. K. Carr, B. Foley, R. K. Funkhouser, F. Gao, B. H. Hahn, M. L. Kalish, C. Kuiken, G. H. Learn, T. Leitner, F. McCutchan, S. Osmanov, M. Peeters, D. Pieniazek, M. Salminen, P. M. Sharp, S. Wolinsky, and B. Korber, Letter, Science 288:55–56), the similarity between these two monophyletic clades would preclude them from achieving the rank of two individual subtypes. These two monophyletic clades have been historically named subtypes A and E, however, just as the monophyletic clades of B and D have historically been designated subtypes even though they do not meet present requirements (D. L. Robertson et al., Letter, Science 288:55–56). Hence, we propose a nomenclature that maintains the E subtype and argue that the A/E nomenclature is unwarranted. Most importantly, the inappropriate attribution of recombinant origins to divergent sequences obscures the true evolutionary properties of these viruses.
ACKNOWLEDGMENTS
This work was supported by grants from the Centers for Disease Control and Prevention (CDC), the University of Washington Center for AIDS Research (CFAR), and the U.S. Public Health Service.
We also thank G. van der Groen for helpful comments.
REFERENCES
- 1. Burke D S. Recombination in HIV: an important viral evolutionary strategy. Emerg Infect Dis. 1997;3:253–259. doi: 10.3201/eid0303.970301.
- 2. Carr J K, Foley B T, Leitner T, Salminen M, Korber B, McCutchan F. Reference sequences representing the principal genetic diversity of HIV-1 in the pandemic. Vol. 1999. Los Alamos, N.M: Los Alamos National Laboratory; 1999.
- 3. Carr J K, Salminen M O, Albert J, Sanders-Buell E, Gotte D, Birx D L, McCutchan F E. Full genome sequences of human immunodeficiency virus type 1 subtypes G and A/G intersubtype recombinants. Virology. 1998;247:22–31. doi: 10.1006/viro.1998.9211.
- 4. Carr J K, Salminen M O, Koch C, Gotte D, Artenstein A W, Hegerich P A, St. Louis D, Burke D S, McCutchan F E. Full-length sequence and mosaic structure of a human immunodeficiency virus type 1 isolate from Thailand. J Virol. 1996;70:5935–5943. doi: 10.1128/jvi.70.9.5935-5943.1996.
- 5. Cornelissen M, Kampinga G, Zorgdrager F, Goudsmit J, the UNAIDS Network for HIV Isolation and Characterization. Human immunodeficiency virus type 1 subtypes defined by env show high frequency of recombinant gag genes. J Virol. 1996;70:8209–8212. doi: 10.1128/jvi.70.11.8209-8212.1996.
- 6. Diaz R S, Sabino E C, Mayer A, Mosley J W, Busch M P, the Transfusion Safety Study Group. Dual human immunodeficiency virus type 1 infection and recombination in a dually exposed transfusion recipient. J Virol. 1995;69:3273–3281. doi: 10.1128/jvi.69.6.3273-3281.1995.
- 7. Felsenstein J. Confidence limits on phylogenies: an approach using the bootstrap. Evolution. 1985;39:783–791. doi: 10.1111/j.1558-5646.1985.tb00420.x.
- 8. Felsenstein J. PHYLIP (phylogeny inference package) version 3.5c. Seattle, Washington: Department of Genetics, University of Washington; 1993.
- 9. Gao F, Morrison S G, Robertson D L, Thornton C L, Craig S, Karlsson G, Sodroski J, Morgado M, Galvao-Castro B, von Breisen H, Beddows S, Weber J, Sharp P M, Shaw G M, Hahn B H, the WHO and NIAID Networks for HIV Isolation and Characterization. Molecular cloning and analysis of functional envelope genes from human immunodeficiency virus type 1 sequence subtypes A through G. J Virol. 1996;70:1651–1667. doi: 10.1128/jvi.70.3.1651-1667.1996.
- 10. Gao F, Robertson D L, Carruthers C D, Li Y, Bailes E, Kostrikis L G, Salminen M O, Bibollet-Ruche F, Peeters M, Ho D D, Shaw G M, Sharp P M, Hahn B H. An isolate of human immunodeficiency virus type 1 originally classified as subtype I represents a complex mosaic comprising three different group M subtypes (A, G, and I). J Virol. 1998;72:10234–10241. doi: 10.1128/jvi.72.12.10234-10241.1998.
- 11. Gao F, Robertson D L, Carruthers C D, Morrison S G, Jian B, Chen Y, Barré-Sinoussi F, Girard M, Srinivasan A, Abimiku A G, Shaw G M, Sharp P M, Hahn B H. A comprehensive panel of near-full-length clones and reference sequences for non-subtype B isolates of human immunodeficiency virus type 1. J Virol. 1998;72:5680–5698. doi: 10.1128/jvi.72.7.5680-5698.1998.
- 12. Gao F, Robertson D L, Morrison S G, Hui H, Craig S, Decker J, Fultz P N, Girard M, Shaw G M, Hahn B H, Sharp P M. The heterosexual human immunodeficiency virus type 1 epidemic in Thailand is caused by an intersubtype (A/E) recombinant of African origin. J Virol. 1996;70:7013–7029. doi: 10.1128/jvi.70.10.7013-7029.1996.
- 13. Goldman N, Anderson J P, Rodrigo A. Likelihood-based tests of topologies in phylogenetics. Syst Biol., in press.
- 14. Grassly N C, Holmes E C. A likelihood method for the detection of selection and recombination using nucleotide sequences. Mol Biol Evol. 1997;14:239–247. doi: 10.1093/oxfordjournals.molbev.a025760.
- 15. Griffiths C S. Correlation of functional domains and rates of nucleotide substitution in cytochrome b. Mol Phylogenet Evol. 1997;7:352–365. doi: 10.1006/mpev.1997.0404.
- 16. Hillis D M, Huelsenbeck J P. Signal, noise, and reliability in molecular phylogenetic analyses. J Hered. 1992;83:189–195. doi: 10.1093/oxfordjournals.jhered.a111190.
- 17. Hillis D M, Moritz C, Mable B K, editors. Molecular systematics. 2nd ed. Sunderland, Mass: Sinauer Associates; 1996.
- 18. Howell R M, Fitzgibbon J E, Noe M, Ren Z, Gocke D, Schwartzer T A, Dubin D T. In vivo sequence variation of the human immunodeficiency virus type 1 env gene: evidence for recombination among variants found in a single individual. AIDS Res Hum Retrovir. 1991;7:869–876. doi: 10.1089/aid.1991.7.869.
- 19. Huelsenbeck J P, Rannala B. Phylogenetic methods come of age: testing hypotheses in an evolutionary context. Science. 1997;276:227–232. doi: 10.1126/science.276.5310.227.
- 20. Ichimura H, Kliks S C, Visrutaratna S, Ou C Y, Kalish M L, Levy J A. Biological, serological, and genetic characterization of HIV-1 subtype E isolates from northern Thailand. AIDS Res Hum Retrovir. 1994;10:263–269. doi: 10.1089/aid.1994.10.263.
- 21. Jakobsen I B, Easteal S. A program for calculating and displaying compatibility matrices as an aid in determining reticulate evolution in molecular sequences. Comput Appl Biosci. 1996;12:291–295. doi: 10.1093/bioinformatics/12.4.291.
- 22. Kampinga G A, Simonon A, Van de Perre P, Karita E, Msellati P, Goudsmit J. Primary infections with HIV-1 of women and their offspring in Rwanda: findings of heterogeneity at seroconversion, coinfection, and recombinants of HIV-1 subtypes A and C. Virology. 1997;227:63–76. doi: 10.1006/viro.1996.8318.
- 23. Kishino H, Hasegawa M. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order of the Hominoidea. J Mol Evol. 1989;29:170–179. doi: 10.1007/BF02100115.
- 24. Kitayaporn D, Vanichseni S, Mastro T D, Raktham S, Vaniyapongs T, Des Jarlais D C, Wasi C, Young N L, Sujarita S, Heyward W L, Esparza J. Infection with HIV-1 subtypes B and E in injecting drug users screened for enrollment into a prospective cohort in Bangkok, Thailand. J Acquir Immune Defic Syndr Hum Retrovirol. 1998;19:289–295. doi: 10.1097/00042560-199811010-00012.
- 25. Korber B, Hahn B, Foley B, Mellors J W, Leitner T, Myers G, McCutchan F, Kuiken C L. Human retroviruses and AIDS 1997: a compilation and analysis of nucleic acid and amino acid sequences. Los Alamos, N.M: Los Alamos National Laboratory; 1997.
- 26. Kostrikis L G, Bagdades E, Cao Y, Zhang L, Dimitriou D, Ho D D. Genetic analysis of human immunodeficiency virus type 1 strains from patients in Cyprus: identification of a new subtype designated subtype I. J Virol. 1995;69:6122–6130. doi: 10.1128/jvi.69.10.6122-6130.1995.
- 27. Kuwata T, Miyazaki Y, Igarashi T, Takehisa J, Hayami M. The rapid spread of recombinants during a natural in vitro infection with two human immunodeficiency virus type 1 strains. J Virol. 1997;71:7088–7091. doi: 10.1128/jvi.71.9.7088-7091.1997.
- 28. Lanave C, Preparata G, Saccone C, Serio G. A new method for calculating evolutionary substitution rates. J Mol Evol. 1984;20:86–93. doi: 10.1007/BF02101990.
- 29. Leigh-Brown A, Monaghan P. Evolution of the structural proteins of human immunodeficiency virus: selective constraints on nucleotide substitution. AIDS Res Hum Retrovir. 1988;4:399–407. doi: 10.1089/aid.1988.4.399.
- 30. Leitner T, Escanilla D, Marquina S, Wahlberg J, Brostrom C, Hansson H B, Uhlen M, Albert J. Biological and molecular characterization of subtype D, G, and A/D recombinant HIV-1 transmissions in Sweden. Virology. 1995;209:136–146. doi: 10.1006/viro.1995.1237.
- 31. Liitsola K, Tashkinova I, Laukkanen T, Korovina G, Smolskaja T, Momot O, Mashkilleyson N, Chaplinskas S, Brummer-Korvenkontio H, Vanhatalo J, Leinikki P, Salminen M O. HIV-1 genetic subtype A/B recombinant strain causing an explosive epidemic in injecting drug users in Kaliningrad. AIDS. 1998;12:1907–1919. doi: 10.1097/00002030-199814000-00023.
- 32. Limpakarnjanarat K, Ungchusak K, Mastro T D, Young N L, Likhityingvara C, Sangwonloy O, Weniger B G, Pau C P, Dondero T J. The epidemiological evolution of HIV-1 subtypes B and E among heterosexuals and injecting drug users in Thailand, 1992–1997. AIDS. 1998;12:1108–1109.
- 33. Louwagie J, McCutchan F E, Peeters M, Brennan T P, Sanders-Buell E, Eddy G A, van der Groen G, Fransen K, Gershy-Damet G-M, Deleys R, Burke D S. Phylogenetic analysis of gag genes from seventy international HIV-1 isolates provides evidence for multiple genotypes. AIDS. 1993;7:769–780. doi: 10.1097/00002030-199306000-00003.
- 34. Mascola J R, Louwagie J, McCutchan F E, Fischer C L, Hegerich P A, Wagner K F, Fowler A K, McNeil J G, Burke D S. Two antigenically distinct subtypes of HIV-1: viral genotype predicts neutralization serotype. J Infect Dis. 1994;169:48–54. doi: 10.1093/infdis/169.1.48.
- 35. McCutchan F E, Artenstein A W, Sanders-Buell E, Salminen M O, Carr J K, Mascola J R, Yu X F, Nelson K E, Khamboonruang C, Schmitt D, Kieny M P, McNeil J G, Burke D S. Diversity of the envelope glycoprotein among human immunodeficiency virus type 1 isolates of clade E from Asia and Africa. J Virol. 1996;70:3331–3338. doi: 10.1128/jvi.70.6.3331-3338.1996.
- 36. McCutchan F E, Carr J K, Bajani M, Sanders-Buell E, Harry T O, Stoeckli T C, Robbins K E, Gashau W, Nasidi A, Janssens W, Kalish M L. Subtype G and multiple forms of A/G intersubtype recombinant human immunodeficiency virus type 1 in Nigeria. Virology. 1999;254:226–234. doi: 10.1006/viro.1998.9505.
- 37. McCutchan F E, Salminen M O, Carr J K, Burke D S. HIV-1 genetic diversity. AIDS. 1996;10(Suppl. 3):S13–S20.
- 38. Murphy E, Korber B, Georges-Courbot M, You B, Pinter A, Cook D, Kieny M, Georges A, Mathiot C, Barre-Sinoussi F, Girard M. Diversity of V3 region sequences of human immunodeficiency viruses type 1 from the Central African Republic. AIDS Res Hum Retrovir. 1993;9:997–1006. doi: 10.1089/aid.1993.9.997.
- 39. Nkengasong J N, Janssens W, Heyndrickx L, Fransen K, Ndumbe P M, Motte J, Leonaers A, Ngolle M, Ayuk J, Piot P, et al. Genotypic subtypes of HIV-1 in Cameroon. AIDS. 1994;8:1405–1412. doi: 10.1097/00002030-199410000-00006.
- 40. Posada D, Crandall K A. MODELTEST: testing the model of DNA substitution. Bioinformatics. 1998;14:817–818. doi: 10.1093/bioinformatics/14.9.817.
- 41. Rambaut A, Grassly N C. Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput Appl Biosci. 1997;13:235–238. doi: 10.1093/bioinformatics/13.3.235.
- 42. Robertson D L, Hahn B H, Sharp P M. Recombination in AIDS viruses. J Mol Evol. 1995;40:249–259. doi: 10.1007/BF00163230.
- 43. Robertson D L, Sharp P M, McCutchan F E, Hahn B H. Recombination in HIV-1. Nature. 1995;374:124–126. doi: 10.1038/374124b0.
- 44. Rodrigo A G, Goracke P C, Rowhanian K, Mullins J I. Quantitation of target molecules from polymerase chain reaction-based limiting dilution assays. AIDS Res Hum Retrovir. 1997;13:737–742. doi: 10.1089/aid.1997.13.737.
- 45. Rodrigo A G, Mullins J I. HIV-1 molecular evolution and the measure of selection. AIDS Res Hum Retrovir. 1996;12:1681–1685. doi: 10.1089/aid.1996.12.1681.
- 46. Sabino E C, Shpaer E G, Morgado M G, Korber B T M, Diaz R, Bongertz V, Cavalcante S, Galvao-Castro B, Mullins J I, Mayer A. Identification of human immunodeficiency virus type 1 envelope genes recombinant between subtypes B and F in two epidemiologically linked individuals in Brazil. J Virol. 1994;68:6340–6346. doi: 10.1128/jvi.68.10.6340-6346.1994.
- 47. Saksena N K, Wang B, Ge Y C, Xiang S H, Dwyer D E, Cunningham A L. Coinfection and genetic recombination between HIV-1 strains: possible biological implications in Australia and South East Asia. Ann Acad Med Singapore. 1997;26:121–127.
- 48. Salminen M O, Carr J K, Burke D S, McCutchan F E. Identification of breakpoints in intergenotypic recombinants of HIV-1 by bootscanning. AIDS Res Hum Retrovir. 1995;11:1423–1425. doi: 10.1089/aid.1995.11.1423.
- 49. Salminen M O, Carr J K, Robertson D L, Hegerich P, Gotte D, Koch C, Sanders-Buell E, Gao F, Sharp P M, Hahn B H, Burke D S, McCutchan F E. Evolution and probable transmission of intersubtype recombinant human immunodeficiency virus type 1 in a Zambian couple. J Virol. 1997;71:2647–2655. doi: 10.1128/jvi.71.4.2647-2655.1997.
- 50. Sawyer S. Statistical tests for detecting gene conversion. Mol Biol Evol. 1989;6:526–538. doi: 10.1093/oxfordjournals.molbev.a040567.
- 51. Shao Y, Su L, Zhao F, Xing H, et al. 12th World AIDS Conference, Geneva, Switzerland. 1998. Genetic recombination of HIV-1 strains identified in China, abstr. 11179; p. 429.
- 52. Shao Y, Zhao F, Yang W, et al. The identification of recombinant HIV-1 strains in IDUs in Southwest and Northwest China. Chin J Exp Clin Virol. 1999;13:109–112.
- 53. Shimodaira H, Hasegawa M. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol Biol Evol. 1999;16:1114–1116.
- 54. Siepel A C, Halpern A L, Macken C, Korber B T M. A computer program designed to rapidly screen for HIV-1 intersubtype recombinant sequences. AIDS Res Hum Retrovir. 1995;11:1413–1416. doi: 10.1089/aid.1995.11.1413.
- 55. Smith J M. Analyzing the mosaic structure of genes. J Mol Evol. 1992;34:126–129. doi: 10.1007/BF00182389.
- 56. Stephens J C. Statistical methods of DNA sequence analysis: detection of intragenic recombination or gene conversion. Mol Biol Evol. 1985;2:539–556. doi: 10.1093/oxfordjournals.molbev.a040371.
- 57. Subbarao S, Limpakarnjanarat K, Mastro T D, Bhumisawasdi J, Warachit P, Jayavasu C, Young N L, Luo C C, Shaffer N, Kalish M L, Schochetman G. HIV type 1 in Thailand, 1994–1995: persistence of two subtypes with low genetic diversity. AIDS Res Hum Retrovir. 1998;14:319–327. doi: 10.1089/aid.1998.14.319.
- 58. Swofford D L. PAUP 4.0: Phylogenetic analysis using parsimony (and other methods), 4.0b2a ed. Sunderland, Mass: Sinauer Associates, Inc.; 1999.
- 59. Thompson J D, Higgins D G, Gibson T J. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22:4673–4680. doi: 10.1093/nar/22.22.4673.
- 60. Triques K, Bourgeois A, Vidal N, Mpoudi-Ngole E, Mulanga-Kabeya C, Nzilambi N, Torimiro N, Saman E, Delaporte E, Peeters M. Near-full-length genome sequencing of divergent African HIV type 1 subtype F viruses leads to the identification of a new HIV type 1 subtype designated K. AIDS Res Hum Retrovir. 2000;16:139–151. doi: 10.1089/088922200309485.
- 61. UNAIDS. UNAIDS/WHO Working Group on Global HIV/AIDS and STD Surveillance Report on the global HIV/AIDS epidemic. Geneva, Switzerland: World Health Organization; 1997.
- 62. Wasi C, Herring B, Vanichseni S, Raktham S, Mastro T D, Young N L, Rübsamen-Waigmann H, von Briesen H, Kalish M L, Luo C-C, Pau C-P, Baldwin A, Mullins J I, Delwart E L, Esparza J, Heyward W L, Osmanov S. Determination of HIV-1 subtypes in injecting drug users in Bangkok, Thailand, using peptide binding enzyme immunoassay and the heteroduplex mobility assay: evidence of increasing infection with HIV-1 subtype E. AIDS. 1995;9:843–849. doi: 10.1097/00002030-199508000-00003.
- 63. Weiller G F. Phylogenetic profiles: a graphical method for detecting genetic recombinations in homologous sequences. Mol Biol Evol. 1998;15:326–335. doi: 10.1093/oxfordjournals.molbev.a025929.
- 64. Weniger B. Experience from HIV incidence cohorts in Thailand: implications for HIV vaccine efficacy trials. AIDS. 1994;8:1007–1010. doi: 10.1097/00002030-199407000-00020.
- 65. WHO Working Group. The HIV/AIDS pandemic: 1995 overview. Global Programme on AIDS. Geneva, Switzerland: World Health Organization; 1995.