Abstract
Vpu, an accessory protein of human immunodeficiency virus-1 (HIV-1), plays a crucial role in viral particle production and significantly contributes to HIV virulence. However, the evolution of the vpu gene remains poorly understood. We conducted a computational analysis of approximately 39,000 simian immunodeficiency virus (SIV) and HIV sequences, focusing on 141 representative Vpu proteins. Phylogenetic analysis classified the SIV and HIV strains into four major types based on their Vpu proteins: Vpu-type 1 (ancestral, found in SIVs such as SIVmon and SIVgsn), Vpu-type 2 (SIVgor and HIV-1 group O), Vpu-type 3 (SIVcpz), and Vpu-type 4 (HIV-1 group M and N). Notably, Vpu-type 1 exhibited variability in gene length, genome length, and the overlap between vpu and env compared with other Vpu-types. A phylogenetic tree was constructed using 426 nucleotide sequences from HIV-1, HIV-2, and SIVs focusing on the region between the pol and env genes. Vpu-type 1 was closely clustered with SIVasc and SIVsyk, lacking both vpu and vpx. The similarities observed between vpu and genes such as vpr and env suggest that vpu originated within the SIV genome. In addition, a phylogenetic tree constructed from 252 Vpu-type 4a sequences from the HIV pandemic strain and 135 sequences of circulating recombinant forms of HIV-1 revealed 18 distinct protein subtypes, exceeding the number of previously recognized subtypes. The systematic analysis of the sequences from large datasets has enabled a detailed characterization of the transition states of vpu, enhancing our understanding of the processes driving viral diversity.
Supplementary Information
The online version contains supplementary material available at 10.1007/s00239-025-10256-6.
Keywords: HIV-1, SIV, Vpu, Bioinformatics, Phylogenetic tree, Molecular evolution
Introduction
HIV-1 is an RNA virus in the genus Lentivirus and the family Retroviridae. HIV emerged through zoonotic cross-species transmission of SIVs infecting primates in Africa (Kirchhoff 2009). Currently, over 40 nonhuman primate species carry species-specific SIV infections (Aghokeng et al. 2010). HIV-1 is categorized into four groups: group M (main), group O (outlier), group N (non-M/non-O), and group P. Among these, group M is responsible for the global pandemic. Studies have shown that HIV-1 groups M and N evolved from SIVcpz found in the chimpanzee (Pan troglodytes), whereas groups O and P evolved from SIVgor infecting the gorilla (Gorilla gorilla) (Keele et al. 2006; Van Heuverswyn et al. 2006; Takehisa et al. 2009). In contrast, HIV-2 evolved from SIVsmm infecting the sooty mangabey (Cercocebus atys) (Hirsch et al. 1989). The ancestral differences between HIV-1 and HIV-2 underpin their genetic differences and the differences in their infectivity rates.
SIVs such as SIVgor originated from SIVcpz. In turn, SIVcpz evolved from multiple recombination events among SIVrcm, SIVgsn, SIVmon, and SIVmus, which infect red-capped mangabeys, greater spot-nosed monkeys (Cercopithecus nictitans), mona monkeys (C. mona), and mustached guenons (C. cephus), respectively, all of which are commonly categorized as Old World monkeys (Sharp et al. 2005). The genome organization of HIV and SIV includes 10 genes (Supplementary fig. S1): three core genes (gag, pol, and env), two essential regulatory genes (tat and rev), and five accessory regulatory genes (vif, vpr, nef, vpx, and vpu). HIV-1 and several SIV strains, sharing a common evolutionary lineage, contain the vpu gene. In contrast, HIV-2 and certain SIVs lack this gene and instead carry the vpx gene or neither of these genes. To advance our understanding of the genetic changes in HIV-1 over time, we investigated the origin and evolution of the vpu gene in HIV-1. Our findings may provide insights into the mechanisms of gene acquisition and adaptation in HIV-1, potentially clarifying why one specific strain has emerged as the predominant variant in the global HIV pandemic.
Vpu (viral protein U) is an 81-amino-acid type 1 integral membrane phosphoprotein that is translated from a Rev-regulated bicistronic mRNA that also encodes Env (Cohen et al. 1988; Strebel et al. 1989; Schwartz et al. 1990). Vpu comprises three distinct α-helices: the N-terminal proximal transmembrane domain and two C-terminal domains (Cohen et al. 1988; Federau et al. 1996; Schubert et al. 1996; Gonzalez 2015). Its functions are to enhance the degradation of cellular CD4 in the endoplasmic reticulum and to augment the release of progeny virions from infected cells by antagonizing tetherin (also known as BST-2). Host restriction factors impede the overall infectivity and pathogenicity of the virus (Schubert & Strebel 1994; Klimkait et al. 1990; Terwilliger et al. 1989; Willey et al. 1992). In addition, Vpu inhibits the degranulation of natural killer (NK) cells by downmodulating NTB-A, inhibits lipid antigen presentation via downmodulation of CD1d by cooperating with Nef, and impairs the migration and chemotactic signaling within cellular CD4 by downregulating CCR7. Notably, only HIV-1 group M strains contain a vpu gene that expresses a Vpu with the capacity to enhance the degradation of cellular CD4, antagonize tetherin, and effectively downmodulate NTB-A, CD1d, and CCR7. HIV-1 groups N, O, and P do not show the full set of these activities; HIV-1 group N strains are unable to degrade cellular CD4 or downmodulate NTB-A and CD1d (Sauter et al. 2009). Meanwhile, the group O and P strains are incapable of downregulating BST-2 but can degrade cellular CD4 and downmodulate CD1d. Whether these groups retain other activities, such as downmodulation of CCR7, is currently unknown. Despite this functional characterization, the origin of the vpu gene and the mechanisms underlying its molecular evolution in HIV-1 remain poorly understood.
A previous study suggested that HIV-1 vpu originated from a common ancestor of the gene in SIVgsn, SIVmus, and SIVmon, and was subsequently transferred to the SIVcpz genome (Bibollet-Ruche et al. 2004). However, this hypothesis has yet to be tested.
In this study, we comprehensively examined an extensive dataset of HIV and SIV sequences to clarify the origin of the vpu gene and to trace its evolution in HIV-1. We conducted an evolutionary analysis to understand how the vpu gene in the HIV pandemic strain diversified over time within HIV-1 group M, leading to a more detailed reclassification of its subtypes based on their Vpu amino acid sequences. We also analyzed the variant B strain, which is known for its increased virulence compared with the standard subtype B. Overall, this study enhances our understanding of the evolution of vpu by establishing an evolutionary model, and hence provides insights into the origin, possible acquisition mechanism, and diversification of vpu in the pandemic strain of HIV-1.
Materials and Methods
Datasets
The nucleotide and amino acid sequences of the HIV and SIV strains used in this study were obtained from the Los Alamos HIV Sequence Database (https://www.hiv.lanl.gov/, last accessed 22 October 2021), together with details of the sampling region and sampling year. In total, 10,893 nucleotide sequences (including complete and partial genomes), 38,856 Vpu amino acid sequences, and 19,965 Pol amino acid sequences were extracted (Supplementary table S1). In addition, the recently discovered variant B strain (GenBank IDs: MT458931–MT458935 and MW689459–MW689470; MW689465 and MW689466) was extracted from the National Center for Biotechnology Information (NCBI) GenBank (https://www.ncbi.nlm.nih.gov/, last accessed 16 July 2023).
Multiple Sequence Alignment
The multiple sequence alignments (MSAs) of HIV and SIV nucleotide and amino acid sequences were conducted using MAFFT L-INS-i v7.45 (Katoh and Standley 2013) using the default parameters. The alignments were visualized with Jalview v2.11 (Waterhouse et al. 2009). The alignment was colored using ClustalX and the percentage identity color scheme was utilized to highlight the percentage abundance of aligned residues and the amino acids conserved across sequences (Waterhouse et al. 2009). The conservation, consensus, and multi-Harmony scores are shown below the alignments. To quantitatively measure conservation, the numbers of amino acids with specific physicochemical properties conserved in each column of the alignment were calculated. This index was based on the Analysis of Multiply Aligned Sequences method (Livingstone and Barton 1993), where conservation is represented by a numerical score that reflects the degree of similarity between the physicochemical properties of the amino acids in the alignment, followed by the highest identity score, and substitutions to amino acids in the same physicochemical class. The conservation score for each amino acid position ranges from 1 to 11. High conservation is marked with ‘*’ (score of 11 on the default amino acid grouping), whereas partial conservation in all properties is marked with a yellow ‘ + ’ (score of 10) (Livingstone and Barton 1993). The consensus annotation shows the percentage of the modal residue per column. We used ‘ + ’ to show that the modal value is shared by more than one residue. Multi-Harmony (Brandt et al. 2010) was used to identify significant patterns of subgroup variation among the columns of an alignment and the specific residues of each type. The tool requires that the alignment is subdivided into groups containing at least two nonidentical protein sequences, which we grouped according to viral type. 
The Sequence Harmony (SH) scores for each subgrouped alignment were determined by applying Shannon’s entropy (Shannon 1948), which measures the extent of the differences in amino acid composition between groups, thus detecting subtype-specific sites. Pairwise local alignments between the sequences were performed using the Smith‒Waterman algorithm implemented in EMBOSS Water (Madeira et al. 2024) with default parameters (matrix: EDNAFULL, gap penalty: 10.0, extend penalty: 0.5) to calculate sequence similarity and identity. This is a rigorous dynamic programming approach for optimal local alignment that considers all possible alignments between two sequences (Waterman et al. 1976).
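The entropy-based comparison of two subgroups at one alignment column can be illustrated with a short Python sketch. It is not the Multi-Harmony implementation: the function names are hypothetical, and the score used here (one minus the Jensen-Shannon divergence in bits) is an assumed stand-in that behaves like a harmony score, giving 1 when the two groups have identical residue compositions and 0 when they share no residues.

```python
import math
from collections import Counter

GAP_CHARS = {"-", "."}  # characters treated as alignment gaps (assumption)


def column_frequencies(residues):
    """Relative residue frequencies in one alignment column, ignoring gaps."""
    counts = Counter(r for r in residues if r not in GAP_CHARS)
    total = sum(counts.values())
    return {r: c / total for r, c in counts.items()} if total else {}


def shannon_entropy(freqs):
    """Shannon entropy in bits of a frequency dictionary."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def harmony_score(column_a, column_b):
    """Return 1 - Jensen-Shannon divergence between two groups at one column.

    Returns None when either group contains only gaps at this column.
    """
    pa = column_frequencies(column_a)
    pb = column_frequencies(column_b)
    if not pa or not pb:
        return None
    # Mixture distribution of the two groups, weighted equally.
    mix = {r: 0.5 * (pa.get(r, 0.0) + pb.get(r, 0.0)) for r in set(pa) | set(pb)}
    divergence = shannon_entropy(mix) - 0.5 * (shannon_entropy(pa) + shannon_entropy(pb))
    return 1.0 - divergence


if __name__ == "__main__":
    # Toy columns (not taken from the study's alignments).
    print(harmony_score("WWWW", "WWWW"))  # identical composition
    print(harmony_score("VVVV", "RRRR"))  # completely group-specific site
    print(harmony_score("SSSS", "SSGE"))  # partially overlapping composition
```

Under this convention, low scores flag candidate type-specific positions, whereas scores near 1 indicate columns that do not discriminate between the groups.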
Molecular Phylogenetic Analysis
We used CD-HIT, a clustering program that minimizes the number of redundant sequences encoding similar proteins (Fu et al. 2012), to create a dataset composed solely of unique representative sequences. The alignment file created from this dataset was used to construct the phylogenetic tree. We used trimAl version 1.2rev59, an automated tool for trimming poorly aligned regions in sequence alignments. By applying the gappyout parameter, trimAl automatically detects and removes columns with excessive gaps, improving alignment quality. This trimming is particularly beneficial for large-scale phylogenetic analyses (Capella-Gutierrez et al. 2009). Subsequently, midpoint-rooted and unrooted maximum likelihood trees were constructed using IQ-TREE v.1.6.12 (Minh et al. 2020). We applied ultrafast bootstrap approximation (UFBoot2) (Hoang et al. 2018), which enables rapid and unbiased estimation of branch support values, with 1,000 bootstrap replicates. The midpoint rooting method was utilized to avoid assumptions about the ancestral point, particularly given the uncertainty regarding a suitable outgroup. In addition, unrooted phylogenetic trees were created to elucidate the relationships between sequences without making assumptions about the position of the root. The best-fit model for the analysis was determined using ModelFinder (Kalyaanamoorthy et al. 2017), which was used to optimize the tree’s accuracy. We used the JTTDCMut + R5 model (Fig. 1 and Supplementary fig. S13), which applies the revised Jones‒Taylor‒Thornton (JTT) matrix to include amino acid substitution rates and incorporates a FreeRate model with five rate categories to allow site-specific rate variation (Yang 1995; Kosiol and Goldman 2005; Soubrier et al. 2012). Supplementary fig. S11 was constructed using the JTTDCMut + R6 model, which is identical except that it uses six rate categories.
For Fig. 4, we used the GTR + F + R9 substitution model, which combines the general time reversible model, allowing variable substitution rates between nucleotides in both directions, with empirical base frequencies (F) and a FreeRate model with nine rate categories to capture variation in the evolutionary rate (Tavaré and Miura 1986; Yang 1995; Soubrier et al. 2012). Supplementary fig. S4 was constructed using the JTT + F + R5 model, which applies the JTT matrix with empirical amino acid frequencies and a FreeRate model with five rate categories (Jones et al. 1992; Yang 1995; Soubrier et al. 2012). For Fig. 5, Supplementary fig. S9A, and Supplementary fig. S10, we used the HIVb + R6 model, an HIV-1-specific probabilistic framework based on the between-patient training set combined with a FreeRate model using six rate categories (Yang 1995; Nickle et al. 2007; Soubrier et al. 2012). For Supplementary fig. S9C, we used the HIVb + R7 model, which differs only in the use of seven rate categories, whereas Supplementary fig. S9B used the HIVb + F + R10 model, which adds empirical amino acid frequencies and uses 10 rate categories (Yang 1995; Nickle et al. 2007; Soubrier et al. 2012). All trees were visualized using FigTree v.1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/). The alignment and tree files in FASTA and Newick formats, respectively, are provided in Supplementary table S2.
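For orientation, the tool chain described above could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not the command lines used in the study: the order of the steps (clustering, alignment, trimming, tree inference), the file names, and the helper names `build_tree_pipeline` and `run_pipeline` are all hypothetical. The flags shown are standard options of the respective tools (`-c` for the CD-HIT identity threshold, `--localpair --maxiterate 1000` for MAFFT L-INS-i, `-gappyout` for trimAl, and `-m MFP -bb` for ModelFinder plus ultrafast bootstrap in IQ-TREE 1.6). The model actually selected would differ per dataset.

```python
import subprocess


def build_tree_pipeline(raw_fasta, prefix, identity=0.9, replicates=1000):
    """Return the external commands for one protein dataset as a list of steps.

    Each step holds the argument vector and, if the tool writes to standard
    output, the file that should capture it. File names are hypothetical.
    """
    clustered = f"{prefix}.cdhit.fasta"
    aligned = f"{prefix}.aln.fasta"
    trimmed = f"{prefix}.trim.fasta"
    return [
        # 1. Remove redundancy at the chosen identity threshold.
        {"argv": ["cd-hit", "-i", raw_fasta, "-o", clustered, "-c", str(identity)],
         "stdout": None},
        # 2. Align representatives with MAFFT L-INS-i (alignment goes to stdout).
        {"argv": ["mafft", "--localpair", "--maxiterate", "1000", clustered],
         "stdout": aligned},
        # 3. Trim gap-rich columns with the gappyout heuristic.
        {"argv": ["trimal", "-in", aligned, "-out", trimmed, "-gappyout"],
         "stdout": None},
        # 4. Maximum likelihood tree with model selection and ultrafast bootstrap.
        {"argv": ["iqtree", "-s", trimmed, "-m", "MFP", "-bb", str(replicates)],
         "stdout": None},
    ]


def run_pipeline(steps):
    """Execute the steps in order; requires the tools to be installed."""
    for step in steps:
        if step["stdout"]:
            with open(step["stdout"], "w") as handle:
                subprocess.run(step["argv"], stdout=handle, check=True)
        else:
            subprocess.run(step["argv"], check=True)


if __name__ == "__main__":
    for step in build_tree_pipeline("vpu_all.fasta", "vpu"):
        print(" ".join(step["argv"]))
```

For nucleotide datasets such as the pol–env region, the nucleotide variant of the clustering tool (`cd-hit-est`) would be the natural substitute in step 1.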
Fig. 1.
Midpoint-rooted phylogenetic tree of Vpu proteins of representative HIV-1 and SIV strains. The tree was constructed using 141 Vpu protein sequences (see Supplementary table S1) with 1,000 bootstrap replicates. The annotations for each sequence are presented in the following order: viral strain, viral group, and GenBank protein ID. These annotations are color-coded according to the viral host; human-derived sequences are shown in red and sequences from apes and monkeys are shown in blue. Additionally, the DSGxxS motif corresponding to each sequence is displayed, and conserved amino acid residues shared between types are highlighted in yellow. HIV-1 group M sequences were collapsed due to the vast number of sequences and are shown as the black triangle on the tree. Five types of Vpu proteins (Vpu-types 1, 2, 3, 4a, and 4b) were classified based on the clustering of the evolutionary lineages. The scale bar below the tree indicates 0.3 (30%) amino acid substitutions per site, and the bootstrap values (% of 1,000 replicates) are shown at each node
Fig. 4.
Midpoint-rooted phylogenetic tree of the nucleotide sequences in the pol–env region. The tree was constructed using 426 sequences (see Supplementary table S1) with 1,000 bootstrap replicates. The region of focus is shown in Supplementary fig. S1. Annotations of the viral strains and GenBank IDs, extracted from GenBank, are colored according to type: vpu−vpx− (black), Vpx-type (orange), Vpu-type 1 (red), Vpu-type 2 (brown), Vpu-type 3 (green), and Vpu-type 4 (blue). SIVmac/smm, SIVcpz, HIV-1 group M, HIV-1 group O, and HIV-2 were collapsed and are shown as black triangles. The scale bar below the tree indicates 0.3 (30%) nucleotide substitutions per site, and the bootstrap values (% of 1,000 replicates) are shown at each node
Fig. 5.
Unrooted phylogenetic tree of Vpu proteins showing the geographic distributions of the viruses. The tree was constructed using 252 sequences (see Supplementary table S4 and S6) with 1,000 bootstrap replicates. The branches are colored according to the official viral subtypes; Orange: subtype A; red: subtype B; strawberry pink: variant B; dark blue: subtype C; light blue: subtype D; magenta: subtype F; yellow: subtype G; dark green: subtype H; brown: subtype J; light orange: subtype K; cyan blue: subtype L. The pie chart shows the geographic prevalence of the protein subtypes, using the following colors: Asia (light green), Africa (yellow), Oceania (purple), Europe (light blue), North America (light orange), and South America (pink). The scale bar below the tree indicates 0.2 (20%) amino acid substitutions per site, and the bootstrap values (% of 1,000 replicates) are shown at each node
Similarity Search
A similarity search for the possible origin of vpu was conducted using sliding window analysis through multiple sequence alignment of the SIV genomes with strains containing vpu sequences. To thoroughly examine the entire genome, we applied the sliding window approach by aligning the genome sequence against the query sequence, using a window size equal to the length of the query sequence. Each window was shifted by one nucleotide (step size of 1) from the previous segment, ensuring nearly complete overlap between consecutive segments. This method ensured continuous coverage across the genome on both the coding and antisense strands, with a particular focus on the overlap between vpu and env, as well as vpu alone. The similarity scores were calculated based on the number of aligned sequence similarities, excluding alignment gaps, and all alignments with high similarity scores were extracted.
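The sliding window scan can be sketched in a few lines of Python. This is an illustrative simplification rather than the procedure used in the study: each window is scored by ungapped percent identity instead of a Smith‒Waterman alignment, and the function names, the identity threshold, and the toy sequences are assumptions introduced for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def window_identity(window, query):
    """Fraction of positions at which two equal-length sequences agree."""
    matches = sum(1 for a, b in zip(window, query) if a == b)
    return matches / len(query)


def sliding_window_hits(genome, query, min_identity=0.8):
    """Scan both strands with a window of len(query) and a step size of 1.

    Returns (strand, start, identity) tuples, where start is the 0-based
    position of the window on the forward strand.
    """
    genome, query = genome.upper(), query.upper()
    width = len(query)
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for i in range(len(seq) - width + 1):
            score = window_identity(seq[i:i + width], query)
            if score >= min_identity:
                # Convert antisense coordinates back to the forward strand.
                start = i if strand == "+" else len(genome) - (i + width)
                hits.append((strand, start, score))
    return hits


if __name__ == "__main__":
    # Toy sequences, not real SIV data.
    print(sliding_window_hits("CCGATTACACC", "GATTACA", min_identity=1.0))
```

Because the step size is 1, a genome of length n and a query of length w yield n − w + 1 windows per strand, which keeps the scan exhaustive at the cost of highly overlapping windows.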
Results and Discussion
Classification of HIV-1 and SIV Vpu Proteins
To comprehensively understand the evolution of the Vpu proteins in the genus Lentivirus, we collected all available sequences from the Los Alamos HIV Database. Our dataset included 38,856 sequences, containing 38,723 HIV-1 and 49 SIV Vpu sequences. We utilized CD-HIT to cluster sequences with 90% sequence similarity, yielding a final set of 141 sequences, with 125 from HIV-1 and 16 from SIV. This process ensured that only unique representatives of strains with numerous similar sequences were retained (Supplementary table S1). We constructed a phylogenetic tree of Vpu amino acid sequences, resulting in the classification of four groups: Vpu-types 1, 2, 3, and 4 (Fig. 1). The groups were categorized based on four primary criteria: the phylogenetic tree structure, distinct clades within the sequences, motifs associated with specific functional properties, and differences in viral host preferences. Using these criteria, we comprehensively classified and characterized the Vpu protein types in HIV-1 and SIV: Vpu-type 1 included SIVden, SIVgsn, SIVmon, and SIVmus; Vpu-type 2 included SIVgor and HIV-1 groups O and P; Vpu-type 3 included SIVcpz; and Vpu-type 4 included HIV-1 groups M and N. Although this classification does not necessarily reflect the chronological evolutionary order of the viruses, our results offer a clear framework for understanding the evolutionary relationships among Vpu-containing viruses. We distinguished Vpu-types 3 and 4 based on their viral hosts and origins. The tree indicates that Vpu-type 1 sequences are more ancestral, being located at the basal end, and that Vpu-type 3 gave rise to Vpu-type 4. Our classification is consistent with a previous study that demonstrated that HIV-1 group O originated from SIVgor and that HIV-1 groups M and N originated from SIVcpz (Gao et al. 1999).
SIVcpz is believed to have originated from multiple recombination events between various SIV strains, including SIVgsn, SIVmus, and SIVmon (Bailes et al. 2003). Furthermore, the tree supports the hypothesis that Vpu evolved through a series of transmission events, suggesting that HIV-1 Vpu evolved from its ancestral progenitors.
To analyze the differences among the Vpu-types at the amino acid level, we performed multiple sequence alignment (Fig. 2 and Supplementary fig. S2). In Fig. 2A, we standardized the comparison by establishing a reference point of 81 amino acids, corresponding to the average length of HIV-1 group M Vpu, providing a consistent baseline for our analysis. In this analysis, Vpu-type 4 was further classified into two distinct subtypes based on their functional characteristics: 4a, which includes HIV-1 group M; and 4b, which includes HIV-1 group N. Vpu-type 4b lacks the ability to downregulate cellular CD4 (Sauter et al. 2009, 2012) or downmodulate NTB-A and CD1d (Shah et al. 2010; Sauter et al. 2012). Figure 2A displays the consensus, conservation, and multi-Harmony scores for the alignment of all Vpu-type sequences together, and the full alignment is shown in Supplementary fig. S2. This analysis revealed that the transmembrane domain and the two α helices are highly conserved within each Vpu-type. The alignment process introduced gaps, causing variations in the actual positions of amino acid residues. Therefore, in this context, ‘position’ refers to the amino acid location considering gaps, and ‘residue’ refers to the specific amino acid without gaps. To distinguish the differences between the Vpu-types better, we subsequently aligned the sequences by their respective groups in Fig. 2B, using the same sequences as shown in Supplementary fig. S2. This approach allowed for a clearer comparison than the full multiple sequence alignment of all Vpu-types.
Fig. 2.
Multiple-amino-acid sequence alignment of the Vpu proteins. A Protein structure of Vpu. The transmembrane domain and cytoplasmic domain regions in the Vpu protein are marked at the top. Conserved regions are displayed in the consensus sequence, with non-conserved residues marked with ‘ + ’. The conservation histogram shows the conservation of residues based on physicochemical properties. The multi-Harmony histogram shows the subtype-specific residues. B Multiple sequence alignment (n = 50) of each type of Vpu protein, colored according to the ClustalX color scheme. Important functional motifs and residues are indicated below the alignment. The virus names corresponding to the sequences are provided in Supplementary fig. S2. The table on the right shows the ability (+) or inability (−) of each type to antagonize tetherin (BST-2) or degrade cellular CD4. Functions that are yet unknown are indicated as UN. *Strain RBF206 is the only strain in group O known to be active against tetherin (Mack et al. 2017)
In Fig. 2B, we examined the Vpu-types individually to investigate Vpu-type-specific differences and assess the presence or absence of specific functions, including the ability to antagonize tetherin, downregulate cellular CD4, and downmodulate CCR7 and NTB-A. For this, we aligned 10 sequences per type that were selected from the same dataset used to generate the phylogenetic tree. There are a total of four motifs across the Vpu-types. The transmembrane domain contains motifs involved in tetherin antagonism, CCR7 downregulation, and NTB-A modulation, while the cytoplasmic domain contains motifs related to cellular CD4 downregulation. It is well established that the AxxxAxxxAxxxW, AxxxAxxxAxLL, and AxxxxxxxW motif in the transmembrane domain of HIV-1 groups M (Vpu-type 4a), N (Vpu-type 4b), and SIVgsn (Vpu-type 1), respectively, are crucial for anti-tetherin activity and CCR7 downregulation (Sauter et al. 2012; Douglas et al. 2013; Yao et al. 2020, 2022). The DSGxxS motif, crucial for the proteasomal degradation of cellular CD4 and NTB-A downmodulation in HIV-1, is partially conserved in Vpu-type 1, containing only one of the necessary double serine (S) residues. In Vpu-type 2, the S residue at position 57 (residue 53) is conserved in 50% of sequences, with the remainder containing glycine (G) (20%) or glutamic acid (E) (30%). However, the S residue at position 61 (residue 57) is fully conserved across all sequences. In contrast, both S residues at positions 72 and 79 are completely conserved in Vpu-types 3 and 4 (Supplementary fig. S2). Interestingly, the DSGxxS consensus sequences within their respective Vpu-types exhibit notable conservation, providing strong support for the Vpu-type classifications established in our study. These classifications are clearly mapped on the phylogenetic tree shown in Fig. 1 and the multiple sequence alignment in Fig. 2. Vpu-type 1 consistently contains the consensus sequence DSGxxx. 
Vpu-type 2 is characterized by a mix of DSxxES and DxGxES sequences, both of which are relatively conserved within this type. Vpu-type 3 features a highly conserved DSGNES sequence, whereas Vpu-type 4a shows the highest conservation of the DSGNES sequence. In contrast, Vpu-type 4b displays a mixture of the conserved DSGNES sequence and variations of this consensus, all of which are relatively conserved within this type. The FxNPxF/Y motif in the cytoplasmic domain, with unknown function, is commonly found in SIVcpz and group O, and is also present in Vpu-type 1 sequences (Kluge et al. 2013). This phenylalanine (F) is highly conserved throughout Vpu-types 1, 2, and 3, and is notably absent from Vpu-type 4. Supplementary fig. S2 presents the overall amino acid alignment of 50 sequences, with 10 randomly selected sequences from each Vpu-type. For example, at position 29 (residue 23), Vpu-types 2, 3, and 4 showed 100% conservation with tryptophan (W), whereas Vpu-type 1 displayed variability with only 20% of sequences containing W. Similarly, at position 59 (residue 42), valine (V) was present in 70% of Vpu-type 1 sequences, arginine (R) in 100% of Vpu-type 2 sequences, leucine (L) in 100% of Vpu-types 3 and 4b sequences, and isoleucine (I) in 90% of the sequences, illustrating type-specific differences.
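The motif notation used above, in which x denotes any residue, maps directly onto regular expressions. The following Python sketch shows one way such motifs could be located in an ungapped protein sequence. The pattern table and function name are illustrative assumptions, and the test sequence is a synthetic toy string rather than a Vpu sequence from the dataset.

```python
import re

# Motifs named in the text, with "x" translated to "." (any residue).
MOTIF_PATTERNS = {
    "AxxxAxxxAxxxW": r"A...A...A...W",
    "AxxxAxxxAxLL": r"A...A...A.LL",
    "AxxxxxxxW": r"A.{7}W",
    "DSGxxS": r"DSG..S",
    "FxNPxF/Y": r"F.NP.[FY]",
}


def find_motifs(protein):
    """Return {motif name: [(0-based start, matched text), ...]}.

    Gap characters are removed first, so positions refer to residues
    rather than alignment columns. Overlapping matches are not reported.
    """
    ungapped = protein.replace("-", "").upper()
    return {
        name: [(m.start(), m.group()) for m in re.finditer(pattern, ungapped)]
        for name, pattern in MOTIF_PATTERNS.items()
    }


if __name__ == "__main__":
    toy = "MQALVVAIIIAIVVWSIERAEDSGNESEGE"  # synthetic example sequence
    for name, hits in find_motifs(toy).items():
        print(name, hits)
```

Relaxed variants, such as the partially conserved DSGxxx pattern described for Vpu-type 1, could be added to the table in the same way.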
Despite lacking certain crucial functions, Vpu-type 1 antagonizes cellular CD4 and tetherin, suggesting that variations in essential motifs across Vpu-types do not necessarily affect the functionality and that the functional motifs may differ between SIV and HIV-1 sequences, as shown in previous studies (Yoshida et al. 2011, 2013; Kluge et al. 2013; Yao et al. 2020, 2022). In Fig. 3, we used all available complete and partial sequences of Vpu-type 1. For Vpu-types 2‒4, we randomly selected sequences while ensuring that all of the groups were represented. Consequently, we observed that the vpu gene was prominently shorter in Vpu-type 1 than in Vpu-types 2, 3, and 4, with an average difference of 16.4 nucleotide bases. The vpu gene was larger in Vpu-type 2 than in the other types. The overlap between env and vpu varied within Vpu-type 1 but remained stable in Vpu-types 2, 3, and 4. Specifically, Vpu-type 4 had a consistent overlap of 82 bases, Vpu-type 3 had the shortest overlap of 77.5 bases, and Vpu-type 2 had the longest of 89.6 bases. The greater variation in overlap in Vpu-types 1 and 3 suggests potential genome instability or flexibility. Over time, this overlap has become more consistent, a crucial strategy for viral genome compaction (Chirico et al. 2010). Our analysis showed a decreasing trend in genome size across Vpu-types 1‒4, being longest in Vpu-type 1 (averaging 9,479.1 bases), followed by Vpu-type 2 (9,432.5 bases) and Vpu-type 3 (9,384.2 bases), and shortest in Vpu-type 4 (9,098.3 bases). Overall, Vpu-type 1 displayed high variability in terms of the lengths of the vpu gene, genome, and overlap. Although Vpu-type 1 has a small vpu gene, its genome is longer than that of the other types. To further understand these differences, we compared the lengths of all nine genes in the viral genomes to identify which gene may contribute to the larger genome size of Vpu-type 1 (Supplementary fig. S3). 
All genes in Vpu-type 1, except vpu, were larger than those in Vpu-type 4, suggesting that genomic reduction over time led to a more efficient virus and indicating that Vpu-type 1 represents a more primitive form of vpu. In addition, while Vpu-type 1 uses vpu to antagonize cellular CD4, SIVcpz relies on Nef for tetherin antagonism due to the poor tetherin antagonism of its Vpu. This difference highlights the role of host-specific adaptation in viral evolution and pathogenesis (Sauter et al. 2009). In contrast, host-specific adaptation may be less critical for the downregulation of cellular CD4, given the widespread activity of the primate lentiviral Vpu and Nef proteins against human cellular CD4 (Schindler et al. 2006). The transition from Vpu to Nef for these functions is suggested to have occurred because the use of the vpu gene in Vpu-type 1 was unfavorable for the virus. After transmission from SIVcpz to HIV-1, the virus reverted to using the Vpu protein to antagonize tetherin and likely underwent significant adaptations, potentially gaining the ability to perform the aforementioned functions and contributing to the creation of the infectious virus. These findings suggest that Vpu is crucial in the HIV-1 group M pandemic, with gene and genome variations indicating adaptive evolution that enhances the functionality of Vpu across different hosts.
Fig. 3.

Boxplots comparing the sizes of HIV-1 and SIV vpu genes, lengths of overlap, and whole-genome lengths. Boxplots of the nucleotide (nt) lengths of vpu (A), the length of the overlap between vpu and env (B), and the whole-genome lengths (C) are shown (total sequences, n = 56; see Supplementary table S3). The line inside each box marks the median length, and the whiskers extend to the smallest and largest values within 1.5 times the interquartile range from the 25th and 75th percentiles. Outliers are represented by individual points beyond the whiskers. The numbers below each figure indicate the Vpu-type. Tukey’s method was used to identify significant differences between means (*p < 0.05, **p < 0.01)
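The whisker convention described in this caption (values within 1.5 times the interquartile range of the quartiles, with points beyond shown as outliers) can be reproduced with the Python standard library. The sketch below is illustrative only: the quartile interpolation method is an assumption, since plotting packages differ in this detail, and the lengths in the example are hypothetical values, not data from Supplementary table S3.

```python
import statistics


def boxplot_summary(values):
    """Return median, quartiles, whisker ends, and outliers for one box."""
    data = sorted(values)
    # "inclusive" uses linear interpolation between observed data points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),   # smallest value within the fences
        "whisker_high": max(inside),  # largest value within the fences
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


if __name__ == "__main__":
    # Hypothetical gene lengths in nucleotides, for illustration only.
    lengths = [231, 234, 237, 240, 243, 246, 249, 252, 300]
    print(boxplot_summary(lengths))
```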
Acquisition of Vpu Proteins
To clarify how vpu was acquired, we constructed a phylogenetic tree of HIV-1, HIV-2, and SIV based on the nucleotide sequence between the pol and env genes (Supplementary fig. S1). This region was selected for its high genomic variability and because it contains the vpu and vpx genes, as there are HIV and SIV strains that lack vpu. We retrieved all available, partial, and complete HIV and SIV genome sequences from NCBI GenBank. Our initial dataset comprised 10,893 sequences, including 10,001 HIV-1, 55 HIV-2, and 837 SIV sequences. To preserve the overall diversity of the dataset, we applied CD-HIT with a 90% sequence similarity threshold to reduce redundancy, resulting in a final dataset of 426 sequences, consisting of 282 HIV-1, 34 HIV-2, and 110 SIV. The midpoint-rooted phylogenetic tree indicated that Vpu-type 1 consists of SIVden, SIVgsn, SIVmon, and SIVmus. These are closely related to the SIVs that infect monkeys, such as the red-tailed monkey (SIVasc), African Sykes’ monkey (SIVsyk), and de Brazza’s monkey (SIVdeb), which lack the vpu and vpx genes (Fig. 4). These SIV strains were classified as vpu−vpx− in our study. Specifically, we located a mixture of Vpu-type 1 and vpu−vpx−-type sequences of SIVden and SIVdeb, respectively, which suggests that the vpu gene arose from the vpu−vpx−-type. Because previous studies have reported that the vpu gene originated from a common ancestor of SIVgsn and SIVmus (Bibollet-Ruche et al. 2004), we constructed an extensive phylogenetic tree that included both HIV and SIV sequences in a novel and comprehensive analysis to delve deeper into the evolutionary history of the vpu gene. To validate our findings, an additional tree was constructed using the Pol protein sequences because the pol gene is known for its high degree of conservation among the lentiviruses (Jern et al. 2005; Malossi et al. 2020). 
As in our previous tree, SIVden, SIVmus, and SIVmon were located close to the SIVsyk and SIVasc sequences on the Pol tree, confirming their close relationship and indicating that the vpu gene was acquired through transmission from vpu−vpx− viruses. The close relationship of Vpu-type 1 strains to other SIVs lacking the vpu gene (vpu−vpx−-types) suggests that Vpu-type 1 may represent an intermediate stage in the acquisition of the vpu gene. In addition, viruses such as SIVsyk, SIVasc, and SIVdeb are located near Vpu-type 1, rather than near other vpu−vpx− viruses, supporting the idea that Vpu-type 1 contains the oldest and most primitive form of the vpu gene (Supplementary fig. S4).
The exact origins of the vpu gene remain unclear. However, previous studies have suggested that Vpu may have evolved from the mammalian background K+ channel TASK-1 through a process of “molecular piracy,” given the structural similarities between Vpu and the N-terminus of TASK-1 (Hsu et al. 2004). Additionally, it has been reported that the C-terminus of Epstein-Barr virus (EBV) latent membrane protein 1 (LMP-1) shares sequence similarities with Vpu (Huet et al. 1990). For example, TASK-1, which is approximately 394 amino acids (aa) long, exhibits 24% (16/68 aa) overall amino acid identity and 56% (38/68 aa) sequence conservation with Vpu (Hsu et al. 2004), while LMP-1, with a length of 386 aa, shows 30% (25/81 aa) overall identity and 50% (40/81 aa) conservation. From an evolutionary perspective, considering that EBV does not infect apes, it is unlikely that Vpu originated from LMP-1. In contrast, TASK-1 remains a potential origin for Vpu. However, our comparison of the nucleotide sequences of KCNK3 (TASK-1) and vpu did not show continuous alignment. This suggests that although TASK-1 may have contributed to the formation of Vpu, it cannot be conclusively identified as the origin.
Therefore, we conducted a similarity search using the vpu sequence against both the nucleotide collection (nt) and the non-redundant protein database (nr) to explore other possible origins. We did not identify any proteins with high similarity to Vpu; neither LMP-1 nor TASK-1 showed high similarity. These findings led us to hypothesize that Vpu may have originated from within the viral genome itself. To explore this hypothesis, we focused on the overlap between the vpu and env genes in Vpu-type 1, as this overlap may have played a crucial role in the gene’s creation. As shown in Fig. 3, the Vpu-type 1 viruses exhibit significant variation in the length of this overlap. Mapping this variation onto the phylogenetic tree in Fig. 4, specifically onto the region where Vpu-type 1 (red) and other vpu−vpx− (black) viruses are situated, revealed no correlation between the overlapping region and the phylogenetic relationships (Supplementary fig. S5). This suggests that the overlapping region developed independently in multiple species containing Vpu-type 1. A multiple sequence alignment of the nucleotide sequences from these overlapping regions revealed two conserved regions, CR1 and CR2 (Supplementary fig. S6). We then aligned the CRs of SIVmus (EF070330) against the complete genome of SIVasc (vpu−vpx−) (KJ461716), given their proximity on the phylogenetic tree in Fig. 4. This alignment was conducted using a sliding window method, segmenting the genome sequences based on the lengths specified for each CR (Supplementary fig. S7A). Similarity scores were calculated using EMBOSS-water, which uses the Smith‒Waterman algorithm to generate optimal local sequence alignments, with the default parameters (Madeira et al. 2024). The Smith‒Waterman algorithm conducts exhaustive pairwise comparisons, identifying the highest-scoring local alignment between two sequences while accounting for substitution scoring, gap penalties, and alignment length (Waterman et al. 1976).
In this study, we required a similarity metric that reflects both sequence conservation and the extent of alignment, ensuring that longer, well-aligned regions contribute more to the similarity score than short but highly identical fragments. Although the Smith‒Waterman algorithm identifies the highest-scoring local alignment, our analysis emphasizes alignments that balance both sequence similarity and length to capture biologically meaningful regions of conservation. Unlike BLAST, which relies on heuristic searches to identify local alignments and may favor shorter, highly similar regions over longer but moderately conserved ones, our method explicitly considers both alignment length and sequence identity. This approach provides a more comprehensive and biologically meaningful measure of sequence conservation than methods that focus solely on short, high-identity fragments.
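The sliding-window comparison described above can be sketched as follows. This is a simplified illustration, not the EMBOSS-water implementation: the scoring values (`MATCH`, `MISMATCH`, `GAP`), the linear gap penalty, the function names, and the toy sequences are all assumptions chosen for readability.

```python
from typing import List, Tuple

# Assumed scoring values for illustration only (not the EMBOSS-water defaults).
MATCH, MISMATCH, GAP = 2, -1, -2


def smith_waterman_score(a: str, b: str) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best


def sliding_window_scores(query: str, genome: str, step: int = 1) -> List[Tuple[int, int]]:
    """Score `query` against every genome window of length len(query)."""
    w = len(query)
    return [
        (start, smith_waterman_score(query, genome[start:start + w]))
        for start in range(0, len(genome) - w + 1, step)
    ]


if __name__ == "__main__":
    genome = "TTTTTT" + "ACGTACGT" + "TTTTTT"  # toy genome (hypothetical)
    query = "ACGTACGT"                         # toy conserved region (hypothetical)
    scores = sliding_window_scores(query, genome)
    best_start, best_score = max(scores, key=lambda t: t[1])
    print(best_start, best_score)
```

Because the score accumulates over every matched column, a longer well-aligned stretch outscores a short fragment of equal identity, which is the property the similarity metric above relies on.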
Supplementary fig. S7B and C show that CR2 shares 71.4% (10/14 nucleotides [nt]) similarity with tat and 69.2% (9/13 nt) similarity with the reverse complement of env in CR1. Despite the short sequence length, this analysis revealed continuous alignments with high similarity scores to neighboring genes, including tat and env, on both the coding and antisense strands. We extended this analysis to other Vpu-type 1 strains, including SIVgsn, SIVmon, and SIVden, and observed varying results across these strains. However, vpr and tat consistently showed high similarity scores across all strains. In addition, the env gene was consistently found in all CR2 regions. The results suggest that vpu may have originated from a combination of multiple genes, including partial sequences of the tat, vpr, and env genes. Moreover, the presence of sequences similar to CR1 and CR2 at multiple genomic locations could enable gene creation through recombination mediated by direct and inverted repeats (Bi and Liu 1996; Lee et al. 2015).
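The percentages reported above are ratios of matching columns to alignment length, and the antisense comparisons require the reverse complement of the target gene. A minimal sketch of both calculations is given below; the function names and the toy alignment are assumptions, and gap columns are counted as mismatches by choice.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def percent_similarity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns carrying identical bases.

    Assumption: both strings are rows of the same pairwise alignment,
    and columns containing a gap ('-') count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Toy 14-column alignment with 10 matching columns (hypothetical bases).
    a = "ACGTACGTACGTAC"
    b = "ACGAACGTTCGTGG"
    print(round(percent_similarity(a, b), 1))
    print(reverse_complement("ATGC"))
```

With 10 matching columns out of 14, the toy alignment reproduces the arithmetic behind a value such as 71.4%.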
Subsequently, we analyzed the entire vpu nucleotide sequence against the genomes of vpu−vpx− strains, dividing it into three segments: S1 (85 nt), S2 (84 nt), and S3 (26 nt) (Supplementary fig. S8A). CR1 and CR2 specifically focus on the overlapping region between vpu and env, whereas S1 and S2 were generated by splitting vpu into two halves after first isolating S3, which corresponds to this overlap. Dividing the sequence into segments allowed us to examine specific regions more precisely, improving our ability to detect subtle similarities that may have been overlooked by full-length alignment. Our analysis of SIVden (Vpu-type 1) revealed that S1 shared 55.8% similarity with rev (48/86 nt) and 58.8% similarity with env (50/85 nt). S2 exhibited 51.6% similarity with env (49/95 nt) and 52.9% similarity with vif (46/87 nt), whereas S3 showed 71.4% similarity with vpr (15/21 nt) and 66.7% similarity with env (18/27 nt) (Supplementary fig. S8B, C). Considering that retroviruses are highly recombinogenic during cDNA synthesis (Goodrich and Duesberg 1990; Hemelaar et al. 2006), it is plausible that the vpu gene was created through recombination events between vpu−vpx− strains, possibly involving intergenomic recombination. The high similarity between vpu and the antisense strand of env further supports this hypothesis. In addition, a replication misalignment (‘slippage’) within the vpu−vpx− genome may have expanded the direct and inverted repeats, forming a hybrid gene (Lovett 2004). Consistent with our findings in Supplementary fig. S7, the observed similarities among various genes suggest that multiple neighboring genes contributed to the formation of vpu, supporting our hypothesis. Furthermore, the similarities observed between these gene segments and Vpu were as high as those reported for LMP-1 and TASK-1.
Notably, the presence of direct repeats in these regions raises the possibility that new genes could have been generated through recombination events involving these sequences, especially given the overlap with env. This further reinforces the idea that vpu may have originated through recombination and gene creation within the genome itself.
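The segmentation of vpu into S1, S2, and S3 described above can be sketched as a short helper. The sketch assumes that the env-overlapping segment (S3) lies at the 3′ end of vpu and that S1 receives the extra nucleotide when the remainder has odd length; the function name and the placeholder sequence are hypothetical.

```python
from typing import Tuple


def split_vpu(vpu: str, overlap_len: int) -> Tuple[str, str, str]:
    """Return (S1, S2, S3) for a vpu nucleotide sequence.

    S3 is the terminal segment that overlaps env (assumed to be at the
    3' end); the remaining sequence is split into two halves, S1 and S2.
    """
    if not 0 <= overlap_len <= len(vpu):
        raise ValueError("overlap_len must lie between 0 and len(vpu)")
    body_len = len(vpu) - overlap_len
    body, s3 = vpu[:body_len], vpu[body_len:]
    half = (len(body) + 1) // 2  # S1 takes the extra nucleotide when odd
    return body[:half], body[half:], s3


if __name__ == "__main__":
    placeholder = "A" * 195  # stand-in for a 195-nt vpu sequence
    s1, s2, s3 = split_vpu(placeholder, overlap_len=26)
    print(len(s1), len(s2), len(s3))
```

For a 195-nt sequence with a 26-nt overlap, this yields segment lengths of 85, 84, and 26 nt, matching the lengths stated for S1, S2, and S3.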
Diversification of vpu
After acquisition, vpu underwent a series of evolutionary events detailed in the previous sections, including modifications of its length and the region overlapping the env gene, as well as genome changes. Each of these events was crucial in shaping the virulent pandemic strain, HIV-1 group M. We constructed an unrooted phylogenetic tree with Vpu sequences of Vpu-type 4a (HIV-1 group M) to investigate the evolutionary processes and diversification that led to the emergence of group M. We analyzed the same dataset of Vpu amino acid sequences that was previously extracted from the Los Alamos HIV Database and applied CD-HIT with an identity threshold of 80% to reduce redundancy. This resulted in a final set of 252 HIV-1 group M sequences (Supplementary table S4), which included sequences from the newly reported variant subtype B strain (Wymant et al. 2022). The unrooted phylogenetic tree was constructed to clarify the relationships between the viral subtypes and identify potential differences in the Vpu protein among the viral subtypes without making assumptions about the direction of evolution of the sequences. We observed significant sequence divergence within the pre-existing subtype classification. Currently, Vpu-type 4a includes 10 recognized viral subtypes (subtypes A–D, F–H, J–L) (Hemelaar et al. 2006; Sharp and Hahn 2011; Bbosa et al. 2019). Therefore, we reclassified the previously known subtypes into more detailed categories, which we refer to as ‘protein subtypes’ in this study (Fig. 5). The reclassification of Vpu protein subtypes was based on multiple criteria. First, we identified unique clades that did not cluster with other sequences. We then examined their phylogenetic distances and confirmed that these clades were significantly separated from their corresponding viral subtypes.
Finally, we examined the distinct sequence features within each clade and classified the sequences accordingly. Although the majority of this crucial motif was conserved across all subtypes, specific residue variations were observed at certain positions. These variations were not randomly distributed but were instead uniquely associated with their respective subtypes, reinforcing the validity of our classification. Furthermore, the proximity of branches in the phylogenetic tree reflects the sequence similarity, with closely clustered branches indicating higher conservation. As a result, the reclassification revealed 17 distinct protein subtypes. The divergence of viral subtypes within their initial classifications underscores the need for a more refined and comprehensive classification to enhance our understanding of the virus. This is particularly evident for Vpu because a previous study demonstrated that its sequence diversity is closely associated with significant variability in cellular CD4 and tetherin downregulation activities, including among major group M subtypes (Umviligihozo et al. 2020). Notably, the Vpu proteins of subtype B viruses showed considerable sequence diversity, leading to the identification of subtypes such as BFK-1, which combines elements of subtypes B, F, and K, as well as BD-1, which combines subtypes B and D. Subtypes A, B, and C also showed significant variation within their initial clades, resulting in three, four, and five newly classified protein subtypes, respectively. Subtype C exhibited the greatest divergence, and its clades were further classified as C-1 to C-5. To validate these results, we compared the sequences across the new classifications and observed clear differences in the alignments. Despite the small phylogenetic distances, these differences were still evident. In contrast, a previous study identified seven sub-subtypes within subtype A and three within subtype B (Desire et al. 2018), which is more than we propose for subtype A and fewer than we propose for subtype B. However, the consistent grouping of subtypes B and D in both studies supports our new classification. This agreement validates our reclassification and highlights the importance of accurately identifying subtypes to better detect and trace epidemiological changes in the virus.
We conducted a similar analysis on the conserved genes using 242 Env and 214 Pol amino acid sequences (Supplementary table S5), and the results are shown in Supplementary fig. S9B and C, respectively. Similar to Vpu (Supplementary fig. S9A), we observed significant variation within the initial clades (subtypes) of both Env and Pol, and we could classify the clades further into 19 and 22 protein subtypes, respectively. However, in Vpu, multiple subtypes were grouped into a single protein subtype, indicating a mixture of subtypes within a single classified protein subtype, such as subtypes B, F, and K, as mentioned earlier. By contrast, the Pol and Env trees display sequences that are distinctly grouped according to their respective subtypes, without intermixing. This clear segregation in the Pol and Env trees suggests that these proteins are less diverse than Vpu, for which the blending of subtypes within clades indicates relatively greater diversity. Moreover, our analysis uncovered misannotations in existing sequence data, revealing instances where certain subtypes were erroneously intermixed within clusters due to annotation errors.
To identify protein-subtype-specific differences and the subtype-specific prevalence in these strains, we examined geographic and decadal variations (Fig. 5 and Supplementary fig. S10). We compiled the prevalence data in decadal intervals (Supplementary table S6) from subtype sequence annotations and generated pie charts for visual representation. This revealed decadal and geographic differences among protein subtypes. For instance, subtype B-1 sequences were predominant in 1990–99, whereas B-2 sequences were prevalent in 2010–19. Subtype B-3 showed a broader distribution across decades, with sequences from 1980 to 2019. Subtype B-4 had the highest sequence prevalence in 2000–09. Geographic differences were also evident. B-1 sequences were solely from North America; B-2 included sequences from both South America and North America; B-3 was represented across North America, Oceania, and Africa; and B-4 showed the widest geographic distribution, spanning Europe, North America, South America, and Africa (Supplementary table S7). We also analyzed the Vpu proteins of 135 circulating recombinant forms (CRFs) alongside those of HIV-1 group M (Vpu-type 4a) by constructing an unrooted phylogenetic tree to visualize protein diversification within HIV-1 (Supplementary fig. S11, Supplementary table S8). This analysis identified a distinct recombinant protein group, labeled ‘R1,’ which was clustered separately, suggesting independent evolution of the Vpu protein. R1 was composed mainly of sequences collected from Asia in 2010–19. Multiple sequence analysis between neighboring subtypes G-1 and A-4 revealed that R1 contains sequences derived from these subtypes (Supplementary fig. S12). Thus, a total of 18 protein subtypes (17 protein subtypes + 1 recombinant group [R1]) were classified within HIV-1 group M. This finding highlights the ongoing, rapid evolution of the vpu gene and the high divergence of recombinant forms.
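The decadal tallies summarized above can be sketched as a simple counting step. The record layout (protein subtype, collection year), the function names, and the toy records are assumptions for illustration; they do not reproduce the contents of the supplementary tables.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def decade_label(year: int) -> str:
    """Map a collection year to a decade label such as '1990-99'."""
    start = (year // 10) * 10
    return f"{start}-{(start + 9) % 100:02d}"


def tally_by_decade(records: Iterable[Tuple[str, int]]) -> Dict[str, Counter]:
    """Count sequences per decade for each protein subtype.

    `records` holds (protein_subtype, collection_year) pairs.
    """
    table: Dict[str, Counter] = defaultdict(Counter)
    for subtype, year in records:
        table[subtype][decade_label(year)] += 1
    return dict(table)


def predominant_decade(counts: Counter) -> str:
    """Return the decade with the highest sequence count."""
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical records, not taken from the study's data.
    toy_records = [("B-1", 1994), ("B-1", 1998), ("B-1", 2003), ("B-2", 2015)]
    table = tally_by_decade(toy_records)
    for subtype, counts in sorted(table.items()):
        print(subtype, dict(counts), predominant_decade(counts))
```

The same pattern applies to geographic tallies if the year is replaced by a region label, and the resulting counts are what a pie chart per protein subtype would display.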
A highly virulent variant of subtype B, which reduces cellular CD4 twice as efficiently as the standard subtype B, was recently reported (Wymant et al. 2022). To visualize its evolutionary lineage, we constructed a rooted phylogenetic tree of the Vpu-types that included the variant within Vpu-type 4a, performed multiple sequence alignment, and used Multi-Harmony to identify specific differences between the vpu genes of subtype B and the variant B strain (Supplementary fig. S13 and S14). To construct the tree, we used the same dataset of 141 sequences from the phylogenetic tree shown in Fig. 1, along with 17 additional sequences from the newly reported variant of the subtype B strain (Wymant et al. 2022). The phylogenetic tree revealed that the variant B strain is highly divergent from the common subtype B: it branches off from subtype B and is located at the top of the tree. The alignment of Vpu amino acids showed greater conservation in the variant than in the common subtype. Specifically, alignment positions 11 (residue 8), 19 (residue 16), 50 (residue 47), 76 (residue 73), and 80 (residue 77) exhibited over 94% conservation in the variant B strain, whereas the common subtype B showed significantly lower conservation at the same positions. The comparisons at these five positions revealed high variation rates of 81%, 65%, 59%, 82%, and 71%, respectively. These findings suggest that even minor amino acid changes within Vpu may play crucial roles in influencing the virulence of HIV-1, potentially linking the Vpu protein to the decline in cellular CD4 levels. However, it is important to note that the vpu gene of this new variant has yet to be subjected to experimental studies.
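The position-wise comparison described above can be illustrated with a small sketch that computes per-column conservation in two groups of aligned sequences and maps alignment columns to residue numbers. This is not the Multi-Harmony method; the function names, the strict 0.94 cutoff, and the toy alignments are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Optional


def column_conservation(alignment: List[str], col: int) -> float:
    """Fraction of sequences sharing the most common symbol at a column.

    `col` is 0-based; gap characters are counted like any other symbol.
    """
    column = [seq[col] for seq in alignment]
    return Counter(column).most_common(1)[0][1] / len(column)


def column_to_residue(aligned_seq: str, col: int) -> Optional[int]:
    """Map a 0-based alignment column to a 1-based residue number.

    Returns None when the sequence has a gap at that column.
    """
    if aligned_seq[col] == "-":
        return None
    return len(aligned_seq[:col + 1].replace("-", ""))


def contrast_columns(variant: List[str], reference: List[str],
                     threshold: float = 0.94) -> List[int]:
    """Columns conserved above `threshold` in `variant` but less so in `reference`."""
    n_cols = len(variant[0])
    return [
        c for c in range(n_cols)
        if column_conservation(variant, c) > threshold
        and column_conservation(reference, c) < column_conservation(variant, c)
    ]


if __name__ == "__main__":
    # Toy alignments (hypothetical residues).
    variant = ["MKLV", "MKLV", "MKLV"]
    reference = ["MKLV", "MRLV", "MQIV"]
    print(contrast_columns(variant, reference))
    print(column_to_residue("MQ---PLV", 5))
```

The column-to-residue mapping accounts for the offset between alignment positions and residue numbers, such as the difference between position 11 and residue 8 noted above, which arises from gap columns in the alignment.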
Conclusions
vpu is a key gene in the HIV-1 genome and in the genomes of SIVs, such as SIVcpz and SIVmus. Through comprehensive sequence analysis and phylogenetics, we traced the evolution of vpu and proposed a tentative evolutionary model (Fig. 6). This study presents a working hypothesis, acknowledging the limitations of relying solely on computational methods. We suggest that vpu was acquired by Vpu-type 1 viruses, which infect monkeys such as the mona monkey (SIVmon), mustached guenon (SIVmus), and greater spot-nosed monkey (SIVgsn), from the vpu−vpx− SIV strains of the African Sykes’ monkey (SIVsyk) and red-tailed monkey (SIVasc), through recombination and replication misalignment of the direct and inverted repeat sequences of vpu−vpx− strains. The gene likely underwent adaptive optimization processes, such as adjusting the length of vpu and modifying its overlap with env, to enhance viral replication and infectivity. Additionally, the role of tetherin antagonism shifted from vpu to nef in Vpu-types 2 and 3 before reverting to vpu in Vpu-type 4, likely for greater viral efficiency. The virus continues to optimize, driving the diversification of HIV-1.
Fig. 6.
Proposed model of vpu gene evolution in SIV and HIV-1 strains. The types are labeled above each image. The viral strain and name of each monkey species are written in the individually colored boxes. Black arrows indicate the possible ancestors of SIVmon and SIVden. Vpu evolved through processes of optimization to create an ideal gene. Yellow stars mark the points of recombination and transmission events that contributed to the evolution of the Vpu gene. In our evolutionary model, we incorporated images of multiple monkey faces, after cropping them from their original sizes. The photographs were obtained from Irasutoya, Freepik by brgfx (https://www.freepik.com/) and Wikimedia Commons. The photographers from Wikimedia commons are credited as: Michael Gäbler, Alena Houšková, Thomas Springer, Peggy Motsch, Laetitia C, Paul Harrison, Wookiemedia, Aaron Logan, Jack Hynes, Six Plus by Libé and Madhero88. The images are distributed under the CC BY 3.0, CC0 1.0, CC BY-SA 3.0, CC BY-SA 4.0, and CC BY 2.5 licenses
Our large-scale analysis of thousands of HIV and SIV sequences revealed subtle genetic variations and evolutionary patterns, providing valuable insights into viral evolution. Similar studies tracking the evolution of proteins (Saito et al. 2019, 2023; Tsurumaki et al. 2022) and functional RNA molecules (Tamaki et al. 2018; Miura et al. 2022) further demonstrate that large-scale data analysis offers deeper evolutionary insights. The global view of vpu evolution underscores the importance of this approach in revealing molecular transitional states. Tracking viral evolution and monitoring new strains are essential for understanding adaptation processes, a key factor in managing high-mutation-rate viruses such as HIV and SARS-CoV-2. This knowledge is crucial for developing effective strategies to combat viral diseases and protect public health.
Acknowledgments
We thank members of the RNA Group at the Institute for Advanced Biosciences of Keio University, Japan, for insightful discussions. We also thank Miura Masahiro and Phillip Yamamoto for helping to create the Python script used in this study.
Funding
This work was supported, in part, by research funds from the Yamagata Prefectural Government and Tsuruoka City, Japan. The funding bodies played no role in the study design, data collection or analysis, the decision to publish, or the preparation of the manuscript.
Data Availability
The data supporting the findings of this work are available within the paper and its Supplementary Information files.
Declarations
Conflict of Interest
The authors declare that they have no conflicts of interest.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- Aghokeng AF, Ayouba A, Mpoudi-Ngole E et al (2010) Extensive survey on the prevalence and genetic diversity of SIVs in primate bushmeat provides insights into risks for potential new cross-species transmissions. Infect Genet Evol 10:386–396. 10.1016/j.meegid.2009.04.014 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bailes E, Gao F, Bibollet-Ruche F et al (2003) Hybrid origin of SIV in chimpanzees. Science 300:1713. 10.1126/science.1080657 [DOI] [PubMed] [Google Scholar]
- Bbosa N, Kaleebu P, Ssemwanga D (2019) HIV subtype diversity worldwide. Curr Opin HIV AIDS 14:153–160. 10.1097/COH.0000000000000534 [DOI] [PubMed] [Google Scholar]
- Bi X, Liu LF (1996) A replicational model for DNA recombination between direct repeats. J Mol Biol 256:849–858. 10.1006/jmbi.1996.0131 [DOI] [PubMed] [Google Scholar]
- Bibollet-Ruche F, Bailes E, Gao F et al (2004) New simian immunodeficiency virus infecting De Brazza’s monkeys (Cercopithecus neglectus): evidence for a cercopithecus monkey virus clade. J Virol 78:7748–7762. 10.1128/JVI.78.14.7748-7762.2004 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brandt BW, Feenstra KA, Heringa J (2010) Multi-Harmony: detecting functional specificity from sequence alignment. Nucleic Acids Res 38:W35-40. 10.1093/nar/gkq415 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Capella-Gutierrez S, Silla-Martinez JM, Gabaldon T (2009) trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25:1972–1973. 10.1093/bioinformatics/btp348 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chirico N, Vianelli A, Belshaw R (2010) Why genes overlap in viruses. Proc Biol Sci 277:3809–3817. 10.1098/rspb.2010.1052 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cohen EA, Terwilliger EF, Sodroski JG, Haseltine WA (1988) Identification of a protein encoded by the vpu gene of HIV-1. Nature 334:532–534. 10.1038/334532a0 [DOI] [PubMed] [Google Scholar]
- Desire N, Cerutti L, Le Hingrat Q et al (2018) Characterization update of HIV-1 M subtypes diversity and proposal for subtypes A and D sub-subtypes reclassification. Retrovirology 15:80. 10.1186/s12977-018-0461-y [DOI] [PMC free article] [PubMed] [Google Scholar]
- Douglas JL, Bai Y, Gustin JK, Moses AV (2013) A comparative mutational analysis of HIV-1 Vpu subtypes B and C for the identification of determinants required to counteract BST-2/Tetherin and enhance viral egress. Virology 441:182–196. 10.1016/j.virol.2013.03.015 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Federau T, Schubert U, Flossdorf J et al (1996) Solution structure of the cytoplasmic domain of the human immunodeficiency virus type 1 encoded virus protein U (Vpu). Int J Pept Protein Res 47:297–310. 10.1111/j.1399-3011.1996.tb01359.x [DOI] [PubMed] [Google Scholar]
- Fu L, Niu B, Zhu Z et al (2012) CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28:3150–3152. 10.1093/bioinformatics/bts565 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gao F, Bailes E, Robertson DL et al (1999) Origin of HIV-1 in the chimpanzee Pan troglodytes troglodytes. Nature 397:436–441. 10.1038/17130 [DOI] [PubMed] [Google Scholar]
- Gonzalez ME (2015) Vpu protein: the viroporin encoded by HIV-1. Viruses 7:4352–4368. 10.3390/v7082824 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Goodrich DW, Duesberg PH (1990) Retroviral recombination during reverse transcription. Proc Natl Acad Sci 87:2052–2056. 10.1073/pnas.87.6.2052 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hemelaar J, Gouws E, Ghys PD, Osmanov S (2006) Global and regional distribution of HIV-1 genetic subtypes and recombinants in 2004. AIDS 20:W13–W23. 10.1097/01.aids.0000247564.73009.bc [DOI] [PubMed] [Google Scholar]
- Hirsch VM, Olmsted RA, Murphey-Corb M et al (1989) An African primate lentivirus (SIVsm) closely related to HIV-2. Nature 339:389–392. 10.1038/339389a0 [DOI] [PubMed] [Google Scholar]
- Hoang DT, Chernomor O, von Haeseler A et al (2018) UFBoot2: Improving the Ultrafast Bootstrap Approximation. Mol Biol Evol 35:518–522. 10.1093/molbev/msx281 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hsu K, Seharaseyon J, Dong P et al (2004) Mutual functional destruction of HIV-1 Vpu and host TASK-1 channel. Mol Cell 14:259–267. 10.1016/S1097-2765(04)00183-2 [DOI] [PubMed] [Google Scholar]
- Huet T, Cheynier R, Meyerhans A et al (1990) Genetic organization of a chimpanzee lentivirus related to HIV-1. Nature 345:356–359. 10.1038/345356a0 [DOI] [PubMed] [Google Scholar]
- Jern P, Sperber GO, Blomberg J (2005) Use of Endogenous Retroviral Sequences (ERVs) and structural markers for retroviral phylogenetic inference and taxonomy. Retrovirology 2:50. 10.1186/1742-4690-2-50 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jones DT, Taylor WR, Thornton JM (1992) The rapid generation of mutation data matrices from protein sequences. Bioinformatics 8:275–282. 10.1093/bioinformatics/8.3.275 [DOI] [PubMed] [Google Scholar]
- Takehisa J, Kraus MH, Ayouba A et al (2009) Origin and biology of simian immunodeficiency virus in wild-living western gorillas. J Virol 83:1635–1648. 10.1128/jvi.02311-08 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kalyaanamoorthy S, Minh BQ, Wong TKF et al (2017) ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods 14:587–589. 10.1038/nmeth.4285 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30:772–780. 10.1093/molbev/mst010 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Keele BF, Van Heuverswyn F, Li Y et al (2006) Chimpanzee reservoirs of pandemic and nonpandemic HIV-1. Science 313:523–526. 10.1126/science.1126531 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kirchhoff F (2009) Is the high virulence of HIV-1 an unfortunate coincidence of primate lentiviral evolution? Nat Rev Microbiol 7:467–476. 10.1038/nrmicro2111 [DOI] [PubMed] [Google Scholar]
- Kluge SF, Sauter D, Vogl M et al (2013) The transmembrane domain of HIV-1 Vpu is sufficient to confer anti-tetherin activity to SIVcpz and SIVgor Vpu proteins: cytoplasmic determinants of Vpu function. Retrovirology 10:32. 10.1186/1742-4690-10-32 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kosiol C, Goldman N (2005) Different versions of the Dayhoff rate matrix. Mol Biol Evol 22:193–199. 10.1093/molbev/msi005 [DOI] [PubMed] [Google Scholar]
- Lee K, Kolb AW, Sverchkov Y et al (2015) Recombination analysis of herpes simplex virus 1 reveals a bias toward GC content and the inverted repeat regions. J Virol 89:7214–7223. 10.1128/JVI.00880-15 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Livingstone CD, Barton GJ (1993) Protein sequence alignments: a strategy for the hierarchical analysis of residue conservation. Comput Appl Biosci 9:745–756. 10.1093/bioinformatics/9.6.745 [DOI] [PubMed] [Google Scholar]
- Lovett ST (2004) Encoded errors: mutations and rearrangements mediated by misalignment at repetitive DNA sequences. Mol Microbiol 52:1243–1253. 10.1111/j.1365-2958.2004.04076.x [DOI] [PubMed] [Google Scholar]
- Mack K, Starz K, Sauter D et al (2017) Efficient Vpu-mediated tetherin antagonism by an HIV-1 Group O Strain. J Virol. 10.1128/JVI.02177-16 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Madeira F, Madhusoodanan N, Lee J et al (2024) The EMBL-EBI Job Dispatcher sequence analysis tools framework in 2024. Nucleic Acids Res 52:W521–W525. 10.1093/nar/gkae241 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Malossi CD, Fioratti EG, Cardoso JF et al (2020) High genomic variability in equine infectious anemia virus obtained from naturally infected horses in Pantanal, Brazil: an endemic region case. Viruses 12:207. 10.3390/v12020207 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Minh BQ, Schmidt HA, Chernomor O et al (2020) IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol 37:1530–1534. 10.1093/molbev/msaa015 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Miura MC, Nagata S, Tamaki S et al (2022) Distinct expansion of group II introns during evolution of prokaryotes and possible factors involved in its regulation. Front Microbiol 13:849080. 10.3389/fmicb.2022.849080 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nickle DC, Heath L, Jensen MA et al (2007) HIV-specific probabilistic models of protein evolution. PLoS ONE 2:e503. 10.1371/journal.pone.0000503 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Saito M, Inose R, Sato A et al (2023) Systematic analysis of diverse polynucleotide kinase Clp1 family proteins in eukaryotes: three unique Clp1 proteins of trypanosoma brucei. J Mol Evol 91:669–686. 10.1007/s00239-023-10128-x [DOI] [PMC free article] [PubMed] [Google Scholar]
- Saito M, Sato A, Nagata S et al (2019) Large-scale molecular evolutionary analysis uncovers a variety of polynucleotide kinase Clp1 family proteins in the three domains of life. Genome Biol Evol 11:2713–2726. 10.1093/gbe/evz195 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sauter D, Schindler M, Specht A et al (2009) Tetherin-driven adaptation of Vpu and Nef function and the evolution of pandemic and nonpandemic HIV-1 strains. Cell Host Microbe 6:409–421. 10.1016/j.chom.2009.10.004 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sauter D, Unterweger D, Vogl M et al (2012) Human tetherin exerts strong selection pressure on the HIV-1 group N Vpu protein. PLoS Pathog 8:e1003093. 10.1371/journal.ppat.1003093 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schindler M, Munch J, Kutsch O et al (2006) Nef-mediated suppression of T cell activation was lost in a lentiviral lineage that gave rise to HIV-1. Cell 125:1055–1067. 10.1016/j.cell.2006.04.033 [DOI] [PubMed] [Google Scholar]
- Schubert U, Ferrer-Montiel AV, Oblatt-Montal M et al (1996) Identification of an ion channel activity of the Vpu transmembrane domain and its involvement in the regulation of virus release from HIV-1-infected cells. FEBS Lett 398:12–18. 10.1016/s0014-5793(96)01146-5 [DOI] [PubMed] [Google Scholar]
- Schubert U, Strebel K (1994) Differential activities of the human immunodeficiency virus type 1-encoded Vpu protein are regulated by phosphorylation and occur in different cellular compartments. J Virol 68:2260–2271. 10.1128/JVI.68.4.2260-2271.1994 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schwartz S, Felber BK, Fenyo EM, Pavlakis GN (1990) Env and Vpu proteins of human immunodeficiency virus type 1 are produced from multiple bicistronic mRNAs. J Virol 64:5448–5456. 10.1128/JVI.64.11.5448-5456.1990 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shah AH, Sowrirajan B, Davis ZB et al (2010) Degranulation of natural killer cells following interaction with HIV-1-infected cells is hindered by downmodulation of NTB-A by Vpu. Cell Host Microbe 8:397–409. 10.1016/j.chom.2010.10.008 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shannon CE (1948) The mathematical theory of communication. MD Comput 14:306–317 [PubMed] [Google Scholar]
- Sharp PM, Hahn BH (2011) Origins of HIV and the AIDS pandemic. Cold Spring Harb Perspect Med 1:a006841–a006841. 10.1101/cshperspect.a006841 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sharp PM, Shaw GM, Hahn BH (2005) Simian immunodeficiency virus infection of chimpanzees. J Virol 79:3891–3902. 10.1128/JVI.79.7.3891-3902.2005 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Soubrier J, Steel M, Lee MSY et al (2012) The influence of rate heterogeneity among sites on the time dependence of molecular rates. Mol Biol Evol 29:3345–3358. 10.1093/molbev/mss140 [DOI] [PubMed] [Google Scholar]
- Strebel K, Klimkait T, Maldarelli F, Martin MA (1989) Molecular and biochemical analyses of human immunodeficiency virus type 1 vpu protein. J Virol 63:3784–3791. 10.1128/JVI.63.9.3784-3791.1989 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Klimkait T, Strebel K, Hoggan MD et al (1990) The human immunodeficiency virus type 1-specific protein vpu is required for efficient virus maturation and release. J Virol 64:621–629. 10.1128/jvi.64.2.621-629.1990 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tamaki S, Tomita M, Suzuki H, Kanai A (2018) Systematic analysis of the binding surfaces between tRNAs and their respective Aminoacyl tRNA synthetase based on structural and evolutionary data. Front Genet 8:227. 10.3389/fgene.2017.00227 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tavaré S, Miura RM (1986) Lectures on mathematics in the life sciences. pp 57–86
- Terwilliger EF, Cohen EA, Lu YC et al (1989) Functional role of human immunodeficiency virus type 1 vpu. Proc Natl Acad Sci U A 86:5163–5167. 10.1073/pnas.86.13.5163 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tsurumaki M, Saito M, Tomita M, Kanai A (2022) Features of smaller ribosomes in candidate phyla radiation (CPR) bacteria revealed with a molecular evolutionary analysis. RNA 28:1041–1057. 10.1261/rna.079103.122 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Umviligihozo G, Cobarrubias KD, Chandrarathna S et al (2020) Differential Vpu-mediated CD4 and tetherin downregulation functions among major HIV-1 group M subtypes. J Virol 94:e00293-20. 10.1128/JVI.00293-20
- Van Heuverswyn F, Li Y, Neel C et al (2006) Human immunodeficiency viruses: SIV infection in wild gorillas. Nature 444:164. 10.1038/444164a
- Waterhouse AM, Procter JB, Martin DM et al (2009) Jalview Version 2–a multiple sequence alignment editor and analysis workbench. Bioinformatics 25:1189–1191. 10.1093/bioinformatics/btp033
- Waterman MS, Smith TF, Beyer WA (1976) Some biological sequence metrics. Adv Math 20:367–387. 10.1016/0001-8708(76)90202-4
- Willey RL, Maldarelli F, Martin MA, Strebel K (1992) Human immunodeficiency virus type 1 Vpu protein induces rapid degradation of CD4. J Virol 66:7193–7200. 10.1128/JVI.66.12.7193-7200.1992
- Wymant C, Bezemer D, Blanquart F et al (2022) A highly virulent variant of HIV-1 circulating in the Netherlands. Science 375:540–545. 10.1126/science.abk1688
- Yang Z (1995) A space-time process model for the evolution of DNA sequences. Genetics 139:993–1005. 10.1093/genetics/139.2.993
- Yao W, Strebel K, Yamaoka S, Yoshida T (2022) Simian immunodeficiency virus SIVgsn-99CM71 Vpu employs different amino acids to antagonize human and greater spot-nosed monkey BST-2. J Virol 96:e0152721. 10.1128/JVI.01527-21
- Yao W, Yoshida T, Hashimoto S et al (2020) Vpu of a simian immunodeficiency virus isolated from greater spot-nosed monkey antagonizes human BST-2 via two AxxxxxxxW motifs. J Virol 94:e01669-19. 10.1128/JVI.01669-19
- Yoshida T, Kao S, Strebel K (2011) Identification of residues in the BST-2 TM domain important for antagonism by HIV-1 Vpu using a gain-of-function approach. Front Microbiol. 10.3389/fmicb.2011.00035
- Yoshida T, Koyanagi Y, Strebel K (2013) Functional antagonism of rhesus macaque and chimpanzee BST-2 by HIV-1 Vpu is mediated by cytoplasmic domain interactions. J Virol 87:13825–13836. 10.1128/JVI.02567-13
Data Availability Statement
The data supporting the findings of this work are available within the paper and its Supplementary Information files.





