Abstract
Evaluation of viral diversity is critical for the rational design of treatment modalities against human immunodeficiency virus (HIV). Predominated by HIV-1 clade C (HIV-1C), the epidemic in India represents the third largest population infected with HIV-1 globally. Glycoprotein 41 (gp41) is critical for viral replication and is a target for the design of therapeutic strategies. However, documentation of viral diversity of the gp41 gene in infected individuals from India remains limited. The present study employed high throughput sequencing to examine variation in gp41 amplicons generated from blood-derived viruses in 24 HIV-1C infected individuals from Mumbai, India. Sequence diversity profiles were documented in different functional domains of gp41. Furthermore, through a meta-analysis approach, all reported gp41 sequences from India (N = 70) were compared with those from South Africa (N = 126), the country with the largest HIV epidemic globally, also predominated by HIV-1C. A total of 44 positions displayed statistically significant differential (p < 0.05) Shannon entropy in the two regions. This comparison also identified 11 codon sites undergoing distinct selection, 8 of which remained differentially selected in an extended comparison of data from Asia (N = 137) and Africa (N = 383). Assessment of correlated mutation networks associated with differentially selected residues revealed common as well as distinct interaction networks. Furthermore, codon usage analysis revealed 17 differentially selected codons (Mann–Whitney test, p < 0.001) in Asia and Africa. Dissimilar trends in GC content across codon positions were also observed. In-depth understanding of these divergent evolutionary signatures through extended analysis with larger data-sets would assist the development of effective interventions being considered for HIV-1C.
Electronic supplementary material
The online version of this article (10.1007/s13337-020-00595-x) contains supplementary material, which is available to authorized users.
Keywords: HIV-1C, India, gp41, Evolution, Viral variation, Codon usage
Introduction
Preventing HIV infection as an interventionist strategy has remained elusive due to continual immune escape by the virus facilitated through antigenic variability which the host immune system is unable to surmount. Factors underlying this variability are poor proofreading activity of the reverse transcriptase enzyme, base substitution, recombination and accumulation of insertions and deletions (indels) resulting in generation of an array of viral variants, or 'Quasispecies' [21].
HIV envelope glycoprotein is a heterodimer of non-covalently associated gp120 (surface unit) and gp41 (transmembrane unit). The former initiates viral attachment to target cells through binding with CD4 and association with the appropriate co-receptor, while fusion and entry occur through gp41. The transmembrane unit, gp41, can be delineated structurally into three major domains: the extracellular domain or ectodomain, the transmembrane domain and the cytoplasmic tail. The ectodomain is further divided into the fusion peptide (FP), N-terminal heptad repeat (NHR/HR1), C-terminal heptad repeat (CHR/HR2) and membrane proximal external region (MPER) [5]. The ectodomain region of gp41 is an important therapeutic target for modalities such as antiretroviral drugs and broadly neutralizing antibodies (bNAbs) [6, 39]. Enfuvirtide (T20), an FDA-approved fusion inhibitor, competes with HR2 for binding to the HR1 domain, preventing the conformational change caused by six-helix bundle (6HB) formation and thereby restricting viral entry [8]. MPER has been demonstrated to be involved in membrane destabilization, flexibility for proper positioning of the fusion assembly as well as formation and expansion of the fusion pore through recruitment of additional gp41 subunits [4]. The discovery of the 10E8 bNAb by Huang et al. in 2012 has again highlighted the importance of MPER as a therapeutic target for bNAb-based therapy [20]. Indeed, in a recent study we evaluated the effect of diversity in MPER of primary HIV-1 Indian subtype C sequences on interaction with broadly neutralizing antibodies 4E10 and 10E8 [44]. The cytoplasmic tail of gp41 has been less studied and its functional significance remains relatively unexplored. It has been shown to influence the conformation of the ectodomain on infected cells and on the virion surface during viral budding and fusion with target cells, respectively.
From a therapeutic standpoint it has been shown to affect viral sensitivity to conformation-dependent neutralizing antibodies [43]. Identification of novel functional motifs in this region and understanding their influence on envelope structure and function is an ongoing endeavor [41].
According to UNAIDS 2018 estimates, approximately 37 million people are infected with HIV worldwide, of which 46.6% are infected with HIV-1 clade C. Currently, at least 2.1 million HIV-1C infected individuals are estimated to reside in India [46, 18]. Yet, clade C sequence data remain sparse in the Los Alamos National Laboratory (LANL) HIV database, a major resource for HIV research. As of December 2019, clade C accounted for < 20% of the sequences in the database, with alarmingly meagre gp41 sequence data from India (N = 355). Based on these scant data, sequences isolated from India for the gp41 region have previously been reported to cluster distinctly from sequences from Africa, another major geographical region for the subtype C epidemic [1]. Lack of HIV sequence diversity/genetic drift data, especially pertaining to therapeutically important regions of the viral genome, represents a major obstacle in the development of intervention strategies. This study used deep sequencing to document gp41 diversity of uncultured circulating plasma viral RNA as well as PBMC-derived proviral DNA sequences isolated from HIV infected individuals at different stages of antiretroviral therapy and disease progression from Mumbai, India. Analyses of selection pressure as well as covariation have been employed to decipher evolutionary trends in clade C sequences from Asia and Africa. Analyses such as these are important to guide the design and selection of therapeutic strategies, such as fusion inhibitors and gp41-based vaccines, for India as well as for HIV-1 clade C in general.
Materials and methods
Study participants
Ethics statement
HIV-1 subtype C infected study participants were recruited from J.J. group of Hospitals, Mumbai following approval of ethics committees of both the participating Institutes (NIRRH and Grant Medical Government College). The informed consent forms were provided and duly signed by all study participants prior to recruitment. In the case of minors/children, duly signed informed consent was obtained from their next of kin/caretakers/guardian, as per the participating institutes’ ethics committee guidelines.
Collection of specimens
A total of 24 HIV-1 infected individuals (S1-S24) were recruited for the study with ages ranging from 8 to 55 years. These individuals had no documented coinfections at sampling. As indicated in Table 1, 8 participants were ART naïve while 16 were receiving ART at the time of sampling. Of the 16 ART receiving participants, 6 were receiving 1st line ART and 10 participants had previously experienced therapy failure and were now receiving 2nd line therapy (supplementary file 1). Participants selected for the study were chronically infected with documented period of infection ranging from a few months to 16 years. CD4 counts for each subject were estimated through flow cytometry on a BD FACS Calibur cytometer (Becton Dickinson, San Jose, CA). The total nucleic acids were isolated from plasma samples using MagNA pure automated Nucleic Acids isolation system (Roche) and Real time PCR was performed for viral load estimation with COBAS® Taqman HIV-1 V2.0, Roche as per manufacturer’s instructions.
Table 1.
Participant characteristics
Amplicon Designation | Participant Designation | Template source | Age | Sex | CD4 count (cells/mm³) | Viral load (copies/mL) | ART status |
---|---|---|---|---|---|---|---|
1 | S1 | Plasma | 8 | M | 966 | 45,034 | AN |
2 | S1 | Provirus | 8 | M | 966 | 45,034 | AN |
3 | S2 | Plasma | 40 | F | 185 | 26,967 | AR |
4 | S2 | Provirus | 40 | F | 185 | 26,967 | AR |
5 | S3 | Plasma | 8 | M | 864 | 146,462 | AN |
6 | S4 | Plasma | 43 | F | 427 | 68,856 | AN |
7 | S5 | Provirus | 40 | F | 440 | 2749 | AN |
8 | S6 | Provirus | 45 | M | 864 | Undetectable | AR |
9 | S7 | Provirus | 9 | M | 1324 | Undetectable | AR |
10 | S8 | Provirus | 35 | M | 230 | 101,032 | AN |
11 | S9 | Provirus | 40 | M | 162 | 304,046 | AR |
12 | S10 | Provirus | 40 | M | 372 | 184,723 | AR |
13 | S11 | Provirus | 55 | M | 124 | 34,062 | AR |
14 | S12 | Provirus | 48 | F | 269 | 7293 | AR |
15 | S13 | Provirus | 36 | F | 179 | 29,194 | AR |
16 | S14 | Provirus | 48 | M | 195 | < 34 | AR |
17 | S15 | Provirus | 51 | M | 194 | 97 | AR |
18 | S16 | Provirus | 45 | M | 185 | 848 | AR |
19 | S17 | Provirus | 45 | F | 202 | 132,788 | AR |
20 | S18 | Provirus | 42 | F | 2585 | Undetectable | AR |
21 | S19 | Provirus | 37 | F | 1061 | Undetectable | AR |
22 | S20 | Provirus | 43 | M | 714 | < 34 | AR |
23 | S21 | Provirus | 45 | F | 424 | < 34 | AR |
24 | S22 | Provirus | 28 | M | 534 | 197,054 | AN |
25 | S23 | Provirus | 45 | M | 612 | 6585 | AN |
26 | S24 | Provirus | 43 | M | 334 | 861,484 | AN |
AN ART naïve, AR ART receiving
Processing of samples
Plasma and PBMC separation
10 mL of blood was collected from each subject into EDTA vacutainers (BD, catalog no. 367525). Blood plasma was collected following centrifugation at 700 g for 10 min and the aliquots were stored at −80 °C until further use. Peripheral blood mononuclear cells (PBMCs) were isolated from each sample by continuous density gradient centrifugation. Briefly, blood samples were diluted 1:1 with RPMI-1640 (Himedia laboratories Ltd., India), overlaid on Hisep (Himedia laboratories Ltd., India) in a 3:1 proportion and centrifuged at 700 g for 20 min at room temperature. The PBMCs obtained were washed twice with RPMI-1640 and ~5 × 10⁶ cells were suspended in 200 µL RPMI-1640. Isolated PBMCs were further processed for extraction of genomic DNA.
Extraction of DNA and RNA
Genomic DNA was extracted from PBMCs using the QIAamp blood DNA mini kit (Qiagen, Valencia, CA) as per manufacturer’s instructions. The stored plasma samples were centrifuged at 1500 g for 15 min at room temperature and viral RNA was isolated using the QIAamp Viral RNA mini kit (Qiagen, Valencia, CA) as per manufacturer’s instructions. Both DNA and RNA were checked for purity and quantitated spectrophotometrically.
Amplification of gp41 gene
Reverse transcription was performed for RNA templates using a gene-specific primer. Following this, cDNA as well as proviral DNA were used to amplify gp41 gene using two nested PCR strategies (protocol 1 for datasets 1–8 and protocol 2 for datasets 9–26) as described previously [44].
Next generation sequencing (NGS) of PCR products
Templates amplified by protocol 1 were sequenced by NGS protocol 1 and those amplified by protocol 2 were sequenced by NGS protocol 2.
NGS protocol 1
Next generation sequencing of PCR products (1–8) was performed commercially (SciGenom Labs, Kochi, India). Library preparation was performed as per the TruSeq sample preparation protocol V2. Sequencing was carried out on the Illumina MiSeq V2 platform to obtain 'paired read' data (read length 151 bases) in ‘FASTQ’ format. The Illumina MiSeq sequencing data obtained with this protocol are available from the GenBank® Sequence Read Archive under study accession number SRP040990 (NCBI-SRA: SRP040990).
NGS protocol 2
Next generation sequencing of PCR products (S9-S24) was performed commercially (Interpretomics India Pvt. Ltd., Bengaluru, India). The DNA libraries were prepared from PCR amplicons using the TruSeq Nano DNA library preparation kit. Sequencing was carried out on the Illumina HiSeq platform to obtain 'paired read' data (read length 101 bases) in ‘FASTQ’ format. The sequence data have been deposited with links to BioProject accession number PRJNA493619 (NCBI-SRA: SRP162802).
Templates from S5, S6, and S8 were amplified and sequenced by both protocols to assess amplification and sequencing bias.
NGS data analysis
Quality assessment and alignment of raw reads as well as variant calling were performed as described previously [44]. Position-wise sequence logos for LLP domains were prepared using the tool Weblogo 3 [11]. Amino acid frequencies derived from this analysis were used to generate a functional domain specific heat map using Circos v0.69 [23].
Phylogenetic analysis
Sequence retrieval from manually curated LANL HIV database was performed as depicted in Supplementary file 2. Briefly, uncultured HIV-1 subtype C sequences specific to the genomic region 7758–8795 (as per HXB2 numbering) were retrieved with a one sequence/subject filter available on the LANL HIV sequence search interface. Sequence entries without any information regarding the sample tissue source and sample source country were excluded. RIP HIV-1 subtype reference dataset was retrieved from LANL HIV database. Retrieved sequences along with consensus sequences generated by Vicuna in the present study were processed further for Phylogenetic analysis. Multiple sequence alignment for the nucleotide sequences along with HXB2 sequence (GenBank: K03455.1) was produced with Gene cutter (hiv.lanl.gov). Gene Cutter clips the coding regions from a nucleotide alignment and codon aligns the sequences based on Hmmer v 2.32 algorithm with a training set of the full-length genome alignment. Alignments were manually curated using Bioedit v7.2.5 [17].
Two phylogenetic trees were generated as follows: (1) RIP HIV-1 reference dataset with HIV-1C sequences retrieved from India and China, (2) all sequences for HIV-1C gp41 retrieved as per the strategy depicted in supplementary file 2. Phylogenetic analyses were performed with the maximum likelihood (ML) method for sequence data sets using PhyML 3.0 [16]. The best-fit nucleotide substitution model was predicted by jModelTest2 [12]. The model selected under both Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) ranking was the ‘General time reversible’ (GTR) nucleotide substitution model with a gamma distribution of rates (+ G) and a proportion of invariant sites (+ I). The robustness of the ML trees was further investigated with the non-parametric bootstrap analysis available in PhyML with 100 pseudo-replicates. The trees were manually edited for presentation using FigTree v 1.4.0 [35]. Sequence entries selected from Asia and Africa were used to produce amino acid alignments specific to these geographic regions using Gene Cutter and edited in Bioedit v7.2.5. Pairwise distances were analyzed using the DIVEIN webserver [13].
Entropy and N-linked glycosylation sites analysis
Multiple sequence alignment for the amino acid sequences was produced with Gene Cutter. Shannon entropy for selected sequence data sets was generated using the Entropy-two tool following 100 randomizations with replacement (www.hiv.lanl.gov). N-linked glycosylation sites were predicted in sequence datasets with the N-GlycoSite tool hosted at the HIV-LANL database (www.hiv.lanl.gov).
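The position-wise entropy comparison can be illustrated with a minimal sketch. This is not the LANL Entropy-two implementation: the function names, the use of natural logarithms and the exact resampling scheme (pooled residues redrawn with replacement into two groups of the original sizes) are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def column_entropy(column):
    """Shannon entropy (natural log) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def entropy_difference_test(background, query, n_rand=100, seed=0):
    """Randomization test for the entropy difference at one position.

    Assumed scheme: the pooled residues are resampled with replacement into
    two groups of the original sizes; the p value is the fraction of
    randomizations whose absolute entropy difference is at least the
    observed one.
    """
    rng = random.Random(seed)
    observed = abs(column_entropy(background) - column_entropy(query))
    pooled = list(background) + list(query)
    hits = 0
    for _ in range(n_rand):
        a = [rng.choice(pooled) for _ in background]
        b = [rng.choice(pooled) for _ in query]
        if abs(column_entropy(a) - column_entropy(b)) >= observed:
            hits += 1
    return observed, hits / n_rand
```

In such a sketch, one column of the South African alignment would be passed as `background` and the matching column of the Indian alignment as `query`; a multiple-testing correction across positions would still have to be applied separately.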
Selection pressure analysis
Recombination analysis was performed on selected sequence datasets using the GARD method implemented in HyPhy v2.2.7 [22]. Recombination-aware codon-by-codon selection pressure analysis was performed by the single likelihood ancestor counting (SLAC) and fixed effects likelihood (FEL) methods implemented in HyPhy. Population-specific sites with differential selection pressures were identified using the CompareSelectivePressure.bf batch file of HyPhy.
Correlated mutations identification
The CorMut package for the R statistical computing software was used to identify correlated mutation networks through a mutual information approach. Briefly, population-specific codon-wise sequence alignments were provided as input. A mutual information score of 0.10 together with a Benjamini–Hochberg adjusted p value of less than 0.05 (i.e. a 5% false discovery rate) was selected as the cut-off for accepting correlation between mutations. Resulting networks were visualized and further analyzed using Cytoscape v3.5.1 [42].
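The two quantities used as cut-offs here, mutual information between a pair of alignment positions and Benjamini–Hochberg adjusted p values, can be sketched as follows. This is a generic illustration and not the CorMut implementation; the function names are hypothetical and mutual information is expressed in nats as an assumption.

```python
import math
from collections import Counter


def mutual_information(col_x, col_y):
    """Mutual information (in nats) between two equally long alignment columns."""
    n = len(col_x)
    px = Counter(col_x)
    py = Counter(col_y)
    pxy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (x, y), nxy in pxy.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (nxy / n) * math.log(nxy * n / (px[x] * py[y]))
    return mi


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Under the thresholds stated above, a pair of sites would be retained when its mutual information is at least 0.10 and its adjusted p value is below 0.05.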
All the statistical analyses were performed using GraphPad Prism version 5.01 for Windows (GraphPad Software, San Diego, California, USA, www.graphpad.com). Accession numbers for selected sequence datasets have been provided in supplementary file 3.
Codon Usage analysis
Codon usage statistics for datasets of India, South Africa, Asia and Africa as defined previously were generated with a java script developed by Palanisamy et al. [31]. Statistical comparison of codon usage profiles was performed with the R statistical computing software (v3.4.0) and RStudio v1.0.143 [33, 34]. CodonW software (Version 1.4.4) (https://codonw.sourceforge.net) was used for determination of the effective number of codons (Nc) and GC content at the third synonymous codon position (GC3s). GC content statistics for codon positions 1, 2 and 3 were generated using custom bash scripts. Plots of Nc vs GC3s as well as GC12 vs GC3 were prepared using the ‘ggplot2’ package in R [48].
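The GC content by codon position can be computed along the following lines. The study used custom bash scripts that are not reproduced here; this Python sketch with a hypothetical function name only illustrates the calculation, and it assumes an ungapped, in-frame coding sequence. Note that the plain GC3 computed here counts every third position, whereas the GC3s reported by CodonW is restricted to synonymous third positions.

```python
def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3, GC12) as fractions for an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    fractions = []
    for offset in range(3):
        bases = cds[offset:usable:3]
        gc = sum(base in "GC" for base in bases)
        fractions.append(gc / len(bases) if bases else 0.0)
    gc1, gc2, gc3 = fractions
    # GC12 is the mean of the GC fractions at codon positions 1 and 2.
    return gc1, gc2, gc3, (gc1 + gc2) / 2
```

For the two codons ATG and GCC, for example, positions 1 and 2 each contain one G or C out of two bases, while both third positions are G or C.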
Results
Participant characteristics
Clinical specimens were collected from HIV-1C infected individuals (S1–S24) with ages ranging from 8 to 55 years. Eight of the participants were ART naïve and 16 were receiving antiretroviral therapy. All the participants were devoid of any documented co-infections at the time of sampling. Their CD4 counts ranged from 162–2585 cells/mm³ while their plasma viremia ranged from undetectable to 861,484 viral copies/mL as indicated in Table 1. Gp41 genes were analyzed from viruses derived from both blood plasma and genomic DNA for participants S1 and S2, from only plasma for participants S3 and S4 and from genomic DNA, i.e. proviruses, for all other participants (S5–S24).
Diversity in the functional domains of gp41 and disease progression
Viral diversity was documented in all functional domains of gp41 across all 26 datasets generated from 24 study participants (Fig. 1). Consistent with earlier reports of the structurally and sequence-wise conserved nature of gp41, intra-individual variation was observed to be < 15% at most of the amino acid positions in gp41. To quantify this variation across the participants, average amino acid variation per amino acid residue (AVPAA) was calculated for every individual at each of the gp41 functional units. The domains considered have been indicated with env HXB2 numbering in Fig. 1. The ectodomain of gp41 (fusion peptide to membrane proximal external region) was observed to be less variable (AVPAA: 2.46%) compared to the cytoplasmic tail (AVPAA: 2.88%). Within the ectodomain, the NHR domain was observed to be the most conserved with an AVPAA of 1.41% (range 0.03–4.12%), closely followed by the fusion peptide (mean 1.76%, range 0–11%), irrespective of stage of disease progression/therapy, highlighting its functional importance. The variation observed in the loop region as well as MPER was similar, with AVPAAs of 2.88% (range 0.03–4.12%) and 2.86% (range 0.03–4.12%) respectively. CHR was observed to be the most variable part of the ectodomain with an AVPAA of 3.41% (range 0.04–9.82%).
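One plausible reading of AVPAA is the arithmetic mean of the per-position percent variation over all residues of a domain. The sketch below follows that reading. The function name, the input layout (a mapping from HXB2 position to percent variation) and the domain boundaries used in any call are illustrative assumptions, not values taken from this study.

```python
def avpaa(variation_by_position, start, end):
    """Mean per-residue variation (%) over HXB2 positions start..end inclusive."""
    values = [variation_by_position[pos]
              for pos in range(start, end + 1)
              if pos in variation_by_position]
    if not values:
        raise ValueError("no positions with data in the requested domain")
    return sum(values) / len(values)
```

Calling the function once per participant and per domain would give the per-individual values from which a mean and range such as those reported above could be derived.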
Fig. 1.
Variation analysis of gp41 gene. A circos plot was prepared from heat-maps of 26 high throughput sequencing datasets from 24 individuals depicting variation at each of the amino acid positions in gp41 gene. The heatmaps for datasets 1 to 26 have been arranged radially inwards. HXB2 amino acid positions in the envelope gene gp160 (512–856 i.e. positions 1–345 in gp41) indicate different functional domains within gp41 gene. Each pixel in the heatmap depicts one amino acid position with color ranging from lighter (yellow) to darker (red) as per the observed variation as indicated in the color-key. The functional domains indicated are: FP Fusion peptide, NHR N-terminal Heptad repeat, loop, CHR C-terminal heptad repeat, MPER Membrane proximal external region, TM Transmembrane region, KE Kennedy epitope, LLP2, 3 and 1 Lentiviral lytic peptides 2, 3 and 1 (color figure online)
Drug resistance associated mutations
Fusion inhibitors are a class of drugs that prevent viral entry into the host cell by binding to the NHR domain of gp41, thereby preventing the essential 6-helix bundle formation [8]. Fusion inhibitors are currently not a part of nationally prescribed ART regimens. Upon codon-by-codon analysis of variation, ~ 15% of recruited participants (S9, S12, S15 and S21, N = 4) were found to harbor the G36S mutation at 1.09–10.79% frequency. The G36S mutation has been associated with resistance to the fusion inhibitor drug enfuvirtide (T20) [47].
MPER domain contains epitopes for four well studied broadly neutralizing antibodies: 2F5 (662–667), Z13e1 (666–677), 4E10 (671–683) and 10E8 (671–683) and we have recently reported on the effect of diversity in MPER of primary HIV-1 Indian subtype C sequences on interaction with broadly neutralizing antibodies 4E10 and 10E8 [44].
Variation in the cytoplasmic tail region
Consistent with earlier reports for inter-individual variation, functional domains (KE, LLP2, LLP3 and LLP1) in the cytoplasmic tail were found to be highly variable [43, 41]. In the cytoplasmic tail region, the Kennedy epitope (mean: 2.43%, range 0–6.68%) was observed to be the most conserved while LLP1 (mean: 3.81%, range 0–10.7%) was observed to be the most variable. As expected, 6 arginine residues in the LLP1 and LLP2 domains were found to be conserved with an additional conserved residue at position 852 (Fig. 4; indicated with a red arrow). The LLP3 domain demonstrated conservation of two lysine residues [24]. The YSPL (712–715), YHRL (768–771), YW (802–803) and LQ (856–857) domains implicated in clathrin-dependent endocytosis, env expression and incorporation, endosome TGN trafficking and PM-endosome traffic were observed to be highly conserved (~ 0% variation) in our dataset [26]. A rare mutation (C-F/Y) was observed in three study participants at position 837, implicated in plasma membrane targeting of Env [49]. Another infrequent L to R mutation was also observed in the LL (799–800) domain involved in cell-to-cell transmission via prohibitin 1/2 interaction [15].
Fig. 4.
Correlated mutation networks. a Common network for Asia and Africa with mutual information > 0.5. b Unique mutation network for site 640 in Asia and c Africa. d Common mutation network for residue 795 in Asia and Africa. e Unique mutation network for site 795 in Asia and f Africa. Width of the connecting edge between the sites is proportional to mutual information value. Sites indicated in triangular, rectangular and elliptical shape have been reported to be undergoing positive, negative and neutral evolution respectively
Analysis of Indian gp41 sequences in context of HIV-1 clade C sequences reported globally
Sequences from India for this region have previously been reported to cluster together with those from China and away from sequences from Africa, another major geographical region for the subtype C epidemic [1]. To validate this observation, a phylogenetic tree was constructed from the reported gp41 sequences using the maximum likelihood method (Fig. 2a). Sequences used for this analysis included the RIP subtyping dataset provided by the LANL-HIV database consisting of consensus as well as reference sequences for group M (N = 50, HXB2: 7758–8795) along with uncultured sequences from India (N = 8) and China (N = 28). Consistent with the earlier observations, sequences from India and China clustered together with 96% bootstrap support, indicating a recent common evolutionary ancestor. Out of the 5264 entries for uncultured HIV-1C in the LANL HIV database for the gp41 gene, a total of 297 sequences were selected with the one sequence/subject filter and exclusion of sequence entries without source tissue and source country information. A phylogenetic tree was constructed to assess the association of global clade C sequences (Fig. 2b). Total divergence observed in the sequences from India was 0.078 (mean average pairwise distance or APD). Indian sequences clustered together with sequences from China with a mean APD of 0.082. They were observed to cluster separately from European sequences with a mean APD of 0.118 and African sequences with a mean APD of 0.106. A survey of the Los Alamos National Laboratory HIV sequence database (LANL-HIV) revealed the presence of 290 sequences reported from India that fully/partially cover the gp41 region. In the period from 1990 to 2018, an average of 9 sequences (median 5) were reported each year (Fig. 2c).
Since India and South Africa are the two major countries with phylogenetically distinct HIV-1 subtype C sequences, spatial evolutionary selection pressures were assessed for sequences from these countries through entropy analysis and analysis of synonymous and non-synonymous mutations.
Fig. 2.
Analysis of gp41 sequences in context of HIV-1 clade C sequences reported globally. a Maximum likelihood tree was generated for gp41 gene from RIP HIV-1 subtype reference dataset provided by LANL-HIV database along with uncultured gp41 sequences from India (red, N = 8) and China (blue, N = 28). Bootstrap values have been indicated next to the respective nodes b Maximum likelihood phylogenetic tree was generated for HIV-1 clade C sequences observed globally with consensus sequences generated in the present analysis. Sequences from different regions of the world have been color coded as described in the color key. c Year-wise bar graph of sequences partially/fully covering gp41 region reported from India in the Los Alamos National Laboratory HIV sequence database. Red dotted line depicts average number of sequences (9.03) reported from 1990–2018 while the black dotted line represents median number of sequences (5) reported in the same period (color figure online)
Entropy and N-linked glycosylation site analysis
Entropy analysis was performed for a total of 70 sequences from India and 126 sequences from South Africa following the filtration criteria described above. A total of 44 positions displayed statistically significant differential (p < 0.05) entropy following a Monte Carlo randomization (n = 100; with replacement) strategy with Bonferroni correction (Fig. 3a). Of these, 11 sites had higher entropy in Indian sequences compared to South African data. Thirty-one out of 44 sites were present in functional domains, with 7 sites having higher entropy in the Indian data. Of the 31 differential sites, 19 were present in the cytoplasmic tail domains and 11 were observed in the ectodomain. KE had the greatest number of differential sites (7) followed by LLP2 (6) and LLP3 (5). Fusion peptide and transmembrane regions had minimum differences in entropy with 2 differential sites each. A comparison was also made between predicted N-linked glycosylation sites (pNLGs) in these two regions (Fig. 3b). In both the data sets, pNLGs were found at positions 611, 616, 625, 637, 674 and 743. No significant difference was observed in the pNLG frequency across populations following an unpaired t test with Welch’s correction.
Fig. 3.
Entropy and N-linked glycosylation site analysis. a An entropy comparison was performed between sequences from South Africa (SA) and sequences from India (IN). Entropy difference has been calculated between SA (background) and IN (query). Red colored bars indicate positions with statistically significant (p < 0.05) difference between entropies of the two data sets. b Percent frequencies of predicted N-linked glycosylation positions were compared between SA and IN. Frequency difference at each position was tested with unpaired t test with Welch's correction. p > 0.05 was considered not significant (ns) (color figure online)
Selection pressure analysis
Following statistically significant differences observed in entropy of the functional domains of gp41 gene from India (IN) and South Africa (SA), presence of residues under differential selection pressures was hypothesized. To assess the hypothesis, recombination-aware codon-by-codon selection pressure analyses were performed for the two sequence data sets by single likelihood ancestor counting (SLAC) and fixed effects likelihood (FEL) methods as described in materials and methods. Breakpoint analysis by GARD identified breakpoint sites at codon 635 and 657 in Indian and South African sequences, respectively. Fifty-one sites in IN dataset and 49 sites in SA dataset were observed to be under positive selection by either of the methods whereas 103 and 143 sites were observed to be under negative selection, respectively. Ratios of positively selected sites to negatively selected sites for both datasets were compared to the data published by Bandawe et al. from SA, the last available analysis of gp41 [3]. A non-parametric Spearman's correlation test indicated a high degree of correlation (p < 0.01) between all three datasets (supplementary file 4). Overall, the ratio of positively to negatively selected sites was lowest for the NHR loop as well as transmembrane region and highest for domains in the cytoplasmic tail. Twenty-four and 73 sites were indicated to be positively and negatively selected, respectively, in the gp41 gene by both methods in the Indian dataset. The numbers of residues predicted by both methods to be under positive and negative selection in South African data were 42 and 121, respectively. Four sites were found to be differentially selected compared to the published reports by Bandawe et al., Travers et al. and Choisy et al. as described in Table 2 [32–10]. Positions 565 and 725, which have not previously been identified for clade C, were detected to be under diversifying selection in the present data.
Similarly, positions 607 and 725 were reported to be under positive selection by published studies but were not found to be selected in Indian data generated in the present study. Three of these sites were also found to have significantly different entropies in the analysis described earlier with positions 607 and 648 having higher entropy in SA while position 725 demonstrated higher entropy in IN dataset.
Table 2.
Comparison of the positions of sites reported to be under diversifying selection in published work with present data
Residue position | India SLAC | India FEL | South Africa SLAC | South Africa FEL | Comparison with published data* |
---|---|---|---|---|---|
535 | – | – | – | – | Detected by Ba, Tr and Ch in non-C clades |
565 | + | + | – | – | Not detected for clade C |
607 | – | – | + | + | Detected by Ba and Tr for clade C |
612 | – | – | – | – | Not detected for clade C |
641 | + | + | – | – | Detected by Ba |
648 | – | – | + | + | Not detected for clade C |
674 | + | + | + | + | Detected by Ba |
676 | + | + | + | + | Detected by Ba |
683 | – | + | – | – | Detected by Ba |
721 | + | + | + | + | Detected by Ba, Tr and Ch |
725 | + | + | – | – | Not detected for clade C |
732 | – | + | + | + | Detected by Ba |
741 | – | – | – | + | Detected in non-C clades by Ba |
782 | – | + | + | + | Detected by Ba |
839 | + | + | + | + | Detected by Ba |
843 | – | – | – | – | Detected in non-C clades by Ba |
860 | – | + | + | + | Detected by Ba and Ch |
+ : detected, − : not detected
*Ba: Bandawe et al. (2008), Tr: Travers et al. (2005), Ch: Choisy et al. (2003)
Residues in bold and underlined are uniquely reported in this study
The CompareSelectivePressure.bf script of HyPhy was implemented to detect sites under statistically significant differential diversifying/purifying selection in IN and SA datasets. This analysis tests the hypothesis that the ratio β/α (non-synonymous to synonymous substitution rates, i.e. dN/dS) at a residue position differs between two populations. The script estimates dS, dN and dN/dS for every position for both populations and tests a null hypothesis that there is no difference in dN/dS ratios in the two populations. The LR statistic and asymptotic p values are then reported; positions with p values < 0.05 were considered to be statistically significant in the present analysis. A total of twenty-seven sites were identified to be differentially selected between India and South Africa (supplementary file 5). Following filtration of sites with dN or dS values of 0 leading to inconclusive ratios, 11 sites were indicated to be differentially selected (Table 3). Out of 11, one site each was identified from CHR and MPER domains while the remaining 9 were identified in the CT domain. Sites 795 and 811 were reported to be under diversifying selection in India as opposed to purifying selection in South Africa. Sites 770 and 824 were indicated to be under neutral evolution in India, while being positively and negatively selected in South Africa respectively. To evaluate the regional relevance of these sites, the analysis was expanded to sequences from Asia (N = 137) and Africa (N = 383). Sequence datasets were selected as described previously, on the basis of being uncultured/primary culture, availability of source tissue information and one sequence per individual.
The sequence dataset from Asia included sequences from India (N = 77), China (N = 29), Nepal (N = 30) and Pakistan (N = 1), while the dataset from Africa included sequences from South Africa (N = 121), Botswana (N = 44), Zambia (N = 55), Zimbabwe (N = 1), Kenya (N = 14), Tanzania (N = 47), Malawi (N = 59), Cameroon (N = 2), Rwanda (N = 3), Senegal (N = 7), Nigeria (N = 1), the Democratic Republic of the Congo (N = 2), Ethiopia (N = 24) and Gambia (N = 3). Eight of the previously identified 11 sites remained differentially selected across the continents (shown in bold and underlined font in Table 3).
Table 3.
Selection pressure comparison between India and South Africa
Codon | Region | dN1/dS1 | dN2/dS2 | Joint dN/dS | LRT | p value |
---|---|---|---|---|---|---|
640 | CHR | 10.62 | 1.49 | 2.20 | 4.63 | 0.03 |
658 | MPER | 0.52 | 0.10 | 0.18 | 6.73 | 0.01 |
701 | CT | 0.36 | 0.03 | 0.09 | 4.82 | 0.03 |
720 | CT | 5.55 | 0.38 | 1.01 | 6.85 | 0.01 |
754 | CT | 0.05 | 0.31 | 0.19 | 10.91 | 0.00 |
770 | CT-LLP2 | 0.47 | 3.67 | 2.03 | 7.58 | 0.01 |
789 | CT-LLP3 | 0.24 | 0.04 | 0.07 | 6.29 | 0.01 |
795 | CT-LLP3 | 15.01 | 0.48 | 1.23 | 16.34 | 0.00 |
805 | CT-LLP3 | 0.10 | 0.35 | 0.26 | 4.48 | 0.03 |
811 | CT-LLP3 | 6.34 | 0.41 | 0.90 | 8.88 | 0.00 |
824 | CT | 3.40 | 0.41 | 0.71 | 8.68 | 0.00 |
dN1/dS1 refers to sequence data from India and dN2/dS2 refers to sequence data from South Africa
Codon positions in bold and underlined font were found to be differentially selected in Asia versus Africa dataset as well
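The relationship between the LRT statistics and the asymptotic p values in Table 3 can be illustrated with a minimal sketch. It assumes that the likelihood-ratio statistic is referred to a chi-square distribution with one degree of freedom; this is our reading of a two-population dN/dS comparison and is not stated explicitly in the text. The function name `lrt_p_value` is hypothetical, and the (codon, LRT, p value) triples are copied from Table 3.

```python
import math


def lrt_p_value(lrt: float) -> float:
    """Asymptotic p value of a likelihood-ratio statistic.

    Assumption: the statistic follows a chi-square distribution with
    1 degree of freedom under the null hypothesis of equal dN/dS.
    For 1 df the chi-square survival function is erfc(sqrt(x / 2)).
    """
    if lrt < 0:
        raise ValueError("LRT statistic must be non-negative")
    return math.erfc(math.sqrt(lrt / 2.0))


# (codon, LRT, reported p value) as listed in Table 3
table3 = [
    (640, 4.63, 0.03), (658, 6.73, 0.01), (701, 4.82, 0.03),
    (720, 6.85, 0.01), (754, 10.91, 0.00), (770, 7.58, 0.01),
    (789, 6.29, 0.01), (795, 16.34, 0.00), (805, 4.48, 0.03),
    (811, 8.88, 0.00), (824, 8.68, 0.00),
]

for codon, lrt, reported in table3:
    p = lrt_p_value(lrt)
    flag = "significant" if p < 0.05 else "not significant"
    print(f"codon {codon}: LRT = {lrt:5.2f}, p = {p:.4f} "
          f"(reported {reported:.2f}, {flag})")
```

Under this assumption the recomputed values round to the p values reported in Table 3, which is consistent with a one-degree-of-freedom test, although it does not prove that this is how the script derives them.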
To assess the contribution of residues interacting with the mutations reported in the present analysis, and to establish a signature mutational network in subtype C, a correlated mutation analysis was undertaken for the sequence datasets from Asia and Africa.
Correlated mutation network analysis
Correlated mutation analysis was performed using the CorMut package for the R statistical computing software, with a mutual information-based approach. Population-specific codon sequence alignments were used as input for both Asia (N = 137) and Africa (N = 383). A mutual information score of 0.1 and a Benjamini–Hochberg adjusted p value of 0.05% were selected as the thresholds for accepting a correlation. The mutual information threshold was then raised to 0.5 to identify highly correlated mutations.
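As an illustration of the quantity being thresholded, the following minimal sketch computes the mutual information between two alignment columns. It is not the CorMut implementation: how CorMut encodes mutations, which logarithm base it uses and how it assigns significance are not reproduced here, and the function name `mutual_information` and the toy columns are hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    col_a and col_b hold the residue observed at two positions,
    one entry per sequence, in the same sequence order.
    """
    if len(col_a) != len(col_b) or len(col_a) == 0:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) )
        mi += (n_ab / n) * math.log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Toy columns: residues at two positions across four sequences
perfectly_coupled = mutual_information("AAVV", "LLII")
independent = mutual_information("AAVV", "LILI")
print(perfectly_coupled, independent)
```

In this toy example the perfectly coupled pair would pass both the 0.1 and the 0.5 thresholds, whereas the independent pair would pass neither; in practice the score would be combined with the adjusted p value criterion described above.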
The Asian dataset had 81 residues contributing to a correlated mutational network, while the African dataset indicated a network of 88 residues. A network of 49 residues was common to both datasets, with 10 residues in the LLP1 region correlating with a mutual information score > 0.5 (Fig. 4a). Of the 8 differentially selected residues described previously, only 2, residue 640 (CHR) and residue 795 (LLP3), were found to contribute to a correlated mutation network. The association between residues 640 and 641 was common to both regions. In the Asian dataset, residue 640 was found to be correlated with 7 residues, with two interactions (641 and 641) reported earlier (Fig. 4b). The African dataset indicated the presence of 9 correlated mutations, 3 of which (607, 641 and 644) have previously been reported (Fig. 4c) [14, 38]. The present study provides strong evidence for the involvement of residue 795 of the LLP3 domain in a correlated mutation network. This site was observed to be part of a 25-residue network common to both regions. Further filtering for residues from only the functionally relevant regions of the CT indicated a 15-residue network involving 2 KE, 3 LLP1, 5 LLP2 and 5 LLP3 sites (Fig. 4d). The Asian dataset indicated only 3 unique correlations for residue 795 (Fig. 4e), while the African dataset had a unique network of 30 additional residues (Fig. 4f).
Codon usage analysis
Next, we assessed whether the sequences from India and South Africa differed at the codon level. Applying the codon usage analysis script developed by Palanisamy et al., we found a total of 27 codons encoding 10 amino acids with statistically significantly different usage (p < 0.05, Mann–Whitney test) in IN and SA (Supplementary File 6). Twenty-four of these codons were also significantly different across the Asia and Africa datasets used previously. Further, 17 codons encoding 8 amino acids had highly significant differences (p < 0.001) in IN and SA, 15 of which also displayed distinct usage in Asia and Africa (Table 4). To dissect these differences, we plotted the effective number of codons against GC content at the third synonymous codon position (GC3s). As depicted in Fig. 5a, b, the effective number of codons remained similar in both comparisons (IN vs SA and Asia vs Africa). However, we observed a slight increase in GC3s in IN and Asia compared to SA and Africa, respectively. To explore this further, neutrality plots, i.e. GC12 (average GC content at codon positions 1 and 2) versus GC3 (GC content at codon position 3), were prepared. As depicted in Fig. 5c, d, the distribution of GC content across codon positions differed between IN and SA, as evidenced by the distinct slopes of the linear regression lines (0.50 ± 0.09 and 0.02 ± 0.04 respectively). This observation was consistent in the Asia and Africa data (slopes: 0.30 ± 0.06 and 0.11 ± 0.04 respectively). Therefore, while GC content at codon positions 1 and 2 was concordant with that at position 3 for IN and Asia, this relationship was discordant for SA and Africa, where an increase in GC content at position 3 was not matched by an increase at positions 1 and 2.
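The neutrality-plot quantities described above can be sketched as follows. This is a simplified illustration rather than the script used in the study: GC3 is computed here over all codons, whereas some implementations exclude Met, Trp and stop codons, and the helper names (`gc_by_codon_position`, `neutrality_point`, `ols_slope`) and the example points are hypothetical.

```python
def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3) fractions for an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence must contain at least one codon")
    fractions = []
    for pos in range(3):
        bases = cds[pos:usable:3]  # every third base, starting at pos
        fractions.append(sum(base in "GC" for base in bases) / len(bases))
    return tuple(fractions)


def neutrality_point(cds):
    """Return (GC3, GC12): the x and y coordinates in a neutrality plot."""
    gc1, gc2, gc3 = gc_by_codon_position(cds)
    return gc3, (gc1 + gc2) / 2.0


def ols_slope(points):
    """Ordinary least-squares slope of y on x for (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# One point per sequence; the regression slope summarises the dataset
print(neutrality_point("GCAGCAGCA"))
print(ols_slope([(0.0, 0.1), (0.5, 0.2), (1.0, 0.3)]))
```

In the usual reading of such plots, a slope close to 0 indicates that GC12 varies little with GC3, while a slope approaching 1 indicates that all three codon positions shift together.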
Table 4.
Codon usage analysis
Amino acid | Codon | India (mean ± SEM) | South Africa (mean ± SEM) | p value | Asia (mean ± SEM) | Africa (mean ± SEM) | p value |
---|---|---|---|---|---|---|---|
L | UUA | 13.73 ± 0.38 | 11.98 ± 0.27 | 0.0002995 | 13.87 ± 0.25 | 12.03 ± 0.15 | 4.279E−10 |
L | UUG | 21.48 ± 0.37 | 23.36 ± 0.31 | 0.0006258 | 21.87 ± 0.25 | 23.75 ± 0.19 | 5.929E−07 |
L | CUG | 22.18 ± 0.35 | 20.45 ± 0.29 | 0.0002611 | 22.21 ± 0.23 | 20.79 ± 0.17 | 4.287E−07 |
I | AUC | 11.95 ± 0.49 | 13.71 ± 0.31 | 0.000196 | 12.43 ± 0.32 | 13.23 ± 0.19 | 0.009015 |
I | AUA | 61.66 ± 0.52 | 59.54 ± 0.36 | 0.0003605 | 61.23 ± 0.37 | 60.37 ± 0.23 | 0.01209 |
S | AGU | 31.97 ± 0.69 | 28.46 ± 0.49 | 5.036E−05 | 32.26 ± 0.49 | 29.48 ± 0.29 | 7.426E−07 |
S | AGC | 21.63 ± 0.83 | 26.83 ± 0.48 | 1.207E−07 | 22.41 ± 0.54 | 25.9 ± 0.31 | 7.96E−09 |
A | GCC | 12.82 ± 0.62 | 10.95 ± 0.41 | 0.0005687 | 13.41 ± 0.37 | 10.88 ± 0.23 | 1.019E−10 |
A | GCA | 29.89 ± 0.70 | 39.51 ± 0.59 | 2.2E−16 | 29.91 ± 0.54 | 39.05 ± 0.35 | 2.2E−16 |
A | GCG | 19.93 ± 0.61 | 15.88 ± 0.45 | 1.576E−06 | 19.83 ± 0.42 | 15.8 ± 0.26 | 5.141E−14 |
Q | CAA | 41.69 ± 0.53 | 44.75 ± 0.40 | 6.28E−06 | 42.56 ± 0.38 | 44.72 ± 0.25 | 1.755E−06 |
Q | CAG | 58.31 ± 0.53 | 55.25 ± 0.40 | 6.28E−06 | 57.44 ± 0.38 | 55.28 ± 0.25 | 1.755E−06 |
N | AAU | 49.04 ± 1.10 | 56.49 ± 0.87 | 4.828E−07 | 48.36 ± 0.77 | 57.9 ± 0.46 | 2.2E−16 |
N | AAC | 50.96 ± 1.10 | 43.51 ± 0.87 | 4.828E−07 | 51.64 ± 0.77 | 42.1 ± 0.46 | 2.2E−16 |
C | UGU | 12.43 ± 1.58 | 18.82 ± 1.18 | 0.0008838 | 16.86 ± 1.14 | 16.56 ± 0.70 | 0.8087 |
C | UGC | 87.57 ± 1.58 | 81.18 ± 1.18 | 0.0008838 | 83.14 ± 1.14 | 83.44 ± 0.70 | 0.8087 |
G | GGG | 13.82 ± 0.45 | 11.66 ± 0.28 | 2.509E−06 | 14.17 ± 0.31 | 12.12 ± 0.19 | 3.757E−10 |
p values obtained through Mann–Whitney test; 95% CI; two-tailed
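The per-codon values in Table 4 appear to be percentages within each amino acid's synonymous family, since the two codons of Q, N and C sum to 100 in each column; this is our inference, not something the table states. Under that assumption, a minimal sketch of how such values and their mean ± SEM could be derived is shown below. It does not reproduce the script of Palanisamy et al., and the names `FAMILIES`, `codon_percentages` and `mean_sem` are hypothetical.

```python
import math
from collections import Counter

# Two-codon synonymous families that appear in Table 4 (RNA alphabet)
FAMILIES = {
    "Q": ("CAA", "CAG"),
    "N": ("AAU", "AAC"),
    "C": ("UGU", "UGC"),
}


def codon_percentages(cds, family):
    """Percentage of each codon of one synonymous family in one sequence.

    Returns None when the sequence does not use the amino acid at all.
    """
    cds = cds.upper().replace("T", "U")
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codon for codon in codons if codon in family)
    total = sum(counts.values())
    if total == 0:
        return None
    return {codon: 100.0 * counts[codon] / total for codon in family}


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    n = len(values)
    mean = sum(values) / n
    if n < 2:
        return mean, float("nan")
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


# One percentage per sequence, then summarised per population
print(codon_percentages("CAACAGCAG", FAMILIES["Q"]))
print(mean_sem([40.0, 60.0]))
```

The per-sequence percentages for each population would then be the two samples compared by the Mann–Whitney test.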
Fig. 5.
Codon usage analysis. Effective number of codons (Nc; Y axis) plotted against GC content at the third synonymous codon position (GC3s; X axis) from available data for a India (red) and South Africa (blue) and b Asia (red) and Africa. Black curves in both plots indicate the expected number of codons for the given GC3s values on the respective X axes. Neutrality scatter plots, i.e. average GC content at codon positions 1 and 2 (GC12; Y axis) plotted against GC content at position 3 (GC3; X axis), for c India (red) and South Africa (blue) and d Asia (red) and Africa. Linear regression lines (with 95% confidence intervals) are color coded by region in plots c and d (color figure online)
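The expected-Nc curve in such plots is commonly drawn using Wright's approximation, Nc = 2 + s + 29 / (s² + (1 − s)²), where s is GC3s. Whether the black curves in Fig. 5a, b were generated with exactly this expression is an assumption on our part; the sketch below, with the hypothetical function name `expected_nc`, simply evaluates that formula.

```python
def expected_nc(gc3s):
    """Expected effective number of codons for a given GC3s.

    Uses Wright's approximation, which assumes that codon usage is
    shaped only by GC composition at synonymous third positions.
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("GC3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


# Points on the curve, e.g. for overlaying on an Nc versus GC3s scatter plot
for s in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"GC3s = {s:.2f} -> expected Nc = {expected_nc(s):.2f}")
```

Sequences falling well below this curve would have a lower Nc than their GC3s alone predicts, which is usually taken as a sign of codon usage bias beyond compositional constraint.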
In summary, distinct evolutionary trends were observed in entropy, selection pressure, mutation interaction networks as well as codon usage in gp41 gene in available HIV-1C sequences from Indian and South African epidemics.
Discussion
Emerging HIV diversity and evolution, studied through sequence analysis, are instructive of the epidemiological fitness of the virus, which is important for the design of therapies [40]. Sub-optimal penetrance of, and adherence to, ART regimens in India lead to a risk of transmission of variants even in the chronic phase of infection [27]. Consideration of both circulating viral (potentially transmitted) and proviral (archival reservoir) sequences from acutely as well as chronically infected individuals is imperative for the design of therapeutic modalities. The present study assessed two facets of viral evolution. Firstly, the sequence diversity of the gp41 gene was examined in circulating viral RNA and proviral DNA in 24 HIV-1 clade C infected individuals at different stages of ART from India. The analysis was then extended to study differential selection pressure and correlated mutation networks in the clade C gp41 gene from Asia and Africa.
Two categories of chronically infected participants were selected, viz. ART naïve and ART receiving. In light of the expected ramp up of ART following WHO 'test and treat' guidelines, ART receiving individuals are representative of a growing cohort that is at risk of failing therapy and will require novel treatment modalities. Recently, drug resistance has also been associated with escape from humoral responses in individuals receiving ART for a long time and experiencing drug failure [29]. This observation underscores the importance of analyzing viral variation in both ART naïve and ART experienced individuals. Our study did not find any specific signature differences associated with ART in the recruited individuals. Minor variants (frequency < 15%) capable of causing primary resistance to the drug enfuvirtide (T-20), which targets the NHR domain by mimicking CHR, were observed in ~ 15% of enfuvirtide naïve individuals. Nonetheless, being the most conserved domain, NHR would be an attractive drug target to tackle the increasing prevalence of drug resistance to currently prescribed regimens in India [7]. Indeed, efforts are being made to improve the pharmacokinetics of T-20 through fusion with polyethylene glycol [9]. While the exact topology of the cytoplasmic tail domains of gp41 remains controversial to date, LLP domains have been reported to be highly variable yet structurally conserved [43]. In the present study, the LLP2 domain was observed to be the most conserved, with LLP1/KE being the most variable. The CT domain has been reported to include several signature residues that interact with various cellular factors to maintain viral replication, env surface expression and env trafficking through the trans-Golgi network [41]. High conservation (< 1% variation) was observed for these residues in high throughput (> 20,000×) data from archival proviral DNA, even within therapy failing participants. This reiterates their functional importance and supports the use of native trimeric envelope structures with functional CTs as immunogens [36].
The Indian HIV epidemic has been reported to have its origins in African viral strains, yet we observe distinct phylogenetic clustering of African and Indian sequences, confirming previous studies [1, 28]. Several reports have implied poor replication capacity of clade C virus relative to other Group M clades [41]. However, Bachu et al. reported increasing prevalence in India of a clade C viral cluster with an additional NF-κB site in the U3 region of the LTR, culminating in superior magnitude of transcription and thereby enhanced viral predominance [2]. Also, Rao et al. demonstrated differences between clade C viruses from Southeast Asia and South Africa in terms of dicysteine motifs that caused increased neurovirulence [37]. Thus, assessment of disparate selection pressures operational in Africa and Asia may help us predict epidemic dynamics and aid the design of drug therapies [28, 37]. Characterization of selection pressures is central to understanding viral evolution and pathogenesis. Selection pressure comparisons were made between data from India and South Africa, two major hubs of clade C infection. No gross differences were observed in the selection pressures acting on any functional domain within gp41 in the two regions. However, a more detailed site-by-site assessment identified four sites to be differentially selected compared to existing data that were generated a decade ago [32–10]. A direct comparison between the populations as implemented in HyPhy further identified 11 sites under disparate selection, seven of which are present in functionally defined domains. This analysis was then expanded to Asia vs Africa, wherein sequence data from 4 and 14 countries were taken into account respectively. Eight of the 11 sites remained differentially selected, underscoring the robustness of the analysis. These data highlight the need to delineate the role of host population HLA haplotypes in the selection pressures operating on these residues.
To assess the putative effect of these differentially selected sites on other domains of gp41, correlated mutation network analysis was undertaken. Two of the 8 residues were found to be part of correlated mutation networks. Interestingly, residue 795 was observed to be part of two distinct networks in the two regions that have been reported to affect gp120 V3 loop coreceptor tropism [30]. Further investigations are required to delineate the role of these residues in viral replication/infectivity and, therefore, their implications for epidemic dynamics and the implementation of eradication strategies in Asia and Africa. Bias in codon usage is a critical evolutionary feature that provides important information regarding evolution and exogenous gene expression [32]. Our study indicated significant differences between the frequencies of certain codons, but the overall pattern of synonymous codon usage remained unchanged. While we did not observe any bias in codon usage between the Asian and African epidemics, the distribution of GC content between codon positions showed differential trends. GC content distribution is known to be associated with differential evolutionary patterns, highlighting the importance of investigating this observation further with larger datasets [25].
The present study determined its validation and variant detection sensitivity thresholds using two independent amplification and sequencing strategies in order to reduce experimental/technical artefacts. However, the tools used in the present study have known limitations in handling viral data with a higher frequency of insertions and deletions. With respect to very low frequency variant elucidation (< 0.1%), as has been reported for other NGS based analyses [19], the present study findings were also limited by the absence of critical strategies to differentiate between true diversity and experimentally induced or in silico recombination, leading to a higher than expected sensitivity threshold (~ 6%). Furthermore, in the absence of a definitive crystal structure for gp41, a protein that undergoes many transient conformations, the correlated mutations remain to be structurally substantiated. While we ensured that the codon usage bias and selection pressure analyses presented herein were performed robustly with curated datasets, the main limitation was the availability of the data itself. A larger dataset with wider geographical coverage, collected over regular time intervals, would be indispensable for detecting accurate evolutionary patterns in HIV-1 clade C, which afflicts nearly half of the people living with HIV/AIDS.
A detailed analysis of sequence diversity and evolutionary trends, as reported in the present study, would advance the understanding of the host-pathogen interactions governing critical envelope functions and the future evolution of HIV.
A comprehensive knowledge of HIV micro-evolution is indispensable for the development of successful interventional modalities. The differential viral selection pressures at play within spatially distinct HIV infected populations described herein require further investigation, highlighting the importance of frequent surveillance of circulating molecular clones in distinct populations.
Funding
Funding for this study was provided through The Ramalingaswami Fellowship received by Vainav Patel (DBT, India) and intramural funding provided by ICMR, India. Jyoti Sutar is supported by research fellowship provided by the Lady Tata Memorial Trust (India).
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Vainav Patel, Email: vainavp@gmail.com.
Atmaram Bandivdekar, Email: batmaram@gmail.com.
References
- 1. Agnihotri KD, Tripathy SP, Jere AP, Kale SM, Paranjape RS. Molecular analysis of gp41 sequences of HIV type 1 subtype C from India. J Acquir Immune Defic Syndr. 2006;41:345–351. doi: 10.1097/01.qai.0000209898.67007.1a.
- 2. Bachu M, Yalla S, Asokan M, Verma A, Neogi U, Sharma S, et al. Multiple NF-κB sites in HIV-1 subtype C long terminal repeat confer superior magnitude of transcription and thereby the enhanced viral predominance. J Biol Chem. 2012;287:44714–44735. doi: 10.1074/jbc.M112.397158.
- 3. Bandawe GP, Martin DP, Treurnicht F, Mlisana K, Karim SSA, Williamson C, et al. Conserved positive selection signals in gp41 across multiple subtypes and difference in selection signals detectable in gp41 sequences sampled during acute and chronic HIV-1 subtype C infection. Virol J. 2008;5:141. doi: 10.1186/1743-422X-5-141.
- 4. Bellamy-McIntyre AK, Lay C-S, Baär S, Maerz AL, Talbo GH, Drummer HE, et al. Functional links between the fusion peptide-proximal polar segment and membrane-proximal region of human immunodeficiency virus gp41 in distinct phases of membrane fusion. J Biol Chem. 2007;282:23104–23116. doi: 10.1074/jbc.M703485200.
- 5. Blumenthal R, Durell S, Viard M. HIV entry and envelope glycoprotein-mediated fusion. J Biol Chem. 2012;287:40841–40849. doi: 10.1074/jbc.R112.406272.
- 6. Burton DR, Poignard P, Stanfield RL, Wilson IA. Broadly neutralizing antibodies present new prospects to counter highly antigenically diverse viruses. Science. 2012;337:183–186. doi: 10.1126/science.1225416.
- 7. Cambiano V, Bertagnolio S, Jordan MR, Lundgren JD, Phillips A. Transmission of drug resistant HIV and its potential impact on mortality and treatment outcomes in resource-limited settings. J Infect Dis. 2013;207:S57–S62. doi: 10.1093/infdis/jit111.
- 8. Cervia JS, Smith MA. Enfuvirtide (T-20): a novel human immunodeficiency virus type 1 fusion inhibitor. Clin Infect Dis. 2003;37:1102–1106. doi: 10.1086/378302.
- 9. Cheng S, Wang Y, Zhang Z, Lv X, Gao GF, Shao Y, et al. Enfuvirtide-PEG conjugate: a potent HIV fusion inhibitor with improved pharmacokinetic properties. Eur J Med Chem. 2016;121:232–237. doi: 10.1016/j.ejmech.2016.05.027.
- 10. Choisy M, Guégan JF, Woelk CH, Robertson DL. Comparative study of adaptive molecular evolution in different human immunodeficiency virus groups and subtypes. J Virol. 2004;78:1962–1970. doi: 10.1128/JVI.78.4.1962-1970.2004.
- 11. Crooks GE, Hon G, Chandonia J, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–1190. doi: 10.1101/gr.849004.
- 12. Darriba D, Taboada GL, Doallo R, Posada D. jModelTest 2: more models, new heuristics and parallel computing. Nat Methods. 2012;9:772–772. doi: 10.1038/nmeth.2109.
- 13. Deng W, Maust BS, Nickle DC, Learn GH, Liu Y, Heath L, et al. DIVEIN: A web server to analyze phylogenies, sequence divergence, diversity, and informative sites. Biotechniques. 2010;48:405–408. doi: 10.2144/000113370.
- 14. Dimonte S, Mercurio F, Svicher V, D’Arrigo R, Perno C-F, Ceccherini-Silberstein F. Selected amino acid mutations in HIV-1 B subtype gp41 are associated with specific gp120v3 signatures in the regulation of co-receptor usage. Retrovirology. 2011;8:33. doi: 10.1186/1742-4690-8-33.
- 15. Emerson V, Holtkotte D, Pfeiffer T, Wang IH, Bosch V, Schnölzer M, et al. Identification of the cellular prohibitin 1/prohibitin 2 heterodimer as an interaction partner of the C-terminal cytoplasmic domain of the HIV-1 glycoprotein. J Virol. 2010;84:1355–1365. doi: 10.1128/JVI.01641-09.
- 16. Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 2010;59:307–321. doi: 10.1093/sysbio/syq010.
- 17. Hall T. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. In: Nucleic Acids Symposium Series 1999. pp. 95–8.
- 18. Hemelaar J, Elangovan R, Yun J, Dickson-Tetteh L, Fleminger I, Kirtley S, et al. Global and regional molecular epidemiology of HIV-1, 1990–2015: a systematic review, global survey, and trend analysis. Lancet Infect Dis. 2019;19:143–155. doi: 10.1016/S1473-3099(18)30647-9.
- 19. Henn MR, Boutwell CL, Charlebois P, Lennon NJ, Power KA, Macalalad AR, et al. Whole genome deep sequencing of HIV-1 reveals the impact of early minor variants upon immune recognition during acute infection. PLoS Pathog. 2012. doi: 10.1371/journal.ppat.1002529.
- 20. Huang J, Ofek G, Laub L, Louder MK, Doria-Rose NA, Longo NS, et al. Broad and potent neutralization of HIV-1 by a gp41-specific human antibody. Nature. 2012;491:406–412. doi: 10.1038/nature11544.
- 21. Korber B, Gaschen B, Yusim K, Kesmir C, Detours V, Thakallapally R, et al. Evolutionary and immunological implications of contemporary HIV-1 variation. Br Med Bull. 2001;58:19–42. doi: 10.1093/bmb/58.1.19.
- 22. Kosakovsky Pond SL, Frost SDW, Muse SV. HyPhy: hypothesis testing using phylogenies. Bioinformatics. 2005;21:676–679. doi: 10.1093/bioinformatics/bti079.
- 23. Krzywinski M, et al. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19:1639–1645. doi: 10.1101/gr.092759.109.
- 24. Kuhlmann AS, Steckbeck JD, Sturgeon TJ, Craigo JK, Montelaro RC. Unique functional properties of conserved arginine residues in the lentivirus lytic peptide domains of the C-terminal tail of HIV-1 gp41. J Biol Chem. 2014;289:7630–7640. doi: 10.1074/jbc.M113.529339.
- 25. Li J, Zhou J, Wu Y, Yang S, Tian D. GC-content of synonymous codons profoundly influences amino acid usage. Genes Genomes Genet. 2015;5:2027–2036. doi: 10.1534/g3.115.019877.
- 26. Lopez-Vergès S, Camus G, Blot G, Beauvoir R, Benarous R, Berlioz-Torrent C. Tail-interacting protein TIP47 is a connector between Gag and Env and is required for Env incorporation into HIV-1 virions. Proc Natl Acad Sci U S A. 2006;103:14947–14952. doi: 10.1073/pnas.0602941103.
- 27. Mhaskar R, Alandikar V, Emmanuel P, Djulbegovic B, Patel S, Patel A, et al. Adherence to antiretroviral therapy in India: a systematic review and meta-analysis. Indian J Commun Med. 2013;38:74–82. doi: 10.4103/0970-0218.112435.
- 28. Neogi U, Bontell I, Shet A, de Costa A, Gupta S, Diwan V, et al. Molecular epidemiology of HIV-1 subtypes in India: origin and evolutionary history of the predominant subtype C. PLoS ONE. 2012;7:e39819. doi: 10.1371/journal.pone.0039819.
- 29. Ouyang Y, Yin Q, Li W, Li Z, Kong D, Wu Y, et al. Escape from humoral immunity is associated with treatment failure in HIV-1-infected patients receiving long-term antiretroviral therapy. Sci Rep. 2017;7:6222. doi: 10.1038/s41598-017-05594-5.
- 30. Pacheco-Martínez E, Figueroa-Medina E, Villarreal C, Cocho G, Medina-Franco JL, Méndez-Lucio O, et al. Statistical correlation of nonconservative substitutions of HIV gp41 variable amino acid residues with the R5X4 HIV-1 phenotype. Virol J. 2016;13:28. doi: 10.1186/s12985-016-0486-6.
- 31. Palanisamy N, Osman N, Ohnona F, Xu H-T, Brenner B, Mesplède T, et al. Does antiretroviral treatment change HIV-1 codon usage patterns in its genes: a preliminary bioinformatics study. AIDS Res Ther. 2017;14:2. doi: 10.1186/s12981-016-0130-y.
- 32. Pandit A, Sinha S. Differential trends in the codon usage patterns in HIV-1 genes. PLoS ONE. 2011. doi: 10.1371/journal.pone.0028889.
- 33. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria; 2018. https://www.r-project.org/
- 34. RStudio Team. RStudio: Integrated Development for R. RStudio, Inc., Boston, MA; 2015. https://www.rstudio.com/
- 35. Rambaut A. Figtree. 2018. https://tree.bio.ed.ac.uk/software/figtree/
- 36. Rangasamy SP, Menon V, Dhopeshwarkar P, Pal R, Vaniambadi KS, Mahalingam S. Membrane bound Indian clade C HIV-1 envelope antigen induces antibodies to diverse and conserved epitopes upon DNA prime/protein boost in rabbits. Vaccine. 2016;34:2444–2452. doi: 10.1016/j.vaccine.2016.03.062.
- 37. Rao VR, Neogi U, Talboom JS, Padilla L, Rahman M, Fritz-French C, et al. Clade C HIV-1 isolates circulating in Southern Africa exhibit a greater frequency of dicysteine motif-containing Tat variants than those in Southeast Asia and cause increased neurovirulence. Retrovirology. 2013;10:61. doi: 10.1186/1742-4690-10-61.
- 38. Rawi R, Kunji K, Haoudi A, Bensmail H. Coevolution analysis of HIV-1 envelope glycoprotein complex. PLoS ONE. 2015;10:e0143245. doi: 10.1371/journal.pone.0143245.
- 39. Salzwedel K, West JT, Hunter E. A conserved tryptophan-rich motif in the membrane-proximal region of the human immunodeficiency virus type 1 gp41 ectodomain is important for env-mediated fusion and virus infectivity. J Virol. 1999;73:2469–2480. doi: 10.1128/JVI.73.3.2469-2480.1999.
- 40. Sangeda RZ, Theys K, Beheydt G, Rhee SY, Deforche K, Vercauteren J, et al. HIV-1 fitness landscape models for indinavir treatment pressure using observed evolution in longitudinal sequence data are predictive for treatment failure. Infect Genet Evol. 2013;19:349–360. doi: 10.1016/j.meegid.2013.03.014.
- 41. Santos da Silva E, Mulinge M, Perez Bercoff D. The frantic play of the concealed HIV envelope cytoplasmic tail. Retrovirology. 2013;10:54. doi: 10.1186/1742-4690-10-54.
- 42. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–2504. doi: 10.1101/gr.1239303.
- 43. Steckbeck JD, Kuhlmann AS, Montelaro RC. C-terminal tail of human immunodeficiency virus gp41: functionally rich and structurally enigmatic. J Gen Virol. 2013;94:1–19. doi: 10.1099/vir.0.046508-0.
- 44. Sutar J, Padwal V, Sonawani A, Nagar V, Patil P, Kulkarni B, et al. Effect of diversity in gp41 membrane proximal external region of primary HIV-1 Indian subtype C sequences on interaction with broadly neutralizing antibodies 4E10 and 10E8. Virus Res. 2019;273:197763. doi: 10.1016/j.virusres.2019.197763.
- 45. Travers SAA, O’Connell MJ, McCormack GP, McInerney JO. Evidence for heterogeneous selective pressures in the evolution of the env gene in different human immunodeficiency virus type 1 subtypes. J Virol Am Soc Microbiol. 2005;79:1836–1841. doi: 10.1128/JVI.79.3.1836-1841.2005.
- 46. UNAIDS. The Joint United Nations Programme on HIV/AIDS (UNAIDS) Data 2018. UNAIDS. 2018;1–376. Available from: https://www.unaids.org/sites/default/files/media_asset/unaids-data-2018_en.pdf
- 47. Wensing AW. Update of the drug resistance mutations in HIV-1 annemarie. Top Antivir Med. 2015;2015(23):132–141.
- 48. Wickham H. ggplot2: elegant graphics for data analysis. New York: Springer; 2016.
- 49. Yang P, Ai L-S, Huang S-C, Li H-F, Chan W-E, Chang C-W, et al. The cytoplasmic domain of human immunodeficiency virus type 1 transmembrane protein gp41 harbors lipid raft association determinants. J Virol Am Soc Microbiol. 2010;84:59–75. doi: 10.1128/JVI.00899-09.