HIV-1 tropism prediction by the XGboost and HMM methods

Xiang Chen; Zhi-Xin Wang; Xian-Ming Pan

doi:10.1038/s41598-019-46420-4

. 2019 Jul 10;9:9997. doi: 10.1038/s41598-019-46420-4

HIV-1 tropism prediction by the XGboost and HMM methods

Xiang Chen ¹, Zhi-Xin Wang ¹, Xian-Ming Pan ^2,^✉

PMCID: PMC6620319 PMID: 31292462

Abstract

Human Immunodeficiency Virus 1 (HIV-1) co-receptor usage, called tropism, is associated with disease progression towards AIDS. Furthermore, the recently developed and developing drugs against co-receptors CCR5 or CXCR4 open a new thought for HIV-1 therapy. Thus, knowledge about tropism is critical for illness diagnosis and regimen prescription. To improve tropism prediction accuracy, we developed two novel methods, the extreme gradient boosting based XGBpred and the hidden Markov model based HMMpred. Both XGBpred and HMMpred achieved higher specificities (72.56% and 72.09%) than the state-of-the-art methods Geno2pheno (61.6%) and G2p_str (68.60%) in a 10-fold cross validation test at the same sensitivity of 93.73%. Moreover, XGBpred had more outstanding performances (with AUCs 0.9483, 0.9464) than HMMpred (0.8829, 0.8774) on the Hivcopred and Newdb (created in this work) datasets containing larger proportions of hard-to-predict dual tropic samples in the X4-using tropic samples. Therefore, we recommend the use of our novel method XGBpred to predict tropism. The two methods and datasets are available via http://spg.med.tsinghua.edu.cn:23334/XGBpred/. In addition, our models identified that positions 5, 11, 13, 18, 22, 24, and 25 were correlated with HIV-1 tropism.

Subject terms: HIV infections, Diagnosis

Introduction

Human Immunodeficiency Virus 1 (HIV-1) is a retrovirus which mainly infects T-lymphocytes, macrophages and dendritic cells¹. HIV-1 enters into those host cells by chronologically interacting with primary receptors and co-receptors². Fourteen co-receptors have been detected in vitro³. However, in vivo, the major co-receptors are CCR5 and CXCR4^4,5. Indeed, a vast majority of subtype B and probably all subtype C HIV-1 positive individuals are initially infected via CCR5². Viruses using CCR5 are known as R5 tropic, whereas viruses using CXCR4 are called X4 tropic. R5X4 or dual tropic viruses as a third class can bind to either CCR5 or CXCR4⁶. For simplicity, X4 and dual tropic viruses are called X4-using tropic.

R5 tropic viruses start the HIV-1 infection⁷. This start is shown by the HIV-1 resistance in individuals where the function of CCR5 is disabled by a homozygous ccr5-Δ32 gene^4,8,9. Besides, X4-using tropic viruses are associated with disease progression, since those viruses emerge at the later stage of an infection in about half of the infected individuals^2,8,10–12. Furthermore, Miraviroc (MVC), a CCR5 antagonist and the only FDA-approved entry inhibitor, binds to the hydrophobic transmembrane helices of CCR5 so as to allosterically inhibit viruses from entering¹³. It has been proved that MVC cannot transform R5 viruses into X4-using viruses^14,15. Consequently, it becomes clear that tropism testing is necessary for several reasons: (1) To determine the illness progression^2,11; (2) To decide whether MVC can be used¹⁰; and (3) To monitor changes in viral quasispecies in order to modify regimens in time⁴.

In the last decades, two kinds of tropism testing methods, phenotypic and genotypic, have been developed. The phenotypic methods, such as ES-Trofile, are expensive, time-consuming, poorly accessible due to requiring specialized centers, and cannot provide consistent results when the viral load is below 1000 copies/ml¹⁶. Thus, the application of these methods is limited in clinical routines in Europe^5,8. Instead, the genotypic tropism testing is a preferred method due to low cost, reduced turnaround time and great accessibility, even when the viral load is below 1000 copies/ml¹⁷. In contrast to phenotypic methods, genotypic methods are based on statistics or machine learning. These methods analyze the third variable (V3) loop of the viral glycoprotein gp120, which predominantly determines its tropism¹⁸. The earliest proposed genotypic method for prediction of X4-using tropism is the 11/25 rule. This rule is based on the presence of a positively charged amino acid in positions 11 or 25 of the V3 sequence¹⁹. Other genotypic methods such as WebPSSM^20,21 and CM²² predict tropism based on scores that are calculated from position specific score matrices (PSSMs). In detail, WebPSSM constructs ungapped PSSMs, while CM constructs gapped PSSMs and takes the 11/25 rule and net charge into consideration. Recently, many genotypic methods based on machine learning have also been published. The method Geno2pheno⁴ combines two machine learning approaches, support vector machine (SVM) and decision trees, and uses clinical information such as viral loads and CD4-cell counts if available. Another method from the same laboratory, G2p_str²³, combines SVM and Lasso regression and uses the amino acid structure feature. Hivcopred²⁴ is based on SVM^light with the split amino acid composition feature. T-CUP2²⁵ employs random forests (RFs) with the structural information of hydrophobicity and electrostatic potential. Currently, Geno2pheno is the most widely used method and the only genotypic method recommended for usage in clinical routines by the European Consensus Group^5,8.

Genotypic methods can predict R5 viruses (~90%) accurately, but are inaccurate in the prediction of X4-using viruses (~50–70%)²⁶. Thus, more accurate tropism prediction methods are required. Here, we present two methods, XGBpred and HMMpred. We analyzed the HIV-1 tropism prediction ability of our methods and compared them with the Geno2pheno, G2p_str, Hivcopred, CM and WebPSSM methods. The results show that XGBpred is robust with the hard-to-predict dual tropic sequences.

Methods

Datasets

To construct the Newdb dataset, we extracted 6790 R5 tropic, 590 X4 tropic and 1125 dual tropic sequences from the Los Alamos HIV sequence database (http://www.hiv.lanl.gov/, last update: 10 Sep 2017). The tropisms of the sequences from the Los Alamos HIV sequence database have been phenotypically determined, none of them have been inferred from sequences. Then we removed sequences containing non-canonical residues, reserved sequences with lengths between 31 and 39, and dislodged duplicated sequences to guarantee the high quality of genotype/phenotype pairs. This process finally generated 2335 R5 and 663 X4-using (245 X4 and 418 dual) tropic sequences. The distribution of the six major subtypes in the Newdb dataset is shown in Table 1. To compare our methods with the Geno2pheno, G2p_str, Hivcopred, CM and WebPSSM methods, we used the datasets constructed in these studies, respectively. These datasets can be accessed in Supplementary Spreadsheet S1. The distributions of tropisms in different datasets are shown in Table 2.

Table 1.

Distribution of the six major subtypes in the Newdb dataset.

Subtype	Number (R^a, X^b, D^c)	Percentage
B	1503 (1209, 93, 201)	50.13%
C	511 (460, 26, 25)	17.04%
D	233 (120, 52, 61)	7.77%
01_AE	213 (149, 45, 19)	7.10%
A	155 (140, 5, 10)	5.17%
02_AG	124 (50, 3, 71)	4.14%

Open in a new tab

Notes: ^aThe number of R5 tropic sequences. ^bThe number of X4 tropic sequences. ^cThe number of dual tropic sequences.

Table 2.

Distribution of tropisms in the different datasets.

Dataset	R5	X4-using		Sum
Dataset	R5	X4	Dual	Sum
Newdb	2335	245	418	2998
G2p_str²³	973	94	121	1188
Hivcopred^a ²⁴	1768	246	321	2335
CM²²	2354	277	48	2679
WebPSSM²¹	228^b (47^c)	51^b (24^c)		279^b (71^c)

Open in a new tab

Notes: ^aRemoved 31 duplicated sequences from the original Hivcopred dataset which are marked as not only R5 tropism but also X4-using tropism. ^bTraining set. ^cValidation set.

Machine learning method: XGBpred

Extreme gradient boosting (XGboost), like RFs used by T-CUP2²⁵, is an ensemble algorithm of decision trees²⁷. The ensemble works by combining a set of weaker machine learning algorithms to get an improved machine learning algorithm in overall. The main difference between XGboost and RFs is the way of sampling. RFs are based on uniform sampling with return. Instead, XGboost gives higher weights to the wrongly predicted samples in the current weaker leaner, and then these samples will be paid more attention when training the next weaker leaner. In addition, XGboost adds regularization to avoid overfitting. Therefore, XGboost is a more complicated algorithm than RFs, and thus always outperforms.

Because XGboost is designed for vectors, it is necessary to convert V3 loop string sequences of different lengths to numerical vectors. For this task, we used many kinds of features to describe the characteristics of protein sequences, such as split amino acid composition²⁴, dipeptide composition²⁸, and net charge or hydropath²⁹. We also proposed an additional set of features: the alignment score. The 35-dimensional alignment scores were generated by scoring alignments using the block substitution matrices BLOSUM62, BLOSUM90 or BLOSUM100³⁰, and the alignments were generated by aligning sequences to the consensus sequence with 35 residues by the means of Needleman-Wunsch (Version: EMBOSS: 6.6.0)³¹. For the XGBpred method, we tested these different features and their combinations to find the optimal model to discriminate R5 and X4-using sequences.

Statistics method: HMMpred

Hidden Markov model (HMM) is a finite model applied in time series and linear sequences. Just as the PSSM profile, HMM also can be used to describe protein families. The HMM profile described by state-transition and symbol-emission probabilities performs better than PSSM in terms of sequence alignment and homology recognition because it can deal with gaps in protein families better by hidden state chains³².

HMM profile construction

We used the maximum likelihood estimation method to establish R5 and X4-using specific HMM profiles from R5 and X4-using tropic multiple sequence alignments generated by ClustalO³³, respectively. In addition, we simply assigned columns that had more than half gap characters as insertion states. The structure of HMM that we used was no transition allowed from D_j to I_j or from I_j to D_j+1 (This kind of structure performed better than the full structure, as shown in Supplementary Table S1). M, D, and I denote match, deletion and insertion states, respectively.

{\hat{a}}_{k l} = \frac{A_{k l} + 1}{\sum_{l'} A_{k l'} + 3}

{\hat{e}}_{k} (a) = \frac{E_{k} (a) + 1}{\sum_{a'} E_{k} (a') + 21}

Where in, k and l are indices over states M, D, or I; a is an amino acid symbol or gap; ${\hat{a}}_{k l}$ means the estimated probability of transiting from state k to state l, ${\hat{e}}_{k} (a)$ means the estimated probability of emitting residue a at state k, and A_kl and E_k(a) are the corresponding frequencies. In order to avoid the zero probability which represents it cannot happen in the future, we applied the Laplace’s pseudo-count rule that added one to each frequency.

Sequence-profile alignment

We employed Viterbi algorithm³⁴, a dynamic programing algorithm, to get two alignment scores S_R5 and S_non-R5. Those alignment scores represent the optimal state pathway scores from the R5 and X4-using HMM profiles, respectively. the final score was defined as:

S = S_{R 5} - S_{n o n - R 5}

Then the given sequence would be classified as R5 tropic if the final score S is higher than a threshold, otherwise it would be classified as X4-using tropic.

Ten-fold cross validation

The widely-used 10-fold cross validation was used to evaluate the performance of our methods in this study, where the sequences were divided into 10 subsets randomly, one subset was used as the testing set, and the others were used as the training set. After ten repetitions, the final performance was average of the performances of those ten subsets.

Evaluation parameters

For evaluation, we used sensitivity, specificity, accuracy and Matthew’s correlation coefficient (MCC). In particular, MCC is robust even when the size of classes varies widely³⁵. An MCC value ‘0’ corresponds to a completely random prediction, while ‘1’ corresponds to a perfect perdition. These parameters were calculated using the following equations:

Sensitivity = \frac{TP}{T P + F N}

Specificity = \frac{TN}{F P + T N}

Accuracy = \frac{TP + TN}{T P + F P + T N + F N}

MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(T P + F P) (T P + F N) (T N + F P) (T N + F N)}}

where TP is the number of true positives, FP false positives, TN true negatives and FN false negatives. We regarded R5 tropic samples as positives in this study.

In contrast to the four threshold-dependent parameters, the receiver operating characteristic (ROC) curve, a threshold-independent parameter, illustrates the trade-off between sensitivity and specificity at various threshold settings. In this study, we used the area under the curve (AUC) to measure a predictive power, where 0.5 means a random method, and 1 means a perfect method³⁶.

Results

Performance on the Newdb dataset

The feature set and the model that gave the strongest predictive power for the XGBpred and HMMpred methods were found, respectively (Supplementary Tables S1 and S2). The performances of the two methods on the Newdb dataset in a same 10-fold cross validation test are shown in Fig. 1A and Table 3. XGBpred had a higher specificity, accuracy, MCC and AUC than HMMpred when having the same sensitivity. Furthermore, the specificity of XGBpred was higher than 80% (84.62%) at the sensitivity of 91.78%. Results from the two methods were highly consistent: they predicted same tropisms for 87.96% of total samples, and achieved 96.70% sensitivity, 83.39% specificity and 93.93% accuracy.

Performance of the XGBpred and HMMpred methods on the Newdb dataset. (A) ROC curves on the Newdb dataset in a same 10-fold cross validation test. The legend lists AUCs and specificities at the sensitivity of 91.78% which is plotted as the dashed black line. (B) Distribution of V3 loop sequence scores calculated from XGBpred and HMMpred on the Newdb dataset. The score distribution of the R5 tropic sequences is shown in blue, that of X4 is carmine and that of dual is yellow. (C) ROC curves of XGBpred and HMMpred for the six major subtypes. The legend lists AUCs and mAPs.

Table 3.

Performance of the XGBpred and HMMpred methods on the different datasets.

Dataset	Method	Specificity	Accuracy	MCC	AUC
Newdb	XGBpred	84.62%	90.19%	0.7310	0.9465
Newdb	HMMpred	70.59%	87.09%	0.6247	0.8774
G2p_str²³	Geno2pheno²³	61.6%	—	—	0.860
	G2p_str²³	68.6%	—	—	0.892
	XGBpred	72.56%	89.90%	0.6605	0.8952
	HMMpred	72.09%	89.81%	0.6570	0.9002
Hivcopred²⁴	Hivcopred²⁴	81.44%	87.07%	0.67	0.904
	XGBpred	87.13%	88.52%	0.7154	0.9483
	HMMpred	71.08%	84.63%	0.5899	0.8829
CM²²	CM²²	92.92%	95.21%	0.885	0.97
	XGBpred	93.85%	95.33%	0.8106	0.9809
	HMMpred	89.54%	94.81%	0.7826	0.9635
WebPSSM²¹	WebPSSM²¹	83.3%	—	—	0.881
	XGBpred	83.33%	83.10%	0.6419	0.9043
	HMMpred	75.00%	80.28%	0.5693	0.8678

Open in a new tab

Performance of XGBpred and HMMpred on the Newdb, G2p_str, Hivcopred, CM and WebPSSM datasets at the sensitivities of 91.78%, 93.73%, 89.99%, 95.54% and 82.98%, respectively.

Considering the poorer performance of HMMpred, the score distributions of the two methods were plotted (Fig. 1B). As depicted, the scores of dual tropic sequences mostly placed in the middle of the scores of X4 and R5 tropic sequences. Furthermore, HMMpred generated higher scores for a considerable number of dual tropic samples than XGBpred. This phenomenon illustrates that it is hard for dual tropic sequences to be correctly classified, especially by HMMpred.

The performances of the two methods for the six major subtypes (subtypes B, C, D, 01_AE, A and 02_AG) in the Newdb dataset were analyzed due to the sequence divergence among different subtypes and the different number of sequences in each subtype (Fig. 1C). HMMpred for subtypes B and D showed much lower AUCs (0.8942 and 0.8839) than for subtypes C and 01_AE (0.9369 and 0.9486). The reason was that subtypes B and D contained more hard-to-predict dual tropic sequences (Table 1). This also resulted in a low AUC (0.5887) for subtype 02_AG, and a higher AUC (0.9029) for subtype A than for subtype D (0.8839) generated by HMMpred. In contrast, the performance of XGBpred was not so deeply influenced by dual tropic sequences. XGBpred had higher AUCs for the top four most common subtypes (subtypes B, C, D and 01_AE) than for subtypes A and 02_AG. In addition, The V3 loops of subtypes 01_AE and 02_AG come from subtypes E and A, respectively^37,38. This can also further lead to the weaker predictive power for subtypes A and 02_AG as it is a trickier task to determine tropism for subtype A than the other subtypes^39,40. Besides, the large biases existed between the number of R5 and X4-using samples for subtypes C and A (Table 1). Therefore, we also reported the mean average precision (mAP) of the two classes (Fig. 1C). AP is the areas under the precision-recall curve for a certain class. The bigger the mAP is, the better the method preforms. The mAPs and AUCs demonstrated the same tendency for the predictive power of our methods. Among all subtypes, just as AUCs, XGBpred and HMMpred showed the highest mAPs (0.9646, 0.9491) for subtype 01_AE. Moreover, for both XGBpred and HMMpred, the divergences between AUCs and mAPs for subtypes C and A were biggest. This may arise from the large biases between the amount of R5 and X4-using samples.

Comparison with other methods

In this section, to evaluate our methods, we compared with the previously published methods Geno2pheno, G2p_str²³, Hivcopred²⁴, CM²², and WebPSSM²¹ by implementing our methods in a 10-fold cross validation test on the datasets used in these published methods, respectively. The exception was WebPSSM²¹ where we used the training set from WebPSSM to model our methods in a 10-fold cross validation test and used the validation set from WebPSSM to test (Table 3).

First when comparing with the Geno2pheno and G2p_str methods²³, XGBpred and HMMpred achieved AUCs of 0.8952 and 0.9002, respectively. Our methods had higher AUCs than Geno2pheno (0.860) and G2p_str (0.892). In addition, XGBpred and HMMpred achieved specificities of 72.56% and 72.09% at the sensitivity of 93.73%. The specificities were obviously higher than the specificities of Geno2pheno (61.6%) and G2p_str (68.6%) at the same sensitivity. Second, when comparing with the Hivcopred method²⁴, XGBpred had a higher AUC (0.9483) than Hivcopred (0.904), but HMMpred had a low AUC (0.8829) as on the Newdb dataset. Third, when comparing with the CM method²². Our methods were as accurate as the CM method on the CM dataset which only contains a small amount of hard-to-predict dual tropic samples (Table 2). Finally, when comparing with the WebPSSM method²¹, although the WebPSSM dataset is small, XGBpred had a higher AUC (0.9043) than WebPSSM (0.881), and HMMpred presented a similar AUC (0.8678) with WebPSSM.

Feature importance analysis

Given the high performance of XGBpred presented in the previous subsections, we discussed which features XGBpred provided with its predictive power (Fig. 2). We did not analyze the feature importance on the WebPSSM dataset as it contains few training samples (Table 2). In the XGBpred method, the feature alignment score in the 5^th position of the V3 loop appeared in the top three most important features on all datasets. Interestingly, amino acid Tyr in position 5 appeared more frequently in X4-using tropic than in R5 tropic sequences (Supplementary Fig. S1). Currently, X4-using tropism can be predicted by the 11/25 rule¹⁹. However, since position 5 was as same important as positions 11 and 25, the pragmatic 11/25/5 rule was proposed to predict a virus as X4-using tropic by the presence of a positively charged amino acid in positions 11 or 25, or by the presence of amino acid Tyr in position 5 of its V3 loop. Compared with the 11/25 rule, the 11/25/5 rule reduced sensitivities by 1.29%, 1.03%, 1.14% and 1.19% on the Newdb, G2p_str, Hivcopred and CM datasets while increasing specificities by 7.39%, 5.11%, 6.34% and 10.77%, respectively. The 11/25/5 rule also had higher accuracies and MCCs than the 11/25 rule on the four datasets, which indicates the influence of amino acid Tyr in position 5 with regard to viral tropism (Supplementary Table S3). In addition to positions 5, 11 and 25, positions 13, 18, 22 and 24 also ranked in the top ten most important features on the four datasets. Two exceptions were position 18 ranked 21^st on the G2p_str dataset, and position 22 ranked 14^th on the CM dataset. Indeed, all the positions that we identified as correlated with HIV-1 tropism are exactly in accordance with the results from Sander et al.⁴¹ who point to the residues 298 (3), 302 (7), 306 (11), 308 (13), 315 (18), 317 (20), 319 (22), 321 (24), 322 (25) and 328 (32) are important for tropism. Furthermore, the feature importance distribution generated by XGBpred is a feasible method to judge whether a new discovered association pattern is of importance to co-receptor usage or not.

Distribution of feature importance scores. The top 30 most important features indicated by XGBpred on the Newdb, G2p_str, Hivcopred and CM datasets. S# means the alignment score in position #, and R1R2 represents a dipeptide.

Discussion

In this study, we present two methods, XGBpred and HMMpred, for HIV-1 co-receptor usage prediction. XGBpred is based on machine learning, and HMMpred is based on statistics. XGBpred performed best on the Hivcopred and Newdb datasets containing larger proportions of hard-to-predict dual tropic samples in the X4-using samples, while HMMpred performed worst. In contrast, the predictive powers of the two methods were similar on the smaller G2p_str and CM datasets containing fewer dual tropic samples (Table 3). The poor ability of HMMpred to predict tropism stemmed from the high probability that HMMpred incorrectly predicted dual tropic samples as R5 tropic (Fig. 1B and Supplementary Fig. S2). The profiles used in HMMpred may not be meticulous enough. Several reasons may account for this phenomenon. Firstly, the two sequence families are highly similar since even one amino acid substitution may change their tropisms^42,43. Secondly, the characteristics of dual tropic sequences may be overwhelmed by R5 and X4 tropic sequences. Finally, the unavailability of X4-using tropic samples makes it uncertain to learn its accurate HMM profile. Moreover, as the number of samples increased, the gap of predictive powers between XGBpred and HMMpred became large (Tables 2 and 3). This corresponds to the fact that the machine learning based Geno2pheno method is more widely used than the statistics based 11/25 rule and WebPSSM. As a result, a machine learning based method, in particular XGBpred, is recommended to predict co-receptor usage as the number of samples continues to expand.

In an effort to further increase the predictive power, we also generated three meta methods by the means of stacking⁴⁴. The scores generated by XGBpred, Hivcopred (SVM^light) and HMMpred were added as additional features to the new stacking based XGBpred models. Compared with the original XGBpred method, the new stacking-based XGBpred methods had slightly higher AUCs on the G2p_str dataset but lower AUCs on the other datasets (Supplementary Table S4). The poor performances of the meta methods may due to the poorer predictive abilities of Hivcopred and HMMpred than the original XGBpred method, and/or the dependence of the results generated by XGBpred, Hivcopred and HMMpred (Supplementary Table S5). This may stem from the fact that the V3 loop is not the sole determinant of viral tropism. Moreover, V1, V2, C4 and the bridge sheet regions of gp120 also have an impact on co-receptor usage^45,46. To predict tropism, several methods gain a higher accuracy by employing other information in addition to the V3 loop, such as clinical information⁴⁷, V2 loop sequences⁴⁸ and structure information^23,25,41. Therefore, the stacking based method can be constructed to improve its predictive power by combining methods with different kinds of information.

In summary, the two methods we developed performed comparably on the datasets containing less hard-to-predict dual tropic sequences, but XGBpred performed much better on the datasets with more dual tropic sequences. This means XGBpred is more robust to predict dual tropic sequences than other methods. Thus, we strongly recommend to use XGBpred to predict viral tropism. Our two methods have been implemented as a freely available webserver under http://spg.med.tsinghua.edu.cn:23334/XGBpred/.

Supplementary information

41598_2019_46420_MOESM1_ESM.pdf^{(360.1KB, pdf)}

Figures S1, S2; Tables S1, S2, S3, S4, S5

Supplementary Spreadsheet S1^{(283.6KB, xlsx)}

Acknowledgements

This work was partly supported by Grant 31770841 from the National Natural Science Foundation of China, and 2016YFA0502114 from the Ministry of Science and Technology of China. The funding source had no role in the design or conduct of the study, or in the collection, analysis, or interpretation of the data.

Author Contributions

X.C. designed the study, developed the method, implemented the data analysis and wrote the manuscript. Z.-X.W and X.-M.P. participated in study design and revision of the manuscript. All the authors read and approved the final manuscript.

Competing Interests

The authors declare no competing interests.

Footnotes

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information accompanies this paper at 10.1038/s41598-019-46420-4.

References

1.Hladik F, et al. Initial events in establishing vaginal entry and infection by human immunodeficiency virus type-1. Immunity. 2007;26:257–270. doi: 10.1016/j.immuni.2007.01.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
2.Wilen C. B., Tilton J. C., Doms R. W. HIV: Cell Binding and Entry. Cold Spring Harbor Perspectives in Medicine. 2012;2(8):a006866–a006866. doi: 10.1101/cshperspect.a006866. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Koning, F., van Rij, R. & Schuitemaker, H. Biological and Molecular Aspects of HIV1 Coreceptor Usage. (2019).
4.Lengauer T, Sander O, Sierra S, Thielen A, Kaiser R. Bioinformatics prediction of HIV coreceptor usage. Nature biotechnology. 2007;25:1407–1410. doi: 10.1038/nbt1371. [DOI] [PubMed] [Google Scholar]
5.Vandekerckhove LPR, et al. European guidelines on the clinical management of HIV-1 tropism testing. The Lancet Infectious Diseases. 2011;11:394–407. doi: 10.1016/s1473-3099(10)70319-4. [DOI] [PubMed] [Google Scholar]
6.Berger EA, et al. A new classification for HIV-1. Nature. 1998;391:240. doi: 10.1038/34571. [DOI] [PubMed] [Google Scholar]
7.Hoffmann C. The epidemiology of HIV coreceptor tropism. European journal of medical research. 2007;12:385–390. [PubMed] [Google Scholar]
8.Panos G, Watson DC. Effect of HIV-1 subtype and tropism on treatment with chemokine coreceptor entry inhibitors; overview of viral entry inhibition. Crit Rev Microbiol. 2015;41:473–487. doi: 10.3109/1040841X.2013.867829. [DOI] [PubMed] [Google Scholar]
9.Huang Y, et al. The role of a mutant CCR5 allele in HIV-1 transmission and disease progression. Nature medicine. 1996;2:1240–1243. doi: 10.1038/nm1196-1240. [DOI] [PubMed] [Google Scholar]
10.Gutiérrez F, Carlos Rodríguez J, García F, Poveda E, Tropismo del VIH. Técnicas disponibles y utilidad. Enfermedades Infecciosas y Microbiología Clínica. 2011;29:45–50. doi: 10.1016/S0213-005X(11)70043-X. [DOI] [PubMed] [Google Scholar]
11.Naif HM. Pathogenesis of HIV Infection. Infectious disease reports. 2013;5:e6. doi: 10.4081/idr.2013.s1.e6. [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Berger EA, Murphy PM, Farber JM. Chemokine receptors as HIV-1 coreceptors: roles in viral entry, tropism, and disease. Annual review of immunology. 1999;17:657–700. doi: 10.1146/annurev.immunol.17.1.657. [DOI] [PubMed] [Google Scholar]
13.Tsamis F, et al. Analysis of the mechanism by which the small-molecule CCR5 antagonists SCH-351125 and SCH-350581 inhibit human immunodeficiency virus type 1 entry. Journal of virology. 2003;77:5201–5208. doi: 10.1128/JVI.77.9.5201-5208.2003. [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Nelson, M. & Panos, G. Resistance to Chemokine (C-C Motif) Receptor 5 Antagonists HIV and AIDS CCR-5 Virus. (2007).
15.Westby M, et al. Emergence of CXCR4-using human immunodeficiency virus type 1 (HIV-1) variants in a minority of HIV-1-infected patients following treatment with the CCR5 antagonist maraviroc is from a pretreatment CXCR4-using virus reservoir. Journal of virology. 2006;80:4909–4920. doi: 10.1128/jvi.80.10.4909-4920.2006. [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Su Z, et al. Response to vicriviroc in treatment-experienced subjects, as determined by an enhanced-sensitivity coreceptor tropism assay: reanalysis of AIDS clinical trials group A5211. The Journal of infectious diseases. 2009;200:1724–1728. doi: 10.1086/648090. [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Obermeier M, Symons J, Wensing AM. HIV population genotypic tropism testing and its clinical significance. Curr Opin HIV AIDS. 2012;7:470–477. doi: 10.1097/COH.0b013e328356eaa7. [DOI] [PubMed] [Google Scholar]
18.Huang W, et al. Coreceptor tropism can be influenced by amino acid substitutions in the gp41 transmembrane subunit of human immunodeficiency virus type 1 envelope protein. Journal of virology. 2008;82:5584–5593. doi: 10.1128/jvi.02676-07. [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Fouchier RA, et al. Phenotype-associated sequence variation in the third variable domain of the human immunodeficiency virus type 1 gp120 molecule. Journal of virology. 1992;66:3183–3187. doi: 10.1128/jvi.66.5.3183-3187.1992. [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Jensen MA, et al. Improved Coreceptor Usage Prediction and Genotypic Monitoring of R5-to-X4 Transition by Motif Analysis of Human Immunodeficiency Virus Type 1 env V3 Loop Sequences. Journal of virology. 2003;77:13376–13388. doi: 10.1128/jvi.77.24.13376-13388.2003. [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Jensen MA, Coetzer M, van ‘t Wout AB, Morris L, Mullins JI. A reliable phenotype predictor for human immunodeficiency virus type 1 subtype C based on envelope V3 sequences. Journal of virology. 2006;80:4698–4704. doi: 10.1128/JVI.80.10.4698-4704.2006. [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Shen HS, et al. HIV coreceptor tropism determination and mutational pattern identification. Sci Rep. 2016;6:21280. doi: 10.1038/srep21280. [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Bozek K, Lengauer T, Sierra S, Kaiser R, Domingues FS. Analysis of physicochemical and structural properties determining HIV-1 coreceptor usage. PLoS Comput Biol. 2013;9:e1002977. doi: 10.1371/journal.pcbi.1002977. [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Kumar R, Raghava GP. Hybrid approach for predicting coreceptor used by HIV-1 from its V3 loop amino acid sequence. PLoS One. 2013;8:e61437. doi: 10.1371/journal.pone.0061437. [DOI] [PMC free article] [PubMed] [Google Scholar]
25.Heider D, Dybowski JN, Wilms C, Hoffmann D. A simple structure-based model for the prediction of HIV-1 co-receptor tropism. BioData mining. 2014;7:14. doi: 10.1186/1756-0381-7-14. [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Agwu AL, et al. Phenotypic Coreceptor Tropism in Perinatally HIV-infected Youth Failing Antiretroviral Therapy. The Pediatric Infectious Disease Journal. 2016;35:777–781. doi: 10.1097/inf.0000000000001158. [DOI] [PMC free article] [PubMed] [Google Scholar]
27.Chen, T. & Guestrin, C. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (ACM, San Francisco, California, USA, 2016).
28.Bhasin M, Raghava GP. Classification of nuclear receptors based on amino acid composition and dipeptide composition. J Biol Chem. 2004;279:23262–23266. doi: 10.1074/jbc.M401932200. [DOI] [PubMed] [Google Scholar]
29.Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. Journal of molecular biology. 1982;157:105–132. doi: 10.1016/0022-2836(82)90515-0. [DOI] [PubMed] [Google Scholar]
30.Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences of the United States of America. 1992;89:10915–10919. doi: 10.1073/pnas.89.22.10915. [DOI] [PMC free article] [PubMed] [Google Scholar]
31.Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of molecular biology. 1970;48:443–453. doi: 10.1016/0022-2836(70)90057-4. [DOI] [PubMed] [Google Scholar]
32.Eddy SR. Profile hidden Markov models. Bioinformatics (Oxford, England) 1998;14:755–763. doi: 10.1093/bioinformatics/14.9.755. [DOI] [PubMed] [Google Scholar]
33.Sievers F, et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular systems biology. 2011;7:539. doi: 10.1038/msb.2011.75. [DOI] [PMC free article] [PubMed] [Google Scholar]
34.David Forney, G. Jr. The Viterbi Algorithm: A Personal History. (2005).
35.Matthews BW. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et biophysica acta. 1975;405:442–451. doi: 10.1016/0005-2795(75)90109-9. [DOI] [PubMed] [Google Scholar]
36.Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29–36. doi: 10.1148/radiology.143.1.7063747. [DOI] [PubMed] [Google Scholar]
37.Gao F, et al. The heterosexual human immunodeficiency virus type 1 epidemic in Thailand is caused by an intersubtype (A/E) recombinant of African origin. Journal of virology. 1996;70:7013–7029. doi: 10.1128/jvi.70.10.7013-7029.1996. [DOI] [PMC free article] [PubMed] [Google Scholar]
38.Carr JK, et al. Full genome sequences of human immunodeficiency virus type 1 subtypes G and A/G intersubtype recombinants. Virology. 1998;247:22–31. doi: 10.1006/viro.1998.9211. [DOI] [PubMed] [Google Scholar]
39.Riemenschneider M, et al. Genotypic Prediction of Co-receptor Tropism of HIV-1 Subtypes A and C. Sci Rep. 2016;6:24883. doi: 10.1038/srep24883. [DOI] [PMC free article] [PubMed] [Google Scholar]
40.Lochel HF, Riemenschneider M, Frishman D, Heider D. SCOTCH: subtype A coreceptor tropism classification in HIV-1. Bioinformatics (Oxford, England) 2018;34:2575–2580. doi: 10.1093/bioinformatics/bty170. [DOI] [PubMed] [Google Scholar]
41.Sander O, et al. Structural descriptors of gp120 V3 loop for the prediction of HIV-1 coreceptor usage. PLoS Comput Biol. 2007;3:e58. doi: 10.1371/journal.pcbi.0030058. [DOI] [PMC free article] [PubMed] [Google Scholar]
42.Shimizu N, et al. Changes in and discrepancies between cell tropisms and coreceptor uses of human immunodeficiency virus type 1 induced by single point mutations at the V3 tip of the env protein. Virology. 1999;259:324–333. doi: 10.1006/viro.1999.9764. [DOI] [PubMed] [Google Scholar]
43.De Jong JJ, De Ronde A, Keulen W, Tersmette M, Goudsmit J. Minimal requirements for the human immunodeficiency virus type 1 V3 domain to support the syncytium-inducing phenotype: analysis by single amino acid substitution. Journal of virology. 1992;66:6777–6780. doi: 10.1128/jvi.66.11.6777-6780.1992. [DOI] [PMC free article] [PubMed] [Google Scholar]
44.Wolpert DH. Stacked generalization. Neural Networks. 1992;5:241–259. doi: 10.1016/S0893-6080(05)80023-1. [DOI] [Google Scholar]
45.Monno L, et al. Impact of mutations outside the V3 region on coreceptor tropism phenotypically assessed in patients infected with HIV-1 subtype B. Antimicrobial agents and chemotherapy. 2011;55:5078–5084. doi: 10.1128/aac.00743-11. [DOI] [PMC free article] [PubMed] [Google Scholar]
46.Dimonte S, et al. Selected amino acid mutations in HIV-1 B subtype gp41 are associated with specific gp120v(3) signatures in the regulation of co-receptor usage. Retrovirology. 2011;8:33. doi: 10.1186/1742-4690-8-33. [DOI] [PMC free article] [PubMed] [Google Scholar]
47.Brumme ZL, et al. Molecular and clinical epidemiology of CXCR4-using HIV-1 in a large population of antiretroviral-naive individuals. The Journal of infectious diseases. 2005;192:466–474. doi: 10.1086/431519. [DOI] [PubMed] [Google Scholar]
48.Thielen A, et al. Improved prediction of HIV-1 coreceptor usage with sequence information from the second hypervariable loop of gp120. The Journal of infectious diseases. 2010;202:1435–1443. doi: 10.1086/656600. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

41598_2019_46420_MOESM1_ESM.pdf^{(360.1KB, pdf)}

Figures S1, S2; Tables S1, S2, S3, S4, S5

Supplementary Spreadsheet S1^{(283.6KB, xlsx)}

[CR1] 1.Hladik F, et al. Initial events in establishing vaginal entry and infection by human immunodeficiency virus type-1. Immunity. 2007;26:257–270. doi: 10.1016/j.immuni.2007.01.007. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR2] 2.Wilen C. B., Tilton J. C., Doms R. W. HIV: Cell Binding and Entry. Cold Spring Harbor Perspectives in Medicine. 2012;2(8):a006866–a006866. doi: 10.1101/cshperspect.a006866. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR3] 3.Koning, F., van Rij, R. & Schuitemaker, H. Biological and Molecular Aspects of HIV1 Coreceptor Usage. (2019).

[CR4] 4.Lengauer T, Sander O, Sierra S, Thielen A, Kaiser R. Bioinformatics prediction of HIV coreceptor usage. Nature biotechnology. 2007;25:1407–1410. doi: 10.1038/nbt1371. [DOI] [PubMed] [Google Scholar]

[CR5] 5.Vandekerckhove LPR, et al. European guidelines on the clinical management of HIV-1 tropism testing. The Lancet Infectious Diseases. 2011;11:394–407. doi: 10.1016/s1473-3099(10)70319-4. [DOI] [PubMed] [Google Scholar]

[CR6] 6.Berger EA, et al. A new classification for HIV-1. Nature. 1998;391:240. doi: 10.1038/34571. [DOI] [PubMed] [Google Scholar]

[CR7] 7.Hoffmann C. The epidemiology of HIV coreceptor tropism. European journal of medical research. 2007;12:385–390. [PubMed] [Google Scholar]

[CR8] 8.Panos G, Watson DC. Effect of HIV-1 subtype and tropism on treatment with chemokine coreceptor entry inhibitors; overview of viral entry inhibition. Crit Rev Microbiol. 2015;41:473–487. doi: 10.3109/1040841X.2013.867829. [DOI] [PubMed] [Google Scholar]

[CR9] 9.Huang Y, et al. The role of a mutant CCR5 allele in HIV-1 transmission and disease progression. Nature medicine. 1996;2:1240–1243. doi: 10.1038/nm1196-1240. [DOI] [PubMed] [Google Scholar]

[CR10] 10.Gutiérrez F, Carlos Rodríguez J, García F, Poveda E, Tropismo del VIH. Técnicas disponibles y utilidad. Enfermedades Infecciosas y Microbiología Clínica. 2011;29:45–50. doi: 10.1016/S0213-005X(11)70043-X. [DOI] [PubMed] [Google Scholar]

[CR11] 11.Naif HM. Pathogenesis of HIV Infection. Infectious disease reports. 2013;5:e6. doi: 10.4081/idr.2013.s1.e6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR12] 12.Berger EA, Murphy PM, Farber JM. Chemokine receptors as HIV-1 coreceptors: roles in viral entry, tropism, and disease. Annual review of immunology. 1999;17:657–700. doi: 10.1146/annurev.immunol.17.1.657. [DOI] [PubMed] [Google Scholar]

[CR13] 13.Tsamis F, et al. Analysis of the mechanism by which the small-molecule CCR5 antagonists SCH-351125 and SCH-350581 inhibit human immunodeficiency virus type 1 entry. Journal of virology. 2003;77:5201–5208. doi: 10.1128/JVI.77.9.5201-5208.2003. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR14] 14.Nelson, M. & Panos, G. Resistance to Chemokine (C-C Motif) Receptor 5 Antagonists HIV and AIDS CCR-5 Virus. (2007).

[CR15] 15.Westby M, et al. Emergence of CXCR4-using human immunodeficiency virus type 1 (HIV-1) variants in a minority of HIV-1-infected patients following treatment with the CCR5 antagonist maraviroc is from a pretreatment CXCR4-using virus reservoir. Journal of virology. 2006;80:4909–4920. doi: 10.1128/jvi.80.10.4909-4920.2006. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR16] 16.Su Z, et al. Response to vicriviroc in treatment-experienced subjects, as determined by an enhanced-sensitivity coreceptor tropism assay: reanalysis of AIDS clinical trials group A5211. The Journal of infectious diseases. 2009;200:1724–1728. doi: 10.1086/648090. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR17] 17.Obermeier M, Symons J, Wensing AM. HIV population genotypic tropism testing and its clinical significance. Curr Opin HIV AIDS. 2012;7:470–477. doi: 10.1097/COH.0b013e328356eaa7. [DOI] [PubMed] [Google Scholar]

[CR18] 18.Huang W, et al. Coreceptor tropism can be influenced by amino acid substitutions in the gp41 transmembrane subunit of human immunodeficiency virus type 1 envelope protein. Journal of virology. 2008;82:5584–5593. doi: 10.1128/jvi.02676-07. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR19] 19.Fouchier RA, et al. Phenotype-associated sequence variation in the third variable domain of the human immunodeficiency virus type 1 gp120 molecule. Journal of virology. 1992;66:3183–3187. doi: 10.1128/jvi.66.5.3183-3187.1992. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR20] 20.Jensen MA, et al. Improved Coreceptor Usage Prediction and Genotypic Monitoring of R5-to-X4 Transition by Motif Analysis of Human Immunodeficiency Virus Type 1 env V3 Loop Sequences. Journal of virology. 2003;77:13376–13388. doi: 10.1128/jvi.77.24.13376-13388.2003. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR21] 21.Jensen MA, Coetzer M, van ‘t Wout AB, Morris L, Mullins JI. A reliable phenotype predictor for human immunodeficiency virus type 1 subtype C based on envelope V3 sequences. Journal of virology. 2006;80:4698–4704. doi: 10.1128/JVI.80.10.4698-4704.2006. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR22] 22.Shen HS, et al. HIV coreceptor tropism determination and mutational pattern identification. Sci Rep. 2016;6:21280. doi: 10.1038/srep21280. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR23] 23.Bozek K, Lengauer T, Sierra S, Kaiser R, Domingues FS. Analysis of physicochemical and structural properties determining HIV-1 coreceptor usage. PLoS Comput Biol. 2013;9:e1002977. doi: 10.1371/journal.pcbi.1002977. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR24] 24.Kumar R, Raghava GP. Hybrid approach for predicting coreceptor used by HIV-1 from its V3 loop amino acid sequence. PLoS One. 2013;8:e61437. doi: 10.1371/journal.pone.0061437. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR25] 25.Heider D, Dybowski JN, Wilms C, Hoffmann D. A simple structure-based model for the prediction of HIV-1 co-receptor tropism. BioData mining. 2014;7:14. doi: 10.1186/1756-0381-7-14. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR26] 26.Agwu AL, et al. Phenotypic Coreceptor Tropism in Perinatally HIV-infected Youth Failing Antiretroviral Therapy. The Pediatric Infectious Disease Journal. 2016;35:777–781. doi: 10.1097/inf.0000000000001158. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR27] 27.Chen, T. & Guestrin, C. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (ACM, San Francisco, California, USA, 2016).

[CR28] 28.Bhasin M, Raghava GP. Classification of nuclear receptors based on amino acid composition and dipeptide composition. J Biol Chem. 2004;279:23262–23266. doi: 10.1074/jbc.M401932200. [DOI] [PubMed] [Google Scholar]

[CR29] 29.Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. Journal of molecular biology. 1982;157:105–132. doi: 10.1016/0022-2836(82)90515-0. [DOI] [PubMed] [Google Scholar]

[CR30] 30.Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences of the United States of America. 1992;89:10915–10919. doi: 10.1073/pnas.89.22.10915. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR31] 31.Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of molecular biology. 1970;48:443–453. doi: 10.1016/0022-2836(70)90057-4. [DOI] [PubMed] [Google Scholar]

[CR32] 32.Eddy SR. Profile hidden Markov models. Bioinformatics (Oxford, England) 1998;14:755–763. doi: 10.1093/bioinformatics/14.9.755. [DOI] [PubMed] [Google Scholar]

[CR33] 33.Sievers F, et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular systems biology. 2011;7:539. doi: 10.1038/msb.2011.75. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR34] 34.David Forney, G. Jr. The Viterbi Algorithm: A Personal History. (2005).

[CR35] 35.Matthews BW. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et biophysica acta. 1975;405:442–451. doi: 10.1016/0005-2795(75)90109-9. [DOI] [PubMed] [Google Scholar]

[CR36] 36.Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29–36. doi: 10.1148/radiology.143.1.7063747. [DOI] [PubMed] [Google Scholar]

[CR37] 37.Gao F, et al. The heterosexual human immunodeficiency virus type 1 epidemic in Thailand is caused by an intersubtype (A/E) recombinant of African origin. Journal of virology. 1996;70:7013–7029. doi: 10.1128/jvi.70.10.7013-7029.1996. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR38] 38.Carr JK, et al. Full genome sequences of human immunodeficiency virus type 1 subtypes G and A/G intersubtype recombinants. Virology. 1998;247:22–31. doi: 10.1006/viro.1998.9211. [DOI] [PubMed] [Google Scholar]

[CR39] 39.Riemenschneider M, et al. Genotypic Prediction of Co-receptor Tropism of HIV-1 Subtypes A and C. Sci Rep. 2016;6:24883. doi: 10.1038/srep24883. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR40] 40.Lochel HF, Riemenschneider M, Frishman D, Heider D. SCOTCH: subtype A coreceptor tropism classification in HIV-1. Bioinformatics (Oxford, England) 2018;34:2575–2580. doi: 10.1093/bioinformatics/bty170. [DOI] [PubMed] [Google Scholar]

[CR41] 41.Sander O, et al. Structural descriptors of gp120 V3 loop for the prediction of HIV-1 coreceptor usage. PLoS Comput Biol. 2007;3:e58. doi: 10.1371/journal.pcbi.0030058. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR42] 42.Shimizu N, et al. Changes in and discrepancies between cell tropisms and coreceptor uses of human immunodeficiency virus type 1 induced by single point mutations at the V3 tip of the env protein. Virology. 1999;259:324–333. doi: 10.1006/viro.1999.9764. [DOI] [PubMed] [Google Scholar]

[CR43] 43.De Jong JJ, De Ronde A, Keulen W, Tersmette M, Goudsmit J. Minimal requirements for the human immunodeficiency virus type 1 V3 domain to support the syncytium-inducing phenotype: analysis by single amino acid substitution. Journal of virology. 1992;66:6777–6780. doi: 10.1128/jvi.66.11.6777-6780.1992. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR44] 44.Wolpert DH. Stacked generalization. Neural Networks. 1992;5:241–259. doi: 10.1016/S0893-6080(05)80023-1. [DOI] [Google Scholar]

[CR45] 45.Monno L, et al. Impact of mutations outside the V3 region on coreceptor tropism phenotypically assessed in patients infected with HIV-1 subtype B. Antimicrobial agents and chemotherapy. 2011;55:5078–5084. doi: 10.1128/aac.00743-11. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR46] 46.Dimonte S, et al. Selected amino acid mutations in HIV-1 B subtype gp41 are associated with specific gp120v(3) signatures in the regulation of co-receptor usage. Retrovirology. 2011;8:33. doi: 10.1186/1742-4690-8-33. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR47] 47.Brumme ZL, et al. Molecular and clinical epidemiology of CXCR4-using HIV-1 in a large population of antiretroviral-naive individuals. The Journal of infectious diseases. 2005;192:466–474. doi: 10.1086/431519. [DOI] [PubMed] [Google Scholar]

[CR48] 48.Thielen A, et al. Improved prediction of HIV-1 coreceptor usage with sequence information from the second hypervariable loop of gp120. The Journal of infectious diseases. 2010;202:1435–1443. doi: 10.1086/656600. [DOI] [PubMed] [Google Scholar]

PERMALINK

HIV-1 tropism prediction by the XGboost and HMM methods

Xiang Chen

Zhi-Xin Wang

Xian-Ming Pan

Abstract

Introduction

Methods

Datasets

Table 1.

Table 2.

Machine learning method: XGBpred

Statistics method: HMMpred

HMM profile construction

Sequence-profile alignment

Ten-fold cross validation

Evaluation parameters

Results

Performance on the Newdb dataset

Figure 1.

Table 3.

Comparison with other methods

Feature importance analysis

Figure 2.

Discussion

Supplementary information

Acknowledgements

Author Contributions

Competing Interests

Footnotes

Supplementary information

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Cite

Add to Collections

PERMALINK

HIV-1 tropism prediction by the XGboost and HMM methods

Xiang Chen

Zhi-Xin Wang

Xian-Ming Pan

Abstract

Introduction

Methods

Datasets

Table 1.

Table 2.

Machine learning method: XGBpred

Statistics method: HMMpred

HMM profile construction

Sequence-profile alignment

Ten-fold cross validation

Evaluation parameters

Results

Performance on the Newdb dataset

Figure 1.

Table 3.

Comparison with other methods

Feature importance analysis

Figure 2.

Discussion

Supplementary information

Acknowledgements

Author Contributions

Competing Interests

Footnotes

Supplementary information

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases