Virologica Sinica. 2013 Jul 27;28(4):228–238. doi: 10.1007/s12250-013-3348-z

Inference of global HIV-1 sequence patterns and preliminary feature analysis

Yan Wang 1, Reda Rawi 2, Daniel Hoffmann 2, Binlian Sun 1, Rongge Yang 1
PMCID: PMC8208351  PMID: 23913180

Abstract

The epidemiology of HIV-1 varies among different areas of the world, and this complexity may leave unique footprints in the viral genome. We therefore searched for significant patterns in global HIV-1 genome sequences. By applying the rule inference algorithm RIPPER (Repeated Incremental Pruning to Produce Error Reduction) to multiple sequence alignments of Env sequences from four classes of compiled datasets, we generated four sets of signature patterns. These patterns distinguished southeastern Asian from non-southeastern Asian sequences with 97.5% accuracy, Chinese from non-Chinese sequences with 98.3% accuracy, African from non-African sequences with 88.4% accuracy, and southern African from non-southern African sequences with 91.2% accuracy. The patterns showed different associations with subtypes and with amino acid positions. In addition, some signature patterns were characteristic of the geographic area from which the sample was taken. Amino acid features corresponding to the phylogenetic clustering of HIV-1 sequences were consistent with some of the deduced patterns. Using a combination of patterns inferred from subtype B, subtype C, and all subtypes chimeric with CRF01_AE worldwide, we found that signature patterns of subtype C were extremely common in some sampled countries (for example, Zambia in southern Africa), which may hint at the origin of this HIV-1 subtype and at the need to pay special attention to this area of Africa. Signature patterns of subtype B sequences were associated with different countries. Moreover, distinct patterns occur at the single position 21, where glycine, leucine, and isoleucine correspond to subtype C, subtype B, and all possible recombinant forms chimeric with CRF01_AE, respectively; these patterns also indicate distinct geographic features. Our method widens the scope of signature inference from geographic, genetic, and genomic viewpoints. These findings may provide a valuable reference for epidemiological research or vaccine design.

Keywords: Pattern inference, global HIV-1 sequence, Repeated Incremental Pruning to Produce Error Reduction (RIPPER)
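
To illustrate what a signature pattern of the kind described in the abstract looks like in use, the following minimal sketch applies an ordered list of rules to aligned sequences and computes classification accuracy. It is not the authors' implementation and does not perform RIPPER rule induction itself; it only shows how already inferred rules could be applied. The rule list mirrors the position-21 pattern reported above (glycine, leucine, isoleucine), while the label names, the `make_toy` helper, the default class, and the toy sequences are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Tuple

# A rule is a conjunction of tests "residue at 1-based alignment position == amino acid"
# paired with the label it predicts. Rule lists are applied in order, first match wins.
Rule = Tuple[Dict[int, str], str]

# Hypothetical rule list mirroring the position-21 pattern from the abstract.
RULES: List[Rule] = [
    ({21: "G"}, "C"),              # glycine at position 21 -> subtype C
    ({21: "L"}, "B"),              # leucine at position 21 -> subtype B
    ({21: "I"}, "CRF01_AE-like"),  # isoleucine -> recombinants chimeric with CRF01_AE
]


def classify(seq: str, rules: List[Rule], default: str) -> str:
    """Return the label of the first rule whose conditions all hold, else the default."""
    for conditions, label in rules:
        if all(len(seq) >= pos and seq[pos - 1] == aa for pos, aa in conditions.items()):
            return label
    return default


def accuracy(seqs: List[str], labels: List[str], rules: List[Rule], default: str) -> float:
    """Fraction of sequences whose predicted label equals the known label."""
    correct = sum(classify(s, rules, default) == y for s, y in zip(seqs, labels))
    return correct / len(seqs)


def make_toy(residue: str) -> str:
    """Build an invented 25-residue aligned sequence with `residue` at position 21."""
    return "A" * 20 + residue + "A" * 4


# Invented toy data, not taken from the study.
toy_seqs = [make_toy("G"), make_toy("L"), make_toy("I"), make_toy("V")]
toy_labels = ["C", "B", "CRF01_AE-like", "B"]

if __name__ == "__main__":
    print(accuracy(toy_seqs, toy_labels, RULES, default="other"))
```

In this toy setting the fourth sequence matches no rule and falls through to the default class, so it counts as a misclassification. Real signature patterns would typically combine several positions per rule and would be evaluated on held-out sequences.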

Contributor Information

Binlian Sun, Phone: +86-27-87198736, FAX: +86-27-87198736, Email: sunbl@wh.iov.cn.

Rongge Yang, Phone: +86-27-87198736, FAX: +86-27-87198736, Email: ryang@wh.iov.cn.


Articles from Virologica Sinica are provided here courtesy of Wuhan Institute of Virology, Chinese Academy of Sciences
