Virologica Sinica. 2011 Apr 7;26(2):95. doi: 10.1007/s12250-011-3188-7

Use of mutual information arrays to predict coevolving sites in the full length HIV gp120 protein for subtypes B and C

Bo Wei, Na Han, Hai-zhou Liu, Anthony Rayner, Simon Rayner
PMCID: PMC8222454  PMID: 21468932

Abstract

It is well established that different sites within a protein evolve at different rates according to their role within the protein, and that sites which interact may mutate in a correlated fashion. Identifying these correlated mutations can aid in tasks such as ab initio protein structure prediction, structure-function analysis and sequence alignment. Mutual information is a standard measure of coevolution between two sites, but its application is limited by a poor signal-to-noise ratio. In this work we report a preliminary study investigating whether larger sequence sets can circumvent this problem. We calculated mutual information arrays for two sets of drug-naïve sequences of the HIV gp120 protein, one for subtype B and one for subtype C. Our results suggest that while larger sequence sets can improve the signal-to-noise ratio, the gain is offset by the high mutation rate of HIV, which makes consistent alignments more difficult to achieve. Nevertheless, we were able to predict a number of coevolving sites that are supported by previous experimental studies, and to identify a region close to the C terminus of the protein that is highly variable in subtype C but highly conserved in subtype B.
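As an illustration of the measure named in the abstract, the following is a minimal sketch of how a mutual information array might be computed from a multiple sequence alignment. It is not the authors' implementation: the function names (`mutual_information`, `mi_array`) and the toy alignment are hypothetical, gaps are treated as ordinary symbols for simplicity, and no correction for phylogeny, entropy or sample size is applied. For two alignment columns X and Y, the quantity computed is MI(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x) p(y))], with probabilities estimated from observed frequencies.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    Probabilities are plain observed frequencies; this is an
    illustrative simplification with no noise correction.
    """
    n = len(col_a)
    if n == 0 or n != len(col_b):
        raise ValueError("columns must be non-empty and of equal length")
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # p_ab / (p_a * p_b), written with raw counts
        ratio = n_ab * n / (count_a[a] * count_b[b])
        mi += p_ab * log2(ratio)
    return mi


def mi_array(alignment):
    """Symmetric matrix of pairwise MI over all alignment columns.

    `alignment` is a list of equal-length aligned sequences (strings).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    cols = [[seq[i] for seq in alignment] for i in range(length)]
    return [
        [mutual_information(cols[i], cols[j]) for j in range(length)]
        for i in range(length)
    ]


if __name__ == "__main__":
    # Hypothetical toy alignment: columns 0 and 1 covary perfectly,
    # column 2 is fully conserved.
    toy = ["AKL", "AKL", "GRL", "GRL"]
    for row in mi_array(toy):
        print(["%.2f" % value for value in row])
```

In the toy alignment, the perfectly covarying pair of columns scores 1 bit, while the conserved column scores 0 against every other column. This shows why MI alone cannot detect coupling at invariant sites, and why small or noisy sequence sets can blur the signal.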

Key words: Mutual information arrays, Predict coevolving sites, Protein evolution, HIV gp120 protein, B and C subtypes

Footnotes

These authors contributed equally to this work.

Articles from Virologica Sinica are provided here courtesy of Wuhan Institute of Virology, Chinese Academy of Sciences