Bioinformatics. 2008 May 29;24(14):1575–1582. doi: 10.1093/bioinformatics/btn248

Using inferred residue contacts to distinguish between correct and incorrect protein models

Christopher S Miller 1, David Eisenberg 1,2,*
PMCID: PMC2638260  PMID: 18511466

Abstract

Motivation: The de novo prediction of 3D protein structure is enjoying a period of dramatic improvements. Often, a remaining difficulty is to select the model closest to the true structure from a group of low-energy candidates. To what extent can inter-residue contact predictions from multiple sequence alignments, information that is orthogonal to that used in most structure prediction algorithms, be used to identify those models most similar to the native protein structure?

Results: We present a Bayesian inference procedure to identify residue pairs that are spatially proximal in a protein structure. The method takes as input a multiple sequence alignment, and outputs an accurate posterior probability of proximity for each residue pair. We exploit a recent metagenomic sequencing project to create large, diverse and informative multiple sequence alignments for a test set of 1656 known protein structures. The method infers spatially proximal residue pairs in this test set with good accuracy: top-ranked predictions achieve an average accuracy of 38% (for an average 21-fold improvement over random predictions) in cross-validation tests. Notably, the accuracy of predicted 3D models generated by a range of structure prediction algorithms strongly correlates with how well the models satisfy probable residue contacts inferred via our method. This correlation allows for confident rejection of incorrect structural models.
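To make the model-selection idea concrete, the following is a minimal sketch of how candidate models might be scored against inferred contacts. It is not the authors' implementation: the abstract does not specify the scoring function, so the probability-weighted satisfaction score, the 8 Å cutoff, the use of one representative atom per residue, and the names `contact_satisfaction` and `rank_models` are all assumptions chosen for illustration.

```python
import math


def contact_satisfaction(coords, predicted_contacts, cutoff=8.0):
    """Probability-weighted fraction of predicted contacts satisfied by a model.

    coords: dict mapping residue index -> (x, y, z) of one representative atom
            per residue (which atom to use is an assumption, not from the paper).
    predicted_contacts: iterable of (i, j, p), where p is the posterior
            probability that residues i and j are spatially proximal.
    cutoff: distance threshold in angstroms (hypothetical value).
    """
    satisfied = 0.0
    total = 0.0
    for i, j, p in predicted_contacts:
        if i not in coords or j not in coords:
            continue  # skip pairs that are missing from the model
        total += p
        if math.dist(coords[i], coords[j]) <= cutoff:
            satisfied += p
    return satisfied / total if total > 0 else 0.0


def rank_models(models, predicted_contacts, cutoff=8.0):
    """Return (score, name) pairs sorted from best to worst satisfaction."""
    scored = [
        (contact_satisfaction(coords, predicted_contacts, cutoff), name)
        for name, coords in models.items()
    ]
    return sorted(scored, reverse=True)


# Toy data: three inferred contacts and two hypothetical candidate models.
contacts = [(1, 10, 0.9), (2, 20, 0.6), (1, 20, 0.5)]
models = {
    "model_a": {1: (0, 0, 0), 2: (3.8, 0, 0), 10: (5, 0, 0), 20: (6, 0, 0)},
    "model_b": {1: (0, 0, 0), 2: (3.8, 0, 0), 10: (20, 0, 0), 20: (-5, 0, 0)},
}
ranking = rank_models(models, contacts)
```

In this toy case `model_a` satisfies all three inferred contacts while `model_b` satisfies only the lowest-probability one, so `model_b` would be the candidate for rejection. Weighting by posterior probability is one possible design choice; an unweighted count over the top-ranked pairs would be an equally plausible reading of the abstract.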

Availability: An implementation of the method is freely available at http://www.doe-mbi.ucla.edu/services

Contact: david@mbi.ucla.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Articles from Bioinformatics are provided here courtesy of Oxford University Press
