Bioinformatics. 2014 May 21;30(18):2681–2683. doi: 10.1093/bioinformatics/btu336

Evol and ProDy for bridging protein sequence evolution and structural dynamics

Ahmet Bakan 1,†, Anindita Dutta 1,†, Wenzhi Mao 1, Ying Liu 1, Chakra Chennubhotla 1, Timothy R Lezon 1, Ivet Bahar 1,*
PMCID: PMC4155247  PMID: 24849577

Abstract

Correlations between sequence evolution and structural dynamics are of utmost importance in understanding the molecular mechanisms of function and their evolution. We have integrated Evol, a new package for fast and efficient comparative analysis of evolutionary patterns and conformational dynamics, into ProDy, a computational toolbox designed for inferring protein dynamics from experimental and theoretical data. Using information-theoretic approaches, Evol coanalyzes conservation and coevolution profiles extracted from multiple sequence alignments of protein families with their inferred dynamics.

Availability and implementation: ProDy and Evol are open-source and freely available under the MIT License from http://prody.csb.pitt.edu/.

Contact: bahar@pitt.edu

1 INTRODUCTION

The significance of protein dynamics in a wide range of biological functions, including cell signaling, regulation and machinery, is widely established (Bahar et al., 2010; Bhabha et al., 2011; Marsh et al., 2012). In many cases, sequence variability goes hand in hand with structural dynamics (Glembo et al., 2012; Liu and Bahar, 2012; Marks et al., 2011; Micheletti, 2012; Worth et al., 2009; Zheng et al., 2005). Structural dynamics correlates with evolvability (Tokuriki and Tawfik, 2009) or sequence and conformational diversity (Friedland et al., 2009) and enables adaptation to substrate binding while maintaining specificity (Liu et al., 2010). To our knowledge, existing software packages usually relate evolutionary properties to static structures (Ashkenazy et al., 2010; Morgan et al., 2006; Wainreb et al., 2011), or they are exclusively dedicated to either sequence analysis (Waterhouse et al., 2009) or structural dynamics (Eyal et al., 2006; Suhre and Sanejouand, 2004). There is a need for methods that allow combined analysis of sequence (co)evolution and structural dynamics. Such analyses would be particularly useful if they could be performed and visualized in a versatile, integrated computing environment.

Toward addressing this need, we introduce v1.5 of ProDy (Bakan et al., 2011) with Evol applications. Highlights of the new version are rich methods for coevolutionary analysis and extensions for analyzing and interpreting structural dynamics. These follow the approach adopted in our recent comparative study of sequence conservation and coevolution patterns versus structure/dynamics properties for a representative set of protein families (Liu and Bahar, 2012), which has been validated in detailed case studies (e.g. General et al., 2014; Liu et al., 2010). A distinctive feature of ProDy is its capability to extract mechanistic information from principal component analysis (PCA) of ensembles of structures (e.g. drug targets) (Bakan and Bahar, 2009). The new release has several new modules and command line applications named ‘evol’ to evaluate sequence conservation and coevolution using information-theoretic and statistical approaches. To our knowledge, this is the only package that enables comparative analysis of protein dynamics with sequence evolution data extracted from multiple sequence alignments (MSAs) for protein families.

2 DESCRIPTION AND FUNCTIONALITY

2.1 Input for ProDy and Evol

The input for ProDy is a set of protein coordinates in PDB format, or simply the PDB ID or protein sequence. The speed of the PDB parser and AtomGroup classes has been increased in the current version, such that parsing coordinates is 4.5–40 times faster than with the Biopython PDB module (Hamelryck and Manderick, 2003), and atomic data storage requires 10 times less memory. We implemented efficient and flexible features for handling MSAs. Notably, the new MSA parser can read various formats at a rate of 700 MB/s (on a 3.6 GHz Intel Xeon CPU with 16 GB RAM and a Samsung SSD) and is up to 80 times faster than the alignment parser of Biopython (Cock et al., 2009). Flexible classes store MSA data parsimoniously in memory and provide ways of subsampling. Sequences can be filtered based on their labels to retain those in certain categories (e.g. human) and sliced to retain specific regions or sequences (e.g. regions matching structurally resolved amino acids). Such refinements, performed in a fraction of a second, allow for real-time processing of large MSAs and systematic analyses of protein families.
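The kinds of MSA refinement described above (filtering by label, slicing columns and dropping poorly occupied positions) can be illustrated with a minimal pure-Python sketch. This is not the ProDy/Evol API: the function names, the toy labels and the toy sequences are all hypothetical and serve only to make the operations concrete.

```python
# Toy alignment as (label, aligned sequence) pairs; labels and sequences are invented.
msa = [
    ("RNAS1_HUMAN", "KETAAAKF-ERQH"),
    ("RNAS1_BOVIN", "KETAAAKFQERQH"),
    ("RNAS4_HUMAN", "QDRMYQRF-LRQH"),
    ("RNAS1_MOUSE", "RESAAQKF-QRQH"),
]


def filter_by_label(msa, keyword):
    """Keep sequences whose label contains the keyword (case-insensitive)."""
    return [(label, seq) for label, seq in msa if keyword.lower() in label.lower()]


def slice_columns(msa, start, stop):
    """Keep alignment columns start..stop-1 for every sequence."""
    return [(label, seq[start:stop]) for label, seq in msa]


def column_occupancy(msa):
    """Fraction of non-gap characters in each alignment column."""
    n = len(msa)
    length = len(msa[0][1])
    return [sum(seq[j] != "-" for _, seq in msa) / n for j in range(length)]


def drop_gappy_columns(msa, min_occupancy):
    """Remove columns whose occupancy falls below min_occupancy."""
    occupancy = column_occupancy(msa)
    keep = [j for j, occ in enumerate(occupancy) if occ >= min_occupancy]
    return [(label, "".join(seq[j] for j in keep)) for label, seq in msa]


human_only = filter_by_label(msa, "human")   # label-based filtering
first_four = slice_columns(msa, 0, 4)        # region slicing
refined = drop_gappy_columns(msa, 0.5)       # occupancy-based refinement
```

Storing the alignment as plain strings keeps the sketch short; a production implementation would use a compact character array so that column operations stay fast for large families.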

2.2 Coevolution analysis

Evol offers a rich set of features for evaluating and plotting evolutionary properties of amino acids. Methods based on mutual information (Dunn et al., 2008), a recent extension of statistical coupling analysis (SCA) (Halabi et al., 2009), observed-minus-expected-squared covariance (Kass and Horovitz, 2002) and direct information (DI) (Marks et al., 2011; Weigt et al., 2009) have been implemented for coevolution analysis. Our implementations of these methods follow the descriptions in their respective papers. We rigorously tested our methods, cross-checking our results with the code that came with the cited papers. Evol can operate in turbo mode when there is sufficient memory (twice the size of the MSA file); otherwise it falls back to a memory-efficient mode. Benchmarking the performance of different implementations also shows that Evol algorithms written in C/Python run 1.5 (DI) to 7 (SCA) times faster than the original implementations in Matlab. Furthermore, Evol takes into account ambiguous (e.g. Asx) and modified (e.g. selenocysteine, pyrrolysine) amino acids, as well as gaps. More specific requirements, such as the occupancy of amino acid positions, can also be satisfied using the preprocessing methods described in the previous section. We recommend that MSAs include at least 100 sequences for SCA and 250 for DI, in accord with the original studies. All methods are available in the API and through the ‘evol’ program, and their usage is illustrated in the Evol Tutorial on the ProDy Web site.
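As an illustration of the information-theoretic quantities involved, the following pure-Python sketch computes the Shannon entropy of an alignment column and the plain mutual information between two columns. It is a textbook formulation, not Evol's implementation: the function names are hypothetical, gaps would be treated as an ordinary symbol, and no phylogeny or entropy correction (such as the one proposed by Dunn et al., 2008) is applied.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Shannon entropy (in nats) of the symbol distribution in one column."""
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in Counter(column).values())


def mutual_information(col_i, col_j):
    """Mutual information (in nats) between two equally long columns."""
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


def mi_matrix(sequences):
    """Symmetric matrix of pairwise mutual information over all columns."""
    columns = list(zip(*sequences))
    length = len(columns)
    matrix = [[0.0] * length for _ in range(length)]
    for i in range(length):
        for j in range(i + 1, length):
            matrix[i][j] = matrix[j][i] = mutual_information(columns[i], columns[j])
    return matrix


# Toy alignment: column 0 is invariant, columns 1 and 2 vary together.
toy = ["AKD", "AKD", "ARE", "ARE"]
entropies = [shannon_entropy(col) for col in zip(*toy)]
mi = mi_matrix(toy)
```

In the toy alignment the invariant column has zero entropy and zero mutual information with every other column, whereas the two perfectly covarying columns reach the maximum possible value for two equiprobable symbols, ln 2.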

2.3 Structure and dynamics analysis

ProDy was originally designed for inferring structural dynamics from PCA of experimental structural datasets, as well as predictions of the Gaussian network model (GNM) or other elastic network models (ENMs) (Bahar et al., 2010). Building on these methods, we have implemented perturbation-response scanning (Atilgan and Atilgan, 2009), an ENM variant with structure-based force constants (Lezon and Bahar, 2010), the rotations-translations of blocks method (Tama et al., 2000), a membrane ENM (Lezon and Bahar, 2012), and ENM reduction (Hinsen et al., 2000) and extension algorithms that enable mapping the model to smaller or larger parts of the studied system. In addition, we added features for essential dynamics analysis (EDA) (Amadei et al., 1993) of MD trajectories. Along with the Evol suite, ProDy now permits comparison of sequence evolution data and structural dynamic patterns predicted by ENMs or deduced from experimental data (PCA) or simulations (EDA).
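For readers unfamiliar with the GNM, the sketch below builds its connectivity (Kirchhoff) matrix from node coordinates in the standard way: off-diagonal elements are -1 for node pairs within a cutoff distance and the diagonal holds each node's contact count. This is a generic illustration rather than ProDy code; the function name, the default cutoff and the toy coordinates are assumptions. Residue mobilities would follow from the diagonal of the pseudoinverse of this matrix, which the sketch does not compute.

```python
import math


def build_kirchhoff(coords, cutoff=10.0):
    """Kirchhoff matrix of a GNM: -1 for contacts, contact counts on the diagonal.

    coords is a list of (x, y, z) node positions (e.g. C-alpha atoms) in angstroms;
    the cutoff default is an illustrative choice, not a value taken from the paper.
    """
    n = len(coords)
    kirchhoff = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                kirchhoff[i][j] = kirchhoff[j][i] = -1.0
                kirchhoff[i][i] += 1.0
                kirchhoff[j][j] += 1.0
    return kirchhoff


# Four hypothetical nodes on a straight line, 3.8 angstroms apart.
nodes = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
gamma = build_kirchhoff(nodes, cutoff=8.0)
```

With an 8 Å cutoff, only the two terminal nodes (11.4 Å apart) are not in contact, so the inner nodes have three neighbors each and the outer nodes two.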

2.4 Comparisons of sequence evolution and structural dynamics

Of particular interest is to understand the dynamical properties of conserved amino acids and vice versa. Once mobility and conservation profiles have been calculated for a given protein or a protein family, the profiles can be compared using Pearson’s or Spearman’s correlation coefficients (Liu and Bahar, 2012). ProDy and Evol API functions enable such comparisons by facilitating mapping between structure- and sequence-based models, i.e. missing residues in the structure or sequence are represented with dummy atoms, and by outputting results as numerical arrays that can be fed directly into the statistical analysis and plotting modules of SciPy, NumPy and Matplotlib.
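A comparison of this kind can be sketched in a few lines of pure Python. The example below drops positions that are missing in either profile (marked here with None, a stand-in for the dummy-atom mapping described above) and then computes Pearson’s and Spearman’s coefficients. The function names and the numerical profiles are hypothetical; in practice one would more likely pass the arrays to SciPy.

```python
import math


def paired(profile_a, profile_b):
    """Keep only positions defined (not None) in both profiles."""
    pairs = [(a, b) for a, b in zip(profile_a, profile_b)
             if a is not None and b is not None]
    xs, ys = zip(*pairs)
    return list(xs), list(ys)


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented per-residue profiles; None marks a residue missing from one model.
entropy = [0.1, 0.4, None, 0.9, 1.6]
mobility = [0.2, 0.3, 0.5, None, 3.0]
x, y = paired(entropy, mobility)
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
```

Spearman’s coefficient depends only on the ordering of the values, so it is less sensitive than Pearson’s to a few highly mobile residues dominating the profile.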

2.5 NMWiz for visual comparative analysis

In v1.5, we enhanced the capabilities of the Normal Mode Wizard (NMWiz) plug-in, which is now distributed with VMD (Humphrey et al., 1996). NMWiz can be used to analyze all molecule and trajectory file formats supported by VMD and to perform a comparative visual analysis of structural dynamics and sequence evolution. Figure 1 displays screenshots of VMD molecular representations (Panel B) and MultiPlot and Heatmapper plots (Panels C and D) showing conservation and mobility profiles and evolutionary and dynamical correlations, all generated through NMWiz.

Fig. 1. NMWiz for comparative analysis of ProDy and Evol output. (A) NMWiz control panel, (B) protein and normal mode representations, (C) mobility and conservation profiles and (D) cross-correlations in dynamics and coevolution generated via NMWiz.

2.6 An illustrative example

Figure 2 illustrates an application of ProDy and Evol to compare the sequence conservation and coevolution patterns of the RNase A family of proteins with the global dynamics of a structurally resolved (Holloway et al., 2009) member of the family. Panel A shows the correlation between sequence entropy (gray bars) and the mobility profile of residues predicted by the GNM based on all modes (black) and on a subset of global modes (eight lowest frequency modes, blue). Active site residues Q11, K41 and H119 have minimal entropy and low mobility. Panel B displays the ribbon diagrams color-coded by residue conservation (left) and intrinsic conformational mobility (right). Highly conserved (low entropy) residues, colored blue on the left diagram, also have lower mobility (blue, right). Conversely, highly variable residues (red, left) tend to occupy highly mobile regions (red, right). A few residues are highlighted (encircled in A and B) to ease the comparison. Panel C shows the mutual information map generated for the family. The bright points (cyan to red) in the heat map refer to pairs that have high coevolution propensities. A number of evolutionarily correlated but sequentially distant (≥6 intervening residues) pairs of sites are highlighted (circles), including spatially close (magenta) or distant (orange) pairs shown in panel D. Notably, (C65, C72) forms a disulfide bridge; (T82, H48) make side chain (polar) interactions (left diagram); and the pairs (N71, Q11) and (T36, D14) are presumably involved in allosteric interactions (right diagram). The right diagram in panel D displays the RNase A crystallized in the presence of an inhibitor-like substrate (thin stick representation) (Holloway et al., 2009). Q11 and N71 form hydrogen bonds with the substrate to ensure binding/recognition specificity, whereas D14 (near the binding site) shows long-range coevolution with a distant residue (T36), suggestive of allosteric communication.
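Selecting strongly coevolving but sequentially distant pairs from a coevolution matrix, as done for panel C, can be sketched as follows. The function name, its parameters and the toy matrix are hypothetical; only the separation criterion (at least six intervening residues, i.e. an index difference of at least seven) is taken from the text.

```python
def top_distant_pairs(matrix, min_intervening=6, top_n=3):
    """Return the top_n highest-scoring pairs (i, j, score) with at least
    min_intervening positions between i and j along the sequence."""
    length = len(matrix)
    scored = [
        (matrix[i][j], i, j)
        for i in range(length)
        for j in range(i + min_intervening + 1, length)
    ]
    scored.sort(reverse=True)
    return [(i, j, score) for score, i, j in scored[:top_n]]


# Toy 10 x 10 symmetric score matrix with a few invented high-scoring pairs.
size = 10
scores = [[0.0] * size for _ in range(size)]
for i, j, value in [(0, 9, 0.9), (1, 8, 0.5), (2, 3, 1.0), (0, 6, 0.2), (0, 7, 0.3)]:
    scores[i][j] = scores[j][i] = value

selected = top_distant_pairs(scores)
```

The adjacent pair (2, 3) has the highest score in the toy matrix but is excluded by the separation filter, as is (0, 6), which has only five intervening positions.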

Fig. 2. Comparison of sequence conservation and residue mobility (panels A and B), and sequence coevolution and spatial location of selected coevolving pairs (panels C and D) for RNase A. See text in Section 2.6 for more details.

3 CONCLUSION

Evol adds new API features and command line applications to ProDy for rapid assessment and visualization of sequence conservation and coevolution patterns, and allows for examining these results in the light of the structure and dynamics of proteins, motivated by our current understanding of the role of intrinsic dynamics in sequence evolution. The ProDy API and the new extensions implemented here can harness the efficient and powerful features of other open-source scientific packages (e.g. NumPy, SciPy and Matplotlib), thus making the API suitable for interactive usage and rapid and easy development of new applications.

Funding: The work was supported by the National Institutes of Health [5R01GM099738 and P41 GM103712 to I.B.] and a fellowship from Tsinghua University [to W.M.].

Conflict of Interest: none declared.

REFERENCES

  1. Amadei A, et al. Essential dynamics of proteins. Proteins. 1993;17:412–425. doi: 10.1002/prot.340170408.
  2. Ashkenazy H, et al. ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids. Nucleic Acids Res. 2010;38:W529–W533. doi: 10.1093/nar/gkq399.
  3. Atilgan C, Atilgan AR. Perturbation-response scanning reveals ligand entry-exit mechanisms of ferric binding protein. PLoS Comput. Biol. 2009;5:e1000544. doi: 10.1371/journal.pcbi.1000544.
  4. Bahar I, et al. Global dynamics of proteins: bridging between structure and function. Annu. Rev. Biophys. 2010;39:23–42. doi: 10.1146/annurev.biophys.093008.131258.
  5. Bakan A, Bahar I. The intrinsic dynamics of enzymes plays a dominant role in determining the structural changes induced upon inhibitor binding. Proc. Natl Acad. Sci. USA. 2009;106:14349–14354. doi: 10.1073/pnas.0904214106.
  6. Bakan A, et al. ProDy: protein dynamics inferred from theory and experiments. Bioinformatics. 2011;27:1575–1577. doi: 10.1093/bioinformatics/btr168.
  7. Bhabha G, et al. A dynamic knockout reveals that conformational fluctuations influence the chemical step of enzyme catalysis. Science. 2011;332:234–238. doi: 10.1126/science.1198542.
  8. Cock PJA, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25:1422–1423. doi: 10.1093/bioinformatics/btp163.
  9. Dunn SD, et al. Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinformatics. 2008;24:333–340. doi: 10.1093/bioinformatics/btm604.
  10. Eyal E, et al. Anisotropic network model: systematic evaluation and a new web interface. Bioinformatics. 2006;22:2619–2627. doi: 10.1093/bioinformatics/btl448.
  11. Friedland GD, et al. A correspondence between solution-state dynamics of an individual protein and the sequence and conformational diversity of its family. PLoS Comput. Biol. 2009;5:e1000393. doi: 10.1371/journal.pcbi.1000393.
  12. General IJ, et al. ATPase subdomain IA is a mediator of interdomain allostery in Hsp70 molecular chaperones. PLoS Comput. Biol. 2014;10:e1003624. doi: 10.1371/journal.pcbi.1003624.
  13. Glembo TJ, et al. Collective dynamics differentiates functional divergence in protein evolution. PLoS Comput. Biol. 2012;8:e1002428. doi: 10.1371/journal.pcbi.1002428.
  14. Halabi N, et al. Protein sectors: evolutionary units of three-dimensional structure. Cell. 2009;138:774–786. doi: 10.1016/j.cell.2009.07.038.
  15. Hamelryck T, Manderick B. PDB file parser and structure class implemented in Python. Bioinformatics. 2003;19:2308–2310. doi: 10.1093/bioinformatics/btg299.
  16. Hinsen K, et al. Harmonicity in slow protein dynamics. Chem. Phys. 2000;261:25–37.
  17. Holloway DE, et al. Influence of naturally-occurring 5′-pyrophosphate-linked substituents on the binding of adenylic inhibitors to ribonuclease A: an X-ray crystallographic study. Biopolymers. 2009;91:995–1008. doi: 10.1002/bip.21158.
  18. Humphrey W, et al. VMD: visual molecular dynamics. J. Mol. Graph. 1996;14:33–38. doi: 10.1016/0263-7855(96)00018-5.
  19. Kass I, Horovitz A. Mapping pathways of allosteric communication in GroEL by analysis of correlated mutations. Proteins. 2002;48:611–617. doi: 10.1002/prot.10180.
  20. Lezon TR, Bahar I. Using entropy maximization to understand the determinants of structural dynamics beyond native contact topology. PLoS Comput. Biol. 2010;6:e1000816. doi: 10.1371/journal.pcbi.1000816.
  21. Lezon TR, Bahar I. Constraints imposed by the membrane selectively guide the alternating access dynamics of the glutamate transporter Glt Ph. Biophys. J. 2012;102:1331–1340. doi: 10.1016/j.bpj.2012.02.028.
  22. Liu Y, Bahar I. Sequence evolution correlates with structural dynamics. Mol. Biol. Evol. 2012;29:2253–2263. doi: 10.1093/molbev/mss097.
  23. Liu Y, et al. Role of Hsp70 ATPase domain intrinsic dynamics and sequence evolution in enabling its functional interactions with NEFs. PLoS Comput. Biol. 2010;6:15. doi: 10.1371/journal.pcbi.1000931.
  24. Marks DS, et al. Protein 3D structure computed from evolutionary sequence variation. PLoS One. 2011;6:e28766. doi: 10.1371/journal.pone.0028766.
  25. Marsh JA, et al. Probing the diverse landscape of protein flexibility and binding. Curr. Opin. Struct. Biol. 2012;22:643–650. doi: 10.1016/j.sbi.2012.08.008.
  26. Micheletti C. Comparing proteins by their internal dynamics: exploring structure-function relationships beyond static structural alignments. Phys. Life Rev. 2012;10:1–26. doi: 10.1016/j.plrev.2012.10.009.
  27. Morgan DH, et al. ET viewer: an application for predicting and visualizing functional sites in protein structures. Bioinformatics. 2006;22:2049–2050. doi: 10.1093/bioinformatics/btl285.
  28. Suhre K, Sanejouand Y-H. ElNemo: a normal mode web server for protein movement analysis and the generation of templates for molecular replacement. Nucleic Acids Res. 2004;32:W610–W614. doi: 10.1093/nar/gkh368.
  29. Tama F, et al. Building-block approach for determining low-frequency normal modes of macromolecules. Proteins. 2000;41:1–7. doi: 10.1002/1097-0134(20001001)41:1<1::aid-prot10>3.0.co;2-p.
  30. Tokuriki N, Tawfik DS. Protein dynamism and evolvability. Science. 2009;324:203–207. doi: 10.1126/science.1169375.
  31. Wainreb G, et al. Protein stability: a single recorded mutation aids in predicting the effects of other mutations in the same amino acid site. Bioinformatics. 2011;27:3286–3292. doi: 10.1093/bioinformatics/btr576.
  32. Waterhouse AM, et al. Jalview Version 2–a multiple sequence alignment editor and analysis workbench. Bioinformatics. 2009;25:1189–1191. doi: 10.1093/bioinformatics/btp033.
  33. Weigt M, et al. Identification of direct residue contacts in protein-protein interaction by message passing. Proc. Natl Acad. Sci. USA. 2009;106:67–72. doi: 10.1073/pnas.0805923106.
  34. Worth CL, et al. Structural and functional constraints in the evolution of protein families. Nat. Rev. Mol. Cell Biol. 2009;10:709–720. doi: 10.1038/nrm2762.
  35. Zheng W, et al. Network of dynamically important residues in the open/closed transition in polymerases is strongly conserved. Structure. 2005;13:565–577. doi: 10.1016/j.str.2005.01.017.

Articles from Bioinformatics are provided here courtesy of Oxford University Press
