Foundations of Biomolecular Modeling

William L Jorgensen

doi:10.1016/j.cell.2013.11.023

Author manuscript; available in PMC: 2014 Dec 5.

Published in final edited form as: Cell. 2013 Dec 5;155(6):1199–1202. doi: 10.1016/j.cell.2013.11.023

PMCID: PMC3892588 NIHMSID: NIHMS544960 PMID: 24315087

Abstract

The 2013 Nobel Prize in Chemistry has been awarded to Martin Karplus, Michael Levitt, and Arieh Warshel for “Development of Multiscale Models for Complex Chemical Systems”. The honored work from the 1970s has provided a foundation for the widespread activities today in modeling organic and biomolecular systems.

Different techniques are needed for computational modeling of small and large molecular systems. For small systems such as isolated organic molecules and complexes, quantum mechanical calculations can provide very accurate results for important properties such as molecular structure, conformational energetics, interaction energies, and spectroscopic properties. Advances in this area were honored by award of the 1998 Nobel Prize in Chemistry to Walter Kohn and John Pople for their important contributions to the development of density functional theory and ab initio quantum theory. For treatment of much larger systems such as proteins and nucleic acids, more approximate methods including classical mechanics are needed. The Nobel Prize this year recognizes seminal work in this area by Martin Karplus (Harvard), Michael Levitt (Stanford), and Arieh Warshel (USC) that set the stage for today's widespread activities in modeling biomolecular systems. The specific studies that are noted in the Nobel Committee's Scientific Background document were from 1968–1976. The setting at that time and impact are considered here.

Molecular Structure and Force Fields

The most fundamental aspect of a molecule is its geometrical structure. It can be determined experimentally by methods such as microwave spectroscopy for small molecules or X-ray diffraction for large ones. It is also desirable to have computational methods to predict structures and related energetics, especially for molecules that are unstable or difficult to isolate. This requires an expression for the energy of the molecule as a function of the coordinates of every atom, E(R). Then, the change of the energy with respect to the displacements (Δx_i, Δy_i, Δz_i) of each atom i can be used to find the nearest energy minimum. Each minimum corresponds to a conformer of the molecule. A simple molecule like butane has only two conformers, anti and gauche, while a protein can have many thousands of conformers. If E(R) were accurate and easily computed, it would be possible to readily obtain the structures for wide-ranging molecular systems. In principle, all of the minima of E(R) can be found by a conformational search procedure, which would yield the structures and relative energies of all conformers. It would also be possible to determine the structures of transition states, which cannot be well characterized by experiment. By comparing the energies of reactants and transition states, energies of activation would be obtained along with the associated kinetic insights. Similarly, if one knew E(R) for collections of molecules, structures of complexes and their interaction energies could be computed.
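The idea of following the energy downhill to the nearest minimum can be sketched in a few lines. The example below is a minimal illustration, not a method from the work discussed here: it reduces butane to a single coordinate, the central C-C-C-C dihedral angle, and uses a three-term cosine series with assumed, unfitted coefficients (V1, V2, V3) chosen only so that the anti and gauche minima appear. A steepest-descent loop then finds whichever minimum is nearest to the starting angle.

```python
import math

# Illustrative coefficients in kcal/mol (assumed values, not a fitted force field).
V1, V2, V3 = 1.2, 0.0, 3.0


def energy(phi):
    """Torsional energy (kcal/mol) for dihedral angle phi in radians; anti is phi = pi."""
    return 0.5 * (V1 * (1 + math.cos(phi))
                  + V2 * (1 - math.cos(2 * phi))
                  + V3 * (1 + math.cos(3 * phi)))


def gradient(phi):
    """Analytical derivative dE/dphi of the energy above."""
    return 0.5 * (-V1 * math.sin(phi)
                  + 2 * V2 * math.sin(2 * phi)
                  - 3 * V3 * math.sin(3 * phi))


def minimize(phi, step=0.02, tol=1e-8, max_iter=10000):
    """Steepest descent: step against the gradient until it (nearly) vanishes."""
    for _ in range(max_iter):
        g = gradient(phi)
        if abs(g) < tol:
            break
        phi -= step * g
    return phi


if __name__ == "__main__":
    for start_deg in (100.0, 150.0):
        phi_min = minimize(math.radians(start_deg))
        print(f"start {start_deg:5.1f} deg -> minimum at "
              f"{math.degrees(phi_min):6.1f} deg, E = {energy(phi_min):.2f} kcal/mol")
```

Starting on either side of the barrier near 120° leads to a different conformer, which is why a conformational search must launch minimizations from many starting geometries. A real calculation works with all 3N Cartesian coordinates and usually a more efficient optimizer than plain steepest descent.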

From the standpoint of quantum mechanics (QM), E(R) can come from solution of the Schrödinger equation for each choice of coordinates R. For acceptable accuracy, this is only viable for relatively small systems, up to ca. 100 atoms. For larger molecules such as a protein, let alone a protein surrounded by thousands of water molecules, a different approach is needed. The problem is an old one and has led to development of “classical”, i.e., non-quantum, treatments for more than a century (Lafitte et al., 2013). The energy expressions in this case contain expressions for bond stretching, angle bending, torsional energetics, and non-bonded terms based on Hooke's Law (1676), Coulomb's Law (1785), and Mie (1903) or Lennard-Jones (1924) potentials. The individual terms require some parameterization, e.g., to assign force constants, reference bond lengths (r₀), and atomic charges, which is done by fitting to target quantities such as known structures, vibrational frequencies, and conformational energy differences. The resultant complete energy expression is referred to as a “force field”, and calculations using force fields are known as force field or “molecular mechanics” (MM) calculations. MM activities began in earnest in the early 1960s with the arrival of digital computers in universities. Initial applications were for conformational analyses of cycloalkanes in several groups including those of James Hendrickson at UCLA and Brandeis, Kenneth Wiberg at Yale, and Norman “Lou” Allinger at Georgia (Allinger, 2011). Though the conformers of cyclohexane were well known at this time, the number and relative energies of the conformers of a molecule like cyclododecane were not known and impossible to establish by experiment.
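For orientation, the terms listed above are often combined in a form like the following. This is a representative sketch of a molecular mechanics energy expression rather than the exact formula of any particular force field; the parameter symbols (k_r, k_θ, θ₀, V_n, γ, ε_ij, σ_ij, q_i) are illustrative names, with harmonic (Hooke's Law) terms for bonds and angles, a cosine series for torsions, and Lennard-Jones plus Coulomb terms for non-bonded atom pairs.

```latex
E(\mathbf{R}) =
    \sum_{\text{bonds}} k_r \,(r - r_0)^2
  + \sum_{\text{angles}} k_\theta \,(\theta - \theta_0)^2
  + \sum_{\text{torsions}} \frac{V_n}{2}\,\bigl[1 + \cos(n\phi - \gamma)\bigr]
  + \sum_{i<j} \left\{
        4\varepsilon_{ij}\left[
            \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12}
          - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}
        \right]
      + \frac{q_i q_j}{4\pi\varepsilon_0\, r_{ij}}
    \right\}
```

Parameterization then amounts to choosing the constants in each sum so that computed structures, vibrational frequencies, and conformational energy differences match the target data.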

The Lifson Group

Molecular mechanics calculations were also initiated by the group of Shneior Lifson at the Weizmann Institute starting with a paper in 1967, again on conformations of cycloalkanes (Bixon and Lifson, 1967). Arieh Warshel was a graduate student in the group at that time and published a more general paper on alkanes with Lifson in 1968 (Lifson and Warshel, 1968). One topic that they addressed was the optimal formulae for non-bonded interactions, i.e., intermolecular interactions and intramolecular interactions between atoms separated by more than three bonds. A common treatment was to use a Buckingham “exp-6” potential, but they found preference for the Lennard-Jones 12–6 alternative augmented with Coulomb interactions between atom pairs. This treatment along with the usual bond-stretching, angle-bending, and torsional terms defined their “consistent force field” (CFF) that was expected to be generally suitable for molecular modeling. Indeed, this model has worked well and has formed the basis for the most common biomolecular force fields used today, i.e., AMBER, CHARMM, GROMOS, and OPLS (Jorgensen and Tirado-Rives, 2005).
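The two non-bonded alternatives mentioned above can be compared with a short sketch. The functions below implement the generic textbook forms of the Lennard-Jones 12–6, Coulomb, and Buckingham exp-6 potentials; the parameter names and the numerical values in the usage lines are assumptions for illustration and are not taken from the CFF papers. The Coulomb constant of about 332.06 kcal·Å/(mol·e²) is the usual conversion factor when charges are in units of e and distances in Å.

```python
import math

COULOMB_K = 332.06  # approx. conversion factor, kcal*Angstrom/(mol*e^2)


def lennard_jones(r, epsilon, sigma):
    """12-6 potential: well depth epsilon, zero crossing at r = sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, qi, qj):
    """Electrostatic energy between point charges qi and qj (units of e)."""
    return COULOMB_K * qi * qj / r


def buckingham(r, a, b, c):
    """exp-6 potential: exponential repulsion plus r^-6 attraction."""
    return a * math.exp(-b * r) - c / r ** 6


def nonbonded_pair(r, epsilon, sigma, qi, qj):
    """Lennard-Jones plus Coulomb energy for one atom pair."""
    return lennard_jones(r, epsilon, sigma) + coulomb(r, qi, qj)


if __name__ == "__main__":
    eps, sig = 0.066, 3.5            # hypothetical carbon-like parameters
    r_min = 2 ** (1 / 6) * sig       # location of the Lennard-Jones minimum
    print(f"LJ minimum at r = {r_min:.3f} A, E = {lennard_jones(r_min, eps, sig):.3f} kcal/mol")
    print(f"LJ + Coulomb at 4.0 A: {nonbonded_pair(4.0, eps, sig, 0.06, -0.18):.3f} kcal/mol")
```

One practical difference between the two forms is that the exp-6 function turns over and goes to negative infinity at very short distances, whereas the 12–6 repulsion rises monotonically.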

Michael Levitt also arrived in the Lifson lab in October 1967 to spend a year between completion of his undergraduate work at King's College in London and graduate studies at the MRC in Cambridge. He participated in further development of the CFF force field and software and in initial calculations on amides, published with Warshel and Lifson. The Lifson group extended the parameterization to the side-chain components necessary for proteins, and they also pioneered testing on crystal structures of hydrocarbons and peptides. A very important event was then the first energy minimizations for entire proteins, myoglobin and lysozyme (Levitt and Lifson, 1969). Their united-atom model (no hydrogens) was employed and energy minimizations were carried out starting from the available X-ray coordinates. This accomplishment was most timely as it established a means to assist in the refinement of crystal structures for proteins, which were just beginning to appear. It is clear that Lifson (1914–2001) and his co-workers deserve much credit for their leadership in recognizing the importance of developing force fields and associated software for modeling biomolecules.

The Next Phase – Mixing QM and MM

Martin Karplus was a visitor in the Lifson group during this period, and Arieh Warshel moved to Harvard in 1969 as a postdoctoral fellow. On a personal note, I was a graduate student there during 1970–75. Other Harvard graduate students and postdoctoral fellows at that time include David Case (now at Rutgers), Barry Honig (Columbia), Peter Rossky (UT-Austin), Andy McCammon (UCSD), Klaus Schulten (Illinois), Attila Szabo (NIH), and Peter Wolynes (Rice) – not a shabby group of co-workers. Most of the students had offices in Prince House, which provided for lively interactions. Karplus, already a prominent theoretician, had interests which included calculations of the conformational energetics and electronic spectra for π-conjugated molecules including chromophores such as retinal (Honig and Karplus, 1971). The lowest-energy excited states for these molecules involve excitations of electrons from occupied π orbitals into unoccupied ones (Figure 1). Thus, in order to compute the associated excitation energies, a quantum mechanical treatment of at least the π electrons is required. In the paper with Honig, this was done using Pariser-Parr-Pople SCF-CI calculations for the π system, while Buckingham terms were used to estimate the change in non-bonded interactions for different conformers. At the time, accurate, purely QM treatment of entire molecules (σ and π electrons) as large as retinal was not practical. The hybrid approach was generalized subsequently in work with Warshel in which a CFF treatment of the σ framework was merged with the PPP calculations for the π system of conjugated molecules (Warshel and Karplus, 1972). The Allinger group was also very active at this time in development of the MMP1 method, which combined PPP calculations with their force field (Allinger and Sprague, 1972). Overall, during the 1970s and 1980s, Allinger and co-workers developed the most sophisticated and accurate force field for the treatment of organic molecules, MM2. However, the added complexity with MM2 kept it from being adopted for modeling biomolecular systems.

Figure 1. Combining Quantum and Molecular Mechanics to Explore Excited States. Upon exposure of a conjugated molecule like stilbene (illustrated) to sufficiently energetic photons, an electron can be excited from the highest occupied molecular orbital (bottom) to the lowest unoccupied one (top). To compute the energy change for such transitions from the ground electronic state to excited states requires a quantum mechanical (QM) treatment for at least the π orbitals. To also compute the relative energies and excitation energies for alternative conformers, treatment of the energy of the rest of the molecule is also required, e.g., by molecular mechanics (MM). Such a hybrid approach was used by the honorees (Warshel and Karplus, 1972).

Warshel subsequently returned to the Weizmann Institute and collaborated with Levitt at the MRC on work that took mixed quantum and molecular mechanical calculations to the next level (Warshel and Levitt, 1976). Their paper, “Theoretical Study of Enzymatic Reactions: Dielectric, Electrostatic and Steric Stabilization of the Carbonium Ion in the Reaction of Lysozyme”, largely provided the framework in which “QM/MM” calculations are carried out today. They performed energy minimizations to gain insight into the mechanism of the hydrolysis of a hexasaccharide by lysozyme. The system consisted of two regions. The side chain of the catalytic Glu35 and most of the sugar residue containing the cleaved C-O bond were treated quantum mechanically, while the rest of the substrate and protein was treated using molecular mechanics (Figure 2). They also considered the influence of the aqueous environment by modeling water molecules as point dipoles on a surrounding grid. The total energy then consists of QM, MM, and interfacial QM/MM parts. The QM model that was used now included all of the valence electrons in a simplified manner using localized orbitals. Though the methodological advances were the principal contribution here, the conclusion from the results that electrostatic stabilization of the transition state for the reaction was more important for the catalytic acceleration than ground-state destabilization on binding became a theme in Warshel's work. The mixing of quantum and molecular mechanical calculations in the manner of the papers by Warshel and Karplus and by Warshel and Levitt is being specifically honored by the Nobel Prize.
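The partitioning described above is commonly summarized by an additive expression of the following kind. The notation is a generic sketch with assumed symbols, not the specific formulation of the 1976 paper: R_QM and R_MM denote the coordinates of the atoms in the quantum and classical regions, respectively.

```latex
E_{\text{total}}(\mathbf{R}) =
    E_{\text{QM}}(\mathbf{R}_{\text{QM}})
  + E_{\text{MM}}(\mathbf{R}_{\text{MM}})
  + E_{\text{QM/MM}}(\mathbf{R}_{\text{QM}}, \mathbf{R}_{\text{MM}})
```

The coupling term carries the interactions between the two regions, such as the electrostatic and steric effects of the classical environment on the quantum region.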

Figure 2. QM/MM for Enzymatic Reactions. Mixed quantum and molecular mechanics calculations are used to model enzymatic reactions. The substrate and key surrounding residues are treated with QM, while the remainder of the system including water molecules is represented with faster molecular mechanics (Warshel and Levitt, 1976).

Adventures in Coarse Graining and Protein Folding

There is no question that the honorees exhibited astute problem selection as also reflected in their early paper on “Computer Simulation of Protein Folding” (Levitt and Warshel, 1975). A highly simplified model of the 58-residue protein, bovine pancreatic trypsin inhibitor (BPTI), was used consisting of two particles per residue located on C_α and the center of the side chain. Interactions between side chains, hydrophobic effects, hydrogen bonding, and torsional energetics for the main chain were taken into account in an approximate manner (Levitt, 1976). Sequential rounds of energy minimizations and heating using normal modes were found to anneal unfolded starting structures into folded ones, though the rms deviations of 6–8 Å from the observed structure of BPTI were substantial and α-helices could not form spontaneously. In fact, results with the same similarity to the native structure of BPTI were obtained from a model just containing glycines and alanines (Hagler and Honig, 1978). Thus, the protein folding problem was not being solved, but Levitt and Warshel provided a computational method that could be used to yield compact polypeptide structures. The honorees were bold in their willingness to push forward with such crude computational models. In the context of protein folding and structure prediction, the extensive efforts of Harold Scheraga at Cornell should also be noted. His group developed or explored many procedures for exhaustive conformational search, developed protein force fields for this purpose, and proceeded in a sound manner starting from small peptides (Némethy and Scheraga, 1977).
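The two-particle-per-residue representation can be illustrated with a small mapping function. This is a hypothetical sketch, not the procedure of the 1975 paper: it assumes a residue is supplied as a dictionary of heavy-atom names and Cartesian coordinates (in Å), takes the side-chain site as the unweighted centroid of the non-backbone atoms, and, as a further assumption, places the side-chain site on C_α for glycine, which has no side-chain heavy atoms.

```python
BACKBONE_ATOMS = {"N", "CA", "C", "O"}


def coarse_grain_residue(atoms):
    """Map one residue to two sites: the C-alpha position and the side-chain centroid.

    atoms: dict of atom name -> (x, y, z), heavy atoms only (assumed input format).
    """
    ca = atoms["CA"]
    side_chain = [xyz for name, xyz in atoms.items() if name not in BACKBONE_ATOMS]
    if not side_chain:
        # Glycine: no side-chain heavy atoms, so reuse C-alpha (an assumption).
        return ca, ca
    n = len(side_chain)
    centroid = tuple(sum(xyz[i] for xyz in side_chain) / n for i in range(3))
    return ca, centroid


if __name__ == "__main__":
    # Made-up coordinates for a serine-like residue, for illustration only.
    residue = {
        "N": (0.0, 1.4, 0.0), "CA": (0.0, 0.0, 0.0),
        "C": (1.5, -0.5, 0.0), "O": (2.3, 0.4, 0.0),
        "CB": (-0.8, -0.6, 1.2), "OG": (-0.8, -2.0, 1.2),
    }
    print(coarse_grain_residue(residue))
```

Applied to every residue of a 58-residue protein such as BPTI, this kind of mapping leaves 116 interaction sites in place of several hundred heavy atoms.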

Interestingly, the use of highly simplified representations of protein residues has experienced a renaissance for modeling of very large systems such as multiple proteins embedded in membranes (Marrink and Tieleman, 2013). This approach is now referred to as “coarse-graining”. The trend is socio-scientifically interesting since much effort was expended in the 1990s to develop all-atom force fields (one interaction site for every atom) to improve upon the united-atom (lacking hydrogens on saturated carbons) force fields from the 1980s. However, considerations of scale and detail justify the need for a range of molecular representations. Problems in computational biology can range from studying the folding of an oligopeptide to modeling a cell. All-atom force fields may be appropriate for the former problem, but much lower-resolution models are needed for initial forays into the latter realm.

Quo Vadis

Today there is much emphasis on performing “simulations” in solution including computations of free energy changes for enzymatic reactions, quantification of protein-ligand binding, and modeling proteins and processes in membranes. “Simulation” has traditionally implied modeling a molecular system using statistical mechanics in a specified ensemble. Temperature and atomic motions are introduced. Unfortunately, “simulation” has become increasingly misused to mean nothing more than “calculation”. The two dominant ways to perform a simulation are through Monte Carlo statistical mechanics (MC) and molecular dynamics (MD). The roots of MC go back to the Austrian physicist Ludwig Boltzmann (1844–1906) and to Yale legend J. Willard Gibbs (1839–1903). MD traces its origins to Isaac Newton (1643–1727). Given the expression for the energy of the system as a function of the atomic coordinates, E(R), the derivatives of E(R) with respect to the atomic positions provide the forces on the atoms. With F = ma and elementary physics one can then compute the evolution of the system as one steps through time. For molecular systems, the time step is dictated by the fastest motions, bond vibrations, and must therefore be small, ca. 1 fs (10⁻¹⁵ s). MD and MC simulations with MM force fields and with QM/MM treatments are all now commonplace.
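The link between bond vibrations and the size of the time step can be made concrete with a short sketch. The code below is an illustrative example under stated assumptions rather than a production integrator: it propagates a one-dimensional harmonic oscillator in mass-scaled units with the velocity Verlet scheme (one common way to step F = ma through time), and uses an assumed stretching wavenumber of 3000 cm⁻¹, typical of a C-H bond, which corresponds to a vibrational period of roughly 11 fs.

```python
import math

C_CM_PER_FS = 2.998e-5  # speed of light in cm/fs


def angular_frequency(wavenumber_cm):
    """Angular frequency in rad/fs for a vibration given in cm^-1."""
    return 2.0 * math.pi * C_CM_PER_FS * wavenumber_cm


def velocity_verlet(omega, dt, n_steps, x0=1.0, v0=0.0):
    """Integrate x'' = -omega^2 x; return the total energy (unit mass) at each step."""
    x, v = x0, v0
    a = -omega ** 2 * x
    energies = [0.5 * v * v + 0.5 * omega ** 2 * x * x]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * a      # half-step velocity update
        x = x + dt * v_half            # full-step position update
        a = -omega ** 2 * x            # force (acceleration) at the new position
        v = v_half + 0.5 * dt * a      # complete the velocity update
        energies.append(0.5 * v * v + 0.5 * omega ** 2 * x * x)
    return energies


if __name__ == "__main__":
    omega = angular_frequency(3000.0)  # assumed C-H stretch wavenumber
    print(f"vibrational period: {2 * math.pi / omega:.1f} fs")
    for dt, steps in ((1.0, 1000), (5.0, 50)):
        e = velocity_verlet(omega, dt, steps)
        print(f"dt = {dt} fs: final energy / initial energy = {e[-1] / e[0]:.3g}")
```

With a 1 fs step the energy stays bounded close to its initial value, while a 5 fs step exceeds the stability limit for this frequency and the trajectory diverges. This is the sense in which the fastest motion in the system dictates the step size.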

The work from 1968–1976, specifically noted by this year's Nobel Prize, laid the foundations of biomolecular modeling. The honorees addressed essential issues including the representation of classical force fields, the format for QM/MM calculations, and the possible utility of coarse-grained models for biomolecules. Karplus, Levitt, and Warshel were also key contributors to the next period which began in 1977 with the introduction of MD simulations for proteins in vacuum. Simulations of proteins in aqueous solution with the water molecules explicitly represented did not begin until the mid-1980s. The latter calculations required development of more complex software and appropriate force fields for both water and the biomolecules. The early force fields needed much improvement for the torsional energetics and the description of non-bonded interactions to obtain, for example, reasonable densities of pure liquids. There were also no water models that were both accurate in reproducing properties of liquid water and in a form readily compatible with the protein force fields (Jorgensen and Tirado-Rives, 2005). In addition, much greater computer resources were needed. To follow a system by MD for 10 ps, the time for a 120° rotation of a methyl group, requires ca. 10,000 time steps, which necessitates computations of E(R) and its derivatives 10,000 times. For 100 ps, the computational demands are easily 1000-fold greater than for an energy minimization.

Biomolecular modeling is now a major activity in the scientific community, carried out in hundreds of research groups around the world. The range of applications is remarkable including refinement of X-ray and NMR structures, analyses of the dynamics and hydration of biomolecules, simulations of protein folding, virtual screening by ligand docking, design of enzyme inhibitors, and studies of the mechanisms of enzymatic reactions, the function of ion channels, transport through membranes, and protein aggregation. Karplus, Levitt, and Warshel were highly visible pioneers and proponents of the field, who richly deserve the Nobel accolades.

ACKNOWLEDGMENTS

Work in the Jorgensen lab is supported primarily by grants from NIH-GM (GM32136) and NIAID (AI44616).

REFERENCES

Allinger NL, Sprague JT. J. Am. Chem. Soc. 1972;95:3893–3907.
Allinger NL. J. Comput. Aided Mol. Des. 2011;25:295–316. doi: 10.1007/s10822-011-9422-4.
Hagler AT, Honig B. Proc. Natl. Acad. Sci. USA. 1978;75:554–558. doi: 10.1073/pnas.75.2.554.
Honig B, Karplus M. Nature. 1971;229:558–560. doi: 10.1038/229558a0.
Lafitte T, Apostolakou A, Avendaño C, Galindo A, Adjiman CS, Müller EA, Jackson G. J. Chem. Phys. 2013;139:154504. doi: 10.1063/1.4819786.
Jorgensen WL, Tirado-Rives J. Proc. Natl. Acad. Sci. USA. 2005;102:6665–6670. doi: 10.1073/pnas.0408037102.
Levitt M, Lifson S. J. Mol. Biol. 1969;46:269–279. doi: 10.1016/0022-2836(69)90421-5.
Levitt M, Warshel A. Nature. 1975;253:694–698. doi: 10.1038/253694a0.
Levitt M. J. Mol. Biol. 1976;104:59–107. doi: 10.1016/0022-2836(76)90004-8.
Lifson S, Warshel A. J. Chem. Phys. 1968;49:5116–5129.
Marrink SJ, Tieleman DP. Chem. Soc. Rev. 2013;42:6801–6822. doi: 10.1039/c3cs60093a.
Némethy G, Scheraga HA. Q. Rev. Biophys. 1977;10:239–352. doi: 10.1017/s0033583500002936.
Warshel A, Karplus M. J. Am. Chem. Soc. 1972;94:5612–5625.
Warshel A, Levitt M. J. Mol. Biol. 1976;103:227–249. doi: 10.1016/0022-2836(76)90311-9.

