Author manuscript; available in PMC: 2014 Sep 3.
Published in final edited form as: Structure. 2013 Sep 3;21(9):10.1016/j.str.2013.08.007. doi: 10.1016/j.str.2013.08.007

Protein Modelling: What Happened to the “Protein Structure Gap”?

Torsten Schwede 1,2
PMCID: PMC3816506  NIHMSID: NIHMS516480  PMID: 24010712

Abstract

Computational modeling and prediction of three-dimensional macromolecular structures and complexes from their sequence has been a long-standing vision in structural biology, as it holds the promise to bypass part of the laborious process of experimental structure solution. Over the last two decades, a paradigm shift has occurred: starting from a situation where the “structure knowledge gap” between the huge number of protein sequences and the small number of known structures hampered the widespread use of structure-based approaches in life science research, today some form of structural information, either experimental or computational, is available for the majority of amino acids encoded by common model organism genomes. Template-based homology modeling techniques have matured to a point where they are now routinely used to complement experimental techniques. With the scientific focus of interest moving towards larger macromolecular complexes and dynamic networks of interactions, the integration of computational modeling methods with low-resolution experimental techniques allows the study of large and complex molecular machines. Computational modeling and prediction techniques still face a number of challenges which hamper their more widespread use by non-expert scientists. For example, it is often difficult to convey the underlying assumptions of a computational technique, as well as the expected accuracy and structural variability of a specific model. However, these aspects are crucial to understand the limitations of a model, and to decide which interpretations and conclusions can be supported.

Introduction

All macromolecular structures are to some degree models, with a variable ratio between experimental data and computational prediction. Typically, the atomic coordinates of heavy atoms in very high-resolution crystal structures are over-determined by the diffraction data, while methods with a lower ratio of experimental observables to parameters increasingly rely on computational tools to construct structural models for the spatial interpretation of the data (e.g. NMR, EM, SAXS, FRET) (Kalinin et al., 2012; Read et al., 2011; Rieping et al., 2005; Trewhella et al., 2013). At the other end of the spectrum are methods for the de novo prediction of macromolecular structures, which aim to predict the native 3D structure (typically a domain of a protein) directly from its amino acid sequence without experimental data by using various ab initio or knowledge-based computational approaches (Baker and Sali, 2001; Das and Baker, 2008).

With the focus of interest in structural biology moving towards larger macromolecular complexes and dynamic networks of interactions, integrative structure solution techniques which can combine experimental data from heterogeneous sources and are able to handle ambiguous or conflicting information are becoming essential. The combination of computational modeling with low-resolution experimental constraints has proven especially powerful (Ward et al., 2013). In retrospect, the most famous 3-dimensional structure of a biological macromolecule could in today’s terms be considered an “integrative low-resolution model”: when Watson and Crick published the structure of the DNA double helix, their model was based on fiber diffraction data at low resolution and additional constraints about chemistry and stoichiometry. Although atomic high-resolution diffraction data only became available much later, this low-resolution model suggested “… a possible copying mechanism for the genetic material” (Watson and Crick, 1953) and initiated a revolution in molecular biology and biomedical research (Collins et al., 2003). Obviously, it is not the atomic resolution or precision of a model which determines its usefulness, but an understanding of which interpretations and conclusions can be supported by the model at hand.

In contrast to the regular structure of the DNA double helix, the structural biology of proteins is much more complex, as each protein has its own unique three-dimensional structure. Since small changes in the sequence of a protein can have strong effects on its biophysical properties, experimental determination of protein structures is a laborious and often unpredictable endeavor. The computational modeling of protein structures has therefore attracted substantial interest in the field of structural bioinformatics as a complement to experimental structural biology efforts to characterize the protein universe (Baker and Sali, 2001; Levitt, 2009).

In the following paragraphs, I will provide an updated view on the “protein structure gap”, arguing that structure information for the majority of amino acids in common model organism proteomes can be provided by a combination of computational and experimental techniques. I will highlight some applications of structure modeling and prediction techniques in various areas of life sciences, and discuss limitations and challenges in communicating model information to potential users of models.

The “protein structure gap” is disappearing

Advances in DNA sequencing techniques are giving rise to an unprecedented avalanche of new sequences (UniProt-Consortium, 2013), and it is obvious that it will be impossible to determine the structures of all proteins of interest experimentally with current techniques (Figure 1). With more sensitive next-generation sequencing techniques becoming available, non-cultivatable organisms are also coming within reach, widening the “protein structure gap” even further despite tremendous progress in automating experimental structure determination techniques.

Figure 1. Mind the gap.

The numbers of entries in the SwissProt and trEMBL sequence databases (UniProt-Consortium, 2013) and the PDB (Berman et al., 2007) are growing exponentially, while the “protein structure gap” between sequences and structures is widening dramatically. Inset: Growth of PDB holdings from 1972 to 2013.

Fortunately, homologous proteins which share detectable sequence similarity have similar three-dimensional structures, and their structural diversity is increasing with evolutionary distance as outlined by Chothia and Lesk in their seminal paper “The relation between the divergence of sequence and structure in proteins” (Chothia and Lesk, 1986). Based on this observation, methods for comparative modeling (aka homology or template-based modeling) of protein structures were developed two decades ago, which allow extrapolating the available experimental structure information to so far uncharacterized protein sequences (Guex et al., 2009; Peitsch, 1995; Sanchez and Sali, 1998; Sutcliffe et al., 1987). Today, comparative modeling techniques have matured into fully automated stable pipelines which provide reliable three-dimensional models accessible also to non-specialists (Table 1).

Table 1.

Commonly used tools and services for protein structure modeling and prediction

The apparent complexity of the protein sequence universe can be explained by multi-domain architectures formed by combinations of single domains characterized by approximately 15,000 sequence family profiles (Levitt, 2009). Thanks to efforts by the experimental structural biology community and world-wide structural genomics efforts (Terwilliger, 2011) an increasing fraction of protein families has at least one member with an experimental structure in the PDB database (Berman et al., 2007). At the same time, sensitive and accurate profile HMM methods have been developed and allow taking advantage of the available sequence databases for detection of remote template relationships (Remmert et al., 2012). As a result, a paradigm shift has occurred during the last decade: starting from a situation, where the “structure knowledge gap” between the number of protein sequences and small number of known structures has hampered the widespread use of structure-based approaches in life science research, today some form of structural information – either experimental or computational – is available for the majority of amino acids encoded by common model organism genomes (Figure 2). The widespread availability of three-dimensional structure information enables rational structure-based approaches in a broad range of applications in life science research (Schwede et al., 2009).

Figure 2. Structural template coverage of the human proteome.

The fraction of amino acids in the human proteome showing sequence similarity to proteins with known structures in the PDB is shown over time, where colors indicate levels of sequence identity as detected by PSI-BLAST (Altschul et al., 1997). The area shaded in red indicates the fraction of about 30% of intrinsically unstructured residues estimated in the human proteome (Colak et al., 2013; Ward et al., 2004). Models built on templates sharing low sequence identity <20% are often of poor quality due to evolutionary divergence between target and template structures, and limitations of the modeling and refinement methods (illustrated in Figure 3B). Prokaryotic proteomes have in general a higher structural coverage than eukaryotic ones (Guex et al., 2009; Zhang et al., 2009).
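The identity levels referred to in Figure 2 can be illustrated with a small calculation. The following Python sketch computes percent identity from a pairwise alignment and flags alignments below the 20% level mentioned in the caption. The function names, the gap character, and the choice to normalize by the number of gap-free aligned columns are assumptions made for illustration; other normalizations are in common use, so the values may differ from those reported by PSI-BLAST.

```python
def percent_identity(aligned_target: str, aligned_template: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: both strings come from the same pairwise alignment and
    therefore have equal length.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def is_low_identity(pid: float, threshold: float = 20.0) -> bool:
    """True if the template falls below the 20% level named in the caption."""
    return pid < threshold


if __name__ == "__main__":
    pid = percent_identity("ACDE", "AGDK")
    print(f"{pid:.1f}% identity, low-identity template: {is_low_identity(pid)}")
```

A per-residue coverage plot such as Figure 2 would additionally require mapping each aligned column back to its position in the target sequence, which this sketch does not attempt.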

However, computational models often represent only fractions of the full-length of a protein of interest and one of the unresolved questions in template based modeling is how to combine information from multiple templates, e.g. different structural domains, into larger complex assemblies. Current techniques are not able to reliably predict the relative orientation of domains of such multi-template models. Also, comparative models still resemble more closely the template than the target structure, and refinement methods are not able to consistently refine models closer to the target structure (MacCallum et al., 2011). The development of reproducible and reliable methods for refinement which consistently improve the accuracy of models by shifting the coordinates closer to the native state is one of the pressing challenges in the field.

The function of a protein almost always involves motions and conformational changes, and a molecular understanding of its mechanism requires a detailed description of the different functional states the structure can explore dynamically. Typical examples include allosteric conformational changes upon binding events, intermediate excited states in reaction cycles, transport and motion phenomena. Frequently, however, these states are not directly observable experimentally at high resolution, but can only be characterized at low resolution, e.g. by changes in chemical shifts, FRET, SAXS, or limited electron density in X-Ray crystallography (Hennig et al., 2013; Kalinin et al., 2012; Ochi et al., 2012). Modeling and simulation will play a central role in exploring these alternative conformations and describing the dynamics of the transitions (Ma et al., 2011; Nygaard et al., 2013; Weinkam et al., 2012).

Modeling protein complexes and interactions

Although structural protein domains often reflect functional modules (Lees et al., 2012), they are rarely found in isolation: many proteins form intricate multi-domain architectures, often assemble to stable oligomeric quaternary states, and frequently incorporate low-molecular weight ligands and metal ions, e.g. as co-factors or structural components. Since molecular interactions are not easily recognizable from the sequence of the protein, different structure-based prediction strategies have been explored for modeling these interactions. Assuming the structures of the proteins participating in a complex are known, docking programs aim to align them in an orientation favorable for interaction (Janin, 2010). However, the details of the structure-affinity relationships are not yet fully understood and for complex systems the estimates for the binding free energy are often only approximate (Kastritis and Bonvin, 2013).

The fraction of protein hetero-complexes in the PDB is small compared to the number of monomeric or homo-oligomeric structures. Nevertheless, it seems that for almost all known protein-protein interactions for which the individual components are structurally characterized, structures of complexes can be identified in the PDB which can be used for template-based prediction approaches (Kundrotas et al., 2012; Stein et al., 2011; Xu and Dunbrack, 2011). In combination with homology modeling of the target proteins, this opens the opportunity for the structural prediction of protein-protein interactions on a genome-wide scale (Stein et al., 2011; Vakser, 2013). Although structure based methods for predicting protein-protein interactions might have a rather high noise level, accuracy comparable to other high-throughput methods can be achieved in combination with orthogonal information (Aloy and Russell, 2006; Zhang et al., 2012). The current state of the art in protein docking and prediction of complexes is regularly assessed in the CAPRI blind prediction experiment (Janin, 2010).

In contrast to structurally well-characterized stable protein complexes, transient protein-protein interactions often involve significant structural changes of the partners upon binding. In fact, a large fraction of the proteome encoded by higher eukaryotes, estimated at around 30%, is assumed to be highly flexible or even intrinsically disordered, and is thought to form a well-defined 3-dimensional structure only upon binding to a partner molecule (Colak et al., 2013; Janin and Sternberg, 2013). These intrinsically disordered protein (IDP) regions enable interactions with many different proteins, and are thought to play a role in the tissue-specific rewiring of protein-protein interactions through specific modifications of IDP regions, for example by post-translational modifications (PTMs) or alternative splicing (Buljan et al., 2013). Sequence variations in protein-protein interfaces are often associated with human diseases as they have the potential to disrupt regulatory networks in the cell (David et al., 2012; Vidal et al., 2011; Wei et al., 2013). Prediction of these interactions and structural modeling of mutations and PTMs remain an open challenge of the highest interest (Uversky and Dunker, 2013; Wass et al., 2011).

Know your limits

“Essentially, all models are wrong, but some are useful” (Box and Draper, 1987). Different applications of macromolecular structure information have different requirements with respect to accuracy and resolution of a model. While atomistic molecular modeling only works with highly accurate sets of coordinates, models of lower resolution are still useful for the rational design of site-directed mutagenesis experiments, epitope mapping, or supporting experimental structure determination (Schwede et al., 2009). In order to be able to decide if a model or prediction is useful for a specific application, knowing its expected accuracy and quality is essential. The assessment of techniques for structure prediction in retrospective experiments such as CASP (Moult et al., 2011), EVA (Koh et al., 2003), LiveBench (Rychlewski and Fischer, 2005) or CAMEO (Haas et al., 2013) has allowed different methods to be compared on the same dataset, thereby establishing the current state of the art and indicating areas which require further development of improved methods. However, the accuracy differences between the best prediction methods on the same protein target are typically small in comparison to the differences between easy and difficult protein targets (Figure 3). Therefore, reliable local error estimates for the atomic coordinates predicted by a model are crucial to judge its applicability for a specific question. Unfortunately, only very few modeling methods today deliver reliable confidence measures for their predictions (Kryshtafovych et al., 2013; Mariani et al., 2011). This is a serious limitation of current modeling techniques, which hinders the more widespread application of models in biomedical research.

Figure 3. Examples of blind structure predictions.

Two proteins of different prediction difficulty are displayed, highlighting the importance of individual structure model quality estimation. A) Crystal structure of the acyl-CoA dehydrogenase from Slackia heliotrinireducens solved by the Midwest Center for Structural Genomics in superposition with the ten best blind predictions in the CASP10 experiment (T0758). In this case, all predictions agree well with the experimental reference structure and the differences between methods are small. B) In more difficult cases, like the crystal structure of a hypothetical protein from Ruminococcus gnavus solved at the Joint Center for Structural Genomics (PDB:4GL6; CASP target T0684-d2), no suitable template structure could be identified, and the ten best predictions show large deviations from the reference structure and among each other. Reliable error estimates for the atomic coordinates of an individual model are therefore crucial to judge its expected accuracy and its suitability for specific applications. Consensus between independent prediction methods has been shown to be a good indicator of model accuracy in general.

Several independent tools for model validation and quality estimation have been developed to overcome this problem. They assess certain structural features of a model such as stereo-chemical correctness (Chen et al., 2010; Hooft et al., 1996; Laskowski et al., 1993) or apply a combination of knowledge-based statistical measures derived from high-resolution crystal structures to provide estimates of model accuracy (Benkert et al., 2011; Ray et al., 2012; Wiederstein and Sippl, 2007). In cases where many predictions for the same protein by different independent methods are available, consensus approaches have proven powerful to identify reliable areas in models and to identify segments with likely deviations from the actual structure (Ginalski et al., 2003; Kryshtafovych et al., 2013; McGuffin et al., 2013).
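The idea behind consensus approaches can be sketched in a few lines of Python. The example below assumes that several models of the same protein have already been superposed into a common coordinate frame and are given as equally long lists of Cα coordinates; it then reports, for each residue, the mean pairwise distance between the models. The function names and the 2.0 Å cutoff are hypothetical choices for illustration only, and the published consensus methods cited above are considerably more elaborate.

```python
from itertools import combinations
from math import dist  # available in Python 3.8+


def per_residue_spread(models):
    """Mean pairwise C-alpha distance between models at each residue position.

    models: list of models, each a list of (x, y, z) tuples of equal length,
    assumed to be superposed in a common frame beforehand.
    """
    if len(models) < 2:
        raise ValueError("at least two models are needed for a consensus")
    n_res = len(models[0])
    if any(len(m) != n_res for m in models):
        raise ValueError("all models must cover the same residues")
    spread = []
    for i in range(n_res):
        distances = [dist(a[i], b[i]) for a, b in combinations(models, 2)]
        spread.append(sum(distances) / len(distances))
    return spread


def low_consensus_residues(models, cutoff=2.0):
    """Indices of residues where the models disagree by more than `cutoff`.

    The cutoff (in the coordinate units, here Angstrom) is an arbitrary
    illustrative value, not a recommended threshold.
    """
    return [i for i, s in enumerate(per_residue_spread(models)) if s > cutoff]


if __name__ == "__main__":
    toy_models = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (1.0, 0.0, 4.0)],
        [(0.0, 0.0, 0.0), (1.0, 3.0, 0.0)],
    ]
    print(per_residue_spread(toy_models))
    print(low_consensus_residues(toy_models))
```

Residues with a small spread are those on which the independent predictions agree; a large spread marks segments where at least some of the models are likely to deviate from the actual structure.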

Validating the accuracy and reliability of a model in order to estimate its suitability for a specific application obviously requires access to the model coordinates and information about the procedures and underlying assumptions which were applied to generate the model. However, since coordinates derived by theoretical modeling cannot be deposited in the PDB (Berman et al., 2006), many manuscripts reporting results of theoretical modeling or simulations are published without making the models available. This makes it impossible for the readers and reviewers of a manuscript to judge whether the experiment is reproducible and the conclusions are justified. In order to alleviate this situation, a public archive of macromolecular structure models (http://modelarchive.org) is currently being established as part of the Protein Model Portal (Haas et al., 2013). The model archive provides a unique stable accession code (DOI) for each deposited model, which can be directly referenced in the corresponding manuscripts. Besides the actual model coordinates, archiving of models should include sufficient details about assumptions, parameters and constraints applied in the simulation to allow the user of a model to assess, and if necessary reproduce, the simulation. In an ideal situation, it should be possible to download a deposition from a model archive and continue the simulation, e.g. by adding newly available experimental data as constraints or by applying more advanced simulation methods developed in the meantime.

Over the last two decades, protein structure modeling and prediction methods have matured to a point where reliable models for many proteins can be generated and successfully used as a substitute for direct experimental structures in a broad variety of applications. In the following, a few recent examples are highlighted.

Application of homology models in structure-based drug discovery

The rational development of drugs increasingly relies on structure-based strategies for identifying potent and selective low molecular weight chemical compounds. The usefulness of homology models in structure based virtual screening has been demonstrated in various retrospective analyses on a broad variety of different targets (Costanzi, 2013; Kairys et al., 2006; McGovern and Shoichet, 2003; Oshiro et al., 2004; Skolnick et al., 2013). One class of proteins with particular interest for structure based drug discovery are GPCRs, where recent advances in experimental structure determination have brought a wide range of receptors within range for comparative modeling techniques (Carlsson et al., 2011; Kobilka and Schertler, 2008). Similar to protein structure prediction, community efforts for blind and independent validation of prediction techniques are crucial to assess the expected accuracy and reliability of modeled protein-ligand interactions (Damm-Ganamet et al., 2013; Kufareva et al., 2011; SAMPL, 2010).

Interestingly, experimental structures will not necessarily give better results than models in structure-based drug discovery. Bajorath and coworkers have analyzed 322 prospective virtual screening campaigns in the scientific literature (Ripphausen et al., 2010), out of which a total of 73 studies successfully utilized homology models. Surprisingly, the potency of the hits identified using homology models was on average higher than for hits identified by docking into X-ray structures. The observation that an X-ray structure is not necessarily the best possible representation of a particular structural state is illustrated by a recent study for predicting the substrate site of metabolism in cytochrome P450 CYP2D6. A model of CYP2D6 was generated based on the X-ray crystal structures of substrate-bound CYP2C5 (Unwalla et al., 2010). During the study, the structure of apo-CYP2D6 also became available. Both the homology model and the experimental structure were used as receptors in docking calculations. While overall the homology model was in good agreement with the CYP2D6 crystal structure, the model consistently outperformed the experimental structure in the docking calculations. This observation can be attributed to structural differences in the substrate recognition sites and demonstrates the importance of correctly describing the conformational changes that occur upon ligand binding. Computational modeling techniques play a crucial role in exploring such alternative receptor conformations.

Structural modeling can not only support the development of new drugs, but also predict the likely effect of amino acid sequence variation in the proximity of the binding site, for example mutations that disrupt drug binding and lead to drug resistance. Computational techniques which can predict resistance a priori are expected to become useful tools for drug discovery and for the design of treatment in the clinic (Safi and Lilien, 2012).

De novo structure prediction techniques guide protein engineering

The de novo structure prediction of naturally occurring proteins is considered the “holy grail” of computational structural biology. Still, despite a few remarkable exceptions, de novo techniques are limited to small proteins and the overall accuracy typically remains rather low (Kinch et al., 2011). Interestingly, however, the very same techniques appear to work remarkably well for the inverse problem: the design of protein sequences with specific properties, e.g. sequences which will fold into a specific structure or form specific interactions. Baker and co-workers have successfully designed a library of idealized protein folds, which allowed them to derive a set of rules guiding future designs (Koga et al., 2012). Others have engineered naturally occurring tandem-repeat proteins, e.g. ankyrin repeat proteins, into a system which now allows the rational design of proteins with specific binding properties and finds a wide range of applications, from research in structural biology to possible future use as therapeutics (Javadi and Itzhaki, 2013; Tamaskovic et al., 2012).

Due to limitations in the computational design methodology, these approaches are often combined with high-throughput screening and in vitro maturation techniques to diagnose modeling inaccuracies and generate high activity binders (Whitehead et al., 2013). This approach was recently applied to design proteins that bind a conserved surface patch on the stem of the influenza hemagglutinin (HA) from the 1918 H1N1 pandemic virus. After affinity maturation, two of the designed proteins, HB36 and HB80, bind H1 and H5 HAs with low nanomolar affinity (Fleishman et al., 2011). Special cases of designing specific interactions are proteins that self-assemble to a desired symmetric architecture. The experimental validation of a designed 24-subunit, 13-nm diameter complex with octahedral symmetry and a 12-subunit, 11-nm diameter complex with tetrahedral symmetry confirmed that the resulting materials closely matched the design models (King et al., 2012). The approach opens new perspectives for the development of self-assembling protein nanomaterials.

While in the previous example, the self-assembly into larger assemblies was a desired design goal, the self-assembly of proteins into amyloid fibrils in the human body is associated with numerous pathologies. Recently, several structures of amyloid fibrils have been determined experimentally (Eisenberg and Jucker, 2012) and provide an interesting target for the development of specific inhibitors of pathological amyloid fibril formation. Computer-aided structure-based design approaches have been successfully applied to develop highly specific peptide inhibitors of amyloid formation of the tau protein associated with Alzheimer's disease and of an amyloid fibril that enhances sexual transmission of human immunodeficiency virus (Sievers et al., 2011). It is worth noting that such peptide inhibitors are not limited to the 20 naturally occurring amino acids.

Co-evolution information in sequence data helps predicting membrane protein structures

The experimental determination of the structures of membrane proteins by X-ray crystallography is challenging and requires a series of sophisticated techniques for protein expression, purification and crystallization. Recent technological advances have led to a strong increase in the number of membrane proteins characterized experimentally. Prominent examples include pharmacologically highly relevant drug targets such as GPCRs (Kobilka and Schertler, 2008), drug efflux transporters (Aller et al., 2009; Nakashima et al., 2013), or ion channels (Gouaux and Mackinnon, 2005). However, the number of membrane protein structures is still small compared to that of soluble proteins, and homology modeling is therefore only possible for a small fraction of membrane proteins.

Recently, computational methods using co-evolution information have been developed to predict contacts between pairs of amino acid residues based on deep multiple sequence alignments. Although similar approaches had been proposed before, recent breakthroughs in the handling of phylogenetic information and in disentangling indirect relationships have resulted in an improved capacity to predict contacts between different protein residues (Burger and van Nimwegen, 2010; de Juan et al., 2013; Hopf et al., 2012; Lapedes et al., 2002; Morcos et al., 2011; Nugent and Jones, 2012). These approaches were shown to be effective in inferring evolutionary co-variation in pairs of sequence positions within families of membrane proteins, and in using the resulting pairwise distance constraints to generate all-atom models. These methods are expected to greatly expand the range of membrane proteins amenable to modeling due to the rapid increase in sequence information, which will allow deriving more comprehensive information on evolutionary constraints (Hopf et al., 2012; Nugent and Jones, 2012).
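To make the notion of co-variation between alignment columns concrete, the following Python sketch scores pairs of columns in a multiple sequence alignment by their mutual information. This is a deliberately simple stand-in and not one of the cited methods: plain mutual information corrects neither for phylogenetic bias nor for indirect couplings, which are exactly the problems the recent approaches address. The function names, the gap handling, and the toy alignment are assumptions for illustration.

```python
from collections import Counter
from math import log2


def column_mutual_information(alignment, i, j, gap="-"):
    """Mutual information (in bits) between alignment columns i and j.

    Rows with a gap in either column are ignored (an illustrative choice).
    """
    pairs = [(s[i], s[j]) for s in alignment if s[i] != gap and s[j] != gap]
    n = len(pairs)
    if n == 0:
        return 0.0
    count_i = Counter(a for a, _ in pairs)
    count_j = Counter(b for _, b in pairs)
    count_ij = Counter(pairs)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        mi += p_ab * log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


def top_covarying_pairs(alignment, top=5):
    """Column pairs (i, j, score) ranked by decreasing mutual information."""
    length = len(alignment[0])
    scored = [
        (column_mutual_information(alignment, i, j), i, j)
        for i in range(length)
        for j in range(i + 1, length)
    ]
    scored.sort(reverse=True)
    return [(i, j, score) for score, i, j in scored[:top]]


if __name__ == "__main__":
    # Toy alignment: columns 0 and 1 change together, column 3 varies independently.
    toy_alignment = ["AKLE", "AKLD", "DRLE", "DRLD"]
    for i, j, score in top_covarying_pairs(toy_alignment, top=3):
        print(f"columns {i} and {j}: {score:.2f} bits")
```

In a contact prediction setting, the highest-scoring pairs would be treated as candidate spatial contacts and passed to a structure generation step as distance constraints; that step is outside the scope of this sketch.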

Combining experimental methods and computational modelling for integrative structure determination

In the simplest case of combining modeling with experimental data, homology models are used as search models for phasing diffraction data in X-ray crystallography by searching for placements of a starting model within the crystallographic unit cell that best account for the measured diffraction amplitudes. This approach, however, requires relatively accurate starting coordinates and often fails for starting models based on remote homologues. This limitation can be overcome by using modeling algorithms to sample near-native conformations of the initial starting model (DiMaio et al., 2011), or even by letting human players of a protein folding game sample different protein conformations (Khatib et al., 2011).

Often, however, preparing protein crystals of sufficient quality to collect high-resolution diffraction data turns out to be the limiting step, and complementary methods to characterize the sample in solution have to be explored. Many of these methods provide data of relatively low resolution, and computational modeling techniques are required to generate likely structural representations compatible with the data. For example, NMR chemical shifts provide important local structural information for proteins, and consistent structure generation from NMR chemical shift data has recently become feasible for proteins with sizes of up to 130 residues at an accuracy comparable to that obtained with the standard NMR protocol (Shen et al., 2009). Small-angle X-ray scattering (SAXS) is a robust and easily accessible technique for the structural characterization of biological macromolecular complexes in solution under physiological conditions. This low-resolution technique is specifically useful for characterizing large and transient complexes and movements in flexible macromolecules (Graewert and Svergun, 2013; Rambo and Tainer, 2013; Schneidman-Duhovny et al., 2012a; Trewhella et al., 2013) and provides valuable constraints, e.g. for restrained macromolecular docking experiments (Schneidman-Duhovny et al., 2011). Recent advances in the measurement and interpretation of single-molecule Förster resonance energy transfer (FRET) experiments allow the generation of highly accurate distance constraints as input for restrained modeling of biomolecules and complexes (Kalinin et al., 2012). Conceptually similar constraints about the spatial proximity of pairs of lysine residues at the surface of a protein can be derived by chemical crosslinking and analysis by peptide mass spectrometry. This type of information appears useful for reconstructing large macromolecular complexes (Walzthoeni et al., 2013).
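How a FRET measurement translates into a distance constraint can be illustrated with the standard Förster relation, E = 1 / (1 + (r/R0)^6), where E is the transfer efficiency, r the donor-acceptor distance and R0 the Förster radius of the dye pair. The Python sketch below converts between efficiency and distance and scores how far a model distance violates such a constraint. The function names, the R0 value of 50 Å and the tolerance are hypothetical illustration values; the sketch also ignores dye linker flexibility and orientation effects, which a rigorous treatment such as the cited work has to account for.

```python
def fret_efficiency(distance: float, r0: float) -> float:
    """Transfer efficiency from the Foerster relation E = 1 / (1 + (r/R0)^6)."""
    if distance < 0 or r0 <= 0:
        raise ValueError("distance must be >= 0 and r0 must be > 0")
    return 1.0 / (1.0 + (distance / r0) ** 6)


def distance_from_efficiency(efficiency: float, r0: float) -> float:
    """Invert the Foerster relation: r = R0 * (1/E - 1)^(1/6)."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    if r0 <= 0:
        raise ValueError("r0 must be > 0")
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


def restraint_violation(model_distance: float, target_distance: float,
                        tolerance: float) -> float:
    """Amount by which a model distance lies outside target +/- tolerance.

    Returns 0.0 when the restraint is satisfied. The flat-bottom form is an
    illustrative choice, not taken from the cited work.
    """
    return max(0.0, abs(model_distance - target_distance) - tolerance)


if __name__ == "__main__":
    R0 = 50.0  # hypothetical Foerster radius in Angstrom; depends on the dye pair
    target = distance_from_efficiency(0.5, R0)
    print(f"E = 0.5 corresponds to r = {target:.1f} A")
    print(f"violation for a 60 A model distance: "
          f"{restraint_violation(60.0, target, tolerance=5.0):.1f} A")
```

Because the efficiency depends on the sixth power of r/R0, the measurement is most sensitive for distances close to R0, which is why the choice of dye pair matters for the constraints that can be derived.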

While traditionally the main focus of macromolecular structure modeling and prediction has been on proteins, RNA structure prediction has recently gained significant attention. Interestingly, many of the approaches developed originally for protein modeling such as homology modeling, de novo prediction, quality estimation, or blind assessment of prediction methods, appear to be applicable with small adaptations to RNA structure modeling (Cruz et al., 2012; Rother et al., 2011; Seetin and Mathews, 2012; Sim et al., 2012). In contrast to proteins, RNA secondary structure can be directly characterized by an experimental approach called SHAPE-Seq, which uses selective 2'-hydroxyl acylation analyzed by primer extension sequencing to inform the modeling of RNA tertiary structure (Aviran et al., 2011).

With the focus of interest in structural biology moving towards larger macromolecular complexes and dynamic networks of interactions, individual experimental techniques are often no longer able to generate sufficient data to determine a unique high-resolution atomic model of the system. The combination of computational modeling with a variety of heterogeneous (low-resolution) experimental constraints has proven extremely powerful (Alber et al., 2008; Ward et al., 2013). One essential feature of such integrative structure solution techniques is that they must be able to handle ambiguous or conflicting information with different levels of accuracy (Alber et al., 2008; Rieping et al., 2005; Schneidman-Duhovny et al., 2012b). In this context, cryo-electron microscopy techniques play a central role in determining the molecular architecture of large macromolecular complexes (Lasker et al., 2012b; Velazquez-Muriel et al., 2012; Zhao et al., 2013). A recent successful example determined by integrative (also known as hybrid) techniques is the molecular architecture of the nuclear pore complex (NPC, Figure 4), a trans-membrane complex of approximately 50 MDa with 456 constituent proteins that selectively transports cargoes across the nuclear envelope (Alber et al., 2007a; Alber et al., 2007b). In a similar approach, the molecular architecture of the 26S proteasome holo-complex was determined (Lasker et al., 2012a).
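
One simple way to combine restraints of different accuracy is to weight each violation by the assumed uncertainty of its data source, so that precise measurements dominate and noisy ones contribute less. The sketch below is a hedged illustration of that idea only, not the scoring function of the cited integrative methods. The class `DistanceRestraint`, the function `restraint_score`, and all numerical values are hypothetical. A crosslink is modeled as an upper bound on a distance, and a FRET measurement as a target distance.

```python
import math
from dataclasses import dataclass


@dataclass
class DistanceRestraint:
    i: str                      # name of the first component
    j: str                      # name of the second component
    target: float               # target (or maximum) distance
    sigma: float                # assumed uncertainty of the data source
    upper_bound: bool = False   # True for crosslink-style "at most" restraints


def restraint_score(coords, restraints):
    """Sum of squared violations, each scaled by its assumed uncertainty."""
    score = 0.0
    for r in restraints:
        distance = math.dist(coords[r.i], coords[r.j])
        violation = distance - r.target
        if r.upper_bound and violation < 0.0:
            # An upper bound is satisfied by any shorter distance.
            violation = 0.0
        score += (violation / r.sigma) ** 2
    return score


# Hypothetical coarse-grained model with three components.
model = {"A": (0.0, 0.0, 0.0), "B": (10.0, 0.0, 0.0), "C": (0.0, 20.0, 0.0)}
restraints = [
    DistanceRestraint("A", "B", target=12.0, sigma=1.0),                    # FRET-like
    DistanceRestraint("A", "C", target=30.0, sigma=5.0, upper_bound=True),  # crosslink-like
]
print(restraint_score(model, restraints))
```

Because conflicting restraints simply add competing penalty terms, a model that minimizes this score is a compromise weighted by the stated uncertainties. Treating ambiguity more rigorously, for example in a Bayesian framework, is beyond this sketch.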

Figure 4. Integrative structure model of the nuclear pore complex (NPC).

The approximately 50 MDa trans-membrane nuclear pore complex consists of 456 constituent proteins and selectively transports cargoes across the nuclear envelope (Alber et al., 2007a; Alber et al., 2007b). Image courtesy of Andrej Sali, UCSF (http://salilab.org).

Beyond individual models: Structural biology of cellular processes

Although many equilibrium models treat cells as a “bag of enzymes”, living cells actively maintain a higher-order internal structure, and the functional aspects of this topological organization in three-dimensional space are gradually being discovered. New technologies for cryo-electron tomography hold the promise of direct observation of cellular processes “in situ” (Briggs, 2013; Robinson et al., 2007; Yahav et al., 2011). Higher-order three-dimensional cellular organization also includes genome topology. Genome-wide biochemical analysis methods in combination with functional data provide insights into how genome topology is maintained and how it influences gene expression and genome maintenance. The intricate interplay between transcriptional activity and spatial organization indicates a self-organizing and self-perpetuating system that uses epigenetic dynamics to regulate genome function in response to regulatory cues and to propagate cell-fate memory. Computational modeling based on data from recently developed chromosome conformation capture technology provides unprecedentedly detailed insights into the spatial organization of genomes (Cavalli and Misteli, 2013; Dekker et al., 2013; Engreitz et al., 2013; Gibcus and Dekker, 2013; Kimura et al., 2013).

The integrative modeling of complex molecular machines combines data from a broad range of experimental techniques such as electron microscopy, chemical cross-linking, proteomics, FRET, or small-angle X-ray scattering in an attempt to build an ensemble of models that is consistent with the available data (Webb et al., 2011). Often, this is an iterative process where ambiguities in the models indicate a lack (or inconsistency) of data, motivating new experiments to resolve these ambiguities. Software packages that integrate modeling capability with powerful visualization tools, such as IMP/Chimera (Yang et al., 2012), Mesoscope (Al-Amoudi et al., 2011), or Life Explorer (Figure 5) (Hornus et al., 2013), will greatly facilitate the generation and interpretation of the resulting models.
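
The idea that ambiguities in an ensemble point to missing data can be made concrete by measuring how much each inter-component distance varies across the models that satisfy the restraints. The following sketch is an illustrative assumption rather than the analysis used by the cited software: it represents each model as a mapping from hypothetical component names to coordinates and flags pairs whose distance has a large standard deviation across the ensemble. Working with distances avoids the need to superimpose the models first.

```python
import math
from itertools import combinations
from statistics import pstdev


def distance_spread(ensemble):
    """Standard deviation of each pairwise distance across an ensemble.

    ensemble: list of models, each a dict mapping component name -> (x, y, z).
    All models are assumed to contain the same components.
    """
    names = sorted(ensemble[0])
    spread = {}
    for a, b in combinations(names, 2):
        distances = [math.dist(model[a], model[b]) for model in ensemble]
        spread[(a, b)] = pstdev(distances)
    return spread


def poorly_determined(ensemble, threshold):
    """Pairs whose distance varies by more than the threshold across the ensemble."""
    return [pair for pair, sd in distance_spread(ensemble).items() if sd > threshold]


# Hypothetical ensemble: component C is placed very differently in the two models.
ensemble = [
    {"A": (0.0, 0.0, 0.0), "B": (10.0, 0.0, 0.0), "C": (0.0, 5.0, 0.0)},
    {"A": (0.0, 0.0, 0.0), "B": (10.0, 0.0, 0.0), "C": (0.0, 25.0, 0.0)},
]
print(poorly_determined(ensemble, threshold=5.0))
```

Pairs returned by `poorly_determined` are candidates for additional experiments, for example a crosslinking or FRET measurement that involves the poorly placed component.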

Figure 5. Speculative data-driven 3D model of the bacterial division machinery.

The model was created with the GraphiteLifeExplorer modeling tool (Hornus et al., 2013). The FtsZ tubulin-like protein (in blue/yellow) is shaped into a double ring. A short filament of the FtsA actin-like protein (in light blue) is shown on the Z-ring. One FtsK motor (in grey) pumps the DNA. This translocase is linked to the membrane (not shown) by six linkers (Vendeville et al., 2011). Image courtesy of Damien Larivière (http://www.lifeexplorer.eu/).

Conveying the underlying assumptions of a computational technique, as well as estimating the expected accuracy and variability of a model, will be crucial to make these computational techniques useful for the non-expert scientist. Understanding the limitations of a model is the first step in deciding which interpretations and conclusions can be supported. One of the key challenges in this respect will be to develop an open environment for sharing data, models, and algorithms, which will allow us to continuously and collaboratively refine our current model of the “molecular sociology of the cell” (Robinson et al., 2007).

Highlights.

  • Structure information is available for the majority of amino acids in many organisms

  • Modeling techniques are today routinely used to complement experimental methods

  • Integrative structural biology allows studying large and complex molecular machines

  • Communicating the limitations of computational techniques remains challenging

Acknowledgements

Special thanks to Andrej Sali (UCSF) for providing the image of the NPC, Damien Larivière (Fondation Fourmentin) for the screen shot of Life Explorer, and Jürgen Haas and Stefan Bienert for help with the template coverage plot.

Abbreviations

CAPRI: Critical Assessment of Prediction of Interactions
CASP: Critical Assessment of Techniques for Protein Structure Prediction
CYP: Cytochrome P450
EM: electron microscopy
FRET: Fluorescence resonance energy transfer
GPCR: G-protein coupled receptor
HMM: Hidden Markov Model
PDB: Protein Data Bank
SAXS: Small-angle X-ray scattering
trEMBL: protein sequences in the UniProt database annotated computationally from DNA sequences

References

  1. Al-Amoudi A, Castano-Diez D, Devos DP, Russell RB, Johnson GT, Frangakis AS. The three-dimensional molecular structure of the desmosomal plaque. Proc Natl Acad Sci U S A. 2011;108:6480–6485. doi: 10.1073/pnas.1019469108.
  2. Alber F, Dokudovskaya S, Veenhoff LM, Zhang W, Kipper J, Devos D, Suprapto A, Karni-Schmidt O, Williams R, Chait BT, et al. Determining the architectures of macromolecular assemblies. Nature. 2007a;450:683–694. doi: 10.1038/nature06404.
  3. Alber F, Dokudovskaya S, Veenhoff LM, Zhang W, Kipper J, Devos D, Suprapto A, Karni-Schmidt O, Williams R, Chait BT, et al. The molecular architecture of the nuclear pore complex. Nature. 2007b;450:695–701. doi: 10.1038/nature06405.
  4. Alber F, Forster F, Korkin D, Topf M, Sali A. Integrating diverse data for structure determination of macromolecular assemblies. Annu Rev Biochem. 2008;77:443–477. doi: 10.1146/annurev.biochem.77.060407.135530.
  5. Aller SG, Yu J, Ward A, Weng Y, Chittaboina S, Zhuo R, Harrell PM, Trinh YT, Zhang Q, Urbatsch IL, et al. Structure of P-glycoprotein reveals a molecular basis for poly-specific drug binding. Science. 2009;323:1718–1722. doi: 10.1126/science.1168750.
  6. Aloy P, Russell RB. Structural systems biology: modelling protein interactions. Nat Rev Mol Cell Biol. 2006;7:188–197. doi: 10.1038/nrm1859.
  7. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
  8. Arnold K, Bordoli L, Kopp J, Schwede T. The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics. 2006;22:195–201. doi: 10.1093/bioinformatics/bti770.
  9. Arnold K, Kiefer F, Kopp J, Battey JN, Podvinec M, Westbrook JD, Berman HM, Bordoli L, Schwede T. The Protein Model Portal. J Struct Funct Genomics. 2009;10:1–8. doi: 10.1007/s10969-008-9048-5.
  10. Aviran S, Trapnell C, Lucks JB, Mortimer SA, Luo S, Schroth GP, Doudna JA, Arkin AP, Pachter L. Modeling and automation of sequencing-based characterization of RNA structure. Proc Natl Acad Sci U S A. 2011;108:11069–11074. doi: 10.1073/pnas.1106541108.
  11. Baker D, Sali A. Protein structure prediction and structural genomics. Science. 2001;294:93–96. doi: 10.1126/science.1065659.
  12. Benkert P, Biasini M, Schwede T. Toward the estimation of the absolute quality of individual protein structure models. Bioinformatics. 2011;27:343–350. doi: 10.1093/bioinformatics/btq662.
  13. Berman H, Henrick K, Nakamura H, Markley JL. The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res. 2007;35:D301–D303. doi: 10.1093/nar/gkl971.
  14. Berman HM, Burley SK, Chiu W, Sali A, Adzhubei A, Bourne PE, Bryant SH, Dunbrack RL, Jr, Fidelis K, Frank J, et al. Outcome of a workshop on archiving structural models of biological macromolecules. Structure. 2006;14:1211–1217. doi: 10.1016/j.str.2006.06.005.
  15. Bordoli L, Schwede T. Automated protein structure modeling with SWISS-MODEL Workspace and the Protein Model Portal. Methods Mol Biol. 2012;857:107–136. doi: 10.1007/978-1-61779-588-6_5.
  16. Box GEP, Draper NR. Empirical model-building and response surfaces. New York: Wiley; 1987.
  17. Briggs JA. Structural biology in situ-the potential of subtomogram averaging. Curr Opin Struct Biol. 2013;23:261–267. doi: 10.1016/j.sbi.2013.02.003.
  18. Buljan M, Chalancon G, Dunker AK, Bateman A, Balaji S, Fuxreiter M, Babu MM. Alternative splicing of intrinsically disordered regions and rewiring of protein interactions. Curr Opin Struct Biol. 2013;23:443–450. doi: 10.1016/j.sbi.2013.03.006.
  19. Burger L, van Nimwegen E. Disentangling direct from indirect co-evolution of residues in protein alignments. PLoS Comput Biol. 2010;6:e1000633. doi: 10.1371/journal.pcbi.1000633.
  20. Carlsson J, Coleman RG, Setola V, Irwin JJ, Fan H, Schlessinger A, Sali A, Roth BL, Shoichet BK. Ligand discovery from a dopamine D3 receptor homology model and crystal structure. Nat Chem Biol. 2011;7:769–778. doi: 10.1038/nchembio.662.
  21. Cavalli G, Misteli T. Functional implications of genome topology. Nat Struct Mol Biol. 2013;20:290–299. doi: 10.1038/nsmb.2474.
  22. Chen VB, Arendall WB, 3rd, Headd JJ, Keedy DA, Immormino RM, Kapral GJ, Murray LW, Richardson JS, Richardson DC. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr D Biol Crystallogr. 2010;66:12–21. doi: 10.1107/S0907444909042073.
  23. Chothia C, Lesk AM. The relation between the divergence of sequence and structure in proteins. EMBO J. 1986;5:823–826. doi: 10.1002/j.1460-2075.1986.tb04288.x.
  24. Colak R, Kim T, Michaut M, Sun M, Irimia M, Bellay J, Myers CL, Blencowe BJ, Kim PM. Distinct types of disorder in the human proteome: functional implications for alternative splicing. PLoS Comput Biol. 2013;9:e1003030. doi: 10.1371/journal.pcbi.1003030.
  25. Collins FS, Green ED, Guttmacher AE, Guyer MS. A vision for the future of genomics research. Nature. 2003;422:835–847. doi: 10.1038/nature01626.
  26. Costanzi S. Modeling G protein-coupled receptors and their interactions with ligands. Curr Opin Struct Biol. 2013;23:185–190. doi: 10.1016/j.sbi.2013.01.008.
  27. Cruz JA, Blanchet MF, Boniecki M, Bujnicki JM, Chen SJ, Cao S, Das R, Ding F, Dokholyan NV, Flores SC, et al. RNA-Puzzles: a CASP-like evaluation of RNA three-dimensional structure prediction. RNA. 2012;18:610–625. doi: 10.1261/rna.031054.111.
  28. Damm-Ganamet KL, Smith RD, Dunbar JB, Jr, Stuckey JA, Carlson HA. CSAR Benchmark Exercise 2011–2012: Evaluation of Results from Docking and Relative Ranking of Blinded Congeneric Series. J Chem Inf Model. 2013. doi: 10.1021/ci400025f.
  29. Das R, Baker D. Macromolecular modeling with rosetta. Annu Rev Biochem. 2008;77:363–382. doi: 10.1146/annurev.biochem.77.062906.171838.
  30. David A, Razali R, Wass MN, Sternberg MJ. Protein-protein interaction sites are hot spots for disease-associated nonsynonymous SNPs. Hum Mutat. 2012;33:359–363. doi: 10.1002/humu.21656.
  31. de Juan D, Pazos F, Valencia A. Emerging methods in protein co-evolution. Nat Rev Genet. 2013;14:249–261. doi: 10.1038/nrg3414.
  32. Dekker J, Marti-Renom MA, Mirny LA. Exploring the three-dimensional organization of genomes: interpreting chromatin interaction data. Nat Rev Genet. 2013;14:390–403. doi: 10.1038/nrg3454.
  33. DiMaio F, Terwilliger TC, Read RJ, Wlodawer A, Oberdorfer G, Wagner U, Valkov E, Alon A, Fass D, Axelrod HL, et al. Improved molecular replacement by density- and energy-guided protein structure optimization. Nature. 2011;473:540–543. doi: 10.1038/nature09964.
  34. Eisenberg D, Jucker M. The amyloid state of proteins in human diseases. Cell. 2012;148:1188–1203. doi: 10.1016/j.cell.2012.02.022.
  35. Engreitz JM, Pandya-Jones A, McDonel P, Shishkin A, Sirokman K, Surka C, Kadri S, Xing J, Goren A, Lander ES, et al. The Xist lncRNA Exploits Three-Dimensional Genome Architecture to Spread Across the X Chromosome. Science. 2013. doi: 10.1126/science.1237973.
  36. Fleishman SJ, Whitehead TA, Ekiert DC, Dreyfus C, Corn JE, Strauch EM, Wilson IA, Baker D. Computational design of proteins targeting the conserved stem region of influenza hemagglutinin. Science. 2011;332:816–821. doi: 10.1126/science.1202617.
  37. Gibcus JH, Dekker J. The hierarchy of the 3D genome. Mol Cell. 2013;49:773–782. doi: 10.1016/j.molcel.2013.02.011.
  38. Ginalski K, Elofsson A, Fischer D, Rychlewski L. 3D-Jury: a simple approach to improve protein structure predictions. Bioinformatics. 2003;19:1015–1018. doi: 10.1093/bioinformatics/btg124.
  39. Gouaux E, Mackinnon R. Principles of selective ion transport in channels and pumps. Science. 2005;310:1461–1465. doi: 10.1126/science.1113666.
  40. Graewert MA, Svergun DI. Impact and progress in small and wide angle X-ray scattering (SAXS and WAXS). Curr Opin Struct Biol. 2013. doi: 10.1016/j.sbi.2013.06.007.
  41. Guex N, Peitsch MC, Schwede T. Automated comparative protein structure modeling with SWISS-MODEL and Swiss-PdbViewer: a historical perspective. Electrophoresis. 2009;30(Suppl 1):S162–S173. doi: 10.1002/elps.200900140.
  42. Haas J, Roth S, Arnold K, Kiefer F, Schmidt T, Bordoli L, Schwede T. The Protein Model Portal--a comprehensive resource for protein structure and model information. Database (Oxford). 2013;2013:bat031. doi: 10.1093/database/bat031.
  43. Hennig J, Wang I, Sonntag M, Gabel F, Sattler M. Combining NMR and small angle X-ray and neutron scattering in the structural analysis of a ternary protein-RNA complex. J Biomol NMR. 2013;56:17–30. doi: 10.1007/s10858-013-9719-9.
  44. Hildebrand A, Remmert M, Biegert A, Soding J. Fast and accurate automatic structure prediction with HHpred. Proteins. 2009;77(Suppl 9):128–132. doi: 10.1002/prot.22499.
  45. Hooft RW, Vriend G, Sander C, Abola EE. Errors in protein structures. Nature. 1996;381:272. doi: 10.1038/381272a0.
  46. Hopf TA, Colwell LJ, Sheridan R, Rost B, Sander C, Marks DS. Three-dimensional structures of membrane proteins from genomic sequencing. Cell. 2012;149:1607–1621. doi: 10.1016/j.cell.2012.04.012.
  47. Hornus S, Levy B, Lariviere D, Fourmentin E. Easy DNA modeling and more with GraphiteLifeExplorer. PLoS One. 2013;8:e53609. doi: 10.1371/journal.pone.0053609.
  48. Janin J. Protein-protein docking tested in blind predictions: the CAPRI experiment. Mol Biosyst. 2010;6:2351–2362. doi: 10.1039/c005060c.
  49. Janin J, Sternberg MJ. Protein flexibility, not disorder, is intrinsic to molecular recognition. F1000 Biol Rep. 2013;5:2. doi: 10.3410/B5-2.
  50. Javadi Y, Itzhaki LS. Tandem-repeat proteins: regularity plus modularity equals design-ability. Curr Opin Struct Biol. 2013. doi: 10.1016/j.sbi.2013.06.011.
  51. Kairys V, Fernandes MX, Gilson MK. Screening drug-like compounds by docking to homology models: a systematic study. J Chem Inf Model. 2006;46:365–379. doi: 10.1021/ci050238c.
  52. Kalinin S, Peulen T, Sindbert S, Rothwell PJ, Berger S, Restle T, Goody RS, Gohlke H, Seidel CA. A toolkit and benchmark study for FRET-restrained high-precision structural modeling. Nat Methods. 2012;9:1218–1225. doi: 10.1038/nmeth.2222.
  53. Kastritis PL, Bonvin AM. On the binding affinity of macromolecular interactions: daring to ask why proteins interact. J R Soc Interface. 2013;10:20120835. doi: 10.1098/rsif.2012.0835.
  54. Kelley LA, Sternberg MJ. Protein structure prediction on the Web: a case study using the Phyre server. Nat Protoc. 2009;4:363–371. doi: 10.1038/nprot.2009.2.
  55. Khatib F, DiMaio F, Cooper S, Kazmierczyk M, Gilski M, Krzywda S, Zabranska H, Pichova I, Thompson J, Popovic Z, et al. Crystal structure of a monomeric retroviral protease solved by protein folding game players. Nat Struct Mol Biol. 2011;18:1175–1177. doi: 10.1038/nsmb.2119.
  56. Kiefer F, Arnold K, Kunzli M, Bordoli L, Schwede T. The SWISS-MODEL Repository and associated resources. Nucleic Acids Res. 2009;37:D387–D392. doi: 10.1093/nar/gkn750.
  57. Kimura H, Shimooka Y, Nishikawa JI, Miura O, Sugiyama S, Yamada S, Ohyama T. The genome folding mechanism in yeast. J Biochem. 2013. doi: 10.1093/jb/mvt033.
  58. Kinch L, Yong Shi S, Cong Q, Cheng H, Liao Y, Grishin NV. CASP9 assessment of free modeling target predictions. Proteins. 2011;79(Suppl 10):59–73. doi: 10.1002/prot.23181.
  59. King NP, Sheffler W, Sawaya MR, Vollmar BS, Sumida JP, Andre I, Gonen T, Yeates TO, Baker D. Computational design of self-assembling protein nanomaterials with atomic level accuracy. Science. 2012;336:1171–1174. doi: 10.1126/science.1219364.
  60. Kobilka B, Schertler GF. New G-protein-coupled receptor crystal structures: insights and limitations. Trends Pharmacol Sci. 2008;29:79–83. doi: 10.1016/j.tips.2007.11.009.
  61. Koga N, Tatsumi-Koga R, Liu G, Xiao R, Acton TB, Montelione GT, Baker D. Principles for designing ideal protein structures. Nature. 2012;491:222–227. doi: 10.1038/nature11600.
  62. Koh IY, Eyrich VA, Marti-Renom MA, Przybylski D, Madhusudhan MS, Eswar N, Grana O, Pazos F, Valencia A, Sali A, et al. EVA: Evaluation of protein structure prediction servers. Nucleic Acids Res. 2003;31:3311–3315. doi: 10.1093/nar/gkg619.
  63. Kryshtafovych A, Barbato A, Fidelis K, Monastyrskyy B, Schwede T, Tramontano A. Assessment of the assessment: Evaluation of the model quality estimates in CASP10. Proteins. 2013. doi: 10.1002/prot.24347.
  64. Kufareva I, Rueda M, Katritch V, Stevens RC, Abagyan R. Status of GPCR modeling and docking as reflected by community-wide GPCR Dock 2010 assessment. Structure. 2011;19:1108–1126. doi: 10.1016/j.str.2011.05.012.
  65. Kundrotas PJ, Zhu Z, Janin J, Vakser IA. Templates are available to model nearly all complexes of structurally characterized proteins. Proc Natl Acad Sci U S A. 2012;109:9438–9441. doi: 10.1073/pnas.1200678109.
  66. Lapedes A, Giraud B, Jarzynski C. Using sequence alignments to predict protein structure and stability with high accuracy. 2002 arXiv:12072484.
  67. Larsson P, Skwark MJ, Wallner B, Elofsson A. Improved predictions by Pcons.net using multiple templates. Bioinformatics. 2011;27:426–427. doi: 10.1093/bioinformatics/btq664.
  68. Lasker K, Forster F, Bohn S, Walzthoeni T, Villa E, Unverdorben P, Beck F, Aebersold R, Sali A, Baumeister W. Molecular architecture of the 26S proteasome holocomplex determined by an integrative approach. Proc Natl Acad Sci U S A. 2012a;109:1380–1387. doi: 10.1073/pnas.1120559109.
  69. Lasker K, Velazquez-Muriel JA, Webb BM, Yang Z, Ferrin TE, Sali A. Macromolecular assembly structures by comparative modeling and electron microscopy. Methods Mol Biol. 2012b;857:331–350. doi: 10.1007/978-1-61779-588-6_15.
  70. Laskowski RA, Macarthur MW, Moss DS, Thornton JM. Procheck - a Program to Check the Stereochemical Quality of Protein Structures. Journal of Applied Crystallography. 1993;26:283–291.
  71. Lees J, Yeats C, Perkins J, Sillitoe I, Rentzsch R, Dessailly BH, Orengo C. Gene3D: a domain-based resource for comparative genomics, functional annotation and protein network analysis. Nucleic Acids Res. 2012;40:D465–D471. doi: 10.1093/nar/gkr1181.
  72. Levitt M. Nature of the protein universe. Proc Natl Acad Sci U S A. 2009;106:11079–11084. doi: 10.1073/pnas.0905029106.
  73. Ma B, Tsai CJ, Haliloglu T, Nussinov R. Dynamic allostery: linkers are not merely flexible. Structure. 2011;19:907–917. doi: 10.1016/j.str.2011.06.002.
  74. MacCallum JL, Perez A, Schnieders MJ, Hua L, Jacobson MP, Dill KA. Assessment of protein structure refinement in CASP9. Proteins. 2011;79(Suppl 10):74–90. doi: 10.1002/prot.23131.
  75. Mariani V, Kiefer F, Schmidt T, Haas J, Schwede T. Assessment of template based protein structure predictions in CASP9. Proteins. 2011;79(Suppl 10):37–58. doi: 10.1002/prot.23177.
  76. McGovern SL, Shoichet BK. Information decay in molecular docking screens against holo, apo, and modeled conformations of enzymes. J Med Chem. 2003;46:2895–2907. doi: 10.1021/jm0300330.
  77. McGuffin LJ, Buenavista MT, Roche DB. The ModFOLD4 server for the quality assessment of 3D protein models. Nucleic Acids Res. 2013;41:W368–W372. doi: 10.1093/nar/gkt294.
  78. Morcos F, Pagnani A, Lunt B, Bertolino A, Marks DS, Sander C, Zecchina R, Onuchic JN, Hwa T, Weigt M. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc Natl Acad Sci U S A. 2011;108:E1293–E1301. doi: 10.1073/pnas.1111471108.
  79. Moult J, Fidelis K, Kryshtafovych A, Tramontano A. Critical assessment of methods of protein structure prediction (CASP)--round IX. Proteins. 2011;79(Suppl 10):1–5. doi: 10.1002/prot.23200.
  80. Nakashima R, Sakurai K, Yamasaki S, Hayashi K, Nagata C, Hoshino K, Onodera Y, Nishino K, Yamaguchi A. Structural basis for the inhibition of bacterial multidrug exporters. Nature. 2013. doi: 10.1038/nature12300.
  81. Nugent T, Jones DT. Accurate de novo structure prediction of large transmembrane protein domains using fragment-assembly and correlated mutation analysis. Proc Natl Acad Sci U S A. 2012;109:E1540–E1547. doi: 10.1073/pnas.1120036109.
  82. Nygaard R, Zou Y, Dror RO, Mildorf TJ, Arlow DH, Manglik A, Pan AC, Liu CW, Fung JJ, Bokoch MP, et al. The dynamic process of beta(2)-adrenergic receptor activation. Cell. 2013;152:532–542. doi: 10.1016/j.cell.2013.01.008.
  83. Ochi T, Wu Q, Chirgadze DY, Grossmann JG, Bolanos-Garcia VM, Blundell TL. Structural insights into the role of domain flexibility in human DNA ligase IV. Structure. 2012;20:1212–1222. doi: 10.1016/j.str.2012.04.012.
  84. Oshiro C, Bradley EK, Eksterowicz J, Evensen E, Lamb ML, Lanctot JK, Putta S, Stanton R, Grootenhuis PD. Performance of 3D-database molecular docking studies into homology models. J Med Chem. 2004;47:764–767. doi: 10.1021/jm0300781.
  85. Peitsch MC. Protein Modeling by E-mail. Nat Biotech. 1995;13:658–660.
  86. Pieper U, Webb BM, Barkan DT, Schneidman-Duhovny D, Schlessinger A, Braberg H, Yang Z, Meng EC, Pettersen EF, Huang CC, et al. ModBase, a database of annotated comparative protein structure models, and associated resources. Nucleic Acids Res. 2011;39:D465–D474. doi: 10.1093/nar/gkq1091.
  87. Raman S, Vernon R, Thompson J, Tyka M, Sadreyev R, Pei J, Kim D, Kellogg E, DiMaio F, Lange O, et al. Structure prediction for CASP8 with all-atom refinement using Rosetta. Proteins. 2009;77(Suppl 9):89–99. doi: 10.1002/prot.22540.
  88. Rambo RP, Tainer JA. Super-resolution in solution X-ray scattering and its applications to structural systems biology. Annu Rev Biophys. 2013;42:415–441. doi: 10.1146/annurev-biophys-083012-130301.
  89. Ray A, Lindahl E, Wallner B. Improved model quality assessment using ProQ2. BMC Bioinformatics. 2012;13:224. doi: 10.1186/1471-2105-13-224.
  90. Read RJ, Adams PD, Arendall WB, 3rd, Brunger AT, Emsley P, Joosten RP, Kleywegt GJ, Krissinel EB, Lutteke T, Otwinowski Z, et al. A new generation of crystallographic validation tools for the protein data bank. Structure. 2011;19:1395–1412. doi: 10.1016/j.str.2011.08.006.
  91. Remmert M, Biegert A, Hauser A, Soding J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods. 2012;9:173–175. doi: 10.1038/nmeth.1818.
  92. Rieping W, Habeck M, Nilges M. Inferential structure determination. Science. 2005;309:303–306. doi: 10.1126/science.1110428.
  93. Ripphausen P, Nisius B, Peltason L, Bajorath J. Quo vadis, virtual screening? A comprehensive survey of prospective applications. J Med Chem. 2010;53:8461–8467. doi: 10.1021/jm101020z.
  94. Robinson CV, Sali A, Baumeister W. The molecular sociology of the cell. Nature. 2007;450:973–982. doi: 10.1038/nature06523.
  95. Roche DB, Buenavista MT, Tetchner SJ, McGuffin LJ. The IntFOLD server: an integrated web resource for protein fold recognition, 3D model quality assessment, intrinsic disorder prediction, domain prediction and ligand binding site prediction. Nucleic Acids Res. 2011;39:W171–W176. doi: 10.1093/nar/gkr184.
  96. Rother M, Rother K, Puton T, Bujnicki JM. RNA tertiary structure prediction with ModeRNA. Brief Bioinform. 2011;12:601–613. doi: 10.1093/bib/bbr050.
  97. Russel D, Lasker K, Webb B, Velazquez-Muriel J, Tjioe E, Schneidman-Duhovny D, Peterson B, Sali A. Putting the pieces together: integrative modeling platform software for structure determination of macromolecular assemblies. PLoS Biol. 2012;10:e1001244. doi: 10.1371/journal.pbio.1001244.
  98. Rychlewski L, Fischer D. LiveBench-8: the large-scale, continuous assessment of automated protein structure prediction. Protein Sci. 2005;14:240–245. doi: 10.1110/ps.04888805.
  99. Safi M, Lilien RH. Efficient a priori identification of drug resistant mutations using Dead-End Elimination and MM-PBSA. J Chem Inf Model. 2012;52:1529–1541. doi: 10.1021/ci200626m.
  100. SAMPL. Proceedings of the Third Annual Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) Challenge and Workshop. June 2009. Montreal, Canada. J Comput Aided Mol Des. 2010;24:257–383.
  101. Sanchez R, Sali A. Large-scale protein structure modeling of the Saccharomyces cerevisiae genome. Proc Natl Acad Sci U S A. 1998;95:13597–13602. doi: 10.1073/pnas.95.23.13597.
  102. Schneidman-Duhovny D, Hammel M, Sali A. Macromolecular docking restrained by a small angle X-ray scattering profile. J Struct Biol. 2011;173:461–471. doi: 10.1016/j.jsb.2010.09.023.
  103. Schneidman-Duhovny D, Kim SJ, Sali A. Integrative structural modeling with small angle X-ray scattering profiles. BMC Struct Biol. 2012a;12:17. doi: 10.1186/1472-6807-12-17.
  104. Schneidman-Duhovny D, Rossi A, Avila-Sakar A, Kim SJ, Velazquez-Muriel J, Strop P, Liang H, Krukenberg KA, Liao M, Kim HM, et al. A method for integrative structure determination of protein-protein complexes. Bioinformatics. 2012b;28:3282–3289. doi: 10.1093/bioinformatics/bts628.
  105. Schwede T, Sali A, Honig B, Levitt M, Berman HM, Jones D, Brenner SE, Burley SK, Das R, Dokholyan NV, et al. Outcome of a workshop on applications of protein models in biomedical research. Structure. 2009;17:151–159. doi: 10.1016/j.str.2008.12.014.
  106. Seetin MG, Mathews DH. RNA structure prediction: an overview of methods. Methods Mol Biol. 2012;905:99–122. doi: 10.1007/978-1-61779-949-5_8.
  107. Shen Y, Vernon R, Baker D, Bax A. De novo protein structure generation from incomplete chemical shift assignments. J Biomol NMR. 2009;43:63–78. doi: 10.1007/s10858-008-9288-5.
  108. Sievers SA, Karanicolas J, Chang HW, Zhao A, Jiang L, Zirafi O, Stevens JT, Munch J, Baker D, Eisenberg D. Structure-based design of non-natural amino-acid inhibitors of amyloid fibril formation. Nature. 2011;475:96–100. doi: 10.1038/nature10154.
  109. Sim AY, Minary P, Levitt M. Modeling nucleic acids. Curr Opin Struct Biol. 2012;22:273–278. doi: 10.1016/j.sbi.2012.03.012.
  110. Skolnick J, Zhou H, Gao M. Are predicted protein structures of any value for binding site prediction and virtual ligand screening? Curr Opin Struct Biol. 2013;23:191–197. doi: 10.1016/j.sbi.2013.01.009.
  111. Stein A, Mosca R, Aloy P. Three-dimensional modeling of protein interactions and complexes is going 'omics. Curr Opin Struct Biol. 2011;21:200–208. doi: 10.1016/j.sbi.2011.01.005.
  112. Sutcliffe MJ, Haneef I, Carney D, Blundell TL. Knowledge based modelling of homologous proteins, Part I: Three-dimensional frameworks derived from the simultaneous superposition of multiple structures. Protein Eng. 1987;1:377–384. doi: 10.1093/protein/1.5.377.
  113. Tamaskovic R, Simon M, Stefan N, Schwill M, Pluckthun A. Designed ankyrin repeat proteins (DARPins) from research to therapy. Methods Enzymol. 2012;503:101–134. doi: 10.1016/B978-0-12-396962-0.00005-7.
  114. Terwilliger TC. The success of structural genomics. J Struct Funct Genomics. 2011;12:43–44. doi: 10.1007/s10969-011-9114-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  115. Trewhella J, Hendrickson WA, Kleywegt GJ, Sali A, Sato M, Schwede T, Svergun DI, Tainer JA, Westbrook J, Berman HM. Report of the wwPDB Small-Angle Scattering Task Force: Data Requirements for Biomolecular Modeling and the PDB. Structure. 2013;21:875–881. doi: 10.1016/j.str.2013.04.020. [DOI] [PubMed] [Google Scholar]
  116. UniProt Consortium. Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res. 2013;41:D43–D47. doi: 10.1093/nar/gks1068.
  117. Unwalla RJ, Cross JB, Salaniwal S, Shilling AD, Leung L, Kao J, Humblet C. Using a homology model of cytochrome P450 2D6 to predict substrate site of metabolism. J Comput Aided Mol Des. 2010;24:237–256. doi: 10.1007/s10822-010-9336-6.
  118. Uversky VN, Dunker AK. The case for intrinsically disordered proteins playing contributory roles in molecular recognition without a stable 3D structure. F1000 Biol Rep. 2013;5:1. doi: 10.3410/B5-1.
  119. Vakser IA. Low-resolution structural modeling of protein interactome. Curr Opin Struct Biol. 2013;23:198–205. doi: 10.1016/j.sbi.2012.12.003.
  120. Velazquez-Muriel J, Lasker K, Russel D, Phillips J, Webb BM, Schneidman-Duhovny D, Sali A. Assembly of macromolecular complexes by satisfaction of spatial restraints from electron microscopy images. Proc Natl Acad Sci U S A. 2012;109:18821–18826. doi: 10.1073/pnas.1216549109.
  121. Vendeville A, Lariviere D, Fourmentin E. An inventory of the bacterial macromolecular components and their spatial organization. FEMS Microbiol Rev. 2011;35:395–414. doi: 10.1111/j.1574-6976.2010.00254.x.
  122. Vidal M, Cusick ME, Barabasi AL. Interactome networks and human disease. Cell. 2011;144:986–998. doi: 10.1016/j.cell.2011.02.016.
  123. Walzthoeni T, Leitner A, Stengel F, Aebersold R. Mass spectrometry supported determination of protein complex structure. Curr Opin Struct Biol. 2013;23:252–260. doi: 10.1016/j.sbi.2013.02.008.
  124. Ward AB, Sali A, Wilson IA. Biochemistry. Integrative structural biology. Science. 2013;339:913–915. doi: 10.1126/science.1228565.
  125. Ward JJ, Sodhi JS, McGuffin LJ, Buxton BF, Jones DT. Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J Mol Biol. 2004;337:635–645. doi: 10.1016/j.jmb.2004.02.002.
  126. Wass MN, David A, Sternberg MJ. Challenges for the prediction of macromolecular interactions. Curr Opin Struct Biol. 2011;21:382–390. doi: 10.1016/j.sbi.2011.03.013.
  127. Watson JD, Crick FH. Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature. 1953;171:737–738. doi: 10.1038/171737a0.
  128. Webb B, Lasker K, Schneidman-Duhovny D, Tjioe E, Phillips J, Kim SJ, Velazquez-Muriel J, Russel D, Sali A. Modeling of proteins and their assemblies with the integrative modeling platform. Methods Mol Biol. 2011;781:377–397. doi: 10.1007/978-1-61779-276-2_19.
  129. Wei Q, Xu Q, Dunbrack RL Jr. Prediction of phenotypes of missense mutations in human proteins from biological assemblies. Proteins. 2013;81:199–213. doi: 10.1002/prot.24176.
  130. Weinkam P, Pons J, Sali A. Structure-based model of allostery predicts coupling between distant sites. Proc Natl Acad Sci U S A. 2012;109:4875–4880. doi: 10.1073/pnas.1116274109.
  131. Whitehead TA, Baker D, Fleishman SJ. Computational design of novel protein binders and experimental affinity maturation. Methods Enzymol. 2013;523:1–19. doi: 10.1016/B978-0-12-394292-0.00001-1.
  132. Wiederstein M, Sippl MJ. ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res. 2007;35:W407–W410. doi: 10.1093/nar/gkm290.
  133. Xu Q, Dunbrack RL Jr. The protein common interface database (ProtCID)--a comprehensive database of interactions of homologous proteins in multiple crystal forms. Nucleic Acids Res. 2011;39:D761–D770. doi: 10.1093/nar/gkq1059.
  134. Yahav T, Maimon T, Grossman E, Dahan I, Medalia O. Cryo-electron tomography: gaining insight into cellular processes by structural approaches. Curr Opin Struct Biol. 2011;21:670–677. doi: 10.1016/j.sbi.2011.07.004.
  135. Yang Z, Lasker K, Schneidman-Duhovny D, Webb B, Huang CC, Pettersen EF, Goddard TD, Meng EC, Sali A, Ferrin TE. UCSF Chimera, MODELLER, and IMP: an integrated modeling system. J Struct Biol. 2012;179:269–278. doi: 10.1016/j.jsb.2011.09.006.
  136. Zhang QC, Petrey D, Deng L, Qiang L, Shi Y, Thu CA, Bisikirska B, Lefebvre C, Accili D, Hunter T, et al. Structure-based prediction of protein-protein interactions on a genome-wide scale. Nature. 2012;490:556–560. doi: 10.1038/nature11503.
  137. Zhang Y. Interplay of I-TASSER and QUARK for template-based and ab initio protein structure prediction in CASP10. Proteins. 2013. doi: 10.1002/prot.24341.
  138. Zhang Y, Thiele I, Weekes D, Li Z, Jaroszewski L, Ginalski K, Deacon AM, Wooley J, Lesley SA, Wilson IA, et al. Three-dimensional structural view of the central metabolic network of Thermotoga maritima. Science. 2009;325:1544–1549. doi: 10.1126/science.1174671.
  139. Zhao G, Perilla JR, Yufenyuy EL, Meng X, Chen B, Ning J, Ahn J, Gronenborn AM, Schulten K, Aiken C, et al. Mature HIV-1 capsid structure by cryo-electron microscopy and all-atom molecular dynamics. Nature. 2013;497:643–646. doi: 10.1038/nature12162.
