Abstract
Chemical shifts are highly sensitive probes harnessed by NMR spectroscopists and structural biologists as conformational parameters to characterize a range of biological molecules. Traditionally, assignment of chemical shifts has been a labor-intensive process requiring numerous samples and a suite of multidimensional experiments. Over the past two decades, the development of complementary computational approaches has bolstered the analysis, interpretation and utilization of chemical shifts for elucidation of high resolution protein and nucleic acid structures. Here, we review the development and application of chemical shift-based methods for structure determination with a focus on ab initio fragment assembly, comparative modeling, oligomeric systems, and automated assignment methods. Throughout our discussion, we point out practical uses, as well as advantages and caveats, of using chemical shifts in structure modeling. We additionally highlight (i) hybrid methods that employ chemical shifts with other types of NMR restraints (residual dipolar couplings, paramagnetic relaxation enhancements and pseudocontact shifts) that allow for improved accuracy and resolution of generated 3D structures, (ii) the utilization of chemical shifts to model the structures of sparsely populated excited states, and (iii) modeling of side-chain conformations. Finally, we briefly discuss the advantages of contemporary methods that employ sparse NMR data recorded using site-specific isotope labeling schemes for chemical shift-driven structure determination of larger molecules. With this review, we aim to emphasize the accessibility and versatility of chemical shifts for structure determination of challenging biological systems, and to point out emerging areas of development that lead us towards the next generation of tools.
Keywords: Protein structure, Automated methods, NMR structure determination, Chemical shifts, CS-Rosetta, Hybrid methods
1. Introduction
Interpretation of chemical shifts serves as the primary workhorse for both solution- and solid-state nuclear magnetic resonance (NMR) studies of biological systems. Chemical shifts are highly reproducible, sensitive parameters with far-reaching utility in characterizing the structure and dynamics of a diverse range of biomolecules which carry out important cellular functions. When fully deciphered, chemical shifts report on the local magnetic environment of nuclei, allowing for insights into backbone secondary structure, sidechain conformations, dynamics, solvation, and hydrogen bonding [1–4]. In recent years, the notion that chemical shifts can be used to streamline the NMR structure determination process has been supported by landmark methods to determine high resolution structures solely from chemical shift data [4–6]. Many of these methods aim to circumvent the need to acquire and analyze additional restraints, typically provided by nuclear Overhauser effect (NOE), residual dipolar coupling (RDC) or paramagnetic relaxation enhancement (PRE) measurements [7]. Moreover, the majority of chemical shift-based structure determination methods can be easily integrated with complementary classical approaches, where backbone resonance assignment is performed through standard triple-resonance experiments, and sidechain resonances are assigned through TOCSY (Total Correlation Spectroscopy)- and COSY (Correlation Spectroscopy)-type experiments [8,9]. More recent advances in chemical shift-based methods stem from the combination of new, highly sensitive experiments, and robust computational algorithms required for analysis of complex NMR datasets [5].
Currently, there are more than 119,000 and 12,000 3D structures in the Protein Data Bank (PDB) [10] solved by X-ray crystallography and NMR spectroscopy, respectively. Notwithstanding the limitations in size and dynamic complexity, NMR methods allow for high resolution studies of macromolecules in their functionally relevant, aqueous environment [11]. Typically, a diverse set of parameters, including chemical shifts, RDC, and NOE restraints are combined to yield NMR structural ensembles showing various levels of precision and accuracy [12–14]. The entire process involves a series of time-consuming steps, such as peak picking, chemical shift and NOE cross-peak assignment, structure calculation, validation and refinement [15]. Thus, while structure determination by X-ray crystallography is relatively streamlined towards high throughput applications, structure determination by NMR remains a lengthy, labor-intensive process where the main bottlenecks (assignment of backbone/sidechain resonances and NOE cross-peaks) would greatly benefit from automated procedures [16]. Moreover, conventional NMR structure determination methods rely on a large number of complementary multidimensional datasets and abundant restraint densities (of the order of 15–20 restraints per residue [17]) to obtain reliable, high resolution structures. Thus, the use of automated algorithms for chemical shift assignment [7,18] and NOE cross-peak assignment [7,15] is an attractive avenue to facilitate the structure determination process. Here, protocols which utilize chemical shifts to drive structure modeling can have a significant impact on the quality and efficiency of automated assignments.
Alongside the development of sophisticated NMR methods, remarkable progress has been achieved in the field of protein and nucleic acid structure prediction. In particular, the application of algorithms that make use of information from known structures in the PDB, together with physically realistic energy functions, has enabled the modeling of biomolecular structures with reasonable accuracy [19,20]. However, modeling structures exclusively from their sequence is extremely challenging and limited in scope to smaller systems as highlighted by the CASP (Critical Assessment of Techniques for Protein Structure Prediction) initiative [21,22]. Concurrently, the combination of computational methods together with sparse experimental data from a variety of techniques has provided a new paradigm for medium to low-resolution structural studies of macromolecular assemblies and other complex systems [23,24]. The integration of such methods with chemical shift data offers an opportunity for further automation of the NMR structure determination process. Thus, during the past decade, a large number of studies have highlighted the great promise held by this new approach towards studies of monomeric proteins, protein complexes and other biomolecules spanning a range of sizes and dynamic complexities (Fig. 1). In this review, we discuss the development of NMR chemical shift-based methods with a focus on de novo structure determination of proteins in the solution-state (Table 1). Specifically, we illustrate how the implementation of automated methods that take advantage of chemical shift data, either exclusively or in combination with additional experimental restraints, allows for accurate structure determination in a range of applications.
Fig. 1.

Progress in structure determination of biological molecules utilizing NMR chemical shifts. De novo structures determined using sequence-based (ab initio, NK-Lysin and Calbindin) or chemical shift-based (Interleukin-1β, SEN15, NeR45A, TCTP, ALG13, PSE-4 β-lactamase, MBP, Sc.ai5γ RNA) methods. Molecular weight (y-axis, kDa) is shown as a function of time (x-axis, year). The examples shown are Calbindin [58], NK-Lysin [57], Interleukin-1β [64], Sc.ai5γ RNA [210], SEN15 [59], NeR45A [60], TCTP [214], ALG13 [82], PSE-4 β-lactamase [81] and MBP [87,109,128]. PDB IDs and methods used are given for each example.
Table 1.
Evolution of methods (a subset mentioned in this review) that have influenced the development of NMR chemical shift-based structure determination.
| Year | Description (method) | Reference |
|---|---|---|
| 1993 | NOE assignment and structure refinement approach (ASNO) | [213] |
| 1995 | Introduction of ambiguous NOE distance restraints | [144] |
| 1997 | Protein structure determination methods using fragment assembly (Ab initio) | [57,58] |
| 1997 | NMR structure calculation using ambiguous distance restraints in ARIA (Ambiguous Restraints for Iterative Assignment) | [211,215] |
| 1997 | Automated 1H and 13C chemical shift prediction (SHIFTY) | [40] |
| 1997 | Prediction of DNA chemical shifts (NUCHEMICS) | [200] |
| 1999 | Prediction of backbone torsion angles using chemical shifts (TALOS) | [68] |
| 2000 | Structure prediction using MFR and NMR dipolar couplings | [62] |
| 2001 | Prediction of 15N, 13Cα, 13Cβ, and 13C′ chemical shifts using density functional database (SHIFTS) | [45] |
| 2001 | Prediction of RNA chemical shifts (NUCHEMICS) | [202] |
| 2003 | NMR chemical shift prediction using artificial neural networks (PROSHIFT) | [46] |
| 2003 | Calculation of 1H, 13C, and 15N chemical shifts (SHIFTX) | [41] |
| 2003 | Protein-Protein docking using ambiguous NMR distance restraints obtained from chemical shift perturbation (HADDOCK) | [139] |
| 2003 | Macromolecular structure determination package (Xplor-NIH) | [172] |
| 2005 | MFR approach to determine structures using chemical shifts and dipolar coupling homology (MFR+) | [64] |
| 2006 | Protein structure prediction using NMR chemical shifts and unassigned NOESY data (ASDP) | [133] |
| 2007 | Yet another chemical shift-based structure prediction method (CHESHIRE) | [59] |
| 2007 | Chemical shift prediction from torsion angles and sequence homology (SPARTA) | [42] |
| 2008 | Blind protein structure determination using NMR chemical shifts (CS-Rosetta) | [60] |
| 2008 | A flexible protein-protein docking method guided by NMR chemical shifts (CamDock) | [152] |
| 2009 | Determination of NMR chemical shifts from interatomic distances (CamShift) | [44] |
| 2009 | Determination of homo-oligomeric complexes using chemical shifts with docking protocol | [149] |
| 2009 | Structure determination using chemical shift restraints and Monte Carlo simulations | [98] |
| 2009 | Hybrid method for backbone torsion angle prediction from NMR chemical shifts (TALOS+) | [36] |
| 2010 | Structure determination using backbone chemical shifts and RDC data from iterative CS-Rosetta (CS-RDC-Rosetta) | [81] |
| 2010 | MD simulations of proteins using NMR chemical shifts (CS-MD) | [97] |
| 2010 | Improved chemical shift prediction using artificial neural networks (SPARTA+) | [43] |
| 2010 | A new web server for protein-protein docking using data derived from NMR (HADDOCK Web Server) | [140] |
| 2010 | Xaa-Pro peptide bond conformation prediction using chemical shifts and amino acid sequence (Proline Omega angle prediction (PROMEGA)) | [38] |
| 2011 | New and improved chemical shift predictor (SHIFTX2) | [47] |
| 2011 | Algorithm for modeling oligomers from chemical shifts and dipolar couplings (RosettaOligomers) | [137] |
| 2011 | Modeling protein complexes using NMR chemical shifts (CHESHIRE/CamDock) | [154] |
| 2011 | Applying pseudocontact shifts to perform protein-protein docking using HADDOCK | [234] |
| 2011 | Chemical shift prediction of methyl groups (CH3SHIFT) | [181] |
| 2012 | A new molecular fragment replacement method that uses protein docking algorithms to perform fragment assembly | [65] |
| 2012 | Improved sampling in protein structure calculation guided by NMR restraints (RASREC-Rosetta) | [82] |
| 2012 | Structure determination of large proteins using sparse NMR data collected from deuterated samples (RASREC-Rosetta) | [87] |
| 2012 | Utilizing pseudocontact shifts to guide protein structure prediction (PCS-Rosetta) | [232] |
| 2012 | Accurate structure modeling using sparse NMR data and restraints derived from homologous proteins (CS-HM-Rosetta) | [104] |
| 2013 | Utilizing backbone amide pseudocontact shifts generated from paramagnetic tags to determine protein structures (GPS-Rosetta) | [233] |
| 2013 | Backbone and sidechain torsion angle prediction using artificial neural networks (TALOS-N) | [110] |
| 2013 | Applying proton chemical shifts to determine helical structures of nucleic acids (Chemical Shift de novo Structure Derivation Protocol Employing Singular Value Decomposition (CHEOPS)) | [195] |
| 2014 | A new automated NOE assignment algorithm and structure calculation with improved sampling in Rosetta (AutoNOE-Rosetta) | [214] |
| 2014 | Using proton chemical shifts to predict non-canonical RNA motifs (CS-Rosetta-RNA) | [210] |
| 2015 | Modeling of protein structures using chemical shift homology (CS-RosettaCM/Pomona) | [109] |
| 2015 | Structure modeling using NMR data and evolutionary coupling restraints (EC-NMR) | [128] |
| 2015 | Protein structure determination using sparse NMR data and evolutionary information (RASREC-Rosetta with evolutionary restraints) | [127] |
| 2016 | HADDOCK2.2 web server for modeling protein complexes using NMR data | [142] |
| 2016 | Protein structure determination using solvent accessible surface area from paramagnetic relaxation enhancement measurements (sPRE-CS-Rosetta) | [229] |
| 2017 | Backbone and sidechain resonance assignment method using two 4D spectra | [216] |
2. Modeling backbone torsion angles from chemical shifts
The strong dependence of isotropic chemical shifts on the local backbone geometry has motivated the development of methods to determine protein torsion (or dihedral) angles from a basic set of shifts (reviewed in [2]). The backbone and sidechain dihedral angles define the local conformation of a polypeptide chain, thus governing secondary structure, sidechain packing, and overall tertiary/quaternary folds. As a result, determination of peptide backbone (ɸ, ψ and ω) and sidechain (χi) dihedral angles directly from chemical shifts provides valuable restraints during structure calculation and refinement [25–27], especially in the absence of NOE restraints. While torsion angles can be directly measured through scalar J-couplings [28–30] or dipole-dipole and dipole-chemical shift anisotropy (CSA) cross-correlated relaxation [31,32], these methods are less applicable to larger proteins, where NMR resonances undergo significant line-broadening and reduction in signal-to-noise, with both effects limiting the applications of these experiments [33].
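To make the notion of a torsion angle concrete, the following minimal Python sketch computes the signed dihedral angle defined by four atomic positions (for example C′(i−1), N(i), Cα(i), C′(i) for ɸ). The function name `dihedral_deg` and the coordinates in the usage note are illustrative assumptions and are not taken from any of the programs discussed in this review.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond.

    0 degrees corresponds to a cis (eclipsed) arrangement of p0 and p3,
    +/-180 degrees to a trans arrangement.
    """
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)          # normal of the plane (p0, p1, p2)
    n2 = _cross(b1, b2)          # normal of the plane (p1, p2, p3)
    b1_len = math.sqrt(_dot(b1, b1))
    x = _dot(n1, n2)
    y = b1_len * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))
```

For instance, `dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))` evaluates to 90 degrees, while moving the last point to `(1, 0, 1)` gives the cis value of 0 degrees.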
Whereas DFT (Density Functional Theory)-based calculations can provide valuable insights into the dependence of chemical shift values on local geometry for different nuclei (such as 1Hα, 15N, 13Cα, 13Cβ, 13Cʹ), empirical approaches have generally been more successful in directly modeling the local backbone structure [4]. Here, torsion angles are predicted based on amino acid sequence and chemical shift similarity relative to a curated database of assigned chemical shifts/protein structure pairs derived from the PDB and Biological Magnetic Resonance Bank (BMRB) [34]. Several methods stemming from this approach have already been reviewed in [4,5,7]. As expected, the accuracy of these chemical shift-based methods is higher (>73% of the torsion angle predictions lie within 30° from corresponding angles in the reference structures) compared to approaches that exclusively use sequence information (>65% of the predictions lie within 36° from the reference angles) [35]. Larger databases, together with more consistent referencing of chemical shift values are expected to improve the accuracy and precision of torsion angle predictions even further [36,37]. In addition to torsion angles, alternative methods use chemical shift data to elucidate secondary structures [4,5,7] and less frequently occurring Xaa-Pro peptide bond conformations [38]. While all these methods provide local restraints, which can be applied directly in structure calculations, the majority of NMR structure determination programs are supplemented with additional sources of NMR data in order to obtain highly converged models.
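The database-matching idea behind these empirical predictors can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the algorithm of any published program: the toy database `TOY_DB`, its secondary-shift values, the single-residue matching, and the function names are all hypothetical, whereas real methods score multi-residue segments and also use sequence similarity.

```python
import math

# Hypothetical toy database: ((Calpha, Cbeta) secondary shifts in ppm, phi, psi in degrees).
TOY_DB = [
    ((3.0, -0.5), -62.0, -42.0),    # helix-like entry
    ((2.8, -0.3), -65.0, -40.0),    # helix-like entry
    ((-1.5, 2.0), -120.0, 130.0),   # strand-like entry
    ((-1.2, 2.3), -125.0, 135.0),   # strand-like entry
]

def circular_mean_deg(angles):
    """Mean of angles in degrees that respects the 360-degree periodicity."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))

def predict_phi_psi(query_shifts, db=TOY_DB, k=2):
    """Average phi/psi of the k database entries with the most similar shifts."""
    def shift_distance(entry):
        return sum((q - d) ** 2 for q, d in zip(query_shifts, entry[0]))
    best = sorted(db, key=shift_distance)[:k]
    phi = circular_mean_deg([entry[1] for entry in best])
    psi = circular_mean_deg([entry[2] for entry in best])
    return phi, psi
```

With this toy database, a query of `(2.9, -0.4)` is matched to the two helix-like entries and a query of `(-1.4, 2.1)` to the two strand-like entries.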
3. Predicting chemical shifts from known structures
Alongside the efforts to derive structural restraints from chemical shifts, prediction of the chemical shifts of known structures is an active field of research where a variety of sequence-based, structure-based and hybrid approaches have been developed [4,5]. Accurate chemical shift predictions from an available X-ray structure can actively aid in making chemical shift assignments, as well as in structure modeling and validation [5,39]. Analogous to dihedral angle prediction methods discussed in Section 2, sequence-based chemical shift prediction methods are based on the concept that sequence similarity often results in local structure and chemical shift homology. This idea forms the basis of SHIFTY [40], an early method that is able to predict 1H and 13C backbone chemical shifts with a Pearson’s correlation coefficient (between experimental and predicted values) >0.85 for all atoms (proton and carbon) when the query protein shares >35% sequence identity to a reference structure with known chemical shift assignments available in the BMRB. Following SHIFTY, there have been a number of methods (extensively reviewed in [4,5,7]), which can predict chemical shifts of up to 40 atom types in less than a few seconds per residue. The correlation coefficients of a few of these methods (SHIFTX [41], SPARTA (Shifts Predicted from Analogy in Residue type and Torsion Angle) [42], SPARTA+ [43], CamShift [44], SHIFTS [45], PROSHIFT [46] and SHIFTX2 [47]) range from 0.7 to 0.99 for 15N, 13Cα, 13Cβ, 13C′, 1HN, and 1Hα backbone atoms; arguably, an exception is 1HN, where SHIFTX2 (correlation coefficient of 0.97) outperforms other methods by a large margin (correlation coefficients of other methods lie between 0.51 and 0.71), when tested on a benchmark set of 61 proteins [47]. SHIFTX/SHIFTX2 and SPARTA/SPARTA+ are comparable in performance and are widely used due to their speed and accuracy.
The practical utility of some of these methods in de novo structure determination is discussed throughout this manuscript.
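The accuracy figures quoted in this section are Pearson correlation coefficients between experimental and predicted shifts. As a minimal sketch of how such a comparison might be computed for one atom type, the Python functions below evaluate the correlation coefficient and the root-mean-square deviation for two equally ordered lists of shifts. The function names are illustrative and do not correspond to any of the predictors named above.

```python
import math

def pearson_r(experimental, predicted):
    """Pearson correlation coefficient between two equally long lists of shifts."""
    n = len(experimental)
    mean_x = sum(experimental) / n
    mean_y = sum(predicted) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(experimental, predicted))
    sxx = sum((x - mean_x) ** 2 for x in experimental)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    return sxy / math.sqrt(sxx * syy)

def shift_rmsd(experimental, predicted):
    """Root-mean-square deviation (ppm) between experimental and predicted shifts."""
    squared = sum((x - y) ** 2 for x, y in zip(experimental, predicted))
    return math.sqrt(squared / len(experimental))
```

Both quantities are usually reported per atom type, because the shift ranges of, say, 15N and 1Hα differ by more than an order of magnitude.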
4. De novo structure determination from chemical shift data
Structure prediction methods have shown great success for small to medium sized proteins (<150 residues) using various strategies, including ab initio [48,49], comparative modeling [50], fold prediction and threading [51]. However, de novo modeling of larger proteins remains a challenging problem owing to the number of feasible solutions to the conformational search problem [52]. In spite of the computational complexities involved in ab initio structure determination, there has been significant progress in the development of sophisticated methods in the past two decades. Rosetta [53], QUARK [54] and I-TASSER (Iterative Threading Assembly Refinement) [55] are a few software packages that have been widely applied to construct 3D structural models starting only from a query amino acid sequence. These methods are particularly useful when no homologs with known structures can be identified in the PDB, which is often the case for larger proteins [48].
Bowie and Eisenberg pioneered the field of ab initio prediction with their concept of generating protein models from an assembly of short, overlapping backbone fragments derived from a structural database [56] and this idea laid the foundation for several early implementations of ab initio methods [57,58]. In these methods, the selection of fragments from a high resolution protein structure database is based on sequence or secondary structure homology. Following selection, fragments are assembled using Monte Carlo-simulated annealing methods that minimize physically realistic energy functions to produce 3D structural models. Although these methods can produce low-energy models exhibiting the native fold for small proteins (<100 residues), larger targets pose significant challenges due to the quality of fragments used for assembly and the exponential increase in the conformational search space. To overcome the drawbacks of these early ab initio methods, several protocols that exploit NMR chemical shifts have emerged (reviewed in [7]). A great majority of these methods employ the generalized fragment assembly framework (Fig. 2). Here, sequence and chemical shifts are used to derive local structural features, such as torsion angle restraints and secondary structure information, which further guide the fragment selection from a database of high resolution X-ray structures. The selected fragments are then used to build low-resolution models starting from a fully extended protein chain, characterized by bond lengths, bond angles, and backbone torsion angles. Here, bond lengths and angles are typically fixed to ideal values and the peptide bond is assumed to be planar; therefore, it is the backbone torsion angles (ɸ and ψ) that effectively define the conformation of a protein chain [59,60].
This reduction in the degrees of freedom from Cartesian to torsion angle space greatly boosts the performance of a search towards the native conformation using Monte Carlo-based optimization methods. Lastly, sidechain rotamers [61] and minor deviations from ideal values are introduced on low-resolution conformations, which undergo further refinement to reduce steric clashes, and finally to produce all-atom structural models.
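The fragment insertion search described above can be illustrated with a toy Metropolis Monte Carlo loop in torsion angle space. The sketch below is a simplified illustration under stated assumptions: the chain is represented only by (ɸ, ψ) pairs, `toy_energy` is a hypothetical stand-in that penalizes deviation from a target set of angles (real protocols use physically motivated energy functions), and `frag_lib` is an assumed data layout in which entry `s` holds the candidate fragments that may be inserted starting at residue `s`.

```python
import math
import random

def angle_diff(a, b):
    """Smallest signed difference between two angles in degrees."""
    return (a - b + 180.0) % 360.0 - 180.0

def toy_energy(torsions, target):
    """Hypothetical energy: squared angular deviation from target (phi, psi) pairs."""
    total = 0.0
    for (phi, psi), (t_phi, t_psi) in zip(torsions, target):
        total += angle_diff(phi, t_phi) ** 2 + angle_diff(psi, t_psi) ** 2
    return total / 1000.0

def fragment_assembly(n_res, frag_lib, energy, n_steps=2000, kT=1.0, seed=0):
    """Metropolis Monte Carlo fragment insertion starting from an extended chain."""
    rng = random.Random(seed)
    chain = [(-180.0, 180.0)] * n_res            # fully extended starting chain
    e = energy(chain)
    best, best_e = list(chain), e
    for _ in range(n_steps):
        start = rng.randrange(len(frag_lib))     # random insertion window
        frag = rng.choice(frag_lib[start])       # random fragment for that window
        trial = chain[:start] + list(frag) + chain[start + len(frag):]
        e_trial = energy(trial)
        # Metropolis criterion: always accept downhill, sometimes accept uphill moves.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / kT):
            chain, e = trial, e_trial
            if e < best_e:
                best, best_e = list(chain), e
    return best, best_e
```

In this picture, chemical shifts enter through the composition of `frag_lib`: the more native-like the candidate fragments for each window, the fewer insertion trials are needed to reach a low-energy chain.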
Fig. 2.

General pipeline for de novo structure determination using fragment assembly. Backbone fragments are first generated from high resolution structures obtained from a curated database derived from the PDB. Fragments are then ranked according to primary amino acid sequence information and/or chemical shift-based torsion angle predictions. The assembly of selected fragments generates low-resolution models, which are iteratively refined using a physically relevant energy function to yield the final structures.
An early fragment assembly method (Molecular Fragment Replacement or MFR) utilizes experimental chemical shifts and dipolar couplings to model low-resolution structures [62]. Akin to “Molecular Replacement” methods, widely used in X-ray crystallography refinement, this approach is inspired by previous work that determined local structural fragments using sparse NOE data [63]. Specifically, MFR performs a pairwise search of a fragment database where the best candidates are selected by a χ2-test that evaluates the difference between (i) measured and calculated dipolar couplings from a singular value decomposition procedure (dipolar homology) and (ii) experimental and predicted chemical shift values for each selected fragment. The well-fitting fragments provide backbone torsion angle restraints that are applied during low-resolution structure modeling. Finally, the predicted models are further refined in order to improve their agreement with experimental chemical shifts and dipolar couplings. The utility of MFR is highlighted by a measured backbone RMSD (Root Mean Square Deviation) of 1.2 Å (angstrom) between modelled and X-ray structures of ubiquitin [62], suggesting that folds for small proteins can be captured using solely the chemical shifts and dipolar couplings, thereby alleviating the need to acquire and analyze NOESY data. Further improvements have been made to this algorithm at various stages including fragment search, assembly, sidechain placement, and structure refinement by employing other NMR parameters, such as J-couplings and NOEs [64,65]. While the early MFR method could accurately model backbone structures of small proteins, a significant limitation remained with respect to sidechain placement [62], which has been addressed in more recent methods [64,65].
One of the methods that surpassed previously existing approaches in structure prediction accuracy, addressed known limitations and worked well for a wide range of molecular weights is CHESHIRE (Chemical Shift Restraints) [59]. The CHESHIRE procedure further extends the fragment-based strategy introduced by MFR [62] together with NMR chemical shifts to model protein structures. This procedure consists of three phases that follow a generalized fragment assembly framework (Fig. 2). First, the 3PRED algorithm [59] is used to predict secondary structures of three- and nine-residue fragments using NMR chemical shifts in conjunction with sequence-based secondary structure propensities. Specifically, the experimental chemical shifts are used to estimate the probability of an amino acid adopting a given secondary structure type. Additionally, secondary structure propensities are computed from known structures in the ASTRAL Structural Classification of Proteins (SCOP) database [66] according to a classification performed by the STRIDE (Structural Identification) algorithm [67]. Second, the TOPOS algorithm [59] similar to TALOS (Torsion Angle Likelihood Obtained from Shift and sequence similarity) [68] is used to predict backbone torsion angles using combined information drawn from experimental chemical shifts and previously determined secondary structural elements. The primary difference between TOPOS and TALOS lies in how they use chemical shifts for scoring (for instance, TOPOS ignores 1HN chemical shifts). Following torsion angle prediction, candidate fragment conformations are selected from a structural database and filtered according to an energy function consisting of empirical terms sensitive to torsion angles, secondary structure and agreement between experimental and back-calculated chemical shifts (computed via SHIFTX). 
Third, fragments are assembled to generate low-resolution models using a Monte Carlo-simulated annealing method, where the query protein chain adopts a simplified representation consisting of only the backbone atoms and Cβ atoms of sidechain groups. Finally, sidechain rotamers from the Dunbrack library [61] are added to the low-resolution models following optimization of an all-atom energy function using Monte Carlo-based techniques. The all-atom energy function (E) contains a chemical shift term (back-calculated from the predicted models using SHIFTX) and a molecular dynamics (MD) force field according to Eq. (1):
| (1) |
Here, the numerator recapitulates an MD-derived force field containing terms from van der Waals, electrostatic, solvent, pair-wise mean force and hydrogen bonding effects [59]. The denominator is an experimental scoring function, where C_X measures the correlation between experimental chemical shifts and those back-calculated using SHIFTX for X ∈ {Hα, N, Cα, Cβ} atoms. The corresponding k values are user-defined constants.
In the original implementation of CHESHIRE’s refinement procedure, the optimization of a combined MD/chemical shift scoring function is sufficient to bias the calculations towards the native state [59]. In particular, the derivatives of the chemical shift-based terms are not explicitly computed, which would be required for any approach employing an MD-based integration of Newton’s equations of motion. Apart from exhibiting high accuracy in predicting benchmarked proteins of sizes up to ~14 kDa (kilodalton) (backbone atom RMSD <1.8 Å) [59], this approach performed very well (RMSD values <2.6 Å for proteins up to 160 aa (amino acids) in size) in the third round of the CASD-NMR (Critical Assessment of Automated Structure Determination of Proteins by NMR) evaluation [69,70].
While the CHESHIRE procedure clearly demonstrated that chemical shift-derived fragments can be used to build near-native structures, the same was independently highlighted by the Chemical Shift (CS)-Rosetta approach (Fig. 3A) [60]. CS-Rosetta combines a highly optimized ab initio fragment assembly protocol [53], employing different sampling schemes and multiple low-resolution energy functions, together with NMR chemical shifts to yield accurate structural models [60]. This protocol leverages high resolution structures from the PISCES (Public server for culling sets of protein sequences from the PDB by sequence identity) database [71], corresponding secondary structures assigned by DSSP (Dictionary of Secondary Structure of Proteins) [72] and predicted chemical shifts of 13Cα, 13Cβ, 13C′, 15N, 1Hα and 1HN nuclei from SPARTA, to generate a library of native-like fragments, as opposed to fragments obtained from sequence information alone (Fig. 3A). In the earlier implementation of this protocol, the fragment selection was carried out using the MFR approach [62], which was later superseded by a modular algorithm that incorporates various experimental data terms and/or other prior biases during the fragment selection process [73]. The fragments selected using chemical shifts possess ɸ, ψ backbone torsion angles that are closer to their native values as shown for two representative arginine residues (at positions 45 and 61) in a 72 aa protein (Fig. 3B and C). Following three- and nine-residue fragment selection, assembly and refinement are carried out using Rosetta’s Metropolis Monte Carlo procedure. CS-Rosetta makes use of Rosetta’s simplicity and speed during the fragment assembly process. As mentioned previously, a protein chain in Rosetta is represented using torsion angle coordinates. By convention, if any torsion angle within a protein chain is perturbed, the angular motion affects all the atoms towards the C-terminus (called the lever-arm effect).
To eliminate this effect, a protein chain is depicted using a directed (from N- to C-terminus), acyclic graph called a fold tree [53,74]. In a fold tree, the nodes represent residues and the edges represent covalent connections. During the angular motion of torsions, breaks are introduced in a protein chain to eliminate the lever-arm effect. The fold information is preserved using long-range connections, which are also added as edges in the tree [53,74]. The use of the fold tree framework greatly simplifies the assembly process by allowing the sampling of non-local structural features while remaining in torsion angle space [74].
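The lever-arm effect that motivates the fold tree can be demonstrated with a planar analogue of a torsion-space chain. The sketch below is only a two-dimensional illustration and is not Rosetta's fold tree implementation; the functions `build_chain` and `displacement` and the turn angles used in the usage note are hypothetical.

```python
import math

def build_chain(turn_angles_deg, bond_length=1.0):
    """Place the points of a planar chain defined by internal turn angles.

    Each entry is the change in heading (degrees) applied before the next bond,
    so every point depends on all turn angles that precede it.
    """
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for turn in turn_angles_deg:
        heading += math.radians(turn)
        x += bond_length * math.cos(heading)
        y += bond_length * math.sin(heading)
        points.append((x, y))
    return points

def displacement(points_a, points_b):
    """Per-point distance between two conformations of the same chain."""
    return [math.hypot(a[0] - b[0], a[1] - b[1]) for a, b in zip(points_a, points_b)]
```

Changing the fourth turn angle of `[0, 10, 20, -15, 5, 30]` by 20 degrees leaves the first four points untouched, while every downstream point moves, and points farther from the perturbed joint generally move more. Introducing a chain break downstream of the perturbed angle, as a fold tree does, confines the motion to the segment before the break.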
Fig. 3.

Chemical shifts aid the selection of high quality backbone fragments in CS-Rosetta. (A) Cα RMSD to native structure among the top twenty, three-residue backbone fragments for each residue position in the sequence of a 72 aa query protein. Fragments are selected based on sequence profile and secondary structure prediction in Rosetta (red). Alternatively, chemical shifts can be used to bias the fragment selection process in CS-Rosetta (blue). (B and C) Distribution of ϕ,ψ backbone dihedral angles in the top 100 fragments derived using Rosetta (red) or CS-Rosetta (blue) for two representative Arg residues at positions (B) 45 and (C) 61. Green dots indicate the ϕ,ψ dihedral angles observed in the native structure (X-ray).
The low-resolution phase employs a simplified representation of a protein chain, where only backbone heavy atoms and a centroid site representing the sidechain are present for every residue. Thereafter, Monte Carlo fragment trial steps are applied, starting from a fully extended protein chain, to yield collapsed backbone folds. In this phase, the low-resolution Rosetta energy function [75–77] is continuously minimized while sampling fragments from the library. In the high resolution phase, sidechain atoms are added to low-resolution structures. The placement of sidechain atoms is challenging because the number of possible rotamer combinations grows exponentially with the number of amino acids in a query protein sequence. To solve this problem, Rosetta uses a module called the packer [53]. The packer selects feasible rotamers for each amino acid from the Dunbrack library [61], and then uses a Monte Carlo-simulated annealing method to search for the optimal rotamer combinations. The high resolution (or full-atom) models are further refined using a Monte Carlo- and gradient-based optimization process that performs small backbone perturbations to resolve steric clashes. The final full-atom models are retained based on a high resolution scoring function [75–77], and the quality of fit of the predicted models to the experimental chemical shifts. The compliance of the predicted models with experimental shifts is assessed by back-calculating the chemical shifts from the models using SPARTA. The chemical shift deviation in the models is further used to adjust the Rosetta all-atom energy, E, according to Eq. (2):
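$$E' = E + c \sum_{i} \sum_{j} \frac{\left(\delta_{i,j}^{\mathrm{pred}} - \delta_{i,j}^{\mathrm{exp}}\right)^{2}}{\sigma_{i,j}^{2}} \tag{2}$$
Here, E′ is the adjusted energy, i and j index the nuclei and residues, respectively, $\delta_{i,j}^{\mathrm{pred}}$ is the back-calculated chemical shift value obtained from SPARTA, $\delta_{i,j}^{\mathrm{exp}}$ is the experimental chemical shift value, $\sigma_{i,j}$ is a standard deviation and c is a weight factor, which can be optimized according to benchmark calculations.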
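The rescoring step described by Eq. (2) can be sketched in a few lines of Python. This is a minimal illustration, assuming the penalty takes the usual chi-square form (squared deviation between back-calculated and experimental shifts, divided by the squared standard deviation, summed over nuclei and residues, and scaled by c). The function names, the simplification to one standard deviation per nucleus type, and the example numbers are illustrative assumptions and are not taken from CS-Rosetta or SPARTA.

```python
from typing import Dict, Tuple

# Key: (nucleus name, residue index); value: chemical shift in ppm.
ShiftTable = Dict[Tuple[str, int], float]


def chemical_shift_chi2(predicted: ShiftTable,
                        experimental: ShiftTable,
                        sigma: Dict[str, float]) -> float:
    """Sum of squared, sigma-normalized deviations over assigned shifts.

    Assumption: one standard deviation per nucleus type (hypothetical
    simplification of the per-nucleus, per-residue sigma in the text).
    """
    chi2 = 0.0
    for (nucleus, residue), delta_exp in experimental.items():
        key = (nucleus, residue)
        if key not in predicted:
            continue  # no back-calculated value available for this shift
        deviation = predicted[key] - delta_exp
        chi2 += (deviation / sigma[nucleus]) ** 2
    return chi2


def rescored_energy(energy: float,
                    predicted: ShiftTable,
                    experimental: ShiftTable,
                    sigma: Dict[str, float],
                    weight: float) -> float:
    """Adjusted energy: original energy plus weighted chemical shift penalty."""
    return energy + weight * chemical_shift_chi2(predicted, experimental, sigma)


# Toy example with made-up numbers (not real data).
predicted = {("CA", 1): 58.0, ("CB", 1): 30.5}
experimental = {("CA", 1): 57.0, ("CB", 1): 30.0, ("N", 2): 120.0}
sigma = {"CA": 1.0, "CB": 0.5, "N": 2.5}
adjusted = rescored_energy(-100.0, predicted, experimental, sigma, weight=0.25)
```

In this toy case the unassigned prediction for the amide nitrogen is skipped, so only the two carbon shifts contribute to the penalty. In practice the weight c would be tuned against benchmark calculations, as noted above.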
The reliability and accuracy of CS-Rosetta were demonstrated through the comparison of predicted models with structures determined experimentally by the Northeast Structural Genomics (NESG) consortium [60]. The lowest-energy predicted models were remarkably close to the solved X-ray or NMR structures (0.6–2.1 Å backbone RMSD). In this review, we illustrate the effects of chemical shift data in guiding structure modeling within CS-Rosetta for a 15.5 kDa target protein, RTT103 (Regulator of Ty1 Transposition) (Fig. 4) [78,79]. Here, models calculated using chemical shifts are closer to the native structure (PDB ID 2KM4) [80] and are well-converged, as opposed to the models calculated from sequence and secondary structure prediction alone (Fig. 4A–C). The distributions of Rosetta energies in the sampled models further highlight that, through the selection of more native-like fragments, chemical shifts greatly limit the conformational space and bias the search towards a native structure (Fig. 4D). When CS-Rosetta was originally tested, it could accurately model structures of proteins of sizes up to 15 kDa. However, the size remained well below the marked solution NMR standard of 25–30 kDa at the time (Fig. 1). The size limitation of CS-Rosetta was subsequently remediated by an improvement in the ab initio fragment sampling protocol and the use of backbone RDC restraints along with chemical shifts (CS-RDC-Rosetta: a β-version of RASREC-Rosetta, discussed below) [81]. This updated protocol allowed modeling of proteins up to 25 kDa in size.
Fig. 4.

Improved de novo structure modeling using chemical shifts. Structure calculations for a 15.5 kDa protein (RTT103) are shown with and without the use of chemical shifts. (A) Ten lowest-energy structures generated by CS-Rosetta using manually assigned backbone chemical shifts. (B) Ten lowest-energy structures generated by CS-Rosetta without the use of chemical shifts. (C) Convergence vs backbone RMSD (Å) to native (PDB ID 2KM4), of the ten lowest-energy structures calculated with (green) and without (crimson) the use of chemical shifts. The native structures (PDB ID 2KM4) [80], which were idealized and refined in the Rosetta force-field for comparison, are additionally provided (purple). (D) Rosetta Energy (in R.E.U.) distributions among the 100 lowest-energy structures generated by CS-Rosetta with (green) and without (crimson) the use of chemical shift fragments in ab initio calculations. R.E.U. – Rosetta Energy Units. Data obtained from [216].
To further address the conformational sampling problem in order to increase the size limit (>25 kDa) of de novo structure determination, the Resolution Adapted Structural Recombination (RASREC)-Rosetta protocol was developed [82]. RASREC-Rosetta is designed to address difficult targets containing complex topologies with many sequentially distant (or nonlocal) interactions, and makes use of optimization strategies implemented by other protocols [74,83–86] alongside important features, including the fold tree framework from Rosetta 3.0 [53]. Each optimization strategy is customized and embedded in different phases of this multi-staged, iterative approach. Specifically, RASREC-Rosetta has an exploration stage followed by five resampling stages. Every stage retains the best scoring candidate structures that serve as a knowledge base, in conjunction with all available experimental data, for subsequent stages. In stages 1–3, the protein chain represented as a fold tree explores different backbone fragments and long-range β-strand pairings. The possible β-strand arrangements are made available through an annotated library constructed from all β-sheets within high resolution X-ray structures in the PDB. During stages 3–6, the sets of three- and nine-residue fragments being sampled are enriched by segments from the low-resolution structures generated in the earlier stages, to promote resampling of native-like features. Hence, consistently observed structural features that form the core of a protein are retained. These features aid in distinguishing incorrect folds from the correct one in the later stages. Most importantly, if any set of low-resolution candidates does not exhibit core features and seems unfit for full-atom refinement by stage 4, the protocol reverts to earlier stages and restarts from the fold trajectories of successful candidates. After approaching the full-atom refinement stages (stages 5 and 6), fold tree chain breaks introduced during β-strand topology resampling are closed to yield realistic candidate structural models. In addition to the implementation of an array of optimization strategies, RASREC benefits from unprecedented parallelization through MPI (Message Passing Interface), which allows for batches of structure modeling calculations to be distributed across all available cores in a computer cluster. This greatly enhances the sampling range and overall performance of the method.
RASREC-Rosetta exhibited improved performance relative to conventional CS-Rosetta on a benchmark set of 11 proteins in 15–25 kDa size range where very sparse amide NOE restraints (of the order of tens) in addition to backbone chemical shifts were provided. Since then, it has shown considerable accuracy when applied to larger proteins with complex folds by several research groups. In particular, RASREC-Rosetta was used together with sparse NOE data recorded for 11 protein targets of sizes up to 40 kDa [87]. Here, the collection of high quality structural restraints from fully protonated samples poses a challenge due to slower rotational diffusion rates of larger proteins (>20 kDa), which leads to low signal-to-noise ratios. The authors addressed this drawback by employing a selective methyl labeling scheme (ILV, Isoleucine Leucine Valine) in a perdeuterated background to record methyl-methyl, methyl-amide and amide-amide 1H NOE contacts at increased sensitivity and resolution [87–93]. These sparse sets of NOE restraints, together with experimental chemical shifts of the backbone atoms, were used by RASREC-Rosetta to model structures at high levels of accuracy (median Cα RMSD <2 Å). In another application, a structural model of the murine cytomegalovirus (CMV) immunoevasin m04, a 23 kDa protein, was determined by RASREC-Rosetta using chemical shifts together with several data sets of complementary NOE and RDC measurements [94]. By progressively increasing the number of structural restraints supplied to RASREC-Rosetta, well-converged structural models were obtained revealing a novel, complex β-sheet topology similar to the immunoglobulin (Ig) protein fold (Fig. 5A–C). Notably, the degree of structural convergence increased by 16% and 22% with an increase in the total number of local restraints provided by a modest set of long-range (i) amide-amide (Fig. 5A and B) and (ii) amide-amide, amide-methyl, methyl-methyl (Fig. 5C) NOEs recorded using non-uniform sampling methods (NUS) (Fig. 5D). In addition to achieving higher levels of convergence, the use of all available restraints (amide-amide, amide-methyl, methyl-methyl NOE contacts and two RDC datasets) allowed modeling of structures that were within 0.6 Å (backbone RMSD) from the X-ray structure (determined independently by a different group and published after the RASREC-Rosetta models), and showed the correct placement of all core sidechains (Fig. 5E).
Fig. 5.

Convergence of a novel fold adopted by the murine cytomegalovirus m04 protein during iterative RASREC-Rosetta structure calculations with various sparse NMR data sets. Ten lowest-energy structural models calculated using NMR chemical shifts together with (A) amide-amide NOE restraints (B) amide-amide NOE and RDC restraints (C) amide-amide, amide-methyl and methyl-methyl NOE together with RDC restraints, show that 73%, 89% and 95% of residues are converged within 3 Å (backbone RMSD), respectively. (D) 2D 13C–13C projection of a 4D methyl HMQC-NOESY-HMQC (Heteronuclear Multiple Quantum Coherence) spectra recorded without (left) and with (right) non-uniform sampling (NUS). The NUS experiment was recorded with a sparsity of 1.56% and reconstructed using the SMILE algorithm in NMRPipe [246,247]. Both the spectra were acquired with similar parameters and the same net acquisition time. (E) An overlay of the m04 protein structure determined by solution NMR (cyan, PDB ID 2MIZ) [94] and X-ray crystallography (green, PDB ID 4PN6) [248] with backbone heavy atom RMSD of 0.6 Å. This figure is partially (Panels A, B and C) adapted from [94] with permission.
To illustrate the advantages of fragment-based approaches over torsion angle dynamics methods (the most popular approaches for NMR-based structure determination among all entries in the PDB), we calculated new structures of Abl kinase RM (Regulatory Module) protein complex using RASREC-Rosetta and compared them with PDB deposited models, which were generated using CYANA (Combined Assignment and Dynamics Algorithm for NMR Applications) (PDB ID 6AMW) [95,96]. The CYANA structures were modeled using chemical shift derived torsion angle restraints together with 3830 short- and long-range NOEs and an additional set (consisting of 80 restraints) of ‘NOE-derived’ hydrogen bond restraints [96]. In contrast, for the RASREC-Rosetta calculations we used chemical shift-derived torsion angle restraints and a sparse set (1547 long-range) of NOEs (a subset of restraints provided to CYANA; BMRB ID 30332) to guide the structure determination (Fig. 6). The ten lowest-energy models (Fig. 6A) obtained using RASREC-Rosetta showed improved convergence with respect to the average relative orientation of the two individual domains relative to the models produced using CYANA, illustrated by structure superpositions performed using either the SH3 domain (residues 83–138) (Fig. 6B) (SH: Src Homology) or the SH2 domain (residues 139–237, linker and SH2) (Fig. 6C). As highlighted by our results, the use of fragment-based approaches with advanced sampling strategies and a more elaborate high resolution energy function leads to improved convergence of models from a lower restraint density (Fig. 6D–F).
Fig. 6.

Comparison of the Abl kinase regulatory module structural ensembles calculated using RASREC-Rosetta and CYANA. (A) Globally aligned ten lowest-energy models of Abl kinase regulatory module (residues 83–237) calculated by RASREC-Rosetta (left) and CYANA (PDB ID 6AMW) (right) using amide-amide, amide-methyl, methyl-methyl NOE and H-bond (for CYANA) restraints. (B) Ten lowest-energy models calculated using RASREC-Rosetta (left) and CYANA (right) superimposed with respect to domain A (residues 83–138, SH3 domain). (C) Ten lowest-energy models calculated using RASREC-Rosetta (left) and CYANA (right) superimposed with respect to domain B (139–237, connector and SH2 domain). (D) Total number of amide-amide, amide-methyl and methyl-methyl NOE restraints used by RASREC-Rosetta and CYANA during iterative structure calculation and refinement. CYANA additionally uses a set of ‘NOE-derived’ hydrogen bonds (H-bond). Amide (orange): amide to amide and amide to methyl NOE contacts. Aliphatic (gray): methyl-methyl NOE contacts. H-bond restraints (red). (E and F) Average pairwise backbone heavy-atom RMSDs (in Å) using structural superimpositions performed with respect to different domain selections are shown per residue for structural ensembles calculated using CYANA and RASREC-Rosetta, respectively. Full alignment (blue): global alignment of ten lowest-energy structures. Domain A alignment (crimson): alignment of ten lowest-energy structures with respect to the SH3 domain. Domain B alignment: alignment of ten lowest-energy structures with respect to the connector and SH2 domain. Structural models of Abl kinase RM calculated using RASREC-Rosetta and the corresponding NMR data are available at https://dash.library.ucsc.edu/stash/dataset/doi:10.7291/D1Q94R.
Whereas MFR [62], CHESHIRE [59] and CS-Rosetta [60,82] utilize chemical shifts in conjunction with known structures to derive a selection of low-resolution backbone fragments, they do not take full advantage of the high resolution structural information encoded within the data. In these methods, the primary use of experimental chemical shifts is through a comparison against back-calculated (via SPARTA or SHIFTX) chemical shift values used to measure the compliance of selected fragments or models computed using Monte Carlo-based optimization methods. While Monte Carlo-based methods fare very well during structure refinement, they also have a very high rejection rate of trial moves (about 90%) during random exploration while modeling unknown structures. Furthermore, the chemical shift scoring terms computed using SPARTA or SHIFTX are non-differentiable, and therefore, the restraints derived from chemical shifts cannot be used directly to perform a uniform exploration of the conformational phase (ϕ and ψ) space. To address this bottleneck, several research groups have incorporated chemical shifts directly as differentiable distance restraints [44,97,98]. Notably, CamShift applies chemical shift restraints during MD simulations (Chemical Shift restrained Molecular Dynamics (CS-MD)) [44,97,98]. Towards this end, CamShift models NMR chemical shifts as polynomial functions of interatomic distances, deviations from random coil values, dihedral angles, ring current and hydrogen bonding effects. The distance-dependent term of the CamShift objective function is contributed by backbone, sidechain, and through-space atom pair correlations as highlighted by Eq. (3):
$$\delta_{X} = \sum_{i,j} \alpha_{ij}\, \mathrm{distance}_{ij}^{\,\beta_{ij}} \tag{3}$$
Here, X ϵ {backbone, sidechain, through-space}, distanceij is the distance between atoms i and j, and αij and βij are parameters derived from known structures with assigned backbone chemical shifts in the database [99]. δbackbone captures the distances between a query atom and the backbone atoms of nearby residues, along with additional distances contributed by the backbone atom pairs of neighbors. δsidechain captures the distances between the query atom and the sidechain atoms of the same residue. Lastly, δthrough-space accounts for the distances to all atoms within 5 Å of the query atom, excluding the backbone atoms of the query residue and of the neighboring residues that are already included in δbackbone.
In this approach, during every integration step of the MD simulation, an overall potential function is calculated by taking a difference between the CamShift-predicted chemical shifts and the experimental shifts. Here, CamShift computes the forces by directly evaluating the derivatives of the chemical shift potential with respect to the various interatomic distances in Eq. (3), along the x, y and z coordinates [44]. Since MD simulations are carried out in the Cartesian coordinate system, the size of the system becomes a limiting factor for larger proteins. Therefore, combining such methods to perform refinement after the generation of an initial set of starting models computed quickly using existing fragment assembly approaches (such as CHESHIRE and CS-Rosetta) provides a promising avenue towards modeling larger, more complex protein folds [97].
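The role of differentiability can be illustrated with a short Python sketch. It assumes that each distance contribution has the power-law form α·d^β described for Eq. (3), and that the restraint energy is a simple harmonic penalty k(δpred − δexp)² on a single query atom. The harmonic form, the force constant k, and all function names are hypothetical simplifications chosen for illustration and do not reproduce the actual CamShift potential.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def distance_term(query: Vec, partners: Sequence[Vec],
                  alphas: Sequence[float], betas: Sequence[float]) -> float:
    """Distance-dependent shift contribution: sum of alpha * d**beta."""
    total = 0.0
    for partner, alpha, beta in zip(partners, alphas, betas):
        total += alpha * math.dist(query, partner) ** beta
    return total


def distance_term_gradient(query: Vec, partners: Sequence[Vec],
                           alphas: Sequence[float],
                           betas: Sequence[float]) -> Vec:
    """Analytic derivative of distance_term with respect to the query x, y, z."""
    grad = [0.0, 0.0, 0.0]
    for partner, alpha, beta in zip(partners, alphas, betas):
        d = math.dist(query, partner)
        # d/dx (alpha * d**beta) = alpha * beta * d**(beta - 1) * (x - x_p) / d
        scale = alpha * beta * d ** (beta - 2.0)
        for axis in range(3):
            grad[axis] += scale * (query[axis] - partner[axis])
    return (grad[0], grad[1], grad[2])


def restraint_force(query: Vec, partners: Sequence[Vec],
                    alphas: Sequence[float], betas: Sequence[float],
                    delta_exp: float, k: float = 1.0) -> Vec:
    """Force on the query atom from E = k * (delta_pred - delta_exp)**2.

    Assumption: harmonic penalty (hypothetical stand-in for the real potential).
    """
    delta_pred = distance_term(query, partners, alphas, betas)
    grad = distance_term_gradient(query, partners, alphas, betas)
    prefactor = -2.0 * k * (delta_pred - delta_exp)  # force = -dE/dr (chain rule)
    return (prefactor * grad[0], prefactor * grad[1], prefactor * grad[2])
```

Because the shift is a smooth function of the coordinates, the force follows from the chain rule and can be added to the molecular mechanics forces at each integration step. A database-lookup predictor offers no such analytic derivative, which is the limitation noted above for SPARTA and SHIFTX scoring terms.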
5. Applications of chemical shifts in homology-based modeling
As stated by Anfinsen’s postulate, the sequence of amino acids in a protein contains sufficient information to determine its fold [100]. By extension, two or more evolutionarily related (or homologous) proteins that share considerable amino acid sequence similarity likely also have comparable 3D structural features. Classical methods, including MODELLER [101], I-TASSER [55] and threading protocols in Rosetta [102], have achieved success in performing homology (also referred to as comparative or template-based) modeling even when the similarity between the query and template sequences is low (up to 20–25%) [103]. Nonetheless, these methods generally require a high (>40%) degree of sequence similarity to the template to obtain reliable models for larger proteins. Alternative approaches combine information drawn from evolutionarily related proteins (or templates) with sparse experimental data to overcome the drawbacks of classical methods. In particular, NMR chemical shifts can supplement sequence information to guide template identification and alignment at lower sequence similarity levels, thereby helping to alleviate a problem that has plagued comparative modeling since its inception [104]. There are now robust approaches that have employed this concept, each unique in the way it selects templates and extracts restraints in order to model the structures of larger proteins with high accuracy.
The first method which combined comparative modeling algorithms with backbone and 13Cβ chemical shifts to derive consistently accurate models of protein structures is CS-HM-Rosetta (Chemical Shift-Homology Modeling-Rosetta) [104]. In this approach, classical CS-Rosetta ab initio calculations are used together with evolutionary distance restraints [105] derived from homologous proteins (or templates with ~30% sequence identity) in the PDB to bias the search towards solutions that are consistent with both (i) the fold in template structure(s) and (ii) the backbone chemical shifts (Fig. 7, blue). In this way, the chemical shift data are used as a means to distinguish high quality alignments from incorrect alignments, both locally and globally. Relative to conventional comparative modeling protocols, this enables accurate template-based modeling in spite of low sequence similarity levels.
Fig. 7.

Methods for structure determination using restraints derived from evolutionary information and NMR chemical shifts. (Blue) Flow diagram of CS-HM-Rosetta [104]. In CS-HM-Rosetta, the query sequence is aligned to the sequences of template (evolutionarily related) protein structures extracted from the PDB. A set of distance restraints from the template structures are derived using Gaussian probability densities (silver) for every pair of Cα atoms in the sequence of the query protein. These distance restraints are used along with sparse NMR data (NOEs, RDCs), for chemical shift fragment-based structure determination by CS-Rosetta. (Gray) Flow diagram of CS-RosettaCM/Pomona [109]. CS-RosettaCM/Pomona obtains chemical shift-derived torsion angles for a query protein using the backbone chemical shifts, and then performs pairwise alignment to the torsion angles and sequence of template structures in the PDB using a dynamic programming algorithm. Following pairwise alignment, possible template structures are selected and clustered. The representative templates are then selected from these clusters. Together with sparse NMR restraints, the filtered structures serve as templates for the CS-RosettaCM protocol. (Dark red) Flow diagram of EC-NMR [128] and RASREC-Rosetta with evolutionary restraints [127]. Here, a multiple sequence alignment is constructed using the query sequence and many template sequences with unknown structures. Spatially related residues that exhibit covariance in the sequence alignment are then identified using statistical algorithms to derive structural restraints, termed evolutionary coupling (EC) restraints. These EC restraints are combined with NMR data (such as chemical shifts, NOEs, RDCs) and input to standard RASREC-Rosetta or CYANA for structure calculation.
In order to derive long-range structural restraints, CS-HM-Rosetta uses a probabilistic approach to establish a relation between a diverse input of pair-wise alignments to template sequences, with features in the corresponding template structures. Here, (i) all proteins in the PDB are aligned to the query sequence using HHSearch (HMM-HMM search; HMM: Hidden Markov Model) [106] where the alignment criterion is based on the predicted secondary structure of the query sequence versus the template secondary structure (via DSSP [72]); the alignment pairs with lower e-values are retained, (ii) every pair of residues that are ten or more positions apart along the query sequence is considered; if the distance between the Cα atoms of the corresponding residues in the template structure is within 10 Å, then it is used to compute a multi-basin Cα distance constraint, (iii) the joint distribution of distances obtained from all alignments is analyzed against a set of four alignment quality features, including the HHSearch e-value (local sequence similarity) [106], the BLOSUM62 (Block Substitution Matrix) score [107] for the aligned residue pairs (global sequence similarity), the nearest gap in a query sequence, and the number of Cβ atoms within 8 Å from other Cβ atoms in the template structure (buried surface). Finally, a multi-modal distribution of distance deviations is constructed for every Cα atom pair in the sequence, and subsequently converted into a single distance restraint. Therefore, the confidence of each restraint is strengthened by combining distances computed for the same residue pair from multiple template structures, represented as a mixture of Gaussian distributions [105] (Fig. 7, silver).
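The final step, in which Cα–Cα distances from several templates are combined into one multi-basin restraint, can be sketched as a Gaussian mixture whose negative logarithm serves as a restraint score. The Python below is a minimal illustration under stated assumptions: the component weights are taken to be normalized and would in practice reflect alignment quality, and the function names, widths, and example distances are hypothetical rather than values used by CS-HM-Rosetta.

```python
import math
from typing import Sequence


def mixture_density(distance: float,
                    means: Sequence[float],
                    sigmas: Sequence[float],
                    weights: Sequence[float]) -> float:
    """Probability density of a Gaussian mixture evaluated at one distance."""
    density = 0.0
    for mean, sigma, weight in zip(means, sigmas, weights):
        z = (distance - mean) / sigma
        density += weight * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return density


def restraint_score(distance: float,
                    means: Sequence[float],
                    sigmas: Sequence[float],
                    weights: Sequence[float],
                    floor: float = 1e-12) -> float:
    """Negative log-density: low near any template distance, high elsewhere.

    The floor (an assumption) keeps the score finite far from every basin.
    """
    return -math.log(max(mixture_density(distance, means, sigmas, weights), floor))


# Two templates suggest different Calpha-Calpha distances for one residue pair
# (made-up numbers): the score has two basins, one per template.
means, sigmas, weights = [5.0, 9.0], [0.5, 0.5], [0.5, 0.5]
near_first = restraint_score(5.0, means, sigmas, weights)
between = restraint_score(7.0, means, sigmas, weights)
```

A model whose Cα–Cα distance matches either template is scored favorably, while a distance between the two basins is penalized. This is the sense in which the restraint is multi-modal rather than a single harmonic well.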
The distance restraints drawn from evolutionarily related proteins directly influence the convergence and distribution of sampled Rosetta energies in explicit CS-HM-Rosetta calculations. If the input alignments are incorrect either locally or globally, then the derived distance restraints will not be consistent with the experimental chemical shifts, and will yield models with poor convergence and high energies relative to control calculations performed without the use of evolutionary distance restraints [108]. CS-HM-Rosetta’s ability to model accurate backbone structures (RMSD < 2 Å) and recover a high fraction (75–85%) of native sidechain rotamers demonstrates that the combination of NMR chemical shifts with evolutionary distance restraints can circumvent the need to analyze NOE data for targets with remote homologs in the PDB. Instead, the NOE data (if available) can be used for structure validation.
As an alternative to conventional sequence and predicted secondary structure-based alignment methods, the CS-RosettaCM/Pomona (Chemical shift-Rosetta Comparative Modeling/Protein Alignments Obtained by Matching of NMR Assignments) [109] (Fig. 7, gray) protocol relies on the idea that NMR chemical shifts encode local structural homology. The key innovation in this procedure is a protein alignment module, Pomona, which uses TALOS-N [110] to estimate ϕ/ψ backbone torsion angle probability maps from 13Cα, 13Cβ, 13C′, 15N, 1Hα, and 1HN chemical shifts for every amino acid in the query sequence. These maps are used to compute a substitution score measuring local similarity between the query and a template structure, given by the weighted contributions of backbone torsions, secondary structure and sequence similarity. A pairwise sequence alignment is then performed using a modified version of the Smith-Waterman dynamic programming algorithm [111], with an objective function that optimizes the substitution score augmented by a gap insertion penalty term [112]. The resulting alignment is further validated according to the consistency between experimental chemical shifts and SPARTA+ computed chemical shifts (for each residue in the query sequence that aligns to a residue in the database used by SPARTA+). In contrast to classical, sequence-based comparative modeling methods, homologous proteins with sequence identity ≥20% are excluded for the examples used in that study to prevent overfitting. All template structures identified by Pomona undergo normalized Cα RMSD-based hierarchical clustering, and the ten top-ranking clusters with respect to alignment score are retained. Finally, the top two representatives from each cluster are used as structural templates for Rosetta’s comparative modeling protocol, RosettaCM [113,114].
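The alignment step can be illustrated with a compact Python sketch of local (Smith-Waterman style) dynamic programming driven by a per-position substitution score. This is the textbook algorithm with a linear gap penalty, not the modified variant used by Pomona. The `substitution_score` function, its weights, and its use of single ϕ/ψ values instead of probability maps are hypothetical simplifications introduced only for illustration.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical per-residue record: (amino acid, secondary structure, phi, psi).
Residue = Tuple[str, str, float, float]


def angular_similarity(a_deg: float, b_deg: float) -> float:
    """1 for identical angles, -1 for angles that differ by 180 degrees."""
    return math.cos(math.radians(a_deg - b_deg))


def substitution_score(q: Residue, t: Residue, w_torsion: float = 1.0,
                       w_ss: float = 0.5, w_seq: float = 0.5) -> float:
    """Weighted sum of torsion, secondary structure and sequence agreement."""
    torsion = 0.5 * (angular_similarity(q[2], t[2]) + angular_similarity(q[3], t[3]))
    ss = 1.0 if q[1] == t[1] else -1.0
    seq = 1.0 if q[0] == t[0] else -1.0
    return w_torsion * torsion + w_ss * ss + w_seq * seq


def smith_waterman(n_query: int, n_template: int,
                   score: Callable[[int, int], float],
                   gap_penalty: float) -> Tuple[float, List[Tuple[int, int]]]:
    """Best local alignment score and the aligned (query, template) index pairs."""
    # H[i][j]: best score of a local alignment ending at query i-1, template j-1.
    H = [[0.0] * (n_template + 1) for _ in range(n_query + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n_query + 1):
        for j in range(1, n_template + 1):
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + score(i - 1, j - 1),
                          H[i - 1][j] - gap_penalty,
                          H[i][j - 1] - gap_penalty)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the local alignment score drops to zero.
    pairs: List[Tuple[int, int]] = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0.0:
        if H[i][j] == H[i - 1][j - 1] + score(i - 1, j - 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap_penalty:
            i -= 1  # gap in the template
        else:
            j -= 1  # gap in the query
    pairs.reverse()
    return best, pairs
```

To align two residue lists `query` and `template`, one would call `smith_waterman(len(query), len(template), lambda i, j: substitution_score(query[i], template[j]), gap_penalty=2.0)`. Passing the score as a callable keeps the dynamic programming independent of how torsion, secondary structure, and sequence terms are weighted.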
More recently, the use of sequence covariance information to infer structural relationships between different pairs of residues along the query sequence has shown great promise for enabling reliable fold identification [115]. Stemming from the principle that evolutionary coupling correlates well with structural proximity, a growing body of work combines evolutionary data with sparse experimental restraints towards accurate modeling of protein structures [115–117]. Moreover, a global effort towards inferring a reliable network of EC (evolutionary coupling) restraints from fewer homologous sequences has improved the effectiveness of this approach [115,116,118–124]. These methods typically rely on global statistical methods, such as pseudo-likelihood maximization (PLM) [122] and/or direct information (DI) [125], and more recently deep learning methods [126] to identify relevant sequence features for robust identification of residue contacts. As a general rule, these methods first perform a HMM profile-based multiple sequence alignment (MSA) of the evolutionarily related protein sequences. Following MSA, a covariance matrix between all pairs of residues in the query sequence is created. The inverse of the covariance matrix provides conditional mutual information, which allows estimation of residue-residue contacts. Even though these methods exhibit high accuracy in predicting true structural contacts (>80% true positive rate among the top 50 predictions [122,125]), they also have a high false positive rate. Therefore, the extent to which such heterogeneous sets of restraints can be used to guide protein modeling calculations depends on the use of advanced sampling protocols, along with experimental data which can in principle distinguish correct from incorrect EC restraints on the basis of the calculation outcome.
The incorporation of EC restraints together with NMR chemical shifts within robust sampling protocols shows great promise towards identifying the native folds of larger proteins. As described earlier, RASREC-Rosetta has a high degree of accuracy and precision in modeling protein structures with complex folds, in the face of sparse experimental data and erroneous restraints. More recently, RASREC-Rosetta was extended to employ evolutionary contacts in addition to NMR chemical shifts and available sparse experimental data (Fig. 7, dark red) [127]. In this approach, restraints from evolutionary couplings are obtained using either the PLM or DI scoring methods in EVFold (Evolutionary Fold) [115,116]. NMR chemical shifts complement the EC restraints, by identifying a consistent network of restraints during RASREC-Rosetta calculations, and thus eliminating any structurally unrelated correlations recognized by EVFold. In addition, the energy function in RASREC-Rosetta is further adjusted to account for incorrectly drawn EC restraints [127].
An alternative, more integrative approach, EC-NMR (Evolutionary Coupling-NMR spectroscopy), combines evolutionary contact information with NMR data within the structure determination program, CYANA (Fig. 7, dark red) [128]. In this approach, EC restraints are inferred from the analysis of MSAs using the jackhmmer algorithm [129]. NMR data, including backbone and side-chain chemical shifts, NOESY peak lists and RDCs are recorded for ILV-methyl labeled protein samples [130–132]. Briefly, the EC restraints are combined with the previously assigned backbone and sidechain NMR chemical shifts, and used to assign the NOESY cross-peaks using the ASDP program [133]; these assigned restraints are then used in the full simulated annealing structure determination protocol in CYANA [95]. Here, the correct EC restraints and unambiguous NOESY assignments form a reliable network of contacts which helps in resolving ambiguities in the remaining NOESY assignments and in eliminating possible false positive EC restraints. Finally, the full set of assigned NMR restraints and evolutionary couplings are used to refine the preliminary CYANA models using Rosetta’s all-atom energy function [134].
6. Modeling protein complexes using chemical shifts
Protein complexes constitute over 50% of the proteome and participate in many important biological processes [135]. NMR has enabled structural studies of such systems in vitro [136]; however, full structure elucidation is challenging due to their large size and the presence of dynamics, whose effects become more pronounced at the interface between different subunits and ultimately lead to exchange-induced line broadening of the NMR resonances [137]. An existing method, HADDOCK (High Ambiguity Driven Docking) (reviewed in [138]), makes use of chemical shift perturbations to model the structures of protein complexes [139–142]. Specifically, the differences in backbone and sidechain chemical shifts upon complex formation are used to derive ambiguous distance restraints [143,144], which further guide the docking of monomeric subunits under the assumption that the changes are localized to the binding surface(s). Similar to other semi-flexible docking methods, a major challenge remains in addressing any conformational changes that occur upon complex formation, for instance in domain-swapped protein assemblies and systems with more complex topologies.
RosettaOligomers leverages chemical shift fragments within Rosetta’s docking protocols to perform de novo modeling of symmetric oligomers [137]. This approach relies on CS-Rosetta to generate structures of monomeric subunits from sequence information, NMR chemical shifts and sparse NOE restraints (if available). In one branch of the protocol, the oligomers being modeled are assumed to contain relatively simple interfaces in which the monomers do not entwine significantly, and therefore the predicted subunits can be used in their free states. All models in the low-energy ensemble computed for the monomeric subunits are then docked together using sparse RDC restraints and user-defined symmetry information [145] (Fig. 8, Pathway 1). Another branch of this protocol makes use of more elaborate (and computationally demanding) fold and dock calculations to address cases of domain-swapped, or interleaved oligomeric proteins [146–149]. These cases can be diagnosed on the basis of the initial CS-Rosetta calculations performed for the monomeric subunits: if the resulting structural models exhibit divergence (>3 Å) after ab initio folding, then the oligomeric complex is likely to be interleaved (Fig. 8, Pathway 2). Although this method was developed originally to model symmetric domains, it can be extended to accommodate asymmetric domains [149,150]. RosettaOligomers was recently integrated with RASREC-Rosetta. This extended protocol uses NMR chemical shifts, RDCs and SAXS (Small-angle X-ray scattering) data to model larger complexes, as was demonstrated for a 33 kDa dimer target [151].
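The diagnostic that selects between the two pathways can be written as a small decision rule. The Python sketch below assumes that divergence is measured as the mean pairwise backbone RMSD among the low-energy monomer models, which the text does not specify, and takes a precomputed symmetric RMSD matrix as input. The function names and pathway labels are hypothetical and do not correspond to any RosettaOligomers interface.

```python
from statistics import mean
from typing import Sequence


def mean_pairwise_rmsd(rmsd_matrix: Sequence[Sequence[float]]) -> float:
    """Average of the upper triangle of a symmetric pairwise RMSD matrix (in Å)."""
    n = len(rmsd_matrix)
    if n < 2:
        raise ValueError("at least two models are needed to assess convergence")
    return mean(rmsd_matrix[a][b] for a in range(n) for b in range(a + 1, n))


def choose_pathway(rmsd_matrix: Sequence[Sequence[float]],
                   threshold: float = 3.0) -> str:
    """Pick a modeling branch from the convergence of the monomer ensemble.

    Assumption: mean pairwise RMSD above the threshold indicates divergence.
    """
    if mean_pairwise_rmsd(rmsd_matrix) > threshold:
        return "fold_and_dock"   # Pathway 2: likely interleaved oligomer
    return "dock_monomers"       # Pathway 1: independently folded subunits


# Made-up pairwise RMSD values for two ensembles of three monomer models.
converged = [[0.0, 1.0, 1.2], [1.0, 0.0, 0.8], [1.2, 0.8, 0.0]]
diverged = [[0.0, 4.0, 5.0], [4.0, 0.0, 6.0], [5.0, 6.0, 0.0]]
```

Here `choose_pathway(converged)` selects the docking branch and `choose_pathway(diverged)` selects fold and dock. The 3 Å default mirrors the threshold quoted above, but the appropriate convergence metric would depend on the system.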
Fig. 8.

Structure determination of protein complexes with RosettaOligomers guided by chemical shifts. Flow diagram for modeling protein complexes with RosettaOligomers [137] using chemical shifts, sparse NMR restraints derived from NOEs, RDCs, SAXS data sets, and user-specified symmetry definition. Pathway 1, designed to address oligomers from independently folded monomers (PDB ID 1C77, left): (i) CS-Rosetta produces a structural ensemble for each monomeric subunit and (ii) monomers are docked using protein–protein docking protocols in Rosetta [145]. Pathway 2, designed to address domain-swapped oligomers (PDB ID 2K5J, right): (i) chemical shift-derived backbone fragments, together with sparse NMR restraints, are used in one step with the fold and dock protocol [149]. Both approaches can be used in either a fully symmetric mode, using Rosetta’s symmetry interface [249], or an asymmetric mode.
CamDock performs ab initio modeling of protein complexes using the Chord program [152], which is based on the HEX [153] approach in its use of a spherical harmonics-based representation of protein surfaces. Here, backbone chemical shifts are used together with CHESHIRE’s molecular dynamics refinement strategy, as described in Section 4. CamDock was used to model E9-Im9, a 60 kDa protein complex, which resulted in a structural ensemble that is very close (1.18 Å Cα RMSD) to the reference X-ray structure [152]. In more recent work focusing on a ZTaq:anti-ZTaq protein complex containing 144 amino acids in total [154], the CHESHIRE procedure was first applied to model the monomeric subunits in their bound states, which were then docked as rigid bodies using a protocol akin to CamDock. The docked protein complex was further optimized by CHESHIRE’s hybrid (MD/Monte Carlo) refinement protocol using an objective function that captures experimental and predicted chemical shifts together with molecular mechanical force fields (see Section 4). As a result of these key innovations, the combined CHESHIRE/CamDock approach generated structural models within 1 Å (backbone RMSD) from the reference X-ray structure [154].
7. Modeling transient, sparsely populated conformations from chemical shifts
Chemical shifts also constitute unique NMR observables in modeling the structures of biologically relevant, sparsely populated transient protein and nucleic acid conformations (termed ‘dark’, ‘invisible’ or ‘excited’ states) [155,156]. Such measurements are made possible by the development of a suite of NMR experiments to probe excited states with lifetimes on the µs–ms (microsecond–millisecond) timescale. In particular, PRE/PCS measurements [157] are useful for cases of fast conformational exchange; rotating frame R1ρ relaxation [158] and Carr–Purcell–Meiboom–Gill (CPMG) dispersion [159] for intermediate exchange; and chemical-exchange saturation transfer (CEST) [160] for slow exchange. The power of these methods is highlighted in key applications for the FF domain of human HYPA/FBP11 [161], Fyn SH3 domain [162], T4 lysozyme [163], HIV (human immunodeficiency virus)-1 transactivation response element RNA (Ribonucleic acid) [164], ubiquitin [165], Ca2+ sensor signaling protein calmodulin [166], a transcriptional riboswitch [167], and E. coli enzyme dihydrofolate reductase [168]. These examples assume a system in which conformational exchange occurs between two states. Although multi-state exchange models have been explored by several research groups, they are usually limited to three states to avoid overfitting of the NMR data [156].
Recently, the integration of chemical shifts derived from the fitting of relaxation dispersion data with methods such as CS-Rosetta has enabled modeling of sparsely populated protein conformations. In these studies, typically, a series of CPMG dispersion experiments recorded at multiple magnetic fields and temperatures provides insight into the excited states through fitting of populations, chemical shift differences (Δω), and exchange rates (kex) for the major (ground-state) and minor (excited-state) conformations [169]. CS-Rosetta has been employed together with backbone 1H, 15N, 13C chemical shifts and amide RDCs to model the excited-state conformations of folding intermediates of either (i) a T4 lysozyme mutant [163] or (ii) a HYPA/FBP11 FF domain [161,170]. Alternatively, excited-state structures can be elucidated using paramagnetic NMR restraints provided by PRE and PCS (pseudocontact shift) measurements [156]. Specifically, PCS restraints have been used for structure determination of a transient thioester intermediate formed between Staphylococcus aureus sortase A (SrtA) and a substrate peptide, which was inaccessible to traditional structure determination methods due to its short lifetime [171]. In that study, structural restraints for the SrtA-peptide intermediate were acquired by labeling SrtA with paramagnetic lanthanide tags, which enabled the detection of 407 PCS restraints used for structure calculation in Xplor-NIH [172].
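To make the CPMG fitting parameters concrete, the following minimal Python sketch evaluates a two-state dispersion curve and reconstructs an excited-state shift from a fitted Δω. It assumes the fast-exchange (Luz–Meiboom) approximation, which is only one of several models used in practice, and it takes the sign of Δω as already known. The function names and all numerical values are hypothetical and are not taken from the studies cited above.

```python
import math


def r2_eff_fast_exchange(nu_cpmg_hz, r2_0, p_minor, delta_omega_rad_s, k_ex):
    """Effective transverse relaxation rate (s^-1) for two-state exchange.

    Assumes the fast-exchange (Luz-Meiboom) limit:
        R2,eff = R2,0 + (pA * pB * dw^2 / kex) * (1 - tanh(x) / x),
    with x = kex / (4 * nu_CPMG).
    """
    p_major = 1.0 - p_minor
    phi_ex = p_major * p_minor * delta_omega_rad_s ** 2
    x = k_ex / (4.0 * nu_cpmg_hz)
    return r2_0 + (phi_ex / k_ex) * (1.0 - math.tanh(x) / x)


def excited_state_shift(ground_shift_ppm, delta_omega_ppm):
    """Excited-state shift = ground-state shift + signed shift difference."""
    return ground_shift_ppm + delta_omega_ppm


if __name__ == "__main__":
    # Hypothetical parameters: 5% minor state, 200 Hz shift difference,
    # kex = 5000 s^-1, intrinsic R2 of 10 s^-1.
    for nu in (50.0, 200.0, 1000.0):
        rate = r2_eff_fast_exchange(nu, 10.0, 0.05, 2 * math.pi * 200.0, 5000.0)
        print(f"nu_CPMG = {nu:6.0f} Hz  R2,eff = {rate:.2f} s^-1")
    print(excited_state_shift(120.3, 2.1))
```

The reconstructed excited-state shifts, one per assigned nucleus, are the quantities that would then be passed to a chemical shift-based modeling method in place of directly observed shifts.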
8. Modeling the conformations of sidechains from chemical shift data
Despite undeniable progress in chemical-shift-driven structure determination, the majority of studies have focused on information extracted from backbone chemical shifts, often resulting in lower resolution with respect to the orientation of sidechain groups. High resolution modeling of sidechain conformations is of considerable interest towards understanding protein function with respect to enzyme catalysis [173], protein interaction modes [174] and folding [175]. In the absence of sidechain chemical shift measurements, the most probable orientations of sidechains can be inferred from their lowest-energy conformations sampled from existing rotamer libraries [176–180]. Many ab initio structure determination approaches utilize such rotamer libraries to model static sidechain conformations using Monte Carlo-based optimization. Similar to protein backbone modeling using experimental and predicted chemical shifts, modeling of sidechain conformations can be significantly improved by the use of 13C chemical shifts. Towards this end, chemical shift prediction methods, such as CH3SHIFT [181], can help guide the rotamer selection and structure refinement processes. In practice, the utility of such methods is limited by the difficulty of predicting the γ-gauche effect, where the 13C chemical shift of a given nucleus is influenced by its position relative to γ-substituents [182], along with the observation that sidechain conformations may be constrained in X-ray structures relative to a solution environment.
In solution, sidechain rotamers sample an ensemble of functionally relevant states, which can be unveiled, in principle, by a full analysis of NMR chemical shifts. Here, methyl groups, typically found at the hydrophobic core of proteins, have favorable relaxation properties and their resonances are useful when studying larger, more complex systems [174]. While stereospecific characterization of methyl groups is generally difficult to achieve using uniformly labeled samples, the use of stereospecific isotopic methyl labeling schemes (employing precursors that lead to pro-R and pro-S labeled leucine or valine residues [183]) can help distinguish these groups, even for larger targets. This can in turn aid in capturing different rotamer configurations for leucines and valines [184]. For example, determination of sidechain rotameric states for leucine Cδ1/Cδ2 groups, which can sample trans, gauche+ or gauche− conformations, can be performed using measurements of chemical shift differences between stereospecifically assigned methyl groups (ΔCδ12 = δ(13Cδ1) – δ(13Cδ2)) [185] or empirical 3JCC (13CH3 – Cα) scalar couplings [186,187]. The former was demonstrated through a clear correlation between 13C sidechain chemical shifts, χ1/χ2 dihedral angles and rotamer conformations observed in high resolution structures [185]. In addition, a linear combination of ΔCδ12 and empirical 3JCC scalar coupling values proved useful for the interpretation of more dynamic leucine rotamer populations for calbindin D9k [187]. This analysis was facilitated by the fact that the leucine χ2 dihedral angle primarily samples trans and gauche+ conformations in solution [188]. Similarly, isoleucine sidechain χ2 rotamer conformations can be determined from chemical shift [189] or J-coupling [190] measurements.
Although isoleucine χ2 rotamers can sample all four (trans, gauche+, gauche−, gauche100) distinct conformations, based on analysis of high resolution X-ray structures, only the trans and gauche− conformers are populated in solution [189]. A similar approach has been applied for elucidating the χ3 rotamer of the methionine Cε methyl group [191]. The situation is more complicated for valine because its sidechain χ1 can sample multiple rotamer states (trans, gauche+ or gauche−) in solution. Here, each valine χ1 rotamer is derived from fitting measured 13Cγ1/13Cγ2 chemical shifts to a set of 20 χ1 dihedral angles, allowing for accurate estimation of trans, gauche+ and gauche− rotamer populations [192].
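As an illustration of how a methyl shift difference maps onto rotamer populations, the sketch below treats the leucine χ2 angle as a two-state (trans/gauche+) equilibrium and interpolates ΔCδ12 linearly between two limiting values. The limiting values of +5 and −5 ppm are placeholders chosen for illustration only; calibrated values should be taken from the primary literature [185]. The function name is hypothetical.

```python
def leucine_trans_population(shift_cd1_ppm, shift_cd2_ppm,
                             diff_trans_ppm=5.0, diff_gauche_ppm=-5.0):
    """Estimate the chi2 trans population of a leucine from methyl 13C shifts.

    Assumes a two-state trans/gauche+ equilibrium in which the observed
    difference delta(Cd1) - delta(Cd2) is the population-weighted average
    of two limiting differences (illustrative defaults, not calibrated).
    """
    observed = shift_cd1_ppm - shift_cd2_ppm
    p_trans = (observed - diff_gauche_ppm) / (diff_trans_ppm - diff_gauche_ppm)
    # Clamp to the physically meaningful range [0, 1].
    return min(1.0, max(0.0, p_trans))


if __name__ == "__main__":
    # Hypothetical stereospecifically assigned shifts (ppm).
    p = leucine_trans_population(26.0, 23.0)
    print(f"p(trans) = {p:.2f}, p(gauche+) = {1.0 - p:.2f}")
```

A shift difference near either limiting value indicates a single dominant rotamer, whereas intermediate values point to rotamer averaging, which is the regime where combining the shift difference with scalar couplings becomes informative.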
In practice, these approaches have been applied to accurately determine methyl sidechain conformations in sparsely populated excited states through the measurement of chemical shifts via CPMG relaxation dispersion experiments [188,189]. Measurement of methyl 13C chemical shifts has also shown success in solid-state NMR studies [193]. Finally, most sidechain modeling to date has assumed a single rotameric state rather than a distribution of states. A major step towards addressing this problem has been the implementation of a curated database of dynamic sidechain rotamers sampled from MD simulations of known protein structures [194]. However, incorporating the information content of dynamic sidechains requires computing long MD simulation trajectories, which can be limiting for larger systems.
9. Modeling nucleic acids using chemical shifts
While the use of chemical shifts in structural studies of nucleic acids is a fairly mature field (reviewed in [194]), the application of shifts towards full structure determination is relatively new [196]. Unlike for proteins, the relation between NMR chemical shifts and the corresponding nucleic acid structures is difficult to discern, owing to the narrow dispersion of shift values and the limited availability of assigned chemical shifts in the BMRB, both of which have stymied the development of automated methods [195,197]. Nevertheless, 1H chemical shifts still provide powerful restraints that can distinguish native from non-native nucleic acid conformations [198]. Furthermore, approaches have been developed to assign [199] or predict [200–203] chemical shifts to aid nucleic acid structure determination, refinement and validation.
De novo modeling of RNA structures containing non-canonical regions is challenging, and several research groups have tackled this problem with limited success [204–207]. More recently, two complementary approaches, FARFAR (Fragment Assembly of RNA with Full Atom Refinement) [208] and SWA (Stepwise Assembly) [209], have exhibited favorable outcomes in modeling non-canonical regions using realistic force fields. In particular, FARFAR employs a fragment assembly approach [204] to model low resolution structures, followed by full-atom refinement using a high resolution energy function. Alternatively, SWA builds each nucleotide in a stepwise manner recursively, where each step involves exhaustive enumeration of all possible conformations of the new residue. Although both these methods address conformational sampling bottlenecks that typically arise during RNA structure modeling, for more complicated cases (such as the UUAAGU hexaloop from 16S ribosomal RNA), the energy function does not provide sufficient discrimination of the native state [210]. These results spurred the development of CS-Rosetta-RNA (Chemical Shift-Rosetta-Ribonucleic Acid), where 1H chemical shifts are exploited to perform de novo modeling of RNA structures (Fig. 9) [210]. In this method, FARFAR and SWA are used in parallel to sample a large number of plausible RNA conformations. The resulting RNA structural models are energy minimized and ranked using an adjusted all-atom energy function according to Eq. (4):
$$E = E_{\mathrm{Rosetta}} + c \sum_{i} \left( \delta_{i}^{\mathrm{exp}} - \delta_{i}^{\mathrm{calc}} \right)^{2} \qquad (4)$$
Here, E_Rosetta is the standard all-atom energy function in Rosetta [208] without using chemical shifts, δ_i^exp and δ_i^calc are the experimental and back-calculated non-exchangeable proton chemical shifts, the latter obtained using NUCHEMICS [200,202], and c is a weight factor. As shown by the results on a benchmark set of 23 targets, CS-Rosetta-RNA successfully demonstrates that the introduction of chemical shift-based terms in the high resolution potential drives the simulations towards a global, native-like minimum in the energy landscape. These approaches hold great promise for the modeling of protein/RNA complexes in future methods.
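The re-ranking step described by Eq. (4) can be sketched as follows. This is a minimal illustration, assuming that the chemical shift term is a weighted sum of squared deviations between experimental and back-calculated 1H shifts; it does not call Rosetta or NUCHEMICS, and the model names, energies, shifts and weight are all hypothetical.

```python
def cs_weighted_energy(e_rosetta, shifts_exp, shifts_calc, c=1.0):
    """Rosetta energy plus a chemical shift penalty (assumed quadratic form).

    shifts_exp and shifts_calc map proton labels to shifts in ppm; only
    protons present in both dictionaries contribute to the penalty.
    """
    penalty = sum((shifts_exp[atom] - shifts_calc[atom]) ** 2
                  for atom in shifts_exp if atom in shifts_calc)
    return e_rosetta + c * penalty


def rank_models(models, shifts_exp, c=1.0):
    """Return model names sorted from lowest to highest adjusted energy.

    models maps a model name to (Rosetta energy, back-calculated shifts).
    """
    return sorted(models,
                  key=lambda name: cs_weighted_energy(models[name][0],
                                                      shifts_exp,
                                                      models[name][1], c))


if __name__ == "__main__":
    experimental = {"U1.H1'": 5.60, "A3.H2": 7.90}
    models = {
        "model_A": (-50.0, {"U1.H1'": 6.60, "A3.H2": 8.90}),  # poor shift fit
        "model_B": (-48.0, {"U1.H1'": 5.65, "A3.H2": 7.95}),  # good shift fit
    }
    print(rank_models(models, experimental, c=0.0))  # Rosetta energy only
    print(rank_models(models, experimental, c=4.0))  # with shift penalty
```

With c set to zero the ranking reduces to the unmodified energy function, which makes the effect of the chemical shift term on model selection easy to inspect.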
Fig. 9.

Modeling RNA structures using CS-Rosetta-RNA. The query RNA sequence is used by specialized Rosetta protocols, FARFAR [208] and SWA [209] to construct a large number of plausible RNA structures. The predicted RNA structures are filtered using a combination of standard Rosetta energy function terms together with a penalty function which measures the difference between experimental and back-calculated chemical shift values (Eq. (4)).
10. Chemical shift-based structure determination and iterative NOE assignment
While chemical shift-based approaches offer an opportunity to determine the structures of proteins de novo, the main driving forces of conventional structure determination protocols by NMR are NOE measurements. NOE connectivities form a network of inter-proton distance restraints (typically within 6 Å in the 3D structure) that can be used directly in structure determination. Typically, hundreds to thousands of NOE restraints are required to define backbone and sidechain orientations during the process of structure modeling [15,16,211]. However, acquiring such restraints is a labor-intensive activity that involves analyzing and interpreting hundreds of cross-peaks in the NOESY spectra. NOE cross-peaks can be assigned to atom pairs in the protein sequence through the accurate mapping of proton chemical shifts to the cross-peak coordinates. The problem of ambiguity arises rapidly during this mapping process, as a result of spectral overlap. Therefore, automatic NOESY assignment and structure refinement has been an iterative process, which typically relies on highly complete (>90%) and accurately assigned NMR chemical shifts. Here, an initial, self-consistent network [212] of relatively unambiguous NOE restraints (of the order of 100, depending on target size and degree of spectral overlap) is drawn from the more unique mappings, to generate an initial set of low-resolution structures [15]. The low-resolution structures from early stages are then used to reduce uncertainty in the remaining unassigned or ambiguously assigned NOE restraints [144,213]. This general concept laid the foundation for the majority of NMR structure determination programs [7,15]. In these approaches, backbone dihedral angle restraints derived from chemical shifts using TALOS and similar methods can play an auxiliary role in biasing the search towards more native-like conformations that successively help assign more long-range NOEs.
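The origin of assignment ambiguity can be illustrated with a short Python sketch that maps a 2D NOESY cross-peak onto candidate proton pairs by chemical shift matching. This is a simplified stand-in for the first step of automated NOESY assignment, not the algorithm of any particular program; the matching tolerance, atom labels and shift values are hypothetical.

```python
def candidate_assignments(peak, proton_shifts, tol_ppm=0.03):
    """List proton pairs whose shifts match a cross-peak within a tolerance.

    peak is (w1, w2) in ppm; proton_shifts maps atom labels to 1H shifts.
    More than one returned pair means the cross-peak is ambiguous.
    """
    w1, w2 = peak
    first = [a for a, s in proton_shifts.items() if abs(s - w1) <= tol_ppm]
    second = [a for a, s in proton_shifts.items() if abs(s - w2) <= tol_ppm]
    return [(a, b) for a in first for b in second if a != b]


if __name__ == "__main__":
    shifts = {
        "A10.HN": 8.20,
        "L25.HN": 8.21,   # overlaps with A10.HN
        "V40.HG1": 0.85,
        "I55.HD1": 0.86,  # overlaps with V40.HG1
        "K12.HA": 4.30,
    }
    print(candidate_assignments((4.30, 0.85), shifts))  # two candidates
    print(candidate_assignments((8.20, 0.85), shifts))  # four candidates
```

In an iterative protocol, preliminary structural models would be used to discard candidate pairs that are too far apart in space, progressively reducing each list towards a single assignment.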
Several excellent reviews discuss the internal workings of many successful NOE assignment and structure refinement approaches [7,15]; here, we focus on fragment-based approaches that offer an opportunity to further explore this concept through a more optimal use of chemical shifts as a means to improve sampling of native-like conformations. For instance, Auto-NOE-Rosetta (Automatic NOESY Assignment-Rosetta) [214] leverages the powerful RASREC-Rosetta sampling engine (see Section 4) together with an iterative NOE assignment algorithm which uses network anchoring [212], agreement with a pool of preliminary models (already built into the RASREC algorithm), and presence of symmetry-related cross-peaks. The assigned NOEs are used to derive distance restraints at various ambiguity levels [144]. Low-confidence, ambiguous restraints can be combined with highly unambiguous restraints [144,215] and used within eight distinct conformational sampling stages by Auto-NOE-Rosetta.
To demonstrate the improved performance of chemical shift fragment-based approaches in NOE-driven structure determination, we performed new structure calculations of the 198 aa α-lytic protease (aLP) protein from sequence information alone using Auto-NOE-Rosetta and compared them with PDB deposited models determined with the help of chemical shifts (PDB ID 5WOT) [216] (Fig. 10A and B). This comparison shows that the models obtained using chemical shift fragments have lower energies, exhibit higher convergence and are closer to the X-ray structure (PDB ID 1P01) (Fig. 10A–C). Further analysis of NOE assignments in the models for both scenarios revealed, as expected, very low (~30%) recovery of native residue pair contacts for sequence-based fragments (Fig. 10D) relative to the contacts obtained using chemical shift fragments (~70%). Moreover, Auto-NOE-Rosetta successfully assigned approximately three times more NOE restraints per residue in the structural ensemble calculated using chemical shift fragments than in the ensemble obtained using only the sequence fragments (Fig. 10E). This result stems from the sampling of native-like structures during the early stages of the protocol, which in turn helps assign more long-range NOE restraints. Hence, our comparison illustrates a high degree of synergy between chemical shifts and NOE structural restraints in driving CS-Rosetta structure calculations.
Fig. 10.

Synergy between chemical shift-based fragments and automated NOE assignments. (A) Ten lowest-energy structures of the 198 aa α-lytic protease computed by AutoNOE-Rosetta from fragments derived using the protein amino acid sequence together with manually assigned NMR backbone chemical shifts (PDB ID 5WOT) [216]. (B) Ten lowest-energy structures of the same protein computed by AutoNOE-Rosetta using fragments derived solely from the amino acid sequence. (C) Energy (in R.E.U., Rosetta Energy Units), RMSD (in Å) to X-ray structure (PDB ID 1P01) and Convergence (in %) of the ten lowest-energy structures computed by AutoNOE-Rosetta with (purple) and without (red) using chemical shifts (gray arrows). Convergence of the structures is as shown in the gradient scale to the right. (D) NOE contacts are defined as a function of residue pairs. Upper triangular region represents long-range (at least 5 residues apart) NOE contacts identified by AutoNOE-Rosetta for two independent calculations; first, performed by applying structural fragments derived from the protein amino acid sequence (red), and second, using chemical shift fragments (silver). The lower triangular region represents long-range NOE contacts predicted between all possible protons in the X-ray structure (PDB ID 1P01), using a 5.5 Å distance threshold (green). (E) Number of NOE restraints assigned by AutoNOE-Rosetta for each residue in the ten lowest-energy structural models computed with (silver) and without (red) using chemical shift-based fragments. Data obtained from [216].
11. New approaches to automated chemical shift assignments
The accuracy of all chemical shift-based structure modeling methods addressed so far depends, to a large extent, on both the correctness and completeness of the input chemical shift assignments. In recent years, there has been a surge of development in methods for automated chemical shift resonance assignments of both backbone and sidechain atoms, often with very high levels of accuracy (reviewed in [16,217]). Such algorithms have become integral components of structure calculation protocols employing NOEs and/or RDCs. While the majority of chemical shift assignment algorithms operate on the basis of a large number (6–10) of complementary NMR spectra, an effort towards reducing the number of input spectra is driven by the need to simplify and further automate the NMR structure determination process.
With this in mind, 4D-CHAINS, an automated procedure, was developed recently to assign backbone and sidechain chemical shifts using two complementary 4D (TOCSY and NOESY) spectra, recorded in fully protonated samples [216]. 4D-CHAINS uses 2D probability density maps of correlated 13C–1H chemical shifts to identify spin systems (termed amino acid index groups or AAIGs) in the input 4D data. Thereafter, AAIGs are mapped to amino acids in the query protein sequence using a procedure similar to genome assembly used in DNA sequencing [218]. During this process, contiguous segments of AAIGs are iteratively matched along the protein sequence until a self-consistent assignment solution is obtained. The high levels of accuracy and completeness of 4D-CHAINS (Fig. 11A) allow it to be combined with NOE assignment and structure determination algorithms, such as AutoNOE-Rosetta. The practical utility of the combined 4D-CHAINS/AutoNOE-Rosetta protocol was demonstrated recently through the structure calculation of aLP (Fig. 11B) [216]. Here, 4D-CHAINS assigned chemical shifts together with two unassigned NOESY peak lists were provided as input to AutoNOE-Rosetta. The combined protocol (i) generated structural models within 1.3 Å from the reference X-ray structure (PDB ID 1P01) (Fig. 11B) and (ii) captured two-thirds of the crystallographic NOE contacts across the entire protein, suggesting good recovery of near-native folds (Fig. 11B–D) [216]. Together, the 4D-CHAINS/AutoNOE-Rosetta approach forms a complete, automated pipeline for NMR structure determination from a minimal set of spectra.
Fig. 11.

Automated chemical shift assignment and structure determination of α-lytic protease using 4D-CHAINS/AutoNOE-Rosetta. The use of the 4D-CHAINS/AutoNOE-Rosetta pipeline [216] is illustrated for a 20 kDa, uniformly 13C, 15N-labeled protein with a highly complex β-fold topology. (A) 4D-CHAINS produces reliable assignments at completeness levels (~93%) which exceed the minimum required (~70%) by AutoNOE-Rosetta to converge on the correct fold using simulated peak lists. First, 4D-CHAINS assigns 77% of all observed backbone and sidechain chemical shifts using a 4D HC(CC–TOCSY(CO))NH experiment (dark green). Second, correct assignments are automatically extended by an additional 13% using common NOEs in a 4D 13C, 15N-edited HMQC-NOESY-HSQC (Heteronuclear Single Quantum Coherence) experiment (light green). The full method has a combined 1.9% error rate (red), and does not consider the resonances of aromatic or sidechain amide groups (silver), which can be readily obtained manually using the automated assignments as a guide. (B) Ensemble of ten lowest-energy structures calculated using AutoNOE-Rosetta, superimposed on the X-ray reference structure (PDB ID 1P01). Average RMSD to X-ray: 1.3 Å (computed for backbone atoms over core secondary structure regions). (C) NOE contacts defined for residue pairs along the sequence of α-lytic protease. The upper triangular region represents NOE contacts identified by AutoNOE-Rosetta using chemical shifts assigned by 4D-CHAINS and two complementary 4D NOESY peak lists (HCNH and HCCH). The lower triangular region represents all degenerate 1H NOE contacts predicted from the X-ray structure using a 5.5 Å distance threshold. (D) Comparison of the total number of NOE contacts between amide-amide, amide-aliphatic and aliphatic-aliphatic protons assigned by AutoNOE-Rosetta and NOEs predicted from the X-ray structure as described in (C). All structure diagrams were prepared using PYMOL (https://pymol.org/2/).
12. Integration of other NMR structural parameters in chemical shift-based methods
Classical NOE-based approaches for NMR structure determination rely on the analysis of short- to medium-range (<6 Å) 1H–1H distance restraints [219]. These local NOE connectivities are typically complemented with more global restraints obtained from measurements of RDCs, PREs and PCSs [220,221] during the final stages of structure refinement. More recently, the use of such “global” restraints, in conjunction with chemical shift fragment-based approaches, has proven to be a powerful combination to alleviate or reduce the requirement of NOEs for modeling protein structures. Here, we briefly discuss the utility of RDC-, PRE-, and PCS-derived restraints in chemical shift-based structure determination.
RDC measurements report on global orientations of inter-nuclear bond vectors with respect to an overall alignment frame [31,220,222]. RDCs are highly sensitive structural parameters; their application during structure refinement and validation can therefore help not only to identify the overall protein fold, but also to pinpoint detailed structural features, such as the precise equilibrium length of bonds [223,224] or deviations from planarity in the peptide group [225]. However, these high resolution applications of RDCs are limited to smaller proteins. Normally, RDC restraints have been employed within de novo structure determination protocols of various levels of complexity [226]. Due to the degeneracy of RDC values with respect to the underlying orientation of inter-nuclear vectors, multiple independent datasets recorded using different alignment media are required in order to define a uniquely preferred orientation [226]. More recently, significant progress has been made in the development of automated structure determination approaches guided by chemical shifts and/or sparse RDC restraints [7,226]. In all these methods, RDCs offer a highly complementary source of structural information to the backbone chemical shifts; while chemical shifts are very sensitive to the local backbone structure, RDCs help define long-range structural features, particularly the orientation of different secondary structural elements and individual domains within the structures of multi-domain proteins. This was recently demonstrated through RASREC-Rosetta calculations, where the use of amide RDCs in conjunction with backbone chemical shifts and sparse amide NOEs enabled structure determination of targets up to 25 kDa [81]. Finally, RDCs together with chemical shifts offer an opportunity for self-consistent cross-validation of NMR structures [227], which becomes particularly relevant in the face of sparse datasets.
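The orientational degeneracy mentioned above can be seen directly from the standard expression for an RDC in the principal axis frame of the alignment tensor, D = Da[(3cos²θ − 1) + (3/2)R sin²θ cos 2φ]. The following minimal sketch back-calculates an RDC from a bond vector under this expression; the function name and the values of Da and the rhombicity R are hypothetical, and the vector is assumed to be already expressed in the tensor frame.

```python
import math


def rdc_from_vector(vector, d_a, rhombicity):
    """Back-calculate an RDC (Hz) for a bond vector in the tensor frame.

    Uses D = Da * [(3 cos^2(theta) - 1) + 1.5 * R * sin^2(theta) * cos(2 phi)],
    where theta and phi are the polar angles of the normalized vector.
    """
    x, y, z = vector
    norm = math.sqrt(x * x + y * y + z * z)
    cos_theta = z / norm
    sin2_theta = 1.0 - cos_theta ** 2
    phi = math.atan2(y, x)
    return d_a * ((3.0 * cos_theta ** 2 - 1.0)
                  + 1.5 * rhombicity * sin2_theta * math.cos(2.0 * phi))


if __name__ == "__main__":
    d_a, r = 10.0, 0.4  # hypothetical alignment parameters
    v = (0.3, 0.5, 0.8)
    # Inverting the vector or mirroring it through a tensor plane gives the
    # same coupling, so one dataset cannot fix the orientation uniquely.
    for w in (v, (-0.3, -0.5, -0.8), (-0.3, 0.5, 0.8), (0.3, -0.5, 0.8)):
        print(w, round(rdc_from_vector(w, d_a, r), 3))
```

Because a second alignment medium generally has a differently oriented tensor, the sets of orientations compatible with each dataset intersect in fewer solutions, which is the reason multiple media are recommended.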
PRE restraints are obtained through a quantitative analysis of 15N and 13C relaxation rates in samples containing paramagnetic tags, typically attached via site-specific labeling approaches, relative to a diamagnetic reference sample. Here, the conjugation of nitroxide spin labels to engineered disulfides in proteins has been particularly useful. Alternatively, solution PREs have been widely adopted for structure modeling applications allowing for de novo structure determination of large proteins (40–100 kDa) in the absence of abundant long-range NOE restraints [221,228,229]. For instance, solvent (s)PRE-CS-Rosetta [229] makes use of the global fold information encoded within sPRE restraints (i.e. distance measurements between a paramagnetic solute and the protein surface) and chemical shifts to model protein structures. In the sPRE-CS-Rosetta protocol, amino acid sequences together with NMR chemical shifts are used to generate backbone fragments, which are subsequently assembled to produce low-resolution structural models (see Section 4). These low-resolution models are further used to back-calculate the sPRE effect for comparison against experimental data, and additionally to compute the sPRE-based score which is used to adjust the energy function. This approach leverages a fast, grid-based method for sPRE computation during the low-resolution stage of ab initio fragment assembly. Thus, the use of sPRE restraints complements chemical shift-based fragments by biasing the collapse of the polypeptide chain towards more native-like conformations [229].
PCS measurements provide structural restraints derived by measuring changes in chemical shift values due to the presence of a paramagnetic metal ion [230]. In contrast to NOEs and PREs, which show an r−6 dependence, PCSs display an r−3 distance dependence, allowing for comparatively longer distance measurements between atoms (up to 40 Å) [221], in conjunction with their orientational dependence that can provide a further powerful source of structural discrimination. Therefore, PCSs are used by protein structure determination algorithms to obtain global fold information [231–234]. While PCSs are extensively utilized during docking or structure refinement, their use is limited during de novo modeling because the tensor parameters used to calculate PCS distance restraints depend on atomic coordinates. PCS-Rosetta extends Rosetta’s ab initio algorithm and makes use of chemical shift-derived fragments together with a low-resolution energy function adjusted according to the PCS score (computed using experimental PCS data) [232]. The PCS-based score term is obtained by interleaving a grid search, which defines the position of the paramagnetic tag, with a singular value decomposition to fit the five tensor parameters. Following the low-resolution stage, sidechains are introduced and refined using a full-atom energy function augmented by the PCS score. This protocol has been recently expanded to include paramagnetic tags located at multiple sites with the aim of enabling more robust structure determination of smaller proteins [233].
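The distance and orientation dependence of the PCS can be made explicit with the standard expression PCS = (1/(12πr³))[Δχax(3cos²θ − 1) + (3/2)Δχrh sin²θ cos 2φ], evaluated in the principal axis frame of the Δχ tensor with the metal at the origin. The sketch below is a minimal illustration of this back-calculation and is not the PCS-Rosetta implementation; the function name and tensor magnitudes are hypothetical.

```python
import math


def pcs_ppm(position_angstrom, dchi_ax, dchi_rh):
    """Back-calculate a pseudocontact shift in ppm.

    position_angstrom: nucleus coordinates (x, y, z) in the tensor frame,
        with the paramagnetic center at the origin, in Angstrom.
    dchi_ax, dchi_rh: axial and rhombic tensor components in 1e-32 m^3.
    """
    x, y, z = position_angstrom
    r = math.sqrt(x * x + y * y + z * z)
    cos_theta = z / r
    sin2_theta = 1.0 - cos_theta ** 2
    phi = math.atan2(y, x)
    geometry = (dchi_ax * (3.0 * cos_theta ** 2 - 1.0)
                + 1.5 * dchi_rh * sin2_theta * math.cos(2.0 * phi))
    r_m = r * 1e-10  # Angstrom to meters
    # Factor 1e-32 restores SI units of the tensor; 1e6 converts to ppm.
    return 1e6 * geometry * 1e-32 / (12.0 * math.pi * r_m ** 3)


if __name__ == "__main__":
    # Hypothetical tensor: axial 10e-32 m^3, rhombic 3e-32 m^3.
    for d in (10.0, 20.0, 40.0):
        print(f"r = {d:4.0f} A  PCS = {pcs_ppm((0.0, 0.0, d), 10.0, 3.0):.3f} ppm")
```

Doubling the distance reduces the PCS eightfold, whereas an r−6 effect would fall 64-fold over the same interval, which is why PCSs remain measurable at longer range.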
As highlighted by these efforts, the combination of RDCs, PREs and PCSs with local structural restraints obtained from NMR chemical shifts provides a powerful approach towards modeling protein structures with high accuracy, using very sparse or, in some cases, no NOE data. This can be a valuable tool for the study of membrane proteins [235,236], and proteins in the solid-state [237].
13. Accessibility and performance of chemical shift-based structure determination methods
All chemical shift-based structure determination methods discussed in this review are available publicly via web servers or downloadable software packages (Table 2). Detailed manuals are available for most methods, rendering them easy to use by users with a minimal background in UNIX operating systems. Most de novo prediction methods employ fragment assembly to model monomeric or oligomeric protein structures, with various computational requirements owing to the complexity and parallelization of the corresponding protocols. While the fragment selection step itself has very modest computational cost (30–40 min on a commodity machine, depending on the target size), and can be run in parallel using the MPI build of Rosetta, ab initio structure refinement is a more demanding task.
Table 2.
Availability of a subset of the chemical shift-based methods used for structure determination discussed in this review.
To perform a large number of structure calculations, computer clusters consisting of 16 or more CPUs are recommended.
As a representative example, we compared total runtimes as a function of the number of processors for the 20 kDa aLP target using three main approaches: CS-Rosetta, RASREC-Rosetta and AutoNOE-Rosetta. As input, these protocols were given the amino acid sequence along with three- and nine-residue fragments derived using NMR chemical shifts [216]. In addition, two unassigned NOESY (4D HCCH and 4D HCNH) peak lists were provided to AutoNOE-Rosetta to perform NOE assignment alongside ab initio structure determination. CS-Rosetta and RASREC-Rosetta calculations were performed independently using manually assigned NOE constraints. We sampled a total of 10,000 independent CS-Rosetta structures, whereas RASREC-Rosetta and AutoNOE-Rosetta generated 50–80 batches of 100 structures (depending on the progression of each protocol). The fragment-based approaches generally require 16 or more CPUs (of a commodity computer or a UNIX-based cluster) to yield structures in a reasonable amount of time (Fig. 12). Due to sampling bottlenecks, it is recommended to run such calculations on 64 or more CPUs for larger (>200 aa) targets.
Fig. 12.

Performance of chemical shift-based fragment assembly methods. Performance of CS-Rosetta (green), RASREC-Rosetta (orange) and AutoNOE-Rosetta (blue) for a 20 kDa target, aLP, given by their runtimes measured as a function of the number of processors (or CPUs) used for structure calculation. All the runs were carried out using sequence information, chemical shifts assigned by 4D-CHAINS, NOE restraints (for CS- and RASREC-Rosetta) and unassigned peak lists (for AutoNOE-Rosetta) as input. The points on the plot represent independent structure calculations performed by the respective methods using various numbers of processors (16, 32, 64, 100 and 200). The y-axis shows the time (in hours) taken by the methods for each calculation, which is bounded by the number of hours considered reasonable (~250 h or 10 days). For CS-Rosetta calculations, 10,000 structures are produced during each run. Similarly, for RASREC-Rosetta and AutoNOE-Rosetta calculations, 50–80 batches of size 100 are produced for every execution.
By comparison, homology-based approaches carry out a preprocessing step to derive restraints or to identify templates from a set of evolutionarily related proteins. Representative times needed to generate restraints or find templates for aLP using CS-HM-Rosetta, Pomona, and EVFold (for RASREC-Rosetta and EC-NMR) are approximately 4, 400 and 300 min, respectively, on a single CPU.
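To make the dependence of runtime on processor count more concrete, the following is a minimal sketch of how wall-clock time could be estimated for independent, CS-Rosetta-style sampling under ideal (perfectly parallel) scaling. It is not part of any Rosetta distribution: the function names and the per-structure cost of 0.5 CPU-hours are hypothetical placeholders chosen for illustration, not measured values. Only the sample size of 10,000 structures, the processor counts and the ~250 h limit are taken from the text. Iterative protocols such as RASREC-Rosetta, which proceed batch by batch, are not expected to follow this simple model.

```python
import math


def ideal_wall_hours(n_structures, cpu_hours_per_structure, n_cpus):
    """Wall-clock hours for independent trajectories under perfect scaling."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    # Each CPU runs its share of the trajectories one after another.
    rounds = math.ceil(n_structures / n_cpus)
    return rounds * cpu_hours_per_structure


def min_cpus_for_budget(n_structures, cpu_hours_per_structure, budget_hours):
    """Smallest CPU count that completes all trajectories within the budget."""
    # Number of trajectories a single CPU can finish before the deadline.
    max_rounds = math.floor(budget_hours / cpu_hours_per_structure)
    if max_rounds < 1:
        raise ValueError("one structure alone exceeds the time budget")
    return math.ceil(n_structures / max_rounds)


if __name__ == "__main__":
    N_STRUCTURES = 10_000   # CS-Rosetta sample size quoted in the text
    BUDGET_HOURS = 250      # upper bound considered reasonable (Fig. 12)
    COST = 0.5              # ASSUMPTION: hypothetical CPU-hours per structure

    for n_cpus in (16, 32, 64, 100, 200):
        hours = ideal_wall_hours(N_STRUCTURES, COST, n_cpus)
        print(f"{n_cpus:>3} CPUs: {hours:.1f} h")

    needed = min_cpus_for_budget(N_STRUCTURES, COST, BUDGET_HOURS)
    print(f"CPUs needed to stay within {BUDGET_HOURS} h: {needed}")
```

In practice, the per-structure cost would have to be measured from a short pilot run on the target of interest, and the single-CPU preprocessing times quoted above for the homology-based approaches would be added as a fixed, serial overhead.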
14. Conclusions and future outlook
Ever since the first de novo atomic-resolution structure was determined by NMR in 1985 [238], chemical shifts have remained an invaluable tool for spectroscopists examining the structure and dynamics of biomolecules, for systems up to 1 MDa (megadalton), in both the solution [174] and solid state [239]. More recently, NMR methods have been applied to determine protein structures within living cells [240]. In this review, we have outlined a representative subset of complementary approaches for chemical shift-driven structure determination. The active development of new algorithms and the expansion of curated databases have the potential to further improve the robustness and accuracy of chemical shift-based methods, to complement and possibly replace classical methods of NMR structure calculation. While the sensitivity and versatility of chemical shifts for structure determination are highlighted by the sheer number and applicability of available approaches, the information provided by chemical shifts alone is largely limited to a description of local geometry [4]. Thus, hybrid approaches that combine chemical shifts with additional short-range and long-range restraints, such as NOE [219], RDC [220], PRE [221], and PCS [221] measurements, are expected to further increase the scope, accuracy and resolution of NMR-derived structures. Integrated approaches that combine NMR chemical shifts with other types of experimental data, such as SAXS [151], Cryo-EM (Cryo-electron microscopy) [241], SANS (Small Angle Neutron Scattering) [242], and EPR (Electron paramagnetic resonance) [243], will provide additional avenues for structure determination of larger and more challenging systems.
At the same time, automated methods have streamlined the chemical shift assignment procedure, allowing for structure determination of small to moderate-sized proteins (up to 200 residues, ~22 kDa) with minimal intervention by the user [7,217]. Progress in automated NMR structure determination will enable a more thorough description of the protein fold space, allowing for more accurate homology modeling and fragment generation. For systems of larger size and dynamic complexity, automated methods benefit from advances in selective isotope labeling schemes [244], the use of probes with favorable relaxation properties, such as methyl groups [245], and the utilization of sparse restraints [94,104,128]. Here, highly parallel, iterative protocols, such as RASREC-Rosetta, can lead to a drastic improvement in sampling efficiency and accurately determine near-native structures from sparser datasets. Knowing that sequence covariance can provide sufficient long-range information to model the folds of medium-sized proteins, several research groups have moved on to incorporate evolutionary information, which drastically reduces the computational costs required by more data-oriented approaches. With the advent of sparse data recorded for larger systems, the mining of evolutionary information from genome sequencing and the fine-tuning of sidechain conformations according to chemical shift data, the next generation of methods will aim to deliver a more accurate view of biomolecular structures and their dynamics, towards a new renaissance in structure determination by NMR methods.
Acknowledgements
The authors would like to thank Oliver Lange, Robert Vernon, Yang Shen, Jinfa Ying, Paolo Rossi, Flemming Hansen, Kostas Tripsianes, David Baker and Ad Bax for helpful discussions over the years. This manuscript was supported in part by funds from the Intramural Research Program of the NIAID, NIH, a K-22 Career Development and an R35 Outstanding Investigator Award to N.G.S. through NIAID (AI112573) and NIGMS (R35GM125034), respectively. Research reported in this publication was supported by the Office of the Director, NIH, under Award Number S10OD018455.
Glossary of abbreviations
- Å
Angstrom
- µs
microsecond
- aa
amino acid
- aLP
α-lytic protease
- AAIG
amino acid index group
- ARIA
ambiguous restraints for iterative assignment
- AutoNOE-Rosetta
automatic NOESY assignment-rosetta
- BLOSUM
block substitution matrix
- BMRB
biological magnetic resonance bank
- CASD-NMR
community wide assessment of NMR structure determination
- CASP
critical assessment of methods of protein structure prediction
- CEST
chemical-exchange saturation transfer
- CHEOPS
chemical shift de novo structure derivation protocol employing singular value decomposition
- CHESHIRE
chemical shift restraints
- CMV
cytomegalovirus
- COSY
correlation spectroscopy
- CPMG
carr-purcell-meiboom-gill
- Cryo-EM
cryo-electron microscopy
- CS-HM-Rosetta
chemical shift-homology modeling-rosetta
- CS-Rosetta-RNA
chemical shift-rosetta-ribonucleic acid
- CS-Rosetta
chemical shift-rosetta
- CS-RosettaCM
chemical shift-rosetta comparative modeling
- CSA
chemical shift anisotropy
- CS-MD
chemical shift restrained molecular dynamics
- CYANA
combined assignment and dynamics algorithm for nmr applications
- DFT
density functional theory
- DI
direct information
- DSSP
database of secondary structure assignments
- EC
evolutionary coupling
- EC-NMR
evolutionary coupling-nuclear magnetic resonance spectroscopy
- EPR
electron paramagnetic resonance
- EVFold
evolutionary fold
- FARFAR
fragment assembly of RNA with full atom refinement
- HADDOCK
high ambiguity driven docking
- HHSearch
HMM-HMM search
- HIV
human immunodeficiency virus
- HMM
hidden markov model
- HMQC
heteronuclear multiple quantum coherence
- HSQC
heteronuclear single quantum coherence
- I-TASSER
iterative threading assembly refinement
- Ig
immunoglobulin
- ILV
isoleucine leucine valine
- kDa
kilodalton
- MDa
megadalton
- MFR
molecular fragment replacement
- MPI
message passing interface
- ms
millisecond
- MSA
multiple sequence alignment
- NESG
northeast structural genomics
- NMR
nuclear magnetic resonance
- NOE
nuclear overhauser effect
- NOESY
nuclear overhauser effect spectroscopy
- NUS
non-uniform sampling
- PCS
pseudocontact shifts
- PDB
protein data bank
- PISCES
public server for culling sets of protein sequences from the PDB by sequence identity
- PLM
pseudo likelihood maximization
- Pomona
protein alignments obtained by matching of NMR assignments
- PRE
paramagnetic relaxation enhancement
- PROMEGA
proline omega angle prediction
- RASREC-Rosetta
resolution adapted structural recombination-rosetta
- RDC
residual dipolar coupling
- R.E.U.
rosetta energy units
- RM
regulatory module
- RMSD
root mean square deviation
- RNA
ribonucleic acid
- RosettaCM
rosetta comparative modeling
- RTT
regulator of Ty1 transposition
- SANS
small-angle neutron scattering
- SAXS
small-angle X-ray scattering
- SCOP
structural classification of proteins
- SH
src homology
- SPARTA
shifts predicted from analogy in residue type and torsion angle
- sPRE
solvent paramagnetic relaxation enhancement
- SrtA
sortase A
- STRIDE
structural identification
- SWA
stepwise assembly
- TALOS
torsion angle likelihood obtained from shift and sequence similarity
- TOCSY
total correlation spectroscopy
Footnotes
Conflict of interest
The authors declare that they have no conflict of interest.
References
- [1].Wagner G, Pardi A, Wuethrich K, Hydrogen bond length and proton NMR chemical shifts in proteins, J. Am. Chem. Soc 105 (1983) 5948–5949, 10.1021/ja00356a056. [DOI] [Google Scholar]
- [2].Mielke SP, Krishnan VV, Characterization of protein secondary structure from NMR chemical shifts, Prog. Nucl. Magn. Reson. Spectrosc 54 (2009) 141–165, 10.1016/j.pnmrs.2008.06.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [3].Saitô H, Ando I, Ramamoorthy A, Chemical shift tensor – the heart of NMR: Insights into biological aspects of proteins, Prog. Nucl. Magn. Reson. Spectrosc 57 (2010) 181–228, 10.1016/j.pnmrs.2010.04.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [4].Wishart DS, Interpreting protein chemical shift data, Prog. Nucl. Magn. Reson. Spectrosc 58 (2011) 62–87, 10.1016/j.pnmrs.2010.07.004. [DOI] [PubMed] [Google Scholar]
- [5].Berjanskii MV, Wishart DS, Unraveling the meaning of chemical shifts in protein NMR, Biochim. Biophys. Acta 1865 (2017) 1564–1576, 10.1016/j.bbapap.2017.07.005. [DOI] [PubMed] [Google Scholar]
- [6].Robustelli P, Cavalli A, Vendruscolo M, Determination of protein structures in the solid state from NMR chemical shifts, Structure 16 (2008) 1764–1769, 10.1016/j.str.2008.10.016. [DOI] [PubMed] [Google Scholar]
- [7].Guerry P, Herrmann T, Advances in automated NMR protein structure determination, Q. Rev. Biophys 44 (2011) 257–309. [DOI] [PubMed] [Google Scholar]
- [8].Wagner G, NMR investigations of protein structure, Prog. Nucl. Magn. Reson. Spectrosc 22 (1990) 101–139, 10.1016/0079-6565(90)80003-Z. [DOI] [Google Scholar]
- [9].Frueh DP, Practical aspects of NMR signal assignment in larger and challenging proteins, Prog. Nucl. Magn. Reson. Spectrosc 78 (2014) 47–75, 10.1016/j.pnmrs.2013.12.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [10].Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE, The Protein Data Bank, Nucleic Acids Res 28 (2000) 235–242. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [11].Salvi N, Abyzov A, Blackledge M, Atomic resolution conformational dynamics of intrinsically disordered proteins from NMR spin relaxation, Prog. Nucl. Magn. Reson. Spectrosc 102 (2017) 43–60, 10.1016/j.pnmrs.2017.06.001. [DOI] [PubMed] [Google Scholar]
- [12].Wüthrich K, NMR of Proteins and Nucleic Acids, Wiley, New York, 1986. [Google Scholar]
- [13].Gronenborn AM, Clore GM, Structures of protein complexes by multidimensional heteronuclear magnetic resonance spectroscopy, Crit. Rev. Biochem. Mol. Biol 30 (1995) 351–385, 10.3109/10409239509083489. [DOI] [PubMed] [Google Scholar]
- [14].Wüthrich K, NMR – this other method for protein and nucleic acid structure determination, Acta Crystallogr. D Biol. Crystallogr 51 (1995) 249–270, 10.1107/S0907444994010188. [DOI] [PubMed] [Google Scholar]
- [15].Güntert P, Automated structure determination from NMR spectra, Eur. Biophys. J 38 (2009) 129–143, 10.1007/s00249-008-0367-z. [DOI] [PubMed] [Google Scholar]
- [16].Güntert P, Automated NMR protein structure calculation, Prog. Nucl. Magn. Reson. Spectrosc 43 (2003) 105–125, 10.1016/S0079-6565(03)00021-9. [DOI] [Google Scholar]
- [17].Clore GM, Gronenborn AM, Applications of three- and four-dimensional heteronuclear NMR spectroscopy to protein structure determination, Prog. Nucl. Magn. Reson. Spectrosc 23 (1991) 43–92, 10.1016/0079-6565(91)80002-J. [DOI] [Google Scholar]
- [18].López-Méndez B, Güntert P, Automated protein structure determination from NMR spectra, J. Am. Chem. Soc 128 (2006) 13112–13122, 10.1021/ja061136l. [DOI] [PubMed] [Google Scholar]
- [19].Das R, Baker D, Macromolecular modeling with rosetta, Annu. Rev. Biochem 77 (2008) 363–382, 10.1146/annurev.biochem.77.062906.171838. [DOI] [PubMed] [Google Scholar]
- [20].Miao Z, Westhof E, RNA structure: advances and assessment of 3D structure prediction, Annu. Rev. Biophys 46 (2017) 483–503, 10.1146/annurev-biophys-070816-034125. [DOI] [PubMed] [Google Scholar]
- [21].Kinch L, Shi SY, Qian C, Cheng H, Liao Y, Grishin NV, CASP9 assessment of free modeling target predictions, Proteins 79 (2011) 59–73, 10.1002/prot.23181. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [22].Kryshtafovych A, Fidelis K, Moult J, CASP9 results compared to those of previous CASP experiments, Proteins 79 (2011) 196–207, 10.1002/prot.23182. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [23].Karaca E, Rodrigues JPGLM, Graziadei A, Bonvin AMJJ, Carlomagno T, M3: an integrative framework for structure determination of molecular machines, Nat. Methods 14 (2017) 897–902, 10.1038/nmeth.4392. [DOI] [PubMed] [Google Scholar]
- [24].Webb B, Viswanath S, Bonomi M, Pellarin R, Greenberg CH, Saltzberg D, Sali A, Integrative structure modeling with the Integrative Modeling Platform, Protein Sci. Publ. Protein Soc 27 (2018) 245–258, 10.1002/pro.3311. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [25].Pardi A, Wagner G, Wüthrich K, Protein conformation and proton nuclear-magnetic-resonance chemical shifts, Fed. Eur. Biochem. Soc. J 137 (1983) 445–454, 10.1111/j.1432-1033.1983.tb07848.x. [DOI] [PubMed] [Google Scholar]
- [26].Williamson MP, Secondary-structure dependent chemical shifts in proteins, Biopolymers 29 (1990) 1423–1431, 10.1002/bip.360291009. [DOI] [PubMed] [Google Scholar]
- [27].Wishart DS, Sykes BD, Richards FM, Relationship between nuclear magnetic resonance chemical shift and protein secondary structure, J. Mol. Biol 222 (1991) 311–333, 10.1016/0022-2836(91)90214-Q. [DOI] [PubMed] [Google Scholar]
- [28].Wang AC, Bax A, Determination of the backbone dihedral angles φ in human ubiquitin from reparametrized empirical Karplus equations, J. Am. Chem. Soc 118 (1996) 2483–2494, 10.1021/ja9535524. [DOI] [Google Scholar]
- [29].Hoch JC, Dobson CM, Karplus M, Vicinal coupling constants and protein dynamics, Biochemistry (Mosc.) 24 (1985) 3831–3841. [DOI] [PubMed] [Google Scholar]
- [30].Bax A, Vuister GW, Grzesiek S, Delaglio F, Wang AC, Tschudin R, Zhu G, Measurement of homo- and heteronuclear J couplings from quantitative J correlation, Methods Enzymol 239 (1994) 79–105. [DOI] [PubMed] [Google Scholar]
- [31].Reif B, Hennig M, Griesinger C, Direct measurement of angles between bond vectors in high-resolution NMR, Science 276 (1997) 1230–1233. [DOI] [PubMed] [Google Scholar]
- [32].Kloiber K, Schüler W, Konrat R, Automated NMR determination of protein backbone dihedral angles from cross-correlated spin relaxation, J. Biomol. NMR 22 (2002) 349–363. [DOI] [PubMed] [Google Scholar]
- [33].Li Y, Palmer AG, Narrowing of protein NMR spectral lines broadened by chemical exchange, J. Am. Chem. Soc 132 (2010) 8856–8857, 10.1021/ja103251h. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [34].Markley JL, Ulrich EL, Berman HM, Henrick K, Nakamura H, Akutsu H, BioMagResBank (BMRB) as a partner in the Worldwide Protein Data Bank (wwPDB): new policies affecting biomolecular NMR depositions, J. Biomol. NMR 40 (2008) 153–155, 10.1007/s10858-008-9221-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [35].Faraggi E, Xue B, Zhou Y, Improving the prediction accuracy of residue solvent accessibility and real-value backbone torsion angles of proteins by guided-learning through a two-layer neural network, Proteins 74 (2009) 847–856, 10.1002/prot.22193. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [36].Shen Y, Delaglio F, Cornilescu G, Bax A, TALOS+: a hybrid method for predicting protein backbone torsion angles from NMR chemical shifts, J. Biomol. NMR 44 (2009) 213–223, 10.1007/s10858-009-9333-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [37].Rieping W, Vranken WF, Validation of archived chemical shifts through atomic coordinates, Proteins Struct. Funct. Bioinf 78 (2010) 2482–2489, 10.1002/prot.22756. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [38].Shen Y, Bax A, Prediction of Xaa-Pro peptide bond conformation from sequence and chemical shifts, J. Biomol. NMR 46 (2010) 199–204, 10.1007/s10858-009-9395-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [39].Gronwald W, Willard L, Jellard T, Boyko RF, Rajarathnam K, Wishart DS, Sönnichsen FD, Sykes BD, CAMRA: chemical shift based computer aided protein NMR assignments, J. Biomol. NMR 12 (1998) 395–405. [DOI] [PubMed] [Google Scholar]
- [40].Wishart DS, Watson MS, Boyko RF, Sykes B, Automated 1H and 13C chemical shift prediction using the BioMagResBank, J. Biomol. NMR 10 (1997) 329–336, 10.1023/A:1018373822088. [DOI] [PubMed] [Google Scholar]
- [41].Neal S, Nip AM, Zhang H, Wishart DS, Rapid and accurate calculation of protein 1H, 13C and 15N chemical shifts, J. Biomol. NMR 26 (2003) 215–240. [DOI] [PubMed] [Google Scholar]
- [42].Shen Y, Bax A, Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology, J. Biomol. NMR 38 (2007) 289–302, 10.1007/s10858-007-9166-6. [DOI] [PubMed] [Google Scholar]
- [43].Shen Y, Bax A, SPARTA+: a modest improvement in empirical NMR chemical shift prediction by means of an artificial neural network, J. Biomol. NMR 48 (2010) 13–22, 10.1007/s10858-010-9433-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [44].Kohlhoff KJ, Robustelli P, Cavalli A, Salvatella X, Vendruscolo M, Fast and accurate predictions of protein NMR chemical shifts from interatomic distances, J. Am. Chem. Soc 131 (2009) 13894–13895, 10.1021/ja903772t. [DOI] [PubMed] [Google Scholar]
- [45].Xu XP, Case DA, Automated prediction of 15N, 13Calpha, 13Cbeta and 13C′ chemical shifts in proteins using a density functional database, J. Biomol. NMR 21 (2001) 321–333. [DOI] [PubMed] [Google Scholar]
- [46].Meiler J, PROSHIFT: protein chemical shift prediction using artificial neural networks, J. Biomol. NMR 26 (2003) 25–37. [DOI] [PubMed] [Google Scholar]
- [47].Han B, Liu Y, Ginzinger SW, Wishart DS, SHIFTX2: significantly improved protein chemical shift prediction, J. Biomol. NMR 50 (2011) 43–57, 10.1007/s10858-011-9478-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [48].Bonneau R, Baker D, Ab initio protein structure prediction: progress and prospects, Annu. Rev. Biophys. Biomol. Struct 30 (2001) 173–189, 10.1146/annurev.biophys.30.1.173. [DOI] [PubMed] [Google Scholar]
- [49].Bonneau R, Tsai J, Ruczinski I, Chivian D, Rohl C, Strauss CEM, Baker D, Rosetta in CASP4: progress in ab initio protein structure prediction, Proteins Struct. Funct. Bioinf 45 (2001) 119–126, 10.1002/prot.1170. [DOI] [PubMed] [Google Scholar]
- [50].Wallner B, Elofsson A, All are not equal: a benchmark of different homology modeling programs, Protein Sci. Publ. Protein Soc 14 (2005) 1315–1327, 10.1110/ps.041253405. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [51].Lemer CM, Rooman MJ, Wodak SJ, Protein structure prediction by threading methods: evaluation of current techniques, Proteins 23 (1995) 337–355, 10.1002/prot.340230308. [DOI] [PubMed] [Google Scholar]
- [52].Kim DE, Blum B, Bradley P, Baker D, Sampling bottlenecks in de novo protein structure prediction, J. Mol. Biol 393 (2009) 249–260, 10.1016/j.jmb.2009.07.063. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [53].Leaver-Fay A, Tyka M, Lewis SM, Lange OF, Thompson J, Jacak R, Kaufman KW, Renfrew PD, Smith CA, Sheffler W, Davis IW, Cooper S, Treuille A, Mandell DJ, Richter F, Ban Y-EA, Fleishman SJ, Corn JE, Kim DE, Lyskov S, Berrondo M, Mentzer S, Popović Z, Havranek JJ, Karanicolas J, Das R, Meiler J, Kortemme T, Gray JJ, Kuhlman B, Baker D, Bradley P, ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules, Methods Enzymol 487 (2011) 545–574, 10.1016/B978-0-12-381270-4.00019-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [54].Xu D, Zhang Y, Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field, Proteins 80 (2012) 1715–1735, 10.1002/prot.24065. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [55].Roy A, Kucukural A, Zhang Y, I-TASSER: a unified platform for automated protein structure and function prediction, Nat. Protoc 5 (2010) 725–738, 10.1038/nprot.2010.5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [56].Bowie JU, Eisenberg D, An evolutionary approach to folding small alpha-helical proteins that uses sequence information and an empirical guiding fitness function, Proc. Natl. Acad. Sci. USA 91 (1994) 4436–4440. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [57].Jones DT, Successful ab initio prediction of the tertiary structure of NK-lysin using multiple sequences and recognized supersecondary structural motifs, Proteins 29 (1997) 185–191. [DOI] [PubMed] [Google Scholar]
- [58].Simons KT, Kooperberg C, Huang E, Baker D, Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and bayesian scoring functions, J. Mol. Biol 268 (1997) 209–225, 10.1006/jmbi.1997.0959. [DOI] [PubMed] [Google Scholar]
- [59].Cavalli A, Salvatella X, Dobson CM, Vendruscolo M, Protein structure determination from NMR chemical shifts, Proc. Natl. Acad. Sci. USA 104 (2007) 9615–9620, 10.1073/pnas.0610313104. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [60].Shen Y, Lange O, Rossi P, Aramini JM, Liu G, Eletsky A, Wu Y, Singarapu KK, Lemak A, Ignatchenko A, Arrowsmith CH, Szyperski T, Montelione GT, Baker D, Bax A, Consistent blind protein structure generation from NMR chemical shift data, Proc. Natl. Acad. Sci. USA 105 (2008) 4685–4690, 10.1073/pnas.0800256105. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [61].Shapovalov MV, Dunbrack RL, A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions, Structure 19 (2011) 844–858, 10.1016/j.str.2011.03.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [62].Delaglio F, Kontaxis G, Bax A, Protein structure determination using molecular fragment replacement and NMR dipolar couplings, J. Am. Chem. Soc 122 (2000) 2142–2143, 10.1021/ja993603n. [DOI] [Google Scholar]
- [63].Kraulis PJ, Jones TA, Determination of three-dimensional protein structures from nuclear magnetic resonance data using fragments of known structures, Proteins 2 (1987) 188–201, 10.1002/prot.340020304. [DOI] [PubMed] [Google Scholar]
- [64].Kontaxis G, Delaglio F, Bax A, Molecular fragment replacement approach to protein structure determination by chemical shift and dipolar homology database mining, Methods Enzymol 394 (2005) 42–78, 10.1016/S0076-6879(05)94003-2. [DOI] [PubMed] [Google Scholar]
- [65].Kontaxis G, An improved algorithm for MFR fragment assembly, J. Biomol. NMR 53 (2012) 149–159, 10.1007/s10858-012-9632-7. [DOI] [PubMed] [Google Scholar]
- [66].Chandonia J-M, Hon G, Walker NS, Lo Conte L, Koehl P, Levitt M, Brenner SE, The ASTRAL compendium in 2004, Nucleic Acids Res 32 (2004) D189–D192, 10.1093/nar/gkh034. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [67].Frishman D, Argos P, Knowledge-based protein secondary structure assignment, Proteins Struct. Funct. Bioinf 23 (1995) 566–579, 10.1002/prot.340230412. [DOI] [PubMed] [Google Scholar]
- [68].Cornilescu G, Delaglio F, Bax A, Protein backbone angle restraints from searching a database for chemical shift and sequence homology, J. Biomol. NMR 13 (1999) 289–302. [DOI] [PubMed] [Google Scholar]
- [69].Rosato A, Bagaria A, Baker D, Bardiaux B, Cavalli A, Doreleijers JF, Giachetti A, Guerry P, Güntert P, Herrmann T, Huang YJ, Jonker HRA, Mao B, Malliavin TE, Montelione GT, Nilges M, Raman S, van der Schot G, Vranken WF, Vuister GW, Bonvin AMJJ, CASD-NMR: critical assessment of automated structure determination by NMR, Nat. Methods 6 (2009) 625–626, 10.1038/nmeth0909-625. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [70].Rosato A, Aramini JM, Arrowsmith CH, Bagaria A, Baker D, Cavalli A, Doreleijers JF, Eletsky A, Giachetti A, Guerry P, Gutmanas A, Güntert P, He Y, Herrmann T, Huang YJ, Jaravine V, Jonker HRA, Kennedy MA, Lange OF, Liu G, Malliavin TE, Mani R, Mao B, Montelione GT, Nilges M, Rossi P, van der Schot G, Schwalbe H, Szyperski TA, Vendruscolo M, Vernon R, Vranken WF, de Vries S, Vuister GW, Wu B, Yang Y, Bonvin AMJJ, Blind testing of routine, fully automated determination of protein structures from NMR data, Structure 20 (2012) 227–236, 10.1016/j.str.2012.01.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [71].Wang G, Dunbrack RL, PISCES: a protein sequence culling server, Bioinf. Oxf Engl 19 (2003) 1589–1591. [DOI] [PubMed] [Google Scholar]
- [72].Kabsch W, Sander C, Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features, Biopolymers 22 (1983) 2577–2637, 10.1002/bip.360221211. [DOI] [PubMed] [Google Scholar]
- [73].Gront D, Kulp DW, Vernon RM, Strauss CEM, Baker D, Generalized fragment picking in rosetta: design, protocols and applications, PLoS One 6 (2011) e23294, 10.1371/journal.pone.0023294. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [74].Bradley P, Baker D, Improved beta-protein structure prediction by multilevel optimization of nonlocal strand pairings and local backbone conformation, Proteins Struct. Funct. Bioinf 65 (2006) 922–929, 10.1002/prot.21133. [DOI] [PubMed] [Google Scholar]
- [75].Kortemme T, Morozov AV, Baker D, An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes, J. Mol. Biol 326 (2003) 1239–1259, 10.1016/S0022-2836(03)00021-4. [DOI] [PubMed] [Google Scholar]
- [76].Morozov AV, Kortemme T, Tsemekhman K, Baker D, Close agreement between the orientation dependence of hydrogen bonds observed in protein structures and quantum mechanical calculations, Proc. Natl. Acad. Sci. USA 101 (2004) 6946–6951, 10.1073/pnas.0307578101. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [77].Morozov AV, Kortemme T, Potential functions for hydrogen bonds in protein structure prediction and design, Adv. Protein Chem 72 (2005) 1–38, 10.1016/S0065-3233(05)72001-5. [DOI] [PubMed] [Google Scholar]
- [78].Jasnovidova O, Krejcikova M, Kubicek K, Stefl R, Structural insight into recognition of phosphorylated threonine-4 of RNA polymerase II C-terminal domain by Rtt103p, EMBO Rep 18 (2017) 906–913, 10.15252/embr.201643723. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [79].Jasnovidova O, Klumpler T, Kubicek K, Kalynych S, Plevka P, Stefl R, Structure and dynamics of the RNAPII CTDsome with Rtt103, Proc. Natl. Acad. Sci. USA 114 (2017) 11133–11138, 10.1073/pnas.1712450114. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [80].Lunde BM, Reichow SL, Kim M, Suh H, Leeper TC, Yang F, Mutschler H, Buratowski S, Meinhart A, Varani G, Cooperative interaction of transcription termination factors with the RNA polymerase II C-terminal domain, Nat. Struct. Mol. Biol 17 (2010) 1195–1201, 10.1038/nsmb.1893. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [81].Raman S, Lange OF, Rossi P, Tyka M, Wang X, Aramini J, Liu G, Ramelot TA, Eletsky A, Szyperski T, Kennedy MA, Prestegard J, Montelione GT, Baker D, NMR structure determination for larger proteins using backbone-only data, Science 327 (2010) 1014–1018, 10.1126/science.1183649. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [82].Lange OF, Baker D, Resolution-adapted recombination of structural features significantly improves sampling in restraint-guided structure calculation, Proteins 80 (2012) 884–895, 10.1002/prot.23245. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [83].Blum B, Jordan MI, Baker D, Feature space resampling for protein conformational search, Proteins Struct. Funct. Bioinf 78 (2010) 1583–1593, 10.1002/prot.22677. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [84].Brunette TJ, Brock O, Improving protein structure prediction with model-based search, Bioinformatics 21 (2005) i66–i74, 10.1093/bioinformatics/bti1029. [DOI] [PubMed] [Google Scholar]
- [85].Brunette T, Brock O, Guiding conformation space search with an all-atom energy potential, Proteins Struct. Funct. Bioinf 73 (2008) 958–972, 10.1002/prot.22123. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [86].Qian B, Raman S, Das R, Bradley P, McCoy AJ, Read RJ, Baker D, High-resolution structure prediction and the crystallographic phase problem, Nature 450 (2007) 259, 10.1038/nature06249. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [87].Lange OF, Rossi P, Sgourakis NG, Song Y, Lee H-W, Aramini JM, Ertekin A, Xiao R, Acton TB, Montelione GT, Baker D, Determination of solution structures of proteins up to 40 kDa using CS-Rosetta with sparse NMR data from deuterated samples, Proc. Natl. Acad. Sci. USA 109 (2012) 10873–10878, 10.1073/pnas.1203013109. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [88].Sattler M, Fesik SW, Use of deuterium labeling in NMR: overcoming a sizeable problem, Structure 4 (1996) 1245–1249, 10.1016/S0969-2126(96)00133-5. [DOI] [PubMed] [Google Scholar]
- [89].Kainosho M, Torizawa T, Iwashita Y, Terauchi T, Mei Ono A, Güntert P, Optimal isotope labelling for NMR protein structure determinations, Nature 440 (2006) 52–57, 10.1038/nature04525. [DOI] [PubMed] [Google Scholar]
- [90].Tugarinov V, Kay LE, An isotope labeling strategy for methyl TROSY spectroscopy, J. Biomol. NMR 28 (2004) 165–172, 10.1023/B:JNMR.0000013824.93994.1f. [DOI] [PubMed] [Google Scholar]
- [91].Baldwin AJ, Hansen DF, Vallurupalli P, Kay LE, Measurement of methyl axis orientations in invisible, excited states of proteins by relaxation dispersion NMR spectroscopy, J. Am. Chem. Soc 131 (2009) 11939–11948, 10.1021/ja903896p. [DOI] [PubMed] [Google Scholar]
- [92].Ruschak AM, Kay LE, Methyl groups as probes of supra-molecular structure, dynamics and function, J. Biomol. NMR 46 (2009) 75–87, 10.1007/s10858-009-9376-1. [DOI] [PubMed] [Google Scholar]
- [93].Sprangers R, Kay LE, Quantitative dynamics and binding studies of the 20S proteasome by NMR, Nature 445 (2007) 618–622, 10.1038/nature05512. [DOI] [PubMed] [Google Scholar]
- [94].Sgourakis NG, Natarajan K, Ying J, Vogeli B, Boyd LF, Margulies DH, Bax A, The structure of mouse cytomegalovirus m04 protein obtained from sparse NMR data reveals a conserved fold of the m02–m06 viral immune modulator family, Structure 22 (2014) 1263–1273, 10.1016/j.str.2014.05.018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [95].Güntert P, Buchner L, Combined automated NOE assignment and structure calculation with CYANA, J. Biomol. NMR 62 (2015) 453–471, 10.1007/s10858-015-9924-9. [DOI] [PubMed] [Google Scholar]
- [96].Saleh T, Rossi P, Kalodimos CG, Atomic view of the energy landscape in the allosteric regulation of Abl kinase, Nat. Struct. Mol. Biol 24 (2017) 893–901, 10.1038/nsmb.3470. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [97].Robustelli P, Kohlhoff K, Cavalli A, Vendruscolo M, Using NMR chemical shifts as structural restraints in molecular dynamics simulations of proteins, Structure 18 (2010) 923–933, 10.1016/j.str.2010.04.016. [DOI] [PubMed] [Google Scholar]
- [98].Robustelli P, Cavalli A, Dobson CM, Vendruscolo M, Salvatella X, Folding of small proteins by Monte Carlo simulations with chemical shift restraints without the use of molecular fragment replacement or structural homology, J. Phys. Chem. B 113 (2009) 7890–7896, 10.1021/jp900780b. [DOI] [PubMed] [Google Scholar]
- [99].Zhang H, Neal S, Wishart DS, RefDB: a database of uniformly referenced protein chemical shifts, J. Biomol. NMR 25 (2003) 173–195, 10.1023/A:1022836027055. [DOI] [PubMed] [Google Scholar]
- [100].Anfinsen CB, Principles that govern the folding of protein chains, Science 181 (1973) 223–230, 10.1126/science.181.4096.223. [DOI] [PubMed] [Google Scholar]
- [101].Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen M-Y, Pieper U, Sali A, Comparative protein structure modeling using MODELLER, Curr. Protoc. Protein Sci Chapter 2, 2007, Unit 2.9. doi: 10.1002/0471140864.ps0209s50. [DOI] [PubMed] [Google Scholar]
- [102].Raman S, Vernon R, Thompson J, Tyka M, Sadreyev R, Pei J, Kim D, Kellogg E, DiMaio F, Lange O, Kinch L, Sheffler W, Kim B-H, Das R, Grishin NV, Baker D, Structure prediction for CASP8 with all-atom refinement using Rosetta, Proteins Struct. Funct. Bioinf 77 (2009) 89–99, 10.1002/prot.22540. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [103].Martí-Renom MA, Stuart AC, Fiser A, Sánchez R, Melo F, Sali A, Comparative protein structure modeling of genes and genomes, Annu. Rev. Biophys. Biomol. Struct 29 (2000) 291–325, 10.1146/annurev.biophys.29.1.291. [DOI] [PubMed] [Google Scholar]
- [104].Thompson JM, Sgourakis NG, Liu G, Rossi P, Tang Y, Mills JL, Szyperski T, Montelione GT, Baker D, Accurate protein structure modeling using sparse NMR data and homologous structure information, Proc. Natl. Acad. Sci. USA 109 (2012) 9875–9880, 10.1073/pnas.1202485109. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [105].Thompson J, Baker D, Incorporation of evolutionary information into Rosetta comparative modeling, Proteins 79 (2011) 2380–2388, 10.1002/prot.23046. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [106].Söding J, Protein homology detection by HMM-HMM comparison, Bioinformatics 21 (2005) 951–960, 10.1093/bioinformatics/bti125.
- [107].Henikoff S, Henikoff JG, Amino acid substitution matrices from protein blocks, Proc. Natl. Acad. Sci. USA 89 (1992) 10915–10919.
- [108].Sahasrabudhe PV, Tejero R, Kitao S, Furuichi Y, Montelione GT, Homology modeling of an RNP domain from a human RNA-binding protein: homology-constrained energy optimization provides a criterion for distinguishing potential sequence alignments, Proteins 33 (1998) 558–566.
- [109].Shen Y, Bax A, Homology modeling of larger proteins guided by chemical shifts, Nat. Methods 12 (2015) 747–750, 10.1038/nmeth.3437.
- [110].Shen Y, Bax A, Protein backbone and sidechain torsion angles predicted from NMR chemical shifts using artificial neural networks, J. Biomol. NMR 56 (2013) 227–241, 10.1007/s10858-013-9741-y.
- [111].Smith TF, Waterman MS, Identification of common molecular subsequences, J. Mol. Biol 147 (1981) 195–197, 10.1016/0022-2836(81)90087-5.
- [112].Madhusudhan MS, Marti-Renom MA, Sanchez R, Sali A, Variable gap-penalty for protein sequence-structure alignment, Protein Eng. Des. Sel 19 (2006) 129–133, 10.1093/protein/gzj005.
- [113].Misura KMS, Chivian D, Rohl CA, Kim DE, Baker D, Physically realistic homology models built with Rosetta can be more accurate than their templates, Proc. Natl. Acad. Sci. USA 103 (2006) 5361–5366, 10.1073/pnas.0509355103.
- [114].Song Y, DiMaio F, Wang RY-R, Kim D, Miles C, Brunette T, Thompson J, Baker D, High-resolution comparative modeling with RosettaCM, Structure 21 (2013) 1735–1742, 10.1016/j.str.2013.08.005.
- [115].Marks DS, Hopf TA, Sander C, Protein structure prediction from sequence variation, Nat. Biotechnol 30 (2012) 1072–1080, 10.1038/nbt.2419.
- [116].Hopf TA, Colwell LJ, Sheridan R, Sander C, Marks DS, 3D structures of membrane proteins from genomic sequencing, Cell 149 (2012) 1607–1621, 10.1016/j.cell.2012.04.012.
- [117].Ovchinnikov S, Park H, Varghese N, Huang P-S, Pavlopoulos GA, Kim DE, Kamisetty H, Kyrpides NC, Baker D, Protein structure determination using metagenome sequence data, Science 355 (2017) 294–298, 10.1126/science.aah4043.
- [118].Kamisetty H, Ovchinnikov S, Baker D, Assessing the utility of coevolution-based residue–residue contact predictions in a sequence- and structure-rich era, Proc. Natl. Acad. Sci. USA 110 (2013) 15674–15679, 10.1073/pnas.1314045110.
- [119].Ovchinnikov S, Kamisetty H, Baker D, Robust and accurate prediction of residue–residue interactions across protein interfaces using evolutionary information, eLife 3 (2014), 10.7554/eLife.02030.
- [120].Sułkowska JI, Morcos F, Weigt M, Hwa T, Onuchic JN, Genomics-aided structure prediction, Proc. Natl. Acad. Sci. USA 109 (2012) 10340–10345, 10.1073/pnas.1207864109.
- [121].Jones DT, Buchan DWA, Cozzetto D, Pontil M, PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments, Bioinformatics 28 (2012) 184–190, 10.1093/bioinformatics/btr638.
- [122].Ekeberg M, Lövkvist C, Lan Y, Weigt M, Aurell E, Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models, Phys. Rev. E 87 (2013), 10.1103/PhysRevE.87.012707.
- [123].de Juan D, Pazos F, Valencia A, Emerging methods in protein co-evolution, Nat. Rev. Genet 14 (2013) 249, 10.1038/nrg3414.
- [124].Balakrishnan S, Kamisetty H, Carbonell JG, Lee S-I, Langmead CJ, Learning generative models for protein fold families, Proteins 79 (2011) 1061–1078, 10.1002/prot.22934.
- [125].Marks DS, Colwell LJ, Sheridan R, Hopf TA, Pagnani A, Zecchina R, Sander C, Protein 3D structure computed from evolutionary sequence variation, PLOS ONE 6 (2011) e28766, 10.1371/journal.pone.0028766.
- [126].Wang S, Sun S, Li Z, Zhang R, Xu J, Accurate de novo prediction of protein contact map by ultra-deep learning model, PLOS Comput. Biol 13 (2017) e1005324, 10.1371/journal.pcbi.1005324.
- [127].Braun T, Leman JK, Lange OF, Combining evolutionary information and an iterative sampling strategy for accurate protein structure prediction, PLOS Comput. Biol 11 (2015), 10.1371/journal.pcbi.1004661.
- [128].Tang Y, Huang Y, Hopf TA, Sander C, Marks DS, Montelione GT, Protein structure determination by combining sparse NMR data with evolutionary couplings, Nat. Methods 12 (2015) 751–754, 10.1038/nmeth.3455.
- [129].Eddy SR, Accelerated profile HMM searches, PLOS Comput. Biol 7 (2011) e1002195, 10.1371/journal.pcbi.1002195.
- [130].Rosen MK, Gardner KH, Willis RC, Parris WE, Pawson T, Kay LE, Selective methyl group protonation of perdeuterated proteins, J. Mol. Biol 263 (1996) 627–636, 10.1006/jmbi.1996.0603.
- [131].Gardner KH, Rosen MK, Kay LE, Global folds of highly deuterated, methyl-protonated proteins by multidimensional NMR, Biochemistry 36 (1997) 1389–1401, 10.1021/bi9624806.
- [132].Tugarinov V, Kanelis V, Kay LE, Isotope labeling strategies for the study of high-molecular-weight proteins by solution NMR spectroscopy, Nat. Protoc 1 (2006) 749–754, 10.1038/nprot.2006.101.
- [133].Huang YJ, Tejero R, Powers R, Montelione GT, A topology-constrained distance network algorithm for protein structure determination from NOESY data, Proteins Struct. Funct. Bioinf 62 (2006) 587–603, 10.1002/prot.20820.
- [134].Mao B, Tejero R, Baker D, Montelione GT, Protein NMR structures refined with Rosetta have higher accuracy relative to corresponding X-ray crystal structures, J. Am. Chem. Soc 136 (2014) 1893–1906, 10.1021/ja409845w.
- [135].Han J-H, Batey S, Nickson AA, Teichmann SA, Clarke J, The folding and evolution of multidomain proteins, Nat. Rev. Mol. Cell Biol 8 (2007) 319–330, 10.1038/nrm2144.
- [136].Guijarro JI, Morton CJ, Plaxco KW, Campbell ID, Dobson CM, Folding kinetics of the SH3 domain of PI3 kinase by real-time NMR combined with optical spectroscopy, J. Mol. Biol 276 (1998) 657–667, 10.1006/jmbi.1997.1553.
- [137].Sgourakis NG, Lange OF, DiMaio F, André I, Fitzkee NC, Rossi P, Montelione GT, Bax A, Baker D, Determination of the structures of symmetric protein oligomers from NMR chemical shifts and residual dipolar couplings, J. Am. Chem. Soc 133 (2011) 6288–6298, 10.1021/ja111318m.
- [138].van Dijk ADJ, Boelens R, Bonvin AMJJ, Data-driven docking for the study of biomolecular complexes, FEBS J 272 (2005) 293–312, 10.1111/j.1742-4658.2004.04473.x.
- [139].Dominguez C, Boelens R, Bonvin AMJJ, HADDOCK: a protein–protein docking approach based on biochemical or biophysical information, J. Am. Chem. Soc 125 (2003) 1731–1737, 10.1021/ja026939x.
- [140].de Vries SJ, van Dijk M, Bonvin AMJJ, The HADDOCK web server for data-driven biomolecular docking, Nat. Protoc 5 (2010) 883, 10.1038/nprot.2010.32.
- [141].van Zundert GCP, Bonvin AMJJ, Modeling Protein–Protein Complexes Using the HADDOCK Webserver, Modeling Protein Complexes with HADDOCK, in: Protein Struct. Predict., Humana Press, New York, NY, 2014: pp. 163–179. doi: 10.1007/978-1-4939-0366-5_12.
- [142].van Zundert GCP, Rodrigues JPGLM, Trellet M, Schmitz C, Kastritis PL, Karaca E, Melquiond ASJ, van Dijk M, de Vries SJ, Bonvin AMJJ, The HADDOCK2.2 web server: user-friendly integrative modeling of biomolecular complexes, J. Mol. Biol 428 (2016) 720–725, 10.1016/j.jmb.2015.09.014.
- [143].Nilges M, A calculation strategy for the structure determination of symmetric dimers by 1H NMR, Proteins Struct. Funct. Bioinf 17 (1993) 297–309, 10.1002/prot.340170307.
- [144].Nilges M, Calculation of protein structures with ambiguous distance restraints. Automated assignment of ambiguous NOE crosspeaks and disulphide connectivities, J. Mol. Biol 245 (1995) 645–660, 10.1006/jmbi.1994.0053.
- [145].André I, Bradley P, Wang C, Baker D, Prediction of the structure of symmetrical protein assemblies, Proc. Natl. Acad. Sci. USA 104 (2007) 17656–17661, 10.1073/pnas.0702626104.
- [146].Sgourakis NG, Yau W-M, Qiang W, Modeling an in-register, parallel “Iowa” Aβ fibril structure using solid-state NMR data from labeled samples with Rosetta, Structure 23 (2015) 216–227, 10.1016/j.str.2014.10.022.
- [147].Morag O, Sgourakis NG, Baker D, Goldbourt A, The NMR-Rosetta capsid model of M13 bacteriophage reveals a quadrupled hydrophobic packing epitope, Proc. Natl. Acad. Sci. USA 112 (2015) 971–976, 10.1073/pnas.1415393112.
- [148].Loquet A, Sgourakis NG, Gupta R, Giller K, Riedel D, Goosmann C, Griesinger C, Kolbe M, Baker D, Becker S, Lange A, Atomic model of the type III secretion system needle, Nature 486 (2012) 276–279, 10.1038/nature11079.
- [149].Das R, André I, Shen Y, Wu Y, Lemak A, Bansal S, Arrowsmith CH, Szyperski T, Baker D, Simultaneous prediction of protein folding and docking at high resolution, Proc. Natl. Acad. Sci. USA 106 (2009) 18978–18983, 10.1073/pnas.0904407106.
- [150].Porter JR, Weitzner BD, Lange OF, A framework to simplify combined sampling strategies in Rosetta, PLOS ONE 10 (2015) e0138220, 10.1371/journal.pone.0138220.
- [151].Rossi P, Shi L, Liu G, Barbieri CM, Lee H-W, Grant TD, Luft JR, Xiao R, Acton TB, Snell EH, Montelione GT, Baker D, Lange OF, Sgourakis NG, A hybrid NMR/SAXS-based approach for discriminating oligomeric protein interfaces using Rosetta, Proteins 83 (2015) 309–317, 10.1002/prot.24719.
- [152].Montalvao RW, Cavalli A, Salvatella X, Blundell TL, Vendruscolo M, Structure determination of protein–protein complexes using NMR chemical shifts: case of an endonuclease colicin–immunity protein complex, J. Am. Chem. Soc 130 (2008) 15990–15996, 10.1021/ja805258z.
- [153].Ritchie DW, Kemp GJL, Protein docking using spherical polar Fourier correlations, Proteins Struct. Funct. Bioinf 39 (2000) 178–194, (SICI)1097-0134(20000501)39:2<178::AID-PROT8>3.0.CO;2-6.
- [154].Cavalli A, Montalvao RW, Vendruscolo M, Using chemical shifts to determine structural changes in proteins upon complex formation, J. Phys. Chem. B 115 (2011) 9491–9494, 10.1021/jp202647q.
- [155].Sekhar A, Kay LE, NMR paves the way for atomic level descriptions of sparsely populated, transiently formed biomolecular conformers, Proc. Natl. Acad. Sci. USA 110 (2013) 12867–12874, 10.1073/pnas.1305688110.
- [156].Anthis NJ, Clore GM, Visualizing transient dark states by NMR spectroscopy, Q. Rev. Biophys 48 (2015) 35–116, 10.1017/S0033583514000122.
- [157].Clore GM, Exploring sparsely populated states of macromolecules by diamagnetic and paramagnetic NMR relaxation, Protein Sci 20 (2011) 229–246, 10.1002/pro.576.
- [158].Palmer AG, Kroenke CD, Loria JP, Nuclear magnetic resonance methods for quantifying microsecond-to-millisecond motions in biological macromolecules, Methods Enzymol 339 (2001) 204–238.
- [159].Hansen DF, Vallurupalli P, Kay LE, Using relaxation dispersion NMR spectroscopy to determine structures of excited, invisible protein states, J. Biomol. NMR 41 (2008) 113–120, 10.1007/s10858-008-9251-5.
- [160].Vallurupalli P, Bouvignies G, Kay LE, Studying “invisible” excited protein states in slow exchange with a major state conformation, J. Am. Chem. Soc 134 (2012) 8148–8161, 10.1021/ja3001419.
- [161].Korzhnev DM, Religa TL, Banachewicz W, Fersht AR, Kay LE, A transient and low-populated protein-folding intermediate at atomic resolution, Science 329 (2010) 1312–1316, 10.1126/science.1191723.
- [162].Neudecker P, Robustelli P, Cavalli A, Walsh P, Lundström P, Zarrine-Afsar A, Sharpe S, Vendruscolo M, Kay LE, Structure of an intermediate state in protein folding and aggregation, Science 336 (2012) 362–366, 10.1126/science.1214203.
- [163].Bouvignies G, Vallurupalli P, Hansen DF, Correia BE, Lange O, Bah A, Vernon RM, Dahlquist FW, Baker D, Kay LE, Solution structure of a minor and transiently formed state of a T4 lysozyme mutant, Nature 477 (2011) 111–114, 10.1038/nature10349.
- [164].Dethoff EA, Petzold K, Chugh J, Casiano-Negroni A, Al-Hashimi HM, Visualizing transient low-populated structures of RNA, Nature 491 (2012) 724–728, 10.1038/nature11498.
- [165].Ma P, Haller JD, Zajakala J, Macek P, Sivertsen AC, Willbold D, Boisbouvier J, Schanda P, Probing transient conformational states of proteins by solid-state R1ρ relaxation-dispersion NMR spectroscopy, Angew. Chem. Int. Ed 53 (2014) 4312–4317, 10.1002/anie.201311275.
- [166].Anthis NJ, Doucleff M, Clore GM, Transient, sparsely populated compact states of apo and calcium-loaded calmodulin probed by paramagnetic relaxation enhancement: interplay of conformational selection and induced fit, J. Am. Chem. Soc 133 (2011) 18966–18974, 10.1021/ja2082813.
- [167].Williams B, Zhao B, Zhang Q, Guffy SL, An excited state underlies gene regulation of a transcriptional riboswitch, Nat. Chem. Biol 13 (2017) 968, 10.1038/nchembio.2427.
- [168].Oyen D, Fenwick RB, Aoto PC, Stanfield RL, Wilson IA, Dyson HJ, Wright PE, Defining the structural basis for allosteric product release from E. coli dihydrofolate reductase using NMR relaxation dispersion, J. Am. Chem. Soc 139 (2017) 11233–11240, 10.1021/jacs.7b05958.
- [169].Skrynnikov NR, Dahlquist FW, Kay LE, Reconstructing NMR spectra of “invisible” excited protein states using HSQC and HMQC experiments, J. Am. Chem. Soc 124 (2002) 12352–12360, 10.1021/ja0207089.
- [170].Barette J, Velyvis A, Religa TL, Korzhnev DM, Kay LE, Cross-validation of the structure of a transiently formed and low populated FF domain folding intermediate determined by relaxation dispersion NMR and CS-Rosetta, J. Phys. Chem. B 116 (2012) 6637–6644, 10.1021/jp209974f.
- [171].Chen J-L, Wang X, Yang F, Cao C, Otting G, Su X-C, 3D structure determination of an unstable transient enzyme intermediate by paramagnetic NMR spectroscopy, Angew. Chem. Int. Ed 55 (2016) 13744–13748, 10.1002/anie.201606223.
- [172].Schwieters CD, Kuszewski JJ, Tjandra N, Clore GM, The Xplor-NIH NMR molecular structure determination package, J. Magn. Reson 160 (2003) 65–73, 10.1016/S1090-7807(02)00014-9.
- [173].Eisenmesser EZ, Millet O, Labeikovsky W, Korzhnev DM, Wolf-Watz M, Bosco DA, Skalicky JJ, Kay LE, Kern D, Intrinsic dynamics of an enzyme underlies catalysis, Nature 438 (2005) 117, 10.1038/nature04105.
- [174].Wiesner S, Sprangers R, Methyl groups as NMR probes for biomolecular interactions, Curr. Opin. Struct. Biol 35 (2015) 60–67, 10.1016/j.sbi.2015.08.010.
- [175].Kussell E, Shimada J, Shakhnovich EI, Side-chain dynamics and protein folding, Proteins Struct. Funct. Bioinf 52 (2003) 303–321, 10.1002/prot.10426.
- [176].Krivov GG, Shapovalov MV, Dunbrack RL, Improved prediction of protein side-chain conformations with SCWRL4, Proteins 77 (2009) 778–795, 10.1002/prot.22488.
- [177].Dunbrack RL, Cohen FE, Bayesian statistical analysis of protein side-chain rotamer preferences, Protein Sci 6 (1997) 1661–1681, 10.1002/pro.5560060807.
- [178].Dunbrack RL, Karplus M, Backbone-dependent rotamer library for proteins. Application to side-chain prediction, J. Mol. Biol 230 (1993) 543–574, 10.1006/jmbi.1993.1170.
- [179].Lovell SC, Word JM, Richardson JS, Richardson DC, The penultimate rotamer library, Proteins 40 (2000) 389–408.
- [180].Bhuyan MSI, Gao X, A protein-dependent side-chain rotamer library, BMC Bioinf 12 (2011) S10, 10.1186/1471-2105-12-S14-S10.
- [181].Sahakyan AB, Vranken WF, Cavalli A, Vendruscolo M, Structure-based prediction of methyl chemical shifts in proteins, J. Biomol. NMR 50 (2011) 331–346, 10.1007/s10858-011-9524-2.
- [182].Grant DM, Paul EG, Carbon-13 magnetic resonance. II. Chemical shift data for the alkanes, J. Am. Chem. Soc 86 (1964) 2984–2990, 10.1021/ja01069a004.
- [183].Neri D, Szyperski T, Otting G, Senn H, Wuethrich K, Stereospecific nuclear magnetic resonance assignments of the methyl groups of valine and leucine in the DNA-binding domain of the 434 repressor by biosynthetically directed fractional carbon-13 labeling, Biochemistry 28 (1989) 7510–7516, 10.1021/bi00445a003.
- [184].Gans P, Hamelin O, Sounier R, Ayala I, Durá MA, Amero CD, Noirclerc-Savoye M, Franzetti B, Plevin MJ, Boisbouvier J, Stereospecific isotopic labeling of methyl groups for NMR spectroscopic studies of high-molecular-weight proteins, Angew. Chem. Int. Ed 49 (2010) 1958–1962, 10.1002/anie.200905660.
- [185].London RE, Wingad BD, Mueller GA, Dependence of amino acid side chain 13C shifts on dihedral angle: application to conformational analysis, J. Am. Chem. Soc 130 (2008) 11097–11105, 10.1021/ja802729t.
- [186].Bax A, Max D, Zax D, Measurement of long-range 13C–13C J couplings in a 20-kDa protein-peptide complex, J. Am. Chem. Soc 114 (1992) 6923–6925, 10.1021/ja00043a052.
- [187].Mulder FAA, Leucine side-chain conformation and dynamics in proteins from 13C NMR chemical shifts, ChemBioChem 10 (2009) 1477–1479, 10.1002/cbic.200900086.
- [188].Hansen DF, Neudecker P, Vallurupalli P, Mulder FAA, Kay LE, Determination of Leu side-chain conformations in excited protein states by NMR relaxation dispersion, J. Am. Chem. Soc 132 (2010) 42–43, 10.1021/ja909294n.
- [189].Hansen DF, Neudecker P, Kay LE, Determination of isoleucine side-chain conformations in ground and excited states of proteins from chemical shifts, J. Am. Chem. Soc 132 (2010) 7589–7591, 10.1021/ja102090z.
- [190].Chou JJ, Case DA, Bax A, Insights into the mobility of methyl-bearing side chains in proteins from 3JCC and 3JCN couplings, J. Am. Chem. Soc 125 (2003) 8959–8966, 10.1021/ja029972s.
- [191].Butterfoss GL, DeRose EF, Gabel SA, Perera L, Krahn JM, Mueller GA, Zheng X, London RE, Conformational dependence of 13C shielding and coupling constants for methionine methyl groups, J. Biomol. NMR 48 (2010) 31–47, 10.1007/s10858-010-9436-6.
- [192].Hansen DF, Kay LE, Determining valine side-chain rotamer conformations in proteins from methyl 13C chemical shifts: application to the 360 kDa half-proteasome, J. Am. Chem. Soc 133 (2011) 8272–8281, 10.1021/ja2014532.
- [193].Hong M, Mishanina TV, Cady SD, Accurate measurement of methyl 13C chemical shifts by solid-state NMR for the determination of protein side chain conformation: the influenza A M2 transmembrane peptide as an example, J. Am. Chem. Soc 131 (2009) 7806–7816, 10.1021/ja901550q.
- [194].Scouras AD, Daggett V, The dynameomics rotamer library: amino acid side chain conformations and dynamics from comprehensive molecular dynamics simulations in water, Protein Sci 20 (2011) 341–352, 10.1002/pro.565.
- [195].Lam SL, Chi LM, Use of chemical shifts for structural studies of nucleic acids, Prog. Nucl. Magn. Reson. Spectrosc 56 (2010) 289–310, 10.1016/j.pnmrs.2010.01.002.
- [196].van der Werf RM, Tessari M, Wijmenga SS, Nucleic acid helix structure determination from NMR proton chemical shifts, J. Biomol. NMR 56 (2013) 95–112, 10.1007/s10858-013-9725-y.
- [197].Barton S, Heng X, Johnson BA, Summers MF, Database proton NMR chemical shifts for RNA signal assignment and validation, J. Biomol. NMR 55 (2013) 33–46, 10.1007/s10858-012-9683-9.
- [198].Frank AT, Horowitz S, Andricioaei I, Al-Hashimi HM, Utility of 1H NMR chemical shifts in determining RNA structure and dynamics, J. Phys. Chem. B 117 (2013) 2045–2052, 10.1021/jp310863c.
- [199].Aeschbacher T, Schmidt E, Blatter M, Maris C, Duss O, Allain FH-T, Güntert P, Schubert M, Automated and assisted RNA resonance assignment using NMR chemical shift statistics, Nucleic Acids Res 41 (2013) e172, 10.1093/nar/gkt665.
- [200].Wijmenga SS, Kruithof M, Hilbers CW, Analysis of 1H chemical shifts in DNA: assessment of the reliability of 1H chemical shift calculations for use in structure refinement, J. Biomol. NMR 10 (1997) 337–350, 10.1023/A:1018348123074.
- [201].Dejaegere A, Bryce RA, Case DA, An Empirical Analysis of Proton Chemical Shifts in Nucleic Acids, in: Model. NMR Chem. Shifts, American Chemical Society, 1999: pp. 194–206. doi: 10.1021/bk-1999-0732.ch014.
- [202].Cromsigt JA, Hilbers CW, Wijmenga SS, Prediction of proton chemical shifts in RNA. Their use in structure refinement and validation, J. Biomol. NMR 21 (2001) 11–29.
- [203].Victora A, Möller HM, Exner TE, Accurate ab initio prediction of NMR chemical shifts of nucleic acids and nucleic acids/protein complexes, Nucleic Acids Res 42 (2014) e173, 10.1093/nar/gku1006.
- [204].Das R, Baker D, Automated de novo prediction of native-like RNA tertiary structures, Proc. Natl. Acad. Sci. USA 104 (2007) 14664–14669, 10.1073/pnas.0703836104.
- [205].Ding F, Sharma S, Chalasani P, Demidov VV, Broude NE, Dokholyan NV, Ab initio RNA folding by discrete molecular dynamics: From structure prediction to folding mechanisms, RNA 14 (2008) 1164–1173, 10.1261/rna.894608.
- [206].Sharma S, Ding F, Dokholyan NV, iFoldRNA: three-dimensional RNA structure prediction and folding, Bioinformatics 24 (2008) 1951–1952, 10.1093/bioinformatics/btn328.
- [207].Parisien M, Major F, The MC-Fold and MC-Sym pipeline infers RNA structure from sequence data, Nature 452 (2008) 51, 10.1038/nature06684.
- [208].Das R, Karanicolas J, Baker D, Atomic accuracy in predicting and designing noncanonical RNA structure, Nat. Methods 7 (2010) 291–294, 10.1038/nmeth.1433.
- [209].Sripakdeevong P, Kladwang W, Das R, An enumerative stepwise ansatz enables atomic-accuracy RNA loop modeling, Proc. Natl. Acad. Sci. USA 108 (2011) 20573–20578, 10.1073/pnas.1106516108.
- [210].Sripakdeevong P, Cevec M, Chang AT, Erat MC, Ziegeler M, Zhao Q, Fox GE, Gao X, Kennedy SD, Kierzek R, Nikonowicz EP, Schwalbe H, Sigel R, Turner DH, Das R, Structure determination of noncanonical RNA motifs guided by 1H NMR chemical shifts, Nat. Methods 11 (2014) 413–416, 10.1038/nmeth.2876.
- [211].Nilges M, O’Donoghue SI, Ambiguous NOEs and automated NOE assignment, Prog. Nucl. Magn. Reson. Spectrosc 32 (1998) 107–139, 10.1016/S0079-6565(97)00025-3.
- [212].Herrmann T, Güntert P, Wüthrich K, Protein NMR Structure Determination with Automated NOE Assignment Using the New Software CANDID and the Torsion Angle Dynamics Algorithm DYANA, J. Mol. Biol 319 (2002) 209–227, 10.1016/S0022-2836(02)00241-3.
- [213].Güntert P, Berndt KD, Wüthrich K, The program ASNO for computer-supported collection of NOE upper distance constraints as input for protein structure determination, J. Biomol. NMR 3 (1993) 601–606, 10.1007/BF00174613.
- [214].Lange OF, Automatic NOESY assignment in CS-RASREC-Rosetta, J. Biomol. NMR 59 (2014) 147–159, 10.1007/s10858-014-9833-3.
- [215].Nilges M, Ambiguous distance data in the calculation of NMR structures, Fold. Des 2 (1997) S53–S57, 10.1016/S1359-0278(97)00064-3.
- [216].Evangelidis T, Nerli S, Nováček J, Brereton AE, Karplus PA, Dotas RR, Venditti V, Sgourakis NG, Tripsianes K, Automated NMR resonance assignments and structure determination using a minimal set of 4D spectra, Nat. Commun 9 (2018) 384, 10.1038/s41467-017-02592-z.
- [217].Würz JM, Kazemi S, Schmidt E, Bagaria A, Güntert P, NMR-based automated protein structure determination, Arch. Biochem. Biophys 628 (2017) 24–32, 10.1016/j.abb.2017.02.011.
- [218].Li Z, Chen Y, Mu D, Yuan J, Shi Y, Zhang H, Gan J, Li N, Hu X, Liu B, Yang B, Fan W, Comparison of the two major classes of assembly algorithms: overlap–layout–consensus and de-bruijn-graph, Brief. Funct. Genomics 11 (2012) 25–37, 10.1093/bfgp/elr035.
- [219].Anglister J, Srivastava G, Naider F, Detection of intermolecular NOE interactions in large protein complexes, Prog. Nucl. Magn. Reson. Spectrosc 97 (2016) 40–56, 10.1016/j.pnmrs.2016.08.002.
- [220].Blackledge M, Recent progress in the study of biomolecular structure and dynamics in solution from residual dipolar couplings, Prog. Nucl. Magn. Reson. Spectrosc 46 (2005) 23–61, 10.1016/j.pnmrs.2004.11.002.
- [221].Clore GM, Iwahara J, Theory, practice, and applications of paramagnetic relaxation enhancement for the characterization of transient low-population states of biological macromolecules and their complexes, Chem. Rev 109 (2009) 4108–4139, 10.1021/cr900033p.
- [222].Tjandra N, Bax A, Direct measurement of distances and angles in biomolecules by NMR in a dilute liquid crystalline medium, Science 278 (1997) 1111–1114.
- [223].Ottiger M, Bax A, Determination of relative N–HN, N–C′, Cα–C′, and Cα–Hα effective bond lengths in a protein by NMR in a dilute liquid crystalline phase, J. Am. Chem. Soc 120 (1998) 12334–12341, 10.1021/ja9826791.
- [224].Yao L, Vögeli B, Ying J, Bax A, NMR determination of amide N_H equilibrium bond length from concerted dipolar coupling measurements, J.Am. Chem. Soc 130 (2008) 16518–16520, 10.1021/ja805654f. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [225].Ulmer TS, Ramirez BE, Delaglio F, Bax A, Evaluation of backbone proton positions and dynamics in a small protein by liquid crystal NMR spectroscopy, J. Am. Chem. Soc 125 (2003) 9179–9191, 10.1021/ja0350684. [DOI] [PubMed] [Google Scholar]
- [226].Donald BR, Martin J, Automated NMR assignment and protein structure determination using sparse dipolar coupling constraints, Prog. Nucl. Magn.Reson. Spectrosc 55 (2009) 101–127, 10.1016/j.pnmrs.2008.12.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [227].Cornilescu G, Marquardt JL, Ottiger M, Bax A, Validation of protein structure from anisotropic carbonyl chemical shifts in a dilute liquid crystalline phase, J. Am. Chem. Soc 120 (1998) 6836–6837, 10.1021/ja9812610. [DOI] [Google Scholar]
- [228].Battiste JL, Wagner G, Utilization of site-directed spin labeling and highresolution heteronuclear nuclear magnetic resonance for global fold determination of large proteins with limited nuclear overhauser effect data,Biochemistry (Mosc.) 39 (2000) 5355–5365, 10.1021/bi000060h. [DOI] [PubMed] [Google Scholar]
- [229].Hartlmüller C, Göbl C, Madl T, Prediction of protein structure using surface accessibility data, Angew. Chem. Int. Ed 55 (2016) 11970–11974, 10.1002/anie.201604788. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [230].Otting G, Protein NMR using paramagnetic ions, Annu. Rev. Biophys 39(2010) 387–405, 10.1146/annurev.biophys.093008.131321. [DOI] [PubMed] [Google Scholar]
- [231].Pilla KB, Otting G, Huber T, 3D computational modeling of proteins using sparse paramagnetic NMR data, Methods Mol. Biol. Clifton NJ 1526 (2017) 3–21, 10.1007/978-1-4939-6613-4_1. [DOI] [PubMed] [Google Scholar]
- [232].Schmitz C, Vernon R, Otting G, Baker D, Huber T, Protein structure determination from pseudocontact shifts using ROSETTA, J. Mol. Biol 416 (2012) 668–677, 10.1016/j.jmb.2011.12.056. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [233].Yagi H, Pilla KB, Maleckis A, Graham B, Huber T, Otting G, Threedimensional protein fold determination from backbone amide pseudocontact shifts generated by lanthanide tags at multiple sites, Structure 21 (2013)883–890, 10.1016/j.str.2013.04.001. [DOI] [PubMed] [Google Scholar]
- [234].Schmitz C, Bonvin AMJJ, Protein–protein HADDocking using exclusively pseudocontact shifts, J. Biomol. NMR 50 (2011) 263–266, 10.1007/s10858-011-9514-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [235]. Wang S, Munro RA, Shi L, Kawamura I, Okitsu T, Wada A, Kim S-Y, Jung K-H, Brown LS, Ladizhansky V, Solid-state NMR spectroscopy structure determination of a lipid-embedded heptahelical membrane protein, Nat. Methods 10 (2013) 1007, 10.1038/nmeth.2635.
- [236]. Klammt C, Maslennikov I, Bayrhuber M, Eichmann C, Vajpai N, Chiu EJC, Blain KY, Esquivies L, Kwon JHJ, Balana B, Pieper U, Sali A, Slesinger PA, Kwiatkowski W, Riek R, Choe S, Facile backbone structure determination of human membrane proteins by NMR spectroscopy, Nat. Methods 9 (2012) 834, 10.1038/nmeth.2033.
- [237]. Sengupta I, Nadaud PS, Jaroniec CP, Protein structure determination with paramagnetic solid-state NMR spectroscopy, Acc. Chem. Res 46 (2013) 2117–2126, 10.1021/ar300360q.
- [238]. Williamson MP, Havel TF, Wüthrich K, Solution conformation of proteinase inhibitor IIA from bull seminal plasma by 1H nuclear magnetic resonance and distance geometry, J. Mol. Biol 182 (1985) 295–315, 10.1016/0022-2836(85)90347-X.
- [239]. Sarkar R, Mainz A, Busi B, Barbet-Massin E, Kranz M, Hofmann T, Reif B, Immobilization of soluble protein complexes in MAS solid-state NMR: sedimentation versus viscosity, Solid State Nucl. Magn. Reson 76–77 (2016) 7–14, 10.1016/j.ssnmr.2016.03.005.
- [240]. Sakakibara D, Sasaki A, Ikeya T, Hamatsu J, Hanashima T, Mishima M, Yoshimasu M, Hayashi N, Mikawa T, Wälchli M, Smith BO, Shirakawa M, Güntert P, Ito Y, Protein structure determination in living cells by in-cell NMR spectroscopy, Nature 458 (2009), 10.1038/nature07814.
- [241]. Perilla JR, Zhao G, Lu M, Ning J, Hou G, Byeon I-JL, Gronenborn AM, Polenova T, Zhang P, CryoEM structure refinement by integrating NMR chemical shifts with molecular dynamics simulations, J. Phys. Chem. B 121 (2017) 3853–3863, 10.1021/acs.jpcb.6b13105.
- [242]. Hennig J, Wang I, Sonntag M, Gabel F, Sattler M, Combining NMR and small angle X-ray and neutron scattering in the structural analysis of a ternary protein-RNA complex, J. Biomol. NMR 56 (2013) 17–30, 10.1007/s10858-013-9719-9.
- [243]. Hirst SJ, Alexander N, Mchaourab HS, Meiler J, RosettaEPR: an integrated tool for protein structure determination from sparse EPR data, J. Struct. Biol 173 (2011) 506–514, 10.1016/j.jsb.2010.10.013.
- [244]. Takeda M, Ikeya T, Güntert P, Kainosho M, Automated structure determination of proteins with the SAIL-FLYA NMR method, Nat. Protoc 2 (2007) 2896–2902, 10.1038/nprot.2007.423.
- [245]. Xu Y, Liu M, Simpson PJ, Isaacson R, Cota E, Marchant J, Yang D, Zhang X, Freemont P, Matthews S, Automated assignment in selectively methyl-labeled proteins, J. Am. Chem. Soc 131 (2009) 9480–9481, 10.1021/ja9020233.
- [246]. Delaglio F, Grzesiek S, Vuister G, Zhu G, Pfeifer J, Bax A, NMRPipe: a multidimensional spectral processing system based on UNIX pipes, J. Biomol. NMR 6 (1995) 277–293.
- [247]. Ying J, Delaglio F, Torchia DA, Bax A, Sparse multidimensional iterative lineshape-enhanced (SMILE) reconstruction of both non-uniformly sampled and conventional NMR data, J. Biomol. NMR 68 (2017) 101–118, 10.1007/s10858-016-0072-7.
- [248]. Berry R, Vivian JP, Deuss FA, Balaji GR, Saunders PM, Lin J, Littler DR, Brooks AG, Rossjohn J, The structure of the cytomegalovirus-encoded m04 glycoprotein, a prototypical member of the m02 family of immunoevasins, J. Biol. Chem 289 (2014) 23753–23763, 10.1074/jbc.M114.584128.
- [249]. DiMaio F, Leaver-Fay A, Bradley P, Baker D, André I, Modeling symmetric macromolecular structures in Rosetta3, PLOS ONE 6 (2011) e20450, 10.1371/journal.pone.0020450.
