Author manuscript; available in PMC: 2016 Oct 6.
Published in final edited form as: Structure. 2015 Sep 10;23(10):1958–1966. doi: 10.1016/j.str.2015.07.019

Experimental Protein Structure Verification by Scoring with a Single, Unassigned NMR Spectrum

Joseph M Courtney 1, Qing Ye 1, Anna E Nesbitt 1,4, Ming Tang 1,5, Marcus D Tuttle 1, Eric D Watt 1,6, Kristin M Nuzzio 1, Lindsay J Sperling 1,7, Gemma Comellas 2, Joseph R Peterson 1, James H Morrissey 3, Chad M Rienstra 1,2,3,*
PMCID: PMC4786943  NIHMSID: NIHMS722494  PMID: 26365800

Abstract

Standard methods for de novo protein structure determination by nuclear magnetic resonance (NMR) require time-consuming data collection and interpretation efforts. Here we present a qualitatively distinct and novel approach, called Comparative, Objective Measurement of Protein Architectures by Scoring Shifts (COMPASS), which identifies the best structures from a set of structural models by numerical comparison with a single, unassigned 2D 13C-13C NMR spectrum containing backbone and side-chain aliphatic signals. COMPASS does not require resonance assignments. It is particularly well suited for interpretation of magic-angle spinning solid-state NMR spectra, but also applicable to solution NMR spectra. We demonstrate COMPASS with experimental data from four proteins—GB1, ubiquitin, DsbA, and the extracellular domain of human tissue factor—and with reconstructed spectra from 11 additional proteins. For all these proteins, with molecular mass up to 25 kDa, COMPASS distinguished the correct fold, most often within 1.5 Å root-mean-square deviation of the reference structure.

INTRODUCTION

Nuclear magnetic resonance (NMR) is a powerful technique for studying protein structure and dynamics in near-native conditions. Substantial progress has been made in the solution of high-resolution protein structures by solid-state NMR (SSNMR) in the last decade. Structures previously inaccessible by solution NMR and X-ray crystallography, such as fibrils of the HET-s protein and amyloid-β, have been solved at atomic detail, offering insight into important biomedical problems (Wasmer et al., 2008; Lu et al., 2013). SSNMR approaches to solving structures of membrane proteins also have several notable successes (Shahid et al., 2012; Wang et al., 2013a; Park et al., 2012).

However, NMR methods, and SSNMR in particular, still require extensive sample preparation, data collection, and interpretation efforts. Typically, tens of milligrams of 13C,15N-labeled protein and several weeks of instrument time are required to collect the half a dozen or more 3D datasets necessary for the resonance assignments. Additional samples with sparse 13C labeling and weeks of instrument time are needed to obtain a sufficient number of inter-residue distances to determine the fold uniquely (Comellas and Rienstra, 2013). Methods are in development to shorten the lengthy process of data collection, including non-uniform sampling (Paramasivam et al., 2012; Hyberts et al., 2010; Sun et al., 2012), proton detection with fast magic-angle spinning (MAS) (Knight et al., 2011; Zhou et al., 2012; Barbet-Massin et al., 2014), and combinations of these two approaches (Linser et al., 2014). Dynamic nuclear polarization is also a very promising method for accelerating data collection times, yet is usually not compatible with conditions that yield high-resolution spectra (Maly et al., 2008; Wang et al., 2013b; Renault et al., 2012).

In addition to challenges associated with data collection, the assignment and interpretation of spectra to yield a structure remain major bottlenecks and can take months of manual data analysis. Although methods are now available to automate the assignment process (Moseley et al., 2010; Güntert, 2009; Guerry and Herrmann, 2011; Schmidt et al., 2013), these approaches still require complete sets of 3D data and extensive manual intervention. Once resonance assignments are available, methods such as CS-ROSETTA (Shen et al., 2008) and CHESHIRE (Cavalli et al., 2007; Robustelli et al., 2010) are available to leverage the chemical shift data for structure determination. These approaches have been highly successful, yet they still require complete sets of site-specific resonance assignments. Therefore, there remains a compelling need for alternative methods that are faster and more cost-effective, requiring less sample, instrument time, and analysis. Combining NMR with advances in protein structure prediction (both homology modeling and ab initio methods) offers a potential increase in efficiency (Simons et al., 1997; Eswar et al., 2002; Moult et al., 2014). This approach requires validation by comparing predicted NMR observables from the models with empirical or experimental data. In all prior methods, this has been done using sequence-specific resonance assignments.

Here we present a method, called Comparative, Objective Measurement of Protein Architectures by Scoring Shifts (COMPASS), which aims to extract structural information from NMR spectra by fully leveraging a limited amount of experimental data—one 2D 13C-13C spectrum—to accurately distinguish the correct protein fold from a set of proposed models. This avoids the lengthy structure determination process and requires no manual analysis of spectra. COMPASS solely employs the numerical comparison of predicted spectra from structural models, produced by various methods (e.g., homology modeling, molecular dynamics, ab initio quantum chemistry), with a single, unassigned 2D 13C-13C NMR spectrum, utilizing the dependence of chemical shifts upon protein conformation.

COMPASS leverages the accuracy of 13C chemical shift prediction methods, and in this study we utilize SHIFTX2 (Han et al., 2011). For each protein, we collect a 13C-13C homonuclear correlation spectrum under conditions of scalar or dipolar mixing that yield exclusively one-bond correlations throughout the entire aliphatic region (Chen et al., 2006; Hohwy et al., 1999). Cross-peaks in this spectrum are enumerated and filtered according to a simple heuristic to generate a list of unassigned peaks. Meanwhile, a series of models are generated from the amino acid sequence using either homology or ab initio methods, and the 13C chemical shifts are predicted for each model by SHIFTX2. Due to the simplicity and predictability of single-bond homonuclear correlation spectra, the hypothetical cross-peaks that would result from each model can be predicted (Figure 1). Then, using a scoring method based on the modified Hausdorff distance (Dubuisson and Jain, 1994) (see Figure 10), the models can be ranked according to their consistency with the experimental peak list. In the large majority of cases, the best model identified is consistent with the experimentally solved structure (see Figure 9).

Figure 1. Prediction of 13C-13C Correlation Spectra from Protein Models with SHIFTX2.

The predicted chemical shifts are paired using a Python function that enumerates all directly bonded carbon pairs in the structure, and the corresponding chemical shifts are stored in a list without any assignment information. COSY, correlation spectroscopy.

Figure 10. COMPASS Score Calculation.

The COMPASS score is calculated by matching every experimental peak (black x) to the closest test peak (red circle) and calculating the average of the distances between them (gray line). A selected region from a comparison between a ubiquitin COSY spectrum and a poorly matching model is shown.

Figure 9. Flow Chart of the COMPASS Algorithm.

(A) The algorithm takes as input a 13C-13C correlation spectrum. A selected region for a spectrum of ubiquitin is shown.

(B) The peaks are enumerated and stored as a list of unassigned chemical shift pairs.

(C) A collection of test models is produced. The model shown was generated by MODELLER and has a Cα RMSD of 8.5 Å with respect to the reference structure, PDB: 1UBQ.

(D) The chemical shifts for each model are predicted by SHIFTX2, and a list of peaks that would occur in a 13C-13C correlation spectrum is generated.

(E) The experimental and model peak lists are compared using the COMPASS score. Blue lines indicate the minimum distances described in the text.

(F) In this example the COMPASS score from the experimental peak list to the model is 0.902 ppm (point indicated with blue arrow), a relatively high value. The models are then ranked in the order of their computed COMPASS score.

RESULTS AND DISCUSSION

We selected 15 proteins, ranging in molecular mass from 6.6 to 33.6 kDa, to test COMPASS. For all selected proteins, high-quality structures of the monomeric form in the absence of any perturbing ligands are available in the PDB (Bernstein et al., 1977). 2D one-bond 13C-13C correlation spectra under solid-state conditions (MAS) were collected for four of these proteins: GB1, ubiquitin, DsbA, and the extracellular domain of human tissue factor (TF). For GB1, ubiquitin, and DsbA, constant-time, uniform-sign cross-peak correlation spectroscopy (CTUC-COSY) spectra were collected. For TF, we collected an SPC5 spectrum with a short mixing time to observe only one-bond transfers (Hohwy et al., 1999). Other pulse sequences that generate one-bond correlations could also be employed.

Automated Peak Filtering

Peaks were picked using the automated peak picking function of the Sparky NMR data analysis program (Goddard and Kneller, 2004). A range of noise floors was tested and an optimal minimum signal-to-noise ratio of 6 was chosen on the basis of testing shown in Figure S1. Peaks were then filtered to retain only those in the aliphatic region (0–80 ppm), at least 0.5 ppm away from the diagonal. The lists were then further filtered to retain only those peaks that were observed on both sides of the diagonal within a cutoff of 0.3 ppm (Figures 2B and 2D). This automated peak picking and filtering heuristic contributes significantly to the noise tolerance of COMPASS, as observed by the exclusion of the majority of the noise peaks even in a spectrum picked with a noise floor of twice the root-mean-square (RMS) noise (Figure 2B).
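The filtering heuristic described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `filter_peaks` and its parameter names are hypothetical, the distance from the diagonal is assumed to mean the absolute difference between the two shifts, and the 0.3 ppm symmetry cutoff is assumed to apply to each dimension separately.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (shift in dimension 1, shift in dimension 2), in ppm


def filter_peaks(
    peaks: List[Peak],
    low: float = 0.0,            # lower edge of the aliphatic window (ppm)
    high: float = 80.0,          # upper edge of the aliphatic window (ppm)
    diagonal_gap: float = 0.5,   # minimum |x - y| (assumed meaning of "away from the diagonal")
    mirror_tol: float = 0.3,     # per-dimension tolerance for the mirrored partner (assumption)
) -> List[Peak]:
    """Keep aliphatic, off-diagonal peaks that have a partner across the diagonal."""
    # Steps 1 and 2: aliphatic window and diagonal exclusion.
    kept = [
        (x, y)
        for x, y in peaks
        if low <= x <= high and low <= y <= high and abs(x - y) >= diagonal_gap
    ]

    # Step 3: a peak (x, y) survives only if some retained peak lies near (y, x).
    # With diagonal_gap > mirror_tol a peak can never serve as its own mirror.
    def has_mirror(peak: Peak) -> bool:
        x, y = peak
        return any(
            abs(qx - y) <= mirror_tol and abs(qy - x) <= mirror_tol
            for qx, qy in kept
        )

    return [peak for peak in kept if has_mirror(peak)]
```

Because the symmetry test only accepts partners that already passed the window and diagonal tests, an isolated noise peak is discarded unless a second peak happens to fall at its mirror position.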

Figure 2. Peak Filtering Procedure.

(A) Peaks automatically picked in the Sparky analysis program with a noise floor set at twice the root-mean-square (RMS) noise level.

(B) The same peaks after being filtered to exclude points near the diagonal and peaks without corresponding peaks opposite the diagonal.

(C) Peaks automatically picked with a noise floor set at six times the RMS noise level.

(D) The same data as (C), but filtered as in (B).

See also Figure S1.

Evaluation of COMPASS Score

Next, we investigated the relationship between the scores of a group of models and their Cα RMS deviations (RMSDs) measured against the reference structure deposited in the PDB to test the behavior of the COMPASS score on models of differing accuracy. Figure 3 shows plots of the COMPASS score versus Cα RMSD for the four proteins with peak lists obtained directly from 2D spectra. For all four examples, models with lowest scores have low RMSDs. The converse, however, is not always true. As can be seen, especially for GB1 (Figure 3A), many models with RMSD below 2 Å have scores greater than or equal to those of models with RMSD >10 Å. This phenomenon occurs because the scores depend not only on the Cα-Cβ correlations, which report most strongly on secondary structure, but also on cross-peaks involving side-chain carbons, which report more strongly on the local environment (Han et al., 2011). Therefore, models with the correct side-chain conformations will agree best with the NMR data (i.e., exhibit the lowest scores). This behavior gives the COMPASS score a conservative character in that it rejects some models that have good coarse-grain structure but incorrect side-chain packing, while uniformly rejecting models with incorrect folds. Consistent with the score’s sensitivity to side-chain conformation, there is a decreased correlation between the score and RMSD at higher RMSD values, since models with extremely different backbone structure but energetically optimized side chains are very unlikely to have conformations that would produce similar side-chain 13C chemical shifts.

Figure 3. COMPASS Results for Four Proteins with Unassigned NMR Data.

(A–D) COMPASS Score versus Cα RMSD from the reference structure for (A) GB1 (PDB: 2LGI), (B) ubiquitin (PDB: 1UBQ), (C) DsbA (PDB: 1FVK), and (D) TF (PDB: 1BOY). The structure with the lowest COMPASS score is shown in blue and indicated with an arrow.

(E–H) The structure with the lowest COMPASS score (blue) overlaid with the reference structure (red). The Cα RMS deviation (RMSD) is noted.

(I–L) The five lowest scoring structures aligned and overlaid. The average pairwise Cα RMSD is noted.

Overlays of the reference structure (red) with the model with the lowest score (blue) for each protein are shown in Figures 3E–3H. For all tested proteins, the bundle RMSD acts as a good surrogate for the actual RMSD from the true structure. When the bundle of five lowest-score structures had an acceptably small average pairwise RMSD, the consensus structure also had a low RMSD with respect to the reference structure (Figure 4).
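The consensus check described above (the average pairwise RMSD among the lowest-scoring models) can be illustrated with a short sketch. It assumes that a symmetric matrix of pairwise Cα RMSDs between superimposed models has already been computed by some other tool; the function name `lowest_score_bundle_rmsd` and its arguments are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def lowest_score_bundle_rmsd(
    scores: Sequence[float],
    pairwise_rmsd: Sequence[Sequence[float]],
    n: int = 5,
) -> float:
    """Average pairwise RMSD among the n models with the lowest scores.

    scores[i] is the score of model i and pairwise_rmsd[i][j] is the
    precomputed RMSD between models i and j (assumed input).
    """
    if n < 2 or len(scores) < 2:
        raise ValueError("need at least two models to form a bundle")
    # Indices of the models, ordered from lowest to highest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bundle = order[:n]
    pairs = list(combinations(bundle, 2))
    return sum(pairwise_rmsd[i][j] for i, j in pairs) / len(pairs)
```

A small value suggests that the best-scoring models agree with one another; a large value, as reported for StR65 later in the text, suggests that no consensus structure is present in the model set.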

Figure 4. Ordered Bundle RMSD.

Models are scored and ordered by their COMPASS scores. For each model, the bundle RMSD is the average RMSD between that model and the four models with COMPASS scores closest to its own.

(A) COMPASS score versus bundle RMSD showing the “funneling” toward the origin, indicating a dataset containing a correct consensus structure.

(B) The bundle RMSD is highly correlated with the Cα RMSD to the correct structure, which enables its use as a surrogate when the true structure is unknown.

We chose an additional 11 proteins with known structure and complete 13C chemical shift assignments from the Biological Magnetic Resonance DataBank (BMRB) to test the performance of COMPASS on a wider range of structures (Ulrich et al., 2008). In lieu of raw spectra, we reconstructed peak lists from the known assignments using the same algorithm applied to predicting model peak lists. Although the sequence-specific assignments were available for these cases, the assignment information was not carried forward in the calculation.

The COMPASS score performed similarly well for most proteins in the synthetic dataset (Figures 5, 6, and 7). However, for the protein StR65, none of the models predicted by MODELLER had an RMSD below 10 Å. For this dataset, the COMPASS score exhibits the desirable quality that the five structures that agree most closely with the experimental data have an average pairwise RMSD of over 22.4 Å, providing an unambiguous indication that a consensus structure does not exist in the model set (Figures 7D–7F). As expected, if the set of models supplied to COMPASS does not contain any models that are consistent with the experimental data, a consensus structure cannot be identified.

Figure 5. Additional COMPASS Results for Synthetic Peak Lists Constructed from BMRB-Deposited Chemical Shifts.

(A–C) COMPASS score versus Cα RMSD from the reference structure for (A) Ufm1-conjugating enzyme 1 (PDB: 2Z6O), (B) macrophage metalloelastase (PDB: 2KRJ), (C) α-parvalbumin (PDB: 1RWY). The structure with the lowest COMPASS score is shown in blue and indicated with an arrow.

(D–F) The structure with the lowest COMPASS score (blue) overlaid with the reference structure (red). The Cα RMSD is noted.

(G–I) The overlay of five structures from each calculation with the lowest COMPASS scores. The average pairwise Cα RMSD is noted.

Figure 6. Additional COMPASS Results for Synthetic Peak Lists Constructed from BMRB-Deposited Chemical Shifts.

(A–C) COMPASS score versus Cα RMSD from the reference structure for (A) basic fibroblast growth factor (PDB: 1BFG), (B) sterol carrier protein 2 (PDB: 1C44), and (C) integrin α-L (PDB: 1XUO). The structure with the lowest COMPASS score is shown in blue and indicated with an arrow.

(D–F) The structure with the lowest COMPASS score (blue) overlaid with the reference structure (red). The Cα RMSD is noted.

(G–I) The overlay of five structures from each calculation with the lowest COMPASS scores. The average pairwise Cα RMSD is noted. See also Figures S2 and S3.

Figure 7. Behavior of the COMPASS Scoring Method when Applied to Incorrect Models.

(A–C) Coactosin-like protein (A) COMPASS score versus Cα RMSD from PDB: 1T3Y. Point with anomalously low score is blue and noted with an arrow. (B) Structure from PDB: 1T3Y. (C) Structure of outlier model showing split structure.

(D–F) NorthEast Structural Genomics consortium target StR65 (D) COMPASS score versus Cα RMSD from PDB: 2ES9. Points with the five lowest COMPASS scores are denoted by large blue dots. (E) Structure from PDB: 2ES9. (F) Aligned overlay of the five structures with the lowest COMPASS scores. Cα RMSD is noted.

In one case, a model with a low score but a high RMSD was observed. In this calculation on coactosin-like protein, a single model was generated with a Cα RMSD of 13 Å but had a COMPASS score comparable with much better models (Figure 7A). Upon manual inspection of the outlying model, it is clear that the majority of the secondary and tertiary structure elements are correct, but the model corresponds to a protein with two domains dissociated from each other, tethered by an unstructured loop. While this outlier did not perform as expected, its score is still well above that of the consensus, which agrees with the reference structure to within an RMSD of 0.72 Å. Manual inspection or the application of structure validation programs would easily identify this model as incorrect, enabling its removal from the structure pool.

Application of COMPASS to Solution NMR Data

Although the COMPASS framework was developed to address the problems of spectral overlap and low sensitivity in NMR experiments, it does not rely on any special feature of SSNMR experiments. The performance of COMPASS on solution NMR data was tested by collecting 1H-15N HSQC (heteronuclear single-quantum coherence) and 13C-13C-1H TOCSY (total correlation spectroscopy) spectra for a uniformly 13C,15N-labeled ubiquitin solution. The 3D TOCSY spectrum was projected through the 1H dimension to generate a 13C-13C 2D spectrum.
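As a rough illustration of the projection step, the sketch below collapses a 3D intensity array onto its first two axes. The text does not state whether a skyline (maximum) or a sum projection was used, so both are offered here as assumptions; the function name `project_along_last_axis` and the nested-list data layout (with 1H as the last axis) are likewise hypothetical.

```python
from typing import Callable, List, Sequence

Cube = Sequence[Sequence[Sequence[float]]]  # cube[i][j][k], with k indexing the 1H axis


def project_along_last_axis(
    cube: Cube,
    reduce: Callable[[Sequence[float]], float] = max,
) -> List[List[float]]:
    """Collapse the last (assumed 1H) axis of a 3D spectrum into a 2D plane.

    reduce=max gives a skyline projection; reduce=sum gives a sum projection.
    """
    return [[reduce(trace) for trace in plane] for plane in cube]
```

The resulting 2D plane can then be peak-picked and filtered in the same way as a directly acquired 13C-13C spectrum.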

The results for the HSQC comparison (Figure 8C) do not show a strong relationship between the COMPASS score and the RMSD. We attribute this result to the relative inaccuracy of chemical shift predictions for 15N and 1H amide resonances, due to the stronger dependence on hydrogen bonding and electrostatics, as well as backbone conformation and nearest neighbor residue type. For example, in contrast to the 13Cα predictions, which have an RMSD of 0.38 ppm (relative to known chemical shifts for a set of test proteins) (Han et al., 2011), amide 15N predictions have an RMSD of 1.23 ppm, representing a 3-fold larger error over a similar range of chemical shifts (~30 ppm overall, or ~6–10 ppm for a given residue type). Moreover, the amide 1H shifts have an RMSD of 0.24 ppm over a range of ~3 ppm. Thus the relative error in predicting a 1H-15N correlation spectrum is significantly higher than for 13C-13C spectra, leading in the case of 1H-15N to an inability to conclusively identify the best structure among a set, even for the relatively simple case of ubiquitin.
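The comparison of prediction errors can be made concrete with a short calculation. Dividing each quoted RMSD by the approximate overall shift range quoted in the same paragraph is an assumption made here for illustration, not a metric defined by the authors.

```latex
\begin{align*}
\frac{1.23\ \text{ppm}}{0.38\ \text{ppm}} &\approx 3.2
  && \text{(amide } {}^{15}\text{N error relative to } {}^{13}\text{C}\alpha \text{ error)} \\
\frac{0.38\ \text{ppm}}{30\ \text{ppm}} &\approx 1.3\%
  && ({}^{13}\text{C}\alpha \text{, error relative to overall range)} \\
\frac{1.23\ \text{ppm}}{30\ \text{ppm}} &\approx 4.1\%
  && (\text{amide } {}^{15}\text{N, error relative to overall range)} \\
\frac{0.24\ \text{ppm}}{3\ \text{ppm}} &= 8\%
  && (\text{amide } {}^{1}\text{H, error relative to overall range)}
\end{align*}
```

On this rough measure, both dimensions of a 1H-15N spectrum are predicted several times less precisely, relative to their spread, than the 13Cα shifts.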

Figure 8. COMPASS Applied to Solution NMR Data of Ubiquitin.

(A) SOFAST 1H-15N HSQC of ubiquitin.

(B) F3-projection of 13C-13C-1H TOCSY of ubiquitin.

(C) COMPASS score versus Cα RMSD for ubiquitin using peaks from HSQC. Difficulty in predicting amide proton and nitrogen shifts makes the HSQC spectrum unsuited for use with the COMPASS algorithm.

(D) COMPASS score versus Cα RMSD for ubiquitin using peaks from TOCSY spectrum projection. Just as in SSNMR data, the COMPASS score based on 13C-13C correlations has a strong relationship with Cα RMSD, allowing its use in identifying experimentally consistent models.

In contrast, the COMPASS scores for the projected 13C-13C-1H TOCSY spectrum demonstrate a clear correlation and sharp convergence at a low RMSD value (Figure 8D), similar to the results observed for the solid-state NMR 13C-13C spectra, confirming that the strength of this method comes from its use of 13C chemical shifts.

Conclusions

We present a new method for objective direct comparison of a modeled protein structure with experimental NMR data. COMPASS greatly reduces the time and effort required to validate a structure with experimental data by circumventing the lengthy process of chemical shift assignment and the collection of large datasets to obtain distance and orientation information required for de novo structure determination. The method is robust with respect to data collection and peak picking protocols, and has good tolerance for noise and artifacts. Here we have demonstrated successful calculations for 15 proteins, four with experimental SSNMR data, one with experimental solution NMR data, and ten with reconstructed spectra from the BioMagResBank chemical shift database.

The COMPASS algorithm exploits the fact that the 13C chemical shift is an exquisitely sensitive reporter on conformation, including not only backbone conformation as evidenced in the secondary chemical shifts (Spera and Bax, 1991), but also the conformation of side chains and packing in the protein core, which give rise to ring current and van der Waals packing effects. COMPASS leverages developments in chemical shift prediction methodology that take these effects into account. Strategies based on empirical models, homology methods, quantum mechanical calculations, and machine learning have progressively improved the accuracy. Here we used SHIFTX2 (Han et al., 2011), which uses a hybrid approach combining a sequence homology module with an ensemble machine-learning method to attain good accuracy for both backbone and side-chain atoms. SHIFTX2 attains prediction accuracy of better than 0.6 ppm for α, β, and carbonyl carbons and better than 1.0 ppm accuracy for most side-chain carbons. This level of prediction accuracy enables us to use the inherent sensitivity of 13C chemical shifts to discern structural information from NMR data at a much earlier stage of analysis, and to quantitatively judge consistency of raw spectra with structural models. The rapid discrimination of valid protein folds by COMPASS may enable rational prioritization of subsequent data collection for structure refinement and acceleration of data analysis. For example, the experimentally consistent folds identified by COMPASS may be used to perform assignments of ambiguous correlations in spectra with long mixing times, reporting on long-range correlations.

As NMR is applied to systems of increasing complexity, manual data analysis becomes unfeasible. We envision potential future improvements including the application of COMPASS to 3D spectra, the use of the COMPASS score directly in model refinement and structure determination, as well as continued improvements in the accuracy of chemical shift prediction. In the current implementation only 13C chemical shifts are used but, to accommodate the inclusion of higher dimensionality data, weighted aggregate scoring functions could be devised to account for differing chemical shift prediction accuracy of different nuclei.

While the combination of MODELLER and SHIFTX2 works well for the primarily monomeric, globular proteins presented here, the COMPASS algorithm could straightforwardly be extended to more specialized areas by using integrative structure prediction approaches for multimeric assemblies (Sali et al., 2015) and utilizing molecular dynamics averaged chemical shift predictions for dynamic loops (Robustelli et al., 2012). In addition, our assignment-free approach can be used to replace many chemical shift similarity-based potentials for structure refinement, and possibly in methods utilizing chemical shifts to develop models of structural ensembles (Kannan et al., 2014).

The continual progression in the quality of model prediction methods and chemical shift prediction algorithms will benefit COMPASS because of its modular approach. By leveraging these increasingly accurate predictions combined with the simple automated analysis of COMPASS, previously inaccessible systems will become feasible. These advances may be particularly significant to address categories of proteins, such as membrane proteins and fibrils, which have historically been very challenging.

EXPERIMENTAL PROCEDURES

The COMPASS framework can be applied to any combination of model-generation method and chemical shift prediction algorithm. In this study, models were prepared using the MODELLER protein structure-modeling program, using a standard protocol (Eswar et al., 2002), and subsequently relaxed using the ab initio relaxation function in the Rosetta software package to ensure low-energy side-chain conformations (Simons et al., 1997). SHIFTX2 was used to predict chemical shifts due to its speed and its applicability to both backbone and side-chain carbons.

To simulate the 2D spectra, a Python program enumerates all adjacent 13C pairs, assembles the corresponding predicted chemical shifts into pairs, and records them in a list (Figure 9). The simulated peak list for each model is then compared with the experimental peak list using the COMPASS score, which is based on the modified Hausdorff distance. Hausdorff distances are a popular family of metrics in computational image analysis, and have found applications both in structure comparison and NOESY (nuclear Overhauser effect spectroscopy) peak matching (Zeng et al., 2008; Kozin and Svergun, 2001).
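The peak-list simulation can be sketched as follows. This is an illustrative approximation rather than the authors' program: the bond table `ALIPHATIC_BONDS` is a hypothetical, deliberately partial lookup covering only a few residue types, the predicted shifts are assumed to arrive as a dictionary keyed by (residue number, atom name), and emitting each pair on both sides of the diagonal is an assumption about how the predicted list mirrors a COSY-type spectrum.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical, partial table of directly bonded aliphatic carbon pairs.
# A real tool would cover all residue types or read bonds from the structure.
ALIPHATIC_BONDS: Dict[str, List[Tuple[str, str]]] = {
    "GLY": [],
    "ALA": [("CA", "CB")],
    "SER": [("CA", "CB")],
    "VAL": [("CA", "CB"), ("CB", "CG1"), ("CB", "CG2")],
    "LEU": [("CA", "CB"), ("CB", "CG"), ("CG", "CD1"), ("CG", "CD2")],
}

ShiftTable = Dict[Tuple[int, str], float]  # (residue number, atom name) -> ppm


def predicted_peaks(
    sequence: Sequence[str],
    shifts: ShiftTable,
    symmetric: bool = True,
) -> List[Tuple[float, float]]:
    """Build an unassigned list of one-bond 13C-13C cross-peaks."""
    peaks: List[Tuple[float, float]] = []
    for number, residue in enumerate(sequence, start=1):
        for atom_a, atom_b in ALIPHATIC_BONDS.get(residue, []):
            key_a, key_b = (number, atom_a), (number, atom_b)
            if key_a not in shifts or key_b not in shifts:
                continue  # skip pairs that lack a predicted shift
            a, b = shifts[key_a], shifts[key_b]
            peaks.append((a, b))
            if symmetric:
                peaks.append((b, a))  # partner across the diagonal
    return peaks
```

Only the shift pairs are returned; the residue numbers and atom names used to look them up are discarded, so the output carries no assignment information.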

The COMPASS score is defined by Equations 1 and 2.

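$$d(a, B) = \min_{b \in B} \lVert a - b \rVert \qquad \text{(Equation 1)}$$

$$d_{\mathrm{COMPASS}}(A, B) = \frac{1}{N_A} \sum_{a \in A} d(a, B) \qquad \text{(Equation 2)}$$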

Equation 1 defines the distance between a point a and a point set B as the distance from point a to the closest point in set B. The COMPASS score is then defined in Equation 2 as the average of these minimum distances for every point in set A. This definition makes the COMPASS score directional, meaning that switching sets A and B gives different results. While this diverges from typical Hausdorff distances, it emphasizes the importance of the points in set A (chosen as the experimental peak set) over the points in set B (the predicted peaks). This way, every experimental peak is used in the calculation of the score but if the peak sets are very different, many of the predicted peaks (set B) may be ignored; for example, some regions of a protein may yield lower signal intensities experimentally.

The COMPASS score for each model is computed by matching each experimental peak with the nearest predicted peak in the model peak list, and calculating the average minimum distance for these pairings (Figure 10). The COMPASS score is therefore smaller for models that predict peak patterns similar to the experimental spectrum. In the limit of identical peak patterns, it would be identically zero. By weighting each experimental peak equally, the COMPASS score naturally addresses overlap and missing peaks in experimental spectra. If a peak is missing from the experimental spectrum, nearby peaks in the predicted spectrum are not matched and thus do not contribute to the overall score. Similarly, noise signals are deemphasized by the averaging procedure. Significant outliers that have no near matches in any model peak list contribute a similar magnitude to the scores of all models, manifesting as a nearly constant offset of all resulting scores.
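A minimal sketch of the score follows directly from its definition as the average, over experimental peaks, of the distance to the nearest predicted peak. The Euclidean distance in ppm is assumed here for the point-to-point distance; the function names `compass_score` and `rank_models` are hypothetical and not taken from the authors' software.

```python
import math
from typing import Dict, List, Sequence, Tuple

Peak = Tuple[float, float]  # (shift 1, shift 2) in ppm


def compass_score(experimental: Sequence[Peak], predicted: Sequence[Peak]) -> float:
    """Directed average of nearest-neighbor distances from set A to set B.

    experimental plays the role of set A and predicted the role of set B.
    """
    if not experimental or not predicted:
        raise ValueError("both peak lists must be non-empty")
    total = 0.0
    for a in experimental:
        # Distance from a to the closest point of B (Euclidean, an assumption).
        total += min(math.dist(a, b) for b in predicted)
    return total / len(experimental)


def rank_models(
    experimental: Sequence[Peak],
    model_peaks: Dict[str, Sequence[Peak]],
) -> List[Tuple[str, float]]:
    """Return (model name, score) pairs sorted from best to worst."""
    scored = [(name, compass_score(experimental, peaks))
              for name, peaks in model_peaks.items()]
    return sorted(scored, key=lambda item: item[1])
```

Swapping the two arguments generally changes the result, which reflects the directional character of the score: extra predicted peaks far from any experimental peak are simply never matched.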

Sample Preparation

The expression, purification, and crystallization of isotopically labeled recombinant ubiquitin were previously reported (Igumenova et al., 2004). The β1-immunoglobulin binding domain of protein G (GB1) was expressed and purified as previously reported (Franks et al., 2005). DsbA was expressed and purified according to the method of Sperling et al. (2010). Soluble TF was expressed and purified as described by Boettcher et al. (2010) and crystallized by precipitation in 1.6 M ammonium sulfate with 200 mM NaCl and 100 mM HEPES buffer (pH 7.5) at 4 °C as previously reported (Boys et al., 1993). Samples were packed into 3.2-mm thin-walled NMR rotors.

NMR Spectroscopy

The 13C-13C 2D CTUC-COSY spectrum of GB1 has been previously reported (Franks et al., 2005). The CTUC-COSY spectrum of ubiquitin was collected on a 750-MHz Varian VNMRS spectrometer (1H frequency) with an HCN Balun MAS probe. The MAS rate was 16.666 kHz and the variable air temperature was set to −10 °C. SPINAL decoupling (85 kHz) was employed during acquisition. The refocusing delay was 4.2 ms. The spectrum was processed with 20-Hz net line broadening in each dimension.

The CTUC-COSY spectrum of DsbA was collected on a 500-MHz Infinity Plus spectrometer (1H frequency) spinning at 22.222 kHz at a variable air temperature set point of −10 °C. 1H SPINAL decoupling (85 kHz) was employed during acquisition. Net line broadening of 30 Hz was applied in each dimension. The 13C-13C 2D SPC5 spectrum of TF was collected on a 750-MHz Varian VNMRS spectrometer (1H frequency) with an HCN BioMAS probe. The MAS rate was 12.500 kHz and the variable air temperature was set to 10 °C. SPINAL 1H decoupling was employed at 80 kHz during acquisition. The spectrum was processed with 20-Hz net line broadening in each dimension.

Highlights.

  • An algorithm to numerically compare NMR spectra and protein structures is developed

  • An unassigned 13C-13C NMR spectrum can be used to identify the correct protein fold

  • Resonance assignments are not needed to use NMR data in structure development

ACKNOWLEDGMENTS

This work was supported by NIH R01-GM073770 (to C.M.R.), R01-HL103999 (to J.H.M. and C.M.R.), and R21-107905 (to C.M.R.), and NIH S10RR025037 (to C.M.R.). J.M.C. and K.M.N. were recipients of National Science Foundation Graduate Research Fellowships. A.E.N. was a recipient of an NIH Ruth L. Kirschstein National Research Service Award (F32 GM095344), and E.D.W. was an American Heart Association Postdoctoral Fellow. The authors thank Dr. Ying Li for expressing and purifying uniformly 13C,15N-labeled ubiquitin, and Dr. Deborah A. Berthold for preparing the uniformly 13C,15N-labeled WT-DsbA sample.

Footnotes

SUPPLEMENTAL INFORMATION

Supplemental Information includes three figures and can be found with this article online at http://dx.doi.org/10.1016/j.str.2015.07.019.

AUTHOR CONTRIBUTIONS

J.M.C., A.E.N., and C.M.R. conceived of the project and experiments. J.M.C., Q.Y., J.R.P., and A.E.N. designed and implemented algorithms. J.M.C., M.T., M.D.T., E.D.W., K.M.N., L.J.S., and G.C. executed NMR experiments. M.T., E.D.W., K.M.N., L.J.S., and J.H.M. prepared samples. J.M.C. and C.M.R. wrote the manuscript. All authors read and corrected the manuscript.

REFERENCES

  1. Barbet-Massin E, Pell AJ, Retel JS, Andreas LB, Jaudzems K, Franks WT, Nieuwkoop AJ, Hiller M, Higman V, Guerry P, et al. Rapid proton-detected NMR assignment for proteins with fast magic angle spinning. J. Am. Chem. Soc. 2014;136:12489–12497. doi: 10.1021/ja507382j.
  2. Bernstein FC, Koetzle TF, Williams GJB, Meyer EF, Jr., Brice MD, Rodgers JR, Kennard O, Shimanouchi T, Tasumi M. The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 1977;112:535–542. doi: 10.1016/s0022-2836(77)80200-3.
  3. Boettcher JM, Clay MC, LaHood BJ, Morrissey JH, Rienstra CM. Backbone 1H, 13C and 15N resonance assignments of the extracellular domain of tissue factor. Biomol. NMR Assign. 2010;4:183–185. doi: 10.1007/s12104-010-9233-x.
  4. Boys CWG, Miller A, Harlos K, Martin DMA, Tuddenham EGD, O’Brien DP. Crystallization and preliminary X-ray analysis of human tissue factor extracellular domain. J. Mol. Biol. 1993;234:1263–1265. doi: 10.1006/jmbi.1993.1678.
  5. Cavalli A, Salvatella X, Dobson CM, Vendruscolo M. Protein structure determination from NMR chemical shifts. Proc. Natl. Acad. Sci. USA. 2007;104:9615–9620. doi: 10.1073/pnas.0610313104.
  6. Chen L, Olsen RA, Elliott DW, Boettcher JM, Zhou DH, Rienstra CM, Mueller LJ. High resolution 13C-detected solid-state NMR spectroscopy of a deuterated protein. J. Am. Chem. Soc. 2006;128:9992–9993. doi: 10.1021/ja062347t.
  7. Comellas G, Rienstra CM. Protein structure determination by magic-angle spinning solid-state NMR, and insights into the formation, structure, and stability of amyloid fibrils. Annu. Rev. Biophys. 2013;42:515–536. doi: 10.1146/annurev-biophys-083012-130356.
  8. Dubuisson MP, Jain AK. A modified Hausdorff distance for object matching. In: Storms P, editor. Proceedings of the 12th International Conference on Pattern Recognition; IEEE; 1994. pp. 566–568.
  9. Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen M, Pieper U, Sali A. Protein structure modeling with MODELLER. Curr. Prot. Bioinform. 2002;15:1–30.
  10. Franks WT, Zhou DH, Wylie BJ, Money BG, Graesser DT, Frericks HL, Sahota G, Rienstra CM. Magic-angle spinning solid-state NMR spectroscopy of the β1 immunoglobulin binding domain of protein G (GB1): 15N and 13C chemical shift assignments and conformational analysis. J. Am. Chem. Soc. 2005;127:12291–12305. doi: 10.1021/ja044497e.
  11. Goddard TD, Kneller DG. SPARKY 3. University of California, San Francisco; 2004.
  12. Guerry P, Herrmann T. Advances in automated NMR protein structure determination. Q. Rev. Biophys. 2011;44:257–309. doi: 10.1017/S0033583510000326.
  13. Güntert P. Automated structure determination from NMR spectra. Eur. Biophys. J. 2009;38:129–143. doi: 10.1007/s00249-008-0367-z.
  14. Han B, Liu Y, Ginzinger SW, Wishart DS. SHIFTX2: significantly improved protein chemical shift prediction. J. Biomol. NMR. 2011;50:43–57. doi: 10.1007/s10858-011-9478-4.
  15. Hohwy M, Rienstra CM, Jaroniec CP, Griffin RG. Fivefold symmetric homonuclear dipolar recoupling in rotating solids: application to double quantum spectroscopy. J. Chem. Phys. 1999;110:7983.
  16. Hyberts SG, Takeuchi K, Wagner G. Poisson-gap sampling and forward maximum entropy reconstruction for enhancing the resolution and sensitivity of protein NMR data. J. Am. Chem. Soc. 2010;132:2145–2147. doi: 10.1021/ja908004w.
  17. Igumenova TI, McDermott AE, Zilm KW, Martin RW, Paulson EK, Wand AJ. Assignments of carbon NMR resonances for microcrystalline ubiquitin. J. Am. Chem. Soc. 2004;126:6720–6727. doi: 10.1021/ja030547o.
  18. Kannan A, Camilloni C, Sahakyan AB, Cavalli A, Vendruscolo M. A conformational ensemble derived using NMR methyl chemical shifts reveals a mechanical clamping transition that gates the binding of the HU protein to DNA. J. Am. Chem. Soc. 2014;136:2204–2207. doi: 10.1021/ja4105396.
  19. Knight MJ, Webber AL, Pell AJ, Guerry P, Barbet-Massin E, Bertini I, Felli IC, Gonnelli L, Pierattelli R, Emsley L, et al. Fast resonance assignment and fold determination of human superoxide dismutase by high-resolution proton-detected solid-state MAS NMR spectroscopy. Angew. Chem. Int. Ed. Engl. 2011;50:11697–11701. doi: 10.1002/anie.201106340.
  20. Kozin MB, Svergun DI. Automated matching of high- and low-resolution structural models. J. Appl. Crystallogr. 2001;34:33–41.
  21. Linser R, Bardiaux B, Andreas LB, Hyberts SG, Morris VK, Pintacuda G, Sunde M, Kwan AH, Wagner G. Solid-state NMR structure determination from diagonal-compensated proton-proton restraints. J. Am. Chem. Soc. 2014;136:11002–11010. doi: 10.1021/ja504603g.
  22. Lu JX, Qiang W, Yau WM, Schwieters CD, Meredith SC, Tycko R. Molecular structure of β-amyloid fibrils in Alzheimer’s disease brain tissue. Cell. 2013;154:1257–1268. doi: 10.1016/j.cell.2013.08.035.
  23. Maly T, Debelouchina GT, Bajaj VS, Hu K-N, Joo C-G, MakJurkauskas ML, Sirigiri JR, van der Wel PCA, Herzfeld J, Temkin RJ, Griffin RG. Dynamic nuclear polarization at high magnetic fields. J. Chem. Phys. 2008;128:052211. doi: 10.1063/1.2833582.
  24. Moseley HNB, Sperling LJ, Rienstra CM. Automated protein resonance assignments of magic angle spinning solid-state NMR spectra of B1 immunoglobulin binding domain of protein G (GB1). J. Biomol. NMR. 2010;48:123–128. doi: 10.1007/s10858-010-9448-2.
  25. Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A. Critical assessment of methods of protein structure prediction (CASP)—round X. Proteins. 2014;82(Suppl 2):1–6. doi: 10.1002/prot.24452.
  26. Paramasivam S, Suiter CL, Hou G, Sun S, Palmer M, Hoch JC, Rovnyak D, Polenova T. Enhanced sensitivity by nonuniform sampling enables multidimensional MAS NMR spectroscopy of protein assemblies. J. Phys. Chem. B. 2012;116:7416–7427. doi: 10.1021/jp3032786.
  27. Park SH, Das BB, Casagrande F, Tian Y, Nothnagel HJ, Chu M, Kiefer H, Maier K, De Angelis AA, Marassi FM, Opella SJ. Structure of the chemokine receptor CXCR1 in phospholipid bilayers. Nature. 2012;491:779–783. doi: 10.1038/nature11580.
  28. Renault M, Pawsey S, Bos MP, Koers EJ, Nand D, Tommassen-van Boxtel R, Rosay M, Tommassen J, Maas WE, Baldus M. Solid-state NMR spectroscopy on cellular preparations enhanced by dynamic nuclear polarization. Angew. Chem. Int. Ed. Engl. 2012;51:2998–3001. doi: 10.1002/anie.201105984.
  29. Robustelli P, Kohlhoff K, Cavalli A, Vendruscolo M. Using NMR chemical shifts as structural restraints in molecular dynamics simulations of proteins. Structure. 2010;18:923–933. doi: 10.1016/j.str.2010.04.016.
  30. Robustelli P, Stafford KA, Palmer AG 3rd. Interpreting protein structural dynamics from NMR chemical shifts. J. Am. Chem. Soc. 2012;134:6365–6374. doi: 10.1021/ja300265w.
  31. Sali A, Berman HM, Schwede T, Trewhella J, Kleywegt G, Burley SK, Markley J, Nakamura H, Adams P, Bonvin AMJJ, et al. Outcome of the first wwPDB Hybrid/Integrative Methods Task Force Workshop. Structure. 2015;23:1156–1167. doi: 10.1016/j.str.2015.05.013.
  32. Schmidt E, Gath J, Habenstein B, Ravotti F, Székely K, Huber M, Buchner L, Böckmann A, Meier BH, Güntert P. Automated solid-state NMR resonance assignment of protein microcrystals and amyloids. J. Biomol. NMR. 2013;56:243–254. doi: 10.1007/s10858-013-9742-x.
  33. Shahid SA, Bardiaux B, Franks WT, Krabben L, Habeck M, van Rossum BJ, Linke D. Membrane-protein structure determination by solid-state NMR spectroscopy of microcrystals. Nat. Methods. 2012;9:1212–1217. doi: 10.1038/nmeth.2248.
  34. Shen Y, Lange O, Delaglio F, Rossi P, Aramini JM, Liu G, Eletsky A, Wu Y, Singarapu K, Lemak A, et al. Consistent blind protein structure generation from NMR chemical shift data. Proc. Natl. Acad. Sci. USA. 2008;105:4685–4690. doi: 10.1073/pnas.0800256105.
  35. Simons KT, Kooperberg C, Huang E, Baker D. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J. Mol. Biol. 1997;268:209–225. doi: 10.1006/jmbi.1997.0959.
  36. Spera S, Bax A. Measurement of NH-CαH coupling constants in staphylococcal nuclease by two-dimensional NMR and comparison with X-ray crystallographic results. J. Am. Chem. Soc. 1991;113:5490–5492.
  37. Sperling LJ, Berthold DA, Sasser TL, Jeisy-Scott V, Rienstra CM. Assignment strategies for large proteins by magic-angle spinning NMR: the 21-kDa disulfide-bond-forming enzyme DsbA. J. Mol. Biol. 2010;399:268–282. doi: 10.1016/j.jmb.2010.04.012.
  38. Sun S, Yan S, Guo C, Li M, Hoch JC, Williams JC, Polenova T. A timesaving strategy for MAS NMR spectroscopy by combining non-uniform sampling and paramagnetic relaxation assisted condensed data collection. J. Phys. Chem. B. 2012;116:13585–13596. doi: 10.1021/jp3005794.
  39. Ulrich EL, Akutsu H, Doreleijers JF, Harano Y, Ioannidis YE, Lin J, Livny M, Mading S, Maziuk D, Miller Z, et al. BioMagResBank. Nucleic Acids Res. 2008;36:402–408. doi: 10.1093/nar/gkm957.
  40. Wang S, Munro RA, Shi L, Kawamura I, Okitsu T, Wada A, Kim S-Y, Jung K-H, Brown LS, Ladizhansky V. Solid-state NMR spectroscopy structure determination of a lipid-embedded heptahelical membrane protein. Nat. Methods. 2013a;10:1007–1012. doi: 10.1038/nmeth.2635.
  41. Wang T, Park YB, Caporini MA, Rosay M, Zhong L, Cosgrove DJ, Hong M. Sensitivity-enhanced solid-state NMR detection of expansin’s target in plant cell walls. Proc. Natl. Acad. Sci. USA. 2013b;110:16444–16449. doi: 10.1073/pnas.1316290110.
  42. Wasmer C, Lange A, Van Melckebeke H, Siemer AB, Riek R, Meier BH. Amyloid fibrils of the HET-s(218–289) prion form a beta-solenoid with a triangular hydrophobic core. Science. 2008;319:1523–1526. doi: 10.1126/science.1151839.
  43. Zeng J, Tripathy C, Zhou P, Donald BR. A Hausdorff based NOE assignment algorithm using protein backbone determined from residual dipolar couplings and rotamer patterns. Comput. Sys. Bioinform. Conf. 2008;2008:169–181.
  44. Zhou DH, Nieuwkoop AJ, Berthold DA, Comellas G, Sperling LJ, Tang M, Shah GJ, Brea EJ, Lemkau LR, Rienstra CM. Solid-state NMR analysis of membrane proteins and protein aggregates by proton detected spectroscopy. J. Biomol. NMR. 2012;54:291–305. doi: 10.1007/s10858-012-9672-z.
