Author manuscript; available in PMC: 2010 Mar 17.
Published in final edited form as: Nat Methods. 2009 Sep;6(9):625–626. doi: 10.1038/nmeth0909-625

CASD-NMR: Critical Assessment of Automated Structure Determination by NMR

Antonio Rosato 1,2,*, Anurag Bagaria 3,4, David Baker 5, Benjamin Bardiaux 6, Andrea Cavalli 7, Jurgen F Doreleijers 8, Andrea Giachetti 1, Paul Guerry 9, Peter Güntert 3,4, Torsten Herrmann 9, Yuanpeng J Huang 10, Hendrik R A Jonker 4,11, Binchen Mao 10, Thérèse E Malliavin 6, Gaetano T Montelione 10, Michael Nilges 6, Srivatsan Raman 5, Gijs van der Schot 12, Wim F Vranken 13, Geerten W Vuister 8, Alexandre MJJ Bonvin 12
PMCID: PMC2841015  NIHMSID: NIHMS177629  PMID: 19718014

We report the completion of the first comparison of automated NMR protein structure calculation methods and announce its continuation in the form of an ongoing, community-wide experiment: CASD-NMR (Critical Assessment of Automated Structure Determination of Proteins by NMR). Any laboratory may participate in CASD-NMR, submit targets, or both.

NMR spectroscopy is the only technique for the determination of the solution structure of biological macromolecules. Structure determination typically requires both the assignment of resonances and a labor-intensive analysis of multidimensional NOESY spectra, in which peaks are matched to assigned resonances. Software tools that fully automate the NOESY assignment and structure calculation steps have the potential to boost the efficiency, reproducibility and reliability of NMR structures. Within the e-NMR project (www.e-nmr.eu), which is funded by the European Commission (Project number 213010), we are developing an approach to assess whether such automated methods can indeed produce structures that closely match those manually refined using the same experimental data (the “reference structures”). The concept closely resembles that of other community-wide experiments, such as CASP, the Critical Assessment of Techniques for Protein Structure Prediction¹, and CAPRI, the Critical Assessment of Prediction of Interactions². Unlike CASP and CAPRI, CASD-NMR is based entirely on experimental data, which raises special issues in assembling, organizing, and distributing these data among participants.
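The matching of NOESY peaks to assigned resonances mentioned above can be illustrated with a minimal sketch. It is not the method of any CASD-NMR participant: the atom labels, the chemical shift values, and the 0.03 ppm tolerance are all hypothetical choices made for illustration, and real programs add many further criteria (for example network consistency and preliminary structures) to resolve ambiguity.

```python
from typing import NamedTuple


class Resonance(NamedTuple):
    atom: str     # hypothetical atom label, e.g. "A12.HN"
    shift: float  # assigned chemical shift in ppm


def candidate_atoms(position, resonances, tolerance=0.03):
    """Atoms whose assigned shift lies within `tolerance` ppm of a peak position."""
    return [r.atom for r in resonances if abs(r.shift - position) <= tolerance]


def candidate_pairs(peak, resonances, tolerance=0.03):
    """All atom pairs compatible with both dimensions of a 2D NOESY peak."""
    first = candidate_atoms(peak[0], resonances, tolerance)
    second = candidate_atoms(peak[1], resonances, tolerance)
    return [(a, b) for a in first for b in second if a != b]


# Illustrative (made-up) assignments and one unassigned peak.
resonances = [
    Resonance("A12.HN", 8.21),
    Resonance("L45.HA", 4.35),
    Resonance("V7.HN", 8.23),
    Resonance("K30.HB2", 1.80),
]
pairs = candidate_pairs((8.22, 4.35), resonances)
print(pairs)
```

In this toy case two amide protons have nearly identical shifts, so the peak has two possible assignments. Ambiguity of this kind is what makes manual NOESY analysis laborious and what automated methods must resolve.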

We provided seven research teams in the field with ten experimental data sets for various protein systems of known structure and two sets for protein structures not yet publicly available (“blind tests”), courtesy of the NorthEast Structural Genomics consortium (NESG). We then met in Florence, Italy on May 4–6, 2009 to analyze the generated structures (Fig. 1) by comparing them with the reference structures and by applying software tools for structure validation. This first experiment indicated that, while most submissions had correct overall folds, some programs failed on certain targets to reproduce accurately the packing and length of secondary structure elements. The root mean square deviations (RMSDs) of the backbone coordinates from the manually solved structures were typically in the 1–2 Å range, but reached values as high as 9 Å in some cases.
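For readers unfamiliar with the RMSD measure quoted above, the following is a minimal sketch of the calculation. It assumes that the two coordinate sets list the same backbone atoms in the same order and have already been optimally superimposed; the superposition step itself is not shown, and the coordinates are invented for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of (x, y, z) points.

    Assumes both structures are already superimposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Made-up coordinates in Å: every atom of the model is displaced by 1 Å along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(reference, model))
```

Because the measure averages squared distances, a few badly placed regions can dominate the value, which is one reason RMSD is usually reported alongside other validation scores.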

Figure 1. Performance of various automated structure calculation methods.

The results of fully automated calculations by various programs for one of the blind test data sets of the 2009 Florence workshop are compared to the reference structure (bottom right) determined by Aramini et al. (PDB ID 2kif).

The future

The complete automation of protein solution structure determination from assigned chemical shift lists and unassigned NOESY peak lists may soon reach the point where unsupervised results can be deposited directly in the PDB. It is therefore meaningful and timely³ to implement CASD-NMR as a community-wide rolling experiment. We invite software developers to test their fully automated protocols on blind data sets and to produce structures as if they were to deposit them directly in the PDB. An assessment meeting is planned for mid-2010. During CASD-NMR, we will regularly release blind test data sets, provided by the whole NMR community, for proteins whose solution structures will be kept on hold by the PDB for at least eight weeks. We therefore invite any NMR group that is about to deposit a structure in the PDB to contribute a blind test case to CASD-NMR. The NESG consortium of the NIH Protein Structure Initiative (PSI) will provide one data set per month. Each blind data set will include the protein sequence, chemical shift assignments, and unassigned integrated NOESY peak lists. Data providers may also include additional biochemical information and raw spectral data. These experimental data will be available from a central database of the e-NMR project and through the PSI Knowledge Base (http://kb.psi-structuralgenomics.org/), and will remain available after the release of the reference PDB structure, so that participants can join CASD-NMR after it has started. For more information, including file format requirements and details on the submission procedures, please visit http://www.e-nmr.eu/CASD-NMR.

The CASD-NMR participants will have eight weeks to generate structures automatically and to deposit their coordinates, together with the conformational restraints used in the calculations, in the central database of the e-NMR project. Manual intervention on the data, other than re-calibration of chemical shifts, is forbidden. For all structures, the CASD-NMR web site will provide access to the coordinates, their comparison with the reference structure, and their validation scores. These data will foster the development of better algorithms and validation tools, and the adoption of state-of-the-art protocols by the wider bio-NMR community.
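The contents of a blind data set and the eight-week submission window can be summarized in a short sketch. The class, its field names, the target identifier, and the dates are hypothetical and do not reflect the actual CASD-NMR database schema or file formats. Treating the deadline day itself as on time is also an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta

SUBMISSION_WINDOW = timedelta(weeks=8)  # eight weeks, as stated in the text


@dataclass
class BlindDataSet:
    target_id: str           # hypothetical identifier
    sequence: str            # protein sequence, one-letter code
    chemical_shifts: dict    # atom label -> assigned shift in ppm
    noesy_peak_lists: list   # unassigned, integrated peak lists
    release_date: date       # day the data set is released to participants

    @property
    def deadline(self):
        """Last day on which a submission is accepted (assumed inclusive)."""
        return self.release_date + SUBMISSION_WINDOW

    def accepts(self, submitted_on):
        """True if a submission date falls within the eight-week window."""
        return self.release_date <= submitted_on <= self.deadline


# Made-up example target released on 1 October 2009.
target = BlindDataSet(
    target_id="example-target",
    sequence="MKTAYIAKQR",
    chemical_shifts={"A12.HN": 8.21},
    noesy_peak_lists=[[(8.22, 4.35)]],
    release_date=date(2009, 10, 1),
)
print(target.deadline, target.accepts(date(2009, 11, 20)))
```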

We look forward to a fascinating experiment!

References
