PDBStat: A Universal Restraint Converter and Restraint Analysis Software Package for Protein NMR

Roberto Tejero; David Snyder; Binchen Mao; James M Aramini; Gaetano T Montelione

doi:10.1007/s10858-013-9753-7

Author manuscript; available in PMC: 2014 Aug 1.

Published in final edited form as: J Biomol NMR. 2013 Jul 30;56(4):337–351. doi: 10.1007/s10858-013-9753-7


PMCID: PMC3932191 NIHMSID: NIHMS510885 PMID: 23897031

Abstract

The heterogeneous array of software tools used in the process of protein NMR structure determination presents organizational challenges in the structure determination and validation processes, and creates a learning curve that limits the broader use of protein NMR in biology. These challenges, including accurate use of data in different data formats required by software carrying out similar tasks, continue to confound the efforts of novices and experts alike. These important issues need to be addressed robustly in order to standardize protein NMR structure determination and validation. PDBStat is a C/C++ computer program originally developed as a universal coordinate and protein NMR restraint converter. Its primary function is to provide a user-friendly tool for interconverting between protein coordinate and protein NMR restraint data formats. It also provides an integrated set of computational methods for protein NMR restraint analysis and structure quality assessment, relabeling of prochiral atoms with correct IUPAC names, as well as multiple methods for analysis of the consistency of atomic positions indicated by their convergence across a protein NMR ensemble. In this paper we provide a detailed description of the PDBStat software, and highlight some of its valuable computational capabilities. As an example, we demonstrate the use of the PDBStat restraint converter for restrained CS-Rosetta structure generation calculations, and compare the resulting protein NMR structure models with those generated from the same NMR restraint data using more traditional structure determination methods. These results demonstrate the value of a universal restraint converter in allowing the use of multiple structure generation methods with the same restraint data for consensus analysis of protein NMR structures and the underlying restraint data.

Keywords: Protein NMR Structure Validation, BioMagResDatabase, XPLOR, CNS, CYANA, CS-Rosetta

INTRODUCTION

Protein structure determination by NMR methods involves integration of many different software tools. This heterogeneous environment presents a bottleneck in the structure determination process, and results in a steep learning curve that limits the broader use of NMR in molecular biophysics. The challenges of software integration are relevant to data collection, data analysis, and resonance assignments, as well as to the processes of 3D structure generation and structure quality assessment. Many computer programs and servers have been developed that integrate important parts of the process, including data collection, data processing (e.g. [1]), data analysis (e.g. [2-8]), and structure quality assessment [9, 10]. In particular, the challenge and importance of protein NMR structure validation has been the subject of several recent papers and reviews [9-16]. However, the heterogeneous software environment of protein NMR spectroscopists continues to confound and slow down the efforts of novices and experts alike, and challenges efforts to standardize protein NMR structure determination and assessment.

PDBStat is a computer program originally developed as a universal coordinate and protein NMR restraint converter. Its primary function is to provide a user-friendly tool for interconverting between restraint data types. It also provides an integrated set of computational methods for protein NMR structure quality assessment. In order to streamline steps in the pipeline of NMR structure determination, and to provide information useful for protein NMR structure quality assessment, it also includes tools for standardized structural superimpositions, RMSD calculations, restraint summaries, restraint violation analyses, and various analyses validating models against experimental data.

Over the last several years, PDBStat has been used extensively by the Northeast Structural Genomics Consortium (NESG) as part of its platform for high-throughput protein NMR structure determination [17-19]. Within the NESG, PDBStat is used extensively to compare and interconvert input files for various third-party software, allowing analysis of the same restraint data by multiple structure generation programs. PDBStat is also the restraint analysis software underlying the Protein Structure Validation Software (PSVS) server [9, 20]. However, the features and capabilities of PDBStat are much more extensive than the limited set of functions it provides for the PSVS server. In this paper we provide a detailed description of the PDBStat software, and highlight several of its valuable computational capabilities.

DESCRIPTION OF THE SOFTWARE

PDBStat is a stand-alone software program written largely in C. In the course of its evolution, PDBStat has also incorporated some Fortran and C++ subroutines. The program is freely available for non-commercial use at http://biopent.uv.es/~roberto. An online version of the program can be accessed at http://psvs-1_4-dev.nesg.org/.

1. Universal protein coordinate and protein NMR restraint converter

One of the challenges of working in the heterogeneous software environment that has evolved in the protein NMR community is that the naming conventions and formats for atomic coordinates (and the corresponding naming conventions and formats for NMR restraints) differ among important software tools. PDBStat addresses this key issue by providing a universal coordinate and restraint converter, which can convert between any of the following coordinate and restraint formats: XPLOR-NIH [21], CNS [22], DYANA [23], CYANA [24], CHARMM, Rosetta [25] (versions 2.3.0 and 3.x), as well as from the older formats of DIANA [26], DISMAN [27], DISGEO [28, 29], and CONGEN [30, 31]. The most common use of PDBStat is for converting between XPLOR/CNS, CYANA, and Rosetta coordinates and restraints. PDBStat can read atomic coordinate and restraint files in any of these formats and convert them to standard PDB format coordinate files, with correct prochiral hydrogen labels, and a corresponding restraint file (with pseudoatom restraints where appropriate) consistent with these PDB coordinates. These files are the preferred format for submission of coordinates and restraint files to the PDB.

PDBStat can also be used to convert restraint lists and atomic coordinate files used for one program (e.g. CNS) into properly formatted files for use with another program (e.g. restrained CS-Rosetta). PDBStat uses a central representation, the IUPAC definitions for atom labels [32]. Hence, operationally, coordinate and restraint lists are converted first to IUPAC (e.g. CNS to IUPAC), and then from IUPAC to the desired format (e.g. IUPAC to restrained CS-Rosetta).
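The two-step conversion through a central IUPAC representation can be sketched as follows. This is only an illustration of the hub-and-spoke idea, not PDBStat's actual tables or code: the function names are hypothetical, the `toyfmt` format is invented, and the only real naming difference used is the backbone amide proton, which is `HN` in XPLOR/CNS and `H` in IUPAC.

```python
# Toy per-format tables mapping format-specific atom names to IUPAC names.
# Real converters need complete, residue-specific tables; "toyfmt" is invented.
TO_IUPAC = {
    "iupac": {},
    "cns": {"HN": "H"},       # XPLOR/CNS backbone amide proton
    "toyfmt": {"HNB": "H"},   # hypothetical format, for illustration only
}


def convert_atom(name, src, dst):
    """Convert one atom name: source format -> IUPAC -> destination format."""
    iupac = TO_IUPAC[src].get(name, name)
    from_iupac = {v: k for k, v in TO_IUPAC[dst].items()}
    return from_iupac.get(iupac, iupac)


def convert_restraint(restraint, src, dst):
    """Convert a distance restraint ((res_i, atom_i), (res_j, atom_j), upper)."""
    (res_i, atom_i), (res_j, atom_j), upper = restraint
    return ((res_i, convert_atom(atom_i, src, dst)),
            (res_j, convert_atom(atom_j, src, dst)),
            upper)
```

With a hub representation, supporting a new format requires one table to and from IUPAC rather than one table per pair of formats.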

2. Protein atomic coordinate analysis

Relabeling of prochiral methylene and isopropyl methyl groups

IUPAC conventions [32] define the naming of prochiral sites in proteins, including the labeling of hydrogen atoms of methylene groups, and the labeling of isopropyl methyl groups. They also provide conventions for the labeling of sidechain amide protons of Asn and Gln residues. It is well known that some of the software packages commonly used in protein structure determination use atom labeling schemes that do not follow these IUPAC conventions. PDBStat provides automated analysis of prochiral sites and sidechain amide protons using a wide range of atomic coordinate formats, and relabels these atoms based on the local stereochemistry. This algorithm relies on structure geometry calculations done in the process of translating the coordinates, rather than on conversion tables. Hence the accuracy of the resulting prochiral labels does not depend on the accuracy of the labeling in the original coordinate file.

The PDBStat algorithm for prochiral atom naming is summarized in Figure 1 for a methylene CβH2 group. In this case, the vectors from Cα to Cβ ( $\vec{v}_{C\alpha C\beta}$ ), Cβ to Cγ ( $\vec{v}_{C\beta C\gamma}$ ), Cβ to one Hβ ( $\vec{v}_{C\beta H\beta 1}$ ) and Cβ to the second Hβ ( $\vec{v}_{C\beta H\beta 2}$ ) are first computed. Then the normal vector ( $\vec{N}$ ) to the plane (shown in Fig. 1) defined by $\vec{v}_{C\alpha C\beta}$ and $\vec{v}_{C\beta C\gamma}$ is computed ( $\vec{N} = \vec{v}_{C\alpha C\beta} \times \vec{v}_{C\beta C\gamma}$ ). Finally the dot products with this normal, $\vec{v}_{C\beta H\beta 1} \cdot \vec{N}$ and $\vec{v}_{C\beta H\beta 2} \cdot \vec{N}$ , are computed. The two dot products have different signs, since the Hβ's are separated by the plane. The Hβ atom (β1 or β2) with $\vec{v}_{C\beta H\beta} \cdot \vec{N} > 0$ is HB2, and the Hβ atom with $\vec{v}_{C\beta H\beta} \cdot \vec{N} < 0$ is HB3. The same procedure is applied for other prochiral methylene sites, as well as for the prochiral isopropyl methyl sites of Leu and Val sidechains. An alternative procedure for relabeling prochiral methylene and isopropyl methyl groups would be to follow the IUPAC definition of the dihedral angles. For example, for the Hβ2/Hβ3 case, using dih(N, Cα, Cβ, Hβ2) - dih(N, Cα, Cβ, Cγ) ≈ +120° and dih(N, Cα, Cβ, Hβ3) - dih(N, Cα, Cβ, Cγ) ≈ −120°. PDBStat also provides proper stereospecific relabeling of the side-chain amide protons of Asn and Gln residues, and correct naming of the Cγ atoms of Thr and Ile residues.
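The cross-product and dot-product test described above can be written in a few lines. The following is a minimal sketch rather than PDBStat's code: the function name and argument layout are assumptions, and the sign rule (positive dot product gives HB2) is taken directly from the description in the text.

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def label_beta_protons(ca, cb, cg, h_first, h_second):
    """Return labels for (h_first, h_second) from Cartesian coordinates.

    N = v(CA->CB) x v(CB->CG); the proton whose CB->H vector has a
    positive dot product with N is labeled HB2, the other HB3.
    """
    normal = _cross(_sub(cb, ca), _sub(cg, cb))
    d_first = _dot(_sub(h_first, cb), normal)
    d_second = _dot(_sub(h_second, cb), normal)
    if d_first * d_second >= 0:
        raise ValueError("protons do not lie on opposite sides of the plane")
    return ("HB2", "HB3") if d_first > 0 else ("HB3", "HB2")
```

Because the labels are derived from geometry alone, swapping the two input protons simply swaps the returned labels, which is why the result does not depend on the names in the original file.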

Fig. 1 — Schematic depicting the algorithm used to define the stereospecific labeling of prochiral protons of the CβH₂ methylene group.

Identifying “well defined” and “not well defined” regions of the protein structure

The representation of protein NMR structures generally involves providing an ensemble of conformers. Each member of the ensemble is generally a best-fit solution to the experimental data. The variation in conformations across the ensemble provides an estimate of the precision of the representation in various regions of the structure. Typically, some parts of the structure are “well defined”, and other regions are “not well defined”. However, the variations observed across NMR ensembles are often much greater than the variations that provide observable electron density in a crystal structure; i.e. the variations of coordinates in “not well defined” regions of the NMR structure are often of a magnitude that the corresponding structure would not be observable at all as electron density in an X-ray crystal structure.

The superimposed bundle of NMR conformers is therefore a valuable means of identifying which regions of the structure are precisely defined by the NMR experiments, and which are not well defined. While the superimposition process itself is relatively straightforward [33, 34], there are challenges in defining which atoms to use in the rotation/translation operators used to superimpose the structures. Methods and standard criteria for labeling residues or atoms as “well defined” or “not well defined” are an active area of research and discussion in the computational NMR community (see for example refs [35-37]).

PDBStat provides two means of annotating “well defined” atomic coordinates: (i) the dihedral angle order parameter [36] and (ii) the “core atom set” defined by analysis of the distance variance matrix [35]. Both of these methods are independent of making an initial superimposition. Having defined a core atom set by one or another of these methods, these atoms can be used to compute appropriate superimpositions using standard methods (e.g. [33]).

Dihedral angle order parameters

One of the most commonly used and generally accepted methods for distinguishing “well defined” from “not well defined” residue backbones is the dihedral angle order parameter (DAOP) [36]. Using Eqn 1, PDBStat can calculate the DAOP and locate “well defined” residues across an ensemble of N conformers.

$$S(\phi_i) = \frac{1}{N} \sqrt{\left(\sum_{j=1}^{N} \sin \phi_{i,j}\right)^{2} + \left(\sum_{j=1}^{N} \cos \phi_{i,j}\right)^{2}}$$

Eqn 1

A cutoff value of S(ϕ_i) = 0.90 (corresponding to a standard deviation of ±24°) [36] has been used to define a “well defined dihedral angle”. PDBStat uses the default convention that if the sum of backbone DAOPs S(ϕ) + S(ψ) ≥ 1.8, then the entire residue is taken to be “well defined”.
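Eqn 1 and the default residue criterion can be illustrated with a short sketch. The function names are hypothetical and angles are assumed to be given in degrees, one value per conformer.

```python
import math


def order_parameter(angles_deg):
    """Dihedral angle order parameter (Eqn 1) for one angle across N conformers."""
    n = len(angles_deg)
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.hypot(sin_sum, cos_sum) / n


def residue_is_well_defined(phi_deg, psi_deg, threshold=1.8):
    """Default convention from the text: S(phi) + S(psi) >= 1.8."""
    return order_parameter(phi_deg) + order_parameter(psi_deg) >= threshold
```

The order parameter is 1 when the angle is identical in every conformer and approaches 0 when the values are spread uniformly around the circle, so no special handling of the ±180° wrap-around is needed.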

Variance matrix algorithm

The DAOP method has the advantage that it is fast and simple, and it is widely used by the protein NMR community. However, it has some significant shortcomings. First, the DAOP cannot distinguish local from long-range order; e.g. it is not possible to identify two “domains” or secondary structure elements which are each well defined by the data, but connected by a flexible linker [35]. Second, this approach is backbone oriented, and does not provide a distinction between residues with “well defined” and “not well defined” sidechains, or sidechains that are only partially “well defined”. PDBStat can also be used to define sidechain dihedral angle order parameters, which can be interpreted to provide information on the consistency of sidechain conformations across the ensemble of models. PDBStat also provides an implementation of the “FindCore” variance matrix algorithm [35] to identify well-defined atoms by partitioning atoms into core and non-core sets based on the variance in distances to all of the other atoms in the structure. The resulting “core atom sets” can be used to superimpose conformers, and for structure quality assessment. These well-defined atom sets can also be used by PDBStat to identify “well defined” backbone regions. The default criterion used by PDBStat for deriving well-defined residue ranges from well-defined atom sets is to identify a residue as “well defined” if two or more of its N, Cα, and C’ atoms are in the well-defined core atom set.
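The two ingredients named above, the interatomic distance variance matrix and the backbone rule for residues, can be sketched as follows. This is not an implementation of FindCore itself (the partitioning of atoms into core and non-core sets is described in [35] and is not reproduced here); the function names and data layout are assumptions made for illustration.

```python
import math


def distance_variance_matrix(ensemble):
    """Variance of each interatomic distance across an ensemble.

    ensemble: list of models, each a list of (x, y, z) tuples in the same
    atom order. No prior superimposition is required, because only
    intramolecular distances are used.
    """
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    variance = [[0.0] * n_atoms for _ in range(n_atoms)]
    for a in range(n_atoms):
        for b in range(a + 1, n_atoms):
            dists = [math.dist(model[a], model[b]) for model in ensemble]
            mean = sum(dists) / n_models
            var = sum((d - mean) ** 2 for d in dists) / n_models
            variance[a][b] = variance[b][a] = var
    return variance


def residue_in_core(core_atoms, residue_number):
    """Default rule from the text: at least two of N, CA, C are core atoms.

    core_atoms: set of (residue_number, atom_name) tuples.
    """
    backbone = [(residue_number, name) for name in ("N", "CA", "C")]
    return sum(atom in core_atoms for atom in backbone) >= 2
```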

Optimally superimposing coordinates (RMSD calculations)

In order to calculate root mean squared deviations (RMSDs) in atomic positions, PDBStat rotates and translates each conformer so as to minimize the RMS deviation with one reference conformer from the bundle, referred to as “refmol”. These superimpositions are done using the core atom set(s) defined by either the DAOP or FindCore algorithms, described above, using the method of eigenvalue decomposition by multipliers of Kabsch [33, 34]. The resulting superimposed coordinates are used to compute a mathematical average coordinate set for the ensemble, and the root-mean-squared deviations in atomic positions (RMSDs) are computed relative to these average coordinates. (Note that the “average coordinates” are not physically meaningful except for computing RMSDs). Tests have demonstrated that, when using well-defined atom sets, almost the same “RMSDs to average coordinates” are obtained regardless of which conformer is selected as “refmol”. RMSDs can be reported for the “well defined” core atom set (using the command rmsd best), or for various subsets of atoms; e.g. RMSDs can be computed for alpha carbon (Cα), backbone (N, Cα, C’), all heavy (N, C, O, S), or all atoms including hydrogens.
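The "RMSD to average coordinates" step can be sketched as below. The sketch assumes the conformers have already been superimposed on the reference conformer (the superimposition itself is not shown) and that each model lists the same selected atoms in the same order; the function names are hypothetical.

```python
import math


def average_coordinates(models):
    """Mathematical average of each atom position across superimposed models."""
    n_models = len(models)
    n_atoms = len(models[0])
    return [
        tuple(sum(model[a][k] for model in models) / n_models for k in range(3))
        for a in range(n_atoms)
    ]


def rmsd(coords_a, coords_b):
    """Root-mean-squared deviation between two equally ordered atom lists."""
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def rmsd_to_average(models):
    """RMSD of every model to the ensemble-average coordinates."""
    average = average_coordinates(models)
    return [rmsd(model, average) for model in models]
```

Restricting the atom lists to Cα, backbone, heavy, or all atoms gives the different RMSD subsets mentioned in the text.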

Selecting a representative NMR model from the ensemble

NMR structures are generally reported as ensembles of models. The ensemble representation provides information about the consistency of the interpretation of the experimental data in different regions of the structure. However, biologists and other users of NMR structures are often confused by the ensemble representation. For this reason, it is advisable for the NMR experimentalist to designate a “representative” model from the ensemble. While no standard conventions have been agreed upon by the community for selecting the “representative structure”, one useful convention is to select the single model that is most similar to all the other models. Specifically, we have adopted the convention that the representative structure is selected from the ensemble by considering only the well-ordered atoms, and then identifying the medoid [38]; i.e. the model in the ensemble that minimizes the RMSDs between it and all other models of the ensemble [35]. This selection can be done using the representative structure (rep) command in PDBStat. Using the same algorithm described above, the refmol which results in the lowest RMSD (i.e., the one most like all the other models) is selected as the representative structure. The model selected by this algorithm should be designated as Model #1, the representative structure, in the PDB deposition.
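Medoid selection is straightforward once pairwise RMSDs over the well-ordered atoms are available. The sketch below takes a precomputed symmetric RMSD matrix as input; the function name is hypothetical and the computation of the pairwise RMSDs is assumed to have been done elsewhere.

```python
def medoid_index(rmsd_matrix):
    """Index of the model with the smallest summed RMSD to all other models.

    rmsd_matrix[i][j] is the RMSD between models i and j, computed over the
    well-defined atoms only. Ties are resolved in favor of the lowest index.
    """
    best_index, best_total = None, float("inf")
    for i, row in enumerate(rmsd_matrix):
        total = sum(value for j, value in enumerate(row) if j != i)
        if total < best_total:
            best_index, best_total = i, total
    return best_index
```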

3. Restraint analysis

Restraint statistics and restraint violation analyses

A key component of a protein NMR structure validation report is an analysis of how well each of the protein models reported in the NMR structure ensemble satisfies the experimental restraints. PDBStat provides extensive tools for assessment of restraint satisfaction and violations. The restraint analyses supported by PDBStat include (i) distance restraints (NOE, disulfide bond, and hydrogen bond restraints), (ii) dihedral angle restraints, and (iii) residual dipolar coupling restraints. Statistics are reported on the numbers and distributions of different types of restraints; e.g. intra and inter-residue, medium and long range restraints, hydrogen bond restraints, and inter-chain restraints of dimeric structures. In addition, the program reports statistics on the extent of restraint violations in these various categories. Using the universal structure and restraint converter of PDBStat, protein NMR structures and restraint lists generated for use by a wide range of programs (e.g. CYANA, CNS, XPLOR, or Rosetta) can be analyzed and reported using identical restraint violation statistics. This is a unique feature of the PDBStat software.
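The sequence-separation categories used in such restraint summaries can be sketched as follows, using the ranges given in Table 1 (i = j, |i-j| = 1, 1 < |i-j| < 5, and |i-j| ≥ 5). The function names are hypothetical, and the sketch assumes both residues belong to the same chain.

```python
from collections import Counter


def classify_range(res_i, res_j):
    """Classify a restraint by the sequence separation of its two residues."""
    separation = abs(res_i - res_j)
    if separation == 0:
        return "intraresidue"
    if separation == 1:
        return "sequential"
    if separation < 5:
        return "medium-range"
    return "long-range"


def summarize(residue_pairs):
    """Count restraints in each sequence-separation category."""
    return dict(Counter(classify_range(i, j) for i, j in residue_pairs))
```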

NOE distance restraint violations may be assessed using various interpretations of the relationship between interproton distance and NOE peak intensity. In generating restraints from NOE peak intensities, or in assessing restraints against atomic coordinates, PDBStat assumes the following “r⁻⁶ summation” relationship [39]:

$$r = \left( \sum r_{ij}^{-6} \right)^{-\frac{1}{6}}$$

Eqn. 2

This interpretation assumes that the NOEs arising from each of the several interproton distances r_ij contributing to a single NOESY crosspeak contribute independently to the NOESY cross peak volume. This same “r⁻⁶ summation” interpretation of NOESY peak volumes (or intensities if volumes are not available) in terms of distance restraints is used for methyl groups (3 protons), chiral methylene protons lacking stereospecific assignments (2 protons), degenerate methylene protons (2 protons), chiral isopropyl methyl groups lacking stereospecific assignments (6 protons), degenerate isopropyl methyl groups (6 protons), and for degenerate aromatic ring protons of tyrosine or phenylalanine.
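As a worked illustration of Eqn. 2, the effective distance for a group of protons contributing to one crosspeak can be computed as below. The function name is hypothetical; distances are in Å.

```python
def r6_sum_distance(distances):
    """Effective distance from the r^-6 summation of Eqn. 2.

    distances: the individual interproton distances r_ij contributing to a
    single NOESY crosspeak (e.g. three for a methyl group).
    """
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)
```

Because the contributions are summed, the effective distance is never longer than the shortest contributing distance; for example, two protons both at 3.0 Å give an effective distance of 3.0 × 2^(−1/6), roughly 2.67 Å.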

NOE completeness metric

NOE Completeness [40] is defined as the ratio of NOE-derived interatom contacts indicated in the restraint list to the number of NOE-derived interatom contacts that are possible considering the 3D atomic coordinates. It is a metric reflecting the completeness of the NOE-derived restraint list. PDBStat has two methods for evaluating the NOE completeness, differing in the way the set of expected NOE-derived contacts is evaluated. In the first method, following the description in the original paper [40], the set of expected contacts is generated based on a list of “observable atoms”. These atom definitions are stored in an independent file, called Observable.nmr. In the second method, the “observable atoms” are defined automatically by PDBStat, rather than being provided by the user. In this case, the “observable atoms” are simply the set of hydrogen atoms for which chemical shift data is available, and the maximum NOE completeness is determined by the completeness of the proton resonance assignments. In either case, all interproton distances between “observable” hydrogen atoms are calculated from the NMR structure models, and all interproton distances below a cutoff are considered to be a potential NOE contact. The cutoff distance for evaluating the number of expected contacts can be selected by the user; the default value is 4.0 Å. An r⁻⁶ summed average is used for evaluating the interproton distances to degenerate atoms (e.g. methyl hydrogens), and a normal average is used to average each interproton distance across the ensemble of NMR models.
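The completeness ratio can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of [40] or PDBStat's implementation: degenerate groups (which the text treats with an r⁻⁶ sum) are ignored, every observable atom is treated individually, and the data layout and function name are hypothetical. The 4.0 Å default cutoff and the plain averaging of each distance across the ensemble follow the text.

```python
import math
from itertools import combinations


def noe_completeness(models, observable, restrained_pairs, cutoff=4.0):
    """Fraction of expected short contacts that appear in the restraint list.

    models: list of dicts mapping atom id -> (x, y, z), one dict per model.
    observable: iterable of atom ids considered observable.
    restrained_pairs: set of frozenset({atom_a, atom_b}) from the restraint list.
    """
    expected = set()
    for a, b in combinations(sorted(observable), 2):
        mean_distance = sum(math.dist(m[a], m[b]) for m in models) / len(models)
        if mean_distance < cutoff:
            expected.add(frozenset((a, b)))
    if not expected:
        return 0.0
    return len(expected & set(restrained_pairs)) / len(expected)
```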

Analysis of residual dipolar coupling (RDC) restraints

PDBStat also provides an evaluation of the axial and rhombic components of the molecular alignment tensor, D_a and R, respectively, and calculation of RDCs based on protein atomic coordinates. A singular value decomposition (SVD) [41] is used to calculate D_a and R, providing results that are similar to those provided by standard programs (e.g. PALES [42] and REDCAT [43]) used for RDC analysis. Statistics are reported summarizing how well the RDC values computed from the NMR conformers compare with the experimentally-determined RDC values, including the RDC Q factor [44]. Using the universal restraint format converter, this analysis can be done easily using structures and data that have been generated with different programs.
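The agreement statistic can be illustrated with the commonly used form of the Q factor, the RMS difference between observed and calculated couplings divided by the RMS of the observed couplings. The sketch below assumes this normalization; whether PDBStat uses exactly this form is not stated in the text, and the function name is hypothetical. The SVD fit of the alignment tensor is not shown.

```python
import math


def rdc_q_factor(observed, calculated):
    """Q = rms(D_obs - D_calc) / rms(D_obs) for paired RDC values (Hz)."""
    if len(observed) != len(calculated) or not observed:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum((o - c) ** 2 for o, c in zip(observed, calculated))
    denominator = sum(o ** 2 for o in observed)
    if denominator == 0:
        raise ValueError("observed couplings are all zero")
    return math.sqrt(numerator / denominator)
```

A value of 0 indicates perfect agreement between back-calculated and experimental couplings, and larger values indicate poorer agreement.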

Parsing restraint files downloaded from the PDB

When depositing data in the PDB, restraints are collected together in a single file (extension .mr) that is archived together with the protein atomic coordinates. It is often interesting to re-analyze these data extracted from the PDB. PDBStat has a data parser to read the .mr restraint file, extract these restraint data, and to provide a statistical analysis of the restraints and the restraint violations. PDBStat currently supports this feature for CNS/XPLOR and CYANA/DYANA distance restraint formats, which are used for the vast majority of the protein NMR restraint files archived in the PDB. However, the universal format converter of PDBStat will allow other restraint types to be handled as required.

APPLICATIONS

In the following sections, we describe some valuable and/or unique applications of the PDBStat software.

1. Identifying conformationally-restricting restraints

PDBStat provides a comprehensive distance restraint summary analysis using distance restraint lists for the most commonly used structure generation programs (e.g. CNS/XPLOR, DYANA/CYANA, and Rosetta). As illustrated in Table 1, these include summaries of NOE-derived, hydrogen bond, disulfide, and ambiguous restraints, classified as intraresidue, sequential, medium-range, long-range, intrachain, and interchain restraints. An important feature of distance restraint analysis involves distinguishing conformationally-restricting restraints from those that are too loose to restrict the conformational space of the intervening dihedral angles. Such tools are available in some, but not all, computer software developed for analyzing protein structures from NMR data (e.g., redundant restraint analysis can be done using the CYANA program [45]). The PDBStat program provides a “Clean NOE” utility that can be applied to restraint lists generated for use with several different structure generation programs. The Clean NOE utility functions to: (i) locate and remove duplicate restraints present in single or multiple restraint lists, (ii) locate cases where the same atom pairs are restrained with different upper-bound distances, and remove the looser of these distance bounds, and (iii) identify and remove restraints that do not restrict the conformational space of the intervening dihedral angles, which are typically intraresidue or sequential restraints. Rather than computing distances based on the conformations of intervening dihedral angles, a precompiled set of intraresidue and sequential restraining-distance bounds for various amino acid residue types has been generated using standard bond lengths and angles. These distance bounds account for different peptide libraries used by different structure calculation programs. This library of restraining-distance bounds is used to build rules and remove restraints that do not fulfill these rules. This approach has the advantage that the user can change some of the restraining-distance bound values in these precompiled lists (without recompiling the program), to allow for special cases or include new rules. The program then outputs an edited restraint list, excluding these non-functional restraints, along with a report of which restraints have been removed in this process. Table 1 shows the results of this restraint processing for NOE-based restraint lists generated for monomeric and homodimeric protein structures.
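The three Clean NOE steps can be sketched as follows. This is an illustration under stated assumptions rather than PDBStat's implementation: the function name and data layout are hypothetical, the lookup table covers only intraresidue pairs keyed by atom names, and any bound values supplied to it in examples are invented, not taken from PDBStat's precompiled library.

```python
def clean_restraints(restraints, max_possible=None):
    """Deduplicate restraints, keep the tightest bound, drop non-restricting ones.

    restraints: iterable of (atom_i, atom_j, upper) with atoms as
        (residue_number, atom_name) tuples.
    max_possible: optional dict mapping frozenset({name_i, name_j}) to the
        largest intraresidue distance allowed by covalent geometry; an upper
        bound at or above this value cannot restrict the conformation.
    Returns (kept, removed) lists of (atom_i, atom_j, upper).
    """
    max_possible = max_possible or {}
    tightest = {}
    for atom_i, atom_j, upper in restraints:
        pair = frozenset((atom_i, atom_j))   # order-independent: steps (i), (ii)
        if pair not in tightest or upper < tightest[pair]:
            tightest[pair] = upper
    kept, removed = [], []
    for pair, upper in tightest.items():
        atom_i, atom_j = sorted(pair)
        limit = None
        if atom_i[0] == atom_j[0]:           # intraresidue pair: step (iii)
            limit = max_possible.get(frozenset((atom_i[1], atom_j[1])))
        target = removed if limit is not None and upper >= limit else kept
        target.append((atom_i, atom_j, upper))
    return sorted(kept), sorted(removed)
```

Keeping the bounds in a plain lookup table mirrors the design point made above: the rules can be edited without changing the program logic.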

Table 1.

Summary of statistics for restraints for NESG target protein CcR55 (PDB id 2jqn), a monomeric structure, and NESG target protein HR3057H (PDB id 2kw6), a homodimeric structure. “Original” refers to the restraint sets as they would have been deposited in the PDB prior to the introduction of PDBStat into our standard deposition process, and “Clean” summarizes the restraint lists regenerated using the Clean NOE utility of PDBStat.

SUMMARY OF RESTRAINTS
Restraint category	2jqn Original	2jqn Clean	2kw6 Original	2kw6 Clean
Total number of restraints	1676	1200	1901	1831
Intra-residue restraints (i = j)	628	221	628	560
Sequential restraints (|i-j| = 1)	428	360	443	441
  Backbone-backbone	162	119	84	82
  Backbone-side chain	32	23	50	50
  Side chain-side chain	234	218	309	309
Medium-range restraints (1 < |i-j| < 5)	244	244	447	447
  Backbone-backbone	64	64	104	104
  Backbone-side chain	53	53	164	164
  Side chain-side chain	127	127	179	179
Long-range restraints (|i-j| ≥ 5)	376	376	383	383
Total hydrogen bond restraints	66	66	0	0
Long-range H-bond restraints (|i-j| ≥ 5)	38	38	0	0
Disulfide restraints	0	0	0	0
Intrachain restraints	1742	1266	1671	1601
Interchain restraints	0	0	230	230
Ambiguous restraints	0	0	0	0


Relabeling of prochiral methylene and isopropyl methyl groups and sidechain amide atoms

In the process of converting coordinate file formats, PDBStat can also label prochiral methylene protons, prochiral isopropyl methyl atoms (both C and H), and sidechain amide protons with their correct IUPAC names. The program will also correctly relabel Thr OG1 and CG2, which are often mislabeled in older PDB coordinate files. This functionality is illustrated in PDBStat output shown in Table 2.

Table 2.

PDBStat output demonstrating the relabeling of prochiral methylene protons, isopropyl methyl atoms of Leu and Val, sidechain amide protons of Asn and Gln, and Thr gamma atoms.

PdbStat>	--> Fixing StereoNames of model 1
PdbStat>	The Leucine Carbons (2 CD's)
PdbStat>	(CD2) renamed (CD1) in 29 (LEU)
PdbStat>	(CD1) renamed (CD2) in 29 (LEU)
PdbStat>	The Valine Carbons (2 CG's)
PdbStat>	(CG2) renamed (CG1) in 27 (VAL)
PdbStat>	(CG1) renamed (CG2) in 27 (VAL)
PdbStat>	The Isoleucine Carbons (2 CG's)
PdbStat>	(CG2) renamed (CG1) in 4 (ILE)
PdbStat>	(CG1) renamed (CG2) in 4 (ILE)
PdbStat>	The threonine OG1 and CG2 check
PdbStat>	(OG2) renamed (OG1) in 45 (THR)
PdbStat>	(CG1) renamed (CG2) in 45 (THR)
PdbStat>	General case of Two Beta Protons (HB's)
PdbStat>	(3HB) renamed (2HB) in 1 (MET)
PdbStat>	(2HB) renamed (3HB) in 1 (MET)
PdbStat>	General case of Two Gamma Protons (HG's)
PdbStat>	(3HG) renamed (2HG) in 1 (MET)
PdbStat>	(2HG) renamed (3HG) in 1 (MET)
PdbStat>	General case of Two Delta Protons (HD's)
PdbStat>	(3HD) renamed (2HD) in 6 (LYS)
PdbStat>	(2HD) renamed (3HD) in 6 (LYS)
PdbStat>	General case of Two Epsilon Protons (HE's)
PdbStat>	(3HE) renamed (2HE) in 6 (LYS)
PdbStat>	(2HE) renamed (3HE) in 6 (LYS)
PdbStat>	The Glycine Protons (2 HA's)
PdbStat>	(3HA) renamed (2HA) in 52 (GLY)
PdbStat>	(2HA) renamed (3HA) in 52 (GLY)
PdbStat>	Two Beta Protons (HB's) of Cystine
PdbStat>	Two Beta Protons (HB's) of Serine
PdbStat>	(3HB) renamed (2HB) in 75 (SER)
PdbStat>	(2HB) renamed (3HB) in 75 (SER)
PdbStat>	ASN side chain amides
PdbStat>	(2HD2) renamed (1HD2) in 55 (ASN)
PdbStat>	(1HD2) renamed (2HD2) in 55 (ASN)
PdbStat>	GLN side chain amides
PdbStat>	(2HE2) renamed (1HE2) in 108 (GLN)
PdbStat>	(1HE2) renamed (2HE2) in 108 (GLN)


Analysis of restraint density

PDBStat also provides an analysis of the restraint density along the protein sequence. In this analysis, an interresidue distance restraint between residues i and j is assigned 0.5 units to residue i and 0.5 units to residue j. The resulting histogram plots of interresidue restraints per residue, which provide a survey of restraint density along the protein sequence (Fig. 2), are output as .png format files using the gnuplot software [46], suitable for inclusion as figures in a manuscript or associated supplementary material.
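The per-residue bookkeeping described above amounts to the following sketch. The function name and input layout are assumptions; intraresidue entries are skipped because the analysis is defined for interresidue restraints.

```python
from collections import defaultdict


def restraint_density(residue_pairs):
    """Per-residue restraint count, crediting 0.5 to each partner residue.

    residue_pairs: iterable of (residue_i, residue_j), one entry per
    distance restraint.
    """
    density = defaultdict(float)
    for res_i, res_j in residue_pairs:
        if res_i == res_j:
            continue  # intraresidue restraints are not counted here
        density[res_i] += 0.5
        density[res_j] += 0.5
    return dict(density)
```

With this weighting, the values summed over all residues equal the total number of interresidue restraints.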

Fig. 2 — Histogram plots generated by PDBStat of conformationally-restricting restraints per residue, analyzed from a restraint list. Results are shown for a monomeric protein (NESG_id CcR55; PDB_id 2jqn) at the top, and for a homodimeric protein (NESG_id HR305; PDB_id 2kw6) at the bottom.

Analysis of contact maps derived from restraint lists or derived from 3D models

Contact maps are an important tool for assessing restraint data sets, 3D structures of protein models, and the agreement between restraint data sets and 3D protein model coordinates. PDBStat has utilities to generate contact maps from distance restraint lists, as illustrated in the left panels of Fig. 3, and contact maps from the 3D protein coordinates (using a default distance cutoff of 5 Å, which can be modified by the user), illustrated in the right panels of Fig. 3. This analysis can be done for monomers (top panels of Fig. 3), homodimers (bottom panels of Fig. 3), or heterodimers (results not shown). Contact maps may be generated for residue-residue contacts (as illustrated in Fig. 3) or for atom-atom contacts (results not shown). These residue-residue contact maps are produced directly by PDBStat either as text files or as PostScript images. Comparisons of contact maps derived from restraint lists and 3D structures are useful both for validating structures [47, 48] and for iterative analysis of NOESY data to provide more complete restraint lists [7, 45, 47, 48]. For example, the RPF software [47, 48] for assessing the agreement between a protein model and NOESY peak list data is based on the concept of comparing all possible atomic contact maps derived from the NMR resonance assignment and NOESY data with contact maps derived from the 3D model.
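A residue-residue contact map derived from coordinates can be sketched as below. The rule used here, that two residues are in contact if any pair of their listed atoms lies closer than the cutoff, is an assumption made for illustration, as are the function name and input layout; the 5 Å default follows the text.

```python
import math
from itertools import combinations


def residue_contact_map(atoms, cutoff=5.0):
    """Set of residue pairs (i, j), i < j, with at least one short atom pair.

    atoms: list of (residue_number, (x, y, z)) entries, e.g. all protons
    of one model.
    """
    contacts = set()
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(atoms, 2):
        if res_a != res_b and math.dist(xyz_a, xyz_b) < cutoff:
            contacts.add((min(res_a, res_b), max(res_a, res_b)))
    return contacts
```

The same set representation can be built from the residue pairs in a restraint list, so the two maps can be compared directly with set intersection and difference.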

Fig. 3 — Contact maps generated by PDBStat for conformationally-restricting restraints in an input restraint list, left, and for short interproton distances derived from atomic coordinates, right. Results are shown for a monomeric protein (NESG_id CcR55; PDB_id 2jqn) at top, and for a homodimeric protein (NESG_id HR305; PDB_id 2kw6) at bottom. Comparisons of such plots provide a visual analysis of how well a 3D structure model fits to the experimental restraint list.

Analysis of backbone and sidechain dihedral angle order parameter (DAOP)

As outlined in the Description of Software section above, PDBStat provides an analysis of both backbone and sidechain DAOPs. These analyses are illustrated in Fig. 4 for two NMR-derived conformational ensembles archived in the PDB; one for a monomeric protein structure (left panel), and the other for one protomer of a homodimeric protein structure (right panel). These analyses are provided as both text files and as png images.

Fig. 4 — Histogram plots of dihedral angle order parameters (DAOP) for ϕ, ψ and χ₁ vs amino acid sequence obtained from PDBStat. Results are shown for a monomeric protein (NESG_id CcR55; PDB_id 2jqn) at left, and for one protomer of a homodimeric protein (NESG_id HR305; PDB_id 2kw6), at right.

Analysis of “well defined” and “not well defined” regions of the protein structure

The DAOP analysis may be used to provide information regarding which regions of the polypeptide backbone are well-defined (defined by convention in PDBStat as S(ϕ) + S(ψ) ≥ 1.8) and which are not well-defined, as outlined above in the Description of Software section. Fig. 4 illustrates the DAOP analysis for backbone ϕ and ψ, as well as sidechain χ₁, dihedral angles in monomeric and homodimeric protein structures. PDBStat also provides an implementation of the FindCore algorithm [35], identifying sets of atoms that are well-defined with respect to one another using a variance matrix analysis.
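
The S(ϕ) + S(ψ) ≥ 1.8 convention can be sketched as follows, assuming the usual circular-statistics definition of the order parameter (the length of the mean unit vector of a dihedral angle across the ensemble; cf. [36]). The function names and the dictionary-based input are hypothetical and are not part of PDBStat.

```python
import math

def order_parameter(angles_deg):
    """Circular order parameter S in [0, 1] for one dihedral angle,
    given its values (in degrees) across the conformers of an ensemble."""
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    return math.hypot(c, s) / n

def well_defined(phi_by_res, psi_by_res, threshold=1.8):
    """Map residue number -> True when S(phi) + S(psi) >= threshold.
    Only residues that have both phi and psi values are reported."""
    return {
        res: order_parameter(phi_by_res[res]) + order_parameter(psi_by_res[res]) >= threshold
        for res in phi_by_res
        if res in psi_by_res
    }
```

Because the angles are averaged as unit vectors, values such as 179° and −179° are treated as nearly identical, which a simple arithmetic standard deviation of the angles would not do.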

A comparison of these two methods, DAOP and FindCore, using a representative NESG NMR structure ensemble, is illustrated in Fig. 5. In the left panel, the backbone atoms (N, Cα, C’) of the 20 conformers of the ensemble archived in the PDB are superimposed on one representative conformer. In this case, the representative conformer was selected as the medoid structure, i.e. the conformer with the lowest backbone RMSD to all the other conformers in the ensemble. The atoms used in the superimposition are those that were identified as “well defined” backbone atoms by the FindCore implementation in PDBStat. Backbone atoms colored in yellow are those which both methods identify as “not well defined”, while atoms shown in dark blue are those identified by both methods as “well defined”. Atoms colored light blue are “well defined” based on FindCore, but “not well defined” based on DAOP, while atoms colored green are “well defined” based on DAOP, but “not well defined” based on FindCore. The backbone atomic coordinates of the corresponding X-ray crystal structure are shown in red. Interestingly, some segments within the consensus “not well defined” polypeptide regions (yellow) have DAOP above the threshold (green); i.e. the backbone in these green regions has relatively consistent backbone dihedral angles. On the other hand, the regions identified by FindCore, but not DAOP analysis, as “well defined”, shown in light blue, have atomic variances that are similar to those in the consensus well-defined regions (dark blue). The left panel of Fig. 5 thus demonstrates the complementary value of FindCore and DAOP analysis in identifying “well defined” regions of the protein structure that might best be used in structure quality assessment, for computing superimpositions, or in interpreting the precision of various regions of the NMR structure for structure-function studies.
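
The medoid selection mentioned above can be sketched as below. This is an illustrative reconstruction rather than the PDBStat implementation: it assumes each conformer is supplied as an (n_atoms, 3) array already restricted to the atoms of interest (for example, the well-defined backbone atoms), uses a standard SVD-based (Kabsch-type [33, 34]) optimal superposition, and takes the medoid as the conformer with the smallest summed RMSD to all others. All function names are hypothetical.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (n_atoms, 3) coordinate sets after optimal
    rigid-body superposition of P onto Q."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)                      # remove translation
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)           # covariance matrix SVD
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T     # optimal rotation
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))

def pairwise_rmsd(ensemble):
    """Symmetric matrix of superposed RMSDs between all conformer pairs."""
    n = len(ensemble)
    m = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            m[i, j] = m[j, i] = kabsch_rmsd(ensemble[i], ensemble[j])
    return m

def medoid_index(rmsd_matrix):
    """Index of the conformer with the lowest summed RMSD to all others."""
    return int(np.argmin(np.asarray(rmsd_matrix, dtype=float).sum(axis=1)))
```

Usage would be `medoid_index(pairwise_rmsd(ensemble))`, after which the remaining conformers can be superimposed on the selected conformer.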

The right panel of Fig. 5 illustrates “well-defined” and “not well defined” regions of protein sidechains. The panel shows a region of the protein structure where the backbone is well defined (blue), while the corresponding side chains, or parts of these side chains, associated with these well defined backbone atoms are themselves “not well defined” (yellow), based on the FindCore analysis. Interestingly, rotation about the ring axis of Phe47 results in less-well-defined positions for the Cδ and Cε atoms relative to the rest of this side chain. This result demonstrates the special value of atom specific designators of structural precision over the standard convention of defining only residue ranges of the well-defined regions of the protein NMR structure.

Generating protein structures using restrained CS-Rosetta

Recently the Rosetta program has been further developed to allow the use of a wide range of interatomic distance restraints and residual dipolar coupling (RDC) data [49, 50]. These enhancements, directed to the challenges of solving larger protein structures, also allow the general use of Rosetta or CS-Rosetta together with NMR-derived distance restraints in a manner similar to conventional distance-restrained structure generation calculations with CNS, XPLOR, CYANA or other more traditional protein structure generation programs. We refer to these as restrained CS-Rosetta (rCS-Rosetta) NMR structure generation calculations.

Using the universal restraint converter of PDBStat, restraint lists originally prepared for CNSw calculations were converted and used as distance restraints in rCS-Rosetta calculations. This approach was benchmarked in this study using two small NESG target proteins, ZR18 (91 residues) and PfR193A (114 residues), for which both solution NMR (PDB_id 1pqx and 2kl6, respectively) and X-ray crystal structures (PDB_id 2ffm and 3idu, respectively) have been determined and archived in the PDB. These targets also have extensive NMR data archived in the BioMagResDB [51] (BMRB_id's 5844 and 16385, respectively). For PfR193A, ¹⁵N-¹H RDC data were also available, and were used in the rCS-Rosetta calculations as described elsewhere [49, 50]. These NMR restraint data were used to determine the structures of ZR18 and PfR193A using a standard NESG protocol, involving initial structure generation with CYANA followed by structure refinement with CNS in explicit water solvent (as described in detail at http://www.nmr2.buffalo.edu/nesg.wiki/). This standard NESG protocol is designated as the CNSw protocol. For this study, the ZR18 structure was downloaded from the PDB and re-refined using the CNSw protocol (since it had been deposited in the PDB before the standard CNSw protocol was adopted), while for PfR193A the CNSw-refined coordinates were those obtained from the PDB. The CNS restraints were then converted to rCS-Rosetta restraints using PDBStat, and rCS-Rosetta structures were generated using Rosetta ver. 3, as described in detail elsewhere (Mao, Tejero and Montelione, in preparation). The rCS-Rosetta calculations used a restraint tolerance of 0.3 Å; i.e. each restraint was loosened by 0.3 Å during the calculations to allow the structure to deviate slightly from the experimental restraints in order to better satisfy the Rosetta energy function.
The rCS-Rosetta calculations required about 4,000 min to generate 10,000 decoys using twenty 2.5 GHz processors, compared with CYANA structure generation followed by CNSw refinement, which required about 40 min using the same twenty 2.5 GHz processors to generate 100 conformers with CYANA and to refine 20 conformers with CNSw.
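
The effect of the 0.3 Å tolerance on violation counting can be illustrated with a short sketch. It is not taken from PDBStat or Rosetta; it assumes that a violation is the amount by which a model distance exceeds the loosened upper bound, that distances for restraints with several contributing atom pairs are combined by r⁻⁶ summation, and that violations are counted in the 0.1 - 0.2 Å, 0.2 - 0.5 Å and > 0.5 Å classes used in Table 3. The treatment of values falling exactly on a class boundary is also an assumption.

```python
def effective_distance(distances):
    """r^-6 summed effective distance for a restraint with one or more
    contributing atom pairs (distances in Å)."""
    return sum(r ** -6 for r in distances) ** (-1.0 / 6.0)

def upper_bound_violation(distance, upper, tolerance=0.3):
    """Amount (Å) by which `distance` exceeds the loosened upper bound;
    0.0 when the restraint is satisfied."""
    return max(0.0, distance - (upper + tolerance))

def bin_violations(violations):
    """Count violations in the three size classes reported in Table 3.
    Violations of 0.1 Å or less are not counted."""
    bins = {"0.1-0.2": 0, "0.2-0.5": 0, ">0.5": 0}
    for v in violations:
        if v > 0.5:
            bins[">0.5"] += 1
        elif v > 0.2:
            bins["0.2-0.5"] += 1
        elif v > 0.1:
            bins["0.1-0.2"] += 1
    return bins
```

For example, a model distance of 5.75 Å against a 5.0 Å upper bound is a 0.45 Å violation of the loosened restraint, whereas it would be a 0.75 Å violation of the original one.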

The Restraint Summary and Restraint Violation Analysis provided by PDBStat, along with other knowledge-based structure quality scores provided by the PSVS [9] and RPF [47] programs, for the two structures, each solved with the two different protocols, are shown in Table 3. The resulting conformational ensembles are compared with each other, and with the corresponding X-ray crystal structures, in Fig. 6, using the superimposition utilities of PDBStat. Average pairwise RMSDs within each ensemble, and between the NMR conformers and the corresponding X-ray crystal structure, are tabulated in Table 4, along with GDT-TS and GDT-HA backbone superimposition scores [52] assessing structural similarity of Cα atom positions. For both ZR18 and PfR193A, the rCS-Rosetta structures are more similar to the X-ray structure than are the structures generated using the standard CYANA-CNSw protocol; for ZR18 the changes are relatively substantial, with movements of up to 2 Å, while for PfR193A the differences in backbone structures generated by the two methods are smaller. In both cases, the CYANA-CNSw and rCS-Rosetta NMR structures fit equally well to the NOESY peak list data (i.e. RPF and DP scores), as shown in Table 3. The rCS-Rosetta structures have significantly better knowledge-based structure quality scores than structures generated from the same restraint data using the standard NESG CYANA-CNSw protocol (Table 3). rCS-Rosetta structures, however, tend to have larger numbers of distance restraint and dihedral angle violations (Table 3), measured against the loosened restraints (i.e. the Rosetta restraints with 0.3 Å loosening of upper bounds). A single conformer in the ensemble of PfR193A rCS-Rosetta structures has a significant dihedral angle violation of almost 60° (Table 3). The CNSw-refined and rCS-Rosetta structures for ZR18 and the rCS-Rosetta structure of PfR193A have been deposited in the PDB (2m6q and 2m8w for ZR18 and 2m8x for PfR193A, respectively).

Table 3.

NMR structure statistics for the CNSw and rCS-Rosetta structures of PfR193A and ZR18^a

	PfR193A (CNSw)	PfR193A (rCS-Rosetta)	ZR18 (CNSw)	ZR18 (rCS-Rosetta)

Completeness of resonance assignments ^b:
backbone (%)	93.46	93.46	93.99	93.99
side chain (%)	90.34	90.34	78.02	78.02
aromatic (%)	100	100	100	100
stereospecific methyl (%)	88.46	88.46	100	100
Conformationally-restricting restraints^c:
Distance restraints
Total	2719	2719	1137	1137
intra-residue (i = j)	523	523	168	168
sequential (|i - j| = 1)	686	686	337	337
medium range (1 < |i - j| < 5)	271	271	217	217
long range (|i - j| ≥ 5)	1239	1239	415	415
Dihedral angle restraints	165	165	179	179
Hydrogen bond restraints	0	0	54	54
No. of restraints per residue	26.7	26.7	15.7	15.7
No. of long range restraints per residue	11.5	11.5	5.1	5.1
Residual restraint violations^c:
Average no. of distance viol per structure:
0.1 - 0.2 Å	2.60	16.05	9.50	8.95
0.2 - 0.5 Å	0.05	10.95	5.10	5.35
> 0.5 Å	0	2.25	0.50	3.20
largest violation (Å)	0.22	1.72	0.95	1.30
Average no. of dihed angle viol per structure:
1 - 10°	19.90	2.55	9.7	1.35
> 10°	0	1.55	0.1	0.15
largest violation (°)	9.6	59.7	11.0	22.2
Model Quality^c:
RMSD backbone atoms (Å) ^d	0.4	0.4	0.7	0.6
RMSD heavy atoms (Å) ^d	0.6	0.6	1.0	0.8
RMSD bond lengths (Å)	0.008	0.010	0.019	0.010
RMSD bond angles (°)	0.6	0.5	1.3	0.4
MolProbity Ramachandran statistics ^c,^d
most favored regions (%)	96.7	98.4	88.9	97.1
allowed regions (%)	3.2	1.3	9.8	2.9
disallowed regions (%)	0.1	0.3	1.3	0.0
Global quality scores (Raw / Z-score)^c
Verify3D	0.37 / −1.44	0.41 / −0.80	0.33 / −2.09	0.43 / −0.48
ProsaII	0.35 / −1.24	0.35 / −1.24	0.43 / −0.91	0.67 / 0.08
Procheck (phi-psi)	−0.42 / −1.34	−0.30 / −0.87	−0.70 / −2.44	−0.20 / −0.47
Procheck (all) ^d	−0.27 / −1.60	−0.05 / 0.30	−0.43 / −2.54	0.12 / 0.71
MolProbity clash score	15.93 / −1.21	10.36 / −0.25	12.95 / −0.70	6.27 / 0.45
RPF Scores^e
Recall / Precision	0.967 / 0.955	0.967 / 0.958	0.927 / 0.779	0.931 / 0.772
F-measure / DP-score	0.961 / 0.874	0.963 / 0.881	0.847 / 0.747	0.844 / 0.747
Model Contents:
Ordered residue range ^d	436-541	436-520, 523-541	3-5, 14-18, 28-32, 38-50, 53-68, 70-82	2-11, 14-22, 27-83
BMRB accession number:	16385	16385	5844	5844
PDB id:	2kl6	2m8x	2m6q	2m8w

^a Structural statistics computed for the ensemble of deposited structures.

^b Computed using AVS software [54] from the expected number of resonances, excluding: highly exchangeable protons (N-terminal, Lys, and Arg amino groups, hydroxyls of Ser, Thr, Tyr), carboxyls of Asp and Glu, non-protonated aromatic carbons, and the C-terminal His₆ tag.

^c Calculated using PSVS 1.5 [9]. Average distance violations were calculated using the sum over r⁻⁶.

^d For ordered residues with [S(phi) + S(psi) ≥ 1.8].

^e RPF scores [25] reflecting the goodness-of-fit of the final ensemble of structures (including disordered residues) to the NOESY data and resonance assignments.
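
As a worked check on the RPF rows of Table 3, the F-measure can be recomputed from the tabulated Recall and Precision, assuming the standard definition of the F-measure as their harmonic mean. The function name is illustrative; the input values are those listed in Table 3.

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision."""
    return 2.0 * recall * precision / (recall + precision)

# (Recall, Precision, tabulated F-measure) from Table 3.
table3_rpf = {
    "PfR193A (CNSw)":        (0.967, 0.955, 0.961),
    "PfR193A (rCS-Rosetta)": (0.967, 0.958, 0.963),
    "ZR18 (CNSw)":           (0.927, 0.779, 0.847),
    "ZR18 (rCS-Rosetta)":    (0.931, 0.772, 0.844),
}

for name, (r, p, tabulated) in table3_rpf.items():
    print(f"{name}: computed {f_measure(r, p):.4f}, tabulated {tabulated:.3f}")
```

The recomputed values agree with the tabulated F-measures to within about 0.001, the difference being attributable to rounding of the tabulated Recall and Precision.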

Fig. 6 — Comparison of small protein structures generated with either a standard CYANA-CNSw protocol or with restrained CS-Rosetta (rCS-Rosetta). Results are shown for NESG target proteins ZR18 (A) and PfR193A (B). For each protein target, backbone structures are shown for the X-ray crystal structure (red), a standard CYANA-CNSw refined structure (yellow), a structure generated using rCS-Rosetta following restraint conversion using PDBStat (green). As quantified in Tables 3 and 4, for these small proteins the rCS-Rosetta structures have better knowledge-based structure quality scores, equally good agreement with the NOESY peak list data, and are slightly more similar to the corresponding X-ray crystal structures.

Table 4.

Backbone (N, Cα, C) RMSD and GDT-TS scores comparing structures of NESG target proteins ZR18 and PfR193A generated with the standard NESG CYANA-CNSw protocol, with the same structures refined with rCS-Rosetta following restraint conversion using PDBStat. These ensembles of NMR conformers are compared with the corresponding X-ray crystal structure coordinates.

	Number of Residues^a	rCS-Rosetta vs. CNSw RMSD / GDT-TS /GDT-HA	rCS-Rosetta vs. X-ray RMSD / GDT-TS /GDT-HA	CNSw vs. X-ray^b RMSD / GDT-TS /GDT-HA
ZR18	91	1.18 Å / 0.86 / 0.68	0.98 Å / 0.94 / 0.79	1.40 Å / 0.82 / 0.62
PfR193A	114	0.42 Å / 0.97 / 0.86	0.57 Å / 0.99 / 0.93	0.64 Å / 0.99 / 0.89

^a Well-defined residues were determined by PDBStat using the FindCore [35] method.

^b For target ZR18, the PDB_id of the reference X-ray crystal structure is 2ffm. For target PfR193A, the PDB_id of the reference X-ray crystal structure is 3idu.
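
The GDT-TS and GDT-HA scores in Table 4 can be illustrated with a simplified sketch. It assumes the conventional cutoffs (1, 2, 4 and 8 Å for GDT-TS; 0.5, 1, 2 and 4 Å for GDT-HA) and evaluates them on Cα-Cα distances from a single, fixed superposition. The full GDT procedure [52] searches for the superposition that maximizes the number of residues under each cutoff, so this sketch can only underestimate the true score. Function names are hypothetical, and scores are reported on the 0 to 1 scale used in Table 4.

```python
def gdt(ca_distances, cutoffs):
    """Mean, over the cutoffs, of the fraction of residues whose
    Cα-Cα distance (Å) is within each cutoff."""
    n = len(ca_distances)
    fractions = [sum(d <= c for d in ca_distances) / n for c in cutoffs]
    return sum(fractions) / len(cutoffs)

def gdt_ts(ca_distances):
    """GDT 'total score' with 1, 2, 4 and 8 Å cutoffs."""
    return gdt(ca_distances, (1.0, 2.0, 4.0, 8.0))

def gdt_ha(ca_distances):
    """GDT 'high accuracy' score with 0.5, 1, 2 and 4 Å cutoffs."""
    return gdt(ca_distances, (0.5, 1.0, 2.0, 4.0))
```

Because GDT-HA uses tighter cutoffs, it is always less than or equal to GDT-TS for the same pair of structures, as seen in every row of Table 4.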

A careful analysis of restraints in CNS and Rosetta formats, interconverted by PDBStat, and of the specific violations observed for the rCS-Rosetta structures, confirms that these restraint violations are characteristic of the rCS-Rosetta structures, rather than the result of errors in restraint conversion. For example, when the CNSw structures are assessed using the Rosetta restraints obtained following conversion from CNS format, no residual restraint violations are observed, as expected (Supplementary Table S1). In addition, NOE completeness calculations were done for both sets of restraints, and the results are again the same. The modest number of small restraint violations in the rCS-Rosetta structures is indeed a feature of these structures.

DISCUSSION

The PDBStat program is a central component of the NESG NMR structure production pipeline, and of the PSVS [9] structure quality assessment server, and has been used as part of the structure determination and structure validation process on over 450 protein NMR structures. It is a user-friendly software package that integrates many of the computational tools needed to generate and assess protein structures from NMR restraint lists. The PDBStat software is easy to install on laptops or computers in small NMR lab groups, and has minimal requirements in terms of disk space and CPU speed. The software provides a uniform restraint converter that allows the same restraint data to be used with several different structure generation programs, including CNS/XPLOR, various versions of DYANA and CYANA, and Rosetta. Some of these features are also provided by the CING [10, 14] and CCPN [3] software packages. In our experience, however, the restraint conversions and restraint violation analysis tools provided by PDBStat are much more extensive and easier to use.

A special feature of PDBStat is the accurate conversion of restraint lists prepared for one program (e.g. CYANA) into restraint lists that can be used to run another structure generation program (e.g. CNS or rCS-Rosetta). This versatility underlies an evolving approach in which, once a protein NMR structure is determined using one software package, it can be validated by rapid redetermination with other software packages. This approach can also be used to validate restraint lists generated by different automatic NOESY analysis programs. PDBStat's universal restraint and coordinate conversion utilities thus provide the basis for the use of many NMR data analysis and structure generation programs in parallel, and for standardized assessment of restraint violations generated by the different structure generation programs.

Of special interest is the conversion of CYANA or CNS restraint lists into input for rCS-Rosetta. The resulting rCS-Rosetta structures are observed to have better knowledge-based structure quality scores, better agreement with the corresponding crystal structure, and about equally good global scores in matching to the NOESY peak lists (i.e. RPF scores), compared with structures generated from the same restraint data using our standard protocol of CYANA structure generation followed by CNSw refinement. However, these rCS-Rosetta structures have a larger number of small restraint violations. Careful analysis of these restraint violations demonstrates that they are not the result of inaccurate restraint conversion by PDBStat; indeed, when the rCS-Rosetta restraints are used to assess the CNSw-refined structures there are essentially no serious restraint violations, and the NOE completeness calculations are the same, as expected (Supplementary Table S1). Hence, the modest restraint violations observed in the rCS-Rosetta structures reflect the inconsistency between the NMR restraints and the conformations preferred in the Rosetta force field, which are generally closer to the crystal structure. The violations of Rosetta restraints were measured against the loosened restraints (i.e. the Rosetta restraints with 0.3 Å loosening of upper bounds).

Similar observations have been made in unrestrained refinement using Rosetta [53], where it was first suggested that such analyses can be used to correct misinterpretation and miscalibration of restraints derived from the NOESY peak list data due to misassignment of NOESY cross peaks, the effects of conformational averaging, and/or attenuation of cross peak intensities due to exchange broadening. Indeed, in a systematic study of some 40 pairs of structures determined by both NMR and X-ray crystallography (Mao, Tejero, and Montelione, in preparation), we consistently observe that as the accuracy of the NMR structure relative to the crystal structure improves, a small number of NMR-derived restraints become violated. It is not clear if these represent inaccuracies in the restraints or the effects of dynamic averaging in solution. However, these observations demonstrate the power of the PDBStat universal restraint converter in allowing a simple conversion between restraint formats, so that users can rapidly and easily exploit the unique strengths of different structure generation packages using the same experimental data. In this way, users can consider determining NMR structures in parallel with several different structure generation methods, and using consensus methods to improve the accuracy of the NOESY data interpretation and the precision and accuracy of the resulting NMR structure models.

Supplementary Material

10858_2013_9753_MOESM1_ESM

NIHMS510885-supplement-10858_2013_9753_MOESM1_ESM.pdf (247.8 KB, pdf)

Acknowledgments

We thank all the members of the NMR groups of the Northeast Structural Genomics Consortium who contributed constructive criticisms and test data sets used in the development of PDBStat. Special thanks to C. Arrowsmith, J. Cort, A. Eletsky, L. Fella, Y. J. Huang, A. Lamak, M. Kennedy, G. Liu, J. Prestegard, T. Ramelot, A. Rosato, G.V.T. Swapna, T. Szyperski, Y. Tang, and B. Wu for useful discussions. This work was supported by a grant from the Protein Structure Initiative of the National Institutes of Health (U54-GM094597). RT also acknowledges support from CONSOLIDER INGENIO CSD2010-00065 and Generalitat Valenciana PROMETEO 2011/008. DS also acknowledges support from the Research Corporation for Science Advancement, College Cottrell Grant, Award #19803.

Abbreviations

ACO: dihedral angle constraint
CNSw: protocol using the Crystallography and NMR Software (CNS) package for restrained structure refinement in explicit water solvent
DAOP: dihedral angle order parameter
CS: chemical shift
rCS-Rosetta: restrained chemical shift-directed Rosetta
RDC: residual dipolar coupling
SVD: singular value decomposition
RMSD: root mean squared deviation

REFERENCES

1. Delaglio F, Grzesiek S, Vuister GW, Zhu G, Pfeifer J, Bax A. J Biomol NMR. 1995;6:277–293. doi: 10.1007/BF00197809.
2. Baran MC, Moseley HN, Aramini JM, Bayro MJ, Monleon D, Locke JY, Montelione GT. Proteins. 2006;62:843–851. doi: 10.1002/prot.20840.
3. Vranken WF, Boucher W, Stevens TJ, Fogh RH, Pajon A, Llinas M, Ulrich EL, Markley JL, Ionides J, Laue ED. Proteins. 2005;59:687–696. doi: 10.1002/prot.20449.
4. Zimmerman DE, Kulikowski CA, Huang Y, Feng W, Tashiro M, Shimotakahara S, Chien C, Powers R, Montelione GT. J Mol Biol. 1997;269:592–610. doi: 10.1006/jmbi.1997.1052.
5. Moseley HN, Montelione GT. Curr Opin Struct Biol. 1999;9:635–642. doi: 10.1016/s0959-440x(99)00019-6.
6. Moseley HN, Monleon D, Montelione GT. Methods Enzymol. 2001;339:91–108. doi: 10.1016/s0076-6879(01)39311-4.
7. Huang YJ, Tejero R, Powers R, Montelione GT. Proteins. 2006;62:587–603. doi: 10.1002/prot.20820.
8. Bahrami A, Assadi AH, Markley JL, Eghbalnia HR. PLoS Comput Biol. 2009;5:e1000307. doi: 10.1371/journal.pcbi.1000307.
9. Bhattacharya A, Tejero R, Montelione GT. Proteins. 2007;66:778–795. doi: 10.1002/prot.21165.
10. Doreleijers JF, Sousa da Silva AW, Krieger E, Nabuurs SB, Spronk CA, Stevens TJ, Vranken WF, Vriend G, Vuister GW. J Biomol NMR. 2012. doi: 10.1007/s10858-012-9669-7.
11. Mao B, Guan R, Montelione GT. Structure. 2011;19:757–766. doi: 10.1016/j.str.2011.04.005.
12. Han B, Liu Y, Ginzinger SW, Wishart DS. J Biomol NMR. 2011;50:43–57. doi: 10.1007/s10858-011-9478-4.
13. Rosato A, Aramini JM, Arrowsmith C, Bagaria A, Baker D, Cavalli A, Doreleijers JF, Eletsky A, Giachetti A, Guerry P, Gutmanas A, Guntert P, He Y, Herrmann T, Huang YJ, Jaravine V, Jonker HR, Kennedy MA, Lange OF, Liu G, Malliavin TE, Mani R, Mao B, Montelione GT, Nilges M, Rossi P, van der Schot G, Schwalbe H, Szyperski TA, Vendruscolo M, Vernon R, Vranken WF, de Vries S, Vuister GW, Wu B, Yang Y, Bonvin AM. Structure. 2012;20:227–236. doi: 10.1016/j.str.2012.01.002.
14. Doreleijers JF, Vranken WF, Schulte C, Markley JL, Ulrich EL, Vriend G, Vuister GW. Nucleic Acids Res. 2012;40:D519–524. doi: 10.1093/nar/gkr1134.
15. Nabuurs SB, Spronk CA, Vuister GW, Vriend G. PLoS Comput Biol. 2006;2:e9. doi: 10.1371/journal.pcbi.0020009.
16. Hendrickx PM, Gutmanas A, Kleywegt GJ. Proteins. 2013;81:583–591. doi: 10.1002/prot.24213.
17. Liu G, Shen Y, Atreya HS, Parish D, Shao Y, Sukumaran DK, Xiao R, Yee A, Lemak A, Bhattacharya A, Acton TA, Arrowsmith CH, Montelione GT, Szyperski T. Proc Natl Acad Sci U S A. 2005;102:10487–10492. doi: 10.1073/pnas.0504338102.
18. Huang YJ, Moseley HN, Baran MC, Arrowsmith C, Powers R, Tejero R, Szyperski T, Montelione GT. Methods Enzymol. 2005;394:111–141. doi: 10.1016/S0076-6879(05)94005-6.
19. Baran MC, Huang YJ, Moseley HN, Montelione GT. Chem Rev. 2004;104:3541–3556. doi: 10.1021/cr030408p.
20. Bhattacharya A, Wunderlich Z, Monleon D, Tejero R, Montelione GT. Proteins. 2008;70:105–118. doi: 10.1002/prot.21466.
21. Schwieters CD, Kuszewski JJ, Tjandra N, Clore GM. J Magn Reson. 2003;160:65–73. doi: 10.1016/s1090-7807(02)00014-9.
22. Brunger AT, Adams PD, Clore GM, DeLano WL, Gros P, Grosse-Kunstleve RW, Jiang JS, Kuszewski J, Nilges M, Pannu NS, Read RJ, Rice LM, Simonson T, Warren GL. Acta Crystallographica. 1998;54:905–921. doi: 10.1107/s0907444998003254.
23. Güntert P, Mumenthaler C, Wüthrich K. J Mol Biol. 1997;273:283–298. doi: 10.1006/jmbi.1997.1284.
24. Herrmann T, Guntert P, Wuthrich K. J Mol Biol. 2002;319:209–227. doi: 10.1016/s0022-2836(02)00241-3.
25. Rohl CA, Strauss CE, Misura KM, Baker D. Methods Enzymol. 2004;383:66–93. doi: 10.1016/S0076-6879(04)83004-0.
26. Guntert P, Braun W, Wuthrich K. J Mol Biol. 1991;217:517–530. doi: 10.1016/0022-2836(91)90754-t.
27. Braun W, Go N. J Mol Biol. 1985;186:611–626. doi: 10.1016/0022-2836(85)90134-2.
28. Williamson MP, Havel TF, Wuthrich K. J Mol Biol. 1985;182:295–315. doi: 10.1016/0022-2836(85)90347-x.
29. Havel TF, Wuthrich K. J Mol Biol. 1985;182:281–294. doi: 10.1016/0022-2836(85)90346-8.
30. Bassolino-Klimas D, Tejero R, Krystek SR, Metzler WJ, Montelione GT, Bruccoleri RE. Protein Sci. 1996;5:593–603. doi: 10.1002/pro.5560050404.
31. Tejero R, Bassolino-Klimas D, Bruccoleri RE, Montelione GT. Protein Sci. 1996;5:578–592. doi: 10.1002/pro.5560050403.
32. Markley JL, Bax A, Arata Y, Hilbers CW, Kaptein R, Sykes B, Wright P, Wuthrich K. Pure & Appl Chem. 1998;70:117–142.
33. Kabsch W. Acta Crystallogr Sect A. 1976;32:922–923.
34. Kabsch W. Acta Crystallogr Sect A. 1978;34.
35. Snyder DA, Montelione GT. Proteins. 2005;59:673–686. doi: 10.1002/prot.20402.
36. Hyberts SG, Goldberg MS, Havel TF, Wagner G. Protein Sci. 1992;1:736–751. doi: 10.1002/pro.5560010606.
37. Kirchner DK, Guntert P. BMC Bioinformatics. 2011;12:170. doi: 10.1186/1471-2105-12-170.
38. Struyf A, Hubert M, Rousseeuw P. J Stat Softw. 1997;1:1–30.
39. Nilges M. J Mol Biol. 1995;245:645–660. doi: 10.1006/jmbi.1994.0053.
40. Doreleijers JF, Raves ML, Rullmann T, Kaptein R. J Biomol NMR. 1999;14:123–132. doi: 10.1023/a:1008335423527.
41. Losonczi JA, Andrec M, Fischer MW, Prestegard JH. J Magn Reson. 1999;138:334–342. doi: 10.1006/jmre.1999.1754.
42. Zweckstetter M, Bax A. J Am Chem Soc. 2000;122:3791–3792.
43. Valafar H, Prestegard JH. J Magn Reson. 2004;167:228–241. doi: 10.1016/j.jmr.2003.12.012.
44. Cornilescu G, Marquardt JL, Ottiger M, Bax A. J Am Chem Soc. 1998;120:6836–6837.
45. Herrmann T, Guntert P, Wuthrich K. J Mol Biol. 2002;319:209–227. doi: 10.1016/s0022-2836(02)00241-3.
46. Williams T, Kelly C. 2011. URL http://gnuplot.info.
47. Huang YJ, Powers R, Montelione GT. J Am Chem Soc. 2005;127:1665–1674. doi: 10.1021/ja047109h.
48. Huang YJ, Rosato A, Singh G, Montelione GT. Nucleic Acids Res. 2012;40:W542–546. doi: 10.1093/nar/gks373.
49. Raman S, Lange OF, Rossi P, Tyka M, Wang X, Aramini J, Liu G, Ramelot TA, Eletsky A, Szyperski T, Kennedy MA, Prestegard J, Montelione GT, Baker D. Science. 2010;327:1014–1018. doi: 10.1126/science.1183649.
50. Lange OF, Rossi P, Sgourakis NG, Song Y, Lee HW, Aramini JM, Ertekin A, Xiao R, Acton TB, Montelione GT, Baker D. Proc Natl Acad Sci U S A. 2012;109:10873–10878. doi: 10.1073/pnas.1203013109.
51. Doreleijers JF, Mading S, Maziuk D, Sojourner K, Yin L, Zhu J, Markley JL, Ulrich EL. J Biomol NMR. 2003;26:139–146. doi: 10.1023/a:1023514106644.
52. Zemla A. Nucleic Acids Res. 2003;31:3370–3374. doi: 10.1093/nar/gkg571.
53. Ramelot TA, Raman S, Kuzin AP, Xiao R, Ma LC, Acton TB, Hunt JF, Montelione GT, Baker D, Kennedy MA. Proteins. 2009;75:147–167. doi: 10.1002/prot.22229.
54. Moseley HN, Sahota G, Montelione GT. J Biomol NMR. 2004;28:341–355. doi: 10.1023/B:JNMR.0000015420.44364.06.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

10858_2013_9753_MOESM1_ESM

NIHMS510885-supplement-10858_2013_9753_MOESM1_ESM.pdf^{(247.8KB, pdf)}

[R1] 1.Delaglio F, Grzesiek S, Vuister GW, Zhu G, Pfeifer J, Bax A. J. Biomol. NMR. 1995;6:277–293. doi: 10.1007/BF00197809. [DOI] [PubMed] [Google Scholar]

[R2] 2.Baran MC, Moseley HN, Aramini JM, Bayro MJ, Monleon D, Locke JY, Montelione GT. Proteins. 2006;62:843–851. doi: 10.1002/prot.20840. [DOI] [PubMed] [Google Scholar]

[R3] 3.Vranken WF, Boucher W, Stevens TJ, Fogh RH, Pajon A, Llinas M, Ulrich EL, Markley JL, Ionides J, Laue ED. Proteins. 2005;59:687–696. doi: 10.1002/prot.20449. [DOI] [PubMed] [Google Scholar]

[R4] 4.Zimmerman DE, Kulikowski CA, Huang Y, Feng W, Tashiro M, Shimotakahara S, Chien C, Powers R, Montelione GT. J Mol Biol. 1997;269:592–610. doi: 10.1006/jmbi.1997.1052. [DOI] [PubMed] [Google Scholar]

[R5] 5.Moseley HN, Montelione GT. Curr Opin Struct Biol. 1999;9:635–642. doi: 10.1016/s0959-440x(99)00019-6. [DOI] [PubMed] [Google Scholar]

[R6] 6.Moseley HN, Monleon D, Montelione GT. Methods Enzymol. 2001;339:91–108. doi: 10.1016/s0076-6879(01)39311-4. [DOI] [PubMed] [Google Scholar]

[R7] 7.Huang YJ, Tejero R, Powers R, Montelione GT. Proteins. 2006;62:587–603. doi: 10.1002/prot.20820. [DOI] [PubMed] [Google Scholar]

[R8] 8.Bahrami A, Assadi AH, Markley JL, Eghbalnia HR. PLoS Comput Biol. 2009;5:e1000307. doi: 10.1371/journal.pcbi.1000307. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R9] 9.Bhattacharya A, Tejero R, Montelione GT. Proteins. 2007;66:778–795. doi: 10.1002/prot.21165. [DOI] [PubMed] [Google Scholar]

[R10] 10.Doreleijers JF, Sousa da Silva AW, Krieger E, Nabuurs SB, Spronk CA, Stevens TJ, Vranken WF, Vriend G, Vuister GW. J Biomol NMR. 2012 doi: 10.1007/s10858-012-9669-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R11] 11.Mao B, Guan R, Montelione GT. Structure. 2011;19:757–766. doi: 10.1016/j.str.2011.04.005. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R12] 12.Han B, Liu Y, Ginzinger SW, Wishart DS. J Biomol NMR. 2011;50:43–57. doi: 10.1007/s10858-011-9478-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R13] 13.Rosato A, Aramini JM, Arrowsmith C, Bagaria A, Baker D, Cavalli A, Doreleijers JF, Eletsky A, Giachetti A, Guerry P, Gutmanas A, Guntert P, He Y, Herrmann T, Huang YJ, Jaravine V, Jonker HR, Kennedy MA, Lange OF, Liu G, Malliavin TE, Mani R, Mao B, Montelione GT, Nilges M, Rossi P, van der Schot G, Schwalbe H, Szyperski TA, Vendruscolo M, Vernon R, Vranken WF, de Vries S, Vuister GW, Wu B, Yang Y, Bonvin AM. Structure. 2012;20:227–236. doi: 10.1016/j.str.2012.01.002. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] 14.Doreleijers JF, Vranken WF, Schulte C, Markley JL, Ulrich EL, Vriend G, Vuister GW. Nucleic Acids Res. 2012;40:D519–524. doi: 10.1093/nar/gkr1134. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R15] 15.Nabuurs SB, Spronk CA, Vuister GW, Vriend G. PLoS Comput Biol. 2006;2:e9. doi: 10.1371/journal.pcbi.0020009. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R16] 16.Hendrickx PM, Gutmanas A, Kleywegt GJ. Proteins. 2013;81:583–591. doi: 10.1002/prot.24213. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R17] 17.Liu G, Shen Y, Atreya HS, Parish D, Shao Y, Sukumaran DK, Xiao R, Yee A, Lemak A, Bhattacharya A, Acton TA, Arrowsmith CH, Montelione GT, Szyperski T. Proc Natl Acad Sci U S A. 2005;102:10487–10492. doi: 10.1073/pnas.0504338102. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R18] 18.Huang YJ, Moseley HN, Baran MC, Arrowsmith C, Powers R, Tejero R, Szyperski T, Montelione GT. Methods Enzymol. 2005;394:111–141. doi: 10.1016/S0076-6879(05)94005-6. [DOI] [PubMed] [Google Scholar]

[R19] 19.Baran MC, Huang YJ, Moseley HN, Montelione GT. Chemical reviews. 2004;104:3541–3556. doi: 10.1021/cr030408p. [DOI] [PubMed] [Google Scholar]

[R20] 20.Bhattacharya A, Wunderlich Z, Monleon D, Tejero R, Montelione GT. Proteins. 2008;70:105–118. doi: 10.1002/prot.21466.

[R21] 21.Schwieters CD, Kuszewski JJ, Tjandra N, Clore GM. J Magn Reson. 2003;160:65–73. doi: 10.1016/s1090-7807(02)00014-9.

[R22] 22.Brunger AT, Adams PD, Clore GM, DeLano WL, Gros P, Grosse-Kunstleve RW, Jiang JS, Kuszewski J, Nilges M, Pannu NS, Read RJ, Rice LM, Simonson T, Warren GL. Acta Crystallogr D. 1998;54:905–921. doi: 10.1107/s0907444998003254.

[R23] 23.Güntert P, Mumenthaler C, Wüthrich K. J Mol Biol. 1997;273:283–298. doi: 10.1006/jmbi.1997.1284.

[R24] 24.Herrmann T, Güntert P, Wüthrich K. J Mol Biol. 2002;319:209–227. doi: 10.1016/s0022-2836(02)00241-3.

[R25] 25.Rohl CA, Strauss CE, Misura KM, Baker D. Methods Enzymol. 2004;383:66–93. doi: 10.1016/S0076-6879(04)83004-0.

[R26] 26.Güntert P, Braun W, Wüthrich K. J Mol Biol. 1991;217:517–530. doi: 10.1016/0022-2836(91)90754-t.

[R27] 27.Braun W, Go N. J Mol Biol. 1985;186:611–626. doi: 10.1016/0022-2836(85)90134-2.

[R28] 28.Williamson MP, Havel TF, Wüthrich K. J Mol Biol. 1985;182:295–315. doi: 10.1016/0022-2836(85)90347-x.

[R29] 29.Havel TF, Wüthrich K. J Mol Biol. 1985;182:281–294. doi: 10.1016/0022-2836(85)90346-8.

[R30] 30.Bassolino-Klimas D, Tejero R, Krystek SR, Metzler WJ, Montelione GT, Bruccoleri RE. Protein Sci. 1996;5:593–603. doi: 10.1002/pro.5560050404.

[R31] 31.Tejero R, Bassolino-Klimas D, Bruccoleri RE, Montelione GT. Protein Sci. 1996;5:578–592. doi: 10.1002/pro.5560050403.

[R32] 32.Markley JL, Bax A, Arata Y, Hilbers CW, Kaptein R, Sykes B, Wright P, Wüthrich K. Pure Appl Chem. 1998;70:117–142.

[R33] 33.Kabsch W. Acta Crystallogr A. 1976;32:922–923.

[R34] 34.Kabsch W. Acta Crystallogr A. 1978;34

[R35] 35.Snyder DA, Montelione GT. Proteins. 2005;59:673–686. doi: 10.1002/prot.20402.

[R36] 36.Hyberts SG, Goldberg MS, Havel TF, Wagner G. Protein Sci. 1992;1:736–751. doi: 10.1002/pro.5560010606.

[R37] 37.Kirchner DK, Güntert P. BMC Bioinformatics. 2011;12:170. doi: 10.1186/1471-2105-12-170.

[R38] 38.Struyf A, Hubert M, Rousseeuw P. J Stat Softw. 1997;1:1–30.

[R39] 39.Nilges M. J Mol Biol. 1995;245:645–660. doi: 10.1006/jmbi.1994.0053.

[R40] 40.Doreleijers JF, Raves ML, Rullmann T, Kaptein R. J Biomol NMR. 1999;14:123–132. doi: 10.1023/a:1008335423527.

[R41] 41.Losonczi JA, Andrec M, Fischer MW, Prestegard JH. J Magn Reson. 1999;138:334–342. doi: 10.1006/jmre.1999.1754.

[R42] 42.Zweckstetter M, Bax A. J Am Chem Soc. 2000;122:3791–3792.

[R43] 43.Valafar H, Prestegard JH. J Magn Reson. 2004;167:228–241. doi: 10.1016/j.jmr.2003.12.012.

[R44] 44.Cornilescu G, Marquardt JL, Ottiger M, Bax A. J Am Chem Soc. 1998;120:6836–6837.

[R45] 45.Herrmann T, Güntert P, Wüthrich K. J Mol Biol. 2002;319:209–227. doi: 10.1016/s0022-2836(02)00241-3.

[R46] 46.Williams T, Kelly C. 2011 URL http://gnuplot.info.

[R47] 47.Huang YJ, Powers R, Montelione GT. J Am Chem Soc. 2005;127:1665–1674. doi: 10.1021/ja047109h.

[R48] 48.Huang YJ, Rosato A, Singh G, Montelione GT. Nucleic Acids Res. 2012;40:W542–546. doi: 10.1093/nar/gks373.

[R49] 49.Raman S, Lange OF, Rossi P, Tyka M, Wang X, Aramini J, Liu G, Ramelot TA, Eletsky A, Szyperski T, Kennedy MA, Prestegard J, Montelione GT, Baker D. Science. 2010;327:1014–1018. doi: 10.1126/science.1183649.

[R50] 50.Lange OF, Rossi P, Sgourakis NG, Song Y, Lee HW, Aramini JM, Ertekin A, Xiao R, Acton TB, Montelione GT, Baker D. Proc Natl Acad Sci U S A. 2012;109:10873–10878. doi: 10.1073/pnas.1203013109.

[R51] 51.Doreleijers JF, Mading S, Maziuk D, Sojourner K, Yin L, Zhu J, Markley JL, Ulrich EL. J Biomol NMR. 2003;26:139–146. doi: 10.1023/a:1023514106644.

[R52] 52.Zemla A. Nucleic Acids Res. 2003;31:3370–3374. doi: 10.1093/nar/gkg571.

[R53] 53.Ramelot TA, Raman S, Kuzin AP, Xiao R, Ma LC, Acton TB, Hunt JF, Montelione GT, Baker D, Kennedy MA. Proteins. 2009;75:147–167. doi: 10.1002/prot.22229.

[R54] 54.Moseley HN, Sahota G, Montelione GT. J Biomol NMR. 2004;28:341–355. doi: 10.1023/B:JNMR.0000015420.44364.06.
