Author manuscript; available in PMC: 2016 Sep 1.
Published in final edited form as: J Biomol NMR. 2015 Jul 4;63(1):39–52. doi: 10.1007/s10858-015-9961-4

Table 1.

Prediction results for different attribute categories (a)

Category             Manual (b)       Automated (c)    Automated-Plus (d)
                     rms    n         rms    n         rms    sdev    n
Hydrogen
  Cross-validated    0.12   12,131    0.12   11,953    0.13   1.33    18,774
  Canonical (e)      0.05   2007      0.05   2061      0.06   1.28    3020
  Non-canonical (f)  0.05   2059      0.06   1966      0.07   1.29    2903
  Other (g)          0.09   8065      0.10   7926      0.11   1.35    12,851
  All                0.08   12,131    0.08   11,953    0.10   1.33    18,774
Carbon
  Cross-validated    0.80   5554      0.80   5559      0.83   28.44   9642
  Canonical (e)      0.41   1040      0.41   1072      0.46   28.04   1630
  Non-canonical (f)  0.42   949       0.42   916       0.47   28.34   1526
  Other (g)          0.79   3565      0.81   3571      0.85   28.57   6486
  All                0.68   5554      0.69   5559      0.75   28.44   9642
(a) Output from the support vector regression (SVR) analysis. The SVR is performed separately for each atom type; this table presents the values aggregated across all the hydrogen and carbon atoms used. The columns labeled rms give the square root of the mean of the squared deviations between predicted and experimental values for all the data in the corresponding category. The rms values in the cross-validated rows are the output of the SVR program when performing a tenfold stratified cross-validation and are based on the data values in all categories. The other rms values are calculated on the indicated subset of data values. The columns labeled n give the number of data values used in the specified category. The column labeled sdev is the standard deviation of all the experimental hydrogen or carbon shifts in the corresponding category; it is included only for the Automated-Plus section because this measure of dispersion is very similar for all three groups.
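The rms definition in this footnote can be illustrated with a short Python sketch. It is not the authors' code: the function names `rms` and `pooled_rms` are hypothetical, and the pooling step assumes that the three subset categories are disjoint and together make up the "All" row, which the n columns of Table 1 support. Because the inputs are the two-decimal values printed in the table, the result can only be compared with the table at that precision.

```python
import math

def rms(predicted, experimental):
    """Square root of the mean squared deviation between paired values."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = ((p - e) ** 2 for p, e in zip(predicted, experimental))
    return math.sqrt(sum(squared) / len(predicted))

def pooled_rms(subsets):
    """Combine (rms, n) pairs from disjoint subsets into the rms of their union."""
    total_n = sum(n for _, n in subsets)
    return math.sqrt(sum(n * r ** 2 for r, n in subsets) / total_n)

# (rms, n) from the Manual columns of Table 1, in the order
# Canonical, Non-canonical, Other.
hydrogen_manual = [(0.05, 2007), (0.05, 2059), (0.09, 8065)]
carbon_manual = [(0.41, 1040), (0.42, 949), (0.79, 3565)]

print(round(pooled_rms(hydrogen_manual), 2))  # 0.08, as in the hydrogen "All" row
print(round(pooled_rms(carbon_manual), 2))    # 0.68, as in the carbon "All" row
```

Weighting each squared rms by its n before averaging is what makes the pooled value equal to the rms over the combined data; a plain average of the three subset rms values would not reproduce the "All" rows.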

(b) Manual refers to analysis done using the attribute templates created by manual analysis and the shift datasets used in our previous analysis (Barton et al. 2013).

(c) Automated refers to analysis done using the mostly-automated attribute generation described in this paper, using the same set of datasets as in our previous analysis.

(d) Automated-Plus refers to analysis done using the automated analysis described here and the new, larger number of datasets.

(e) A canonical base is the central base of a 5-base stretch in which all five base pairs are GC or AU pairs and no other attributes, such as being in a triplet, kissing interaction or pseudoknot, are present.

(f) Non-canonical bases are defined in the same way as canonical bases, except that the first and/or fifth bases may be GU wobble base pairs, mismatched, unpaired (e.g. loops) or not present (e.g. the 5′ or 3′ termini).

(g) Other bases are all bases that are in neither the canonical nor the non-canonical category.
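The three base categories defined in the footnotes above can be sketched as a small Python classifier. This is one reading of the definitions, not the authors' implementation. The function name `classify_base`, the pairing-state labels ("WC", "GU", "mismatch", "unpaired", "absent") and the single `other_attributes` flag are all hypothetical. The sketch assumes that the three central positions must be GC or AU pairs in both the canonical and non-canonical categories, and that only the first and fifth positions may deviate.

```python
# Hypothetical labels for the pairing state of each position in a 5-base window.
WATSON_CRICK = "WC"  # a GC or AU base pair
FLANK_STATES = {"WC", "GU", "mismatch", "unpaired", "absent"}

def classify_base(window, other_attributes=False):
    """Classify the central base (index 2) of a five-position window.

    window: five pairing-state labels, listed 5' to 3'.
    other_attributes: True if a triplet, kissing interaction, pseudoknot
        or similar attribute is present (assumed here to be a single flag).
    """
    if len(window) != 5:
        raise ValueError("window must have exactly five positions")
    # The three central positions must be GC/AU pairs with no extra attributes.
    if other_attributes or any(s != WATSON_CRICK for s in window[1:4]):
        return "other"
    flanks = (window[0], window[4])
    if all(s == WATSON_CRICK for s in flanks):
        return "canonical"
    # First and/or fifth position is a GU wobble, mismatch, unpaired or absent.
    if all(s in FLANK_STATES for s in flanks):
        return "non-canonical"
    return "other"

print(classify_base(["WC"] * 5))                           # canonical
print(classify_base(["GU", "WC", "WC", "WC", "WC"]))       # non-canonical
print(classify_base(["WC", "WC", "WC", "WC", "absent"]))   # non-canonical
print(classify_base(["WC", "GU", "WC", "WC", "WC"]))       # other
```

Checking the central three positions first means that any deviation there falls into the "other" category regardless of the flanking positions, which matches the footnotes as read here.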