Author manuscript; available in PMC: 2016 Sep 1.

Published in final edited form as: J Biomol NMR. 2015 Jul 4;63(1):39–52. doi: 10.1007/s10858-015-9961-4

Table 1.

Prediction results for different attribute categories^a

Category	Manual^b		Automated^c		Automated-Plus^d
	rms	n	rms	n	rms	sdev	n
Hydrogen
Cross-validated	0.12	12,131	0.12	11,953	0.13	1.33	18,774
Canonical^e	0.05	2007	0.05	2061	0.06	1.28	3020
Non-canonical^f	0.05	2059	0.06	1966	0.07	1.29	2903
Other^g	0.09	8065	0.10	7926	0.11	1.35	12,851
All	0.08	12,131	0.08	11,953	0.10	1.33	18,774
Carbon
Cross-validated	0.80	5554	0.80	5559	0.83	28.44	9642
Canonical^e	0.41	1040	0.41	1072	0.46	28.04	1630
Non-canonical^f	0.42	949	0.42	916	0.47	28.34	1526
Other^g	0.79	3565	0.81	3571	0.85	28.57	6486
All	0.68	5554	0.69	5559	0.75	28.44	9642

^a Output from the support vector regression (SVR) analysis. The SVR is performed separately for each atom type; this table presents the values aggregated across all the hydrogen and carbon atoms used. The columns labeled rms give the square root of the mean of the squared deviations between predicted and experimental values for all the data in the corresponding category. The rms values in the cross-validated rows are the output of the SVR program when performing a tenfold stratified cross-validation and are based on the data values in all categories. The other rms values are calculated on the indicated subset of data values. The columns labeled n give the number of data values used in the specified category. The column labeled sdev is the standard deviation of all the experimental hydrogen or carbon shifts in the corresponding category; it is included only for the Automated-Plus section because this measure of dispersion is very similar for all three groups.
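The rms, sdev and n columns described above can be illustrated with a minimal sketch. It assumes each data value is available as a (category, predicted shift, experimental shift) record; the function names, the record layout and the example numbers are hypothetical and are not taken from the paper. The sketch also assumes a population standard deviation for sdev, since the table notes do not say which form was used, and it does not reproduce the SVR or the cross-validation itself.

```python
import math
import statistics


def rms_deviation(predicted, experimental):
    """Square root of the mean squared deviation between paired values."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


def summarize(records):
    """Return {category: {"rms", "sdev", "n"}} plus an aggregate "All" entry.

    records is an iterable of (category, predicted, experimental) tuples.
    """
    grouped = {"All": []}
    for category, predicted, experimental in records:
        grouped.setdefault(category, []).append((predicted, experimental))
        grouped["All"].append((predicted, experimental))

    summary = {}
    for category, pairs in grouped.items():
        if not pairs:
            continue  # no data at all; skip the empty "All" entry
        predicted = [p for p, _ in pairs]
        experimental = [e for _, e in pairs]
        summary[category] = {
            "rms": rms_deviation(predicted, experimental),
            # Assumption: population standard deviation of experimental shifts.
            "sdev": statistics.pstdev(experimental),
            "n": len(pairs),
        }
    return summary


# Illustrative values only; these are not data from the study.
example = [
    ("Canonical", 7.9, 8.0),
    ("Canonical", 7.5, 7.4),
    ("Other", 5.0, 5.3),
]
for category, stats in summarize(example).items():
    print(category, stats)
```

Grouping every record under "All" as well as under its own category mirrors the way the table reports both subset rows and an aggregate row from the same data values.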

^b Manual refers to the analysis done using the attribute templates created by manual analysis and the shift datasets used in our previous analysis (Barton et al. 2013).

^c Automated refers to the analysis done using the mostly automated attribute generation described in this paper, applied to the same set of datasets as in our previous analysis.

^d Automated-Plus refers to the analysis done using the automated analysis described here and the new, larger set of datasets.

^e Canonical bases are the central base in a 5-base stretch in which all 5 base pairs have GC or AU base pairing and no other attributes, such as being in a triplet, kissing interaction or pseudoknot, are present.

^f Non-canonical bases are defined in the same way as canonical bases, except that the first and/or fifth bases may be GU wobble base pairs, mismatched, unpaired (e.g. loops) or not present (e.g. at the 5′ or 3′ termini).

^g Other bases are all bases that are in neither the canonical nor the non-canonical category.
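The three category definitions above amount to a small decision rule on a 5-base window, which can be sketched as follows. This is one reading of the definitions, not the authors' implementation: the Position class, the pairing labels ("WC" for a GC or AU pair, "GU", "mismatch", "unpaired") and the special flag are hypothetical names introduced for illustration. The sketch assumes that any special attribute (triplet, kissing interaction, pseudoknot) anywhere in the window places the central base in the other category, and that a missing first or fifth base is represented by None.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Position:
    """One base in the 5-base window (hypothetical representation)."""
    pairing: str           # "WC" (GC or AU), "GU", "mismatch" or "unpaired"
    special: bool = False  # True if in a triplet, kissing interaction, pseudoknot


def classify_central_base(window: Sequence[Optional[Position]]) -> str:
    """Classify the central base of a 5-base window.

    window[2] is the base being classified; None marks a base that is
    not present (for example beyond the 5' or 3' terminus).
    """
    if len(window) != 5:
        raise ValueError("window must contain exactly 5 positions")

    # Assumption: a special attribute anywhere in the window means "other".
    if any(p is not None and p.special for p in window):
        return "other"

    # The three inner positions must all be present GC or AU pairs.
    if any(p is None or p.pairing != "WC" for p in window[1:4]):
        return "other"

    # Both flanking positions GC or AU: canonical.
    if all(p is not None and p.pairing == "WC" for p in (window[0], window[4])):
        return "canonical"

    # Otherwise a flank is GU, mismatched, unpaired or absent.
    return "non-canonical"


wc = Position("WC")
print(classify_central_base([wc, wc, wc, wc, wc]))
print(classify_central_base([None, wc, wc, wc, Position("GU")]))
```

Checking the special attributes and the inner three positions first keeps the canonical and non-canonical branches down to a single test on the two flanking positions.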