Author manuscript; available in PMC: 2011 May 1.

Published in final edited form as: Proteins. 2010 May 1;78(6):1583–1593. doi: 10.1002/prot.22677

Table 1.

Properties used by the predictor, organized by feature class.

Feature value properties
Torsion meta-feature		Accuracy	IG

P_samp	Rosetta sampling rate	88.9%
lowE	10th percentile energy of models with the feature value	76.4%	0.016
minE	minimum energy of models with the feature value	87.7%	0.040
frag	rate of occurrence of the feature value in the fragments	86.2%	0.039
loop	indicates either an E or O torsion feature value

P_pred	output of nativeness predictor	91.1%	0.081
Secondary structure meta-feature		Accuracy	IG

P_samp	Rosetta sampling rate	87.2%
lowE	10th percentile energy of models with the feature value	72.8%	0.018
minE	minimum energy of models with the feature value	86.2%	0.023
psipred	secondary structure prediction from Psipred	87.7%	0.034
jufo	secondary structure prediction from JUFO	80.9%	0.010

P_pred	output of nativeness predictor	91.8%	0.055
Topology meta-feature		Accuracy	IG

P_samp	Rosetta sampling rate	21.4%
lowE	10th percentile energy of models with the feature value	21.4%	0.032
minE	minimum energy of models with the feature value	46.4%	0.023
co	approximate contact order of a structure with the given topology

P_pred	output of nativeness predictor	60.7%	0.036
Register meta-feature		Accuracy	IG

P_samp	Rosetta sampling rate	54.0%
lowE	10th percentile energy of models with the feature value	44.7%	0.065
minE	minimum energy of models with the feature value	61.2%	0.057
bulge	indicates the presence of at least one beta bulge in the register

P_pred	output of nativeness predictor	57.6%	0.066
Contact meta-feature		Accuracy	IG

P_samp	Rosetta sampling rate	85.4%
lowE	10th percentile energy of models with the feature value	68.9%	0.002
edgedist	distance (in residue numbers) of a contact from either end of a pairing
oddpleat	indicates an anomaly in the pleating pattern

P_pred	output of nativeness predictor	88.3%	0.005

A property correctly identifies a native feature value if the property is higher (or lower, in the case of energy properties) for the native feature value than for any other value of the associated feature. The “Accuracy” column gives the percentage of features in our benchmark whose native values were correctly identified by each property. Accuracy values are omitted for properties that are informative only in conjunction with others and so have no predictive value on their own. P_pred, the output of the native feature value predictor, is included for comparison. Predictors were trained by leave-one-out training on the benchmark set of 28 proteins; accuracy was computed on the left-out protein and averaged across the set. The “IG” column gives the average information gain of a predictor $P_{pred}^{'}$ based only on P_samp and the indicated property, relative to the baseline predictor P_samp, in units of bits per residue: the total gain for the features in each class for a given protein is divided by the number of residues in that protein, and the results are averaged across the proteins in our benchmark. Note that information gain can be large even for properties that do not increase accuracy, if rare native feature values are often substantially enriched. The information gain given for P_pred is the gain when all properties are included in the predictor.
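
The two column definitions can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names and data layout are hypothetical, and the information gain formula is an assumption, since the note does not spell it out: the gain for one feature is taken to be the base-2 log ratio of the probability that the refined predictor assigns to the native value over the probability that P_samp assigns to it.

```python
import math


def native_identified(scores, native, lower_is_better=False):
    """Return True if the native value has the strictly best score.

    scores: dict mapping each feature value to the property's score.
    lower_is_better: set True for energy properties such as lowE or minE.
    """
    sign = -1.0 if lower_is_better else 1.0
    native_score = sign * scores[native]
    return all(native_score > sign * s
               for value, s in scores.items() if value != native)


def accuracy(features, lower_is_better=False):
    """Fraction of features whose native value is correctly identified.

    features: list of (scores, native) pairs, one per feature.
    """
    hits = sum(native_identified(s, n, lower_is_better) for s, n in features)
    return hits / len(features)


def info_gain_per_residue(native_probs, n_residues):
    """Assumed definition of gain for one protein, in bits per residue.

    native_probs: list of (p_samp, p_refined) pairs, the probabilities
    assigned to the native value of each feature by the baseline and by
    the refined predictor. The log-ratio form is an assumption.
    """
    total_bits = sum(math.log2(p_refined / p_samp)
                     for p_samp, p_refined in native_probs)
    return total_bits / n_residues


# Toy data, invented for illustration only.
toy = [({"A": 0.6, "B": 0.4}, "A"), ({"E": 0.3, "H": 0.7}, "E")]
toy_accuracy = accuracy(toy)
toy_gain = info_gain_per_residue([(0.25, 0.5), (0.5, 0.5)], n_residues=10)
```

Under this assumed definition, a property can leave accuracy unchanged and still yield positive gain: raising the probability of a rare native value from 0.05 to 0.10 contributes one bit even if another value still scores highest.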