Author manuscript; available in PMC: 2011 Jun 1.

Published in final edited form as: Biochim Biophys Acta. 2010 Feb 1;1804(6):1231–1264. doi: 10.1016/j.bbapap.2010.01.017

Table 1.

Accuracy and improvement of neural network predictors of natural disordered regions (PONDRs^®)

Name	Training Set	# Disordered Residues	Order Accuracy^a (%)	Disorder Accuracy^b (%)
XL1	7 X-ray	502	71	47
VL1	7 NMR, 8 X-ray	1,366	83	45
VL-XT	VL1 plus XT^c		71	59
VL2	53 X-ray, 35 NMR, 52 CD	17,978	76	65
VL3^d	54 X-ray, 40 NMR, 58 CD	22,434	84	59
VSL1^e	230 long DR^f; 983 short DR; ordered regions	25,958; 9,632; 354,169	83	79
VSL2^g	230 long DR; 983 short DR; ordered regions	25,958; 9,632; 354,169	81	82

^a O_PDB_S25

^b Combined dis_X-ray, dis_NMR and dis_CD

^c XT is a joint name for the N-terminus (XN) and C-terminus (XC) predictors, which were trained on X-ray crystallographic data in which the terminal disordered regions were 5 or more amino acids in length.

^d Besides the addition of a few more chains, substantial cleaning of the training databases was carried out between VL2 and VL3. Several incorrectly labeled chains were identified and fixed, and order/disorder boundaries were adjusted in a few other proteins.

^e The VSL1 predictor combines two predictors optimized for long (>30 residues) and short (≤30 residues) disordered regions, respectively, using weights generated by a third meta-predictor. The attributes used include amino acid frequencies, sequence complexity, ratio of net charge to hydrophobicity, averaged flexibility, and averaged PSI-BLAST profiles calculated over symmetric input windows.

^f Disordered region

^g VSL2 is a slightly improved version of the VSL1 predictor. The training data for VSL2 were slightly different: eight ambiguous sequences were removed, His-tags were not used in training, and short DR of 1-3 residues were not used in training. Also, a linear SVM was used for VSL2 instead of logistic regression (Kang Peng, personal communication).

Note: Both VSL1 and VSL2 take advantage of length dependencies.
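
The VSL1 footnote describes two ideas that a short sketch may make more concrete: attributes computed over symmetric input windows, and per-residue blending of a long-region and a short-region predictor using weights from a meta-predictor. The Python below is only an illustration under stated assumptions, not the PONDR implementation. The function names, the clipping of windows at the sequence termini, the convex form of the blend (w * long + (1 - w) * short), and the 0.5 decision threshold are all assumptions introduced here; the table does not specify them.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_composition(sequence, center, half_width):
    """Amino acid frequencies in a symmetric window around `center`.

    Assumption: the window is clipped at the sequence termini, so it is
    shorter than 2 * half_width + 1 near either end.
    """
    if not 0 <= center < len(sequence):
        raise IndexError("center must be a valid position in the sequence")
    start = max(0, center - half_width)
    end = min(len(sequence), center + half_width + 1)
    window = sequence[start:end]
    counts = Counter(window)
    return {aa: counts[aa] / len(window) for aa in AMINO_ACIDS}


def combine_scores(long_scores, short_scores, long_weights):
    """Blend two per-residue disorder scores with per-residue weights.

    Assumption: the meta-predictor weight w lies in [0, 1] and the blend
    is w * long + (1 - w) * short.
    """
    if not (len(long_scores) == len(short_scores) == len(long_weights)):
        raise ValueError("all inputs must have one value per residue")
    combined = []
    for p_long, p_short, w in zip(long_scores, short_scores, long_weights):
        if not 0.0 <= w <= 1.0:
            raise ValueError("weights must lie in [0, 1]")
        combined.append(w * p_long + (1.0 - w) * p_short)
    return combined


def label_residues(scores, threshold=0.5):
    """Mark each residue D (disordered) or O (ordered).

    Assumption: a score at or above `threshold` counts as disordered.
    """
    return "".join("D" if s >= threshold else "O" for s in scores)


if __name__ == "__main__":
    # Hypothetical scores for a three-residue stretch, for illustration only.
    blended = combine_scores([0.9, 0.2, 0.4], [0.1, 0.8, 0.2], [1.0, 0.0, 0.5])
    print(label_residues(blended))
```

A weight of 1.0 reproduces the long-region score and a weight of 0.0 reproduces the short-region score, so the meta-predictor in this sketch effectively decides, residue by residue, which specialist to trust.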