Author manuscript; available in PMC: 2011 Jun 1.
Published in final edited form as: Biochim Biophys Acta. 2010 Feb 1;1804(6):1231–1264. doi: 10.1016/j.bbapap.2010.01.017

Table 1.

Accuracy and improvement of neural network predictors of natural disordered regions (PONDRs®)

| Name | Training set | # Disordered residues | Accuracy, order (a) (%) | Accuracy, disorder (b) (%) |
|---|---|---|---|---|
| XL1 | 7 X-ray | 502 | 71 | 47 |
| VL1 | 7 NMR, 8 X-ray | 1,366 | 83 | 45 |
| XL-XT | VL1 plus XT (c) | | 71 | 59 |
| VL2 | 53 X-ray, 35 NMR, 52 CD | 17,978 | 76 | 65 |
| VL3 (d) | 54 X-ray, 40 NMR, 58 CD | 22,434 | 84 | 59 |
| VSL1 (e) | 230 long DR (f); 983 short DR; ordered regions | 25,958; 9,632; 354,169 | 83 | 79 |
| VSL2 (g) | 230 long DR; 983 short DR; ordered regions | 25,958; 9,632; 354,169 | 81 | 82 |
(a) O_PDB_S25

(b) Combined dis_X-ray, dis_NMR and dis_CD

(c) XT is the joint name for the N-terminus (XN) and C-terminus (XC) predictors, which were trained on X-ray crystallographic data in which the terminal disordered regions were 5 or more amino acids long.

(d) Besides the addition of a few more chains, substantial cleaning of the training databases was carried out between VL2 and VL3. Several incorrectly labeled chains were identified and fixed, and order/disorder boundaries were adjusted in a few other proteins.

(e) The VSL1 predictor combines two predictors optimized for long (>30 residues) and short (≤30 residues) disordered regions, respectively, using weights generated by a third meta-predictor. The attributes used include amino acid frequencies, sequence complexity, the ratio of net charge to hydrophobicity, averaged flexibility, and averaged PSI-BLAST profiles calculated over symmetric input windows.

(f) Disordered region

(g) VSL2 is a slightly improved version of the VSL1 predictor. The training data for VSL2 differed slightly: 8 ambiguous sequences were removed, His-tags were not used in training, and short DRs of 1-3 residues were not used in training. In addition, VSL2 uses a linear SVM instead of logistic regression (Kang Peng, personal communication).

Note: Both VSL1 and VSL2 take advantage of length dependencies by using separate predictors for long and short disordered regions.
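
The footnote on VSL1 describes two specialized predictors whose outputs are combined using weights from a third meta-predictor. The sketch below illustrates one plausible reading of that scheme: a per-residue convex combination, where the meta-predictor's weight goes to the long-region predictor and the remainder to the short-region predictor. The function names, the exact blending formula, and the 0.5 decision threshold are assumptions made for illustration; they are not taken from the table and should not be read as the published PONDR implementation.

```python
from typing import List, Sequence


def combine_scores(
    long_scores: Sequence[float],
    short_scores: Sequence[float],
    long_weights: Sequence[float],
) -> List[float]:
    """Blend per-residue disorder scores from two predictors.

    long_weights[i] is a hypothetical meta-predictor output in [0, 1]:
    the weight given to the long-region predictor at residue i.
    The short-region predictor receives the remaining 1 - weight.
    """
    if not (len(long_scores) == len(short_scores) == len(long_weights)):
        raise ValueError("all inputs must contain one value per residue")

    combined = []
    for p_long, p_short, w in zip(long_scores, short_scores, long_weights):
        if not 0.0 <= w <= 1.0:
            raise ValueError("meta-predictor weights must lie in [0, 1]")
        # Convex combination (assumed form, see lead-in).
        combined.append(w * p_long + (1.0 - w) * p_short)
    return combined


def call_disorder(scores: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Label a residue as disordered when its score reaches the threshold.

    The 0.5 cutoff is an illustrative assumption.
    """
    return [s >= threshold for s in scores]


if __name__ == "__main__":
    # Toy scores for a three-residue sequence (made-up numbers).
    long_p = [0.9, 0.8, 0.2]
    short_p = [0.1, 0.4, 0.8]
    weights = [1.0, 0.5, 0.0]
    blended = combine_scores(long_p, short_p, weights)
    print(blended)
    print(call_disorder(blended))
```

With a weight of 1.0 the blended score equals the long-region predictor's output, and with a weight of 0.0 it equals the short-region predictor's output; intermediate weights interpolate between the two.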