Table 1.
Name | Training set | # Disordered residues | Order accuracy (%)ᵃ | Disorder accuracy (%)ᵇ
---|---|---|---|---
XL1 | 7 X-ray | 502 | 71 | 47
VL1 | 7 NMR, 8 X-ray | 1,366 | 83 | 45
VL-XT | VL1 plus XTᶜ | | 71 | 59
VL2 | 53 X-ray, 35 NMR, 52 CD | 17,978 | 76 | 65
VL3ᵈ | 54 X-ray, 40 NMR, 58 CD | 22,434 | 84 | 59
VSL1ᵉ | 230 long DRᶠ; 983 short DR; ordered regions | 25,958 (long DR); 9,632 (short DR); 354,169 (ordered) | 83 | 79
VSL2ᵍ | 230 long DR; 983 short DR; ordered regions | 25,958 (long DR); 9,632 (short DR); 354,169 (ordered) | 81 | 82
ᵃ Order accuracy measured on O_PDB_S25.
ᵇ Disorder accuracy measured on the combined dis_X-ray, dis_NMR and dis_CD sets.
ᶜ XT is a joint name for the N-terminal (XN) and C-terminal (XC) predictors, which were trained on X-ray crystallographic data in which the terminal disordered regions were 5 or more residues long.
ᵈ Besides the addition of a few more chains, the training databases were substantially cleaned between VL2 and VL3: several incorrectly labeled chains were identified and fixed, and order/disorder boundaries were adjusted in a few other proteins.
ᵉ VSL1 combines two predictors, one optimized for long (>30 residues) and one for short (≤30 residues) disordered regions, using weights generated by a third meta-predictor. The input attributes include amino acid frequencies, sequence complexity, the net charge/hydrophobicity ratio, averaged flexibility, and averaged PSI-BLAST profiles, all calculated over symmetric input windows.
ᶠ DR, disordered region.
ᵍ VSL2 is a slightly improved version of VSL1. Its training data differed slightly: 8 ambiguous sequences were removed, and neither His-tags nor short DRs of 1–3 residues were used in training. VSL2 also uses a linear SVM instead of logistic regression (Kang Peng, personal communication).
Note: Both VSL1 and VSL2 take advantage of length-dependent differences between short and long disordered regions.
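The VSL1 footnote describes two window-based predictors (long DR and short DR) whose per-residue outputs are blended with weights from a meta-predictor. The sketch below illustrates only that structure, under explicit assumptions: the helper names (`window`, `features`, `logistic`, `vsl_like_score`) are hypothetical; the features are limited to amino acid composition plus Shannon entropy as a stand-in for sequence complexity (charge/hydrophobicity, flexibility and PSI-BLAST profiles are omitted); all three models are logistic units with placeholder weights, not the trained VSL1 parameters; and one window size is shared by all models.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
NUM_FEATURES = len(AMINO_ACIDS) + 1  # 20 frequencies + 1 complexity value


def window(seq: str, i: int, half: int) -> str:
    """Symmetric window of up to 2*half+1 residues centred on i, clipped at the termini."""
    return seq[max(0, i - half): i + half + 1]


def composition(win: str) -> list[float]:
    """Amino acid frequencies within the window (20 values)."""
    counts = Counter(win)
    n = len(win)
    return [counts[a] / n for a in AMINO_ACIDS]


def entropy(win: str) -> float:
    """Shannon entropy in bits; an assumed stand-in for 'sequence complexity'."""
    n = len(win)
    return -sum((c / n) * math.log2(c / n) for c in Counter(win).values())


def features(seq: str, i: int, half: int) -> list[float]:
    """Hypothetical per-residue feature vector computed over a symmetric window."""
    win = window(seq, i, half)
    return composition(win) + [entropy(win)]


def logistic(weights: list[float], bias: float, x: list[float]) -> float:
    """A single logistic unit: sigmoid(w . x + b)."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def vsl_like_score(seq, i, long_model, short_model, meta_model, half=10):
    """Blend long-DR and short-DR disorder probabilities with a meta-predictor weight.

    Each model is a (weights, bias) tuple of placeholder parameters.
    """
    x = features(seq, i, half)
    p_long = logistic(*long_model, x)    # probability from the long-DR predictor
    p_short = logistic(*short_model, x)  # probability from the short-DR predictor
    m = logistic(*meta_model, x)         # meta weight assigned to the long-DR predictor
    return m * p_long + (1.0 - m) * p_short


if __name__ == "__main__":
    zero = ([0.0] * NUM_FEATURES, 0.0)  # untrained placeholder model
    print(vsl_like_score("MKVLAAGIEEKSPE", 5, zero, zero, zero))  # 0.5
```

Because the meta output lies in (0, 1), the blended score is a convex combination of the two component probabilities and therefore stays a valid probability; in a trained system all three models (and, for VSL2, a linear SVM in place of logistic regression) would be fitted on labeled order/disorder data.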