eLife. 2018 Feb 9;7:e31486. doi: 10.7554/eLife.31486

Appendix 1—table 6. Retrospective analysis of predictor quality at different stages during the training process.

AUC values for distinguishing proteomic phase-separating sequences from the human proteome are shown for prediction scores derived from pi-contact frequencies (average contacts predicted per residue) obtained at each training step of the protocol, listed in order of their sequential development. Each prediction score is calculated as the highest number of contacts predicted for any 100-residue window in the sequence. The relative effects of different contact types were assessed by excluding contacts from each score and retesting. The standard error of the mean (SEM), estimated by bootstrap analysis, is consistently in the range 0.021 to 0.039.

| Training step | AUC at training step | Sidechain contacts only | Backbone contacts only | Short-range sidechain only | Long-range sidechain only | Short-range backbone only | Long-range backbone only |
| --- | --- | --- | --- | --- | --- | --- | --- |
| (1) Baseline Frequencies | 0.57 | 0.51 | 0.84 | 0.52 | 0.50 | 0.73 | 0.80 |
| (2) Context-Averaged Frequencies | 0.57 | 0.51 | 0.86 | 0.53 | 0.51 | 0.77 | 0.83 |
| (3) Smoothed Frequency Predictions | 0.82 | 0.64 | 0.89 | 0.59 | 0.65 | 0.71 | 0.85 |
| (4) Weight Optimized Final Predictor | 0.88 | N/A | N/A | N/A | N/A | N/A | N/A |
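
The scoring and evaluation described in the table legend can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`max_window_score`, `auc`, `bootstrap_sem`) are hypothetical, and several choices are assumptions. Specifically, the window score is taken as the highest mean per-residue contact frequency over any 100-residue window, sequences shorter than the window are scored as a single window, AUC is computed by direct pairwise comparison, and the bootstrap standard error is taken as the standard deviation of AUC over resampled sets.

```python
import random
from statistics import stdev


def max_window_score(per_residue, window=100):
    """Highest mean predicted contact frequency over any `window`-residue stretch."""
    n = len(per_residue)
    if n == 0:
        raise ValueError("empty sequence")
    # Assumption: a sequence shorter than the window is scored as one window.
    w = min(window, n)
    current = sum(per_residue[:w])
    best = current
    # Slide the window one residue at a time, updating the running sum.
    for i in range(w, n):
        current += per_residue[i] - per_residue[i - w]
        best = max(best, current)
    return best / w


def auc(pos_scores, neg_scores):
    """Probability that a positive outscores a negative (ties count as 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def bootstrap_sem(pos_scores, neg_scores, n_boot=200, seed=0):
    """Standard deviation of AUC over bootstrap resamples of both sets."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        # Resample each set with replacement, keeping its original size.
        pos_sample = [rng.choice(pos_scores) for _ in pos_scores]
        neg_sample = [rng.choice(neg_scores) for _ in neg_scores]
        replicates.append(auc(pos_sample, neg_sample))
    return stdev(replicates)
```

In this sketch, each sequence's per-residue predictions would be reduced to one score with `max_window_score`, and the scores of the phase-separating set and the proteome set would then be passed to `auc` and `bootstrap_sem`. Excluding a contact type, as in the table columns, would correspond to recomputing the per-residue values without those contacts before scoring.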