2018 Feb 9;7:e31486. doi: 10.7554/eLife.31486

Appendix 1—table 6. Retrospective analysis of predictor quality at different stages during the training process.

AUC values for distinguishing proteomic phase-separating sequences from the human proteome are shown for prediction scores derived from pi-contact frequencies (average contacts predicted per residue) obtained at each training step of the protocol, listed in the order in which the steps were developed. The prediction score for each sequence is the highest number of contacts predicted for any 100-residue window in that sequence. The relative effects of different contact types were analyzed by excluding contacts from each score and retesting, so that each remaining column reports the AUC obtained when only the named contact type is retained. The standard error of the mean (SEM), estimated by bootstrap analysis, is consistently in the range 0.021 to 0.039.

Training step	AUC at training step	Sidechain contacts only	Backbone contacts only	Short-range sidechain only	Long-range sidechain only	Short-range backbone only	Long-range backbone only
(1) Baseline Frequencies	0.57	0.51	0.84	0.52	0.50	0.73	0.80
(2) Context-Averaged Frequencies	0.57	0.51	0.86	0.53	0.51	0.77	0.83
(3) Smoothed Frequency Predictions	0.82	0.64	0.89	0.59	0.65	0.71	0.85
(4) Weight Optimized Final Predictor	0.88	N/A	N/A	N/A	N/A	N/A	N/A
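
The scoring and evaluation described in the caption can be illustrated with a minimal Python sketch. It is not the authors' code. The function names (`max_window_score`, `auc`, `bootstrap_sem`), the handling of sequences shorter than the window, the rank-based AUC computation, and the resampling scheme are all assumptions made for illustration; the source states only that scores are the highest contact count in any 100-residue window and that the SEM was obtained by bootstrap analysis.

```python
import random
import statistics


def max_window_score(per_residue, window=100):
    """Highest summed per-residue contact prediction over any contiguous window.

    Assumption: sequences shorter than the window are scored over their
    full length, since the source does not say how they are handled.
    """
    n = len(per_residue)
    if n == 0:
        return 0.0
    w = min(window, n)
    current = sum(per_residue[:w])
    best = current
    # Slide the window one residue at a time, updating the running sum.
    for i in range(w, n):
        current += per_residue[i] - per_residue[i - w]
        best = max(best, current)
    return best


def auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def bootstrap_sem(pos_scores, neg_scores, n_boot=200, seed=0):
    """Standard deviation of AUC over bootstrap resamples of both score sets.

    Assumption: positives and negatives are resampled independently with
    replacement; the source does not specify the bootstrap procedure.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        p = [rng.choice(pos_scores) for _ in pos_scores]
        q = [rng.choice(neg_scores) for _ in neg_scores]
        estimates.append(auc(p, q))
    return statistics.stdev(estimates)


if __name__ == "__main__":
    # Toy per-residue contact predictions (hypothetical values, not from the paper).
    phase_separating = [[0.9] * 150, [0.7] * 120 + [0.1] * 80]
    background = [[0.2] * 300, [0.4] * 90, [0.1] * 500]
    pos = [max_window_score(s) for s in phase_separating]
    neg = [max_window_score(s) for s in background]
    print("AUC:", auc(pos, neg))
    print("bootstrap SEM:", bootstrap_sem(pos, neg))
```

The pairwise AUC is quadratic in the number of sequences, which is adequate for a sketch but would likely be replaced by a rank-based calculation for a proteome-scale background set. The "only" columns of the table would correspond to calling the same functions on per-residue values computed from a single contact type.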