Figure 1. The data processing pipeline of LAceP.
The dataset was derived from SysPTM 2.0 (http://lifecenter.sgst.cn/SysPTM/) and PhosphoSitePlus (http://www.phosphosite.org/). Redundant entries were eliminated to obtain a set of non-redundant sites. An independent dataset was first selected at random from the positive and negative datasets. The remaining positive items were then combined with an equal number of negative items, selected at random from the whole negative dataset, to construct a training dataset. This selection process was repeated 10 times. After three types of features were encoded, the logistic regression algorithm was used to build the classifier. The best model was chosen after parameter optimization and performance evaluation. Finally, a LAceP web server was established so that biologists can use the prediction model.
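
The sampling scheme in this caption can be illustrated with a minimal sketch. It is not the authors' code: the function name `build_datasets`, the hold-out sizes `n_indep_pos` and `n_indep_neg`, and the fixed `seed` are hypothetical, since the caption does not state how large the independent dataset is. The sketch also assumes that negatives for training are drawn from the negatives left after the hold-out, so that the independent dataset stays disjoint from every training dataset; the caption itself says only that they come from the whole negative dataset.

```python
import random


def build_datasets(positives, negatives, n_indep_pos, n_indep_neg,
                   n_rounds=10, seed=0):
    """Hold out an independent set, then build balanced training sets.

    All parameter names and sizes are illustrative assumptions.
    """
    rng = random.Random(seed)

    # Shuffle copies so the caller's lists are left untouched.
    pos = list(positives)
    neg = list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)

    # Step 1: randomly selected independent dataset.
    independent = {"pos": pos[:n_indep_pos], "neg": neg[:n_indep_neg]}

    # Step 2: the remaining positives are reused in every round.
    train_pos = pos[n_indep_pos:]
    neg_pool = neg[n_indep_neg:]  # assumption: hold-out negatives excluded
    if len(neg_pool) < len(train_pos):
        raise ValueError("not enough negatives to balance the positives")

    # Step 3: repeat the negative sampling n_rounds times (10 in the caption).
    training_sets = []
    for _ in range(n_rounds):
        sampled_neg = rng.sample(neg_pool, len(train_pos))
        training_sets.append({"pos": list(train_pos), "neg": sampled_neg})

    return independent, training_sets


# Toy site identifiers, purely for illustration.
toy_pos = [f"P{i}" for i in range(20)]
toy_neg = [f"N{i}" for i in range(200)]
independent, training_sets = build_datasets(toy_pos, toy_neg, 5, 5)
```

Each of the resulting training sets contains the same positive items and a different random draw of negatives, so a classifier such as logistic regression can be trained once per set and the results compared or averaged.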