2022 Sep 28;23(Suppl 3):399. doi: 10.1186/s12859-022-04938-x

Fig. 5.

The overall framework of this study. The top panel outlines the process of constructing the eukaryotic and prokaryotic datasets. In total, 560 experimentally verified pHis sites were manually collected as positive samples, and 7233 non-pHis sites from the same proteins were extracted as negative samples. Based on the local sequences (31 aa) flanking His sites, BLASTCLUST and CD-HIT were used to reduce data redundancy. The bottom panel illustrates the detailed procedure for constructing pHisPred. Five window sizes were used to extract local sequences flanking His sites. For each window size, tens of thousands of features were calculated, and features with constant values were removed. Based on the performance evaluation, the optimal combinations of window size, feature number, and model were selected separately to build the eukaryotic and prokaryotic classification models in pHisPred.
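The two mechanical steps named in the caption, extracting a fixed-size local sequence around each His residue and discarding features with constant values, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `extract_his_windows` and `drop_constant_features` are hypothetical, and padding with `X` near the protein termini is an assumption, since the caption does not say how truncated windows were handled. The default of 31 residues matches the local sequence length given in the caption; the five window sizes used for model building are not listed there, so none are assumed here.

```python
from typing import List, Sequence, Tuple

PAD = "X"  # assumed padding symbol for positions beyond the protein termini


def extract_his_windows(sequence: str, window: int = 31,
                        pad: str = PAD) -> List[Tuple[int, str]]:
    """Return (1-based position, local sequence) for every His in `sequence`.

    Each local sequence has exactly `window` residues and is centered on
    the His residue; positions past either terminus are filled with `pad`.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    padded = pad * half + sequence + pad * half
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "H":
            # In `padded`, the residue at index i of `sequence` sits at
            # index i + half, so the slice [i, i + window) is centered on it.
            windows.append((i + 1, padded[i:i + window]))
    return windows


def drop_constant_features(
    rows: Sequence[Sequence[float]],
) -> Tuple[List[List[float]], List[int]]:
    """Remove feature columns that take a single value across all samples.

    Returns the filtered rows and the indices of the columns that were kept.
    """
    if not rows:
        return [], []
    n_features = len(rows[0])
    keep = [j for j in range(n_features)
            if len({row[j] for row in rows}) > 1]
    filtered = [[row[j] for j in keep] for row in rows]
    return filtered, keep
```

For example, `extract_his_windows("MKHAAHG", window=5)` yields one padded 5-residue window per His residue, and `drop_constant_features` can then be applied to whatever feature matrix is computed from those windows.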