EPuL: An Enhanced Positive-Unlabeled Learning Algorithm for the Prediction of Pupylation Sites

Molecules. 2017 Sep 5;22(9):1463. doi: 10.3390/molecules22091463

Algorithm 1. An enhanced positive-unlabeled learning algorithm.
Input	P—Positive training set; U—Unlabeled training set; $\partial$ —The distance coefficient; $V_{s_{i}}$ —Sequence $s_{i}$ in P and U;
	Model_1,2,3,4,5—Five models, each trained on P together with one of the five subsets;
	N_1,2,3,4,5—Five negative sets predicted by Model_1,2,3,4,5, respectively, on the remaining unlabeled training set; cs—Common sequences of the five negative sets N_1,2,3,4,5; $N_{sv}$—Negative support vectors of the five models Model_1,2,3,4,5;
Output	F—Final classifier.
Stage 0:	Initialization
	l $\leftarrow$ 0; Avg_dist = 0; LN = ∅; RN = ∅; i
Stage 1:	Select the reliably negative initial set
	pr = $\sum_{i=1}^{|P|} V_{s_{i}} / |P|$;
	Avg_dist += $\sum_{i=1}^{|U|} \mathrm{dist}(pr, V_{s_{i}}) / |U|$;
	FOR i from 1 to |U|
	IF dist(pr, $V_{s_{i}}$) > Avg_dist * $\partial$
	LN = LN ∪ {s_i};
	END IF
	END FOR
	Randomly divide LN into five subsets D₁, D₂, D₃, D₄, D₅.
	FOR i from 1 to 5
	Model_i = SVM(P, D_i); N_i = Model_i(U − LN);
	END FOR
	The sequences common to all five negative sets form the initial reliably negative set: cs = N₁ ∩ N₂ ∩ N₃ ∩ N₄ ∩ N₅; RN⁰ = RN⁰ ∪ cs. The negative support vectors $N_{sv}$ of the five models are then added: $RN^{0} = RN^{0} \cup N_{sv}$.
Stage 2:	Expand the reliably negative set
	WHILE TRUE
	IF $|U^{l}|$ > 5 ∗ |P|
	$U^{l + 1}$ = $U^{l}$ − $N_{pred}^{l}$;
	$RN^{l + 1}$ = $N_{pred}^{l} \cup N_{sv}^{l}$;
	ELSE IF $|U^{l}|$ < 5 ∗ |P|
	Go to Stage 3
	END IF
	Train an SVM classifier $f^{l + 1}$ on P $\cup$ $RN^{l + 1}$ with the optimal parameters C and γ.
	Each sequence x_i in $U^{l + 1}$ receives a decision value $f^{l + 1}(x_i)$ from the trained classifier; the threshold T is applied to these decision values to obtain the reliably negative set.
	l $\leftarrow$ l + 1
Stage 3:	Return the final classifier
	Return F = (P, RN)
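
The distance-based selection at the start of Stage 1 can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that dist is the Euclidean distance and that every sequence has already been encoded as a numeric feature vector. The function names `centroid`, `select_likely_negatives`, and `split_into_subsets`, the `seed` argument, and the toy data are hypothetical and introduced only for illustration. The SVM training steps of Stages 1 to 3 are not covered.

```python
import math
import random


def centroid(vectors):
    """Component-wise mean of equal-length feature vectors (pr in Algorithm 1)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def select_likely_negatives(P, U, coeff):
    """Return indices of vectors in U whose distance to the positive centroid
    exceeds coeff times the average distance over U.

    Assumption: dist is the Euclidean distance.
    """
    pr = centroid(P)
    dists = [math.dist(pr, u) for u in U]
    avg_dist = sum(dists) / len(dists)
    return [i for i, d in enumerate(dists) if d > avg_dist * coeff]


def split_into_subsets(items, k=5, seed=0):
    """Randomly partition items into k subsets of nearly equal size.

    The seed argument is a hypothetical convenience for reproducibility.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


if __name__ == "__main__":
    # Toy data: two positive vectors and six unlabeled vectors.
    P = [[0.0, 0.0], [1.0, 1.0]]
    U = [[0.5, 0.5], [1.0, 0.0], [5.0, 5.0], [6.0, 6.0], [0.0, 1.0], [7.0, 7.0]]
    LN = select_likely_negatives(P, U, coeff=1.0)
    print("LN indices:", LN)
    print("Subsets:", split_into_subsets(LN, k=5))
```

In this sketch a larger distance coefficient keeps only the unlabeled vectors farthest from the positive centroid, so LN shrinks as the coefficient grows. Each of the five subsets would then be paired with P to train one of the five models.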