2017 Sep 5;22(9):1463. doi: 10.3390/molecules22091463
Algorithm 1. An enhanced positive-unlabeled learning algorithm.
Input P—Positive training set; U—Unlabeled training set;
α—The distance coefficient; Vsi—Sequence si in P and U;
Model1, …, Model5—Five models, each trained on P and one of five subsets;
N1, …, N5—Five negative sets predicted by Model1, …, Model5, respectively, on the remaining unlabeled training set;
cs—Common sequences of the five negative sets N1, …, N5;
Nsv—Negative support vectors of the five models Model1, …, Model5
Output F—Final classifier.
Stage 0: Initialization
l ← 0; Avg_dist = 0; LN = ∅; RN = ∅; i
Stage 1: Select the reliably negative initial set
pr = (∑_{i=1}^{|P|} Vsi) / |P|;
Avg_dist += (∑_{i=1}^{|U|} dist(pr, Vsi)) / |U|;
FOR i from 1 to |U|
IF dist(pr, Vsi) > Avg_dist * α
LN = LN ∪ {si};
END IF
END FOR
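The distance filter above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each sequence is assumed to be already encoded as a numeric feature vector, dist is assumed to be Euclidean distance, and the names `centroid`, `select_likely_negatives`, and `alpha` are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of equal-length feature vectors (pr in Stage 1)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def select_likely_negatives(P, U, alpha):
    """Return indices of vectors in U that lie farther from the positive
    centroid than alpha times the average distance (the set LN)."""
    pr = centroid(P)
    dists = [math.dist(pr, v) for v in U]  # Euclidean distance is an assumption
    avg_dist = sum(dists) / len(dists)
    return [i for i, d in enumerate(dists) if d > avg_dist * alpha]

# Toy data: the positive centroid is (1, 0); distances from it are 0, 3, 9.
P = [[0.0, 0.0], [2.0, 0.0]]
U = [[1.0, 0.0], [1.0, 3.0], [1.0, 9.0]]
LN = select_likely_negatives(P, U, alpha=1.0)
print(LN)  # [2]
```

A smaller distance coefficient admits more of U into LN; with `alpha=0.5` the same toy data yields indices 1 and 2.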
Randomly divide the LN into five subsets D1, D2, D3, D4, D5.
FOR i from 1 to 5
Modeli = SVM(P, Di);
Ni = Modeli(U − LN);
END FOR
The sequences common to all five negative sets form the initial reliably negative set:
cs = N1 ∩ N2 ∩ N3 ∩ N4 ∩ N5; RN0 = RN0 ∪ cs;
The negative support vectors Nsv of the five models are then added: RN0 = RN0 ∪ Nsv.
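The five-way split and the set algebra that builds RN0 can be illustrated with plain Python sets. This is a hedged sketch only: the SVM training step is omitted, the sequence identifiers and the contents of `N` and `Nsv` are made up for illustration, and the function names `split_into_five` and `initial_reliable_negatives` are hypothetical.

```python
import random

def split_into_five(items, seed=0):
    """Shuffle items and deal them into five near-equal subsets D1..D5."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[k::5] for k in range(5)]

def initial_reliable_negatives(negative_sets, support_vector_sets):
    """cs is the intersection of the five negative sets; RN0 is cs plus
    the negative support vectors of all five models."""
    cs = set.intersection(*map(set, negative_sets))
    nsv = set().union(*support_vector_sets)
    return cs | nsv

subsets = split_into_five(range(12))

# Hypothetical outputs of the five models (identifiers are illustrative only).
N = [{"s1", "s2", "s3"}, {"s1", "s2"}, {"s1", "s2", "s4"},
     {"s1", "s2", "s3"}, {"s1", "s2", "s5"}]
Nsv = [{"s7"}, set(), {"s8"}, {"s7"}, set()]

RN0 = initial_reliable_negatives(N, Nsv)
print(sorted(RN0))  # ['s1', 's2', 's7', 's8']
```

Only sequences that every one of the five models labels negative survive the intersection, which is what makes the initial set conservative.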
Stage 2: Expand the reliably negative set
WHILE TRUE
IF |Ul| > 5 ∗ |P|
Ul+1 = Ul − Npredl;
RNl+1 = Npredl ∪ Nsvl;
ELSE IF |Ul| < 5 ∗ |P|
Go to Stage 3
END IF
Train an SVM classifier fl+1 on P ∪ RNl+1 with the optimal parameters C and γ.
Compute the decision value fl+1(xi) for each sequence xi in Ul+1 using the trained classifier, then apply the threshold T to obtain the reliably negative set.
l ← l + 1
END WHILE
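One possible reading of the Stage 2 loop is sketched below in Python. Several points are assumptions rather than statements from the source: the classifier is passed in as a hypothetical `train` callable standing in for SVM fitting with tuned C and γ, a decision value below T is taken to mean "negative", each pass trains and scores before updating the sets, the loop also stops when |U| equals the size limit, and an early exit is added for the case where no new negatives are found. `toy_train` is a one-dimensional stand-in used only to make the example runnable.

```python
def expand_reliable_negatives(P, U, RN, train, T, ratio=5):
    """Shrink U and rebuild RN until |U| is no larger than ratio * |P|.

    train(P, RN) must return (f, nsv): a scoring function and the set of
    negative support vectors. Both are hypothetical stand-ins for an SVM.
    """
    U, RN = set(U), set(RN)
    while len(U) > ratio * len(P):
        f, nsv = train(P, RN)
        n_pred = {x for x in U if f(x) < T}  # assumed threshold direction
        if not n_pred:                       # added safeguard, not in the source
            break
        U -= n_pred                          # U_{l+1} = U_l minus Npred_l
        RN = n_pred | nsv                    # RN_{l+1} = Npred_l plus Nsv_l
    return RN, U

def toy_train(P, RN):
    """Toy 1-D 'classifier': boundary halfway between the two classes."""
    boundary = (min(P) + max(RN)) / 2
    return (lambda x: x - boundary), {max(RN)}

RN, U = expand_reliable_negatives({10, 11}, range(2, 10), {0, 1},
                                  toy_train, T=-1.5, ratio=2)
print(sorted(RN), sorted(U))  # [4, 5] [6, 7, 8, 9]
```

In this reading RN is rebuilt on every pass from the newly predicted negatives and the previous model's negative support vectors, so older reliable negatives that are not support vectors drop out of the training set.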
Stage 3: Return the final classifier
Return F = (P, RN)