| Algorithm 1. An enhanced positive-unlabeled learning algorithm. | |
| Input |
P—Positive training set; U—Unlabeled training set; —The distance coefficient; —Sequence in P and U; |
| Model1,2,3,4,5—Five models trained by five subsets with P respectively; | |
|
N1,2,3,4,5—Five negative sets predicted by Model1,2,3,4,5 on the remaining unlabeled training set respectively; cs—Common sequences of five negative sets N1,2,3,4,5 —Negative support vectors of five Model1,2,3,4,5 | |
| Output | F—Final classifier. |
| Stage 0: | Initialization |
| l ← 0; Avg_dist = 0; LN = ∅; RN = ∅; i | |
| Stage 1: | Select the reliably negative initial set |
| pr = ; | |
| Avg_dist + = ; | |
| FOR i from 1 to |U| | |
| IF dist(pr, Si) > Avg_dist * | |
| LN = LN∪{Si}; | |
| END IF | |
| END FOR | |
| Randomly divide LN into five subsets D1, D2, D3, D4, D5. | |
| FOR i from 1 to 5 | |
| Modeli = SVM(P, Di); Ni = Modeli(U − LN); | |
| END FOR | |
| The common sequences form the initial reliably negative set: cs = N1 ∩ N2 ∩ N3 ∩ N4 ∩ N5; RN0 = RN0 ∪ cs. The negative support vectors of the five models are then included in = . | |
| Stage 2: | Expand the reliably negative set |
| WHILE TRUE | |
| IF Ul > 5∗|P| | |
| = −; | |
| = ; | |
| ELSE IF Ul < 5 ∗ |P| | |
| Go to Stage 3 | |
| END IF | |
| Train an SVM classifier fl+1 on P ∪ RNl+1 with the optimal parameters C and γ. | |
| Each sequence xi in Ul+1 receives a decision value f(xi) from the trained fl+1; apply the threshold T to obtain the reliably negative set. | |
| l ← l + 1 | |
| END WHILE | |
| Stage 3: | Return the final classifier |
| Return F = (P, RN) | |
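
Stage 1 of Algorithm 1 can be illustrated with a minimal Python sketch. Several details are assumptions, because the corresponding formulas are not legible in the listing: `pr` is taken to be the centroid of P, `Avg_dist` the mean Euclidean distance from `pr` to the sequences in U, and `alpha` is a hypothetical name for the distance coefficient. To keep the sketch dependency-free, a nearest-centroid rule stands in for the SVM of the algorithm; it is not the classifier the algorithm specifies. All function names are illustrative.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def likely_negatives(P, U, alpha):
    """Indices of sequences in U lying farther than Avg_dist * alpha from pr."""
    pr = centroid(P)                        # assumption: pr is the mean of P
    dists = [math.dist(pr, s) for s in U]   # assumption: Euclidean distance
    avg_dist = sum(dists) / len(dists)      # assumption: mean distance over U
    return [i for i, d in enumerate(dists) if d > avg_dist * alpha]


def nearest_centroid_model(pos, neg):
    """Stand-in for SVM(P, Di): flags x as negative if it is closer to neg."""
    cp, cn = centroid(pos), centroid(neg)
    return lambda x: math.dist(x, cn) < math.dist(x, cp)


def initial_reliable_negatives(P, U, alpha, k=5, seed=0):
    """Return (LN, cs) as index lists into U, following Stage 1."""
    ln = likely_negatives(P, U, alpha)
    if len(ln) < k:
        raise ValueError("LN is too small to split into k non-empty subsets")
    ln_set = set(ln)
    rest = [i for i in range(len(U)) if i not in ln_set]   # U - LN

    shuffled = ln[:]
    random.Random(seed).shuffle(shuffled)
    subsets = [shuffled[j::k] for j in range(k)]           # D1 ... Dk

    cs = set(rest)
    for d in subsets:
        is_negative = nearest_centroid_model(P, [U[i] for i in d])
        # Ni: members of U - LN that Model_i predicts as negative
        cs &= {i for i in rest if is_negative(U[i])}
    return ln, sorted(cs)


P = [[0, 0], [1, 0], [0, 1], [1, 1]]
U = [[0.5, 0.5], [0.4, 0.6], [10, 10], [11, 10],
     [10, 11], [11, 11], [10, 12], [6, 6]]
ln, cs = initial_reliable_negatives(P, U, alpha=1.0)
print(ln, cs)
```

In this toy example the five far-away points form LN, and the intermediate point `[6, 6]` is the only remaining sequence that all five stand-in models agree is negative, so it alone enters `cs`. Raising `alpha` shrinks LN, which makes the initial negative set more conservative.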