Author manuscript; available in PMC: 2018 Aug 13.

Published in final edited form as: IEEE/ACM Trans Comput Biol Bioinform. 2017 Jun 6;15(4):1315–1324. doi: 10.1109/TCBB.2017.2712607

ALGORITHM 1.

Pipeline of the semi-supervised self-training clustering algorithm under low-rank representation

Input

X - The original data matrix;

λ - The control parameter of LRR;

dist_Z, dist_E - The distance metrics of K-means for clustering Z and E, respectively;

maxIterNum - The maximum number of iterations.

Output

l_Z - The clustering results, including the predicted labels of the unlabeled samples.

Step 1:

Perform LRR on original data matrix X.

\min_{Z, E} \|Z\|_{*} + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad X = XZ + E;

Re-arrange Z and E as Z = [\begin{matrix} Z_{l} & Z_{u} \end{matrix}] and E = [\begin{matrix} E_{l} & E_{u} \end{matrix}], respectively;

currentIterNum ← 0; // Counter of clustering iterations
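The LRR objective in Step 1 can be evaluated numerically for any candidate pair (Z, E). The sketch below is not a solver and is not the authors' implementation; it only computes the nuclear norm of Z, the ℓ2,1 norm of E (taken here as the sum of the Euclidean norms of the columns, the usual LRR convention) and the residual of the constraint X = XZ + E. The function names `l21_norm`, `lrr_objective`, and `constraint_residual` are hypothetical.

```python
import numpy as np

def l21_norm(E):
    """Sum of the Euclidean norms of the columns of E."""
    return float(np.sum(np.linalg.norm(E, axis=0)))

def lrr_objective(Z, E, lam):
    """Value of ||Z||_* + lam * ||E||_{2,1}."""
    nuclear = float(np.linalg.norm(Z, ord="nuc"))  # sum of singular values
    return nuclear + lam * l21_norm(E)

def constraint_residual(X, Z, E):
    """Frobenius norm of X - XZ - E; zero when the constraint holds."""
    return float(np.linalg.norm(X - X @ Z - E))
```

For example, Z equal to the identity with E equal to zero always satisfies the constraint, so it gives a simple feasible point against which the output of an LRR solver can be compared.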

Step 2:

Perform K-means algorithm on Z and E, respectively.

l_Z = K-means(Z,dist_Z); //Using (1) to determine the initial point of each cluster.

l_E = K-means(E,dist_E); // Using a method similar to (1) to determine the initial point of each cluster

// l_Z and l_E are the clustering results on Z and E, respectively;

currentIterNum ← currentIterNum + 1;
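Step 2 can be illustrated with a small K-means routine whose initial centers come from the labeled samples. Equation (1) is not reproduced in this algorithm box, so the seeding rule used here (the mean of the labeled columns of each class) is an assumption, as is the Euclidean distance standing in for dist_Z and dist_E. The name `seeded_kmeans` and its parameters are hypothetical, and samples are taken to be the columns of the matrix, matching Z = [Z_l Z_u].

```python
import numpy as np

def seeded_kmeans(M, labeled_idx, labeled_y, n_clusters, n_iter=100):
    """Lloyd's K-means on the columns of M, seeded from labeled columns.

    Assumptions: the initial center of cluster k is the mean of the labeled
    columns of class k (every class needs at least one labeled column), and
    distances are squared Euclidean.
    """
    P = np.asarray(M, dtype=float).T              # rows are samples
    labeled_idx = np.asarray(labeled_idx)
    labeled_y = np.asarray(labeled_y)
    centers = np.stack([P[labeled_idx[labeled_y == k]].mean(axis=0)
                        for k in range(n_clusters)])
    labels = None
    for _ in range(n_iter):
        # Squared distance from every sample to every center.
        d = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                  # assignments are stable
        labels = new_labels
        for k in range(n_clusters):
            if np.any(labels == k):                # keep center of empty cluster
                centers[k] = P[labels == k].mean(axis=0)
    return labels
```

Because cluster k is seeded from class k, the cluster indices returned for Z and for E refer to the same classes, which is what makes a direct comparison of the two label vectors meaningful. This sketch does not force labeled samples to keep their known labels.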

Step 3:

Select unlabeled samples to be treated as labeled ones in the next round of clustering.

S ← ∅;

FOR each unlabeled sample i in Z_u

Let l_{z_i} and l_{e_i} be the predicted labels of sample i according to the clustering results l_Z and l_E, respectively.

IF l_{z_i} = l_{e_i}

S ← S ∪{i}; //select an unlabeled sample

END IF

END FOR
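The selection rule of Step 3 reduces to an element-wise comparison of the two label vectors over the unlabeled indices. A minimal sketch, assuming l_Z and l_E are integer arrays indexed by sample and that the cluster indices of the two clusterings refer to the same classes; the name `select_agreeing` is hypothetical.

```python
import numpy as np

def select_agreeing(l_Z, l_E, unlabeled_idx):
    """Return the unlabeled sample indices whose two predicted labels agree."""
    l_Z = np.asarray(l_Z)
    l_E = np.asarray(l_E)
    unlabeled_idx = np.asarray(unlabeled_idx, dtype=int)
    agree = l_Z[unlabeled_idx] == l_E[unlabeled_idx]
    return unlabeled_idx[agree]
```

With l_Z = [0, 1, 1, 0, 1], l_E = [0, 1, 0, 0, 0] and unlabeled indices [2, 3, 4], only sample 3 receives the same label from both clusterings, so S = {3}.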

Step 4:

Decide whether to terminate the algorithm, and update Z and E.

IF S = ∅ or currentIterNum > maxIterNum:

RETURN l_Z;

ELSE

FOR each selected unlabeled sample i in S

move z_{i}^{u} from Z_u to Z_l;

move e_{i}^{u} from E_u to E_l;

END FOR

Z \leftarrow [\begin{matrix} Z_{l} & Z_{u} \end{matrix}]; //update Z

E \leftarrow [\begin{matrix} E_{l} & E_{u} \end{matrix}]; //update E

GOTO Step 2;

END IF
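Steps 2 to 4 together form a loop that can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: `cluster_fn` is a hypothetical callable standing in for the seeded K-means of Step 2, samples are the columns of Z and E, and instead of physically moving columns from Z_u to Z_l the sketch tracks the set of labeled indices, which has the same effect on the next round.

```python
import numpy as np

def self_train(Z, E, labeled_idx, labeled_y, cluster_fn, max_iter_num):
    """Steps 2-4: promote unlabeled samples whose two predicted labels agree.

    cluster_fn(M, labeled_idx, labeled_y) is a hypothetical callable that
    returns one cluster label per column of M.
    """
    n = Z.shape[1]
    labeled = dict(zip(map(int, labeled_idx), map(int, labeled_y)))
    current_iter_num = 0
    while True:
        idx = np.array(sorted(labeled))
        y = np.array([labeled[i] for i in idx])
        # Step 2: cluster Z and E separately.
        l_Z = cluster_fn(Z, idx, y)
        l_E = cluster_fn(E, idx, y)
        current_iter_num += 1
        # Step 3: unlabeled samples on which both clusterings agree.
        S = [i for i in range(n)
             if i not in labeled and l_Z[i] == l_E[i]]
        # Step 4: stop, or treat the selected samples as labeled.
        if not S or current_iter_num > max_iter_num:
            return l_Z
        for i in S:
            labeled[i] = int(l_Z[i])
```

The loop always terminates: each pass either returns or moves at least one sample into the labeled set, so it runs at most as many times as there are unlabeled samples, even before maxIterNum is reached.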