Author manuscript; available in PMC: 2017 Jul 24.

Published in final edited form as: Proc (IEEE Int Conf Healthc Inform). 2013 Dec 12;2013:66–73. doi: 10.1109/ICHI.2013.15

Algorithm 1.

Optimal Training Set Selection

1:	for all binary datasets B_1 to B_q do
2:	Move 20% of the positive examples and 20% of the negative examples from B_i to a validation dataset (V_i).
3:	Put the remaining positive examples into a smaller training dataset (STS_i).
4:	Score the remaining negative examples in B_i according to their similarity with positive examples.
5:	Initialize snapshot variable k = 1
6:	while B_i is not empty do
7:	Remove the top-scoring 10% of negative examples from B_i and add them to STS_i.
8:	Record the snapshot of the current training set, ${STS}_{i}^{k} = {STS}_{i}$.
9:	Build a binary classifier for the i-th code with training dataset ${STS}_{i}^{k}$ and record the F_1.5 score on V_i.
10:	k = k + 1
11:	Set the optimal training set OTS = the snapshot ${STS}_{i}^{k}$ with the highest F_1.5 score.
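
The steps of Algorithm 1 can be sketched in Python for a single binary dataset B_i. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `select_optimal_training_set`, `similarity`, and `fit_and_score` are hypothetical. The similarity measure and the classifier are passed in as callbacks because the algorithm box does not specify them. The sketch also assumes that "10%" means 10% of the initial pool of remaining negatives (so the loop ends after about ten snapshots) and that ties are broken in favour of the earliest snapshot. The toy one-dimensional data and the nearest-centroid classifier at the end are purely illustrative.

```python
import math
import random


def f_beta(tp, fp, fn, beta=1.5):
    """F-beta from confusion counts; beta > 1 weights recall above precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def select_optimal_training_set(pos, neg, similarity, fit_and_score,
                                val_frac=0.2, step_frac=0.1, seed=0):
    """Return (best_k, (train_pos, train_neg), best_score) for one dataset."""
    rng = random.Random(seed)
    pos, neg = list(pos), list(neg)
    rng.shuffle(pos)
    rng.shuffle(neg)

    # Step 2: hold out 20% of each class as the validation set V_i.
    n_vp, n_vn = int(len(pos) * val_frac), int(len(neg) * val_frac)
    val_pos, train_pos = pos[:n_vp], pos[n_vp:]   # Step 3: positives go to STS_i
    val_neg, pool = neg[:n_vn], neg[n_vn:]

    # Step 4: rank remaining negatives, most similar to the positives first.
    pool.sort(key=lambda x: similarity(x, train_pos), reverse=True)

    # Assumption: each round moves 10% of the *initial* pool of negatives.
    step = max(1, math.ceil(len(pool) * step_frac))

    sts_neg, best, k = [], (0, None, -1.0), 1      # Step 5
    while pool:                                    # Step 6
        # Step 7: move the top-scoring chunk of negatives into STS_i.
        sts_neg, pool = sts_neg + pool[:step], pool[step:]
        # Steps 8-9: train on the snapshot and score it on V_i.
        score = fit_and_score(train_pos, sts_neg, val_pos, val_neg)
        if score > best[2]:                        # Step 11, tracked on the fly
            best = (k, (list(train_pos), list(sts_neg)), score)
        k += 1                                     # Step 10
    return best


# Toy usage: 1-D examples, similarity = closeness to the positive centroid.
def toy_similarity(x, positives):
    return -abs(x - sum(positives) / len(positives))


def toy_fit_and_score(tr_pos, tr_neg, va_pos, va_neg):
    c_pos = sum(tr_pos) / len(tr_pos)
    c_neg = sum(tr_neg) / len(tr_neg)

    def predict(x):
        # Nearest-centroid rule: True means "predicted positive".
        return abs(x - c_pos) <= abs(x - c_neg)

    tp = sum(predict(x) for x in va_pos)
    fn = len(va_pos) - tp
    fp = sum(predict(x) for x in va_neg)
    return f_beta(tp, fp, fn)


data_rng = random.Random(1)
positives = [data_rng.gauss(1.0, 0.5) for _ in range(50)]
negatives = [data_rng.gauss(-1.0, 1.0) for _ in range(200)]
best_k, (sts_pos, sts_neg), best_score = select_optimal_training_set(
    positives, negatives, toy_similarity, toy_fit_and_score)
print(best_k, len(sts_pos), len(sts_neg), round(best_score, 3))
```

Tracking the best snapshot inside the loop avoids storing every ${STS}_{i}^{k}$. If the snapshots themselves are needed later, they could be appended to a list instead.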