Author manuscript; available in PMC: 2017 Jul 24.
Published in final edited form as: Proc (IEEE Int Conf Healthc Inform). 2013 Dec 12;2013:66–73. doi: 10.1109/ICHI.2013.15

Algorithm 1.

Optimal Training Set Selection

1: for all binary datasets B_1 to B_q do
2:   Move 20% of the positive examples and 20% of the negative examples from B_i to a validation dataset (V_i).
3:   Put the remaining positive examples into a smaller training dataset (STS_i).
4:   Score the remaining negative examples in B_i according to their similarity with the positive examples.
5:   Initialize the snapshot variable k = 1.
6:   while B_i is not empty do
7:     Remove the top 10% scored negative examples in B_i and add them to STS_i.
8:     Record the snapshot of the current training set, STS_i^k = STS_i.
9:     Build a binary classifier for the i-th code with training dataset STS_i^k and record its F_1.5 score on V_i.
10:     k = k + 1
11:   Set the optimal training set OTS = the snapshot STS_i^k with the highest F_1.5 score.
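
The body of the outer loop in Algorithm 1 can be sketched for a single binary dataset as follows. This is a minimal illustration, not the authors' implementation. The function names (f_beta, select_optimal_training_set, train_centroid, similarity), the random split, the toy one-dimensional data, and the nearest-centroid classifier are all assumptions introduced here. The sketch also assumes that "top 10%" means about 10% of the initial pool of scored negatives per iteration, so that the pool empties after roughly ten snapshots. The F_1.5 score is computed from the standard F_beta definition with beta = 1.5.

```python
import random

def f_beta(y_true, y_pred, beta=1.5):
    """Standard F_beta: (1 + b^2) * TP / ((1 + b^2) * TP + b^2 * FN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0

def select_optimal_training_set(positives, negatives, similarity, train,
                                holdout=0.2, step=0.1, beta=1.5, seed=0):
    """Sketch of steps 2-11 for one binary dataset B_i.

    similarity(x, positives) -> float and train(examples) -> predict(x)
    are caller-supplied (hypothetical) callbacks.
    """
    rng = random.Random(seed)
    pos, neg = list(positives), list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)

    # Step 2: hold out 20% of each class as the validation set V_i.
    n_vp, n_vn = int(holdout * len(pos)), int(holdout * len(neg))
    val = [(x, 1) for x in pos[:n_vp]] + [(x, 0) for x in neg[:n_vn]]
    y_true = [y for _, y in val]

    # Step 3: the remaining positives seed the smaller training set STS_i.
    train_pos = pos[n_vp:]
    sts = [(x, 1) for x in train_pos]

    # Step 4: rank remaining negatives, most similar to the positives first.
    pool = sorted(neg[n_vn:], key=lambda x: similarity(x, train_pos),
                  reverse=True)
    # Assumption: each iteration moves about 10% of the initial pool.
    chunk = max(1, round(step * len(pool)))

    best_score, best_snapshot, scores = -1.0, list(sts), []
    while pool:                                   # Step 6
        sts.extend((x, 0) for x in pool[:chunk])  # Step 7
        del pool[:chunk]
        snapshot = list(sts)                      # Step 8
        predict = train(snapshot)                 # Step 9
        score = f_beta(y_true, [predict(x) for x, _ in val], beta)
        scores.append(score)
        if score > best_score:                    # Step 11: keep the best snapshot
            best_score, best_snapshot = score, snapshot
    return best_snapshot, scores

# Toy usage with 1-D examples (illustrative data only).
def similarity(x, positives):
    # Closer to the mean of the positives means more similar.
    return -abs(x - sum(positives) / len(positives))

def train_centroid(examples):
    # Nearest-centroid classifier standing in for the real binary classifier.
    pos = [x for x, y in examples if y == 1]
    neg = [x for x, y in examples if y == 0]
    c_pos, c_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    return lambda x: 1 if abs(x - c_pos) <= abs(x - c_neg) else 0

toy_pos = [10.0 + 0.1 * i for i in range(10)]
toy_neg = [0.5 * i for i in range(20)]
ots, scores = select_optimal_training_set(toy_pos, toy_neg,
                                          similarity, train_centroid)
```

Running this once per binary dataset B_1 to B_q would give one optimal training set per code. Passing the scorer and the classifier in as callbacks keeps the selection loop independent of the particular similarity measure and learner.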