Author manuscript; available in PMC: 2019 Jun 6.
Published in final edited form as: Proceedings VLDB Endowment. 2018 Sep;11(13):2263–2276.

Algorithm 2.

Entropy-based Subset Selection

vis_x[i]: number of iterations in which row i was selected
iterations: number of iterations the user has gone through
impact[i][j]: number of times col_i is filled but col_j is missing
entropy_x[i]: entropy of row i
entropy_y^{rows}[i]: entropy of column i over the selected {rows}
I[x][i]: indicator function; 1 if a value is present in row x and column i, 0 otherwise
1: procedure GENERATE_SUBSET(C)
2:   X = C.rows, Y = C.columns
3:   temperature = 100
4:   for i = 1 → X.length do
5:     scores_x[i] ← iterations / vis_x[i] + entropy_x[i] + missing_x[i]
6:   end for
7:   rows ← select p rows with probability proportional to scores_x
8:   y ← select the column to optimize for, with probability proportional to its number of missing values
9:   for i = 1 → Y.length do
10:     scores_y[i] ← sim[y][i] · impact[y][i] · Σ_{x ∈ rows} I[x][i] + entropy_y^{rows}[i] × temperature
11:   end for
12:   columns = top q indices from scores_y
13:   return c = rows × columns
14: end procedure
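The bookkeeping terms defined for Algorithm 2 can be made concrete. The Python sketch below computes the indicator I[x][i] and the impact matrix from a table whose missing cells are None. It assumes that the "number of times" in the impact definition is counted over the rows of a single table snapshot; the paper's counting may instead accumulate over user iterations. The function names indicator and impact_matrix are hypothetical.

```python
from typing import List, Optional

Table = List[List[Optional[object]]]  # None marks a missing cell


def indicator(table: Table, x: int, i: int) -> int:
    """I[x][i]: 1 if row x has a value in column i, 0 otherwise."""
    return 0 if table[x][i] is None else 1


def impact_matrix(table: Table) -> List[List[int]]:
    """impact[i][j]: number of rows where column i is filled but column j is missing.

    Assumption: counts are taken over the rows of one table snapshot.
    """
    n_cols = len(table[0]) if table else 0
    impact = [[0] * n_cols for _ in range(n_cols)]
    for x in range(len(table)):
        for i in range(n_cols):
            for j in range(n_cols):
                if indicator(table, x, i) and not indicator(table, x, j):
                    impact[i][j] += 1
    return impact
```

For example, in the table [[1, None], [None, None], [3, 4]] only the first row has column 0 filled while column 1 is missing, so impact[0][1] is 1 and every other entry is 0.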
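A minimal Python sketch of GENERATE_SUBSET follows. It rests on several assumptions that Algorithm 2 does not state: the table C is a list of rows with None for missing cells; entropy_x[i] and entropy_y^{rows}[i] are taken to be Shannon entropies of the non-missing values; missing_x[i] is the count of missing cells in row i; vis_x[i] is floored at 1 so unseen rows do not divide by zero; rows are drawn without replacement; and sim and impact are supplied by the caller. The helper names (shannon_entropy, weighted_sample, generate_subset) are hypothetical, and the code is an illustration rather than the authors' implementation.

```python
import math
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Table = List[List[Optional[object]]]  # None marks a missing cell


def shannon_entropy(values: Sequence[Optional[object]]) -> float:
    """Assumed entropy measure: Shannon entropy (bits) of the non-missing values."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    n = len(present)
    return -sum((c / n) * math.log2(c / n) for c in Counter(present).values())


def weighted_sample(items: List[int], weights: List[float], k: int,
                    rng: random.Random) -> List[int]:
    """Draw up to k distinct items, each draw proportional to the remaining weights."""
    items, weights = list(items), list(weights)
    chosen = []
    for _ in range(min(k, len(items))):
        # Fall back to uniform sampling if every remaining weight is zero.
        w = weights if sum(weights) > 0 else None
        idx = rng.choices(range(len(items)), weights=w)[0]
        chosen.append(items.pop(idx))
        weights.pop(idx)
    return chosen


def generate_subset(table: Table, vis_x: List[int], iterations: int,
                    impact: List[List[int]], sim: List[List[float]],
                    p: int, q: int, temperature: float = 100.0,
                    rng: Optional[random.Random] = None) -> Tuple[List[int], List[int]]:
    rng = rng or random.Random()
    n_rows, n_cols = len(table), len(table[0])

    # Steps 4-6: score every row. vis_x[i] is floored at 1 (an assumption).
    scores_x = [
        iterations / max(vis_x[i], 1)
        + shannon_entropy(table[i])
        + sum(v is None for v in table[i])  # assumed missing_x[i]
        for i in range(n_rows)
    ]

    # Step 7: pick p distinct rows with probability proportional to scores_x.
    rows = sorted(weighted_sample(list(range(n_rows)), scores_x, p, rng))

    # Step 8: pick the target column y, weighted by its number of missing values.
    missing_y = [sum(table[x][i] is None for x in range(n_rows)) for i in range(n_cols)]
    y = weighted_sample(list(range(n_cols)), missing_y, 1, rng)[0]

    # Steps 9-11: score each column against y, using the selected rows only.
    scores_y = []
    for i in range(n_cols):
        filled = sum(table[x][i] is not None for x in rows)  # sum of I[x][i]
        col_entropy = shannon_entropy([table[x][i] for x in rows])
        scores_y.append(sim[y][i] * impact[y][i] * filled + col_entropy * temperature)

    # Step 12: keep the q highest-scoring column indices.
    columns = sorted(range(n_cols), key=lambda i: scores_y[i], reverse=True)[:q]

    # Step 13: the subset is the sub-table formed by these rows and columns.
    return rows, columns
```

Drawing rows one at a time with rng.choices and removing each pick keeps the p rows distinct while still favoring high-scoring rows; the resulting sub-table is returned as a pair of row and column index lists.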