Algorithm 1.
Outline of our approach, showing how cross-validation is used to estimate training-set prediction error and how multiple feature prediction models are combined. The running time depends on the supervised feature prediction algorithm(s) and is otherwise linear in the number of features and feature predictors.
input: N training examples X = {x1, x2, …, xN}, any number of test examples xq

for each feature i ∈ {1, 2, …, D} do
    for each feature prediction model p ∈ {1, 2, …, P} do
        Ap,i ← ∅   // Ap,i will be a set of training set
                   // (observed feature value, predicted feature value) pairs we can use to
                   // build an error model to estimate P(xqi | Cp,i(ρi(xq))) for any xq
        for each cross-validation fold f do
            Xtrain,f , Xval,f ← divide(X, f)   // divide X into a training set Xtrain,f
                                               // and a validation set Xval,f unique to this fold
            valsetf ← (ρi(xj), xji) for each xj ∈ Xval,f
            trainsetf ← (ρi(xj), xji) for each xj ∈ Xtrain,f
            Cp,i,f ← trainp(trainsetf)   // learn a feature i predictor using model p
            Ap,i ← Ap,i ∪ (xji, Cp,i,f(ρi(xj))) for each xj ∈ valsetf
        end for
        Ep,i ← error_model(Ap,i)   // model the distribution of errors in Ap,i
                                   // (the model type depends on the type of feature i, see text)
        trainset ← (ρi(xj), xji) for each xj ∈ X
        Cp,i ← trainp(trainset)   // the "final" predictor, trained on the entire
                                  // training set X and used to make test set predictions
    end for
end for
for each test example xq do
    // output the normalized surprisal score as the sum
    // over P feature prediction models;
    // P(xqi) depends on Cp,i(ρi(xq)) and Ep,i
    output: NS(xq), the normalized surprisal score for xq (defined in the text)
end for
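To make the control flow concrete, the following is a minimal Python sketch of Algorithm 1 for continuous features, assuming scikit-learn-style regressors. The helper names (rho, fit_error_model, train_predictors, score) and the Gaussian residual model are illustrative assumptions, not the paper's implementation; as the pseudocode notes, the error model depends on the type of feature i, and the sketch returns the raw surprisal sum, to which whatever normalization the text defines would still be applied.

    import numpy as np
    from sklearn.model_selection import KFold
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor

    def rho(X, i):
        # rho_i(x): the example(s) with feature i removed.
        return np.delete(X, i, axis=1)

    def fit_error_model(observed, predicted):
        # E_{p,i}: model the distribution of cross-validation errors.
        # A Gaussian over the residuals is an illustrative assumption;
        # the paper's error model depends on the type of feature i.
        residuals = np.asarray(observed) - np.asarray(predicted)
        return residuals.mean(), residuals.std() + 1e-9

    def surprisal(observed, predicted, error_model):
        # -log P(x_qi | C_{p,i}(rho_i(x_q)), E_{p,i}) under the
        # Gaussian residual assumption above.
        mu, sigma = error_model
        r = observed - predicted
        return 0.5 * ((r - mu) / sigma) ** 2 + np.log(sigma * np.sqrt(2.0 * np.pi))

    def train_predictors(X, models, n_folds=5):
        # Training phase of Algorithm 1: for every feature i and model p,
        # collect out-of-fold (observed, predicted) pairs A_{p,i}, fit an
        # error model E_{p,i}, then train the "final" C_{p,i} on all of X.
        N, D = X.shape
        C, E = {}, {}
        for i in range(D):
            for p, make_model in enumerate(models):
                obs, pred = [], []
                for train_idx, val_idx in KFold(n_splits=n_folds).split(X):
                    fold_model = make_model().fit(rho(X[train_idx], i),
                                                  X[train_idx, i])
                    obs.extend(X[val_idx, i])
                    pred.extend(fold_model.predict(rho(X[val_idx], i)))
                E[p, i] = fit_error_model(obs, pred)
                C[p, i] = make_model().fit(rho(X, i), X[:, i])
        return C, E

    def score(x_q, C, E, D, P):
        # Test phase: the raw surprisal sum over all D features and P models.
        return sum(surprisal(x_q[i],
                             C[p, i].predict(rho(x_q[None, :], i))[0],
                             E[p, i])
                   for i in range(D) for p in range(P))

    # Example usage (hypothetical data X, query x_q):
    # models = [LinearRegression, lambda: DecisionTreeRegressor(max_depth=4)]
    # C, E = train_predictors(X, models)
    # s = score(x_q, C, E, D=X.shape[1], P=len(models))

Note that collecting Ap,i from held-out validation folds, rather than from the fitted training data, keeps the error models honest: in-sample residuals would underestimate the true prediction error and so deflate the surprisal of genuinely anomalous feature values.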