
Algorithm 1.

Outline of our approach, showing the use of cross-validation to estimate training set error and the use of multiple prediction models. The running time depends on the supervised feature prediction algorithm(s) and is otherwise linear in the number of features and feature predictors.

input: N training examples 𝒳 = {x1, x2, …, xN}, any number of test examples xq
for each feature i ∈ {1, 2, …, D} do
  for each feature prediction model p ∈ {1, 2, …, P} do
    Ap,i ← ∅  // Ap,i will be a set of training-set
      // (observed feature value, predicted feature value) pairs we can use to
      // build an error model to estimate P(xqi | Cp,i(ρi(xq))) for any xq
    for each cross-validation fold f do
      𝒳ftrain, 𝒳fval ← divide(𝒳, f)  // divide 𝒳 into a training set 𝒳ftrain
        // and a validation set 𝒳fval, unique to this fold
      valsetf ← (ρi(xj), xji) for each xj ∈ 𝒳fval
      trainsetf ← (ρi(xj), xji) for each xj ∈ 𝒳ftrain
      Cp,i,f ← trainp(trainsetf)  // learn a feature i predictor using model p
      Ap,i ← Ap,i ∪ (xji, Cp,i,f(ρi(xj))) for each xj ∈ 𝒳fval
    end for
    Ep,i ← error_model(Ap,i)  // model the distribution of errors in Ap,i
      // (the model type depends on the type of feature i, see text)
    trainset ← (ρi(xj), xji) for each xj ∈ 𝒳
    Cp,i ← trainp(trainset)  // the “final” predictor trained on the entire
      // training set 𝒳, used to make test set predictions
  end for
end for
for each test example xq do
  // output the normalized surprisal score as the sum
  // over P feature prediction models.
  // P(xqi) depends on Cp,i(ρi(xq)) and Ep,i.
  output: $\sum_{p=1}^{P}\sum_{i=1}^{D}\begin{cases}0 & \text{if } x_{qi} \text{ is missing,}\\[4pt]\dfrac{\operatorname{surprisal}\bigl(P(x_{qi}\mid E_{p,i},\, C_{p,i}(\rho_i(x_q)))\bigr)}{\operatorname{entropy}(\{x_{1i},\ldots,x_{Ni}\})} & \text{otherwise}\end{cases}$
end for
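
To make the control flow concrete, here is a minimal sketch of the procedure above in Python, assuming discrete features and a single feature prediction model (P = 1). The surprisal of a probability p is taken as −log2 p and the normalizer is the Shannon entropy of the feature's training values, as in the output line; the 1-nearest-neighbour predictor, the Laplace-smoothed confusion-matrix error model, and the function names (`nn_predict`, `surprisal_scores`) are illustrative stand-ins, not the predictors or error models used in the paper.

```python
import numpy as np
from collections import Counter, defaultdict

def nn_predict(train_X, train_y, query):
    # Hypothetical stand-in for a feature predictor C_{p,i}: predict the held-out
    # feature of `query` from the remaining features by 1-nearest-neighbour
    # (Hamming distance over discrete features).
    dists = (train_X != query).sum(axis=1)
    return train_y[int(np.argmin(dists))]

def surprisal_scores(X, queries, n_folds=5, alpha=1.0, seed=0):
    # One anomaly score per query row: normalized surprisal summed over features.
    N, D = X.shape
    rng = np.random.default_rng(seed)
    scores = np.zeros(len(queries))
    for i in range(D):
        rest = np.delete(np.arange(D), i)              # rho_i: all features but i
        values = np.unique(X[:, i])
        # entropy({x_1i, ..., x_Ni}): Shannon entropy of feature i (the normalizer).
        probs = np.array([(X[:, i] == v).mean() for v in values])
        entropy = float(-(probs * np.log2(probs)).sum()) or 1.0
        # Cross-validation: collect (observed, predicted) pairs A_i.
        pairs = []
        for val_idx in np.array_split(rng.permutation(N), n_folds):
            tr_idx = np.setdiff1d(np.arange(N), val_idx)
            for j in val_idx:
                pred = nn_predict(X[tr_idx][:, rest], X[tr_idx, i], X[j, rest])
                pairs.append((X[j, i], pred))
        # Error model E_i: Laplace-smoothed P(observed | predicted).
        table = defaultdict(Counter)
        for obs, pred in pairs:
            table[pred][obs] += 1
        def p_obs_given_pred(obs, pred):
            c = table[pred]
            return (c[obs] + alpha) / (sum(c.values()) + alpha * len(values))
        # The "final" predictor, trained on all of X, scores the queries.
        for q, xq in enumerate(queries):
            pred = nn_predict(X[:, rest], X[:, i], xq[rest])
            scores[q] += -np.log2(p_obs_given_pred(xq[i], pred)) / entropy
    return scores

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.integers(0, 3, size=(60, 4))
    X[:, 1] = X[:, 0]                       # feature 1 always copies feature 0
    queries = np.array([[2, 2, 1, 0],       # respects the copy rule
                        [2, 0, 1, 0]])      # violates it -> higher score
    print(surprisal_scores(X, queries))
```

With several feature prediction models, the sketch would simply repeat the per-feature loop once per model p and add the resulting normalized surprisals into the same score, exactly as in the double sum of the output line.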