Author manuscript; available in PMC: 2021 Apr 7.
Published in final edited form as: Stat Anal Data Min. 2021 Feb 5;14(2):144–167. doi: 10.1002/sam.11498

Algorithm 2. Sidification

1: procedure Sidification($(X_i)_{i=1}^n$, $\delta = 1$)
2:  Translate each continuous feature so that it is positive and all continuous features share the same maximum value (note that the minimum value can differ across variables)
3:  Order the variables by range, with the largest range first. This applies only to continuous variables (factors are placed randomly at the end)
4:  Convert any categorical variable with more than two categories to a set of zero-one dummy variables, one per category
5:  Add δ to the first variable
6:  for each input variable, excluding the first, do
7:   Add δ plus the maximum of the previous input variable to the current variable
8:  end for
9:  $(X_i)_{i=1}^n$ have now been staggered to $(Y_i)_{i=1}^n = (\phi_n(X_i))_{i=1}^n$, the main SID features
10:  for all pairs of main SID features (from Line 9) do
11:   if a pair consists of two dummy variables then
12:    Interaction is a four-level factor, with one level for each combination of the two dummy variables
13:   else
14:    Create the interaction variable by multiplying the two features
15:   end if
16:  end for
17:  This yields $(Z_i)_{i=1}^n = (\psi_n(\phi_n(X_i)))_{i=1}^n$, the SID interaction features
18: end procedure
19: return $L_n^S = (Y_i, Z_i)_{i=1}^n = (\phi_n(X_i), \psi_n(\phi_n(X_i)))_{i=1}^n$, the sidified data
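
The steps of Algorithm 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the algorithm box does not settle. The function name `sidify` and its arguments are hypothetical. The common maximum in Line 2 is taken to be the largest range plus 1, which is one way to keep every shifted value positive. "The maximum of the previous input variable" in Line 7 is read as the maximum after that variable has itself been staggered, so that the ranges of successive features do not overlap. Dummy variables are assumed to be supplied already 0/1 coded (Line 4 is not performed) and are kept in the given order rather than placed randomly. The four-level factor is encoded as a string label built from the original 0/1 codes.

```python
from itertools import combinations


def sidify(continuous, dummies=(), delta=1.0):
    """Illustrative sketch of Algorithm 2 (hypothetical helper).

    continuous: list of columns (lists of numbers), one per continuous feature.
    dummies:    list of columns already coded 0/1 (assumption: Line 4 is done).
    Returns (staggered, interactions): the main SID features as a list of
    columns, and a dict mapping column-index pairs to interaction columns.
    """
    # Line 2: shift each continuous column so that all share one maximum and
    # every value is positive. The common maximum used here is an assumption.
    ranges = [max(c) - min(c) for c in continuous]
    common_max = max(ranges, default=0.0) + 1.0
    shifted = [[x - max(c) + common_max for x in c] for c in continuous]

    # Line 3: continuous columns ordered by range, largest first; dummy
    # columns are appended at the end (given order, not random).
    order = sorted(range(len(shifted)), key=lambda j: -ranges[j])
    columns = [shifted[j] for j in order] + [list(d) for d in dummies]
    is_dummy = [False] * len(shifted) + [True] * len(dummies)

    # Lines 5-8: add delta to the first column, then add delta plus the
    # maximum of the previous (already staggered) column to each later one.
    staggered = []
    offset = delta
    for col in columns:
        new_col = [x + offset for x in col]
        staggered.append(new_col)
        offset = delta + max(new_col)

    # Lines 10-16: a product for each pair of main SID features, except that
    # a pair of dummies becomes a four-level factor (one label per 0/1 combo).
    interactions = {}
    for a, b in combinations(range(len(staggered)), 2):
        if is_dummy[a] and is_dummy[b]:
            interactions[(a, b)] = [
                f"{u}:{v}" for u, v in zip(columns[a], columns[b])
            ]
        else:
            interactions[(a, b)] = [
                u * v for u, v in zip(staggered[a], staggered[b])
            ]
    return staggered, interactions
```

For example, `sidify([[1.0, 3.0], [10.0, 20.0]])` places the second input column first, because it has the larger range, and returns two staggered columns whose value ranges are disjoint, together with a single product interaction for the pair. Under the reading of Line 7 assumed here, every value of a later feature exceeds every value of an earlier one by at least δ.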