FDHE-IW: A Fast Approach for Detecting High-Order Epistasis in Genome-Wide Case-Control Studies

2018 Aug 29;9(9):435. doi: 10.3390/genes9090435

Algorithm 1: FDHE-IW

Inputs: D = (s₁, s₂, …, s_N, C), the given data set with N + 1 columns; s_i denotes the values of the ith SNP locus for all samples, and C denotes the disease status of all samples.
T: the candidate size, i.e., the number of candidate SNP combinations to collect; θ: the threshold of the G-test p-value; k: the number of SNPs in a k-way SNP combination; and K: the number of SNP combinations to search for from each seed SNP.
Outputs: SNP combinations (SC), the k-way SNP combinations that are associated with disease status.

(1)
Initialize: $S^{0} = \{s_{1}, s_{2}, \dots, s_{N}\}$, $SC = \emptyset$, $F = S^{0}$ // F is the set of SNPs that have not yet been selected.
(2)
Calculate $SU(s_{i}, C)$ for each SNP and initialize its weight $W(s_{i})$ to 1.

For i = 1 to N do

Calculate $S U (s_{i}, C), s_{i} \in S^{0}$

$W (s_{i}) \leftarrow 1$

End For
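Step (2) can be sketched in Python as follows. The excerpt does not define SU; the sketch assumes it is the symmetric uncertainty commonly used in information-theoretic feature selection, SU(X, C) = 2·I(X; C)/(H(X) + H(C)), with entropies estimated from genotype frequencies. The function names are illustrative, not taken from the paper.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def entropy(xs: Sequence[Hashable]) -> float:
    """Empirical Shannon entropy H(X) in bits."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())


def symmetric_uncertainty(x: Sequence[Hashable], c: Sequence[Hashable]) -> float:
    """Assumed SU(X, C) = 2 * I(X; C) / (H(X) + H(C)), a value in [0, 1]."""
    hx, hc = entropy(x), entropy(c)
    if hx + hc == 0:  # both variables constant: define SU as 0
        return 0.0
    mi = hx + hc - entropy(list(zip(x, c)))  # I(X; C) = H(X) + H(C) - H(X, C)
    return 2.0 * mi / (hx + hc)


def init_su_and_weights(snps: Sequence[Sequence[int]],
                        labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Step (2): SU(s_i, C) for every SNP and initial weights W(s_i) = 1."""
    su = [symmetric_uncertainty(s, labels) for s in snps]
    w = [1.0] * len(snps)
    return su, w


# Example: 4 samples, 2 SNPs; disease status 0 = control, 1 = case.
su, w = init_su_and_weights([[0, 1, 0, 1], [0, 0, 1, 1]], [0, 1, 0, 1])
print(su, w)  # -> [1.0, 0.0] [1.0, 1.0]
```

Under this assumption SU is 0 when a SNP is independent of C and 1 when the SNP and C determine each other, so SU × W ranks SNPs by marginal association before any interaction weighting has been applied.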

(3)
Search for k-way SNP combinations based on the interaction weight.
- (3.1)
  Select the SNP locus with the maximum $SU \times W$ value as the seed SNP $s_{a}$.
  
  $s_{a} \leftarrow \underset{s_{i} \in F}{\arg \max} S U (s_{i}, C) \times W (s_{i}), i = 1, 2, \dots, N$
- (3.2)
  Search for SNP combinations based on the interaction weight.
  
  m = 1
  
  While m ≤ K do // find K k-way SNP combinations based on $s_{a}$
  
  $S \leftarrow \emptyset$
  
  $S \leftarrow S \cup \{s_{a}\}$
  
  $F \leftarrow F \setminus \{s_{a}\}$
  
  $W(s_{a}) \leftarrow 0$
  
  While $|S| < k$ do // $|S|$ is the number of SNPs in S.
  
  For i = 1 to |F| do // |F| is the number of SNPs in F.
  
  $IW(s_{i}) \leftarrow IWF(s_{i}, s_{a}; C), s_{i} \in F$ // Calculate the interaction weight between $s_{i}$ and $s_{a}$.
  
  $W (s_{i}) \leftarrow W (s_{i}) \times I W (s_{i}), s_{i} \in F$ // Update the weight coefficient.
  
  // Calculate the relevance between $s_{i}$ and the phenotype C.
  
  $R(s_{i}, C) \leftarrow W(s_{i}) \times (1 + SU(s_{i}, C)), s_{i} \in F$
  
  End For
  
  $s_{a} \leftarrow \underset{s \in F}{\arg\max}\, R(s, C)$ // Select the SNP $s_{a}$ in F that has the maximum relevance with C.
  
  $S \leftarrow S \cup \{s_{a}\}$
  
  $F \leftarrow F \setminus \{s_{a}\}$ // Remove SNP $s_{a}$ from F.
  
  End While
  
  $SC \leftarrow SC \cup \{S\}$ // Store the found SNP combination S in SC as a candidate solution.
  
  m = m + 1
  
  End While
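A minimal Python sketch of one pass of step (3) follows. The excerpt does not define the interaction weight factor IWF, so the sketch assumes a form commonly used in interaction-weight-based feature selection: IWF(s_i, s_a; C) = 1 + I(s_i; s_a; C)/(H(s_i) + H(s_a)), where the three-way interaction information I(s_i; s_a; C) = I(s_i, s_a; C) - I(s_i; C) - I(s_a; C) is positive when the pair tells more about C jointly than separately. SU is assumed to be symmetric uncertainty. The pseudocode is followed literally, so each new combination starts from the SNP added last to the previous one; the check that F is not empty is an addition, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Set, Tuple


def entropy(xs: Sequence[Hashable]) -> float:
    """Empirical Shannon entropy in bits."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())


def mutual_info(x: Sequence[Hashable], c: Sequence[Hashable]) -> float:
    """I(X; C) = H(X) + H(C) - H(X, C)."""
    return entropy(x) + entropy(c) - entropy(list(zip(x, c)))


def symmetric_uncertainty(x: Sequence[Hashable], c: Sequence[Hashable]) -> float:
    """Assumed SU(X, C) = 2 * I(X; C) / (H(X) + H(C))."""
    denom = entropy(x) + entropy(c)
    return 2.0 * mutual_info(x, c) / denom if denom > 0 else 0.0


def iwf(si: Sequence[int], sa: Sequence[int], c: Sequence[int]) -> float:
    """Hypothetical IWF(s_i, s_a; C) = 1 + I(s_i; s_a; C) / (H(s_i) + H(s_a))."""
    denom = entropy(si) + entropy(sa)
    if denom == 0:
        return 1.0  # both SNPs constant: interaction information is 0
    joint = list(zip(si, sa))
    interaction = mutual_info(joint, c) - mutual_info(si, c) - mutual_info(sa, c)
    return 1.0 + interaction / denom


def search_round(snps: Sequence[Sequence[int]], labels: Sequence[int],
                 su: Sequence[float], w: List[float], F: Set[int],
                 K: int, k: int) -> List[Tuple[int, ...]]:
    """One pass of step (3): choose a seed from F, then grow K k-way combinations.

    w (weights) and F (indices of unselected SNPs) are mutated in place,
    mirroring the pseudocode. Returns the combinations found as index tuples.
    """
    found: List[Tuple[int, ...]] = []
    if not F:
        return found
    # (3.1) seed: argmax SU x W over F (sorted() resolves ties to the lowest index)
    a = max(sorted(F), key=lambda i: su[i] * w[i])
    for _ in range(K):  # m = 1, ..., K
        S = [a]
        F.discard(a)
        w[a] = 0.0
        while len(S) < k and F:  # added guard: the pseudocode assumes F never runs out
            for i in F:
                w[i] *= iwf(snps[i], snps[a], labels)  # W(s_i) <- W(s_i) * IW(s_i)
            # R(s_i, C) = W(s_i) * (1 + SU(s_i, C)); pick the most relevant SNP
            a = max(sorted(F), key=lambda i: w[i] * (1.0 + su[i]))
            S.append(a)
            F.discard(a)
        if len(S) < k:  # F was exhausted before S reached size k
            break
        found.append(tuple(S))
    return found
```

For example, with C = A XOR B, neither A nor B is marginally associated with C (both SU values are 0), but under the assumed definition IWF(B, A; C) = 1.5, which raises B's weight so that the pass pairs B with the seed A. This is the kind of purely epistatic pair that ranking by SU alone would miss.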
(4)
If the size of SC is less than T

go to step (3) to find new k-way SNP combinations that are associated with disease status.

End If
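The outer loop of step (4) can be sketched as below. `run_round` is a hypothetical callable standing in for one pass of step (3) (for example, a closure over the data, SU values, and weights); it receives the set F of unselected SNP indices, removes the SNPs it selects, and returns the combinations it found. Stopping when F is empty or a round finds nothing is an added safeguard: the pseudocode would otherwise loop forever once F runs out.

```python
from typing import Callable, List, Set, Tuple

Combo = Tuple[int, ...]


def collect_candidates(run_round: Callable[[Set[int]], List[Combo]],
                       n_snps: int, T: int) -> List[Combo]:
    """Repeat step (3) until SC holds at least T combinations (step (4))."""
    F: Set[int] = set(range(n_snps))  # F = S^0: every SNP index starts unselected
    sc: List[Combo] = []
    while len(sc) < T and F:          # added guard: stop once F is exhausted
        new = run_round(F)            # one pass of step (3); mutates F
        if not new:                   # added guard: no progress, avoid an endless loop
            break
        sc.extend(new)
    return sc


def take_two(F: Set[int]) -> List[Combo]:
    """Toy stand-in for step (3): pair the two lowest-indexed unselected SNPs."""
    if len(F) < 2:
        return []
    pair = tuple(sorted(F)[:2])
    F.difference_update(pair)
    return [pair]


print(collect_candidates(take_two, n_snps=7, T=2))  # -> [(0, 1), (2, 3)]
```

Because the size check happens only between rounds, SC can end up with slightly more than T combinations when a round contributes K of them, just as in the pseudocode.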
(5)
Statistical test

Perform a G-test for each SNP combination in SC.

Output the k-way SNP combinations with a p-value < θ.
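Step (5) can be sketched as below using only the Python standard library. The sketch assumes the usual G-test of independence on a contingency table whose rows are the observed joint genotypes of the k SNPs and whose columns are the disease classes, with G = 2 Σ O ln(O/E) and (rows - 1)(columns - 1) degrees of freedom; this excerpt does not state how the paper handles empty genotype rows or degrees of freedom. The chi-square tail probability uses the closed-form recurrence for integer degrees of freedom. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be >= 1")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2.
    if df % 2 == 1:
        nu, q = 1, math.erfc(math.sqrt(half))
    else:
        nu, q = 2, math.exp(-half)
    # Recurrence: Q(x; nu + 2) = Q(x; nu) + (x/2)^(nu/2) * e^(-x/2) / Gamma(nu/2 + 1)
    while nu < df:
        q += math.exp((nu / 2.0) * math.log(half) - half - math.lgamma(nu / 2.0 + 1.0))
        nu += 2
    return min(q, 1.0)


def g_test(snps: Sequence[Sequence[int]],
           labels: Sequence[int]) -> Tuple[float, float, int]:
    """G-test of independence between a SNP combination's joint genotype and C.

    Rows are the genotype combinations that occur in the data (empty rows are
    dropped); columns are the disease classes. Returns (G, p-value, df).
    """
    n = len(labels)
    combos = list(zip(*snps))  # joint genotype of each sample
    row_tot = Counter(combos)
    col_tot = Counter(labels)
    g = 0.0
    for (combo, y), obs in Counter(zip(combos, labels)).items():
        expected = row_tot[combo] * col_tot[y] / n
        g += 2.0 * obs * math.log(obs / expected)  # cells with O = 0 contribute 0
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    p = chi2_sf(g, df) if df > 0 else 1.0
    return g, p, df


def significant(sc: Sequence[Tuple[int, ...]], data: Sequence[Sequence[int]],
                labels: Sequence[int], theta: float) -> List[Tuple[int, ...]]:
    """Step (5): keep the combinations in SC whose G-test p-value is below theta."""
    return [combo for combo in sc
            if g_test([data[j] for j in combo], labels)[1] < theta]
```

If SciPy is available, `scipy.stats.chi2_contingency(table, correction=False, lambda_="log-likelihood")` should give the same G statistic for a table built this way; the hand-written version only keeps the sketch dependency-free.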