Identification of Insertion Deletion Mutations from Deep Targeted Resequencing

Author manuscript; available in PMC: 2014 Feb 7.

Published in final edited form as: J Data Mining Genomics Proteomics. 2013 Jul 2;4(3):1000132. doi: 10.4172/2153-0602.1000132

First step

Inputs: X, window sizes w_d, w_m, and w_f, threshold f_THRESH, sequence data S

Outputs: SUSPECT, the set of candidate indel locations

For k = w_f + 1, …, T − w_f, where T is the length of the reference sequence, compute the function f(k) as defined above.

Let SUSPECT ← {k : f(k) > f_THRESH}
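The first step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `find_suspects` is hypothetical, and the score f is passed in as a callable because its definition lies outside this section. Positions are 1-based, as in the pseudocode.

```python
from typing import Callable, List


def find_suspects(f: Callable[[int], float], T: int, w_f: int,
                  f_thresh: float) -> List[int]:
    """Return every position k in w_f+1 .. T-w_f (1-based, inclusive)
    whose score f(k) exceeds f_thresh.

    `f` stands in for the score function defined earlier in the paper;
    it is supplied by the caller and is an assumption of this sketch.
    """
    return [k for k in range(w_f + 1, T - w_f + 1) if f(k) > f_thresh]


# Toy usage with made-up scores (illustrative only).
toy_scores = {1: 9.0, 5: 3.0, 6: 0.5, 7: 2.1}
suspects = find_suspects(lambda k: toy_scores.get(k, 0.0),
                         T=10, w_f=2, f_thresh=1.0)
```

Position 1 is skipped in the toy example even though its score is high, because it lies within w_f of the start of the reference and f(k) is not evaluated there.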

Second step

Inputs: X, SUSPECT, S₀, M

Outputs: INDEL, the set of indel locations

INDEL ← EMPTY

For all k ∈ SUSPECT do

testseq ← X_{k−40}^{k+40}

For all y_i ∈ S do

Align y_i and testseq using Smith-Waterman (SW) algorithm.

end for

if more than M of the y_i align to testseq with fewer than 2 mismatches and with a shared indel interval [p, q], then

INDEL ← INDEL ∪ {[p, q]}

end if

end for
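The second step can be sketched as follows. This is a simplified illustration under stated assumptions rather than the authors' code: the function names, the scoring values (match 2, mismatch −3, linear gap −2), the 0-based coordinates, the `(kind, p, q)` labelling of intervals, and the rule that a read votes only when its alignment contains exactly one gap run are all choices made for this sketch. The text itself specifies only Smith-Waterman alignment, the ±40 window, the mismatch limit, and the support threshold M.

```python
from collections import Counter


def smith_waterman(read, ref, match=2, mismatch=-3, gap=-2):
    """Local alignment with a linear gap penalty (illustrative scores).

    Returns (mismatches, gaps), where gaps lists (kind, p, q) runs with
    0-based inclusive positions in `ref`.
    """
    n, m = len(read), len(ref)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best-scoring cell until the score drops to zero.
    i, j, mismatches, gaps, prev = bi, bj, 0, [], None
    while i > 0 and j > 0 and H[i][j] > 0:
        same = read[i - 1] == ref[j - 1]
        if H[i][j] == H[i - 1][j - 1] + (match if same else mismatch):
            mismatches += 0 if same else 1
            i, j, prev = i - 1, j - 1, None
        elif H[i][j] == H[i][j - 1] + gap:
            # Reference base ref[j-1] is absent from the read: deletion.
            if prev == "del":
                gaps[-1][1] = j - 1  # extend the current run leftwards
            else:
                gaps.append(["del", j - 1, j - 1])
            j, prev = j - 1, "del"
        else:
            # Read base has no reference partner: insertion before ref[j].
            if prev != "ins":
                gaps.append(["ins", j, j])
            i, prev = i - 1, "ins"
    return mismatches, [tuple(g) for g in reversed(gaps)]


def call_indels(X, suspect, reads, M, half_window=40, max_mismatches=2):
    """Report intervals supported by more than M reads (0-based positions in X)."""
    indel = set()
    for k in suspect:
        start = max(0, k - half_window)
        testseq = X[start:k + half_window + 1]
        votes = Counter()
        for y in reads:
            mism, gaps = smith_waterman(y, testseq)
            # Assumption: a read votes only if it carries exactly one gap run.
            if mism < max_mismatches and len(gaps) == 1:
                kind, p, q = gaps[0]
                votes[(kind, start + p, start + q)] += 1
        indel.update(iv for iv, count in votes.items() if count > M)
    return indel
```

With a linear gap penalty, a read spanning an indel needs enough matching flank on both sides for the gapped alignment to outscore either flank alone, so short reads or indels near a read end may go undetected in this sketch. In repetitive sequence the placement of a gap is not unique, and a production implementation would normalize gap positions before counting support.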