Author manuscript; available in PMC: 2014 Feb 7.
Published in final edited form as: J Data Mining Genomics Proteomics. 2013 Jul 2;4(3):1000132. doi: 10.4172/2153-0602.1000132
First step
 Inputs: X, window sizes wd, wm, and wf, threshold fTHRESH, sequence data S
 Outputs: SUSPECT, the set of candidate indel locations
 For k = wf + 1, …, T − wf, where T is the length of the reference sequence, compute the function f(k) as defined above.
 Let SUSPECT ← {k : f(k) > fTHRESH}
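The first step amounts to a single thresholded scan over the reference positions. The following is a minimal sketch, assuming the function f(k) is supplied by the caller as a Python callable (its definition is not repeated here); the names `find_suspects`, `f_thresh`, and `toy_f` are hypothetical and chosen only for illustration.

```python
from typing import Callable, List


def find_suspects(f: Callable[[int], float], T: int, wf: int,
                  f_thresh: float) -> List[int]:
    """Return the 1-based positions k in wf+1 .. T-wf with f(k) > f_thresh.

    f is assumed to be the score function defined earlier in the paper;
    here it is simply passed in as a callable.
    """
    return [k for k in range(wf + 1, T - wf + 1) if f(k) > f_thresh]


# Toy score function (an assumption, not the paper's f): high only at 5 and 6.
def toy_f(k: int) -> float:
    return 1.0 if k in (5, 6) else 0.0


suspects = find_suspects(toy_f, T=10, wf=2, f_thresh=0.5)
print(suspects)  # [5, 6]
```

Passing f as a callable keeps the scan independent of how the window statistics behind f(k) are computed.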
Second step
 Inputs: X, SUSPECT, S0, M
 Outputs: INDEL, the set of indel locations
 INDEL ← EMPTY
 For all k ∈ SUSPECT do
  testseq ← X_{k−40}^{k+40}
  For all yi ∈ S do
   Align yi and testseq using the Smith-Waterman (SW) algorithm.
  end for
  if more than M of the yi align to testseq with fewer than 2 mismatches and with a shared indel interval [p, q], then
   INDEL ← INDEL ∪ {[p, q]}
  end if
 end for
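The second step can be sketched in a few dozen lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: it uses a textbook Smith-Waterman recursion with a linear gap penalty, the scoring values (match 2, mismatch −1, gap −2) are arbitrary choices, the indel interval is taken to be the span of testseq positions touched by gaps in the best local alignment, and the names `sw_align` and `confirm_indels` are hypothetical. Positions k are treated as 1-based, as in the pseudocode.

```python
from collections import Counter


def sw_align(read, ref, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment with a linear gap penalty.

    Returns (mismatches, interval), where interval is the 0-based (p, q)
    span of ref positions touched by gaps, or None if the best local
    alignment contains no gap.
    """
    n, m = len(read), len(ref)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best-scoring cell until the score drops to zero.
    i, j, mismatches, gap_cols = bi, bj, 0, []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if read[i - 1] == ref[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:      # aligned pair
            mismatches += read[i - 1] != ref[j - 1]
            i, j = i - 1, j - 1
        elif H[i][j] == H[i][j - 1] + gap:      # ref base absent from read
            gap_cols.append(j - 1)
            j -= 1
        else:                                   # extra base in read
            gap_cols.append(j - 1)              # ref index just before it
            i -= 1
    interval = (min(gap_cols), max(gap_cols)) if gap_cols else None
    return mismatches, interval


def confirm_indels(X, suspects, reads, M, half_window=40, max_mismatches=2):
    """Keep intervals shared by more than M reads with < max_mismatches."""
    indel = set()
    for k in suspects:                           # k is 1-based
        start = max(0, k - 1 - half_window)      # 0-based start of testseq
        testseq = X[start:k + half_window]
        votes = Counter()
        for y in reads:
            mism, interval = sw_align(y, testseq)
            if interval is not None and mism < max_mismatches:
                p, q = interval
                votes[(start + p + 1, start + q + 1)] += 1  # 1-based in X
        indel.update(iv for iv, count in votes.items() if count > M)
    return indel


# Toy data (made up): three reads missing the bases at positions 9-10 of X.
X = "AACCGGTTCAGATCTAGC"
reads = ["AACCGGTTGATCTAGC"] * 3
print(confirm_indels(X, suspects=[9], reads=reads, M=2))  # {(9, 10)}
```

A production pipeline would more likely use an affine gap penalty and a minimum alignment score so that short spurious local alignments do not contribute votes; both are omitted here to keep the sketch close to the pseudocode.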