Author manuscript; available in PMC: 2016 Jan 31.

Published in final edited form as: Nat Genet. 2015 Jun 15;47(8):955–961. doi: 10.1038/ng.3331

[left] The first step in calculating deltaSVM is to train a gkm-SVM classifier on a positive training set of putative regulatory sequences (identified by DNase I hypersensitivity, for example) and a negative training set of matched control sequences. The gkm-SVM generates a regulatory sequence vocabulary – a weighted list of all possible 10-mers, in which each 10-mer receives an SVM weight that quantifies its contribution to the prediction of regulatory function. [right] After training, this regulatory sequence vocabulary can be used to score the predicted impact of any sequence variant on regulatory activity, as shown here for a single nucleotide substitution in a melanocyte enhancer of Tyrp1.
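
The scoring step in the right panel can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the variant score is the summed weight of every k-mer overlapping the variant in the alternate allele minus the same sum for the reference allele. The function names, the toy weight table, and the use of k = 3 are hypothetical choices made so the example can be checked by hand; a trained vocabulary would use k = 10, and reverse-complement handling is omitted.

```python
from typing import Dict, Iterator


def kmers_overlapping(seq: str, pos: int, k: int) -> Iterator[str]:
    """Yield every k-mer of seq that covers the 0-based index pos."""
    start = max(0, pos - k + 1)          # leftmost window still covering pos
    end = min(pos, len(seq) - k)         # rightmost window that fits in seq
    for i in range(start, end + 1):
        yield seq[i:i + k]


def score_allele(seq: str, pos: int, weights: Dict[str, float], k: int) -> float:
    """Sum the weights of all k-mers covering pos (unknown k-mers count as 0)."""
    return sum(weights.get(kmer, 0.0) for kmer in kmers_overlapping(seq, pos, k))


def delta_score(ref_seq: str, pos: int, alt_base: str,
                weights: Dict[str, float], k: int = 10) -> float:
    """Alternate-allele score minus reference-allele score for one substitution."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return score_allele(alt_seq, pos, weights, k) - score_allele(ref_seq, pos, weights, k)


# Hypothetical toy vocabulary with k = 3 (values are made up for illustration).
toy_weights = {
    "ACG": 1.0, "CGT": 0.5, "GTA": 0.25,   # k-mers present in the reference
    "ACT": -1.0, "CTT": 0.0, "TTA": -0.5,  # k-mers created by the substitution
}

# G -> T substitution at index 2 of "ACGTA".
result = delta_score("ACGTA", 2, "T", toy_weights, k=3)
```

With k = 10 and a variant far enough from the sequence ends, ten 10-mers overlap the variant position in each allele, so the score compares two sets of ten weights. A negative value in this sketch means the substitution replaces higher-weighted k-mers with lower-weighted ones.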