2019 Nov 7;17:1415–1428. doi: 10.1016/j.csbj.2019.09.009

Table 1.

Tools similar to variation-scan that have an available implementation. PM stands for Pattern Matching; ML stands for Machine Learning.

| Name | PMID | Source | Approach | Organism | Input | Output | Matrix flexibility | Type | Last update |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| deltaSVM | 26075791 | http://www.beerlab.org/deltasvm/ | Gapped k-mer SVM classifier. | Any organism. | DNaseI-seq data; putative regulatory regions as positive training set and randomized sequences as negative training set. | deltaSVM, the predicted impact of a variant on chromatin accessibility, measured by adding up the contributions of all 10-mers that contain the SNP. | It can only be trained for one TF at a time. | ML, non-static. | Sept 2015. |
| DeepSea | 26301843 | http://deepsea.princeton.edu/job/analysis/create/ | Deep convolutional network. | Human. | SNPs in VCF format. | Chromatin feature probabilities for reference and alternative alleles, chromatin feature probability log fold changes for each variant, chromatin feature probability differences for each variant, e-values for chromatin feature effects, functional significance score for each variant. | There are 919 chromatin features evaluated. It contains 690 TF binding profiles for 160 different TFs, but does not support the addition of new matrices. | ML, non-static. | May 2017. |
| atSNP | 26092860 | https://github.com/keleslab/atSNP | Importance sampling algorithm for p-value calculation, first-order Markov model to generate random background sequences. | Any organism whose genome is included in the Bioconductor BSgenome package. | SNP list, motif file. | p-value for binding affinity with alternative and reference allele, p-value for binding affinity change based on log-likelihood ratio and log-rank ratio. It also provides composite logo plots for directly visualizing the SNP effects on motif matches. | It accepts several matrices, in several different formats. It includes a motif library of 2,065 PSSMs from ENCODE and JASPAR, but also allows user-defined motif libraries. | PM, non-static. | Nov 2018. |
| BayesPI-BAR | 26202972 | http://folk.uio.no/junbaiw/BayesPI-BAR/ | Biophysical modeling of protein-DNA interaction, estimation of TF chemical potential (through a Bayesian nonlinear regression model) and differential binding affinity. | Any organism. | ChIP-seq experiment for TFs to be tested, DNA sequences for selected SNPs, PSSMs for selected TFs. | Given a SNP and a PSSM list, it produces two lists sorted by significance: one composed of binding motifs disrupted by the SNP, and one of sites with an increased affinity to the TF caused by the SNP. | Can use several PSSMs simultaneously. | PM, biophysical modeling. Non-static. | No updates listed, software created July 2015. |
| GWAS4D | 29771388 | http://mulinlab.tmu.edu.cn/gwas4d/gwas4d/gwas4d/gwas4d_server | Variant prioritization method, followed by an integrative analysis of genome-wide association. | Human. | Accepts VCF-like, coordinate-only, dbSNP ID and PLINK-like formats. | Regulatory variant prioritization table: includes the most likely affected motif by alternative variant effect. | The model includes motifs of 1,480 transcriptional regulators from 13 different resources. It is not possible to upload user-specified matrices. | PM, static. | Sept 2018. |
| sTRAP | 20127973 | http://trap.molgen.mpg.de/cgi-bin/home.cgi | Prediction of local binding affinity followed by a normalization of binding affinities to determine the difference between reference allele and SNP. | Organisms available in TRANSFAC. | Accepts only two sequences in FASTA format. | List of TFs ranked according to changes induced by the SNP. | There is no option for user-specified matrices; matrices from TRANSFAC versions can be selected. | PM, non-static. | No updates listed, software created in 2011. |
| SNP2TFBS | 27899579 | https://ccg.epfl.ch//snp2tfbs/ | Estimation based on PSSM model. | Human. | When working with the code, the input required is the reference genome, a SNP catalogue and a PSSM collection. The web interface accepts SNP IDs and VCF format, as well as a specification of a genomic region through a BED file or by specifying the start and end positions. | List of affected TFBSs, sorted by the magnitude of the effects. | On the web interface, only matrices from JASPAR can be used. Nonetheless, it is possible to download the code used to generate the database and use a different input. | PM, static. | July 2017. |
| atSNP Search | 30534948 | http://atsnp.biostat.wisc.edu/ | Used the atSNP algorithm with dbSNP build 144 for human genome assembly 38 against JASPAR and ENCODE motifs to create a repository with all the SNP-motif combinations resulting from the previous resources. | Human. | It can receive a set of rsIDs, an rsID and a window size around the SOI, genomic coordinates, a gene symbol and a window size around the gene of interest, or a TF name. | Table including p-values for motif matches for both reference and alternate alleles, as well as the change in the motif matching and the direction of said change. Output includes logo plots, displaying the sequence logos aligned to best motif matches with reference and SNP alleles. | Only JASPAR or ENCODE matrices can be selected, and it is possible to select only one transcription factor at a time. | PM, static. | Jan 2018. |
| HaploReg | 22064851, 26657631 | https://pubs.broadinstitute.org/mammals/haploreg/haploreg.php | It contains data from multiple genome annotation resources. PSSMs are scored against reference and alternative alleles, and change in log-odds is calculated. | Human. | Users can provide a list of rsIDs or chromosome regions. Users can also select GWAS studies from the NHGRI catalog. | Provides data on allelic frequencies, conservation, chromatin states, and nearby genes. For each of the regulatory motifs altered by the SNP, it provides the change in log-odds and a logo. | HaploReg contains a library created from literature sources, TRANSFAC, JASPAR and PBM experiments. There is no option for user-specified matrices. | PM, static. | November 2015. |
| RegulomeDB | 22955989 | http://www.regulomedb.org/ | RegulomeDB uses information from several datasets, as well as manual curation and a heuristic method to distinguish between functional and non-functional variants. | Human. | Users can provide a list of dbSNP IDs, hg19 coordinates in BED, VCF or GFF3 format, or hg19 chromosomal regions in the same formats. | Table sorted by likely functionality, containing variant coordinates, score assigned by the algorithm, and evidence of function including protein binding, motifs, chromatin structure, eQTLs and histone modifications. | RegulomeDB includes all PSSMs from TRANSFAC, JASPAR CORE, and UniProbe. There is no option for user-specified matrices. | PM, static. | No updates listed, software created in Sept 2012. |
| motifbreakR | 26272984 | https://github.com/Simon-Coetzee/MotifBreakR | It offers three algorithms: the standard sum of log probabilities, a weighted sum, and an information content method. | Organisms included in BSgenome. | SNPs can be imported from an R package or provided to the algorithm in BED or VCF format. PSSMs can be selected from the MotifDb package or be user-specified. | Table containing statistics describing the percent of maximum score for a matrix and matrix values for both alleles, as well as the strand. It also reports whether the TFBS is disrupted strongly or weakly. | PSSMs can be imported from the MotifDb package or be user-specified. More than one matrix can be used at a time. | PM, non-static. | Jul 2018. |
| variation-scan | | http://rsat.eu | Estimation based on PSSM model. | Web interface: installed Ensembl organisms. Command line: any locally installed organism. | A collection of PSSMs and a set of variants in varSeq format. This format can be obtained using retrieve-variation-seq. | A table with one line per pair of alleles per motif (if there are more than two alleles, there will be one line per possible pair) reporting the position, weight and p-value of each allele, the weight difference and the p-value ratio. | Users can select from the collections available in RSAT (JASPAR, HOCOMOCO, CisBP), but they can also use personal collections. | PM, non-static. | April 2019. |
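
Several of the pattern-matching tools in Table 1 score a PSSM against the reference and the alternative allele and report the change in weight. The sketch below illustrates that general idea only; it is not the code of variation-scan or of any other tool listed. The function names, the toy count matrix, the uniform background, the pseudocount and the forward-strand-only scan are all assumptions made for this example, and p-values are not computed.

```python
import math

BASES = "ACGT"


def counts_to_weights(counts, background=None, pseudocount=1.0):
    """Turn per-position base counts into log-odds weights (natural log).

    Assumption: the pseudocount is distributed according to the background.
    """
    background = background or {b: 0.25 for b in BASES}
    weights = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount
        weights.append({
            b: math.log(
                ((column[b] + pseudocount * background[b]) / total) / background[b]
            )
            for b in BASES
        })
    return weights


def best_weight(sequence, weights):
    """Return (weight, offset) of the best-scoring window, forward strand only."""
    width = len(weights)
    best = None
    for offset in range(len(sequence) - width + 1):
        w = sum(weights[i][sequence[offset + i]] for i in range(width))
        if best is None or w > best[0]:
            best = (w, offset)
    return best


def allele_weight_difference(left_flank, ref, alt, right_flank, weights):
    """Compare the best PSSM weight obtained with each single-base allele.

    Flanks are trimmed to (motif width - 1) bases so that every scanned
    window overlaps the variant position.
    """
    flank = len(weights) - 1
    left = left_flank[max(0, len(left_flank) - flank):]
    right = right_flank[:flank]
    ref_w, ref_offset = best_weight(left + ref + right, weights)
    alt_w, alt_offset = best_weight(left + alt + right, weights)
    return {
        "ref_weight": ref_w,
        "ref_offset": ref_offset,
        "alt_weight": alt_w,
        "alt_offset": alt_offset,
        "weight_difference": ref_w - alt_w,
    }


if __name__ == "__main__":
    # Hypothetical 3-bp motif that strongly prefers "TGA".
    toy_counts = [
        {"A": 0, "C": 0, "G": 0, "T": 10},
        {"A": 0, "C": 0, "G": 10, "T": 0},
        {"A": 10, "C": 0, "G": 0, "T": 0},
    ]
    toy_weights = counts_to_weights(toy_counts)
    print(allele_weight_difference("CCT", "G", "C", "ACC", toy_weights))
```

A positive `weight_difference` here means the reference allele matches the motif better than the alternative allele. Real tools additionally scan the reverse strand and attach a significance estimate to each weight.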