Author manuscript; available in PMC: 2015 Sep 1.
Published in final edited form as: Trends Biochem Sci. 2014 Aug 14;39(9):381–399. doi: 10.1016/j.tibs.2014.07.002

Table 1.

Computational models of protein-DNA binding specificity and high-throughput assays for generating the data used to train and test specificity models.

(A) Computational models of protein-DNA binding specificity

| Model type | Description | Refs. |
|---|---|---|
| Position weight matrices (PWMs) | Simple probabilistic models that assume independence between positions in TF binding sites (TFBSs) | [5] |
| Dinucleotide weight matrices (DWMs) | Generalization of PWM models that incorporates frequencies of dinucleotides | [73, 230] |
| Bayesian networks | Flexible probabilistic models that can incorporate dependencies between positions in TFBSs | [63] |
| Hidden Markov models | Probabilistic models that can incorporate dependencies between neighboring positions in TFBSs | [70, 231] |
| High-order Markov models | Flexible probabilistic models that can incorporate high-order dependencies between neighboring positions in TFBSs | [232] |
| k-mer-based regression models | Probabilistic models that predict the level of TF binding from the frequencies of mono-, di-, and trinucleotides | [93, 233] |
| Markov networks | Flexible probabilistic models that can incorporate high-order dependencies within TFBSs | [72] |
| Neural networks | Flexible probabilistic models that represent TF binding specificities using a system of interconnected, artificial “neurons” | [75] |
| Random forest models | Flexible probabilistic models that represent TF specificity using a collection of decision trees | [92] |
| Support vector models | Probabilistic models that can incorporate complex patterns of similarities between TFBSs | [2, 31] |
| Variable-order Bayesian networks | Flexible probabilistic models that can incorporate high-order dependencies within TFBSs | [234] |
| Thermodynamic/energy-based models | Models that infer DNA binding affinities by fitting thermodynamic equations to experimental data | [73, 81–83, 235–237] |
| Atomistic/structure-based models | Models based on known structures of TFs bound to target DNA sites | [86, 90] |
| Probabilistic models that incorporate structural features | Models that incorporate structural features such as groove geometries and helical parameters | [2, 79, 91, 92] |
| Probabilistic models that incorporate in vivo data | Models that incorporate in vivo data such as DNA accessibility and histone modifications | [238, 239] |

(B) In vivo high-throughput DNA binding assays

| Assay name | Assay description | Refs. |
|---|---|---|
| ChIP-chip | Chromatin immunoprecipitation followed by microarray hybridization | [240] |
| ChIP-seq | Chromatin immunoprecipitation followed by high-throughput sequencing | [241] |
| ChIP-exo | Chromatin immunoprecipitation with exonuclease digestion, followed by high-throughput sequencing | [242] |
| DamID | DNA adenine methyltransferase identification | [243] |
| DNase-seq | DNase I cleavage followed by high-throughput sequencing | [151, 244] |
| FAIRE-seq | Formaldehyde-assisted isolation of regulatory elements, followed by high-throughput sequencing | [149] |
| ATAC-seq | Assay for transposase-accessible chromatin using high-throughput sequencing | [152] |

(C) In vitro high-throughput DNA binding assays

| Assay name | Assay description | Refs. |
|---|---|---|
| B1H | Bacterial one-hybrid | [102, 245] |
| PBM | Protein binding microarray | [94, 246] |
| CSI | Cognate site identifier | [247] |
| MITOMI | Mechanically induced trapping of molecular interactions | [101, 248] |
| MEGAshift | Microarray evaluation of genomic aptamers by shift | [249] |
| TIRF-PBM | Total internal reflection fluorescence protein binding microarray | [103] |
| Bind-n-Seq | Analysis of in vitro protein-DNA interactions using massively parallel sequencing | [250] |
| SELEX-seq/HT-SELEX | Systematic evolution of ligands by exponential enrichment, followed by high-throughput sequencing | [1, 82, 110] |
| EMSA-seq | Electrophoretic mobility shift assay followed by deep sequencing | [95] |
| HiTS-FLIP | High-throughput sequencing-fluorescent ligand interaction profiling | [108] |
| gcPBM | Genomic-context protein binding microarray | [2] |
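
Panel (A) of Table 1 describes PWMs as models that assume independence between positions in a TFBS. The sketch below makes that assumption concrete: each column is estimated separately from aligned sites, and a candidate site is scored as a sum of per-column log-odds terms. It is a minimal illustration, not the method of any cited work. The training sites, the pseudocount of 0.5, the uniform background and the function names (`build_pwm`, `pwm_score`, `scan`) are all hypothetical choices, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=None):
    """Build a log2-odds PWM from aligned, equal-length binding sites.

    Each column is estimated independently, which is exactly the PWM
    assumption of independence between positions.
    """
    background = background or {b: 0.25 for b in BASES}  # assumed uniform background
    pwm = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background[b]) for b in BASES})
    return pwm


def pwm_score(pwm, kmer):
    """Score a k-mer as the sum of its per-position log-odds terms."""
    return sum(column[base] for column, base in zip(pwm, kmer))


def scan(pwm, sequence):
    """Return (offset, score) for every window of the motif's width (forward strand only)."""
    width = len(pwm)
    return [(i, pwm_score(pwm, sequence[i:i + width]))
            for i in range(len(sequence) - width + 1)]


# Hypothetical training sites, for illustration only.
sites = ["CACGTG", "CACGTG", "CACGTC", "GACGTG"]
pwm = build_pwm(sites)
best_offset, best_score = max(scan(pwm, "TTCACGTGAA"), key=lambda hit: hit[1])
print(best_offset, round(best_score, 2))
```

Because the score is additive over positions, a PWM cannot express that two positions co-vary; this is the limitation that the dinucleotide, Markov and network models in panel (A) are designed to address.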
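
Table 1 describes dinucleotide weight matrices as PWM generalizations that incorporate dinucleotide frequencies. One simple way to show the benefit is to estimate position-specific log-odds for each pair of adjacent positions and sum them. This formulation is assumed here for illustration and is not taken from refs. [73, 230]. In the hypothetical training set below, positions 0 and 1 always occur together as AC or GT. A PWM built from those sites would score ACA and ATA identically, whereas the dinucleotide scores separate them. `build_dwm`, `dwm_score`, the 0.1 pseudocount and the uniform 1/16 background are illustrative assumptions, and the summed score is not a normalized probability.

```python
import math
from itertools import product

BASES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]


def build_dwm(sites, pseudocount=0.1):
    """Position-specific dinucleotide log2-odds for each adjacent pair (i, i + 1).

    Assumes a uniform dinucleotide background of 1/16.
    """
    dwm = []
    for i in range(len(sites[0]) - 1):
        counts = {d: pseudocount for d in DINUCLEOTIDES}
        for site in sites:
            counts[site[i:i + 2]] += 1
        total = sum(counts.values())
        dwm.append({d: math.log2(counts[d] / total * 16) for d in DINUCLEOTIDES})
    return dwm


def dwm_score(dwm, kmer):
    """Sum the log-odds of each overlapping dinucleotide at its own position."""
    return sum(table[kmer[i:i + 2]] for i, table in enumerate(dwm))


# Hypothetical sites in which positions 0 and 1 co-vary: AC or GT, never AT or GC.
sites = ["ACA", "GTA"] * 5
dwm = build_dwm(sites)
print(dwm_score(dwm, "ACA") > dwm_score(dwm, "ATA"))
```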
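
The k-mer-based regression models in panel (A) predict a binding level from mono-, di- and trinucleotide frequencies. Below is a minimal sketch that assumes ordinary ridge regression on overlapping k-mer counts and uses entirely synthetic data, where the "true" signal is an arbitrary rule of 1.0 per CAC plus 0.3 per G. The cited models [93, 233] may use different features, normalization and fitting procedures. The names `featurize`, `fit_ridge`, `solve` and `predict` are hypothetical, and the hand-written solver only keeps the example dependency-free.

```python
import random
from itertools import product

BASES = "ACGT"
KMERS = ["".join(p) for k in (1, 2, 3) for p in product(BASES, repeat=k)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def featurize(seq):
    """Overlapping counts of all 84 mono-, di- and trinucleotides, plus a bias term."""
    x = [0.0] * len(KMERS)
    for k in (1, 2, 3):
        for i in range(len(seq) - k + 1):
            x[INDEX[seq[i:i + k]]] += 1.0
    return x + [1.0]


def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_ridge(seqs, y, lam=1e-4):
    """Ridge regression via the normal equations (X^T X + lam * I) w = X^T y."""
    X = [featurize(s) for s in seqs]
    d = len(X[0])
    cols = list(zip(*X))
    xtx = [[sum(p * q for p, q in zip(cols[i], cols[j])) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(p * t for p, t in zip(cols[i], y)) for i in range(d)]
    return solve(xtx, xty)


def predict(w, seq):
    return sum(wi * xi for wi, xi in zip(w, featurize(seq)))


def synthetic_binding(seq):
    """Hypothetical ground truth: 1.0 per CAC plus 0.3 per G (overlapping counts)."""
    x = featurize(seq)
    return 1.0 * x[INDEX["CAC"]] + 0.3 * x[INDEX["G"]]


rng = random.Random(0)
seqs = ["".join(rng.choice(BASES) for _ in range(20)) for _ in range(300)]
y = [synthetic_binding(s) for s in seqs]
w = fit_ridge(seqs, y)
print(round(predict(w, "CACGCACG" + "T" * 12), 2))
```

The small ridge penalty is there mainly for numerical reasons: overlapping k-mer counts are linearly dependent (for fixed-length sequences the mononucleotide counts always sum to the sequence length), so the unpenalized normal equations would be singular.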
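
Thermodynamic/energy-based models in panel (A) relate binding free energies or dissociation constants to measured binding. The simplest equation such a model can fit is the single-site binding isotherm, theta = [TF] / ([TF] + Kd), with Kd = exp(dG / RT) for a 1 M standard state. The sketch below assumes one independent, non-cooperative site, a free TF concentration equal to the total, and a noise-free hypothetical titration. Its coarse grid search for Kd stands in for the nonlinear least-squares fitting a real analysis would use. None of the function names (`kd_from_dg`, `occupancy`, `fit_kd`) come from the cited works.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def kd_from_dg(dg_kcal, temp_k=298.15):
    """Dissociation constant (M) from a binding free energy (kcal/mol), 1 M standard state."""
    return math.exp(dg_kcal / (R_KCAL * temp_k))


def occupancy(tf_conc, kd):
    """Fraction of a single, independent site bound at free TF concentration tf_conc (M)."""
    return tf_conc / (tf_conc + kd)


def fit_kd(concs, fractions):
    """Least-squares Kd over a log-spaced grid from 1e-10 to 1e-5 M (illustrative only)."""
    grid = [10 ** (-10 + 0.01 * i) for i in range(501)]
    return min(grid, key=lambda kd: sum((occupancy(c, kd) - f) ** 2
                                         for c, f in zip(concs, fractions)))


# Hypothetical, noise-free titration generated from Kd = 50 nM.
concs = [1e-9, 5e-9, 2e-8, 5e-8, 1e-7, 5e-7, 2e-6]
fractions = [occupancy(c, 50e-9) for c in concs]
print(f"fitted Kd ~ {fit_kd(concs, fractions):.2e} M")
print(f"Kd for dG = -10 kcal/mol: {kd_from_dg(-10.0):.2e} M")
```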
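
For the SELEX-based assays in panel (C) (SELEX-seq/HT-SELEX), a common first summary is how much each k-mer is enriched in a selected round relative to the initial library. The sketch below computes that ratio with a pseudocount and scales it so that the most enriched k-mer equals 1. It is a simplified illustration under stated assumptions: reads are forward-strand only, reverse complements are not merged, and both libraries are synthetic (a CACGTG site planted into random 20-mers). Published pipelines typically use more careful models of the initial library, such as Markov models, and `count_kmers` and `relative_enrichment` are hypothetical names.

```python
import random
from collections import Counter


def count_kmers(reads, k):
    """Overlapping k-mer counts over all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def relative_enrichment(round0, round_n, k, pseudocount=1.0):
    """Frequency in round n over frequency in round 0, scaled so the top k-mer is 1."""
    c0, cn = count_kmers(round0, k), count_kmers(round_n, k)
    n0, nn = sum(c0.values()), sum(cn.values())
    ratios = {kmer: ((cn[kmer] + pseudocount) / nn) / ((c0[kmer] + pseudocount) / n0)
              for kmer in set(c0) | set(cn)}
    top = max(ratios.values())
    return {kmer: r / top for kmer, r in ratios.items()}


# Hypothetical libraries: random 20-mers, and a "selected" pool with CACGTG planted.
rng = random.Random(0)
round0 = ["".join(rng.choice("ACGT") for _ in range(20)) for _ in range(5000)]
round_n = [read[:7] + "CACGTG" + read[13:] for read in round0[:1000]]
enrichment = relative_enrichment(round0, round_n, k=6)
for kmer in sorted(enrichment, key=enrichment.get, reverse=True)[:3]:
    print(kmer, round(enrichment[kmer], 3))
```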