PeerJ. 2022 Jun 24;10:e13613. doi: 10.7717/peerj.13613

Table 1. Overview of studies applying deep learning in genomics, segmented by their usage.

| Annotation | Usage | Preprocessing | Data | Species | Architecture | Reference |
|---|---|---|---|---|---|---|
| TFBS | Transfer | one-hot-encoding | DNA + gene expression + DNaseI cleavage | human | CNN + RNN | Quang & Xie (2019) |
| | | | DNA sequence | human + mouse | CNN | Cochran et al. (2021) |
| | Bio. mechanism | one-hot-encoding | DNA sequence | human | CNN | Wang et al. (2018b) |
| | | | | human + mouse + Drosophila | CNN | Wang et al. (2018a) |
| | | | RNA sequence | human | CNN | Koo et al. (2018) |
| | Syn. genomics | one-hot-encoding | DNA sequence | human | RNN + Attention | Gupta & Kundaje (2019) |
| | | | | | CNN | Lanchantin et al. (2016) |
| TFBS + histone + chromatin accessibility | Transfer | one-hot-encoding | DNA sequence | human + mouse | CNN | Kelley (2020) |
| | Bio. mechanism | one-hot-encoding | DNA sequence | human | CNN | Kelley et al. (2018) |
| | | | | | CNN | Alipanahi et al. (2015) |
| | | | | | | Zhou et al. (2019) |
| | | | | | | Hoffman et al. (2019) |
| | | | | | | Zhou & Troyanskaya (2015) |
| | | | | | | Richter et al. (2020) |
| | Syn. genomics | one-hot-encoding | DNA sequence | human | CNN | Schreiber, Lu & Noble (2020) |
| TFBS (circRNA) | Bio. mechanism | one-hot-encoding | RNA sequence | human | CNN | Wang, Lei & Wu (2019) |
| chromatin accessibility | Transfer + Bio. mechanism | one-hot-encoding | DNA + gene expression | human | CNN | Nair et al. (2019) |
| | Bio. mechanism | one-hot-encoding + embedding | DNA sequence | human | CNN | Liu et al. (2018) |
| gene expression | Transfer + Bio. mechanism | one-hot-encoding | DNA + TF expression level | yeast | CNN | Liu et al. (2019) |
| | Bio. mechanism | one-hot-encoding | RNA sequence | 7 species | CNN | Zrimec et al. (2020) |
| | Syn. genomics | | | yeast | CNN | Cuperus et al. (2017) |
| | | | DNA sequence | Random promoters (yeast) | CNN + Attention + RNN | Vaishnav et al. (2021) |
| | Bio. mechanism | one-hot-encoding | DNA + mRNA half-life + CG content + ORF length | human | CNN | Agarwal & Shendure (2020) |
| | | | DNA + promoter-enhancer interaction | human | CNN | Zeng, Wang & Jiang (2020) |
| | | | DNA sequence | human | CNN | Movva et al. (2019) |
| gene expression + RNA splicing | Syn. genomics | one-hot-encoding | DNA sequence | human | CNN | Linder et al. (2020) |

Note:

CNN, convolutional neural network; RNN, recurrent neural network. After the pioneering use of CNNs in genomics in 2015, methodologies have diversified along four aspects: the model inputs (which may include other annotations in addition to the DNA sequence itself), the sequence encoding (mainly one-hot-encoding or k-mer embedding), the neural network architecture (CNN, RNN, Attention mechanism) and the output format, which can be either binary or continuous.
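
As an illustration of the two sequence encodings named in the note, the following minimal Python sketch one-hot-encodes a DNA sequence and splits it into overlapping k-mers, the tokens that a k-mer embedding would then map to vectors. The function names, the A/C/G/T column order, the all-zero row for ambiguous bases such as N, and the stride of 1 are assumptions made for this example; they are not taken from any of the studies listed in the table.

```python
# Illustrative sketch only: helper names and conventions are hypothetical.
ALPHABET = "ACGT"  # assumed column order for the one-hot matrix


def one_hot_encode(sequence):
    """Return one row per base, with a 1 in the column of that base.

    Bases outside A/C/G/T (for example N) become an all-zero row,
    which is one common convention among several.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = []
    for base in sequence.upper():
        row = [0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


def kmer_tokens(sequence, k=3):
    """Split a sequence into overlapping k-mers (stride 1).

    An embedding layer would map each token to a learned vector;
    that step is not shown here.
    """
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


print(one_hot_encode("ACGN"))
print(kmer_tokens("ACGTA", k=3))
```

A sequence of length L yields an L x 4 one-hot matrix, whereas the k-mer split yields L - k + 1 tokens drawn from a vocabulary of up to 4^k entries.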