Table 1. Overview of studies applying deep learning in genomics, segmented by their usage.
Annotation | Usage | Preprocessing | Data | Species | Architecture | Reference |
---|---|---|---|---|---|---|
TFBS | Transfer | one-hot-encoding | DNA + gene expression + DNaseI cleavage | human | CNN + RNN | Quang & Xie (2019) |
TFBS | Transfer | one-hot-encoding | DNA sequence | human + mouse | CNN | Cochran et al. (2021) |
TFBS | Bio. mechanism | one-hot-encoding | DNA sequence | human | CNN | Wang et al. (2018b) |
TFBS | Bio. mechanism | one-hot-encoding | DNA sequence | human + mouse + Drosophila | CNN | Wang et al. (2018a) |
TFBS | Bio. mechanism | one-hot-encoding | RNA sequence | human | CNN | Koo et al. (2018) |
TFBS | Syn. genomics | one-hot-encoding | DNA sequence | human | RNN + Attention | Gupta & Kundaje (2019) |
TFBS | Syn. genomics | one-hot-encoding | DNA sequence | human | CNN | Lanchantin et al. (2016) |
TFBS + histone + chromatin accessibility | Transfer | one-hot-encoding | DNA sequence | human + mouse | CNN | Kelley (2020) |
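TFBS + histone + chromatin accessibility | Bio. mechanism | one-hot-encoding | DNA sequence | human | CNN | Kelley et al. (2018) |
TFBS + histone + chromatin accessibility | Bio. mechanism | one-hot-encoding | DNA sequence | human | CNN | Alipanahi et al. (2015) |
TFBS + histone + chromatin accessibility | Bio. mechanism | one-hot-encoding | DNA sequence | human | CNN | Zhou et al. (2019) |
TFBS + histone + chromatin accessibility | Bio. mechanism | one-hot-encoding | DNA sequence | human | CNN | Hoffman et al. (2019) |
TFBS + histone + chromatin accessibility | Bio. mechanism | one-hot-encoding | DNA sequence | human | CNN | Zhou & Troyanskaya (2015) |
TFBS + histone + chromatin accessibility | Bio. mechanism | one-hot-encoding | DNA sequence | human | CNN | Richter et al. (2020) |
TFBS + histone + chromatin accessibility | Syn. genomics | one-hot-encoding | DNA sequence | human | CNN | Schreiber, Lu & Noble (2020) |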
TFBS (circRNA) | Bio. mechanism | one-hot-encoding | RNA sequence | human | CNN | Wang, Lei & Wu (2019) |
chromatin accessibility | Transfer + Bio. mechanism | one-hot-encoding | DNA + gene expression | human | CNN | Nair et al. (2019) |
chromatin accessibility | Bio. mechanism | one-hot-encoding + embedding | DNA sequence | human | CNN | Liu et al. (2018) |
gene expression | Transfer + Bio. mechanism | one-hot-encoding | DNA + TF expression level | yeast | CNN | Liu et al. (2019) |
gene expression | Bio. mechanism | one-hot-encoding | RNA sequence | 7 species | CNN | Zrimec et al. (2020) |
gene expression | Syn. genomics | one-hot-encoding | RNA sequence | yeast | CNN | Cuperus et al. (2017) |
gene expression | Syn. genomics | one-hot-encoding | DNA sequence | Random promoters (yeast) | CNN + Attention + RNN | Vaishnav et al. (2021) |
gene expression | Bio. mechanism | one-hot-encoding | DNA + mRNA half-life + CG content + ORF length | human | CNN | Agarwal & Shendure (2020) |
gene expression | Bio. mechanism | one-hot-encoding | DNA + promoter-enhancer interaction | human | CNN | Zeng, Wang & Jiang (2020) |
gene expression | Bio. mechanism | one-hot-encoding | DNA sequence | human | CNN | Movva et al. (2019) |
gene expression + RNA splicing | Syn. genomics | one-hot-encoding | DNA sequence | human | CNN | Linder et al. (2020) |
Note:
CNN, convolutional neural network; RNN, recurrent neural network. After the pioneering use of CNNs in genomics in 2015, methodologies have diversified along four aspects: the model inputs (which may include other annotations in addition to the DNA sequence), the sequence encoding (mainly one-hot-encoding or k-mer embedding), the neural network architecture (CNN, RNN, attention mechanism), and the output format, which can be either binary or continuous.
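
To make the two sequence encodings named in the note concrete, the following is a minimal sketch in plain Python. It is not taken from any of the studies in Table 1: the function names, the A, C, G, T column order, and the all-zero handling of unknown bases are assumptions made for illustration only.

```python
# Illustrative sketch only: names, column order, and handling of
# unknown bases are assumptions, not any cited study's implementation.
ALPHABET = "ACGT"


def one_hot_encode(seq):
    """Return one row per base, with a 1 in the column of that base (A, C, G, T)."""
    rows = []
    for base in seq.upper():
        # Bases outside the alphabet (for example N) become an all-zero row.
        rows.append([1 if base == letter else 0 for letter in ALPHABET])
    return rows


def kmer_tokens(seq, k=3):
    """Split a sequence into overlapping k-mers, the units a k-mer embedding would index."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    print(one_hot_encode("ACGTN"))
    print(kmer_tokens("ACGTAC", k=3))
```

A one-hot matrix of this kind has one row per position and four columns, which is the form typically passed to the first convolutional layer; the k-mer tokens would instead be mapped to integer indices and looked up in a learned embedding table.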