J Cheminform. 2019 Nov 21;11:71. doi: 10.1186/s13321-019-0393-0

Fig. 2

Architecture of the RNN model used in this study. For every step i, the input one-hot encoded token X_i goes through an embedding layer of size m ≤ w, followed by l > 0 GRU/LSTM layers of size w with dropout in between, and then a linear layer that maps from dimensionality w to the size of the vocabulary. Lastly, a softmax is applied to obtain the token probability distribution Y_ij. H_i denotes the input hidden state matrix at step i.
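
The caption describes a standard token-level recurrent language model: embedding, stacked GRU/LSTM layers with inter-layer dropout, a linear projection to the vocabulary, and a softmax. The following is a minimal sketch of that architecture, assuming PyTorch; the class name, the choice of GRU over LSTM, and the default sizes (m, w, l, dropout) are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class TokenRNN(nn.Module):
    """Sketch of the captioned architecture: embedding -> l GRU layers -> linear -> softmax."""

    def __init__(self, vocab_size, m=256, w=512, l=3, dropout=0.2):
        super().__init__()
        assert m <= w and l > 0  # sizes as constrained in the caption
        # Embedding layer of size m; taking token indices is equivalent to
        # multiplying a one-hot vector X_i by the embedding matrix.
        self.embedding = nn.Embedding(vocab_size, m)
        # l > 0 GRU layers of size w; PyTorch applies dropout between stacked layers.
        self.rnn = nn.GRU(m, w, num_layers=l, dropout=dropout, batch_first=True)
        # Linear layer mapping from dimensionality w to the vocabulary size.
        self.linear = nn.Linear(w, vocab_size)

    def forward(self, tokens, hidden=None):
        # tokens: (batch, seq) token indices; hidden: hidden state matrix H_i.
        emb = self.embedding(tokens)
        out, hidden = self.rnn(emb, hidden)
        logits = self.linear(out)
        # Softmax over the vocabulary yields the token distribution Y_ij.
        return torch.softmax(logits, dim=-1), hidden
```

At each sampling step, the probabilities returned for the last position can be used to draw the next token, and the returned hidden state is fed back in as H_{i+1}.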