TABLE 2.
Description of the different layers and notions used in the architecture of the networks (Figure 2).
| Layers | Description |
| Convolution, N 3 × 1 filters; strides 1 × 1 | Convolutional layer (LeCun and Bengio, 1995) with N filters of size 3 × 1, i.e., one-dimensional filters of length 3; the convolution used a stride of 1. The weights of the convolutional filters were initialized from a Glorot normal distribution (Glorot and Bengio, 2010) |
| BatchNorm | Batch normalization is a way to speed up training and regularize the network (Ioffe and Szegedy, 2015) |
| ReLU | Rectified linear unit (Hahnloser et al., 2000), a non-linear activation function. It makes the activations of the network sparse and prevents vanishing gradients (Hahnloser et al., 2000) |
| Max-pooling; pool_size 2 × 1 | Max-pooling layer (Fukushima and Miyake, 1982) with pooling size 2. It takes the maximum of every 2 elements of a tensor, so the size of the resulting tensor is reduced by a factor of 2. Max-pooling reduces the size of the vector while retaining the most useful information, and it also has the property of shift invariance |
| Flatten | Layer which reshapes the input tensor into a one-dimensional vector with the same number of elements |
| Dropout (p = q) | Dropout layer (Srivastava et al., 2014). It switches off a fraction q of the neurons in the previous layer during training. Dropout is a good way to regularize networks, i.e., to prevent overfitting (Srivastava et al., 2014) |
| Dense (N = n) | Densely connected layer with n neurons |
| Softmax (N = n) | Densely connected layer with n neurons and a special activation function which produces a probability distribution over n values (Bishop, 2016). The sum of these values is equal to 1, n is equal to the number of classes we want to predict (4 in our case), and every output value is the probability that the sample belongs to the corresponding class |
| LSTM (N = n) | Long short-term memory layer (Hochreiter and Schmidhuber, 1997) with the size of the hidden state equal to n. It has a memory and can use information about the past to make decisions at the current timepoint |
For the parameter values used, see Figure 2.
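The core operations listed in the table (1-D convolution with stride 1, ReLU, max-pooling with pool size 2, and softmax) can be sketched in plain NumPy. This is an illustrative example only, not the authors' implementation; the filter weights and input values are made up:

```python
import numpy as np

def conv1d(x, w, stride=1):
    """Valid 1-D convolution (cross-correlation) of vector x with filter w."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w)
                     for i in range(0, len(x) - k + 1, stride)])

def relu(x):
    """Rectified linear unit: negative activations are set to zero."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Take the maximum of every `size` elements; output is `size` times shorter."""
    n = len(x) - len(x) % size          # drop a trailing remainder, if any
    return x[:n].reshape(-1, size).max(axis=1)

def softmax(z):
    """Turn n scores into a probability distribution summing to 1."""
    e = np.exp(z - z.max())             # subtract max for numerical stability
    return e / e.sum()

# Hypothetical input signal and one 3 x 1 filter (values are arbitrary)
x = np.array([1.0, -2.0, 3.0, 0.5, -1.0, 2.0])
w = np.array([0.5, -1.0, 0.5])

h = relu(conv1d(x, w, stride=1))        # convolution followed by ReLU
p = max_pool(h, size=2)                 # length is halved: 4 -> 2
probs = softmax(np.array([2.0, 1.0, 0.1, -1.0]))  # 4-class softmax output
```

Here `h` has length 4 (valid convolution of a length-6 vector with a length-3 filter), `p` has length 2 after pooling, and `probs` contains four non-negative values summing to 1, matching the 4-class setting described in the table.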