TABLE 2.
Description of the different layers and notions used in the architecture of the networks (Figure 2).
| Layers | Description |
| Convolution, N 3 × 1 filters; strides 1 × 1 | Convolutional layer (LeCun and Bengio, 1995) with N filters of size 3 × 1, i.e., one-dimensional filters of length 3; the convolution used a stride of 1. The weights of the convolutional filters were initialized from a Glorot normal distribution (Glorot and Bengio, 2010) |
| BatchNorm | Batch normalization is a way to speed up training and regularize the network (Ioffe and Szegedy, 2015) |
| ReLU | Rectified linear unit (Hahnloser et al., 2000), a non-linear activation function. It makes the activations of the network sparse and prevents vanishing gradients (Hahnloser et al., 2000) |
| Max-pooling; pool_size 2 × 1 | Max-pooling layer (Fukushima and Miyake, 1982) with pooling size 2. It takes the maximum of every 2 elements of a tensor, so the size of the resulting tensor is reduced by a factor of 2. Max-pooling reduces the size of the vector while retaining the most useful information, and it also has the property of shift invariance |
| Flatten | Layer which reshapes the input tensor into a one-dimensional vector with the same number of elements |
| Dropout (p = q) | Dropout layer (Srivastava et al., 2014). It switches off a fraction q of the neurons in the previous layer during training. Dropout is a good way to regularize networks, i.e., to prevent overfitting (Srivastava et al., 2014) |
| Dense (N = n) | Densely connected layer with n neurons |
| Softmax (N = n) | Densely connected layer with n neurons and a special activation function which produces a probability distribution over n values (Bishop, 2016). The sum of these values is equal to 1, n is equal to the number of classes we want to predict (4 in our case), and every output value is the probability that the sample belongs to the corresponding class |
| LSTM (N = n) | Long short-term memory layer (Hochreiter and Schmidhuber, 1997) with the size of the hidden state equal to n. It has a memory and can use information about the past to make decisions at the current timepoint |
For the parameter values used, see Figure 2.
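The core operations listed in the table (1-D convolution with stride 1, ReLU, max-pooling with pool size 2, and softmax) can be sketched in plain NumPy. This is an illustrative example only, not the authors' implementation; the filter weights and input values are made up:

```python
import numpy as np

def conv1d(x, w, stride=1):
    """Valid 1-D convolution (cross-correlation) of vector x with filter w."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w)
                     for i in range(0, len(x) - k + 1, stride)])

def relu(x):
    """Rectified linear unit: negative activations are set to zero."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Take the maximum of every `size` elements; output is `size` times shorter."""
    n = len(x) - len(x) % size          # drop a trailing remainder, if any
    return x[:n].reshape(-1, size).max(axis=1)

def softmax(z):
    """Turn n scores into a probability distribution summing to 1."""
    e = np.exp(z - z.max())             # subtract max for numerical stability
    return e / e.sum()

# Hypothetical input signal and one 3 x 1 filter (values are arbitrary)
x = np.array([1.0, -2.0, 3.0, 0.5, -1.0, 2.0])
w = np.array([0.5, -1.0, 0.5])

h = relu(conv1d(x, w, stride=1))        # convolution followed by ReLU
p = max_pool(h, size=2)                 # length is halved: 4 -> 2
probs = softmax(np.array([2.0, 1.0, 0.1, -1.0]))  # 4-class softmax output
```

Here `h` has length 4 (valid convolution of a length-6 vector with a length-3 filter), `p` has length 2 after pooling, and `probs` contains four non-negative values summing to 1, matching the 4-class setting described in the table.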