Skip to main content
. 2018 Dec 24;19(1):59. doi: 10.3390/s19010059

Figure 3.

Figure 3

The recurrent three-dimensional convolutional architecture from [23]. As input, the network uses a dynamic gesture in the form of successive frames. It extracts local spatio-temporal features via a 3D Convolutional Neural Network (CNN) and feeds those into a recurrent layer, which aggregates activation across the sequence. Using these activations, a softmax layer then outputs probabilities for the dynamic gesture class.