Table 2. Detailed architecture of the stacked CNN-RNN network used in our study.
Repeat Times | Layer Type | Padding Size | Stride | Filter Size | Number of Filters (Neurons) | Size of Feature Maps | Number of Parameters |
---|---|---|---|---|---|---|---|
1 | Input Layer | n/a | n/a | n/a | n/a | 5 × 224 × 224 × 3 | 0 |
2 | Convolution | 1 × 1 | 1 × 1 | 3 × 3 | 64 | 5 × 224 × 224 × 64 | 38,720 |
2 | ReLU | n/a | n/a | n/a | n/a | 5 × 224 × 224 × 64 | 0 |
1 | Max Pooling | n/a | 2 × 2 | 2 × 2 | 1 | 5 × 112 × 112 × 64 | 0 |
2 | Convolution | 1 × 1 | 1 × 1 | 3 × 3 | 128 | 5 × 112 × 112 × 128 | 221,440 |
2 | ReLU | n/a | n/a | n/a | n/a | 5 × 112 × 112 × 128 | 0 |
1 | Max Pooling | n/a | 2 × 2 | 2 × 2 | 1 | 5 × 56 × 56 × 128 | 0 |
4 | Convolution | 1 × 1 | 1 × 1 | 3 × 3 | 256 | 5 × 56 × 56 × 256 | 2,065,408 |
4 | ReLU | n/a | n/a | n/a | n/a | 5 × 56 × 56 × 256 | 0 |
1 | Max Pooling | n/a | 2 × 2 | 2 × 2 | 1 | 5 × 28 × 28 × 256 | 0 |
4 | Convolution | 1 × 1 | 1 × 1 | 3 × 3 | 512 | 5 × 28 × 28 × 512 | 8,259,584 |
4 | ReLU | n/a | n/a | n/a | n/a | 5 × 28 × 28 × 512 | 0 |
1 | Max Pooling | n/a | 2 × 2 | 2 × 2 | 1 | 5 × 14 × 14 × 512 | 0 |
4 | Convolution | 1 × 1 | 1 × 1 | 3 × 3 | 512 | 5 × 14 × 14 × 512 | 9,439,232 |
4 | ReLU | n/a | n/a | n/a | n/a | 5 × 14 × 14 × 512 | 0 |
1 | Max Pooling | n/a | 2 × 2 | 2 × 2 | 1 | 5 × 7 × 7 × 512 | 0 |
1 | Global Average Pooling | n/a | n/a | n/a | 1 | 5 × 512 | 0 |
1 | Fully Connected Layer | n/a | n/a | n/a | 1024 | 5 × 1024 | 525,312 |
1 | Batch Normalization | n/a | n/a | n/a | n/a | 5 × 1024 | 4,096 |
1 | ReLU | n/a | n/a | n/a | n/a | 5 × 1024 | 0 |
1 | LSTM | n/a | n/a | n/a | n/a | 1024 | 8,392,704 |
1 | Dropout | n/a | n/a | n/a | n/a | 1024 | 0 |
1 | Fully Connected Layer | n/a | n/a | n/a | 2 | 2 | 2,050 |

Total number of parameters: 28,948,546. Total number of trainable parameters: 28,946,498. Total number of non-trainable parameters: 2,048.
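
The parameter counts in Table 2 can be cross-checked with a few lines of arithmetic. The sketch below is not the authors' code; it rests on assumptions that the table does not state explicitly: every convolution and fully connected layer has a bias term, the repeated convolutions in a block keep the block's channel count after the first one, the LSTM has 1024 units with one bias vector per gate, and batch normalization stores four values per feature (scale and shift as trainable, moving mean and variance as non-trainable). All function and variable names are illustrative.

```python
# Cross-check of the parameter counts in Table 2 using plain Python.
# Assumptions (not stated in the table): biases everywhere, 1024 LSTM units,
# one bias vector per LSTM gate, four batch-norm values per feature.

def conv_params(c_in, c_out, k=3):
    """Weights plus biases of a single k x k convolution."""
    return k * k * c_in * c_out + c_out

def conv_block(c_in, c_out, repeats):
    """Stacked 3x3 convolutions; only the first one changes the channel count."""
    return conv_params(c_in, c_out) + (repeats - 1) * conv_params(c_out, c_out)

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def lstm_params(n_in, n_units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (n_units * (n_in + n_units) + n_units)

counts = {
    "conv_64": conv_block(3, 64, 2),
    "conv_128": conv_block(64, 128, 2),
    "conv_256": conv_block(128, 256, 4),
    "conv_512_a": conv_block(256, 512, 4),
    "conv_512_b": conv_block(512, 512, 4),
    "fc_1024": dense_params(512, 1024),
    "batch_norm": 4 * 1024,
    "lstm": lstm_params(1024, 1024),
    "fc_2": dense_params(1024, 2),
}

total = sum(counts.values())
non_trainable = 2 * 1024          # batch-norm moving mean and variance
trainable = total - non_trainable

# Spatial size after each of the five 2 x 2, stride-2 max-pooling layers.
spatial_sizes = []
size = 224
for _ in range(5):
    size //= 2
    spatial_sizes.append(size)

for name, value in counts.items():
    print(f"{name:12s} {value:>12,}")
print(f"{'total':12s} {total:>12,}")
print(f"{'trainable':12s} {trainable:>12,}")
print("spatial sizes after pooling:", spatial_sizes)
```

Under these assumptions the per-layer values and the three totals agree with the table, and the spatial sizes follow the 112, 56, 28, 14, 7 sequence in the Size of Feature Maps column.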