Table 2.
Evolution of CNNs since 1959. The table summarizes the primary points of novelty that motivated each new architecture.
| Architecture | Primary focus and novelty | Author and year |
|---|---|---|
| Simple and complex cells [28] | Described simple and complex cells in the cat visual cortex. Proposed their use in pattern recognition. | Hubel & Wiesel (1959) |
| Neocognitron [29] | Converted the cell idea from [28] into a computational model. | Fukushima (1980) |
| LeNet-5 [30] | First modern CNN, composed of two convolutional layers and three fully connected layers. Introduced the MNIST database. | LeCun et al. (1998) |
| AlexNet [31] | Implemented overlapping pooling and ReLU [32]. Used non-saturating neurons and facilitated effective use of GPU-based training. | Krizhevsky et al. (2012) |
| VGG-16 [33] | Made an exhaustive evaluation of architectures of increasing depth, using very small (3 × 3) convolution filters. | Simonyan and Zisserman (2014) |
| Inception [34] | Increased the depth and width of the network while keeping the computational budget constant. Utilized the Hebbian principle and multi-scale processing. | Szegedy et al. (2015) |
| Modified VGG-16 [35] | Proposed that if a model is strong enough to fit a large dataset, it can also fit a small one. | Liu and Deng (2015) |
| ResNet [36] | Presented a residual learning framework. Allowed deeper models to be built through skip connections (sketched after the table) and paved the way for further variants [37, 38]. | He et al. (2015) |
| Xception [39] | Presented depth-wise separable convolution as an Inception module with a maximally large number of towers (sketched after the table). | Chollet (2016) |
| MobileNets [40] | Designed for mobile and embedded vision applications. Streamlined architecture built from depth-wise separable convolutions. | Howard et al. (2017) |
| ResNeXt [41] | Presented cardinality (size of the transformation set) as a key factor along with the dimensions of an architecture. | Xie et al. (2017) |
| DenseNet [42] | Connected each layer to every other layer in a feed-forward fashion. Strengthens feature propagation and encourages feature reuse. | Huang et al. (2017) |
| Squeeze-and-excitation block [43] | Adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels (sketched after the table). | Hu et al. (2018) |
| Residual inception [44] | Combined residual and inception modules. | Zhang et al. (2018) |
| NASNet search space [45] | Designed a new search space to enable transferability. Presented a new regularization technique, scheduled drop path. | Zoph et al. (2018) |
| EfficientNet [46] | Proposed a novel scaling technique that scales all dimensions (width/resolution/depth) uniformly using a compound coefficient (sketched after the table). | Tan and Le (2019) |
| Normalizer-free models [47] | Developed an adaptive gradient clipping technique to overcome training instability (sketched after the table). Designed a significantly improved class of normalizer-free ResNets. | Brock et al. (2021) |
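
To make several of the building blocks in the table concrete, a few minimal PyTorch sketches follow; class names, channel counts, and default arguments are illustrative assumptions rather than the papers' reference implementations. First, the residual learning idea behind ResNet [36]: each block learns a residual function and adds the input back through an identity skip connection, so gradients can flow around the convolutional stack in very deep networks.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = relu(F(x) + x), with an identity skip."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Skip connection: the identity path lets gradients bypass the conv stack.
        return torch.relu(out + x)
```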
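
The depth-wise separable convolution shared by Xception [39] and MobileNets [40] factorizes a standard convolution into a per-channel spatial filter followed by a 1 × 1 point-wise mix, reducing parameters and computation roughly by a factor of the kernel area. A sketch under the same assumptions:

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depth-wise 3x3 filtering per channel, then a 1x1 point-wise channel mix."""
    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super().__init__()
        # groups=in_channels applies one independent 3x3 filter to each channel.
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_channels, bias=False)
        # The 1x1 point-wise convolution mixes information across channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```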
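
The squeeze-and-excitation block [43] follows the same pattern: a global-average "squeeze" summarizes each channel, and a small two-layer "excitation" network emits per-channel gates in (0, 1) that recalibrate the feature map. The reduction ratio of 16 matches the paper's default; the rest is a simplification.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: gate each channel using a learned global statistic."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):
        n, c, _, _ = x.shape
        squeezed = x.mean(dim=(2, 3))  # squeeze: global average per channel
        gates = torch.sigmoid(self.fc2(torch.relu(self.fc1(squeezed))))
        return x * gates.view(n, c, 1, 1)  # recalibrate channel responses
```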
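
EfficientNet's compound scaling [46] ties depth, width, and input resolution to a single coefficient φ: depth grows as α^φ, width as β^φ, and resolution as γ^φ, with α, β, γ grid-searched under the constraint α · β² · γ² ≈ 2. The α/β/γ values below are the ones reported in the paper; the baseline numbers in the example are placeholders.

```python
# Compound scaling: grow depth, width, and resolution from one coefficient phi.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # paper's values; ALPHA * BETA**2 * GAMMA**2 ~ 2

def compound_scale(base_depth: int, base_width: float, base_resolution: int, phi: float):
    depth = base_depth * ALPHA ** phi            # number of layers
    width = base_width * BETA ** phi             # channel multiplier
    resolution = base_resolution * GAMMA ** phi  # input image size
    return depth, width, resolution

# Scaling a hypothetical baseline up one step (phi = 1).
print(compound_scale(base_depth=18, base_width=1.0, base_resolution=224, phi=1))
```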
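
Finally, the adaptive gradient clipping behind the normalizer-free models [47] clips each gradient relative to the norm of the parameter it updates instead of against a fixed global threshold. The sketch below is a simplified per-tensor version (the paper clips unit-wise, e.g. per output filter), and the clipping factor is an assumed value.

```python
def adaptive_gradient_clip(parameters, clipping: float = 0.01, eps: float = 1e-3):
    """Rescale each gradient so that ||grad|| <= clipping * ||weight||."""
    for p in parameters:
        if p.grad is None:
            continue
        w_norm = p.detach().norm().clamp_(min=eps)  # guard against zero-norm weights
        g_norm = p.grad.detach().norm()
        max_norm = clipping * w_norm
        if g_norm > max_norm:
            p.grad.detach().mul_(max_norm / g_norm)  # shrink the gradient in place
```

A typical call site would sit between `loss.backward()` and `optimizer.step()`, e.g. `adaptive_gradient_clip(model.parameters())`.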