AlexNet | 2012 | Convolutional layers: 5 | Pooling layers: 3 | Fully connected layers: 3
▪ First CNN architecture to win the ImageNet challenge, with a top-5 error rate of 15.3%.
▪ Used ReLU as the activation function instead of tanh or sigmoid (see the sketch after this list).
▪ AlexNet has 60 million parameters.
▪ Used stochastic gradient descent as the learning algorithm.
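A minimal NumPy sketch (not the original AlexNet implementation) of why ReLU is preferred over tanh or sigmoid: its gradient stays at 1 for positive inputs, while saturating activations give near-zero gradients for large values.

```python
import numpy as np

def relu(x):
    # ReLU: max(0, x); does not saturate for large positive inputs.
    return np.maximum(0.0, x)

def relu_grad(x):
    # Gradient is 1 wherever the input is positive, 0 elsewhere.
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.0, 1.5, 4.0])
print(relu(x))       # [0.  0.  0.  1.5 4. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]

# By contrast, tanh saturates: at x = 4 its gradient 1 - tanh(x)^2 is tiny.
print(1.0 - np.tanh(4.0) ** 2)  # ~0.0013
```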
VggNet-16 | 2014 | Convolutional layers: 13 | Pooling layers: 5 | Fully connected layers: 3
▪ The model achieved 92.7% top-5 test accuracy in the ImageNet challenge.
▪ The model replaces the large kernels used in AlexNet with multiple stacked 3×3 kernels, enabling better learning (see the sketch after this list).
▪ The main drawback of this network is that it is slow to train.
▪ Its network weights are also quite large.
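A small PyTorch sketch of the stacked-kernel idea; the channel count of 64 is an arbitrary example, not a value from the source. Two stacked 3×3 convolutions cover the same 5×5 receptive field as a single 5×5 convolution, but with fewer parameters and an extra non-linearity in between.

```python
import torch.nn as nn

c = 64  # hypothetical channel count, for illustration only

# One 5x5 convolution vs. two stacked 3x3 convolutions.
conv5 = nn.Conv2d(c, c, kernel_size=5, padding=2, bias=False)
conv3_stack = nn.Sequential(
    nn.Conv2d(c, c, kernel_size=3, padding=1, bias=False),
    nn.ReLU(inplace=True),
    nn.Conv2d(c, c, kernel_size=3, padding=1, bias=False),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(conv5))       # 102400  (25 * 64 * 64)
print(count(conv3_stack)) # 73728   (2 * 9 * 64 * 64)
```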
ResNet | 2015 | Convolutional layers: 17 with 8 residual units | Pooling layers: 2 | Fully connected layers: 1
▪ The main building blocks are residual blocks, which increase the performance of the network.
▪ The identity (skip) connections help the network handle the vanishing gradient problem.
▪ The batch normalisation used by the network mitigates the problem of internal covariate shift.
▪ ResNet-18 has residual blocks that are two layers deep (see the sketch after this list).
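A PyTorch sketch of a two-layer residual block with batch normalisation and an identity skip connection, in the style of the basic block used by ResNet-18; the class name and channel count are illustrative assumptions, not taken from the source.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicResidualBlock(nn.Module):
    """Two 3x3 convolutions with batch norm and an identity skip connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        identity = x                           # skip path
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity                   # residual addition
        return F.relu(out)

x = torch.randn(1, 64, 56, 56)
print(BasicResidualBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```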
VggNet-19 | 2014 | Convolutional layers: 16 | Pooling layers: 5 | Fully connected layers: 3
▪ Has 3 more convolutional layers than Vgg-16.
▪ The deeper network is believed to learn more effectively.
▪ Requires more memory than Vgg-16 (see the sketch after this list).
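An illustrative check of the memory remark, assuming a recent torchvision is available: building both VGG variants without pretrained weights and counting their parameters shows Vgg-19 is the larger model.

```python
import torchvision.models as models

# Build both architectures without downloading pretrained weights.
vgg16 = models.vgg16(weights=None)
vgg19 = models.vgg19(weights=None)

count = lambda m: sum(p.numel() for p in m.parameters())
print(f"VGG-16: {count(vgg16) / 1e6:.1f}M parameters")  # ~138M
print(f"VGG-19: {count(vgg19) / 1e6:.1f}M parameters")  # ~144M
```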