Abstract
Researchers have adapted conventional deep learning classification networks into Fully Convolutional Networks (FCN) to carry out accurate semantic segmentation. However, such models are expensive in terms of both storage and inference time, and are not readily deployable on edge devices. In this paper, a compressed version of the VGG16-based Fully Convolutional Network (FCN) has been developed using Particle Swarm Optimization. It is shown that the developed model offers tremendous savings in storage space along with faster inference, and can be implemented on edge devices. The efficacy of the proposed approach has been tested using potato late blight leaf images from the publicly available PlantVillage dataset, a street scene image dataset and a lungs X-ray dataset, and it is shown that the compressed model approaches the accuracy offered by the standard FCN even after 851× compression.
Keywords: FCN architecture, Semantic segmentation, Particle Swarm Optimization, Optimization, Compression and acceleration, Disease segmentation
Introduction
Semantic segmentation of images has a wide range of applications in diverse fields, such as road scene segmentation to identify traffic for self-driving cars, CCTV footage segmentation to count the number of people present at any instant of time, and medical image segmentation to locate the diseased portion in the presented samples. Recently, several convolutional networks have been proposed which can segment the different parts of images after the network is trained on original images and their corresponding segmented ground truth images. Badrinarayanan et al. [7] have proposed an encoder-decoder architecture which performs semantic segmentation with high mean IoU accuracy. Ronneberger et al. [44] have developed an architecture called U-Net, which has a contracting path and a corresponding expanding path made of pooling and upsampling layers, respectively, giving the network a U shape. Long et al. [34] have proposed Fully Convolutional Network (FCN) architectures based on AlexNet, VGG16 and GoogLeNet for pixel-wise segmentation of images. Kaymak et al. [29] have used FCN-AlexNet, FCN-8, FCN-16 and FCN-32 in the medical domain to segment skin lesion regions for skin cancer detection, and demonstrated better accuracy than other state-of-the-art methods such as UNet and SegNet. It has been shown [9, 21, 42, 45] that these models achieve high mean Intersection over Union (mIoU) accuracies for segmenting street scenes, medical images, etc.
Generally, these models replace the dense layers at the end of the network with convolution layers, resulting in huge network sizes. Although they carry out efficient semantic segmentation, their huge size makes deployment extremely difficult on edge devices with very limited memory, such as mobile phones and the Raspberry Pi. Hence, this research studies whether these models can be compressed without losing their performance as measured by the evaluation metric.
Several research works in the past few years have tried various ways to compress CNN models, such as singular value decomposition, knowledge transfer, matrix factorization and pruning [6, 23, 24, 31, 33, 54]. In recent years, evolutionary approaches such as genetic algorithms have also been explored to compress CNN models [46, 51, 52]. Beheshti and Johnsson [8] have demonstrated a squeezed version of UNet which needed 12.08 times less memory and also accelerated inference by 1.48 times without losing the performance metric.
Holliday et al. [25] have used an ensemble of FCN8, FCN16, FCN32, FCN-ResNets and FCN-GoogLeNet for semantic segmentation and found that it gives better results than any of the individual models. However, since the ensemble is very large, the authors used it as a teacher model and trained a student model, such as FCN-ResNet152 or FCN8, via knowledge transfer; this compressed the ensemble while retaining accuracy higher than that of any individual model. Nguyen et al. [41] have used a Genetic Algorithm to evolve the best FCN architecture by adding or dropping convolution layers, pooling layers, and Conditional Random Field-Gated Recurrent Unit (CRF-GRU) layers, which increased the semantic segmentation accuracy on the PASCAL VOC 2012 dataset by around 9-10%. Calisto and Lai-Yuen [11] have demonstrated that multi-objective evolutionary algorithms (MEA) can be used to find the best ensemble of 2-D and 3-D FCNs for semantic segmentation of 3-D medical images in prostate segmentation, with 2-D FCNs extracting intra-slice information and a 3-D FCN extracting inter-slice information. The multi-objective evolutionary algorithm helps find the best model that maximizes accuracy while minimizing the number of parameters in the network.
Chen et al. [13] have proposed compressing the 3D UNet by converting its temporal dimension to the frequency domain, which reduces the number of parameters; the compressed 3D UNet therefore has a lower computational cost. The authors tested the compressed model on the Brain Tumour Segmentation (BRATS) dataset and demonstrated that the compressed 3D UNet has a Dice score of 0.7832, against 0.7974 for the uncompressed model, while achieving a speedup ratio of 2×. Mohan et al. [38] have used quantization techniques to compress SqueezeNet for face mask detection prompted by the COVID-19 outbreak, finding that with quantization SqueezeNet can be compressed from 3.84 MB to 386 KB while accuracy improved from 98.93 to 98.99%. Skandha et al. [48] have shown compression of CNN models with the help of a Genetic Algorithm for lung disease classification, demonstrating that on the LIDC-IDRI lung dataset the proposed CNN can be reduced by 90.3% in size while maintaining performance. Yar et al. [53] have used Differential Evolution to compress an attention-based InceptionV3 for fire image classification.
Aim and novelty of the study
In the proposed study, our objective is to compress the FCN architecture so that it can be easily deployed on resource-constrained devices such as the Raspberry Pi, Jetson Nano, and mobile phones. Such compressed models are in great demand at present, as these devices are commonly employed in diverse areas such as agriculture, security surveillance, health care, and IoT.
Although many papers are available in the area of CNN compression, so far no work has been reported on compression of the FCN architecture. The novelty of the proposed work is to employ a meta-heuristic approach for FCN compression. Among the various methods available, such as Genetic Algorithm, Differential Evolution, Particle Swarm Optimization (PSO) and Whale Optimization, PSO is reported to offer high calculation accuracy and to carry out global and local search simultaneously.
A further novelty of the proposed approach is that compression of the FCN model is carried out without compromising the mean Intersection over Union (mIoU) metric.
The rest of this paper is organized as follows: the proposed model is explained in Sect. 3. Section 4 discusses the experimental setup and the obtained results, followed by a discussion on real-time deployment of the compressed model on a mobile device in Sect. 5. The conclusion and future scope are discussed in Sect. 6.
Proposed model
The VGG16-based FCN is a popular deep learning network made entirely of convolution layers (refer Fig. 1), consisting of 15 convolution layers. The first two layers are the same as the first two layers of VGG16, with 64 filters each. These are followed by a set of two convolution layers with 128 filters each, a set of three convolution layers with 256 filters each, and two sets of three convolution layers with 512 filters each; each of these sets ends with a max-pooling layer. The convolution layers are followed by two fully connected layers recast as convolutions, with 4096 filters each. At the end, an upsampling layer with a final softmax layer produces a pixel-wise segmented output of the same size as the input image.
Fig. 1.
Architecture of FCN [34]
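For concreteness, a minimal Keras sketch of this architecture is given below. The input shape, class count and function name are illustrative assumptions, since the paper provides no code; layer counts and filter sizes follow the description above.

```python
from tensorflow.keras import layers, models

def build_fcn32(input_shape=(224, 224, 3), num_classes=3):
    # Assumed input size and class count, chosen only for illustration.
    inputs = layers.Input(shape=input_shape)
    x = inputs
    # Five VGG16 convolution blocks: 2x64, 2x128, 3x256, 3x512, 3x512 filters,
    # each block ending in a max-pooling layer.
    for n_layers, filters in [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]:
        for _ in range(n_layers):
            x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    # Fully connected layers recast as convolutions, 4096 filters each.
    x = layers.Conv2D(4096, 7, padding="same", activation="relu")(x)
    x = layers.Conv2D(4096, 1, padding="same", activation="relu")(x)
    # Per-class scores and a single 32x upsampling back to input resolution.
    x = layers.Conv2D(num_classes, 1, padding="same")(x)
    x = layers.Conv2DTranspose(num_classes, 64, strides=32, padding="same")(x)
    outputs = layers.Softmax(axis=-1)(x)
    return models.Model(inputs, outputs)
```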
The next subsection provides a brief introduction to PSO, which is used here for compressing FCN models.
Introduction to PSO
The Particle Swarm Optimization (PSO) algorithm, developed by Eberhart and Kennedy in 1995 [17], is inspired by the flocking behavior of birds. The PSO flowchart is presented in Fig. 2.
Fig. 2.
Particle Swarm Optimization flowchart
In this algorithm, a given set of particles is randomly assigned initial positions, and as the algorithm progresses each particle updates its position based on its velocity. In each iteration, a particle updates its velocity using its velocity from the previous iteration, a component directed toward its personal best position, and a component directed toward the global best position. The fitness value of each particle position is then calculated, and the personal best and global best positions are updated. The main equations governing PSO are given in Eqs. (1) and (2).
$$v_i^{t+1} = \omega\, v_i^t + c_1 r_1 \left(p_i^{best} - x_i^t\right) + c_2 r_2 \left(g^{best} - x_i^t\right) \tag{1}$$

$$x_i^{t+1} = x_i^t + v_i^{t+1} \tag{2}$$

Here $x_i^t$ is the position of the $i$th particle in iteration $t$, $v_i^t$ is the particle's velocity, $p_i^{best}$ is the best individual particle position, and $g^{best}$ is the best swarm position. $c_1$ and $c_2$ are the cognitive and social parameters, $r_1$ and $r_2$ are random numbers between 0 and 1, and $\omega$ is the inertia weight introduced by Shi and Eberhart [47].
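A minimal NumPy sketch of these two updates is given below; the defaults mirror the parameter values listed later in Table 1, and the function is illustrative rather than the authors' implementation.

```python
import numpy as np

# Sketch of the PSO updates in Eqs. (1) and (2). Names mirror the symbols
# above: w is the inertia weight, c1/c2 the cognitive/social parameters.
def pso_step(x, v, p_best, g_best, w=0.5, c1=2.0, c2=1.0, rng=np.random):
    r1 = rng.random(x.shape)  # r1, r2: uniform random numbers in [0, 1)
    r2 = rng.random(x.shape)
    v_new = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)  # Eq. (1)
    x_new = x + v_new                                                # Eq. (2)
    return x_new, v_new
```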
Application of PSO for FCN compression
We propose to use Particle Swarm Optimization (PSO) [17] for FCN model compression by keeping the most dominant neurons and discarding the redundant neurons at each hidden layer.
In our case, we assume that a vector x of all ones represents the nodes/filters at any particular layer. To start with, this is converted into a binary vector by randomly assigning the value 0 to some of its elements, indicating that the corresponding nodes/filters are to be dropped from that layer.
A particle is formed by concatenating such vectors across all layers of the FCN, as shown in Fig. 3. The initialization of particles is carried out in algorithm 2. The formation of the new particle velocity from the original particle velocity (inertial component), the particle-best component and the swarm-best component is shown in Fig. 4, and the calculation of the new particle position is shown in Fig. 5. After the velocity adjustment, the new particle position takes real values between 0 and 1 in each dimension. Since the aim is to obtain binary values that decide whether a specific node/filter is retained, these real values are rounded off to 0 or 1, as shown in algorithm 3 (a minimal sketch of this rounding step follows Fig. 5).
Fig. 3.
Particle position corresponding to filters in each layer of FCN
Fig. 4.
Calculation of velocity for next iteration using original particle velocity, particle best and swarm best components
Fig. 5.
Calculation of velocity and new particle position based on previous velocity, position, particles best and swarm best positions
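Under the scheme above, the rounding step of algorithm 3 reduces to a one-liner; the following is a minimal sketch, with the clipping to [0, 1] being an added assumption to keep positions interpretable as keep/drop decisions.

```python
import numpy as np

# Minimal sketch of the binary conversion: round each real-valued position
# element to 0 (drop the corresponding node/filter) or 1 (retain it).
def binarize_position(x):
    x = np.clip(x, 0.0, 1.0)        # assumption: keep positions within [0, 1]
    return np.rint(x).astype(int)   # round off to a binary keep/drop mask

# Example: a particle slice covering a 64-filter layer.
mask = binarize_position(np.random.rand(64))
print(mask.sum(), "of 64 filters retained in this layer")
```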
Selection in PSO for FCN compression
The selection criterion uses a combination of the retained-neuron fraction $p_j$ at layer $j$ and the mIoU performance metric $A$ at any compression step, where $A$ is calculated from the pixel-wise accuracy on the test dataset after removing the neurons corresponding to the 0 elements of the particle vector. The fitness function is created as below:

$$F = w\left(1 - \frac{1}{L}\sum_{j=1}^{L} p_j\right) + (1 - w)\,A \tag{3}$$

where $L$ is the number of layers and w is the weight-age given to achieve a trade-off between the dual objectives of reducing neurons and maintaining the pixel-wise segmentation performance. Experiments were performed by varying the value of w to achieve different compression and mIoU values. To maintain the model performance, more weight-age was given to the second term of the fitness function by keeping w below 0.3. The PSO compression steps are repeated until the performance drops by more than a predetermined threshold value.
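A minimal sketch of this fitness evaluation follows; `evaluate_miou` is a hypothetical helper standing in for a forward pass of the pruned model over the test set, and the bookkeeping is an assumption rather than the authors' implementation.

```python
import numpy as np

# Sketch of the fitness in Eq. (3): w trades off neuron reduction against
# the segmentation metric A measured after pruning.
def fitness(masks, model, test_data, w=0.3):
    # masks: one binary keep/drop vector per layer, so m.mean() is p_j.
    retained = np.mean([m.mean() for m in masks])
    a = evaluate_miou(model, test_data, masks)   # hypothetical helper: mIoU
    return w * (1.0 - retained) + (1.0 - w) * a  # higher fitness is better
```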
Experimental setup and results
An NVIDIA DGX V100 machine was used to perform the FCN training and compression experiments, with all code written in the Python programming language. The FCN was compressed using the PSO algorithm and tested on three datasets: street scene images [1], a lungs X-ray dataset [2] and potato late blight leaf images from the PlantVillage dataset [26]. Compression was performed on FCN8 and FCN16 for the street scene images, and on FCN32 for the leaf and lung images. The parameters used in the PSO algorithm are given in Table 1.
Table 1.
Parameters used for PSO Algorithm
| S. no. | Parameter | Value |
|---|---|---|
| 1 | Social constant ($c_2$) | 1 |
| 2 | Cognitive constant ($c_1$) | 2 |
| 3 | Inertia weight ($\omega$) | 0.5 |
| 4 | Number of particles | 15 |
| 5 | Maximum iterations | 30 |
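As an illustration of how these settings drive the compression loop, the following sketch wires the Table 1 values into the `pso_step` and `binarize_position` sketches given earlier; the array dimensions and the fitness bookkeeping are placeholder assumptions, not the authors' code.

```python
import numpy as np

# Placeholder dimensions: 15 particles over a 64-filter layer slice.
NUM_PARTICLES, MAX_ITERATIONS, DIM = 15, 30, 64
rng = np.random.default_rng(0)

x = rng.random((NUM_PARTICLES, DIM))   # particle positions in [0, 1)
v = np.zeros_like(x)                   # particle velocities
p_best = x.copy()                      # personal best positions
g_best = x[0].copy()                   # swarm best position

for _ in range(MAX_ITERATIONS):
    for i in range(NUM_PARTICLES):
        # Table 1 settings: inertia 0.5, cognitive c1 = 2, social c2 = 1.
        x[i], v[i] = pso_step(x[i], v[i], p_best[i], g_best,
                              w=0.5, c1=2.0, c2=1.0)
        keep_mask = binarize_position(x[i])
        # ... evaluate the fitness of keep_mask here and update
        # p_best[i] / g_best accordingly ...
```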
Compression of FCN8 on street scene images
The street scene images dataset has 13 classes; applying FCN8 achieved an mIoU of 80.23% with a storage requirement of 524,971 KB. After applying the proposed compression method on FCN8, a compression of 94.52 times was achieved with the mIoU dropping only from 80.23 to 78.75%. The stepwise compression statistics for FCN8 are presented in Table 3, and predicted outputs are compared with ground truth images through visual representation in Fig. 6, which also depicts the performance comparison among FCN8, compressed FCN8, UNet and SegNet. The proposed method was also applied to FCN16, and the corresponding compression statistics are presented in Table 2.
Table 3.
Performance and size statistics for different compression steps of FCN8 on street scene images
| Compression step | mIoU (After compression) (%) | Compressed size (KB) | Times compression |
|---|---|---|---|
| Initial | 80.23 | 524971 | 1× |
| 1 | 78.10 | 173568 | 3.02× |
| 2 | 77.98 | 65097 | 8.06× |
| 3 | 78.30 | 25275 | 20.77× |
| 4 | 78.70 | 11507 | 45.62× |
| 5 | 78.75 | 5554 | 94.52× |
Fig. 6.
Sample street scene images, their segmented ground truth images, generated outputs of FCN8, generated output of compressed FCN8, output of UNet and output of SegNet
Table 2.
Performance and size statistics for different compression steps of FCN16 on street scene images
| Compression step | mIoU (After compression) (%) | Compressed size (KB) | Times compression |
|---|---|---|---|
| 1 | 70.63 | 171890 | 3.05× |
| 2 | 68.90 | 58869 | 8.91× |
| 3 | 69.77 | 22938 | 22.83× |
| 4 | 70.16 | 9874 | 53.14× |
| 5 | 70.12 | 4538 | 115.62× |
Original FCN16 size: 524726 KB
Compression of FCN32 on leaf images
There has been considerable interest in plant disease classification from leaf images using deep learning techniques [28, 36, 39]. More recently, researchers have attempted to segment leaf images in order to estimate the extent of disease [19, 27, 30, 32, 55].
In our work, the compressed version of FCN32 was applied to images of potato leaves affected by late blight disease, and its performance was compared with various non-compressed deep learning models. Sample leaf images are shown in Fig. 7. Ground truth segmented images were created by taking the HSV transform of the images and checking the hue value ranges for green, yellow and brown colors. The healthy green part of the leaf was made green in the ground truth images, the diseased part, which was yellow or brown, was made brown, and the background was made black, as shown in the second row of Fig. 7 (an illustrative sketch of this HSV thresholding follows Fig. 7). The third row of Fig. 7 shows the results of applying the original uncompressed FCN32, and the predicted output of the compressed FCN32 is presented in the fourth row. For the sake of comparison, the predicted outputs of UNet and SegNet are presented in the fifth and sixth rows. Visual comparison shows that segmentation by the compressed FCN32 is as good as that of the other architectures.
Fig. 7.
Sample potato late blight leaf images from PlantVillage dataset, their segmented ground truth images, generated outputs of FCN32, generated output of compressed FCN32 and comparison with UNet and SegNet outputs
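A hedged OpenCV sketch of the ground-truth generation described above is given below. The exact hue thresholds are assumptions, since the paper only states that hue value ranges were checked for green, yellow and brown.

```python
import cv2
import numpy as np

def make_ground_truth(bgr_image):
    # HSV transform; note OpenCV stores hue in [0, 179].
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    hue, sat = hsv[:, :, 0], hsv[:, :, 1]
    mask = np.zeros_like(bgr_image)               # background stays black
    green = (hue >= 35) & (hue <= 85) & (sat > 40)  # assumed healthy-leaf range
    disease = (hue < 35) & (sat > 40)               # assumed yellow/brown range
    mask[green] = (0, 255, 0)                     # healthy part painted green
    mask[disease] = (42, 42, 165)                 # diseased part painted brown (BGR)
    return mask
```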
The original FCN32 model needed 524,566 KB of storage space and achieved an mIoU of 89.17%. A drastic drop in the required storage space was observed as the model went through successive PSO compression steps. Figure 8 shows the stepwise application of the PSO algorithm, which compresses the FCN model iteratively as long as its performance remains within the desired range.
Fig. 8.
Flowchart of FCN compression process
Table 4 summarizes the achieved mIoU and the required storage at each compression step. It can also be observed from Table 4 that after 10 steps the size of FCN32 was reduced to 616 KB with hardly any degradation in mIoU. The visual comparison of ground truth and predicted output is depicted in Fig. 7.
Table 4.
Performance and size statistics for different compression steps of FCN32 on leaves dataset
| Compression step | mIoU (After compression) (%) | Compressed size (KB) | Times compression |
|---|---|---|---|
| Initial | 89.17 | 524566 | 1× |
| 1 | 88.10 | 149520 | 3.50× |
| 2 | 87.98 | 50656 | 10.35× |
| 3 | 88.30 | 20286 | 25.85× |
| 4 | 88.70 | 8900 | 58.94× |
| 5 | 88.59 | 4051 | 129.49× |
| 6 | 88.32 | 2043 | 256.76× |
| 7 | 88.64 | 1064 | 493.01× |
| 8 | 88.88 | 805 | 651.34× |
| 9 | 88.70 | 672 | 780.60× |
| 10 | 88.42 | 616 | 851.56× |
The results of FCN32 and compressed FCN32 are similar to those of other segmentation methods such as UNet and SegNet. In terms of storage space, UNet needs 121,335 KB and SegNet needs 115,283 KB, whereas compressed FCN32 needs only 616 KB while attaining very close mIoU performance.
A comparison of the number of convolution filters at steps 1, 4, 7, and 10 is provided in Table 5 (one plausible realization of this filter removal is sketched after the table). A condition was added to stop compressing a layer further once its filter count dropped to 32 or fewer, so that enough filters remain in a layer to pass the features forward. Moreover, after compression of the FCN32 model the number of neurons reduces considerably, leading to far fewer mathematical operations; this accelerated inference by 1.68 times on the test dataset.
Table 5.
Comparison of FCN filter numbers during various compression iterations for leaves dataset
| Layer | Filters | Step 1 | Step 4 | Step 7 | Step 10 |
|---|---|---|---|---|---|
| 1 | 64 | 53 | 30 | 30 | 30 |
| 2 | 64 | 48 | 31 | 31 | 31 |
| 3 | 128 | 86 | 39 | 27 | 27 |
| 4 | 128 | 92 | 38 | 29 | 29 |
| 5 | 256 | 171 | 63 | 31 | 31 |
| 6 | 256 | 175 | 60 | 32 | 32 |
| 7 | 256 | 159 | 56 | 29 | 29 |
| 8 | 512 | 287 | 85 | 38 | 29 |
| 9 | 512 | 304 | 84 | 42 | 32 |
| 10 | 512 | 316 | 87 | 33 | 22 |
| 11 | 512 | 313 | 92 | 36 | 27 |
| 12 | 512 | 307 | 92 | 39 | 31 |
| 13 | 512 | 309 | 88 | 33 | 25 |
| 14 | 4096 | 2245 | 455 | 142 | 35 |
| 15 | 4096 | 2205 | 450 | 117 | 46 |
| Size (KB) | 524566 | 149520 | 8900 | 1064 | 616 |
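Although the paper does not describe the mechanics of rebuilding the smaller network, one plausible Keras sketch for physically removing the dropped filters from a single convolution layer is shown below; the function and its arguments are illustrative assumptions.

```python
from tensorflow.keras import layers

def prune_conv_layer(old_layer, keep_mask, prev_keep_mask):
    # keep_mask / prev_keep_mask: boolean vectors marking retained filters in
    # this layer and in the preceding layer (whose channels feed this one).
    kernel, bias = old_layer.get_weights()   # kernel shape: (kh, kw, in, out)
    kernel = kernel[:, :, prev_keep_mask, :][:, :, :, keep_mask]
    bias = bias[keep_mask]
    new_layer = layers.Conv2D(int(keep_mask.sum()), kernel.shape[:2],
                              padding="same", activation="relu")
    # Build the smaller layer, then load the surviving weights into it.
    new_layer.build((None, None, None, int(prev_keep_mask.sum())))
    new_layer.set_weights([kernel, bias])
    return new_layer
```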
Compression of FCN32 on medical dataset of lungs X-rays
The lungs X-ray dataset [2] has two classes. Training FCN32 on this dataset required 524,550 KB of storage and yielded an mIoU of 97.12%. After applying the proposed compression method, the size and mIoU at each compression step are as presented in Table 6. The predicted outputs are compared with ground truth images through visual representation in Fig. 9. As seen from Fig. 9, even after compressing the model by 808.24 times the segmentation results remain similar to those of the original FCN32 model. The results of FCN32 and compressed FCN32 on this dataset are also similar to those of other segmentation methods such as UNet and SegNet.
Table 6.
Comparison of performance and size for different compression steps of compressing FCN32 on lungs X-ray dataset
| Compression step | mIoU (After compression) (%) | Compressed size (KB) | Times compression |
|---|---|---|---|
| Initial | 97.12 | 524550 | 1× |
| 1 | 96.79 | 173569 | 3.02× |
| 2 | 96.89 | 63434 | 8.27× |
| 3 | 97.05 | 25943 | 20.21× |
| 4 | 97.03 | 10541 | 49.76× |
| 5 | 96.97 | 4978 | 105.37× |
| 6 | 96.85 | 2343 | 223.87× |
| 7 | 96.84 | 1499 | 349.93× |
| 8 | 96.78 | 999 | 525.07× |
| 9 | 96.70 | 804 | 652.42× |
| 10 | 96.85 | 726 | 722.52× |
| 11 | 96.95 | 649 | 808.24× |
Fig. 9.
Sample images of lungs X-Ray dataset, ground truth annotations of same images, predicted outputs of FCN32, predicted output of compressed FCN32 and comparison with UNet and SegNet outputs
Actual Implementation
Encouraged by the success of PSO-based compression of the FCN architecture, it was decided to try it out on an edge computing device such as an iPhone. The compressed model was ported to an iPhone to examine its performance on the leaf dataset. Figure 10 shows a sample input image, the Python code and the output image as seen on the iPhone.
Fig. 10.
Sample input image, compressed FCN code and output image on an iPhone
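The paper does not state which deployment toolchain was used; as one hedged possibility, a compressed Keras model could be converted to Core ML for on-device inference using coremltools, roughly as follows (file names are placeholders).

```python
import coremltools as ct
from tensorflow.keras.models import load_model

# Sketch of one conversion route for iPhone deployment, assuming the
# compressed model was saved as a Keras file; not the authors' pipeline.
model = load_model("compressed_fcn32.h5")
mlmodel = ct.convert(model)              # coremltools Unified Conversion API
mlmodel.save("CompressedFCN32.mlmodel")  # bundle this file into the iOS app
```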
Conclusion
This paper has proposed a methodology for compressing the FCN architecture using Particle Swarm Optimization so that it can be deployed on edge devices. The performance of the compressed model has been tested on a number of publicly available datasets and the results compared with uncompressed models. On the leaves dataset, around 851.56 times less storage space is required, along with acceleration of the model by 1.68 times. Similarly, a compression of 94.52 times is achieved on street scene images without loss in the performance metric. On the medical dataset of lung images, the achieved compression is 808.24 times with minimal change in mIoU.
This research can be further extended by using different meta-heuristic approaches, such as Differential Evolution and Whale Optimization, to identify the most suitable method offering higher compression while maintaining the same level of accuracy. The work can also be used to develop IoT-based robotic devices that deploy FCN for automotive, agricultural or medical needs, performing semantic segmentation of the required image portions at run time. Finally, this research can be extended to other models such as SegNet, UNet, YOLO, VNet, and WNet.
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Mohit Agarwal, Email: mohit.agarwal@bennett.edu.in.
Suneet K. Gupta, Email: suneet.gupta@bennett.edu.in
K. K. Biswas, Email: kbiswas11@gmail.com
References
- 1. Street scene images dataset (2007). http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamSeq01/
- 2. Lungs X-ray dataset (2018). https://www.kaggle.com/eduardomineo/lung-segmentation-of-rsna-challenge-data/data
- 3. Agarwal M, Gupta S, Biswas K. A new conv2d model with modified relu activation function for identification of disease type and severity in cucumber plant. Sustain Comput Inform Syst. 2020;30:100473. doi: 10.1016/j.suscom.2020.100473
- 4. Agarwal M, Gupta SK, Biswas K. Development of efficient CNN model for tomato crop disease identification. Sustain Comput Inform Syst. 2020;28:100407
- 5. Alqazzaz S, Sun X, Yang X, Nokes L. Automated brain tumor segmentation on multi-modal MR image using segnet. Comput Vis Media. 2019;5(2):209–219. doi: 10.1007/s41095-019-0139-y
- 6. Anwar S, Hwang K, Sung W. Structured pruning of deep convolutional neural networks. ACM J Emerg Technol Comput Syst (JETC). 2017;13(3):1–18. doi: 10.1145/3005348
- 7. Badrinarayanan V, Kendall A, Cipolla R. Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell. 2017;39(12):2481–2495. doi: 10.1109/TPAMI.2016.2644615
- 8. Beheshti N, Johnsson L (2020) Squeeze u-net: a memory and energy efficient image segmentation network. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp 364–365
- 9. Bi L, Feng D, Kim J. Dual-path adversarial learning for fully convolutional network (fcn)-based medical image segmentation. Vis Comput. 2018;34(6):1043–1052. doi: 10.1007/s00371-018-1519-5
- 10. Brostow GJ, Fauqueur J, Cipolla R. Semantic object classes in video: a high-definition ground truth database. Pattern Recogn Lett. 2009;30(2):88–97. doi: 10.1016/j.patrec.2008.04.005
- 11. Calisto MB, Lai-Yuen SK. Adaen-net: an ensemble of adaptive 2D–3D fully convolutional networks for medical image segmentation. Neural Netw. 2020;126:76–94. doi: 10.1016/j.neunet.2020.03.007
- 12. Chakraborty UK. Advances in differential evolution. Heidelberg: Springer; 2008
- 13. Chen H, Wang Y, Shu H, Tang Y, Xu C, Shi B, Xu C, Tian Q, Xu C (2020) Frequency domain compact 3d convolutional neural networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1641–1650
- 14. Chen T, Cai Z, Zhao X, Chen C, Liang X, Zou T, Wang P. Pavement crack detection and recognition using the architecture of segnet. J Ind Inform Integr. 2020;18:100144
- 15. Cheng Y, Wang D, Zhou P, Zhang T (2017) A survey of model compression and acceleration for deep neural networks. arXiv preprint arXiv:1710.09282
- 16. Choudhary T, Mishra V, Goswami A, Sarangapani J. A comprehensive survey on model compression and acceleration. Artif Intell Rev. 2020;53:5113–5155. doi: 10.1007/s10462-020-09816-7
- 17. Eberhart R, Kennedy J (1995) Particle swarm optimization. In: Proceedings of the IEEE international conference on neural networks, vol 4, pp 1942–1948
- 18. Feoktistov V. Differential evolution. New York: Springer; 2006
- 19. Ganesh P, Volle K, Burks T, Mehta S. Deep orange: mask r-CNN based orange detection and segmentation. IFAC-PapersOnLine. 2019;52(30):70–75. doi: 10.1016/j.ifacol.2019.12.499
- 20. Gong Y, Liu L, Yang M, Bourdev L (2014) Compressing deep convolutional networks using vector quantization. arXiv preprint arXiv:1412.6115
- 21. Guo D, Zhu L, Lu Y, Yu H, Wang S. Small object sensitive segmentation of urban street scene with spatial adjacency between object classes. IEEE Trans Image Process. 2018;28(6):2643–2653. doi: 10.1109/TIP.2018.2888701
- 22. Han S, Mao H, Dally WJ (2015) Deep compression: compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149
- 23. Han S, Pool J, Tran J, Dally W. Learning both weights and connections for efficient neural network. Adv Neural Inform Process Syst. 2015;28:1135–1143
- 24. He Y, Zhang X, Sun J (2017) Channel pruning for accelerating very deep neural networks. In: Proceedings of the IEEE international conference on computer vision, pp 1389–1397
- 25. Holliday A, Barekatain M, Laurmaa J, Kandaswamy C, Prendinger H. Speedup of deep learning ensembles for semantic segmentation using a model compression technique. Comput Vis Image Underst. 2017;164:16–26. doi: 10.1016/j.cviu.2017.05.004
- 26. Hughes D, Salathé M, et al. (2015) An open access repository of images on plant health to enable the development of mobile disease diagnostics. arXiv preprint arXiv:1511.08060
- 27. Islam M, Dinh A, Wahid K, Bhowmik P (2017) Detection of potato diseases using image segmentation and multiclass support vector machine. In: 2017 IEEE 30th Canadian conference on electrical and computer engineering (CCECE), IEEE, pp 1–4
- 28. Johannes A, Picon A, Alvarez-Gila A, Echazarra J, Rodriguez-Vaamonde S, Navajas AD, Ortiz-Barredo A. Automatic plant disease diagnosis using mobile capture devices, applied on a wheat use case. Comput Electron Agric. 2017;138:200–209. doi: 10.1016/j.compag.2017.04.013
- 29. Kaymak R, Kaymak C, Ucar A. Skin lesion segmentation using fully convolutional networks: a comparative experimental study. Expert Syst Appl. 2020;161:113742. doi: 10.1016/j.eswa.2020.113742
- 30. Lee U, Chang S, Putra GA, Kim H, Kim DH. An automated, high-throughput plant phenotyping system using machine learning-based plant segmentation and image analysis. PLoS One. 2018;13(4):e0196615. doi: 10.1371/journal.pone.0196615
- 31. Li H, Kadav A, Durdanovic I, Samet H, Graf HP (2016) Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710
- 32. Lin K, Gong L, Huang Y, Liu C, Pan J. Deep learning-based segmentation and quantification of cucumber powdery mildew using convolutional neural network. Front Plant Sci. 2019;10:155. doi: 10.3389/fpls.2019.00155
- 33. Liu Z, Li J, Shen Z, Huang G, Yan S, Zhang C (2017) Learning efficient convolutional networks through network slimming. In: Proceedings of the IEEE international conference on computer vision, pp 2736–2744
- 34. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
- 35. Luo JH, Wu J, Lin W (2017) Thinet: a filter level pruning method for deep neural network compression. In: Proceedings of the IEEE international conference on computer vision, pp 5058–5066
- 36. Ma J, Du K, Zheng F, Zhang L, Gong Z, Sun Z. A recognition method for cucumber diseases using leaf symptom images based on deep convolutional neural network. Comput Electron Agric. 2018;154:18–24. doi: 10.1016/j.compag.2018.08.048
- 37. Manickam R, Rajan SK, Subramanian C, Xavi A, Eanoch GJ, Yesudhas HR. Person identification with aerial imaginary using segnet based semantic segmentation. Earth Sci Inform. 2020;13:1393. doi: 10.1007/s12145-020-00516-y
- 38. Mohan P, Paul AJ, Chirania A (2021) A tiny CNN architecture for medical face mask detection for resource-constrained endpoints. In: Innovations in electrical and electronic engineering, Springer, pp 657–670
- 39. Mohanty SP, Hughes DP, Salathé M. Using deep learning for image-based plant disease detection. Front Plant Sci. 2016;7:1419. doi: 10.3389/fpls.2016.01419
- 40. Nguyen HD, Na IS, Kim SH (2019) Hand segmentation and fingertip tracking from depth camera images using deep convolutional neural network and multi-task segnet. arXiv preprint arXiv:1901.03465
- 41. Nguyen K, Fookes C, Sridharan S. Context from within: hierarchical context modeling for semantic segmentation. Pattern Recogn. 2020;105:107358. doi: 10.1016/j.patcog.2020.107358
- 42. Ping-Rong C, Hang HM, Sheng-Wei C, Lin JJ. Dsnet: an efficient CNN for road scene segmentation. APSIPA Trans Signal Inform Process. 2020;9:e27
- 43. Rahnamayan S, Tizhoosh HR, Salama MM. Opposition-based differential evolution. IEEE Trans Evol Comput. 2008;12(1):64–79. doi: 10.1109/TEVC.2007.894200
- 44. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention, Springer, pp 234–241
- 45. Roth HR, Oda H, Zhou X, Shimizu N, Yang Y, Hayashi Y, Oda M, Fujiwara M, Misawa K, Mori K. An application of cascaded 3D fully convolutional networks for medical image segmentation. Comput Med Imag Graph. 2018;66:90–99. doi: 10.1016/j.compmedimag.2018.03.001
- 46. Samala RK, Chan HP, Hadjiiski LM, Helvie MA, Richter C, Cha K. Evolutionary pruning of transfer learned deep convolutional neural network for breast cancer diagnosis in digital breast tomosynthesis. Phys Med Biol. 2018;63(9):095005. doi: 10.1088/1361-6560/aabb5b
- 47. Shi Y, Eberhart R (1998) A modified particle swarm optimizer. In: 1998 IEEE international conference on evolutionary computation proceedings, IEEE world congress on computational intelligence, IEEE, pp 69–73
- 48. Skandha SS, Agarwal M, Utkarsh K, Gupta SK, Koppula VK, Suri JS. A novel genetic algorithm-based approach for compression and acceleration of deep learning convolution neural network: an application in computer tomography lung cancer data. Neural Comput Appl. 2022;34(23):20915–20937. doi: 10.1007/s00521-022-07567-w
- 49. Song S, Lichtenberg SP, Xiao J (2015) Sun rgb-d: a rgb-d scene understanding benchmark suite. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 567–576
- 50. Storn R, Price K. Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. J Global Optim. 1997;11(4):341–359. doi: 10.1023/A:1008202821328
- 51. Wang Z, Li F, Shi G, Xie X, Wang F. Network pruning using sparse learning and genetic algorithm. Neurocomputing. 2020;404:247–256. doi: 10.1016/j.neucom.2020.03.082
- 52. Yang C, An Z, Li C, Diao B, Xu Y (2019) Multi-objective pruning for CNNs using genetic algorithm. In: International conference on artificial neural networks, Springer, pp 299–305
- 53. Yar H, Hussain T, Agarwal M, Khan ZA, Gupta SK, Baik SW. Optimized dual fire attention network and medium-scale fire classification benchmark. IEEE Trans Image Process. 2022;31:6331–6343. doi: 10.1109/TIP.2022.3207006
- 54. Zhang Q, Zhang M, Chen T, Sun Z, Ma Y, Yu B. Recent advances in convolutional neural network acceleration. Neurocomputing. 2019;323:37–51. doi: 10.1016/j.neucom.2018.09.038
- 55. Zhou J, Fu X, Zhou S, Zhou J, Ye H, Nguyen HT. Automated segmentation of soybean plants from 3D point cloud using machine learning. Comput Electron Agric. 2019;162:143–153. doi: 10.1016/j.compag.2019.04.014