Applied Bionics and Biomechanics. 2020 Nov 23;2020:8886923. doi: 10.1155/2020/8886923

Multichannel Saliency Detection Based on Visual Bionics

Lidan Cheng 1, Tianyi Li 1, Shijia Zha 1, Wei Wei 1, Jihua Gu 1
PMCID: PMC7704193  PMID: 33299470

Abstract

Inspired by the visual properties of the human eye, depth information from visual attention is integrated into saliency detection to address problems such as low accuracy and poor stability under similar or complex background interference. First, an improved SLIC algorithm is used to segment and cluster the RGBD image. Second, the depth saliency of each image region is obtained with the anisotropic center-surround difference method. Then, the global feature saliency of the RGB image is calculated according to the colour perception rules of human vision. The resulting multichannel saliency maps are weighted and fused based on information entropy to highlight the target area and produce the final detection result. The proposed method works within a complexity of O(N), and the experimental results show that our algorithm, based on visual bionics, effectively suppresses the interference of similar or complex backgrounds and achieves high accuracy and stability.

1. Introduction

Saliency detection is an important research topic in computer vision; it refers to the process of simulating the human visual attention mechanism to detect the most interesting regions of an image quickly and accurately. Borji et al. defined saliency as a visual description of how prominent a target or area in a scene is relative to its neighbouring area [1]. The human visual attention mechanism prioritizes a few significant areas or objectives while ignoring or discarding the rest, which allows computing resources to be allocated selectively and greatly improves the efficiency of visual information processing. Therefore, saliency computing models based on the visual attention mechanism have been widely studied. When processing an input image or video, a computer can judge the importance of its visual information by detecting salient areas. Saliency detection has been widely applied in object detection and identification [2], image retrieval [3], video quality assessment [4], video compression [5], image cropping [6], and other fields.

RGB image saliency detection models based on the visual attention mechanism use low-level feature contrast to calculate saliency [7, 8]. Typical examples are the global feature contrast model [9], the local feature contrast model [10], and the combined global and local feature contrast model [11]. To improve detection accuracy, saliency detection models based on prior knowledge were proposed [12], typically using position priors [13], background priors [14, 15], colour priors [16], shape priors [17], and boundary priors [18, 19].

However, most 2D image saliency detection models based on the human visual attention mechanism ignore the fact that this mechanism operates on 3D scenes; depth therefore provides additional important information for saliency detection beyond the RGB image. Desingh et al. discussed 3D saliency detection methods based on depth appearance, depth-induced blur, and centre bias [20]. Niu et al. performed depth saliency detection based on parallax contrast and domain knowledge in stereoscopic photography [21]. Further, Ju et al. proposed a depth saliency detection model based on the anisotropic center-surround difference of the depth image [22]. Ren et al. [23] proposed saliency detection for RGB-D images against complex backgrounds by incorporating prior knowledge of depth, indicating the validity of depth information in 3D saliency detection. However, there are two challenges in saliency detection for RGB-D images: the first is how to calculate the saliency of depth images under similar or complex background interference, and the second is how to combine the saliency maps of the depth image and the RGB image into a final result with good performance. In this paper, we propose a multichannel saliency detection method based on RGBD images, which makes the following contributions:

  1. On the basis of the SLIC algorithm, colour, texture, and depth information are used to measure the distance for superpixel segmentation

  2. Based on the perception rules of human vision, we introduce the depth information and the global information of the RGB image as two feature channels for saliency computing

  3. The weighted depth saliency and colour saliency features are fused by information entropy, and experiments show that the algorithm performs well in the presence of background interference

2. Saliency Detection

The algorithm framework of this paper is shown in Figure 1. The depth map is combined with the RGB map for image preprocessing, and colour, texture, and depth information are introduced as the basis of superpixel segmentation. Then, the colour and depth information are calculated as two feature channels of the saliency map. As shown in Figure 2, the depth saliency is obtained by the anisotropic center-surround difference (ACSD) method, and the global saliency of the RGB image is calculated by a global contrast method based on HSV space. Finally, information entropy is used to calculate the weight of each channel and produce the final fused saliency map.

Figure 1. Our framework of saliency detection.

Figure 2. Example of the ACSD operator in a region.

3. Image Preprocessing

The human visual observation system takes image regions as its basic unit, and region-based saliency detection conforms to the visual characteristics of the human eye. As a method for constructing pixel regions, superpixel technology has been widely used in computer vision. Superpixels can quickly segment an image into subregions with certain semantics, which is conducive to the extraction of local features and the expression of structural information [24]. The SLIC algorithm achieves a good balance between edge adherence and compactness and therefore has excellent overall performance. However, when SLIC is used to segment the left image alone, the obtained boundaries are not accurate because the mutual constraints between the 2D information and the depth information are ignored. Therefore, colour, texture, and depth information are used to measure the distance for superpixel segmentation in this paper.

The left image is converted to the CIE Lab colour space and divided into k superpixels, each with a unique identifier i. The following 7-dimensional characteristics of each superpixel region are extracted as measurement properties:

$$\overline{Sp}_i = \left\{ l_i, a_i, b_i, C_{con_i}, C_{cor_i}, E_i, d_i \right\}, \quad i = 1, 2, 3, \ldots, k, \tag{1}$$

where $l_i$, $a_i$, and $b_i$ are the mean values of the L, a, and b colour components of each superpixel region; $C_{con_i}$, $C_{cor_i}$, and $E_i$ are the mean contrast, cross-correlation, and energy of the gray-level co-occurrence matrix of each superpixel region; and $d_i$ is the depth value of each superpixel region. Then, an adjacent superpixel pair is described as $\overline{Sp}_{ij}$:

$$\overline{Sp}_{ij} = \left\{ \overline{Sp}_i, \overline{Sp}_j \right\}, \quad i \in [1, k],\ j \in [1, k],\ i \neq j, \tag{2}$$

where $\overline{Sp}_{ij}$ is the superpixel pair with $i$ and $j$ as identifiers, $k$ is the number of superpixels in the image, and $\overline{Sp}_i$ and $\overline{Sp}_j$ are the 7-dimensional characteristics of the adjacent superpixel pair. The number of adjacent superpixel pairs in each image is determined by the SLIC superpixel segmentation.
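
To make equation (1) concrete, the following is a minimal sketch of how the 7-dimensional descriptor of a superpixel region could be assembled, assuming a precomputed SLIC label map; the use of skimage's graycomatrix/graycoprops over the region's bounding box is an assumption, since the paper does not specify how the gray-level co-occurrence matrix statistics are computed.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def superpixel_descriptor(lab, gray, depth, labels, i):
    """Build the 7-d descriptor of Eq. (1) for superpixel i.

    lab    : HxWx3 CIE Lab image
    gray   : HxW uint8 grayscale image (for the GLCM statistics)
    depth  : HxW depth map
    labels : HxW SLIC label map
    """
    mask = labels == i
    l_i, a_i, b_i = lab[mask].mean(axis=0)          # mean Lab components

    # GLCM statistics over the superpixel's bounding box (one possible choice)
    ys, xs = np.nonzero(mask)
    patch = gray[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    glcm = graycomatrix(patch, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    c_con = graycoprops(glcm, 'contrast')[0, 0]
    c_cor = graycoprops(glcm, 'correlation')[0, 0]
    energy = graycoprops(glcm, 'energy')[0, 0]

    d_i = depth[mask].mean()                        # depth value of the region
    return np.array([l_i, a_i, b_i, c_con, c_cor, energy, d_i])
```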

Colour, texture, and depth features are used to calculate the difference between all adjacent superpixel pairs $\overline{Sp}_{ij}$. $d_{lab}$, $d_{glcm}$, and $d_{depth}$ are defined to measure the colour, texture, and depth differences:

$$d_{lab} = \sqrt{(l_i - l_j)^2 + (a_i - a_j)^2 + (b_i - b_j)^2}, \quad
d_{glcm} = \sqrt{(C_{con_i} - C_{con_j})^2 + (C_{cor_i} - C_{cor_j})^2 + (E_i - E_j)^2}, \quad
d_{depth} = \sqrt{(d_i - d_j)^2}. \tag{3}$$

Then, the distance measure for superpixel segmentation, $D_{ij}$, is

$$D_{ij} = \omega_1 \frac{\varepsilon + d_{lab}^2}{3} + \omega_2 \frac{\varepsilon + d_{glcm}^2}{3} + \omega_3 \left( \varepsilon + d_{depth}^2 \right), \tag{4}$$

where $\varepsilon = 10^{-4}$ is used to ensure numerical validity, and $\omega_1$, $\omega_2$, and $\omega_3$ are the weights of colour, texture, and depth.

In an image, the greater the dispersion of a feature's values, the greater the influence that feature has on the image. The variance effectively represents the degree of difference between data values. Therefore, the global variances of colour, texture, and depth are used as the weight values $\omega_1$, $\omega_2$, and $\omega_3$ of the three features.
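
To make the distance measure concrete, here is a minimal sketch of equations (3) and (4) together with the variance-based weights described above; dividing the colour and texture terms by 3 and using the raw global variances as $\omega_1$, $\omega_2$, and $\omega_3$ are assumptions about details the paper leaves open.

```python
import numpy as np

EPS = 1e-4  # the epsilon of Eq. (4)

def pair_distance(sp_i, sp_j, w1, w2, w3):
    """Distance D_ij between two 7-d superpixel descriptors, Eqs. (3)-(4)."""
    d_lab   = np.sqrt(np.sum((sp_i[0:3] - sp_j[0:3]) ** 2))   # colour term
    d_glcm  = np.sqrt(np.sum((sp_i[3:6] - sp_j[3:6]) ** 2))   # texture term
    d_depth = abs(sp_i[6] - sp_j[6])                           # depth term
    return (w1 * (EPS + d_lab ** 2) / 3
            + w2 * (EPS + d_glcm ** 2) / 3
            + w3 * (EPS + d_depth ** 2))

def feature_weights(descriptors):
    """Global variances of the colour, texture, and depth features,
    used here as w1, w2, w3 (one plausible reading of the paper)."""
    D = np.asarray(descriptors)                  # k x 7 matrix
    w1 = D[:, 0:3].var()                         # colour variance
    w2 = D[:, 3:6].var()                         # texture variance
    w3 = D[:, 6].var()                           # depth variance
    return w1, w2, w3
```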

If the difference between adjacent superpixels is less than a threshold $th_1$, the adjacent superpixel pair is merged:

$$th_1 = \omega_1 \frac{\bar{l} + \bar{a} + \bar{b}}{3} + \omega_2 \frac{\overline{C_{con}} + \overline{C_{cor}} + \bar{E}}{3} + \omega_3 \bar{d}, \tag{5}$$

where $\bar{l}$, $\bar{a}$, and $\bar{b}$ are the mean values of the L, a, and b colour components of the image; $\overline{C_{con}}$, $\overline{C_{cor}}$, and $\bar{E}$ are the mean contrast, cross-correlation, and energy of the gray-level co-occurrence matrix of the image; and $\bar{d}$ is the mean depth value of the image.

All similar adjacent superpixel pairs are found, and the upper-left superpixel of the image is taken as the starting point of clustering. The output after clustering contains $n$ regions $R_i$, $1 \leq i \leq n$.
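
A minimal sketch of the merging step, using a union-find structure; the paper only states that similar adjacent pairs are merged starting from the upper-left superpixel, so the traversal details here are an assumption.

```python
def merge_similar_superpixels(pairs, distances, th1, k):
    """Merge adjacent superpixels whose distance is below th1.

    pairs     : list of (i, j) adjacent superpixel index pairs
    distances : list of D_ij values, aligned with `pairs`
    th1       : merging threshold of Eq. (5)
    k         : number of superpixels
    Returns a list mapping each superpixel to a region id.
    """
    parent = list(range(k))

    def find(x):                       # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), d in zip(pairs, distances):
        if d < th1:
            parent[find(i)] = find(j)  # merge the two regions

    roots = [find(i) for i in range(k)]
    ids = {r: n for n, r in enumerate(dict.fromkeys(roots))}
    return [ids[r] for r in roots]     # compact region labels 0..n-1
```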

4. Depth Saliency Map

For each superpixel, the anisotropic center-surround difference (ACSD) value is calculated, and the value of the center superpixel is assigned to each pixel within the region $R_i$. An anisotropic scan is performed along multiple directions; in each scanline, the pixel with the minimum depth value is assumed to be background, and the difference between the center pixel and the background is calculated. $L$ is the maximum scan length for each scanline; its typical value is a third of the diagonal length of the image.

The anisotropic center-surround difference (ACSD) is summed over eight scanning directions: 0°, 45°, 90°, 135°, 180°, 225°, 270°, and 315°. The mathematical description of the ACSD measure is

$$S_d^m(x, y) = d(x, y) - \min_{n \in [1, L]} d_m(n), \qquad S_d(x, y) = \sum_{m=1}^{8} S_d^m(x, y), \tag{6}$$

where $S_d^m(x, y)$ indicates the ACSD value of pixel $(x, y)$ along scanline $m$, $d(x, y)$ is the depth value of pixel $(x, y)$, $n$ is the index of the pixels along scan path $m$, $\min(d_m)$ is the minimum depth value along scanline $m$, and $S_d(x, y)$ is the ACSD value of pixel $(x, y)$, which sums the center-surround difference values over the eight directions.
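
The following is an unoptimised sketch of the ACSD measure of equation (6): for every pixel, scan up to $L$ steps along each of the eight directions, take the minimum depth on each scanline as the local background, and sum the eight center-minus-background differences. Assigning the center superpixel's value to all pixels of a region, as described above, is omitted here.

```python
import numpy as np

# the eight scan directions: 0, 45, ..., 315 degrees
DIRECTIONS = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
              (0, -1), (1, -1), (1, 0), (1, 1)]

def acsd(depth, L=None):
    """Anisotropic center-surround difference of Eq. (6) (unoptimised sketch)."""
    h, w = depth.shape
    if L is None:
        L = int(np.hypot(h, w) / 3)    # a third of the diagonal length
    sal = np.zeros_like(depth, dtype=float)
    for y in range(h):
        for x in range(w):
            for dy, dx in DIRECTIONS:
                d_min = depth[y, x]
                for n in range(1, L + 1):          # walk along the scanline
                    yy, xx = y + n * dy, x + n * dx
                    if not (0 <= yy < h and 0 <= xx < w):
                        break
                    d_min = min(d_min, depth[yy, xx])
                sal[y, x] += depth[y, x] - d_min    # center minus background
    return sal
```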

5. Global Saliency Map of RGB Image

A colour histogram is used to quantize the colours of the image to 128 levels in order to reduce computational complexity and save storage space. In addition, a dimension-reduction scheme for the HSV colour space is adopted. As saturation decreases, any colour in HSV space can be described by a change of gray level, and the intensity value determines the specific gray level of the conversion [25]. When the colour saturation is close to zero, all pixels look similar regardless of hue; as saturation increases, pixels are distinguished by their hue values.

Compared with colour saturation, human vision is more sensitive to hue and intensity. Pixels with lower colour saturation can be approximately represented by their intensity level, while pixels with higher colour saturation can be approximately represented by their hue. The saturation value is therefore used to determine whether each pixel is represented by its hue or its intensity value, which is more consistent with the laws of human visual perception. The saturation threshold $th_2$ is

$$th_2 = 1 - \frac{0.8\, I_v(x, y)}{255}, \tag{7}$$

where $I_v(x, y)$ represents the V (intensity) component of a pixel. When the saturation value $I_s(x, y)$ is greater than $th_2$, the pixel is represented by the hue value $I_h(x, y)$; when the saturation value is less than $th_2$, the pixel is represented by the intensity value $I_v(x, y)$. The saliency of each pixel is

$$S_c(x, y) = \begin{cases} \left| \bar{I} - I_v(x, y) \right|, & I_s(x, y) \leq th_2, \\ \left| \bar{I} - I_h(x, y) \right|, & I_s(x, y) > th_2, \end{cases} \quad (x, y) \in R_i, \tag{8}$$

where $I_s(x, y)$ is the saturation value of the pixel, $I_h(x, y)$ is the hue value of the pixel, $I_v(x, y)$ is the intensity value of the pixel, and $\bar{I}$ is the mean value over all pixels.
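
Below is a minimal sketch of equations (7) and (8), assuming an OpenCV HSV conversion; the 128-level colour quantization is omitted, and the handling of the different hue and intensity value ranges is simplified.

```python
import cv2
import numpy as np

def global_colour_saliency(bgr):
    """Global saliency of Eqs. (7)-(8) on HSV (a sketch; details assumed)."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV).astype(float)
    I_h, I_s, I_v = hsv[..., 0], hsv[..., 1] / 255.0, hsv[..., 2]

    th2 = 1.0 - 0.8 * I_v / 255.0           # saturation threshold, Eq. (7)

    # high-saturation pixels are represented by hue, the rest by intensity
    rep = np.where(I_s > th2, I_h, I_v)
    sal = np.abs(rep.mean() - rep)          # distance to the global mean, Eq. (8)
    return sal / (sal.max() + 1e-12)        # normalise to [0, 1]
```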

6. Fusion of Saliency Map

When synthesizing the colour saliency map and the depth saliency map, the information entropy is used to calculate the weights of the channels.

The information entropy of colour saliency is

$$H_c(R) = -\sum_{i=1}^{n} p_c(R_i) \log p_c(R_i), \tag{9}$$

where $p_c(R_i)$ is the ratio of the sum of the colour saliency values in $R_i$ to that of the whole image.

The information entropy of depth saliency is

$$H_d(R) = -\sum_{i=1}^{n} p_d(R_i) \log p_d(R_i), \tag{10}$$

where $p_d(R_i)$ is the ratio of the sum of the depth saliency values in $R_i$ to that of the whole image.

The final saliency map $S_{fuse}(x, y)$ is obtained by fusing the two channels:

$$S_{fuse}(x, y) = \frac{H_c(R)}{H_c(R) + H_d(R)} S_c(x, y) + \frac{H_d(R)}{H_c(R) + H_d(R)} S_d(x, y), \quad (x, y) \in R_i. \tag{11}$$
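
A small sketch of the entropy-weighted fusion of equations (9)-(11), assuming the clustered region labels from Section 3; the entropy is taken as the standard Shannon entropy over the per-region saliency shares.

```python
import numpy as np

def entropy_over_regions(sal, labels, n_regions):
    """Information entropy of a saliency map over the clustered regions,
    Eqs. (9)-(10): p(R_i) is region i's share of the total saliency."""
    total = sal.sum() + 1e-12
    p = np.array([sal[labels == i].sum() / total for i in range(n_regions)])
    p = p[p > 0]                              # ignore empty regions
    return -(p * np.log(p)).sum()

def fuse(sal_c, sal_d, labels, n_regions):
    """Entropy-weighted fusion of the colour and depth channels, Eq. (11)."""
    h_c = entropy_over_regions(sal_c, labels, n_regions)
    h_d = entropy_over_regions(sal_d, labels, n_regions)
    return (h_c * sal_c + h_d * sal_d) / (h_c + h_d + 1e-12)
```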

7. Experimental Comparison

We show a few saliency maps generated by different algorithms in Figure 3.

Figure 3. Saliency comparisons of different methods on the NJU400 dataset: (a) RGB image; (b) depth image; (c) ground truth; (d) GS; (e) MC; (f) MR; (g) WCTR; (h) ACSD; (i) MSD.

The precision-recall curve evaluates performance from two aspects: precision and recall. Precision is the ratio of the number of correctly detected salient pixels to the total number of detected salient pixels and is plotted on the y-axis. Recall is the ratio of the number of correctly detected salient pixels to the number of ground-truth salient pixels and is plotted on the x-axis.
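
For reference, a small sketch of how the precision-recall curve could be computed for one image by sweeping a threshold over the saliency map and comparing against the binary ground truth; the number of thresholds is an arbitrary choice.

```python
import numpy as np

def precision_recall_curve(sal, gt, n_thresholds=256):
    """Precision and recall of a saliency map against a binary ground truth,
    computed by sweeping a threshold over the normalised map."""
    sal = (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)
    gt = gt.astype(bool)
    precisions, recalls = [], []
    for t in np.linspace(0, 1, n_thresholds):
        pred = sal >= t
        tp = np.logical_and(pred, gt).sum()     # correctly detected pixels
        precisions.append(tp / (pred.sum() + 1e-12))
        recalls.append(tp / (gt.sum() + 1e-12))
    return np.array(precisions), np.array(recalls)
```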

The algorithms are tested on the NJU400 dataset. Two test sets are derived from NJU400 according to the complexity and similarity of the background. Four volunteers were invited to divide the raw dataset into a normal group (N group) and a similar/complex background group (S/C group). In the end, 92 high-quality, consistently labelled images were selected for the S/C group, and the rest were assigned to the N group. The precision-recall curves of the different algorithms on the N group, S/C group, and full dataset are given in Figure 4. The performance of the different algorithms on the three groups is given in Table 1.

Figure 4. The precision-recall curves of different algorithms: (a) on the N group; (b) on the S/C group; (c) on the full dataset.

Table 1. The performance of different algorithms tested on the three groups.

Group           Algorithm   Precision
N group         GS          0.68
                MC          0.79
                MR          0.85
                RBD         0.81
                ACSD        0.84
                MSD         0.82
S/C group       GS          0.49
                MC          0.50
                MR          0.63
                RBD         0.56
                ACSD        0.87
                MSD         0.95
Full datasets   GS          0.64
                MC          0.72
                MR          0.80
                RBD         0.76
                ACSD        0.81
                MSD         0.86

The proposed method works within a complexity of O(N). The evaluation of these saliency detection algorithms on the S/C group shows that our algorithm performs better than the other algorithms, and it also performs well on the full dataset. By selecting the salient subset for further processing, the complexity of higher-level visual analysis can be reduced significantly. Many applications benefit from saliency analysis, such as object segmentation, image classification, and image/video retargeting.

8. Conclusions

A new framework based on visual bionics for saliency detection under similar or complex background interference is proposed in this paper. First, the depth map is combined with the RGB map, and colour, texture, and depth information are introduced as the basis of superpixel segmentation. Second, the colour and depth information are calculated as two feature channels of the saliency map. Finally, information entropy is used to calculate the weight of each channel and produce the final fused saliency map. The proposed method works within a complexity of O(N), and the experimental results show that our saliency detection framework greatly reduces detection errors under similar and complex backgrounds and improves the overall saliency detection performance.

Data Availability

The NJU400 datasets used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

  • 1.Borji A., Itti L. State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2013;35(1):185–207. doi: 10.1109/TPAMI.2012.89. [DOI] [PubMed] [Google Scholar]
  • 2.Wang W., Shen J., Shao L., Porikli F. Correspondence driven saliency transfer. IEEE Transactions on Image Processing. 2016;25(11):5025–5034. doi: 10.1109/TIP.2016.2601784. [DOI] [PubMed] [Google Scholar]
  • 3.Jian M. W., Dong J. Y., Ma J. Image retrieval using wavelet-based salient regions. The Imaging Science Journal. 2013;59(4):219–231. doi: 10.1179/136821910X12867873897355. [DOI] [Google Scholar]
  • 4.Xin F., Yang D., Ling Z. Saliency variation-based quality assessment for packet-loss-impaired videos. Acta Automatica Sinica. 2011;37(11):1322–1331. [Google Scholar]
  • 5.Gupta R., Chaudhury S. A scheme for attentional video compression. Proceedings of the 4th International Conference on Pattern Recognition and Machine Intelligence; 2011; Moscow, Russia. IEEE; pp. 458–465. [DOI] [Google Scholar]
  • 6.Guo C. A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression. IEEE Transactions on Image Processing. 2010;19(1):185–198. doi: 10.1109/TIP.2009.2030969. [DOI] [PubMed] [Google Scholar]
  • 7.Itti L., Koch C., Niebur E. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1998;20(11):1254–1259. doi: 10.1109/34.730558. [DOI] [Google Scholar]
  • 8.Hu Z.-P., Peng-Quan M. Graph presentation random walk salient object detection algorithm based on global isolation and local homogeneity. Acta Automatica Sinica. 2011;37(10):1279–1284. [Google Scholar]
  • 9.Cheng M. M., Mitra N. J., Huang X. L., Torr P. H. S., Hu S. M. Global contrast based salient region detection. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2015;37(3):569–582. doi: 10.1109/TPAMI.2014.2345401. [DOI] [PubMed] [Google Scholar]
  • 10.Yong T., Yang L., Liang-Liang D. Image cell-based saliency detection via colour contrast and distribution. Acta Automatica Sinica. 2013;39(10):1632–1641. [Google Scholar]
  • 11.Guo Y.-C., Yuan H.-J., Wu P. Image saliency detection based on local and regional features. Acta Automatica Sinica. 2013;39(8):1214–1224. doi: 10.3724/sp.j.1004.2013.01214. [DOI] [Google Scholar]
  • 12.Xu W., Zhen-Min T. Exploiting hierarchical prior estimation for salient object detection. Acta Automatica Sinica. 2015;41(4):799–812. [Google Scholar]
  • 13.Liu T., Yuan Z., Sun J., et al. Learning to detect a salient object. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2011;33(2):353–367. doi: 10.1109/TPAMI.2010.70. [DOI] [PubMed] [Google Scholar]
  • 14.Wei Y., Wen F., Zhu W., Sun J. Computer Vision – ECCV 2012. Springer, Berlin, Heidelberg; 2012. Geodesic saliency using background priors; pp. 29–42. [DOI] [Google Scholar]
  • 15.Zhu W., Liang S., Wei Y., Sun J. Saliency optimization from robust background detection. Proceedings of the IEEE conference on computer vision and pattern recognition; 2014; Columbus, Ohio, USA. [Google Scholar]
  • 16.Shen X. H., Wu Y. A unified approach to salient object detection via low rank matrix recovery. Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition; 2012; Providence, RI, USA. pp. 853–860. [Google Scholar]
  • 17.Jiang H. Z., Wang J. D., Yuan Z. J., Liu T., Zheng N. N., Li S. P. Automatic salient object segmentation based on context and shape prior. Proceedings of 2011 British Machine Vision Conference; 2011; Dundee, UK. pp. 110.1–11110. BMVA Press. [Google Scholar]
  • 18.Yang C., Zhang L. H., Lu H. C., Ruan X., Yang M. H. Saliency detection via graph-based manifold ranking. Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition; 2013; Portland, OR, USA. pp. 3166–3173. [Google Scholar]
  • 19.Jiang B., Zhang L., Lu H. Saliency detection via absorbing markov chain. Proceedings of the IEEE international conference on computer vision; 2013; Sydney, Australia. [Google Scholar]
  • 20.Desingh K., Krishna K. M., Rajan D., Jawahar C. V. Depth really matters: improving visual salient region detection with depth. Proceedings of 2013 British Machine Vision Conference; 2013; Bristol, England. pp. 98.1–9898. BMVA Press. [Google Scholar]
  • 21.Niu Y. Z., Geng Y. J., Li X. Q., Liu F. Leveraging stereopsis for saliency analysis. Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition; 2012; Providence, RI, USA: IEEE. pp. 454–461. [Google Scholar]
  • 22.Ju R., Ge L., Geng W. J., Ren T. W., Wu G. S. Depth saliency based on anisotropic center-surround difference. Proceedings of 2014 IEEE International Conference on Image Processing; 2014; Paris, France: IEEE. pp. 1115–1119. [Google Scholar]
  • 23.Ren J. Q., Gong X. J., Yu L., Zhou W. H., Yang M. Y. Exploiting global priors for RGB-D saliency detection. Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops; 2015; Boston, MA, USA: IEEE. pp. 25–32. [Google Scholar]
  • 24.Achanta R., Shaji A., Smith K., Lucchi A., Fua P., Süsstrunk S. SLIC superpixels. Switzerland: EPFL; 2010. [DOI] [PubMed] [Google Scholar]
  • 25.Sural S., Qian G., Pramanik S. Segmentation and histogram generation using the HSV colour space for image retrieval. Proceedings. International Conference on Image Processing; September 2002; Rochester, NY, USA. pp. 589–592. [DOI] [Google Scholar]
