A clustering-optimized segmentation algorithm and application on food quality detection

QingE Wu; Penglei Li; Zhiwu Chen; Tao Zong

doi:10.1038/s41598-023-36309-8

2023 Jun 5;13:9069.

PMCID: PMC10241822 PMID: 37277524

Abstract

To solve the problem of quality detection in the production and processing of stuffed food, this paper proposes a small neighborhood clustering algorithm for segmenting images of frozen dumplings on a conveyor belt, which can effectively improve the qualification rate of the food products. The method builds feature vectors from the attribute parameters of the image. Cluster centers are calculated from the sample feature vectors with the small neighborhood clustering algorithm, and the image is segmented using a distance function between categories. Moreover, the paper describes how to select the optimal segmentation points and sampling rate, calculates the optimal sampling rate, proposes a search method for it, and constructs a validity judgment function for segmentation. The optimized small neighborhood clustering (OSNC) algorithm was evaluated in continuous image target segmentation experiments using images of quick-frozen dumplings as samples. The experimental results show that the defect detection accuracy of the OSNC algorithm is 95.9%. Compared with other existing segmentation algorithms, the OSNC algorithm has stronger anti-interference ability, faster segmentation and better retention of key information, and it effectively mitigates several shortcomings of other segmentation algorithms.

Subject terms: Computer science, Information technology

Introduction

Hidden defects in images of filled food products, such as cracks, breakage and stains, need to be detected early so that defect targets can be segmented effectively. This can greatly improve the recognition rate of dumpling defects, thereby ensuring dumpling quality and raising the product qualification rate. With advances in artificial intelligence and rising labor costs, image analysis of filled food products has become increasingly important. Image segmentation is the basis for target recognition, matching and tracking, which are important for image understanding, image analysis, pattern recognition, computer vision and other fields¹.

Image segmentation divides an image of a food processing scene into target regions, thereby providing the location of the target in the image. Algorithms based on grayscale thresholding², edges³ and regions⁴ are widely used in image segmentation. Threshold segmentation is particularly suitable for images in which the target and background occupy different gray-level ranges and has been applied in many fields; the selection of threshold values is the key technique in image thresholding⁵,⁶. Edge information describes where the gray level of the image changes⁷. To segment the region of interest, edge detection operators such as the Sobel operator have been used⁸. Such edge detection algorithms record gray-level jumps, and when a jump matches the set threshold, edge features are extracted by a difference operation⁹. Both segmentation approaches have been shown to be widely used. Yang¹⁰ proposed supervised multiple-threshold segmentation models to detect potato sprouting. Scholars have also actively improved edge detection operators: Lu¹¹ introduced a threshold selection method based on a local maximum inter-class variance algorithm into the Canny edge detector to improve the efficiency of thermal image recognition. Liao¹² combined a supervised block-based region segmentation algorithm with a deep learning network to segment tumor regions from breast ultrasound images and predict whether a breast tumor is benign or malignant.

Clustering segmentation is one of the main theoretical approaches to image segmentation; typical examples are the K-means clustering algorithm¹³ and the fuzzy C-means clustering algorithm¹⁴. Trivedi¹⁵ used K-means clustering to segment plant leaves into homogeneous regions, which significantly improved the accuracy of plant leaf pest detection. Wu¹⁶ used the Canny algorithm for edge detection in text images and then the k-means algorithm to cluster pixels, which effectively improved the accuracy of text image recognition. Fuzzy C-means (FCM) has been reported as the most useful image segmentation algorithm for realistic scenarios¹⁷. FCM segmentation operates directly on the grayscale image using fuzzy set theory. The purpose of clustering is to classify the dataset accurately and reasonably: samples with similar features are grouped together, and samples with markedly different features are assigned to different categories, so as to achieve the most reasonable segmentation¹⁸. Gao¹⁹ proposed a robust fuzzy C-means clustering method based on an adaptive elastic distance for image segmentation. Brikh²⁰ combined fuzzy C-means and particle swarm optimization (PSO) to cluster large nonlinear data sets.

FCM clustering and K-means segmentation have been applied successfully in various image segmentation tasks²¹. However, they also have the following disadvantages. The search time of both algorithms and their derivatives is long, especially for multi-threshold segmentation, and the larger the image, the longer the segmentation time²². Their parameters must be set in advance, and existing methods cannot obtain the optimal number of partitions²³. In addition, because the defects on stuffed food are relatively small, their features are difficult to obtain, so an algorithm suited to such defects is needed for dumpling image segmentation. This paper proposes an optimized small neighborhood clustering (OSNC) segmentation algorithm that segments images of stuffed food, and verifies its effectiveness on open-source datasets.

To verify that the OSNC segmentation algorithm can segment images of stuffed food quickly and accurately in real production, a Matlab defect detection platform was built to detect defects in the production of frozen dumplings. The process is as follows: (1) a grayscale camera is set up to capture images of the frozen dumplings; (2) the OSNC algorithm pre-processes the samples; and (3) the defect detection platform locates the defective dumplings. The flow chart is shown in Fig. 1.

Flow chart of the frozen dumpling defect detection.

Small neighborhood clustering segmentation algorithm

Suppose the sample data fall into $n$ classes $α_{1}, α_{2}, \dots, α_{n}$, each of which is a set of samples. Suppose each sample has $m$ attribute parameters, for example the image peak, single peak, grayscale, valley, color, edge, inflection point, tone, multimodality and other indicators²⁴. An $m$-dimensional feature space is constructed from these indicators, and every sample point corresponds to a point in this space.

The small neighborhood segmentation algorithm is as follows:

Step 1
Each sample $x = 〈c_{1} (x), c_{2} (x), \dots, c_{m} (x)〉$ has $m$ attributes.
Step 2
$c_{q} (x)$, $q = 1, 2, \dots, m$, is the $q$-th attribute of $x$.
Step 3
Compute the distance between every pair of samples in the sample space to obtain the $K$ samples closest to $x$, namely $y_{1}, y_{2}, \dots, y_{K}$, where $y_{i}$ denotes any one of these $K$ neighbors.
Step 4
Taking $x$ as the center, find a suitable class within a small neighborhood of radius $ε$ by clustering.
Step 5
For the sample $x$ to be segmented, the distances $d_{1}, d_{2}, \dots, d_{K}$ between $x$ and its $K$ nearest neighbors, i.e. of the pairs $(x, y_{1}), \dots, (x, y_{K})$, are given by the distance function.
Step 6
Consider $L$ samples $x$, denoted $x_{l}$, $l = 1, 2, \dots, L$, and $n$ classes $α_{i}$, $i = 1, 2, \dots, n$, where class $α_{i}$ contains $N_{i}$ samples and each sample has $m$ attributes.
Step 7
For the $K$ neighbors of sample $x_{l}$, denoted $x_{l + 1}, \dots, x_{l + K}$, compute the center of each single attribute.

The center of attribute $q$, i.e. its mean over $x_{l}$ and its $K$ neighbors $x_{l + 1}, \dots, x_{l + K}$, is defined as follows:

V_{l}^{q} = \frac{1}{K + 1} \sum_{j = l}^{K + l} c_{q} (x_{j}), q = 1, 2, \dots, m, l = 1, 2, \dots, L
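Formula (1) can be sketched in a few lines of Python. This is a minimal illustration, assuming Euclidean distance in the $m$-dimensional feature space (the paper does not name the metric) and feature vectors stored as plain lists; the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two m-dimensional feature vectors (assumed metric)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def neighborhood_attribute_mean(samples, l, k):
    """Formula (1): mean of each attribute over sample x_l and its K nearest neighbors.

    samples: list of feature vectors <c_1(x), ..., c_m(x)>
    l: index of the sample x_l; k: number of nearest neighbors K.
    Returns [V_l^1, ..., V_l^m].
    """
    x = samples[l]
    # Rank all other samples by their distance to x_l.
    others = sorted((j for j in range(len(samples)) if j != l),
                    key=lambda j: euclidean(x, samples[j]))
    group = [x] + [samples[j] for j in others[:k]]  # x_l plus its K neighbors
    return [sum(s[q] for s in group) / (k + 1) for q in range(len(x))]


if __name__ == "__main__":
    feats = [[10.0, 1.0], [12.0, 1.0], [11.0, 3.0], [50.0, 9.0]]
    print(neighborhood_attribute_mean(feats, 0, 2))  # [11.0, 1.6666666666666667]
```

The sum runs over $K + 1$ vectors because formula (1) includes $x_{l}$ itself ($j = l$) as well as its $K$ neighbors.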

The small neighborhood clustering algorithm can be further divided into two stages: a training stage and a segmentation stage.

The training procedures are as follows:

P1: Initialize the iteration on a single attribute and set the sample count $s = 0$.
P2: Set the circle center to the attribute value $V_{l}^{q}$ and search for the nearest neighbor of $V_{l}^{q}$ within a small neighborhood of radius $ε$. Each time a suitable nearest neighbor $V$ is found, update $s = s + 1$.
P3: Repeat P2 until no further nearest neighbor $V$ exists. This yields the $s$ neighbors of $V_{l}^{q}$, so $s$ samples share the same attribute. The weight of each attribute of the sample is then defined as $ζ_{q} = s / L$, $q = 1, 2, \dots, m$. Define $ζ_{p} = \max_{q \in \{1, 2, \dots, m\}} \{ζ_{q}\}$; the class corresponding to $ζ_{p}$ is taken as the known class. Combined with formula (1), the mean of the same attribute over the samples of each class is defined as follows:
V_{α_{i}}^{q} = \frac{1}{N_{i}} \sum_{l = 1}^{N_{i}} c_{q} (x_{l}), q = 1, 2, \dots, m, i = 1, 2, \dots, n

The training process is shown in Fig. 2.

Small neighborhood clustering process and trajectory of clustering centers.
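The training procedure above leaves some details open. The Python sketch below is one possible reading, not the authors' implementation: it assumes that a sample counts as a neighbor of the center $V^{q}$ when its $q$-th attribute lies within $ε$ of that center, sets $ζ_{q} = s / L$, picks the dominant attribute $p$, and then computes the class means of formula (2). All function names are hypothetical.

```python
def attribute_weights(samples, centers, eps):
    """Weights zeta_q = s / L for each attribute q under an assumed neighborhood rule.

    samples: list of L feature vectors; centers: [V^1, ..., V^m]; eps: radius.
    Returns (zeta, p), where p is the index of the largest weight zeta_p.
    """
    L = len(samples)
    zeta = []
    for q, center in enumerate(centers):
        # s = number of samples whose q-th attribute lies in the radius-eps neighborhood
        s = sum(1 for x in samples if abs(x[q] - center) <= eps)
        zeta.append(s / L)
    p = max(range(len(zeta)), key=lambda q: zeta[q])
    return zeta, p


def class_attribute_means(classes):
    """Formula (2): mean of every attribute q over the N_i samples of each class."""
    means = []
    for members in classes:  # members: feature vectors belonging to one class
        n_i = len(members)
        means.append([sum(x[q] for x in members) / n_i for q in range(len(members[0]))])
    return means


if __name__ == "__main__":
    feats = [[10.0, 1.0], [12.0, 1.0], [11.0, 3.0], [50.0, 9.0]]
    print(attribute_weights(feats, [11.0, 1.5], eps=1.0))  # ([0.75, 0.5], 0)
    print(class_attribute_means([feats[:3], feats[3:]]))
```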

The segmentation procedures are as follows:

P1: Initialize $c_{h0} = 0$ and $b_{hq} = 0$.
P2: Set $N_{i}$ to an initial value $z$.
P3: Set the initial iteration value $v = 0$.
P4: Set the center to $V_{α_{i}}^{q}$ and search for the nearest neighbor $V^{'}$ of $V_{α_{i}}^{q}$ within a small neighborhood of radius $ε$. Each time a new $V^{'}$ is found, update $v = v + 1$.
P5: Repeat the small neighborhood search of P4 until no further nearest neighbor $V^{'}$ exists.
P6: Set $z = z + 1$.

At the same time, the number of neighbors found for $V_{α_{i}}^{q}$ is stored as $v$, and $c_{h0} = \max \{c_{h0}, v\}$. The weight of each attribute for the samples in each class is then defined as $v / N_{i}$, and $b_{hq} = \max \{b_{hq}, v / N_{i}\}$, $q = 1, 2, \dots, m$, $h = 1, 2, \dots, N_{i}$.
If $v = c_{h0}$, the iteration terminates. Otherwise, P3 to P6 are repeated until $v = c_{h0}$, and the segmentation ends. At this point the total number of classes in the dataset to be segmented is $z$, which is assigned to $n = z$, so the dataset is first divided into $n$ classes. At the same time, set $c_{0} = c_{h0}$ and $b_{q} = b_{hq}$.

Suppose the data to be segmented have been divided into $n$ classes, each described by $m$ attribute means. The distance function $d_{i}$ between a sample $x$ to be segmented and class $α_{i}$ of the training samples is:

d_{i} (x, α_{i}) = \sum_{q = 1}^{m} b_{q} \frac{|c_{q} (x) - V_{α_{i}}^{q}|}{O_{iq}^{u} - O_{iq}^{s}}

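where $O_{iq}^{s} = \min_{h \in \{1, 2, \dots, N_{i}\}} \{c_{q} (x), c_{q} (x_{h})\}$ and $O_{iq}^{u} = \max_{h \in \{1, 2, \dots, N_{i}\}} \{c_{q} (x), c_{q} (x_{h})\}$ are the smallest and largest values of attribute $q$ over $x$ and the $N_{i}$ samples $x_{h}$ of class $α_{i}$, $i = 1, 2, \dots, n$.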

The minimum of these distances is then taken, as shown in formula (4):

d_{i *} (x, α_{i *}) = \min_{i \in \{1, 2, \dots, n\}} \{d_{i} (x, α_{i})\}

Formula (4) thus determines the class $α_{i *}$ to which the sample $x$ belongs. Alternatively, let $d_{1}, d_{2}, \dots, d_{n}$ be the distances between $x$ and the $n$ classes; the decision weights $λ_{i}$ can then be calculated by formula (5):

λ_{i} = \frac{1}{n - 1} (1 - \frac{d_{i}}{\sum_{j = 1}^{n} d_{j}})

which satisfy $\sum_{i = 1}^{n} λ_{i} = 1$. Define $λ_{i *} = \max_{i \in \{1, 2, \dots, n\}} \{λ_{i}\}$; the index $i *$ obtained in this way also determines the class $α_{i *}$ to which $x$ belongs. Because $λ_{i}$ decreases as $d_{i}$ increases, the largest weight corresponds to the smallest distance, so both rules select the same class. The segmentation algorithm is shown in Fig. 3.

Image feature clustering segmentation based on small neighborhood clustering.
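Formulas (3) to (5) define the segmentation decision. The following Python sketch, with hypothetical function names, computes the range-normalized distance of formula (3), picks the nearest class by formula (4) and returns the decision weights of formula (5). It assumes feature vectors are plain lists, skips attributes whose range $O_{iq}^{u} - O_{iq}^{s}$ is zero (the paper does not say how to handle that case) and requires at least two classes.

```python
def class_distance(x, members, center, b):
    """Formula (3): weighted, range-normalized distance from sample x to one class.

    members: the N_i training samples of the class; center: [V^1, ..., V^m] of the
    class; b: attribute weights [b_1, ..., b_m].
    """
    d = 0.0
    for q in range(len(x)):
        values = [x[q]] + [h[q] for h in members]
        span = max(values) - min(values)  # O_iq^u - O_iq^s
        if span > 0:  # zero-range attribute is skipped (assumption)
            d += b[q] * abs(x[q] - center[q]) / span
    return d


def classify(x, classes, centers, b):
    """Return (i_star, distances, weights) following formulas (3)-(5)."""
    d = [class_distance(x, members, c, b) for members, c in zip(classes, centers)]
    n = len(d)  # formula (5) needs n >= 2
    total = sum(d)
    if total > 0:
        lam = [(1 - di / total) / (n - 1) for di in d]
    else:  # x coincides with every class center: equal weights (assumption)
        lam = [1 / n] * n
    i_star = min(range(n), key=lambda i: d[i])  # smallest distance, formula (4)
    return i_star, d, lam


if __name__ == "__main__":
    classes = [[[10.0, 1.0], [12.0, 1.0], [11.0, 3.0]], [[50.0, 9.0], [48.0, 8.0]]]
    centers = [[11.0, 5.0 / 3.0], [49.0, 8.5]]
    i_star, d, lam = classify([13.0, 2.0], classes, centers, b=[1.0, 1.0])
    print(i_star, [round(v, 3) for v in lam])  # 0 [0.695, 0.305]
```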

Optimization method

Selection of optimal segmentation points

The pixel values of the grayscale image are used as the input of the segmentation algorithm. For an image of size $M \times N$, the corresponding grayscale value matrix set is $L = \{L_{ij}, i = 1, 2, \dots, M, j = 1, 2, \dots, N\}$. Define the set of segmentation centers as $O = \{o_{k}, k = 1, 2, \dots, n\}$. $U = \{μ_{k} (L_{ij})\}$ is the set of memberships of pixel $(i, j)$ in class $k$, and $D = \{d_{ijk}, k = 1, 2, \dots, n, i = 1, 2, \dots, M, j = 1, 2, \dots, N\}$ is the set of distances between the pixels and the cluster centers. The objective function for the segmentation centers is:

B_{f} = \sum_{k = 1}^{n} \sum_{i = 1}^{M} \sum_{j = 1}^{N} {[μ_{k} (L_{ij})]}^{r} {(d_{ijk})}^{2} = \sum U {(i, j, k)}^{r} D {(i, j, k)}^{2}

where $r > 1$ is the fuzzy weighting exponent, subject to:

\sum_{k = 1}^{n} μ_{k} (L_{ij}) = 1

The segmentation centers $o_{k}$ and the memberships $μ_{k} (L_{ij})$ are computed by formulas (8) and (9):

o_{k} = \frac{\sum_{i, j} U (i, j, k)^{r} L (i, j)}{\sum_{i, j} U (i, j, k)^{r}}, k = 1, 2, \dots, n

μ_{k} (L_{ij}) = \frac{D (i, j, k)^{- \frac{2}{r - 1}}}{\sum_{l = 1}^{n} D (i, j, l)^{- \frac{2}{r - 1}}}, k = 1, 2, \dots, n

Starting from an initial membership matrix, the segmentation centers are computed quickly by formula (8); the memberships $μ_{k} (L_{ij}) (\forall k, i, j)$ are then updated from $o_{k}$ by formula (9). These two steps are repeated until $μ_{k} (L_{ij}) (\forall k, i, j)$ is stable. Let $O$ be the final set of segmentation centers; the image segmentation thresholds are then calculated by:

J_{c} = β o_{c} + \tilde{β} o_{c + 1}, c = 1, 2, \dots, G

where $G$ is the number of thresholds and $β$ and $\tilde{β}$ are weight coefficients satisfying formula (11):

β + \tilde{β} = 1

Typically $β = \tilde{β} = 0.5$ is selected, so each threshold lies midway between $o_{c}$ and $o_{c + 1}$.
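Formulas (8) to (10) amount to fuzzy C-means iteration on gray values followed by threshold placement. The Python sketch below, with hypothetical names, alternates formulas (8) and (9) until the memberships stop changing and then applies formula (10) with $β = \tilde{β} = 0.5$. It assumes the centers are sorted so that each threshold falls between neighboring gray-level clusters, uses a random initial membership matrix, and treats $r = 2$ and the stopping tolerance as illustrative choices rather than values from the paper.

```python
import random


def fcm_gray(pixels, n, r=2.0, tol=1e-6, max_iter=300, seed=0):
    """Fuzzy C-means on a flat list of gray values, alternating formulas (8) and (9).

    Returns the n segmentation centers o_k in ascending order.
    """
    rng = random.Random(seed)
    # Random initial membership matrix; each row sums to 1 as formula (7) requires.
    u = []
    for _ in pixels:
        row = [rng.random() + 1e-9 for _ in range(n)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = [0.0] * n
    for _ in range(max_iter):
        # Formula (8): each center is the membership^r-weighted mean of the gray values.
        for k in range(n):
            w = [row[k] ** r for row in u]
            centers[k] = sum(wi * g for wi, g in zip(w, pixels)) / sum(w)
        # Formula (9): new memberships from the distances d = |g - o_k|.
        change = 0.0
        for p, g in enumerate(pixels):
            d = [abs(g - c) for c in centers]
            if min(d) == 0.0:
                # Pixel lies exactly on a center: give it crisp membership there.
                hits = [1.0 if di == 0.0 else 0.0 for di in d]
                s = sum(hits)
                new = [h / s for h in hits]
            else:
                inv = [di ** (-2.0 / (r - 1.0)) for di in d]
                s = sum(inv)
                new = [v / s for v in inv]
            change = max(change, max(abs(a - b) for a, b in zip(new, u[p])))
            u[p] = new
        if change < tol:  # memberships are stable
            break
    return sorted(centers)


def thresholds(centers, beta=0.5):
    """Formula (10): J_c = beta * o_c + (1 - beta) * o_{c+1} for adjacent sorted centers."""
    o = sorted(centers)
    return [beta * o[c] + (1.0 - beta) * o[c + 1] for c in range(len(o) - 1)]


if __name__ == "__main__":
    gray = [10] * 50 + [200] * 50  # synthetic image with two flat gray levels
    centers = fcm_gray(gray, n=2)
    print([round(c, 1) for c in centers], thresholds(centers))
```

With $n$ centers this yields $G = n - 1$ thresholds.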

Images from the VOC database are used as segmentation samples. The test image is segmented with different numbers of thresholds using the above algorithm, and the results are shown in Fig. 4.

Results of different threshold segmentation.

Selection of optimal segmentation sampling rate

Acquiring image information at fixed intervals usually has little impact on the image processing results and saves device memory, so most image processing algorithms resample the image. The resampling can be described by formula (12):

\begin{bmatrix} x_{1} \\ y_{1} \\ 1 \end{bmatrix} = \begin{bmatrix} η & 0 & 0 \\ 0 & η & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_{0} \\ y_{0} \\ 1 \end{bmatrix}

The resampling rate satisfies $0 < η < 1$; $(x_{0}, y_{0})$ are the coordinates in the original image and $(x_{1}, y_{1})$ the coordinates after the transformation. The amount of new data generated depends on $η$. A small $η$ makes information acquisition efficient, but the image distortion becomes obvious and important information is lost. Selecting an appropriate ratio is therefore the key to effective segmentation: a suitable sampling rate keeps the information loss acceptable. The information measure follows the histogram entropy-based segmentation method.
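Formula (12) scales pixel coordinates by $η$. A minimal Python sketch of the resulting downsampling follows. It assumes an inverse mapping in which output pixel $(x_{1}, y_{1})$ copies the source pixel at the truncated position $(x_{1} / η, y_{1} / η)$, which the paper does not specify, and it also accepts $η = 1$ as the identity; the function name is hypothetical.

```python
def resample(image, eta):
    """Downsample a 2-D gray image (list of rows) by the rate eta of formula (12).

    Each output pixel (x1, y1) copies the source pixel at (int(x1 / eta), int(y1 / eta)),
    i.e. the inverse of the scaling x1 = eta * x0, y1 = eta * y0.
    """
    if not 0 < eta <= 1:
        raise ValueError("eta must lie in (0, 1]")
    rows, cols = len(image), len(image[0])
    new_rows = max(1, int(rows * eta))
    new_cols = max(1, int(cols * eta))
    return [
        [image[min(rows - 1, int(i / eta))][min(cols - 1, int(j / eta))]
         for j in range(new_cols)]
        for i in range(new_rows)
    ]


if __name__ == "__main__":
    img = [[4 * i + j for j in range(4)] for i in range(4)]
    print(resample(img, 0.5))  # [[0, 2], [8, 10]]
```

The number of pixels falls roughly with $η^{2}$, which is why a small $η$ saves memory and time but discards detail.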

Calculation of optimal sampling rate

The proposed algorithm uses entropy loss as the criterion for evaluating image distortion. To achieve a good segmentation effect, the relative entropy loss is used as the basis for selecting the sampling rate. When the sampled image retains enough segmentation information, it is used to calculate the segmentation threshold. In that case the histogram of the sampled image has a shape similar to that of the original image, that is, the sampled image carries essentially the same information as the original. Figure 5 shows the sampled images and their histograms at different sampling rates.

According to Fig. 5, the histogram shape of the sampled image²⁵ remains essentially the same under different resampling rates, indicating that the resampled image retains most of the information of the original image, although the histograms are not identical. As the sampling rate decreases, image information is lost and the curves in each histogram change noticeably, which indicates that segmentation accuracy can be guaranteed by choosing an appropriate sampling rate.

The definition of Shannon entropy is shown in formula (13):

S = - \sum_{i = 1}^{n} P (ω_{i} | x) log P (ω_{i} | x)

where $n$ is the number of classes, $x$ is the image feature and $ω_{i}$ denotes the $i$-th class. For an image of size $M \times N$, the information entropy is defined as in formula (14):

S = - \sum_{k = 0}^{C - 1} P_{k} \log P_{k}

where $P_{k}$ is defined as:

P_{k} = \frac{1}{MN} \sum_{i = 0}^{M - 1} \sum_{j = 0}^{N - 1} ρ_{ij} (k)

ρ_{ij} (k) = \begin{cases} 1, & R (i, j) = k \\ 0, & \text{otherwise} \end{cases}, k = 0, 1, \dots, C - 1

where $C$ is the number of grayscale levels and $R (i, j)$ is the grayscale value of pixel $(i, j)$. $P_{k}$ satisfies:

\sum_{k = 0}^{C - 1} P_{k} = 1

Relative entropy loss measures the degree of information loss. Let $S_{1}$ be the entropy of the original image (sampling rate 1) and $S_{η}$ the entropy of the image sampled at rate $η$; the relative entropy loss is:

δ_{η} = |\frac{S_{1} - S_{η}}{S_{1}}|
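Formulas (14) to (18) translate directly into code. This Python sketch assumes a base-2 logarithm (the paper leaves the base open, and the ratio $δ_{η}$ does not depend on it, since both entropies scale by the same constant), skips gray levels with $P_{k} = 0$ because $0 \log 0$ is taken as 0, and raises an error when the original entropy $S_{1}$ is zero. The function names are hypothetical.

```python
import math
from collections import Counter


def gray_entropy(image):
    """Formulas (14)-(17): Shannon entropy (bits) of the gray-level histogram."""
    pixels = [v for row in image for v in row]
    total = len(pixels)
    counts = Counter(pixels)  # only levels that occur; P_k = 0 terms contribute nothing
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def relative_entropy_loss(original, sampled):
    """Formula (18): delta_eta = |(S_1 - S_eta) / S_1|."""
    s1 = gray_entropy(original)
    if s1 == 0:
        raise ValueError("original image has zero entropy (a single gray level)")
    return abs((s1 - gray_entropy(sampled)) / s1)


if __name__ == "__main__":
    original = [[0, 1], [2, 3]]  # four equally likely gray levels: 2 bits
    print(gray_entropy(original))                     # 2.0
    print(relative_entropy_loss(original, [[0, 1]]))  # 0.5
```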

It can be seen from the above analysis that the relative entropy loss can be used as the basis for the selection of sampling rate. In order to explore the relationship between them, this paper analyzes the change trend of relative entropy loss in the range of sampling rate $η \in [0.01, 0.9]$ , and the trend curve is shown in Fig. 6.

The relationship between different image sampling rates and relative entropy loss.

Figures 6a–c show three different original images. Figure 6d shows that $δ_{η}$ increases as $η$ decreases: as the sampling rate falls, the distortion of the sampled image increases but remains small, and most of the original information is retained down to a certain sampling rate. When the sampling rate becomes very small, $δ_{η}$ increases sharply. Within the allowable range of relative entropy loss, the image sampled at the minimum sampling rate contains the least data while the calculated thresholds remain reliable. This minimum sampling rate can be found with the search algorithm for the optimal sampling rate.

Search for optimal sampling rate

The minimum sampling rate can be found by bisection. Although bisection is effective, it needs more iterations. To improve search efficiency, a variable step search can be used to find the minimum sampling rate. Given the allowable relative entropy loss range $[δ_{min}, δ_{max}]$, the optimal sampling rate $η_{o}$ is:

η_{o} = \min \{η \mid δ_{η} \in [δ_{min}, δ_{max}]\}

In practice, $η_{o}$ cannot be calculated exactly, and for the accuracy required in this paper it is unnecessary to keep searching for $η_{o}$. Therefore, the first sampling rate $η_{f}$ found to satisfy the relative entropy loss constraint is used instead of $η_{o}$. Let $t$ be the step of the current search iteration; the variable step search algorithm is as follows:

\begin{cases} t = η k \\ η^{*} = η - t \end{cases}, \quad δ_{η} < δ_{min}

\begin{cases} t = η (1 - k / 2) \\ η^{*} = η + t \end{cases}, \quad δ_{η} > δ_{max}

For a single-target image, the image sampled at rate $η_{f}$ contains a limited amount of data, so the resulting histogram may not contain data for every class, which affects the single-peak judgment. To calculate the optimal number of thresholds, the size $S_{0}$ is used together with the image size $M \times N$ to keep the sampling rate within the optimal range. The optimal sampling rate $η_{o}$ is therefore defined as:

η_{o} = \begin{cases} \frac{S_{0}}{\min (M, N)}, & η_{f} \leq \frac{S_{0}}{\min (M, N)} \\ η_{f}, & η_{f} > \frac{S_{0}}{\min (M, N)} \end{cases}
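The variable step search and the lower bound of formula (22) can be sketched as follows. The paper does not give the value of the step coefficient $k$ or a rule for the case where no rate satisfies the constraint, so this Python sketch assumes $k = 0.5$, caps the number of iterations, clamps $η$ to at most 1, and returns `None` if the search fails. It also assumes $S_{0}$ is a minimum side length in pixels. The loss is passed in as a callable, for example one that resamples the image and evaluates formula (18); all names are hypothetical.

```python
def variable_step_search(delta, eta0, d_min, d_max, k=0.5, max_iter=50):
    """Return the first sampling rate eta_f whose loss delta(eta) lies in [d_min, d_max].

    delta: callable giving the relative entropy loss for a sampling rate eta.
    k: step coefficient in (0, 1); the default value is an assumption.
    """
    eta = eta0
    for _ in range(max_iter):
        loss = delta(eta)
        if d_min <= loss <= d_max:
            return eta  # eta_f: first rate meeting the constraint
        if loss < d_min:
            # Loss is small, so lower the rate by t = eta * k (formula (20)).
            t = eta * k
            eta = eta - t
        else:
            # Loss is too large, so raise the rate by t = eta * (1 - k / 2) (formula (21)).
            t = eta * (1 - k / 2)
            eta = min(1.0, eta + t)  # a sampling rate cannot exceed 1 (assumption)
    return None


def optimal_rate(eta_f, s0, m, n):
    """Formula (22): never sample below s0 / min(M, N)."""
    floor = s0 / min(m, n)
    return floor if eta_f <= floor else eta_f


if __name__ == "__main__":
    def toy_loss(eta):
        return 0.1 * (1 - eta)  # synthetic loss curve that decreases with eta

    eta_f = variable_step_search(toy_loss, eta0=0.5, d_min=0.01, d_max=0.02)
    print(eta_f, optimal_rate(eta_f, s0=64, m=1024, n=768))  # 0.875 0.875
```

Here the search starts at $η = 0.5$ (loss 0.05, too large), steps up to $0.875$ (loss 0.0125) and stops, taking two loss evaluations.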

Let $H$ be the number of optimization steps, $(N + 1)$ the number of sample classes and $k$ the number of class separation distances; the complexity $θ_{x}$ of the algorithm is then:

θ_{x} = H \times (N + 1) \times k

Judgment function of validity in segmentation

In this section, an improved correlation function between fuzzy sets is constructed to find the optimal number of segmentations of an image. This function is used to judge the validity of image segmentation²⁶. In a fuzzy partition, the fuzzy memberships describe the correlation between the classified data sets.

Suppose that the image has size $M \times N$ and that its grayscale value matrix set $L = \{L_{ij}, i = 1, 2, \dots, M, j = 1, 2, \dots, N\}$ contains $α$ classes, where $ν_{c}$ denotes the center of class $c$.

Then the fuzzy deviation degree for class $c$ is:

δ_{c}^{2} = \sum_{i = 1}^{M} \sum_{j = 1}^{N} μ_{c}^{2} (L_{ij}) {∥L_{ij} - ν_{c}∥}^{2}

Define the fuzzy relation matrix set $R_{kl}$ as:

R_{kl} = \sum_{i = 1}^{M} \sum_{j = 1}^{N} μ_{k} (L_{ij}) μ_{l} (L_{ij}) ∥L_{ij} - ν_{k}∥ ∥L_{ij} - ν_{l}∥

With $δ_{k} = \sqrt{δ_{k}^{2}}$, the fuzzy correlation between classes $k$ and $l$ is defined as:

φ_{kl} = \frac{R_{kl}}{δ_{k} δ_{l}}

The fuzzy correlation can then be used as the validity judgment function. If the following equation is satisfied:

F (U^{*}, α^{*}) = \min_{α} \{\max_{1 \leq k, l \leq α, \, l \neq k} φ_{kl} (U, α)\}

Then $α *$ is the optimal segmentation number of the sample image.
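Formulas (24) to (27) can be evaluated directly once a fuzzy partition is available. The Python sketch below treats pixels as scalar gray values, so $∥L_{ij} - ν_{c}∥$ becomes an absolute difference, and takes the membership matrix and class centers as inputs; producing one candidate partition per $α$ (for example with an FCM run) is an assumption about the workflow. By the Cauchy–Schwarz inequality $R_{kl} \leq δ_{k} δ_{l}$, so every $φ_{kl}$ lies in $[0, 1]$. The function names are hypothetical.

```python
import math


def fuzzy_correlation(pixels, u, centers, k, l):
    """phi_kl = R_kl / (delta_k * delta_l) from formulas (24)-(26), for gray values.

    pixels: flat list of gray values; u[p][c]: membership of pixel p in class c.
    """
    dev_k = sum(u[p][k] ** 2 * (g - centers[k]) ** 2 for p, g in enumerate(pixels))
    dev_l = sum(u[p][l] ** 2 * (g - centers[l]) ** 2 for p, g in enumerate(pixels))
    r_kl = sum(
        u[p][k] * u[p][l] * abs(g - centers[k]) * abs(g - centers[l])
        for p, g in enumerate(pixels)
    )
    denom = math.sqrt(dev_k * dev_l)
    return r_kl / denom if denom > 0 else 0.0  # zero-deviation guard (assumption)


def validity(pixels, u, centers):
    """Inner term of formula (27): the largest phi_kl over all pairs k != l."""
    a = len(centers)
    return max(
        fuzzy_correlation(pixels, u, centers, k, l)
        for k in range(a) for l in range(a) if l != k
    )


def best_class_count(candidates):
    """Formula (27): alpha* minimizes the validity value over the candidate partitions.

    candidates: dict mapping alpha -> (pixels, u, centers).
    """
    return min(candidates, key=lambda a: validity(*candidates[a]))


if __name__ == "__main__":
    px = [0.0, 20.0, 190.0, 210.0]
    u2 = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
    print(round(validity(px, u2, [10.0, 200.0]), 3))  # 0.772
```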

Based on the above analysis, the flow chart of the OSNC algorithm constructed in this paper is shown in Fig. 7.

Evaluation index

To establish a unified comparison measure, suppose that $P$ experiments are conducted, each on $N$ test samples, and that $n_{i}$ samples are detected correctly in the $i$-th experiment. The single recognition accuracy $M_{s}$ is defined as follows:

M_{s} = \frac{n_{i}}{N}, i = 1, 2, \dots, P

The average recognition accuracy $M$ over the $P$ experiments is:

M = \frac{\sum M_{s}}{P}
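As a small worked example of the two accuracy formulas, the Python sketch below averages per-experiment accuracies. The function names are hypothetical, and the counts in the usage line are made-up illustrative numbers, not results from the paper.

```python
def single_accuracy(correct, n_test):
    """M_s = n_i / N for one experiment with n_test samples and `correct` right answers."""
    return correct / n_test


def average_accuracy(correct_counts, n_test):
    """M = (sum of M_s) / P over P experiments that each use n_test samples."""
    scores = [single_accuracy(c, n_test) for c in correct_counts]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical counts for P = 3 experiments of N = 100 samples each.
    print(round(average_accuracy([95, 97, 96], 100), 4))  # 0.96
```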

Experiment and result analysis

All simulation experiments were run on the Windows 10 operating system with an Intel(R) Core(TM) i7-9700 CPU, a 4-core processor at 3.0 GHz, and 32.0 GB of RAM.

To verify the effectiveness of the segmentation validity judgment function, the image in Fig. 6a is used as the segmentation sample and the value of the function is calculated. The experimental comparison results are shown in Table 1.

Table 1.

Comparison of iteration counts and search times between the proposed algorithm and other algorithms.

Algorithm	Metric (by number of segmentations)	2	3	4	5	6	7	8	9
K-means	Iterations	13	18	16	17	15	20	15	22
K-means	Search time (ms)	19	39	42	64	77	143	113	180
FCM	Iterations	12	20	15	18	14	16	17	29
FCM	Search time (ms)	18	27	35	80	62	71	94	96
PSO-FCM	Iterations	13	19	18	19	18	17	20	24
PSO-FCM	Search time (ms)	20	27	34	70	60	58	81	97
OSNC	Iterations	10	11	13	12	13	11	10	9
OSNC	Search time (ms)	18	20	31	66	65	70	100	92


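As shown in Table 1, the OSNC method requires fewer iterations than the other methods (K-means¹⁵, FCM²¹, PSO-FCM²⁰) for every number of segmentations. Its search time is the shortest, or tied for shortest, for 2, 3, 4 and 9 segmentations, and somewhat longer than that of the fastest competing method for 5 to 8 segmentations. The experiments show that this method reduces the search time for the optimal number of segmentations to a certain extent. The segmentation results of the four methods are shown in Fig. 8.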

Comparison of segmentation results between the proposed algorithm and other segmentation algorithms.

The feasibility of the algorithm is verified through continuous image segmentation experiments on sample images from the VOC database. Next, the effectiveness of the OSNC algorithm is verified on image samples collected at an industrial frozen dumpling production site, using the Matlab image processing platform.

Comparison of OSNC algorithm and existing algorithms

Data source

In this paper, the effectiveness of the algorithm is verified with field images from a factory frozen dumpling production line. Sample images are captured by a grayscale camera with a resolution of at least 2 megapixels to ensure image quality. The camera is mounted directly above the conveyor belt, and its field of view does not extend beyond the edges of the conveyor belt. The camera captures an image every 0.15 s. The image samples include positive samples (qualified dumplings) and negative samples (defective dumplings), all normalized to the same size. Sample images captured on different background colors (dark green and white) are shown in Fig. 9.

Sample images under different backgrounds.

Experiment

The experiments were run on the Windows 10 operating system with an Intel(R) Core(TM) i7-9700 CPU, a 4-core processor at 3.0 GHz, and 32.0 GB of RAM. The software is a Matlab-based YOLOv3 defect detection platform, with the OSNC image segmentation algorithm inserted after the input and before the backbone network.

In this experiment, 4000 images of frozen dumplings were used as the sample database: 2000 with a dark-green background and 2000 with a white background. The database was split into training and test samples at a ratio of 1:1.

P1: For the training samples, all experiments used the constraint $δ \in [0.01, 0.02]$ and the single-peak determination threshold $ξ_{h} = 0.015$.

P2: Simple and complex sample images were segmented using the K-means¹⁵, FCM²¹ and PSO-FCM²⁰ segmentation algorithms and the proposed OSNC segmentation algorithm. The results are shown in Figs. 10 and 11.

Segmentation results for simple image samples.

Segmentation results for complex image samples.

Figure 10 contains samples of dumplings on different backgrounds: dumplings with cracked surfaces, normal dumplings, and defective dumplings. In each row, the K-means segmentation algorithm fails to segment the crack defects; it appears insensitive to changes in gray value at the cracks and uses the folds of the dumpling skin as key information for segmentation. The FCM segmentation algorithm is too sensitive to gray-value changes across the whole image, so its segmentation contains both key and noisy information. PSO-FCM effectively removes background interference but retains redundant information along with most of the defect information; its segmentation is better than that of the original FCM algorithm. The proposed OSNC algorithm effectively segments the key information of the dumpling cracks and is almost unaffected by noise, which provides a good basis for subsequent defect detection. In Fig. 11, all four segmentation algorithms can effectively separate the background, defects and dumpling wrapper. However, the OSNC algorithm markedly reduces background noise, retains the key dumpling defect features, and is highly resistant to interference.

P3: The training sample images processed by the K-means, FCM, PSO-FCM and OSNC segmentation algorithms are imported separately into the Matlab image processing platform. After model training stabilizes, the four resulting models are denoted K-means, FCM, PSO-FCM and OSNC. The platform uses a fast convolutional network combined with an edge detection algorithm for feature extraction. Dumplings that do not meet production requirements, such as those with surface damage, cracks or stains, are identified, framed and labeled "Bad"; qualified dumplings are likewise identified, framed and labeled "Good".

P4: The test sample images are used to evaluate the defect detection performance on frozen dumplings. The visual detection results of the four models on the same test images are shown in Fig. 12. The models using the K-means, FCM and PSO-FCM segmentation algorithms make some misjudgments in dumpling defect detection, marked by the red boxes in Fig. 12. In contrast, the model using the OSNC algorithm accurately identifies qualified and unqualified dumplings, with stronger anti-interference ability and higher confidence.

Comparison of detection effects of four models.

P5: The four defect detection methods are evaluated. The four defect detection models obtained in P3 were used to detect all test sample images, with 500 experiments conducted for each model. Each experiment recorded the detection time and the results of each model (the numbers of correct and incorrect recognitions).

Result analysis

In the experiment, the recognition accuracy was calculated every 50 experiments, and the comparison results are shown in Fig. 13, from which the total number of samples, the number of correctly detected samples and the detection accuracy can be read at any point. As the sample size increases, the accuracy of dumpling defect detection increases, and after a certain sample size the curve levels off. After 500 experiments, the defect detection accuracies of the models using the OSNC, PSO-FCM, FCM and K-means algorithms were 95.9%, 92.5%, 90.2% and 87.5%, respectively. The results show that the OSNC algorithm improves the defect detection accuracy of the model, and Table 2 shows that it also shortens the detection time relative to the K-means and FCM models.

Comparison of recognition accuracy rate of models.

The performance of the four segmentation algorithms is evaluated on three criteria. (1) Segmentation time: for the training samples of this experiment, the average time the algorithm takes to segment the samples over 10 runs. (2) Anti-interference capability: an approximate estimate based on the segmentation effect and the algorithm complexity. (3) Defect detection time and recognition accuracy: the average detection time and average recognition accuracy of each model over the 500 defect detection experiments above. The comprehensive comparison results are shown in Table 2.

Table 2.

Comprehensive comparison of different segmentation algorithms.

Algorithms	Segmentation time (s)	Anti-interference ability	Defect detection time (s)	Detection accuracy rate (%)
K-means	25.46	weak	0.121	87.5
FCM	18.62	inferior	0.092	90.2
PSO-FCM	20.17	inferior	0.012	92.5
Proposed method	12.98	strong	0.053	95.9


The comparison shows that the OSNC algorithm segments frozen dumpling images quickly and accurately. The detection model using the OSNC algorithm has the shortest segmentation time and the highest recognition accuracy for frozen dumpling defects, 3.4 to 8.4 percentage points higher than the models using the other segmentation algorithms. These results support the effectiveness of the proposed algorithm.

In addition, the OSNC algorithm has strong anti-interference ability and adapts well to different environments. To further improve defect detection accuracy in actual production, two cameras could be used to capture images of the dumplings on the conveyor belt, which would make it easier for downstream actuators to reject unqualified dumplings.

Discussion

In this paper, an OSNC segmentation algorithm is developed to cluster the feature vectors of stuffed food images, and the image is segmented using the distance function between categories. To optimize the OSNC segmentation algorithm, the best segmentation point is calculated by constructing an objective function for the clustering segmentation centers, and a variable-step search algorithm is introduced to reduce the time needed to compute the minimum sampling rate, which increases segmentation speed. The relative entropy loss is used as the criterion for judging image sampling distortion. In addition, fuzzy correlation is taken into account to obtain a validity judgment function for the segmentation, from which the optimal number of segments can be calculated. Images from the VOC database were used to verify the feasibility of the algorithm, and frozen dumpling images were used to verify its effectiveness. The comparative experiments show that the OSNC algorithm segments faster and has stronger anti-interference ability than the other algorithms tested. The image processing model using this algorithm achieves a defect detection accuracy rate of 95.9%, which is 3.4 to 8.4 percentage points higher than with the other algorithms, and it detects defects faster than the K-means and FCM models. The method can therefore meet the factory's requirements for detecting and rejecting defective dumplings and improve the qualified rate of dumpling production.
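The Discussion names two optimization ingredients: relative entropy loss as a measure of sampling distortion, and a variable-step search for the minimum sampling rate. The sketch below shows one plausible reading under stated assumptions: distortion is taken to be the Kullback-Leibler divergence between the gray-level histograms of the full image and a uniformly strided subsample, and the search is a coarse-to-fine scan that assumes distortion does not increase as the sampling rate grows. All function names and default parameters (`start`, `step`, `min_step`, `eps`) are hypothetical and are not the authors' definitions.

```python
import math
from typing import Callable, List, Sequence


def gray_histogram(pixels: Sequence[int], bins: int = 256, eps: float = 1e-9) -> List[float]:
    """Normalised gray-level histogram. The small eps (an assumption) keeps every
    bin non-zero so the relative entropy below stays finite."""
    counts = [0] * bins
    for value in pixels:
        counts[value] += 1
    total = len(pixels) + eps * bins
    return [(c + eps) / total for c in counts]


def relative_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def subsample(pixels: Sequence[int], rate: float) -> List[int]:
    """Keep roughly a fraction `rate` (0 < rate <= 1) of the pixels by uniform striding."""
    stride = max(1, round(1 / rate))
    return list(pixels[::stride])


def sampling_distortion(pixels: Sequence[int]) -> Callable[[float], float]:
    """Hypothetical distortion measure: relative entropy between the histograms
    of the full image and of its subsample at the given rate."""
    full = gray_histogram(pixels)
    return lambda rate: relative_entropy(full, gray_histogram(subsample(pixels, rate)))


def min_sampling_rate(
    distortion: Callable[[float], float],
    threshold: float,
    start: float = 0.01,
    step: float = 0.16,
    min_step: float = 0.01,
) -> float:
    """Coarse-to-fine (variable-step) search for the smallest sampling rate whose
    distortion is <= threshold. Assumes distortion does not increase with rate;
    returns 1.0 when no smaller rate qualifies (check distortion(1.0) if needed)."""
    if not 0 < start <= 1 or min_step <= 0 or step < min_step:
        raise ValueError("need 0 < start <= 1 and 0 < min_step <= step")
    rate = start
    while True:
        # Advance with the current step while the distortion is still too high.
        while rate < 1.0 and distortion(rate) > threshold:
            rate = min(1.0, rate + step)
        if step <= min_step:
            return rate
        # Back off one step and rescan that interval with a halved step.
        rate = max(start, rate - step)
        step /= 2
```

For a grayscale image flattened to a list of integer pixel values, `min_sampling_rate(sampling_distortion(pixels), threshold)` returns the smallest tested rate whose histogram divergence stays within the threshold. Large steps cover the range quickly, and halving the step only near the crossing point keeps the number of distortion evaluations small compared with a fixed fine-grained scan.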

In future research, the small neighborhood clustering algorithm will be improved with respect to its objective function, membership function and distance function to further enhance its speed and robustness.

Acknowledgements

This work was supported by the Key Science and Technology Program of Henan Province (222102210084) and the Key Science and Technology Project of Henan Province University (23A413007).

Abbreviations

FCM: Fuzzy C-means clustering algorithm
K-Means: K-means clustering algorithm
PSO-FCM: Particle swarm optimization-fuzzy C-means clustering algorithm
OSNC: Optimized small neighborhood clustering algorithm
VOC: Visual Object Classes dataset

Author contributions

All authors contributed to the study's conception and design. Data collection and analysis were performed by Q.E.W. and P.L.L. The first draft of the manuscript was written by Q.E.W. and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Data availability

The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Lu WS, Chen JJ, Xue F. Using computer vision to recognize composition of construction waste mixtures: A semantic segmentation approach. Resour. Conserv Recyc. 2022;178:1–13. doi: 10.1016/j.resconrec.2021.106022.
2. Song JT, Jiao WB, Lankowicz K, Cai ZH, Bi HS. A two-stage adaptive thresholding segmentation for noisy low-contrast images. Ecol. Inform. 2022;69:1–8. doi: 10.1016/j.ecoinf.2022.101632.
3. Wang XQ, Wang S, Guo YC, Hu K, Wang WS. Coal gangue image segmentation method based on edge detection theory of star algorithm. Int. J. Coal. Prep. Util. 2022;1:1–16.
4. Guo RL, Lu SD, Wu YH, Zhang MM, Wang F. Robust and fast dual-wavelength phase unwrapping in quantitative phase imaging with region segmentation. Opt. Commun. 2022;510:1–10. doi: 10.1016/j.optcom.2022.127965.
5. Chen Y, Wang MJ, Heidari AA, Shi BB, Hu ZY, Zhang Q, et al. Multi-threshold image segmentation using a multi-strategy shuffled frog leaping algorithm. Expert Syst. Appl. 2022;194:1–25. doi: 10.1016/j.eswa.2022.116511.
6. Zhu W, Liu L, Kuang FJ, Li LZ, Xu SL, Liang YQ. An efficient multi-threshold image segmentation for skin cancer using boosting whale optimizer. Comput. Biol. Med. 2022;151:1–19. doi: 10.1016/j.compbiomed.2022.106227.
7. Gao HX, Zhou G, Cao Y, Luo ZY, Shen ZC, Jasmine A. Research on edge detection and image segmentation of cabinet region based on edge computing joint image detection algorithm. Int. J. Reliab. Qual. 2022;29(05):270–278.
8. Tian R, Sun G, Liu XC, Zheng BW. Sobel edge detection based on weighted nuclear norm minimization image denoising. Electronics. 2021;10(6):655–656. doi: 10.3390/electronics10060655.
9. Jiang F, Wang G, He P, Zheng CC, Xiao ZY, Wu Y. Application of canny operator threshold adaptive segmentation algorithm combined with digital image processing in tunnel face crevice extraction. J. Supercomput. 2022;78(9):11601–11620. doi: 10.1007/s11227-022-04330-9.
10. Yang Y, Zhao X, Huang M, Zhu QB. Multispectral image based germination detection of potato by using supervised multiple threshold segmentation model and Canny edge detector. Comput. Electron. Agric. 2021;182:1–11. doi: 10.1016/j.compag.2021.106041.
11. Lu YC, Duanmu L, Zhai ZQ, Wang ZS. Application and improvement of Canny edge-detection algorithm for exterior wall hollowing detection using infrared thermal images. Energy Build. 2022;274:1–15. doi: 10.1016/j.enbuild.2022.112421.
12. Liao WX, He P, Hao J, Wang XY, Yang RL, An D, et al. Automatic identification of breast ultrasound image based on supervised block-based region segmentation algorithm and features combination migration deep learning model. IEEE J. Biomed. Health. 2020;24(4):984–993. doi: 10.1109/JBHI.2019.2960821.
13. Nawaz M, Mehmood Z, Nazir T, Naqvi RA, Rehman A, Iqbal M, et al. Skin cancer detection from dermoscopic images using deep learning and fuzzy k-means clustering. Microsc. Res. Tech. 2022;85(1):339–351. doi: 10.1002/jemt.23908.
14. Shi JS, Ye YG, Zhu DX, Su LT, Huang YF, Huang JL. Comparative analysis of pulmonary nodules segmentation using multiscale residual U-Net and fuzzy C-means clustering. Comput. Methods Prog. Biol. 2021;209:1–7. doi: 10.1016/j.cmpb.2021.106332.
15. Trivedi VK, Shukla PK, Pandey A. Automatic segmentation of plant leaves disease using min-max hue histogram and k-mean clustering. Multimed. Tools Appl. 2022;81(14):20201–20228. doi: 10.1007/s11042-022-12518-7.
16. Wu FS, Zhu CG, Xu JX, Bhatt MW, Sharma A. Research on image text recognition based on canny edge detection algorithm and k-means algorithm. Int. J. Syst. Assur. Eng. 2021;13:72–80. doi: 10.1007/s13198-021-01262-0.
17. Song J, Yuan L. Brain tissue segmentation via non-local fuzzy c-means clustering combined with Markov random field. Math. Biosci. Eng. 2022;19(2):1891–1908. doi: 10.3934/mbe.2022089.
18. Soleymanifard M, Hamghalam M. Multi-stage glioma segmentation for tumour grade classification based on multiscale fuzzy C-means. Multimed. Tools Appl. 2022;81(6):8451–8470. doi: 10.1007/s11042-022-12326-z.
19. Gao YL, Wang ZH, Xie JX, Pan JY. A new robust fuzzy c-means clustering method based on adaptive elastic distance. Knowl. Based Syst. 2022;237:1–16. doi: 10.1016/j.knosys.2021.107769.
20. Brikh L, Guenounou O, Bakir T. Selection of minimum rules from a fuzzy TSK model using a PSO–FCM combination. J. Control Autom. Electron. 2023;34:384–393. doi: 10.1007/s40313-022-00975-2.
21. Borlea ID, Precup RE, Borlea AB, Iercan D. A unified form of fuzzy C-means and K-means algorithms and its partitional implementation. Knowl. Based Syst. 2021;214:1–16. doi: 10.1016/j.knosys.2020.106731.
22. Hu JH, Yin HL, Wei GL, Song Y. An improved FCM clustering algorithm with adaptive weights based on PSO-TVAC algorithm. Appl. Intell. 2022;1:1–16.
23. Fu ZX, An J, Yang QY, Yuan HJ, Sun Y, Ebrahimian H. Skin cancer detection using kernel fuzzy C-means and developed red fox optimization algorithm. Biomed. Signal Process. 2022;71:1–11. doi: 10.1016/j.bspc.2021.103160.
24. Wu Y, Li Q. The algorithm of watershed color image segmentation based on morphological gradient. Sensors. 2022;22(21):1–23. doi: 10.3390/s22218202.
25. Das A, Dhal KG, Ray S, Gálvez J. Histogram-based fast and robust image clustering using stochastic fractal search and morphological reconstruction. Neural Comput. Appl. 2022;1:1–24.
26. Wang G, Wang JS, Wang HY. Fuzzy C-means clustering validity function based on multiple clustering performance evaluation components. Int. J. Fuzzy Syst. 2022;24(4):1859–1887. doi: 10.1007/s40815-021-01243-2.
