Abstract
With the rapid development of industrial automation and intelligent manufacturing, defect detection of electronic products has become crucial in the production process. Traditional defect detection methods often face the problems of insufficient accuracy and inefficiency when dealing with complex backgrounds, tiny defects, and multiple defect types. To overcome these problems, this paper proposes Y-MaskNet, a multi-task joint learning framework based on YOLOv5 and Mask R-CNN, which aims to improve the accuracy and efficiency of defect detection and segmentation in electronic products. Y-MaskNet combines the high efficiency of YOLOv5 in target detection with the fine segmentation capability of Mask R-CNN and optimizes the overall performance of the model through a multi-task learning framework. Experimental results show that Y-MaskNet achieves a significant improvement in detection and segmentation tasks, with mAP@[0.5:0.95] reaching 0.72 (up from 0.62 for YOLOv5 and 0.65 for Mask R-CNN) on the PCB Defect Dataset, and IoU improving by 7% compared to existing methods. These improvements are particularly notable in small object detection and fine-grained defect segmentation, making Y-MaskNet an efficient and accurate solution for defect detection in electronic products, offering strong technical support for future industrial intelligent quality control.
Introduction
With the rapid development of electronic product manufacturing, defect detection and quality control have become essential to ensure product performance and reliability. Particularly in high-precision electronic components such as circuit boards, chips, and integrated circuits, even minor defects can lead to performance instability or failure. Traditional defect detection methods [37,38], such as manual inspection and classical image processing techniques (e.g., edge detection, threshold segmentation), often struggle with complex backgrounds, small defects, and real-time data processing needs, resulting in high labor costs and low accuracy.
Recent advancements in deep learning have greatly improved defect detection, particularly in computer vision. Models like Mask R-CNN have shown excellent performance in instance segmentation by using Region Proposal Networks (RPNs) and Fully Convolutional Networks (FCNs) [36]. However, Mask R-CNN faces limitations, particularly in handling small object defects, background noise, and computational complexity. On the other hand, YOLOv5, a fast and efficient target detection model, excels in real-time performance but lacks fine-grained instance segmentation capabilities [39,40]. This paper proposes combining the strengths of YOLOv5’s fast detection and Mask R-CNN’s precise segmentation to create an efficient and accurate defect detection and segmentation system.
To address the limitations of existing deep learning models in defect detection for electronic products, this paper constructs a hybrid architecture, Y-MaskNet, which combines the efficient object detection of YOLOv5 with the fine instance segmentation of Mask R-CNN [41]. A multi-task joint learning framework is built to address several challenges in this field, including small object detection, background noise, real-time performance, and segmentation accuracy [42,43]. Specifically, YOLOv5 rapidly detects defective regions in electronic products, and Mask R-CNN performs fine segmentation on those regions, improving both the accuracy and the detail precision of defect detection [44]. Through joint optimization of the detection and segmentation loss functions, Y-MaskNet improves detection accuracy and robustness while maintaining high speed [45].
This research fills a gap in existing techniques, which struggle to balance high efficiency and high accuracy in defect detection of electronic products. It proposes a new hybrid model framework and experimentally demonstrates its superior performance in defect detection and segmentation tasks. The core innovation of this paper is to combine the advantages of YOLOv5 and Mask R-CNN in a multi-task learning framework, improving robustness and accuracy under complex backgrounds and tiny defects, and providing a new technological solution for quality control and automated production of electronic products.
Literature review
Anomaly detection techniques
Recent advancements in anomaly detection have significantly improved defect detection in electronic products [9,10]. Traditional methods, such as statistical techniques, are limited in handling high-dimensional and nonlinear data [11]. In contrast, deep learning models, particularly Autoencoders and GANs [12,13,34], have shown great promise in identifying complex defects. Autoencoders detect anomalies by measuring reconstruction errors, while GANs generate realistic data while learning anomaly patterns [14,15]. These methods excel in detecting defects in complex backgrounds and small objects, making them more suitable for modern defect detection tasks.
However, one challenge for deep learning anomaly detection is data scarcity and imbalance. Especially when datasets for specific defects in electronic products are small [16], performing effective unsupervised learning and reducing reliance on large amounts of labeled data is an important research topic in this field. Improving model generalization in data-scarce scenarios and reducing overfitting remain directions worth exploring.
Multimodal learning with cross-domain applications
With the increasing complexity of defect detection requirements in electronic products, a single data source can no longer meet the requirement of accurately identifying defects. Multimodal learning [17], by fusing information from different data sources (e.g., images, acoustic data, sensor data, etc.), provides new ideas for improving the accuracy and robustness of defect detection. In practical applications, the combination of image and sensor data can describe the state of the product more comprehensively [18,19,35]. For example, by combining vision images and sensor data such as temperature and pressure, internal defects in circuit boards or other electronic components can be detected more accurately, especially in the identification of tiny cracks or subtle bubbles, where single modality is often not sufficient.
Cross-domain applications are also an important direction in multimodal learning. Borrowing advanced techniques from other domains (e.g., medical image processing, autonomous driving) can provide valuable insights for defect detection in electronic products [20]. For example, medical image analysis has made important breakthroughs in segmenting small objects and fine details, and these techniques can be transferred to electronic product defect detection for fine defect recognition and segmentation [21]. Meanwhile, cross-domain transfer learning can effectively alleviate the shortage of defect data for electronic products, especially when defect types are complex and data is limited [22]; by transferring labeled data from other industries, it can accelerate model training and improve detection accuracy.
Although multimodal learning and cross-domain applications offer significant advantages, challenges remain in data fusion and feature alignment. How to maximize the synergy between different data sources, and how to deal with inconsistency and noise across modalities, are core issues that still need to be addressed [23]. In the future, researchers need to explore data preprocessing, feature fusion techniques, and cross-domain transfer learning in depth in order to better enhance model performance in complex detection tasks.
Advances in deep learning for defect detection in electronic products
The application of deep learning to defect detection in electronic products, especially convolutional neural networks (CNNs), has made significant progress in recent years [24]. While traditional image processing techniques often require manually designed features, deep learning models automatically extract effective features from data [25], greatly simplifying the processing pipeline. CNNs are particularly suited to extracting spatial features from images and are effective at identifying various defect types, such as cracks, soldering problems, and missing components, in electronic products [26].
Currently, most deep learning-based defect detection methods use an end-to-end learning framework capable of detecting and localizing defects simultaneously [27,28]. Compared to traditional methods, deep learning not only improves detection accuracy but also handles more complex defect patterns, such as tiny cracks and surface defects. However, existing models still face insufficient accuracy in some cases and high data quality requirements, especially when defect types are numerous and data labeling is difficult [29]. To address these problems, researchers have proposed approaches such as data augmentation, semi-supervised learning, and transfer learning to compensate for the lack of data and improve model generalization.
Finally, deep learning offers a new way of thinking for defect detection in electronic products, and its advantages in accuracy and automation are beginning to be realized [30]. With the development of technology, how to enhance the robustness of the model and reduce the dependence on large labeled data [31], and how to improve the real-time and interpretability of the model in the production environment will be the research focus in the future.
Materials and methods
Model design and methods
The Y-MaskNet model integrates the strengths of YOLOv5 and Mask R-CNN, and employs a multi-task joint learning framework for the detection and segmentation of electronic products. The structure of the model employs multiple collaborative modules that interact with each other tightly in the whole process from input to output to complete efficient processing and accurate segmentation of electronic product images.
First, the input electronics images pass through a shared feature extraction network. YOLOv5 acts as the backbone, employing an efficient convolutional neural network (CNN) structure to extract deep features and perform rapid object detection. The YOLOv5 detection head locates defective regions and generates a bounding box and category prediction for each target in the image. This part is designed for fast localization and efficient detection, meeting the requirement of rapid localization and initial defect recognition on large-scale production lines.
After target detection, the candidate regions output by YOLOv5 are passed to Mask R-CNN for fine instance segmentation. Mask R-CNN further refines the candidate regions through the Region Proposal Network (RPN) and generates fixed-size feature maps by accurately aligning features of different scales and sizes with the RoIAlign method. Through this process, Mask R-CNN generates a detailed mask for each candidate region, ensuring that the contour of every defect is accurately segmented.
During training, the loss functions of YOLOv5 and Mask R-CNN are jointly learned and optimized for object detection and instance segmentation. In this way, the two tasks (object detection and instance segmentation) develop together and reinforce each other, ensuring that the model achieves both accurate localization and high-precision segmentation.
Ultimately, the output of Y-MaskNet consists of a category label for each defect, the coordinates of the localization frame, and a segmentation mask. This not only clarifies the location of the defects, but also accurately describes their morphology, meeting the requirements of high-precision quality control.
The whole model finds a balance between high efficiency and accuracy, which makes it especially suitable for the task of detecting defects in electronic products in industrial production, and is able to efficiently deal with complex backgrounds and defects in small objects, while providing fine segmentation results.
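To make the two-stage flow described above concrete, the following is a minimal, illustrative sketch of the detect-then-segment pipeline. It is not the authors' implementation: `detect_defects` and `segment_roi` are hypothetical stubs standing in for the YOLOv5 detection head and the Mask R-CNN mask branch, and only the control flow mirrors Y-MaskNet.

```python
# Illustrative sketch of the Y-MaskNet two-stage flow (not the authors' code).
# `detect_defects` and `segment_roi` are hypothetical stand-ins for the
# YOLOv5 detection head and the Mask R-CNN mask branch, respectively.

def detect_defects(image):
    """Stage 1 (YOLOv5 role): return candidate boxes with class and score."""
    h, w = len(image), len(image[0])
    # Stub: pretend one defect was found in the top-left quadrant.
    return [{"box": (0, 0, w // 2, h // 2), "cls": "open_circuit", "score": 0.91}]

def segment_roi(image, box):
    """Stage 2 (Mask R-CNN role): return a binary mask for one candidate box."""
    x0, y0, x1, y1 = box
    # Stub: mark every pixel inside the box as defective.
    return [[1 if x0 <= x < x1 and y0 <= y < y1 else 0
             for x in range(len(image[0]))] for y in range(len(image))]

def y_masknet_forward(image):
    """Combine detection and segmentation into the final per-defect output."""
    results = []
    for det in detect_defects(image):
        mask = segment_roi(image, det["box"])
        results.append({"cls": det["cls"], "box": det["box"],
                        "score": det["score"], "mask": mask})
    return results

outputs = y_masknet_forward([[0] * 8 for _ in range(8)])
print(outputs[0]["cls"], outputs[0]["box"])
```

The final per-defect dictionary (class, box, mask) matches the three outputs the model produces: a category label, localization coordinates, and a segmentation mask.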
Instance segmentation module: Mask R-CNN
Mask R-CNN is the key module responsible for instance segmentation in the model of this paper, which not only realizes the localization of objects, but also generates a high-precision mask for each target [1]. By introducing the RoIAlign layer, this module solves the accuracy problem caused by the quantization error of traditional RoIPooling in the image processing process, and further improves the segmentation effect. Fig 1 shows the working architecture of the Mask R-CNN module [2], which clarifies the whole processing flow from image input to mask output.
Fig 1. Detailed architecture of the mask R-CNN module.
First, the input electronic product image is passed through a feature extraction network (usually ResNet or ResNeXt) to obtain a high-dimensional feature map. On this basis, Mask R-CNN employs a Region Proposal Network (RPN) to generate candidate regions of interest (RoIs) [3,4]. The RPN scans the feature map with a sliding window and generates a series of candidate boxes that may contain targets. The goal of the RPN is to accurately locate candidate regions by minimizing a classification loss and a bounding box regression loss, computed as follows:
$$L_{RPN} = L_{cls} + L_{reg} \qquad (1)$$
where $L_{cls}$ is the classification loss, representing whether the candidate region contains a target, and $L_{reg}$ is the regression loss, representing the positional difference between the candidate box and the ground-truth box. After training on each candidate region, the candidate boxes generated by the RPN are fed into the subsequent RoIAlign module for alignment.
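As a toy numerical illustration of Eq. (1), the snippet below combines a binary cross-entropy objectness term with a smooth-L1 box regression term, the standard choices in the Faster/Mask R-CNN family. The probabilities and offsets are made-up values, not outputs of the paper's model.

```python
import math

# Toy illustration of the RPN loss in Eq. (1): L_RPN = L_cls + L_reg.
# Binary cross-entropy for objectness, smooth-L1 for box regression
# (the standard choices in Faster/Mask R-CNN); all values are made up.

def cls_loss(p_obj, is_obj):
    """Binary cross-entropy: does this anchor contain an object?"""
    return -math.log(p_obj) if is_obj else -math.log(1.0 - p_obj)

def smooth_l1(pred, target):
    """Smooth-L1 box regression loss, summed over (x, y, w, h) offsets."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

p_obj = 0.9                        # predicted objectness probability
pred_box = [0.1, 0.2, 0.05, 0.0]   # predicted box offsets
gt_box = [0.0, 0.0, 0.0, 0.0]      # ground-truth offsets
l_rpn = cls_loss(p_obj, True) + smooth_l1(pred_box, gt_box)
print(round(l_rpn, 4))
```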
In the RoIAlign stage, Mask R-CNN spatially aligns the feature maps of each candidate region. The core purpose of RoIAlign is to eliminate the feature loss caused by coordinate quantization, so that the features of each candidate region can be accurately extracted by aligning feature maps at different scales. The RoIAlign operation can be expressed by the following equation:
$$y_{i,j} = \sum_{k} w_k \, f(x_k) \qquad (2)$$
where $y_{i,j}$ is the feature output after RoIAlign, $w_k$ is the weight of the $k$-th sampling point, and $f(x_k)$ is the feature value at that point. By spatially aligning the feature map, Mask R-CNN obtains a feature representation for each candidate region, which is then fed into the classification, regression, and mask branches.
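The weighted sum in Eq. (2) is, concretely, bilinear interpolation: the value at a fractional sampling location is a weighted combination of the four surrounding grid values, with weights summing to one. A minimal sketch (the feature map and sample point are illustrative):

```python
# Minimal bilinear sampling as used inside RoIAlign (Eq. (2)): the value at a
# fractional position is a weighted sum of the four surrounding feature
# values, y = sum_k w_k * f(x_k).

def bilinear_sample(fmap, y, x):
    """Sample feature map `fmap` at the fractional location (y, x)."""
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, len(fmap) - 1), min(x0 + 1, len(fmap[0]) - 1)
    dy, dx = y - y0, x - x0
    # The four weights w_k sum to 1 and favor the nearest grid points.
    return (fmap[y0][x0] * (1 - dy) * (1 - dx) +
            fmap[y0][x1] * (1 - dy) * dx +
            fmap[y1][x0] * dy * (1 - dx) +
            fmap[y1][x1] * dy * dx)

fmap = [[0.0, 1.0],
        [2.0, 3.0]]
print(bilinear_sample(fmap, 0.5, 0.5))  # midpoint: average of the four values
```

Unlike RoIPooling, no coordinate is rounded to an integer here, which is exactly the quantization error RoIAlign removes.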
Next, through the classification and bounding box regression branches, Mask R-CNN performs category prediction and bounding box adjustment for each candidate region. Assuming the predicted category of a candidate region is $c$ and the category prediction loss is $L_{cls}$, the loss function can be expressed as follows:
$$L_{cls} = -\log p(c) \qquad (3)$$
where $p(c)$ is the probability that the candidate region belongs to category $c$. By minimizing this classification loss, Mask R-CNN assigns a category to each candidate region and further adjusts the bounding box to better fit the true defective region.
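Eq. (3) in miniature: the loss is simply the negative log of the probability assigned to the true class, so confident correct predictions incur a small loss and confident wrong ones a large loss. The class names and probabilities below are illustrative.

```python
import math

# Eq. (3) in miniature: the classification loss is the negative log
# probability assigned to the ground-truth class c. Values are illustrative.

def classification_loss(probs, true_class):
    """Cross-entropy for one RoI: -log p(c) for the ground-truth class c."""
    return -math.log(probs[true_class])

probs = {"open_circuit": 0.7, "short_circuit": 0.2, "background": 0.1}
loss_confident = classification_loss(probs, "open_circuit")  # small loss
loss_wrong = classification_loss(probs, "background")        # large loss
print(round(loss_confident, 3), round(loss_wrong, 3))
```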
Finally, Mask R-CNN generates an accurate mask for each target. The mask branch uses a fully convolutional network (FCN) to predict a pixel-level segmentation for the feature map of each candidate region. Let the predicted mask be $M_{i,j}$; this branch trains the model by minimizing the pixel-level cross-entropy loss as follows:
$$L_{mask} = -\sum_{i,j} \left[ y_{i,j} \log M_{i,j} + (1 - y_{i,j}) \log\left(1 - M_{i,j}\right) \right] \qquad (4)$$
where $y_{i,j}$ is the ground-truth mask, $M_{i,j}$ is the predicted mask, and $(i,j)$ indexes the pixel position. This loss function ensures that the generated mask matches the edges of real defects, enabling accurate instance segmentation.
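The pixel-wise cross-entropy of Eq. (4) can be sketched as follows; the 2x2 masks are toy data, and the mean (rather than sum) over pixels is a common normalization choice, assumed here for readability.

```python
import math

# Pixel-wise binary cross-entropy between the predicted mask M and the
# ground-truth mask y, as in Eq. (4), averaged over pixels. Toy masks below.

def mask_loss(true_mask, pred_mask, eps=1e-7):
    """Mean binary cross-entropy over all pixel positions (i, j)."""
    total, n = 0.0, 0
    for row_t, row_p in zip(true_mask, pred_mask):
        for y, m in zip(row_t, row_p):
            m = min(max(m, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(y * math.log(m) + (1 - y) * math.log(1 - m))
            n += 1
    return total / n

true_mask = [[1, 0],
             [0, 1]]
good_pred = [[0.9, 0.1],
             [0.1, 0.9]]
bad_pred = [[0.1, 0.9],
            [0.9, 0.1]]
print(mask_loss(true_mask, good_pred) < mask_loss(true_mask, bad_pred))  # True
```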
Finally, Mask R-CNN outputs the class, bounding box coordinates, and fine mask of each defect, which together form the final result of defect detection and segmentation for electronic products. Through the above steps, Mask R-CNN can segment each defective part effectively, making it suitable for detecting fine defects such as cracks and small blemishes.
By integrating object detection and instance segmentation through multi-task learning, the Mask R-CNN module can efficiently handle complex defect detection tasks for electronic products. Fig 1 illustrates the overall framework of the Mask R-CNN module and clarifies the concrete flow of the steps above.
Object detection module: YOLOv5
YOLOv5 is the core module responsible for object detection in the model of this paper. It is characterized by high efficiency and real-time performance and can quickly locate defective regions in an image [5,6]. YOLOv5 adopts a single-stage detection method: unlike traditional two-stage methods (e.g., Faster R-CNN), it completes target classification and position regression in a single forward pass [7], which significantly improves detection speed. Fig 2 illustrates the architecture of the YOLOv5 module, presenting the processing flow from input image to output bounding boxes and categories.
Fig 2. YOLOv5 module architecture diagram.
First, YOLOv5 feeds the input electronics image into a deep convolutional neural network for feature extraction. The core of this network is CSPDarknet53 (Cross Stage Partial Darknet), an efficient convolutional structure that reduces computation while maintaining high accuracy [8]. After feature extraction, YOLOv5 generates multi-scale feature maps through a feature pyramid network (FPN) in order to detect defects of different sizes. The detection head then produces bounding boxes, category labels, and confidence scores for each candidate region from the output of a single convolutional layer. YOLOv5's detection framework calculates the loss based on the following equation:
$$L = L_{obj} + L_{noobj} + L_{cls} + L_{box} \qquad (5)$$
where $L_{obj}$ is the target confidence loss, indicating whether a predicted box contains a target; $L_{noobj}$ is the loss for non-target boxes; $L_{cls}$ is the category classification loss; and $L_{box}$ is the bounding-box regression loss, which measures the discrepancy between predicted and ground-truth boxes. By minimizing these losses, YOLOv5 can accurately predict the location and type of defects in electronics images.
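Combining the four terms of Eq. (5) is a plain weighted sum. The sketch below uses made-up loss values; the optional down-weighting of the no-object term is a common practical choice in YOLO-style detectors, not a detail taken from this paper.

```python
# Toy version of the composite YOLOv5-style loss in Eq. (5): a sum of
# objectness, no-object, classification, and box losses. The optional
# no-object weight is illustrative, not YOLOv5's exact default.

def yolo_total_loss(l_obj, l_noobj, l_cls, l_box, w_noobj=1.0):
    """Combine the four detection loss terms into one scalar."""
    return l_obj + w_noobj * l_noobj + l_cls + l_box

# Plain sum, exactly as written in Eq. (5).
total = yolo_total_loss(l_obj=0.2, l_noobj=0.4, l_cls=0.3, l_box=0.1)
print(round(total, 6))  # → 1.0
```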
YOLOv5's detection process also uses anchor boxes to predict the location and size of bounding boxes. During training, YOLOv5 adjusts the scale and position of the anchor boxes according to the size of the actual defective area in order to better fit the target. The bounding box is decoded from the anchor box as follows:
$$b_x = \sigma(t_x) + c_x, \quad b_y = \sigma(t_y) + c_y, \quad b_w = p_w e^{t_w}, \quad b_h = p_h e^{t_h} \qquad (6)$$
where $(b_x, b_y)$ is the center coordinate of the bounding box, $(c_x, c_y)$ is the coordinate of the grid cell containing the anchor, $(b_w, b_h)$ are the width and height of the bounding box, $(p_w, p_h)$ are the width and height priors of the particular anchor box, $(t_x, t_y, t_w, t_h)$ are the raw network outputs, and $\sigma$ is the sigmoid activation function. With this approach, YOLOv5 can adjust the bounding box more accurately to accommodate defects of different sizes.
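Eq. (6) can be sketched directly in code. The grid cell and anchor prior values below are arbitrary examples; with zero raw outputs, the sigmoid offset lands the center mid-cell and the exponential leaves the anchor's size unchanged.

```python
import math

# Illustrative anchor-box decoding in the spirit of Eq. (6): raw outputs
# (t_x, t_y, t_w, t_h) become a box via a sigmoid offset from the grid cell
# and an exponential scaling of the anchor prior (p_w, p_h).

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def decode_box(t_x, t_y, t_w, t_h, cell_x, cell_y, p_w, p_h):
    """Map raw predictions to center coordinates and width/height."""
    b_x = cell_x + sigmoid(t_x)   # center x, offset within the grid cell
    b_y = cell_y + sigmoid(t_y)   # center y
    b_w = p_w * math.exp(t_w)     # width, scaled from the anchor prior
    b_h = p_h * math.exp(t_h)     # height
    return b_x, b_y, b_w, b_h

# Zero raw outputs land the center mid-cell and keep the anchor's size.
print(decode_box(0.0, 0.0, 0.0, 0.0, cell_x=3, cell_y=5, p_w=2.0, p_h=4.0))
# → (3.5, 5.5, 2.0, 4.0)
```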
The main advantages of YOLOv5 are its speed and efficiency, which make it well suited to real-time inspection. In defect detection of electronic products, YOLOv5 can quickly scan an image, find possible defective areas, and output accurate candidate boxes for the subsequent segmentation stage. Although YOLOv5 is weaker at detecting small objects, it can still process large-scale datasets quickly and achieve accurate defect localization.
However, the YOLOv5 detection module cannot output refined segmentation information for each target; its role is limited to localization and classification. Therefore, for the fine-grained defect segmentation task, the Mask R-CNN module further refines the detection output. Combining YOLOv5 with Mask R-CNN allows the whole model to improve segmentation accuracy while preserving real-time performance, especially under complex backgrounds and tiny defects. Through joint training and optimization, YOLOv5 plays the key role of providing efficient target localization for the whole system.
In summary, as the object detection module, YOLOv5 quickly screens out possible defective areas in electronic products through its efficient network structure and accurate localization, providing strong support for the subsequent instance segmentation module and the final fine segmentation.
Experiment
Datasets
In this paper, we choose two publicly available datasets, the PCB Defect Dataset and DeepPCB, to validate and evaluate the Y-MaskNet model on defect detection and segmentation of electronic products. These datasets are chosen because they contain defect images of electronic products covering a wide range of defect types, providing rich training and testing data for detection and segmentation models. Experiments on these two datasets test the effectiveness of our model in realistic applications, including its ability to handle tiny defects, complex backgrounds, and multiple defect types.
The PCB Defect Dataset is devoted to printed circuit board defect detection and includes PCB images with various defect types, such as open circuits, short circuits, and copper misalignment. It is richly annotated and can be used for both object detection and instance segmentation tasks. Since the PCB is one of the most common components of electronic products, we choose this dataset to verify defect detection and segmentation on electronic products.
DeepPCB is another dataset related to PCB defect detection, providing a large number of labeled PCB defect images. Unlike other PCB datasets, DeepPCB pays special attention to the visual characteristics of defects, such as cracks, solder joint defects, etc., and provides accurate labeling for each defect type. This makes the dataset ideal for detecting and segmenting fine defects, especially in practical industrial applications where subtle defects often need to be recognized with high precision.
Table 1 summarizes the details of these two datasets, including the source of the dataset, number of images, defect types, labeling types, resolution, etc., to better understand their relevance to the research objectives of this paper.
Table 1. Summary of the datasets used in this study (transposed).
| Attribute | PCB Defect Dataset | DeepPCB |
|---|---|---|
| Dataset Source | Kaggle (Public Dataset) | Provided by Researchers (Specialized for PCB Defect Detection) |
| Number of Images | 7000+ images | 5000+ images |
| Defect Types | Open Circuit, Short Circuit, Copper Misalignment, Damaged Circuits, etc. | Cracks, Soldering Defects, Surface Imperfections, etc. |
| Annotation Type | Bounding Boxes, Labels (Classification) | Bounding Boxes, Masks (Segmentation) |
| Resolution | 512 × 512 px | 256 × 256 px |
| Applicable Tasks | Object Detection, Instance Segmentation | Object Detection, Instance Segmentation |
| Dataset Features | Contains various PCB defects for object detection and defect classification tasks | High-precision annotations, focusing on fine-grained defect detection and segmentation, ideal for high-accuracy tasks |
Both datasets contain a variety of electronic product defect types and are suitable for multi-task learning and cross-task evaluation. By combining these two datasets, the model in this paper is able to be trained and tested in different types of defect detection tasks to fully validate its detection and segmentation capabilities in complex scenarios. Especially for the challenging tiny defect detection, the Y-MaskNet model is able to effectively address the balance between high accuracy and real-time performance and provide accurate segmentation results. These datasets provide a rich and challenging experimental platform for the research in this paper, and provide a solid foundation for model performance validation and algorithm improvement.
Experimental setup, parameters, and evaluation metrics
In this experiment, we perform detailed evaluation and validation of Y-MaskNet model on two public datasets, PCB Defect Dataset and DeepPCB. Additionally, we compare the performance of Y-MaskNet with other lightweight models commonly used in real-time industrial applications, such as MobileNet and EfficientDet.
For the fairness and reproducibility of experiments, we provide detailed configurations for the hardware environment, training parameters, data preprocessing methods, and evaluation metrics.
Experimental hardware and environment.
The hardware platform for the experiments is an NVIDIA RTX 3090 GPU with 32GB of video memory, which can efficiently handle large image data. The software platform is based on Python 3.8 and the deep learning framework PyTorch 1.9.0, with CUDA 11.1 acceleration for model training and inference. All code and experiments are implemented on a Linux system.
Data preprocessing.
During data preprocessing, we unified the PCB Defect Dataset and DeepPCB dataset to ensure the consistency of the input images in model training. All images were uniformly resized to 512x512 pixels (256x256 pixels for DeepPCB) to fit the input requirements of YOLOv5 and Mask R-CNN. The images are normalized using a normalization method that subtracts the mean and divides by the standard deviation for each channel, with mean and standard deviation of [0.485, 0.456, 0.406] and [0.229, 0.224, 0.225] respectively, which are derived from the ImageNet dataset, and are able to effectively improve the training efficiency and generalization of the model.
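The channel-wise normalization described above amounts to, per RGB channel, subtracting the ImageNet mean and dividing by the ImageNet standard deviation. A minimal sketch on a single pixel (real pipelines apply this to whole tensors, e.g. via torchvision transforms):

```python
# Channel-wise normalization as described above: subtract the ImageNet mean
# and divide by the ImageNet standard deviation for each RGB channel.

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Normalize one RGB pixel whose values are already scaled to [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)]

pixel = [0.485, 0.456, 0.406]   # a pixel equal to the channel means
print(normalize_pixel(pixel))   # maps exactly to [0.0, 0.0, 0.0]
```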
To enhance the diversity of the dataset, we adopt data augmentation techniques, including random rotation, scaling, flipping, cropping, and color adjustment. Augmentation not only improves the model's ability to adapt to different scenes and backgrounds by increasing variability in the training data, but also helps mitigate overfitting by preventing the model from memorizing specific details of the training images, encouraging better generalization to unseen data.
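Two of the augmentations mentioned above (horizontal flip and 90-degree rotation) can be sketched on a nested-list "image" as follows. This is purely illustrative; real pipelines would use a library such as torchvision or albumentations.

```python
import random

# Minimal sketch of two augmentations mentioned in the text: horizontal flip
# and 90-degree rotation, applied randomly. Illustrative only.

def hflip(img):
    """Mirror each row left-to-right."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def augment(img, rng):
    """Randomly apply each transform with probability 0.5."""
    if rng.random() < 0.5:
        img = hflip(img)
    if rng.random() < 0.5:
        img = rot90(img)
    return img

img = [[1, 2],
       [3, 4]]
print(hflip(img))   # [[2, 1], [4, 3]]
print(rot90(img))   # [[3, 1], [4, 2]]
```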
Model training parameters.
During the training process, the Y-MaskNet model uses the Adam optimizer and adopts the learning rate warm-up (warm-up) strategy to stabilize the training process. The specific hyperparameters are configured as follows:
Learning rate: the initial learning rate is set to 0.0001, and the learning rate decay strategy (decay to 0.9 times of the original every 5 epochs) is used to ensure the stability and convergence of the model training.
Batch size: set to 16, adapted to the GPU’s memory capacity.
Number of iterations: the total number of iterations for model training is 50 epochs.
Loss function: the target detection loss (YOLOv5) and the instance segmentation loss (Mask R-CNN) are optimized by a weighted joint loss function as follows:
$$L_{total} = \lambda_1 L_{cls} + \lambda_2 L_{box} + \lambda_3 L_{mask} \qquad (7)$$

where $L_{cls}$ is the classification loss, $L_{box}$ is the bounding box regression loss, and $L_{mask}$ is the segmentation mask loss. The coefficients $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the weighting factors for each loss component, empirically set to 1, 0.5, and 0.5, respectively, based on their relative importance in the joint learning process. Specifically, $\lambda_1$ is set to 1 to give full weight to the classification loss, while $\lambda_2$ and $\lambda_3$ are set to 0.5 to balance the contributions of the bounding box regression and mask segmentation losses. These values were chosen after several preliminary experiments to ensure a balanced optimization of both detection and segmentation tasks.
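The joint loss with the stated weights (1, 0.5, 0.5) and the stated learning-rate schedule (initial 1e-4, decayed by 0.9 every 5 epochs) can be sketched as follows. The individual loss values are made up; only the weights and the schedule come from the text.

```python
# Sketch of the joint loss and the stated learning-rate schedule
# (initial lr 1e-4, decayed to 0.9x every 5 epochs). Loss values are made up;
# the lambda weights mirror the coefficients given in the text.

LAMBDA_CLS, LAMBDA_BOX, LAMBDA_MASK = 1.0, 0.5, 0.5

def joint_loss(l_cls, l_box, l_mask):
    """Weighted sum of classification, box-regression, and mask losses."""
    return LAMBDA_CLS * l_cls + LAMBDA_BOX * l_box + LAMBDA_MASK * l_mask

def learning_rate(epoch, base_lr=1e-4, decay=0.9, step=5):
    """Decay the base learning rate by `decay` once every `step` epochs."""
    return base_lr * decay ** (epoch // step)

print(joint_loss(0.4, 0.2, 0.2))  # 1.0*0.4 + 0.5*0.2 + 0.5*0.2 = 0.6
print(learning_rate(0), learning_rate(10))
```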
With the above settings, we can evaluate the defect detection and segmentation performance of Y-MaskNet on electronic products and verify the effectiveness and reliability of our method. The evaluation results provide valuable guidance for further improving the model and applying it in practice.
Experimental results and analysis
In this section, we present quantitative and qualitative results of the Y-MaskNet model on the PCB Defect Dataset and the DeepPCB dataset. To evaluate the detection and segmentation performance in detail, we report the model's detection accuracy, segmentation quality, and inference time. In addition, we show the model's behavior on the different datasets through visualizations and summarize the specific performance values in tables. These results make the advantages, and possible disadvantages, of Y-MaskNet easy to assess.
Visualization results.
In this part, we display the visualization results of the Y-MaskNet model on the PCB Defect Dataset to give an intuitive sense of its performance across different defect detection and segmentation cases. To comprehensively examine detection accuracy and segmentation quality, the result for each test sample is shown as an image. Each sub-image shows the result of a different processing stage: the original image, the YOLOv5 detection boxes, the Mask R-CNN segmentation mask, and the final synthesized result, respectively. This layout clearly presents the multi-dimensional capabilities of the model and the details of how it works at each step.
To position Y-MaskNet within real-time industrial applications, we compare its performance with MobileNet and EfficientDet, two lightweight models well-suited for edge and mobile device deployment. MobileNet is known for its efficiency in terms of both size and computational cost, making it ideal for deployment on resource-constrained devices. EfficientDet, another lightweight model, is specifically designed for object detection tasks with a balance of accuracy and speed. Both models are evaluated under the same experimental setup, allowing a fair comparison with Y-MaskNet in terms of accuracy, speed, and model size.
Figs 3 and 4 show the visualization results of the Y-MaskNet model on the PCB Defect Dataset. The two images distinguish PCB boards with different types of defects. The four sub-images in the first row show two original images in turn, marking the specific locations of the defects. YOLOv5 accurately identifies the defective areas through the detection box, which provides candidate areas for subsequent segmentation. The four sub-images in the second row show the segmentation results of Mask R-CNN, corresponding to the original images and detection boxes in the first row. In these images, Mask R-CNN accurately segments the contours of each defective area and generates high-quality segmentation masks. Finally, by combining the detection box of YOLOv5 with the segmentation results of Mask R-CNN, the final synthetic output is obtained, demonstrating the powerful ability of the model in the task of defect detection and segmentation of electronic products.
Fig 3. Visualization results of Y-MaskNet in practical applications (a).
Fig 4. Visualization results of Y-MaskNet in practical applications (b).
The visualization allows a direct view of the model's performance at each stage: YOLOv5 locates defect areas quickly and accurately, and Mask R-CNN provides segmentation information for each defect individually. These results confirm the effectiveness of Y-MaskNet in object detection and instance segmentation, as well as its robustness and accuracy in the face of complex backgrounds and subtle defects.
These images illustrate how Y-MaskNet processes different types of defects (open circuits, short circuits, soldering defects, etc.), which helps to further demonstrate the applicability and strong performance of the model in defect detection for electronic products.
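The composition step described above, rendering the YOLOv5 boxes and Mask R-CNN masks onto the original image, can be sketched as follows. Note that `compose_visualization` is a hypothetical helper written for illustration under assumed array formats, not part of the paper's released code.

```python
import numpy as np

def compose_visualization(image, boxes, masks, alpha=0.45):
    """Overlay detection boxes and segmentation masks on an image.

    image: HxWx3 uint8 array (assumed RGB)
    boxes: list of (x1, y1, x2, y2) pixel coordinates from the detector
    masks: list of HxW boolean arrays, one per detected defect
    """
    out = image.astype(np.float32)
    red = np.array([255.0, 0.0, 0.0])
    # Alpha-blend each defect mask into the image in red.
    for mask in masks:
        out[mask] = (1 - alpha) * out[mask] + alpha * red
    out = out.astype(np.uint8)
    # Draw 1-pixel green box edges on top of the blended masks.
    for (x1, y1, x2, y2) in boxes:
        out[y1:y2, x1] = out[y1:y2, x2 - 1] = (0, 255, 0)
        out[y1, x1:x2] = out[y2 - 1, x1:x2] = (0, 255, 0)
    return out
```

In practice the boxes would come from the YOLOv5 stage and the masks from the Mask R-CNN stage of the pipeline; here both are passed in as plain arrays to keep the sketch self-contained.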
Performance evaluation and model comparison.
To evaluate the performance of Y-MaskNet in object detection and instance segmentation, we selected several commonly used evaluation metrics for these tasks and measured them on the PCB Defect Dataset and the DeepPCB dataset. We also compared against existing benchmark models (YOLOv5 and Mask R-CNN) to validate the advantages of Y-MaskNet in detection and segmentation. The following two tables summarize model performance on the two datasets.
Table 2 summarizes the performance comparison on the PCB Defect Dataset. All models perform well on the common evaluation metrics (mAP, IoU, and Dice coefficient), but Y-MaskNet holds an advantage over YOLOv5 and Mask R-CNN on several of them, most notably mAP@[0.5:0.95], IoU, and Dice coefficient. Because Y-MaskNet combines the efficient detection ability of YOLOv5 with the fine segmentation ability of Mask R-CNN, it achieves stronger overall performance in both object detection and instance segmentation.
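As a concrete reference for the mask-level metrics reported below, IoU and the Dice coefficient for a single predicted/ground-truth mask pair can be computed as in this minimal sketch (the paper's evaluation may average over instances and classes differently):

```python
import numpy as np

def iou_and_dice(pred, gt):
    """IoU and Dice coefficient for two binary masks of the same shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    # Empty-vs-empty is treated as a perfect match by convention.
    iou = inter / union if union else 1.0
    dice = 2 * inter / total if total else 1.0
    return float(iou), float(dice)
```

Both metrics lie in [0, 1]; Dice weights the intersection more heavily than IoU, which is why the Dice values in Tables 2 and 3 sit above the corresponding IoU values.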
Table 2. Performance comparison of Y-MaskNet with benchmark models.
| Model | Dataset | mAP@[0.5] | mAP@[0.5:0.95] | IoU | Dice | Recall | Precision |
|---|---|---|---|---|---|---|---|
| YOLOv5 [32] | PCB Defect Dataset | 0.85 | 0.62 | 0.75 | 0.81 | 0.80 | 0.84 |
| Mask R-CNN [33] | PCB Defect Dataset | 0.88 | 0.65 | 0.78 | 0.83 | 0.82 | 0.86 |
| Y-MaskNet | PCB Defect Dataset | 0.91 | 0.72 | 0.82 | 0.88 | 0.85 | 0.89 |
| YOLOv5 | DeepPCB | 0.81 | 0.60 | 0.71 | 0.78 | 0.76 | 0.80 |
| Mask R-CNN | DeepPCB | 0.84 | 0.62 | 0.74 | 0.80 | 0.78 | 0.83 |
| Y-MaskNet | DeepPCB | 0.88 | 0.70 | 0.80 | 0.85 | 0.82 | 0.87 |
As shown in Table 2, Y-MaskNet improves markedly over YOLOv5 and Mask R-CNN on the PCB Defect Dataset. In particular, its mAP@[0.5:0.95] and IoU exceed both baselines, improving by 10% (0.72 vs. 0.62) and 7% (0.82 vs. 0.75), respectively, relative to YOLOv5. The Dice coefficient also rises substantially, reaching 0.88, well above the YOLOv5 result. These gains indicate that, by combining YOLOv5 and Mask R-CNN, Y-MaskNet attains high accuracy in both object detection and segmentation.
On DeepPCB, all models perform somewhat worse than on the PCB Defect Dataset, yet Y-MaskNet still achieves better detection and segmentation than YOLOv5 and Mask R-CNN. Especially for fine defects (such as cracks and solder defects), Y-MaskNet provides more accurate detection and segmentation results: it improves mAP@[0.5:0.95] by 10% (0.70 vs. 0.60) over YOLOv5, along with gains in IoU (0.80 vs. 0.71) and Dice coefficient (0.85 vs. 0.78). These results further verify the advantage of Y-MaskNet in handling complex backgrounds and small defects.
Performance on small object detection is another important consideration. Because PCB defects often include tiny cracks and small defective regions, the ability to detect and segment small objects is crucial. As shown in Table 3, Y-MaskNet again outperforms YOLOv5 and Mask R-CNN; on the DeepPCB dataset in particular, it better handles the localization and segmentation of small defects.
Table 3. Performance of the models in the small object detection task.
| Model | Dataset | Small Object Detection mAP | Small Object IoU | Small Object Recall | Small Object Precision |
|---|---|---|---|---|---|
| YOLOv5 | PCB Defect Dataset | 0.72 | 0.60 | 0.65 | 0.70 |
| Mask R-CNN | PCB Defect Dataset | 0.75 | 0.63 | 0.68 | 0.72 |
| Y-MaskNet | PCB Defect Dataset | 0.78 | 0.68 | 0.71 | 0.75 |
| YOLOv5 | DeepPCB | 0.70 | 0.58 | 0.62 | 0.67 |
| Mask R-CNN | DeepPCB | 0.74 | 0.61 | 0.65 | 0.71 |
| Y-MaskNet | DeepPCB | 0.80 | 0.72 | 0.74 | 0.78 |
The results in Table 3 further demonstrate the advantages of Y-MaskNet in small object detection. On the DeepPCB dataset, the mAP of Y-MaskNet is 0.80, significantly higher than that of YOLOv5 (0.70) and Mask R-CNN (0.74), a relative improvement of about 14% and 8%, respectively. Y-MaskNet also shows clear gains in small-object IoU (0.72) and recall (0.74). This matters especially for tiny cracks and solder defects in PCB inspection, underscoring the advantage of Y-MaskNet in fine defect detection.
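The relative gains quoted above follow directly from the Table 3 numbers, as this short check illustrates:

```python
def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline

# Small-object mAP on DeepPCB (Table 3): Y-MaskNet 0.80 vs. the baselines.
gain_vs_yolov5 = relative_improvement(0.80, 0.70)      # ~14.3%
gain_vs_mask_rcnn = relative_improvement(0.80, 0.74)   # ~8.1%
```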
Combining the results in Tables 2 and 3, we conclude that Y-MaskNet outperforms YOLOv5 and Mask R-CNN in PCB defect detection and segmentation, especially for complex backgrounds, small objects, and fine-grained defects. In tasks demanding small object detection, fine-grained defect segmentation, and high precision, Y-MaskNet demonstrates stronger capability, giving it high practical value and potential for wider adoption.
To present the experimental results more intuitively, Fig 5 compares model performance on the small object detection task, using bar charts to show YOLOv5, Mask R-CNN, and Y-MaskNet across several metrics (mAP, IoU, recall, precision). The chart makes clear that Y-MaskNet outperforms YOLOv5 and Mask R-CNN on all evaluated metrics, particularly in detection accuracy and segmentation quality for small objects.
Fig 5. Small object detection performance comparison.
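A grouped bar chart in the style of Fig 5 can be reproduced from the Table 3 numbers. This is an illustrative sketch only; the output file name and styling are our own choices, not taken from the paper.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for script/server use
import matplotlib.pyplot as plt
import numpy as np

models = ["YOLOv5", "Mask R-CNN", "Y-MaskNet"]
metrics = ["mAP", "IoU", "Recall", "Precision"]
# Small-object results on DeepPCB, taken from Table 3 (one row per model).
scores = np.array([
    [0.70, 0.58, 0.62, 0.67],
    [0.74, 0.61, 0.65, 0.71],
    [0.80, 0.72, 0.74, 0.78],
])

x = np.arange(len(metrics))
width = 0.25
fig, ax = plt.subplots(figsize=(7, 4))
for i, model in enumerate(models):
    ax.bar(x + (i - 1) * width, scores[i], width, label=model)
ax.set_xticks(x)
ax.set_xticklabels(metrics)
ax.set_ylabel("Score")
ax.set_title("Small object detection on DeepPCB")
ax.legend()
fig.savefig("fig5_small_object_comparison.png", dpi=150)
```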
Conclusion and discussion
In this study, we propose Y-MaskNet, a multi-task joint learning framework based on YOLOv5 and Mask R-CNN, for defect detection and segmentation of electronic products. With the rapid development of industrial automation and intelligent manufacturing, real-time detection and fine segmentation of defects in electronic products have become increasingly important. However, traditional defect detection methods have clear limitations: they cannot effectively handle small object defects in complex backgrounds and lack sufficient accuracy. To overcome these problems, we combine the efficient detection capability of YOLOv5 with the fine segmentation capability of Mask R-CNN, using multi-task learning to improve performance on both detection and segmentation. Experiments on the PCB Defect Dataset and the DeepPCB dataset validate the effectiveness of the Y-MaskNet model.
The experimental results show that Y-MaskNet has significant advantages over the baseline YOLOv5 and Mask R-CNN in object detection and instance segmentation, performing well across several evaluation metrics on both the PCB Defect Dataset and the DeepPCB dataset. Specifically, mAP@[0.5:0.95] improves by 10% over YOLOv5 (0.72 vs. 0.62) and 7% over Mask R-CNN (0.72 vs. 0.65), while IoU (0.82 vs. 0.75 for YOLOv5) and the Dice coefficient also show notable gains. In the small object detection task in particular, Y-MaskNet substantially improves detection accuracy and segmentation quality, successfully addressing the problem of small defect detection. The visualization results further show the detection and segmentation effects across different defect types, demonstrating the applicability and efficiency of the model in real industrial environments.
However, despite the good results of Y-MaskNet in defect detection for electronic products, the model still has certain shortcomings. First, despite the high speed of YOLOv5 in object detection, missed detections may still occur for extremely small defects, especially when detecting dense small objects. Second, although Mask R-CNN performs well in segmentation accuracy, it demands substantial computational resources and exhibits slower inference when dealing with large-scale datasets. Furthermore, although our model shows strong performance in experimental settings, its real-time inference speed on edge devices remains a challenge, particularly in resource-constrained environments. In real manufacturing scenarios, factors such as varying lighting conditions, noise, and defect types may also affect the model's robustness and accuracy.
In future research, we plan to further optimize the model in two main directions. First, we will introduce lightweight network architectures, such as MobileNet or EfficientNet, to improve running speed and real-time performance on edge devices and to reduce the computational overhead of inference. Second, we will explore multimodal learning methods that fuse data from different sources (e.g., sensor data, thermal imaging maps) into the model to further improve the robustness and accuracy of defect detection. Through these optimizations, we hope to promote the deployment of Y-MaskNet across a wider range of industrial applications, meeting the high demands of modern manufacturing for automated quality control and defect detection.
Data Availability
Yes - all data are fully available without restriction.
Funding Statement
The author(s) received no specific funding for this work.
References
- 1.Felfeliyan B, Hareendranathan A, Kuntze G, Jaremko JL, Ronsky JL. Improved-Mask R-CNN: Towards an accurate generic MSK MRI instance segmentation platform (data from the Osteoarthritis Initiative). Comput Med Imaging Graph. 2022;97:102056. doi: 10.1016/j.compmedimag.2022.102056
- 2.Tanoglidis D, Ćiprijanović A, Drlica-Wagner A, Nord B, Wang MHLS, Amsellem AJ, et al. DeepGhostBusters: Using Mask R-CNN to detect and mask ghosting and scattered-light artifacts from optical survey images. Astron Comput. 2022;39:100580. doi: 10.1016/j.ascom.2022.100580
- 3.Sahu S, Sahu SP, Dewangan DK. Pedestrian detection using MobileNetV2 based mask R-CNN. IoT based control networks and intelligent systems: Proceedings of 3rd ICICNIS 2022. Springer; 2022. p. 299–318.
- 4.Ball JGC, Hickman SHM, Jackson TD, Koay XJ, Hirst J, Jay W, et al. Accurate delineation of individual tree crowns in tropical forests from aerial RGB imagery using Mask R-CNN. Remote Sens Ecol Conserv. 2023;9(5):641–55. doi: 10.1002/rse2.332
- 5.Xiong K, Li Q, Meng Y, Li Q. A study on weed detection based on improved YOLO v5. In: 2023 4th international conference on information science and education (ICISE-IE); 2023. p. 1–4.
- 6.Li Y, Li A, Li X, Liang D. Detection and identification of peach leaf diseases based on YOLO v5 improved model. In: 2022 The 5th international conference on control and computer vision; 2022. p. 79–84. doi: 10.1145/3561613.3561626
- 7.Li J, Qiao Y, Liu S, Zhang J, Yang Z, Wang M. An improved YOLOv5-based vegetable disease detection method. Comput Electron Agric. 2022;202:107345. doi: 10.1016/j.compag.2022.107345
- 8.Zhang C, Rao C, Li H, Chang C, Wang J, Yin A, et al. A fabric defect detection method based on improved YOLOv5. In: International conference on computer animation and social agents; 2024. p. 165–180.
- 9.Sabuhi M, Zhou M, Bezemer C-P, Musilek P. Applications of generative adversarial networks in anomaly detection: A systematic literature review. IEEE Access. 2021;9:161003–29. doi: 10.1109/access.2021.3131949
- 10.Landauer M, Onder S, Skopik F, Wurzenberger M. Deep learning for anomaly detection in log data: A survey. Mach Learn Appl. 2023;12:100470. doi: 10.1016/j.mlwa.2023.100470
- 11.Liu J, Xie G, Wang J, Li S, Wang C, Zheng F, et al. Deep industrial image anomaly detection: A survey. Mach Intell Res. 2024;21(1):104–35. doi: 10.1007/s11633-023-1459-z
- 12.Chow JK, Su Z, Wu J, Tan PS, Mao X, Wang YH. Anomaly detection of defects on concrete structures with the convolutional autoencoder. Advanced Engineering Informatics. 2020;45:101105. doi: 10.1016/j.aei.2020.101105
- 13.Torabi H, Mirtaheri SL, Greco S. Practical autoencoder based anomaly detection by using vector reconstruction error. Cybersecurity. 2023;6(1). doi: 10.1186/s42400-022-00134-9
- 14.Haji SH, Ameen SY. Attack and anomaly detection in IoT networks using machine learning techniques: A review. Asian J Res Comput Sci. 2021;9(2):30–46.
- 15.Nagaraja A, Boregowda U, Khatatneh K, Vangipuram R, Nuvvusetty R, Sravan Kiran V. Similarity based feature transformation for network anomaly detection. IEEE Access. 2020;8:39184–96. doi: 10.1109/access.2020.2975716
- 16.Akcay S, Ameln D, Vaidya A, Lakshmanan B, Ahuja N, Genc U. Anomalib: A deep learning library for anomaly detection. In: 2022 IEEE International Conference on Image Processing (ICIP); 2022. p. 1706–10. doi: 10.1109/icip46576.2022.9897283
- 17.Fang X, Liu D, Zhou P, Hu Y. Multi-modal cross-domain alignment network for video moment retrieval. IEEE Trans Multimedia. 2023;25:7517–32. doi: 10.1109/tmm.2022.3222965
- 18.Tang R, Yang C, Wang Y. A cross-domain multimodal supervised latent topic model for item tagging and cold-start recommendation. IEEE MultiMedia. 2023;30(3):48–62. doi: 10.1109/mmul.2023.3242455
- 19.Parekh VS, Lai S, Braverman V, Leal S, Rowe S, Pillai JJ, et al. Cross-domain federated learning in medical imaging. arXiv preprint. 2021. doi: 10.48550/arXiv.2112.10001
- 20.Zhao W, Wu X, Luo J. Cross-domain image captioning via cross-modal retrieval and model adaptation. IEEE Trans Image Process. 2021;30:1180–92. doi: 10.1109/TIP.2020.3042086
- 21.Chen Y, Zhong H, He X, Peng Y, Cheng L. Real20M: A large-scale e-commerce dataset for cross-domain retrieval. In: Proceedings of the 31st ACM international conference on multimedia; 2023. p. 4939–48. doi: 10.1145/3581783.3612408
- 22.Francis J, Chen B, Ganju S, Kathpal S, Poonganam J, Shivani A, et al. Learn-to-race challenge 2022: Benchmarking safe learning and cross-domain generalisation in autonomous racing. arXiv preprint. 2022. doi: 10.48550/arXiv.2205.02953
- 23.Zong Y, Aodha OM, Hospedales TM. Self-supervised multimodal learning: A survey. IEEE Trans Pattern Anal Mach Intell. 2025;47(7):5299–318. doi: 10.1109/TPAMI.2024.3429301
- 24.Adibhatla VA, Chih H-C, Hsu C-C, Cheng J, Abbod MF, Shieh J-S. Applying deep learning to defect detection in printed circuit boards via a newest model of you-only-look-once. Math Biosci Eng. 2021;18(4):4411–28. doi: 10.3934/mbe.2021223
- 25.Xiong W, Lee J, Qu S, Jang WH. 81-1: Invited paper: Data augmentation for applying deep learning to display manufacturing defect detection. In: SID symposium digest of technical papers; 2020.
- 26.Mendikute J, Carmona I, Aizpurua I, Bediaga I, Castro I, Galdos L, et al. Defect detection in wind turbine blades applying convolutional neural networks to ultrasonic testing. NDT and E International. 2025;154.
- 27.Ancha VK, Sibai FN, Gonuguntla V, Vaddi R. Utilizing YOLO models for real-world scenarios: Assessing novel mixed defect detection dataset in PCBs. IEEE Access. 2024;12(000):8.
- 28.Liu G, Chen Y, Ye J, Jiang Y, Yu H, Tang J, et al. A transformer neural network based framework for steel defect detection under complex scenarios. Adv Eng Softw. 2025;202:103872. doi: 10.1016/j.advengsoft.2025.103872
- 29.Zhang Q, Zhang M, Gamanayake C, Yuen C, Geng Z, Jayasekara H, et al. Deep learning based solder joint defect detection on industrial printed circuit board X-ray images. Complex Intell Syst. 2022;8(2):1525–37. doi: 10.1007/s40747-021-00600-w
- 30.Zheng X, Wang H, Chen J, Kong Y, Zheng S. A generic semi-supervised deep learning-based approach for automated surface inspection. IEEE Access. 2020;8:114088–99. doi: 10.1109/access.2020.3003588
- 31.Haddad BM, Dodge SF, Karam LJ, Patel NS, Braun MW. Locally adaptive statistical background modeling with deep learning-based false positive rejection for defect detection in semiconductor units. IEEE Trans Semicond Manufact. 2020;33(3):357–72. doi: 10.1109/tsm.2020.2998441
- 32.Brungel R, Friedrich CM. DETR and YOLOv5: Exploring performance and self-training for diabetic foot ulcer detection. In: 2021 IEEE 34th international symposium on computer-based medical systems (CBMS); 2021. p. 148–53. doi: 10.1109/cbms52027.2021.00063
- 33.Alshawabkeh S, Wu L, Dong D, Cheng Y, Li L. A hybrid approach for pavement crack detection using Mask R-CNN and vision transformer model. Comput Mater Continua. 2025;82(1).
- 34.Calabrese M, Agnusdei L, Fontana G, Papadia G, Del Prete A. Application of mask R-CNN and YOLOv8 algorithms for defect detection in printed circuit board manufacturing. Discov Appl Sci. 2025;7(4). doi: 10.1007/s42452-025-06641-x
- 35.Fontana G, Calabrese M, Agnusdei L, Papadia G, Del Prete A. SolDef_AI: An open source PCB dataset for mask R-CNN defect detection in soldering processes of electronic components. JMMP. 2024;8(3):117. doi: 10.3390/jmmp8030117
- 36.Xing X, Wang B, Ning X, Wang G, Tiwari P. Short-term OD flow prediction for urban rail transit control: A multi-graph spatiotemporal fusion approach. Inform Fusion. 2025;118:102950. doi: 10.1016/j.inffus.2025.102950
- 37.He F, Li H, Ning X, Li Q. BeautyDiffusion: Generative latent decomposition for makeup transfer via diffusion models. Inform Fusion. 2025;123:103241. doi: 10.1016/j.inffus.2025.103241
- 38.Zhang H, Yu L, Wang G, Tian S, Yu Z, Li W, et al. Cross-modal knowledge transfer for 3D point clouds via graph offset prediction. Pattern Recogn. 2025;162:111351. doi: 10.1016/j.patcog.2025.111351
- 39.Li Q, Chen H, Huang X, He M, Ning X, Wang G, et al. Oral multi-pathology segmentation with lead-assisting backbone attention network and synthetic data generation. Inform Fusion. 2025;118:102892. doi: 10.1016/j.inffus.2024.102892
- 40.Umer MJ, Sharif MI. A comprehensive survey on quantum machine learning and possible applications. Int J E-Health Med Commun. 2022;13(5):1–17. doi: 10.4018/ijehmc.315730
- 41.Thara DK, Premasudha BG, Murthy TV, Bukhari SAC. EEG forecasting with univariate and multivariate time series using windowing and baseline method. Int J E-Health Med Commun. 2022;13(5):1–13. doi: 10.4018/ijehmc.315731
- 42.Ping RB, Yue WZ. Strategic focus, tasks, and pathways for promoting China’s modernization through new productive forces. J Xi’an Univ Finance Econ. 2024;1:3–11.
- 43.Jing CX, Qing W. The logic and pathways of new productive forces driving high-quality development. J Xi’an Univ Finance Econ. 2024;37(1):12–20.
- 44.Sannidhan MS, Martis JE, Nayak RS, Aithal SK, Sudeepa KB. Detection of antibiotic constituent in aspergillus flavus using quantum convolutional neural network. Int J E-Health Med Commun. 2023;14(1):1–26. doi: 10.4018/ijehmc.321150
- 45.Kodipalli A, Fernandes SL, Dasar SK, Ismail T. Computational framework of inverted fuzzy C-means and quantum convolutional neural network towards accurate detection of ovarian tumors. Int J E-Health Med Commun. 2023;14(1):1–16. doi: 10.4018/ijehmc.321149