Systematic Evaluation of Atrous Spatial Pyramid Pooling in U‑Net for Pore Segmentation in Plasma Electrolytic Oxidation Coatings

Chi-Wei Chu; Chun-Ming Lu; Wing Kiu Yeung

doi:10.1021/acs.langmuir.5c01673

2025 Jun 16;41(25):16368–16377. doi: 10.1021/acs.langmuir.5c01673


PMCID: PMC12224308 PMID: 40523154

Abstract

Plasma Electrolytic Oxidation (PEO) coatings enhance the physical and chemical properties of metallic substrates, including corrosion resistance, wear resistance, and thermal stability. These enhancements are strongly influenced by the porous surface morphology of the coatings, which affects ion transport, stress distribution, and permeability. Accurate quantification of pore structures is essential for understanding interfacial structure–property relationships, yet traditional image segmentation methods often fail to capture the complexity of PEO surfaces in SEM images. This study presents a deep learning-based segmentation framework using U-Net architectures integrated with Atrous Spatial Pyramid Pooling (ASPP) to improve multiscale feature extraction. The performance impact of ASPP placement within different parts of U-Net was systematically evaluated. Results show that modifications to the bridge and decoder paths have the greatest impact on segmentation performance, with the combined modification that applies ASPP in both paths achieving the highest F1 score (0.9360) and the highest IoU (0.8798). Statistical analysis using 5-fold cross-validation, bootstrap confidence intervals, and paired t-tests confirmed that only the bridge-modified model (B_1×1) significantly outperformed the baseline (p < 0.05). The proposed approach enables high-fidelity pore segmentation and supports advanced microstructural analysis of PEO coatings. By facilitating accurate morphological quantification, it contributes to the understanding of structure–property relationships in interfacial materials and offers a robust tool for future materials characterization workflows.


Introduction

In the era of data-driven innovation, deep learning has significantly expanded the capabilities of computer vision, enabling highly accurate and efficient image analysis. Among its many applications, semantic segmentation, an advanced technique for pixel-wise classification, has gained prominence for its ability to delineate objects into meaningful categories. This has driven advancements across diverse fields, from biomedical imaging to industrial analysis.

Plasma Electrolytic Oxidation (PEO) coatings are an emerging class of ceramic-like surface layers that significantly enhance the physical and chemical properties of metallic substrates, including corrosion resistance, wear resistance, thermal stability, and photocatalytic efficiency. These enhancements are largely governed by the porous morphology of the coating, which influences ion or gas transport, local stress distribution, and permeability. Therefore, understanding and quantifying pore structures are essential for correlating the interfacial structure with material performance.

The most thorough approach for analyzing pores in PEO coatings involves micro-CT scanning, which provides high-resolution images of both surface and internal network structures. However, its widespread use is limited by high cost and restricted accessibility. As a result, SEM image analysis via thresholding remains the most common method. Yet, due to the complexity of PEO surface morphologies, such traditional methods often require manual corrections, such as adjusting segmentation boundaries or removing artifacts, which are time-consuming, subjective, and prone to human error.

A less-explored method was proposed by Ivasenko et al., which used intensity-based segmentation combined with triangle thresholding. However, the model’s performance has not been systematically evaluated. In contrast, recent advances in deep learning, particularly Convolutional Neural Networks (CNNs), have substantially improved image segmentation performance across a wide range of domains. Fully Convolutional Networks (FCNs) were the first to eliminate fully connected layers, enabling pixel-wise prediction with end-to-end learning. SegNet further advanced this by introducing a symmetrical encoder–decoder structure with pooling index-based upsampling. Among these, the U-Net and its variants have demonstrated remarkable success across various domains due to their encoder–decoder structure and skip connections, which help preserve spatial information even with limited data. U-Net has also been successfully applied to pore segmentation, including bubbles in ice, shale pores, and helium cavities in materials. Given this strong performance, U-Net was selected as the base architecture for this study.

However, the wide range of pore sizes and irregular geometries in PEO coatings poses a challenge to standard U-Net, which uses fixed 3 × 3 convolutional kernels. These may be inadequate for capturing both fine details and broader contextual features. To address this, prior research has emphasized the importance of multiscale feature representation in improving segmentation accuracy. For example, Zhou et al. introduced nested skip pathways in UNet++ to extract hierarchical features, although the design significantly increases the computational load. Gu et al. (CE-Net) applied multikernel pooling and atrous convolutions in the bridge to enhance contextual information efficiently. Similarly, Mao et al. (RR-Net) used decomposed convolutions to increase feature diversity, while Liu et al. (DLGPAFE-Net) incorporated multiscale features within attention mechanisms. PSPNet further supports the value of multiresolution representation through pyramid-based context aggregation.

Among various strategies, Atrous Spatial Pyramid Pooling (ASPP) has emerged as a lightweight yet effective method. Introduced in the DeepLab series, ASPP applies parallel atrous convolutions with varying dilation rates to capture multiscale context without substantially increasing the parameter count. ASPP has since been adapted into U-Net for applications such as medical imaging and remote sensing. For instance, Bansal et al. applied ASPP from the encoder to the early decoder, optimizing performance for mobile devices using small dilation rates. Yousef et al. placed ASPP in the bridge, while Gao and Almekkawy used dynamic ASPP in the encoder of a nested U-Net to boost performance. Yang et al. enhanced skip connections using ASPP with channel attention.

While ASPP has demonstrated success in various domains, its optimal integration into U-Net for segmenting the complex and heterogeneous surface features of PEO coatings remains unexplored. This study aims to establish a robust method for SEM image segmentation of PEO coatings while systematically evaluating the impact of integrating ASPP at different stages of the U-Net architecture (encoder, bridge, and decoder) for pore segmentation. The modified U-Net variations, termed multiscale atrous convolutional block (MACB) U-Nets, are tested to determine the most effective configuration for improving the segmentation performance.

By improving segmentation fidelity, this study contributes to a more precise evaluation of porosity, pore shape, and spatial distribution, which are essential descriptors in interfacial materials science. The proposed approach supports advanced microstructural analysis and lays the foundation for correlating pore-level features with the macroscopic physical behavior of PEO coatings. This enables future integration of high-throughput image analysis into the material design and property optimization workflows.

Material and Methods

Data Acquisition and Preprocessing

The data set used in this study consisted of 200 SEM images of PEO coatings, captured at magnifications ranging from 1000 to 5000× using a JEOL JSM-6510 SEM. The images were acquired from PEO-coated samples prepared under varying processing regimes to capture diverse surface morphologies. To enhance contrast and clarity, brightness adjustments were performed automatically during image acquisition or manually, where necessary.

Each image was cropped to 800 × 800 pixels without further resizing to ensure consistency and preserve structural details. The corresponding ground truth segmentation masks were generated via manual annotation by three trained graduate students and the authors, all of whom had prior experience in SEM imaging and PEO coatings. Annotations were made using APEER (an online labeling tool) and the GNU Image Manipulation Program (GIMP), which allowed for flexible pixel-level selection and boundary refinement. Annotators followed a standardized labeling protocol, and all annotations were reviewed by a senior researcher to ensure accuracy and consistency across the data set.

Pores were defined as surface-penetrating channels with visible openings including complex structures such as irregularities or protrusions within the openings. Depressed areas without penetration were excluded from the pore category. All annotated masks were binarized, with pore regions assigned a value of 1 and the background assigned 0.

Basic U-Net Architecture

The U-Net model served as the baseline architecture for this study, as depicted in Figure . Figure a illustrates the structure of the basic U-Net, which consists of three main components: the encoder, bridge, and decoder. The Basic Convolutional Blocks (BCBs), shown in Figure b, form the core of these components. Each BCB consists of two sequential 3 × 3 convolutional layers, each followed by a Rectified Linear Unit (ReLU) activation function, which introduces the nonlinearity needed for data fitting. A dropout layer is included after the two consecutive convolutional operations to prevent overfitting.

(a) Architecture of the basic U-Net model. (b) Structure of Basic Convolutional Blocks (BCBs).

The basic U-Net begins with an input layer that processes sliced SEM images of 800 × 800 pixels, normalized to the range [0,1]. The encoder, responsible for extracting multiscale features, consists of a series of convolutional blocks. After each block, a 2 × 2 max-pooling operation with a stride of 2 is applied, reducing spatial dimensions while doubling the number of filters in the subsequent BCB (starting with f filters in the first BCB). This down-sampling operation enables the network to capture progressively abstract and high-level features. The bridge acts as an intermediary between the encoder and decoder, processing the downsampled feature maps from the final encoder block before they are up-sampled in the decoder path.

The decoder path mirrors the encoder but focuses on up-sampling the feature maps to restore the original spatial resolution. It begins with a transpose convolution layer, which doubles the spatial size of feature maps while halving the number of filters. The up-sampled feature maps are then concatenated with their corresponding encoder feature maps via skip connections, preserving fine-grained spatial details lost during down-sampling. Each concatenated output is further processed by a BCB. Finally, the output layer applies a 1 × 1 convolution, reducing the depth of the feature maps to a single channel, followed by a sigmoid activation function to generate the final segmentation mask.

To determine the most suitable baseline model, U-Nets with depths of 7 and 9 BCBs were tested before further modifications.
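The pooling and filter-doubling arithmetic described above can be traced numerically. The following is a minimal sketch, assuming that a U-Net with 9 BCBs is split into four encoder blocks, one bridge block, and four decoder blocks (and 3 + 1 + 3 for 7 BCBs); this split is an assumption, not stated in the text. The default of 24 first-block filters is only an illustrative value inside the tested range, and the function name is hypothetical.

```python
def unet_feature_shapes(input_size=800, first_filters=24, n_blocks=9):
    """Trace (stage, spatial size, filters) through a U-Net with n_blocks BCBs.

    Assumes n_blocks = 2 * levels + 1: `levels` encoder blocks, one bridge,
    and `levels` decoder blocks (an assumption about how depth is counted).
    """
    levels = (n_blocks - 1) // 2
    shapes = []
    size, filters = input_size, first_filters
    for i in range(levels):
        shapes.append((f"encoder_{i + 1}", size, filters))
        size //= 2      # 2 x 2 max pooling with stride 2 halves each side
        filters *= 2    # the next block uses twice as many filters
    shapes.append(("bridge", size, filters))
    for i in range(levels, 0, -1):
        size *= 2       # transpose convolution doubles each side
        filters //= 2   # and halves the number of filters
        shapes.append((f"decoder_{i}", size, filters))
    return shapes


for stage, size, filters in unet_feature_shapes():
    print(f"{stage:10s} {size:4d} x {size:<4d} filters={filters}")
```

Under these assumptions, the bridge of the deeper model operates on feature maps one pooling step smaller than the bridge of the shallower model, which is one way to read the difference between the two tested depths.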

Multiscale Atrous Convolutional Block U-Nets

To improve segmentation performance, Atrous Spatial Pyramid Pooling (ASPP) was integrated into the U-Net architecture by replacing the standard convolutions in the BCBs. The ASPP-incorporated structures used in this research are illustrated in Figure . Figure a illustrates the ASPP convolutional operation, which comprises parallel atrous convolution layers with varying dilation rates. These layers process the input feature maps using atrous filters with different dilation rates, enabling the extraction of both fine-grained local features and broad contextual information. The outputs from these layers are then concatenated and passed through additional convolutions, enabling the network to integrate information across scales. An atrous filter with a dilation rate of r enlarges the receptive field from $k \times k$ to $k_{e} \times k_{e}$ with $k_{e} = k + (k - 1) \times (r - 1)$, where k is the original kernel size.
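As a worked example of the receptive-field formula, the short sketch below evaluates $k_{e}$ for a 3 × 3 filter. The dilation rates 1, 2, and 4 are the ones reported as optimal in the Results; the function name is illustrative.

```python
def effective_kernel_size(k: int, r: int) -> int:
    """Effective size k_e = k + (k - 1) * (r - 1) of a k x k filter with dilation rate r."""
    return k + (k - 1) * (r - 1)


for r in (1, 2, 4):
    k_e = effective_kernel_size(3, r)
    print(f"dilation rate {r}: a 3 x 3 filter spans {k_e} x {k_e} pixels")
```

A dilation rate of 1 reduces to an ordinary convolution, so the three parallel branches span 3, 5, and 9 pixels per side while each still uses only nine weights per input-output channel pair.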

Schematic illustration of multiscale atrous convolutional blocks (MACBs). (a) ASPP convolutional operation, (b) MACB_3×3, and (c) MACB_1×1.

The ASPP-revised structure is named multiscale atrous convolutional block (MACB) in this research. Two variations of the MACB were implemented. MACB_3×3 (Figure b) uses three parallel 3 × 3 atrous convolutions with three different dilation rates, followed by a 3 × 3 convolution to fuse the outputs. MACB_1×1 (Figure c) uses the same ASPP structure but replaces the final fusion layer with a 1 × 1 convolution, as in the original DeepLabv2 design, which integrates features while reducing computational complexity. To manage computational costs, the number of filters in each atrous convolution was set to one-third of that in a standard convolution (e.g., 8 filters per atrous convolutional branch compared to 24 filters in a standard convolution). The optimal dilation rates were first tuned using a fully MACB_3×3-modified U-Net, in which all BCBs of the baseline U-Net 9 architecture were replaced, before final comparisons. Similar to the BCBs, the filter numbers in the atrous convolutions were doubled after each down-sampling operation and halved after each up-sampling operation.
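To make the one-third filter rule and the effect of the fusion kernel concrete, the following sketch counts trainable weights for the two block types. It rests on several assumptions that the text does not confirm: each block receives `c_in` channels, the three branches are concatenated, the fusion convolution outputs the full `filters` count, and every convolution has a bias term. The function names are hypothetical.

```python
def conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights plus biases of a kernel x kernel convolution (dilation adds no weights)."""
    return kernel * kernel * c_in * c_out + c_out


def macb_params(c_in: int, filters: int, fusion_kernel: int) -> int:
    """Parameter count of an assumed MACB: three atrous branches plus one fusion layer."""
    branch_filters = filters // 3                      # one-third rule, e.g. 24 -> 8
    branches = 3 * conv_params(3, c_in, branch_filters)
    fusion = conv_params(fusion_kernel, 3 * branch_filters, filters)
    return branches + fusion


c_in, filters = 24, 24                                 # illustrative values only
print("MACB_3x3:", macb_params(c_in, filters, fusion_kernel=3))
print("MACB_1x1:", macb_params(c_in, filters, fusion_kernel=1))
print("two standard 3x3 convolutions:", 2 * conv_params(3, c_in, filters))
```

Under these assumptions the 1 × 1 fusion layer removes most of the fusion weights, which is in line with the motivation given for MACB_1×1.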

MACB_3×3 and MACB_1×1 were systematically tested in different parts of the U-Net architecture (encoder, bridge, decoder, and their combinations) to determine the optimal placement of the ASPP for segmentation.

Training Details

Model training was conducted on Google Colab, leveraging an NVIDIA L4 GPU to accelerate deep learning computations. The experiments were implemented by using TensorFlow 2.18.0 and executed in a Python 3.11.11 environment.

Due to the relatively small data set size, 5-fold cross-validation (CV) was employed to ensure robust model evaluation and reduce the risk of overfitting by avoiding reliance on a single test set. In each fold, 40 images were designated as the test set and another 40 as the validation set, with the remaining 120 images used for training. These sets were mutually exclusive in each fold. Early stopping was applied to prevent overfitting, and training was terminated if no improvement was observed over 10 consecutive epochs.
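The 120/40/40 partition per fold can be sketched as follows. How the validation images were chosen in each fold is not specified; this sketch assumes, purely for illustration, that the chunk following the test chunk serves as the validation set. The seed and function name are likewise hypothetical.

```python
import random


def five_fold_splits(n_images=200, n_folds=5, seed=0):
    """Return a list of (train, val, test) index lists, one tuple per fold."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    fold_size = n_images // n_folds
    chunks = [indices[i * fold_size:(i + 1) * fold_size] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        val_idx = (k + 1) % n_folds          # assumption: next chunk is the validation set
        test = chunks[k]
        val = chunks[val_idx]
        train = [i for j, c in enumerate(chunks) if j not in (k, val_idx) for i in c]
        splits.append((train, val, test))
    return splits


for fold, (train, val, test) in enumerate(five_fold_splits(), start=1):
    print(f"fold {fold}: train={len(train)} val={len(val)} test={len(test)}")
```

With this scheme every image appears in exactly one test set across the five folds, so the fold-level scores together cover the whole data set.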

To further evaluate the reliability and statistical robustness of our models, we employed two complementary methods: bootstrap resampling and paired t tests. For each variant, the 5-fold-level F1 and IoU scores constituted an empirical sampling distribution. We applied nonparametric bootstrap resampling (10,000 iterations) to these values, drawing with replacement to generate distributions of the mean F1 and mean IoU. From each bootstrapped distribution, we derived 95% confidence intervals, defined by the 2.5th and 97.5th percentiles. Concurrently, paired t tests were performed on the fold-level metrics to assess the significance of the performance differences between variants.
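A minimal sketch of the two statistical procedures is given below, using only the Python standard library. The fold-level scores are invented placeholders, not the values obtained in this study, and the percentile indexing is one of several reasonable conventions. The p-value itself would follow from a Student t distribution with n − 1 = 4 degrees of freedom (for instance via `scipy.stats.ttest_rel`); only the test statistic is computed here.

```python
import math
import random
import statistics


def bootstrap_ci(values, n_iter=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of fold-level scores."""
    rng = random.Random(seed)
    n = len(values)
    # Resample the folds with replacement and record the mean of each resample.
    means = sorted(statistics.fmean(rng.choices(values, k=n)) for _ in range(n_iter))
    lower = means[int((alpha / 2) * n_iter)]
    upper = means[int((1 - alpha / 2) * n_iter) - 1]
    return lower, upper


def paired_t_statistic(a, b):
    """t statistic of the paired differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical fold-level F1 scores (placeholders for illustration only).
baseline_f1 = [0.921, 0.930, 0.925, 0.938, 0.933]
variant_f1 = [0.927, 0.934, 0.931, 0.941, 0.939]

print("95% CI (variant):", bootstrap_ci(variant_f1))
print("paired t statistic:", paired_t_statistic(variant_f1, baseline_f1))
```

Because the test is paired, a small but consistent gain in every fold can be significant even when the bootstrap intervals of the two models overlap, whereas a larger mean gain that varies across folds may not be.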

To enhance model generalization, we augmented the training set with geometric and photometric transformations: rotations (0°, 90°, 180°, and 270°), horizontal and vertical flips, and random brightness/contrast scaling in the range (0.5, 1.5). To evaluate robustness under variable imaging conditions, we also applied brightness/contrast adjustments (same range) to the validation and test sets, while omitting rotations and flips, to emulate real-world variations in SEM acquisition settings.
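The augmentation scheme can be illustrated with plain nested lists. This is a sketch under stated assumptions: each transformation is drawn at random per sample, brightness is modeled as a multiplicative factor clipped to [0, 1], and contrast scaling is omitted because its exact form is not specified. The key point is that geometric transformations are applied identically to the image and its mask, while photometric changes affect only the image.

```python
import random


def rotate90(img):
    """Rotate a 2D list-of-lists image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def augment(img, mask, rng):
    """Apply a random rotation, random flips, and a brightness factor in (0.5, 1.5)."""
    for _ in range(rng.randrange(4)):            # 0, 90, 180, or 270 degrees
        img, mask = rotate90(img), rotate90(mask)
    if rng.random() < 0.5:                       # horizontal flip
        img, mask = [r[::-1] for r in img], [r[::-1] for r in mask]
    if rng.random() < 0.5:                       # vertical flip
        img, mask = img[::-1], mask[::-1]
    factor = rng.uniform(0.5, 1.5)               # brightness only; the mask is untouched
    img = [[min(1.0, max(0.0, p * factor)) for p in row] for row in img]
    return img, mask


rng = random.Random(0)
image = [[0.2, 0.4], [0.6, 0.8]]
label = [[1, 0], [0, 0]]
print(augment(image, label, rng))
```

For the validation and test sets, only the brightness step would be applied under the scheme described above.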

Loss Function

Since this study involved binary segmentation, binary cross-entropy was used as the loss function, defined as

$$L(y, p) = -\frac{1}{N} \sum_{i = 1}^{N} \left[ y_{i} \log(p_{i}) + (1 - y_{i}) \log(1 - p_{i}) \right]$$

where N is the number of pixels, $y_{i}$ is the true class of the i-th pixel (0 for background, 1 for pores), and $p_{i}$ is the predicted probability that the i-th pixel belongs to the pore class.
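The loss can be evaluated directly from its definition. The sketch below operates on flat lists of pixel labels and predicted pore probabilities; the small `eps` clipping that avoids log(0) is an implementation assumption, not part of the formula above.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy over N pixels; p_pred is the predicted pore probability."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)          # clip to keep the logarithms finite
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


# One pore pixel predicted at 0.9 and one background pixel predicted at 0.1.
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
```

Both pixels in this example contribute −log(0.9), so the loss is small; a confident wrong prediction such as p = 0.1 for a pore pixel would contribute −log(0.1) instead.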

The Adam optimizer was used with a first-moment decay rate (β₁) of 0.9 and a second-moment decay rate (β₂) of 0.999. The learning rate was adjusted to enhance training stability, and gradient clipping with a clip value of 1 was applied to guard against exploding gradients. A minibatch size of 4 was used.

Quantitative Analysis

Segmentation performance was assessed by classifying each predicted pixel as a True Positive (TP), False Positive (FP), True Negative (TN), or False Negative (FN). To compare model performance, the F1 score was used as the primary evaluation metric because it remains informative when classes are imbalanced. The F1 score is the harmonic mean of Precision and Recall and, for binary masks, is identical to the Dice similarity coefficient (DSC):

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{F1 score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2 TP}{2 TP + FP + FN}$$

$$\text{DSC} = \text{F1 score} = \frac{2 |A \cap B|}{|A| + |B|} = \frac{2 TP}{2 TP + FP + FN}$$

where A denotes the predicted pore regions and B represents the pore regions in the ground truth masks.

Additionally, Intersection over Union (IoU), also known as the Jaccard Index, was used as a supplementary evaluation metric. IoU measures the overlap between predicted and ground truth masks, providing an intuitive measure of segmentation performance:

$$\text{IoU} = \frac{|A \cap B|}{|A \cup B|} = \frac{TP}{TP + FP + FN}$$
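The metric definitions translate directly into pixel counting. The sketch below computes all four quantities from a pair of binary masks given as nested lists; the zero-division guards are an implementation assumption. It also follows from the two closed forms that IoU = F1 / (2 − F1) for any single mask pair, which is useful for cross-checking reported values.

```python
def segmentation_metrics(pred, truth):
    """Return precision, recall, F1, and IoU for two binary masks (1 = pore)."""
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1, iou


pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
precision, recall, f1, iou = segmentation_metrics(pred, truth)
print(precision, recall, f1, iou)
print("IoU from F1:", f1 / (2 - f1))
```

Because the reported scores are means over folds, the identity holds only approximately for averaged values.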

Results and Discussion

Basic U-Nets

Before model comparisons, hyperparameter tuning for basic U-Net models with depths of 7 and 9 BCBs, denoted as U-Net 7 and U-Net 9, respectively, was first conducted using grid search, as detailed in Tables S1 and S2. The 5-fold CV average F1 score was used to assess performance for each combination of filter number, learning rate (LR), and dropout rate. The tested range of hyperparameters included filter numbers for the first convolutional layer from 18 to 40, learning rates from 0.00025 to 0.002, and dropout rates from 0.1 to 0.4. Table S3 summarizes the optimized hyperparameters.

Using these optimized hyperparameters, a final comparison on the test set was conducted, as presented in Table 1.

Table 1. Comparison of Segmentation Performance between U-Net 7 and U-Net 9.

Model	Mean F1 Score	95% CI (F1)	p-value (F1)	Mean IoU	95% CI (IoU)	p-value (IoU)
U-Net 7	0.9195	0.9108–0.9275	-	0.8511	0.8364–0.8649	-
U-Net 9	0.9294	0.9233–0.9356	0.0252*	0.8681	0.8575–0.8789	0.0249*


Mean F1 scores and Intersection over Union (IoU) were computed from 5-fold cross-validation. 95% Confidence Intervals (CIs) were estimated using 10,000 bootstrap iterations. Paired t-tests were used to assess statistical significance; p < 0.05 was considered significant and is marked with an asterisk (*).

Table 1 summarizes the segmentation performance of U-Net architectures with different depths. U-Net 9, with increased depth, outperformed U-Net 7, achieving a higher mean F1 score (0.9294 vs 0.9195) and mean IoU (0.8681 vs 0.8511). The 95% confidence intervals for both metrics are also narrower and shifted upward for U-Net 9, indicating improved performance and reliability. Importantly, the performance gains were statistically significant, with p values of <0.05 for both the F1 score and IoU. Based on these results, U-Net 9 was selected as the baseline architecture for subsequent experiments.

Multiscale Atrous Convolutional Block U-Nets

Before the optimal placement of ASPP within the U-Net architecture was determined, the dilation rates for the atrous filters were first optimized using the MACB_3×3 fully modified U-Net, as summarized in Table S4. The optimal dilation rates were determined to be 1, 2, and 4. To evaluate the effect of ASPP in different parts of the U-Net architecture, MACB_3×3 and MACB_1×1 U-Nets were compared, as shown in Table 2.

Table 2. Comparison of Segmentation Performance across U-Net Variants.

Model	Mean F1 Score	95% CI (F1)	p-value (F1)	ΔF1 vs U-Net 9	Mean IoU	95% CI (IoU)	p-value (IoU)	ΔIoU vs U-Net 9
U-Net 9	0.9294	0.92328–0.93555	-	-	0.8681	0.85750–0.87893	-	-
E_3×3	0.9296	0.92186–0.93633	0.9924	+0.02%	0.8684	0.85450–0.88030	0.9898	+0.03%
B_3×3	0.9323	0.92678–0.93765	0.6425	+0.31%	0.8732	0.86356–0.88263	0.6455	+0.59%
D_3×3	0.9331	0.92821–0.93778	0.3354	+0.40%	0.8745	0.86606–0.88285	0.3318	+0.74%
EB_3×3	0.9323	0.92680–0.93703	0.5977	+0.31%	0.8731	0.86329–0.88153	0.5986	+0.58%
BD_3×3	0.9360	0.93192–0.94004	0.2553	+0.71%	0.8798	0.87256–0.88687	0.2551	+1.35%
ED_3×3	0.9348	0.92878–0.93993	0.3291	+0.58%	0.8775	0.86711–0.88669	0.3308	+1.08%
EBD_3×3	0.9339	0.92798–0.93907	0.0364	+0.48%	0.8760	0.86568–0.88515	0.0373	+0.91%
E_1×1	0.9327	0.92720–0.93755	0.0626	+0.36%	0.8739	0.86433–0.88245	0.0627	+0.67%
B_1×1	0.9343	0.92893–0.93960	0.0051*	+0.53%	0.8766	0.86731–0.88608	0.0052*	+0.98%
D_1×1	0.9329	0.92852–0.93642	0.1278	+0.38%	0.8742	0.86661–0.88014	0.1250	+0.70%
EB_1×1	0.9333	0.92767–0.93850	0.1409	+0.42%	0.8750	0.86515–0.88413	0.1396	+0.79%
BD_1×1	0.9329	0.92677–0.93900	0.0586	+0.38%	0.8742	0.86357–0.88501	0.0614	+0.70%
ED_1×1	0.9339	0.92910–0.93809	0.1012	+0.48%	0.8761	0.86762–0.88340	0.1027	+0.92%
EBD_1×1	0.9326	0.92700–0.93767	0.1298	+0.34%	0.8737	0.86397–0.88267	0.1335	+0.65%
B_1×1_D_3×3	0.9346	0.92928–0.93694	0.0520	+0.56%	0.8772	0.86840–0.88377	0.0501	+1.05%
E_1×1_B_1×1_D_3×3	0.9337	0.93158–0.93542	0.2602	+0.46%	0.8755	0.87175–0.87866	0.2603	+0.85%
E_1×1_B_3×3_D_3×3	0.9339	0.92954–0.93830	0.1160	+0.48%	0.8760	0.86795–0.88136	0.1172	+0.91%


Mean F1 scores and Intersection over Union (IoU) were calculated from 5-fold cross-validation. 95% confidence intervals were estimated by bootstrapping (10,000 iterations), and p-values were obtained using paired t-tests versus the U-Net 9 baseline. An asterisk (*) indicates statistically significant improvement at p < 0.05.

Table 2 presents the segmentation performance of U-Net variants modified with MACB_3×3 and MACB_1×1 modules. Models were evaluated using 5-fold CV, and the mean F1 score and IoU were computed alongside 95% Confidence Intervals (CIs) estimated via 10,000 bootstrap resampling iterations. The table also reports relative improvements over the baseline U-Net 9 and associated p-values derived from paired t-tests on fold-level scores. To facilitate visual comparison of the statistical spread and overlap between models, error bar plots depicting the mean F1 scores and IoU with 95% CIs are provided in Figures S1 and S2.
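The Δ columns appear to be relative rather than absolute changes. Assuming that definition, the reported values can be reproduced from the mean scores, as the short sketch below shows for BD_3×3 against U-Net 9; the function name is illustrative.

```python
def relative_gain_percent(score: float, baseline: float) -> float:
    """Relative improvement over the baseline, in percent."""
    return 100.0 * (score - baseline) / baseline


# Mean scores of BD_3x3 and U-Net 9 as reported in the comparison table.
print(f"dF1  = {relative_gain_percent(0.9360, 0.9294):+.2f}%")
print(f"dIoU = {relative_gain_percent(0.8798, 0.8681):+.2f}%")
```

Read this way, all reported gains are fractions of a percent to slightly above one percent, which is why the fold-level significance tests matter for interpreting them.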

For clarity, models modified with MACB_3×3 are abbreviated as E_3×3, B_3×3, and D_3×3, corresponding to MACB_3×3 integration in the encoder, bridge, and decoder paths, respectively. Similarly, the MACB_1×1 models are labeled as E_1×1, B_1×1, and D_1×1. Combined configurations follow the same naming convention, such as BD_3×3 for a model with MACB_3×3 in both the bridge and decoder paths.

The baseline U-Net 9 achieved an F1 score of 0.9294 (95% CI: 0.92328–0.93555) and an IoU of 0.8681 (95% CI: 0.85750–0.87893). All MACB-modified variants demonstrated performance improvements over this baseline, although the degree of improvement and statistical significance varied by the location and type of ASPP module used.

Among the MACB_3×3 models, decoder-modified variants (D_3×3, BD_3×3, ED_3×3, EBD_3×3) achieved the most substantial gains (+0.40% to +0.71% F1). Notably, D_3×3 reached an F1 of 0.9331 (+0.40%, p = 0.3354) and an IoU of 0.8745 (+0.74%, p = 0.3318), while BD_3×3 delivered the best overall performance at 0.9360 F1 and 0.8798 IoU (+0.71%, +1.35%); however, the improvements were not statistically significant (p = 0.2553 for F1; p = 0.2551 for IoU). In contrast, E_3×3 showed negligible improvement (+0.02% F1, p = 0.9924), suggesting that applying ASPP with 3 × 3 convolution in the encoder path does not provide meaningful benefits for PEO coating segmentation.

This trend suggests that decoder enhancement plays a dominant role in segmentation performance, likely due to its influence in reconstructing detailed spatial features. The bridge-modified B_3×3 model also showed a moderate improvement (+0.31% F1, +0.59% IoU), but again without statistical significance (p = 0.6425, p = 0.6455), highlighting the bridge path’s importance in aggregating global semantic features even if the gains may vary across folds.

Following MACB_3×3, MACB_1×1 modules, similar to the design of the DeepLab series, were introduced to explore lightweight alternatives. Despite their reduced complexity, MACB_1×1 variants still outperformed the baseline. Among them, B_1×1 achieved the best single-path performance (0.9343 F1, 0.8766 IoU) and was the only model to exhibit statistically significant improvement over U-Net 9 (+0.53%, p = 0.0051 for F1; +0.98%, p = 0.0052 for IoU), reinforcing the critical role of the bridge path. This is likely due to its central position in the U-Net, where global semantic information can be most effectively aggregated. Additionally, the MACB_1×1 may serve as a substitute for further down-sampling operations, helping to preserve spatial resolution typically lost during the down- and up-sampling processes. D_1×1 also performed well (0.9329 F1, +0.38%, p = 0.1278), supporting the importance of decoder-side enhancement. E_1×1 achieved slightly better performance than E_3×3 (+0.36% vs +0.02% F1), suggesting that even lightweight 1 × 1 filters can facilitate effective multiscale feature propagation in the encoder without the added complexity or potential early-stage feature fusion introduced by 3 × 3 convolutions.

Interestingly, B_1×1 outperformed all other MACB_1×1 variants, even the bridge-based multipath-modified models with additional encoder or decoder modifications. This suggests that oversimplifying the network using multiple 1 × 1 filters across paths may introduce diminishing returns.

These findings motivated the development of hybrid configurations that combine MACB_1×1 and MACB_3×3 modules. Three hybrids were evaluated: B_1×1_D_3×3, combining MACB_1×1 in the bridge and MACB_3×3 in the decoder; E_1×1_B_1×1_D_3×3, adding an encoder MACB_1×1 to the B_1×1_D_3×3 structure; and E_1×1_B_3×3_D_3×3, pairing encoder MACB_1×1 with the best-performing model, BD_3×3.

Among them, B_1×1_D_3×3 (+0.56% F1) outperformed B_1×1 (+0.53% F1) alone but did not exceed the BD_3×3 (+0.71% F1) model. The corresponding F1 and IoU improvements narrowly missed statistical significance (p = 0.0520, p = 0.0501), suggesting a trend but not strong enough evidence to confirm a meaningful difference. Both E_1×1_B_1×1_D_3×3 (+0.46% F1) and E_1×1_B_3×3_D_3×3 (+0.48% F1) showed slightly reduced performance compared to B_1×1_D_3×3 and BD_3×3, indicating that adding encoder-side ASPP (E_1×1) may introduce unnecessary complexity without proportional benefits.

In summary, BD_3×3 remains the top-performing configuration in terms of raw scores, combining robust contextual representation from the bridge with fine detail recovery from the decoder. However, only B_1×1 exhibited statistically significant improvements, emphasizing the importance of evaluating both absolute metrics and statistical confidence. This analysis underscores the critical role of ASPP module placement and filter complexity in designing segmentation networks for morphologically diverse structures like PEO pores.

In addition to segmentation performance, the computational cost of the proposed models was evaluated to assess their efficiency, as summarized in Table 3. The training time per epoch was recorded during model development, while the average total training time represents the cumulative duration across all five folds of cross-validation, incorporating early stopping. The inference time was measured as the average time required to predict a single test image. To facilitate comparison, the relative improvement in F1 score over the baseline U-Net 9 (ΔF1 vs U-Net 9) is also included.

Table 3. Computational Cost and F1 Score Improvement (ΔF1) of U-Net Variants with ASPP-Based Modifications.

Model	ΔF1 vs U-Net 9	Training time per epoch (s)	Average total training time (min)	Inference time (ms)
U-Net 9	-	96	43.5	25.07
E_3×3	+0.02%	104	47.5	31.02
B_3×3	+0.31%	97	66.0	24.49
D_3×3	+0.40%	135	80.6	46.32
EB_3×3	+0.31%	104	39.2	31.15
BD_3×3	+0.71%	135	80.6	46.32
ED_3×3	+0.58%	142	86.6	52.67
EBD_3×3	+0.48%	141	55.0	52.67
E_1×1	+0.36%	97	56.3	31.36
B_1×1	+0.53%	95	50.7	24.97
D_1×1	+0.38%	126	56.7	46.49
EB_1×1	+0.42%	96	56.3	31.20
BD_1×1	+0.38%	124	40.9	46.59
ED_1×1	+0.48%	128	69.5	52.22
EBD_1×1	+0.34%	127	60.5	52.43
B_1×1_D_3×3	+0.56%	134	47.3	46.58
E_1×1_B_1×1_D_3×3	+0.46%	134	99.2	52.60
E_1×1_B_3×3_D_3×3	+0.48%	135	58.5	53.80


Each model was evaluated by recording the training time per epoch (in seconds), average total training time (in minutes), and inference time (in milliseconds).

Training time per epoch and inference time primarily reflect the computational complexity of each model. Bridge-modified variants such as B_3×3 (97 s/epoch; 24.49 ms inference) and B_1×1 (95 s/epoch; 24.97 ms inference) demonstrated efficiency comparable to the baseline U-Net 9 (96 s/epoch; 25.07 ms) because only a single block is modified. Encoder modifications introduced moderate overhead (E_3×3: 104 s; 31.02 ms; E_1×1: 97 s; 31.36 ms), while decoder-modified models exhibited the most significant increases in both training and inference time (D_3×3: 135 s; 46.32 ms; D_1×1: 126 s; 46.49 ms) due to the increased number of filters propagated through skip connections. Multipath models combining encoder, bridge, and decoder enhancements (e.g., E_1×1_B_1×1_D_3×3: 134 s; 52.60 ms) incurred further increases in complexity.

In contrast, the average total training time reflects both the model complexity and convergence behavior during training. This metric accounts for the number of epochs required to reach optimal performance under early stopping. For example, although B_3×3 required longer total training time (66.0 min) compared to EB_3×3 (39.2 min), both achieved the same ΔF1 improvement (+0.31%), suggesting faster convergence for EB_3×3, likely due to enhanced feature extraction from the added encoder modification. BD_3×3, the best-performing model in terms of segmentation F1 score (+0.71%), required 80.6 min, indicating a trade-off between the performance gain and training efficiency. Notably, B_1×1 achieved a substantial improvement (+0.53%) with a relatively low total training and inference time (50.7 min, 24.97 ms), making it a practical and efficient alternative.

In summary, BD_3×3 remains the optimal choice for maximizing segmentation performance, while B_1×1 offers a more practical balance between accuracy and efficiency, making it well-suited for real-time or resource-constrained applications.

Visual Inspections

Figure presents a visual comparison of segmentation results from the baseline U-Net 9; the encoder-, bridge-, and decoder-modified variants; and the best-performing model (BD_3×3), based on six representative SEM images. Predicted masks are color-coded as follows: true positives (blue), false positives (red), false negatives (green), and true negatives (black). While all models demonstrate generally acceptable segmentation performance, subtle differences can be observed. The baseline U-Net 9 already produces robust predictions, and the MACB-modified variants exhibit slight improvements in capturing complex pore areas. These observations prompted a more detailed quantitative analysis focused on fine pores (regions with an area below 25 pixels) to objectively assess each model’s ability to detect subtle features.

Comparison of U-Net 9, single-path modified U-Net variants, and BD_3×3 for pore segmentation in PEO Coatings.
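The color coding described above can be reproduced with a simple per-pixel comparison of a predicted mask against its ground truth mask. The sketch below is a dependency-free illustration, not the authors' plotting code; the names `classify_pixel` and `error_overlay` are hypothetical, and masks are assumed to be nested lists of 0/1 values of equal size.

```python
# RGB colors following the scheme described for the figure.
COLORS = {
    "TP": (0, 0, 255),  # blue: pore predicted and present
    "FP": (255, 0, 0),  # red: pore predicted but absent
    "FN": (0, 255, 0),  # green: pore present but missed
    "TN": (0, 0, 0),    # black: background in both masks
}


def classify_pixel(pred, truth):
    """Return the confusion-matrix class of one pixel."""
    if pred and truth:
        return "TP"
    if pred:
        return "FP"
    if truth:
        return "FN"
    return "TN"


def error_overlay(pred_mask, true_mask):
    """Build an RGB overlay (nested lists of tuples) from two binary masks."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same number of rows")
    return [
        [COLORS[classify_pixel(p, t)] for p, t in zip(pred_row, true_row)]
        for pred_row, true_row in zip(pred_mask, true_mask)
    ]


# Tiny example: one hit, one miss, one false alarm, three background pixels.
true_mask = [[1, 1, 0], [0, 0, 0]]
pred_mask = [[1, 0, 1], [0, 0, 0]]
overlay = error_overlay(pred_mask, true_mask)
```

In practice an array library would do the same comparison with boolean indexing; the nested-list form is used here only to keep the example self-contained.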

As shown in Table 4, all U-Net variants substantially oversegment small pores (area <25 pixels) compared to the ground truth. The ground truth masks contain 1731 small pores, while model predictions range from 3631 (B_1×1) to 4593 (D_1×1), that is, roughly 2.1 to 2.7 times the actual count. Notably, decoder-modified models (D_3×3 and D_1×1) exhibit the highest degree of oversegmentation, likely because their larger skip-connection filters amplify noise and misinterpret fine textures as pore boundaries. In contrast, the minimal enhancement in B_1×1 leads to more conservative segmentation, reducing the misclassification of image noise as pore features. Bridge- and encoder-modified variants tend to produce slightly fewer small pores than the baseline U-Net 9 (4053), suggesting that the placement of architectural modifications influences the model’s sensitivity to fine-scale structures.

Table 4. Comparison of Small Pore Counts (Area <25 Pixels) Identified in Ground Truth Masks and Predicted by Various U-Net Variants

Model	Pore count (<25 pixels)
Ground truth	1731
B_1×1	3631
E_3×3	3813
E_1×1	3860
BD_3×3	3953
B_3×3	3984
U-Net 9	4053
D_3×3	4470
D_1×1	4593


A total of 40 SEM images were analyzed for each model. The models are listed in increasing order of pore count.
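A small-pore count of this kind can be obtained by labeling connected foreground regions in a binary mask and counting those whose area falls below the 25-pixel threshold. The sketch below is one possible way to do this and is not taken from the authors' code. The use of 4-connectivity is an assumption, since the text does not state which connectivity was used, and the function names `pore_areas` and `count_small_pores` are illustrative.

```python
from collections import deque


def pore_areas(mask):
    """Return the pixel area of every 4-connected foreground region.

    `mask` is a 2D nested list of 0/1 (or False/True) values.
    4-connectivity is an assumption; the source does not specify it.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited pore pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return areas


def count_small_pores(mask, max_area=25):
    """Count regions whose area is strictly below `max_area` pixels."""
    return sum(1 for area in pore_areas(mask) if area < max_area)


# Example: a 2x2 pore (area 4) and a 6x6 pore (area 36) on a 10x10 mask.
example = [[0] * 10 for _ in range(10)]
for r in range(0, 2):
    for c in range(0, 2):
        example[r][c] = 1
for r in range(3, 9):
    for c in range(3, 9):
        example[r][c] = 1

example_areas = sorted(pore_areas(example))
example_count = count_small_pores(example)
```

Choosing 8-connectivity instead would merge diagonally touching regions and could lower the count, so the connectivity should be kept identical when comparing predictions against ground truth.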

Further comparisons were conducted using masks generated by B_1×1, BD_3×3, new students (manual annotation), and Otsu’s thresholding method, as shown in the figure below. On two test images, the masks segmented by B_1×1 and BD_3×3 were markedly better than those produced by manual annotation and by Otsu’s method. The manual annotation errors may stem from the new students’ lack of familiarity with the PEO coating structure. Otsu’s method, which relies only on pixel intensity, incorrectly categorized shallow and depressed areas as well as minor dark spots as background, leading to its underperformance.

Comparison of B_1×1, BD_3×3, human annotation, and Otsu’s method for pore segmentation in PEO coatings.
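For readers unfamiliar with the intensity-only baseline, Otsu’s method picks the single gray-level threshold that maximizes the between-class variance of the image histogram. The following is a minimal, dependency-free sketch of that idea and does not reproduce the exact procedure used in the study. It assumes 8-bit gray levels supplied as a flat list of integers, and the function name `otsu_threshold` is illustrative.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximizing between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    `pixels` is a flat iterable of integers in [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # intensity sum of the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break               # upper class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


# Example: a bimodal "image" with dark pixels at 10 and bright pixels at 200.
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
dark_mask = [p <= t for p in pixels]
```

Because the decision depends on intensity alone, any region whose gray level falls on the wrong side of the threshold is misclassified regardless of its shape or context, which is consistent with the failure mode described above.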

The improved segmentation performance provided by the MACB U-Nets not only enhances the reliability of pore analysis but also contributes to a deeper understanding of the relationship between the pore morphology and the mechanical properties of PEO coatings. This method offers a robust tool for future research and industrial applications, where precise pore analysis is critical.

Conclusion

This study systematically evaluated the integration of Atrous Spatial Pyramid Pooling (ASPP) into U-Net architectures for the segmentation of pores in PEO coatings. By modifying different parts of the U-Net architecture, namely, the encoder, bridge, and decoder, using multiscale atrous convolutional blocks (MACBs), we identified optimal strategies for enhancing segmentation performance and enabling accurate pore morphology quantification.

Our findings highlight that modifications to the decoder path, particularly with MACB_3×3, yield the greatest performance gains, reflecting the decoder’s critical role in reconstructing fine spatial details. The BD_3×3 variant achieved the highest F1 improvement (+0.71%) and strong visual performance, while the B_1×1 variant, which uses pointwise convolution in the bridge, was the only model with a statistically significant improvement over the baseline, offering a compelling trade-off between accuracy and computational cost.

The small pore analysis revealed that all U-Net variants substantially oversegmented small pores (area <25 pixels), with decoder-modified models (D_1×1, D_3×3) showing the highest oversegmentation rates, likely due to the noise amplification caused by integrating ASPP in the decoder path. In contrast, the B_1×1 model produced more conservative and morphology-consistent results. Additionally, the ASPP-modified U-Net variants outperformed traditional segmentation methods, such as Otsu’s thresholding and human annotations, in terms of both accuracy and reliability.

By enabling high-fidelity segmentation of surface porosity, this work contributes a robust computational tool for the quantitative analysis of interfacial microstructures, supporting efforts to correlate morphology with macroscopic material properties. The framework established here can be extended to other materials with complex surface features, and future research may explore its integration with attention mechanisms or regularization strategies to further enhance generalization and support data-driven materials optimization.

Supplementary Material

la5c01673_si_001.pdf (476.4 KB, PDF)

Acknowledgments

This work is financially supported by the National Science and Technology Council, Taiwan (NSTC 111-2222-E-027-011 and NSTC 113-2221-E-027-030), and the National Taipei University of Technology (NTUT-Gdańsk Tech-113-03 and NTUT-IJRP-114-04).

Glossary

Abbreviations

BCB: Basic Convolutional Block
MACB: Multiscale Atrous Convolutional Block
E_3×3: U-Net variant with MACB_3×3 applied in the encoder
B_3×3: U-Net variant with MACB_3×3 applied in the bridge
D_3×3: U-Net variant with MACB_3×3 applied in the decoder
E_1×1: U-Net variant with MACB_1×1 applied in the encoder
B_1×1: U-Net variant with MACB_1×1 applied in the bridge
D_1×1: U-Net variant with MACB_1×1 applied in the decoder
EB_3×3, BD_3×3, ED_3×3, EBD_3×3: U-Net variants with MACB_3×3 applied in multiple components (e.g., BD_3×3 indicates MACB_3×3 in both the bridge and decoder)
EB_1×1, BD_1×1, ED_1×1, EBD_1×1: U-Net variants with MACB_1×1 applied in multiple components following the same notation

The full training code for Basic U-Net 9 and the model architectures for all MACB U-Net variants are available at a private GitHub repository. Access will be granted upon reasonable request and the repository will be made public upon publication. GitHub repository: https://github.com/Chi-Wei-Chu/PEO-Coating-Segmentation.

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.langmuir.5c01673.

Tuning result tables; F1 curves; unpublished works (PDF)

C.-W.C.: Visualization, Investigation, Formal Analysis, Writing - Original Draft, Methodology, and Conceptualization. C.-M.L.: Data Curation, Investigation, Formal Analysis, and Validation. W.K.Y.: Writing - Original Draft, Review and Editing, Validation, Supervision, Project Administration, Investigation, Funding Acquisition, and Conceptualization.

The authors declare no competing financial interest.

