Journal of Imaging Informatics in Medicine
2025 Feb 25;38(6):3614–3622. doi: 10.1007/s10278-025-01436-3

Ultrasound Thyroid Nodule Segmentation Algorithm Based on DeepLabV3+ with EfficientNet

Nan Xiao 1, Demin Kong 1, Junfeng Wang 1
PMCID: PMC12701152  PMID: 40000546

Abstract

Ultrasound is widely used to monitor and diagnose thyroid nodules, but accurately segmenting these nodules in ultrasound images remains a challenge due to the presence of noise and artifacts, which often blur nodule boundaries. While several deep learning algorithms have been developed for this task, their performance is frequently suboptimal. In this study, we introduce the use of EfficientNet-B7 as the backbone for the DeepLabV3+ architecture in thyroid nodule segmentation, marking its first application in this area. We evaluated the proposed method using a dataset from the First Affiliated Hospital of Zhengzhou University, along with two public datasets. The results demonstrate high performance, with a pixel accuracy (PA) of 97.67%, a Dice similarity coefficient of 0.8839, and an Intersection over Union (IoU) of 79.69%. These outcomes outperform most traditional segmentation networks.

Keywords: Image segmentation, Ultrasound thyroid nodule, DeepLabV3+, EfficientNet

Introduction

The thyroid is a butterfly-shaped gland located in the lower front of the neck, its two lobes lying on either side of the trachea and connected by an isthmus. It is the largest endocrine gland in the human body. As of 2020, approximately 586,000 cases of thyroid cancer were diagnosed worldwide, making it the ninth most common cancer [1]. In recent years, the incidence of thyroid nodules has steadily increased, driven in part by the stresses of modern life.

Common diagnostic methods for thyroid nodules include ultrasonography, CT scans, fine needle biopsies, and pathological examinations. These imaging techniques allow doctors to quickly evaluate a patient’s condition and determine appropriate treatment, providing timely and effective disease management. While fine needle biopsy is regarded as the gold standard for diagnosing thyroid nodules [2], it is invasive, poses a risk of trauma to the thyroid, and involves additional costs. Furthermore, the diagnostic process, which includes both biopsy and pathological analysis, can be time-consuming. CT scans, which expose patients to ionizing radiation, also present health risks and are costly for patients.

Ultrasonography, in contrast, employs high-frequency sound waves to generate images of tissues or organs in a non-invasive manner. This technique utilizes the physical properties of ultrasound, enabling the examination of tissues from multiple angles and supporting accurate medical diagnosis. It is a widely used imaging method due to its cost-effectiveness, ease of use, and safety, posing no harm to the patient. While needle biopsy is still required for diagnosing certain conditions, ultrasound can effectively diagnose the majority of thyroid cases, as 60 to 90% of thyroid nodules are low-risk cystic nodules [3]. As a result, ultrasonography remains the primary tool for diagnosing thyroid disorders.

Despite its widespread use, ultrasound imaging is often affected by echo interference and refraction, resulting in significant artifacts and noise that blur the boundaries of thyroid nodules. These challenges can alter the appearance or intensity of the nodules, and in some cases, unrelated regions may produce abnormal echoes, making nodule detection more difficult. Nevertheless, ultrasonography remains the preferred method for diagnosing thyroid nodules due to its speed, convenience, and non-invasive nature. Ultrasound evaluations assess the risk of malignancy by analyzing imaging characteristics. When a high risk of malignancy is detected, patients are typically advised to undergo a needle biopsy for further confirmation. Therefore, accurate identification of thyroid nodules in ultrasound images is essential for guiding diagnostic and treatment decisions.

Numerous semantic segmentation network architectures have been developed, including fully convolutional networks (FCNs) [4], the U-Net family [5–8], the DeepLab series [9–11], pyramid scene parsing networks (PSPNet) [12], and multi-scale attention networks (MANet) [13], among others. FCN relies solely on convolutional layers, avoiding downsampling and focusing on pixel-level label detection. However, performing convolutions at the original image resolution requires significant computational resources. U-Net introduces skip connections between the encoder and decoder, enabling the integration of multi-level features. Its successor, UNet++, refines this process by reducing scale differences in the fused feature maps through feature superposition, though this increases the number of parameters and memory consumption. DeepLabV3+ enhances convolution operations in the encoder to expand the receptive field and uses atrous spatial pyramid pooling to capture multi-scale information. PSPNet, an extension of FCN, incorporates a global pyramid pooling module to blend local and global information for improved pixel-level prediction. MANet, combining ResNet as the encoder and U-Net as the decoder, adds a dual attention mechanism to improve accuracy.

The performance of a segmentation network is determined by both the convolutional neural network (CNN) used for feature extraction and the design of the encoder-decoder architecture. The encoder extracts features using various CNN methods, while the decoder restores the image size, allowing for pixel-level classification of the input.

Current deep learning algorithms provide several approaches for thyroid nodule segmentation, and numerous studies have addressed the challenges of thyroid nodule segmentation in ultrasound images. While significant progress has been made, several limitations persist: Ozcan [14] proposed Enhanced-TransUNet, a model leveraging transformers to capture global contextual information, addressing the inherent locality limitations of CNNs. It also incorporated an information bottleneck to compress redundant features and reduce overfitting. The method mitigated the disappearance of low-level features crucial for delineating thyroid boundaries during feature encoding. However, the model’s generalization capability on complex clinical data remains to be validated. The boundary attention transformer network (BTNet) [15] combined CNNs and transformers to integrate long- and short-range features, improved boundary attention mechanisms to focus on boundary learning, and incorporated deep supervision for better feature mixing across scales. Despite these advancements, the segmentation performance is limited by the quality of ultrasound images, as blurred images are challenging for complex models. HEAT-Net [16], a hybrid enhanced attention transformer network with a Unet-like structure, combined the strengths of CNNs and transformers to improve segmentation performance. However, challenges remain with images exhibiting blurred edges and acoustic shadows. Moreover, overlapping results may occur when handling multi-lesion images. Manh [17] tackled segmentation challenges such as blurred boundaries, low contrast, and speckle noise in ultrasound images. The framework he proposed also overcame the limitations of existing CNN-based methods in capturing global contextual information. However, lesions with strong echo shadows tend to suffer from over-segmentation or under-segmentation.
The MDenseNet architecture [18] addressed the time-consuming and error-prone nature of manual annotation of low-quality ultrasound images. It provided accurate automatic segmentation for cases with speckle noise, uneven appearance, and blurred boundaries. However, the model struggled to segment nodules with internal heterogeneity and similar appearances to the background. Sun [19] constructed the TNSNet model using DeepLabV3+ as the backbone. It introduced soft shape supervision blocks between the region and shape paths and used cross-path attention mechanisms to enhance boundary recognition and constraints. While the method improved segmentation performance, its applicability to complex state-of-the-art models remains limited for blurred images. DMU-Net [20], focusing on malignant nodule segmentation, employed dual-route mirroring U-shaped networks to extract contextual and edge details. However, in images with subtle lesion features, the model may learn irrelevant features, resulting in decreased performance. The proposed FDE-Net [21] included a cross-scale attention module to address the insensitivity of networks to target scale variations and a frequency-domain enhancement module to enhance image contrast by integrating texture information across different frequency bands. Despite these strengths, the network’s design is limited to single-nodule segmentation. FCG-Net [22] tackled challenges in automatic nodule segmentation caused by poor contrast and excessive speckle noise. It introduced Ghost bottleneck structures in both the encoder and decoder, utilized full-scale skip connections for multi-resolution feature extraction, and incorporated Squeeze and Excitation modules to reduce computational complexity. However, the lack of external datasets limits the validation of its generalization capability.

In summary, the diverse appearances and subtle characteristics of thyroid nodules in ultrasound images pose significant challenges for achieving accurate segmentation, and the complexity of the parameter calculations in these models further complicates the task.

To overcome these challenges, this study introduces a thyroid nodule segmentation network that utilizes EfficientNet-B7 as the backbone of the DeepLabV3+ architecture. EfficientNet-B7 is a highly efficient neural network architecture that achieves superior accuracy with relatively fewer parameters. This efficiency is attributed to its compound scaling method, which balances network depth, width, and resolution. When combined with DeepLabV3+, a segmentation framework leveraging atrous convolution to expand the receptive field without increasing parameter count, the model excels in capturing contextual information, a crucial capability for handling the morphological variability and blurred boundaries common in thyroid nodule ultrasound images.

The structure of the paper is organized as follows: following the introduction, we outline the experimental methods, including data processing, an overview of the network architecture, and the model training process. The third section presents an analysis of the experimental results, with the conclusion provided in the fourth section.

The contributions of this paper are as follows:

  1. Data preprocessing and augmentation: We employed a threshold-based method to preprocess the original data, effectively removing irrelevant areas such as black edges and text descriptions from the images to reduce redundant features. To enhance the model’s generalization ability, we applied data augmentation techniques, including random rotations, flips, scaling, translations, and the addition of noise, to increase the diversity of the training data.

  2. Network architecture augmentation: We replaced the backbone of the DeepLabV3+ architecture with the high-performance convolutional neural network EfficientNet-B7, optimizing it for the segmentation of thyroid nodule ultrasound images. Experimental results demonstrated an improvement in the performance of the segmentation network.

  3. Optimized training process: To achieve optimal segmentation, we employed Dice loss as the loss function during training and applied cosine annealing decay to adjust the learning rate, ensuring improved model performance.

Methods

Data Sets and Data Processing

In this study, 1046 ultrasound thyroid nodule images provided by the First Affiliated Hospital of Zhengzhou University were collected; each image corresponds to one patient. The thyroid nodule region in each image was manually delineated by two ultrasonographers with 10 years of clinical experience, and these annotations serve as the ground truth for our research. The goal of our model is to produce segmentation results that closely match this ground truth. The raw ultrasound images, captured using various ultrasound devices, differ significantly in shape and size and are stored in DICOM format. To streamline processing, we converted all DICOM images to PNG format using the pydicom library. Since different ultrasound devices produce images with varying resolutions, the images were uniformly resized for consistency. Labeling was conducted using LabelMe software, where each labeled ultrasound image corresponds to a mask. In the mask, the region of interest (the nodule) is represented in white (non-zero pixel values), while the background is black (pixel value 0). Additionally, two publicly available ultrasound thyroid nodule datasets, DDTI [23] and TNCD [24], were incorporated into the study. These datasets contain 637 and 3493 ultrasound images, respectively. The data set is shown in Table 1.

Table 1.

Comparisons of our proposed segmentation model against the classical models on different datasets

Dataset Model DSC PA IoU Recall Precision
OURS U-Net 0.7175 0.9286 0.5737 0.8602 0.6719
UNet++ 0.7315 0.9312 0.5887 0.8904 0.6630
DeepLabV3+ 0.7469 0.9389 0.6103 0.8559 0.7122
DeepLabV3+/EfficientNet-B7 (ours) 0.8847 0.9726 0.7804 0.8813 0.8847
DDTI U-Net 0.7217 0.9294 0.5827 0.8984 0.6516
UNet++ 0.7469 0.9389 0.6103 0.8559 0.7122
DeepLabV3+ 0.7435 0.9363 0.6072 0.8692 0.6994
DeepLabV3+/EfficientNet-B7 (ours) 0.8495 0.9642 0.7479 0.8847 0.8517
TNCD U-Net 0.7082 0.9237 0.5628 0.8982 0.6276
UNet++ 0.7278 0.9308 0.5860 0.8905 0.6610
DeepLabV3+ 0.7362 0.9342 0.5952 0.8843 0.6813
DeepLabV3+/EfficientNet-B7 (ours) 0.8823 0.9737 0.7958 0.9064 0.8796

The best result is shown in bold

For every metric, higher values indicate better performance

We applied uniform processing to all datasets. Specifically, we used a thresholding method based on gray values to remove irrelevant areas from the images, such as black borders and text annotations, which could introduce redundant features. The region of interest (ROI) and the surrounding area were preserved in both the processed original images and the corresponding mask images. To prepare the images for network input, we resized both the ultrasound images and the mask images to 256 × 256 pixels. Bicubic interpolation was used for resizing the ultrasound images, while nearest neighbor interpolation was applied to the mask images. This ensured that both the images and masks were appropriately scaled for input into the network.
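The gray-value thresholding step can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: the function name and the threshold value of 10 are assumptions, and the subsequent bicubic/nearest-neighbor resize would typically be done with a library such as OpenCV or Pillow.

```python
import numpy as np

def crop_to_content(image: np.ndarray, mask: np.ndarray, thresh: int = 10):
    """Crop away near-black borders and annotation margins by thresholding
    gray values; the same bounding box is applied to the image and its mask
    so the pair stays aligned. `thresh` is illustrative, not the paper's value."""
    ys, xs = np.where(image > thresh)          # pixels carrying echo signal
    y0, y1 = ys.min(), ys.max() + 1            # tight bounding box of content
    x0, x1 = xs.min(), xs.max() + 1
    return image[y0:y1, x0:x1], mask[y0:y1, x0:x1]

# Synthetic demo: a 100x100 frame with scan content in a 60x60 region.
img = np.zeros((100, 100), dtype=np.uint8)
img[20:80, 30:90] = 50
msk = np.zeros_like(img)
msk[40:60, 50:70] = 255
crop_img, crop_msk = crop_to_content(img, msk)
```

After cropping, both arrays would be resized to 256 × 256 as described above.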

In addition, to enhance the generalization capability of the model, we applied data augmentation techniques during training. These included random rotations, flipping, scaling, translations, and the addition of noise, thereby increasing the diversity of the training data. Furthermore, we partitioned the datasets into training and testing subsets. Specifically, 80% of the two publicly available datasets were used as the training dataset, while the remaining 20% from each dataset, along with a proprietary clinical dataset from this study, were used for validation. Figure 1 presents examples of thyroid nodule ultrasound images, including some after data augmentation. Figure 2 shows the prediction results generated by the segmentation architecture used in this study, while Fig. 3 displays the corresponding ground truth mask images.
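A key detail of segmentation augmentation is that geometric transforms must be applied identically to the image and its mask, while intensity noise is applied to the image only. The sketch below illustrates this under assumptions: the probabilities, noise scale, and function name are illustrative, not the authors' settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image: np.ndarray, mask: np.ndarray):
    """One random flip, one random 90-degree rotation, plus Gaussian noise.
    Geometric ops hit image and mask alike; noise hits the image only."""
    if rng.random() < 0.5:                          # random horizontal flip
        image, mask = np.fliplr(image), np.fliplr(mask)
    k = int(rng.integers(0, 4))                     # rotate by k * 90 degrees
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    noisy = image + rng.normal(0.0, 5.0, image.shape)  # additive noise
    return np.clip(noisy, 0, 255), mask

img = np.full((32, 32), 100.0)
msk = np.zeros((32, 32))
aug_img, aug_msk = augment(img, msk)
```

Rotations, scaling, and translations by arbitrary amounts would in practice use a library interpolation routine, again with nearest-neighbor mode for the mask.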

Fig. 1.

Fig. 1

Ultrasound images

Fig. 2.

Fig. 2

Prediction results

Fig. 3.

Fig. 3

Ground truth

Model

EfficientNet-B7

EfficientNet’s architecture is optimized through neural architecture search (NAS), which identifies the best network configuration within a large search space to improve performance. Unlike traditional convolutional neural networks (CNNs) such as ResNet and Inception, EfficientNet achieves high accuracy while significantly reducing the number of parameters and computational requirements. It has demonstrated strong performance across various computer vision tasks, including image classification, object detection, and semantic segmentation. Notably, EfficientNet achieved a Top-1 accuracy of 84.4% on the ImageNet dataset [25], making it a popular choice in fields like autonomous driving [26] and medical image analysis [27, 28].

EfficientNet-B7 is one of the largest and most powerful models in the EfficientNet family. This series introduces a composite scaling method, which differs from traditional approaches that scale only one dimension of the network. EfficientNet scales three dimensions simultaneously: width (w), depth (d), and resolution (r). The compound scaling method uses a compound factor (ϕ) to uniformly scale these dimensions, ensuring a balance between different parts of the network and achieving better accuracy and efficiency. This approach is described by the following formula Eq. 1:

d = α^ϕ, w = β^ϕ, r = γ^ϕ,  s.t. α · β² · γ² ≈ 2,  α ≥ 1, β ≥ 1, γ ≥ 1 (1)

where w scales the channel width of the feature maps, d scales the depth, and r scales the resolution; s.t. denotes the constraint; α, β, and γ are constants, and the degree of scaling can be controlled by setting ϕ. The three parameters α, β, and γ were searched using the neural architecture search (NAS) technique on the baseline network EfficientNet-B0 to obtain the best parameter values; those values were then fixed, and different ϕ were used to obtain EfficientNet-B1 through EfficientNet-B7. The network structure is divided into 9 stages: the first stage is a convolutional layer with a 3 × 3 kernel; stages 2 to 8 are composed of repeated stacks of the mobile inverted bottleneck convolution (MBConv) structure; and stage 9 consists of a convolutional layer with a 1 × 1 kernel, an average pooling layer, and a fully connected layer. The expansion factors for stage 2 and stages 3 to 8 are 1 and 6, respectively. The network structure is shown in Fig. 4.
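Compound scaling per Eq. 1 can be made concrete with a few lines of arithmetic. The α, β, γ values below are those reported for EfficientNet in the original paper (α = 1.2, β = 1.1, γ = 1.15); the function name is illustrative.

```python
# Constants found by NAS on the EfficientNet-B0 baseline (Tan & Le, 2019).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def scale_factors(phi: float):
    """Depth, width, and resolution multipliers for a given compound factor."""
    depth = ALPHA ** phi    # multiplier on the number of layers
    width = BETA ** phi     # multiplier on the number of channels
    res = GAMMA ** phi      # multiplier on the input resolution
    return depth, width, res
```

The constraint α · β² · γ² ≈ 2 means that each unit increase of ϕ roughly doubles the FLOPs, since depth scales cost linearly while width and resolution scale it quadratically.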

Fig. 4.

Fig. 4

EfficientNet’s network architecture. The dashed box on the right shows the internal structure of the MBConv module

DeepLabV3+

In recent years, the DeepLab series of networks have become widely utilized for semantic segmentation [2931]. The DeepLabV3+ architecture enhances DeepLabV3 by incorporating a decoder module that improves segmentation boundaries. Unlike U-Net, DeepLabV3+ introduces dilated convolutions, which expand the receptive field without losing information. This approach allows each convolutional output to capture a broader context, thereby enabling more effective feature extraction. The DeepLabV3+ model is structured into an encoder and a decoder. In the encoder, the input images pass through a backbone deep convolutional neural network (DCNN), specifically EfficientNet-B7 in this study. This process separates the features into high-level and low-level semantic components. The high-level semantic features are processed by the atrous spatial pyramid pooling (ASPP) module, which generates five feature maps. These maps are then combined and upsampled in the decoder module, following a 1 × 1 convolution, to match the resolution of the low-level semantic features. The low-level semantic features are also processed through a 1 × 1 convolution and then combined with the high-level features. After further convolution and bilinear upsampling, the final predicted image is generated, with a resolution matching the original input image.
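The receptive-field benefit of atrous convolution follows from a simple formula: a kernel of size k with dilation rate d covers the same span as a dense kernel of size k + (k − 1)(d − 1), with no extra parameters. A small sketch, using the ASPP dilation rates (6, 12, 18) reported for DeepLabV3+ at output stride 16 as illustrative values:

```python
def effective_kernel(k: int, dilation: int) -> int:
    """Effective (dense-equivalent) kernel size of a dilated convolution."""
    return k + (k - 1) * (dilation - 1)

# A 3x3 kernel at the ASPP rates spans ever-larger contexts
# while keeping only 9 weights per channel.
spans = {rate: effective_kernel(3, rate) for rate in (1, 6, 12, 18)}
```

This is why ASPP can mix local boundary detail with broad context, which matters for nodules whose edges fade into surrounding tissue.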

Based on the above section, EfficientNet offers higher accuracy and fewer parameters compared to many CNNs. However, with a large volume of extracted feature information, reducing the number of parameters can impact accuracy. To address this, DeepLabV3+ utilizes dilated convolutions and an expanded receptive field to ensure effective feature extraction. This study integrates these two efficient network architectures to segment ultrasound thyroid nodule images. The enhanced network architecture is illustrated in Fig. 5.

Fig. 5.

Fig. 5

The network architecture of this study. The part in the red box replaces the original backbone network DCNN

Training Details

We utilized two nodes, each equipped with 40 CPUs and 4 DCUs. During training, we conducted 240 epochs using the Adam optimizer with a batch size of 10 and a learning rate of 0.0001.

To train the network, we employed the Dice index as the loss function, as defined in formula Eq. 2. Here, x represents the ground truth values, and y denotes the predicted mask map from the model output.

Dice_loss(x, y) = 1 - 2 · |x ∩ y| / (|x| + |y|) (2)
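Eq. 2 translates directly into code. A minimal NumPy sketch is shown below; the small eps term is a common numerical-stability addition and an assumption on our part, not something stated in the paper.

```python
import numpy as np

def dice_loss(pred: np.ndarray, target: np.ndarray, eps: float = 1e-6) -> float:
    """Soft Dice loss per Eq. 2. For soft (probabilistic) masks, the
    elementwise product approximates the intersection |x ∩ y|."""
    inter = np.sum(pred * target)
    return 1.0 - 2.0 * inter / (np.sum(pred) + np.sum(target) + eps)

full_overlap = dice_loss(np.ones((4, 4)), np.ones((4, 4)))   # near 0
no_overlap = dice_loss(np.array([1.0, 0.0]), np.array([0.0, 1.0]))  # 1
```

Unlike pixel-wise cross-entropy, Dice loss is driven by region overlap, which keeps the small nodule foreground from being swamped by the much larger background.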

During training, in addition to the global optimal solution, there are multiple local optima that can trap the model. To avoid getting stuck in a local minimum, we can escape the current value by temporarily increasing the learning rate. To achieve this, we employ cosine annealing decay to adjust the learning rate. As training progresses, the learning rate follows a cosine function, rapidly decreasing and increasing in a cyclical manner. This approach helps the model eventually converge to a global optimum. The process is described by formula Eq. 3.

η_t = η_min^i + (1/2)(η_max^i - η_min^i)(1 + cos((T_cur / T_i) · π)) (3)

where i is the index of the learning cycle, i.e., the i-th learning process. T_cur is the number of epochs executed so far, and T_i is the total number of epochs in the cycle. When the T_i specified epochs have been executed, a warm restart is triggered; this does not mean starting from scratch, but resetting the learning rate. η_max^i and η_min^i represent the maximum and minimum values of the learning rate during the i-th learning process, which delineate the range of the learning-rate fluctuation.
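The schedule of Eq. 3 within a single warm-restart cycle can be sketched as follows; the function name and the example rate range are illustrative, though the base rate of 0.0001 matches the training setup above.

```python
import math

def cosine_annealing(t_cur: float, t_i: float,
                     eta_min: float, eta_max: float) -> float:
    """Learning rate per Eq. 3 within one warm-restart cycle:
    starts at eta_max (cos 0 = 1), ends at eta_min (cos pi = -1)."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))

# Illustrative cycle: 10 epochs, decaying from 1e-3 down to 1e-4,
# after which a warm restart would jump the rate back to eta_max.
lrs = [cosine_annealing(t, 10, 1e-4, 1e-3) for t in range(11)]
```

The jump back to η_max at each restart is the mechanism that lets the optimizer climb out of shallow local minima before settling again.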

Results

To evaluate the performance of the proposed segmentation architecture, we used the Dice similarity coefficient (DSC), pixel accuracy rate (PA), intersection over union (IoU), recall, and precision for quantitative assessment. The calculation methods for these evaluation metrics are detailed in formulas Eqs. 4, 5, 6, 7, and 8.

DSC = 2 × TP / (2 × TP + FN + FP) (4)
PA = (TP + TN) / (TP + FP + TN + FN) (5)
IoU = TP / (TP + FP + FN) (6)
Recall = TP / (TP + FN) (7)
Precision = TP / (TP + FP) (8)

The above calculations are based on the confusion matrix. In this matrix, TP (true positive) refers to positive samples correctly identified as positive by the model, TN (true negative) refers to negative samples correctly identified as negative, FP (false positive) refers to negative samples incorrectly identified as positive, and FN (false negative) refers to positive samples incorrectly identified as negative. In this study, positive and negative samples correspond to pixels in the nodule and non-nodule regions, respectively. As defined in the formulas, the Dice similarity coefficient (DSC) measures the overlap between the segmented nodule region and the ground truth. Pixel accuracy (PA) indicates the proportion of correctly classified pixels to the total number of pixels in the thyroid ultrasound image. Intersection over Union (IoU) represents the overlap rate between the predicted segmentation and the ground truth. Recall measures the proportion of nodule pixels in the ground truth that are correctly identified as nodule pixels in the predicted segmentation, indicating the sensitivity of the model. Precision quantifies the proportion of nodule pixels in the predicted segmentation that are actually nodule pixels in the ground truth, reflecting the specificity and accuracy of the model’s predictions. Since this study focuses on the accuracy of nodule region segmentation, particular emphasis is placed on the degree of overlap between the predicted results and the ground truth.
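Eqs. 4–8 can all be computed from the confusion-matrix counts of two binary masks. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Compute DSC, PA, IoU, recall, and precision (Eqs. 4-8) from
    binary prediction and ground-truth masks of equal shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = int(np.sum(pred & gt))      # nodule pixels correctly predicted
    tn = int(np.sum(~pred & ~gt))    # background correctly predicted
    fp = int(np.sum(pred & ~gt))     # background predicted as nodule
    fn = int(np.sum(~pred & gt))     # nodule pixels missed
    return {
        "DSC": 2 * tp / (2 * tp + fn + fp),
        "PA": (tp + tn) / (tp + tn + fp + fn),
        "IoU": tp / (tp + fp + fn),
        "Recall": tp / (tp + fn),
        "Precision": tp / (tp + fp),
    }

# Tiny worked example: one true positive, one false positive.
m = segmentation_metrics(np.array([[1, 1], [0, 0]]), np.array([[1, 0], [0, 0]]))
```

Note that DSC and IoU are monotonically related (DSC = 2·IoU / (1 + IoU)), which is why the two overlap metrics rank the models in Table 1 identically.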

Additionally, to further demonstrate the advantages of the improved model, we compared it with classical segmentation networks such as U-Net, UNet++, and DeepLabV3+. The evaluation was conducted on datasets provided by the First Affiliated Hospital of Zhengzhou University as well as two publicly available datasets. The comparison of the metrics, particularly DSC and IoU, indicates that our model improves performance by 10 to 20% over the other models. The evaluation results for each model are summarized in Table 1. Performance comparison of the models is shown in Fig. 6.

Fig. 6.

Fig. 6

Performance comparison of the models

Conclusion

In this paper, we proposed a thyroid nodule segmentation architecture based on a convolutional neural network. By replacing the original backbone of DeepLabV3+ with the parameter-efficient EfficientNet-B7, we significantly enhanced the model’s feature extraction capabilities. The experimental results show that compared with the classical segmentation networks U-Net, UNet++, and DeepLabV3+, the performance is improved by 10 to 20% on the dataset from the First Affiliated Hospital of Zhengzhou University and two public datasets. Thus, this architecture proves highly effective for ultrasound segmentation of thyroid nodules.

However, traditional downsampling operations do not perform well in image denoising. Therefore, future work will focus on improving ultrasonic image denoising and incorporating attention mechanisms to address feature loss issues.

Acknowledgements

The authors would like to thank the First Affiliated Hospital of Zhengzhou University for providing the dataset for this study and the National Supercomputing Center in Zhengzhou for providing the computing power support for this study.

Author Contribution

All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Nan Xiao, Demin Kong, and Junfeng Wang. The first draft of the manuscript was written by Nan Xiao, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Data Availability

The data cannot be made public due to privacy concerns.

Declarations

Ethics Approval

This study has no ethical approval required.

Consent to Participate

Full consent for use in this study has been obtained from the data provider.

Consent for Publication

The authors confirm that informed consent for publication was obtained from the data provider.

Conflict of Interest

The authors declare no competing interests.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Erratum: Global Cancer Statistics 2018. (2020). GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin, 70(4), 313. [DOI] [PubMed] [Google Scholar]
  • 2.Valcavi, R., Frasoldati, A. (2004). Ultrasound-guided percutaneous ethanol injection therapy in thyroid cystic nodules. Endocrine Practice, 10(3), 269–275. [DOI] [PubMed] [Google Scholar]
  • 3.Bennedbæk, F. N., Hegedüs, L. (2003). Treatment of recurrent thyroid cysts with ethanol: a randomized double-blind controlled trial. The Journal of Clinical Endocrinology and Metabolism, 88(12), 5773–5777. [DOI] [PubMed] [Google Scholar]
  • 4.Long, J., Shelhamer, E., Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3431–3440). [DOI] [PubMed]
  • 5.Ronneberger, O., Fischer, P., Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention-MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18 (pp. 234–241). Springer International Publishing.
  • 6.Zhou, Z., Rahman Siddiquee, M. M., Tajbakhsh, N., Liang, J. (2018). Unet++: A nested u-net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 20, 2018, Proceedings 4 (pp. 3–11). Springer International Publishing. [DOI] [PMC free article] [PubMed]
  • 7.Huang, H., Lin, L., Tong, R., Hu, H., Zhang, Q., Iwamoto, Y., ... & Wu, J. (2020, May). Unet 3+: A full-scale connected unet for medical image segmentation. In ICASSP 2020-2020 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 1055–1059). IEEE.
  • 8.Chen, Y., Zhang, X., Li, D., Park, H., Li, X., Liu, P., ... & Shen, Y. (2023). Automatic segmentation of thyroid with the assistance of the devised boundary improvement based on multicomponent small dataset. Applied Intelligence, 53(16), 19708–19723. [DOI] [PMC free article] [PubMed]
  • 9.Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A. L. (2017). Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence, 40(4), 834–848. [DOI] [PubMed] [Google Scholar]
  • 10.Chen, L. C. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv:1706.05587.
  • 11.Chen, L. C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H. (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European conference on computer vision (ECCV) (pp. 801–818).
  • 12.Zhang, R., Chen, J., Feng, L., Li, S., Yang, W., Guo, D. (2021). A refined pyramid scene parsing network for polarimetric SAR image semantic segmentation in agricultural areas. IEEE Geoscience and Remote Sensing Letters, 19, 1–5. [Google Scholar]
  • 13.He, P., Jiao, L., Shang, R., Wang, S., Liu, X., Quan, D., ... & Zhao, D. (2022). MANet: Multi-scale aware-relation network for semantic segmentation in aerial scenes. IEEE Transactions on Geoscience and Remote Sensing, 60, 1–15.
  • 14.Ozcan, A., Tosun, Ö., Donmez, E., & Sanwal, M. (2024). Enhanced-TransUNet for ultrasound segmentation of thyroid nodules. Biomedical Signal Processing and Control, 95, 106472. [Google Scholar]
  • 15.Li, C., Du, R., Luo, Q., Wang, R., & Ding, X. (2023). A novel model of thyroid nodule segmentation for ultrasound images. Ultrasound in Medicine & Biology, 49(2), 489–496. [DOI] [PubMed] [Google Scholar]
  • 16.Jiang, T., Xing, W., Yu, M., & Ta, D. (2023). A hybrid enhanced attention transformer network for medical ultrasound image segmentation. Biomedical Signal Processing and Control, 86, 105329. [Google Scholar]
  • 17.Manh, V., Jia, X., Xue, W., Xu, W., Mei, Z., Dong, Y., ... & Ni, D. (2024). An efficient framework for lesion segmentation in ultrasound images using global adversarial learning and region-invariant loss. Computers in Biology and Medicine, 171, 108137. [DOI] [PubMed]
  • 18.Ma, J., Kong, D., Wu, F., Bao, L., Yuan, J., & Liu, Y. (2024). Densely connected convolutional networks for ultrasound image based lesion segmentation. Computers in Biology and Medicine, 168, 107725. [DOI] [PubMed] [Google Scholar]
  • 19.Sun, J., Li, C., Lu, Z., He, M., Zhao, T., Li, X., ... & Ni, X. (2022). TNSNet: thyroid nodule segmentation in ultrasound imaging using soft shape supervision. Computer methods and programs in biomedicine, 215, 106600. [DOI] [PubMed]
  • 20.Yang, Q., Geng, C., Chen, R., Pang, C., Han, R., Lyu, L., & Zhang, Y. (2022). DMU-Net: Dual-route mirroring U-Net with mutual learning for malignant thyroid nodule segmentation. Biomedical Signal Processing and Control, 77, 103805. [Google Scholar]
  • 21.Chen, H., Yu, M. A., Chen, C., Zhou, K., Qi, S., Chen, Y., & Xiao, R. (2023). FDE-net: Frequency-domain enhancement network using dynamic-scale dilated convolution for thyroid nodule segmentation. Computers in Biology and Medicine, 153, 106514. [DOI] [PubMed] [Google Scholar]
  • 22.Shao, J., Pan, T., Fan, L., Li, Z., Yang, J., Zhang, S., ... & Liu, X. (2023). FCG-Net: an innovative full-scale connected network for thyroid nodule segmentation in ultrasound images. Biomedical Signal Processing and Control, 86, 105048.
  • 23.Pedraza, L., Vargas, C., Narváez, F., Durán, O., Muñoz, E., Romero, E. (2015, January). An open access thyroid ultrasound image database. In 10th International symposium on medical information processing and analysis (Vol. 9287, pp. 188–193). SPIE.
  • 24.Gong, H., Cheng, H., Xie, Y., Tan, S., Chen, G., Chen, F., Li, G. (2022, September). Less is more: adaptive curriculum learning for thyroid nodule diagnosis. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 248–257). Cham: Springer Nature Switzerland.
  • 25.Tan, M. (2019). Efficientnet: Rethinking model scaling for convolutional neural networks. arXiv:1905.11946.
  • 26.Wang, C., Li, Y. (2022, May). Motion Prediction for Autonomous Vehicles Based on EfficientNet-B1. In 2022 4th International Conference on Communications, Information System and Computer Engineering (CISCE) (pp. 648–651). IEEE.
  • 27.Marques, G., Agarwal, D., De la Torre Díez, I. (2020). Automated medical diagnosis of COVID-19 through EfficientNet convolutional neural network. Applied soft computing, 96, 106691. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Shah, H. A., Saeed, F., Yun, S., Park, J. H., Paul, A., Kang, J. M. (2022). A robust approach for brain tumor detection in magnetic resonance images using finetuned efficientnet. Ieee Access, 10, 65426–65438. [Google Scholar]
  • 29.Wang, H., Zhu, Y., Adam, H., Yuille, A., Chen, L. C. (2021). Max-deeplab: End-to-end panoptic segmentation with mask transformers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 5463–5474).
  • 30.Cheng, B., Collins, M. D., Zhu, Y., Liu, T., Huang, T. S., Adam, H., Chen, L. C. (2020). Panoptic-deeplab: A simple, strong, and fast baseline for bottom-up panoptic segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 12475–12485).
  • 31.Sun, X., Xie, Y., Jiang, L., Cao, Y., Liu, B. (2022). DMA-Net: DeepLab with multi-scale attention for pavement crack segmentation. IEEE Transactions on Intelligent Transportation Systems, 23(10), 18392–18403. [Google Scholar]
