Abstract
Thyroid nodules are a common endocrine condition that can be detected through medical imaging, aiding in the identification of thyroid cancer. Accurate segmentation of these nodules is crucial for precise diagnosis, since factors such as nodule size, shape, and number influence their grading. Automating the segmentation process can benefit clinicians and researchers by providing efficient and reliable results. However, ultrasound image segmentation is challenging because of the complex tissue structure surrounding the thyroid. Traditional approaches rely on manually designed convolutional neural network (CNN) models, whose development is tedious, error-prone, and requires domain-specific expertise. In this paper, an evolutionary neural architecture search (NAS) method is developed using the Improved Teaching-Learning-Based Optimization (ITLBO) algorithm to discover optimal block structures in an encoder-decoder architecture for thyroid nodule segmentation (TNS) in ultrasound images. The proposed method enables dynamic optimization of the network structure through a flexible search space. Moreover, attention blocks are incorporated into the encoder-decoder architecture to enhance segmentation performance. The proposed method, named EvoThy-Net, is evaluated on two publicly available ultrasound image datasets, demonstrating its ability to discover high-performing segmentation networks for the TNS task. The results show that the proposed method outperforms other state-of-the-art models.
Keywords: Neural architecture search, Block-based network, Ultrasound images, Thyroid nodule segmentation
Subject terms: Cancer, Computational biology and bioinformatics, Engineering, Mathematics and computing
Introduction
The thyroid gland releases two hormones that regulate metabolism and play a key role in the human body, so identifying thyroid-related disorders is important. One of the most common thyroid conditions is thyroid nodules, which are solid or cystic masses that develop within the gland. Compared with traditional Fine Needle Aspiration (FNA) biopsy, high-resolution ultrasonography has become an essential tool for the early diagnosis and identification of thyroid nodules because it is practical, fast, and non-invasive. A doctor's diagnosis relies heavily on the aspect ratio and margin shape of nodules in thyroid ultrasound images. Unlike other imaging modalities such as X-ray, MRI, and CT, ultrasound examination procedures are not standardized. Therefore, doctors' expertise and knowledge are crucial for accurately identifying thyroid nodules, and inexperienced radiologists face a significant risk of misdiagnosis in ultrasound-based assessments. Thus, thyroid nodule segmentation (TNS) is an active area of research in medical image processing, and various traditional and deep learning-based techniques have been proposed for accurate and efficient segmentation1.
In recent years, advancements in artificial intelligence (AI) and deep learning have significantly enhanced the capabilities of thyroid nodule detection and classification. Convolutional Neural Networks (CNNs), in particular, have shown remarkable performance in extracting high-level features from ultrasound images, allowing for precise differentiation between benign and malignant nodules. These models can learn complex patterns and visual cues that might be overlooked by the human eye, thereby reducing inter-observer variability and improving diagnostic consistency. By integrating automated segmentation methods with AI-driven classification systems, researchers aim to create robust Computer-Aided Diagnosis (CAD) tools that assist clinicians in making informed decisions with greater confidence and accuracy.
Despite the promising progress, several challenges still hinder the widespread adoption of these techniques in clinical practice. Variations in ultrasound equipment, patient anatomy, and imaging protocols often result in diverse image quality, making model generalization difficult. Furthermore, the limited availability of large, annotated datasets restricts the training of highly accurate deep learning models. To overcome these issues, ongoing research focuses on developing more generalized architectures, transfer learning approaches, and collaborative data-sharing frameworks among medical institutions. These efforts are crucial in bridging the gap between research prototypes and real-world clinical applications, ultimately improving the early diagnosis and management of thyroid disorders.
This paper introduces EvoThy-Net, a novel evolutionary method for designing a high-performance Encoder-Decoder (ED) network architecture. The primary objective is precise segmentation of thyroid nodules in ultrasound images. The proposed method incorporates attention blocks into the encoder-decoder network to extract important features and thereby improve segmentation performance. An Improved Teaching-Learning-Based Optimization (ITLBO) algorithm is employed to discover the optimal block structure and training parameters of the ED network. The proposed method was evaluated on two thyroid nodule ultrasound image datasets, and extensive experiments confirm that it generates network structures for TNS that achieve higher segmentation performance than existing state-of-the-art models. Additionally, the model discovered by the proposed method has significantly fewer parameters, making it faster and more lightweight than the compared models.
The main contributions of this paper are as follows:
An evolutionary-based method is presented for designing a high-performance encoder-decoder network architecture to accurately segment thyroid nodules in ultrasound images.
A search space covering block types, normalization layers, activation functions, attention blocks, and training parameters is defined for constructing efficient encoder-decoder networks.
Attention blocks are incorporated into the encoder-decoder network structure to improve segmentation performance by effectively extracting important features.
An Improved Teaching-Learning-Based Optimization algorithm is employed to discover the optimal block structure of the encoder-decoder network as well as the training parameters.
The paper is structured as follows: Sect. 2 reviews related work, Sect. 3 presents the preliminaries, Sect. 4 describes the proposed method, and Sect. 5 discusses the datasets and experimental results. Section 6 contains the discussion and experimental analysis. Finally, Sect. 7 concludes the paper.
Related work
This section reviews the key developments in thyroid nodule segmentation (TNS) and classification, highlighting both conventional and deep learning-based methodologies that have been proposed to enhance diagnostic performance.
Traditional techniques such as mean clustering, active contour2, level set3, filtering4, morphological5, and texture-based methods6 have been employed for the TNS task. For instance, Abbasian et al.4 proposed a hybrid filtering method, while Shahroudnejad et al.7 used various thresholding approaches and morphological operations to segment thyroid nodules in ultrasound images. However, these methods suffer from limited accuracy.
In the past decade, deep learning techniques, particularly CNNs, have gained popularity in medical image processing because they can automatically extract hidden features from images, eliminating the need for complicated manual feature extraction8. For instance, Ma et al.9 designed a CNN model specifically for this task, while Xu et al.10 introduced a cascaded CNN approach. Similarly, Ying et al.11 combined a VGG network12 with a U-net13 to eliminate irrelevant regions for thyroid nodule segmentation. Song et al.14 introduced a deep learning model with a multitask cascaded architecture to detect thyroid nodules precisely. Liao et al.15 proposed a technique that combines a modified U-Net model with a conditional random field (CRF) model to separate thyroid nodules from ultrasound images. Ding et al.16 suggested an automated thyroid nodule segmentation approach based on a modified U-net model; to increase segmentation accuracy, they developed a deep supervision scheme with various loss functions. Liu et al.17 proposed a deep learning-based CAD method for the automatic detection and classification of thyroid nodules in ultrasound images. Kumar et al.18 developed a deep neural network to segment cystic nodules, solid nodules, and the thyroid gland. Wang et al.19 employed an attention-based deep neural network to aggregate features extracted from different ultrasound images. Abdolali et al.20 proposed a deep learning-based approach for thyroid nodule diagnosis from ultrasound images that uses a CNN trained to classify each pixel as belonging to a nodule or not. Pan et al.21 proposed the Sgunet model for thyroid nodule segmentation, which uses semantic features to guide the extraction of low-level features.
Specifically, they extract a single-channel pixel-by-pixel semantic feature map from each decoding layer’s high-dimensional features and use it to guide the low-level feature extraction process.
Pathak et al.22 combined contrast-limited adaptive histogram equalization (CLAHE) with a CNN for thyroid nodule detection. Kang et al.23 developed a multi-task learning-based technique for segmenting thyroid nodules in ultrasound images that combines the classification and nodule segmentation tasks in a single model. Jin et al.24 proposed a deep learning-based technique for thyroid nodule segmentation using a boundary field heatmap method. Nguyen et al.25 developed a deep learning-based technique for segmenting thyroid nodules using information fusion-based suggestion and enhancement networks. Song et al.26 proposed a hybrid feature cropping network that uses pseudo-labels to extract thyroid nodules in ultrasound images. Sun et al.27 proposed TNSNet for precise thyroid nodule segmentation, using soft labels to constrain the margins of thyroid nodules. Chen et al.28 developed an encoder network that divides thyroid nodules into distinct types based on their cystic and solid properties for precise segmentation. Lu et al.29 suggested an attention-guided deep learning model for thyroid nodule segmentation that uses a CNN with a self-attention mechanism to concentrate on relevant image regions. Tao et al.30 proposed a deep learning-based technique that uses a spatial attention mechanism to gather local contextual information and boost nodule segmentation accuracy. Kunapinun et al.31 developed a thyroid nodule segmentation approach that combines the advantages of supervised and unsupervised learning; their model uses both convolutional autoencoders and a conditional generative adversarial network.
Sun et al.32 introduced CLIP-TNseg, a multimodal framework that combines a large pretrained vision–language model with a neural network architecture to enhance thyroid nodule segmentation in ultrasound images. Xiang et al.33 proposed a federated learning–based multi-attention UNet, enabling collaborative training across distributed ultrasound datasets while preserving data privacy and improving segmentation accuracy. Recently, Banerjee et al.34 presented a hybrid deep learning framework that integrates deep feature–based attention mechanisms with statistical validation strategies to improve segmentation performance in thyroid ultrasound images.
Although existing segmentation models have demonstrated promising results, they often have high parameter counts and large model sizes, which limits their practicality in resource-constrained environments. Additionally, manually designing network architectures for segmentation is time-consuming and tedious, especially given the complex structure of ultrasound images depicting nodules35. Hence, there is a need for an automatically designed, lightweight model that achieves high performance with a small parameter count and compact model size. Such a model would improve access to medical imaging services for patients with thyroid nodules, particularly in resource-limited regions. To avoid the burden of manual architecture development, Neural Architecture Search (NAS) can be used to discover optimal network architectures for TNS tasks automatically36.
NAS offers a powerful way to generate effective deep neural networks automatically, using different optimization methodologies: (i) differentiable neural architecture search (DNAS)37, which optimizes architecture weights through gradient descent; (ii) evolutionary algorithms (EA)38, which treat NAS as an optimization problem and apply evolutionary operations to iteratively refine a population of architectures until satisfactory performance is attained; and (iii) reinforcement learning (RL)39, which frames NAS as a Markov decision process in which a controller learns to construct efficient structures through trial and error. While several NAS models have been applied to medical image segmentation40, they often have restricted search spaces that repetitively stack similar blocks without optimizing the topology. Additionally, NAS-based segmentation models may not transfer seamlessly to TNS because ultrasound images have distinct characteristics compared with other image types41,42. Hence, there is growing interest in developing an effective automated TNS model.
NAS has shown strong performance in medical image segmentation, as evidenced by Mortazi et al.40 and Weng et al.43, who optimize building-block structures and stack them repetitively to construct architectures. Kim et al.44 focused on optimizing individual layers and hyperparameters within the blocks, while the overall block structure remained relatively fixed. Fan et al.45 presented an autoencoder network optimized through evolutionary algorithms to enhance segmentation precision. EvoUNet46 introduced a U-Net-based evolutionary model that employs a Genetic Algorithm (GA) to search for variable-length network architectures together with hyperparameters. Wei et al.47 integrated an environmental selection mechanism into a GA for CNN design, focusing on elitism and population diversity to improve search efficiency and prevent premature convergence in the segmentation task; however, their evolutionary search did not involve hyperparameters. Baldeon et al.48 proposed AdaResU-Net, which uses an evolutionary algorithm to optimize network hyperparameters while keeping a fixed U-Net structure. Rajesh et al.49 developed DEvoNet, which uses a Differential Evolution (DE) algorithm to discover optimal dense and residual block structures along with hyperparameters, although all blocks in their model follow the same structure. More recently, Rajesh et al.50 proposed a modified genetic algorithm-driven multi-objective NAS framework for brain tumor segmentation that simultaneously optimizes the network architecture and performance trade-offs. Despite this body of research on NAS for medical image segmentation, few studies have applied evolutionary algorithms to the specific task of TNS. This observation motivated us to propose a novel method for designing an automated TNS model using an evolutionary algorithm.
Preliminaries
Neural architecture search (NAS)
In recent years, the emergence of NAS has attracted considerable attention for its ability to generate architectures that excel in diverse tasks. The primary objective of NAS is to discover optimal architectures that can optimize the objective function f for a specific task T. The common formulation of NAS is as follows:
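$$\alpha^{*} = \underset{\alpha \in \mathcal{A}}{\arg\max}\; f\left(\alpha, w^{*}_{\alpha}\right) \quad \text{s.t.} \quad w^{*}_{\alpha} = \underset{w}{\arg\min}\; \mathcal{L}(\alpha, w) \tag{1}$$

where $\alpha^{*}$ represents an optimal architecture from a search space $\mathcal{A}$ that optimizes a specific objective function $f$, $w^{*}_{\alpha}$ denotes the learned parameters of $\alpha$, and $\mathcal{L}$ is the loss function used during model training.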
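To make the two nested optimization problems concrete, the toy sketch below (an illustration only, not part of EvoThy-Net) treats each candidate "architecture" as a one-weight model y = w·x^k with k in {1, 2, 3}. The inner problem fits w by closed-form least squares on training data, and the outer problem selects the candidate with the best validation fitness (negative mean squared error). The data, search space, and function names are hypothetical.

```python
"""Toy illustration of the bilevel NAS formulation: inner training, outer selection."""

def fit_weight(k, xs, ys):
    # Inner problem: w* = argmin_w sum (y - w * x^k)^2, solved in closed form.
    num = sum((x ** k) * y for x, y in zip(xs, ys))
    den = sum((x ** k) ** 2 for x in xs)
    return num / den

def fitness(k, w, xs, ys):
    # Outer objective f: negative validation MSE, so higher is better.
    mse = sum((y - w * x ** k) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return -mse

def search(space, train, val):
    best = None
    for k in space:                     # each k plays the role of a candidate architecture
        w = fit_weight(k, *train)       # trained parameters for this candidate
        score = fitness(k, w, *val)     # fitness of the trained candidate on validation data
        if best is None or score > best[2]:
            best = (k, w, score)
    return best

# Hypothetical data generated by y = 3 x^2.
train_x = [-2.0, -1.0, 0.5, 1.0, 2.0]
val_x = [-1.5, 0.25, 1.5, 3.0]
train = (train_x, [3 * x ** 2 for x in train_x])
val = (val_x, [3 * x ** 2 for x in val_x])

best_k, best_w, best_score = search([1, 2, 3], train, val)
print(best_k, round(best_w, 6))  # 2 3.0
```

In EvoThy-Net, the exhaustive loop over three candidates is replaced by the ITLBO search over a 52-bit encoding, and the closed-form fit is replaced by training an encoder-decoder network.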
Improved teaching-learning-based optimization (ITLBO)
The ITLBO algorithm, introduced by Rao et al.51, is an enhancement of the standard TLBO algorithm. It integrates novel concepts such as adaptive teaching factors, multiple teachers in the teaching phase, and self-motivated learning in the learning phase to improve the performance of the original TLBO approach. The ITLBO algorithm has been widely adopted in various applications52–55. The algorithm consists of four main phases: initialization, the teaching phase, the learning phase, and greedy selection. The procedural steps of the ITLBO algorithm are depicted in Fig. 1.
Fig. 1.

ITLBO algorithm.
Initialization
The population $P$ is initialized with $N_p$ learners $P_d$ ($d = 1, 2, \ldots, N_p$), each described by a specified number of subjects, i.e., decision variables indexed by $j$, together with a number of teachers $N_t$. The fitness $f(P)$ of the population is then evaluated using the fitness function.
Teacher phase
During the teacher phase, learners learn from both the teacher and their peers through assignments and discussions. Teachers are selected by ranking the evaluated population $P$: in descending order of fitness for maximization problems and in ascending order for minimization problems. The best solution $P_b$ is designated as the chief teacher of the class, with $T_c = f(P_b)$. The remaining teachers $T_s$ are chosen according to Eq. (2):
$$T_s = T_c - r \times T_c, \qquad s = 2, 3, \ldots, N_t \tag{2}$$

where $r$ is a random number in $[0, 1]$. If no learner's fitness matches the calculated value of $T_s$ exactly, the learner closest to it is selected as teacher $T_s$. The learners are then divided into groups $s$ according to their performance, and each group is assigned a specific teacher from the pool of teachers $T_s$. Each teacher is responsible for improving the performance of its assigned group. If a group's performance matches or surpasses that of its assigned teacher, the group is reassigned to a more proficient teacher.
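The teacher-selection rule can be sketched as follows. This is a minimal illustration that assumes a maximization problem (as with the Dice score used later) and a plain list of fitness values; the function names and the nearest-teacher rule used to form groups are hypothetical choices, not the authors' implementation.

```python
import random

def select_teachers(fitness, n_teachers, rng):
    """Return learner indices of the teachers: chief teacher first, then Eq. (2) picks."""
    order = sorted(range(len(fitness)), key=lambda i: fitness[i], reverse=True)
    chief = order[0]
    t_c = fitness[chief]
    teachers = [chief]
    for _ in range(2, n_teachers + 1):
        target = t_c - rng.random() * t_c          # T_s = T_c - r * T_c
        candidates = [i for i in order if i not in teachers]
        # An exact match is not required: take the learner closest to the target.
        closest = min(candidates, key=lambda i: abs(fitness[i] - target))
        teachers.append(closest)
    return teachers

def assign_groups(fitness, teachers):
    """Assign each learner to the teacher with the nearest fitness (assumed grouping rule)."""
    groups = {t: [] for t in teachers}
    for i, f in enumerate(fitness):
        nearest = min(teachers, key=lambda t: abs(fitness[t] - f))
        groups[nearest].append(i)
    return groups

rng = random.Random(0)
fitness = [0.62, 0.71, 0.55, 0.68, 0.74, 0.49, 0.66, 0.70]  # hypothetical Dice scores
teachers = select_teachers(fitness, n_teachers=3, rng=rng)
groups = assign_groups(fitness, teachers)
print(teachers, groups)
```

The chief teacher is always the learner with the highest fitness (index 4 here); the other teachers depend on the random draws.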
The teaching factor $T_F$ represents the extent to which learners learn from the teacher, ranging from learning nothing to learning everything. In the standard TLBO algorithm, increasing $T_F$ accelerates the search process but reduces exploration capability. Conversely, decreasing $T_F$ enables a more precise search in smaller increments but may result in slower convergence. In the teaching-learning phenomenon, learners can learn from the teacher in varying proportions, so $T_F$ can take any value between its extremes. ITLBO therefore computes it adaptively:

$$T_F = \begin{cases} \dfrac{f(P_{k,s})}{T_s}, & T_s \neq 0 \\[4pt] 1, & T_s = 0 \end{cases} \tag{3}$$

Here, $f(P_{k,s})$ represents the fitness value of learner $k$ in group $s$, and $T_s$ represents the performance of the teacher of the same group $s$ at iteration $i$. Consequently, $T_F$ adjusts dynamically during the search according to the relative performance of learners and teachers. Learners can further increase their knowledge through collaboration with others, as described in Eq. (4):
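$$P'_{d,j} = P_{d,j} + r_1\left(T_{s,j} - T_F\, M_{s,j}\right) + \begin{cases} r_2\left(P_{h,j} - P_{d,j}\right), & f(P_h) > f(P_d) \\ r_2\left(P_{d,j} - P_{h,j}\right), & \text{otherwise} \end{cases} \tag{4}$$

where $M_{s,j}$ represents the average grade of the learners in group $s$ for subject $j$, $T_{s,j}$ denotes the grade of the teacher assigned to group $s$ for subject $j$ in population $P$, $P_h$ is a randomly selected peer learner, and $r_1, r_2 \in [0, 1]$ are random numbers. Besides the current value $P_{d,j}$, the right-hand side comprises two terms: the first reflects learning within the classroom, and the second represents learning through tutorials. The fitness of the generated population is then evaluated, and the best learners are retained using greedy selection.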
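A per-learner sketch of the adaptive teaching factor and the teacher-phase update follows. It assumes real-valued decision variables, a maximization problem, and random numbers r1 and r2 passed in explicitly so that the update is reproducible. How EvoThy-Net maps updated values back to its binary encoding is not specified in the text, so no such mapping is shown here.

```python
def teaching_factor(learner_fitness, teacher_fitness):
    # T_F = f(learner) / T_s, falling back to 1 when the teacher's fitness is zero.
    if teacher_fitness == 0:
        return 1.0
    return learner_fitness / teacher_fitness

def teacher_phase_update(learner, teacher, group_mean, t_f, peer, peer_is_better, r1, r2):
    """Per subject j: P' = P + r1 (T - T_F * M) + tutorial term with a random peer."""
    new = []
    for p, t, m, q in zip(learner, teacher, group_mean, peer):
        classroom = r1 * (t - t_f * m)
        # Tutorial learning: move toward a better peer, away from a worse one.
        tutorial = r2 * (q - p) if peer_is_better else r2 * (p - q)
        new.append(p + classroom + tutorial)
    return new

t_f = teaching_factor(0.70, 0.70)  # learner as good as its teacher, so T_F = 1
print(teacher_phase_update([0.0], [1.0], [0.5], t_f, [1.0], True, 0.5, 0.5))  # [0.75]
```

Because T_F is the learner-to-teacher fitness ratio, weaker learners subtract less of the group mean and therefore take larger steps toward their teacher.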
Learner phase
During the learner phase, self-learning is integrated to allow students to independently acquire knowledge without the direct guidance of a teacher. This facilitates exploration and knowledge enhancement through self-motivation, as expressed in Eq. (5):
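$$P'_{d,j} = \begin{cases} P_{d,j} + r_1\left(P_{d,j} - P_{h,j}\right) + r_2\left(T_{c,j} - E\, P_{d,j}\right), & f(P_d) > f(P_h) \\ P_{d,j} + r_1\left(P_{h,j} - P_{d,j}\right) + r_2\left(T_{c,j} - E\, P_{d,j}\right), & \text{otherwise} \end{cases} \tag{5}$$

where $P_h$ is a randomly selected peer learner, $T_{c,j}$ is the value of the chief teacher (the best solution) for subject $j$, $E$ is the exploration factor computed as $E = \operatorname{round}(1 + r_i)$ with $r_i$ a random number in $[0, 1]$ (so $E$ is either 1 or 2), and $r_1, r_2$ are random numbers in $[0, 1]$. In Eq. (5), the first term after $P_{d,j}$ represents learning through interaction with another learner, while the second represents self-motivated learning. The fitness of the population is evaluated, and the best learners are selected using greedy selection.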
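The learner-phase update can be sketched in the same style. As before, this assumes real-valued decision variables and a maximization problem, with the random numbers supplied by the caller; the function names are hypothetical.

```python
def exploration_factor(r):
    """E = round(1 + r) for a random r in [0, 1]; the result is 1 or 2."""
    return round(1 + r)

def learner_phase_update(learner, peer, best, learner_is_better, r1, r2, e):
    """Per subject j: interaction with a random peer plus self-motivated learning."""
    new = []
    for p, q, b in zip(learner, peer, best):
        # Move away from a worse peer, or toward a better one.
        interaction = r1 * (p - q) if learner_is_better else r1 * (q - p)
        # Self-motivated term pulls the learner toward the best solution (chief teacher).
        self_motivated = r2 * (b - e * p)
        new.append(p + interaction + self_motivated)
    return new

print(learner_phase_update([1.0], [0.0], [2.0], True, 0.5, 0.5, exploration_factor(0.2)))  # [2.0]
```

When E = 2, the self-motivated term subtracts twice the learner's current value, which pushes the candidate further from its current position and encourages exploration.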
Greedy selection
Greedy selection compares the fitness score $f(P_d)$ of each learner in the current generation with that of the corresponding learner in the previous generation. If the current learner's fitness is greater, it replaces the previous learner in the population for the next generation; otherwise, the previous learner is retained unchanged. This process repeats until a termination condition is met, typically a predetermined number of iterations.
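The greedy selection step amounts to an element-wise comparison between the previous and current generations. A minimal sketch for a maximization problem, with hypothetical names:

```python
def greedy_select(old_pop, old_fit, new_pop, new_fit):
    """Per-learner greedy selection for a maximization problem (e.g., Dice score)."""
    pop, fit = [], []
    for old, f_old, new, f_new in zip(old_pop, old_fit, new_pop, new_fit):
        if f_new > f_old:      # strictly better: the new learner survives
            pop.append(new)
            fit.append(f_new)
        else:                  # equal or worse: keep the previous learner
            pop.append(old)
            fit.append(f_old)
    return pop, fit

pop, fit = greedy_select(["A", "B"], [0.5, 0.7], ["C", "D"], [0.6, 0.7])
print(pop, fit)  # ['C', 'B'] [0.6, 0.7]
```

Because a learner is only ever replaced by a strictly fitter one, the best fitness in the population can never decrease from one generation to the next.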
Proposed method
In this section, the proposed Search space and Encoding are introduced, then the EvoThy-Net method is presented.
Search space and encoding
Table 1 lists the parameters of the proposed search space and their ranges. The search space is designed to balance segmentation accuracy and computational efficiency, taking into account intrinsic challenges of ultrasound imaging such as speckle noise, low contrast, and limited annotated data. It consists of four complementary building blocks, each of which can be combined with multiple activation and normalization options, allowing the evolutionary process to select the most suitable configurations adaptively. The four blocks, shown in Fig. 2, are drawn from well-established architectures56–59 and were selected for both their representational capability and their computational characteristics. Block 1 (Residual Block)56 introduces lightweight skip connections that promote stable gradient flow and faster convergence with minimal computational overhead, which suits ultrasound imaging, where training stability is critical. Block 2 (UNETR Block)57, inspired by transformer-based designs, enables effective global context modeling and uses transposed convolutions to recover spatial resolution, supporting accurate boundary reconstruction of small, low-contrast thyroid nodules. Although UNETR components are generally more computationally demanding because of their global attention mechanisms, the authors adopt a simplified and constrained version within the search space so that their use remains computationally feasible. Block 3 (UnetRes Block)58 combines residual learning with convolutional feature extraction, providing a favorable trade-off between contextual awareness and computational efficiency for delineating irregular thyroid nodules in ultrasound images.
Block 4 (Modified ASPP Block)59 employs atrous convolutions to capture multi-scale contextual information without significantly increasing the number of parameters, which is beneficial for detecting nodules of varying sizes and echogenic patterns from ultrasound imaging.
Table 1.
Search space.
| Parameter | Range |
|---|---|
| Block IDs | [Block1, Block2, Block3, Block4] |
| Normalizations | [BATCH, INSTANCE, GROUP, None] |
| Activations | [MEMSWISH, ReLU, PReLU, Leaky ReLU] |
| Attention blocks | [AB1, AB2, AB3, AB4] |
| Optimizers | [Adam, AdamW, Adamax, SGD] |
| Loss functions | [DiceLoss, DiceFocalLoss, FocalLoss, TverskyLoss] |
Fig. 2.
Blocks.
Besides block selection, the search space includes four normalization strategies (Batch60, Instance61, Group62, and no normalization) to accommodate the small batch sizes and intensity variations typical of ultrasound imaging, as well as four activation functions (ReLU63, MEMSWISH64, PReLU65, and Leaky ReLU66) to enhance nonlinear feature representation. In addition, four attention blocks adopted from67 are applied in the decoder instead of simple addition or concatenation, further enhancing the model's ability to focus on key features and improving segmentation quality. Figure 3 illustrates the four types of attention mechanisms used in this framework. Finally, the search space includes training parameters, namely four loss functions (Dice loss68, Dice Focal loss69, Focal loss70, and Tversky loss71) and four optimizers (Stochastic Gradient Descent (SGD), Adamax, AdamW, and Adam).
Fig. 3.
Attention blocks.
Encoding
The encoder-decoder network proposed in this paper has a flexible block structure, unlike the fixed block configurations of previous methods. The network consists of three main stages comprising three encoder blocks, one bridge block, and three decoder blocks, for a total of seven blocks. Each block is constructed dynamically from the search space and is characterized by three attributes (block ID, normalization layer, and activation function), all of which are optimized by the ITLBO algorithm. For a single block, 2 bits represent the block ID, giving $2^2 = 4$ possible values that correspond to the four block types. Similarly, 2 bits each encode the normalization layer and the activation function, accommodating four options for each. Therefore, the seven blocks require 14 bits for the block IDs, 14 bits for the normalization layers, and 14 bits for the activation functions, i.e., 42 bits in total. In the decoder, the model incorporates attention mechanisms to enhance feature refinement and spatial accuracy. Each of the three decoder stages selects one of the four attention blocks using 2 bits, so 6 bits encode the attention selections across all decoder stages. Furthermore, 2 bits each define the optimizer and the loss function, enabling the evolutionary process to explore different training strategies for better convergence and segmentation accuracy. In total, each learner is represented by a 52-bit encoding (42 + 6 + 2 + 2), as illustrated in Fig. 4, which compactly encodes the network's architectural and training parameters.
This efficient encoding scheme allows the ITLBO algorithm to effectively explore and optimize the search space, enabling the discovery of high-performing encoder–decoder architectures tailored for thyroid nodule segmentation.
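The 52-bit layout described above can be decoded as sketched below. The bit order used here (seven blocks of block-ID, normalization, and activation bits, then the three attention selections, then optimizer and loss) is an assumption for illustration; the authoritative layout is the one in Fig. 4. The option lists follow Table 1, and the dictionary keys are hypothetical names.

```python
SEARCH_SPACE = {
    "block": ["Block1", "Block2", "Block3", "Block4"],
    "norm": ["BATCH", "INSTANCE", "GROUP", None],
    "act": ["MEMSWISH", "ReLU", "PReLU", "Leaky ReLU"],
    "attention": ["AB1", "AB2", "AB3", "AB4"],
    "optimizer": ["Adam", "AdamW", "Adamax", "SGD"],
    "loss": ["DiceLoss", "DiceFocalLoss", "FocalLoss", "TverskyLoss"],
}
N_BLOCKS, N_DECODER_STAGES = 7, 3
N_BITS = N_BLOCKS * 3 * 2 + N_DECODER_STAGES * 2 + 2 + 2   # 42 + 6 + 2 + 2 = 52

def decode(bits):
    """Map a learner (list of 0/1 integers) to an encoder-decoder configuration."""
    if len(bits) != N_BITS:
        raise ValueError(f"expected {N_BITS} bits, got {len(bits)}")
    pos = 0

    def take(key):
        nonlocal pos
        index = 2 * bits[pos] + bits[pos + 1]   # 2 bits give an index in 0..3
        pos += 2
        return SEARCH_SPACE[key][index]

    blocks = [
        {"block": take("block"), "norm": take("norm"), "act": take("act")}
        for _ in range(N_BLOCKS)   # 3 encoder blocks, 1 bridge block, 3 decoder blocks
    ]
    attention = [take("attention") for _ in range(N_DECODER_STAGES)]
    return {"blocks": blocks, "attention": attention,
            "optimizer": take("optimizer"), "loss": take("loss")}

config = decode([0] * N_BITS)
print(config["blocks"][0], config["optimizer"], config["loss"])
```

A fixed-width encoding like this keeps every learner the same length, which is what allows the ITLBO update equations to operate position by position.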
Fig. 4.
Encoding of a learner.
EvoThy-Net
EvoThy-Net uses as its backbone an ED structure, which has been widely used in medical image segmentation. In the ED structure, the encoder blocks downsample the input image and extract high-level features, while the decoder blocks upsample the feature maps and generate the segmentation mask. To identify the optimal blocks and hyperparameters for constructing efficient ED networks, EvoThy-Net uses the ITLBO algorithm as the search strategy in the evolutionary process for thyroid nodule segmentation. ITLBO is chosen for its balance between optimization efficiency, robustness, and simplicity, which makes it well suited to neural architecture search. Unlike many conventional evolutionary algorithms, ITLBO requires no algorithm-specific control parameters (such as crossover or mutation rates), reducing tuning overhead and simplifying implementation. It enhances the standard TLBO by introducing adaptive teaching factors, multiple teachers, and self-motivated learning, which together strengthen the balance between exploration and exploitation52–55. Treating the search space parameters as decision variables, the ITLBO algorithm seeks the architecture that achieves the maximum Dice score (Eq. 6), which serves as the fitness function for a given dataset.
The flow diagram of the proposed EvoThy-Net method is illustrated in Fig. 5. Initially, a random population is initialized, and the fitness score of the initialized population is evaluated. In each iteration, the population undergoes the teacher phase, where new learners are derived following the procedure outlined in Sect. 3.2.2. The fitness scores of the new learners are then evaluated using Algorithm 1. Subsequently, the learners are added to the population using greedy selection. The resulting population is then transferred to the learner phase, where new learners are obtained using the procedure outlined in Sect. 3.2.3. The fitness scores of the new learners are evaluated using Algorithm 1, and the learners are added to the population using greedy selection. The resulting population is utilized as the population for the subsequent generation.
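The generation loop described above can be sketched end to end as follows. To stay self-contained, this sketch makes several simplifying assumptions that are not from the paper: a single chief teacher instead of multiple teacher groups, real-valued learners in [0, 1] that are thresholded at 0.5 to obtain bits, and a toy fitness (the fraction of ones) standing in for the Dice score returned by Algorithm 1. The population size, number of subjects, and number of generations default to the values reported later in the implementation details.

```python
import random

def to_bits(x):
    # Assumed mapping from continuous positions in [0, 1] to the binary encoding.
    return [1 if v >= 0.5 else 0 for v in x]

def clip(x):
    return [min(1.0, max(0.0, v)) for v in x]

def toy_fitness(bits):
    # Stand-in for Algorithm 1 (train the decoded ED model, return its Dice score).
    return sum(bits) / len(bits)

def itlbo(n_pop=20, n_vars=52, n_gen=20, seed=0):
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_vars)] for _ in range(n_pop)]
    fit = [toy_fitness(to_bits(x)) for x in pop]
    history = [max(fit)]

    def greedy(i, cand):
        # Greedy selection: replace learner i only if the candidate is strictly better.
        f = toy_fitness(to_bits(cand))
        if f > fit[i]:
            pop[i], fit[i] = cand, f

    for _ in range(n_gen):
        # Teacher phase (single chief teacher, simplified Eqs. 3-4).
        best = max(range(n_pop), key=fit.__getitem__)
        teacher, t_c = pop[best], fit[best]
        mean = [sum(col) / n_pop for col in zip(*pop)]
        for i in range(n_pop):
            t_f = fit[i] / t_c if t_c else 1.0
            h = rng.randrange(n_pop)
            sign = 1 if fit[h] > fit[i] else -1     # direction of tutorial learning
            r1, r2 = rng.random(), rng.random()
            cand = [p + r1 * (t - t_f * m) + sign * r2 * (q - p)
                    for p, t, m, q in zip(pop[i], teacher, mean, pop[h])]
            greedy(i, clip(cand))
        # Learner phase (Eq. 5): peer interaction plus self-motivated learning.
        best = max(range(n_pop), key=fit.__getitem__)
        for i in range(n_pop):
            h = rng.randrange(n_pop)
            sign = 1 if fit[i] > fit[h] else -1
            r1, r2 = rng.random(), rng.random()
            e = round(1 + rng.random())             # exploration factor E in {1, 2}
            cand = [p + sign * r1 * (p - q) + r2 * (b - e * p)
                    for p, q, b in zip(pop[i], pop[h], pop[best])]
            greedy(i, clip(cand))
        history.append(max(fit))

    best = max(range(n_pop), key=fit.__getitem__)
    return to_bits(pop[best]), fit[best], history

bits, score, history = itlbo()
print(history[0], score)
```

Because of greedy selection, the best fitness recorded in `history` never decreases across generations; in the real method each call to the fitness function trains a network, which is why the caching and early-stopping measures described in the implementation details matter.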
Fig. 5.
The flow diagram of EvoThy-Net based on ITLBO.
Algorithm 1.
The pseudocode for the fitness evaluation of a population
The pseudocode for the fitness evaluation of a population is provided in Algorithm 1. Each learner Pi of population P is decoded into an ED model, as described in Sect. 4.1 [lines 1, 2]. The ED model is then trained on the training dataset for 150 epochs [lines 3 to 14], and its performance is examined on the validation dataset, with the Dice score computed from epoch 80 onward [lines 7 to 12]. The model weights with the highest validation Dice score are saved for testing [lines 9 to 12]. After training, the saved best-performing weights are loaded, and the fitness value is calculated on the test dataset and stored [lines 15 to 17]. Finally, the fitness values of all learners in the population are returned to the proposed method for further evolution of the population during the ITLBO phases.
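The checkpointing logic of Algorithm 1 can be sketched without a deep-learning framework as below. The callables `train_one_epoch`, `validate`, and `test`, and the `build_state` decoder, are hypothetical placeholders for the actual PyTorch training, validation-Dice, and test-Dice routines; the model "weights" are represented as a plain dictionary so that the sketch runs as-is.

```python
import copy

def evaluate_learner(state, train_one_epoch, validate, test, epochs=150, eval_from=80):
    """Sketch of Algorithm 1 for one learner: train, keep best-Dice weights, test."""
    best_dice, best_state = float("-inf"), None
    for epoch in range(1, epochs + 1):
        train_one_epoch(state, epoch)
        if epoch >= eval_from:                  # validation Dice tracked from epoch 80 on
            dice = validate(state)
            if dice > best_dice:                # snapshot the best weights so far
                best_dice, best_state = dice, copy.deepcopy(state)
    # Load the best snapshot (fall back to the final state if none was taken).
    return test(best_state if best_state is not None else state)

def evaluate_population(population, build_state, **routines):
    """Decode every learner into a model state and return its fitness value."""
    return [evaluate_learner(build_state(learner), **routines) for learner in population]

# Toy usage: the "weights" just record the epoch; validation Dice peaks at epoch 100.
def train_one_epoch(state, epoch):
    state["epoch"] = epoch

fitness = evaluate_population(
    [0, 1],
    build_state=lambda learner: {"epoch": 0, "id": learner},
    train_one_epoch=train_one_epoch,
    validate=lambda s: -abs(s["epoch"] - 100),
    test=lambda s: s["epoch"] + s["id"],
)
print(fitness)  # [100, 101]
```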
Experimental results
This section describes the datasets and implementation details used in the evaluation. The authors then present the experimental results of the proposed model and compare it with established state-of-the-art TNS models.
Dataset
The performance of the proposed model was evaluated on two publicly available datasets: the DDTI dataset72 and the TN3K dataset73. The DDTI dataset consists of 637 ultrasound images of thyroid nodules with corresponding nodule masks acquired from a single device, while the TN3K dataset contains 3493 ultrasound images of thyroid nodules with corresponding segmentation labels acquired from various devices. The images were pre-processed by resizing them to 256 × 256 pixels and normalizing the pixel values to the range [0, 1]. The datasets were then randomly split into training (Dtrain), validation (Dval), and test (Dtest) sets, as shown in Table 2. To reduce overfitting caused by the limited dataset size, data augmentation such as random rotations and flips was applied during training. Using these two diverse datasets helps assess how well the proposed model generalizes to ultrasound images of thyroid nodules acquired with different devices and settings.
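The normalization and random split can be sketched as follows, using the split sizes from Table 2. This is a minimal illustration: normalization assumes 8-bit images (division by 255), the seeded shuffle is a hypothetical way to realize the random split, and resizing to 256 × 256 as well as the rotation and flip augmentation are omitted.

```python
import random

def normalize(pixels, max_value=255.0):
    """Scale 8-bit intensities to [0, 1] (assumes uint8-range input)."""
    return [p / max_value for p in pixels]

def random_split(n_images, n_train, n_val, seed=0):
    """Randomly partition image indices into disjoint train/val/test lists."""
    if n_train + n_val > n_images:
        raise ValueError("train + val exceeds dataset size")
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]        # the remainder forms the test set
    return train, val, test

# DDTI: 637 images split into 479 / 63 / 95 (Table 2).
train, val, test = random_split(637, 479, 63)
print(len(train), len(val), len(test))  # 479 63 95
```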
Table 2.
The number of images in train Dtrain, validation Dval, and test Dtest datasets.
| Dataset | Dtrain | Dval | Dtest |
|---|---|---|---|
| DDTI | 479 | 63 | 95 |
| TN3K | 2303 | 576 | 614 |
Evaluation metrics & implementation
The ITLBO algorithm is initialized with a population size Np of 20 learners, 52 subjects (decision variables j) represented as random binary strings, Nt = 3 teachers, and Tg = 20 generations. Early stopping was employed during candidate evaluation to reduce the computational cost and unnecessary training time of poorly performing architectures: if the Dice score did not improve for 10 consecutive epochs, training was terminated early. In addition, candidates whose Dice score remained below 0.1 within the first 20 epochs were discarded, as such models were unlikely to converge to meaningful solutions. Furthermore, the authors implemented a caching mechanism within the evolutionary process to avoid redundant evaluations: if a learner (architecture) had already been evaluated in a previous generation, its fitness value was retrieved from the cache instead of retraining the model, further reducing the total computational cost. All NAS experiments were conducted on a workstation equipped with an Intel Xeon 2.2 GHz CPU, an NVIDIA Quadro P5000 GPU, and 16 GB of RAM. A batch size of 2 and a learning rate of 0.0001 are used throughout training. EvoThy-Net is implemented in Python 3.8 with PyTorch v1.9.0. The performance of the proposed method is evaluated using the Dice score, accuracy, recall, precision, and Intersection over Union (IoU) to provide a comprehensive assessment. These metrics are computed from the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), as given in Eqs. (6) to (10), respectively.
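The early-stopping rules and the fitness cache can be sketched as follows. The thresholds (patience of 10 epochs, a Dice floor of 0.1 checked after the first 20 epochs) come from the text; the function and class names, and keying the cache on the learner's bit string, are assumptions for illustration.

```python
def should_stop(dice_history, patience=10, warmup=20, min_dice=0.1):
    """Decide whether to stop training a candidate early.

    dice_history holds one validation Dice score per epoch, oldest first.
    """
    epoch = len(dice_history)
    # Discard candidates whose Dice stayed below min_dice during the first `warmup` epochs.
    if epoch == warmup and max(dice_history) < min_dice:
        return True
    # Stop when the best score has not improved for `patience` consecutive epochs.
    if epoch > patience:
        best_before = max(dice_history[:-patience])
        if max(dice_history[-patience:]) <= best_before:
            return True
    return False

class FitnessCache:
    """Memoize fitness by the learner's bit string to skip re-training duplicates."""

    def __init__(self, evaluate):
        self.evaluate = evaluate
        self.store = {}
        self.misses = 0

    def __call__(self, bits):
        key = "".join(map(str, bits))
        if key not in self.store:
            self.misses += 1                # only unseen learners are actually trained
            self.store[key] = self.evaluate(bits)
        return self.store[key]

cache = FitnessCache(lambda bits: sum(bits) / len(bits))  # toy evaluator
cache([1, 0, 1, 1])
cache([1, 0, 1, 1])
print(cache.misses)  # 1
```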
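$$\mathrm{Dice} = \frac{2\,TP}{2\,TP + FP + FN} \tag{6}$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{8}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{9}$$
$$\mathrm{IoU} = \frac{TP}{TP + FP + FN} \tag{10}$$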
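The metric definitions and the candidate-evaluation rules described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: `train_one_epoch` stands for a hypothetical callable that trains the decoded architecture for one epoch and returns its validation Dice score, the 60-epoch cap is taken from the ITLBO comparison experiments (60 epochs per model), and assigning a fitness of 0 to discarded candidates is our assumption.

```python
from typing import Callable, Dict, Sequence, Tuple

import numpy as np


def segmentation_metrics(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> Dict[str, float]:
    """Dice, accuracy, recall, precision, and IoU from the TP/TN/FP/FN counts of binary masks."""
    p, t = pred.astype(bool), target.astype(bool)
    tp = float(np.sum(p & t))
    tn = float(np.sum(~p & ~t))
    fp = float(np.sum(p & ~t))
    fn = float(np.sum(~p & t))
    return {
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
        "accuracy": (tp + tn) / (tp + tn + fp + fn + eps),
        "recall": tp / (tp + fn + eps),
        "precision": tp / (tp + fp + eps),
        "iou": tp / (tp + fp + fn + eps),
    }


def evaluate_candidate(
    genome: Sequence[int],
    train_one_epoch: Callable[[int], float],  # hypothetical: trains one epoch, returns val Dice
    cache: Dict[Tuple[int, ...], float],
    max_epochs: int = 60,
    patience: int = 10,
    warmup_epochs: int = 20,
    min_dice: float = 0.1,
) -> float:
    """Fitness of one learner, with caching and the two early-stopping rules."""
    key = tuple(genome)
    if key in cache:  # architecture already evaluated in an earlier generation
        return cache[key]
    best, stale = 0.0, 0
    for epoch in range(1, max_epochs + 1):
        dice = train_one_epoch(epoch)
        if dice > best:
            best, stale = dice, 0
        else:
            stale += 1
        if epoch == warmup_epochs and best < min_dice:
            best = 0.0  # assumption: discarded candidates receive zero fitness
            break
        if stale >= patience:  # no improvement for `patience` consecutive epochs
            break
    cache[key] = best
    return best
```

Keying the cache on the binary string means that an identical architecture reappearing in a later generation costs a dictionary lookup instead of a full training run.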
Comparison with existing models
The architecture discovered by EvoThy-Net was evaluated on both the DDTI and TN3K datasets and compared with several state-of-the-art segmentation methods: SegNet74, Deeplabv3+75, U-net13, CPFNet76, Sgunet21, and TRFE+73. CPFNet and Deeplabv3+ were initialized with ImageNet-pretrained ResNet backbones77, and SegNet with an ImageNet-pretrained VGG backbone12; these models were then fine-tuned in the same experimental environment as the proposed method. The remaining models do not rely on pretrained backbones and were therefore trained from scratch, following their original implementations. All models, including EvoThy-Net, were trained and evaluated under identical conditions (the same datasets, input resolution, data splits, number of training epochs, and evaluation metrics) to ensure a fair comparison. The results of the comparative evaluation are presented in Table 3, and the segmentation results of the discovered model and the existing models on the DDTI and TN3K datasets are shown in Figs. 6 and 7, respectively.
Table 3.
Quantitative comparison with previous models.
| Dataset | Model | Accuracy | Precision | Recall | Dice score | IoU |
|---|---|---|---|---|---|---|
| DDTI | U-net13 | 0.8736 | 0.8680 | 0.7638 | 0.6955 | 0.6097 |
|  | SegNet74 | 0.9056 | 0.8942 | 0.8131 | 0.7114 | 0.6268 |
| Deeplabv3+75 | 0.8918 | 0.8248 | 0.7310 | 0.6179 | ||
|  | CPFNet76 | 0.9166 | 0.8809 | 0.8415 | 0.7558 | 0.6468 |
|  | Sgunet21 | 0.9133 | 0.9049 | 0.8353 | 0.7496 | 0.6300 |
|  | TRFE+73 | 0.9226 | 0.9124 | 0.8561 | 0.7577 | 0.6548 |
|  | EvoThy-Net (Proposed) | **0.9395** | **0.9181** | **0.8720** | **0.7748** | **0.6609** |
| TN3K | U-net13 | 0.9436 | 0.8006 | 0.8310 | 0.7732 | 0.7099 |
|  | SegNet74 | 0.9650 | 0.8055 | 0.8500 | 0.7888 | 0.6881 |
|  | Deeplabv3+75 | **0.9706** | 0.8321 | 0.8784 | 0.8280 | 0.7382 |
|  | CPFNet76 | 0.9691 | 0.8174 | 0.8706 | 0.8103 | 0.7249 |
|  | Sgunet21 | 0.9664 | 0.8087 | 0.8590 | 0.7981 | 0.7124 |
|  | TRFE+73 | 0.9640 | 0.8390 | 0.8801 | 0.8306 | 0.7443 |
|  | EvoThy-Net (Proposed) | 0.9682 | **0.8421** | **0.8813** | **0.8346** | **0.7458** |
Note: The highest values for each dataset are highlighted in bold.
Fig. 6.
Segmentation results of DDTI dataset.
Fig. 7.
Segmentation results of TN3K dataset.
Table 3 and Figs. 6 and 7 show that the U-net model underperforms, likely because it lacks prior knowledge about the location of the thyroid gland. The other compared models also perform worse and tend to misclassify non-nodule regions as nodules. EvoThy-Net achieves the highest precision, recall, Dice score, and IoU on both the DDTI and TN3K datasets, and the highest accuracy on DDTI; on TN3K, its accuracy (0.9682) is slightly below that of Deeplabv3+ (0.9706) and CPFNet (0.9691). Qualitatively, EvoThy-Net delineates thyroid nodules more precisely and produces fewer false-positive regions. This performance can be attributed to three components. The attention blocks enable the model to focus on key regions, thereby improving feature extraction and localization. The distinct block structures within the encoder-decoder architecture adapt feature processing at each stage, leading to more precise segmentation. The ITLBO algorithm optimizes the network structure during the evolutionary process, enabling efficient adaptation and improved performance. These qualitative and quantitative results support the effectiveness of EvoThy-Net in segmenting thyroid nodules from ultrasound images and suggest a promising direction for further work in medical image analysis and computer-aided diagnosis of thyroid conditions.
Experimental analysis and discussion
In this section, the authors provide the analysis and discussion of the experimental results and the evolution process.
The number of parameters, model size, and execution time of the proposed EvoThy-Net model were measured and compared with those of existing state-of-the-art models, as shown in Table 4. EvoThy-Net has approximately 9 million parameters and a model size of 34 MB, substantially smaller than U-net, CPFNet, Deeplabv3+, SegNet, Sgunet, and TRFE+, whose sizes are 118 MB, 123 MB, 226 MB, 112 MB, 51 MB, and 171 MB, respectively. EvoThy-Net also has the lowest inference time among the compared models, about 20 ms per image on the GPU, and about 120 ms per image on an Intel i5 CPU. Its smaller parameter count, model size, and inference time translate into lower memory and computational requirements, which is particularly advantageous for deployment on devices with limited hardware capabilities, including in regions with restricted access to advanced hardware and infrastructure. The shorter inference time can also support faster decision-making in clinical environments. These properties make EvoThy-Net well suited to a range of healthcare settings. By improving efficiency without sacrificing accuracy, EvoThy-Net could help broaden access to ultrasound-based assessment of thyroid nodules, particularly in resource-limited regions.
Table 4.
Comparison of parameters, size and execution time of models.
| Model | No. of parameters | Model size | Inference time per image (GPU) |
|---|---|---|---|
| U-net | 31 M | 118 MB | 45 ms |
| CPFNet | 32 M | 123 MB | 45 ms |
| Deeplabv3+ | 59 M | 226 MB | 80 ms |
| SegNet | 29 M | 112 MB | 40 ms |
| Sgunet | 13 M | 51 MB | 25 ms |
| TRFE+ | 44 M | 171 MB | 50 ms |
| EvoThy-Net (Proposed) | 9 M | 34 MB | 20 ms |
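As a rough consistency check, a model's weight size can be estimated from its parameter count. The sketch below assumes the weights are stored as 32-bit floats (4 bytes each) and reports sizes in MiB; the storage format is our assumption, as the text does not state it. With the rounded parameter counts of Table 4, it gives about 34 MiB for EvoThy-Net and 118 MiB for U-net, close to the reported sizes. For a PyTorch model, the count itself is commonly obtained as `sum(p.numel() for p in model.parameters())`.

```python
def float32_size_mib(num_params: float) -> float:
    """Approximate weight size in MiB, assuming 4 bytes per float32 parameter."""
    return num_params * 4 / 2**20


# Rounded parameter counts taken from Table 4.
for name, params in [("EvoThy-Net", 9e6), ("U-net", 31e6), ("Deeplabv3+", 59e6)]:
    print(f"{name}: {float32_size_mib(params):.0f} MiB")
```

Because the parameter counts in Table 4 are rounded to millions, the estimates can differ from the reported sizes by a few MB.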
The performance of the proposed ITLBO algorithm was compared with that of other evolutionary algorithms: GA46, DE49, and Teaching-Learning-Based Optimization (TLBO)67. All algorithms were run for 20 generations, with each model trained for 60 epochs. As shown in Table 5, ITLBO achieved the lowest standard deviation and the highest mean and maximum Dice scores among the compared algorithms. The non-parametric Wilcoxon rank-sum test78 was used to assess statistical significance; as shown in Table 6, all p-values are well below 0.05, indicating that the differences are statistically significant. These findings indicate that ITLBO is an effective search strategy for the TNS task in this setting.
Table 5.
Comparison with other evolutionary algorithms.
| Algorithm | Standard deviation | Mean Dice score | Max Dice score |
|---|---|---|---|
| GA | 0.3928 | 0.7872 | 0.8102 |
| DE | 0.2163 | 0.8023 | 0.8289 |
| TLBO | 0.1739 | 0.8173 | 0.8298 |
| ITLBO | 0.0921 | 0.8247 | 0.8346 |
Table 6.
Statistical test results.
| Algorithm | p-value |
|---|---|
| ITLBO vs. GA | 1.6546e-05 |
| ITLBO vs. DE | 1.4653e-06 |
| ITLBO vs. TLBO | 1.0763e-05 |
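For reference, the two-sided Wilcoxon rank-sum test can be computed with the normal approximation shown below. This is a self-contained sketch rather than the authors' code: it assigns average ranks to ties and applies no tie or continuity correction, which, to our understanding, matches the behaviour of SciPy's `scipy.stats.ranksums`. Which samples were compared for Table 6 (for example, Dice scores collected during each algorithm's run) is not stated here, so the inputs in the example are illustrative.

```python
import math
from typing import Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test; returns (z statistic, p-value)."""
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based ranks; tied values share their mean rank
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(ranks[:n1])  # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (r1 - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Illustrative only: every value of the first sample ranks below the second.
z, p = rank_sum_test([0.80, 0.81, 0.82], [0.83, 0.84, 0.85])
print(f"z = {z:.3f}, p = {p:.4f}")
```

The test only uses the ordering of the values, so it makes no normality assumption about the Dice scores, which is why a non-parametric test suits bounded metrics like these.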
Table 7 summarizes the key distinctions between the proposed EvoThy-Net framework and existing NAS-based segmentation methods. Differentiable NAS approaches such as Auto-DeepLab42 optimize cell-level topology within a fixed macro-architecture and demand substantial computational resources; in contrast, EvoThy-Net employs an evolutionary NAS strategy that jointly optimizes the block-level architecture and the training parameters within a designed search space. Unlike EvoU-Net46, Genetic U-Net47, and DEvoNet49, which typically reuse a single block type or vary only the network depth, EvoThy-Net supports heterogeneous block selection, so that different encoder and decoder stages can adopt distinct block types, normalization layers, and activation functions. Furthermore, unlike these methods, EvoThy-Net incorporates attention mechanisms directly into the NAS process by treating attention blocks as searchable components, allowing the ITLBO algorithm to determine both the attention type and its stage-wise placement. This integrated optimization strategy yields lightweight, task-adaptive architectures with improved segmentation performance.
Table 7.
Comparison of EvoThy-Net with existing NAS-based segmentation models.
| Method | NAS strategy | Search granularity | Architecture flexibility | Attention integration | Hyperparameters in search | Optimization algorithm |
|---|---|---|---|---|---|---|
| Auto-DeepLab | Differentiable NAS | Cell-level | Fixed macro structure | Not included | No | Gradient-based (DARTS) |
| EvoUNet | Evolutionary NAS | Network-level | Variable depth, fixed blocks | Not included | Partial | GA |
| Genetic U-Net | Evolutionary NAS | Network-level | Fixed U-Net backbone | Not included | No | GA |
| DEvoNet | Evolutionary NAS | Block-level | Same block repeated | Not included | Yes | DE |
| EvoThy-Net (Proposed) | Evolutionary NAS | Block-level (Distinct blocks) | Fully flexible ED structure | Yes (Searchable attention blocks) | Yes (Loss + Optimizer) | ITLBO |
Figure 8 shows the generation-wise progression of the fitness (Dice) scores of the GA, DE, traditional TLBO, and ITLBO algorithms during the evolutionary process. ITLBO improved steadily and achieved higher fitness values than the other algorithms throughout the generations, indicating that it explores the search space and optimizes network structures more effectively for the TNS task.
Fig. 8.
Comparison of fitness scores during evolution.
The aim of our approach is to determine the optimal parameters for constructing the encoder-decoder network, together with its hyperparameters, for TNS. Figure 9 shows the frequency distribution of the blocks that appeared in the top architectures for the DDTI and TN3K datasets during the evolutionary process. Notably, block IDs 2 and 3 appeared more frequently than the other blocks on both datasets throughout the evolution.
Fig. 9.
Frequency of blocks during evolution.
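A frequency distribution such as the one in Fig. 9 can be tallied from the decoded top architectures as sketched below. The dictionary layout of a decoded architecture (one entry per design choice, listing the option used at each encoder/decoder stage), the option names, and the example values are hypothetical, and counting every stage occurrence rather than once per architecture is our assumption. The same helper applies to the normalization, activation, attention, loss, and optimizer choices discussed in the following paragraphs.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List


def choice_frequencies(architectures: Iterable[Dict[str, List[Hashable]]], key: str) -> Dict[Hashable, float]:
    """Relative frequency of each option of one design choice over all stages of all architectures."""
    counts: Counter = Counter()
    for arch in architectures:
        counts.update(arch[key])  # one count per stage that uses the option
    total = sum(counts.values())
    return {option: n / total for option, n in sorted(counts.items())}


# Hypothetical decoded architectures with four stages each.
top_architectures = [
    {"block": [2, 3, 3, 1], "norm": ["INSTANCE", "INSTANCE", "BATCH", "INSTANCE"]},
    {"block": [3, 2, 4, 2], "norm": ["INSTANCE", "GROUP", "INSTANCE", "INSTANCE"]},
]
print(choice_frequencies(top_architectures, "block"))
```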
The effectiveness of the network depends not only on the input-output relationships defined by the individual blocks but also on the choice of normalization layers and activation functions; a poor choice of these components can cause training to stagnate. Figures 10 and 11 show the frequency distributions of normalization layers and activation functions, respectively, during the evolutionary process on the two datasets. The MEMSWISH activation function and the INSTANCE normalization layer were selected more frequently than the other options.
Fig. 10.
Frequency of normalization layers during evolution.
Fig. 11.
Frequency of activation functions during evolution.
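For readers unfamiliar with the two most frequently selected options, the sketch below shows what they compute. It assumes that MEMSWISH is a memory-efficient implementation of the Swish activation, x·sigmoid(x), that differs only in how the backward pass is stored, and that INSTANCE refers to instance normalization without learnable affine parameters (the default of PyTorch's `torch.nn.InstanceNorm2d`). Both are assumptions about the search-space options, written in NumPy for illustration.

```python
import numpy as np


def swish(x: np.ndarray) -> np.ndarray:
    """Swish / SiLU: x * sigmoid(x); sigmoid written via tanh to avoid overflow."""
    return x * 0.5 * (1.0 + np.tanh(0.5 * x))


def instance_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each (sample, channel) feature map of an (N, C, H, W) array separately."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)  # biased variance, as in PyTorch
    return (x - mean) / np.sqrt(var + eps)


features = np.random.default_rng(0).normal(3.0, 2.0, size=(2, 4, 8, 8))
out = swish(instance_norm(features))
print(out.shape)
```

Unlike batch normalization, instance normalization computes statistics per image, so it behaves the same with the batch size of 2 used here as with larger batches.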
Integrating attention blocks improves the segmentation performance of the proposed method (Table 9). Figure 12 shows the frequency distribution of each attention block during the evolutionary process. Attention blocks 2 and 3 appear most frequently, suggesting that they contribute most to the best-performing architectures.
Fig. 12.
Frequency of attention blocks during evolution.
In addition, the training parameters, namely the loss function and optimizer, were also selected by the proposed method. Figure 13 shows the frequency distribution of loss functions during the search process: the Dice loss was selected more often than the other loss functions on both datasets. Similarly, Fig. 14 shows the frequency distribution of optimizers during the search process, revealing that AdamW was selected more often than the other optimizers on both datasets.
Fig. 13.
Frequency of loss functions during evolution.
Fig. 14.
Frequency of optimizers during evolution.
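The Dice loss directly optimizes the overlap measure used for evaluation. A common soft formulation, computed on predicted foreground probabilities, is sketched below in NumPy; the smoothing constant and the exact variant are assumptions, since the search-space definition of the loss is not repeated in this section. In a PyTorch training loop, the most frequently selected optimizer corresponds to `torch.optim.AdamW`.

```python
import numpy as np


def soft_dice_loss(probs: np.ndarray, target: np.ndarray, smooth: float = 1.0) -> float:
    """1 - soft Dice between foreground probabilities in [0, 1] and a binary mask."""
    p = probs.astype(np.float64).ravel()
    t = target.astype(np.float64).ravel()
    intersection = np.sum(p * t)
    denominator = np.sum(p) + np.sum(t)
    # `smooth` keeps the loss defined (and zero) when both prediction and mask are empty
    return float(1.0 - (2.0 * intersection + smooth) / (denominator + smooth))


mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0
print(soft_dice_loss(mask, mask), soft_dice_loss(np.zeros((4, 4)), mask))
```

Because the loss is normalized by the total foreground mass, it is less dominated by the large background area of an ultrasound image than pixel-wise cross-entropy.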
Figure 15 shows the Dice scores during the training of the discovered model on the DDTI and TN3K datasets. Both the training and validation Dice scores of EvoThy-Net improve steadily across epochs, indicating that the discovered architecture trains stably for segmenting thyroid nodules in ultrasound images.
Fig. 15.
Training and validation dice scores.
An ablation study was conducted to evaluate the impact of using distinct block structures instead of a fixed configuration within the encoder-decoder network. Each of the four block types in the defined search space was used as a fixed configuration and evaluated independently on the DDTI and TN3K datasets, as presented in Table 8. The effectiveness of individual blocks varies across datasets: for instance, Block 3 achieved the best result on the DDTI dataset (0.7655), whereas Block 4 performed best on the TN3K dataset (0.8236). This variation indicates that no single block configuration performs best across all scenarios. In contrast, the proposed EvoThy-Net, which combines distinct blocks within the encoder-decoder framework, outperforms all single-block configurations on both datasets. This suggests that heterogeneous block structures allow the model to capture more diverse feature representations, leading to improved segmentation performance.
Table 8.
Block-wise performance.
| Block | DDTI | TN3K |
|---|---|---|
| Block1 | 0.7578 | 0.8175 |
| Block2 | 0.7503 | 0.8176 |
| Block3 | 0.7655 | 0.8187 |
| Block4 | 0.7598 | 0.8236 |
Furthermore, the authors conducted additional experiments comparing the proposed model's performance when features were combined using attention blocks, addition, or concatenation. As shown in Table 9, attention blocks gave the best results on both datasets, improving on concatenation by 0.0135 on DDTI and 0.0114 on TN3K. This improvement can likely be attributed to the ability of attention blocks to learn to emphasize the most informative features, thereby enhancing the model's representational power and segmentation quality.
Table 9.
Performance with addition, concatenation, and attention blocks (Dice score).
| Fusion method | DDTI | TN3K |
|---|---|---|
| Concatenation | 0.7613 | 0.8232 |
| Addition | 0.7489 | 0.8038 |
| Attention Blocks | 0.7748 | 0.8346 |
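To make the comparison in Table 9 concrete, the sketch below contrasts the three ways of combining two feature maps of the same spatial size. The attention variant is the additive attention gate popularized by Attention U-Net, used here only as a representative example: the attention blocks in the EvoThy-Net search space are defined elsewhere in the paper and may differ. Bias terms are omitted, the weights are random placeholders rather than learned values, and 1 × 1 convolutions are written as channel-mixing matrix products.

```python
import numpy as np


def conv1x1(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1x1 convolution of a (C_in, H, W) map with a (C_out, C_in) weight matrix."""
    return np.einsum("oc,chw->ohw", w, x)


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def fuse_add(skip: np.ndarray, gate: np.ndarray) -> np.ndarray:
    return skip + gate  # element-wise sum: channel count unchanged


def fuse_concat(skip: np.ndarray, gate: np.ndarray) -> np.ndarray:
    return np.concatenate([skip, gate], axis=0)  # channels stacked: C_skip + C_gate


def fuse_attention(skip, gate, w_x, w_g, w_psi):
    """Additive attention gate: re-weight skip features with a spatial map in (0, 1)."""
    inter = np.maximum(conv1x1(skip, w_x) + conv1x1(gate, w_g), 0.0)  # ReLU
    alpha = sigmoid(conv1x1(inter, w_psi))  # (1, H, W) attention coefficients
    return skip * alpha, alpha  # alpha broadcasts over the channel axis


rng = np.random.default_rng(0)
skip, gate = rng.normal(size=(4, 8, 8)), rng.normal(size=(4, 8, 8))
w_x, w_g, w_psi = rng.normal(size=(2, 4)), rng.normal(size=(2, 4)), rng.normal(size=(1, 2))
out, alpha = fuse_attention(skip, gate, w_x, w_g, w_psi)
print(fuse_add(skip, gate).shape, fuse_concat(skip, gate).shape, out.shape, alpha.shape)
```

Addition and concatenation treat every spatial position equally, whereas the gate learns a per-pixel weight, which is one plausible reason a learned fusion can suppress responses from irrelevant tissue.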
Conclusion and future enhancement
In this paper, a novel evolutionary NAS-based framework called EvoThy-Net is proposed for thyroid nodule segmentation in ultrasound images. The proposed method uses an encoder-decoder network as the backbone and incorporates attention blocks to enhance segmentation performance. ITLBO is employed to explore the designed search space and find the optimal block structure for the ED network, along with training parameters that improve segmentation performance. Experimental results on the DDTI and TN3K datasets show that EvoThy-Net outperforms existing state-of-the-art TNS models in precision, recall, Dice score, and IoU on both datasets, and in accuracy on DDTI. These results demonstrate the effectiveness of EvoThy-Net in accurately segmenting thyroid nodules in ultrasound images. Furthermore, the proposed model is computationally efficient and smaller than existing models. The significance of our work lies in its potential to address practical challenges associated with deploying TNS models in resource-constrained environments. Future research will focus on exploring weight-sharing techniques to further improve training efficiency and on applying EvoThy-Net to medical image segmentation tasks beyond TNS, which could contribute to the diagnosis and treatment of a wider range of conditions.
Author contributions
Naga Sujini Ganne: Conceptualization, Methodology/Study design, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing - original draft, Writing - review and editing, Visualization. Sivadi Balakrishna: Conceptualization, Methodology/Study design, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing - review and editing, Visualization, Supervision.
Data availability
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Wei, X. et al. Ensemble deep learning model for multicenter classification of thyroid nodules on ultrasound images. Med. Sci. Monitor: Int. Med. J. Exp. Clin. Res. 26, e926096 (2020).
- 2. Rahmandinof, A., Nazir, F. & Dwihapsari, Y. Image segmentation of thyroid SPECT using edge-based active contour model. in Journal of Physics: Conference Series, Vol. 1505, 012049 (IOP Publishing, 2020).
- 3. Huang, W. Segmentation and diagnosis of papillary thyroid carcinomas based on generalized clustering algorithm in ultrasound elastography. J. Med. Syst. 44 (1), 13 (2020).
- 4. Abbasian Ardakani, A. et al. A hybrid multilayer filtering approach for thyroid nodule segmentation on ultrasound images. J. Ultrasound Med. 38 (3), 629–640 (2019).
- 5. An, H., Ji, T., Lee, H. Y. & Im, I. C. The prevalence of thyroid nodules and the morphological analysis of malignant nodules on ultrasonography. J. Radiological Sci. Technol. 42 (3), 201–207 (2019).
- 6. Esmaeili, E. N. et al. Parametrical modelling for texture characterization: a novel approach applied to ultrasound thyroid segmentation. PloS One 14 (1), e0211215 (2019).
- 7. Shahroudnejad, A. et al. Thyroid nodule segmentation and classification using deep convolutional neural network and rule-based classifiers. in 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC) 3118–3121 (IEEE, 2021).
- 8. Garcia-Garcia, A. et al. A review on deep learning techniques applied to semantic segmentation. arXiv preprint arXiv:1704.06857 (2017).
- 9. Ma, J. et al. Ultrasound image-based thyroid nodule automatic segmentation using convolutional neural networks. Int. J. Comput. Assist. Radiol. Surg. 12, 1895–1910 (2017).
- 10. Xu, L. et al. Dw-net: A cascaded convolutional neural network for apical four-chamber view segmentation in fetal echocardiography. Comput. Med. Imaging Graph. 80, 101690 (2020).
- 11. Ying, X. et al. Thyroid nodule segmentation in ultrasound images based on cascaded convolutional neural network. in Neural Information Processing: 25th International Conference, ICONIP 2018, Siem Reap, Cambodia, December 13–16, 2018, Proceedings, Part VI 373–384 (Springer, 2018).
- 12. Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
- 13. Ronneberger, O., Fischer, P. & Brox, T. U-net: Convolutional networks for biomedical image segmentation. in International Conference on Medical Image Computing and Computer-Assisted Intervention 234–241 (Springer, 2015).
- 14. Song, W. et al. Multitask cascade convolution neural networks for automatic thyroid nodule detection and recognition. IEEE J. Biomedical Health Inf. 23 (3), 1215–1224 (2018).
- 15. Liao, X. et al. Thyroid nodule segmentation in ultrasound images using a modified u-net and conditional random field model. Appl. Sci. 9 (12), 2561 (2019).
- 16. Ding, J., Huang, Z., Shi, M. & Ning, C. Automatic thyroid ultrasound image segmentation based on u-shaped network. in 2019 12th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) 1–5 (IEEE, 2019).
- 17. Liu, T. et al. Automated detection and classification of thyroid nodules in ultrasound images using clinical-knowledge-guided convolutional neural networks. Med. Image Anal. 58, 101555 (2019).
- 18. Kumar, V. et al. Automated segmentation of thyroid nodule, gland, and cystic components from ultrasound images using deep learning. IEEE Access 8, 63482–63496 (2020).
- 19. Wang, L., Zhang, L., Zhu, M., Qi, X. & Yi, Z. Automatic diagnosis for thyroid nodules in ultrasound images by deep neural networks. Med. Image Anal. 61, 101665 (2020).
- 20. Abdolali, F. et al. Automated thyroid nodule detection from ultrasound imaging using deep convolutional neural networks. Comput. Biol. Med. 122, 103871 (2020).
- 21. Pan, H., Zhou, Q. & Latecki, L. J. Sgunet: Semantic guided unet for thyroid nodule segmentation. in 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI) 630–634 (IEEE, 2021).
- 22. Pathak, R. K., Mishra, S., Sharan, P. & Roy, S. K. Nodule detection in infrared thermography using deep learning. in 2022 IEEE 7th International Conference for Convergence in Technology (I2CT) 1–6 (IEEE, 2022).
- 23. Kang, Q. et al. Thyroid nodule segmentation and classification in ultrasound images through intra- and inter-task consistent learning. Med. Image Anal. 79, 102443 (2022).
- 24. Jin, Z. et al. Boundary regression-based deep neural network for thyroid nodule segmentation in ultrasound images. Neural Comput. Appl. 1–10 (2022).
- 25. Nguyen, D. T., Choi, J. & Park, K. R. Thyroid nodule segmentation in ultrasound image based on information fusion of suggestion and enhancement networks. Mathematics 10, 3484 (2022).
- 26. Song, R. et al. Dual-branch network via pseudo-label training for thyroid nodule detection in ultrasound image. Appl. Intell. 52 (10), 11738–11754 (2022).
- 27. Sun, J. et al. Tnsnet: thyroid nodule segmentation in ultrasound imaging using soft shape supervision. Comput. Methods Programs Biomed. 215, 106600 (2022).
- 28. Chen, F., Ye, H., Zhang, D. & Liao, H. Typeseg: A type-aware encoder-decoder network for multi-type ultrasound images co-segmentation. Comput. Methods Programs Biomed. 214, 106580 (2022).
- 29. Lu, J. et al. Gan-guided deformable attention network for identifying thyroid nodules in ultrasound images. IEEE J. Biomedical Health Inf. 26 (4), 1582–1590 (2022).
- 30. Tao, Z. et al. Local and context-attention adaptive lca-net for thyroid nodule segmentation in ultrasound images. Sensors 22 (16), 5984 (2022).
- 31. Kunapinun, A. et al. Improving Gan learning dynamics for thyroid nodule segmentation. Ultrasound Med. Biol. 49 (2), 416–430 (2023).
- 32. Sun, X., Wei, B., Jiang, Y., Mao, L. & Zhao, Q. CLIP-TNseg: A Multi-Modal Hybrid Framework for Thyroid Nodule Segmentation in Ultrasound Images. IEEE Signal Processing Letters (2025).
- 33. Xiang, Z. et al. Federated learning via multi-attention guided UNet for thyroid nodule segmentation of ultrasound images. Neural Netw. 181, 106754 (2025).
- 34. Banerjee, T. et al. A novel hybrid deep learning approach combining deep feature attention and statistical validation for enhanced thyroid ultrasound segmentation. Sci. Rep. 15 (1), 27207 (2025).
- 35. Mou, L. et al. Dense dilated network with probability regularized walk for vessel detection. IEEE Trans. Med. Imaging 39 (5), 1392–1403 (2019).
- 36. Rajesh, C. & Kumar, S. Automatic retinal vessel segmentation using BTLBO. in Soft Computing for Problem Solving: Proceedings of the SocProS 2022 189–200 (Springer, 2023).
- 37. Liu, H., Simonyan, K. & Yang, Y. Darts: Differentiable architecture search. arXiv preprint arXiv:1806.09055 (2018).
- 38. Rajesh, C., Sadam, R. & Kumar, S. An evolutionary chameleon swarm algorithm based network for 3d medical image segmentation. Expert Syst. Appl. 239, 122509 (2024).
- 39. Zoph, B., Vasudevan, V., Shlens, J. & Le, Q. V. Learning transferable architectures for scalable image recognition. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 8697–8710 (2018).
- 40. Mortazi, A. & Bagci, U. Automatically designing cnn architectures for medical image segmentation. in International Workshop on Machine Learning in Medical Imaging 98–106 (Springer, 2018).
- 41. Mou, L. et al. Cs2-net: deep learning segmentation of curvilinear structures in medical imaging. Med. Image Anal. 67, 101874 (2021).
- 42. Liu, C. et al. Auto-deeplab: Hierarchical neural architecture search for semantic image segmentation. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 82–92 (2019).
- 43. Weng, Y., Zhou, T., Li, Y. & Qiu, X. Nas-unet: neural architecture search for medical image segmentation. IEEE Access 7, 44247–44257 (2019).
- 44. Kim, S. et al. Scalable neural architecture search for 3d medical image segmentation. in International Conference on Medical Image Computing and Computer-Assisted Intervention 220–228 (Springer, 2019).
- 45. Fan, Z., Wei, J., Zhu, G., Mo, J. & Li, W. Evolutionary neural architecture search for retinal vessel segmentation. arXiv preprint arXiv:2001.06678 (2020).
- 46. Hassanzadeh, T., Essam, D. & Sarker, R. Evou-net: An evolutionary deep fully convolutional neural network for medical image segmentation. in Proceedings of the 35th Annual ACM Symposium on Applied Computing 181–189 (2020).
- 47. Wei, J. et al. Genetic U-Net: Automatically designed deep networks for retinal vessel segmentation using a genetic algorithm. IEEE Transactions on Medical Imaging.
- 48. Baldeon-Calisto, M. & Lai-Yuen, S. K. Adaresu-net: multiobjective adaptive convolutional neural network for medical image segmentation. Neurocomputing 392, 325–340 (2020).
- 49. Rajesh, C. & Kumar, S. An evolutionary block based network for medical image denoising using differential evolution. Appl. Soft Comput. 108776 (2022).
- 50. Rajesh, C., Sadam, R. & Kumar, S. A modified genetic algorithm based encoder-decoder network for brain tumor detection in 3D medical images. ACM Trans. Evol. Learn. (2025).
- 51. Rao, R. V. & Patel, V. An improved teaching-learning-based optimization algorithm for solving unconstrained optimization problems. Scientia Iranica 20 (3), 710–720 (2013).
- 52. Wang, L. et al. An improved teaching–learning-based optimization with neighborhood search for applications of ann. Neurocomputing 143, 231–247 (2014).
- 53. Yu, K., Wang, X. & Wang, Z. An improved teaching-learning-based optimization algorithm for numerical and engineering optimization problems. J. Intell. Manuf. 27 (4), 831–843 (2016).
- 54. Ji, X., Ye, H., Zhou, J., Yin, Y. & Shen, X. An improved teaching-learning based optimization algorithm and its application to a combinatorial optimization problem in foundry industry. Appl. Soft Comput. 57, 504–516 (2017).
- 55. Wu, D. et al. An improved teaching-learning-based optimization algorithm with reinforcement learning strategy for solving optimization problems. Comput. Intell. Neurosci. (2022).
- 56. Myronenko, A. 3D MRI brain tumor segmentation using autoencoder regularization. in International MICCAI Brainlesion Workshop 311–320 (Springer International Publishing, 2018).
- 57. Hatamizadeh, A. et al. Unetr: Transformers for 3d medical image segmentation. in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision 574–584 (2022).
- 58. Isensee, F. et al. nnu-net: Self-adapting framework for u-net-based medical image segmentation. arXiv preprint arXiv:1809.10486 (2018).
- 59. Wang, G. et al. A noise-robust framework for automatic segmentation of COVID-19 pneumonia lesions from CT images. IEEE Trans. Med. Imaging 39 (8), 2653–2663 (2020).
- 60. Ioffe, S. & Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. in International Conference on Machine Learning 448–456 (PMLR, 2015).
- 61. Ulyanov, D., Vedaldi, A. & Lempitsky, V. Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022 (2016).
- 62. Wu, Y. & He, K. Group normalization. in Proceedings of the European Conference on Computer Vision (ECCV) 3–19 (2018).
- 63. Nair, V. & Hinton, G. E. Rectified linear units improve restricted Boltzmann machines. in ICML (2010).
- 64. Ramachandran, P., Zoph, B. & Le, Q. V. Searching for activation functions. arXiv preprint arXiv:1710.05941 (2017).
- 65. He, K., Zhang, X., Ren, S. & Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. in Proceedings of the IEEE International Conference on Computer Vision 1026–1034 (2015).
- 66. Maas, A. L. et al. Rectifier nonlinearities improve neural network acoustic models. in Proc. ICML, Vol. 30, 3 (2013).
- 67. Rajesh, C., Sadam, R. & Kumar, S. An evolutionary u-shaped network for retinal vessel segmentation using binary teaching–learning-based optimization. Biomed. Signal Process. Control 83, 104669 (2023).
- 68. Zhao, R. et al. Rethinking dice loss for medical image segmentation. in 2020 IEEE International Conference on Data Mining (ICDM) 851–860 (IEEE, 2020).
- 69. Yeung, M., Sala, E., Schönlieb, C. B. & Rundo, L. Unified focal loss: generalising dice and cross entropy-based losses to handle class imbalanced medical image segmentation. Comput. Med. Imaging Graph. 95, 102026 (2022).
- 70. Lin, T. Y., Goyal, P., Girshick, R., He, K. & Dollár, P. Focal loss for dense object detection. in Proceedings of the IEEE International Conference on Computer Vision 2980–2988 (2017).
- 71. Ma, J. et al. Loss odyssey in medical image segmentation. Med. Image Anal. 71, 102035 (2021).
- 72. Pedraza, L. et al. An open access thyroid ultrasound image database. in 10th International Symposium on Medical Information Processing and Analysis Vol. 9287, 188–193 (SPIE, 2015).
- 73. Gong, H. et al. Thyroid region prior guided attention for ultrasound segmentation of thyroid nodules. Comput. Biol. Med. 106389 (2022).
- 74. Badrinarayanan, V., Kendall, A. & Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39 (12), 2481–2495 (2017).
- 75. Chen, L. C., Zhu, Y., Papandreou, G., Schroff, F. & Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. in Proceedings of the European Conference on Computer Vision (ECCV) 801–818 (2018).
- 76. Feng, S. et al. Cpfnet: context pyramid fusion network for medical image segmentation. IEEE Trans. Med. Imaging 39 (10), 3008–3018 (2020).
- 77. Targ, S., Almeida, D. & Lyman, K. Resnet in resnet: Generalizing residual architectures. arXiv preprint arXiv:1603.08029 (2016).
- 78. Wilcoxon, F. Individual comparisons by ranking methods. in Breakthroughs in Statistics 196–202 (Springer, 1992).