Diagnostics. 2026 Feb 13;16(4):554. doi: 10.3390/diagnostics16040554

Adaptive Bandelet Transform and Transfer Learning for Geometry-Aware Thyroid Cancer Ultrasound Classification

Yassine Habchi 1, Hamza Kheddar 2, Mohamed Chahine Ghanem 3,*, Jamal Hwaidi 4
Editors: Rafał Obuchowicz, Michał Strzelecki, Adam Piórkowski, Karolina Nurzynska
PMCID: PMC12939836  PMID: 41750702

Abstract

Background and Objectives: Classification of thyroid nodules (TN) in ultrasound remains challenging due to limited labelled data and the limited capacity of conventional feature representations to capture complex, multi-directional textures. This work aims to improve data-efficient TN classification by integrating a geometry-adaptive Bandelet Transform (BT) with transfer learning (TL) to enhance feature representation and generalisation. Methods: The proposed pipeline first applies BT to strengthen directional and structural encoding in ultrasound images via quadtree-driven geometric adaptation. It then mitigates class imbalance using SMOTE and increases data diversity through targeted data augmentation. The resulting representations are classified using multiple ImageNet-pretrained architectures, where VGG19 yields the most consistent performance. Results: Experiments on the publicly available DDTI dataset show that BT-based preprocessing consistently improves performance over classical wavelet representations across multiple quadtree thresholds, with the best results obtained at T=30. Under this setting, the proposed BT+TL (VGG19) model achieves 98.91% accuracy, 98.11% sensitivity, 97.31% specificity, and a 98.89% F1-score, outperforming comparable approaches reported in the literature. Conclusions: Coupling geometry-adaptive transforms with modern TL backbones provides a robust and data-efficient strategy for ultrasound TN classification, particularly under limited annotation and challenging texture variability. The complete project is publicly available.

Keywords: bandelet transform, transfer learning, thyroid cancer, deep learning, medical imaging, diagnostic

1. Introduction

Thyroid nodules (TNs) are commonly encountered, and their evaluation is predominantly performed using ultrasound (US), a fast, non-invasive, and radiation-free imaging method. Owing to its ability to generate real-time, high-resolution images of the thyroid gland, US has become the primary diagnostic tool for assessing thyroid abnormalities [1,2]. However, traditional computer-aided diagnosis (CAD) systems used by radiologists for diagnosing TNs have relied on manually extracted features, such as shape, texture, and margin characteristics. Manual feature extraction is labour-intensive, inefficient, and operator-dependent, often introducing variability and limiting large-scale applicability in clinical settings [3]. Ongoing innovations in deep learning (DL) have greatly enhanced CAD systems, yet employing DL for TN diagnosis raises several challenges. First, DL models typically require large labelled datasets for effective training, which are scarce in the medical imaging domain due to privacy concerns, annotation costs, and limited expert availability. Additionally, training DL models on small datasets can lead to significant overfitting and poor generalisation on unseen data. Their black-box nature also presents a significant interpretability challenge, making clinical adoption difficult, as clinicians often require transparent and explainable decision-making processes. Lastly, variability in US images, originating from differing hardware settings, operator techniques, and patient-specific factors, further complicates model robustness and generalisation.

Transfer learning (TL) has emerged as a promising solution to address these limitations by leveraging pretrained models, initially trained on large-scale, general-purpose datasets, and fine-tuning them on smaller, domain-specific datasets such as thyroid US images. This approach reduces the dependency on extensive labelled data, mitigates overfitting, and enhances model generalisation across diverse imaging conditions. TL also offers the advantage of significantly reducing computational costs and training time, making it more practical for clinical deployment [4,5].

The wavelet transform (WT) is an effective technique for processing US images, allowing multi-resolution analysis. It enhances feature extraction, noise reduction, and image detail, improving diagnostic accuracy in medical imaging applications [6,7,8]. However, classical WTs often struggle to capture fine spatial details in US images, especially in thyroid imaging, due to their fixed frequency and resolution properties. These transforms fail to adapt to varying image features, such as the heterogeneous texture of thyroid tissues or the complex, low-contrast boundaries between different tissue types. As a result, they poorly represent complex geometries, causing discontinuities and loss of information, since they operate primarily along horizontal, vertical, and diagonal directions. This leads to suboptimal feature extraction, particularly in regions with small nodules or irregular structures, which are critical for accurate diagnosis [9,10,11].

This study trains a DL model using TL on a publicly available thyroid US image dataset. Pretrained architectures such as ResNet, DenseNet, and EfficientNet, originally trained on large-scale datasets (e.g., ImageNet), provide transferable feature representations suitable for medical imaging tasks with limited data [12,13]. In this work, fine-tuned TL-based models are employed to classify TNs as benign or malignant. The use of pretrained networks facilitates faster convergence and improved generalisation, which is critical for reliable TN diagnosis and clinical decision support [14]. Additionally, in the preprocessing stage, the authors focus on second-generation wavelets, specifically the bandelet transform (BT), which is designed to generate decorrelated coefficients, eliminate redundancy, preserve essential information, and overcome the limitations of the WT. Specifically, the BT is based on geometric principles and is adept at capturing complex structures that are often not apparent when using the WT. In addition, the BT can model real-world data more effectively, particularly when the data is non-uniformly sampled, lies on curves, or exists in higher-dimensional spaces such as surfaces and manifolds. This work makes a unique contribution by integrating BT with TL; as far as we know, no prior research has explored this particular combination in the context of thyroid cancer (TC) classification.

In summary, this work is characterised by the following:

  1. Methodological contributions: This study introduces a novel geometry-aware framework that integrates the BT with TL for TC classification in medical imaging, particularly for US images. The proposed approach exploits the capability of BT to adaptively model local geometric structures through quadtree-driven directional analysis, enabling the extraction of anisotropic and spatially coherent features that are not captured by conventional wavelet representations. By coupling these geometry-adaptive features with deep TL backbones, the framework enhances feature representation and robustness under limited labelled data conditions.

  2. Experimental contributions: Extensive experiments demonstrate that integrating BT with TL significantly improves the accuracy and reliability of TC classification compared with classical wavelet-based and standalone TL approaches. Several ImageNet-pretrained architectures are systematically evaluated to assess their generalisation capability on thyroid US images. The results identify the best-performing architecture among the tested models, highlighting its superior ability to extract discriminative geometric and textural features while reducing dependence on large annotated datasets.

An overview of the paper’s organisation is provided as follows: Section 2 examines prior research that has utilised WT and TL, particularly in the context of TC diagnosis and classification. Section 3 presents the background and preliminaries necessary to understand the proposed approach, including the theoretical foundations of the WT, the BT, and TL. Section 4 describes the proposed methodology in detail, outlining the key steps of the algorithm. Section 5 discusses the experimental results, performance evaluation, and comparative analysis. The paper concludes in Section 6, which reflects on the main contributions and explores prospective research paths.

2. Related Work

In recent years, significant efforts have been directed toward diagnosing benign and malignant TC. While many studies initially relied on the fine-needle aspiration biopsy (FNAB) method, the increasing use of US imaging is driven by its accessibility and cost-effectiveness. A range of artificial intelligence-based approaches have emerged to enhance diagnostic accuracy. The related literature is organised below according to the artificial intelligence methodology employed.

Several studies have employed DL techniques for TC detection using imaging and clinical data. Vahdati et al. [15] propose a DL-based method combining YOLOv5 for detection and XGBoost for classification using transverse and longitudinal US views. Similarly, Wang et al. [16] introduce a model that integrates multimodal magnetic resonance imaging (MRI) data with clinical features to predict lymph node metastasis in papillary thyroid cancer. Gummalla et al. [17] develop a hybrid framework using sequential convolutional neural networks (CNNs) and K-means clustering for classifying thyroid images. Chandana et al. [18] present a deep CNN model for classifying adenoma, thyroiditis, and cancer based on computed tomography (CT) and US scans. Wang et al. [19] utilise a combination of binary logistic regression (BLR) and CNN for metastasis prediction, integrating genetic mutations and clinical data. Qi et al. [20] apply Mask R-CNN with ResNet-50 and feature pyramid network (FPN) for detecting gross extrathyroidal extension in TC, outperforming radiologists in accuracy. Finally, Zhang et al. [21] propose an automated DL-based system using an adaptive WT-based AdaBoost algorithm (AWT-AA) to differentiate benign and malignant nodules in US imaging, supported by logistic regression analysis.

Other studies utilise ensemble learning (EL) to improve model robustness and generalisability. Shah et al. [22] design a deep ensemble model incorporating long short-term memory (LSTM), GRU, and Bi-LSTM for mutation detection in thyroid adenocarcinoma, achieving high diagnostic accuracy from genomic data. Zhang et al. [23] introduce MC-CNNs along with a weighted averaging ensemble and Faster Apriori for multi-view medical image classification and association rule mining. Zhang et al. [24] develop a dynamic ensemble TL-based system that integrates multi-view ultrasonography data, featuring a novel weighting mechanism for optimal decision-making.

TL has been applied in several works to enhance generalisation on medical datasets. Chen et al. [25] utilise an improved GoogLeNet model with secondary TL, total variation-based image restoration, and joint training on hospital and public datasets to classify TN. Ma et al. [26] introduce Mul-DenseNet, a multi-channel DenseNet model pretrained on ImageNet to simultaneously segment thyroid and breast lesions in US images. Bakht et al. [27] apply fine-tuned VGG-19 and AlexNet models with a weighted classification layer to cytology slides, enhancing performance despite class imbalance.

Hybrid and domain-specific architectures have also been explored. Lu et al. [28] propose a dual-tree complex WT-based CNN for segmenting human thyroid optical coherence tomography (OCT) images. The model uses wavelet pooling to preserve texture details and resist noise, improving segmentation robustness. Wang et al. [29] present a soft-label fully convolutional network (SL-FCN), enabling more accurate boundary delineation in TC segmentation tasks compared to hard-label models.

To the best of the authors’ knowledge, no existing study has investigated a geometry-aware BT combined with TL for medical image diagnosis or classification. As discussed above and summarised in Table 1, prior works employ DL, wavelet-based features, or TL independently, whereas the proposed framework uniquely integrates geometry-aware BT, TL, and CNNs to improve directional feature representation and TC discrimination.

Table 1.

Summary and comparison of state-of-the-art TC approaches.

Ref. | Used Method | Datasets | BPM (%) | Limitations | Employing (WT/BT/TL)
[15] | Multi-view DL | Private data | Sensitivity: 84.00, Specificity: 63.00, F1 score: 76.00 | Results should be validated on other imaging modalities
[16] | AMMCNet | Private data | Accuracy: 85.70, Sensitivity: 90.00, Specificity: 90.90 | Small sample size and single-centre data
[22] | Deep EL | Ensembl and IntOGen | Accuracy: 96.00, Sensitivity: 92.00, Specificity: 100 | Limited dataset may affect generalisability
[17] | CNN with K-means | DDTI | Accuracy: 81.50, Precision: 97.40, Sensitivity: 83.10 | Model performance depends on annotated data
[18] | Deep CNN | Private data | Accuracy: 97.20 | High computational cost
[23] | MC-CNNs | DDTI, UCI thyroid, and private datasets | Accuracy: 98.70 | The DL models require high computational power
[29] | SL-FCN | DISH and FISH breast and thyroid datasets | Dice score: 89.00 | SL-FCN requires significant computational power
[19] | BLR and CNN | Clinicopathological data | AUC: 89.00 | Uses BLR but is limited to retrospective data and needs validation on larger datasets
[30] | DL and EL | DDTI | Accuracy: 92.83, Precision: 87.76, Specificity: 88.89 | Relies on a single public dataset
[20] | Mask R-CNN, ResNet-50 and FPN | Private data | Accuracy: 87.00, Sensitivity: 80.00, Specificity: 92.00 | Single-province dataset requiring broader validation
[25] | GoogLeNet with TL | Hospital and public thyroid US images | Accuracy: 96.04, F1 score: 98.74, Precision: 98.42 | The model requires high computational power
[26] | Multi-channel DenseNet | Private data | Accuracy: 92.57, Sensitivity: 98.69, F1 score: 95.96 | Dependence on high-quality annotations
[24] | Dynamic ensemble TL | Private data | Accuracy: 93.00, Specificity: 95.00, F1 score: 93.00 | Limited data sources
[27] | Fine-tuned VGG-19 | Private data | Accuracy: 93.05, Sensitivity: 92.90, F1 score: 92.80 | Limited comparison with other models
[28] | WT-based CNN | Custom human thyroid OCT | Accuracy: 98.60 | The model adds extra computations compared to traditional CNNs
[21] | AWT-AA | Private data | Accuracy: 95.00, Sensitivity: 97.50, Specificity: 86.00 | Based on US images from a single institution

Abbreviations: Best performance metrics (BPMs); Wavelet transform (WT); Bandelet transform (BT); Transfer learning (TL).

3. Preliminaries

3.1. Wavelet Transform

WT is a powerful mathematical tool used in image processing to analyse images at multiple resolutions. Unlike the Fourier transform, which provides only frequency information, WT captures both spatial and frequency characteristics, making it highly effective for feature extraction. WT is based on the concept of analysing a signal or an image using scaled and shifted versions of a finite-duration function called the mother wavelet. This function is designed to be localised in both time and frequency domains, making wavelets suitable for capturing transient, localised, and multiscale features within signals and images. The fundamental principle of WT relies on the decomposition of a given function into a set of basis functions derived from a mother wavelet through scaling and translation operations. Mathematically, a wavelet function ψ(t) satisfies the admissibility condition, which ensures that it has zero mean and is well-localised:

\int_{-\infty}^{+\infty} \psi(t)\,dt = 0. \quad (1)

The wavelet basis functions are constructed by modifying the scale and position of the mother wavelet as follows:

\psi_{a,b}(t) = \frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{t-b}{a}\right), \quad (2)

where a is the scaling parameter that controls the frequency resolution and b is the translation parameter that shifts the wavelet function in time. The continuous WT is given by

W(a,b) = \int_{-\infty}^{+\infty} f(t)\,\frac{1}{\sqrt{|a|}}\,\psi^{*}\!\left(\frac{t-b}{a}\right)dt, \quad (3)

where W(a,b) represents the wavelet coefficients at different scales and positions. The discrete wavelet transform (DWT), a computationally efficient variant of the continuous WT, employs dyadic scales (a = 2^j) and integer translations (b = k 2^j) to construct the wavelet basis. This results in an orthogonal or biorthogonal representation that allows hierarchical decomposition of images into different frequency subbands. Using the DWT, an image is decomposed into four distinct frequency subbands: LL, LH, HL, and HH. The LL subband retains the approximation information, primarily reflecting the image’s coarse details, while LH, HL, and HH contain detail coefficients corresponding to horizontal, vertical, and diagonal details, respectively [9,31].

This process is mathematically represented by the system of equations in (4), where I(x,y) denotes the original image. The two-dimensional DWT is carried out by first applying the one-dimensional DWT along the rows, followed by its application along the columns.

I_{LL}(x,y) = \sum_{m}\sum_{n} I(m,n)\,\phi(x-m)\,\phi(y-n)
I_{LH}(x,y) = \sum_{m}\sum_{n} I(m,n)\,\phi(x-m)\,\psi(y-n)
I_{HL}(x,y) = \sum_{m}\sum_{n} I(m,n)\,\psi(x-m)\,\phi(y-n)
I_{HH}(x,y) = \sum_{m}\sum_{n} I(m,n)\,\psi(x-m)\,\psi(y-n) \quad (4)

where ϕ(x) is the scaling function, which generates the LL subband (approximation coefficients), capturing the coarse image details; ψ(x) is the wavelet function, which generates the LH, HL, and HH subbands (detailed coefficients). This decomposition provides a multi-resolution representation of the image, allowing for efficient image processing techniques such as feature extraction.
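As an illustration of Eq. (4), a one-level decomposition can be sketched with a plain Haar filter bank in NumPy. This is a minimal sketch, assuming the unnormalised Haar wavelet and an even-sized image; production pipelines would typically use a dedicated wavelet library and richer wavelet families.

```python
import numpy as np

def haar_dwt2(image):
    """One-level 2D Haar DWT of an even-sized greyscale image.

    Returns the four subbands of Eq. (4): LL (approximation) and
    LH, HL, HH (horizontal, vertical, diagonal detail coefficients).
    """
    img = np.asarray(image, dtype=float)
    # Separable filtering: along rows first, then along columns.
    lo_r = (img[:, 0::2] + img[:, 1::2]) / 2.0   # row low-pass
    hi_r = (img[:, 0::2] - img[:, 1::2]) / 2.0   # row high-pass
    LL = (lo_r[0::2, :] + lo_r[1::2, :]) / 2.0
    LH = (lo_r[0::2, :] - lo_r[1::2, :]) / 2.0
    HL = (hi_r[0::2, :] + hi_r[1::2, :]) / 2.0
    HH = (hi_r[0::2, :] - hi_r[1::2, :]) / 2.0
    return LL, LH, HL, HH

# A constant image has all its energy in LL; the detail subbands vanish.
flat = np.full((4, 4), 7.0)
LL, LH, HL, HH = haar_dwt2(flat)
```

Applying the same function recursively to LL yields the hierarchical multi-resolution pyramid described above.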

In addition, the application of the WT to thyroid US images faces several intrinsic challenges related to speckle noise, low contrast, and device dependence. Although wavelet decomposition separates images into multiple frequency bands, the strong and signal-dependent nature of speckle noise often overlaps with diagnostically relevant high-frequency components, making it difficult to suppress noise without simultaneously removing fine structural details of TNs [32]. This trade-off may lead to loss of edge information and texture cues that are critical for reliable segmentation and classification. Furthermore, the inherently low contrast between normal thyroid tissue and nodules limits the effectiveness of wavelet-based enhancement, as subtle intensity variations may not be sufficiently amplified across all decomposition scales. Device-dependent variability in US acquisition (e.g., probe frequency, gain settings, and imaging protocols) introduces inconsistent frequency distributions, which can degrade the robustness and generalisation of wavelet-based features across datasets. In addition, the performance of WT-based approaches is highly sensitive to the choice of mother wavelet, decomposition level, and thresholding strategy, requiring careful tuning that may not transfer well between different clinical environments. These challenges highlight the difficulty of relying solely on WT for thyroid US analysis and motivate the integration of more adaptive learning-based frameworks [33,34].
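The sensitivity to the thresholding strategy mentioned above is easy to see in code. The snippet below implements the standard soft-thresholding (wavelet-shrinkage) rule applied to detail coefficients; it is an illustrative sketch detached from any particular denoising pipeline, and the threshold value is a made-up example.

```python
import numpy as np

def soft_threshold(coeffs, T):
    """Soft-threshold wavelet detail coefficients: shrink towards zero by T.

    Coefficients with |c| <= T (treated as noise) are zeroed; larger ones
    are shrunk by T. Fine nodule texture sharing magnitudes with speckle
    noise is lost whenever T is set too aggressively.
    """
    c = np.asarray(coeffs, dtype=float)
    return np.sign(c) * np.maximum(np.abs(c) - T, 0.0)

detail = np.array([-3.0, -0.5, 0.2, 1.5, 4.0])
denoised = soft_threshold(detail, T=1.0)
```

Raising T suppresses more speckle but also erases the weak high-frequency coefficients that encode nodule margins, which is precisely the trade-off discussed above.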

3.2. Bandelet Transform

The bandelet transform, introduced by Le Pennec and Mallat [35], is designed to construct a basis that aligns with the geometric structure of an image by locally deforming the spatial domain. This deformation simplifies the structure into a separable basis along a fixed direction, either horizontal or vertical. A key aspect of this transform is the flow-curve relationship, where the flow in the vertical direction corresponds to curves with non-vertical tangents. This allows for the construction of bandelets that respect the geometric regularity of each sub-block. Mathematically, the basis function is defined as

\tau(x) = \frac{1}{\sqrt{1+|c'(x)|^{2}}}\begin{pmatrix} 1 \\ c'(x) \end{pmatrix}, \quad (5)

where c(x) defines the curve and c'(x) denotes its slope, interpreted as the optical flow.

The bandelet transform is applied depending on the presence of geometric flow: if a sub-block exhibits no significant geometric variation, it is treated as uniformly regular and processed using a classical separable wavelet basis. Conversely, if geometric variations are detected, bandelet processing is employed. In cases involving singularities, additional Lagrangian-based computations are required.

The cost function governing this transformation is given by

L(f,R,B) = \|f - f_{R}\|^{2} + \lambda T^{2} \sum_{j} \left( R_{j}^{G} + R_{j}^{B} \right), \quad (6)

where f is the original image, f_R is its reconstructed approximation, R_j^G denotes the number of bits used to encode the geometric flow (optical flow) in sub-block j, and R_j^B corresponds to the bits used to encode the quantised bandelet coefficients. The parameter \lambda is a Lagrange multiplier that balances rate and distortion, and T is the quantisation step.

To efficiently represent the image, a quadtree decomposition is employed. This technique recursively divides the image domain into four quadrants (sub-blocks), denoted as S_1, S_2, S_3, and S_4, yielding a hierarchical representation. For each block S, the goal is to choose the best representation strategy, either by encoding S as a whole or by subdividing it further. This decision is governed by minimising the Lagrangian cost:

L_{0}(S) = \min\left( L_{\mathrm{direct}}(S),\, \tilde{L}(S) \right) \quad (7)

Here, L_{\mathrm{direct}}(S) represents the cost of encoding the block S directly without further subdivision, while \tilde{L}(S) is the cumulative cost of encoding its four children:

\tilde{L}(S) = L_{0}(S_{1}) + L_{0}(S_{2}) + L_{0}(S_{3}) + L_{0}(S_{4}) + \lambda T^{2} \quad (8)

The term \lambda T^{2} accounts for the overhead cost of subdivision. This recursive strategy ensures that each sub-block is processed in the most efficient manner, adapting the representation complexity to local image content. To address curved singularities in image structures, a deformation operator is applied to locally realign blocks so that anisotropic features are better captured along either horizontal or vertical directions. This leads to a new orthonormal basis in L^{2}(\Omega), replacing the standard horizontal wavelets \psi^{H}_{j,n} with geometry-adapted functions defined as

\phi_{j,j_{1}}(x_{1})\,\psi_{j,j_{2}}(x_{2} - c(x_{1})), \quad \psi_{j,j_{1}}(x_{1})\,\phi_{j,j_{2}}(x_{2} - c(x_{1})), \quad \psi_{j,j_{1}}(x_{1})\,\psi_{j,j_{2}}(x_{2} - c(x_{1})) \quad (9)

Here, x_1 and x_2 represent the horizontal and vertical spatial coordinates of the image domain, respectively. The function c(x_1) defines a local geometric flow or deformation that aligns the vertical coordinate x_2 with directional image features such as edges. By warping the coordinate system through x_2 - c(x_1), the basis functions adapt to the underlying structure of the image, improving alignment with anisotropic features. This transformation process, known as bandeletisation, builds an orthonormal basis that is aligned with the image’s geometric structure. By doing so, it replaces traditional wavelet bases with more expressive functions, allowing for enhanced compression and analysis by better respecting geometric regularities in both horizontal and vertical orientations:

\begin{pmatrix} \phi_{j,j_{1}}(x_{1})\,\psi_{j,j_{2}}(x_{2}-c(x_{1})) \\ \psi_{j,j_{1}}(x_{1})\,\phi_{j,j_{2}}(x_{2}-c(x_{1})) \\ \psi_{j,j_{1}}(x_{1})\,\psi_{j,j_{2}}(x_{2}-c(x_{1})) \end{pmatrix} = \begin{pmatrix} \psi^{H}_{(j,l),n_{1},n_{2}} \\ \psi^{V}_{(j,l),n_{1},n_{2}} \\ \psi^{D}_{(j,l),n_{1},n_{2}} \end{pmatrix} \quad (10)

The theoretical foundation of bandelets relies on their ability to optimise representation in anisotropic geometric image structures, unlike traditional WTs that only capture local oscillations. This geometric adaptability enables improved feature extraction, making bandelets particularly effective for DL applications where efficient hierarchical feature representation is crucial [18]. The final output of the bandelet transform consists of bandelet coefficients that capture the image’s directional and structural information. These coefficients, derived from optimal sub-block partitioning and adaptive transformations, form the bandelet feature vector, which is fed into a DL model. The bandelet feature vector includes multiscale directional energy distributions, geometric flow descriptors, and localised structural patterns, providing a compact yet expressive representation of the image. These bandelet features are then input into CNNs, or transformer-based models, depending on the application. The hierarchical nature of bandelet-based feature extraction ensures that DL models receive a structurally enriched representation of the data, significantly improving their performance in image classification, segmentation, super-resolution, and medical image analysis. By leveraging the BT’s ability to adapt to image geometry, DL architectures can process images more efficiently, achieving higher accuracy while reducing computational overhead [6,36].
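The quadtree decision of Eqs. (7) and (8) can be sketched as a small recursion. Note that `l_direct` below is a simplified stand-in cost (quantisation distortion plus a coefficient-count rate proxy), not the paper's actual bandelet coder; the sketch only demonstrates the keep-or-split minimisation with its \lambda T^2 subdivision overhead.

```python
import numpy as np

def l_direct(block, lam, T):
    """Toy direct cost: quantisation distortion with step T plus a
    bit-rate proxy (number of non-zero quantised coefficients)."""
    q = np.round(block / T) * T
    rate = np.count_nonzero(np.round(block / T))
    return float(np.sum((block - q) ** 2)) + lam * T**2 * rate

def l_best(block, lam, T, min_size=2):
    """Eqs. (7)-(8): encode a square block whole or split it into four
    quadrants, whichever minimises the Lagrangian cost; splitting pays
    a lam * T**2 overhead for signalling the subdivision."""
    direct = l_direct(block, lam, T)
    n = block.shape[0]
    if n <= min_size:                      # recursion floor
        return direct
    h = n // 2
    children = [block[:h, :h], block[:h, h:], block[h:, :h], block[h:, h:]]
    split = sum(l_best(c, lam, T, min_size) for c in children) + lam * T**2
    return min(direct, split)
```

By construction the returned cost never exceeds the direct cost, so the recursion adapts the partition depth to local image content exactly as described above.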

3.3. Inductive TL

Inductive TL is a machine learning paradigm in which a model trained on a source task is adapted to a target task, assuming that both tasks share some structural similarities while differing in their label spaces or distributions. Mathematically, given a labelled source dataset D_S = \{(x_i^S, y_i^S)\}_{i=1}^{N_S} associated with a task T_S and a labelled target dataset D_T = \{(x_j^T, y_j^T)\}_{j=1}^{N_T} associated with a task T_T, the goal is to learn a target function f_T : X_T \to Y_T by leveraging knowledge from a source function f_S : X_S \to Y_S, where X_S = X_T but P(X_S, Y_S) \neq P(X_T, Y_T), meaning that while both tasks share a common feature space, they exhibit differences in their distributions or label mappings. The learning process consists of two key steps: (i) pretraining and (ii) fine-tuning. During pretraining, a model f_S(x; \theta_S) is trained on the source dataset to minimise the loss function:

\theta_S^{*} = \arg\min_{\theta} \sum_{i=1}^{N_S} \mathcal{L}\big(y_i^S, f_S(x_i^S; \theta)\big), \quad (11)

where \mathcal{L} refers to the loss function appropriate to the task at hand, typically cross-entropy for classification or mean squared error for regression. The learned parameters \theta_S^{*} serve as the initialisation for the target model f_T(x; \theta_T), which is then fine-tuned on the target dataset using

\theta_T^{*} = \arg\min_{\theta} \sum_{j=1}^{N_T} \mathcal{L}\big(y_j^T, f_T(x_j^T; \theta)\big). \quad (12)

Fine-tuning typically employs strategies such as feature extraction, where the early layers of the pretrained model are frozen while only task-specific layers are updated, or full fine-tuning, where all model parameters are updated but with a lower learning rate to preserve generalisable knowledge. Optimisation is commonly performed using gradient descent with the update rule \theta_T \leftarrow \theta_T - \alpha \nabla_{\theta_T} \mathcal{L}_T, where \alpha is the learning rate, ensuring stable adaptation to the new task. To balance knowledge retention and task-specific adaptation, a weighted loss function \mathcal{L} = \lambda \mathcal{L}_S + (1-\lambda) \mathcal{L}_T can be employed, where \lambda controls the contribution of the source knowledge during training. Inductive TL is widely applied in DL, particularly in computer vision, where CNNs such as ResNet, VGG, or EfficientNet pretrained on ImageNet are fine-tuned for specialised applications such as medical image analysis, object detection, or satellite imagery classification, as well as in natural language processing [8].
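As a toy illustration of Eqs. (11) and (12), the pretrain-then-fine-tune loop can be reduced to logistic regression in NumPy, with the source weights warm-starting the target model and a smaller learning rate used during fine-tuning. All data here is synthetic, and the linear model merely stands in for a CNN backbone.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, w=None, lr=0.1, epochs=200):
    """Gradient descent on the logistic loss; passing `w` warm-starts
    the model from pretrained parameters (Eq. (11) -> Eq. (12))."""
    w = np.zeros(X.shape[1]) if w is None else w.copy()
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad                      # update rule with learning rate lr
    return w

rng = np.random.RandomState(0)
# Source task: plenty of labelled data.
Xs = rng.randn(500, 2); ys = (Xs[:, 0] + Xs[:, 1] > 0).astype(float)
# Target task: same feature space, only a handful of labels.
Xt = rng.randn(20, 2); yt = (Xt[:, 0] + Xt[:, 1] > 0).astype(float)

w_src = train(Xs, ys)                       # pretraining on the source task
w_ft = train(Xt, yt, w=w_src, lr=0.01)      # fine-tuning with a lower lr
acc = np.mean((sigmoid(Xt @ w_ft) > 0.5) == yt)
```

The lower fine-tuning learning rate plays the role described above: it adapts the warm-started parameters to the target task without destroying the knowledge acquired during pretraining.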

4. Methodology

This section evaluates the performance of both WT- and BT-based approaches in enhancing image classification and feature-extraction accuracy within the proposed framework. The algorithm is specifically designed to differentiate between malignant and benign TNs, aiming to improve diagnostic precision by leveraging geometry-adaptive representations.

The overall procedural workflow is illustrated in Figure 1. The process begins with the digital database of thyroid ultrasound images (DDTI) dataset, from which US images are acquired and preprocessed. Due to class imbalance, the synthetic minority oversampling technique (SMOTE) is applied to augment benign cases and create a balanced dataset, followed by additional data augmentation (DA) techniques to further enhance data diversity. Feature selection is then performed using both WT and BT bases, allowing the extraction of rich structural and texture descriptors. These features are subsequently fed into a DL classifier using TL from the pretrained VGG19 model. The dataset is split into training and testing subsets, and the classification model is trained accordingly. Performance is evaluated using key metrics and compared against alternative methods to assess the effectiveness and efficiency of the proposed approach. Algorithm 1 outlines the suggested TN classification algorithm based on DL techniques.

Algorithm 1 TN Classification Algorithm
 1: function LoadDatasets
 2:       Dataset ← DDTI ▹ 134 images: 14 benign, 62 malignant
 3:       return Dataset
 4: end function
 5: function Preprocess(Dataset)
 6:       Benign ← GetBenignImages(Dataset) ▹ 14 images
 7:       Malignant ← GetMalignantImages(Dataset) ▹ 62 images
 8:       New_Benign ← SMOTE (Benign, target = 28) ▹ Oversample benign
 9:       Balanced_Dataset ← Combine(New_Benign, Malignant[0:28])
10:      Augmented_Dataset ← Augment(Balanced_Dataset, Techniques = {Brightness, Flip, Rotate, Resize_to_512 × 512})
11:      return Augmented_Dataset ▹ Target: 2048 images
12: end function
13: function ExtractFeatures(Augmented_Dataset)
14:      Features ← []
15:      for each image in Augmented_Dataset do
16:            Bandelet_Features ← ApplyBandeletTransform(image)
17:            Features ← Add(Bandelet_Features)
18:      end for
19:      return Features
20: end function
21: function Classify(Augmented_Dataset, Features)
22:      Train_Data ← Take80Percent(Augmented_Dataset) ▹ 1638 images
23:      Val_Data ← Take20Percent(Augmented_Dataset) ▹ 410 images
24:      Models ← {VGG19} ▹ Simplified list
25:      for each model in Models do
26:            LoadPretrained(model)
27:            FineTune(model, Train_Data, Features)
28:            Predictions ← Test(model, Val_Data)
29:            Save(Predictions)
30:      end for
31:      return Predictions
32: end function
33: function Evaluate(Predictions, Val_Data)
34:      for each model in Predictions do
35:            Accuracy ← CalculateAccuracy(Predictions, Val_Data)
36:            Sensitivity ← CalculateSensitivity(Predictions, Val_Data)
37:            Display(model, Accuracy, Sensitivity)
38:      end for
39: end function
40: function Main
41:      Dataset ← LoadDatasets()
42:      Augmented_Dataset ← Preprocess(Dataset)
43:      Features ← ExtractFeatures(Augmented_Dataset)
44:      Predictions ← Classify(Augmented_Dataset, Features)
45:      Evaluate(Predictions, Augmented_Dataset)
46: end function

Figure 1. Schematic representation of the study’s methodology.

4.1. Input TC Datasets

The DDTI dataset [37], provided by the Universidad Nacional de Colombia and the Instituto de Diagnóstico Médico (IDIME), is an open-access collection of thyroid US images. Table 2 summarises its key characteristics. This dataset was selected as the primary source for this study due to its credibility, public availability, and relevance to TN classification tasks. Previously used in related research, such as in [38], it serves as a reliable benchmark for performance comparison. Its open-access nature addresses the common challenge of restricted medical datasets, which often require complex ethical approvals. Additionally, the dataset is well-annotated, with clear labels distinguishing between benign and malignant cases, ensuring suitability for supervised learning. Although limited in size, the DDTI dataset remains valuable given the scarcity of publicly available, high-quality thyroid US datasets. Figure 2 illustrates sample thyroid US images from the DDTI dataset, showcasing both malignant and benign cases.

Table 2.

Characteristics of the DDTI Thyroid US Dataset.

Attribute Description
Image Count 134 images
Image Format PNG (some JPEG)
Image Resolution 560×315 pixels
Benign Cases 14 images
Malignant Cases 62 images
Image Modality B-mode 2D greyscale US
Frame Rate 15–30 fps (derived from video sequences)
US Equipment Toshiba Nemio 30 and Nemio MX
Transducer Types 12 MHz linear and convex transducers
Axial Resolution 0.1–0.15 mm
Lateral Resolution 0.5–1 mm
Penetration Depth 4–6 cm
Field of View 38–50 mm (linear), 60–80 mm (convex)
Dynamic Range 50–70 dB

Figure 2. TC image samples from the DDTI dataset.

4.2. Preprocessing

To tackle dataset imbalance in TC classification, a two-step approach was applied: synthetic oversampling followed by DA [39,40]. Initially, the dataset comprised 14 benign and 62 malignant images, a significant class imbalance that could bias model predictions. To mitigate this, SMOTE was employed to generate additional synthetic benign images by interpolating between existing samples, increasing their count from 14 to 28 and yielding a more balanced dataset for training.

Following dataset balancing, various DA techniques were implemented to expand the dataset to 2048 images, introducing variability to enhance model generalisation and robustness. The augmentation process included brightness adjustments to simulate diverse lighting conditions; nearest-neighbour fill to preserve pixel integrity during transformations and prevent unwanted artefacts; height scaling to modify the vertical proportions of images; horizontal flipping to double spatial variations; and rotation to alter image orientation and prevent model bias toward specific angles. This preprocessing not only mitigated class imbalance but also enriched dataset diversity, reducing overfitting risks and enhancing the model’s ability to classify benign and malignant cases accurately. By integrating SMOTE with targeted DA, the dataset became more representative and robust, ultimately improving generalisation and performance in TC classification.
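The two-step balancing strategy above can be sketched as follows. This is a minimal illustration on hypothetical toy images: simple pairwise linear interpolation stands in for the full SMOTE algorithm, and only a reduced subset of the described augmentations (brightness, flipping, rotation) is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_like_oversample(images, n_new, rng):
    """Generate synthetic samples by linear interpolation between
    randomly paired minority-class images (a SMOTE-style sketch)."""
    images = np.asarray(images, dtype=np.float32)
    synthetic = []
    for _ in range(n_new):
        i, j = rng.choice(len(images), size=2, replace=False)
        lam = rng.random()  # interpolation factor in (0, 1)
        synthetic.append(images[i] + lam * (images[j] - images[i]))
    return np.stack(synthetic)

def augment(image, rng):
    """Brightness shift, horizontal flip, and 90-degree rotation,
    mirroring a subset of the augmentations described in the text."""
    out = image + rng.uniform(-0.1, 0.1)  # brightness adjustment
    if rng.random() < 0.5:
        out = out[:, ::-1]                # horizontal flip
    return np.rot90(out, rng.integers(0, 4))  # rotation

# Toy minority class: 14 "benign" images of size 8x8.
benign = rng.random((14, 8, 8), dtype=np.float32)
benign_balanced = np.concatenate([benign, smote_like_oversample(benign, 14, rng)])
augmented = np.stack([augment(img, rng) for img in benign_balanced])
print(benign_balanced.shape, augmented.shape)  # (28, 8, 8) (28, 8, 8)
```

In the actual pipeline these steps would be applied to the full-resolution US images before the BT stage.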

Prior to applying the BT, all US images are resized to a fixed spatial resolution of 512×512 pixels and subsequently downsampled to 224×224 pixels to match the input size of the VGG19 architecture. After the Bandelet decomposition, normalisation is applied independently to each coefficient channel using z-score normalisation:

x̂ = (x − μ) / σ, (13)

where μ and σ denote the mean and standard deviation of the bandelet coefficients computed over the training set. This normalisation step ensures comparable dynamic ranges across all channels and stabilises network training.
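A minimal sketch of the per-channel z-score normalisation of Equation (13), assuming the bandelet coefficients are arranged as an (N, H, W, C) array; shapes and variable names are illustrative:

```python
import numpy as np

def zscore_per_channel(train, x):
    """Normalise each coefficient channel of x using the mean and std
    computed over the training set, as in Equation (13)."""
    mu = train.mean(axis=(0, 1, 2))     # per-channel mean over training set
    sigma = train.std(axis=(0, 1, 2))   # per-channel std over training set
    return (x - mu) / (sigma + 1e-8)    # small epsilon guards zero variance

rng = np.random.default_rng(1)
train = rng.normal(5.0, 2.0, size=(100, 16, 16, 4))  # toy coefficient set
x_hat = zscore_per_channel(train, train[0])
print(x_hat.shape)  # (16, 16, 4)
```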

4.3. Bandelet Feature Selection

The proposed feature extraction strategy exploits the intrinsic geometric structure of thyroid US images in order to enhance the encoding of directional and textural information for classification. Empirical studies demonstrate that the classification performance of TC images improves significantly when using TL-extracted features compared to WT [41]. Figure 3 illustrates the main processing stages of the proposed framework for thyroid US image analysis. Starting from TN images acquired from the DDTI dataset, the images are transformed into a geometry-aware representation that emphasises structural and directional characteristics. The resulting bandelet coefficients provide a compact and discriminative description of complex and anisotropic patterns in TNs, which is subsequently exploited by the DL classifier for robust feature learning.

Figure 3. The steps employed in the proposed DL-based TC scheme.

After applying the BT to each preprocessed US image, the resulting representation consists of geometry-adapted coefficient maps obtained by deforming wavelet bases along locally estimated geometric flows using quadtree-based partitioning. Four dominant bandelet coefficient channels are retained, including one approximation component and three directionally adapted detail components that encode horizontal, vertical, and diagonal geometric structures. Unlike conventional wavelet subbands, these bandelet coefficients preserve anisotropic edges and curved contours more effectively, capturing the most discriminative geometric and textural information relevant to TN characterisation, such as boundaries, contours, and spatially coherent patterns. These four bandelet coefficient maps are stacked along the channel dimension to form a multi-channel tensor compatible with CNN input requirements.

Finally, the normalised four-channel bandelet tensor is provided as input to the CNN in place of the original greyscale US image. This enables the network to learn directly from geometry-aware frequency representations rather than raw pixel intensities, allowing the deep model to exploit directional, structural, and textural cues encoded by the BT for improved TN classification.
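The construction of the four-channel coefficient tensor described above can be illustrated as follows. Since no standard Bandelet implementation is assumed here, a one-level Haar wavelet decomposition stands in for the geometry-adapted transform; a true BT would additionally warp these bases along quadtree-estimated geometric flows.

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar decomposition: approximation (LL) plus
    horizontal (LH), vertical (HL) and diagonal (HH) detail maps."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0
    lh = (a - b + c - d) / 4.0   # horizontal details
    hl = (a + b - c - d) / 4.0   # vertical details
    hh = (a - b - c + d) / 4.0   # diagonal details
    return ll, lh, hl, hh

def to_four_channel_tensor(img):
    """Stack the four coefficient maps along the channel axis,
    yielding a CNN-ready (H/2, W/2, 4) tensor."""
    return np.stack(haar_dwt2(img), axis=-1)

img = np.random.default_rng(2).random((224, 224))  # toy preprocessed US image
tensor = to_four_channel_tensor(img)
print(tensor.shape)  # (112, 112, 4)
```

In the proposed pipeline, this tensor (after the normalisation of Equation (13)) replaces the greyscale image as the CNN input.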

4.4. Deep Classification

Medical image classification benefits considerably from the adoption of the proposed strategy, since acquiring large, well-annotated medical datasets is time-consuming and resource-intensive [8,42]. Typically, a network trained on a large dataset such as ImageNet is either fine-tuned by adjusting its layers or used as a fixed feature extractor, transferring learned representations to the new task. To optimise training and evaluation, the dataset was partitioned following DL best practices, with 80% (1638 samples) allocated for training and 20% (410 samples) for validation. The training subset was used to refine model parameters and improve the extraction of meaningful feature representations, while the validation subset ensured generalisation and detected overfitting by evaluating the model’s performance on unseen data.

The VGG19 architecture, illustrated in Figure 4, is employed in this study as a pretrained model for TL. It is a structured DL model used for feature extraction and classification. It begins with an input image (from the DDTI dataset) and processes it through multiple convolutional layers with 3 × 3 filters and ReLU activation, which capture hierarchical features such as edges, textures, and patterns. After every two or four convolutional layers, a max pooling layer (red) is applied to reduce spatial dimensions while preserving essential features. The model progressively increases the number of filters from 64 to 512, allowing deeper layers to extract more complex representations. Once feature extraction is complete, the fully connected layers flatten the extracted features and pass them through two dense layers with 4096 neurons each, followed by a Softmax layer that classifies the image into 1000 possible categories. In TL, the convolutional base is typically frozen, and the fully connected layers are modified or fine-tuned for new classification tasks, making VGG19 highly effective for our classification of TC images [43]. By leveraging TL, such models retain valuable pre-learned representations, minimising the data and computational resources required for training, a crucial advantage in medical imaging applications where data scarcity is a challenge. In TL, adapting a pretrained model to a proposed model involves carefully modifying the network architecture to balance feature reuse and task-specific learning. First, the pretrained model, typically a deep neural network such as VGG, is loaded and its architecture is examined. The earlier convolutional layers (or the embedding layers in transformers), which capture fundamental and transferable features, are usually frozen to retain their learned representations.
The fully connected (dense) layers (or task-specific heads in natural language processing models), which encode high-level, domain-specific features, are removed or replaced with new layers customised for the target task. This replacement often involves adding new dense layers, batch normalisation, and dropout (to prevent overfitting), together with an output layer that uses a task-specific activation function, such as softmax for multi-class classification or sigmoid for binary cases. Fine-tuning can be applied to some middle layers if the new dataset is sufficiently large, gradually unfreezing layers while using a lower learning rate to prevent catastrophic forgetting of previously learned features. Additionally, TL methods such as feature extraction (where only the new layers are trained) or full fine-tuning (where pretrained weights are adjusted) are selected based on dataset size, computational power, and the similarity between the source and target domains. The modified model is then compiled and trained, leveraging techniques like learning rate scheduling and data augmentation to improve adaptation while ensuring that the knowledge from the pretrained model enhances the performance of the new task.
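The feature-extraction variant of TL described above, where the pretrained base is frozen and only a new head is trained, can be sketched with a toy stand-in for the backbone. The fixed random projection below is purely illustrative (it is not VGG19's actual convolutional base), and the two-blob data are a hypothetical surrogate for benign/malignant features.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for a frozen pretrained backbone: a fixed projection + ReLU.
W_frozen = rng.normal(size=(64, 16)) / 8.0

def frozen_features(X):
    """'Convolutional base' in feature-extraction mode: never updated."""
    return np.maximum(X @ W_frozen, 0.0)

def train_head(X, y, epochs=500, lr=0.1):
    """Train only the new binary classification head on frozen features."""
    F = frozen_features(X)
    w, b = np.zeros(F.shape[1]), 0.0
    for _ in range(epochs):
        z = np.clip(F @ w + b, -30, 30)   # clip logits for numerical safety
        p = 1.0 / (1.0 + np.exp(-z))      # sigmoid output
        g = p - y                         # binary cross-entropy gradient
        w -= lr * F.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

# Toy binary task standing in for benign/malignant feature vectors.
X = np.concatenate([rng.normal(-1, 1, (50, 64)), rng.normal(1, 1, (50, 64))])
y = np.concatenate([np.zeros(50), np.ones(50)])
snapshot = W_frozen.copy()
w, b = train_head(X, y)
pred = (frozen_features(X) @ w + b > 0).astype(float)
acc = (pred == y).mean()
print("frozen base unchanged:", np.array_equal(W_frozen, snapshot))
print("head training accuracy:", acc)
```

Full fine-tuning would additionally update `W_frozen` with a lower learning rate; here only the head parameters `w` and `b` are learned.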

Figure 4. The typical set of layers in a pretrained VGG19 model.

4.5. Performance Metrics

Common metrics are used for evaluating US-based TC classification. These metrics offer a thorough assessment of the model’s performance in differentiating between benign and malignant TNs. The Accuracy (Acc) measures the overall correctness of predictions. As a harmonic mean of precision and recall, the F1 Score serves as a comprehensive metric for evaluating classification performance, especially in imbalanced datasets:

Acc(%) = (TP + TN) / (TP + FP + TN + FN) × 100,  F1(%) = 2 × (Precision × Recall) / (Precision + Recall) × 100 (14)

Sensitivity (Sen) quantifies the proportion of actual malignant cases correctly identified, Specificity (Spe) measures the proportion of benign cases correctly classified, and Precision (P) indicates the ratio of predicted malignancies that are confirmed as actual malignant instances:

Sen(%) = TP / (TP + FN) × 100,  Spe(%) = TN / (TN + FP) × 100,  P(%) = TP / (TP + FP) × 100 (15)

These metrics collectively ensure a reliable and multidimensional assessment of classification performance. The abbreviations TP, TN, FP, and FN refer to true positives, true negatives, false positives, and false negatives in that order.
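These definitions translate directly into code; the confusion counts below are hypothetical and serve only to exercise Equations (14) and (15):

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the evaluation metrics of Equations (14) and (15), in %."""
    acc = (tp + tn) / (tp + fp + tn + fn) * 100
    sen = tp / (tp + fn) * 100            # sensitivity (recall)
    spe = tn / (tn + fp) * 100            # specificity
    pre = tp / (tp + fp) * 100            # precision
    f1 = 2 * pre * sen / (pre + sen)      # harmonic mean of precision/recall
    return {"Acc": acc, "Sen": sen, "Spe": spe, "P": pre, "F1": f1}

# Hypothetical confusion counts for illustration.
m = classification_metrics(tp=90, tn=85, fp=5, fn=10)
print({k: round(v, 2) for k, v in m.items()})
```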

5. Results and Discussion

5.1. Experiments

The obtained training and validation accuracy, along with their corresponding losses for different DL models, are illustrated in Figure 5. To justify the adoption of VGG19 in this study, multiple pretrained models were assessed and compared. The obtained results demonstrate that VGG19 consistently outperforms other models in TC classification using the BT+TL approach. The training accuracy curve (Figure 5a) shows that VGG19, ResNet50, and DenseNet201 achieve rapid convergence, surpassing 90% accuracy early in the training process, while MobileNetV2, EfficientNetB0, and GoogLeNet exhibit slower improvements and lower final accuracy values. Similarly, the validation accuracy graph (Figure 5b) confirms that VGG19 maintains the highest accuracy, followed by ResNet50 and DenseNet201, indicating strong generalisation to unseen data. In contrast, MobileNetV2 and GoogLeNet display lower validation accuracy, suggesting difficulties in capturing complex patterns in TC images. The training loss curve (Figure 5c) illustrates a steady decrease across all models, with VGG19 and ResNet50 achieving the lowest final loss values, reflecting their ability to minimise classification errors efficiently. However, GoogLeNet and MobileNetV2 maintain relatively higher loss values, implying weaker learning performance. The validation loss curve (Figure 5d) further supports these findings, as VGG19, ResNet50, and DenseNet201 exhibit smooth, consistently decreasing validation loss, whereas MobileNetV2 and GoogLeNet show more fluctuations, suggesting overfitting or instability during validation. These results highlight the superior performance of VGG19, which not only achieves the highest accuracy but also demonstrates better stability, faster convergence, and lower loss values, making it the most effective model for TC classification in this study.

Figure 5. DL classification performance based on BT+TL (VGG19) for TC image classification. (a) Training accuracy; (b) validation accuracy; (c) training loss; (d) validation loss.

In Table 3, the performance of the WT and BT is compared under varying quadtree decomposition thresholds (T) across four evaluation metrics: accuracy, sensitivity, specificity, and F1 Score. The WT serves as a strong baseline, achieving an accuracy of 0.9430, sensitivity of 0.9302, specificity of 0.9433, and an F1 Score of 0.9584. When the BT is applied, performance improves at lower thresholds (T=10), with enhanced accuracy (0.9511), sensitivity (0.9622), specificity (0.9774), and F1 Score (0.9807) compared to the WT. As the threshold increases to T=20, both accuracy (0.9635) and F1 Score (0.9645) continue to rise; however, sensitivity decreases to 0.9108, indicating a reduced ability to identify true positives. The optimal performance for the BT is observed at T=30, yielding the highest accuracy (0.9891), sensitivity (0.9811), and F1 Score (0.9889), while maintaining a strong specificity of 0.9731.

Table 3.

A comparison of results between BT and WT for different values of quadtree decomposition threshold (T).

Transform Accuracy Sensitivity Specificity F1
Wavelet 0.9430 0.9302 0.9433 0.9584
Bandelet (T = 10) 0.9511 0.9622 0.9774 0.9807
Bandelet (T = 20) 0.9635 0.9108 0.9802 0.9645
Bandelet (T = 30) 0.9891 0.9811 0.9731 0.9889
Bandelet (T = 40) 0.9803 0.9723 0.9897 0.9746
Bandelet (T = 50) 0.9787 0.9794 0.9825 0.9764

For the BT, the threshold value T=30 used in the quadtree decomposition was determined empirically based on the Lagrangian cost function defined in the theoretical framework. Since the threshold controls the stopping criterion for region subdivision according to homogeneity and segmentation cost, several candidate values of T were evaluated by minimising the Lagrangian cost while monitoring segmentation accuracy and computational complexity. The value T=30 achieved the best trade-off between accurate region partitioning and limited over-segmentation, while avoiding excessive computational burden. This selection links the theoretical cost formulation with the experimental setup and ensures both stable convergence and good generalisation performance.
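The quadtree stopping rule can be illustrated with a simplified homogeneity criterion. The region standard deviation below stands in for the paper's Lagrangian cost, and the threshold and image values are illustrative only:

```python
import numpy as np

def quadtree_partition(img, T, min_size=4, origin=(0, 0)):
    """Recursively split a region into four quadrants until its
    intensity spread falls below the homogeneity threshold T
    (a simplified stand-in for the Lagrangian stopping rule)."""
    h, w = img.shape
    # Stop when the region is homogeneous enough or too small to split.
    if img.std() <= T or h <= min_size or w <= min_size:
        return [(origin[0], origin[1], h, w)]
    h2, w2 = h // 2, w // 2
    blocks = []
    for dy, dx in [(0, 0), (0, w2), (h2, 0), (h2, w2)]:
        sub = img[dy:dy + h2, dx:dx + w2]
        blocks += quadtree_partition(sub, T, min_size,
                                     (origin[0] + dy, origin[1] + dx))
    return blocks

rng = np.random.default_rng(4)
img = rng.integers(0, 256, size=(64, 64)).astype(float)  # toy textured image
coarse = quadtree_partition(img, T=100)  # high T: few, large blocks
fine = quadtree_partition(img, T=30)     # low T: deeper subdivision
print(len(coarse), len(fine))
```

Lower thresholds subdivide more aggressively, trading finer geometric adaptation for a larger number of regions and higher computational cost, which mirrors the trade-off resolved empirically at T=30.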

In Table 4, the performance of the proposed BT+TL (VGG19) model is compared with several recent state-of-the-art approaches evaluated on the same dataset. Gummalla et al. [17] achieved an accuracy of 0.8150 and sensitivity of 0.8310, with a high precision of 0.9740. Their method focused on enhancing TN classification using DL-based feature extraction; however, the relatively low accuracy and incomplete reporting of specificity indicate limitations in achieving balanced classification performance, particularly in reducing false negatives and false positives simultaneously. Zhang et al. [23] reported a high accuracy of 0.9870 using a DL framework for thyroid image analysis. Although this result demonstrates strong classification capability, the absence of sensitivity, specificity, and precision metrics makes it difficult to comprehensively assess the robustness and clinical reliability of their model, especially in terms of error distribution across malignant and benign classes. Sharma et al. [30] obtained an accuracy of 0.9283, specificity of 0.8889, and precision of 0.8776 by integrating an IoT-assisted DL system for medical image classification. While their approach shows reasonable performance, the lower specificity and precision compared to the proposed model suggest limited discrimination capability, particularly in challenging or borderline thyroid cases. In contrast, the proposed BT+TL (VGG19) framework achieves the highest overall performance, with an accuracy of 0.9891, sensitivity of 0.9811, specificity of 0.9731, and precision of 0.9968. These results demonstrate a consistent improvement over existing methods across all reported metrics. The integration of BT enables the model to effectively capture complex geometric and spatial patterns in thyroid US images, while TL with VGG19 enhances feature generalisation and reduces dependency on large labelled datasets. 
This synergistic design yields superior discriminative capability and robustness, making the proposed model more suitable for reliable clinical thyroid classification.

Table 4.

A comparison of the performance between our proposed model and previous approaches using the same dataset.

Methods Accuracy Sensitivity Specificity Precision
 [17] 0.8150 0.8310 - 0.9740
 [23] 0.9870 - - -
 [30] 0.9283 - 0.8889 0.8776
Our BT+TL (VGG19) 0.9891 0.9811 0.9731 0.9968

5.2. Statistical Performance Analysis

To ensure the statistical reliability of the reported results, all experiments were conducted using repeated k-fold cross-validation. For each performance metric, descriptive statistics including the mean, median, variance, standard deviation, quartiles, as well as the minimum and maximum values, were computed across folds. The proposed BT and TL framework based on VGG19 achieved a high mean training accuracy of 98.90% and a closely matched validation accuracy of 98.85%, with low standard deviations (1.75 and 0.69, respectively), demonstrating strong stability and consistency across different data partitions. The observed training accuracy ranged from 87.5% to 100%, while validation accuracy varied within a narrower interval from 93.66% to 99.51%, indicating robust performance even under the least favourable data splits.

Likewise, both training and validation losses exhibit low mean values (0.0325 and 0.0351) with limited dispersion, as reflected by their small variances and narrow interquartile ranges. The minimum training and validation losses reached 0.0006 and 0.0133, respectively, whereas their maximum values remained bounded at 0.3298 and 0.1472, further confirming stable convergence behaviour and the absence of extreme outliers. Overall, the limited spread between minimum and maximum values, together with the tight quartile distributions, confirms the generalisation capability of the proposed model without evidence of overfitting.
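The descriptive statistics reported above can be reproduced from per-fold scores as follows; the fold accuracies below are hypothetical values used only for illustration:

```python
import numpy as np

# Hypothetical per-fold validation accuracies from repeated k-fold CV.
fold_acc = np.array([98.54, 99.02, 98.78, 99.51, 97.56, 99.27, 98.90, 99.15])

stats = {
    "mean": fold_acc.mean(),
    "median": np.median(fold_acc),
    "std": fold_acc.std(ddof=1),        # sample standard deviation
    "var": fold_acc.var(ddof=1),        # sample variance
    "q1": np.percentile(fold_acc, 25),  # first quartile
    "q3": np.percentile(fold_acc, 75),  # third quartile
    "min": fold_acc.min(),
    "max": fold_acc.max(),
}
print({k: round(float(v), 3) for k, v in stats.items()})
```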

5.3. Clinical Implications and Decision Support

From a clinical perspective, the proposed BT+TL framework is designed to function as a CAD support tool for thyroid US examination rather than as a replacement for radiologists. In a realistic clinical workflow, the US image acquired during routine examination can be automatically processed by the proposed system to provide a probability score indicating benign or malignant thyroid nodules, together with visual feature representations derived from geometry-aware Bandelet coefficients.

Such a system can assist radiologists in several ways. First, it can serve as a second-opinion tool to reduce diagnostic uncertainty, particularly in borderline or visually ambiguous cases. Second, it can improve diagnostic consistency by reducing inter-observer variability among clinicians with different levels of experience. Third, it can help prioritise high-risk cases for further examination, biopsy, or specialist referral, thereby improving workflow efficiency.

The lightweight nature of the proposed pipeline and its reliance on standard US images make it suitable for integration into existing hospital information systems or US workstations without modifying current acquisition protocols. By supporting clinical decision-making with objective and reproducible predictions, the proposed model has the potential to enhance diagnostic accuracy while maintaining the radiologist as the final decision authority.

6. Conclusions

In this paper, we introduced a novel TN classification framework that tightly integrates geometric feature encoding via the BT with TL on a pre-trained VGG19 backbone. The core technical innovation lies in employing a quadtree-driven BT to locally adapt basis functions to the intrinsic flow of nodule contours, yielding a sparse set of directional coefficients that encode anisotropic texture and edge continuity more faithfully than standard wavelets. By systematically varying the quadtree subdivision threshold, we demonstrated that finer-scale geometric adaptivity optimises the trade-off between coefficient sparsity and reconstruction fidelity, directly translating into a marked improvement in classification metrics (98.91% accuracy, 98.11% sensitivity, 97.31% specificity, 98.89% F1 Score).

From a methodological standpoint, our study confirms three key findings: (i) Geometric versus separable bases: Replacing classical WT with BT yields a consistent gain across all metrics, underscoring the value of geometry-aware multiscale analysis in US images. (ii) Quadtree threshold optimisation: Intermediate thresholds strike the best balance, whereas overly coarse (T < 10) or overly fine (T > 50) partitions either underfit large-scale structures or overfit noise, respectively. (iii) TL strategy: Leveraging VGG19’s mid-level feature maps, frozen through the initial training epochs and then selectively unfrozen, accelerates convergence and mitigates overfitting on the limited DDTI dataset.

Looking forward, several avenues warrant exploration. First, extending the BT stage to incorporate adaptive Lagrangian rate-distortion optimisation could further refine coefficient selection under constrained bit budgets, facilitating on-device inference for portable US systems. Second, integrating vision Transformers as the downstream classifier may capitalise on their global attention mechanism to further exploit the long-range dependencies encoded by bandelet features. Finally, exploring semi-supervised or self-supervised pre-training on large unlabelled US repositories, augmented by synthetic data generated via generative adversarial networks, could improve generalisability and alleviate the label scarcity that continues to challenge medical imaging applications.

Acknowledgments

During the preparation of this manuscript, the authors used ChatGPT-5 (OpenAI) and Grammarly v1.2.208 (Grammarly Inc.) for language editing (grammar and readability). The authors reviewed and edited the output and take full responsibility for the content of this publication.

Abbreviations

BT bandelet transform
CNN convolutional neural networks
US ultrasound
DA data augmentation
TC thyroid cancer
TN thyroid nodules
TL transfer learning
WT wavelet transform
DWT discrete wavelet transform
DDTI digital database of thyroid ultrasound images
BLR binary logistic regression
FPN feature pyramid network
SL-FCN soft-label fully convolutional network
OCT optical coherence tomography
SMOTE synthetic minority oversampling technique
LSTM long short-term memory
DL deep learning

Author Contributions

Conceptualisation, Y.H., H.K., M.C.G. and J.H.; Methodology, Y.H. and H.K.; Data curation, H.K.; Resources, Y.H. and H.K.; Investigation, Y.H., H.K., M.C.G. and J.H.; Visualisation, H.K., M.C.G. and J.H.; Writing—original draft, Y.H., H.K., M.C.G. and J.H.; Writing—review and editing, Y.H., H.K., M.C.G. and J.H. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Ethical review and approval were waived for this study because it involved no human participant recruitment or intervention and no new data collection; the study used only publicly available, de-identified US images.

Informed Consent Statement

Not Applicable.

Data Availability Statement

Publicly available datasets were analysed in this study. The DDTI thyroid ultrasound dataset used in this work is publicly available (see Ref. [31]).

Conflicts of Interest

The authors declare that they have no known competing interests or personal relationships that could have appeared to influence the work reported in this paper.

Funding Statement

The APC and Open Access fees for this work are funded by the University of Liverpool.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

  • 1.Cao C.L., Li Q.L., Tong J., Shi L.N., Li W.X., Xu Y., Cheng J., Du T.T., Li J., Cui X.W. Artificial intelligence in thyroid ultrasound. Front. Oncol. 2023;13:1060702. doi: 10.3389/fonc.2023.1060702. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Habchi Y., Himeur Y., Kheddar H., Boukabou A., Atalla S., Chouchane A., Ouamane A., Mansoor W. AI in thyroid cancer diagnosis: Techniques, trends, and future directions. Systems. 2023;11:519. doi: 10.3390/systems11100519. [DOI] [Google Scholar]
  • 3.Yu X., Wang H., Ma L. Detection of thyroid nodules with ultrasound images based on deep learning. Curr. Med. Imaging. 2020;16:174–180. doi: 10.2174/1573405615666191023104751. [DOI] [PubMed] [Google Scholar]
  • 4.Mazari A.C., Kheddar H. Deep learning-and transfer learning-based models for COVID-19 detection using radiography images; Proceedings of the 2023 International Conference on Advances in Electronics, Control and Communication Systems (ICAECCS); Blida, Algeria. 6–7 March 2023; Piscataway, NJ, USA: IEEE; 2023. pp. 1–4. [Google Scholar]
  • 5.Kheddar H., Himeur Y., Amira A. BreathAI: Transfer Learning-Based Thermal Imaging for Automated Breathing Pattern Recognition; Proceedings of the 2025 IEEE International Conference on Image Processing (ICIP); Anchorage, AK, USA. 14–17 September 2025; Piscataway, NJ, USA: IEEE; 2025. pp. 2612–2617. [Google Scholar]
  • 6.Beladgham M., Habchi Y., Ben Aissa M., Taleb-Ahmed A. Medical video compression using bandelet based on lifting scheme and SPIHT coding: In search of high visual quality. Inform. Med. Unlocked. 2019;17:100244. doi: 10.1016/j.imu.2019.100244. Correction in Inform. Med. Unlocked 2020, 21, 100474. [DOI] [Google Scholar]
  • 7.Mohammed B., Yassine H., Abdelmouneim M.L., Abdelmalik T.A. A comparative study between bandelet and wavelet transform coupled by EZW and SPIHT coder for image compression. Int. J. Image Graph. Signal Process. 2013;5:9. doi: 10.5815/ijigsp.2013.12.02. [DOI] [Google Scholar]
  • 8.Habchi Y., Kheddar H., Himeur Y., Boukabou A., Atalla S., Mansoor W., Al-Ahmad H. Deep transfer learning for kidney cancer diagnosis. arXiv 2024, arXiv:2408.04318. doi: 10.48550/arXiv.2408.04318. [DOI] [Google Scholar]
  • 9.Habchi Y., Beladgham M., Taleb-Ahmed A. RGB Medical Video Compression Using Geometric Wavelet and SPIHT Coding. Int. J. Electr. Comput. Eng. 2016;6:1627–1636. doi: 10.11591/ijece.v6i4.pp1627-1636. [DOI] [Google Scholar]
  • 10.Habchi Y., Aimer A.F., Beladgham M., Bouddou R. Ultra low bitrate retinal image compression using integer lifting scheme and subband encoder. Indones. J. Electr. Eng. Comput. Sci. (IJEECS) 2021;24:295–307. [Google Scholar]
  • 11.Ding X., Liu Y., Zhao J., Wang R., Li C., Luo Q., Shen C. A novel wavelet-transform-based convolution classification network for cervical lymph node metastasis of papillary thyroid carcinoma in ultrasound images. Comput. Med. Imaging Graph. 2023;109:102298. doi: 10.1016/j.compmedimag.2023.102298. [DOI] [PubMed] [Google Scholar]
  • 12.Habchi Y., Kheddar H., Himeur Y., Ghanem M.C. Machine learning and transformers for thyroid carcinoma diagnosis. J. Vis. Commun. Image Represent. 2025;115:104668. doi: 10.1016/j.jvcir.2025.104668. [DOI] [Google Scholar]
  • 13.Pavithra S., Vanithamani R., Judith J. Proceedings of the Second International Conference on Image Processing and Capsule Networks (ICIPCN 2021) Springer; Cham, Switzerland: 2022. Classification of stages of thyroid nodules in ultrasound images using transfer learning methods; pp. 241–253. [Google Scholar]
  • 14.Sureshkumar V., Jaganathan D., Ravi V., Velleangiri V., Ravi P. A Comparative Study on Thyroid Nodule Classification using Transfer Learning Methods. Open Bioinform. J. 2024;17 doi: 10.2174/0118750362305982240627034926. [DOI] [Google Scholar]
  • 15.Vahdati S., Khosravi B., Robinson K.A., Rouzrokh P., Moassefi M., Akkus Z., Erickson B.J. A multi-view deep learning model for thyroid nodules detection and characterization in ultrasound imaging. Bioengineering. 2024;11:648. doi: 10.3390/bioengineering11070648. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Wang X., Zhang H., Fan H., Yang X., Fan J., Wu P., Ni Y., Hu S. Multimodal MRI Deep Learning for Predicting Central Lymph Node Metastasis in Papillary Thyroid Cancer. Cancers. 2024;16:4042. doi: 10.3390/cancers16234042. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Gummalla K., Ganesan S., Pokhrel S., Somasiri N. Enhanced early detection of thyroid abnormalities using a hybrid deep learning model. J. Innov. Image Process. 2024;6:244–261. doi: 10.36548/jiip.2024.3.003. [DOI] [Google Scholar]
  • 18.Chandana K.H., Prasan U. Thyroid disease detection using CNN techniques. 2023. [(accessed on 15 December 2025)]. Available online: https://advancedengineeringscience.com/article/pdf/2023/02-353.pdf. [Google Scholar]
  • 19.Wang Z., Qu L., Chen Q., Zhou Y., Duan H., Li B., Weng Y., Su J., Yi W. Deep learning-based multifeature integration robustly predicts central lymph node metastasis in papillary thyroid cancer. BMC Cancer. 2023;23:128. doi: 10.1186/s12885-023-10598-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Qi Q., Huang X., Zhang Y., Cai S., Liu Z., Qiu T., Cui Z., Zhou A., Yuan X., Zhu W., et al. Ultrasound image-based deep learning to assist in diagnosing gross extrathyroidal extension thyroid cancer: A retrospective multicenter study. eClinicalMedicine. 2023;58:101905. doi: 10.1016/j.eclinm.2023.101905. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Zhang F., Sun Y., Wu X., Meng C., Xiang M., Huang T., Duan W., Wang F., Sun Z. Analysis of the application value of ultrasound imaging diagnosis in the clinical staging of thyroid cancer. J. Oncol. 2022;2022:8030262. doi: 10.1155/2022/8030262. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Shah A.A., Daud A., Bukhari A., Alshemaimri B., Ahsan M., Younis R. DEL-Thyroid: Deep ensemble learning framework for detection of thyroid cancer progression through genomic mutation. BMC Med. Inform. Decis. Mak. 2024;24:198. doi: 10.1186/s12911-024-02604-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Zhang X., Lee V.C. Deep Learning Empowered Decision Support Systems for Thyroid Cancer Detection and Management. Procedia Comput. Sci. 2024;237:945–954. doi: 10.1016/j.procs.2024.05.183. [DOI] [Google Scholar]
  • 24.Zhang X., Liu F., Lee V.C.S., Jassal K., Di Muzio B., Lee J.C. Dynamic Ensemble Transfer Learning with Multi-View Ultrasonography for Improving Thyroid Cancer Diagnostic Reliability. J. Imaging Inform. Med. 2025. online ahead of print . [DOI] [PubMed]
  • 25.Chen W., Gu Z., Liu Z., Fu Y., Ye Z., Zhang X., Xiao L. A new classification method in ultrasound images of benign and malignant thyroid nodules based on transfer learning and deep convolutional neural network. Complexity. 2021;2021:6296811. doi: 10.1155/2021/6296811. [DOI] [Google Scholar]
  • 26.Ma J., Bao L., Lou Q., Kong D. Transfer learning for automatic joint segmentation of thyroid and breast lesions from ultrasound images. Int. J. Comput. Assist. Radiol. Surg. 2022;17:363–372. doi: 10.1007/s11548-021-02505-y. [DOI] [PubMed] [Google Scholar]
  • 27.Bakht A.B., Javed S., Dina R., Almarzouqi H., Khandoker A., Werghi N. Proceedings of the 12th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2020) Springer; Cham, Switzerland: 2021. Thyroid nodule cell classification in cytology images using transfer learning approach; pp. 539–549. [Google Scholar]
  • 28.Lu H., Wang H., Zhang Q., Won D., Yoon S.W. A dual-tree complex wavelet transform based convolutional neural network for human thyroid medical image segmentation; Proceedings of the 2018 IEEE International Conference on Healthcare Informatics (ICHI); New York, NY, USA. 4–7 June 2018; Piscataway, NJ, USA: IEEE; 2018. pp. 191–198. [Google Scholar]
  • 29.Wang C.W., Lin K.Y., Lin Y.J., Khalil M.A., Chu K.L., Chao T.K. A soft label deep learning to assist breast cancer target therapy and thyroid cancer diagnosis. Cancers. 2022;14:5312. doi: 10.3390/cancers14215312. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Sharma R., Mahanti G.K., Chakraborty C., Panda G., Rath A. An IoT and deep learning-based smart healthcare framework for thyroid cancer detection. ACM Trans. Internet Technol. 2023 doi: 10.1145/3637062. [DOI] [Google Scholar]
  • 31.Chao Z., Duan X., Jia S., Guo X., Liu H., Jia F. Medical image fusion via discrete stationary wavelet transform and an enhanced radial basis function neural network. Appl. Soft Comput. 2022;118:108542. doi: 10.1016/j.asoc.2022.108542. [DOI] [Google Scholar]
  • 32.Boucherit I., Kheddar H. Reinforced Residual Encoder–Decoder Network for Image Denoising via Deeper Encoding and Balanced Skip Connections. Big Data Cogn. Comput. 2025;9:82. doi: 10.3390/bdcc9040082. [DOI] [Google Scholar]
  • 33.Rahate A.J., Quazi R. A Review of Overcoming Speckle Noise Challenges in Ultrasound Imaging with Different Wavelet Transformation. Abdom. Imaging. 2023;2:5. [Google Scholar]
  • 34.Bhonsle D., Saxena K., Sheikh R.U., Sahu A.K., Singh P., Rizvi T. Wavelet based random noise removal from color images using Python; Proceedings of the 2024 Fourth International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT); Bhilai, India. 11–12 January 2024; Piscataway, NJ, USA: IEEE; 2024. pp. 1–5. [Google Scholar]
  • 35.Le Pennec E., Mallat S. Sparse geometric image representations with bandelets. IEEE Trans. Image Process. 2005;14:423–438. doi: 10.1109/TIP.2005.843753. [DOI] [PubMed] [Google Scholar]
  • 36.Deo B.S., Pal M., Panigrahi P.K., Pradhan A. An ensemble deep learning model with empirical wavelet transform feature for oral cancer histopathological image classification. Int. J. Data Sci. Anal. 2024;20:1005–1022. doi: 10.1007/s41060-024-00507-y. [DOI] [Google Scholar]
  • 37.DDTI. Available online: https://cimalab.unal.edu.co/software/detail/2 (accessed on 1 March 2023).
  • 38.Li X., Fu C., Xu S., Sham C.W. Thyroid Ultrasound Image Database and Marker Mask Inpainting Method for Research and Development. Ultrasound Med. Biol. 2024;50:509–519. doi: 10.1016/j.ultrasmedbio.2023.12.011. [DOI] [PubMed] [Google Scholar]
  • 39.Goodman J., Sarkani S., Mazzuchi T. Distance-based probabilistic data augmentation for synthetic minority oversampling. ACM/IMS Trans. Data Sci. (TDS) 2022;2:40. doi: 10.1145/3510834. [DOI] [Google Scholar]
  • 40.Khan A.A., Chaudhari O., Chandra R. A review of ensemble learning and data augmentation models for class imbalanced problems: Combination, implementation and evaluation. Expert Syst. Appl. 2024;244:122778. doi: 10.1016/j.eswa.2023.122778. [DOI] [Google Scholar]
  • 41.Fang Y., Liu J., Li J., Cheng J., Hu J., Yi D., Xiao X., Bhatti U.A. Robust zero-watermarking algorithm for medical images based on SIFT and Bandelet-DCT. Multimed. Tools Appl. 2022;81:16863–16879. doi: 10.1007/s11042-022-12592-x. [DOI] [Google Scholar]
  • 42.Kheddar H., Himeur Y., Al-Maadeed S., Amira A., Bensaali F. Deep transfer learning for automatic speech recognition: Towards better generalization. Knowl.-Based Syst. 2023;277:110851. doi: 10.1016/j.knosys.2023.110851. [DOI] [Google Scholar]
  • 43.Habchi Y., Kheddar H., Himeur Y. Ultrasound Images Classification of Thyroid Cancer using Deep Transfer Learning; Proceedings of the 2024 International Conference on Telecommunications and Intelligent Systems (ICTIS); Djelfa, Algeria. 14–15 December 2024; Piscataway, NJ, USA: IEEE; 2024. pp. 1–6. [Google Scholar]

Associated Data


Data Availability Statement

Publicly available datasets were analysed in this study. The DDTI thyroid ultrasound dataset used in this work is publicly available (see Ref. [37]).


Articles from Diagnostics are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)
