Abstract
In recent years, there has been increasing research on computer-aided diagnosis (CAD) using deep learning and image processing techniques. Still, most studies have focused on the benign-malignant classification of nodules. In this study, we propose an integrated architecture for grading thyroid nodules based on the Chinese Thyroid Imaging Reporting and Data System (C-TIRADS). The method combines traditional handcrafted features with deep features during feature extraction. In the preprocessing stage, a pseudo-artifact removal algorithm based on the fast marching method (FMM) is employed, followed by hybrid median filtering for noise reduction. Contrast-limited adaptive histogram equalization is then applied to restore and enhance the contrast and texture information of the ultrasound images. In the feature extraction stage, an improved ShuffleNetV2 network with a multi-head self-attention mechanism is adopted, and its extracted features are fused with medical prior knowledge features. Finally, a multi-class classification task is performed using the eXtreme Gradient Boosting (XGBoost) classifier. The dataset used in this study consists of 922 original images, including 149 examples belonging to class 2, 140 to class 3, 156 to class 4A, 114 to class 4B, 123 to class 4C, and 240 to class 5. The model is trained for 2000 epochs. The accuracy, precision, recall, F1 score, and AUC value of the proposed method are 97.17%, 97.65%, 97.17%, 0.9834, and 0.9855, respectively. The results demonstrate that fusing medical prior knowledge based on C-TIRADS with deep features from convolutional neural networks can effectively improve the overall performance of thyroid nodule diagnosis, providing a feasible new solution for developing clinical CAD systems for thyroid nodule ultrasound diagnosis.
Keywords: Thyroid nodules; Ultrasound images; C-TIRADS; ShuffleNetV2; Multi-head self-attention; Deep learning
Introduction
Background
The thyroid gland, located below the thyroid cartilage in the neck, is the largest endocrine gland in the human body. It plays a crucial role in regulating the body’s metabolism and ensuring the stability of the physiological system. Thyroid nodules can be classified as benign or malignant, with malignant nodules referred to as thyroid cancer. Recent studies have shown a significant increase in the incidence of thyroid cancer, with malignant nodules accounting for 5–15% and being one of the fastest-growing malignancies [1–4].
In the diagnosis of thyroid nodules, ultrasound imaging has become the primary screening method due to its timeliness, convenience, safety, and lack of radiation. Fine-needle aspiration is often performed in conjunction with ultrasound examination to obtain more accurate diagnostic results [5–7].
In general, doctors rely on their experience in radiology to diagnose thyroid nodules. The Chinese Society of Ultrasound in Medicine’s Subgroup on Superficial Organs and Vascular Ultrasound proposed the “2020 Chinese Thyroid Imaging Reporting and Data System (C-TIRADS) Guidelines for Thyroid Nodule Ultrasound Malignancy Risk Stratification” [8], hereafter referred to as TI-RADS. It describes ultrasound features of thyroid nodules, including solid composition, microcalcifications, marked hypoechogenicity, blurred margins, irregular margins or extrathyroidal extension, and vertical orientation. For less experienced radiologists, accurate and consistent identification of these features in ultrasound images can be challenging, and there can be variations both between observers and within the same observer [5, 9–12]. Therefore, various computer-aided diagnosis systems have been introduced.
Related Research Work
In previous studies on the diagnosis of thyroid nodules, the medical imaging field has explored several promising methods to develop effective computer-aided diagnosis (CAD) tools. The workflow of CAD typically involves image preprocessing, feature extraction, and image classification. The classification methods for thyroid nodules can be divided into traditional machine learning methods and deep learning methods. Traditional machine learning-based medical image classification methods often extract shape, color, and grayscale information of lesions in ultrasound images and analyze the extracted features using machine learning algorithms for subsequent classification. In 2007, Keramidas et al. [13] proposed a longitudinal ultrasound image analysis scheme that designed a new algorithm to detect the boundaries of thyroid nodules and extracted local binary pattern features from the boundary region as features for classification. Savelonas et al. [14] used boundary features, performed feature clustering using the K-nearest neighbors (k-NN) algorithm, and used a support vector machine (SVM) for classification to achieve risk assessment of thyroid nodules. Iakovidis et al. [15] proposed a thyroid ultrasound pattern characterization method that obtained texture features and achieved semi-supervised classification of thyroid nodules using an SVM classifier.
Compared with traditional machine learning techniques, deep learning has the capability to automatically learn feature representations of images, uncover latent features within data, and adapt more easily to medical domains [16]. Consequently, researchers have shifted their focus towards deep learning. In 2014, Simonyan and Zisserman [17] developed a convolutional neural network (CNN) model called Visual Geometry Group-16 (VGG-16), which achieved an accuracy of 92.7% on the ImageNet dataset. Zhu et al. [18] made improvements based on the VGG-16 model and proposed the VGG-16 T model for thyroid nodule classification, achieving an accuracy of 86.43%. Chi et al. [19] designed a computer-aided diagnosis system using thyroid ultrasound images as the dataset and achieved the benign-malignant classification of thyroid nodules. The images were preprocessed to remove noise, and the extracted features were fed into a random forest classifier. The results showed a classification accuracy of 98.29% on an open database. Yang et al. [20] adopted transfer learning and replaced the cross-entropy loss function in the ResNet18 model with a focal loss function, achieving a classification accuracy of 98.4%. Feres et al. [21] addressed the issues of data scarcity and dataset imbalance using deep convolutional generative adversarial networks. Then, they performed feature extraction separately with convolutional neural networks and vision transformers (ViT). Finally, training was conducted on Softmax and SVM, with results indicating that the combination of hybrid ViT and SVM yielded the best classification performance. Li et al. [22] developed a novel end-to-end network featuring a deformable attention network and a distillation-driven interactive aggregation module. The deformable attention network learns nodule features guided by deformable attention modules and class activation mechanisms.
Meanwhile, the distillation-driven interactive aggregation module utilizes the complementarity of adjacent layers to supplement multi-scale semantic information and enhance the representational capacity of aggregated features, leading to a significant improvement in classification results. Ruiguo Yu [23] proposed a feature discretization-based deep clustering (FDDC) algorithm to improve deep clustering algorithms, achieving a tumor classification accuracy of 79.06% and a machine type classification accuracy of 96.17%. Yuan Hang [24] applied histogram equalization, Laplacian operator, logarithmic transformation, and gamma correction to the images, and used a random forest model as the classifier, combining Speeded Up Robust Features (SURF) with deep features extracted from Residual Generative Adversarial Networks (Res-GAN), achieving an accuracy of 95%. Chen et al. [25] employed a multi-channel fusion approach to enrich the image information. Subsequently, they utilized an adaptive feature space fusion method to merge the three-level features extracted from ResNet50. Finally, the classification process was completed by integrating these features with Local Binary Patterns (LBP) features, achieving a classification accuracy of 96.91%. Tao et al. [26] improved a deep learning model based on the ResNet50 architecture to extract features from grayscale ultrasound (GSU), color Doppler flow imaging (CDFI), strain elastography (SE), and region of interest mask images of transverse and longitudinal sections. They achieved diagnosis of TI-RADS classes 3–5 thyroid nodules with an accuracy of up to 90.9%. Gao et al. [27] integrated deep learning, manually crafted radiological features, and ultrasound image features to diagnose central lymph node metastasis in patients with papillary thyroid carcinoma. Experimental results demonstrated that the model’s performance was optimal when combining the three different features. Tang et al. 
[28] designed a targeted dual-stream attention neural network called TS-DSANN, focusing on ultrasound image texture and shape. It adopts ResNet34 as the baseline model to pay attention to information related to nodule texture. In the shape-focused stream, in addition to using ResNet34, a contour detection module is incorporated to learn features related to nodule shape. Finally, the captured two different features are concatenated to complete the identification of thyroid nodule benignity and malignancy, achieving a recognition rate of up to 90.31%.
From previous work, it is evident that most scholars prefer to utilize deep learning for thyroid nodule ultrasound image classification tasks, overlooking the extensive clinical experience accumulated by doctors in their work. These experiential insights can contribute to further enhancing the diagnostic performance of computer-aided diagnosis systems for thyroid nodules.
This paper proposes a method that combines feature engineering with deep learning, which involves the fusion of medical prior knowledge and deep features to form a hybrid feature space that contains richer feature information about thyroid nodules, thereby meeting the requirements of multi-classification tasks for thyroid nodules.
Method
Due to the limitations of image acquisition devices and manual annotations, the quality of ultrasound images obtained is often low. To improve the accuracy of thyroid nodule classification, in the preprocessing stage, this paper not only uses hybrid median filtering [29] to remove noise but also designs a method to remove artifacts introduced by manual annotations, aiming to improve the quality of ultrasound images. In the feature extraction stage, this paper adopts two different strategies: traditional feature engineering and deep learning, to extract features of thyroid nodules. The U-net network [30] is used to segment the lesion area and obtain the edge information of the nodules. Traditional feature engineering is employed to extract shallow features of thyroid nodules, while convolutional neural networks are used to extract deep features. Finally, the two types of features are fused and input into the XGBoost classifier [31] for classification. In this study, an integrated network architecture is proposed to accomplish the classification of thyroid nodules, as shown in Fig. 1.
Fig. 1.
Integrated model architecture
Preprocessing
For the artificial annotations in the images, the first step is their removal. The original image is subjected to edge detection using the Roberts operator to extract the edge information of the artificial annotations. Subsequently, the edge image is binarized to create a mask image. The original image is then added to the mask image, resulting in an image with the annotations removed. The final step is image restoration. The fast marching method (FMM) [32] is employed to repair the image with removed annotations. The FMM algorithm utilizes neighboring pixels to replace the surrounding defective pixels. Once a pixel is repaired, it rapidly proceeds to the next defective pixel. To address noise in the images, hybrid median filtering is used, which effectively removes noise while preserving a significant amount of edge information associated with nodules. Following noise removal, the image undergoes contrast-limited adaptive histogram equalization to enhance image contrast and augment texture and echogenicity information in the nodule regions, as shown in Fig. 2.
Fig. 2.
The process of image preprocessing. a Original image. b Image after artifact removal. c Image after denoising and enhancement
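As a rough illustration of the preprocessing pipeline, the following sketch shows the Roberts-edge masking and restoration steps. The threshold and iteration counts are assumptions, a simple layer-by-layer diffusion fill stands in for the true fast marching method, a plain 3×3 median filter stands in for the hybrid median filter, and CLAHE is omitted for brevity:

```python
import numpy as np
from scipy import ndimage

def preprocess(img, edge_thresh=60.0, max_iters=100):
    """Annotation removal + denoising sketch (simplified stand-ins for
    the paper's FMM inpainting and hybrid median filter)."""
    f = img.astype(float)
    # Roberts cross operator: two diagonal 2x2 difference kernels
    g1 = ndimage.convolve(f, np.array([[1.0, 0.0], [0.0, -1.0]]))
    g2 = ndimage.convolve(f, np.array([[0.0, 1.0], [-1.0, 0.0]]))
    mask = (np.abs(g1) + np.abs(g2)) > edge_thresh
    mask = ndimage.binary_dilation(mask, iterations=2)  # cover whole strokes

    out = f.copy()
    valid = (~mask).astype(float)
    remaining = mask.copy()
    for _ in range(max_iters):
        if not remaining.any():
            break
        # mean of already-known 3x3 neighbours; like FMM, this fills the
        # defect from its boundary inward, one layer per iteration
        num = ndimage.uniform_filter(out * valid, size=3)
        den = ndimage.uniform_filter(valid, size=3)
        ready = remaining & (den > 0)
        out[ready] = num[ready] / den[ready]
        valid[ready] = 1.0
        remaining &= ~ready

    return ndimage.median_filter(out, size=3)  # noise suppression
```

In a production setting, OpenCV's `cv2.inpaint` with the `cv2.INPAINT_TELEA` flag implements the actual fast marching method used in the paper.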
Feature Engineering
In clinical practice, doctors observe the pathological features of thyroid nodules in ultrasound images. Based on this fact, this study adopts the widely used Chinese Thyroid Imaging Reporting and Data System [8] as a reference. The C-TIRADS system describes the ultrasound features of thyroid nodules, including comet tail sign, solid composition, hypoechogenicity, microcalcifications, blurred margins, irregular margins or extrathyroidal invasion, and vertical orientation of the nodule. These features are closely related to the classification of thyroid nodules. The comet tail artifact is considered a benign feature, while all the others are malignant features. Ratings are based on a scoring system, where the presence of a benign feature subtracts 1 from the total score, and the presence of a malignant feature adds 1 to the total score. The specific criteria are shown in Table 1. In this study, these descriptions are summarized into five features: shape, boundary, aspect ratio, echogenicity, and calcification. To obtain information about shape, boundary, and aspect ratio, the nodule boundary must first be segmented from the original ultrasound images. Current image segmentation techniques include graph-based methods, region-based methods, and semantic segmentation; accurate segmentation of thyroid nodule ultrasound images remains a challenging task. In this study, a U-net network based on semantic segmentation is used. The U-net network excels at pixel-level segmentation, yielding more accurate contour information of thyroid nodules and thus improving the final classification accuracy. In addition, statistical features extracted from the gray-level co-occurrence matrix (GLCM) are used as echogenicity features. The GLCM statistically analyzes the texture information in the image region based on pixel values, reflecting how often pairs of pixels at a given distance and orientation share particular gray levels.
For calcification features, this study considers the degree of microcalcification in the thyroid nodule as its characteristic quantity. The parameters for edge point sharpness and GLCM are shown in Table 2.
Table 1.
C-TIRADS risk stratification criteria
| Score | Estimated malignancy rate | Risk level |
|---|---|---|
| − 1 | 0% | 2, benign |
| 0 | < 2% | 3, possibly benign |
| 1 | 2–10% | 4A, mildly suspicious for malignancy |
| 2 | 10–50% | 4B, moderately suspicious for malignancy |
| 3 or 4 | 50–90% | 4C, highly suspicious for malignancy |
| 5 | > 90% | 5, highly indicative of malignancy |
Table 2.
Feature engineering parameter settings
| Method | Gray level | Distance | Direction |
|---|---|---|---|
| EAV | / | 1 | 0°, 45°, 90°, 135°, 180°, 225°, 270°, 315° |
| GLCM | 16 | 1 | 0°, 45°, 90°, 135° |
Shape
This study uses circularity and ellipticity as two metrics to measure the shape features. Circularity describes the compactness of the image object, similar to the boundary index but based on the area. It measures the similarity between the contour shape of the nodule and its equivalent circle, calculated using Formula (1). Here, A represents the area of the nodule region, and P represents the perimeter of the nodule boundary. Ellipticity is derived based on the ratio of the overlapping area S1 between the nodule and the fitted ellipse to the area of the ellipse S2, as shown in Formula (2).
$$C = \frac{4\pi A}{P^{2}} \tag{1}$$
$$E = \frac{S_{1}}{S_{2}} \tag{2}$$
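Assuming Formula (1) is the standard compactness definition 4πA/P² (which equals 1 for a perfect circle) and Formula (2) is the ratio S1/S2 described above, the two shape features can be sketched as follows (function names are illustrative):

```python
import numpy as np

def circularity(area, perimeter):
    # assumed form of Formula (1): 4*pi*A / P^2, equal to 1 for a circle
    return 4.0 * np.pi * area / perimeter ** 2

def ellipticity(overlap_area, ellipse_area):
    # Formula (2): overlap S1 between nodule and fitted ellipse over S2
    return overlap_area / ellipse_area

# a circle of radius r has area pi*r^2 and perimeter 2*pi*r
r = 3.0
print(circularity(np.pi * r ** 2, 2 * np.pi * r))  # → 1.0
```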
Edge
According to the description of the edge features of thyroid nodules in TI-RADS, the clarity of the nodule boundary is highly correlated with its grading. In this study, two metrics, edge sharpness (Formula (3)) and edge intensity (Formula (4)), are used to quantify the boundary features.
| 3 |
| 4 |
Aspect Ratio
The aspect ratio is an important indicator in ultrasound examinations. A nodule with an aspect ratio less than 1 is usually benign, while a nodule with an aspect ratio greater than 1 is usually malignant. It is typically represented by the ratio of the height to the width of the minimum bounding rectangle of the nodule, as shown in Formula (5). Here, depth represents the height of the bounding rectangle, and width represents the width of the bounding rectangle.
$$\mathrm{AspectRatio} = \frac{\mathrm{depth}}{\mathrm{width}} \tag{5}$$
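Given a binary segmentation mask from U-net, the aspect ratio of an axis-aligned bounding rectangle can be computed as below. This is a minimal sketch; the paper's minimum bounding rectangle may be oriented, which would require a rotated-rectangle fit instead:

```python
import numpy as np

def aspect_ratio(mask):
    """depth / width of the bounding rectangle of a binary nodule mask."""
    ys, xs = np.nonzero(mask)
    depth = ys.max() - ys.min() + 1   # height (image depth direction)
    width = xs.max() - xs.min() + 1
    return depth / width              # > 1 suggests vertical orientation

mask = np.zeros((10, 10), dtype=bool)
mask[2:8, 4:7] = True                 # 6 rows tall, 3 columns wide
print(aspect_ratio(mask))             # → 2.0
```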
Echogenicity
Echogenicity is typically represented by the texture features of an image, which belong to a category of visual features that reflect the essential intrinsic characteristics of an object. In this study, a well-established algorithm called the gray-level co-occurrence matrix (GLCM) is used to describe the texture features of the entire image. From the GLCM, six statistical features are extracted: contrast, entropy, energy, inverse variance, correlation, and standard deviation. These features are based on the analysis of the gray-level relationships within the image and provide insights into the echogenicity of the thyroid nodules.
Contrast
Contrast directly reflects the brightness contrast between individual pixel values and their neighboring pixel values, to some extent reflecting the clarity of the image and the depth of the texture. The deeper the texture, the higher the contrast, resulting in a clearer visual effect. Conversely, a lower contrast indicates a shallower texture and a blurrier image.
$$\mathrm{Con} = \sum_{i}\sum_{j}(i-j)^{2}\,P(i,j) \tag{6}$$
Entropy
Entropy is a measure of the information contained in an image. Texture information is also a form of information within an image. Entropy represents the complexity and non-uniformity of the texture in the image.
$$\mathrm{Ent} = -\sum_{i}\sum_{j}P(i,j)\,\log P(i,j) \tag{7}$$
Energy
Energy is the sum of the squares of the elements in the gray-level co-occurrence matrix. It reflects the uniformity of the grayscale distribution and the coarseness of the texture in the image. If the elements in the gray-level co-occurrence matrix are similar, the energy value will be smaller, indicating fine texture. Conversely, if the elements have larger differences, the energy value will be higher, indicating a single or regular pattern of texture variation.
$$\mathrm{Asm} = \sum_{i}\sum_{j}P(i,j)^{2} \tag{8}$$
Inverse Variance
Inverse variance reflects the local variation of the texture in an image. If the texture in different regions of the image is relatively uniform and changes slowly, the inverse variance will be larger. Conversely, if there are significant variations in texture between different regions, the inverse variance will be smaller.
$$\mathrm{IDM} = \sum_{i}\sum_{j}\frac{P(i,j)}{1+(i-j)^{2}} \tag{9}$$
Correlation
Correlation measures the similarity of gray levels in an image in the row and column directions. A higher correlation value indicates a stronger similarity between the gray levels.
$$\mathrm{Corr} = \frac{\sum_{i}\sum_{j}(i-\mu_{i})(j-\mu_{j})\,P(i,j)}{\sigma_{i}\,\sigma_{j}} \tag{10}$$
Variance
Variance is used to describe the degree of variation in gray-level values between neighboring pixels. A larger variance indicates a more pronounced variation in gray-level values between adjacent pixels, suggesting a rougher or more chaotic texture. On the other hand, a smaller variance indicates a smoother variation in gray-level values between adjacent pixels, suggesting a more uniform or delicate texture.
$$\mathrm{Var} = \sum_{i}\sum_{j}(i-\mu)^{2}\,P(i,j) \tag{11}$$
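The six GLCM statistics can be sketched in NumPy as follows. Only distance 1 at angle 0° is shown (one of the four directions in Table 2, which would normally be averaged), and the dictionary keys are illustrative names:

```python
import numpy as np

def glcm_features(img, levels=16):
    """GLCM statistics at distance 1, angle 0 degrees;
    `img` holds 8-bit gray values (0-255)."""
    q = np.clip(img.astype(int) * levels // 256, 0, levels - 1)
    # count horizontally co-occurring gray-level pairs, then normalize
    P = np.zeros((levels, levels))
    np.add.at(P, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1)
    P /= P.sum()
    i, j = np.indices(P.shape)
    mu_i, mu_j = (i * P).sum(), (j * P).sum()
    sig_i = np.sqrt(((i - mu_i) ** 2 * P).sum())
    sig_j = np.sqrt(((j - mu_j) ** 2 * P).sum())
    eps = 1e-12
    return {
        "contrast":    ((i - j) ** 2 * P).sum(),         # Formula (6)
        "entropy":     -(P * np.log(P + eps)).sum(),     # Formula (7)
        "energy":      (P ** 2).sum(),                   # Formula (8)
        "homogeneity": (P / (1 + (i - j) ** 2)).sum(),   # inverse variance (9)
        "correlation": (((i - mu_i) * (j - mu_j) * P).sum()
                        / (sig_i * sig_j + eps)),        # Formula (10)
        "variance":    ((i - mu_i) ** 2 * P).sum(),      # Formula (11)
    }
```

For a perfectly uniform image, contrast and entropy are (near) zero while energy is 1, matching the intuitions in the text.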
Calcification
Statistics have shown that more than a quarter of malignant thyroid nodules contain calcifications, most of which are microcalcifications. Microcalcification can therefore be regarded as one of the most reliable indicators for diagnosing thyroid cancer. In medical practice, the degree of microcalcification is determined by the number and size of the calcifications. The degree of microcalcification, denoted as L, is defined as follows:
| 12 |
Here, n represents the number of calcification regions, Si represents the area of the i-th calcification region, and k = 1.2 [33].
Improvement of ShuffleNetV2
The structure of convolutional neural networks has been evolving for many years, becoming more accurate and efficient. Since the milestone work of AlexNet [34], significant improvements in classification accuracy on ImageNet have been achieved through the introduction of new structures such as VGG [17], GoogLeNet [35], ResNet [36], DenseNet [37], EfficientNet [38], and ShuffleNet [39]. In this study, we use the ShuffleNetV2 model pre-trained on ImageNet as the base model and incorporate a multi-head self-attention mechanism for deep feature extraction. ShuffleNetV2 [40] is a lightweight network constructed by stacking efficient modules. It utilizes pointwise group convolution and inverted residual blocks, and introduces operations like “channel shuffle” and “channel split” to fuse information from different channels. Additionally, a 1 × 1 convolutional layer is added before the final global average pooling layer to mix the features, which was not present in ShuffleNetV1. The structure of ShuffleNetV2 is shown in Fig. 3.
Fig. 3.
ShuffleNetV2 network structure
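The “channel shuffle” operation that ShuffleNetV2 uses to exchange information between channel groups reduces to a reshape–transpose–flatten trick, sketched here in NumPy:

```python
import numpy as np

def channel_shuffle(x, groups):
    """(N, C, H, W) tensor: interleave channels across `groups` groups."""
    n, c, h, w = x.shape
    assert c % groups == 0
    # reshape to (N, groups, C//groups, H, W), swap the two channel axes,
    # and flatten back so channels from different groups alternate
    return (x.reshape(n, groups, c // groups, h, w)
             .swapaxes(1, 2)
             .reshape(n, c, h, w))

x = np.arange(6).reshape(1, 6, 1, 1)        # channels labelled 0..5
print(channel_shuffle(x, 2).ravel())        # → [0 3 1 4 2 5]
```

With two groups, channels that started in group 0 (0, 1, 2) and group 1 (3, 4, 5) end up interleaved, so the next grouped convolution sees inputs from both groups.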
To further extract features that are highly correlated with nodules, a multi-head self-attention mechanism [41] is introduced into the base architecture of ShuffleNetV2. After the last inverted residual block in each stage, the multi-head self-attention mechanism is added. The shuffled features are then passed through the attention layer, enhancing the correlation and importance of features across different channels. This further improves the classification accuracy after the fusion of features. The multi-head self-attention module is illustrated in Fig. 4.
Fig. 4.
The multi-head self-attention module
The multi-head self-attention mechanism allows for the selective filtering of important information from a large amount of input by computing the correlation between features. It focuses on a small subset of the features that are deemed more important. The input to the self-attention mechanism consists of the query matrix Q, key matrix K, and value matrix V, which are derived from the previous layer’s output feature map. The query matrix Q is used to query the key matrix K to obtain the corresponding weight matrix, which is then multiplied with the value matrix V to allow V to attend to the more important information. Compared to the attention mechanism, the advantage of self-attention lies in the fact that the Q, K, and V matrices are derived from the same input, making them inherently connected. Multi-head self-attention mechanism utilizes multiple attention heads to learn different attention distributions of the input data, with each head computing a corresponding attention distribution and weighting the input data accordingly. The introduced structure is shown in Fig. 5.
Fig. 5.
Improved ShuffleNetV2 network structure
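To make the mechanism concrete, here is a minimal NumPy sketch of multi-head self-attention over a set of flattened feature-map positions. The weight matrices are random placeholders; a real implementation (e.g. PyTorch's `nn.MultiheadAttention`) adds biases, dropout, and masking:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, Wq, Wk, Wv, Wo, heads):
    """x: (n, d) tokens, e.g. the H*W positions of a feature map.
    Q, K, V all come from the same input x; each head attends over a
    d/heads-dimensional slice, and the heads are concatenated and
    projected by Wo."""
    n, d = x.shape
    dk = d // heads
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    outs = []
    for h in range(heads):
        s = slice(h * dk, (h + 1) * dk)
        # scaled dot-product attention: rows of A sum to 1
        A = softmax(Q[:, s] @ K[:, s].T / np.sqrt(dk))
        outs.append(A @ V[:, s])
    return np.concatenate(outs, axis=1) @ Wo
```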
Feature Fusion
In the field of medical image diagnosis, the application of feature fusion techniques has become quite mature. Feature fusion allows us to extract the most discriminative information from multiple input features, leveraging the complementary advantages of different features. In reference [42], EfficientB0, feature engineering, and a feature pyramid network are employed for feature extraction; the Concat function is then used for feature fusion to accomplish the classification of thyroid pathological features. This paper explores the possibility of improving the multi-classification strategy for thyroid nodule ultrasound images by combining medical prior knowledge with deep features. Specifically, this study combines quantified features based on medical prior knowledge with deep features extracted from the improved ShuffleNetV2 model, establishing a feature fusion framework. Since ShuffleNetV2 is a lightweight network, the feature fusion strategy adopts an early fusion approach, directly concatenating the two feature vectors with the Concat function. The formula is as follows:
$$F_{\mathrm{fusion}} = \mathrm{Concat}(a, b) \tag{13}$$
In the formula, “a” and “b” represent the two types of extracted features.
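Early fusion by concatenation is a single operation. In this sketch the feature dimensions are hypothetical, since the paper does not state them:

```python
import numpy as np

rng = np.random.default_rng(0)
handcrafted = rng.random(12)    # hypothetical TI-RADS prior-knowledge vector
deep = rng.random(1024)         # hypothetical deep feature vector
fused = np.concatenate([handcrafted, deep])   # Formula (13): Concat(a, b)
print(fused.shape)              # → (1036,)
```

In practice the two vectors are usually normalized to comparable scales before concatenation, so that neither feature type dominates the classifier.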
XGBoost Classifier
In machine learning, Boosting is an ensemble learning algorithm that can transform multiple weak classifiers into a strong classifier. The basic principle of Boosting is to use weighted training data as input for a series of weak classifiers, iteratively fitting the data. At each iteration, higher weights are assigned to the misclassified training samples. The final model is obtained by weighting the performance of all the models and outputting the combined prediction results. For example, voting can be used to solve the weighting problem for classification models, while arithmetic averaging can be used for regression models. Gradient boosting is based on the theory of Boosting and applies to differentiable objective functions. It computes the gradient of the loss function to update the model, allowing the objective function to descend in the direction of steepest descent.
XGBoost, short for eXtreme Gradient Boosting, is an extensible gradient boosting tree algorithm that improves upon the GBDT algorithm. Its base classifiers are classification and regression trees (CART). The final prediction result of XGBoost is the sum of the outputs from each CART model. XGBoost is an additive model and can be represented as:
$$\hat{y}_{i} = \sum_{k=1}^{K} f_{k}(x_{i}), \quad f_{k} \in \mathcal{F} \tag{14}$$
In the formula, $f_k$ represents a function in the function space $\mathcal{F}$.
$$\mathcal{F} = \left\{ f(x) = w_{q(x)} \right\}, \quad q: \mathbb{R}^{m} \to \{1, \dots, T\},\; w \in \mathbb{R}^{T} \tag{15}$$
$\mathcal{F}$ represents the set of all possible CART classifiers; $q$ represents the structure of each tree, mapping each input to its corresponding leaf node; $T$ represents the number of leaf nodes; $w$ represents the weights of the leaf nodes in the CART model; and $K$ represents the number of CART trees, indicating the complexity of the model. The objective function of XGBoost can be expressed as:
$$\mathrm{Obj} = \sum_{i} l(\hat{y}_{i}, y_{i}) + \sum_{k=1}^{K} \Omega(f_{k}) \tag{16}$$
In the objective function, “Ω” represents the regularization term.
XGBoost can fully utilize multi-core CPUs for parallel computation, which improves the computational speed of the model. XGBoost is also insensitive to missing values and can automatically learn the splitting directions for them. In addition to CART tree models, XGBoost also supports using linear classifiers as base classifiers, which enhances its scalability. The core idea of XGBoost can be summarized as follows: (1) Continuously adding and splitting to form a tree: Each time a tree is added, it learns a new function to fit the residuals of the previous predictions. (2) After training is complete, we have a collection of k trees. When predicting a sample, based on its features, it will be assigned to a leaf node in each tree, which represents the score for that sample. (3) The final prediction is obtained by summing up the scores of all leaf nodes, giving us the predicted value for the sample.
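The additive-model idea behind XGBoost, where the prediction is the sum of the K tree outputs as in Formula (14), can be illustrated with two toy decision stumps standing in for the CART base learners (the split thresholds and leaf values here are arbitrary):

```python
import numpy as np

# two toy decision stumps standing in for the CART base learners f_k
def tree1(x):
    return np.where(x[:, 0] > 0.5, 0.8, -0.3)

def tree2(x):
    return np.where(x[:, 1] > 0.2, 0.1, -0.4)

trees = [tree1, tree2]

def predict(x):
    # Formula (14): the model output is the sum of the K tree outputs
    return sum(t(x) for t in trees)

X = np.array([[0.9, 0.0], [0.1, 0.5]])
print(predict(X))   # approximately [0.4, -0.2]
```

In real training, each new tree is fitted to the gradients of the loss evaluated at the current ensemble's predictions, rather than being fixed in advance as here.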
Results
Dataset and Experimental Setup
In this study, two different datasets were utilized. One dataset (Dataset 1) originated from The Second Hospital of Hebei Medical University, China, and was used for the classification task. The other dataset (Dataset 2) was obtained from the medical image computing and computer-assisted intervention (MICCAI) challenge and was employed for the segmentation task.
Dataset 1 was collected by radiologists from January 1, 2019, to January 1, 2022. Two different models of ultrasound machines, Siemens and HITACHI-ALOKA, were used to acquire the ultrasound images. Figure 6 illustrates examples from the ultrasound image dataset. Dataset 1 consists of 922 thyroid nodule ultrasound images collected from 922 patients, categorized into classes 2 (149 cases), 3 (140 cases), 4A (156 cases), 4B (114 cases), 4C (123 cases), and 5 (240 cases), with participants’ ages ranging from 20 to 70 years. All ultrasound images were independently annotated by three experienced ultrasound physicians, who reached a consensus on the annotations. Cases with incomplete reports or unclear TI-RADS classifications were excluded, which also reduced the manual effort required for image analysis and processing. This study was approved by the ethics committee of the relevant institutions. In addition, this study augmented Dataset 1 by performing horizontal flipping (922 images), brightness enhancement (922 images), and scaling (922 images), resulting in a total of 3688 images combined with the original dataset, comprising classes 2 (596 cases), 3 (560 cases), 4A (624 cases), 4B (456 cases), 4C (492 cases), and 5 (960 cases). The augmented dataset was randomly and evenly divided into 5 folds, giving a ratio of 3:1:1 for the training, validation, and test sets, respectively. The partitioning ensures that images from the same patient appear in only one set. Four-fold cross-validation was conducted on the training and validation sets. Table 3 illustrates the distribution of data used for the classification task in this study.
Fig. 6.
Dataset presentation: a TI-RADS: 2, b TI-RADS: 3, c TI-RADS: 4A, d TI-RADS: 4B, e TI-RADS: 4C, f TI-RADS: 5
Table 3.
The distribution of data used for the classification task
| Class | Original | Augmented | Training set | Validation set | Test set |
|---|---|---|---|---|---|
| TI-RADS:2 | 149 | 596 | 361 | 117 | 118 |
| TI-RADS:3 | 140 | 560 | 336 | 112 | 112 |
| TI-RADS:4A | 156 | 624 | 375 | 124 | 125 |
| TI-RADS:4B | 114 | 456 | 276 | 90 | 90 |
| TI-RADS:4C | 123 | 492 | 294 | 98 | 100 |
| TI-RADS:5 | 240 | 960 | 576 | 192 | 192 |
| Total | 922 | 3688 | 2218 | 733 | 737 |
Dataset 2, used for the segmentation task, is a publicly available dataset containing 3644 images, released as part of the medical image computing and computer-assisted intervention (MICCAI) challenge. Each image is accompanied by a mask image annotated by professional physicians. For the segmentation task, all 3644 images were used as the training set, while the 922 images collected in Dataset 1 served as the validation set. Specific details regarding Dataset 2 are provided in Table 4.
Table 4.
The distribution of data used for the segmentation task
| Training set | Validation set | |
|---|---|---|
| Total | 3644 | 922 |
All experiments were conducted on a Windows 11 x64 system with an 11th Gen Intel(R) Core(TM) i5-11400H CPU @ 2.70 GHz and an NVIDIA GeForce RTX 3050 Laptop GPU. All experiments were performed using the PyTorch framework in Python.
Evaluation Metrics
In the study, the distribution of nodules varies across different categories. To mitigate the potential performance degradation caused by this class imbalance and to comprehensively evaluate the model’s performance on the dataset, this paper utilizes a confusion matrix to assess the model’s performance, including metrics such as accuracy, precision, recall, and F1 score. Additionally, the paper selects the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) as supplementary evaluation metrics. The ROC curve plots the false positive rate (FPR) on the horizontal axis and the true positive rate (TPR) on the vertical axis. A model with a ROC curve closer to the top-left corner indicates better predictive performance. For multi-class tasks, there are two methods to obtain the ROC curve: macro-average and micro-average. In this study, the macro-average approach is employed to calculate the AUC. Significance testing of the experimental results was performed using ANOVA, with Tukey’s HSD (honestly significant difference) as the post hoc test; p < 0.05 was considered statistically significant.
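Macro-averaged AUC computes a one-vs-rest AUC per class and then takes the unweighted mean across classes. A minimal rank-based sketch (function names are illustrative; `sklearn.metrics.roc_auc_score` with `average='macro'` and `multi_class='ovr'` is the production equivalent):

```python
import numpy as np

def auc_binary(scores, labels):
    """Rank-based AUC: probability that a positive outranks a negative.
    Assumes no tied scores and at least one sample of each class."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def macro_auc(prob, y, n_classes):
    # one-vs-rest AUC per class, then the unweighted (macro) mean
    return float(np.mean([auc_binary(prob[:, c], (y == c).astype(int))
                          for c in range(n_classes)]))
```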
Performance of Medical Prior Knowledge Classification
In this study, the quantified descriptions of nodule features in TI-RADS were used to perform a multi-class classification task for classes 2, 3, 4A, 4B, 4C, and 5. The classifier used the same set of parameters for all classes. Before extracting medical prior knowledge features, automatic segmentation of thyroid nodule lesion areas was performed using the U-net segmentation network, and the segmentation results are shown in Table 5. To evaluate the effectiveness of medical prior knowledge features, this study conducted comparative experiments using gray-level co-occurrence matrix, LBP, and Histogram of Oriented Gradients (HOG) features. Accuracy, precision, recall, F1 score, and AUC were used as the evaluation metrics. As shown in Table 6, the TI-RADS features achieved an accuracy of 76.45%, precision of 71.05%, recall of 64.82%, F1 score of 66.84%, and AUC of 0.9006 in this multi-class classification task. Compared with the best-performing individual texture feature (HOG), the TI-RADS features yielded relative improvements of approximately 15% in accuracy, 20% in recall, and 11% in F1 score. This indicates that medical prior knowledge features outperform the texture features extracted by the individual algorithms in this multi-class classification task, confirming the effectiveness of the TI-RADS features used in this study.
Table 5.
Thyroid nodule segmentation results
| Method | mIoU | Dice | mPA | mRecall |
|---|---|---|---|---|
| U-net | 0.8388 | 0.9069 | 0.8704 | 0.8704 |
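The segmentation metrics in Table 5 reduce to simple set overlaps between the predicted and ground-truth masks; a minimal numpy sketch (an illustration, not the authors' evaluation code):

```python
import numpy as np

def dice_and_iou(pred_mask, gt_mask):
    """Overlap metrics for a pair of binary segmentation masks."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    iou = inter / union if union else 1.0       # intersection over union
    dice = 2.0 * inter / total if total else 1.0  # Dice coefficient
    return dice, iou
```

The mean metrics in Table 5 (mIoU, mPA, mRecall) average the per-class values over the foreground and background classes.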
Table 6.
The classification results of TI-RADS prior knowledge features and others
| Method | Accuracy | Precision | Recall | F1 score | AUC |
|---|---|---|---|---|---|
| LBP | 0.4698 | 0.4447 | 0.3301 | 0.3511 | 0.7193 |
| GLCM | 0.5797 | 0.5944 | 0.4853 | 0.5219 | 0.8016 |
| HOG | 0.6633 | 0.7535 | 0.5395 | 0.5995 | 0.8605 |
| TI-RADS | 0.7645 | 0.7105 | 0.6482 | 0.6684 | 0.9006 |
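As an illustration of the texture baselines in Table 6, a gray-level co-occurrence matrix and two of its classic statistics (contrast and homogeneity) can be computed directly in numpy. The quantization level and pixel offset below are arbitrary choices for the sketch; libraries such as scikit-image offer an equivalent `graycomatrix`/`graycoprops` API.

```python
import numpy as np

def glcm_features(img, levels=8, dx=1, dy=0):
    """Co-occurrence matrix for one pixel offset, plus contrast/homogeneity."""
    # quantize intensities into `levels` gray levels
    q = (img.astype(float) / (img.max() + 1e-9) * (levels - 1)).astype(int)
    glcm = np.zeros((levels, levels))
    h, w = q.shape
    for y in range(h - dy):
        for x in range(w - dx):
            glcm[q[y, x], q[y + dy, x + dx]] += 1
    glcm /= glcm.sum()  # normalize to a joint probability
    i, j = np.indices((levels, levels))
    contrast = float((glcm * (i - j) ** 2).sum())
    homogeneity = float((glcm / (1.0 + np.abs(i - j))).sum())
    return contrast, homogeneity
```

On a perfectly flat image all co-occurring pairs share one gray level, so contrast is 0 and homogeneity is 1.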
To determine whether the performance differences among the feature sets are statistically significant, ANOVA and the Tukey HSD test were applied to the four methods in Table 6. The ANOVA yielded F-values of 42.5126, 47.8737, 43.6748, 49.9772, and 45.9684 for the five performance indicators, all with p-values less than 0.05, indicating significant differences among the features on every metric. The Tukey HSD test showed that the TI-RADS features differ significantly from the LBP, GLCM, and HOG features on all performance indicators. Based on the data in Table 6, it can be concluded that classification with TI-RADS features significantly outperforms the other three feature sets.
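The one-way ANOVA behind these comparisons reduces to a ratio of between-group to within-group variance. A minimal numpy version is shown below for illustration; the paper presumably used a statistics package, and e.g. `scipy.stats.f_oneway` plus statsmodels' `pairwise_tukeyhsd` cover the full ANOVA + Tukey HSD workflow.

```python
import numpy as np

def one_way_anova_f(*groups):
    """F statistic: between-group mean square over within-group mean square."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    grand = np.concatenate(groups).mean()
    k = len(groups)
    n = sum(len(g) for g in groups)
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

The p-value then follows from the F distribution with (k - 1, n - k) degrees of freedom.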
The Classification Performance of the Improved Model
For the extraction of deep features, this study compared the performance of several networks, including AlexNet, DenseNet, MobileNetV2 [42], EfficientNetB0, ShuffleNetV2, and the improved ShuffleNetV2, on the multi-class classification task of thyroid nodules; the best-performing model was selected for deep feature extraction. The hyperparameters were identical for each convolutional neural network: stochastic gradient descent (SGD) as the optimizer, batch size 16, learning rate 0.015, momentum 0.9, and weight decay 0.00004. All models were trained on the augmented version of Dataset 1; the classification results of the different network models are shown in Table 7. The improved ShuffleNetV2 model demonstrated the best overall performance on the multi-class classification task, with an accuracy of 0.9530, precision of 0.9610, recall of 0.9556, F1 score of 0.9582, and area under the ROC curve (AUC) of 0.9886, outperforming the other state-of-the-art models on every metric. Furthermore, Grad-CAM visualization was performed on the network models before and after the improvement. After introducing the multi-head self-attention mechanism, the network focused more on the nodule region and its surrounding tissues, confirming that the mechanism strengthens the feature correlations extracted by the ShuffleNetV2 network and thereby improves the final classification performance.
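The multi-head self-attention added to ShuffleNetV2 follows the scaled dot-product formulation of [41]. A standalone numpy sketch over a flattened feature map is shown below; the weight matrices are placeholders for learned projections, and in the actual model the module operates on convolutional feature maps inside each stage rather than on a generic sequence.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """x: (seq_len, d_model); Wq/Wk/Wv/Wo: (d_model, d_model) projections."""
    seq, d = x.shape
    dh = d // n_heads
    # project, then split into heads: (n_heads, seq, dh)
    q = (x @ Wq).reshape(seq, n_heads, dh).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq, n_heads, dh).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq, n_heads, dh).transpose(1, 0, 2)
    # scaled dot-product attention per head
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))
    # concatenate heads and apply the output projection
    out = (attn @ v).transpose(1, 0, 2).reshape(seq, d)
    return out @ Wo
```

The output has the same shape as the input, which is what lets the module be dropped after a channel shuffle without disturbing the rest of the network.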
Table 7.
The classification results of improved ShuffleNetV2 and several other models
| Method | Accuracy | Precision | Recall | F1 score | AUC |
|---|---|---|---|---|---|
| AlexNet [34] | 0.7418 | 0.7599 | 0.6923 | 0.7190 | 0.9276 |
| DenseNet [37] | 0.6447 | 0.6370 | 0.5873 | 0.6070 | 0.8600 |
| MobileNetV2 [42] | 0.6678 | 0.6598 | 0.6232 | 0.6387 | 0.8880 |
| EfficientNetB0 [38] | 0.8141 | 0.8259 | 0.7732 | 0.7958 | 0.9334 |
| ShuffleNetV2 [40] | 0.8618 | 0.8861 | 0.8988 | 0.8660 | 0.9641 |
| Improved ShuffleNetV2 | 0.9530 | 0.9610 | 0.9556 | 0.9582 | 0.9711 |
For each indicator in Table 7, the ANOVA yielded F-values of 63.1958, 19.1597, 26.5431, 29.8367, and 64.3748, with p-values close to 0 (all less than 0.05), indicating significant differences in performance among the models. Furthermore, the Tukey HSD test showed significant differences (p < 0.05) between the improved model and every other method; based on the data in Table 7, it can be concluded that the improved model outperforms the other models in overall performance.
The Overall Classification Performance
Table 6 confirms the effectiveness of the features extracted based on TI-RADS. In this section, medical prior knowledge features and deep features are fused, followed by classification of thyroid nodules using the XGBoost classifier. Previous studies have explored the use of TI-RADS features for pathological feature classification [43] and benign-malignant classification [44], but the methods for quantizing TI-RADS features differ. Table 8 and Fig. 7 present the classification results of the overall model on the test set.
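The fusion step described here amounts to concatenating the two feature spaces before classification. A minimal sketch follows; the per-space z-score normalization is our assumption rather than a detail stated in the paper, and the XGBoost call is shown only as a comment because its exact hyperparameters are not reported.

```python
import numpy as np

def fuse_features(deep_feats, prior_feats):
    """Z-score each feature space separately, then concatenate column-wise."""
    def zscore(a):
        return (a - a.mean(axis=0)) / (a.std(axis=0) + 1e-8)
    return np.hstack([zscore(deep_feats), zscore(prior_feats)])

# Downstream classification (assumed interface; requires the xgboost package):
#   import xgboost as xgb
#   clf = xgb.XGBClassifier(objective="multi:softprob", num_class=6)
#   clf.fit(fuse_features(deep_train, prior_train), y_train)
```

Normalizing each space before concatenation keeps the deep features, which are typically much higher-dimensional, from dominating the handcrafted ones on scale alone.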
Table 8.
Fused-MPK performance evaluation and performance comparison with others
Fig. 7.
Confusion matrices of different models. a Confusion matrix of ShuffleNetV2. b Confusion matrix of improved ShuffleNetV2. c Confusion matrix of Fused-MPK
By comparing the results in Table 8, it can be observed that the proposed approach, which combines medical prior knowledge and deep features, achieved accuracy, precision, recall, F1 score, and AUC of 0.9717, 0.9765, 0.9717, 0.9834, and 0.9855, respectively. Except for slightly lower precision compared to reference [43], the other metrics outperformed the methods in references [43, 44].
In Fig. 7, the confusion matrices for ShuffleNetV2, the improved model, and the overall model are displayed, while Table 9 provides the classification accuracy of the different models for each category, all obtained on the test set. Combining Table 9 and Fig. 7, it can be observed that the ShuffleNetV2 model achieves relatively low recognition accuracy for each category of thyroid nodules, especially for the 3, 4A, 4B, and 4C categories. With the introduction of the multi-head self-attention mechanism, the classification accuracy improves significantly, and the addition of medical prior knowledge features enhances it further. The final per-category accuracies on the test set are 0.9745, 0.9375, 0.9840, 0.9888, 0.9500, and 0.9740. This indicates that the medical prior knowledge features and the multi-head self-attention module introduced in this study further improve the classification accuracy of the ShuffleNetV2 model for thyroid nodules. Additionally, Fig. 8 provides detailed ROC curves for feature extraction and classification with the different models, with the macro-average AUC calculated to intuitively reflect the classification results for each category.
Table 9.
The classification accuracy of different models on each category
| ShuffleNetV2 | Improved | Fused-MPK | |
|---|---|---|---|
| TI-RADS:2 | 0.8390 | 0.9237 | 0.9745 |
| TI-RADS:3 | 0.7054 | 0.8929 | 0.9375 |
| TI-RADS:4A | 0.7440 | 0.9360 | 0.9840 |
| TI-RADS:4B | 0.7000 | 0.9016 | 0.9888 |
| TI-RADS:4C | 0.7600 | 0.9200 | 0.9500 |
| TI-RADS:5 | 0.9220 | 0.9427 | 0.9740 |
Fig. 8.
ROC curves of different models on different categories. a ROC curve of TI-RADS:2. b ROC curve of TI-RADS:3. c ROC curve of TI-RADS:4A. d ROC curve of TI-RADS:4B. e ROC curve of TI-RADS:4C. f ROC curve of TI-RADS:5
In the significance test, the F-values for the performance indicators in Table 8 are 20.3783, 20.2216, 20.0335, 22.7803, and 19.9639, respectively, all with p-values less than 0.05, indicating significant differences among the methods. The Tukey HSD test identifies which pairs of methods differ: on the accuracy and precision indicators, Duan's differs significantly from Fused-MPK and from Yu's; on the F1 score and AUC indicators, Duan's differs significantly from Fused-MPK; no significant differences are found among the three methods on the recall indicator. This suggests that when only accuracy and precision are considered, Duan's method performs better; however, when all five performance indicators are considered together, Fused-MPK stands out among the three methods.
For the data in Table 9, the ANOVA analysis results in an F-statistic of 21.2312 and a p-value of approximately 4.22e-05. This p-value suggests that there are significant differences in accuracy among the three methods (ShuffleNetV2, improved, Fused-MPK) for different TI-RADS categories.
Visualization
In this study, we fused features extracted through two different methods and fed them into a classifier. To address the interpretability of deep learning models and to assess the feature extraction performance of the improved model, we employed gradient-weighted class activation mapping (Grad-CAM) for visualization; Grad-CAM highlights the regions the model relies on for the classification task. We randomly selected an image from each of the six categories of thyroid nodules, shown in the first column of Fig. 9, and visualized the global pooling layer of ShuffleNetV2 in the second column. It can be observed that ShuffleNetV2 only partially covers the regions of interest on the nodules, which explains its lower classification accuracy: excessive attention is paid to surrounding tissues. The third column shows the visualization of the improved model proposed in this study, which also extracts features from the global pooling layer. Compared with the second column, the improved model's regions of interest almost entirely cover the nodule region while also encompassing some surrounding tissue. This indicates that the model focuses not only on features of the nodule itself, such as cystic-solid composition and echogenicity, but also on the edge features of the nodule.
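The Grad-CAM maps in Fig. 9 are computed by weighting each channel of the chosen layer's activations by the spatial mean of its gradient, then rectifying and normalizing the weighted sum. An illustrative numpy version is shown below, with the layer's activation and gradient tensors taken as given (in practice they come from a forward and backward pass through the network).

```python
import numpy as np

def grad_cam(activations, gradients):
    """activations, gradients: (C, H, W) tensors from the target layer."""
    weights = gradients.mean(axis=(1, 2))                     # per-channel importance
    cam = (weights[:, None, None] * activations).sum(axis=0)  # weighted channel sum
    cam = np.maximum(cam, 0)                                  # ReLU: keep positive evidence
    return cam / (cam.max() + 1e-8)                           # normalize to [0, 1]
```

The normalized map is then upsampled to the input resolution and overlaid on the ultrasound image as a heatmap.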
Fig. 9.
Heatmaps of different models
Discussion
In this study, an integrated deep learning model was constructed. In the preprocessing stage, we designed an algorithm to remove artificial annotations, eliminating their impact on the classification results. Subsequently, the images were denoised and enhanced using a combination of hybrid median filtering and contrast-limited adaptive histogram equalization, addressing the differences between ultrasound images captured by different devices and the inherent issues of noise, low resolution, and blurred edges.
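The hybrid median filter used in preprocessing differs from a plain median filter by taking the median of three sub-medians: the "+"-shaped neighbors, the "x"-shaped neighbors, and the center pixel, which preserves edges and corners better while still suppressing speckle. A minimal 3x3 numpy sketch (interior pixels only, borders left unchanged; an illustration, not the authors' implementation):

```python
import numpy as np

def hybrid_median_3x3(img):
    """Hybrid median: median of (plus-median, cross-median, center pixel)."""
    out = img.astype(float).copy()
    h, w = img.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            plus = [img[y-1, x], img[y+1, x], img[y, x-1], img[y, x+1], img[y, x]]
            cross = [img[y-1, x-1], img[y-1, x+1], img[y+1, x-1], img[y+1, x+1], img[y, x]]
            out[y, x] = np.median([np.median(plus), np.median(cross), img[y, x]])
    return out
```

An isolated impulse (salt noise) on a flat background is removed, since neither sub-neighborhood's median is affected by a single outlier.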
Furthermore, in the feature extraction stage, we fused medical prior knowledge features with deep features. From the results in Table 6, it can be observed that the classification accuracy of the medical prior knowledge features was 0.7645 with an AUC value of 0.9006, performing significantly better than algorithms directly extracting image texture features using GLCM, HOG, and LBP. For extracting deep features, considering that this study involved a fairly difficult multi-classification task, we introduced a self-attention module to enable the model to learn more discriminative features. According to the results in Table 7, the improved model achieved a classification accuracy of 0.9530 with an AUC value of 0.9886, an approximately 11% improvement in accuracy over the baseline model.
Table 8 summarizes the classification results of our proposed method and other similar methods in recent studies. Direct comparisons are not feasible for two reasons: (1) most of the literature uses different or privately collected datasets, making it impossible to apply our model to their data, and (2) the code of these studies is not open source, so our dataset cannot be applied directly to their models. Nevertheless, based on the method descriptions in the literature, we reproduced the models and performed the classification task using our dataset. From Table 8, it can be observed that our method achieved an accuracy of 0.9717 with an AUC value of 0.9855. In comparison, the methods from references [43] and [44] reached accuracies of 0.9686 and 0.9490 on our dataset, with corresponding AUC values of 0.9722 and 0.9719. In terms of precision, our method (0.9765) was lower than the method of reference [43] (0.9846). However, considering the overall performance across accuracy, precision, recall, F1 score, and AUC, our proposed method clearly performs best on this classification task.
Additionally, as shown in Table 9 and Fig. 7, the model designed in this study still makes classification errors. On one hand, this could be attributed to the relatively small amount of data, especially in class 3 (560 cases after augmentation), class 4B (456 cases), and class 4C (492 cases), where the data is significantly limited. Moreover, according to the TI-RADS classification standard, classes 3, 4A, 4B, and 4C may exhibit both benign and malignant features; when a benign feature is present, one point is subtracted from the total count. This may lead to misclassification when the model fails to accurately identify benign or malignant features. On the other hand, insufficient feature extraction from the ultrasound images may also contribute to the errors: despite using both deep features and traditional handcrafted features, the handcrafted features may not fully exploit the information within the images, resulting in inaccurate grading.
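The point-counting behind this grading can be illustrated with a small helper. The cut-offs below are our reading of the C-TIRADS counting scheme [8] (the score is the number of malignant features, minus one point when a benign feature is present) and are shown for illustration only, not as the authors' implementation; they should be checked against the guideline.

```python
def c_tirads_category(n_malignant_features, has_benign_feature):
    """Illustrative C-TIRADS scoring: count malignant features, subtract one
    point when a benign feature is present, then map the score to a category.
    Cut-offs follow our reading of the guideline [8] (assumption)."""
    score = n_malignant_features - (1 if has_benign_feature else 0)
    if score <= -1:
        return "2"
    if score == 0:
        return "3"
    if score == 1:
        return "4A"
    if score == 2:
        return "4B"
    if score in (3, 4):
        return "4C"
    return "5"
```

A nodule whose features sit near one of these boundaries is exactly the case where a missed benign or malignant feature flips the predicted category, matching the error pattern discussed above.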
Furthermore, this study also has certain limitations: (1) Before extracting medical prior knowledge features, it is crucial to obtain highly accurate nodule contour information, which is essential for subsequent feature engineering. (2) The dataset used in this study was collected from 922 patients, comprising a total of 922 images. Although data augmentation was performed, these data are still somewhat limited, which may lead to a lack of model generalization. In future research, we will introduce networks with stronger segmentation performance into the model to further improve classification performance. Additionally, we will collect more thyroid ultrasound images from different categories to validate the model and assess its potential for extending to real-world applications.
Experimental results indicate that the approach proposed in this study, which incorporates medical prior knowledge into deep learning, is effective for the multi-classification of thyroid nodule ultrasound images. The integration of medical prior knowledge features further improves the classification accuracy of deep learning models, underscoring the significant role of medical expertise in classification tasks. The main contributions of this paper are: (1) introducing a multi-head self-attention mechanism into the ShuffleNetV2 model, added after the last channel shuffle operation in each stage, to extract more discriminative features of thyroid nodules; and (2) proposing a novel integration architecture that combines the feature space formed by quantifying the pathological features of thyroid nodules described by C-TIRADS, a form of medical prior knowledge, with the feature space obtained through convolutional neural networks, for the multi-classification of thyroid nodule ultrasound images.
Conclusion
This paper presents an integrated deep learning model. In the preprocessing stage, it employs an algorithm that removes manual annotations (pseudo-artifacts) which might otherwise influence the classification results. It then applies a combination of median filtering and contrast-limited adaptive histogram equalization to denoise and enhance the images. In the feature extraction stage, the paper quantifies ultrasound features from the TI-RADS description of thyroid nodules, including echogenicity, microcalcifications, margin, shape, and aspect ratio, and extracts deep features from the ShuffleNetV2 model improved with a self-attention mechanism. Finally, it concatenates these two types of features and feeds them into an XGBoost classifier for classification.
The study utilizes feature engineering and deep learning to mine and learn ultrasound features from thyroid images and employs the machine learning-based XGBoost classifier to perform thyroid nodule-level evaluation. The approach achieves an accuracy of 0.9717 and an AUC value of 0.9855 on a private dataset. These results indicate that the method effectively discriminates between thyroid nodules of different grades and can assist medical professionals in thyroid nodule diagnosis, potentially reducing the burden on patients by avoiding unnecessary surgeries.
Author Contribution
All authors contributed to the study conception and design. Hua Chen conceived the idea; material preparation, data collection, and analysis were performed by Chong Liu, Xiaoshi Cheng, Chenjun Jiang, and Ying Wang. The first draft of the manuscript was written by Hua Chen, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Funding
The project was supported by the S&T program of Hebei No.21377777D.
Declarations
Ethical Approval
This study was approved by the ethics committee of the relevant institutions.
Informed Consent
Informed consent was obtained from all individual participants included in the study.
Conflict of Interest
The authors declare no competing interests.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Hua Chen, Email: chenhua@ysu.edu.cn.
Ying Wang, Email: dina513@163.com.
References
- 1.Camargo, R.Y. and E.K. Tomimori.: [Usefulness of ultrasound in the diagnosis and management of well-differentiated thyroid carcinoma]. Arq Bras Endocrinol Metabol, 51(5): 783-792, 2007. [DOI] [PubMed] [Google Scholar]
- 2.Enewold, L., et al.: Rising thyroid cancer incidence in the United States by demographic and tumor characteristics, 1980-2005. Cancer Epidemiol Biomarkers Prev, 18(3): 784-791, 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Russ, G., et al.: Thyroid incidentalomas: epidemiology, risk stratification with ultrasound and workup. Eur Thyroid J, 3(3): 154-163, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Liu, H., et al.: Evaluating DNA Methylation, Gene Expression, Somatic Mutation, and Their Combinations in Inferring Tumor Tissue-of-Origin. Front Cell Dev Biol 9: 619330-619330, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Park, S.H., et al.: Interobserver agreement in assessing the sonographic and elastographic features of malignant thyroid nodules. AJR Am J Roentgenol 193(5): 416-423, 2009. [DOI] [PubMed] [Google Scholar]
- 6.Shin, J.H., et al.:Ultrasonography Diagnosis and Imaging-Based Management of Thyroid Nodules: Revised Korean Society of Thyroid Radiology Consensus Statement and Recommendations. Korean J Radiol 17(3): 370-95, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Haugen, B.R.: 2015 American Thyroid Association Management Guidelines for Adult Patients with Thyroid Nodules and Differentiated Thyroid Cancer: What is new and what has changed? Cancer 123(3): 372-381, 2017. [DOI] [PubMed] [Google Scholar]
- 8.Zhou, J., et al.: 2020 Chinese guidelines for ultrasound malignancy risk stratification of thyroid nodules: the C-TIRADS. Endocrine 70(2): 256-279, 2020. [DOI] [PubMed] [Google Scholar]
- 9.Choi, S.H., et al.: Interobserver and intraobserver variations in ultrasound assessment of thyroid nodules. Thyroid 20(2): 167-172, 2010. [DOI] [PubMed] [Google Scholar]
- 10.Park, C.S., et al.: Observer variability in the sonographic evaluation of thyroid nodules. J Clin Ultrasound 38(6): 287-293, 2010. [DOI] [PubMed] [Google Scholar]
- 11.Kim, H.G., et al.: Man to man training: can it help improve the diagnostic performances and interobserver variabilities of thyroid ultrasonography in residents? Eur J Radiol 81(3): 352-356, 2012. [DOI] [PubMed] [Google Scholar]
- 12.Jeong, E.Y., et al.: Computer-aided diagnosis system for thyroid nodules on ultrasonography: diagnostic performance and reproducibility based on the experience level of operators. Eur Radiol 29(4): 1978-1985, 2019. [DOI] [PubMed] [Google Scholar]
- 13.Keramidas, E.G., et al.: Efficient and effective ultrasound image analysis scheme for thyroid nodule detection. in Image Analysis and Recognition: 4th International Conference, ICIAR 2007, Montreal, Canada, August 22–24, 2007. Proceedings 4. Springer, 2007.
- 14.Savelonas, M., D. Maroulis, and M. Sangriotis.: A computer-aided system for malignancy risk assessment of nodules in thyroid US images based on boundary features. Comput Methods Programs Biomed 96(1): 25-32, 2009. [DOI] [PubMed] [Google Scholar]
- 15.Iakovidis, D.K., E.G. Keramidas, and D. Maroulis.: Fusion of fuzzy statistical distributions for classification of thyroid ultrasound patterns. Artif Intell Med 50(1): 33-41, 2010. [DOI] [PubMed] [Google Scholar]
- 16.Yuan, H., Y. Wu, and M.J.J.o.D.I. Dai.: Multi-Modal Feature Fusion-Based Multi-Branch Classification Network for Pulmonary Nodule Malignancy Suspiciousness Diagnosis. J Digit Imaging 36(2): 617–626, 2023. [DOI] [PMC free article] [PubMed]
- 17.Simonyan, K. and A. Zisserman.: Very deep convolutional networks for large-scale image recognition. The 3rd International Conference on Learning Representations (ICLR 2015), 2015.
- 18.Zhu, Y.C., et al.: Thyroid ultrasound image classification using a convolutional neural network. Ann Transl Med 9(20): 1526, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Chi, J., et al.: Thyroid Nodule Classification in Ultrasound Images by Fine-Tuning Deep Convolutional Neural Network. J Digit Imaging 30(4): 477-486, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Yang, J., et al.: Ultrasound Image Classification of Thyroid Nodules Based on Deep Learning. Front Oncol 12: 905955-905955, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Feres, J., A. Noura, and K. Nawres.: Automatic classification of ultrasound thyroids images using vision transformers and generative adversarial networks. J Scientific African 20, 2023.
- 22.Minglei, L., et al.: SDA-Net: Self-distillation driven deformable attentive aggregation network for thyroid nodule identification in ultrasound images. J Artificial Intelligence In Medicine 146: 102699-102699, 2023. [DOI] [PubMed] [Google Scholar]
- 23.Yu, R., et al.: Feature discretization-based deep clustering for thyroid ultrasound image feature extraction. Comput Biol Med 146: 105600, 2022. [DOI] [PubMed] [Google Scholar]
- 24.Hang, Y.: Thyroid Nodule Classification in Ultrasound Images by Fusion of Conventional Features and Res-GAN Deep Features. J Healthc Eng: 9917538, 2021. [DOI] [PMC free article] [PubMed]
- 25.Hua, C., Minglun, M., Gang, L., Ying, W., Zhihao, J., & Chong, L.: Breast Tumor Classification in Ultrasound Images by Fusion of Deep Convolutional Neural Network and Shallow LBP Feature. J Digit Imaging: 36(3), 932-946, 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Yi, T., et al.: Deep learning for the diagnosis of suspicious thyroid nodules based on multimodal ultrasound images. Frontiers in Oncology 12: 1012724-1012724, 2022. [DOI] [PMC free article] [PubMed]
- 27.Gao, Y., et al.: An integrated model incorporating deep learning, hand-crafted radiomics and clinical and US features to diagnose central lymph node metastasis in patients with papillary thyroid cancer. J BMC cancer 24(1): 69-69, 2024. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Lu, T., et al.: TS-DSANN: Texture and shape focused dual-stream attention neural network for benign-malignant diagnosis of thyroid nodules in ultrasound images. J Medical image analysis 89: 102905-102905, 2023. [DOI] [PubMed] [Google Scholar]
- 29.Seetharaman, R., M. Tharun, and K. Anandan.: RETRACTED: A Novel approach in Hybrid Median Filtering for Denoising Medical images. IOP Conference Series: Materials Science and Engineering 1187(1), 2021.
- 30.Ronneberger, O., Fischer, P.: U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, 2015.
- 31.Chen, T. and C. Guestrin.: XGBoost, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining: 785–794, 2016.
- 32.Telea.: An image inpainting technique based on the fast marching method. Journal of graphics tools 9(1): 23–34, 2004.
- 33.Han, X., et al.: Thyroid nodule ultrasound image feature extraction technique based on TI-RADS 42(S2): 126-130, 2015. [Google Scholar]
- 34.Krizhevsky, A., I. Sutskever, and G.E Hinton.: Imagenet classification with deep convolutional neural networks 25, 2012.
- 35.Szegedy, C., et al.: Going deeper with convolutions. in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015.
- 36.He, K., et al.: Deep residual learning for image recognition. in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016.
- 37.Huang, G., et al.: Densely connected convolutional networks. in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017.
- 38.Tan, M. and Q. Le.: Efficientnet: Rethinking model scaling for convolutional neural networks. in International conference on machine learning, PMLR, 2019.
- 39.Zhang, X., et al.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018.
- 40.Ma, N., et al.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. in Proceedings of the European conference on computer vision (ECCV). 2018.
- 41.Vaswani, A., et al.: Attention is all you need. Neural Information Processing Systems (NeurIPS) 30, 2017
- 42.Sandler, M., et al.: Mobilenetv2: Inverted residuals and linear bottlenecks. in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018.
- 43.Duan, X., et al.: An Ensemble Deep Learning Architecture for Multilabel Classification on TI-RADS. in 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2020.
- 44.Yu, H., et al.: Intelligent diagnosis algorithm for thyroid nodules based on deep learning and statistical features. Biomedical Signal Processing and Control 78, 2022.