Journal of Imaging Informatics in Medicine
. 2025 Mar 4;38(6):3741–3756. doi: 10.1007/s10278-025-01463-0

Spatial–Temporal Information Fusion for Thyroid Nodule Segmentation in Dynamic Contrast-Enhanced MRI: A Novel Approach

Binze Han 1,2,#, Qian Yang 3,#, Xuetong Tao 2, Meini Wu 3, Long Yang 2, Wenming Deng 3, Wei Cui 4, Dehong Luo 3, Qian Wan 2, Zhou Liu 3,, Na Zhang 2,5,
PMCID: PMC12701142  PMID: 40038135

Abstract

This study aims to develop a novel segmentation method that utilizes spatio-temporal information for segmenting two-dimensional thyroid nodules on dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). Leveraging medical morphology knowledge of the thyroid gland, we designed a semi-supervised segmentation model that first segments the thyroid gland, guiding the model to focus exclusively on the thyroid region. This approach reduces the complexity of nodule segmentation by filtering out irrelevant regions and artifacts. Then, we introduced a method to explicitly extract temporal information from DCE-MRI data and integrated this with spatial information. The fusion of spatial and temporal features enhances the model’s robustness and accuracy, particularly in complex imaging scenarios. Experimental results demonstrate that the proposed method significantly improves segmentation performance across multiple state-of-the-art models. The Dice similarity coefficient (DSC) increased by 8.41%, 7.05%, 9.39%, 11.53%, 20.94%, 17.94%, and 15.65% for U-Net, U-Net++, SegNet, TransUnet, Swin-Unet, SSTrans-Net, and VM-Unet, respectively, and significantly improved the segmentation accuracy of nodules of different sizes. These results highlight the effectiveness of our spatial-temporal approach in achieving accurate and reliable thyroid nodule segmentation, offering a promising framework for clinical applications and future research in medical image analysis.

Supplementary Information

The online version contains supplementary material available at 10.1007/s10278-025-01463-0.

Keywords: Thyroid nodules, Thyroid nodule segmentation, Dynamic contrast-enhanced MRI (DCE-MRI), Spatial–temporal information fusion, Transformer

Introduction

Thyroid nodules (TNs) are a prevalent clinical condition, affecting approximately 25% of the general population, with only a minority being malignant (10%) or clinically significant due to compressive symptoms (5%) or thyroid dysfunction (5%) [1–3]. Accurate diagnosis and management of TNs depend heavily on differentiating benign from malignant nodules, which remains a critical challenge in clinical practice. Ultrasonography (US) has been the primary imaging modality for thyroid evaluation [4–10]. However, US is inherently limited by operator dependence, variable image quality, and lower specificity in distinguishing benign from malignant nodules, particularly in cases with ill-defined margins or overlapping structures [8–10].

Magnetic resonance imaging (MRI), known for its non-radiative nature, high soft tissue resolution, and objective 3D imaging capabilities, has gained widespread popularity among clinicians. With its ability to utilize various imaging sequences to depict both the structural and functional characteristics of tissues, MRI is a versatile diagnostic tool [10]. Among these sequences, dynamic contrast-enhanced MRI (DCE-MRI) is one of the most commonly used functional sequences [11, 12]. By monitoring signal changes over a period following the injection of a contrast agent, DCE-MRI provides an intuitive reflection of tissue hemodynamics. Through the analysis and processing of these hemodynamic data, particularly the calculation and evaluation of quantitative parameters such as Ktrans, Kep, and Ve within lesions, DCE-MRI plays a significant role in diagnosing benign and malignant lesions, pathological grading, treatment efficacy evaluation, and postoperative prognosis prediction [13]. In this process, high-precision lesion segmentation is crucial to ensure the accuracy and reliability of quantitative parameter calculations, thereby enhancing diagnostic precision and improving the scientific basis for treatment decisions. This not only provides robust technical support for lesion quantification but also lays a solid foundation for the advancement of personalized medicine.

In the segmentation of thyroid nodules, common challenges include similar signal intensity and low contrast between nodules and surrounding tissues, which make it difficult to accurately delineate boundaries. Nodules are highly heterogeneous in shape, size, and internal structure, requiring segmentation algorithms with strong generalization ability. Moreover, many nodules are small, increasing the risk of missed detection and under-segmentation in low-resolution or noisy images. Traditionally, manual segmentation has been the standard, but it is time-consuming and subject to inter-observer variability. Automation of TN segmentation offers a promising solution [14]. Advancements in segmentation algorithms have sought to address these challenges. Early approaches, such as thresholding, region growing, and edge detection, were effective under optimal conditions but often failed in the presence of heterogeneous nodule characteristics or poor image quality [10, 15–17]. More recently, machine learning and deep learning techniques have revolutionized the field [9]. In particular, convolutional neural networks (CNNs), U-Net architectures and their variants, and Transformers have demonstrated remarkable performance in accurately delineating nodule boundaries [18–23]. These methods have predominantly been developed and applied to ultrasonography (US) for TN evaluation. For example, Ma et al. developed a CNN-based model for TN segmentation using 2D US images [19]. Their approach generated segmentation probability maps from image patches, achieving a Dice coefficient of 0.922. Pan et al. utilized U-Net as the backbone for TN segmentation, incorporating a single-channel semantic map at each decoding step to guide low-level features [22]. This approach achieved a Dice coefficient of 0.729 on the Thyroid Digital Image Database, an external high-quality TN US dataset, outperforming traditional U-Net and U-Net++ by 2.0% and 2.4%, respectively.
More recently, Transformers have gained attention for their ability to capture long-range dependencies and global context in images, making them well suited for tasks requiring a broader understanding of spatial relationships [23]. Hybrid architectures that combine the local feature-capturing capabilities of CNNs with the global context modeling of Transformers have shown great promise in improving segmentation accuracy [9]. For instance, Li et al. integrated CNNs and Transformers with a boundary attention mechanism, achieving a Dice coefficient of 0.892. Similarly, Deng et al. proposed STU3Net, which combines a modified Swin Transformer with a CNN encoder to extract morphological features and edge details of TNs in US images. The model incorporated a three-layer U-Net with cross-layer connectivity to merge shallow and deep network information through skip connections [24]. STU3Net achieved a Dice coefficient of 0.837 on the external dataset, demonstrating its effectiveness in TN segmentation.

TN segmentation research has predominantly focused on US, while DCE-MRI-based TN segmentation remains underexplored. Despite studies highlighting its potential in detecting and characterizing thyroid lesions, assessing loco-regional invasion, and identifying metastatic nodes [25–28], DCE-MRI has been more extensively studied in other clinical contexts, such as segmentation of brain tumors, breast cancer, and other tumor types [29–32]. Although these advances demonstrate the effectiveness of DCE-MRI segmentation in other settings, the limited studies that focus on DCE-MRI-based segmentation predominantly utilize images from a specific static phase, as recommended by clinicians, as the input for segmentation models. These studies typically rely solely on two-dimensional spatial information for lesion segmentation. This conventional approach has inherent limitations that compromise both segmentation performance and generalizability:

  1. Limited phase utilization: Relying on a specific static phase, selected based on clinical recommendations, fails to capture individual variations in contrast agent enhancement dynamics. This approach risks suboptimal segmentation, particularly when the chosen phase does not represent the maximal contrast between the lesion and normal tissue for a specific patient.

  2. Underutilization of prior medical knowledge: Lesions and normal tissues often exhibit similar morphological characteristics and tissue properties, resulting in minimal contrast enhancement differences in the lesion area during dynamic imaging. Furthermore, the magnetic resonance signal characteristics of both tissues are relatively similar, making it exceedingly challenging to directly segment lesions from DCE-MRI images without additional contextual or physiological information.

  3. Neglect of temporal dynamics: Many existing studies disregard the pharmacokinetic information inherent in DCE-MRI, failing to fully leverage the temporal contrast enhancement dynamics. While a few studies have incorporated time-series information, these efforts often lack interpretability and robust evidence to support their effectiveness. Consequently, the potential of temporal information remains underexplored, limiting the accuracy and robustness of lesion segmentation.

This study addresses this gap by proposing a novel automated segmentation pipeline for TNs on DCE-MRI that effectively integrates spatial details and temporal hemodynamic patterns, providing improved accuracy and robustness. In summary, the main contributions of our work are as follows:

  1. We proposed a method to select the best phase by dynamically analyzing the gradient variation of each phase on DCE-MRI. This method fully considers the individual differences of different cases and adaptively selects the phase image with the largest spatial contrast between lesion tissue and normal tissue, thereby improving the accuracy and applicability of segmentation.

  2. We proposed a two-stage method to segment the thyroid gland first and then the thyroid nodules, incorporating thyroid morphological priors to guide the model’s focus on key regions, reducing task complexity and enhancing segmentation accuracy.

  3. We developed an innovative temporal feature extraction scheme and a spatio-temporal feature fusion method. This approach leverages the pharmacokinetic information inherent in DCE-MRI, enhancing the robustness and accuracy of the model, particularly in complex clinical scenarios.

Materials and Methods

Dataset

Study Population

A total of 136 consecutive patients, aged 18 years or older, who were scheduled for surgical resection of a thyroid mass between December 2020 and December 2022, were included in this retrospective study. Among these patients, 152 thyroid nodules were identified, comprising 18 benign and 134 malignant nodules. The maximum longitudinal diameter of the nodules ranged from 0.5 to 4.3 cm (mean 1.506 cm, SD 0.810 cm), while the minimum longitudinal diameter ranged from 0.2 to 3.8 cm (mean 1.181 cm, SD 0.695 cm). Of the 134 malignant nodules, 127 were papillary thyroid carcinoma, 3 were differentiated thyroid carcinoma, 2 were follicular thyroid carcinoma, 1 was medullary thyroid carcinoma, and 1 was Hodgkin’s lymphoma. Pathological grading was not applicable for these malignant nodules.

This study was approved by the institutional review board of Cancer Hospital Chinese Academy of Medical Sciences, Shenzhen Hospital, and the requirement for informed consent was waived due to the retrospective study design.

MR Imaging

All MR images were acquired on a 3 T MR scanner (Discovery MR 750w, General Electric Healthcare, Waukesha, WI) equipped with a dedicated surface coil designed for the thyroid gland. Patients were positioned head-first and supine on the gantry.

High-resolution T2-weighted images were acquired for anatomical localization with a spin echo sequence using the following parameters: TR/TE = 5.1 ms/2.1 ms, slice thickness 4 mm, a field of view (FOV) of 22 cm, and an acquisition and reconstruction matrix of 256 × 224. The duration of each scan was 9 s, and thirty-five scans were conducted during free respiration. Gadoteric acid (DOTA-Gd) (Dotarem; Guerbet, Aulnay-sous-Bois, France) at 0.1 mmol/kg body weight was injected intravenously at a rate of 2.0 ml/s using a high-pressure syringe, followed by 20 ml of 0.9% (w/v) saline solution at the same rate.

During dynamic contrast-enhanced MRI (DCE-MRI), 1–2 baseline frames were acquired prior to contrast administration, followed by repeated T1-weighted volumes every 9 s for a total of 35 dynamic frames. In our experience, the earliest visible thyroid enhancement is typically observed starting from the second dynamic frame, though in some cases it may appear as late as the fourth frame.

Image Analysis

Thyroid nodules in all DCE-MRI data were annotated by two experienced radiologists (Q.Y. and M.N.W.), and for 30% of the data (40 cases), the thyroid gland was also annotated. Because most images had a matrix size of 512 × 512 while a small number were 256 × 256, all images were uniformly resampled to 256 × 256 by bilinear interpolation.
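The bilinear resampling step above can be sketched in NumPy as follows (the function name and the pixel-centre mapping convention are illustrative, not taken from the paper):

```python
import numpy as np

def resize_bilinear(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Resize a 2D image with bilinear interpolation (pixel-centre alignment)."""
    in_h, in_w = img.shape
    # Map each output pixel centre back into input coordinates.
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, in_h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    # Blend the four surrounding input pixels per output pixel.
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

# Downsample a 512 x 512 slice to the common 256 x 256 matrix size.
slice_512 = np.random.rand(512, 512).astype(np.float32)
slice_256 = resize_bilinear(slice_512, 256, 256)
```

In practice a library call (e.g. `torch.nn.functional.interpolate` with `mode="bilinear"`) would typically replace the hand-written version.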

The Proposed Method

In this study, we proposed a novel method for TN segmentation using DCE-MRI, comprising three main steps: selecting the best phase, segmentation of the thyroid gland, and the spatio-temporal segmentation module. The first step employs a gradient-based approach to identify the phase at which the TN exhibits the highest contrast relative to its background [33]. Next, a semi-supervised method is used to generate a mask of the thyroid gland, minimizing the impact of non-thyroid structures on the subsequent segmentation workflow [34].

Finally, the spatio-temporal segmentation module integrates spatial and temporal features to improve nodule delineation and includes three key elements: the Image to Temporal Sequence (ITTS) module, which transforms multi-phase DCE-MR images into temporal sequences to capture enhancement dynamics; the Temporal Sequence Prediction (TSP) module, which models temporal dependencies and refines predictions based on dynamic patterns; and the Spatio-Temporal Fusion Segmentation Network, which combines spatial and temporal features to achieve precise segmentation by leveraging both nodule textural information and contrast kinetics. Together, these components exploit the unique capabilities of DCE-MRI for TN segmentation and provide a comprehensive framework for accurate and robust analysis.

Select the Best Phase

A gradient-based approach is employed to determine the phase at which the TN exhibits the highest contrast relative to its background in DCE-MRI (Fig. 1). Let {V_t, t = 1, 2, …, T} represent the time-series imaging data for each patient, captured from pre-contrast to post-contrast injection at different time intervals. Each V_t = {F_s^t, s = 1, 2, …, K} consists of K two-dimensional imaging slices depicting the spatial structure of the thyroid at time point t. The initial TN masks {g_t, t = 1, 2, …, T} underwent morphological operations, specifically dilation and erosion, to generate expanded and shrunk masks, respectively (Fig. 1(a)). By subtracting the eroded mask from the dilated mask, we derived the boundary mask {g_t^φ, t = 1, 2, …, T}. For each slice {f_t, t = 1, 2, …, T}, the gradient map ∇f_t was calculated (Fig. 1(c)). The boundary mask g_t^φ was then applied as a filter, multiplying it with the gradient map ∇f_t to extract gradient information specifically at the lesion boundary (Fig. 1(d)). Subsequently, the sum of the gradient values along the TN boundaries in each imaging slice at each time point was calculated, providing a comprehensive measure of signal intensity changes around the boundary. These data were used to plot a curve representing the dynamic change in gradient values for the inner and outer boundaries over time (Fig. 1(e)). By statistically analyzing the peak of this curve, the optimal time point was identified, corresponding to the phase f_t_best with the maximum contrast between the TN and the surrounding background tissue.

Fig. 1.

Fig. 1

Proposed segmentation framework. (a) MRI images with the inner and outer boundaries of lesions marked. (b) Labels of the inner and outer boundaries of the lesion. (c) Image gradient of the MRI images. (d) Image gradient at the inner and outer boundaries of the lesion. (e) Gradient variation curve of the lesion's outer boundary over time. (f) Predicted thyroid mask. (g) Predicted thyroid mask with connected regions labeled. (h) Optimized predicted thyroid mask

Semi-supervised Segmentation of the Thyroid Gland

Prior to TN segmentation, the thyroid gland was first segmented. Among the total cases, 40 cases were annotated with thyroid gland labels. Of these, 35 cases were designated as the training set, while the remaining 5 cases were used as the test set. Additionally, 96 cases lacking thyroid gland annotations were grouped into four sets of 24 cases each, which were used for semi-supervised segmentation. A U-Net-based thyroid gland segmentation model was initially trained on the annotated training and test sets. The trained model was then applied to predict and segment the thyroid gland in the first group of unannotated cases. The pseudo-labeled data generated from this first group were subsequently added to the training set for a second round of training. This iterative process continued, with each cycle evaluated against the fixed test set, until thyroid masks were successfully generated for all unannotated cases.
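The iterative pseudo-labelling cycle described above can be sketched generically; `train` and `predict` below are stand-ins for fitting the U-Net and running inference, not functions from the paper:

```python
def self_training(labelled, unlabelled_groups, train, predict):
    """Semi-supervised self-training loop.

    labelled: list of (image, mask) pairs with real annotations.
    unlabelled_groups: list of image groups (here, four groups of 24 cases).
    train(pool) -> model; predict(model, image) -> pseudo-mask.
    """
    pool = list(labelled)
    model = train(pool)                      # initial model on annotated cases
    for group in unlabelled_groups:
        pseudo = [(img, predict(model, img)) for img in group]
        pool.extend(pseudo)                  # add pseudo-labels to training set
        model = train(pool)                  # retrain for the next round
    return model, pool
```

Each round is evaluated against the fixed test set before its pseudo-labels are trusted, per the protocol above.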

Given clinical knowledge that thyroid nodules typically do not appear at the edges of the thyroid gland, the boundary of the segmented thyroid mask did not require high precision. However, it was necessary to ensure that the mask was sufficiently large to encompass the entire thyroid gland. Thus, the evaluation criteria emphasized minimizing false negatives while tolerating a certain level of false positives. To achieve this, the Tversky Loss function was employed, which allows for the adjustment of the weights of false positives (FP) and false negatives (FN). This weighting mechanism enabled the model to prioritize the reduction of FN while accepting a higher rate of FP.
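The Tversky loss described above weights false positives by α and false negatives by β; choosing β > α penalises missed thyroid tissue more heavily, matching the stated goal. A minimal NumPy sketch (the α/β values are illustrative; the paper does not report its exact weights):

```python
import numpy as np

def tversky_loss(pred, target, alpha=0.3, beta=0.7, eps=1e-6):
    """Tversky loss for binary masks.

    alpha weights false positives, beta weights false negatives;
    alpha = beta = 0.5 recovers the Dice loss."""
    pred = pred.ravel().astype(float)
    target = target.ravel().astype(float)
    tp = np.sum(pred * target)
    fp = np.sum(pred * (1 - target))
    fn = np.sum((1 - pred) * target)
    return 1.0 - tp / (tp + alpha * fp + beta * fn + eps)
```

In training, `pred` would be the model's sigmoid output rather than a hard mask, so the loss stays differentiable.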

The application of Tversky Loss occasionally resulted in the generation of false-positive regions within the predicted thyroid masks (Fig. 1(f)) [35]. Regions connected to the true thyroid tissue were acceptable, as they did not interfere with subsequent analysis; disconnected false-positive regions were discarded. To refine the segmentation, all connected regions in the thyroid prediction mask were identified, and the distance between their centroid coordinates and the center of the image was calculated. Since thyroid tissue in MRI scans is generally located near the center of the image, any region whose centroid exceeded a predefined distance threshold (70 voxels, per clinician recommendation) was removed, as illustrated in Fig. 1(h). This step quickly removes most obvious false-positive thyroid regions; false-positive regions within the threshold can be readily eliminated by the subsequent thyroid nodule segmentation network using spatio-temporal characteristics.
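The centroid-distance filtering can be sketched as below, using a pure-Python 4-connected flood fill in place of a library labelling routine; the 70-pixel threshold comes from the text, everything else is illustrative:

```python
import numpy as np
from collections import deque

def filter_far_components(mask, max_dist=70.0):
    """Drop connected regions whose centroid lies more than max_dist
    pixels from the image centre (4-connectivity BFS)."""
    H, W = mask.shape
    cy, cx = (H - 1) / 2.0, (W - 1) / 2.0
    seen = np.zeros((H, W), dtype=bool)
    out = np.zeros_like(mask)
    for sy in range(H):
        for sx in range(W):
            if mask[sy, sx] and not seen[sy, sx]:
                comp, q = [], deque([(sy, sx)])
                seen[sy, sx] = True
                while q:                       # collect one connected region
                    y, x = q.popleft()
                    comp.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < H and 0 <= nx < W and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                ys, xs = zip(*comp)
                # Keep the region only if its centroid is near the image centre.
                if np.hypot(np.mean(ys) - cy, np.mean(xs) - cx) <= max_dist:
                    for y, x in comp:
                        out[y, x] = 1
    return out
```

A production pipeline would more likely use `scipy.ndimage.label` for the connected-component step.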

A statistical analysis of the longest diameters across all two-dimensional thyroid gland masks indicated that a matrix size of 160 × 160 pixels was sufficient to encompass the entire thyroid gland. Based on this finding, the cropped thyroid masks were multiplied by the corresponding cropped images to isolate the thyroid tissue, removing any extraneous regions. This preprocessing step ensured that the subsequent thyroid nodule segmentation network would focus solely on the thyroid region, thereby reducing network complexity and improving segmentation accuracy. Furthermore, the cropping operation reduced the image size, alleviating computational burdens and conserving resources.
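The 160 × 160 crop and mask multiplication can be sketched as follows (the centring-on-centroid rule is an assumption; the paper only specifies the crop size):

```python
import numpy as np

def crop_to_thyroid(image, thyroid_mask, size=160):
    """Crop a size x size window centred on the thyroid-mask centroid,
    clamped to the image bounds, and zero out pixels outside the mask."""
    ys, xs = np.nonzero(thyroid_mask)
    cy, cx = int(ys.mean()), int(xs.mean())
    half = size // 2
    H, W = image.shape
    y0 = min(max(cy - half, 0), H - size)
    x0 = min(max(cx - half, 0), W - size)
    window = slice(y0, y0 + size), slice(x0, x0 + size)
    # Multiplying by the mask removes extraneous non-thyroid regions.
    return image[window] * thyroid_mask[window]
```

The same window would be applied to every DCE phase so the cropped sequence stays spatially aligned.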

For the final preprocessed images, centered on the thyroid region and with a matrix size of 160 × 160, we input the best phase f_t_best and the DCE sequence images {f_t, t = 1, 2, …, T}, after filtering by the predicted thyroid gland mask, into the spatio-temporal segmentation module. By filtering out irrelevant regions, this preprocessing pipeline not only simplified the segmentation task but also enhanced computational efficiency and segmentation performance.

Spatio-temporal Segmentation Module

To extract pharmacokinetic characteristics from DCE-MRI imaging, the temporal signal dynamics of each pixel in the image were treated as a classification task. An intuitive approach is to convert the two-dimensional dynamic DCE image data into a one-dimensional temporal sequence with a time dimension T and then predict the classification based on the temporal sequence. However, patient movement or involuntary actions (e.g., swallowing) during imaging can introduce misalignments, making it insufficient to rely solely on the one-dimensional temporal sequence of each pixel for classification. To address this, a depth-wise convolution with specific weights was designed to incorporate spatial context by capturing the signal dynamics within a defined neighborhood around each pixel, minimizing the temporal modeling errors caused by pixel misalignments during the dynamic imaging process.

Let {f_t, t = 1, 2, …, T} ∈ ℝ^{H×W×T} represent the dynamic contrast-enhanced thyroid images after thyroid mask filtering, spanning the time from pre-enhancement (t = 1) to post-enhancement (t = T). Here, H and W denote the height and width of the image, respectively, and T represents the temporal dimension of the DCE-MRI imaging. To capture the local spatial context around each pixel, depth-wise convolution was applied using k² convolution kernels of size k × k. Each convolution kernel has a single position weight of 1 and all other weights set to 0, as shown in Eq. (1) (with k = 3 as an example). The convolution results represent the original image shifted in the directions defined by the nine kernels.

The resulting outputs were reshaped into a matrix of shape (k², H × W, T) through concatenation and dimension transformation. In this representation, the first dimension corresponds to the pixel values within the k × k range around each pixel, while the second dimension aggregates the pixel values within the k² neighborhood across all pixels in a specific direction [36]. The transformation process is shown in Fig. 2, where the original dynamic sequence {f_t, t = 1, 2, …, T} ∈ ℝ^{T×H×W} is transformed into {S_i, i = 1, 2, …, N} ∈ ℝ^{N×T×9}, where N = H × W represents the total number of pixels in the image. Here, S_i denotes a one-dimensional temporal sequence of length T, and 9 corresponds to the spatial information derived from the 3 × 3 neighborhood of each pixel.

W_1 = [1 0 0; 0 0 0; 0 0 0],  W_2 = [0 1 0; 0 0 0; 0 0 0],  …,  W_9 = [0 0 0; 0 0 0; 0 0 1]    (1)
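Because each kernel in Eq. (1) is one-hot, the depth-wise convolution reduces to nine shifted copies of the zero-padded sequence; the transformation to (N, T, 9) can be sketched directly in NumPy:

```python
import numpy as np

def itts(frames, k=3):
    """Image-to-temporal-sequence transform.

    frames: (T, H, W) dynamic sequence.
    Returns an (N, T, k*k) array, N = H*W: for every pixel, the k x k
    neighbourhood at every time point. Equivalent to depth-wise
    convolution with the one-hot kernels of Eq. (1), zero-padded."""
    T, H, W = frames.shape
    r = k // 2
    padded = np.pad(frames, ((0, 0), (r, r), (r, r)))
    shifts = []
    for dy in range(k):
        for dx in range(k):
            shifts.append(padded[:, dy:dy + H, dx:dx + W])  # shifted copy
    stack = np.stack(shifts, axis=-1)                       # (T, H, W, k*k)
    return stack.reshape(T, H * W, k * k).transpose(1, 0, 2)
```

Index 4 of the last axis (with k = 3) is the centre pixel, i.e. the original per-pixel time course.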
Fig. 2.

Fig. 2

Image to temporal sequence module (ITTS)

As illustrated in Fig. 3, the ITTS module converts all two-dimensional dynamic image datasets into one-dimensional temporal sequence datasets. Temporal sequences corresponding to pixels outside the thyroid gland, identified as having all zero values, were excluded. Subsequently, the Temporal Sequence Prediction (TSP) module (Fig. 4) was used to classify the remaining one-dimensional temporal sequences. The cross-entropy loss function was employed during classification training to optimize the prediction performance.

Fig. 3.

Fig. 3

Spatio-temporal segmentation module

Fig. 4.

Fig. 4

Time sequence prediction module (TSP)

As illustrated in Fig. 4, the TSP module first projects the temporal sequence vector (N,T,9) into an embedding dimension of d = 48 via a fully connected layer, providing sufficient representational capacity while maintaining feasible computational costs. Subsequently, an 8-head multi-head self-attention mechanism is applied to capture dependencies across time points, thereby enhancing the representation of temporal dynamics. Following the attention block, a two-layer MLP adopts an “expand-and-squeeze” approach—initially mapping from d to 4d, then back to d—with a non-linear activation (e.g., GELU) and dropout for improved expressiveness and reduced overfitting. To further refine features, one-dimensional max pooling is performed along the time axis, highlighting the most salient signals at each step. Finally, after the last layer of the self-attention module, we fuse all remaining time steps via one-dimensional average pooling, effectively consolidating the extracted temporal features into a unified representation.
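The TSP forward pass can be sketched with plain NumPy; random matrices stand in for learned weights, ReLU replaces GELU, the intermediate max-pooling stages are omitted, and d = 48 with 8 heads follows the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, heads=8):
    """Multi-head self-attention over the time axis; x: (T, d).
    Random projections stand in for learned Q/K/V weights."""
    T, d = x.shape
    hd = d // heads
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    out = np.empty_like(x)
    for h in range(heads):
        sl = slice(h * hd, (h + 1) * hd)
        att = softmax(q[:, sl] @ k[:, sl].T / np.sqrt(hd))
        out[:, sl] = att @ v[:, sl]
    return out

def tsp_forward(seq, d=48):
    """seq: (T, 9) temporal sequence -> lesion probability in [0, 1]."""
    W_embed = rng.standard_normal((9, d)) / 3.0            # embedding layer
    x = seq @ W_embed                                      # (T, d)
    x = x + self_attention(x)                              # attention + residual
    W1 = rng.standard_normal((d, 4 * d)) / np.sqrt(d)      # expand to 4d
    W2 = rng.standard_normal((4 * d, d)) / np.sqrt(4 * d)  # squeeze back to d
    x = np.maximum(x @ W1, 0.0) @ W2                       # "expand-and-squeeze" MLP
    pooled = x.mean(axis=0)                                # average-pool over time
    w_out = rng.standard_normal(d) / np.sqrt(d)
    return 1.0 / (1.0 + np.exp(-(pooled @ w_out)))         # sigmoid score
```

The real module is trained end-to-end in PyTorch; this sketch only fixes the tensor shapes and the ordering of the stages.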

Finally, each one-dimensional temporal sequence vector is mapped to a value between 0 and 1 using a fully connected layer followed by a sigmoid activation function. A value closer to 1 indicates a higher likelihood that the corresponding pixel in the temporal sequence represents a lesion, whereas a value closer to 0 suggests normal tissue. The classified one-dimensional temporal sequence vectors are then reconstructed into a two-dimensional image, resulting in the temporal prediction map.

The segmentation of TNs begins with a spatial network applied to MRI images captured during the optimal imaging phase. This step aims to leverage the spatial characteristics of both diseased and normal tissues in MRI thyroid imaging to generate a segmentation probability map based on spatial features. Subsequently, the Spatio-Temporal Fusion (STF) module is utilized to integrate the spatial segmentation probability map with the temporal prediction map derived from pharmacokinetics. This fusion process combines spatial and temporal information to produce the final segmentation result, ensuring more accurate and robust nodule delineation. The entire workflow is depicted in Fig. 3.
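The paper's STF module is learned; purely for illustration, a fixed convex combination of the two probability maps followed by thresholding gives the flavor of the fusion step (the weight and threshold below are hypothetical):

```python
import numpy as np

def fuse(spatial_prob, temporal_prob, w=0.5, thresh=0.5):
    """Illustrative fusion of the spatial segmentation probability map
    with the temporal prediction map; returns a binary mask."""
    fused = w * spatial_prob + (1 - w) * temporal_prob
    return (fused >= thresh).astype(np.uint8)
```

In the actual pipeline this combination is replaced by the trainable Spatio-Temporal Fusion module of Fig. 3.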

Implementation Details

The models were implemented using the PyTorch framework and trained on an NVIDIA RTX A6000 GPU. The Adam optimizer was employed for model training, with an initial learning rate of 1 × 10⁻³ and a batch size of 32. All models were initialized randomly, without pre-training on any external datasets.

Evaluation Metrics

To rigorously evaluate the thyroid nodule segmentation results, we conducted a fivefold cross-validation with patient-level splitting. In each fold, five performance metrics were employed to quantify model accuracy and robustness: precision (PRE), recall (REC), Dice similarity coefficient (DSC), intersection over union (IoU), and Hausdorff distance (HD). By enforcing patient-level splitting, we ensured that no single patient’s images appeared in both training and validation sets, thereby preventing data leakage.
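The overlap-based metrics above have standard confusion-matrix definitions; a minimal NumPy implementation (Hausdorff distance is omitted, as it requires boundary extraction):

```python
import numpy as np

def seg_metrics(pred, target, eps=1e-6):
    """Per-image PRE, REC, DSC and IoU for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.sum(pred & target)   # true positives
    fp = np.sum(pred & ~target)  # false positives
    fn = np.sum(~pred & target)  # false negatives
    return {
        "PRE": tp / (tp + fp + eps),
        "REC": tp / (tp + fn + eps),
        "DSC": 2 * tp / (2 * tp + fp + fn + eps),
        "IoU": tp / (tp + fp + fn + eps),
    }
```

Under fivefold patient-level cross-validation, these metrics are averaged over the validation images of each fold.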

Results

In this section, we select eight convolutional and Transformer-based networks that perform well on medical images as Spatial-Nets, including U-Net [37], U-Net++ [20], SegNet [38], Res-Unet [39], TransUnet [40], Swin-Unet [41], SSTrans-Net [42], and VM-Unet [43]. The conventional DCE-MRI-based segmentation method selects the strongest enhancement phase recommended by doctors as the input to the Spatial-Net for direct segmentation; clinically, the 10th phase image is generally believed to show the strongest enhancement of thyroid nodules. Therefore, we set the 10th phase as the control group (baseline). Three different scenarios were evaluated: the best phase obtained through gradient calculation (best phase), the addition of thyroid mask priors (thyroid mask), and temporal prediction (temporal prediction). To ensure the reliability of the experiments, all evaluation data in this section were obtained through fivefold cross-validation on the entire dataset.

Results of Ablation Experiments

Figure 5 shows the quantitative results of integrating our three proposed segmentation optimization methods into eight classic segmentation networks, compared to conventional segmentation methods (baseline), using commonly used evaluation metrics (Dice coefficient and IOU scores) in the segmentation field. The red values represent the improvement in metrics compared to conventional methods using our three methods within the same network.

Fig. 5.

Fig. 5

a The three methods proposed are applied to Dice coefficients in various networks. b The three methods proposed are applied to IOU scores in various networks. The red numbers indicate the performance improvement of our optimization method compared to using traditional methods (baseline)

We employed the Wilcoxon signed-rank test to assess the statistical significance of the observed improvements in Dice and IOU values. A two-tailed significance level of 0.05 was used as the threshold for determining statistical significance. As summarized in Tables S1 and S2, the p-values for all tested networks were below this threshold, confirming that the performance enhancements achieved through our segmentation optimization methods are statistically significant.

Taking U-Net as an example, the baseline Dice and IOU values were 60.64% and 48.88%, respectively. Incorporating the best phase selection as input led to a significant improvement in both metrics. Further integration of the thyroid mask and the temporal prediction optimization scheme yielded additional gains in segmentation accuracy, as illustrated in the segmentation performance diagram in Fig. 6.

Fig. 6.

Fig. 6

Segmentation effect diagram of ablation experiment based on three models

A similar pattern was also observed in the other network models. For instance, U-Net++ [20] (Dice + 8.41%, IOU + 8.29%), SegNet [38] (Dice + 9.39%, IOU + 9.56%), Res-Unet [39] (Dice + 6.11%, IOU + 5.96%), TransUnet [40] (Dice + 11.53%, IOU + 11.02%), Swin-Unet [41] (Dice + 20.69%, IOU + 19.82%), SSTrans-Net [42] (Dice + 17.94%, IOU + 16.95%), and VM-Unet [43] (Dice + 15.65%, IOU + 15.53%) all demonstrated improvements over their respective baselines.

Figure 7 presents the performance when the widely used convolutional network U-Net and the Transformer-based network TransUnet serve as Spatial-Net, each integrated with the three proposed methods. Our results indicated that the proposed methods achieved superior performance across four key metrics: precision (PRE), recall (REC), Dice similarity coefficient (DSC), and intersection over union (IOU). To assess the statistical significance of these improvements, we conducted the Wilcoxon signed-rank test, as detailed in Tables S3 and S4. The analysis revealed that for both U-Net and TransUnet serving as Spatial-Nets, all p-values were below 0.05, indicating that the observed performance enhancements are statistically significant.

Fig. 7.

Fig. 7

Integrate the performance of the three proposed methods separately on U-Net and TransUnet

The Segmentation Effect of Nodules for Different Sizes

The size of thyroid nodules is of great value for clinical diagnosis and risk assessment [44, 45], and it is generally believed that nodules less than 1 cm in size have a lower risk and do not need to be evaluated by fine-needle aspiration (FNA) to determine their malignancy. Kamran et al. demonstrated that the increase in the size of thyroid nodules affects the risk of cancer in a non-linear manner, with a threshold effect at 2 cm, and the risk of cancer remains unchanged when the size of the nodule exceeds 2 cm [46]. However, considering the risk of cancer and the possibility that the growth of large nodules may affect the swallowing and breathing of patients, nodules > 4 cm are generally surgically removed clinically [47–49]. In our data, the proportions of < 1 cm, 1–2 cm, 2–4 cm, and > 4 cm nodules were 29.58%, 39.92%, 27.93%, and 2.53%, respectively.

Figure 8b, c presents the segmentation performance of our proposed methods relative to the baseline across nodules of varying sizes, using U-Net and the Transformer-based TransUnet as Spatial-Net, respectively. The analysis covered four nodule size categories: < 1 cm, 1–2 cm, 2–4 cm, and > 4 cm. With U-Net as the Spatial-Net, our method improved the Dice coefficient over the baseline by 11.89%, 4.97%, 8.70%, and 14.47% for the respective categories; with TransUnet, the improvements were 15.04%, 8.43%, 11.59%, and 16.29%. Wilcoxon signed-rank tests (Tables S5 and S6) confirmed that these improvements are statistically significant, indicating that our approach consistently enhances segmentation performance across all nodule sizes, with particularly pronounced gains for very small (< 1 cm) and very large (> 4 cm) nodules. The specific segmentation results for nodules of different sizes are illustrated in Fig. 7.

Fig. 8. a Proportions of nodules of different sizes. b Dice coefficients of the proposed method on nodules of different sizes when U-Net is used as Spatial-Net. c Dice coefficients of the proposed method when TransUnet is used as Spatial-Net

The Role of Gradient Calculation in Selecting the Best Phase in Thyroid Nodule Segmentation

Table 1 presents the nodule segmentation performance of eight classic medical segmentation networks under two scenarios: baseline and best phase. The experimental results demonstrate that compared to conventional methods for DCE-MRI segmentation tasks (which typically select the phase with the strongest enhancement based on clinical experience as the network input), using the best phase images obtained through gradient calculation improves the segmentation accuracy of all eight network models. To verify whether these improvements were statistically significant, we conducted a Wilcoxon signed-rank test (Table S7). Across all eight network models used as Spatial-Nets, the p-values for the Dice coefficient were below 0.05, confirming that the differences between our best phase approach and the baseline method are significant. This indicates that our method achieves enhanced model performance by fully accounting for individual differences across cases, enabling more accurate identification and segmentation of thyroid nodules.

Table 1.

Comparison between the conventional phase selection method and the gradient-based best phase method

Model                  PRE (%)↑  REC (%)↑  DSC (%)↑  IoU (%)↑  HD (mm)↓
Baseline
  Unet                 62.92     69.50     60.64     48.88     9.76
  Unet++               68.04     69.11     62.72     51.12     9.24
  SegNet               55.78     65.78     55.09     42.96     10.77
  Res-Unet             62.98     72.20     61.15     49.51     9.66
  TransUnet            60.68     64.82     55.43     43.96     10.34
  Swin-Unet            38.35     59.33     40.56     29.41     14.50
  SSTrans-Net          40.91     61.23     42.80     31.49     13.95
  VM-Unet              47.89     61.61     49.41     37.81     11.47
Baseline + Best phase
  Unet                 67.12     71.63     63.97     52.24     8.99
  Unet++               67.77     72.48     64.39     52.64     8.96
  SegNet               57.60     66.99     56.71     44.82     10.20
  Res-Unet             65.58     74.89     63.75     51.87     9.24
  TransUnet            63.26     69.56     59.95     48.33     9.60
  Swin-Unet            41.78     59.12     44.14     33.09     13.30
  SSTrans-Net          43.03     61.27     44.40     33.17     13.59
  VM-Unet              50.46     60.51     51.37     39.91     10.76
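The gradient-based phase selection described above can be sketched as follows. This is one plausible reading of the criterion, assuming the score for each phase is its mean spatial-gradient magnitude; the authors' exact computation (for example, restriction to a region of interest) may differ:

```python
import numpy as np

def select_best_phase(phases):
    """Return the index of the phase with the largest mean spatial-gradient
    magnitude; `phases` is an array of shape (T, H, W).  The intuition is
    that the phase of peak contrast enhancement shows the sharpest tissue
    boundaries, and hence the strongest image gradients."""
    scores = []
    for img in phases:
        gy, gx = np.gradient(img.astype(float))  # derivatives along rows, cols
        scores.append(np.mean(np.hypot(gx, gy)))
    return int(np.argmax(scores))
```

Because the score is computed per case, each patient gets an individually selected input phase rather than a protocol-wide default.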

The Role of Thyroid Mask in Thyroid Nodule Segmentation

A semi-supervised segmentation method was employed to predict thyroid masks for all cases, using only 30% of the available thyroid gland labels. The segmentation performance, summarized in Table 2, shows that the Dice coefficient on the fixed test set remained stable at approximately 90% over five rounds of training, while recall consistently exceeded 95%. These results indicate that the predicted thyroid masks were highly accurate and encompassed the majority of the actual thyroid tissue. To determine whether the training rounds differed significantly, we applied the Friedman test across the five rounds (Table S8). The p-values for the Dice coefficient and recall were 0.421 and 0.354, respectively, both above 0.05, indicating no statistically significant differences among rounds. These findings suggest that the pseudo-labels retained high accuracy throughout the semi-supervised training process and that incorporating pseudo-labeled data into training did not degrade the network's ability to segment the thyroid gland.

Table 2.

Thyroid mask segmentation results

Method    Data (train/test)   PRE (%)        REC (%)        DSC (%)        IoU (%)        HD (mm)
Model 0   35 / 5              84.46 ± 6.00   95.55 ± 3.31   89.64 ± 3.89   81.44 ± 6.00   10.66 ± 1.90
Model 1   (35 + 24×1) / 5     85.11 ± 5.96   96.76 ± 2.34   90.43 ± 3.76   82.73 ± 5.94   10.26 ± 1.84
Model 2   (35 + 24×2) / 5     85.36 ± 5.42   96.93 ± 2.22   90.67 ± 3.42   83.11 ± 5.49   10.14 ± 1.79
Model 3   (35 + 24×3) / 5     84.47 ± 5.48   97.01 ± 2.24   90.20 ± 3.42   82.31 ± 5.42   10.48 ± 1.87
Model 4   (35 + 24×4) / 5     87.29 ± 5.12   95.14 ± 3.84   90.82 ± 3.29   83.34 ± 5.31   9.94 ± 1.68
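The training schedule in Table 2 corresponds to a standard self-training loop: fit on the labeled pool, then repeatedly pseudo-label a fixed batch of unlabeled cases and refit on the enlarged pool. A schematic sketch, in which `fit` and `predict` are hypothetical stand-ins for network training and inference:

```python
def self_train(fit, predict, labeled, unlabeled, rounds=4, batch=24):
    """Schematic self-training loop following the schedule in Table 2:
    Model 0 is fitted on the labeled pool; each later round pseudo-labels
    `batch` unlabeled cases with the current model and refits on the
    enlarged pool.  `fit` and `predict` are hypothetical stand-ins for
    network training and inference."""
    pool = list(labeled)
    model = fit(pool)                                    # Model 0
    for _ in range(rounds):
        chunk, unlabeled = unlabeled[:batch], unlabeled[batch:]
        pool += [(x, predict(model, x)) for x in chunk]  # pseudo-labels
        model = fit(pool)                                # Models 1..rounds
    return model, len(pool)
```

With `labeled` holding the 35 annotated cases and `batch=24`, the pool sizes reproduce the 35, 35 + 24×1, ..., 35 + 24×4 progression of Table 2.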

To further refine the segmentation accuracy, over-segmented connected regions were removed from the predicted masks. This optimization step ensured that the thyroid masks more accurately represented the true thyroid tissue areas. Subsequently, all image data were cropped to a standardized size of 160 × 160, facilitating uniform input for downstream processing.
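A common way to implement this post-processing is connected-component labeling followed by a centroid-centered crop; the sketch below uses SciPy and is our illustration, not necessarily the authors' exact rule:

```python
import numpy as np
from scipy import ndimage

def largest_component(mask):
    """Keep only the largest connected region of a binary mask, discarding
    over-segmented satellite regions."""
    labels, n = ndimage.label(mask)
    if n <= 1:
        return mask.astype(bool)
    sizes = ndimage.sum(mask, labels, index=range(1, n + 1))
    return labels == (int(np.argmax(sizes)) + 1)

def crop_around_mask(img, mask, size=160):
    """Crop a size x size window centered on the mask centroid, clipped to
    the image bounds (160 x 160 in this study)."""
    cy, cx = ndimage.center_of_mass(mask)
    h, w = img.shape[:2]
    y0 = int(min(max(round(cy) - size // 2, 0), max(h - size, 0)))
    x0 = int(min(max(round(cx) - size // 2, 0), max(w - size, 0)))
    return img[y0:y0 + size, x0:x0 + size]
```

Applying the same crop window to every phase of a case keeps the spatial correspondence across the DCE-MRI sequence.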

Table 3 presents the experimental results after using the predicted thyroid mask to filter out non-target tissues under both the baseline and best phase scenarios. Compared with the results in Table 1, Dice and IOU scores improved across all networks once non-target tissues were filtered out using the semi-supervised thyroid mask predictions. To verify whether these improvements were statistically significant, we conducted Wilcoxon signed-rank tests (Tables S9 and S10). Under both the baseline + thyroid mask and best phase + thyroid mask conditions, the p-values were below 0.05 when compared with the respective methods without the mask, indicating a significant difference in performance.

Table 3.

Performance after incorporating the predicted thyroid mask (baseline + thyroid mask vs. baseline + best phase + thyroid mask) and after further temporal feature fusion (baseline + best phase + thyroid mask + temporal-pre)

Model                  PRE (%)↑  REC (%)↑  DSC (%)↑  IoU (%)↑  HD (mm)↓
Baseline + Thyroid mask
  Unet                 65.83     75.66     66.41     54.55     8.80
  Unet++               66.74     75.55     66.80     54.91     8.61
  SegNet               59.14     74.07     61.21     49.58     9.68
  Res-Unet             63.94     74.84     63.18     51.55     9.20
  TransUnet            61.95     75.43     63.73     51.92     9.13
  Swin-Unet            49.58     74.28     54.71     42.33     11.17
  SSTrans-Net          49.55     76.42     55.22     42.62     11.54
  VM-Unet              58.02     73.83     60.98     49.39     9.63
Baseline + Best phase + Thyroid mask
  Unet                 67.33     75.53     67.87     55.52     8.35
  Unet++               67.45     76.98     68.07     55.98     8.36
  SegNet               61.15     75.12     63.53     51.31     9.06
  Res-Unet             64.35     73.77     65.12     53.37     8.60
  TransUnet            62.94     76.78     65.57     53.58     8.64
  Swin-Unet            51.97     74.71     56.13     43.64     11.09
  SSTrans-Net          54.39     76.40     59.16     47.37     10.23
  VM-Unet              61.68     73.87     63.45     51.68     9.05
Baseline + Best phase + Thyroid mask + Temporal-pre
  Unet                 68.28     78.21     69.05     57.17     8.33
  Unet++               68.92     79.11     69.77     57.63     8.33
  SegNet               66.57     71.99     64.48     52.52     8.45
  Res-Unet             68.65     74.71     67.26     55.47     8.46
  TransUnet            67.35     77.89     66.96     54.98     8.59
  Swin-Unet            61.94     74.19     61.25     49.23     9.77
  SSTrans-Net          58.62     75.88     60.74     48.44     10.05
  VM-Unet              63.21     76.37     65.06     53.34     8.92

These findings indicate that the thyroid masks generated through the semi-supervised learning method can more accurately locate the thyroid region, reducing interference from non-target areas in the segmentation task. This substantially decreases the network’s learning burden, improving segmentation efficiency. By eliminating noise and artifacts from non-thyroid regions, the network can focus more effectively on the nodule segmentation task, enhancing the model’s robustness and reliability.

The Role of Temporal Feature Extraction and Fusion in Thyroid Nodule Segmentation

DCE-MRI can indirectly describe the hemodynamic process of contrast agent inflow and outflow through multi-phase MRI, reflecting the microphysiological characteristics of lesions and normal tissues, such as microvascular perfusion, permeability, and vascular density. Therefore, extracting and utilizing temporal sequence features can address the challenges of low contrast and similar MRI signals caused by the morphological and structural similarity between thyroid nodule tissue and normal tissue. This contributes to the accuracy and robustness of the nodule segmentation task.

The experimental results after integrating all three proposed optimization methods (best phase, thyroid mask, and temporal prediction) are presented in Table 3. Compared with the best phase + thyroid mask configuration, incorporating temporal sequence feature prediction (best phase + thyroid mask + temporal-pre) yielded further improvements in all segmentation metrics for every network model. To verify statistical significance, we applied the Wilcoxon signed-rank test (Table S11). Across the eight medical segmentation networks configured as Spatial-Net, the p-values for the Dice metric were all below 0.05, indicating a significant performance gain when temporal information is included. These findings demonstrate that extracting temporal features effectively supports DCE-MRI-based segmentation tasks, resulting in more accurate segmentation outcomes.

Discussion

In this section, we summarize the main contributions and differences of the proposed segmentation method compared to existing approaches. Additionally, we analyze the limitations of this study and discuss potential directions for future research.

Considering the Impact of Individual Differences in DCE-MRI Segmentation

In DCE-MRI segmentation, traditional methods often rely on selecting a unified contrast enhancement peak phase for all data as the input to the network. This approach neglects the biological and hemodynamic variability across different cases and regions within the imaging area. In contrast, our method analyzes the magnetic resonance signals of normal and lesion tissues by calculating the image gradient, enabling the selection of the actual contrast enhancement peak for each individual. Experimental results demonstrate that this personalized approach accounts for individual physiological characteristics rather than using a universal phase, significantly improving segmentation accuracy and robustness. This highlights the importance of adapting to physiological and pathological differences for precise segmentation.

Utilizing Prior Knowledge of Thyroid Gland Shape and Location

Unlike natural image segmentation tasks, medical image segmentation benefits from prior knowledge of organ shape and location to guide the segmentation process. Thyroid nodules often share similar morphological and magnetic resonance signal characteristics with normal thyroid tissue, and the nodule region may exhibit minimal contrast enhancement, making direct segmentation particularly challenging. To address this, we proposed a two-stage segmentation approach, where the thyroid gland is first segmented using a semi-supervised method, followed by nodule segmentation. The thyroid mask, generated in the first stage, is used to remove irrelevant tissue regions and refine the final nodule segmentation.

Experiments using data filtered by the thyroid mask obtained through semi-supervised prediction validated the effectiveness of the thyroid mask across different datasets and segmentation models. This strategy not only improved segmentation accuracy but also reduced computational resource requirements by filtering out noise and artifacts from non-thyroid regions. Consequently, the proposed approach accelerates the training process and enhances model efficiency, demonstrating its significant potential for clinical applications.

Application of Pharmacokinetic Information in DCE-MRI

DCE-MRI provides valuable insights into hemodynamics by capturing the inflow and outflow of contrast agents in tissues across multiple phases, reflecting microscopic differences in vascular perfusion, permeability, and density. Despite its potential, temporal information has been underutilized in DCE-MRI segmentation tasks.

In our study, we addressed the challenges posed by patient motion during imaging, such as breathing or swallowing, which can cause pixel shifts across phases. Additionally, the methods that rely solely on the temporal change of individual pixels are often noisy and lack robustness. To overcome these limitations, we proposed a temporal feature extraction method that incorporates temporal changes within a 3 × 3 neighborhood around each pixel, reducing temporal modeling errors. By integrating temporal features into medical image segmentation tasks, our approach compensates for the shortcomings of two-dimensional spatial segmentation in low-contrast scenarios, enhancing the robustness and accuracy of the model under complex imaging conditions.
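The neighborhood-based temporal extraction can be illustrated as follows, assuming the feature is the phase-to-phase signal change after 3 × 3 spatial averaging. This is a sketch of the idea, not the authors' exact formulation:

```python
import numpy as np
from scipy import ndimage

def temporal_features(phases):
    """Phase-to-phase signal change after averaging each phase over a 3 x 3
    spatial neighborhood; `phases` has shape (T, H, W) and the result has
    shape (T-1, H, W).  The smoothing damps pixel-level noise and small
    inter-phase motion before the temporal difference is taken."""
    smoothed = np.stack([ndimage.uniform_filter(p.astype(float), size=3)
                         for p in phases])
    return np.diff(smoothed, axis=0)
```

Compared with a single-pixel enhancement curve, the 3 × 3 average makes the temporal signature tolerant to shifts of one or two pixels between phases.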

Excellent Performance in Small Sample and Difficult Tasks

In this study, we selected eight classic segmentation networks commonly used in medical segmentation tasks as Spatial-Net to validate the superiority of our segmentation approach. We observed that Transformer-based networks consistently underperformed compared to convolution-based networks across all scenarios. We attribute this to two main reasons:

  1. The relatively small dataset limits the effectiveness of self-attention mechanisms, leading to overfitting.

  2. The morphological and structural similarity between nodules and normal tissue produces similar MRI signals, making nodule segmentation a challenging task that demands fine-grained feature extraction and the capture of local information, strengths of convolutional networks in terms of local features, edge information, and overall stability.

However, after integrating our proposed methods, all networks showed significant improvements in segmentation accuracy, particularly Transformer-based networks. Their Dice scores improved by 11.53–20.69%, and IOU scores improved by 11.02–19.82%. These results demonstrate that our approach delivers outstanding performance, even when tackling challenging tasks.

Small nodules (< 1 cm) are difficult to segment because of their small size and low contrast, which make them highly susceptible to noise. Nevertheless, our method still performs well on small nodules: after integration into Unet and TransUnet, the Dice scores for small nodules improved by 11.89% and 15.04%, respectively, second only to the improvements for large nodules (> 4 cm), which were 14.47% and 16.29%.

Clinical Usability and Generalizability

Although the proposed network adopts a multi-stage design on the back end, it remains practical for clinical use. In practice, clinicians would only need to upload the DCE-MRI sequences; the system then automatically generates a refined segmentation mask. This single-step workflow prevents the additional complexity (introduced for higher segmentation accuracy) from translating into cumbersome procedures or extra manual steps for end-users. Moreover, despite its sophisticated appearance, each component of the network—such as the phase selection module, Spatial-Net, and Temporal-Net—is designed in a modular manner. This modularity allows for individual components to be updated or replaced without necessitating a complete system overhaul. As new techniques or imaging approaches emerge, the network can be incrementally improved, thus offering enhanced adaptability for future clinical deployments.

To demonstrate the generalizability of our method, we evaluated its performance on diverse datasets that included nodules of varying sizes, shapes, and contrast levels. The results consistently showed performance improvements, underscoring the robustness of our approach. Additionally, to enrich data diversity, we employed extensive data augmentation on the images collected from a single medical center. During training, each image had a probabilistic chance of undergoing horizontal/vertical flips, shifts, scaling, rotations, blur, or motion blur. These transformations simulate real-world variations in imaging protocols, thus reducing the risk of overfitting and enhancing the model’s adaptability. Finally, a five-fold cross-validation strategy was applied to obtain a reliable estimate of the model’s performance, which provided a more comprehensive assessment of how the model generalizes to unseen data.
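A simplified stand-in for such a probabilistic augmentation pipeline, restricted to flips and quarter-turn rotations (the full pipeline also used shifts, scaling, arbitrary-angle rotations, blur, and motion blur):

```python
import numpy as np

def augment(img, mask, rng):
    """Probabilistic flips and quarter-turn rotations applied jointly to an
    image and its mask, so their spatial correspondence is preserved."""
    if rng.random() < 0.5:                 # horizontal flip
        img, mask = img[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:                 # vertical flip
        img, mask = img[::-1, :], mask[::-1, :]
    k = int(rng.integers(0, 4))            # 0-3 quarter turns
    return np.rot90(img, k).copy(), np.rot90(mask, k).copy()
```

Applying the identical transform to image and mask is essential for segmentation: augmenting either one alone would corrupt the training labels.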

Limitations and Future Work

Although our proposed method has achieved good results in thyroid nodule segmentation tasks, there are still some limitations and shortcomings in this work.

  1. In our experiments, identifying the “Best phase” relies on expert-annotated nodule masks. In routine clinical practice, such ground-truth labels may not be available for new patients. Although our findings demonstrate the benefit of personalized phase selection, additional techniques—such as approximate detection networks, automated bounding-box proposals, or multi-phase inference—are needed to generalize this approach without manually provided labels.

  2. Our data set comes from a single center. Although our current results indicate that the method can handle a substantial degree of data variability within a single center, we fully recognize the importance of validating the model across multiple centers with diverse MRI protocols and patient populations. In future work, we plan to collect a larger and more diverse, multi-institutional dataset to examine the method’s adaptability to different scanner types and demographic factors.

  3. We downsampled 512 × 512 images to 256 × 256 without first performing voxel-based resampling. Although all scans shared the same field of view (20 cm × 20 cm), those originally acquired at 512 × 512 possessed a finer spatial resolution than the 256 × 256 scans. In principle, this downsampling could lead to a slight loss of detail or compromise the visibility of very small or low-contrast lesions. However, since thyroid tissue and typical nodules are generally not so small that their key features appear in only a single pixel, their morphological characteristics and intensity contrasts remain adequately preserved for effective segmentation. Nonetheless, for future studies requiring more precise volumetric or morphological analyses, a complete voxel-based resampling may be warranted.

Future work will focus on addressing these issues, improving the end-to-end performance and robustness of the model, validating the method across multiple centers to ensure applicability and reliability on a broader scale, and optimizing the model to handle a wider variety of clinical scenarios, including the absence of ground-truth labels for personalized phase selection, through semi-supervised or unsupervised approaches.

Conclusions

This study proposed a novel segmentation framework for thyroid nodules based on DCE-MRI. By accounting for individual differences in contrast enhancement timing, the optimal enhancement phase for each case was selected using gradient calculation. Additionally, prior knowledge of thyroid gland shape and location was leveraged to design a two-stage segmentation network, where semi-supervised segmentation of the thyroid gland was followed by segmentation of the thyroid nodules. To further refine segmentation, we introduced methods to integrate pharmacokinetic information from DCE-MRI, enabling the fusion of temporal and spatial features to improve accuracy and robustness.

The results demonstrate that combining temporal and spatial features is an effective strategy for achieving high-precision medical image segmentation. This work provides a foundation for future research and clinical applications in DCE-MRI-based segmentation, contributing valuable insights into integrating pharmacokinetics and spatial priors for improved diagnostic performance.

Supplementary Information

Below is the link to the electronic supplementary material.

Funding

This study was partially supported by the National Key Technology Research and Development Program of China (Grant Nos. 2023YFC3402800 and 2023YFC3402802), the Natural Science Foundation of Guangdong Province (Grant Nos. 2023B1515020002 and 2024B1515040018), the Key Laboratory for Magnetic Resonance and Multimodality Imaging of Guangdong Province (Grant No. 2023B1212060052), and the Guangdong Innovation Platform of Translational Research for Cerebrovascular Diseases. Additional support was provided by the National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (Grant Nos. SZ2020ZD005 and E010321002), the Beijing Medical Award Foundation (Grant No. YXJL-2024-0350-0267), and the Shenzhen High-level Hospital Construction Fund.

Declarations

Ethics Approval

This study was approved by the local institutional review board, and informed consent was obtained from all patients.

Consent for Publication

In this research, we ensure full citation of all referenced content according to the prevailing academic standards. No personally identifiable information or sensitive data is published without explicit consent.

Competing Interests

The authors declare no competing interests.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Binze Han and Qian Yang contributed equally to this work.

Contributor Information

Zhou Liu, Email: zhou_liu8891@yeah.net.

Na Zhang, Email: na.zhang@siat.ac.cn.

References

  • 1.Grani G, Sponziello M, Filetti S, Durante C: Thyroid nodules: diagnosis and management. Nat Rev Endocrinol 20:715-728, 2024 [DOI] [PubMed] [Google Scholar]
  • 2.Uppal N, Collins R, James B: Thyroid nodules: Global, economic, and personal burdens. Frontiers in Endocrinology 14:1113977, 2023 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Grani G, Sponziello M, Pecce V, Ramundo V, Durante C: Contemporary thyroid nodule evaluation and management. The Journal of Clinical Endocrinology & Metabolism 105:2869-2883, 2020 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Tappouni RR, Itri JN, McQueen TS, Lalwani N, Ou JJ: ACR TI-RADS: Pitfalls, Solutions, and Future Directions. Radiographics 39:2040-2052, 2019 [DOI] [PubMed] [Google Scholar]
  • 5.Carneiro-Pla D: Ultrasound elastography in the evaluation of thyroid nodules for thyroid cancer. Current Opinion in Oncology 25:1-5, 2013 [DOI] [PubMed] [Google Scholar]
  • 6.Kim DH, Chung SR, Choi SH, Kim KW: Accuracy of thyroid imaging reporting and data system category 4 or 5 for diagnosing malignancy: a systematic review and meta-analysis. European Radiology 30:5611-5624, 2020 [DOI] [PubMed] [Google Scholar]
  • 7.Sharbidre KG, Lockhart ME, Tessler FN: Incidental thyroid nodules on imaging: relevance and management. Radiologic Clinics 59:525-533, 2021 [DOI] [PubMed] [Google Scholar]
  • 8.Li G, Chen R, Zhang J, Liu K, Geng C, Lyu L: Fusing enhanced Transformer and large kernel CNN for malignant thyroid nodule segmentation. Biomedical Signal Processing and Control 83:104636, 2023 [Google Scholar]
  • 9.Li C, Du R, Luo Q, Wang R, Ding X: A Novel Model of Thyroid Nodule Segmentation for Ultrasound Images. Ultrasound Med Biol 49:489-496, 2023 [DOI] [PubMed] [Google Scholar]
  • 10.Chen J, You H, Li K: A review of thyroid gland segmentation and thyroid nodule segmentation methods for medical ultrasound images. Comput Methods Programs Biomed 185:105329, 2020 [DOI] [PubMed] [Google Scholar]
  • 11.Huang J, et al.: Differential diagnosis of thyroid nodules by DCE-MRI based on compressed sensing volumetric interpolated breath-hold examination: A feasibility study. Magn Reson Imaging 111:138-147, 2024 [DOI] [PubMed] [Google Scholar]
  • 12.Turnbull LW: Dynamic contrast-enhanced MRI in the diagnosis and management of breast cancer. NMR Biomed 22:28-39, 2009 [DOI] [PubMed] [Google Scholar]
  • 13.Lu Y, et al.: Using diffusion-weighted MRI to predict aggressive histological features in papillary thyroid carcinoma: a novel tool for pre-operative risk stratification in thyroid cancer. Thyroid 25:672-680, 2015 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Sorrenti S, et al.: Artificial Intelligence for Thyroid Nodule Characterization: Where Are We Standing? Cancers (Basel) 14, 2022 [DOI] [PMC free article] [PubMed]
  • 15.Chang C-Y, Huang H-C, Chen S-J: Thyroid Nodule Segmentation and Component Analysis in Ultrasound Images. Biomedical Engineering Applications Basis and Communications 22, 2010
  • 16.Keramidas EG, Maroulis D, Iakovidis DK: ΤND: a thyroid nodule detection system for analysis of ultrasound images and videos. J Med Syst 36:1271-1281, 2012 [DOI] [PubMed] [Google Scholar]
  • 17.Jena, Manaswini, S. Prava Mishra, and Debahuti Mishra: A survey on applications of machine learning techniques for medical image segmentation. Internationa Journal of Engineering & Technology 7.4: 4489–4495, 2018
  • 18.Lecun Y, Bottou L, Bengio Y, Haffner P: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86:2278-2324, 1998 [Google Scholar]
  • 19.Ma J, Wu F, Jiang T, Zhao Q, Kong D: Ultrasound image-based thyroid nodule automatic segmentation using convolutional neural networks. Int J Comput Assist Radiol Surg 12:1895-1910, 2017 [DOI] [PubMed] [Google Scholar]
  • 20.Zhou Z, Siddiquee MMR, Tajbakhsh N, Liang J. UNet++: A nested U-Net architecture for medical image segmentation. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Lecture Notes in Computer Science, vol 11045. Springer, Cham; 2018. pp. 3–11. 10.1007/978-3-030-00889-5_1 [DOI] [PMC free article] [PubMed]
  • 21.Oktay O, Schlemper J, Folgoc L, Lee M, Heinrich M, Misawa K, Mori K, McDonagh S, Hammerla N, Kainz B, Glocker B, Rueckert D: Attention U-Net: Learning where to look for the pancreas. arXiv:1804.03999, 2018. 10.48550/arXiv.1804.03999
  • 22.Pan H, Zhou Q, Latecki L. SGUNET: Semantic guided U-Net for thyroid nodule segmentation. In: Proceedings of the IEEE International Symposium on Biomedical Imaging (ISBI). 2021. pp. 630–634. 10.1109/ISBI48211.2021.9434051
  • 23.Vaswani A, Brain G, Shazeer N, Parmar N, Uszkoreit J, Jones L, et al.: Attention Is All You Need. Adv Neural Inf Process Syst 5998–6008; 2017
  • 24.Deng X, Dang Z, Pan L: STUNet: An Improved U-Net With Swin Transformer Fusion for Thyroid Nodule Segmentation. International Journal of Imaging Systems and Technology 34:e23160, 2024 [Google Scholar]
  • 25.Kang T, et al.: Magnetic Resonance Imaging Features of Normal Thyroid Parenchyma and Incidental Diffuse Thyroid Disease: A Single-Center Study. Front Endocrinol (Lausanne) 9:746, 2018 [DOI] [PMC free article] [PubMed]
  • 26.Bonjoc KJ, Young H, Warner S, Gernon T, Maghami E, Chaudhry A: Thyroid cancer diagnosis in the era of precision imaging. J Thorac Dis 12:5128-5139, 2020 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Renkonen S, et al.: Accuracy of preoperative MRI to assess lateral neck metastases in papillary thyroid carcinoma. Eur Arch Otorhinolaryngol 274:3977-3983, 2017 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Cho SJ, Suh CH, Baek JH, Chung SR, Choi YJ, Lee JH: Diagnostic performance of MRI to detect metastatic cervical lymph nodes in patients with thyroid cancer: a systematic review and meta-analysis. Clin Radiol 75:562.e1-562.e10, 2020 [DOI] [PubMed] [Google Scholar]
  • 29.Zhou L, Zhang Y, Zhang J, Qian X, Gong C, Sun K, Ding Z, Wang X, Li Z, Liu Z, Shen D: Prototype learning guided hybrid network for breast tumor segmentation in DCE-MRI. IEEE Trans Med Imaging, 10.1109/TMI.2024.3435450, July 29, 2024 [DOI] [PubMed]
  • 30.Nalepa J, et al.: Fully-automated deep learning-powered system for DCE-MRI analysis of brain tumors. Artif Intell Med 102:101769, 2020 [DOI] [PubMed] [Google Scholar]
  • 31.Lv T, et al.: A hybrid hemodynamic knowledge-powered and feature reconstruction-guided scheme for breast cancer segmentation based on DCE-MRI. Med Image Anal 82:102572, 2022 [DOI] [PubMed] [Google Scholar]
  • 32.Zhang J, et al.: A robust and efficient AI assistant for breast tumor segmentation from DCE-MRI via a spatial-temporal framework. Patterns (N Y) 4:100826, 2023 [DOI] [PMC free article] [PubMed]
  • 33.Somkantha K, Theera-Umpon N, Auephanwiriyakul S: Boundary detection in medical images using edge following algorithm based on intensity gradient and texture gradient features. IEEE transactions on biomedical engineering 58:567-573, 2010 [DOI] [PubMed] [Google Scholar]
  • 34.Han K, et al.: Deep semi-supervised learning for medical image segmentation: A review. Expert Systems with Applications:123052, 2024
  • 35.Salehi SSM, Erdogmus D, Gholipour A. Tversky loss function for image segmentation using 3D fully convolutional deep networks. In: Wang Q, Shi Y, Suk HI, Suzuki K, eds. Machine Learning in Medical Imaging. MLMI 2017. Lecture Notes in Computer Science, vol 10541. Springer, Cham, 2017. pp. 379–387. 10.1007/978-3-319-67389-9_44
  • 36.Pan X, Ye T, Xia Z, Song S, Huang G: Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention. Proc IEEE/CVF Conf Comput Vis Pattern Recognit (CVPR), 10.1109/CVPR52729.2023.00207, June 2023
  • 37.Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF, eds. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Proceedings of the 18th International Conference. Part III. Lecture Notes in Computer Science, vol 9351. Springer, Cham, 2015. pp. 234–241. 10.1007/978-3-319-24574-4_28.
  • 38.Badrinarayanan V, Kendall A, Cipolla R: Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE transactions on pattern analysis and machine intelligence 39:2481-2495, 2017 [DOI] [PubMed] [Google Scholar]
  • 39.Xiao X, Wang T, Wang S, Tang Y, Wu L. Weighted Res-UNet for high-quality retina vessel segmentation. In: Proceedings of the 9th International Conference on Information Technology in Medicine and Education (ITME). IEEE, 2018. pp. 327–331. 10.1109/ITME.2018.00076
  • 40.Chen J, Lu Y, Yu Q, et al: TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation. Med Image Anal 2021; 70: 101996. 10.1016/j.media.2021.101996 [DOI] [PubMed] [Google Scholar]
  • 41.Cao H, Wang Y, Chen J, Jiang D, Zhang X, Tian Q, et al. Swin-Unet: UNet-like pure transformer for medical image segmentation. In: Karlinsky L, Michaeli T, Nishino K, eds. Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13803. Springer, Cham, 2023. pp. 142–154. 10.1007/978-3-031-25066-8_9
  • 42.Fu L, Chen Y, Ji W, Yang F: SSTrans-Net: Smart Swin Transformer Network for medical image segmentation. Biomedical Signal Processing and Control 91:106071, 2024 [Google Scholar]
  • 43.Ruan J, Li J, Xiang S: VM-UNet: Vision Mamba UNet for medical image segmentation. arXiv:2402.02491, 2024
  • 44.Raparia K, Min SK, Mody DR, Anton R, Amrikachi M: Clinical outcomes for “suspicious” category in thyroid fine-needle aspiration biopsy: patient’s sex and nodule size are possible predictors of malignancy. Archives of pathology & laboratory medicine 133:787-790, 2009 [DOI] [PubMed] [Google Scholar]
  • 45.Mendelson AA, et al.: Predictors of malignancy in preoperative nondiagnostic biopsies of the thyroid. J Otolaryngol Head Neck Surg 38:395-400, 2009 [PubMed] [Google Scholar]
  • 46.Kamran SC, et al.: Thyroid nodule size and prediction of cancer. The Journal of Clinical Endocrinology & Metabolism 98:564-570, 2013 [DOI] [PubMed] [Google Scholar]
  • 47.Alexopoulou O, et al.: Predictive factors of thyroid carcinoma in non-toxic multinodular goitre. Acta clinica belgica 59:84-89, 2004 [DOI] [PubMed] [Google Scholar]
  • 48.Park J-H, Choi K-H, Lee H-B, Rhee Y-K, Lee Y-C, Chung M-J: Intrathoracic malignant peripheral nerve sheath tumor in von Recklinghausen’s disease. The Korean Journal of Internal Medicine 16:201, 2001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Schlinkert RT, et al.: Factors that predict malignant thyroid lesions when fine-needle aspiration is "suspicious for follicular neoplasm". Mayo Clin Proc 72:913-916, 1997 [DOI] [PubMed] [Google Scholar]
