Automatic hepatic tumor segmentation in intra-operative ultrasound: a supervised deep-learning approach

Tiziano Natali; Andrey Zhylka; Karin Olthof; Jasper N Smit; Tarik R Baetens; Niels F M Kok; Koert F D Kuhlmann; Oleksandra Ivashchenko; Theo J M Ruers; Matteo Fusaglia

doi:10.1117/1.JMI.11.2.024501

2024 Mar 12;11(2):024501. doi: 10.1117/1.JMI.11.2.024501

PMCID: PMC10929734 PMID: 38481596

Abstract.

Purpose

Training and evaluation of the performance of a supervised deep-learning model for the segmentation of hepatic tumors from intraoperative US (iUS) images, with the purpose of improving the accuracy of tumor margin assessment during liver surgeries and the detection of lesions during colorectal surgeries.

Approach

In this retrospective study, a U-Net network was trained with the nnU-Net framework in different configurations for the segmentation of CRLM from iUS. The model was trained on B-mode intraoperative hepatic US images, hand-labeled by an expert clinician. The model was tested on an independent set of similar images. The average age of the study population was 61.9 ± 9.9 years. Ground truth for the test set was provided by a radiologist, and three extra delineation sets were used for the computation of inter-observer variability.

Results

The presented model achieved a DSC of 0.84 ( $p = 0.0037$ ), which is comparable to the scores of expert human raters. The model segmented hypoechoic and mixed lesions more accurately (DSC of 0.89 and 0.88, respectively) than hyper- and isoechoic ones (DSC of 0.70 and 0.60, respectively), missing only lesions that were isoechoic or <20 mm in diameter (8% of the tumors). The inclusion of extra margins of probable tumor tissue around the lesions in the training ground truth resulted in a lower DSC of 0.75 ( $p = 0.0022$ ).

Conclusion

The model can accurately segment hepatic tumors from iUS images and has the potential to speed up the definition of resection margins during surgeries and the detection of lesions in screenings by automating iUS assessment.

Keywords: intraoperative ultrasound, computer-aided diagnosis, liver, supervised learning, convolutional neural network

1. Introduction

For a large part of the 1.9 million patients who are affected globally every year by colorectal liver metastases (CRLM), surgical resection provides the best long-term outcome.¹ Radical resection of the tumors can be challenging because the lesions are difficult to localize: the liver is highly deformable and has a complex vascular system, both of which complicate the interpretation of clinical imaging. Surgeons are guided during the procedure by a point-of-care imaging system, namely intraoperative ultrasound (iUS). It is the only readily available technology for continuous imaging without ionizing radiation, and it offers excellent resolution and soft tissue contrast. However, iUS is characterized by relatively high speckle noise and, when not tuned properly, by artifacts, which makes it a challenging modality with a relatively high operator dependency. All of this hampers the detection of smaller or subtle liver lesions and the accurate delineation of larger tumor borders on 2D iUS for any user. The development of an automated method for hepatic tumor segmentation in iUS can reduce the operator dependency of this imaging modality and assist clinicians during these challenging tasks. An automated segmentation method could be integrated into navigated liver surgeries.² In this scenario, the generated segmentations, paired with the spatial information recorded with instrument tracking, can be used to generate a 3D visualization of lesions that were imaged using iUS. Their correlation to preoperative scans (e.g., CT or MRI) could ultimately ease the tumor resection and ablation procedures, giving the surgeons more up-to-date information than preoperative imaging alone. Automated detection and segmentation of hepatic tumors are also needed to assist clinicians during colorectal surgeries. Clinicians specialized in these procedures are less experienced in interpreting iUS, since it is not a widely adopted imaging modality during these procedures, but it has been suggested for screening of CRLM in previous studies.³

Improvements in the workflow of hepatic tumor resection and ablation have been the subject of extensive research in the past years. One of the most recent advancements in oncological liver surgery reported in the literature is the work from Paolucci et al.⁴ The authors presented an innovative clinical workflow for hepatic tumor resection in an ex-vivo setting, where semi-automatic segmentation of liver lesions is performed from iUS images with a seeded region growing algorithm. Analytical approaches based on the region growing, graph-based and thresholding concepts have been subject to many studies.⁵ Despite their promising results, clinically acceptable segmentation quality was not achieved, and they still require human intervention resulting in operator-dependent solutions. Deep-learning (DL) based algorithms have been pervading the medical imaging field in the most recent years, showing great improvement in the task of automatic segmentation of tumors, managing to generate high-quality, near-real-time segmentations and not requiring manual operation. Since this study focuses on iUS images, and there has been no implementation reported in the literature of a DL approach for the intraoperative segmentation of hepatic lesions, the most relevant work to it can be found for similar tasks of detection and segmentation from percutaneous US (pUS) of either liver lesions,⁶^–⁸ or breast lesions.⁹^,¹⁰ Dadoun et al.⁶ propose a model for the detection and characterization of hepatic lesions from pUS. Despite the good results reported by the authors, showing that DL methods on US images can be implemented for clinical use, its application for intraoperative assessment of tumor margins is not possible due to its focus on detecting (e.g., drawing a bounding box surrounding a possible lesion) and classifying lesions (e.g., distinguishing between benign and malignant), but not segmenting them. 
Another solution, proposed by Ryu et al.,⁸ is capable of focal lesion segmentation but requires the user to provide the region of interest for segmentation. Thus, it cannot be used in real time. Hu et al.⁹ proposed a U-Net model for fully automatic segmentation of breast tumors from US, which achieved results similar to inter-observer variability.

As mentioned above, the reported solutions were proposed for pUS images. Although pUS and iUS share many similarities, iUS distinguishes itself by usually having higher contrast, superior resolution, and a better signal-to-noise ratio, and thus an overall enhanced image quality; therefore, models specifically trained on iUS may achieve improved performance.

In this work, the performance of a 2D U-Net trained using the nnU-Net framework¹¹ implemented for the intraoperative segmentation and detection of CRLM from iUS was evaluated. The model was trained and tested on a manually labeled internal dataset of liver iUS images. The model was trained in several setups, using different sets of ground truths and training sets. The model was shown to segment hypoechoic average-sized lesions more accurately than isoechoic or larger lesions. Results hint that real-time application is possible. Finally, the performance of the model was similar to inter-observer variability, suggesting that the employed model is capable of achieving quality comparable to expert human operators.

2. Methods

2.1. Dataset

For this retrospective study (summarized in Fig. 1), a dataset consisting of 300,000 B-mode hepatic iUS images from 650 sweeps (i.e., sequences of US images acquired during a sweep, also known as cine-loops) from 110 patients (average age $61.9 \pm 9.9$ years) was used. The images were acquired as part of a larger clinical study (number IRBd20-091), which had been approved by the institutional medical ethical review board (IRB); the consent form was waived by the IRB. Patients were accrued between 2018 and 2022; they were older than 18 years, of both genders, without metal implants, and scheduled for open surgery of liver metastases.

All images were acquired with the same T-shaped convex ultrasound transducer (I14C5T model, BK Medical ApS, Peabody, MA). Clinical frequency ranged from 5 to 14 MHz, and a focal range of 10 to 80 mm was used. US sweeps were performed with a depth ranging from 35 to 110 mm, with a pixel size of 0.08 to 0.18 mm, respectively. The resulting image dimensions were $544 \times 668$ pixels. All images were acquired before the application of any surgical clips. Each sweep imaged a single lesion.

Twenty-four patients were excluded due to (i) severe steatosis identified from the images in the sweeps, which might have decreased the segmentation quality due to the low definition of imaged anatomical structures (seven patients); (ii) sweeps affected by prominent artifacts, where there was poor contact between transducer and liver (eight patients); (iii) an acquisition depth of the US sweeps outside the range of 45 to 100 mm, which is the optimal focal zone of the transducer (nine patients). Next, images were classified based on the echogenicity of the tumors into hypo-, hyper-, isoechoic, and mixed (i.e., with varying echogenicity) tumor types by an expert radiologist (10 years of experience in liver ultrasound assessment). To enhance dataset variability, iUS sweeps were randomly sampled, keeping a minimum distance of five images between selected images of the same sequence. To minimize redundancy in the data and reduce the risk of overfitting, a maximum of five images of the same tumor were sampled. Isoechoic tumor images were excluded from the training set. Detection of these lesions is precluded by the inherent limitations of ultrasound. Additionally, because of their low incidence in patients affected by CRLM¹² (6% in our dataset), it would have been impossible to add a number of cases to the training set comparable to the other classes; hence, it would not have been possible to properly evaluate their integration in the training. At the same time, isoechoic lesions were included in the test set in order to evaluate the performance of the network in every possible scenario and to understand how it generalizes to unseen situations.
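The frame-sampling rule described above (random selection, at least five images apart, at most five images per tumor) can be illustrated with a minimal sketch. This is not the authors' code: the function name `sample_sweep_frames`, the greedy selection strategy, and the reading of "distance" as a difference in frame index are all assumptions made for illustration.

```python
import random


def sample_sweep_frames(n_frames, min_gap=5, max_images=5, seed=0):
    """Randomly pick up to `max_images` frame indices from one sweep.

    Assumption: two selected frames must differ by at least `min_gap`
    positions in the sweep. Selection is greedy over a shuffled order.
    """
    rng = random.Random(seed)
    candidates = list(range(n_frames))
    rng.shuffle(candidates)

    selected = []
    for idx in candidates:
        # Keep the frame only if it is far enough from every frame kept so far.
        if all(abs(idx - kept) >= min_gap for kept in selected):
            selected.append(idx)
            if len(selected) == max_images:
                break
    return sorted(selected)


if __name__ == "__main__":
    print(sample_sweep_frames(60))
```

A fixed seed keeps the selection reproducible; in practice the seed would presumably vary per sweep.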

2.2. Dataset Partitioning

Following the above criteria, 1000 images of liver metastases from 206 sweeps from 59 patients were selected to form the training set (Table 1). At training time, 10% of the training dataset was used for model validation. Analysis of the performance of the first trained model on the test set revealed that the model consistently labeled the kidney as tumor. Thus, an additional 25 images picturing a tumor and a kidney together were selected from 10 sweeps from four patients and added to the training of the model, forming set 3, following the hard-negative mining concept (i.e., including in the training set additional cases of consistent failures¹³). As a result, the complete training set composed of sets 1, 2, and 3 comprised 1025 images from 216 sweeps of 63 patients. Images in sets 4 and 5 were used for evaluation of the models. All images in sets 4 and 5 were acquired from patients who were not included in sets 1 to 3. Set 4 included 100 images that were selected from 40 sweeps of 20 patients. Images were included in set 4 so that the echogenicity classes were represented in similar numbers. The isoechoic class is the only under-represented one, including only six images due to its low incidence in the patients included in our dataset. Set 5 included the complete sequence of images from three US sweeps from three patients, totaling 156 images. Tumor images in all sets were also categorized into three classes based on their size (Table 1): small (diameter $< 20$ mm), average ( $20$ mm $<$ diameter $< 50$ mm), and large (diameter $> 50$ mm). Diameters were computed directly from the manual labels on the images.
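The size classes and the patient-level separation between training and test sets described above can be expressed in a few lines. This is a sketch with hypothetical helper names (`size_class`, `patients_are_disjoint`); the text does not say which class a diameter of exactly 20 mm or 50 mm belongs to, so assigning both boundaries to "average" is an assumption.

```python
def size_class(diameter_mm):
    """Map a tumor diameter (mm) to the size class used in Table 1.

    Assumption: diameters of exactly 20 mm or 50 mm count as "average";
    the text leaves the boundary cases unspecified.
    """
    if diameter_mm < 20:
        return "small"
    if diameter_mm <= 50:
        return "average"
    return "large"


def patients_are_disjoint(train_patients, test_patients):
    """Return True if no patient identifier appears in both groups."""
    return set(train_patients).isdisjoint(test_patients)


if __name__ == "__main__":
    print(size_class(12.5), size_class(35.0), size_class(63.0))
    print(patients_are_disjoint({"p01", "p02"}, {"p03"}))
```

Checking disjointness on patient identifiers rather than on images or sweeps is what prevents frames of the same lesion from leaking between training and evaluation.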

Table 1.

Datasets used in the study with size and echogenicity class distributions.

Split	Set	Patients	Sweeps	Images	Size: small	Size: average	Size: large	Echogenicity: hypo	Echogenicity: hyper	Echogenicity: mixed	Echogenicity: iso
Train	(1)	59	206	1000	550	331	119	250	250	500	0
Train	(2)	59	206	1000	550	331	119	250	250	500	0
Train	(3)	4	10	25	10	15	0	6	6	13	0
Test	(4)	20	40	100	50	32	18	32	30	32	6
Test	(5)	3	3	156	1	1	1	0	1	2	0

1000 images were selected from the dataset to form our training set. Two sets of manual labeling were provided for the training, set 1 and 2. In the first set, labels only contained areas where the observer was most certain, while set 2 extends set 1 to include areas where the observer was less certain of tumor presence. Ten percent of the training images were used as a validation set. Set 3 included 25 images picturing the kidney for training to lower the false positive rate in images where the kidney is in the field of view. A set of 100 images from 40 sweeps were selected for testing the model (set 4). Set 5 contains the entire sequences of images from the sweeps of three lesions from three patients. Images in sets 1–3 and those in sets 4 and 5 are selected from disjoint groups of patients. Small tumors are considered to be $< 20 mm$ in diameter (computed from the images), average are between 20 and 50 mm, large are $> 50 mm$ .

2.3. Ground Truth Delineation

Two sets of ground-truth labels for the images in the training sets (sets 1 to 3) were provided by a technical physician with 6 years of experience. The difference between set 1 and set 2 lies in the extent of the tumor margins: delineations in set 2 extend those in set 1 by widening the tumor margins. Sets 4 and 5 were labeled by an expert radiologist (10 years of experience), a liver surgeon (30 years of experience), and two technical physicians (7 and 5 years of experience, respectively). Reported years refer to experience in the assessment of hepatic iUS images. During the manual labeling of the tumors in the test set, the operators had access to the sweeps that contained the tumor images included in the test set, a procedure adopted to ensure the best accuracy in the delineation of the tumors. Delineations provided by the radiologist served as ground truth for this study, while the others were used in the computation of inter-observer variability.

2.4. Deep Learning Models

This study employed a 2D U-Net trained with the nnU-Net framework, which was best suited to study the real-time application of this model. Since this is the first time that a DL model has been implemented for this application, an architecture that has been extensively implemented and evaluated in other medical imaging fields was adopted, in order to provide a baseline for comparison with future work. Furthermore, nnU-Net was adopted as a training framework given its demonstrated success in other medical imaging tasks.¹¹^,¹⁴^,¹⁵ The trained model takes a single iUS image as input and outputs a binary map of the tumor segmentation. All trainings were performed using full-size images without pre-processing.

The model was trained in different configurations that are summarized in Table 2. First, the use of the different ground truth delineations on the same images (sets 1 and 2 in Table 1) was assessed (models A and B, respectively). Then, the addition of set 3 to the training (model C) to lower the false positive rate of the network output was studied. The U-Net was trained with a weighted cross-entropy (WCE) loss function due to the class imbalance between tumor and non-tumor pixels. The weights for the WCE loss were obtained following the method described by Ho and Wookey,¹⁶ where the weighting factors were computed as the ratio of background pixels (i.e., non-tumor) to foreground pixels (i.e., tumor). The resulting weight factors were 24 for the tumor class and 1 for the non-tumor class. Two further trainings, with 12:1 and 6:1 weights, were performed to assess the effect of the class weighting. Models were trained for a total of 1000 epochs with the SGD optimizer and a learning rate of 0.001, with a decay factor of 0.9 applied every 50 epochs, using Python 3.10.4, PyTorch 2.1, CUDA 11.8, and nnU-Net v2 (commit from 6 October 2022). Training and evaluation were performed on a computer equipped with an NVIDIA GeForce GTX 1080 Ti GPU and an Intel® Xeon® W-1250P CPU.
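The class-weighting step described above can be sketched as follows. This is an illustrative, dependency-free approximation rather than the training code used in the study: the helper names are hypothetical, masks are plain nested lists of 0/1 values, and normalizing the loss by the summed weights is one common convention that is assumed here.

```python
import math


def background_to_foreground_ratio(masks):
    """Ratio of non-tumor to tumor pixels over a list of binary masks."""
    foreground = sum(v for mask in masks for row in mask for v in row)
    total = sum(len(row) for mask in masks for row in mask)
    return (total - foreground) / foreground


def weighted_cross_entropy(probs, targets, w_tumor, w_background=1.0, eps=1e-7):
    """Weighted binary cross-entropy over flat lists of pixels.

    `probs` are predicted tumor probabilities, `targets` are 0/1 labels.
    Assumption: the loss is normalized by the sum of the applied weights.
    """
    loss_sum = 0.0
    weight_sum = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        w = w_tumor if t == 1 else w_background
        loss_sum += -w * (math.log(p) if t == 1 else math.log(1.0 - p))
        weight_sum += w
    return loss_sum / weight_sum


if __name__ == "__main__":
    # Toy 5x5 mask with a single tumor pixel: 24 background pixels per tumor pixel.
    toy_mask = [[0] * 5 for _ in range(5)]
    toy_mask[2][2] = 1
    print(background_to_foreground_ratio([toy_mask]))
```

With a 24:1 weighting, a missed tumor pixel costs as much as 24 misclassified background pixels, which pushes the model toward higher recall at the expense of precision; the 12:1 and 6:1 trainings reported above reduce that pressure.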

Table 2.

Results of the trained models on the radiologist’s segmentations for set 4 (Table 1) on DSC, precision, recall, and HDist.

Model	Training sets	DSC 1IQ	DSC Med.	DSC 3IQ	Precision 1IQ	Precision Med.	Precision 3IQ	Recall 1IQ	Recall Med.	Recall 3IQ	HDist (mm) 1IQ	HDist (mm) Med.	HDist (mm) 3IQ	Missed (#)
A	(1)	0.47	0.79	0.88	0.42	0.73	0.92	0.54	0.94	0.98	3.11	4.65	16.54	8
B	(2)	0.37	0.75	0.86	0.22	0.70	0.83	0.69	0.92	0.98	2.27	5.35	11.23	11
C	(1), (3)	0.50	0.84	0.91	0.46	0.79	0.85	0.73	0.96	0.99	2.72	4.31	8.85	5

The medians of the metrics computed per image on the test set are reported in the table. Model C is the best performing, with the highest DSC and lowest number of undetected tumors. Model A scored higher in DSC than model B (0.79 versus 0.75, $p = 0.0052$ ). The addition of set 3 to the training of the model resulted in an increase in DSC (0.84 of model C versus 0.79 of model A, $p = 0.0022$ ). All models were trained using weighted cross-entropy as loss function, with weights of 6 for the tumor class and 1 for the non-tumor class. Comparisons between all reported models are statistically significant with respect to $α_{n} = 0.017$ . Median results for the best performing model are highlighted in bold.

2.5. Model Evaluation

The model segmentation performance was evaluated by computing four evaluation metrics: Dice similarity coefficient (DSC), precision, recall and Hausdorff Distance (HDist), and the analysis of the receiver operating characteristic (ROC) curves.

The overall segmentation performance of the models was assessed using the ROC curves and respective areas under the curve (AUC) (Scikit-Learn 1.3.1) based on true-positive rates (TPR) and false-positive rates (FPR) calculated on a pixel-by-pixel basis. AUCs of the ROC curves were compared with DeLong's test for correlated ROC curves (pROC library, R version 4.3.2). The significance level was set at $α = 0.05$ . Due to multiple comparisons, $α$ was adjusted following the Bonferroni correction, resulting in the new value of $α_{n} = \frac{α}{n_{comparisons}} = \frac{0.05}{3} = 0.017$ . Thus, a difference was considered statistically significant when $p \leq α_{n}$ . The optimal segmentation threshold was chosen per model based on the respective ROC curve to achieve an FPR of 0.1.
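The two numerical steps in this paragraph, the Bonferroni adjustment and the choice of a threshold at a target FPR, can be sketched without external libraries. The function names are hypothetical and the toy scores are invented for illustration; the study itself used Scikit-Learn for the ROC analysis, and the exact rule for picking the operating point on the curve is assumed here to be "the lowest threshold whose FPR does not exceed the target".

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Bonferroni-adjusted significance level."""
    return alpha / n_comparisons


def roc_point(scores, labels, threshold):
    """Pixel-wise (FPR, TPR) when predicting tumor for score >= threshold."""
    tp = fp = positives = negatives = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if label == 1:
            positives += 1
            tp += predicted
        else:
            negatives += 1
            fp += predicted
    return fp / negatives, tp / positives


def threshold_for_fpr(scores, labels, target_fpr=0.1):
    """Lowest threshold whose FPR stays at or below `target_fpr`.

    Returns (threshold, fpr, tpr), or None if no threshold qualifies.
    """
    best = None
    # FPR can only grow as the threshold decreases, so stop at the first violation.
    for threshold in sorted(set(scores), reverse=True):
        fpr, tpr = roc_point(scores, labels, threshold)
        if fpr > target_fpr:
            break
        best = (threshold, fpr, tpr)
    return best


if __name__ == "__main__":
    print(round(bonferroni_alpha(0.05, 3), 3))
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5,
                  0.65, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05]
    toy_labels = [1] * 5 + [0] * 10
    print(threshold_for_fpr(toy_scores, toy_labels))
```

Fixing the FPR and reading off the TPR gives each model the same false-alarm budget, so the models are compared at a common operating point.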

The models’ performance was further evaluated with DSC, precision, recall and HDist given the chosen threshold. These scores were computed by comparing the model segmentation for each image in the test set to its ground truth. The scores were then compared across models using the two-sided Wilcoxon signed rank test (Scipy 1.9.1) with statistical significance being verified when $p \leq α_{n}$ where $α_{n} = 0.017$ as defined previously.
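For reference, the four per-image metrics can be computed from a pair of binary masks as in the sketch below. It is a plain-Python illustration with hypothetical function names, not the evaluation code of the study. Two assumptions are made: the Hausdorff distance is taken between the full sets of foreground pixels (the text does not state whether contours or full masks were used), and pixels are assumed isotropic with a known size in millimeters.

```python
import math


def overlap_metrics(pred, truth):
    """Return (DSC, precision, recall) for two binary masks (nested 0/1 lists)."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            tp += p & t
            fp += p & (1 - t)
            fn += (1 - p) & t
    dsc = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return dsc, precision, recall


def hausdorff_mm(pred, truth, pixel_mm):
    """Symmetric Hausdorff distance between foreground pixel sets, in mm."""
    a = [(r, c) for r, row in enumerate(pred) for c, v in enumerate(row) if v]
    b = [(r, c) for r, row in enumerate(truth) for c, v in enumerate(row) if v]
    if not a or not b:
        return math.inf  # undefined when one mask is empty (e.g., a missed tumor)

    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return pixel_mm * max(directed(a, b), directed(b, a))


if __name__ == "__main__":
    truth = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
    pred = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]]
    print(overlap_metrics(pred, truth))
    print(hausdorff_mm(pred, truth, pixel_mm=0.1))
```

The brute-force Hausdorff computation is quadratic in the number of foreground pixels, which is acceptable for a toy example but would need a distance transform or spatial index for full-resolution images.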

The model performance was assessed with respect to the different tumor echogenicity classes (defined in Sec. 2.1) using the same evaluation metrics, in order to understand which kinds of tumors are most challenging to segment and require further attention. The agreement between observers (i.e., inter-rater variability) was evaluated by computing DSC and HDist between the manual delineations of the radiologist (ground truth) and the manual delineations from the other raters. The performance of the model was finally tested on all images from the sweeps included in set 5 (Table 1) to better understand the consistency of its segmentations in a scenario closer to its real application.

3. Results

Table 2 presents the results of the three models trained with different configurations on the sets in Table 1, in terms of DSC, precision, recall, HDist, and number of missed tumors. Models A and B were trained first. Model B achieved a DSC of 0.75 against 0.79 for model A ( $p = 0.0052$ ), and an HDist of 5.35 mm against 4.65 mm ( $p = 0.014$ ). From the qualitative analysis of the segmentations on the test set using models A and B, it appeared that the kidney was consistently labeled as tumor. Therefore, set 1 was augmented with set 3 and used for the training of model C. Model C showed an improvement in DSC of 0.05 over model A (0.84 versus 0.79, $p = 0.0079$ ) and of 0.09 over model B ( $p = 0.0022$ ), and an improvement of 0.34 mm in HDist over model A (4.31 mm versus 4.65 mm, $p = 0.017$ ). Model C scored the highest DSC of 0.84, missing only 5 out of 100 tumors in the test set. Results were achieved with an inference time of 100 to 140 ms per image, corresponding to 7 to 10 fps.

The analysis of the ROC curves in Fig. 2 shows that model C has the highest AUC, with a value of 0.93, against 0.90 for model A ( $p < 0.001$ ) and 0.88 for model B ( $p < 0.001$ ), identifying this as the best-performing training setup. Model C achieves a true positive rate of 0.84 at the desired false positive rate of 0.10.

Figure 3 summarizes the segmentation performance of the models in Table 2 with respect to the echogenic properties of the tumors in the test set. The figure shows that the hypoechoic (median DSC of 0.89, 0.86, and 0.89 for models A, B, and C, respectively) and mixed (0.81, 0.79, and 0.88) tumors are segmented with higher and more consistent DSC. In contrast, the DSC distributions for hyper- (0.57, 0.81, and 0.70) and isoechoic (0.18, 0.02, and 0.60) tumors have higher variance and lower medians.

Fig. 3 — DSC distribution of the trained models (Table 2) per tumor echogenic appearance. Models can clearly segment tumors from the hypoechoic (0.89, 0.86, and 0.89, median, respectively) and mixed (0.81, 0.79, and 0.88) classes with higher and more consistent DSCs, while results on hyper- (0.57, 0.81, and 0.70) and isoechoic (0.18, 0.02, and 0.60) tumors are more spread.

The same pattern can be seen in Fig. 4, where the DSC results of the best model (configuration C) are analyzed for each case in the test set. The model achieves more consistent DSCs on hypoechoic and mixed tumors, while for hyperechoic cases there is more variation in the results. All the hypoechoic tumors were detected. All the undetected tumors are under 20 mm in diameter. Isoechoic tumors appear to be the most challenging to segment, with three undetected cases and a median DSC of only 0.60.

When compared to the inter-observer agreement based on DSC (Table 3), model C achieves comparable results (Table 4). The average pair-wise observer agreement score is $0.86 \pm 0.04$ , while the network achieves a slightly lower average of $0.83 \pm 0.01$ . The average inter-observer HDist was $3.08 \pm 0.43$ mm, while model C achieved 4.31 mm against the radiologist's delineations and on average $3.36 \pm 0.87$ mm against all raters.

Table 3.

Inter-observer variability, raters against radiologist (ground truth).

Rater	DSC	Hausdorff distance (mm)
Surgeon	0.82	3.50
Tech. Phys. 1	0.88	3.10
Tech. Phys. 2	0.89	2.63
Average	0.86 ± 0.041	3.08 ± 0.43

The delineations from the two technical physicians and the liver surgeon were compared in terms of DSC and HDist against the ground truth provided by the radiologist. Reported DSCs and HDists are median values calculated based on the delineations of set 4 (Table 1).

Table 4.

Model C against the four raters.

Rater	DSC	Hausdorff Distance (mm)
Radiologist	0.84	4.31
Surgeon	0.85	3.81
Tech. Phys. 1	0.83	3.01
Tech. Phys. 2	0.82	2.33
Average	0.83 ± 0.012	3.36 ± 0.87
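The "Average" rows of Tables 3 and 4 can be approximately reproduced from the per-rater medians. The sketch below uses the Table 4 values; the use of the sample standard deviation (`statistics.stdev`) is an assumption, chosen because it lands close to the reported spread, and small differences are expected since the table entries are rounded.

```python
from statistics import mean, stdev

# Median DSC and HDist (mm) of model C against each rater, copied from Table 4.
MODEL_C_VS_RATERS = {
    "Radiologist": (0.84, 4.31),
    "Surgeon": (0.85, 3.81),
    "Tech. Phys. 1": (0.83, 3.01),
    "Tech. Phys. 2": (0.82, 2.33),
}


def summarize(values):
    """Return (mean, sample standard deviation) of a sequence of scores."""
    return mean(values), stdev(values)


if __name__ == "__main__":
    dsc_values = [dsc for dsc, _ in MODEL_C_VS_RATERS.values()]
    hdist_values = [hd for _, hd in MODEL_C_VS_RATERS.values()]
    print("DSC   mean, sd:", summarize(dsc_values))
    print("HDist mean, sd:", summarize(hdist_values))
```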

From a qualitative analysis of the results on set 4 (Fig. 5), it appeared that the network can consistently segment lesions. The graphs in Fig. 6 summarize the performance of the model at segmenting all images picturing a lesion in two of the sweeps from set 5 (Table 1). While Fig. 6(a) presents a common scenario, where the model can consistently segment the lesion throughout the sweep, Fig. 6(b) presents a case where the network performance is more inconsistent. In the fourth slice of the sweep (blue label), the tumor has been completely missed by the model, while it was correctly segmented in the previous slice (red label) with DSC of 0.88.

4. Discussion

In this study, we investigated for the first time the implementation of an automatic DL model for the segmentation of CRLM from iUS images and its real-time applicability. The assessment of different model configurations resulted in the following: (i) training the model with a more conservative ground truth (i.e., thinner tumor margins) leads to a model that achieves higher DSC and lower HDist; (ii) the model achieves results comparable to human operators' inter-observer variability, scoring 0.83 versus 0.86 DSC and 3.36 mm versus 3.08 mm HDist (respectively, the average of the model against the raters and the inter-observer variability), with an area under the ROC curve of 0.93; (iii) we studied the model performance on different echogenicity and size classes of the tumors in the test set. The model achieves higher DSC for hypoechoic and mixed tumors while struggling with isoechoic lesions. The model segments tumors in sizes that range from 20 to 50 mm with a median DSC of 0.87, while the median DSC for tumors $> 50$ mm is lower, at 0.64. All non-segmented lesions are non-hypoechoic and $< 20$ mm.

The results achieved in this study are comparable to those reported in previous work on US segmentation. Ryu et al.,⁸ in their semi-automatic segmentation and classification method for liver lesions, reported a Jaccard index of 0.685 (equivalent to a DSC of 0.813), which is lower than the 0.84 achieved by our automatic segmentation model. In their work on automatic segmentation of tumors from breast US, Hu et al.⁹ reported a DSC of 0.89. Pan et al.¹⁰ reported a DSC of 0.82 in their work on DL for automated breast US. Another contribution of this work is the assessment of the DL model on an intraoperative US transducer. As mentioned, hepatic iUS differs from abdominal US since it works at a different frequency, resulting in different contrast and signal-to-noise ratio. Also, the reported performance evaluation on separate kinds of tumors helps to identify challenges that need to be addressed in future studies.
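The conversion between the Jaccard index $J$ and the DSC used in this comparison follows from the standard identity $DSC = 2J / (1 + J)$, with inverse $J = DSC / (2 - DSC)$. A short sketch (function names are illustrative):

```python
def jaccard_to_dsc(jaccard):
    """Convert a Jaccard index (intersection over union) to a Dice coefficient."""
    return 2.0 * jaccard / (1.0 + jaccard)


def dsc_to_jaccard(dsc):
    """Convert a Dice coefficient to a Jaccard index."""
    return dsc / (2.0 - dsc)


if __name__ == "__main__":
    print(round(jaccard_to_dsc(0.685), 3))  # 0.813
```

Because the mapping is monotonic, ranking methods by DSC or by Jaccard index gives the same order; the DSC is always the larger of the two values for imperfect overlap.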

According to this work, more restrictive annotation covering thinner tumor margins leads to a better-performing model. This is also in agreement with the findings from a previous study about AI for radiological applications by Kulkarni et al.¹⁷

Lower performance of the model on isoechoic tumors and tumors $> 50 mm$ may be related to data distribution in the training set. No isoechoic lesions were included in the training set and the tumor size distribution was unbalanced, with most tumors $< 50 mm$ . Balancing the representation of the tumor types and sizes, by including more isoechoic and large tumors, may lead to improvement in the performance.

The model failed to segment only non-hypoechoic small tumors, which may be explained by the combination of small size and echogenicity that makes these tumors easily mistaken for healthy tissue when analyzing single images instead of a sequence of images. We believe that model modifications are necessary to overcome this issue. Zhang et al.¹⁸ suggested the implementation of a multi-task trained model for both segmentation and image-wise classification of breast tumors from US images. Such a model was reported to miss fewer lesions than a segmentation-only model and to achieve a higher DSC.

Despite the advantages of a 2D network (easier to train, applicable for real-time use, and able to take full advantage of the high resolution of iUS), the lack of spatial information results in inconsistent segmentation between neighboring images. Therefore, the introduction of neighboring images during training in a 2.5D fashion might stabilize the model's output. Such an approach was reported in the literature to improve the accuracy of models for segmentation from CT and MRI images.¹⁹^,²⁰ Pan et al.¹⁰ report that DL models working on breast pUS can benefit from the integration of spatial information. Similarly, the performance of models working with iUS might improve by adding neighboring images from the US sequence. The incorporation of 2.5D data adds the possibility of encoding the information about speckle patterns in sequential images, which are often the only biomarkers used by clinicians to localize isoechoic tumors. This might boost the overall performance of the model, improving the segmentation quality of detected lesions and the detection rate of isoechoic lesions.
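One simple way to build the 2.5D input discussed here is to stack each frame with its neighbors in the sweep as extra channels. The sketch below is illustrative only and was not part of the study: the function names are hypothetical, and repeating the first or last frame at the ends of the sweep (edge clamping) is an assumed convention.

```python
def neighbor_indices(center, n_frames, context=1):
    """Indices of the frames stacked around `center`, clamped to the sweep."""
    return [
        min(max(center + offset, 0), n_frames - 1)
        for offset in range(-context, context + 1)
    ]


def stack_2p5d(sweep, center, context=1):
    """Return the list of frames forming one 2.5D sample (one frame per channel)."""
    return [sweep[i] for i in neighbor_indices(center, len(sweep), context)]


if __name__ == "__main__":
    toy_sweep = ["frame0", "frame1", "frame2", "frame3", "frame4"]
    print(stack_2p5d(toy_sweep, center=0))
    print(stack_2p5d(toy_sweep, center=2))
```

The segmentation target would still be the label of the central frame only, so the neighboring frames act purely as context.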

We believe the proposed solution can be integrated into a navigated surgery procedure in cases where a reconstructed US sweep needs to be segmented. When it comes to real-time application, the achieved rate of 7 to 10 fps is lower than the desired 30 fps, which is usually the frame rate of iUS scanners. Implementing the model in a lighter framework than nnU-Net, where the preprocessing step takes 50% of the inference time, should lead to a higher frame rate. The detection rate of the presented model is 0.92; thus, it could be used to help less experienced operators during screening of CRLM in colorectal surgeries.
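The frame-rate figures above follow directly from the per-image latency. The sketch below also evaluates an idealized scenario in which the preprocessing share of the inference time is removed entirely; that scenario is an assumption made here for illustration, not a measurement from the study.

```python
def frames_per_second(latency_ms):
    """Frame rate achievable when each image takes `latency_ms` to process."""
    return 1000.0 / latency_ms


def fps_without_preprocessing(latency_ms, preprocessing_share=0.5):
    """Frame rate if the preprocessing share of the latency were eliminated.

    Assumption: preprocessing cost drops to zero, an idealized upper bound.
    """
    remaining_ms = latency_ms * (1.0 - preprocessing_share)
    return frames_per_second(remaining_ms)


if __name__ == "__main__":
    for latency in (100.0, 140.0):
        print(latency, frames_per_second(latency), fps_without_preprocessing(latency))
```

Under this idealized assumption the rate would rise to roughly 14 to 20 fps, so additional optimization would likely still be needed to reach 30 fps.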

The dataset used for the training and testing of the model was entirely acquired using the same US transducer. The dataset needs to be extended with intraoperative images from different transducers in order to study the generalizability of this model and ensure its robustness. Additionally, during the test annotation, the operators did not have access to other medical information that is usually available. This was necessary to create a test set that was delineated based only on the information in the US images, the same way as the model operates. Finally, the dataset does not include information about the frequency and gain levels used in the acquisitions. Studying the performance of the model with respect to these properties might provide more insights into DL-based tumor segmentation. This may lead to the development of an algorithm for suggesting the best US settings during image acquisition.

This study provides information and insights on the implementation of a deep learning model for hepatic tumor segmentation in iUS. The evaluation of the model showed promising results comparable to human operators, hinting that it is possible to adopt such a network to aid US operators in identifying tumors and finding their margins. This application could decrease US operator dependency and improve surgical procedures.

Biographies

Tiziano Natali graduated in game and medical technology from Utrecht Universiteit in 2021, with a focus on deep learning and image processing. He is currently working at the Netherlands Cancer Institute—Antoni van Leeuwenhoek Oncological Hospital in Amsterdam. His current research focuses on deep learning applications on ultrasound imaging.

Biographies of the other authors are not available.

Contributor Information

Tiziano Natali, Email: t.natali@nki.nl.

Andrey Zhylka, Email: a.zhylka@nki.nl.

Karin Olthof, Email: ka.olthof@nki.nl.

Jasper N. Smit, Email: j.smit@nki.nl.

Tarik R. Baetens, Email: t.baetens@nki.nl.

Niels F. M. Kok, Email: n.kok@nki.nl.

Koert F. D. Kuhlmann, Email: k.kuhlmann@nki.nl.

Oleksandra Ivashchenko, Email: o.v.ivashchenko@umcg.nl.

Theo J. M. Ruers, Email: t.ruers@nki.nl.

Matteo Fusaglia, Email: m.fusaglia@nki.nl.

Disclosures

The authors declare no conflicts of interest related to this study.

Code and Data Availability

The data that support the findings of this article are not publicly available due to compliance with the WGBO (Dutch regulations for medical records), which prevents us from sharing patients’ data.

References

1. Sung H., et al., “Global cancer statistics 2020: Globocan estimates of incidence and mortality worldwide for 36 cancers in 185 countries,” Cancer J. Clin. 71(3), 209–249 (2021). doi:10.3322/caac.21660
2. Smit J. N., et al., “Ultrasound-based navigation for open liver surgery using active liver tracking,” Int. J. Comput. Assist. Radiol. Surg. 17(10), 1765–1773 (2022). doi:10.1007/s11548-022-02659-3
3. Danilo C., Leanza S., “Routine intraoperative ultrasound for the detection of liver metastases during resection of primary colorectal cancer-a systematic review,” Maedica 15(2), 250–252 (2020). doi:10.26574/maedica.2020.15.2.250
4. Paolucci I., et al., “Ultrasound based planning and navigation for non-anatomical liver resections-an ex-vivo study,” IEEE Open J. Eng. Med. Biol. 1, 3–8 (2019). doi:10.1109/OJEMB.2019.2961094
5. Pasyar P., et al., “Hybrid classification of diffuse liver diseases in ultrasound images using deep convolutional neural networks,” Inf. Med. Unlocked 22, 100496 (2021). doi:10.1016/j.imu.2020.100496
6. Dadoun H., et al., “Deep learning for the detection, localization, and characterization of focal liver lesions on abdominal US images,” Radiol. Artif. Intell. 4(3), e210110 (2022). doi:10.1148/ryai.210110
7. Mostafiz R., et al., “Focal liver lesion detection in ultrasound image using deep feature fusions and super resolution,” Mach. Learn. Knowl. Extract. 2(3), 172–191 (2020). doi:10.3390/make2030010
8. Ryu H., et al., “Joint segmentation and classification of hepatic lesions in ultrasound images using deep learning,” Eur. Radiol. 31, 8733–8742 (2021). doi:10.1007/s00330-021-07850-9
9. Hu Y., et al., “Automatic tumor segmentation in breast ultrasound images using a dilated fully convolutional network combined with an active contour model,” Med. Phys. 46(1), 215–228 (2019). doi:10.1002/mp.13268
10. Pan P., et al., “Tumor segmentation in automated whole breast ultrasound using bidirectional LSTM neural network and attention mechanism,” Ultrasonics 110, 106271 (2021). doi:10.1016/j.ultras.2020.106271
11. Isensee F., et al., “nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation,” Nat. Methods 18(2), 203–211 (2021). doi:10.1038/s41592-020-01008-z
12. Smit J. N., et al., “Validation of 3D ultrasound for image registration during oncological liver surgery,” Med. Phys. 48(10), 5694–5701 (2021). doi:10.1002/mp.15080
13. Schroff F., Kalenichenko D., Philbin J., “FaceNet: a unified embedding for face recognition and clustering,” in Proc. IEEE Conf. Comput. Vis. and Pattern Recognit., pp. 815–823 (2015). doi:10.1109/CVPR.2015.7298682
14. Isensee F., et al., “nnU-Net for brain tumor segmentation,” Lect. Notes Comput. Sci. 12659, 118–132 (2021). doi:10.1007/978-3-030-72087-2_11
15. Zhang G., et al., “Multiorgan segmentation from partially labeled datasets with conditional nnU-Net,” Comput. Biol. Med. 136, 104658 (2021). doi:10.1016/j.compbiomed.2021.104658
16. Ho Y., Wookey S., “The real-world-weight cross-entropy loss function: modeling the costs of mislabeling,” IEEE Access 8, 4806–4813 (2019). doi:10.1109/ACCESS.2019.2962617
17. Kulkarni V., Gawali M., Kharat A., “Key technology considerations in developing and deploying machine learning models in clinical radiology practice,” JMIR Med. Inf. 9(9), e28776 (2021). doi:10.2196/28776
18. Zhang S., et al., “Fully automatic tumor segmentation of breast ultrasound images with deep learning,” J. Appl. Clin. Med. Phys. 24(1), e13863 (2023). doi:10.1002/acm2.13863
19. Zhang H., et al., “Multiple sclerosis lesion segmentation with tiramisu and 2.5D stacked slices,” Lect. Notes Comput. Sci. 11766, 338–346 (2019). doi:10.1007/978-3-030-32248-9_38
20. Zhang Y., et al., “Bridging 2D and 3D segmentation networks for computation-efficient volumetric medical image segmentation: an empirical study of 2.5D solutions,” Comput. Med. Imaging Graph. 99, 102088 (2022). doi:10.1016/j.compmedimag.2022.102088

