Author manuscript; available in PMC: 2024 Nov 1.
Published in final edited form as: Int J Comput Assist Radiol Surg. 2023 Jun 12;18(11):2083–2090. doi: 10.1007/s11548-023-02968-1

Automated Full Body Tumor Segmentation in DOTATATE PET/CT for Neuroendocrine Cancer Patients

Alice Santilli 1, Prashanth Panyam 1, Arthur Autz 1, Rick Wray 1, John Phillip 2, Pierre Elnajjar 1, Nathaniel Swinburne 1,*, Marius Mayerhoefer 1,*
PMCID: PMC10980256  NIHMSID: NIHMS1976963  PMID: 37306856

Abstract

Purpose:

Neuroendocrine tumors (NETs) are a rare form of cancer that can occur anywhere in the body and commonly metastasize. The large variance in the location and aggressiveness of these tumors makes them difficult to treat. Assessment of the whole-body tumor burden in a patient image allows for better tracking of disease progression and informs better treatment decisions. Currently, radiologists rely on qualitative assessments of this metric, since manual segmentation is infeasible within a typical busy clinical workflow.

Methods:

We address these challenges by extending the nnU-net pipeline to produce automatic NET segmentation models. We utilize 68Ga-DOTATATE PET/CT, the imaging modality of choice for NETs, to produce segmentation masks from which total tumor burden metrics are calculated. We provide a human-level baseline for the task and perform ablation experiments on model inputs, architectures, and loss functions.

Results:

Our dataset comprises 915 PET/CT scans, divided into a held-out test set (87 cases) and 5 training subsets for cross-validation. The proposed models achieve a test Dice score of 0.644, on par with our inter-annotator Dice score of 0.682 on a subset of 6 patients. When our modified Dice score is applied to the predictions, test performance reaches 0.80.

Conclusion:

In this paper, we demonstrate the ability to automatically generate accurate NET segmentation masks given PET images through supervised learning. We publish the model for extended use and to support the treatment planning of this rare cancer1.

Keywords: Automatic Segmentation, nnU-net, Neuroendocrine Tumor, PET, DOTATATE, Radiology, Tumor Burden

1. Introduction

Neuroendocrine tumors (NETs) are a rare form of cancer, with approximately 12,000 new cases in the US each year and an incidence that continues to rise [1]. NETs can occur anywhere in the body and are usually detected incidentally. Most primary neuroendocrine tumors occur in the lungs, appendix, small intestine, rectum, and pancreas, with the liver, lymph nodes, bones, and peritoneum as common sites for metastases. Treatment options for this cancer type include surgery, chemotherapy, targeted therapies, and radiation therapy, or frequently a combination thereof. Treatment decisions often depend on the tumor locations, whether the tumor produces excess hormones, its aggressiveness, and whether it has metastasized. Common to all patients is the need to track whole-body tumor burden with repeated imaging at baseline, during/after systemic treatment, following surgery, or during expectant monitoring.

The most common imaging types for assessment of NETs are contrast-enhanced computed tomography (CT) and positron emission tomography (PET). While CT is chiefly used to assess tumor morphology (i.e., lesion size), functional imaging with PET exploits molecular features of tumors, such as receptor expression or enzyme activity. Since well-differentiated (G1) and intermediate-grade (G2) NETs (which make up the vast majority of these cancers) are usually slow-growing, the radiolabeled glucose analogue 18F-FDG, which is the standard PET tracer for most cancers, is of limited use. However, the characteristic overexpression of somatostatin receptors (SSTR) by NETs can be used as a target for functional imaging with 68Ga-DOTATATE PET, and also for treatment with octreotide or peptide receptor radionuclide therapy (PRRT), for instance with 177Lu-DOTATATE, which is now a standard treatment option. A much higher proportion of NETs can be identified with 68Ga-DOTATATE PET than with other imaging tests (90% sensitivity for 68Ga-DOTATATE PET/CT vs. 71% for CT/MRI) [2], and 68Ga-DOTATATE can also be used directly to identify patients who are eligible for PRRT (i.e., those who show sufficiently high SSTR expression). Most importantly, 68Ga-DOTATATE PET overcomes the limitation of CT in differentiating between residual viable tumor and non-active disease following treatment; therefore, PET and CT information are regarded as complementary in NETs.

The primary method of monitoring treatment response is tracking the change in total tumor burden over time using radiologic or nuclear medicine imaging tests [3]. Due to the spread and often large number of distinct tumors a patient can have in one scan, sensitively assessing changes in tumor burden is difficult for radiologists relying on qualitative assessment or on traditional linear measurements of index lesions. A true assessment of total tumor burden in one scan would require manual segmentation of all tracer-avid lesions within the entire PET volume, which is time-consuming and laborious, making it an infeasible part of the patient workflow. For example, manual segmentation of a single lesion can take up to 10 minutes, while widely metastatic NET may include dozens or even hundreds of individual lesions [4]. Throughout the literature, it is apparent that automatic segmentation models are critical to medical computing and have garnered major collective efforts toward this goal, as seen in the participation in the most recent Medical Segmentation Decathlon, an open benchmark competition to tackle the issue of generalizing solutions to automatic segmentation in medical imaging [5]. The decathlon was focused on organ segmentation specifically, but it has provided a great architectural starting point for domain transfer to tumor segmentation.

In 2015, Ronneberger et al. proposed the use of the U-net, a convolutional neural network (CNN) containing symmetric contracting and expanding paths, for biomedical image segmentation [6]. Since then, many successful artificial intelligence (AI) based segmentation investigations have followed suit and led to variations on the original U-net architecture [5, 7]. The U-net structure allows precise localization and global context to be used at the same time. One of its main advantages is that it utilizes a pixel-wise loss. For images with small objects, such as individual cells or, in our case, small and numerous tumors, this loss structure is advantageous. A recent extension of the U-net method, nnU-net, was developed by Isensee et al. and ranked first overall in the Medical Segmentation Decathlon [8]. This end-to-end architecture demonstrated its potential to seamlessly transfer across domains due to its ability to automatically adapt to the given image geometry as defined during preprocessing.

In this paper, we propose an extension of the nnU-net architecture for automatic segmentation of neuroendocrine tumors in 68Ga-DOTATATE PET volumes. By leveraging the large historical repository of DOTATATE-PET images at our cancer center, we hope to expand the currently limited literature on PET-DOTATATE segmentation models. Additionally, we aim to support the development of computer-assisted tools to facilitate the tracking of changes in morphologic and functional NET disease burden. The proposed models may then be leveraged by researchers with smaller datasets. We perform extensive baseline and model-parameter experiments and assess in depth the addition of the paired CT image as a second model input. By applying a variation on the traditional Dice score, we demonstrate that the model outperforms the human-level baseline.

2. Materials & Methods

Figure 1 depicts an overview of our proposed model in an institutional environment. Upon acquisition of a PET-DOTATATE scan during routine NET patient follow-up, the images are passed through the segmentation model as part of the radiology workflow. The calculated total tumor burden volume from the output of the model is then pushed to the electronic health record (EHR). When the images are interpreted by the radiologist, the tumor burden volume is available for reference and inclusion in the report. In addition, an easily interpretable dashboard automatically pulls all previous images over the course of this patient journey to quickly and effectively visualize disease progression, which informs treatment decisions.

Fig. 1. Overview of a radiology pipeline including the use of an automatic tumor segmentation model to inform radiologists of changes in NET tumor burden and lesion number over time.

2.1. Data Collection and Annotation

All patients with PET DOTATATE scans between 2016 and 2021 were queried for inclusion in the patient cohort. From this set, scans with an ICD-10 (International Classification of Diseases) diagnosis of neuroendocrine cancer were selected. These made up the majority of the original cohort, from which we then excluded outlier scans, such as cases of patients with meningioma or neuroblastoma. The final dataset comprises 915 scan volumes, each with a PET-DOTATATE scan and an attenuation-correction CT pair.

68Ga-DOTATATE PET/CT:

These scans were acquired using the 68Ga-DOTATATE tracer. The attenuation-correction CT is acquired at the same time and, per routine use, is used to correct soft-tissue artifacts in the PET image for higher-confidence reading. The raw PET images are 128x128 pixels with 5.4 mm spacing and an average volume depth of 263 image slices spaced at 3.26 mm. The CTs are 512x512 pixels, with an average volume depth of 263 slices and an average spacing of 4.3 mm.

Annotations:

The tumor delineations were produced by a set of 6 radiologists contracted through a third-party vendor, Vasta Global (USA, India). To onboard the consulting Vasta radiologist annotators, the supervising radiologists (SRs) undertook a "train the trainer" approach. The SRs demonstrated the segmentation workflow to the Vasta lead radiologist, who subsequently trained the remaining team members. After each Vasta team member had annotated 10 - 15 scans, a follow-up meeting was conducted with the SRs and Vasta to discuss any technical or clinical questions that had arisen. Individual scan-specific questions were discussed by both parties as needed until the entire dataset was annotated. During the annotation phase, the label volumes were returned in batches to the team at Memorial Sloan Kettering Cancer Center (MSK), which performed quality assurance by randomly selecting 10% of the cases for manual review by the SRs for acceptability.

2.2. Data Preprocessing

Each scan pair is first processed to downsample the CT from 512x512 pixels to 128x128 to match the PET input resolution. This is necessary for stacking the images as input to the model. The planning and preprocessing component of the nnU-net package is then applied to the data pairs [8], which applies z-scoring to both the PET and CT images in each volume pair. We select 10% (87 cases) of the dataset as a held-out test set used to assess model performance, ensuring that scans of the same patient are not divided across sets. The remaining 828 cases are randomly divided into 5 subsets for cross-validation.
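As a concrete illustration of the steps above, the sketch below downsamples a CT slice by block averaging and z-scores a volume. The interpolation method and the exact nnU-net internals are our assumptions for illustration, not the published pipeline:

```python
import numpy as np


def downsample_ct(ct_slice: np.ndarray, factor: int = 4) -> np.ndarray:
    """Downsample a 512x512 CT slice to 128x128 by block averaging.

    Block averaging is one simple choice; the paper does not specify
    the resampling method, so treat this as an illustrative stand-in.
    """
    h, w = ct_slice.shape
    return ct_slice.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def z_score(volume: np.ndarray) -> np.ndarray:
    """Z-score normalization, as applied by the nnU-net preprocessing."""
    return (volume - volume.mean()) / (volume.std() + 1e-8)


# Toy volumes with the geometry described in section 2.1.
ct = np.random.rand(512, 512).astype(np.float32)
ct_small = downsample_ct(ct)          # now 128x128, matching the PET grid

pet = np.random.rand(128, 128, 263).astype(np.float32)
pet_norm = z_score(pet)               # zero mean, unit variance
```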

2.3. Network Architecture

The nnU-Net method is a fully automated, dynamic pipeline that independently assesses a new task and determines the appropriate preprocessing, network topology, and training parameters [8]. The implementation of this model for the PET dataset has 5 U-net layers and uses leaky ReLU activations, the Adam optimizer, instance normalization, and strided convolutions for downsampling. All models were run for 1000 epochs, using an initial learning rate of 0.1 with polynomial decay. The best model was selected based on validation loss. The use of several loss functions is examined and discussed in section 3.3.
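The learning-rate schedule can be written out explicitly. The sketch below assumes nnU-net's default polynomial exponent of 0.9, which the text does not specify:

```python
def poly_lr(epoch: int, max_epochs: int = 1000, initial_lr: float = 0.1,
            exponent: float = 0.9) -> float:
    """Polynomial learning-rate decay: lr = lr0 * (1 - epoch/max)^exp.

    The 0.9 exponent is nnU-net's default and is assumed here.
    """
    return initial_lr * (1.0 - epoch / max_epochs) ** exponent


# The rate decays smoothly from 0.1 at epoch 0 to 0.0 at the final epoch.
schedule = [poly_lr(e) for e in range(0, 1001, 250)]
```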

2.4. Experiments

A common challenge in medical imaging segmentation investigations is generating gold-standard ground truth annotations. Delineating tumors in a volume is a subjective task that differs greatly between annotators. Training, experience, and differing techniques for addressing areas of uncertainty are all factors that can affect the way an annotator interprets an image. It is therefore difficult to achieve "perfect" ground truth annotations, and the inherent interobserver variation in tumor segmentation must be considered when evaluating model performance. To establish the success of our models, we need to determine the human-level performance baseline that must be attained or improved upon. To do this, we randomly selected 6 patient volumes and had the 6 vendor radiologists and 2 MSK radiologists annotate each scan, blinded to the work of the other annotators. By collecting multiple ground truth annotations for this subset, we can evaluate inter-observer agreement between annotators using the STAPLE method and its Dice scores [9].
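Given a set of annotator masks, the agreement analysis reduces to scoring each mask against a consensus estimate. The sketch below is illustrative only: it substitutes a simple majority vote for the full probabilistic STAPLE algorithm used in the paper.

```python
import numpy as np


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice overlap between two binary masks."""
    inter = np.logical_and(a, b).sum()
    denom = a.sum() + b.sum()
    return 2.0 * inter / denom if denom else 1.0


def majority_consensus(masks) -> np.ndarray:
    """Majority vote across annotators; a stand-in for the
    probabilistic STAPLE estimate used in the paper."""
    stacked = np.stack(masks).astype(float)
    return stacked.mean(axis=0) >= 0.5


# Toy example: three annotators of one small volume.
rng = np.random.default_rng(0)
annotations = [rng.random((16, 16, 16)) > 0.6 for _ in range(3)]
consensus = majority_consensus(annotations)
scores = [dice(m, consensus) for m in annotations]  # one Dice per annotator
```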

Once the baseline measure is established for comparison, we can investigate different model parameters. The first and largest question is whether the addition of the attenuation-correction CT at the model input aids or impedes model performance. A common concern when increasing the number of inputs to a model is the correlated increase in parameters and weights for the model to learn. The trade-off between this expansion of the input space and the information gained from the addition must be optimized. We use the nnU-net framework to train models with 5-fold cross-validation and report their performance on the held-out test set. We then visualize the segmentations to qualitatively assess the major differences between the two models and help guide further optimization.

Finally, we perform a wide set of experiments varying parameters, focusing on architectures and loss functions, and present the top models of each category. In addition to the nnU-net, we investigate the use of UNet 3+ [10] and DeepMedic [11] as alternative workflows and structures. The UNet 3+ method has full-scale skip connections and previously outperformed the basic U-net and U-net++ architectures for liver segmentation [10]. DeepMedic, on the other hand, is a more traditional CNN model with a double-pathway architecture for multi-scale processing. We compare these three state-of-the-art architectures to identify the optimal method for automated segmentation of PET-DOTATATE images.

3. Results & Discussion

3.1. Human Level Baseline

Table 1 shows the inter-annotator Dice scores for 6 test patients. To compare the segmentation quality of each annotator, we use the Simultaneous Truth and Performance Level Estimation (STAPLE) method [9] to create a ground truth annotation from the 8 segmentation submissions for each patient. For the purposes of this estimation, we assume that these 8 inputs allow for a STAPLE segmentation that can be considered independent. Each row of the table gives the Dice score between the given annotator and the STAPLE ground truth for each patient. A high average Dice score in the second-to-last row demonstrates high agreement in the delineations between the annotators for the particular scan, suggesting that the given scan is "easier" to segment than those with lower Dice scores. This may be seen in the large variability between the scores for patients 1 and 2. Looking across an annotator row, we gain insight into the quality of the segmentations performed by that individual annotator. Annotator 2, for example, has the lowest average Dice score, meaning that this annotator's masks differ the most from those of all other annotators. To guard against the possibility that the best annotator receives poor Dice scores simply because all other annotators are of lower quality and therefore exert a greater effect on the STAPLE segmentation, we extend the experiment to include two MSK radiologists, whom we consider our experts in this experiment. From the MSK expert rows, we see that their pairwise Dice scores are in a similar range to those of all other annotators. When compared only to each other (last row of Table 1), these two experts produce similar annotations, which we could attribute to working in the same environment with similar protocols and workflows, but they still face difficulties achieving the same results in cases such as Patient 6. This finding supports the assumption that there is a large variance in segmentation task difficulty across the entire dataset.

Table 1.

Dice scores between each annotators’ segmentation and the STAPLE ground truth image for each patient. The STAPLE image was created from all 8 annotators (expert included). The MSK expert inter-annotator Dice per patient is displayed on the last row.

Patients
1 2 3 4 5 6 Average
 Annotator 1 0.466 0.924 0.815 0.893 0.955 0.895 0.824
 Annotator 2 0.172 0.774 0.280 0.311 0.572 0.548 0.443
 Annotator 3 0.287 0.622 0.245 0.458 0.444 0.627 0.447
 Annotator 4 0.740 0.903 0.601 0.794 0.757 0.833 0.772
 Annotator 5 0.797 0.928 0.825 0.758 0.518 0.980 0.801
 Annotator 6 0.471 0.520 0.616 0.601 0.915 0.479 0.600
 MSK expert 1 0.593 0.935 0.975 0.717 0.941 0.806 0.828
 MSK expert 2 0.408 0.928 0.895 0.949 0.721 0.529 0.738
 Average per Patient 0.491 0.817 0.656 0.686 0.728 0.712 0.682
Between MSK 1 and 2 0.651 0.893 0.875 0.687 0.690 0.544

3.2. Addition of Attenuation Correction CT

3.2.1. Model Performance

Table 2 displays the performance metrics of the two models of interest: PET only and PET + the attenuation-correction CT (AC/CT) pair. The held-out test set is used to score the performance of a 5-fold ensemble model, and the softmax values are filtered in three different ways (3 columns in the table). The ensemble max values are obtained by placing each pixel in the group (0 for background, 1 for tumor) with the higher predicted value. The other two ensemble predictions are filtered by selecting only tumor pixels with probability values greater than 0.75 and 0.9, respectively. We can interpret these predictions as confidence values. From the 0.9 ensemble column, we see that the PET-only model produces a greater Dice score, meaning that it has greater confidence in the areas it correctly predicts as tumor. However, as the confidence threshold is slightly lowered, the PET+CT model catches up and surpasses it in predictive value. Visually, compared with the AC/CT images, the PET images have larger intensity differences between positive (tumor-containing) and negative pixels due to the tumor-specific metabolic nature of PET image acquisition. This may explain the higher confidence in "obvious" tumor regions for the PET-only 90% confidence model, whereas the CT information added to the second model might better help delineate the uncertain regions, giving higher Dice scores, but only when the confidence threshold is lowered. We perform a visual quality assessment in section 3.2.2.

Table 2.

Performance metrics of ensemble models with and without AC/CT on a test set of 87 cases. The metrics in each column were calculated on predictions of a “maximum”, 0.75 and 0.9 probability filtering.

Ensemble (max) Ensemble (0.75) Ensemble (0.9)
PET PET+CT PET PET+CT PET PET+CT
Dice 0.644 0.644 0.602 0.623 0.608 0.590
LTPR1 0.681 0.711 0.588 0.635 0.596 0.586
Precision 0.741 0.757 0.821 0.809 0.805 0.842
Sensitivity 0.675 0.665 0.573 0.605 0.585 0.547
1 LTPR: Lesion True Positive Rate.

In all three instances, we expected to see a larger difference between the two models; however, the addition of the CT information as a secondary input only marginally improves performance. Although the benefits are minimal, the PET+CT model can be seen as the better one for our use case. When a radiologist reviews a case that has been automatically segmented, it is more important to have detected more positive areas than to have produced "perfect" lesion outlines. The lesion true positive rate (LTPR) metric captures this ability, and in two of the three ensemble models above, the PET+CT model yields the higher value.
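The three filtering rules applied to the ensemble softmax outputs can be sketched as follows (illustrative; the function and argument names are our own, not the nnU-net API):

```python
import numpy as np


def ensemble_predictions(fold_probs, threshold=None):
    """Combine per-fold tumor probability maps into a binary mask.

    With threshold=None, a voxel is labeled tumor when its mean tumor
    probability exceeds the background probability, i.e. exceeds 0.5
    (the "ensemble max" rule). Otherwise, only voxels whose mean
    probability exceeds the given confidence threshold (e.g. 0.75 or
    0.9) are kept.
    """
    mean_prob = np.mean(np.stack(fold_probs), axis=0)
    cutoff = 0.5 if threshold is None else threshold
    return mean_prob > cutoff


# Two toy per-fold probability maps standing in for the 5-fold outputs.
p1 = np.array([[0.2, 0.8], [0.95, 0.6]])
p2 = np.array([[0.4, 0.9], [0.85, 0.7]])
mask_max = ensemble_predictions([p1, p2])            # "max" rule
mask_075 = ensemble_predictions([p1, p2], 0.75)      # high-confidence only
```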

3.2.2. Segmentation Evaluation

Figure 2 depicts representative slices from three example PET DOTATATE scans, two slices from each, showing varying tumor presentations. Each prediction comes from the max ensemble model. As one can see in the figure, the 128x128 resolution of the PET input is pixelated. When the images and segmentations are overlapped, a single pixel may cover a large area of tissue. When calculating the Dice score between the predicted and true masks, the traditional Dice score performs poorly in many cases due to a predicted tumor border area that is larger than the manual annotation. As we saw in section 3.1, this is a challenging dataset, with a large imaging area, a wide variety of tumor locations, and variable treatment statuses; these factors likely contribute to the high inter-rater variance. It is therefore fair to assume that the borders of the true masks are not "perfect," and a one-pixel difference may result in an excessive penalty under the Dice metric. This is especially true in cases with a high number of small tumors, as seen in the outermost slice of the first row in figure 2, since there are many "edges" that may not line up perfectly. With this assumption, we calculated a "modified Dice" score, which gives a one-pixel allowance in every direction from a true pixel in the ground truth annotations. In other words, under the modified Dice metric, any false positive predicted pixel that is directly adjacent to a true positive pixel is considered a correct prediction. An example of such an intersection using the given allowance is displayed in Figure 3. The results of this new "Dice with allowance" score are shown in bold in Figure 2. Applying this modified Dice score to the entire test set, the average scores of both ensemble models reach 0.80, above most annotator Dice scores determined in section 3.1. If we also apply the modified Dice score to the overlap set in that section, it reaches an average Dice of 0.82. As with the model predictions, this value also benefits greatly from the pixel allowance. This score is slightly higher than the model performance, but it is calculated from only a subset of 6 cases.
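The one-pixel allowance can be implemented by dilating each mask before computing the overlap. The sketch below is one plausible implementation, since the paper does not publish its exact formulation: a predicted pixel face-adjacent to a true pixel counts as correct, and, symmetrically, a true pixel adjacent to a predicted pixel is not counted as missed.

```python
import numpy as np


def dilate_once(mask):
    """Grow a binary mask by one pixel along every axis (face adjacency).

    Padding with False avoids the wrap-around at volume boundaries
    that a plain np.roll would introduce.
    """
    padded = np.pad(mask, 1)
    out = padded.copy()
    for axis in range(mask.ndim):
        for shift in (-1, 1):
            out |= np.roll(padded, shift, axis=axis)
    core = tuple(slice(1, -1) for _ in range(mask.ndim))
    return out[core]


def modified_dice(pred, truth):
    """Dice with a one-pixel allowance on both mask borders."""
    hit_pred = pred & dilate_once(truth)    # predictions within 1 px of truth
    hit_truth = truth & dilate_once(pred)   # truth within 1 px of a prediction
    denom = pred.sum() + truth.sum()
    return (hit_pred.sum() + hit_truth.sum()) / denom if denom else 1.0
```

A prediction whose border is off by exactly one pixel from the annotation therefore scores 1.0 instead of being penalized for every boundary pixel.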

Fig. 2. Example predictions from three cases for both models (PET and PET+CT). The bolded scores refer to the modified Dice allowance calculation discussed in section 3.2.2. For visual reference, the middle images are the PET slices corresponding to the innermost predicted image in each row.

Fig. 3. Example slice from an image demonstrating what is counted as true under our modified Dice metric, with an allowance of one pixel from a true pixel.

One question often faced when dealing with manual ground truth annotations is whether large regions of false negatives, as exemplified in row three of figure 2, are "true" false negatives or simply a result of annotators segmenting in broad strokes. By doing so, they may include regions of normal tissue between tumors, as opposed to meticulously outlining each individual lesion. Neuroendocrine cancer typically produces a large tumor burden within the liver, which is laborious to closely delineate on PET/CT. In this dataset, it became clear that in cases with many tumor deposits, the annotators tended to include most of the liver, overstating the actual disease burden and unfairly penalizing the model through large areas of apparent false negatives. We attempt to account for this in the modified Dice score by allowing any apparent false negative pixel that is adjacent to a predicted positive pixel to be counted as a true positive pixel. In this particular case (high tumor burden, large lesions), we can observe a straight line cutting across the yellow section (false positives) that most likely reflects such an annotator "shortcut." This direct line probably joined two points on a contour during the annotation of this case, and now leads to the model's correct prediction of the tumor surface being inappropriately scored as false positive. It is difficult to inspect every case in the large dataset one by one; therefore, the best assessment of this model's performance would come from a true prospective evaluation performed after deploying the model into the clinical workflow and obtaining real-time feedback from the radiologists inspecting the resulting segmentations.

3.3. Model Parameters

Table 3 displays the metrics of the top-performing models across a selection of architectures and loss functions. Variations of each row were run, and we present only the best outcome for each architecture or loss function. From our experiments, it is clear that the nnU-net architecture with the Dice coefficient (DC) + cross entropy (CE) loss performs best. However, across all manipulations of the model parameters and shapes, the Dice performance never surpasses ≈ 0.64. We attempted to overcome this ceiling by giving the model more information about the inputs, applying weights to each case based on our confidence in its annotation; a common remedy for challenges posed by human-labeled medical images [12]. The idea is to learn more (apply a higher-weighted loss) from cases where we expect better annotations and to deemphasize (apply a lower-weighted loss) cases where the annotations might be poor. As described in section 2.1, the MSK radiologists QA'd a proportion of the cases, which we consider "high confidence" manual annotations. The cases done at the beginning of the cohort, when our annotators were still becoming familiar with the image set and annotation approach, are considered "low confidence." We explored several different weighting schemes; the best model, in row 5 of Table 3, used the following groupings (weights): MSK QA'd (2), QA'd by the vendor (1.3), any case delivered after batch 4 (1), batches 2 and 3 (0.4), and the first batch of cases submitted (0.2). For the training sets, these criteria yielded subsets of 114, 83, 502, 104, and 26 cases, respectively. Unfortunately, the weighting did not add as much information as hoped, and we achieved performance similar to the best nnU-net model without weighting.
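The confidence weighting described above amounts to scaling each case's loss before averaging. In the sketch below the group names and function are hypothetical; only the weight values are taken from the text:

```python
import numpy as np

# Hypothetical group labels mirroring the weight groupings in the text.
CONFIDENCE_WEIGHTS = {
    "msk_qa": 2.0,       # QA'd by the MSK radiologists
    "vendor_qa": 1.3,    # QA'd by the vendor
    "late_batch": 1.0,   # delivered after batch 4
    "mid_batch": 0.4,    # batches 2 and 3
    "first_batch": 0.2,  # first batch submitted
}


def weighted_batch_loss(case_losses, groups):
    """Scale each case's loss by the confidence weight of its annotation
    group before averaging, so high-confidence labels drive training."""
    weights = np.array([CONFIDENCE_WEIGHTS[g] for g in groups])
    return float((weights * np.asarray(case_losses)).mean())


# A batch of two cases: one MSK-QA'd, one from the noisy first batch.
loss = weighted_batch_loss([1.0, 1.0], ["msk_qa", "first_batch"])
```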

Table 3.

Performance metrics for several different architecture and loss function experiments for the PET+CT model.

Architecture Loss function Dice LTPR Precision Sensitivity
nnU-net1 DC + CE 0.644 0.711 0.757 0.665
nnU-net DC 0.631 0.689 0.710 0.670
nnU-net CE 0.619 0.632 0.755 0.632
nnU-net Focal loss 0.606 0.619 0.745 0.609
nnU-net + sample weighting DC + CE 0.632 0.696 0.734 0.662
Unet 3+ DC + focal 0.535 0.536 0.556 0.582
DeepMedic CE 0.557 0.689 0.523 0.655
1 Ensemble max model from Table 2.

4. Conclusion

In this paper, we propose an extension of the nnU-net model for automatic tumor segmentation of 68Ga-DOTATATE PET/CT scans. The performance of our model on this uniquely large dataset gives high confidence in the feasibility of deploying such a model within the clinical workflow. As with many medical segmentation tasks, we are limited by the quality of our manual annotations and the inherent difficulty of the task, which introduces interobserver variability. In the current state of automatic segmentation applications in medical imaging, learning from the literature and developing a specific in-house model seems to be the optimal path to deployment and integration for many groups. Given the modularity of the nnU-net framework, we have made our models public to allow other, smaller institutions to benefit from our pre-training and to improve upon and verify the generalizability of the models.

Funding:

This project is supported by the National Institutes of Health and National Cancer Institute (P30 CA008748).

Footnotes

1. Link to information about the trained model: https://github.com/AliceSantilli/nnUNet

Conflict of Interest: The authors declare no conflicts of interest. Pierre Elnajjar is currently employed by Regeneron, Inc.

Informed Consent & Ethics Approval: This retrospective study was approved by the local institutional review board, and the need for written informed consent was waived. All data storage and handling were performed in compliance with Health Insurance Portability and Accountability Act regulations.

References

  • [1]. Dasari A, Shen C, Halperin D, Zhao B, Zhou S, Xu Y, Shih T, and Yao JC. Trends in the incidence, prevalence, and survival outcomes in patients with neuroendocrine tumors in the United States. JAMA Oncology, 3(10):1335–1342, 2017.
  • [2]. Fallahi B, Manafi-Farid R, Eftekhari M, Fard-Esfahani A, Emami-Aderkani A, Geramifar P, Akhlaghi M, Taheri APH, and Beiki D. Diagnostic efficiency of 68Ga-DOTATATE PET/CT as compared to 99mTc-octreotide SPECT/CT and conventional morphologic modalities in neuroendocrine tumors. Asia Ocean J Nucl Med Biol, 7:129–140, 2019.
  • [3]. Dromain C, Pavel ME, Ruszniewski P, Langley A, Massien C, Baudin E, and Caplin ME. Tumor growth rate as a metric of progression, response, and prognosis in pancreatic and intestinal neuroendocrine tumors. BMC Cancer, 19(66), 2019.
  • [4]. Egger J, Kapur T, Fedorov A, Pieper S, Miller JV, Veeraraghavan H, Freisleben B, Golby AJ, Nimsky C, and Kikinis R. GBM volumetry using the 3D Slicer medical image computing platform. Scientific Reports, 3(1364), 2013.
  • [5]. Antonelli M, Reinke A, Bakas S, Farahani K, Kopp-Schneider A, Landman BA, Litjens G, Menze B, Ronneberger O, Summers RM, van Ginneken B, Bilello M, Bilic P, Christ PF, Do RKG, Gollub MJ, Heckers SH, Huisman H, Jarnagin WR, McHugo MK, Napel S, Pernicka JSG, Rhode K, Tobon-Gomez C, Vorontsov E, Meakin JA, Ourselin S, Wiesenfarth M, Arbelaez P, Bae B, Chen S, Daza L, Feng J, He B, Isensee F, Ji Y, Jia F, Kim N, Kim I, Merhof D, Pai A, Park B, Perslev M, Rezaiifar R, Rippel O, Sarasua I, Shen W, Son J, Wachinger C, Wang L, Wang Y, Xia Y, Xu D, Xu Z, Zheng Y, Simpson AL, Maier-Hein L, and Cardoso MJ. The Medical Segmentation Decathlon. Nature Communications, 13:4128, 2022.
  • [6]. Ronneberger O, Fischer P, and Brox T. U-Net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015.
  • [7]. Yousefirizi F, Jha AK, Brosch-Lenz J, Saboury B, and Rahmim A. Toward high-throughput artificial intelligence-based segmentation in oncological PET imaging. PET Clinics, 16:577–596, 2021.
  • [8]. Isensee F, Jaeger PF, Kohl SAA, Petersen J, and Maier-Hein KH. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 18:203–211, 2021.
  • [9]. Warfield SK, Zou KH, and Wells WM. Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation. IEEE Transactions on Medical Imaging, 23(7):903–921, 2004.
  • [10]. Huang H, Lin L, Tong R, Hu H, Zhang Q, Iwamoto Y, Han X, Chen YW, and Wu J. UNet 3+: A full-scale connected UNet for medical image segmentation. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1055–1059, 2020.
  • [11]. Kamnitsas K, Chen L, Ledig C, Rueckert D, and Glocker B. Multi-scale 3D CNNs for segmentation of brain lesions in multi-modal MRI. Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015.
  • [12]. Karimi D, Dou H, Warfield SK, and Gholipour A. Deep learning with noisy labels: exploring techniques and remedies in medical image analysis. Medical Image Analysis, 65, 2020.
