Comput Med Imaging Graph. Author manuscript; available in PMC: 2025 Mar 1.
Published in final edited form as: Comput Med Imaging Graph. 2024 Jan 11;112:102335. doi: 10.1016/j.compmedimag.2024.102335

Segmentation of Pelvic Structures in T2 MRI via MR-to-CT Synthesis

Yan Zhuang a, Tejas Sudharshan Mathai a, Pritam Mukherjee a, Ronald M Summers a
PMCID: PMC10969342  NIHMSID: NIHMS1961986  PMID: 38271870

Abstract

Segmentation of multiple pelvic structures in MRI volumes is a prerequisite for many clinical applications, such as sarcopenia assessment, bone density measurement, and muscle-to-fat volume ratio estimation. While many CT-specific datasets and automated CT-based multi-structure pelvis segmentation methods exist, there are few MRI-specific multi-structure segmentation methods in the literature. In this pilot work, we propose a lightweight and annotation-free pipeline that synthetically translates T2 MRI volumes of the pelvis to CT and then leverages an existing CT-only tool called TotalSegmentator to segment 8 pelvic structures in the generated CT volumes. The predicted masks were then mapped back to the original MR volumes. We compared the predicted masks against expert annotations of the public TCGA-UCEC dataset and an internal dataset. Experiments demonstrated that the proposed pipeline achieved Dice measures ≥65% for 8 pelvic structures in T2 MRI. The proposed pipeline offers an alternative way to obtain multi-organ and multi-structure segmentations without time-consuming manual annotation. By exploiting the substantial research progress in CT, the proposed pipeline could, in principle, be extended to other MRI sequences. Our research bridges the gap between current CT-based multi-structure segmentation and MRI-based segmentation. The manually segmented structures in the TCGA-UCEC dataset are publicly available.

1. Introduction

Magnetic resonance imaging (MRI) is a widely used imaging modality owing to its non-ionizing nature, superior soft tissue resolution, and improved contrast between fat and water (Dirix et al. (2014)). MRI of the pelvis is critical for many applications, such as the assessment of prostate and endometrial/cervical cancers (Otero-García et al. (2019)), muscle and fat volume ratios (Doyle et al. (2012); Lopez and Newton (2022)) for survival rate prediction in cancer patients, bone density for risk of osteoporosis and fractures (Nüchtern (2015)), radiotherapy (Nyholm et al. (2018)), and sarcopenia (Huber et al. (2020); Boutin et al. (2015)). For most of these applications, segmentation of the various structures in the pelvis is critical for tasks such as targeted biopsy, focal tumor ablation, assessment of pelvic fractures, robotic surgery, and evaluation of response to treatment (Jones et al. (2015); Dirix et al. (2014); Li et al. (2022a)).

However, the segmentation of multiple pelvic structures in MRI remains under-explored; existing segmentation algorithms are disproportionately focused on CT, for which a plethora of training datasets are publicly available (Zhou et al. (2019); Huang et al. (2020); Ji et al. (2022); Wasserthal et al. (2022)). For example, Wasserthal et al. (2022) recently used more than 1000 CT volumes to develop a CT-based multi-structure segmentation tool called TotalSegmentator. In contrast, MRI datasets with corresponding segmentations are scarce, and the volumes in some of the publicly available pelvic MRI datasets are cropped to the prostate region (Li et al. (2022a); Liu et al. (2020)), making them unsuitable for training segmentation tools for structures that lie outside the prostate area, such as the hips, sacrum, and iliopsoas muscles. To the best of our knowledge, there are no MRI datasets that contain annotations for important pelvic structures such as the hips, sacrum, gluteal musculature (maximus, medius, minimus), and iliopsoas muscles. Consequently, few approaches exist for segmenting these pelvic structures in MRI.

A conventional approach towards MRI-based multi-structure segmentation involves a supervised learning pipeline where the data needs to be collected and labeled manually. This supervised approach is annotation-intensive and requires a great deal of expert knowledge, time and effort (Greenspan et al. (2016)). An efficient alternative that circumvents the aforementioned drawbacks and alleviates the annotation burden is to exploit recent CT-based segmentation tools for pelvic MRI-based multi-structure segmentation via MR-to-CT image translation approaches.

In this pilot study, we propose a simple yet effective annotation-free pipeline for the segmentation of multiple structures in pelvic MRI. We focused on T2-weighted pelvic MRI volumes. T2 MR slices were synthetically translated into CT using existing image-to-image translation methods, and an existing CT-based volumetric segmentation tool, TotalSegmentator (TS) (Wasserthal et al. (2022)), was used to segment 8 pelvic structures in the synthetic CT volumes. The segmentation masks for each slice were then mapped back to the original T2 MRI volumes. Our experiments empirically demonstrate the feasibility and effectiveness of this approach.

We investigated several state-of-the-art image synthesis methods: CycleGAN (Zhu et al. (2017)), Pix2Pix (Isola et al. (2017)), and a more recent diffusion-based approach called SynDiff (Özbey et al. (2022)). These algorithms were trained on the Gold Atlas Male Pelvis dataset (Nyholm et al. (2018)), which contains paired T2 MRI and CT volumes. Validation was conducted on two other T2 MRI datasets: the public TCGA-UCEC dataset of female patients, and an internal dataset of male patients. TS segmented the 8 pelvic structures in each translated MR-to-CT volume, and the segmentation masks were mapped back to the original T2 MRI volumes. The output of our pipeline was compared against ground truth annotations manually segmented by an expert. Experimental results demonstrated a mean Dice coefficient of 71.6% on the TCGA-UCEC dataset and 75.6% on the internal dataset. In addition, comparison studies suggested that the diffusion-based model (SynDiff) achieved the best performance when training and testing data are similar, while the traditional paired GAN model (Pix2Pix) outperformed the other methods when testing on out-of-distribution data with disease conditions (the TCGA-UCEC dataset). These results indicate the feasibility of the proposed annotation-free pipeline as well as its robustness to out-of-distribution data.

Our contributions are three-fold: 1) leveraging current image translation approaches and the progress in CT-based segmentation to segment 8 pelvic structures in T2 MRI without annotating any MRI data; 2) providing gender-agnostic segmentation of pelvic structures; and 3) making the manual annotations for the TCGA-UCEC dataset publicly available to other researchers¹. Notably, our work circumvents the established annotation-dependent route for segmentation tasks and provides a new perspective on inter-modality segmentation challenges.

2. Related Works

Multi-structure Segmentation in the Pelvis.

Multi-structure segmentation of the pelvis has mainly been explored in CT. Existing segmentation tools (Wasserthal et al. (2022); Ji et al. (2023)) are capable of segmenting pelvic muscles and bone structures in CT. In addition, Balagopal et al. (2018) presented a 3D U-Net based deep learning model to segment the prostate, bladder, rectum, and femoral heads in CT images. Liu et al. (2021a) presented an nnU-Net based multi-class segmentation network for segmenting bony structures in the pelvis. In MRI, multi-structure segmentation of the pelvis is under-explored. One major work is that of Nyholm et al. (2018), in which a dataset of 19 co-registered male pelvic MR and CT scans was released as part of the Gold Atlas project along with labels for several structures (e.g., rectum, bladder, prostate); however, it lacked labels for neighboring muscles and bones. More recently, Li et al. (2022b) proposed a registration-assisted prototypical learning algorithm, including a novel image alignment module, to segment several important pelvic structures such as the bladder, bone, and prostate, together with a multi-institution male pelvic (prostate) dataset. However, the pelvic images were cropped to the prostate region, so other large and important pelvic bone and muscle structures were unavailable for segmentation. Moreover, Liu et al. (2021b) proposed a deep learning-based approach using a 3D U-Net to automatically segment pelvic bony structures in patients with prostate cancer on an internal MRI dataset. To the best of our knowledge, there are no publicly available MRI datasets of the pelvis that contain segmentation labels for the hips, sacrum, gluteus maximus, gluteus medius, gluteus minimus, and iliopsoas muscles.

Image Synthesis-assisted Segmentation.

Image synthesis-assisted segmentation has been more common for tumor segmentation, especially of the brain in neuroimaging. For example, several works have proposed segmenting abnormalities by translating images from diseased to healthy or vice versa (Andermatt et al. (2019); Vorontsov et al. (2022)). Vorontsov et al. (2022) proposed a semi-supervised model that synthesizes images from diseased to healthy and from healthy to diseased to derive a residual image for locating and segmenting the tumor. Zhao et al. (2017) segmented gray matter and white matter in brain CT images by first using a deep learning network to translate non-contrast CT into synthetic MR; a standard brain segmentation pipeline was then used to segment and label the whole brain on the synthetic MR, because CT images have poor soft tissue contrast and are difficult to segment. In addition, Yu et al. (2021) developed a GAN-based framework that synthesizes multi-modality MRI sequences from a single MRI modality and fuses them using a deep learning network to perform semantic segmentation of the mouse brain. Nevertheless, image synthesis-assisted segmentation of the pelvis remains largely under-examined.

3. Methods

Fig. 1 illustrates the proposed annotation-free pipeline, which consists of two steps. In the first step, T2 MRI volumes are synthetically translated into CT using image-to-image translation techniques. In the second step, the CT-based tool TotalSegmentator segments the 8 pelvic structures in each synthetic CT volume, and these segmentation masks are mapped back to the original T2 MRI volume.

Figure 1:


The proposed annotation-free pipeline for the segmentation of 8 gender-agnostic pelvic structures in T2 MRI volumes: hip (red), sacrum (light green), femur (blue), urinary bladder (yellow), gluteus maximus (cyan), gluteus medius (purple), gluteus minimus (brown), and iliopsoas (dark green). First, an image-to-image translation approach (e.g., Pix2Pix) translated each slice in a T2 MRI volume into a synthetic CT slice, and the slices were aggregated into a synthetic 3D CT volume. Next, a CT-based volumetric segmentation tool (TotalSegmentator) segmented the 8 gender-agnostic pelvic structures in the synthetic CT volume. The segmentation masks for each slice were mapped back to the original T2 MRI volume.
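The two-step flow in Fig. 1 can be sketched in a few lines. This is a minimal, hypothetical illustration rather than the authors' code: `translate_slice` stands in for any trained 2D MR-to-CT generator (Pix2Pix, CycleGAN, or SynDiff) and `segment_ct_volume` for a TotalSegmentator run; neither name comes from the paper or from those tools' APIs. The sketch assumes slices lie along axis 0 and that the translator preserves the slice shape, so the returned label map indexes the same voxels as the input MR volume. Any resizing or padding applied before translation would have to be inverted first.

```python
import numpy as np
from typing import Callable


def segment_mr_via_synthetic_ct(
    mr_volume: np.ndarray,
    translate_slice: Callable[[np.ndarray], np.ndarray],
    segment_ct_volume: Callable[[np.ndarray], np.ndarray],
) -> np.ndarray:
    """Segment an MR volume (slices along axis 0) by way of a synthetic CT volume.

    translate_slice and segment_ct_volume are hypothetical placeholders for a
    trained MR-to-CT generator and a CT segmentation tool, respectively.
    """
    # Step 1: translate every 2D MR slice into a synthetic CT slice.
    ct_slices = [translate_slice(mr_slice) for mr_slice in mr_volume]
    # Stack the synthetic slices into a 3D volume, keeping the original slice order.
    synthetic_ct = np.stack(ct_slices, axis=0)
    if synthetic_ct.shape != mr_volume.shape:
        raise ValueError("this sketch assumes the translator preserves slice shape")
    # Step 2: segment the synthetic CT volume into a voxel-wise label map.
    labels = segment_ct_volume(synthetic_ct)
    # The voxel grid is unchanged, so the labels apply directly to the MR volume.
    return labels.astype(np.uint8)
```

For a quick smoke test, toy callables such as `lambda s: 1.0 - s` (intensity inversion) and `lambda v: (v > 0.5).astype(np.uint8)` (thresholding) can stand in for the real models.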

3.1. MR-to-CT translation

Image-to-image translation converts an input image in the source domain (e.g., a T2 MR image) into an output image in the target domain (e.g., a synthetic CT image). The translation can be achieved using conventional generative adversarial networks (GANs) such as CycleGAN (Zhu et al. (2017)), Pix2Pix (Isola et al. (2017)), and their variants (Zhao et al. (2023); Kalantar et al. (2021); Wang et al. (2021); Yi et al. (2019)), or through recent diffusion models (Ho et al. (2020); Özbey et al. (2022)). We investigated three mainstream image-to-image translation algorithms for MR-to-CT translation: CycleGAN, Pix2Pix, and a recent diffusion-based image synthesis model called SynDiff (Özbey et al. (2022)).

Pix2Pix is a conditional GAN model for image-to-image translation that consists of a U-Net generator (Ronneberger et al. (2015)), a PatchGAN-based discriminator (Li and Wand (2016)), and a loss function that combines the traditional adversarial loss with an L1 loss, which acts as a regularizer to reduce image blurring. One key limitation of Pix2Pix is that it requires paired image data for training. To address this issue, CycleGAN was proposed to perform image-to-image translation with unpaired data. CycleGAN has two GANs, each consisting of a generator and a discriminator. Given two unpaired images from different domains, one GAN uses the image from the first domain to synthesize an image in the second domain, while the other performs the reverse translation, from the second domain back into the first. CycleGAN is trained with a combination of the traditional adversarial loss and a cycle consistency loss, which ensures that the mapping from one domain to the other and its inverse are meaningful.
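The generator-side objectives described above can be written out numerically. The sketch below is an illustration in NumPy, not the training code used in this study (which was written in PyTorch and is not shown). It assumes the discriminator outputs probabilities, uses λ = 100 for the Pix2Pix L1 weight (the value reported in the original Pix2Pix paper; the present study does not state its weight), and shows only the cycle-consistency term of CycleGAN, with hypothetical generators `G` (MR to CT) and `F` (CT to MR).

```python
import numpy as np

EPS = 1e-7  # keeps log() finite when probabilities hit 0 or 1


def bce(prob: np.ndarray, target: float) -> float:
    """Binary cross-entropy between discriminator probabilities and a constant label."""
    p = np.clip(prob, EPS, 1.0 - EPS)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))


def pix2pix_generator_loss(d_fake_prob, fake_ct, real_ct, lam=100.0):
    """Adversarial term (make the PatchGAN output 'real') plus lam-weighted L1 to the paired CT."""
    adversarial = bce(d_fake_prob, 1.0)
    l1 = float(np.mean(np.abs(fake_ct - real_ct)))
    return adversarial + lam * l1


def cycle_consistency_loss(G, F, mr, ct):
    """L1 cycle loss: MR -> CT -> MR and CT -> MR -> CT should reconstruct their inputs."""
    forward = np.mean(np.abs(F(G(mr)) - mr))
    backward = np.mean(np.abs(G(F(ct)) - ct))
    return float(forward + backward)
```

The L1 term is what requires paired data: it compares the synthetic CT with the co-registered real CT pixel by pixel. The cycle term only compares each image with its own reconstruction, which is why CycleGAN can be trained on unpaired slices.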

More recently, diffusion-based generative models (Ho et al. (2020)) have attracted considerable attention because of their ability to produce realistic, state-of-the-art image synthesis results (Dhariwal and Nichol (2021)). SynDiff (Özbey et al. (2022)) learns a mapping that translates images from the MRI domain to the target CT domain through a denoising diffusion model, combining a generative adversarial network with the diffusion model to perform MR-to-CT translation. During the forward diffusion process, it uses large step sizes, so correspondingly greater amounts of noise are added to the target image (CT) at each step. Because such large steps break the normality assumption of the reverse denoising process, SynDiff employs an adversarial diffusion projector to accurately map between the source and target images. Conditioned on the source image (T2 MRI), a diffusive generator synthesizes a denoised image sample at each time step, and a diffusive discriminator distinguishes between the synthetic denoised sample and the actual denoised sample at that time step. In the experiments section, we investigated different image synthesis methods to show that the proposed pipeline is generic and robust to the choice of image synthesis method. We refer interested readers to the respective works for additional details (Zhu et al. (2017); Isola et al. (2017); Özbey et al. (2022)).
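Why large diffusion steps inject more noise per transition can be seen from the standard DDPM forward process (Ho et al. (2020)), where q(x_t | x_0) is Gaussian with mean sqrt(ᾱ_t) x_0 and variance 1 − ᾱ_t, and a jump from step s to step t adds noise with variance 1 − ᾱ_t / ᾱ_s. The sketch assumes a linear β schedule from 1e-4 to 0.02 over T = 1000 steps and a coarse jump of 250 fine steps. These values are illustrative assumptions, not SynDiff's exact configuration, and the sketch covers only the forward noising, not SynDiff's adversarial projector.

```python
import numpy as np


def linear_betas(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Linear noise schedule beta_1..beta_T (illustrative DDPM defaults)."""
    return np.linspace(beta_start, beta_end, T)


def alpha_bar(betas: np.ndarray) -> np.ndarray:
    """Cumulative signal retention: alpha_bar[t] = prod_{s<=t} (1 - beta_s)."""
    return np.cumprod(1.0 - betas)


def q_sample(x0: np.ndarray, t: int, abar: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I) in closed form."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * noise


def step_noise_variance(abar: np.ndarray, t_from: int, t_to: int) -> float:
    """Variance of the Gaussian noise added by one transition from t_from to t_to."""
    return float(1.0 - abar[t_to] / abar[t_from])


abar = alpha_bar(linear_betas())
fine = step_noise_variance(abar, 0, 1)      # one ordinary step
coarse = step_noise_variance(abar, 0, 250)  # one large step spanning 250 ordinary steps
```

With these settings the single coarse transition adds far more noise than an ordinary step. The corresponding reverse step is then poorly approximated by a Gaussian, which is the motivation the text gives for SynDiff's adversarial projector.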

3.2. Segmentation of Synthesized CT

Each slice in the T2 MRI volume was translated into CT, and the slices were then stacked into a 3D volume. TS was run on this CT volume to obtain segmentation masks for 104 structures, including 27 organs, 59 bones, 10 muscles, and 8 vessels. The underlying model in TS is an nnU-Net (Isensee et al. (2021)) trained on a diverse dataset of 1204 CT examinations spanning a variety of pathologies, scanners, populations, sequences, and imaging sites, among other factors. For this pilot study, we were interested in 8 important anatomical structures of the pelvis: hip, sacrum, femur, urinary bladder, gluteal muscles (minimus, maximus, medius), and iliopsoas. These structures are present in all humans regardless of gender. Therefore, we extracted only the segmentation masks of these 8 pelvic structures as the output of the proposed pipeline.
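Extracting the 8 targets from the 104 TS classes amounts to merging left/right classes and discarding the rest. The sketch below is hypothetical: it assumes the TS output has already been loaded as a dictionary of per-class binary arrays keyed by class name, and the names (e.g., `hip_left`, `gluteus_medius_right`) are assumed to follow TotalSegmentator's per-class file naming; they should be checked against the installed version. Labels 1 to 8 follow the order of the Fig. 1 caption, which is also an assumption.

```python
import numpy as np

# Assumed mapping from the 8 target structures to TotalSegmentator-style class names.
# Dict order (Python 3.7+) defines the output labels 1..8.
TARGETS = {
    "hip": ["hip_left", "hip_right"],
    "sacrum": ["sacrum"],
    "femur": ["femur_left", "femur_right"],
    "urinary_bladder": ["urinary_bladder"],
    "gluteus_maximus": ["gluteus_maximus_left", "gluteus_maximus_right"],
    "gluteus_medius": ["gluteus_medius_left", "gluteus_medius_right"],
    "gluteus_minimus": ["gluteus_minimus_left", "gluteus_minimus_right"],
    "iliopsoas": ["iliopsoas_left", "iliopsoas_right"],
}


def merge_pelvic_masks(class_masks: dict) -> np.ndarray:
    """Combine per-class binary masks into one label map with labels 1..8 (0 = background).

    Classes outside TARGETS are ignored; missing classes simply contribute nothing.
    Where masks overlap, the structure listed later in TARGETS wins.
    """
    shape = next(iter(class_masks.values())).shape
    label_map = np.zeros(shape, dtype=np.uint8)
    for label, parts in enumerate(TARGETS.values(), start=1):
        merged = np.zeros(shape, dtype=bool)
        for part in parts:
            if part in class_masks:
                merged |= class_masks[part].astype(bool)
        label_map[merged] = label
    return label_map
```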

4. Experiments

4.1. Dataset

The Gold Atlas Male Pelvis dataset (Nyholm et al. (2018)) was used to train the image-to-image translation models. It consists of co-registered T2 MRI and CT volumes of the pelvis for 19 patients, along with the corresponding segmentations of 8 structures. The CT data were collected from 3 medical centers using 3 different scanners. Detailed image acquisition information for the dataset is summarized in Table 1. The 8 structures available as segmentation labels were: urinary bladder, rectum, anal canal, penile bulb, neurovascular bundles, femoral heads, prostate, and seminal vesicles. Apart from the bladder, these structures were either male-specific (e.g., penile bulb, prostate, seminal vesicles) or did not match our targets; the femoral heads, for instance, cover only part of the femur. We targeted 8 gender-agnostic pelvic structures: hip, sacrum, femur, urinary bladder, gluteal muscles (maximus, medius, minimus), and iliopsoas. As segmentation masks were unavailable for most of these pelvic structures, we used the T2 MRI and CT volumes in this dataset solely to train our MR-to-CT translator.

Table 1:

Datasets used in this study

Dataset | Modality | Scanner | Voxel resolution x (mm) | Voxel resolution y (mm) | Voxel resolution z (mm) | Number of subjects
The Gold Atlas Male Pelvis | CT | Siemens Somatom, Toshiba Aquilion, Siemens Emotion | 0.9–1.0 | 0.9–1.0 | 2.0–3.0 | 19
The Gold Atlas Male Pelvis | MR | GE Discovery with FRFSE, Siemens with TSE, GE Signa with FRFSE | 0.9–1.0 | 0.9–1.0 | 2.5 | 19
TCGA-UCEC | MR | Siemens scanners with TSE | 0.8–1.1 | 0.8–1.1 | 5.2 | 30
In-house | MR | Philips and Siemens scanners with TSE | 0.3–1.6 | 0.3–1.6 | 3.0–7.2 | 6

For testing, we used two distinct datasets: an external publicly available TCGA-UCEC dataset (Erickson et al. (2016)) and an internal dataset from our institution. The usage of the internal dataset was approved by the institutional IRB and the need for signed consent was waived. The TCGA-UCEC dataset consisted of 30 female subjects with endometrial carcinoma. The internal dataset contained 6 male subjects. Image acquisition information for these two testing datasets is given in Table 1. The pelvic structures of interest in these two testing datasets were manually delineated by a postdoctoral trainee and reviewed by a senior board-certified radiologist.

4.2. Experimental Setup

To evaluate the proposed pipeline, we experimented with 4 different MR-to-CT image synthesis methods: CycleGAN, Pix2Pix, SynDiff paired (SynDiff-P), and SynDiff unpaired (SynDiff-U). In the Pix2Pix and SynDiff-P experiments, the training images were paired, while in the CycleGAN and SynDiff-U experiments, the training images were shuffled and unpaired. We tested SynDiff in the paired setup (i.e., SynDiff-P) to examine the effect of paired data on model performance, as paired data may allow SynDiff to learn the mapping function more accurately. To evaluate segmentation performance, we used the Dice similarity coefficient (DSC), relative absolute volume difference (RAVD), and average symmetric surface distance (ASSD) metrics.
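The three metrics can be computed from binary masks as follows. This is a sketch using common definitions, not the evaluation code of this study: the paper does not give its exact formulas, so RAVD is assumed to be |V_pred − V_gt| / V_gt in percent, and ASSD is taken as the mean distance between surface voxels of the two masks in both directions, with surfaces extracted by one binary erosion. `spacing` (voxel size in mm) is an assumed parameter used to report ASSD in millimetres.

```python
import numpy as np
from scipy import ndimage


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|); 1.0 if both masks are empty."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(pred, gt).sum() / denom)


def ravd(pred: np.ndarray, gt: np.ndarray) -> float:
    """Relative absolute volume difference in percent of the ground-truth volume (gt must be non-empty)."""
    v_pred, v_gt = float(pred.astype(bool).sum()), float(gt.astype(bool).sum())
    return 100.0 * abs(v_pred - v_gt) / v_gt


def _surface(mask: np.ndarray) -> np.ndarray:
    # Surface voxels: foreground voxels removed by a single erosion step.
    return mask & ~ndimage.binary_erosion(mask)


def assd(pred: np.ndarray, gt: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    """Average symmetric surface distance (same unit as spacing) between two non-empty masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    s_pred, s_gt = _surface(pred), _surface(gt)
    # Distance from every voxel to the nearest surface voxel of the other mask.
    dt_to_gt = ndimage.distance_transform_edt(~s_gt, sampling=spacing)
    dt_to_pred = ndimage.distance_transform_edt(~s_pred, sampling=spacing)
    distances = np.concatenate([dt_to_gt[s_pred], dt_to_pred[s_gt]])
    return float(distances.mean())
```

RAVD normalizes by the ground-truth volume, so an over-segmented small structure can exceed 100%, which helps explain the very large bladder RAVD values in Table 4.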

4.3. Implementation

In the training phase, each 2D slice of the 3D T2 MRI volume was extracted and resized to 256×256 pixels. The CT images were pre-processed by applying a windowing operation with a level of 40 HU and a width of 400 HU. Each 2D CT slice was normalized to the range [0, 1] and resized to 256×256 pixels. In the testing phase, each 2D MR slice was extracted, resized to 180×180 pixels, and padded to 256×256 pixels. In both training and testing, the 2D MR images were normalized to the range [0, 1]. The image synthesis models then translated each 2D MR slice into a 2D CT slice, and the synthetic 2D CT slices were stacked into a 3D volume. TotalSegmentator was executed on the synthetic 3D CT volume to generate the segmentation masks for the 8 pelvic structures. At test time, the latest checkpoint was used for inference. All comparative methods (CycleGAN, Pix2Pix, and SynDiff) were implemented in PyTorch. The learning rates for CycleGAN, Pix2Pix, and SynDiff were 5e−4, 2e−4, and 1e−4, respectively. The Adam optimizer was used to train all models. The models were trained for 300 epochs on a 40 GB A100 GPU in an NVIDIA DGX station.
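The pre-processing steps above, and their inverse for bringing masks back onto the MR grid, can be sketched as follows. Several details are assumptions not stated in the paper: min-max scaling for MR normalization, nearest-neighbour resampling (chosen here only to keep the sketch dependency-free), and centred zero-padding from 180×180 to 256×256. The function names are hypothetical.

```python
import numpy as np


def window_ct(hu: np.ndarray, level: float = 40.0, width: float = 400.0) -> np.ndarray:
    """Clip CT values (HU) to [level - width/2, level + width/2] and scale to [0, 1]."""
    lo, hi = level - width / 2.0, level + width / 2.0
    return (np.clip(hu, lo, hi) - lo) / (hi - lo)


def minmax_normalize(img: np.ndarray) -> np.ndarray:
    """Scale an MR slice to [0, 1] (assumed min-max normalization)."""
    img = img.astype(np.float64)
    span = img.max() - img.min()
    return np.zeros_like(img) if span == 0 else (img - img.min()) / span


def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of a 2D array (interpolation choice is an assumption)."""
    rows = (np.arange(out_h) * img.shape[0] / out_h).astype(int)
    cols = (np.arange(out_w) * img.shape[1] / out_w).astype(int)
    return img[rows[:, None], cols[None, :]]


def mr_to_model_input(mr_slice: np.ndarray, inner: int = 180, canvas: int = 256) -> np.ndarray:
    """Test-time input: normalize, resize to inner x inner, zero-pad centrally to canvas x canvas."""
    small = resize_nearest(minmax_normalize(mr_slice), inner, inner)
    pad = (canvas - inner) // 2
    out = np.zeros((canvas, canvas))
    out[pad:pad + inner, pad:pad + inner] = small
    return out


def mask_to_mr_grid(mask: np.ndarray, orig_shape, inner: int = 180, canvas: int = 256) -> np.ndarray:
    """Undo the padding and resizing so a predicted 2D mask aligns with the original MR slice."""
    pad = (canvas - inner) // 2
    cropped = mask[pad:pad + inner, pad:pad + inner]
    return resize_nearest(cropped, *orig_shape)
```

Nearest-neighbour resampling is the appropriate choice for masks, since it never creates fractional labels; for the MR intensities themselves, a smoother interpolation would likely be used in practice.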

5. Results

Figs. 2 and 3 show examples of MR-to-CT synthesis results from the Pix2Pix, CycleGAN, SynDiff-P, and SynDiff-U models for the TCGA-UCEC dataset and the internal dataset. Qualitatively, CycleGAN performed worst among the methods, especially in translating bony structures in both datasets. The other methods, including the unpaired SynDiff-U, achieved better synthesis results. In addition, in Fig. 3, the bladder was poorly synthesized by CycleGAN, while SynDiff-U and SynDiff-P translated the bladder correctly.

Figure 2:


Translation of T2 MRI images to CT images for a subject in the external TCGA-UCEC dataset. Column 1 shows the original T2 MRI images. Columns 2–5 show translation results from Pix2Pix, CycleGAN, SynDiff paired (SynDiff-P), and SynDiff unpaired (SynDiff-U), respectively.

Figure 3:


Translation of T2 MRI images to CT images for a subject in the internal NIH dataset. Column 1 shows the original T2 MR images. Columns 2–5 show translation results from Pix2Pix, CycleGAN, SynDiff paired (SynDiff-P), and SynDiff unpaired (SynDiff-U), respectively.

Table 2 shows the segmentation performance of the proposed pipeline using different MR-to-CT image-to-image translation approaches. For the external TCGA-UCEC dataset, Pix2Pix, CycleGAN, SynDiff-U, and SynDiff-P achieved average Dice scores of 70.9 ± 8.8%, 62.0 ± 11.7%, 68.0 ± 9.2%, and 68.7 ± 8.7%, respectively, across all 8 pelvic structures. For the internal dataset, Pix2Pix, CycleGAN, SynDiff-U, and SynDiff-P achieved average Dice scores of 67.1 ± 12.2%, 43.4 ± 23.1%, 73.7 ± 8.0%, and 74.3 ± 7.0%, respectively, across all 8 structures. Pix2Pix outperformed the other approaches on the TCGA-UCEC dataset, while SynDiff performed best on the internal dataset for all structures except the femur. Because SynDiff and Pix2Pix were trained on the male pelvis dataset, they performed better on the male-only internal dataset than on the female-only TCGA-UCEC dataset. The ASSD and RAVD results in Tables 3 and 4 confirm these observations.

Table 2:

Mean ± standard deviation of Dice score (%) for CycleGAN, Pix2Pix, SynDiff paired (SynDiff-P), and SynDiff unpaired (SynDiff-U) on the external TCGA-UCEC dataset and the internal dataset. The highest scores for each dataset are highlighted in bold font.

Structure TCGA-UCEC Internal
CycleGAN Pix2Pix SynDiff-P SynDiff-U CycleGAN Pix2Pix SynDiff-P SynDiff-U

Hip 67.3 ± 7.9 73.6 ± 3.6 72.2 ± 5.1 72.3 ± 5.3 36.5 ± 21.7 75.7 ± 6.1 77.7 ± 2.1 77.7 ± 1.9
Sacrum 57.2 ± 11.4 66.5 ± 8.6 63.4 ± 7.8 63.2 ± 8.9 30.9 ± 18.7 72.8 ± 7.3 75.2 ± 4.1 76.1 ± 3.7
Femur 75.9 ± 5.8 80.6 ± 3.6 80.1 ± 3.0 80.0 ± 4.1 64.5 ± 29.1 78.6 ± 8.1 77.9 ± 10.4 75.2 ± 17.9
Urinary bladder 34.9 ± 24.1 41.7 ± 18.5 34.8 ± 18.9 33.2 ± 19.8 33.6 ± 21.8 39.6 ± 25.0 66.2 ± 14.8 65.0 ± 15.1
Gluteus maximus 77.6 ± 4.8 89.2 ± 5.0 85.9 ± 4.0 85.4 ± 4.3 61.7 ± 27.8 84.6 ± 4.2 86.3 ± 1.2 86.0 ± 1.9
Gluteus medius 63.0 ± 13.8 77.7 ± 6.9 75.6 ± 8.6 73.8 ± 8.6 54.7 ± 19.7 66.3 ± 22.0 77.6 ± 2.4 76.9 ± 3.2
Gluteus minimus 51.6 ± 12.8 65.1 ± 9.5 64.4 ± 8.0 63.1 ± 7.8 25.2 ± 21.3 55.8 ± 13.6 65.4 ± 3.6 64.9 ± 4.5
Iliopsoas 72.0 ± 11.8 78.4 ± 11.8 76.4 ± 11.8 76.6 ± 12.6 46.0 ± 23.7 70.7 ± 9.1 79.2 ± 4.0 78.8 ± 3.7

Table 3:

Mean ± standard deviation of average symmetric surface distance (ASSD) (mm) for CycleGAN, Pix2Pix, SynDiff paired (SynDiff-P), and SynDiff unpaired (SynDiff-U) on the TCGA-UCEC dataset and the internal dataset. Lower ASSD scores are better, and the lowest scores for each dataset are highlighted in bold font.

Structure TCGA-UCEC Internal
CycleGAN Pix2Pix SynDiff-P SynDiff-U CycleGAN Pix2Pix SynDiff-P SynDiff-U

Hip 3.3 ± 2.1 2.0 ± 0.6 2.2 ± 0.7 2.2 ± 0.7 19.5 ± 23.7 2.0 ± 0.2 2.1 ± 0.2 2.1 ± 0.2
Sacrum 4.2 ± 2.1 2.2 ± 1.0 2.6 ± 0.9 2.8 ± 1.1 17.5 ± 21.3 1.8 ± 0.6 1.5 ± 0.3 1.5 ± 0.2
Femur 3.9 ± 2.6 2.1 ± 0.7 2.1 ± 0.4 2.1 ± 0.6 29.2 ± 57.9 2.0 ± 0.4 2.0 ± 0.4 2.2 ± 0.8
Urinary bladder 11.6 ± 7.1 7.1 ± 3.3 9.5 ± 5.4 9.2 ± 4.6 8.0 ± 3.0 10.0 ± 5.5 5.0 ± 2.2 4.8 ± 1.9
Gluteus maximus 3.7 ± 0.8 1.7 ± 1.1 2.2 ± 0.6 2.4 ± 0.7 4.8 ± 0.7 2.7 ± 0.7 2.5 ± 0.4 2.6 ± 0.5
Gluteus medius 6.3 ± 6.4 3.0 ± 1.2 3.4 ± 1.7 3.8 ± 1.8 9.2 ± 6.6 4.9 ± 2.7 3.4 ± 0.4 3.4 ± 0.4
Gluteus minimus 4.5 ± 3.0 2.9 ± 1.4 3.0 ± 1.3 3.1 ± 1.2 27.9 ± 35.6 4.8 ± 1.6 3.5 ± 0.6 3.5 ± 0.7
Iliopsoas 3.2 ± 3.2 2.1 ± 1.9 2.4 ± 2.3 2.7 ± 3.3 12.0 ± 13.8 3.3 ± 1.0 2.3 ± 0.3 2.4 ± 0.3

Table 4:

Mean ± standard deviation of relative absolute volume difference (RAVD) (%) for CycleGAN, Pix2Pix, SynDiff paired (SynDiff-P), and SynDiff unpaired (SynDiff-U) on the TCGA-UCEC dataset and the internal dataset. Lower RAVD scores are better, and the lowest scores for each dataset are highlighted in bold font.

Structure TCGA-UCEC Internal
CycleGAN Pix2Pix SynDiff-P SynDiff-U CycleGAN Pix2Pix SynDiff-P SynDiff-U

Hip 46.1 ± 18.6 43.9 ± 16.1 50.1 ± 17.2 47.4 ± 17.0 60.6 ± 24.4 24.1 ± 11.7 36.0 ± 8.7 36.9 ± 10.0
Sacrum 22.8 ± 19.7 14.3 ± 8.9 17.1 ± 10.7 17.7 ± 16.2 68.6 ± 22.1 15.2 ± 10.2 5.8 ± 5.2 8.3 ± 5.4
Femur 18.5 ± 15.5 15.3 ± 9.3 17.0 ± 10.5 16.0 ± 7.6 26.3 ± 10.3 14.2 ± 8.3 18.8 ± 12.0 24.4 ± 22.6
Urinary bladder 448.5 ± 704.7 216.9 ± 285.7 334.5 ± 472.0 309.0 ± 463.3 50.7 ± 28.5 46.8 ± 28.0 84.0 ± 94.9 63.6 ± 79.3
Gluteus maximus 18.5 ± 8.3 11.8 ± 6.8 14.5 ± 6.5 16.6 ± 7.5 43.5 ± 26.6 20.3 ± 9.3 18.9 ± 5.2 19.5 ± 5.2
Gluteus medius 33.9 ± 17.7 18.3 ± 10.7 23.4 ± 14.1 27.6 ± 13.3 45.1 ± 26.8 29.6 ± 25.8 13.4 ± 7.9 19.2 ± 6.2
Gluteus minimus 32.4 ± 16.4 24.8 ± 23.3 25.8 ± 14.7 25.9 ± 11.2 73.4 ± 20.7 40.6 ± 18.5 26.8 ± 8.7 27.6 ± 10.4
Iliopsoas 20.6 ± 13.1 13.8 ± 8.4 14.7 ± 11.7 16.6 ± 11.7 61.7 ± 22.9 36.0 ± 10.5 21.7 ± 8.4 22.3 ± 8.6

CycleGAN showed inferior segmentation performance on both datasets, indicating that its MR-to-CT translation in the first step was poor. In particular, CycleGAN failed to synthesize the corresponding CT volume for one subject in the internal dataset, and consequently TS failed to generate pseudo-segmentation labels for that subject, which led to a large standard deviation in the Dice scores. SynDiff-P showed a smaller spread in segmentation performance than Pix2Pix on the internal dataset. However, the spread was comparable between the two methods on the external TCGA-UCEC dataset, with a ≤4% difference in Dice scores. Figs. 4 and 5 show example segmentation results for a female subject from the TCGA-UCEC dataset and a male subject from the internal dataset, respectively. The box plots in Fig. 6 show the distribution of per-subject Dice scores for the internal and TCGA-UCEC datasets: the bony and muscle structures have a small spread compared with the bladder, consistent with the results reported in Table 2.

Figure 4:


Segmentation of 8 pelvic structures for a female subject from the external TCGA-UCEC dataset. Three representative levels through the pelvis are shown (rows). Hip (red), sacrum (light green), femur (blue), urinary bladder (yellow), gluteus maximus (cyan), gluteus medius (purple), gluteus minimus (brown), and iliopsoas (dark green) segmentations are seen.

Figure 5:


Segmentation results for a male subject from the internal dataset. Hip (red), sacrum (light green), femur (blue), urinary bladder (yellow), gluteus maximus (cyan), gluteus medius (purple), gluteus minimus (brown), and iliopsoas (dark green) segmentations are seen.

Figure 6:


Box plots of Dice scores showing the segmentation performance of all 4 models for various structures in the pelvis. Here, HIP: hip, SACRM: sacrum, FEMUR: femur, UBLDR: urinary bladder, GLMAX: gluteus maximus, GLMED: gluteus medius, GLMIN: gluteus minimus, ISOAS: iliopsoas.

6. Discussion

In this study, we proposed a lightweight and annotation-free pipeline that synthetically translates T2 MRI volumes into CT and then leverages TotalSegmentator to segment 8 pelvic structures in the synthetic CT volume in a gender-agnostic manner. The translators were trained on a male-specific pelvis dataset, but the pipeline was tested on two distinct datasets: an internal dataset (males only) and the external TCGA-UCEC public dataset (females only). Both test datasets were out-of-distribution relative to the training data, owing to differences in gender (TCGA-UCEC), underlying diseases, populations, scanners, and exam acquisition protocols. Our experiments demonstrated that SynDiff and Pix2Pix were more resilient to this domain shift between training and testing datasets, achieving consistent translation performance for most pelvic structures (7/8), the exception being the urinary bladder. Overall, the proposed framework achieved a median DSC of 76.7% across all 8 structures.

It is worth noting that image-to-image translation quality affects segmentation performance. We therefore evaluated the proposed pipeline with multiple mainstream image synthesis methods (CycleGAN, Pix2Pix, and the recent diffusion-based approach SynDiff) on two independent testing datasets to demonstrate its generalization ability and robustness. Because neither testing dataset has paired (and co-registered) CT data, directly assessing the quality of the synthetic CT images was difficult. However, the segmentation task in our study can serve as a proxy for the learned translation correspondence between T2 MRI and CT volumes: higher Dice scores between the ground truth annotations and the segmentation results indicate better MR-to-CT translation and greater semantic correspondence between the two. Table 2 shows that Pix2Pix and SynDiff outperformed CycleGAN by a large margin on both testing datasets, indicating that Pix2Pix and SynDiff were more robust to out-of-distribution testing data. CycleGAN had difficulty translating MR to CT; for example, as shown in Figs. 2 and 3, it was prone to synthesizing soft tissue as bony structures. This problematic translation in turn led to poor segmentation by TS. Furthermore, the underlying differences between training and testing data (e.g., anatomical differences between genders, different populations with disparate underlying diseases, scanners, and exam acquisition protocols) posed extra challenges for these image-to-image translation methods.
Nevertheless, the DSC scores achieved by Pix2Pix and SynDiff show that both models were capable of learning most gender-agnostic structures (e.g., bones and muscles) despite differences in anatomy, health condition, and data acquisition, with the exception of the bladder. Bladder segmentation was relatively poor, particularly on the TCGA-UCEC dataset, for multiple reasons. First, the location of the bladder is more affected by gender than that of the other structures. The MR-to-CT models were trained on the Gold Atlas Male Pelvis dataset (male patients only), while the TCGA-UCEC dataset consisted entirely of female patients with endometrial cancer. Anatomical differences between the sexes detrimentally affected MR-to-CT translation for all the generative models (as seen in the bladder Dice scores in Table 2). Second, the subjects in the TCGA-UCEC dataset had endometrial cancer, and enlargement of the endometrium or uterus can displace and distort the bladder. Adding female patients to the training set of the MR-to-CT model would likely remedy this problem, but such datasets are currently not publicly available; our future work will include more female patients. In addition, unlike SynDiff and CycleGAN, which can be trained on unpaired data, Pix2Pix requires a paired dataset for effective training, which could limit its applicability. SynDiff may therefore be preferable when only unpaired training data are available.

As mentioned previously, much current research has focused on CT-based multi-structure segmentation tasks, such as abdominal multi-organ segmentation (Ji et al. (2022, 2023)). However, owing to the limited number of datasets and available annotations, the corresponding segmentation tasks on MRI remain underexplored. Taking our case as an example, segmentation of multiple pelvic structures on CT has been explored in depth (Wasserthal et al. (2022); Ji et al. (2023)), whereas, to the best of our knowledge, no MRI counterpart exists. The annotation-free pipeline proposed in this work offers a way around this gap by exploiting the progress made in CT-based segmentation. Our pilot study empirically demonstrated that knowledge encoded in pixel-level CT annotations can be effectively transferred to the MR domain using image-to-image translation and an existing CT-only segmentation tool. In addition, we showed that the pipeline is not tied to a specific MR-to-CT synthesis method: multiple mainstream image-to-image translation models, such as Pix2Pix, CycleGAN, and SynDiff, can be used within it. In principle, the pipeline could be extended to other body regions, such as the abdomen and chest, and to other MRI sequences, such as T1 and FLAIR. Our method may reduce the burden of creating large-scale annotated MRI datasets and facilitate the development of AI tools for multi-organ segmentation on MRI.
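The segmentation-by-translation idea described above can be sketched as three steps: translate the MR volume to synthetic CT, segment the synthetic CT with a CT-only tool, and return the masks on the MR grid. The sketch below assumes a 2D, slice-wise generator that preserves the MR voxel grid, so the masks map back one-to-one without resampling. Both callables are hypothetical stand-ins: in the actual pipeline the translator would be a trained Pix2Pix, CycleGAN, or SynDiff generator and the segmenter would be TotalSegmentator, whose real interface is not reproduced here.

```python
from typing import Callable

import numpy as np

# Hypothetical signatures: a 2D MR->CT generator and a CT-only segmenter.
SliceTranslator = Callable[[np.ndarray], np.ndarray]
CTSegmenter = Callable[[np.ndarray], np.ndarray]


def segment_mr_via_synthetic_ct(
    mr_volume: np.ndarray,
    translate_slice: SliceTranslator,
    ct_segmenter: CTSegmenter,
) -> np.ndarray:
    """Segmentation-by-translation sketch (hypothetical helper).

    mr_volume has shape (slices, H, W). Each slice is translated to a
    synthetic CT slice, the stacked volume is segmented by a CT-only tool,
    and the integer label map is returned on the MR voxel grid.
    """
    mr_volume = np.asarray(mr_volume, dtype=np.float32)
    # 1) MR -> synthetic CT, slice by slice (assumes a 2D generator).
    synthetic_ct = np.stack([translate_slice(s) for s in mr_volume], axis=0)
    if synthetic_ct.shape != mr_volume.shape:
        raise ValueError("translator must preserve the MR slice grid")
    # 2) Segment the synthetic CT with a CT-only segmenter; no MRI labels used.
    labels = np.asarray(ct_segmenter(synthetic_ct))
    if labels.shape != mr_volume.shape:
        raise ValueError("segmenter must return one label per voxel")
    # 3) Same voxel grid, so the masks map back to the MR volume one-to-one.
    #    When writing to disk, reuse the MR header/affine so the masks overlay it.
    return labels.astype(np.int16)


if __name__ == "__main__":
    mr = np.zeros((2, 3, 3))
    mr[0, 1, 1] = 1.0
    # Toy stand-ins: a linear intensity map to an HU-like range and a
    # threshold "bone" segmenter that emits label 1.
    toy_translate = lambda s: s * 2000.0 - 1000.0
    toy_segment = lambda ct: (ct > 500.0).astype(np.int64)
    print(segment_mr_via_synthetic_ct(mr, toy_translate, toy_segment).sum())
```

Injecting the translator and segmenter as callables keeps the sketch agnostic to the synthesis method, mirroring the observation that several translation models can be swapped in; the grid check makes the one-to-one mapping assumption explicit rather than silently resampling.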

Highlights.

  • A segmentation-by-translation pipeline to segment 8 pelvic structures in T2 MRI

  • The segmentation process requires no manual MRI annotation

  • The proposed pipeline was evaluated with state-of-the-art image synthesis methods

  • A comprehensive evaluation of the segmentation-by-translation pipeline was conducted

Acknowledgments

This work was supported by the Intramural Research Program of the National Institutes of Health (NIH) Clinical Center (project number 1Z01 CL040004).

Footnotes

Declaration of Competing Interest

The authors declare the following financial interests or personal relationships which may be considered as potential competing interests: RMS receives royalties from iCAD, Philips, ScanMed, PingAn, and Translation Holdings and has received research support from Ping An (CRADA). Authors YZ, TSM, and PM declare that they have no known competing financial interests or personal relationships.

Declaration of interests

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests:

Ronald M. Summers reports a relationship with Ping An (CRADA) that includes: funding grants.

Co-author RMS receives royalties from iCAD, Philips, ScanMed, PingAn, and Translation Holdings.

CRediT authorship contribution statement

Yan Zhuang: Conceptualization, Data curation, Methodology, Software, Investigation, Writing – original draft preparation. Tejas Sudharshan Mathai: Conceptualization, Writing – review and editing. Pritam Mukherjee: Conceptualization, Writing – review and editing. Ronald M. Summers: Conceptualization, Clinical advice, Project administration, Writing – review and editing, Supervision.


References

  1. Andermatt S, Horváth A, Pezold S, Cattin P, 2019. Pathology segmentation using distributional differences to images of healthy origin, in: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 4th International Workshop, BrainLes 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Revised Selected Papers, Part I, Springer. pp. 228–238.
  2. Balagopal A, Kazemifar S, Nguyen D, Lin MH, Hannan R, Owrangi A, Jiang S, 2018. Fully automated organ segmentation in male pelvic CT images. Physics in Medicine & Biology 63, 245015.
  3. Boutin RD, Yao L, Canter RJ, Lenchik L, 2015. Sarcopenia: Current concepts and imaging implications. American Journal of Roentgenology 205, W255–W266. doi: 10.2214/AJR.15.14635. PMID: 26102307.
  4. Dhariwal P, Nichol A, 2021. Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems 34, 8780–8794.
  5. Dirix P, Haustermans K, Vandecaveye V, 2014. The value of magnetic resonance imaging for radiotherapy planning. Seminars in Radiation Oncology 24, 151–159. doi: 10.1016/j.semradonc.2014.02.003. Magnetic Resonance Imaging in Radiation Oncology.
  6. Doyle SL, Donohoe CL, Lysaght J, Reynolds JV, 2012. Visceral obesity, metabolic syndrome, insulin resistance and cancer. Proceedings of the Nutrition Society 71, 181–189. doi: 10.1017/S002966511100320X.
  7. Erickson B, Mutch D, Lippmann L, Jarosz R, 2016. The Cancer Genome Atlas Uterine Corpus Endometrial Carcinoma Collection (TCGA-UCEC). The Cancer Imaging Archive.
  8. Greenspan H, Van Ginneken B, Summers RM, 2016. Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique. IEEE Transactions on Medical Imaging 35, 1153–1159.
  9. Ho J, Jain A, Abbeel P, 2020. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 33, 6840–6851.
  10. Huang R, Zheng Y, Hu Z, Zhang S, Li H, 2020. Multi-organ segmentation via co-training weight-averaged models from few-organ datasets, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer. pp. 146–155.
  11. Huber FA, Del Grande F, Rizzo S, Guglielmi G, Guggenberger R, 2020. MRI in the assessment of adipose tissues and muscle composition: how to use it. Quantitative Imaging in Medicine and Surgery 10. URL: https://qims.amegroups.com/article/view/38300.
  12. Isensee F, Jaeger PF, Kohl SA, Petersen J, Maier-Hein KH, 2021. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods 18, 203–211.
  13. Isola P, Zhu JY, Zhou T, Efros AA, 2017. Image-to-image translation with conditional adversarial networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125–1134.
  14. Ji Y, Bai H, Yang J, Ge C, Zhu Y, Zhang R, Li Z, Zhang L, Ma W, Wan X, et al., 2022. AMOS: A large-scale abdominal multi-organ benchmark for versatile medical image segmentation. arXiv preprint arXiv:2206.08023.
  15. Ji Z, Guo D, Wang P, Yan K, Ge J, Ye X, Xu M, Zhou J, Lu L, Gao M, et al., 2023. Continual segment: Towards a single, unified and accessible continual segmentation model of 143 whole-body organs in CT scans. arXiv preprint arXiv:2302.00162.
  16. Jones KI, Doleman B, Scott S, Lund JN, Williams JP, 2015. Simple psoas cross-sectional area measurement is a quick and easy method to assess sarcopenia and predicts major surgical complications. Colorectal Disease 17, O20–O26. doi: 10.1111/codi.12805.
  17. Kalantar R, Messiou C, Winfield JM, Renn A, Latifoltojar A, Downey K, Sohaib A, Lalondrelle S, Koh DM, Blackledge MD, 2021. CT-based pelvic T1-weighted MR image synthesis using UNet, UNet++ and cycle-consistent generative adversarial network (Cycle-GAN). Frontiers in Oncology 11, 665807.
  18. Li C, Wand M, 2016. Precomputed real-time texture synthesis with Markovian generative adversarial networks, in: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part III, Springer. pp. 702–716.
  19. Li Y, Fu Y, Yang Q, Min Z, Yan W, Huisman H, Barratt D, Prisacariu VA, Hu Y, 2022a. Few-shot image segmentation for cross-institution male pelvic organs using registration-assisted prototypical learning, in: 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI), IEEE. pp. 1–5.
  20. Li Y, Fu Y, Yang Q, Min Z, Yan W, Huisman H, Barratt D, Prisacariu VA, Hu Y, 2022b. Few-shot image segmentation for cross-institution male pelvic organs using registration-assisted prototypical learning, in: 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI), IEEE. pp. 1–5.
  21. Liu P, Han H, Du Y, Zhu H, Li Y, Gu F, Xiao H, Li J, Zhao C, Xiao L, et al., 2021a. Deep learning to segment pelvic bones: large-scale CT datasets and baseline models. International Journal of Computer Assisted Radiology and Surgery 16, 749–756.
  22. Liu Q, Dou Q, Heng PA, 2020. Shape-aware meta-learning for generalizing prostate MRI segmentation to unseen domains, in: Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part II, Springer. pp. 475–485.
  23. Liu X, Han C, Cui Y, Xie T, Zhang X, Wang X, 2021b. Detection and segmentation of pelvic bones metastases in MRI images for patients with prostate cancer based on deep learning. Frontiers in Oncology 11, 773299.
  24. Lopez P, Newton RU, Taaffe D, 2022. Associations of fat and muscle mass with overall survival in men with prostate cancer: a systematic review with meta-analysis. Prostate Cancer and Prostatic Diseases 25, 615–626. doi: 10.1038/s41391-021-00442-0.
  25. Nyholm T, Svensson S, Andersson S, Jonsson J, Sohlin M, Gustafsson C, Kjellén E, Söderström K, Albertsson P, Blomqvist L, Zackrisson B, Olsson LE, Gunnlaugsson A, 2018. MR and CT data with multi-observer delineations of organs in the pelvic area—part of the Gold Atlas project. Medical Physics 45, 1295–1300. doi: 10.1002/mp.12748.
  26. Nüchtern J, et al., 2015. Significance of clinical examination, CT and MRI scan in the diagnosis of posterior pelvic ring fractures. Injury 46, 315–319.
  27. Otero-García M, Mesa-Álvarez A, Nikolic O, 2019. Role of MRI in staging and follow-up of endometrial and cervical cancer: pitfalls and mimickers. Insights into Imaging 10. doi: 10.1186/s13244-019-0696-8.
  28. Özbey M, Dar SU, Bedel HA, Dalmaz O, Öztürk Ş, Güngör A, Çukur T, 2022. Unsupervised medical image translation with adversarial diffusion models. arXiv preprint arXiv:2207.08208.
  29. Ronneberger O, Fischer P, Brox T, 2015. U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III, Springer. pp. 234–241.
  30. Vorontsov E, Molchanov P, Gazda M, Beckham C, Kautz J, Kadoury S, 2022. Towards annotation-efficient segmentation via image-to-image translation. Medical Image Analysis 82, 102624.
  31. Wang T, Lei Y, Fu Y, Wynne JF, Curran WJ, Liu T, Yang X, 2021. A review on medical imaging synthesis using deep learning and its clinical applications. Journal of Applied Clinical Medical Physics 22, 11–36.
  32. Wasserthal J, Meyer M, Breit HC, Cyriac J, Yang S, Segeroth M, 2022. TotalSegmentator: robust segmentation of 104 anatomical structures in CT images. arXiv preprint arXiv:2208.05868.
  33. Yi X, Walia E, Babyn P, 2019. Generative adversarial network in medical imaging: A review. Medical Image Analysis 58, 101552.
  34. Yu Z, Zhai Y, Han X, Peng T, Zhang XY, 2021. MouseGAN: GAN-based multiple MRI modalities synthesis and segmentation for mouse brain structures, in: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part I, Springer. pp. 442–450.
  35. Zhao B, Cheng T, Zhang X, Wang J, Zhu H, Zhao R, Li D, Zhang Z, Yu G, 2023. CT synthesis from MR in the pelvic area using residual transformer conditional GAN. Computerized Medical Imaging and Graphics 103, 102150.
  36. Zhao C, Carass A, Lee J, He Y, Prince JL, 2017. Whole brain segmentation and labeling from CT using synthetic MR images, in: Machine Learning in Medical Imaging: 8th International Workshop, MLMI 2017, Held in Conjunction with MICCAI 2017, Quebec City, QC, Canada, September 10, 2017, Proceedings, Springer. pp. 291–298.
  37. Zhou Y, Li Z, Bai S, Wang C, Chen X, Han M, Fishman E, Yuille AL, 2019. Prior-aware neural network for partially-supervised multi-organ segmentation, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10672–10681.
  38. Zhu JY, Park T, Isola P, Efros AA, 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks, in: Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232.
