Author manuscript; available in PMC: 2021 Dec 1.
Published in final edited form as: IEEE Trans Med Imaging. 2020 Nov 30;39(12):4071–4084. doi: 10.1109/TMI.2020.3011626

PSIGAN: Joint probabilistic segmentation and image distribution matching for unpaired cross-modality adaptation based MRI segmentation

Jue Jiang 1, Yu Chi Hu 1, Neelam Tyagi 1, Andreas Rimner 2, Nancy Lee 2, Joseph O Deasy 3, Sean Berry 3, Harini Veeraraghavan 3
PMCID: PMC7757913  NIHMSID: NIHMS1650696  PMID: 32746148

Abstract

We developed a new joint probabilistic segmentation and image distribution matching generative adversarial network (PSIGAN) for unsupervised domain adaptation (UDA) and multi-organ segmentation from magnetic resonance imaging (MRI). Our UDA approach models the co-dependency between images and their segmentation as a joint probability distribution using a new structure discriminator. The structure discriminator computes a structure-of-interest focused adversarial loss by combining the generated pseudo MRI with probabilistic segmentations produced by a simultaneously trained segmentation sub-network. The segmentation sub-network is trained using the pseudo MRI produced by the generator sub-network. This leads to a cyclical optimization of both the generator and segmentation sub-networks that are jointly trained as part of an end-to-end network. Extensive experiments and comparisons against multiple state-of-the-art methods were done on four different MRI sequences totalling 257 scans for generating multi-organ and tumor segmentations. The experiments included: (a) 20 T1-weighted (T1w) in-phase mDixon and (b) 20 T2-weighted (T2w) abdominal MRI for segmenting the liver, spleen, and left and right kidneys, (c) 162 T2-weighted fat suppressed head and neck MRI (T2wFS) for parotid gland segmentation, and (d) 75 T2w MRI for lung tumor segmentation. Our method achieved an overall average DSC of 0.87 on T1w and 0.90 on T2w for the abdominal organs, 0.82 on T2wFS for the parotid glands, and 0.77 on T2w MRI for lung tumors.

Index Terms—: Unsupervised domain adaptation, generative adversarial network, MRI segmentation, lung tumor, parotid glands, abdominal organs

I. Introduction

Magnetic resonance imaging (MRI) is rapidly emerging as the modality for image-guided adaptive radiation therapy treatments [1] due to its better soft tissue contrast compared with computed tomography (CT) scans. However, a critical requirement for MR-guided radiotherapy is fast, accurate, and consistent segmentation of the target and surrounding normal organs at risk (OAR) [2].

Deep learning-based methods have shown remarkable success in diverse image analysis tasks when they can be trained using large annotated datasets. However, acquiring large expert-segmented medical image datasets is difficult. This is because slice-by-slice delineation of several organs is highly time consuming and requires a domain expert, such as a radiologist or a radiation oncologist.

Domain adaptation [4], [5], [6], [7] is a commonly used approach to overcome the issue of learning from limited and unlabeled target modality datasets, where a model for the target domain is trained by using an existing labeled dataset from a different modality, called the source domain. In unsupervised domain adaptation (UDA) based segmentation, the focus of this work, no target domain labeled data is available for training.

UDA segmentation has been accomplished by either using feature-level or pixel-level adaptation. In feature-level adaptation, the encoder networks are adversarially trained to extract domain-invariant features, such that a single segmentation model trained on source data is applicable to both domains [4], [5], [8], [9]. In pixel-level adaptation, generative adversarial networks (GAN) model the complex inter-modality anatomical relationships and compute image to image (I2I) translations [6], [10]. The target modality model is then learned by using the source to target transformed images. Cyclical consistency [11] and feature disentanglement [12], [13] losses are often used in unpaired I2I translation methods to circumvent the lack of corresponding source and target modality images [3], [14], [7]. Hybrid methods combine feature and pixel-level adaptation [15], [16], [17] in order to ensure good pixel-level I2I transformations and the preservation of low-level edge and mid-level textural characteristics of the target modality images.

A major issue when performing UDA is mode collapse [18], wherein multiple distinct inputs are mapped to the same output. Modality hallucination is a related manifestation of this issue that commonly occurs in medical image I2I translations, wherein distinct organ characteristics like geometry (or overall shape) and appearance (or intensity distribution) are ill-preserved or removed [10] in translated images. This is because the commonly used losses based on matching global or marginal intensity statistics of whole images cannot sufficiently constrain the generator to model the local organ/tumor geometry and appearance statistics. Feature disentanglement methods have been reported to reduce the aforementioned issues by using shared content and domain-specific attribute encoders [12], with demonstrated success in medical image applications [13]. Nevertheless, the lack of explicit conditioning of losses with respect to the geometry and appearance of the various structures of interest (SOI) may not yield the desired results for the output task.

Prior works have used output segmentation as an adversarial loss to constrain UDA and improve segmentations [14], [19], [20], [21], [10]. As shown in [15], task-specific losses can improve training stability and reduce chances of mode collapse. However, adversarial losses computed only using the segmentation output can constrain the geometry but not necessarily the appearance of SOIs in I2I translation.

A key difference of our approach compared to prior works is the use of a joint distribution matching adversarial loss. The joint distribution is represented as a channel-wise concatenation of images and their voxel-wise segmentation probability maps. Prior works have employed similar joint-distribution matching to constrain bi-directional mappings between images and a low dimensional latent distribution vector [22], [23] or between images and scalar output categories [24]. Ours, on the other hand, constrains pixel-level relationships between I2I translations and segmentations. To the best of our knowledge, this is the first approach to compute joint distribution matching of images and segmentation probabilities for UDA segmentation (Fig. 1). This approach also leads to a cyclical optimization where the segmentation and generator outputs constrain each other. In prior approaches like [21], [8], segmentation outputs do not constrain generator network gradient computation.

Fig. 1.

Difference between traditional UDA [3] and PSIGAN segmentation. In both (a) and (b), the translation network T parameterized by θT produces labeled pseudo target data {Xcm, Yc} from unpaired source Xc and target Xm data and trains the segmentation (S) network, parameterized by θS. PSIGAN also uses a structure discriminator Dstruct to match the joint distribution of image-segmentation probability maps {xm, ψm}, {xcm, ψcm} to further optimize θT.

Our contributions are:

  • Organ geometry and appearance constrained unpaired cross-modality adaptation. We introduced a UDA approach that constrains organ geometry and appearance in I2I translations by computing adversarial losses to minimize the mismatch in the joint distribution of images and their segmentation probability maps.

  • A cyclic feedback based UDA segmentation. In our approach, the generator outputs are used to train the segmentation network, while the segmentation outputs are used to compute losses for the generator.

  • Comprehensive performance comparisons were done against multiple state-of-the-art methods using three datasets for multi-organ and tumor segmentation. Extensive ablation experiments were done to measure the impact of joint distribution matching on both segmentation and I2I translation.

II. Related Works

A. Feature-level UDA segmentation

Feature-level UDA segmentation methods extract a domain-invariant feature encoding, such that a model trained on source data is applicable to both source and target domains. This approach is often used for performing domain adaptation between related image sequences. Example applications include prostate gland segmentation from MRIs acquired with different scanners [5] and brain tumor segmentation from similar MRI contrasts [4]. A joint adversarial training strategy combining a domain critic network with a segmentation network has been used for the more challenging cross-modality adaptation between CT and MRI in [25]. However, as shown in [25], these methods may require different numbers of feature layers to be adapted for segmenting various organs. Hence, the network feature-adaptation depth is an important hyper-parameter. Such depth tuning can require computationally intensive training when generating segmentations of a large number of organs for radiation therapy applications. This issue has been avoided by using joint latent space learning with variational autoencoders [7] and disentangled feature representations [26]. The work in [27] employed output feature matching for producing scanner-invariant estimation of the cardiothoracic ratio (or heart size) from chest X-rays. Segmentation probability maps produced from the softMax layer [28] and entropy maps indicative of pixel-wise classification uncertainties [21] have been used for domain-invariant (synthetic and camera-acquired) natural image segmentation.

B. Pixel-level UDA segmentation

Pixel-level domain adaptation and segmentation methods model the inter-modality relationships and compute I2I translations. The translated images are used to train a segmentation model for the target domain [29], [30], [3], [14]. Two-step training consisting of I2I translation (e.g., CT to MRI) followed by segmentation network training has been used for MRI lung tumor [10] and fundus image segmentation [31]. I2I translation and segmentation were combined into one network to segment cardiac structures from CT [14], [17], abdominal organs from CT and MRI [3], and knee structures from MRI [30]. Multiple works [32], [30], [3] have combined the cyclical consistency losses of the CycleGAN [11] with a segmentation network to train with unpaired source and target modalities. The cyclical consistency loss shrinks the space of possible mappings in GANs by computing global intensity distribution mismatches. But this loss alone is insufficient to preserve the SOI geometry and appearance in I2I translation of medical images [33]. Inclusion of style and perceptual losses has shown improvements in I2I translations [34]. Geometry-preserving losses implemented by backpropagating segmentation losses to the generator [14] and high-level segmentation feature matching losses have also shown improvements for both tumor [10] and semantic segmentation of real-world images [16]. However, none of the aforementioned losses provide constraints to sufficiently control both the geometry and appearance of the SOIs in I2I translation. We address this problem by combining images and their segmentation probability maps as a joint density for adversarial learning.

III. Method

Goal:

Learn MRI multi-organ segmentation models by using unpaired expert-segmented CT and unlabeled MRI images.

Notation:

CT (XC, YC) is the source domain that consists of images xc ∈ XC and expert segmentations yc ∈ YC for training. MRI is the target domain, provided with only MRI images xm ∈ XM for training. As the learning optimization involves a finite set of examples, the probability distributions of CT and MRI images are represented as p(xc) and p(xm). The probability distributions of pseudo MRI xcm and pseudo CT xmc resulting from I2I translations of CT to MRI and MRI to CT are represented as p(xcm) and p(xmc), respectively. The joint probability distributions of the real MR and pseudo MR images and their probabilistic segmentations are represented as p(xm, ψm) and p(xcm, ψcm), respectively.

A. Background

In supervised learning with a finite set of N training examples, given a joint probability distribution of inputs and outputs p(x, y) and a model parameterized by θ, a chosen loss function Lc(.) is used to compute the empirical risk in predicting the outputs y from inputs x:

$$E[L(x, y, \theta)] = \arg\min_{\theta \in \Theta} \sum_{i=1}^{N} p(x_i, y_i)\, L_c(x_i, y_i, \theta) \tag{1}$$

In pixel-level UDA segmentation, we seek to minimize the empirical risk of training with pseudo target examples, obtained through a model ϕ: xc → xcm, by assuming that p(xcm) ≈ p(xm):

$$E[L(x_c^m, y, \theta)] = \arg\min_{\theta \in \Theta} \sum_{i=1}^{N} p(x_{c_i}^m, y_i)\, L_c(x_{c_i}^m, y_i, \theta) \tag{2}$$

where p(xcm, y) is the joint distribution on an intermediate representation or pseudo domain. However, ϕ does not produce a perfect mapping, that is, p(xcm) ≠ p(xm). Therefore, an additional domain translation loss Lt needs to be added to (2). With p(xcm, xm) as the joint probability distribution over pseudo and real target samples, the risk is computed as:

$$E[L(x_c^m, x_m, y, \theta, \phi)] = \underbrace{\arg\min_{\phi \in \Phi} \sum_{j=1}^{M} p(x_{c_j}^m, x_{m_j})\, L_t(x_{c_j}^m, x_{m_j}, \phi)}_{\text{Domain translation}} + \underbrace{\arg\min_{\theta \in \Theta} \sum_{i=1}^{N} p(x_{c_i}^m, y_i)\, L_c(x_{c_i}^m, y_i, \theta)}_{\text{Segmentation}} \tag{3}$$

Equation (3) can be optimized by training the domain translation and segmentation networks either sequentially [10], [31] or jointly [3], [14], [16]. However, this optimization ignores the co-dependency of domain translation and segmentation. For instance, no explicit constraint exists to preserve any inherent target modality appearance or geometric characteristics of the SOIs that distinguish them from the background in the I2I translated images, which is a cause of sub-optimal performance. We model this co-dependency by computing adversarial losses using the joint distribution of images and their segmentation probability maps as a pair obtained from the translated ({xcm, ψcm}) and real target images ({xm, ψm}). In other words, we require p(xcm, ψcm) ≈ p(xm, ψm).

B. Probabilistic segmentation and image matching GAN (PSIGAN)

The overview of PSIGAN is shown in Fig. 2. PSIGAN consists of a CT to MRI generator GCM: xc → xcm, a global intensity discriminator DM, a structure discriminator Dstruct, and a target domain segmentor S: xm → {ψm, ym}, where ψm is the predicted map of voxel-wise segmentation probabilities for xm and ym is the K-organ segmentation for xm. We also include an MR to CT generator GMC: xm → xmc and a global intensity discriminator DC to implement cyclical consistency losses for unpaired I2I translation. DM computes a global adversarial loss to penalize mismatches in the marginal intensity distribution of pseudo and real MRI images so that p(xcm) ≈ p(xm):

$$\max_{D_M}\min_{G_{CM}} L_{adv}^{CM}(G_{CM}, D_M) = \mathbb{E}_{x_m \sim p(x_m)}\big[\log D_M(x_m)\big] + \mathbb{E}_{x_c \sim p(x_c)}\big[\log\big(1 - D_M(G_{CM}(x_c))\big)\big] \tag{4}$$
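As an illustration only (not the authors' released implementation), the log-loss in (4) can be written with binary cross-entropy on discriminator logits; the function below is a minimal PyTorch sketch, with G_cm and D_m standing for any generator and patch discriminator returning logits:

```python
import torch
import torch.nn.functional as F

def global_adv_losses(G_cm, D_m, x_c, x_m):
    """Return (discriminator loss, generator loss) for the CT-to-MRI direction of Eq. (4)."""
    x_cm = G_cm(x_c)                                    # pseudo MRI

    # Discriminator: push real MRI toward 1 and pseudo MRI toward 0.
    d_real = D_m(x_m)
    d_fake = D_m(x_cm.detach())                         # no gradient into G_cm here
    loss_D = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) \
           + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))

    # Generator: fool D_m into labeling the pseudo MRI as real.
    d_fake_g = D_m(x_cm)
    loss_G = F.binary_cross_entropy_with_logits(d_fake_g, torch.ones_like(d_fake_g))
    return loss_D, loss_G
```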

Fig. 2.

Approach overview. Generator GCM converts CT image xc into pseudo MR image xcm, which is used to train the segmentor S. S uses split encoders EM, ECM and shared decoder DE to separate the gradient flows for Dstruct and GCM through sub-networks SM and SCM. Dstruct computes a joint distribution matching ({xcm, ψcm}, {xm, ψm}) adversarial loss LstructD, where ψcm and ψm are produced by SM. The corresponding adversarial loss LstructG for GCM uses {xcm, ψ¯cm}, where ψ¯cm is produced by SCM. DM and DC are global intensity discriminators for the MRI and CT domains; GMC converts MRI to CT images for enforcing cyclically consistent I2I translations.

Dstruct computes a joint distribution adversarial loss to reduce SOI appearance (intensity distribution) and geometry (overall shape) mismatches between pseudo xcm and real MRIs xm. It accomplishes this by matching the joint distributions p(xm, ψm) and p(xcm,ψcm), implemented by concatenating the images and their segmentation probability maps. The segmentor S (Fig. 2) produces the segmentation probability maps ψm, ψcm from real xm and pseudo MRI xcm. Dstruct loss is computed by using both real and pseudo MRI as:

$$\max_{D_{struct}} L_{struct}^{D} = \mathbb{E}_{(x_m, \psi_m) \sim p(x_m, S(x_m))}\big[\log D_{struct}(x_m, \psi_m)\big] + \mathbb{E}_{(x_c^m, \psi_c^m) \sim p(x_c^m, S(x_c^m))}\big[\log\big(1 - D_{struct}(x_c^m, \psi_c^m)\big)\big] \tag{5}$$

The inclusion of segmentation probability maps in this loss constrains the SOI geometry, similar to prior works [3], [14], [31], [17]. The inclusion of images in this loss additionally constrains the appearance. The generator GCM is optimized by an adversarial loss, which is computed using the pseudo MR images xcm = GCM(xc) as:

$$\min_{G_{CM}} L_{struct}^{G} = \mathbb{E}_{(x_c^m, \psi_c^m) \sim p(x_c^m, S(x_c^m))}\big[\log\big(1 - D_{struct}(x_c^m, \psi_c^m)\big)\big] \tag{6}$$
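A minimal sketch (ours, not the released code) of how the joint pairs in (5) and (6) can be formed: the image and its segmentation probability map are concatenated channel-wise before being passed to D_struct. Here psi_m and psi_cm are assumed to be single-channel probability maps produced by the segmentor, as defined in (7) below.

```python
import torch
import torch.nn.functional as F

def struct_adv_losses(D_struct, x_m, psi_m, x_cm, psi_cm):
    """Joint distribution adversarial losses of Eqs. (5) and (6)."""
    real_pair = torch.cat([x_m, psi_m], dim=1)          # {x_m, psi_m}
    fake_pair = torch.cat([x_cm, psi_cm], dim=1)        # {x_c^m, psi_c^m}

    # Eq. (5): only D_struct is updated by this term, so both pairs are detached.
    d_real = D_struct(real_pair.detach())
    d_fake = D_struct(fake_pair.detach())
    loss_D = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) \
           + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))

    # Eq. (6): generator loss, where gradients flow back through the fake pair.
    d_fake_g = D_struct(fake_pair)
    loss_G = F.binary_cross_entropy_with_logits(d_fake_g, torch.ones_like(d_fake_g))
    return loss_D, loss_G
```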

ψm, ψcm are produced by aggregating label assignment probabilities ei, j at location i, j for K SOIs from K channels as generated by the softMax layer of the network S as:

$$\psi_{i,j} = \sum_{n=2}^{K} e_{i,j}^{\,n} \tag{7}$$

The first channel corresponds to background and is thus ignored in the aggregation shown in (7). The map ψ has continuous values in the range [0, 1], where higher values indicate higher likelihood of a voxel corresponding to a SOI, while lower values are indicative of background.
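For concreteness, the aggregation in (7) corresponds to dropping the background channel of the softMax output and summing the remaining channels; a sketch with an assumed (B, K, H, W) logit tensor:

```python
import torch

def aggregate_probability(logits: torch.Tensor) -> torch.Tensor:
    """logits: (B, K, H, W) scores from the segmentor; channel 0 is background."""
    probs = torch.softmax(logits, dim=1)                 # voxel-wise label probabilities
    psi = probs[:, 1:].sum(dim=1, keepdim=True)          # Eq. (7): sum over the SOI channels
    return psi                                           # shape (B, 1, H, W), values in [0, 1]
```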

The segmentor S can be trained using the generated pseudo MRI and the associated expert-segmentations available on the source modality {xcm,yc} and optimized using cross-entropy losses. The output segmentation is computed using an argmax function.

Given a fixed GCM and S, the optimal Dstruct at any point in the optimization of (5) is given by p(xm, ψm) / (p(xm, ψm) + p(xcm, ψcm)). The global equilibrium is achieved if and only if the joint distributions p(xcm, ψcm) and p(xm, ψm) are matched, i.e., p(xcm, ψcm) = p(xm, ψm). Thus, unlike the methods in [3], [14], [10], [31], [17] that only constrain geometry, ours constrains both organ geometry and appearance.
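The stated optimum follows the standard GAN argument; a brief sketch of the reasoning (our addition), writing p_r for p(xm, ψm) and p_g for p(xcm, ψcm):

```latex
% For fixed G_{CM} and S, the objective in (5) is
V(D_{struct}) = \int \Big[ p_r(u)\,\log D_{struct}(u) + p_g(u)\,\log\big(1 - D_{struct}(u)\big) \Big]\, du
% Maximizing the integrand pointwise over D_{struct}(u) \in (0,1) gives
D^{*}_{struct}(u) = \frac{p_r(u)}{p_r(u) + p_g(u)}
% Substituting D^{*}_{struct} yields
V(D^{*}_{struct}) = 2\,\mathrm{JSD}\!\left(p_r \,\|\, p_g\right) - \log 4
% which is minimized if and only if p_r = p_g, i.e., p(x_c^m,\psi_c^m) = p(x_m,\psi_m).
```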

The joint distribution formulation leads to a cyclical optimization of the generator GCM and the segmentation S networks. Concretely, S requires the output xcm of GCM for its training, while GCM requires the output of S, namely ψcm for gradient computation. The gradient flow relation between Dstruct, GCM and S is illustrated in Fig. 3(a)1.

Fig. 3.

Gradient flow between GCM, Dstruct, and segmentors using (a) a single segmentor S, and (b) split segmentors SCM and SM.

The gradient flow resulting from a network using a single segmentor S is shown in Fig. 3(a), where the outputs of GCM and S are used to update each other. Concretely, xcm is used to train S, and {xcm, ψcm} is used to compute the joint distribution adversarial loss for GCM. As a result, the outputs of GCM and S can co-adapt to facilitate easy segmentation, but without xcm matching the target distribution. In this case, Dstruct can easily distinguish p(xcm, ψcm) from p(xm, ψm), allowing it to reach a stable local minimum well before GCM reaches its local minimum, and possibly prevent GAN convergence. This is because GAN optimization hinges on achieving Nash equilibrium, whereby all players (generator and discriminator) achieve equal payoffs (reach local minima at similar times). This would also lower the generalization accuracy of S on real MRI. We address this potential issue by separating the networks used for producing the segmentation probability maps for GCM and Dstruct. Thus, segmentation probability maps from the network SM are used to compute the Dstruct loss, and segmentation probability maps from SCM are used to compute the GCM loss. The modified configuration of gradient flow is shown in Fig. 3(b).

C. Split segmentor network

The split segmentation network consists of two encoder subnetworks, EM and ECM, with a shared decoder network (DE) (Fig. 2), from which two segmentations are generated via SM = EM ∘ DE and SCM = ECM ∘ DE. A shared decoder is used because the high-level contextual features for segmentation should be the same for pseudo and real MRI. It also reduces the number of parameters required for training SM and SCM. The networks SM and SCM are trained using pseudo MRI data with labels (xcm, yc) and optimized with cross-entropy losses LsegM (first term of the summation in (8)) for SM and LsegM¯ (second term of the summation in (8)) for SCM, respectively. The overall loss Lseg = LsegM + LsegM¯ is computed as:

$$L_{seg} = \underbrace{\mathbb{E}_{x_c^m \sim p(x_c^m),\, y_c \sim Y_c}\big[-\log P\big(y_c \mid S_M(x_c^m)\big)\big]}_{L_{seg}^{M}} + \underbrace{\mathbb{E}_{x_c^m \sim p(x_c^m),\, y_c \sim Y_c}\big[-\log P\big(y_c \mid S_{CM}(x_c^m)\big)\big]}_{L_{seg}^{\bar{M}}} \tag{8}$$
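A simplified sketch of the split segmentor forward pass and Eq. (8); skip connections and the exact layer layout of Sec. III-E are omitted, and E_M, E_CM, and DE are assumed module handles:

```python
import torch
import torch.nn.functional as F

def split_segmentor_loss(E_M, E_CM, DE, x_cm, y_c):
    """x_cm: pseudo MRI from G_CM; y_c: CT expert labels of shape (B, H, W)."""
    logits_SM  = DE(E_M(x_cm))        # S_M  = E_M  o DE
    logits_SCM = DE(E_CM(x_cm))       # S_CM = E_CM o DE (decoder weights are shared)

    loss_seg_M    = F.cross_entropy(logits_SM,  y_c)    # first term of Eq. (8)
    loss_seg_Mbar = F.cross_entropy(logits_SCM, y_c)    # second term of Eq. (8)
    return loss_seg_M + loss_seg_Mbar
```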

The network SM is used to produce ψm and ψcm from xm (solid blue arrow in Fig. 2) and xcm (solid orange arrow in Fig. 2), respectively, to compute the gradients for Dstruct. The gradient flow with respect to GCM, SM, and Dstruct is shown in Fig. 3(b). The modified loss LstructD to optimize Dstruct, obtained by replacing S in (5) with SM, is computed as:

$$\max_{D_{struct}} L_{struct}^{D} = \mathbb{E}_{(x_m, \psi_m) \sim p(x_m, S_M(x_m))}\big[\log D_{struct}([x_m, \psi_m])\big] + \mathbb{E}_{(x_c^m, \psi_c^m) \sim p(x_c^m, S_M(x_c^m))}\big[\log\big(1 - D_{struct}([x_c^m, \psi_c^m])\big)\big] \tag{9}$$

The network SCM is used to produce ψ¯cm from xcm (dotted orange arrow in Fig. 2) in order to compute the gradients for GCM. We use ψ¯cm to differentiate it from ψcm, which is generated using SM in (9). The loss LstructG to optimize GCM is now computed by replacing S in (6) with SCM and xcm = GCM(xc):

$$\min_{G_{CM}} L_{struct}^{G} = \mathbb{E}_{(x_c^m, \bar{\psi}_c^m) \sim p(x_c^m, S_{CM}(x_c^m))}\big[\log\big(1 - D_{struct}([x_c^m, \bar{\psi}_c^m])\big)\big] \tag{10}$$

The network SM, which is never used in the generator update, is used to segment target datasets at test time. We found that this network produces more accurate MRI segmentations than SCM, as shown in the results.

D. Additional losses for unpaired mapping

In order to train with unpaired CT and MR datasets, we enforce a consistent reverse mapping by computing adversarial penalties from the target to source generator GMC and the global intensity discriminator DC. This loss is expressed as:

$$\max_{D_C}\min_{G_{MC}} L_{adv}^{MC}(G_{MC}, D_C) = \mathbb{E}_{x_c \sim p(x_c)}\big[\log D_C(x_c)\big] + \mathbb{E}_{x_m \sim p(x_m)}\big[\log\big(1 - D_C(G_{MC}(x_m))\big)\big] \tag{11}$$

The global adversarial loss is then computed as Ladv = LadvCM + LadvMC. A cyclical consistency loss [11] is computed to account for the lack of any pixel-level correspondence between the source and target domain images in the I2I translation, using the reconstructions GMC(GCM(xc)) ≈ xc and GCM(GMC(xm)) ≈ xm:

$$L_{cyc}(G_{CM}, G_{MC}) = \mathbb{E}_{x_c \sim p(x_c)}\big[\big\|G_{MC}(G_{CM}(x_c)) - x_c\big\|_1\big] + \mathbb{E}_{x_m \sim p(x_m)}\big[\big\|G_{CM}(G_{MC}(x_m)) - x_m\big\|_1\big] \tag{12}$$
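Equation (12) written directly as L1 reconstruction errors (a sketch; inputs are torch tensors, and G_cm, G_mc are the two generators):

```python
def cycle_loss(G_cm, G_mc, x_c, x_m):
    """L1 cycle-consistency of Eq. (12)."""
    rec_c = G_mc(G_cm(x_c))           # CT -> pseudo MRI -> reconstructed CT
    rec_m = G_cm(G_mc(x_m))           # MRI -> pseudo CT -> reconstructed MRI
    return (rec_c - x_c).abs().mean() + (rec_m - x_m).abs().mean()
```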

The total loss is expressed as:

$$L_{total} = L_{adv} + \lambda_{cyc} L_{cyc} + \lambda_{struct} L_{struct} + \lambda_{seg} L_{seg} \tag{13}$$

where Lstruct corresponds to either LstructD or LstructG depending on whether this loss is for Dstruct (9) or GCM (10).


The generators (GCM, GMC), discriminators (DM, DC, and Dstruct), and segmentors (SM, SCM) are updated with the following gradients: ΔθG(Ladv + λcyc Lcyc + λstruct LstructG + λseg LsegM¯), ΔθD,Dstruct(Ladv + λstruct LstructD), and ΔθS(Lseg), respectively. The algorithm for the proposed method is shown in Algorithm 1.
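For orientation, the update order implied above can be written as a single training iteration. This is a simplified sketch only (Algorithm 1 and the released code are the reference); it reuses the hypothetical aggregate_probability and cycle_loss helpers sketched earlier and assumes single-channel images and one optimizer per parameter group:

```python
import torch
import torch.nn.functional as F

LAM_CYC, LAM_STRUCT, LAM_SEG = 10.0, 0.5, 5.0          # weights from Sec. III-E

def _bce(logits, target_value):
    target = torch.full_like(logits, target_value)
    return F.binary_cross_entropy_with_logits(logits, target)

def psigan_step(x_c, y_c, x_m, G_cm, G_mc, D_m, D_c, D_struct,
                E_M, E_CM, DE, opt_G, opt_D, opt_S):
    # 1) Generators: L_adv + lam_cyc*L_cyc + lam_struct*L_struct^G + lam_seg*L_seg^Mbar
    x_cm, x_mc = G_cm(x_c), G_mc(x_m)
    logits_scm = DE(E_CM(x_cm))                          # S_CM output, used only for G (Eq. 10)
    psi_cm_bar = aggregate_probability(logits_scm)
    loss_G = _bce(D_m(x_cm), 1.0) + _bce(D_c(x_mc), 1.0)
    loss_G = loss_G + LAM_CYC * cycle_loss(G_cm, G_mc, x_c, x_m)
    loss_G = loss_G + LAM_STRUCT * _bce(D_struct(torch.cat([x_cm, psi_cm_bar], 1)), 1.0)
    loss_G = loss_G + LAM_SEG * F.cross_entropy(logits_scm, y_c)     # L_seg^Mbar term
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()

    # 2) Discriminators: L_adv + lam_struct*L_struct^D, with pseudo images/probabilities detached
    x_cm, x_mc = x_cm.detach(), x_mc.detach()
    psi_m  = aggregate_probability(DE(E_M(x_m))).detach()            # from S_M (Eq. 9)
    psi_cm = aggregate_probability(DE(E_M(x_cm))).detach()
    loss_D = _bce(D_m(x_m), 1.0) + _bce(D_m(x_cm), 0.0) \
           + _bce(D_c(x_c), 1.0) + _bce(D_c(x_mc), 0.0) \
           + LAM_STRUCT * (_bce(D_struct(torch.cat([x_m, psi_m], 1)), 1.0)
                           + _bce(D_struct(torch.cat([x_cm, psi_cm], 1)), 0.0))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # 3) Segmentors: L_seg of Eq. (8) on pseudo MRI with the CT labels
    loss_S = F.cross_entropy(DE(E_M(x_cm)), y_c) + F.cross_entropy(DE(E_CM(x_cm)), y_c)
    opt_S.zero_grad(); loss_S.backward(); opt_S.step()
    return loss_G.item(), loss_D.item(), loss_S.item()
```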

E. Implementation and network structure

All networks were implemented using the PyTorch library and were trained on an Nvidia GTX V100 with 16 GB memory. Training was done using the ADAM algorithm [35] with an initial learning rate of 1e-4 and a batch size of 2. We set λcyc = 10, λstruct = 0.5, and λseg = 5 during training. The appropriate values for these hyper-parameters were selected empirically on the T2w MR parotid dataset (see Supplementary document Sec. I). The learning rate was kept constant for the first 30 epochs and decayed to zero over the next 30 epochs.

The generator architectures were adopted from DC-GAN [36]. The generators (GCM and GMC) consisted of two stride-2 convolutions, 9 residual blocks, and two fractionally strided convolutions with half strides, and used a tanh activation following the last convolutional layer to produce output images. The discriminator networks (DM, DC, and Dstruct) were composed of 5 convolutional layers with a kernel size of 4 × 4 pixels, resulting in feature maps of size 64, 128, 256, 512, and 1 in the successive layers of these networks. Discriminators were implemented as 70 × 70 pixel patchGANs [37].
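One plausible layout matching this description (4 × 4 kernels, feature maps of 64-128-256-512-1) is sketched below; the normalization choice is our assumption, and in_channels=2 would be used for Dstruct, which receives the image concatenated with the aggregated probability map:

```python
import torch.nn as nn

def patch_discriminator(in_channels: int = 1) -> nn.Sequential:
    """A 70x70 PatchGAN-style discriminator returning a map of patch logits."""
    def block(c_in, c_out, stride, norm=True):
        layers = [nn.Conv2d(c_in, c_out, kernel_size=4, stride=stride, padding=1)]
        if norm:
            layers.append(nn.InstanceNorm2d(c_out))
        layers.append(nn.LeakyReLU(0.2, inplace=True))
        return layers

    return nn.Sequential(
        *block(in_channels, 64, stride=2, norm=False),
        *block(64, 128, stride=2),
        *block(128, 256, stride=2),
        *block(256, 512, stride=1),
        nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=1),   # per-patch logits
    )
```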

The segmentation network was based on the U-net [38] with batch normalization added after each convolution filter. The feature encoders EM and ECM of the segmentation network were implemented using two successive operations of convolution with 3×3 kernels, batch normalization, and ReLU activation, followed by max pooling. Four max-pooling operations were used in the encoder structure to subsample the image feature maps, leading to feature sizes of 64, 128, 256, and 512. Skip connections from both encoder networks are combined with the decoder layer features to prevent segmentation blurring. The decoder network (DE) was implemented using four unpooling operations to upsample the features back to the original image resolution. We have made our code available2.

F. Evaluation Metrics

Per-organ segmentation accuracies were measured using Dice similarity coefficient (DSC), computed from voxel-wise true positives (TP), false positives (FP) and false negatives (FN) as DSC = 2×TP/(2×TP + FP + FN), and Hausdorff Distance at the 95th percentile (HD95), as suggested in [39]. We also computed an overall DSC that is an average DSC over all the structures segmented in a given scan.
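As a reference implementation of the DSC formula above (our sketch, operating on binary masks of the same shape):

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """DSC = 2*TP / (2*TP + FP + FN) from two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    return 2.0 * tp / (2.0 * tp + fp + fn + eps)
```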

G. Compared methods

We compared our method against multiple UDA segmentation methods including the CycadaGAN [16], segmentation structure matching (SA) [28], segmentation entropy matching (ADVENT) [21], SynSeg-Net [3], and SIFA [17]. Also, we compared against the CycleGAN [11] with a U-net [38] and the UNIT [40] with a U-net. SynSeg [3] and SIFA [17] are methods that have been developed and applied to medical image segmentation, while all other compared methods were developed for analyzing natural images. Both the SA and ADVENT methods employed additional segmentation matching based adversarial losses for domain adapted segmentation training. We also computed the performance of a model trained with only CT (source) images without domain adaptation. Moreover, we compared performance against a supervised MRI segmentation model to ascertain the accuracy upper limit. The supervised MRI segmentation training was done by using the MRI validation set as the training set.

The default implementations of the various networks, as made available by the authors, were used. All networks, including ours, were trained and tested using identical datasets with the same hyperparameter settings (learning rates and batch size).

IV. Experiments and Results

We evaluated multiple organ and tumor segmentation on four different MRI sequences arising from two external institution datasets (with two MR sequences) and two private, internal institution datasets. All networks were trained using 256×256 CT and MRI image patches enclosing the SOIs. All MR scans were standardized to remove patient-dependent signal intensity variations [41] and normalized to the range [−1, 1] for meaningful computation of global and joint distribution adversarial losses.

Ablation experiments were done to assess the utility of the proposed structure discriminator (LstructD), the global intensity discriminator (Ladv) and cycle losses (Lcyc). We also evaluated alternative network design choices for computing the structure adversarial GAN losses by using (a) only segmentation probability maps, (b) joint distribution representation using multi-channel segmentation probability maps, (c) a single encoder-decoder segmentation network, and (d) SOI-specific discriminators.

A. MRI abdomen organs dataset

1). Data:

Twenty MRIs (T1-DUAL in-phase and T2w spectral pre-saturation inversion recovery, or SPIR) acquired on a 1.5T Philips scanner from the Combined Healthy Abdominal Organ Segmentation (CHAOS) challenge data [42] were used to generate segmentations of the liver, left and right kidneys, and spleen. Ten MRIs were used for training (without labels) and validation, and the remaining 10 scans were used for testing. Both MRI sequences were acquired to perform fat suppression. These data sets had an image resolution of 256 × 256 pixels and a slice thickness that ranged from 5.5mm to 9mm (average of 7.84mm). Additional details of the MR sequences are in the supplemental document Sec. I.A.

Thirty contrast-enhanced portal venous phase CT scans with expert segmentations were obtained from a completely different dataset [43] for UDA training. The CT images had a resolution of 512×512 pixels, an in-plane resolution ranging from 0.54mm × 0.54mm to 0.98mm × 0.98mm, and a slice thickness ranging from 2.5mm to 5.0mm. CT images were cropped to contain only the body region and then resampled to 256×256 to have the same resolution as the MRI images. Following histogram standardization, the T1w and T2w MRI were clipped to the ranges [0, 1136] and [0, 1814], respectively, using the 95th percentile of the reference MRI intensity values. Separate segmentation networks were trained for the T1w and T2w MRI using 256×256 pixel image patches. These patches were extracted from 8000 T1w and 7872 T2w MRI slices, and 14038 CT slices.
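The clipping and rescaling step can be summarized as below (a sketch; the upper bound is dataset specific, e.g., 1136 for T1w and 1814 for T2w here, and the preceding histogram standardization [41] is omitted):

```python
import numpy as np

def clip_and_rescale(img: np.ndarray, upper: float) -> np.ndarray:
    """Clip standardized intensities to [0, upper] and map them to [-1, 1]."""
    img = np.clip(img, 0.0, upper)
    return 2.0 * (img / upper) - 1.0
```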

We also evaluated the feasibility of our method to segment T2w MRIs acquired on different scanners with various repetition times (TR) and echo times (TE), with and without fat suppression, from the external institution TCIA-LIHC [44] dataset. Six patients were downselected from a total of 97 patients; the exclusion criteria were availability of only CT scans (N=57), absence of T2w FSE/TSE MRI (N=22), motion artifacts (N=6), and large liver tumors (N=6). Table I shows the sequence details from the CHAOS dataset and the six patients from the TCIA-LIHC dataset.

TABLE I.

T2w MRI scanning parameters used in the analysis. ETL: Echo train length; TE: Echo time; TR: Repetition time; FatSup: Fat suppressed; FSE: Fast spin echo; TSE: Turbo spin echo; RT: Respiratory trigger; Nav: Navigator; FatSat: Fat saturation

Dataset Series Manufacturer Magnet ETL TE ms TR ms Flip angle
CHAOS SPIR FatSup Philips 1.5T 24 70 1930 90°
DD-A4NJ FSE RT Siemens 1.5T 21 89 5233 180°
DD-A1ED FSE FatSat RT GE 3T 12 83.90 10000 90°
DD-A4NF TSE Nav Siemens 1.5T 17 76 3386 150°
DD-A113 FSE GE 1.5T 10 78.9 7500 90°
DD-A114 FSE GE 1.5T 16 86.8 6000 90°
K7-AAU7 FSE RT Philips 1.5T 85 80 563 90°

2). Volumetric segmentation accuracies:

The volumetric DSC segmentation accuracies for the multiple abdominal organs generated from the T2w and T1w MRI sequences on the test set are shown in Table II. A model trained with only CT images (w/o adaptation) was unable to produce clinically usable segmentations on MRI. PSIGAN achieved a better overall average DSC (computed over all organs) of 0.90 on T2w and 0.87 on T1w MRI, and a better HD95 of 7.80mm on T2w and 6.90mm on T1w MRI, than all other methods except supervised MRI segmentation. PSIGAN accuracy was only slightly lower than that of the fully supervised MRI segmentation.

TABLE II.

Overall segmentation accuracy on CHAOS dataset. Liver-Lv, Spleen-Sp, Left kidney-LK, Right kidney-RK. Overall average (Avg) is also shown.

Method T2w-SPIR MRI (fat suppressed) (N=10) T1w-DUAL in phase MRI (fat suppressed) (N=10)
DSC ↑ HD95 mm DSC ↑ HD95 mm
LV SP LK RK Avg. LV SP LK RK Avg. LV SP LK RK Avg. LV SP LK RK Avg.
W/o Adaptation Avg. 0.08 0.23 0.26 0.01 0.15 64.52 71.12 47.48 72.70 63.96 0.00 0.00 0.00 0.00 0 97.03 141.26 89.47 108.45 109.05
Std. 0.12 0.15 0.26 0.02 34.21 22.23 19.29 10.85 0.00 0.00 0.00 0.00 36.47 42.04 50.05 25.06
Supervised Avg. 0.92 0.87 0.92 0.91 0.91 11.13 6.26 4.78 4.26 6.61 0.92 0.87 0.85 0.86 0.88 7.24 5.44 5.64 4.67 5.75
Std. 0.03 0.07 0.03 0.03 8.31 2.19 5.34 1.68 0.04 0.04 0.14 0.18 2.48 1.18 3.28 1.90
CycleGAN Avg. 0.86 0.75 0.88 0.87 0.84 24.59 13.51 14.26 6.08 14.61 0.82 0.83 0.63 0.61 0.72 10.40 9.45 10.72 20.26 12.71
Std. 0.09 0.17 0.03 0.05 11.50 16.09 10.36 1.74 0.07 0.05 0.14 0.24 2.93 10.07 5.01 9.16
UNIT Avg. 0.87 0.76 0.91 0.88 0.86 15.85 15.45 7.47 5.87 11.16 0.89 0.81 0.64 0.62 0.74 11.39 10.64 11.45 12.59 11.52
Std. 0.08 0.23 0.03 0.04 9.63 15.80 7.46 1.68 0.02 0.07 0.17 0.23 6.62 7.84 5.66 5.82
CycaDA Avg. 0.88 0.77 0.86 0.86 0.84 12.34 12.24 6.86 8.56 10.00 0.85 0.73 0.71 0.70 0.75 11.46 12.75 9.13 17.13 12.62
Std. 0.08 0.18 0.03 0.06 15.82 14.70 1.43 1.80 0.04 0.12 0.21 0.19 9.79 13.77 3.10 4.31
SA Avg. 0.86 0.80 0.89 0.89 0.86 18.18 12.32 10.09 9.14 12.43 0.89 0.73 0.72 0.79 0.78 11.67 11.33 10.30 10.33 10.91
Std. 0.12 0.07 0.07 0.03 11.47 11.84 5.25 4.67 0.03 0.10 0.10 0.10 2.86 9.67 2.62 2.61
ADVENT Avg. 0.89 0.79 0.81 0.80 0.82 12.58 13.76 11.50 11.65 12.37 0.87 0.84 0.76 0.79 0.82 14.55 11.72 10.22 10.62 11.78
Std. 0.08 0.03 0.03 0.12 4.81 12.21 2.26 5.26 0.03 0.08 0.06 0.08 6.08 10.32 1.78 1.44
SynSeg Avg. 0.88 0.77 0.89 0.85 0.85 21.04 12.63 10.23 6.28 12.55 0.89 0.85 0.73 0.70 0.79 8.98 8.76 9.13 11.95 9.71
Std. 0.08 0.19 0.03 0.10 11.97 13.08 9.05 2.44 0.03 0.05 0.09 0.19 2.82 5.01 3.10 5.51
SIFA Avg. 0.89 0.77 0.90 0.89 0.86 19.20 13.56 7.28 5.78 11.46 0.90 0.85 0.77 0.78 0.83 9.55 7.45 8.35 12.67 9.51
Std. 0.09 0.22 0.02 0.03 13.01 16.53 1.21 1.44 0.02 0.05 0.12 0.07 2.92 4.02 1.76 7.55
PSIGAN Avg. 0.91 0.87 0.91 0.90 0.90 11.15 8.34 6.81 4.88 7.80 0.92 0.87 0.83 0.84 0.87 7.41 5.87 7.62 6.70 6.90
Std. 0.03 0.02 0.03 0.06 3.38 6.67 4.66 1.35 0.02 0.03 0.10 0.06 2.76 1.95 1.78 1.44

Fig. 4 shows example segmentations generated by the various methods from T1w and T2w MRI. Segmentations produced without adaptation and with the supervised method are shown in Supplementary Fig. 2. As shown, the PSIGAN segmentations are nearly indistinguishable from the expert's segmentations.

Fig. 4.

Segmentation performance of different methods on mDixon T1w and T2w MRI. The overall DSC computed for all organs on this patient is shown in the top right corner of the images.

3). Evaluation on TCIA-LIHC dataset:

PSIGAN produced an average DSC accuracy of 0.87 and an average HD95 accuracy of 8.59mm on this dataset, which is close to that achieved on the CHAOS dataset (Table. II). PSIGAN segmentations were highly similar to expert delineations on all six cases with highly varying MR tissue contrasts (Supplementary Fig. 5).

4). MR to CT UDA segmentation:

We also evaluated our approach and the other approaches for segmenting CT images by using either T1w or T2w MRI as the source modality. Validation was done using 15 CTs and testing was done on the remaining 15 cases. Fig. 5 shows segmentations computed on a representative CT case. The corresponding segmentations without adaptation and with supervised segmentation are shown in Supplementary Fig. 6. The DSC and HD95 accuracies of all methods computed from the test set are in Supplementary Table I. PSIGAN produced an overall average DSC of 0.90 and HD95 of 10.35mm when performing T2w to CT UDA segmentation, and an average DSC of 0.89 and HD95 of 10.30mm when performing T1w to CT UDA segmentation. The next closest method, SIFA, produced a lower average DSC of 0.86 for T2w to CT and 0.85 for T1w to CT UDA segmentations, and a higher average HD95 of 14.26mm and 16.20mm for T2w to CT and T1w to CT UDA segmentations, respectively.

Fig. 5.

Representative segmentations produced by different methods on CT when performing T1w to CT (top row) and T2w to CT UDA segmentation. The overall DSC accuracies for each method are also shown.

B. MRI parotid glands dataset

1). Data:

A private institution dataset consisting of 162 T2w fat suppressed (T2wFS) head and neck MRIs, obtained from 57 patients who were scanned before and every week during radiation therapy, was analyzed. Eighty-five MRIs from 30 patients, yielding 14,000 2D MRI slices, were used for training (expert segmentations removed) and validation (with expert segmentations). The remaining 77 MRIs from 27 patients were used for testing. Ninety-six head and neck CT scans, combining 48 private scans and 48 from the open-source Public Domain Database for Computational Anatomy (PDDCA) [45], were used as the source domain. MRI images were clipped to the range of [0, 1651] using the 95th percentile of the intensity values of the reference scan following image standardization. Two-dimensional patches of size 256 × 256 pixels containing the head and neck region, obtained after cropping portions outside of the body and resulting in 15,000 CT and 14,000 MRI images, were used for training. Ablation experiments were done using this dataset because it had the most MRIs available.

2). Volumetric segmentation accuracies:

Table III shows the segmentation accuracies for the left and right parotid glands using the various methods, including supervised MRI training and MRI segmentation obtained using a CT network trained without domain adaptation. The CT model trained without any domain adaptation was unable to generate segmentations on MRI. PSIGAN produced an average DSC of 0.82 and an HD95 of 3.06mm, only slightly lower than the supervised method's accuracy (DSC of 0.84 and HD95 of 2.26mm). The next best method, SIFA, reached an average DSC of 0.72 and an HD95 of 4.99mm. Fig. 6 shows segmentations generated on a representative test case using the compared methods; the corresponding segmentations produced without adaptation and with the supervised method are in Supplementary Fig. 4. As seen, PSIGAN segmentations were nearly indistinguishable from the expert contours.

TABLE III.

Overall segmentation accuracy on MRI parotid test dataset. Left parotid - LP, right parotid - RP.

Method Test T2wFS MRI (N=77)
DSC ↑ HD95 mm
RP LP Avg. RP LP Avg.
W/o Adaption 0.00±0.00 0.00±0.00 0.00 87.81±18.89 85.45±19.94 86.63
Supervised 0.84±0.06 0.84±0.04 0.84 2.24±0.48 2.28±1.31 2.26
CycleGAN 0.55±0.09 0.51±0.11 0.53 8.22±4.81 7.38±2.29 3.90
UNIT 0.66±0.06 0.62±0.10 0.64 6.91±5.35 5.91±1.89 6.41
CycaDA 0.70±0.08 0.69±0.09 0.70 5.59±2.40 4.82±1.44 5.21
SA 0.74±0.07 0.71±0.07 0.73 5.05±1.74 4.94±1.58 5.00
ADVENT 0.73±0.12 0.71±0.11 0.72 4.66±2.16 5.06±1.65 4.86
SynSeg 0.67±0.09 0.65±0.09 0.66 5.27±3.56 6.08±2.73 5.68
SIFA 0.73±0.08 0.71±0.06 0.72 4.95±1.55 5.03±1.47 4.99
PSIGAN 0.82±0.03 0.81±0.05 0.82 2.98±1.01 3.14±1.17 3.06
Fig. 6.

Segmentation performance of different methods on T2wFS MRI. Red contour indicates the manual segmentation and the yellow contour indicates the algorithm segmentation. The overall DSC is shown in the top right corner of the images.

C. MRI lung tumor dataset

1). Data:

A private institution dataset with 75 T2-weighted turbo spin-echo (T2wTSE) MRIs obtained from 27 non-small cell lung cancer (NSCLC) patients scanned before and every week during conventional fractionated external beam radiotherapy of 60 Gy was analyzed. Motion-robust two-dimensional axial images were acquired by using respiratory triggering on a 3T Philips Ingenia scanner (Philips Medical Systems, Best, Netherlands). This is the same dataset as used in our prior work [10]. MRI images were clipped to the range of [0, 1198] using the 95th percentile of the reference MRI intensity values following image standardization. Training was done on 9696 unlabeled 2D image patches containing lung tumor extracted from 35 longitudinal MRI scans of 5 patients, while independent testing was done on the remaining 40 MRI scans from 22 patients. The CT source domain data was obtained from 377 expert-segmented CT scans of NSCLC patients available from The Cancer Imaging Archive (TCIA) [46]. Training was done using 32,000 image patches of size 256×256 pixels containing lung tumor.

2). Volumetric segmentation accuracies:

Table IV shows the lung tumor segmentation accuracies achieved by the various methods. The CT segmentation model without domain adaptation failed to generate segmentations on MRI. PSIGAN was more accurate than all methods except the supervised MRI segmentation method. Both the tumor-aware method [10] and SIFA produced a lower accuracy than PSIGAN. Fig. 7 shows two representative examples with the algorithms' segmentations together with expert delineations. The corresponding segmentations without adaptation and with supervised segmentation are in Supplementary Fig. 3.

TABLE IV.

Segmentation accuracy on T2wTSE MRI (Lung tumor) test set.

Method Test (N=40)
DSC HD95 mm
W/o Adaption 0.00±0.00
Supervised 0.80±0.09 7.05±3.66
CycleGAN 0.64±0.20 17.83±15.43
UNIT 0.70±0.16 14.31±12.26
Cycada 0.70±0.18 14.17±12.91
SA 0.71±0.15 12.42±11.38
ADVENT 0.72±0.18 12.30±11.93
SynSeg 0.69±0.20 15.70±11.98
SIFA 0.72±0.15 11.97±6.12
Tumor-aware 0.72±0.16 12.88±11.08
PSIGAN 0.77±0.10 7.90±4.40
Fig. 7.

MRI lung tumor segmentation with DSC accuracies of multiple methods. Red contour is expert’s, yellow is algorithm segmentation.

D. Differences in feature maps extracted using global and joint distribution discriminator

Fig. 8 shows four randomly chosen feature maps of the first convolutional layer from the global intensity discriminator that used only images (Fig. 8(c)), a discriminator that used only segmentation probability maps (Fig. 8(d)), and the structure discriminator that used a concatenation of the image and the aggregated segmentation probability map (Fig. 8(e)). We visualized the first layer features due to their proximity to the input images and to ascertain what low-level features were relevant for the discriminator. As seen, the image intensity matching global intensity discriminator extracts features without a clear focus on any part of the image (Fig. 8(c)), while the segmentation probability matching discriminator amplified features at the SOI boundaries (Fig. 8(d)), thereby emphasizing organ geometry. On the other hand, feature responses are higher both within and at SOI boundaries, with a slight emphasis on some background structures (e.g., bottom row of Fig. 8(e)), when using the structure discriminator. As a result, our method allows the structure discriminator to focus on both the geometry and the content within the SOIs, while also preserving some background features.

Fig. 8.

Feature maps from the first convolutional layer computed with (c) the global intensity discriminator using the image, (d) a discriminator using the aggregated segmentation probability, and (e) the structure discriminator using the joint distribution of image and aggregated segmentation probability map. The T2w SPIR MRI (a) and the aggregated segmentation probability map (b) are also shown. Feature response values are normalized to [0, 1] for visualization.

E. PSIGAN network design experiments

1). Structure discriminator:

We evaluated segmentation performance under the following structure discriminator network designs: (i) multi-channel segmentation probability maps only, (ii) aggregated segmentation probability map only, (iii) channel-wise concatenation of the image and multi-channel segmentation probability maps, (iv) SOI-specific structure discriminators that used channel-wise concatenation of the image and the segmentation probability map for each SOI, and (v) default PSIGAN, which used channel-wise concatenation of the image and the aggregated segmentation probability map. These tests were done on both the MRI parotid and the T1w SPIR MRI abdomen dataset (CHAOS). Separate networks were trained from scratch in each of these settings. The main difference between settings (i) and (ii) vs. the rest was the use of segmentation probability only in (i) and (ii) vs. joint-distribution matching of images and segmentation probabilities in (iii), (iv), and (v). Both (iii) and (v) used a single structure discriminator as opposed to K structure discriminators in (iv). Finally, whereas K channels, one per organ, were used to represent the segmentation probability maps in (iii), a single aggregated segmentation probability map was used in (v). All other losses used in PSIGAN, including the global intensity discriminator and cycle consistency losses, were used in all experiments. A sketch of how the discriminator inputs differ across these settings is shown below.
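The illustrative helper below constructs the discriminator input for each setting; the names, and the probs/img tensors, are our assumptions:

```python
import torch

def discriminator_input(img, probs, setting):
    """img: (B, 1, H, W) image; probs: (B, K, H, W) softMax output with background in channel 0."""
    fg = probs[:, 1:]                                        # drop the background channel
    if setting == "multi_channel_prob":                      # (i)
        return fg
    if setting == "aggregated_prob":                         # (ii)
        return fg.sum(dim=1, keepdim=True)
    if setting == "multi_channel_prob_image":                # (iii)
        return torch.cat([img, fg], dim=1)
    if setting == "soi_specific":                            # (iv): one 2-channel pair per SOI-specific discriminator
        return [torch.cat([img, fg[:, k:k + 1]], dim=1) for k in range(fg.shape[1])]
    if setting == "aggregated_prob_image":                   # (v): default PSIGAN
        return torch.cat([img, fg.sum(dim=1, keepdim=True)], dim=1)
    raise ValueError(f"unknown setting: {setting}")
```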

As shown in Table V, settings (iii), (iv), and (v) produced more accurate segmentations than (i) and (ii). Furthermore, the default setting of PSIGAN, which aggregates the segmentation probabilities into a single map, produced more accurate segmentations than all other settings. The SOI-specific discriminator setting (iv) was similarly accurate to the setting using multi-channel segmentation probability maps (iii). Segmentations produced on a representative case from the CHAOS dataset by all these methods are shown in Fig. 9. Segmentations on a case from the MR parotid dataset are in Supplementary Fig. 9.

TABLE V.

Impact of using segmentation probability maps for adversarial training on segmentation accuracy.

T2wFS MR Parotid (N = 77) T1w SPIR MR Abdomen (N = 10)
Discriminator Parotid right Parotid left Avg. Liver Spleen Kidney left Kidney right Avg.
i. Multi-channel segmentation prob 0.74±0.07 0.73±0.06 0.74 0.89±0.02 0.84±0.05 0.79±0.12 0.80±0.07 0.83
ii. Aggregated segmentation prob 0.75±0.05 0.75±0.06 0.75 0.89±0.03 0.83±0.09 0.80±0.11 0.81±0.09 0.83
iii. Multi-channel segmentation prob + Image 0.78±0.05 0.77±0.05 0.78 0.90±0.03 0.85±0.05 0.80±0.10 0.82±0.07 0.84
iv. SOI-specific segmentation prob + Image 0.79±0.03 0.78±0.04 0.79 0.91±0.02 0.85±0.08 0.81±0.10 0.82±0.07 0.85
v. Aggregated segmentation prob + Image 0.82±0.03 0.81±0.05 0.82 0.92±0.02 0.87±0.03 0.83±0.10 0.84±0.06 0.87
Fig. 9.

Segmentation results for a representative case when using different ways of combining segmentation for computing adversarial loss. SOI-specific discriminator combines segmentation probability map for individual SOI with the images.

2). Segmentation network:

We evaluated performance when using single or split segmentor networks (Fig. 3). The structure discriminator and generator losses were computed using (5) and (6) for the single segmentor, as opposed to (9) and (10) for the split segmentor. In addition, we also measured the accuracy of the segmentor SCM for generating MRI segmentations instead of the default SM network used in PSIGAN. Both split and single segmentor networks were trained from scratch until convergence with identical training, validation, and testing sets for both the parotid and T1w SPIR abdomen organ segmentation.

The single segmentor configuration was less accurate than the split segmentor (Tables II and III), with an average DSC of 0.85 on the T1w SPIR abdomen segmentation and 0.80 on the parotid segmentation. The number of network parameters was slightly higher for the split segmentor (42.97M) than for the single segmentor (38.99M). Adversarial losses for GCM and Dstruct during training are shown for the split and single network configurations in Supplementary Fig. 10. As shown, the gap in the losses for GCM and Dstruct stabilized faster for the split than for the single segmentor configuration.

The SCM network produced a much lower accuracy than both the SM and single-segmentor networks, with an average DSC of 0.84 on the T1w SPIR abdomen segmentation and 0.77 on the parotid segmentation. Organ-specific accuracies are in Supplementary Table II. Segmentations on representative examples using these three networks are shown in Supplementary Figs. 7 and 8 for both datasets.

F. Ablation experiments

1). Contribution of structure discriminator, global intensity discriminator, and cycle losses on accuracy:

The goal of this experiment was to evaluate the contribution of each loss to segmentation performance. Both the structure and global intensity discriminators compute an adversarial loss and can train the generator independently of each other. Separate networks were trained from scratch until convergence using identical training, validation, and testing sets from the T2wFS MRI parotid dataset with the following loss settings:

  1. CT to MR global intensity discriminator loss LadvCM ( (4) ): Only a global adversarial loss was computed using the discriminator DM for CT to MRI I2I translation.

  2. CT to MR, MR to CT global intensity discriminators, and cycle losses LadvCM+LadvMC+Lcyc ( (4), (11), (12) ): Cycle consistency and global adversarial losses computed using DM and DC (for MR to CT translation) were used.

  3. Structure discriminator loss Lstruct ( (9), (10) ): Joint distribution (image and aggregated segmentation probability map) matching adversarial loss was computed from Dstruct.

  4. Structure and CT to MR global intensity discriminator losses Lstruct+LadvCM ( (9), (10), (4) ): Losses from setting 1 and 3 were combined.

  5. Structure, MR to CT global intensity discriminator, and cycle losses Lstruct+LadvMC+Lcyc ( (9), (10), (11), (12) ): Cycle loss and loss from setting 3 were combined.

  6. Structure, CT to MR, MR to CT global intensity discriminators, and cycle losses Lstruct+LadvCM+LadvMC+Lcyc: Default PSIGAN.

As shown in Table VI, the Lstruct loss alone leads to a clear performance improvement compared with the combination of global adversarial and cycle losses (LadvCM+LadvMC+Lcyc). Addition of either the CT to MR global adversarial loss (setting 4) or the cycle losses (setting 5) to the Lstruct loss resulted in equivalent performance improvements. PSIGAN, which combines all the losses, produced the most accurate segmentation. Segmentations produced on a representative case by the various networks trained in the aforementioned settings are shown in Supplementary Fig. 11.

TABLE VI.

Impact of each component in PSIGAN. RP: Right parotid; LP: Left parotid.

Setting LadvCM LadvMC Lcyc Lstruct DSC RP DSC LP
1) ✓ – – – 0.53±0.10 0.47±0.08
2) ✓ ✓ ✓ – 0.65±0.09 0.63±0.10
3) – – – ✓ 0.75±0.06 0.74±0.06
4) ✓ – – ✓ 0.77±0.05 0.77±0.04
5) – ✓ ✓ ✓ 0.77±0.04 0.77±0.06
6) ✓ ✓ ✓ ✓ 0.82±0.03 0.81±0.05

2). Impact of structure discriminator on I2I translation:

Fig. 10 shows I2I translations produced by networks trained without and with the structure discriminator on two example images, the first from the T2w SPIR MRI abdomen dataset and the second from the T2wFS MRI parotid dataset. The source CT image is shown for reference. As shown, the addition of the structure discriminator improved the contrast of the SOIs with respect to the background (Fig. 10(c)) and more accurately modeled internal characteristics, such as the regularity in the organization of blood vessels in the liver, as they appear on real MRI. Additional I2I translation results are in Supplementary Fig. 12.

Fig. 10.

Impact of Dstruct on the CT to MRI translation on parotid and abdomen T2w dataset.

A quantitative comparison of the distribution of MR signal intensities within the SOIs between the pseudo and real MRIs was computed using the Kullback-Leibler (KL) divergence metric3. The method trained without the structure discriminator produced a higher KL divergence of 1.5 within the parotid glands and 0.14 within the liver, whereas the method trained using the structure discriminator produced a KL divergence of 0.05 for the parotid glands and 0.018 for the liver.4
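A sketch of how such a comparison can be computed (bin count and smoothing are our illustrative choices): histogram the voxel intensities inside the SOI masks of the pseudo and real MRI sets and evaluate KL(pseudo ∥ real), i.e., the pseudo-MRI-to-MRI direction noted in the footnote.

```python
import numpy as np

def soi_kl_divergence(pseudo_vals: np.ndarray, real_vals: np.ndarray,
                      bins: int = 100, eps: float = 1e-10) -> float:
    """KL(pseudo || real) between SOI intensity histograms."""
    lo = min(pseudo_vals.min(), real_vals.min())
    hi = max(pseudo_vals.max(), real_vals.max())
    p, _ = np.histogram(pseudo_vals, bins=bins, range=(lo, hi))
    q, _ = np.histogram(real_vals, bins=bins, range=(lo, hi))
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)))
```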

G. Evolution of segmentation probability maps during training

Fig. 11 shows the evolution of the segmentation probability maps produced on representative examples from the three analyzed datasets during the early epochs of training. As seen, the various organs and the tumor are correctly detected, and the segmentation probabilities improve, becoming sharper and more focused with training. Higher probabilities are indicated in red, while lower probabilities correspond to blue. These maps clearly indicate their usefulness for constraining I2I translation after training for only a few epochs.

Fig. 11.

Example PSIGAN segmentation probability maps produced during the early training epochs. First column shows expert segmentations on MRI. The structure segmentation probabilities (blue - low probability, and red - high probability) steadily increase with training.

V. Discussion

We introduced PSIGAN, a joint distribution matching method for unsupervised domain adaptation-based multiple organ segmentation. Our method produced accuracy highly similar to the supervised method on three different datasets, indicating its ability to learn without requiring target modality labeled image sets. Our method also showed the feasibility of segmenting CT scans when performing T1w to CT and T2w to CT UDA segmentation.

Joint distribution matching has previously been used to constrain the space of GAN mappings by learning the bidirectional mapping from an image to a scalar latent variable [23], [22] or from an image to a vector of class categories [24]. To our knowledge, ours is the first to perform joint distribution matching of images and segmentation probability maps for UDA segmentation.

We conducted ablation and network design experiments to determine the utility of the joint distribution matching structure discriminator for both I2I translation and segmentation. Our results show clear performance improvements when using joint distribution matching adversarial losses. Consistent with prior work that computed adversarial losses using segmentation [21], [8] or with a joint translation-segmentation network [3], [14], we also found that features extracted by adversarial discriminators using only the segmentation maps showed a strong preference for SOI geometry, by emphasizing features at the SOI boundary. On the other hand, the joint-distribution matching discriminator heavily weighted features corresponding to both the geometry and the appearance within the organs.

We also found that joint distribution formulations that used aggregated segmentation probability maps yielded more accurate segmentations than formulations using either multi-channel segmentation maps or SOI-specific structure discriminators. Performance improvement in the aggregated case could have resulted from increased contextual information from the other organs that was available to the structure discriminator.

Finally, the split segmentor showed a small improvement in accuracy over the single segmentor. However, the choice of which split sub-network is used for MRI segmentation clearly impacted accuracy. More specifically, the network that was used for computing the discriminator gradients was more accurate than the one used for computing the generator gradients.

Our idea of using segmentation probability maps to guide the translation is similar in principle to attention-guided translation methods [47], [48], which iteratively focus the domain translation network onto regions of interest and produce the desired translation. The main difference is that our method handles simultaneous translation of multiple target and background structures, while attention-guided methods are typically restricted to transfiguring a single foreground object. Also, as the optimization of the segmentation network is done in a supervised manner, pre-specified image-to-target relationships can be easily extracted through a fast-converging network to constrain translation. Deriving unsupervised attention information, on the other hand, would require pre-training the self-attention network for several epochs before it can be combined with the generator, as shown in [48].

A deficiency of our method, common to most UDA methods, is the inability to handle expert delineation variabilities across modalities, which may arise because the visibility of structures can vary across modalities. Preliminary evaluation of our method on an external dataset with a variety of MR contrasts and scanning parameters indicates that it is possibly robust to MR contrast variations. However, extensive validation and potential extension to handle large MR contrast variations on much bigger cohorts are needed and are left for future work. Nevertheless, our method outperformed multiple state-of-the-art methods.

VI. Conclusion

We developed and evaluated a new unpaired domain adaptation segmentation approach using a joint distribution matching structure discriminator for multiple organ segmentation on MRI datasets. Our approach outperformed multiple state-of-the-art methods and demonstrated the value of the structure discriminator in improving I2I translation and segmentation.

Supplementary Material


Acknowledgments

This work was supported by the MSK Cancer Center core grant P30 CA008748.

Footnotes

1

We only show the components related to our contribution for simpler explanation. Other parts like DC, GMC, DM are used as done in CycleGAN.

3

The metric was computed in the pseudo MRI to MRI direction.

4

We chose the liver as it is the largest organ and is highly textured, allowing better quantification of errors for both methods on the CHAOS dataset.

References

  • [1] Kupelian P and Sonke J, "Magnetic-resonance guided adaptive radiotherapy: a solution to the future," Semin Radiat Oncol, vol. 24, no. 3, pp. 227–232, 2014.
  • [2] Bainbridge H, Salem A, Tijssen R, Dubec M, Wetscherek A, Van EC et al., "Magnetic resonance imaging in precision radiation therapy for lung cancer," Transl Lung Cancer Research, vol. 6, no. 6, pp. 689–707, 2017.
  • [3] Huo Y, Xu Z, Moon H, Bao S, Assad A, Moyo TK et al., "SynSeg-Net: Synthetic segmentation without target modality ground truth," IEEE Trans. Med. Imaging, vol. 34, no. 4, pp. 1016–1025, 2018.
  • [4] Kamnitsas K, Baumgartner C, Ledig C, Newcombe V, Simpson J, Kane A et al., "Unsupervised domain adaptation in brain lesion segmentation with adversarial networks," in Information Processing in Medical Imaging, 2017, pp. 597–609.
  • [5] Zhu Q, Du B, and Yan P, "Boundary weighted domain adaptive neural network for prostate MR image segmentation," IEEE Trans. Med. Imaging, no. 3, pp. 753–763, 2019.
  • [6] Zhang Z, Yang L, and Zheng Y, "Translating and segmenting multimodal medical volumes with cycle- and shape-consistency generative adversarial network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 9242–9251.
  • [7] Ouyang C, Kamnitsas K, Biffi C, Duan J, and Rueckert D, "Data efficient unsupervised domain adaptation for cross-modality image segmentation," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent., 2019, pp. 669–677.
  • [8] Li Y, Yuan L, and Vasconcelos N, "Bidirectional learning for domain adaptation of semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 6936–6945.
  • [9] Dou Q, Ouyang C, Chen C, Chen H, Glocker B, Zhuang X et al., "PnP-AdaNet: Plug-and-play adversarial domain adaptation network at unpaired cross-modality cardiac segmentation," IEEE Access, vol. 7, pp. 99065–99076, 2019.
  • [10] Jiang J, Hu Y-C, Tyagi N, Zhang P, Rimner A, Mageras GS et al., "Tumor-aware, adversarial domain adaptation from CT to MRI for lung cancer segmentation," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent., 2018, pp. 777–785.
  • [11] Zhu J-Y, Park T, Isola P, and Efros A, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2017, pp. 2223–2232.
  • [12] Lee H-Y, Tseng H-Y, Huang J-B, Singh M, and Yang M-H, "Diverse image-to-image translation via disentangled representations," in Proc. Euro. Conf. Comput. Vis., 2018, pp. 35–51.
  • [13] Yang J, Dvornek NC, Zhang F, Chapiro J, Lin M, and Duncan JS, "Unsupervised domain adaptation via disentangled representations: Application to cross-modality liver segmentation," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent., 2019, pp. 255–263.
  • [14] Zhang Z, Yang L, and Zheng Y, "Translating and segmenting multimodal medical volumes with cycle- and shape-consistency generative adversarial network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 9242–9251.
  • [15] Bousmalis K, Silberman N, Dohan D, Erhan D, and Krishnan D, "Unsupervised pixel-level domain adaptation with generative adversarial networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 3722–3731.
  • [16] Hoffman J, Tzeng E, Park T, Zhu J-Y, Isola P, Saenko K et al., "CyCADA: Cycle-consistent adversarial domain adaptation," in Proc. Int. Conf. Machine Learning, vol. 80, 2018, pp. 1989–1998.
  • [17] Chen C, Dou Q, Chen H, Qin J, and Heng P-A, "Synergistic image and feature adaptation: Towards cross-modality domain adaptation for medical image segmentation," arXiv preprint arXiv:1901.08211, 2019.
  • [18] Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, and Chen X, "Improved techniques for training GANs," in Proc. Adv. Neural Inf. Process. Syst., 2016, pp. 2234–2242.
  • [19] Chen C, Dou Q, Chen H, and Heng P-A, "Semantic-aware generative adversarial nets for unsupervised domain adaptation in chest X-ray segmentation," in Machine Learning in Medical Imaging, 2018, pp. 143–151.
  • [20] Tsai Y-H, Sohn K, Schulter S, and Chandraker M, "Domain adaptation for structured output via discriminative patch representations," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 1456–1465.
  • [21] Vu T-H, Jain H, Bucher M, Cord M, and Pérez P, "ADVENT: Adversarial entropy minimization for domain adaptation in semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 2517–2526.
  • [22] Li C, Liu H, Chen C, Pu Y, Chen L, Henao R et al., "ALICE: Towards understanding adversarial learning for joint distribution matching," in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 5495–5503.
  • [23] Donahue J, Krähenbühl P, and Darrell T, "Adversarial feature learning," arXiv preprint arXiv:1605.09782, 2016.
  • [24] Chongxuan L, Xu T, Zhu J, and Zhang B, "Triple generative adversarial nets," in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 4088–4098.
  • [25] Dou Q, Ouyang C, Chen C, Chen H, and Heng P-A, "Unsupervised cross-modality domain adaptation of ConvNets for biomedical image segmentations with adversarial loss," arXiv preprint arXiv:1804.10916, 2018.
  • [26] Joyce T, Chartsias A, and Tsaftaris SA, "Deep multi-class segmentation without ground-truth labels," in Proc. Int. Conf. Medical Imaging with Deep Learning, 2018.
  • [27] Dong N, Kampffmeyer M, Liang X, Wang Z, Dai W, and Xing E, "Unsupervised domain adaptation for automatic estimation of cardiothoracic ratio," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent., 2018, pp. 544–552.
  • [28] Tsai Y-H, Hung W-C, Schulter S, Sohn K, Yang M-H, and Chandraker M, "Learning to adapt structured output space for semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 7472–7481.
  • [29] Murez Z, Kolouri S, Kriegman D, Ramamoorthi R, and Kim K, "Image to image translation for domain adaptation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 4500–4509.
  • [30] Liu F, "SUSAN: Segment unannotated image structure using adversarial network," Magnetic Resonance in Medicine, vol. 81, no. 5, pp. 3330–3345, 2019.
  • [31] Zhao H, Li H, Maurer-Stroh S, Guo Y, Deng Q, and Cheng L, "Supervised segmentation of un-annotated retinal fundus images by synthesis," IEEE Trans. Med. Imaging, vol. 38, no. 1, pp. 46–56, 2018.
  • [32] Cai J, Zhang Z, Cui L, Zheng Y, and Yang L, "Towards cross-modal organ translation and segmentation: A cycle- and shape-consistent generative adversarial network," Med. Image Anal., vol. 52, pp. 174–184, 2018.
  • [33] Cohen JP, Margaux L, and Sina H, "Distribution matching losses can hallucinate features in medical image translation," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent., 2018, pp. 529–536.
  • [34] Armanious K, Yang C, Fischer M, Küstner T, Nikolaou K, Gatidis S et al., "MedGAN: Medical image translation using GANs," arXiv preprint arXiv:1806.06397, 2018.
  • [35] Kingma DP and Ba J, "Adam: A method for stochastic optimization," in Proc. 3rd Int. Conf. Learning Representations, 2015.
  • [36] Radford A, Metz L, and Chintala S, "Unsupervised representation learning with deep convolutional generative adversarial networks," arXiv preprint arXiv:1511.06434, 2015.
  • [37] Isola P, Zhu J-Y, Zhou T, and Efros AA, "Image-to-image translation with conditional adversarial networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 1125–1134.
  • [38] Ronneberger O, Fischer P, and Brox T, "U-Net: Convolutional networks for biomedical image segmentation," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent., 2015, pp. 234–241.
  • [39] Menze BH, Jakab A, Bauer S, Kalpathy-Cramer J, Farahani K, Kirby J et al., "The multimodal brain tumor image segmentation benchmark (BRATS)," IEEE Trans. Med. Imaging, vol. 34, no. 10, pp. 1993–2024, 2015.
  • [40] Liu M-Y, Breuel T, and Kautz J, "Unsupervised image-to-image translation networks," in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 700–708.
  • [41] Nyúl LG and Udupa JK, "On standardizing the MR image intensity scale," Magnetic Resonance in Medicine, vol. 42, no. 6, pp. 1072–1081, 1999.
  • [42] Kavur AE, Selver MA, Dicle O, Barış M, and Gezer NS, "CHAOS - Combined (CT-MR) Healthy Abdominal Organ Segmentation Challenge Data," Apr. 2019. [Online]. Available: https://doi.org/10.5281/zenodo.3362844
  • [43] Landman B, Xu Z, Iglesias J, Styner M, Langerak T, and Klein A, "MICCAI multi-atlas labeling beyond the cranial vault - workshop and challenge," 2015.
  • [44] Erickson B, Kirk S, Lee Y, Bathe O, Kearns M, Gerdes C et al., "Radiology data from the cancer genome atlas liver hepatocellular carcinoma [TCGA-LIHC] collection," The Cancer Imaging Archive, 2016.
  • [45] Raudaschl PF, Zaffino P, Sharp GC, Spadea MF, Chen A, Dawant BM et al., "Evaluation of segmentation methods on head and neck CT: Auto-segmentation challenge 2015," Medical Physics, vol. 44, no. 5, pp. 2020–2036, 2017.
  • [46] Aerts HJ, Velazquez ER, Leijenaar RT, Parmar C, Grossmann P, Carvalho S et al., "Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach," Nature Communications, vol. 5, p. 4006, 2014.
  • [47] Mejjati YA, Richardt C, Tompkin J, Cosker D, and Kim KI, "Unsupervised attention-guided image-to-image translation," in Proc. Adv. Neural Inf. Process. Syst., 2018, pp. 3693–3703.
  • [48] Zhang H, Goodfellow I, Metaxas D, and Odena A, "Self-attention generative adversarial networks," in Proc. Int. Conf. Machine Learning, vol. 97, 2019, pp. 7354–7363.
