Abstract
Two key questions in cardiac image analysis are to assess the anatomy and motion of the heart from images, and to understand how they are associated with non-imaging clinical factors such as gender, age and diseases. While the first question can often be addressed by image segmentation and motion tracking algorithms, our capability to model and answer the second question is still limited. In this work, we propose a novel conditional generative model to describe the 4D spatio-temporal anatomy of the heart and its interaction with non-imaging clinical factors. The clinical factors are integrated as the conditions of the generative modelling, which allows us to investigate how these factors influence the cardiac anatomy. We evaluate the model performance mainly on two tasks: anatomical sequence completion and sequence generation. The model achieves high performance in anatomical sequence completion, comparable to or outperforming other state-of-the-art generative models. In terms of sequence generation, given clinical conditions, the model can generate realistic synthetic 4D sequential anatomies that share similar distributions with the real data. The code and the trained generative model are available at https://github.com/MengyunQ/CHeart.
Index Terms: Conditional generative model, synthetic data generation, cardiac image analysis, cardiac anatomy and motion
I. Introduction
CARDIAC imaging plays an essential role in cardiovascular disease diagnosis and management [10]. Imaging modalities such as cine cardiac magnetic resonance (CMR) or ultrasound scans reveal the anatomical structure of the heart as well as its contracting and relaxing patterns [26]. A classical but long-standing research problem is to explore the associations between the three-dimensional (3D) cardiac anatomy and other non-imaging clinical factors, such as age, gender and diseases [5]. Besides 3D anatomical information, the temporal dynamic motion of the heart also contains information that is useful for clinical diagnosis and therapy selection [20], [32], [47]. It is of particular interest to develop computational tools that can bridge spatial-temporal imaging features and non-imaging clinical factors. In this work, we aim to improve our understanding of the spatial-temporal cardiac anatomy and clinical factors from a generative modelling perspective. We propose a conditional generative model to describe the interaction between imaging features and clinical factors. Given clinical factors as conditions, the proposed model can generate corresponding 4D spatial-temporal cardiac anatomies. We demonstrate that the generated 4D anatomies are realistic and consistent with the real data distribution.
Lately, the field of conditional generative modelling has made tremendous progress, greatly driven by deep learning methods such as conditional generative adversarial networks (GAN) [34], conditional variational autoencoders (VAEs) [29], [48], flow-based models [41] and diffusion models [36]. These approaches enable efficient approximation of the underlying conditional distributions and generation of high-quality samples. Improvements in conditional generative models have been characterised by numerous developments in different generation tasks: image-to-image translation [13], [24], [27], style and lyrics-to-music generation [16] and text-to-image synthesis [12].
Apart from generating static images [36], generative models have also been applied to sequential data, such as videos [46], [53] and music [16]. In these applications, it is important to learn a model that is able to capture the inner connection of temporal sequences. To this end, long short-term memory (LSTM) [28], [52] and transformers [56] have been explored to learn the sequential progression of the latent representations of the samples. Some work also introduces spatiotemporal convolution and attention layers to learn temporal world dynamics from a collection of videos [46]. Sequential data contain both structural variations and motion information. Disentangled representation learning approaches such as DiSCVAE [59] have been proposed to separate representations of the motion features from the structural features.
In the field of medical imaging, several papers have explored incorporating non-imaging clinical factors into the image generation process. Dalca et al. [15] proposed a learning framework for building deformable brain image templates conditioned on age. Xia et al. [54] developed a model to generate synthetic brain images conditioned on age and the status of Alzheimer's disease. For cardiac images, Biffi et al. [7] presented LVAE for interpretable classification of anatomical shapes into different clinical conditions. Krebs et al. [30] proposed to learn a probabilistic motion model for spatio-temporal cardiac image registration. Reynaud et al. [40] proposed a causal generative model to generate synthetic 3D ultrasound videos conditioned on a given input image and an expected ejection fraction. Campello et al. [9] proposed a conditional generative model in cardiac imaging to extract longitudinal patterns related to ageing. Duchateau et al. [17] built a scheme for synthesising pathological cardiac sequences from real healthy sequences. Amirrajab et al. [1] developed a framework for simulating cardiac MRI with variable anatomical and imaging characteristics. Regarding temporal modelling of the heart, several works [57], [60], [61] showed that dynamic cardiac data can be described by low-dimensional latent representations, e.g. by using a conditional autoencoder to capture latent representations of the data [61] or by applying temporal smoothness as a regularisation term in the reconstruction loss function [60], [61]. These works provide useful insights for conditional medical image generation. However, the generation of a sequence of spatial-temporal cardiac anatomies from multiple clinical factors has been less explored.
In this work, we propose a conditional generative model that can generate realistic cardiac anatomical sequences conditioned on non-imaging factors including age, gender, weight, height and blood pressure. We name this Conditional Heart generation model CHeart. The model employs a variational autoencoder to learn the latent representations of cardiac anatomies and a condition encoder to embed the clinical conditions into a condition latent vector. A temporal module is then designed to generate the condition-related sequential latent codes based on the anatomy latent representations and the condition latent vector. The proposed model demonstrates high diversity and fidelity in generation, evaluated using structural overlap and surface distance metrics, as well as the distributions of clinical measures (ventricular volumes and mass). The main contributions of this work are summarised as follows:
We propose a spatial-temporal generative model for 3D cardiac anatomy that accounts for both the spatial variations and the temporal variations, i.e. motion, during the cardiac cycle.
We leverage both imaging data and non-imaging clinical data to train the model, which allows the model to generate cardiac anatomical sequences conditioned on multiple clinical factors.
We introduce a temporal module into the latent space of cardiac anatomy and conditions to model the complex sequential patterns of a beating heart.
We demonstrate that the model can generate highly realistic and diverse cardiac anatomical sequences that follow the real data distributions.
II. Methods
The proposed generative model takes non-imaging clinical factors as input and generates a cardiac anatomical sequence. Fig. 1 illustrates the overall framework. The following sections provide more technical details. First, we introduce the conditional generative model. Then, we describe the temporal module for learning the sequential latent representations due to cardiac motion. Lastly, we demonstrate two applications of the generative model at the inference stage: anatomical sequence completion and anatomical sequence generation.
Fig. 1.
Overview of the CHeart model including training and inference stages. During training, an encoder is applied to learn the latent representations zc and z0 for the clinical conditions c and the anatomy at the first time frame x0. A temporal module models the trajectory in the latent space across the temporal dimension, starting from the initial latent vectors zc and z0. The decoder then generates the 4D cardiac anatomy sequence x0:T–1 from the latent vectors on the temporal trajectory. The training process enables two inference mechanisms at test time: sequence completion and sequence generation. In sequence completion, the model is given x0 and c, and generates the remaining sequence of anatomies in the cardiac cycle. In sequence generation, a random latent code z0 sampled from the prior distribution and the conditions c are given to the model; the temporal module generates the sequence of latent vectors, which are used to generate a synthetic cardiac anatomical sequence.
A. Conditional generative model
Assume that we observe a dynamic sequence of anatomies of a subject, xt (t = 0, 1, …, T – 1), where xt denotes the anatomical segmentation at the t-th frame and T denotes the total number of time frames in a sequence. We also observe some clinical conditions c for this subject, where c could include factors such as age, gender, weight, height, blood pressure etc. Our aim is to learn the probability distribution of the anatomy x conditioned on c with a chosen model, pθ(x|c), where θ denotes the model parameters. We seek a model pθ(x|c) that is sufficiently flexible to describe the data x. Deep neural networks have often been used for this modelling due to their complex modelling capacity [21], [29], [48]. Without loss of generality, we first attempt to learn the distribution of the anatomy at the first time frame, pθ(x0|c), which is often the end-diastolic (ED) frame in cardiac imaging.
We adopt the conditional β-VAE model [21], [29], [48] to learn the data distribution. The condition c is embedded as a condition latent vector zc by a multi-layer perceptron (MLP), which integrates multiple clinical factors and enables exploration across the conditional latent space. The model consists of a decoder pθ(x0|z0, zc) and an encoder qϕ(z0|x0, zc). The decoder pθ(x0|z0, zc) with parameters θ maps the latent variables z0, zc to the anatomy x0. We assume a prior distribution p(z0) over the latent variable z0. The prior and the decoder together define a joint distribution, denoted as pθ(x0, z0|zc), which is parameterised by θ.
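To make these components concrete, below is a minimal PyTorch sketch of a condition encoder, a 3D convolutional encoder and a decoder; the class names, layer sizes and pooling choices are illustrative assumptions rather than the exact CHeart architecture (which is described in Section III-B).

```python
import torch
import torch.nn as nn

class ConditionEncoder(nn.Module):
    """MLP that embeds clinical factors c into a condition latent vector z_c (illustrative)."""
    def __init__(self, num_conditions: int, latent_dim: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_conditions, 64), nn.ReLU(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, c):
        return self.mlp(c)

class Encoder(nn.Module):
    """q_phi(z0 | x0, zc): maps an anatomy and condition embedding to a Gaussian posterior."""
    def __init__(self, in_channels: int = 4, latent_dim: int = 32, cond_dim: int = 32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_channels, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.fc_mu = nn.Linear(32 + cond_dim, latent_dim)
        self.fc_logvar = nn.Linear(32 + cond_dim, latent_dim)

    def forward(self, x0, zc):
        h = torch.cat([self.conv(x0), zc], dim=1)
        return self.fc_mu(h), self.fc_logvar(h)

class Decoder(nn.Module):
    """p_theta(x | z): maps a latent code back to a segmentation logit volume."""
    def __init__(self, latent_dim: int = 64, out_channels: int = 4):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 32 * 4 * 4 * 2)
        self.deconv = nn.Sequential(
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, out_channels, 4, stride=2, padding=1),
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 32, 4, 4, 2)
        return self.deconv(h)  # per-voxel class logits
```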
To turn the intractable posterior inference and learning problem into a tractable problem, we introduce a parametric encoder model qϕ(z0|x0, zc) with ϕ as the variational parameters, which approximates the true but intractable posterior distribution pθ(z0|x0, zc) of the generative model, given an input x0 and condition space zc:
$$q_\phi(z_0 \mid x_0, z_c) \approx p_\theta(z_0 \mid x_0, z_c) \tag{1}$$
where qϕ(z0|x0, zc) often adopts a simpler form, e.g. the Gaussian distribution. By introducing the approximate posterior qϕ(z0|x0, zc), the log-likelihood of pθ(x0|zc) can be formulated as:
$$\log p_\theta(x_0 \mid z_c) = \mathbb{E}_{q_\phi(z_0 \mid x_0, z_c)}\!\left[\log \frac{p_\theta(x_0, z_0 \mid z_c)}{q_\phi(z_0 \mid x_0, z_c)}\right] + D_{KL}\!\left(q_\phi(z_0 \mid x_0, z_c) \,\|\, p_\theta(z_0 \mid x_0, z_c)\right) \tag{2}$$
where the second term denotes the Kullback-Leibler (KL) divergence DKL(qϕ || pθ) between qϕ(z0|x0, zc) and pθ(z0|x0, zc). It is non-negative and zero only if the approximate posterior qϕ(z0|x0, zc) equals the true posterior distribution pθ(z0|x0, zc). Due to the non-negativity of the KL divergence, the first term in Eq. 2 is a lower bound of the evidence log pθ(x0|zc), known as the evidence lower bound (ELBO). Instead of optimising the evidence log pθ(x0|zc), which is often intractable, we optimise the ELBO:
$$\mathcal{L}_{ELBO} = \mathbb{E}_{q_\phi(z_0 \mid x_0, z_c)}\!\left[\log p_\theta(x_0 \mid z_0, z_c)\right] - D_{KL}\!\left(q_\phi(z_0 \mid x_0, z_c) \,\|\, p(z_0)\right) \tag{3}$$
To better control the encoding representation capacity and encourage more efficient latent encoding, we adopt β-VAE by modifying VAE with an adjustable hyperparameter β [21]. As a result, the loss function of the generative model is formulated as:
$$\mathcal{L}(\theta, \phi; x_0, z_c) = -\,\mathbb{E}_{q_\phi(z_0 \mid x_0, z_c)}\!\left[\log p_\theta(x_0 \mid z_0, z_c)\right] + \beta \, D_{KL}\!\left(q_\phi(z_0 \mid x_0, z_c) \,\|\, p(z_0)\right) \tag{4}$$
where the sign is negated so that the loss function can be minimised.
In practice, we use the reconstruction loss for the first term, i.e. how accurately the generative model can reconstruct the anatomy x0 from the latent vector z0 using the decoder. The reparameterization trick is applied to replace the sampling in the expectation: the random variable z0 ~ qϕ(z0|x0, zc) is expressed as a differentiable and invertible transformation of another random variable ϵ, so that the expectation no longer depends on qϕ itself.
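For illustration, here is a minimal PyTorch sketch of the reparameterized sampling and the loss of Eq. 4 for a single frame, assuming a diagonal Gaussian posterior, a standard Gaussian prior, and cross-entropy as the reconstruction term (as used later in Section II-B); the function names are ours.

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    # z = mu + sigma * eps with eps ~ N(0, I), so the expectation no longer depends on phi
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps

def beta_vae_loss(logits, target, mu, logvar, beta=0.001):
    # Reconstruction term: cross-entropy between predicted and ground-truth label maps
    recon = F.cross_entropy(logits, target)
    # KL(q(z|x, zc) || N(0, I)) in closed form for a diagonal Gaussian posterior
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```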
B. Motion modelling in the latent space
In the previous section, we modelled qϕ(z0|x0, zc) and pθ(x0|z0, zc) for the first frame x0 in a sequence. Here, to model the whole anatomical sequence x0, x1, …, xT–1 conditioned on the clinical conditions c, we design a temporal module constructed using a one-to-many LSTM structure [49] with parameters ω, which generates the condition-related sequential latent codes based on z0 and zc. The detailed structure of the temporal module is illustrated in Fig. 2.
Fig. 2.
The temporal module for generating the sequential latent codes z0:T–1, constructed with a one-to-many long short-term memory (LSTM) structure.
LSTM [22] is a variant of recurrent neural networks that consists of gating mechanisms and cell memory blocks. The first LSTM cell of the module takes the concatenation of the anatomy latent representation z0 and the condition latent representation zc as input, denoted as z̃0. With the hidden state h0 and cell state cell0 initialised to zero, it infers the latent code z̃1 at the next time frame. Each following LSTM cell takes z̃t as input, updates the hidden state ht and cell state cellt, and infers the latent code z̃t+1. All the LSTM cells share their weights. Each latent code z̃t contains information of both the anatomy at time t and the clinical conditions c, so the cardiac anatomy of a dynamic sequence forms a temporal trajectory z̃t (t = 0, 1, …, T – 1) in the latent space. After the temporal module computes the latent codes across all the time frames, the decoder generates the anatomical sequence x0:T–1 from z̃0:T–1, as illustrated in Fig. 1.
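A minimal sketch of such a one-to-many LSTM roll-out in PyTorch is given below; reading the next latent code directly from the hidden state is a simplifying assumption, not necessarily the exact CHeart readout.

```python
import torch
import torch.nn as nn

class TemporalModule(nn.Module):
    """One-to-many LSTM that rolls a single latent code out into a sequence (illustrative)."""
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.cell = nn.LSTMCell(latent_dim, latent_dim)  # one cell, shared across time

    def forward(self, z0, num_frames: int = 20):
        batch = z0.size(0)
        h = z0.new_zeros(batch, self.cell.hidden_size)   # hidden state initialised to zero
        c = z0.new_zeros(batch, self.cell.hidden_size)   # cell state initialised to zero
        z_t, sequence = z0, [z0]
        for _ in range(num_frames - 1):
            h, c = self.cell(z_t, (h, c))
            z_t = h                                      # hidden state read out as next latent
            sequence.append(z_t)
        return torch.stack(sequence, dim=1)              # (batch, num_frames, latent_dim)
```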
The overall loss function for modelling the anatomical sequence generation is formulated based on Eq. 4:

$$\mathcal{L}_{seq} = -\sum_{t=0}^{T-1} \mathbb{E}_{q_\phi(z_0 \mid x_0, z_c)}\!\left[\log p_\theta(x_t \mid \tilde{z}_t)\right] + \beta \, D_{KL}\!\left(q_\phi(z_0 \mid x_0, z_c) \,\|\, p(z_0)\right) \tag{5}$$

where z̃t denotes the latent code at time frame t produced by the temporal module with parameters ω.
The training loss function is composed of two parts: 1) the reconstruction accuracy at all time frames, where we use cross-entropy to evaluate the performance in reconstructing the segmentation maps; 2) the KL divergence term, penalising the discrepancy between the approximate posterior and the prior distribution. The whole training process is performed end-to-end, with the encoder, temporal module and decoder trained together. The VAE enables the model to learn a low-dimensional latent space that captures the underlying anatomical variations. By incorporating the temporal module, the model can effectively capture the temporal dynamics in the cardiac images, enabling the generation of anatomically consistent and coherent sequences over time.
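Putting the pieces together, the following sketch shows one training step implementing Eq. 5, assuming the encoder, condition encoder, temporal module and decoder sketched above; the unweighted sum of per-frame cross-entropies is an assumption.

```python
import torch
import torch.nn.functional as F

def training_step(encoder, cond_encoder, temporal, decoder, x_seq, c, beta=0.001):
    """x_seq: (B, T, D, H, W) integer label maps; c: (B, num_conditions) clinical factors."""
    # One-hot encode the ED frame as the 4-channel encoder input
    x0 = F.one_hot(x_seq[:, 0], num_classes=4).permute(0, 4, 1, 2, 3).float()
    zc = cond_encoder(c)
    mu, logvar = encoder(x0, zc)
    z0 = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)        # reparameterization
    z_seq = temporal(torch.cat([z0, zc], dim=1), num_frames=x_seq.size(1))
    recon = 0.0
    for t in range(x_seq.size(1)):                                  # reconstruction at all frames
        logits = decoder(z_seq[:, t])
        recon = recon + F.cross_entropy(logits, x_seq[:, t])
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())   # KL to the N(0, I) prior
    return recon + beta * kl
```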
C. Inference
To demonstrate the performance of the proposed generative model at the inference stage, we carry out two benchmark tasks, namely anatomical sequence completion and anatomical sequence generation, as shown in the right panel of Fig. 1.
In anatomical sequence completion, the model is given the anatomy at the first time frame x0 and clinical conditions c. It is asked to generate the remaining sequence of anatomies across the cardiac cycle. The model maps x0 and c to their latent representations z0 and zc, predicts the sequential latent codes through the temporal module and finally reconstructs the full sequence of cardiac anatomy using the shared-weight decoders.
In anatomical sequence generation, the model is conditioned only on the clinical factors c and does not require any anatomy as input. Since the model has learnt the prior distribution p(z0) of the anatomical latent variable, we can draw a sample z0 in the latent space from the Gaussian distribution 𝒩(0, I) and concatenate it with the clinical latent code zc. We then provide the concatenated latent code to the temporal module to predict the sequential latent codes and generate the full anatomical sequence using the decoder.
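The two inference modes could be sketched as follows, reusing the modules from the earlier sketches; taking the posterior mean at test time in sequence completion is an assumption.

```python
import torch

@torch.no_grad()
def complete_sequence(encoder, cond_encoder, temporal, decoder, x0, c, T=20):
    """Sequence completion: given the ED anatomy x0 and conditions c, predict frames 1..T-1."""
    zc = cond_encoder(c)
    mu, _ = encoder(x0, zc)                        # use the posterior mean at test time
    z_seq = temporal(torch.cat([mu, zc], dim=1), num_frames=T)
    return [decoder(z_seq[:, t]).argmax(dim=1) for t in range(T)]

@torch.no_grad()
def generate_sequence(cond_encoder, temporal, decoder, c, latent_dim=32, T=20):
    """Sequence generation: sample z0 from the prior, no input anatomy required."""
    zc = cond_encoder(c)
    z0 = torch.randn(c.size(0), latent_dim)        # z0 ~ N(0, I)
    z_seq = temporal(torch.cat([z0, zc], dim=1), num_frames=T)
    return [decoder(z_seq[:, t]).argmax(dim=1) for t in range(T)]
```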
D. Evaluation
To evaluate the conditional generative model, we use quantitative measures to assess the generated anatomy, as well as clinical measures to assess the distribution similarity.
First, we employ the Dice coefficient, the Hausdorff distance (HD) and the average symmetric surface distance (ASSD) which compare the similarity of the generated cardiac anatomy to the ground truth anatomy associated with the same clinical conditions.
Second, we derive five imaging phenotypes including the left ventricular myocardial mass (LVM), LV end-diastolic volume (LVEDV), LV end-systolic volume (LVESV), right ventricular end-diastolic volume (RVEDV) and RV end-systolic volume (RVESV). We evaluate differences between generated data and real data with the same clinical conditions, denoted as dphenotype. Furthermore, these phenotypes are closely associated with age and gender [5]. We calculate the distributions of the imaging phenotypes against age and gender, and compare the generated data to the real data. The comparison is illustrated qualitatively using density plots and quantitatively using the Kullback–Leibler (KL) divergence and Wasserstein distance (WD). The KL divergence [14] is an information-theoretic measurement of the similarity between two probability mass functions. Similarly, WD [2] measures the distance between two probability distributions and can be computed as:
$$WD(P, Q) = \inf_{\gamma \in \Pi(P, Q)} \mathbb{E}_{(u, v) \sim \gamma}\!\left[\lVert u - v \rVert\right] \tag{6}$$
where Π(P, Q) is the set of all joint distributions γ(u, v) whose marginals are P and Q. WD can be seen as the minimum work needed to transform one distribution into the other, where work is defined as the amount of probability mass that must be moved from u to v to transform P into Q, multiplied by the distance it is moved.
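As an illustration, the following sketch computes the two distribution distances for one clinical measure using SciPy and a histogram-based KL estimate; the binning scheme is an assumption, since the estimator is not specified in the text.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def distribution_distances(real, synthetic, bins=50):
    """Compare 1D distributions (NumPy arrays) of a clinical measure, e.g. LVEDV."""
    wd = wasserstein_distance(real, synthetic)
    # Histogram-based KL divergence; a small epsilon avoids log(0) in empty bins
    lo, hi = min(real.min(), synthetic.min()), max(real.max(), synthetic.max())
    p, _ = np.histogram(synthetic, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(real, bins=bins, range=(lo, hi), density=True)
    p, q = p + 1e-10, q + 1e-10
    p, q = p / p.sum(), q / q.sum()
    kl = float(np.sum(p * np.log(p / q)))
    return kl, wd
```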
III. Experiments
A. Data sets
A short-axis 3D cardiac MR dataset of 1,383 subjects was used, acquired from Hammersmith Hospital, Imperial College London. Each cardiac cine image sequence comprises 20 time frames (T = 20) covering one complete cardiac cycle, with a spatial resolution of 1.25 mm × 1.25 mm × 2 mm. The temporal resolution ranges from 0.041 to 0.048 seconds per frame, accommodating variations in heart rate. The cardiac anatomy is described by the image segmentation map with four labels: background, the left ventricle (LV), the myocardium (Myo) and the right ventricle (RV). Ground truth segmentations at the end-diastolic (ED) and end-systolic (ES) frames were generated using a multi-atlas segmentation method [3], then quality controlled and manually corrected by an experienced cardiologist using ITK-SNAP [58]. A state-of-the-art nnU-Net model [23] was trained on the ED and ES segmentations and then deployed to all time frames to generate the 3D-t segmentations, followed by manual quality control. To eliminate the influence of image orientation on the generation, all 3D-t segmentations were rigidly aligned to a template space using MIRTK [42], [44] and cropped to a standard size of 128 × 128 × 64. In this way, the generative model focuses on learning subject-specific variations of the anatomy instead of image orientations.
In terms of demographic information, all subjects were healthy volunteers, comprising 775 females and 608 males, aged between 18 and 73 years, weighing between 33 and 131 kg, with heights between 142 and 195 cm and systolic blood pressure (SBP) between 79 and 183 mmHg. When incorporating the clinical information into the model, age was represented as a categorical factor with seven age groups at an interval of 10 years, from 10 to 80 years old. The dataset was randomly split into three subsets for training (n = 968), validation (n = 138) and test (n = 277).
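As an illustration of how such mixed categorical and continuous factors could be assembled into a condition vector c, here is a minimal sketch; the one-hot age binning follows the text, while the min-max scaling of the continuous factors (using the dataset ranges above) is an assumption rather than the exact encoding used in the model.

```python
import numpy as np

def encode_conditions(age, sex, weight, height, sbp):
    """Build a condition vector c: age as a one-hot over seven 10-year bins (10-80 years),
    sex as a binary flag, and the continuous factors min-max scaled (illustrative)."""
    age_onehot = np.zeros(7)
    age_onehot[int(np.clip((age - 10) // 10, 0, 6))] = 1.0
    continuous = np.array([
        (weight - 33.0) / (131.0 - 33.0),    # weight range 33-131 kg
        (height - 142.0) / (195.0 - 142.0),  # height range 142-195 cm
        (sbp - 79.0) / (183.0 - 79.0),       # SBP range 79-183 mmHg
    ])
    return np.concatenate([age_onehot, [float(sex)], continuous])
```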
B. Experimental setup
1). Implementation
The model was implemented in PyTorch [37]. The encoder qϕ consisted of four 3D convolution layers, one flatten layer and one bottleneck layer, outputting the latent code z0. The condition mapping network was constructed using an MLP, outputting the latent code zc for the input conditions c. A latent dimension of 32 was used for both z0 and zc, and 64 for the concatenated latent vector z̃. The decoder consisted of one flatten layer and four 3D transposed convolution layers. All convolution and transposed convolution layers in the encoder and the decoder used a kernel size of 4. The temporal module was built with a single layer of LSTM cells. The regularisation weight β in the β-VAE was set to 0.001. The model was trained using the Adam optimiser with a learning rate of 5 × 10−4 and a batch size of 8. It was trained for 500 epochs with an early stopping criterion based on the validation set performance. The training took 17 hours on an NVIDIA RTX A6000 GPU.
2). Baseline methods
Currently, there is no other existing work for performing conditional generation of 3D-t cardiac anatomies. For comparison, we implemented the following baseline generation methods developed in other application domains, extending them from 2D image generation to 3D-t data generation:
CGAN
A conditional version of the generative adversarial network (GAN) originally developed for MNIST images [34]. Note that the model can only perform cardiac sequence generation, not sequence completion.
CVAE
The conditional generative model CVAE [48], adapted to this application. CVAE incorporates the conditions by concatenating them with the anatomies in both the encoder and the decoder.
CVAE-GAN
A conditional variational generative adversarial network proposed in [6]. It is a general learning framework that combines a VAE with a GAN for synthesizing natural images in fine-grained categories.
PCA
Principal component analysis (PCA) [25], a classical method for dimensionality reduction that aims to preserve as much of the variation in the data as possible using the principal components. Note that PCA is only used for sequence completion, not for sequence generation.
C. Sequence completion
A well-known challenge in generative modelling is the difficulty of evaluation, as we normally do not have access to the ground truth data distribution, e.g. the distribution of all possible cardiac anatomies in our case. Therefore, we adopt anatomical sequence completion as a surrogate task for evaluating the model performance. The sequence completion experiments were conducted to assess the ability to capture the sequential information given the first frame of a cardiac anatomy sequence. One example of sequence completion is shown in Fig. 3. It can be seen in the figure that the generated anatomies across time frames maintain the same heart structures as the ED frame and capture the temporal motion pattern through time, contracting first and then expanding.
Fig. 3.
An example of sequence completion, arranged in two rows in left-to-right, top-to-bottom order. With the end-diastolic (ED) frame at time t = 0 and the conditions c as input, the model generates the remaining anatomical sequence at time frames t = 1-19, shown within the gray box. The top row depicts the anatomies at time frames t = 0-9 and the bottom row at t = 10-19.
The sequence completion accuracy is evaluated between the generated anatomy and ground truth across the whole sequence in terms of the Dice metric, HD and ASSD for three structures: LV, Myo and RV. Table I reports the sequence completion accuracy of the proposed model and compares it to other generative models including CVAE-GAN [6], CVAE [48] and PCA [25]. It shows that the proposed model achieves a good sequence completion accuracy with an average Dice metric of 0.874, HD of 5.842 mm and ASSD of 1.462 mm, which is comparable to or outperforms the other three generative models in most metrics. In addition, we conducted evaluations at the basal, mid-cavity, and apical slices. The proposed model achieved an average Dice metric of 0.929, 0.927, and 0.878 for LV at the three locations, surpassing the corresponding metrics of the other three generative models.
Table I. The Sequence Completion Performance of Different Models in terms of Dice, Hausdorff distance (HD), average symmetric surface distance (ASSD). Mean and standard deviation are reported. Asterisks indicate statistical significance (* : P ≤ 0.05) when using a paired Student’s t-test comparing the performance of the proposed method to other methods.
| Dice (unit: 1) | LV | Myo | RV | Average |
|---|---|---|---|---|
| CVAE-GAN [6] | 0.845*±0.028 | 0.697*±0.054 | 0.832*±0.028 | 0.791*±0.032 |
| CVAE [48] | 0.900±0.023 | 0.800*±0.040 | 0.894±0.023 | 0.864*±0.026 |
| PCA [25] | 0.906±0.022 | 0.810±0.038 | 0.901±0.023 | 0.872±0.025 |
| Proposed | 0.908±0.023 | 0.814±0.037 | 0.902±0.021 | 0.874±0.024 |

| HD (unit: mm) | LV | Myo | RV | Average |
|---|---|---|---|---|
| CVAE-GAN [6] | 10.361*±1.475 | 9.571*±1.379 | 14.070*±3.736 | 11.334*±1.849 |
| CVAE [48] | 5.920*±1.335 | 5.891*±1.055 | 6.525±1.076 | 6.112*±1.049 |
| PCA [25] | 5.517±1.029 | 5.710±1.125 | 6.165±1.072 | 5.797±0.978 |
| Proposed | 5.535±1.180 | 5.576±0.955 | 6.445±1.067 | 5.842±1.017 |

| ASSD (unit: mm) | LV | Myo | RV | Average |
|---|---|---|---|---|
| CVAE-GAN [6] | 2.120*±0.390 | 1.670*±0.236 | 2.244*±0.399 | 1.983*±0.306 |
| CVAE [48] | 1.657*±0.348 | 1.376*±0.212 | 1.622*±0.305 | 1.461±0.280 |
| PCA [25] | 1.565±0.324 | 1.319*±0.221 | 1.519±0.301 | 1.490±0.305 |
| Proposed | 1.535±0.330 | 1.298±0.208 | 1.620±0.323 | 1.462±0.266 |
We also performed paired Student's t-tests between the results generated by our method and those of competing methods. The performance metrics of the proposed model marked with an asterisk in Table I were significantly better than those of the other methods at a p-value smaller than 0.05. On a different cardiac MR dataset, [4] reports average Dice metrics of 0.94, 0.88 and 0.90 for the LV, myocardium and RV, respectively, for inter-observer variability in manual cardiac image segmentation (Table 3 of [4]). The Dice metrics of the proposed generative model are close to these values, which indicates its high performance and capability for anatomical sequence completion.
Table III. Comparison of sequence generation performance among CGAN, CVAE, CVAE-GAN and the proposed model. The clinical measures derived from each real sample are compared to those derived from 20 synthetic samples of exactly the same conditions. The mean and the minimal differences of the clinical measures are reported here.
| Model | dLVEDV mean (mL) | dLVEDV best/min (mL) | dLVESV mean (mL) | dLVESV best/min (mL) | dRVEDV mean (mL) | dRVEDV best/min (mL) | dRVESV mean (mL) | dRVESV best/min (mL) | dLVM mean (g) | dLVM best/min (g) |
|---|---|---|---|---|---|---|---|---|---|---|
| CGAN [34] | 35.58±20.33 | 15.66±16.67 | 20.06±9.71 | 19.74±9.72 | 51.47±25.25 | 14.71±17.12 | 17.57±12.19 | 17.04±12.18 | 38.26±19.15 | 10.40±11.23 |
| CVAE [48] | 35.74±16.99 | 4.91±9.84 | 13.92±6.06 | 1.87±3.46 | 44.97±21.58 | 6.46±12.92 | 19.49±9.21 | 2.86±5.74 | 23.07±9.96 | 2.70±4.33 |
| CVAE-GAN [6] | 51.32±20.40 | 6.33±11.96 | 19.80±6.53 | 1.69±2.57 | 48.94±28.66 | 8.28±17.52 | 25.26±10.99 | 2.57±4.11 | 51.03±11.40 | 8.29±7.91 |
| Proposed | 25.93±17.47 | 6.87±12.09 | 11.74±8.41 | 3.54±6.25 | 34.63±21.31 | 6.88±12.87 | 15.54±11.33 | 5.12±9.19 | 17.34±9.89 | 2.95±5.62 |
D. Sequence generation
Apart from the sequence completion task, we also perform anatomical sequence generation and evaluate how close the generated anatomical sequences are to the real data. In this experiment, we generate new synthetic anatomies of the heart by providing the clinical conditions as the only input to the model. Given the stochastic nature of the VAE generation, for each set of input conditions, multiple anatomical sequences can be generated. We draw 20 random samples from the Gaussian distribution of the latent vector, and correspondingly generate 20 synthetic anatomical sequences for this input condition set.
We first compare the synthetic anatomies to the real anatomy with the same clinical conditions and evaluate the mean similarity and the best similarity across the 20 samples, in terms of the Dice metric, HD, ASSD and differences in clinical measures. This is similar to the random-average or random-best evaluation in other recent generation works in computer vision [38]. Table II shows that the proposed model achieves a reasonably good sequence generation accuracy with a mean Dice metric of 0.713, HD of 10.940 mm and ASSD of 3.023 mm. We also report the best value of each measurement, with a significantly improved maximum Dice of 0.793, minimum HD of 8.166 mm and minimum ASSD of 2.049 mm. This suggests that the proposed method can capture a wide variation of anatomies and thus draw samples that are close to the real sample. When comparing the differences in clinical phenotypes, Table III shows that our model achieved the lowest mean measurement differences of 25.93 mL, 11.74 mL, 34.63 mL, 15.54 mL and 17.34 g, and minimum differences of 6.87 mL, 3.54 mL, 6.88 mL, 5.12 mL and 2.95 g for LVEDV, LVESV, RVEDV, RVESV and LVM, respectively. The mean and best values indicate that our model achieves similar (Dice) or better (HD, ASSD, difference in clinical measures) sequence generation accuracy compared to other methods. The best values of the metrics indicate the high fidelity of the proposed generative model, which refers to the degree to which the generated samples resemble the real ones [35], [43]. It is important to acknowledge that in anatomical sequence generation, the model is not expected to replicate existing anatomies. Instead, the model generates a plausible anatomy that fulfils certain conditions, which is then compared to a real anatomy with the same conditions.
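A sketch of this mean/best evaluation protocol is shown below, assuming label maps as NumPy arrays with labels 1-3 for LV, Myo and RV; the per-label Dice implementation here is a simple illustration rather than the exact evaluation code.

```python
import numpy as np

def mean_and_best_dice(real_seq, sample_fn, num_samples=20):
    """Draw multiple synthetic sequences for one condition set and report mean/best Dice.
    sample_fn() returns one synthetic label sequence shaped like real_seq."""
    def dice(a, b, label):
        a, b = (a == label), (b == label)
        return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum() + 1e-10)

    scores = []
    for _ in range(num_samples):
        synth = sample_fn()
        # average Dice over LV (1), Myo (2) and RV (3), across all frames
        scores.append(np.mean([dice(real_seq, synth, l) for l in (1, 2, 3)]))
    return float(np.mean(scores)), float(np.max(scores))
```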
Table II. Comparison of sequence generation performance between CGAN, CVAE, CVAE-GAN and the proposed model, in terms of mean and best Dice metric and contour distance metrics for the average performance over LV, RV and Myo. The best value across 20 samples for Dice metric (maximum), HD (minimum) and ASSD (minimum) are reported. Asterisks indicate statistical significance (*: p ≤ 0.05) when using a paired Student’s t-test comparing the performance of the proposed method to other methods.
| Model | Dice mean | Dice best/max | HD mean (mm) | HD best/min (mm) | ASSD mean (mm) | ASSD best/min (mm) |
|---|---|---|---|---|---|---|
| CGAN [34] | 0.713±0.061 | 0.717*±0.061 | 15.533*±2.258 | 13.956*±2.326 | 3.004±0.714 | 2.862*±0.712 |
| CVAE [48] | 0.694±0.056 | 0.789±0.049 | 11.461*±1.809 | 8.321±1.536 | 3.380*±0.710 | 2.317*±0.540 |
| CVAE-GAN [6] | 0.645*±0.052 | 0.774±0.039 | 16.844*±2.008 | 12.105*±1.815 | 3.693*±0.709 | 2.185±0.394 |
| Proposed | 0.713±0.058 | 0.793±0.052 | 10.940±2.343 | 8.166±1.621 | 3.023±0.757 | 2.049±0.521 |
Further, we visualised two examples of anatomical sequence generation in Fig. 4. For each example, we show five random synthetic samples which share the same clinical conditions as the real sample. It illustrates that the LV and RV structures look realistic and their shapes share a high similarity to the real anatomy. The contracting pattern of the ventricles and myocardium from ED to ES frame also looks realistic and similar to the real sample. This demonstrates our model can capture the overall anatomy and temporal dynamics of the heart during generation. The five samples with the same conditions also present certain degrees of variations, which demonstrates the diversity of synthetic data. This is due to the Gaussian sampling part of the generation process and reflects the individual differences between two hearts even if they are of the same gender and age, which can be caused by genetic, environmental, lifestyle and many other factors that are not easily accounted for by the model.
Fig. 4.
Visualisation of synthetic anatomies (last five columns) generated by the model, compared to the real anatomy (first column) with the same clinical conditions (text annotation). The whole anatomical sequence is generated but only ED and ES frames are shown here. The first and second rows of each example show the ED and ES frames of the cardiac anatomical sequence.
To further evaluate the fidelity and diversity of the generated samples with respect to the real samples, we assess the distance between their distributions, conditioned on age, a common factor of interest in clinical research. In addition to quantitative assessments, we conducted qualitative comparisons by evaluating the distributions of five clinical measures (LVM, LVEDV, LVESV, RVEDV and RVESV) against age for both real and synthetic anatomies, illustrated in Fig. 5. Compared to other methods, the synthetic data distributions from our model closely resemble the real distributions and cover the full variability of the real samples. Table IV reports the KL divergence and Wasserstein distance between the synthetic and real data distributions. The proposed model achieves the best KL or WD metrics in most clinical measurements, with KL divergence values of 0.034, 0.043, 0.034, 0.039, 0.031 and WD values of 15.053, 5.773, 12.214, 9.182, 9.215 for LVEDV, LVESV, RVEDV, RVESV and LVM, respectively. These results demonstrate that the synthetic data generated by our model maintain a distribution against age that is similar to that of the real data.
Fig. 5.
Distributions of clinical measures for real data and synthetic data. Each graph displays a kernel density plot of an imaging phenotype (LVM, LVEDV, LVESV, RVEDV, RVESV) against age. For each plot, the x-axis denotes age and the y-axis denotes the value of the imaging phenotype. Darker areas in the plot indicate the regions where the data is more concentrated. Lighter areas show the regions where the data is sparser.
Table IV. KL divergence and Wasserstein distance between synthetic data distribution and real data distribution.
| Model | KL: LVEDV | KL: LVESV | KL: RVEDV | KL: RVESV | KL: LVM | WD: LVEDV | WD: LVESV | WD: RVEDV | WD: RVESV | WD: LVM |
|---|---|---|---|---|---|---|---|---|---|---|
| CGAN [34] | 0.023±0.001 | 0.019±0.001 | 0.036±0.004 | 0.022±0.001 | 0.050±0.006 | 33.687±1.173 | 19.982±0.025 | 41.643±4.161 | 17.434±0.036 | 35.395±2.933 |
| CVAE [48] | 0.039±0.005 | 0.041±0.004 | 0.042±0.004 | 0.034±0.003 | 0.030±0.003 | 11.929±2.116 | 7.017±0.964 | 14.680±2.869 | 9.665±1.051 | 10.365±1.703 |
| CVAE-GAN [6] | 0.153±0.025 | 0.023±0.003 | 0.046±0.006 | 0.064±0.008 | 0.098±0.019 | 27.001±2.809 | 8.425±1.771 | 24.614±3.202 | 9.748±2.675 | 43.251±4.566 |
| Proposed | 0.034±0.002 | 0.043±0.002 | 0.034±0.002 | 0.039±0.002 | 0.031±0.002 | 15.053±3.597 | 5.773±1.358 | 12.214±2.408 | 9.182±2.145 | 9.215±1.713 |
E. Temporal dynamics
The proposed model encodes the anatomical and clinical information of the first frame (ED) and generates the latent vectors for the following frames with the temporal module. We use a dimensionality reduction technique, t-distributed stochastic neighbor embedding (t-SNE) [31], to visualise the latent space of the generated anatomical sequences, as shown in Fig. 6. The sequential latent codes start at ED (t = 0) and move along a cyclic path in the latent space. This shows that the generative model can capture the temporal dynamics of the anatomy during the heartbeat, forming a cyclic pattern as in a real heart [45]. The greater overlap between frames 9 and 18 shows that the variation of anatomies is smaller in the relaxation stage, which reflects the nonlinear trajectories of cardiac motion. We plot one example of the anatomical sequence at time frames 0, 3, 4, 6, 9, 12, 15 and 18 in the figure. Through the time frames, the anatomies show LV volumes that first decrease and then increase, while the thickness of the myocardium shows the opposite trend, consistent with the contraction and relaxation pattern of the heart [19].
Fig. 6.
T-distributed stochastic neighbor embedding (t-SNE) visualization of latent space for generated anatomical sequences from frame 0 to frame 18. Each dot represents a single time frame of a sample, with colors indicating the frame index. A sequence of anatomies, decoded from a corresponding sequence of latent codes, belonging to one subject, is visualised in the figure.
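As a sketch of how such a visualisation could be produced with scikit-learn, assuming the per-frame latent codes have been collected into an array; the t-SNE hyperparameters here are illustrative.

```python
import numpy as np
from sklearn.manifold import TSNE

def embed_latent_trajectories(z_seqs):
    """z_seqs: (num_subjects, T, latent_dim) latent codes from the temporal module.
    Flattens all frames, embeds them jointly with t-SNE and returns the 2D points
    plus the frame index of each point for colour-coding by time."""
    n, T, d = z_seqs.shape
    points = TSNE(n_components=2, perplexity=30).fit_transform(z_seqs.reshape(n * T, d))
    frame_index = np.tile(np.arange(T), n)   # matches the subject-major flattening order
    return points, frame_index
```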
F. Condition manipulation
With the conditional generative model, we are able to simulate the change of anatomy when certain conditions (e.g. age) change. Fig. 7(a) shows a series of generated anatomies during ageing, in which the age condition increases while all the other conditions, as well as the latent vector drawn from the Gaussian distribution, are kept fixed. The difference map comparing the aged anatomy to the anatomy at 10-20 years old shows subtle changes to the LV and RV structures. We further generate 200 random samples of the synthetic ageing anatomies and derive the clinical measures. Fig. 7(b) illustrates the longitudinal evolution of these measures, stratified by gender. We observe a longitudinally increasing trend in LVM during ageing and a decreasing trend in LVEDV, consistent with findings in the clinical literature [18] (Figure 3 of [18]). This demonstrates the potential of using the model for simulating anatomical data distributions. However, we need to be cautious in interpreting this result, as our training data is cross-sectional rather than longitudinal, and the mechanism of cardiac ageing is complex, confounded by more factors (genetics, lifestyle etc.) than the five conditions used in this work.
Fig. 7.
(a) An example of the synthetic cardiac anatomy during ageing. The first and third rows show the cardiac anatomies at end-diastolic (ED) and end-systolic (ES) frames.
The second and fourth rows show the difference maps between the aged anatomies (20-80 years old) and the anatomy at 10-20 years old. (b) The simulated evolution of clinical measures (LVM, LVEDV, LVESV, RVEDV, RVESV), obtained by generating 200 samples of gender-specific ageing cardiac anatomy and plotting their mean measures with 95% confidence intervals.
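A sketch of this condition-manipulation procedure, assuming the modules sketched earlier and that the first seven entries of the condition vector are the one-hot age groups (as in the encoding sketch above); one fixed latent sample is reused so that only age varies.

```python
import torch

@torch.no_grad()
def age_sweep(cond_encoder, temporal, decoder, base_conditions, age_groups=range(7),
              latent_dim=32, T=20):
    """Generate anatomies across age groups while fixing all other conditions and z0."""
    z0 = torch.randn(1, latent_dim)                 # fixed latent sample across the sweep
    sequences = []
    for g in age_groups:
        c = base_conditions.clone()
        c[0, :7] = 0.0                              # assumes entries 0-6 are the age one-hot
        c[0, g] = 1.0
        zc = cond_encoder(c)
        z_seq = temporal(torch.cat([z0, zc], dim=1), num_frames=T)
        sequences.append([decoder(z_seq[:, t]).argmax(dim=1) for t in range(T)])
    return sequences
```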
IV. Discussion
The proposed model is built upon a β-VAE for learning the latent space of the cardiac anatomy. It integrates a conditional branch to model the influence of multiple clinical factors on the generation process and uses a temporal module to model the temporal relationship of anatomical latent vectors during cardiac motion. The experiments demonstrate good performance in both the anatomical sequence completion and sequence generation tasks, qualitatively and quantitatively. The model enables condition manipulation for demonstrating the impact of clinical factors on anatomical shape variation. When using the common clinical measures (ventricular volumes and mass) for evaluation, the distribution of generated anatomies is close to the real data distribution both visually (Fig. 5) and quantitatively (Table IV), which indicates both the fidelity and diversity of the generation. While the model performs well in generating anatomically coherent structures, further improvement can be made in achieving a closer similarity between the distribution of generated anatomies and the real data distribution. There is also potential for further exploration of the relationship between cardiac motion and clinical conditions.
We foresee several potential downstream tasks for the generative cardiac anatomy model, including discovering patterns in large datasets, facilitating out-of-distribution detection and generating synthetic data. First, by training a generative model on a large dataset of cardiac anatomies, the trained model can capture complex patterns and variations of the anatomy associated with different clinical factors. This knowledge can be valuable for understanding population-level characteristics, identifying risk factors and informing public health strategies. Second, by learning the distribution of normal cardiac anatomy and dynamics, the proposed model can identify patterns of a given anatomy that deviate from the norm, indicating potential anomalies that require further investigation. More importantly, the proposed method is a conditional generative model, which means it can learn the norm specifically for certain conditions (e.g. a gender and age group) and evaluate the deviation from the norm in a personalised manner. Third, the trained generative model can provide a large amount of synthetic data for other tasks. Synthetic data can be used for performing data augmentation for training machine learning models [8], for creating synthetic fair data to improve the fairness of prediction models [11], [50], or as digital anatomies for performing in-silico trials [55]. Diverse and realistic synthetic data will alleviate the data scarcity issue in the medical field, where real data are often limited or not easy to share. This includes the creation of synthetic data for privacy-preserving research [39], [51].
There are a few limitations of this work. The first limitation is the high computational cost during training to learn the spatio-temporal patterns from 4D data, even after cropping the images to 128 × 128 × 64 and using sequences of only 20 time frames. An interesting future direction is to reduce the computational complexity of high-dimensional and high-resolution medical imaging data. Second, here we use a segmentation map as a representation of the anatomy so that the generative model can focus on learning the variations of anatomy, instead of intensity image styles. Future explorations could be extended to the generation of intensity images for the heart [1] or using mesh as a representation for the anatomy [33], which may be computationally more efficient. Third, we use a cross-sectional imaging dataset of mainly healthy volunteers for training the generative model, due to the challenge of curating large-scale longitudinal datasets with high spatial resolution. It would be interesting to extend this to longitudinal and clinical imaging cohorts with cardiac diseases in the future.
V. Conclusion
In this work, we propose a novel conditional generative model that is able to synthesise spatial-temporal cardiac anatomies given clinical factors as input. It demonstrates the feasibility of generating highly realistic synthetic 3D-t anatomies of the heart that capture both the anatomical variations and the motion of the heart. The work paves the way for further generative modelling research in cardiac imaging, such as incorporating disease types or representing the anatomy as meshes. It also has the potential to be applied to downstream tasks, such as performing data augmentation based on various anatomies, building condition-specific atlases and performing biomechanical modelling of the heart.
Acknowledgments
This work is supported by EPSRC DeepGeM Grant (EP/W01842X/1). SW is supported by Shanghai Sailing Program (22YF1409300), CCF-Baidu Open Fund (CCF-BAIDU 202316) and International Science and Technology Cooperation Program under the 2023 Shanghai Action Plan for Science (23410710400); HQ is supported by EPSRC SmartHeart (EP/P001009/1) and Innovate UK (104691). DO’R is supported by the Medical Research Council (MC_UP_1605/13); National Institute for Health Research (NIHR) Imperial College Biomedical Research Centre; and the British Heart Foundation (RG/19/6/34387, RE/18/4/34215). ADM is supported by the Fetal Medicine Foundation (495237) and Academy of Medical Sciences (SGL015/1006). DR was supported in part by the European Research Council (Grant Agreement no. 884622). For the purpose of open access, the authors have applied a creative commons attribution (CC BY) licence to any author accepted manuscript version arising.
Contributor Information
Mengyun Qiao, Email: m.qiao21@imperial.ac.uk, Department of Computing, Department of Brain Sciences and Data Science Institute, Imperial College London, London, SW7 2AZ, United Kingdom.
Shuo Wang, Digital Medical Research Center, School of Basic Medical Sciences, Fudan University and Shanghai Key Laboratory of MICCAI, Shanghai, China.
Huaqi Qiu, Biomedical Image Analysis Group (BioMedIA), Department of Computing, Imperial College London.
Antonio de Marvao, MRC Laboratory of Medical Sciences, Imperial College London, London W12 0HS, United Kingdom; Department of Women and Children’s Health, and British Heart Foundation Centre of Research Excellence, School of Cardiovascular and Metabolic Medicine and Sciences, King’s College London, London, United Kingdom.
Declan P. O’Regan, MRC Laboratory of Medical Sciences, Imperial College London, London W12 0HS, United Kingdom
Daniel Rueckert, Biomedical Image Analysis Group (BioMedIA), Department of Computing, Imperial College London; Klinikum rechts der Isar, Technical University of Munich, Munich, Germany.
Wenjia Bai, Department of Computing, Department of Brain Sciences and Data Science Institute, Imperial College London, London, SW7 2AZ, United Kingdom.
References
- [1] Amirrajab S, Al Khalil Y, Lorenz C, Weese J, Pluim J, Breeuwer M. A framework for simulating cardiac MR images with varying anatomy and contrast. IEEE Transactions on Medical Imaging. 2022. doi: 10.1109/TMI.2022.3215798.
- [2] Arjovsky M, Chintala S, Bottou L. Wasserstein generative adversarial networks. International Conference on Machine Learning; 2017. pp. 214–223.
- [3] Bai W, Shi W, O'Regan DP, Tong T, Wang H, Jamil-Copley S, Peters NS, Rueckert D. A probabilistic patch-based label fusion model for multi-atlas segmentation with registration refinement: application to cardiac MR images. IEEE Transactions on Medical Imaging. 2013;32(7):1302–1315. doi: 10.1109/TMI.2013.2256922.
- [4] Bai W, Sinclair M, Tarroni G, Oktay O, Rajchl M, Vaillant G, Lee AM, Aung N, Lukaschuk E, Sanghvi MM, et al. Automated cardiovascular magnetic resonance image analysis with fully convolutional networks. Journal of Cardiovascular Magnetic Resonance. 2018;20(1):65. doi: 10.1186/s12968-018-0471-x.
- [5] Bai W, Suzuki H, Huang J, Francis C, Wang S, Tarroni G, Guitton F, Aung N, Fung K, Petersen SE, et al. A population-based phenome-wide association study of cardiac and aortic structure and function. Nature Medicine. 2020;26(10):1654–1662. doi: 10.1038/s41591-020-1009-y.
- [6] Bao J, Chen D, Wen F, Li H, Hua G. CVAE-GAN: fine-grained image generation through asymmetric training. International Conference on Computer Vision; 2017. pp. 2745–2754.
- [7] Biffi C, Cerrolaza JJ, Tarroni G, Bai W, De Marvao A, Oktay O, Ledig C, Le Folgoc L, Kamnitsas K, Doumou G, et al. Explainable anatomical shape analysis through deep hierarchical generative models. IEEE Transactions on Medical Imaging. 2020;39(6):2088–2099. doi: 10.1109/TMI.2020.2964499.
- [8] Billot B, Greve DN, Puonti O, Thielscher A, Van Leemput K, Fischl B, Dalca AV, Iglesias JE, et al. SynthSeg: Segmentation of brain MRI scans of any contrast and resolution without retraining. Medical Image Analysis. 2023;86:102789. doi: 10.1016/j.media.2023.102789.
- [9] Campello VM, Xia T, Liu X, Sanchez P, Martín-Isla C, Petersen SE, Seguí S, Tsaftaris SA, Lekadir K. Cardiac aging synthesis from cross-sectional data with conditional generative adversarial networks. Frontiers in Cardiovascular Medicine. 2022;9. doi: 10.3389/fcvm.2022.983091.
- [10] Cardim N, Galderisi M, Edvardsen T, Plein S, Popescu BA, d'Andrea A, Bruder O, Cosyns B, Davin L, Donal E, et al. Role of multimodality cardiac imaging in the management of patients with hypertrophic cardiomyopathy: an expert consensus of the European Association of Cardiovascular Imaging Endorsed by the Saudi Heart Association. European Heart Journal - Cardiovascular Imaging. 2015;16(3):280. doi: 10.1093/ehjci/jeu291.
- [11] Chen RJ, Lu MY, Chen TY, Williamson DF, Mahmood F. Synthetic data in machine learning for medicine and healthcare. Nature Biomedical Engineering. 2021;5(6):493–497. doi: 10.1038/s41551-021-00751-8.
- [12] Chen Z, Kim VG, Fisher M, Aigerman N, Zhang H, Chaudhuri S. DECOR-GAN: 3D shape detailization by conditional refinement. IEEE Conference on Computer Vision and Pattern Recognition; 2021. pp. 15740–15749.
- [13] Choi Y, Choi M, Kim M, Ha J-W, Kim S, Choo J. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. IEEE Conference on Computer Vision and Pattern Recognition; 2018. pp. 8789–8797.
- [14] Cover TM. Elements of Information Theory. John Wiley & Sons; 1999.
- [15] Dalca AV, Rakic M, Guttag J, Sabuncu MR. Learning conditional deformable templates with convolutional networks. Neural Information Processing Systems; 2019. pp. 806–818.
- [16] Dhariwal P, Jun H, Payne C, Kim JW, Radford A, Sutskever I. Jukebox: A generative model for music. arXiv preprint arXiv:2005.00341; 2020.
- [17] Duchateau N, Sermesant M, Delingette H, Ayache N. Model-based generation of large databases of cardiac images: Synthesis of pathological cine MR sequences from real healthy cases. IEEE Transactions on Medical Imaging. 2018;37(3):755–766. doi: 10.1109/TMI.2017.2714343.
- [18] Eng J, McClelland RL, Gomes AS, Hundley WG, Cheng S, Wu CO, Carr JJ, Shea S, Bluemke DA, Lima JA. Adverse left ventricular remodeling and age assessed with cardiac MR imaging: the Multi-Ethnic Study of Atherosclerosis. Radiology. 2016;278(3):714–722. doi: 10.1148/radiol.2015150982.
- [19] Fukuta H, Little WC. The cardiac cycle and the physiologic basis of left ventricular contraction, ejection, relaxation, and filling. Heart Failure Clinics. 2008;4(1):1–11. doi: 10.1016/j.hfc.2007.10.004.
- [20] Gilbert K, Mauger C, Young AA, Suinesiaputra A. Artificial intelligence in cardiac imaging with statistical atlases of cardiac anatomy. Frontiers in Cardiovascular Medicine. 2020;7. doi: 10.3389/fcvm.2020.00102.
- [21] Higgins I, Matthey L, Pal A, Burgess C, Glorot X, Botvinick M, Mohamed S, Lerchner A. beta-VAE: Learning basic visual concepts with a constrained variational framework. International Conference on Learning Representations; 2017.
- [22] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation. 1997;9(8):1735–1780. doi: 10.1162/neco.1997.9.8.1735.
- [23] Isensee F, Jaeger PF, Kohl SA, Petersen J, Maier-Hein KH. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods. 2021;18(2):203–211. doi: 10.1038/s41592-020-01008-z.
- [24] Isola P, Zhu J-Y, Zhou T, Efros AA. Image-to-image translation with conditional adversarial networks. IEEE Conference on Computer Vision and Pattern Recognition; 2017. pp. 1125–1134.
- [25] Jolliffe IT. Principal component analysis for special types of data. Springer; 2002.
- [26] Karamitsos TD, Francis JM, Myerson S, Selvanayagam JB, Neubauer S. The role of cardiovascular magnetic resonance imaging in heart failure. Journal of the American College of Cardiology. 2009;54(15):1407–1424. doi: 10.1016/j.jacc.2009.04.094.
- [27] Karras T, Laine S, Aila T. A style-based generator architecture for generative adversarial networks. IEEE Conference on Computer Vision and Pattern Recognition; 2019. pp. 4401–4410.
- [28] Khamparia A, Pandey B, Tiwari S, Gupta D, Khanna A, Rodrigues JJPC. An integrated hybrid CNN-RNN model for visual description and generation of captions. Circuits, Systems, and Signal Processing. 2020;39(2):776–788.
- [29] Kingma DP, Welling M. Auto-encoding variational Bayes. International Conference on Learning Representations; 2014.
- [30] Krebs J, Delingette H, Ayache N, Mansi T. Learning a generative motion model from image sequences based on a latent motion matrix. IEEE Transactions on Medical Imaging. 2021;40(5):1405–1416. doi: 10.1109/TMI.2021.3056531.
- [31] van der Maaten L, Hinton G. Visualizing data using t-SNE. Journal of Machine Learning Research. 2008;9:2579–2605.
- [32] Mauger CA, Govil S, Chabiniok R, Gilbert K, Hegde S, Hussain T, McCulloch AD, Occleshaw CJ, Omens J, Perry JC, Pushparajah K, et al. Right-left ventricular shape variations in tetralogy of Fallot: associations with pulmonary regurgitation. Journal of Cardiovascular Magnetic Resonance. 2021;23(1):1–14. doi: 10.1186/s12968-021-00780-x.
- [33] Meng Q, Bai W, Liu T, O'Regan DP, Rueckert D. Mesh-based 3D motion tracking in cardiac MRI using deep learning. International Conference on Medical Image Computing and Computer-Assisted Intervention; 2022.
- [34] Mirza M, Osindero S. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784; 2014.
- [35] Naeem MF, Oh SJ, Uh Y, Choi Y, Yoo J. Reliable fidelity and diversity metrics for generative models. International Conference on Machine Learning; 2020. pp. 7176–7185.
- [36] Nichol A, Dhariwal P, Ramesh A, Shyam P, Mishkin P, McGrew B, Sutskever I, Chen M. GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models. International Conference on Machine Learning; 2022.
- [37] Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, et al. PyTorch: An imperative style, high-performance deep learning library. Neural Information Processing Systems. 2019;32:8026–8037.
- [38] Petrovich M, Black MJ, Varol G. TEMOS: Generating diverse human motions from textual descriptions. European Conference on Computer Vision; 2022.
- [39] Qian Z, Callender T, Cebere B, Janes SM, Navani N, van der Schaar M. Synthetic data for privacy-preserving clinical risk prediction. medRxiv; 2023. doi: 10.1038/s41598-024-72894-y.
- [40] Reynaud H, Vlontzos A, Dombrowski M, Lee C, Beqiri A, Leeson P, Kainz B. D'ARTAGNAN: Counterfactual video generation. Medical Image Computing and Computer Assisted Intervention; 2022. pp. 599–609.
- [41] Rezende D, Mohamed S. Variational inference with normalizing flows. International Conference on Machine Learning; 2015. pp. 1530–1538.
- [42] Rueckert D, Sonoda LI, Denton ER, Rankin S, Hayes C, Leach MO, Hill DL, Hawkes DJ. Comparison and evaluation of rigid and nonrigid registration of breast MR images. Medical Imaging 1999: Image Processing, Vol. 3661. International Society for Optics and Photonics; 1999. pp. 78–88.
- [43] Sajjadi MS, Bachem O, Lucic M, Bousquet O, Gelly S. Assessing generative models via precision and recall. Advances in Neural Information Processing Systems. 2018;31.
- [44] Schuh A, Makropoulos A, Robinson EC, Cordero-Grande L, Hughes E, Hutter J, Price AN, Murgasova M, Teixeira RPA, Tusor N, et al. Unbiased construction of a temporally consistent morphological atlas of neonatal brain development. bioRxiv 251512; 2018.
- [45] Scott AD, Keegan J, Firmin DN. Motion in cardiovascular MR imaging. Radiology. 2009;250(2):331–351. doi: 10.1148/radiol.2502071998.
- [46] Singer U, Polyak A, Hayes T, Yin X, An J, Zhang S, Hu Q, Yang H, Ashual O, Gafni O, et al. Make-A-Video: Text-to-video generation without text-video data. arXiv preprint arXiv:2209.14792; 2022.
- [47] Smiseth OA, Torp H, Opdahl A, Haugaa KH, Urheim S. Myocardial strain imaging: how useful is it in clinical decision making? European Heart Journal. 2016;37(15):1196–1207. doi: 10.1093/eurheartj/ehv529.
- [48] Sohn K, Lee H, Yan X. Learning structured output representation using deep conditional generative models. Neural Information Processing Systems. 2015;28:3483–3491.
- [49] Sutskever I, Vinyals O, Le QV. Sequence to sequence learning with neural networks. Neural Information Processing Systems; 2014. pp. 3104–3112.
- [50] van Breugel B, Kyono T, Berrevoets J, van der Schaar M. DECAF: Generating fair synthetic data using causally-aware generative networks. Advances in Neural Information Processing Systems. 2021;34:22221–22233.
- [51] van Breugel B, van der Schaar M. Beyond privacy: Navigating the opportunities and challenges of synthetic data. arXiv preprint arXiv:2304.03722; 2023.
- [52] Vinyals O, Toshev A, Bengio S, Erhan D. Show and tell: A neural image caption generator. IEEE Conference on Computer Vision and Pattern Recognition; 2015. pp. 3156–3164.
- [53] Walker J, Razavi A, van den Oord A. Predicting video with VQ-VAE. arXiv preprint arXiv:2103.01950; 2021.
- [54] Xia T, Chartsias A, Wang C, Tsaftaris SA, Alzheimer's Disease Neuroimaging Initiative, et al. Learning to synthesise the ageing brain without longitudinal data. Medical Image Analysis. 2021;73:102169. doi: 10.1016/j.media.2021.102169.
- [55] Xia Y, Ravikumar N, Lassila T, Frangi AF. Virtual high-resolution MR angiography from non-angiographic multi-contrast MRIs: synthetic vascular model populations for in-silico trials. Medical Image Analysis. 2023;87:102814. doi: 10.1016/j.media.2023.102814.
- [56] Yan W, Zhang Y, Abbeel P, Srinivas A. VideoGPT: Video generation using VQ-VAE and transformers. arXiv preprint arXiv:2104.10157; 2021.
- [57] Yoo J, Jin KH, Gupta H, Yerly J, Stuber M, Unser M. Time-dependent deep image prior for dynamic MRI. IEEE Transactions on Medical Imaging. 2021;40(12):3337–3348. doi: 10.1109/TMI.2021.3084288.
- [58] Yushkevich PA, Piven J, Hazlett HC, Smith RG, Ho S, Gee JC, Gerig G. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. NeuroImage. 2006;31(3):1116–1128. doi: 10.1016/j.neuroimage.2006.01.015.
- [59] Zolotas M, Demiris Y. Disentangled sequence clustering for human intention inference. International Conference on Intelligent Robots and Systems; 2022.
- [60] Zou Q, Ahmed AH, Nagpal P, Priya S, Schulte RF, Jacob M. Variational manifold learning from incomplete data: application to multi-slice dynamic MRI. IEEE Transactions on Medical Imaging. 2022;41(12):3552–3561. doi: 10.1109/TMI.2022.3189905.
- [61] Zou Q, Priya S, Nagpal P, Jacob M. Joint cardiac T1 mapping and cardiac cine using manifold modeling. Bioengineering. 2023;10(3):345. doi: 10.3390/bioengineering10030345.