IEEE Trans Med Imaging. 2022 Feb 24;41(8):1961–1974. doi: 10.1109/TMI.2022.3154599

MulViMotion: Shape-Aware 3D Myocardial Motion Tracking From Multi-View Cardiac MRI

Qingjie Meng 1, Chen Qin 2, Wenjia Bai 1,3, Tianrui Liu 1, Antonio de Marvao 4, Declan P O'Regan 4, Daniel Rueckert 1,5
PMCID: PMC7613225  EMSID: EMS143763  PMID: 35201985

Abstract

Recovering the 3D motion of the heart from cine cardiac magnetic resonance (CMR) imaging enables the assessment of regional myocardial function and is important for understanding and analyzing cardiovascular disease. However, 3D cardiac motion estimation is challenging because the acquired cine CMR images are usually 2D slices which limit the accurate estimation of through-plane motion. To address this problem, we propose a novel multi-view motion estimation network (MulViMotion), which integrates 2D cine CMR images acquired in short-axis and long-axis planes to learn a consistent 3D motion field of the heart. In the proposed method, a hybrid 2D/3D network is built to generate dense 3D motion fields by learning fused representations from multi-view images. To ensure that the motion estimation is consistent in 3D, a shape regularization module is introduced during training, where shape information from multi-view images is exploited to provide weak supervision to 3D motion estimation. We extensively evaluate the proposed method on 2D cine CMR images from 580 subjects of the UK Biobank study for 3D motion tracking of the left ventricular myocardium. Experimental results show that the proposed method quantitatively and qualitatively outperforms competing methods.

Keywords: Multi-view, 3D motion tracking, shape regularization, cine CMR, deep neural networks

I. Introduction

The motion of the beating heart is a rhythmic pattern of non-linear trajectories regulated by the circulatory system and cardiac neuroautonomic control [1]–[3]. Estimating cardiac motion is an important step for the exploration of cardiac function and the diagnosis of cardiovascular diseases [1], [4], [5]. In particular, left ventricular (LV) myocardial motion tracking enables spatially and temporally localized assessment of LV function [6]. This is helpful for the early and accurate detection of LV dysfunction and myocardial diseases [7], [8].

Cine cardiac magnetic resonance (CMR) imaging supports motion analysis by acquiring sequences of 2D images in different views. Each image sequence covers the complete cardiac cycle, including the end-diastolic (ED) and end-systolic (ES) phases [10]. Two types of anatomical views are acquired: (1) the short-axis (SAX) view and (2) long-axis (LAX) views, such as the 2-chamber (2CH) view and the 4-chamber (4CH) view (Fig. 1). The SAX sequences typically contain a stack of 2D slices sampled from base to apex in each frame (e.g., 9–12 slices). The LAX sequences contain a single 2D slice that is approximately orthogonal to the SAX plane in each frame. These acquired images have high temporal resolution, high signal-to-noise ratio and high contrast between the blood pool and myocardium. With these properties, cine CMR imaging has been utilized in recent works for 2D myocardial motion estimation, e.g., [11]–[15].

Fig. 1.

Examples of 2D cine CMR scans of a healthy subject. Cine CMR scans are acquired from short-axis (SAX) view and two long-axis (LAX) views. The SAX view contains a stack of 2D images while each LAX view contains a single 2D image. (a) Inline graphic-plane of the SAX stack. (b) Inline graphic-plane of the SAX stack. (c) LAX 2-chamber (2CH) view. (d) LAX 4-chamber (4CH) view. Red and green contours1 show the epicardium and endocardium, respectively. The area between these contours is the myocardium of the left ventricle. We show the end-diastolic (ED) frame (top row) and the end-systolic (ES) frame (bottom row) of the cine CMR image sequence.

2D myocardial motion estimation only considers motion in either the SAX plane or the LAX plane and does not provide complete 3D motion information for the heart. This may lead to inaccurate assessment of cardiac function. Therefore, 3D motion estimation that recovers myocardial deformation in the x, y and z directions is important. However, estimating 3D motion fields from cine CMR images remains challenging because (1) SAX stacks have much lower through-plane resolution (typically 8 mm slice thickness) than in-plane resolution (typically Inline graphic mm), (2) image quality can be degraded by slice misalignment in SAX stacks, as only one or two slices are acquired during a single breath-hold, and (3) high-resolution 2CH and 4CH view images are too spatially sparse to estimate 3D motion fields on their own.

In this work, we take full advantage of both SAX and LAX (2CH and 4CH) view images, and propose a multi-view motion estimation network for 3D myocardial motion tracking from cine CMR images. In the proposed method, a hybrid 2D/3D network is developed for 3D motion estimation. This hybrid network learns combined representations from multi-view images to estimate a 3D motion field from the ED frame to any t-th frame in the cardiac cycle. To encourage accurate motion estimation, especially along the longitudinal direction (i.e., the z direction), a shape regularization module is introduced to leverage anatomical shape information for motion estimation during training. This module encourages the estimated 3D motion field to correctly transform the 3D shape of the myocardial wall from the ED frame to the t-th frame. Here anatomical shape is represented by edge maps that show the contour of the cardiac anatomy. During inference, the hybrid network generates a sequence of 3D motion fields between paired frames (ED and t-th frames), which represents the myocardial motion across the cardiac cycle. The main contributions of this paper are summarized as follows:

  • We develop a solution to a challenging cardiac motion tracking problem: learning 3D motion fields from a set of 2D SAX and LAX cine CMR images. We propose an end-to-end trainable multi-view motion estimation network (MulViMotion) for 3D myocardial motion tracking.

  • The proposed method enables accurate 3D motion tracking by combining multi-view images using both latent information and shape information: (1) the representations of multi-view images are combined in the latent space for the generation of 3D motion fields; (2) the complementary shape information from multi-view images is exploited in a shape regularization module to provide explicit constraint on the estimated 3D motion fields.

  • The proposed method is trained in a weakly supervised manner which only requires sparsely annotated data in different 2D SAX and LAX views and requires no ground truth 3D motion fields. The 2D edge maps from the corresponding SAX and LAX planes provide weak supervision to the estimated 3D edge maps for guiding 3D motion estimation in the shape regularization module.

  • We perform extensive evaluations for the proposed method on 580 subjects from the UK Biobank study. We further present qualitative analysis on the CMR images with severe slice misalignment and we explore the applicability of our method for wall thickening measurement.

II. Related Work

1). Conventional Motion Estimation Methods:

A common method for quantifying cardiac motion is to track noninvasive markers. CMR myocardial tagging provides tissue markers (stripe-like darker tags) in the myocardium which deform with myocardial motion [16]. By tracking the deformation of these markers, dense displacement fields can be retrieved in the imaging plane. The harmonic phase (HARP) technique is the most representative approach for motion tracking in tagged images [17]–[19]. Several other methods have been proposed to compute dense displacement fields from dynamic myocardial contours or surfaces using geometrical and biomechanical modeling [20], [21]. For example, Papademetris et al. [21] proposed a Bayesian estimation framework for myocardial motion tracking from 3D echocardiography. In addition, image registration has been applied to cardiac motion estimation in previous works. Craene et al. [22] introduced continuous spatio-temporal B-spline kernels for computing a 4D velocity field, which enforced temporal consistency in motion recovery. Rueckert et al. [23] proposed a free form deformation (FFD) method for general non-rigid image registration. This method has been used for cardiac motion estimation in many recent works, e.g., [1], [4], [6], [14], [24]–[27]. Thirion [28] built a demons algorithm which utilizes diffusing models for image matching and, subsequently, cardiac motion tracking. Based on this work, Vercauteren et al. [29] adapted the demons algorithm to provide non-parametric diffeomorphic transformations and McLeod et al. [30] introduced an elastic-like regularizer to improve the incompressibility of deformation recovery.

2). Deep Learning-Based Motion Estimation Methods:

In recent years, deep convolutional neural networks (CNNs) have been successfully applied to medical image analysis, which has inspired the exploration of deep learning-based cardiac motion estimation approaches. Qin et al. [11] proposed a multi-task framework for joint estimation of segmentation and motion. This multi-task framework contains a shared feature encoder which enables a weakly-supervised segmentation. Zheng et al. [12] proposed a method for cardiac pathology classification based on cardiac motion. Their method utilizes a modified U-Net [31] to generate flow maps between ED frame and any other frame. For cardiac motion tracking in multiple datasets, Yu et al. [15] considered the distribution mismatch problem and proposed a meta-learning-based online model adaption framework. Different from these methods which estimate motion in cine CMR, Ye et al. [32] proposed a deep learning model for tagged image motion tracking. In their work, the motion field between any two consecutive frames is first computed, followed by estimating the Lagrangian motion field between ED frame and any other frame. Most of these existing deep learning-based methods aim at 2D motion tracking by only using SAX stacks. In contrast, our method focuses on 3D motion tracking by fully combining multiple anatomical views (i.e., SAX, 2CH and 4CH), which is able to estimate both in-plane and through-plane myocardial motion.

3). Multi-View Based Cardiac Analysis:

Different anatomical scan views usually contain complementary information, and combining multiple views can be more descriptive than a single view. Chen et al. [33] utilized both SAX and LAX views for 2D cardiac segmentation, where the features of multi-view images are combined in the bottleneck of a 2D U-Net. Puyol-Antón et al. [27] introduced a framework that separately uses multi-view images for myocardial strain analysis. In their method, the SAX view is used for radial and circumferential strain estimation while the LAX view is used for longitudinal strain estimation. Abdelkhalek et al. [34] proposed a 3D myocardial strain estimation framework, where the point clouds from SAX and LAX views are aligned for surface reconstruction. Attar et al. [35] proposed a framework for 3D cardiac shape prediction, in which the features of multi-view images are concatenated in CNNs to predict the 3D shape parameters. In this work, we focus on using multi-view images for 3D motion estimation. Compared to most of these existing works, which only combine the features of multi-view images in the latent space (e.g., [33], [35]), our method additionally combines complementary shape information from multiple views to predict an anatomically plausible 3D edge map of the myocardial wall at different time frames, which provides guidance for 3D motion estimation.

III. Method

Our goal is to estimate 3D motion fields of the LV myocardium from multi-view 2D cine CMR images. We formulate our task as follows: Let Inline graphic be a SAX sequence which contains stacks of 2D images ( Inline graphic slices) and Inline graphic be LAX sequences which contain 2D images in the 2CH and 4CH views. Inline graphic and Inline graphic are the height and width of each image and Inline graphic is the number of frames. We want to train a network to estimate a 3D motion field Inline graphic by using the multi-view images of the ED frame ( Inline graphic) and of any Inline graphic-th frame ( Inline graphic). Inline graphic describes the motion of the LV myocardium from ED frame to the Inline graphic-th frame. For each voxel in Inline graphic, we estimate its displacement in the Inline graphic, Inline graphic, Inline graphic directions.

To solve this task, we propose MulViMotion that estimates 3D motion fields from multi-view images with shape regularization. The schematic architecture of our method is shown in Fig. 2. A hybrid 2D/3D network that contains FeatureNet (2D CNNs) and MotionNet (3D CNNs) is used to predict Inline graphic from the input multi-view images. FeatureNet learns multi-view multi-scale features and is used to extract multi-view motion feature Inline graphic and multi-view shape feature Inline graphic from the input. MotionNet generates Inline graphic based on Inline graphic. A shape regularization module is used to leverage anatomical shape information for 3D motion estimation during training. In this module, 3D edge maps of the myocardial wall are predicted from Inline graphic using ShapeNet and warped from ED frame to the Inline graphic-th frame by Inline graphic. The sparse ground truth 2D edge maps derived from the multi-view images provide weak supervision to the predicted and warped 3D edge maps, and thus encourage an accurate estimation of Inline graphic, especially in the Inline graphic direction. Here, a slicing step is used to extract corresponding multi-view planes from the 3D edge maps in order to compare 3D edge maps with 2D ground truth. During inference, a 3D motion field is directly generated from the input multi-view images by the hybrid network, without using shape regularization.

Fig. 2.

An overview of MulViMotion. We use a hybrid 2D/3D network to estimate a 3D motion field Inline graphic from the input multi-view images. In the hybrid network, FeatureNet learns multi-view motion feature Inline graphic and multi-view shape feature Inline graphic from the input, followed by MotionNet which generates Inline graphic based on Inline graphic. A shape regularization module leverages anatomical shape information for 3D motion estimation. It encourages the predicted 3D edge maps of the myocardial wall Inline graphic (predicted from Inline graphic using ShapeNet) and the warped 3D edge map Inline graphic (warped from ED frame to the Inline graphic-th frame by Inline graphic) to be consistent with the ground truth 2D edge maps defined on multi-view images. Shape regularization is only used during training.

A. 3D Motion Estimation

1). Multi-View Multi-Scale Feature Extraction (FeatureNet):

The first step of 3D motion estimation is to extract internal representations from the input 2D multi-view images Inline graphic. We build FeatureNet to simultaneously learn motion and shape features from the input because the motion and shape of the myocardial wall are closely related and can provide complementary information to each other [11], [36], [37]. FeatureNet consists of (1) multi-scale feature fusion and (2) multi-view concatenation (see Fig. 3).

Fig. 3.

An overview of FeatureNet. FeatureNet takes multi-view images as input and extracts multi-view motion feature Inline graphic and multi-view shape feature Inline graphic. Panel (a) describes multi-scale feature fusion. Panel (b) shows the 2D encoder Inline graphic, where Inline graphic refers to SAX, 2CH and 4CH views. Panel (c) describes the combination of multi-view features.

In the multi-scale feature fusion (Fig. 3 (a)), the input multi-view images are unified to Inline graphic-channel 2D feature maps by applying 2D convolution on 2CH and 4CH view images. Then three 2D encoders Inline graphic are built to extract motion and shape features from each anatomical view,

(Eq. 1)

Here, Inline graphic represents anatomical views and Inline graphic refers to the network parameters of Inline graphic. Inline graphic and Inline graphic are the learned motion feature and shape feature, respectively. As these encoders aim to extract the same type of information (i.e., shape and motion information), the three encoders share weights to learn representations that are useful and related to different views.

In each encoder, representations at different scales are fully exploited for feature extraction. Inline graphic consists of (1) a Siamese network that extracts features from both ED frame and Inline graphic-th frame, and (2) feature-fusion layers that concatenate multi-scale features from pairs of frames (Fig. 3 (b)). From the Siamese network, the last feature maps of the two streams are used as shape feature of the ED frame ( Inline graphic) and the Inline graphic-th frame ( Inline graphic), respectively, and Inline graphic. All features across different scales from both streams are combined by feature-fusion layers to generate motion feature Inline graphic. In detail, these multi-scale features are upsampled to the original resolution by a convolution and upsampling operation and then combined using a concatenation layer.

With the obtained Inline graphic, a multi-view concatenation generates the multi-view motion feature Inline graphic and the multi-view shape feature Inline graphic via channel-wise concatenation Inline graphic (see Fig. 3 (c)),

(Eq. 2)

Here Inline graphic and Inline graphic.

The FeatureNet model is composed of 2D CNNs which learn 2D features from the multi-view images and inter-slice correlation from SAX stacks. The obtained Inline graphic is first unified to Inline graphic-channels using 2D convolution and is then used to predict Inline graphic in the next step. The obtained Inline graphic is used for shape regularization in Sec. III-B.

2). Motion Estimation (MotionNet):

In this step, we introduce MotionNet to predict the 3D motion field Inline graphic by learning 3D representations from the multi-view motion feature Inline graphic. MotionNet is built with a 3D encoder-decoder architecture. Inline graphic is predicted by MotionNet with

(Eq. 3)

where Inline graphic represents MotionNet and Inline graphic refers to the network parameters of Inline graphic. The function Inline graphic denotes an un-squeeze operation which changes Inline graphic from a stack of 2D feature maps to a 3D feature map by adding an extra dimension.

3). Spatial Transform (Warping):

Inspired by the successful application of spatial transformer networks [38], [39], the SAX stack of the ED frame ( Inline graphic) can be transformed to the Inline graphic-th frame using the motion field Inline graphic. For a voxel at location Inline graphic in the transformed SAX stack ( Inline graphic), we compute the corresponding location Inline graphic in Inline graphic by Inline graphic. As image values are only defined at discrete locations, the value at Inline graphic in Inline graphic is computed from Inline graphic in Inline graphic using trilinear interpolation.2
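As a rough illustration of this warping step, the sketch below resamples a volume at displaced locations using trilinear interpolation. It is written in NumPy rather than the PyTorch used by the authors, and the function name `warp_trilinear`, the (z, y, x) ordering of the displacement channels, the convention output(p) = volume(p + flow(p)) and the clamping at the volume border are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def warp_trilinear(volume, flow):
    """Warp a 3D volume with a dense displacement field (illustrative sketch).

    volume: (D, H, W) array, e.g. the ED-frame SAX stack.
    flow:   (3, D, H, W) displacements in voxels, ordered (z, y, x) [assumed].
    Assumed convention: output(p) = volume(p + flow(p)).
    """
    D, H, W = volume.shape
    zz, yy, xx = np.meshgrid(np.arange(D), np.arange(H), np.arange(W), indexing="ij")
    # Sampling locations, clamped so that border values are replicated.
    z = np.clip(zz + flow[0], 0, D - 1)
    y = np.clip(yy + flow[1], 0, H - 1)
    x = np.clip(xx + flow[2], 0, W - 1)
    # Integer corners of the cell that contains each sampling location.
    z0, y0, x0 = (np.floor(c).astype(int) for c in (z, y, x))
    z1 = np.minimum(z0 + 1, D - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    x1 = np.minimum(x0 + 1, W - 1)
    wz, wy, wx = z - z0, y - y0, x - x0
    out = np.zeros(volume.shape, dtype=float)
    # Accumulate the eight corner values, weighted by their trilinear weights.
    for zi, az in ((z0, 1 - wz), (z1, wz)):
        for yi, ay in ((y0, 1 - wy), (y1, wy)):
            for xi, ax in ((x0, 1 - wx), (x1, wx)):
                out += az * ay * ax * volume[zi, yi, xi]
    return out
```

With a zero displacement field the function returns the input volume unchanged. In a trainable PyTorch pipeline the same resampling is typically done with a differentiable sampler so that gradients reach the motion field.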

4). Motion Loss:

As true dense motion fields of paired frames are usually unavailable in real practice, we propose an unsupervised motion loss Inline graphic to evaluate the 3D motion estimation model using only the input SAX stack ( Inline graphic) and the generated 3D motion field ( Inline graphic). Inline graphic consists of two components: (1) an image similarity loss Inline graphic that penalizes appearance difference between Inline graphic and Inline graphic, and (2) a local smoothness loss Inline graphic that penalizes the gradients of Inline graphic,

(Eq. 4)

Here Inline graphic is a hyper-parameter, Inline graphic is defined by voxel-wise mean squared error and Inline graphic is the Huber loss used in [11], [39] which encourages a smooth Inline graphic,

(Eqs. 5 and 6)

Here Inline graphic and we use the same approximation to Inline graphic and Inline graphic. As in [11], [39], Inline graphic is set to 0.01. In Eq. 5 and Eq. 6, Inline graphic is the Inline graphic-th voxel and Inline graphic denotes the number of voxels.
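To make the structure of the motion loss concrete, a minimal NumPy sketch is given below: a voxel-wise mean squared error between the target and warped SAX stacks, plus a weighted Huber-style penalty on the spatial gradients of the motion field. The exact form of the smoothness term, namely the mean of sqrt(eps + sum of squared gradients), the use of `np.gradient` for the finite differences, and the names `motion_loss` and `lam` are assumptions for illustration; only the value eps = 0.01 is taken from the text.

```python
import numpy as np

def motion_loss(target, warped, flow, lam=1.0, eps=0.01):
    """Image similarity plus smoothness penalty (illustrative sketch).

    target, warped: (D, H, W) volumes, the t-th frame SAX stack and the
                    ED stack warped towards it.
    flow:           (3, D, H, W) displacement field.
    lam:            weight of the smoothness term (placeholder value).
    eps:            constant inside the square root; 0.01 as in the text.
    """
    # Voxel-wise mean squared error between target and warped images.
    sim = np.mean((target - warped) ** 2)
    # Sum of squared finite-difference gradients over all displacement
    # components and all spatial directions.
    grad_sq = np.zeros(flow.shape[1:])
    for component in range(3):
        for axis in range(3):
            grad_sq += np.gradient(flow[component], axis=axis) ** 2
    # Huber-style penalty: behaves like an L1 norm for large gradients.
    smooth = np.mean(np.sqrt(eps + grad_sq))
    return sim + lam * smooth, sim, smooth
```

For a perfectly registered pair and a constant motion field, the similarity term is zero and the smoothness term reduces to sqrt(eps).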

Note that Inline graphic is only applied to SAX stacks because 2D images in 2CH and 4CH views typically consist of only one slice and cannot be directly warped by a 3D motion field.

B. Shape Regularization

The motion loss ( Inline graphic) on its own is not sufficient to guarantee accurate motion estimation in the z direction due to the low through-plane resolution in SAX stacks. To address this problem, we introduce a shape regularization module which encourages the 3D edge map of the myocardial wall to be correct before and after Inline graphic warping, and thus enables an accurate estimation of Inline graphic. Here, the ground truth 2D edge maps derived from the multi-view images provide weak supervision to the predicted and warped 3D edge maps.

1). Shape Estimation (ShapeNet):

ShapeNet is built to generate the 3D edge map of the myocardial wall in the ED frame ( Inline graphic) and the Inline graphic-th frame ( Inline graphic) from Inline graphic,

(Eq. 7)

Here Inline graphic and Inline graphic are the two branches in ShapeNet which contain shared 2D decoders and 3D convolutional layers in order to learn 3D edge maps from 2D features for all frames (Fig. 4). The dimensions of Inline graphic and Inline graphic are Inline graphic. With the spatial transform in Sec. III-A.3, Inline graphic is warped to the Inline graphic-th frame by Inline graphic, which generates the transformed 3D edge map Inline graphic. Then Inline graphic, Inline graphic and Inline graphic are weakly supervised by ground truth 2D edge maps.

Fig. 4.

An overview of ShapeNet. ShapeNet predicts the 3D edge maps of the LV myocardial wall in ED frame and the Inline graphic-th frame from the corresponding shape features Inline graphic and Inline graphic.

2). Slicing:

To compare the 3D edge maps with 2D ground truth, we use 3D masks Inline graphic to extract SAX, 2CH and 4CH view planes from Inline graphic, Inline graphic and Inline graphic with

(Eq. 8)

where Inline graphic represents anatomical views and Inline graphic refers to element-wise multiplication. These 3D masks describe the locations of multi-view images in SAX stacks and are generated based on the input during image preprocessing.

3). Shape Loss:

The sliced 2D edge maps Inline graphic are compared to 2D ground truth Inline graphic by a shape loss Inline graphic,

(Eq. 9)

For each component in Inline graphic, we utilize cross-entropy loss ( Inline graphic) to measure the similarity of edge maps, e.g.,

(Eq. 10)

As in Eq. 10, Inline graphic is computed by Inline graphic and Inline graphic is computed by Inline graphic.
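The slicing and shape loss steps can be sketched as follows. A binary 3D mask selects the voxels of a 3D edge map that lie on one acquired view plane, and a cross-entropy term compares the selected predictions with the ground truth edges. The sketch assumes that each 2D ground truth edge map has already been embedded in a 3D volume at the mask locations, that the cross-entropy is binary and averaged over the masked voxels only, and that the view contributions are summed; the function names are hypothetical.

```python
import numpy as np

def slice_with_mask(edge_3d, mask_3d):
    """Keep only the voxels of a 3D edge map that lie on one view plane."""
    return edge_3d * mask_3d  # element-wise multiplication

def masked_bce(pred_3d, gt_3d, mask_3d, tiny=1e-7):
    """Binary cross-entropy on the voxels selected by the mask (assumed form)."""
    p = np.clip(pred_3d, tiny, 1 - tiny)  # avoid log(0)
    ce = -(gt_3d * np.log(p) + (1 - gt_3d) * np.log(1 - p))
    return np.sum(ce * mask_3d) / max(np.sum(mask_3d), 1)

def shape_loss_component(pred_3d, gt_by_view, mask_by_view):
    """Sum the masked cross-entropy over the available views.

    gt_by_view and mask_by_view are dicts keyed by view name,
    e.g. "sax", "2ch" and "4ch" (keys are placeholders).
    """
    return sum(masked_bce(pred_3d, gt_by_view[v], mask_by_view[v])
               for v in mask_by_view)
```

The same component would be evaluated three times: for the predicted ED edge map, the predicted t-th frame edge map and the warped edge map.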

C. Optimization

Our model is an end-to-end trainable framework and the overall objective is a linear combination of all loss functions

(Eq. 11)

where Inline graphic and Inline graphic are hyper-parameters chosen experimentally depending on the dataset. We use the Adam optimizer ( Inline graphic) to update the parameters of MulViMotion. Our model is implemented in PyTorch and trained on an NVIDIA Tesla T4 GPU with 16 GB of memory.

IV. Experiments

We demonstrate our method on the task of 3D myocardial motion tracking. We evaluate the proposed method using quantitative metrics: Dice score, Hausdorff distance, volume difference and Jacobian determinant. Geometric meshes are used to provide qualitative results with 3D visualization. We compare the proposed method with other state-of-the-art motion estimation methods and perform an extensive ablation study. In addition, we show the effectiveness of the proposed method on subjects with severe slice misalignment. We further explore the applicability of the proposed method for wall thickening measurement. We show the key results in the main paper. More results (e.g., dynamic videos) are shown in the Appendix.3

A. Experiment Setups

1). Data:

We performed experiments on 580 randomly selected subjects from the UK Biobank study.4 All participants gave written informed consent [40]. The participant characteristics are shown in Table I. The CMR images of all subjects were acquired with a 1.5 Tesla scanner (MAGNETOM Aera, Syngo Platform VD13A, Siemens Healthcare, Erlangen, Germany). Each subject has SAX, 2CH and 4CH view cine CMR sequences and each sequence contains 50 frames. More CMR acquisition details for the UK Biobank study can be found in [41]. For image preprocessing, (1) SAX view images were resampled by linear interpolation from a spacing of Inline graphic to a spacing of Inline graphic while 2CH and 4CH view images were resampled from Inline graphic to Inline graphic, (2) keeping the middle slice of the resampled SAX stacks in the center, zero-padding was applied at the top or bottom where necessary to reshape the resampled SAX stacks to 64 slices, (3) to cover the whole LV as the ROI, the resampled SAX stacks were cropped to a size of Inline graphic around the center of the LV in the middle slice (note that we computed the center of the LV based on the LV myocardium segmentation of the middle slice of the SAX stack), (4) 2CH and 4CH view images were cropped to Inline graphic based on the center of the intersecting line between the middle slice of the cropped SAX stack and the 2CH/4CH view image, (5) each frame was independently normalized to zero mean and unit standard deviation, and (6) 3D masks (Eq. 8) were computed by a coordinate transformation using DICOM image header information of SAX, 2CH and 4CH view images. Note that the 2D SAX slices used in the shape regularization module were unified to 9 adjacent slices for all subjects, comprising the middle slice and the 4 slices above and below it. With this image preprocessing, the input SAX, 2CH and 4CH view images cover the whole LV in the center.
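Two of these preprocessing steps, slice padding (step 2) and per-frame intensity normalization (step 5), are simple enough to sketch in NumPy. The function names are hypothetical, the symmetric split of the padding is one way of keeping the middle slice central, and the center-crop branch for stacks with more than 64 slices is an assumption added so that the function is total; the text only describes zero-padding.

```python
import numpy as np

def normalize_frame(frame):
    """Scale one frame to zero mean and unit standard deviation."""
    std = frame.std()
    # Guard against constant frames, where the standard deviation is zero.
    return (frame - frame.mean()) / (std if std > 0 else 1.0)

def pad_slices(stack, target=64):
    """Zero-pad a (slices, H, W) stack to `target` slices, middle slice central.

    Stacks that already exceed `target` are center-cropped instead
    (an assumption, not described in the text).
    """
    d = stack.shape[0]
    if d >= target:
        start = (d - target) // 2
        return stack[start:start + target]
    before = (target - d) // 2
    after = target - d - before
    # np.pad fills with zeros by default.
    return np.pad(stack, ((before, after), (0, 0), (0, 0)))
```

For example, a resampled stack with 50 slices receives 7 zero slices on each side, so its original middle slice ends up at the center of the 64-slice volume.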

TABLE I. Participant Characteristics. Data are Mean ± Standard Deviation for Continuous Variables and Number of Participants for the Categorical Variable.
Parameter                           Value (580 subjects)
Age (years)                         64 ± 8
Sex (Female/Male)                   325 / 255
Ejection fraction (%)               60 ± 6
Weight (kg)                         74 ± 15
Height (cm)                         169 ± 9
Body mass index (kg/m²)             26 ± 4
Diastolic blood pressure (mm Hg)    79 ± 10
Systolic blood pressure (mm Hg)     138 ± 19

3D high-resolution segmentations of these subjects were automatically generated using the 4Dsegment tool [9] based on the resampled SAX stacks, followed by manual quality control. The obtained segmentations have been shown to be useful in clinical applications (e.g., [1]), and thus we use them to generate ground truth 2D edge maps (Fig. 1) in this work. In detail, we utilize the obtained 3D masks to extract SAX, 2CH and 4CH view planes from these 3D segmentations and then use contour extraction to obtain Inline graphic used in Sec. III-B.2. Note that we use 3D segmentation(s) to refer to the 3D segmentations obtained by [9] in this section.

We split the dataset into 450/50/80 for train/validation/test and train MulViMotion for 300 epochs. The hyper-parameters in Eq. 11 are selected as Inline graphic.

2). Evaluation Metrics:

We use segmentations to provide a quantitative evaluation of the estimated 3D motion fields. This is the same evaluation performed in other cardiac motion tracking literature [11], [12], [15]. Here, 3D segmentations obtained by [9] are used in the evaluation metrics. The framework in [9] performs learning-based segmentation, followed by an atlas-based refinement step to ensure robustness towards potential imaging artifacts. The generated segmentations are anatomically meaningful and spatially consistent. As our work aims to estimate the real 3D motion of the heart from the acquired CMR images, such segmentations that approximate the real shape of the heart can provide a reasonable evaluation. Specifically, on test data, we estimate the 3D motion field Inline graphic from the ED frame to the ES frame, which shows large deformation. Then we warp the 3D segmentation of the ED frame ( Inline graphic) to the ES frame by Inline graphic. Finally, we compare the transformed 3D segmentation ( Inline graphic) with the ground truth 3D segmentation of the ES frame ( Inline graphic) using the following metrics. Note that the ES frame is identified as the frame with the least image intensity similarity to the ED frame.
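The ES frame selection rule can be illustrated with a few lines of NumPy. The text does not state which similarity measure is used, so this sketch assumes mean squared intensity difference (largest difference means least similarity) and assumes that the ED frame is the first frame of the sequence; `find_es_frame` and `ed_index` are names introduced for the example.

```python
import numpy as np

def find_es_frame(sequence, ed_index=0):
    """Return the index of the frame least similar to the ED frame.

    sequence: (T, D, H, W) array holding the SAX stack for every frame.
    Similarity is measured by mean squared error (an assumption).
    """
    ed = sequence[ed_index]
    # Mean squared difference between every frame and the ED frame.
    mse = ((sequence - ed) ** 2).reshape(len(sequence), -1).mean(axis=1)
    return int(np.argmax(mse))
```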

Dice score and Hausdorff distance (HD) are utilized to respectively quantify the volume overlap and contour distance between Inline graphic and Inline graphic. A high value of Dice and a low value of HD represent an accurate 3D motion estimation.

Volume difference (VD) is computed to evaluate the volume preservation, as incompressible motion is desired within the myocardium [13], [19], [25], [30]. Inline graphic, where Inline graphic computes the number of voxels in the segmentation volume. A low value of VD means a good volume preservation ability of Inline graphic.
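A minimal NumPy sketch of these three metrics on binary segmentation volumes is shown below. Several simplifications are assumed: the Hausdorff distance is computed by brute force over all foreground voxels rather than over extracted contours, the voxel spacing defaults to 1 mm, and the volume difference is expressed as the relative change in voxel count with respect to a reference segmentation passed by the caller. The function names are hypothetical.

```python
import numpy as np

def dice(a, b):
    """Dice overlap between two binary volumes."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def hausdorff(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance between foreground voxel sets.

    Brute force: memory grows with (voxels in a) x (voxels in b),
    so this is only suitable for small volumes or contour voxels.
    """
    pa = np.argwhere(a) * np.asarray(spacing)
    pb = np.argwhere(b) * np.asarray(spacing)
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def volume_difference(warped, reference):
    """Relative change in voxel count, in percent."""
    v_ref = int(reference.astype(bool).sum())
    v_warp = int(warped.astype(bool).sum())
    return 100.0 * abs(v_warp - v_ref) / v_ref
```

For two 2x2x2 cubes offset by one voxel along one axis, the Dice score is 0.5, the Hausdorff distance is 1 and the volume difference is 0.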

The Jacobian determinant Inline graphic ( Inline graphic) is employed to evaluate the local behavior of Inline graphic: A negative Jacobian determinant Inline graphic indicates that the motion field at position Inline graphic results in folding and leads to non-diffeomorphic transformations. Therefore, a low number of points with Inline graphic corresponds to an anatomically plausible deformation from ED frame to ES frame and thus indicates a good Inline graphic. We count the percentage of voxels in the myocardial wall with Inline graphic in the evaluation.
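The folding measure can be sketched as follows. For a displacement field, the Jacobian of the transformation is taken here as the identity plus the spatial gradient of the displacement, and the percentage of masked voxels with a negative determinant is reported. The use of `np.gradient` for the derivatives, displacements expressed in voxel units and the function names are assumptions of this example.

```python
import numpy as np

def jacobian_determinant(flow):
    """det(I + grad(flow)) at every voxel of a (3, D, H, W) displacement field."""
    # grads[c, a] holds d flow_c / d axis_a, giving shape (3, 3, D, H, W).
    grads = np.stack(
        [np.stack(np.gradient(flow[c]), axis=0) for c in range(3)], axis=0
    )
    jac = grads + np.eye(3)[:, :, None, None, None]
    # Move the 3x3 matrix axes last so np.linalg.det works per voxel.
    jac = np.moveaxis(jac, (0, 1), (-2, -1))
    return np.linalg.det(jac)

def percent_folding(flow, mask):
    """Percentage of voxels inside `mask` with a negative Jacobian determinant."""
    det = jacobian_determinant(flow)
    return 100.0 * np.mean(det[mask.astype(bool)] < 0)
```

A zero displacement field gives a determinant of 1 everywhere and therefore 0% folding, while a field that reverses the order of voxels along one axis folds everywhere.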

3). Baseline Methods:

We compared the proposed method with three cardiac motion tracking methods, including two conventional methods and one deep learning method. The first baseline is a B-spline free form deformation (FFD) algorithm [23] which has been used in many recent cardiac motion tracking works [1], [6], [14], [26], [27]. We use the FFD approach implemented in the MIRTK toolkit.5 The second baseline is a diffeomorphic Demons (dDemons) algorithm [29] which has been used in [13] for cardiac motion tracking. We use the SimpleITK software package for the dDemons implementation.6 In addition, the UNet architecture has been used in many recent works for image registration [37], [42], [43], and thus our third baseline is a deep learning method with 3D-UNet [44]. The input to the 3D-UNet baseline is a pair of frames ( Inline graphic) and its output is a 3D motion field. Eq. 4 is used as the loss function for this baseline. We implemented 3D-UNet based on its online code.7 For the baseline methods with hyper-parameters, we evaluated several sets of parameter values and selected the hyper-parameters that achieve the best Dice score on the validation set.

B. 3D Myocardial Motion Tracking

1). Motion Tracking Performance:

For each test subject, MulViMotion is utilized to estimate 3D motion fields in the full cardiac cycle. With the obtained Inline graphic, we warp the 3D segmentation of the ED frame ( Inline graphic) to the Inline graphic-th frame. Fig. 5 (a) shows that the estimated Inline graphic enables the warped 3D segmentation to match the myocardial area in images from different anatomical views. In addition, we warp the SAX stack of the ED frame ( Inline graphic) to the Inline graphic-th frame. Fig. 5 (b) shows the effectiveness of Inline graphic by comparing the warped and the ground truth SAX view images. By utilizing the warped 3D segmentation, we further compute established clinical biomarkers. Fig. 6 shows the curve of LV volume over time. The shape of the curve is consistent with reported results in the literature [11], [45].

Fig. 5.

Examples of motion tracking results. 3D motion fields generated by MulViMotion are used to warp 3D segmentations and SAX stacks from ED frame to the Inline graphic-th frame. (a) The warped segmentations overlaid on SAX, 2CH and 4CH view images. (b) The ground truth (GT) and the warped SAX stacks as well as their difference maps (i.e., GT–Warped).

Fig. 6. The results of LV volume across the cardiac cycle. (a) Results on a randomly selected test subject. (b) Results on all test subjects (mean values and confidence interval are presented). Note that, for each subject in (b), we normalized LV volume (dividing the LV volume in all time frames by that in the ED frame) and show the average results of all test subjects.
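The normalization described in the caption of Fig. 6 can be sketched as follows. The volumes are made-up numbers, and the assumption that the ED frame is the first frame of each sequence (`ed_index=0`) is ours, used only for illustration.

```python
def normalize_by_ed(volumes, ed_index=0):
    """Divide the LV volume of every frame by the volume of the ED frame."""
    ed_volume = volumes[ed_index]
    if ed_volume <= 0:
        raise ValueError("ED volume must be positive")
    return [v / ed_volume for v in volumes]


def mean_curve(curves):
    """Average several equally long curves frame by frame."""
    n_frames = len(curves[0])
    if any(len(c) != n_frames for c in curves):
        raise ValueError("all curves must have the same number of frames")
    return [sum(c[t] for c in curves) / len(curves) for t in range(n_frames)]


# Made-up LV volumes (mL) for two subjects over four frames, ED frame first.
subject_a = [150.0, 120.0, 60.0, 135.0]
subject_b = [100.0, 70.0, 50.0, 90.0]

normalized = [normalize_by_ed(subject_a), normalize_by_ed(subject_b)]
average = mean_curve(normalized)
print([round(v, 3) for v in average])
```

Normalizing before averaging keeps subjects with large hearts from dominating the mean curve.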

We quantitatively compared MulViMotion with the baseline methods in Table II. With the 3D motion fields generated by the different methods, the 3D segmentations of the ED frame are warped to the ES frame and compared with the ground truth 3D segmentations of the ES frame using the metrics introduced in Sec. IV-A.2. From this table, we observe that MulViMotion outperforms all baseline methods in terms of Dice and Hausdorff distance, demonstrating the effectiveness of the proposed method in estimating 3D motion fields. MulViMotion also achieves the lowest volume difference, indicating that the proposed method is better able to preserve the volume of the myocardial wall during cardiac motion tracking. Compared with a diffeomorphic motion tracking method (dDemons [29]), the proposed method has a similarly small percentage of voxels with a negative Jacobian determinant. This illustrates that the learned motion field is smooth and preserves topology.
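As an illustration of two of the metrics used in this comparison, the sketch below computes a Dice score and a relative volume difference on tiny flattened binary masks. The exact metric definitions are given in Sec. IV-A.2; the volume difference here is assumed to be the absolute volume change relative to a reference segmentation, and the masks are toy data.

```python
def dice_score(mask_a, mask_b):
    """Dice overlap between two flattened binary masks."""
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    return 2.0 * intersection / total if total else 1.0


def volume_difference_percent(warped, reference):
    """Absolute volume change of the warped mask relative to a reference mask.

    Assumption: volume is proportional to the voxel count, so the voxel size
    cancels in the ratio.
    """
    v_ref = sum(1 for r in reference if r)
    v_warped = sum(1 for w in warped if w)
    if v_ref == 0:
        raise ValueError("reference mask is empty")
    return 100.0 * abs(v_warped - v_ref) / v_ref


# Toy 1D "segmentations" (1 = myocardium, 0 = background).
ed_seg = [1, 1, 1, 1, 0, 0, 0, 0]
warped_ed_seg = [0, 1, 1, 1, 1, 0, 0, 0]
es_ground_truth = [0, 0, 1, 1, 1, 1, 0, 0]

print(dice_score(warped_ed_seg, es_ground_truth))
print(volume_difference_percent(warped_ed_seg, ed_seg))
```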

TABLE II. Comparison With Other Cardiac Motion Tracking Methods. ↑ Indicates the Higher Value the Better While ↓ Indicates the Lower Value the Better. Results are Reported as “Mean (Standard Deviation)” for Dice, Hausdorff Distance (HD), Volume Difference (VD) and Percentage of Voxels With a Negative Jacobian Determinant (det J < 0). CPU and GPU Runtimes are Reported as the Average Inference Time for a Single Subject. Best Results in Bold.

| Methods | Anatomical views | Dice ↑ | HD (mm) ↓ | VD (%) ↓ | det J < 0 (%) ↓ | Time CPU (s) ↓ | Time GPU (s) ↓ |
|---|---|---|---|---|---|---|---|
| FFD [23] | SAX | 0.7250 (0.0511) | 20.1138 (5.1130) | 14.45 (6.87) | 11.94 (5.01) | 15.91 | - |
| dDemons [29] | SAX | 0.7219 (0.0422) | 18.3945 (3.5650) | 14.46 (6.38) | **0.13 (0.17)** | 28.32 | - |
| 3D-UNet [44] | SAX | 0.7382 (0.0293) | 17.4785 (3.1030) | 30.97 (9.89) | 0.95 (1.05) | 16.88 | **1.09** |
| MulViMotion | SAX, 2CH, 4CH | **0.8200 (0.0348)** | **14.5937 (4.2449)** | **8.62 (4.85)** | 0.93 (0.94) | **3.55** | 1.15 |
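The negative Jacobian determinant measure reported in Table II can be illustrated with a small sketch. It assumes the transformation is written as identity plus displacement, so that the Jacobian at a voxel is I + ∇u, and it takes the 3×3 displacement gradients as given (in practice they would be obtained by finite differences on the motion field). The gradients below are toy values.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def jacobian_of_transform(displacement_gradient):
    """Jacobian of x -> x + u(x), i.e. identity plus the displacement gradient."""
    return [[displacement_gradient[i][j] + (1.0 if i == j else 0.0)
             for j in range(3)] for i in range(3)]


def percent_negative_jacobian(displacement_gradients):
    """Percentage of voxels whose Jacobian determinant is negative (folding)."""
    negative = sum(1 for g in displacement_gradients
                   if det3(jacobian_of_transform(g)) < 0)
    return 100.0 * negative / len(displacement_gradients)


zero = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]   # no deformation
compress = [[-0.5, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # det = 0.5
fold = [[-2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]      # det = -1

print(percent_negative_jacobian([zero, zero, compress, fold]))
```

A determinant below zero means the mapping locally reverses orientation, which is why a low percentage is read as evidence of topology preservation.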

We further qualitatively compared MulViMotion with the baseline methods in Fig. 7. A geometric mesh is used to provide a 3D visualization of the myocardial wall. Specifically, 3D segmentations of the ED frame are warped to any other frame in the cardiac cycle and geometric meshes are reconstructed from these warped 3D segmentations. The red meshes in Fig. 7 demonstrate that, in contrast to all baseline methods, which only show motion within the SAX plane, MulViMotion is able to estimate through-plane motion along the longitudinal direction over the cardiac cycle, e.g., the reconstructed mesh at one time frame is deformed in all three directions compared with the meshes at other time frames. In addition, the white meshes in Fig. 7 illustrate that, compared with all baseline methods, the 3D motion field generated by MulViMotion performs best in warping the ED frame to the ES frame and yields the reconstructed ES frame mesh that is most similar to the ground truth (GT) ES frame mesh (blue meshes). These results demonstrate the effectiveness of MulViMotion for 3D motion tracking, especially for estimating through-plane motion.

Fig. 7. 3D visualization of motion tracking results using the baseline methods and MulViMotion. Column 1 (blue) shows the ground truth (GT) meshes of the ED frame. Columns 2-6 (red) show 3D motion tracking results across the cardiac cycle. These meshes are reconstructed from the warped 3D segmentations (warped from the ED frame to different time frames). Column 7 (white) additionally shows the reconstructed meshes of the ES frame from the motion tracking results and Column 8 (blue) shows the ground truth meshes of the ES frame.

2). Runtime:

Table II shows the runtime of MulViMotion and the baseline methods on an Intel Xeon E5-2643 CPU and an NVIDIA Tesla T4 GPU. The average inference time for a single subject is reported. FFD [23] and dDemons [29] are only available on CPU, while 3D-UNet [44] and MulViMotion are available on both CPU and GPU. The results show that our method achieves a similar runtime to 3D-UNet [44] on GPU (1.15 s versus 1.09 s) and is roughly 4.5 to 8 times faster than the baseline methods on CPU (3.55 s versus 15.91 s to 28.32 s).
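The CPU speed-up factors follow directly from the runtimes in Table II. The short sketch below only divides the reported per-subject CPU times and introduces no new measurements.

```python
# Average CPU inference time per subject in seconds, copied from Table II.
cpu_seconds = {
    "FFD": 15.91,
    "dDemons": 28.32,
    "3D-UNet": 16.88,
    "MulViMotion": 3.55,
}


def speedup_over_baselines(times, method="MulViMotion"):
    """Return how many times faster `method` is than each other entry."""
    reference = times[method]
    return {name: t / reference for name, t in times.items() if name != method}


speedups = speedup_over_baselines(cpu_seconds)
for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```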

3). Ablation Study:

For the proposed method, we explore the effects of using different anatomical views and the importance of the shape regularization module. We use evaluation metrics in Sec. IV-A.2 to show quantitative results.

Table III shows the motion tracking results using different anatomical views. In particular, M1 only uses images and 2D edge maps from the SAX view to train the proposed method, M2 uses those from both SAX and 2CH views and M3 uses those from both SAX and 4CH views. M2 and M3 outperform M1 in terms of Dice, illustrating the importance of LAX view images. In addition, MulViMotion (M) outperforms the other variant models. This might be because more LAX views can introduce more high-resolution 3D anatomical information for 3D motion tracking.

TABLE III. 3D Motion Tracking With Different Anatomical Views. M1, M2 and M3 are Variants of the Proposed Method and M Refers to MulViMotion. Results are Reported the Same Way as Table II. Best Results in Bold.

| Model | SAX | 2CH | 4CH | Dice ↑ | HD (mm) ↓ | VD (%) ↓ |
|---|---|---|---|---|---|---|
| M1 | ✓ | | | 0.7780 (0.0275) | 18.2564 (3.4031) | 30.66 (7.73) |
| M2 | ✓ | ✓ | | 0.7964 (0.0273) | 18.1014 (3.7146) | 24.05 (5.24) |
| M3 | ✓ | | ✓ | 0.7904 (0.0305) | 19.2265 (3.2441) | 17.50 (4.55) |
| M | ✓ | ✓ | ✓ | **0.8200 (0.0348)** | **14.5937 (4.2449)** | **8.62 (4.85)** |

In Table IV, the proposed method is trained using all three anatomical views but optimized with different combinations of losses. A1 optimizes the proposed method without shape regularization (i.e., without the shape regularization loss in Eq. 11). A2 introduces basic shape regularization on top of A1 by adding two of the three shape loss terms. MulViMotion (M) outperforms A1, illustrating the importance of shape regularization. MulViMotion also outperforms A2. This is likely because the full set of shape loss terms is needed to guarantee the generation of distinct and correct 3D edge maps for all frames in the cardiac cycle. These results show the effectiveness of all proposed components of the shape regularization loss.

TABLE IV. 3D Motion Tracking With Different Combinations of Loss Functions. A1 Optimizes the Proposed Method Without Shape Regularization (Without the Shape Regularization Loss in Eq. 11). A2 Adds Basic Shape Regularization on Top of A1. M Refers to MulViMotion. All Models are Trained by Three Anatomical Views. Results are Reported the Same Way as Table II. Best Results in Bold.

| Model | Shape loss terms used (of 3) | Dice ↑ | HD (mm) ↓ | VD (%) ↓ |
|---|---|---|---|---|
| A1 | 0 | 0.7134 (0.0316) | 18.9555 (3.1054) | 33.93 (10.27) |
| A2 | 2 | 0.7294 (0.0295) | 17.5047 (3.7485) | 12.51 (4.28) |
| M | 3 | **0.8200 (0.0348)** | **14.5937 (4.2449)** | **8.62 (4.85)** |

Fig. 8 shows motion estimation performance under different strengths of shape regularization. In detail, the proposed method is trained with three anatomical views and all loss components, but the shape loss is computed on different percentages of the training subjects (20%, 40%, 60%, 80%, 100%). From Fig. 8, we observe that motion estimation performance improves as the percentage of subjects increases.

Fig. 8. 3D motion tracking with different strengths of shape regularization, where the shape loss is computed on different percentages of the training subjects (20%, 40%, 60%, 80%, 100%). The left column is the Dice score and the right column is the Hausdorff distance.

4). The Influence of Hyper-Parameters:

Fig. 9 presents Dice and Hausdorff distance (HD) on the test data for various values of the smoothness loss weight and the shape regularization weight (Eq. 11). The Dice scores and HDs are computed according to Sec. IV-A.2. We observe that a strong constraint on motion field smoothness may sacrifice registration accuracy (see Fig. 9 (a)). Moreover, registration performance improves as the shape regularization weight increases from 1 to 5 and then deteriorates as it increases further (from 5 to 9). This might be because strong shape regularization forces motion estimation to focus mainly on the few 2D planes that contain sparse labels.

Fig. 9. Effects of varied hyper-parameters on Dice and Hausdorff distance. (a) Results for various smoothness loss weights with the shape regularization weight fixed. (b) Results for various shape regularization weights with the smoothness loss weight fixed.

5). The Performance on Subjects With Slice Misalignment:

Acquired SAX stacks may contain slice misalignment due to poor compliance with breath-holding instructions or a change of position during breath-holding acquisitions [46]. This leads to an incorrect representation of the cardiac volume and results in difficulties for accurate 3D motion tracking. Fig. 10 compares the motion tracking results of 3D-UNet [44], MulViMotion and MulViMotion without the shape regularization loss for a test subject with severe slice misalignment (e.g., Fig. 10 (a) middle column). Fig. 10 (b) shows that, in contrast to 3D-UNet, the motion fields generated by MulViMotion preserve the topology of the myocardial wall. MulViMotion outperforms MulViMotion without the shape regularization loss, which indicates the importance of the shape regularization module for reducing the negative effect of slice misalignment. These results demonstrate the advantage of integrating shape information from multiple views and show the effectiveness of the proposed method on special cases.

Fig. 10. Motion tracking results on the test subject with slice misalignment. The first three columns in (a) are the three orthogonal planes of the SAX stack and the last two columns are 2CH and 4CH view images, respectively. (b) presents examples of motion tracking results using 3D-UNet [44], MulViMotion and MulViMotion without the shape regularization loss. The yellow arrow shows an example of slice misalignment while green arrows show examples of motion tracking failures using 3D-UNet. Note that the results are shown for a single frame chosen for a more distinct comparison.

6). Wall Thickening Measurement:

We have computed regional and global myocardial wall thickness at the ED frame and the ES frame based on the ED frame segmentation and the warped ES frame segmentation,8 respectively. The global wall thickness at the ED frame is Inline graphic, which is consistent with results obtained by [14] ( Inline graphic). The wall thickness at the ES frame for the American Heart Association 16 segments is shown in Table V. In addition, we have computed the fractional wall thickening between the ED frame and the ES frame by Inline graphic. The results in Table V show that the regional and global fractional wall thickening are comparable with results reported in the literature [47], [48].

TABLE V. Wall Thickness at the ES Frame and Fractional Wall Thickening Between ED and ES Frames. Results are Reported as “Mean (Standard Deviation)”.

| Region | Segment | Wall thickness (mm) | Fractional wall thickening (%) |
|---|---|---|---|
| Basal | Anterior (1) | 9.7 (2.7) | 34.0 (39.5) |
| Basal | Anteroseptal (2) | 5.7 (2.9) | −24.4 (38.7) |
| Basal | Inferoseptal (3) | 5.5 (2.0) | −17.3 (30.2) |
| Basal | Inferior (4) | 9.0 (1.7) | 47.8 (28.5) |
| Basal | Inferolateral (5) | 11.0 (2.0) | 72.8 (25.9) |
| Basal | Anterolateral (6) | 10.9 (1.8) | 62.0 (23.8) |
| Mid-ventricle | Anterior (7) | 10.9 (1.5) | 79.9 (21.0) |
| Mid-ventricle | Anteroseptal (8) | 11.9 (1.6) | 76.2 (21.4) |
| Mid-ventricle | Inferoseptal (9) | 10.8 (1.4) | 39.8 (12.3) |
| Mid-ventricle | Inferior (10) | 10.9 (1.3) | 62.5 (15.5) |
| Mid-ventricle | Inferolateral (11) | 11.2 (1.5) | 73.3 (17.1) |
| Mid-ventricle | Anterolateral (12) | 10.5 (1.2) | 63.9 (15.6) |
| Apical | Anterior (13) | 10.8 (1.1) | 86.3 (23.2) |
| Apical | Septal (14) | 10.9 (1.4) | 76.7 (20.5) |
| Apical | Inferior (15) | 10.6 (1.4) | 76.2 (15.1) |
| Apical | Lateral (16) | 11.1 (1.4) | 84.3 (18.9) |
| Global | | 10.1 (2.5) | 55.9 (40.6) |
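For readers who want to reproduce a fractional wall thickening value, the sketch below assumes the commonly used definition, i.e., the change in wall thickness from ED to ES relative to the ED thickness, expressed as a percentage. The paper's own formula is not reproduced in this text, so this definition is an assumption, and the thickness values are hypothetical.

```python
def fractional_wall_thickening(ed_thickness_mm, es_thickness_mm):
    """Percentage change in wall thickness from ED to ES, relative to ED.

    Assumed definition: 100 * (ES - ED) / ED.
    """
    if ed_thickness_mm <= 0:
        raise ValueError("ED wall thickness must be positive")
    return 100.0 * (es_thickness_mm - ed_thickness_mm) / ed_thickness_mm


# Hypothetical segment that thickens during systole.
print(fractional_wall_thickening(6.0, 9.0))
# Hypothetical segment that appears to thin, giving a negative value.
print(fractional_wall_thickening(8.0, 6.0))
```

Under this definition a negative entry, as seen for the basal septal segments in Table V, means the measured ES thickness is smaller than the ED thickness.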

V. Discussion

In this paper, we propose a deep learning-based method for estimating 3D myocardial motion from 2D multi-view cine CMR images. A naïve alternative to our method would be to train a fully unsupervised motion estimation network using high-resolution 3D cine CMR images. However, such 3D images are rarely available because (1) 3D cine imaging requires long breath holds during acquisition and is not commonly used in clinical practice, and (2) recovering high-resolution 3D volumes purely from 2D multi-view images is challenging due to the sparsity of multi-view planes.

Our focus has been on LV myocardial motion tracking because it is important for clinical assessment of cardiac function. Our model can be easily adapted to 3D right ventricular myocardial motion tracking by using the corresponding 2D edge maps in the shape regularization module during training.

In shape regularization, we use edge maps to represent anatomical shape, i.e., we predict 3D edge maps of the myocardial wall and we use 2D edge maps defined in the multi-view images to provide shape information. This is because (1) the contour of the myocardial wall is more representative of anatomical shape than the content, (2) compared to 3D dense segmentation, 3D edge maps with sparse labels are more likely to be estimated by images from sparse multi-view planes, and (3) using edge maps offers the potential of using automatic contour detection algorithms to obtain shape information directly from images.

An automated algorithm is used to obtain the 2D edge maps that provide shape information in the shape regularization module. This is because manual data labeling is time-consuming, costly and usually unavailable. The proposed method can be robust to errors in these automatically obtained 2D edge maps because they constrain the estimated 3D edge maps only on spatially sparse planes.

We use the aligned 2D edge maps of SAX stacks to train MulViMotion. This is reasonable because aligned SAX ground truth edge maps can introduce correct shape information of the heart, and thus can explicitly constrain the estimated 3D motion field to reflect the real motion of the heart. Nevertheless, we further test the effectiveness of the proposed method by using unaligned SAX edge maps during training. Specifically, MulViMotion* uses the algorithm in [49] to predict the 2D segmentation of the myocardium for each SAX slice independently without accounting for the inter-slice misalignment. The contour of this 2D segmentation is used as the SAX ground truth edge map during training. LAX ground truth edge maps are still generated based on [9]. Table VI and Fig. 11 show that the proposed method is capable of estimating 3D motion even if it is trained with unaligned SAX edge maps. This indicates that the LAX 2CH and 4CH view images, which provide correct longitudinal anatomical shape information, can compensate for the slice misalignment in the SAX stacks and thus make a major contribution to the improved estimation accuracy of through-plane motion.

TABLE VI. Quantitative Comparison Between 3D-UNet and MulViMotion* on Test Set. MulViMotion* Uses Unaligned SAX Ground Truth Edge Maps During Training. Results are Reported the Same Way as Table II. Best Results in Bold.

| Methods | Dice ↑ | HD (mm) ↓ | VD (%) ↓ |
|---|---|---|---|
| 3D-UNet [44] | 0.7382 (0.0293) | 17.4785 (3.1030) | 30.97 (9.89) |
| MulViMotion* | **0.7856 (0.0295)** | **16.0028 (3.9749)** | **21.35 (5.32)** |

TABLE VII. Quantitative Comparison Between VoxelMorph (VM) [42] and MulViMotion on Test Set. VM Follows the Optimal Architecture and Hyper-Parameters Suggested by the Authors. VM† Uses a Bigger Architecture (See Footnote 10). Results are Reported the Same Way as Table II. Best Results in Bold.

| Methods | Dice ↑ | HD (mm) ↓ | VD (%) ↓ |
|---|---|---|---|
| VM [42] | 0.7115 (0.0339) | 15.3277 (2.7690) | 34.71 (11.84) |
| VM† [42] | 0.7147 (0.0307) | 17.6747 (4.3181) | 31.75 (10.80) |
| MulViMotion | **0.8200 (0.0348)** | **14.5937 (4.2449)** | **8.62 (4.85)** |

Fig. 11. 3D visualization of motion tracking results using 3D-UNet and MulViMotion*. MulViMotion* uses unaligned SAX ground truth edge maps during training.

In the proposed method, a hybrid 2D/3D network is built to estimate 3D motion fields, where the 2D CNNs combine multi-view features and the 3D CNNs learn 3D representations from the combined features. Such a hybrid network can occupy less GPU memory than a pure 3D network. In particular, the number of parameters in this hybrid network is 21.7 million, far fewer than in 3D-UNet (41.5 million). Moreover, this hybrid network is able to take full advantage of 2D multi-view images because it learns 2D features from each anatomical view before learning 3D representations.
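One reason a hybrid 2D/3D design can be lighter than a pure 3D network is that a 2D convolution has fewer weights than a 3D convolution with the same channel widths and kernel size. The sketch below counts parameters for a single hypothetical layer; the layer sizes are illustrative and are not taken from the MulViMotion or 3D-UNet architectures. Only the two totals (21.7 and 41.5 million) come from the text.

```python
def conv_parameters(in_channels, out_channels, kernel_size, spatial_dims, bias=True):
    """Number of learnable parameters in one convolution layer."""
    weights = in_channels * out_channels * kernel_size ** spatial_dims
    return weights + (out_channels if bias else 0)


# Hypothetical layer: 64 input channels, 64 output channels, kernel size 3.
params_2d = conv_parameters(64, 64, 3, spatial_dims=2)
params_3d = conv_parameters(64, 64, 3, spatial_dims=3)
print(params_2d, params_3d)

# Ratio of the total parameter counts reported in the text (in millions).
reported_ratio = 21.7 / 41.5
print(round(reported_ratio, 3))
```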

In the experiment, we use 580 subjects for model training and evaluation. This is mainly because our work tackles 3D data and the number of training subjects is limited by the cost of model training. Specifically, we used 500 subjects to train our model for 300 epochs on an NVIDIA Tesla T4 GPU, which requires ~60 hours of training for each model. In addition, this work focuses on developing the methodology for multi-view motion tracking, and this sample size aligns with previous cardiac analysis work for method development [11], [15], [32], [33]. A population-based clinical study on the whole UK Biobank (currently ~50,000 subjects) still requires future investigation.

With the view planning step in standard cardiac MRI acquisition, the acquired multi-view images are aligned and thus are able to describe the heart from different views [50]. In order to preserve this spatial connection between the separate anatomical views, data augmentations (e.g., rotation and scaling) that are used in some 2D motion estimation works are excluded from this multi-view 3D motion tracking task.

We use two LAX views (2CH and 4CH) in this work for 3D motion estimation, but the number of anatomical views is not a limitation of the proposed method. More LAX views (e.g., the 3-chamber view) can be integrated into MulViMotion by adding extra encoders in FeatureNet and extra views in the shape regularization loss. However, each additional anatomical view increases GPU memory usage and requires additional image annotation (i.e., 2D edge maps).

The data used in the experiment were acquired with a 1.5 Tesla (1.5T) scanner, but the proposed method can also be applied to 3T CMR images. The possible dark band artifacts in 3T CMR images may affect the image similarity loss. However, the high image quality of 3T CMR and the use of high weights for the regularization terms (e.g., shape regularization and the local smoothness loss) may reduce the negative effect of these artifacts.

We use the ED frame and every other frame of the cardiac cycle as paired frames to estimate the 3D motion field. This is mainly because the motion estimated from such frame pairing is needed for downstream tasks such as strain estimation [27], [51], [52]. In the cardiac motion tracking task, the reference frame is commonly chosen as the ED or ES frame [15]. Such frame pairing can often be observed in other cardiac motion tracking literature, e.g., [11], [12], [15].

In this work, apart from two typical and widely used conventional algorithms, we also compared the proposed method with a learning-based method [31] which can represent most of the recent image registration works. Specifically, the architecture of [31] has been used in many recent works, e.g., [37], [42], [43], and many other recent works, e.g., [42], [53], [54], are similar to [31] in that only single-view images are used for image registration. Nevertheless, we further compared the proposed method with another recent and widely used learning-based image registration method [42] (VoxelMorph9). We train VoxelMorph following the optimal architecture and hyper-parameters suggested by the authors (VM) and we also train VoxelMorph with a bigger architecture10 (VM†). For a fair comparison, the 2D ground truth edge maps (see Eq. 8) are used to generate the segmentation of SAX stacks as auxiliary information. Table VII shows that the proposed method outperforms VoxelMorph for 3D cardiac motion tracking. This is expected because the SAX segmentation used in VoxelMorph has low through-plane resolution and thus can hardly help improve 3D motion estimation. Moreover, VoxelMorph only uses single-view images while the proposed method uses information from multiple views.

VI. Conclusion

In this paper, we propose a multi-view motion estimation network (MulViMotion) for 3D myocardial motion tracking. The proposed method takes full advantage of routinely acquired multi-view 2D cine CMR images to accurately estimate 3D motion fields. Experiments on the UK Biobank dataset demonstrate the effectiveness and practical applicability of our method compared with other competing methods.


Appendix

A. Examples of 3D Masks

Fig. 12 shows examples of the 3D masks used in the shape regularization module of MulViMotion. These 3D masks identify the locations of multi-view images in the SAX stack. We generate these 3D masks in the image preprocessing step by a coordinate transformation using DICOM image header information.
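The coordinate transformation is not spelled out in the text, so the following is only a sketch of one standard way to relate image planes through DICOM headers. It uses the usual mapping from pixel indices to patient coordinates based on the ImagePositionPatient, ImageOrientationPatient and PixelSpacing attributes, and then measures how far a point lies from another image plane. The function names, the numeric header values, and the idea of thresholding this distance to build a mask are assumptions made for illustration.

```python
def pixel_to_patient(row, col, image_position, image_orientation, pixel_spacing):
    """Map a pixel (row, col) of a 2D DICOM image to patient coordinates (mm)."""
    along_row = image_orientation[:3]   # direction in which the column index grows
    along_col = image_orientation[3:]   # direction in which the row index grows
    row_spacing, col_spacing = pixel_spacing  # PixelSpacing: [row spacing, column spacing]
    return tuple(
        image_position[k]
        + along_row[k] * col_spacing * col
        + along_col[k] * row_spacing * row
        for k in range(3)
    )


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def distance_to_plane(point, image_position, image_orientation):
    """Signed distance (mm) from a patient-space point to an image plane."""
    normal = cross(image_orientation[:3], image_orientation[3:])
    return sum((point[k] - image_position[k]) * normal[k] for k in range(3))


# Hypothetical SAX slice header values.
sax_point = pixel_to_patient(
    row=10, col=20,
    image_position=(-100.0, -100.0, 50.0),
    image_orientation=(1.0, 0.0, 0.0, 0.0, 1.0, 0.0),
    pixel_spacing=(1.8, 1.8),
)
# Hypothetical LAX plane header values.
lax_distance = distance_to_plane(
    sax_point,
    image_position=(-100.0, -82.0, 100.0),
    image_orientation=(1.0, 0.0, 0.0, 0.0, 0.0, -1.0),
)
print(sax_point, lax_distance)
```

Under these assumptions, a voxel of the SAX stack would be marked in the mask of a LAX view when its distance to that LAX plane falls below a chosen tolerance, for example half the voxel size.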

Fig. 12. Examples of 3D masks used in the shape regularization module of MulViMotion. The top row shows the 2D images from different anatomical views in the space of the SAX stack. The bottom row shows the 3D masks which represent the locations of these 2D images in the SAX stack. (a) The 2D images from the SAX view (9 slices). (b) The single 2D image from the 2CH view. (c) The single 2D image from the 4CH view.

B. The Dynamic Videos of Motion Tracking Results

The dynamic videos of the motion tracking results of the different motion estimation methods are attached as “Dynamic_videos.zip” in the supplementary material. This file contains four MPEG-4 movies, where “FFD.mp4”, “dDemons.mp4” and “3D-UNet.mp4” are the results of the corresponding baseline methods and “MulViMotion.mp4” is the result of the proposed method. All methods are applied to the same test subject. The codec of these videos is H.264. We have opened the uploaded videos on computers with (1) the Win10 operating system and the Movies&TV player, (2) the Linux Ubuntu 20.04 operating system and the Videos player, and (3) Mac OS and QuickTime Player. If there is any difficulty opening the attached videos, the same dynamic videos can be found at https://github.com/qmeng99/dynamic_videos/blob/main/README.md

C. Additional 3D Motion Tracking Results

Fig. 13 shows additional 3D motion tracking results on a test subject with slice misalignment. This is the same test subject used in Fig. 10 in the main paper. These additional results further demonstrate that the proposed method is able to reduce the negative effect of slice misalignment on 3D motion tracking. In addition, we have computed further established clinical biomarkers. Fig. 14 shows the temporal ejection fraction across the cardiac cycle.

Fig. 13. Motion tracking results on the test subject with slice misalignment using 3D-UNet [44], MulViMotion, and MulViMotion without the shape regularization loss. (a) The warped 3D segmentation overlaid on the SAX view. (b) The 3D visualization of the motion tracking results. The green arrows show examples of motion tracking failures using 3D-UNet. Note that the results are shown for a single frame chosen for a more distinct comparison.

Fig. 14. The results of temporal ejection fraction across the cardiac cycle. (a) Results on a randomly selected test subject. (b) Results on all test subjects (mean values and confidence interval are presented).

D. Applications

1). Strain Estimation:

Myocardial strain provides a quantitative evaluation of the total deformation of a region of tissue during the heartbeat. It is typically evaluated along three orthogonal directions, namely radial, circumferential and longitudinal. Here, we evaluate the performance of the proposed method by estimating the three strains based on the estimated 3D motion field. The myocardial mesh at the ED frame is warped to each other frame using a numeric method and vertex-wise strain is calculated using the Lagrangian strain tensor formula [55] (implemented by https://github.com/Marjola89/3Dstrain_analysis). Subsequently, global strain is computed by averaging across all the vertices of the myocardial wall.
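The strain computation can be illustrated with a minimal sketch of the Green-Lagrange (Lagrangian) strain tensor, E = 0.5 (FᵀF − I), where F = I + ∇u is the deformation gradient of the displacement field u. This is a generic textbook formulation and not a transcription of the linked implementation; the displacement gradients and direction vectors below are hypothetical, and in practice the radial, circumferential and longitudinal directions vary from vertex to vertex.

```python
def deformation_gradient(displacement_gradient):
    """F = I + grad(u) for a 3x3 displacement gradient."""
    return [[displacement_gradient[i][j] + (1.0 if i == j else 0.0)
             for j in range(3)] for i in range(3)]


def green_lagrange_strain(F):
    """E = 0.5 * (F^T F - I)."""
    return [[0.5 * (sum(F[k][i] * F[k][j] for k in range(3))
                    - (1.0 if i == j else 0.0))
             for j in range(3)] for i in range(3)]


def directional_strain(E, direction):
    """Normal strain n^T E n along a unit vector n."""
    return sum(direction[i] * E[i][j] * direction[j]
               for i in range(3) for j in range(3))


def global_strain(strain_tensors, directions):
    """Average the directional strain over all vertices."""
    values = [directional_strain(E, n) for E, n in zip(strain_tensors, directions)]
    return sum(values) / len(values)


# Hypothetical vertex: 20% stretch along x, 10% shortening along y and z.
F = deformation_gradient([[0.2, 0.0, 0.0], [0.0, -0.1, 0.0], [0.0, 0.0, -0.1]])
E = green_lagrange_strain(F)
strain_x = directional_strain(E, (1.0, 0.0, 0.0))
strain_y = directional_strain(E, (0.0, 1.0, 0.0))
print(round(strain_x, 4), round(strain_y, 4))

# Global value over two identical hypothetical vertices.
print(round(global_strain([E, E], [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]), 4))
```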

Fig. 15 shows the estimated global strain curves on test subjects. Both the shapes of the curves and the value ranges of peak strains are consistent with reported results in the literature [52], [56], [57], i.e., radial peak strain is ~ 20% to ~ 70%, circumferential peak strain is ~ −15% to ~ −22% and longitudinal peak strain is ~ −8% to ~ −20%.

Fig. 15. Global strains across the cardiac cycle estimated based on MulViMotion. (a) Results on a randomly selected test subject. (b) Results on all test subjects (mean values and confidence interval are presented).

To obtain further reference strains, we have separately computed global longitudinal and circumferential strains on the 2D LAX and SAX slices according to the algorithm in [14]. On the test set, the global longitudinal peak strain is −18.55%±2.74% (ours is −9.72%±2.49%) while the global circumferential peak strain is −22.76%±3.31% (ours is −27.38%±9.63%). Differences between our strains and these reference strains are to be expected, because the strains in [14] are computed only on sparse 2D slices by 2D motion field estimation, whereas we compute global strains over the whole myocardial wall with 3D motion fields.

Compared to echocardiography, another widely used imaging modality for strain estimation, the average circumferential peak strain reported in our work (−27.38%) is consistent with those typically reported in echocardiography (~ −22% to ~ −32% [58]). The magnitude of the average longitudinal peak strain in our study (−9.72%) is lower than that reported in echocardiography (~ −20% to ~ −25% [58]). This difference is likely due to the higher spatial and temporal resolution of echocardiography (e.g., Inline graphic for spatial resolution and 40 – 60 frames/s for temporal resolution) compared to CMR (e.g., our data has Inline graphic in-plane resolution, Inline graphic through-plane resolution and 50 frames/heart-beat temporal resolution) [41], [58].

For strain estimation, our results are in general consistent with the value ranges reported in [52], [56], [57]. However, it has to be noted that we calculate the strain based on 3D motion fields, whereas most existing strain analysis methods or software packages are based on 2D motion fields, i.e., they only account for in-plane motion within SAX or LAX views. This may result in differences between our estimated strain values and the strain values reported in the literature. In addition, there is still a lack of agreement on strain value ranges (in particular for radial strains) even among mainstream commercial software packages [57]. This is because strain value ranges can vary depending on the vendors, imaging modalities, image quality and motion estimation techniques [57], [58]. Further investigation is still required to set up a reference standard for strain evaluation and to carry out clinical association studies using the reported strain values. Moreover, when manual segmentation is available, it could be used to provide a more accurate shape constraint, which may further improve 3D motion estimation and thus strain estimation.

Funding Statement

This work was supported in part by the British Heart Foundation under Grant RG/19/6/34387 and Grant RE/18/4/34215, in part by the Medical Research Council under Grant MC-A658-5QEB0, in part by the National Institute for Health Research (NIHR) Imperial College Biomedical Research Centre, and in part by Wellcome Trust Grant 102431. This research has been conducted using the UK Biobank resource under Application 40616.

Footnotes

1. The contours are generated based on [9] and a manual quality control. Detailed information is shown in Sec. IV-A.

2. This is implemented by the PyTorch function grid_sample().

3. Code is at DOI: 10.5281/zenodo.6092253

4. Application number 40616, https://www.ukbiobank.ac.uk/

10. Filters in the encoder are [64, 128, 256, 512] while filters in the decoder are [512, 512, 256, 256, 128, 64, 64]. The weight of the smoothness loss is chosen with grid search ( Inline graphic) and we select the value with the best result on validation data Inline graphic. The weight for auxiliary segmentation is chosen from Inline graphic and we select Inline graphic.

Contributor Information

Qingjie Meng, Email: q.meng16@imperial.ac.uk.

Chen Qin, Email: chen.qin@ed.ac.uk.

Wenjia Bai, Email: w.bai@imperial.ac.uk.

Tianrui Liu, Email: t.liu15@imperial.ac.uk.

Antonio de Marvao, Email: antonio.de-marvao@imperial.ac.uk.

Declan P O’Regan, Email: declan.oregan@lms.mrc.ac.uk.

Daniel Rueckert, Email: daniel.rueckert@tum.de.

References

  • [1]. Bello G. A. et al., “Deep-learning cardiac motion analysis for human survival prediction,” Nature Mach. Intell., vol. 1, no. 2, pp. 95–104, Feb. 2019.
  • [2]. Stefanovska A., “Physics of the human cardiovascular system,” Contemp. Phys., vol. 40, no. 1, pp. 31–55, Jan. 1999.
  • [3]. Ivanov P. C., Amaral L. A. N., Goldberger A. L., and Stanley H. E., “Stochastic feedback and the regulation of biological rhythms,” Europhys. Lett., vol. 43, no. 4, pp. 363–368, Aug. 1998.
  • [4]. Shen D., Sundar H., Xue Z., Fan Y., and Litt H., “Consistent estimation of cardiac motions by 4D image registration,” in Medical Image Computing and Computer-Assisted Intervention. Berlin, Germany: Springer, 2005.
  • [5]. Reindl M. et al., “Prognostic implications of global longitudinal strain by feature-tracking cardiac magnetic resonance in ST-elevation myocardial infarction,” Circulat., Cardiovascular Imag., vol. 12, no. 11, Nov. 2019, Art. no. e009404.
  • [6]. Puyol-Anton E. et al., “Regional multi-view learning for cardiac motion analysis: Application to identification of dilated cardiomyopathy patients,” IEEE Trans. Biomed. Eng., vol. 66, no. 4, pp. 956–966, Apr. 2019.
  • [7]. Ibrahim E.-S.-H., “Myocardial tagging by cardiovascular magnetic resonance: Evolution of techniques–pulse sequences, analysis algorithms, and applications,” J. Cardiovascular Magn. Reson., vol. 13, no. 1, pp. 1–40, Dec. 2011.
  • [8]. Claus P., Omar A. M. S., Pedrizzetti G., Sengupta P. P., and Nagel E., “Tissue tracking technology for assessing cardiac mechanics: Principles, normal values, and clinical applications,” JACC Cardiovasc. Imag., vol. 8, pp. 1444–1460, Dec. 2015.
  • [9]. Duan J. et al., “Automatic 3D bi-ventricular segmentation of cardiac images by a shape-refined multi-task deep learning approach,” IEEE Trans. Med. Imag., vol. 38, no. 9, pp. 2151–2164, Sep. 2019.
  • [10]. Ginat D. T., Fong M. W., Tuttle D. J., Hobbs S. K., and Vyas R. C., “Cardiac imaging: Part 1, MR pulse sequences, imaging planes, and basic anatomy,” Amer. J. Roentgenology, vol. 197, no. 4, pp. 808–815, Oct. 2011.
  • [11]. Qin C., Bai W., Schlemper J., Petersen S. E., Piechnik S. K., Neubauer S., and Rueckert D., “Joint learning of motion estimation and segmentation for cardiac MR image sequences,” in Medical Image Computing and Computer-Assisted Intervention. Cham, Switzerland: Springer, 2018.
  • [12]. Zheng Q., Delingette H., and Ayache N., “Explainable cardiac pathology classification on cine MRI with motion characterization by semi-supervised learning of apparent flow,” Med. Image Anal., vol. 56, pp. 80–95, Aug. 2019.
  • [13]. Qin C., Wang S., Chen C., Qiu H., Bai W., and Rueckert D., “Biomechanics-informed neural networks for myocardial motion tracking in MRI,” in Medical Image Computing and Computer-Assisted Intervention. Cham, Switzerland: Springer, 2020.
  • [14]. Bai W. et al., “A population-based phenome-wide association study of cardiac and aortic structure and function,” Nature Med., vol. 26, no. 10, pp. 1654–1662, Oct. 2020.
  • [15]. Yu H. et al., “FOAL: Fast online adaptive learning for cardiac motion estimation,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 4313–4323.
  • [16]. Zerhouni E. A., Parish D. M., Rogers W. J., Yang A., and Shapiro E. P., “Human heart: Tagging with MR imaging—A method for noninvasive assessment of myocardial motion,” Radiology, vol. 169, no. 1, pp. 59–63, Oct. 1988.
  • [17]. Osman N. F., McVeigh E. R., and Prince J. L., “Imaging heart motion using harmonic phase MRI,” IEEE Trans. Med. Imag., vol. 19, no. 3, pp. 186–202, Mar. 2000.
  • [18]. Chen T., Wang X., Chung S., Metaxas D., and Axel L., “Automated 3D motion tracking using Gabor filter bank, robust point matching, and deformable models,” IEEE Trans. Med. Imag., vol. 29, no. 1, pp. 1–11, Jan. 2010.
  • [19]. Liu X. et al., “Incompressible deformation estimation algorithm (IDEA) from tagged MR images,” IEEE Trans. Med. Imag., vol. 31, no. 2, pp. 326–340, Feb. 2012.
  • [20].Wang Y. P., Chen Y., and Amini A. A., “Fast LV motion estimation using subspace approximation techniques,” IEEE Trans. Med. Imag., vol. 20, no. 6, pp. 499–513, Jun. 2001.
  • [21].Papademetris X., Sinusas A. J., Dione D. P., and Duncan J. S., “Estimation of 3D left ventricular deformation from echocardiography,” Med. Image Anal., vol. 5, no. 1, pp. 17–28, Mar. 2001.
  • [22].De Craene M. et al., “Temporal diffeomorphic free-form deformation: Application to motion and strain estimation from 3D echocardiography,” Med. Image Anal., vol. 16, no. 2, pp. 427–450, Feb. 2012.
  • [23].Rueckert D., Sonoda L. I., Hayes C., Hill D. L. G., Leach M. O., and Hawkes D. J., “Nonrigid registration using free-form deformations: Application to breast MR images,” IEEE Trans. Med. Imag., vol. 18, no. 8, pp. 712–721, Aug. 1999.
  • [24].Chandrashekara R., Mohiaddin R. H., and Rueckert D., “Analysis of 3-D myocardial motion in tagged MR images using nonrigid image registration,” IEEE Trans. Med. Imag., vol. 23, no. 10, pp. 1245–1250, Oct. 2004.
  • [25].Shi W. et al., “A comprehensive cardiac motion estimation framework using both untagged and 3-D tagged MR images based on nonrigid registration,” IEEE Trans. Med. Imag., vol. 31, no. 6, pp. 1263–1275, Jun. 2012.
  • [26].Tobon-Gomez C. et al., “Benchmarking framework for myocardial tracking and deformation algorithms: An open access database,” Med. Image Anal., vol. 17, no. 6, pp. 632–648, 2013.
  • [27].Puyol-Anton E. et al., “Fully automated myocardial strain estimation from cine MRI using convolutional neural networks,” in Proc. ISBI, 2018, pp. 1139–1143.
  • [28].Thirion J.-P., “Image matching as a diffusion process: An analogy with Maxwell’s demons,” Med. Image Anal., vol. 2, no. 3, pp. 243–260, 1998.
  • [29].Vercauteren T., Pennec X., Perchant A., and Ayache N., “Non-parametric diffeomorphic image registration with the demons algorithm,” in Medical Image Computing and Computer-Assisted Intervention. Berlin, Germany: Springer, 2007.
  • [30].McLeod K., Prakosa A., Mansi T., Sermesant M., and Pennec X., “An incompressible log-domain demons algorithm for tracking heart tissue,” in Proc. MICCAI Workshop STACOM, 2011, pp. 55–67.
  • [31].Ronneberger O., Fischer P., and Brox T., “U-Net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention. Cham, Switzerland: Springer, 2015, pp. 234–241.
  • [32].Ye M. et al., “DeepTag: An unsupervised deep learning method for motion tracking on cardiac tagging magnetic resonance images,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 7261–7271.
  • [33].Chen C., Biffi C., Tarroni G., Petersen S. E., Bai W., and Rueckert D., “Learning shape priors for robust cardiac MR segmentation from multi-view images,” in Medical Image Computing and Computer-Assisted Intervention. Cham, Switzerland: Springer, 2019, pp. 523–531.
  • [34].Abdelkhalek M., Aguib H., Moustafa M., and Elkhodary K., “Enhanced 3D myocardial strain estimation from multi-view 2D CMR imaging,” 2020, arXiv:2009.12466.
  • [35].Attar R. et al., “3D cardiac shape prediction with deep neural networks: Simultaneous use of images and patient metadata,” in Medical Image Computing and Computer-Assisted Intervention. Cham, Switzerland: Springer, 2019.
  • [36].Cheng J., Tsai Y.-H., Wang S., and Yang M.-H., “Segflow: Joint learning for video object segmentation and optical flow,” in Proc. ICCV, 2017, pp. 686–695.
  • [37].Ta K., Ahn S. S., Stendahl J. C., Sinusas A. J., and Duncan J. S., “A semi-supervised joint network for simultaneous left ventricular motion tracking and segmentation in 4D echocardiography,” in Medical Image Computing and Computer-Assisted Intervention. Cham, Switzerland: Springer, 2020.
  • [38].Jaderberg M., Simonyan K., Zisserman A., and Kavukcuoglu K., “Spatial transformer networks,” in Proc. NIPS, 2015, pp. 2017–2025.
  • [39].Caballero J. et al., “Real-time video super-resolution with spatio-temporal networks and motion compensation,” in Proc. CVPR, Jul. 2017, pp. 4778–4787.
  • [40].Bycroft C. et al., “The U.K. Biobank resource with deep phenotyping and genomic data,” Nature, vol. 562, no. 7726, pp. 203–209, 2018.
  • [41].Petersen S. et al., “U.K. Biobank’s cardiovascular magnetic resonance protocol,” J. Cardiovascular Magn. Reson., vol. 18, pp. 1–7, Dec. 2015.
  • [42].Balakrishnan G., Zhao A., Sabuncu M. R., Guttag J., and Dalca A. V., “VoxelMorph: A learning framework for deformable medical image registration,” IEEE Trans. Med. Imag., vol. 38, no. 8, pp. 1788–1800, Aug. 2019.
  • [43].Xu Z. et al., “Adversarial uni- and multi-modal stream networks for multimodal image registration,” in Medical Image Computing and Computer-Assisted Intervention. Cham, Switzerland: Springer, 2020, pp. 222–232.
  • [44].Çiçek Ö., Abdulkadir A., Lienkamp S., Brox T., and Ronneberger O., “3D U-net: Learning dense volumetric segmentation from sparse annotation,” in Medical Image Computing and Computer-Assisted Intervention. Cham, Switzerland: Springer, 2016, pp. 424–432.
  • [45].Clough J., Oksuz I., Anton E. P., Ruijsink B., King A., and Schnabel J., “Global and local interpretability for cardiac MRI classification,” in Medical Image Computing and Computer-Assisted Intervention. Cham, Switzerland: Springer, 2019.
  • [46].Biffi C. et al., “3D high-resolution cardiac segmentation reconstruction from 2D views using conditional variational autoencoders,” in Proc. ISBI, 2019, pp. 1643–1646.
  • [47].Ubachs J., Heiberg E., Steding K., and Arheden H., “Normal values for wall thickening by magnetic resonance imaging,” J. Cardiovascular Magn. Reson., vol. 11, no. S1, pp. 1–316, Jan. 2009.
  • [48].Dong S. J. et al., “Left ventricular wall thickness and regional systolic function in patients with hypertrophic cardiomyopathy. A three-dimensional tagged magnetic resonance imaging study,” Circulation, vol. 90, no. 3, pp. 1200–1209, Sep. 1994.
  • [49].Bai W. et al., “Automated cardiovascular magnetic resonance image analysis with fully convolutional networks,” J. Cardiovascular Magn. Reson., vol. 20, no. 1, pp. 1–12, Dec. 2018.
  • [50].Lu X. et al., “Automatic view planning for cardiac MRI acquisition,” in Medical Image Computing and Computer-Assisted Intervention. Berlin, Germany: Springer, 2011, pp. 479–486.
  • [51].Sinclair M. et al., “Myocardial strain computed at multiple spatial scales from tagged magnetic resonance imaging: Estimating cardiac biomarkers for CRT patients,” Med. Image Anal., vol. 43, pp. 169–185, Jan. 2018.
  • [52].Ferdian E. et al., “Fully automated myocardial strain estimation from cardiovascular MRI–tagged images using a deep learning framework in the U.K. Biobank,” Radiol., Cardiothoracic Imag., vol. 2, no. 1, Feb. 2020, Art. no. e190032.
  • [53].de Vos B. D., Berendsen F. F., Viergever M. A., Sokooti H., Staring M., and Išgum I., “A deep learning framework for unsupervised affine and deformable image registration,” Med. Image Anal., vol. 52, pp. 128–143, Feb. 2018.
  • [54].Krebs J., Delingette H., Mailhé B., Ayache N., and Mansi T., “Learning a probabilistic model for diffeomorphic registration,” IEEE Trans. Med. Imag., vol. 38, no. 9, pp. 2165–2176, Sep. 2019.
  • [55].Petitjean C., Rougon N., and Cluzel P., “Assessment of myocardial function: A review of quantification methods and results using tagged MRI,” J. Cardiovascular Magn. Reson., vol. 7, no. 2, pp. 501–516, Apr. 2005.
  • [56].Kawel-Boehm N. et al., “Normal values for cardiovascular magnetic resonance in adults and children,” J. Cardiovascular Magn. Reson., vol. 17, no. 1, pp. 1–33, Dec. 2015.
  • [57].Cao J. J., Ngai N., Duncanson L., Cheng J., Gliganic K., and Chen Q., “A comparison of both DENSE and feature tracking techniques with tagging for the cardiovascular magnetic resonance assessment of myocardial strain,” J. Cardiovascular Magn. Reson., vol. 20, no. 1, pp. 1–9, Dec. 2018.
  • [58].Amzulescu M. S. et al., “Myocardial strain imaging: Review of general principles, validation, and sources of discrepancies,” Eur. Heart J. Cardiovascular Imag., vol. 20, no. 6, pp. 605–619, Jun. 2019.

Articles from IEEE Transactions on Medical Imaging are provided here courtesy of Institute of Electrical and Electronics Engineers