Abstract
A CNN-based method for cardiac MRI tag tracking was developed and validated. A synthetic data simulator was created to generate large amounts of training data using natural images, a Bloch equation simulation, a broad range of tissue properties, and programmed ground-truth motion. The method was validated using both an analytical deforming cardiac phantom and in vivo data with manually tracked reference motion paths. In the analytical phantom, error was investigated relative to SNR, and accurate results were seen for SNR > 10 (displacement error < 0.3 mm). Excellent agreement was seen in vivo for tag locations (mean displacement difference = −0.02 pixels, 95% CI [−0.73, 0.69]) and calculated cardiac circumferential strain (mean difference = 0.006, 95% CI [−0.012, 0.024]). Automated tag tracking with a CNN trained on synthetic data is both accurate and precise.
Keywords: Cardiac MRI, Tag Tracking, Convolutional Neural Network, Machine Learning, Synthetic Data
Graphical Abstract

1. Introduction
Cardiac MRI (CMR) is used clinically to measure cardiac performance and is a vital tool to diagnose cardiac dysfunction. Ejection fraction (EF) is the most widely used quantitative measure of global cardiac function. Decreased EF, however, is a non-specific and late outcome. Measures of regional cardiac strains can improve early diagnosis, guide clinical decision making, and augment our understanding of heart function and dysfunction (Scatteia et al., 2017; Wang and Amini, 2011).
CMR tagging enables the quantitative characterization of global (e.g., torsion) and regional (e.g., strain) cardiac function. CMR tagging uses radio-frequency pulses to create a grid of dark (signal nulled) lines within a few milliseconds. Subsequent cine CMR imaging enables the tag lines to be visualized throughout the cardiac cycle. Image processing can “track” the tag lines in each time frame from which regional cardiac displacements can be measured and used to calculate cardiac strains. These measures are better predictors of cardiac dysfunction than EF for a range of diseases (Götte et al., 2006; Scatteia et al., 2017).
The clinical adoption of CMR tagging, however, has long been hampered by difficult post-processing methods and weak validation. While the challenge of extracting the information persists, these quantitative measurements are important for understanding cardiac dysfunction, evaluating disease progression, and characterizing the response to therapy. Numerous methods for tracking tags exist (Osman et al., 1999; Prince and McVeigh, 1992; Young et al., 1995); however, many require laborious segmentation, tag-tracking corrections, or other substantial user input to produce confident quantitative measurements.
Convolutional neural networks (CNNs) are well suited to both image segmentation (Ronneberger et al., 2015) and motion tracking (Fechter and Baltas, 2020; Ferdian et al., 2020; Nam and Han, 2016). Training a CNN for tag tracking, however, requires a large amount of training data and the associated ‘ground truth’ tag motion. This presents two significant limitations because CNN training based on in vivo data requires both large amounts of data and tag tracking results from another tracking algorithm. As an alternative to training a CNN on in vivo data, a growing body of research has demonstrated training networks with synthetic datasets across a range of applications (Barbosa et al., 2018; Haibo He et al., 2008; Jaderberg et al., 2014; Shrivastava et al., 2017; Tremblay et al., 2018; Wu et al., 2019), including medical imaging (Frid-Adar et al., 2018; Heimann et al., 2014; Mahmood et al., 2018). The use of synthetic data addresses these limitations by allowing any amount of data to be generated and by providing an objective ‘ground truth’ motion field to use during training and validation.
The objective of this study was to develop a CNN approach for fast and automatic tag tracking. The network was trained using an extensive synthetic data generation and MRI simulation framework. The approach generates large amounts of synthetic training data using natural images, a Bloch equation simulation, a broad range of tissue properties, and programmed ground-truth motion. These images are then used to train a CNN to extract motion paths from the input images. Strain measures from the tracking algorithm are validated in an analytical deforming cardiac phantom and using in vivo images. Additionally, CNN tracking was tested against manual tracking in pediatric patients with Duchenne’s Muscular Dystrophy (DMD) and known cardiomyopathy.
2. Methods
2.1. Synthetic Data Generation
Natural images were obtained by randomly drawing 100,000 images from ImageNet (Deng et al., 2009) and the Columbia University Image Library (Nene et al., 1996) (Fig. 1A). Images were converted to grayscale and resized (cropped and/or bicubically interpolated) to 256x256 pixels. To simulate typical signal voids in MR images, masks were randomly applied to the images either as ellipses to mask out the exterior of an image, as annular masks of random size to simulate left ventricular-like structures, or as masks on grayscale values in the image (Fig. 1B). Each image pixel was then further discretized into nine points using bicubic interpolation, for a total resolution of 768x768 points, which was needed to simulate MR images with intravoxel features.
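The preprocessing above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the mask parameters are arbitrary placeholders, only the elliptical mask variant is shown, and nearest-neighbor replication stands in for the bicubic interpolation used in the paper to keep the sketch dependency-free.

```python
import numpy as np

def make_elliptical_mask(n, center, radii):
    """Boolean mask that is True inside an ellipse (used to null signal outside)."""
    y, x = np.mgrid[0:n, 0:n]
    cy, cx = center
    ry, rx = radii
    return ((y - cy) / ry) ** 2 + ((x - cx) / rx) ** 2 <= 1.0

def prepare_image(img256, rng):
    """Grayscale 256x256 image -> masked 768x768 spin grid.
    (The paper uses bicubic interpolation; nearest-neighbor replication
    is used here so the sketch needs only numpy.)"""
    img = img256.astype(float)
    img /= img.max() if img.max() > 0 else 1.0
    # Random elliptical support mask simulating an MR signal void
    center = rng.uniform(96, 160, size=2)
    radii = rng.uniform(80, 128, size=2)
    img = img * make_elliptical_mask(256, center, radii)
    # Discretize each pixel into 3x3 = 9 points (256 -> 768)
    return np.kron(img, np.ones((3, 3)))
```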
Figure 1:

A) Images were drawn from a large natural image database (Deng et al., 2009), and converted to 768x768 grayscale images, where each pixel was considered a spin for the purposes of the Bloch simulation. B) Additional pseudo-anatomical features were added to the image, including simulated signal voids with pseudo-randomly shaped masks and short-axis heart-like feature areas. C) Motion paths were generated as a perturbed elliptical path, with a cardiac-motion-like smooth periodic waveform. The figure depicts a random sampling of some of the motion paths created during different data generations. D) Each pixel was assigned MRI values (M0, T1, T2), and used in a Bloch simulation of a cine tagged (1-4-6-4-1 SPAMM) MR imaging, outputting a 256x256 simulated MR image of the moving object. E) 32x32 patches were extracted from the dataset for CNN training. Each patch is linked to a precisely known motion path corresponding to the motion at the center of each patch.
Each pixel was assigned a 2D elliptical motion path with a polynomial-smooth temporal parameterization, τ(t), based on its starting position p₀ following:

p(t) = p₀ + R(θ) [ a (cos τ(t) − 1), b sin τ(t) ]ᵀ + d(p₀, τ(t))    (1a)

with:

d(p₀, τ) = Σₖ cₖ(p₀) τᵏ    (1b)

where a and b are parameter fields generated with random coefficients used to define an elliptical shape for each motion path. R(θ) is a 2D rotation matrix, with θ defined by a parameter field that describes the orientation of the elliptical motion path. τ gives the temporal characteristics of the motion; it ranges from 0 to 2π and provides a wide range of temporal dynamics to the model, with more motion at the beginning of the cycle than at the end, implemented with a sine-lobe shape. The perturbation term d adds time-dependent polynomial fields cₖ that modify the motion path based on p₀ and allow for deviation from purely elliptical shapes. Example motion paths are shown in Fig. 1C and further described in Fig. 2. Half of the training data applied motion fields to all spins in the image. For the other half of the training data, more cardiac-like motion was prescribed together with a heart-like mask, and θ was defined to produce contractile movement. The parameter fields (a, b, θ, and cₖ) were generated by randomly selecting coefficients for a second-order polynomial field, which was then further perturbed with Perlin noise (Fig. 2A,B). This approach ensures training against a very broad range of possible motion paths. Additionally, the data were truncated in the temporal dimension by removing the last 0-20% of time frames to account for the possibility of missing the final portion of the cardiac cycle due to prospective gating.
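A single Eq. (1)-style path can be sketched as below. This is a minimal illustration under assumed forms: the specific sine-lobe pacing, the ellipse parameterization through the origin, and the simple power-of-τ perturbation are plausible choices consistent with the description, not the paper's exact formulas.

```python
import numpy as np

def motion_path(p0, a, b, theta, n_frames=25, poly_coeffs=None):
    """One elliptical motion path: ellipse scaled by (a, b), rotated by
    theta, traversed with a sine-lobe temporal profile (more motion early
    in the cycle), plus an optional polynomial perturbation in tau.
    All parameter forms here are illustrative."""
    t = np.linspace(0.0, 1.0, n_frames)
    tau = 2.0 * np.pi * np.sin(0.5 * np.pi * t)  # fast early, slow late
    # Ellipse passing through the origin at tau = 0
    ex = a * (np.cos(tau) - 1.0)
    ey = b * np.sin(tau)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    path = p0[:, None] + R @ np.vstack([ex, ey])
    if poly_coeffs is not None:                  # deviation from an ellipse
        for k, c in enumerate(poly_coeffs, start=1):
            path += c[:, None] * tau ** k
    return path                                  # shape (2, n_frames)
```

By construction the displacement is zero at the first time frame, matching the patch-centering convention described in Section 2.1.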
Figure 2:

A) Pseudo-random fields used to generate the motion described by Eq. 1. These pseudo-random fields represent motion parameters for general image deformation, while B) shows the same fields for cardiac-like motion generation. C) x- and y-positions over time for 100 randomly sampled motion paths from the training data. D) Box plots for the absolute displacement values from 50,000 randomly sampled paths. Whiskers represent 1.5·IQR, and outliers are removed for visualization. E) 2D histogram of the locations at peak displacement in x and y for the same 50,000 paths shown in panel (D).
For MRI signal simulation, each point was assigned an initial signal magnitude (proton density), T1 relaxation time, and T2 relaxation time based on the underlying grayscale value. Signal magnitude was directly proportional to the grayscale value. T1 and T2 were assigned from a third-order polynomial with randomized coefficients defined for each image and scaled to a predefined acceptable range (T1 = [400, 1600 ms], T2 = [40, 400 ms]) (Fig. 1D).
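The property assignment can be sketched as follows; the normal-distributed polynomial coefficients and min-max rescaling are assumptions, since the text specifies only a random third-order polynomial scaled into the stated ranges.

```python
import numpy as np

def assign_mr_properties(gray, rng, t1_range=(400.0, 1600.0), t2_range=(40.0, 400.0)):
    """Map grayscale values in [0, 1] to per-spin (M0, T1, T2). M0 is
    proportional to grayscale; T1 and T2 come from a random third-order
    polynomial of the grayscale value, rescaled into the allowed range
    (a sketch of the procedure described in the text; units are ms)."""
    def random_poly_map(lo, hi):
        coeffs = rng.normal(size=4)                          # random cubic
        vals = np.polyval(coeffs, gray)
        vals = (vals - vals.min()) / (np.ptp(vals) + 1e-12)  # normalize to [0, 1]
        return lo + vals * (hi - lo)
    m0 = gray.copy()
    t1 = random_poly_map(*t1_range)
    t2 = random_poly_map(*t2_range)
    return m0, t1, t2
```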
The set of points and their motion paths were used as input to a Bloch simulation to produce realistic MR images. The Bloch simulation included grid-tagging pulses with randomized tag spacing [4, 12 mm] using a 1-4-6-4-1 SPAMM pulse, and 25 time frames were generated. Random 2D Gaussian noise was added to the complex signal to achieve an SNR in the range of [10, 50]. Patches (32x32 pixels x 25 time frames) were then extracted from the data at each tag line intersection to be tracked, with subpixel shifting as needed to place the desired point at the exact center of the patch, so that no displacement is expected in the first time frame and displacements are measured relative to the center of the patch. The patch location is centered at the position from the first time frame and maintains that centering in all following frames. Tag line intersections were semi-automatically selected for tracking by using the known tagging grid configuration for initial selection, with manual adjustments sometimes required if tag lines moved between tagging and the first acquired image. The true motion path for the center of the patch (Eq. 1) was also exported for training. The supplemental video file (patch_demo.mp4) shows an example of randomly generated patches, as well as patches from real in vivo data used in the analysis.
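Two pieces of this pipeline can be sketched compactly. The paper runs a full Bloch simulation; the grid-tag weighting below instead uses the closed-form cos⁴ modulation that a binomial 1-4-6-4-1 SPAMM preparation approximates, and the SNR definition (mean foreground signal over noise standard deviation) is an assumption.

```python
import numpy as np

def spamm_grid(shape, tag_spacing_px):
    """Approximate 1-4-6-4-1 SPAMM grid-tag weighting: cos^4 modulation
    applied along both axes to form a grid (dark lines where the
    weighting is near zero). Stands in for the full Bloch simulation."""
    y, x = np.mgrid[0:shape[0], 0:shape[1]]
    wx = np.cos(np.pi * x / tag_spacing_px) ** 4
    wy = np.cos(np.pi * y / tag_spacing_px) ** 4
    return wx * wy

def add_complex_noise(img, snr, rng):
    """Add 2D Gaussian noise to the complex signal for a target SNR,
    then take the magnitude, as in standard MR noise simulation."""
    sigma = img[img > 0].mean() / snr
    noisy = img + sigma * (rng.normal(size=img.shape)
                           + 1j * rng.normal(size=img.shape))
    return np.abs(noisy)
```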
2.2. Convolutional Neural Network
A version of an 18-layer ResNet (ResNet-18) was used for this work (He et al., 2016), with changes to the convolution style and additional layers. Spatiotemporal (2+1)D convolutional layers were used and showed improvements over 3D convolutional layers (Tran et al., 2018). Additionally, CoordConv channels were added to each convolutional block to give positional references and increase performance (Liu et al., 2018). The final layer of the network was a fully connected layer with linear activation, connected to a final output as a 2x25 vector of spatial positions that describes the estimated motion path. A network diagram is presented in Fig. 3A.
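One such convolutional unit could look like the sketch below: a (2+1)D factorization (spatial conv, then temporal conv) with two appended CoordConv channels. Channel counts, kernel sizes, and the placement of normalization are illustrative, not the paper's exact configuration (which is available in the authors' repository).

```python
import torch
import torch.nn as nn

class CoordConv2Plus1D(nn.Module):
    """(2+1)D convolutional unit with CoordConv channels: a spatial 1xkxk
    convolution followed by a temporal kx1x1 convolution, with normalized
    x/y coordinate channels concatenated to the input so the network sees
    its position within the patch."""
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        p = k // 2
        self.spatial = nn.Conv3d(c_in + 2, c_out, (1, k, k), padding=(0, p, p))
        self.temporal = nn.Conv3d(c_out, c_out, (k, 1, 1), padding=(p, 0, 0))
        self.bn = nn.BatchNorm3d(c_out)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):                     # x: (N, C, T, H, W)
        n, _, t, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, 1, h, 1)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, 1, w)
        coords = torch.cat([ys.expand(n, 1, t, h, w),
                            xs.expand(n, 1, t, h, w)], dim=1)
        x = torch.cat([x, coords], dim=1)     # CoordConv channels
        return self.act(self.bn(self.temporal(self.spatial(x))))
```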
Figure 3:

A) Overview of the ResNet-18 architecture used in this study. All connections include batch normalization and ReLU activation. For this study, the input is a 32x32x25 patch, and the output is a 2x25 x- and y-vector describing the motion path. A patch is used for each point in the image to be tracked. B) Training and validation loss curves. C) An overview of the training procedure.
Training was performed for 200 epochs with a stochastic gradient descent optimizer, cosine annealing of the learning rate (Loshchilov and Hutter, 2016), and a 90%:10% training:validation split. 100,000 synthetic images were created, and 10 patches were extracted from random locations in each image, for a total training set of one million patches and accompanying motion paths. A mean squared error loss on the predicted path was used, with loss convergence shown in Fig. 3B. The network was implemented in PyTorch. Training the network required 20 hours on a single Nvidia GTX 1080Ti, while inference for a batch of image patches requires 200 ms on the same hardware (Fig. 3C). For all processing presented here, only one batch of inference was needed. Code for reproducing the network can be found at github.com/mloecher/tag_tracking.
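The training procedure maps onto a standard PyTorch loop; the sketch below follows the stated choices (SGD, cosine-annealed learning rate, MSE loss on the predicted path), while the momentum, base learning rate, and scheduler period are placeholder assumptions.

```python
import torch
from torch import nn, optim

def train(model, loader, epochs=200, lr=0.01):
    """Training loop sketch: SGD with a cosine-annealed learning rate and
    MSE loss between predicted and ground-truth motion paths. Only the
    optimizer, scheduler, and loss types are taken from the text."""
    opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    sched = optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for patches, paths in loader:   # patches: (N,1,25,32,32); paths: target vectors
            opt.zero_grad()
            loss = loss_fn(model(patches), paths)
            loss.backward()
            opt.step()
        sched.step()
    return model
```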
2.3. Reference Phantom
Validating a trained network using in vivo MRI is difficult because there are no ground truth measurements of displacement or strain. To that end, we used an analytical deforming cardiac phantom generated using known strains (Perotti et al., 2020; Verzhbinsky et al., 2019) that served as ground truth. The phantom was constructed with an axially symmetric cardiac-mimicking geometry with an initial (t = 0) inner radius of 25 mm and wall thickness of 10 mm. The radial, circumferential, and longitudinal displacement fields for the phantom were designed by solving an optimization problem to best match target peak systolic strains in the myofiber (Eff=−0.13) (Perotti et al., 2017), radial (Err=0.45 to 0.30, endocardium to epicardium), circumferential (Ecc=−0.20 to −0.15, endocardium to epicardium), and longitudinal (Eℓℓ = −0.15) directions per (Moore et al., 2000; Zhong et al., 2010). The final ground truth strains from the deforming phantom were quantified from the Green-Lagrangian strain tensor corresponding to the simulated displacement field. The deforming cardiac phantom was then combined with a static body MR image and used as input to the synthetic data generation algorithm with Bloch simulations to compute realistic MR images representing the deformable heart with known ground truth displacements and strains.
2.4. Imaging
Imaging data were obtained from healthy pediatric volunteers (N=9, median age = 15 years) and patients with DMD (N=5, median age = 14 years) with IRB approval and informed consent. The selected DMD patients were drawn from a larger cohort and represent the patients with the most advanced cardiac involvement from the disease. Cine grid-tagged images were acquired on a 3T scanner (Siemens Skyra) with: 110° total tagging flip angle, 8 mm tag grid spacing, 8 mm slice thickness, TE/TR = 2.5/4.9 ms, imaging flip angle = 10°, field of view = 260x320 mm, spatial resolution = 1.4x1.4 mm, 25 time frames with retrospective gating, 21-37 ms temporal resolution, and 8-12 s breath hold. Three short-axis slices at basal, middle, and apical locations were acquired for each subject, resulting in 27 total slices in healthy subjects and 15 total slices in DMD patients.
2.5. Analysis
The CNN estimated displacement was compared to the programmed ground truth in the deforming cardiac phantom. Absolute error in the predicted motion paths was quantified as the distance from the programmed motion in the deforming cardiac phantom at each time point. Strains were then computed from the CNN displacement field and compared to the strains calculated in the deforming cardiac phantom. Strain error over all time frames and all pixels was reported. Additionally, these error metrics were used to evaluate the CNN’s performance over a range of SNR levels for the simulated MR images.
The grid-tagged images were manually tracked by an expert reader. Each tag intersection point was followed by the reader across all time frames, and the motion path for each point was exported and used as the in vivo ground truth. Across all slices in all subjects, 1398 total points were tracked, representing 11-50 (median = 34) points per slice depending on the number of tag line intersections within the LV.
Strains were calculated from the motion paths of all points within a slice using the same algorithm for both manually tracked and CNN predicted displacements. Strains were calculated in Matlab (v2020b) after first approximating the deformation mapping using a radial basis function (RBF) fitting (Bistoquet et al., 2008) with Tikhonov regularization (λ = 1e-4). A Gaussian RBF was used with the shape parameter set to twice the tracked point spacing. The analytical derivative of the RBF was used to calculate the Green-Lagrange strain tensor. Both pixel-wise strains and the mid-wall (defined as 1/2 wall thickness) slice-average strain were reported for each time frame and each slice.
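The strain pipeline can be sketched as follows (in Python rather than the Matlab used in the paper): a regularized Gaussian-RBF fit of the deformation map, its analytic derivative for the deformation gradient F, and the Green-Lagrange strain E = ½(FᵀF − I). The pure-kernel formulation without a polynomial tail is a simplifying assumption.

```python
import numpy as np

def fit_rbf(ref_pts, def_pts, shape_s, lam=1e-4):
    """Fit a Gaussian-RBF map from reference points (N, 2) to deformed
    points (N, 2) with Tikhonov regularization lam; returns RBF weights."""
    d2 = ((ref_pts[:, None, :] - ref_pts[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / shape_s ** 2)
    return np.linalg.solve(K + lam * np.eye(len(ref_pts)), def_pts)  # (N, 2)

def green_lagrange(ref_pts, W, x, shape_s):
    """Deformation gradient F at point x from the analytic RBF derivative,
    then Green-Lagrange strain E = 0.5 * (F^T F - I)."""
    diff = x[None, :] - ref_pts                        # (N, 2)
    phi = np.exp(-(diff ** 2).sum(-1) / shape_s ** 2)  # kernel values
    dphi = -2.0 / shape_s ** 2 * diff * phi[:, None]   # d(phi_i)/dx, (N, 2)
    F = W.T @ dphi                                     # (2, 2)
    return 0.5 * (F.T @ F - np.eye(2))
```

For a uniform stretch of the tracked points, the recovered E has positive, roughly equal normal components, as expected.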
For the in vivo images, the predicted motion paths were compared to the manually tracked motion paths. Positional tracking differences were measured for all points (1205 points) in all 25 phases, as well as for the derived circumferential strain curves across the mid-wall LV. Bland-Altman analysis was also performed to compare the positional measurements of all tracked points in the x and y directions (results combined), as well as the mean mid-wall Ecc. Additionally, the images for each slice were processed with a clinically available software package (Diagnosoft, Myocardial Solutions), in which the data were processed with a HARP-like approach (Osman et al., 1999). Ecc curves were compared across methods; however, strain from the HARP processing was inconsistent after peak contraction and was therefore omitted from visualizations.
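The Bland-Altman statistics used throughout the comparisons reduce to a bias and 95% limits of agreement over paired differences, as in this small sketch (the 1.96-sigma limits are the standard convention; the function name is illustrative).

```python
import numpy as np

def bland_altman(a, b):
    """Bias and 95% limits of agreement between two paired measurement
    sets, e.g., manually tracked vs. CNN tracked positions or Ecc."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bias = diff.mean()
    spread = 1.96 * diff.std(ddof=1)
    return bias, (bias - spread, bias + spread)
```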
3. Results
Details of the training process are shown in Fig. 3B–C. Fig. 4 shows example images from the analytical deforming cardiac phantom. Mean absolute error of the tag locations was ⪅0.2 pixels for SNR >10, and mean strain differences were <0.01, with standard deviation <0.02 for SNR >10. Qualitative comparisons of the Ecc maps shown in Fig. 4E–F demonstrate very good agreement, with small (<0.02) heterogeneous strain differences throughout the LV.
Figure 4:

A) Analytical deforming cardiac phantom in the reference configuration (first time-frame). B) Overlay of the CNN identified tag locations compared to ground truth on the reference configuration image. C) Mean error of CNN tag locations compared to ground truth as a function of SNR. Error bars show standard deviation. D) Error in Ecc throughout all time frames, compared to ground truth as a function of SNR. E) A peak systolic Ecc map derived from the CNN estimated motion paths for SNR=30. F) An Ecc error map for peak systole with SNR=30.
Good agreement was seen in vivo between the CNN tag tracking and manual tag tracking (Fig. 5). Fig. 5B shows a linear regression of manually tracked and CNN tracked positions, where good agreement is seen, with r2 = 0.96 for a linear fit of y = 1.01x − 0.02. Fig. 5C shows the mean and standard deviation of the agreement across time frames, where the peak difference in position occurs at peak contraction and averages 0.53 ± 0.34 pixels. The median displacement difference between manual and CNN tracking was 0.38 pixels. Bland-Altman analysis of all tracked points shows a bias = −0.02 pixels and limits of agreement (LoA) of [−0.73, 0.69 pixels] when comparing the manual tracking to CNN tracking. Fig. 5D–F shows Bland-Altman analyses of the data separated by slice location, where similar bias and LoA were seen, with slightly worse performance in the apical slices.
Figure 5:

A) An example of in vivo expert manually tracked motion path and the CNN tracked motion path. The slice displayed here had the median disagreement among all slices. B) Linear regression of all x- and y-positions of manually tracked points versus CNN tracked points. C) Difference in tracked locations as a function of time frame, calculated as the Euclidean distance between points. The shaded region represents the standard deviation of the differences. D-F) Bland-Altman comparisons of the tracked position data between tracking methods. Differences in x and y are plotted as individual points. Data are separated by slice locations of base, mid, and apex.
Fig. 6A shows an example Ecc curve comparison from the case with the median disagreement between tracking methods. Good agreement between the two strain curves is seen, with mean difference = 0.006 ± 0.006. Fig. 6B shows an aggregate comparison of the mean and standard deviation of Ecc differences between expert manual tracking and CNN estimated strains across time frames. The worst time frame in aggregate was time frame 11, which had an Ecc difference of 0.011 ± 0.012. Fig. 6C illustrates the overall good agreement of both expert manual tracking and CNN tracking when compared to the commercial HARP analysis software. Later time frames, where the HARP analysis was stopped due to poor performance, are omitted.
Figure 6:

A) Example Ecc curves computed using expert manual tracking and from CNN estimated motion. This case is representative of the median disagreement between methods. B) Difference in Ecc computed on all cases based on expert manual tracked and CNN tracked points. The shaded region represents the standard deviation of the differences. C) Difference in Ecc computed in the commercial HARP analysis software and based on both manually and CNN tracked points. Measurements after time frame 11 were removed due to instabilities in the commercial product for later time frames.
Fig. 7 shows tracking results and derived strain values in a healthy subject as well as two different DMD patients. All slices were from the mid-ventricular level. Good agreement is seen in the tracking, strain maps, as well as the strain curves for these patients. Overall tracking accuracy in patients was measured with Bland-Altman analysis and showed bias = 0.05 pixels and LoA of [−0.41, 0.52 pixels], which was slightly better than in healthy subjects. The mean and standard deviation of the difference in Ecc across all patients and timeframes was −0.001 ± 0.004.
Figure 7:

Tracking and strain results from a healthy subject (A) and two DMD patients (B-C). Each row shows the tracked paths, calculated strain maps at peak systole, and global mid-wall strain curves with both manual tracking and CNN tracking. The patients show significantly lower strain, as well as more heterogeneous strain. Good agreement is seen between manual and CNN results in all comparisons. D) Bland-Altman analysis of the tracked positions from all points in all patient slices. E) Difference in manually tracked and CNN tracked Ecc across all patient slices. The shaded region represents the standard deviation of the differences.
4. Discussion
This work presents a synthetic data generator for motion encoded MRI data that can be used for training machine learning algorithms. The resultant CNN was used to quickly (<1 s per image) and reliably (<0.03 LoA Ecc) track in vivo myocardial motion and to calculate cardiac strains. The CNN was trained entirely on the synthetically generated MRI data, which were created from natural images, simulated motion, and a full Bloch simulation of the MRI tagging sequence. The method was validated with an analytical deforming cardiac phantom and compared for accuracy using in vivo images with manually tracked motion. All comparisons showed very good agreement across methods, with negligible bias, average displacement disagreement <1.0 mm, and calculated Ecc limits of agreement <0.03. When tested in patients with cardiomyopathy due to DMD, no difference in performance was seen relative to the healthy subjects in the study.
The technique produced these results with no manual review or modification of the tracking paths, as is often required with previous methods. While this allows for completely automatic tag tracking and strain calculation, some user input is still required to create an initial segmentation of the LV, which is used to determine the boundaries for strain calculation, and guide the selection of tag line intersections. Future work will involve integrating automatic segmentation methods (Bernard et al., 2018; Khened et al., 2019) into the workflow to make the entire image-to-strain quantification completely automated.
One strength of training with synthetic data is that it provides an objective ground truth for supervised learning, which is not easily obtainable in vivo. The lack of accepted tracking methods and truth values in vivo, however, also complicates the testing of in vivo tracking accuracy. In this study, manual tracking was used as the ground truth reference for in vivo data. While the manual tracking used here is reliable, it is not a gold standard, particularly in diastole when tag lines have faded. A study testing tracking algorithm performance in 3D tagged MR images used manual tracking as a gold standard, but found median inter-observer variability to be 0.77-0.84 mm, similar to the median difference found in this work (0.69 mm) (Tobon-Gomez et al., 2013). It is therefore possible that the proposed CNN tracking method is as accurate as manual tracking, with differences attributable to the limits of precision in manual tracking. Additional comparisons with other tracking algorithms or in deformable MRI motion phantoms (Drangova et al., 1996) are needed.
To enable the comparison with manual point tracking in this work, only tag line intersections were tracked, which limits the number of data points used to calculate strain. While this should not affect the comparison of methods in the presented work, it does limit the ability to compute radial strains, and the resolution of strain maps that can be generated. In the future, these points can be used with a spline based interpolation to alleviate this problem (Amini et al., 1998), or the network could be expanded to track more points along the tag lines.
Recent work by Ferdian et al. demonstrated a machine learning based approach to compute strain curves from tagged images (Ferdian et al., 2020). This work differs in network architecture and the source of training and testing data. At this point the relative performance of the network architectures is unclear, although reported errors are similar. The comparative benefits of using synthetic data with known motion versus in vivo data tracked with existing algorithms and manual review could be further compared. Additionally, it highlights the need for better methods of comparison between tracking methods as the field moves forward.
One of the strengths of this work is that the MRI simulator can be easily adapted to simulate other cardiac imaging and tagging methods. This could enable the network to be easily retrained for the different imaging parameters that might be used for different research or clinical applications, or even for other imaging techniques such as displacement encoding (Aletras et al., 1999), or non- or subtly-tagged bSSFP imaging (Evin et al., 2015; Schrauben et al., 2018).
Conversely, the method may not be as robust or as accurate as possible due to the use of only synthetic data, which may not capture all features found in real MR images. While no significant errors were seen that could be attributed to this limitation, future work will investigate the effects of augmenting real in vivo data with the analytical deforming cardiac phantom during training. The range of simulated motions, however, was designed to be a broad distribution of motion patterns encompassing physiological, pathophysiological, and even non-physiological motion. Because of this, the network is designed to be applicable to different pathologies and imaging locations.
Another unique attribute of using synthetic data for training the CNN is that it mitigates patient privacy and HIPAA-related concerns, thereby enabling easier portability of the approach between clinical environments. Furthermore, as the data used for training are not based on a specific patient cohort, issues of cohort bias are mitigated.
In conclusion, we developed an extensive synthetic cine MR image generator that can be used to train a CNN for motion tracking. We used the generator to train a CNN-based tag tracking algorithm and then tested the network in an analytical deforming cardiac phantom and on in vivo data, showing very good agreement with both ground truth and manually tracked motion.
Supplementary Material
- A synthetic data generator is demonstrated that creates deforming MR images with known motion paths.
- The known motion paths are used to train a tracking network for grid-tagged cardiac MR images.
- The tracking network can track in vivo images with high accuracy after being trained with only synthetic data.
Acknowledgements:
This project was supported by NIH/NHLBI R01 HL131823, NIH/NHLBI R01 HL131975, and NIH/NHLBI R01 HL152256. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Declaration of interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
- Aletras AH, Ding S, Balaban RS, Wen H, 1999. DENSE: displacement encoding with stimulated echoes in cardiac functional MRI. Journal of Magnetic Resonance 137, 247.
- Amini AA, Chen Y, Curwen RW, Mani V, Sun J, 1998. Coupled B-snake grids and constrained thin-plate splines for analysis of 2-D tissue deformations from tagged MRI. IEEE Transactions on Medical Imaging 17, 344–356.
- Barbosa IB, Cristani M, Caputo B, Rognhaugen A, Theoharis T, 2018. Looking beyond appearances: Synthetic training data for deep CNNs in re-identification. Computer Vision and Image Understanding 167, 50–62. doi:10.1016/j.cviu.2017.12.002.
- Bernard O, Lalande A, Zotti C, Cervenansky F, Yang X, Heng PA, Cetin I, Lekadir K, Camara O, Ballester MAG, et al., 2018. Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: is the problem solved? IEEE Transactions on Medical Imaging 37, 2514–2525.
- Bistoquet A, Oshinski J, Škrinjar O, 2008. Myocardial deformation recovery from cine MRI using a nearly incompressible biventricular model. Medical Image Analysis 12, 69–85.
- Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L, 2009. ImageNet: A large-scale hierarchical image database, in: CVPR 2009.
- Drangova M, Bowman B, Pelc NJ, 1996. Physiologic motion phantom for MRI applications. Journal of Magnetic Resonance Imaging 6, 513–518.
- Evin M, Cluzel P, Lamy J, Rosenbaum D, Kusmia S, Defrance C, Soulat G, Mousseaux E, Roux C, Clement K, et al., 2015. Assessment of left atrial function by MRI myocardial feature tracking. Journal of Magnetic Resonance Imaging 42, 379–389.
- Fechter T, Baltas D, 2020. One-shot learning for deformable medical image registration and periodic motion tracking. IEEE Transactions on Medical Imaging 39, 2506–2517. doi:10.1109/TMI.2020.2972616.
- Ferdian E, Suinesiaputra A, Fung K, Aung N, Lukaschuk E, Barutcu A, Maclean E, Paiva J, Piechnik SK, Neubauer S, et al., 2020. Fully automated myocardial strain estimation from cardiovascular MRI-tagged images using a deep learning framework in the UK Biobank. Radiology: Cardiothoracic Imaging 2, e190032.
- Frid-Adar M, Diamant I, Klang E, Amitai M, Goldberger J, Greenspan H, 2018. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing 321, 321–331. doi:10.1016/j.neucom.2018.09.013.
- Götte MJ, Germans T, Rüssel IK, Zwanenburg JJ, Marcus JT, van Rossum AC, van Veldhuisen DJ, 2006. Myocardial strain and torsion quantified by cardiovascular magnetic resonance tissue tagging: studies in normal and impaired left ventricular function. Journal of the American College of Cardiology 48, 2002–2011.
- He H, Bai Y, Garcia EA, Li S, 2008. ADASYN: Adaptive synthetic sampling approach for imbalanced learning, in: 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), pp. 1322–1328. doi:10.1109/IJCNN.2008.4633969.
- He K, Zhang X, Ren S, Sun J, 2016. Deep residual learning for image recognition, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778. doi:10.1109/CVPR.2016.90.
- Heimann T, Mountney P, John M, Ionasec R, 2014. Real-time ultrasound transducer localization in fluoroscopy images by transfer learning from synthetic training data. Medical Image Analysis 18, 1320–1328. doi:10.1016/j.media.2014.04.007.
- Jaderberg M, Simonyan K, Vedaldi A, Zisserman A, 2014. Synthetic data and artificial neural networks for natural scene text recognition. arXiv preprint arXiv:1406.2227.
- Khened M, Kollerathu VA, Krishnamurthi G, 2019. Fully convolutional multi-scale residual DenseNets for cardiac segmentation and automated cardiac diagnosis using ensemble of classifiers. Medical Image Analysis 51, 21–45.
- Liu R, Lehman J, Molino P, Such FP, Frank E, Sergeev A, Yosinski J, 2018. An intriguing failing of convolutional neural networks and the CoordConv solution, in: Advances in Neural Information Processing Systems, pp. 9605–9616.
- Loshchilov I, Hutter F, 2016. SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983.
- Mahmood F, Chen R, Durr NJ, 2018. Unsupervised Reverse Domain Adaptation for Synthetic Medical Images via Adversarial Training. IEEE Transactions on Medical Imaging 37, 2572–2581. doi: 10.1109/TMI.2018.2842767, arXiv:1711.06606. [DOI] [PubMed] [Google Scholar]
- Moore CC, Lugo-Olivieri CH, McVeigh ER, Zerhouni EA, 2000. Three-dimensional systolic strain patterns in the normal human left ventricle: characterization with tagged MR imaging. Radiology 214, 453–466. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nam H, Han B, 2016. Learning multi-domain convolutional neural networks for visual tracking, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4293–4302. [Google Scholar]
- Nene SA, Nayar SK, Murase H, et al. , 1996. Columbia object image library (coil-20) . [Google Scholar]
- Osman NF, Kerwin WS, McVeigh ER, Prince JL, 1999. Cardiac motion tracking using CINE harmonic phase (HARP) magnetic resonance imaging. Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine 42, 1048–1060. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Perotti LE, Magrath P, Verzhbinsky IA, Aliotta E, Moulin K, Ennis DB, 2017. Microstructurally anchored cardiac kinematics by combining in vivo DENSE MRI and cDTI, in: International Conference on Functional Imaging and Modeling of the Heart, Springer. pp. 381–391. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Perotti LE, Verzhbinsky IA, Moulin K, Cork TE, Loecher M, Balzani D, Ennis DB, 2020. Estimating cardiomyofiber strain in vivo by solving a computational model. Medical Image Analysis 68, 101932. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Prince JL, McVeigh ER, 1992. Motion estimation from tagged MR image sequences. IEEE transactions on medical imaging 11, 238–249. [DOI] [PubMed] [Google Scholar]
- Ronneberger O, Fischer P, Brox T, 2015. U-net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical image computing and computer-assisted intervention, Springer. pp. 234–241. [Google Scholar]
- Scatteia A, Baritussio A, Bucciarelli-Ducci C, 2017. Strain imaging using cardiac magnetic resonance. Heart failure reviews 22, 465–476. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schrauben EM, Cowan BR, Greiser A, Young AA, 2018. Left ventricular function and regional strain with subtly-tagged steady-state free precession feature tracking. Journal of Magnetic Resonance Imaging 47, 787–797. [DOI] [PubMed] [Google Scholar]
- Shrivastava A, Pfister T, Tuzel O, Susskind J, Wang W, Webb R, 2017. Learning from simulated and unsupervised images through adversarial training. Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 2017-January, 2242–2251. doi: 10.1109/CVPR.2017.241, arXiv:1612.07828. [DOI] [Google Scholar]
- Tobon-Gomez C, De Craene M, McLeod K, Tautz L, Shi W, Hennemuth A, Prakosa A, Wang H, Carr-White G, Kapetanakis S, Lutz A, Rasche V, Schaeffter T, Butakoff C, Friman O, Mansi T, Sermesant M, Zhuang X, Ourselin S, Peitgen HO, Pennec X, Razavi R, Rueckert D, Frangi A, Rhode K, 2013. Benchmarking framework for myocardial tracking and deformation algorithms: An open access database. Medical Image Analysis 17, 632 – 648. URL: http://www.sciencedirect.com/science/article/pii/S1361841513000388, doi: 10.1016/j.media.2013.03.008. [DOI] [PubMed] [Google Scholar]
- Tran D, Wang H, Torresani L, Ray J, LeCun Y, Paluri M, 2018. A closer look at spatiotemporal convolutions for action recognition, in: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 6450–6459. [Google Scholar]
- Tremblay J, Prakash A, Acuna D, Brophy M, Jampani V, Anil C, To T, Cameracci E, Boochoon S, Birchfield S, 2018. Training deep networks with synthetic data: Bridging the reality gap by domain randomization. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops 2018-June, 1082–1090. doi: 10.1109/CVPRW.2018.00143, arXiv:1804.06516. [DOI] [Google Scholar]
- Verzhbinsky IA, Perotti LE, Moulin K, Cork TE, Loecher M, Ennis DB, 2019. Estimating aggregate cardiomyocyte strain using in vivo diffusion and displacement encoded MRI. IEEE transactions on medical imaging . [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang H, Amini AA, 2011. Cardiac motion and deformation recovery from MRI: a review. IEEE transactions on medical imaging 31, 487–503. [DOI] [PubMed] [Google Scholar]
- Wu X, Liang L, Shi Y, Fomel S, 2019. FaultSeg3D: Using synthetic data sets to train an end-to-end convolutional neural network for 3D seismic fault segmentation. Geophysics 84, IM35–IM45. doi: 10.1190/geo2018-0646.1. [DOI] [Google Scholar]
- Young AA, Kraitchman DL, Dougherty L, Axel L, 1995. Tracking and finite element analysis of stripe deformation in magnetic resonance tagging. IEEE Transactions on Medical Imaging 14, 413–421. [DOI] [PubMed] [Google Scholar]
- Zhong X, Spottiswoode BS, Meyer CH, Kramer CM, Epstein FH, 2010. Imaging three-dimensional myocardial mechanics using navigator-gated volumetric spiral cine DENSE MRI. Magnetic resonance in medicine 64, 1089–1097. [DOI] [PMC free article] [PubMed] [Google Scholar]