Author manuscript; available in PMC: 2019 Jul 1.
Published in final edited form as: Med Image Anal. 2018 Apr 6;47:68–80. doi: 10.1016/j.media.2018.03.015

A Deep Boltzmann Machine-Driven Level Set Method for Heart Motion Tracking Using Cine MRI Images

Jian Wu a, Thomas R Mazur a, Su Ruan b, Chunfeng Lian b, Nalini Daniel a, Hilary Lashmett a, Laura Ochoa a, Imran Zoberi a, Mark A Anastasio c, H Michael Gach a, Sasa Mutic a, Maria Thomas a, Hua Li a
PMCID: PMC6501847  NIHMSID: NIHMS961636  PMID: 29679848

Abstract

Heart motion tracking for radiation therapy treatment planning can result in effective motion management strategies to minimize radiation-induced cardiotoxicity. However, automatic heart motion tracking is challenging due to factors that include the complex spatial relationship between the heart and its neighboring structures, dynamic changes in heart shape, and limited image contrast, resolution, and volume coverage. In this study, we developed and evaluated a deep generative shape model-driven level set method to address these challenges. The proposed heart motion tracking method makes use of a heart shape model that characterizes the statistical variations in heart shapes present in a training data set. This heart shape model was established by training a three-layered deep Boltzmann machine (DBM) in order to characterize both local and global heart shape variations. During the tracking phase, a distance regularized level-set evolution (DRLSE) method was applied to delineate the heart contour on each frame of a cine MRI image sequence. The trained shape model was embedded into the DRLSE method as a shape prior term to constrain an evolutional shape to reach the desired heart boundary. Frame-by-frame heart motion tracking was achieved by iteratively mapping the obtained heart contour for each frame to the next frame as a reliable initialization, and performing a level-set evolution. The performance of the proposed motion tracking method was demonstrated using thirty-eight coronal cine MRI image sequences.

Keywords: Heart motion tracking, generative shape model, deep Boltzmann machine, distance regularized level-set evolution, MRI-guided radiation therapy

Graphical Abstract


1. Introduction

In radiation therapy, heart motion characterization can provide useful information for analyzing the risk of radiation-induced cardiotoxicity and establishing motion management strategies for optimized treatment delivery. Magnetic resonance imaging (MRI)-guided radiation therapy systems provide on-board cine images and allow systematic and quantitative investigations of heart motion during radiation treatment (Chen et al., 2015; Huang et al., 2015; Weygand et al., 2016). Manual delineation of heart contours is time-consuming and therefore not practically feasible for clinical workflows. There is a great need for automatic or semi-automatic motion tracking methods. However, automatic heart motion tracking is a very challenging task due to the complex spatial relationship between the heart and its neighboring structures, dynamic heart shape changes during involuntary motion, and the low-contrast/resolution of cine MRI images.

Automatic or semi-automatic heart motion tracking methods can be categorized as image-based (Lu et al., 2009; Huang et al., 2009), classification-based (Stalidis et al., 2002; Kedenburg et al., 2006), and deformable model-based methods (Petitjean and Dacher, 2011; Xu and Prince, 1998; Battani et al., 2003; Liu et al., 2013; Brieva et al., 2015; Feng et al., 2013; Liu et al., 2016). Deformable model-based methods, including snakes (Kass et al., 1988), level-set evolution (Osher and Sethian, 1988), and their variants (Xu and Prince, 1998; Battani et al., 2003; Liu et al., 2013; Brieva et al., 2015; Feng et al., 2013; Liu et al., 2016), have been widely applied for heart or ventricle segmentation on cardiac gated images. Level set-based methods evolve an initially defined contour or shape toward the boundaries of desired objects through the minimization of pre-defined energy functions. However, conventional level set methods (Chan and Vese, 2001; Li et al., 2008) lack the ability to handle curve irregularities during the evolution and might lose evolution stability. The distance regularized level-set evolution (DRLSE) method (Li et al., 2010) includes an additional regularization term in the energy function to avoid shape irregularities and instability during the level set evolution process while eliminating the need for level set re-initialization. Liu and coauthors extended the DRLSE method to a two-level-set formulation specifically for segmenting the endocardium and epicardium simultaneously by utilizing the spatial relationship between them (Liu et al., 2016). However, these methods consider only the intensity information in the images for motion tracking and are prone to becoming trapped in local minima, resulting in the "boundary leaking" problem.

To improve segmentation accuracy, the level set energy function can include a penalty that encodes a priori information about the desired shapes, which is established by use of a set of representative training images with delineated shape contours. For example, two segmentation methods (Sun et al., 2005) and (Pham et al., 2014) have been proposed that utilized a statistical shape model that was trained with a set of representative left ventricle shapes observed in cardiac MRI images. However, most of the reported shape priors were generated based on linear combinations of static training shapes only, and are not well-suited for dynamic heart motion tracking.

Deep learning methods have recently gained significant attention due to their excellent performance in robustly extracting features from complex images. They have been applied to solve various computer vision problems such as image classification (Guo et al., 2016; Gidaris and Komodakis, 2015; Zhang et al., 2015), object detection (Liang et al., 2015), image retrieval (Sun et al., 2014; Babenko et al., 2014), and semantic segmentation (Long et al., 2015). Recently, deep learning methods have also been applied to medical image segmentation. Two automatic MRI brain image segmentation methods (Pereira et al., 2016; Moeskops et al., 2016) have been proposed that use convolutional neural networks (CNN). In these methods, the CNN learns a high-level representation and abstraction of each image patch of the brain MR images. A CNN-based method normally takes intensity information within regions-of-interest as input while neglecting the shape information carried by the boundary of the object of interest. Incorporating organ shape information into the segmentation process can potentially improve the accuracy of segmenting challenging anatomical organs like the heart.

Recently, there has been significant progress in applying generative models to medical image analysis (van Tulder and de Bruijne, 2016; Zhang et al., 2014; Agn et al., 2016). For example, the Restricted Boltzmann Machine (RBM) (Hinton, 2010) has been widely used for classification, feature learning, filtering, and modeling (Larochelle and Bengio, 2008; Su et al., 2016; Melchior et al., 2017). The RBM is a powerful generative stochastic artificial neural network that can learn a probability distribution over its set of inputs. The RBM has gained increasing attention in organ shape modeling for medical image segmentation applications (Zhang et al., 2014; Agn et al., 2016). A shape model based on a two-layer Gaussian-Bernoulli Restricted Boltzmann Machine (GB-RBM) has been proposed (Zhang et al., 2014) to model lung shapes for lung segmentation. Furthermore, a convolutional Restricted Boltzmann Machine (cRBM) was employed to form a tumor shape prior (Agn et al., 2016) due to its capability of modeling higher-order interactions between voxels through local connections to hidden units; a Gaussian mixture model-based method was then proposed for brain tumor segmentation by modeling the relationships between tumor intensities and using cRBM-driven shape information. Although the RBM can impose local constraints such as smoothness and continuity, traditional RBMs lack the ability to capture global properties of complicated shapes. By using structured inference on a Deep Belief Network (DBN) (Hinton et al., 2006), the global shape priors of the left ventricle endocardium and epicardium (Ngo et al., 2017) can be learned and used for segmentation purposes. However, as a directed model, the DBN lacks the ability to pass feedback from a higher layer to a lower layer and might generate a sub-optimal shape model.

Another deep generative model, the deep Boltzmann machine (DBM) (Salakhutdinov and Hinton, 2009), has been proposed and provides the ability to build undirected probabilistic shape models. The DBM can build a strong probabilistic shape model of objects that meets the requirements of both realism and generalization by capturing more global and local constraints of object shapes than the RBM and DBN. Realism means that samples from the model should look realistic. Generalization means that the model should be able to generate realistic samples even if it is trained with a relatively small training dataset. The DBM has been shown to characterize various object shapes well, and its performance on realism and generalization has been validated (Eslami et al., 2012, 2014). To date, the DBM has mainly been applied to model binary object shapes for computer vision applications (Eslami and Williams, 2012; Chen et al., 2013), and has not yet found widespread application in the medical imaging field, especially for organ segmentation.

The heart undergoes continuous cardiac and respiratory motion. Therefore, it is more challenging to build a robust shape model for the heart in comparison with other static structures. Inspired by the strong shape representation ability of the DBM (Eslami et al., 2014), we propose to obtain shape constraints with a DBM and use the shape constraints for heart motion tracking on cine MR images. A three-layered DBM was employed to construct a reliable heart shape model that can effectively characterize both the global and local heart shape variations. The shape model was embedded into a distance regularized level set-based segmentation method for heart shape tracking by effectively combining the stability of DRLSE method with the strong generating ability of DBM-trained shape model. The tracking accuracy was measured by comparing the results to the average of two manual delineations in thirty-eight cine MRI datasets from nineteen volunteer cases.

2. Background: Deep Boltzmann Machine (DBM)

A Boltzmann machine is a generative stochastic neural network and Markov random field capable of learning a probability distribution over its set of inputs. By imposing restrictions on the Boltzmann machine network topology, the RBM was proposed to handle the time-consuming and difficult training process of Boltzmann machines. The DBM is an extended RBM with multiple layers of hidden units (Hinton, 2010). The structures of the DBM and RBM are shown in Figure 1. As shown in Figure 1(a), the RBM includes two layers. The first layer v is a vectorized representation of a 2D mask that is 1 inside the region of interest (e.g., the heart region in this study) and 0 outside. The second layer consists of hidden units h1, which capture shape constraints (local smoothness and continuity) of the object shape from the layer v. The two layers are fully connected to each other, meaning that each visible unit of v is connected to all hidden units of h1 through undirected weighted connections. There are no direct connections between units within the same layer, which restricts the computational cost of the training process.

Figure 1.


The standard structure of the RBM (left) and DBM (right) models.

The energy value of the state {v, h1} in an RBM model is defined as:

\xi_{RBM}(v, h^1; \theta) = -\sum_{i,j} W_{ij} v_i h_j^1 - \sum_j a_j h_j^1 - \sum_i b_i v_i,  (1)

where i ranges over the visible units v (1 ≤ i ≤ I), j ranges over the hidden units h^1 (1 ≤ j ≤ J), W_{ij} is the matrix of weights associated with the connections between v_i and h_j^1, and a_j and b_i are bias weights. By minimizing the energy function shown in Eq. (1) over a set of training data, the model parameters θ = {W, a, b} can be learned. With the learned model parameters, the model captures local shape constraints of the objects of interest (such as edges or corners of the heart shape in this study) from the given training heart shapes v. Given v as a binary vector, a probability distribution over v and h^1 of the RBM is defined in terms of \xi_{RBM}(v, h^1; \theta) as,

P(v, h^1; \theta) = \frac{1}{Z} e^{-\xi_{RBM}(v, h^1; \theta)},  (2)

where Z is a partition function defined as Z = \sum_{v} \sum_{h^1} e^{-\xi_{RBM}(v, h^1; \theta)}.
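As a concrete illustration (not part of the original method description), the RBM energy of Eq. (1) and the unnormalized probability of Eq. (2) can be sketched in a few lines of NumPy; the toy dimensions and values below are arbitrary:

```python
import numpy as np

def rbm_energy(v, h1, W, a, b):
    """Energy of a joint state {v, h1} as in Eq. (1):
    xi = -sum_ij W_ij v_i h1_j - sum_j a_j h1_j - sum_i b_i v_i."""
    return -(v @ W @ h1) - (a @ h1) - (b @ v)

def rbm_unnormalized_prob(v, h1, W, a, b):
    """Unnormalized probability exp(-xi); dividing by the partition
    function Z (a sum over all joint states) yields Eq. (2)."""
    return np.exp(-rbm_energy(v, h1, W, a, b))
```

Computing Z exactly requires summing over all 2^(I+J) binary states, which is why approximate inference and sampling are used in practice.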

As shown in Figure 1(b), the DBM is designed to provide a richer model than an RBM for capturing higher-order shape constraints by adding more layers of hidden units (Salakhutdinov and Hinton, 2009). A DBM with two hidden layers is used in this study to capture both local and global heart shape constraints and handle noise, location, and rotation changes of shapes of interest. In a three-layered DBM model, the energy value of the state {v, h1, h2} is defined as:

\xi_{DBM}(v, h^1, h^2; \theta) = -\sum_{i,j} W_{ij}^1 v_i h_j^1 - \sum_{j,k} W_{jk}^2 h_j^1 h_k^2 - \sum_j a_j^1 h_j^1 - \sum_k a_k^2 h_k^2 - \sum_i b_i v_i,  (3)

where k ranges over the hidden units h^2 (1 ≤ k ≤ K), and the model parameters are θ = {W^1, W^2, a^1, a^2, b}. W_{ij}^1 are the weights associated with the connections between v_i and h_j^1, and W_{jk}^2 are the weights associated with the connections between h_j^1 and h_k^2. a^1, a^2, and b are the biases relating to h^1, h^2, and v, respectively. A probability distribution for the three-layered DBM is defined as:

P(v, h^1, h^2; \theta) = \frac{1}{Z} e^{-\xi_{DBM}(v, h^1, h^2; \theta)},  (4)

where Z is a partition function defined as Z = \sum_{v} \sum_{h^1} \sum_{h^2} e^{-\xi_{DBM}(v, h^1, h^2; \theta)}. As mentioned earlier, the purpose of shape model training is to determine the model parameters that minimize the energy function computed over the training set. The minimization is carried out by using a mean-field inference procedure (Welling and Teh, 2003) combined with a fully factorized approximation (Salakhutdinov and Hinton, 2009) to establish the probabilistic distribution model. Therefore, the trained shape model has the ability to generate realistic samples even if it is trained with a relatively small training dataset.
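To make the fully factorized mean-field approximation concrete, the sketch below alternates the two expected-activation updates for a three-layered DBM with the symbols of Eq. (3). The function name, fixed step count, and zero initialization are illustrative assumptions, not details from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field(v, W1, W2, a1, a2, n_steps=20):
    """Fully factorized mean-field approximation of P(h1, h2 | v):
    alternately update the expected activations mu1 and mu2 until
    they stabilize (cf. Salakhutdinov and Hinton, 2009)."""
    mu2 = np.zeros(W2.shape[1])
    for _ in range(n_steps):
        # mu1 receives bottom-up input from v and top-down input from mu2
        mu1 = sigmoid(v @ W1 + mu2 @ W2.T + a1)
        # mu2 receives input from mu1 only (top layer)
        mu2 = sigmoid(mu1 @ W2 + a2)
    return mu1, mu2
```

The top-down term `mu2 @ W2.T` is what distinguishes the (undirected) DBM posterior from a purely feed-forward pass through a DBN.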

3. Heart Motion Tracking Method

3.1. Method Overview

As shown in Figure 2, the proposed heart motion tracking method makes use of a heart shape model that characterizes the statistical variations in heart shapes from a training data set. The method includes a training phase and a motion tracking phase. In the training phase, a three-layered DBM was designed to effectively characterize the local smoothness and continuity and the global consistency of the heart shape, and was used to train the shape model.

Figure 2.


Flowchart of the proposed DBM-driven heart motion tracking method.

In the motion tracking phase, a shape model-driven DRLSE method was applied for heart segmentation on each frame of the given set of cine MR images. A bounding box was manually delineated on the first frame of the cine and used as the initialization of the DBM-driven DRLSE method to obtain the heart contour on the same frame. The frame-by-frame heart motion tracking was achieved by iteratively mapping the obtained heart contour on each frame to the next adjacent frame as a reliable initialization and performing the level-set evolution. The shape priors inferred by the DBM model provided both local and global heart shape constraints to the level-set propagation for accurate motion tracking. Details regarding these steps are provided next.

3.2. Heart shape model training with a three-layered DBM

We used a three-layered DBM to establish a heart shape model in this study. The final model parameters {W^1, W^2, a^1, a^2, b} in Eq. (3) were obtained through a three-step strategy: separate training, joint training, and fine-tuning. First, the parameters {W^1, a^1, b} of the first hidden layer and {W^2, a^2} of the second layer were determined, in turn, by use of the persistent contrastive divergence (PCD) method (Hinton, 2002). Namely, after training the first layer of the model by itself, the parameters {W^1, a^1, b} were kept fixed and the second layer was trained to determine {W^2, a^2}. Subsequently, {W^1, a^1, b} and {W^2, a^2} were jointly refined using a mean-field inference procedure combined with a fully factorized approximation (Salakhutdinov and Hinton, 2009). Mean-field inference is a commonly used approximation for efficient inference in probabilistic graphical models, with the advantage of converging toward the global solution in a shorter time than other global optimization methods (Koller and Friedman, 2009). Finally, {W^1, W^2, a^1, a^2, b} were further fine-tuned with standard backpropagation. Given a set of aligned heart shapes v as the training data set, the whole training process of the heart shape model was completed by use of the above three-step strategy. The trained heart shape model was employed to generate shape estimates from a given shape contour and to guide the DRLSE-based heart segmentation process on each frame, as described in Sections 3.3 and 3.4.
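A minimal sketch of one PCD update for the first RBM layer, the building block of the separate-training step above. The learning rate, batch handling, and use of a single persistent chain per training example are illustrative assumptions, not settings from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pcd_update(v_data, v_chain, W, a, b, lr=0.01, rng=None):
    """One persistent contrastive divergence step for an RBM layer.
    The positive phase uses the training shapes v_data; the negative
    phase advances a persistent Gibbs chain v_chain by one step.
    W, a, b are updated in place; the advanced chain is returned."""
    rng = np.random.default_rng() if rng is None else rng
    # positive phase: hidden probabilities given the data
    ph_data = sigmoid(v_data @ W + a)
    # negative phase: one Gibbs step from the persistent chain
    ph_chain = sigmoid(v_chain @ W + a)
    h_sample = (ph_chain > rng.random(ph_chain.shape)).astype(float)
    pv = sigmoid(h_sample @ W.T + b)
    v_chain = (pv > rng.random(pv.shape)).astype(float)
    ph_new = sigmoid(v_chain @ W + a)
    # approximate gradient ascent on the log-likelihood
    W += lr * (v_data.T @ ph_data - v_chain.T @ ph_new) / len(v_data)
    a += lr * (ph_data.mean(0) - ph_new.mean(0))
    b += lr * (v_data.mean(0) - v_chain.mean(0))
    return v_chain
```

Training the second layer reuses the same update with the first layer's inferred hidden activations playing the role of the visible data.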

3.3. Energy formulation of the DBM-driven DRLSE method for heart segmentation

The DRLSE method was used to track the heart shape on each cine image frame. With a given initialization, the DRLSE method can provide stable evolution of the heart shape by maintaining a desired shape of the level set function and eliminating the need for re-initialization (Li et al., 2010). A two-valued level set function \phi: \Omega \subset \mathbb{R}^2 \to \mathbb{R}, representing the interface between two regions, was defined as:

\phi(x, y) = \begin{cases} -c, & (x, y) \in \Omega_{in} \\ c, & (x, y) \in \Omega_{out} \end{cases}  (5)

where (x, y) represents the two-dimensional coordinates of a pixel in an image, Ωin and Ωout are the portion of regions inside and outside of the heart, respectively, and c > 0 is a constant. An energy function ξ(ϕ) was defined to estimate the target heart shape through a minimization process. In the traditional DRLSE method (Li et al., 2010), the energy function considers only the distance-based regularization, image region intensity, and image gradient terms. We re-defined the energy function ξ(ϕ) with an additional DBM-driven shape constraint term S(ϕ, ψ; θ) to guide the evolution of ϕ with the estimated shape ψ inferred from the trained DBM learning model. The energy function ξ(ϕ) is defined as:

\xi(\phi) = \mu R_p(\phi) + \lambda L_g(\phi) + \alpha A_g(\phi) + \beta S(\phi, \psi; \theta),  (6)

where \mu > 0, \lambda > 0, \beta > 0, and \alpha \in \mathbb{R}. R_p(\phi) is the distance regularization term, L_g(\phi) is the weighted image gradient term, A_g(\phi) is the weighted region term, and S(\phi, \psi; \theta) is the shape constraint term. These terms are defined as

R_p(\phi) = \int_\Omega p(|\nabla \phi|) \, d\Omega,  (7)
L_g(\phi) = \int_\Omega g \, \delta(\phi) \, |\nabla \phi| \, d\Omega,  (8)
A_g(\phi) = \int_\Omega g \, H(-\phi) \, d\Omega,  (9)
S(\phi, \psi; \theta) = \int_\Omega \left(H(\phi) - H(\psi)\right)^2 \, d\Omega,  (10)

where p(s) = \frac{1}{2}(s - 1)^2 is a potential function, g = \frac{1}{1 + |\nabla G_\sigma * I|^2} is an edge indicator function with G_\sigma the Gaussian kernel and I the image frame, \delta is the Dirac delta function, and H is the Heaviside function. \psi(x, y; \theta) was defined on \Omega \subset \mathbb{R}^2 as:

\psi(x, y; \theta) = \begin{cases} -c, & s(x, y) = 1 \\ c, & s(x, y) = 0 \end{cases}  (11)

where s represents the mask of shape constraints inferred from the DBM-trained shape model by a layer-wise block-Gibbs sampling strategy (as shown in Algorithm 2 in Section 3.4), and c > 0 is a constant. The shape constraint S(ϕ, ψ; θ) could be considered as high-level shape priors inferred from the DBM-trained model for regularizing the target shape.

3.4. Energy minimization for heart shape segmentation

The target heart shape \phi was estimated using image data and DBM-driven shape constraints through the minimization of the energy function \xi(\phi) shown in Eq. (6). By substituting Eqs. (7)-(10), the energy function \xi(\phi) in Eq. (6) can be written as:

\xi_\varepsilon(\phi) = \mu \int_\Omega p(|\nabla \phi|) \, d\Omega + \lambda \int_\Omega g \, \delta_\varepsilon(\phi) \, |\nabla \phi| \, d\Omega + \alpha \int_\Omega g \, H_\varepsilon(-\phi) \, d\Omega + \beta \int_\Omega \left(H_\varepsilon(\phi) - H_\varepsilon(\psi)\right)^2 \, d\Omega,  (12)

where

\delta_\varepsilon(x) = \begin{cases} \frac{1}{2\varepsilon}\left[1 + \cos\left(\frac{\pi x}{\varepsilon}\right)\right], & |x| \le \varepsilon \\ 0, & |x| > \varepsilon \end{cases}  (13)
H_\varepsilon(x) = \begin{cases} \frac{1}{2}\left(1 + \frac{x}{\varepsilon} + \frac{1}{\pi}\sin\left(\frac{\pi x}{\varepsilon}\right)\right), & |x| \le \varepsilon \\ 1, & x > \varepsilon \\ 0, & x < -\varepsilon \end{cases}  (14)

with the parameter ε usually set to 1.5.
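The smoothed Dirac and Heaviside functions of Eqs. (13) and (14) translate directly into NumPy (the function names are ours; the default ε = 1.5 follows the text):

```python
import numpy as np

def dirac_eps(x, eps=1.5):
    """Smoothed Dirac delta of Eq. (13)."""
    return np.where(np.abs(x) <= eps,
                    (1 + np.cos(np.pi * x / eps)) / (2 * eps),
                    0.0)

def heaviside_eps(x, eps=1.5):
    """Smoothed Heaviside function of Eq. (14)."""
    core = 0.5 * (1 + x / eps + np.sin(np.pi * x / eps) / np.pi)
    return np.where(x > eps, 1.0, np.where(x < -eps, 0.0, core))
```

Both are applied elementwise to the level set array, so the resulting masks and band-limited delta can be used directly in the discretized energy terms.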

A standard method to minimize an energy function ξε(ϕ) is to find the steady state solution of the gradient flow equation (Aubert and Kornprobst, 2006). Therefore, the minimization was achieved by solving the following gradient flow:

\frac{\partial \phi}{\partial t} = \mu \, \mathrm{div}\left(d_p(|\nabla \phi|) \nabla \phi\right) + \lambda \, \delta_\varepsilon(\phi) \, \mathrm{div}\left(g \frac{\nabla \phi}{|\nabla \phi|}\right) + \alpha \, g \, \delta_\varepsilon(\phi) - 2\beta \, \delta_\varepsilon(\phi) \left(H_\varepsilon(\phi) - H_\varepsilon(\psi)\right),  (15)

where t is a temporal variable and d_p(s) = p'(s)/s. \delta_\varepsilon is the derivative of H_\varepsilon, i.e., H_\varepsilon' = \delta_\varepsilon. The energy minimization process, i.e., the procedure of heart segmentation for each cine MRI frame, is shown in Algorithm 1 below.

[Algorithm 1: DBM-driven DRLSE heart segmentation (rendered as an image in the original manuscript).]
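One explicit-Euler step of the gradient flow in Eq. (15) might be sketched as follows. This is a simplified illustration under our own assumptions: unit grid spacing via np.gradient, d_p(s) = p'(s)/s = 1 - 1/s for the stated potential, sign conventions matching the reconstructed Eq. (15), and the parameter defaults from Section 4:

```python
import numpy as np

def dirac_eps(x, eps=1.5):
    # smoothed Dirac delta, Eq. (13)
    return np.where(np.abs(x) <= eps, (1 + np.cos(np.pi * x / eps)) / (2 * eps), 0.0)

def heaviside_eps(x, eps=1.5):
    # smoothed Heaviside, Eq. (14)
    core = 0.5 * (1 + x / eps + np.sin(np.pi * x / eps) / np.pi)
    return np.where(x > eps, 1.0, np.where(x < -eps, 0.0, core))

def div(fx, fy):
    # divergence of the vector field (fx, fy): d(fx)/dx + d(fy)/dy
    return np.gradient(fx, axis=1) + np.gradient(fy, axis=0)

def drlse_step(phi, psi, g, dt=1.0, mu=0.04, lam=5.0, alpha=1.5, beta=0.03, eps=1.5):
    """One explicit Euler update of Eq. (15). phi: level set; psi:
    level-set form of the DBM-inferred shape; g: edge indicator."""
    gy, gx = np.gradient(phi)                  # axis-0 (y) and axis-1 (x) derivatives
    mag = np.sqrt(gx**2 + gy**2) + 1e-10
    dp = 1.0 - 1.0 / mag                       # d_p(s) = p'(s)/s for p(s) = (s-1)^2/2
    reg = div(dp * gx, dp * gy)                # distance regularization term
    delta = dirac_eps(phi, eps)
    edge = delta * div(g * gx / mag, g * gy / mag)   # weighted gradient term
    shape = -2.0 * delta * (heaviside_eps(phi, eps) - heaviside_eps(psi, eps))
    return phi + dt * (mu * reg + lam * edge + alpha * g * delta + beta * shape)
```

Iterating this update while periodically refreshing psi from the DBM reproduces the alternation described in the text.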

During each iteration of the energy minimization process, the estimated shape \psi in the shape constraint term is inferred from the trained DBM model using the zero level set from the previous DRLSE iteration as the input shape. As the DRLSE evolution proceeds, the zero level set from the previous iteration moves progressively closer to the heart boundary. Therefore, a more accurate inferred shape sample is obtained, which further helps guide the DRLSE evolution in the current iteration. The layer-wise block-Gibbs sampling strategy, shown in Figure 3, is used to infer an updated shape \psi from the input shape, and includes an inference phase and a generation phase.

Figure 3.


Layer-wise block-Gibbs sampling scheme for shape prior update.

In the inference phase, as marked in Figure 3, the high-order constraints of the heart shape are inferred from low-level features of the heart contour with a bottom-up connection strategy. After updating the states of the visible layer v with the zero level set from the previous iteration, the inference process first updates the hidden units h1 and then updates the states of the hidden units h2 by Eqs. (16) and (17) shown below:

h_j^1 = B\left(f\left(P(h_j^1 = 1 \mid v, h^2)\right)\right) = B\left(f\left(\sum_i W_{ij}^1 v_i + \sum_k W_{jk}^2 h_k^2 + a_j^1\right)\right),  (16)
h_k^2 = B\left(f\left(P(h_k^2 = 1 \mid h^1)\right)\right) = B\left(f\left(\sum_j W_{jk}^2 h_j^1 + a_k^2\right)\right),  (17)

where f(t) denotes the logistic sigmoid function f(t) = \frac{1}{1 + e^{-t}}, and B(n) denotes a Bernoulli draw: B(n) returns 1 if the activation probability n exceeds a uniform random value in [0, 1], and 0 otherwise (Eslami et al., 2014).

In the generation phase, the top-down connections generate low-level features of the heart contour from the high-order constraints of the heart shape. The generation process updates the hidden units h2 first, and then updates the states of all the hidden units h1 and visible units v. The hidden layer h1 is updated conditioned on the current states of v and h2, as indicated in Eq. (16). The visible layer v is updated by:

v_i = B\left(f\left(P(v_i = 1 \mid h^1)\right)\right) = B\left(f\left(\sum_j W_{ij}^1 h_j^1 + b_i\right)\right).  (18)

Algorithm 2 below describes the iterative process of updating the shape prior \psi. The number of iterations M was set to 10 based on the fast convergence of the layer-wise block-Gibbs sampling strategy. Also, we used a morphological opening operation to remove any potential holes in the generated binary mask.

[Algorithm 2: shape prior update via layer-wise block-Gibbs sampling (rendered as an image in the original manuscript).]
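The block-Gibbs loop of Eqs. (16)-(18) can be sketched compactly. The toy layer sizes, seeding, and zero initialization of h2 are our own illustrative choices; M = 10 follows the text:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bernoulli(p, rng):
    """B(.) in Eqs. (16)-(18): sample 1 where the activation
    probability exceeds a uniform random draw."""
    return (p > rng.random(p.shape)).astype(float)

def gibbs_shape_update(v, W1, W2, a1, a2, b, M=10, rng=None):
    """Layer-wise block-Gibbs sampling: infer h1 and h2 from the
    current zero-level-set mask v (bottom-up, Eqs. 16-17), then
    regenerate the shape mask top-down (Eq. 18), M times."""
    rng = np.random.default_rng(0) if rng is None else rng
    h2 = np.zeros(W2.shape[1])
    for _ in range(M):
        h1 = bernoulli(sigmoid(v @ W1 + h2 @ W2.T + a1), rng)  # Eq. (16)
        h2 = bernoulli(sigmoid(h1 @ W2 + a2), rng)             # Eq. (17)
        v = bernoulli(sigmoid(h1 @ W1.T + b), rng)             # Eq. (18)
    return v
```

The returned binary vector is reshaped into the 2D mask s used to build \psi via Eq. (11), after the morphological opening mentioned above.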

4. Experimental Studies

Nineteen volunteers underwent thoracic cine MRI after informed consent was obtained in accordance with an Institutional Review Board-approved protocol. MR images were acquired using a Philips Ingenia 1.5 T MRI scanner (Cleveland, OH). The MRI protocol included two three-plane localizers covering the thorax and abdomen, one acquired with free-breathing and the other acquired during an inhalation breath-hold. For each volunteer, 2D coronal heart cine MR images were acquired during free-breathing and an inhalation breath-hold. Cardiac cine MR images were acquired using free-running coronal steady-state free precession balanced fast field echo (B-FFE) acquisitions [1.3 ms echo time (TE), 2.7 ms repetition time (TR), 140 × 142 matrix, 350 mm × 350 mm field of view (FOV), 5 mm slice thickness, ∼2 mm in-plane resolution, 60° flip angle, 1417 Hz/pixel bandwidth, 378 ms per frame]. All the analyzed coronal images were acquired at roughly the same anatomic locations in the AP direction across volunteers. To account for differences in heart size across volunteers, data acquisition proceeded as follows: (1) For each subject, we acquired 4 breath-hold and 4 free-breathing coronal slices, adjusting the slice positions so that the images were equally distributed to cover the heart. (2) For the coronal images, the first slice was located at the posterior part of the heart, above the left atrium. With a slice spacing of about 15-20 mm, depending on the heart size of the subject, the slices extended to cover the right ventricle. Sixty frames were acquired for each breath-hold and free-breathing cine MRI sequence.

The acquired thirty-eight cine MRI datasets were randomly divided into a training group and two groups of testing datasets, as shown in Figure 4. The training group includes 14 datasets from 7 of the 19 volunteers, selected at random. For these 14 datasets, the heart was manually delineated by two radiation oncologists on each of the 60 frames. The average of the manual contouring results was used as the ground truth to train the DBM model and validate the performance of the proposed method. The delineated heart shapes on the first 50 frames of the training datasets were used to train the DBM-based heart shape model. The delineated contours on the remaining 10 frames formed the first group of testing datasets to evaluate the performance of the proposed heart motion tracking method. The second group of testing datasets included 24 datasets from the remaining 12 volunteers, and was used as unseen testing data to further validate the performance of the proposed method. A sequential ungated single-shot balanced steady-state free precession acquisition was used in this study. The image acquisition time (and thus temporal resolution) was 0.38 s/frame, i.e., a frame rate of 2.6 frames/s. These images capture motion blurring and can provide guidance on motion management for radiation therapy. At this frame rate, 60 frames cover about 29 heartbeat cycles (note that the average duration of one cardiac cycle is 0.8 seconds (Abdi et al., 2015)). Tests on these images therefore span multiple cardiac cycles, which is suitable for radiation therapy purposes. In addition, the start time, with respect to the R-wave, for acquiring the first frame in each cardiac cycle varies, which allows different regions of the cardiac cycle to be sampled by consecutive frames. Therefore, the trained DBM-based heart shape model remains robust even at low temporal resolution.

Figure 4.


The separation of all the cine MRI datasets.

For this second group of testing datasets, the heart was manually delineated by radiation oncologists for 60 frames of each dataset. The delineations were used for performance evaluation of heart tracking accuracy. Both qualitative and quantitative accuracy evaluations of the proposed method and a comparison with four other segmentation methods were conducted.

The three-layered DBM in this study was set to include 2304 binary visible units for the visible layer v, and 2000 and 500 hidden units for h1 and h2, respectively. The number of visible units was determined by the pixel size of a bounding box (48 × 48 pixels) enclosing the heart. In the DBM training process (described in Section 3.2), 3000 and 1000 iterations were used for separately training the first and second hidden layers, respectively, while 1000 iterations were used for jointly training the hidden layers. Finally, the model parameters were fine-tuned by standard backpropagation for 1000 iterations. The coefficients in Eq. (15) for the energy minimization process were set as µ = 0.04, λ = 5, α = 1.5, and β = 0.03, based on the training datasets, by considering both the stability of the curve evolution and the effectiveness of the shape prior-based guidance.

5. Experimental Results

5.1. Heart motion tracking accuracy

The contour evolution process is illustrated in Figure 5. Green contours denote ground truth contours, generated as the average of the manual contours delineated by two radiation oncologists. The red contours show the tracking results from our proposed method. The shape priors inferred by the DBM-trained shape model ensure that the evolving contour follows both local and global constraints and converges to the desired heart boundary.

Figure 5.


Example of curve evolution process for heart segmentation of the proposed method. The upper and lower rows show the tracking results on the first frame of one breath-hold and one free-breathing set of cine MR images from two different volunteers. From left to right: the initialization, and the evolution results after 5, 10, 20, and 30 iterations.

5.2. Experimental comparison with other segmentation methods

In this study, in addition to the comparison with two other level set-based methods, Chan-Vese (Chan and Vese, 2001) and DRLSE (Li et al., 2010), we also compared our method with two other shape prior-based methods. One is the DRLSE method enhanced with convolutional restricted Boltzmann machine (cRBM)-based shape priors (Agn et al., 2016), and the other is the DRLSE method enhanced with a partial DBM (pDBM) (Wu et al., 2017). The cRBM, a variant of the RBM model, was employed in (Agn et al., 2016) to form the tumor shape prior by modeling higher-order interactions between voxels through local connections to hidden units. Figure 6 shows the comparison of our method with these four methods on the first frame of three breath-hold and three free-breathing cine MRI sequences. As shown in Figure 6, the Chan-Vese and DRLSE methods could not handle the "boundary leaking" problem in delineating dynamic heart boundaries, mainly because they cannot derive reliable solutions from insufficient low-level image information. By incorporating cRBM-based shape priors into the DRLSE method, the cRBM-based method provides more accurate results than the traditional DRLSE method. Our DBM-based DRLSE method effectively combines the stability of the DRLSE method with the strong generative ability of the DBM-trained shape model. Thus, with the same initialization, our method performs better than the four other methods.

Figure 6.


Comparison of the proposed tracking method with four other methods on examples of three breath-hold and three free-breathing cine MRI frames from three randomly selected volunteer cases of the unseen datasets. Green contours denote manual segmentation, while red contours show the results of the five automatic methods. From left to right: (i) the tested MRI cine frame with initialization; (ii) Chan-Vese level set method result; (iii) DRLSE method result; (iv) DRLSE method with cRBM-based shape priors result; (v) DRLSE method with pDBM result; (vi) the proposed DBM method result.

5.3. Quantitative evaluation of motion tracking results

We compared the tracking results with the ground truth. A spatial overlap index, the Dice similarity coefficient (DSC), and a contour distance measure, the mean margin error (MME) (Li et al., 2016), were used to evaluate the consistency between the automatic results and the ground truth. The two metrics are defined as:

\mathrm{DSC} = \frac{2|A \cap B|}{|A| + |B|}, \qquad \mathrm{MME} = \frac{1}{Q} \sum_{i=1}^{Q} \sqrt{(u_i - y_i)^2},  (19)

where the DSC value denotes the degree of overlap, with DSC ∈ [0, 1]. Sets A and B consist of the pixels of the automatic result and the ground truth, respectively. u_i denotes the coordinates of the ith contour point of the automatic result, and y_i is the coordinates of the ground-truth contour point closest to u_i. Q is the number of contour points of the automatic result. The consistency between the automatic result and the ground truth increases with the DSC and decreases with the MME.
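Both metrics are straightforward to compute; the sketch below implements the DSC on binary masks and the MME as the mean nearest-neighbor Euclidean distance from automatic to ground-truth contour points, our reading of Eq. (19). The function names are illustrative:

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks, Eq. (19)."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def mme(auto_pts, gt_pts):
    """Mean margin error: average Euclidean distance from each automatic
    contour point (rows of auto_pts, shape Qx2) to its nearest
    ground-truth contour point (rows of gt_pts)."""
    d = np.linalg.norm(auto_pts[:, None, :] - gt_pts[None, :, :], axis=2)
    return d.min(axis=1).mean()
```

For contours with many points, a k-d tree (e.g., scipy.spatial.cKDTree) would replace the quadratic pairwise distance matrix.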

Table 1 compares the means and standard deviations of the DSC and MME for the proposed method and the other four methods on the breath-hold and free-breathing images of the first group of testing datasets. The proposed method achieved an average DSC of 90% on breath-hold cine MR images, 88% on free-breathing cine MR images, and an overall average DSC of 89%, indicating good agreement between the proposed method and the ground truth. The proposed method yields a higher DSC and lower MME than the other four methods. The overall DSC is higher on the breath-hold cine MR images than on the free-breathing cine MR images. Figure 7 compares the DSC and MME scores between the five methods.

Table 1:

Accuracy evaluation of heart motion tracking on the first group of testing datasets.

Breath-Hold Cine MR images
Average DSC (%) (Mean ± STD) Average MME(mm) (Mean ± STD)
Chan-Vese 63.72 ± 6.65 21.35 ± 7.37
DRLSE 77.51 ± 4.16 7.87 ± 2.18
DRLSE with cRBM 80.82 ± 4.22 7.50 ± 1.17
DRLSE with pDBM 86.51 ± 3.45 5.93 ± 1.32
The proposed method 89.55 ± 2.26 4.26 ± 1.88
Free-Breathing Cine MR images
Average DSC (%) (Mean ± STD) Average MME(mm) (Mean ± STD)
Chan-Vese 55.75 ± 6.28 31.56 ± 2.74
DRLSE 70.65 ± 4.59 12.93 ± 2.26
DRLSE with cRBM 76.67 ± 4.91 8.10 ± 0.64
DRLSE with pDBM 83.23 ± 4.67 6.49 ± 1.41
The proposed method 88.42 ± 2.49 4.48 ± 1.49
Overall Cine MR images
Average DSC (%) (Mean ± STD) Average MME(mm) (Mean ± STD)
Chan-Vese 59.74 ± 6.47 28.01 ± 8.70
DRLSE 74.08 ± 4.38 10.40 ± 3.37
DRLSE with cRBM 78.75 ± 4.57 7.82 ± 0.95
DRLSE with pDBM 84.87 ± 4.06 6.21 ± 1.35
The proposed method 88.99 ± 2.38 4.37 ± 1.70

Figure 7.

Figure 7

Comparison of the proposed tracking method with four other methods on the first group of testing datasets. Whiskers represent the maximum and minimum values.

The agreement between the heart areas obtained by the proposed method and the ground truth is depicted in the Bland-Altman plot (Bland and Altman, 2010) shown in Figure 8. For a given cine frame F, let F1 and F2 be the heart areas of the ground truth and the automatic tracking result, respectively. The Cartesian coordinates of each sample F in the Bland-Altman plot are determined by:

F(x, y) = \left( \frac{F_1 + F_2}{2},\; F_1 - F_2 \right), \quad (20)

where x is the average of the two delineation results and y is the difference between them. The red line denotes the mean difference between F1 and F2 over all cine frames, while the black lines denote the limits of agreement, i.e., the mean difference plus/minus two standard deviations of the differences. The majority of data points fall within these limits, indicating good agreement between the tracking results and the manual segmentation. Good agreement was observed for both groups of testing datasets.
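The coordinates and agreement limits in Eq. (20) are straightforward to compute from per-frame area measurements. The sketch below uses two standard deviations of the differences for the limits, matching the description above; the function name is ours:

```python
import numpy as np

def bland_altman_points(gt_areas, auto_areas, n_sd=2.0):
    """Bland-Altman coordinates and agreement limits for paired
    per-frame heart areas (ground truth F1 vs. automatic F2).

    Returns the x coordinates (mean of the pair), y coordinates
    (difference of the pair), the mean difference (the 'red line'),
    and the lower/upper limits mean ± n_sd standard deviations.
    """
    f1 = np.asarray(gt_areas, dtype=float)
    f2 = np.asarray(auto_areas, dtype=float)
    x = (f1 + f2) / 2.0          # average of the two delineations
    y = f1 - f2                  # difference between them
    bias = y.mean()              # mean difference over all frames
    sd = y.std(ddof=1)
    return x, y, bias, (bias - n_sd * sd, bias + n_sd * sd)
```

Plotting x against y with horizontal lines at the bias and the two limits reproduces the layout of Figure 8.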

Figure 8.

Figure 8

Bland-Altman plot of the tracking results from the proposed method compared to the ground truth for breath-hold (left) and free-breathing (right) MR images. The upper row shows the results on the first group of testing datasets while the lower row on the second group of testing datasets.

Table 2 compares the tracking accuracy on the second group of unseen testing datasets with that on the first group. The separation of the two groups of testing datasets is described in Section 4. As shown in the table, the proposed method achieved similar results on the second group of unseen testing datasets, with an average DSC of 89% on the breath-hold cine MR images and 87% on the free-breathing cine sequences.

Table 2:

Accuracy Evaluation of Heart Motion Tracking on the Second Group.

Image Datasets The Second Group of Testing Datasets The First Group of Testing Datasets
Average DSC (%) Average MME(mm) Average DSC (%) Average MME(mm)
Breath-hold Cines 89.11 ± 2.86 4.04 ± 1.16 89.55 ± 2.26 4.26 ± 1.88
Free-Breathing Cines 87.17 ± 3.68 4.66 ± 1.48 88.42 ± 2.49 4.48 ± 1.49
Overall Cines 88.14 ± 3.41 4.35 ± 1.36 88.99 ± 2.38 4.37 ± 1.70

In addition, convergence time is an important factor in the performance of a segmentation method. A comparison of the computational cost of the different methods is shown in Table 3. Although the proposed method embeds an additional Gibbs sampling process in each DRLSE level-set evolution, fewer iterations are needed for convergence than with the original DRLSE method (30 vs. 120, as shown in Table 3). Therefore, the overall computational cost of our method is slightly higher but remains comparable.

Table 3:

Average convergence time for all the five methods on testing datasets.

Method Convergence Time (s/frame) The Number of Iterations
Chan-Vese 0.949 2000
DRLSE 3.731 120
DRLSE with cRBM 4.175 30
DRLSE with pDBM 4.134 120
The proposed method 4.618 30

5.4. Quantification of heart motion between breath-hold and free-breathing cine images

Investigating the differences in heart shape between breath-hold and free-breathing conditions can help in designing more appropriate motion management strategies in radiation therapy. For patients treated under different conditions (e.g., breath-hold or free-breathing during the radiation treatment), margins of different sizes should be added accordingly to maximize tumor coverage while minimizing unwanted irradiation of the heart. Based on the tracking results, the motion of the heart was quantified. As illustrated in Figure 9, the heart movement in each image frame was quantified with respect to the cine frame with the minimum heart area of the same MRI sequence using six geometric measures on the detected heart contour: the heart area change (AC), the heart area centroid shift (CS), and the maximum displacements along the right, left, superior, and inferior directions, DR = |MinXi − MinXr|, DL = |MaxXi − MaxXr|, DS = |MinYi − MinYr|, and DI = |MaxYi − MaxYr|, where MinXi, MaxXi, MinYi, and MaxYi denote the minimum and maximum coordinates of the heart region boundary of the i-th frame along the X and Y directions, and the subscript r denotes the reference frame.
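Given binary heart masks for the reference frame and the current frame, the six measures can be computed directly from pixel coordinates. This is a minimal sketch with a function name of our choosing; the mapping of image axes to anatomical directions (x increasing toward the patient's left, y toward inferior) is an assumed labeling convention for illustration, not taken from the paper:

```python
import numpy as np

def motion_measures(ref_mask, frame_mask, pixel_area_mm2=1.0, pixel_mm=1.0):
    """Six geometric measures of a cine frame relative to the
    minimum-area reference frame: area change (AC), centroid shift (CS),
    and maximum boundary displacements DR, DL, DS, DI."""
    yr, xr = np.nonzero(np.asarray(ref_mask))
    yi, xi = np.nonzero(np.asarray(frame_mask))
    ac = (len(yi) - len(yr)) * pixel_area_mm2                    # AC
    cs = np.hypot(xi.mean() - xr.mean(), yi.mean() - yr.mean()) * pixel_mm
    dr = abs(xi.min() - xr.min()) * pixel_mm                     # right
    dl = abs(xi.max() - xr.max()) * pixel_mm                     # left
    ds = abs(yi.min() - yr.min()) * pixel_mm                     # superior
    di = abs(yi.max() - yr.max()) * pixel_mm                     # inferior
    return ac, cs, dr, dl, ds, di
```

Running this per frame against the minimum-area reference yields the per-sequence motion statistics summarized in Figures 10 to 12.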

Figure 9.

Figure 9

Quantitative measurements of heart motion. (a) The reference cine frame and the heart contour. (b) The i-th cine frame and the heart contour of the same cine sequence.

Maximum heart area changes for each breath-hold and free-breathing cine MRI sequence of the nineteen volunteers were evaluated; the results are shown in Figure 10(a). Heart motion in the free-breathing cine sequences is larger than in the breath-hold sequences. Respiratory motion contributes to the amplitude of heart motion captured in the low-frame-rate cine images; the increased heart motion during free breathing versus breath-hold arises from the combination of cardiac and respiratory motion components. Similarly, the quantitative measurements of centroid shift are depicted in Figure 10(b). The centroid shift of the heart was 2 to 4 mm for breath-hold and 3 to 8 mm for free-breathing across all nineteen volunteers. A t-test indicated a significant difference in centroid shift between breath-hold and free-breathing (p < 0.001).
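The paper reports a t-test without specifying the variant; since each volunteer contributes one centroid-shift value per breathing condition, a paired test is one natural choice. A self-contained NumPy sketch of the paired t statistic (function name ours; significance would then be read from a t table or a stats library at n − 1 degrees of freedom):

```python
import numpy as np

def paired_t(x, y):
    """Paired t statistic and degrees of freedom for per-volunteer
    differences x - y (e.g., free-breathing vs. breath-hold centroid
    shifts measured on the same volunteers)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))   # mean diff / SE
    return t, n - 1
```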

Figure 10.

Figure 10

The maximum heart area changes (a) and the centroid shift (b) of the tracking results of all nineteen test volunteer datasets. The cine frame with the minimum heart area of the same MRI was used as a reference for the measurement.

In Figure 11, the average percentages of frames yielding area changes larger than 50, 100, 200, and 300 mm2 were evaluated separately for the breath-hold and free-breathing MR images. For each cine sequence, the frame with the minimum heart area was used as the reference. Figure 11 further confirms the observation in Figure 10(a) that heart motion is larger in free-breathing than in breath-hold cine sequences.
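This threshold-exceedance statistic reduces to a one-liner per threshold once the per-frame area changes are available. A small sketch (function name ours):

```python
import numpy as np

def percent_above(area_changes_mm2, thresholds=(50, 100, 200, 300)):
    """Percentage of frames whose heart area change, relative to the
    minimum-area reference frame, exceeds each threshold (in mm^2)."""
    ac = np.asarray(area_changes_mm2, dtype=float)
    return {t: 100.0 * np.mean(ac > t) for t in thresholds}
```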

Figure 11.

Figure 11

The percentage of frames yielding heart area changes larger than 50, 100, 200, and 300 mm2.

Quantitative measurements of heart displacements along the left, right, superior, and inferior directions were conducted for all testing datasets from the nineteen volunteers (Figure 12). For each set of cine MR images, the displacements were calculated along each direction using the cine frame with the minimum heart area as the reference. Statistically significant differences were observed between the inferior and superior displacements (p < 0.001 for both breath-hold and free-breathing cine MR images), with the superior displacement larger than the inferior displacement.

Figure 12.

Figure 12

The average displacements (mean, standard deviation) on each direction for all thirty-eight volunteer datasets. *** represents statistically significant difference between the inferior and superior directions (p < 0.001).

6. Discussion

More accurate tracking results were achieved by the proposed method compared with the four other methods, mainly for the following reasons. First, the DBM-trained shape model captures both local and global properties of heart contours, constraining the level-set evolution to maintain global shape consistency while preserving local smoothness and continuity. Second, a layer-wise block-Gibbs sampling strategy was used to infer heart shape priors during the level-set evolution. In each DRLSE iteration, the zero level set ϕ obtained from the previous iteration is assigned to ψ, which serves as the input to the Gibbs sampling scheme; the updated shape sample ψ inferred in this way is then used as the initialization for the next DRLSE iteration. By iteratively performing the inference and generation phases, this strategy infers shape priors that are incorporated into the level-set evolution to guarantee the realism and generality of the final heart contour. Third, the proposed shape-driven DRLSE method integrates image intensity, image gradient, distance regularization, and local/global shape priors into the level-set evolution to further improve the tracking accuracy on each cine frame. Fourth, because heart motion is continuous, using the tracking result from the previous cine frame as the initialization for the current frame improves both computational cost and tracking accuracy compared with an arbitrary initialization.
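The per-frame alternation between DRLSE evolution and DBM-based shape inference can be sketched as a simple loop. The helpers `drlse_step` (one data-driven level-set update) and `dbm_project` (layer-wise block-Gibbs inference of a shape prior from the current shape) are hypothetical stand-ins for the authors' components; only the control flow is illustrated here:

```python
def track_frame(phi0, drlse_step, dbm_project, n_iters=30):
    """Sketch of one frame's prior-constrained evolution: evolve the
    level set, project the resulting shape through the DBM to obtain
    an updated shape sample, and use that sample to initialize the
    next iteration (hypothetical helper names)."""
    phi = phi0
    for _ in range(n_iters):
        phi = drlse_step(phi)    # data-driven DRLSE update
        phi = dbm_project(phi)   # shape prior inferred via Gibbs sampling
    return phi
```

For frame-by-frame tracking, the returned contour would then serve as `phi0` for the next cine frame, as described above.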

In addition, as shown in Figure 6(ii), the Chan-Vese level set model yielded undesirable fragmented regions. This "boundary leaking" occurred because of excessive dependence on regional intensity. As shown in Figure 6(iii), the DRLSE method ensured the stability of the level-set evolution and avoided fragmented regions; however, it still suffers from the boundary leaking problem, especially in the delineation of the right and superior heart boundaries. The issues affecting these two methods stem mainly from the lack of shape prior constraints in the level-set evolution. By incorporating cRBM-based shape priors into the DRLSE method, the tracking results shown in Figure 6(iv) were visually more accurate than those of the traditional DRLSE method; however, this third method still lacks a sufficient constraint on global consistency. The segmentation accuracy of pDBM was lower than that of the proposed method due to its inability to incorporate image intensity and gradient information into the tracking process, as shown in Figure 6(v).

In the proposed method, the shape priors inferred by the DBM-trained shape model effectively constrain the evolving heart shape. The proposed method accurately identifies the heart boundaries and yields accurate segmentations despite cardiac and respiratory motion, as shown in Figure 6(vi). The quantitative evaluation in Table 1 and Figure 7 confirmed the superior performance of the proposed method.

For all cine sequences tested in this study, we used a rectangular contour as the initialization for delineating the heart on the first frame of each sequence with the proposed DBM-based DRLSE method. Our method yielded good results with this very general initialization. If the initialization were closer to the true heart boundary, the results should be at least as good as the current ones, and the computational cost of delineating the heart contour on the first cine frame would decrease because fewer DRLSE iterations would be needed. In other words, the delineation accuracy of the proposed method is not sensitive to the initialization, but the computational efficiency improves when the initialization is closer in shape and location to the heart.

The experimental results in Section 5.4 demonstrate that the amplitude of heart motion is larger in the free-breathing cines than in the breath-hold cines, indicating that respiratory motion is an important factor affecting the apparent heart shape in non-cardiac-gated acquisitions. In this study, we used sequential ungated single-shot TrueFISP acquisitions, which are commonly used to acquire images for motion-tracking analysis in radiation therapy. Due to the low frame rate of the acquisition, the heart shape in each cine frame appears larger than the static shape because of internal organ motion, including respiratory motion. Furthermore, statistically significant differences were found between the inferior and superior displacements in both breath-hold and free-breathing MR images. Further studies will quantitatively investigate the respiratory effect and provide useful information for motion management during treatment delivery and for radiation-induced cardiotoxicity analysis.

We also plan to improve the proposed method in the following directions. We will study DBM-based multimodal learning (Srivastava and Salakhutdinov, 2012) to train a more comprehensive and robust heart shape model using the multiple types of information contained in the images. The heart in cine MR images is associated with multi-source information, including image intensity and features within the heart region, heart shape, and dynamic changes in heart-region information; each source provides distinct statistical properties of the heart. By fusing this multimodal information from the training data set into a joint representation within the DBM, a more comprehensive generative shape model can be trained and used for more accurate tracking.

In the future, the proposed method could be deployed on a GPU platform to improve the model training speed and to meet the computational speed requirements of specific clinical applications, e.g., real-time beam gating. The current implementation, based on Matlab running on a 6-core, 3.50 GHz personal computer with 16 GB of memory, achieved heart tracking at less than 5 seconds per frame. This computational performance may be satisfactory for off-line motion quantification and assessment but not for real-time tracking. With GPU-based fast computation, the proposed method has the potential to support real-time motion management in radiotherapy.

Also, learning the shape prior from 2D heart shapes may be affected by variations in viewpoint. In this study, we designed the data acquisition strategy to minimize this effect. One of our ongoing research efforts is to apply the DBM to 3D heart motion tracking for radiation therapy; we will compare the performance of the current approach with 3D shape-prior-based heart motion tracking, with the goal of providing more information and guidance for heart-related radiation therapy.

In this study, fourteen cine MRI datasets from seven volunteer cases were used to generate the training set for the heart shape model. With more datasets, the trained shape model could describe shape variations more accurately; we are therefore collecting additional datasets for model training and performance verification.

7. Conclusion

A robust and stable automatic heart motion tracking method should be able to handle the complex spatial relationship between the heart and its neighboring structures, as well as dynamic heart shape changes during involuntary motion. In this study, we demonstrated a deep generative shape model-driven method that can automatically track the motion of the heart, a complex and highly deformable organ, in cine MRI images. The experimental results showed that the trained shape model was able to characterize both the global and local shape properties of the heart and assist in accurate heart tracking. The tracking results can be applied to heart motion pattern analyses and to the evaluation of the potential risk of cardiotoxicity induced by external-beam radiation treatment of the thorax.

Supplementary Material

Download video file (1.5MB, mp4)

Highlights.

  • The DBM requires only a small training data set yet provides strong modeling capability.

  • A three-layer DBM can capture both local and global properties of heart contours.

  • An efficient layer-wise block-Gibbs sampling is used to infer heart shape priors.

  • The DBM-induced heart shape priors are used as constraints of DRLSE evolution.

8. Acknowledgments

This work was supported in part by award NIH EB02016802.


References

  1. Agn M, Law I, af Rosenschöld PM, Van Leemput K, 2016. A generative model for segmentation of tumor and organs-at-risk for radiation therapy planning of glioblastoma patients, in: SPIE Medical Imaging, International Society for Optics and Photonics, p. 97841D.
  2. Aubert G, Kornprobst P, 2006. Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations, volume 147. Springer Science & Business Media.
  3. Babenko A, Slesarev A, Chigorin A, Lempitsky V, 2014. Neural codes for image retrieval, in: European Conference on Computer Vision, Springer, pp. 584–599.
  4. Battani R, Corsi C, Sarti A, Lamberti C, Piva T, Fattori R, 2003. Estimation of right ventricular volume without geometrical assumptions utilizing cardiac magnetic resonance data, in: Computers in Cardiology, IEEE, pp. 81–84.
  5. Bland JM, Altman DG, 2010. Statistical methods for assessing agreement between two methods of clinical measurement. International Journal of Nursing Studies 47, 931–936.
  6. Brieva J, Moya-Albor E, Escalante-Ramírez B, 2015. A level set approach for left ventricle detection in CT images using shape segmentation and optical flow, in: International Symposium on Medical Information Processing and Analysis, International Society for Optics and Photonics, p. 92870K.
  7. Chan TF, Vese LA, 2001. Active contours without edges. IEEE Transactions on Image Processing 10, 266–277.
  8. Chen F, Yu H, Hu R, Zeng X, 2013. Deep learning shape priors for object segmentation, in: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1870–1877.
  9. Chen T, Reyhan M, Yue N, Metaxas DN, Haffty BG, Goyal S, 2015. Tagged MRI based cardiac motion modeling and toxicity evaluation in breast cancer radiotherapy. Frontiers in Oncology 5.
  10. Eslami S, Heess N, Winn J, 2012. The shape Boltzmann machine: a strong model of object shape, in: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8.
  11. Eslami S, Williams C, 2012. A generative model for parts-based object segmentation, in: Advances in Neural Information Processing Systems, pp. 100–107.
  12. Eslami SA, Heess N, Williams CK, Winn J, 2014. The shape Boltzmann machine: a strong model of object shape. International Journal of Computer Vision 107, 155–176.
  13. Feng C, Li C, Zhao D, Davatzikos C, Litt H, 2013. Segmentation of the left ventricle using distance regularized two-layer level set approach, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, pp. 477–484.
  14. Gidaris S, Komodakis N, 2015. Object detection via a multi-region and semantic segmentation-aware CNN model, in: IEEE International Conference on Computer Vision, pp. 1134–1142.
  15. Guo Y, Liu Y, Oerlemans A, Lao S, Wu S, Lew MS, 2016. Deep learning for visual understanding: A review. Neurocomputing 187, 27–48.
  16. Hinton G, 2010. A practical guide to training restricted Boltzmann machines. Momentum 9, 926.
  17. Hinton GE, 2002. Training products of experts by minimizing contrastive divergence. Neural Computation 14, 1771–1800.
  18. Hinton GE, Osindero S, Teh YW, 2006. A fast learning algorithm for deep belief nets. Neural Computation 18, 1527–1554.
  19. Huang C, Petibon Y, Ouyang J, Reese TG, Ahlman MA, Bluemke DA, El Fakhri G, 2015. Accelerated acquisition of tagged MRI for cardiac motion correction in simultaneous PET-MR: Phantom and patient studies. Medical Physics 42, 1087–1097.
  20. Huang S, Liu J, Lee L, Venkatesh S, Teo L, Au C, Nowinski W, 2009. Segmentation of the left ventricle from cine MR images using a comprehensive approach. The MIDAS Journal.
  21. Kass M, Witkin A, Terzopoulos D, 1988. Snakes: Active contour models. International Journal of Computer Vision 1, 321–331.
  22. Kedenburg G, Cocosco CA, Kothe U, Niessen WJ, Vonken EjP, Viergever MA, 2006. Automatic cardiac MRI myocardium segmentation using graphcut, in: SPIE Medical Imaging, International Society for Optics and Photonics, p. 61440A.
  23. Koller D, Friedman N, 2009. Probabilistic Graphical Models: Principles and Techniques. MIT Press.
  24. Larochelle H, Bengio Y, 2008. Classification using discriminative restricted Boltzmann machines, in: International Conference on Machine Learning, ACM, pp. 536–543.
  25. Li C, Kao CY, Gore JC, Ding Z, 2008. Minimization of region-scalable fitting energy for image segmentation. IEEE Transactions on Image Processing 17, 1940–1949.
  26. Li C, Xu C, Gui C, Fox MD, 2010. Distance regularized level set evolution and its application to image segmentation. IEEE Transactions on Image Processing 19, 3243–3254.
  27. Li H, Chen HC, Dolly S, Li H, Fischer-Valuck B, Victoria J, Dempsey J, Ruan S, Anastasio M, Mazur T, et al., 2016. An integrated model-driven method for in-treatment upper airway motion tracking using cine MRI in head and neck radiation therapy. Medical Physics 43, 4700–4710.
  28. Liang X, Liu S, Wei Y, Liu L, Lin L, Yan S, 2015. Towards computational baby learning: A weakly-supervised approach for object detection, in: IEEE International Conference on Computer Vision, pp. 999–1007.
  29. Liu L, Zhang Q, Wu M, Li W, Shang F, 2013. Adaptive segmentation of magnetic resonance images with intensity inhomogeneity using level set method. Magnetic Resonance Imaging 31, 567–574.
  30. Liu Y, Captur G, Moon JC, Guo S, Yang X, Zhang S, Li C, 2016. Distance regularized two level sets for segmentation of left and right ventricles from cine-MRI. Magnetic Resonance Imaging 34, 699–706.
  31. Long J, Shelhamer E, Darrell T, 2015. Fully convolutional networks for semantic segmentation, in: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440.
  32. Lu Y, Radau P, Connelly K, Dick A, Wright G, 2009. Automatic image-driven segmentation of left ventricle in cardiac cine MRI. The MIDAS Journal 49, 2.
  33. Melchior J, Wang N, Wiskott L, 2017. Gaussian-binary restricted Boltzmann machines for modeling natural image statistics. PLoS ONE 12, e0171015.
  34. Moeskops P, Viergever MA, Mendrik AM, de Vries LS, Benders MJ, Išgum I, 2016. Automatic segmentation of MR brain images with a convolutional neural network. IEEE Transactions on Medical Imaging 35, 1252–1261.
  35. Ngo TA, Lu Z, Carneiro G, 2017. Combining deep learning and level set for the automated segmentation of the left ventricle of the heart from cardiac cine magnetic resonance. Medical Image Analysis 35, 159–171.
  36. Osher S, Sethian JA, 1988. Fronts propagating with curvature-dependent speed: algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics 79, 12–49.
  37. Pereira S, Pinto A, Alves V, Silva CA, 2016. Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Transactions on Medical Imaging 35, 1240–1251.
  38. Petitjean C, Dacher JN, 2011. A review of segmentation methods in short axis cardiac MR images. Medical Image Analysis 15, 169–184.
  39. Pham VT, Tran TT, Shyu KK, Lin LY, Wang YH, Lo MT, 2014. Multiphase B-spline level set and incremental shape priors with applications to segmentation and tracking of left ventricle in cardiac MR images. Machine Vision and Applications 25, 1967–1987.
  40. Salakhutdinov R, Hinton G, 2009. Deep Boltzmann machines, in: Artificial Intelligence and Statistics, pp. 448–455.
  41. Srivastava N, Salakhutdinov RR, 2012. Multimodal learning with deep Boltzmann machines, in: Advances in Neural Information Processing Systems, pp. 2222–2230.
  42. Stalidis G, Maglaveras N, Efstratiadis SN, Dimitriadis AS, Pappas C, 2002. Model-based processing scheme for quantitative 4-D cardiac MRI analysis. IEEE Transactions on Information Technology in Biomedicine 6, 59–72.
  43. Su J, Thomas DB, Cheung PY, 2016. Increasing network size and training throughput of FPGA restricted Boltzmann machines using dropout, in: International Symposium on Field-Programmable Custom Computing Machines, IEEE, pp. 48–51.
  44. Sun S, Zhou W, Li H, Tian Q, 2014. Search by detection: Object-level feature for image retrieval, in: Proceedings of the International Conference on Internet Multimedia Computing and Service, ACM, p. 46.
  45. Sun W, Cetin M, Chan R, Reddy V, Holmvang G, Chandar V, Willsky A, 2005. Segmenting and tracking the left ventricle by learning the dynamics in cardiac images, in: International Conference on Information Processing in Medical Imaging, Springer, pp. 553–565.
  46. van Tulder G, de Bruijne M, 2016. Combining generative and discriminative representation learning for lung CT analysis with convolutional restricted Boltzmann machines. IEEE Transactions on Medical Imaging 35, 1262–1272.
  47. Welling M, Teh YW, 2003. Approximate inference in Boltzmann machines. Artificial Intelligence 143, 19–50.
  48. Weygand J, Fuller CD, Ibbott GS, Mohamed AS, Ding Y, Yang J, Hwang KP, Wang J, 2016. Spatial precision in magnetic resonance imaging-guided radiation therapy: The role of geometric distortion. International Journal of Radiation Oncology, Biology, Physics 95, 1304–1316.
  49. Wu J, Daniel N, Lashmett H, Mazur T, Gach M, Ruan S, Anastasio M, Mutic S, Thomas M, Li H, 2017. Deep Boltzmann machine-driven method for in-treatment heart motion tracking using cine MRI, in: ISMRM 25th Annual Meeting and Exhibition, ISMRM, p. 691.
  50. Xu C, Prince JL, 1998. Snakes, shapes, and gradient vector flow. IEEE Transactions on Image Processing 7, 359–369.
  51. Zhang H, Zhang S, Li K, Metaxas DN, 2014. Robust shape prior modeling based on Gaussian-Bernoulli restricted Boltzmann machine, in: International Symposium on Biomedical Imaging, IEEE, pp. 270–273.
  52. Zhang Y, Sohn K, Villegas R, Pan G, Lee H, 2015. Improving object detection with deep convolutional networks via Bayesian optimization and structured prediction, in: IEEE Conference on Computer Vision and Pattern Recognition, pp. 249–258.
