Author manuscript; available in PMC 2014 Jan 13.
Published in final edited form as: Med Image Comput Comput Assist Interv. 2012;15(0 3):58–65. doi: 10.1007/978-3-642-33454-2_8

A Dynamical Appearance Model Based on Multiscale Sparse Representation: Segmentation of the Left Ventricle from 4D Echocardiography

Xiaojie Huang 1, Donald P Dione 4, Colin B Compas 2, Xenophon Papademetris 2,3, Ben A Lin 4, Albert J Sinusas 3,4, James S Duncan 1,2,3
PMCID: PMC3889160  NIHMSID: NIHMS493196  PMID: 23286114

Abstract

The spatio-temporal coherence in data plays an important role in echocardiographic segmentation. While learning offline dynamical priors from databases has received considerable attention, these priors may not be suitable for post-infarct patients and children with congenital heart disease. This paper presents a dynamical appearance model (DAM) driven by individual inherent data coherence. It employs multiscale sparse representation of local appearance, learns online multiscale appearance dictionaries as the image sequence is segmented sequentially, and integrates a spectrum of complementary multiscale appearance information including intensity, multiscale local appearance, and dynamical shape predictions. It overcomes the limitations of database-driven statistical models and applies to a broader range of subjects. Results on 26 4D canine echocardiographic images acquired from both healthy and post-infarct subjects show that our method significantly improves segmentation accuracy and robustness compared to a conventional intensity model and our previous single-scale sparse representation method.

1 Introduction

Segmentation of the left ventricle from 4D echocardiography plays an essential role in quantitative cardiac functional analysis. Due to gross image inhomogeneities, artifacts, and poor contrast between regions of interest, robust and accurate automatic segmentation of the left ventricle, especially the epicardial border, is very challenging in echocardiography. The inherent spatio-temporal coherence of echocardiographic data provides important constraints that can be exploited to guide cardiac border estimation and has motivated a spatio-temporal viewpoint of echocardiographic segmentation.

Following the seminal work of Cootes et al. [1] on statistical shape/appearance modeling, a number of spatio-temporal statistical models (e.g., [2–6]) have been proposed for learning dynamical priors offline from databases. While these models have advantages in different respects, to our knowledge the problem of forming a database that can handle a wide range of normal and abnormal heart images remains open. The assumption that different subjects have similar shape or motion patterns, or that their clinical images have similar appearance, may not hold for routine clinical images, especially for disease cases, due to natural subject-to-subject tissue property variations and operator-to-operator variation in acquisition [7]. For example, for post-infarct patients, the positions, sizes, and shapes of infarcts, and thereby the overall heart motion, can be highly variable across the population. It is very hard to build a reliable database accounting for all these variations, while such individual uniqueness is precisely the desired information in some important applications such as motion-based functional analysis. In addition, the tremendous cost of building reliable databases compromises the attractiveness of database-driven methods.

Exploiting individual data coherence through online learning overcomes these limitations. It is particularly attractive when a database is inapplicable, unavailable, or defective. To this end, a model is indispensable for reliably uncovering the inherent spatio-temporal structure of individual 4D data. Sparse representation is a powerful mathematical framework for studying high-dimensional data. We proposed a 2D single-scale sparse-representation-based segmentation method in [8], which showed the feasibility of analyzing 2D+t echocardiographic images via sparse representation and online dictionary learning. However, this method is difficult to apply directly to 4D data. An important limitation is that it utilizes only a single scale of appearance information and requires careful tuning of scale parameters, which compromises segmentation accuracy and robustness. This paper generalizes our previous work [8] and introduces a new 3D dynamical appearance model (DAM) that leverages a full spectrum of complementary multiscale appearance information including intensity, multiscale local appearance, and shape. It employs multiscale sparse representation of high-dimensional local appearance, encodes appearance patterns with multiscale appearance dictionaries, and dynamically updates the dictionaries as the frames are segmented sequentially. The online multiscale dictionary learning process is supervised in a boosting framework to seek optimal weighting of multiscale information and generate dictionaries that are both generative and discriminative. Sparse coding w.r.t. the predictive dictionaries produces a local appearance discriminant. We also include intensity and a dynamical shape prediction to complete the appearance spectrum that we incorporate into a MAP framework.

2 Methods

2.1 Multiscale Sparse Representation

Let $\Omega$ denote the 3D image domain. We describe the multiscale local appearance at a pixel $u \in \Omega$ in frame $I_t$ with a series of appearance vectors $y_t^k(u) \in \mathbb{R}^n$ at appearance scales $k = 1, \ldots, J$. $y_t^k(u)$ is constructed by orderly concatenating the pixels within a block centered at $u$. Complementary multiscale appearance information is extracted using a fixed block size at different levels of a Gaussian pyramid. Modeled with sparse representation, an appearance vector $y \in \mathbb{R}^n$ can be represented as a sparse linear combination of the atoms of an appearance dictionary $D \in \mathbb{R}^{n \times K}$ which encodes the typical patterns of the corresponding appearance class; that is, $y \approx Dx$. Given $y$, $D$, and a sparsity factor $T_0$, the sparse representation $x$ can be solved by sparse coding:

$$\min_x \|y - Dx\|_2^2 \quad \text{s.t.} \quad \|x\|_0 \le T_0. \qquad (1)$$
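Problem (1) is NP-hard in general and is typically solved greedily. As a rough, self-contained illustration (not the authors' implementation), here is a minimal NumPy sketch of Orthogonal Matching Pursuit, the OMP solver cited as [11]; the dictionary and sparsity factor below are toy stand-ins:

```python
import numpy as np

def omp(y, D, T0):
    """Greedy approximation of Eq. (1):
    min_x ||y - D x||_2^2  s.t.  ||x||_0 <= T0.
    D is n x K; atoms (columns) are assumed roughly unit-norm. T0 >= 1."""
    n, K = D.shape
    residual = y.copy()
    support = []
    x = np.zeros(K)
    for _ in range(T0):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # re-fit the coefficients on the current support by least squares
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x[support] = coeffs
    return x, np.linalg.norm(residual)

# Toy check: with an orthonormal dictionary and T0 equal to the true
# support size, recovery is exact.
D = np.eye(4)
y = np.array([2.0, 0.0, 3.0, 0.0])
x, r = omp(y, D, 2)
```

On learned, coherent dictionaries OMP only approximates the exact $\ell_0$ solution, but it is fast enough to run per pixel and per scale.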

A shape $s_t$ in $I_t$ is represented by a level set function $\Phi_t(u)$. We define $\Phi_t^+(u) = \Phi_t(u) + \psi_1$ and $\Phi_t^-(u) = \Phi_t(u) - \psi_2$. The regions of interest are the two band regions $\Omega_t^1 = \{u \in \Omega : \Phi_t^-(u) < 0, \Phi_t(u) > 0\}$ and $\Omega_t^2 = \{u \in \Omega : \Phi_t^+(u) > 0, \Phi_t(u) < 0\}$. We define $\Omega_t = \{u \in \Omega : \Phi_{t-1}^+(u) + \zeta_1 \ge 0, \Phi_{t-1}^-(u) - \zeta_2 \le 0\}$. The constants are chosen such that $s_t \subset \Omega_t$. Suppose $\{D_t^1, D_t^2\}^k$ are two dictionaries adapted at scale $k$ to the appearance classes $\Omega_t^1$ and $\Omega_t^2$, respectively. In terms of sparse representation, they exclusively span the subspaces of their respective classes. Reconstruction residues are defined as

$$\{R_t^c(u)\}^k = \left\| y_t^k(u) - \{D_t^c \hat{x}_t^c(u)\}^k \right\|_2, \qquad (2)$$

$\forall u \in \Omega_t$, $k \in \{1, \ldots, J\}$, and $c \in \{1, 2\}$, where $\hat{x}_t^c$ is the sparse representation of $y_t^k$ w.r.t. $D_t^c$. It is logical to expect that $\{R_t^1(u)\}^k > \{R_t^2(u)\}^k$ when $u \in \Omega_t^2$, and $\{R_t^1(u)\}^k < \{R_t^2(u)\}^k$ when $u \in \Omega_t^1$. Combining the multiscale information, we introduce a local appearance discriminant

$$R_t(u) = \mathbf{1}_{\Omega_t}(u) \sum_{k=1}^{J} \left( \log \frac{1}{\beta_k} \right) \operatorname{sgn}\!\left( \{R_t^2(u)\}^k - \{R_t^1(u)\}^k \right), \qquad (3)$$

$\forall u \in \Omega$, where the $\beta_k$ are the weighting parameters of the $J$ appearance scales.
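Once the per-scale residue maps and the boosting weights $\beta_k$ are available, Eq. (3) is a weighted vote over scales. A minimal sketch, with illustrative array shapes (the residue maps here are assumed precomputed; they are not derived from real image data):

```python
import numpy as np

def appearance_discriminant(R1, R2, beta, mask):
    """Combine per-scale reconstruction residues into the local
    appearance discriminant of Eq. (3).

    R1, R2 : arrays of shape (J, ...) -- residues w.r.t. the class-1 and
             class-2 dictionaries at each of the J scales
    beta   : length-J weights learned by boosting (0 < beta_k < 1)
    mask   : boolean indicator of the band domain Omega_t
    """
    weights = np.log(1.0 / np.asarray(beta))        # log(1/beta_k) per scale
    votes = np.sign(R2 - R1)                        # +1 votes for class 1
    Rt = np.tensordot(weights, votes, axes=(0, 0))  # weighted sum over scales
    return np.where(mask, Rt, 0.0)                  # zero outside Omega_t

# Toy example: J = 2 scales over 3 pixels; the middle pixel looks like
# class 2 (smaller class-2 residue), the last pixel lies outside Omega_t.
R1 = np.array([[1.0, 5.0, 1.0], [1.0, 5.0, 1.0]])
R2 = np.array([[5.0, 1.0, 5.0], [5.0, 1.0, 5.0]])
Rt = appearance_discriminant(R1, R2, [0.5, 0.25], np.array([True, True, False]))
```

Pixels with positive $R_t$ are favored as class 1, negative as class 2, and the magnitude reflects how consistently the scales agree.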

2.2 Online Multiscale Dictionary Learning

To obtain the discriminant $R_t$, the dictionaries $\{D_t^1, D_t^2\}^k$ and weights $\beta_k$ need to be learned. Leveraging the inherent spatio-temporal coherence of individual data, we introduce an online multiscale appearance dictionary learning process supervised in a boosting framework. We interlace the processes of dictionary learning and segmentation as illustrated in Fig. 1. Like the database-driven dynamical shape models [3–5], we assume a segmented first frame for initialization; this can be obtained by an automatic method with expert correction or by purely manual segmentation. We dynamically update the multiscale appearance dictionaries each time a new frame is segmented. For $t > 2$, $\{D_t^1, D_t^2\}^k$ are well initialized with $\{D_{t-1}^1, D_{t-1}^2\}^k$ and updated with only a few iterations; $\{D_2^1, D_2^2\}^k$ are initialized with training samples. To reduce propagation error, we divide a sequence into two subsequences and perform bidirectional segmentation as in [8]. The proposed dictionary learning algorithm, which follows the structure of AdaBoost [9], is detailed in Algorithm 1. $J$ dictionary pairs $\{D_t^1, D_t^2\}^k$ and weighting parameters $\beta_k$ are learned from two classes of appearance samples $\{Y_{t-1}^1\}^k = \{y_{t-1}^k(u) : u \in \Omega_{t-1}^1\}$ and $\{Y_{t-1}^2\}^k = \{y_{t-1}^k(u) : u \in \Omega_{t-1}^2\}$, $k = 1, \ldots, J$. The K-SVD algorithm [10] is invoked to enforce the reconstructive property of the dictionaries, while the boosting supervision strengthens their discriminative property and optimizes the weighting of multiscale information.

Fig. 1.

Fig. 1

Dynamical dictionary update interlaced with sequential segmentation

Algorithm 1. Multiscale Appearance Dictionary Learning.

Input: appearance samples $\{Y_{t-1}^1\}^k = \{y_{1,i}^k\}_{i=1}^{M_1}$ and $\{Y_{t-1}^2\}^k = \{y_{2,j}^k\}_{j=1}^{M_2}$, initial dictionaries $\{D_{t-1}^1, D_{t-1}^2\}^k$, $k = 1, \ldots, J$; initial weights $w_1^1 = \{w_{1,i}^1\}_{i=1}^{M_1} = \mathbf{1}$, $w_2^1 = \{w_{2,j}^1\}_{j=1}^{M_2} = \mathbf{1}$.
Output: dictionary pairs $\{D_t^1, D_t^2\}^k$, weighting parameters $\beta_k$, $k = 1, \ldots, J$.
For k = 1, …, J:
  • Resampling: Draw sample sets $Y_1^k$ from $\{Y_{t-1}^1\}^k$ and $Y_2^k$ from $\{Y_{t-1}^2\}^k$ according to the distributions $p_1^k = \{p_{1,i}^k\}_{i=1}^{M_1} = w_1^k / \sum_{i=1}^{M_1} w_{1,i}^k$ and $p_2^k = \{p_{2,j}^k\}_{j=1}^{M_2} = w_2^k / \sum_{j=1}^{M_2} w_{2,j}^k$.

  • Dictionary Learning: Apply K-SVD to learn $\{D_t^1, D_t^2\}^k$ from $Y_1^k$ and $Y_2^k$:

$$\min_{D_t^c, X} \left\| Y_c^k - D_t^c X \right\|_2^2 \quad \text{s.t.} \quad \forall i, \ \|x_i\|_0 \le T_0; \quad c \in \{1, 2\}. \qquad (6)$$
  • Sparse Coding: $\forall y \in \{Y_{t-1}^1, Y_{t-1}^2\}^k$, solve the sparse representations w.r.t. $\{D_t^1\}^k$ and $\{D_t^2\}^k$ using OMP [11], and compute the residues $R(y, D_t^1)^k$ and $R(y, D_t^2)^k$.

  • Classification: Make a hypothesis $h_k : y \in \{Y_{t-1}^1, Y_{t-1}^2\}^k \to \{0, 1\}$: $h_k(y) = \mathrm{Heaviside}\!\left( R(y, D_t^2)^k - R(y, D_t^1)^k \right)$. Calculate the error of $h_k$: $\varepsilon_k = \sum_{i=1}^{M_1} p_{1,i}^k \left| h_k(y_{1,i}^k) - 1 \right| + \sum_{j=1}^{M_2} p_{2,j}^k\, h_k(y_{2,j}^k)$. Set $\beta_k = \varepsilon_k / (1 - \varepsilon_k)$.

  • Weight Update: $w_{1,i}^{k+1} = w_{1,i}^k \beta_k^{1 - \left| h_k(y_{1,i}^k) - 1 \right|}$, $w_{2,j}^{k+1} = w_{2,j}^k \beta_k^{1 - h_k(y_{2,j}^k)}$.
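The loop of Algorithm 1 can be sketched as follows. This is a simplified stand-in, not the authors' implementation: `learn_dictionary` replaces K-SVD [10] with a plain truncated SVD, `residue` replaces OMP sparse coding [11] with a full least-squares projection onto the dictionary's span, and the error $\varepsilon_k$ is clipped away from 0 and 1 for numerical safety; only the resampling, hypothesis, $\beta_k$, and weight-update steps follow the algorithm as stated:

```python
import numpy as np

rng = np.random.default_rng(0)

def learn_dictionary(Y, K):
    # Stand-in for K-SVD [10]: top-K left singular vectors of the
    # sample matrix (columns = samples) serve as dictionary atoms.
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return U[:, :K]

def residue(y, D):
    # Stand-in for OMP sparse coding [11]: residual of the least-squares
    # projection of y onto the span of D's atoms.
    coeffs, *_ = np.linalg.lstsq(D, y, rcond=None)
    return np.linalg.norm(y - D @ coeffs)

def boost_dictionaries(Y1, Y2, J, K, m):
    """Algorithm 1 sketch. Y1[k], Y2[k]: (n, M) sample matrices of the two
    appearance classes at scale k; m samples are resampled per scale."""
    w1 = np.ones(Y1[0].shape[1])
    w2 = np.ones(Y2[0].shape[1])
    dicts, betas = [], []
    for k in range(J):
        p1, p2 = w1 / w1.sum(), w2 / w2.sum()
        # Resampling according to the current weight distributions
        i1 = rng.choice(len(w1), size=m, p=p1)
        i2 = rng.choice(len(w2), size=m, p=p2)
        D1 = learn_dictionary(Y1[k][:, i1], K)
        D2 = learn_dictionary(Y2[k][:, i2], K)
        # Hypothesis h_k(y) = Heaviside(R(y,D2) - R(y,D1)); 1 -> class 1
        h1 = np.array([residue(y, D2) > residue(y, D1) for y in Y1[k].T], float)
        h2 = np.array([residue(y, D2) > residue(y, D1) for y in Y2[k].T], float)
        # Weighted error, clipped away from {0, 1} (assumption, for stability)
        eps = float(np.clip((p1 * (1 - h1)).sum() + (p2 * h2).sum(),
                            1e-6, 1 - 1e-6))
        beta = eps / (1 - eps)
        # Downweight correctly classified samples, AdaBoost-style:
        # exponent 1 - |h-1| equals h for class 1, and 1 - h for class 2
        w1 = w1 * beta ** h1
        w2 = w2 * beta ** (1 - h2)
        dicts.append((D1, D2))
        betas.append(beta)
    return dicts, betas

# Toy run: class 1 lives near span{e0, e1}, class 2 near span{e4, e5}.
rng2 = np.random.default_rng(1)
n, M = 8, 30
Y1c = np.zeros((n, M)); Y1c[0:2] = rng2.normal(size=(2, M))
Y2c = np.zeros((n, M)); Y2c[4:6] = rng2.normal(size=(2, M))
Y1c += 0.01 * rng2.normal(size=(n, M))
Y2c += 0.01 * rng2.normal(size=(n, M))
dicts, betas = boost_dictionaries([Y1c, Y1c], [Y2c, Y2c], J=2, K=2, m=20)
```

On well-separated classes the weighted error is small, so each $\beta_k \in (0, 1)$ and the discriminant weights $\log(1/\beta_k)$ in Eq. (3) are large and positive.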

2.3 MAP Estimation

We estimate the shape $\Phi_t$ in frame $I_t$ given the knowledge of $\hat{\Phi}_{1:t-1}$ and $I_{1:t}$. Unlike the single-scale method in [8], we integrate a spectrum of complementary multiscale appearance information: intensity, the multiscale local appearance discriminant, and a dynamical shape prediction $\tilde{\Phi}_t$. Since $\Phi_{t-1}$ and $\Phi_{t-2}$ are both spatially and temporally close, we assume a constant evolution speed during $[t-2, t]$. Within the band domain $\Omega_t^1 \cup \Omega_t^2$ we introduce an approximate shape prediction $\tilde{\Phi}_t = \hat{\Phi}_{t-1} + G(\hat{\Phi}_{t-1} - \hat{\Phi}_{t-2})$ to regularize the shape estimation, where $G(\cdot)$ denotes a Gaussian smoothing operation used to preserve the smoothness of the level set function. The segmentation is estimated by maximizing the posterior probability:

$$\hat{\Phi}_t = \arg\max_{\Phi_t}\, p(\hat{\Phi}_{1:t-1}, I_{1:t-1}, I_t \mid \Phi_t)\, p(\Phi_t) \approx \arg\max_{\Phi_t}\, p(\tilde{\Phi}_t, R_t, I_t \mid \Phi_t)\, p(\Phi_t) \approx \arg\max_{\Phi_t}\, p(\tilde{\Phi}_t \mid \Phi_t)\, p(R_t \mid \Phi_t)\, p(I_t \mid \Phi_t)\, p(\Phi_t). \qquad (4)$$

The shape regularization is given by $p(\tilde{\Phi}_t \mid \Phi_t)\, p(\Phi_t) \propto \exp\{-\gamma \int_{\Omega_t^1 \cup \Omega_t^2} (\Phi_t - \tilde{\Phi}_t)^2\, du\} \exp\{-\mu \int_{\Omega} \delta(\Phi_t) |\nabla \Phi_t|\, du\}$. We assume an i.i.d. normal distribution for $R_t$: $p(R_t \mid \Phi_t) \propto \prod_{u \in \Omega_t^1} \exp\{-\frac{[R_t(u) - c_1]^2}{2\omega_1^2}\} \prod_{u \in \Omega_t^2} \exp\{-\frac{[R_t(u) - c_2]^2}{2\omega_2^2}\}$, and an i.i.d. Rayleigh density for $I_t$: $p(I_t \mid \Phi_t) = \prod_{u \in \Omega_t^1} \frac{I_t(u)}{\sigma_1^2} \exp\{-\frac{I_t(u)^2}{2\sigma_1^2}\} \prod_{u \in \Omega_t^2} \frac{I_t(u)}{\sigma_2^2} \exp\{-\frac{I_t(u)^2}{2\sigma_2^2}\}$. Since intensity is not helpful for epicardial discrimination, $p(I_t \mid \Phi_t)$ is dropped in the epicardial case. The overall segmentation energy functional is given by:

$$E(\Theta, \Phi_t) = \int_{\Omega_t^1} \frac{I_t^2}{2\sigma_1^2} + \log\frac{\sigma_1^2}{I_t}\, du + \int_{\Omega_t^2} \frac{I_t^2}{2\sigma_2^2} + \log\frac{\sigma_2^2}{I_t}\, du + \int_{\Omega_t^1} \frac{(R_t - c_1)^2}{2\omega_1^2}\, du + \int_{\Omega_t^2} \frac{(R_t - c_2)^2}{2\omega_2^2}\, du + \gamma \int_{\Omega_t^1 \cup \Omega_t^2} (\Phi_t - \tilde{\Phi}_t)^2\, du + \mu \int_{\Omega} \delta(\Phi_t) |\nabla \Phi_t|\, du, \qquad (5)$$

where $\Theta = [c_1, c_2, \omega_1, \omega_2, \sigma_1, \sigma_2]$. We minimize the energy functional as follows: (a) initialize $\Phi_t^0$ with $\Phi_{t-1}$ and set $\tau = 0$; (b) compute the maximum likelihood estimate of $\Theta(\Phi_t^\tau)$; (c) update $\Phi_t^{\tau+1}$ by gradient descent; (d) reinitialize $\Phi_t^{\tau+1}$ after every few iterations; (e) stop if $\|\Phi_t^{\tau+1} - \Phi_t^\tau\|_2 < \xi$; otherwise set $\tau = \tau + 1$ and go to (b).
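Step (b) has a closed form: minimizing the data terms of Eq. (5) in $\Theta$ gives the Rayleigh ML estimate $\sigma_c^2 = \frac{1}{2}\,\mathrm{mean}(I_t^2)$ over each band, and the normal parameters $c_c, \omega_c$ as the band mean and standard deviation of $R_t$. A sketch, where the boolean band masks are assumed to come from the current level set $\Phi_t^\tau$ (the synthetic data below is purely illustrative):

```python
import numpy as np

def estimate_theta(I, Rt, band1, band2):
    """Closed-form ML estimates of Theta = [c1, c2, w1, w2, s1, s2] for
    step (b), given boolean masks for the bands Omega_t^1 and Omega_t^2.
    Rayleigh MLE: sigma^2 = mean(I^2) / 2; normal MLE: mean/std of Rt."""
    c1, c2 = Rt[band1].mean(), Rt[band2].mean()
    w1, w2 = Rt[band1].std(), Rt[band2].std()
    s1 = np.sqrt((I[band1] ** 2).mean() / 2.0)
    s2 = np.sqrt((I[band2] ** 2).mean() / 2.0)
    return c1, c2, w1, w2, s1, s2

# Synthetic check: Rayleigh intensities and normal discriminant values
# drawn with known parameters should be recovered.
rng = np.random.default_rng(2)
N = 200_000
I = np.concatenate([rng.rayleigh(scale=2.0, size=N),
                    rng.rayleigh(scale=0.5, size=N)])
Rt = np.concatenate([rng.normal(1.5, 0.3, size=N),
                     rng.normal(-1.5, 0.3, size=N)])
band1 = np.arange(2 * N) < N
band2 = ~band1
c1, c2, w1, w2, s1, s2 = estimate_theta(I, Rt, band1, band2)
```

Because each estimate is a closed-form statistic over the current bands, step (b) is cheap relative to the gradient-descent update in step (c).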

3 Experiments and Results

We acquired 26 3D canine echocardiographic sequences from both healthy and post-infarct subjects using a Philips iE33 ultrasound imaging system at a frame rate of ~40 Hz. Each sequence spanned a cardiac cycle. The sequential segmentation was initialized with a manual tracing of the first frame. Both endocardial and epicardial borders were segmented throughout the sequences. Fig. 2 shows typical segmentation examples produced by our method. 100 frames were randomly drawn from the ~700 available frames for manual segmentation and quality assessment. We evaluated the automatic results against expert manual tracings using the following segmentation quality metrics: Hausdorff Distance (HD), Mean Absolute Distance (MAD), Dice coefficient (DICE), and Percentage of True Positives (PTP).

Fig. 2.

Fig. 2

Typical segmentations by our method (red, purple) and manual tracings (green)

Benefit from the Dynamical Appearance Model

When the dynamical appearance components are turned off, our model reduces to a conventional ultrasound intensity model, the Rayleigh model [12]. Comparison with the Rayleigh method therefore shows the added value of the proposed DAM. Since the Rayleigh method is generally sensitive to initial contours, we initialized its segmentation of each frame with the first-frame manual tracing. Fig. 3(a) compares typical segmentation examples from the Rayleigh method and our method. The Rayleigh method was easily trapped by misleading intensity information (e.g., image inhomogeneities and artifacts), while our approach produced accurate segmentations. Fig. 2 qualitatively shows the capability of the DAM to reliably estimate 3D left ventricular borders throughout the whole cardiac cycle. The Rayleigh method did not generate acceptable segmentation sequences in this experiment. Table 1 demonstrates that the DAM significantly outperformed the Rayleigh model: better means (higher DICE and PTP, lower MAD and HD) and lower standard deviations show the marked improvement in segmentation accuracy and robustness achieved by the DAM.

Fig. 3.

Fig. 3

Segmentation examples. (a) Manual (Green), DAM (Red), Rayleigh (Blue). (b) Manual (Green), DAM (Red), SSR (Blue).

Table 1.

Sample means and standard deviations of segmentation quality measures

Values are expressed as mean ± std.

                        DICE (%)      PTP (%)       MAD (mm)      HD (mm)
Endocardial
  Rayleigh [12]         74.9 ± 18.8   83.1 ± 16.3   2.01 ± 1.22   9.17 ± 3.37
  DAM                   93.6 ± 2.49   94.9 ± 2.34   0.57 ± 0.14   2.95 ± 0.62
  SSDM [5]              ——            95.9 ± 1.24   1.41 ± 0.40   2.53 ± 0.75
Epicardial
  Rayleigh [12]         74.1 ± 17.4   82.5 ± 12.0   2.80 ± 1.55   16.9 ± 9.30
  DAM                   97.1 ± 0.93   97.6 ± 0.86   0.60 ± 0.19   3.03 ± 0.76
  SSDM [5]              ——            94.5 ± 1.74   1.74 ± 0.39   2.79 ± 0.97

Advantages over Single-scale Sparse Representation

We compared our DAM to the single-scale sparse representation model (SSR) in [8]. The SSR was extended to 3D and run at 5 appearance scales ranging from 3.5 × 3.5 × 3.5 mm³ to 15.5 × 15.5 × 15.5 mm³, while the DAM utilized multiscale appearance information. Fig. 3(b) presents end-systolic segmentation examples showing that the DAM resulted in lower propagation error and higher segmentation accuracy than the SSR. Fig. 4 presents the quantitative results of the comparison study. The performance of the SSR varied with the scale, which indicates its sensitivity to the appearance scale; the SSR required scale tuning to obtain better results. The DAM achieved the best results in almost all the metrics for both endocardial and epicardial segmentations, demonstrating its advantages over the SSR. By summarizing complementary multiscale appearance information, the DAM consistently produced accurate segmentations without careful parameter tuning.

Fig. 4.

Fig. 4

Means and 95% confidence intervals obtained by the SSR (blue, scales 1, …, 5) and the DAM (yellow, 6) in endocardial (top row) and epicardial (bottom row) cases.

Comparison with Database-Driven Dynamical Models

Table 1 compares the HD, MAD, and PTP achieved by our model with those achieved by a state-of-the-art database-driven dynamical shape model (SSDM) as reported in [5]. The database-free DAM achieved results comparable to the SSDM and outperformed it in segmenting epicardial borders. It is worth noting that the DAM does not require more human interaction at the segmentation stage than database-driven dynamical models such as [3–5], which also need manual tracings of the first or first few frames for initialization.

4 Discussion and Conclusion

We have proposed a 3D dynamical appearance model that exploits the inherent spatio-temporal coherence of individual echocardiographic data. It employs multiscale sparse representation, online multiscale appearance dictionary learning, and a spectrum of complementary multiscale appearance information including intensity, multiscale local appearance, and shape. Our method resulted in significantly improved accuracy and robustness of left ventricular segmentation compared to a standard intensity method and our previous single-scale sparse representation method. The DAM achieved results comparable to a state-of-the-art database-driven statistical dynamical model (SSDM). Since the DAM is database-free, it overcomes the limitations introduced by the use of databases. The DAM can be applied to cases (e.g., the post-infarct subjects in this study) where it is inappropriate to apply database-based a priori motion or shape knowledge. Even when such priors are effective, the DAM can be a good choice for complementing the database and relaxing the reliance of statistical models (e.g., [2–6]) on database quality. Future work includes extensions to human data, other modalities, and an integrated online and offline learning framework to exploit their complementarity.

Footnotes

This work was supported by NIH RO1HL082640.

References

  • 1.Cootes TF, Edwards GJ, Taylor CJ. Active appearance models. IEEE TPAMI. 2001;23(6):681–685. [Google Scholar]
  • 2.Bosch JG, Mitchell SC, Lelieveldt BPF, Nijland F, Kamp O, Sonka M, Reiber JHC. Automatic segmentation of echocardiographic sequences by active appearance motion models. IEEE TMI. 2002;21(11):1374–1383. doi: 10.1109/TMI.2002.806427. [DOI] [PubMed] [Google Scholar]
  • 3.Jacob G, Noble JA, Behrenbruch CP, Kelion AD, Banning AP. A shape-space based approach to tracking myocardial borders and quantifying regional left ventricular function applied in echocardiography. IEEE TMI. 2002;21(3):226–238. doi: 10.1109/42.996341. [DOI] [PubMed] [Google Scholar]
  • 4.Sun W, Çetin M, Chan R, Reddy V, Holmvang G, Chandar V, Willsky AS. Segmenting and Tracking the Left Ventricle by Learning the Dynamics in Cardiac Images. In: Christensen GE, Sonka M, editors. IPMI 2005. LNCS. Vol. 3565. Springer; Heidelberg: 2005. pp. 553–565. [DOI] [PubMed] [Google Scholar]
  • 5.Zhu Y, Papademetris X, Sinusas AJ, Duncan JS. A Dynamical Shape Prior for LV Segmentation from RT3D Echocardiography. In: Yang G-Z, Hawkes D, Rueckert D, Noble A, Taylor C, editors. MICCAI 2009, Part I. LNCS. Vol. 5761. Springer; Heidelberg: 2009. pp. 206–213. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Yang L, Georgescu B, Zheng Y, Meer P, Comaniciu D. 3D ultrasound tracking of the left ventricle using one-step forward prediction and data fusion of collaborative trackers. CVPR. 2008 [Google Scholar]
  • 7.Noble JA, Boukerroui D. Ultrasound image segmentation: a survey. IEEE TMI. 2006;25(8):987–1010. doi: 10.1109/tmi.2006.877092. [DOI] [PubMed] [Google Scholar]
  • 8.Huang X, Lin BA, Compas CB, Sinusas AJ, Staib LH, Duncan JS. Segmentation of left ventricles from echocardiographic sequences via sparse appearance representation. MMBIA. 2012 Jan;:305–312. [Google Scholar]
  • 9.Freund Y, Schapire R. A Decision-Theoretic Generalization of On-line Learning and an Application to Boosting. In: Vitányi PMB, editor. EuroCOLT 1995. LNCS. Vol. 904. Springer; Heidelberg: 1995. pp. 23–37. [Google Scholar]
  • 10.Aharon M, Elad M, Bruckstein A. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE TSP. 2006;54(11):4311–4322. [Google Scholar]
  • 11.Tropp J. Greed is good: algorithmic results for sparse approximation. IEEE TIT. 2004;50(10):2231–2242. [Google Scholar]
  • 12.Sarti A, Corsi C, Mazzini E, Lamberti C. Maximum likelihood segmentation of ultrasound images with Rayleigh distribution. IEEE TUFFC. 2005;52(6):947–960. doi: 10.1109/tuffc.2005.1504017. [DOI] [PubMed] [Google Scholar]
