Author manuscript; available in PMC: 2026 Mar 10.
Published in final edited form as: Med Image Comput Comput Assist Interv. 2013;16(Pt 3):57–65. doi: 10.1007/978-3-642-40760-4_8

Segmentation of 4D Echocardiography Using Stochastic Online Dictionary Learning

Xiaojie Huang 1, Donald P Dione 4, Ben A Lin 4, Alda Bregasi 4, Albert J Sinusas 3,4, James S Duncan 1,2,3
PMCID: PMC12969474  NIHMSID: NIHMS2147572  PMID: 24505744

Abstract

Dictionary learning has been shown to be effective in exploiting spatiotemporal coherence for echocardiographic segmentation. To overcome the limitations of previous methods, we present a stochastic online dictionary learning approach for segmenting left ventricular borders from 4D echocardiography. It is based on stochastic approximations and processes a mini-batch of samples at a time, which results in lower memory consumption and lower computational cost than classical batch algorithms. In contrast to the previous methods, where dictionaries and their weights are optimized only on the most recently segmented frame, our stochastic online learning procedure optimizes the dictionaries and the corresponding weights by aggregating all the past information while adapting them to the dynamically changing data. The rate of updating the past information is controlled and varied according to the appearance scale to seek a balance between old and new information. Results on 26 4D echocardiographic images show the proposed method is more accurate, more robust, and faster than the previous batch algorithm.

1. Introduction

Segmentation of 4D echocardiography plays an important role in quantitative analysis, which provides important cardiac functional parameters such as ejection fraction and strain. Due to gross intensity inhomogeneities, characteristic artifacts, and poor contrast, automatic segmentation of the left ventricle is particularly challenging in echocardiography. The inherent spatiotemporal coherence of echocardiographic data provides useful constraints. The key observation is that the spatiotemporal consistencies in image appearance (e.g., speckle pattern) and shape over the sequence can be exploited to guide cardiac border estimation. Statistical models have received considerable attention. Following the seminal work of Cootes et al. on statistical shape modeling [1], a number of statistical models [2–5] have been proposed for learning spatiotemporal priors offline from a database. The main limitation of these methods is that the high-level spatiotemporal patterns in routine clinical images, especially for disease cases, may deviate from the priors learned from a database.

Exploiting individual data coherence through online learning overcomes this limitation. It is particularly attractive when a database is inapplicable or unavailable. Sparse representation and dictionary learning have recently been successfully applied to modeling local image appearance and segmenting left ventricular borders in 4D echocardiography [6, 7]. Dictionary learning on the fly exploits the spatiotemporal coherence inherent to individual data and achieves promising segmentation results [6]. However, these methods use classical second-order batch procedures for dictionary learning. The batch algorithm assumes a fixed-size dataset and accesses the whole training set at each iteration. It is memory-consuming and computationally expensive, and it can be impractical when the training set is large. Every time new data is added to the training set, the dictionary needs to be retrained on the new complete training set in order to incorporate the new information, which makes the batch algorithm inefficient for dynamically changing data and online learning. In [6, 7], the appearance dictionaries are trained only on the last segmented frame rather than all the previous frames. This accelerates error accumulation and compromises the segmentation accuracy and reliability, especially for endocardial borders.

To overcome these limitations, we present a stochastic online dictionary learning approach for segmenting left ventricular borders from 4D echocardiography. It utilizes a stochastic optimization technique and processes a mini-batch of samples at a time, which results in lower memory consumption and lower computational cost than classical second-order batch algorithms. In contrast to the previous methods, our stochastic online learning procedure optimizes the dictionaries and the corresponding weights by aggregating the information of all the past frames while adapting the dictionaries to the latest segmented frame. The past information is carried forward by sufficient statistics. We weight the past information to control the rate at which the past information is updated by the new information. This updating rate varies with appearance scale to maintain a balance between old and new information.

2. Methods

2.1. Segmentation Framework

We employ a frame-by-frame sequential segmentation procedure interlaced with dictionary learning on the fly introduced in [6, 7]. Multiscale appearance dictionaries are dynamically updated each time a new frame is segmented. In a maximum a posteriori (MAP) framework, we estimate the shape $S_t$ in frame $I_t$ given the knowledge of $\hat{S}_{1:t-1}$ and $I_{1:t}$:

$$\hat{S}_t = \arg\max_{S_t} p\left(S_t \mid \hat{S}_{1:t-1}, I_{1:t}\right). \tag{1}$$

It is approximated by a decomposition of information into intensity $I_t$, local appearance discriminant $R_t$, and shape prediction $S_t^*$:

$$\hat{S}_t \approx \arg\max_{S_t} p\left(S_t^* \mid S_t\right) p\left(R_t \mid S_t\right) p\left(I_t \mid S_t\right) p\left(S_t\right). \tag{2}$$

The discriminant $R_t$ summarizing multiscale local appearance dominates the estimation. It is predicted by multiscale appearance dictionaries $D_t$ that are derived from $\hat{S}_{1:t-1}$ and $I_{1:t-1}$ through sparse representation and dictionary learning. In [6, 7], the dictionaries $D_t$ are trained only on $\hat{S}_{t-1}$ and $I_{t-1}$. The knowledge of the previous information is not fully utilized. This paper focuses on computing $D_t$ more efficiently and reliably and achieving more accurate and reliable discriminant $R_t$. Further details of solving (2) can be found in [6].

2.2. Multiscale Sparse Representation

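Let $\Omega$ denote the 3D image domain. We describe a pixel $u \in \Omega$ in frame $I_t$ with a series of appearance vectors $y_t^k(u) \in \mathbb{R}^n$ at different appearance scales $k = 1, \ldots, J$. Each $y_t^k(u)$ is constructed by concatenating, in order, the pixels in a local block centered at $u$ and is normalized to unit length. Complementary multiscale appearance information is extracted at different levels of a Gaussian pyramid. A shape $S_t$ in $I_t$ is represented by a level set function $\Phi_t(u)$. The regions of interest are two band regions $\Omega_t^1 = \{u \in \Omega : 0 \le \Phi_t(u) < \psi_2\}$ and $\Omega_t^2 = \{u \in \Omega : 0 > \Phi_t(u) > -\psi_1\}$, which form two appearance classes. Let $\{D_t^1, D_t^2\}_k$ denote two dictionaries adapted to appearance classes $\Omega_t^1$ and $\Omega_t^2$ respectively at scale $k$. Under a sparse linear model, an appearance vector $y \in \mathbb{R}^n$ can be decomposed as a sparse linear combination of the atoms from a dictionary $D \in \mathbb{R}^{n \times K}$ which encodes the typical patterns of a corresponding appearance class. That is, $y \approx Dx$, and $\|x\|_0$ is small. How well $y_t^k(u)$ is sparsely represented by the appearance dictionary $\{D_t^c\}_k$ is measured by the reconstruction residue:

$$\{R_t^c(u)\}_k = \left\| y_t^k(u) - \{D_t^c\, \hat{x}_t^c(u)\}_k \right\|_2 \tag{3}$$

$\forall k \in \{1, \ldots, J\}$ and $\forall c \in \{1, 2\}$, where

$$\{\hat{x}_t^c(u)\}_k = \arg\min_x \left\| y_t^k(u) - \{D_t^c\}_k\, x \right\|_2^2 \quad \text{s.t.} \quad \|x\|_0 \le T, \tag{4}$$

where $T$ is a sparsity factor. The residue indicates the likelihood that $u$ is in class $c$: the smaller the residue, the better the class dictionary explains the local appearance. Combining the multiscale information, we define the discriminant as

$$R_t(u) = \frac{\sum_{k=1}^{J} \log\left(1/\beta_t^k\right) \operatorname{sgn}\left(\{R_t^2(u)\}_k - \{R_t^1(u)\}_k\right)}{\sum_{j=1}^{J} \log\left(1/\beta_t^j\right)}, \tag{5}$$

$\forall u \in \Omega$, where the $\beta_t^k$'s are the weighting parameters of the $J$ appearance scales.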
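
The residue and discriminant computations in (3)–(5) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the greedy orthogonal matching pursuit below is one common way to approximate the $\ell_0$-constrained problem (4), not necessarily the solver used in this work, and the names `omp`, `residue`, and `discriminant` are hypothetical.

```python
import numpy as np

def omp(D, y, T):
    """Greedy orthogonal matching pursuit (assumed solver for Eq. 4):
    approximates argmin_x ||y - D x||_2^2 subject to ||x||_0 <= T."""
    x = np.zeros(D.shape[1])
    support, coef = [], None
    residual = y.copy()
    for _ in range(T):
        if np.linalg.norm(residual) < 1e-12:
            break
        j = int(np.argmax(np.abs(D.T @ residual)))  # best-correlated atom
        if j in support:
            break
        support.append(j)
        # Least-squares refit on the selected atoms
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    if support:
        x[support] = coef
    return x

def residue(D, y, T):
    """Reconstruction residue of Eq. (3) for one dictionary."""
    return float(np.linalg.norm(y - D @ omp(D, y, T)))

def discriminant(y_scales, dict_pairs, betas, T=2):
    """Eq. (5): log(1/beta)-weighted vote of sgn(R2 - R1) over the scales.
    y_scales[k] is the appearance vector at scale k; dict_pairs[k] = (D1, D2)."""
    w = np.log(1.0 / np.asarray(betas, dtype=float))
    votes = np.array([np.sign(residue(D2, y, T) - residue(D1, y, T))
                      for y, (D1, D2) in zip(y_scales, dict_pairs)])
    return float(np.sum(w * votes) / np.sum(w))
```

With this sign convention, a positive value means the class-1 dictionaries represent the pixel better at the more heavily weighted scales, and a negative value favors class 2.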

2.3. Stochastic Online Dictionary Learning

Learning a dictionary $D \in \mathbb{R}^{n \times K}$ from a finite training set $Y = [y_1, \ldots, y_M] \in \mathbb{R}^{n \times M}$ is to solve a joint optimization problem with respect to the dictionary $D$ and the sparse representation coefficients $X = [x_1, \ldots, x_M] \in \mathbb{R}^{K \times M}$:

$$\min_{D, X} \frac{1}{2} \|Y - DX\|_2^2 + \lambda \sum_{i=1}^{M} \|x_i\|_q, \tag{6}$$

where $\|x\|_q$ is a sparsity-inducing regularization that can be the $\ell_0$ pseudo-norm or the $\ell_1$ norm. Classic algorithms for dictionary learning are second-order iterative batch algorithms such as the K-SVD [8] algorithm that is used in [6, 7]. The batch algorithm accesses the whole training set at each iteration and is memory consuming and computationally expensive. It may become impractical in the case of large training sets. This problem is aggravated when the data is dynamically changing over time like echocardiography, since the dictionary needs to be retrained on the new complete dataset each time new data is available. In [6, 7], the appearance dictionaries are updated each time a new frame is segmented, but they are only optimized on the newly segmented frame rather than all the previous frames. This accelerates accumulation of errors, especially at endocardial borders where there are often large deformations.

The stochastic online learning technique proposed in [9] can be used to overcome these limitations. It has recently been applied to shape modeling [10]. It processes one element of the training set at a time, which particularly suits applications with large training sets or image sequence analysis. It alternates classic sparse coding steps with dictionary update steps where the new dictionary $D_m$ at the $m$th iteration minimizes a surrogate for the empirical cost (6):

$$D_m = \arg\min_D \frac{1}{m} \sum_{i=1}^{m} \left( \frac{1}{2} \|y_i - D x_i\|_2^2 + \lambda \|x_i\|_1 \right), \tag{7}$$

where sufficient statistics $x_i$ computed during the previous steps aggregate the past information. The past information is carried forward in matrices:

$$A_m = A_{m-1} + x_m x_m^T \quad \text{and} \quad B_m = B_{m-1} + y_m x_m^T, \tag{8}$$

which enables optimizing dictionaries on the past information without accessing the past data again. Then the dictionary update step (7) is reduced to solving (9) with initialization $D_{m-1}$. This procedure leads to faster performance and better dictionaries than classical batch algorithms [9]. It converges almost surely to a stationary point of the cost function and scales up gracefully to large datasets [9]. For dynamic data, the dictionary is dynamically updated by the new data while optimized on the whole dataset. Here we use a variant of [9] as summarized in Algorithm 1. We use a mini-batch extension that accesses a mini-batch of $\eta$ samples per iteration to accelerate convergence. We assign weights $\varrho$ to the past training data to control the rate of updating out-of-date information.
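
The reduction from (7) to (9) follows from expanding the quadratic term with the trace identity. The short derivation below is a sketch of that standard step in the unweighted case, where $A_m$ and $B_m$ are the plain sums of (8); terms that do not depend on $D$ are collected in a constant.

```latex
\begin{align*}
\tfrac{1}{2}\|y_i - D x_i\|_2^2
  &= \tfrac{1}{2}\, x_i^T D^T D x_i - x_i^T D^T y_i + \tfrac{1}{2}\|y_i\|_2^2 \\
  &= \tfrac{1}{2}\operatorname{Tr}\!\left(D^T D\, x_i x_i^T\right)
     - \operatorname{Tr}\!\left(D^T y_i x_i^T\right) + \tfrac{1}{2}\|y_i\|_2^2 , \\[4pt]
\frac{1}{m}\sum_{i=1}^{m}\left(\tfrac{1}{2}\|y_i - D x_i\|_2^2 + \lambda\|x_i\|_1\right)
  &= \frac{1}{m}\left(\tfrac{1}{2}\operatorname{Tr}\!\left(D^T D A_m\right)
     - \operatorname{Tr}\!\left(D^T B_m\right)\right) + \text{const},
\end{align*}
\qquad A_m = \sum_{i=1}^{m} x_i x_i^T, \quad B_m = \sum_{i=1}^{m} y_i x_i^T .
```

Because the cost depends on the data only through $A_m$ and $B_m$, the update needs $K \times K$ and $n \times K$ storage regardless of how many samples have been seen.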

We introduce a stochastic online learning process supervised in a boosting framework [11] as detailed in Algorithm 2. Algorithm 1 is invoked to enforce the reconstructive property of the dictionaries. The boosting supervision strengthens the discriminative property and optimizes the weighting of multiscale information. At each time point $t$, the series of multiscale appearance dictionary pairs $\{D_t^1, D_t^2\}_k$, matrices $A_t^k$ and $B_t^k$, and the corresponding weighting parameters $\beta_t^k$, $k = 1, \ldots, J$, are updated by the latest segmented frame $t-1$: training samples of appearance vectors belonging to two classes $\{Y_{t-1}^1\}_k = \{y_{t-1}^k(u) : u \in \Omega_{t-1}^1\}$ and $\{Y_{t-1}^2\}_k = \{y_{t-1}^k(u) : u \in \Omega_{t-1}^2\}$. In contrast to [6, 7], where $\{D_t^1, D_t^2\}_k$ and $\beta_t^k$ depend only on frame $t-1$, we optimize $\{D_t^1, D_t^2\}_k$ and $\beta_t^k$ by aggregating the information of all the preceding frames (stored in $A_{t-1}^k$, $B_{t-1}^k$, and $\beta_{t-1}^k$). If an error occurs in one frame, it can be compensated by the information of the previous frames, so the propagation of errors is alleviated. The rate of updating the past information varies with appearance scale. Letting $l_k$ be the axial width in millimeters of the local image at scale $k$, we set $\varrho_k = a\, l_k^{-2}$, where $a \in \mathbb{R}^+$. Higher $\varrho$'s are assigned to finer appearance scales to incorporate more past information. Lower $\varrho$'s are assigned to coarser appearance scales to put more emphasis on the latest information, since the coarse appearance scale is more sensitive to cardiac deformation. The stochastic online learning procedure can be initialized either by offline learning from a suitable database or by a manual tracing.

Algorithm 1.

Stochastic Online Dictionary Learning

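Require: training set $y \in \mathbb{R}^n \sim p(y)$, sparsity weight $\lambda$, initial dictionary $D_{m_I} \in \mathbb{R}^{n \times K}$, initial iteration number $m_I$ and terminal iteration number $m_T$, mini-batch size $\eta$, weight $\varrho$, and initial matrices $A_{m_I}$ and $B_{m_I}$.
for $m = m_I$ to $m_T$ do
  Draw $\eta$ samples $Y_m = \{y_{m,i}\}_{i=1}^{\eta}$ from $p(y)$
  Sparse coding: $x_{m,i} = \arg\min_{x \in \mathbb{R}^K} \|y_{m,i} - D_{m-1} x\|_2^2 + \lambda \|x\|_1$, $\forall i \in \{1, \ldots, \eta\}$
  $A_m = \varrho A_{m-1} + \frac{1}{\eta} \sum_{i=1}^{\eta} x_{m,i} x_{m,i}^T$, $\quad B_m = \varrho B_{m-1} + \frac{1}{\eta} \sum_{i=1}^{\eta} y_{m,i} x_{m,i}^T$.
  Update dictionary: compute $D_m$ with $D_{m-1}$ as initialization
  $$D_m = \arg\min_D \frac{1}{m} \left( \frac{1}{2} \operatorname{Tr}\left(D^T D A_m\right) - \operatorname{Tr}\left(D^T B_m\right) \right). \tag{9}$$
end for
return dictionary $D_{m_T}$, and matrices $A_{m_T}$ and $B_{m_T}$.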
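
The loop of Algorithm 1 might be sketched in Python as follows. Several choices here are assumptions made for illustration rather than details stated in this paper: the sparse coding step uses a plain iterative soft-thresholding (ISTA) solver with the $\frac{1}{2}$-scaled data term of (7), the dictionary update uses the column-wise block coordinate descent of [9] with atoms constrained to at most unit norm, and a single fixed `rho` is applied at every iteration. All function names are hypothetical.

```python
import numpy as np

def ista(D, y, lam, n_iter=100):
    """Assumed lasso solver: argmin_x 0.5*||y - D x||_2^2 + lam*||x||_1."""
    L = np.linalg.norm(D, 2) ** 2 + 1e-12  # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = x - D.T @ (D @ x - y) / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

def update_dictionary(D, A, B, n_sweeps=1):
    """Block coordinate descent on (9), one atom at a time, warm-started at D."""
    D = D.copy()
    for _ in range(n_sweeps):
        for j in range(D.shape[1]):
            if A[j, j] < 1e-10:
                continue  # atom not used by any code so far; leave it unchanged
            u = D[:, j] + (B[:, j] - D @ A[:, j]) / A[j, j]
            D[:, j] = u / max(np.linalg.norm(u), 1.0)  # keep ||d_j||_2 <= 1
    return D

def online_dictionary_learning(Y, D0, A0, B0, lam, eta, rho, n_iter, rng, p=None):
    """Mini-batch online dictionary learning with weighted past statistics.
    Y holds training vectors as columns; p is an optional sampling distribution."""
    D, A, B = D0.copy(), A0.copy(), B0.copy()
    for _ in range(n_iter):
        idx = rng.choice(Y.shape[1], size=eta, replace=True, p=p)
        Ym = Y[:, idx]
        Xm = np.stack([ista(D, Ym[:, i], lam) for i in range(eta)], axis=1)
        A = rho * A + Xm @ Xm.T / eta  # past statistics weighted by rho
        B = rho * B + Ym @ Xm.T / eta
        D = update_dictionary(D, A, B)
    return D, A, B
```

Returning `A` and `B` alongside `D` is what lets a later call resume from the accumulated statistics instead of revisiting earlier samples.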

3. Results

We validated our method on 26 4D canine open-chest echocardiographic images acquired from both healthy and post-infarct animals using a Philips iE33 system and an X7-2 array probe. Each image sequence spanned a cardiac cycle and contained about 25–30 volumes. The sequential segmentation was initialized with a manual tracing of the end-diastole volume. One hundred volumes were randomly selected for expert manual segmentation and quality assessment. We evaluated automatic results against manual tracings using the following segmentation quality metrics: Hausdorff Distance (HD), Mean Absolute Distance (MAD), and Dice coefficient (DICE). We compared the proposed method to [6], which uses the batch dictionary learning technique K-SVD. The two algorithms shared the same set of relevant parameters. We used the following parameter setting: $J = 10$, $T = 2$, $K = 1.5n$, $N = 10$ (for $t > 2$) or $20$ (for $t = 2$), $\eta = 2048$, $\lambda = 0.8$, and $a = 100$.
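
For readers who want to reproduce this kind of evaluation, the three metrics can be computed roughly as below. The exact definitions used in this study are not spelled out in the text, so the sketch assumes the common symmetric forms of MAD and HD over sampled surface points, and the Dice coefficient over binary masks; the function names are hypothetical.

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice coefficient between two binary masks of the same shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def nearest_distances(P, Q):
    """Distance from each point of P (N x 3) to its nearest point in Q (M x 3)."""
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)
    return d.min(axis=1)

def mean_absolute_distance(P, Q):
    """Symmetric mean surface distance (assumed definition of MAD)."""
    return 0.5 * (nearest_distances(P, Q).mean() + nearest_distances(Q, P).mean())

def hausdorff_distance(P, Q):
    """Symmetric Hausdorff distance between two point sets."""
    return max(nearest_distances(P, Q).max(), nearest_distances(Q, P).max())
```

The pairwise distance matrix is dense, so for large surfaces a spatial index would be preferable; the brute-force form is kept here for clarity.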

Algorithm 2.

Boosted Multiscale Online Dictionary Learning

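Require: training sets $\{Y_{t-1}^1\}_k = \{y_{1,i}^k\}_{i=1}^{M_1}$ and $\{Y_{t-1}^2\}_k = \{y_{2,j}^k\}_{j=1}^{M_2}$, initial dictionaries $\{D_{t-1}^1, D_{t-1}^2\}_k$, matrices $A_{t-1}^k$ and $B_{t-1}^k$, accumulated number of previous iterations $N_{t-1}$, weighting parameters $\beta_{t-1}^k$, $\varrho_k$, $k = 1, \ldots, J$, mini-batch size $\eta$, number of iterations $N$, and sparsity factor $T$.
$w_1^1 = \{w_{1,i}^1\}_{i=1}^{M_1} = 1$, $w_2^1 = \{w_{2,j}^1\}_{j=1}^{M_2} = 1$.
for $k = 1$ to $J$ do
  Dictionary Learning: Apply Algorithm 1 for $N$ iterations to adapt $\{D_t^1, D_t^2\}_k$ to $\{Y_{t-1}^1\}_k \sim p_1^k = \{p_{1,i}^k\}_{i=1}^{M_1} = w_1^k / \sum_{i=1}^{M_1} w_{1,i}^k$ and $\{Y_{t-1}^2\}_k \sim p_2^k = \{p_{2,j}^k\}_{j=1}^{M_2} = w_2^k / \sum_{j=1}^{M_2} w_{2,j}^k$. Use $\varrho = \varrho_k$ for the first iteration and $\varrho = 1$ for the rest.
  Sparse Coding: $\forall y \in \{Y_{t-1}^1, Y_{t-1}^2\}_k$, solve (4) for sparse representations w.r.t. $\{D_t^1\}_k$ and $\{D_t^2\}_k$ and get residues $R(y, \{D_t^1\}_k)$ and $R(y, \{D_t^2\}_k)$.
  Hypothesis $h_k : y \in \{Y_{t-1}^1, Y_{t-1}^2\}_k \mapsto \{0, 1\}$: $h_k(y) = \operatorname{Heaviside}\left(R(y, \{D_t^2\}_k) - R(y, \{D_t^1\}_k)\right)$. Calculate the error of $h_k$: $\epsilon_k = \frac{1}{1 + \varrho_k} \left( \sum_{i=1}^{M_1} p_{1,i}^k \left| h_k(y_{1,i}^k) - 1 \right| + \sum_{j=1}^{M_2} p_{2,j}^k\, h_k(y_{2,j}^k) \right) + \frac{\varrho_k}{1 + \varrho_k} \cdot \frac{\beta_{t-1}^k}{1 + \beta_{t-1}^k}$. Set $\beta_t^k = \epsilon_k / (1 - \epsilon_k)$.
  Weight Update: $w_{1,i}^{k+1} = w_{1,i}^k \left(\beta_t^k\right)^{1 - \left| h_k(y_{1,i}^k) - 1 \right|}$, $w_{2,j}^{k+1} = w_{2,j}^k \left(\beta_t^k\right)^{1 - h_k(y_{2,j}^k)}$.
end for
return $\{D_t^1, D_t^2\}_k$, $A_t^k$, $B_t^k$, $\beta_t^k$, $k = 1, \ldots, J$, and $N_t = N_{t-1} + N$.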
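
The hypothesis, error, and weight-update steps of Algorithm 2 for a single scale $k$ can be sketched as below, taking the residues from the sparse coding step as inputs. This is an illustrative reading of the algorithm box, not the authors' implementation: the function name and argument layout are hypothetical, and ties in the Heaviside step are assigned to class 2 as an assumption.

```python
import numpy as np

def boosting_step(r1_c1, r2_c1, r1_c2, r2_c2, w1, w2, beta_prev, rho):
    """One scale of the boosting supervision.
    rA_cB: residues of class-B samples w.r.t. the class-A dictionary.
    w1, w2: current sample weights of class 1 and class 2.
    beta_prev: beta carried over from the previous frame; rho: past-information weight."""
    p1, p2 = w1 / w1.sum(), w2 / w2.sum()
    # h = 1 when the class-1 dictionary reconstructs the sample better
    h1 = (r2_c1 > r1_c1).astype(float)
    h2 = (r2_c2 > r1_c2).astype(float)
    # Error on the new frame, blended with the past error beta/(1+beta)
    err_new = np.sum(p1 * np.abs(h1 - 1.0)) + np.sum(p2 * h2)
    eps = err_new / (1.0 + rho) + rho / (1.0 + rho) * beta_prev / (1.0 + beta_prev)
    beta = eps / (1.0 - eps)  # assumes eps < 1
    # Correctly classified samples are down-weighted by beta for the next scale
    w1_next = w1 * beta ** (1.0 - np.abs(h1 - 1.0))
    w2_next = w2 * beta ** (1.0 - h2)
    return beta, w1_next, w2_next
```

Since $\beta = \epsilon / (1 - \epsilon)$ inverts to $\epsilon = \beta / (1 + \beta)$, the second term of the error is simply the previous frame's error weighted by $\varrho_k / (1 + \varrho_k)$.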

Figure 1 shows representative segmentation results for frames at end-systole, when it is easiest to assess error accumulation. In the top row, the batch method [6] resulted in more errors in end-systolic segmentations, since it learned appearance dictionaries only on the latest segmented frame and did not fully leverage the information carried in all the previous frames. The segmentation error of a frame is likely to propagate to the following frames. Images in the bottom row show the improved segmentation results obtained with our new stochastic learning procedure. Since we optimize the dictionaries on all the previous frames, the error in a given frame is compensated by the information of the other frames. Figure 2 presents the quality measure curves from end-diastole to end-systole for the endocardial segmentations of a healthy sequence and a post-infarct sequence. DICE decays and HD and MAD rise from end-diastole to end-systole due to accumulation of errors. Compared to the batch method, our method resulted in flattened curves, which suggests our method effectively alleviates error accumulation and improves segmentation performance for both healthy and post-infarct images. For epicardial segmentation, the improvement was not significant, since the baseline accuracy of [6] was already very high (97% in DICE). Table 1 summarizes the statistics of segmentation quality measures and computational efficiency achieved by the two algorithms in segmenting endocardial borders. The proposed method achieved smaller mean MAD, smaller mean HD, larger mean DICE, and smaller standard deviations of all the measures. The overall segmentation accuracy and robustness were effectively improved using our stochastic online learning procedure. We tested the two algorithms on a laptop with an Intel quad-core 2.2 GHz CPU and 8 GB of memory. Both algorithms were implemented with a mixture of MATLAB and C++. The batch algorithm took about 45 seconds per frame for dictionary learning. The proposed algorithm took only about 25 seconds per frame.

Fig. 1.

Comparisons of segmentation results by the batch method (top row) and our method (bottom row). Green: Manual segmentation. Red: Automatic segmentation.

Fig. 2.

Segmentation quality measures at different frames of two example sequences (healthy (top row) and post-infarct (bottom row)) from end-diastole to end-systole. Blue: the batch method. Red: the proposed method.

Table 1.

Sample means ± standard deviations of the quality measures and dictionary learning time per frame for the segmentation of endocardial borders

Method                 DICE (%)       MAD (mm)       HD (mm)        Time (s)
Batch Algorithm [6]    93.6 ± 2.49    0.57 ± 0.14    2.95 ± 0.62    ~45
Proposed Algorithm     94.6 ± 2.17    0.48 ± 0.11    2.83 ± 0.53    ~25
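
As a quick reading of Table 1, the relative changes in the sample means work out as follows. These figures are derived here from the tabulated values only; the timing ratio is approximate because the per-frame times are themselves reported as approximate.

```latex
\begin{align*}
\text{MAD:}\quad  & \frac{0.57 - 0.48}{0.57} \approx 15.8\% \text{ lower} \\
\text{HD:}\quad   & \frac{2.95 - 2.83}{2.95} \approx 4.1\% \text{ lower} \\
\text{DICE:}\quad & 94.6 - 93.6 = 1.0 \text{ percentage point higher} \\
\text{Time:}\quad & \frac{45 - 25}{45} \approx 44\% \text{ lower}
\end{align*}
```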

4. Conclusion

We have presented an approach for segmenting left ventricular borders from 4D echocardiography using stochastic online dictionary learning. It is based on a stochastic optimization technique resulting in lower memory consumption and computational cost than classical batch algorithms. We optimize the dictionaries and their weights on all the preceding frames while adapting them to the latest segmented frame. The rate of updating the past information is controlled and varies with appearance scale. Our method effectively improved the accuracy and robustness of endocardial segmentation and computational efficiency compared to the previous batch methods. Future work will include automating the dictionary initialization through offline learning. The stochastic learning procedure is suitable for both offline and online learning. A database that is too large for batch methods can be gracefully handled by our method which avoids accessing the database during online learning. Our method can ultimately be used to build an integrated offline and online learning framework.

Acknowledgments

This work was supported by NIH RO1HL082640.

References

1. Cootes TF, Edwards GJ, Taylor CJ: Active appearance models. IEEE TPAMI 23(6), 681–685 (2001)
2. Bosch JG, Mitchell SC, Lelieveldt BPF, Nijland F, Kamp O, Sonka M, Reiber JHC: Automatic segmentation of echocardiographic sequences by active appearance motion models. IEEE TMI 21(11), 1374–1383 (2002)
3. Jacob G, Noble JA, Behrenbruch CP, Kelion AD, Banning AP: A shape-space based approach to tracking myocardial borders and quantifying regional left ventricular function applied in echocardiography. IEEE TMI 21(3), 226–238 (2002)
4. Sun W, Çetin M, Chan R, Reddy V, Holmvang G, Chandar V, Willsky AS: Segmenting and tracking the left ventricle by learning the dynamics in cardiac images. In: Christensen GE, Sonka M (eds.) IPMI 2005. LNCS, vol. 3565, pp. 553–565. Springer, Heidelberg (2005)
5. Zhu Y, Papademetris X, Sinusas AJ, Duncan JS: A dynamical shape prior for LV segmentation from RT3D echocardiography. In: Yang G-Z, Hawkes D, Rueckert D, Noble A, Taylor C (eds.) MICCAI 2009, Part I. LNCS, vol. 5761, pp. 206–213. Springer, Heidelberg (2009)
6. Huang X, Dione DP, Compas CB, Papademetris X, Lin BA, Sinusas AJ, Duncan JS: A Dynamical Appearance Model Based on Multiscale Sparse Representation: Segmentation of the Left Ventricle from 4D Echocardiography. In: Ayache N, Delingette H, Golland P, Mori K (eds.) MICCAI 2012, Part III. LNCS, vol. 7512, pp. 58–65. Springer, Heidelberg (2012)
7. Huang X, Lin BA, Compas CB, Sinusas AJ, Staib LH, Duncan JS: Segmentation of left ventricles from echocardiographic sequences via sparse appearance representation. In: MMBIA, pp. 305–312 (2012)
8. Aharon M, Elad M, Bruckstein A: K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE TSP 54(11), 4311–4322 (2006)
9. Mairal J, Bach F, Ponce J, Sapiro G: Online dictionary learning for sparse coding. In: ICML, p. 87 (2009)
10. Zhang S, Zhan Y, Zhou Y, Uzunbas M, Metaxas DN: Shape prior modeling using sparse representation and online dictionary learning. In: Ayache N, Delingette H, Golland P, Mori K (eds.) MICCAI 2012, Part III. LNCS, vol. 7512, pp. 435–442. Springer, Heidelberg (2012)
11. Freund Y, Schapire R: A desicion-theoretic generalization of on-line learning and an application to boosting. In: Vitányi PMB (ed.) EuroCOLT 1995. LNCS, vol. 904, pp. 23–37. Springer, Heidelberg (1995)
