Author manuscript; available in PMC: 2021 Jun 1.
Published in final edited form as: Annu Int Conf IEEE Eng Med Biol Soc. 2020 Jul;2020:1217–1220. doi: 10.1109/EMBC44109.2020.9176491

L-CO-Net: Learned Condensation-Optimization Network for Segmentation and Clinical Parameter Estimation from Cardiac Cine MRI

S M Kamrul Hasan 1, Cristian A Linte 1,2
PMCID: PMC8169002  NIHMSID: NIHMS1705225  PMID: 33018206

Abstract

In this work, we implement a fully convolutional segmenter featuring both a learned group structure and a regularized weight-pruner to reduce the high computational cost in volumetric image segmentation. We validated our framework on the ACDC dataset featuring one healthy and four pathology patient groups imaged throughout the cardiac cycle. Our technique achieved Dice scores of 96.8% (LV blood-pool), 93.3% (RV blood-pool), and 90.0% (LV Myocardium) with five-fold cross-validation and yielded similar clinical parameters as those estimated from the ground-truth segmentation data. Based on these results, this technique has the potential to become an efficient and competitive cardiac image segmentation tool that may be used for cardiac computer-aided diagnosis, planning, and guidance applications.

Keywords: Cine MRI, learned group-convolution, condensation-optimization network, ventricle segmentation

I. INTRODUCTION

The emerging success of Convolutional Neural Networks (CNNs) in solving high-level computer vision tasks can be utilized to develop machine learning tools that are capable of learning hierarchical features in an end-to-end manner [1]. Motivated by the superior performance of deep learning, the medical imaging community has also embraced the implementation of deep learning-based approaches for medical image segmentation [2], as a precursor task for clinical parameter estimation [3]. However, image segmentation in clinical settings still requires high accuracy and precision, with even minimal segmentation errors being unacceptable.

In the context of cardiac image segmentation, fully convolutional networks (FCNs) have become well established, thanks to their per-pixel prediction capabilities. An example of such an application is the segmentation of various cardiac structures from MR images [4]. Similarly, Bai et al. [5] reported improved accuracy and robustness in the segmentation of the ventricles and atria using a modified FCN architecture.

The formulation and integration of various regularization techniques has been a growing trend for improving the generalization performance of neural networks. One particularly compelling approach is the use of Dropout at the training stage of a neural network. However, the accuracy of a trained deep network is not significantly improved by dropping out a majority of connections at the training stage; hence, current research efforts have focused on deep model compression techniques, including weight pruning [6], weight decay [7], and knowledge distillation [8].

Weight-pruning has attracted much research attention because it enables faster inference with minimal loss in accuracy. Huang et al. [9] demonstrated a weight-pruning technique in a group-convolution setting, where a DenseNet-type architecture learns a sparse connectivity pattern during training and prunes redundant connections between convolution layers.

In this work, we propose to use learned-group convolution and weight-pruning in a fully convolutional setting to segment the left and right ventricle blood-pool and the left ventricle myocardium from end-diastolic and end-systolic cardiac MR images in a more accurate and more efficient manner. To assess the performance of the proposed framework, we compare our results (Dice score, Hausdorff distance, and clinical parameters) to those obtained using five other segmentation architectures on the Automated Cardiac Diagnosis Challenge (ACDC) dataset. Lastly, we show that the proposed learned-group convolution and weight-pruning techniques improve the segmentation performance, as well as the estimation of clinical cardiac indices in cine MR slices.

II. Methodology

To tackle the task of precise and rapid heart chamber detection and segmentation in cine MR images, we propose a specifically designed network architecture, the learned condensation-optimization network (L-CO-Net), shown in Fig. 1. Our proposed L-CO-Net framework replaces both standard convolution and group convolution (G-Conv) with learned group-convolution (LG-Conv). Standard convolution requires a high level of computation, i.e. $\mathcal{O}(I_i \times O_o)$, while the pre-defined assignment of filters to groups in group convolution restricts its representation capability. LG-Conv mitigates both problems by learning the group structure dynamically during training through a multi-stage scheme. Before training, the input channels and filters are split into $M$ equally sized groups denoted as $I^k = \{I_1^k, I_2^k, \ldots, I_M^k\}$ and $F^k = \{F_1^k, F_2^k, \ldots, F_M^k\}$, where $I_i^k$ is the $i$th feature map of the $k$th layer. The output of this group convolution layer is formulated as

$$I^{k+1} = \{F_1^k \ast I_1^k, F_2^k \ast I_2^k, \ldots, F_M^k \ast I_M^k\} = \{f_{11}^k \ast i_{11}^k, f_{12}^k \ast i_{12}^k, \ldots, f_{1N}^k \ast i_{1h}^k, f_{21}^k \ast i_{21}^k, f_{22}^k \ast i_{22}^k, \ldots, f_{2N}^k \ast i_{2h}^k, \ldots, f_{M1}^k \ast i_{M1}^k, f_{M2}^k \ast i_{M2}^k, \ldots, f_{MN}^k \ast i_{Mh}^k\},$$

where $I^k = \{i_1^k, i_2^k, \ldots, i_h^k\}$, $F^k = \{f_1^k, f_2^k, \ldots, f_N^k\}$, $h$ is the number of channels, and $N$ is the number of filters. Since each group has its own weights, it can select its own set of relevant input features, which helps the network retain the most relevant features at the relevant connections. The multi-stage pipeline consists of several condensation stages followed by an optimization stage. In the first half of the pipeline, training starts by calculating the magnitude of the weights for each incoming feature and averaging them within each group. The columns with low average magnitude are then screened out. A fraction $1/C$ of the connections is pruned at the end of each of the $C-1$ condensing stages, so that a fraction $(C-1)/C$ has been removed once condensation is complete.
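The condensation step can be illustrated with a minimal NumPy sketch. It assumes the CondenseNet scheme of Huang et al. [9], in which each filter group drops 1/C of the original input features at every condensing stage; the function name `condense_stage`, the random weights, and the layer sizes are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np

def condense_stage(weights, mask, groups, condense_factor):
    """Drop 1/C of the original input features from every filter group.

    weights: (O, R) array of 1x1 convolution weights (O filters, R inputs).
    mask:    (O, R) array of 0/1 flags marking connections still alive.
    """
    out_ch, in_ch = weights.shape
    per_group = out_ch // groups
    n_prune = in_ch // condense_factor
    new_mask = mask.copy()
    for g in range(groups):
        rows = slice(g * per_group, (g + 1) * per_group)
        # Importance of each input feature: mean |w| over the group's filters.
        importance = np.abs(weights[rows] * new_mask[rows]).mean(axis=0)
        # Columns pruned in an earlier stage must not be selected again.
        alive = new_mask[rows].any(axis=0)
        importance[~alive] = np.inf
        drop = np.argsort(importance)[:n_prune]
        new_mask[rows, drop] = 0
    return new_mask

# Hypothetical layer: 8 filters, 16 input features, 4 groups, C = 4.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
mask = np.ones_like(W)
C = 4
for _ in range(C - 1):          # C - 1 condensing stages
    mask = condense_stage(W, mask, groups=4, condense_factor=C)

print(mask.sum(axis=1))         # surviving input connections per filter
```

After the C − 1 stages, every filter keeps 1/C of its input connections, and all filters in a group share the same sparsity pattern, which is what allows the pruned layer to be executed as a regular group convolution.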

Fig. 1.


Illustration of L-CO-Net framework: (a) ROI detection around LV-RV; (b) Segmentation block consisting of a decoder and an encoder where each condense block (CB) consists of 3 Layers with a growth rate of k = 16. The transformations within each CB and the transition-down block are labeled with a cyan and yellow box, respectively. (c) Learned Group Convolution (LG-Conv) block is shown in the red rectangular box.

The second part of the pipeline is where all training occurs. This stage is focused on finding the optimal permutation connection that will share a similar sparsity pattern, to mitigate any negative effects on accuracy induced by the pruning process. As mentioned by Huang et al. [9], both the L1 and L2 regularization methods are efficient for solving the overfitting problem, but they do not perform well for network optimization. To address this limitation, we use an efficient regularizer referred to as group lasso (GL), which is a natural generalization of the standard lasso (least absolute shrinkage and selection operator) objective [10]. Additionally, the GL regularizer encourages group-level sparsity at the factor level by forcing all outgoing connections from a single neuron (corresponding to a group) to be either simultaneously zero or not.
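As a rough illustration of the group lasso penalty, the sketch below follows the form used for learned group convolutions in [9]: the weights that connect one input feature to all filters of one group are treated as a single unit and penalized by their L2 norm. The function name `group_lasso` and the toy weight matrix are assumptions made for this example.

```python
import numpy as np

def group_lasso(weights, groups):
    """Sum over groups g and input features j of sqrt(sum_i w_ij^2).

    weights: (O, R) array; the O filters are split into `groups` equal groups.
    """
    out_ch, in_ch = weights.shape
    grouped = weights.reshape(groups, out_ch // groups, in_ch)
    # L2 norm over the filters of a group, then L1 sum over groups and inputs.
    return np.sqrt((grouped ** 2).sum(axis=1)).sum()

W = np.array([[3.0, 0.0],
              [4.0, 0.0]])

one_group = group_lasso(W, groups=1)   # column norms: 5 and 0
two_groups = group_lasso(W, groups=2)  # reduces to the plain L1 norm: 3 + 4
print(one_group, two_groups)
```

Because the penalty is applied to the norm of a whole column rather than to individual weights, it pushes all weights of a column toward zero together, which matches the group-level sparsity described above.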

A. Heart Localization

To reduce computational complexity and improve accuracy, a Fourier transform-based method proposed by Lin et al. [11] is used to automatically detect and extract a region of interest (ROI) that encompasses the LV and RV. The motivation for using the Fourier transform is that LV and RV are the only large moving structures in the thorax and move at the same frequency, dictated by the heart rate. Therefore, the pixel intensity changes over time between the LV blood-pool and the LV-myocardium, whereas the change in pixel intensity is almost static at the boundary. We enhanced the LV and RV regions by computing the Fourier transform for each slice and retaining only the first harmonic. Moreover, since the shape of the LV is circular in nature, we also used the circle Hough transform introduced by Oksuz et al. [12] to identify the center and radius of the ROI of the LV and RV. We then generated a bounding-box and used it to crop the ROI from the image (Fig. 1 (a)).
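The first-harmonic idea can be sketched in a few lines of NumPy. This is a simplified illustration on synthetic data, not the method of Lin et al. [11]: the circle Hough transform step is replaced here by a plain relative threshold, and the function names, the threshold, the margin, and the image sizes are all assumptions.

```python
import numpy as np

def first_harmonic_image(cine):
    """cine: (T, H, W) array holding one slice over the cardiac cycle."""
    spectrum = np.fft.fft(cine, axis=0)   # temporal FFT per pixel
    return np.abs(spectrum[1])            # magnitude of the first harmonic

def bounding_box(h1, rel_threshold=0.5, margin=2):
    """Box (r0, r1, c0, c1) around pixels with a strong first harmonic."""
    moving = h1 >= rel_threshold * h1.max()
    rows = np.flatnonzero(moving.any(axis=1))
    cols = np.flatnonzero(moving.any(axis=0))
    r0 = max(int(rows[0]) - margin, 0)
    r1 = min(int(rows[-1]) + margin + 1, h1.shape[0])
    c0 = max(int(cols[0]) - margin, 0)
    c1 = min(int(cols[-1]) + margin + 1, h1.shape[1])
    return r0, r1, c0, c1

# Synthetic cine: static background plus one region pulsating once per cycle.
T, H, W = 16, 64, 64
t = np.arange(T)
cine = np.ones((T, H, W))
cine[:, 20:30, 25:35] += np.sin(2 * np.pi * t / T)[:, None, None]

h1 = first_harmonic_image(cine)
box = bounding_box(h1)
roi = cine[:, box[0]:box[1], box[2]:box[3]]
print(box, roi.shape)
```

Static tissue contributes only to the zero-frequency component, so it vanishes from the first-harmonic image, while pixels whose intensity varies at the heart rate stand out and define the crop region.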

B. Heart Segmentation

The heart segmentation block in Fig. 1 (b) consists of both an encoder and a decoder path, where the encoder path has an input image size of 128 × 128 and three condense blocks (CBs) with feature map sizes of {128², 54², 32²}. We employ separable convolution with different filter sizes in the initial layers and then stack them together, as inspired by the Xception network.

We introduced a novel skip connection block which is computationally and memory-efficient (Fig. 1). The decoder is symmetrical to the encoder and consists of three blocks, each comprising 3 × 3 transposed convolutions and CBs, followed by a softmax layer that generates the image mask. The concatenation in the skip layer is replaced by an element-wise addition operation to mitigate the problem of feature-map explosion. We use 2, 3, 4, 5, 4, 3, and 2 layers per block with 32 initial feature maps, 3 max-pooling layers, a growth rate of k = 16, 4 groups per condense block, and a condensation factor of C = 4 (Fig. 1). The weights are updated during back-propagation by minimizing the dual loss function, $L_{Total}$:

$$L_{Total} = \alpha \cdot L_{Entropy}(A, E) + \beta \cdot \left(1 - L_{Dice}(A, E)\right) \qquad (1)$$

where $L_{Entropy}$ is the weighted cross-entropy loss and $L_{Dice}$ is the Dice loss. The parameter α varies between 0 and 1 and β = 1 − α. Let A denote the training samples and E the weights. The first term of equation 1, $L_{Entropy}$, is computed with a weight map derived from the reference classes and labels, where L and V in equation 2 are the sets of all reference classes and voxels in the training set, respectively.

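$$L_{Total} = \alpha \cdot \left[ -\sum_{a_i \in A} \left\{ \sum_{l \in L} \frac{\mathrm{scale}}{V_{\mathrm{classfreq}}} + \sum_{l \in L} \frac{\mathrm{edgescale}}{V_{\mathrm{edgefreq}}} \right\} \log p(r_i \mid a_i; E) \right] + \beta \cdot \left[ 1 - \frac{\sum_{l \in L} \frac{B}{B_l} \sum_{a_i \in A} p(r_i \mid a_i; E)\, G(a_i) + \epsilon}{\sum_{l \in L} \frac{B}{B_l} \sum_{a_i \in A} \left( p(r_i \mid a_i; E) + G(a_i) \right) + \epsilon} \right] \qquad (2)$$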

Let $r_i$ be the label of the reference class corresponding to voxel $a_i \in A$. B represents the number of pixels in a minibatch and $B_l$ represents the number of pixels in each class $l \in L$. The term ϵ prevents division by 0 when one of the sets is empty. The total loss, $L_{Total}$, is minimized via the Adam optimizer, and the model is evaluated using Dice scores together with clinical indices such as ejection fraction and myocardial mass.
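The dual loss can be sketched in NumPy as follows. This is an illustrative reading of equations 1 and 2 rather than the authors' code: the per-voxel weight map is passed in precomputed, the class weights are assumed to be B/B_l, and the Dice ratio is written without the conventional factor of 2, following the equation as printed. The function name `total_loss` and the toy inputs are hypothetical.

```python
import numpy as np

def total_loss(probs, onehot, weight_map, alpha=0.5, eps=1e-7):
    """alpha * weighted cross-entropy + (1 - alpha) * (1 - Dice term).

    probs:      (N, L) predicted class probabilities per voxel.
    onehot:     (N, L) one-hot reference labels.
    weight_map: (N,)   per-voxel weights (assumed to be given).
    """
    beta = 1.0 - alpha
    # Weighted cross-entropy: -sum_i w_i * log p(r_i | a_i).
    p_true = np.clip((probs * onehot).sum(axis=1), eps, 1.0)
    entropy = -(weight_map * np.log(p_true)).sum()
    # Class-balanced Dice term with weights B / B_l (assumption).
    B = probs.shape[0]
    B_l = np.maximum(onehot.sum(axis=0), 1)
    w = B / B_l
    overlap = (w * (probs * onehot).sum(axis=0)).sum()
    total = (w * (probs + onehot).sum(axis=0)).sum()
    dice = (overlap + eps) / (total + eps)
    return alpha * entropy + beta * (1.0 - dice)

labels = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
weights = np.ones(4)

perfect = total_loss(labels, labels, weights)
uniform = total_loss(np.full((4, 2), 0.5), labels, weights)
print(perfect, uniform)
```

With this form of the Dice ratio a perfect prediction yields a ratio of 0.5 rather than 1, so the loss has a non-zero floor; a variant with the factor of 2 in the numerator would reach zero instead.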

C. Imaging Data

For this study, we used the Automated Cardiac Diagnosis Challenge (ACDC) dataset, available through the 2017 MICCAI-ACDC challenge [13]. It consists of short-axis cardiac cine-MR images acquired for 100 patients divided into 5 subgroups: normal (NOR), myocardial infarction (MINF), dilated cardiomyopathy (DCM), hypertrophic cardiomyopathy (HCM), and abnormal right ventricle (ARV). The data were then split into 70% training and 15% validation sets.

III. Results

The proposed architecture was evaluated on the MICCAI STACOM 2017 ACDC dataset using stratified five-fold cross-validation. Fig. 2 shows segmentation results and the ground truth masks for both 2D and 3D cases. Table I summarizes the comparison results, which show that our proposed model significantly improved the segmentation performance against several state-of-the-art multi-class segmentation techniques [13] in terms of Dice metrics, Hausdorff distance, and clinical parameters. Our proposed L-CO-Net architecture achieved Dice scores (Hausdorff distances) of 96.8% (7.9 mm) and 95.1% (6.4 mm) for the LV blood-pool, 89.5% (8.9 mm) and 90.0% (8.9 mm) for the LV-Myocardium, and 93.3% (11.2 mm) and 87.43% (11.9 mm) for the RV blood-pool in end-diastole and end-systole, respectively.
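For reference, the two segmentation metrics can be computed as in the sketch below. It is a simplified illustration: the Hausdorff distance is evaluated by brute force over all foreground pixels rather than over extracted contours, and the function names, the pixel spacing, and the toy masks are assumptions, not part of the ACDC evaluation code.

```python
import numpy as np

def dice_score(pred, truth):
    """Dice = 2 |P ∩ T| / (|P| + |T|) for two boolean masks."""
    overlap = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    return 2.0 * overlap / total if total else 1.0

def hausdorff_distance(pred, truth, spacing=(1.0, 1.0)):
    """Symmetric Hausdorff distance between two non-empty boolean masks."""
    p = np.argwhere(pred) * np.asarray(spacing)
    t = np.argwhere(truth) * np.asarray(spacing)
    # Pairwise Euclidean distances between all foreground pixels.
    d = np.linalg.norm(p[:, None, :] - t[None, :, :], axis=-1)
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

truth = np.zeros((10, 10), dtype=bool)
truth[2:6, 2:6] = True
pred = np.zeros((10, 10), dtype=bool)
pred[2:6, 3:7] = True            # same square shifted by one pixel

dice = dice_score(pred, truth)
hd = hausdorff_distance(pred, truth)
print(dice, hd)
```

The pairwise distance matrix grows with the product of the two foreground sizes, so this form is only practical for small masks or cropped regions of interest.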

Fig. 2.


Representative ED and ES frames segmentation results of a complete cardiac cycle from the base (high slice index) to apex (low slice index) showing RV blood-pool, LV blood-pool, and LV-Myocardium in purple, red, and cyan respectively.

TABLE I.

Quantitative evaluation of the segmentation results in terms of mean Dice score (%) with Hausdorff distance (in mm), number of parameters (×10⁶), and the clinical indices evaluated on the ACDC dataset for LV, RV blood-pool and LV-myocardium, compared across several best performing networks, including L-CO-Net. The statistical significance of the L-CO-Net results compared against five other baseline models is indicated by * (p < 0.05) and ** (p < 0.01). The best Dice scores and Hausdorff distances are emphasized using bold fonts.

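| Metric | ED: UNet [14] | ED: DCN [15] | ED: MUNet [16] | ED: MNet [17] | ED: DNet [18] | ED: L-CO-Net | ES: UNet | ES: DCN | ES: MUNet | ES: MNet | ES: DNet | ES: L-CO-Net |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dice [LV] | 95.0 | 96.0 | 96.3 | 96.1 | 96.4 | 96.8* | 90.0 | 91.0 | 91.1 | 91.5 | 91.7 | 95.1** |
| Hausdorff [LV] | (8.2) | (7.5) | (6.5) | (7.7) | (8.1) | (7.9) | (10.9) | (9.6) | (9.2) | (7.1) | (9.0) | (6.4) |
| Dice [Myo] | 88.2 | 87.5 | 89.2 | 87.5 | 88.9 | 89.5* | 89.7 | 89.4 | 90.1 | 89.5 | 89.8 | 90.0* |
| Hausdorff [Myo] | (9.8) | (11.1) | (8.7) | (9.9) | (9.8) | (8.9) | (11.3) | (10.7) | (10.6) | (8.9) | (12.6) | (8.9) |
| Dice [RV] | 91.1 | 92.8 | 93.2 | 92.9 | 93.5 | 93.3 | 81.9 | 87.2 | 88.3 | 88.5 | 87.9 | 87.4 |
| Hausdorff [RV] | (13.5) | (11.9) | (12.7) | (12.9) | (14.0) | (11.2) | (18.7) | (13.4) | (14.7) | (11.8) | (13.9) | (11.9) |

ED: End Diastole, ES: End Systole.

#Parameters (×10⁶): UNet 4.1, DCN -, MUNet 19.0, MNet 2.11, DNet 0.65, L-CO-Net 0.34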

The predicted segmentation was subsequently used to compute the clinical parameters. The agreement between the ground truth and the automatic segmentation is reported using correlation statistical analysis by mapping the predicted volumes of the testing set onto the ground truth volumes of the training set. As illustrated in Table II, the agreement between our method’s prediction and the ground truth is high, characterized by a Pearson’s correlation coefficient (ρ) of 0.997 (p < 0.01) for LV-EF, 0.998 for LV-EDV, and 0.993 (p < 0.1) for myocardial mass. There was a slight over-estimation in the RV blood-pool segmentation, which is also reflected in the clinical parameter estimation.
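The derivation of clinical indices from segmentation masks can be sketched as below. The standard definitions SV = EDV − ESV and EF = 100 · SV / EDV are used; the voxel size, the myocardial density of 1.05 g/mL, the function names, and all numerical values are assumptions chosen for illustration and are not taken from the paper.

```python
import numpy as np

MYOCARDIAL_DENSITY = 1.05  # g/mL, a commonly used value (assumption)

def volume_ml(mask, voxel_mm3):
    """Volume of a binary mask in mL (1 mL = 1000 mm^3)."""
    return mask.sum() * voxel_mm3 / 1000.0

def ejection_fraction(edv, esv):
    """EF in percent from end-diastolic and end-systolic volumes."""
    return 100.0 * (edv - esv) / edv

# Hypothetical masks with 1.25 x 1.25 x 8 mm voxels (12.5 mm^3 each).
voxel_mm3 = 1.25 * 1.25 * 8.0
lv_ed = np.zeros((10, 40, 40), dtype=bool)
lv_ed[:, :20, :] = True                      # 8000 voxels
lv_es = np.zeros((10, 40, 40), dtype=bool)
lv_es[:, :10, :] = True                      # 4000 voxels
myo_ed = np.zeros((10, 40, 40), dtype=bool)
myo_ed[:, 20:25, :] = True                   # 2000 voxels

edv = volume_ml(lv_ed, voxel_mm3)
esv = volume_ml(lv_es, voxel_mm3)
sv = edv - esv
ef = ejection_fraction(edv, esv)
myo_mass = volume_ml(myo_ed, voxel_mm3) * MYOCARDIAL_DENSITY

# Agreement between predicted and reference volumes (made-up numbers).
reference = np.array([100.0, 150.0, 200.0, 120.0])
predicted = np.array([102.0, 149.0, 203.0, 118.0])
pearson_r = np.corrcoef(predicted, reference)[0, 1]
print(edv, esv, sv, ef, myo_mass, pearson_r)
```

A correlation coefficient close to 1 only indicates a linear relationship between predicted and reference values; a systematic over- or under-estimation, such as the one noted for the RV, would not lower it.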

TABLE II.

Correlation between clinical parameters estimated using L-CO-Net segmentation and homologous parameters estimated from six other baseline segmentation methods (* (p < 0.1), ** (p < 0.01)).

Pearson’s Correlation Coefficient

| Parameter | UNet | DCN | MUNet | MNet | DNet | EUNet | L-CO-Net |
|---|---|---|---|---|---|---|---|
| LV EF | 0.987 | 0.988 | 0.988 | 0.989 | 0.989 | 0.991 | 0.997** |
| LV EDV | 0.997 | 0.993 | 0.995 | 0.993 | 0.997 | 0.997 | 0.998 |
| RV EF | 0.791 | 0.852 | 0.851 | 0.793 | 0.858 | 0.901 | 0.869 |
| RV EDV | 0.945 | 0.980 | 0.977 | 0.986 | 0.982 | 0.988 | 0.988 |
| Myo mass | 0.989 | 0.963 | 0.982 | 0.968 | 0.990 | 0.989 | 0.993* |

DCN: Dilated Convolution Network, MUNet: Modified 3D UNet, MNet: Modified M-Net, DNet: DenseNet, EUNet[19]: Ensemble UNet, L-CO-Net: Learned Condensation-Optimization Network.

Fig. 3 shows a graphical comparison between the clinical parameters estimated from the cardiac features segmented via L-CO-Net and the same homologous parameters estimated from the ground truth, manual segmentations, for both healthy volunteers and patients featuring various cardiac conditions. As shown, the clinical parameters estimated using our automatically segmented features show no significant difference from those estimated based on the ground truth, manually segmented features.

Fig. 3.


Graphical comparison between clinical parameters estimated using L-CO-Net segmentation and the same parameters estimated using the ground truth segmentation, in terms of Mean (Std. Dev.). EDV (in mL) = end-diastolic volume, ESV (in mL) = end-systolic volume, SV (in mL) = stroke volume, EF (%) = ejection fraction, MM (in g) = myocardial mass.

In terms of performance, as summarized in Table I, our proposed L-CO-Net segmentation framework entails roughly 340,000 parameters, which represents a more than 10-fold reduction from the UNet (~4.1 million parameters), a more than 50-fold reduction from MUNet (~19 million parameters), and a roughly 2-fold reduction from the most parameter-efficient baseline reported here, DNet (~650,000 parameters).

IV. Discussion and Conclusion

In this paper, we propose a new memory-efficient architecture for accurate LV, RV blood-pool and myocardium segmentation, and clinical parameter quantification from breath-hold cine cardiac MRI. The capability of our network to learn the group structure allows multiple groups to re-use the same features via condensed connectivity. Moreover, the efficient weight-pruning methods lead to high computational savings without compromising segmentation accuracy. To the best of our knowledge, this is the first paper that presents a learned condensation-optimization approach for estimating clinical parameters from cardiac image segmentation in a fully convolutional setting. Our analysis across both healthy and abnormal patients indicated that the segmentation and estimated clinical parameters show no statistically significant difference from the ground truth manual segmentation and the inherently estimated clinical parameters.

Our proposed model outperforms several of the best methods according to Dice scores, Hausdorff distances (HD), and clinical parameters. For the LV blood-pool, it achieves a Dice score of 96.8% with 7.9 mm HD in ED and 95.1% (6.4 mm) in ES, a relative improvement of at least 0.41% in the ED phase and 3.7% in the ES phase over the baseline methods, and of more than 5% in the ES phase over the traditional U-Net architecture (Table I). For LV-Myocardium segmentation, we achieved 89.5% (8.9 mm) in ED and 90.0% (8.9 mm) in ES, a relative improvement of 0.67% in ED and 0.22% in ES over DNet, with roughly half the parameters of DNet and more than a 10-fold reduction relative to U-Net.

To improve the robustness of the L-CO-Net framework, we used a low-level image pre-processing operation that serves as a preliminary segmentation and narrows the capture range of the subsequent deep learning segmentation and parameter estimation. Our experiments show that L-CO-Net runs on the ACDC dataset using 50% of the memory requirements of DenseNet and 8% of the memory requirements of U-Net, while still maintaining excellent clinical accuracy. We observed that the segmentation results for the RV did not improve as much as those for the LV or myocardium. One way to improve RV segmentation in future work would be to explore additional slice refinement and slice misalignment correction.

Acknowledgments

Research reported in this publication was supported by the National Institute of General Medical Sciences Award No. R35GM128877 of the National Institutes of Health, and the Office of Advanced Cyberinfrastructure Award No. 1808530 of the National Science Foundation.


References

[1] Kirillov Alexander, Girshick Ross, He Kaiming, and Dollár Piotr. Panoptic feature pyramid networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6399–6408, 2019.
[2] Ronneberger Olaf, Fischer Philipp, and Brox Thomas. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
[3] Avan Suinesiaputra et al. Fully-automated left ventricular mass and volume MRI analysis in the UK Biobank population cohort: evaluation of initial results. The International Journal of Cardiovascular Imaging, 34(2):281–291, 2018.
[4] Tran Phi Vu. A fully convolutional neural network for cardiac segmentation in short-axis MRI. arXiv preprint arXiv:1604.00494, 2016.
[5] Bai Wenjia et al. Automated cardiovascular magnetic resonance image analysis with fully convolutional networks. Journal of Cardiovascular Magnetic Resonance, 20(1):65, 2018.
[6] Ye Shaokai, Zhang Tianyun, Zhang Kaiqi, Li Jiayu, Xu Kaidi, Yang Yunfei, Yu Fuxun, Tang Jian, Fardad Makan, Liu Sijia, et al. Progressive weight pruning of deep neural networks using ADMM. arXiv preprint arXiv:1810.07378, 2018.
[7] Zhang Guodong, Wang Chaoqi, Xu Bowen, and Grosse Roger. Three mechanisms of weight decay regularization. arXiv preprint arXiv:1810.12281, 2018.
[8] Hinton Geoffrey, Vinyals Oriol, and Dean Jeff. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
[9] Huang Gao, Liu Shichen, Van der Maaten Laurens, and Weinberger Kilian Q. CondenseNet: An efficient DenseNet using learned group convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2752–2761, 2018.
[10] Friedman Jerome, Hastie Trevor, and Tibshirani Robert. A note on the group lasso and a sparse group lasso. arXiv preprint arXiv:1001.0736, 2010.
[11] Lin Xiang, Cowan Brett R, and Young Alistair A. Automated detection of left ventricle in 4D MR images: experience from a large study. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 728–735. Springer, 2006.
[12] Oksuz Ilkay et al. Automatic CNN-based detection of cardiac MR motion artefacts using k-space data augmentation and curriculum learning. Medical Image Analysis, 55:136–147, 2019.
[13] Bernard Olivier, Lalande Alain, Zotti Clement, Cervenansky Frederick, Yang Xin, Heng Pheng-Ann, Cetin Irem, Lekadir Karim, Camara Oscar, Gonzalez Ballester Miguel Angel, et al. Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: Is the problem solved? IEEE Transactions on Medical Imaging, 37(11):2514–2525, 2018.
[14] Patravali Jay, Jain Shubham, and Chilamkurthy Sasank. 2D-3D fully convolutional neural networks for cardiac MR segmentation. In International Workshop on Statistical Atlases and Computational Models of the Heart, pages 130–139. Springer, 2017.
[15] Wolterink Jelmer M, Leiner Tim, Viergever Max A, and Išgum Ivana. Automatic segmentation and disease classification using cardiac cine MR images. In International Workshop on Statistical Atlases and Computational Models of the Heart, pages 101–110. Springer, 2017.
[16] Baumgartner Christian F, Koch Lisa M, Pollefeys Marc, and Konukoglu Ender. An exploration of 2D and 3D deep learning techniques for cardiac MR image segmentation. In International Workshop on Statistical Atlases and Computational Models of the Heart, pages 111–119. Springer, 2017.
[17] Jang Yeonggul, Hong Yoonmi, Ha Seongmin, Kim Sekeun, and Chang Hyuk-Jae. Automatic segmentation of LV and RV in cardiac MRI. In International Workshop on Statistical Atlases and Computational Models of the Heart, pages 161–169. Springer, 2017.
[18] Khened Mahendra, Alex Kollerathu Varghese, and Krishnamurthi Ganapathy. Fully convolutional multi-scale residual DenseNets for cardiac segmentation and automated cardiac diagnosis using ensemble of classifiers. Medical Image Analysis, 51:21–45, 2019.
[19] Isensee Fabian, Jaeger Paul F, Full Peter M, Wolf Ivo, Engelhardt Sandy, and Maier-Hein Klaus H. Automatic cardiac disease assessment on cine-MRI via time-series segmentation and domain specific features. In International Workshop on Statistical Atlases and Computational Models of the Heart, pages 120–129. Springer, 2017.
