Author manuscript; available in PMC: 2021 Jun 8.
Published in final edited form as: Stat Atlases Comput Models Heart. 2021 Jan 29;2020:3–13. doi: 10.1007/978-3-030-68107-4_1

A Persistent Homology-Based Topological Loss Function for Multi-class CNN Segmentation of Cardiac MRI

Nick Byrne 1,2, James R Clough 2, Giovanni Montana 3, Andrew P King 2
PMCID: PMC7610940  EMSID: EMS124653  PMID: 34109327

Abstract

With respect to spatial overlap, CNN-based segmentation of short axis cardiovascular magnetic resonance (CMR) images has achieved a level of performance consistent with inter observer variation. However, conventional training procedures frequently depend on pixel-wise loss functions, limiting optimisation with respect to extended or global features. As a result, inferred segmentations can lack spatial coherence, including spurious connected components or holes. Such results are implausible, violating the anticipated topology of image segments, which is frequently known a priori. Addressing this challenge, published work has employed persistent homology, constructing topological loss functions for the evaluation of image segments against an explicit prior. Building a richer description of segmentation topology by considering all possible labels and label pairs, we extend these losses to the task of multi-class segmentation. These topological priors allow us to resolve all topological errors in a subset of 150 examples from the ACDC short axis CMR training data set, without sacrificing overlap performance.

Keywords: Image segmentation, CNN, Topology, MRI

1. Introduction

Medical image segmentation is a prerequisite of many pipelines dedicated to the analysis of clinical data. In cardiac magnetic resonance (CMR) imaging, segmentation of short axis cine images into myocardial, and left and right ventricular components permits quantitative assessment of important clinical indices derived from ventricular volumes [17]. Motivated by the burden of manual segmentation, researchers have sought automated solutions to this task.

To this end, deep learning, and in particular convolutional neural networks (CNNs), have brought significant progress [4]. One key to their success has been the design of specialised architectures dedicated to image segmentation. Theoretically, these permit the learning of image features related to extended spatial context, such as anatomical morphology.

Whilst considerable effort has gone into investigating methods for the extraction of multi-scale image features, less attention has been paid to their role in network optimisation [7]. Instead, CNNs have often been trained using pixel-wise loss functions based on cross-entropy (CE) or the Dice Similarity Coefficient (DSC). Though having favourable numerical properties, these are insensitive to higher order image features such as topology. In the absence of this information, CNN optimisation can result in predicted segmentations with unrealistic properties such as spurious connected components or holes [15]. These errors can appear nonsensical, violating the most basic features of image segments. If small, such errors may not preclude a high spatial overlap with the ground truth and may have little consequence for certain clinical indices. However, a wider array of downstream applications, such as biophysical modelling or 3D printing, demands a high fidelity representation of such features.

In short axis CMR, prior knowledge dictates that the right ventricular cavity is bound to the left ventricular myocardium, which in turn fully surrounds the left ventricular blood pool. In contrast to pixel-wise objectives, this is a global description of segmentation coherence. However, whilst these constraints are simple to express qualitatively, the opaque nature of CNNs has made it difficult to explicitly exploit such prior information in model optimisation.

1.1. Related Work

At least in the context of large, homogeneous training datasets, CNN-based short axis segmentation has achieved a level of performance consistent with inter observer variation [4]. However, in studies of cardiovascular disease (for which datasets typically contain fewer subjects who exhibit greater morphological variation), a deficit remains. This gap is in part characterised by the anatomically implausible errors described.

To address these modes of failure, previous works have sought to leverage prior information. The combination of deep learning with atlas-based segmentation [7] and active contour refinement [1] have both been investigated. Whilst these extensions improve performance, their capacity to represent pathological variation in image features is limited. Accordingly, others have injected prior information directly into CNN optimisation, developing a supervisory signal from a learned, latent distribution of plausible anatomy [13]. Their implicit embedding, however, hinders an understanding of the extent to which such priors are related to morphology or topology as claimed. Bridging this gap, Painchaud et al. augmented the latent space via a rejection sampling procedure, maintaining only those cases satisfying sixteen criteria related to anatomical plausibility [15].

Eleven of Painchaud et al.’s criteria concern anticipated anatomical topology. Structured losses have previously been designed to capture aspects of segmentation topology including hierarchical class containment [2] and adjacency [9,11]. More recently, CNN-based segmentation has benefited from the global, exhaustive and robust description of topology provided by persistent homology (PH). PH admits the construction of topological loss functions which, in contrast to those built on a latent representation of anatomical shape, allow evaluation against an explicit topological prior. Applications have included segmenting the tree-like topology of the murine neurovasculature [10]; and the toroidal topology of the myocardium in short axis CMR [5,6].

1.2. Contributions

To the best of our knowledge, no PH-based loss function has been proposed for the task of multi-class segmentation. Compared with the binary case, extension to this setting considers a richer set of topological priors, including hierarchical class containment and adjacency. Short axis CMR segmentation is a useful test bed for this task, not only for its clinical significance, but also for its economic representation of such priors in a well-defined, four-class problem. In this context, our contributions are as follows:

  1. We propose a novel topological loss function, based on PH, for the task of multi-class image segmentation.

  2. We use the novel loss function for CNN-based segmentation of short axis CMR images from the ACDC dataset.

  3. We demonstrate significant improvement in segmentation topology without degradation in spatial overlap performance.

2. Materials and Methods

We address a multi-class segmentation task, seeking a meaningful division of the 2D short axis CMR image X : ℝ × ℝ → ℝ into background, right and left ventricular cavities and left ventricular myocardium (see Footnote 1). We denote the ground truth image segmentation by Y : ℝ × ℝ → {0, 1}^4, made up of four mutually exclusive class label maps: Ybg, Yrv, Ymy and Ylv. We consider a deep learning solution, optimising the parameters, θ, of a CNN to infer the probabilistic segmentation, Ŷ : ℝ × ℝ → [0, 1]^4, a distribution over the per-class segmentation maps: Ŷbg, Ŷrv, Ŷmy and Ŷlv. We write segmentation inference as Ŷ = f(X; θ). Given the success of CNN-based solutions, we assume that, at least with respect to spatial overlap, Ŷ is a reasonable estimate of Y. In this setting we describe our CNN post-processing framework for the correction of inferred segmentation topology.

2.1. Multi-class Topological Priors

In 2D, objects with differing topology can be distinguished by the first two Betti numbers: b = (b0, b1). Intuitively, b0 counts the number of connected components which make up an object, and b1 the number of holes it contains [14]. The Betti numbers are topological invariants permitting the specification of priors for the description of foreground image segments. Consider our short axis example:

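$$b^{rv} = (1, 0), \quad b^{my} = (1, 1), \quad b^{lv} = (1, 0) \tag{1}$$

$$b^{rv \cup my} = (1, 1), \quad b^{rv \cup lv} = (2, 0), \quad b^{my \cup lv} = (1, 0) \tag{2}$$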

Equation set (1) specifies that each of the right ventricle, myocardium and left ventricle should comprise a single connected component, and that the myocardium should contain a single hole. However, these equations only provide a topological specification in a segment-wise, binary fashion: they fail to capture inter-class topological relationships between cardiovascular anatomy. For instance, they make no specification that the myocardium surround the left ventricular cavity or that the right ventricle and myocardium should be adjacent.
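As an illustration of how the Betti numbers in Equation sets (1) and (2) can be read off a discrete label map, the following is a minimal sketch. It assumes 8-connectivity for foreground pixels and 4-connectivity for background pixels (a common digital topology convention, not one stated in the text), and counts holes as enclosed background components. The function names and the toy label map are hypothetical and serve only as an example.

```python
from collections import deque

def count_components(mask, connectivity):
    """Count connected components of True pixels (4- or 8-connectivity)."""
    rows, cols = len(mask), len(mask[0])
    if connectivity == 4:
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:
        steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                 if (dr, dc) != (0, 0)]
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one component
                    cr, cc = queue.popleft()
                    for dr, dc in steps:
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and mask[nr][nc] and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
    return count

def betti_numbers(mask):
    """Return (b0, b1) for a 2D binary mask given as a list of lists."""
    b0 = count_components(mask, 8)
    # Pad with a background border so the outer background is one component;
    # every additional background component is a hole in the foreground.
    cols = len(mask[0])
    padded = ([[False] * (cols + 2)]
              + [[False] + list(row) + [False] for row in mask]
              + [[False] * (cols + 2)])
    background = [[not v for v in row] for row in padded]
    b1 = count_components(background, 4) - 1
    return (b0, b1)

# Toy short axis label map (assumed): 0 = bg, 1 = rv, 2 = my, 3 = lv.
TOY = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 2, 2, 2, 2, 2, 0],
    [0, 1, 1, 2, 3, 3, 3, 2, 0],
    [0, 1, 1, 2, 3, 3, 3, 2, 0],
    [0, 1, 1, 2, 3, 3, 3, 2, 0],
    [0, 0, 0, 2, 2, 2, 2, 2, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]

def mask_of(*labels):
    """Binary mask of the union of the given labels in the toy map."""
    return [[v in labels for v in row] for row in TOY]

print(betti_numbers(mask_of(2)))     # myocardium alone
print(betti_numbers(mask_of(1, 3)))  # union of rv and lv
```

On this toy map the individual and paired label masks reproduce the values listed in Equation sets (1) and (2); a segmentation with a spurious island or a broken myocardial ring would not.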

By the inclusion-exclusion principle, the topology of a multi-class image segmentation is characterised by that of all foreground objects and all possible object pairs: see Equation set (2). For convenience, we collect Equation sets (1) and (2) into a 3D Betti array B : {1, 2, 3} × {1, 2, 3} × {0, 1} → ℝ. Each element Bdij (see Footnote 2) denotes the Betti number of dimension d for the ground truth segmentation Yi⋃j (see Footnote 3). Vitally, even in the absence of the ground truth, B can be determined by prior knowledge of the anatomy to be segmented.
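One way to hold the Betti array B in code is sketched below. The storage layout (a nested list indexed as B[d][i][j], zero-based, with the class order rv, my, lv) and the helper name are assumptions made for illustration; the numerical values are those of Equation sets (1) and (2).

```python
# Class order is an assumption: index 0 = rv, 1 = my, 2 = lv
# (classes 1, 2, 3 in the text).
CLASSES = ("rv", "my", "lv")

# (b0, b1) for each individual label and each label pair.
PRIORS = {
    ("rv", "rv"): (1, 0), ("my", "my"): (1, 1), ("lv", "lv"): (1, 0),
    ("rv", "my"): (1, 1), ("rv", "lv"): (2, 0), ("my", "lv"): (1, 0),
}

def build_betti_array(priors, classes=CLASSES):
    """Return B as a nested list where B[d][i][j] is the Betti number of
    dimension d for the union of classes i and j."""
    n = len(classes)
    B = [[[0] * n for _ in range(n)] for _ in range(2)]
    for (ci, cj), betti in priors.items():
        i, j = classes.index(ci), classes.index(cj)
        for d in (0, 1):
            B[d][i][j] = betti[d]
            B[d][j][i] = betti[d]  # the union of two classes is symmetric
    return B

B = build_betti_array(PRIORS)
print(B[0][0][2])  # b0 of rv union lv
print(B[1][1][1])  # b1 of the myocardium alone
```

The diagonal entries (i = j) carry the single-class priors, and the off-diagonal entries carry the paired priors, so only the entries with j ≥ i need to be visited when summing a loss over labels and label pairs.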

2.2. Topological Loss Function

To expose topological features we apply PH (see [14] for a theoretical background). For a practical understanding of the topological loss described, the results of PH analyses are most easily appreciated by inspection of persistence barcodes. The PH barcode summarises the topological features present within data. However, rather than providing a single topological description, the barcode returns a dynamic characterisation of the way that the topology evolves as a function of some scale parameter. More concretely, in our context a barcode reflects the topology of a probabilistic segmentation Ŝ : ℝ × ℝ → [0, 1], binarised at all possible probability thresholds in the range [0, 1]. As the threshold, p, reduces, the barcode diagram tracks the evolving topology of the binarised segmentation Ŝp.

Critical values of p admit changes in the topological features of Ŝp. In Fig. 1, such values are indicated by the endpoints of each bar. Accordingly, the persistence of a topological feature Δp is the length of its associated bar. Moreover, the presentation of each bar indicates the topological dimension of the feature shown: solid bars are connected components; open bars are loops. Persistent bars are considered robust to small perturbations, suggesting that they are true topological features of the data. Hence, in Figs. 1 and 2, we consider barcodes in order of descending lifetime after grouping by topological dimension. From the persistence barcode of the probabilistic segmentation Ŝ, we write the lifetime of the lth most persistent feature of dimension d as Δpd,l(Ŝ).

Fig. 1. Construction of the PH barcode. The barcode reflects the topological features of the probabilistic segmentation Ŝ, when binarised at all possible probability thresholds in the interval [0, 1]. At a particular p: the number of vertically intersected solid bars counts connected components (d = 0); open bars count the number of loops (d = 1). Additionally, each bar is labelled with its topological dimension, and its persistence ranking in order of descending lifetime: d, l.
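To make the barcode construction more tangible, here is a minimal sketch that computes the lifetimes of connected components (d = 0 only) as the threshold p falls from 1 to 0. It is not the differentiable implementation used in the paper: it assumes 4-connected pixels, treats a component that never merges as dying at p = 0, and ignores loops (d = 1) entirely. The function name is hypothetical.

```python
def zero_dim_lifetimes(prob):
    """Lifetimes of d = 0 features of a 2D probability map (list of lists),
    found by adding pixels in order of descending probability and merging
    components with a union-find structure."""
    rows, cols = len(prob), len(prob[0])
    order = sorted(((prob[r][c], r, c) for r in range(rows) for c in range(cols)),
                   key=lambda t: -t[0])
    parent, birth, lifetimes = {}, {}, []

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for value, r, c in order:
        node = (r, c)
        parent[node] = node
        birth[node] = value
        for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if nb in parent:
                a, b = find(node), find(nb)
                if a != b:
                    # Elder rule: the component born at the lower threshold
                    # dies when the two merge at the current threshold.
                    young, old = (a, b) if birth[a] < birth[b] else (b, a)
                    life = birth[young] - value
                    if life > 0:
                        lifetimes.append(life)
                    parent[young] = old
    # Components that never merged persist down to p = 0 (assumption).
    roots = {find(n) for n in parent}
    lifetimes.extend(birth[root] for root in roots)
    return sorted(lifetimes, reverse=True)

# Two peaks separated by a low-probability pixel give two bars.
print(zero_dim_lifetimes([[0.9, 0.2, 0.6]]))
```

In the toy example, the peak at 0.9 survives to p = 0, while the peak at 0.6 is absorbed once the threshold reaches 0.2, so the second bar is shorter. Sorting in descending order gives the ranking by persistence used in the text.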

Fig. 2. Construction of the loss Ltopo. Each probabilistic segmentation (Ŷi or Ŷi⋃j) is accompanied by its associated persistence barcode (for clarity, only features with a lifetime Δpd,l ≥ 0.05 are shown). Ltopo weighs the persistence of topological features which match the topological description (Adij; depicted as green bars) against those which do not (Zdij; depicted as red bars). To sensitise Ltopo to multi-class label map topology, the calculation is repeated for, and summed over, all topological dimensions (d), and individual and paired label sets (i, j ≥ i).

Importantly, topological persistence can be determined in a fashion that is differentiable and consistent with gradient-based learning [8]. This permits the construction of topological loss functions, exposing the differences between Ŷ and our prior specification B. Key to our formulation is the choice of the probabilistic segmentation Ŝ, from which we extract topological features. To align with the theory of Sect. 2.1, we consider the persistence barcode for all foreground class labels and class label pairs (see Fig. 2):

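$$L_{topo} = \sum_{d,\, i,\, j \geq i} \left( B_d^{ij} - A_d^{ij} + Z_d^{ij} \right), \quad A_d^{ij} = \sum_{l=1}^{B_d^{ij}} \Delta p_{d,l}\left(\hat{Y}_{i \cup j}\right), \quad Z_d^{ij} = \sum_{l=B_d^{ij}+1}^{\infty} \Delta p_{d,l}\left(\hat{Y}_{i \cup j}\right) \tag{3}$$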

Adij evaluates the total persistence of the Bdij longest, d-dimensional bars for the probabilistic union of segmentations for classes i and j, Ŷi⋃j (see Footnote 3). Assuming that the inferred segmentation closely approximates the ground truth, and recalling that l ranks topological features in descending order of persistence, Adij measures the presence of anatomically meaningful topological features. Zdij evaluates the persistence of spurious topological features that are superfluous to Bdij. Summing over all topological dimensions d, and considering all class labels (i, j = i) and class label pairs (i, j > i), optimising Ltopo maximises the persistence of topological features which match the prior specification, and minimises that of features which do not.
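The arithmetic of Eq. (3) can be sketched as follows, taking the bar lifetimes as given. This is plain, non-differentiable Python intended only to show how A, Z and the prior combine; the function names and the dictionary keys (d, i, j) are hypothetical, and the paper computes the lifetimes themselves with a differentiable persistence layer [8].

```python
def topo_term(lifetimes, betti_prior):
    """One summand of Eq. (3): B - A + Z for a single (d, i, j)."""
    ordered = sorted(lifetimes, reverse=True)  # rank bars by persistence
    matched = sum(ordered[:betti_prior])       # A: the B longest bars
    spurious = sum(ordered[betti_prior:])      # Z: all remaining bars
    return betti_prior - matched + spurious

def topological_loss(barcodes, priors):
    """Sum the per-barcode terms over every key in the prior.

    barcodes[(d, i, j)] is a list of lifetimes for dimension d and the union
    of classes i and j; priors[(d, i, j)] is the matching Betti number."""
    return sum(topo_term(barcodes.get(key, []), betti)
               for key, betti in priors.items())

# One expected component with a strong bar, plus one spurious bar.
print(topo_term([0.9, 0.4], 1))
```

Since lifetimes lie in [0, 1], each term is zero only when exactly Bdij bars have lifetime 1 and no other bars are present, which is the sense in which the loss rewards expected features and penalises spurious ones.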

As in the single class formulation presented in [5], Ltopo is used to guide test time adaptation of the weights of a pre-trained CNN f(X; θ), seeking an improvement in inferred segmentation topology. A new set of network parameters θn is learned for the individual test case Xn. However, since topology is a global property, there are many segmentations that potentially minimise Ltopo. Hence, where Vn is the number of pixels in Xn, a similarity constraint limits test time adaptation to the minimal set of modifications necessary to align the segmentation and the topological prior, B:

$$L_{TP} = L_{topo}\left(f(X_n; \theta_n), B\right) + \frac{\lambda}{V_n} \left| f(X_n; \theta) - f(X_n; \theta_n) \right|^2 \tag{4}$$
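A small numerical sketch of Eq. (4) follows. It assumes that the squared norm is the sum of squared differences over all pixels and classes, and that the topological term has already been evaluated; the function name and the data layout (one list of class probabilities per pixel) are hypothetical.

```python
def post_processing_loss(topo_loss, original, adapted, lam=1000.0):
    """Evaluate L_TP = L_topo + (lambda / V_n) * |f(X; theta) - f(X; theta_n)|^2.

    original, adapted: lists with one entry per pixel, each entry a list of
    class probabilities from the pre-trained and the adapted network."""
    n_pixels = len(original)  # V_n
    squared_diff = sum((o - a) ** 2
                       for orig_px, adapt_px in zip(original, adapted)
                       for o, a in zip(orig_px, adapt_px))
    return topo_loss + lam / n_pixels * squared_diff

# Two pixels, two classes: only the first pixel changes during adaptation.
original = [[0.2, 0.8], [0.5, 0.5]]
adapted = [[0.3, 0.7], [0.5, 0.5]]
print(post_processing_loss(0.5, original, adapted))
```

With a large λ, even small departures from the original prediction are costly, so adaptation is pushed towards the few pixels whose change alters the topology.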

2.3. Implementation

We apply our loss to a topologically consistent subset of the ACDC [3] training data. Ignoring irregular anatomical appearances at apex and base, we extract three mid ventricular slices from each short axis stack, including end diastolic and systolic frames. This yields a dataset of 600 examples from 100 subjects. As per the winning submission to the ACDC Challenge, all image-label pairs were resampled to an isotropic pixel spacing of 1.25 mm (less than the mean and median spatial resolution of the training data) and normalised to have zero mean and unit variance [12]. Subjects were randomly divided between training, validation and test sets in the ratio 2:1:1, stratified by diagnostic group according to ACDC classification.

A U-Net model [16] was trained using CE loss and the combined training and validation set of 450 examples, for 16,000 iterations. We used the Adam optimiser with a learning rate of 10⁻³. Each minibatch contained ten patches of size 352 by 352, randomly cropped from ten different patients, zero padding where necessary. Data augmentation applied random rotations between ±15°. Graphics processing unit (GPU)-accelerated training took nine hours.

Topological post-processing was performed on the inferred multi-class segmentations of the held-out test set. This sought to minimise LTP for the topological priors expressed by Equation sets (1) and (2), and summarised in B. In Eq. (4) we used a value of λ = 1000. Test time adaptation used the Adam optimiser with a learning rate of 10⁻⁵ for 100 iterations. In the worst case, topological post-processing required six minutes per short axis slice.

All experiments were implemented using PyTorch, making use of the Topolayer package introduced in [8] for the computation of topological persistence. The reported hyperparameters for both supervised training and topological post-processing were optimised using the validation set of 150 examples.

3. Results and Discussion

We assess multi-class topological post-processing against several baselines. Each applies a variant of connected component analysis (CCA) or topological post-processing (TP) to the segmentation inferred by the U-Net trained in a fully supervised fashion (UNet). In all cases, a discrete segmentation is finally achieved by the set of labels which maximise inferred probability on a pixel-wise basis:

UNet: the output of the U-Net trained in a supervised manner.
UNet + CCA: the largest connected components for each foreground label.
UNet + TPi,j=i: our topological loss based on individual class labels only.
UNet + TPi,j≥i: our topological loss based on individual and paired labels.
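The discretisation step and the CCA baseline can be sketched as below. This is a plain Python illustration rather than the authors' code: it assumes 4-connectivity, assumes that pixels removed by CCA are reassigned to background, and uses hypothetical function names and label indices (0 = background, 1 to 3 = foreground).

```python
from collections import deque

def argmax_labels(prob_maps):
    """prob_maps[k][r][c] is the probability of class k at pixel (r, c);
    return the per-pixel label of maximum probability."""
    n_classes = len(prob_maps)
    rows, cols = len(prob_maps[0]), len(prob_maps[0][0])
    return [[max(range(n_classes), key=lambda k: prob_maps[k][r][c])
             for c in range(cols)] for r in range(rows)]

def components(labels, target):
    """4-connected components of pixels equal to target, as lists of (r, c)."""
    rows, cols = len(labels), len(labels[0])
    seen, found = set(), []
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] == target and (r, c) not in seen:
                seen.add((r, c))
                queue, comp = deque([(r, c)]), []
                while queue:
                    cr, cc = queue.popleft()
                    comp.append((cr, cc))
                    for nr, nc in ((cr - 1, cc), (cr + 1, cc),
                                   (cr, cc - 1), (cr, cc + 1)):
                        if (0 <= nr < rows and 0 <= nc < cols
                                and labels[nr][nc] == target
                                and (nr, nc) not in seen):
                            seen.add((nr, nc))
                            queue.append((nr, nc))
                found.append(comp)
    return found

def keep_largest_components(labels, foreground=(1, 2, 3), background=0):
    """For each foreground label keep only its largest component; smaller
    components are relabelled as background (an assumption)."""
    cleaned = [row[:] for row in labels]
    for target in foreground:
        comps = sorted(components(labels, target), key=len, reverse=True)
        for comp in comps[1:]:
            for r, c in comp:
                cleaned[r][c] = background
    return cleaned

noisy = [[1, 0, 1],
         [1, 0, 0],
         [0, 2, 2]]
print(keep_largest_components(noisy))
```

Note that this kind of filtering can only delete components. It cannot close a gap in the myocardial ring or fill a hole, which is why it is insensitive to errors in the number of loops.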

Table 1 presents results on the held-out test set. Spatial overlap is quantified by DSC, averaged across cardiac phases. Topological performance is assessed as the proportion of cases in which the discrete segmentation demonstrated the correct multi-class topology: inferred and ground truth segmentations shared the same set of Betti numbers for individual and paired labels. Finally, we summarise the effect of any post-processing on spatial overlap by the change in DSC.
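For reference, the two kinds of metric described above can be sketched as follows. The DSC formula is standard; the convention of returning 1 when both masks are empty, the function names, and the representation of each case's topology as a dictionary of Betti number pairs are assumptions for illustration.

```python
def dice(pred, truth):
    """Dice Similarity Coefficient between two flat binary masks."""
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 2.0 * overlap / total if total else 1.0

def topological_accuracy(predicted_betti, prior_betti):
    """Proportion of cases whose Betti numbers match the prior for every
    individual and paired label.

    predicted_betti: one dict per case, mapping a label or label-pair key
    to (b0, b1); prior_betti: a dict with the same keys."""
    correct = sum(1 for case in predicted_betti if case == prior_betti)
    return correct / len(predicted_betti)

prior = {"my": (1, 1), "lv": (1, 0)}
cases = [{"my": (1, 1), "lv": (1, 0)},   # correct topology
         {"my": (1, 0), "lv": (1, 0)}]   # broken myocardial ring
print(dice([1, 1, 0, 0], [1, 0, 0, 0]))
print(topological_accuracy(cases, prior))
```

The second case illustrates why the two metrics can disagree: a single missing myocardial pixel barely changes the DSC, yet it changes b1 and so counts as a topological error.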

Table 1.

Segmentation results on held-out test set. Spatial overlap: Dice Similarity Coefficient (DSC) per class; the average over classes (μ); and the change induced by post-processing (Δμ). Topological accuracy: T is the proportion of test images with the correct multi-class topology. σ is the standard deviation.

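| Method | DSC rv (σ) | DSC my (σ) | DSC lv (σ) | DSC μ (σ) | DSC Δμ (σ) | T (σ) |
|---|---|---|---|---|---|---|
| UNet | 0.891 (0.115) | 0.885 (0.050) | 0.954 (0.039) | 0.910 (0.045) | - | 0.853 (0.032) |
| +CCA | 0.892 (0.113) | 0.886 (0.049) | 0.954 (0.039) | 0.911 (0.044) | 0.001 (0.004) | 0.927 (0.024) |
| +TPi,j=i | 0.889 (0.130) | 0.887 (0.048) | 0.954 (0.039) | 0.910 (0.049) | 0.000 (0.012) | 0.980 (0.013) |
| +TPi,j≥i | 0.889 (0.129) | 0.888 (0.048) | 0.954 (0.039) | 0.910 (0.049) | 0.000 (0.011) | 1.000 (0.000) |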

Table 1 confirms that pixel-wise metrics of spatial overlap do not reliably predict topological performance. Trained by CE, the supervised model infers a segmentation with incorrect topology in almost 15% of cases. Whilst spurious connected components alone account for approximately 50% of U-Net errors, higher dimensional topological errors are not insignificant. As commonly employed, CCA is insensitive to such features, resolving spurious components by their discrete removal. Our method takes a probabilistic approach to the treatment of priors, modifying inferred segmentations by CNN parameter update. This permits expressive topological refinement, as illustrated in Fig. 3.

Fig. 3. Topological post-processing enables expressive correction of U-Net errors.


Optimisation with respect to topological priors for individual labels resolves half of high dimensional errors. This approach, +TPi,j=i, reflects the naive extension of the binary segmentation method outlined in [5]. However, by exact binomial test, and after Bonferroni correction, there was no statistically significant difference (95% confidence) in the proportion of topological errors between +TPi,j=i and +CCA (p = 0.025 × 6 = 0.15). The best performing scheme is our proposed model, +TPi,j≥i, which considers priors for all individual and paired labels. The incremental benefit of +TPi,j≥i is shown in Fig. 4. Significantly, and without degradation in DSC, our approach resolves all topological errors which remain after CCA (p = 0.001 × 6 = 0.006).

Fig. 4. Multi-class topological priors capture a rich topological description.


Compared with losses based on a learned latent space of plausible anatomical shapes [13,15], PH-based loss functions allow optimisation with respect to an explicit topological prior. This is beneficial in terms of interpretability and in low data settings. However, PH-based losses also allow topological prior information to be decoupled from its appearance in training data, on which a learned distribution is necessarily biased. We speculate that additional biases may degrade performance when used to refine the segmentation of out-of-sample test data. Favourably, PH-based priors permit even-handed topological post-processing in the presence of pathology-induced structural variation. At the same time, we recognise the potential complementarity of these approaches if explicit topological specification could be enhanced with learned shape priors.

More generally, the PH losses described are limited by their reliance on an expert-provided prior. Demonstrating our approach on mid-ventricular slices permitted the consistent application of Equation sets (1) and (2). Allowing for a slice-wise specification, the same approach could equally be applied to apical and basal slices, including associated topological changes. At present this would necessitate operator intervention. However, we also observe that this requirement is an artefact of CMR short axis acquisition. Given 3D cine data, we could specify priors which truly reflect the topology of anatomy rather than its appearance within a 2D acquisition.

4. Conclusion

We have extended PH-based losses to the task of multi-class, CNN-based image segmentation. Leveraging an enriched topological prior, including high dimensional and multi-class features, this approach improved segmentation topology in a post-processing framework. Future work will seek to understand its limits with respect to the fidelity of CNN-based segmentation; its application within weakly supervised learning; and its extension to more challenging targets.

Acknowledgments

N. Byrne—Funded by a National Institute for Health Research (NIHR), Doctoral Research Fellowship for this research project. This report presents independent research funded by the NIHR. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care.

Footnotes

The authors have no conflicts of interest to disclose.

1. The semantic classes of this task match those of the ACDC image segmentation challenge [3]. Whilst neither considers the right ventricular myocardium, this could easily be incorporated into the framework set out in Sects. 2.1 and 2.2.

2. In Bdij we divide indices between sub and super scripts to make clear the difference between class labels (i, j) and topological dimension (d), without further significance.

3. We use the union operator (⋃) to combine individual classes of a multi-class segmentation. When applied to a binary segmentation, Yi⋃j is the pixel-wise Boolean union of classes i and j. When applied to a probabilistic segmentation, Ŷi⋃j is the pixel-wise probability of class i or j. We consider the union of a class with itself to be the segmentation of the single class: Yi⋃j=i = Yi and Ŷi⋃j=i = Ŷi.

References

  1. Avendi MR, Kheradvar A, Jafarkhani H. Automatic segmentation of the right ventricle from cardiac MRI using a learning-based approach. Magn Reson Med. 2017;78(6):2439–2448. doi: 10.1002/mrm.26631.
  2. BenTaieb A, Hamarneh G. Topology aware fully convolutional networks for histology gland segmentation. In: Ourselin S, Joskowicz L, Sabuncu MR, Unal G, Wells W, editors. MICCAI 2016, Part II. LNCS. Vol. 9901. Springer; Cham: 2016. pp. 460–468.
  3. Bernard O, et al. Deep learning techniques for automatic MRI cardiac multistructures segmentation and diagnosis: is the problem solved? IEEE Trans Med Imaging. 2018;37(11):2514–2525. doi: 10.1109/TMI.2018.2837502.
  4. Chen C, et al. Deep learning for cardiac image segmentation: a review. Front Cardiovasc Med. 2020;7:25. doi: 10.3389/fcvm.2020.00025.
  5. Clough JR, Byrne N, Oksuz I, Zimmer VA, Schnabel JA, King AP. A topological loss function for deep-learning based image segmentation using persistent homology. IEEE Trans Pattern Anal Mach Intell. 2020;1 doi: 10.1109/TPAMI.2020.3013679.
  6. Clough JR, Oksuz I, Byrne N, Schnabel JA, King AP. Explicit topological priors for deep-learning based image segmentation using persistent homology. In: Chung ACS, Gee JC, Yushkevich PA, Bao S, editors. IPMI 2019. LNCS. Vol. 11492. Springer; Cham: 2019. pp. 16–28.
  7. Duan J, et al. Automatic 3D bi-ventricular segmentation of cardiac images by a shape-refined multi-task deep learning approach. IEEE Trans Med Imaging. 2019;38(9):2151–2164. doi: 10.1109/TMI.2019.2894322.
  8. Gabrielsson RB, Nelson BJ, Dwaraknath A, Skraba P. A topology layer for machine learning. International Conference on Artificial Intelligence and Statistics; 2020. pp. 1553–1563.
  9. Ganaye P-A, Sdika M, Benoit-Cattin H. Semi-supervised learning for segmentation under semantic constraint. In: Frangi AF, Schnabel JA, Davatzikos C, Alberola-López C, Fichtinger G, editors. MICCAI 2018, Part III. LNCS. Vol. 11072. Springer; Cham: 2018. pp. 595–602.
  10. Haft-Javaherian M, Villiger M, Schaffer CB, Nishimura N, Golland P, Bouma BE. A topological encoding convolutional neural network for segmentation of 3D multiphoton images of brain vasculature using persistent homology. Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops; 2020. pp. 990–991.
  11. He Y, et al. Fully convolutional boundary regression for retina OCT segmentation. In: Shen D, et al., editors. MICCAI 2019, Part I. LNCS. Vol. 11764. Springer; Cham: 2019. pp. 120–128.
  12. Isensee F, Jaeger PF, Full PM, Wolf I, Engelhardt S, Maier-Hein KH. Automatic cardiac disease assessment on cine-MRI via time-series segmentation and domain specific features. In: Pop M, et al., editors. STACOM 2017. LNCS. Vol. 10663. Springer; Cham: 2018. pp. 120–129.
  13. Oktay O, et al. Anatomically constrained neural networks (ACNNs): application to cardiac image enhancement and segmentation. IEEE Trans Med Imaging. 2018;37(2):384–395. doi: 10.1109/TMI.2017.2743464.
  14. Otter N, Porter MA, Tillmann U, Grindrod P, Harrington HA. A roadmap for the computation of persistent homology. EPJ Data Sci. 2017;6(1):1–38. doi: 10.1140/epjds/s13688-017-0109-5.
  15. Painchaud N, Skandarani Y, Judge T, Bernard O, Lalande A, Jodoin PM. Cardiac segmentation with strong anatomical guarantees. IEEE Trans Med Imaging. 2020;39:3703–3713. doi: 10.1109/TMI.2020.3003240.
  16. Ronneberger O, Fischer P, Brox T. U-net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF, editors. MICCAI 2015, Part III. LNCS. Vol. 9351. Springer; Cham: 2015. pp. 234–241.
  17. Ruijsink B, et al. Fully automated, quality-controlled cardiac analysis from CMR. JACC Cardiovasc Imaging. 2019. doi: 10.1016/j.jcmg.2019.05.030.
