Author manuscript; available in PMC: 2024 May 13.
Published in final edited form as: Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2023 Aug 22;2023:3323–3333. doi: 10.1109/cvpr52729.2023.00324

Topology-Guided Multi-Class Cell Context Generation for Digital Pathology

Shahira Abousamra 1, Rajarsi Gupta 2, Tahsin Kurc 2, Dimitris Samaras 1, Joel Saltz 2, Chao Chen 2
PMCID: PMC11090253  NIHMSID: NIHMS1987947  PMID: 38741683

Abstract

In digital pathology, the spatial context of cells is important for cell classification, cancer diagnosis, and prognosis. Modeling such complex cell context, however, is challenging. Cells form different mixtures, lineages, clusters, and holes. To model such structural patterns in a learnable fashion, we introduce several mathematical tools from spatial statistics and topological data analysis. We incorporate these structural descriptors into a deep generative model as both conditional inputs and a differentiable loss. This way, we are able to generate high-quality multi-class cell layouts for the first time. We show that the topology-rich cell layouts can be used for data augmentation and improve the performance of downstream tasks such as cell classification.

1. Introduction

Deep learning has advanced our ability to analyze digital pathology images. Deep-learning-based methods have achieved impressive performance in various tasks, including but not limited to cell detection and classification [2, 23, 24, 52], nuclei instance segmentation [8, 18, 19, 21, 26, 32-34, 42, 51], survival prediction and patient outcome [1, 28, 30, 49], interpretation of multiplex immunohistochemistry and immunofluorescence imagery [3, 14-16], and many others.

Despite the rapid progress in recent years, pathology image analysis still suffers from limited observations. Annotated images remain scarce relative to the highly heterogeneous and complex tumor microenvironment driven by numerous biological factors. This limitation in training data constrains a learning algorithm's predictive power. To this end, one solution is to train generative models that can generate realistic pathology images to augment existing data. Generative models have been proposed to help learning methods in various tasks such as nuclei segmentation [7, 21], survival prediction [44], and cancer grade prediction [47].

Generating pathology images usually involves two steps: (1) generating the spatial layout of cells and (2) filling in stains and textures inside and outside the nuclei masks. Most existing methods focus only on the second step. They either generate random cell positions [21] or directly copy nuclei masks from existing images [17]. These methods miss the opportunity to learn the rich cell spatial context carrying critical information about cancer biology.

Spatial context includes how different types of cells (tumor, lymphocyte, stromal, etc.) are distributed around each other, as well as how they form structural patterns such as clusters, holes, and lineages. Plenty of evidence has demonstrated the importance of spatial context in cancer diagnosis and prognosis [31, 53]. One good example is the clinical significance of tumor-infiltrating lymphocytes (TILs), i.e., lymphocytes residing within the border of invasive tumors [37-40]. The spatial distribution of stromal cells in the vicinity of a tumor has been shown to be directly related to cancer outcomes [35, 53]. Tumor budding, i.e., the presence of isolated or small clusters of tumor cells at the invasive tumor front, is a prognostic biomarker associated with an increased risk of lymph node metastasis in colorectal carcinoma and other solid malignancies [29]. In prostate cancer tissue samples, cells form many loopy structures corresponding to glands. Their integrity, graded by the Gleason score, is a good indicator of cancer progression [48].

Given the biological significance of cell spatial context, we hypothesize that being able to model and generate cell configurations will benefit various downstream tasks. In modeling the complex cell spatial context, the main challenge is the limited information one can rely on: the coordinates and types of cells. This makes it hard even for powerful deep learning methods [27] to learn the underlying distribution. To better model the spatial context, we argue that principled mathematical machinery has to be incorporated into the deep learning framework. Formally, we introduce the classic K-function from spatial statistics [5], as well as the theory of persistent homology [13], to model the spatial distribution of multi-class cells and their structural patterns. These mathematical constructs have been shown to correlate with clinical outcomes [4]. However, they have not been used in the generation of pathology images.

We incorporate these spatial topological descriptors into a deep generative model. Our generative model takes an input pathology image and generates a new cell layout with similar spatial and topological characteristics. To enforce the expected spatial characteristics, we propose a novel cell configuration loss based on the persistent homology and spatial statistics of the input cell configuration. The loss compares the generated and the reference cell configurations and matches their topology via a topological summary called the persistence diagram. It enforces holes in the generated cell configuration to be matched one-to-one with holes in the reference cell configuration, i.e., to have similar shapes and densities.

A direct topological matching via persistence diagrams is agnostic to the cell type composition. This is undesirable; we do not want to match a tumor cell hole to a stromal cell hole. To this end, we also incorporate a spatial statistics measure, the cross K-function, into the loss. This way, holes composed of different types of cells are matched properly. Using the generated cell spatial configuration, we then generate the nuclei masks, staining, and texture.

See Fig. 1 for an illustration of the generation pipeline, and Fig. 2 for examples of the generated cell layouts. The generated cell layouts have spatial and structural characteristics very similar to those of the reference/input image. This is not guaranteed by previous methods using randomly generated masks. In the experiment section, we provide comprehensive comparisons to verify the benefit of our method. We also show that the augmented images can be used to train downstream tasks such as cell classification.

Figure 1.

Overview of our multi-class cell context generator.

Figure 2.

Sample results from our cell layout generator. Our generated samples have spatial characteristics similar to the corresponding reference layouts.

To summarize, our contributions are as follows:

  • We propose the first generative model to learn cell spatial context from pathology images.

  • We introduce multi-class spatial context descriptors based on spatial statistics and topology. These descriptors are used as conditional input for the generator.

  • We propose a novel cell configuration loss function to enforce the desired behavior of spatial distribution and topology. The loss matches holes of generated cell layout and holes of the reference cell layout, in shape, density, and cell type composition.

  • We show that the generated layouts can be used to create synthetic H&E images for data augmentation, and demonstrate the efficacy of the augmented data in downstream tasks such as cell classification.

We stress that the benefit of modeling cell spatial context goes beyond data augmentation. Modeling the spatial context provides the foundation for better understanding and quantifying the heterogeneous tumor microenvironment and for correlating it with genomics and clinical outcomes. This work is one step in that direction.

2. Related Work

Generative models have been broadly used in medical imaging. Within the context of digital pathology, different methods [7, 21, 43] have been proposed to use generated images as augmentation for nuclei or tissue structure segmentation. Most of these methods, however, overlook the generation of the spatial configuration of cells. Several methods [7, 20, 21] create randomly distributed nuclei masks before generating staining and texture. When the downstream task is not nuclei segmentation, one may generate randomly distributed masks of other structures, e.g., glands [12]. Another category of methods generates new images using nuclei masks from reference images and only synthesizes the staining and textures [6, 43, 46]. Gong et al. [17] randomly deform the nuclei mask from a reference image. These methods, however, still use the same cell positions. All these methods either use the original cell spatial configuration or generate random configurations. To the best of our knowledge, our method is the first to learn to generate cell spatial and structural configurations.

Topology-aware losses.

Persistent-homology-based losses have been used to enforce topological constraints in image segmentation and generation [22, 45, 50]. These methods focus on thin structures like vessels and road networks; they are not applicable to modeling structural patterns in cell configurations. Perhaps the closest work to ours is TopoGAN [45], which learns to generate thin structures whose numbers of holes/loops match those of real images. The key difference in our method is that our topological features are adapted to the cell layout setting: the persistence diagram is enriched with cell density and multi-class cell composition information, and the loss is based on these enriched persistence diagrams. This is critical to our success.

3. Method

Assume we are given a cell layout, i.e., a set of multi-class cells distributed in the image domain. The spatial configuration of these cells includes their structural organization, as well as the spatial distribution of different cell classes. Given a reference layout, our method generates a multi-class cell layout with a similar configuration. The generated layout can be used for different purposes including data augmentation. Our model takes as input not only the reference layout, but also a set of spatial descriptors collected from the reference layout. Furthermore, for the training of the generator, we propose a loss function to match topological features in the generated and reference layouts. Minimizing this loss ensures that the generated layout has structural patterns similar to those of the reference layout.

In Section 3.1, we introduce the spatial descriptors we use, based on the theory of persistent homology and classic spatial statistics. In Section 3.2, we introduce the proposed neural network generator, as well as how these spatial descriptors are incorporated to ensure that the generated layout has the desired configuration.

3.1. Cell Configuration Descriptors

Our configuration descriptors should capture (1) structural patterns such as clusters and holes in a reference cell layout; and (2) how different types of cells are distributed with regard to other types. These structural patterns and multi-class distributions are part of what pathologists study when they inspect a histology image. We formalize such information into two descriptors: cross K-function features and enriched persistence diagram features.

Spatial statistics features: cross K-functions of cells.

We first characterize the relative distribution across different classes of cells, e.g., how closely lymphocytes are distributed around tumor cells. We use the cross K-function from classic spatial statistics [5]. See Fig. 4. Given two cell classes (a source class and a target class), the cross K-function measures the expected number of neighboring target-class cells within different radii of a source-class cell. Formally, denote by $\mathcal{C}_s$ and $\mathcal{C}_t$ the set of source cells and the set of target cells, respectively. The cross K-function at radius $r$ is defined as:

$$K_{st}(r) = A \sum_{c_s \in \mathcal{C}_s} \sum_{c_t \in \mathcal{C}_t} \mathbb{1}\{\mathrm{dist}(c_s, c_t) < r\} \quad (1)$$

where $A$ is a normalization term depending on the image area and the sizes of $\mathcal{C}_s$ and $\mathcal{C}_t$, and $\mathbb{1}\{\cdot\}$ is the indicator function.¹ The cross K-function is computed for each pair of classes. Note that when the source and target are the same class, the K-function measures how strongly that class of cells is clustered. In practice, we vectorize the K-function by sampling it at a finite set of radii. We note that the K-function has previously been used in the cell classification task [2], but it has not been used for cell configuration characterization or cell layout generation.

Figure 4.

Ripley's K-function. The K-function counts the neighboring target points (cells) of different classes within increasing radii of a source point, and can indicate clustering or scattering of the spatial distribution. Left: an illustration of the computation of the K-function at radius $r_i$. Right: an observed K-function (purple) and the K-function of a Poisson random point process (blue). The distribution is clustered/scattered when the observed K-function is higher/lower than that of the random process, respectively.

We also use a location-specific K-function,

$$K_t(r, x) = A \sum_{c_t \in \mathcal{C}_t} \mathbb{1}\{\mathrm{dist}(x, c_t) < r\}. \quad (2)$$

It describes the distribution of target class cells surrounding a specific location x. This will be used for the characterization of holes identified by persistent homology.
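
As a concrete reference, the following sketch computes the vectorized cross K-function of Eq. (1) and the location-specific variant of Eq. (2) with NumPy/SciPy. The normalization $A = \text{area}/(|\mathcal{C}_s||\mathcal{C}_t|)$ is a standard estimator choice, not necessarily the paper's exact implementation, and edge correction is omitted as in the footnote.

```python
import numpy as np
from scipy.spatial.distance import cdist

def cross_k_function(src_pts, tgt_pts, radii, area):
    """Vectorized cross K-function of Eq. (1), without edge correction.

    src_pts, tgt_pts: (N, 2) arrays of cell coordinates;
    radii: the finite set of radii at which K is sampled.
    """
    d = cdist(src_pts, tgt_pts)                # pairwise source-target distances
    A = area / (len(src_pts) * len(tgt_pts))   # a standard normalization choice
    # Note: for a same-class K-function, self-pairs (d == 0) should be excluded.
    return np.array([A * np.count_nonzero(d < r) for r in radii])

def location_k_function(x, tgt_pts, radii, area):
    """Location-specific K-function of Eq. (2) around a point x, e.g. a hole center."""
    d = np.linalg.norm(tgt_pts - np.asarray(x, dtype=float), axis=1)
    A = area / len(tgt_pts)
    return np.array([A * np.count_nonzero(d < r) for r in radii])
```

Computing the cross K-function for every ordered pair of classes and concatenating the sampled values yields the full spatial statistics descriptor used later as conditional input and for matching.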

Topological features: enriched cell persistence diagrams.

We propose topological features characterizing the gaps and holes distributed in a cell layout. These topological structures provide a unique structural characterization of the cell layout, as is evident in sample cell layouts (see Fig. 2, second row). We use the theory of persistent homology, which captures holes and gaps of various scales in a robust manner. To adapt it to cell configuration characterization, we propose to enrich the output of persistent homology with spatial distribution information, so that the topological structures are better characterized.

We briefly introduce persistent homology in the context of cell layout characterization; please refer to [13] for more details. Given a cell layout $\mathcal{C}$ with holes in it, we first compute a distance transform from the cells, $f(x) = \min_{c \in \mathcal{C}} \mathrm{dist}(x, c)$ (see Fig. 3). Holes essentially correspond to salient local maxima of the distance transform. To capture these salient holes, we threshold the image domain $\Omega \subseteq \mathbb{R}^2$ with a progressively increasing threshold $t$. As the threshold increases, the thresholded domain $\Omega_t = \{x \in \Omega \mid f(x) \leq t\}$ monotonically grows from the empty set to the whole image domain. This essentially simulates growing disks centered at all cells with an increasing radius $t$. Through the process, different holes appear (are born) and are eventually sealed up (die). Persistent homology captures all these holes and encodes their information into a 2D point set called a persistence diagram. Each hole in the cell layout corresponds to a 2D point in the diagram whose coordinates are the birth and death times (thresholds) of the hole. Holes with long life spans are considered more salient. See Fig. 3 for an illustration of the filtration and the corresponding persistence diagram. Note we only focus on 1D topology, i.e., holes. Clusters of cells could also be described with 0D topology (connected components of the growing $\Omega_t$), but we do not think 0D topology is necessary, as the spatial statistics features implicitly characterize the cell cluster structure.
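
The filtration just described can be computed directly from the distance transform. Below is a minimal sketch assuming the GUDHI library (`gudhi`) for cubical persistent homology; the sublevel-set filtration of the distance transform reproduces the disk-growing process above.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
import gudhi  # assumed: the GUDHI library for persistent homology

def cell_persistence_diagram(points, shape):
    """1D persistence diagram (holes) of the distance-transform filtration.

    points: (N, 2) integer cell coordinates (row, col); shape: (H, W).
    Returns an (M, 2) array of (birth, death) thresholds, one row per hole.
    """
    mask = np.ones(shape, dtype=bool)
    mask[points[:, 0], points[:, 1]] = False       # cells are the zeros
    f = distance_transform_edt(mask)               # f(x) = dist to nearest cell
    # Sublevel-set filtration of f: thresholding f at increasing t grows
    # disks around the cells, exactly the process described above.
    cc = gudhi.CubicalComplex(top_dimensional_cells=f)
    cc.persistence()                               # compute all homology classes
    return cc.persistence_intervals_in_dimension(1)  # keep 1D topology (holes)
```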

Figure 3.

Illustration of the filtration process of the distance transform map used to compute persistent homology. Red dots are the tumor cells in the original image. The blue dots in the last panel (f) are the centers of the holes, i.e., the saddle points obtained when a hole dies (disappears).

While persistent homology captures all possible holes in a cell layout, the persistence diagram alone does not describe the holes in full detail. Intuitively, the birth and death times of a point in the diagram only measure the compactness along the boundary and the size of a hole. We propose to enrich the diagram with additional information regarding density and spatial statistics. In particular, for each hole, we focus on its corresponding local maximum of the distance transform. Note this local maximum is the location at which the hole disappears (dies), and its function value is the death time of the hole; it roughly represents the center of the hole. We compute the location-specific K-function (Eq. (2)) at the local maximum. This essentially characterizes the cell class composition surrounding the hole. Furthermore, we compute the multi-scale cell density at the local maximum, namely, the cell kernel density estimated with different bandwidths. As shown in Fig. 5, these multi-scale density functions characterize the cell distribution at different scales around the hole of interest.

Figure 5.

Illustration of the multi-scale density maps generated from a dot map of cells with different Gaussian kernel standard deviations.
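
A minimal sketch of the multi-scale density computation, using Gaussian smoothing of the cell dot map as in Fig. 5; the bandwidths here are illustrative assumptions, not the paper's exact values.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def multiscale_density(points, shape, sigmas=(4, 8, 16, 32)):
    """Kernel density maps of a cell dot map at several Gaussian bandwidths.

    Sampling these maps at a hole's center yields its multi-scale density
    vector, one entry per bandwidth.
    """
    dots = np.zeros(shape, dtype=np.float64)
    dots[points[:, 0], points[:, 1]] = 1.0     # dot map of cell locations
    return np.stack([gaussian_filter(dots, sigma=s) for s in sigmas])
```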

By attaching the spatial statistics and multi-scale density to each persistence point, we obtain an enriched cell persistence diagram (see Fig. 3 for an illustration). This diagram will be used in our cell configuration loss; details are given in Sec. 3.2.

3.2. Deep Cell Layout Generator

Next, we introduce our deep cell layout generator. The framework is illustrated in Fig. 6. From a reference cell layout, we extract spatial descriptors, including persistence diagrams and spatial statistics. We compute separate diagrams for each cell class, and all of them are used. The generator takes in the vectorized spatial descriptors and style noise, and outputs the coordinates of the points in the generated layout. To vectorize a persistence diagram, we convert it into a histogram with predefined buckets of persistence values. This handles the variation in persistence diagram size across different point sets. Since larger 1D topological features (i.e., holes) usually occur less frequently than smaller ones, we use the log of the histogram to account for the heavy tail.
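
The vectorization step can be sketched as follows; the number of buckets and the persistence range are illustrative assumptions.

```python
import numpy as np

def vectorize_diagram(diagram, n_bins=16, max_pers=64.0):
    """Fixed-length vector for a persistence diagram of any size.

    diagram: (M, 2) array of (birth, death) pairs. Persistence values
    (death - birth) are histogrammed into predefined buckets; the log damps
    the heavy tail of frequent small holes.
    """
    pers = diagram[:, 1] - diagram[:, 0]
    hist, _ = np.histogram(pers, bins=n_bins, range=(0.0, max_pers))
    return np.log1p(hist.astype(np.float64))
```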

Figure 6.

Training of our Cell Layout Generator.

The generator backbone is a modified version of a state-of-the-art point cloud generative model, SP-GAN [27]. SP-GAN is trained with global and local priors. The global prior is the set of initial input point coordinates, a fixed set of points sampled from a unit sphere. The local prior is a latent encoding that determines the style. The generator architecture consists of a set of graph attention modules that act on the global prior (the point coordinates), interleaved with adaptive instance normalization blocks driven by the local prior (the style embedding). The final point coordinates are the output of an MLP block. Note that our method is agnostic to the backbone; in principle, we can use any other conditional generative model.

Next we introduce a novel loss function that enforces the generated layout to have a matching configuration with the reference layout.

The cell configuration loss.

We define the cell configuration loss as the matching distance between the enriched cell persistence diagrams of the generated layout and the reference layout. This is an extension of the classic Wasserstein distance between persistence diagrams [10, 11].

Recall that in an enriched diagram, each point (representing a hole) has not only birth/death times, but also additional attributes: the location-specific cross K-function and the multi-scale density function. We use the K-function to match generated holes to reference holes. Then we use the density distance between matched holes as a loss to control the generated points. This design choice is well justified: the K-function helps identify holes with matching contexts, while for matched holes, using multi-scale densities as the loss more efficiently pushes the generated points, thus improving the generator.

In particular, we compute the persistence diagrams $\mathrm{Dgm}_{gen}$ and $\mathrm{Dgm}_{ref}$ from the generated and reference layouts, respectively. Next, we find an optimal matching between the two diagrams. Assume for now that the two diagrams have the same cardinality. We compute

$$\gamma^* = \operatorname*{arg\,min}_{\gamma \in \Gamma} \sum_{p \in \mathrm{Dgm}_{gen}} \mathrm{dist}_K(p, \gamma(p)), \quad (3)$$

where $\Gamma$ is the set of all one-to-one mappings between the two diagrams. The distance $\mathrm{dist}_K(p, \gamma(p))$ is the Euclidean distance between the K-function vectors of the two holes represented by the persistence points $p \in \mathrm{Dgm}_{gen}$ and $\gamma(p) \in \mathrm{Dgm}_{ref}$. In other words, we find an optimal matching between the diagrams using the K-function distance between holes. If the two diagrams have different cardinalities, the unmatched holes are matched to a dummy hole with zero persistence.

Once the optimal matching is found, we define the configuration loss as

$$L_{CC} = \sum_{p \in \mathrm{Dgm}_{gen}} \mathrm{dist}_{den}(p, \gamma^*(p)), \quad (4)$$

in which $\mathrm{dist}_{den}(p, \gamma^*(p))$ is the distance between the multi-scale densities of the two matched holes.

During training, for each pair of generated and reference layouts, we compute their enriched diagrams and find the optimal matching $\gamma^*$ using the Hungarian method. Then we optimize the loss in Eq. (4). This essentially moves points in the generated layout so that each hole has a multi-scale density similar to that of its matched hole. Fig. 6 illustrates the loss.
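
The matching and loss of Eqs. (3)-(4) can be sketched with the Hungarian method from SciPy. This NumPy version only illustrates the matching and the loss value; in training, the density vectors of generated holes would be computed differentiably so gradients can reach the generated points, and the dummy-hole padding for unequal diagram sizes is omitted here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cell_configuration_loss(k_gen, k_ref, den_gen, den_ref):
    """Sketch of Eqs. (3)-(4): K-function matching, then density distance.

    k_gen (n, d), k_ref (m, d):     per-hole K-function vectors.
    den_gen (n, s), den_ref (m, s): per-hole multi-scale density vectors.
    """
    # Eq. (3): optimal one-to-one matching under the K-function distance,
    # solved with the Hungarian method (rectangular cost matrices allowed).
    cost = np.linalg.norm(k_gen[:, None, :] - k_ref[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    # Eq. (4): sum the density distances over the matched holes.
    return np.sum(np.linalg.norm(den_gen[rows] - den_ref[cols], axis=-1))
```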

4. Experiments

4.1. Implementation Details

Layout generator.

Our cell layout generator is based on the point cloud generator SP-GAN [27]. We make several changes to the model to make it suitable for the conditional cell layout generation task. SP-GAN takes as input a fixed 3D point cloud in the form of a unit sphere. Instead, we have a varying number of 2D points with pre-assigned classes. The coordinates of the points in each class are equally distributed on a mesh grid in the range (−1, 1) with a small normal perturbation. The mesh size varies per class based on the number of points in the class, so the points end up covering the space. The conditioning spatial descriptors are transformed into a 32-dimensional vector embedding before being attached to every point. Last, to account for the variation in input sizes, we employ instance normalization instead of batch normalization. We use two discriminators, $D$ and $D_c$, for adversarial training. $D$ discriminates a layout (a set of coordinates) as real/fake and is ignorant of the point classes. $D_c$ is similar but also takes the point classes into account. Using the least-squares GAN loss, the generator loss is a weighted sum of the GAN losses and the cell configuration loss:

$$\mathcal{L}_{\mathcal{G}} = \frac{1}{2}\left[(D(\hat{P}) - 1)^2 + (D_c(\hat{P}_c) - 1)^2 + L_{CC}(\hat{P}_c, P_c)\right] \quad (5)$$

where $P$ and $\hat{P}$ are the reference (real) and generated point sets, respectively, and $P_c$ and $\hat{P}_c$ are the corresponding point sets with the class assigned to each point.
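
A sketch of the per-class point initialization described above; the grid-sizing rule and jitter scale are assumptions, not the paper's exact settings.

```python
import numpy as np

def init_class_points(n_points, jitter=0.02, rng=None):
    """Initial 2D coordinates for one class: a mesh grid on (-1, 1) plus a
    small normal perturbation, sized so the class's points cover the space."""
    if rng is None:
        rng = np.random.default_rng()
    side = int(np.ceil(np.sqrt(n_points)))        # grid size grows with the class
    xs = np.linspace(-1.0, 1.0, side)
    grid = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)[:n_points]
    return grid + rng.normal(scale=jitter, size=grid.shape)
```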

Layout-to-Image Generator.

The trained cell layout generator provides realistic cell layouts that can be used by themselves as augmentations to train a machine learning model on multi-class point patterns. However, to use the generated layouts for image augmentation more complex than flipping, rotation, or stain/style transfer, we need to transform the layouts into images. To do that, we create a layout-to-image generator model based on the pix2pix model [25] and its variant in the biomedical domain [9]. There are three main differences from [25] in our setting. First, we do not have an exact mask of the cell shapes; we only have their coordinates. Second, while [25] learns a specific mapping from one domain to another, here the generated images are expected to have a texture similar to a reference input H&E image. Last, there are relatively few annotated images for training with biomedical data.

To generate an image from the cell coordinates, we first create binary images, one per class, with a point at each cell location, and dilate these points to obtain a visible layout. This multi-channel cell layout, together with a reference H&E image, is the input to the layout-to-image generator. The output is an image with a texture similar to the reference H&E image and the same cell distribution as the layout image. The model is trained adversarially with multi-scale discriminators that classify whether an image is real or fake and whether two images come from the same slide. For annotated images, we use an L1 reconstruction loss in addition to a perceptual loss, similar to [9].
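
A minimal sketch of the layout-to-channels step; the dilation size is illustrative.

```python
import numpy as np
from scipy.ndimage import grey_dilation

def layout_to_channels(points, labels, n_classes, shape, dilation=5):
    """Binary layout images, one channel per class, with dilated cell dots.

    points: (N, 2) pixel coordinates (row, col); labels: (N,) class ids.
    """
    out = np.zeros((n_classes, *shape), dtype=np.float32)
    for (r, c), k in zip(points, labels):
        out[k, r, c] = 1.0                         # one dot per cell
    # dilate each dot into a visible blob so the generator can see it
    return np.stack([grey_dilation(ch, size=(dilation, dilation)) for ch in out])
```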

4.2. Dataset

We use the breast cancer dataset BRCA-M2C [2]. It consists of 120 patches belonging to 113 patients, collected from TCGA [41]. The patches are approximately 500 × 500 pixels at 20× magnification, which is large enough to provide spatial context. The annotations are dot annotations at the approximate centers of cells, each assigned one of three main cell classes: inflammatory, epithelial, or stromal.

4.3. Evaluation of Cell Layout Generation

To evaluate the quality of the generated cell layouts, we propose a set of metrics to measure the similarity of the spatial distributions of the generated and the reference layouts. We focus on both topology and spatial statistics.

For topology, we compare a generated layout and its reference layout by comparing their persistence diagrams. Our evaluation is carried out class by class: we compare diagrams for each class of cells and aggregate the scores. To compare two diagrams, we use two metrics: Earth Mover's Distance (PD-EMD) and Cell Configuration Matching Distance (PD-CCMD). PD-EMD is agnostic to the multi-class spatial configuration surrounding each hole, so it may not give an accurate evaluation. For a better evaluation, we propose the cell configuration matching distance (PD-CCMD), which takes the spatial configuration into account. It matches the holes in the generated and reference layouts using the optimal K-function matching $\gamma^*$ as in Eq. (3). PD-CCMD then computes the mean distance between the persistence of the matched holes. Note that the persistence distance is ignored (assumed to be zero) when the number of cells in a class is very small (less than 5); with so few points, the distance is unreliable, greatly affected by where the points lie with respect to the border, and can hence take unreasonably large values. Table 1 shows the PD-EMD and PD-CCMD metrics for each class of cells and their mean.

Table 1.

Evaluation of persistence diagrams of generated cell layouts compared to reference layouts on the BRCA-M2C dataset.

PD-EMD ↓ PD-CCMD ↓
Method Infl. Epi. Stro. Mean Infl. Epi. Stro. Mean
w/o Spatial Descriptors + w/o Matching Loss 0.28 0.082 0.19 0.184 0.80 1.74 1.66 1.4
w/o Matching Loss 0.249 0.203 0.156 0.202 0.90 1.69 1.79 1.46
w/o K-function Descriptor 0.237 0.167 0.17 0.191 0.75 1.74 1.77 1.42
Ours 0.246 0.141 0.165 0.184 0.74 1.64 1.71 1.36

To evaluate the spatial co-localization across different classes, we use the cross K-function. For each class of cells, cross K-functions are computed with that class as the source and each class as a target, yielding a high-dimensional vector. The mean absolute error (MAE) and root mean squared error (RMSE) between the generated and real layout vectors are reported in Table 2. The distance is normalized by the vector dimensions and the expected number of cells per class in a patch.

Table 2.

Evaluation of cross K-functions of generated cell layouts compared to reference layouts on the BRCA-M2C dataset.

Cross K-function MAE ↓ Cross K-function RMSE ↓
Method Infl. Epi. Stro. Mean Infl. Epi. Stro. Mean
w/o Spatial Descriptors + w/o Matching Loss 0.555 0.096 0.424 0.359 0.829 0.127 0.666 0.541
w/o Matching Loss 0.592 0.126 0.402 0.373 0.861 0.176 0.683 0.573
w/o K-function Descriptor 0.417 0.154 0.431 0.334 0.602 0.226 0.583 0.470
Ours 0.413 0.146 0.357 0.306 0.611 0.201 0.509 0.440

We evaluate the generated layouts using our proposed metrics in Table 1 and Table 2. We compare our proposed method to models trained: (a) without spatial descriptors and without the cell configuration loss; (b) with spatial descriptors but without the cell configuration loss; (c) with the cell configuration loss but without the K-function spatial descriptor. Adding the cell configuration loss improves performance, and the best results use both the K-functions and the multi-scale densities. We also observe that without the cell configuration loss, the model tends to collapse, generating layouts nearly identical to the reference layouts.

4.4. Cell Layout Generation for Augmentation

We test the generated cell layouts on the downstream cell classification task on the BRCA-M2C dataset. We take the multi-class cell layouts produced by our cell layout generator and use the layout-to-image generator to transform them into H&E images that can be used for data augmentation. To train the layout-to-image generator, we use the labeled patches in BRCA-M2C for the reconstruction and perceptual losses. We extract additional patches from TCGA, some from near the annotated regions in the same slides and others randomly sampled from different slides, ensuring that sampled patches do not belong to the background. During training, the reference texture patch and the reference cell layout patch may belong to the same slide or come from different slides. The perceptual and reconstruction losses are applied only when they come from the same slide, while the adversarial losses are applied in both cases. To generate the H&E images for augmentation, we generate cell layouts and apply postprocessing to remove overlapping cells. The final layout, along with varying reference H&E patches, is transformed into H&E patches with different styles; see Figure 7.

Figure 7.

Sample results from the cells’ layout-to-image generator.

We train U-Net [36] and MCSpatNet [2] on the BRCA-M2C dataset, with additional augmentation data generated from our models and from random cell layouts. The loss is weighted based on whether a sample is real or generated, giving generated data a lower weight of 0.5. Table 3 shows the F-scores of both models trained with and without data augmentation. The augmentation improves performance, especially for the U-Net model, with a greater improvement when using our generated augmentation data. For MCSpatNet, the stromal cell F-score is below the F-score without augmentation. We attribute this to the quality of the image generation: the image generation model needs to learn how each type of cell appears, which is a challenging task, especially for stromal cells. Even expert pathologists often find them hard to classify with certainty.

Table 3.

F-scores on the cell classification task, comparing models trained with only manually labeled data to models trained with additional data augmentation from random cell layouts (Rand.) and from our generated cell layouts (Ours).

Method Infl. Epi. Stro. Mean
U-Net 0.498 0.744 0.476 0.572
U-Net + Aug. (Rand.) 0.625 0.735 0.472 0.611
U-Net + Aug. (Ours) 0.65 0.768 0.511 0.644
MCSpatNet 0.635 0.785 0.553 0.658
MCSpatNet + Aug. (Rand.) 0.652 0.772 0.506 0.644
MCSpatNet + Aug. (Ours) 0.678 0.8 0.522 0.667

5. Conclusion

In this paper, we propose the first generative model for digital pathology that explicitly generates cell layouts with a desirable configuration. We focus on the topological patterns and spatial distribution of multi-class cells, and compute configuration descriptors based on classic spatial statistics and the theory of persistent homology. Using these descriptors, and by proposing a novel cell configuration loss, our generator can effectively generate new cell layouts based on a reference cell layout. We show through qualitative and quantitative results that our method generates cell layouts with realistic spatial and structural distributions. We also use our method to augment H&E images, improving performance in downstream tasks such as cell classification.

Acknowledgement.

This work was supported by the NSF grants CCF-2144901, IIS-2123920, and IIS-2212046, the National Institutes of Health (NIH) and National Cancer Institute (NCI) grants UH3CA225021, U24CA215109, and 5R01CA253368, as well as generous private funds from Bob Beals and Betsy Barton.

Footnotes

¹ We simplify the definition of the K-function by ignoring the edge correction term.

References

  • [1] Abbet Christian, Zlobec Inti, Bozorgtabar Behzad, and Thiran Jean-Philippe. Divide-and-rule: Self-supervised learning for survival analysis in colorectal cancer. In Medical Image Computing and Computer Assisted Intervention (MICCAI), pages 480–489, 2020.
  • [2] Abousamra Shahira, Belinsky David, Van Arnam John, Allard Felicia, Yee Eric, Gupta Rajarsi, Kurc Tahsin, Samaras Dimitris, Saltz Joel, and Chen Chao. Multi-class cell detection using spatial context representation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
  • [3] Abousamra Shahira, Fassler Danielle, Hou Le, Zhang Yuwei, Gupta Rajarsi, Kurc Tahsin, Escobar-Hoyos Luisa F, Samaras Dimitris, Knudson Beatrice, Shroyer Kenneth, Saltz Joel, and Chen Chao. Weakly-supervised deep stain decomposition for multiplex IHC images. In IEEE International Symposium on Biomedical Imaging (ISBI), pages 481–485, 2020.
  • [4] Aukerman Andrew, Carrière Mathieu, Chen Chao, Gardner Kevin, Rabadán Rául, and Vanguri Rami. Persistent homology based characterization of the breast cancer immune microenvironment: a feasibility study. Journal of Computational Geometry, 12(2):183–206, 2021.
  • [5] Baddeley Adrian, Rubak Ege, and Turner Rolf. Spatial Point Patterns: Methodology and Applications with R. Chapman and Hall/CRC Press, 2015.
  • [6] Boyd Joseph, Villa Irène, Mathieu Marie-Christine, Deutsch Eric, Paragios Nikos, Vakalopoulou Maria, and Christodoulidis Stergios. Region-guided CycleGANs for stain transfer in whole slide images. In Medical Image Computing and Computer Assisted Intervention (MICCAI), pages 356–365, 2022.
  • [7] Butte Sujata, Wang Haotian, Xian Min, and Vakanski Aleksandar. Sharp-GAN: Sharpness loss regularized GAN for histopathology image synthesis. In IEEE International Symposium on Biomedical Imaging (ISBI), pages 1–5, 2022.
  • [8] Chamanzar A and Nie Y. Weakly supervised multi-task learning for cell detection and segmentation. In IEEE International Symposium on Biomedical Imaging (ISBI), 2020.
  • [9] Chang Qi, Qu Hui, Zhang Yikai, Sabuncu Mert, Chen Chao, Zhang Tong, and Metaxas Dimitris N. Synthetic learning: Learn from distributed asynchronized discriminator GAN without sharing medical image data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13856–13866, 2020.
  • [10] Cohen-Steiner David, Edelsbrunner Herbert, and Harer John. Stability of persistence diagrams. In Proceedings of the Twenty-First Annual Symposium on Computational Geometry, pages 263–271, 2005.
  • [11] Cohen-Steiner David, Edelsbrunner Herbert, Harer John, and Mileyko Yuriy. Lipschitz functions have L_p-stable persistence. Foundations of Computational Mathematics, 10(2):127–139, 2010.
  • [12] Deshpande Srijay, Minhas Fayyaz, Graham Simon, and Rajpoot Nasir. SAFRON: Stitching across the frontier network for generating colorectal cancer histology images. Medical Image Analysis, 77:102337, 2022.
  • [13] Edelsbrunner Herbert and Harer John. Computational Topology: An Introduction. American Mathematical Society, 2010.
  • [14] Fassler Danielle J., Abousamra Shahira, Gupta Rajarsi, Chen Chao, Zhao Maozheng, Paredes David, Batool Syeda Areeha, Knudsen Beatrice S., Escobar-Hoyos Luisa F., Shroyer Kenneth R., Samaras Dimitris, Kurç Tahsin M., and Saltz Joel. Deep learning-based image analysis methods for brightfield-acquired multiplex immunohistochemistry images. Diagnostic Pathology, 15, 2020.
  • [15] Ghahremani Parmida, Li Yanyun, Kaufman Arie, Vanguri Rami, Greenwald Noah, Angelo Michael, Hollmann Travis J, and Nadeem Saad. Deep learning-inferred multiplex immunofluorescence for immunohistochemical image quantification. Nature Machine Intelligence, 4(4):401–412, 2022.
  • [16] Ghahremani Parmida, Marino Joseph, Dodds Ricardo, and Nadeem Saad. DeepLIIF: An online platform for quantification of clinical pathology slides. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21399–21405, 2022.
  • [17] Gong Xuan, Chen Shuyan, Zhang Baochang, and Doermann David. Style consistent image generation for nuclei instance segmentation. In IEEE Winter Conference on Applications of Computer Vision (WACV), pages 3993–4002, 2021.
  • [18] Graham S, Epstein D, and Rajpoot N. Dense steerable filter CNNs for exploiting rotational symmetry in histology images. IEEE Transactions on Medical Imaging, 39(12):4124–4136, 2020.
  • [19] Graham Simon, Vu Quoc Dang, Raza Shan E Ahmed, Azam Ayesha, Tsang Yee Wah, Kwak Jin Tae, and Rajpoot Nasir. Hover-Net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images. Medical Image Analysis, 58:101563, 2019.
  • [20] Hossain Md Shamim and Sakib Nazmus. Renal cell cancer nuclei segmentation from histopathology image using synthetic data. In IEEE International Colloquium on Signal Processing & Its Applications (CSPA), pages 236–241, 2020.
  • [21] Hou Le, Agarwal Ayush, Samaras Dimitris, Kurc Tahsin M., Gupta Rajarsi R., and Saltz Joel H. Robust histopathology image analysis: To label or to synthesize? In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8525–8534, 2019.
  • [22] Hu Xiaoling, Li Fuxin, Samaras Dimitris, and Chen Chao. Topology-preserving deep image segmentation. In Advances in Neural Information Processing Systems (NeurIPS), pages 5658–5669, 2019.
  • [23] Hung Jane, Goodman Allen, Ravel Deepali, Lopes Stefanie, Rangel Gabriel, Nery Odailton, Malleret Benoît, Nosten Francois, Lacerda Marcus, Ferreira Marcelo, Renia Laurent, Duraisingh Manoj, Costa Fabio, Marti Matthias, and Carpenter Anne. Keras R-CNN: library for cell detection in biological images using deep neural networks. BMC Bioinformatics, 21:300, 2020.
  • [24] Höfener Henning, Homeyer André, Weiss Nick, Molin Jesper, Lundström Claes F., and Hahn Horst K. Deep learning nuclei detection: A simple approach can deliver state-of-the-art results. Computerized Medical Imaging and Graphics, 70:43–52, 2018.
  • [25] Isola Phillip, Zhu Jun-Yan, Zhou Tinghui, and Efros Alexei A. Image-to-image translation with conditional adversarial networks. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
  • [26] Kumar N, Verma R, Sharma S, Bhargava S, Vahadane A, and Sethi A. A dataset and a technique for generalized nuclear segmentation for computational pathology. IEEE Transactions on Medical Imaging, 36(7):1550–1560, 2017.
  • [27] Li Ruihui, Li Xianzhi, Hui Ke-Hei, and Fu Chi-Wing. SP-GAN: sphere-guided 3D shape generation and manipulation. ACM Transactions on Graphics (Proc. SIGGRAPH), 40(4), 2021.
  • [28] Liu Huidong and Kurc Tahsin. Deep learning for survival analysis in breast cancer with whole slide image data. Bioinformatics, 38(14):3629–3637, 2022.
  • [29] Lugli Alessandro, Zlobec Inti, Berger Martin D, Kirsch Richard, and Nagtegaal Iris D. Tumour budding in solid cancers. Nature Reviews Clinical Oncology, 18(2):101–115, 2021.
  • [30] Mobadersany Pooya, Yousefi Safoora, Amgad Mohamed, Gutman David A., Barnholtz-Sloan Jill S., Vega José E. Velázquez, Brat Daniel J., and Cooper Lee A. D. Predicting cancer outcomes from histology and genomics using convolutional networks. Proceedings of the National Academy of Sciences, 115(13):E2970–E2979, 2018.
  • [31] Nawaz Sidra and Yuan Yinyin. Computational pathology: Exploring the spatial dimension of tumor ecology. Cancer Letters, 380(1):296–303, 2016.
  • [32] Naylor Peter, Laé Marick, Reyal Fabien, and Walter Thomas. Segmentation of nuclei in histopathology images by deep regression of the distance map. IEEE Transactions on Medical Imaging, 38(2):448–459, 2019.
  • [33] Qu Hui, Wu Pengxiang, Huang Qiaoying, Yi Jingru, Riedlinger Gregory M., De Subhajyoti, and Metaxas Dimitris N. Weakly supervised deep nuclei segmentation using points annotation in histopathology images. In MIDL, 2019.
  • [34] Raza Shan E Ahmed, Cheung Linda, Shaban Muhammad, Graham Simon, Epstein David, Pelengaris Stella, Khan Michael, and Rajpoot Nasir M. Micro-Net: A unified model for segmentation of various objects in microscopy images. Medical Image Analysis, 52:160–173, 2019.
  • [35] Rogojanu R, Thalhammer T, Thiem U, Heindl A, Mesteri I, Seewald A, Jäger W, Smochina C, Ellinger I, and Bises G. Quantitative image analysis of epithelial and stromal area in histological sections of colorectal cancer: An emerging diagnostic tool. BioMed Research International, 2015:569071, 2015.
  • [36] Ronneberger O, Fischer P, and Brox T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer Assisted Intervention (MICCAI), 2015.
  • [37] Salgado Roberto, Denkert Carsten, Demaria S, Sirtaine N, Klauschen F, Pruneri Giancarlo, Wienert S, Van den Eynden Gert, Baehner Frederick L, Pénault-Llorca Frederique, et al. The evaluation of tumor-infiltrating lymphocytes (TILs) in breast cancer: recommendations by an international TILs working group 2014. Annals of Oncology, 26(2):259–271, 2015.
  • [38] Saltz Joel, Gupta Rajarsi, Hou Le, Kurc Tahsin, Singh Pankaj, Nguyen Vu, Samaras Dimitris, Shroyer Kenneth R, Zhao Tianhao, Batiste Rebecca, et al. Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell Reports, 23(1):181–193, 2018.
  • [39] Shibutani M, Maeda K, Nagahara H, Fukuoka T, Iseki Y, Matsutani S, Kashiwagi S, Tanaka H, Hirakawa K, and Ohira M. Tumor-infiltrating lymphocytes predict the chemotherapeutic outcomes in patients with stage IV colorectal cancer. In Vivo, 32(1):151–158, 2018.
  • [40] Stanton Sasha E. and Disis Mary L. Clinical significance of tumor-infiltrating lymphocytes in breast cancer. Journal for ImmunoTherapy of Cancer, 4(1), 2016.
  • [41] TCGA development team. The Cancer Genome Atlas. https://tcga-data.nci.nih.gov/docs/publications/tcga/.
  • [42] Tian Kuan, Zhang Jun, Shen Haocheng, Yan Kezhou, Dong Pei, Yao Jianhua, Che Shannon, Luo Pifu, and Han Xiao. Weakly-supervised nucleus segmentation based on point annotations: A coarse-to-fine self-stimulated learning strategy. In Medical Image Computing and Computer Assisted Intervention (MICCAI), 2020.
  • [43] Tsirikoglou Apostolia, Stacke Karin, Eilertsen Gabriel, and Unger Jonas. Primary tumor and inter-organ augmentations for supervised lymph node colon adenocarcinoma metastasis detection. In Medical Image Computing and Computer Assisted Intervention (MICCAI), pages 624–633, 2021.
  • [44] Uemura Tomoki, Watari Chinatsu, Näppi Janne J., Hironaka Toru, Kim Hyoungseop, and Yoshida Hiroyuki. GAN-based survival prediction model from CT images of patients with idiopathic pulmonary fibrosis. In Medical Imaging 2020: Imaging Informatics for Healthcare, Research, and Applications, volume 11318, page 113181F. SPIE, 2020.
  • [45] Wang Fan, Liu Huidong, Samaras Dimitris, and Chen Chao. TopoGAN: A topology-aware generative adversarial network. In Computer Vision - ECCV 2020, Part III, volume 12348 of Lecture Notes in Computer Science, pages 118–136. Springer, 2020.
  • [46] Wang Haotian, Xian Min, Vakanski Aleksandar, and Shareef Bryar. SIAN: style-guided instance-adaptive normalization for multi-organ histopathology image synthesis. CoRR, abs/2209.02412, 2022.
  • [47] Wei Kaimin, Li Tianqi, Huang Feiran, Chen Jinpeng, and He Zefan. Cancer classification with data augmentation based on generative adversarial networks. Frontiers of Computer Science, 16(2), 2022.
  • [48] Wright Jonathan L., Salinas Claudia A., Lin Daniel W., Kolb Suzanne, Koopmeiners Joseph, Feng Ziding, and Stanford Janet L. Prostate cancer specific mortality and Gleason 7 disease differences in prostate cancer outcomes between cases with Gleason 4 + 3 and Gleason 3 + 4 tumors in a population based cohort. The Journal of Urology, 182(6):2702–2707, 2009.
  • [49] Wulczyn Ellery, Steiner David F., Xu Zhaoyang, Sadhwani Apaar, Wang Hongwu, Flament-Auvigne Isabelle, Mermel Craig H., Chen Po-Hsuan Cameron, Liu Yun, and Stumpe Martin C. Deep learning-based survival prediction for multiple cancer types using histopathology images. PLOS ONE, 15(6):1–18, 2020.
  • [50] Yang Jiaqi, Hu Xiaoling, Chen Chao, and Tsai Chialing. 3D topology-preserving segmentation with compound multi-slice representation. In IEEE International Symposium on Biomedical Imaging (ISBI), pages 1297–1301, 2021.
  • [51] Yoo Inwan, Yoo Donggeun, and Paeng Kyunghyun. PseudoEdgeNet: Nuclei segmentation only with point annotations. In Medical Image Computing and Computer Assisted Intervention (MICCAI), 2019.
  • [52] Yousefi S and Nie Y. Transfer learning from nucleus detection to classification in histopathology images. In IEEE International Symposium on Biomedical Imaging (ISBI), 2019.
  • [53] Yuan Yinyin, Failmezger Henrik, Rueda Oscar M., Ali H. Raza, Gräf Stefan, Chin Suet-Feung, Schwarz Roland F., Curtis Christina, Dunning Mark J., Bardwell Helen, Johnson Nicola, Doyle Sarah, Turashvili Gulisa, Provenzano Elena, Aparicio Sam, Caldas Carlos, and Markowetz Florian. Quantitative image analysis of cellular heterogeneity in breast tumors complements genomic profiling. Science Translational Medicine, 4(157):157ra143, 2012.
