Abstract
The morphology of individual cells can reveal much about the underlying states and mechanisms in biology. In tumor environments, the interplay among different cell morphologies in local neighborhoods can further improve this characterization. In this paper, we present an approach based on representation learning to capture similarities and subtle differences in cells positive for γH2AX, a common marker for DNA damage. We demonstrate that texture representations using GLCM and VAE-GAN enable profiling of cells in both singular and local neighborhood contexts. Additionally, we investigate a possible quantification of the interplay between immune and DNA damage responses by enumerating CD8+ and γH2AX+ cells at different scales. Using our profiling approach, regions in treated tissues can be differentiated from control tissue regions, demonstrating its potential in aiding quantitative measurements of DNA damage and repair in tumor contexts.
Keywords: cell morphology, DNA damage, 3D histology, representation learning
1. Introduction
The analysis of cell morphology is an important task in biology and an active area of research in which techniques are being developed for different imaging modalities and domains [6]. The characterization of morphology can give clues about cell functions and responses [9, 15]. Different cellular processes can alter the morphology of the cell. Importantly, morphology allows biological effects due to, for example, drug treatment to be assigned to specific cell types in a tissue under study. This assignment is performed by pathologists, and automated methods that help them profile the morphology of cells and tissues can have a big impact in experimental medicine.
One cell process that manifests differently, in terms of morphology, due to many underlying factors is DNA damage and repair. Normally, a DNA-damaged cell will initiate repair once it detects alterations such as breaks, fragmentation, translocations, and deletions in the DNA. If excessive damage has occurred, programmed cell death can be triggered. In tumor contexts, studying DNA repair mechanisms is especially important, as evidence of impaired repair capability in tumor cells has been presented [1]. Dysregulation of the DNA damage response (DDR) promotes genomic instability, increased mutation rate, and intra-tumor heterogeneity [4, 3]. Inhibition of DNA repair by targeting specific repair enzymes, e.g. topoisomerases and PARP, has proven to be a useful strategy for treating patients with solid tumors, and a number of drugs are widely used in clinical practice.
Our understanding of DDR defects in tumors is not yet complete, but several anti-cancer therapies already exploit these defects. Ionizing radiation and chemotherapy aim to induce cell death by damaging DNA. This outcome, however, is not guaranteed, as tumor cells can still initiate DNA repair pathways and become resistant to these therapies. Inhibiting such pathways and specifically targeting tumor cells can therefore enhance the anticancer effect of DNA damage-based therapy [17].
To further efforts in this area, many previous studies have addressed the quantification of DNA damage. DNA damage in cells is frequently investigated through imaging with a marker for γH2AX [13, 20, 7, 8, 24]. Traditionally, the quantification is done by counting the foci in cells, either manually or automatically, or by counting the number of cells in an image field that are positive for γH2AX signal above some cutoff value assigned on a per-cell basis [2, 23].
In this work, we investigated the use of representation learning with texture features computed from fluorescence intensity statistics (gray-level co-occurrence matrices, GLCM) and latent encodings from a deep learning model (VAE-GAN) to build profiles of cells in 3D histology. Representation learning reduces the need for manual counting and measurements to capture similarities and dissimilarities in data points. Using the features discovered by representation learning in an unsupervised manner enables building phenotypes without the need for annotation or additional fluorescent markers. We deviate from previous works by profiling DNA damage using visual texture instead of directly computing foci density. We analyzed 3D volumes of 125 μm-thick 4T1 tissue sections, identified cells and cell clusters with DNA damage, and extended the characterization to local neighborhoods of cells, looking at the proximity of immune cells (CD8+) and γH2AX+ cells in both control and treated samples.
We highlight the contribution of this work as follows:
- State-of-the-art artificial intelligence methods are integrated with thick 3D histology analysis by developing a seamless pipeline, from segmentation of 3D 4T1 histology volumes to profiling DNA damage in local neighborhoods of cells.
- Representations of morphological variations of DNA damage in cells are constructed using a VAE-GAN model and GLCM features.
- Clustering on the extracted features using GLCM and VAE-GAN provided pseudo-labels for texture classes that can be used to characterize tissue regions.
- Local neighborhood profiles that were constructed using our analysis pipeline and the proximity information on CD8+ and γH2AX+ cells reveal differences between control and treated classes, demonstrating potential in refining quantification of experiments.
2. Methods
2.1. Data
In our experiments, we analyzed four 4T1 mouse tumor tissue samples. Thick tissue biopsy sections (125μm) were cleared and imaged using standard confocal microscopy at 63× magnification. Tissues were treated with irradiation, indomethacin, and L-NAME. Cells were stained for γH2AX, CD8, pan proteins, and DAPI.
From the acquired 3D confocal images, regions of size 302px×302px×130px (W×H×D) with high levels of fluorescence for γH2AX and CD8 were manually curated and extracted. Twelve sub-volumes were selected from the control samples and 14 sub-volumes were selected from the treated samples. The number of cells in each selected region ranged from 500 to 900.
2.2. Segmentation
The first step in our analysis is to isolate individual nuclei in the 3D volumes, one of which is shown in Figure 1A. We used a pre-trained Cellpose [22] model for this segmentation task. Cellpose is a deep learning-based segmentation model for biological images. Segmentation is done by estimating flow gradients from cell centroids and reconstructing cell outlines by tracking these gradients. The model has been demonstrated to generalize well to many types of microscopy data and can be used without retraining. Figure 1E shows a slice of a 3D Cellpose nuclei segmentation of a histology volume using its DAPI channel (shown in Figure 1B).
Fig. 1.
A 3D image volume (A) from the dataset used in this work (CD8 in red, γH2AX in yellow, pan protein in green, DAPI in blue), z-slices of individual channels (B-D), and nuclei segmentation (E-G). Using the DAPI channel (B), Cellpose was used to generate a segmentation of all nuclei (E). Using this segmentation and threshold masks on the γH2AX channel (C) and the CD8 channel (D), γH2AX+ cells and CD8+ cells are identified (F, G).
After segmenting the nuclei using Cellpose, we focused on cells positive for γH2AX (see Figure 1C). We observed that simple thresholding on the γH2AX channel is sufficient to segment γH2AX+ nuclei, whereas applying the Cellpose model to the γH2AX channel resulted in “hallucinated” nuclei, that is, false segmentations in areas without nuclei. One drawback of thresholding, however, is that cell outlines are rougher than those in the Cellpose output. To address this, we superimposed the thresholded γH2AX volumes on the previous nuclei segmentation. Cells with significant overlap in this superimposition were retained in the final γH2AX segmentation mask. Figure 1F shows segmented γH2AX+ nuclei.
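The overlap-based filtering described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the function name `retain_positive_nuclei`, the intensity `threshold`, and the `min_overlap` fraction are all assumptions, since the text does not state the cutoff that defines a "significant" overlap.

```python
import numpy as np

def retain_positive_nuclei(nuclei_labels, marker_channel, threshold, min_overlap=0.5):
    """Keep labeled nuclei whose voxels sufficiently overlap a thresholded marker mask.

    nuclei_labels  : integer array, 0 = background, 1..N = nuclei (e.g. a Cellpose output)
    marker_channel : float array of the same shape (e.g. the gamma-H2AX channel)
    threshold      : intensity cutoff for the marker mask (hypothetical value)
    min_overlap    : fraction of a nucleus that must be marker-positive (hypothetical value)
    """
    flat_labels = nuclei_labels.ravel().astype(np.intp)
    marker_mask = (marker_channel > threshold).ravel().astype(float)
    n_labels = int(flat_labels.max())
    # Voxels per nucleus, and marker-positive voxels per nucleus.
    total = np.bincount(flat_labels, minlength=n_labels + 1)
    positive = np.bincount(flat_labels, weights=marker_mask, minlength=n_labels + 1)
    fraction = np.divide(positive, total, out=np.zeros(n_labels + 1), where=total > 0)
    keep = fraction >= min_overlap
    keep[0] = False  # background is never retained
    # Zero out every nucleus that did not pass the overlap test.
    return np.where(keep[nuclei_labels], nuclei_labels, 0)

# Toy volume: nucleus 1 is fully marker-positive, nucleus 2 only 25% positive.
demo_labels = np.zeros((1, 4, 4), dtype=int)
demo_labels[0, :2, :2] = 1
demo_labels[0, 2:, 2:] = 2
demo_marker = np.zeros((1, 4, 4))
demo_marker[0, :2, :2] = 1.0
demo_marker[0, 2, 2] = 1.0
kept = retain_positive_nuclei(demo_labels, demo_marker, threshold=0.5, min_overlap=0.5)
```

Because the retained mask reuses the nuclei labels, the outlines of the kept cells come from the smoother nuclei segmentation rather than from the rough threshold mask.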
For each of the identified γH2AX+ nuclei, we extracted bounding boxes to be used for the computation of texture feature representation. A total of 372 γH2AX+ nuclei were extracted with high confidence. Augmentation was done to increase this number for training our model.
We also applied a similar segmentation approach to the CD8 channel (see Figures 1D and 1G) to isolate CD8+ cells. As the immunomarker for CD8 binds to the cell membrane, the nuclei segmentation was dilated before superimposition to ensure overlap.
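One way to dilate a labeled nuclei segmentation so that it reaches a membrane-bound signal is sketched below. It is an assumption-laden illustration: the helper `dilate_labels` and the neighborhood `size` are hypothetical, and the text does not specify how far the nuclei were dilated.

```python
import numpy as np
from scipy import ndimage

def dilate_labels(labels, size=3):
    """Grow each labeled region into neighboring background voxels.

    `size` is the edge length of the cubic neighborhood (hypothetical value).
    Where a background voxel borders two labels, the larger label value wins.
    """
    grown = ndimage.maximum_filter(labels, size=size)
    # Existing labels are preserved; only background voxels take a neighbor's label.
    return np.where(labels > 0, labels, grown)

# Toy volume: a single-voxel nucleus grows into its 3x3x3 neighborhood.
demo = np.zeros((5, 5, 5), dtype=int)
demo[2, 2, 2] = 1
dilated = dilate_labels(demo)
```

The dilated labels can then be compared against a thresholded CD8 mask in the same way as for the γH2AX channel.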
2.3. Building cell profiles of DNA damage
Techniques that look solely at shape, such as [18], have been demonstrated to capture the heterogeneity and variations of cells in tissues. However, they rely heavily on segmentation. In our case, because γH2AX is primarily analyzed for its distribution across nuclei, texture captures this information better than shape.
Statistical Texture Features
A popular way to quantify texture in images is to compute and analyze gray-level co-occurrence matrices (GLCM). GLCM is a second-order statistic that captures pairwise relationships of intensity levels within a specific neighborhood size in an image. Its use in medical image analysis is widespread, ranging from brain magnetic resonance images [21] to liver ultrasound [25]. From the computed matrices, features encapsulating the intensity co-occurrences in different ways can be measured [12]. In this work, we used the following GLCM features: energy, contrast, prominence, and correlation.
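To make the four features concrete, the sketch below computes a GLCM for a single 2D image and one pixel offset, then derives the features from it. Several choices are assumptions rather than details from this work: the matrix is symmetrized and normalized, "energy" is taken as the sum of squared entries (some libraries report its square root), "prominence" is read as cluster prominence, and the function names are hypothetical.

```python
import numpy as np

def glcm(image, levels, offset=(0, 1)):
    """Symmetric, normalized co-occurrence matrix for one non-negative (row, col) offset.

    `image` must hold integer gray levels in the range [0, levels).
    """
    dr, dc = offset
    rows, cols = image.shape
    ref = image[:rows - dr, :cols - dc]   # reference pixels
    nbr = image[dr:, dc:]                 # their neighbors at the given offset
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1.0)
    counts = counts + counts.T            # count each pair in both directions
    return counts / counts.sum()

def glcm_features(p):
    """Energy, contrast, cluster prominence, and correlation of a normalized GLCM."""
    levels = p.shape[0]
    i, j = np.meshgrid(np.arange(levels), np.arange(levels), indexing="ij")
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * p).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * p).sum())
    cov = ((i - mu_i) * (j - mu_j) * p).sum()
    return {
        "energy": float((p ** 2).sum()),
        "contrast": float((((i - j) ** 2) * p).sum()),
        "prominence": float((((i + j - mu_i - mu_j) ** 4) * p).sum()),
        # Correlation is undefined for a constant image; 0.0 is returned by convention here.
        "correlation": float(cov / (sd_i * sd_j)) if sd_i > 0 and sd_j > 0 else 0.0,
    }

# Toy example: a two-level checkerboard, where horizontal neighbors always differ.
checkerboard = np.indices((4, 4)).sum(axis=0) % 2
features = glcm_features(glcm(checkerboard, levels=2, offset=(0, 1)))
```

On the checkerboard every horizontal pair is (0, 1) or (1, 0), so contrast is maximal for two levels and the correlation is perfectly negative.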
Latent Features
We also explored the use of latent features from deep learning methods to characterize the morphology of DNA-damaged and apoptotic cells. Data representations in latent spaces have been applied to many biological domains and were demonstrated to capture subtle similarities and dissimilarities in imaging data [19, 5]. In this work, we used a type of variational autoencoder [14].
VAE-GAN [16] is a type of variational autoencoder that adds a discriminator to the network to further improve decoding and stabilize the training process. It borrows the discriminator concept from generative adversarial networks (GAN) [10], another generative deep learning technique. Further, it utilizes a learned similarity metric derived from feature maps from the discriminator in addition to the pixel-wise MSE loss. Because the primary task is to capture good representations of DNA damage-induced morphology rather than faithful reconstructions of the image data, a high-level similarity measure is desirable. For this reason, we employed the VAE-GAN architecture to construct a manifold for DNA damage.
Figure 2 shows the structure of our model. Our model’s encoder contains three convolutional layers with batch normalization, followed by a series of dense feed-forward layers. For all our experiments, the dimension of the latent space is set to 16. The generator/decoder mirrors the encoder with three transposed convolutional layers. For the discriminator, a network with five convolutional layers with batch normalization was constructed. Feature maps after the fourth convolutional layer are extracted and used for the learned similarity metric. For training, the images of segmented γH2AX+ nuclei were resized, centered on 64 × 64 patches, and scaled to the [0,1] range. All network modules were trained with Adam optimizers with a learning rate of 0.0001 for 500 epochs.
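The loss terms that distinguish this kind of model can be illustrated without the networks themselves. The sketch below shows, in plain NumPy, latent sampling from a diagonal Gaussian, the KL regularizer toward a standard normal prior, and a feature-wise reconstruction error computed on discriminator feature maps. It is not the training code of this work: the log-variance parameterization, the function names, and the stand-in arrays are assumptions.

```python
import numpy as np

def sample_latent(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps, with eps drawn from N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I), one value per sample."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)

def feature_similarity_loss(feat_input, feat_recon):
    """Mean squared error between discriminator feature maps of input and reconstruction."""
    return float(np.mean((feat_input - feat_recon) ** 2))

# Stand-in encoder outputs for a batch of 4 nuclei and a 16-dimensional latent space.
rng = np.random.default_rng(0)
mu = np.zeros((4, 16))
log_var = np.zeros((4, 16))
z = sample_latent(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
```

In a full model, `feat_input` and `feat_recon` would be the activations of an intermediate discriminator layer for the input patch and its reconstruction, so the reconstruction term compares images in a learned feature space rather than pixel by pixel.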
Fig. 2.
The VAE-GAN architecture (A) used to construct the manifold of DNA damage in cells. The network is composed of three components: encoder, generator, and discriminator. The encoder (B) maps input images to 16-dimensional Gaussian distributions with diagonal covariance. The generator (C) produces a reconstruction from sampled points in the latent space. The discriminator (D) forces the generator to output images as similar to the input as possible. Latent encodings of the image volumes are then clustered to form pseudo-classes of DNA damage subtypes (E).
2.4. Pseudo-class labels
Extracted representations for individual cells were clustered into 5 pseudo texture classes using KMeans. These clusters were then projected into 2D using principal component analysis for visualization. Since our data lack expert annotation for repair and apoptotic classes, we used the resulting clusters as class label surrogates to find texture similarities in foci across individual cells.
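A minimal sketch of this step with scikit-learn is given below. The synthetic `features` array stands in for per-cell GLCM features or 16-dimensional encodings, and the helper name `pseudo_labels`, the seed, and `n_init` are assumptions; the text specifies only KMeans with 5 classes and a PCA projection to 2D.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def pseudo_labels(features, n_classes=5, seed=0):
    """Cluster per-cell feature vectors and project them to 2D for visualization."""
    kmeans = KMeans(n_clusters=n_classes, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(features)
    coords_2d = PCA(n_components=2).fit_transform(features)
    return labels, coords_2d

# Synthetic stand-in data: 5 well-separated groups of 20 cells in a 16-d feature space.
rng = np.random.default_rng(0)
centers = 10.0 * np.eye(5, 16)
features = np.vstack([c + 0.1 * rng.standard_normal((20, 16)) for c in centers])
labels, coords_2d = pseudo_labels(features)
```

The integer labels carry no biological meaning on their own; they only group cells with similar texture representations.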
2.5. Comparing cell profiles in control and treated tissues
The last step in our analysis is to form profiles of local neighborhoods of cells to distinguish between regions from treated and control samples. Constructing spherical neighborhoods of various radii centered on identified γH2AX+ cells, we looked at the proximity of other γH2AX+ cells from different texture classes. We also counted the number of identified CD8+ cells in the vicinity. This colocalization analysis of γH2AX+ and CD8+ cells follows the results of studies establishing links between immune response and DNA repair in tumor microenvironments [11].
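The neighborhood profile can be sketched as a class-by-class co-occurrence matrix computed from cell centroids. The version below is illustrative only: it assumes centroids in isotropic pixel coordinates, averages counts over the center cells of each class (the normalization used in this work is not stated), and the function name is hypothetical.

```python
import numpy as np

def neighborhood_profile(centroids, classes, radius, n_classes):
    """Mean number of neighbors of each class within `radius` of cells of each class.

    Returns an (n_classes, n_classes) array: rows index the class of the center
    cell, columns index the class of the neighboring cells.
    """
    centroids = np.asarray(centroids, dtype=float)
    classes = np.asarray(classes)
    # Pairwise Euclidean distances between all cell centroids.
    dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    within = dist <= radius
    np.fill_diagonal(within, False)              # a cell is not its own neighbor
    one_hot = np.eye(n_classes)[classes]         # (n_cells, n_classes)
    per_cell = within.astype(float) @ one_hot    # neighbor counts per class, per cell
    profile = np.zeros((n_classes, n_classes))
    for c in range(n_classes):
        members = classes == c
        if members.any():
            profile[c] = per_cell[members].mean(axis=0)
    return profile

# Toy example: three nearby cells and one isolated cell.
demo_centroids = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [50, 50, 50]]
demo_classes = [0, 1, 1, 0]
profile = neighborhood_profile(demo_centroids, demo_classes, radius=2.0, n_classes=2)
```

Computing such a matrix separately for control and treated regions gives two profiles that can be displayed side by side as heatmaps.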
3. Results and Discussion
Using the texture subtypes identified by clustering on GLCM-based features, we constructed heatmaps for control and treated samples. Figure 3 shows co-occurrence heatmaps of texture subtypes. Visually, there appears to be a clear difference between control and treated tissues. Clusters of γH2AX+ cells with labels 0 and 3 are more prominent in treated tissues. On the other hand, control tissues exhibit clusters of γH2AX+ cells with labels 0 and 2. We also observed differences in heatmaps using pseudo-class labels generated from clustering VAE-GAN encodings, as shown in Figure 3. The number of neighboring γH2AX+ cells with labels 2 and 4 is elevated in control tissues, while in treated tissues γH2AX+ cells with labels 1 and 2 are more prominent. Both representations resulted in distinct profiles for control and treated regions. This demonstrates that the identified texture subtypes capture information that can be utilized for a more refined analysis of tissue sections. Because the data used in this work lack annotation, however, explaining the profile differences is difficult. It will be worth exploring how these texture subtypes correspond to DNA damage response types.
Fig. 3.
Clustering on GLCM representations (A) and VAE-GAN encodings (B). Heatmaps, using pseudo-classes from GLCM (C) and VAE-GAN (D), show differences in γH2AX profiles in neighborhoods in control and treated samples. Proximity analysis shows a higher number of γH2AX+ cells near CD8+ cells in treated samples (E).
Lastly, we looked at the proximity of identified CD8+ cells to γH2AX+ cells. Our understanding of the links between immune response and DNA repair still contains gaps, and tools that can provide quantitative measurements have the potential to accelerate current studies in this area. In Figure 3, we enumerated γH2AX+ cells within the neighborhood of CD8+ cells. We computed this on various scales (32px (10.28μm), 64px (20.56μm), 128px (41.13μm)). In general, we see an increase in the number of γH2AX+ cells near CD8+ cells in treated samples. While further tests are needed for increased immune response to be claimed with certainty, quantifying the proximity of these two cell types can be useful in establishing baseline levels and investigating the heterogeneity in tumor tissues.
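The multi-scale enumeration can be sketched as a count of γH2AX+ centroids within each radius of every CD8+ centroid. The example is a hedged illustration: the function name is hypothetical, coordinates are assumed to be isotropic pixel positions, and the toy centroids are invented purely to exercise the three radii named above.

```python
import numpy as np

def count_neighbors_at_scales(cd8_centroids, h2ax_centroids, radii_px):
    """For each radius, count gamma-H2AX+ centroids within that distance of each CD8+ centroid."""
    cd8 = np.asarray(cd8_centroids, dtype=float)
    h2ax = np.asarray(h2ax_centroids, dtype=float)
    # Distances from every CD8+ cell (rows) to every gamma-H2AX+ cell (columns).
    dist = np.linalg.norm(cd8[:, None, :] - h2ax[None, :, :], axis=-1)
    return {radius: (dist <= radius).sum(axis=1) for radius in radii_px}

# Toy example: one CD8+ cell with gamma-H2AX+ cells at distances 10, 50, 100, and 300 px.
demo_cd8 = [[0, 0, 0]]
demo_h2ax = [[10, 0, 0], [0, 50, 0], [0, 0, 100], [300, 0, 0]]
counts = count_neighbors_at_scales(demo_cd8, demo_h2ax, radii_px=(32, 64, 128))
```

Averaging these per-cell counts within control and treated regions gives one value per scale for each condition.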
4. Conclusion
Here we presented a pipeline for the analysis and profiling of 3D histology samples. We demonstrated using a 4T1 dataset that texture representations using GLCM and VAE-GAN have a potential impact on DNA repair analysis in tissue microenvironments. Using our approach, we identified γH2AX+ and CD8+ cells and constructed profiles describing local cell neighborhoods at different scales. The learned texture subtypes and the neighborhood profiles show notable differences between control and treated tissues. This can enable a more precise quantification of experiments, particularly of the response to anti-cancer drugs and therapies.
In future work, we aim to validate the texture clusters formed from GLCM and VAE-GAN features against morphological manifestations of DNA repair and apoptosis types. While the pseudo-class labels from these clusters are demonstrated to be useful in showing differences between regions in treated and control tissues, it is desirable to improve the explainability of the labels and ground them in current biological and pathological knowledge. This, however, would entail curating individual cells and incorporating additional markers to determine the specific types. Moreover, we envision extending our profiling to other specific cell targets, such as immune cells, to support further studies looking at links between immune response and DNA repair in tumor contexts.
Acknowledgements
This project has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26100055 under Contract No. HHSN261201500003I. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government. The computational aspects of this research were supported by the Wellcome Trust Core Award Grant Number 203141/Z/16/Z and the NIHR Oxford BRC. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health. K.D. is supported by EPSRC and MRC (EP/L016052/1), the University of the Philippines, and the Philippine Department of Science and Technology (ERDT). M.D. and J.R. were funded by a Wellcome Collaborative award (203285/C/16/Z) from Wellcome Trust.
Contributor Information
Kristofer E. delas Peñas, Email: kristofer.delaspenas@wolfson.ox.ac.uk.
Mariia Dmitrieva, Email: mariia.dmitrieva@eng.ox.ac.uk.
Jens Rittscher, Email: jens.rittscher@eng.ox.ac.uk.
References
- 1. Alhmoud JF, Woolley JF, Al Moustafa AE, Malki MI. DNA damage/repair management in cancers. Cancers. 2020;12(4) doi: 10.3390/cancers12041050.
- 2. Brunner S, Varga D, Bozó R, Polanek R, Tőkés T, Szabó ER, Molnár R, Gémes N, Szebeni GJ, Puskás LG, Erdélyi M, et al. Analysis of Ionizing Radiation Induced DNA Damage by Superresolution dSTORM Microscopy. Pathology and Oncology Research. 2021;27 doi: 10.3389/pore.2021.1609971.
- 3. Burrell RA, McGranahan N, Bartek J, Swanton C. The causes and consequences of genetic heterogeneity in cancer evolution. Nature. 2013;501(7467) doi: 10.1038/nature12625.
- 4. Chae YK, Anker JF, Carneiro BA, Chandra S, Kaplan J, Kalyan A, Santa-Maria CA, Platanias LC, Giles FJ. Genomic landscape of DNA repair genes in cancer. Oncotarget. 2016;7(17) doi: 10.18632/oncotarget.8196.
- 5. Chartsias A, Joyce T, Papanastasiou G, Semple S, Williams M, Newby DE, Dharmakumar R, Tsaftaris SA. Disentangled representation learning in cardiac image analysis. Medical Image Analysis. 2019;58 doi: 10.1016/j.media.2019.101535.
- 6. Chen S, Zhao M, Wu G, Yao C, Zhang J. Recent advances in morphological cell image analysis. Computational and mathematical methods in medicine. 2012;2012:101536. doi: 10.1155/2012/101536.
- 7. Do K, Wilsker D, Ji J, Zlott J, Freshwater T, Kinders RJ, Collins J, Chen AP, Doroshow JH, Kummar S. Phase I study of single-agent AZD1775 (MK-1775), a wee1 kinase inhibitor, in patients with refractory solid tumors. Journal of Clinical Oncology. 2015;33(30) doi: 10.1200/JCO.2014.60.4009.
- 8. Dull AB, Wilsker D, Hollingshead M, Mazcko C, Annunziata CM, LeBlanc AK, Doroshow JH, Kinders RJ, Parchment RE. Development of a quantitative pharmacodynamic assay for apoptosis in fixed tumor tissue and its application in distinguishing cytotoxic drug-induced DNA double strand breaks from DNA double strand breaks associated with apoptosis. Oncotarget. 2018;9(24) doi: 10.18632/oncotarget.24936.
- 9. Essen DVAN, Kelly J. Correlation of Cell Shape and Function in the Visual Cortex of the Cat. Nature. 1973 Feb;241(5389):403–405. doi: 10.1038/241403a0.
- 10. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. In: Advances in Neural Information Processing Systems. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, editors. Vol. 27. Curran Associates, Inc; 2014. Generative Adversarial Nets; pp. 2672–2680. http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
- 11. Green AR, Aleskandarany MA, Ali R, Hodgson EG, Atabani S, De Souza K, Rakha EA, Ellis IO, Madhusudan S. Clinical impact of tumor DNA repair expression and T-cell infiltration in breast cancers. Cancer Immunology Research. 2017;5(4) doi: 10.1158/2326-6066.CIR-16-0195.
- 12. Haralick RM, Dinstein I, Shanmugam K. Textural Features for Image Classification. IEEE Transactions on Systems, Man and Cybernetics. 1973;SMC-3(6) doi: 10.1109/TSMC.1973.4309314.
- 13. Kinders RJ, Hollingshead M, Lawrence S, Ji U, Tabb B, Bonner WM, Pommier Y, Rubinstein L, Evrard YA, Parchment RE, Tomaszewski J, et al. Development of a validated immunofluorescence assay for γH2AX as a pharmacodynamic marker of topoisomerase I inhibitor activity. Clinical Cancer Research. 2010;16(22) doi: 10.1158/1078-0432.CCR-09-3076.
- 14. Kingma DP, Welling M. An Introduction to Variational Autoencoders. CoRR. 2019:abs/1906.0. http://arxiv.org/abs/1906.02691
- 15. Labouesse C, Verkhovsky AB, Meister JJ, Gabella C, Vianay B. Cell Shape Dynamics Reveal Balance of Elasticity and Contractility in Peripheral Arcs. Biophysical Journal. 2015 May;108(10):2437–2447. doi: 10.1016/j.bpj.2015.04.005.
- 16. Larsen ABL, Sønderby SK, Larochelle H, Winther O. Autoencoding beyond Pixels Using a Learned Similarity Metric; Proceedings of the 33rd International Conference on International Conference on Machine Learning; 2016. pp. 1558–1566.
- 17. Li LY, Guan YD, Chen XS, Yang JM, Cheng Y. DNA Repair Pathways in Cancer Therapy and Resistance. Frontiers in Pharmacology. 2021;11 doi: 10.3389/fphar.2020.629266.
- 18. Phillip JM, Han KS, Chen WC, Wirtz D, Wu PH. A robust unsupervised machine-learning method to quantify the morphological heterogeneity of cells and nuclei. Nature Protocols. 2021 Feb;16(2):754–774. doi: 10.1038/s41596-020-00432-x.
- 19. Rappez L, Rakhlin A, Rigopoulos A, Nikolenko S, Alexandrov T. DeepCycle reconstructs a cyclic cell cycle trajectory from unsegmented cell images using convolutional neural networks. Molecular systems biology. 2020;16(10):e9474. doi: 10.15252/msb.20209474.
- 20. Redon CE, Nakamura AJ, Zhang YW, Ji J, Bonner WM, Kinders RJ, Parchment RE, Doroshow JH, Pommier Y. Histone γH2AX and poly(ADP-ribose) as clinical pharmacodynamic biomarkers. Clinical Cancer Research. 2010;16(18) doi: 10.1158/1078-0432.CCR-10-0523.
- 21. Sivapriya TR, Saravanan V, Ranjit Jeba Thangaiah P. Communications in Computer and Information Science. Vol. 204. CCIS; 2011. Texture analysis of brain MRI and classification with BPN for the diagnosis of dementia.
- 22. Stringer C, Wang T, Michaelos M, Pachitariu M. Cellpose: a generalist algorithm for cellular segmentation. Nature Methods. 2021;18(1) doi: 10.1038/s41592-020-01018-x.
- 23. Varga D, Majoros H, Ujfaludi Z, Erdélyi M, Pankotai T. Quantification of DNA damage induced repair focus formation: Via super-resolution dSTORM localization microscopy. Nanoscale. 2019;11(30) doi: 10.1039/c9nr03696b.
- 24. Wilsker DF, Barrett AM, Dull AB, Lawrence SM, Hollingshead MG, Chen A, Kummar S, Parchment RE, Doroshow JH, Kinders RJ. Evaluation of pharmacodynamic responses to cancer therapeutic agents using DNA damage markers. Clinical Cancer Research. 2019;25(10) doi: 10.1158/1078-0432.CCR-18-2523.
- 25. Xian GM. An identification method of malignant and benign liver tumors from ultrasonography based on GLCM texture features and fuzzy SVM. Expert Systems with Applications. 2010;37(10) doi: 10.1016/j.eswa.2010.02.067.



