PLOS Biology. 2023 Nov 9;21(11):e3002357. doi: 10.1371/journal.pbio.3002357

Accurate classification of major brain cell types using in vivo imaging and neural network processing

Amrita Das Gupta 1, Livia Asan 1,¤, Jennifer John 1, Carlo Beretta 1, Thomas Kuner 1,*, Johannes Knabbe 1,2,*
Editor: Carole A. Parent
PMCID: PMC10689024  PMID: 37943858

Abstract

Comprehensive analysis of tissue cell type composition using microscopic techniques has primarily been confined to ex vivo approaches. Here, we introduce NuCLear (Nucleus-instructed tissue composition using deep learning), an approach combining in vivo two-photon imaging of histone 2B-eGFP-labeled cell nuclei with subsequent deep learning-based identification of cell types from structural features of the respective cell nuclei. Using NuCLear, we were able to classify almost all cells per imaging volume in the secondary motor cortex of the mouse brain (0.25 mm³ containing approximately 25,000 cells) and to identify their position in 3D space in a noninvasive manner using only a single label throughout multiple imaging sessions. Twelve weeks after baseline, cell numbers did not change, yet astrocytic nuclei significantly decreased in size. NuCLear opens a window to study changes in relative density and location of different cell types in the brains of individual mice over extended time periods, enabling comprehensive studies of changes in cell type composition in physiological and pathophysiological conditions.


This study presents NuCLear (Nucleus-instructed tissue composition using deep learning), a novel method enabling in vivo analysis of cell types in the mouse brain. It combines two-photon imaging of labeled cell nuclei with deep learning to classify and track cells over time, allowing the study of cellular changes in physiological and pathological conditions.

Introduction

Understanding the plasticity and interactions of different brain cell types in vivo, and thereby investigating large-scale structural brain alterations associated with diverse physiological and pathophysiological states, has been challenging due to the inability to study multiple cell types simultaneously. Until now, investigating the effect of an experimental intervention on different cortical cell types and their respective spatial interactions would require an individual labeling strategy for each cell type, using a set of specific promoters and fluorophores. This would require prioritizing cell types from the outset and thereby limit the scope to a small subset of the population. With multiple cell types to consider, the number of experiments and animals investigated may become intractable. To date, quantitative studies assessing whole tissue composition have primarily employed ex vivo approaches using manual or automated analysis of antibody staining in tissue sections [1], isotropic fractionation [2], in situ hybridization [3], or single-cell sequencing [4]. Other studies used manual [5] or automated stereological approaches [6,7]. All these studies were conducted ex vivo, and a subset required the dissolution of the studied organ’s cellular architecture. The ability to identify multiple cell types and their locations in 3D space within a single subject, here referred to as “tissue composition,” in the intact living brain, and to quantify and observe changes over time, opens new opportunities.

Here, we propose an experimental approach to study excitatory and inhibitory neurons as well as glial cells, specifically astroglia, microglia, and oligodendroglia, and endothelial cells in the same mouse in vivo over time, using a single genetically modified mouse line expressing a fusion protein of histone 2B and enhanced green fluorescent protein (eGFP) in all cell nuclei. Our approach uses a deep learning method, implementing artificial neural networks to classify each nucleus as belonging to a specific cell type, making it possible to perform nucleus-instructed tissue composition analysis using deep learning (NuCLear). Additionally, determining the precise coordinates of every nucleus within the imaged space allows one to assess spatial relations of cells, e.g., the degree of cell clustering versus even distribution of cells. This can be utilized as an indirect marker of glial territory size, which has been proven relevant in the pathogenesis of diseases such as Alzheimer’s disease [8], or give information about neuron-glia-vasculature proximity [9]. Apart from the application in longitudinal in vivo imaging of the mouse brain, the concept of this technique could be applicable to many other cellular imaging techniques such as confocal and widefield imaging of different organs, when ground truth data for classifier training are available. This will make NuCLear a powerful approach for future analyses of large-scale automated tissue composition.

Results

For the comprehensive identification and tracking of cell type composition in the mouse brain over the course of weeks and even months, a mouse line constitutively expressing a human histone 2B-eGFP (H2B-eGFP) fusion protein in all cell nuclei was imaged using two-photon microscopy after implantation of a chronic cranial window (Figs 1A and S1A). Volumetric images taken from this mouse line show a distribution of nuclei resembling the well-known DAPI staining (S1B and S1C Fig). The idea behind the approach was to use nuclei as proxies for cells and train a neural network with ground truth data to classify nuclei belonging to distinct cell types. In brief, the proposed method consists of 3 main steps: (1) automated nuclei segmentation in the raw data (detection and labeling of nuclei); (2) feature extraction of the segmented nuclei; and (3) classification of individual nuclei using pretrained classifiers for each desired cell type (Fig 1B, 1C and 1C(i)). In our study, we selected 5 different cell types for classification: neurons, astroglia, microglia, oligodendroglia, and endothelial cells. Neurons were further classified into inhibitory or excitatory subtypes.
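In practice, the 3 steps map onto open-source building blocks (StarDist for segmentation, PyRadiomics for feature extraction). The following minimal Python sketch chains them for a single volume; the file names, model name, and paths are illustrative placeholders rather than the published NuCLear API:

```python
from csbdeep.utils import normalize
from radiomics import featureextractor
from stardist.models import StarDist3D
from tifffile import imread
import SimpleITK as sitk

# (1) Segment all nuclei in a two-photon volume with a pretrained StarDist model.
volume = imread("h2b_egfp_stack.tif")                        # ZYX image stack
model = StarDist3D(None, name="nuclei3d", basedir="models")  # placeholder weights
labels, _ = model.predict_instances(normalize(volume, 1, 99.8))

# (2) Extract radiomics features for one segmented nucleus (here: label 1).
extractor = featureextractor.RadiomicsFeatureExtractor()
features = extractor.execute(sitk.GetImageFromArray(volume),
                             sitk.GetImageFromArray(labels), label=1)

# (3) The per-nucleus feature vectors are then passed to pretrained one-vs-rest
# classifiers (see Materials and methods), assigning each nucleus a cell type,
# "Undecided", or "Unclassified".
```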

Fig 1. Imaging of cell nuclei and classification into different types.


(A) Illustration showing acquisition of a 700 μm × 700 μm × 700 μm image volume of cell nuclei in the neocortex using in vivo 2P microscopy in a H2B-eGFP mouse; 3D reconstruction and single imaging plane. (B) Flow diagram showing the classification process from acquisition of the 2P images until the final classification. (C) Illustration visualizing a simplified process of the training and classification algorithms. Using an overlay of red (only 1 cell type labeled per mouse) and green (nuclei with H2B-eGFP expression: bright green) fluorescent images, ground truth data were obtained to train a supervised neural network to classify nuclei into cell types. Yellow: Neurons, blue: Astroglia, green: Microglia, turquoise: Oligodendroglia, red: Endothelial cells. (C(i)) Visualization of the classification process described in (B) and the blue dashed area in (C). Image volumes were obtained from H2B-eGFP mice, each nucleus was automatically segmented. Features were extracted and each nucleus was classified using pretrained classifiers. (N = Neuron, Ag = Astroglia, Mg = Microglia, Og = Oligodendroglia, EC = Endothelial cells, UC = unclassified cells, UD: undecided cells (classified as belonging to multiple classes)). 2P, 2-photon; H2B-eGFP, histone 2B-eGFP.

To be able to train a classifier for each cell type, ground truth data had to be generated. In addition to the H2B-eGFP-labeled nuclei, a red fluorescent marker protein was introduced (tdTomato, DyLight 549, or mCherry) to mark a specific cell type. The colocalization of the red and green labels allowed us to assign each nucleus to a specific cell type and to extract ground truth data for nucleus classification in the green channel. Nuclei belonging to a specific cell type were manually selected and their features were used to train a neural network classifier in a supervised way (Fig 2A). To avoid a bleed-through effect of the red fluorescent protein tdTomato into the eGFP channel, which could influence the quality of the classification, we induced the Cre-ERT2-dependent expression of tdTomato via intraperitoneal injection of tamoxifen (S1D Fig, right) for cell type identification after the first imaging of the nuclei (Fig 2B; overlapping emission spectra for eGFP and tdTomato are shown in S1E Fig, bleed-through of the tdTomato signal into the eGFP channel is demonstrated in S1F Fig). As microglia were the only cells to change their positions after induction with tamoxifen, we chose acute injection of tomato lectin coupled to DyLight 594 to label these cells, which did not produce any signal alterations in the eGFP channel (S1G Fig). Neurons were labeled using a cortically injected AAV expressing the mCherry red fluorescent protein under the neuron-specific synapsin promoter (syn1-mCherry), which did not affect the eGFP signal due to its specific fluorescence properties with a more red-shifted emission spectrum compared to tdTomato (S1E and S1G Fig). To further distinguish between neuronal subtypes, excitatory and inhibitory neurons were labeled by AAV-mediated expression of CamKIIα-mCherry and mDLX-mRuby2, respectively (S2A and S2B Fig). To prepare the ground truth datasets of tdTomato-expressing mice, nuclei were manually selected from the preinduction time point images (green fluorescence signal) wherever they overlapped with tdTomato signal in the post-induction time point images (red fluorescence signal) (Figs 2C and S4B). For the microglial and neuronal datasets, nuclei that overlapped in the green and red channels were manually selected (Figs 2C and S2B).

Fig 2. Strategies for training a neural network classifier using nucleus features.


(A) Flow diagram showing the classification training pipeline. Cell type-specific red fluorescent proteins were used to manually identify H2B-eGFP nuclei. This information was used to train a classifier for each cell type. (B) Labeling strategies used for each cell type. Reporter mouse lines were created by breeding H2B-eGFP mice to carry a floxed sequence of the red fluorescent protein tdTomato and a tamoxifen-inducible Cre-recombinase under the expression of different cell type-specific promoters that were used for identification of astroglia, oligodendroglia, and endothelial cells (see Methods). Microglia and neurons were visualized using intracortical injections of Lycopersicon esculentum (tomato) lectin and an AAV expressing mCherry under the synapsin promoter, respectively. (C) Nuclei belonging to a specific cell type could be identified by a red fluorescent marker (maximum intensity projections; z = 20 μm). 3D renderings of individual nuclei are shown in the lower panel. Scale bar: 50 μm. (D) Illustrations of a subset of radiomics features showing examples of different shape, intensity, and texture features. (E) Radar plots showing a subset of nuclear radiomics features for each cell type. (F) Comparison of nuclear features (3D diameter in voxels (voxel resolution in xyz: 0.29 μm × 0.29 μm × 2 μm), intensity non-uniformity, flatness, entropy) between cell types (Wilcoxon test, Bonferroni correction for multiple comparisons; n of neurons: 135, n of astroglia: 137, n of microglia: 62, n of oligodendroglia: 72, n of endothelial cells: 155; p < 0.05 *, p < 0.01 **, p < 0.001 ***). N = Neuron, AGlia = Astroglia, MGlia = Microglia, OGlia = Oligodendroglia, EC = Endothelial cells. Plot data can be found in S1 Data. H2B-eGFP, histone 2B-eGFP.

Automated segmentation of nuclei in the raw data was achieved using the StarDist neural network [10], trained on manually traced and labeled ground truth datasets of the H2B-eGFP mouse line (S1H(i)–S1H(ii) and S1I Fig). The trained network showed a high nucleus detection accuracy of 94% as well as good shape segmentation of nuclei (S1H(iii)–S1H(iv) and S1I Fig). From the binary mask of individual nuclei and their respective pixel intensities in the raw image, in total 107 features were extracted using the PyRadiomics package [11], including 3D diameter, flatness, gray level non-uniformity, entropy, first order minimum, and coarseness (Fig 2D) (for a full list and short description of the features, see S1 and S2 Tables). To reduce possible overfitting of the classification algorithm, a subset of 12 features was automatically selected using a sequential feature selection algorithm (S2 Table). Radar plots with these nuclear features visualize the differences between the cell types, for example, neuronal nuclei having a larger 3D diameter and minor axis length in comparison to microglial and astroglial nuclei, which exhibit a larger pixel entropy (Fig 2E). For certain features, significant differences between the cell types can be shown as well, for example, flatness being significantly higher in nuclei of microglia than in nuclei of neurons (Fig 2F). Furthermore, combinations of 2 or 3 distinctive features allow for the visual separation of nuclei of different cell types in 2D and 3D space (S3 Fig). When analyzing inhibitory and excitatory neurons, nuclei showed significant differences in shape and texture features (S2C(i)–S2C(iv) Fig).

Having demonstrated the ability to distinguish between cell types using only nuclear features, we created a neural network model for classifying cell types based on these features (S4A Fig). Utilizing the dataset obtained from the 5 reporter mouse models, 5 neural network classifiers were trained, 1 for each cell type. The purpose of each classifier was to differentiate its corresponding cell type from the diverse array of other cell types (Figs 1C(i) and 3A). To increase the amount of training data and equalize the nuclei counts for each cell type, thus reducing training bias, synthetic data were generated from the features of the original dataset (S4C Fig). The distribution of the synthetic data fit well to the distribution of the original data for each cell type (S4C and S4D Fig). After each nucleus was classified by all trained classifiers, it was either assigned to a single class (neurons, astroglia, microglia, oligodendroglia, and endothelial cells), to 2 or more classes (undecided), or to none (unclassified). Precision and recall rates for the model were high for neurons and endothelial cells (Fig 3B). Due to their relative similarity, glial cells exhibited lower precision and recall rates. The classification accuracy for the entire training dataset was highest for neurons (98%) and endothelial cells (99%), whereas the classifier showed slightly lower accuracies for all glial cell types, especially for microglia (96%) (Fig 3C and 3D; for the respective confusion plots, see S5 Fig). To be able to distinguish between inhibitory and excitatory neurons, another classifier was trained on the ground truth data from excitatory and inhibitory cells. This classifier had a 93% overall accuracy (S2D Fig), enabling a good distinction between inhibitory and excitatory neuronal cells. The 3D classification results for 1 example cortical volume containing 20,123 nuclei (S6A and S6B Fig) visualize the relative density and positioning of cell types within the volume and their mutual spatial relationship (see also S1 Movie for an animated version of the segmentation and classification process).

Fig 3. Training a neural network for cell type classification.


(A) Schematic depiction of the training process. After segmenting nuclei in the raw data with StarDist for every cell type, radiomics features of nuclei were extracted from the ground truth data (blue) and synthetic data (green) were generated. For each cell type, a single neural network was trained to distinguish the corresponding nucleus type from the other nuclei. (B) Precision, recall, false discovery rate, and false negative rate for the test dataset. (C, D) Mean accuracy for all cell types in the validation dataset (15% of the ground truth dataset) and the test dataset (15% of the ground truth dataset). The dotted black horizontal line shows the mean of all classifiers. (E, F) Comparison of mean nucleus volume (unit: μm³) and mean nuclear surface to volume ratio (unit: μm⁻¹) of each cell type for classified volumes. (G) Number of nuclei per mm³ in the secondary motor cortex. (H) Number of nuclei per mm³ in the secondary motor cortex 12 weeks apart. Black horizontal line depicts the mean per class. (I) Comparison of normalized numbers of nuclei after 12 weeks. (J) Mean nucleus volume after 12 weeks normalized to baseline. (Significance testing for C–J: Wilcoxon test, p-values were corrected for multiple comparisons using the Bonferroni method, n = 8 mice, p < 0.05 *, p < 0.01 **, p < 0.001 ***.) N = Neuron, AGlia = Astroglia, MGlia = Microglia, OGlia = Oligodendroglia, EC = Endothelial cells. Plot data can be found in S1 Data.

To apply NuCLear in mice, we studied 16 imaging volumes from 8 mice (male, 12 to 14 weeks old at baseline) in layer 2/3 of the secondary motor cortex 12 weeks apart. When the trained model was applied to the first imaging time point (baseline), 2 features not chosen by the automated feature selection algorithm differed significantly between the classes: nucleus volume (Fig 3E) and mean surface to volume ratio, a measure of the sphericity of the nucleus (Fig 3F). Both findings are expected for each of the cell types and hence demonstrate that even features that were not used for training the neural network model are able to differentiate between the classes. At baseline, we calculated a density of 43,000 neurons per mm³ and of 57,000 glial cells per mm³ (including unclassified cells that were counted as glial cells), resulting in a glia to neuron ratio of 1.3 (Fig 3G). The density of excitatory neurons was 37,000 cells/mm³ compared to 6,000 cells/mm³ for inhibitory neurons (S2E Fig). The density of cells 12 weeks after baseline imaging remained unchanged after correcting for multiple comparisons (Fig 3H). When normalizing these counts to baseline, only the number of astrocytes showed a trend toward an increase (Fig 3I), which could be due either to astrogliosis caused by aging or to a continued reaction to the chronic window implantation. Interestingly, astroglial nuclear volume significantly decreased over time (Fig 3J), which might be due to altered transcriptional activity known to occur in astrocytes with aging [12,13], whereas the nuclear volume of inhibitory and excitatory neurons did not change over time (S2F Fig). This latter result illustrates that our method can extract even subtle and unexpected changes from a vast parameter space of different aspects of tissue composition.

Discussion

Here, we present a novel method for intravital, longitudinal, comprehensive, large-scale, and 3D determination of tissue composition based on a deep learning algorithm trained on radiomics features of cell nuclei (NuCLear: Nucleus-instructed tissue composition using deep learning). We demonstrate that cell types and even subtypes can be reliably distinguished via the respective properties of their cell nucleus and that an inventory of cells and their 3D positions can be generated for a large imaging volume. To demonstrate the usability of NuCLear, we analyzed volumetric images of the cortex of H2B-eGFP mice acquired with in vivo two-photon microscopy. To be able to image a large dataset in a comparatively short time, we chose an axial resolution of 2 μm and a lateral resolution of 0.3 μm. We were able to image a whole 3D volume of 700 μm × 700 μm × 700 μm consisting of approximately 25,000 cells in around 20 min. This speed will make it possible to perform large-scale data acquisition in a repeated manner, enabling longitudinal intravital monitoring of tissue histology over time and in response to perturbations, treatments, or adaptive processes such as learning or exploring.

Neurons and endothelial cells showed the best classification results regarding total accuracy as well as precision and recall rates (Fig 3B), as their nucleus feature profiles turned out to be markedly different from each other as well as from those of the individual glial cell types (Fig 2E and 2F). Differentiation between the 3 glial cell types was more difficult to achieve, resulting in lower overall accuracy rates as well as lower precision and recall rates, due to the higher similarity of their respective nucleus feature profiles. Nevertheless, with an accuracy above 90%, our approach can reliably be used to subclassify glia. We expect that a higher imaging resolution as well as more training data would further increase classifier performance. Another reason for the comparatively lower classification accuracy in microglia might be the acute injection of tomato lectin used to label these cells, which could have introduced a bias towards activated microglia. Nevertheless, the overall accuracy of the microglia classifier for the whole training dataset was around 96%, owing to the fact that non-microglial nuclei could be identified correctly. For astrocytes, we used GFAP-labeled cells for classification, a label that primarily marks reactive astrocytes and shows higher expression in older animals. Future studies could optimize this labeling strategy. Additional confirmation of the classifier’s precision can be inferred by examining characteristics of the nuclei, like average nuclear volume and the surface-to-volume ratio, which were not utilized in the training phase (Fig 3E and 3F). These features showed distinct clustering and low standard deviation in each class as well as significant differences between the cell types.

When applying NuCLear to imaging volumes from layer 2/3 of the secondary motor area and calculating the number of cells per mm³, the number of neurons closely matches published results (43,413 ± 1,003 versus 45,374 ± 4,894) [1] (Fig 3G). Astrocyte numbers are similar (18,629 ± 1,007 versus 13,258 ± 1,416), while microglia numbers differ more (12,411 ± 745 versus 24,584.6 ± 2,687). Oligodendrocyte numbers seem to be vastly different (4,317 ± 154 versus 51,436.1 ± 488), which might stem from the fact that only layer 2/3 was analyzed, excluding deeper cortical layers that have been shown to contain most oligodendrocytes [14]. When all glial cells as well as undecided and unclassified cells are summed up, a glia/neuron ratio of 1.3 can be calculated, a result that is in line with known results for the rodent cortex [15]. Candidates for unclassified cells could be the relatively large population of oligodendrocyte precursor cells (about 5% of all cells in the brain [16]). After classifying the neurons into subtypes, 86% of all neurons were classified as excitatory and 14% as inhibitory (37,199 ± 1,387 versus 6,214 ± 1,001, respectively), which is on par with previous studies that reported 10% to 20% of neurons in layer 2/3 of the cortex being inhibitory interneurons [17,18]. In conclusion, cell type counts are comparable to published results, supporting the validity of our approach.

A decisive advantage of NuCLear over existing ex vivo methods is the ability to study cell type changes over time in the same imaging volume, thus achieving higher statistical power in experiments with fewer animals, which is crucial for complying with the “3R” rules in animal research (reduce, replace, refine). We succeeded in imaging selected locations up to 1 year after the implantation of chronic cranial windows. When analyzing the same imaging volumes from the secondary motor cortex 12 weeks apart, we were able to detect a trend towards astrogliosis, which might be due to the aging process leading to higher GFAP reactivity [19]. The significant decrease in mean nucleus volume of astroglia over time could be attributed to altered transcriptional activity associated with aging [12,13], since astroglial reactivity in the cortex changes over time [13]. Another factor underlying astrogliosis might be the continued presence of the chronic cranial window [20]. Our study illustrates only 1 possible application of NuCLear and subsequent analysis of 3D tissue composition. Here, we focused on the identification of cell types, but the 3D datasets generated allow for additional analyses such as the statistical distribution of cell types relative to each other, their nearest neighbor distances, or inferences on physical tissue volume [21]. Our classifiers mostly rely on geometrical parameters such as nucleus shape, but also consider texture information to a certain degree. Pathophysiological conditions that alter nuclear shape and texture might influence classification accuracy. An increased sampling resolution may further increase textural information and thereby yield even higher classification accuracies. For now, segmentation and classification depend on the type of microscope used, the quality of the images, the depth of imaging, and the region analyzed. To make the classification more robust to differences in fluorescence signal quality, augmentations such as 3D blurring with PSF-shaped convolutions could be added to the training data.

NuCLear will be applicable to different organs, as techniques have emerged in recent years to perform in vivo imaging through chronic windows in other rodent organs such as skin, abdominal organs, the tongue, or spinal cord [22], depending on the image quality and the possibility of acquiring images at multiple time points. Other organisms and in vitro preparations like cell cultures or organoids can also be utilized, provided that the respective ground truths can be generated. NuCLear will be usable with a variety of microscopy techniques such as confocal, light-sheet, or three-photon microscopy, making large-scale tissue analysis much more accessible. We assume that it will be possible to detect more cell types by adding classifiers trained with appropriate ground truth data. In this way, we propose a readily usable method to implement large-scale tissue analysis ex vivo or in vivo to study effects of interventions on the cell type composition of different organs.

Materials and methods

Ethical approval

The entire study was conducted in line with the European Communities Council Directive (86/609/EEC) to reduce animal pain and/or discomfort. All experiments followed the German animal welfare guidelines specified in the TierSchG. The study has been approved by the local animal care and use council (Regierungspräsidium Karlsruhe, Baden-Wuerttemberg) under the reference numbers G294/15 and G27/20. All experiments followed the 3R principle and complied with the ARRIVE guidelines [23].

Animals

Adult transgenic mice expressing the human histone 2B protein (HIST1H2B) fused with eGFP under the control of the chicken beta actin promoter (CAG-H2B-eGFP) [24] (B6.Cg-Tg(HIST1H2BB/EGFP)1Pa/J, Jackson Laboratory; # 006069) were used in all animal experiments. For the reporter mouse lines, H2B-eGFP mice were bred to carry a floxed sequence of the red fluorescent protein tdTomato and a tamoxifen-inducible Cre-recombinase under the expression of different cell type-specific promoters: GFAP (Glial Fibrillary Acidic Protein) for astroglia (HIST1H2BB/EGFP-GFAP/ERT2CRE-CAG/loxP/STOP/loxP/tdTomato), PLP (Proteolipid Protein) for oligodendroglia (HIST1H2BB/EGFP-PLP/ERT2CRE-CAG/loxP/STOP/loxP/tdTomato), and Tie2 (Tyrosine Kinase) for endothelial cells (HIST1H2BB/EGFP-Tie2/ERT2CRE-CAG/loxP/STOP/loxP/tdTomato). Expression of the Cre-recombinase was achieved over a course of up to 5 days with daily intraperitoneal injection of 2 doses of 1 mg Tamoxifen (Sigma Life Sciences; dissolved in 1 part ethanol (99.8% absolute) and 10 parts sunflower oil) (Figs 2B and S1D). Microglia were visualized using intracortical injection of Lycopersicon esculentum (tomato) lectin coupled with DyLight 594 during the cranial window surgery after dilution by 1:39 in 150 mM NaCl, 2.5 mM KCl, 10 mM HEPES at pH 7.4 [25] (Figs 2B and S1D). For neuronal labeling, an in-house-produced adeno-associated virus expressing mCherry under the synapsin promoter was cortically injected during the cranial window surgery (Figs 2B and S1D). To label excitatory and inhibitory neurons, a viral labeling strategy was implemented via intracortical injections, using AAV5-CamKIIα-mCherry (Addgene plasmid 114469) and AAV1-mDLX-mRuby2 (Addgene plasmid 99130) (S2 Fig) [26]. All mice were between 10 and 18 weeks old during baseline imaging (5 female/10 male).

Chronic cranial window implantation

A craniectomy was performed on each mouse to enable in vivo two-photon imaging, following a predefined protocol as described before [27] (S1A Fig). Briefly, the mice were anesthetized with an intraperitoneal (i.p.) injection of a combination of 60 μl Medetomidine (1 mg/ml), 160 μl Midazolam (5 mg/ml), and 40 μl Fentanyl (0.05 mg/ml) at a dosage of 3 μl/g body weight. The head was shaved, the mice were fixed with ear bars in the stereotactic apparatus, and eye ointment was applied. Xylocain 1% (100 μl, Lidocaine hydrochloride) was applied under the cranial skin and 250 μl Carprofen (0.5 mg/ml) was injected subcutaneously (s.c.). The skin was removed to expose the skull, and the surface of the skull was roughened to allow the cement to adhere better. A skull island (approx. 6 mm ∅) centered 1 mm rostral to bregma was drilled using a dental drill and removed with forceps (#2 FST by Dumont), making sure not to damage the cortical surface. For improved imaging conditions, the dura was carefully removed from both hemispheres using fine forceps (#5 FST by Dumont). Normal rat Ringer solution (NRR) was applied to the exposed brain surface to keep it moist, and a curved cover glass was placed on top to cover it [21]. The cover glass was sealed with dental acrylic cement (powder: Paladur, Kulzer; activator: Cyano FAST, Hager & Werken GmbH), and excess cement was used to cover the exposed skull and the edges of the skin. A custom-designed 3D-printed holder was placed on top of the window and any gaps were filled with cement to ensure maximum adhesion to the skull. After the procedure, the mice were injected (i.p./s.c.) with a mix of 30 μl Atipamezole (5 mg/ml), 30 μl Flumazenil (0.1 mg/ml), and 180 μl Naloxone (0.4 mg/ml) at a dosage of 6 μl per gram body weight. To ensure proper recovery, 3 more doses of Carprofen were given every 8 to 12 h, and the mice were placed onto a heating plate and monitored.

In vivo two-photon imaging

Imaging was carried out with a two-photon microscope (TriM Scope II, LaVision BioTec GmbH) with a pulsed Titanium-Sapphire (Ti:Sa) laser (Chameleon Ultra 2; Coherent) at excitation wavelengths of 860 nm for DyLight 594 and 960 nm for H2B-eGFP, tdTomato, and mCherry. A water immersion objective (16×; NA 0.8, Nikon) was used to obtain volumetric 3D stacks. Individual frames covered 694 μm × 694 μm in XY with a resolution of 0.29 μm/pixel. Stacks were obtained at varying depths up to 702 μm from the cortical surface with a step size of 2 μm in Z. Prior to each imaging session, the laser power was adjusted to achieve the best signal to noise ratio. To minimize the effect of laser attenuation with tissue depth, a z-profile was created and the laser power was set for different imaging depths while making sure to minimize oversaturation of pixels.

Mice were initially anesthetized using a vaporizer with 6% isoflurane, eye ointment was applied, and the mice were fixed in a custom-built holder on the microscope stage. Isoflurane was adjusted between 0.5% and 1.5% depending on the breathing rate of each mouse to achieve a stable rate of 55 to 65 breaths per minute at an oxygen flow rate of 0.9 to 1.1 l/min. A heating pad was placed underneath the mouse to regulate body temperature. An infrared camera was used to monitor the mice during imaging.

For ground truth training data, reporter mouse lines for neurons and neuronal subtypes, astroglia, oligodendroglia, and endothelial cells were imaged 2 to 4 weeks after the cranial window surgery. Microglia reporter mice were imaged immediately after the cranial window surgery to keep the inflammatory reaction minimal (for a full list of reporter mice numbers, see S4G Fig). For the data obtained from adult H2B-eGFP mice used for classification, 3D volumetric stacks were imaged in the secondary motor cortex at 2 time points: at baseline, 3 to 4 weeks after the chronic cranial window surgery, and again 12 weeks after the baseline time point. To investigate a possible age effect, mice underwent a sham surgery at the left hind paw 1 week after baseline as part of a different study.

Automatic nuclei segmentation using StarDist

Automated nuclei segmentation was achieved using the StarDist algorithm, which implements a neural network to detect star-convex polygons in 2D and 3D fluorescence images [10]. The StarDist model was trained to segment nuclei in 3D using an in-house developed Jupyter Notebook. A GUI-based version of the software for training the segmentation (NucleusAI) can be found here: https://github.com/adgpta/NuCLear. Nuclei were manually traced and segmented using the segmentation editor in Fiji [28] (S2 Movie). In total, 15 different crops from the two-photon volumetric data with approximately 150 to 200 nuclei each were used for training the StarDist 3D segmentation network (S1I Fig).
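A StarDist 3D training run along these lines could look as follows; this is a sketch, with crop paths, ray count, patch size, and epoch count as assumptions rather than the exact published configuration:

```python
from csbdeep.utils import normalize
from stardist.models import Config3D, StarDist3D
from tifffile import imread

# 15 manually traced crops (placeholder paths): X = raw stacks, Y = label masks.
X = [normalize(imread(f"crops/raw_{i}.tif"), 1, 99.8) for i in range(15)]
Y = [imread(f"crops/labels_{i}.tif") for i in range(15)]

conf = Config3D(
    rays=96,                    # star-convex rays per nucleus
    anisotropy=(6.9, 1, 1),     # 2 um z-steps vs. 0.29 um xy pixels
    grid=(1, 2, 2),             # subsampled prediction grid in xy
    n_channel_in=1,
    train_patch_size=(8, 64, 64),
)
model = StarDist3D(conf, name="nuclei3d", basedir="models")
# Hold out 3 crops for validation; the epoch count is illustrative.
model.train(X[3:], Y[3:], validation_data=(X[:3], Y[:3]), epochs=400)
```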

Detection accuracy and precision were calculated after visualization of the raw data and segmentation in ImageJ/Fiji. True positive (TP), false positive (FP), and false negative (FN) nuclei were manually counted in 4 arbitrarily selected volumetric crops containing up to 80 nuclei each. Accuracy was calculated as TP/(TP + FP + FN). Precision was calculated as TP/(TP + FP).
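As a worked example with made-up counts (not data from this study), the 2 metrics are computed as:

```python
# Placeholder counts for one evaluation crop.
tp, fp, fn = 72, 3, 2
accuracy = tp / (tp + fp + fn)   # 72 / 77 ≈ 0.935
precision = tp / (tp + fp)       # 72 / 75 = 0.960
print(f"accuracy = {accuracy:.3f}, precision = {precision:.3f}")
```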

Feature extraction of segmented nuclei using PyRadiomics

Using the PyRadiomics Python package [11], in total 107 radiomics features were extracted for each nucleus after segmentation with StarDist, including, but not limited to, morphological features, intensity features, and texture features (see S1 Table). Nuclei touching the borders of the imaging volume were excluded to avoid mistakes in the classification. Within an imaging volume of 700 μm × 700 μm × 700 μm containing approximately 25,000 nuclei, border-touching nuclei represented only 0.03% of the total number of nuclei, thus diminishing the possibility of any error caused by an edge effect.
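A per-nucleus extraction loop could look as follows (Python sketch; file names are placeholders, and PyRadiomics’ default settings are assumed to yield the 107 features referenced above):

```python
import numpy as np
import pandas as pd
import SimpleITK as sitk
from radiomics import featureextractor

img = sitk.ReadImage("h2b_egfp_stack.tif")
mask = sitk.ReadImage("nuclei_labels.tif")
for im in (img, mask):
    im.SetSpacing((0.29, 0.29, 2.0))          # xyz voxel size in um

extractor = featureextractor.RadiomicsFeatureExtractor()

rows = []
for lbl in np.unique(sitk.GetArrayFromImage(mask))[1:]:   # skip background (0)
    feats = extractor.execute(img, mask, label=int(lbl))
    feats = {k: v for k, v in feats.items() if not k.startswith("diagnostics")}
    rows.append({"label": int(lbl), **feats})

pd.DataFrame(rows).to_csv("radiomics_features.csv", index=False)
```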

Extraction of data for training of the classifier

To perform supervised training of the deep neural network for cell type classification, a ground truth dataset was created from the two-photon volumetric fluorescence data, the automatically segmented nuclei (each assigned a unique label), and the radiomics features of each segmented nucleus. With the help of the red fluorescence channel images, nuclei belonging to a specific cell type were manually identified. Nuclei that overlapped in the green and red channel images at the post-induction time point were reidentified in the green channel images of the preinduction time point. These nuclei were manually selected in the segmented StarDist output images. Using their labels, the corresponding radiomics features could be extracted from the preinduction time point image (S4B Fig). Approximately 70 to 400 nuclei were identified for each cell type (S4G Fig).
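The label-based lookup itself reduces to a simple table selection. In a Python sketch with a hypothetical CSV layout and manually curated label list:

```python
import pandas as pd

# Features of the preinduction time point, indexed by nucleus label
# (e.g., the output of the PyRadiomics loop above).
features = pd.read_csv("radiomics_preinduction.csv", index_col="label")

# Label ids of nuclei confirmed in the red channel overlay (placeholder values).
astro_labels = [17, 231, 845]
ground_truth = features.loc[astro_labels].assign(cell_type="Astroglia")
ground_truth.to_csv("ground_truth_astroglia.csv")
```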

To increase the number of nuclei for training and make it equal in size for all cell types, thus avoiding a training bias, synthetic datasets were created from the nuclei features using the synthetic data vault (SDV) package [29], which utilizes correlations between features of the original dataset as well as means, minima, maxima, and standard deviations. The synthetic data generated by the model fit the original dataset with approximately 83% accuracy (S4D Fig).
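A sketch of this augmentation step using the SDV 1.x single-table API is shown below; the synthesizer choice and file names are assumptions, and the authors’ SynthGen code may differ in detail:

```python
import pandas as pd
from sdv.metadata import SingleTableMetadata
from sdv.single_table import GaussianCopulaSynthesizer

real = pd.read_csv("ground_truth_astroglia.csv")   # original nuclei features
meta = SingleTableMetadata()
meta.detect_from_dataframe(data=real)

synth = GaussianCopulaSynthesizer(meta)   # models marginals + feature correlations
synth.fit(real)

# Sample 8x synthetic rows so the combined table is 9x the original ("orig * 9").
augmented = pd.concat([real, synth.sample(num_rows=8 * len(real))],
                      ignore_index=True)
```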

Training of the classification model

All training and validation of the classifiers was performed with a custom MATLAB (R2021a, Mathworks) script; visualization was performed in ImageJ/Fiji (S3 Movie). Original and synthesized data were merged into a single table, and random indices were generated to divide the dataset into training, test, and validation datasets with 70%, 15%, and 15% of the total dataset, respectively. Datasets with different amounts of synthetic data combined with the original dataset were created, and training accuracy was later compared between them: orig = original dataset; orig * 2.5 = dataset containing 2.5 times the amount of data as the original dataset; orig * 2.5 (down sampled) = dataset down sampled to the minimum sample count (after the 2.5-fold increase) to equalize sample numbers for all cell types; orig * 9 = dataset containing 9 times the amount of data as the original dataset; orig * 9 (down sampled) = dataset down sampled to the minimum sample count (after the 9-fold increase) to equalize sample numbers for all cell types (S4E and S4F Fig). Down sampled datasets were created by selecting the class with the minimum number of nuclei and removing random samples from the other classes to match the nuclei count of the selected class. The same training, test, and validation datasets were used for training 5 different classifiers, 1 for each cell type. In the training dataset, for each classifier, the class of interest was assigned a unique identifier and all other cell classes were denoted by another identifier; for example, when training the neuronal classifier, the neurons were labeled as “Neuron” while all the other cell types in the dataset were labeled “Other” (Figs 1C(i), 3A and S4A). Each classifier had the same design. Initially, the sequential feature selection function “sequentialfs” (MATLAB R2022b) was applied to the whole dataset to extract the 12 features with the highest variability and thereby reduce overfitting. These were fed into the “feature input layer” of the classifier. The features were z-score normalized for better comparability between classes. A “fullyconnected” layer with 50 outputs was created, and a mini-batch size of 16 samples was set to reduce time and memory consumption during training. A batch normalization layer was added to reduce sensitivity to the initialization of the network. A ReLU (rectified linear unit) activation layer was added for thresholding operations. Another “fullyconnected” layer was added, which sets the number of outputs equal to the number of input classes, and a final “softmax” activation layer was chosen for classifying the output into the 2 separate groups. The “adam” adaptive moment estimation optimizer [30] was selected as the solver for the training network, updating the network weights iteratively based on the training data. The training and validation data were shuffled before every epoch, i.e., every complete pass through the entire training dataset. After each classifier was trained, further automatic and visual validation was performed to check for accuracy.
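The classifiers themselves were implemented in MATLAB; purely for illustration, a functionally equivalent sketch in Python/Keras that mirrors the described architecture is given below (scikit-learn’s SequentialFeatureSelector would be the analogue of MATLAB’s “sequentialfs”; the placeholder data and epoch count are not from this study):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 12)).astype("float32")  # 12 selected features
y_train = rng.integers(0, 2, size=1000)                  # 1 = target type, 0 = "Other"

norm = layers.Normalization()   # z-score normalization of the input features
norm.adapt(X_train)

model = tf.keras.Sequential([
    layers.Input(shape=(12,)),               # feature input layer
    norm,
    layers.Dense(50),                        # "fullyconnected" layer, 50 outputs
    layers.BatchNormalization(),             # reduces sensitivity to initialization
    layers.ReLU(),                           # thresholding nonlinearity
    layers.Dense(2, activation="softmax"),   # target class vs. "Other"
])
model.compile(optimizer="adam",              # adaptive moment estimation
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, batch_size=16, shuffle=True,
          validation_split=0.15, epochs=30)  # epoch count illustrative
```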

Classification of the H2B-eGFP data

After acquiring volumetric images of H2B-eGFP nuclei (n = 8 mice, male), automated segmentation of nuclei was performed with the StarDist neural network (see above; S6A and S6C Fig). Features of the segmented nuclei were extracted using PyRadiomics (see above) and then classified with the classifiers trained on the orig * 9 dataset (see “Training of the classification model” for a description of the dataset). Thus, a single nucleus would either be labeled as belonging to one of the 5 classes or to the “Other” class. Nuclei that were assigned to multiple classes were labeled as “Undecided” (UD). Nuclei that were identified as “Other” by all classifiers were labeled as “Unclassified” (UC). Nuclei of neurons were further subdivided into excitatory and inhibitory subtypes.
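The assignment rule reduces to a small voting function over the 5 one-vs-rest classifier outputs, sketched here in Python with illustrative class names:

```python
def assign_class(votes):
    """votes: dict mapping cell type -> bool (accepted by that classifier)."""
    hits = [cell_type for cell_type, accepted in votes.items() if accepted]
    if len(hits) == 1:
        return hits[0]                            # unique class wins
    return "Undecided" if hits else "Unclassified"

print(assign_class({"Neuron": True, "Astroglia": False, "Microglia": False,
                    "Oligodendroglia": False, "Endothelial": False}))  # -> Neuron
```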

Statistical analysis

All classified nuclei and their features were stored in a local MySQL database (MySQL Workbench 8.0). The data were imported into R [31] for statistical analysis. Cell count, mean nucleus volume, and first nearest neighbor distance were calculated for each cell type at the different time points. Since the data were not normally distributed, the Wilcoxon signed-rank test was used for all statistical testing, and p-values were corrected for multiple comparisons using the Bonferroni method. Plots were created using the R package “ggplot2” [32].
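In a Python sketch of the same testing scheme (random placeholder data, not the study’s measurements):

```python
import numpy as np
from scipy.stats import wilcoxon
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
cell_types = ["Neuron", "Astroglia", "Microglia", "Oligodendroglia", "Endothelial"]
counts_t0 = {ct: rng.normal(10000, 500, size=8) for ct in cell_types}  # n = 8 mice
counts_t1 = {ct: rng.normal(10000, 500, size=8) for ct in cell_types}

# Paired Wilcoxon signed-rank test per cell type, Bonferroni-corrected.
pvals = [wilcoxon(counts_t0[ct], counts_t1[ct]).pvalue for ct in cell_types]
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
```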

Supporting information

S1 Fig

(A) (i–iii) Chronic cranial window implantation using a curved glass cover slip and a custom 3D printed holder. (iv) Fixation of the anesthetized mouse in the custom holder for imaging under the two-photon microscope. (B) X-Z maximum intensity projection of a two-photon volumetric stack (700 μm × 700 μm × 700 μm). (C) Imaging position on a DAPI-stained brain slice. Slice thickness 50 μm. DAPI intensity profile showing similar distribution of intensity as a two-photon image of nuclei. The decreasing nucleus density in the two-photon image stack is a result of attenuation of fluorescence signal at higher depths. Inlay: Nucleus density distribution of sub-volumes used for data analysis. (D) Scheme showing intracranial and intraperitoneal injection strategies for labeling cell types. (E) Fluorescence emission spectra for eGFP, tdTomato, and mCherry (y axis: fluorophore emission normalized to quantum yield, source: fpbase.org). (F) Labeling strategies for red fluorescence expression in reporter mouse lines. Visualization of crosstalk between the eGFP and tdTomato signal after induction with tamoxifen. Overlay of pre (cyan) and post (yellow) GFP after image alignment with the ImageJ plugin bUnwarpJ [33]. (G) No crosstalk is visible between GFP and mCherry signals (upper panel) or eGFP and Tomato lectin-Dylight 594 signals (lower panel). (H) (i) Raw data of H2B-eGFP signal, (ii) manually labeled ground truth, (iii) StarDist segmentation, (iv) composite image of ground truth (red), and StarDist segmentation (green). (I) Upper panel: Count of manually segmented ground truth nuclei for StarDist training, each color depicts an individual mouse. Lower panel: StarDist nucleus detection accuracy and precision, each bar represents an individual imaging volume. Plot data can be found in S1 Data.

(TIFF)

S2 Fig. Training a classifier to distinguish excitatory from inhibitory neurons.

(A) Labeling strategies for excitatory and inhibitory neurons. Intracortical injections were performed at approximately 400 μm below the cortical surface of H2B-eGFP mice using AAV5-CamKIIα-mCherry and AAV1-mDLX-NLS-mRuby2 to visualize excitatory and inhibitory neurons. (B) Post-injection images. Inhibitory neurons were imaged 2–3 weeks after injection; excitatory neurons were imaged 3–4 weeks after injection. (C) Radiomics features showing differences between excitatory and inhibitory neurons for (i) surface area (in pixel² [resolution: x = 0.29 μm, y = 0.29 μm, z = 2 μm]), (ii) maximum 2D diameter (in pixels, same resolution as in (i)), (iii) first order 10th percentile, (iv) mesh volume in voxels (resolution as in (i)) (n of excitatory neurons: 396, n = 3 mice; n of inhibitory neurons: 122, n = 2 mice). (D) Confusion plot for the classifier. Rows show the predicted class (output class), and the columns show the true class (target class). Green fields illustrate correct identifications whereas blue fields illustrate erroneous identifications. The number of observations and the percentage of observations compared to the total number of observations are shown in each cell. The column on the far right shows the precision (or positive predictive value) and false discovery rate in green and red, respectively. The bottom row denotes recall (or true positive rate) and false negative rate in green and red. The cell on the bottom right shows the overall accuracy of the classifier. (E) Number of nuclei per mm³ in the secondary motor cortex at baseline and after 12 weeks (dashed line) (n = 8 mice). (F) Mean nucleus volume after 12 weeks normalized to baseline (n = 8 mice). (Significance testing for C, E, F: Wilcoxon test, p-values were corrected for multiple comparisons using the Bonferroni method, p < 0.05 *, p < 0.01 **, p < 0.001 ***.) Plot data can be found in S1 Data.

(TIFF)

S3 Fig. Combinations of nuclear morphology, intensity, and texture features show clear distinction between cell types.

(A) Flatness, gray level non-uniformity. (B) Minor axis length, flatness. (C) First order minimum, elongation. (D) Large area emphasis, first order 10th percentile. (E, F) Combinations of features in 3 dimensions. Plot data can be found in S1 Data.

(TIFF)

S4 Fig

(A) Visualization of the entire classification training process. After ground truth data were selected, a sequential forward feature selection algorithm was applied to the extracted features from all nuclei of all cell types, which selected 12 of the 107 radiomics features. Synthetic data were generated from all radiomics features of all nuclei and all cell types. The combined dataset was used to train a classifier, with the training, validation, and test data comprising 70%, 15%, and 15% of the combined dataset. (B) Example showing manual selection of ground truth data for supervised training. Post-induction green and red channels were overlaid to create a composite. Any nuclei appearing yellow (possessing green and red fluorescence) were identified in the preinduction GFP channel, and from the corresponding segmentation, a label id was acquired and later used to identify the extracted features. (C) Synthetic training data generated from the original dataset matched the features of the original datasets. (D) Statistical analysis of the similarity between the distribution of the original data and the distribution of the synthetic data (K-S test; mean = 83.68%). (E) Datasets with different amounts of synthetic data were created and training accuracy was compared between them: orig = original dataset; orig * 2.5 = dataset containing 2.5 times the amount of data as the original dataset; orig * 2.5 (down sampled) = dataset down sampled to the minimum sample count (after the 2.5-fold increase) to equalize sample numbers for all cell types; orig * 9 = dataset containing 9 times the amount of data as the original dataset; orig * 9 (down sampled) = dataset down sampled to the minimum sample count (after the 9-fold increase) to equalize sample numbers for all cell types. (F) Mean accuracy of classifiers trained 5 times with different combinations of synthetic data; error bars denote standard deviation (SD). Light blue = orig * 2.5, dark blue = orig * 2.5 (down sampled), green = orig * 9, yellow = orig * 9 (down sampled), red = orig. (G) Table listing the number of manually identified nuclei used in the training process (Excitatory N: excitatory neurons, Inhibitory N: inhibitory neurons). Plot data can be found in S1 Data.

(TIFF)

S5 Fig. Confusion matrices for each classifier for the test dataset.

Since each classifier distinguishes between the desired class and every “other” class, the confusion matrix consists only of 4 fields. Rows show the predicted class (Output Class), and the columns show the true class (Target Class). Green fields illustrate correct identification of target and “other” class, blue fields illustrate erroneous identifications. The number of observations and the percentage of observations compared to the total number of observations are shown in each cell. Column on the far right shows the precision (or positive predictive value) and false discovery rate in green and red, respectively. Bottom row denotes recall (or true positive rate) and false negative rate in green and red. Cell on the bottom right shows overall accuracy of the classifier. Plot data can be found in S1 Data.

(TIFF)

S6 Fig

(A) Three-dimensional representation of raw, segmented, and classified data (700 μm × 700 μm × 700 μm volume) visualizing comprehensive cell type composition. (B) Cell type distribution in a single volumetric stack. (C) Maximum intensity z-projection of a sub volume (150 μm × 150 μm × 100 μm) showing raw, segmented, and classified nuclei in the X-Y axis.

(TIFF)

S1 Table. Radiomics features extracted from segmented nuclei.

Features marked in green were used for cell type training and classification.

(TIF)

S2 Table. Morphology, intensity, and texture features used for training and classification.

(TIF)

S1 Text. Explanations for technical terms used in the manuscript.

(DOCX)

S1 Movie. Illustration of nucleus segmentation and classification using NuCLear.

Raw data of H2B-eGFP-labeled nuclei (green), nucleus segmentation (randomly colored), and classification (apricot = neurons, red = microglia, blue = oligodendrocytes, dark green = astrocytes, magenta = endothelial cells).

(MP4)

S2 Movie. Visualization of the manual nucleus segmentation process.

(MP4)

S3 Movie. Visualization of a classified 3D volume.

(MP4)

S1 Data. The Excel sheet contains all data that were used to generate Figs 2F, 3C–3J, and S1I, S2C–S2F, S3A–S3F, S4D, S4F and S5.

Data can be found on sheets within the Excel document named after the corresponding figure and panel.

(XLSX)

Acknowledgments

We thank Michaela Kaiser for her excellent technical support during preparation and execution of experiments. We thank Sidney Cambridge for devising breeding schemes for mouse lines needed for ground truth generation and for help with mouse breeding.

Abbreviations

EEC, European Economic Community; eGFP, enhanced green fluorescent protein; FN, false negative; FP, false positive; H2B-eGFP, histone 2B-eGFP; i.p., intraperitoneal; NRR, normal rat Ringer; NuCLear, nucleus-instructed tissue composition using deep learning; ReLU, rectified linear unit; s.c., subcutaneous; SDV, synthetic data vault; TP, true positive

Data Availability

All datasets used in the study are available on the heiData archive: https://heidata.uni-heidelberg.de/dataset.xhtml?persistentId=doi:10.11588/data/L3PITA. Code and exemplary data can be found on the GitHub page: https://github.com/adgpta/NuCLear. We also published all 3 GitHub repositories on Zenodo: NuCLear (https://zenodo.org/badge/latestdoi/662776296), NucleusAI (https://zenodo.org/badge/latestdoi/671099550), and SynthGen (https://zenodo.org/badge/latestdoi/665629516).

Funding Statement

We gratefully acknowledge the support by the German Research Foundation (DFG) (SFB1158, project B08 awarded to TK), the data storage service SDS@hd, supported by the Ministry of Science, Research and the Arts Baden-Württemberg (MWK) and the DFG through grant INST 35/1314-1 FUGG, as well as the high-performance cluster bwForCluster MLS&WISO, supported by the MWK and the DFG through Grant INST 35/1134-1 FUGG. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Ero C, Gewaltig MO, Keller D, Markram H. A Cell Atlas for the Mouse Brain. Front Neuroinform. 2018;12:84. doi: 10.3389/fninf.2018.00084
2. Herculano-Houzel S, Lent R. Isotropic fractionator: a simple, rapid method for the quantification of total cell and neuron numbers in the brain. J Neurosci. 2005;25(10):2518–2521. doi: 10.1523/JNEUROSCI.4526-04.2005
3. Nagendran M, Riordan DP, Harbury PB, Desai TJ. Automated cell-type classification in intact tissues by single-cell molecular profiling. Elife. 2018;7. doi: 10.7554/eLife.30510
4. Zeisel A, Hochgerner H, Lonnerberg P, Johnsson A, Memic F, van der Zwan J, et al. Molecular Architecture of the Mouse Nervous System. Cell. 2018;174(4):999–1014.e22. doi: 10.1016/j.cell.2018.06.021
5. Schmitz C, Hof PR. Design-Based Stereology in Brain Aging Research. In: Riddle DR, editor. Brain Aging: Models, Methods, and Mechanisms. Frontiers in Neuroscience. Boca Raton (FL); 2007.
6. Zhang C, Yan C, Ren M, Li A, Quan T, Gong H, et al. A platform for stereological quantitative analysis of the brain-wide distribution of type-specific neurons. Sci Rep. 2017;7(1):14334. doi: 10.1038/s41598-017-14699-w
7. Pakkenberg B, Olesen MV, Kaalund SS, Dorph-Petersen KA. Editorial: Neurostereology. Front Neuroanat. 2019;13:42. doi: 10.3389/fnana.2019.00042
8. Bouvier DS, Jones EV, Quesseveur G, Davoli MA, Ferreira TA, Quirion R, et al. High Resolution Dissection of Reactive Glial Nets in Alzheimer’s Disease. Sci Rep. 2016;6:24544. doi: 10.1038/srep24544
9. Zisis E, Keller D, Kanari L, Arnaudon A, Gevaert M, Delemontex T, et al. Digital Reconstruction of the Neuro-Glia-Vascular Architecture. Cereb Cortex. 2021;31(12):5686–5703. doi: 10.1093/cercor/bhab254
10. Schmidt U, Weigert M, Broaddus C, Myers G. Cell Detection with Star-Convex Polygons. Cham: Springer International Publishing; 2018.
11. van Griethuysen JJM, Fedorov A, Parmar C, Hosny A, Aucoin N, Narayan V, et al. Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Res. 2017;77(21):e104–e107. doi: 10.1158/0008-5472.CAN-17-0339
12. Boisvert MM, Erikson GA, Shokhirev MN, Allen NJ. The Aging Astrocyte Transcriptome from Multiple Regions of the Mouse Brain. Cell Rep. 2018;22(1):269–285. doi: 10.1016/j.celrep.2017.12.039
13. Clarke LE, Liddelow SA, Chakraborty C, Munch AE, Heiman M, Barres BA. Normal aging induces A1-like astrocyte reactivity. Proc Natl Acad Sci U S A. 2018;115(8):E1896–E1905. doi: 10.1073/pnas.1800165115
14. Tan SS, Kalloniatis M, Truong HT, Binder MD, Cate HS, Kilpatrick TJ, et al. Oligodendrocyte positioning in cerebral cortex is independent of projection neuron layering. Glia. 2009;57(9):1024–1030. doi: 10.1002/glia.20826
15. Herculano-Houzel S. The glia/neuron ratio: how it varies uniformly across brain structures and species and what that means for brain physiology and evolution. Glia. 2014;62(9):1377–1391. doi: 10.1002/glia.22683
16. Dawson MR, Polito A, Levine JM, Reynolds R. NG2-expressing glial progenitor cells: an abundant and widespread population of cycling cells in the adult rat CNS. Mol Cell Neurosci. 2003;24(2):476–488. doi: 10.1016/s1044-7431(03)00210-0
17. Algamal M, Russ AN, Miller MR, Hou SS, Maci M, Munting LP, et al. Reduced excitatory neuron activity and interneuron-type-specific deficits in a mouse model of Alzheimer’s disease. Commun Biol. 2022;5(1):1323. doi: 10.1038/s42003-022-04268-x
18. Kummer KK, Mitric M, Kalpachidou T, Kress M. The Medial Prefrontal Cortex as a Central Hub for Mental Comorbidities Associated with Chronic Pain. Int J Mol Sci. 2020;21(10):3440. doi: 10.3390/ijms21103440
19. Palmer AL, Ousman SS. Astrocytes and Aging. Front Aging Neurosci. 2018;10:337. doi: 10.3389/fnagi.2018.00337
20. Guo D, Zou J, Rensing N, Wong M. In Vivo Two-Photon Imaging of Astrocytes in GFAP-GFP Transgenic Mice. PLoS ONE. 2017;12(1):e0170005. doi: 10.1371/journal.pone.0170005
21. Asan L, Falfan-Melgoza C, Beretta CA, Sack M, Zheng L, Weber-Fahr W, et al. Cellular correlates of gray matter volume changes in magnetic resonance morphometry identified by two-photon microscopy. Sci Rep. 2021;11(1):4234. doi: 10.1038/s41598-021-83491-8
22. Choi M, Kwok SJ, Yun SH. In vivo fluorescence microscopy: lessons from observing cell behavior in their native environment. Physiology (Bethesda). 2015;30(1):40–49. doi: 10.1152/physiol.00019.2014
23. Percie du Sert N, Hurst V, Ahluwalia A, Alam S, Avey MT, Baker M, et al. The ARRIVE guidelines 2.0: Updated guidelines for reporting animal research. PLoS Biol. 2020;18(7):e3000410. doi: 10.1371/journal.pbio.3000410
24. Hadjantonakis AK, Papaioannou VE. Dynamic in vivo imaging and cell tracking using a histone fluorescent protein fusion in mice. BMC Biotechnol. 2004;4:33. doi: 10.1186/1472-6750-4-33
25. Schwendele B, Brawek B, Hermes M, Garaschuk O. High-resolution in vivo imaging of microglia using a versatile nongenetically encoded marker. Eur J Immunol. 2012;42(8):2193–2196. doi: 10.1002/eji.201242436
26. Chan KY, Jang MJ, Yoo BB, Greenbaum A, Ravi N, Wu WL, et al. Engineered AAVs for efficient noninvasive gene delivery to the central and peripheral nervous systems. Nat Neurosci. 2017;20(8):1172–1179. doi: 10.1038/nn.4593
27. Holtmaat A, Bonhoeffer T, Chow DK, Chuckowree J, De Paola V, Hofer SB, et al. Long-term, high-resolution imaging in the mouse neocortex through a chronic cranial window. Nat Protoc. 2009;4(8):1128–1144. doi: 10.1038/nprot.2009.89
28. Schindelin J, Arganda-Carreras I, Frise E, Kaynig V, Longair M, Pietzsch T, et al. Fiji: an open-source platform for biological-image analysis. Nat Methods. 2012;9(7):676–682. doi: 10.1038/nmeth.2019
29. Patki N, Wedge R, Veeramachaneni K. The Synthetic Data Vault. 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA); 2016 Oct 17–19.
30. Kingma DP, Ba J. Adam: A Method for Stochastic Optimization. arXiv preprint. 2017; arXiv:1412.6980.
31. R Core Team. R: A Language and Environment for Statistical Computing. 2021.
32. Wickham H. ggplot2: Elegant Graphics for Data Analysis. 2016.
33. Arganda-Carreras I, Sorzano COS, Kybic J, Ortiz-de-Solórzano C. bUnwarpJ: Consistent and Elastic Registration in ImageJ. Methods and Applications. 2008.

Decision Letter 0

Richard Hodge

6 Feb 2023

Dear Dr Knabbe,

Thank you for submitting your manuscript entitled "Comprehensive monitoring of tissue composition using in vivo imaging of cell nuclei and deep learning" for consideration as a Methods and Resources Article by PLOS Biology. Please accept my sincere apologies for the delay in getting back to you as we consulted with an academic editor about your submission.

Your manuscript has now been evaluated by the PLOS Biology editorial staff, as well as by an academic editor with relevant expertise, and I am writing to let you know that we would like to send your submission out for external peer review.

However, before we can send your manuscript to reviewers, we need you to complete your submission by providing the metadata that is required for full assessment. To this end, please login to Editorial Manager where you will find the paper in the 'Submissions Needing Revisions' folder on your homepage. Please click 'Revise Submission' from the Action Links and complete all additional questions in the submission questionnaire.

Once your full submission is complete, your paper will undergo a series of checks in preparation for peer review. After your manuscript has passed the checks, it will be sent out for review. To provide the metadata for your submission, please log in to Editorial Manager (https://www.editorialmanager.com/pbiology) within two working days, i.e. by Feb 08 2023 11:59PM.

If your manuscript has been previously peer-reviewed at another journal, PLOS Biology is willing to work with those reviews in order to avoid re-starting the process. Submission of the previous reviews is entirely optional and our ability to use them effectively will depend on the willingness of the previous journal to confirm the content of the reports and share the reviewer identities. Please note that we reserve the right to invite additional reviewers if we consider that additional/independent reviewers are needed, although we aim to avoid this as far as possible. In our experience, working with previous reviews does save time.

If you would like us to consider previous reviewer reports, please edit your cover letter to let us know and include the name of the journal where the work was previously considered and the manuscript ID it was given. In addition, please upload a response to the reviews as a 'Prior Peer Review' file type, which should include the reports in full and a point-by-point reply detailing how you have or plan to address the reviewers' concerns.

During the process of completing your manuscript submission, you will be invited to opt-in to posting your pre-review manuscript as a bioRxiv preprint. Visit http://journals.plos.org/plosbiology/s/preprints for full details. If you consent to posting your current manuscript as a preprint, please upload a single Preprint PDF.

Feel free to email us at plosbiology@plos.org if you have any queries relating to your submission.

Kind regards,

Richard

Richard Hodge, PhD

Associate Editor, PLOS Biology

rhodge@plos.org


Decision Letter 1

Richard Hodge

21 Mar 2023

Dear Dr Knabbe,

Thank you for your patience while your manuscript "Comprehensive monitoring of tissue composition using in vivo imaging of cell nuclei and deep learning" was peer-reviewed at PLOS Biology. Please accept my apologies for the long delays that you have experienced during the peer review process. It has now been evaluated by the PLOS Biology editors, an Academic Editor with relevant expertise, and by two independent reviewers.

In light of the reviews, which you will find at the end of this email, we would like to invite you to revise the work to thoroughly address the reviewers' reports.

As you will see below, the reviewers think your approach is interesting and potentially useful, but they raise overlapping concerns about the accessibility and reproducibility of the method, since the code has not yet been made publicly available; they ask that the scripts/methods be assembled into an easily accessible code package that can be used as a resource by the community. In addition, the reviewers note that the utility of the method should be broadened to include additional datasets that classify neurons into subtypes, as demonstrated with glial cells. Please note that these concerns would need to be addressed in order for us to consider a revised version of the manuscript.

Given the extent of revision needed, we cannot make a decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is likely to be sent for further evaluation by all or a subset of the reviewers.

We expect to receive your revised manuscript within 3 months. Please email us (plosbiology@plos.org) if you have any questions or concerns, or would like to request an extension.

At this stage, your manuscript remains formally under active consideration at our journal; please notify us by email if you do not intend to submit a revision so that we may withdraw it.

**IMPORTANT - SUBMITTING YOUR REVISION**

Your revisions should address the specific points made by each reviewer. Please submit the following files along with your revised manuscript:

1. A 'Response to Reviewers' file - this should detail your responses to the editorial requests, present a point-by-point response to all of the reviewers' comments, and indicate the changes made to the manuscript.

*NOTE: In your point-by-point response to the reviewers, please provide the full context of each review. Do not selectively quote paragraphs or sentences to reply to. The entire set of reviewer comments should be present in full and each specific point should be responded to individually, point by point.

You should also cite any additional relevant literature that has been published since the original submission and mention any additional citations in your response.

2. In addition to a clean copy of the manuscript, please also upload a 'track-changes' version of your manuscript that specifies the edits made. This should be uploaded as a "Revised Article with Changes Highlighted" file type.

*Re-submission Checklist*

When you are ready to resubmit your revised manuscript, please refer to this re-submission checklist: https://plos.io/Biology_Checklist

To submit a revised version of your manuscript, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' where you will find your submission record.

Please make sure to read the following important policies and guidelines while preparing your revision:

*Published Peer Review*

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*PLOS Data Policy*

Please note that as a condition of publication PLOS' data policy (http://journals.plos.org/plosbiology/s/data-availability) requires that you make available all data used to draw the conclusions arrived at in your manuscript. If you have not already done so, you must include any data used in your manuscript either in appropriate repositories, within the body of the manuscript, or as supporting information (N.B. this includes any numerical values that were used to generate graphs, histograms etc.). For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5

*Blot and Gel Data Policy*

We require the original, uncropped and minimally adjusted images supporting all blot and gel results reported in an article's figures or Supporting Information files. We will require these files before a manuscript can be accepted so please prepare them now, if you have not already uploaded them. Please carefully read our guidelines for how to prepare and upload this data: https://journals.plos.org/plosbiology/s/figures#loc-blot-and-gel-reporting-requirements

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Thank you again for your submission to our journal. We hope that our editorial process has been constructive thus far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Richard

Richard Hodge, PhD

Associate Editor, PLOS Biology

rhodge@plos.org

------------------------------------

REVIEWS:

Reviewer #1: In this paper, Das Gupta et al describe an interesting approach for labelling major cell types in the brain using a computational method. I think the approach is clever, and could potentially be useful for many researchers. However, this is a computational method paper without code, which I can never recommend for publication. The authors must share their code as an easily accessible package, including clear instructions, scripts with comments, example applications etc. I would ask that they share the code with the reviewers for peer review purposes, and then commit to releasing this code publicly at journal publication. Similarly, the data should be made available publicly at publication date (it currently is not). The way the code is described in the methods makes me think there will be some work needed to put all the scripts and methods together into an easily accessible code package, and this is the work that I as a reviewer would like to request. I will make other comments below, which are of a secondary nature and much less important.

Other comments:

I think it should be stated very explicitly that this method can only categorize shallow cell types in the brain, i.e. neurons, astroglia, microglia etc. For the neuroscience community, cell types in the brain usually refers to types of neurons, such as "chandelier cells", or "PV-positive" cells, and these cannot be classified using this approach.

Another thing that should be stated explicitly is that the brain is the only body region where a clear coverslip can even be implanted for in vivo imaging. This is very difficult to do for any other body organ, and the data quality from those implants is much much worse, precluding the application of these computational methods which need fine visual features to work.

Another caveat of the method that should be stated is that it will depend on the type of microscope, depth of imaging and overall quality of the preparation. This makes a classifier trained on one set of data less robust at classifying new types of data, and it could explain the conflicting results which the authors report in the discussion with respect to the motor cortex volume imaged. To improve the ability of the decoder to generalize, the authors or other users could add augmentations to the training data to make it more similar to their own data, such as 3D blurring with PSF-shaped convolutions. This doesn't need to be done here, but I think it is important to give potential users options and ideas for what to do if the method does not work out of the box.
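
To make this suggestion concrete, here is a minimal sketch of such a PSF-shaped blurring augmentation in Python, assuming an anisotropic Gaussian as a stand-in for the true two-photon point spread function (the function name and parameter values are illustrative, not taken from the authors' code):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def psf_blur_augment(volume, sigma_xy=1.0, sigma_z=3.0, rng=None):
        # Blur a 3D stack (z, y, x) with an anisotropic Gaussian that
        # approximates a two-photon PSF; z is blurred more strongly because
        # axial resolution is worse than lateral resolution.
        rng = rng or np.random.default_rng()
        scale = rng.uniform(0.5, 1.5)  # jitter blur strength across samples
        return gaussian_filter(volume, sigma=(sigma_z * scale,
                                              sigma_xy * scale,
                                              sigma_xy * scale))

Applied on the fly during training, such a transform exposes the classifier to a range of effective PSFs, which should help it generalize to data from other microscopes and imaging depths.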

Reviewer #2: The authors have presented a novel method for automatic classification of 5 major cell types of cortical layer 2/3: neurons, astroglia, microglia, oligodendroglia and endothelial cells. The method requires two neural networks, one for nuclei segmentation, the other for cell-type classification based on features of the nuclei. Both neural networks require training via ground-truth datasets, where the ground-truth datasets are created manually. After the initial training, the neural networks can be used in longitudinal studies where a given block of tissue is imaged over an extended period of time. Hence, the NuCLear method presented by the authors could prove useful for researchers studying tissue composition in the brain and perhaps other types of tissue. However, it is not clear what resources the authors intend to provide besides the information contained in the manuscript; NuCLear is not a software package but a two-step neural-network method for classifying major cell types.

Major Comments:

1. The abstract is short and heavily weighted to "selling" the authors' NuCLear method with overreaching claims and very little mention of their experimental results (one brief sentence, line 22-23). The authors have not included the type of animal used in the study, what part of the brain was investigated, or what major cell types were classified.

2. The order in which the authors present their Results is confusing. This is not helped by everything being presented in two long paragraphs. It appears the authors have started their Results with an overall summary with 3 main points: volumetric imaging of nuclei (GFP), neuronal network training using GFP nuclei and ground-truth data created via tdTomato and mCherry, classification of 5 main cell types. Perhaps the authors can make it clear they are summarizing their method from beginning to end here, and the remaining Results will be about filling in the details. However, the authors have not included their method of nuclei segmentation in their initial summary. Finally, the section on "Automated segmentation of nuclei in the raw data" (line 91) comes after the section describing the generation of ground-truth datasets via tdTomato and mCherry (line 73). This may be confusing since generation of the ground-truth datasets requires segmentation of the nuclei. I would suggest the authors present their Results in the same order as their Methods.

3. There is little information content in Figure 1. Moreover, the stripped-down simplicity of the cartoon seems to imply the NuCLear method is easy to use and requires few methodological steps. However, after looking at Figure 3a and Supplementary Figure 3, and reading the Methods, one begins to realise the numerous tasks necessary to perform the NuCLear method. There are actually two neural networks that are required, one for segmentation of the nuclei and the other for cell-type classification. Both neural networks require training via ground-truth datasets. Moreover, the ground-truth datasets have to be computed manually. I think Figure 1 should be replaced with Figure 3a or some version of it; it could also include information in Supplementary Figure 3. Or a simple flow box diagram could be useful.

4. The authors' figures are inadequately labelled making it frustratingly difficult to read/review their manuscript. The panel labels in the figures (a, b, c…) often do not match the labels in their legends (the authors have apparently rearranged their figures without rearranging the legends). Several graphs are missing units. There are many panels that are not adequately described in the legends.

5. The authors may not have computed their densities correctly. On line 317 they state: "Nuclei touching the borders of the imaging volume were excluded." To compute density correctly they need to consider the border effect (Gundersen 1977). For a volume, this means 3 walls of their volume should be "exclusive" and the other 3 "inclusive" when counting the nuclei. Gundersen HJG. Notes on the estimation of the numerical density of arbitrary profiles: the edge effect. J Microsc. 1977 Nov;111(2): 219-223. doi: 10.1111/j.1365-2818.1977.tb00062.x
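
For readers unfamiliar with the rule, a minimal sketch of this unbiased counting scheme over per-nucleus bounding boxes follows, with the three low-coordinate faces acting as exclusion walls and the three high-coordinate faces as inclusion walls (illustrative only, not the authors' implementation):

    import numpy as np

    def count_with_edge_rule(bboxes, frame_min, frame_max):
        # Gundersen (1977) edge rule in 3D: count an object if it overlaps
        # the counting frame and does not touch the three "exclusion" walls
        # (here the low-coordinate faces); objects touching the three
        # "inclusion" walls (high-coordinate faces) are still counted.
        # bboxes: (n, 6) array of (zmin, ymin, xmin, zmax, ymax, xmax).
        bboxes = np.asarray(bboxes, dtype=float)
        lo = np.asarray(frame_min, dtype=float)
        hi = np.asarray(frame_max, dtype=float)
        touches_exclusion = np.any(bboxes[:, :3] <= lo, axis=1)
        overlaps_frame = (np.all(bboxes[:, :3] < hi, axis=1)
                          & np.all(bboxes[:, 3:] > lo, axis=1))
        return int(np.count_nonzero(overlaps_frame & ~touches_exclusion))

Dividing this count by the frame volume then yields an unbiased density estimate.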

6. From what I can tell from the Results the authors have only applied their method to one 3D volume of the cortex (line 129-130: "The 3D classification results for one example volume containing 20123 cells are shown in Figure 3E"). I would expect that after the authors trained their neural networks they would then move on to classifying neurons/glia in tissue that was not used in the training process, in multiple volumes as a proof of principle.

7. The authors have not discussed/explained why they did not attempt to classify neurons into subtypes as they did with glia. I would think the NuCLear method would find wider application if it was possible to categorize neuronal subtypes.

8. The authors have not clarified how much training of the neural networks will be necessary to perform their NuCLear classification method. Will the training only be necessary at the beginning of a study and then be valid thereafter for weeks/months/years on multiple preparations? Could the neural networks be shared between labs? Or will it be necessary for each lab to train their own neural networks?

9. There is a lack of experimental quantification in methods, e.g. the number, age and sex (male/female) of mice, the number of imaged volumes used in the various steps of the NuCLear method.

10. It is not clear what data/code/networks (i.e. Resources) the authors plan to make available for others to use. Will a lab wanting to perform the NuCLear method have to replicate the authors' study from scratch?

11. There are several terms that are never defined in the manuscript: shape segmentation, binary mask, flatness, grey level non-uniformity, entropy, first order minimum, coarseness, overfitting, pixel entropy, confusion plots, radiomics, crops, etc.

Specific (minor) Comments

Title:

I think "monitoring" is not the best word for the title. The paper is more about classification of cell types than about monitoring cells over time.

Abstract:

Line 18. "Comprehensive analysis of tissue composition has so far been limited to ex-vivo approaches." This sentence is very general in scope and is not correct. I can think of MRI studies as one example.

Line 21. "This allowed us to classify all cells per imaging volume". This statement is not correct. There was a large number of unclassified and undecided cells in this study (Figure 3). And segmentation was not perfect.

Line 23. "…using only a single label." This statement is not correct. Besides labelling the nuclei with GFP, the authors labelled individual cell types with tdTomato and mCherry.

Line 24. "NuCLear opens a window to study changes in relative abundance and location of all major brain cell types…" The phrase "all major brain cell types" may be misleading. On first reading I was anticipating the authors meant that their method would be used to classify major neuronal types (pyramidal, stellate, etc.) but in fact neurons are all grouped into one category.

Line 25. "…enabling comprehensive studies of changes in cellular composition in physiological and pathophysiological conditions". If the pathology changes the shape of the nuclei then perhaps the authors' method will no longer work. The phrase "cellular composition" usually refers to intracellular composition, not cell-type composition of a tissue.

Line 26. "NuCLear will work with any fluorescence-based microscopy and perform in any organ or model system." The authors should be more careful with their choice of words such as "all" and "any" and "every". Other fluorescence-based microscopy methods might not have the necessary resolution to allow segmentation and deep learning in 3D. Not all organs, or parts of the brain, can be accessed using a chronic window implant.

Introduction:

Line 35: "all brain cell types" would be better as "multiple cell types".

Line 35-38: "…each cell type would require an individual labelling strategy using a set of specific promoters and fluorophores. This would require prioritizing cell types from the outset and thereby limit the scope to a small subset of the population". Are the authors implying their method resolves this problem? If that is so, then I am not sure how since their method has a labelling strategy using promoters and fluorophores.

Line 46: "all brain cell types". See comment above (line 35).

Line 49-50: "in the same mouse" is probably not what the authors are implying here.

Line 50: "…using only one genetically modified mouse line" should be "using a single genetically modified mouse line".

Line 51: "Histon" should be "histone". This typo occurs elsewhere.

Line 54-55: "In addition, knowing the 3D location of every cell enables previously inaccessible analyses." This sentence is vague and should be reworded to be more specific.

Line 56-57: "…the concept of this technique will be applicable to many other cellular imaging techniques and in different experimental systems…" This phrase is vague and should be reworded to be more specific.

Results:

Line 67: "…where known cell types were additionally labelled with a red fluorescent protein (tdTomato or mCherry)". It is not clear what "additionally labelled" means. The green label should be mentioned here.

Line 70: "The trained neuronal network was used to classify cell types based on their GFP-labeled nuclei." This sentence seems redundant with the previous sentence.

Line 71-73: "In our study we selected five different cell types for classification Neurons (yellow), astroglia (blue), microglia (green), oligodendroglia (turquoise), and endothelial cells (red)". A colon is missing after "classification". Colours would be better listed in the figure legend.

Line 73-76: "To generate ground-truth datasets without a bleed through of the strong tdTomato fluorescence towards the GFP-channel (Supplementary Figure 1F), we chose a method of inducing the expression of tdTomato for unequivocal cell type identification after imaging of the nuclei." I suggest starting a new paragraph here. Why is bleed-through a problem for tdTomato and not mCherry? The word "unequivocal" is not correct since type identification is not perfect.

Line 76-78: "We selected a labelling strategy consisting of a tamoxifen-inducible cell type-specific Cre-recombinase in conjunction with…" The information in this sentence seems better placed before the previous sentence. A new paragraph could start: "To generate ground-truth datasets, we selected a labelling strategy consisting of…"

Line 79-80: "We acquired volumetric images of nuclei from these mouse lines and induced tdTomato expression via intraperitoneal injection of tamoxifen." The authors could move their tdTomato bleed-through information to this sentence.

Line 81: "the same location" is singular, but in previous sentence volumetric imaging is plural.

Line 81-83: "the previously imaged nuclei were selected by overlay with the red fluorescence signal." Readers may not understand this phrase. The authors should mention the green signal and those nuclei with both green and red overlap are selected. Was selection manual or automated?

Line 83-84: "As microglia changed their positions after induction with tamoxifen…" Was this effect only for microglia? If so, the authors should clarify this.

Line 88-90: "Hence, this approach was used to ascertain images of the labeled nuclei to be undisturbed by signals possibly arising from the red fluorescence used for unequivocal identification of the cell type." This sentence is confusing. Is it a summary sentence for all cell types or just neurons? As before, the word "unequivocal" is incorrect.

Line 91: "Automated segmentation of nuclei in the raw data…" This is the first mention of segmentation and it is not sufficiently described here. At what stage(s) does segmentation occur during the NuCLear method?

Line 106: "Having demonstrated the ability of nuclear features to distinguish between their corresponding cell types…" The authors should begin a new paragraph here.

Line 108-110: "Using data from the five reporter mouse lines, we trained one classifier per cell type for distinguishing one cell type from the respective other cell types." This sentence is difficult to understand and should be reworded.

Line 110-111: "To increase the amount of training data and equalize the nucleus counts for each cell type to reduce training bias…" This phrase has run-on "to" prepositions.

Line 114-115: "After classifying the whole dataset, nuclei were either assigned to a single class, two or more classes (undecided) or to none (unclassified)." The phrase "After classifying the whole dataset" is confusing: this sentence is about classifying. The authors should remind the reader what their different classes are.

Line 116-117: "Precision and recall rates for each classifier were high for neurons and endothelial cells." "Precision" and "recall" are not defined. Reference to Figure 3 in next sentence should be moved to this sentence.

Line 117-118: "After combining all classifier output…" It is not clear what this means.

Line 118: "Accuracy" is not defined. Readers may not understand the difference between precision and accuracy.

Line 129-130: "The 3D classification results for one example volume containing 20123 cells…" The authors should start a new paragraph here.

Line 130: I do not find Figure 3E and supplementary Figure 6 "intuitive".

Discussion:

Line 150: "all cells" is not correct.

Line 153: "feasible" is not the correct word to use here.

Line 157: "perturbations, treatments, or adaptive processes." These could be clarified, especially "adaptive processes".

Line 158-159: "…the best classification results regarding total classification accuracy and precision…" The second "classification" is redundant.

Line 171: "expressed increasingly" should be reworded.

Line 173: "concluded" is not the correct word to use here.

Line 177-190: "When applying the classification model to a Layer 2/3 volume…" This paragraph could be incorporated into the authors' Results.

Line 178: "…extrapolating the number of cells per mm³…" It is not clear what this means.

Line 196-197: "When analyzing the effect of time on the imaging volumes from the secondary motor cortex…" I could not find the "time" aspect in the Results or figures.

Line 204-205: "Our study illustrates only one possible application of NuCLear and subsequent analysis of 3D issue composition." The authors have not presented an application of their method, only details of the method itself.

Line 211: The word "easily" is not correct.

Line 214-215: "We anticipate that any cell type will be detected…" This is a bold claim.

Line 216: "…we propose an easy and readily usable method…" It does not seem to me the authors' method is easy nor readily accessible. It requires someone trained in neural networks.

Line 217: The authors have mentioned the use of their method for "different organs" in multiple places but have not clarified what organs they are referring to. Most organs are not accessible for in vivo imaging.

Materials and Methods:

Line 275: "Two photon imaging was carried out with a two-photon microscope". "Two photon" is redundant.

Line 303-304: "The StarDist model was trained to segment nuclei in 3D using an in-house developed Jupyter Notebook". There does not seem to be enough information here for other labs to develop their own in-house segmentation model.

Line 304: "Manually segmented and labelled nuclei…" It was not explained how nuclei were manually segmented/labelled. This may be non-trivial.

Line 313: "Feature extraction using PyRadiomics". The authors should add "nuclei" to the header.

Line 315: "…for each segmented nucleus…" The authors do not explain which nuclei dataset they are using here.

Line 320: "For supervised training of the deep neural network for cell type classification…" This phrase has run-on "for" prepositions.

Line 324: "label ID" is not defined.

Line 327-328: "To increase the number of nuclei for training and make it equal in size for all cell types to avoid a bias in training…" This phrase has run-on "to" prepositions.

Line 336: "manual validation was performed in ImageJ/Fiji". The method for this was not explained.

Line 353: Do the authors have a reference for the "adam" adaptive moment estimation optimizer?

Line 356: It is not clear what "epoch" refers to.

Line 361: "all nuclei" is not correct.

Line 363: "every segmented nucleus" should be "the segmented nuclei".

Line 364: "Every segmented nucleus" should be "The segmented nuclei".

Line 360-367: This section is confusing as it mostly repeats the previous methods.

Line 374: the authors have not specified what type of t-test they used.

Figures:

Figure 1: "all cell nuclei" is not correct. Missing information: mouse, type of mouse, type of fluorescence. There is no "c" label in the figure. Cell types and colours are not defined. It looks like the endothelial cells line the blood vessels but this is not explained by the authors.

Figure 2: Labels in legend do not match those in the figure. There are no "3D renderings" in the figure. Diameters have no units. It is not clear what "algorithm" the authors are referring to in the legend. The results in this figure may be confusing since the nuclei have been separated into cell types (panels e and f) before the authors have explained how they classified their neurons via neural networks (Figure 3).

Figure 3a: It is not clear what is plotted under "radiomics features" and "neural network classifier". There are no units for volumes or surface-to-volume ratios.

Supplementary Figure 3: "A sequential forward feature selection algorithm was applied." I am not sure if this algorithm was mentioned in the manuscript.

Decision Letter 2

Richard Hodge

6 Sep 2023

Dear Dr Knabbe,

Thank you for your patience while we considered your revised manuscript "Comprehensive monitoring of tissue composition using in vivo imaging of cell nuclei and deep learning" for publication as a Methods and Resources Article at PLOS Biology. This revised version of your manuscript has been evaluated by the PLOS Biology editors, the Academic Editor and the original reviewers.

Based on the reviews, I am pleased to say that we are likely to accept this manuscript for publication, provided you satisfactorily address the remaining points raised by Reviewer #2. Please also make sure to address the following data and other policy-related requests that I have provided below (A-G):

(A) We would like to suggest the following modification to the title:

“NuCLear enables accurate classification of major brain cell types during in vivo imaging using neural network processing”

(B) In the Financial disclosure statement in the online submission form, please provide any funding sources that you received to conduct the study, along with the corresponding grant numbers.

(C) You may be aware of the PLOS Data Policy, which requires that all data be made available without restriction: http://journals.plos.org/plosbiology/s/data-availability. For more information, please also see this editorial: http://dx.doi.org/10.1371/journal.pbio.1001797

Note that we do not require all raw data. Rather, we ask that all individual quantitative observations that underlie the data summarized in the figures and results of your paper be made available in one of the following forms:

-Supplementary files (e.g., excel). Please ensure that all data files are uploaded as 'Supporting Information' and are invariably referred to (in the manuscript, figure legends, and the Description field when uploading your files) using the following format verbatim: S1 Data, S2 Data, etc. Multiple panels of a single or even several figures can be included as multiple sheets in one excel file that is saved using exactly the following convention: S1_Data.xlsx (using an underscore).

-Deposition in a publicly available repository. Please also provide the accession code/DOI so that we may view your data before publication.

Regardless of the method selected, please ensure that you provide the individual numerical values that underlie the summary data displayed in the following figure panels as they are essential for readers to assess your analysis and to reproduce it:

Figure 2F, 3C-J, S1I, S2C-F, S3A-F, S4D, S4F, S5

NOTE: the numerical data provided should include all replicates AND the way in which the plotted mean and errors were derived (it should not present only the mean/average values).

(D) Thank you for already depositing the code and examples in Github (https://github.com/adgpta/NuCLear). At this stage, we ask that you please attach this deposition to the Zenodo data repository to ensure long term maintenance and that the deposition is assigned a DOI. Please ensure that the code is sufficiently well documented and reusable, and that your Data Statement in the Editorial Manager submission system accurately describes where your code can be found.

(E) Please also ensure that each of the relevant figure legends in your manuscript include information on *WHERE THE UNDERLYING DATA CAN BE FOUND*, and ensure your supplemental data file/s has a legend.

(F) Please ensure that your Data Statement in the submission system accurately describes where your data can be found and is in final format, as it will be published as written there.

(G) Please also provide a blurb which (if accepted) will be included in our weekly and monthly Electronic Table of Contents, sent out to readers of PLOS Biology, and may be used to promote your article in social media. The blurb should be about 30-40 words long and is subject to editorial changes. It should, without exaggeration, entice people to read your manuscript. It should not be redundant with the title and should not contain acronyms or abbreviations. For examples, view our author guidelines: https://journals.plos.org/plosbiology/s/revising-your-manuscript#loc-blurb

------------------------------------------------------------------------

As you address these items, please take this last chance to review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the cover letter that accompanies your revised manuscript.

We expect to receive your revised manuscript within two weeks.

To submit your revision, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' to find your submission record. Your revised submission must include the following:

- a cover letter that should detail your responses to any editorial requests, if applicable, and whether changes have been made to the reference list

- a Response to Reviewers file that provides a detailed response to the reviewers' comments (if applicable)

- a track-changes file indicating any changes that you have made to the manuscript.

NOTE: If Supporting Information files are included with your article, note that these are not copyedited and will be published as they are submitted. Please ensure that these files are legible and of high quality (at least 300 dpi) in an easily accessible file format. For this reason, please be aware that any references listed in an SI file will not be indexed. For more information, see our Supporting Information guidelines:

https://journals.plos.org/plosbiology/s/supporting-information

*Published Peer Review History*

Please note that you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*Press*

Should you, your institution's press office or the journal office choose to press release your paper, please ensure you have opted out of Early Article Posting on the submission form. We ask that you notify us as soon as possible if you or your institution is planning to press release the article.

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please do not hesitate to contact me should you have any questions.

Sincerely,

Richard

Richard Hodge, PhD

Senior Editor, PLOS Biology

rhodge@plos.org

------------------------------------------------------------------------

Reviewer remarks:

Reviewer #1: The authors have addressed my concerns. I do not have any follow-up concerns.

Reviewer #2 (Jason Seth Rothman, signs review): "Comprehensive monitoring of tissue composition using in vivo imaging of cell nuclei and deep learning"

The latest version of the authors' manuscript has improved considerably from their initial version. The authors have addressed all of my major and minor concerns. I have only a few remaining minor concerns about the revised manuscript.

Line 18. "Comprehensive analysis of tissue composition has so far been limited to ex-vivo approaches." The authors state they changed this sentence but it has not changed. My example of MRI was to highlight how vague this sentence is, which does not even include the word 'imaging'.

Lines 26 and 147: 'relative abundance' would be better as 'relative density'.

Line 28: 'cellular composition' would be better as 'tissue composition' or 'cell-type composition' as used elsewhere in the manuscript.

Line 32-34: 'Understanding the plasticity and interaction of different brain cell types in vivo has been a limiting factor for the investigation of large-scale structural brain alterations associated with diverse physiological and pathophysiological states.' This sentence is confusing. It is not clear what the 'limiting factor' is.

Line 39-40: 'studies assessing whole tissue composition quantitatively' would be better as 'quantitative studies assessing whole tissue composition'.

Line 47: 'composition' would be better as 'tissue composition'.

Line 50: 'in the same mouse in vivo' would be better if the authors added 'over time', or some longitudinal aspect of their study.

Line 75-76: 'feature extraction' would be better as 'feature extraction of the segmented nuclei'.

Line 97: 'right-shifted' might be better as 'red-shifted'.

Line 101-103: 'nuclei were manually selected from the pre-induction timepoint, wherever the green and red fluorescence signals overlapped in the post-induction timepoint images.' This phrase could be changed to: 'nuclei were manually selected from the pre-induction timepoint images (green fluorescence signal) wherever they overlapped the post-induction timepoint images (red fluorescence signal).'

Line 128-129: 'a single neural network classifier was trained for each cell type' would be better as 'five neural network classifiers were trained for each cell type'.

Lines 152-158: 'When the trained model was applied, two features not chosen by the automated feature selection algorithm differed significantly between the classes…' These two sentences are confusing here since it is not clear how they relate to the 'longitudinal' topic of this paragraph.

Line 182: 'z-step size' would be better as 'axial resolution'.

Line 183: 'more than 20000 cells' would be better as '~25000 cells' as stated earlier in the manuscript, or something similar.

Line 182-186: 'We were able to image a whole 3D volume of 700 μm³ consisting of more than 20000 cells in around 20 minutes, making it possible to perform large scale data acquisition in a repeated manner, enabling longitudinal intravital monitoring of tissue histology over time and in response to perturbations, treatments, or adaptive processes such as learning or exploring.' It is not clear whether the text following with 'making it possible to perform…' describes what the authors actually did in their study, or whether they are suggesting these methods are hypothetically possible. If it's the latter, then this sentence would be better if it was broken into two different sentences so there is no confusion.

Line 368: 'Nuclei touching the borders of the imaging volume were excluded.' The authors state in their reply that the error introduced by their counting method is small. They could include their reasoning here in the Methods, which would be helpful to others who decide to use the authors' methods.

Line 376: 'This allowed for identification of the label… ' It is not clear what label the authors are referring to here or why this label has not been identified. The previous mention of a 'unique label' indicates the labels were assigned manually and therefore should be known.

Line 406: "sequentialfs" should be "sequentials".

Figure 1B. There is a box for 'Trained network models' that is followed by what appears to be the trained network models. This looks redundant.

Decision Letter 3

Richard Hodge

30 Sep 2023

Dear Dr Knabbe,

Thank you for the submission of your revised Methods and Resources Article "Accurate classification of major brain cell types using in vivo imaging and neural network processing" for publication in PLOS Biology. On behalf of my colleagues and the Academic Editor, Carole Parent, I am pleased to say that we can accept your manuscript for publication, provided you address any remaining formatting and reporting issues. These will be detailed in an email you should receive within 2-3 business days from our colleagues in the journal operations team; no action is required from you until then. Please note that we will not be able to formally accept your manuscript and schedule it for publication until you have completed any requested changes.

Please take a minute to log into Editorial Manager at http://www.editorialmanager.com/pbiology/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production process.

PRESS

We frequently collaborate with press offices. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximise its impact. If the press office is planning to promote your findings, we would be grateful if they could coordinate with biologypress@plos.org. If you have previously opted in to the early version process, we ask that you notify us immediately of any press plans so that we may opt out on your behalf.

We also ask that you take this opportunity to read our Embargo Policy regarding the discussion, promotion and media coverage of work that is yet to be published by PLOS. As your manuscript is not yet published, it is bound by the conditions of our Embargo Policy. Please be aware that this policy is in place both to ensure that any press coverage of your article is fully substantiated and to provide a direct link between such coverage and the published work. For full details of our Embargo Policy, please visit http://www.plos.org/about/media-inquiries/embargo-policy/.

Thank you again for choosing PLOS Biology for publication and supporting Open Access publishing. We look forward to publishing your study. 

Best wishes, 

Richard

Richard Hodge, PhD

Senior Editor, PLOS Biology

rhodge@plos.org


Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig

    (A) (i–iii) Chronic cranial window implantation using a curved glass coverslip and a custom 3D-printed holder. (iv) Fixation of the anesthetized mouse in the custom holder for imaging under the two-photon microscope. (B) X-Z maximum intensity projection of a two-photon volumetric stack (700 μm × 700 μm × 700 μm). (C) Imaging position on a DAPI-stained brain slice. Slice thickness 50 μm. DAPI intensity profile showing a similar intensity distribution to a two-photon image of nuclei. The decreasing nucleus density in the two-photon image stack is a result of attenuation of the fluorescence signal at greater depths. Inset: Nucleus density distribution of sub-volumes used for data analysis. (D) Scheme showing intracranial and intraperitoneal injection strategies for labeling cell types. (E) Fluorescence emission spectra for eGFP, tdTomato, and mCherry (y axis: fluorophore emission normalized to quantum yield; source: fpbase.org). (F) Labeling strategies for red fluorescence expression in reporter mouse lines. Visualization of crosstalk between the eGFP and tdTomato signals after induction with tamoxifen. Overlay of pre-induction (cyan) and post-induction (yellow) GFP signals after image alignment with the ImageJ plugin bUnwarpJ [33]. (G) No crosstalk is visible between GFP and mCherry signals (upper panel) or eGFP and tomato lectin-DyLight 594 signals (lower panel). (H) (i) Raw data of the H2B-eGFP signal, (ii) manually labeled ground truth, (iii) StarDist segmentation, (iv) composite image of ground truth (red) and StarDist segmentation (green). (I) Upper panel: Count of manually segmented ground truth nuclei for StarDist training; each color depicts an individual mouse. Lower panel: StarDist nucleus detection accuracy and precision; each bar represents an individual imaging volume. Plot data can be found in S1 Data.

    (TIFF)
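
    As a hedged sketch of the StarDist segmentation step shown in panel H, applying a previously trained 3D model to a normalized two-photon stack could look as follows in Python (file and model names are hypothetical; the authors' training notebook may differ):

        from csbdeep.utils import normalize
        from stardist.models import StarDist3D
        from tifffile import imread

        # Load a raw H2B-eGFP stack and a previously trained model
        # (paths and model name are placeholders).
        stack = imread("h2b_egfp_stack.tif")
        model = StarDist3D(None, name="nuclei3d", basedir="models")

        # Percentile normalization as commonly used with StarDist, then
        # tiled prediction so that large volumes fit into GPU memory.
        labels, details = model.predict_instances(
            normalize(stack, 1, 99.8), n_tiles=(2, 4, 4))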

    S2 Fig. Training a classifier to distinguish excitatory from inhibitory neurons.

    (A) Labeling strategies for excitatory and inhibitory neurons. Intracortical injections were performed approximately 400 μm below the cortical surface in H2B-eGFP mice using AAV5-CamKIIα-mCherry and AAV1-mDLX-NLS-mRuby2 to visualize excitatory and inhibitory neurons, respectively. (B) Post-injection images. Inhibitory neurons were imaged 2–3 weeks after injection; excitatory neurons were imaged 3–4 weeks after injection. (C) Radiomics features showing differences between excitatory and inhibitory neurons for (i) surface area (in pixel² [resolution x = 0.29 μm, y = 0.29 μm, z = 2 μm]), (ii) maximum 2D diameter (in pixels, same resolution as in (i)), (iii) first order 10th percentile, (iv) mesh volume in voxels (resolution as in (i)) (n of excitatory neurons = 396, n = 3 mice; n of inhibitory neurons = 122, n = 2 mice). (D) Confusion plot for the classifier. Rows show the predicted class (output class), and columns show the true class (target class). Green fields illustrate correct identifications, whereas blue fields illustrate erroneous identifications. The number of observations and the percentage of observations relative to the total number of observations are shown in each cell. The column on the far right shows the precision (or positive predictive value) and false discovery rate in green and red, respectively. The bottom row denotes the recall (or true positive rate) and false negative rate in green and red. The cell on the bottom right shows the overall accuracy of the classifier. (E) Number of nuclei per mm³ in the secondary motor cortex at baseline and after 12 weeks (dashed line) (n = 8 mice). (F) Mean nucleus volume after 12 weeks normalized to baseline (n = 8 mice). (Significance testing for C, E, F: Wilcoxon test; p-values were corrected for multiple comparisons using the Bonferroni method; p < 0.05 *, p < 0.01 **, p < 0.001 ***.) Plot data can be found in S1 Data.

    (TIFF)
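
    As a sketch of the significance testing described above, assuming paired per-mouse values at baseline and at week 12 (illustrative, not the authors' analysis script):

        from scipy.stats import wilcoxon

        def paired_wilcoxon_bonferroni(pairs, n_comparisons):
            # pairs: dict mapping cell type -> (baseline, week12) paired arrays.
            # Returns Bonferroni-corrected p-values, capped at 1.0.
            corrected = {}
            for cell_type, (baseline, week12) in pairs.items():
                _, p = wilcoxon(baseline, week12)
                corrected[cell_type] = min(p * n_comparisons, 1.0)
            return corrected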

    S3 Fig. Combinations of nuclear morphology, intensity, and texture features show clear distinction between cell types.

    (A) Flatness, gray level non-uniformity. (B) Minor axis length, flatness. (C) First order minimum, elongation. (D) Large area emphasis, first order 10th percentile. (E, F) Combinations of features in 3 dimensions. Plot data can be found in S1 Data.

    (TIFF)

    S4 Fig

    (A) Visualization of the entire classification training process. After ground truth data were selected, a sequential forward feature selection algorithm was applied to the features extracted from all nuclei of all cell types, which selected 12 of the 107 radiomics features. Synthetic data were generated from all radiomics features of all nuclei and all cell types. The combined dataset was used to train a classifier, with the training, validation, and test data comprising 70%, 15%, and 15% of the combined dataset, respectively. (B) Example showing manual selection of ground truth data for supervised training. Post-induction green and red channels were overlaid to create a composite. Nuclei appearing yellow (possessing both green and red fluorescence) were identified in the pre-induction GFP channel; from the corresponding segmentation, a label ID was acquired and later used to identify the extracted features. (C) Synthetic training data generated from the original dataset matched the features of the original dataset. (D) Statistical analysis of similarity between the distributions of original and synthetic data (K-S test; mean = 83.68%). (E) Datasets with different amounts of synthetic data were created and training accuracy was compared between them: orig = original dataset, orig * 2.5 = dataset containing 2.5 times the amount of data as the original dataset, orig * 2.5 (down sampled) = dataset down sampled to the minimum sample count (after the 2.5-fold increase) to equalize sample numbers for all cell types, orig * 9 = dataset containing 9 times the amount of data as the original dataset, orig * 9 (down sampled) = dataset down sampled to the minimum sample count (after the 9-fold increase) to equalize sample numbers for all cell types. (F) Mean accuracy of classifiers trained 5 times with different combinations of synthetic data; error bars denote standard deviation (SD). Light blue = orig * 2.5, dark blue = orig * 2.5 (down sampled), green = orig * 9, yellow = orig * 9 (down sampled), red = orig. (G) Table listing the number of manually identified nuclei used in the training process (Excitatory N: excitatory neurons; Inhibitory N: inhibitory neurons). Plot data can be found in S1 Data.

    (TIFF)
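
    A minimal sketch of the feature selection and 70%/15%/15% split described in panel A; the manuscript's "sequentialfs" points to a MATLAB implementation, so the scikit-learn calls and toy data below are illustrative stand-ins only:

        import numpy as np
        from sklearn.feature_selection import SequentialFeatureSelector
        from sklearn.model_selection import train_test_split
        from sklearn.neighbors import KNeighborsClassifier

        # Toy stand-ins: 107 radiomics features, 5 cell-type labels.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 107))
        y = rng.integers(0, 5, size=1000)

        # 70% training, 15% validation, 15% test, stratified by cell type.
        X_train, X_tmp, y_train, y_tmp = train_test_split(
            X, y, test_size=0.30, stratify=y, random_state=0)
        X_val, X_test, y_val, y_test = train_test_split(
            X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=0)

        # Sequential forward selection of 12 of the 107 features.
        sfs = SequentialFeatureSelector(KNeighborsClassifier(),
                                        n_features_to_select=12,
                                        direction="forward")
        sfs.fit(X_train, y_train)
        selected = sfs.get_support(indices=True)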

    S5 Fig. Confusion matrices for each classifier for the test dataset.

    Since each classifier distinguishes between the desired class and every “other” class, the confusion matrix consists only of 4 fields. Rows show the predicted class (Output Class), and the columns show the true class (Target Class). Green fields illustrate correct identification of target and “other” class, blue fields illustrate erroneous identifications. The number of observations and the percentage of observations compared to the total number of observations are shown in each cell. Column on the far right shows the precision (or positive predictive value) and false discovery rate in green and red, respectively. Bottom row denotes recall (or true positive rate) and false negative rate in green and red. Cell on the bottom right shows overall accuracy of the classifier. Plot data can be found in S1 Data.

    (TIFF)
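
    For reference, the precision, recall, and accuracy values displayed in these matrices follow directly from the four fields; a generic sketch:

        def binary_metrics(tp, fp, fn, tn):
            # Precision (positive predictive value), recall (true positive
            # rate), and overall accuracy from a 2 x 2 confusion matrix.
            precision = tp / (tp + fp)
            recall = tp / (tp + fn)
            accuracy = (tp + tn) / (tp + fp + fn + tn)
            return precision, recall, accuracy

        # usage: binary_metrics(tp=90, fp=5, fn=10, tn=95)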

    S6 Fig

    (A) Three-dimensional representation of raw, segmented, and classified data (700 μm × 700 μm × 700 μm volume) visualizing comprehensive cell type composition. (B) Cell type distribution in a single volumetric stack. (C) Maximum intensity z-projection of a sub volume (150 μm × 150 μm × 100 μm) showing raw, segmented, and classified nuclei in the X-Y axis.

    (TIFF)

    S1 Table. Radiomics features extracted from segmented nuclei.

    Features marked in green were used for cell type training and classification.

    (TIF)
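
    A hedged sketch of extracting such features with PyRadiomics [11], assuming a volumetric image with a matching integer label mask (paths and settings are hypothetical; the authors' exact configuration may differ):

        import SimpleITK as sitk
        from radiomics import featureextractor

        # Volumetric image plus matching integer label mask (placeholder paths).
        image = sitk.ReadImage("h2b_egfp_stack.nii.gz")
        mask = sitk.ReadImage("nuclei_labels.nii.gz")

        extractor = featureextractor.RadiomicsFeatureExtractor()
        extractor.disableAllFeatures()
        extractor.enableFeatureClassByName("shape")       # morphology features
        extractor.enableFeatureClassByName("firstorder")  # intensity features
        extractor.enableFeatureClassByName("glcm")        # texture features
        features = extractor.execute(image, mask, label=1)  # one nucleus label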

    S2 Table. Morphology, intensity, and texture features used for training and classification.

    (TIF)

    S1 Text. Explanations for technical terms used in the manuscript.

    (DOCX)

    S1 Movie. Illustration of nucleus segmentation and classification using NuCLear.

    Raw data of H2B-eGFP-labeled nuclei (green), nucleus segmentation (randomly colored), and classification (apricot = neurons, red = microglia, blue = oligodendrocytes, dark green = astrocytes, magenta = endothelial cells).

    (MP4)

    S2 Movie. Visualization of the manual nucleus segmentation process.

    (MP4)

    S3 Movie. Visualization of a classified 3D volume.

    (MP4)

    S1 Data. The Excel sheet contains all data used to generate Figs 2F, 3C–3J, S1I, S2C–S2F, S3A–S3F, S4D, S4F, and S5.

    Data can be found on sheets within the excel document named after the corresponding figure and panel.

    (XLSX)

    Attachment

    Submitted filename: Response_To_Reviewers_Knabbe.docx


    Data Availability Statement

    All datasets used in the study are available on the heiData archive: https://heidata.uni-heidelberg.de/dataset.xhtml?persistentId=doi:10.11588/data/L3PITA. Code and exemplary data can be found on the GitHub page: https://github.com/adgpta/NuCLear. We also published all 3 GitHub repositories on Zenodo: NuCLear: https://zenodo.org/badge/latestdoi/662776296; NucleusAI: https://zenodo.org/badge/latestdoi/671099550; SynthGen: https://zenodo.org/badge/latestdoi/665629516.

