CytoSpatio: Learning cell type spatial relationships using multirange, multitype point process models

Haoran Chen; Robert F Murphy

doi:10.1101/2024.10.31.621408

This is a preprint.

It has not yet been peer reviewed by a journal.


[Preprint]. 2024 Nov 3:2024.10.31.621408. [Version 1] doi: 10.1101/2024.10.31.621408


PMCID: PMC11565948 PMID: 39553984

Summary

Recent advances in multiplexed fluorescence imaging have provided new opportunities for deciphering the complex spatial relationships among various cell types across diverse tissues. We introduce CytoSpatio, open-source software that constructs generative, multirange, and multitype point process models that capture interactions among multiple cell types at various distances simultaneously. On analyzing five cell types across five tissues, our software showed consistent spatial relationships within the same tissue type, with certain cell types like proliferating T cells consistently clustering across tissue types. It also revealed that the attraction-repulsion relationships between cell types like B cells and CD4-positive T cells vary with tissue type. CytoSpatio can also generate synthetic tissue structures that preserve the spatial relationships seen in training images, a capability not provided by previous descriptive, motif-based approaches. This potentially allows spatially realistic simulations of how cell relationships affect tissue biochemistry.

Keywords: point process models, generative models, cell types, tissue images, spatial relationships, multiplexed fluorescence imaging, synthetic tissue simulation, spatial proteomics

Introduction

The functions of a tissue are often determined by the type and arrangement of its constituent cells. Distinct shapes, sizes, and molecular properties of cell types lead to specialized functions within a tissue [1–5]. However, the spatial relationships among cell types within diverse tissues are often complex, and their impact on tissue function is not fully understood.

Traditional imaging techniques, such as confocal microscopy, electron microscopy, and computed tomography (CT), have allowed scientists to investigate the spatial relationships between specific cell types within particular tissues [6–9]. However, these approaches typically required manual annotation of cell types. They therefore faced challenges of subjectivity in cell type annotation, limited scalability of conclusions across tissues, and, most notably, an inability to capture complex spatial relationships because of the limited number of cell types that could be identified.

Recent advances in multiplexed imaging approaches for spatial transcriptomics and proteomics offer an unprecedented opportunity for researchers to explore the spatial relationships between a diverse range of cell types simultaneously [10–14]. By employing biomarkers targeting distinct RNA transcripts or proteins within cells in a multiplexed manner, various cell types can be concurrently visualized in tissue samples [15–17].

This advancement has motivated researchers to investigate spatial relationships among cell types with a variety of methods, mainly involving quantification and summarization of colocalization and correlation between cell types using analytic and statistical methods.

Behanova et al. [18] summarized and reviewed a variety of spatial statistics methods, tools, and software. The primary focus was on testing various hypotheses regarding whether cell types are randomly distributed, rather than attempting to construct models to capture complex spatial relationships.

Filipek et al. [19] presented CytoMAP, a spatial analysis platform that quantifies local cell composition and global tissue structure. This platform defines cell-centered local neighborhoods across the tissue and groups similar neighborhoods together through clustering methods. It provides overall correlation and neighborhood composition between cell types for colorectal tumor and lymphoid tissues. While CytoMAP is a powerful tool for the spatial analysis of cell type relationships in tissue images, it has certain limitations. First, the choice of the range for cell-centered local neighborhoods would be expected to significantly affect results. This could limit the reproducibility of the analyses and the comparability of spatial relationships between cell types in different tissues. Second, while CytoMAP does offer correlations and neighborhood compositions between cell types, these summaries may oversimplify the complexity of spatial relationships among various cell types, a concern shared with spatial statistics methods.

Barlow et al. [20] hypothesized that tissues are composed hierarchically from smaller to larger components following certain assembly rules. To test this hypothesis, a hierarchical computational framework was devised to systematically identify the characteristic local compositions of cell types, known as cellular neighborhoods, map the local interactions and co-localization of these neighborhoods into distinct microenvironments, and delineate assembly rules that govern the formation of these microenvironments into tissue motifs. This hierarchical analysis produced proposed assembly rules for normal lymph node, spleen, and tonsil tissue, as well as colorectal cancer tissue. However, like CytoMAP, both the specific choices of the hierarchical design and the fixed parameters used to define the ranges of neighborhoods and microenvironments were not well justified or explored. The approach was also not incorporated into a probabilistic, generative framework to allow estimation of the likelihood of a tissue image being produced by a given model and/or the quantitative similarity between different tissues, and to allow generation of synthetic tissue images.

To address the limitations of these existing methods, we sought to employ generative statistical models to learn and represent the complex spatial relationships between different cell types in different tissues beyond pairwise analyses of colocalization and correlation.

Spatial point process models [21] are generative statistical models designed to learn the probability of individual objects (points) occurring at specific locations in space, including the dependence of that probability on the locations of other objects. The collection of points (including their locations within a defined region) is referred to as a “point pattern”, and models capturing how such point patterns are generated are referred to as “point process models.” These models have found widespread application in the analysis of spatial relationships across various domains, such as meteorology [22], ecology [23], criminology [24], and social sciences [25]. In cell biology, spatial point process models have been employed to elucidate the spatial relationships between punctate organelles and various cellular components, such as the nuclear membrane and microtubules [26, 27]. They have also been used to investigate the assembly of viral ribonucleoprotein complexes [28] and to identify prognostic structural features in colon cancer tissues [29]. Although these point process models have been successful in revealing spatial dependencies and interaction patterns between objects in different contexts, they typically focus on one type of object at a time. In these models, the locations of other point types, if they exist, are treated as influential “factors” that may affect the spatial distribution of the target point type. Consequently, separate models must be trained for each object type. To address this limitation, we chose to implement a point process model capable of simultaneously learning the spatial relationships between many types of objects, namely a multitype point process model (also called a marked point process model) [30–34].

In a multitype point process model that assumes interactions between different types of points, a common challenge is to determine the maximum interaction distance over which two types of points can influence each other. Conventionally, this range parameter has been chosen either as a distance spanning from the nearest neighbor up to the distance of commonly observed interactions between two types of points [35], or from the distribution of nearest-neighbor distances between two types of points [36]. While these approaches offer a useful approximation, they depend heavily on empirical observations from the data set. If the data set is limited or biased, they may lead to an inaccurate estimate of the maximum interaction distance. They may also limit the extent to which models trained on different tissues can be compared. To overcome this challenge, we designed multirange models in which different types of points can influence each other differently within each of several distance ranges. This allows greater sensitivity in distinguishing different types of interactions.

In this study, we introduce CytoSpatio, open-source software that constructs generative, multirange, multitype point process models. We demonstrate its superior performance over single-range models using images from five different tissues containing five distinct cell types. We show how the models can be used to compare cell type spatial relationships between images from the same tissue or between images of different tissues. Additionally, we can use our approach to evaluate heterogeneity in different tissue subregions. Perhaps most usefully, we construct interaction network graphs that directly exhibit and compare the spatial relationships among cell types. Lastly, we demonstrate that our models can generate synthetic tissue images that preserve the spatial relationships observed in real tissue images, enabling rigorous validation of the models. Figure 1 illustrates the processes involved in constructing models using our approach.

Figure 1. CytoSpatio process for learning spatial relationships between different cell types. (A) A region from a larger lymph node image is shown, with cell types shown in different colors and cell boundaries shown in white. The blue concentric circles denote five distance ranges of 100–500 pixels at 100-pixel intervals. (B) The training process involves counting the number of other cells of each type within varying distance ranges for each cell, as illustrated for the central cell (small blue diamond) in panel A, a B cell. (C) A simplified version of the equation used for the fitting process in a point process model to learn the spatial relationships among cell types is shown. The probability $\lambda$ of a particular cell type $c$ at a given location $x$ is given by a (global) base intensity ($\beta$) adjusted for (multiplied by) the influence of the local frequencies of all cell types. This adjustment is given by the dot product of a vector of interaction coefficients ($\delta$) for this cell type with all cell types (including its own) and a vector ($\mathrm{Counts}(x)$) reflecting the counts of each cell type. The interaction coefficients and counts can be for a single range (i.e., one of the columns in panel B) or can be concatenated across multiple ranges (i.e., linearizing the counts in panel B). (D) Predicted intensities (proportional to the probabilities of occurrence) are shown for three cell types for each cell in this region (derived from a model trained with the entire image). Brighter colors indicate a higher predicted intensity, with each color corresponding to a distinct cell type. (E) A synthetic image depicting predicted cell types generated for this region from the model is shown. The image was generated from the model using the positions of each cell in panel A but assigning each cell’s type based on the predicted probabilities across the cell types for that location (cell type colors are the same as in panel A).
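The equation in panel C is not reproduced in this text. As a hedged reconstruction, one standard log-linear form consistent with the caption's description (an assumption, not necessarily the exact CytoSpatio parameterization given in the Methods) writes the intensity of type $c$ at $x$ as the base intensity times the exponential of the dot product, summed over $T$ cell types and $R$ distance ranges, where $n_{t,r}(x)$ denotes the number of type-$t$ cells in range $r$ around $x$:

$$\lambda_c(x) = \beta_c \exp\left(\boldsymbol{\delta}_c \cdot \mathrm{Counts}(x)\right) = \beta_c \exp\left(\sum_{t=1}^{T}\sum_{r=1}^{R} \delta_{c,t,r}\, n_{t,r}(x)\right)$$

With $R = 1$ this reduces to a single-range model. Under this form, each type-$t$ cell in range $r$ multiplies the intensity of type $c$ by $e^{\delta_{c,t,r}}$, so a positive coefficient corresponds to attraction and a negative one to repulsion; in a Strauss hardcore model the intensity is additionally set to zero wherever another cell lies within the hardcore radius.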

Results

For this study, we used multiplexed tissue images from the Human BioMolecular Atlas Program (HuBMAP) [37]. Images for five tissues were segmented into single cells and the cell type of each cell was assigned as described in the Methods.

Assessing non-randomness of cell type distributions in different tissues

We began our analysis by exploring whether the cell type distribution in each tissue is random, which would imply a lack of meaningful spatial relationships among cell types. We posed a null hypothesis that the cell type distribution in a tissue image would be equivalent to a distribution with the same cell locations but randomized cell types. For each tissue, we randomized the cell types within all images 100 times, generating 100 sets of point patterns with shuffled cell types. These patterns served as a background distribution for our hypothesis testing. For each set, we trained a multitype Strauss Hardcore model (see Methods) with the range within which two cell types can affect each other (referred to as the Strauss radius) set to 100 pixels and the distance within which two cells cannot come closer to each other (referred to as the Hardcore radius) set to 1 pixel (1 pixel equals 0.377 microns). To measure agreement between a model and a set of point patterns, we used a metric that quantifies the average disparity between each point pattern and the intensity predicted by the model (average deviance per cell; see Methods). For each shuffled model, we measured the average deviance per cell against a randomly selected shuffled point pattern set from the same tissue, and also against the unshuffled point pattern from the original image.
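A minimal Python sketch of the randomization step above, assuming cell types are stored as an array aligned with fixed cell locations. The helper names are hypothetical, and the permutation p-value shown is one common way to compare an observed statistic with a null distribution; it is not necessarily the exact test used in the Methods.

```python
import numpy as np

def shuffled_label_sets(types, n_sets=100, seed=0):
    """Return n_sets copies of the cell-type labels, each randomly permuted.

    Cell locations are untouched; only the assignment of types to locations
    is shuffled, which keeps the overall frequency of each type fixed.
    """
    rng = np.random.default_rng(seed)
    types = np.asarray(types)
    return [rng.permutation(types) for _ in range(n_sets)]

def empirical_p_value(observed, null_values):
    """One-sided permutation p-value for 'observed exceeds the null'.

    Uses the (count + 1) / (n + 1) correction so p is never exactly zero.
    """
    null_values = np.asarray(null_values)
    return (np.sum(null_values >= observed) + 1) / (len(null_values) + 1)
```

With 100 shuffles, the smallest p-value this estimator can return is 1/101 ≈ 0.0099, just below the 0.01 threshold; with fewer than 100 shuffles, p<0.01 would be unreachable under this estimator.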

As shown in Figure 2, we consistently observed that the average deviance per cell was lower when the models trained on a shuffled pattern set were tested against another shuffled point pattern set (red boxplots), as compared to when tested on the original point pattern set (green boxplots). As expected, this indicates for each tissue that the shuffled pattern sets were more similar to each other than they were to the original (unshuffled) pattern set. This strong deviation of the original patterns from randomness was statistically significant (p<0.01) for all five tissue types investigated. Interestingly, we found that the cell type distributions in thymus, small intestine (SI), and large intestine (LI) were particularly non-random, resulting in significantly higher deviance when their randomized models were tested against the original patterns.

Figure 2. Comparison of average deviance per cell between shuffled point pattern sets and original point pattern sets. Lower average deviance per cell indicates a higher likelihood that a particular image could have been produced by a given model. The red boxplots represent the deviances when models trained on a shuffled point pattern set were compared to another shuffled point pattern set; the green boxplots represent the deviances when the same models were compared to the original point pattern set. Whiskers extend up to 1.5 times the interquartile range beyond the first and third quartiles. The significantly higher deviances for the original patterns compared to those for the shuffled patterns demonstrate the non-random distribution of cell types within the tissues studied. How the extremely high deviances seen in some cases can arise is discussed in the Methods.

Comparing multirange and single-range Strauss Hardcore models

We next evaluated whether our multirange, multitype Strauss Hardcore model (see Methods) provides a more accurate fit for learning spatial relationships among cell types in our tissue images than conventional Strauss Hardcore models with a single Strauss radius. For each tissue, we trained Strauss Hardcore models using each of five single radii (100 to 500 pixels), as well as our multirange model, which incorporates five distinct Strauss radii ranging from 100 to 500 pixels at 100-pixel intervals.
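The multirange features can be sketched as follows, assuming (based on the 0 to 100 and 100 to 200 pixel ranges discussed later) that each range is an annulus between consecutive radii rather than a cumulative disk. The function name and array layout are illustrative, not the CytoSpatio API, and the brute-force distance matrix is only suitable for small examples.

```python
import numpy as np

def multirange_counts(xy, types, n_types, radii=(100, 200, 300, 400, 500)):
    """Count neighbors of each type in annular distance bins around each cell.

    xy: (N, 2) cell centroids in pixels; types: (N,) integer labels in [0, n_types).
    Bin r covers distances in (edges[r], edges[r+1]], with edges = [0, *radii].
    Returns an (N, n_types * len(radii)) matrix, ordered type-major per row.
    """
    xy = np.asarray(xy, dtype=float)
    types = np.asarray(types)
    edges = np.concatenate(([0.0], np.asarray(radii, dtype=float)))
    # Pairwise distances (O(N^2) memory; a KD-tree would suit real images).
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a cell is not its own neighbor
    n, n_r = len(xy), len(radii)
    out = np.zeros((n, n_types, n_r), dtype=int)
    for r in range(n_r):
        in_bin = (d > edges[r]) & (d <= edges[r + 1])
        for t in range(n_types):
            out[:, t, r] = in_bin[:, types == t].sum(axis=1)
    return out.reshape(n, n_types * n_r)
```

Setting `radii=(100,)` gives the single-range counts; the multirange row is simply the concatenation across ranges, matching the "linearized" counts described for Figure 1. Pairs at exactly zero distance are not counted, which the 1-pixel hardcore radius rules out anyway.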

An important component of constructing point process models is the creation of “dummy” points that have different types than the observed points, so that the model can learn not only that observed points should have high probability but also that non-observed points should in general have low probability (see Methods). To compare models with different radii, we evaluated each model’s goodness-of-fit using the average deviance per real cell, per dummy cell, and per cell across both real and dummy cells.
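The exact deviance definition is in the Methods, which are not included here. The sketch below assumes the model behaves like a logistic fit in which real cells are labeled 1 and dummy points 0, and takes each point's deviance to be minus twice its Bernoulli log-likelihood; the function name is hypothetical.

```python
import numpy as np

def average_deviances(p, is_real, eps=1e-12):
    """Mean per-point Bernoulli deviance for real, dummy, and all points.

    p: model probability that each point is a real cell of its assigned type.
    is_real: True for observed (real) cells, False for dummy points.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)  # avoid log(0)
    y = np.asarray(is_real, dtype=bool)
    # Real cells are penalized for low p; dummy points for high p.
    dev = np.where(y, -2 * np.log(p), -2 * np.log(1 - p))
    return {"real": dev[y].mean(), "dummy": dev[~y].mean(), "all": dev.mean()}
```

Under this assumed definition, a real cell to which the model assigns a near-zero probability contributes a very large deviance, which illustrates how a few poorly predicted cells can inflate the averages.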

Figure 3 shows that, compared to the conventional Strauss Hardcore models with any of the five single radii, our multirange model consistently yielded the lowest average deviances for all five tissue types. Interestingly, we observed a gradual decline in the performance of the single-radius model as the Strauss radius increased from 100 to 500 pixels. This suggests that the positioning of specific cell types is primarily influenced by nearby neighboring cells, while cells at greater distances may introduce mixed spatial relationships that lower prediction accuracy. Nevertheless, the spatial information derived from cells at larger distances remains useful for predicting cell types, contributing to the superior accuracy of the multirange model across the five tissue types.

Figure 3. Performance comparison between multirange and single-range multitype Strauss Hardcore models. The average deviances per cell for all cells, real cells, and dummy cells measure, respectively, the overall goodness-of-fit of the model, the prediction accuracy of cell types at their locations, and the accuracy of predicting locations devoid of real cells.

It is important to consider the relationships between the radius ranges used in constructing the models, the radii of the cell types being considered, and the size of the image pixels. For images with the same pixel size and similar cell radii, models can be directly compared (as we have done here). As long as the pixel size of the image (the width and height of each pixel in the sample plane; 0.377 microns for the images analyzed here) is sufficiently smaller than the typical radii of the cell types, it does not significantly affect the estimation of cell-cell distances (when expressed in microns). Models for images of different pixel sizes can also be compared as long as the radius ranges (in pixels) are adjusted for each image so that they represent the same length (in microns).
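The pixel-to-micron bookkeeping above amounts to simple scaling. These hypothetical helpers convert radius ranges between units so that models fit to images with different pixel sizes can use the same physical ranges.

```python
def radii_in_microns(radii_px, pixel_size_um):
    """Convert radii in pixels to microns for a given pixel size (um/pixel)."""
    return [r * pixel_size_um for r in radii_px]

def radii_in_pixels(radii_um, pixel_size_um):
    """Convert radii in microns to pixels for a given pixel size (um/pixel)."""
    return [r / pixel_size_um for r in radii_um]
```

At 0.377 microns per pixel, the 100 to 500 pixel ranges correspond to 37.7 to 188.5 microns; an image with 0.5-micron pixels would need radii of 75.4, 150.8, 226.2, 301.6, and 377 pixels to cover the same physical ranges.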

Evaluating differences in cell type spatial relationships within and across tissues

We next asked, using two distinct approaches, whether spatial relationships among cell types were more similar within the same tissue than they were between different tissues. Both approaches used sets of models for each tissue that were derived from a leave-one-out cross-validation process (see Methods).

The first approach involved calculating the Gaussian kernel similarity between the concatenated vectors of interaction coefficients for all radii (which encode the attraction or repulsion among cell types) of a pair of models. To provide an overall measure of similarity within or between tissues, we averaged similarity values over all pairs of models for a given tissue, and over all pairs of models from two tissues (Figure 4A). Models trained on images from the same tissue yielded similarities very close to 1. The value for lymph node was slightly lower, indicating slightly more heterogeneity among the images of that tissue. All of the within-tissue values were consistently higher than those comparing models from different tissues. However, spleen, lymph node, and thymus were more similar to each other than any of them was to either large or small intestine (which were quite similar to each other).
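A sketch of the similarity computation, assuming the common Gaussian (RBF) kernel $\exp(-\lVert a-b \rVert^2 / 2\sigma^2)$ applied to the concatenated coefficient vectors. The bandwidth `sigma` and the choice to exclude self-pairs within a tissue are assumptions; the values used in the paper are not given in this section.

```python
import numpy as np
from itertools import combinations, product

def gaussian_kernel_similarity(a, b, sigma=1.0):
    """exp(-||a - b||^2 / (2 sigma^2)); equals 1 for identical vectors."""
    diff = np.asarray(a, float) - np.asarray(b, float)
    return float(np.exp(-(diff @ diff) / (2.0 * sigma ** 2)))

def mean_similarity(models_a, models_b=None, sigma=1.0):
    """Average similarity over all distinct model pairs within one tissue
    (models_b is None) or over all cross pairs between two tissues."""
    if models_b is None:
        pairs = combinations(models_a, 2)
    else:
        pairs = product(models_a, models_b)
    return float(np.mean([gaussian_kernel_similarity(a, b, sigma) for a, b in pairs]))
```

Because the kernel decays with squared Euclidean distance, the choice of `sigma` sets how quickly similarity drops from 1 as coefficient vectors diverge; comparisons are only meaningful when all tissues use the same bandwidth.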

Figure 4. Comparison of cell type spatial relationships within and across different tissues. (A) The interaction coefficients of pairs of models are directly compared using Gaussian kernel similarity; lighter color indicates greater similarity. (B) The predictive accuracy on held-out images of a given tissue, as well as on images from other tissues, was measured using wmAUC. In each tissue panel, the violin plots are arranged in descending order of the mean from left to right, and the mean is indicated by an “x”.

These patterns of similarity and dissimilarity may also reflect the organs’ primary biological systems and functions. The spleen, thymus, and lymph node are primarily part of the immune system, which could explain their high mutual similarity. Conversely, the large and small intestines mainly serve the digestive system, although they also have immune functions. This dual role might contribute to the distinctive spatial relationships we observed in these two tissues compared to the other three.

For our second approach, we employed the models from leave-one-out cross-validation to predict cell types (see Methods) in the held-out images of the same tissue that the model was trained on, as well as images from other tissues. We hypothesized that high predictive accuracy would indicate similar spatial relationships among cell types in the training and prediction images. The prediction accuracy of cell types was quantified using the weighted macro Area-Under-the-Curve (wmAUC, see Methods).
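The wmAUC metric is defined in the Methods, which are not included here. Assuming it is the prevalence-weighted average of one-vs-rest AUCs (the quantity scikit-learn's `roc_auc_score` computes with `multi_class="ovr"` and `average="weighted"`), it can be sketched in plain NumPy; the function names are hypothetical.

```python
import numpy as np

def binary_auc(scores, labels):
    """AUC as P(score of a positive > score of a negative), ties counted as 1/2."""
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def weighted_macro_auc(probs, true_types):
    """One-vs-rest AUC per cell type, averaged with weights = type prevalence.

    probs: (N, K) predicted probability of each type; true_types: (N,) ints.
    Types that are absent (or present in every cell) are skipped.
    """
    probs = np.asarray(probs, float)
    true_types = np.asarray(true_types)
    aucs, weights = [], []
    for k in range(probs.shape[1]):
        is_k = true_types == k
        if 0 < is_k.sum() < len(is_k):
            aucs.append(binary_auc(probs[:, k], is_k))
            weights.append(is_k.sum())
    return float(np.average(aucs, weights=weights))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of each type from the rest, which is why values above 0.7 from neighbor types alone are informative.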

The results (Figure 4B) showed high (>0.7) wmAUC values for all predictions on held-out images of the same tissue, which is notable given the difficulty of predicting a cell's type only from the types of its neighbors. Among spleen, thymus, and lymph node, the highest value was not always obtained when a tissue's models were applied to its own images; this does not indicate poor performance of the model but rather reflects the similarity between those tissues already observed above. Those tissues also had a more consistent range of wmAUC values among images from the same tissue compared to the small and large intestines. This suggests that regions within the spleen, thymus, and lymph node share greater intra-tissue similarity than the intestines.

Analyzing heterogeneity within tissue images

One assumption of point process models is that point patterns are homogeneous; in our case, this means that spatial relationships among cell types remain consistent at different locations within the tissue. However, most tissues contain distinct structural and functional units (such as stem cell niches). To evaluate whether such organization is reflected in heterogeneity of the cell spatial interaction models, we randomly sampled subregions (tiles) from the original images at two different sizes (5000×5000 and 2500×2500 pixels). Tiles were required to contain at least 100 cells of all cell types and at least one-fifth of the average number of cells per tile for that image. We ensured that the edges of each tile were at least 500 pixels away from the original image edges, since interactions cannot be accurately counted for cells too close to an image edge.
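Tile selection under these constraints could look like the following rejection-sampling sketch. Two points are assumptions: the “100 cells of all cell types” criterion is read as at least 100 cells of each type, and the “average number of cells per tile” is taken as the image's cell count scaled by tile area over image area. The function and its parameters are hypothetical.

```python
import numpy as np

def sample_tile(xy, types, n_types, image_hw, tile=2500, margin=500,
                min_per_type=100, min_fraction=0.2, rng=None, max_tries=1000):
    """Randomly place a square tile whose edges stay >= margin px from the image
    edges; accept it only if it has >= min_per_type cells of every type and
    >= min_fraction of the image's mean cell count for a tile of that size.
    Returns (x0, y0), the tile's lower corner, or None if no tile qualified."""
    rng = np.random.default_rng(rng)
    xy, types = np.asarray(xy, float), np.asarray(types)
    h, w = image_hw
    if tile + 2 * margin > min(h, w):
        return None  # no room for a tile with the required margin
    mean_per_tile = len(xy) * tile * tile / (h * w)
    for _ in range(max_tries):
        x0 = rng.uniform(margin, w - margin - tile)
        y0 = rng.uniform(margin, h - margin - tile)
        inside = ((xy[:, 0] >= x0) & (xy[:, 0] < x0 + tile) &
                  (xy[:, 1] >= y0) & (xy[:, 1] < y0 + tile))
        per_type = np.bincount(types[inside], minlength=n_types)
        if per_type.min() >= min_per_type and inside.sum() >= min_fraction * mean_per_tile:
            return x0, y0
    return None
```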

For the same reason, when counting interactions for each cell within a tile, we included nearby cells outside the tile. We trained and tested our model on each original image and tile, and for each tile size we formed a matrix in which each row represents the model for a given tile and each column corresponds to an interaction coefficient. Using principal component analysis (PCA), we extracted the two major modes of variation, enabling visualization of heterogeneity between individual tile models (Figure 5A–E). We also projected the interaction coefficients of the model trained on all original images of each tissue onto the fitted principal components.
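The PCA step can be sketched with an SVD of the mean-centered coefficient matrix, after which the whole-tissue model is projected using the same mean and components. Counting neighbors outside a tile follows naturally if features are computed on all cells of the image and only the rows for cells inside the tile are kept. Function names are illustrative.

```python
import numpy as np

def fit_pca(coef_matrix, n_components=2):
    """PCA via SVD on mean-centered rows (one row per tile model)."""
    X = np.asarray(coef_matrix, float)
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(coefs, mean, components):
    """Project coefficient vectors (e.g., the whole-tissue model) onto the fitted PCs."""
    return (np.atleast_2d(np.asarray(coefs, float)) - mean) @ components.T
```

scikit-learn's `PCA` would give the same projection up to the sign of each component; fitting on tile models only and then transforming the whole-tissue model keeps the latter from influencing the axes.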

Figure 5. Evaluating tissue heterogeneity of cell type relationships. Panels A to E show the top two principal components of the interaction coefficients of the trained models. Panel F shows how heterogeneity changes with tile size for the five tissues.

We also calculated the median of the Euclidean distances between the coefficients of models trained on tiles and coefficients of the model trained on all original images of that tissue. We used this value as a heterogeneity metric (Figure 5F).
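The heterogeneity metric is a one-line computation; this sketch assumes each model's coefficients have been flattened into a vector in a consistent order, and the function name is hypothetical.

```python
import numpy as np

def heterogeneity(tile_coefs, whole_coef):
    """Median Euclidean distance between tile-model coefficient vectors and the
    coefficient vector of the model trained on all images of the tissue."""
    diffs = np.asarray(tile_coefs, float) - np.asarray(whole_coef, float)
    return float(np.median(np.linalg.norm(diffs, axis=1)))
```

Using the median rather than the mean keeps a few atypical tiles from dominating the metric.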

As discussed above, spleen, thymus, and lymph node displayed lower heterogeneity across their original images than the large and small intestines. This homogeneity also persists for smaller subregions of those tissues (Figure 5A,B,C) compared to intestine (Figure 5D,E), and Figure 5F further quantifies this difference. Notably, among these three tissues, spleen exhibited a much smaller increase in heterogeneity for smaller subregions, suggesting largely homogeneous spatial relationships among cell types across region sizes in this tissue.

Visualizing cell type interaction networks

The primary goal of this study was to analyze the spatial relationships among cell types. To summarize our findings, we constructed interaction networks to visualize the interaction coefficients at various ranges in the multirange multitype Strauss Hardcore model (Figure 6).

Figure 6. Spatial relationships between five cell types across five different tissues. The size of each node corresponds to the total strength of self-interaction across five distance ranges for that cell type (see Figure S4 for strength of self-interaction at each range). Each pair of nodes is interconnected by five arcs, each representing a different distance range. The range increases from left to right or from bottom to top, with the smallest and farthest ranges corresponding to the most curved arcs. The strength of the relationship between two cell types is depicted by the thickness of the arc, and the nature of their interaction is indicated by its color (blue for attraction, red for repulsion). (A) A direct, unfiltered illustration of the raw interaction coefficients. (B) Interaction coefficients adjusted by the base intensities of the corresponding cell types.

We began by visualizing the interaction coefficients ($\delta$) derived from models trained on all images for each tissue type (Figure 6A). These coefficients directly reflect the inherent tendency of cell types to be near each other, which for simplicity we can interpret as either “attraction” or “repulsion” between pairs of cell types. However, it is crucial to emphasize that these inferred interactions are not based on isolated pairwise analyses of each pair of cell types. Instead, because the interactions among all cell types are integrated in a single point process model, each coefficient represents the relationship between a pair of cell types after accounting for the influences of all other cell types concurrently.

Our analysis revealed a variety of noteworthy interaction patterns among different cell types across several tissues. We detected strong self-attraction among proliferating T cells in all the tissues studied (indicated by their larger node diameter). In contrast, cytotoxic T cells and CD4-positive T cells demonstrated strong self-attraction in the small and large intestine tissues, but not in the other three tissues. B cells showed moderate self-attraction across all five tissues. As expected, the “other cells” type (cells that could not be annotated given the five markers common to all tissues) exhibited the weakest self-attraction. This is presumably due to the diversity of cell types within this category, whose respective influences offset each other.

As also expected, we found that the most intense interactions between two cell types generally occurred within the shortest distance ranges. However, there were a few notable exceptions. The interactions between cytotoxic T cells and B cells in small and large intestine, as well as between proliferating T cells and CD4-positive T cells in the large intestine, were moderate across a range of distances.

These interaction networks are highly consistent with the analysis presented in Figure 4B. When comparing the interaction networks for the small and large intestines, we found high similarity in both the direction of influence (attraction or repulsion) and the intensity of interactions between cell types, with the exception that B cells and proliferating T cells exhibited notably stronger mutual repulsion in large intestine than in small intestine. The spleen, thymus, and lymph node also demonstrated a high degree of similarity in the direction of influence between cell types, but with variation in strength. For instance, thymus displayed stronger repulsion between proliferating T cells and B cells than the other two tissues. Lymph node had stronger repulsion between B cells and both cytotoxic T and CD4-positive T cells, whereas spleen demonstrated overall weaker interactions.

Our analysis also highlighted that in spleen, thymus, and lymph node tissues, B cells and CD4-positive T cells displayed strong repulsion at short distances (less than 40 microns) but moderate attraction at larger distances. Interestingly, this interaction pattern is reversed in large and small intestine tissues.

These conclusions are all drawn by examining the interaction coefficients ($\delta$) directly, and thus assume that the frequencies of the two types are approximately the same. However, the extent to which a particular interaction is observed in tissue also depends on the base frequencies ($\beta$) and on the counts (which are themselves affected by the base frequencies). Therefore, in contrast to the “inherent” interaction coefficients presented in Figure 6A, we also calculated “apparent” interaction coefficients by multiplying them by the appropriate base intensities. As shown in Figure 6B, all interactions involving the “other cells” type increased across all five tissues after adjustment, due to the high frequency of that type. The self-interaction of cytotoxic T cells in spleen also increased after adjustment. These cells exhibited the strongest repulsion with “other cells” at distances less than 100 pixels (<38 microns) and the strongest attraction at distances between 100 and 200 pixels (38 to 76 microns). Attraction was observed in all five tissues among cytotoxic T cells, CD4-positive T cells, and “other cells”, although its strength varied. The repulsion between B cells and both cytotoxic T cells and CD4-positive T cells in the lymph node (LN) persisted after adjustment. Furthermore, all cell types in small and large intestine, excluding “other cells,” displayed minimal self-interaction and minimal interactions with each other after adjustment. This is consistent with the relatively low frequencies of these immune cell types in the small and large intestine tissues.

Simulating artificial tissue images from generative models

Perhaps the most valuable property of a generative model lies in its ability to create new, realistic data samples based on its learned probability density functions. We therefore asked whether our models could generate artificial tissue images that maintain spatial relationships among cell types.

To begin the simulation process, we generated cell locations using a homogeneous Poisson process with the same overall cell density as the original image. We next randomly assigned types to all cells based on the density of each cell type in the original image. Following this, we iteratively selected a cell at random and reassigned its type according to the cell type counts around that location and the likelihoods derived from the model. This process was continued until the number of resampled cells reached a specified percentage of the total cell count in the image.
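A minimal sketch of this reassignment procedure, assuming the log-linear intensity $\lambda_c(x) = \beta_c \exp(\boldsymbol{\delta}_c \cdot \mathrm{Counts}(x))$ over annular ranges and a Gibbs-sampler-style update in which a cell's new type is drawn with probability proportional to $\lambda_c(x)$. Both the update rule and the function signature are assumptions rather than the CytoSpatio implementation, and the hardcore term is omitted for brevity.

```python
import numpy as np

def simulate_types(mean_cells, area_hw, type_freqs, log_beta, delta, radii,
                   resample_frac=3.0, seed=0):
    """Generate synthetic cell locations and iteratively resample their types.

    mean_cells: expected cell count (density x area) for the Poisson process.
    type_freqs: (K,) initial type frequencies summing to 1.
    log_beta: (K,) log base intensities; delta: (K, K, R) coefficients, where
    delta[c, t, r] is the effect of a type-t neighbor in range r on type c.
    radii: (R,) outer edges of the annular ranges, in pixels.
    resample_frac: single-cell updates as a multiple of the cell count
    (3.0 corresponds to "300 percent resampling").
    """
    rng = np.random.default_rng(seed)
    h, w = area_hw
    n = rng.poisson(mean_cells)                     # homogeneous Poisson process:
    xy = rng.uniform([0, 0], [w, h], size=(n, 2))   # Poisson count, uniform positions
    k = len(type_freqs)
    types = rng.choice(k, size=n, p=type_freqs)     # initial types from frequencies
    if n == 0:
        return xy, types
    radii = np.asarray(radii, dtype=float)
    edges = np.concatenate(([0.0], radii))
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    # Annular bin of every pair: r for distances in (edges[r], edges[r+1]].
    bins = np.searchsorted(edges, d, side="left") - 1
    valid = (bins >= 0) & (bins < len(radii))
    log_beta = np.asarray(log_beta, dtype=float)
    delta = np.asarray(delta, dtype=float)
    for _ in range(int(resample_frac * n)):
        i = rng.integers(n)                          # pick a random cell
        counts = np.zeros((k, len(radii)))
        nb = valid[i]
        np.add.at(counts, (types[nb], bins[i, nb]), 1)
        # Log conditional intensity of each candidate type at this location.
        log_lam = log_beta + np.einsum("ctr,tr->c", delta, counts)
        p = np.exp(log_lam - log_lam.max())          # stabilized softmax
        types[i] = rng.choice(k, p=p / p.sum())
    return xy, types
```

Because cells are picked with replacement, 300 percent resampling still leaves roughly $e^{-3} \approx 5\%$ of cells never updated, which is one reason wmAUC keeps rising with the resampling percentage.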

We conducted separate trials with different random seeds, and for each trial resampled cells from 0 to 400 percent of the total cell count in intervals of 50 percent. We measured the wmAUC of the original model with respect to the synthetic images, which reflects how well the arrangement of the assigned cell types agrees with the model. We expected the reassignment process to increase wmAUC as it converged toward cell type assignments in agreement with the model.

As shown in Figure S1, the wmAUC increased nearly monotonically with the resampling percentage. This suggests that our model can generate synthetic images with cell type spatial relationships similar to those in the original images, although the wmAUC values are somewhat lower than those obtained when predicting individual cell types in the original images. Synthetic images of even higher fidelity could presumably be generated by resampling more extensively with different random seeds and choosing the image whose coefficients are most similar to those of the model.

Figure 7 shows how our models can be used to illustrate the differences in cell type arrangement that would result for different tissues if cell locations and sizes were kept constant. To do this, an arrangement of synthetic cells was generated with randomly chosen cell positions, and cell shapes were created from these positions using a Voronoi diagram truncated at 20 pixels (a radius of approximately 7.5 microns). Synthetic images were then created from this arrangement using the models trained on all images from each tissue (with 300 percent resampling). The results reflect the trends captured by the adjusted interaction coefficients in Figure 6B for all spatial relationships between cell types, including self-interactions. In particular, the tendency of cytotoxic T cells to be near each other is preserved in all tissues even as the frequency of those cells changes. Cytotoxic T cells and CD4-positive T cells are consistently found near each other across the three immune tissues (spleen, thymus, and lymph node), consistent with their strong attraction in Figure 6B. In lymph node synthetic tissue, B cells and CD4-positive T cells exhibit repulsion at short distances but attraction at longer distances, in line with the observations in Figure 6B. While B cells generally appear to be repelled by both CD4-positive T cells and cytotoxic T cells at short distances in spleen tissue, exceptions can be found in Figure 7. This may be attributed to the high intensity of both cytotoxic T cells and CD4-positive T cells in spleen. In both small and large intestine tissues, fewer B cells and T cells are observed, which is consistent with the low “apparent” interaction strength between these cell types depicted in Figure 6B after adjustment for cell intensity. Nevertheless, we were still able to discern the inherent interactions between these cell types in these two tissues, as illustrated in Figure 6A.

Figure 7. Synthetic tissue images across five tissue types. Each color represents a unique cell type, consistent with the representations in other figures.
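The truncated Voronoi construction used for these synthetic images can be sketched by assigning each pixel to its nearest cell center and discarding pixels farther than 20 pixels from every center. The sketch below is a brute-force illustration on a small hypothetical grid; the function name and the example cell types are assumptions, not the authors' code.

```python
import numpy as np

def truncated_voronoi(centers, shape, max_radius=20.0):
    """Label each pixel with the index of its nearest cell center (row, col);
    pixels farther than max_radius from every center are background (-1)."""
    rows, cols = np.indices(shape)
    pixels = np.column_stack([rows.ravel(), cols.ravel()]).astype(float)
    d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)
    within = d2[np.arange(len(pixels)), nearest] <= max_radius ** 2
    return np.where(within, nearest, -1).reshape(shape)

centers = np.array([[10.0, 10.0], [10.0, 40.0]])
cell_types = np.array([2, 0])  # hypothetical types assigned by a trained model
labels = truncated_voronoi(centers, (30, 80))
# paint each cell's pixels with its type; background stays -1
type_image = np.where(labels >= 0, cell_types[labels], -1)
```

Because the shapes depend only on the positions, the same `labels` image can be recolored with the types produced by each tissue's model. For full-size images, a k-d tree query (for example `scipy.spatial.cKDTree.query` with `distance_upper_bound`) would avoid the dense pixel-by-center distance array.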

Discussion

Spatial relationships among cell types are critical determinants of tissue functions. In this study, we present CytoSpatio, open-source software that constructs generative multitype, multirange point process models to comprehensively learn spatial relationships among 5 cell types in 5 tissues. Our model is built upon a baseline multitype Strauss Hardcore model and incorporates multiple ranges of Strauss radii in a piece-wise manner, capturing both the sign and the strength of interactions among cell types at varying distances. We demonstrated that our model captures greater similarity of cell type spatial relationships between images from the same tissue than between images from different tissues (Figure 4A). Additionally, we provided a quantitative measurement of the spatial heterogeneity within a tissue, revealing the approximate size of heterogeneous structures in five tissues (Figure 5). To visualize the spatial relationships among cell types, we constructed interaction networks and discussed their similarities and differences across the 5 tissues (Figure 6). Furthermore, we showed that our model can generate synthetic tissue images that maintain spatial relationships among cell types similar to those in the original tissue images (Figure 7).

We demonstrated that our multirange, multitype model provides a reasonable approximation for capturing complex spatial relationships among cell types, achieving a balanced trade-off between computational complexity and the ability to learn spatial relationships. In our model, we assumed a maximum range of 500 pixels, or approximately 188 microns, as the distance within which two cells could affect each other. While this is a reasonable estimate, extending the range further could potentially provide better insight. Furthermore, there is room for refining our model’s interaction function, which currently exhibits an abrupt shift of influence every 100 pixels, or approximately 38 microns, due to the piece-wise step function (see Methods). The intervals of our current interaction function could benefit from optimization, and interaction functions with smooth transitions, such as the Softcore, Fiksel [38], Diggle-Gratton [39], and Diggle-Gates-Stibbard [40] functions, might also be worth exploring. In addition, models capturing higher-order interactions, such as the area-interaction [41] and Geyer saturation [42] models, in which the interaction functions are determined by the relationships among three or more points, may be valuable. Currently, the lack of software supporting multitype versions of these interaction functions limits their use, but future implementations could enhance the representation of interactions among cell types in different scenarios.

Recently, multiplexed tissue imaging technologies have been extended to high-resolution, three-dimensional images [43]. The addition of a third dimension significantly increases the complexity of spatial relationships among cell types and the challenges associated with modeling these relationships. Consequently, there is an urgent need for 3D multitype point process models, since building models on 2D slices or 2D-projections may not capture relationships accurately. We are currently extending our pipeline to model 3D cell type spatial relationships, aiming to deepen our understanding of their impact on tissue function in a 3D context.

Our study successfully depicted the spatial relationships among five cell types in five distinct tissues, most of them immune cell types. Rather than making the traditional assumption that these cell types (e.g., B cells, T cells, and their subtypes) are generally located near one another for close collaboration [44, 45], we quantitatively examined their attraction and repulsion tendencies across varying distances. For example, we found a strong preference against B cells and proliferating T cells being closer to each other than ~38 microns in spleen, thymus, small intestine, and large intestine tissues, but the opposite tendency at larger distances. Our approach not only can challenge existing qualitative perspectives on spatial relationships among immune cell types but also can provide valuable quantitative insights into how cell types assemble to form tissues.

CytoSpatio effectively simulated cell type locations, accurately reflecting their spatial relationships with one another. We are in the process of upgrading our simulation to include cell shape. To achieve this, we require a generative model capable of learning and simulating diverse cell shapes. In this regard, a robust version of spherical harmonic transform parameterization has been shown to be the most effective and accurate method for generating cell shapes [46]. This enhancement will enable us to construct a more comprehensive and detailed representation of tissue images.

Methods

Tissue images and cellular data

We used 110 images from the Human BioMolecular Atlas Program (HuBMAP) consortium [37] that had been acquired using the CO-Detection by indEXing (CODEX) [11] method. A summary of these images is provided in Table S1. They were produced by two Tissue Mapping Centers (TMCs): Stanford TMC produced images of the large and small intestine with 47 fluorescence channels (markers), and the University of Florida TMC produced images of the lymph node, thymus, and spleen with 11 fluorescence channels. Image sizes vary, ranging from approximately 5,000 to 15,000 pixels, with each pixel corresponding to a tissue region of 0.37745 × 0.37745 micrometers. The images share five common channels (CD11c, CD21, CD4, CD8, Ki67) across both TMCs. We downloaded files detailing the total intensities of the cell boundary, cytoplasm, nuclear boundary, and nucleus of each channel and the coordinates of cell centers from the HuBMAP portal (https://portal.hubmapconsortium.org/). These files were generated using SPRM (https://github.com/hubmapconsortium/sprm), based on cell segmentations created by Cytokit [47].

Assigning cell types

Different cell types typically express varying levels of specific cell marker proteins. For instance, proliferating T cells demonstrate high Ki67 levels and low levels of other markers, whereas cytotoxic T cells exhibit high CD8 levels. We defined cell types based only on the five common channels to ensure comparability across tissue types. This decision allows direct comparison of spatial relationships among cell types across various tissues in subsequent analyses.

To compensate for potential differences in channel intensities across tissues, such as those that might arise during image acquisition due to experimental variables like inconsistencies in staining procedures or tissue preparation, we initially z-scored total pixel intensities per cell for each channel within each tissue.
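Per-tissue z-scoring can be sketched as follows, assuming the per-cell total intensities are held in an array with one row per cell and one column per channel, together with a parallel array of tissue labels (both hypothetical layouts). Each channel is centered and scaled using only the cells of the same tissue; the population standard deviation is used here as an assumption.

```python
import numpy as np

def zscore_within_tissue(intensities, tissues):
    """Z-score each channel separately within each tissue.

    intensities: (n_cells, n_channels) total pixel intensity per cell and channel
    tissues:     (n_cells,) tissue label of each cell
    """
    intensities = np.asarray(intensities, dtype=float)
    tissues = np.asarray(tissues)
    z = np.empty_like(intensities)
    for tissue in np.unique(tissues):
        rows = tissues == tissue
        x = intensities[rows]
        sd = x.std(axis=0)
        sd[sd == 0] = 1.0  # leave constant channels centred but unscaled
        z[rows] = (x - x.mean(axis=0)) / sd
    return z

# Two hypothetical tissues with very different raw intensity scales
X = np.array([[1.0, 10.0], [3.0, 30.0], [100.0, 5.0], [300.0, 15.0]])
z = zscore_within_tissue(X, ["LN", "LN", "SP", "SP"])
```

After z-scoring, both tissues map onto the same scale even though their raw intensities differ by two orders of magnitude.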

For cell type assignment, we first performed KMeans clustering on the total pixel intensities per cell over the five z-scored common channels across all cells and images from the five tissues. Next, we calculated an overall similarity statistic $T$ based on Gaussian kernel similarity for intensity compositions of cells between 1) each pair of clusters from KMeans and 2) each cluster from KMeans and each annotated cell type from a lymph node image annotated by Cellar [16] (Figure S2). Using these results as features, we conducted another round of KMeans as meta-clustering to assign the clusters to the five cell types annotated by Cellar.

T = \frac{1}{m^{2}} \sum_{i = 1}^{m} \sum_{j = 1}^{m} K(X_{i}, X_{j}) - \frac{2}{m n} \sum_{i = 1}^{m} \sum_{j = 1}^{n} K(X_{i}, Y_{j}) + \frac{1}{n^{2}} \sum_{i = 1}^{n} \sum_{j = 1}^{n} K(Y_{i}, Y_{j})

K(x, y) = \exp\left(- \frac{\| x - y \|^{2}}{2 σ^{2}}\right)

where $T$ is the statistic measuring overall similarity between two cell types; lower $T$ indicates higher similarity. $m$ and $n$ are the numbers of cells in the two cell types, respectively. $X_{i}$ and $Y_{j}$ denote the intensity composition of the $i^{th}$ cell in cell type $X$ and the $j^{th}$ cell in cell type $Y$. $K$ is the Gaussian kernel similarity and $σ$ is the bandwidth of the kernel (we used $2 σ^{2} = 0.08$; this value was also used for other Gaussian kernel similarity measurements).
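The statistic $T$ has the form of the (biased) squared maximum mean discrepancy between the two sets of cells. A minimal NumPy sketch using the bandwidth $2 σ^{2} = 0.08$ given above; the function names and the random example profiles are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, two_sigma_sq=0.08):
    """K(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for every row pair of A and B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / two_sigma_sq)

def similarity_T(X, Y, two_sigma_sq=0.08):
    """T between cell sets X (m x p) and Y (n x p); lower means more similar."""
    m, n = len(X), len(Y)
    return (gaussian_kernel(X, X, two_sigma_sq).sum() / m**2
            - 2.0 * gaussian_kernel(X, Y, two_sigma_sq).sum() / (m * n)
            + gaussian_kernel(Y, Y, two_sigma_sq).sum() / n**2)

rng = np.random.default_rng(1)
cells = rng.random((30, 5))                  # hypothetical 5-channel profiles
T_same = similarity_T(cells, cells)          # identical sets
T_near = similarity_T(cells, cells + 0.05)   # slightly shifted profiles
T_far = similarity_T(cells, cells + 1.0)     # strongly shifted profiles
```

Identical sets give $T = 0$, and $T$ grows as the two sets of intensity profiles move apart, which is why lower $T$ is read as higher similarity.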

To determine the optimal number of clusters in the initial KMeans, we incrementally increased the number of clusters while monitoring the number of cells in each assigned cell type. We then selected the number of clusters that yielded the highest match between assigned cell types and their corresponding cell types from Cellar (Figure S3). We note that this approach enables the extrapolation of cell type determination from lymph nodes to other tissues, and it allows for finer distinctions within each cell type (i.e., the identification of potential cell subtypes).

For simplicity, all cells assigned to the type “lymphocytes of B lineage” are referred to throughout as simply “B cells.”

Point pattern and point process model

For each image across 5 tissues, we formed a point pattern $p = \{(x_{1}, c_{1}), \dots, (x_{i}, c_{i}), \dots, (x_{n}, c_{n})\}$ , where $x_{i}$ is a vector of 2-dimensional coordinates (i.e., cell center) for cell $i$ , $c_{i}$ is the cell type of cell $i$ and $n$ is the total number of cells in the image. The coordinates were defined separately in each image. The point patterns belonging to each tissue were considered as random realizations (instances) from a point process model. Our task was to define this point process model.

We assumed cells influence each other by both attraction and repulsion. Therefore, we chose to use the multitype Strauss Hardcore model [21], a kind of multitype Gibbs model, as our baseline model since it satisfies this assumption and can model all cell types at once. The model consists of an expression that allows estimation of the probability density $f (p)$ of a given point pattern given a set of model parameters (that is, the probability that a particular point pattern would have been observed given those parameters).

f (p) = α \prod_{i = 1}^{n} β_{c_{i}} (x_{i}) \prod_{i < j}^{n} γ_{c_{i}, c_{j}} (d (x_{i}, x_{j}))

where $f$ is the probability density of point pattern $p$, $α$ is a normalizing constant, $β_{c_{i}}$ is the intensity of cell type $c_{i}$ at point $x_{i}$, $n$ is the total number of cells in the pattern, $γ_{c_{i}, c_{j}}$ is the interaction function between cell types $c_{i}$ and $c_{j}$, and $d (x_{i}, x_{j})$ is the Euclidean distance between cells $x_{i}$ and $x_{j}$. From this we can also write an expression for the conditional intensity (probability) of finding a cell of cell type $c_{i}$ at location $x_{i}$ given the point pattern $p$:

λ((x_{i}, c_{i}) ∣ p) = β_{c_{i}}(x_{i}) \prod_{j = 1, j \neq i}^{n} γ_{c_{i}, c_{j}}(d(x_{i}, x_{j}))

which ignores any contribution from the actual type of that cell.

The interaction function encodes the spatial relationships between two cell types. In multitype Strauss Hardcore model, the interaction function is

γ_{c_{i}, c_{j}} (d (x_{i}, x_{j})) = \begin{cases} 0 & d < r_{h} \\ δ_{s} & r_{h} \leq d \leq r_{s} \\ 1 & d > r_{s} \end{cases}

where $r_{h}$ is the hardcore radius that stands for the minimum distance that two cells can be from each other, $r_{s}$ is the Strauss radius which represents the maximum distance over which cells can affect each other, and $δ_{s}$ is the interaction coefficient that captures whether two cells may have attraction $(δ_{s} > 1)$ or repulsion $(δ_{s} < 1)$ between each other.

One limitation of the conventional Strauss Hardcore model is that the influence between cells is uniform across a single range (the Strauss radius $r_{s}$), whereas the spatial relationship between two cell types may actually vary with distance. To address this, we propose a multirange, multitype model with an extended piece-wise interaction function [48]:

γ_{c_{i}, c_{j}} (d (x_{i}, x_{j})) = \begin{cases} 0 & d < r_{h} \\ δ_{s_{1}} & r_{h} \leq d < r_{s_{1}} \\ δ_{s_{2}} & r_{s_{1}} \leq d < r_{s_{2}} \\ \vdots & \vdots \\ δ_{s_{m}} & r_{s_{m - 1}} \leq d \leq r_{s_{m}} \\ 1 & d > r_{s_{m}} \end{cases}

where different interaction coefficients $δ_{s_{1}} \dots δ_{s_{m}}$ are assigned to each distance interval. For each pair of cell types, we have $δ_{c_{i}, c_{j}} = (δ_{s_{1}} \dots δ_{s_{m}})$ , which is the same for all interactions between cell type $c_{i}$ and $c_{j}$ , where $c_{i}, c_{j} \in C$ , and $C$ is the set of all cell types.
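To make the piece-wise interaction function and the conditional intensity concrete, the sketch below evaluates them for a toy pattern. It assumes homogeneous base intensities $β_{c}$, symmetric interaction coefficients stored in a dictionary keyed by type pairs, and 100-pixel bands up to 500 pixels; all parameter values and names are hypothetical. As in the conditional intensity above, the current type of cell $i$ is ignored and replaced by the candidate type.

```python
import math

def gamma_multirange(d, r_h, radii, deltas):
    """Piecewise interaction: 0 inside the hardcore, deltas[k] in band k, 1 beyond radii[-1]."""
    if d < r_h:
        return 0.0
    for r, delta in zip(radii[:-1], deltas[:-1]):
        if d < r:
            return delta
    return deltas[-1] if d <= radii[-1] else 1.0

def conditional_intensity(i, c, points, beta, r_h, radii, delta):
    """lambda((x_i, c) | p): base intensity of candidate type c times the
    interactions with every other cell j != i."""
    x_i = points[i][0]
    lam = beta[c]
    for j, (x_j, c_j) in enumerate(points):
        if j != i:
            lam *= gamma_multirange(math.dist(x_i, x_j), r_h, radii, delta[(c, c_j)])
    return lam

radii = [100, 200, 300, 400, 500]            # multirange Strauss radii in pixels
beta = {0: 0.01, 1: 0.02}                    # hypothetical base intensities
delta = {(0, 0): [1.0] * 5, (1, 1): [1.0] * 5,
         (0, 1): [0.5, 2.0, 1.0, 1.0, 1.0]}  # short-range repulsion, mid-range attraction
delta[(1, 0)] = delta[(0, 1)]                # interactions are symmetric
points = [((0, 0), 0), ((50, 0), 1), ((150, 0), 1), ((1000, 0), 0)]
lam_0 = conditional_intensity(0, 0, points, beta, 5, radii, delta)  # 0.01 * 0.5 * 2.0 * 1
lam_1 = conditional_intensity(0, 1, points, beta, 5, radii, delta)  # 0.02 * 1 * 1 * 1
```

For cell 0, the repulsion from the type-1 neighbour at 50 pixels and the attraction from the one at 150 pixels cancel, so `lam_0` equals the base intensity of type 0.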

Training the point process model

The standard method of fitting point process models to existing data uses maximum likelihood estimation (MLE). However, the normalizing constant $α$ in the probability density function $f$ is difficult to calculate or approximate [49]. As an alternative, we calculated the log pseudolikelihood:

\log PL(θ, x) = \sum_{i = 1}^{n} \log λ_{θ}((x_{i}, c_{i}) ∣ x) - \sum_{c} \int_{W} λ_{θ}((u, c) ∣ x) \, du

where $θ = (β, δ)$ is the set of coefficients to be estimated: $β = (β_{c_{i}})$, $c_{i} \in C$, is the first-order term, or intensity, of each cell type, and $δ = (δ_{c_{i}, c_{j}})$, $c_{i}, c_{j} \in C$, is the set of interaction coefficients between each pair of cell types. $W$ is the image window, and the integral is taken over all possible locations $u$ and all possible cell types $c$ within this window, given the point pattern $x$.

The difficulty in maximizing the pseudolikelihood is that it is computationally infeasible to integrate over every location within the image window. Therefore, we applied the Berman-Turner quadrature scheme [49, 50] to approximate the integral of the conditional intensity function. Each image was evenly split into subregions (tiles) of 20×20 pixels. At the center of each tile and at the four corners of the image, dummy cells were created for each cell type. At the location of each real cell, dummy cells were also created for all cell types except the type of the real cell. In this way, the integral was converted into a sum over real and dummy cells, weighted by the inverse of the local intensity of cells. The intensity of a cell was calculated as the ratio of the number of cells in its tile to the area of the tile; in other words, cells in the same tile have the same intensity. We therefore had the approximate log pseudolikelihood:

\log PL(θ, x) \approx \sum_{i = 1}^{n} \log λ_{θ}((x_{i}, c_{i}) ∣ x) - \sum_{c} \sum_{j = 1}^{n'} w_{j} λ_{θ}((x'_{j}, c'_{j}) ∣ x')

where $x'$ is the new point pattern generated by the quadrature scheme, which includes both real and dummy cells, $n'$ is the total number of real and dummy cells, and the weight $w_{j}$ is the area of a quadrature tile (20×20 pixels) divided by the number of cells in that tile.
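The quadrature construction can be sketched as below. It is a simplified illustration in the spirit of the Berman-Turner device, not the spatstat-based code used in this work: it assumes image dimensions that are multiples of the tile size, places one quadrature point of every type at each real cell, tile center, and image corner, and gives each point a weight equal to the tile area divided by the number of quadrature locations in its tile. Function and variable names are hypothetical.

```python
import numpy as np

def quadrature_scheme(coords, types, width, height, n_types, tile=20):
    """Counting-weight quadrature (sketch).

    Returns one row per (location, cell type): location, type, real-cell
    indicator and weight. Assumes width and height are multiples of tile.
    """
    assert width % tile == 0 and height % tile == 0
    nx, ny = width // tile, height // tile
    gx, gy = np.meshgrid(np.arange(nx) * tile + tile / 2, np.arange(ny) * tile + tile / 2)
    corners = np.array([[0, 0], [width, 0], [0, height], [width, height]], float)
    locs = np.vstack([coords, np.column_stack([gx.ravel(), gy.ravel()]), corners])
    # tile index of every location (points on the far edges go to the last tile)
    tx = np.minimum((locs[:, 0] // tile).astype(int), nx - 1)
    ty = np.minimum((locs[:, 1] // tile).astype(int), ny - 1)
    tile_id = ty * nx + tx
    counts = np.bincount(tile_id, minlength=nx * ny)
    # weight = tile area / points in the tile; identical for every type because
    # each location carries exactly one quadrature point of each type
    loc_w = tile * tile / counts[tile_id]
    q_locs = np.repeat(locs, n_types, axis=0)
    q_types = np.tile(np.arange(n_types), len(locs))
    q_w = np.repeat(loc_w, n_types)
    n_real = len(coords)
    is_real = np.zeros(len(q_locs), dtype=bool)
    is_real[: n_real * n_types] = q_types[: n_real * n_types] == np.repeat(types, n_types)
    return q_locs, q_types, is_real, q_w

rng = np.random.default_rng(2)
coords = rng.uniform([0, 0], [100, 60], size=(25, 2))  # toy 100 x 60 px window
types = rng.integers(0, 3, size=25)
q_locs, q_types, is_real, q_w = quadrature_scheme(coords, types, 100, 60, n_types=3)
y = is_real / q_w  # Berman-Turner response: 1/w for real cells, 0 for dummies
```

A useful sanity check is that, for every cell type, the weights sum to the window area, so the weighted sum is a valid approximation of the integral over $W$.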

We then performed maximum pseudolikelihood estimation using a generalized linear model (GLM). The first step was to construct a feature matrix for the GLM regression (see Table S#). For each point, we converted its cell type to a one-hot encoding and counted the number of neighboring cells within each of several distance ranges (the multirange Strauss radii). The label to predict was the local intensity $y_{i} = I_{i} / w_{i}$, where $I_{i}$ is an indicator that equals 1 if the current cell is real and 0 if it is a dummy [51, 52].

The whole training process was done by modifying the R package spatstat [53]. We created a new function for our multirange, multitype model.
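The GLM step can be sketched as a weighted Poisson regression with a log link, which is what the Berman-Turner device reduces the pseudolikelihood to. The sketch below is hypothetical and written in Python rather than as the authors' modified spatstat (R) code: `design_matrix` builds one-hot type columns plus one neighbour-count column per unordered type pair and distance band (ignoring the hardcore term, so neighbours closer than `radii[0]`, including the cell itself, are simply not counted), and `fit_weighted_poisson` maximizes $\sum_{i} w_{i} (y_{i} η_{i} - e^{η_{i}})$ by iteratively reweighted least squares. The fitted coefficients are on the log scale, so $\exp(\hat{θ})$ gives $β_{c}$ and the per-band $δ$.

```python
import numpy as np

def design_matrix(q_locs, q_types, cell_locs, cell_types, n_types, radii):
    """One-hot type columns (log beta_c), then one column per unordered type
    pair {a, b} and distance band r counting neighbours in that band (log delta)."""
    d = np.linalg.norm(q_locs[:, None, :] - cell_locs[None, :, :], axis=-1)
    band = np.digitize(d, radii) - 1  # -1: closer than radii[0]; R: beyond radii[-1]
    R = len(radii) - 1
    pairs = [(a, b) for a in range(n_types) for b in range(a, n_types)]
    X = np.zeros((len(q_locs), n_types + len(pairs) * R))
    X[np.arange(len(q_locs)), q_types] = 1.0
    for p, (a, b) in enumerate(pairs):
        for r in range(R):
            in_band = band == r
            n_a = (in_band & (cell_types == a)).sum(axis=1)
            n_b = (in_band & (cell_types == b)).sum(axis=1)
            stat = np.where(q_types == a, n_b, 0)
            if a != b:
                stat = stat + np.where(q_types == b, n_a, 0)
            X[:, n_types + p * R + r] = stat
    return X

def fit_weighted_poisson(X, y, w, n_iter=100, tol=1e-10):
    """Maximize sum_i w_i * (y_i * eta_i - exp(eta_i)), eta = X @ theta, by IRLS."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ theta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu   # working response
        XtW = X.T * (w * mu)      # X^T diag(w * mu)
        new_theta = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(new_theta - theta)) < tol:
            return new_theta
        theta = new_theta
    return theta

# Tiny design matrix: two real cells 30 px apart, bands [1, 50) and [50, 100)
cell_locs = np.array([[0.0, 0.0], [30.0, 0.0]])
cell_types = np.array([0, 1])
X_small = design_matrix(cell_locs, cell_types, cell_locs, cell_types,
                        n_types=2, radii=[1, 50, 100])

# Check the IRLS fitter on simulated Poisson data with known coefficients
rng = np.random.default_rng(3)
Xs = np.column_stack([np.ones(5000), rng.normal(size=5000)])
theta_true = np.array([0.5, 0.3])
ys = rng.poisson(np.exp(Xs @ theta_true)).astype(float)
theta_hat = fit_weighted_poisson(Xs, ys, np.ones(5000))
```

In a real fit, `y` would be the real-cell indicator divided by the quadrature weights and `w` the weights themselves; columns whose counts are all zero must be dropped first, otherwise the normal equations are singular.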

Error metric of point process model

Pseudolikelihood can appropriately be used to compare different models trained on the same point pattern. However, pseudolikelihoods for models trained on different patterns are not comparable since those patterns may contain different numbers of cells.

To obtain an error metric that is independent of the training data size, we rewrite the pseudolikelihood as:

\log PL(θ; x) = - \frac{D}{2} + g

where $g$ is a constant and therefore irrelevant in pseudolikelihood comparison. $D$ is the deviance that can be written as:

D = 2 \left( \log PL_{S}(y) - \log PL(\hat{θ}) \right) = 2 \sum_{i = 1}^{n'} w_{i} \left( y_{i} \log \frac{y_{i}}{μ_{i}} - (y_{i} - μ_{i}) \right), \quad μ = \exp(η), \quad η = \hat{θ}^{T} X

where $\log PL_{S}(y)$ is the log pseudolikelihood of a “saturated” model that has one parameter for each cell and therefore fits the data perfectly, $\log PL(\hat{θ})$ is the log pseudolikelihood of the model under estimation, $w_{i}$ is the weight for cell $i$ (defined as in the approximate log pseudolikelihood), $y_{i} = I_{i} / w_{i}$ is the true label, and $μ_{i}$ is the predicted label for cell $i$ in the GLM (with $y_{i} \log (y_{i} / μ_{i})$ taken as 0 when $y_{i} = 0$). $X$ is the input feature matrix, and $\hat{θ}$ is the vector of all base intensity coefficients and interaction coefficients to be estimated. We assumed the model belongs to the exponential family (the deviance above is the Poisson deviance) and therefore used the log link, whose inverse is the exponential relating the linear predictor $η$ to the predicted label $μ$.

To account for the influence of data size, we normalized deviance $D$ by dividing it by the cell number $n$ , yielding the average deviance per cell as our error metric. We interpreted this metric as the average difference between the observed local intensity for each cell and its predicted intensity from a trained model. This metric is particularly sensitive to the value of $η$ . An increase in $η$ would exponentially elevate $μ$ , leading to a significantly higher average deviance per cell, as exemplified in Figure 2.
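The normalized deviance can be computed directly from the GLM quantities. A minimal sketch, assuming arrays of labels $y_{i}$, fitted values $μ_{i}$, and weights $w_{i}$ over all real and dummy cells, with $y_{i} \log (y_{i} / μ_{i})$ taken as 0 for dummy cells ($y_{i} = 0$); the function name and the toy numbers are hypothetical.

```python
import numpy as np

def average_deviance_per_cell(y, mu, w, n_cells):
    """Weighted Poisson deviance D divided by the number of real cells."""
    y, mu, w = (np.asarray(a, dtype=float) for a in (y, mu, w))
    with np.errstate(divide="ignore", invalid="ignore"):
        term = np.where(y > 0, y * np.log(y / mu), 0.0)  # 0 * log 0 := 0
    D = 2.0 * np.sum(w * (term - (y - mu)))
    return D / n_cells

# One real cell fitted exactly, one dummy cell with predicted intensity 0.5
example = average_deviance_per_cell([2.0, 0.0], [2.0, 0.5], [0.5, 1.0], n_cells=1)
# Raising eta (and therefore mu) at the dummy cell inflates the deviance
example_high = average_deviance_per_cell([2.0, 0.0], [2.0, 1.5], [0.5, 1.0], n_cells=1)
```

Here `example` is 1.0 and `example_high` is 3.0, illustrating how the metric grows with the predicted intensity at dummy cells.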

Leave-one-out cross-validation

To prevent overfitting when comparing point process models trained on different tissues, we conducted a leave-one-out cross-validation for each tissue. In this process, we sequentially excluded one image from the current tissue’s training set, fit the model to the remaining images, and predicted the average deviance per cell for the left-out image. As a result, the number of models for each tissue equaled the number of images. In the subsequent analyses, we used them as an ensemble representation of their respective tissues.

Assessing cell type prediction accuracy

We utilized the Receiver Operating Characteristic (ROC) curve, which is derived from the false positive rate and the true positive rate, to measure the accuracy of cell type prediction. Given that we have five cell types, we need a multi-class ROC; for this, a prediction for one cell type was considered true only if it matched the corresponding cell type and false otherwise.

To calculate overall prediction accuracy, we employed several techniques. First, we calculated the Micro AUC, which considered each cell (independent of its actual type) and counted whether it was correctly predicted. However, a potential issue with Micro AUC arises when class imbalance exists. If a majority of the predictions are biased towards the majority class, Micro AUC could be misleadingly high. This is because the true positive rate and false positive rate in Micro AUC are derived from aggregating predictions across all classes. Consequently, strong performance on the majority class can significantly overshadow any poor performance on the minority classes.

We also computed the Macro AUC to evaluate each cell type independently. This method computes the AUC separately for each class and then averages them, giving equal weight to each class. However, Macro AUC can also be less representative of the model’s overall performance when the class frequencies are different. If a model performs well on a minority class but poorly on a majority class, the Macro AUC might still appear reasonably high despite the model’s overall lower performance on most instances.

We therefore adopted the Weighted Macro AUC (wmAUC) to address this class imbalance issue. Like the Macro AUC, this approach evaluates each cell type independently, but it counters class imbalances by weighting the AUC of each cell type according to its fraction within the total number of cells. Thus, if certain cell types are more common in the dataset, they are assigned more importance in the overall score calculation. Given its effective solution to class imbalance, we chose to use this metric to evaluate the prediction accuracy of cell types.
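The weighted macro AUC can be sketched with a rank-based one-vs-rest AUC for each cell type, weighted by that type's fraction of cells. This illustration assumes the model outputs a score or probability for every candidate type at every cell; the helper names are hypothetical. For normalized probabilities in which every type is present, scikit-learn's `roc_auc_score` with `multi_class="ovr"` and `average="weighted"` should give the same value.

```python
import numpy as np

def binary_auc(scores, positive):
    """Probability that a random positive scores above a random negative (ties count 1/2)."""
    scores = np.asarray(scores, dtype=float)
    positive = np.asarray(positive, dtype=bool)
    pos, neg = scores[positive], scores[~positive]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

def weighted_macro_auc(prob, true_types):
    """One-vs-rest AUC per cell type, averaged with weights = fraction of cells of that type."""
    prob = np.asarray(prob, dtype=float)
    true_types = np.asarray(true_types)
    aucs, weights = [], []
    for c in range(prob.shape[1]):
        is_c = true_types == c
        if is_c.all() or not is_c.any():
            continue  # AUC is undefined without both positives and negatives
        aucs.append(binary_auc(prob[:, c], is_c))
        weights.append(is_c.mean())
    return float(np.average(aucs, weights=weights))

# Four cells, two types; each type is ranked correctly in 3 of 4 positive/negative pairs
prob = np.array([[0.9, 0.1], [0.4, 0.6], [0.6, 0.4], [0.1, 0.9]])
wm_auc = weighted_macro_auc(prob, [0, 0, 1, 1])
```

The pairwise comparison uses memory proportional to the product of positive and negative counts, so for whole images a sort-based AUC would be preferable.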

Supplementary Material

Supplement 1


Acknowledgments

We thank Matthew Ruffalo for helpful discussions. This work was supported in part by grants OT2 OD026682 and OT2 OD033761 from the National Institutes of Health Common Fund, and by a traineeship to HC under training grant T32 EB009403.

Footnotes

Declaration of Interests

All authors declare no competing interests.

Data and code availability

CytoSpatio software is available at https://github.com/murphygroup/CytoSpatio.
All data used for this work are available as a reproducible research archive (https://github.com/murphygroup/ChenMurphyCytoSpatioRRA).
Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

References

1. Alford P.W., et al., Vascular smooth muscle contractility depends on cell shape. Integrative Biology, 2011. 3(11): p. 1063–1070.
2. Luxenburg C. and Zaidel-Bar R., From cell shape to cell fate via the cytoskeleton—Insights from the epidermis. Experimental cell research, 2019. 378(2): p. 232–237.
3. Smith U., Effect of cell size on lipid synthesis by human adipose tissue in vitro. Journal of lipid research, 1971. 12(1): p. 65–70.
4. Gaylor D., Prakah-Asante K., and Lee R.C., Significance of cell size and tissue structure in electrical trauma. Journal of theoretical biology, 1988. 133(2): p. 223–237.
5. Schaefer M.H. and Serrano L., Cell type-specific properties and environment shape tissue specificity of cancer genes. Scientific reports, 2016. 6(1): p. 20707.
6. Jensen U.B., Lowell S., and Watt F.M., The spatial relationship between stem cells and their progeny in the basal layer of human epidermis: a new view based on whole-mount labelling and lineage analysis. Development, 1999. 126(11): p. 2409–2418.
7. Apps J.R., et al., Imaging Invasion: Micro-CT imaging of adamantinomatous craniopharyngioma highlights cell type specific spatial relationships of tissue invasion. Acta neuropathologica communications, 2016. 4: p. 1–4.
8. Geiger B., Rosen D., and Berke G., Spatial relationships of microtubule-organizing centers and the contact area of cytotoxic T lymphocytes and target cells. The Journal of cell biology, 1982. 95(1): p. 137–143.
9. Eglen S.J., et al., Analysis of spatial relationships in three dimensions: tools for the study of nerve cell patterning. BMC neuroscience, 2008. 9(1): p. 1–7.
10. Gerdes M.J., et al., Highly multiplexed single-cell analysis of formalin-fixed, paraffin-embedded cancer tissue. Proc Natl Acad Sci U S A, 2013. 110(29): p. 11982–7.
11. Goltsev Y., et al., Deep Profiling of Mouse Splenic Architecture with CODEX Multiplexed Imaging. Cell, 2018. 174(4): p. 968–981 e15.
12. Angelo M., et al., Multiplexed ion beam imaging of human breast tumors. Nat Med, 2014. 20(4): p. 436–42.
13. Chang Q., et al., Imaging Mass Cytometry. Cytometry A, 2017. 91(2): p. 160–169.
14. Chen K.H., et al., Spatially resolved, highly multiplexed RNA profiling in single cells. Science, 2015. 348(6233): p. aaa6090.
15. Hickey J.W., et al., Strategies for accurate cell type identification in CODEX multiplexed imaging data. Frontiers in Immunology, 2021. 12: p. 727626.
16. Hasanaj E., et al., Interactive single-cell data analysis using Cellar. Nature communications, 2022. 13(1): p. 1998.
17. Liu B., Li Y., and Zhang L., Analysis and visualization of spatial transcriptomic data. Frontiers in Genetics, 2022. 12: p. 2852.
18. Behanova A., Klemm A., and Wählby C., Spatial statistics for understanding tissue organization. Frontiers in Physiology, 2022: p. 37.
19. Stoltzfus C.R., et al., CytoMAP: a spatial analysis toolbox reveals features of myeloid cell organization in lymphoid tissues. Cell reports, 2020. 31(3): p. 107523.
20. Bhate S.S., et al., Tissue schematics map the specialization of immune tissue motifs and their appropriation by tumors. Cell Systems, 2022. 13(2): p. 109–130. e6.
21. Baddeley A., Bárány I., and Schneider R., Spatial point processes and their applications. Stochastic Geometry: Lectures Given at the CIME Summer School Held in Martina Franca, Italy, September 13–18, 2004, 2007: p. 1–75.
22. Rodriguez-Iturbe I., Cox D.R., and Isham V., A point process model for rainfall: further developments. Proceedings of the Royal Society of London. A. Mathematical and Physical Sciences, 1988. 417(1853): p. 283–298.
23. Law R., et al., Ecological information from spatial patterns of plants: insights from point process theory. Journal of Ecology, 2009. 97(4): p. 616–628.
24. Mohler G.O., et al., Self-exciting point process modeling of crime. Journal of the american statistical association, 2011. 106(493): p. 100–108.
25. Amburgey T.L., Multivariate point process models in social research. Social Science Research, 1986. 15(2): p. 190–207.
26. Johnson G.R., et al., Automated learning of subcellular variation among punctate protein patterns and a generative model of their relation to microtubules. PLoS computational biology, 2015. 11(12): p. e1004614.
27. Li Y., et al., Point process models for localization and interdependence of punctate cellular structures. Cytometry Part A, 2016. 89(7): p. 633–643.
28. Majarian T.D., Murphy R.F., and Lakdawala S.S., Learning the sequence of influenza A genome assembly during viral replication using point process models and fluorescence in situ hybridization. Plos Computational Biology, 2019. 15(1): p. e1006199.
29. Jones-Todd C.M., et al., Identifying prognostic structural features in tissue sections of colon cancer patients using point pattern analysis. Statistics in medicine, 2019. 38(8): p. 1421–1441.
30. Perrin G., Descombes X., and Zerubia J., A marked point process model for tree crown extraction in plantations. in IEEE International Conference on Image Processing 2005. 2005. IEEE.
31. Mohler G., Marked point process hotspot maps for homicide and gun crime prediction in Chicago. International Journal of Forecasting, 2014. 30(3): p. 491–497.
32. Baddeley A., Jammalamadaka A., and Nair G., Multitype point process analysis of spines on the dendrite network of a neuron. Journal of the Royal Statistical Society Series C: Applied Statistics, 2014. 63(5): p. 673–694.
33. Isham V., Multitype Markov point processes: some approximations. Proceedings of the Royal Society of London. A. Mathematical and Physical Sciences, 1984. 391(1800): p. 39–53.
34. Summers H.D., Wills J.W., and Rees P., Spatial statistics is a comprehensive tool for quantifying cell neighbor relationships and biological processes via tissue image analysis. Cell Reports Methods, 2022. 2(11): p. 100348.
35. Chervoneva I., et al., Quantification of spatial tumor heterogeneity in immunohistochemistry staining images. Bioinformatics, 2021. 37(10): p. 1452–1460.
36. Helmuth J.A., Paul G., and Sbalzarini I.F., Beyond co-localization: inferring spatial interactions between sub-cellular structures from microscopy images. BMC bioinformatics, 2010. 11: p. 1–12.
37. HuBMAP Consortium, The human body at cellular resolution: the NIH Human Biomolecular Atlas Program. Nature, 2019. 574(7777): p. 187–192.
38. Diggle P.J., et al., On parameter estimation for pairwise interaction point processes. International Statistical Review/Revue Internationale de Statistique, 1994: p. 99–117.
39. Diggle P.J. and Gratton R.J., Monte Carlo methods of inference for implicit statistical models. Journal of the Royal Statistical Society: Series B (Methodological), 1984. 46(2): p. 193–212.
40. Diggle P.J., Gates D.J., and Stibbard A., A nonparametric estimator for pairwise-interaction point processes. Biometrika, 1987. 74(4): p. 763–770.
41. Baddeley A.J. and Van Lieshout M., Area-interaction point processes. Annals of the Institute of Statistical Mathematics, 1995. 47: p. 601–619.
42. Geyer C., Likelihood inference for spatial point processes, in Stochastic Geometry. 2019, Routledge. p. 79–140.
43. Kuett L., et al., Three-dimensional imaging mass cytometry for highly multiplexed molecular and cellular mapping of tissues and the tumor microenvironment. Nature Cancer, 2022. 3(1): p. 122–133.
44. Petersone L., et al., T cell/B cell collaboration and autoimmunity: an intimate relationship. Frontiers in Immunology, 2018. 9: p. 1941.
45. Allen C.D., et al., Imaging of germinal center selection events during affinity maturation. Science, 2007. 315(5811): p. 528–531.
46. Ruan X. and Murphy R.F., Evaluation of methods for generative modeling of cell and nuclear shape. Bioinformatics, 2019. 35(14): p. 2475–2485.
47. Czech E., et al., Cytokit: a single-cell analysis toolkit for high dimensional fluorescent microscopy imaging. BMC Bioinformatics, 2019. 20(1): p. 1–13.
48. Diggle P.J., Statistical analysis of spatial and spatio-temporal point patterns. 2013: CRC Press.
49. Baddeley A., Rubak E., and Turner R., Spatial point patterns: methodology and applications with R. 2015: CRC Press.
50. Berman M. and Turner T.R., Approximating point process likelihoods with GLIM. Journal of the Royal Statistical Society: Series C (Applied Statistics), 1992. 41(1): p. 31–38.
51. Goulard M., Särkkä A., and Grabarnik P., Parameter estimation for marked Gibbs point processes through the maximum pseudo-likelihood method. Scandinavian Journal of Statistics, 1996: p. 365–379.
52. Baddeley A. and Turner R., Practical maximum pseudolikelihood for spatial point patterns (with discussion). Australian & New Zealand Journal of Statistics, 2000. 42(3): p. 283–322.
53. Baddeley A. and Turner R., Spatstat: an R package for analyzing spatial point patterns. Journal of Statistical Software, 2005. 12: p. 1–42.

Associated Data

Supplementary Materials

Supplement 1

media-1.pdf (1.1 MB, PDF)

Data Availability Statement

CytoSpatio software is available at https://github.com/murphygroup/CytoSpatio.
All data used for this work are available as a reproducible research archive (https://github.com/murphygroup/ChenMurphyCytoSpatioRRA).
Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
