Abstract
Summary
Multiplex imaging platforms have become popular for studying complex single-cell biology in the tumor microenvironment (TME) of cancer subjects. Studying the intensity of the proteins that regulate important cell functions is crucial for subject-specific risk assessment. The conventional approach requires selecting two thresholds: one to define the cells of the TME as positive or negative for a particular protein, and the other to classify the subjects based on the proportion of positive cells. We present a threshold-free approach in which the distance between a pair of subjects is computed from the probability density of the protein in their TMEs. The distance matrix can either be used to classify the subjects into meaningful groups or be used directly in a kernel machine regression framework for testing association with clinical outcomes. The method removes the subjectivity bias of the thresholding-based approach while keeping the analysis simple and interpretable. We analyze a lung cancer dataset, finding the difference in the density of the protein HLA-DR to be significantly associated with overall survival, and a triple-negative breast cancer dataset, in which we study the effects of multiple proteins on survival and recurrence. The reliability of our method is demonstrated through extensive simulation studies.
Availability and implementation
The associated R package can be found here, https://github.com/sealx017/DenVar.
Supplementary information
Supplementary data are available at Bioinformatics Advances online.
1 Introduction
In recent years, various technologies have been used for probing single-cell spatial biology, for example, multiparameter immunofluorescence (Bataille et al., 2006), imaging mass cytometry (Ali et al., 2020), multiplex immunohistochemistry (mIHC) (Tan et al., 2020; Vu et al., 2021) and multiplexed ion beam imaging (MIBI) (Angelo et al., 2014; Seal et al., 2021). These technologies, often referred to as multiplex tissue imaging, offer the potential for researchers to explore the basis of many different biological mechanisms. Multiplex tissue imaging platforms such as Vectra 3.0 (Akoya Biosciences) (Huang et al., 2013), Vectra Polaris (Akoya Biosciences) (Pollan et al., 2020) and MIBI (Ionpath Inc.) (Angelo et al., 2014) produce images with similar structure. In particular, each image is two-dimensional and collected at cell- and nucleus-level resolution, and proteins in the sample are labeled with antibodies that attach to cell membranes. We will refer to the antibodies as markers in the paper. Typically, mIHC images have 6–8 markers, whereas MIBI images can have more than 40 markers.
The majority of the above markers are surface or phenotypic markers (Shipkova and Wieland, 2012), which are primarily used for cell type identification. Additionally, there are several functional markers including HLA-DR (Saraiva et al., 2018), PD1, PD-L1 and Lag3 (Phillips et al., 2021) that dictate or regulate important cell functions. Both surface and functional markers are quantified as continuous-valued marker intensities. For a phenotypic marker, a threshold is drawn to indicate whether a cell is positive or negative for the particular marker. Then one or more of these binarized phenotypic markers are used to classify the cells into different types based on biological knowledge of the marker expression pattern. With functional markers, the interest lies in finding out whether over-expression of the markers across the cells of the tumor microenvironment (TME) (Binnewies et al., 2018) has a significant impact on subject-level clinical outcomes, such as survival or recurrence (Johnson et al., 2020; Koguchi et al., 2015). A two-step thresholding-based approach (e.g. Bulian et al., 2014; Costa et al., 2017) with one marker at a time is typically used in this context, which we describe next.
The two steps in the thresholding-based approach involve identifying cells positive for a functional marker and classifying subjects into different groups according to the proportions of positive cells. The group labels can be used in a linear regression framework to test association with the outcomes of interest (Chang et al., 2018; Yang et al., 2019). For example, Johnson et al. (2021) define the cells to be positive for HLA-DR (also known as, MHCII) if the corresponding mean marker intensity is >0.05. Next, they classify the subjects into two groups, MHCII: High and MHCII: Low if the proportion of cancer cells positive for HLA-DR is greater or smaller than 5%, respectively. Finally, they test if these two groups of subjects have different 5-year overall survival. Instead of grouping the subjects based on the proportion of positive cells, another approach is to directly test if the vector of the proportion of positive cells is associated with the outcome (Patwa et al., 2021).
The aforementioned thresholding-based method clearly requires judicious selection of the cutoffs that greatly influence the subsequent steps of the analysis (Harris et al., 2022). The result is bound to vary for different thresholding values; and a poor choice of thresholds may produce an uninformative and uninterpretable result. There is a plethora of helpful guidelines for choosing these thresholds in different contexts (Cossarizza et al., 2019; Kimball et al., 2018). However, there is no universal solution or rule of thumb. Thus, the method remains prone to subjectivity bias and lacks robustness. In addition, discarding important marker information by binarizing them (Altman and Royston, 2006) can be critical in capturing subtle differences between the subjects and thus, result in a loss of power and robustness (refer to the Supplementary Figs S3 and S4).
In this article, we propose a threshold-free method for distinguishing the difference between the subjects with respect to the distribution of a functional marker in the TME. We treat the expression of a marker as a continuous random variable having realizations in different cells of the TME of a subject. Then, we compare the marker’s probability distribution or equivalently, probability density across all the subjects. Our algorithm is as follows. First, for every subject, the probability density of the marker is estimated using kernel density estimation (KDE) (Silverman, 2018). Next, a density based distance (Basu et al., 1998) known as Jensen–Shannon distance (JSD) (Endres and Schindelin, 2003; Nielsen, 2019) is used to quantify the difference in the estimated density between the pairs of subjects. The matrix of distances between the subjects can then be used to classify them into meaningful groups using hierarchical clustering (Murtagh and Legendre, 2014), and the group-labels can be tested for association with clinical outcomes in a linear regression framework. Alternatively, the distance matrix can also be used directly in a linear mixed model (Hoffman, 2013; Seal et al., 2022) or equivalently, a kernel machine regression framework (Jensen et al., 2019; Liu et al., 2008) to test for association with clinical outcomes.
Using our proposed method, we analyzed an mIHC dataset on lung cancer (Johnson et al., 2021) from the University of Colorado School of Medicine, finding that the difference in HLA-DR marker density in tumor cells is associated with 5-year overall survival probability of subjects. We have also applied the proposed method on a publicly available triple negative breast cancer (TNBC) dataset (Keren et al., 2018) from the MIBI platform finding the density of an immunoregulatory protein, PD1 to have significant effect on disease recurrence probability. We have performed extensive simulation studies mimicking the characteristics of the real datasets to check the power, reliability and robustness of our method.
2 Materials and methods
Suppose there are M functional markers and N subjects, with the j-th subject having $n_j$ cells. Let $X_{kij}$ denote the scaled expression, between 0 and 1, of marker k in the i-th cell of subject j for $k = 1, \ldots, M$, $i = 1, \ldots, n_j$ and $j = 1, \ldots, N$. Let Y (an $N \times 1$ vector) be a subject-level outcome of interest and C be an N × p matrix of p subject-level covariates. Next, we describe the traditional and proposed methods, considering one marker at a time.
2.1 Traditional thresholding-based approach for clustering subjects
To study whether the abundance of marker k is associated with a subject’s survival or any other outcome of interest (Y), the conventional approach is to classify the subjects into two or more groups using a thresholding-based approach. First, we choose a threshold $t_1$. Then we compute the number of cells of subject j whose expression is greater than $t_1$, i.e. the number of cells with $X_{kij} > t_1$. Such cells are referred to as the cells positive for marker k. The proportion of the cells positive for marker k in subject j is denoted as $\pi_{kj} = \frac{1}{n_j} \sum_{i=1}^{n_j} I(X_{kij} > t_1)$, where $I(\cdot)$ is the indicator function. The next threshold $t_2$ is chosen to classify the subjects into two groups, based on whether $\pi_{kj} > t_2$ or not. Then, we test for association between the group label and clinical outcomes. This can easily be extended to allow more than two groups.
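The two thresholding steps can be illustrated with a minimal Python sketch. It is not part of the associated R package; the function names, the simulated Beta-distributed expression values and the threshold values are assumptions made only for illustration.

```python
import random

def positive_proportion(expr, t1):
    """Proportion of cells whose scaled expression exceeds t1."""
    return sum(x > t1 for x in expr) / len(expr)

def threshold_groups(subjects, t1, t2):
    """Label a subject 1 if its proportion of positive cells exceeds t2, else 0."""
    return [int(positive_proportion(expr, t1) > t2) for expr in subjects]

random.seed(1)
# Hypothetical data: three subjects with low expression, three with a heavier right tail.
subjects = [[random.betavariate(2, 60) for _ in range(500)] for _ in range(3)]
subjects += [[random.betavariate(2, 15) for _ in range(500)] for _ in range(3)]

# Same data and same t1, two different choices of t2.
labels_half = threshold_groups(subjects, t1=0.05, t2=0.5)
labels_small = threshold_groups(subjects, t1=0.05, t2=0.05)
print(labels_half)
print(labels_small)
```

With these simulated data, the grouping changes from a clean split to a single group when only the second threshold is changed, which is the sensitivity the thresholding-based approach is prone to.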
Denote the clustering variable as $Z_k = (Z_{k1}, \ldots, Z_{kN})^T$. When Y is a continuous outcome, a standard multiple linear regression model with $Z_k$ as a predictor can be written as
$$Y = C\beta + Z_k\gamma + \epsilon,$$
where $\beta, \gamma$ are fixed effects and ϵ is an error vector following a multivariate normal distribution (MVN) with mean 0 and covariance matrix $\sigma^2 I_N$, $I_N$ being the identity matrix. After estimating the parameters, the null hypothesis, $H_0: \gamma = 0$, can be tested using the Wald test (Gourieroux et al., 1982).
Next, we consider the case of Y being a survival or recurrence outcome. Let the outcome of the j-th individual be $Y_j = \min(T_j, U_j)$, where $T_j$ is the time to event and $U_j$ is the censoring time. Let $\delta_j = I(T_j \le U_j)$ be the corresponding censoring indicator. Assuming that $T_j$ and $U_j$ are conditionally independent given the covariates for $j = 1, \ldots, N$, the hazard function for the Cox proportional hazards (PH) model (Andersen and Gill, 1982) with fixed effects $\beta, \gamma$ can be written as
$$\lambda_j(t \mid C_j, Z_{kj}) = \lambda_0(t) \exp(C_j^T\beta + Z_{kj}\gamma). \qquad (1)$$
In Equation 1, $\lambda_j(t \mid C_j, Z_{kj})$ is the hazard of the j-th subject at time t, given the vector of covariates $C_j$ and the cluster label $Z_{kj}$, and $\lambda_0(t)$ is an unspecified baseline hazard at time t. To test the null hypothesis, $H_0: \gamma = 0$, a likelihood ratio test (LRT) (Therneau, 1997) can be considered.
As pointed out earlier, the biggest difficulty with this approach lies in choosing the thresholds, t1 and t2 appropriately. In most cases, one would perform the analysis for different pairs of (t1, t2), and choose the result that aligns best with the biological mechanism of interest. Thus, the step of threshold-selection remains entirely subjective and the results are bound to vary largely depending on the selected thresholds.
2.2 Proposed method: distance-based clustering using marker probability density of subjects
To avoid the bias inherent in the thresholding-based approach, we propose a distance between the subjects based on each marker k that would be devoid of subjectivity and can easily be tested for association with an outcome of interest. First, we discuss the concept of divergence or distance between two probability distributions and then its implementation.
2.2.1 Jensen–Shannon distance
Let $(\mathcal{X}, \mathcal{A})$ be a measurable space (Billingsley, 2008) where $\mathcal{X}$ denotes the sample space and $\mathcal{A}$ the σ-algebra of measurable events. Consider a dominating measure μ and denote the set of probability distributions as $\mathcal{P}$. In this context, JSD (Endres and Schindelin, 2003; Nielsen, 2019) between two probability distributions, $P, Q \in \mathcal{P}$, can be defined as
$$JSD(P, Q) = \sqrt{\frac{1}{2} \int \left[ p \log\left(\frac{2p}{p + q}\right) + q \log\left(\frac{2q}{p + q}\right) \right] d\mu}, \qquad (2)$$
where p, q are the Radon–Nikodym derivatives or densities (Nikodym, 1930) of P and Q with respect to the dominating measure μ. Unlike other divergences between distributions, such as Kullback-Leibler divergence (van Erven and Harremos, 2014), the JSD satisfies the properties of being a metric (Lawvere, 1973) between probability measures. To formalize this, a metric needs to satisfy the following three axioms:
Identity: $JSD(P, Q) = 0$ iff P = Q,
Symmetry: $JSD(P, Q) = JSD(Q, P)$,
Triangle Inequality: $JSD(P, Q) \le JSD(P, R) + JSD(R, Q)$, where $R \in \mathcal{P}$.
Note that P = Q implies p = q almost everywhere w.r.t. μ (Athreya and Lahiri, 2006). JSD is bounded: it is at least 0 and at most a finite constant that depends on the normalization and the base of the logarithm ($\sqrt{\log 2}$ for the Jensen–Shannon divergence with weights 1/2 and the natural logarithm), and smaller values indicate more similar distributions. As JSD satisfies the metric properties, it can readily be used to construct a valid distance matrix between random variables (rv’s) having different probability distributions or densities. The distance matrix can then be used in subsequent analysis such as classifying the rv’s into meaningful groups. JSD has been used in many different areas, such as bioinformatics (Sims et al., 2009), social sciences (DeDeo et al., 2013), and more recently, in generative adversarial networks (Goodfellow et al., 2014), a popular technique in deep learning. Next, we discuss the formulation of JSD in our context.
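The metric properties can be checked numerically on a simple example. The Python sketch below is illustrative only: it assumes the Jensen–Shannon divergence with weights 1/2 and the natural logarithm, approximates the integral by a midpoint sum on [0, 1], and uses three arbitrarily chosen Beta densities; the function names are hypothetical.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1), computed through log-gamma."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def jsd(p, q, dx):
    """Jensen-Shannon distance between two densities tabulated on a common grid."""
    total = 0.0
    for pi, qi in zip(p, q):
        m = pi + qi
        if pi > 0:
            total += pi * math.log(2 * pi / m)
        if qi > 0:
            total += qi * math.log(2 * qi / m)
    return math.sqrt(max(0.5 * total * dx, 0.0))

R = 2000
dx = 1.0 / R
grid = [(r + 0.5) * dx for r in range(R)]  # midpoints avoid x = 0 and x = 1
P = [beta_pdf(x, 2, 5) for x in grid]
Q = [beta_pdf(x, 5, 2) for x in grid]
S = [beta_pdf(x, 3, 3) for x in grid]

d_pq, d_qp = jsd(P, Q, dx), jsd(Q, P, dx)
d_ps, d_sq = jsd(P, S, dx), jsd(S, Q, dx)
print("identity:", jsd(P, P, dx) == 0.0)
print("symmetry:", abs(d_pq - d_qp) < 1e-12)
print("triangle:", d_pq <= d_ps + d_sq)
print("bounded :", d_pq < math.sqrt(math.log(2)))
```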
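For every subject j, we assume that the expression of marker k is a continuous random variable, denoted by $X_{kj}$, taking values between 0 and 1. $X_{kj}$ is observed in $n_j$ cells as $X_{k1j}, X_{k2j}, \ldots, X_{kn_jj}$. Let the probability distribution function and the density function of $X_{kj}$ be denoted by $F_{kj}$ and $f_{kj}$, respectively. Next, we consider the set-up described above with $\mathcal{X} = [0, 1]$ and $\mathcal{A}$ being the corresponding σ-algebra of measurable events. Then the set $\mathcal{P}$ contains the distribution functions $F_{kj}$ for $j = 1, \ldots, N$ and $k = 1, \ldots, M$. Using Equation 2, the distance between two subjects j and j' in terms of the probability distribution of marker k can be quantified by
$$JSD_k(j, j') = \sqrt{\frac{1}{2} \int_0^1 \left[ f_{kj}(x) \log\left(\frac{2 f_{kj}(x)}{f_{kj}(x) + f_{kj'}(x)}\right) + f_{kj'}(x) \log\left(\frac{2 f_{kj'}(x)}{f_{kj}(x) + f_{kj'}(x)}\right) \right] dx}.$$
A large value of $JSD_k(j, j')$ will imply that there is a clear difference in the distribution or equivalently, density of the k-th marker between the pair of subjects $(j, j')$. A small value will imply that the distributions are close. The distance matrix between all the subjects based on the k-th marker can then be constructed as $JSD_k = [[JSD_k(j, j')]]_{N \times N}$.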
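In a real-data analysis, the density function $f_{kj}$ will be unknown. Therefore, we compute the corresponding KDE (Silverman, 2018), $\hat{f}_{kj}$, using the observations $X_{kij}$ for $i = 1, \ldots, n_j$. $\hat{f}_{kj}$ typically has the form $\hat{f}_{kj}(x) = \frac{1}{n_j} \sum_{i=1}^{n_j} w_h(x - X_{kij})$, where $w_h$ is a Gaussian kernel with bandwidth parameter h, chosen using Silverman’s rule of thumb (Silverman, 2018). Using the KDEs, $JSD_k(j, j')$ can be estimated as
$$\widehat{JSD}_k(j, j') = \sqrt{\frac{\Delta}{2} \sum_{r=1}^{R} \left[ \hat{f}_{kj}(x_r) \log\left(\frac{2 \hat{f}_{kj}(x_r)}{\hat{f}_{kj}(x_r) + \hat{f}_{kj'}(x_r)}\right) + \hat{f}_{kj'}(x_r) \log\left(\frac{2 \hat{f}_{kj'}(x_r)}{\hat{f}_{kj}(x_r) + \hat{f}_{kj'}(x_r)}\right) \right]}, \qquad (3)$$
where $x_1, \ldots, x_R$ are grid-points in the interval [0, 1] and Δ is the spacing between consecutive grid-points. In our simulations and real data analysis, the estimates did not change for sufficiently large values of R. We kept R at 1024 and chose equidistant grid-points. We made sure that the estimated densities integrate to 1 by appropriately scaling them.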
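The estimation step can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code of the associated R package: the bandwidth uses one common form of Silverman’s rule of thumb, the grid has R = 256 points instead of 1024 to keep the example fast, the simulated Beta-distributed cells are hypothetical, and the distance uses the natural logarithm with a Riemann sum over the grid.

```python
import math
import random
import statistics

def silverman_bandwidth(x):
    """One common form of Silverman's rule: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    q1, _, q3 = statistics.quantiles(x, n=4)
    return 0.9 * min(statistics.stdev(x), (q3 - q1) / 1.34) * len(x) ** (-0.2)

def kde_on_grid(x, grid, dx):
    """Gaussian KDE on the grid; the kernel constant is dropped because the
    values are rescaled so that the Riemann sum over the grid equals 1."""
    h = silverman_bandwidth(x)
    dens = [sum(math.exp(-0.5 * ((g - xi) / h) ** 2) for xi in x) for g in grid]
    area = sum(dens) * dx
    return [d / area for d in dens]

def jsd_hat(f, g, dx):
    """Grid-based estimate of the Jensen-Shannon distance between two densities."""
    total = 0.0
    for a, b in zip(f, g):
        if a > 0:
            total += a * math.log(2 * a / (a + b))
        if b > 0:
            total += b * math.log(2 * b / (a + b))
    return math.sqrt(max(0.5 * total * dx, 0.0))

random.seed(7)
R = 256
grid = [r / (R - 1) for r in range(R)]  # equidistant grid-points on [0, 1]
dx = 1 / (R - 1)

# Hypothetical subjects: A and B share a marker distribution, C differs.
cells_a = [random.betavariate(2, 20) for _ in range(300)]
cells_b = [random.betavariate(2, 20) for _ in range(300)]
cells_c = [random.betavariate(2, 8) for _ in range(300)]
f_a, f_b, f_c = (kde_on_grid(x, grid, dx) for x in (cells_a, cells_b, cells_c))

d_ab = jsd_hat(f_a, f_b, dx)
d_ac = jsd_hat(f_a, f_c, dx)
print(round(d_ab, 3), round(d_ac, 3))
```

In this toy setting the estimated distance between the two subjects drawn from the same distribution should be clearly smaller than the distance to the third subject.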
2.2.2 Using the distance in association analysis
Next, we construct suitable tests for testing the association of the distance matrix with dependent variable, Y.
Test based on hierarchical clustering: The estimated distance matrix ($\widehat{JSD}_k$) can be subjected to hierarchical clustering (Murtagh and Legendre, 2014) for classifying the subjects into two or more groups. Suppose we obtain a vector of cluster labels, $Z_k = (Z_{k1}, \ldots, Z_{kN})^T$. Then, exactly the same models described in Section 2.1, and the corresponding tests, can be used to determine if the differential expression of the k-th marker is associated with Y.
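To make the clustering step concrete, the following Python sketch groups subjects from a precomputed distance matrix. It uses average linkage purely for brevity, which is a swapped-in choice and not necessarily the linkage used in our analysis; the function name and the small distance matrix are hypothetical.

```python
def average_linkage(dist, n_clusters=2):
    """Agglomerative clustering on a precomputed distance matrix (average linkage)."""
    clusters = [[j] for j in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between the members of two clusters.
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for j in members:
            labels[j] = label
    return labels

# Hypothetical distance matrix for five subjects: {0, 1, 2} and {3, 4} are close.
D = [
    [0.00, 0.10, 0.12, 0.60, 0.65],
    [0.10, 0.00, 0.08, 0.62, 0.60],
    [0.12, 0.08, 0.00, 0.58, 0.61],
    [0.60, 0.62, 0.58, 0.00, 0.09],
    [0.65, 0.60, 0.61, 0.09, 0.00],
]
labels = average_linkage(D, n_clusters=2)
print(labels)  # [0, 0, 0, 1, 1]
```

The resulting labels play the role of the clustering variable in the regression and Cox models of Section 2.1.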
Test based on linear mixed model: The distance matrix can be transformed into a similarity matrix (Vert et al., 2004) as $G_k = \exp(-\widehat{JSD}_k)$, with the exponential applied elementwise. When Y is a continuous outcome, $G_k$ can be incorporated in a linear mixed model framework, particularly popular in the context of heritability estimation (Hoffman, 2013; Seal et al., 2022), as
$$Y = C\beta + g_k + \epsilon,$$
where β is the vector of fixed effects, $g_k = (g_{k1}, \ldots, g_{kN})^T$ is the vector of random effects following MVN($0, \sigma_g^2 G_k$) and ϵ is an error vector following MVN($0, \sigma^2 I_N$). The null hypothesis, $H_0: \sigma_g^2 = 0$, can be tested using an LRT (Crainiceanu and Ruppert, 2004). Note that such a linear mixed model setup has been shown to be equivalent to a kernel machine regression framework by Liu et al. (2008). In a standard kernel machine regression framework, there is one additional width parameter, ρ, that has to be estimated.
Next, we consider the case of Y being a survival or recurrence outcome. Using the same definitions and conditional independence assumptions of $T_j$, $U_j$ and covariates as in Section 2.1, the hazard function for the Cox PH model with random effects (Therneau et al., 2015) can be written as
$$\lambda_j(t \mid C_j, g_{kj}) = \lambda_0(t) \exp(C_j^T\beta + g_{kj}), \qquad (4)$$
where $\lambda_j(t \mid C_j, g_{kj})$ is the hazard of the j-th subject at time t, given the vector of covariates $C_j$ and the random effect $g_{kj}$, and $\lambda_0(t)$ is an unspecified baseline hazard at time t. To test the null hypothesis, $H_0: \sigma_g^2 = 0$, an LRT based on integrated partial likelihoods (Therneau et al., 2015) can be considered. However, it should be kept in mind that a large sample size is usually needed to obtain a precise estimate of the random effect variance (Bell et al., 2010; Maas and Hox, 2005). The problem would possibly be exacerbated in the Cox PH model with random effects because the partial likelihood depends on the number of events (Kocak and Onar-Thomas, 2012). Therefore, we do not recommend using this test unless the sample size is sufficiently large. We have summarized the workflow of our method in Figure 1.
Fig. 1.
A simple comparison of the workflow of the proposed method with the traditional thresholding-based method
2.3 Clustering based on marker quantiles
For comparison with the traditional and proposed methods, we also consider a simpler clustering algorithm based on the subject-specific quantiles of the marker intensity. For every subject j, we compute a few quantiles of the marker intensity and collect them in a vector. Next, the K-means clustering algorithm (Likas et al., 2003) is used to classify the subjects into different groups based on the vector of quantiles. Once we have the vector of cluster labels, $Z_k = (Z_{k1}, \ldots, Z_{kN})^T$, the hypothesis tests described in Section 2.1 can be used to determine if the differential expression of the k-th marker is associated with the dependent variable Y.
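A minimal Python sketch of this quantile-based clustering is given below. The quantile levels (90%, 95% and 99%), the deterministic initialization of K-means and the simulated data are assumptions chosen only for illustration.

```python
import random

def upper_quantiles(x, probs=(0.90, 0.95, 0.99)):
    """Empirical quantiles of the marker intensity at the given (assumed) levels."""
    s = sorted(x)
    return [s[min(int(p * len(s)), len(s) - 1)] for p in probs]

def sqdist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def two_means(points, n_iter=25):
    """Lloyd's K-means with K = 2, started from the lexicographically
    smallest and largest points so that the result is deterministic."""
    centers = [min(points), max(points)]
    labels = []
    for _ in range(n_iter):
        labels = [int(sqdist(p, centers[1]) < sqdist(p, centers[0])) for p in points]
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

random.seed(3)
# Hypothetical subjects: four with a light right tail, four with a heavier one.
subjects = [[random.betavariate(2, 20) for _ in range(400)] for _ in range(4)]
subjects += [[random.betavariate(2, 8) for _ in range(400)] for _ in range(4)]

features = [upper_quantiles(x) for x in subjects]
labels = two_means(features)
print(labels)
```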
Instead of considering the entire marker density, here we check how well only a few quantiles of the marker distribution can capture the differences across subjects. The method can be interpreted as a less general and weaker version of the proposed JSD-based clustering. In real data, as we will see in the next section, the difference between the tails of the estimated distributions seems to be apparent and thus, the method can be expected to perform moderately well. However, choosing how many and which quantiles to use remains subjective and depends on careful inspection of the estimated distributions. We evaluate the performance of the method only in the simulation studies.
3 Real-data analysis
We first discuss the application of our method on the real datasets. We analyzed two datasets: an mIHC lung cancer dataset (Johnson et al., 2021) and a MIBI breast cancer dataset (Keren et al., 2018). The first dataset has a single functional marker, HLA-DR and the second dataset has four immunoregulatory proteins, PD1, PD-L1, Lag3 and IDO. We applied our proposed method on both the datasets. In all the analyses, the markers were scaled to have expression value between 0 and 1.
3.1 Application to mIHC lung cancer data
In the mIHC lung cancer dataset, there are 153 subjects each with 3–5 non-overlapping images (in total, 761 images). The subjects have varying numbers of cells identified (from 3755 to 16949). The cells come from two different tissue regions: tumor and stroma. The cells are pre-classified into one of six different cell types: CD14+, CD19+, CD4+, CD8+, CK+ and Other, based on the expression of phenotypic markers, CD19, CD3, CK, CD8 and CD14. A functional marker, HLA-DR (also known as MHCII), is also measured in each of the cells. Using the thresholding-based approach described in Section 2.1, Johnson et al. (2021) classified the subjects into two groups, (i) MHCII: High and (ii) MHCII: Low based on the proportion of CK+ tumor cells that are also positive for HLA-DR. They discovered that there is a significant difference in 5-year overall survival between the groups. Analogously, we were interested in answering the question of whether 5-year overall survival of a subject was associated with the HLA-DR density in CK+ tumor cells. We first computed the JSD matrix between the subjects as discussed in Section 2.2.1 based on the density of the HLA-DR marker in CK+ tumor cells. Next, we performed hierarchical clustering using the computed JSD matrix to classify the subjects into two groups. Finally, we tested if there was a difference in survival probability between the subjects of the two groups using the test based on the Cox PH model described in Section 2.2.2 (and Equation 1). Figure 2 shows the Kaplan–Meier curves (Efron, 1988) of the two groups of subjects. We noticed that the hazard ratio (HR) was large (>2) and the P-value was significant (<0.015), indicating that 5-year overall survival probability was associated with the probability density of HLA-DR in CK+ tumor cells.
Fig. 2.
Kaplan–Meier curves for 5-year overall survival probability of 153 subjects from the lung cancer dataset, color coded by the clusters found comparing HLA-DR marker density in CK+ tumor cells. Also, displayed are the hazard ratio (HR) and the P-value corresponding to the test, from Equation 1. Notice that HR is large (> 2) and the P-value is significant as well indicating that the two clusters have significant difference in survival probability
Next, we checked the degree of concordance between Johnson et al. (2021)’s classification and the classification based on our method and summarized it in Table 1. The accompanying values of Rand index (RI) and adjusted Rand index (ARI) were 0.64 and 0.29, respectively, indicating that the classifications moderately agreed with each other. We investigated how the estimated HLA-DR density profiles varied across the two clusters found by our method and also across the groups identified by Johnson et al. (2021)’s traditional classification. From Figure 3, we noticed that the individual densities from Cluster 1 were more right-skewed compared to those from Cluster 2, which led to the mean density of Cluster 1 having a very high mode compared to that of Cluster 2. Some of the subjects from the MHCII: High group actually had density functions similar to the mean density of the MHCII: Low group, meaning that the thresholding-based method was incapable of fully capturing the differences between the density profiles.
Table 1.
Number of subjects common between the groups found using the thresholding-based method and our proposed method in the lung cancer dataset
Group | Cluster 1 | Cluster 2 |
---|---|---|
MHCII: High | 80 | 17 |
MHCII: Low | 18 | 38 |
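The RI and ARI quoted above can be recomputed directly from the counts in Table 1. The short Python sketch below uses the standard pair-counting formulas; the function name is hypothetical.

```python
from math import comb

def rand_indices(table):
    """Rand index and adjusted Rand index from a contingency table of two partitions."""
    n = sum(map(sum, table))
    total = comb(n, 2)                                    # all pairs of subjects
    cells = sum(comb(v, 2) for row in table for v in row) # pairs together in both partitions
    rows = sum(comb(sum(row), 2) for row in table)        # pairs together in the first partition
    cols = sum(comb(sum(col), 2) for col in zip(*table))  # pairs together in the second partition
    ri = (total + 2 * cells - rows - cols) / total
    expected = rows * cols / total
    ari = (cells - expected) / (0.5 * (rows + cols) - expected)
    return ri, ari

# Rows: MHCII: High, MHCII: Low; columns: Cluster 1, Cluster 2 (Table 1).
table = [[80, 17], [18, 38]]
ri, ari = rand_indices(table)
print(round(ri, 2), round(ari, 2))  # 0.64 0.29
```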
Fig. 3.
Two figures on the left respectively correspond to individual and mean HLA-DR probability density in CK+ tumor cells of the subjects (patients) from the two clusters found using the proposed JSD-based clustering. Two figures on the right respectively correspond to individual and mean HLA-DR marker probability density in CK+ tumor cells of the subjects from the two groups identified by Johnson et al. (2021)’s traditional thresholding-based method. Notice that the distinction between the density profiles of the identified clusters (groups) is more apparent in our method than the traditional method.
We also used the test based on the Cox PH model with random effects from Section 2.2.2 in this case. The estimated variance of the random effect was 0.38. Following Therneau et al. (2015)’s interpretation of the variance parameter in this context, we concluded that there are multiple subjects in the study with quite large relative risks, $\exp(\sqrt{0.38}) \approx 1.85$ fold greater than the average subject. However, the LRT based on integrated partial likelihoods was not significant.
3.2 Application to TNBC MIBI data
The TNBC MIBI dataset has images from 41 subjects. Keren et al. (2018) categorized these subjects into three groups: ‘cold’, ‘compartmentalized’ and ‘mixed’ based on the level of immune infiltration in the TME. We were interested in studying the density of the immunoregulatory protein markers, PD1, PD-L1, and Lag3 which have been shown to have immunological relevance (Keren et al., 2018; Patwa et al., 2021). PD1 and Lag3 are primarily expressed in immune cells and ‘cold’ subjects have very few immune cells expressing them. Thus, we focused our analysis on 33 non-‘cold’ subjects. For PD1 and Lag3, we studied their density only in immune cells of a subject and for PD-L1, we studied its density both in immune and tumor cells of a subject. For every marker, we computed the JSD matrix between the subjects and performed a hierarchical clustering to classify the subjects into two groups. Then, we tested the vector of cluster labels for association with two available outcomes: disease recurrence and survival using the Cox PH model. Figure 4 shows the Kaplan–Meier curves corresponding to the markers PD1 and Lag3 (refer to the Supplementary Material for PD-L1). We noticed that there was significant difference in recurrence probability between the clusters obtained using marker PD1 (HR = 3.0778, P < 0.0461). For other two markers, we did not find any statistically significant results (at level 0.05). However, it is worth pointing out that the HR corresponding to survival for the clusters obtained using Lag3 was large (HR = 3.3358, P < 0.0716), alluding to a possible association of Lag3 density with the risk of death. We should also keep in mind that the sample size for this particular analysis was quite low which could have limited our power.
Fig. 4.
Kaplan–Meier curves for survival and recurrence probability of 33 subjects color coded by the clusters found using our method on the markers, PD1 (two figures on the left) and Lag3 (two figures on the right). Note that the difference in PD1 density has significant effect on recurrence probability
4 Simulation study application
Next, we compared the performance of the proposed JSD-based clustering with the thresholding-based method and also the simpler marker quantile based method (Section 2.3) in terms of ARI (Santos and Embrechts, 2009), in three different simulation setups. In the first two setups, we simulated expression data based on the mean HLA-DR expression profiles of the two clusters found in the mIHC lung cancer dataset (Section 3.1), while in the last one, we simulated expression data using the assumptions of the thresholding-based method. In all the three setups, we considered two groups of subjects (referred to as Groups 1 and 2) with sizes and . Each subject had n cells. And, two values of n, 200 and 2000 were considered.
We had noticed that the mean HLA-DR distributions of the two clusters identified in the mIHC lung cancer dataset could be well approximated using Beta distributions (Gupta and Nadarajah, 2004) with different sets of parameters (α, β). In particular, the mean distribution of the Cluster 1 could be well approximated by Beta(2.17, 300), whereas the mean distribution of the Cluster 2 could be well approximated by Beta(1.78, 45). Refer to the Supplementary Material for more details. The essential difference between these two distributions was that the former had a much sharper peak and a thinner tail compared to the latter. These distributions and their perturbations were respectively used in the following two simulation setups.
4.1 Simulation using mean expression profile of the Cluster 1 of the mIHC data
For a subject from Group 1, the marker expression in every cell was simulated from Beta(2.17, 300). For a subject from Group 2, the marker expression in every cell was simulated from Beta(x, 300), where x was chosen such that the mode of this distribution was higher than the mode of the Group 1 distribution by a percentage of l. Five different values of l, namely 10, 20, 50, 100 and 200, were considered. We considered 100 replications in every case.
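The choice of x can be made explicit with the closed-form mode of a Beta(α, β) distribution, (α − 1)/(α + β − 2) for α, β > 1. The Python sketch below solves for x and simulates one subject per group; the function names and the random seed are assumptions made for illustration.

```python
import random

def beta_mode(a, b):
    """Mode of Beta(a, b), valid for a > 1 and b > 1."""
    return (a - 1) / (a + b - 2)

def shifted_alpha(a, b, l):
    """First parameter x such that mode(Beta(x, b)) = (1 + l/100) * mode(Beta(a, b)).
    Solving (x - 1) / (x + b - 2) = m for x gives x = (1 + m * (b - 2)) / (1 - m)."""
    m = beta_mode(a, b) * (1 + l / 100)
    return (1 + m * (b - 2)) / (1 - m)

a1, b = 2.17, 300
for l in (10, 20, 50, 100, 200):
    print(l, round(shifted_alpha(a1, b, l), 4))

random.seed(0)
n = 200
group1_subject = [random.betavariate(a1, b) for _ in range(n)]
group2_subject = [random.betavariate(shifted_alpha(a1, b, 50), b) for _ in range(n)]
```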
As discussed earlier, the thresholding-based approach requires specifying two thresholds t1 and t2. We varied t1 between 95% and 97.5% quantiles of the full marker data (concatenating marker data of all the subjects) and kept t2 at 0.01. These two methods were referred to as 95% and 97.5% thresholding respectively. For the marker quantile based method, we considered the vector of three extreme quantiles, and 99.5%. Table 2 lists the average ARI of the methods across all the replications. Refer to the Supplementary Material for a table of the confidence interval of ARI for the different methods.
Table 2.
Performance of different methods in terms of the average ARI (higher is better) across 100 replications in the simulation setup from Section 4.1
Number of cells | % difference in modes | JSD-based clustering | 95% thresholding | 97.5% thresholding | Marker quantiles |
---|---|---|---|---|---|
n = 200 | 10 | 0.0744 | 0.0014 | 0.0234 | 0.0052 |
n = 200 | 20 | 0.3988 | 0.0029 | 0.0551 | 0.0312 |
n = 200 | 50 | 0.9808 | 0.0236 | 0.2264 | 0.2200 |
n = 200 | 100 | 1.0000 | 0.1979 | 0.6324 | 0.6864 |
n = 200 | 200 | 1.0000 | 0.8628 | 0.9570 | 0.9876 |
n = 2000 | 10 | 0.8029 | 0.0000 | 0.0000 | 0.0760 |
n = 2000 | 20 | 0.9530 | 0.0000 | 0.0001 | 0.3136 |
n = 2000 | 50 | 1.0000 | 0.0000 | 0.0299 | 0.9350 |
n = 2000 | 100 | 1.0000 | 0.0040 | 0.8105 | 0.9996 |
n = 2000 | 200 | 1.0000 | 0.9907 | 1.0000 | 1.0000 |
We noticed that when the number of cells and the difference in modes were both small (e.g. n = 200 and l = 10), all the methods performed poorly. The performance of the methods expectedly improved as the difference l increased, and the JSD-based clustering achieved an ARI close to 1 even for a moderate difference in modes (l = 50). For a large number of cells (n = 2000), the JSD-based clustering achieved an ARI above 0.8 even for the smallest l, whereas the thresholding-based approaches achieved an ARI of nearly zero for all the smaller values of l. For both values of n, the marker quantile based clustering performed relatively well and beat the thresholding-based approaches in most of the scenarios. This demonstrates the value of capturing the difference in marker distribution across subjects, even in a simpler form, to classify them into meaningful groups.
4.2 Simulation using mean expression profile of the Cluster 2 of the mIHC data
For a subject from Group 1, the marker expression in every cell was simulated from Beta(1.78, 45). For a subject from Group 2, the marker expression in every cell was simulated from Beta(x, 45), where x was chosen such that the mode of this distribution was higher than the mode of the Group 1 distribution by a percentage of l. Five different values of l, namely 10, 20, 50, 100 and 200, were considered. Table 3 lists the average ARI of the methods across 100 replications. Refer to the Supplementary Material for a table of the confidence interval of ARI for the different methods.
Table 3.
Performance of different methods in terms of the average ARI across 100 replications in the simulation setup from Section 4.2
Number of cells | % difference in modes | JSD-based clustering | 95% thresholding | 97.5% thresholding | Marker quantiles |
---|---|---|---|---|---|
n = 200 | 10 | 0.0345 | 0.0005 | 0.0171 | 0.0025 |
n = 200 | 20 | 0.2003 | 0.0016 | 0.0411 | 0.0160 |
n = 200 | 50 | 0.8656 | 0.0119 | 0.1395 | 0.1308 |
n = 200 | 100 | 0.9996 | 0.0699 | 0.4153 | 0.4455 |
n = 200 | 200 | 1.0000 | 0.5428 | 0.8696 | 0.9196 |
n = 2000 | 10 | 0.5157 | 0.0000 | 0.0000 | 0.0403 |
n = 2000 | 20 | 0.9737 | 0.0000 | 0.0001 | 0.1788 |
n = 2000 | 50 | 1.0000 | 0.0000 | 0.0045 | 0.7795 |
n = 2000 | 100 | 1.0000 | 0.0000 | 0.2627 | 0.9925 |
n = 2000 | 200 | 1.0000 | 0.4074 | 0.9984 | 1.0000 |
Once again, the JSD-based clustering outperformed the thresholding-based approaches in all the cases. Interestingly, the thresholding-based approaches performed worse in this simulation setup than in the previous one. Possibly, a different set of (t1, t2) would have been more appropriate in this scenario. This reiterates the point that the subjectivity of the thresholding-based approaches can greatly affect their performance. The marker quantile based method performed better than the thresholding-based approaches, showing again why comparing the marker distributions across the subjects can turn out to be more informative and useful than the thresholding-based analysis.
4.3 Simulation under the assumptions of the thresholding-based method
Next, we devised a simulation setup where the true values of the thresholds (t1, t2) were known and the marker expression data were generated based on them. Recall that t1 controls how one defines a cell to be positive for a marker and t2 controls how one clusters the subjects into two groups based on the proportion of positive cells. We again considered two groups of subjects, with the subjects from Group 1 having a proportion of positive cells below t2 and the subjects from Group 2 having a proportion of positive cells above t2. Two different values of t1 and five different values of t2 (0.005, 0.01, 0.05, 0.1 and 0.2) were considered. Refer to the Supplementary Material for more details about the simulation strategy. Table 4 lists the average ARI of the JSD-based clustering across 100 replications for all combinations of the parameters. We found that our method performed better for higher values of t2. The value of t1 and the value of n did not have any apparent impact on the performance. It should be kept in mind that using the thresholding approach in this simulation setup with the known values of (t1, t2), one would achieve an ARI of 1 in all the cases. However, as we have repeatedly pointed out, knowing the true values of (t1, t2) will never be possible in real data.
Table 4.
Performance of the JSD-based clustering in terms of the average ARI across 100 replications in the simulation setup from Section 4.3
Number of cells | t1 | t2 = 0.005 | t2 = 0.01 | t2 = 0.05 | t2 = 0.1 | t2 = 0.2
---|---|---|---|---|---|---
n = 200 | first value | 0.760 | 0.791 | 0.801 | 0.938 | 1.000
n = 200 | second value | 0.784 | 0.727 | 0.815 | 0.957 | 0.987
n = 2000 | first value | 0.778 | 0.727 | 0.800 | 0.936 | 1.000
n = 2000 | second value | 0.784 | 0.727 | 0.808 | 0.965 | 1.000
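To make the roles of t1 and t2 concrete, the following is a minimal Python sketch of the thresholding-based grouping and of the ARI used to score it. The data generator is a toy assumption (fixed counts of positive cells, uniform expression values), not the simulation strategy described in the Supplementary Material, and the function names are hypothetical.

```python
import random
from collections import Counter
from math import comb


def threshold_groups(expr_by_subject, t1, t2):
    """Label a subject 1 if its proportion of marker-positive cells exceeds t2.

    A cell counts as marker-positive when its expression exceeds t1.
    """
    labels = []
    for cells in expr_by_subject:
        prop_positive = sum(x > t1 for x in cells) / len(cells)
        labels.append(1 if prop_positive > t2 else 0)
    return labels


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between two labelings of the same subjects."""
    n = len(truth)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case: both labelings are trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def toy_subject(n_cells, n_positive, t1, rng):
    """Toy generator (assumption): positives lie above t1, negatives below."""
    positives = [rng.uniform(t1 + 0.01, 1.0) for _ in range(n_positive)]
    negatives = [rng.uniform(0.0, t1) for _ in range(n_cells - n_positive)]
    return positives + negatives


rng = random.Random(1)
t1, t2, n_cells = 0.3, 0.05, 200
# Group 1: at most 5 of 200 cells positive (proportion below t2).
# Group 2: at least 20 of 200 cells positive (proportion above t2).
subjects = [toy_subject(n_cells, rng.randint(0, 5), t1, rng) for _ in range(10)]
subjects += [toy_subject(n_cells, rng.randint(20, 60), t1, rng) for _ in range(10)]
truth = [0] * 10 + [1] * 10

ari_true = adjusted_rand_index(truth, threshold_groups(subjects, t1, t2))
ari_wrong = adjusted_rand_index(truth, threshold_groups(subjects, t1, 0.5))
print(ari_true, ari_wrong)
```

With the generating thresholds the grouping is recovered exactly, whereas a poorly chosen t2 places every subject in one group, which illustrates how sensitive the thresholding-based approach is to the choice of (t1, t2).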
5 Discussion
In multiplexed tissue imaging datasets, it is of interest to stratify the subjects based on the profile of a functional marker in the TME for the purpose of risk assessment (e.g. risk of recurrence and risk of death). The most common approach is a thresholding-based method which requires elaborate tuning of two or more thresholds, one to binarize the marker expression and others to group the subjects based on the binarized expression. In consequence, the method remains largely subjective and varies from one researcher to another based on their interpretation of the data. On top of that, discarding valuable marker information by binarizing can result in loss of power and robustness. In this article, we have developed a threshold-free method for classifying subjects based on the probability density of the functional markers. The method is easy to interpret and free from the subjectivity bias.
In our method, we treat the expression profile of a functional marker in a subject as a continuous random variable and compute its kernel density estimate from the observed expression values in the cells of the TME. Once the marker density estimates for all the subjects have been computed, we use the Jensen–Shannon distance (JSD) to quantify the difference in marker densities between subjects. A large distance between two subjects means that they have very different marker expression profiles. The computed distance matrix can then be used in either of two ways. It can be subjected to hierarchical clustering to group the subjects into clusters, whose labels can be tested for association with outcomes of interest (e.g. recurrence, survival). Alternatively, it can be used directly in a kernel machine regression setup for testing association with outcomes of interest. We also briefly discussed a simpler method that captures the difference in marker distributions across subjects through only a few quantiles. The marker quantile based method can be interpreted as a weaker or less general version of the proposed JSD-based method.
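The pipeline summarized above (density estimation, pairwise JSD, clustering) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the DenVar implementation: the bandwidth is a normal-reference rule of thumb, the JSD is taken as the square root of the base-2 Jensen–Shannon divergence evaluated on a common grid, average linkage stands in for whichever linkage is used in practice, and the Beta-distributed expression values are toy data.

```python
import math
import random
import statistics


def kde_on_grid(values, grid):
    """Gaussian kernel density estimate of `values`, evaluated on `grid`."""
    n = len(values)
    # Normal-reference rule-of-thumb bandwidth (an assumption of this sketch).
    h = 1.06 * statistics.stdev(values) * n ** (-0.2)
    scale = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return [scale * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)
            for x in grid]


def kl_divergence(a, b):
    """Base-2 Kullback-Leibler divergence between discretised densities."""
    return sum(ai * math.log2(ai / bi)
               for ai, bi in zip(a, b) if ai > 0.0 and bi > 0.0)


def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence of two gridded densities."""
    total_p, total_q = sum(p), sum(q)
    p = [x / total_p for x in p]  # renormalise so each vector sums to 1
    q = [x / total_q for x in q]
    m = [(a + b) / 2.0 for a, b in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(divergence, 0.0))


def distance_matrix(subjects, grid):
    """Pairwise JSD between the estimated marker densities of all subjects."""
    densities = [kde_on_grid(cells, grid) for cells in subjects]
    k = len(densities)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = js_distance(densities[i], densities[j])
    return dist


def average_linkage(dist, n_clusters=2):
    """Agglomerative clustering (average linkage) on a distance matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy data (assumption): marker expression scaled to [0, 1], 200 cells per subject.
random.seed(1)
group1 = [[random.betavariate(2, 8) for _ in range(200)] for _ in range(3)]
group2 = [[random.betavariate(6, 4) for _ in range(200)] for _ in range(3)]
grid = [i / 100 for i in range(101)]

dist = distance_matrix(group1 + group2, grid)
labels = average_linkage(dist, n_clusters=2)
print(labels)
```

No threshold enters this computation: subjects are compared through their whole estimated densities, and the resulting matrix `dist` could equally be passed to a kernel machine regression instead of being clustered.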
We analyzed two highly complex multiplex tissue imaging datasets: an mIHC lung cancer dataset from the University of Colorado School of Medicine and a publicly available TNBC MIBI dataset. In the lung cancer dataset, we found that the difference in HLA-DR marker density between subjects was significantly associated with their 5-year overall survival. In the breast cancer dataset, we found that the difference in the density of the immunoregulatory protein PD1 was associated with disease recurrence. Next, we replicated the characteristics of the lung cancer dataset in two simulation setups and demonstrated the robustness of our method in comparison with the thresholding-based method. Interestingly, even the marker quantile based method outperformed the thresholding-based approaches in most cases, showing that using the difference in marker distributions, even in a simpler form, can be more informative than a cell-specific analysis. In the final simulation setup, we generated datasets favoring the principles of the thresholding-based method and showed that the JSD-based method performed competently even in that scenario.
In this article, we have focused on analyzing each of the functional markers separately. Our next goal will be to study the joint effect of multiple functional markers. One naive way of studying the joint effect would be to sum the distance matrices corresponding to the different functional markers, creating a new distance matrix. This aggregated distance matrix would capture the overall difference in densities of the different markers. However, this approach essentially assumes that the markers are independent and would be incapable of capturing complex interplay between them. One possible alternative would be to compare the multivariate probability density of the markers across subjects, which, on the other hand, can be extremely computationally demanding. We therefore plan to study these approaches in much greater detail as part of our next work. So far, we have tested the method on two types of multiplex imaging datasets, mIHC (Vectra) and MIBI. In the future, we would like to test the applicability of our method on several other types of imaging datasets, such as hematoxylin and eosin (H&E) stained images (Feldman and Wolfe, 2014) and newer multiplexed immunofluorescence platforms like CODEX (Goltsev et al., 2018) and MultiOmyx (Juncker-Jensen et al., 2018).
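The naive aggregation mentioned above amounts to an element-wise sum of the per-marker distance matrices. A minimal Python sketch follows; the function name is hypothetical and the two small matrices are made-up illustrative values, not results from either dataset.

```python
def aggregate_distances(matrices):
    """Element-wise sum of per-marker distance matrices of identical shape."""
    n = len(matrices[0])
    if any(len(m) != n or any(len(row) != n for row in m) for m in matrices):
        raise ValueError("all distance matrices must be square and of equal size")
    return [[sum(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]


# Hypothetical JSD matrices for three subjects and two markers (made-up values).
d_marker_a = [[0.0, 0.2, 0.7],
              [0.2, 0.0, 0.6],
              [0.7, 0.6, 0.0]]
d_marker_b = [[0.0, 0.5, 0.1],
              [0.5, 0.0, 0.4],
              [0.1, 0.4, 0.0]]

d_total = aggregate_distances([d_marker_a, d_marker_b])
for row in d_total:
    print(row)
```

Because each marker contributes its own term independently, the sum cannot reflect any dependence between markers, which is the limitation noted in the text.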
6 Software and data availability
The software, in the form of an R package hosted on GitHub together with an example dataset and complete documentation, is available at https://github.com/sealx017/DenVar. The MIBI data are publicly available at https://mibi-share.ionpath.com/ and the mIHC lung cancer dataset is available upon request.
Acknowledgements
We thank the Human Immune Monitoring Shared Resource and support of the University of Colorado Human Immunology and Immunotherapy Initiative for their expert assistance in multiplex IHC and generation of the lung cancer dataset. We acknowledge the support of the University of Colorado Cancer Center Support Grant (P30CA046934). S.S. is funded by the Grohne-Stepp Endowment from the University of Colorado Cancer Center.
Funding
S.S. is funded by the Grohne-Stepp Endowment from the University of Colorado Cancer Center.
Conflict of Interest: none declared.
References
- Ali H.R. et al. (2020) Imaging mass cytometry and multiplatform genomics define the phenogenomic landscape of breast cancer. Nat. Cancer, 1, 163–175.
- Altman D.G., Royston P. (2006) The cost of dichotomising continuous variables. BMJ, 332, 1080.
- Andersen P.K., Gill R.D. (1982) Cox’s regression model for counting processes: a large sample study. Ann. Stat., 10, 1100–1120.
- Angelo M. et al. (2014) Multiplexed ion beam imaging of human breast tumors. Nat. Med., 20, 436–442.
- Athreya K.B., Lahiri S.N. (2006) Measure Theory and Probability Theory. Vol. 19. New York: Springer.
- Basu A. et al. (1998) Robust and efficient estimation by minimising a density power divergence. Biometrika, 85, 549–559.
- Bataille F. et al. (2006) Multiparameter immunofluorescence on paraffin-embedded tissue sections. Appl. Immunohistochem. Mol. Morphol., 14, 225–228.
- Bell B.A. et al. (2010) The impact of small cluster size on multilevel models: a monte carlo examination of two-level models with binary and continuous predictors. Vancouver: JSM Proceedings Survey Research Methods Section. Vol. 1. pp. 4057–4067.
- Billingsley P. (2008) Probability and Measure. New York: John Wiley & Sons.
- Binnewies M. et al. (2018) Understanding the tumor immune microenvironment (time) for effective therapy. Nat. Med., 24, 541–550.
- Bulian P. et al. (2014) CD49d is the strongest flow cytometry-based predictor of overall survival in chronic lymphocytic leukemia. J. Clin. Oncol., 32, 897–904.
- Chang B. et al. (2018) High number of PD-1 positive intratumoural lymphocytes predicts survival benefit of cytokine-induced killer cells for hepatocellular carcinoma patients. Liver Int., 38, 1449–1458.
- Cossarizza A. et al. (2019) Guidelines for the use of flow cytometry and cell sorting in immunological studies. Eur. J. Immunol., 49, 1457–1973.
- Costa A. et al. (2017) Role of new immunophenotypic markers on prognostic and overall survival of acute myeloid leukemia: a systematic review and meta-analysis. Sci. Rep., 7, 1–11.
- Crainiceanu C.M., Ruppert D. (2004) Likelihood ratio tests in linear mixed models with one variance component. J. R. Stat. Soc. Series B Stat. Methodol., 66, 165–185.
- DeDeo S. et al. (2013) Bootstrap methods for the empirical study of decision-making and information flows in social systems. Entropy, 15, 2246–2276.
- Efron B. (1988) Logistic regression, survival analysis, and the Kaplan-Meier curve. J. Am. Stat. Assoc., 83, 414–425.
- Endres D.M., Schindelin J.E. (2003) A new metric for probability distributions. IEEE Trans. Inform. Theory, 49, 1858–1860.
- Feldman A.T., Wolfe D. (2014) Tissue processing and hematoxylin and eosin staining. In: Histopathology. Humana Press, New York, Oxford University Press; pp. 31–43.
- Goltsev Y. et al. (2018) Deep profiling of mouse splenic architecture with codex multiplexed imaging. Cell, 174, 968–981.
- Goodfellow I. et al. (2014) Generative adversarial nets. Adv. Neural Inf. Process. Syst, 27, 2672–2680.
- Gourieroux C. et al. (1982) Likelihood ratio test, Wald test, and Kuhn-Tucker test in linear models with inequality constraints on the regression parameters. Econ. J. Econ. Soc., 50, 63–80.
- Gupta A.K., Nadarajah S. (2004) Handbook of Beta Distribution and Its Applications. Boca Raton: CRC Press.
- Harris C.R. et al. (2022) Quantifying and correcting slide-to-slide variation in multiplexed immunofluorescence images. Bioinformatics, 38, 1700–1707.
- Hoffman G.E. (2013) Correcting for population structure and kinship using the linear mixed model: theory and extensions. PLoS One, 8, e75707.
- Huang W. et al. (2013) A colorful future of quantitative pathology: validation of vectra technology using chromogenic multiplexed immunohistochemistry and prostate tissue microarrays. Hum. Pathol., 44, 29–38.
- Jensen A.M. et al. (2019) Kernel machine tests of association between brain networks and phenotypes. PLoS One, 14, e0199340.
- Johnson A.M. et al. (2020) Cancer cell-intrinsic expression of MHC class ii regulates the immune microenvironment and response to anti-PD-1 therapy in lung adenocarcinoma. J. Immunol., 204, 2295–2307.
- Johnson A.M. et al. (2021) Cancer cell-specific MHCII expression as a determinant of the immune infiltrate organization and function in the non-small cell lung cancer tumor microenvironment. J. Thorac. Oncol., 16, 1694–1704.
- Juncker-Jensen A. et al. (2018) Using multiomyx™ to analyze correlations between immunosuppressive cells and tumor-infiltrating lymphocytes in the pancreatic tumor microenvironment. Ann. Oncol., 29, viii422.
- Keren L. et al. (2018) A structured tumor-immune microenvironment in triple negative breast cancer revealed by multiplexed ion beam imaging. Cell, 174, 1373–1387.
- Kimball A.K. et al. (2018) A beginner’s guide to analyzing and visualizing mass cytometry data. J. Immunol., 200, 3–22.
- Kocak M., Onar-Thomas A. (2012) A simulation-based evaluation of the asymptotic power formulas for cox models in small sample cases. Am. Stat., 66, 173–179.
- Koguchi Y. et al. (2015) Serum immunoregulatory proteins as predictors of overall survival of metastatic melanoma patients treated with ipilimumab. Cancer Res., 75, 5084–5092.
- Lawvere F.W. (1973) Metric spaces, generalized logic, and closed categories. Rend. Sem. Mat. Fis. Milano, 43, 135–166.
- Likas A. et al. (2003) The global k-means clustering algorithm. Pattern Recognit., 36, 451–461.
- Liu D. et al. (2008) Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models. BMC Bioinformatics, 9, 1–11.
- Maas C.J., Hox J.J. (2005) Sufficient sample sizes for multilevel modeling. Methodology, 1, 86–92.
- Murtagh F., Legendre P. (2014) Ward’s hierarchical agglomerative clustering method: which algorithms implement ward’s criterion? J. Classif., 31, 274–295.
- Nielsen F. (2019) On the Jensen–Shannon symmetrization of distances relying on abstract means. Entropy, 21, 485.
- Nikodym O. (1930) Sur une généralisation des intégrales de mj radon. Fund. Math., 15, 131–179.
- Patwa A., Yamashita R., Long J., Risom T., Angelo M., Keren L., & Rubin D. L. (2021) Multiplexed imaging analysis of the tumor-immune microenvironment reveals predictors of outcome in triple-negative breast cancer. Commun Biol., 4, 1–14.
- Phillips D. et al. (2021) Highly multiplexed phenotyping of immunoregulatory proteins in the tumor microenvironment by codex tissue imaging. Front. Immunol., 12, 687673.
- Pollan S. et al. (2020) Profiling exhausted t cells using Vectra® polaris™ multiplex immunofluorescence assay in HNSCC.
- Santos J. M., Embrechts M. (2009). On the use of the adjusted Rand index as a metric for evaluating supervised classification. In: International Conference on Artificial Neural Networks. Springer, pp. 175–184.
- Saraiva D.P. et al. (2018) HLA-DR in cytotoxic t lymphocytes predicts breast cancer patients’ response to neoadjuvant chemotherapy. Front. Immunol., 9, 2605.
- Seal S. et al. (2021) On clustering for cell phenotyping in multiplex immunohistochemistry (mIHC) and multiplexed ion beam imaging (MIBI) data.
- Seal S. et al. (2022) Efficient estimation of SNP heritability using Gaussian predictive process in large scale cohort studies. PLoS Genet., 18, e1010151.
- Shipkova M., Wieland E. (2012) Surface markers of lymphocyte activation and markers of cell proliferation. Clin. Chim. Acta., 413, 1338–1349.
- Silverman B.W. (2018) Density Estimation for Statistics and Data Analysis. New York: Routledge.
- Sims G.E. et al. (2009) Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions. Proc. Natl. Acad. Sci. USA, 106, 2677–2682.
- Tan W.C.C. et al. (2020) Overview of multiplex immunohistochemistry/immunofluorescence techniques in the era of cancer immunotherapy. Cancer Commun., 40, 135–153.
- Therneau T. et al. (2015) Mixed effects cox models. CRAN Reposit.
- Therneau T.M. (1997) Extending the cox model. In: Proceedings of the First Seattle Symposium in Biostatistics. Springer, pp. 51–84.
- van Erven T., Harremoes P. (2014) Rényi divergence and kullback-leibler divergence. IEEE Trans. Inform. Theory, 60, 3797–3820.
- Vert J.-P. et al. (2004) A primer on kernel methods. Kernel Methods Comput. Biol., 47, 35–70.
- Vu T. et al. (2021) SPF: a spatial and functional data analytic approach to cell imaging data. bioRxiv.
- Yang Z.-Z. et al. (2019) Mass cytometry analysis reveals that specific intratumoral CD4+ T cell subsets correlate with patient survival in follicular lymphoma. Cell Rep., 26, 2178–2193.