Abstract
A fundamental task in the automated analysis of images is the development of effective image pair comparison techniques. For two high-dimensional images, a statistical method must automatically label them as “similar” or “different” depending on whether random error and spatial dependencies could account for the pixel-wise differences. We develop a Bayesian strategy by constructing a novel extension of Dirichlet processes called the spatial random partition model (sRPM). The process groups spatially proximal image pixels with similar intensities into clusters, thereby achieving dimension reduction in the large number of pixels. Next, we apply the sRPM-based analytical procedure to compare two images. The image comparison problem is formulated as a hypothesis test involving a univariate metric adaptive to spatial correlations and robust to random variability in the pixel intensities. To handle the computational burden, we develop a two-stage technique for MCMC analysis and hypothesis testing of image pairs. A simulation study analyzes artificial datasets and finds compelling evidence for the high accuracy of sRPM in image comparison. We demonstrate the effectiveness of the technique by statistically analyzing satellite image data.
Keywords: Aggregated attraction function, Bayesian hierarchical model, Differential pixel proportion, Markov Chain Monte Carlo, Nonparametric Bayes, Spatial random partition model
1. Introduction
Due to the rapid advancements in image acquisition techniques, images have emerged as one of the most versatile and information-rich data resources across diverse industries and scientific disciplines. For example, in the manufacturing industry, images play a pivotal role in quality control processes due to their ease of acquisition and cost-effectiveness. Similarly, in the natural sciences, satellite images are indispensable for analyzing Earth’s surface and environment across disciplines such as forest science, climate science, agriculture, forecasting, ecology, and fire science (Qiu, 2005; Gonzalez, 2009).
In many of these contexts, a series of images is often collected over time from a longitudinal process—such as images acquired from a rolling process in metalworking—where a critical goal is to monitor for temporal changes and detect potential “change points.” Identifying such change points is essential as they indicate moments where the process diverges beyond acceptable limits of variation, potentially signaling anomalies or shifts in underlying dynamics. This task is closely related to statistical process control (SPC) (Qiu, 2014), but existing SPC methodologies are insufficient for high-dimensional image data due to unrealistic model assumptions and limited practical applicability.
A foundational task underpinning this monitoring process is image pair comparison. By developing effective techniques for comparing high-dimensional images, we can systematically classify pairs of images as “similar” or “different” based on whether observed pixel-wise differences can be attributed to random error. Successful image comparison is often the first step toward more complex objectives, such as inferring the nature of detected changes, isolating regions of interest, or predicting future behaviors of a system under monitoring. For example, in quality control, identifying a shift might guide interventions to rectify the process, while in satellite image analysis, detected changes might signal environmental shifts requiring further study. Thus, image comparison lays the groundwork for meaningful downstream analyses and decision-making in these applications.
In response to the growing need for sophisticated methodologies that combine robust modeling with efficient computation, this paper addresses the challenges of high-dimensional image comparison and provides a practical framework for advancing SPC techniques tailored to modern image data. To be useful, image comparison techniques must have (i) high sensitivity and specificity, (ii) the ability to handle high dimensionality and spatial dependencies through model-based dimension reduction, (iii) flexibility, in the sense that they do not make restrictive parametric assumptions limiting the applicability of the models in real settings, and (iv) computational efficiency, that is, the ability to harness efficient computation to analyze high-dimensional images.
To motivate the methodological development, consider the three images presented in Figure 1, each consisting of 100×100 pixels. We denote these images by and , respectively. Although the images may look identical at first glance, all three images in fact differ to varying degrees. A visual inspection reveals that the differences between images and are relatively subtle compared to the differences between and . Intuitively, it appears that random variation could plausibly account for the differences between and . However, although and are very similar in most places, there are systematic differences, e.g., the dark area on the lower right side of has a slightly different shape.
Figure 1:

Are images and significantly different? Are images and significantly different?
Relevant background and present state of knowledge
There is only limited existing research on monitoring image pairs or sequences, mainly in the chemical and industrial engineering literature, where images have been widely used in recent years (Megahed et al., 2011; Prats-Montalban and Ferrer, 2014; Yan et al., 2015). Existing methods for image comparison often proceed in two main steps. One class of methods first extracts features from each image using techniques such as principal component analysis (PCA) and then monitors the extracted features by a conventional control chart (Duchesne et al., 2012; Lin et al., 2008). Other methods focus on certain prespecified regions in individual images called regions of interest (ROIs), and then monitor the images by a control chart constructed using a summary statistic of the ROIs, e.g., the average image intensity (Jiang et al., 2011; Megahed et al., 2012). The first type of method completely ignores the spatial structure of the images, while the second type considers the spatial structure only within the prespecified ROIs. Most of these methods fail to take into account image edges and complicated correlation structures of the spatial image intensities. Consequently, they do not provide a reliable tool for image monitoring applications; see Qiu (2020) for a related discussion. Notable exceptions include Wang and Ye (2010), who compare nonparametric curves under a heteroscedastic model with spatially correlated errors; Koosha et al. (2017), who use nonparametric wavelet basis functions to extract key features and detect change points and fault locations simultaneously; and Roy and Mukherjee (2024), who propose a feature-based image comparison method in which edges and jump points are considered as the primary features.
As a side note, in the image processing, fMRI, and machine learning literatures, there exist methods or algorithms for analyzing a set of images obtained in a given time interval (Feng and Qiu, 2018; Guo and Zhang, 2015; Guo et al., 2018; Julea et al., 2011; Lindquist, 2008). However, these methods are retrospective and cannot be used effectively for prospective monitoring of spatiotemporally correlated image sequences.
There are relatively few Bayesian techniques for image comparison. Nonparametric Bayes techniques that are potentially applicable to this problem typically involve extensions of Dirichlet processes (Ferguson, 1973; Ghosal et al., 1999) to spatial settings; see Reich and Fuentes (2015) for a recent review. These strategies include Gelfand et al. (2005), Duan et al. (2007), Dunson and Park (2008), Griffin and Steel (2006), Rodriguez et al. (2010), and Reich and Bondell (2011). Another class of Bayesian methods (e.g., Na et al., 2013; Chatzis and Tsechpenakis, 2010) applies infinite-dimensional spatial extensions of hidden Markov models to the problem of image segmentation. However, these models have certain limitations. One notable challenge is their sensitivity to the predefined definition of a ‘neighborhood,’ which is assumed to be uniform across the image and known in advance. This assumption can limit the model’s flexibility, as the Markov assumption treats pixels in different neighborhoods as independent. As a result, these models may find it difficult to capture more complex or unknown spatial correlation structures beyond first-order dependencies, including medium- or long-range interactions that can vary across the image space.
Moreover, these Bayesian methods are either broadly applicable spatial models or are primarily designed for the analysis of single images. As a result, they are not well-suited for image comparison, which is the central focus of this paper. Specifically, they lack the ability to account for spatial correlation structures between non-random pixel locations, identify and transfer relevant information across images using unsupervised, model-based dimensionality reduction techniques, or are inefficient at detecting significant deviations from quality standards while considering the spatial nature of images.
Motivated by these challenges, this article develops, in stages, a Bayesian strategy for comparison of high-dimensional image pairs. Section 2 introduces a new nonparametric approach for analyzing a single image with multiple intensity values at each of pixels or voxels (e.g., a color image). We construct a novel extension of Dirichlet processes, called the spatial random partition model (sRPM), that groups proximal image pixels with similar intensities into the same cluster. The infinite mixture model adapts the general Bayesian frameworks of random partition models and density estimation in the presence of arbitrary covariates (e.g., Müller et al., 2011) to spatial settings and image analysis in particular. The proposed sRPM model favors a small number of relatively large clusters, thereby achieving dimension reduction in the large number of pixels in a manner that accounts for spatial correlations. A key benefit of this dimension reduction is that the extracted image information is useful for answering inferential questions in image sequence monitoring applications.
Section 3 extends the sRPM-based analytical procedure to compare two images that share the same pixel locations but possess possibly different intensity values at multiple pixels. Image comparison is achieved by an intuitively appealing univariate metric for measuring image differences. This metric is not only robust to random response variability in the large number of pixels but also adaptive to spatial correlations. The image comparison problem is then formulated as a single-parameter hypothesis test. This comparison procedure utilizes only cluster-related information previously extracted from the first image, as developed in Section 2. Depending on the degree to which the two images differ, the marginal posterior of this metric is used to call the two images “similar” or “different” based on the posterior probabilities of competing hypotheses. To handle the computational burden in analyzing high-dimensional images, Section 3.1 develops a two-stage technique for MCMC analysis and hypothesis testing of image pairs. Section 4 analyzes artificial datasets using the proposed technique and finds compelling evidence for the high accuracy of sRPM in image comparison. Section 5 applies the sRPM technique to detect differences between pairs of satellite images. Finally, several remarks conclude the article in Section 6.
2. Bayesian analysis of a single image
For , imagine that we have a single -dimensional image, denoted by and consisting of pixels. In Figure 1, because is two-dimensional. However, we are equally interested in analyzing three-dimensional images, such as brain images used in Alzheimer’s disease research. We wish to achieve dimension reduction in the large number of pixels in a manner that accounts for spatial correlation. The extracted information should be subsequently useful for comparisons between and other images in image sequence monitoring.
2.1. Spatial random partition model (sRPM)
The proposed sRPM defines a coherent process, rather than simply a model. Specifically, the distribution of the responses for any subset of pixels in image can be obtained from the distribution for a larger set of pixels by marginalization. The common domain of the images is assumed to be a design set denoted by containing regularly or irregularly arranged -dimensional pixels. This design set is assumed to belong to a compact convex set in denoted by . For pixel , where is large, and for image indexed by , the pixel is located at for two-dimensional images (i.e., ) and is associated with a pixel-specific intensity vector of length . Color images can be represented by triplets, , corresponding to RGB values. Monochrome images correspond to scalar intensities, so that .
Non-random pixel locations.
We regard the pixel locations as non-random design points and the associated intensities as random, image-specific responses. To achieve dimension reduction in the large number of pixels, , one possibility is to utilize the sparsity-inducing property of Dirichlet processes to group a set of proximal pixels with similar intensities in image into latent clusters, where is unknown but typically much smaller than . This would drastically reduce the necessary memory storage space from to include only the parameters related to the cluster characteristics. The use of Dirichlet processes to achieve dimension reduction has precedence in the literature; see Kim et al. (2006) and Dunson and Park (2008).
However, standard Dirichlet processes are inadequate for image analysis because they treat all pixels as exchangeable, and they would allocate the pixels to latent clusters regardless of spatial proximity. To accommodate the intrinsic spatial nature of images, we propose extending the Dirichlet process to a spatial random partition model (sRPM) so that only proximal pixels with similar intensities are a posteriori allocated to the same cluster. Applying density estimation regressed on covariates (e.g. Müller et al., 2011) and random partition models, the sRPM process provides a novel framework for image analysis that flexibly adapts to the spatial correlations among the pixels.
Allocation variables
Let the event represent pixel of image belonging to cluster in a spatial random partition model, which we will formulate after introducing key concepts and definitions. In the eventual inferential procedure, we will find that an MCMC point estimate of allocation vector can be computed if desired.
Aggregated attraction function.
Each cluster is associated with a latent spatial knot and a cluster-specific bandwidth, with individual pixels in image stochastically allocated to a cluster depending on the pixel-cluster attraction. More specifically, for cluster , the spatial knot is a -dimensional vector , and the bandwidth is a positive definite matrix, . Since the spatial knots and bandwidths are unknown, we assume normal-inverse Wishart (NIW) priors for these cluster-specific quantities: , for hyperparameter consisting of positive hyperparameters and -dimensional vector , and positive-definite matrix .
The pixel-cluster attraction is determined by the pixel’s position relative to the spatial knot , with the relative distance calibrated by cluster bandwidth . In particular, the attraction between pixel and cluster is assumed to be equal to the multivariate normal density, . This ensures that the pixel-cluster attraction is greater for pixels near the spatial knot and for clusters with smaller bandwidths. The concept of pixel-cluster attraction is illustrated in Figure 2 for clusters belonging to a rectangular design space in a single image. The areas and orientations of the ellipses are determined by the bandwidths quantified by the positive-definite matrices . The sizes of the dots marking the spatial knots are proportional to the attraction between pixel and the cluster.
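As a concrete illustration, the pixel-cluster attraction described above can be sketched as the multivariate normal density of the pixel location, centered at the spatial knot and calibrated by the cluster bandwidth. The function and variable names below are illustrative, not the paper's notation:

```python
import numpy as np

def attraction(x, knot, bandwidth):
    """Attraction between a pixel at location x and a cluster with the
    given spatial knot and bandwidth matrix: the multivariate normal
    density with mean `knot` and covariance `bandwidth`, evaluated at x."""
    d = len(x)
    diff = np.asarray(x) - np.asarray(knot)
    prec = np.linalg.inv(bandwidth)
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(bandwidth))
    return norm * np.exp(-0.5 * diff @ prec @ diff)

# A pixel near the knot has higher attraction than a distant pixel,
# and a smaller bandwidth sharpens the attraction at the knot.
knot = np.array([50.0, 50.0])
small_bw = np.eye(2) * 4.0
large_bw = np.eye(2) * 100.0
near = attraction([52.0, 51.0], knot, small_bw)
far = attraction([80.0, 20.0], knot, small_bw)
```

This makes the two qualitative properties in the text explicit: attraction decays with distance from the knot, and clusters with smaller bandwidths exert stronger attraction near their knots.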
Figure 2:

Illustration of pixel-cluster attraction for a pixel located at .
In image , using the pixel-cluster allocation variables, let denote the pixel locations assigned to the cluster. The spatial compactness of cluster in image is quantified by an aggregated attraction function denoted by . Larger values of aggregated attraction are associated with tighter spatial clusters, i.e., nearby pixels grouped into smaller clusters. Marginalizing over the unknown cluster-specific knots and bandwidths, and regarding the pixel locations as non-random design points, the aggregated attraction function for cluster in image is obtained as follows:
| (1) |
where denotes the multivariate gamma function. The new quantities appearing in equation (1) have the following definition. Based on the pixels assigned to the cluster, the updated NIW prior has hyperparameter , where
| (2) |
These quantities rely on the following cluster-related summaries for image : (i) number of pixels, , allocated to cluster by the allocation vector , (ii) pixel location averages, , and (iii) matrix error sum of squares for the pixel locations, , for . Notice that these quantities rely only on the pixel locations but not on the image intensities.
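The three cluster-related summaries listed above depend only on the pixel locations and the allocation vector, so they can be computed directly; a minimal sketch with illustrative names:

```python
import numpy as np

def cluster_summaries(locations, alloc, k):
    """Summaries for cluster k: (i) cluster size n_k, (ii) the mean of
    the member pixel locations, and (iii) the matrix error sum of
    squares (d x d scatter matrix) of the member locations."""
    members = np.asarray(locations, dtype=float)[np.asarray(alloc) == k]
    n_k = len(members)
    mean = members.mean(axis=0)
    centered = members - mean
    sse = centered.T @ centered  # matrix error sum of squares
    return n_k, mean, sse
```

Note that, as the text emphasizes, these summaries involve the pixel locations only, never the image intensities.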
We emphasize that the pixel locations are conceived as non-random design points. By construction, the correlation induced by the spatial knots and bandwidths of the clusters implies that larger values of the attraction function are a priori associated with tighter spatial clusters, i.e., proximal pixels allocated to clusters with smaller areas or volumes (Lebesgue measures in ).
Spatial random partition model for images.
As mentioned, standard Dirichlet processes are inadequate because they ignore the spatiotemporal nature of the images and regard the intensities at the pixels as a priori exchangeable. Using the cluster-specific aggregated attraction functions in (1), we extend the standard Dirichlet process to a spatial random partition model (sRPM) for the pixel-cluster allocations in image :
| (3) |
where denotes density functions, is the set of possible partitions of pixels, and is the mass parameter.
In other words, allocation vector is a priori distributed as the partitions induced on the image pixels by sRPM. Furthermore, attraction function (1) ensures that a pixel is a priori more likely to join clusters containing a larger number of spatially proximal pixels. Summing over all possible partitions of the pixels in image , the normalization constant for density (3) is .
Intensities.
Finally, we specify the sampling distribution of the intensity or response vector , for . Conditional on the pixel-cluster allocations of image , the intensity vectors of the member pixels in cluster are modeled as a shared cluster-specific mean vector, , plus Gaussian white noise. More specifically, conditional on the event that the pixel belongs to the cluster, , with assigned an inverse chi-square prior. Small values of guarantee that the clusters consist of pixels with similar intensities. Consequently, the posterior distribution of sRPM tends to group proximal image pixels with similar intensities into the same cluster, and tends to favor a small number of relatively large clusters, thereby achieving dimension reduction.
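The sampling model for the intensities, a shared cluster-specific mean plus Gaussian white noise, can be sketched as follows. The names and the scalar noise level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_intensities(alloc, cluster_means, sigma):
    """Draw each pixel's intensity as its cluster's mean plus zero-mean
    Gaussian white noise with standard deviation sigma, mirroring the
    conditional sampling model described in the text (names assumed)."""
    alloc = np.asarray(alloc)
    means = np.asarray(cluster_means)[alloc]  # look up each pixel's cluster mean
    return means + rng.normal(0.0, sigma, size=means.shape)
```

A small sigma makes member pixels' intensities nearly identical to the cluster mean, which is what drives the posterior to group similar-intensity pixels together.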
For cluster , let mean vectors in the prior. Standard conditionally conjugate priors are imposed on all the remaining hyperparameters to complete the model specification. Notice that a common parameter for the intensity components presupposes that the intensities have comparable scales; this may require standardizing each intensity component before analysis. Despite the apparently strong parametric assumption of homoscedastic Gaussian errors in the sRPM model, Dirichlet processes and many of their extensions, including sRPM, are intrinsically nonparametric and capable of flexibly adapting to complex-structured regression surfaces (Lijoi and Prünster, 2010).
2.2. Posterior inferences for image
The following Proposition establishes the posterior equivalence of two models with very different interpretations. The proof is presented in the Supplementary Material. Applying well-established sampling procedures, the sRPM parameters are then updated in a relatively straightforward manner.
Proposition 1.
Consider an alternative model in which (i) sampling distribution , and (ii) allocation vector is distributed as the partitions induced by a regular Dirichlet process prior with mass parameter and base distribution . This model regards the pixel locations as random, unlike sRPM, for which is the design set of pixel locations. Nevertheless, for every , given the responses and set , the posterior distribution of the remaining parameters of sRPM and the alternative model are identical.
Intuitively, posterior inferences about the key parameters of interest are equivalent under the alternative model due to the high degree of separability between the pixel locations and image intensities in the two models. Starting with ad hoc initial values, the sRPM model parameters for image are then iteratively generated by Gibbs sampling steps from their full conditionals. The post-burn-in MCMC sample is used for posterior inference. We briefly outline the Gibbs sampling steps below for some of the parameters:
Allocation vector . For the allocation vector of the alternative posterior distribution described in Proposition 1, this parameter vector is updated using the Gibbs sampler for the finite-dimensional Dirichlet process representation of Ishwaran and James (2001).
Spatial knots and bandwidths. Given the data and the current values of the remaining model parameters including allocation vector , the spatial knots and bandwidths have independent NIW distributions: , for , with the updated NIW hyperparameter described after equation (1).
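The updated NIW hyperparameters in this step follow the standard conjugate normal-inverse-Wishart update based on the pixel locations assigned to a cluster. The sketch below uses the textbook NIW formulas; symbol names are assumed rather than taken from the paper:

```python
import numpy as np

def niw_update(kappa0, mu0, nu0, psi0, locs):
    """Standard conjugate NIW update from n assigned pixel locations:
    kappa_n = kappa0 + n,  mu_n = (kappa0*mu0 + n*xbar) / kappa_n,
    nu_n = nu0 + n,        psi_n = psi0 + S + kappa0*n/kappa_n * dev@dev.T,
    where S is the scatter matrix and dev = xbar - mu0."""
    locs = np.atleast_2d(np.asarray(locs, dtype=float))
    n = locs.shape[0]
    xbar = locs.mean(axis=0)
    centered = locs - xbar
    scatter = centered.T @ centered
    kappa_n = kappa0 + n
    mu_n = (kappa0 * np.asarray(mu0) + n * xbar) / kappa_n
    nu_n = nu0 + n
    dev = (xbar - np.asarray(mu0))[:, None]
    psi_n = np.asarray(psi0) + scatter + (kappa0 * n / kappa_n) * (dev @ dev.T)
    return kappa_n, mu_n, nu_n, psi_n
```

Conditional on the allocations, each cluster's knot and bandwidth can then be drawn independently from the NIW distribution with these updated hyperparameters.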
Cluster-related posterior inferences.
By using the MCMC sample and applying the least-squares method of Dahl (2006), we can compute , an estimate of the pixel-cluster allocation vector for image . Intuitively, this procedure first uses the MCMC sample to estimate , the “adjacency matrix” whose elements represent the estimated posterior probabilities that each pixel pair belongs to a common cluster. Next, the estimate is chosen to minimize the Frobenius norm of the difference between and the adjacency matrix associated with . Since they are straightforward deterministic functions of , the estimated number of clusters, , and the estimated numbers of pixels allocated to the clusters, , are immediately available. Furthermore, estimates of the spatial knots, , and bandwidths, , can be computed. For example, for image of Figure 1, the detected cluster characteristics, along with image for comparison, are graphically presented in Figure 2. sRPM detected spatial clusters, achieving substantial dimension reduction of the 10,000 image pixels.
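The least-squares clustering step can be sketched in a few lines: estimate the posterior co-clustering matrix from the MCMC draws of the allocation vector, then return the sampled allocation whose own adjacency matrix is closest in Frobenius norm. This is a minimal illustration of the idea, not Dahl's (2006) exact implementation:

```python
import numpy as np

def dahl_estimate(draws):
    """Least-squares clustering estimate: `draws` has shape (T, n_pixels),
    one allocation vector per MCMC iteration.  Build each draw's binary
    adjacency matrix, average them to estimate pairwise co-clustering
    probabilities, and return the draw minimizing the squared Frobenius
    distance to that average."""
    draws = np.asarray(draws)
    adj = [(d[:, None] == d[None, :]).astype(float) for d in draws]
    mean_adj = np.mean(adj, axis=0)          # estimated co-clustering probabilities
    losses = [np.sum((a - mean_adj) ** 2) for a in adj]
    return draws[int(np.argmin(losses))]
```

Because the estimate is itself one of the sampled partitions, the estimated number of clusters and cluster sizes follow immediately by counting labels in the returned vector.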
3. Comparison of image pairs
We will extend the sRPM-based analytical procedure of Section 2 to the comparison of two images. Suppose that the images, denoted by and , share the same pixel locations but possess possibly different intensities at multiple pixel locations. Thus, design set consists of regularly or irregularly arranged pixels that are common to both images. Image comparison is achieved by an intuitively appealing measure for image differences that is not only robust to random response variability of the large number of pixels but is also adaptive to spatial correlation. Furthermore, the comparisons utilize only cluster-related information extracted from , as described in Section 2. Depending on the degree to which the two images are different, the marginal posterior of this measure is used to call the images “similar” or “different” using the posterior probabilities of competing hypotheses.
Propagation of information from image to .
One of the key features of the Bayesian paradigm is its natural ability to borrow strength from comparable or somewhat similar data. Applied to the image comparison problem, this implies that conditional on image , the posterior of plays the role of the prior for . For example, the pixels in image may join any of the clusters already discovered in image (if the pixels belong to regions of design space where the two images are somewhat similar) or may join a new set of clusters unique to image (if the pixels belong to regions where the two images are very different). If is associated with clusters, then because we condition on the clusters previously discovered in . The first of these clusters are shared with whereas the remaining clusters are unique to . In the extreme situation where the images are very different, , and none of the clusters of are occupied in . At the other extreme, when the two images are nearly identical, , and only the clusters of are occupied by the pixels of . Consequently, the latent spatial knots and bandwidths of the image clusters have the following independent normal-inverse Wishart (NIW) conditional priors:
| (4) |
where the components of the NIW hyperparameter are described in equation (2) and estimated using the Gibbs sampler for image . Extending the earlier notation for allocation variables, for , let represent the event that pixel of image belongs to cluster , and let be the set of pixel locations belonging to cluster . It can be shown that the conditional prior (4) gives the following expression for the aggregated attraction function:
| (5) |
The new quantities in equation (5) are defined as follows. Based on the assigned pixels, the updated NIW prior has hyperparameters with components
| (6) |
These quantities rely on the following cluster-related summaries from image : (i) number of pixels, , allocated to cluster by the allocation vector , (ii) pixel location averages, , and (iii) matrix error sum of squares for the pixel locations, , for . Then, the conditional sRPM for the pixel-cluster allocations of image becomes
| (7) |
where for , i.e., the new clusters of were empty in . Analogously to image , the intensity or response vector is distributed as . For , the prior for mean vector coincides with its posterior from image . For the new clusters indexed by , the mean vectors have the previously specified no-data prior, .
A quantitative measure for image comparisons.
Using the parameters of the conditional model for image , specifically, the random number of pixels of allocated to the spatial clusters, we define the differential pixel proportion of image to be
| (8) |
where is the indicator function. This quantity measures the relative area or volume of the regions in where images and are different; values near 0 (1) are indicative of similar (dissimilar) image pairs. Since the pixel-cluster allocations are a denoised version of the image after accounting for spatial correlation, this advantage is inherited by the measure . The image comparison problem can then be formulated as a single-parameter hypothesis test:
| (9) |
for some prespecified threshold . For image comparisons, the key parameter of interest is , i.e. the posterior probability of hypothesis . The two images are declared different (similar) if this posterior probability does (not) exceed 0.5.
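Under one plausible reading of the differential pixel proportion, namely the fraction of image-2 pixels allocated to clusters not shared with image 1, the statistic and the posterior probability of the alternative hypothesis can be estimated from MCMC draws as sketched below. The indicator used here and all names are assumptions for illustration, since the exact form is given by equation (8):

```python
import numpy as np

def differential_proportion(alloc2, k1):
    """Hypothetical version of the differential pixel proportion: the
    fraction of image-2 pixels whose cluster label exceeds k1, i.e.
    pixels in clusters that did not appear in image 1."""
    return float(np.mean(np.asarray(alloc2) > k1))

def posterior_prob_different(draws, k1, threshold):
    """MCMC estimate of the posterior probability of H1 in test (9):
    the fraction of posterior allocation draws whose differential
    proportion exceeds the prespecified threshold."""
    props = np.array([differential_proportion(d, k1) for d in draws])
    return float(np.mean(props > threshold))
```

The pair of images would then be declared different when this estimated posterior probability exceeds 0.5, matching the decision rule stated above.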
3.1. Posterior inferences for differential proportion
Due to the computational costs of analyzing high-dimensional images, we perform combined MCMC analysis and hypothesis testing of the two images in separate stages:
Stage 1: First, image is analyzed using the computationally efficient MCMC strategy described in Section 2.2, yielding , an estimate of the pixel-cluster allocation vector of . From the estimated allocations, we compute the estimated number of clusters and the estimated cluster sizes .
Stage 2a: Assuming all parameters specific to image to be equal to their posterior estimates, image is analyzed using the Gibbs sampler of Ishwaran and James (2001) applied to the conditional sRPM (7). The Monte Carlo scheme is analogous to that for , and the computational costs of the two MCMC samplers are identical.
Stage 2b: For the Gibbs sampling update of the parameters, where , let denote the sizes of the clusters. Then represents the MCMC draw for the parameter . These values estimate the marginal posterior density of , and the estimated posterior probability is immediately available. Images and are called different or similar depending on whether or not exceeds 0.5.
4. Simulation studies
To investigate the accuracy of the sRPM procedure in labeling images as “similar” or “different,” we generated 500 sets of truly similar and truly different image pairs (i.e., 1,000 images in total), with each monochrome image consisting of 100×100 pixels. For each image, the pixel signals were generated from an adaptation of the Potts model (Hurn et al., 2003) with hidden states on the square lattice. A Potts model induces spatial correlation in the pixel intensities in a very different manner than sRPM. For hidden states , the model induces spatial correlation in the hidden states by borrowing information from neighboring pixels. Let be the pixel indices surrounding pixel , so that has 8, 5, or 3 elements depending on whether the pixel belongs to the image interior, an edge, or a corner. Specifically, , where , with being a similarity measure between hidden states and in a diagonally dominant matrix, . The similarity measures were related to the pairwise distances of randomly generated points on the square lattice. The normalizing constant is denoted by . Due to its intractability, we generated a sample from the Potts density as the last draw of the corresponding Gibbs sampler.
The hidden signals associated with the hidden states were generated i.i.d. from the standard normal distribution. Finally, zero-mean Gaussian noise was added to the pixel-specific hidden signals to obtain the continuous image intensities, which were then standardized to belong to the interval [0, 1]. Truly similar images shared the same hidden signals but differed in the pixel-specific noise. Truly different images not only differed in pixel-specific noise, but also had slightly different similarity measures in the Potts densities.
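The generation of the hidden states can be sketched with a single-site Gibbs sampler for a Potts-type model, in which each pixel's state is resampled conditionally on its 8-pixel neighborhood with probability proportional to the exponentiated summed similarities. The similarity matrix and settings below are illustrative, not the paper's exact simulation design:

```python
import numpy as np

rng = np.random.default_rng(1)

def potts_gibbs(m, n_states, sim, n_sweeps=20):
    """Single-site Gibbs sampler on an m x m lattice with an 8-pixel
    neighborhood: P(state k at a pixel) is proportional to
    exp(sum over neighbors j of sim[k, state_j])."""
    s = rng.integers(n_states, size=(m, m))
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
            (0, 1), (1, -1), (1, 0), (1, 1)]
    for _ in range(n_sweeps):
        for i in range(m):
            for j in range(m):
                logp = np.zeros(n_states)
                for di, dj in offs:          # interior: 8 neighbors; edge: 5; corner: 3
                    a, b = i + di, j + dj
                    if 0 <= a < m and 0 <= b < m:
                        logp += sim[:, s[a, b]]
                p = np.exp(logp - logp.max())  # stabilize before normalizing
                s[i, j] = rng.choice(n_states, p=p / p.sum())
    return s

# A strongly diagonal similarity matrix encourages neighbors to share states.
sim = np.eye(3) * 2.0
field = potts_gibbs(20, 3, sim, n_sweeps=10)
```

With a diagonally dominant similarity matrix, neighboring pixels agree far more often than under independent sampling, producing the spatially coherent hidden-state fields described above.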
Following image generation, the posterior inference strategy outlined in Section 3 was implemented to analyze each image pair using the sRPM technique and call each image pair similar or different. Averaging over the 500 datasets, the blue line in Figure 4 displays the receiver operating characteristic (ROC) curve of accurate calls, along with 95% confidence bands, as the similarity threshold was increased from 0 to 1. An area under the curve (AUC) close to 1 is indicative of a method’s reliability. The estimated AUC for sRPM was 0.928, with a 95% confidence interval of (0.914, 0.942).
Figure 4:

Estimated ROC curves of the sRPM and Feng and Qiu (2018) approaches for the 500 artificial datasets of the simulation study. The dotted lines represent 95% confidence bands for the ROC curves.
Minimizing the distance from the ROC curve to the upper left corner of the square, we obtained the optimal threshold value of . The percentages of correct and incorrect calls for the optimal are displayed in Table 1. In extensive simulation studies, we have obtained reliable inferences for threshold values in the interval [0.1, 0.2]. For image comparison in real datasets, we recommend selecting the threshold of hypothesis test (9) in this range. For comparison, we analyzed the same datasets using the image comparison approach based on continuity regions (Feng and Qiu, 2018). Averaging over the 500 datasets, the red line in Figure 4 displays that method’s ROC curve with 95% confidence bands. For the Feng and Qiu (2018) method, the estimated AUC was 0.801, with a 95% confidence interval of (0.770, 0.831), demonstrating the significantly greater effectiveness of sRPM in image comparisons. Minimizing the distance of the competing method’s ROC curve to the upper left corner of the square in Figure 4, Table 1 also displays the optimal percentages of correct and incorrect image pair calls for the Feng and Qiu (2018) approach. These results reveal that the sRPM method has approximately balanced levels of sensitivity (i.e., ability to detect differences) and specificity (i.e., ability to detect similarities), whereas the Feng and Qiu (2018) strategy has significantly lower sensitivity and specificity. These findings also indicate that, at least for the types of images investigated here, the sRPM method is more conservative than the Feng and Qiu (2018) strategy in rejecting the null hypothesis of image similarity.
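The optimal-threshold computation, minimizing the distance of an ROC point from the ideal upper left corner, can be sketched as follows; the input scores and names are illustrative:

```python
import numpy as np

def optimal_threshold(scores_diff, scores_sim, thresholds):
    """For each candidate threshold, compute the true positive rate
    (truly different pairs called different) and the false positive
    rate (truly similar pairs called different), then return the
    threshold minimizing the Euclidean distance of the ROC point
    (FPR, TPR) from the ideal corner (0, 1)."""
    best, best_dist = None, np.inf
    for t in thresholds:
        tpr = np.mean(np.asarray(scores_diff) > t)
        fpr = np.mean(np.asarray(scores_sim) > t)
        dist = np.hypot(fpr, 1.0 - tpr)
        if dist < best_dist:
            best, best_dist = t, dist
    return best
```

A threshold that separates the two score distributions cleanly yields an ROC point near (0, 1) and is therefore selected.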
Table 1:
Averaging over the 500 artificial image pairs of the simulation study, the optimal detection accuracies of the sRPM and Feng and Qiu (2018) methods for image comparison. See the text for further explanation. Shown in parentheses are standard errors.
sRPM

| Detected \ Truth | Similar | Different |
|---|---|---|
| Similar | 86.4% (1.5%) | 13.6% (1.5%) |
| Different | 14.0% (1.6%) | 86.0% (1.6%) |

Feng and Qiu (2018)

| Detected \ Truth | Similar | Different |
|---|---|---|
| Similar | 78.4% (1.8%) | 21.6% (1.8%) |
| Different | 29.8% (2.0%) | 70.2% (2.0%) |
5. Analysis of satellite images
Chicago area images
Figure 5 displays monochrome satellite images of the Chicago area taken in 1990 and in 1999. The data are available at https://users.phhp.ufl.edu/pqiu/research/book/data/index.html. From an informal visual comparison, we find that the two images have a few differences. Over the 9-year interval between the images, there appear to be more dark spots in the more recent (1999) image. These changes may be due to environmental changes or new construction. Some dark spots appear in both images, although their sizes have changed, e.g., in the lower portions of the images. Most of the differences between the images are fairly small in magnitude, difficult to describe using a parametric model, and mostly local, affecting only small portions of the images.
Figure 5:

Reference satellite image of the Chicago area taken in 1990, and satellite image of the same area taken in 1999.
For calling the two images similar or different, we focused on the differential pixel proportion defined in equation (8) and performed hypothesis test (9) to obtain an MCMC estimate of the posterior probability that the proportion exceeds a prespecified threshold. As mentioned earlier, extensive simulation studies have suggested choosing the threshold in the interval [0.1, 0.2] to obtain reliable inferences. In general, the two images were declared different or similar depending on whether or not the posterior probability estimate exceeded 0.5.
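The decision rule can be sketched in a few lines of Python; the posterior draws below are hypothetical and merely illustrate the Monte Carlo estimate of the exceedance probability.

```python
# Sketch of the decision rule: given MCMC draws of the differential pixel
# proportion from its marginal posterior, estimate the posterior probability
# that the proportion exceeds a prespecified threshold, and call the image
# pair "different" when that estimate exceeds 0.5.

def call_image_pair(draws, threshold):
    """Monte Carlo estimate of P(proportion > threshold | data) and the call."""
    prob = sum(1 for d in draws if d > threshold) / len(draws)
    return ("different" if prob > 0.5 else "similar"), prob

# Hypothetical posterior draws concentrated near 0.45.
draws = [0.41, 0.44, 0.46, 0.47, 0.43, 0.48, 0.45, 0.42, 0.46, 0.44]
decision, prob = call_image_pair(draws, threshold=0.15)  # -> ("different", 1.0)
```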
For the Chicago area images, the posterior inference procedure outlined in Section 3.1 was implemented to obtain the marginal posterior of the differential pixel proportion. An MCMC estimate of the marginal posterior is shown in the left panel of Figure 6. The right panel of Figure 6 plots the estimated posterior probability as the threshold varies over the range [0, 0.5]. We find that, as the threshold increases, the posterior probability stays level at approximately 1 before falling sharply as the threshold approaches 0.5. That is, irrespective of the prespecified threshold, the estimated posterior probability exceeds 0.99, corresponding to a Bayes factor greater than 99 if we assign equal prior probabilities to the competing hypotheses. This is decisive evidence that the Chicago area satellite images are different.
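Under equal prior probabilities on the two hypotheses, the Bayes factor in favor of the alternative equals the posterior odds, a one-line computation:

```python
# With equal prior probabilities, posterior odds = Bayes factor:
# BF = p / (1 - p), so a posterior probability exceeding 0.99 implies
# a Bayes factor above 99.

def bayes_factor(posterior_prob):
    return posterior_prob / (1.0 - posterior_prob)

bf = bayes_factor(0.99)   # -> 99.0, up to floating-point rounding
```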
Figure 6:

For the Chicago area images, the left panel displays an MCMC estimate of the marginal posterior of the differential pixel proportion. The right panel displays an estimate of the posterior probability for different values of the threshold.
Bay Area images
Figure 7 displays two satellite images of the San Francisco Bay Area taken in 1990 and 1999. Figure 1 of the Supplementary Material displays the estimated marginal posterior of the differential pixel proportion. We find negligible posterior mass assigned to values smaller than 0.48. Consequently, the estimated posterior probability was equal to 1 for any prespecified threshold, which is overwhelming evidence that the two satellite images are different.
Figure 7:

In the left panel, satellite image of the San Francisco Bay Area, taken in 1990. In the right panel, satellite image of the same region, taken in 1999.
6. Discussion
There is a critical need for statistical techniques that can reliably identify “similar” or “different” image pairs depending on whether random measurement error can account for the pixel-wise differences between the images. We have developed a Bayesian strategy relying on a novel extension of Dirichlet processes called the spatial random partition model (sRPM). This process achieves dimension reduction while accounting for spatial correlation by grouping proximal image pixels with similar intensities into clusters. The extracted image information is then utilized for image comparison via a single-parameter hypothesis test based on a univariate metric that is adaptive to pixel intensity variation and spatial correlation. We have also devised a computationally efficient two-stage MCMC technique. A simulation study analyzes artificial datasets and finds compelling evidence for the high sensitivity and specificity of sRPM. The success of sRPM is demonstrated by analyzing satellite image data. R code implementing the method is available at https://github.com/sguha-lab/sRPM.
The nonparametric nature of sRPM makes it highly effective in analyzing images with diffuse spatially differentiated regions. However, the method may be challenged by images with sharply demarcated features (e.g., text images) or jagged images with steep slopes. In applications where a sequence of multiple images is available from a longitudinal process, interest often focuses on detecting temporal change points, if any, at which the longitudinal process diverges beyond acceptable limits of variation. We are currently developing natural extensions of the proposed sRPM analytical framework in which, at any point in the image sequence, only cluster-related information progressively extracted from the preceding images is used to determine whether the current time point is a change point. It is often known that the first few images of the sequence correspond to an “in-control” condition of the longitudinal process. In these situations, we are devising strategies that exploit this information to further improve the inferential accuracy of monitoring image sequences. The MCMC strategy of Ishwaran and James (2001) is adequate for the image sizes of this paper. However, for ultrahigh-dimensional images, this algorithm is computationally intensive. We are exploring whether the Metropolis-Hastings algorithm proposed by Guha (2010) could be applied to significantly reduce computational times while ensuring that the posterior mixing rates and effective sample sizes (ESS), in the sense of Turek et al. (2017), are satisfactory.
Supplementary Material
Figure 3:

The image in Figure 1 and its cluster characteristics detected by the sRPM technique.
Acknowledgements
This work was supported by NSF grants DMS-1854003 awarded to SG and DMS-1914639 awarded to PQ, and by NIH awards R01 CA269398 and U01 CA209414 to SG. We thank the Associate Editor and two anonymous referees for many insightful remarks that improved the paper’s focus and content.
Footnotes
Conflict of Interest
The authors report there are no competing interests to declare.
References
- Chatzis, S. P. and Tsechpenakis, G. (2010), “The infinite hidden Markov random field model,” IEEE Transactions on Neural Networks, 21(6), 1004–1014.
- Dahl, D. B. (2006), Model-Based Clustering for Expression Data via a Dirichlet Process Mixture Model. Cambridge University Press.
- Duan, J. A., Guindani, M., and Gelfand, A. E. (2007), “Generalized spatial Dirichlet process models,” Biometrika, 94, 809–825.
- Duchesne, C., Liu, J., and MacGregor, J. (2012), “Multivariate image analysis in the process industries: A review,” Chemometrics and Intelligent Laboratory Systems, 117, 116–128.
- Dunson, D. B. and Park, J.-H. (2008), “Kernel stick-breaking processes,” Biometrika, 95, 307–323.
- Feng, L. and Qiu, P. (2018), “Difference detection between two images for image monitoring,” Technometrics, 60(3), 345–359.
- Ferguson, T. S. (1973), “A Bayesian analysis of some nonparametric problems,” The Annals of Statistics, 1, 209–230.
- Gelfand, A., Kottas, A., and MacEachern, S. (2005), “Bayesian nonparametric spatial modeling with Dirichlet process mixing,” Journal of the American Statistical Association, 100, 1021–1035.
- Ghosal, S., Ghosh, J. K., and Ramamoorthi, R. V. (1999), “Posterior consistency of Dirichlet mixtures in density estimation,” The Annals of Statistics, 27, 143–158.
- Gonzalez, R. C. (2009), Digital Image Processing, Pearson Education India.
- Griffin, J. E. and Steel, M. F. J. (2006), “Order-based dependent Dirichlet processes,” Journal of the American Statistical Association, 101, 179–194.
- Guha, S. (2010), “Posterior simulation in countable mixture models for large datasets,” Journal of the American Statistical Association, 105, 775–786.
- Guo, X. and Zhang, C. M. (2015), “Difference detection between two images for image monitoring,” Statistica Sinica, 25, 475–498.
- Guo, Y., Jia, X., and Paull, D. (2018), “Effective sequential classifier training for SVM-based multitemporal remote sensing image classification,” IEEE Transactions on Image Processing, 27, 475–498.
- Hurn, M. A., Husby, O. K., and Rue, H. (2003), A Tutorial on Image Analysis. Springer, New York, NY.
- Ishwaran, H. and James, L. F. (2001), “Gibbs sampling methods for stick-breaking priors,” Journal of the American Statistical Association, 96(453), 161–173.
- Jiang, W., Han, S. W., Tsui, K. L., and Woodall, W. H. (2011), “Spatiotemporal surveillance methods in the presence of spatial correlation,” Statistics in Medicine, 30, 569–583.
- Julea, A., Meger, N., Bolon, P., Rigotti, C., Doin, M. P., Lasserre, C., Trouve, E., and Lazarescu, V. N. (2011), “Unsupervised spatiotemporal mining of satellite image time series using grouped frequent sequential patterns,” IEEE Transactions on Geoscience and Remote Sensing, 49, 1417–1430.
- Kim, S., Tadesse, M. G., and Vannucci, M. (2006), “Variable selection in clustering via Dirichlet process mixture models,” Biometrika, 93, 877–893.
- Koosha, M., Noorossana, R., and Megahed, F. (2017), “Statistical process monitoring via image data using wavelets,” Quality and Reliability Engineering International, 33(8), 2059–2073.
- Lijoi, A. and Prünster, I. (2010), “Models beyond the Dirichlet process,” in Bayesian Nonparametrics, eds. Hjort, N. L., Holmes, C., Müller, P., and Walker, S. G., Cambridge, U.K.: Cambridge University Press, pp. 80–136.
- Lin, H. D., Chung, C. Y., and Lin, W. T. (2008), “Principal component analysis based on wavelet characteristics applied to automated surface defect inspection,” WSEAS Transactions on Computer Research, 3, 193–202.
- Lindquist, M. A. (2008), “The statistical analysis of fMRI data,” Statistical Science, 23, 439–464.
- Megahed, F. M., Wells, L. J., Camelio, J. A., and Woodall, W. H. (2012), “A spatiotemporal method for the monitoring of image data,” Quality and Reliability Engineering International, 28, 967–980.
- Megahed, F. M., Woodall, W. H., and Camelio, J. A. (2011), “A review and perspective on control charting with image data,” Journal of Quality Technology, 43, 83–98.
- Müller, P., Quintana, F., and Rosner, G. L. (2011), “A product partition model with regression on covariates,” Journal of Computational and Graphical Statistics, 20, 260–278.
- Na, I. S., Cho, W. H., Kim, S. H., Kang, S. J., Nguyen, T. Q., and Choi, J. Y. (2013), “Unsupervised color image segmentation using spatial hidden MRF GDPM model,” in Proceedings of the 7th International Conference on Ubiquitous Information Management and Communication, 1–8.
- Prats-Montalban, J. M. and Ferrer, A. (2014), “Statistical process control based on multivariate image analysis: A new proposal for monitoring and defect detection,” Computers and Chemical Engineering, 71, 501–511.
- Qiu, P. (2005), Image Processing and Jump Regression Analysis, John Wiley & Sons.
- Qiu, P. (2014), Introduction to Statistical Process Control, Chapman & Hall/CRC.
- Qiu, P. (2020), “Big data? Statistical process control can help,” The American Statistician, DOI: 10.1080/00031305.2019.1700163.
- Reich, B. J. and Bondell, H. D. (2011), “A spatial Dirichlet process mixture model for clustering population genetics data,” Biometrics, 67(2), 381–390.
- Reich, B. J. and Fuentes, M. (2015), Spatial Bayesian Nonparametric Methods. Springer International Publishing.
- Rodriguez, A., Dunson, D. B., and Gelfand, A. E. (2010), “Latent stick-breaking processes,” Journal of the American Statistical Association, 105, 647–659.
- Roy, A. and Mukherjee, P. S. (2024), “Image comparison based on local pixel clustering,” Technometrics, 1–12.
- Turek, D., de Valpine, P., Paciorek, C. J., and Anderson-Bergman, C. (2017), “Automated parameter blocking for efficient Markov chain Monte Carlo sampling,” Bayesian Analysis, 12(2), 465–490.
- Wang, X.-F. and Ye, D. (2010), “On nonparametric comparison of images and regression surfaces,” Journal of Statistical Planning and Inference, 140(10), 2875–2884.
- Yan, H., Paynabar, K., and Shi, J. (2015), “Image-based process monitoring using low-rank tensor decomposition,” IEEE Transactions on Automation Science and Engineering, 12, 216–227.