Abstract
Motivated by the problem of detecting changes in two-dimensional X-ray diffraction data, we propose a Bayesian spatial model for sparse signal detection in image data. Our model places considerable mass near zero and has heavy tails to reflect the prior belief that the image signal is zero for most pixels and large for an important subset. We show that the spatial prior places mass on nearby locations simultaneously being zero, and also allows for nearby locations to simultaneously be large signals. The form of the prior also facilitates efficient computing for large images. We conduct a simulation study to evaluate the properties of the proposed prior and show that it outperforms other spatial models. We apply our method in the analysis of X-ray diffraction data from a two-dimensional area detector to detect changes in the pattern when the material is exposed to an electric field.
Keywords: Bayesian variable selection, High-dimensional data, Image analysis, X-ray diffraction
1. Introduction
X-ray diffraction (XRD) is a powerful technique to characterize the atomic structure of a material. There are nearly 300,000 structures cataloged in the International Center for Diffraction Data (Thomas, 2012; Editorial, 2014). For example, we analyze the XRD pattern of lead zirconate titanate (PZT). PZT is a useful functional material known as a piezoelectric. As a piezoelectric, PZT changes shape when an electric field is applied, making it useful for actuation, positional control and energy conversion. The material we investigate in the present study is a commercially available PZT polycrystalline ceramic material known by the tradename K350 (Piezo Technologies). This material has been studied previously under varying temperature, electric field, pressure and deformation, making it an established material (e.g., Katrusiak, 2008; Dutta and Singh, 2011; Gorfman et al., 2011; Esteves et al., 2015) that is suitable for the present investigation. The data in the present experiment are acquired using a two-dimensional area detector that measures diffracted intensity; this method is referred to as 2-D XRD. The data are shown in Figure 1 and described in detail in Section 2. In a 2-D XRD image, each pixel represents the z-score under an electric field relative to several baseline images. Few pixels have z-scores greater than two in absolute value, and those that do tend to cluster in rings indicative of atomic structure. Therefore, we analyze these data using sparse spatial signal detection methods. Our objective is to develop a more powerful statistical method for detecting and quantifying structural changes in 2-D XRD patterns under different experimental conditions.
Figure 1:
The map of the standardized intensity Y(s) (left) at time t = 20 seconds under the maximum electric field of 2 kV/mm, computed using the sample mean and sample standard deviation of the 100 baseline images taken without electric field, and the smoothed Y(s) (right) obtained using a Gaussian kernel with bandwidth 0.75 pixels.
The statistical problem of identifying spatial regions affected by an experimental factor has many applications, including epidemiology, neuroscience and materials science. As a result, there is a rich literature on the topic. The spatial scan statistic (Naus, 1965; Kulldorff, 1997; Costa and Kulldorff, 2009; Liao et al., 2017) searches for regions with different means than the background, but does not estimate the signal and is not well suited for large images with many significant subregions. The two-dimensional fused lasso (Friedman et al., 2007), the graphical lasso (Friedman et al., 2008) and the smooth-sparse decomposition method (Yan et al., 2017) are penalized regression methods that account for spatial structure in the signal using penalties that encourage spatial smoothness. Yan et al. (2017) combine denoising and signal detection into one step for images with a smooth background and preserve sharp boundaries, which traditional two-step procedures (Qiu and Yandell, 1997; Bradley and Roth, 2007; Soille, 2013) do not achieve. Spatial wavelet shrinkage methods impose a threshold on coefficients in the wavelet domain to recover a sparse signal (Donoho and Johnstone, 1994; Taswell, 2000; Jansen, 2001; Yadav et al., 2014; He and Wang, 2017). These regularization methods can be applied to high-dimensional data, but they require the tuning parameters to be preset, e.g., via cross-validation (Mallick and Yi, 2013), and fail to account for all sources of uncertainty.
Our approach builds on Bayesian variable selection methods. This allows us to fully account for uncertainty in the posterior distribution and incorporate known atomic structure in the prior. An intuitive sparse prior is a two-component mixture (spike and slab) with one component concentrated near zero for the unimportant features and the other diffuse for the signals (Mitchell and Beauchamp, 1988; George and McCulloch, 1993; Yuan and Lin, 2005; Geweke, 1996; George and McCulloch, 1997; O'Hara and Sillanpää, 2009; Ročková and George, 2018). Two-component mixture priors can be extended to the spatial setting to identify subregions of interest (Goldsmith et al., 2014; Boehm Vock et al., 2015; Li et al., 2015; Kang et al., 2018). However, the two-component construction is computationally challenging because posterior sampling requires a search over a considerable space of complex models and is plagued by slow convergence and poor mixing (Carvalho et al., 2010; Johnson and Rossell, 2012; Mallick and Yi, 2013).
The computational difficulties of spike-and-slab priors are abated by continuous shrinkage priors (Carvalho et al., 2010; Griffin and Brown, 2010; Armagan et al., 2013; Bhattacharya et al., 2014; Bhadra et al., 2016b; Piironen and Vehtari, 2016). Rather than a discrete mixture over two components, these priors continuously mix over shrinkage parameters. For example, Carvalho et al. (2010) proposed the horseshoe prior that assumes a normal prior with mean zero and standard deviation that follows a half-Cauchy prior. The horseshoe prior has high concentration around zero for sparsity, heavy tails to avoid excessive shrinkage of signals and attractive theoretical properties (Datta and Ghosh, 2013; van der Pas et al., 2014; Bhadra et al., 2016a; van der Pas et al., 2017; Zhang et al., 2017).
To our knowledge, we propose the first continuous shrinkage prior for spatial data. We extend the horseshoe prior to account for spatial dependence in the signal at nearby observations. We prove that the proposed spatial horseshoe prior has the univariate horseshoe prior marginally at each location, and that the induced joint distribution for pairs of nearby sites has higher concentration around zero and heavier tails compared to independent horseshoe priors. The form of the continuous shrinkage prior permits simple expressions for the gradients of the posterior, and thus we use the Hamiltonian Monte Carlo (HMC) algorithm (Neal, 1994) for efficient sampling in high-dimensions. A simulation study demonstrates that the proposed method is effective at identifying spatial signals. When applied to the 2-D XRD data, we find improved cross-validation performance compared to other methods.
The paper is organized as follows. We describe the data in Section 2. In Sections 3 and 4, we introduce the proposed shrinkage prior and show its theoretical properties. In Section 5, we evaluate the proposed prior through the simulation study. In Section 6, we apply our method to the 2-D XRD data and compare with other methods. We conclude with remarks and comments in Section 7.
2. Description of the 2-D XRD data
The 2-D XRD data were acquired using high-energy X-rays generated at a synchrotron source (Advanced Photon Source, Argonne National Laboratory). Beamline 11-ID-C was used with an energy of 105 keV. The sample was located in the X-ray beam and a silicon-based detector was placed in the transmitted direction, approximately 2 meters from the sample position. Electrical connections were made from a high voltage power supply to the top conductive electrode of the PZT sample, and the bottom electrode was grounded. 2-D XRD patterns were measured at sequentially increasing and then decreasing electric field amplitudes from −2 kV/mm to 2 kV/mm. The objective is to examine whether introducing an electric field changes the structure of the PZT sample and to capture the diffraction pattern. The detector records data over time, yielding images that are 2048 × 2048 pixel matrices. The XRD intensity, measured in counts per second (cps), indicates the relative abundance at each pixel. In addition, we transform the image of size 2048 × 2048 into size 1023 × 1023 by removing the first and last rows, which consist of all zeros due to the experimental setup, and then extracting all odd rows and columns. This reduction is reasonable because the intensity at each pixel is close to that of its neighbors, and the diffraction pattern remains apparent. (See Supplementary Materials A for the data before and after this reduction.)
We compare the diffraction pattern of PZT with electric field to that without electric field and investigate the locations where changes occur. There are 100 baseline images recorded without electric field. We define the response Y(s) as the standardized intensity, Y(s) = {X(s) − μ̂(s)}/σ̂(s), where X(s) is the intensity at pixel s, μ̂(s) is the sample mean of the 100 baseline images at pixel s, and σ̂(s) is the sample standard deviation of the 100 baseline images at pixel s. The yellow rings in Figure 1 indicate larger changes in the diffraction pattern at t = 20 seconds. We choose the image at time t = 20 seconds since this is when the maximum electric field is applied. The main features in Figure 1 are the rings centered on the middle of the image. In Sections 3 and 6, we express the spatial shrinkage process in polar coordinates, similar in spirit to a Radon transform (Radon, 1986), to accommodate these annular features. The Radon transform has been used successfully for image analysis problems related to motion detection (Carretero-Moya et al., 2009; Xu et al., 2011), deblurring (Cho et al., 2011) and classification (Kinoshita et al., 2008; Deepak and Sivaswamy, 2011; Acharya et al., 2016).
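To make the preprocessing concrete, the sketch below computes the standardized intensity and the image reduction described above. The function name, array shapes and exact downsampling indices are assumptions for illustration; this is not the code used for the analysis.

```python
import numpy as np

def standardize_and_downsample(baseline, field_img):
    """Standardize one diffraction image against the baseline images and downsample.

    baseline : array of shape (100, 2048, 2048), images without electric field
    field_img: array of shape (2048, 2048), image under the applied electric field
    Returns the z-score image Y(s) reduced to 1023 x 1023 pixels.
    """
    mu_hat = baseline.mean(axis=0)           # pixelwise sample mean of the baselines
    sd_hat = baseline.std(axis=0, ddof=1)    # pixelwise sample standard deviation
    z = (field_img - mu_hat) / sd_hat        # standardized intensity Y(s)
    # Drop the all-zero first/last rows from the experimental setup and keep every
    # other row and column (the exact indexing is an assumption for illustration).
    return z[1:-1:2, 1:-1:2]
```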
3. Model description
3.1. The univariate horseshoe prior
Define Y as the response variable and β as the signal variable. The likelihood for Y with horseshoe prior for β is
Y | β ~ Normal(β, σ²),   β | λ ~ Normal(0, τ²λ²),   λ ~ C+(0, 1),   (1)
where σ² is the error variance, τ² is a global scale parameter, and λ follows the standard half-Cauchy distribution on the positive reals. If σ² = 1, the posterior mean is
E(β | Y) = {1 − E(κ | Y)} Y,   (2)
where κ = 1/(1 + τ²λ²) is the shrinkage coefficient. E(κ | Y) determines the amount of shrinkage towards zero. The name of the prior comes from the horseshoe-shaped Beta distribution on the shrinkage coefficient κ that is induced by the half-Cauchy distribution on λ. The shape of the shrinkage density implies that the shrinkage coefficient is close to either zero or one with high probability, which shrinks null signals towards zero and avoids shrinking the true signals. This property facilitates separating signals from the noise. Although the prior for β, marginally over λ, does not have a closed-form probability density function, it enjoys attractive theoretical properties including tight bounds, high concentration near zero and heavy tails (Carvalho et al., 2010).
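As a concrete illustration of this shrinkage profile, the following minimal sketch (assuming σ² = τ² = 1; not the authors' code) draws λ from the standard half-Cauchy prior and examines the induced shrinkage coefficient:

```python
import numpy as np
from scipy import stats

# Minimal sketch with sigma^2 = tau^2 = 1: draw lambda from the standard half-Cauchy
# prior and inspect the induced shrinkage coefficient kappa = 1 / (1 + lambda^2),
# which has the horseshoe shape described above.
lam = stats.halfcauchy.rvs(size=100_000, random_state=1)
kappa = 1.0 / (1.0 + lam**2)

# Mass piles up near both ends: kappa near 1 means strong shrinkage (noise),
# kappa near 0 means almost no shrinkage (signal).
print(np.mean(kappa > 0.95), np.mean(kappa < 0.05))   # both roughly 0.14

# The corresponding posterior mean of beta for an observation y is (1 - E[kappa | y]) y.
```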
3.2. The spatial horseshoe prior (SHP)
To extend the horseshoe prior to the spatial setting, define Y(s) as the real-valued response at a spatial location s = (s1, s2) ∈ ℝ² and β(s) as the true signal at s. For data observed at locations s1, ..., sn, define Y = [Y(s1), ..., Y(sn)]T as the n × 1 response vector and β = [β(s1), ..., β(sn)]T as the signal vector. Extending (1), let
Y | β ~ Normal(β, σ²In),   β | λ ~ Normal(0, ΛΣβΛ),   (3)
where σ² is the error variance, In is the identity matrix of size n, Λ is the diagonal matrix with diagonal elements λ = [λ(s1), ..., λ(sn)]T, and Σβ is a spatial covariance matrix. Independence of the responses Y given the signals β is a strong assumption, but it is justified for the 2-D XRD data: physics dictates that the photons follow a Poisson process, so the numbers of photons that land in disjoint regions are independent of each other, and because the photon detectors are identical and operate independently, this independence should carry through to the observed data.
Spatial shrinkage is induced by the spatial process model for λ(s). We propose a Gaussian copula model (Nelsen, 2006) that preserves the marginal half-Cauchy distribution and captures spatial dependence,
λ(s) = λ0 l[θ(s)],   l(θ) = F⁻¹{Φ(θ)},   (4)
where λ0 is a global scale parameter, θ(s) is a latent spatial Gaussian process with mean zero and variance one for all s, l(·) is the half-Cauchy link function, F⁻¹ is the inverse cumulative distribution function of the standard half-Cauchy distribution, and Φ(·) is the standard normal cumulative distribution function. The half-Cauchy link function provides a marginal half-Cauchy distribution for l[θ(s)] as long as E[θ(s)] = 0 and Var[θ(s)] = 1. We interpret l[θ(s)] as a local shrinkage parameter and λ0 as a global shrinkage parameter.
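A minimal sketch of the half-Cauchy link, using standard SciPy distribution functions (illustration only):

```python
import numpy as np
from scipy import stats

# Sketch of the half-Cauchy link l(theta) = F^{-1}{Phi(theta)} in (4), where F is the
# standard half-Cauchy CDF and Phi the standard normal CDF (not the authors' code).
def half_cauchy_link(theta):
    return stats.halfcauchy.ppf(stats.norm.cdf(theta))

# If theta(s) has mean 0 and variance 1, l[theta(s)] is marginally standard
# half-Cauchy; a quick Monte Carlo check of two quantiles:
theta = np.random.default_rng(2).standard_normal(200_000)
lam_local = half_cauchy_link(theta)
print(np.quantile(lam_local, [0.5, 0.9]))   # empirical, approximately [1.0, 6.3]
print(stats.halfcauchy.ppf([0.5, 0.9]))     # theoretical quantiles
```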
The latent shrinkage process θ(s) could follow another Gaussian process prior with mean zero, variance one and spatial correlation. We also consider a low-rank representation of the latent shrinkage process
θ(s) = x1(s)b1 + x2(s)b2 + ··· + xJ(s)bJ,   (5)
where xj(s) is the jth basis function at location s and b1, ..., bJ are Gaussian coefficients with mean zero, scaled so that Var[θ(s)] = 1 for all s. The spatial model gives the same marginal distribution as (1), but with spatial dependence both in the distribution for β | λ and in the distribution for λ. We elaborate on the properties of this model in Section 4.
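The following sketch draws one realization of the low-rank shrinkage process on a small grid; the quadratic basis and the rescaling of θ(s) to mean zero and unit variance are simplifying assumptions for illustration rather than the exact specification used later:

```python
import numpy as np
from scipy import stats

# Illustrative draw of the low-rank shrinkage process (5) on a small grid.
rng = np.random.default_rng(3)
g = np.linspace(0.0, 1.0, 20)
s1, s2 = [c.ravel() for c in np.meshgrid(g, g, indexing="ij")]   # n = 400 locations

X = np.column_stack([np.ones_like(s1), s1, s2, s1**2, s2**2, s1 * s2])  # n x J basis
b = rng.standard_normal(X.shape[1])               # Gaussian basis coefficients
theta = X @ b
theta = (theta - theta.mean()) / theta.std()      # enforce mean 0, variance 1 (assumed)

lam0 = 1.0                                        # global shrinkage parameter
lam = lam0 * stats.halfcauchy.ppf(stats.norm.cdf(theta))  # lambda(s) = lambda0 * l[theta(s)]
print(lam.reshape(20, 20)[:3, :3])                # local shrinkage scales vary smoothly
```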
4. Theoretical properties
4.1. Properties of the spatial horseshoe prior
In this section we study the properties of the SHP for a pair of spatial locations as a function of the distance between the locations. We compare the SHP with the simple model with independent β and λ, namely β(si) | λ(si) ~ Normal(0, λ(si)²) and λ(si) ~ C+(0, 1) independently for i = 1, 2. The joint density of β in this special case is (Carvalho et al., 2010)
P0(β1, β2) = (2π³)⁻¹ exp{(β1² + β2²)/2} E1(β1²/2) E1(β2²/2),   (6)
where E1 (·) is the exponential integral function. Next, we consider the joint shrinkage parameter λ = λ(s1) =λ(s2) ~ C+(0,1) and the covariance Σβ with the diagonal elements 1 and the spatial correlation ρβ which stands in for the distance between the two locations. Then we obtain (Supplementary Materials B.1) the joint density
P1(β1, β2) = [√{π/(2Q)} − (π/2) exp(Q/2) erfc{√(Q/2)}] / {π²√(1 − ρβ²)},   (7)
where erfc(·) is the complementary error function and Q = (β1² − 2ρβ β1β2 + β2²)/(1 − ρβ²).
Figure 2 plots log[P0(β1, β2)] and log[P1(β1, β2)] for ρβ = 0, 0.5, 0.9. When ρβ = 0, the log density with a common shrinkage parameter has a sharper spike around zero and heavier tails than the log density with independent shrinkage parameters. Increasing ρβ concentrates the log density further along the 45-degree line.
Figure 2:
Log prior density of β1 and β2 in the first quadrant for (a) independent horseshoe priors (log P0) and (b)-(d) the spatial horseshoe prior (log P1) with ρβ ∈ {0, 0.5, 0.9}.
4.1.1. Concentration around zero
The densities P0(β1, β2) and P1(β1, β2) both have high concentration towards zero. It can be shown that P0(β1, β2) → ∞ and P1(β1, β2) → ∞ as (β1, β2) → (0, 0). Moreover, we show that the density with joint shrinkage concentrates towards zero more than the density with independent shrinkage by proving that the ratio of P1(β1, β2) to P0(β1, β2) diverges as β1 and β2 approach zero. Let R be the ratio,
R(β1, β2) = P1(β1, β2) / P0(β1, β2).   (8)
We show that R(β1, β2) → ∞ as (β1, β2) → (0, 0), with the divergence rate given in Supplementary Materials B.2. This indicates that joint sparsity is achieved more rapidly in P1(β1, β2) than in P0(β1, β2).
4.1.2. Tail behavior
We also investigate the ability to retain large signals by examining the tails of the densities with joint and independent shrinkage. In Figure 2, we observe that the tail density is lower in P0(β1, β2) than in P1(β1, β2) and becomes larger when the spatial dependence is stronger. The higher tail density implies a better capability to avoid excessive shrinkage. Theoretically, it is shown that R(β1, β2) → ∞ as β1, β2 → ∞, with the divergence rate given in Supplementary Materials B.3. This result confirms the heavier tails under joint shrinkage and its stronger ability to retain large signals.
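A simple Monte Carlo check of these two properties, simulating directly from the two bivariate priors (a sketch under the stated setup, not code from the paper):

```python
import numpy as np
from scipy import stats

# Contrast the two bivariate priors of Section 4.1: independent shrinkage (P0)
# versus a common shrinkage parameter with correlation rho_beta in Sigma_beta (P1).
rng = np.random.default_rng(4)
n, rho = 1_000_000, 0.9

# P0: beta_i | lambda_i ~ N(0, lambda_i^2), lambda_i ~ C+(0, 1), independently
lam_ind = stats.halfcauchy.rvs(size=(n, 2), random_state=5)
beta0 = lam_ind * rng.standard_normal((n, 2))

# P1: common lambda ~ C+(0, 1), beta | lambda ~ N(0, lambda^2 * Sigma_beta)
lam = stats.halfcauchy.rvs(size=n, random_state=6)
L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
beta1 = lam[:, None] * (rng.standard_normal((n, 2)) @ L.T)

eps, t = 0.05, 10.0
print("both near zero:",
      np.mean(np.all(np.abs(beta0) < eps, axis=1)),   # independent shrinkage
      np.mean(np.all(np.abs(beta1) < eps, axis=1)))   # common shrinkage (larger)
print("both in tails :",
      np.mean(np.all(np.abs(beta0) > t, axis=1)),
      np.mean(np.all(np.abs(beta1) > t, axis=1)))     # common shrinkage (larger)
```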
4.2. Properties of the spatial horseshoe posterior
Let the error variance σ² be 1; then the posterior mean can be expressed as E(β | Y, λ) = (I2 − K)Y, where the shrinkage matrix is K = (I2 + Ωβ)⁻¹ and Ωβ = ΛΣβΛ. To examine the joint shrinkage induced by the SHP, we consider the one-number complexity summary (Efron, 2004)
df = tr(I2 − K).   (9)
If λ1 = λ2 = λ, then
df = 2λ²{1 + λ²(1 − ρβ²)} / {(1 + λ²)² − λ⁴ρβ²}.   (10)
In this special case df ranges between zero and two, and its prior distribution under the SHP resembles the horseshoe density scaled by 2. Figure 3 plots the density of df. The SHP with common shrinkage has spikes at zero and two, corresponding to jointly shrinking both coordinates or neither. This property encourages noise to be shrunk to zero and signals to be retained at their original values. The density of df with independent λ has a spike around one, which corresponds to shrinking only one of β1 and β2 rather than shrinking or not shrinking both. This characteristic holds even when ρβ = 0.9, which contradicts the expectation that a high correlation between two variables leads to similar shrinkage behavior.
Figure 3:
Induced prior density plot of the complexity measure df = tr(I2 − K) with independent and common shrinkage and ρβ ∈ {0, 0.5, 0.9}.
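The behavior summarized in Figure 3 can be reproduced approximately by simulating df = tr(I2 − K) under the two shrinkage schemes; the sketch below assumes σ² = 1 and is for illustration only:

```python
import numpy as np
from scipy import stats

# Monte Carlo sketch of the complexity measure df = tr(I - K) of Section 4.2,
# comparing common and independent shrinkage.
def df_draws(rho, common, n=20_000, seed=7):
    lam = stats.halfcauchy.rvs(size=(n, 2), random_state=seed)
    if common:
        lam[:, 1] = lam[:, 0]                      # lambda(s1) = lambda(s2)
    Sigma = np.array([[1.0, rho], [rho, 1.0]])
    I2 = np.eye(2)
    out = np.empty(n)
    for i in range(n):
        Omega = np.diag(lam[i]) @ Sigma @ np.diag(lam[i])   # Lambda Sigma_beta Lambda
        K = np.linalg.inv(I2 + Omega)                        # shrinkage matrix
        out[i] = np.trace(I2 - K)
    return out

df_common = df_draws(rho=0.9, common=True)
df_indep = df_draws(rho=0.9, common=False)
print("common  :", np.mean(df_common < 0.1), np.mean(df_common > 1.9))  # spikes at 0 and 2
print("indep.  :", np.mean((0.8 < df_indep) & (df_indep < 1.2)))        # mass near 1
```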
5. Simulation Study
We investigate how the SHP performs in detecting sparse signals compared with other methods using a simulation study.
5.1. Data generation
We consider two true surfaces β0(s), displayed in Figure 4. The observations are generated as Y(s) = β0(s) + ε(s), where the ε(s) are independent Normal(0, σ²) errors, on a 40 × 40 grid of n = 1,600 locations. The proportion of locations with nonzero signal β0(s) is 7.06% and 10.13% for the two surfaces. In addition, we vary the error variance σ² = 0.5², 1², 2². For each combination of true surface and σ², we generate N = 100 data sets. Figure 4 shows the two surfaces and a representative simulated data set for each surface.
Figure 4:
True surface β0(s) (left) and a simulated data set Y(s) with error standard deviation σ = 0.5 (middle) and σ = 2 (right).
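To illustrate the data-generating form, the sketch below simulates data of the same structure on a 40 × 40 grid; the disk-shaped surface is a hypothetical stand-in, since the paper's exact surfaces β0(s) are not reproduced here:

```python
import numpy as np

# Hypothetical stand-in for the simulation setup: a sparse disk-shaped signal on a
# 40 x 40 grid with independent Gaussian noise, illustrating the generating form
# Y(s) = beta_0(s) + eps(s), eps(s) ~ N(0, sigma^2).
rng = np.random.default_rng(8)
g = np.arange(40)
s1, s2 = np.meshgrid(g, g, indexing="ij")

beta0 = np.zeros((40, 40))
beta0[(s1 - 12) ** 2 + (s2 - 12) ** 2 <= 16] = 2.0   # one circular signal region (assumed)

sigma = 1.0
Y = beta0 + sigma * rng.standard_normal((40, 40))    # observations on the grid
print("proportion of nonzero signal:", np.mean(beta0 != 0))
```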
5.2. Models
Although the SHP can accommodate other covariance structures, we use the conditionally autoregressive prior (CAR, Carlin and Banerjee, 2003) for Σβ. The CAR covariance is Σβ = (M − ρβA)⁻¹, where M is the diagonal matrix with elements m1, ..., mn giving the number of neighbors of locations s1, ..., sn, ρβ is the spatial dependence parameter, and A is the adjacency matrix with Aij = 1 if locations si and sj are neighbors and Aij = 0 otherwise. Note that while ρβ ∈ (0, 1) determines the strength of spatial dependence, it is not a correlation parameter as in Section 4. The CAR prior is a natural choice for data on a discrete grid, as discussed in Cressie (2013) and Gelfand et al. (2010). It is an intuitive prior specification and has the advantage of a sparse inverse covariance matrix, which leads to dramatic computational savings; for our analysis of over a million pixels, a prior that did not give a sparse inverse covariance would require approximation and/or prohibitively long run times. Computational details are given in Supplementary Materials C.
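A sketch of the sparse CAR precision construction on a grid with first-order neighbors (illustration only; the full computational details are in Supplementary Materials C):

```python
import numpy as np
from scipy import sparse

# Sketch of the CAR structure for Sigma_beta: sparse precision Q = M - rho * A on a
# grid with first-order (rook) neighbors. Sigma_beta = Q^{-1} is never formed
# explicitly; the sparse precision is what keeps large images tractable.
def car_precision(nrow, ncol, rho):
    n = nrow * ncol
    idx = np.arange(n).reshape(nrow, ncol)
    pairs = []
    # horizontal and vertical neighbor pairs on the grid
    for a, b in [(idx[:, :-1], idx[:, 1:]), (idx[:-1, :], idx[1:, :])]:
        pairs.append(np.column_stack([a.ravel(), b.ravel()]))
    pairs = np.vstack(pairs)
    rows = np.concatenate([pairs[:, 0], pairs[:, 1]])     # symmetric adjacency
    cols = np.concatenate([pairs[:, 1], pairs[:, 0]])
    A = sparse.coo_matrix((np.ones(rows.size), (rows, cols)), shape=(n, n)).tocsr()
    M = sparse.diags(np.asarray(A.sum(axis=1)).ravel())   # m_i = number of neighbors
    return M - rho * A                                    # Q = M - rho * A

Q = car_precision(40, 40, rho=0.95)
print(Q.shape, Q.nnz)   # (1600, 1600) with about five nonzeros per row
```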
We fit five models for β(s) to each simulated dataset. The first four are versions of the spatial horseshoe differentiated by their flexibility in modeling the shrinkage process λ(s). The fifth method is the soft-thresholded Gaussian process (STGP) model of Kang et al. (2018), which places prior mass exactly at zero.
Gaussian:
λ(s) is constant across space, λ(s) = λ0, which leads to β ~ Normal(0, λ0²Σβ). The error variance σ² and the scale parameter λ0² follow uninformative inverse gamma priors IG(0.1, 0.1), and the spatial dependence parameter ρβ follows a beta prior Beta(10, 1). This is the usual Gaussian CAR model.
SHS_quad:
Low-rank representation using the half-Cauchy link function with quadratic basis expansions, λ(si) = λ0 l[θ(si)] with θ(si) = x1(si)b1 + ··· + x6(si)b6, where l[·] is the half-Cauchy link function and the basis consists of an intercept, linear and quadratic terms in the coordinates, and the interaction x6(si) = si1si2, for i = 1, ..., n.
SHS_B-spline 1:
Low-rank representation using the half-Cauchy link function with B-spline basis expansions with five degrees of freedom for each coordinate, λ(si) = λ0 l[θ(si)] with θ(si) = x1(si)b1 + ··· + xJ(si)bJ, where l[·] is the half-Cauchy link function and xj(si) is the jth product of B-spline bases at location si.
SHS_B-spline 2:
Same structure as the model SHS_B-spline 1, but with ten degrees of freedom for each coordinate.
STGP:
Soft-thresholded Gaussian process prior for β(s). Let α = [α(s1), ..., α(sn)]T follow a multivariate normal distribution with zero mean and CAR covariance with first-order neighborhood. Then β(s) = gκ[α(s)], where gκ is the soft-thresholding function that maps values near zero to exactly zero and thus gives a sparse prior: gκ(x) = 0 if |x| ≤ κ, gκ(x) = x − κ if x > κ, and gκ(x) = x + κ if x < −κ.
The thresholding parameter κ follows a uniform prior U(0,10) and controls the degree of sparsity.
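For reference, the soft-thresholding map can be written in a few lines (a standard definition, shown only for illustration):

```python
import numpy as np

# Soft-thresholding map used by the STGP prior as described above: values within
# kappa of zero become exactly zero and larger values are pulled toward zero by kappa.
def soft_threshold(x, kappa):
    return np.sign(x) * np.maximum(np.abs(x) - kappa, 0.0)

# e.g., with kappa = 0.5: -2 -> -1.5, anything in [-0.5, 0.5] -> 0, 1.5 -> 1.0
print(soft_threshold(np.array([-2.0, -0.3, 0.0, 0.4, 1.5]), kappa=0.5))
```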
5.3. Evaluation metrics
We compare methods in terms of root mean squared error (RMSE), coverage probability, Type I error and power. For simulated data set k, let β̂k(s) be the posterior mean and [lk(s), uk(s)] be the 95% credible interval of β(s). The measures for data set k are
RMSEk = [n⁻¹ Σi {β̂k(si) − β0(si)}²]^(1/2),
Coveragek = n⁻¹ Σi I{lk(si) ≤ β0(si) ≤ uk(si)},
Type I errork = Σi I{0 ∉ [lk(si), uk(si)], β0(si) = 0} / Σi I{β0(si) = 0},
Powerk = Σi I{0 ∉ [lk(si), uk(si)], β0(si) ≠ 0} / Σi I{β0(si) ≠ 0},
where I{·} is the indicator function and the sums are over the n locations; each measure is averaged over the N = 100 data sets.
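A sketch of how these metrics could be computed for one simulated data set, assuming significance at a location is decided by whether its 95% credible interval excludes zero (not the authors' code):

```python
import numpy as np

# Evaluation metrics for one simulated data set, given the true surface beta0, the
# posterior means beta_hat, and the lower/upper limits of the 95% credible intervals.
def metrics(beta0, beta_hat, lower, upper):
    rmse = np.sqrt(np.mean((beta_hat - beta0) ** 2))
    coverage = np.mean((lower <= beta0) & (beta0 <= upper))
    reject = (lower > 0) | (upper < 0)          # credible interval excludes zero
    null, signal = beta0 == 0, beta0 != 0
    type1 = np.mean(reject[null])               # false positives among true zeros
    power = np.mean(reject[signal])             # true positives among true signals
    return rmse, coverage, type1, power
```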
5.4. Results
The results are given in Tables 1 and 2. RMSE decreases as the number of basis functions in the spatial shrinkage process λ(s) increases, since a larger number of basis functions enhances the flexibility of λ(s). RMSE for the SHS_B-spline 2 model drops 34.06% relative to the Gaussian model in the low-noise case under Signal 2. The SHS_B-spline models have the smallest RMSE in the low-noise and mid-noise cases, while STGP outperforms the other models in RMSE in the high-noise case. Coverage and Type I error are at or near the nominal level for all methods except the Gaussian model in the low-noise cases. All models have strong power when the error variance is small. However, large error variance distinguishes the models. The STGP model has about 66% of the power of the SHS models. The loss of power for STGP results from excessive shrinkage due to the soft-thresholding formulation, especially in the large-noise scenario.
Table 1:
Summary of the simulation study under Signal 1 by error variance σ2 for the Gaussian, spatial horseshoe with quadratic (SHS_quad), spline 1 (SHS_B-spline 1) and spline 2 (SHS_B-spline 2) shrinkage models and the soft-thresholded Gaussian process (STGP) model.
| Statistics | Model | σ² = 0.5²: Estimate | SE | σ² = 1²: Estimate | SE | σ² = 2²: Estimate | SE |
|---|---|---|---|---|---|---|---|
| 100×RMSE | Gaussian | 33.08 | 0.11 | 42.34 | 0.17 | 52.21 | 0.26 |
| | SHS_quad | 22.32 | 0.07 | 31.92 | 0.14 | 43.84 | 0.28 |
| | SHS_B-spline 1 | 19.67 | 0.13 | 28.97 | 0.21 | 42.32 | 0.39 |
| | SHS_B-spline 2 | 21.31 | 0.12 | 31.19 | 0.14 | 44.31 | 0.34 |
| | STGP | 24.05 | 0.22 | 30.75 | 0.45 | 39.76 | 0.33 |
| Coverage (%) | Gaussian | 91.67 | 0.17 | 97.46 | 0.05 | 98.66 | 0.05 |
| | SHS_quad | 98.09 | 0.04 | 98.68 | 0.03 | 99.28 | 0.03 |
| | SHS_B-spline 1 | 97.91 | 0.51 | 97.78 | 0.68 | 98.61 | 0.60 |
| | SHS_B-spline 2 | 97.61 | 0.36 | 97.31 | 0.08 | 98.22 | 0.13 |
| | STGP | 98.15 | 0.07 | 99.48 | 0.03 | 99.62 | 0.02 |
| Type I error (%) | Gaussian | 7.25 | 0.14 | 2.60 | 0.06 | 1.18 | 0.05 |
| | SHS_quad | 1.79 | 0.05 | 1.27 | 0.04 | 0.79 | 0.04 |
| | SHS_B-spline 1 | 1.73 | 0.40 | 1.91 | 0.55 | 1.06 | 0.19 |
| | SHS_B-spline 2 | 1.96 | 0.23 | 2.27 | 0.09 | 1.44 | 0.15 |
| | STGP | 0.56 | 0.03 | 0.15 | 0.02 | 0.01 | 0.00 |
| Power (%) | Gaussian | 100.00 | 0.00 | 99.97 | 0.02 | 98.00 | 0.14 |
| | SHS_quad | 100.00 | 0.00 | 99.86 | 0.03 | 95.36 | 0.25 |
| | SHS_B-spline 1 | 100.00 | 0.00 | 99.86 | 0.03 | 95.65 | 0.25 |
| | SHS_B-spline 2 | 100.00 | 0.00 | 99.94 | 0.02 | 96.60 | 0.27 |
| | STGP | 100.00 | 0.00 | 98.48 | 0.15 | 67.85 | 0.62 |
Table 2:
Summary of the simulation study under Signal 2 by error variance σ2 for the Gaussian, spatial horseshoe with quadratic (SHS_quad), spline 1 (SHS_B-spline 1) and spline 2 (SHS_B-spline 2) shrinkage models and the soft-thresholded Gaussian process (STGP) model.
| Statistics | Model | σ² = 0.5²: Estimate | SE | σ² = 1²: Estimate | SE | σ² = 2²: Estimate | SE |
|---|---|---|---|---|---|---|---|
| 100×RMSE | Gaussian | 37.04 | 0.12 | 49.21 | 0.18 | 60.98 | 0.27 |
| | SHS_quad | 27.48 | 0.08 | 39.66 | 0.15 | 53.13 | 0.27 |
| | SHS_B-spline 1 | 24.87 | 0.09 | 36.78 | 0.22 | 52.65 | 0.42 |
| | SHS_B-spline 2 | 24.41 | 0.10 | 36.66 | 0.13 | 52.53 | 0.36 |
| | STGP | 27.03 | 0.20 | 43.07 | 0.46 | 49.55 | 0.37 |
| Coverage (%) | Gaussian | 88.09 | 0.24 | 96.47 | 0.07 | 98.59 | 0.05 |
| | SHS_quad | 96.74 | 0.08 | 97.74 | 0.06 | 98.76 | 0.05 |
| | SHS_B-spline 1 | 97.33 | 0.11 | 97.75 | 0.22 | 98.29 | 0.40 |
| | SHS_B-spline 2 | 97.40 | 0.12 | 96.87 | 0.07 | 98.13 | 0.08 |
| | STGP | 98.15 | 0.07 | 98.42 | 0.07 | 99.31 | 0.03 |
| Type I error (%) | Gaussian | 9.79 | 0.17 | 3.67 | 0.08 | 1.39 | 0.05 |
| | SHS_quad | 2.87 | 0.08 | 2.07 | 0.07 | 1.08 | 0.05 |
| | SHS_B-spline 1 | 2.40 | 0.09 | 1.85 | 0.14 | 1.45 | 0.35 |
| | SHS_B-spline 2 | 2.19 | 0.09 | 2.53 | 0.09 | 1.36 | 0.08 |
| | STGP | 0.56 | 0.03 | 0.65 | 0.05 | 0.03 | 0.01 |
| Power (%) | Gaussian | 100.00 | 0.00 | 99.90 | 0.02 | 94.93 | 0.21 |
| | SHS_quad | 100.00 | 0.00 | 99.79 | 0.04 | 92.69 | 0.27 |
| | SHS_B-spline 1 | 100.00 | 0.00 | 99.66 | 0.05 | 92.43 | 0.32 |
| | SHS_B-spline 2 | 100.00 | 0.00 | 99.79 | 0.03 | 92.65 | 0.33 |
| | STGP | 100.00 | 0.00 | 98.91 | 0.09 | 64.36 | 0.56 |
Figures 5 and 6 illustrate the simulation results under Signal 2 with σ = 0.5 and 2, respectively. Increasing the flexibility of the shrinkage process improves signal identification and leads to a smoother signal surface. The Gaussian model tends to have more false positives, especially when the error variance is small, which explains why it has strong power but poor RMSE. Simulation plots under other scenarios, including data generated with spatially correlated errors, are shown in Supplementary Materials D. In terms of computing time, analyzing a simulated data set takes 0.33, 0.87, 1.23, 3.40 and 10.29 minutes for the Gaussian, SHS_quad, SHS_B-spline 1, SHS_B-spline 2 and STGP models, respectively.
Figure 5:
Simulation results for Signal 2 with σ = 0.5 for the four models (rows): the first row shows the true signal β0(s) and results for the soft-thresholded Gaussian process (STGP) model, the second row shows the simulated data Y(s) and results for the Gaussian model, the third and fourth rows show results for spatial horseshoe with quadratic (SHS_quad) and spline (SHS_B-spline 2) shrinkage models.
Figure 6:
Simulation results for Signal 2 with σ = 2 for the four models (rows): the first row shows the true signal β0(s) and results for the soft-thresholded Gaussian process (STGP) model, the second row shows the simulated data Y(s) and results for the Gaussian model, the third and fourth rows show results for spatial horseshoe with quadratic (SHS_quad) and spline (SHS_B-spline 2) shrinkage models.
6. Analysis of the 2-D XRD data
6.1. Model comparisons
In this section we apply the model proposed in Section 3 to the 2-D XRD data. We tailor the SHP model to the 2-D XRD data to capture the ring-shaped pattern visible in Figure 1. Rather than using basis expansions in the rectangular coordinates as in the simulation study, we consider a basis expansion in the radius, r ≥ 0, and the angle, a ∈ [0, 2π), from the central point. In the simulation study the true features were defined on disks and thus we used radial basis functions for the smoothing process λ(s). As is apparent in Figure 1, the features in the XRD data are annuli, and we thus choose basis functions defined in polar coordinates to capture the shape of these features. We use Fourier and B-spline basis expansions for the angle and the radius, respectively. Let A1(a), ..., Aka(a) be the Fourier basis functions and B1(r), ..., Bkr(r) be the B-spline basis functions, where ka and kr are the numbers of basis functions in the angle and the radius, respectively. The Fourier basis functions are A1(a) = sin(a), A2(a) = cos(a), A3(a) = sin(2a), A4(a) = cos(2a), etc. At pixel s, the basis functions consist of the J = ka × kr products of the Fourier and B-spline basis expansions, i.e., each xj(s) is the product of one Fourier basis function of the angle and one B-spline basis function of the radius at s. The priors and computing details remain the same as in Section 5.
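The sketch below illustrates how such a polar-coordinate design matrix can be assembled; Gaussian radial bumps stand in for the B-spline radial basis and the image center is assumed known, so it is a simplified illustration rather than the exact basis used in the analysis:

```python
import numpy as np

# Polar-coordinate basis for lambda(s): Fourier functions of the angle times radial
# basis functions (Gaussian bumps here as a stand-in for B-splines).
def polar_basis(shape, center, k_a=4, k_r=10):
    i, j = np.indices(shape)
    r = np.hypot(i - center[0], j - center[1]).ravel()                        # radius
    a = np.mod(np.arctan2(j - center[1], i - center[0]), 2 * np.pi).ravel()   # angle

    # Fourier basis in the angle: sin(a), cos(a), sin(2a), cos(2a), ...
    ang = [f(m * a) for m in range(1, k_a // 2 + 1) for f in (np.sin, np.cos)]
    # Radial bumps (a stand-in for B-splines) spread over the observed radii
    knots = np.linspace(0.0, r.max(), k_r)
    width = knots[1] - knots[0]
    rad = [np.exp(-0.5 * ((r - k) / width) ** 2) for k in knots]

    # Tensor products: one column for each (angle basis) x (radius basis) pair
    return np.column_stack([u * v for u in ang for v in rad])

X = polar_basis((64, 64), center=(31.5, 31.5))
print(X.shape)   # (4096, 40) for k_a = 4, k_r = 10
```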
We implement five-fold cross-validation to evaluate prediction in models with varying flexibility of shrinkage across space. We consider (ka, kr) = (4, 10), (4, 25), (4, 50), (4, 100), (8, 5) and (8, 25). We do not include STGP in the model comparisons because of its heavy computational cost. Table 3 presents the results as RMSE − 1, since for z-scores with independent standard normal error, one is the lowest achievable prediction RMSE. Generally, the predicted RMSE declines as the flexibility of shrinkage grows, similar to the simulation results in Section 5. However, for the same total number of basis functions, a larger number of basis functions in the angle tends to give a smaller prediction error (e.g., SHS5 versus SHS1 and SHS6 versus SHS3). According to a Wilcoxon signed-rank test, the RMSE of the models SHS1-SHS6 is significantly smaller than that of the Gaussian model. Coverage is close to the nominal level of 95% for all models.
Table 3:
Comparison of prediction on the 2-D XRD data using the Gaussian and spatial horseshoe (SHS) models with ka basis functions in the angle and kr basis functions in the radius (SHS1-SHS6). Methods are compared using 100×(root mean squared error-1) as RMSE, coverage (%) and computing time per 100 iterations in minutes.
| Model | ka | kr | RMSE: Estimate | SE | Coverage (%): Estimate | SE | Computing time (mins) |
|---|---|---|---|---|---|---|---|
| Gaussian | - | - | 3.64 | 0.08 | 94.72 | 0.02 | 2.78 |
| SHS1 | 4 | 10 | 3.59 | 0.08 | 94.78 | 0.02 | 6.57 |
| SHS2 | 4 | 25 | 3.57 | 0.08 | 94.79 | 0.02 | 9.91 |
| SHS3 | 4 | 50 | 3.55 | 0.08 | 94.80 | 0.03 | 17.45 |
| SHS4 | 4 | 100 | 3.54 | 0.08 | 94.80 | 0.02 | 41.61 |
| SHS5 | 8 | 5 | 3.56 | 0.08 | 94.80 | 0.02 | 7.38 |
| SHS6 | 8 | 25 | 3.50 | 0.09 | 94.84 | 0.02 | 8.62 |
6.2. Summary of fitted models
In this section we fit the Gaussian and SHS models described above to the full dataset. Because the number of pixels is large, we implement the Bayesian spatial false discovery rate (BSFDR) procedure (Sun et al., 2015) with rate 0.05 to control for multiple testing. We consider the one-sided null and alternative hypotheses H0 : β(s) ≤ 0 and H1 : β(s) > 0, respectively. The BSFDR procedure provides a critical probability such that H0 is rejected whenever the posterior probability P(β(s) > 0 | Y) exceeds it. The critical probability is 92.93% for all models except SHS4, for which it is 91.92%. The proportion of pixels for which H0 is rejected is also similar across the Gaussian and SHS models, ranging from 1.42% to 2.02%.
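A generic sketch of Bayesian FDR thresholding in the spirit of the cited procedure (not the authors' implementation):

```python
import numpy as np

# Given posterior probabilities p_i = P(beta(s_i) > 0 | Y), reject H0 at the locations
# with the largest p_i such that the average posterior probability of a false
# discovery among the rejections stays below alpha.
def bayes_fdr_threshold(post_prob, alpha=0.05):
    order = np.argsort(post_prob)[::-1]                    # most significant first
    fdr = np.cumsum(1.0 - post_prob[order]) / np.arange(1, post_prob.size + 1)
    keep = np.where(fdr <= alpha)[0]
    if keep.size == 0:
        return 1.0                                         # reject nothing
    return post_prob[order[keep[-1]]]                      # critical probability

p = np.random.default_rng(9).beta(0.3, 0.3, size=10_000)   # toy posterior probabilities
cutoff = bayes_fdr_threshold(p)
print(cutoff, np.mean(p >= cutoff))                        # critical value, share rejected
```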
Figure 7 displays the fitted results for the Gaussian and SHS6 (lowest RMSE) models. The posterior mean of the shrinkage parameter λ(s) (Figure 7, bottom left) is large for only a few radii and only a subset of the angles on the rings. These areas also have the largest posterior mean of β(s) and the largest posterior probability that β(s) is positive. The map of the difference in posterior probability (Figure 7, top right) shows that the SHS6 model encourages shrinkage along the rings. Comparing models, we find more significant pixels and stronger spatial clustering of the signal in the SHS model than in the Gaussian model. In addition, Figure 8 illustrates that the density of the posterior mean of β(s) from the SHP model (SHS6) has higher concentration around zero and heavier tails than that from the Gaussian model.
Figure 7:
Results of the Gaussian model (top row) and the best-predicting model SHS6 (bottom row), with columns showing different statistics: the first column plots the observations and the posterior mean of the shrinkage process λ(s) from the model SHS6, the second column plots the posterior mean of the signal β(s) truncated at ±0.5 to illustrate the contrast between the two models, and the third column plots the difference in posterior probability that β(s) is positive, i.e., P(β(s) > 0 | Y) under SHS6 minus that under the Gaussian model, and the signal map for the model SHS6 with the Bayesian spatial false discovery rate controlled at 0.05.
Figure 8:
Density plot (left) and quantile-quantile plot (right) of the posterior mean of signal β(s) from the Gaussian and SHS6 models.
Figure 7 is an analysis of one time point (t = 20 seconds). We investigate the temporal trend of the signals by fitting the model SHS6 to the standardized intensity Y(s) at all time points. We use the sum of squared signals, D = Σi β(si)², as a measure of the strength of structural change in the PZT sample. Figure 9 displays boxplots of the posterior of D and the electric field by time. Generally, the structural change becomes larger when the magnitude of the electric field increases. The measure D is smaller in the first half of the experimental period, under a positive charge, than in the second half, under a negative charge. The reason for the rapid increase in D between 54 and 58 seconds is a material phenomenon known as ferroelectric switching; the electric field amplitude at this time exceeds that required for substantial reorientation of material elements, leading to a significant change in the diffraction pattern. This example on ferroelectrics demonstrates the versatility of the method for efficiently assessing 2-D XRD images for the purposes of materials characterization.
Figure 9:
Time series plot of the posterior of D = Σi β(si)² (boxplots) from the model SHS6 applied separately at each time point, together with the bipolar electric field from −2 kV/mm to 2 kV/mm (line).
7. Conclusion
In this paper, we propose a new method for sparse signal detection with application to 2-D XRD imaging data. To our knowledge, the SHP is the first continuous shrinkage prior for spatial data. Theoretically, we prove that the SHP has the univariate horseshoe prior marginally at each location, and that the induced joint distribution for pairs of nearby sites has higher concentration around zero and heavier tails compared to independent horseshoe priors. Further, we facilitate computation via the HMC algorithm (Neal, 1994) for high-dimensional data. The simulation and empirical results both show improvements in estimation and prediction when we account for spatial dependence.
A potential limitation of our method is that the data are assumed to be independent given the mean vector β. This is sufficient for the 2-D XRD imaging data because the distribution of photons ought to follow a Poisson process with independent counts across pixels, but it may not hold for other data. Our simulation results (Supplementary Materials D) show that for data with moderate spatial correlation, our working independence model maintained reasonable error rates but had lower power than in the simulation with independent errors. For simulated data with strong residual spatial correlation, the model with the working independence assumption did not perform well. The shortcomings of the working independence model lead us to consider a richer model that includes spatial correlation in the residuals. A second restriction is the assumption of a CAR covariance structure with first-order neighbors. The covariance structure can be modified to meet the needs of other situations. For instance, the Matérn covariance (Stein, 1999) is a covariance function depending on the distance between two locations, a smoothness parameter and a range parameter; this general class includes the exponential covariance and the squared exponential covariance. This flexibility may be preferred, and it is computationally feasible for smaller datasets.
Deep learning via convolutional neural networks (CNNs) has emerged as an immensely powerful tool for extracting information from images (Goodfellow et al., 2016; Rawat and Wang, 2017; Yamashita et al., 2018). The standard CNN is a supervised learning algorithm that uses an image as the predictor. However, in our setting we do not have known labels (e.g., images with locations that are known to have a change) to train the model. Also, while CNNs are strong for prediction, they are weak for formal statistical inference, which is our primary objective.
In the future, the proposed method can be extended to Bayesian variable selection in spatial linear regression. In spatial regression, the covariate effects can vary by site (Gelfand et al., 2003), and the SHP for the spatially-varying effects would encourage sparsity in these effects. We can also extend the SHP to a spatio-temporal horseshoe model for longitudinal data. The extended model could be used to test for spatio-temporal anomalies in a surveillance study (e.g., Li et al., 2012) or in process monitoring (e.g., Yan et al., 2018).
Supplementary Material
Acknowledgements
This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). The authors thank the editor, associate editor, and two anonymous referees for insightful and constructive comments on an earlier version of the manuscript.
References
- Acharya UR, Fujita H, Sudarshan VK, Mookiah MRK, Koh JEW, Tan JH, Hagiwara Y, Chua CK, Junnarkar SP, Vijayananthan A, and Ng KH (2016). An integrated index for identification of fatty liver disease using radon transform and discrete cosine transform features in ultrasound images. Information Fusion, 31:43–53.
- Armagan A, Dunson DB, and Lee J (2013). Generalized double Pareto shrinkage. Statistica Sinica, 23(1):119–143.
- Bhadra A, Datta J, Li Y, Polson NG, and Willard B (2016a). Prediction risk for the horseshoe regression. ArXiv e-prints.
- Bhadra A, Datta J, Polson NG, and Willard B (2016b). Default Bayesian analysis with global-local shrinkage priors. Biometrika, 103(4):955–969.
- Bhattacharya A, Pati D, Pillai NS, and Dunson DB (2014). Dirichlet-Laplace priors for optimal shrinkage. ArXiv e-prints.
- Boehm Vock LF, Reich BJ, Fuentes M, and Dominici F (2015). Spatial variable selection methods for investigating acute health effects of fine particulate matter components. Biometrics, 71(1):167–177.
- Bradley D and Roth G (2007). Adaptive thresholding using the integral image. Journal of Graphics Tools, 12(2):13–21.
- Carlin PB and Banerjee S (2003). Hierarchical multivariate CAR models for spatio-temporally correlated survival data. Bayesian Statistics, 7:45–63.
- Carretero-Moya J, Gismero-Menoyo J, Asensio-López A, and Blanco-del Campo A (2009). Application of the radon transform to detect small-targets in sea clutter. IET Radar, Sonar & Navigation, 3(2):155–166.
- Carvalho CM, Polson NG, and Scott JG (2010). The horseshoe estimator for sparse signals. Biometrika, 97(2):465–480.
- Cho TS, Paris S, Horn BKP, and Freeman WT (2011). Blur kernel estimation using the Radon transform. In CVPR 2011, pages 241–248.
- Costa MA and Kulldorff M (2009). Applications of spatial scan statistics: a review. In Glaz J, Pozdnyakov V, and Wallenstein S, editors, Scan Statistics: Methods and Applications, pages 129–152. Birkhäuser Boston, Boston, MA.
- Cressie NAC (2013). Statistics for Spatial Data. Wiley, New York, NY.
- Datta J and Ghosh JK (2013). Asymptotic properties of Bayes risk for the horseshoe prior. Bayesian Analysis, 8(1):111–132.
- Deepak KS and Sivaswamy J (2011). Automatic assessment of macular edema from color retinal images. IEEE Transactions on Medical Imaging, 31(3):766–776.
- Donoho DL and Johnstone JM (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3):425–455.
- Dutta I and Singh RN (2011). Dynamic in situ x-ray diffraction study of antiferroelectric-ferroelectric phase transition in strontium-modified lead zirconate titanate ceramics. Integrated Ferroelectrics, 131(1):153–172.
- Editorial (2014). Crystallography matters. Nature Materials, 13.
- Efron B (2004). The estimation of prediction error: covariance penalties and cross-validation. Journal of the American Statistical Association, 99(467):619–632.
- Esteves G, Fancher CM, and Jones JL (2015). In situ characterization of polycrystalline ferroelectrics using x-ray and neutron diffraction. Journal of Materials Research, 30(3):340–356.
- Friedman J, Hastie T, Höfling H, and Tibshirani R (2007). Pathwise coordinate optimization. The Annals of Applied Statistics, 1(2):302–332.
- Friedman J, Hastie T, and Tibshirani R (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441.
- Gelfand AE, Kim H-J, Sirmans CF, and Banerjee S (2003). Spatial modeling with spatially varying coefficient processes. Journal of the American Statistical Association, 98(462):387–396.
- Gelfand AE, Diggle P, Guttorp P, and Fuentes M, editors (2010). Handbook of Spatial Statistics. CRC Press, Boca Raton, FL.
- George EI and McCulloch RE (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88(423):881–889.
- George EI and McCulloch RE (1997). Approaches for Bayesian variable selection. Statistica Sinica, 7(2):339–374.
- Geweke J (1996). Variable selection and model comparison in regression. Bayesian Statistics, 5:609–620.
- Goldsmith J, Huang L, and Crainiceanu CM (2014). Smooth scalar-on-image regression via spatial Bayesian variable selection. Journal of Computational and Graphical Statistics, 23(1):46–64.
- Goodfellow I, Bengio Y, and Courville A (2016). Deep Learning. MIT Press. http://www.deeplearningbook.org.
- Gorfman S, Keeble DS, Glazer AM, Long X, Xie Y, Ye Z-G, Collins S, and Thomas PA (2011). High-resolution x-ray diffraction study of single crystals of lead zirconate titanate. Physical Review B, 84:020102.
- Griffin JE and Brown PJ (2010). Inference with normal-gamma prior distributions in regression problems. Bayesian Analysis, 5(1):171–188.
- He L and Wang Y (2017). Wavelet frame based image restoration using sparsity, nonlocal and support prior of frame coefficients. The Visual Computer.
- Jansen M (2001). Noise Reduction by Wavelet Thresholding. Springer, New York, NY.
- Johnson VE and Rossell D (2012). Bayesian model selection in high-dimensional settings. Journal of the American Statistical Association, 107(498):649–660.
- Kang J, Reich BJ, and Staicu A-M (2018). Scalar-on-image regression via the soft-thresholded Gaussian process. Biometrika, 105(1):165–184.
- Katrusiak A (2008). High-pressure crystallography. Acta Crystallographica Section A, 64(1):135–148.
- Kinoshita SK, Azevedo-Marques PM, Pereira RR Jr, Rodrigues AH, and Rangayyan RM (2008). Radon-domain detection of the nipple and the pectoral muscle in mammograms. Journal of Digital Imaging, 21(1):37–49.
- Kulldorff M (1997). A spatial scan statistic. Communications in Statistics - Theory and Methods, 26(6):1481–1496.
- Li F, Zhang T, Wang Q, Gonzalez MZ, Maresh EL, and Coan JA (2015). Spatial Bayesian variable selection and grouping for high-dimensional scalar-on-image regression. The Annals of Applied Statistics, 9(2):687–713.
- Li G, Best N, Hansell AL, Ahmed I, and Richardson S (2012). BaySTDetect: detecting unusual temporal patterns in small area data via Bayesian model choice. Biostatistics, 13(4):695–710.
- Liao Y, Li X, and Wang J (2017). The study on modified spatial scan statistic. In Yang W, editor, Early Warning for Infectious Disease Outbreak, pages 329–342. Academic Press, Cambridge, MA.
- Mallick H and Yi N (2013). Bayesian methods for high dimensional linear models. Journal of Biometrics and Biostatistics.
- Mitchell TJ and Beauchamp JJ (1988). Bayesian variable selection in linear regression. Journal of the American Statistical Association, 83(404):1023–1032.
- Naus JI (1965). The distribution of the size of the maximum cluster of points on a line. Journal of the American Statistical Association, 60(310):532–538.
- Neal RM (1994). An improved acceptance procedure for the hybrid Monte Carlo algorithm. Journal of Computational Physics, 111(1):194–203.
- Nelsen RB (2006). An Introduction to Copulas. Springer, New York, NY.
- O'Hara RB and Sillanpää MJ (2009). A review of Bayesian variable selection methods: what, how and which. Bayesian Analysis, 4(1):85–118.
- Piironen J and Vehtari A (2016). On the hyperprior choice for the global shrinkage parameter in the horseshoe prior. ArXiv e-prints.
- Qiu P and Yandell B (1997). Jump detection in regression surfaces. Journal of Computational and Graphical Statistics, 6(3):332–354.
- Radon J (1986). On the determination of functions from their integral values along certain manifolds (Parks PC, Trans.). IEEE Transactions on Medical Imaging, MI-5(4):170–176.
- Rawat W and Wang Z (2017). Deep convolutional neural networks for image classification: A comprehensive review. Neural Computation, 29(9):2352–2449.
- Ročková V and George EI (2018). The spike-and-slab lasso. Journal of the American Statistical Association, 113(521):431–444.
- Soille P (2013). Morphological Image Analysis: Principles and Applications. Springer Science & Business Media, Berlin, Germany.
- Stein ML (1999). Interpolation of Spatial Data: Some Theory for Kriging. Springer, New York, NY.
- Sun W, Reich BJ, Tony Cai T, Guindani M, and Schwartzman A (2015). False discovery control in large-scale spatial multiple testing. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77(1):59–83.
- Taswell C (2000). The what, how, and why of wavelet shrinkage denoising. Computing in Science & Engineering, 2(3):12–19.
- Thomas JM (2012). The birth of x-ray crystallography. Nature, 491:186–187.
- van der Pas S, Szabó B, and van der Vaart A (2017). Uncertainty quantification for the horseshoe. Bayesian Analysis, 12(4):1221–1274.
- van der Pas SL, Kleijn BJK, and van der Vaart AW (2014). The horseshoe estimator: posterior concentration around nearly black vectors. Electronic Journal of Statistics, 8(2):2585–2618.
- Xu J, Yu J, Peng Y, and Xia X (2011). Radon-Fourier transform for radar target detection, I: Generalized Doppler filter bank. IEEE Transactions on Aerospace and Electronic Systems, 47(2):1186–1202.
- Yadav M, Yadav S, and Sharma D (2014). Image denoising using orthonormal wavelet transform with stein unbiased risk estimator. In Electrical, Electronics and Computer Science (SCEECS), 2014 IEEE Students' Conference, pages 1–4, Bhopal, India.
- Yamashita R, Nishio M, Do RKG, and Togashi K (2018). Convolutional neural networks: an overview and application in radiology. Insights into Imaging, 9(4):611–629.
- Yan H, Paynabar K, and Shi J (2017). Anomaly detection in images with smooth background via smooth-sparse decomposition. Technometrics, 59(1):102–114.
- Yan H, Paynabar K, and Shi J (2018). Real-time monitoring of high-dimensional functional data streams via spatio-temporal smooth sparse decomposition. Technometrics, 60(2):181–197.
- Yuan M and Lin Y (2005). Efficient empirical Bayes variable selection and estimation in linear models. Journal of the American Statistical Association, 100(472):1215–1225.
- Zhang Y, Reich BJ, and Bondell H (2017). High dimensional linear regression via the R2-D2 shrinkage prior. ArXiv e-prints.