Abstract
Motivated by the problem of detecting changes in two-dimensional X-ray diffraction data, we propose a Bayesian spatial model for sparse signal detection in image data. Our model places considerable mass near zero and has heavy tails to reflect the prior belief that the image signal is zero for most pixels and large for an important subset. We show that the spatial prior places mass on nearby locations simultaneously being zero, and also allows for nearby locations to simultaneously be large signals. The form of the prior also facilitates efficient computing for large images. We conduct a simulation study to evaluate the properties of the proposed prior and show that it outperforms other spatial models. We apply our method in the analysis of X-ray diffraction data from a two-dimensional area detector to detect changes in the pattern when the material is exposed to an electric field.
Keywords: Bayesian variable selection, High-dimensional data, Image analysis, X-ray diffraction
1. Introduction
X-ray diffraction (XRD) is a powerful technique to characterize the atomic structure of a material. There are nearly 300,000 structures cataloged in the International Center for Diffraction Data (Thomas, 2012; Editorial, 2014). For example, we analyze the XRD pattern of lead zirconate titanate (PZT). PZT is a useful functional material known as a piezoelectric. As a piezoelectric, PZT changes shape when an electric field is applied, making it useful for actuation, positional control and energy conversion. The material we investigate in the present study is a commercially available PZT polycrystalline ceramic material known by the tradename K350 (Piezo Technologies). This material has been studied previously under varying temperature, electric field, pressure and deformation, making it an established material (e.g., Katrusiak, 2008; Dutta and Singh, 2011; Gorfman et al., 2011; Esteves et al., 2015) that is suitable for the present investigation. The data in the present experiment are acquired using a two-dimensional area detector that measures diffracted intensity; this method is referred to as 2-D XRD. The data are shown in Figure 1 and described in detail in Section 2. In a 2-D XRD image, each pixel represents the z-score under an electric field relative to several baseline images. Few pixels have z-scores greater than two in absolute value, and those that do tend to cluster in rings indicative of atomic structure. Therefore, we analyze these data using sparse spatial signal detection methods. Our objective is to develop a more powerful statistical method for detecting and quantifying structural changes in 2-D XRD patterns under different experimental conditions.
Figure 1:
The map of the standardized intensity Y(s) (left) at time t = 20 seconds under the maximum electric field of 2 kV/mm, computed using the sample mean and sample standard deviation of the 100 baseline images taken without electric field, and the smoothed Y(s) (right) obtained using a Gaussian kernel with bandwidth 0.75 pixels.
The statistical problem of identifying spatial regions affected by an experimental factor has many applications, including epidemiology, neuroscience and materials science. As a result, there is a rich literature on the topic. The spatial scan statistic (Naus, 1965; Kulldorff, 1997; Costa and Kulldorff, 2009; Liao et al., 2017) searches for regions with different means than the background, but does not estimate the signal and is not well suited for large images with many significant subregions. The two-dimensional fused lasso (Friedman et al., 2007), the graphical lasso (Friedman et al., 2008) and the smooth-sparse decomposition method (Yan et al., 2017) are penalized regression methods that account for spatial structure in the signal using penalties that encourage spatial smoothness. Yan et al. (2017) combine denoising and signal detection into one step for images with a smooth background and preserve sharp boundaries, which traditional two-step procedures (Qiu and Yandell, 1997; Bradley and Roth, 2007; Soille, 2013) do not achieve. Spatial wavelet shrinkage methods impose a threshold on coefficients in the wavelet domain to recover a sparse signal (Donoho and Johnstone, 1994; Taswell, 2000; Jansen, 2001; Yadav et al., 2014; He and Wang, 2017). These regularization methods can be applied to high-dimensional data, but they require the tuning parameters to be preset, e.g., via cross-validation (Mallick and Yi, 2013), and fail to account for all sources of uncertainty.
Our approach builds on Bayesian variable selection methods. This allows us to fully account for uncertainty in the posterior distribution and incorporate known atomic structure in the prior. An intuitive sparse prior is a two-component mixture (spike and slab) with one component concentrated near zero for the unimportant features and the other diffuse for the signals (Mitchell and Beauchamp, 1988; George and McCulloch, 1993; Yuan and Lin, 2005; Geweke, 1996; George and McCulloch, 1997; O'Hara and Sillanpää, 2009; Ročková and George, 2018). Two-component mixture priors can be extended to the spatial setting to identify subregions of interest (Goldsmith et al., 2014; Boehm Vock et al., 2015; Li et al., 2015; Kang et al., 2018). However, the two-component construction is computationally challenging because posterior sampling requires a search over a considerable space of complex models and is plagued by slow convergence and poor mixing (Carvalho et al., 2010; Johnson and Rossell, 2012; Mallick and Yi, 2013).
The computational difficulties of spike-and-slab priors are abated by continuous shrinkage priors (Carvalho et al., 2010; Griffin and Brown, 2010; Armagan et al., 2013; Bhattacharya et al., 2014; Bhadra et al., 2016b; Piironen and Vehtari, 2016). Rather than a discrete mixture over two components, these priors continuously mix over shrinkage parameters. For example, Carvalho et al. (2010) proposed the horseshoe prior that assumes a normal prior with mean zero and standard deviation that follows a half-Cauchy prior. The horseshoe prior has high concentration around zero for sparsity, heavy tails to avoid excessive shrinkage of signals and attractive theoretical properties (Datta and Ghosh, 2013; van der Pas et al., 2014; Bhadra et al., 2016a; van der Pas et al., 2017; Zhang et al., 2017).
To our knowledge, we propose the first continuous shrinkage prior for spatial data. We extend the horseshoe prior to account for spatial dependence in the signal at nearby observations. We prove that the proposed spatial horseshoe prior has the univariate horseshoe prior marginally at each location, and that the induced joint distribution for pairs of nearby sites has higher concentration around zero and heavier tails compared to independent horseshoe priors. The form of the continuous shrinkage prior permits simple expressions for the gradients of the posterior, and thus we use the Hamiltonian Monte Carlo (HMC) algorithm (Neal, 1994) for efficient sampling in high-dimensions. A simulation study demonstrates that the proposed method is effective at identifying spatial signals. When applied to the 2-D XRD data, we find improved cross-validation performance compared to other methods.
The paper is organized as follows. We describe the data in Section 2. In Sections 3 and 4, we introduce the proposed shrinkage prior and show its theoretical properties. In Section 5, we evaluate the proposed prior through the simulation study. In Section 6, we apply our method to the 2-D XRD data and compare with other methods. We conclude with remarks and comments in Section 7.
2. Description of the 2-D XRD data
The 2-D XRD data were acquired using high-energy X-rays generated at a synchrotron source (Advanced Photon Source, Argonne National Laboratory). Beamline 11-ID-C was used with an energy of 105 keV. The sample was located in the X-ray beam and a silicon-based detector was placed in the transmitted direction, approximately 2 meters from the sample position. Electrical connections were made from a high voltage power supply to the top conductive electrode of the PZT sample, and the bottom electrode was grounded. 2-D XRD patterns were measured at sequentially increasing and then decreasing electric field amplitudes from −2 kV/mm to 2 kV/mm. The objective is to examine whether introducing an electric field changes the structure of the PZT sample and to capture the diffraction pattern. The detector records data over time, yielding images that are 2048 × 2048 pixel matrices. The XRD intensity, measured in counts per second (cps), indicates the relative abundance at each pixel. In addition, we transform the image of size 2048 × 2048 into size 1023 × 1023 by removing the first and last rows, which consist of all zeros due to the experimental setup, and then extracting all odd rows and columns. This reduction is reasonable because the intensity at each pixel is close to that of its neighbors, and the diffraction pattern remains apparent. (See Supplementary Materials A for the data before and after this reduction.)
We compare the diffraction pattern of PZT with electric field to that without electric field and investigate the locations where changes occur. There are 100 baseline images recorded without electric field. We define the response Y(s) as the standardized intensity, Y(s) = {X(s) − μ̂(s)}/σ̂(s), where X(s) is the intensity at pixel s, μ̂(s) is the sample mean of the 100 baseline images at pixel s, and σ̂(s) is the sample standard deviation of the 100 baseline images at pixel s. The yellow rings in Figure 1 indicate larger changes in the diffraction pattern at t = 20 seconds. We choose the image at time t = 20 seconds since this is when the maximum electric field is applied. The main features in Figure 1 are the rings centered on the middle of the image. In Sections 3 and 6, we express the spatial shrinkage process in polar coordinates, similar in spirit to a Radon transform (Radon, 1986), to accommodate these annular features. The Radon transform has been used successfully for image analysis problems related to motion detection (Carretero-Moya et al., 2009; Xu et al., 2011), deblurring (Cho et al., 2011) and classification (Kinoshita et al., 2008; Deepak and Sivaswamy, 2011; Acharya et al., 2016).
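To make the preprocessing concrete, the sketch below computes the standardized intensity and the image reduction described above. The function name, array shapes and exact downsampling indices are assumptions for illustration; this is not the code used for the analysis.

```python
import numpy as np

def standardize_and_downsample(baseline, field_img):
    """Standardize one diffraction image against the baseline images and downsample.

    baseline : array of shape (100, 2048, 2048), images without electric field
    field_img: array of shape (2048, 2048), image under the applied electric field
    Returns the z-score image Y(s) reduced to 1023 x 1023 pixels.
    """
    mu_hat = baseline.mean(axis=0)           # pixelwise sample mean of the baselines
    sd_hat = baseline.std(axis=0, ddof=1)    # pixelwise sample standard deviation
    z = (field_img - mu_hat) / sd_hat        # standardized intensity Y(s)
    # Drop the all-zero first/last rows from the experimental setup and keep every
    # other row and column (the exact indexing is an assumption for illustration).
    return z[1:-1:2, 1:-1:2]
```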
3. Model description
3.1. The univariate horseshoe prior
Define Y as the response variable and β as the signal variable. The likelihood for Y with horseshoe prior for β is
Y | β ~ Normal(β, σ²),   β | λ ~ Normal(0, τ²λ²),   λ ~ C+(0, 1),   (1)
where σ² is the error variance, τ² is a global scale parameter, and λ follows the standard half-Cauchy distribution on the positive reals. If σ² = 1, the posterior mean is
E(β | Y) = {1 − E(κ | Y)} Y,   (2)
where κ = 1/(1 + τ²λ²) is the shrinkage coefficient. E(κ | Y) determines the amount of shrinkage towards zero. The name of the prior comes from the horseshoe-shaped Beta distribution on the shrinkage coefficient κ that is induced by the half-Cauchy distribution on λ. The shape of the shrinkage density implies that the shrinkage coefficient is close to either zero or one with high probability, which shrinks null signals towards zero and avoids shrinking the true signals. This property facilitates separating signals from the noise. Although the prior for β, marginally over λ, does not have a closed-form probability density function, it enjoys attractive theoretical properties including tight bounds, high concentration near zero and heavy tails (Carvalho et al., 2010).
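As a concrete illustration of this shrinkage profile, the following minimal sketch (assuming σ² = τ² = 1; not the authors' code) draws λ from the standard half-Cauchy prior and examines the induced shrinkage coefficient:

```python
import numpy as np
from scipy import stats

# Minimal sketch with sigma^2 = tau^2 = 1: draw lambda from the standard half-Cauchy
# prior and inspect the induced shrinkage coefficient kappa = 1 / (1 + lambda^2),
# which has the horseshoe shape described above.
lam = stats.halfcauchy.rvs(size=100_000, random_state=1)
kappa = 1.0 / (1.0 + lam**2)

# Mass piles up near both ends: kappa near 1 means strong shrinkage (noise),
# kappa near 0 means almost no shrinkage (signal).
print(np.mean(kappa > 0.95), np.mean(kappa < 0.05))   # both roughly 0.14

# The corresponding posterior mean of beta for an observation y is (1 - E[kappa | y]) y.
```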
3.2. The spatial horseshoe prior (SHP)
To extend the horseshoe prior to the spatial setting, define Y(s) as the real-valued response at a spatial location s = (s1, s2) ∈ ℝ² and β(s) as the true signal at s. For data observed at locations s1, ..., sn, define Y = [Y(s1), ..., Y(sn)]T as the n × 1 response vector and β = [β(s1), ..., β(sn)]T as the signal vector. Extending (1), let
Y | β ~ Normal(β, σ²In),   β | λ ~ Normal(0, ΛΣβΛ),   (3)
where σ² is the error variance, In is the identity matrix of size n, Λ is the diagonal matrix with diagonal elements λ = [λ(s1), ..., λ(sn)]T, and Σβ is a spatial covariance matrix. Independence of the responses Y given the signals β is a strong assumption, but it is justified for the 2-D XRD data: physics dictates that the photons follow a Poisson process, so the numbers of photons that land in disjoint regions are independent of each other, and because the photon detectors are identical and operate independently, this independence should carry through to the observed data.
Spatial shrinkage is induced by the spatial process model for λ(s). We propose a Gaussian copula model (Nelsen, 2006) that preserves the marginal half-Cauchy distribution and captures spatial dependence,
λ(s) = λ0 l[θ(s)],   l(θ) = F⁻¹{Φ(θ)},   (4)
where λ0 is a global scale parameter, θ(s) is a latent spatial Gaussian process with mean zero and variance one for all s, l(·) is the half-Cauchy link function, F⁻¹ is the inverse cumulative distribution function of the standard half-Cauchy distribution, and Φ(·) is the standard normal cumulative distribution function. The half-Cauchy link function provides a marginal half-Cauchy distribution for l[θ(s)] as long as E[θ(s)] = 0 and Var[θ(s)] = 1. We interpret l[θ(s)] as a local shrinkage parameter and λ0 as a global shrinkage parameter.
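A minimal sketch of the half-Cauchy link, using standard SciPy distribution functions (illustration only):

```python
import numpy as np
from scipy import stats

# Sketch of the half-Cauchy link l(theta) = F^{-1}{Phi(theta)} in (4), where F is the
# standard half-Cauchy CDF and Phi the standard normal CDF (not the authors' code).
def half_cauchy_link(theta):
    return stats.halfcauchy.ppf(stats.norm.cdf(theta))

# If theta(s) has mean 0 and variance 1, l[theta(s)] is marginally standard
# half-Cauchy; a quick Monte Carlo check of two quantiles:
theta = np.random.default_rng(2).standard_normal(200_000)
lam_local = half_cauchy_link(theta)
print(np.quantile(lam_local, [0.5, 0.9]))   # empirical, approximately [1.0, 6.3]
print(stats.halfcauchy.ppf([0.5, 0.9]))     # theoretical quantiles
```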
The latent shrinkage process θ(s) could follow another Gaussian process prior with mean zero, variance one and spatial correlation. We also consider a low-rank representation of the latent shrinkage process
θ(s) = x1(s)b1 + x2(s)b2 + ··· + xJ(s)bJ,   (5)
where xj(s) is the jth basis function at location s and b1, ..., bJ are Gaussian coefficients with mean zero, scaled so that Var[θ(s)] = 1 for all s. The spatial model gives the same marginal distribution as (1), but with spatial dependence both in the distribution for β | λ and in the distribution for λ. We elaborate on the properties of this model in Section 4.
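The following sketch draws one realization of the low-rank shrinkage process on a small grid; the quadratic basis and the rescaling of θ(s) to mean zero and unit variance are simplifying assumptions for illustration rather than the exact specification used later:

```python
import numpy as np
from scipy import stats

# Illustrative draw of the low-rank shrinkage process (5) on a small grid.
rng = np.random.default_rng(3)
g = np.linspace(0.0, 1.0, 20)
s1, s2 = [c.ravel() for c in np.meshgrid(g, g, indexing="ij")]   # n = 400 locations

X = np.column_stack([np.ones_like(s1), s1, s2, s1**2, s2**2, s1 * s2])  # n x J basis
b = rng.standard_normal(X.shape[1])               # Gaussian basis coefficients
theta = X @ b
theta = (theta - theta.mean()) / theta.std()      # enforce mean 0, variance 1 (assumed)

lam0 = 1.0                                        # global shrinkage parameter
lam = lam0 * stats.halfcauchy.ppf(stats.norm.cdf(theta))  # lambda(s) = lambda0 * l[theta(s)]
print(lam.reshape(20, 20)[:3, :3])                # local shrinkage scales vary smoothly
```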
4. Theoretical properties
4.1. Properties of the spatial horseshoe prior
In this section we study the properties of the SHP for a pair of spatial locations as a function of the distance between the locations. We compare the SHP with the simple model with independent β and λ, namely β(si) | λ(si) ~ Normal(0, λ(si)²) and λ(si) ~ C+(0, 1) independently for i = 1, 2. The joint density of β in this special case is (Carvalho et al., 2010)
P0(β1, β2) = (2π³)⁻¹ exp{(β1² + β2²)/2} E1(β1²/2) E1(β2²/2),   (6)
where E1 (·) is the exponential integral function. Next, we consider the joint shrinkage parameter λ = λ(s1) =λ(s2) ~ C+(0,1) and the covariance Σβ with the diagonal elements 1 and the spatial correlation ρβ which stands in for the distance between the two locations. Then we obtain (Supplementary Materials B.1) the joint density
P1(β1, β2) = [√{π/(2Q)} − (π/2) exp(Q/2) erfc{√(Q/2)}] / {π²√(1 − ρβ²)},   (7)
where erfc(·) is the complementary error function and Q = (β1² − 2ρβ β1β2 + β2²)/(1 − ρβ²).
Figure 2 plots log[P0(β1, β2)] and log[P1(β1, β2)] for ρβ = 0, 0.5, 0.9. When ρβ = 0, the log density with a common shrinkage parameter has a sharper spike around zero and heavier tails than the log density with independent shrinkage parameters. Increasing ρβ concentrates the log density further along the 45-degree line.
Figure 2:
Log prior density of β1 and β2 in the first quadrant for (a) independent horseshoe priors (log P0) and (b)-(d) the spatial horseshoe prior (log P1) with ρβ ∈ {0, 0.5, 0.9}.
4.1.1. Concentration around zero
The densities P0(β1, β2) and P1(β1, β2) both have high concentration towards zero. It can be shown that P0(β1, β2) → ∞ and P1(β1, β2) → ∞ as (β1, β2) → (0, 0). Moreover, we show that the density with joint shrinkage concentrates towards zero more than the density with independent shrinkage by proving that the ratio of P1(β1, β2) to P0(β1, β2) diverges as β1 and β2 approach zero. Let R be the ratio,
R(β1, β2) = P1(β1, β2) / P0(β1, β2).   (8)
We show that R(β1, β2) → ∞ as (β1, β2) → (0, 0), with the divergence rate given in Supplementary Materials B.2. This indicates that joint sparsity is achieved more rapidly in P1(β1, β2) than in P0(β1, β2).
4.1.2. Tail behavior
We also investigate the ability to retain large signals by examining the tails of the densities with joint and independent shrinkage. In Figure 2, we observe that the tail density is lower in P0(β1, β2) than in P1(β1, β2) and becomes larger when the spatial dependence is stronger. The higher tail density implies a better capability to avoid excessive shrinkage. Theoretically, it is shown that R(β1, β2) → ∞ as β1, β2 → ∞, with the divergence rate given in Supplementary Materials B.3. This result confirms the heavier tails under joint shrinkage and its stronger ability to retain large signals.
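A simple Monte Carlo check of these two properties, simulating directly from the two bivariate priors (a sketch under the stated setup, not code from the paper):

```python
import numpy as np
from scipy import stats

# Contrast the two bivariate priors of Section 4.1: independent shrinkage (P0)
# versus a common shrinkage parameter with correlation rho_beta in Sigma_beta (P1).
rng = np.random.default_rng(4)
n, rho = 1_000_000, 0.9

# P0: beta_i | lambda_i ~ N(0, lambda_i^2), lambda_i ~ C+(0, 1), independently
lam_ind = stats.halfcauchy.rvs(size=(n, 2), random_state=5)
beta0 = lam_ind * rng.standard_normal((n, 2))

# P1: common lambda ~ C+(0, 1), beta | lambda ~ N(0, lambda^2 * Sigma_beta)
lam = stats.halfcauchy.rvs(size=n, random_state=6)
L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
beta1 = lam[:, None] * (rng.standard_normal((n, 2)) @ L.T)

eps, t = 0.05, 10.0
print("both near zero:",
      np.mean(np.all(np.abs(beta0) < eps, axis=1)),   # independent shrinkage
      np.mean(np.all(np.abs(beta1) < eps, axis=1)))   # common shrinkage (larger)
print("both in tails :",
      np.mean(np.all(np.abs(beta0) > t, axis=1)),
      np.mean(np.all(np.abs(beta1) > t, axis=1)))     # common shrinkage (larger)
```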
4.2. Properties of the spatial horseshoe posterior
Let the error variance σ² be 1; then the posterior mean can be expressed as E(β | Y, λ) = (I2 − K)Y, where the shrinkage matrix is K = (I2 + Ωβ)⁻¹ and Ωβ = ΛΣβΛ. To examine the joint shrinkage induced by the SHP, we consider the one-number complexity summary (Efron, 2004)
df = tr(I2 − K).   (9)
If λ1 = λ2 = λ, then
df = 2λ²{1 + λ²(1 − ρβ²)} / {(1 + λ²)² − λ⁴ρβ²}.   (10)
In this special case df ranges between zero and two, and its prior distribution under the SHP resembles the horseshoe density scaled by 2. Figure 3 plots the density of df. The SHP with common shrinkage has spikes at zero and two, corresponding to jointly shrinking both coordinates or neither. This property encourages noise to be shrunk to zero and signals to be retained at their original values. The density of df with independent λ has a spike around one, which corresponds to shrinking only one of β1 and β2 rather than shrinking or not shrinking both. This characteristic holds even when ρβ = 0.9, which contradicts the expectation that a high correlation between two variables leads to similar shrinkage behavior.
Figure 3:
Induced prior density plot of the complexity measure df = tr(I2 − K) with independent and common shrinkage and ρβ ∈ {0, 0.5, 0.9}.
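The behavior summarized in Figure 3 can be reproduced approximately by simulating df = tr(I2 − K) under the two shrinkage schemes; the sketch below assumes σ² = 1 and is for illustration only:

```python
import numpy as np
from scipy import stats

# Monte Carlo sketch of the complexity measure df = tr(I - K) of Section 4.2,
# comparing common and independent shrinkage.
def df_draws(rho, common, n=20_000, seed=7):
    lam = stats.halfcauchy.rvs(size=(n, 2), random_state=seed)
    if common:
        lam[:, 1] = lam[:, 0]                      # lambda(s1) = lambda(s2)
    Sigma = np.array([[1.0, rho], [rho, 1.0]])
    I2 = np.eye(2)
    out = np.empty(n)
    for i in range(n):
        Omega = np.diag(lam[i]) @ Sigma @ np.diag(lam[i])   # Lambda Sigma_beta Lambda
        K = np.linalg.inv(I2 + Omega)                        # shrinkage matrix
        out[i] = np.trace(I2 - K)
    return out

df_common = df_draws(rho=0.9, common=True)
df_indep = df_draws(rho=0.9, common=False)
print("common  :", np.mean(df_common < 0.1), np.mean(df_common > 1.9))  # spikes at 0 and 2
print("indep.  :", np.mean((0.8 < df_indep) & (df_indep < 1.2)))        # mass near 1
```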
5. Simulation Study
We investigate how the SHP performs in detecting sparse signals compared with other methods using a simulation study.
5.1. Data generation
We consider two true surfaces β0(s), displayed in Figure 4. The observations are generated as Y(s) = β0(s) + ε(s), where the ε(s) are independent Normal(0, σ²) errors, on a 40 × 40 grid of n = 1,600 locations. The proportion of locations with nonzero signal β0(s) is 7.06% and 10.13% for the two surfaces. In addition, we vary the error variance σ² = 0.5², 1², 2². For each combination of true surface and σ², we generate N = 100 data sets. Figure 4 shows the two surfaces and a representative simulated data set for each surface.
Figure 4:
True surface β0(s) (left) and a simulated data set Y(s) with error standard deviation σ = 0.5 (middle) and σ = 2 (right).
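To illustrate the data-generating form, the sketch below simulates data of the same structure on a 40 × 40 grid; the disk-shaped surface is a hypothetical stand-in, since the paper's exact surfaces β0(s) are not reproduced here:

```python
import numpy as np

# Hypothetical stand-in for the simulation setup: a sparse disk-shaped signal on a
# 40 x 40 grid with independent Gaussian noise, illustrating the generating form
# Y(s) = beta_0(s) + eps(s), eps(s) ~ N(0, sigma^2).
rng = np.random.default_rng(8)
g = np.arange(40)
s1, s2 = np.meshgrid(g, g, indexing="ij")

beta0 = np.zeros((40, 40))
beta0[(s1 - 12) ** 2 + (s2 - 12) ** 2 <= 16] = 2.0   # one circular signal region (assumed)

sigma = 1.0
Y = beta0 + sigma * rng.standard_normal((40, 40))    # observations on the grid
print("proportion of nonzero signal:", np.mean(beta0 != 0))
```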
5.2. Models
Although the SHP can accommodate other covariance structures, we use the conditionally autoregressive prior (CAR, Carlin and Banerjee, 2003) for Σβ. The CAR covariance is Σβ = (M − ρβA)⁻¹, where M is the diagonal matrix with elements m1, ..., mn giving the number of neighbors of locations s1, ..., sn, ρβ is the spatial dependence parameter, and A is the adjacency matrix with Aij = 1 if locations si and sj are neighbors and Aij = 0 otherwise. Note that while ρβ ∈ (0, 1) determines the strength of spatial dependence, it is not a correlation parameter as in Section 4. The CAR prior is a natural choice for data on a discrete grid, as discussed in Cressie (2013) and Gelfand et al. (2010). It is an intuitive prior specification and has the advantage of a sparse inverse covariance matrix, which leads to dramatic computational savings; for our analysis of over a million pixels, a prior that did not give a sparse inverse covariance would require approximation and/or prohibitively long run times. Computational details are given in Supplementary Materials C.
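A sketch of the sparse CAR precision construction on a grid with first-order neighbors (illustration only; the full computational details are in Supplementary Materials C):

```python
import numpy as np
from scipy import sparse

# Sketch of the CAR structure for Sigma_beta: sparse precision Q = M - rho * A on a
# grid with first-order (rook) neighbors. Sigma_beta = Q^{-1} is never formed
# explicitly; the sparse precision is what keeps large images tractable.
def car_precision(nrow, ncol, rho):
    n = nrow * ncol
    idx = np.arange(n).reshape(nrow, ncol)
    pairs = []
    # horizontal and vertical neighbor pairs on the grid
    for a, b in [(idx[:, :-1], idx[:, 1:]), (idx[:-1, :], idx[1:, :])]:
        pairs.append(np.column_stack([a.ravel(), b.ravel()]))
    pairs = np.vstack(pairs)
    rows = np.concatenate([pairs[:, 0], pairs[:, 1]])     # symmetric adjacency
    cols = np.concatenate([pairs[:, 1], pairs[:, 0]])
    A = sparse.coo_matrix((np.ones(rows.size), (rows, cols)), shape=(n, n)).tocsr()
    M = sparse.diags(np.asarray(A.sum(axis=1)).ravel())   # m_i = number of neighbors
    return M - rho * A                                    # Q = M - rho * A

Q = car_precision(40, 40, rho=0.95)
print(Q.shape, Q.nnz)   # (1600, 1600) with about five nonzeros per row
```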
We fit five models for β(s) to each simulated dataset. The first four are versions of the spatial horseshoe differentiated by their flexibility in modeling the shrinkage process λ(s). The fifth method is the soft-thresholded Gaussian process (STGP) model of Kang et al. (2018), which places prior mass exactly at zero.
Gaussian:
λ(s) is constant across space, λ(s) = λ0, which leads to β ~ Normal(0, λ0²Σβ). The error variance σ² and the scale parameter λ0² follow uninformative inverse gamma priors IG(0.1, 0.1), and the spatial dependence parameter ρβ follows a beta prior Beta(10, 1). This is the usual Gaussian CAR model.
SHS_quad:
Low-rank representation using the half-Cauchy link function with quadratic basis expansions, λ(si) = λ0 l[θ(si)] with θ(si) = x1(si)b1 + ··· + x6(si)b6, where l[·] is the half-Cauchy link function and the basis consists of an intercept, linear and quadratic terms in the coordinates, and the interaction x6(si) = si1si2, for i = 1, ..., n.
SHS_B-spline 1:
Low-rank representation using the half-Cauchy link function with B-spline basis expansions with five degrees of freedom for each coordinate, λ(si) = λ0 l[θ(si)] with θ(si) = x1(si)b1 + ··· + xJ(si)bJ, where l[·] is the half-Cauchy link function and xj(si) is the jth product of B-spline bases at location si.
SHS_B-spline 2:
Same structure as the model SHS_B-spline 1, but with ten degrees of freedom for each coordinate.
STGP:
Soft-thresholded Gaussian process prior for β(s). Let α = [α(s1), ..., α(sn)]T follow a multivariate normal distribution with zero mean and CAR covariance with first-order neighborhood. Then β(s) = gκ[α(s)], where gκ is the soft-thresholding function that maps values near zero to exactly zero and thus gives a sparse prior: gκ(x) = 0 if |x| ≤ κ, gκ(x) = x − κ if x > κ, and gκ(x) = x + κ if x < −κ.
The thresholding parameter κ follows a uniform prior U(0,10) and controls the degree of sparsity.
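For reference, the soft-thresholding map can be written in a few lines (a standard definition, shown only for illustration):

```python
import numpy as np

# Soft-thresholding map used by the STGP prior as described above: values within
# kappa of zero become exactly zero and larger values are pulled toward zero by kappa.
def soft_threshold(x, kappa):
    return np.sign(x) * np.maximum(np.abs(x) - kappa, 0.0)

# e.g., with kappa = 0.5: -2 -> -1.5, anything in [-0.5, 0.5] -> 0, 1.5 -> 1.0
print(soft_threshold(np.array([-2.0, -0.3, 0.0, 0.4, 1.5]), kappa=0.5))
```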
5.3. Evaluation metrics
We compare methods in terms of root mean squared error (RMSE), coverage probability, Type I error and power. For simulated data set k, let β̂k(s) be the posterior mean and [lk(s), uk(s)] be the 95% credible interval of β(s). The measures for data set k are
RMSEk = [n⁻¹ Σi {β̂k(si) − β0(si)}²]^(1/2),
Coveragek = n⁻¹ Σi I{lk(si) ≤ β0(si) ≤ uk(si)},
Type I errork = Σi I{0 ∉ [lk(si), uk(si)], β0(si) = 0} / Σi I{β0(si) = 0},
Powerk = Σi I{0 ∉ [lk(si), uk(si)], β0(si) ≠ 0} / Σi I{β0(si) ≠ 0},
where I{·} is the indicator function and the sums are over the n locations; each measure is averaged over the N = 100 data sets.
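A sketch of how these metrics could be computed for one simulated data set, assuming significance at a location is decided by whether its 95% credible interval excludes zero (not the authors' code):

```python
import numpy as np

# Evaluation metrics for one simulated data set, given the true surface beta0, the
# posterior means beta_hat, and the lower/upper limits of the 95% credible intervals.
def metrics(beta0, beta_hat, lower, upper):
    rmse = np.sqrt(np.mean((beta_hat - beta0) ** 2))
    coverage = np.mean((lower <= beta0) & (beta0 <= upper))
    reject = (lower > 0) | (upper < 0)          # credible interval excludes zero
    null, signal = beta0 == 0, beta0 != 0
    type1 = np.mean(reject[null])               # false positives among true zeros
    power = np.mean(reject[signal])             # true positives among true signals
    return rmse, coverage, type1, power
```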
5.4. Results
The results are given in Tables 1 and 2. RMSE decreases as the number of basis functions in the spatial shrinkage process λ(s) increases, since a larger number of basis functions enhances the flexibility of λ(s). RMSE for the SHS_B-spline 2 model drops 34.06% relative to the Gaussian model in the low-noise case under Signal 2. The SHS_B-spline models have the smallest RMSE in the low-noise and mid-noise cases, while STGP outperforms the other models in RMSE in the high-noise case. Coverage and Type I error are at or near the nominal level for all methods except the Gaussian model in the low-noise cases. All models have strong power when the error variance is small. However, large error variance distinguishes the models. The STGP model has about 66% of the power of the SHS models. The loss of power for STGP results from excessive shrinkage due to the soft-thresholding formulation, especially in the large-noise scenario.
Table 1:
Summary of the simulation study under Signal 1 by error variance σ2 for the Gaussian, spatial horseshoe with quadratic (SHS_quad), spline 1 (SHS_B-spline 1) and spline 2 (SHS_B-spline 2) shrinkage models and the soft-thresholded Gaussian process (STGP) model.
| Statistics | Model | σ² = 0.5²: Estimate | SE | σ² = 1²: Estimate | SE | σ² = 2²: Estimate | SE |
|---|---|---|---|---|---|---|---|
| 100×RMSE | Gaussian | 33.08 | 0.11 | 42.34 | 0.17 | 52.21 | 0.26 |
| | SHS_quad | 22.32 | 0.07 | 31.92 | 0.14 | 43.84 | 0.28 |
| | SHS_B-spline 1 | 19.67 | 0.13 | 28.97 | 0.21 | 42.32 | 0.39 |
| | SHS_B-spline 2 | 21.31 | 0.12 | 31.19 | 0.14 | 44.31 | 0.34 |
| | STGP | 24.05 | 0.22 | 30.75 | 0.45 | 39.76 | 0.33 |
| Coverage (%) | Gaussian | 91.67 | 0.17 | 97.46 | 0.05 | 98.66 | 0.05 |
| | SHS_quad | 98.09 | 0.04 | 98.68 | 0.03 | 99.28 | 0.03 |
| | SHS_B-spline 1 | 97.91 | 0.51 | 97.78 | 0.68 | 98.61 | 0.60 |
| | SHS_B-spline 2 | 97.61 | 0.36 | 97.31 | 0.08 | 98.22 | 0.13 |
| | STGP | 98.15 | 0.07 | 99.48 | 0.03 | 99.62 | 0.02 |
| Type I error (%) | Gaussian | 7.25 | 0.14 | 2.60 | 0.06 | 1.18 | 0.05 |
| | SHS_quad | 1.79 | 0.05 | 1.27 | 0.04 | 0.79 | 0.04 |
| | SHS_B-spline 1 | 1.73 | 0.40 | 1.91 | 0.55 | 1.06 | 0.19 |
| | SHS_B-spline 2 | 1.96 | 0.23 | 2.27 | 0.09 | 1.44 | 0.15 |
| | STGP | 0.56 | 0.03 | 0.15 | 0.02 | 0.01 | 0.00 |
| Power (%) | Gaussian | 100.00 | 0.00 | 99.97 | 0.02 | 98.00 | 0.14 |
| | SHS_quad | 100.00 | 0.00 | 99.86 | 0.03 | 95.36 | 0.25 |
| | SHS_B-spline 1 | 100.00 | 0.00 | 99.86 | 0.03 | 95.65 | 0.25 |
| | SHS_B-spline 2 | 100.00 | 0.00 | 99.94 | 0.02 | 96.60 | 0.27 |
| | STGP | 100.00 | 0.00 | 98.48 | 0.15 | 67.85 | 0.62 |
Table 2:
Summary of the simulation study under Signal 2 by error variance σ2 for the Gaussian, spatial horseshoe with quadratic (SHS_quad), spline 1 (SHS_B-spline 1) and spline 2 (SHS_B-spline 2) shrinkage models and the soft-thresholded Gaussian process (STGP) model.
| Statistics | Model | σ² = 0.5²: Estimate | SE | σ² = 1²: Estimate | SE | σ² = 2²: Estimate | SE |
|---|---|---|---|---|---|---|---|
| 100×RMSE | Gaussian | 37.04 | 0.12 | 49.21 | 0.18 | 60.98 | 0.27 |
| | SHS_quad | 27.48 | 0.08 | 39.66 | 0.15 | 53.13 | 0.27 |
| | SHS_B-spline 1 | 24.87 | 0.09 | 36.78 | 0.22 | 52.65 | 0.42 |
| | SHS_B-spline 2 | 24.41 | 0.10 | 36.66 | 0.13 | 52.53 | 0.36 |
| | STGP | 27.03 | 0.20 | 43.07 | 0.46 | 49.55 | 0.37 |
| Coverage (%) | Gaussian | 88.09 | 0.24 | 96.47 | 0.07 | 98.59 | 0.05 |
| | SHS_quad | 96.74 | 0.08 | 97.74 | 0.06 | 98.76 | 0.05 |
| | SHS_B-spline 1 | 97.33 | 0.11 | 97.75 | 0.22 | 98.29 | 0.40 |
| | SHS_B-spline 2 | 97.40 | 0.12 | 96.87 | 0.07 | 98.13 | 0.08 |
| | STGP | 98.15 | 0.07 | 98.42 | 0.07 | 99.31 | 0.03 |
| Type I error (%) | Gaussian | 9.79 | 0.17 | 3.67 | 0.08 | 1.39 | 0.05 |
| | SHS_quad | 2.87 | 0.08 | 2.07 | 0.07 | 1.08 | 0.05 |
| | SHS_B-spline 1 | 2.40 | 0.09 | 1.85 | 0.14 | 1.45 | 0.35 |
| | SHS_B-spline 2 | 2.19 | 0.09 | 2.53 | 0.09 | 1.36 | 0.08 |
| | STGP | 0.56 | 0.03 | 0.65 | 0.05 | 0.03 | 0.01 |
| Power (%) | Gaussian | 100.00 | 0.00 | 99.90 | 0.02 | 94.93 | 0.21 |
| | SHS_quad | 100.00 | 0.00 | 99.79 | 0.04 | 92.69 | 0.27 |
| | SHS_B-spline 1 | 100.00 | 0.00 | 99.66 | 0.05 | 92.43 | 0.32 |
| | SHS_B-spline 2 | 100.00 | 0.00 | 99.79 | 0.03 | 92.65 | 0.33 |
| | STGP | 100.00 | 0.00 | 98.91 | 0.09 | 64.36 | 0.56 |
Figures 5 and 6 illustrate the simulation results under Signal 2 with σ = 0.5 and 2, respectively. Increasing the flexibility of the shrinkage process improves signal identification and leads to a smoother signal surface. The Gaussian model tends to have more false positives, especially when the error variance is small, which explains why it has strong power but poor RMSE. Simulation plots under other scenarios, including data generated with spatially correlated errors, are shown in Supplementary Materials D. In terms of computing time, analyzing a simulated data set takes 0.33, 0.87, 1.23, 3.40 and 10.29 minutes for the Gaussian, SHS_quad, SHS_B-spline 1, SHS_B-spline 2 and STGP models, respectively.
Figure 5:
Simulation results for Signal 2 with σ = 0.5 for the four models (rows): the first row shows the true signal β0(s) and results for the soft-thresholded Gaussian process (STGP) model, the second row shows the simulated data Y(s) and results for the Gaussian model, the third and fourth rows show results for spatial horseshoe with quadratic (SHS_quad) and spline (SHS_B-spline 2) shrinkage models.
Figure 6:
Simulation results for Signal 2 with σ = 2 for the four models (rows): the first row shows the true signal β0(s) and results for the soft-thresholded Gaussian process (STGP) model, the second row shows the simulated data Y(s) and results for the Gaussian model, the third and fourth rows show results for spatial horseshoe with quadratic (SHS_quad) and spline (SHS_B-spline 2) shrinkage models.
6. Analysis of the 2-D XRD data
6.1. Model comparisons
In this section we apply the model proposed in Section 3 to the 2-D XRD data. We tailor the SHP model to the 2-D XRD data to capture the ring-shaped pattern visible in Figure 1. Rather than using basis expansions in the rectangular coordinates as in the simulation study, we consider a basis expansion in the radius, r ≥ 0, and the angle, a ∈ [0, 2π), from the central point. In the simulation study the true features were defined on disks and thus we used radial basis functions for the smoothing process λ(s). As is apparent in Figure 1, the features in the XRD data are annuli, and we thus choose basis functions defined in polar coordinates to capture the shape of these features. We use Fourier and B-spline basis expansions for the angle and the radius, respectively. Let A1(a), ..., Aka(a) be the Fourier basis functions and B1(r), ..., Bkr(r) be the B-spline basis functions, where ka and kr are the numbers of basis functions in the angle and the radius, respectively. The Fourier basis functions are A1(a) = sin(a), A2(a) = cos(a), A3(a) = sin(2a), A4(a) = cos(2a), etc. At pixel s, the basis functions consist of the J = ka × kr products of the Fourier and B-spline basis expansions, i.e., each xj(s) is the product of one Fourier basis function of the angle and one B-spline basis function of the radius at s. The priors and computing details remain the same as in Section 5.
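The sketch below illustrates how such a polar-coordinate design matrix can be assembled; Gaussian radial bumps stand in for the B-spline radial basis and the image center is assumed known, so it is a simplified illustration rather than the exact basis used in the analysis:

```python
import numpy as np

# Polar-coordinate basis for lambda(s): Fourier functions of the angle times radial
# basis functions (Gaussian bumps here as a stand-in for B-splines).
def polar_basis(shape, center, k_a=4, k_r=10):
    i, j = np.indices(shape)
    r = np.hypot(i - center[0], j - center[1]).ravel()                        # radius
    a = np.mod(np.arctan2(j - center[1], i - center[0]), 2 * np.pi).ravel()   # angle

    # Fourier basis in the angle: sin(a), cos(a), sin(2a), cos(2a), ...
    ang = [f(m * a) for m in range(1, k_a // 2 + 1) for f in (np.sin, np.cos)]
    # Radial bumps (a stand-in for B-splines) spread over the observed radii
    knots = np.linspace(0.0, r.max(), k_r)
    width = knots[1] - knots[0]
    rad = [np.exp(-0.5 * ((r - k) / width) ** 2) for k in knots]

    # Tensor products: one column for each (angle basis) x (radius basis) pair
    return np.column_stack([u * v for u in ang for v in rad])

X = polar_basis((64, 64), center=(31.5, 31.5))
print(X.shape)   # (4096, 40) for k_a = 4, k_r = 10
```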
We implement five-fold cross-validation to evaluate prediction in models with varying flexibility of shrinkage across space. We consider (ka, kr) = (4, 10), (4, 25), (4, 50), (4, 100), (8, 5) and (8, 25). We do not include STGP in the model comparisons because of its heavy computational cost. Table 3 presents the results as RMSE − 1, since for z-scores with independent standard normal error, one is the lowest achievable prediction RMSE. Generally, the predicted RMSE declines as the flexibility of shrinkage grows, similar to the simulation results in Section 5. However, for the same total number of basis functions, a larger number of basis functions in the angle tends to give a smaller prediction error (e.g., SHS5 versus SHS1 and SHS6 versus SHS3). According to a Wilcoxon signed-rank test, the RMSE of the models SHS1-SHS6 is significantly smaller than that of the Gaussian model. Coverage is close to the nominal level of 95% for all models.
Table 3:
Comparison of prediction on the 2-D XRD data using the Gaussian and spatial horseshoe (SHS) models with ka basis functions in the angle and kr basis functions in the radius (SHS1-SHS6). Methods are compared using 100×(root mean squared error-1) as RMSE, coverage (%) and computing time per 100 iterations in minutes.
| Model | ka | kr | RMSE: Estimate | SE | Coverage (%): Estimate | SE | Computing time (mins) |
|---|---|---|---|---|---|---|---|
| Gaussian | - | - | 3.64 | 0.08 | 94.72 | 0.02 | 2.78 |
| SHS1 | 4 | 10 | 3.59 | 0.08 | 94.78 | 0.02 | 6.57 |
| SHS2 | 4 | 25 | 3.57 | 0.08 | 94.79 | 0.02 | 9.91 |
| SHS3 | 4 | 50 | 3.55 | 0.08 | 94.80 | 0.03 | 17.45 |
| SHS4 | 4 | 100 | 3.54 | 0.08 | 94.80 | 0.02 | 41.61 |
| SHS5 | 8 | 5 | 3.56 | 0.08 | 94.80 | 0.02 | 7.38 |
| SHS6 | 8 | 25 | 3.50 | 0.09 | 94.84 | 0.02 | 8.62 |
6.2. Summary of fitted models
In this section we fit the Gaussian and SHS models described above to the full dataset. Because the number of pixels is large, we implement the Bayesian spatial false discovery rate (BSFDR) procedure (Sun et al., 2015) with rate 0.05 to control for multiple testing. We consider the one-sided null and alternative hypotheses H0 : β(s) ≤ 0 and H1 : β(s) > 0, respectively. The BSFDR procedure provides a critical probability such that H0 is rejected whenever the posterior probability P(β(s) > 0 | Y) exceeds it. The critical probability is 92.93% for all models except SHS4, for which it is 91.92%. The proportion of pixels for which H0 is rejected is also similar across the Gaussian and SHS models, ranging from 1.42% to 2.02%.
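A generic sketch of Bayesian FDR thresholding in the spirit of the cited procedure (not the authors' implementation):

```python
import numpy as np

# Given posterior probabilities p_i = P(beta(s_i) > 0 | Y), reject H0 at the locations
# with the largest p_i such that the average posterior probability of a false
# discovery among the rejections stays below alpha.
def bayes_fdr_threshold(post_prob, alpha=0.05):
    order = np.argsort(post_prob)[::-1]                    # most significant first
    fdr = np.cumsum(1.0 - post_prob[order]) / np.arange(1, post_prob.size + 1)
    keep = np.where(fdr <= alpha)[0]
    if keep.size == 0:
        return 1.0                                         # reject nothing
    return post_prob[order[keep[-1]]]                      # critical probability

p = np.random.default_rng(9).beta(0.3, 0.3, size=10_000)   # toy posterior probabilities
cutoff = bayes_fdr_threshold(p)
print(cutoff, np.mean(p >= cutoff))                        # critical value, share rejected
```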
Figure 7 displays the fitted results for the Gaussian and SHS6 (lowest RMSE) models. The posterior mean of the shrinkage parameter λ(s) (Figure 7, bottom left) is large for only a few radii and only a subset of the angles on the rings. These areas also have the largest posterior mean of β(s) and the largest posterior probability that β(s) is positive. The map of the difference in posterior probability (Figure 7, top right) shows that the SHS6 model encourages shrinkage along the rings. Comparing models, we find more significant pixels and stronger spatial clustering of the signal in the SHS model than in the Gaussian model. In addition, Figure 8 illustrates that the density of the posterior mean of β(s) from the SHP model (SHS6) has higher concentration around zero and heavier tails than that from the Gaussian model.
Figure 7:
Results of the Gaussian model (top row) and the best-predicting model SHS6 (bottom row), with columns showing different statistics: the first column plots the observations and the posterior mean of the shrinkage process λ(s) from the model SHS6, the second column plots the posterior mean of the signal β(s) truncated at ±0.5 to illustrate the contrast between the two models, and the third column plots the difference in posterior probability that β(s) is positive, i.e., P(β(s) > 0 | Y) under SHS6 minus that under the Gaussian model, and the signal map for the model SHS6 with the Bayesian spatial false discovery rate controlled at 0.05.
Figure 8:
Density plot (left) and quantile-quantile plot (right) of the posterior mean of signal β(s) from the Gaussian and SHS6 models.
Figure 7 is an analysis of one time point (t = 20 seconds). We investigate the temporal trend of the signals by fitting the model SHS6 to the standardized intensity Y(s) at all time points. We use the sum of squared signals, D = Σi β(si)², as a measure of the strength of structural change in the PZT sample. Figure 9 displays boxplots of the posterior of D and the electric field by time. Generally, the structural change becomes larger when the magnitude of the electric field increases. The measure D is smaller in the first half of the experimental period, under a positive charge, than in the second half, under a negative charge. The reason for the rapid increase in D between 54 and 58 seconds is a material phenomenon known as ferroelectric switching; the electric field amplitude at this time exceeds that required for substantial reorientation of material elements, leading to a significant change in the diffraction pattern. This example on ferroelectrics demonstrates the versatility of the method for efficiently assessing 2-D XRD images for the purposes of materials characterization.
Figure 9:
Time series plot of the posterior of D = Σi β(si)² (boxplots) from the model SHS6 applied separately at each time point, together with the bipolar electric field from −2 kV/mm to 2 kV/mm (line).
7. Conclusion
In this paper, we propose a new method for sparse signal detection with application to 2-D XRD imaging data. To our knowledge, the SHP is the first continuous shrinkage prior for spatial data. Theoretically, we prove that the SHP has the univariate horseshoe prior marginally at each location, and that the induced joint distribution for pairs of nearby sites has higher concentration around zero and heavier tails compared to independent horseshoe priors. Further, we facilitate computation via the HMC algorithm (Neal, 1994) for high-dimensional data. The simulation and empirical results both show improvements in estimation and prediction when we account for spatial dependence.
A potential limitation of our method is that the data are assumed to be independent given the mean vector β. This is sufficient for the 2-D XRD imaging data because the distribution of photons ought to follow a Poisson process with independent counts across pixels, but it may not hold for other data. Our simulation results (Supplementary Materials D) show that for data with moderate spatial correlation, our working independence model maintained reasonable error rates but had lower power than in the simulation with independent errors. For simulated data with strong residual spatial correlation, the model with the working independence assumption did not perform well. The shortcomings of the working independence model lead us to consider a richer model that includes spatial correlation in the residuals. A second restriction is the assumption of a CAR covariance structure with first-order neighbors. The covariance structure can be modified to meet the needs of other situations. For instance, the Matérn covariance (Stein, 1999) is a covariance function depending on the distance between two locations, a smoothness parameter and a range parameter; this general class includes the exponential covariance and the squared exponential covariance. This flexibility may be preferred, and it is computationally feasible for smaller datasets.
Deep learning via convolutional neural networks (CNNs) has emerged as an immensely powerful tool for extracting information from images (Goodfellow et al., 2016; Rawat and Wang, 2017; Yamashita et al., 2018). The standard CNN is a supervised learning algorithm that uses an image as the predictor. However, in our setting we do not have known labels (e.g., images with locations that are known to have a change) to train the model. Also, while CNNs are strong for prediction, they are weak for formal statistical inference, which is our primary objective.
In the future, the proposed method can be extended to Bayesian variable selection in spatial linear regression. In spatial regression, the covariate effects can vary by site (Gelfand et al., 2003), and the SHP for the spatially-varying effects would encourage sparsity in these effects. We can also extend the SHP to a spatio-temporal horseshoe model for longitudinal data. The extended model could be used to test for spatio-temporal anomalies in a surveillance study (e.g., Li et al., 2012) or in process monitoring (e.g., Yan et al., 2018).
Supplementary Material
Acknowledgements
This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). The authors thank the editor, associate editor, and two anonymous referees for insightful and constructive comments on an earlier version of the manuscript.
References
- Acharya UR, Fujita H, Sudarshan VK, Mookiah MRK, Koh JEW, Tan JH, Hagiwara Y, Chua CK, Junnarkar SP, Vijayananthan A, and Ng KH (2016). An integrated index for identification of fatty liver disease using radon transform and discrete cosine transform features in ultrasound images. Information Fusion, 31:43–53.
- Armagan A, Dunson DB, and Lee J (2013). Generalized double Pareto shrinkage. Statistica Sinica, 23(1):119–143.
- Bhadra A, Datta J, Li Y, Polson NG, and Willard B (2016a). Prediction risk for the horseshoe regression. ArXiv e-prints.
- Bhadra A, Datta J, Polson NG, and Willard B (2016b). Default Bayesian analysis with global-local shrinkage priors. Biometrika, 103(4):955–969.
- Bhattacharya A, Pati D, Pillai NS, and Dunson DB (2014). Dirichlet-Laplace priors for optimal shrinkage. ArXiv e-prints.
- Boehm Vock LF, Reich BJ, Fuentes M, and Dominici F (2015). Spatial variable selection methods for investigating acute health effects of fine particulate matter components. Biometrics, 71(1):167–177.
- Bradley D and Roth G (2007). Adaptive thresholding using the integral image. Journal of Graphics Tools, 12(2):13–21.
- Carlin PB and Banerjee S (2003). Hierarchical multivariate CAR models for spatio-temporally correlated survival data. Bayesian Statistics, 7:45–63.
- Carretero-Moya J, Gismero-Menoyo J, Asensio-López A, and Blanco-del Campo A (2009). Application of the radon transform to detect small-targets in sea clutter. IET Radar, Sonar & Navigation, 3(2):155–166.
- Carvalho CM, Polson NG, and Scott JG (2010). The horseshoe estimator for sparse signals. Biometrika, 97(2):465–480.
- Cho TS, Paris S, Horn BKP, and Freeman WT (2011). Blur kernel estimation using the Radon transform. In CVPR 2011, pages 241–248.
- Costa MA and Kulldorff M (2009). Applications of spatial scan statistics: a review. In Glaz J, Pozdnyakov V, and Wallenstein S, editors, Scan Statistics: Methods and Applications, pages 129–152. Birkhäuser Boston, Boston, MA.
- Cressie NAC (2013). Statistics for Spatial Data. Wiley, New York, NY.
- Datta J and Ghosh JK (2013). Asymptotic properties of Bayes risk for the horseshoe prior. Bayesian Analysis, 8(1):111–132.
- Deepak KS and Sivaswamy J (2011). Automatic assessment of macular edema from color retinal images. IEEE Transactions on Medical Imaging, 31(3):766–776.
- Donoho DL and Johnstone JM (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3):425–455.
- Dutta I and Singh RN (2011). Dynamic in situ x-ray diffraction study of antiferroelectric-ferroelectric phase transition in strontium-modified lead zirconate titanate ceramics. Integrated Ferroelectrics, 131(1):153–172.
- Editorial (2014). Crystallography matters. Nature Materials, 13.
- Efron B (2004). The estimation of prediction error: covariance penalties and cross-validation. Journal of the American Statistical Association, 99(467):619–632.
- Esteves G, Fancher CM, and Jones JL (2015). In situ characterization of polycrystalline ferroelectrics using x-ray and neutron diffraction. Journal of Materials Research, 30(3):340–356.
- Friedman J, Hastie T, Höfling H, and Tibshirani R (2007). Pathwise coordinate optimization. The Annals of Applied Statistics, 1(2):302–332.
- Friedman J, Hastie T, and Tibshirani R (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441.
- Gelfand AE, Kim H-J, Sirmans CF, and Banerjee S (2003). Spatial modeling with spatially varying coefficient processes. Journal of the American Statistical Association, 98(462):387–396.
- Gelfand AE, Diggle P, Guttorp P, and Fuentes M, editors (2010). Handbook of Spatial Statistics. CRC Press, Boca Raton, FL.
- George EI and McCulloch RE (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88(423):881–889.
- George EI and McCulloch RE (1997). Approaches for Bayesian variable selection. Statistica Sinica, 7(2):339–374.
- Geweke J (1996). Variable selection and model comparison in regression. Bayesian Statistics, 5:609–620.
- Goldsmith J, Huang L, and Crainiceanu CM (2014). Smooth scalar-on-image regression via spatial Bayesian variable selection. Journal of Computational and Graphical Statistics, 23(1):46–64.
- Goodfellow I, Bengio Y, and Courville A (2016). Deep Learning. MIT Press. http://www.deeplearningbook.org.
- Gorfman S, Keeble DS, Glazer AM, Long X, Xie Y, Ye Z-G, Collins S, and Thomas PA (2011). High-resolution x-ray diffraction study of single crystals of lead zirconate titanate. Physical Review B, 84:020102.
- Griffin JE and Brown PJ (2010). Inference with normal-gamma prior distributions in regression problems. Bayesian Analysis, 5(1):171–188.
- He L and Wang Y (2017). Wavelet frame based image restoration using sparsity, nonlocal and support prior of frame coefficients. The Visual Computer.
- Jansen M (2001). Noise Reduction by Wavelet Thresholding. Springer, New York, NY.
- Johnson VE and Rossell D (2012). Bayesian model selection in high-dimensional settings. Journal of the American Statistical Association, 107(498):649–660.
- Kang J, Reich BJ, and Staicu A-M (2018). Scalar-on-image regression via the soft-thresholded Gaussian process. Biometrika, 105(1):165–184.
- Katrusiak A (2008). High-pressure crystallography. Acta Crystallographica Section A, 64(1):135–148.
- Kinoshita SK, Azevedo-Marques PM, Pereira RR Jr, Rodrigues AH, and Rangayyan RM (2008). Radon-domain detection of the nipple and the pectoral muscle in mammograms. Journal of Digital Imaging, 21(1):37–49.
- Kulldorff M (1997). A spatial scan statistic. Communications in Statistics - Theory and Methods, 26(6):1481–1496.
- Li F, Zhang T, Wang Q, Gonzalez MZ, Maresh EL, and Coan JA (2015). Spatial Bayesian variable selection and grouping for high-dimensional scalar-on-image regression. The Annals of Applied Statistics, 9(2):687–713.
- Li G, Best N, Hansell AL, Ahmed I, and Richardson S (2012). BaySTDetect: detecting unusual temporal patterns in small area data via Bayesian model choice. Biostatistics, 13(4):695–710.
- Liao Y, Li X, and Wang J (2017). The study on modified spatial scan statistic. In Yang W, editor, Early Warning for Infectious Disease Outbreak, pages 329–342. Academic Press, Cambridge, MA.
- Mallick H and Yi N (2013). Bayesian methods for high dimensional linear models. Journal of Biometrics and Biostatistics.
- Mitchell TJ and Beauchamp JJ (1988). Bayesian variable selection in linear regression. Journal of the American Statistical Association, 83(404):1023–1032.
- Naus JI (1965). The distribution of the size of the maximum cluster of points on a line. Journal of the American Statistical Association, 60(310):532–538.
- Neal RM (1994). An improved acceptance procedure for the hybrid Monte Carlo algorithm. Journal of Computational Physics, 111(1):194–203.
- Nelsen RB (2006). An Introduction to Copulas. Springer, New York, NY.
- O'Hara RB and Sillanpää MJ (2009). A review of Bayesian variable selection methods: what, how and which. Bayesian Analysis, 4(1):85–118.
- Piironen J and Vehtari A (2016). On the hyperprior choice for the global shrinkage parameter in the horseshoe prior. ArXiv e-prints.
- Qiu P and Yandell B (1997). Jump detection in regression surfaces. Journal of Computational and Graphical Statistics, 6(3):332–354.
- Radon J (1986). On the determination of functions from their integral values along certain manifolds (Parks PC, Trans.). IEEE Transactions on Medical Imaging, MI-5(4):170–176.
- Rawat W and Wang Z (2017). Deep convolutional neural networks for image classification: A comprehensive review. Neural Computation, 29(9):2352–2449.
- Ročková V and George EI (2018). The spike-and-slab lasso. Journal of the American Statistical Association, 113(521):431–444.
- Soille P (2013). Morphological Image Analysis: Principles and Applications. Springer Science & Business Media, Berlin, Germany.
- Stein ML (1999). Interpolation of Spatial Data: Some Theory for Kriging. Springer, New York, NY.
- Sun W, Reich BJ, Tony Cai T, Guindani M, and Schwartzman A (2015). False discovery control in large-scale spatial multiple testing. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77(1):59–83.
- Taswell C (2000). The what, how, and why of wavelet shrinkage denoising. Computing in Science & Engineering, 2(3):12–19.
- Thomas JM (2012). The birth of x-ray crystallography. Nature, 491:186–187.
- van der Pas S, Szabó B, and van der Vaart A (2017). Uncertainty quantification for the horseshoe. Bayesian Analysis, 12(4):1221–1274.
- van der Pas SL, Kleijn BJK, and van der Vaart AW (2014). The horseshoe estimator: posterior concentration around nearly black vectors. Electronic Journal of Statistics, 8(2):2585–2618.
- Xu J, Yu J, Peng Y, and Xia X (2011). Radon-Fourier transform for radar target detection, I: Generalized Doppler filter bank. IEEE Transactions on Aerospace and Electronic Systems, 47(2):1186–1202.
- Yadav M, Yadav S, and Sharma D (2014). Image denoising using orthonormal wavelet transform with stein unbiased risk estimator. In Electrical, Electronics and Computer Science (SCEECS), 2014 IEEE Students' Conference, pages 1–4, Bhopal, India.
- Yamashita R, Nishio M, Do RKG, and Togashi K (2018). Convolutional neural networks: an overview and application in radiology. Insights into Imaging, 9(4):611–629.
- Yan H, Paynabar K, and Shi J (2017). Anomaly detection in images with smooth background via smooth-sparse decomposition. Technometrics, 59(1):102–114.
- Yan H, Paynabar K, and Shi J (2018). Real-time monitoring of high-dimensional functional data streams via spatio-temporal smooth sparse decomposition. Technometrics, 60(2):181–197.
- Yuan M and Lin Y (2005). Efficient empirical Bayes variable selection and estimation in linear models. Journal of the American Statistical Association, 100(472):1215–1225.
- Zhang Y, Reich BJ, and Bondell H (2017). High dimensional linear regression via the R2-D2 shrinkage prior. ArXiv e-prints.