Biometrika (2020), 107(3), 745–752. doi: 10.1093/biomet/asaa008

Bayesian cumulative shrinkage for infinite factorizations

Sirio Legramanti, Daniele Durante and David B. Dunson

Summary

The dimension of the parameter space is typically unknown in a variety of models that rely on factorizations. For example, in factor analysis the number of latent factors is not known and has to be inferred from the data. Although classical shrinkage priors are useful in such contexts, increasing shrinkage priors can provide a more effective approach that progressively penalizes expansions with growing complexity. In this article we propose a novel increasing shrinkage prior, called the cumulative shrinkage process, for the parameters that control the dimension in overcomplete formulations. Our construction has broad applicability and is based on an interpretable sequence of spike-and-slab distributions which assign increasing mass to the spike as the model complexity grows. Using factor analysis as an illustrative example, we show that this formulation has theoretical and practical advantages relative to current competitors, including an improved ability to recover the model dimension. An adaptive Markov chain Monte Carlo algorithm is proposed, and the performance gains are outlined in simulations and in an application to personality data.

Keywords: Factor analysis, Increasing shrinkage, Multiplicative gamma process, Spike-and-slab distribution

1. Introduction

Shrinkage priors have received considerable attention (e.g., Ishwaran & Rao, 2005; Carvalho et al., 2010), but the focus has been on high-dimensional regression, where there is no natural ordering among the coefficients. However, there are several settings in which an order is present. Indeed, in statistical models relying on low-rank factorizations, such as factor and tensor models, it is natural to expect that additional dimensions will play a progressively less important role in characterizing the structure of the model, and therefore the associated parameters should have a stochastically decreasing effect. This behaviour can be induced through increasing shrinkage priors. In the context of Bayesian factor models, an example of this approach can be found in the multiplicative gamma process developed by Bhattacharya & Dunson (2011) to penalize the effect of additional factor loadings via the cumulative product of gamma priors for their precision. This process has been widely applied, but there are still practical disadvantages that motivate the search for alternative solutions (Durante, 2017). In general, despite the importance of increasing shrinkage priors in many factorization models, this field of research remains underdeveloped.

We propose a new increasing shrinkage prior, called the cumulative shrinkage process, which is broadly applicable and has a simple interpretable structure. This prior induces increasing shrinkage via a sequence of spike-and-slab distributions that assign growing mass to the spike as the model complexity grows. In Definition 1 we present the prior for a general case in which the effect of the $h$th dimension is regulated by a scalar parameter $\theta_h$, so that redundant terms can essentially be deleted by progressively shrinking the sequence $\theta = (\theta_h)_{h \geq 1}$, where $\theta_h \in \mathbb{R}$, towards an appropriate value $\theta_\infty$. For example, in factor models $\theta_h$ could represent the variance of the loadings for the $h$th factor, and the goal would be to define a prior on these terms that favours stochastically decreasing effects of the factors via an increasing concentration of the loadings near zero as $h$ grows.

Definition 1.

Let $\theta = (\theta_h)_{h \geq 1}$ denote a countable sequence of parameters. We say that $\theta$ is distributed according to a cumulative shrinkage process with parameter $\alpha > 0$, starting slab distribution $P_0$ and target value $\theta_\infty$ if, conditionally on $\pi = (\pi_h)_{h \geq 1}$, each $\theta_h$ is independent and has the following spike-and-slab distribution:

$$\theta_h \mid \pi_h \sim (1 - \pi_h)\, P_0 + \pi_h\, \delta_{\theta_\infty}, \qquad \pi_h = \sum_{l=1}^{h} \omega_l, \qquad \omega_l = v_l \prod_{m=1}^{l-1} (1 - v_m), \qquad (1)$$

where $v_1, v_2, \ldots$ are independent $\mathrm{Be}(1, \alpha)$ variables and $P_0$ is a diffuse continuous distribution.

Equation (1) exploits the stick-breaking construction of the Dirichlet process (Ishwaran & James, 2001). This implies that the probability $\pi_h$ assigned to the spike $\delta_{\theta_\infty}$ increases with the model dimension $h$, and that $\lim_{h \to \infty} \pi_h = 1$ almost surely. Hence, as the complexity grows, $\theta_h$ increasingly concentrates around $\theta_\infty$, which is specified to facilitate shrinkage of the redundant terms, while the slab $P_0$ corresponds to the prior on the active parameters. Definition 1 can be extended to sequences in $\mathbb{R}^p$, and $\delta_{\theta_\infty}$ can be replaced with a continuous distribution, without affecting the key properties of the prior, which are presented in § 2. As we will discuss in § 2 and § 3.1, it is also possible to restrict Definition 1 to finitely many terms $\theta_1, \ldots, \theta_H$ by letting $v_H = 1$. In practical implementations, this truncated version typically ensures full flexibility if $H$ is set to a conservative upper bound, but this value can be extremely large in several high-dimensional settings, thus motivating our initial focus on the infinite expansion and its theoretical properties.
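For intuition, the following R sketch simulates the first $H$ terms of a cumulative shrinkage process as in (1); the standard Gaussian slab, the spike location $\theta_\infty = 0$ and all constants are illustrative assumptions rather than prescribed defaults.

    # Simulate the first H terms of a cumulative shrinkage process as in (1),
    # with an N(0, 1) slab and theta_inf = 0 (both arbitrary illustrations).
    set.seed(1)
    H <- 10; alpha <- 5; theta_inf <- 0
    v <- rbeta(H, 1, alpha)                      # stick-breaking variables
    omega <- v * cumprod(c(1, 1 - v[-H]))        # omega_l = v_l prod_{m<l} (1 - v_m)
    pi_h <- cumsum(omega)                        # nondecreasing spike probabilities
    spike <- runif(H) < pi_h                     # indicator of a draw from the spike
    theta <- ifelse(spike, theta_inf, rnorm(H))  # spike-and-slab draws of theta_h
    round(cbind(pi_h, theta), 3)

As expected, later terms fall in the spike with increasing probability, reproducing the increasing shrinkage described above.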

2. General properties of the cumulative shrinkage process

We begin by motivating our cumulative stick-breaking construction for the sequence $\pi = (\pi_h)_{h \geq 1}$ that controls the mass assigned to the spike in (1) as a function of the model dimension. Indeed, one could alternatively consider prespecified nondecreasing functions bounded between $0$ and $1$; however, we have found that such specifications are overly restrictive and have worse practical performance. The specification in (1) is purposely chosen to be effectively nonparametric, and Proposition 1 shows that the prior has large support on the space of nondecreasing sequences taking values in $[0, 1]$. See the Supplementary Material for proofs.

Proposition 1.

Let $\Pi$ denote the probability measure induced on $[0, 1]^{\infty}$ by (1). Then $\Pi$ has large support on the whole space of nondecreasing sequences taking values in $[0, 1]$.

Besides being fully flexible, our construction for $\pi$ in (1) has a simple interpretation and allows control over shrinkage via an interpretable parameter $\alpha$, as stated in Proposition 2 and subsequent results.

Proposition 2.

Each $\pi_h$ in (1) coincides with the proportion of the total variation distance between the slab and the spike covered up to step $h$, in the sense that $\pi_h = d_{\mathrm{TV}}\{P_0, (1 - \pi_h) P_0 + \pi_h \delta_{\theta_\infty}\} / d_{\mathrm{TV}}(P_0, \delta_{\theta_\infty})$ for every $h \geq 1$.

Using similar arguments, we can obtain analogous expressions for $\omega_h$ and $v_h$, which are the proportions of, respectively, the total distance $d_{\mathrm{TV}}(P_0, \delta_{\theta_\infty})$ and the remaining distance $(1 - \pi_{h-1})\, d_{\mathrm{TV}}(P_0, \delta_{\theta_\infty})$ covered in moving from $h - 1$ to $h$. Specifically, $\omega_h = \pi_h - \pi_{h-1}$ and $v_h = (\pi_h - \pi_{h-1})/(1 - \pi_{h-1})$ for every $h \geq 1$. The expectations of these quantities can be explicitly calculated as

$$E(\pi_h) = 1 - \left(\frac{\alpha}{1+\alpha}\right)^{h}, \qquad E(\omega_h) = \frac{1}{1+\alpha}\left(\frac{\alpha}{1+\alpha}\right)^{h-1}, \qquad E(v_h) = \frac{1}{1+\alpha}. \qquad (2)$$

Moreover, upon combining (2) with Definition 1, the expectation of $\theta_h$ is

$$E(\theta_h) = E(1 - \pi_h)\,\mu_0 + E(\pi_h)\,\theta_\infty = \left(\frac{\alpha}{1+\alpha}\right)^{h} \mu_0 + \left\{1 - \left(\frac{\alpha}{1+\alpha}\right)^{h}\right\} \theta_\infty, \qquad (3)$$

where $\mu_0 = E_{P_0}(\theta)$ denotes the expected value under the slab $P_0$. Hence, as $h$ grows, the prior expectation of $\theta_h$ converges exponentially to the spike location $\theta_\infty$. As stated in Lemma 1, a stronger notion of cumulative shrinkage in distribution, beyond simple concentration in expectation, also holds under (1).

Lemma 1.

Let $B_\varepsilon(\theta_\infty) = \{\theta : |\theta - \theta_\infty| < \varepsilon\}$ be an $\varepsilon$-neighbourhood of $\theta_\infty$ with radius $\varepsilon > 0$, and denote by $B_\varepsilon^{\mathrm{c}}(\theta_\infty)$ the complement of $B_\varepsilon(\theta_\infty)$. Then, for any $h \geq 1$ and $\varepsilon > 0$,

$$\mathrm{pr}\{\theta_h \in B_\varepsilon^{\mathrm{c}}(\theta_\infty)\} = E(1 - \pi_h)\, P_0\{B_\varepsilon^{\mathrm{c}}(\theta_\infty)\} = \left(\frac{\alpha}{1+\alpha}\right)^{h} P_0\{B_\varepsilon^{\mathrm{c}}(\theta_\infty)\}. \qquad (4)$$

Therefore, $\mathrm{pr}\{\theta_{h+1} \in B_\varepsilon^{\mathrm{c}}(\theta_\infty)\} < \mathrm{pr}\{\theta_h \in B_\varepsilon^{\mathrm{c}}(\theta_\infty)\}$ for any $h \geq 1$, $\varepsilon > 0$ and $\alpha > 0$.
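As a quick numerical sanity check of (2), the short R sketch below compares Monte Carlo averages of $\pi_h$ with the closed form $1 - \{\alpha/(1+\alpha)\}^h$; all constants are illustrative.

    # Monte Carlo check of E(pi_h) = 1 - {alpha/(1+alpha)}^h from (2).
    set.seed(2)
    H <- 8; alpha <- 5; nrep <- 1e5
    pi_draws <- replicate(nrep, {
      v <- rbeta(H, 1, alpha)
      cumsum(v * cumprod(c(1, 1 - v[-H])))       # pi_1, ..., pi_H
    })
    rbind(monte_carlo = rowMeans(pi_draws),
          closed_form = 1 - (alpha / (1 + alpha))^(1:H))

The two rows agree up to Monte Carlo error.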

Equations (2)–(4) describe how the rate of increasing shrinkage is controlled by $\alpha$. In particular, a smaller $\alpha$ enforces more rapid shrinkage of the redundant terms. This control over the shrinkage rate is separate from the choice of the slab $P_0$, allowing flexible modelling of the active terms. Such separation does not hold for the multiplicative gamma process (Bhattacharya & Dunson, 2011), whose hyperparameters control both the rate of shrinkage and the prior for the active factors. This creates a trade-off between the need to maintain diffuse priors for the active terms and the attempt to shrink the redundant ones. Moreover, under that process increasing shrinkage holds only in expectation and for specific hyperparameters (Durante, 2017).

Our prior instead ensures increasing shrinkage in distribution for any $\alpha > 0$, and can model any prior expectation for the number of active terms. In fact, $\alpha$ is equal to the prior mean of the number of terms in $\theta$ modelled via the slab $P_0$. This result follows upon noticing that $\theta_h$ in (1) can alternatively be obtained by marginalizing out the indicator $z_h$, with $\mathrm{pr}(z_h = l) = \omega_l$ for $l = 1, 2, \ldots$, in $\theta_h \mid z_h \sim \{1 - \mathbb{1}(z_h \leq h)\}\, P_0 + \mathbb{1}(z_h \leq h)\, \delta_{\theta_\infty}$. According to this result, $H^{*} = \sum_{h=1}^{\infty} \mathbb{1}(z_h > h)$ counts the number of active elements in $\theta$, and its prior mean is

$$E(H^{*}) = \sum_{h=1}^{\infty} \mathrm{pr}(z_h > h) = \sum_{h=1}^{\infty} E(1 - \pi_h) = \sum_{h=1}^{\infty} \left(\frac{\alpha}{1+\alpha}\right)^{h} = \alpha.$$

Hence, $\alpha$ should be set to the expected number of active terms, while $P_0$ should be sufficiently diffuse to model the active components and $\theta_\infty$ should be chosen to facilitate shrinkage of the redundant ones.
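The identity $E(H^{*}) = \alpha$ can also be checked by simulation; the R sketch below truncates the infinite sequence at a large $H$, an approximation we introduce purely for illustration.

    # Monte Carlo check that E(H*) = alpha, with H* the number of slab terms.
    set.seed(3)
    H <- 200; alpha <- 5; nrep <- 1e4
    H_star <- replicate(nrep, {
      v <- rbeta(H, 1, alpha)
      omega <- v * cumprod(c(1, 1 - v[-H]))
      z <- sample.int(H + 1, H, replace = TRUE,
                      prob = c(omega, 1 - sum(omega)))  # z_h = H + 1 collects the tail
      sum(z > seq_len(H))                               # active terms have z_h > h
    })
    c(monte_carlo = mean(H_star), theoretical = alpha)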

Recalling Bhattacharya & Dunson (2011) and Rousseau & Mengersen (2011), it is useful to define models with more than enough components, and then choose shrinkage priors that favour effective deletion of the unnecessary terms. This choice protects against overfitting and allows estimation of the model dimension, bypassing the need for reversible jump (Lopes & West, 2004) or other computationally intensive strategies. Our cumulative shrinkage process in (1) is a useful prior for this purpose. As discussed in § 1, it is straightforward to modify Definition 1 to restrict $\theta$ to $H$ components by letting $v_H = 1$, with $H$ being a conservative upper bound. Theorem 1 provides theoretical support for such a truncated representation.

Theorem 1.

If $\theta$ has prior (1) and $\theta^{(H)}$ denotes the sequence obtained by fixing $\theta_h = \theta_\infty$ in $\theta$ for every $h > H$, then for any truncation index $H$ and $\varepsilon > 0$,

$$\mathrm{pr}\{d_\infty(\theta, \theta^{(H)}) < \varepsilon\} \geq 1 - (1+\alpha)\left(\frac{\alpha}{1+\alpha}\right)^{H+1} P_0\{B_\varepsilon^{\mathrm{c}}(\theta_\infty)\},$$

where $d_\infty(\theta, \theta^{(H)}) = \sup_{h \geq 1} |\theta_h - \theta^{(H)}_h|$ is the sup-norm distance and $B_\varepsilon^{\mathrm{c}}(\theta_\infty)$ is the complement of $B_\varepsilon(\theta_\infty)$.

Therefore, the prior probability that $\theta^{(H)}$ is close to $\theta$ converges to 1 at a rate which is exponential in $H$, thus justifying posterior inference under finite sequences based on a conservative $H$. Although the bound depends on $\varepsilon$ only through the slab mass $P_0\{B_\varepsilon^{\mathrm{c}}(\theta_\infty)\} \leq 1$, in general $\theta_\infty$ is set close to zero, so the probability remains exponentially close to 1 in $H$. Hence, Theorem 1 is valid also for small $\varepsilon$.

3. Cumulative shrinkage process for Gaussian factor models

3.1. Model formulation and prior specification

Definition 1 provides a general prior that can be used in different models (e.g., Gopalan et al., 2014) under appropriate choices of $P_0$ and $\theta_\infty$. Here, we focus on Gaussian sparse factor models as an important special case to illustrate our approach. We will compare our cumulative shrinkage process primarily with the multiplicative gamma process, which was devised specifically for this class of models and was shown to yield practical gains over several competitors, including the lasso (Tibshirani, 1996) and banding methods (Bickel & Levina, 2008). Although other priors are available for sparse Bayesian factor models (e.g., Carvalho et al., 2008; Knowles & Ghahramani, 2011), these choices have practical disadvantages relative to the multiplicative gamma process, so they will not be considered further.

We will focus on the performance in learning the structure of the covariance matrix $\Omega = \Lambda\Lambda^{\mathrm{T}} + \Sigma$ for data $y_1, \ldots, y_n \in \mathbb{R}^p$ generated from the Gaussian factor model $y_i = \Lambda \eta_i + \epsilon_i$ with $\eta_i \sim N_H(0, I_H)$, $\epsilon_i \sim N_p(0, \Sigma)$ and $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2)$. In performing Bayesian inference under this model, Bhattacharya & Dunson (2011) assumed $\lambda_{jh} \mid \phi_{jh}, \tau_h \sim N(0, \phi_{jh}^{-1}\tau_h^{-1})$ and $\sigma_j^{-2} \sim \mathrm{Ga}(a_\sigma, b_\sigma)$, with scales $\phi_{jh}^{-1}$ from independent inverse-gamma $\mathrm{InvGa}(\nu/2, \nu/2)$ priors and global precisions $\tau_h$ having the multiplicative gamma process prior

$$\tau_h = \prod_{l=1}^{h} \delta_l, \qquad \delta_1 \sim \mathrm{Ga}(a_1, 1), \qquad \delta_l \sim \mathrm{Ga}(a_2, 1) \quad (l \geq 2). \qquad (5)$$

Specific choices of $(a_1, a_2)$ in (5) ensure that $E(\tau_h^{-1})$ decreases with $h$, thus allowing increasing shrinkage of the loadings as $h$ grows. Instead, we keep the same prior for the $\sigma_j^{-2}$, but let $\lambda_{jh} \mid \theta_h \sim N(0, \theta_h)$ and place our cumulative shrinkage process prior on $\theta = (\theta_h)_{h \geq 1}$ by assuming

$$\theta_h \mid \pi_h \sim (1 - \pi_h)\,\mathrm{InvGa}(a_\theta, b_\theta) + \pi_h\, \delta_{\theta_\infty}, \qquad \pi_h = \sum_{l=1}^{h} \omega_l, \qquad \omega_l = v_l \prod_{m=1}^{l-1} (1 - v_m), \qquad (6)$$

where the $v_l$ are independent $\mathrm{Be}(1, \alpha)$. By integrating out $\theta_h$, each loading $\lambda_{jh}$ has the marginal prior $(1 - \pi_h)\, t_{2a_\theta}(0, b_\theta/a_\theta) + \pi_h\, N(0, \theta_\infty)$, where $t_{2a_\theta}(0, b_\theta/a_\theta)$ denotes the Student-$t$ distribution with $2a_\theta$ degrees of freedom, location $0$ and scale $(b_\theta/a_\theta)^{1/2}$. Hence, $\theta_\infty$ should be set close to zero to facilitate effective shrinkage of redundant factors, while $(a_\theta, b_\theta)$ should be specified so as to induce a moderately diffuse prior with scale $(b_\theta/a_\theta)^{1/2}$ for the active loadings. Although the choice $\theta_\infty = 0$ is possible, we follow Ishwaran & Rao (2005) in suggesting $\theta_\infty > 0$ to induce a continuous shrinkage prior on every $\lambda_{jh}$, which improves mixing and identification of the inactive factors. By exploiting the marginals for $\lambda_{jh}$, it also follows that if $\theta_\infty < b_\theta/a_\theta$, then $\mathrm{pr}\{\lambda_{j(h+1)} \in B_\varepsilon^{\mathrm{c}}(0)\} < \mathrm{pr}\{\lambda_{jh} \in B_\varepsilon^{\mathrm{c}}(0)\}$ for each $j = 1, \ldots, p$, $h \geq 1$ and $\varepsilon > 0$. This leads to cumulative shrinkage in distribution also for the loadings, and provides guidelines for choosing $\theta_\infty$ relative to $(a_\theta, b_\theta)$. Additional discussion on prior elicitation and empirical studies on sensitivity can be found in § 4.
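The R sketch below draws loadings from this marginal prior at increasing model dimensions $h$, using the hyperparameter values adopted later in § 4; the snippet is only an illustration of the growing concentration near zero.

    # Draws from the marginal prior of lambda_{jh}: a mixture of a scaled
    # Student-t slab and an N(0, theta_inf) spike, with weight E(pi_h) from (2).
    set.seed(4)
    a_theta <- 2; b_theta <- 2; theta_inf <- 0.05; alpha <- 5
    E_pi <- function(h) 1 - (alpha / (1 + alpha))^h
    r_lambda <- function(n, h) {
      spike <- runif(n) < E_pi(h)
      slab  <- sqrt(b_theta / a_theta) * rt(n, df = 2 * a_theta)
      ifelse(spike, rnorm(n, 0, sqrt(theta_inf)), slab)
    }
    sapply(c(1, 5, 20), function(h) sd(r_lambda(1e5, h)))  # shrinks as h grows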

To implement the analysis, we require a truncation $H$ on the number of factors needed to characterize $\Omega$, as discussed in § 2. Theorem 2 states that our shrinkage process truncated at $H$ terms yields a well-defined prior for $\Omega$ with full support, under the sufficient conditions that $H$ is greater than the true number of factors $H_0$ and that $\theta_\infty > 0$. These conditions are met when considering up to $p$ active factors, taking $\theta_\infty > 0$ and setting $H = p + 1$.

Theorem 2.

Let $\Omega_0$ be any $p \times p$ covariance matrix, and let $\Pi_\Omega$ be the prior probability measure on $p \times p$ matrices $\Omega = \Lambda\Lambda^{\mathrm{T}} + \Sigma$ induced by the Bayesian factor model having prior (6) on $\theta$, truncated at $H$ with $v_H = 1$. If $\theta_\infty > 0$, then $\Pi_\Omega$ is well defined. In addition, if there exists a decomposition $\Omega_0 = \Lambda_0\Lambda_0^{\mathrm{T}} + \Sigma_0$ such that $\Lambda_0 \in \mathbb{R}^{p \times H_0}$ with $H_0 < H$ and $\Sigma_0 = \mathrm{diag}(\sigma_{01}^2, \ldots, \sigma_{0p}^2)$, then $\Pi_\Omega\{B_\varepsilon(\Omega_0)\} > 0$ for any $\varepsilon > 0$, where $B_\varepsilon(\Omega_0)$ is an $\varepsilon$-neighbourhood of $\Omega_0$ under the sup-norm.

Recalling Theorem 2 in Bhattacharya & Dunson (2011), this result is also sufficient to ensure that the posterior of $\Omega$ is weakly consistent.

3.2. Posterior computation via Gibbs sampling

Posterior inference for the factor model in § 3.1 with the cumulative shrinkage process (6) truncated at $H$ terms for the loadings proceeds via a simple Gibbs sampler. The algorithm relies on a data augmentation which exploits the fact that (6) can be obtained by marginalizing out the independent indicators $z_1, \ldots, z_H$, with probabilities $\mathrm{pr}(z_h = l) = \omega_l$ for $l = 1, \ldots, H$, in

$$\theta_h \mid z_h \sim \{1 - \mathbb{1}(z_h \leq h)\}\,\mathrm{InvGa}(a_\theta, b_\theta) + \mathbb{1}(z_h \leq h)\,\delta_{\theta_\infty}, \qquad (7)$$

where $\mathbb{1}(z_h \leq h) = 1$ if $z_h \leq h$ and $\mathbb{1}(z_h \leq h) = 0$ otherwise. Given these indicators, it is possible to sample the other parameters from conjugate full conditional distributions, while the updating of each $z_h$ relies on

$$\mathrm{pr}(z_h = l \mid \Lambda, \omega) \propto \begin{cases} \omega_l \, N_p(\lambda_h; 0, \theta_\infty I_p), & l = 1, \ldots, h,\\ \omega_l \, t_{2a_\theta}\{\lambda_h; 0, (b_\theta/a_\theta) I_p\}, & l = h+1, \ldots, H, \end{cases} \qquad (8)$$

where $N_p(\lambda_h; \cdot, \cdot)$ and $t_{2a_\theta}(\lambda_h; \cdot, \cdot)$ are the densities of the $p$-variate Gaussian and Student-$t$ distributions evaluated at the $h$th column $\lambda_h = (\lambda_{1h}, \ldots, \lambda_{ph})^{\mathrm{T}}$ of $\Lambda$. Algorithm 1 summarizes the steps of the Gibbs sampler.

Algorithm 1.

One cycle of the Gibbs sampler for factor models with the cumulative shrinkage process.

  For $j = 1$ to $p$:

  Sample the $j$th row $\lambda_j = (\lambda_{j1}, \ldots, \lambda_{jH})^{\mathrm{T}}$ of $\Lambda$ from $N_H(\mu_j, V_j)$, with $\eta = (\eta_1, \ldots, \eta_n)^{\mathrm{T}}$,

     $V_j = \{\mathrm{diag}(\theta_1^{-1}, \ldots, \theta_H^{-1}) + \sigma_j^{-2}\eta^{\mathrm{T}}\eta\}^{-1}$ and $\mu_j = V_j\, \sigma_j^{-2}\eta^{\mathrm{T}}(y_{1j}, \ldots, y_{nj})^{\mathrm{T}}$.

  For $j = 1$ to $p$:

  Sample $\sigma_j^{-2}$ from $\mathrm{Ga}\{a_\sigma + n/2, \; b_\sigma + \tfrac{1}{2}\sum_{i=1}^{n}(y_{ij} - \lambda_j^{\mathrm{T}}\eta_i)^2\}$.

  For $i = 1$ to $n$:

  Sample $\eta_i$ from $N_H\{(I_H + \Lambda^{\mathrm{T}}\Sigma^{-1}\Lambda)^{-1}\Lambda^{\mathrm{T}}\Sigma^{-1}y_i, \; (I_H + \Lambda^{\mathrm{T}}\Sigma^{-1}\Lambda)^{-1}\}$.

  For $h = 1$ to $H$:

  Sample $z_h$ from the categorical distribution with probabilities as in (8).

  For $l = 1$ to $H - 1$:

  Update $v_l$ from $\mathrm{Be}\{1 + \sum_{h=1}^{H}\mathbb{1}(z_h = l), \; \alpha + \sum_{h=1}^{H}\mathbb{1}(z_h > l)\}$, and set $v_H = 1$.

  Compute $\pi_h$ as in (6) with $\omega_l = v_l \prod_{m=1}^{l-1}(1 - v_m)$.

  For $h = 1$ to $H$:

  If $z_h \leq h$ set $\theta_h = \theta_\infty$; otherwise sample $\theta_h$ from $\mathrm{InvGa}\{a_\theta + p/2, \; b_\theta + \tfrac{1}{2}\sum_{j=1}^{p}\lambda_{jh}^2\}$.

Output at the end of the cycle: one sample from the posterior of $\Omega = \Lambda\Lambda^{\mathrm{T}} + \Sigma$.

The probabilities in (8) are obtained by marginalizing out $\theta_h$, distributed as in (7), from the joint distribution of $(\lambda_h, \theta_h, z_h)$. These calculations are simple under several models that rely on conditionally conjugate constructions, making (1) a general prior that can be used, for example, in Poisson factorizations (Gopalan et al., 2014).
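To make this update concrete, a minimal R sketch of the categorical step in (8) is given below, computed on the log scale for numerical stability; since the covariances in (8) are diagonal, the $p$-variate densities factorize into univariate ones. The function name and interface are ours.

    # Sample z_h from (8): spike density for l <= h, slab density for l > h.
    sample_z_h <- function(h, lambda_h, omega, a_theta, b_theta, theta_inf) {
      H <- length(omega)
      s <- sqrt(b_theta / a_theta)  # slab scale of the Student-t marginal
      log_spike <- sum(dnorm(lambda_h, 0, sqrt(theta_inf), log = TRUE))
      log_slab  <- sum(dt(lambda_h / s, df = 2 * a_theta, log = TRUE) - log(s))
      logp <- log(omega) + ifelse(seq_len(H) <= h, log_spike, log_slab)
      sample.int(H, 1, prob = exp(logp - max(logp)))  # probabilities renormalized
    }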

3.3. Tuning the truncation index via adaptive Gibbs sampling

Recalling § 3.1, it is reasonable to perform Bayesian inference with up to $p$ factors. Under our cumulative shrinkage process truncated at $H$ terms, this translates into $H = p + 1$, since there are at most $H - 1$ active factors, with the $H$th one modelled by the spike, by construction. However, this choice is too conservative, as we expect substantially fewer active factors than $p$, especially when $p$ is very large. Therefore, running Algorithm 1 with $H = p + 1$ would be computationally inefficient, since most of the columns in $\Lambda$ would be modelled by the spike, yielding a negligible contribution to the factorization of $\Omega$.

Bhattacharya & Dunson (2011) addressed this issue by using an adaptive Gibbs sampler which tunes $H$ as it proceeds. To satisfy the diminishing adaptation condition in Roberts & Rosenthal (2007), they adapt the truncation at iteration $t$ with probability $p(t) = \exp(\alpha_0 + \alpha_1 t)$, where $\alpha_0 \leq 0$ and $\alpha_1 < 0$. This adaptation consists in dropping the inactive columns of $\Lambda$, if any, together with the corresponding parameters. If all columns are active, an extra factor is added, sampling the associated parameters from the prior. This idea can be implemented also for the cumulative shrinkage process, as illustrated in Algorithm 2.

Algorithm 2.

One cycle for the adaptive version of the Gibbs sampler in Algorithm 1.

Let $t$ be the cycle number, $H^{(t)}$ the truncation index at $t$, and $H^{*(t)}$ the corresponding number of active factors.

  Perform one cycle of Algorithm 1.

  If $t \geq \tilde{t}$, adapt with probability $p(t) = \exp(\alpha_0 + \alpha_1 t)$ as follows:

  If $H^{*(t)} < H^{(t)} - 1$:

    Set $H^{(t+1)} = H^{*(t)} + 1$, drop the inactive columns in $\Lambda$ along with the associated parameters

    in $\eta$, $\theta$ and $v$, and add a final component to $\Lambda$, $\eta$, $\theta$ and $v$ sampled from the prior.

  Otherwise:

    Set $H^{(t+1)} = H^{(t)} + 1$ and add a final column sampled from the spike to $\Lambda$, together

    with the associated parameters in $\eta$ and $v$ sampled from the corresponding priors.

Output at the end of the cycle: one sample from the posterior of $\Omega = \Lambda\Lambda^{\mathrm{T}} + \Sigma$ and a value for $H^{(t+1)}$.

Under (6), the inactive columns of $\Lambda$ are naturally identified as those modelled by the spike, and hence have index $h$ such that $z_h \leq h$. In contrast, under the multiplicative gamma process, a column is flagged as inactive if all its entries are within some small distance $\tilde{\varepsilon}$ from zero. This threshold $\tilde{\varepsilon}$ plays a similar role to that of our spike location $\theta_\infty$. Indeed, lower values of $\tilde{\varepsilon}$ and $\theta_\infty$ make it harder to discard inactive columns, which affects the running time. Although fixing $\theta_\infty$ close to zero is key to enforcing shrinkage, excessively low values should be avoided. As under a truncated cumulative shrinkage process the number of active factors $H^{*}$ is at most $H - 1$, we increase $H$ by 1 when $H^{*} = H - 1$ and decrease $H$ to $H^{*} + 1$ when $H^{*} < H - 1$.
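A compact R sketch of this adaptation rule at a single cycle is given below; the schedule constants match those reported in § 4, but the function is an illustrative fragment rather than the full sampler.

    # Diminishing adaptation: with probability exp(a0 + a1 * t), increase H
    # by 1 when all H - 1 potentially active columns are active; otherwise
    # shrink H to H* + 1, keeping a single spike column as a spare.
    adapt_H <- function(t, H, H_star, a0 = -1, a1 = -5e-4) {
      if (runif(1) >= exp(a0 + a1 * t)) return(H)  # no adaptation this cycle
      if (H_star == H - 1) H + 1 else H_star + 1
    }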

In our implementation, to let the chain stabilize no adaptation is allowed before a fixed number $\tilde{t}$ of iterations, while $H$ and $H^{*}$ are initialized at $p + 1$ and $p$, the maximum possible rank for $\Omega$. Further guidance on the choice of $\alpha$ can be obtained by monitoring how close $\pi_H$ is to 1 via (2).

4. Performance assessments in simulations

We conduct simulations to study the performance in learning the structure of the true covariance matrix $\Omega_0 = \Lambda_0\Lambda_0^{\mathrm{T}} + I_p$ for data $y_1, \ldots, y_n$ from a Gaussian factor model, where $\Sigma_0 = I_p$ and the entries of $\Lambda_0 \in \mathbb{R}^{p \times H_0}$ are independently drawn from $N(0, 1)$. To examine the performance at varying dimensions, we consider three different combinations of $(p, H_0)$: $(20, 5)$, $(50, 10)$ and $(100, 15)$. For every pair $(p, H_0)$ we sample 25 datasets of $n = 100$ observations from $N_p(0, \Omega_0)$, and for each of the 25 replicates we perform posterior inference on $\Omega$ under the Gaussian factor model in § 3.1 with both (5) and (6), exploiting the adaptive Gibbs sampler in Bhattacharya & Dunson (2011) and Algorithm 2, respectively.
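A minimal R sketch of this data-generating scheme is reported below; the seed and object names are ours, while the standard Gaussian loadings and identity residual covariance follow the description above.

    # Generate one synthetic dataset as in Section 4.
    set.seed(5)
    p <- 20; H0 <- 5; n <- 100
    Lambda0 <- matrix(rnorm(p * H0), p, H0)
    Omega0  <- tcrossprod(Lambda0) + diag(p)          # Omega_0 = Lambda_0 Lambda_0' + I_p
    y <- matrix(rnorm(n * p), n, p) %*% chol(Omega0)  # rows y_i ~ N_p(0, Omega_0)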

For our cumulative shrinkage process we set $\theta_\infty = 0.05$, $(a_\theta, b_\theta) = (2, 2)$ and $\alpha = 5$, whereas for the multiplicative gamma process we follow Durante (2017) in the choice of $(a_1, a_2)$ and set the remaining hyperparameters as in the simulations of Bhattacharya & Dunson (2011). For both models, $(a_\sigma, b_\sigma)$ is fixed as in Bhattacharya & Dunson (2011). The truncation $H$ is initialized at $p$ for the multiplicative gamma process and at $p + 1$ for the cumulative shrinkage process, both corresponding to at most $p$ active factors. For the two methods, adaptation is allowed only after 500 iterations and, following Bhattacharya & Dunson (2011), the parameters $(\alpha_0, \alpha_1)$ are set to $(-1, -5 \times 10^{-4})$, while the adaptation threshold $\tilde{\varepsilon}$ in the multiplicative gamma process is set as in their simulations. Both algorithms are run for 10 000 iterations after a burn-in of 5000 and, by thinning every five, we obtain a final sample of 2000 draws from the posterior of $\Omega$. For each of the 25 simulations in each scenario, we compute a Monte Carlo estimate of the posterior averaged mean square error $E\{p^{-2}\sum_{j=1}^{p}\sum_{k=1}^{p}(\Omega_{jk} - \Omega_{0jk})^2 \mid y_1, \ldots, y_n\}$ and of $E(H^{*} \mid y_1, \ldots, y_n)$. Since $E\{(\Omega_{jk} - \Omega_{0jk})^2 \mid y_1, \ldots, y_n\} = \{E(\Omega_{jk} \mid y_1, \ldots, y_n) - \Omega_{0jk}\}^2 + \mathrm{var}(\Omega_{jk} \mid y_1, \ldots, y_n)$, the posterior averaged mean square error accounts for both bias and variance in the posterior of $\Omega$.

Table 1 shows, for each scenario and model, the median and the interquartile range of the above quantities computed from the 25 measures produced by the different simulations, together with the medians of the averaged effective sample sizes for the quantities $\Omega_{jk}$, out of 2000 samples, and of the running times. Such quantities rely on an R (R Development Core Team, 2020) implementation run on an Intel Core i7-3632QM CPU laptop computer with 7.7 GB of RAM. The two methods have comparable mean square errors, but these measures, and the performance gains of prior (6) over (5), increase with $(p, H_0)$. Our approach also yields some improvements in mixing and reduced running times. The latter are arguably due to the fact that the multiplicative gamma process overestimates $H^{*}$, hence keeping more parameters to update than necessary. In contrast, our cumulative shrinkage process recovers the true dimension $H_0$ in all settings, thus efficiently tuning the truncation $H$. Such an improved learning of the true underlying dimension is confirmed by the 95% credible intervals for $H^{*}$ being highly concentrated around $H_0$ in all the scenarios considered. The multiplicative gamma process leads to wider intervals for $H^{*}$, with none of them including $H_0$. As shown in Table 2, the results are robust to moderate and reasonable changes in the hyperparameters of the cumulative shrinkage process. We also tried to modify the threshold $\tilde{\varepsilon}$ in Bhattacharya & Dunson (2011) so as to delete columns of $\Lambda$ with values on the same scale as our spike. This setting gave lower estimates for $H^{*}$, and hence a computational time more similar to that of our cumulative shrinkage process, but it led to worse mean square errors and retained some difficulties in learning $H_0$.

Table 1.

Performance of the cumulative shrinkage process and the multiplicative gamma process in 25 simulations for each $(p, H_0)$ scenario

                        mse              H*               Averaged ess   Runtime (s)
(p, H0)     Method      Median   iqr     Median   iqr     Median         Median
(20, 5)     cusp        0.75     0.29     5.00    0.00    655.04          310.76
            mgp         0.75     0.32    19.69    0.21    547.23          616.61
(50, 10)    cusp        2.25     0.33    10.00    0.00    273.55          716.23
            mgp         2.26     0.28    28.64    1.94    251.35         1845.88
(100, 15)   cusp        3.76     0.40    15.00    0.00    175.26         2284.87
            mgp         3.97     0.45    34.38    2.92    116.10         5002.33

cusp, cumulative shrinkage process; mgp, multiplicative gamma process; mse, mean square error; ess, effective sample size; iqr, interquartile range.

Table 2.

Sensitivity analysis for the cumulative shrinkage process hyperparameters $(\alpha, a_\theta, b_\theta, \theta_\infty)$ in 25 simulations

                                         mse              H*               Averaged ess   Runtime (s)
(p, H0)     (α, a_θ, b_θ, θ∞)            Median   iqr     Median   iqr     Median         Median
(20, 5)     (2.5, 2, 2, 0.05)            0.74     0.32     5.00    0.00    626.22          317.31
            (10, 2, 2, 0.05)             0.74     0.33     5.00    0.00    636.61          314.82
            (5, 2, 1, 0.05)              0.72     0.34     5.00    0.00    607.61          322.68
            (5, 1, 2, 0.05)              0.79     0.30     5.00    0.00    602.28          309.39
            (5, 2, 2, 0.025)             0.78     0.31     5.00    0.00    655.80          313.21
            (5, 2, 2, 0.1)               0.74     0.30     5.00    0.04    604.88          315.51
(50, 10)    (2.5, 2, 2, 0.05)            2.25     0.40    10.00    0.00    280.39          719.11
            (10, 2, 2, 0.05)             2.20     0.36    10.00    0.00    277.89          748.75
            (5, 2, 1, 0.05)              2.16     0.42    10.00    0.00    266.82          722.67
            (5, 1, 2, 0.05)              2.35     0.40    10.00    0.00    272.47          689.70
            (5, 2, 2, 0.025)             2.22     0.35    10.00    0.00    280.60          717.19
            (5, 2, 2, 0.1)               2.22     0.41    10.00    0.00    273.39          698.96
(100, 15)   (2.5, 2, 2, 0.05)            3.68     0.47    15.00    0.00    176.31         2247.44
            (10, 2, 2, 0.05)             3.74     0.40    15.00    0.00    172.02         2205.78
            (5, 2, 1, 0.05)              3.64     0.44    15.00    0.00    172.04         2287.32
            (5, 1, 2, 0.05)              3.96     0.52    15.00    0.00    174.74         2178.47
            (5, 2, 2, 0.025)             3.70     0.44    15.00    0.00    172.83         2200.20
            (5, 2, 2, 0.1)               3.77     0.44    15.00    0.00    174.76         2284.80

mse, mean square error; ess, effective sample size; iqr, interquartile range.

5. Application to personality data

We conclude with an application to a subset of the personality data available in the dataset bfi of the R package psych. Here we focus on the association structure among $p = 25$ personality self-report items collected on a six-point response scale from $n$ individuals over 50 years of age. These variables represent answers to questions about five personality traits known as agreeableness, conscientiousness, extraversion, neuroticism and openness. Recalling common implementations of factor models, we centre the 25 items and then change the sign of variables 1, 9, 10, 11, 12, 22 and 25, as suggested in the R documentation for the bfi dataset, so as to obtain coherent answers within each personality trait. Posterior inference under priors (5) and (6) is performed with the same hyperparameters and Gibbs settings as in § 4.

Figure 1 shows the posterior means and credible intervals for the absolute value of the entries in the correlation matrix $R$ under our model. Samples from $R$ are obtained by computing $(\Omega \circ I_p)^{-1/2}\, \Omega\, (\Omega \circ I_p)^{-1/2}$ for each sample of $\Omega = \Lambda\Lambda^{\mathrm{T}} + \Sigma$, where $\circ$ denotes the elementwise Hadamard product. Figure 1 displays associations within each block of five answers measuring a main personality trait, and reveals interesting across-block correlations between agreeableness and extraversion, as well as between conscientiousness and neuroticism. Openness has less evident within-block and across-block associations. These results suggest three main factors, as confirmed by the posterior mean and the 95% credible interval for $H^{*}$ under the cumulative shrinkage process, both of which concentrate around three active factors. Under the multiplicative gamma process, these posterior summaries point to a substantially larger $H^{*}$, but the higher $H^{*}$ does not lead to improved learning of $\Omega$. In fact, when considering the Monte Carlo estimate of the mean squared deviation of $R$ from the sample correlation matrix of the data, we obtain a lower value under (6) than under (5), suggesting that the multiplicative gamma process may be overestimating $H^{*}$ in this application. This leads to more redundant parameters to be updated in the adaptive Gibbs sampler, and hence to an increase in the running time from 400.69 to 1321.04 seconds, relative to (6).
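In R, mapping each posterior draw of $\Omega$ to a correlation matrix reduces to cov2cor, which computes exactly $(\Omega \circ I_p)^{-1/2}\, \Omega\, (\Omega \circ I_p)^{-1/2}$; the sketch below assumes the draws are stored in a list named Omega_draws, a name we introduce for illustration.

    # Convert posterior draws of Omega into correlation matrices and
    # average their absolute values to estimate the posterior mean of |R|.
    R_draws  <- lapply(Omega_draws, cov2cor)
    abs_mean <- Reduce(`+`, lapply(R_draws, abs)) / length(R_draws)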

Fig. 1. Posterior mean and credible intervals for each element of the absolute correlation matrix $|R|$ under our model.


Acknowledgement

The authors are grateful to the editor, associate editor and referees for useful suggestions, and acknowledge support from the Ministry of Education, University and Research of Italy as well as from the U.S. Office of Naval Research and National Institutes of Health.

Supplementary material

Supplementary material available at Biometrika online includes proofs of the theoretical results.

References

  1. Bhattacharya, A. & Dunson, D. B. (2011). Sparse Bayesian infinite factor models. Biometrika 98, 291–306.
  2. Bickel, P. J. & Levina, E. (2008). Regularized estimation of large covariance matrices. Ann. Statist. 36, 199–227.
  3. Carvalho, C. M., Chang, J., Lucas, J. E., Nevins, J. R., Wang, Q. & West, M. (2008). High-dimensional sparse factor modeling: Applications in gene expression genomics. J. Am. Statist. Assoc. 103, 1438–56.
  4. Carvalho, C. M., Polson, N. G. & Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika 97, 465–80.
  5. Durante, D. (2017). A note on the multiplicative gamma process. Statist. Prob. Lett. 122, 198–204.
  6. Gopalan, P., Ruiz, F. J., Ranganath, R. & Blei, D. (2014). Bayesian nonparametric Poisson factorization for recommendation systems. J. Mach. Learn. Res. W&CP 33, 275–83.
  7. Ishwaran, H. & James, L. F. (2001). Gibbs sampling methods for stick-breaking priors. J. Am. Statist. Assoc. 96, 161–73.
  8. Ishwaran, H. & Rao, J. S. (2005). Spike and slab variable selection: Frequentist and Bayesian strategies. Ann. Statist. 33, 730–73.
  9. Knowles, D. & Ghahramani, Z. (2011). Nonparametric Bayesian sparse factor models with application to gene expression modeling. Ann. Appl. Statist. 5, 1534–52.
  10. Lopes, H. F. & West, M. (2004). Bayesian model assessment in factor analysis. Statist. Sinica 14, 41–68.
  11. R Development Core Team (2020). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org.
  12. Roberts, G. O. & Rosenthal, J. S. (2007). Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms. J. Appl. Prob. 44, 458–75.
  13. Rousseau, J. & Mengersen, K. (2011). Asymptotic behaviour of the posterior distribution in overfitted mixture models. J. R. Statist. Soc. B 73, 689–710.
  14. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B 58, 267–88.
