Summary
The dimension of the parameter space is typically unknown in a variety of models that rely on factorizations. For example, in factor analysis the number of latent factors is not known and has to be inferred from the data. Although classical shrinkage priors are useful in such contexts, increasing shrinkage priors can provide a more effective approach that progressively penalizes expansions with growing complexity. In this article we propose a novel increasing shrinkage prior, called the cumulative shrinkage process, for the parameters that control the dimension in overcomplete formulations. Our construction has broad applicability and is based on an interpretable sequence of spike-and-slab distributions which assign increasing mass to the spike as the model complexity grows. Using factor analysis as an illustrative example, we show that this formulation has theoretical and practical advantages relative to current competitors, including an improved ability to recover the model dimension. An adaptive Markov chain Monte Carlo algorithm is proposed, and the performance gains are outlined in simulations and in an application to personality data.
Keywords: Factor analysis, Increasing shrinkage, Multiplicative gamma process, Spike-and-slab distribution
1. Introduction
Shrinkage priors have received considerable attention (e.g., Ishwaran & Rao, 2005; Carvalho et al., 2010), but the focus has been on high-dimensional regression, where there is no natural ordering among the coefficients. However, there are several settings in which an order is present. Indeed, in statistical models relying on low-rank factorizations, such as factor and tensor models, it is natural to expect that additional dimensions will play a progressively less important role in characterizing the structure of the model, and therefore the associated parameters should have a stochastically decreasing effect. This behaviour can be induced through increasing shrinkage priors. In the context of Bayesian factor models, an example of this approach can be found in the multiplicative gamma process developed by Bhattacharya & Dunson (2011) to penalize the effect of additional factor loadings via the cumulative product of gamma priors for their precision. This process has been widely applied, but there are still practical disadvantages that motivate the search for alternative solutions (Durante, 2017). In general, despite the importance of increasing shrinkage priors in many factorization models, this field of research remains underdeveloped.
We propose a new increasing shrinkage prior, called the cumulative shrinkage process, which is broadly applicable and has a simple interpretable structure. This prior induces increasing shrinkage via a sequence of spike-and-slab distributions that assign growing mass to the spike as the model complexity grows. In Definition 1 we present the prior for a general case in which the effect of the $h$th dimension is regulated by a scalar parameter $\theta_h$, so that redundant terms can essentially be deleted by progressively shrinking the sequence $\theta = (\theta_h)_{h \geq 1}$, where $\theta_h \in \mathbb{R}$, towards an appropriate value $\theta_\infty$. For example, in factor models $\theta_h$ could represent the variance of the loadings for the $h$th factor, and the goal would be to define a prior on these terms that favours stochastically decreasing effects of the factors via an increasing concentration of the loadings near zero as $h$ grows.
Definition 1.
Let $\theta = (\theta_h)_{h \geq 1}$ denote a countable sequence of parameters. We say that $\theta$ is distributed according to a cumulative shrinkage process with parameter $\alpha > 0$, starting slab distribution $P_0$ and target value $\theta_\infty$ if, conditionally on the probabilities $\pi = (\pi_h)_{h \geq 1}$, each $\theta_h$ is independent and has the following spike-and-slab distribution:

$$\theta_h \mid \pi_h \sim (1 - \pi_h)\,P_0 + \pi_h\,\delta_{\theta_\infty}, \qquad \pi_h = \sum_{l=1}^{h}\omega_l, \qquad \omega_l = v_l \prod_{m=1}^{l-1}(1 - v_m), \tag{1}$$

where $v_1, v_2, \ldots$ are independent $\mathrm{Be}(1, \alpha)$ variables and $P_0$ is a diffuse continuous distribution.
Equation (1) exploits the stick-breaking construction of the Dirichlet process (Ishwaran & James, 2001). This implies that the probability $\pi_h$ assigned to the spike $\delta_{\theta_\infty}$ increases with the model dimension $h$, and that $\pi_h \rightarrow 1$ almost surely as $h \rightarrow \infty$. Hence, as the complexity grows, $\theta_h$ increasingly concentrates around $\theta_\infty$, which is specified to facilitate shrinkage of the redundant terms, while the slab $P_0$ corresponds to the prior on the active parameters. Definition 1 can be extended to sequences in $\mathbb{R}^q$, and $\delta_{\theta_\infty}$ can be replaced with a continuous distribution concentrated around $\theta_\infty$, without affecting the key properties of the prior, which are presented in § 2. As we will discuss in § 2 and § 3.1, it is also possible to restrict Definition 1 to finitely many terms $h = 1, \ldots, H$ by letting $v_H = 1$. In practical implementations, this truncated version typically ensures full flexibility if $H$ is set to a conservative upper bound, but this value can be extremely large in several high-dimensional settings, thus motivating our initial focus on the infinite expansion and its theoretical properties.
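To make the construction concrete, the following minimal sketch simulates one draw from the truncated process. It is not the authors' code: the Gaussian slab $N(0, s^2)$, the value of $s^2$ and the function name rcusp are illustrative assumptions.

```r
# Minimal sketch: one draw from a cumulative shrinkage process truncated at H
# terms, with an illustrative Gaussian slab P_0 = N(0, s2) and spike at theta_inf.
rcusp <- function(H, alpha, s2 = 1, theta_inf = 0) {
  v <- c(rbeta(H - 1, 1, alpha), 1)      # v_l ~ Be(1, alpha); v_H = 1 truncates
  omega <- v * cumprod(c(1, 1 - v[-H]))  # omega_l = v_l * prod_{m<l} (1 - v_m)
  pi_h <- cumsum(omega)                  # spike probabilities: nondecreasing, pi_H = 1
  spike <- runif(H) < pi_h               # is theta_h drawn from the spike?
  ifelse(spike, theta_inf, rnorm(H, 0, sqrt(s2)))
}
set.seed(1)
theta <- rcusp(H = 20, alpha = 5)  # early terms mostly from the slab, late terms at theta_inf
```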
2. General properties of the cumulative shrinkage process
We begin by motivating our cumulative stick-breaking construction for the sequence $(\pi_h)_{h \geq 1}$ that controls the mass assigned to the spike in (1) as a function of the model dimension. Indeed, one could alternatively consider prespecified nondecreasing functions bounded between 0 and 1; however, we have found that such specifications are overly restrictive and have worse practical performance. The specification in (1) is purposely chosen to be effectively nonparametric, and Proposition 1 shows that the prior has large support on the space of nondecreasing sequences taking values in $[0, 1]$. See the Supplementary Material for proofs.
Proposition 1.
Let $\Pi$ denote the probability measure induced on $(\pi_h)_{h \geq 1}$ by (1). Then $\Pi$ has large support on the whole space of nondecreasing sequences taking values in $[0, 1]$.
Besides being fully flexible, our construction for $(\pi_h)_{h \geq 1}$ in (1) has a simple interpretation and allows control over shrinkage via the interpretable parameter $\alpha$, as stated in Proposition 2 and subsequent results.
Proposition 2.
Each $\pi_h$ in (1) coincides with the proportion of the total variation distance between the slab and the spike covered up to step $h$, in the sense that $\pi_h = d_{\mathrm{TV}}(P_0, P_h)/d_{\mathrm{TV}}(P_0, \delta_{\theta_\infty})$ for every $h \geq 1$, where $P_h = (1 - \pi_h)P_0 + \pi_h\,\delta_{\theta_\infty}$ is the conditional distribution of $\theta_h$ given $\pi_h$.
Using similar arguments, we can obtain analogous expressions for $\omega_h$ and $v_h$, which are the proportions of, respectively, the total distance $d_{\mathrm{TV}}(P_0, \delta_{\theta_\infty})$ and the remaining distance $d_{\mathrm{TV}}(P_{h-1}, \delta_{\theta_\infty})$ covered from $h - 1$ to $h$. Specifically, $\omega_h = d_{\mathrm{TV}}(P_{h-1}, P_h)/d_{\mathrm{TV}}(P_0, \delta_{\theta_\infty})$ and $v_h = d_{\mathrm{TV}}(P_{h-1}, P_h)/d_{\mathrm{TV}}(P_{h-1}, \delta_{\theta_\infty})$ for every $h \geq 1$. The expectations of these quantities can be explicitly calculated as

$$E(\pi_h) = 1 - \left(\frac{\alpha}{1+\alpha}\right)^{\!h}, \qquad E(\omega_h) = \frac{1}{1+\alpha}\left(\frac{\alpha}{1+\alpha}\right)^{\!h-1}, \qquad E(v_h) = \frac{1}{1+\alpha}. \tag{2}$$

Moreover, upon combining (2) with Definition 1, the expectation of $\theta_h$ is

$$E(\theta_h) = \left(\frac{\alpha}{1+\alpha}\right)^{\!h}\mu_0 + \left\{1 - \left(\frac{\alpha}{1+\alpha}\right)^{\!h}\right\}\theta_\infty, \tag{3}$$

where $\mu_0 = E_{P_0}(\theta)$ defines the expected value under the slab $P_0$. Hence, as $h$ grows, the prior expectation of $\theta_h$ converges exponentially to the spike location $\theta_\infty$. As stated in Lemma 1, a stronger notion of cumulative shrinkage in distribution, beyond simple concentration in expectation, also holds under (1).
Lemma 1.
Let $B_\varepsilon(\theta_\infty)$ be an $\varepsilon$-neighbourhood of $\theta_\infty$ with radius $\varepsilon > 0$, and denote by $\bar{B}_\varepsilon(\theta_\infty)$ the complement of $B_\varepsilon(\theta_\infty)$. Then, for any $\varepsilon > 0$ and $h \geq 1$,

$$\mathrm{pr}\{\theta_h \in \bar{B}_\varepsilon(\theta_\infty)\} = \left(\frac{\alpha}{1+\alpha}\right)^{\!h} P_0\{\bar{B}_\varepsilon(\theta_\infty)\}. \tag{4}$$

Therefore, $\mathrm{pr}\{\theta_{h+1} \in \bar{B}_\varepsilon(\theta_\infty)\} \leq \mathrm{pr}\{\theta_h \in \bar{B}_\varepsilon(\theta_\infty)\}$ for any $\varepsilon > 0$, $h \geq 1$ and $\alpha > 0$.
Equations (2)–(4) describe how the rate of increasing shrinkage is controlled by $\alpha$. In particular, a smaller $\alpha$ enforces more rapid shrinkage of the redundant terms. This control over the shrinkage rate is separate from the choice of the slab $P_0$, allowing flexible modelling of the active terms. Such a separation does not hold for the multiplicative gamma process (Bhattacharya & Dunson, 2011), whose hyperparameters control both the rate of shrinkage and the prior for the active factors. This creates a trade-off between the need to maintain diffuse priors for the active terms and the attempt to shrink the redundant ones. Moreover, increasing shrinkage holds only in expectation and for specific hyperparameter values (Durante, 2017).
Our prior instead ensures increasing shrinkage in distribution for any $\alpha > 0$, and can model any prior expectation for the number of active terms. In fact, $\alpha$ is equal to the prior mean of the number of terms in $\theta$ modelled via the slab $P_0$. This result follows upon noticing that $\theta_h$ in (1) can alternatively be obtained by marginalizing out the indicator $z_h$, with $\mathrm{pr}(z_h = l \mid \omega_l) = \omega_l$ $(l = 1, 2, \ldots)$, in

$$\theta_h \mid z_h \sim \{1 - \mathbb{1}(z_h \leq h)\}\,P_0 + \mathbb{1}(z_h \leq h)\,\delta_{\theta_\infty}.$$

According to this result, $H_\infty = \sum_{h=1}^{\infty}\mathbb{1}(z_h > h)$ counts the number of active elements in $\theta$, and its prior mean is

$$E(H_\infty) = \sum_{h=1}^{\infty}\mathrm{pr}(z_h > h) = \sum_{h=1}^{\infty}E(1 - \pi_h) = \sum_{h=1}^{\infty}\left(\frac{\alpha}{1+\alpha}\right)^{\!h} = \alpha.$$

Hence, $\alpha$ should be set to the expected number of active terms, while $P_0$ should be sufficiently diffuse to model the active components and $\theta_\infty$ should be chosen to facilitate shrinkage of the redundant ones.
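The identity $E(H_\infty) = \alpha$ is easy to verify by simulation; the sketch below is ours rather than the authors', with an illustrative large finite horizon standing in for the infinite sequence.

```r
# Monte Carlo check that the prior mean number of slab (active) terms is alpha.
set.seed(2)
alpha <- 5; H <- 200; nsim <- 1e4
active <- replicate(nsim, {
  v <- rbeta(H, 1, alpha)                       # stick-breaking variables
  pi_h <- cumsum(v * cumprod(c(1, 1 - v[-H])))  # spike probabilities pi_h
  sum(runif(H) > pi_h)                          # terms drawn from the slab
})
mean(active)  # approximately alpha = 5, in line with E(H_inf) = alpha
```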
Recalling Bhattacharya & Dunson (2011) and Rousseau & Mengersen (2011), it is useful to define models with more than enough components, and then choose shrinkage priors that favour effective deletion of the unnecessary terms. This choice protects against overfitting and allows estimation of the model dimension, bypassing the need for reversible jump (Lopes & West, 2004) or other computationally intensive strategies. Our cumulative shrinkage process in (1) is a useful prior for this purpose. As discussed in § 1, it is straightforward to modify Definition 1 to restrict $\theta$ to $H$ components by letting $v_H = 1$, with $H$ being a conservative upper bound. Theorem 1 provides theoretical support for such a truncated representation.
Theorem 1.
If $\theta = (\theta_h)_{h \geq 1}$ has prior (1) and $\theta^{(H)} = (\theta_h^{(H)})_{h \geq 1}$ denotes the sequence obtained by fixing $v_H = 1$ in (1), so that $\theta_h^{(H)} = \theta_\infty$ for every $h \geq H$, then for any truncation index $H$ and $\varepsilon > 0$,

$$\mathrm{pr}\{d_\infty(\theta, \theta^{(H)}) \geq \varepsilon\} \leq (1 + \alpha)\left(\frac{\alpha}{1+\alpha}\right)^{\!H} P_0\{\bar{B}_\varepsilon(\theta_\infty)\},$$

where $d_\infty(\theta, \theta^{(H)}) = \sup_{h \geq 1}|\theta_h - \theta_h^{(H)}|$ is the sup-norm distance and $\bar{B}_\varepsilon(\theta_\infty)$ is the complement of $B_\varepsilon(\theta_\infty)$.
Therefore, the prior probability that $\theta^{(H)}$ is close to $\theta$ converges to 1 at a rate which is exponential in $H$, thus justifying posterior inference under finite sequences based on a conservative $H$. Although the above bound holds for any $\theta_\infty$, in general $\theta_\infty$ is set close to zero. Hence, Theorem 1 is valid also for small $\theta_\infty$.
3. Cumulative shrinkage process for Gaussian factor models
3.1. Model formulation and prior specification
Definition 1 provides a general prior that can be used in different models (e.g., Gopalan et al., 2014) under appropriate choices of $P_0$ and $\theta_\infty$. Here, we focus on Gaussian sparse factor models as an important special case to illustrate our approach. We will compare our cumulative shrinkage process primarily with the multiplicative gamma process, which was devised specifically for this class of models and was shown to yield practical gains over several competitors, including the lasso (Tibshirani, 1996) and banding methods (Bickel & Levina, 2008). Although other priors are available for sparse Bayesian factor models (e.g., Carvalho et al., 2008; Knowles & Ghahramani, 2011), these choices have practical disadvantages relative to the multiplicative gamma process, so they will not be considered further.
We will focus on the performance in learning the structure of the covariance matrix $\Omega = \Lambda\Lambda^{\mathrm{T}} + \Sigma$ for data $y_i \in \mathbb{R}^p$ $(i = 1, \ldots, n)$ generated from the Gaussian factor model $y_i = \Lambda\eta_i + \epsilon_i$ with $\eta_i \sim N_H(0, I_H)$, $\epsilon_i \sim N_p(0, \Sigma)$ and $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2)$. In performing Bayesian inference under this model, Bhattacharya & Dunson (2011) assumed $\sigma_j^{-2} \sim \mathrm{Ga}(a_\sigma, b_\sigma)$ and $\lambda_{jh} \mid \phi_{jh}, \tau_h \sim N(0, \phi_{jh}^{-1}\tau_h^{-1})$, with scales $\phi_{jh}^{-1}$ from independent inverse-gamma $\mathrm{IG}(\nu/2, \nu/2)$ priors and global precisions $\tau_h$ having the multiplicative gamma process prior

$$\tau_h = \prod_{l=1}^{h}\delta_l, \qquad \delta_1 \sim \mathrm{Ga}(a_1, 1), \qquad \delta_l \sim \mathrm{Ga}(a_2, 1) \quad (l \geq 2). \tag{5}$$
Specific choices of $(a_1, a_2)$ in (5) ensure that $\tau_h^{-1}$ decreases with $h$ in expectation, thus allowing increasing shrinkage of the loadings as $h$ grows. Instead, we keep the same prior for $\sigma_j^{-2}$, but let $\lambda_{jh} \mid \theta_h \sim N(0, \theta_h)$ and place our cumulative shrinkage process prior on $\theta = (\theta_h)_{h \geq 1}$ by assuming

$$\theta_h \mid \pi_h \sim (1 - \pi_h)\,\mathrm{IG}(a_\theta, b_\theta) + \pi_h\,\delta_{\theta_\infty}, \qquad \pi_h = \sum_{l=1}^{h}\omega_l, \qquad \omega_l = v_l\prod_{m=1}^{l-1}(1 - v_m), \tag{6}$$

where the $v_l$ are independent $\mathrm{Be}(1, \alpha)$ variables. By integrating out $\theta_h$, each loading $\lambda_{jh}$ has the marginal prior $(1 - \pi_h)\,t_{2a_\theta}(0, b_\theta/a_\theta) + \pi_h\,N(0, \theta_\infty)$, where $t_{2a_\theta}(0, b_\theta/a_\theta)$ denotes the Student-$t$ distribution with $2a_\theta$ degrees of freedom, location 0 and squared scale $b_\theta/a_\theta$. Hence, $\theta_\infty$ should be set close to zero to facilitate effective shrinkage of redundant factors, while $(a_\theta, b_\theta)$ should be specified so as to induce a moderately diffuse prior with scale $(b_\theta/a_\theta)^{1/2}$ for the active loadings. Although the choice $\theta_\infty = 0$ is possible, we follow Ishwaran & Rao (2005) in suggesting $\theta_\infty > 0$ to induce a continuous shrinkage prior on every $\lambda_{jh}$, which improves mixing and identification of the inactive factors. By exploiting the marginals for $\lambda_{jh}$, it also follows that if $\theta_\infty$ is sufficiently small relative to the slab scale $b_\theta/a_\theta$, then $\mathrm{pr}\{\lambda_{j(h+1)} \in \bar{B}_\varepsilon(0)\} \leq \mathrm{pr}\{\lambda_{jh} \in \bar{B}_\varepsilon(0)\}$ for each $j = 1, \ldots, p$, $h \geq 1$ and $\varepsilon > 0$. This leads to cumulative shrinkage in distribution also for the loadings, and provides guidelines for the choice of $(a_\theta, b_\theta)$ and $\theta_\infty$. Additional discussion on prior elicitation and empirical studies of sensitivity can be found in § 4.
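To see prior (6) in action, the following hedged sketch, ours rather than the authors' code, draws a loadings matrix under the truncated process; the default hyperparameters mirror the simulation settings of § 4 and the function name is illustrative.

```r
# Sketch of prior (6): column variances theta_h follow a truncated cumulative
# shrinkage process with inverse-gamma slab; loadings are Gaussian given theta_h.
rcusp_factor_prior <- function(p, H, alpha = 5, a_theta = 2, b_theta = 2,
                               theta_inf = 0.05) {
  v <- c(rbeta(H - 1, 1, alpha), 1)          # v_H = 1 truncates the process
  omega <- v * cumprod(c(1, 1 - v[-H]))      # stick-breaking weights
  pi_h <- cumsum(omega)                      # spike probabilities
  spike <- runif(H) < pi_h
  theta <- ifelse(spike, theta_inf, 1 / rgamma(H, a_theta, b_theta))  # IG slab
  Lambda <- matrix(rnorm(p * H, 0, rep(sqrt(theta), each = p)), p, H) # loadings
  list(Lambda = Lambda, theta = theta, active = which(!spike))
}
set.seed(4)
draw <- rcusp_factor_prior(p = 20, H = 21)
length(draw$active)  # prior number of active factors; mean approximately alpha
```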
To implement the analysis, we require a truncation $H$ on the number of factors needed to characterize $\Omega$, as discussed in § 2. Theorem 2 states that our shrinkage process truncated at $H$ terms yields a well-defined prior for $\Omega$ with full support, under the sufficient conditions that $H$ is greater than the true number of active factors $H_0$ and that $\theta_\infty \geq 0$. These conditions are met when considering up to $p$ active factors, $H = p + 1$ and $\theta_\infty > 0$.
Theorem 2.
Let $\Omega_0$ be any $p \times p$ covariance matrix, and let $\Pi_\Omega$ be the prior probability measure on $p \times p$ matrices $\Omega = \Lambda\Lambda^{\mathrm{T}} + \Sigma$ induced by the Bayesian factor model having prior (6) on $\theta$, truncated at $H$ with $v_H = 1$. If $\theta_\infty \geq 0$, then $\Pi_\Omega$ is a well-defined prior on the space of $p \times p$ covariance matrices. In addition, if there exists a decomposition $\Omega_0 = \Lambda_0\Lambda_0^{\mathrm{T}} + \Sigma_0$ such that $\Lambda_0 \in \mathbb{R}^{p \times H_0}$ and $H_0 \leq H - 1$, then $\Pi_\Omega\{B_\varepsilon(\Omega_0)\} > 0$ for any $\varepsilon > 0$, where $B_\varepsilon(\Omega_0)$ is an $\varepsilon$-neighbourhood of $\Omega_0$ under the sup-norm.
Recalling Theorem 2 in Bhattacharya & Dunson (2011), this result is also sufficient to ensure that the posterior of $\Omega$ is weakly consistent.
3.2. Posterior computation via Gibbs sampling
Posterior inference for the factor model in § 3.1 with cumulative shrinkage process (6) truncated at
terms for the loadings proceeds via a simple Gibbs sampler. The algorithm relies on a data augmentation which exploits the fact that (6) can be obtained by marginalizing out the independent indicators
, with probabilities
, in
![]() |
(7) |
where
if
and
otherwise. Given these indicators, it is possible to sample the other parameters from conjugate full conditional distributions, while the updating of
relies on
![]() |
(8) |
where
and
are the densities of
-variate Gaussian and Student-
distributions evaluated at
. Algorithm 1 summarizes the steps of the Gibbs sampler.
Algorithm 1.
One cycle of the Gibbs sampler for factor models with the cumulative shrinkage process.

For $j = 1$ to $p$: sample the $j$th row $\lambda_j = (\lambda_{j1}, \ldots, \lambda_{jH})^{\mathrm{T}}$ of $\Lambda$ from $N_H(\mu_j, V_j)$, with $V_j = (D^{-1} + \sigma_j^{-2}\eta^{\mathrm{T}}\eta)^{-1}$, $\mu_j = \sigma_j^{-2}V_j\eta^{\mathrm{T}}y_{(j)}$, $D = \mathrm{diag}(\theta_1, \ldots, \theta_H)$, $\eta = (\eta_1, \ldots, \eta_n)^{\mathrm{T}}$ and $y_{(j)} = (y_{1j}, \ldots, y_{nj})^{\mathrm{T}}$.

For $j = 1$ to $p$: sample $\sigma_j^{-2}$ from $\mathrm{Ga}\{a_\sigma + n/2,\ b_\sigma + \frac{1}{2}\sum_{i=1}^{n}(y_{ij} - \lambda_j^{\mathrm{T}}\eta_i)^2\}$.

For $i = 1$ to $n$: sample $\eta_i$ from $N_H\{(I_H + \Lambda^{\mathrm{T}}\Sigma^{-1}\Lambda)^{-1}\Lambda^{\mathrm{T}}\Sigma^{-1}y_i,\ (I_H + \Lambda^{\mathrm{T}}\Sigma^{-1}\Lambda)^{-1}\}$.

For $h = 1$ to $H$: sample $z_h$ from the categorical distribution with probabilities as in (8).

For $l = 1$ to $H - 1$: update $v_l$ from $\mathrm{Be}\{1 + \sum_{h=1}^{H}\mathbb{1}(z_h = l),\ \alpha + \sum_{h=1}^{H}\mathbb{1}(z_h > l)\}$.

Compute $\omega_1, \ldots, \omega_H$ as in (6) with $v_H = 1$.

For $h = 1$ to $H$: if $z_h \leq h$ set $\theta_h = \theta_\infty$; otherwise sample $\theta_h$ from $\mathrm{IG}\{a_\theta + p/2,\ b_\theta + \frac{1}{2}\sum_{j=1}^{p}\lambda_{jh}^2\}$.

Output at the end of the cycle: one sample from the posterior of the model parameters, including $\Omega = \Lambda\Lambda^{\mathrm{T}} + \Sigma$.
The probabilities in (8) are obtained by marginalizing out $\theta_h$, distributed as in (7), from the joint distribution of $(\lambda_h, \theta_h, z_h)$. These calculations are simple under several models that rely on conditionally conjugate constructions, making (1) a general prior that can be used, for example, in Poisson factorizations (Gopalan et al., 2014).
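As an illustration of (8), the following sketch updates the indicators $z_1, \ldots, z_H$ given the loadings. It is our own hedged translation of this step, assuming the loadings are stored as a $p \times H$ matrix lambda and using the dmvt density from the mvtnorm package.

```r
# Hedged sketch of the z_h update in (8): for l <= h, column h is assigned to
# the spike N_p(0, theta_inf I_p); for l > h, to the marginal slab, a p-variate
# Student-t with 2 a_theta degrees of freedom and scale (b_theta/a_theta) I_p.
library(mvtnorm)
update_z <- function(lambda, omega, a_theta, b_theta, theta_inf) {
  p <- nrow(lambda); H <- ncol(lambda); z <- integer(H)
  for (h in 1:H) {
    lsp <- sum(dnorm(lambda[, h], 0, sqrt(theta_inf), log = TRUE))  # spike log density
    lsl <- dmvt(lambda[, h], delta = rep(0, p),
                sigma = (b_theta / a_theta) * diag(p),
                df = 2 * a_theta, log = TRUE)                       # slab log density
    lp <- log(omega) + ifelse(seq_len(H) <= h, lsp, lsl)            # (8), up to a constant
    z[h] <- sample(H, 1, prob = exp(lp - max(lp)))                  # normalized categorical draw
  }
  z
}
```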
3.3. Tuning the truncation index via adaptive Gibbs sampling
Recalling § 3.1, it is reasonable to perform Bayesian inference with up to $p$ factors. Under our cumulative shrinkage process truncated at $H$ terms, this translates into $H = p + 1$, since there are at most $H - 1$ active factors, with the $H$th one modelled by the spike, by construction. However, this choice is too conservative, as we expect substantially fewer active factors than $p$, especially when $p$ is very large. Therefore, running Algorithm 1 with $H = p + 1$ would be computationally inefficient, since most of the columns in $\Lambda$ would be modelled by the spike, yielding a negligible contribution to the factorization of $\Omega$.
Bhattacharya & Dunson (2011) addressed this issue by using an adaptive Gibbs sampler which tunes $H$ as it proceeds. To satisfy the diminishing adaptation condition in Roberts & Rosenthal (2007), they adapt $H$ at iteration $t$ with probability $p(t) = \exp(\alpha_0 + \alpha_1 t)$, where $\alpha_0 \leq 0$ and $\alpha_1 < 0$. This adaptation consists in dropping the inactive columns of $\Lambda$, if any, together with the corresponding parameters. If all columns are active, an extra factor is added, sampling the associated parameters from the prior. This idea can also be implemented for the cumulative shrinkage process, as illustrated in Algorithm 2.
Algorithm 2.
One cycle for the adaptive version of the Gibbs sampler in Algorithm 1.

Let $t$ be the cycle number, $H^{(t)}$ the truncation index at $t$, and $H^{*(t)} = \sum_{h=1}^{H^{(t)}}\mathbb{1}(z_h > h)$ the number of active factors.

Perform one cycle of Algorithm 1.

If $t \geq \bar{t}$, adapt with probability $p(t) = \exp(\alpha_0 + \alpha_1 t)$ as follows:

If $H^{*(t)} < H^{(t)} - 1$: set $H^{(t+1)} = H^{*(t)} + 1$, drop the inactive columns in $\Lambda$ along with the associated parameters in $\eta$, $\theta$ and $z$, and add a final component to $\Lambda$, $\eta$, $\theta$ and $z$ sampled from the prior.

Otherwise: set $H^{(t+1)} = H^{(t)} + 1$ and add a final column sampled from the spike to $\Lambda$, together with the associated parameters in $\eta$ and $\theta$ sampled from the corresponding priors.

Output at the end of the cycle: one sample from the posterior of the model parameters and a value for $H^{(t+1)}$.
Under (6), the inactive columns of $\Lambda$ are naturally identified as those modelled by the spike, and hence have index $h$ such that $z_h \leq h$. In contrast, under the multiplicative gamma process, a column is flagged as inactive if all its entries are within distance $\epsilon$ from zero. This $\epsilon$ plays a similar role to that of our spike location $\theta_\infty$. Indeed, lower values of $\epsilon$ and $\theta_\infty$ make it harder to discard inactive columns, which affects the running time. Although fixing $\theta_\infty$ close to zero is key to enforcing shrinkage, excessively low values should be avoided. As under a truncated cumulative shrinkage process the number of active factors $H^{*(t)}$ is at most $H^{(t)} - 1$, we increase $H^{(t)}$ by 1 when $H^{*(t)} = H^{(t)} - 1$ and decrease $H^{(t)}$ to $H^{*(t)} + 1$ when $H^{*(t)} < H^{(t)} - 1$.
In our implementation, to let the chain stabilize, no adaptation is allowed before a fixed number $\bar{t}$ of iterations, while $H^{(1)}$ and $H^{*(1)}$ are initialized at $p + 1$ and $p$, the maximum possible rank for $\Omega$. Further guidance on the choice of the truncation can be obtained by monitoring how close $E(\pi_{H^{(t)}})$ is to 1 via (2).
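A hedged sketch of the adaptation step in Algorithm 2 is given below; the variable names match the update_z sketch above, the indicators $z$ and stick-breaking variables $v$ are assumed to be refreshed at the next Gibbs cycle, and the bookkeeping is illustrative rather than the authors' implementation.

```r
# Illustrative adaptation step: columns with z_h <= h are modelled by the
# spike and treated as inactive; a final spike column is always appended.
adapt_truncation <- function(Lambda, eta, theta, z, theta_inf) {
  H <- ncol(Lambda); p <- nrow(Lambda); n <- nrow(eta)
  active <- which(z > seq_len(H))              # active columns: z_h > h
  if (length(active) < H - 1) {                # drop the redundant columns
    Lambda <- Lambda[, active, drop = FALSE]
    eta    <- eta[, active, drop = FALSE]
    theta  <- theta[active]
  }
  list(Lambda = cbind(Lambda, rnorm(p, 0, sqrt(theta_inf))),  # new spike column
       eta    = cbind(eta, rnorm(n)),                         # new factor scores
       theta  = c(theta, theta_inf),
       H      = ncol(Lambda) + 1)                             # updated truncation
}
```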
4. Performance assessments in simulations
We conduct simulations to study the performance in learning the structure of the true covariance matrix $\Omega_0 = \Lambda_0\Lambda_0^{\mathrm{T}} + I_p$ for data $y_1, \ldots, y_n$ from a Gaussian factor model, where $\Lambda_0$ is a $p \times H_0$ matrix whose entries are independently drawn from $N(0, 1)$. To examine the performance at varying dimensions, we consider three different combinations of $(p, H_0)$: $(20, 5)$, $(50, 10)$ and $(100, 15)$. For every pair $(p, H_0)$ we sample 25 datasets of $n = 100$ observations from $N_p(0, \Omega_0)$, and for each of the 25 replicates we perform posterior inference on $\Omega$ under the Gaussian factor model in § 3.1 with both (5) and (6), exploiting the adaptive Gibbs sampler in Bhattacharya & Dunson (2011) and Algorithm 2, respectively.
For our cumulative shrinkage process we set $\alpha = 5$, $(a_\theta, b_\theta) = (2, 2)$ and $\theta_\infty = 0.05$, whereas for the multiplicative gamma process we follow Durante (2017) in the choice of $(a_1, a_2)$ and set the local scale hyperparameter $\nu$ as in the simulations of Bhattacharya & Dunson (2011). For both models, the prior for the $\sigma_j^{-2}$ is fixed as in Bhattacharya & Dunson (2011). The truncation is initialized at $H^{(1)} = p$ for the multiplicative gamma process and at $H^{(1)} = p + 1$ for the cumulative shrinkage process, both corresponding to at most $p$ active factors. For the two methods, adaptation is allowed only after 500 iterations and, following Bhattacharya & Dunson (2011), the parameters $(\alpha_0, \alpha_1)$ are set to $(-1, -5 \times 10^{-4})$, while the adaptation threshold $\epsilon$ in the multiplicative gamma process is $10^{-4}$. Both algorithms are run for 10 000 iterations after a burn-in of 5000, and by thinning every five we obtain a final sample of 2000 draws from the posterior of $\Omega$. For each of the 25 simulations in each scenario, we compute a Monte Carlo estimate of the posterior averaged mean square error $\mathrm{mse} = E\{p^{-2}\sum_{j=1}^{p}\sum_{k=1}^{p}(\Omega_{jk} - \Omega_{0jk})^2 \mid y\}$ and of the posterior mean $\hat{H}^*$ of the number of active factors. Since $E\{(\Omega_{jk} - \Omega_{0jk})^2 \mid y\} = \mathrm{var}(\Omega_{jk} \mid y) + \{E(\Omega_{jk} \mid y) - \Omega_{0jk}\}^2$, the posterior averaged mean square error accounts for both bias and variance in the posterior of $\Omega$.
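The data-generating mechanism is simple to reproduce; a sketch for the smallest scenario is given below, with standard Gaussian loadings as in the design described above.

```r
# Generate one dataset from the smallest simulation scenario: the truth is
# Omega0 = Lambda0 Lambda0' + I_p with standard Gaussian loadings.
set.seed(3)
p <- 20; H0 <- 5; n <- 100
Lambda0 <- matrix(rnorm(p * H0), p, H0)           # true loadings
Omega0  <- tcrossprod(Lambda0) + diag(p)          # true covariance matrix
y <- matrix(rnorm(n * p), n, p) %*% chol(Omega0)  # n draws from N_p(0, Omega0)
```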
Table 1 shows, for each scenario and model, the median and the interquartile range of the above quantities computed from the 25 measures produced by the different simulations, together with the medians of the averaged effective sample sizes for the entries of $\Omega$, out of 2000 samples, and of the running times. These quantities rely on an R (R Development Core Team, 2020) implementation run on an Intel Core i7-3632QM CPU laptop computer with 7.7 GB of RAM. The two methods have comparable mean square errors, but these measures and the performance gains of prior (6) over (5) increase with $p$. Our approach also yields some improvements in mixing and reduced running times. The latter are arguably due to the fact that the multiplicative gamma process overestimates $H^*$, hence keeping more parameters to update than necessary. In contrast, our cumulative shrinkage process recovers the true dimension $H_0$ in all settings, thus efficiently tuning the truncation. This improved learning of the true underlying dimension is confirmed by the 95% credible intervals for $H^*$ being highly concentrated around $H_0$ in all the scenarios considered. The multiplicative gamma process leads to wider intervals for $H^*$, with none of them including $H_0$. As shown in Table 2, the results are robust to moderate and reasonable changes in the hyperparameters of the cumulative shrinkage process. We also tried to modify $\epsilon$ in Bhattacharya & Dunson (2011) so as to delete columns of $\Lambda$ with values on the same scale as our spike. This setting gave lower estimates of $H^*$, and hence a computational time more similar to that of our cumulative shrinkage process, but it led to worse mean square errors and retained some difficulties in learning $H_0$.
Table 1.
Performance of the cumulative shrinkage process and the multiplicative gamma process in 25 simulations for each $(p, H_0)$ scenario

| $(p, H_0)$ | Method | mse, median | mse, iqr | $\hat{H}^*$, median | $\hat{H}^*$, iqr | Averaged ess, median | Runtime (s), median |
|---|---|---|---|---|---|---|---|
| (20, 5) | cusp | 0.75 | 0.29 | 5.00 | 0.00 | 655.04 | 310.76 |
| | mgp | 0.75 | 0.32 | 19.69 | 0.21 | 547.23 | 616.61 |
| (50, 10) | cusp | 2.25 | 0.33 | 10.00 | 0.00 | 273.55 | 716.23 |
| | mgp | 2.26 | 0.28 | 28.64 | 1.94 | 251.35 | 1845.88 |
| (100, 15) | cusp | 3.76 | 0.40 | 15.00 | 0.00 | 175.26 | 2284.87 |
| | mgp | 3.97 | 0.45 | 34.38 | 2.92 | 116.10 | 5002.33 |

cusp, cumulative shrinkage process; mgp, multiplicative gamma process; mse, mean square error; ess, effective sample size; iqr, interquartile range.
Table 2.
Sensitivity analysis for the cumulative shrinkage process hyperparameters $(\alpha, a_\theta, b_\theta, \theta_\infty)$ in 25 simulations

| $(p, H_0)$ | $(\alpha, a_\theta, b_\theta, \theta_\infty)$ | mse, median | mse, iqr | $\hat{H}^*$, median | $\hat{H}^*$, iqr | Averaged ess, median | Runtime (s), median |
|---|---|---|---|---|---|---|---|
| (20, 5) | (2.5, 2, 2, 0.05) | 0.74 | 0.32 | 5.00 | 0.00 | 626.22 | 317.31 |
| | (10, 2, 2, 0.05) | 0.74 | 0.33 | 5.00 | 0.00 | 636.61 | 314.82 |
| | (5, 2, 1, 0.05) | 0.72 | 0.34 | 5.00 | 0.00 | 607.61 | 322.68 |
| | (5, 1, 2, 0.05) | 0.79 | 0.30 | 5.00 | 0.00 | 602.28 | 309.39 |
| | (5, 2, 2, 0.025) | 0.78 | 0.31 | 5.00 | 0.00 | 655.80 | 313.21 |
| | (5, 2, 2, 0.1) | 0.74 | 0.30 | 5.00 | 0.04 | 604.88 | 315.51 |
| (50, 10) | (2.5, 2, 2, 0.05) | 2.25 | 0.40 | 10.00 | 0.00 | 280.39 | 719.11 |
| | (10, 2, 2, 0.05) | 2.20 | 0.36 | 10.00 | 0.00 | 277.89 | 748.75 |
| | (5, 2, 1, 0.05) | 2.16 | 0.42 | 10.00 | 0.00 | 266.82 | 722.67 |
| | (5, 1, 2, 0.05) | 2.35 | 0.40 | 10.00 | 0.00 | 272.47 | 689.70 |
| | (5, 2, 2, 0.025) | 2.22 | 0.35 | 10.00 | 0.00 | 280.60 | 717.19 |
| | (5, 2, 2, 0.1) | 2.22 | 0.41 | 10.00 | 0.00 | 273.39 | 698.96 |
| (100, 15) | (2.5, 2, 2, 0.05) | 3.68 | 0.47 | 15.00 | 0.00 | 176.31 | 2247.44 |
| | (10, 2, 2, 0.05) | 3.74 | 0.40 | 15.00 | 0.00 | 172.02 | 2205.78 |
| | (5, 2, 1, 0.05) | 3.64 | 0.44 | 15.00 | 0.00 | 172.04 | 2287.32 |
| | (5, 1, 2, 0.05) | 3.96 | 0.52 | 15.00 | 0.00 | 174.74 | 2178.47 |
| | (5, 2, 2, 0.025) | 3.70 | 0.44 | 15.00 | 0.00 | 172.83 | 2200.20 |
| | (5, 2, 2, 0.1) | 3.77 | 0.44 | 15.00 | 0.00 | 174.76 | 2284.80 |

mse, mean square error; ess, effective sample size; iqr, interquartile range.
5. Application to personality data
We conclude with an application to a subset of the personality data available in the dataset bfi of the R package psych. Here we focus on the association structure among $p = 25$ personality self-report items collected on a six-point response scale from individuals over 50 years of age. These variables represent answers to questions about five personality traits known as agreeableness, conscientiousness, extraversion, neuroticism and openness. Recalling common implementations of factor models, we centre the 25 items and then change the sign of variables 1, 9, 10, 11, 12, 22 and 25, as suggested in the R documentation for the bfi dataset, so as to obtain coherent answers within each personality trait. Posterior inference under priors (5) and (6) is performed with the same hyperparameters and Gibbs settings as in § 4.
Figure 1 shows the posterior means and the credible intervals for the absolute values of the entries of the correlation matrix $\rho$ under our model. Samples from $\rho$ are obtained by computing $\rho = \Omega \circ (\omega\omega^{\mathrm{T}})^{-1/2}$ for each sample of $\Omega$, where $\omega = (\Omega_{11}, \ldots, \Omega_{pp})^{\mathrm{T}}$, the power is applied elementwise, and $\circ$ denotes the elementwise Hadamard product; that is, $\rho_{jk} = \Omega_{jk}(\Omega_{jj}\Omega_{kk})^{-1/2}$. Figure 1 displays associations within each block of five answers measuring a main personality trait, and reveals interesting across-block correlations between agreeableness and extraversion as well as between conscientiousness and neuroticism. Openness has less evident within-block and across-block associations. These results suggest three main factors, as confirmed by the posterior mean and the 95% credible interval for $H^*$ under the cumulative shrinkage process, both of which concentrate on three active factors. Under the multiplicative gamma process these posterior summaries are noticeably higher, but the higher $\hat{H}^*$ does not lead to improved learning of $\rho$. In fact, when considering the Monte Carlo estimate of the mean squared deviations of $\rho$ from the sample correlation matrix, we obtain values under (5) that are no better than those under (6), suggesting that the multiplicative gamma process may be overestimating $H^*$ in this application. This leads to more redundant parameters to be updated in the adaptive Gibbs sampler, and hence to an increase in the running time from 400.69 seconds under (6) to 1321.04 seconds under (5).
Fig. 1. Posterior means and credible intervals for each element of the absolute correlation matrix $|\rho|$ under our model.
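The covariance-to-correlation transformation above corresponds to base R's cov2cor; a minimal sketch, assuming the posterior draws of $\Omega$ are stored in a list Omega_draws, is:

```r
# Map each posterior draw of Omega to a correlation matrix and average the
# absolute values, as used for the posterior summaries in Fig. 1.
rho_draws <- lapply(Omega_draws, cov2cor)  # rho_jk = Omega_jk / (Omega_jj Omega_kk)^(1/2)
abs_rho_mean <- Reduce(`+`, lapply(rho_draws, abs)) / length(rho_draws)
```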
Supplementary Material
Supplementary material available at Biometrika online includes proofs of the theoretical results.

Acknowledgement
The authors are grateful to the editor, the associate editor and the referees for useful suggestions, and acknowledge support from the Ministry of Education, University and Research of Italy as well as from the United States Office of Naval Research and National Institutes of Health.
References
- Bhattacharya, A. & Dunson, D. B. (2011). Sparse Bayesian infinite factor models. Biometrika 98, 291–306.
- Bickel, P. J. & Levina, E. (2008). Regularized estimation of large covariance matrices. Ann. Statist. 36, 199–227.
- Carvalho, C. M., Chang, J., Lucas, J. E., Nevins, J. R., Wang, Q. & West, M. (2008). High-dimensional sparse factor modeling: Applications in gene expression genomics. J. Am. Statist. Assoc. 103, 1438–56.
- Carvalho, C. M., Polson, N. G. & Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika 97, 465–80.
- Durante, D. (2017). A note on the multiplicative gamma process. Statist. Prob. Lett. 122, 198–204.
- Gopalan, P., Ruiz, F. J., Ranganath, R. & Blei, D. (2014). Bayesian nonparametric Poisson factorization for recommendation systems. J. Mach. Learn. Res. W&CP 33, 275–83.
- Ishwaran, H. & James, L. F. (2001). Gibbs sampling methods for stick-breaking priors. J. Am. Statist. Assoc. 96, 161–73.
- Ishwaran, H. & Rao, J. S. (2005). Spike and slab variable selection: Frequentist and Bayesian strategies. Ann. Statist. 33, 730–73.
- Knowles, D. & Ghahramani, Z. (2011). Nonparametric Bayesian sparse factor models with application to gene expression modeling. Ann. Appl. Statist. 5, 1534–52.
- Lopes, H. F. & West, M. (2004). Bayesian model assessment in factor analysis. Statist. Sinica 14, 41–68.
- R Development Core Team (2020). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org.
- Roberts, G. O. & Rosenthal, J. S. (2007). Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms. J. Appl. Prob. 44, 458–75.
- Rousseau, J. & Mengersen, K. (2011). Asymptotic behaviour of the posterior distribution in overfitted mixture models. J. R. Statist. Soc. B 73, 689–710.
- Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B 58, 267–88.