Efficient posterior sampling for high-dimensional imbalanced logistic regression

Deborshee Sen; Matthias Sachs; Jianfeng Lu; David B Dunson

Biometrika. 2020 Jun 17;107(4):1005–1012. doi: 10.1093/biomet/asaa035

PMCID: PMC7799181 PMID: 33462537

Summary

Classification with high-dimensional data is of widespread interest and often involves dealing with imbalanced data. Bayesian classification approaches are hampered by the fact that current Markov chain Monte Carlo algorithms for posterior computation become inefficient as the number $p$ of predictors or the number $n$ of subjects to classify gets large, because of the increasing computational time per step and worsening mixing rates. One strategy is to employ a gradient-based sampler to improve mixing while using data subsamples to reduce the per-step computational complexity. However, the usual subsampling breaks down when applied to imbalanced data. Instead, we generalize piecewise-deterministic Markov chain Monte Carlo algorithms to include importance-weighted and mini-batch subsampling. These maintain the correct stationary distribution with arbitrarily small subsamples and substantially outperform current competitors. We provide theoretical support for the proposed approach and demonstrate its performance gains in simulated data examples and an application to cancer data.

Keywords: Imbalanced data, Logistic regression, Piecewise-deterministic Markov process, Scalable inference, Subsampling

1. Introduction

In designing algorithms for large datasets, much of the focus has been on optimization methods that produce a point estimate with no characterization of uncertainty. This motivates the development of scalable Bayesian algorithms. As variational methods and related analytic approximations lack theoretical support and can be inaccurate, this note focuses on posterior sampling algorithms.

One such class of methods consists of divide-and-conquer Markov chain Monte Carlo algorithms, which divide the data into chunks, run Markov chain Monte Carlo independently for each chunk, and then combine the samples (Li et al., 2017; Srivastava et al., 2018). However, combining samples inevitably leads to some bias, and accuracy theorems require sample sizes to increase within each subset.

An alternative strategy uses subsamples to approximate transition probabilities and to reduce bottlenecks in calculating likelihoods and gradients (Welling & Teh, 2011). Such approaches typically rely on uniform random subsamples, which can be highly inefficient, as noted in an expanding frequentist literature on biased subsampling (Fithian & Hastie, 2014; Ting & Brochu, 2018). The Bayesian literature has largely overlooked the use of biased subsampling in efficient algorithm design, though recent coreset approaches address a related problem (Huggins et al., 2016). A problem with subsampling Markov chain Monte Carlo is that it is almost impossible to preserve the correct invariant distribution. While there has been work on quantifying the error (Johndrow et al., 2017; Alquier et al., 2016), it is usually difficult to do so in practice. The pseudo-marginal approach of Andrieu & Roberts (2009) offers a potential solution, but it is generally impossible to obtain the required unbiased estimators of likelihoods using subsamples (Jacob & Thiery, 2015).

A promising recent direction has been the use of nonreversible samplers with subsampling within the framework of piecewise-deterministic Markov processes (Bouchard-Côté et al., 2018; Fearnhead et al., 2018; Bierkens et al., 2019). These approaches use the gradient of the loglikelihood, which can be replaced by a subsample-based unbiased estimator so that the exactly correct invariant distribution is maintained. In this note our goal is to improve the efficiency of such approaches by using nonuniform subsampling motivated concretely by logistic regression.

2. Logistic regression with sparse imbalanced data

2.1. Model

We consider the logistic regression model

$$\mathrm{pr}(y = 1 \mid x, \xi) = \frac{1}{1 + \exp(-x^{\mathrm T}\xi)}, \qquad (1)$$

where $y \in \{0,1\}$ is the response, $x \in \mathbb{R}^p$ are predictors, and $\xi \in \mathbb{R}^p$ are coefficients for the predictors. Consider data $(x^j, y^j)$, $j = 1,\dots,n$, from model (1), where $x^j = (x^j_1,\dots,x^j_p)^{\mathrm T}$. For a prior distribution $p_0(\xi)$ on $\xi$, the posterior distribution is $\pi(\xi) \propto \exp\{-U(\xi)\}$, where $U(\xi) = -\log p_0(\xi) - \sum_{j=1}^n \log \mathrm{pr}(y^j \mid x^j, \xi)$ and $U$ denotes the potential function. A popular algorithm for sampling from $\pi$ is Pólya–Gamma data augmentation (Polson et al., 2013); however, this performs poorly if there is a large imbalance in the class labels $y^j$ (Johndrow et al., 2019). Similar issues arise when the $x^j_i$ are imbalanced. Logistic regression is routinely used in a wide range of fields and such imbalance issues are extremely common, giving rise to a great need for more efficient algorithms. While standard Metropolis–Hastings algorithms not relying on data augmentation can perform well despite imbalanced data in settings where both $n$ and $p$ are small (Johndrow et al., 2019), problems arise in scaling to large datasets due to the increasing computational time per step and slow mixing.
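To make the potential function concrete, the following is a minimal Python sketch of $U$ and its gradient for model (1). It assumes an isotropic Gaussian prior with variance `sigma2` (the prior used later in the paper) and drops the additive constant of the prior; the function names `potential` and `grad_potential` are illustrative and not taken from the authors' code.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def potential(xi, X, y, sigma2):
    """U(xi) up to a constant: Gaussian prior term plus negative loglikelihood."""
    z = X @ xi
    # -log pr(y | x, xi) = log(1 + exp(z)) - y * z for y in {0, 1}.
    neg_loglik = np.sum(np.logaddexp(0.0, z) - y * z)
    return 0.5 * (xi @ xi) / sigma2 + neg_loglik

def grad_potential(xi, X, y, sigma2):
    """Gradient of U: prior part xi / sigma2 plus sum_j x^j (sigmoid(x^j . xi) - y^j)."""
    return xi / sigma2 + X.T @ (sigmoid(X @ xi) - y)
```

Each likelihood term contributes $x^j_i\{\mathrm{sigmoid}(x^{j\mathrm T}\xi) - y^j\}$ to the $i$th partial derivative, which is the quantity that the subsampling schemes discussed below estimate from a few observations at a time.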

2.2. The zig-zag process

The zig-zag process (Bierkens et al., 2019) is a piecewise-deterministic Markov process that is particularly useful for logistic regression. It is a continuous-time stochastic process $\{\xi(t), \theta(t)\}_{t \ge 0}$ on the augmented space $\mathbb{R}^p \times \{-1,1\}^p$, where $\xi(t)$ and $\theta(t)$ may be understood as the position and the velocity of the process at time $t$. Under fairly mild conditions, the zig-zag process is ergodic with respect to the product measure $\pi \otimes \mu$, where $\mu$ is the uniform measure on $\{-1,1\}^p$. In other words, $\lim_{T \to \infty} T^{-1} \int_0^T \varphi\{\xi(t)\}\,dt = \int \varphi(\xi)\,\pi(d\xi)$ holds almost surely for any $\pi$-integrable function $\varphi$.

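For a starting point $\xi \in \mathbb{R}^p$ and velocity $\theta \in \{-1,1\}^p$, the zig-zag process evolves deterministically according to

$$\xi(t) = \xi + \theta t, \qquad \theta(t) = \theta. \qquad (2)$$

At a random time $\tau$, a bouncing event flips the sign of one component of the velocity $\theta$. The process then evolves according to (2) with the new velocity until the next change in velocity. The time $\tau$ is the first arrival time of $p$ independent Poisson processes with intensity functions $m_1(t),\dots,m_p(t)$; that is, $\tau = \tau_{i_0}$ with $i_0 = \arg\min_i \tau_i$, where $\tau_i$ is the first arrival time of the $i$th process. The sign flip applies $F_{i_0}$ to $\theta$, with $(F_i\theta)_k = \theta_k$ if $k \ne i$ and $(F_i\theta)_k = -\theta_k$ if $k = i$. The intensity functions are of the form $m_i(t) = \lambda_i(\xi + \theta t, \theta)$, where $\lambda_i$ is a rate function. A sufficient condition for the zig-zag process to preserve $\pi \otimes \mu$ as its invariant distribution is the existence of nonnegative functions $\gamma_i$, not depending on $\theta_i$, such that $\lambda_i(\xi,\theta) = \{\theta_i \partial_i U(\xi)\}^+ + \gamma_i(\xi,\theta)$; here $(a)^+ = \max(0,a)$ denotes the positive part of $a$. The $\gamma_i$ are known as refreshment rates.

If $m_i$ has a simple closed form, the arrival times $\tau_i$ can be sampled by inversion, that is, by solving $\int_0^{\tau_i} m_i(s)\,ds = -\log u_i$ for $u_i \sim \mathrm{Un}(0,1)$; otherwise, the $\tau_i$ are obtained via Poisson thinning (Lewis & Shedler, 1979). Assume that we have continuous functions $M_i$ such that $m_i(t) \le M_i(t)$ for all $t \ge 0$; here the $M_i$ are upper computational bounds. Let $\tilde\tau_1,\dots,\tilde\tau_p$ denote the first arrival times of nonhomogeneous Poisson processes with rates $M_1(t),\dots,M_p(t)$, respectively, and let $i_0 = \arg\min_i \tilde\tau_i$. A zig-zag process with intensity $m(t) = \{m_1(t),\dots,m_p(t)\}$ is still obtained if $\xi$ is evolved according to (2) for time $\tilde\tau_{i_0}$ instead of $\tau_{i_0}$, and the sign of $\theta_{i_0}$ is flipped at time $\tilde\tau_{i_0}$ with probability $m_{i_0}(\tilde\tau_{i_0})/M_{i_0}(\tilde\tau_{i_0})$.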
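The thinning construction can be illustrated with a short Python sketch. It is a generic instance of the Lewis & Shedler (1979) idea under a simplifying assumption made here, namely a constant upper bound `bound` on the intensity; the function name `first_arrival_by_thinning` is illustrative only.

```python
import numpy as np

def first_arrival_by_thinning(rate, bound, rng):
    """First arrival time of a Poisson process with intensity rate(t) <= bound.

    Proposals come from a homogeneous Poisson process with intensity `bound`;
    a proposal at time t is kept with probability rate(t) / bound.
    Assumes rate(t) is positive often enough for the loop to terminate.
    """
    t = 0.0
    while True:
        t += rng.exponential(1.0 / bound)      # next proposal of the dominating process
        if rng.random() * bound <= rate(t):    # thinning step
            return t

# Usage: intensity 1 + sin(t), which is bounded above by 2.
rng = np.random.default_rng(0)
tau = first_arrival_by_thinning(lambda t: 1.0 + np.sin(t), 2.0, rng)
```

The tighter the bound, the fewer proposals are rejected, which is why the size of the computational bounds directly drives the cost of the sampler.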

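The subsampling approach of Bierkens et al. (2019) uses uniform subsampling of a single data point to obtain an unbiased estimate of the $i$th partial derivative of the potential function $U(\xi) = U^0(\xi) + \sum_{j=1}^n U^j(\xi)$, where $U^0(\xi) = -\log p_0(\xi)$ is from the prior and $U^j(\xi) = -\log \mathrm{pr}(y^j \mid x^j, \xi)$ with $j = 1,\dots,n$ is from the likelihood. Their subsampling algorithm preserves the correct stationary distribution. Bierkens et al. (2019) considered estimates $G_i^J(\xi) = n\,\partial_i U^J(\xi)$ of the likelihood contribution such that $\mathbb{E}_J\{\partial_i U^0(\xi) + G_i^J(\xi)\} = \partial_i U(\xi)$, where $J \sim \mathrm{Un}\{1,\dots,n\}$ indexes the sampled data point. This is used to construct a stochastic rate function as $m_i^J(t) = [\theta_i\{\partial_i U^0(\xi + \theta t) + G_i^J(\xi + \theta t)\}]^+$. By using upper bounds $M_i$ satisfying $m_i^j(t) \le M_i(t)$ for all $j = 1,\dots,n$ and $t \ge 0$, the rate functions $m_i$ can be replaced by stochastic versions $m_i^J$, with $J$ being resampled at every unthinned event.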

In addition, control variates can be used to reduce the variance of the estimate of $\partial_i U$, which can lead to dramatic increases in sampling efficiency when the posterior is concentrated around a reference point (Bierkens et al., 2019). Here we use isotropic Gaussian priors and focus on situations where either $p$ is large relative to $n$ or the covariates and/or responses are imbalanced. In such situations, the posterior is not sufficiently concentrated around a reference point for control variates to perform efficiently. We demonstrate this numerically in § 4.2 for imbalanced responses, and the Supplementary Material contains similar experiments for sparse covariates. For this reason and to save space, our discussion in this note focuses on subsampling techniques that do not rely on control variates; the techniques developed can be combined with the use of control variates as detailed in the Supplementary Material.

3. Improved subsampling

3.1. General framework

We introduce a generalized version of the zig-zag sampler. Our motivation is (i) to increase the sampling efficiency and (ii) to simplify the construction of upper bounds. We achieve (ii) by letting the Poisson process that determines bouncing times in component $i$ be a superposition, i.e., sum, of two independent Poisson processes with state-dependent bouncing rates $\lambda_i^0$ and $\lambda_i^{\mathrm{lik}}$, associated with the prior and the likelihood, respectively. This construction allows Poisson thinning of each process separately, which decouples the problem of constructing upper bounds into that of constructing suitable bounds for the prior and for the likelihood. We achieve (i) by using general forms of the estimator of the partial derivatives of the likelihood part of the potential in the Poisson thinning step, obtained through nonuniform subsampling.

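The resulting algorithm is presented as Algorithm 1, where $m_i^0(t) = \{\theta_i \partial_i U^0(\xi + \theta t)\}^+$ and $m_i^J(t) = \{\theta_i G_i^J(\xi + \theta t)\}^+$, assuming that $G_i^J$, with $J \sim \nu_i$ for a probability distribution $\nu_i$ on $\{1,\dots,n\}$, is an unbiased estimator of $\sum_{j=1}^n \partial_i U^j$ and $M_i$ is such that for all $j = 1,\dots,n$ and $t \ge 0$, $m_i^j(t) \le M_i(t)$. Here $U^0$ is the prior part of the potential, the $U^j$ are the likelihood terms, and $F_i\theta$ denotes $\theta$ with the sign of its $i$th component flipped. To keep the presentation simple, we do not explicitly include the Poisson thinning step for the prior in the algorithm. The state-dependent bouncing rate of the resulting zig-zag process is $\lambda_i = \lambda_i^0 + \lambda_i^{\mathrm{lik}}$, with $\lambda_i^{\mathrm{lik}}$ having the explicit form $\lambda_i^{\mathrm{lik}}(\xi,\theta) = \mathbb{E}_{J \sim \nu_i}[\{\theta_i G_i^J(\xi)\}^+]$. General results on piecewise-deterministic Markov processes imply that such a zig-zag process preserves the target measure (Fearnhead et al., 2018); we nonetheless provide a proof in the Supplementary Material for a self-contained presentation.

Algorithm 1.

Zig-zag algorithm with generalized subsampling.

Input: starting point $\xi^{(0)}$, initial velocity $\theta^{(0)}$, maximum number of bouncing attempts $N$.

Set $t^{(0)} = 0$.

for $k = 1,\dots,N$ do

Draw $\tau_1^0,\dots,\tau_p^0$ and $\tilde\tau_1,\dots,\tilde\tau_p$ such that $\mathrm{pr}(\tau_i^0 \ge t) = \exp\{-\int_0^t m_i^0(s)\,ds\}$ and $\mathrm{pr}(\tilde\tau_i \ge t) = \exp\{-\int_0^t M_i(s)\,ds\}$, with the rates evaluated along the trajectory started at $(\xi^{(k-1)}, \theta^{(k-1)})$.

Set $\tau = \min(\tau_{i_0}^0, \tilde\tau_{i_1})$ where $i_0 = \arg\min_i \tau_i^0$ and $i_1 = \arg\min_i \tilde\tau_i$.

Evolve position: $(t^{(k)}, \xi^{(k)}) = (t^{(k-1)} + \tau,\ \xi^{(k-1)} + \theta^{(k-1)}\tau)$.

Draw $J \sim \nu_{i_1}$ and $u \sim \mathrm{Un}(0,1)$.

if $\tau = \tau_{i_0}^0$ then $\theta^{(k)} = F_{i_0}\theta^{(k-1)}$

else if $u \le m_{i_1}^J(\tau)/M_{i_1}(\tau)$ then $\theta^{(k)} = F_{i_1}\theta^{(k-1)}$

else $\theta^{(k)} = \theta^{(k-1)}$

end for

Output: the path of a zig-zag process specified by skeleton points $(\xi^{(k)})_{k=0}^{N}$ and bouncing times $(t^{(k)})_{k=0}^{N}$.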
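As an illustration of how such an algorithm might be implemented for logistic regression, the following Python sketch combines exact simulation of prior bounces with thinned, importance-subsampled likelihood bounces. It is a sketch under stated assumptions rather than the authors' implementation: the prior is $N(0, \sigma^2 I)$, the bounds are the constants $M_i = \sum_j |x_i^j|$, the subsampling probabilities are $|x_i^j|/M_i$, and every column of `X` is assumed to contain a nonzero entry. The names `zigzag_importance` and `path_mean` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def zigzag_importance(X, y, sigma2, n_attempts, rng):
    """Skeleton of a zig-zag path for Bayesian logistic regression (sketch)."""
    n, p = X.shape
    abs_x = np.abs(X)
    M = abs_x.sum(axis=0)          # constant bounds M_i = sum_j |x_i^j|
    W = abs_x / M                  # importance weights; each column sums to 1
    xi, theta, t = np.zeros(p), np.ones(p), 0.0
    times, skeleton = [t], [xi.copy()]
    for _ in range(n_attempts):
        # Prior bounces: exact inversion for the rate (theta_i xi_i + s)^+ / sigma2.
        a = theta * xi
        e = rng.exponential(size=p)
        tau_prior = -a + np.sqrt(np.maximum(a, 0.0) ** 2 + 2.0 * sigma2 * e)
        # Likelihood proposals: homogeneous Poisson processes with rates M_i.
        tau_lik = rng.exponential(size=p) / M
        i0, i1 = int(np.argmin(tau_prior)), int(np.argmin(tau_lik))
        tau = min(tau_prior[i0], tau_lik[i1])
        xi = xi + theta * tau
        t += tau
        if tau_prior[i0] <= tau_lik[i1]:
            theta[i0] = -theta[i0]            # prior event: always flip
        else:
            j = rng.choice(n, p=W[:, i1])     # data index drawn with prob. |x_i^j| / M_i
            resid = sigmoid(X[j] @ xi) - y[j]
            # The ratio m^J / M simplifies to {theta_i sign(x_i^J) resid}^+, which is <= 1.
            accept = max(theta[i1] * np.sign(X[j, i1]) * resid, 0.0)
            if rng.random() < accept:
                theta[i1] = -theta[i1]
        times.append(t)
        skeleton.append(xi.copy())
    return np.array(times), np.array(skeleton)

def path_mean(times, skeleton):
    # Time average of the piecewise-linear path; the trapezoid rule is exact here.
    dt = np.diff(times)[:, None]
    mids = 0.5 * (skeleton[1:] + skeleton[:-1])
    return (mids * dt).sum(axis=0) / (times[-1] - times[0])
```

In this sketch `rng.choice` costs $O(n)$ per draw; a precomputed alias table would reduce each draw to constant time, which is what makes the per-event cost independent of $n$ in practice. Posterior expectations are estimated as time averages along the continuous path, not as averages of the skeleton points.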

Although the focus of this work is on sampling from the Bayesian logistic regression problem presented in § 2.1, the approach can be readily applied to situations where the following assumption on the terms $U^j(\xi) = -\log \mathrm{pr}(y^j \mid x^j, \xi)$ $(j = 1,\dots,n)$ in the loglikelihood holds.

Assumption 1.

The partial derivatives of $U^j$ are bounded; that is, there exist constants $c_i^j \ge 0$ such that for all $\xi \in \mathbb{R}^p$, $|\partial_i U^j(\xi)| \le c_i^j$ $(i = 1,\dots,p;\ j = 1,\dots,n)$.

For the logistic regression problem considered, Bierkens et al. (2019) showed that Assumption 1 is satisfied with

$$c_i^j = |x_i^j|.$$

To keep things simple, we consider the prior to be $N_p(0, \sigma^2 I_p)$; we discuss other choices of priors in the Supplementary Material. Then we have that $\partial_i U^0(\xi) = \xi_i/\sigma^2$ for the prior part $U^0 = -\log p_0$ of the potential.
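The bound for logistic regression can be checked directly from model (1). The short derivation below is a sketch in the notation $U^j(\xi) = -\log \mathrm{pr}(y^j \mid x^j, \xi)$ for the $j$th likelihood term, which is assumed here for illustration.

```latex
\begin{align*}
U^j(\xi) &= \log\{1 + \exp(x^{j\mathrm T}\xi)\} - y^j\, x^{j\mathrm T}\xi, \\
\partial_i U^j(\xi) &= x_i^j \left\{ \frac{1}{1 + \exp(-x^{j\mathrm T}\xi)} - y^j \right\}, \\
|\partial_i U^j(\xi)| &\le |x_i^j|
  \quad \text{since } y^j \in \{0,1\} \text{ and } \frac{1}{1 + \exp(-x^{j\mathrm T}\xi)} \in (0,1).
\end{align*}
```

In particular, an observation whose $i$th covariate is zero contributes nothing to the $i$th partial derivative, which is what makes sparse covariates favourable for nonuniform subsampling.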

In what follows, we introduce alternative subsampling schemes and their associated estimators and bounds as variants of the zig-zag sampler. These are designed to improve sampling efficiency by either (I) improving the mixing of the zig-zag process or (II) reducing the computational cost per simulated unit time interval. Specifically, we replace uniform subsampling with importance subsampling, in § 3.2, to address (II), and we allow general mini-batches instead of subsamples of size 1, see § 3.3, to address (I). The Supplementary Material contains an extension to stratified subsampling, which enables mixing to be further improved.

3.2. Improving bounds via importance sampling

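A generalization of the estimator obtained using uniform subsampling, $G_i^J(\xi) = n\,\partial_i U^J(\xi)$ with $J \sim \mathrm{Un}\{1,\dots,n\}$, is to consider the index $J$ as being sampled from a nonuniform probability distribution $\nu_i$, defined by $\nu_i(j) = \omega_i^j$ $(j = 1,\dots,n)$, where $\omega_i^1,\dots,\omega_i^n > 0$ are weights satisfying $\sum_{j=1}^n \omega_i^j = 1$. It follows that $G_i^J(\xi) = (\omega_i^J)^{-1}\,\partial_i U^J(\xi)$ with $J \sim \nu_i$ defines an unbiased estimator of $\sum_{j=1}^n \partial_i U^j(\xi)$. Moreover, $M_i = \max_j c_i^j/\omega_i^j$ defines an upper bound for the rate function $\{\theta_i G_i^J(\xi)\}^+$ under Assumption 1. The contribution of the likelihood to the effective bouncing rate is $\mathbb{E}_{J \sim \nu_i}[\{\theta_i G_i^J(\xi)\}^+] = \sum_{j=1}^n \{\theta_i \partial_i U^j(\xi)\}^+$, the same as for uniform subsampling, which corresponds to $\omega_i^j = 1/n$.

The magnitudes of the upper bounds $M_i$ can be minimized by choosing the weight vector $\omega_i = (\omega_i^1,\dots,\omega_i^n)$ such that the constants $\max_j c_i^j/\omega_i^j$ are minimized. The minimum is attained when $\omega_i^j = c_i^j/\sum_{k=1}^n c_i^k$, so that $M_i = \sum_{k=1}^n c_i^k$; indeed, for any weight vector, $\max_j c_i^j/\omega_i^j \ge \sum_{j=1}^n \omega_i^j\,(c_i^j/\omega_i^j) = \sum_{j=1}^n c_i^j$. This approach can be trivially generalized to allow for importance weights $\omega_i^j = 0$ in cases where the respective partial derivative vanishes, i.e., $c_i^j = 0$, which is the case, for example, when the respective covariate in the considered logistic regression example is zero. For logistic regression, using optimal importance subsampling thus reduces the bounds from $n \max_j |x_i^j|$ to $\sum_{j=1}^n |x_i^j|$ for the $i$th dimension. This reduction is particularly significant when the $x^j$ are sparse or have outliers; see § 4.1.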
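The effect of the weights on the bounds can be seen on a toy design matrix. The Python sketch below assumes the logistic regression constants $|x_i^j|$ from Assumption 1; the helper names `uniform_bounds`, `importance_bounds` and `optimal_weights` are illustrative and do not come from the paper.

```python
import numpy as np

def uniform_bounds(X):
    # Bound per dimension under uniform subsampling: n * max_j |x_i^j|.
    return X.shape[0] * np.abs(X).max(axis=0)

def importance_bounds(X):
    # Bound per dimension under optimal importance subsampling: sum_j |x_i^j|.
    return np.abs(X).sum(axis=0)

def optimal_weights(X):
    # Weights proportional to |x_i^j| within each column; zero columns get zero weights.
    abs_x = np.abs(np.asarray(X, dtype=float))
    total = abs_x.sum(axis=0)
    return np.divide(abs_x, total, out=np.zeros_like(abs_x), where=total > 0)

# Toy example: first covariate sparse with one large entry, second covariate dense.
X = np.array([[0.0, 1.0], [0.0, -1.0], [4.0, 1.0], [0.0, 1.0]])
print(uniform_bounds(X))      # [16.  4.]
print(importance_bounds(X))   # [4. 4.]
```

For the sparse first column the bound drops by a factor of four, whereas for the second column, where all entries have the same magnitude, the two schemes coincide.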

3.3. Improving mixing via mini-batches

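In the context of piecewise-deterministic Markov processes, the motivation for using mini-batches of size larger than 1 is to reduce the effective refreshment rate, which can be expected to improve the mixing of the process if the refreshment rate is high (Andrieu et al., 2019). We consider a mini-batch $B = (J_1,\dots,J_m)$ of random indices $J_k \in \{1,\dots,n\}$, so that $G_i^B(\xi) = m^{-1}\sum_{k=1}^m G_i^{J_k}(\xi)$ is an unbiased estimator of $\sum_{j=1}^n \partial_i U^j(\xi)$ whenever each $G_i^{J_k}(\xi)$ is. Entries of the mini-batch are typically sampled uniformly and independently from the dataset. This yields unbiased estimators of the form $G_i^B(\xi) = (n/m)\sum_{k=1}^m \partial_i U^{J_k}(\xi)$ with $J_k \sim \mathrm{Un}\{1,\dots,n\}$.

Since for any function $f$ on $\{1,\dots,n\}$ one has that $m^{-1}\sum_{k=1}^m f(j_k) \le \max_j f(j)$, upper bounds for mini-batches of size $1$ are also upper bounds for mini-batches of size $m$. We can also let $G_i^B(\xi) = m^{-1}\sum_{k=1}^m (\omega_i^{J_k})^{-1}\,\partial_i U^{J_k}(\xi)$ with $J_k \sim \nu_i$, where $k = 1,\dots,m$ and the weights $\omega_i^j$ and the distribution $\nu_i$ with $\nu_i(j) = \omega_i^j$ are as defined in § 3.2; by the same arguments, we conclude that the value of the upper bound $M_i$ does not depend on the size of the mini-batch $m$.

If we consider mini-batches of size $m$, the effective bouncing rate of the zig-zag process when used with the estimators described above can be computed as $\lambda_i^{(m)}(\xi,\theta) = \mathbb{E}[\{\theta_i G_i^B(\xi)\}^+]$, where the expectation is over the mini-batch $B$. The effective refreshment rate decreases with increasing mini-batch size, as stated in the following lemma.

Lemma 1.

For all $\xi \in \mathbb{R}^p$, $\theta \in \{-1,1\}^p$ and $m \ge 1$, we have $\lambda_i^{(m+1)}(\xi,\theta) \le \lambda_i^{(m)}(\xi,\theta)$ $(i = 1,\dots,p)$.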
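The monotonicity in the mini-batch size can be checked numerically on a small example. The Python sketch below computes, by exact enumeration, the expected positive part of a mini-batch average under uniform sampling with replacement; the vector `g` stands for hypothetical per-observation values of $\theta_i\, n\, \partial_i U^j(\xi)$, and the function name `effective_rate` is illustrative.

```python
import itertools
import numpy as np

def effective_rate(g, m):
    """E[{mean of m i.i.d. uniform draws from g}^+], computed by exact enumeration."""
    g = np.asarray(g, dtype=float)
    total = 0.0
    for idx in itertools.product(range(len(g)), repeat=m):
        total += max(g[list(idx)].mean(), 0.0)
    return total / len(g) ** m

# Hypothetical per-observation values; their mean is 0.125.
g = [3.0, -1.0, -2.0, 0.5]
rates = [effective_rate(g, m) for m in (1, 2, 3, 4)]
```

For $m = 1$ the rate equals the average of the positive parts, 0.875 in this example, and as $m$ grows the rate decreases towards the positive part of the mean, 0.125, which is the rate of the zig-zag process without subsampling. The gap between the two is the effective refreshment rate.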

4. Synthetic data examples

4.1. Scaling of computational efficiency

We evaluate the sampling efficiency using synthetic data generated by sampling the covariates $x_i^j$ from mixture distributions of the form $(1-\alpha)\delta_0 + \alpha f$, where $\delta_0$ is a point mass at zero, $f$ is a smooth density, and $\alpha \in (0,1]$ determines the degree of sparsity. The responses are sampled from (1) for a fixed true value of $\xi$. We further choose a noninformative prior by setting the prior variance to a large value. We repeat the data generation and sampling $R$ times. The expected gain in efficiency from using importance subsampling instead of uniform subsampling is estimated as $R^{-1}\sum_{r=1}^R T_{\mathrm{imp}}^{(r)}/T_{\mathrm{unif}}^{(r)}$, where $T_{\mathrm{unif}}^{(r)}$ and $T_{\mathrm{imp}}^{(r)}$ denote the total simulation times after a fixed number of attempted bounces of the zig-zag process in the $r$th run using uniform and importance subsampling, respectively. The Supplementary Material contains further, similar experiments but using control variates.

We can expect the expected relative gain in efficiency to behave as

$$\mathbb{E}\left\{ \frac{n \max_{j} |x_i^j|}{\sum_{j=1}^n |x_i^j|} \right\}.$$

Figure 1(a) plots the gain in efficiency for sparse covariates for fixed $n$ and decreasing $\alpha$. Indeed, the behaviour of the estimated relative gain in efficiency as approximately $\alpha^{-1}$ is as suggested by the first-order Taylor expansion of this ratio when $n$ is large. Similarly, the form of the ratio suggests that the expected relative gain in efficiency is unbounded as the number of observations increases if $f$ has unbounded support. We therefore choose dense covariates, $\alpha = 1$, for increasing $n$. Figure 1(b) shows the orders of the relative gains in efficiency for different choices of $f$ as $n \to \infty$, which agree with what the first-order Taylor approximation of the ratio suggests. The more heavy-tailed the density $f$ is, the larger the asymptotic growth rate of the efficiency gain becomes. Additionally, the efficiency gain is observed to grow with $n$ when $f$ is a Student's $t$ distribution.
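The dependence of the gain on the sparsity level can be explored with a small simulation. The Python sketch below is not the experiment reported in the paper: it only compares, for a single covariate column, the uniform-subsampling bound $n\max_j|x^j|$ with the importance-subsampling bound $\sum_j|x^j|$, assuming that nonzero entries are standard normal and occur with probability `alpha`. The function names are illustrative.

```python
import numpy as np

def sparse_covariates(n, alpha, rng):
    # Each entry is standard normal with probability alpha and zero otherwise.
    mask = rng.random(n) < alpha
    return np.where(mask, rng.normal(size=n), 0.0)

def bound_ratio(x):
    # Uniform bound n * max|x| divided by the importance bound sum|x|.
    abs_x = np.abs(x)
    return len(x) * abs_x.max() / abs_x.sum()

rng = np.random.default_rng(0)
n = 20000
for alpha in (1.0, 0.1, 0.01):
    print(alpha, bound_ratio(sparse_covariates(n, alpha, rng)))
```

Because the sum of absolute values scales with the number of nonzero entries while the maximum changes only slowly, the ratio grows roughly like the inverse of `alpha` as the covariates become sparser.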

Fig. 1. Scaling of the relative gain in efficiency for four simulation settings (red, blue, green and pink), with: (a) a fixed number of observations and varying $\alpha$, where the dashed black line is proportional to $\alpha^{-1}$; (b) dense covariates and a varying number of observations, where the dashed, dotted and dot-dashed black lines indicate reference growth rates.

4.2. Control variates for sparse data

As mentioned in § 2.2, importance subsampling can be combined with the use of control variates. However, this approach fails to be efficient when the data are imbalanced or sparse. To demonstrate this, we generate covariates as described in § 4.1 with the simulation settings held fixed, and generate responses independently of the covariates such that exactly a prescribed number of them are ones. We choose the prior variance to be 1. Figure 2(a) plots, as a function of the number of ones among the responses, the ratio of the mixing time of the slowest-mixing component for importance subsampling with control variates to that for importance subsampling without control variates. As the responses become more imbalanced, i.e., as the number of ones decreases, it can be seen that the efficiency of using control variates decreases relative to not using them.

Fig. 2. (a) Gain in efficiency of using control variates over not using control variates for imbalanced responses. Auto-correlation function plots for (b) uniform subsampling and (c) importance subsampling in a high-dimensional sparse example.

4.3. High-dimensional sparse example

In this simulation we consider a challenging setting in which both the number of dimensions $p$ and the number of observations $n$ are large. The data are generated as in § 4.1 with sparse covariates. Traditional data augmentation and subsampling algorithms are either computationally very expensive or mix slowly in such a scenario. We choose the prior variance to be 1; auto-correlation function plots for uniform subsampling and for importance subsampling are displayed in Fig. 2 panels (b) and (c), respectively. These show that it is necessary to use importance subsampling for the zig-zag sampler to be a feasible sampling method. From a practical point of view, adaptive preconditioning can be of further help; this is described in the Supplementary Material.

5. Real-data example

We consider an imbalanced set of data on cervical cancer (Fernandes et al., 2017) obtained from the UCI Machine Learning Repository (Dua & Graff, 2017). The dataset contains 858 observations with 34 predictors. The responses are whether or not an individual has cancer, with only 18 out of the 858 individuals having cancer. The predictors include the number of sexual partners, use of hormonal contraceptives, and other variables; more than half of the predictors have approximately 80% zeros. Fixing the number of bouncing attempts, we compare the mixing times of the slowest-mixing component for uniform subsampling and for importance subsampling. A stratification scheme described in the Supplementary Material brings the mixing time down further.

6. Discussion

Subsampling for traditional Markov chain Monte Carlo schemes can be tricky, as the resulting chains induce an error in the invariant distribution that can be difficult to quantify. A promising class of recently developed algorithms, known as piecewise-deterministic Markov processes, allow subsampling without modifying the invariant measure. Nonuniform subsampling strategies can improve such algorithms relative to using uniform subsampling, especially for logistic regression with sparse covariate data; however, this can also be the case for other problems to which piecewise-deterministic Markov processes are applicable. After completion of this work, we became aware of a 2016 University of Oxford master’s thesis by Nicholas Galbraith, where a method called informed subsampling is introduced, which is similar to the importance subsampling strategy presented here. While aspects of sparsity in the covariate data are not addressed in that thesis, the author makes similar observations regarding the usefulness of the approach in the setting of covariate data with outliers or distributed according to heavy-tailed distributions.

Acknowledgement

This research was partially supported by the U.S. National Science Foundation and National Institutes of Health. We are grateful to the associate editor and two referees for helping us to improve the paper. Sen and Sachs contributed equally to this paper.

Supplementary material

Supplementary material available at Biometrika online includes more details on the zig-zag process with generalized subsampling, importance subsampling using control variates and associated numerical experiments, and stratified subsampling, as well as a proof of Lemma 1.

References

Alquier, P., Friel, N., Everitt, R. & Boland, A. (2016). Noisy Monte Carlo: Convergence of Markov chains with approximate transition kernels. Statist. Comp. 26, 29–47.
Andrieu, C., Durmus, A., Nüsken, N. & Roussel, J. (2019). Hypercoercivity of piecewise deterministic Markov process-Monte Carlo. arXiv: 1808.08592v2.
Andrieu, C. & Roberts, G. O. (2009). The pseudo-marginal approach for efficient Monte Carlo computations. Ann. Statist. 37, 697–725.
Bierkens, J., Fearnhead, P. & Roberts, G. (2019). The zig-zag process and super-efficient sampling for Bayesian analysis of big data. Ann. Statist. 47, 1288–320.
Bouchard-Côté, A., Vollmer, S. J. & Doucet, A. (2018). The bouncy particle sampler: A nonreversible rejection-free Markov chain Monte Carlo method. J. Am. Statist. Assoc. 113, 855–67.
Dua, D. & Graff, C. (2017). UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences, http://archive.ics.uci.edu/ml.
Fearnhead, P., Bierkens, J., Pollock, M. & Roberts, G. O. (2018). Piecewise deterministic Markov processes for continuous-time Monte Carlo. Statist. Sci. 33, 386–412.
Fernandes, K., Cardoso, J. S. & Fernandes, J. (2017). Transfer learning with partial observability applied to cervical cancer screening. In Iberian Conference on Pattern Recognition and Image Analysis. Cham, Switzerland: Springer, pp. 243–50.
Fithian, W. & Hastie, T. (2014). Local case-control sampling: Efficient subsampling in imbalanced data sets. Ann. Statist. 42, 1693–724.
Huggins, J., Campbell, T. & Broderick, T. (2016). Coresets for scalable Bayesian logistic regression. In NIPS’16: Proc. 30th Int. Conf. Neural Information Processing Systems. New York: Curran Associates, pp. 4080–8.
Jacob, P. E. & Thiery, A. H. (2015). On nonnegative unbiased estimators. Ann. Statist. 43, 769–84.
Johndrow, J. E., Mattingly, J. C., Mukherjee, S. & Dunson, D. (2017). Optimal approximating Markov chains for Bayesian inference. arXiv: 1508.03387v3.
Johndrow, J. E., Smith, A., Pillai, N. & Dunson, D. B. (2019). MCMC for imbalanced categorical data. J. Am. Statist. Assoc. 114, 1394–403.
Lewis, P. W. & Shedler, G. S. (1979). Simulation of nonhomogeneous Poisson processes by thinning. Naval Res. Logist. Quart. 26, 403–13.
Li, C., Srivastava, S. & Dunson, D. B. (2017). Simple, scalable and accurate posterior interval estimation. Biometrika 104, 665–80.
Polson, N. G., Scott, J. G. & Windle, J. (2013). Bayesian inference for logistic models using Pólya–Gamma latent variables. J. Am. Statist. Assoc. 108, 1339–49.
Srivastava, S., Li, C. & Dunson, D. B. (2018). Scalable Bayes via barycenter in Wasserstein space. J. Mach. Learn. Res. 19, 312–46.
Ting, D. & Brochu, E. (2018). Optimal subsampling with influence functions. In NIPS’18: Proc. 32nd Int. Conf. Neural Information Processing Systems. New York: Curran Associates, pp. 3650–9.
Welling, M. & Teh, Y. W. (2011). Bayesian learning via stochastic gradient Langevin dynamics. In ICML’11: Proc. 28th Int. Conf. Machine Learning. Madison, Wisconsin: Omnipress, pp. 681–8.
