Published in final edited form as: IEEE Trans Signal Process. 2022 May 3;70:2388–2401. doi: 10.1109/tsp.2022.3172028

Extreme Compressed Sensing of Poisson Rates from Multiple Measurements

Pavan K Kota 1, Daniel LeJeune 2, Rebekah A Drezek 3, Richard G Baraniuk 4

Abstract

Compressed sensing (CS) is a signal processing technique that enables the efficient recovery of a sparse high-dimensional signal from low-dimensional measurements. In the multiple measurement vector (MMV) framework, a set of signals with the same support must be recovered from their corresponding measurements. Here, we present the first exploration of the MMV problem where signals are independently drawn from a sparse, multivariate Poisson distribution. We are primarily motivated by a suite of biosensing applications of microfluidics where analytes (such as whole cells or biomarkers) are captured in small volume partitions according to a Poisson distribution. We recover the sparse parameter vector of Poisson rates through maximum likelihood estimation with our novel Sparse Poisson Recovery (SPoRe) algorithm. SPoRe uses batch stochastic gradient ascent enabled by Monte Carlo approximations of otherwise intractable gradients. By uniquely leveraging the Poisson structure, SPoRe substantially outperforms a comprehensive set of existing and custom baseline CS algorithms. Notably, SPoRe can exhibit high performance even with one-dimensional measurements and high noise levels. This resource efficiency is not only unprecedented in the field of CS but is also particularly potent for applications in microfluidics in which the number of resolvable measurements per partition is often severely limited. We prove the identifiability property of the Poisson model under such lax conditions, analytically develop insights into system performance, and confirm these insights in simulated experiments. Our findings encourage a new approach to biosensing and are generalizable to other applications featuring spatial and temporal Poisson signals.

Keywords: Compressed sensing, sparse recovery, Poisson, maximum likelihood, Monte Carlo methods, microfluidics

I. Introduction

As data increasingly informs critical decision-making, efficient signal acquisition frameworks must keep pace. Modern signals of interest are often high-dimensional but can be efficiently recovered by exploiting their underlying structure through signal processing. The field of compressed sensing (CS), reviewed in [1], [2], focuses on the recovery of sparse signals from fewer measurements than the signal dimension. Concretely, an N-dimensional signal x* with at most k nonzero entries (in which case x* is said to be k-sparse) can be recovered from a measurement vector y acquired by M < N sensors. The sensors’ linear responses to entries of x* define a sensing matrix Φ such that, compactly, y = Φx*. Recovering x* from y is known as the single measurement vector (SMV) problem [3]-[6]. In the multiple measurement vector (MMV) problem [7]-[9], D measurements are captured in an M × D matrix Y in order to recover X*, an N × D signal matrix. X* is jointly sparse such that only k of its rows contain nonzero elements. CS has been applied extensively in imaging [10]-[12] and communications [13]-[15] and only recently in biosensing [16]-[19].

Emerging microfluidics technologies in the field of biosensing motivate a new MMV framework. With microfluidics, a single sample can be split into D small-volume partitions such as droplets or nanowells, with D on the order of 10³ to 10⁷ [20]. Microfluidic partitioning captures individual analytes (e.g., cells [21], [22]; genes [23]; proteins [24], [25]; etc.) in partitions, and analyte quantities across partitions are known to follow a Poisson distribution [26], [27]. The common method to detect a library of analytes with large N is to either dilute samples or split samples into more partitions such that the Poisson distributions reduce to either empty or single-analyte capture, i.e., columns of X* satisfy x_d ∈ {0, 1}^N [22], [28]. This assumption motivates a straightforward N-class classification problem for each non-empty partition, but it necessitates clear separation between classes even under noise, some prior knowledge of sample concentration, and the generation of many wasteful, empty partitions. We hypothesize that CS could generalize the signal recovery strategy when samples are sparse, a common characteristic of biological samples. For example, samples may contain only a few microbes or mutations of interest among many possibilities [29], [30].

We propose the following generally applicable framework for the MMV problem with Poisson signals (MMVP). Let each signal x_d be drawn independently from a multivariate Poisson distribution parametrized by the N-dimensional, k-sparse vector λ*. That is, the x_{n,d} ~ Poisson(λ*_n) are independent. This framework should not be confused with the well-studied “Poisson compressed sensing” problem in imaging where the measurement noise, rather than the signal, follows a Poisson distribution [31], [32]. In contrast to typical MMV problems, our primary goal is to find an estimate λ̂ of λ* from the D observations rather than to estimate X̂. Each signal and measurement pair (x_d, y_d) is in one of G different sensor groups, each with its own sensing matrix Φ^(g) such that y_d = Φ^(g)x_d. The group g associated with each index d is known and deterministic. This measurement group multiplexing is similar to that found in the single pixel camera (SPC) [10]; however, in the SPC the x_d are assumed to be equal, whereas in MMVP they are independently sampled. In microfluidics, several sensor groups can be feasibly achieved by forking an input microfluidic channel into G reaction zones, each containing its own set of M sensors. Note that x_d ~ i.i.d. Poisson(λ*) regardless of which group it is in. The statement Y = ΦX* is the special case without noise where G = 1 and is illustrated in Fig. 1. For multiple groups, with X^(g)* denoting the submatrix of X* in group g, Y is the following concatenation:

Y = [Φ^(1)X^(1)*  Φ^(2)X^(2)*  ⋯  Φ^(G)X^(G)*].   (1)
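
To make the generative model concrete, the following minimal sketch simulates MMVP data under AWGN. It is illustrative only: all variable names and parameter values (N, M, G, D, k, sigma, etc.) are our own assumptions rather than settings from the paper.

import numpy as np

rng = np.random.default_rng(0)
N, M, G, D, k = 20, 3, 2, 1000, 3          # signal dim., sensors per group, groups, partitions, sparsity
sigma = 0.1                                 # AWGN standard deviation

lam_true = np.zeros(N)                      # k-sparse Poisson rate vector lambda*
lam_true[rng.choice(N, size=k, replace=False)] = rng.uniform(0.1, 1.0, size=k)

Phi = rng.uniform(0, 1, size=(G, M, N))     # one sensing matrix per sensor group
groups = rng.integers(0, G, size=D)         # known, deterministic group assignment per partition

X = rng.poisson(lam_true, size=(D, N))      # x_d ~ Poisson(lambda*), independent across d
Y = np.einsum('dmn,dn->dm', Phi[groups], X) + sigma * rng.normal(size=(D, M))   # y_d = Phi^(g) x_d + b_d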

Fig. 1.

The multiple measurement vector (MMV) problem with Poisson signals (MMVP) with one sensor group and noiseless measurements. White squares are zeroes and darker colors represent larger values. Each column x_d of X* is drawn from a Poisson distribution governed by the 2-sparse λ* (i.e., x_d ~ i.i.d. Poisson(λ*)).

A. Contributions and Findings

We present the first exploration of the MMVP problem and develop a novel recovery algorithm and initial theoretical results. We take a maximum likelihood estimation (MLE) approach, treating Y as a set of D observations from which to infer λ̂. Our core contributions are (1) the Sparse Poisson Recovery (SPoRe) algorithm that tractably estimates λ̂ (Section II); (2) theoretical results on the identifiability of our MMVP model, proving that MLE can asymptotically recover λ* even when M = G = 1 for any k, along with insights into MLE performance (Section III); and (3) simulations demonstrating SPoRe’s superior performance over existing and custom baseline algorithms (Section IV). The achievable measurement rate in the MMVP model is unprecedented in CS, and although we are unable to provide theoretical guarantees for our algorithm, we analytically derive insights into the influence of various system parameters and confirm these insights in our simulated experiments. We find that system designers should first maximize M and then increase G as necessary depending on the expected real-world conditions. In particular, we found that using G > 1 was helpful in cases with high noise or high ∑_n λ_n.

SPoRe’s strong performance even with extremely low M ∈ {1, 2, 3} under very high measurement noise uniquely enables sensor-constrained applications in biosensing. Although microfluidics devices can rapidly generate a large number of partitions D at a tunable rate, most optical and electrochemical sensing modalities that can keep pace are limited in M [20], [33]. Commonly, fluorescently tagged sensors reveal droplets’ contents rapidly as they flow by a detector, but spectral overlap generally limits M to five or fewer without highly specialized, system-specific approaches [34]. High-M alternatives such as various spectroscopic techniques limit throughput, necessitate additional instrumentation, or complicate workflows [33], [35]. We speculate that these severe restrictions on M may have forestalled research into CS’s potential role in microfluidics.

B. Previous Work

To the best of our knowledge, the MMVP problem has not yet been explored, likely owing to the ongoing maturation of microfluidics and only recent application of CS to biosensing. The Poisson signal model constrains elements of X to be nonnegative integers under a set of defined probability mass functions. Some aspects of this structure have been studied tangentially, but not the MMVP structure directly.

The core MMV problem only imposes joint sparsity. Early greedy algorithms for this generalized scenario extend the classic Orthogonal Matching Pursuit (OMP) algorithm [36] into OMPMMV [8], simultaneously developed as Simultaneous OMP (S-OMP) [9]. Generally, OMP-based algorithms iteratively build a support set of an estimated sparse solution x̂ (or X̂) by testing for the correlation between columns of Φ and the residuals between the measurements and previous estimates. A suite of greedy algorithms was recently developed that imposes nonnegativity constraints on a number of MMV approaches, including OMP’s analogues, and the nonnegative extensions outperformed their generalized counterparts [37].

The application of integer constraints to the SMV problem has proven challenging in the literature. Some theory involving sensing matrix design exists [38], [39], but practical algorithms have required additional constraints on the possible integers, e.g., x ∈ {0, 1}^N or other finite-alphabet scenarios [14], [40]-[43]. A recent study verified that these problems, as well as those with unbounded integer signals, are NP-hard [44]. Algorithms for the unconstrained integer SMV problem thus apply greedy heuristics such as OMP-based approaches [45], [46].

Additional structural constraints can also make these problems tractable. The communications problem of multi-user detection (MUD), reviewed in [47], bears some similarity to MMVP. Here, the activity of N users is the signal of interest and generally follows a Bernoulli model where each user is active with the same prior probability p_a [14]. An alternative prior with ∑_{n=1}^N x_{n,d} ~ Poisson(λ) models the mean number of total active users in any given signal [48], although the authors solely explored an overdetermined system. Applying an MMV framework to MUD enables underdetermined (M < N) applications but has generally assumed that any active user is active for the entire frame of observation (a row of X is entirely zero or nonzero) [49], [50]. Recently, the potency of sensor groups with a G = 2 system was demonstrated in the MUD context [51]. Despite some similarities to MMVP with an MMV framework and discrete signals, MUD most fundamentally differs from MMVP in its utilization of the probabilistic structure of X. In MUD, the model parameters governing user activities are assumed and leveraged in the recovery of X, whereas in MMVP, the model parameters in λ* are themselves the target of recovery.

Regardless of the particular structural constraints imposed in the variants of MMV above, M > k is necessary to recover X* [52]. However, in MMV models where the rows of X* are statistically independent, the simpler task of support recovery has been proven to be possible below the M > k regime, as low as M = Ω(√k) [53]-[55]. In particular, multiple measurement sparse Bayesian learning (M-SBL) [53] takes an MLE approach in a setting very similar to MMVP, but where the x_d have a Gaussian distribution rather than Poisson, and there M = Ω(√k) is the limit for sparse support recovery even as D → ∞. In contrast, we prove that in the MMVP problem, this is improved to M ≥ 1 (i.e., no minimum number of measurements) regardless of k when D → ∞, even when G = 1. Furthermore, the MLE in this setting recovers λ* exactly. We demonstrate empirically that we can recover λ* well for finite D even for extremely low M such as M ∈ {1, 2, 3}.

II. Sparse Poisson Recovery (SPoRe) Algorithm

A. Notation

We denote by P(·) a probability mass function and by p(·) a probability density function. We use R^N and Z^N to represent the N-dimensional Euclidean space and integer lattice, respectively, and we denote by R_+^N and Z_+^N their non-negative restrictions. We use script letters (A, B, …) for sets unless otherwise described. We use lowercase and uppercase bold-face letters for vectors and matrices, respectively. We represent their dimensions with uppercase letters (e.g., X ∈ Z_+^{N×D}) that are indexed by their lowercase counterparts. For example, x_{n,d} is the element of X in the nth row and dth column, and we use the shorthands x_n and x_d to represent the entire nth row and dth column vectors, respectively. Other lowercase letters (a, b, ϵ, etc.) may represent variable or constant scalars depending on context. We use λ* and X* to refer to the true signal values, and we denote estimates λ̂ and X̂ with the source of the estimate (e.g., MLE, SPoRe, baseline algorithm) being implicit from the context. We use ∥ · ∥ to denote a norm, ∥ · ∥_0 for the number of nonzero elements of a vector, ∥ · ∥_1 for the ℓ1 vector norm, ∥ · ∥_2 for the Euclidean norm, and ∥ · ∥_F for the Frobenius norm. We also use ∥X∥_{Rx} ≜ ∑_n max_d x_{n,d}, a relaxation of the row-ℓ0 quasi-norm defined in [56]. We denote the null space of a matrix A by N(A). As one abuse of notation, for densities of the form p(y∣x), we let the corresponding Φ^(g) applied to x and the relevant noise model be implicit. Also, we let the division of two vectors represent element-wise division.

B. Algorithm

If the index d is in sensor group g, we model the linear measurements as corrupted by an additive random noise vector b_d:

y_d = Φ^(g)x_d + b_d.   (2)

We allow b_d to be entirely independent (e.g., additive white Gaussian noise (AWGN), as used in our simulations) or dependent on x. With x_d ~ i.i.d. Poisson(λ*), the y_d are independent across d as well. The MLE maximizes the average log-likelihood of the measurements:

λ̂_MLE = argmax_λ ∏_{d=1}^D p(y_d∣λ)   (3)
= argmax_λ (1/D) ∑_{d=1}^D log ∑_{x∈Z_+^N} p(y_d∣x) P(x∣λ).   (4)

Because the infinite sum over x yields an intractable posterior distribution, we cannot apply the popular expectation-maximization (EM) algorithm [57] to solve this MLE problem. Instead, our Sparse Poisson Recovery (SPoRe) algorithm (Algorithm 1) optimizes this function with batch stochastic gradient ascent, drawing B elements uniformly with replacement from {1, …, D} to populate a batch set B. First, note that

∇_λ P(x∣λ) = P(x∣λ) (x/λ − 1).   (5)
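
For completeness, (5) follows directly from the independent Poisson probability mass function: log P(x∣λ) = ∑_{n=1}^N (x_n log λ_n − λ_n − log x_n!), so ∂P(x∣λ)/∂λ_n = P(x∣λ)(x_n/λ_n − 1), which is (5) written in vector form.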

Denoting the objective function on the right-hand side of (4) by ℓ(λ), the gradient is

∇_λℓ = (1/B) ∑_{d∈B} [ (∑_{x∈Z_+^N} p(y_d∣x) P(x∣λ) x/λ) / (∑_{x∈Z_+^N} p(y_d∣x) P(x∣λ)) − 1 ].   (6)

With gradient ascent, each iteration updates λ ← λ + α∇_λℓ with learning rate α. However, the summations over all of Z_+^N are clearly intractable. SPoRe approximates these quantities with a Monte Carlo (MC) integration over S samples of x, newly drawn for each batch gradient step from a sampling distribution Q : Z_+^N → R_+, such that

∑_{x∈Z_+^N} p(y∣x) P(x∣λ) ≈ (1/S) ∑_{s=1}^S p(y∣x_s) P(x_s∣λ) / Q(x_s).   (7)

The optimal choice of Q(x_s) is beyond the scope of this work, but we found that Q(x_s) = P(x_s∣λ) simplifies the expression, is effective in practice, and draws inspiration from the EM algorithm. In other words, the sampling function is updated at each iteration based on the current estimate of λ. The gradient thus simplifies to

∇_λℓ ≈ (1/B) ∑_{d∈B} [ (∑_{s=1}^S p(y_d∣x_s) x_s/λ) / (∑_{s=1}^S p(y_d∣x_s)) − 1 ].   (8)

Note that if only one x̂_d ∈ Z_+^N satisfied p(y_d∣x̂_d) > 0 for every y_d, the objective would be concave with λ̂ = (1/D)∑_{d=1}^D x̂_d, i.e., the MLE solution if X* were directly observed. Of course, with compressed measurements and noise, multiple signals may vie to “explain” any single measurement, but SPoRe’s key strength is that it jointly considers independent measurements to directly estimate λ̂.

We note that for finite samples, since the MC integration occurs inside a logarithm, the stochastic gradient is biased. However, since it converges in probability to the true gradient, we can expect results comparable to SGD with an unbiased gradient for sufficiently large S [58].

Fig. 2 illustrates key concepts of SPoRe and MMVP with a small example where M = G = 1 and λ* = [0.5, 0, 0.5], for which we can numerically compute p(y∣λ) for various λ. The measurements y_d are effectively drawn from an underlying mixture distribution depending on the noise; e.g., under AWGN, y_d follows a Gaussian mixture. The weights on each mixture component are controlled by λ. In simulated recovery, SPoRe assigns weights to the mixture via λ̂ according to the distribution of measurements, coming close to the true underlying distribution. In contrast, an ℓ1-Oracle (Section IV), which represents best-case performance for a standard, convex sparse recovery process, fails because M < k, as shown by its error in λ and illustrated by the difference in the distributions. Moreover, by using Φ = [1, 2, 3], many x will map to the same y. While CS theory generally focuses on conditions for unique or well-spaced projections of k-sparse signals (e.g., the restricted isometry property, RIP [5]), we demonstrate that such restrictions are unnecessary in MMVP. By accounting for the latent Poisson distribution in the signals, SPoRe succeeds even when M < k.

Fig. 2.

Example of MMVP and Sparse Poisson Recovery (SPoRe) with M < k < N: Φ = [1, 2, 3], λ* = [0.5, 0, 0.5], and D = 1000 measurements under additive white Gaussian noise (AWGN) b ~ N(0, σ²) with σ² = 0.02. SPoRe attempts to fit the distribution of measurements directly and finds λ̂ ≈ [0.45, 0.03, 0.44]. For comparison, the ℓ1-Oracle (see Section IV) minimizes the measurement error, X̂ = argmin_X ∥Y − ΦX∥_F with X ≥ 0 and ∑_{n,d} x_{n,d} = ∑_{n,d} x*_{n,d} as affine constraints. The estimate λ̂ for the ℓ1-Oracle is then set to the average of the columns of X̂, and in this example, λ̂ ≈ [0.33, 0.31, 0.31]. The distributions p(y∣λ̂) for each estimation method are compared against the true distribution p(y∣λ*) and the empirical histogram of the D observations.

Algorithm 1 summarizes the implementation details of SPoRe. Even though λ ∈ R_+^N, we enforce λ ≥ ϵ elementwise by clipping (ϵ = 10⁻³ in our simulations) to maintain exploration of the parameter space. Note that in (8), gradients can become very large with finite sampling as some elements of λ approach zero. We found that rescaling gradients to a maximum norm γ helps stabilize convergence. For rescaling, we consider only the subvector δ_Γ of the α-scaled gradient δ, where Γ ⊆ {1, …, N} contains the indices n for which λ_n + δ_n > ϵ. This restriction ensures that rescaling is based solely on the indices still being optimized, excluding those clipped to ϵ.

Algorithm 1 Sparse Poisson Recovery (SPoRe)
Input: λ^(0), B, S, γ, α, ϵ
1: λ ← λ^(0)
2: i ← 0
3: repeat
4:   Draw B columns of Y uniformly with replacement
5:   Draw S new samples from Q(x_s)
6:   δ ← α∇_λℓ(λ) via (8)
7:   if ∥δ_Γ∥_2 > γ then
8:     δ ← (γ/∥δ_Γ∥_2) δ   ▷ Rescale gradient step
9:   end if
10:  λ_n ← max(λ_n + δ_n, ϵ) for all n
11: until stopping criterion met
12: return λ

For our stopping criterion, we evaluate a moving average of λ̂ for convergence. We also track the median of p(y_d∣λ) for d ∈ B, accounting for stochasticity in likelihood approximations and batch selections, such that we reduce α if no improvement in the median has been seen within a patience window and terminate if α is reduced five times. We conducted all experiments on commodity personal computing hardware. Ultimately, recovery of λ̂ takes a few minutes on a single core for the problem sizes we consider (M ≤ 15, N ≤ 50), and SPoRe can be easily parallelized in the future for faster performance.
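
As a concrete illustration of (8) and Algorithm 1, the following minimal Python sketch computes one batch gradient step under AWGN with Q(x) = P(x∣λ̂) and samples shared across the batch. It is our own illustrative rendering rather than the authors’ implementation; all names (gaussian_loglik, spore_gradient, etc.) are assumptions, and the gradient rescaling and stopping logic of Algorithm 1 are omitted.

import numpy as np

def gaussian_loglik(Y_batch, Phi, X_samples, sigma2):
    # log p(y_d | x_s) for every d in the batch and sample s; shape (B, S).
    resid = Y_batch[:, None, :] - X_samples[None, :, :] @ Phi.T          # (B, S, M)
    return -0.5 * np.sum(resid ** 2, axis=-1) / sigma2                   # additive constants cancel in (8)

def spore_gradient(Y_batch, Phi, lam, S, sigma2, rng):
    # Monte Carlo estimate of (8) with samples drawn from Q = P(. | lam).
    X_samples = rng.poisson(lam, size=(S, lam.size)).astype(float)       # (S, N)
    loglik = gaussian_loglik(Y_batch, Phi, X_samples, sigma2)            # (B, S)
    w = np.exp(loglik - loglik.max(axis=1, keepdims=True))               # stabilized likelihood weights
    w /= w.sum(axis=1, keepdims=True)                                    # normalize over samples per d
    weighted_x = w @ X_samples                                           # (B, N): sum_s w_{d,s} x_s
    return np.mean(weighted_x / lam - 1.0, axis=0)                       # average over the batch

# One clipped gradient-ascent update, as in line 10 of Algorithm 1 (alpha and eps assumed):
# lam = np.maximum(lam + alpha * spore_gradient(Y_batch, Phi, lam, S, sigma2, rng), 1e-3)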

C. Practical Considerations

Within an iteration, we found that using the same S = 1000 samples for all d ∈ B helped vectorize our implementation for dramatically improved speed over sampling S times for each drawn y_d. This simplification had no noticeable influence on performance. While we found random initializations with a small offset, λ^(0) ~ Uniform(0, 1) + ν (with ν = 0.1), to be effective in general, we encountered a numerical issue under low-variance AWGN. Even though AWGN results in nonzero probabilities everywhere, p(y_d∣x_s) may numerically evaluate to zero for all drawn samples in low-noise settings. These zeros across all samples result in undefined terms in the summation over d ∈ B in (8). SPoRe simply ignores such undefined terms, but when this numerical issue occurs for all of B, SPoRe takes no gradient step. With very low noise and large N dampening the effectiveness of random sampling, SPoRe may stop prematurely as it appears to have converged. This problem did not arise with larger noise variances, where even inexact samples pushed λ̂ in the generally appropriate direction until better samples could be drawn (recall that Q(x) = P(x∣λ̂) at each iteration). Nonetheless, we decided to set λ^(0) = ν for consistency across all simulated experiments. We speculate that setting λ^(0) to a small value helped encourage sampling sparse x’s in early iterations to help find x_s with nonzero p(y_d∣x_s), bypassing the numerical issue altogether.
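
One way to implement the guard described above, reusing the hypothetical helpers and variables from the earlier sketch and working with raw (unnormalized) probabilities, is to drop batch columns whose sampled likelihoods all underflow to zero; this is an illustrative assumption rather than the paper’s exact code.

import numpy as np

probs = np.exp(gaussian_loglik(Y_batch, Phi, X_samples, sigma2))   # (B, S); may underflow to 0
valid = probs.sum(axis=1) > 0                                      # batch columns with a usable sample
if valid.any():
    w = probs[valid] / probs[valid].sum(axis=1, keepdims=True)
    grad = np.mean((w @ X_samples) / lam - 1.0, axis=0)            # (8) over the valid columns only
else:
    grad = np.zeros_like(lam)                                      # no gradient step this iteration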

III. Theory and Analysis

The summation over x inside the logarithm of the objective function complicates the precise analysis of our SPoRe algorithm. However, we can consider the asymptotic MMVP problem as D → ∞ and its MLE solution to gain insight into the superior recovery performance of SPoRe. In this setting, we obtain the powerful result that λ* is exactly recoverable even when M = G = 1 under a simple null space condition on Φ. We then characterize the loss in Fisher Information for MMVP and show how losses accrue with the increase of signals that map to the same measurements, an effect that increases with k. Lastly, we derive insights into the influence of sensor groups through a small-scale analysis. From a system design standpoint, we find that designers should first increase M as much as feasible and then increase G as needed. All proofs can be found in the Appendix.

A. Identifiability of MMVP Models

Our primary theoretical result is that the asymptotic MLE of the MMVP problem is exactly equal to λ* as long as a simple null space condition on Φ is satisfied. This result improves upon the M = Ω(√k) limitations of correlation-aware support recovery methods [53], [55], placing no restrictions on k and M. The reason for this improvement is that Φ transforms an integer lattice rather than a linear subspace such that the resulting model is still identifiable.

Identifiability refers to the uniqueness of the model parameters that can give rise to a distribution of observations. A model P = {p(·∣λ) : λ ∈ R_+^N} is a collection of distribution functions indexed by the parameter λ; in the MMVP problem, each choice of Φ and noise gives rise to a different model P. Through an optimization lens, if our model is identifiable, then λ* is the unique global optimum of the data likelihood as D → ∞. Recall that p(y∣λ) = ∑_{x∈Z_+^N} p(y∣x) P(x∣λ), meaning that we can interpret this model as each sensor group consisting of a mixture whose elements’ positions are governed by Φ^(g)x, distributions by the noise model, and weights by P(x∣λ). We focus in this analysis on G = 1, since as D → ∞, at least one sensor group contains infinitely many measurements. If the corresponding Φ^(g) satisfies the conditions we describe here, then the model is identifiable. Formally:

Definition III.1 (Identifiability). The model P is identifiable if, for all λ, λ′ ∈ R_+^N, p(y∣λ) = p(y∣λ′) for all y implies λ = λ′.

The identifiability of mixtures is well-studied [59], [60]; if a mixture is identifiable, the mixture weights uniquely parametrize possible distributions. For finite mixtures, a broad set of distributions including multivariate exponential and Gaussian have been proven to be identifiable [61]. A finite case may manifest in realistic MMVP systems where measurements y must eventually saturate; all sensors have a finite dynamic range of values they can capture. In the most general case, p(·∣λ) is a countably infinite mixture. Although less studied, countably infinite mixtures are identifiable under some classes of distributions [62]. The AWGN that we use in our simulations is identifiable for both the finite and countably infinite cases. Characterizing the full family of noise models that are identifiable under countably infinite mixtures is beyond the scope of this work. Our contribution is that given a noise model that yields identifiable mixtures, equal mixture weights induced by λ and λ′ imply λ = λ′. We prove the sufficiency of the following simple conditions on Φ for identifiability:

Theorem III.2 (Identifiability of Mixture Weights). Let b be additive noise drawn from a distribution for which a countably infinite mixture is identifiable. If N(Φ) ∩ R_+^N = {0} and ϕ_n ≠ ϕ_{n′} for all n, n′ ∈ {1, …, N} with n ≠ n′, then P is identifiable.

The null space condition essentially says that any nonzero vector in N(Φ) must contain both positive and negative elements. Many practical Φ satisfy this constraint (e.g., any Φ with at least one strictly negative or positive row). The second condition is trivial: no two columns of Φ can be identical. We also obtain a separate sufficient condition: Φ drawn from any continuous distribution results in identifiability.

Corollary III.3 (Identifiability with Random Continuous Φ). Let b be additive noise drawn from a distribution for which a countably infinite mixture is identifiable. If the elements of Φ are independently drawn from any continuous distribution, then P is identifiable.
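
As a practical aside (our own sketch, not from the paper), the null space condition of Theorem III.2 can be checked numerically with a small linear program: the condition fails exactly when some nonzero x ≥ 0 satisfies Φx = 0.

import numpy as np
from scipy.optimize import linprog

def nullspace_condition_holds(Phi, tol=1e-9):
    M, N = Phi.shape
    # Maximize sum(x) subject to Phi x = 0 and 0 <= x <= 1 (linprog minimizes, so negate the objective).
    res = linprog(c=-np.ones(N), A_eq=Phi, b_eq=np.zeros(M), bounds=[(0, 1)] * N)
    return res.status == 0 and -res.fun < tol       # optimum of ~0 means only x = 0 is feasible

Phi = np.array([[1.0, 2.0, 3.0]])                   # a strictly positive row forces the condition to hold
print(nullspace_condition_holds(Phi))               # True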

We emphasize the general result of Theorem III.2, since discrete sensing is common in biomedical systems. For example, sensors are often designed to bind to an integer number of known target sites and yield “digital” measurements [26], [63]. Discrete Φ can give rise to what we call collisions. Formally:

Definition III.4 (Collisions and Collision Sets). Let Φ ∈ R^{M×N} be a sensing matrix applied to signals x ∈ Z_+^N. A collision occurs between x and x′ when Φx = Φx′. A collision set for an arbitrary u ∈ Z_+^N is the set C_u = {x ∈ Z_+^N : Φx = Φu}.

If the distribution from which b is drawn is fixed (e.g., AWGN) or a function of Φx, then the mixture weights are the probability masses of the collision sets. Let the set of collision sets be U, with C_u ∈ U being an arbitrary collision set:

p(y∣λ) = ∑_{C_u∈U} p(y∣x ∈ C_u) P(C_u∣λ),   (9)
P(C_u∣λ) = ∑_{x∈C_u} P(x∣λ).   (10)

The weights of the mixture elements are governed by P(C_u∣λ). Given a noise model that yields identifiable mixtures, the same distribution of observations y implies that the mixture weights are identical, i.e., P(C_u∣λ) = P(C_u∣λ′) for all u. We prove that P(C_u∣λ) = P(C_u∣λ′) for all u implies λ = λ′, which implies the identifiability of P under both the conditions of Theorem III.2 and Corollary III.3.
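
To make collision sets concrete, the following brute-force sketch (our own illustration over a truncated search box, not code from the paper) enumerates C_u and the corresponding mixture weights P(C_u∣λ) in (10) for the small discrete example of Fig. 2.

import itertools
import numpy as np
from scipy.stats import poisson

Phi = np.array([[1, 2, 3]])                                  # discrete 1 x 3 sensing matrix from Fig. 2
lam = np.array([0.5, 1e-12, 0.5])                            # lambda* (tiny value stands in for 0)
xmax = 4                                                     # truncation; remaining mass is negligible here

collision_sets = {}                                          # key: Phi @ x, value: members of that C_u
for x in itertools.product(range(xmax + 1), repeat=Phi.shape[1]):
    key = int((Phi @ np.array(x))[0])
    collision_sets.setdefault(key, []).append(np.array(x))

for key in sorted(collision_sets):
    members = collision_sets[key]
    weight = sum(np.prod(poisson.pmf(x, lam)) for x in members)   # P(C_u | lambda) per (10)
    print(key, len(members), round(weight, 4))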

Our proofs are based on the existence and implications of single-vector collision sets C_x = {x}. When (9) holds, u indexes both the mixture elements and the collision sets. In the general case where b is dependent on x and not simply Φx, signals participating in the same mixture element may have different noise distributions. These differences can only further subdivide collision sets and leave single-vector collision sets unaffected. Thus, our results also cover the general noise case.

B. Fisher Information of MMVP Measurements

While identifiability confirms that λ* is a unique global optimum of the MLE problem given infinite observations, Fisher Information helps characterize the estimation of λ* as D increases. The Fisher Information matrix I is the (negative) Hessian of the expected log-likelihood function at the optimum λ*, and it is well-known that under a few technical conditions the MLE solution is asymptotically Gaussian with covariance I⁻¹/D. Intuitively, higher Fisher Information implies a “sharper” optimum that needs fewer observations for stable recovery. For direct observations of Poisson signals x_d rather than y_d, I is diagonal with I_{n,n} = 1/λ*_n. In MMVP with observations of noisy projections (y_d), I and its inverse are difficult to analyze. We can, however, instead characterize the reduction in I_{n,n} in MMVP caused by the noisy measurement of x_d and derive an insight that we empirically confirm in Section IV-D. Concretely, the elements of I follow

I_{i,j} = E[ (∂ log p(y∣λ)/∂λ_i)(∂ log p(y∣λ)/∂λ_j) ].   (11)

We denote the shorthand w_x ≜ p(y∣x) P(x∣λ*) and note that ∑_x w_x = p(y∣λ*). Following a derivation similar to that of the partial derivatives in (8), it can be shown that the general expression for the diagonal elements I_{n,n} is

I_{n,n} = ∫ ( (∑_x w_x x_n)/(λ*_n ∑_x w_x) − 1 )² (∑_x w_x) dy.   (12)

In the ideal scenario, we observe x_d directly, such that

I_{n,n}^{ideal} = ∑_x P(x∣λ*) (x_n/λ*_n − 1)² ( ∫ p(y∣x) dy ).   (13)

It can be easily shown that (13) reduces to the canonical 1/λ*_n. The integration of p(y∣x) evaluates to one, but we can manipulate it algebraically to re-express the quantity as

I_{n,n}^{ideal} = ∫ ∑_x [ w_x (x_n/λ*_n − 1)² ] dy.   (14)

Let I_n^{loss} ≜ I_{n,n}^{ideal} − I_{n,n} and let ∑_{(x′,χ)} denote the sum over all pairs of signals x′, χ ∈ Z_+^N. Expanding Equations (12) and (14) and simplifying yields

I_n^{loss} = (1/λ*_n²) ∫ ( ∑_x w_x x_n² − (∑_x w_x x_n)²/∑_x w_x ) dy = (1/λ*_n²) ∫ (1/∑_x w_x) ( ∑_{(x′,χ)} w_{x′} w_χ (x′_n − χ_n)² ) dy.   (15)

Note that I_n^{loss} is non-negative, such that I_{n,n} ≤ I_{n,n}^{ideal}, and that only pairs of signals with x′_n ≠ χ_n can contribute to I_n^{loss}. Also note that w_{x′} w_χ = p(y∣x′) p(y∣χ) P(x′∣λ*) P(χ∣λ*) and that P(x∣λ*) > 0 only when supp(x) ⊆ supp(λ*). Thus, the Fisher Information is only reduced relative to the direct Poisson observation case when there are pairs of signals that are well-explained by the same y and by the same λ*. Clearly, λ* with higher k will result in more such pairs, and we empirically confirm this influence of k on Fisher Information in Section IV-D. Although further precise analysis via Fisher Information is challenging, we provide deeper analysis of the special case of the MMVP problem with small λ through a different lens in the next section.
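
The second equality in (15) uses a variance-style identity; for completeness (our own verification, assuming the pair sum runs over unordered pairs x′ ≠ χ):

∑_{x′} ∑_{χ} w_{x′} w_χ (x′_n − χ_n)² = 2(∑_x w_x)(∑_x w_x x_n²) − 2(∑_x w_x x_n)²,

so that summing over unordered pairs gives half of this quantity, and dividing by ∑_x w_x recovers the integrand in the first line of (15).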

C. Small Scale Analysis

With identifiability, we know that λ* uniquely maximizes the expected log-likelihood. However, because SPoRe uses stochastic gradient ascent to optimize the empirical log-likelihood, it will typically achieve a λ̂ that is near but not equal to λ*. We therefore wish to understand how the neighborhood of λ* changes given the parameters of the problem. The natural way to do this for MLE problems is to consider the Fisher Information matrix as in the previous section, but the presence of a sum inside the logarithm makes analysis difficult. Instead, we consider a particular λ̃ near λ* that solves an optimization related to the original likelihood maximization problem. To further simplify the setting, we consider the “small scale” case where ∑_n λ_n is small enough that there is almost never a case where ∑_n x_n > 1. We emphasize that although this setting is simple, the MLE approach can still drastically outperform a trivial solution such as λ̂ = E[x̂], where x̂ = argmax_x p(y∣x), since with sufficient noise, x̂ ≠ x with arbitrarily high probability (Section IV-B).

At the small scale, the distribution of each x_n becomes Bernoulli with parameter λ_n, and the probability that x_n = 1 and x_{n′} = 1 for n ≠ n′ vanishes. Let n* denote the index of the nonzero entry of x (0 if there is none), which has a categorical distribution with parameter λ*. We abuse notation so that ϕ_0 = 0, λ_0 is the probability that n* = 0, and ∑_{n=0}^N λ_n = 1. Applying Jensen’s inequality to the log-likelihood for the conditional expectation given n*, we obtain

E[ log ∑_{n=0}^N p(y∣n) λ_n ] ≤ E_{n*}[ log ∑_{n=0}^N E_{y∣n*}[p(y∣n)] λ_n ].   (16)

Call the right-hand side of this inequality the Jensen bound. This Jensen bound via the logarithm has the attractive property of having a gradient that is equal to a first-order Taylor approximation of the gradient of the original likelihood.1 To see this, consider the partial derivatives for a single λn:

E[ p(y∣n) / ∑_{n′=0}^N p(y∣n′) λ_{n′} ] ≈ E_{n*}[ E_{y∣n*}[p(y∣n)] / ∑_{n′=0}^N E_{y∣n*}[p(y∣n′)] λ_{n′} ].   (17)

Thus, we can expect the optimizer of the Jensen bound to be close to λ* (this is particularly true as measurement noise vanishes and the bound becomes tight).

In the case where G = 1 under AWGN, we have the following result characterizing the solution of the Jensen bound.

Proposition III.5. If y∣n* ~ N(ϕ_{n*}, σ²I) and

K = ( exp{ −∥ϕ_n − ϕ_{n′}∥_2² / (4σ²) } )_{n,n′=0}^N   (18)

is invertible, then the maximizer λ̃ of the Jensen bound satisfies

λ̃ = K⁻¹( λ* / K⁻¹(s − μ) ),   (19)

where s is a scalar and, for all n, μ_n ≥ 0 and μ_n λ̃_n = 0.

In the case where all entries of λ̃ are positive, s − μ = 1. K has values of one along the diagonal and smaller values off the diagonal, so it mimics the identity matrix. Clearly, as K → I, λ̃ → λ*. However, given nonzero σ², K is bounded away from I. Furthermore, since it is impossible to find a set of more than M + 1 equidistant points in R^M, the off-diagonal values of K will differ when M < N, introducing distortion in the transformation.
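
For context, the entries of K arise as Gaussian overlap integrals in the Jensen bound; a short derivation (ours, consistent with (18)) under the AWGN model of Proposition III.5 gives

E_{y∣n*}[p(y∣n)] = ∫ N(y; ϕ_{n*}, σ²I) N(y; ϕ_n, σ²I) dy = (4πσ²)^{−M/2} exp{ −∥ϕ_n − ϕ_{n*}∥_2² / (4σ²) },

which is proportional to entry (n*, n) of K, so the Jensen bound reduces (up to an additive constant) to ∑_{n*} λ*_{n*} log([Kλ]_{n*}).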

However, even if M < N, if y is a measurement from a random sensor group, then the effect of this distortion can be mitigated such that λ̃ is a reliable estimator of λ* from a support recovery perspective:

Theorem III.6. If y∣n*, g ~ N(ϕ^(g)_{n*}, σ²I), g is distributed uniformly, and ϕ^(g)_n ~ i.i.d. N(0, I), then if G → ∞ and all elements of the maximizer λ̃ of the Jensen bound are strictly positive, there exist c_1 ≥ 0, c_2 ∈ R such that λ̃_n = c_1λ*_n + c_2 for 1 ≤ n ≤ N.

If λ̂ has the same rank ordering as λ*, the exact support can be recovered. Therefore, we expect an increase in G to improve performance in tasks such as support recovery. From this result, however, we expect gains due to increasing G to be less immediate than those due to increasing M (and indeed, we see this in our simulations in Section IV-C). To see this, contrast the asymptotic nature of Theorem III.6 in G with the fact that for a finite choice of M (specifically M = N) we can select all ϕ_n equidistant (or that for M even smaller we can select Φ satisfying a RIP with some acceptable distortion) and obtain the same reliability result.

IV. Simulations

In this section, we present comparisons of SPoRe against existing and custom baseline algorithms and follow with focused experimentation on SPoRe’s performance and limitations. For most baseline algorithms that find an estimate X̂, we set their estimates λ̂ = (1/D)∑_{d=1}^D x̂_d, i.e., the canonical Poisson MLE if X* were observed directly. For a performance metric, we chose cosine similarity between λ̂ and λ*, as it captures the relative distribution of elements of the solution, which we believe is most useful to a user in biosensing applications. Although comparisons of cosine similarity mask differences in magnitude, estimates with high cosine similarity also exhibited low mean-squared error in our experience (results not shown). We plot cosine similarity alone for brevity. In all simulations, we use AWGN and set ϕ^(g)_{m,n} ~ i.i.d. Uniform(0, 1), since many sensors are restricted to nonnegative measurements. For each parameter combination, we evaluate over 50 trials in which we draw new Φ^(g) and λ* for each trial. Due to high performance variability for some baseline algorithms, all error bars are scaled to ±1/2 standard deviation for consistency and readability.

A. Comparison against existing baselines

With no existing algorithm designed for Poisson signals, we compare against a number of algorithms with various relevant structural assumptions. We compare against both greedy and convex optimization approaches. First, we use DCS-SOMP [64], a generalization of the common baseline Simultaneous Orthogonal Matching Pursuit (S-OMP) [9] that assumes no structure and greedily solves MMV problems for any value of G. Next, we use NNS-SP and NNS-CoSaMP [37], two greedy algorithms for nonnegative MMV CS motivated by subspace pursuit (SP) [65] and compressive sampling matching pursuit (CoSaMP) [66], which exhibited the best empirical performance in [37]. For integer-based recovery, we use PROMP [45], an SMV algorithm for unbounded integer sparse recovery, to recover an estimate for each signal x̂_d. We also use two support recovery algorithms with M = Ω(√k) measurement rates: M-SBL [53] and a convex, ℓ1-based variance recovery algorithm [54]. Support recovery is generally not quantitative, which would seem to make cosine similarity an inappropriate metric for comparison. However, these algorithms produce an estimate of the variance of x_n, which equals λ_n for the Poisson distribution, making them coincidentally reasonable baseline approaches for MMVP directly.

For comparison against the best possible performance of the baselines and to avoid hyperparameter search where possible (for regularization weights, stopping criteria, etc.), we arm the baselines with relevant oracle knowledge of λ* or X*. While NNS-SP and NNS-CoSaMP require k as an input, we also give DCS-SOMP and PROMP, algorithms that iteratively and irreversibly select support elements, knowledge of k and have them stop after k elements have been chosen. Additionally, we created three oracle-enabled convex algorithms that minimize ∑_{g=1}^G ∥Y^(g) − Φ^(g)X^(g)∥_F under norm constraints with oracle knowledge of the value of the norm. The ℓ1 norm is commonly used as a penalty for convex solvers to encourage sparsity in sparse recovery. Our ℓ1-Oracles include SMV and MMV versions, where in the SMV case, Y is collapsed to a single vector by summing ∑_d y_d, and a vector x̂ is recovered from which λ̂ = x̂/D. The ℓ1-Oracle SMV and ℓ1-Oracle MMV are both given ∑_{n,d} x*_{n,d}. In [56], ∥X∥_{Rx} is suggested as a better alternative for MMV, so our Rx-Oracle is given ∥X*∥_{Rx}. In the support recovery algorithm from [54], the vector of sample variances (Var(x_n))_{n=1}^N is recovered under a constraint on ∑_n Var(x_n). Because we provide the sum of variances, we call this method the ∑ Var-Oracle. We also enforce non-negativity of the optimization variables in all convex problems, which we solve using the convex optimization package CVX in Matlab [67], [68]. We use no oracle knowledge for M-SBL, but we use the same initialization as for SPoRe and run its fixed point update until convergence.
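
For reference, a minimal sketch of the ℓ1-Oracle MMV baseline for the G = 1 case, written here in Python with cvxpy (the paper’s implementation used CVX in Matlab); the function name and interface are our own assumptions.

import cvxpy as cp
import numpy as np

def l1_oracle_mmv(Y, Phi, total_sum):
    # Minimize the measurement error with nonnegative X and the oracle sum constraint on X.
    N, D = Phi.shape[1], Y.shape[1]
    X = cp.Variable((N, D), nonneg=True)
    problem = cp.Problem(cp.Minimize(cp.norm(Y - Phi @ X, 'fro')),
                         [cp.sum(X) == total_sum])
    problem.solve()
    return X.value.mean(axis=1)          # lambda-hat: average of the columns of X-hat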

From Fig. 3a, we see the crucial result that the M < k regime is only feasible with SPoRe, while conventional CS algorithms, both SMV and MMV, fail. Such a result is expected; generally speaking, CS algorithms seek to minimize measurement error (∥Y − ΦX∥_F) while constraining the sparsity of the recovered solution. CS theory focuses on M > k since if M < k, M × k submatrices of Φ yield underdetermined systems in general. In other words, there simply cannot be unique k-sparse minimizers of measurement error alone with M < k, so the conventional CS problem is not well-posed, unlike in the MMVP problem. Support recovery algorithms M-SBL and the ∑ Var-Oracle exhibit improved performance over other CS baselines as M decreases, but SPoRe remains far superior as M → 1. Next, in Fig. 3b, we set M = 10, a regime where most baselines performed nearly perfectly in Fig. 3a, and increased the AWGN variance. We see that even in the conventional regime of N > M > k, SPoRe exhibits the highest noise tolerance, reflecting the fact that its leverage of the Poisson assumption minimizes its dependence on accurate measurements. Lastly, however, in Fig. 3c, SPoRe has the unique disadvantage of struggling to recover cases with high ∑_n λ_n. We observed that as ∑_n λ_n increases, SPoRe’s finite sampling results in few to no gradient steps taken, as “good” samples with (numerically) nonzero p(y∣x_s) were drawn increasingly rarely, and SPoRe mistakenly terminates. Under AWGN, larger λ_n raises the signal-to-noise ratio but can paradoxically compromise SPoRe’s performance. If M > k is a practical design choice, practitioners should consider existing MMV approaches if ∑_n λ_n may be highly variable.

Fig. 3.

Performance of SPoRe vs. compressed sensing baseline algorithms over 50 trials. Common settings unless otherwise specified are M = 10, k = 3, N = 20, D = 100, G = 1, ∑_n λ_n = 2. (a) Performance as a function of M, with σ² = 10⁻⁶ for comparison in an effectively noiseless setting. (b) Performance as a function of AWGN variance, with M = 10. (c) Performance as a function of ∑_n λ_n, with σ² = 10⁻² and M = 10.

B. Comparison against integer baselines: M < k

In the M < k regime, since with high probability we can bound the elements of X*, we might expect the discrete nature of the problem to admit at least a brute-force solution for obtaining X̂ that we can use to obtain λ̂. Indeed, if measurement noise is low, then the integer signal that minimizes measurement error for y_d is likely to be x_d. But a finite search space alone has not enabled integer-constrained CS research to achieve M < k in general.

One may wonder whether SPoRe is simply taking advantage of this practically finite search space and, by virtue of MC sampling over thousands of iterations, is effectively finding the right solution by brute force. To address this possibility, we compare against an ℓ0-Oracle that is given k and the maximum value in X* in order to test all (N choose k) combinations of X’s support. For each combination, it enumerates the (max(X*) + 1)^k possibilities for each x_d and selects x̂_d = argmin_x ∥y_d − Φx∥_2. Finally, it returns the k-sparse solution with the lowest minimized measurement error. This algorithm is the only Poisson-free approach in this section.
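
A brute-force rendering of the ℓ0-Oracle described above is sketched below; it is illustrative only (our own code and naming), feasible just for small N, k, and max(X*), and uses no Poisson information.

import itertools
import numpy as np

def l0_oracle(Y, Phi, k, xmax):
    # Y: (M, D) measurements, Phi: (M, N). Returns lambda-hat as the column mean of the best X-hat.
    M, D = Y.shape
    N = Phi.shape[1]
    vals = np.array(list(itertools.product(range(xmax + 1), repeat=k)))         # all integer fills of a support
    best_err, best_X = np.inf, None
    for support in itertools.combinations(range(N), k):
        C = np.zeros((len(vals), N))
        C[:, list(support)] = vals                                               # candidate signals on this support
        errs = np.linalg.norm(Y[:, None, :] - (Phi @ C.T)[:, :, None], axis=0)   # (n_candidates, D)
        picks = errs.argmin(axis=0)                                              # best candidate per column of Y
        total = errs[picks, np.arange(D)].sum()
        if total < best_err:
            best_err, best_X = total, C[picks].T                                 # (N, D)
    return best_X.mean(axis=1)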

Comparing SPoRe and other Poisson-enabled baselines against the ℓ0-Oracle characterizes the effect of incorporating the Poisson assumption on recovery performance. An early solution of ours for tackling MMVP, which we now use as a baseline, was an alternating optimization framework to update estimates of X̂ = argmax_X p(X∣Y, λ̂) and λ̂ = argmax_λ p(λ∣Y, X̂). Noting that p(X∣Y, λ̂) ∝ p(Y∣X) p(X∣λ̂), this MAP framework for solving for X under AWGN with variance σ² is

X̂ = argmax_X (1/D) ∑_{d=1}^D log p(y_d∣x_d) + log P(x_d∣λ̂)   (20)
= argmax_X (1/D) ∑_{d=1}^D [ −(1/(2σ²)) ∥y_d − Φx_d∥_2² ] + [ ∑_{n=1}^N ( x_{n,d} log λ̂_n − log Γ(x_{n,d} + 1) ) ],   (21)

where the Gamma function Γ(·) is the continuous extension of the factorial (Γ(x_{n,d} + 1) = x_{n,d}!) and −log Γ(·) is concave on the positive reals, making the objective concave over R_{++}^N. We implemented the classic branch-and-bound (BB) algorithm [69] to find the optimal integer-valued solution X̂ of the concave objective. Once an estimate X̂ is available, the update to λ̂ is also concave with the closed form solution λ̂ = (1/D)∑_d x̂_d. The biconcavity of this objective function in X and λ makes this approach attractive, but it is unclear how best to initialize λ̂. We refer to this alternating baseline algorithm with the prefix “Alt” followed by the method of initialization. For example, for Alt-Random, we use random initialization with a small offset (λ̂_n ~ Uniform(0, 1) + 0.1) to avoid making any particular λ_n irretrievable from the start.

We also explore a few “guided” initialization processes. The quantity ∑_n λ_n can hypothetically be estimated from data if P(x_d = 0∣λ) is significant and easily estimated from Y. In fact, quantification in microfluidics often relies on a clear identification of empty sample partitions (that is, where x_d = 0) [26]. This motivates a strategy of relaxing the problem by optimizing X with a Poisson assumption on the sum of each column, ∑_n x_{n,d}, rather than on each element of X individually. The ∑_n λ_n-Oracle is given ∑_n λ*_n and optimizes P(∑_n x_{n,d}∣∑_n λ_n) in place of P(x_d∣λ̂) in (20). It is straightforward to show that this objective is also concave. Each estimate x̂_d is solved via BB, from which λ̂ = (1/D)∑_d x̂_d. We use the ∑_n λ_n-Oracle as its own baseline and as an initialization to our alternating framework (Alt-∑_n λ_n). For the next guided initialization, we again use ∑_n λ_n for an unbiased initialization where the first estimate is λ̂_n = (∑_n λ*_n)/N for all n (Alt-Unbiased). Finally, we used the output of SPoRe as an initial value for λ̂ (Alt-SPoRe). Alt-SPoRe can be understood as a way to use SPoRe to estimate X̂ if needed.

Fig. 4 illustrates that SPoRe has the greatest tolerance to measurement noise whereas the ℓ0-Oracle has the least. This comparison illustrates the value of incorporating the Poisson assumption in recovery; specifically, the integer and sparsity structures (perfectly captured by the ℓ0-Oracle) are not sufficient for recovery under measurement noise. The alternating optimization algorithm’s behavior was unexpected; initialization (other than with Alt-SPoRe) does not appear to have a major influence on performance. Surprisingly, comparing Alt-∑_n λ_n versus the ∑_n λ_n-Oracle and Alt-SPoRe versus SPoRe, alternating seems to worsen performance under high noise. Our interpretation is that in high noise settings, the ability of SPoRe to not “overcommit” to a particular solution x̂_d may be especially effective when λ* is the signal of interest rather than X*. Any given measurement y_d may make the specific estimate x̂_d arbitrarily unreliable. In our alternating framework with x̂_d recovered separately for each d, errors on individual estimates accumulate. SPoRe instead makes gradient steps based on batches of observations, helping it maintain awareness of the full distribution of measurements.

Fig. 4.

AWGN tolerance of integer-restricted algorithms over 50 trials with M = 2, k = 3, N = 10, D = 100, G = 1, ∑_n λ_n = 2.

C. Sparsity and ∑_n λ_n

We empirically tested the limitations of SPoRe’s recovery performance under the very challenging conditions of M = 2, 3 ≤ k ≤ 7, N = 50, σ² = 10⁻². Here we set D = 1000 to better reflect the typical capabilities of biomedical systems, whereas D = 100 in our baseline comparisons was due to our computational time budget being strained by solving BB for x̂_d. From our analysis and previous simulations, we expect that both k and the magnitudes of λ_n will influence recovery. Fig. 5 probes when and why SPoRe fails. Fig. 5a illustrates that SPoRe’s performance decreases with increasing k and ∑_n λ_n. To elucidate the cause of poor performance, Fig. 5b shows SPoRe’s performance under the same conditions when initialized at the optimum. SPoRe’s maintenance of high cosine similarity in this case means that in Fig. 5a, SPoRe is converging to incorrect optima (or terminating before convergence). These two figures depict fundamental limitations of stochastic optimization in a challenging landscape.

Fig. 5.

SPoRe’s performance and behavior as a function of k and ∑_n λ_n over 50 trials. Common settings unless otherwise specified are M = 2, N = 50, D = 1000, G = 1, σ² = 10⁻². (a) Performance when initialized with the standard λ^(0) = 0.1. (b) Performance when initialized near λ*, specifically λ̂_n = max{ϵ, λ*_n}. (c) Average variance of partial derivatives for indices n ∉ supp(λ*) evaluated at λ̂ ≈ λ*.

Moreover, Fig. 5c illustrates that MC gradients decrease in quality with high ∑_n λ_n and k. In SPoRe, recall that we set a minimum λ̂_n ≥ ϵ = 10⁻³ so that nonzero x_n have a chance of being sampled for all n. We keep S fixed as we increase k and ∑_n λ_n, and we see that the variance of the gradient increases at coordinates where λ̂_n = ϵ and λ*_n = 0. Such an effect accounts for some drift from the optimum observed in Fig. 5b that increases with k, and we believe that it helps to explain the inability to converge to the optimum in Fig. 5a. Future work can explore alternative techniques for stochastic optimization and sampling. Practitioners may benefit significantly from reducing ∑_n λ_n if faced with limitations in M. In a biosensing context, these results indicate that given a microfluidics system with fixed D, having fewer total analytes in the sample (i.e., smaller D∑_n λ_n) can counterintuitively improve performance and may be particularly important if both M and G are limited.

However, note in (8) that SPoRe’s gradients are defined by an average of x_s weighted by p(y_d∣x_s). The previous result from Fig. 3c, in which SPoRe performed well with ∑_n λ_n up to 20 when M = 10, illustrates that limitations of MC sampling may be offset by improving the ability of p(y∣x_s) to guide gradients. In Fig. 6, we explore this notion further for M-constrained systems by increasing G. One may wonder how increasing G compares to a CS problem with GM measurements (i.e., Φ̄ ∈ R^{GM×N}). Although λ* is fixed across groups, the x_d are random such that there is no reasonable method to directly stack individual measurements y_d from multiple groups. Instead, we created a new baseline, the ℓ1-Oracle GM SMV. Denote the average of measurements and signals in each group g as ȳ^(g) and x̄^(g), respectively. Our new baseline stacks all ȳ^(g) into one measurement vector ȳ ∈ R^{GM} and minimizes ∥ȳ − Φ̄λ∥_2 with respect to λ given ∑_{n,d} x*_{n,d} and λ ≥ 0. Stacking measurements and sensing matrices implicitly assumes that for each group, ȳ^(g) ≈ Φ^(g)λ*, or that x̄^(g) ≈ λ* for all g. It can be easily shown that the relative errors in these approximations reduce with increasing D or λ*.

Fig. 6.

Performance of SPoRe (solid) vs. ℓ1-Oracle GM SMV (dashed) as a function of G. Common settings are k = 7, N = 50, D = 1000, σ² = 10⁻². (a) Comparison with ∑_n λ_n = 10, motivated by Fig. 5a. (b) Comparison with ∑_n λ_n = 1.

Although increasing D is feasible in microfluidics, it generally corresponds with a reduction in λ* since a sample’s total analyte content is fixed. Therefore, in Fig. 6, we focus on the influence of the magnitude of λ*. In Fig. 6a, we used the most challenging settings from Fig. 5, with k = 7 and ∑_n λ_n = 10. As expected from our analysis in Section III-C, larger choices of M make SPoRe much more effective per sensor group, but near-perfect recovery is achievable even with M = 1. However, note that the new oracle baseline performs almost identically to SPoRe, with SPoRe exhibiting a modest advantage only when GM is comparable to or less than k. When we reduce ∑_n λ_n to 1 in Fig. 6b, the assumption that x̄^(g) ≈ λ* becomes far less valid. As a result, the performance improvement with SPoRe is dramatic. For instance, what SPoRe achieves with M = 1 is only matched by the oracle baseline when M = 3. For applications in which x̄^(g) ≈ λ* and GM > k, practitioners could consider reformulating the recovery problem as a standard CS problem. However, SPoRe is uniquely suited for systems with GM < k and is the best generalized approach for applications with smaller λ* or D.

D. Efficiency

For system design, it is helpful to know how many observations D are necessary and sufficient for stable estimation of λ̂. Such insight is often derived from the analysis of Fisher Information I. Recall that for direct observations of Poisson signals x_d, the ideal case, the MLE solution λ̂_n = (1/D)∑_d x_{n,d} is an efficient estimator and achieves the Cramér–Rao bound such that var(λ̂_n) = λ*_n/D.
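
As a quick sanity check of this ideal-case baseline (our own numerical illustration with assumed values, not an experiment from the paper), directly observed Poisson draws give an estimator variance matching λ*_n/D:

import numpy as np

rng = np.random.default_rng(0)
lam_n, D, trials = 1.0, 1000, 20000
lam_hat = rng.poisson(lam_n, size=(trials, D)).mean(axis=1)   # MLE from direct observations, per trial
print(lam_hat.var(), lam_n / D)                                # both approximately 1e-3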

In Section III-B, we derived the reduction in I_{n,n} from MMVP measurements and found reason to expect that the reduction increases with k. Here, we empirically characterize this effect (Fig. 7). The matrix I is evaluated at λ*, so we only consider n ∈ supp(λ*). In MMVP, the variance of λ̂_n will depend on Φ and n, but by redrawing random Φ and λ* over 50 trials, we hope to smooth out these dependencies and capture the broader effect of low-dimensional projections. For a concise comparison considering n ∈ supp(λ*), we set all λ*_n = 1, pool all λ̂_n across all trials, and compute a single average variance for each k and D. Because the Fisher Information describes the optimization landscape near the optimum, we chose parameter settings (M = 2, G = 20) based on our results in Fig. 6a to be confident that SPoRe is arriving near the optimum and the estimation variance is not confounded by poor estimates.

Fig. 7.

Comparison of the average variance of λ̂_n from SPoRe versus the ideal Cramér–Rao (CR) bound as a function of D over 50 trials with n ∈ supp(λ*), λ*_n = 1, M = 2, N = 50, G = 20, σ² = 10⁻².

Fig. 7 shows a noticeable increase in the estimation variance and verifies that this deviation from the ideal bound is exponentially worsened in k. However, we argue that the variance quickly becomes negligible at reasonable D for practical purposes. Practitioners could consider the necessary precision of estimation and the maximum expected k for an application, increase D as needed, and worry little about the influence of noisy measurements in low dimensions.

V. Discussion

We have found that the structure in the MMVP problem can be easily exploited for substantial improvements in signal recovery. While compressed sensing of arbitrary integer signals has proven challenging in the past, Poisson constraints not only make the recovery problem tractable, but even significantly easier. Most inverse problems necessitate constraints that make the signal-to-measurement transformations nearly isometric: in compressed sensing, these manifest as restrictions on Φ, noise, and the relationship between M, N, and k. In MMVP, recovery of λ* is theoretically possible under very lax conditions on Φ (Theorem III.2) and practically achievable as shown in our simulations.

In practice, our new SPoRe algorithm exhibits high performance even under deliberately challenging circumstances of high noise and M < k. Here, recovery with G = 1 appears particularly feasible even with noisy measurements if ∑_n λ_n is small. Because the log-likelihood is not concave, SPoRe’s gradient ascent approach is not theoretically guaranteed to find a global optimum since local optima may exist. However, if they exist, we speculate that SPoRe is naturally poised to evade these traps due to stochasticity in its gradient steps from both batch draws and MC integrations.

We noted a few scenarios in which SPoRe’s MC sampling appears to cause issues with convergence or early termination, generally associated with increases in k and ∑_n λ_n. We anticipate that further increases in N may also contribute to these effects. While k and N are entirely determined by the application, system designers can reduce ∑_n λ_n by increasing the spatial or temporal sampling rate. In microfluidics, this adjustment translates to either generating more (smaller) partitions D given a fixed sample volume or diluting a sample prior to partitioning. Our initial implementation of SPoRe uses S = 1000, can easily run on personal computers, and is sufficient for systems with N < 10². This scale is appropriate for most applications in biosensing, and future work with parallelized or adaptive sampling strategies could improve the reliability of recovery for larger systems. Moreover, we found that increasing M and G appears to mitigate poor performance due to excessive sampling noise.

To date, CS implementations with low M and high G, based on our notation, reformulate the acquired measurements into a canonical CS problem with GM measurements [10]. As we discussed in Section IV-C, this reformulation implicitly assumes that the average signal in every group (x̄^(g)) is approximately λ*. However, this assumption fails in biosensing in two common scenarios: the total number of analytes in a typical sample may be small, or the sensing mechanism may impose restrictions on D. In such cases, SPoRe is the superior approach, allowing M < k and achieving high performance even with M = 1 under high noise. In this work, we found practical limits on the sparsity level k that we could accurately solve, hence the “S” in SPoRe, but future research should explore the conditions needed for the reliable recovery of non-sparse signals. High performance in this regime would inspire new methods in biosensing for the analysis of heterogeneous samples with many types of cells.

The ability to recover signals in MMVP, even in the extreme case of M = G = 1, is unprecedented in CS and offers a new paradigm for sensor-constrained applications. The current state-of-the-art method for achieving similar efficiency in microfluidics is to essentially guarantee single-analyte capture for classification by substantially increasing the sampling rate, whereas our MMVP framework is not reliant on such an intervention. Increasing G can make SPoRe reliable under harsh conditions, is straightforward with microfluidics, and offers a potent alternative to increasing the sampling rate. For example, diluting to a tolerable concentration is challenging with samples of unknown content such as in diagnostics applications, and increasing D either delays results or necessitates high throughput measurement acquisition which may not be feasible depending on the nature of the sensors. Our group is continuing research in CS-based microbial diagnostics [18] by working towards an in vitro demonstration of MMVP-based diagnostics with microfluidics.

Our initial theoretical and empirical results show the promise of MMVP, but there are many directions for further research. For instance, theoretical results that precisely relate M, N, D, k, λ*, and noise, such as in a recovery guarantee, would be highly valuable. Moreover, SPoRe can accept any signal-to-measurement model p(y∣x). While we have proven identifiability under linear mappings with common noise models, SPoRe can be easily applied with any application-specific model even if proving identifiability of the Poisson mixture is difficult. With growing interest in microfluidics, SPoRe’s promising performance in the MMVP problem warrants further research to ensure that the statistical assumptions underlying these new technologies are leveraged to their full potential.

Acknowledgments

We thank the anonymous reviewers for their helpful feedback and for pointing out the connection between the MMVP problem and correlation-aware MMV problems. This work was supported by NSF grants CBET 2017712, CCF-1911094, IIS-1838177, and IIS-1730574; ONR grants N00014-18-12571, N00014-20-1-2787, and N00014-20-1-2534; AFOSR grant FA9550-18-1-0478; a Vannevar Bush Faculty Fellowship, ONR grant N00014-18-1-2047; and the Rice University Institute of Biosciences and Bioengineering. P.K.K. was supported by the NLM Training Program in Biomedical Informatics and Data Science (T15LM007093).

Biographies


Pavan Kota received the B.S.E. in biomedical engineering from Case Western Reserve University (Cleveland, OH) in 2017.

He is currently pursuing his Ph.D. in the Department of Bioengineering at Rice University and an NIH NLM Fellow in Biomedical Informatics and Data Science. Pavan is primarily interested in the application of signal processing and machine learning techniques to biomedical diagnostics.


Daniel LeJeune (S’11) received the B.S. in engineering from McNeese State University (Lake Charles, LA) in 2014 and the M.S. in electrical and computer engineering from the University of Michigan in 2016.

He is currently a PhD candidate in the Department of Electrical and Computer Engineering at Rice University. His research interests include machine learning theory and adaptive algorithms.


Rebekah Drezek received her B.S.E. from Duke University (1996) and her M.Sc. (1998) and Ph.D. (2001) from the University of Texas at Austin, all in electrical engineering.

She is currently a professor and associate chair in the Department of Bioengineering at Rice University and a fellow of AIMBE. Her research interests include optical imaging, nanomedicine, and diagnostics.


Richard Baraniuk (S’85–M’93–SM’98–F’01) received the B.S. from the University of Manitoba, Canada (1987), the M.S. from the University of Wisconsin-Madison (1988), and the Ph.D. from the University of Illinois at Urbana-Champaign (1992), all in electrical engineering.

He is currently the Victor E. Cameron Professor of Electrical and Computer Engineering at Rice University and the Founding Director of OpenStax (openstax.org). His research interests lie in new theory, algorithms, and hardware for sensing, signal processing, and machine learning. He is a Fellow of the American Academy of Arts and Sciences, National Academy of Inventors, American Association for the Advancement of Science, and IEEE. He has received the DOD Vannevar Bush Faculty Fellow Award (National Security Science and Engineering Faculty Fellow), the IEEE James H. Mulligan, Jr. Education Medal, and the IEEE Signal Processing Society Technical Achievement, Education, Best Paper, Best Magazine Paper, and Best Column Awards. He holds 35 US and 6 foreign patents.

Appendix

A. Proof of Theorem III.2

We use a direct proof, assuming $P(\mathcal{C}_u \mid \lambda) = P(\mathcal{C}_u \mid \lambda')$ for all $u$ and proving the resulting implication $\lambda = \lambda'$. Let $z(x) \in \mathbb{Z}^N$ be such that $x + z(x) \in \mathcal{C}_x$. By Definition III.4, $x + z(x) \in \mathbb{Z}_+^N$ and $z(x) \in \mathcal{N}(\Phi)$.

Lemma A.1. If $\mathcal{N}(\Phi) \cap \mathbb{R}_+^N = \{0\}$ and $P(\mathcal{C}_0 \mid \lambda) = P(\mathcal{C}_0 \mid \lambda')$, then $\sum_n \lambda_n = \sum_n \lambda'_n$.

Proof. The null space condition on $\Phi$ means that $\mathcal{C}_0 = \{0\}$; there is no vector $z(0)$ that satisfies $0 + z(0) \in \mathbb{Z}_+^N$ other than $z(0) = 0$. Therefore, $P(\mathcal{C}_0 \mid \lambda) = P(\mathcal{C}_0 \mid \lambda') \implies e^{-\sum_n \lambda_n} = e^{-\sum_n \lambda'_n} \implies \sum_n \lambda_n = \sum_n \lambda'_n$. □

We now turn our attention to the one-hot collision sets. Let $e_j$ denote the $j$th standard basis vector. By Definition III.4, $\mathcal{C}_{e_j} = \{x : \Phi x = \phi_j,\, x \in \mathbb{Z}_+^N\}$. For $\mathcal{C}_{e_j}$ that contain only $e_j$, we have the following result:

Lemma A.2. If $\mathcal{N}(\Phi) \cap \mathbb{R}_+^N = \{0\}$ and $\mathcal{C}_{e_j} = \{e_j\}$, then $\lambda_j = \lambda'_j$.

Proof. The restriction on $\mathcal{C}_{e_j}$ means $P(\mathcal{C}_{e_j} \mid \lambda) = P(\mathcal{C}_{e_j} \mid \lambda') \implies \lambda_j e^{-\sum_n \lambda_n} = \lambda'_j e^{-\sum_n \lambda'_n}$. Applying Lemma A.1 yields $\lambda_j = \lambda'_j$. □

By similar arguments to Lemmas A.1 and A.2, we can prove Corollary III.3 under the assumption that there are no collisions instead of the null space condition. When the elements of $\Phi$ are independently drawn from continuous distributions, the collision of any particular $x$ and $x'$ occurs with probability zero. Since $\mathbb{Z}_+^N \times \mathbb{Z}_+^N$ is countably infinite, there are no collisions almost surely. As such, $\mathcal{C}_{e_j} = \{e_j\}$ for all $j$, and therefore $\lambda = \lambda'$.

For the conditions in Theorem III.2, the following lemma states the existence of at least one $j$ satisfying $\mathcal{C}_{e_j} = \{e_j\}$:

Lemma A.3. If $\mathcal{N}(\Phi) \cap \mathbb{R}_+^N = \{0\}$ and $\phi_n \neq \phi_{n'}$ for all $n, n' \in \{1, \ldots, N\}$ with $n \neq n'$, then $\exists\, j$ such that $\mathcal{C}_{e_j} = \{e_j\}$.

Proof. If $\mathcal{C}_{e_j} = \{e_j\}$, then there is no nonzero $z(e_j)$. Define $P$ as the number of one-hot collision sets that contain more than just $e_j$, and note that $P \leq N$. Without loss of generality, let us say that $\mathcal{C}_{e_j}$ for $j \in \{1, \ldots, P\}$ meet this condition. Lemma A.3 effectively says that $P < N$, such that $N - P > 0$ one-hot collision sets contain only $e_j$. We proceed with a proof by contradiction by assuming $P = N$.

By our null space condition, $z(e_j)$ must contain both positive and negative integers. There are two additional conditions on nontrivial $z(e_j)$. First, because $e_j + z(e_j) \in \mathbb{Z}_+^N$, the only negative component of $z(e_j)$ is $z_j(e_j) = -1$. To see this, if $z_i(e_j)$ for $i \neq j$ were negative, then $e_j + z(e_j)$ would be negative at index $i$, and if $z_j(e_j)$ were less than $-1$, then $e_j + z(e_j)$ would be negative at index $j$. Second, the positive elements of $z(e_j)$ must total at least 2: a single positive element $z_i(e_j) = 1$ would imply that $\phi_i = \phi_j$, violating a condition on $\Phi$.
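For intuition, consider a minimal toy example of our own (not from the main text) with $N = 3$, $M = 1$, and $\Phi = [1\ 2\ 3]$, which satisfies both the null space condition and the distinct-column condition:

$$\Phi e_3 = 3 = \Phi(e_1 + e_2) \implies e_1 + e_2 \in \mathcal{C}_{e_3}, \qquad z(e_3) = (1, 1, -1)^\top \in \mathcal{N}(\Phi),$$

consistent with the two conditions above: the only negative component is $z_3(e_3) = -1$, and the positive elements sum to 2.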

With $P = N$, let us concatenate the $z(e_j)$ column vectors into a matrix for visualization:

$$Z = \begin{bmatrix} -1 & z_1(e_2) & \cdots & z_1(e_N) \\ z_2(e_1) & -1 & \cdots & z_2(e_N) \\ \vdots & \vdots & \ddots & \vdots \\ z_N(e_1) & z_N(e_2) & \cdots & -1 \end{bmatrix}. \quad (22)$$

Note that each column $z(e_j)$ in this matrix is symbolic for any vector that satisfies the conditions we described. All columns of $Z$ are in $\mathcal{N}(\Phi)$, and any linear combination of vectors in $\mathcal{N}(\Phi)$ is in $\mathcal{N}(\Phi)$. Let $S$ represent a subset of indices of the columns of $Z$ and let $z_S \triangleq \sum_{j \in S} z(e_j)$.

First, let $S = \{1, \ldots, N\}$. Because all off-diagonal components in $Z$ are nonnegative and because $z_S$ must have at least one negative value, one of the rows of $Z$ must be entirely zero except for the $-1$ on the diagonal. Note that the ordering of the columns in $Z$ is arbitrary, so without loss of generality, let this be the first row. Now, let $S = \{2, 3, \ldots, N\}$. The same logic holds: at least one row must contain all zeros except for the $-1$. Without loss of generality, we can set $[Z]_{2,3} = [Z]_{2,4} = \cdots = [Z]_{2,N} = 0$. Repeating this process, we get a lower triangular matrix:

$$Z = \begin{bmatrix} -1 & 0 & \cdots & 0 & 0 \\ z_2(e_1) & -1 & \cdots & 0 & 0 \\ z_3(e_1) & z_3(e_2) & -1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ z_N(e_1) & z_N(e_2) & z_N(e_3) & \cdots & -1 \end{bmatrix}. \quad (23)$$

However, examining the final column, we see that $z(e_N)$ is a vector of all zeros except for one $-1$, such that it cannot be in $\mathcal{N}(\Phi)$ (otherwise $e_N \in \mathcal{N}(\Phi) \cap \mathbb{R}_+^N$, contradicting the null space condition), proving Lemma A.3 by contradiction. □

Proof of Theorem III.2: Lemma A.3 confirms $P < N$, meaning that we can form the concatenated matrix of $z(e_j)$ vectors:

$$Z = \begin{bmatrix} -1 & 0 & \cdots & 0 \\ z_2(e_1) & -1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ z_P(e_1) & z_P(e_2) & \cdots & -1 \\ \vdots & \vdots & & \vdots \\ z_N(e_1) & z_N(e_2) & \cdots & z_N(e_P) \end{bmatrix}. \quad (24)$$

Let us now apply $P(\mathcal{C}_{e_P} \mid \lambda) = P(\mathcal{C}_{e_P} \mid \lambda')$. Summing over all $x \in \mathcal{C}_{e_P}$,

$$\sum_{x \in \mathcal{C}_{e_P}} \left( \prod_{n=1}^{N} \frac{\lambda_n^{x_n}}{x_n!} - \prod_{n=1}^{N} \frac{(\lambda'_n)^{x_n}}{x_n!} \right) = 0, \quad (25)$$
$$\sum_{x \in \mathcal{C}_{e_P}} \left( \prod_{i > P} \frac{\lambda_i^{x_i}}{x_i!} \right) \left( \frac{\lambda_P^{x_P}}{x_P!} - \frac{(\lambda'_P)^{x_P}}{x_P!} \right) = 0, \quad (26)$$

where Lemma A.1 ($\sum_n \lambda_n = \sum_n \lambda'_n$) yields the first equality, and Lemma A.2 ($\lambda_i = \lambda'_i$ for all $i > P$) yields the second equality when combined with the fact that $x_i = 0$ for $i < P$ due to (24). The only $x \in \mathcal{C}_{e_P}$ with $x_P \neq 0$ is $e_P$, which simplifies (26) to $\lambda_P = \lambda'_P$.

Now we have $\lambda_i = \lambda'_i$ for all $i > P - 1$. Following the same arguments, we can start from $P(\mathcal{C}_{e_{P-1}} \mid \lambda) = P(\mathcal{C}_{e_{P-1}} \mid \lambda')$ and arrive at $\lambda_{P-1} = \lambda'_{P-1}$. Applying this argument repeatedly ultimately yields $\lambda = \lambda'$, proving Theorem III.2. □

B. Proof of Proposition III.5

Proof. By straightforward integration,

$$\mathbb{E}_{y_{n'}}[p(y \mid n)] \propto \exp\left\{ -\frac{1}{4\sigma^2} \left\lVert \phi_n - \phi_{n'} \right\rVert_2^2 \right\} \triangleq \kappa(n, n'). \quad (27)$$
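For reference, a sketch of the integration behind (27) under an isotropic Gaussian noise model $p(y \mid n) = \mathcal{N}(y; \phi_n, \sigma^2 I_M)$ (an assumption for this sketch), using the standard identity for the integral of a product of Gaussians:

$$\mathbb{E}_{y_{n'}}[p(y \mid n)] = \int \mathcal{N}(y; \phi_{n'}, \sigma^2 I_M)\, \mathcal{N}(y; \phi_n, \sigma^2 I_M)\, \mathrm{d}y = \mathcal{N}(\phi_n; \phi_{n'}, 2\sigma^2 I_M) \propto \exp\left\{ -\frac{1}{4\sigma^2} \|\phi_n - \phi_{n'}\|_2^2 \right\}.$$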

Therefore, given the constraints $\|\lambda\|_1 \leq 1$ and $\lambda_n \geq 0$, the first-order KKT condition is

$$\mathbb{E}_{n'}\left[ \frac{\kappa(n')}{\langle \kappa(n'), \lambda \rangle} \right] = cs - \mu, \quad (28)$$

where $\kappa(n') \triangleq (\kappa(n, n'))_{n=0}^{N}$, $s \triangleq \nabla \|\lambda\|_1$, $c \geq 0$, and $\mu_n \geq 0$. By complementary slackness, $\mu_n \lambda_n = 0$ for all $n$. Because $K$ is symmetric, we can rewrite the above as

$$K \left( \frac{\lambda}{K \lambda} \right) = cs - \mu, \quad (29)$$

where the fraction represents element-wise division. Solving for $\lambda$ and rescaling $\mu$, we obtain the desired expression. □

C. Proof of Theorem III.6

Proof. Let $\kappa_g$ be defined the same as $\kappa$ from (27) with $\phi_n^{(g)}$. Then again by straightforward integration,

$$\tilde{\kappa}(n, n') \triangleq \mathbb{E}_g[\kappa_g(n, n')] \quad (30)$$
$$= \begin{cases} 1 & n = n', \\ \left( \frac{2\sigma^2}{2\sigma^2 + 1} \right)^{M/2} & n \neq n',\ 0 \in \{n, n'\}, \\ \left( \frac{\sigma^2}{\sigma^2 + 1} \right)^{M/2} & n \neq n',\ 0 \notin \{n, n'\}. \end{cases} \quad (31)$$
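The two off-diagonal values can be traced to the per-coordinate Gaussian expectation $\mathbb{E}[e^{-d^2/(4\sigma^2)}] = (1 + \tau^2/(2\sigma^2))^{-1/2}$ for $d \sim \mathcal{N}(0, \tau^2)$; a sketch, assuming each $\phi_n^{(g)}$ has i.i.d. standard normal entries and $\phi_0^{(g)} = 0$:

$$n \neq n',\ 0 \notin \{n, n'\}: \quad \phi_n^{(g)} - \phi_{n'}^{(g)} \sim \mathcal{N}(0, 2 I_M) \implies \mathbb{E}_g[\kappa_g(n, n')] = \left(1 + \tfrac{1}{\sigma^2}\right)^{-M/2} = \left(\tfrac{\sigma^2}{\sigma^2 + 1}\right)^{M/2},$$
$$n \neq n',\ 0 \in \{n, n'\}: \quad \phi_n^{(g)} - \phi_0^{(g)} \sim \mathcal{N}(0, I_M) \implies \mathbb{E}_g[\kappa_g(n, n')] = \left(1 + \tfrac{1}{2\sigma^2}\right)^{-M/2} = \left(\tfrac{2\sigma^2}{2\sigma^2 + 1}\right)^{M/2}.$$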

Let $\tilde{K} = (\tilde{\kappa}(n, n'))_{n,n'=0}^{N}$, and let $\hat{K} = (\tilde{\kappa}(n, n'))_{n,n'=1}^{N}$ be the sub-matrix of $\tilde{K}$ excluding $n = 0$. Then

$$\hat{K} = (1 - a) I + a J, \quad (32)$$

where $a = \left( \frac{\sigma^2}{\sigma^2 + 1} \right)^{M/2}$ and $J$ is a matrix of all ones. Leveraging the block matrix inverse, we observe that we have the form

$$\tilde{K}^{-1} = \begin{bmatrix} \frac{1}{1-a} I + bJ & c\mathbf{1} \\ c\mathbf{1}^\top & d \end{bmatrix}, \quad (33)$$

assuming the final column corresponds to $n = 0$, for some scalars $b$, $c$, and $d$. Using the formula from Proposition III.5 and the fact that $s - \mu = \mathbf{1}$ by assumption, we conclude that for $n > 0$, $\tilde{\lambda}_n \propto \lambda_n + C$ for some $C$. Rote algebra verifies that the constant of proportionality is non-negative. □
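As a brief sketch of the algebra behind the form in (33) (our own note, for completeness): matrices of the form $\alpha I + \beta J$ are closed under inversion, since for the $N \times N$ all-ones matrix $J$,

$$(\alpha I + \beta J)^{-1} = \frac{1}{\alpha}\left( I - \frac{\beta}{\alpha + \beta N} J \right),$$

and the Schur complement of the $(0,0)$ entry of $\tilde{K}$, namely $\hat{K} - (\tilde{\kappa}(n, 0))_{n=1}^{N} (\tilde{\kappa}(0, n'))_{n'=1}^{N}$ with $\tilde{\kappa}(0,0) = 1$, is again of the form $(1 - a)I + \beta' J$ for a scalar $\beta'$. Its inverse therefore has the form $\frac{1}{1-a} I + bJ$ appearing in the upper-left block of (33).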

Footnotes

1. The Taylor expansion is of $f(u, v) = u/v$, for which a first-order approximation yields $\mathbb{E}[U/V] \approx \mathbb{E}[U]/\mathbb{E}[V]$ for random variables $U$ and $V$.

2. Our full code base is available at https://github.com/pavankkota/SPoRe with instructions on how to implement SPoRe with custom models.

Contributor Information

Pavan K. Kota, Department of Bioengineering, Rice University, Houston, TX 77005 USA.

Daniel LeJeune, Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005 USA.

Rebekah A. Drezek, Department of Bioengineering, Rice University, Houston, TX 77005 USA.

Richard G. Baraniuk, Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005 USA.

References

[1] Baraniuk RG, “Compressive sensing [lecture notes],” IEEE Signal Process. Mag., vol. 24, no. 4, pp. 118–121, July 2007.
[2] Eldar YC and Kutyniok G, Compressed Sensing: Theory and Applications, 1st ed. Cambridge University Press, May 2012.
[3] Chen SS, Donoho DL, and Saunders MA, “Atomic decomposition by basis pursuit,” SIAM J. Sci. Comput., vol. 20, no. 1, pp. 33–61, 1998.
[4] Donoho DL, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
[5] Candes EJ and Tao T, “Decoding by linear programming,” IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4203–4215, Dec. 2005.
[6] Cohen A, Dahmen W, and Devore R, “Compressed sensing and best k-term approximation,” J. Am. Math. Soc., vol. 22, no. 1, pp. 211–231, Jan. 2009.
[7] Chen J and Huo X, “Sparse representations for multiple measurement vectors (MMV) in an over-complete dictionary,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., vol. 4, 2005, pp. 257–260.
[8] ——, “Theoretical results on sparse representations of multiple-measurement vectors,” IEEE Trans. Signal Process., vol. 54, no. 12, pp. 4634–4643, Dec. 2006.
[9] Tropp JA, Gilbert AC, and Strauss MJ, “Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit,” Signal Processing, vol. 86, no. 3, pp. 572–588, Mar. 2006.
[10] Duarte MF, Davenport MA, Takhar D, Laska JN, Sun T, Kelly KF, and Baraniuk RG, “Single-pixel imaging via compressive sampling,” IEEE Signal Process. Mag., vol. 25, no. 2, pp. 83–91, Mar. 2008.
[11] Willett RM, Marcia RF, and Nichols JM, “Compressed sensing for practical optical imaging systems: a tutorial,” Opt. Eng., vol. 50, no. 7, July 2011.
[12] Lustig M, Donoho DL, Santos JM, and Pauly JM, “Compressed sensing MRI,” IEEE Signal Process. Mag., vol. 25, no. 2, pp. 72–82, Mar. 2008.
[13] Li S, Xu LD, and Wang X, “Compressed sensing signal and data acquisition in wireless sensor networks and internet of things,” IEEE Trans. Ind. Informat., vol. 9, no. 4, pp. 2177–2186, Nov. 2013.
[14] Zhu H and Giannakis GB, “Exploiting sparse user activity in multiuser detection,” IEEE Trans. Commun., vol. 59, no. 2, pp. 454–465, Feb. 2011.
[15] Ragheb T, Laska JN, Nejati H, Kirolos S, Baraniuk RG, and Massoud Y, “A prototype hardware for random demodulation based compressive analog-to-digital conversion,” in 51st Midwest Symposium on Circuits and Systems, vol. 4, 2008, pp. 37–40.
[16] Cleary B, Cong L, Cheung A, Lander ES, and Regev A, “Efficient generation of transcriptomic profiles by random composite measurements,” Cell, vol. 171, no. 6, pp. 1424–1436.e18, Nov. 2017.
[17] Koslicki D, Foucart S, and Rosen G, “WGSQuikr: Fast whole-genome shotgun metagenomic classification,” PLoS ONE, vol. 9, no. 3, p. e91784, Mar. 2014.
[18] Aghazadeh A, Lin AY, Sheikh MA, Chen AL, Atkins LM, Johnson CL, Petrosino JF, Drezek RA, and Baraniuk RG, “Universal microbial diagnostics using random DNA probes,” Sci. Adv., vol. 2, no. 9, p. e1600025, Sept. 2016.
[19] Ghosh S, Agarwal R, Rehan MA, Pathak S, Agrawal P, Gupta Y, Consul S, Gupta N, Goyal R, Rajwade A, and Gopalkrishnan M, “A compressed sensing approach to group-testing for COVID-19 detection,” May 2020.
[20] Guo MT, Rotem A, Heyman JA, and Weitz DA, “Droplet microfluidics for high-throughput biological assays,” Lab Chip, vol. 12, pp. 2146–2155, Feb. 2012.
[21] Hosokawa M, Nishikawa Y, Kogawa M, and Takeyama H, “Massively parallel whole genome amplification for single-cell sequencing using droplet microfluidics,” Sci. Rep., vol. 7, no. 5199, July 2017.
[22] Mazutis L, Gilbert J, Ung WL, Weitz DA, Griffiths AD, and Heyman JA, “Single-cell analysis and sorting using droplet-based microfluidics,” Nat. Protoc., vol. 8, no. 5, pp. 870–891, Apr. 2013.
[23] Vogelstein B and Kinzler KW, “Digital PCR,” PNAS, vol. 96, no. 16, pp. 9236–9241, Aug. 1999.
[24] Kim SH, Iwai S, Araki S, Sakakihara S, Iino R, and Noji H, “Large-scale femtoliter droplet array for digital counting of single biomolecules,” Lab Chip, vol. 12, no. 23, pp. 4986–4991, Dec. 2012.
[25] Yelleswarapu V, Buser JR, Haber M, Baron J, Inapuri E, and Issadore D, “Mobile platform for rapid sub-picogram-per-milliliter, multiplexed, digital droplet detection of proteins,” PNAS, vol. 116, no. 10, pp. 4489–4495, Feb. 2019.
[26] Basu AS, “Digital assays part I: Partitioning statistics and digital PCR,” SLAS Technology, vol. 22, no. 4, pp. 369–386, Aug. 2017.
[27] Moon S, Ceyhan E, Gurkan UA, and Demirci U, “Statistical modeling of single target cell encapsulation,” PLoS ONE, vol. 6, no. 7, p. e21580, July 2011.
[28] Velez DO, Mack H, Jupe J, Hawker S, Kulkarni N, Hedayatnia B, Zhang Y, Lawrence S, and Fraley SI, “Massively parallel digital high resolution melt for rapid and absolutely quantitative sequence profiling,” Sci. Rep., vol. 7, no. 42326, Feb. 2017.
[29] Peters BM, Jabra-Rizk MA, O’May GA, Costerton JW, and Shirtliff ME, “Polymicrobial interactions: impact on pathogenesis and human disease,” Clin. Microbiol. Rev., vol. 25, no. 1, pp. 193–213, Jan. 2012.
[30] Vural S, Wang X, and Guda C, “Classification of breast cancer patients using somatic mutation profiles and machine learning approaches,” BMC Syst. Biol., vol. 10, no. 62, pp. 263–276, Aug. 2016.
[31] Raginsky M, Willett RM, Harmany ZT, and Marcia RF, “Compressed sensing performance bounds under Poisson noise,” IEEE Trans. Signal Process., vol. 58, no. 8, pp. 3990–4002, Aug. 2010.
[32] Harmany ZT, Marcia RF, and Willett RM, “This is SPIRAL-TAP: Sparse Poisson Intensity Reconstruction ALgorithms—theory and practice,” IEEE Trans. Image Process., vol. 21, no. 3, pp. 1084–1096, Mar. 2012.
[33] Suea-Ngam A, Howes PD, Srisa-Art M, and deMello AJ, “Droplet microfluidics: from proof-of-concept to real-world utility?” Chem. Commun., vol. 55, no. 67, pp. 9895–9903, July 2019.
[34] Orth A, Ghosh RN, Wilson ER, Doughney T, Brown H, Reineck P, Thompson JG, and Gibson BC, “Super-multiplexed fluorescence microscopy via photostability contrast,” Biomed. Opt. Express, vol. 9, no. 7, pp. 2943–2954, July 2018.
[35] Zhu Y and Fang Q, “Analytical detection techniques for droplet microfluidics - a review,” Anal. Chim. Acta, vol. 787, no. 17, pp. 24–35, July 2013.
[36] Pati YC, Rezaiifar R, and Krishnaprasad PS, “Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition,” in Proceedings of 27th Asilomar Conference on Signals, Systems and Computers, vol. 1, 1993, pp. 40–44.
[37] Kim D and Haldar JP, “Greedy algorithms for nonnegativity-constrained simultaneous sparse recovery,” Signal Processing, vol. 125, pp. 274–289, Aug. 2016.
[38] Dai W and Milenkovic O, “Weighted superimposed codes and constrained integer compressed sensing,” IEEE Trans. Inf. Theory, vol. 55, no. 5, pp. 2215–2229, May 2009.
[39] Fukshansky L, Needell D, and Sudakov B, “An algebraic perspective on integer sparse recovery,” Appl. Math. Comput., vol. 340, no. 1, pp. 31–42, Jan. 2019.
[40] Nakarmi U and Rahnavard N, “BCS: Compressive sensing for binary sparse signals,” in IEEE Military Communications Conf., 2012, pp. 1–5.
[41] Tian Z, Leus G, and Lottici V, “Detection of sparse signals under finite-alphabet constraints,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Apr. 2009, pp. 2349–2352.
[42] Draper SC and Malekpour S, “Compressed sensing over finite fields,” in IEEE Int. Symp. Inf. Theory, June 2009, pp. 669–673.
[43] Shim B, Kwon S, and Song B, “Sparse detection with integer constraint using multipath matching pursuit,” IEEE Commun. Lett., vol. 18, no. 10, pp. 1851–1854, Oct. 2014.
[44] Lange J-H, Pfetsch ME, Seib BM, and Tillmann AM, “Sparse recovery with integrality constraints,” Discrete Appl. Math., vol. 283, pp. 346–366, Sept. 2020.
[45] Flinth A and Kutyniok G, “PROMP: A sparse recovery approach to lattice-valued signals,” Appl. Comput. Harmon. Anal., vol. 45, pp. 668–708, Mar. 2017.
[46] Sparrer S and Fischer RFH, “MMSE-based version of OMP for recovery of discrete-valued sparse signals,” Electron. Lett., vol. 52, no. 1, pp. 75–77, Jan. 2016.
[47] Alam M and Zhang Q, “A survey: Non-orthogonal multiple access with compressed sensing multiuser detection for mMTC,” 2018.
[48] Ji Y, Bockelmann C, and Dekorsky A, “Compressed sensing based multi-user detection with modified sphere detection in machine-to-machine communications,” in SCC 2015; 10th International ITG Conference on Systems, Communications and Coding, 2015, pp. 1–6.
[49] Monsees F, Bockelmann C, and Dekorsky A, “Reliable activity detection for massive machine to machine communication via multiple measurement vector compressed sensing,” in IEEE Globecom Workshops (GC Wkshps), Dec. 2014, pp. 1057–1062.
[50] Liu J, Cheng H-Y, Liao C-C, and Wu A-YA, “Scalable compressive sensing-based multi-user detection scheme for internet-of-things applications,” in IEEE Workshop on Signal Processing Systems (SiPS), 2015, pp. 1–6.
[51] Alam M and Zhang Q, “Enhanced compressed sensing based multiuser detection for machine type communication,” in IEEE Wireless Communications and Networking Conference, WCNC, 2018, pp. 1–6.
[52] Davies ME and Eldar YC, “Rank awareness in joint sparse recovery,” IEEE Trans. Inf. Theory, vol. 58, no. 2, pp. 1135–1146, 2012.
[53] Balkan O, Kreutz-Delgado K, and Makeig S, “Localization of more sources than sensors via jointly-sparse Bayesian learning,” IEEE Signal Process. Lett., vol. 21, no. 2, pp. 131–134, 2014.
[54] Pal P and Vaidyanathan PP, “Pushing the limits of sparse support recovery using correlation information,” IEEE Trans. Signal Process., vol. 63, no. 3, pp. 711–726, 2015.
[55] Koochakzadeh A, Qiao H, and Pal P, “On fundamental limits of joint sparse support recovery using certain correlation priors,” IEEE Trans. Signal Process., vol. 66, no. 17, pp. 4612–4625, 2018.
[56] Tropp JA, “Algorithms for simultaneous sparse approximation. Part II: Convex relaxation,” Signal Processing, vol. 86, no. 3, pp. 589–602, Mar. 2006.
[57] Dempster AP, Laird NM, and Rubin DB, “Maximum likelihood from incomplete data via the EM algorithm,” J. R. Stat. Soc. Series B, vol. 39, no. 1, pp. 1–38, 1977.
[58] Chen J and Luss R, “Stochastic gradient descent with biased but consistent gradient estimators,” 2019.
[59] Teicher H, “Identifiability of finite mixtures,” Ann. Math. Statist., vol. 34, no. 4, pp. 1265–1269, Dec. 1963.
[60] Tallis GM, “The identifiability of mixtures of distributions,” J. Appl. Prob., vol. 6, no. 2, pp. 389–398, 1969.
[61] Yakowitz SJ and Spragins JD, “On the identifiability of finite mixtures,” Ann. Math. Statist., vol. 39, no. 1, pp. 209–214, 1968.
[62] Yang L and Wu X, “A new sufficient condition for identifiability of countably infinite mixtures,” Metrika, vol. 77, pp. 377–387, May 2013.
[63] Zhang DY, Chen SX, and Yin P, “Optimizing the specificity of nucleic acid hybridization,” Nat. Chem., vol. 4, pp. 208–214, Jan. 2012.
[64] Baron D, Duarte MF, Wakin MB, Sarvotham S, and Baraniuk RG, “Distributed compressive sensing,” Jan. 2009.
[65] Dai W and Milenkovic O, “Subspace pursuit for compressive sensing signal reconstruction,” IEEE Trans. Inf. Theory, vol. 55, no. 5, pp. 2230–2249, May 2009.
[66] Needell D and Tropp JA, “CoSaMP: Iterative signal recovery from incomplete and inaccurate samples,” Appl. Comput. Harmon. Anal., vol. 26, no. 3, pp. 301–321, May 2009.
[67] Grant M and Boyd S, “CVX: Matlab software for disciplined convex programming, version 2.1,” http://cvxr.com/cvx, Mar. 2014.
[68] ——, “Graph implementations for nonsmooth convex programs,” in Recent Advances in Learning and Control, ser. Lecture Notes in Control and Information Sciences, Blondel V, Boyd S, and Kimura H, Eds. Springer-Verlag Limited, 2008, pp. 95–110, http://stanford.edu/~boyd/graph_dcp.html.
[69] Land AH and Doig AG, “An automatic method of solving discrete programming problems,” Econometrica, vol. 28, no. 3, pp. 497–520, July 1960.
