IEEE Open Journal of Signal Processing. 2021 Apr 27;2:248–264. doi: 10.1109/OJSP.2021.3075913

A Compressed Sensing Approach to Pooled RT-PCR Testing for COVID-19 Detection

Sabyasachi Ghosh 1, Rishi Agarwal 1, Mohammad Ali Rehan 1, Shreya Pathak 1, Pratyush Agarwal 1, Yash Gupta 1, Sarthak Consul 2, Nimay Gupta 1, Ritika 1, Ritesh Goenka 1, Ajit Rajwade 1,, Manoj Gopalkrishnan 2
PMCID: PMC8545028  PMID: 34812422

Abstract

We propose ‘Tapestry’, a single-round pooled testing method with application to COVID-19 testing using quantitative Reverse Transcription Polymerase Chain Reaction (RT-PCR) that can result in shorter testing time and conservation of reagents and testing kits, at clinically acceptable false positive or false negative rates. Tapestry combines ideas from compressed sensing and combinatorial group testing to create a new kind of algorithm that is very effective in deconvoluting pooled tests. Unlike Boolean group testing algorithms, the input is a quantitative readout from each test and the output is a list of viral loads for each sample relative to the pool with the highest viral load. For guaranteed recovery of $k$ infected samples out of $n$ being tested, Tapestry needs only $O(k \log n)$ tests with high probability, using random binary pooling matrices. However, we propose deterministic binary pooling matrices based on combinatorial design ideas of Kirkman Triple Systems, which balance between good reconstruction properties and matrix sparsity for ease of pooling while requiring fewer tests in practice. This enables large savings using Tapestry at low prevalence rates while maintaining viability at prevalence rates as high as 9.5%. Empirically we find that single-round Tapestry pooling improves over two-round Dorfman pooling by almost a factor of 2 in the number of tests required. We evaluate Tapestry in simulations with synthetic data obtained using a novel noise model for RT-PCR, and validate it in wet lab experiments with oligomers in quantitative RT-PCR assays. Lastly, we describe use-case scenarios for deployment.

Keywords: Compressed sensing, coronavirus, COVID-19, group testing, Kirkman/Steiner triples, mutual coherence, pooled testing, sensing matrix design

I. Introduction

The coronavirus disease of 2019 (COVID-19) crisis has led to widespread lockdowns in several countries, and has had a major negative impact on the economy. Early identification of infected individuals can enable quarantining of the individuals and thus control the spread of the disease. Such individuals may often be asymptomatic for many days. Widespread testing with the RT-PCR (reverse transcription polymerase chain reaction) method can help identify the infected individuals. However, widespread testing is not an available option in many countries due to constraints on resources such as testing time (about 3-4 hours per round), basic equipment, skilled manpower and reagents.

The current low rate of COVID-19 infection in the world population [1] means that most samples tested are not infected, so that most tests are wasted on uninfected samples. Group testing is a process of pooling together samples of $n$ different people into multiple pools, and testing the pools instead of each individual sample. A negative result on a pool implies that all samples participating in it were negative. This saves a large amount of testing resources, especially at low infection rates. Group testing for medical applications has a long history dating back to the 1940s, when it was proposed for testing of blood samples for syphilis [2]. Simple two-round group testing schemes have already been applied in the field by several research labs [3], [4] for COVID-19 testing. Such two-round group testing schemes require pooling of samples and a second round of sample handling for all samples in positive pools. This second round of sample handling can increase the time to result and is laborious to perform, since it requires the technician to wear PPE one more time and do another round of RNA extraction and PCR. In situations where the result needs to be delivered fast, a second round of sample handling and testing must be avoided. In such situations, these schemes are less attractive.

We present Tapestry, a novel combination of ideas from combinatorial group testing and compressed sensing (CS) [5] which uses the quantitative output of PCR tests to reconstruct the viral load of each sample in a single round. Tapestry has been validated with wet lab experiments with oligomers [6]. In this work, we elaborate on the results from the algorithmic perspective for the computer science and signal processing communities. Tapestry has a number of salient features which we enumerate below.

  • 1)

    Tapestry delivers results in a single round of testing, without the need for a second confirmatory round, at clinically acceptable false negative and false positive rates. The number $m$ of required tests is only $O(k \log n)$ for random binary pooling matrix constructions, as per compressed sensing theory for random binary matrices [7]. In the targeted use cases where the number of infected samples $k \ll n$, we see that $m \ll n$. However, our deterministic pooling matrix constructions based on Kirkman Triple Systems [8], [9] require fewer tests in practice (see Section III-F8 for a discussion on why this may be the case). Consequently we obtain significant savings in testing time and resources such as number of tests, quantity of reagents, and manpower.

  • 2)

    Tapestry reconstructs relative viral loads, i.e., the ratio of the viral amount in each sample to the highest viral amount across pools. It is believed that super-spreaders and people with severe symptoms have higher viral load [10], [11], so this quantitative information might have epidemiological relevance.

  • 3)

    Tapestry takes advantage of quantitative information in PCR tests. Hence it returns far fewer false positives than traditional binary group testing algorithms such as Comp (Combinatorial Orthogonal Matching Pursuit) [12], while maintaining clinically acceptable false negative rates. Furthermore, it takes advantage of the fact that a negative pool has viral load exactly zero. Traditional CS algorithms do not take advantage of this information. Hence, Tapestry demonstrates better sensitivity and specificity than CS algorithms.

  • 4)

    Because each sample is tested in three pools, Tapestry can detect some degree of noise in terms of cross-contamination of samples and pipetting errors.

  • 5)

    Tapestry allows PCR test measurements to be noisy. We develop a novel noise model to describe noise in PCR experiments. Our algorithms are tested on this noise model in simulation.

  • 6)

    All tuning parameters for execution of the algorithms are inferred on the fly in a data driven fashion.

  • 7)

    Each sample contributes to exactly three pools, and each pool has the same number of samples. This simplifies the experimental design, conserves samples, keeps pipetting overhead to a minimum, and makes sure that dilution due to pool size is in a manageable regime.

The organization of the paper is as follows. We first present a brief overview of the RT-PCR method in Section II. The precise mathematical definition of the computational problem being solved in this paper is then put forth in Section III-A. We describe traditional and CS-based group-testing algorithms for this problem in Sections III-B, III-C and III-D. The Tapestry method is described in Section III-D. The sensing matrix design problem, as well as theoretical guarantees using Kirkman Triple Systems or random binary matrices, are described in Section III-F. Results on synthetic data are presented in Section IV. This is followed by results on data from lab experiments performed with oligomers to mimic the clinical situation as closely as possible. In Section V, we compare our work to recent related approaches. We conclude in Section VI with a glance through different scenarios where our work could be deployed. The supplemental material contains several additional experimental details as well as proofs of some theoretical results.

II. RT-PCR Method

We present here a brief summary of the RT-PCR process, referring to [13] for more details. In the RT-PCR method for COVID-19 testing, a sample in the form of naso- or oro-pharyngeal swabs is collected from a patient. The sample is then dispersed into a liquid medium. The RNA molecules of the virus present in this liquid medium are converted into complementary DNA (cDNA) via a process called reverse transcription. DNA fragments called primers complementary to cDNA from the viral genome are then added. They attach themselves to specific sections of the cDNA from the viral genome if the virus is present in the sample. The cDNA of these specific viral genes then undergoes a process of exponential amplification in an RT-PCR machine. Here, cDNA is put through several cycles of alternate heating and cooling in the presence of Taq polymerase and appropriate reagents. This triggers the creation of many new identical copies of specific portions of the target DNA, roughly doubling in number with every cycle of heating and cooling. The reaction volume contains sequence-specific fluorescent markers which report on the total amount of amplified DNA of the appropriate sequence. The resulting fluorescence is measured, and the increase can be observed on a computer screen in real time. The cycle at which the amount of fluorescence exceeds the threshold level is known as the threshold cycle $C_t$, and is a quantitative readout from the experiment. A smaller $C_t$ indicates a greater number of copies of the virus. Usually $C_t$ takes values anywhere between 16 and 32 cycles in real experiments. PCR can detect even single molecules: a single molecule typically has a $C_t$ value of around 40 cycles. A typical RT-PCR setup can test 96 samples in parallel. The test takes about 3-4 hours to execute.

III. Testing Methods

A. Statement of the Computational Problem

Let $x = (x_1, x_2, \ldots, x_n)$ denote a vector of $n$ elements, where $x_j$ is the viral load (i.e. viral amount) of the $j^{th}$ person. Throughout this paper we assume that only one sample per person is extracted. Hence $x$ contains the viral loads corresponding to $n$ different people. Note that $x_j = 0$ implies that the $j^{th}$ person is not infected. Due to the low infection rate for COVID-19 as yet even in severely affected countries [1], $x$ is considered to be a sparse vector with at the most $k$ positive-valued elements. In group testing, small and equal volumes of the samples of a subset of these $n$ people are pooled together according to a sensing or pooling matrix $A$ whose entries are either 0 or 1. The viral loads of the pools will be given by:

$y_i = \sum_{j=1}^{n} A_{ij}\, x_j = A^i x, \qquad (1)$

where $A_{ij} = 1$ if a portion of the sample of the $j^{th}$ person is included in the $i^{th}$ pool, and $A^i$ is the $i^{th}$ row of $A$. In all, some $m$ pools are created and individually tested using RT-PCR. We now have the relationship $y = Ax$, where $y$ is the $m$-element vector of viral loads in the mixtures, and $A$ denotes an $m \times n$ binary ‘pooling matrix’ (also referred to as a ‘sensing matrix’ in CS literature). Note that each positive RT-PCR test will yield a noisy version of $y_i$, which we refer to as $\hat{y}_i$. The relation between the ‘clean’ and noisy versions is given as follows (also see Eqn. (7)):

$\hat{y}_i = (1+q)^{e_i}\, y_i = (1+q)^{e_i}\, A^i x, \qquad (2)$

where $e_i \sim \mathcal{N}(0, \sigma^2)$ and $q$ is the fraction of viral cDNA that replicates in each cycle. The factor $(1+q)^{e_i}$ reflects the stochasticity in the growth of the numbers of DNA molecules during PCR. Here $q$ is known and constant. Equivalently for positive tests, we have:

$\log_{1+q} \hat{y}_i = \log_{1+q} y_i + e_i. \qquad (3)$

In case of negative tests, $y_i$ as well as $\hat{y}_i$ are 0-valued, and no logarithms need be computed. In non-adaptive group testing, the core computational problem is to estimate $x$ given $\hat{y}$ and $A$, without requiring any further pooled measurements. It should be noted that though we have treated each element of $x$ to be a fixed quantity, it is in reality a random variable of the form $\mathrm{Poisson}(\lambda_j)$, where $\lambda_j$ denotes its mean. If matrix $A$ contains only ones and zeros, this implies that each $y_i$ is also Poisson distributed, because the sum of independent Poisson random variables is also a Poisson random variable.

1). Derivation of Noise Model

For a positive pool $i$, the quantitative readout from RT-PCR is not its viral load but the observed cycle time $\hat{t}_i$ at which its fluorescence reaches a given threshold $F$ (see Section II). In order to be able to apply CS techniques (see Section III-C), we derive a relationship between the cycle time of a sample and its viral load. Because of exponential growth (see [14]), the number of molecules of viral cDNA in pool $i$ at cycle time $t$, denoted by $v_i(t)$, is given by:

$v_i(t) = v_i(0)\,(1+q)^{t}, \qquad (4)$

where $v_i(0) = y_i$ is the viral load of the pool. Also, $t$ is a real number, with $\lfloor t \rfloor$ indicating the number of PCR cycles that have passed, and $t - \lfloor t \rfloor$ indicating the fraction of wall-clock time within the current cycle. The fluorescence of the pool, $F_i(t)$, is directly proportional to the number of virus molecules $v_i(t)$. That is,

$F_i(t) = c\, v_i(t) = c\, y_i\,(1+q)^{t}, \qquad (5)$

where $c$ is a constant of proportionality. Suppose the fluorescence of pool $i$ should reach the threshold value $F$ at cycle time $t_i$, according to Eqn. (5). Due to the stochastic nature of the reaction, as well as measurement error in the PCR machine, the threshold cycle output by the machine will not reflect this true cycle time. We model this discrepancy as Gaussian noise. Hence, the true cycle time $t_i$ and the observed cycle time $\hat{t}_i$ are related as $\hat{t}_i = t_i - e_i$, where $e_i \sim \mathcal{N}(0, \sigma^2)$ as before. Now, since $F = c\, y_i\,(1+q)^{t_i}$, using Eqn. (5), we have

$y_i = \frac{F}{c}\,(1+q)^{-t_i} = \frac{F}{c}\,(1+q)^{-\hat{t}_i}\,(1+q)^{-e_i} = \hat{y}_i\,(1+q)^{-e_i}. \qquad (6)$

The latter equality holds because we use the noisy cycle threshold $\hat{t}_i$ to compute viral load, i.e., $\hat{y}_i := \frac{F}{c}\,(1+q)^{-\hat{t}_i}$ is defined to be the noisy viral load of pool $i$. Hence we find

$\hat{y}_i = (1+q)^{e_i}\, y_i, \qquad (7)$

obtaining the relationship from Eqn. (2).

Constants $F$ and $c$ are unknown. Hence it is not possible to directly obtain $\hat{y}_i$ from $\hat{t}_i$ without additional machine-specific calibration. However, we can find the ratio between the noisy viral loads of two pools using Eqn. (6). Let $\hat{y}_{\min}$ be the noisy viral load of the pool with the minimum observed threshold cycle ($\hat{t}_{\min}$) among all pools. Then we define relative viral loads as:

$z_i := \frac{y_i}{\hat{y}_{\min}}, \qquad \hat{z}_i := \frac{\hat{y}_i}{\hat{y}_{\min}} = (1+q)^{\hat{t}_{\min} - \hat{t}_i}, \qquad \tilde{x} := \frac{x}{\hat{y}_{\min}}, \qquad (8)$

where $z_i$ is the relative viral load of a pool, $\hat{z}_i$ is its noisy version, and $\tilde{x}$ is the vector of relative viral loads of each sample. We note that due to Eqn. (7), the following relation holds:

$\hat{z}_i = (1+q)^{e_i}\, z_i = (1+q)^{e_i}\, A^i \tilde{x}. \qquad (9)$

Hence we can apply CS techniques from Section III-C to determine the relative magnitudes of viral loads without knowing $F$ and $c$. We provide more comments about the settings of various noise model parameters for our experiments in Section IV, particularly in Section IV-A6.
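To make the noise model concrete, the following is a minimal Python sketch (our illustration, not code from the paper) of how noisy threshold cycles and relative viral loads can be simulated for a given pooling matrix; the values $q = 0.95$ and $\sigma = 0.1$ are illustrative, and the unknown constants $F$ and $c$ are dropped since they cancel in the relative loads.

```python
import numpy as np

def simulate_pooled_ct(A, x, q=0.95, sigma=0.1, rng=None):
    """Simulate observed Ct values and relative viral loads for pools y = Ax,
    following the multiplicative noise model above (illustrative parameters)."""
    rng = np.random.default_rng() if rng is None else rng
    y = A @ x                                     # clean pool viral loads, Eqn. (1)
    positive = y > 0
    # True crossing time t_i satisfies (F/c)(1+q)^{-t_i} = y_i; the additive
    # constant log_{1+q}(F/c) cancels in relative loads, so it is dropped here.
    t_true = np.full(len(y), np.inf)
    t_true[positive] = -np.log(y[positive]) / np.log(1.0 + q)
    e = rng.normal(0.0, sigma, size=len(y))
    t_obs = t_true - e                            # observed threshold cycles
    # Noisy viral load implied by the observed Ct: (1+q)^{-t_obs} = y_i (1+q)^{e_i}
    y_noisy = np.where(positive, (1.0 + q) ** (-t_obs), 0.0)
    # Relative viral loads, Eqn. (8): normalize by the pool with the smallest Ct
    z_noisy = y_noisy / y_noisy.max() if positive.any() else y_noisy
    return t_obs, z_noisy
```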

B. Combinatorial Group-Testing

Combinatorial Orthogonal Matching Pursuit (Comp) is a Boolean nonadaptive group testing method [15, Sec. 2.3]. Here one uses the simple idea that if a mixture $i$ tests negative, then any sample $j$ for which $A_{ij} = 1$ must be negative. Note that pools which test negative are regarded as noiseless observations, as argued in Section III-A1. The other samples are all considered to be positive. This algorithm guarantees that there are no ‘false negatives’. However it can produce a very large number of ‘false positives’. For example, a sample $j$ will be falsely reported to be positive if every mixture it is part of also contains at least one other genuinely positive sample. The Comp algorithm is largely insensitive to noise. Moreover, a small variant of it can also produce a list of ‘high confidence positives,’ after identifying the (sure) negatives. This happens when a positive mixture contains only one sample $j$, not counting the other samples which were declared sure negatives in the earlier step. Such a step of identifying ‘high confidence positives’ is included in the so-called Definite Defectives (Dd) Algorithm [15, Sec. 2.4]. However, Dd labels all remaining items as negative, potentially leading to a large number of false negatives. The performance guarantees for Comp have been analyzed in [12] and show that Comp requires $O(k \log n)$ tests for a small error probability (see Section III-F8). This analysis has been extended to include the case of noisy test results as well [12]. However, Comp can result in a large number of false positives if not enough tests are used, and it also does not predict viral loads.
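As an illustration (our own sketch, not the implementation used in the paper), Comp and the Definite Defectives refinement can be written in a few lines, given the binary pooling matrix $A$ and the vector of pool readouts (with 0 denoting a negative pool):

```python
import numpy as np

def comp_decode(A, y):
    """COMP: a sample appearing in any negative pool is a sure negative;
    all remaining samples are tentatively declared positive.  Also returns
    the 'high-confidence positives' used by the DD step."""
    negative_pools = (y == 0)
    sure_negative = A[negative_pools].sum(axis=0) > 0
    tentative_positive = ~sure_negative
    # DD step: a positive pool whose only surviving contributor is a single
    # sample pins that sample down as a sure (high-confidence) positive.
    high_conf = np.zeros(A.shape[1], dtype=bool)
    for i in np.where(~negative_pools)[0]:
        members = np.where((A[i] > 0) & tentative_positive)[0]
        if len(members) == 1:
            high_conf[members[0]] = True
    return tentative_positive, high_conf
```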

C. Compressed Sensing for Pooled Testing

Group testing is intimately related to the field of compressed sensing (CS) [16], which has emerged as a significant sub-area of signal and image processing [5], with many applications in biomedical engineering [17]–[19]. In CS, an image or a signal $x$ with $n$ elements is directly acquired in compressed format via $m$ linear measurements of the form $y = Ax + \eta$. Here, the measurement vector $y$ has $m$ elements, $A$ is a matrix of size $m \times n$, and $\eta$ is a vector of noise values. If $x$ is a sparse vector with $k \ll n$ non-zero entries, and $A$ obeys the so-called restricted isometry property (RIP), then exact recovery of $x$ from $y$ is possible [20] in the noiseless case. In the case of measurement noise, the recovery of $x$ produces a solution that is provably close to the original $x$. A typical recovery problem P0 consists of optimizing the following cost function:

$\text{(P0):} \quad \min_{x} \|x\|_0 \ \text{ such that } \ \|y - Ax\|_2 \leq \epsilon, \qquad (10)$

where $\epsilon$ is an upper bound (possibly a high probability upper bound) on $\|\eta\|_2$, and $\|x\|_0$ is the number of non-zero elements in $x$. In the absence of noise, a unique and exact solution to this problem is possible with as few as $2k$ measurements in $y$ if $x$ has $k$ non-zero elements [20]. Unfortunately, this optimization problem P0 is NP-Hard and the algorithm requires brute-force subset enumeration. Instead, the following problem P1 (often termed ‘Basis Pursuit Denoising’ or Bpdn) is solved in practice:

$\text{(P1):} \quad \min_{x} \|x\|_1 \ \text{ such that } \ \|y - Ax\|_2 \leq \epsilon. \qquad (11)$

P1 is a convex optimization problem which yields the same solution as the earlier problem (under similar conditions on $A$) at significantly lower computational cost, albeit with $O(k \log(n/k))$ measurements (i.e. typically greater than $2k$) [5], [20].

The order Inline graphic restricted isometry constant (RIC) of a matrix Inline graphic is defined as the smallest constant Inline graphic, for which the following relationship holds for all Inline graphic-sparse vectors Inline graphic (i.e. all vectors with at the most Inline graphic non-zero entries): Inline graphic. The matrix Inline graphic is said to obey the order Inline graphic restricted isometry property (RIP) if Inline graphic is close to 0. This property essentially implies that no Inline graphic-sparse vector (other than the zero vector) can lie in the null-space of Inline graphic. Unique recovery of Inline graphic-sparse signals requires that no Inline graphic-sparse vector lies in the nullspace of Inline graphic [20]. A matrix Inline graphic which obeys RIP of order Inline graphic satisfies this property. It has been proved that matrices with entries randomly and independently drawn from distributions such as Rademacher or Gaussian, obey the RIP of order Inline graphic with high probability [21], provided they have at least Inline graphic rows. There also exist deterministic binary sensing matrix designs (e.g. [22]) which require Inline graphic measurements. However it has been shown recently [23] that the constant factors in the deterministic case are significantly smaller than those in the former random case when Inline graphic, making the deterministic designs more practical for typically encountered problem sizes. The solution to the optimization problems P0 and P1 in Eqns. (10) and (11) respectively, are provably robust to noise [5], and the recovery error decreases with decrease in noise magnitude. The error bounds for P0 in Eqn. (10) are of the form, for solution Inline graphic [24]:

C.

whereas those for P1 in Eqn. (11) have the form [24]:

C.

Here Inline graphic is a monotonically increasing function of Inline graphic and has a small value in practice.

The Restricted Isometry Property as defined above is also known as RIP-2, because it uses the Inline graphic-norm. Many other sufficient conditions for recovery of Inline graphic-sparse vectors exist. We define the following which we use later in Section III-F and supplemental Section S.V to prove theoretical guarantees of our method.

Definition 1: —

RIP-1: [25, Defn. 8] A Inline graphic matrix Inline graphic is said to obey RIP-1 of order Inline graphic if Inline graphic Inline graphic such that for all Inline graphic-sparse vectors Inline graphic,

graphic file with name M171.gif

.

Definition 2: —

RNSP: [23, Eqn. 12] A Inline graphic matrix Inline graphic is said to obey the Robust Nullspace Property (RNSP) of order Inline graphic if Inline graphic Inline graphic and Inline graphic such that for all Inline graphic it holds that

graphic file with name M179.gif

for all Inline graphic with Inline graphic.

Definition 3: —

Inline graphic-RNSP: [7, Defn. 1] A Inline graphic matrix Inline graphic is said to obey the Inline graphic-robust Nullspace Property (Inline graphic-RNSP) of order Inline graphic if Inline graphic Inline graphic and Inline graphic such that for all Inline graphic it holds that

graphic file with name M192.gif

for all Inline graphic with Inline graphic.

Over the years, a variety of different techniques for compressive recovery have been proposed. We use some of these for our experiments in Section III-D. These algorithms use different forms of sparsity and incorporate different types of constraints on the solution.

D. CS and Traditional GT Combined

Algorithm 1: Tapestry Method.

  • 1: Input: $n$ samples, $m \times n$ pooling matrix $A$

  • 2: Perform pooling according to pooling matrix $A$ and create $m$ pooled samples

  • 3: Run RT-PCR tests on these $m$ pooled samples and receive the $m \times 1$ vector of cycle threshold values $\hat{t}$

  • 4: Compute the $m \times 1$ vector of relative viral loads $\hat{z}$ from $\hat{t}$

  • 5: Use Comp to filter out negative tests and sure negative samples. Compute the submatrix $A'$, the reduced measurement vector $\hat{z}'$, and the list $HCP$ of ‘high-confidence positives’ along with their viral loads (see Section III-B).

  • 6: Use a CS decoder to recover relative viral loads $\hat{x}'$ from $\hat{z}'$ and $A'$

  • 7: Compute the $n \times 1$ relative viral load vector $\hat{x}$ by setting its entries from $\hat{x}'$, and setting the remaining entries to 0.

  • 8: return $\hat{x}$, $HCP$.

The complete pipeline of the Tapestry method is presented in Algorithm 1. First, a wet lab technician performs pooling of $n$ samples into $m$ pools according to an $m \times n$ pooling matrix $A$. Then they run the RT-PCR test on these $m$ pools (in parallel). The output of the RT-PCR tests – the threshold cycle ($C_t$) values of each pool – is processed to find the relative viral load vector $\hat{z}$ of the $m$ pools (as shown in Eqn. (8)). This is given as input to the Tapestry decoding algorithm, which outputs a sparse relative viral load vector $\hat{x}$.

The Tapestry decoding algorithm, our approach toward group-testing for COVID-19, involves a two-stage procedure.1 In the first stage, we apply the Comp algorithm described in Section III-B to identify the sure negatives (if any) among the samples, forming a set $\mathcal{N}$. Let $\mathcal{Z}$ be the set of zero-valued measurements in $\hat{z}$ (i.e. negative tests). Please refer to Section III-A1 for the definition of $\hat{z}$. Moreover, we define $\mathcal{N}^c, \mathcal{Z}^c$ as the complement-sets of $\mathcal{N}, \mathcal{Z}$ respectively. Also, let $\hat{z}_{\mathcal{Z}^c}$ be the vector of $|\mathcal{Z}^c|$ measurements which yielded a positive result. Let $x_{\mathcal{N}^c}$ be the vector of $|\mathcal{N}^c|$ samples, which does not include the $|\mathcal{N}|$ surely negative samples. Let $A'$ be the submatrix of $A$, having size $|\mathcal{Z}^c| \times |\mathcal{N}^c|$, which excludes rows corresponding to zero-valued measurements in $\mathcal{Z}$ and columns corresponding to negative elements in $\mathcal{N}$. In the second stage, we apply a CS algorithm to recover $x_{\mathcal{N}^c}$ from $\hat{z}_{\mathcal{Z}^c}$ and $A'$. To avoid symbol clutter, we henceforth just stick to the notation $A, \hat{z}, x$, even though they respectively refer to $A', \hat{z}_{\mathcal{Z}^c}, x_{\mathcal{N}^c}$.
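A compact sketch of this two-stage decoder is given below (our illustration; `cs_decoder` stands for any of the CS solvers discussed later in this section, and the handling of high-confidence positives from Dd is omitted):

```python
import numpy as np

def tapestry_decode(A, z_noisy, cs_decoder):
    """COMP filtering followed by a CS solve on the reduced problem.
    A: m x n binary pooling matrix; z_noisy: m relative pool loads (0 = negative);
    cs_decoder(A_red, z_red) -> nonnegative estimate for the surviving samples."""
    pos_pools = z_noisy > 0
    # Samples that touch no negative pool survive the COMP stage.
    surviving = (A[~pos_pools].sum(axis=0) == 0)
    x_hat = np.zeros(A.shape[1])
    if surviving.any() and pos_pools.any():
        A_red = A[np.ix_(pos_pools, surviving)]
        x_hat[surviving] = cs_decoder(A_red, z_noisy[pos_pools])
    return x_hat
```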

Note that the CS stage following Comp is very important for the following reasons:

  • 1)

    Comp typically produces a large number of false positives. The CS algorithms help reduce the number of false positives as we shall see in later sections.

  • 2)

    Comp does not estimate viral loads, unlike CS algorithms.

  • 3)

    In fact, unlike CS algorithms, Comp treats the measurements in Inline graphic as also being binary, thus discarding a lot of useful information.

  • 4)

    Comp preserves the RIP-1, RIP-2, RNSP, and Inline graphic-RNSP of the pooling matrix, i.e. if Inline graphic obeys any of RIP-1, RIP-2, RNSP or Inline graphic-RNSP of order Inline graphic, then Inline graphic also obeys the same property of the same order Inline graphic with the same parameters. We formalize and prove these claims in the supplemental Section S.V.

However, the Comp algorithm prior to applying the CS algorithm is also very important for the following reasons:

  • 1)

    Viral load in negative pools is exactly 0. Comp identifies the sure negatives in Inline graphic from the negative measurements in Inline graphic. Traditional CS algorithms do not take advantage of this information, since they assume all tests to be noisy (Eqns. (10) and (11)). It is instead easier to discard the obvious negatives before applying the CS step.

  • 2)

    Since Comp identifies the sure negatives, therefore, it effectively reduces the size of the problem to be solved by the CS step from Inline graphic to Inline graphic.

  • 3)

    In a few cases, a (positive) pool in Inline graphic may contain only one contributing sample in Inline graphic, after negatives have been eliminated by Comp. Such a sample is called a ‘high-confidence positive,’ and we denote the list of high-confidence positives as Inline graphic. In rare cases, the CS decoding algorithms we employed (see further in this section) did not recognize such a positive. However, such samples will still be returned by our algorithm as positives, in the set Inline graphic (see last step of Alg. 1, and ‘definite defectives’ in Section III-B).

For CS recovery, we employ one of the following algorithms after Comp: the non-negative LASSO (Nnlasso), non-negative orthogonal matching pursuit (Nnomp), Sparse Bayesian Learning (Sbl), and non-negative absolute deviation regression (Nnlad). For problems of small size, we also apply a brute force (Bf) search algorithm to solve a problem similar to P0 from Eqn. (10) combinatorially.

1). The Non-Negative LASSO (Nnlasso)

The LASSO (least absolute shrinkage and selection operator) is a penalized version of the constrained problem P1 in Eqn. (11), and seeks to minimize the following cost function:

$J(x) := \|y - Ax\|_2^2 + \lambda \|x\|_1.$

Here $\lambda$ is a regularization parameter which imposes sparsity in $x$. The LASSO has rigorous theoretical guarantees [26] (chapter 11) for recovery of $x$ as well as recovery of the support of $x$ (i.e. recovery of the set of non-zero indices of $x$). Given the non-negative nature of $x$, we implement a variant of LASSO with a non-negativity constraint, leading to the following optimization problem:

$\min_{x \geq 0} \|y - Ax\|_2^2 + \lambda \|x\|_1.$

Selection of $\lambda$: There are criteria defined in [26] for selection of $\lambda$ under iid Gaussian noise, so as to guarantee statistical consistency. However, in practice, cross-validation (CV) can be used for optimal choice of $\lambda$ in a purely data-driven fashion from the available measurements. The details of this are provided in the supplemental Section S.III.
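As an example, the non-negative LASSO step can be prototyped with scikit-learn's `Lasso` solver (a sketch under our own choice of $\lambda$; note that scikit-learn scales the data-fidelity term by $1/(2m)$, so its `alpha` is a rescaled version of $\lambda$):

```python
import numpy as np
from sklearn.linear_model import Lasso

def nnlasso_decode(A_red, z_red, lam=0.01):
    """Non-negative LASSO via coordinate descent; `lam` is illustrative and
    would be chosen by cross-validation in practice (supplemental Sec. S.III)."""
    model = Lasso(alpha=lam, positive=True, fit_intercept=False, max_iter=10000)
    model.fit(A_red, z_red)
    return np.maximum(model.coef_, 0.0)  # coef_ is already nonnegative; clip defensively
```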

2). Non-Negative Orthogonal Matching Pursuit (Nnomp)

Orthogonal Matching Pursuit (OMP) [27] is a greedy approximation algorithm to solve the optimization problem in Eqn. (10). Rigorous theoretical guarantees for OMP have been established in [28]. OMP proceeds by maintaining a set $\mathcal{S}$ of ‘selected coefficients’ in $x$ corresponding to columns of $A$. In each round, a column of $A$ is picked greedily, based on the criterion of maximum absolute correlation with a residual vector $r := y - Ax$. Each time a column is picked, all the coefficients extracted so far (i.e. in set $\mathcal{S}$) are updated. This is done by computing the orthogonal projection of $y$ onto the subspace spanned by the columns in $\mathcal{S}$. The OMP algorithm can be quite expensive computationally. Moreover, in order to maintain non-negativity of $x$, the orthogonal projection step requires the solution of a non-negative least squares problem, further adding to the computational cost. However, a fast implementation of a non-negative version of OMP (Nnomp) has been developed in [29], which is the implementation we adopt here. For the choice of $\epsilon$ in Eqn. (10), we can use CV as described in Section III-D1.
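A simple (unoptimized) version of Nnomp can be written with a non-negative least squares projection at every step; the stopping criteria below are illustrative, and the fast implementation of [29] used in the paper is organized differently:

```python
import numpy as np
from scipy.optimize import nnls

def nnomp_decode(A_red, z_red, max_nonzeros=10, tol=1e-6):
    """Greedy NNOMP sketch: pick the column most positively correlated with the
    residual, then refit all selected coefficients by non-negative least squares."""
    m, n = A_red.shape
    support, x_hat = [], np.zeros(n)
    residual = z_red.copy()
    for _ in range(min(max_nonzeros, m)):
        correlations = A_red.T @ residual
        j = int(np.argmax(correlations))
        if correlations[j] <= 0 or np.linalg.norm(residual) <= tol or j in support:
            break
        support.append(j)
        coeffs, _ = nnls(A_red[:, support], z_red)   # non-negative projection
        x_hat[:] = 0.0
        x_hat[support] = coeffs
        residual = z_red - A_red @ x_hat
    return x_hat
```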

3). Sparse Bayesian Learning (Sbl)

Sparse Bayesian Learning (Sbl) [30], [31] is a non-convex optimization algorithm based on Expectation-Maximization (EM) that has empirically shown superior reconstruction performance to most other CS algorithms with manageable computation cost [32]. In Sbl, we consider the case of Gaussian noise in $y$ and a Gaussian prior on elements of $x$, leading to:

$y = Ax + \eta, \quad \eta \sim \mathcal{N}(0, \sigma^2 I), \quad x_j \sim \mathcal{N}(0, \gamma_j) \ \forall j.$

Since both $x$ and $\gamma$ (the vector of the $\gamma_j$ values) are unknown, the optimization for these quantities can be performed using an EM algorithm. In the following, we shall denote $\Gamma := \mathrm{diag}(\gamma)$. Moreover, we shall use the notation $\gamma^{(i)}$ for the estimate of $\gamma$ in the $i^{th}$ iteration. The E-step of the EM algorithm here involves computing $\mathbb{E}[x \,|\, y; \gamma^{(i)}, \sigma^2]$. It is to be noted that the posterior distribution $p(x \,|\, y; \gamma^{(i)}, \sigma^2)$ has the form $\mathcal{N}(\mu, \Sigma)$ where $\mu = \sigma^{-2}\Sigma A^\top y$ and $\Sigma = (\sigma^{-2}A^\top A + \Gamma^{-1})^{-1}$. The M-step involves maximization of the expected complete-data log-likelihood, leading to the update $\gamma_j^{(i+1)} = \mu_j^2 + \Sigma_{jj}$. The E-step and M-step are executed alternately until convergence. Convergence to a fixed point is guaranteed, though the fixed point may or may not be a local minimum. However, all local minima are guaranteed to produce sparse solutions for $x$ (even in the presence of noise) because most of the $\gamma_j$ values shrink towards 0. The Sbl procedure can also be modified to dynamically update the noise variance $\sigma^2$ (as followed in this paper), if it is unknown. All these results can be found in [31]. Unlike Nnlasso or Nnomp, the Sbl algorithm from [31] expressly requires Gaussian noise. However we use it as is in this paper for the simplicity it affords. Unlike Nnomp or Nnlasso, there is no explicit non-negativity constraint imposed in the basic Sbl algorithm. In our implementation, the non-negativity is simply imposed at the end of the optimization by setting to 0 any negative-valued elements in the estimate, though more principled, albeit more computationally heavy, approaches such as [33] can be adopted.
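The EM updates described above are easy to prototype; the sketch below keeps the noise variance fixed for brevity (whereas the paper updates it within EM) and clips negative entries at the end, as mentioned:

```python
import numpy as np

def sbl_decode(A_red, z_red, n_iters=100, sigma2=1e-2):
    """Sparse Bayesian Learning sketch: alternate the Gaussian posterior (E-step)
    with the variance updates gamma_j = mu_j^2 + Sigma_jj (M-step)."""
    m, n = A_red.shape
    gamma = np.ones(n)                       # per-coefficient prior variances
    mu = np.zeros(n)
    for _ in range(n_iters):
        Gamma_inv = np.diag(1.0 / np.maximum(gamma, 1e-12))
        Sigma = np.linalg.inv(A_red.T @ A_red / sigma2 + Gamma_inv)
        mu = Sigma @ A_red.T @ z_red / sigma2
        gamma = mu ** 2 + np.diag(Sigma)     # most gammas shrink towards zero
    return np.maximum(mu, 0.0)
```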

4). Non-Negative Absolute Deviation Regression (Nnlad)

The Non-Negative Absolute Deviation Regression (Nnlad) [34] and Non-Negative Least Squares (Nnls) [7] estimators seek to respectively minimize

$\min_{x \geq 0} \|y - Ax\|_1 \qquad \text{and} \qquad \min_{x \geq 0} \|y - Ax\|_2^2.$

It has been shown in [34] that Nnlad is sparsity promoting for certain conditions on the sensing matrix $A$, and that its minimizer $\hat{x}$ obeys bounds of the form $\|\hat{x} - x\|_1 \leq C\,\|\eta\|_1$, where $C$ is a constant independent of $x$. A salient feature of Nnlad/Nnls is that they do not require any parameter tuning. This property makes them useful for matrices of smaller size where cross-validation may be unreliable.
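Since the Nnlad objective is piecewise linear, it can be solved as a small linear program; the sketch below (our illustration) uses `scipy.optimize.linprog` over the stacked variables $[x, t]$ with $|z - Ax| \leq t$ elementwise:

```python
import numpy as np
from scipy.optimize import linprog

def nnlad_decode(A_red, z_red):
    """NNLAD sketch: minimize ||z - Ax||_1 subject to x >= 0, posed as an LP.
    No tuning parameter is required."""
    m, n = A_red.shape
    c = np.concatenate([np.zeros(n), np.ones(m)])           # minimize sum of t
    A_ub = np.block([[A_red, -np.eye(m)], [-A_red, -np.eye(m)]])
    b_ub = np.concatenate([z_red, -z_red])                   # encodes |z - Ax| <= t
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (n + m))
    return res.x[:n]
```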

E. Generalized Binary Search Techniques

There exist adaptive group testing techniques which can determine Inline graphic infected samples in Inline graphic tests via repeated binary search. These techniques are impractical in our setting due to their sequential nature and large pool sizes. We provide details of these techniques in the supplemental Section S.II. We also compare with a two-stage approach called Dorfman's method [2] in Section IV-A7.

F. Sensing Matrix Design

1). Physical Requirements of the Sensing Matrix

The sensing matrix $A$ must obey some properties specific to this application, such as being non-negative. For ease and speed of pipetting, it is desirable that the entries of $A$ be (1) binary (where $A_{ij} = 0$ indicates that sample $j$ did not contribute to pool $i$, and $A_{ij} = 1$ indicates that a fixed volume of sample $j$ was pipetted into pool $i$), and (2) sparse. Sparsity ensures that not too many samples contribute to a pool, and that a single sample does not contribute to too many pools. The former is important because typically the volume of sample that is added in a PCR reaction is fixed. Increasing pool size means each sample contributes a smaller fraction of that volume. This leads to dilution, which manifests as a shift of the $C_t$ value towards larger numbers. If care is not taken in this regard, this can affect the power of PCR to discriminate between positive and negative samples. The latter is important because contribution of one sample to a large number of pools could lead to depletion of sample.

2). RIP-1 of Expander Graph Adjacency Matrices

The Restricted Isometry Property (RIP-2) of sensing matrices is a sufficient condition for good CS recovery as described in Section III-C. However the matrices which obey the aforementioned physical constraints are not guaranteed to obey RIP-2. Instead, we consider sensing matrices which are adjacency matrices of expander graphs. A left-regular bipartite graph Inline graphic with degree of each vertex in Inline graphic being Inline graphic, is said to be a Inline graphic-unbalanced expander graph for some integer Inline graphic and some real-valued Inline graphic, if for every subset Inline graphic with Inline graphic, we have Inline graphic. Here Inline graphic denotes the union set of neighbors of all nodes in Inline graphic. Intuitively a bipartite graph is an expander if every ‘not too large’ subset has a ‘large’ boundary. It can be proved that a randomly generated left-regular bipartite graph with Inline graphic, Inline graphic is an expander, with high probability [35], [36]. Moreover, it has been shown in [25, Thm. 1] that the scaled adjacency matrix Inline graphic of a Inline graphic-unbalanced expander graph obeys RIP-1 (Defn. 1) of order Inline graphic. Here columns of Inline graphic correspond to vertices in Inline graphic, and rows correspond to vertices in Inline graphic. That is, for any Inline graphic-sparse vector Inline graphic, the following relationship holds: Inline graphic for some absolute constant Inline graphic. This property again implies that the null-space of Inline graphic cannot contain vectors that are ‘too sparse’ (apart from the zero-vector). This summarizes the motivation behind the use of expanders in compressive recovery of sparse vectors, and also in group testing [25].

3). Matrices Derived From Kirkman Triple Systems

Although randomly generated left-regular bipartite graphs are expanders, we would need to verify whether a particular such graph is a good expander, which may take prohibitively long in practice [35]. In the application at hand, this can prove to be a critical limitation since matrices of various sizes may have to be served, depending on the number of samples arriving in that batch at the testing centre, and the number of tests available to be performed. Hence, we have chosen to employ deterministic procedures to design such matrices, based on objects from combinatorial design theory known as Kirkman triples (see [8], [9]).

We first recall Kirkman Triple Systems (an example of which is illustrated in Fig. 1), which are Steiner Triple Systems with an extra property. A Steiner Triple System consists of $n$ binary column vectors with $m$ elements each, such that each column has exactly three 1s, every pair of rows has dot product equal to 1, and every pair of columns has dot product at most 1 [37]. This means that each column of a Steiner Triple System corresponds to a triplet of rows (i.e. contains exactly three 1s), and every pair of rows occurs together in exactly one such triplet (i.e. for every pair of rows indexed by $i_1, i_2$, there exists exactly one column index $j$ for which $A_{i_1 j} = A_{i_2 j} = 1$). If the columns of a Steiner Triple System can be arranged such that the sum of every consecutive group of $m/3$ columns equals the all-ones vector, then the Steiner Triple System is said to be resolvable, and is known as a Kirkman Triple System [8]. That is, the set of columns of a Kirkman Triple System can be partitioned into $(m-1)/2$ disjoint groups, each consisting of $m/3$ columns, such that each row has exactly one 1 entry in a given such group of columns. Because of this property, we may choose any $\ell$ such groups of columns of a Kirkman Triple System to form an $m \times n$ matrix, $A$, with $n = \ell m/3$ and $\ell \leq (m-1)/2$, while keeping the number of 1 entries in each row the same. From here on, we refer to such matrices as Kirkman matrices. If $\ell = (m-1)/2$ (so that $n = m(m-1)/6$), then we refer to it as a full Kirkman matrix, else it is referred to as a partial Kirkman matrix. Note that in a partial Kirkman matrix, the dot product of any two rows may be at most 1, whereas in a full Kirkman matrix, it must be equal to 1.

FIGURE 1.

A full Kirkman matrix with 15 rows and 35 columns. Each cell denotes an entry of the matrix, with white cells denoting the location of a 0 entry and the greyed out cells indicating the location of a 1 entry. Each column has exactly 3 entries with value 1. Each row has 7 entries with value 1. There are 7 groups of columns, each consisting of 5 columns. Each row in a column group has exactly one 1 entry. Matrices of size $15 \times 20$, $15 \times 25$, $15 \times 30$ or $15 \times 35$ may be served by choosing the first 4, 5, 6, or 7 column groups, while keeping the number of 1 entries in each row equal.

Notice that $m \equiv 3 \pmod 6$ (i.e. $m = 6t+3$ for some non-negative integer $t$) for a Kirkman Triple System to exist, since $m-1$ must be divisible by 2, and $m$ must be divisible by 3. This, and the existence of Kirkman Triple Systems for all such $m$, have been proven in [9]. Explicit constructions of Kirkman Triple Systems for various $m$ exist [8]. Generalizations of Kirkman Triple Systems under the name of the Social Golfer Problem are an active area of research (see [38], [39]). The Social Golfer Problem asks if it is possible for $g \times p$ golfers to play in $g$ groups of $p$ players each for $w$ weeks, such that no two golfers play in the same group more than once [40, Sec. 1.1]. Kirkman Triple Systems with $m$ rows and $m(m-1)/6$ columns are a solution to the Social Golfer Problem for the case when $p = 3$, $g = m/3$ and $w = (m-1)/2$. Full or partial Kirkman matrices may be constructed via greedy search techniques used for solving the Social Golfer Problem (such as in [41]). Previously, Kirkman matrices have been proposed for use as Low-Density Parity Check codes in [42], due to the high girth2 of Kirkman matrix bipartite graphs and the ability to serve only part of the matrix while keeping the row weights3 equal. Matrices derived from Steiner Triple Systems have previously been used for pooled testing for transcription regulatory network mapping in [43]. Further, matrices derived from Steiner Systems [44], a generalization of Steiner Triple Systems, have been proposed for optimizing 2-stage binary group testing in [45].
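When such matrices are generated (e.g. by greedy search for the Social Golfer Problem), the defining properties are easy to verify programmatically. The following sketch is our own sanity check (not part of the construction in [41]); it tests column weight, pairwise column overlap, and resolvability of a full or partial Kirkman matrix whose column groups are stored consecutively:

```python
import numpy as np

def check_kirkman_matrix(A):
    """Verify the structural properties of a (partial) Kirkman matrix A:
    every column has exactly three 1s, any two columns share at most one row,
    and each group of m/3 consecutive columns covers every row exactly once."""
    m, n = A.shape
    group_size = m // 3
    assert np.all(A.sum(axis=0) == 3), "each sample must go to exactly 3 pools"
    gram = A.T @ A
    np.fill_diagonal(gram, 0)
    assert gram.max() <= 1, "two samples may share at most one pool"
    for start in range(0, n, group_size):
        block = A[:, start:start + group_size]
        assert np.all(block.sum(axis=1) == 1), "each column group must be a resolution class"
    return True
```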

4). RIP-1 and Expansion Properties of Kirkman Matrices

We show that Kirkman matrix bipartite graphs are $(k, \epsilon)$-unbalanced expanders, with $\epsilon = \frac{k-1}{2d}$, where $d$ is the left-degree of the graph and is 3 for Kirkman matrices. Given a set $S$ of column vertices such that $|S| \leq k$, we note that the size of the union set of neighbours of $S$, $|N(S)|$, is at least $d|S| - \binom{|S|}{2}\, r$, where $\binom{|S|}{2}$ is the number of (unordered) pairs of columns in $S$, and $r$ is the maximum number of row vertices in common between any two column vertices. For a Kirkman matrix, since any two columns have dot product at most 1, hence $r \leq 1$. Therefore, $|N(S)| \geq d|S| - \binom{|S|}{2}$. Since $|S| \leq k$, therefore $|N(S)| \geq \left(1 - \frac{k-1}{2d}\right) d\,|S|$. This implies that Kirkman matrix bipartite graphs are $(k, \epsilon)$-unbalanced expanders, with $\epsilon = \frac{k-1}{2d}$. If we put in the requirement on $\epsilon$ from [25, Thm. 1] for Kirkman matrices, for which $d = 3$, we find that $k \leq 3$. Hence it follows from [25, Thm. 1] that the scaled Kirkman matrix has RIP-1 of order $k$ for $k \leq 3$. This suggests exact recovery for up to 3 infected samples using CS. However, in practice, we observe that using our method we are able to recover a much larger number of positives, at the cost of an acceptable number of false positives and rare false negatives (Section IV).

5). Optimality of Girth 6 Matrices

A Steiner Triple System bipartite graph does not have a cycle of length 4. If it did, then there would exist two rows Inline graphic and Inline graphic, and two columns Inline graphic and Inline graphic of the Steiner Triple System matrix Inline graphic such that Inline graphic and Inline graphic. This would violate the property that dot product of any two rows of the Steiner Triple System must be equal to 1. Furthermore, [42, Lemma 1] show that Steiner Triple System bipartite graphs have girth equal to 6. Since Kirkman Triple Systems are resolvable Steiner Triple Systems (see definitions earlier in this section), their bipartite graphs also have girth equal to 6. For a bipartite graph constructed from a partial Kirkman matrix, the girth is at least 6, since dropping some column vertices will not introduce new cycles in the graph. Furthermore, it is shown in [23, Thm. 10] that adjacency matrices of left-regular graphs with girth at least 6 satisfy RNSP (Defn. 2) of order Inline graphic (for suitable Inline graphic). Consequently, they may be used for CS decoding [23, Thm. 5]. They also give lower bounds on the number of rows Inline graphic of left-regular bipartite graph matrices whose column weight4 is more than 2, for them to have high girth and consequently satisfy RNSP of order Inline graphic, given Inline graphic and Inline graphic [23, Eqn. 32, 33]. Given Inline graphic and Inline graphic, these lower bounds are minimized for graphs of girth 6 and 8, and the bounds are, respectively, Inline graphic and Inline graphic ([23, Eqn. 37]). However, with the additional requirement that Inline graphic for CS, it is found that girth 6 matrices can recover Inline graphic defects, while girth 8 matrices can only recover Inline graphic defects. Hence, matrices whose bipartite graphs have girth equal to 6 are optimal in this sense. Full Kirkman matrix bipartite graphs are left-regular and have girth 6, as argued earlier, and hence they satisfy RNSP, may be used for compressive sensing, and are optimal in the sense of being able to handle most number of defects while minimizing the number of measurements. We note that since we employ Kirkman triples, each column has only three 1 s. The theoretical guarantees for such matrices hold for signals with Inline graphic norm less than or equal to 2. However, we have obtained acceptable false positive and false negative rates in practice for much larger sparsity levels, as will be seen in Section IV.

6). Disjunctness Property of Kirkman Matrices

In order for a matrix to be suitable for our method, it should not only be good for CS decoding algorithms, but also for Comp. Kirkman matrices are 2-disjunct, and can recover up to 2 defects exactly using Comp. In a $d$-disjunct matrix, there does not exist any column such that its support is a subset of the union of the supports of $d$ other columns [15]. Matrices which are $d$-disjunct have an exact support recovery guarantee for $d$-sparse vectors, using Comp (see [15]). Disjunctness follows from the following properties of Kirkman matrices – that two columns in a Kirkman matrix have at most one row in common with an entry of 1, and that each column has exactly three 1 entries. Consider $T_1$, $T_2$, and $T_3$, the sets of rows for which the three columns $c_1$, $c_2$ and $c_3$ respectively have a 1 entry. Note that $|T_i| = 3$, and $|T_i \cap T_j| \leq 1$ for $i \neq j$. If $T_1 \subseteq T_2 \cup T_3$, then either $|T_1 \cap T_2| \geq 2$ or $|T_1 \cap T_3| \geq 2$, which presents a contradiction.

Empirically we find that even for Inline graphic, Comp reports only a small fraction of the total number of samples as positives when using Kirkman matrices (Table 1). In Section S.XIV (Proposition 6) of the supplemental material, we prove that if a fraction Inline graphic of the tests come out to be positive, then Comp reports strictly less than fraction Inline graphic of the samples as positive for a full Kirkman matrix. This provides intuition behind why Kirkman matrices may be well-suited for our combined Comp + CS method, since most samples are already eliminated by Comp. On the other hand, CS decoding (without the earlier Comp step) on the full Kirkman matrix does not perform as well, as shown in the supplemental Section S.IX.

TABLE 1. Performance of Comp and Dd (On Synthetic Data) for the $93 \times 961$ Kirkman Triple Matrix. For Each Criterion and Each $k$ Value, Mean and Standard Deviation Values are Reported Across 1000 Signals.
$k$ RMSE #FN #FP Sens. Spec. Inline graphic
5 1.000 ± 0.000 0.0 ± 0.0 1.0 ± 1.0 1.0000 ± 0.0000 0.9899 ± 0.0099 4.8
8 1.000 ± 0.000 0.0 ± 0.0 4.4 ± 2.2 1.0000 ± 0.0000 0.9541 ± 0.0223 5.2
10 1.000 ± 0.000 0.0 ± 0.0 8.0 ± 3.2 1.0000 ± 0.0000 0.9163 ± 0.0338 4.0
12 1.000 ± 0.000 0.0 ± 0.0 12.2 ± 4.1 1.0000 ± 0.0000 0.8689 ± 0.0446 2.5
15 1.000 ± 0.000 0.0 ± 0.0 19.9 ± 5.8 1.0000 ± 0.0000 0.7791 ± 0.0647 0.9
17 1.000 ± 0.000 0.0 ± 0.0 24.9 ± 6.6 1.0000 ± 0.0000 0.7174 ± 0.0747 0.5
20 1.000 ± 0.000 0.0 ± 0.0 32.0 ± 8.1 1.0000 ± 0.0000 0.6233 ± 0.0955 0.1

7). Advantages of Using Kirkman Matrices

As we have seen in earlier sections, Kirkman matrices are suitable for use in compressed sensing due to their expansion, RIP-1 and high girth properties, as well as for binary group testing due to disjunctness. Furthermore, the dot product between two columns of a Kirkman matrix being at most 1 ensures that no two samples participate in more than one test together. This has favourable consequences in terms of placing an upper bound on the mutual coherence of the matrix, defined as:

$\mu(A) := \max_{i \neq j} \frac{|A_i^\top A_j|}{\|A_i\|_2 \|A_j\|_2}, \qquad (20)$

where $A_i$ refers to the $i^{th}$ column of $A$. Matrices with lower $\mu(A)$ values have lower values of worst case upper bounds on the reconstruction error [46]. These bounds are looser than those based on the RIC that we saw in previous sections. However, unlike the RIC, the mutual coherence is efficiently computable.
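For illustration, the mutual coherence of a candidate pooling matrix can be computed directly from Eqn. (20) (a small sketch of ours is given below). For a Kirkman matrix, columns have $\ell_2$ norm $\sqrt{3}$ and pairwise dot products at most 1, so $\mu(A) \leq 1/3$.

```python
import numpy as np

def mutual_coherence(A):
    """Mutual coherence of Eqn. (20): largest absolute normalized inner product
    between distinct columns of A (assumes no all-zero columns)."""
    A_norm = A / np.linalg.norm(A, axis=0, keepdims=True)
    gram = np.abs(A_norm.T @ A_norm)
    np.fill_diagonal(gram, 0.0)
    return gram.max()
```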

A practical benefit of Kirkman triples that is not shared by Steiner triples is that the former can be served for a number of samples far less than $m(m-1)/6$ while keeping pools balanced (i.e. ensuring that each pool is created from the same number of samples). In fact, we can choose $n$ to be any integer multiple of $m/3$, and ensure that every pool gets the same number of samples, as discussed in Section III-F3. Notice that the expansion, RIP-1, high girth and disjunctness properties hold for full as well as partial Kirkman matrices, as proven in previous sections. This allows us to characterize the properties of the full Kirkman matrix, and use that analysis to predict how it will behave in the clinical situation, where the pooling matrix to be served may require very specific values of $m$ and $n$ depending on the prevalence rate.

Column weight: Kirkman matrices have column weight equal to 3 – that is, each sample goes to 3 pools. It is possible to construct matrices with a higher number of pools per sample (such as those derived from the Social Golfer Problem [38]), which would retain several benefits of the Kirkman matrices: (1) the ability to serve only part of the matrix; (2) the expander and RIP-1 properties, following a proof similar to the one in Section III-F4; (3) the absence of 4-cycles in the corresponding bipartite graph, following an argument similar to that in Section III-F5; and (4) the disjunctness property, following a proof similar to the one in Section III-F6. Nevertheless, the time and effort needed for pooling increases with more pools per sample. Further, more pools per sample come at the cost of a larger number of tests (if pool size is kept constant), or a larger pool size (if the number of tests is kept constant). A higher number of tests is undesirable for obvious reasons, while a larger pool size may lead to dilution of the sample within a pool, leading to individual RT-PCR tests failing.

8). Optimal Binary Sensing Matrices With Random Construction

While Kirkman matrices which satisfy RNSP of order Inline graphic must have at least Inline graphic measurements, we can get much better bounds in theory if we use random constructions. From [7, Prop. 10] we see that with high probability, Inline graphic Bernoulli(Inline graphic) matrices need only Inline graphic measurements in order to satisfy Inline graphic-RNSP (Defn. 3) of order Inline graphic, with Inline graphic being the probability with which each entry of the matrix is independently 1.

In the supplemental Section S.V, we prove that Inline graphic-RNSP is preserved by Comp. That is, the reduced matrix Inline graphic obeys Inline graphic-RNSP of order Inline graphic with the same parameters as the original matrix Inline graphic. Hence our method only needs Inline graphic measurements for robust recovery of Inline graphic-sparse vectors with such random matrix constructions. Bernoulli(Inline graphic) matrices are also good for Comp – [12, Thm. 4] shows that Bernoulli(Inline graphic) matrices with Inline graphic need only Inline graphic measurements for exact support recovery of Inline graphic-sparse vectors with Comp with vanishingly small probability of error.

In practice, we observe that Kirkman matrices perform better than Bernoulli($p$) matrices using our method in the regime of our problem size. This gap between theory and practice may arise due to the following reasons: (1) The Inline graphic lower bound for Kirkman triples is for a sufficient but not necessary condition for sparse recovery; (2) The Inline graphic may be ignoring a very large constant factor which affects the performance of moderately-sized problems such as the ones reported in this paper; and (3) The theoretical bounds are for exact recovery with vanishingly small error, whereas we allow some false positives and rare false negatives in our experiments. Similar comparisons between binary and Gaussian random matrices have been recently put forth in [23]. Moreover, the average column weight of Bernoulli($p$) matrices is $pm$, where $m$ is the number of measurements. This is typically much higher than the column weight of 3 of Kirkman matrices, and hence undesirable (see Section III-F7). In the supplemental Section S.VI, we compare the performance of Kirkman matrices with Bernoulli(0.1) and Bernoulli(0.5) matrices.

9). Mutual Coherence Optimized Sensing Matrices

As mentioned earlier, the mutual coherence from Eqn. (20) is efficient to compute and optimize over. Hence, there is a large body of literature on designing CS matrices by minimizing $\mu(A)$ w.r.t. $A$, for example [47]. We followed such a procedure for designing sensing matrices for some of our experimental results in Section IV-B. For this, we use simulated annealing to update the entries of $A$, starting with an initial condition where $A$ is a random binary matrix. For synthetic experiments, we compared such matrices with Bernoulli($p$) random matrices, adjacency matrices of biregular random sparse graphs (i.e. matrices in which each column has the same weight, and each row has the same weight, which may be different from the column weight), and Kirkman matrices. We found that matrices of Kirkman triples perform very well empirically in the regime of sizes we are interested in, besides facilitating easy pipetting, and hence the results are reported using only Kirkman matrices.

IV. Experimental Results

In this section, we show a suite of experimental results on synthetic data as well as on real data.

A. Results on Synthetic Data

1). Choice of Sensing Matrix

Recall from Section II that a typical RT-PCR setup can test 96 samples in parallel. Three of these tests are used as controls by the RT-PCR technician in order to have confidence that the RT-PCR process has worked. Hence, in order to optimize the available test bandwidth of the RT-PCR setup, the number of tests we perform in parallel should be at most 93, and as close to 93 as possible. Since in Kirkman matrices the number of rows must be of the form $6t+3$ for some non-negative integer $t$, we choose 93. With this choice, the number of samples tested $n$ has to be a multiple of $93/3 = 31$, hence we chose $n = 961$. This matrix is not a full Kirkman matrix – a full matrix with 93 rows would have 1426 columns. However, we keep the number of columns of the matrix under 1000 due to challenges in pooling a large number of samples. Furthermore, the $93 \times 961$ matrix gives more than a 10x factor improvement in testing while detecting 1% infected samples with reasonable sensitivity and specificity, and is in a regime of interest for widespread screening or repeated testing.

We also present results with a $45 \times 105$ partial Kirkman matrix in the supplemental Section S.VIII. This matrix gives a 2.3x improvement in testing while detecting 9.5% infected samples with reasonable sensitivity and specificity. Further, two such batches of 105 samples in 45 pools may be run in parallel in a single RT-PCR setup.

2). Signal/Measurement Generation

For the case of synthetic data, we generated $k$-sparse signal vectors $x$ of dimension $n = 961$, for each $k \in \{5, 8, 10, 12, 15, 17, 20\}$. We choose a wide range of $k$ in order to demonstrate that not only do our algorithms have high sensitivity and specificity for large values of $k$, they also keep performing reasonably even well beyond the typical operating regime. The support of each signal vector $x$ – given $k$ – was chosen by sampling a $k$-sparse binary vector uniformly at random from the set of all $k$-sparse binary vectors. The magnitudes of the non-zero elements of $x$ were picked uniformly at random from the range [1, 32768]. This high dynamic range in the value of $x$ was chosen to reflect the variance in the typical threshold cycle values ($C_t$) of real PCR experiments, which can be between 16 and 32. From Eqn. (6), we can infer that viral loads vary roughly as $2^{32 - C_t}$ (setting $q = 1$), up to constant multiplicative terms. In all cases, $m = 93$ noisy measurements in $\hat{y}$ were simulated following the noise model in Eqn. (3), with fixed values of $q$ and $\sigma$. A $93 \times 961$ Kirkman sensing matrix was used for generating the measurements. The Poisson nature of the elements of $x$ in Eqn. (3) was ignored. This approximation was based on the principle that if $X \sim \mathrm{Poisson}(\lambda)$, then its coefficient of variation $1/\sqrt{\lambda}$ becomes smaller and smaller as $\lambda$ increases. The recovery algorithms were tested on 1000 randomly generated signals for each value of $k$.
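A minimal sketch of this signal and measurement generation protocol follows (our illustration; the values of $q$ and $\sigma$ below are placeholders, since the actual settings are discussed in Section IV-A6):

```python
import numpy as np

def generate_synthetic_instance(A, k, q=0.95, sigma=0.1, rng=None):
    """Generate a k-sparse viral load vector with support chosen uniformly at
    random, magnitudes uniform in [1, 32768], and noisy pooled measurements
    following the multiplicative noise model (illustrative q and sigma)."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x[support] = rng.uniform(1.0, 32768.0, size=k)
    y = A @ x
    e = rng.normal(0.0, sigma, size=m)
    y_noisy = np.where(y > 0, (1.0 + q) ** e * y, 0.0)   # Eqns. (2)/(3)
    return x, y_noisy
```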

3). Algorithms Tested

The following algorithms were compared:

  • 1)

    Comp (see Table 1)

  • 2)

    Comp followed by Nnlasso (see Table 2)

  • 3)

    Comp followed by Sbl (see Table 3)

  • 4)

    Comp followed by Nnomp (see Table 4)

  • 5)

    Comp followed by Nnlad (see Table 5)

  • 6)

    Comp followed by Nnls (see Table S.VI in the Supplementary)

For each algorithm, any positives missed during the CS stage but caught by Dd were declared as positives, as mentioned in Section III-D. For small sample sizes we also tested Comp-Bf, i.e. Comp followed by a brute-force search for the samples in x with non-zero values. Details of this algorithm and experimental results with it are presented in the supplemental Section S.IV.

TABLE 2. Performance of Comp Followed by Nnlasso (On Synthetic Data) for the 93×961 Kirkman Triple Matrix. For Each Criterion and Each k Value, Mean and Standard Deviation Values are Reported, Across 1000 Signals.
k RMSE #FN #FP Sens. Spec.
5 Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
8 Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
10 Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
12 Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
15 Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
17 Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
20 Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
TABLE 3. Performance of Comp Followed by Sbl (On Synthetic Data) for the 93×961 Kirkman Triple Matrix. For Each Criterion and Each k Value, Mean and Standard Deviation Values are Reported, Across 1000 Signals.
k RMSE #FN #FP Sens. Spec.
5 0.043 ± 0.017 0.0 ± 0.0 0.9 ± 0.9 0.9998 ± 0.0063 0.9991 ± 0.0010
8 0.058 ± 0.021 0.0 ± 0.2 4.3 ± 2.1 0.9958 ± 0.0227 0.9955 ± 0.0023
10 0.071 ± 0.025 0.1 ± 0.2 8.2 ± 3.1 0.9937 ± 0.0247 0.9913 ± 0.0033
12 0.094 ± 0.035 0.1 ± 0.4 13.6 ± 4.4 0.9886 ± 0.0310 0.9856 ± 0.0046
15 0.123 ± 0.108 0.3 ± 0.6 25.1 ± 6.9 0.9804 ± 0.0396 0.9735 ± 0.0073
17 0.165 ± 0.179 0.5 ± 0.8 35.1 ± 9.9 0.9713 ± 0.0491 0.9628 ± 0.0105
20 0.318 ± 0.305 1.3 ± 1.6 54.5 ± 13.2 0.9349 ± 0.0803 0.9420 ± 0.0140
TABLE 4. Performance of Comp Followed by Nnomp (On Synthetic Data) for the 93×961 Kirkman Triple Matrix. For Each Criterion and Each k Value, Mean and Standard Deviation Values are Reported, Across 1000 Signals.
k RMSE #FN #FP Sens. Spec.
5 0.043 ± 0.019 0.0 ± 0.1 0.3 ± 0.6 0.9982 ± 0.0209 0.9997 ± 0.0006
8 0.060 ± 0.025 0.1 ± 0.4 1.8 ± 2.0 0.9831 ± 0.0472 0.9981 ± 0.0021
10 0.077 ± 0.035 0.3 ± 0.5 3.7 ± 3.2 0.9739 ± 0.0541 0.9961 ± 0.0034
12 0.115 ± 0.067 0.5 ± 0.7 7.8 ± 4.9 0.9565 ± 0.0560 0.9918 ± 0.0051
15 0.242 ± 0.190 1.5 ± 1.4 15.6 ± 6.0 0.9013 ± 0.0951 0.9835 ± 0.0064
17 0.361 ± 0.243 2.8 ± 2.2 20.8 ± 5.6 0.8329 ± 0.1268 0.9780 ± 0.0059
20 0.589 ± 0.282 6.1 ± 3.0 27.0 ± 5.2 0.6941 ± 0.1520 0.9713 ± 0.0055
TABLE 5. Performance of Comp Followed by Nnlad (On Synthetic Data) for the 93×961 Kirkman Triple Matrix. For Each Criterion and Each k Value, Mean and Standard Deviation Values are Reported, Across 1000 Signals.
k RMSE #FN #FP Sens. Spec.
5 Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
8 Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
10 Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
12 Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
15 Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
17 Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
20 Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic

4). Comparison Criteria

In the following, x̂ denotes the estimate of x. Owing to issues such as the choice of convergence criterion, most numerical algorithms do not produce exactly sparse vectors; instead they produce vectors with many entries of very small magnitude. Since support recovery is of paramount importance in this application (we need to identify which samples in x were infected), we employed the following post-processing step: all entries of x̂ whose magnitude fell below a threshold τ were set to zero, yielding a vector x̂_τ. The threshold is defined in terms of the least possible value of the viral load, which can be obtained offline from practical experiments on individual samples; in these synthetic experiments we simply set τ accordingly. We observed that varying the value of τ over a fairly wide range had negligible impact on the results, as can be seen from Section S.XII of the supplemental material. For Sbl, we set τ to 0 and also set negative entries in the estimate to 0. For Nnomp, such thresholding was inherently not needed. The various algorithms were compared with respect to the following criteria (a code sketch of these criteria is given after the list):

  • 1)

    RMSE := ||x̂_τ − x||_2 / ||x||_2

  • 2)

    Number of false positives (FP) := |{i : x̂_τ(i) > 0 and x(i) = 0}|

  • 3)

    Number of false negatives (FN) := |{i : x̂_τ(i) = 0 and x(i) > 0}|

  • 4)

    Sensitivity (also called Recall or True Positive Rate) := (number of correctly detected positives) / (number of actual positives)

  • 5)

    Specificity (also called True Negative Rate) := (number of correctly detected negatives) / (number of actual negatives).
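A compact Python sketch of the thresholding step and these five criteria follows; taking RMSE to be the relative ℓ2 error is our assumption of the intended definition.

```python
import numpy as np

def evaluate(x_true, x_est, tau):
    """Threshold the estimate at tau and compute the five comparison criteria."""
    x_hat = np.where(np.abs(x_est) < tau, 0.0, x_est)          # thresholded estimate
    rmse = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
    pos, est_pos = x_true > 0, x_hat > 0
    fp = int(np.sum(est_pos & ~pos))                            # declared positive, actually negative
    fn = int(np.sum(~est_pos & pos))                            # declared negative, actually positive
    sensitivity = np.sum(est_pos & pos) / max(int(np.sum(pos)), 1)
    specificity = np.sum(~est_pos & ~pos) / max(int(np.sum(~pos)), 1)
    return dict(RMSE=rmse, FP=fp, FN=fn, sensitivity=sensitivity, specificity=specificity)
```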

5). Main Results

All algorithms were evaluated on 1000 randomly generated sparse signals, given the same sensing matrix. The average value as well as the standard deviation of all quality measures (over the 1000 signals) are reported in Tables 1, 2, 3, 4, 5 and S.VI. A comparison of Table 1 with Tables 2, 3, 4, 5 and S.VI indicates that Comp followed by Nnlasso/Sbl/Nnomp/Nnlad/Nnls significantly reduces the false positives at the cost of a rare false negative. The RMSE is also significantly improved, since Comp does not estimate viral loads. At the same time, Comp significantly reduces the size of the problem passed to the CS stage. For example, for the 93×961 Kirkman matrix, when the number of infected samples k is 12, the average size of the matrix after Comp filtering is Inline graphic. From Table 1 we see that Definite Defectives classifies many positives as high-confidence positives for k up to 8. We note that the experimental results reported in these tables are quite encouraging, since these experiments are challenging due to the small number of measurements m and fairly large n, albeit with testing on synthetic data. We noticed that running the CS algorithms without the Comp step did not perform as well; those results are presented in the supplemental Section S.IX. We observed that the advantages of our combined group testing and compressed sensing approach hold regardless of the sensing matrix size. For comparison, results of running our algorithms using a 45×105 Kirkman matrix instead of the 93×961 Kirkman matrix are presented in supplemental Section S.VIII.

6). Parameter Selection

As mentioned earlier, the regularization parameters in estimators such as Comp-Nnlasso, Comp-Nnlad, Comp-Nnomp, etc. are estimated via cross-validation. For these estimators, we therefore do not require knowledge of the noise parameter σ in the noise model from Eqn. (3). The q parameter in the noise model is set to 0.95 in all our experiments. This is a reasonable choice since the molecule count is known to roughly double in each cycle of RT-PCR [14]. Moreover, varying q in the range from 0.7 to 1 produced negligible variation in the results of our wet-lab experiments, as can be seen in Section S.XI and Table S.VIII of the supplemental material. Also note that we only report viral loads relative to that of the pool with the highest viral load (see Eqn. (8)); we do not attempt to estimate absolute viral loads. These relative viral loads are interpretable by the RT-PCR technicians, since they know the minimum Ct (threshold cycle) value observed in that experiment, and the pool with the minimum Ct value is precisely the pool with the maximum viral load in that experiment.
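To illustrate, here is a minimal sketch of cross-validation over pools for selecting a regularization parameter, in the spirit of the procedure described above; `solver` is a stand-in for any of the regularized estimators (for example a non-negative Lasso solver), and the fold structure and parameter grid are illustrative assumptions.

```python
import numpy as np

def choose_lambda(A, y, lambdas, solver, n_folds=5, rng=None):
    """Pick the regularization parameter minimizing the residual on held-out pools.
    `solver(A, y, lam)` is a placeholder for a regularized non-negative estimator."""
    rng = rng or np.random.default_rng(0)
    m = len(y)
    folds = np.array_split(rng.permutation(m), n_folds)
    errs = []
    for lam in lambdas:
        err = 0.0
        for held_out in folds:
            train = np.setdiff1d(np.arange(m), held_out)
            x_hat = solver(A[train], y[train], lam)          # fit on training pools
            err += np.sum((A[held_out] @ x_hat - y[held_out]) ** 2)
        errs.append(err)
    return lambdas[int(np.argmin(errs))]
```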

7). Comparison With Dorfman Pooling

We also compared our algorithms with the popular two-stage Dorfman pooling method (an adaptive method) with regard to the number of tests required. In the first stage of the Dorfman pooling technique, the n samples are divided into pools of size g. Each of these pools is tested, and a negative result leads to all members of that pool being declared negative (i.e. non-infected). The pools that test positive are passed on to a second stage, in which all of their members are tested individually. The optimal pool size g minimizes the expected number of tests taken by this process (given that the membership of each pool is decided randomly). A formula for the expected number of tests taken by Dorfman testing is derived in [2]. The derivation in [2] assumes the following: (1) any given sample is positive with probability p, independently of the other samples; (2) the number of samples n is divisible by the pool size g. We modify the formula from [2] for the case where n is not divisible by g (supplemental Section S.XIII), and find the optimal g by choosing the value which minimizes this number. We set p = k/n, so that out of n samples, the number of infected samples is k in expectation. Table 6 shows the expected number of tests computed from the formula in supplemental Section S.XIII, assuming that the expected number of infected samples k (and thus the optimal pool size g) is known in advance. We also empirically verified the expected number of tests by performing 1000 Monte Carlo simulations of Dorfman testing with the optimal pool size for each case, and did not observe much deviation from the numbers reported in Table 6. Comparing Tables 1, 2, 3 and 4 with the two-stage Dorfman pooling results in Table 6 shows that our methods require far fewer tests, albeit with a slight increase in the number of false negatives. Moreover, all our methods are single-stage methods and therefore require less time for testing, unlike the Dorfman method which requires two stages of testing.
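The classical divisible-n case admits a simple closed form, sketched below; the correction for n not divisible by the pool size (used for Table 6) is given in supplemental Section S.XIII and is not reproduced here.

```python
# Expected number of tests under two-stage Dorfman pooling when the number of
# samples n is divisible by the pool size g, assuming each sample is positive
# independently with probability p (the idealized setting of [2]).
def dorfman_expected_tests(n, p, g):
    n_pools = n / g
    prob_pool_positive = 1.0 - (1.0 - p) ** g
    return n_pools + n_pools * prob_pool_positive * g     # stage-1 pools + stage-2 retests

def optimal_pool_size(n, p, g_max=50):
    """Pool size minimizing the expected number of tests (brute-force search)."""
    return min(range(2, g_max + 1), key=lambda g: dorfman_expected_tests(n, p, g))

# Example: for n = 961 and k = 10 expected infections (p = 10/961), the optimal
# pool size and expected test count are roughly in line with the entries of Table 6.
```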

TABLE 6. Expected Number of Tests Needed by Optimal Dorfman Testing for Number of Samples (n) 105 and 961, for Various k. Note That Our Proposed Methods Based on CS Typically Require Far Fewer Tests (45 and 93), and Do Not Require Two Rounds of Testing.
n = 105 n = 961
k # Tests Pool Size k # Tests Pool Size
5 43.7 5 5 136.5 14
8 55.3 4 8 172.2 11
10 61.3 4 10 192.2 11
12 67.0 4 12 209.6 9
15 73.9 3 15 233.7 9
17 78.2 3 17 248.7 8
20 84.3 3 20 269.4 7

8). Estimation of Number of Infected Samples

The number of CS measurements needed for successful recovery depends on the number of non-zero elements (the ℓ0 norm) of the underlying signal; for example, it varies as O(k log(n/k)) for randomized sensing matrices [5] or as O(k^2) for deterministic designs [22]. There is a lower bound of Inline graphic measurements for certain types of expander matrices to satisfy a sufficient (but not necessary) condition for recovery [23]. However, in practice k is unknown, which raises the question of the minimum number of measurements needed for a particular problem instance. To address this, we adopt the technique from [48] to estimate k on the fly from the compressive measurements. This technique does not require signal recovery for estimating k. The relative error in the estimate of k is shown to be Inline graphic [49], which diminishes as m increases (irrespective of the true k). Table 7 shows the accuracy of our sparsity estimate on synthetic data.
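For intuition, the following is a simplified estimator in a similar spirit, based on counting negative pools; it is a hedged stand-in for, not a reproduction of, the estimator of [48].

```python
import numpy as np

def estimate_sparsity(A, y, zero_tol=1e-8):
    """Rough sparsity estimate from the fraction of negative pools, assuming a
    binary pooling matrix A with (roughly) equal row weights and a uniformly
    random support. Illustrative only; not the exact estimator of [48]."""
    m, n = A.shape
    r = A.sum(axis=1).mean()                 # average number of samples per pool
    m0 = int(np.sum(y <= zero_tol))          # number of negative (zero-valued) pools
    if m0 == 0:
        return float(n)                      # every pool positive: no useful estimate
    # P(a pool is negative) is approximately (1 - r/n)^k; solve for k.
    return np.log(m0 / m) / np.log(1.0 - r / n)
```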

TABLE 7. True Sparsity k Versus Estimated Sparsity k̂ (On Synthetic Data) for the 93×961 Kirkman Matrix. Mean and Standard Deviation of the Estimated Sparsity is Computed Over 1000 Signals for Each k.
k k̂
5 5.01 ± 0.33
10 10.07 ± 0.69
15 15.13 ± 1.17
20 20.26 ± 1.63
25 25.54 ± 2.03
30 30.53 ± 2.61

The advantage of this estimate of k is that it can drive the Comp-Bf algorithm, and it also acts as an indicator of whether there may be any false negatives. We use this knowledge to enable a graceful failure mode: if our estimate of k is larger than what the CS algorithms can handle, we return only the output of the Comp stage. Hence, in such rare cases, the number of false negatives is minimized at the cost of many false positives, and a second stage of individual testing must be done on the samples which were declared positive. Table 8 shows the effect of using the graceful failure mode with Comp followed by Sbl for large values of k. In these experiments, the output of Comp is returned if the estimated sparsity k̂ is greater than or equal to 20. We see that Comp-Sbl with graceful failure mode matches the behaviour of Comp-Sbl at sparsity values lower than 20, and that of Comp at sparsity values greater than 20. At sparsity equal to 20, it compromises between the high false positives of Comp and the high false negatives of Comp-Sbl. This is because of the variability in k̂, which can occasionally be less than 20 even when k equals 20.
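A minimal sketch of this decision logic, with hypothetical argument names (comp_positives, cs_decode), is shown below.

```python
def decode_with_graceful_failure(A, y, comp_positives, cs_decode, k_hat, k_max=20):
    """Fall back to the COMP output when the estimated sparsity k_hat is too
    large for reliable CS recovery (threshold 20 as in Table 8). The names
    comp_positives and cs_decode are placeholders for the COMP survivors and
    any CS decoder (e.g. SBL) run on the COMP-reduced problem."""
    if k_hat >= k_max:
        # Graceful failure: return all samples not cleared by COMP, minimizing
        # false negatives at the cost of false positives; these samples then
        # require a second round of individual testing.
        return sorted(comp_positives)
    return cs_decode(A, y, comp_positives)
```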

TABLE 8. Comparison of Mean Number of False Negatives and False Positives for COMP, COMP-SBL and COMP-SBL With Graceful Failure Mode, for High Values of k, for the 93×961 Kirkman Matrix. The Algorithm Goes Into Graceful Failure Mode When the Estimated Sparsity is Greater Than or Equal to 20.
COMP COMP-SBL COMP-SBL-graceful
k #FN #FP #FN #FP #FN #FP
15 0 45.3 0.3 24.9 0.3 24.8
20 0 92.7 1.3 55.4 0.4 80.2
25 0 151.2 4 97.5 0 151.2
30 0 212.1 6.9 140.6 0 212.1

B. Results on Real Data

We acquired real data in the form of test results on pooled samples from two labs: one at the National Centre for Biological Sciences (NCBS) in India, and the other at the Wyss Institute at Harvard Medical School, USA. In both cases, viral RNA was artificially injected into k of the n samples (with k small). From these n samples, a total of m mixtures were created. For the datasets obtained from NCBS that we experimented with, we had Inline graphic, Inline graphic, Inline graphic. For the data from the Wyss Institute, we had Inline graphic, Inline graphic, Inline graphic and Inline graphic, Inline graphic, Inline graphic. The results for all these datasets are presented in Table 9. The Inline graphic and Inline graphic pooling matrices were obtained by performing a simulated annealing procedure to minimize the mutual coherence (see Section III-F9), starting with a random sparse binary matrix as the initial condition. The Inline graphic pooling matrix was a Kirkman matrix. We used q = 0.95 in all cases to obtain relative viral loads from Ct values, using Eqn. (8). While q may be estimated from raw RT-PCR data (Section S.XI, supplemental material), we found q = 0.95 to be a reasonable choice, and did not observe any variation in the number of reported positives when this parameter was varied between 0.7 and 1. For Nnlasso, Nnls and Nnlad, we used a threshold derived from the relative viral load of the pool with the largest Ct value (and consequently the smallest viral amount), below which an estimated relative viral load is set to 0, since the minimum possible viral load of an individual sample may not always be available for real experiments. We see that the CS algorithms reduce the false positives, albeit with the introduction of occasional false negatives for higher values of k. We also refer the reader to our work in [6] for a more in-depth description of results on real experimental data.
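For illustration, the conversion from pool Ct values to relative viral loads can be sketched as follows; the exponential relation in base (1+q) is our paraphrase of Eqn. (8), with the pool of minimum Ct serving as the reference.

```python
import numpy as np

def relative_viral_loads(ct_values, q=0.95):
    """Convert pool Ct values to viral loads relative to the strongest pool.
    Negative pools (no amplification) should be passed as np.inf."""
    ct = np.asarray(ct_values, dtype=float)
    if not np.isfinite(ct).any():
        return np.zeros_like(ct)              # no pool amplified
    ct_min = ct[np.isfinite(ct)].min()        # pool with the highest viral load
    y = (1.0 + q) ** (ct_min - ct)            # each extra cycle divides the load by (1+q)
    y[~np.isfinite(ct)] = 0.0                 # pools that never crossed threshold
    return y
```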

TABLE 9. Results of Lab Experiments With Each Algorithm.

Dataset Algorithm # true pos # false neg #false pos
Harvard Inline graphic Comp 2 0 1
Comp-Sbl 2 0 1
Comp-Nnomp 2 0 0
Comp-Nnlasso 2 0 1
Comp-Nnlad 2 0 1
Comp-Nnls 2 0 1
Harvard Inline graphic Comp 2 0 1
Comp-Sbl 2 0 1
Comp-Nnomp 2 0 1
Comp-Nnlasso 2 0 1
Comp-Nnlad 2 0 1
Comp-Nnls 2 0 1
NCBS-0 Inline graphic Comp 0 0 0
Comp-Sbl 0 0 0
Comp-Nnomp 0 0 0
Comp-Nnlasso 0 0 0
Comp-Nnlad 0 0 0
Comp-Nnls 0 0 0
NCBS-1 Inline graphic Comp 1 0 0
Comp-Sbl 1 0 0
Comp-Nnomp 1 0 0
Comp-Nnlasso 1 0 0
Comp-Nnlad 1 0 0
Comp-Nnls 1 0 0
NCBS-2 Inline graphic Comp 2 0 0
Comp-Sbl 2 0 0
Comp-Nnomp 2 0 0
Comp-Nnlasso 2 0 0
Comp-Nnlad 2 0 0
Comp-Nnls 2 0 0
NCBS-3 Inline graphic Comp 3 0 1
Comp-Sbl 2 1 1
Comp-Nnomp 2 1 0
Comp-Nnlasso 2 1 1
Comp-Nnlad 3 0 1
Comp-Nnls 2 1 1
Comp-Bf 2 1 1
NCBS-4 Inline graphic Comp 4 0 3
Comp-Sbl 3 1 2
Comp-Nnomp 2 2 2
Comp-Nnlasso 3 1 2
Comp-Nnlad 2 2 2
Comp-Nnls 3 1 2
Comp-Bf 2 2 2

C. Discussion

Each algorithm we ran presented a different set of tradeoffs between sensitivity and specificity. While Comp provides a sensitivity of 1, it suffers from many false positives, especially for higher k. For the other algorithms, both sensitivity and specificity generally decrease as k increases. Comp-Nnomp (Table 4) has the highest specificity, but this comes at the cost of sensitivity. Comp-Sbl (Table 3) has the best sensitivity amongst the CS algorithms for most values of k. Comp-Nnlasso (Table 2) has better specificity than Comp-Sbl for small values of k, but loses out for larger k. Comp-Nnlad and Comp-Nnls (Tables 5 and S.VI) start behaving like Comp for higher values of k, effectively bounding the number of false negatives; however, their number of false positives is almost as high as that of Comp.

Ideally, we want both high sensitivity and high specificity while catching a large number of infected samples. Hence, we look at k_max, the maximum number of infected samples k for which the sensitivity and specificity of the algorithm are both greater than or equal to chosen threshold values. For the 45×105 Kirkman matrix, we chose a sensitivity threshold of 0.99 and a specificity threshold of 0.95. For the 93×961 Kirkman matrix, we chose both thresholds to be 0.99, since a specificity threshold of 0.95 gives too many false positives among 961 samples. We observed that Comp-Sbl has k_max = Inline graphic for both matrices, which is the highest amongst all algorithms tested. Typically we do not know the number of infections in advance, but rather a prevalence rate of infection. The number of infected samples out of a given set of n samples may then be treated as a Binomial random variable with success probability equal to the prevalence rate. Under this assumption, using Comp-Sbl with the 93×961 Kirkman matrix, we observed that the maximum prevalence rate for which sensitivity and specificity are both above 0.99 is 1%. Similarly, using Comp-Sbl with the 45×105 Kirkman matrix, we observed that the maximum prevalence rate for which sensitivity is above 0.99 and specificity is above 0.95 is 9.5%. Thus, Tapestry is viable at prevalence rates as high as 9.5%, while reducing testing cost by a factor of 2.3. On the other hand, if the prevalence rate is 1% or less, it can reduce testing cost by a factor of 10.3.
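A rough sketch of this prevalence-rate analysis is given below: the number of infected samples among n is modelled as Binomial(n, p), and the overall sensitivity and specificity are per-k figures (for example those of Table 3, linearly interpolated in k purely for illustration) averaged under that distribution. The interpolation and the treatment of k = 0 are our simplifying assumptions, not the exact computation behind the numbers quoted above.

```python
import numpy as np
from scipy.stats import binom

def expected_performance(n, p, k_grid, sens_grid, spec_grid, k_cap=60):
    """Binomial-weighted average of per-k sensitivity/specificity estimates.
    Sensitivity is averaged only over k >= 1, since it is undefined when
    there are no positives."""
    ks = np.arange(k_cap + 1)
    w = binom.pmf(ks, n, p)
    sens_k = np.interp(ks, k_grid, sens_grid)      # per-k sensitivity (interpolated)
    spec_k = np.interp(ks, k_grid, spec_grid)      # per-k specificity (interpolated)
    spec = float(np.dot(w, spec_k))
    w_pos = w[1:] / w[1:].sum()                    # condition on at least one positive
    sens = float(np.dot(w_pos, sens_k[1:]))
    return sens, spec

# Example with the Comp-Sbl figures from Table 3 (93x961 matrix):
# expected_performance(961, 0.01, [5, 8, 10, 12, 15, 17, 20],
#                      [0.9998, 0.9958, 0.9937, 0.9886, 0.9804, 0.9713, 0.9349],
#                      [0.9991, 0.9955, 0.9913, 0.9856, 0.9735, 0.9628, 0.9420])
```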

Comments about sensitivity and specificity: We observe that the sensitivity and specificity of our method on synthetic data are within the recommendations of the U.S. Food and Drug Administration (FDA), as provided in [50]. That document provides recommendations for the percent positive agreement (PPA) and percent negative agreement (PNA) of a COVID-19 test with a gold standard test (such as RT-PCR done on individual samples). PPA and PNA are used instead of sensitivity and specificity when the ground-truth positives are not known. Since for synthetic data we do know the ground-truth positives, we compare the PPA and PNA recommendations of [50] with the sensitivity and specificity observed by us. We use Comp-Sbl for this comparison, since we consider it to be our best method.

For ‘Testing patients suspected of COVID-19 by their healthcare provider’ (point G.4.a, page 7 of [50]), the document considers positive and negative agreement of Inline graphic as acceptable clinical performance (page 9, row 2 of the table in [50]). The sensitivity and specificity of our method with the 93×961 Kirkman matrix are within this range for Inline graphic infected samples (Table 3). For the 45×105 matrix, they are within this range for Inline graphic infected samples (Table S.XII).

For ‘Screening individuals without symptoms or other reasons to suspect COVID-19 with a previously unauthorized test’ (point G.4.c, page 10 of [50]), the document considers a positive agreement of Inline graphic and a negative agreement of Inline graphic as acceptable (along with lower bounds of the two-sided 95% confidence intervals of Inline graphic and Inline graphic respectively). Similarly, for ‘Adding population screening of individuals without symptoms or other reasons to suspect COVID-19 to an authorized test’ (point G.4.d, page 12 of [50]), the document has the same criterion as for point G.4.c. Our sensitivity and specificity are within the ranges specified for the 93×961 Kirkman matrix for Inline graphic (Table 3). While we do not report confidence intervals (as suggested for points G.4.c and G.4.d of [50]), the standard deviations of the sensitivity and specificity reported by us are fairly low, and we believe the performance of our method is within the recommendations of [50]. Since our numbers are on synthetic data, they may vary upon full clinical validation, especially considering that there may be more sources of error in a real test. Nonetheless, we find these numbers encouraging.

Further, we note that while our method incurs an occasional false negative, the viral loads of these false negatives are fairly small. This means that super-spreaders (who are believed to have high viral loads [10]) will almost always be caught by our method. We discuss this in more detail in Section S.X of the supplemental material, and provide a table of the means and standard deviations of the viral loads of false negatives (Table S.VII) for all our methods on synthetic data.

Tapestry can detect certain errors caused by incorrect pipetting, pool contamination, or failed RT-PCR amplification of some pools. This is done by performing a consistency check after the Comp stage: if a pool is positive but all of the samples participating in it have been declared negative by Comp, this indicates an error. In case of error, we list all samples, categorized by the number of tests in which they are positive. However, the Comp consistency check will not catch all errors. Alternatively, the noisy Comp algorithm [12] may be used to correct for errors in the Comp stage. A full exposition on the detection and correction of errors is left as future work.
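A small sketch of this post-Comp consistency check, with hypothetical argument names, is given below.

```python
import numpy as np

def inconsistent_pools(A, pool_positive, cleared_by_comp):
    """Return indices of pools that tested positive even though every sample
    they contain was declared negative by COMP; such pools indicate a likely
    pipetting, contamination, or amplification error.
    A: m x n binary pooling matrix; pool_positive: length-m boolean array;
    cleared_by_comp: length-n boolean array of samples declared negative."""
    still_candidate = ~np.asarray(cleared_by_comp)
    explained = (A @ still_candidate.astype(float)) > 0     # pool contains a candidate positive
    return np.flatnonzero(np.asarray(pool_positive) & ~explained)
```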

Although Tapestry can work with a variety of sensing matrix designs, we found Kirkman matrices to be the most suitable for our purposes. This is due to the sparsity of Kirkman matrices (each sample participates in only three pools) and the smaller pool sizes they induce. Our algorithms also exhibit more stable behaviour over a wide range of the number of infected samples k when using Kirkman matrices. We compare some alternative matrix designs in Section S.VI.

V. Relation to Previous Work

We now review some recent works which apply CS or combinatorial group testing to COVID-19 testing. The works in [51]–[53] adopt a non-adaptive CS-based approach, while those in [54]–[56] use combinatorial group testing. Compared to these methods, our work differs in the following ways (also see [6]):

  • 1)

    Real/Synthetic data: Our work, as well as that in [52], reports results on real data, while the rest present only numerical or theoretical results.

  • 2)

    Quantitative noise model: Our work uses the physically derived noise model in Eqn. (3), as opposed to only Gaussian noise. This noise model is not considered in [51], and the work in [53] considers unknown noise. The combinatorial group testing methods in [54]–[56] do not make use of quantitative information. The work in [52] uses only binary test information, even though its decoding algorithm is based on CS.

  • 3)

    Algorithms: The work in [51] adopts the Bpdn technique (i.e. P1 from Eqn. (11)) as well as the brute-force search method for reconstruction. The works in [52], [57] use the Lasso, albeit with a ternary representation for the viral loads. The work in [53] uses Nnlad. We use the Lasso with a non-negativity constraint, the brute-force method, Nnlad, as well as other techniques such as Sbl and Nnomp, all in combination with Comp. The work in [51] assumes knowledge of the (Gaussian) noise variance for selecting Inline graphic in the estimator in Eqn. (11), whereas we use cross-validation for all our estimators. The technique in [52] uses a slightly different form of cross-validation for selecting the regularization parameter in the Lasso. Amongst combinatorial algorithms, [56] uses Comp, while [54] and [55] use message passing.

  • 4)

    Sensing matrix design: The work in [51] uses randomly generated expander graphs, whereas we use Kirkman matrices. The work in [52] uses randomly generated sparse Bernoulli matrices or Reed-Solomon codes, while [55] uses Low-Density Parity Check (LDPC) codes [58]. The work in [53] uses Euler square matrices [59], and the work in [56] uses the Shifted Transversal Design [60]; both are deterministic disjunct matrices, like Kirkman matrices. Each sample in our matrix participates in 3 pools, as opposed to 5 pools in [55], 6 pools in [52] and [56], and 8 pools in [53], which is advantageous from the point of view of pipetting time.

  • 5)

    Sparsity estimation: Our work uses an explicit sparsity estimator and does not rely on any assumption regarding the prevalence rate.

  • 6)

    Numerical comparisons: We found that Comp-Nnlad works better than the Nnlad method used in [53] on our matrices (see Tables 5 and S.XIX). We also found that Comp-Nnlasso and Comp-Sbl have better sensitivity and specificity than Comp-Nnlad (see Tables 2, 3, and 5). The method in [52] can correctly identify up to 5 infected samples out of 384 (1.3%) with 48 tests, with an average number of false positives below 2.75 and an average number of false negatives below 0.33. In synthetic simulations with their 48×384 Reed-Solomon code based matrix (released by the authors), over 100 vectors x with ℓ0 norm equal to 5, Comp-Nnlasso gave on average 1.51 false positives and 0.02 false negatives, with standard deviations of 1.439 and 0.14 respectively. Using Comp-Sbl instead of Comp-Nnlasso, with all other settings remaining the same, we obtained on average 1.4 false positives and 0.0 false negatives, with standard deviations of 1.6 and 0.1 respectively. A direct numerical comparison between our work and that in [52] is not possible due to the lack of available real data; however, these numbers provide some indication of performance.

  • 7)

    Number of tests: We use 93 tests for 961 samples while achieving more than 0.99 sensitivity and specificity for up to Inline graphic infections using Comp-Sbl. In a similar setting, [55] uses 108 tests for Inline graphic samples under a prevalence rate of 0.01 for exact two-stage recovery. The work in [56] uses 186 tests for 961 samples under the same prevalence rate, albeit with sensitivity equal to 1 and very high specificity. Matrix sizes studied in other works are very different from ours. The work in [61] builds on top of our Tapestry scheme to reduce the number of tests, but it is a two-stage adaptive technique and hence requires considerably more testing time.

VI. Conclusion

We have presented a non-adaptive, single-round technique for the prediction of infected samples as well as their viral loads, from a set of n samples, using a compressed sensing approach. We have shown empirically, on synthetic data as well as on some real lab acquisitions, that our technique can correctly predict the positive samples with a very small number of false positives and false negatives. Moreover, we have presented techniques for the appropriate design of the mixing matrix. Our single-round testing technique can be deployed in many different scenarios, such as the following:

  • 1)

    Testing of 105 symptomatic individuals in 45 tests.

  • 2)

    Testing of 195 asymptomatic individuals in 45 tests assuming a low rate of infection. A good use case for this is airport security personnel, delivery personnel, or hospital staff.

  • 3)

    Testing of 399 individuals in 63 tests. This can be used to test students coming back to campuses, or police force, or asymptomatic people in housing blocks and localities currently under quarantine.

  • 4)

    Testing of 961 people in 93 tests, assuming low infection rate. This might be suitable for airports and other places where samples can be collected and tested immediately, and it might be possible to obtain liquid handling robots.

Outputs: We have designed an Android app named Byom Smart Testing to make our Tapestry protocol easy to deploy in the future. The app can be accessed at [62]. We are also sharing our code and some data at [63]. More information is available at our website [64].

Future work: Future work will involve extensive testing on real COVID-19 data, as well as the implementation of a variety of algorithms for sensing matrix design and signal recovery, keeping in mind the accurate statistical noise model and accounting for occasional pipetting errors.

Acknowledgment

The authors would like to thank the two anonymous reviewers as well as the Associate Editor for careful review of the previous version of this article and helpful suggestions which have greatly improved this article.

Funding Statement

The work of Ajit Rajwade was supported in part by SERB Matrics under Grant MTR/2019/000691. The work of Ajit Rajwade and Manoj Gopalkrishnan was supported in part by IITB WRCB under Grant #10013976 and in part by the DST-Rakshak under Grant #10013980.

Footnotes

1

The two-stage procedure is purely algorithmic. It does not require two consecutive rounds of testing in a lab.

2

The girth of a graph is equal to the length of the shortest cycle in it.

3

defined as the number of 1 entries in a row

4

defined as the number of 1 entries in a column

Contributor Information

Sabyasachi Ghosh, Email: ssgosh@gmail.com.

Rishi Agarwal, Email: rishiagarwal@cse.iitb.ac.in.

Mohammad Ali Rehan, Email: alirehan@cse.iitb.ac.in.

Shreya Pathak, Email: shreyapathak@cse.iitb.ac.in.

Pratyush Agarwal, Email: pratyush@cse.iitb.ac.in.

Yash Gupta, Email: yashgupta@cse.iitb.ac.in.

Sarthak Consul, Email: sarthakconsul@iitb.ac.in.

Nimay Gupta, Email: nimay@cse.iitb.ac.in.

Ritika, Email: ritikagoyal@cse.iitb.ac.in.

Ritesh Goenka, Email: goenkaritesh12@gmail.com.

Ajit Rajwade, Email: ajitvr@cse.iitb.ac.in.

Manoj Gopalkrishnan, Email: manoj.gopalkrishnan@gmail.com.

References

  • [1].Benatia D., Godefroy R., and Lewis J., “Estimating COVID-19 prevalence in the United States: A sample selection model approach,” [Online]. Available: https://www.medrxiv.org/content/10.1101/2020.04.20.20072942v1
  • [2].Dorfman R., “The detection of defective members of large populations,” Ann. Math. Statist., vol. 14, no. 4, pp. 436–440, 1943. [Google Scholar]
  • [3].“Israelis introduce method for accelerated Covid-19 testing,” Accessed: Apr. 8, 2021. [Online]. Available: https://www.israel21c.org/israelis-introduce-method-for-accelerated-covid-19-testing/
  • [4].“Corona ‘pool testing’ increases worldwide capacities many times over,” Accessed: Apr. 8, 2021. [Online]. Available: https://healthcare-in-europe.com/en/news/corona-pool-testing-increases-worldwide-capacities-many-times-over.html
  • [5].Candes E. and Wakin M., “An introduction to compressive sampling,” IEEE Signal Process. Mag., vol. 25, no. 2, pp. 21–30, Mar. 2008. [Google Scholar]
  • [6].Ghosh S. et al. , “Tapestry: A single-round smart pooling technique for COVID-19 testing,” 2020, medRxiv. [Online]. Available: https://www.medrxiv.org/content/early/2020/05/02/2020.04.23.20077727
  • [7].Kueng R. and Jung P., “Robust nonnegative sparse recovery and the nullspace property of 0/1 measurements,” IEEE Trans. Inf. Theory, vol. 64, no. 2, pp. 689–703, Feb. 2018. [Google Scholar]
  • [8].Weisstein E. W., “Kirkman's schoolgirl problem,” from MathWorld-A Wolfram Web Resource. Accessed: Apr. 8, 2021. [Online]. Available: https://mathworld.wolfram.com/KirkmansSchoolgirlProblem.html
  • [9].Ray-Chaudhuri D. K. and Wilson R. M., “Solution of Kirkman's schoolgirl problem,” in Proc. Symp. Pure Math, 1971, vol. 19, pp. 187–203. [Google Scholar]
  • [10].Beldomenico P. M., “Do superspreaders generate new superspreaders? A hypothesis to explain the propagation pattern of COVID-19,” Int. J. Infect. Dis., vol. 96, pp. 461–463, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [11].Liu Y. et al. , “Viral dynamics in mild and severe cases of COVID-19,” Lancet Infect. Dis., vol. 20, no. 6, pp. 656–657, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [12].Chan C. L. et al. , “Non-adaptive probabilistic group testing with noisy measurements: Near-optimal bounds with efficient algorithms,” in Proc. 49th Annu. Allerton Conf. Commun., Control, Comput., 2011, pp. 1832–1839. [Google Scholar]
  • [13].Jawerth N., “How is the COVID-19 virus detected using real time RT-PCR?,” [Online]. Available: https://www.iaea.org/newscenter/news/how-is-the-covid-19-virus-detected-using-real-time-rt-pcr
  • [14].“Efficiency of real-time PCR,” Accessed: Apr. 5, 2021. [Online]. Available: https://www.thermofisher.com/in/en/home/life-science/pcr/real-time-pcr/real-time-pcr-learning-center/real-time-pcr-basics/efficiency-real-time-pcr-qpcr.html
  • [15].Aldridge M., Baldassini L., and Johnson O., “Group testing algorithms: Bounds and simulations,” IEEE Trans. Inf. Theory, vol. 60, no. 6, pp. 3671–3687, Jun. 2014. [Google Scholar]
  • [16].Gilbert A., Iwen M., and Strauss M., “Group testing and sparse signal recovery,” in Proc. Asilomar Conf. Signals, Syst. Comput., 2008, pp. 1059–1063. [Google Scholar]
  • [17].Zhao N., D. O’Connor, A. Basarab, D. Ruan, and K. Sheng, “Motion compensated dynamic MRI reconstruction with local affine optical flow estimation,” IEEE Trans. Biomed. Eng., vol. 66, no. 11, pp. 3050–3059, Nov. 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [18].Zhang Z., T. Jung, S. Makeig, and B. D. Rao, “Compressed sensing for energy-efficient wireless telemonitoring of noninvasive fetal ECG via block sparse bayesian learning,” IEEE Trans. Biomed. Eng., vol. 60, no. 2, pp. 300–309, Feb. 2013. [DOI] [PubMed] [Google Scholar]
  • [19].Liu Y., Vos M. D., and Huffel S. V., “Compressed sensing of multichannel EEG signals: The simultaneous cosparsity and low-rank optimization,” IEEE Trans. Biomed. Eng., vol. 62, no. 8, pp. 2055–2061, Aug. 2015. [DOI] [PubMed] [Google Scholar]
  • [20].Candes E., “The restricted isometry property and its implications for compressive sensing,” Comptes Rendus Mathematiques, vol. 346, no. 9-10, pp. 589–592, 2008. [Google Scholar]
  • [21].Baraniuk R. et al. , “A simple proof of the restricted isometry property for random matrices,” Constructive Approximation, vol. 28, pp. 253–263, 2008. [Google Scholar]
  • [22].DeVore R., “Deterministic construction of compressed sensing matrices,” J. Complexity, vol. 23, pp. 918–925, 2007. [Google Scholar]
  • [23].Lotfi M. and Vidyasagar M., “Compressed sensing using binary matrices of nearly optimal dimensions,” IEEE Trans. Signal Process., vol. 68, pp. 3008–3021, 2020, doi: 10.1109/TSP.2020.2990154. [DOI] [Google Scholar]
  • [24].Davenport M. et al. , “Introduction to compressed sensing,” in Compressed Sensing: Theory and Applications, Y. Eldar and G. Kutyniok, Eds. Cambridge, U.K.: Cambridge Univ. Press, 2012, pp. 1–64. [Google Scholar]
  • [25].Berinde R. et al. , “Combining geometry and combinatorics: A unified approach to sparse signal recovery,” in Proc. 46th Annu. Allerton Conf. Commun., Control, Comput., 2008, pp. 798–805. [Google Scholar]
  • [26].Hastie T., Tibshirani R., and Wainwright M., Statistical Learning With Sparsity: The LASSO and Generalizations. Boca Raton, FL, USA: CRC Press, 2015. [Google Scholar]
  • [27].Pati Y., Rezaiifar R., and Krishnaprasad P., “Orthogonal matching pursuit: Recursive function approximation with application to wavelet decomposition,” in Proc. Asilomar Conf. Signals, Syst. Comput., 1993, pp. 40–44. [Google Scholar]
  • [28].Cai T. T. and Wang L., “Orthogonal matching pursuit for sparse signal recovery with noise,” IEEE Trans. Inf. Theory, vol. 57, no. 7, pp. 4680–4688, Jul. 2011. [Google Scholar]
  • [29].Yaghoobi M., Wu D., and Davies M., “Fast non-negative orthogonal matching pursuit,” IEEE Signal Process. Lett., vol. 22, no. 9, pp. 1229–1233, Sep. 2015. [Google Scholar]
  • [30].Tipping M., “Sparse bayesian learning and the relevance vector machine,” J. Mach. Learn. Res., vol. 1, pp. 211–244, 2001. [Google Scholar]
  • [31].Wipf D. and Rao B. D., “Sparse bayesian learning for basis selection,” IEEE Trans. Signal Process., vol. 52, no. 8, pp. 2153–2164, Aug. 2004. [Google Scholar]
  • [32].Marques E. Crespo et al. , “A review of sparse recovery algorithms,” IEEE Access, vol. 7, pp. 1300–1322, 2019. [Google Scholar]
  • [33].Nalci A., I. Fedorov, M. Al-Shoukairi, T. T. Liu, and B. D. Rao, “Rectified Gaussian scale mixtures and the sparse non-negative least squares problem,” IEEE Trans. Signal Process., vol. 66, no. 12, pp. 3124–3139, Jun. 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [34].Petersen H., Bah B., and Jung P., “Efficient noise-blind Inline graphic-regression of nonnegative compressible signals,” 2020, arXiv:2003.13092. [Google Scholar]
  • [35].Lotfi M. and Vidyasagar M., “A fast noniterative algorithm for compressive sensing using binary measurement matrices,” IEEE Trans. Signal Process., vol. 66, no. 15, pp. 4079–4089, Aug. 2018. [Google Scholar]
  • [36].Raginsky M., S. Jafarpour, Z. T. Harmany, R. F. Marcia, R. M. Willett, and R. Calderbank, “Performance bounds for expander-based compressed sensing in poisson noise,” IEEE Trans. Sig. Process., vol. 59, no. 9, pp. 4139–4153, Sep. 2011. [Google Scholar]
  • [37].Wikipedia contributors, “Steiner triple systems,” 2021. Accessed: Apr. 8, 2021. [Online]. Available: https://en.wikipedia.org/wiki/Steiner_system#Steiner_triple_systems
  • [38].Pegg E. J., “Social golfer problem,” from MathWorld-A Wolfram Web Resource, created by E. W. Weisstein. Accessed: Apr. 8, 2021. [Online]. Available: https://mathworld.wolfram.com/SocialGolferProblem.html
  • [39].“Math games: Social golfer problem,” [Online]. Available: http://www.mathpuzzle.com/MAA/54-Golf%20Tournaments/mathgames_08_14_07.html
  • [40].Triska M., Solution Methods for the Social Golfer Problem. Citeseer, 2008. [Google Scholar]
  • [41].Dotú I. and Van Hentenryck P., “Scheduling social golfers locally,” in Proc. Int. Conf. Integration Artif. Intell. (AI) Operations Res. (OR) Techn. Constraint Program., Springer, 2005, pp. 155–167. [Google Scholar]
  • [42].Johnson S. J. and Weller S. R., “Construction of low-density parity-check codes from Kirkman triple systems,” in Proc. IEEE Global Telecommun. Conf., 2001, vol. 2, pp. 970–974. [Google Scholar]
  • [43].Vermeirssen V. et al. , “Matrix and steiner-triple-system smart pooling assays for high-performance transcription regulatory network mapping,” Nature Methods, vol. 4, no. 8, pp. 659–664, 2007. [DOI] [PubMed] [Google Scholar]
  • [44].Wikipedia contributors, “Steiner systems,” 2021. Accessed: Apr. 8, 2021. [Online]. Available: https://en.wikipedia.org/wiki/Steiner_system
  • [45].Tonchev V. D., “Steiner systems for two-stage disjunctive testing,” J. Combinatorial Optim., vol. 15, no. 1, pp. 1–6, 2008. [Google Scholar]
  • [46].Studer C. and Baraniuk R., “Stable restoration and separation of approximately sparse signals,” Appl. Comput. Harmon. Anal., vol. 37, no. 1, pp. 12–35, 2014. [Google Scholar]
  • [47].Abdoghasemi V. et al. , “On optimization of the measurement matrix for compresive sensing,” in Proc. 18th Eur. Signal Process. Conf., 2010, pp. 427–431. [Google Scholar]
  • [48].Bioglio V., Bianchi T., and Magli E., “On the fly estimation of the sparsity degree in compressed sensing using sparse sensing matrices,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2015, pp. 3801–3805. [Google Scholar]
  • [49].Ravazzi C. et al., “Sparsity estimation from compressive projections via sparse random matrices,” EURASIP J. Adv. Signal Process., vol. 56, 2018. [Online]. Available: https://doi.org/10.1186/s13634-018-0578-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [50].“In vitro diagnostics euas, section: Templates for eua submissions/diagnostic templates (molecular and antigen), bullet: Molecular diagnostic template for laboratories,” Accessed: Mar. 27, 2021. [Online]. Available: https://www.fda.gov/medical-devices/coronavirus-disease-2019-covid-19-emergency-use-authorizations-medical-devices/in-vitro-diagnostics-euas#covid19ivdTemplates
  • [51].Yi J., Mudumbai R., and Xu W., “Low-cost and high-throughput testing of COVID-19 viruses and antibodies via compressed sensing: System concepts and computational experiments,” 2020. [Online]. Available: https://arxiv.org/abs/2004.05759
  • [52].N. Shental et al. , “Efficient high-throughput SARS-CoV-2 testing to detect asymptomatic carriers,” Sci. Adv., 2020. [Online]. Available: https://advances.sciencemag.org/content/early/2020/08/20/sciadv.abc5961 [DOI] [PMC free article] [PubMed]
  • [53].Petersen H., Bah B., and Jung P., “Practical high-throughput, non-adaptive and noise-robust SARS-CoV-2 testing,” 2020. [Online]. Available: https://arxiv.org/abs/2007.09171
  • [54].Zhu J., Rivera K., and Baron D., “Noisy pooled PCR for virus testing,” 2020. [Online]. Available: https://arxiv.org/abs/2004.02689
  • [55].Seong J.-T., “Group testing-based robust algorithm for diagnosis of COVID-19,” Diagnostics, vol. 10, no. 6, 2020, Art. no. 396, doi: 10.3390/diagnostics10060396. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [56].Täufer M., “Rapid, large-scale, and effective detection of COVID-19 via non-adaptive testing,” J. Theor. Biol., vol. 506, 2020, Art. no. 110450. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [57].Nida H. et al. , “Highly efficient de novo mutant identification in a sorghum bicolor tilling population using the comseq approach,” Plant J., vol. 86, no. 4, pp. 349–359, 2016. [DOI] [PubMed] [Google Scholar]
  • [58].MacKay D. J., “Good error-correcting codes based on very sparse matrices,” IEEE Trans. Inf. Theory, vol. 45, no. 2, pp. 399–431, Mar. 1999. [Google Scholar]
  • [59].Naidu R. R., Jampana P., and Sastry C. S., “Deterministic compressed sensing matrices: Construction via euler squares and applications,” IEEE Trans. Signal Process., vol. 64, no. 14, pp. 3566–3575, Jul. 2016. [Google Scholar]
  • [60].Thierry-Mieg N., “A new pooling strategy for high-throughput screening: The shifted transversal design,” BMC Bioinf., vol. 7, no. 28, 2006. [Online]. Available: https://doi.org/10.1186/1471-2105-7-28 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [61].Heidarzadeh A. and Narayanan K. R., “Two-stage adaptive pooling with RT-QPCR for COVID-19 screening,” 2020, arXiv:2007.02695.
  • [62].“Byom app,” [Online]. Available: https://rebrand.ly/byom-app
  • [63].“Tapestry code,” [Online]. Available: https://github.com/atoms-to-intelligence/tapestry
  • [64].“Tapestry website,” [Online]. Available: https://www.tapestry-pooling.com/
  • [65].Du D., Hwang F. K., and Hwang F., Combinatorial Group Testing and Its Applications. Singapore: World Scientific, 2000. [Google Scholar]
  • [66].Zhang J. et al. , “On the theoretical analysis of cross validation in compressive sensing,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2014, pp. 3370–3374. [Google Scholar]
  • [67].Li Y. and Raskutti G., “Minimax optimal convex methods for poisson inverse problems under Inline graphic-ball sparsity,” IEEE Trans. Inf. Theory, vol. 64, no. 8, pp. 5498–5512, Aug. 2018. [Google Scholar]
