2021 May 20;8(3):2170–2182. doi: 10.1109/TNSE.2021.3081759

Positively Correlated Samples Save Pooled Testing Costs

Yi-Jheng Lin, Che-Hao Yu, Tzu-Hsuan Liu, Cheng-Shang Chang, Wen-Tsuen Chen
PMCID: PMC8769016  PMID: 35783009

Abstract

The group testing approach, which achieves significant cost reduction over the individual testing approach, has received a lot of interest lately for massive testing of COVID-19. Many studies simply assume samples mixed in a group are independent. However, this assumption may not be reasonable for a contagious disease like COVID-19. Specifically, people within a family tend to infect each other and thus are likely to be positively correlated. By exploiting positive correlation, we make the following two main contributions. One is to provide a rigorous proof that further cost reduction can be achieved by using the Dorfman two-stage method when samples within a group are positively correlated. The other is to propose a hierarchical agglomerative algorithm for pooled testing with a social graph, where an edge in the social graph connects frequent social contacts between two persons. Such an algorithm leads to notable cost reduction (roughly 20–35%) compared to random pooling when the Dorfman two-stage algorithm is applied.

Keywords: COVID-19, group testing, regenerative processes, Markov modulated processes, social networks

I. Introduction

MASSIVE testing is one of the most effective measures to detect and isolate asymptomatic COVID-19 infections to reduce the transmission rate of COVID-19 [1]. However, massive testing for a large population is very costly if samples are tested one at a time. A recent article posted on the US FDA website [2] indicates that the group testing approach (also called pool testing, pooled testing, or batch testing) has received a lot of interest lately. Such an approach (testing a group of mixed samples) can greatly save testing resources for a population with a low prevalence rate [3]–[6]. Moreover, the following testing procedure is suggested in the US CDC's guidance for the use of pooling procedures in SARS-CoV-2 [7]:

“If a pooled test result is negative, then all specimens can be presumed negative with the single test. If the test result is positive or indeterminate, then all the specimens in the pool need to be retested individually.”

A simple testing procedure that implements the above guidance is known as Dorfman's two-stage group testing method [8]. The method first partitions the population into groups of $K$ samples. If the test of a group of $K$ samples is negative, then all the $K$ samples in that group are declared to be negative. Otherwise, each sample in that group is retested individually. Such a method has been implemented by many countries for massive testing of COVID-19 [9].

To measure the amount of saving of a group testing method, Dorfman used the expected relative cost, defined as the ratio of the expected number of tests required by the group testing method to the number of tests required by individual testing. The expected relative cost for independent and identically distributed (i.i.d.) samples was derived in [8]. Suppose that the prevalence rate (the probability that a randomly selected sample is positive) is $r$. Note that if the test result of a group is positive, all the samples in that group need to be retested individually. For a group of $K$ samples, the group tests positive with probability $1 - q^K$, where $q = 1 - r$. So the expected number of tests for the group is $1 + K(1 - q^K)$. Thus, the expected relative cost for i.i.d. samples with group size $K$ is

$$\frac{1 + K(1 - q^K)}{K} = \frac{1}{K} + 1 - q^K. \tag{1}$$

One can then use (1) to optimize the group size $K$ according to the prevalence rate [8].
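A minimal Python sketch of (1) follows: it evaluates the i.i.d. expected relative cost and searches the group sizes $K = 2, \dots, 100$ for the smallest one. The function names and the search range are illustrative assumptions, not taken from [8].

```python
def dorfman_cost_iid(r: float, k: int) -> float:
    """Expected relative cost (1) for i.i.d. samples, prevalence r, group size k."""
    q = 1.0 - r  # probability that a single sample is negative
    return 1.0 / k + 1.0 - q ** k


def optimal_group_size(r: float, k_max: int = 100) -> int:
    """Group size in 2..k_max that minimizes (1); ties go to the smaller size."""
    return min(range(2, k_max + 1), key=lambda k: dorfman_cost_iid(r, k))


if __name__ == "__main__":
    for r in (0.01, 0.02, 0.05):
        k = optimal_group_size(r)
        print(f"r = {r:.0%}: K = {k}, relative cost = {dorfman_cost_iid(r, k):.4f}")
```

For $r = 1\%$, $2\%$, and $5\%$, this search returns $K = 11$, $8$, and $5$, the group sizes used for these prevalence rates in Table I.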

There are more sophisticated group testing methods for implementing the CDC's guidance for testing COVID-19 (see, e.g., [10]–[15]). These methods require diluting a sample and then pooling the diluted samples into multiple groups (pooled samples). Such methods are specified by two components: (i) a pooling matrix that directs each diluted sample to be pooled into a specific group, and (ii) a decoding algorithm that uses the test results of pooled samples to reconstruct the status (i.e., a positive or negative result) of each sample. As shown in the recent comparative study [15], the expected relative costs of such methods depend heavily on the pooling matrix, and one has to select an appropriate pooling matrix according to the prevalence rate. For i.i.d. samples, using such sophisticated methods results in significant gains over the simple Dorfman two-stage group testing method, in particular when the prevalence rate is low (below 5%).

In practice, samples are not i.i.d. For a contagious disease like COVID-19, people in the same family (or social bubble) are likely to infect each other. Lendle et al. [16] studied the efficiency (i.e., the expected relative costs) for group testing methods when samples within a group are positively correlated exchangeable random variables. They derived closed-form expressions of efficiency for hierarchical- and matrix-based group testing methods under certain assumptions, and examined three models of exchangeable binary random variables. They concluded that positive correlations between samples within a group could improve efficiency.

Moreover, in the recent WHO research article [17], it was shown by computer simulations that pooled samples from homogeneous groups of similar people could lead to cost reduction for the Dorfman two-stage method. The main objective of this paper is to provide insight and proof for that observation through a mathematical model.

Let us consider a testing site where people form a line (or queue) to be tested. It is reasonable to assume that people arriving in groups of various sizes are in contiguous positions of the line. Since the disease prevalence rates in two arriving groups may differ, we say that two groups are of the same type if they have the same prevalence rate. People in $K$ contiguous positions are pooled together and tested by using Dorfman's two-stage group testing method. For our analysis, we make the following three mathematical assumptions:

  • (A1)

    i.i.d. group sizes: The sizes of arriving groups of people are i.i.d. with a finite mean.

  • (A2)

    i.i.d. group types: There are $M$ types of arriving groups. The types of arriving groups of people are i.i.d. With probability $p_m$, a group of arriving people is of type $m$, $m = 1, 2, \dots, M$.

  • (A3)

    Homogeneous samples within the same group: Samples obtained from people within the same group are i.i.d. Bernoulli random variables with the same prevalence rate. With probability $q_m$ (resp. $1 - q_m$), a sample in a type $m$ group is negative (resp. positive).

An illustration of an arrival process in a testing site is provided in Fig. 1. In this figure, the number of people in the first group $G_1$ is 4, the number of people in the second group $G_2$ is 7, the number of people in the third group $G_3$ is 6, and the number of people in the fourth group $G_4$ is 5. Eight samples in contiguous positions are pooled together for Dorfman's two-stage group testing, i.e., $K = 8$.

Fig. 1.


An illustration of an arrival process in a testing site, where $G_i$ is the group size of the $i$-th arriving group, $Y_t$ is the group type of the $t$-th sample, and $K$ samples in contiguous positions are pooled together for Dorfman's two-stage group testing.
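To make (A1)-(A3) concrete, the following Python sketch simulates a line of samples and measures the relative cost of Dorfman's method on consecutive pools. Purely for illustration, it assumes geometric group sizes (one distribution with a finite mean, as (A1) requires) and two group types; the helper names are hypothetical. By Theorem 1, the empirical cost should not exceed the i.i.d. value from (1) at the same prevalence rate, up to Monte Carlo noise.

```python
import random


def geometric_size(alpha):
    """Sampler with P(G = g) = (1 - alpha) * alpha**(g - 1), g = 1, 2, ... (finite mean)."""
    def draw(rng):
        g = 1
        while rng.random() < alpha:
            g += 1
        return g
    return draw


def simulate_line(n, size_sampler, p, q, rng):
    """Indicators X_1..X_n of the first n samples in the line (1 = positive)."""
    xs = []
    while len(xs) < n:
        g = size_sampler(rng)                                         # (A1): i.i.d. group size
        m = rng.choices(range(len(p)), weights=p)[0]                  # (A2): i.i.d. group type
        xs.extend(0 if rng.random() < q[m] else 1 for _ in range(g))  # (A3): i.i.d. Bernoulli
    return xs[:n]


def empirical_relative_cost(xs, k):
    """Tests used by Dorfman's method on consecutive pools of size k, per sample tested."""
    n_used = len(xs) - len(xs) % k  # drop an incomplete last pool
    tests = 0
    for start in range(0, n_used, k):
        tests += 1 + (k if any(xs[start:start + k]) else 0)
    return tests / n_used


if __name__ == "__main__":
    rng = random.Random(2021)
    p, q, k = [0.1, 0.9], [0.8, 1.0], 8  # prevalence r = 0.1 * (1 - 0.8) = 0.02
    xs = simulate_line(200_000, geometric_size(0.5), p, q, rng)
    print("pooled consecutively:", empirical_relative_cost(xs, k))
    print("i.i.d. value from (1):", 1 / k + 1 - (1 - 0.02) ** k)
```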

Denote by $X_t$ the indicator random variable of the $t$-th sample in the line of the testing site. We say the $t$-th sample is negative (resp. positive) if $X_t = 0$ (resp. $X_t = 1$). Consider using the Dorfman two-stage method for testing the $K$ consecutive samples $X_{t+1}, X_{t+2}, \dots, X_{t+K}$ for some fixed $t \ge 0$. With probability

$$1 - \mathsf{P}(X_{t+1} = 0, X_{t+2} = 0, \dots, X_{t+K} = 0),$$

the test result for the group of $K$ consecutive samples is positive and they need to be tested individually. Thus, the expected number of tests is

$$1 + K\big(1 - \mathsf{P}(X_{t+1} = 0, \dots, X_{t+K} = 0)\big).$$

As such, the expected relative cost for these $K$ samples by the Dorfman two-stage method is

$$\frac{1}{K} + 1 - \mathsf{P}(X_{t+1} = 0, \dots, X_{t+K} = 0). \tag{2}$$

We state the first main result of this paper in the following theorem.

Theorem 1: —

Suppose that the arrival process $\{X_t, t \ge 1\}$ satisfies (A1)-(A3). The expected relative cost for pooling any $K$ consecutive samples into a group is not higher than that for pooling $K$ samples at random, i.e., the expected relative cost in (2) is not higher than (1) with

$$q = \sum_{m=1}^{M} p_m q_m. \tag{3}$$

Our second main result is the monotonicity of the expected relative cost under a stronger assumption than (A1).

  • (A1′) The group sizes are independent and geometrically distributed with parameter $1 - \alpha$, i.e., $\mathsf{P}(G_i = g) = (1 - \alpha)\alpha^{g-1}$ for $g = 1, 2, \dots$, for some $0 \le \alpha < 1$.

Theorem 2: —

Suppose that the arrival process $\{X_t, t \ge 1\}$ satisfies (A1′), (A2), and (A3). Then the expected relative cost in (2) is decreasing in $\alpha$.

Note that when $\alpha = 0$, every group has size 1, and $\{X_t, t \ge 1\}$ reduces to a sequence of i.i.d. samples with the prevalence rate $r = 1 - q$. As such, under (A1′), the monotonicity result in Theorem 2 is stronger than the result in Theorem 1.

Our third main result is a closed-form expression for the expected relative cost under (A1′), (A2), and (A3).

Theorem 3: —

Under (A1′), (A2), and (A3), the expected relative cost is

$$\frac{1}{K} + 1 - \mathbf{p}\,\mathbf{Q}\,(\mathbf{P}\mathbf{Q})^{K-1}\,\mathbf{1}, \tag{4}$$

where $\mathbf{P}$ is the $M \times M$ matrix with

$$\mathbf{P}_{m,m'} = \alpha\,\mathbb{1}_{\{m = m'\}} + (1 - \alpha)\,p_{m'}, \tag{5}$$

$\mathbf{Q}$ is the diagonal matrix with the $m$-th diagonal element being $q_m$, $\mathbf{1}$ is the $M \times 1$ (column) vector with all its elements being 1, and $\mathbf{p}$ is the $1 \times M$ (row) vector with its $m$-th element being $p_m$.
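The closed form (4) is straightforward to evaluate numerically. Below is a dependency-free Python sketch (the function name and argument layout are our own) that builds $\mathbf{P}$ from (5) and applies $\mathbf{P}\mathbf{Q}$ to the all-ones vector $K - 1$ times instead of forming a matrix power.

```python
def dorfman_cost_markov(p, q, alpha, k):
    """Expected relative cost (4) under (A1'), (A2), (A3).

    p[m]  : probability that an arriving group is of type m (A2)
    q[m]  : probability that a sample in a type-m group is negative (A3)
    alpha : group sizes are geometric with parameter 1 - alpha (A1')
    k     : pool size K
    """
    n = len(p)
    # Transition matrix (5): P[m][m2] = alpha * 1{m == m2} + (1 - alpha) * p[m2]
    P = [[alpha * (1.0 if m == m2 else 0.0) + (1.0 - alpha) * p[m2] for m2 in range(n)]
         for m in range(n)]
    beta = [1.0] * n  # the all-ones column vector 1
    for _ in range(k - 1):
        # beta <- (P Q) beta, where Q = diag(q) scales column m2 of P by q[m2]
        beta = [sum(P[m][m2] * q[m2] * beta[m2] for m2 in range(n)) for m in range(n)]
    prob_all_negative = sum(p[m] * q[m] * beta[m] for m in range(n))  # p Q (PQ)^(K-1) 1
    return 1.0 / k + 1.0 - prob_all_negative


if __name__ == "__main__":
    # Two types: type 0 always positive, type 1 always negative
    print(dorfman_cost_markov([0.01, 0.99], [0.0, 1.0], alpha=0.1, k=11))
```

With $\alpha = 0$ the sketch reduces to (1) with $q = \sum_m p_m q_m$, which is a convenient sanity check.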

We can further derive the lower bound of the expected relative cost in (4).

Theorem 4: —

Under (A1′), (A2), and (A3), the expected relative cost is lower bounded by

$$\frac{1}{K} + 1 - q\big(\alpha + (1 - \alpha)q\big)^{K-1}, \tag{6}$$

where $q = \sum_{m=1}^{M} p_m q_m$.
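As a numerical spot check (not a proof), the sketch below draws random parameters and confirms that the cost from (4) never falls below the bound (6). The random parameter ranges and the helper names are arbitrary choices for illustration.

```python
import random


def prob_all_negative(p, q, alpha, k):
    """P(X_1 = 0, ..., X_K = 0) = p Q (P Q)^(K-1) 1 from Theorem 3."""
    n = len(p)
    beta = [1.0] * n
    for _ in range(k - 1):
        beta = [sum((alpha * (1.0 if i == j else 0.0) + (1.0 - alpha) * p[j]) * q[j] * beta[j]
                    for j in range(n)) for i in range(n)]
    return sum(p[i] * q[i] * beta[i] for i in range(n))


def lower_bound(p, q, alpha, k):
    """Bound (6): 1/K + 1 - qbar (alpha + (1 - alpha) qbar)^(K-1), qbar = sum_m p_m q_m."""
    qbar = sum(pm * qm for pm, qm in zip(p, q))
    return 1.0 / k + 1.0 - qbar * (alpha + (1.0 - alpha) * qbar) ** (k - 1)


def count_violations(trials=1000, seed=1):
    """Number of random instances in which the cost from (4) falls below (6)."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        m = rng.randint(1, 4)
        w = [rng.random() + 1e-9 for _ in range(m)]
        p = [x / sum(w) for x in w]
        q = [rng.random() for _ in range(m)]
        alpha = 0.99 * rng.random()
        k = rng.randint(1, 15)
        cost = 1.0 / k + 1.0 - prob_all_negative(p, q, alpha, k)
        if cost < lower_bound(p, q, alpha, k) - 1e-12:
            violations += 1
    return violations


if __name__ == "__main__":
    print(count_violations())  # Theorem 4 predicts 0
```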

Using the closed-form expression in Theorem 3, we compare the expected relative cost of the simple Dorfman two-stage method with the lowest expected relative cost of the $(d_1, d_2)$-regular pooling matrices in [15]. Our numerical results show that, with a moderate positive correlation, this simple method outperforms the more sophisticated strategies based on $(d_1, d_2)$-regular pooling matrices when the prevalence rate is higher than 5%.

The results for samples in a line of a testing site only exploit the positive correlations between two contiguous samples in a line graph. One important extension is to consider pooled testing with a social graph, in which an edge connects two persons who have frequent social contacts. Because a contagious disease such as COVID-19 can propagate from an infected person to another person through social contacts, two persons connected by an edge are likely to infect each other, and their samples are likely to be positively correlated. To exploit the positive correlation in a social graph, we adopt the probabilistic framework of sampled graphs for structural analysis in [18]–[20]. In particular, we propose a hierarchical agglomerative algorithm for pooled testing with a social graph (see Algorithm 1). Our numerical results show that such an algorithm leads to significant cost reduction (roughly 20%–35%) compared to random pooling when the Dorfman two-stage algorithm is used.

The paper is organized as follows. In Section II-A, we prove Theorem 1 and Theorem 2 by using the renewal property of regenerative processes. We then prove Theorem 3 and Theorem 4 in Section II-B by using the Markov property of Markov modulated processes. In Section III, we extend the dependency of samples from a line graph to a general graph, and we propose a hierarchical agglomerative algorithm to exploit the positive correlation of samples. The numerical results are shown in Section IV. The paper is concluded in Section V, where we discuss possible extensions for future work.

II. Mathematical Analyses and Proofs

A. Regenerative processes

In this section, we prove the main result in Theorem 1 and Theorem 2 by using the renewal property of regenerative processes (see, e.g., Section 6.3 of the book [21]).

Let $G_i$ be the number of samples in the $i$-th group, and $T_i = G_1 + G_2 + \cdots + G_i$ (with $T_0 = 0$) be the cumulative number of samples in the first $i$ groups. Since we assume that $\{G_i, i \ge 1\}$ are i.i.d. in (A1), $\{T_i, i \ge 0\}$ is a renewal process. From (A2) and (A3), $\{X_t, t \ge 1\}$ is a regenerative process with the regenerative points $\{T_i, i \ge 0\}$, i.e., $\{X_{T_i + t}, t \ge 1\}$ has the same joint distribution as $\{X_t, t \ge 1\}$.

In the following lemma, we derive the prevalence rate.

Lemma 5: —

The prevalence rate of a randomly selected sample for the arrival process satisfying (A1)-(A3) is

$$r = \sum_{m=1}^{M} p_m (1 - q_m). \tag{7}$$

Thus, $q = 1 - r = \sum_{m=1}^{M} p_m q_m$.

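Proof. Let $Y_t$ be the group type of the $t$-th sample. In view of (A2), we have

$$\mathsf{P}(Y_t = m) = p_m, \quad m = 1, 2, \dots, M. \tag{8}$$

Also, from (A3),

$$\mathsf{P}(X_t = 1 \mid Y_t = m) = 1 - q_m. \tag{9}$$

From the law of total probability, it follows that

$$\mathsf{P}(X_t = 1) = \sum_{m=1}^{M} \mathsf{P}(Y_t = m)\,\mathsf{P}(X_t = 1 \mid Y_t = m) = \sum_{m=1}^{M} p_m (1 - q_m). \tag{10}$$

As (10) holds for any arbitrary $t$, the prevalence rate of a randomly selected sample is the same as (10).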

Now we prove Theorem 1.

Proof. (Theorem 1) In view of (2), it suffices to show that for any $t \ge 0$,

$$\mathsf{P}(X_{t+1} = 0, X_{t+2} = 0, \dots, X_{t+K} = 0) \ge q^K. \tag{11}$$

For this, we first show that (11) holds for $t = 0$ by induction on $K$. Since $\mathsf{P}(X_1 = 0) = q$ from Lemma 5, the inequality in (11) holds trivially for $K = 1$. Assume that the inequality in (11) holds for $t = 0$ and all $K' = 1, 2, \dots, K - 1$ as the induction hypothesis. From the law of total probability, we have

$$\mathsf{P}(X_1 = 0, \dots, X_K = 0) = \sum_{k=1}^{\infty} \mathsf{P}(G_1 = k)\,\mathsf{P}(X_1 = 0, \dots, X_K = 0 \mid G_1 = k). \tag{12}$$

Conditioning on the event $G_1 = k$ for $k \ge K$, the number of samples in the first group is not smaller than $K$. Thus, for $k \ge K$, we have from (A2) and (A3) that

$$\mathsf{P}(X_1 = 0, \dots, X_K = 0 \mid G_1 = k) = \sum_{m=1}^{M} p_m q_m^K \ge \Big(\sum_{m=1}^{M} p_m q_m\Big)^K = q^K, \tag{13}$$

where the last inequality follows from Jensen's inequality for the convex function $x^K$. For $1 \le k < K$, we know that the second group starts from $X_{k+1}$. It then follows from the renewal property in (A1) that

$$\begin{aligned}
\mathsf{P}(X_1 = 0, \dots, X_K = 0 \mid G_1 = k) &= \sum_{m=1}^{M} p_m q_m^k\,\mathsf{P}(X_1 = 0, \dots, X_{K-k} = 0) \\
&\ge q^k\,\mathsf{P}(X_1 = 0, \dots, X_{K-k} = 0) \\
&\ge q^k q^{K-k} = q^K,
\end{aligned} \tag{14}$$

where the second last inequality follows from Jensen's inequality for the convex function $x^k$, and the last inequality follows from the induction hypothesis. Using (13) and (14) in (12) completes the induction for $t = 0$ in (11).

Now we show that (11) holds for any arbitrary $t$. For a fixed $t$, let $R_t$ be the residual life from $t$ to the next regenerative point, i.e., the number of samples from the $(t+1)$-th sample to the end of its group. The argument for any arbitrary $t$ then follows from the same inductive proof for $t = 0$ by replacing $G_1$ with $R_t$.

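In the proof of Theorem 1, we show that

$$\mathsf{P}(X_{t+1} = 0, \dots, X_{t+K} = 0) \ge \prod_{k=1}^{K} \mathsf{P}(X_{t+k} = 0). \tag{15}$$

By replacing $q_m$ by $1 - q_m$ in the proof of Theorem 1, one can also show that

$$\mathsf{P}(X_{t+1} = 1, \dots, X_{t+K} = 1) \ge \prod_{k=1}^{K} \mathsf{P}(X_{t+k} = 1). \tag{16}$$

Letting $K = 2$ in (16) yields the following corollary.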

Corollary 6: —

Suppose that the arrival process $\{X_t, t \ge 1\}$ satisfies (A1)-(A3). Then $X_t$ and $X_{t+1}$ are positively correlated, i.e.,

$$\mathbf{E}[X_t X_{t+1}] \ge \mathbf{E}[X_t]\,\mathbf{E}[X_{t+1}], \tag{17}$$

where $\mathbf{E}[X]$ denotes the expectation of the random variable $X$.

There are two key properties used in the proof of Theorem 1: the regenerative property and Jensen's inequality (for convex functions). To prove Theorem 2, we need the following generalization of Jensen's inequality.

Lemma 7: —

For any positive integers $k_1, k_2, \dots, k_n$,

$$\sum_{m=1}^{M} p_m q_m^{k_1 + k_2 + \cdots + k_n} \ge \prod_{i=1}^{n} \Big(\sum_{m=1}^{M} p_m q_m^{k_i}\Big). \tag{18}$$

Note that for $k_1 = k_2 = \cdots = k_n = 1$, the inequality in (18) reduces to Jensen's inequality for the convex function $x^n$ used in the proof of Theorem 1.

Proof. Consider a random variable $Z$ with the probability mass function $\mathsf{P}(Z = q_m) = p_m$, $m = 1, 2, \dots, M$. Since $q_m \ge 0$ for all $m$, $Z$ is nonnegative. Then the right-hand side of (18) can be written as $\prod_{i=1}^{n} \mathbf{E}[Z^{k_i}]$. Similarly, the left-hand side of (18) can be written as $\mathbf{E}[Z^{k_1 + \cdots + k_n}]$. Thus, it suffices to show that

$$\mathbf{E}[Z^{k_1 + k_2 + \cdots + k_n}] \ge \prod_{i=1}^{n} \mathbf{E}[Z^{k_i}]. \tag{19}$$

We show (19) by induction on $n$. For $n = 2$, we consider two independent random variables $Z_1$ and $Z_2$ that have the same distribution as $Z$. Since $Z_1$ and $Z_2$ are nonnegative, for any two positive integers $k_1$ and $k_2$,

$$(Z_1^{k_1} - Z_2^{k_1})(Z_1^{k_2} - Z_2^{k_2}) \ge 0. \tag{20}$$

To see this, note that if $Z_1 \ge Z_2$, then $Z_1^{k_1} \ge Z_2^{k_1}$ and $Z_1^{k_2} \ge Z_2^{k_2}$; if $Z_1 < Z_2$, both inequalities are reversed. Taking expectations on both sides of (20) yields

$$\mathbf{E}[Z_1^{k_1 + k_2}] + \mathbf{E}[Z_2^{k_1 + k_2}] \ge \mathbf{E}[Z_1^{k_1} Z_2^{k_2}] + \mathbf{E}[Z_2^{k_1} Z_1^{k_2}]. \tag{21}$$

Since $Z_1$ and $Z_2$ are independent and have the same distribution as $Z$, we have from (21) that

$$\mathbf{E}[Z^{k_1 + k_2}] \ge \mathbf{E}[Z^{k_1}]\,\mathbf{E}[Z^{k_2}]. \tag{22}$$

Now assume that (19) holds for $n - 1$ as the induction hypothesis. From (22) and the induction hypothesis, it follows that

$$\mathbf{E}[Z^{k_1 + \cdots + k_n}] \ge \mathbf{E}[Z^{k_1 + \cdots + k_{n-1}}]\,\mathbf{E}[Z^{k_n}] \ge \prod_{i=1}^{n} \mathbf{E}[Z^{k_i}]. \tag{23}$$
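Lemma 7 can also be spot-checked numerically. The Python sketch below computes both sides of (18) for randomly drawn $p_m$, $q_m$, and exponents $k_i$ and reports the smallest observed gap; the sampling ranges and helper names are arbitrary assumptions.

```python
import random


def moment(p, q, k):
    """E[Z^k] for Z with P(Z = q_m) = p_m, i.e. sum_m p_m q_m^k."""
    return sum(pm * qm ** k for pm, qm in zip(p, q))


def lemma7_gap(p, q, ks):
    """Left-hand side minus right-hand side of (18); Lemma 7 says it is >= 0."""
    rhs = 1.0
    for k in ks:
        rhs *= moment(p, q, k)
    return moment(p, q, sum(ks)) - rhs


def smallest_gap(trials=2000, seed=7):
    """Smallest gap over randomly drawn p, q, and exponents k_1, ..., k_n."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        m = rng.randint(1, 5)
        w = [rng.random() + 1e-9 for _ in range(m)]
        p = [x / sum(w) for x in w]
        q = [rng.random() for _ in range(m)]
        ks = [rng.randint(1, 6) for _ in range(rng.randint(1, 4))]
        worst = min(worst, lemma7_gap(p, q, ks))
    return worst


if __name__ == "__main__":
    print(smallest_gap())  # nonnegative up to floating-point rounding
```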

Now we prove Theorem 2.

Proof. (Theorem 2) To show that the expected relative cost in (2) is decreasing in $\alpha$, it is equivalent to showing that $\mathsf{P}(X_1 = 0, \dots, X_K = 0)$ is increasing in $\alpha$. Consider two arrival processes $\{X_t, t \ge 1\}$ and $\{X'_t, t \ge 1\}$ that are generated by using the parameters $\alpha$ and $\alpha'$ in (A1′), respectively. Assume that $\alpha < \alpha'$. Let $G_i$ (resp. $G'_i$) be the group size of the $i$-th group in the first (resp. second) arrival process. Note from (A1′) that for all $i \ge 1$ and $k \ge 1$,

$$\mathsf{P}(G_i = k) = (1 - \alpha)\alpha^{k-1}, \qquad \mathsf{P}(G'_i = k) = (1 - \alpha')\alpha'^{\,k-1}. \tag{24}$$

The trick of the proof is to couple the two sequences of group sizes $\{G_i, i \ge 1\}$ and $\{G'_i, i \ge 1\}$ so that the regenerative points of $\{X'_t, t \ge 1\}$ are a subset of the regenerative points of $\{X_t, t \ge 1\}$. Such a coupling is feasible because randomly thinning a renewal process with geometrically distributed interarrival times (keeping each renewal point independently with probability $(1 - \alpha')/(1 - \alpha)$) yields a renewal process whose interarrival times are again geometrically distributed, here with parameter $1 - \alpha'$. In particular, the size of the first group for the second arrival process, i.e., $G'_1$, is a sum of the sizes of several groups for the first arrival process, i.e.,

$$G'_1 = \sum_{i=1}^{N} G_i \tag{25}$$

for some random $N \ge 1$. An illustration of coupling two sequences of group sizes $\{G_i, i \ge 1\}$ and $\{G'_i, i \ge 1\}$ is shown in Fig. 2.

Fig. 2.


An illustration of coupling two sequences of group sizes $\{G_i, i \ge 1\}$ and $\{G'_i, i \ge 1\}$.
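The coupling relies on the fact that thinning a geometric renewal process gives another geometric renewal process. The sketch below checks this numerically: it computes the distribution of $G'_1$ in (25) when each renewal point of the $\alpha$-process is kept independently with probability $\theta = (1 - \alpha')/(1 - \alpha)$, and compares it with the geometric distribution with parameter $1 - \alpha'$. Thinning is one way to realize the coupling, and the function names are hypothetical.

```python
def geometric_pmf(alpha, kmax):
    """[P(G = k) for k = 0..kmax], with P(G = k) = (1 - alpha) * alpha**(k - 1) for k >= 1."""
    return [0.0] + [(1 - alpha) * alpha ** (k - 1) for k in range(1, kmax + 1)]


def thinned_sum_pmf(alpha, theta, kmax):
    """pmf of G'_1 = G_1 + ... + G_N when each renewal point of the alpha-process
    is kept independently with probability theta (so N is geometric).

    Recursion: the first group ends at j; that point is kept with probability
    theta, otherwise (probability 1 - theta) a fresh copy of G'_1 follows.
    """
    g = geometric_pmf(alpha, kmax)
    f = [0.0] * (kmax + 1)
    for k in range(1, kmax + 1):
        f[k] = theta * g[k] + (1 - theta) * sum(g[j] * f[k - j] for j in range(1, k))
    return f


if __name__ == "__main__":
    alpha, alpha_prime = 0.3, 0.6
    theta = (1 - alpha_prime) / (1 - alpha)
    f = thinned_sum_pmf(alpha, theta, 30)
    g_prime = geometric_pmf(alpha_prime, 30)
    print(max(abs(a - b) for a, b in zip(f, g_prime)))  # agreement up to rounding
```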

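Following the regenerative analysis in the proof of Theorem 1, we proceed by induction on $K$ (the case $K = 1$ is trivial since both probabilities equal $q$), condition on the event $G'_1 = k$, and use the law of total probability to derive that

$$\mathsf{P}(X'_1 = 0, \dots, X'_K = 0) - \mathsf{P}(X_1 = 0, \dots, X_K = 0) = \sum_{k=1}^{\infty} \mathsf{P}(G'_1 = k)\Big[\mathsf{P}(X'_1 = 0, \dots, X'_K = 0 \mid G'_1 = k) - \mathsf{P}(X_1 = 0, \dots, X_K = 0 \mid G'_1 = k)\Big]. \tag{26}$$

It thus suffices to show that each bracket in (26) is nonnegative. For $k \ge K$, we have from (A3) that

$$\mathsf{P}(X'_1 = 0, \dots, X'_K = 0 \mid G'_1 = k) = \sum_{m=1}^{M} p_m q_m^K. \tag{27}$$

From the coupling of these two arrival processes, given $G'_1 = k \ge K$ and the group sizes $G_1, \dots, G_N$ in (25), the first $K$ samples of $\{X_t, t \ge 1\}$ fall into some $n$ consecutive groups that contain $k_1, \dots, k_n$ of these samples, respectively, with $k_1 + \cdots + k_n = K$. Since these groups have i.i.d. types,

$$\mathsf{P}(X_1 = 0, \dots, X_K = 0 \mid G'_1 = k, k_1, \dots, k_n) = \prod_{i=1}^{n} \Big(\sum_{m=1}^{M} p_m q_m^{k_i}\Big). \tag{28}$$

As a direct consequence of Lemma 7, we then have

$$\mathsf{P}(X_1 = 0, \dots, X_K = 0 \mid G'_1 = k) \le \sum_{m=1}^{M} p_m q_m^K = \mathsf{P}(X'_1 = 0, \dots, X'_K = 0 \mid G'_1 = k). \tag{29}$$

The case for $k < K$ is similar. By (25), the $(k+1)$-th sample starts a new group in both processes, so both processes regenerate there. Thus, we have from (25), Lemma 7, and the induction hypothesis that

$$\begin{aligned}
\mathsf{P}(X'_1 = 0, \dots, X'_K = 0 \mid G'_1 = k) &= \sum_{m=1}^{M} p_m q_m^k\,\mathsf{P}(X'_1 = 0, \dots, X'_{K-k} = 0) \\
&\ge \mathsf{P}(X_1 = 0, \dots, X_k = 0 \mid G'_1 = k)\,\mathsf{P}(X_1 = 0, \dots, X_{K-k} = 0) \\
&= \mathsf{P}(X_1 = 0, \dots, X_K = 0 \mid G'_1 = k).
\end{aligned} \tag{30}$$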

B. Markov modulated processes

In this section, we prove Theorem 3 and Theorem 4 by using the Markov property of Markov modulated processes (see, e.g., Chapter 8 and Chapter 9 of the book [21]).

Recall that $Y_t$ is the group type of the $t$-th sample. In view of the memoryless property of the geometric distribution, we know that with probability $\alpha$, the $(t+1)$-th sample is still in the same group as the $t$-th sample. With probability $1 - \alpha$, it is in a new group, whose type is drawn independently according to (A2). Under (A1′) and (A2), the sequence of group types $\{Y_t, t \ge 1\}$ is a Markov chain with $M$ states. Denote by $p_{m,m'}$ the transition probability from state $m$ to state $m'$ for the (hidden) Markov chain. For such a Markov chain, we then have

$$p_{m,m'} = \alpha\,\mathbb{1}_{\{m = m'\}} + (1 - \alpha)\,p_{m'}.$$

It is easy to see that the correlation coefficient of $Y_t$ and $Y_{t+1}$ is simply $\alpha$, i.e.,

$$\frac{\mathbf{E}[Y_t Y_{t+1}] - \mathbf{E}[Y_t]\,\mathbf{E}[Y_{t+1}]}{\sqrt{\mathrm{Var}(Y_t)\,\mathrm{Var}(Y_{t+1})}} = \alpha.$$
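A small Python check of the transition probabilities and of the correlation claim follows. It treats the type labels $1, \dots, M$ as numbers and assumes that $Y_t$ is distributed according to $\mathbf{p}$, as (A2) implies; since $\mathrm{Cov}(f(Y_t), f(Y_{t+1})) = \alpha\,\mathrm{Var}(f(Y_t))$ for any function $f$ of the type, the choice of labels does not affect the result. The function names are our own.

```python
def transition_matrix(p, alpha):
    """p_{m,m'} = alpha * 1{m = m'} + (1 - alpha) * p_{m'} for the type chain {Y_t}."""
    n = len(p)
    return [[alpha * (1.0 if m == m2 else 0.0) + (1.0 - alpha) * p[m2] for m2 in range(n)]
            for m in range(n)]


def type_correlation(p, alpha):
    """Correlation coefficient of Y_t and Y_{t+1} (labels 1..M) when Y_t ~ p."""
    P = transition_matrix(p, alpha)
    labels = [m + 1 for m in range(len(p))]
    mean = sum(pm * y for pm, y in zip(p, labels))
    var = sum(pm * (y - mean) ** 2 for pm, y in zip(p, labels))
    # E[Y_t Y_{t+1}] = sum over (m, m') of p_m p_{m,m'} label(m) label(m')
    joint = sum(p[m] * P[m][m2] * labels[m] * labels[m2]
                for m in range(len(p)) for m2 in range(len(p)))
    # Y_{t+1} has the same distribution p, so both standard deviations equal sqrt(var)
    return (joint - mean ** 2) / var


if __name__ == "__main__":
    print(type_correlation([0.2, 0.5, 0.3], alpha=0.35))  # equals alpha up to rounding
```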

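From (A3), we also know that $\{X_t, t \ge 1\}$ is a Markov modulated process that is modulated by the (hidden) Markov chain $\{Y_t, t \ge 1\}$. The conditional probability that $X_t$ is negative given that the (hidden) Markov chain is in state $m$ is $q_m$, i.e.,

$$\mathsf{P}(X_t = 0 \mid Y_t = m) = q_m.$$

As such, we have from the law of total probability that

$$\mathsf{P}(X_1 = 0, \dots, X_K = 0) = \sum_{m=1}^{M} p_m\,\mathsf{P}(X_1 = 0, \dots, X_K = 0 \mid Y_1 = m). \tag{33}$$

From the (conditional) independence of Bernoulli samples in (A3), it follows that

$$\mathsf{P}(X_1 = 0, \dots, X_K = 0 \mid Y_1 = m) = q_m\,\mathsf{P}(X_2 = 0, \dots, X_K = 0 \mid Y_1 = m). \tag{34}$$

Using (34) in (33) yields

$$\mathsf{P}(X_1 = 0, \dots, X_K = 0) = \sum_{m=1}^{M} p_m q_m\,\mathsf{P}(X_2 = 0, \dots, X_K = 0 \mid Y_1 = m). \tag{35}$$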

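Now let $\beta_0(m) = 1$ and

$$\beta_k(m) = \mathsf{P}(X_2 = 0, \dots, X_{k+1} = 0 \mid Y_1 = m), \quad k \ge 1. \tag{36}$$

Similar to the argument for (35), we can further condition on the event $Y_2 = m'$ and use the law of total probability (together with the time homogeneity of the Markov chain) to show that

$$\beta_k(m) = \sum_{m'=1}^{M} p_{m,m'}\,q_{m'}\,\beta_{k-1}(m') \tag{37}$$

for $k = 1, 2, \dots, K - 1$. Let $\boldsymbol{\beta}_k$ be the $M \times 1$ (column) vector with its $m$-th element being $\beta_k(m)$, $\mathbf{P} = (p_{m,m'})$ be the $M \times M$ transition probability matrix, and $\mathbf{Q}$ be the diagonal matrix with the $m$-th diagonal element being $q_m$. Then (37) can be rewritten in the following matrix form:

$$\boldsymbol{\beta}_k = \mathbf{P}\mathbf{Q}\,\boldsymbol{\beta}_{k-1}. \tag{38}$$

Since $\beta_0(m) = 1$ for all $m$, we have from (38) that

$$\boldsymbol{\beta}_{K-1} = (\mathbf{P}\mathbf{Q})^{K-1}\,\mathbf{1}, \tag{39}$$

where $\mathbf{1}$ is the $M \times 1$ vector with all its elements being 1.

Let $\mathbf{p}$ be the $1 \times M$ (row) vector with its $m$-th element being $p_m$. Then we have from (35) and (39) that

$$\mathsf{P}(X_1 = 0, \dots, X_K = 0) = \mathbf{p}\,\mathbf{Q}\,\boldsymbol{\beta}_{K-1} = \mathbf{p}\,\mathbf{Q}\,(\mathbf{P}\mathbf{Q})^{K-1}\,\mathbf{1}. \tag{40}$$

Since $\mathbf{p}\mathbf{P} = \mathbf{p}$, i.e., $\{Y_t, t \ge 1\}$ is stationary, the same expression holds for $\mathsf{P}(X_{t+1} = 0, \dots, X_{t+K} = 0)$ for any $t$. Thus, the expected relative cost is

$$\frac{1}{K} + 1 - \mathbf{p}\,\mathbf{Q}\,(\mathbf{P}\mathbf{Q})^{K-1}\,\mathbf{1}, \tag{41}$$

as in Theorem 3.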
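The derivation (33)-(40) can be cross-checked by brute force. The Python sketch below sums the probability of every type path $y_1, \dots, y_K$ (there are $M^K$ of them) and compares the result with the matrix expression in (40); the helper names are our own.

```python
from itertools import product


def transition(p, alpha, m, m2):
    """p_{m,m'} = alpha * 1{m = m'} + (1 - alpha) * p_{m'}."""
    return alpha * (1.0 if m == m2 else 0.0) + (1.0 - alpha) * p[m2]


def prob_all_negative_matrix(p, q, alpha, k):
    """p Q (P Q)^(K-1) 1, via the recursion (38) starting from beta_0 = 1."""
    n = len(p)
    beta = [1.0] * n
    for _ in range(k - 1):
        beta = [sum(transition(p, alpha, m, m2) * q[m2] * beta[m2] for m2 in range(n))
                for m in range(n)]
    return sum(p[m] * q[m] * beta[m] for m in range(n))


def prob_all_negative_bruteforce(p, q, alpha, k):
    """Sum over all M^K type paths of P(Y_1..Y_K = path) * prod_t q[y_t]."""
    total = 0.0
    for path in product(range(len(p)), repeat=k):
        prob = p[path[0]] * q[path[0]]
        for prev, cur in zip(path, path[1:]):
            prob *= transition(p, alpha, prev, cur) * q[cur]
        total += prob
    return total


if __name__ == "__main__":
    args = ([0.2, 0.5, 0.3], [0.7, 0.95, 1.0], 0.4, 5)
    print(prob_all_negative_matrix(*args), prob_all_negative_bruteforce(*args))
```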

For the limiting case $\alpha = 1$, we note that the Markov chain $\{Y_t, t \ge 1\}$ stays in the same state from time 1 onward, and the $K$ random variables $X_1, \dots, X_K$ are i.i.d. when conditioning on $Y_1$. As such, they are exchangeable random variables, and the distribution of $X_1 + \cdots + X_K$ can be expressed as a mixture of Binomial distributions. For the special case $\alpha = 1$, our model of Markov modulated processes recovers the model of exchangeable binary random variables in [16] (see Assumptions 2 and 3 in [16]).

Now we prove Theorem 4.

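Proof. (Theorem 4) Analogous to the proof of Theorem 1 (and since, under (A1′), the residual life $R_t$ has the same geometric distribution as $G_1$ by the memoryless property), it suffices to show that for any $K \ge 1$,

$$\mathsf{P}(X_1 = 0, \dots, X_K = 0) \le q\big(\alpha + (1 - \alpha)q\big)^{K-1}. \tag{42}$$

For this, we show that (42) holds by induction on $K$. Since $\mathsf{P}(X_1 = 0) = q$, the inequality in (42) holds trivially for $K = 1$. Assume that the inequality in (42) holds for all $K' = 1, 2, \dots, K - 1$ as the induction hypothesis. From the law of total probability, we have

$$\mathsf{P}(X_1 = 0, \dots, X_K = 0) = \sum_{k=1}^{\infty} \mathsf{P}(G_1 = k)\,\mathsf{P}(X_1 = 0, \dots, X_K = 0 \mid G_1 = k). \tag{43}$$

Conditioning on the event $G_1 = k$ for $k \ge K$, the number of samples in the first group is not smaller than $K$. Thus, for $k \ge K$, we have from (A2) and (A3) that

$$\mathsf{P}(X_1 = 0, \dots, X_K = 0 \mid G_1 = k) = \sum_{m=1}^{M} p_m q_m^K \le \sum_{m=1}^{M} p_m q_m = q, \tag{44}$$

where the last inequality follows from the fact that the convex function $x^K$ satisfies $x^K \le x$ for $0 \le x \le 1$. For $1 \le k < K$, we know that the second group starts from $X_{k+1}$. It then follows from the renewal property in (A1) that

$$\begin{aligned}
\mathsf{P}(X_1 = 0, \dots, X_K = 0 \mid G_1 = k) &= \sum_{m=1}^{M} p_m q_m^k\,\mathsf{P}(X_1 = 0, \dots, X_{K-k} = 0) \\
&\le q\,\mathsf{P}(X_1 = 0, \dots, X_{K-k} = 0) \\
&\le q^2\big(\alpha + (1 - \alpha)q\big)^{K-k-1},
\end{aligned} \tag{45}$$

where the second last inequality follows from the fact that the convex function $x^k$ satisfies $x^k \le x$ for $0 \le x \le 1$, and the last inequality follows from the induction hypothesis. Since $G_1$ is geometrically distributed from (A1′), we have

$$\mathsf{P}(G_1 = k) = (1 - \alpha)\alpha^{k-1}, \qquad \mathsf{P}(G_1 \ge K) = \alpha^{K-1}. \tag{46}$$

Using (44), (45), and (46) in (43) yields

$$\begin{aligned}
\mathsf{P}(X_1 = 0, \dots, X_K = 0) &\le \alpha^{K-1} q + \sum_{k=1}^{K-1} (1 - \alpha)\alpha^{k-1} q^2 \big(\alpha + (1 - \alpha)q\big)^{K-k-1} \\
&= q\big(\alpha + (1 - \alpha)q\big)^{K-1},
\end{aligned} \tag{47}$$

where the equality follows from the identity $(a + b)^{n} = a^{n} + \sum_{j=1}^{n} a^{j-1}\, b\, (a + b)^{n-j}$ with $a = \alpha$, $b = (1 - \alpha)q$, and $n = K - 1$. This then completes the induction in (42).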
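The last step of (47) is an algebraic identity; the short Python sketch below compares the sum in the first line of (47) with the closed form over a grid of $q$, $\alpha$, and $K$. The grid and the function names are arbitrary choices for illustration.

```python
def first_line_of_47(q, alpha, k):
    """alpha^(K-1) q + sum_{j=1}^{K-1} (1 - alpha) alpha^(j-1) q^2 (alpha + (1 - alpha) q)^(K-j-1)."""
    total = alpha ** (k - 1) * q  # first group covers all K samples: P(G_1 >= K) * q
    for j in range(1, k):         # first group has j < K samples
        total += (1 - alpha) * alpha ** (j - 1) * q ** 2 * (alpha + (1 - alpha) * q) ** (k - j - 1)
    return total


def closed_form_bound(q, alpha, k):
    """Right-hand side of (42): q (alpha + (1 - alpha) q)^(K-1)."""
    return q * (alpha + (1 - alpha) * q) ** (k - 1)


if __name__ == "__main__":
    worst = max(abs(first_line_of_47(q, a, k) - closed_form_bound(q, a, k))
                for q in (0.5, 0.9, 0.99) for a in (0.0, 0.3, 0.8) for k in range(1, 13))
    print(worst)  # floating-point rounding only
```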

III. Pooled Testing with a Social Graph

In the previous section, we considered samples in a line of a testing site, where the correlations between two contiguous samples are characterized by a line graph. In this section, we extend the dependency between two samples to a general graph. Suppose that there is a social network modeled by a graph $G = (V_g, E_g)$, where $V_g$ is the set of nodes, and $E_g$ is the set of edges. A node in $V_g$ represents a person in the social graph, and an edge between two persons represents frequent social contacts between these two persons. As a contagious disease can spread from an infected person to another person through their social contacts, two persons connected by an edge are likely to infect each other. Thus, two samples obtained from two persons connected by an edge are also likely to be positively correlated.

The question for pooled testing with a social graph $G = (V_g, E_g)$ is how to exploit positive correlation from the edge connections in a social graph to save pooled testing costs. Intuitively, a set of nodes that are densely connected to each other are likely to be positively correlated. In social network analysis (see, e.g., [22]), such a set of nodes is called a community. In view of this, our idea for addressing the pooled testing problem with a social graph is to detect communities in a graph and then pool samples in the same community together for pooled testing.

Like pooled testing for people in a line, we define a pooling strategy for a graph $G = (V_g, E_g)$ with $n$ nodes, i.e., $|V_g| = n$, as a permutation $\pi$ of $\{1, 2, \dots, n\}$ that puts the $n$ nodes into a line. As such, when we use the Dorfman two-stage algorithm with a given group size $K$, we can pool nodes $\pi(1), \pi(2), \dots, \pi(K)$ in the first group, nodes $\pi(K+1), \pi(K+2), \dots, \pi(2K)$ in the second group, etc. A random pooling strategy for a graph $G$ is the strategy where the permutation $\pi$ is selected at random among the $n!$ permutations. The main objective of this section is to propose a pooling strategy from a community detection algorithm in [18]–[20] that can achieve a lower expected relative cost than the random pooling strategy.
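A pooling strategy is just an ordering of the nodes. The Python sketch below (hypothetical helper names) splits a permutation into consecutive pools of size $K$ and counts the tests that Dorfman's method would use for a given set of positive nodes. How a final, shorter pool is handled when $K$ does not divide $n$ is not specified in the text; here it is simply tested as a smaller pool, which is an assumption.

```python
import random


def random_pooling(n, rng):
    """Random pooling strategy: a uniformly random permutation of the nodes 1..n."""
    pi = list(range(1, n + 1))
    rng.shuffle(pi)
    return pi


def pools_from_strategy(pi, k):
    """Pools pi(1..K), pi(K+1..2K), ...; a shorter last pool is kept as is (assumption)."""
    return [pi[i:i + k] for i in range(0, len(pi), k)]


def dorfman_tests(pools, positives):
    """One test per pool, plus one individual retest per member of each positive pool."""
    return sum(1 + (len(pool) if any(v in positives for v in pool) else 0) for pool in pools)


if __name__ == "__main__":
    pools = pools_from_strategy([3, 1, 4, 2, 5], k=2)
    print(pools)                                # [[3, 1], [4, 2], [5]]
    print(dorfman_tests(pools, positives={4}))  # 1 + (1 + 2) + 1 = 5
    print(random_pooling(5, random.Random(0)))
```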

A. The probabilistic framework of sampled graphs

In this section, we briefly review the probabilistic framework of sampled graphs for structural analysis in [18]–[20]. For a graph $G = (V_g, E_g)$ with $n$ nodes, we index the $n$ nodes from $1, 2, \dots, n$. Also, let $\mathbf{A} = (a_{vw})$ be the $n \times n$ adjacency matrix of the graph, i.e.,

$$a_{vw} = \begin{cases} 1, & \text{if there is an edge between } v \text{ and } w, \\ 0, & \text{otherwise.} \end{cases}$$

Let $\mathcal{R}_{v,w}$ be the set of paths from $v$ to $w$ and $\mathcal{R} = \bigcup_{v,w} \mathcal{R}_{v,w}$ be the set of paths in the graph $G$. According to a probability mass function $p(\cdot)$, called the path sampling distribution, a path $\gamma \in \mathcal{R}$ is selected at random with probability $p(\gamma)$. Let $V$ (resp. $W$) be the starting (resp. ending) node of a randomly selected path by using the path sampling distribution $p(\cdot)$. Then the bivariate distribution

$$p(v, w) = \mathsf{P}(V = v, W = w) = \sum_{\gamma \in \mathcal{R}_{v,w}} p(\gamma) \tag{48}$$

is the probability that the ordered pair of two nodes $(v, w)$ is selected. Intuitively, one might interpret the bivariate distribution $p(v, w)$ in (48) as the probability that both nodes $v$ and $w$ are infected (through one of the paths $\gamma$ in $\mathcal{R}_{v,w}$). Thus, the bivariate distribution $p(v, w)$ can also be viewed as a similarity measure from node $v$ to node $w$, and this leads to the definition of a sampled graph in [18]–[20].

Definition 8 (Sampled graph [18]–[20]): —

A graph $G = (V_g, E_g)$ that is sampled by randomly selecting an ordered pair of two nodes $(v, w)$ according to a specific bivariate distribution $p(v, w)$ in (48) is called a sampled graph, and it is denoted by the two-tuple $(G, p(\cdot, \cdot))$.

Definition 9 (Covariance and Community [19], [20]): —

For a sampled graph $(G, p(\cdot, \cdot))$, the covariance between two nodes $v$ and $w$ is defined as follows:

$$\mathrm{cov}(v, w) = p(v, w) - p_V(v)\,p_W(w), \tag{49}$$

where $p_V(v) = \sum_{w} p(v, w)$ and $p_W(w) = \sum_{v} p(v, w)$ are the marginal distributions of $V$ and $W$. Moreover, the covariance between two sets $S_1$ and $S_2$ is defined as follows:

$$\mathrm{cov}(S_1, S_2) = \sum_{v \in S_1} \sum_{w \in S_2} \mathrm{cov}(v, w). \tag{50}$$

Two sets $S_1$ and $S_2$ are said to be positively correlated if $\mathrm{cov}(S_1, S_2) \ge 0$. In particular, if a subset of nodes $S \subseteq V_g$ is positively correlated to itself, i.e., $\mathrm{cov}(S, S) \ge 0$, then it is called a community.
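Definition 9 translates directly into code. The sketch below computes $\mathrm{cov}(v, w)$ from an $n \times n$ bivariate distribution and sums it over two node sets as in (50). Nodes are indexed from 0 rather than 1, and the example distribution is a toy one chosen only for illustration.

```python
def node_covariance(pmat):
    """cov(v, w) = p(v, w) - p_V(v) p_W(w), as in (49); nodes indexed 0..n-1."""
    n = len(pmat)
    p_v = [sum(pmat[v]) for v in range(n)]                       # marginal of the start node V
    p_w = [sum(pmat[v][w] for v in range(n)) for w in range(n)]  # marginal of the end node W
    return [[pmat[v][w] - p_v[v] * p_w[w] for w in range(n)] for v in range(n)]


def set_covariance(cov, s1, s2):
    """cov(S1, S2) = sum over v in S1 and w in S2 of cov(v, w), as in (50)."""
    return sum(cov[v][w] for v in s1 for w in s2)


if __name__ == "__main__":
    # Toy sampled graph: two disjoint edges {0, 1} and {2, 3}, with each of the
    # four ordered adjacent pairs selected with probability 1/4.
    adj = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
    pmat = [[a / 4 for a in row] for row in adj]
    cov = node_covariance(pmat)
    print(set_covariance(cov, [0, 1], [0, 1]))  # 0.25: {0, 1} is a community
    print(set_covariance(cov, [0, 1], [2, 3]))  # -0.25: not positively correlated
```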

There are many methods to obtain a sampled graph [19]. In this paper, we will use the following bivariate distribution

A.

where $\mathbf{A}$ is the adjacency matrix of a graph $G$, and $C$ is the normalization constant so that the sum of $p(v, w)$ over $v$ and $w$ equals 1. As such a bivariate distribution is obtained from sampling paths with lengths 1 and 2, it appears to be a good sampling distribution for modeling the disease propagation within the second neighbors of an infected person.

B. The hierarchical agglomerative algorithm for pooled testing with a graph

We propose a pooling strategy that uses the hierarchical agglomerative algorithm for community detection in sampled graphs [20]. The detailed steps are outlined in Algorithm 1. Initially, every node in the input graph is assigned to a set (community) that contains the node itself. Then the algorithm recursively merges two sets that have the largest covariance into a new set. This is done by appending one set to the end of the other set so that the order of the elements in each set can be preserved. Each merge of two sets reduces the number of sets by 1. Eventually, there is only one remaining set, and the order of the elements in the remaining set is the pooling strategy from the algorithm. It was shown in [20] that all the sets are indeed communities if Algorithm 1 stops at the point when there does not exist a pair of two positively correlated sets. However, as our objective is to output a permutation for a pooling strategy, we continue the merge of two sets until there is only one remaining set.

Algorithm 1. The Hierarchical Agglomerative Algorithm for Pooled Testing with a Social Graph

  • Input: A sampled graph $(G, p(\cdot, \cdot))$ with $n$ nodes.

  • Output: A pooling strategy $\pi$.

    (H1) Initially, the number of sets $c$ is set to be $n$, and node $i$ is assigned to the $i$-th set, i.e., $S_i = \{i\}$, $i = 1, 2, \dots, n$.

    (H2) Compute the covariance $\mathrm{cov}(S_i, S_j) = \mathrm{cov}(i, j)$ from (49) for all $i, j = 1, 2, \dots, n$.

  • while $c > 1$ do

    (H3) Find the pair of two distinct sets $S_i$ and $S_j$ that have the largest covariance $\mathrm{cov}(S_i, S_j)$.

    (H4) Merge $S_i$ and $S_j$ into a new set $S_k$ by appending $S_j$ to the end of $S_i$.

    (H5) Update the covariances as follows:
    $$\mathrm{cov}(S_k, S_k) = \mathrm{cov}(S_i, S_i) + \mathrm{cov}(S_i, S_j) + \mathrm{cov}(S_j, S_i) + \mathrm{cov}(S_j, S_j). \tag{52}$$
  • for each remaining set $S_\ell \neq S_k$ do
    $$\mathrm{cov}(S_k, S_\ell) = \mathrm{cov}(S_i, S_\ell) + \mathrm{cov}(S_j, S_\ell), \qquad \mathrm{cov}(S_\ell, S_k) = \mathrm{cov}(S_\ell, S_i) + \mathrm{cov}(S_\ell, S_j). \tag{53}$$
  • end

    $c \leftarrow c - 1$.

  • end

  • (H6) There is only one remaining set. Output $\pi$ by letting $\pi(i)$ be the $i$-th element in the remaining set.
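The following Python sketch follows steps (H1)-(H6) on a bivariate distribution given as an $n \times n$ list. Two points are assumptions rather than details from the text: ties in (H3) are broken by iteration order, and the helper `bivariate_from_adjacency` builds an example input with $p(v, w) \propto (\mathbf{A} + \mathbf{A}^2)_{vw}$, i.e., equal weights on paths of lengths 1 and 2, which may differ from the exact weighting in (51). Nodes are indexed from 0.

```python
def bivariate_from_adjacency(adj):
    """Assumed example input: p(v, w) proportional to (A + A^2)[v][w]."""
    n = len(adj)
    a2 = [[sum(adj[v][u] * adj[u][w] for u in range(n)) for w in range(n)] for v in range(n)]
    raw = [[adj[v][w] + a2[v][w] for w in range(n)] for v in range(n)]
    total = sum(map(sum, raw))  # normalization constant
    return [[x / total for x in row] for row in raw]


def hierarchical_pooling(pmat):
    """Sketch of Algorithm 1: return a pooling order (list of node indices)."""
    n = len(pmat)
    p_v = [sum(row) for row in pmat]
    p_w = [sum(pmat[v][w] for v in range(n)) for w in range(n)]
    # (H1)-(H2): singleton sets and their pairwise covariances (49)
    sets = {i: [i] for i in range(n)}
    cov = {(i, j): pmat[i][j] - p_v[i] * p_w[j] for i in range(n) for j in range(n)}
    next_id = n
    while len(sets) > 1:
        # (H3): pair of distinct sets with the largest covariance
        i, j = max(((a, b) for a in sets for b in sets if a != b), key=lambda ab: cov[ab])
        # (H4): append S_j to the end of S_i to form the new set S_k
        k, next_id = next_id, next_id + 1
        sets[k] = sets.pop(i) + sets.pop(j)
        # (H5): covariance is additive over sets, so update it incrementally
        cov[(k, k)] = cov[(i, i)] + cov[(i, j)] + cov[(j, i)] + cov[(j, j)]
        for other in sets:
            if other != k:
                cov[(k, other)] = cov[(i, other)] + cov[(j, other)]
                cov[(other, k)] = cov[(other, i)] + cov[(other, j)]
    # (H6): the order of the single remaining set is the pooling strategy
    (order,) = sets.values()
    return order


if __name__ == "__main__":
    # Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3)
    adj = [[0] * 6 for _ in range(6)]
    for v, w in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
        adj[v][w] = adj[w][v] = 1
    print(hierarchical_pooling(bivariate_from_adjacency(adj)))
```

On this small example, the returned order keeps each triangle contiguous, so pools of size $K = 3$ coincide with the two triangles.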

As an illustrative example of our algorithm, we use the Zachary karate club friendship network [23]. This friendship network was obtained by Wayne Zachary over the course of two years in the early 1970s at an American university (see Fig. 3). During the course of the study, the club split into two clusters (marked with two different colors in Fig. 3) because of a dispute between its administrator (node 34) and its instructor (node 1). In Fig. 4(a), we show the dendrogram obtained from Algorithm 1 for the Zachary karate club friendship network by using the similarity measure in (51). A dendrogram for a hierarchical agglomerative algorithm is a tree-like graph whose heights indicate the order of the merges of two sets. The pooling strategy is the list of the 34 nodes at the bottom of this figure. In Fig. 4(b), we illustrate the members of the Zachary karate club forming a line to be tested in a testing site.

Fig. 3.


The Zachary karate club friendship network.

Fig. 4.


(a) The dendrogram from Algorithm 1 for the Zachary karate club friendship network by using the similarity measure in (51). (b) An illustration of the 34 members of the Zachary karate club forming a line to be tested in a testing site.

IV. Numerical Results

A. Pooled testing on a line of a testing site

In this section, we compare the expected relative cost of Dorfman's two-stage method with that of a sophisticated group testing method in [15] by considering the special case with Inline graphic, Inline graphic and Inline graphic. In this case, there are two types of arriving groups, and such a group is of type 1 (resp. type 2) with probability Inline graphic (resp. Inline graphic). The sizes of these arriving groups are i.i.d. geometric random variables with parameter Inline graphic. Moreover, with probability 1, samples in the type 1 group are positive and those in the type 2 group are negative. Consequently, we have Inline graphic for all Inline graphic and it reduces to the serial correlated model in [24]. The expected relative cost in this case is

A.

where Inline graphic. Notice that from Theorem 4, (54) achieves the lower bound of the expected relative costs under (AInline graphic), (A2), and (A3).
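The closed form (54) is not legible in this copy. The sketch below therefore uses an expression derived from the model description above rather than the paper's own notation. It assumes a stationary two-state Markov chain in which a negative sample is followed by a positive one with probability p(1 - ρ), where p is the prevalence rate and ρ the correlation coefficient that heads the columns of Table I. A Dorfman group of M consecutive samples is then all-negative with probability (1 - p)(1 - p(1 - ρ))^(M-1), which gives an expected relative cost of 1/M + 1 - (1 - p)(1 - p(1 - ρ))^(M-1). This expression agrees to four decimals with the spot-checked entries of Table I, for example 0.1956 and 0.1865 at a 1% prevalence rate with group size 11. Dividing by its ρ = 0 value gives the ratios of Table II, up to rounding.

```python
def dorfman_relative_cost(p: float, rho: float, M: int) -> float:
    """Expected number of tests per sample for Dorfman's two-stage method.

    Assumed model (a reconstruction, not the paper's notation): samples form
    a stationary two-state Markov chain with prevalence p and lag-one
    correlation rho, so P(next negative | current negative) = 1 - p(1 - rho).
    """
    stay_negative = 1.0 - p * (1.0 - rho)
    # Probability that all M samples in a group are negative.
    all_negative = (1.0 - p) * stay_negative ** (M - 1)
    # One pooled test per M samples, plus M retests whenever the pool is positive.
    return 1.0 / M + (1.0 - all_negative)


if __name__ == "__main__":
    print(round(dorfman_relative_cost(0.01, 0.0, 11), 4))  # 0.1956
    print(round(dorfman_relative_cost(0.01, 0.1, 11), 4))  # 0.1865
    ratio = dorfman_relative_cost(0.01, 0.1, 11) / dorfman_relative_cost(0.01, 0.0, 11)
    print(round(100 * ratio, 1))  # 95.4
```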

The optimal group size Inline graphic, i.e., the one that minimizes the expected relative cost in (54), depends on both the prevalence rate Inline graphic and the parameter Inline graphic of the hidden Markov model. In general, the parameter Inline graphic is unknown and difficult to estimate; thus, in Section IV-A1, we choose the group size Inline graphic according to Table I of [8], which depends only on the prevalence rate Inline graphic. However, if the parameter Inline graphic can be estimated reliably, the group size Inline graphic can be chosen accordingly to further reduce the expected relative cost. In Section IV-A2, we optimize Inline graphic as a function of both Inline graphic and Inline graphic.

TABLE I. The Expected Relative Cost of the Dorfman Two-Stage Algorithm With Group Size Inline graphic and the Lowest Expected Relative Cost of Inline graphic-Regular in [15]. The Numbers Given in Boldface are the Expected Relative Costs of Dorfman's Two-Stage Algorithm of the Smallest Values of Inline graphic That Outperform Those of the Inline graphic-Regular Pooling Matrices Under the Same Prevalence Rate Inline graphic.

The Dorfman Two-stage Algorithm Inline graphic-regular
Prevalence rate; group size; correlation coefficient 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9; lowest cost
1% 11 0.1956 0.1865 0.1773 0.1681 0.1587 0.1493 0.1398 0.1302 0.1205 0.1108 0.1218
2% 8 0.2742 0.2620 0.2496 0.2371 0.2244 0.2116 0.1986 0.1854 0.1721 0.1586 0.1881
3% 6 0.3337 0.3207 0.3076 0.2943 0.2809 0.2673 0.2535 0.2395 0.2254 0.2111 0.2545
4% 6 0.3839 0.3675 0.3507 0.3337 0.3165 0.2989 0.2810 0.2629 0.2445 0.2257 0.3147
5% 5 0.4262 0.4098 0.3931 0.3762 0.3590 0.3415 0.3238 0.3057 0.2874 0.2689 0.3678
6% 5 0.4661 0.4472 0.4279 0.4082 0.3882 0.3678 0.3470 0.3259 0.3043 0.2824 0.4166
7% 5 0.5043 0.4831 0.4615 0.4393 0.4167 0.3935 0.3699 0.3457 0.3210 0.2958 0.4627
8% 4 0.5336 0.5148 0.4956 0.4761 0.4562 0.4360 0.4155 0.3947 0.3735 0.3519 0.5035
9% 4 0.5643 0.5437 0.5227 0.5014 0.4796 0.4574 0.4348 0.4117 0.3883 0.3643 0.5416
10% 4 0.5939 0.5718 0.5492 0.5261 0.5025 0.4784 0.4537 0.4286 0.4029 0.3767 0.5760

1) Group size Inline graphic determined by the prevalence rate Inline graphic

In this subsection, we choose the group size Inline graphic from Table I of [8], which depends only on the prevalence rate Inline graphic (since the parameter Inline graphic of the hidden Markov model is generally unknown).

We numerically evaluate the expected relative cost in (54) for each prevalence rate Inline graphic from 1% to 10% in increments of 1% and each correlation coefficient Inline graphic from 0 to 0.9 in increments of 0.1. The results are shown in Table I. To compare the expected relative costs of Dorfman's two-stage algorithm (with positively correlated samples) with those of the Inline graphic-regular pooling matrices [15], we also list the lowest expected relative costs of the Inline graphic-regular pooling matrices (Table I of [15]) in Table I. From this table, one can easily verify that the expected relative cost decreases in Inline graphic. The numbers given in boldface are the expected relative costs of Dorfman's two-stage algorithm at the smallest values of Inline graphic that outperform those of the Inline graphic-regular pooling matrices under the same prevalence rate Inline graphic. We observe that when the prevalence rate Inline graphic is low (e.g., Inline graphic), the gain of Dorfman's two-stage method is not as large as that of the Inline graphic-regular pooling matrices, except for some large Inline graphic. The reason is that under a low prevalence rate, a group contains very few positive samples, and such samples can be identified easily by the sophisticated group testing method, thus saving more tests. In contrast, the first stage of Dorfman's two-stage algorithm only checks whether a group contains at least one positive sample. When a group of Inline graphic samples includes any positive sample (even just one), all Inline graphic samples must be retested individually at the second stage. Thus, Dorfman's method does not perform as well as sophisticated group testing methods when the prevalence rate is low and the correlations between samples in a group are small. However, when the prevalence rate Inline graphic is high (e.g., Inline graphic), the simple Dorfman method can achieve better performance with a moderate positive correlation Inline graphic.
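The boldface marking of Table I does not survive in this copy. The short script below recovers it from the printed values. For each prevalence rate, it returns the smallest tabulated correlation coefficient at which the Dorfman cost falls below the lowest cost of the regular pooling matrices of [15]. It works only on the rounded numbers of Table I, and the names `TABLE_I` and `smallest_winning_rho` are illustrative.

```python
# Correlation coefficients of the Table I columns: 0.0, 0.1, ..., 0.9.
RHOS = [round(0.1 * i, 1) for i in range(10)]

# Table I, copied as printed: prevalence -> (Dorfman costs per RHOS, lowest regular cost).
TABLE_I = {
    0.01: ([0.1956, 0.1865, 0.1773, 0.1681, 0.1587, 0.1493, 0.1398, 0.1302, 0.1205, 0.1108], 0.1218),
    0.02: ([0.2742, 0.2620, 0.2496, 0.2371, 0.2244, 0.2116, 0.1986, 0.1854, 0.1721, 0.1586], 0.1881),
    0.03: ([0.3337, 0.3207, 0.3076, 0.2943, 0.2809, 0.2673, 0.2535, 0.2395, 0.2254, 0.2111], 0.2545),
    0.04: ([0.3839, 0.3675, 0.3507, 0.3337, 0.3165, 0.2989, 0.2810, 0.2629, 0.2445, 0.2257], 0.3147),
    0.05: ([0.4262, 0.4098, 0.3931, 0.3762, 0.3590, 0.3415, 0.3238, 0.3057, 0.2874, 0.2689], 0.3678),
    0.06: ([0.4661, 0.4472, 0.4279, 0.4082, 0.3882, 0.3678, 0.3470, 0.3259, 0.3043, 0.2824], 0.4166),
    0.07: ([0.5043, 0.4831, 0.4615, 0.4393, 0.4167, 0.3935, 0.3699, 0.3457, 0.3210, 0.2958], 0.4627),
    0.08: ([0.5336, 0.5148, 0.4956, 0.4761, 0.4562, 0.4360, 0.4155, 0.3947, 0.3735, 0.3519], 0.5035),
    0.09: ([0.5643, 0.5437, 0.5227, 0.5014, 0.4796, 0.4574, 0.4348, 0.4117, 0.3883, 0.3643], 0.5416),
    0.10: ([0.5939, 0.5718, 0.5492, 0.5261, 0.5025, 0.4784, 0.4537, 0.4286, 0.4029, 0.3767], 0.5760),
}


def smallest_winning_rho(costs, regular_cost):
    """Smallest tabulated correlation coefficient at which Dorfman's cost is lower."""
    for rho, cost in zip(RHOS, costs):
        if cost < regular_cost:
            return rho
    return None  # Dorfman never wins on this row


if __name__ == "__main__":
    for prevalence, (costs, regular_cost) in TABLE_I.items():
        print(f"{prevalence:.0%}: {smallest_winning_rho(costs, regular_cost)}")
```

On the printed values, this gives 0.8 at a 1% prevalence rate, 0.4 at 5%, and 0.1 at 10%. This is consistent with the observation that a moderate correlation suffices once the prevalence rate is high.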

To show the advantage of using positively correlated samples in Dorfman's two-stage method, we calculate the ratio of the expected relative cost with the positive correlation Inline graphic to that of the i.i.d. Bernoulli samples (Inline graphic) in Table II. For example, under the prevalence rate Inline graphic, the expected relative cost with Inline graphic is 0.1865 from Table I, and thus the ratio is Inline graphic.

TABLE II. The Ratio of the Expected Relative Cost With Positive Correlation Inline graphic to That of the I.i.d. Bernoulli Samples (Inline graphic) Under Different Prevalence Rates Inline graphic (Unit: %).
Prevalence rate; correlation coefficient 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
1% 95.4 90.7 85.9 81.2 76.3 71.5 66.6 61.6 56.6
2% 95.5 91.0 86.5 81.8 77.2 72.4 67.6 62.8 57.8
3% 96.1 92.2 88.2 84.2 80.1 76.0 71.8 67.6 63.3
4% 95.7 91.4 86.9 82.4 77.9 73.2 68.5 63.7 58.8
5% 96.1 92.2 88.3 84.2 80.1 76.0 71.7 67.4 63.1
6% 95.9 91.8 87.6 83.3 78.9 74.5 69.9 65.3 60.6
7% 95.8 91.5 87.1 82.6 78.0 73.3 68.5 63.7 58.6
8% 96.5 92.9 89.2 85.5 81.7 77.9 74.0 70.0 65.9
9% 96.4 92.6 88.9 85.0 81.1 77.1 73.0 68.8 64.6
10% 96.3 92.5 88.6 84.6 80.5 76.4 72.2 67.8 63.4

2) Group size Inline graphic determined by the prevalence rate Inline graphic and the correlation coefficient Inline graphic

In this subsection, the optimal group size Inline graphic that induces the lowest expected relative cost is determined by both the prevalence rate Inline graphic and the correlation coefficient Inline graphic. For each prevalence rate Inline graphic from 1% to 10% in increments of 1% and each correlation coefficient Inline graphic from 0 to 0.9 in increments of 0.1, we show the optimal group size Inline graphic in Table III and the corresponding expected relative cost in Table IV. Intuitively, with correlated samples, the group size for pooled testing can be larger. This is confirmed by Table III, which shows that the optimal group size Inline graphic increases in Inline graphic for a fixed value of Inline graphic. To compare the expected relative costs of Dorfman's two-stage algorithm (with positively correlated samples) with those of the Inline graphic-regular pooling matrices [15], we also list the lowest expected relative costs of the Inline graphic-regular pooling matrices (Table I of [15]) in Table IV. To show the advantage of using positively correlated samples in Dorfman's two-stage method, we calculate the ratio of the expected relative cost with the positive correlation Inline graphic to that of the i.i.d. Bernoulli samples (Inline graphic) in Table V.
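The closed form (54) is not legible in this copy. The sketch below assumes it equals 1/M + 1 - (1 - p)(1 - p(1 - ρ))^(M-1) for group size M, prevalence rate p, and correlation coefficient ρ. This expression matches the spot-checked entries of Tables I and IV. Under that assumption, the optimal group size can be found by direct search, as sketched below. The search bound `max_size` is an arbitrary illustrative choice.

```python
def dorfman_relative_cost(p: float, rho: float, M: int) -> float:
    """Assumed closed form: 1/M + 1 - (1 - p) * (1 - p*(1 - rho))**(M - 1)."""
    return 1.0 / M + 1.0 - (1.0 - p) * (1.0 - p * (1.0 - rho)) ** (M - 1)


def optimal_group_size(p: float, rho: float, max_size: int = 200) -> int:
    """Group size in [2, max_size] with the lowest expected relative cost."""
    return min(range(2, max_size + 1), key=lambda M: dorfman_relative_cost(p, rho, M))


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        M = optimal_group_size(0.01, rho)
        print(rho, M, round(dorfman_relative_cost(0.01, rho, M), 4))
    # Expected, matching the 1% rows of Tables III and IV:
    # 0.0 11 0.1956
    # 0.5 15 0.1438
    # 0.9 32 0.0715
```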

TABLE III. The Optimal Group Size of the Dorfman Two-Stage Algorithm With Different Values of the Prevalence Rate Inline graphic and the Correlation Coefficient Inline graphic.
Prevalence rate; correlation coefficient 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
1% 11 11 12 12 13 15 16 19 23 32
2% 8 8 8 9 10 11 12 14 16 23
3% 6 7 7 7 8 9 10 11 14 19
4% 6 6 6 7 7 8 9 10 12 17
5% 5 5 6 6 6 7 8 9 11 15
6% 5 5 5 6 6 6 7 8 10 14
7% 5 5 5 5 6 6 7 8 9 13
8% 4 4 5 5 5 6 6 7 9 12
9% 4 4 4 5 5 5 6 7 8 12
10% 4 4 4 4 5 5 6 7 8 11
TABLE IV. The Expected Relative Cost of the Dorfman Two-Stage Algorithm With Its Optimal Group Size in Table III, and the Lowest Expected Relative Cost of Inline graphic-Regular in [15]. The Numbers Given in Boldface are the Expected Relative Costs of Dorfman's Two-Stage Algorithm of the Smallest Values of Inline graphic That Outperform Those of the Inline graphic-Regular Pooling Matrices Under the Same Prevalence Rate Inline graphic.
The Dorfman Two-stage Algorithm with Positively Correlated Samples Inline graphic-regular
Prevalence rate; correlation coefficient 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9; lowest cost
1% 0.1956 0.1865 0.1771 0.1670 0.1559 0.1438 0.1303 0.1147 0.0961 0.0715 0.1218
2% 0.2742 0.2620 0.2496 0.2356 0.2209 0.2046 0.1862 0.1652 0.1397 0.1057 0.1881
3% 0.3337 0.3198 0.3044 0.2888 0.2708 0.2516 0.2299 0.2048 0.1744 0.1337 0.2545
4% 0.3839 0.3675 0.3507 0.3333 0.3131 0.2916 0.2673 0.2388 0.2045 0.1585 0.3147
5% 0.4262 0.4098 0.3921 0.3717 0.3509 0.3267 0.3003 0.2693 0.2317 0.1810 0.3678
6% 0.4661 0.4472 0.4279 0.4082 0.3841 0.3595 0.3304 0.2972 0.2568 0.2022 0.4166
7% 0.5043 0.4831 0.4615 0.4393 0.4162 0.3884 0.3586 0.3234 0.2803 0.2221 0.4627
8% 0.5336 0.5148 0.4939 0.4694 0.4443 0.4165 0.3847 0.3476 0.3025 0.2411 0.5035
9% 0.5643 0.5437 0.5227 0.4985 0.4712 0.4431 0.4091 0.3707 0.3237 0.2595 0.5416
10% 0.5939 0.5718 0.5492 0.5261 0.4973 0.4669 0.4328 0.3932 0.3437 0.2770 0.5760
TABLE V. With Optimal Group Sizes in Table III, the Ratio of the Expected Relative Cost With Positive Correlation Inline graphic to That of the I.i.d. Bernoulli Samples (Inline graphic) Under Different Prevalence Rates Inline graphic (Unit: %).
Prevalence rate; correlation coefficient 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
1% 95.4 90.5 85.4 79.7 73.5 66.6 58.7 49.2 36.6
2% 95.5 91.0 85.9 80.6 74.6 67.9 60.2 50.9 38.5
3% 95.8 91.2 86.6 81.2 75.4 68.9 61.4 52.3 40.1
4% 95.7 91.4 86.8 81.5 76.0 69.6 62.2 53.3 41.3
5% 96.1 92.0 87.2 82.3 76.7 70.5 63.2 54.4 42.5
6% 95.9 91.8 87.6 82.4 77.1 70.9 63.8 55.1 43.4
7% 95.8 91.5 87.1 82.5 77.0 71.1 64.1 55.6 44.0
8% 96.5 92.6 88.0 83.3 78.1 72.1 65.1 56.7 45.2
9% 96.4 92.6 88.4 83.5 78.5 72.5 65.7 57.4 46.0
10% 96.3 92.5 88.6 83.7 78.6 72.9 66.2 57.9 46.6

B. Pooled testing with a social graph

In this subsection, we report our simulation results for pooled testing with a social graph. For our experiments, we use one synthetic dataset and three real-world datasets. The synthetic dataset is constructed by the small-world model in [25] as follows. First, we generate a ring of 1000 nodes in which each node is connected to its 30 nearest neighbors (15 on each side). Then, for each edge, with probability 0.5, we remove that edge and add a new edge between two randomly selected nodes. The three real-world datasets are the email-Eu-core network [26], [27], the political blogs network [28], and the ego-Facebook network [29]. After removing multiple edges, self-loops, and nodes with degree 0, the email-Eu-core network has 986 nodes and 16 064 edges. The political blogs network has 1224 nodes and 16 715 edges. For the ego-Facebook dataset, we remove multiple edges, self-loops, and nodes that are not in the largest component of the network, as in [30], which leaves 2851 nodes and 62 318 edges. Basic statistics of the four datasets are given in Table VI.
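The synthetic-graph construction described above can be sketched as follows. The sketch follows the text literally: both endpoints of each replacement edge are drawn uniformly at random. Replacements that would create a self-loop or a duplicate edge are redrawn, which is an assumption since the text does not say how such collisions are handled. Redrawing keeps the edge count at 15 000, as in Table VI. The classical Watts-Strogatz procedure [25] instead keeps one endpoint of a rewired edge, so degree-based statistics such as the average excess degree may differ somewhat from Table VI. The random seed is arbitrary.

```python
import random


def ring_lattice(n: int, degree: int) -> set:
    """Ring of n nodes, each linked to its degree/2 nearest neighbors on each side."""
    half = degree // 2
    edges = set()
    for u in range(n):
        for j in range(1, half + 1):
            v = (u + j) % n
            edges.add((min(u, v), max(u, v)))
    return edges


def rewire(edges: set, n: int, prob: float, rng: random.Random) -> set:
    """With probability prob, replace each edge by one between two random nodes."""
    result = set(edges)
    for edge in sorted(edges):
        if rng.random() < prob:
            result.discard(edge)
            while True:  # redraw until the new edge is neither a self-loop nor a duplicate
                u, v = rng.sample(range(n), 2)
                candidate = (min(u, v), max(u, v))
                if candidate not in result:
                    result.add(candidate)
                    break
    return result


if __name__ == "__main__":
    rng = random.Random(2021)
    graph = rewire(ring_lattice(1000, 30), 1000, 0.5, rng)
    print(len(graph), 2 * len(graph) / 1000)  # 15000 30.0
```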

TABLE VI. Basic Information of Four Datasets. Note That the Political Blogs Dataset is Not Connected; the Average Path Length and the Diameter of the Largest Connected Component in Political Blogs are Reported.

Dataset small-world email-Eu-core political blogs ego-Facebook
Number of nodes 1000 986 1224 2851
Number of edges 15 000 16 064 16 715 62 318
Average degree 30 32.5842 27.3121 43.7166
Average excess degree 29.3581 73.6564 80.2587 98.0664
Average clustering coefficient 0.1133 0.4071 0.3197 0.5914
Average path length 2.4414 2.5843 2.7467 4.1353
Diameter 3 7 8 14
Density 3.0030e-2 3.3080e-2 2.2332e-2 1.5339e-2

We also need a model of disease propagation in a network. A widely used model is the independent cascade (IC) model (see, e.g., Kempe, Kleinberg, and Tardos [31]). In the IC model, an infected node can transmit the disease to a susceptible neighbor (through an edge) with a certain propagation probability Inline graphic. A newly infected node can in turn propagate the disease to its own neighbors. For our experiments, a set of seeded nodes Inline graphic is randomly selected in the IC model. Each neighbor of a seeded node is infected with probability Inline graphic. These infected nodes are called the first-generation cascade of the seeded node, and they can continue infecting their neighbors. The Inline graphic-generation cascade from a seeded node is the set of infected nodes within distance Inline graphic of the seeded node, and the Inline graphic-generation cascade from the set Inline graphic is the union of the Inline graphic-generation cascades of the seeded nodes in Inline graphic. In our experiments, we set Inline graphic and Inline graphic.
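The following is a minimal sketch of the generation-limited cascade described above. It assumes an adjacency-list graph and gives each newly infected node a single chance to infect each of its not-yet-infected neighbors with the propagation probability, which is the usual independent cascade convention [31]. Following the text, cascades from different seeded nodes are generated separately and then merged by a union. The names `beta` (propagation probability) and `k` (number of generations) are illustrative, because the values used in the experiments are not legible in this copy.

```python
import random


def seed_cascade(adj, seed, beta, k, rng):
    """Nodes infected within k generations of one seeded node (IC model)."""
    infected = {seed}
    frontier = [seed]
    for _ in range(k):
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                # Each newly infected node tries each susceptible neighbor once.
                if v not in infected and rng.random() < beta:
                    infected.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
        if not frontier:
            break
    return infected


def ic_cascade(adj, seeds, beta, k, rng):
    """Union of the k-generation cascades of all seeded nodes."""
    infected = set()
    for s in seeds:
        infected |= seed_cascade(adj, s, beta, k, rng)
    return infected


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    rng = random.Random(0)
    seeds = rng.sample(sorted(path), 2)  # randomly selected seeded nodes
    print(sorted(ic_cascade(path, seeds, 0.5, 2, rng)))
```

With `beta = 1.0`, the cascade reduces to all nodes within graph distance `k` of some seed, which is a convenient sanity check.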

The pooling strategy for each dataset is obtained in the same way as that for the Zachary karate club friendship network in Section III-B. Specifically, we first generate a sampled graph by using the bivariate distribution in (51). Then we use the hierarchical agglomerative algorithm for pooled testing with a social graph in Algorithm 1 to generate the pooling strategy. In Fig. 5 (resp. Fig. 6, Fig. 7, Fig. 8), we show the expected relative cost of Dorfman's two-stage algorithm with the group size Inline graphic, as a function of the number of seeded nodes Inline graphic for the small-world dataset (resp. the email-Eu-core dataset, the political blogs dataset, the ego-Facebook dataset).

Fig. 5.

The expected relative cost of Dorfman's two-stage algorithm with Inline graphic as a function of the number of seeded nodes Inline graphic from 1 to 5 for the small-world dataset.

Fig. 6.

The expected relative cost of Dorfman's two-stage algorithm with Inline graphic as a function of the number of seeded nodes Inline graphic from 1 to 5 for the email-Eu-core dataset.

Fig. 7.

The expected relative cost of Dorfman's two-stage algorithm with Inline graphic as a function of the number of seeded nodes Inline graphic from 1 to 5 for the political blogs dataset.

Fig. 8.

The expected relative cost of Dorfman's two-stage algorithm with Inline graphic as a function of the number of seeded nodes Inline graphic from 1 to 5 for the ego-Facebook dataset.

In our experiments, the number of seeded nodes Inline graphic ranges from 1 to 5. Each data point is obtained by averaging over 10 000 independent runs. Specifically, for the $i$-th run, we measure the prevalence rate $r_i$ and the total number of tests $T_i$. The expected relative cost is calculated by

$$\frac{1}{10\,000}\sum_{i=1}^{10\,000}\frac{T_i}{n},$$

where $n$ is the number of nodes in the graph. The average prevalence rate is calculated by

$$\frac{1}{10\,000}\sum_{i=1}^{10\,000} r_i.$$
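The averaging over runs described above can be sketched as follows. The sketch makes three assumptions. A pooling strategy is a line (an ordering of all n nodes) that is cut into consecutive groups of the Dorfman group size. Each run supplies the set of infected nodes through a caller-provided sampler, for instance an independent cascade simulation with randomly chosen seeds. A leftover group of a single person is counted as one test. These choices and the function names are illustrative rather than taken from the paper.

```python
import random
from typing import Callable, Sequence, Set, Tuple


def dorfman_tests(order: Sequence[int], infected: Set[int], M: int) -> int:
    """Number of tests when consecutive groups of M people in the line are pooled."""
    tests = 0
    for start in range(0, len(order), M):
        group = order[start:start + M]
        tests += 1  # first stage: one pooled test
        if len(group) > 1 and any(x in infected for x in group):
            tests += len(group)  # second stage: retest everyone in a positive pool
    return tests


def estimate(order: Sequence[int], sample_infected: Callable[[random.Random], Set[int]],
             M: int, runs: int, rng: random.Random) -> Tuple[float, float]:
    """Average of (tests / n) and of the infected fraction over independent runs."""
    n = len(order)
    total_cost = total_prevalence = 0.0
    for _ in range(runs):
        infected = sample_infected(rng)
        total_cost += dorfman_tests(order, infected, M) / n
        total_prevalence += len(infected) / n
    return total_cost / runs, total_prevalence / runs


if __name__ == "__main__":
    line = list(range(6))
    always_node_1 = lambda rng: {1}
    print(estimate(line, always_node_1, 3, 10, random.Random(0)))  # about (0.833, 0.167)
```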

As shown in Fig. 5, the pooling strategy from Algorithm 1 results in much lower expected relative costs than the random pooling strategy. We note that the two curves, Random(simulation) and Random(Theory) from (1), are almost identical in this figure. The same findings hold for the email-Eu-core, political blogs, and ego-Facebook datasets in Fig. 6, Fig. 7, and Fig. 8. To understand the effect of the number of seeded nodes, we show the average prevalence rates in Table VII. As shown in this table, the prevalence rates range from about 1% to 12%, which is broadly in line with the prevalence rates of COVID-19 in various countries. Moreover, the email-Eu-core network has the highest prevalence rates among the four datasets. Intuitively, the higher the density and the average clustering coefficient, the higher the prevalence rate. However, under the IC model, the total number of infected people depends strongly on the network's structure. To conclude, under the IC model, exploiting positive correlation within a social graph can reduce the expected relative cost by roughly 10%-13% for the small-world dataset and 20%-35% for the three real-world datasets.

TABLE VII. Average Prevalence Rates (Unit: %).

Dataset; number of seeds 1, 2, 3, 4, 5
small-world 1.26 2.51 3.75 4.95 6.13
email-Eu-core 2.63 5.15 7.42 9.62 11.68
political blogs 1.91 3.77 5.56 7.21 8.78
ego-Facebook 1.26 2.44 3.59 4.72 5.79

V. Conclusion

By modelling the arrival process at a COVID-19 testing site as a regenerative process, we showed that the expected relative cost for positively correlated samples is not higher than that for i.i.d. samples with the same prevalence rate. A more detailed model based on a Markov modulated process allows us to derive a closed-form expression for the expected relative cost. Using the closed-form expression in Theorem 3, we showed that, for a specific Markov modulated process with a moderate positive correlation, Dorfman's two-stage method outperforms the more sophisticated strategies based on Inline graphic-regular pooling matrices when the prevalence rate is higher than 5%.

One important extension of our results is to consider the pooled testing problem with a social graph, in which two persons with frequent social contacts are connected by an edge. To exploit positive correlation in a social graph, we adopted the probabilistic framework of sampled graphs for structural analysis in [18]–[20] and proposed a hierarchical agglomerative algorithm for pooled testing with a social graph in Algorithm 1. Our numerical results show that the pooling strategy obtained from Algorithm 1 can achieve a significant cost reduction (roughly 20%-35% on the three real-world datasets) compared with random pooling when the Dorfman two-stage algorithm is used.

There are several possible extensions for our work:

  • (i) Association of random samples: in this paper, we model the arrival process by three explicit assumptions. It is possible to further generalize our results by using the notion of association of random variables [32]. In particular, it was shown in Theorem 4.1 of [32] that (15) and (16) hold for associated binary random variables.

  • (ii) Sensitivity/specificity analysis: in this paper, we did not consider the effect of noise. Noise (see, e.g., the monograph [33] for various noise models) can affect the sensitivity (true positive rate) and specificity (true negative rate) of a testing method. It would be of interest to see how the expected relative cost is affected by a certain type of noise, e.g., dilution noise.

  • (iii) Information theory perspective: our analysis is mainly from the queueing theory perspective. Two recent related works [34], [35] also exploit community structure for pooled testing from the information theory perspective. Such a perspective could lead to lower bounds on the number of tests.
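For reference, random variables X_1, ..., X_M are associated in the sense of [32] if Cov(f(X), g(X)) ≥ 0 for all coordinatewise nondecreasing functions f and g for which the covariance exists. Write X_i = 1 for a positive sample and X_i = 0 otherwise, and let p be the common prevalence rate (illustrative notation). The standard product inequality for associated variables then gives a one-line version of the comparison with i.i.d. samples for Dorfman's method with group size M:

$$\Pr(X_1 = 0, \dots, X_M = 0) \;\ge\; \prod_{i=1}^{M} \Pr(X_i = 0) \;=\; (1-p)^M,$$

$$\frac{1}{M} + 1 - \Pr(X_1 = 0, \dots, X_M = 0) \;\le\; \frac{1}{M} + 1 - (1-p)^M.$$

In the second line, the left-hand side is the expected relative cost of one Dorfman group: one pooled test, plus M individual retests when the pool is positive. The right-hand side is its value for i.i.d. Bernoulli samples.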

Biographies

Yi-Jheng Lin received the B.S. degree in electrical engineering from National Tsing Hua University, Hsinchu, Taiwan, in 2018. He is currently working toward the Ph.D. degree with the Institute of Communications Engineering, National Tsing Hua University, Hsinchu, Taiwan. His research interests include wireless communication and cognitive radio networks.

Che-Hao Yu received the B.S. degree in mathematics and the M.S. degree in communications engineering from National Tsing-Hua University, Hsinchu, Taiwan, in 2018 and 2020, respectively. His research focuses on 5G wireless communication.

Tzu-Hsuan Liu received the B.S. degree in communication engineering from National Central University, Taoyuan, Taiwan, in 2019 and the M.S. degree from the Institute of Communications Engineering, National Tsing Hua University, Hsinchu, Taiwan, in 2020. Since January 2021, she has been with MediaTek Inc., Hsinchu, Taiwan. Her research focuses on 5G wireless communication.

Cheng-Shang Chang (Fellow, IEEE) received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1983, and the M.S. and Ph.D. degrees in electrical engineering from Columbia University, New York, NY, USA, in 1986 and 1989, respectively. From 1989 to 1993, he was a Research Staff Member with the IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA. Since 1993, he has been with the Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan, where he is a Tsing Hua Distinguished Chair Professor. He is the author of the book Performance Guarantees in Communication Networks (Springer, 2000) and the coauthor of the book Principles, Architectures and Mathematical Theory of High Performance Packet Switches (Ministry of Education, R.O.C., 2006). His current research interests include network science, big data analytics, mathematical modeling of the Internet, and high-speed switching. From 1992 to 1999, he was the Editor of Operations Research, from 2007 to 2009, the Editor of the IEEE/ACM Transactions on Networking, and from 2014 to 2017, the Editor of the IEEE Transactions on Network Science and Engineering. He is currently the Editor-at-Large of the IEEE/ACM Transactions on Networking. He is a Member of IFIP Working Group 7.3. He was the recipient of the IBM Outstanding Innovation Award in 1992, the IBM Faculty Partnership Award in 2001, the Outstanding Research Awards from the National Science Council, Taiwan, in 1998, 2000, and 2002, the Outstanding Teaching Awards from both the College of EECS and the university itself in 2003, the Merit NSC Research Fellow Award from the National Science Council, R.O.C., in 2011, the Academic Award in 2011 and the National Chair Professorship in 2017 from the Ministry of Education, R.O.C., and the 2017 IEEE INFOCOM Achievement Award. In 2002, he was appointed as the first Y. Z. Hsu Scientific Chair Professor.

Wen-Tsuen Chen (Life Fellow, IEEE) received the B.S. degree in nuclear engineering from National Tsing Hua University, Hsinchu, Taiwan, in 1970, and the M.S. and Ph.D. degrees in electrical engineering and computer sciences from the University of California, Berkeley, CA, USA, in 1973 and 1976, respectively. Since 1976, he has been with the Department of Computer Science, National Tsing Hua University and was the Chairman of the Department, the Dean of College of Electrical Engineering and Computer Science, and the President of National Tsing Hua University. In March 2012, he joined the Academia Sinica, Taipei, Taiwan, as a Distinguished Research Fellow of the Institute of Information Science until June 2018. He is currently Sun Yun-suan Chair Professor with National Tsing Hua University. His research interests include computer networks, wireless sensor networks, mobile computing, and parallel computing. He was the recipient of numerous awards for the academic accomplishments in computer networking and parallel processing, including the Outstanding Research Award of the National Science Council, the Academic Award in Engineering from the Ministry of Education, the Technical Achievement Award, and the Taylor L. Booth Education Award of the IEEE Computer Society. He is currently the lifelong National Chair of the Ministry of Education, Taiwan. He is the Founding General Chair of the IEEE International Conference on parallel and distributed systems and the General Chair of the IEEE International Conference on distributed computing systems. He is a Fellow of the Chinese Technology Management Association.

Funding Statement

This work was supported by the Ministry of Science and Technology, Taiwan, under Grant 109-2221-E-007-091-MY2, and in part by Qualcomm Technologies under Grant SOW NAT-435533.

Contributor Information

Yi-Jheng Lin, Email: s107064901@m107.nthu.edu.tw.

Che-Hao Yu, Email: chehaoyu@gapp.nthu.edu.tw.

Tzu-Hsuan Liu, Email: carina000314@gmail.com.

Cheng-Shang Chang, Email: cschang@ee.nthu.edu.tw.

Wen-Tsuen Chen, Email: wtchen@cs.nthu.edu.tw.

References

  • [1] Chen Y.-C., Lu P.-E., Chang C.-S., and Liu T.-H., “A time-dependent SIR model for COVID-19 with undetectable infected persons,” IEEE Trans. Netw. Sci. Eng., vol. 7, no. 4, pp. 3279–3294, Oct.–Dec. 2020.
  • [2] “Pooled sample testing and screening testing for COVID-19,” Aug. 2020. [Online]. Available: https://www.fda.gov/medical-devices/coronavirus-covid-19-and-medical-devices/pooled-sample-testing-and-screening-testing-covid-19
  • [3] Lohse S., et al., “Pooling of samples for testing for SARS-CoV-2 in asymptomatic people,” Lancet Infect. Dis., vol. 20, no. 11, pp. 1231–1232, 2020.
  • [4] Abdalhamid B., Bilder C. R., McCutchen E. L., Hinrichs S. H., Koepsell S. A., and Iwen P. C., “Assessment of specimen pooling to conserve SARS CoV-2 testing resources,” Amer. J. Clin. Pathol., vol. 153, no. 6, pp. 715–718, 2020.
  • [5] Yelin I., et al., “Evaluation of COVID-19 RT-qPCR test in multi-sample pools,” Clin. Infect. Dis., vol. 71, no. 16, pp. 2073–2078, 2020.
  • [6] Gollier C. and Gossner O., “Group testing against COVID-19,” Covid Econ., vol. 2, 2020.
  • [7] “Interim guidance for use of pooling procedures in SARS-CoV-2 diagnostic, screening, and surveillance testing,” Jun. 2020. [Online]. Available: https://www.cdc.gov/coronavirus/2019-ncov/lab/pooling-procedures.html
  • [8] Dorfman R., “The detection of defective members of large populations,” Ann. Math. Statist., vol. 14, no. 4, pp. 436–440, 1943.
  • [9] “List of countries implementing pool testing strategy against COVID-19,” 2020. [Online]. Available: https://en.wikipedia.org/wiki/List_of_countries_implementing_pool_testing_strategy_against_COVID-19
  • [10] Sinnott-Armstrong N., Klein D., and Hickey B., “Evaluation of group testing for SARS-CoV-2 RNA,” medRxiv, 2020.
  • [11] Shental N., et al., “Efficient high-throughput SARS-CoV-2 testing to detect asymptomatic carriers,” Sci. Adv., vol. 6, no. 37, 2020, Art. no. eabc5961.
  • [12] Ghosh S., et al., “Tapestry: A single-round smart pooling technique for COVID-19 testing,” medRxiv, 2020.
  • [13] Ghosh S., et al., “A compressed sensing approach to group-testing for COVID-19 detection,” 2020, arXiv:2005.07895.
  • [14] Mutesa L., et al., “A pooled testing strategy for identifying SARS-CoV-2 at low prevalence,” Nature, vol. 589, pp. 276–280, 2021.
  • [15] Lin Y.-J., Yu C.-H., Liu T.-H., Chang C.-S., and Chen W.-T., “Comparisons of pooling matrices for pooled testing of COVID-19,” 2020, arXiv:2010.00060.
  • [16] Lendle S. D., Hudgens M. G., and Qaqish B. F., “Group testing for case identification with correlated responses,” Biometrics, vol. 68, pp. 532–540, 2012.
  • [17] Deckert A., Bärnighausen T., and Kyei N. N., “Simulation of pooled-sample analysis strategies for COVID-19 mass testing,” Bull. World Health Org., vol. 98, no. 9, pp. 590–598, 2020.
  • [18] Chang C.-S., Hsu C.-Y., Cheng J., and Lee D.-S., “A general probabilistic framework for detecting community structure in networks,” in Proc. IEEE INFOCOM, 2011, pp. 730–738.
  • [19] Chang C.-S., Chang C.-J., Hsieh W.-T., Lee D.-S., Liou L.-H., and Liao W., “Relative centrality and local community detection,” Netw. Sci., vol. 3, no. 4, pp. 445–479, 2015.
  • [20] Chang C.-S., Lee D.-S., Liou L.-H., Lu S.-M., and Wu M.-H., “A probabilistic framework for structural analysis and community detection in directed networks,” IEEE/ACM Trans. Netw., vol. 26, no. 1, pp. 31–46, Feb. 2017.
  • [21] Nelson R., Probability, Stochastic Processes, and Queueing Theory: The Mathematics of Computer Performance Modeling. New York, NY, USA: Springer, 2013.
  • [22] Newman M., Networks: An Introduction. London, U.K.: Oxford Univ. Press, 2010.
  • [23] Zachary W. W., “An information flow model for conflict and fission in small groups,” J. Anthropol. Res., vol. 33, no. 4, pp. 452–473, 1977.
  • [24] Hung M. and Swallow W. H., “Robustness of group testing in the estimation of proportions,” Biometrics, vol. 55, no. 1, pp. 231–237, 1999.
  • [25] Watts D. J. and Strogatz S. H., “Collective dynamics of ‘small-world’ networks,” Nature, vol. 393, pp. 440–442, 1998.
  • [26] Yin H., Benson A. R., Leskovec J., and Gleich D. F., “Local higher-order graph clustering,” in Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2017, pp. 555–564.
  • [27] Leskovec J., Kleinberg J., and Faloutsos C., “Graph evolution: Densification and shrinking diameters,” ACM Trans. Knowl. Discov. Data, vol. 1, no. 1, p. 2, 2007, doi: 10.1145/1217299.1217301.
  • [28] Adamic L. A. and Glance N., “The political blogosphere and the 2004 US election: Divided they blog,” in Proc. 3rd Int. Workshop Link Discov., 2005, pp. 36–43.
  • [29] Leskovec J. and McAuley J. J., “Learning to discover social circles in ego networks,” in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 539–547.
  • [30] Lu P.-E. and Chang C.-S., “Explainable, stable, and scalable graph convolutional networks for learning graph representation,” 2020, arXiv:2009.10367.
  • [31] Kempe D., Kleinberg J., and Tardos É., “Maximizing the spread of influence through a social network,” in Proc. 9th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2003, pp. 137–146.
  • [32] Esary J. D., Proschan F., and Walkup D. W., “Association of random variables, with applications,” Ann. Math. Statist., vol. 38, no. 5, pp. 1466–1474, 1967.
  • [33] Aldridge M., Johnson O., and Scarlett J., “Group testing: An information theory perspective,” Foundations Trends Commun. Inf. Theory, vol. 15, no. 3-4, pp. 196–392, 2019.
  • [34] Nikolopoulos P., Srinivasavaradhan S. R., Guo T., Fragouli C., and Diggavi S., “Group testing for connected communities,” in Proc. Int. Conf. Artif. Intell. Statist., 2021, pp. 2341–2349.
  • [35] Nikolopoulos P., Srinivasavaradhan S. R., Guo T., Fragouli C., and Diggavi S., “Group testing for overlapping communities,” 2021, arXiv:2012.02804.

Articles from IEEE Transactions on Network Science and Engineering are provided here courtesy of the Institute of Electrical and Electronics Engineers.
