Informative Group Testing for Multiplex Assays

Christopher R Bilder; Joshua M Tebbs; Christopher S McMahan

doi:10.1111/biom.12988

Author manuscript; available in PMC: 2021 Mar 15.

Published in final edited form as: Biometrics. 2019 Mar 28;75(1):278–288. doi: 10.1111/biom.12988

PMCID: PMC7959482 NIHMSID: NIHMS1675331 PMID: 30353548

Summary:

Infectious disease testing frequently takes advantage of two tools–group testing and multiplex assays–to make testing timely and cost effective. Until the work of Tebbs et al. (2013) and Hou et al. (2017), there was no research available to understand how best to apply these tools simultaneously. This recent work focused on applications where each individual is considered to be identical in terms of the probability of disease. However, risk-factor information, such as past behavior and presence of symptoms, is very often available on each individual to allow one to estimate individual-specific probabilities. The purpose of our paper is to propose the first group testing algorithms for multiplex assays that take advantage of individual risk-factor information as expressed by these probabilities. We show that our methods significantly reduce the number of tests required while preserving accuracy. Throughout this paper, we focus on applying our methods with the Aptima Combo 2 Assay that is used worldwide for chlamydia and gonorrhea screening.

Keywords: Case identification, Correlated binary data, Latent response, Pooled testing, Sensitivity, Specificity

1. Introduction

“Make do with less” is a constant mantra in today’s society. Governments and businesses all want to cut budgets without sacrificing services. Health care in particular is not immune to this philosophy despite its importance to personal well-being. Fortunately, infectious disease testing is one area of health care that is successfully making do with less. This is because laboratories are increasingly using group testing (also known as pooled testing) to provide reliable diagnostic testing in a timely manner and at lower costs. Group testing works by amalgamating specimens from individuals (e.g., blood or urine) into pools and performing tests on these pools. When compared to testing each specimen individually, a substantial reduction in the number of tests occurs when a disease is rare, because multiple persons can be classified as disease free simultaneously when their group results in a negative outcome. Widespread health care applications of group testing include screening blood donations for infectious diseases (e.g., American Red Cross, 2018; Saá et al., 2018) and testing for chlamydia and gonorrhea as part of national sexually transmitted disease assessment programs (e.g., Lewis et al., 2012). Outside of health care, group testing is widely used in areas including infectious disease testing in animals (Nebraska Veterinary Diagnostic Center, 2018), plant disease assessment (Gildow et al., 2008), and drug discovery (Kainkaryam and Woolf, 2009).

Laboratories also make do with less by using multiplex assays. These assays test for multiple diseases in a single application rather than one-disease, one-test assays. Examples include the Procleix Ultrio Assay for HIV, hepatitis B, and hepatitis C and the Aptima Combo 2 Assay for chlamydia and gonorrhea. Multiplex assays used with group testing can be quite cost effective. For example, the State Hygienic Laboratory (SHL) at the University of Iowa uses group testing and the Aptima Combo 2 Assay with swab specimens. With thousands of individuals tested per year and each test costing about $37, the SHL has estimated savings to be approximately 3 million dollars during a recent 5-year evaluation period.

The testing process used by the SHL is relatively similar to other laboratories. At the SHL, individual specimens are randomly assigned to groups of size four, and these groups are formed via a Tecan DTS robotic platform. The Aptima Combo 2 Assay is applied to each group to detect the ribosomal ribonucleic acid from the Chlamydia trachomatis and Neisseria gonorrhoeae bacteria. These bacteria are what lead to the chlamydia and gonorrhea diseases, respectively. If a group tests negatively for both diseases, all group members are declared negative. If a group tests positively for at least one disease, all group members are retested individually using the same multiplex assay. This same assay is used (rather than switching to single-disease assays) because it is much simpler for lab technicians to use one assay throughout the testing process.

Given the pervasiveness of group testing and multiplex assays in laboratories now, it is perhaps surprising that statistical research has not kept up in this area. Tebbs et al. (2013) were first to examine the use of group testing with multiplex assays. They focused on the two-stage algorithm used by the SHL, which represented a direct extension of the seminal Dorfman (1943) group-testing work for single-disease assays. Motivated by research that has shown other algorithms to often be more efficient (i.e., lower number of tests) than Dorfman’s algorithm for single-disease assays, Hou et al. (2017) generalized Tebbs et al. (2013) to allow for more than two stages in a hierarchical manner (any positively testing group is divided into multiple subgroups to be tested in the next stage).

A restrictive assumption made by both Tebbs et al. (2013) and Hou et al. (2017) was that each individual tested had an equal probability of being positive for a particular disease. However, due to personal behavior or clinical observations, it is natural to think of some individuals as being at higher risk for disease than others. The purpose of our paper here is to develop the first informative group testing algorithms to exploit this type of information with multiplex assays and to show these algorithms can lead to a substantial improvement in testing efficiency. Informative group testing was introduced by Bilder et al. (2010) but has only been examined in the context of single-disease assays (e.g., Black et al., 2015, Liu et al., 2017). The multiplex assay setting is much more complicated because disease statuses are likely to be correlated for each individual. Along with most specimens not being tested individually and assays being imperfect, this leads to unobservable, correlated binary random variables representing individual disease statuses as the underlying stochastic framework for this challenging problem.

We develop the first informative group testing algorithms for multiplex assays as follows. Section 2 develops a new algorithm that can be applied in three or more stages. We derive the expected number of tests and accuracy measures for the algorithm. These operating characteristics are used to determine how best to implement it for a specific application. Section 3 also develops a new informative group testing algorithm but for only two stages. We draw connections between this algorithm and the algorithm from Section 2, which will make clear why this order of development was chosen. Section 4 examines these algorithms through a Monte Carlo simulation study. We provide insight regarding under what conditions these new algorithms perform best, while also comparing to the non-informative group testing work of Tebbs et al. (2013) and Hou et al. (2017). Section 5 applies our new algorithms in the context of the Aptima Combo 2 Assay and its use at laboratories in Idaho, Iowa, and Oregon. We show that our new algorithms can substantially reduce the number of tests needed. Finally, Section 6 summarizes our work and offers ideas for extensions.

2. Hierarchical testing in three or more stages

Because group testing is used to reduce the time and cost of testing, choosing the exact algorithm to implement is an extremely important decision for a laboratory. Factors that affect time and cost include: 1) selection of an initial (stage 1) group size, 2) number of subgroups and their sizes at each stage, 3) number of stages, and 4) choice of which individuals are in which subgroups. These factors can lead to an extremely large number of possible testing configurations, which makes developing a general notation somewhat cumbersome. To address these issues, we introduce a group membership matrix M as a simple tool to uniquely define a hierarchical group testing algorithm for an initial group of I individuals. The rows of this matrix correspond to stages s = 1, …, S, and the columns correspond to individuals i = 1, …, I. Cell values represent the group number of individual i at stage s. For example, consider a three-stage algorithm with a group size of I = 30 in stage 1, three groups of size 10 in stage 2, and individual testing in stage 3. This algorithm is described by Sherlock et al. (2007) for HIV testing in the Seattle area. The group membership matrix is

$$M = \left[ \begin{array}{cccccccccccccccccccccccccccccc} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 & 3 & 3 & 3 & 3 & 3 & 3 & 3 & 3 & 3 & 3 \\ 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 & 16 & 17 & 18 & 19 & 20 & 21 & 22 & 23 & 24 & 25 & 26 & 27 & 28 & 29 & 30 \end{array} \right].$$

Although this algorithm is easily described in words, others are not, especially when group sizes are unequal or when individuals are not tested in later stages. We provide examples of these more complicated testing configurations in Web Appendix A. It is also important to note that M represents the totality of testing that would be performed if each group tests positively until individual testing. Therefore, actual implementation may result in fewer tests. For example, in the previous algorithm, testing would end at stage 1 if the initial group tests negatively for all diseases.
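As an illustration only, the following Python sketch builds the group membership matrix for the three-stage example from a list of group sizes per stage. The helper name `membership_matrix` and its input format are assumptions made for this example, and the sketch assumes every individual is tested at every stage, so it does not cover the more complicated configurations of Web Appendix A.

```python
def membership_matrix(stage_sizes):
    """Build a group membership matrix from per-stage group sizes.

    stage_sizes[s] lists, in column order, the sizes of the groups at
    stage s + 1. Assumes every individual appears in every stage.
    """
    total = sum(stage_sizes[0])
    matrix = []
    for sizes in stage_sizes:
        if sum(sizes) != total:
            raise ValueError("each stage must cover all individuals")
        row = []
        for group_number, size in enumerate(sizes, start=1):
            row.extend([group_number] * size)
        matrix.append(row)
    return matrix


# Three-stage example: one group of 30, three groups of 10, then individuals.
M = membership_matrix([[30], [10, 10, 10], [1] * 30])

# Upper bound on the number of tests: one test per distinct group in each row.
max_tests = sum(len(set(row)) for row in M)
```

Here `max_tests` counts the tests performed if every group tests positively until individual testing, which is the totality of testing that M represents.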

With so many testing configurations possible, it is desirable to implement one which is optimal in some manner. We choose to minimize the expected number of tests per individual and define the resulting configuration with its corresponding M as the optimal testing configuration (OTC). Alternative optimal criteria have been suggested for single-disease group testing, and we provide a brief discussion of them in Web Appendix B.

2.1. Expected number of tests

Consider an initial group of I individuals. We represent the number of individuals within group j at stage s by I_sj (I₁₁ ≡ I) and the number of possible groups at stage s by c_s. If group j at stage s tests positively for at least one disease (or a pathogen that can lead to a disease), it is divided into m_sj groups. Thus, in the context of the three-stage algorithm just described, I₁₁ = 30, c₁ = 1, and m₁₁ = 3 for stage 1. Continuing for stage 2, I_2j = 10, c₂ = 3, and m_2j = 10 for j = 1, 2, 3. Lastly, for stage 3, I_3j = 1, c₃ = 30, and m_3j = 0 for j = 1, …, 30. This precise representation of group membership is needed to provide a general expression for the expected number of tests.
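The quantities I_sj, c_s, and m_sj can be read directly off M. The sketch below is one possible way to do so in Python; the function name `group_structure` and the dictionary layout are hypothetical, and the sketch assumes every individual appears in every row of M.

```python
from collections import Counter


def group_structure(M):
    """Return, for each stage, the group count c_s, sizes I_sj, and splits m_sj."""
    n_stages = len(M)
    structure = []
    for s in range(n_stages):
        sizes = Counter(M[s])  # I_sj: number of individuals in group j at stage s
        splits = {}
        for j in sizes:
            if s + 1 < n_stages:
                members = [i for i, g in enumerate(M[s]) if g == j]
                # m_sj: number of distinct next-stage groups formed from group j
                splits[j] = len({M[s + 1][i] for i in members})
            else:
                splits[j] = 0  # no further splitting after the final stage
        structure.append({"c": len(sizes), "I": dict(sizes), "m": splits})
    return structure


# Three-stage example from the text.
M = [[1] * 30, [1] * 10 + [2] * 10 + [3] * 10, list(range(1, 31))]
structure = group_structure(M)
```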

Define T as a random variable denoting the total number of tests needed to determine which individuals are positive and negative in an initial group of size I. Disease statuses are regarded as being independent across individuals, but we allow for dependence among disease statuses for each individual. All individuals are hierarchically tested in successively smaller groups in up to S stages until they are tested alone or their group tests negatively for each disease. At each stage, an individual is a member of only one group. Define G_sjk as a binary random variable denoting the test result for the kth disease (k = 1, …, K) in group j at stage s, where G_sjk = 1 (0) denotes a positive (negative) test result. For example, the group at stage 1 may result in (G₁₁₁, G₁₁₂) = (0, 1), meaning that the initial group tested negatively for the first disease and tested positively for the second disease. Following current group testing practice with multiplex assays, we assume that if a group tests positively for at least one disease, its members will be retested for all diseases again in the next stage.

To account for successive stage dependence, we define $G_{s j k}^{(t)}$ as the ancestor group result for G_sjk at stage t ⩽ s. These ancestor groups represent the groups that need to be tested at earlier stages in order for group j at stage s to be tested. Thus, individuals tested to produce the G_sjk response are a subset of the individuals that produce the $G_{s j k}^{(t)}$ response. This notation also means that $G_{s j k}^{(s)} \equiv G_{s j k}$ , where the ancestor form is used at times for notational convenience. For example, $G_{32 k} \equiv G_{32 k}^{(3)}$ (group 2 in stage 3) for the previously given three-stage algorithm has ancestor pool results of $G_{32 k}^{(2)} \equiv G_{21 k}$ and $G_{32 k}^{(1)} \equiv G_{21 k}^{(1)} \equiv G_{11 k}$ . With this definition of ancestor groups, the expected number of tests using group membership matrix M can be expressed as

$$E(T \mid M) = 1 + \sum_{s = 1}^{S - 1} \sum_{j = 1}^{c_{s}} m_{sj} P\left(G_{sj+}^{(1)} > 0, \dots, G_{sj+}^{(s)} > 0\right), \tag{1}$$

where $G_{s j +}^{(t)} = G_{s j 1}^{(t)} + \dots + G_{s j K}^{(t)}$ for t = 1, …, s. The joint probability in Equation (1) represents the successive testing that occurs up to stage s. For example, in the previously given three-stage algorithm, the second individual is tested at stage 3 with probability $P (G_{21 +}^{(1)} > 0, G_{21 +}^{(2)} > 0) = P (G_{11 +} > 0, G_{21 +} > 0)$ . The leading 1 in Equation (1) is included because the group at stage 1 is always tested. This group tests positively for at least one disease with probability $P (G_{s j +}^{(1)} > 0) = P (G_{11 +} > 0)$ , and a positive outcome leads to m₁₁ new tests at stage 2. Subsequent stages have m_sj new tests that occur with probability $P (G_{s j +}^{(1)} > 0, \dots, G_{s j +}^{(s)} > 0)$ .
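To make Equation (1) concrete, consider the special case of a perfect assay (sensitivity and specificity both equal to 1), which is an assumption made only for this illustration. In that case a group and all of its ancestors test positively exactly when the group contains at least one truly positive member, so the joint probability reduces to one minus the product of the members' disease-free probabilities (p_i00 when K = 2). The Python sketch below uses a hypothetical function name and assumes that M has no groups of size one before the final stage.

```python
from math import prod


def expected_tests_perfect_assay(M, p_free):
    """Equation (1) under the simplifying assumption of no testing error.

    M is the group membership matrix (list of rows), and p_free[i] is the
    probability that individual i is free of every disease.
    """
    n_stages = len(M)
    total = 1.0  # the stage 1 group is always tested
    for s in range(n_stages - 1):
        for j in sorted(set(M[s])):
            members = [i for i, g in enumerate(M[s]) if g == j]
            m_sj = len({M[s + 1][i] for i in members})
            # With a perfect assay, group j (and every ancestor) tests
            # positively exactly when some member is truly positive.
            p_positive = 1.0 - prod(p_free[i] for i in members)
            total += m_sj * p_positive
    return total


M = [[1] * 30, [1] * 10 + [2] * 10 + [3] * 10, list(range(1, 31))]
p_free = [0.9125] * 30  # illustrative value, identical for every individual
expected_tests = expected_tests_perfect_assay(M, p_free)
```

With testing error, the joint probability no longer simplifies in this way, and the full expression in Equation (2) is needed.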

While Equation (1) is relatively simple in form, calculating the joint probability within it is challenging for a number of reasons. First, we need to account for potential testing error because assays are rarely 100% accurate. Thus, while we observe G_sjk, we would really like to observe ${\tilde{G}}_{s j k}$ , say, denoting the true positive (1) and negative (0) group status. Second, these disease statuses for each group at each stage are correlated. For example, ${\tilde{G}}_{111}$ and ${\tilde{G}}_{112}$ are correlated random variables giving the initial group’s statuses for two diseases. Third, due to the testing hierarchy, each response at stage s is dependent on its ancestor group response at stage s − 1. Lastly and most importantly, each individual has potentially a different probability of being positive for the diseases. By defining the true status for individual i and disease k as ${\tilde{Y}}_{i k} = 1 (0)$ for a positive (negative) status, we can denote these joint probabilities of disease for individual i as $P ({\tilde{Y}}_{i 1} = {\tilde{y}}_{1}, \dots, {\tilde{Y}}_{i K} = {\tilde{y}}_{K}) = p_{i {\tilde{y}}_{1} \dots {\tilde{y}}_{K}}$ , where ${\tilde{y}}_{1}, \dots, {\tilde{y}}_{K} \in \{0, 1\}$ . These individual-specific probabilities will be used for informative group testing in Section 2.2 to take advantage of the risk-factor information available.

For the above reasons, the derivation of $P (G_{s j +}^{(1)} > 0, \dots, G_{s j +}^{(s)} > 0)$ is quite long and requires special attention to understand its nuances. We provide this derivation in Web Appendix C.1. In summary, $P (G_{s j +}^{(1)} > 0, \dots, G_{s j +}^{(s)} > 0)$ can be shown to be equal to

$$\sum_{\tilde{g}_{sj}^{(1)}} \dots \sum_{\tilde{g}_{sj}^{(s)}} \left[ \prod_{s' = 1}^{s} \left\{ 1 - \prod_{k = 1}^{K} \left(1 - S_{e:sjk}^{(s')}\right)^{\tilde{g}_{sjk}^{(s')}} \left(S_{p:sjk}^{(s')}\right)^{1 - \tilde{g}_{sjk}^{(s')}} \right\} \right] P\left(\tilde{G}_{sj}^{(1)} = \tilde{g}_{sj}^{(1)}\right) \times \prod_{s' = 1}^{s - 1} P\left(\tilde{G}_{sj}^{(s' + 1)} = \tilde{g}_{sj}^{(s' + 1)} \mid \tilde{G}_{sj}^{(s')} = \tilde{g}_{sj}^{(s')}\right), \tag{2}$$

where ${\tilde{G}}_{s j}^{(t)} = ({\tilde{G}}_{s j 1}^{(t)}, \dots, {\tilde{G}}_{s j K}^{(t)})$ is a vector of ancestor true positive/negative statuses for group j at stage t and ${\tilde{g}}_{s j}^{(t)}$ is a vector of corresponding 0 and 1 potential realizations for each ${\tilde{G}}_{s j k}^{(t)}, k = 1, \dots, K$ . Expressions for $P ({\tilde{G}}_{s j}^{(1)} = {\tilde{g}}_{s j}^{(1)})$ and $P ({\tilde{G}}_{s j}^{(s^{'} + 1)} = {\tilde{g}}_{s j}^{(s^{'} + 1)} ∣ {\tilde{G}}_{s j}^{(s^{'})} = {\tilde{g}}_{s j}^{(s^{'})})$ are given in Web Appendix C.1. It is important to note that these probability expressions are functions of $p_{i {\tilde{y}}_{1} \dots {\tilde{y}}_{K}}$ . Values for the sensitivity $S_{e : s j k}^{(s^{'})} = P (G_{s j k}^{(s^{'})} = 1 ∣ {\tilde{G}}_{s j k}^{(s^{'})} = 1)$ and the specificity $S_{p : s j k}^{(s^{'})} = P (G_{s j k}^{(s^{'})} = 0 ∣ {\tilde{G}}_{s j k}^{(s^{'})} = 0)$ are obtained from large validation trials performed by the assay manufacturer. Properly calibrated assays will typically have the same accuracies for each group at each stage, especially for nucleic acid amplification tests. However, our general notation allows for potential differences if needed, such as when dilution is a concern or when different assays are used within the testing algorithm.
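The term in braces in Equation (2) is the probability that a group tests positively for at least one disease given its vector of true statuses. A minimal Python sketch of that single term follows; the function name is hypothetical, and the values 0.95 and 0.99 are the sensitivity and specificity used later in the simulation study of Section 4.

```python
from itertools import product
from math import prod


def prob_group_tests_positive(true_status, se, sp):
    """P(at least one of K tests is positive | true group statuses).

    true_status[k] is 1 if the group is truly positive for disease k.
    se[k] and sp[k] are the assay sensitivity and specificity for disease k.
    """
    # Probability that all K disease-specific results are negative.
    all_negative = prod(
        (1.0 - se_k) if g_k == 1 else sp_k
        for g_k, se_k, sp_k in zip(true_status, se, sp)
    )
    return 1.0 - all_negative


se = [0.95, 0.95]
sp = [0.99, 0.99]
positive_prob = {
    g: prob_group_tests_positive(g, se, sp) for g in product((0, 1), repeat=2)
}
```

For K = 2, a truly negative group still triggers retesting with probability 1 − 0.99² = 0.0199, because a false positive for either disease is enough.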

We provide an R function in Web Appendix D to calculate E(T|M). Being able to calculate E(T|M) in closed form allows us to address optimal ways to apply group testing with multiplex assays, which we describe next.

2.2. Optimal testing configuration

The OTC is the most efficient group testing algorithm possible. It is given by a group membership matrix that minimizes E(T|M)/I over all possible M for a specified number of stages S and for a range of values for I to consider. Equipment constraints, assay detection thresholds, and time limitations all play important roles for laboratories when deciding on maximum values for I and S to include in a search for the OTC. Since the year 2000, we have seen at most 90 individuals pooled (Sherlock et al., 2007) and at most 4 stages used (Quinn et al., 2000) for human infectious disease testing. Most frequently, I is less than 50 and S is 2 or 3. For these reasons, one can most often enumerate over all possible M and calculate E(T|M)/I for each to find the OTC. Combinatorial optimization algorithms can be used instead in those rare cases when laboratories are without these types of constraints.

To search for the OTC, we adopt similar conventions as used by Hwang (1975), McMahan et al. (2012a), Black et al. (2012), and Black et al. (2015) for single-disease assays. In summary, they proposed to order individuals by their probabilities of disease (say, $p_{i {\tilde{y}}_{1}}$ ) and subsequently assign individuals in sequence (smallest to largest probabilities of being positive) to groups of the same or smaller size. For example, with I = 7, a testing configuration could have the individuals with the four smallest probabilities in the first group, the individuals with the next two smallest probabilities in the second group, and the individual with the largest probability tested alone. One can show mathematically that the OTC will always occur with this type of group assignment in special cases (no testing error, Hwang, 1975; halving of groups, Black et al., 2012). These conventions are also very intuitive toward the ultimate goal of minimizing the number of tests. For example, consider the most extreme situation when individuals either have a probability of 0 or 1 for being positive and assume there are no testing errors. The most efficient way to form groups is to assign all individuals with a probability of 0 to one group and all individuals with a probability of 1 to groups of size one. As values for these probabilities deviate from 0 and 1, one would naturally want to test those individuals with small (large) probabilities of disease in large (small) groups.

For multiplex assays, we now order individuals by their probabilities of being truly positive for at least one disease; e.g., when K = 2, form an ordering based on 1 − p_i00 values from smallest to largest. Individuals based on this ordering are assigned to the corresponding columns of M. For subsequent stages, these ordered individuals are sequentially assigned again to groups of equal or smaller size when searching for the OTC. It is important to note that E(T|M) is a function of all joint probabilities of disease for each individual i, e.g., p_i00, p_i01, p_i10, and p_i11 when K = 2. Thus, these probabilities play an important role in determining which configuration is best.
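The ordering step and the enumeration of candidate group sizes can be sketched as follows for K = 2. This is one reading of "groups of equal or smaller size", namely group sizes that do not increase along the ordering; the function names and the three example individuals are hypothetical.

```python
def order_by_risk(joint_probs):
    """Return individual indices sorted by 1 - p_i00, smallest first.

    joint_probs[i] is the tuple (p_i00, p_i01, p_i10, p_i11).
    """
    return sorted(range(len(joint_probs)), key=lambda i: 1.0 - joint_probs[i][0])


def nonincreasing_sizes(total, largest=None):
    """Yield every list of group sizes summing to `total`, largest group first."""
    if largest is None:
        largest = total
    if total == 0:
        yield []
        return
    for first in range(min(total, largest), 0, -1):
        for rest in nonincreasing_sizes(total - first, first):
            yield [first] + rest


# Three hypothetical individuals with different risks.
joint_probs = [
    (0.80, 0.05, 0.05, 0.10),
    (0.99, 0.004, 0.004, 0.002),
    (0.90, 0.04, 0.04, 0.02),
]
ordering = order_by_risk(joint_probs)

# Candidate configurations for I = 7, including the (4, 2, 1) example in the text.
configurations = list(nonincreasing_sizes(7))
```

Each candidate list of sizes corresponds to one row of M once the ordered individuals are assigned to groups in sequence; a search for the OTC would then evaluate E(T|M)/I for each candidate.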

Of course, these probabilities of disease will not be known in actual application. Instead, they can be estimated by an appropriate regression model, with covariates incorporating the available risk-factor information. When prior testing data are available, a model can be trained upon these data and then applied to estimate probabilities for future individuals. We use this approach in Section 5 to obtain the OTC in our Aptima Combo 2 Assay application.

2.3. Accuracy

While E(T|M) plays a critical role in determining which group testing algorithm to use, it is also important to characterize an algorithm’s accuracy before recommendations are made for implementation. Of course, an algorithm with a small E(T|M) may not be desirable for use without a sufficient level of accuracy. Two measures that describe accuracy are pooling sensitivity and pooling specificity. For individual i and disease k, the pooling sensitivity PS_e:ik is the probability that the individual is diagnosed as positive through group testing given this individual is truly positive. Pooling specificity PS_p:ik is defined analogously for a negative diagnosis given a truly negative individual.

Unlike in Tebbs et al. (2013) and Hou et al. (2017), these accuracy measures will likely be different for each individual due to the available risk-factor information. Therefore, in addition to characterizing an algorithm’s accuracy before implementation, these measures could allow one to isolate incorrect diagnoses after implementation. For this purpose, we define the pooling positive predictive value PPPV_ik as the probability that individual i is truly positive for disease k given a positive diagnosis. The pooling negative predictive value PNPV_ik is defined analogously for a truly negative individual given a negative diagnosis. To illustrate the importance of predictive values, suppose individual i is diagnosed as positive for disease k and has PPPV_ik = 0.25. Arguably, a positive response is somewhat incongruous with this predictive value, and thus one may want to perform a confirmatory test. Although we do not develop a formal framework for this type of testing, it is an important potential advantage of our approach.

To express these accuracy measures, define ${\tilde{G}}_{s j \bar{k}}$ as a vector of true positive/negative (1/0) statuses for group j at stage s, where ${\tilde{G}}_{s j k}$ is omitted. Define ${\tilde{g}}_{s j \bar{k}}$ in a similar manner for the potential realizations. Our derivation in Web Appendix C.2 shows the pooling sensitivity to be

$$PS_{e:ik} = P\left(G_{Lj+}^{(1)} > 0, \dots, G_{Lj+}^{(L - 1)} > 0, G_{Ljk} = 1 \mid \tilde{Y}_{ik} = 1\right) = \sum_{\tilde{g}_{Lj\bar{k}}^{(1)}} \dots \sum_{\tilde{g}_{Lj\bar{k}}^{(L - 1)}} S_{e:Ljk}^{(L)} \left[ \prod_{s' = 1}^{L - 1} \left\{ 1 - \left(1 - S_{e:Ljk}^{(s')}\right) \prod_{q = 1, q \neq k}^{K} \left(1 - S_{e:Ljq}^{(s')}\right)^{\tilde{g}_{Ljq}^{(s')}} \left(S_{p:Ljq}^{(s')}\right)^{1 - \tilde{g}_{Ljq}^{(s')}} \right\} \right] \times P\left(\tilde{G}_{Lj\bar{k}}^{(1)} = \tilde{g}_{Lj\bar{k}}^{(1)} \mid \tilde{Y}_{ik} = 1\right) \prod_{s' = 1}^{L - 2} P\left(\tilde{G}_{Lj\bar{k}}^{(s' + 1)} = \tilde{g}_{Lj\bar{k}}^{(s' + 1)} \mid \tilde{G}_{Lj\bar{k}}^{(s')} = \tilde{g}_{Lj\bar{k}}^{(s')}, \tilde{Y}_{ik} = 1\right), \tag{3}$$

where L (L ⩽ S) denotes the stage at which individual i tests positively in the jth group that has a size of 1. The conditional probabilities in Equation (3), which are also provided in Web Appendix C.2, depend on the joint probabilities of disease for individual i, making PS_e:ik likely unequal for each individual.

To find the pooling specificity, define Y_ik as a binary response denoting the positive (1) or negative (0) diagnosis for individual i with respect to disease k. We can simplify the process of finding the pooling specificity by writing it in terms of the pooling sensitivity:

$$PS_{p:ik} = P\left(Y_{ik} = 0 \mid \tilde{Y}_{ik} = 0\right) = 1 - \frac{P(Y_{ik} = 1) - PS_{e:ik} P(\tilde{Y}_{ik} = 1)}{1 - P(\tilde{Y}_{ik} = 1)},$$

where the derivation is given in Web Appendix C.2. The derivation for P(Y_ik = 1) is very similar to the derivation of Equation (2), so we provide this in Web Appendix C.2 as well. The marginal probability $P ({\tilde{Y}}_{i k} = 1)$ is simply obtained by summing over the corresponding joint probabilities for $p_{i {\tilde{y}}_{1} \dots {\tilde{y}}_{K}}$ .

Predictive values are found through a standard application of Bayes’ rule. The pooling positive predictive value for individual i and disease k is

$$PPPV_{ik} = P\left(\tilde{Y}_{ik} = 1 \mid Y_{ik} = 1\right) = \frac{P(\tilde{Y}_{ik} = 1) PS_{e:ik}}{P(\tilde{Y}_{ik} = 1) PS_{e:ik} + P(\tilde{Y}_{ik} = 0) (1 - PS_{p:ik})}.$$

The pooling negative predictive value is

$$PNPV_{ik} = P\left(\tilde{Y}_{ik} = 0 \mid Y_{ik} = 0\right) = \frac{P(\tilde{Y}_{ik} = 0) PS_{p:ik}}{P(\tilde{Y}_{ik} = 0) PS_{p:ik} + P(\tilde{Y}_{ik} = 1) (1 - PS_{e:ik})}.$$
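The pooling specificity and predictive value formulas translate directly into code. The Python sketch below assumes K = 2 and that the pooling sensitivity and P(Y_ik = 1) have already been computed by other means; the function names and the numerical inputs are hypothetical and chosen only to exercise the formulas.

```python
def marginal_prevalence(joint, k):
    """P(true status for disease k is positive) from (p00, p01, p10, p11)."""
    p00, p01, p10, p11 = joint
    return p10 + p11 if k == 1 else p01 + p11


def pooling_specificity(p_diag_pos, ps_e, prevalence):
    """PS_p written in terms of P(Y = 1), PS_e, and the marginal prevalence."""
    return 1.0 - (p_diag_pos - ps_e * prevalence) / (1.0 - prevalence)


def pppv(prevalence, ps_e, ps_p):
    """Pooling positive predictive value via Bayes' rule."""
    true_pos = prevalence * ps_e
    return true_pos / (true_pos + (1.0 - prevalence) * (1.0 - ps_p))


def pnpv(prevalence, ps_e, ps_p):
    """Pooling negative predictive value via Bayes' rule."""
    true_neg = (1.0 - prevalence) * ps_p
    return true_neg / (true_neg + prevalence * (1.0 - ps_e))


# Hypothetical inputs for one individual and disease k = 1.
prevalence = marginal_prevalence((0.9125, 0.0375, 0.0375, 0.0125), k=1)
ps_e, ps_p = 0.90, 0.99
ppv = pppv(prevalence, ps_e, ps_p)
npv = pnpv(prevalence, ps_e, ps_p)
```

Individuals with a small marginal probability of disease will tend to have a low pooling positive predictive value even when the pooling specificity is high, which is the situation in which a confirmatory test may be worth considering.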

3. Two-stage testing

Section 2 proposes an informative group testing algorithm for S ⩾ 3 hierarchical stages. By taking advantage of the hierarchy, one can incorporate a two-stage algorithm (S = 2) into this same framework with minor modifications. Specifically, two-stage testing is essentially three-stage testing with the initial group of the three stages not tested. Thus, for two-stage testing, a set of I individuals is immediately split into separate groups of size I_1j, j = 1, …, c₁, without presuming that at least one positive individual exists. This allows the I individuals to be optimally placed into groups by taking into account their joint probabilities of disease. Web Appendix A provides an example of a group membership matrix. Web Appendix F.2 provides an example of how to apply a group membership matrix for testing.

The expected number of tests for two-stage testing can be expressed as

$$E(T \mid M) = c_{1} + \sum_{j = 1}^{c_{1}} m_{1j} P(G_{1j+} > 0).$$

Accuracy measures for two-stage testing follow from Section 2. The process to find the OTC also follows from Section 2.
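For two-stage testing with K = 2, the expected number of tests can be computed exactly with a short calculation. The Python sketch below is an illustration rather than the authors' R implementation: the function names are hypothetical, true group statuses are obtained by inclusion-exclusion under independence across individuals, and the disease-specific test results are assumed to be conditionally independent given the true statuses, as the product form in Equation (2) suggests. Groups of size one are assumed to need no retesting (m_1j = 0).

```python
from math import prod


def group_status_probs(members):
    """Distribution of the true group status (disease 1, disease 2).

    members is a list of (p00, p01, p10, p11) tuples, one per individual.
    """
    none_positive = prod(p[0] for p in members)
    no_disease_1 = prod(p[0] + p[1] for p in members)  # nobody has disease 1
    no_disease_2 = prod(p[0] + p[2] for p in members)  # nobody has disease 2
    return {
        (0, 0): none_positive,
        (0, 1): no_disease_1 - none_positive,
        (1, 0): no_disease_2 - none_positive,
        (1, 1): 1.0 - no_disease_1 - no_disease_2 + none_positive,
    }


def expected_tests_two_stage(groups, se, sp):
    """E(T | M) = c_1 + sum_j m_1j * P(G_1j+ > 0) for K = 2."""
    total = 0.0
    for members in groups:
        total += 1.0  # every stage 1 group is tested once
        if len(members) == 1:
            continue  # already an individual test, so m_1j = 0
        p_positive = sum(
            prob * (1.0 - prod((1.0 - se[k]) if g[k] else sp[k] for k in range(2)))
            for g, prob in group_status_probs(members).items()
        )
        total += len(members) * p_positive
    return total


# Four identical individuals in one group of size four (illustrative values).
homogeneous = (0.9125, 0.0375, 0.0375, 0.0125)
tests_per_individual = expected_tests_two_stage(
    [[homogeneous] * 4], se=[0.95, 0.95], sp=[0.99, 0.99]
) / 4
```

Combining this calculation with an enumeration of candidate group sizes gives one way to search for a two-stage OTC when I is small.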

4. Algorithm performance

We use Monte Carlo simulation to evaluate our proposed algorithms by calculating their expected number of tests and accuracy measures over a number of situations. Our focus is on S = 3 and K = 2 due to algorithm similarities for different S and K. For these investigations, suppose $P_{i} = (P_{i 00}, P_{i 01}, P_{i 10}, P_{i 11}) \overset{iid}{\sim} \text{Dirichlet}(\alpha)$ represents probabilities of disease for individual i. Defining P_i as a vector of random variables, rather than using the constant $p_{i {\tilde{y}}_{1} {\tilde{y}}_{2}}$ values as in previous sections, allows us to examine different combinations of probabilities of disease in a controlled manner and to emulate variation that would occur in practice from one individual to the next. For our simulations, we set E(P_i1+) and E(P_i+1), the expected values for the marginal probability of being positive for diseases 1 and 2, respectively, to particular fixed values. Similarly, we control the variability of these marginal probabilities with Var(P_i1+) and Var(P_i+1).

In our first set of investigations, we simulate P_i with E(P_i1+) = E(P_i+1) = 0.05 to match low disease prevalence situations where group testing is most often used. There are an infinite number of α values with these same expectations, so we begin with α = (18.25, 0.75, 0.75, 0.25) to emulate variability levels among marginal probabilities that we have seen in practice. Note that large values for the first element of α are needed because they lead to large values of p_i00 (probability of being disease free) relative to the other joint probabilities. Our investigations also include two other cases with $P_{i} \overset{iid}{\sim} \text{Dirichlet}(4 \alpha)$ and $P_{i} \overset{iid}{\sim} \text{Dirichlet}(\alpha / 4)$ that result in E(P_i1+) = E(P_i+1) = 0.05. The former case reduces Var(P_i1+) and Var(P_i+1), while the latter case increases Var(P_i1+) and Var(P_i+1) relative to using $P_{i} \overset{iid}{\sim} \text{Dirichlet}(\alpha)$ . We simply refer to these three informative group testing cases as “low variability” (4α), “medium variability” (α), and “high variability” (α/4); summaries of these distributions are given in Web Appendix E. For each case, I random realizations of P_i are obtained, and the OTC is found for these realizations using the methods described in Section 2.2. We repeat this process of simulating I random realizations of P_i and finding the OTC for a total of 500 times. We summarize each set of simulations by averaging over the expected number of tests per individual. The assay sensitivity and specificity are set to 0.95 and 0.99, respectively, for both diseases and for all groups tested in each stage.
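The effect of scaling α can be checked with a short calculation. By the aggregation property of the Dirichlet distribution, the marginal P_i1+ = P_i10 + P_i11 follows a Beta distribution, so its mean and variance are available in closed form. The Python sketch below uses hypothetical function names and the standard library only; the sampler normalizes independent gamma draws, which is a standard way to generate Dirichlet vectors and is not necessarily how the original simulations were run.

```python
import random

ALPHA = (18.25, 0.75, 0.75, 0.25)  # order: (P_i00, P_i01, P_i10, P_i11)


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet vector by normalizing independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return tuple(d / total for d in draws)


def marginal_moments(alpha, cells):
    """Mean and variance of the sum of the Dirichlet components in `cells`."""
    a = sum(alpha[c] for c in cells)
    a0 = sum(alpha)
    mean = a / a0
    variance = a * (a0 - a) / (a0 ** 2 * (a0 + 1.0))
    return mean, variance


cases = {
    "low": tuple(4.0 * a for a in ALPHA),
    "medium": ALPHA,
    "high": tuple(a / 4.0 for a in ALPHA),
}
# Cells 2 and 3 are P_i10 and P_i11, whose sum is the disease 1 marginal P_i1+.
moments = {name: marginal_moments(alpha, (2, 3)) for name, alpha in cases.items()}

rng = random.Random(2019)  # arbitrary seed for reproducibility
one_individual = sample_dirichlet(ALPHA, rng)
```

All three cases share a marginal mean of 0.05, while the variance increases from the low to the high variability case.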

Our investigations also include comparisons to the non-informative group testing work of Hou et al. (2017). We refer to this case as “homogeneous” because all individuals would be assumed to have the same disease probabilities, so that Var(P_i1+) = Var(P_i+1) = 0. Therefore, we set E(P_i) with α = (18.25,0.75,0.75,0.25) to be the realization of P_i for each individual. The dependence between disease statuses for this case is given by an odds ratio, OR = 8.11. By defining a pseudo odds ratio for the three other cases as $\tilde{O R} = E (P_{i 00}) E (P_{i 11}) / [E (P_{i 01}) E (P_{i 10})]$ , we see that $\tilde{O R} = 8.11$ for each as well, so our comparisons are with similar amounts of dependence. With respect to finding the OTC for the algorithm of Hou et al. (2017), we make one small adjustment to it by allowing for unequal group sizes in stage 2. This adjustment enables comparisons to be made over a larger set of configurations.
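The stated odds ratio can be verified directly. Because each expected value E(P_i·) equals the corresponding element of α divided by the sum of α, the normalizing constant cancels in the pseudo odds ratio, which is why rescaling α leaves it unchanged. A minimal Python check, with a hypothetical function name, follows.

```python
def pseudo_odds_ratio(alpha):
    """E(P_i00) E(P_i11) / [E(P_i01) E(P_i10)] for P_i ~ Dirichlet(alpha)."""
    a00, a01, a10, a11 = alpha
    # The common factor 1 / sum(alpha) cancels between numerator and denominator.
    return (a00 * a11) / (a01 * a10)


ALPHA = (18.25, 0.75, 0.75, 0.25)
odds_ratios = {
    "low": pseudo_odds_ratio(tuple(4.0 * a for a in ALPHA)),
    "medium": pseudo_odds_ratio(ALPHA),
    "high": pseudo_odds_ratio(tuple(a / 4.0 for a in ALPHA)),
}
```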

Figure 1 displays the averaged E(T|M)/I values at I = 5, …, 20 when M is chosen to correspond to the OTC at that particular I. This figure demonstrates that the reduction in the number of tests for informative group testing can be very significant (up to 18% for high variability) when compared to non-informative group testing (homogeneous case). This reduction is further magnified when considering that thousands or even millions of individuals are tested in high-volume clinical specimen settings when group testing is necessary due to time and cost constraints. Figure 1 also shows that as the variability among P_i1+ and P_i+1 increases, informative group testing becomes more efficient. This happens because larger observed p_i1+ and p_i+1 may occur and informative group testing algorithms can exploit this better through using different group sizes in stage 2. Figure 2 illustrates these differently sized groups by plotting their average sizes in a stacked bar chart. The plot shows that more groups and larger variability in their sizes occur as the variability increases for P_i1+ and P_i+1.

Figure 2. Stacked bar chart of average stage 2 group sizes across 500 separate simulations for each I, where $P_{i} \overset{iid}{\sim} \text{Dirichlet}(\cdot)$, E(P_i1+) = E(P_i+1) = 0.05, and S = 3. From the bottom to the top of each bar, the average group size is given for the first group containing the largest p_i00 values to the last group containing the smallest p_i00 values. Please see Figure 1 for a description of the homogeneous (non-informative group testing) and variability cases.

The clear advantage of using our proposals extends to other situations as well, including for E(P_i1+) = E(P_i+1) = 0.01 with $\tilde{O R} = 43.67$ and for E(P_i1+) = E(P_i+1) = 0.10 with $\tilde{O R} = 3.67$ that are shown in Web Appendix E. This web appendix also provides results where we allow $\tilde{O R}$ to vary among informative group testing cases, but keep E(P_i1+) = E(P_i+1) and Var(P_i1+) = Var(P_i+1) constant. These investigations show that as the strength of positive dependence $(\tilde{O R} > 1)$ increases, the average expected number of tests per individual decreases. This occurs due to positive individuals being more likely to have more than one disease as the dependence increases.
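The effect of dependence can be illustrated with a small Python sketch. With the marginal probabilities held fixed, increasing p_i11 raises the pseudo odds ratio and also raises p_i00, so fewer individuals are positive for at least one disease. The sketch assumes equal marginals, a perfect assay, and independent individuals with identical joint probabilities; the function names and the group size of 8 are chosen only for illustration.

```python
def joint_from_marginal(marginal, p11):
    """Joint probabilities (p00, p10, p01, p11) when both marginals equal `marginal`."""
    p10 = marginal - p11
    p01 = marginal - p11
    p00 = 1.0 - p10 - p01 - p11
    return p00, p10, p01, p11


def pseudo_odds_ratio(p00, p10, p01, p11):
    """p00 * p11 / (p01 * p10)."""
    return (p00 * p11) / (p01 * p10)


marginal = 0.05
group_size = 8  # illustrative group size
summary = []
for p11 in (0.0025, 0.0125, 0.0250):
    p00, p10, p01, _ = joint_from_marginal(marginal, p11)
    odds_ratio = pseudo_odds_ratio(p00, p10, p01, p11)
    # Probability that a group has no truly positive member for either disease.
    prob_group_negative = p00 ** group_size
    summary.append((p11, odds_ratio, p00, prob_group_negative))
    print(f"p11 = {p11:.4f}: OR = {odds_ratio:6.2f}, p00 = {p00:.4f}, "
          f"P(group of {group_size} all negative) = {prob_group_negative:.3f}")
```

Setting p_i11 to one quarter of the marginal probability reproduces the pseudo odds ratios quoted in the text (43.67, 8.11, and 3.67 for marginals of 0.01, 0.05, and 0.10); that choice is inferred here from the reported values rather than stated in the text.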

Figure 3 summarizes accuracy for α = (18.25, 0.75, 0.75, 0.25) through the use of box plots at each I. In total, there are 500I different accuracy measures summarized in each box plot, and these measures are only for disease k = 1 (k = 2 had similar results). Depending on which measure is summarized, there can be a small or large amount of variability, although each is reasonable in the context of what is calculated. For example, the pooling positive predictive values range from 0 to 1. Low values are due to low observed values for p_i1+ that can occur when E(P_i1+) is low as well. Similarly, most pooling negative predictive values are close to 1. We also include in each plot a line connected across different I to represent the accuracy obtained by the homogeneous case, where again we allow for unequal group sizes in stage 2. Because the accuracy measures depend on stage 2 group sizes, this leads to different values for some individuals, so we simply displayed the mean value. Overall, the accuracy of informative group testing and non-informative group testing (homogeneous case) is quite comparable when just focusing on the displayed means and medians. Additional plots in Web Appendix E show similar findings for E(P_i1+) = E(P_i+1) = 0.01 and for E(P_i1+) = E(P_i+1) = 0.10.

5. Aptima Combo 2 Assay Application

High-volume testing of clinical specimens for chlamydia and gonorrhea occurs at laboratories in each state of the US, where many laboratories have already transitioned to group testing with multiplex assays. A very frequently used multiplex assay in this setting is the Aptima Combo 2 Assay. To determine how to apply this assay, one would ideally implement different versions of a group testing algorithm (e.g., informative or non-informative, different I and S) simultaneously upon the same specimens and, for future use, choose the algorithm that performs best. Unfortunately, this is not practical due to the high cost of assays, the limited amount of material available for each specimen, and the exceedingly large overall implementation time. Therefore, we instead use statewide retrospective data for the years 2010 and 2011 from Idaho and Oregon and for the years 2013 and 2014 from Iowa as a basis to emulate how testing would be performed.

Details regarding the data content and emulation process are available in Web Appendix F. In summary, we use the former year as training data and the latter year as test data for each state. Separate training/test data sets are formed within states by gender due to how manufacturers test their assays and how laboratories often perform their testing. For each individual, the data consist of final chlamydia and gonorrhea diagnoses, age, personal behavior information (e.g., risk history, patient reported symptoms), and clinical observations (e.g., urethritis, cervicitis). Multinomial regression models are fit to each training data set to obtain probability estimates of chlamydia and/or gonorrhea (i.e., estimates of p_i00, p_i01, p_i10, and p_i11). Following the recommendations of Black et al. (2015) for informative group testing with single-disease assays, we use a “non-adaptive” process with these probability estimates to obtain one overall estimated OTC from each training data set. To summarize, this process obtains mean estimates of p_i00, p_i01, p_i10, and p_i11 over i = 1, …, I sets and uses these estimates to find the most efficient testing configuration. Non-informative algorithms are implemented in the same manner to obtain estimated OTCs, except only the chlamydia and gonorrhea diagnoses are used to obtain the probability estimates.
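One possible reading of the averaging step is sketched below in Python: the training individuals are split into successive sets of size I, each set is ordered by decreasing estimate of p_i00, and the estimates are then averaged position by position across sets. This is an interpretation rather than the authors' implementation (see Web Appendix F for the actual process); the function name, the handling of a trailing incomplete set, and the toy probabilities are all assumptions.

```python
def nonadaptive_mean_probabilities(prob_rows, block_size):
    """Average ordered joint-probability estimates across successive blocks.

    prob_rows: list of (p00, p01, p10, p11) estimates, one per individual.
    block_size: the initial group size I.
    Returns a list of length block_size; entry i is the mean estimate for
    the individual in ordered position i (largest p00 first).
    """
    n_blocks = len(prob_rows) // block_size  # assumption: drop an incomplete last block
    if n_blocks == 0:
        raise ValueError("need at least one complete block")
    sums = [[0.0] * 4 for _ in range(block_size)]
    for b in range(n_blocks):
        block = prob_rows[b * block_size:(b + 1) * block_size]
        ordered = sorted(block, key=lambda p: p[0], reverse=True)
        for position, probs in enumerate(ordered):
            for j in range(4):
                sums[position][j] += probs[j]
    return [[s / n_blocks for s in row] for row in sums]


# Toy estimates for four individuals (hypothetical values).
estimates = [
    (0.90, 0.05, 0.03, 0.02),
    (0.80, 0.10, 0.05, 0.05),
    (0.70, 0.10, 0.10, 0.10),
    (0.95, 0.02, 0.02, 0.01),
]
mean_estimates = nonadaptive_mean_probabilities(estimates, block_size=2)
for position, row in enumerate(mean_estimates, start=1):
    print(position, [round(x, 3) for x in row])
```

The resulting I mean probability vectors would then be passed to whatever routine searches for the most efficient configuration, such as the R functions the authors provide.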

These estimated OTCs are subsequently applied to I successive members of the test data to determine positive/negative outcomes for both diseases via group testing. To account for assay imperfection, we incorporate the assay’s sensitivity and specificity (manufacturer provided) into the analysis. We do this by treating the final diagnoses given in the test data as the “true” statuses and then simulating the group and individual responses that could occur while implementing group testing. This process is necessary because true disease statuses are not observable. Each application of a group testing algorithm is simulated 500 times to account for assay imperfection. Our reported results are averaged over these 500 applications.
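The simulation of a single test response can be sketched in Python as follows. Given the “true” statuses of the members of a group, the test for each disease is positive with probability equal to the sensitivity if at least one member is truly positive, and with probability one minus the specificity otherwise. The sketch assumes no dilution effect and treats the two disease results as conditionally independent given the true statuses; the accuracy values and function names are hypothetical and are not the manufacturer's figures.

```python
import random


def simulate_test(any_true_positive, sensitivity, specificity, rng):
    """Return True (test positive) or False (test negative) for one disease."""
    prob_positive = sensitivity if any_true_positive else 1.0 - specificity
    return rng.random() < prob_positive


def simulate_multiplex_group(true_statuses, sensitivity, specificity, rng):
    """Simulate a two-disease result for one group (or one individual).

    true_statuses: list of (disease_1, disease_2) pairs of 0/1 values.
    sensitivity, specificity: pairs of per-disease values.
    """
    return tuple(
        simulate_test(
            any(status[k] for status in true_statuses),
            sensitivity[k],
            specificity[k],
            rng,
        )
        for k in range(2)
    )


rng = random.Random(2019)  # fixed seed so the sketch is reproducible
group = [(0, 0), (1, 0), (0, 0), (0, 0)]  # one member truly positive for disease 1
se = (0.95, 0.95)  # hypothetical sensitivities
sp = (0.99, 0.99)  # hypothetical specificities

# Repeat the simulated test 500 times, mirroring the number of replications in the text.
replicates = [simulate_multiplex_group(group, se, sp, rng) for _ in range(500)]
share_positive_d1 = sum(r[0] for r in replicates) / len(replicates)
share_positive_d2 = sum(r[1] for r in replicates) / len(replicates)
print(f"disease 1 positive in {share_positive_d1:.1%} of replicates")
print(f"disease 2 positive in {share_positive_d2:.1%} of replicates")
```

A full emulation would apply this step to every group and subgroup that the testing configuration calls for, retesting only the members of groups that test positive.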

Table 1 displays the results. To help understand the table’s contents, examine the entries for Idaho females tested over three stages. Informative group testing places the 4,168 individuals successively into non-overlapping initial groups of size I = 8 with an estimated OTC given by the group membership matrix of

$$M = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 2 & 2 & 2 & 3 \\ 1 & 2 & 3 & 4 & 5 & 6 & 7 & \text{NA} \end{bmatrix}$$

as shown in Web Table 4. This M was also provided as the first example in Web Appendix A. Table 1 shows that 1,927.1 tests on average were needed for informative group testing, while 2,029.9 tests on average were needed for non-informative group testing. Thus, the reduction in the number of tests is 5.1% when using informative rather than non-informative group testing. Other state and gender combinations can result in a much greater reduction. For example, testing males in Idaho over two stages results in a 14.7% reduction in tests on average. In summary, the mean number of tests is reduced in all but one case when comparing informative to non-informative group testing, and this reduction is much greater for males than for females.
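A short Python sketch shows how the group membership matrix M above can be read. It assumes that each row corresponds to a stage, each column to an individual (ordered as in the algorithm), and each entry to the label of the group containing that individual at that stage, with NA (represented here as `None`) meaning the individual is not tested at that stage. The function name is hypothetical.

```python
# Group membership matrix from the text; None stands in for NA.
M = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 2, 2, 2, 3],
    [1, 2, 3, 4, 5, 6, 7, None],
]


def groups_by_stage(matrix):
    """Return, for each stage, the list of groups (as lists of individual indices)."""
    stages = []
    for row in matrix:
        groups = {}
        for individual, label in enumerate(row, start=1):
            if label is not None:
                groups.setdefault(label, []).append(individual)
        stages.append([groups[label] for label in sorted(groups)])
    return stages


stages = groups_by_stage(M)
for stage_number, groups in enumerate(stages, start=1):
    sizes = [len(g) for g in groups]
    print(f"stage {stage_number}: {len(groups)} group(s) with sizes {sizes}")
```

Under this reading, stage 1 tests one group of eight, stage 2 splits it into groups of sizes 4, 3, and 1, and stage 3 tests the first seven individuals separately. The eighth individual is already tested alone in stage 2, which is why the last entry is NA. Later-stage tests are carried out only for members of groups that tested positive.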

Table 1.

Mean number of tests and standard deviation (SD) for the number of tests when applying the group testing algorithms to data from 2011 for Idaho and Oregon and to data from 2014 for Iowa.

				Mean (SD) number of tests
State	Gender	Number of individuals	Stages	Non-informative	Informative	Reduction
Idaho	Female	4168	2	2211.0 (23.6)	2105.9 (24.9)	4.8%
			3	2029.9 (27.4)	1927.1 (25.4)	5.1%
	Male	2545	2	2014.7 (12.4)	1717.8 (12.5)	14.7%
			3	2103.9 (26.8)	1831.7 (24.8)	12.9%
Iowa	Female	4351	2	2460.7 (22.3)	2459.1 (22.2)	0.1%
			3	2305.2 (29.0)	2350.0 (27.6)	−1.9%
	Male	4358	2	3419.9 (15.2)	3201.6 (16.4)	6.4%
			3	3588.3 (26.6)	3214.6 (18.3)	10.4%
Oregon	Female	8381	2	4408.5 (30.5)	4250.2 (32.4)	3.6%
			3	4000.5 (37.9)	3948.8 (37.6)	1.3%
	Male	6865	2	5478.6 (19.9)	4936.4 (19.2)	9.9%
			3	5574.8 (40.6)	5059.9 (39.0)	9.2%
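The Reduction column of Table 1 follows from the two mean columns as (non-informative − informative) / non-informative. The Python sketch below recomputes it from the tabulated means; the variable and function names are chosen only for illustration.

```python
# (state, gender, stages, mean tests non-informative, mean tests informative)
table_1 = [
    ("Idaho", "Female", 2, 2211.0, 2105.9),
    ("Idaho", "Female", 3, 2029.9, 1927.1),
    ("Idaho", "Male", 2, 2014.7, 1717.8),
    ("Idaho", "Male", 3, 2103.9, 1831.7),
    ("Iowa", "Female", 2, 2460.7, 2459.1),
    ("Iowa", "Female", 3, 2305.2, 2350.0),
    ("Iowa", "Male", 2, 3419.9, 3201.6),
    ("Iowa", "Male", 3, 3588.3, 3214.6),
    ("Oregon", "Female", 2, 4408.5, 4250.2),
    ("Oregon", "Female", 3, 4000.5, 3948.8),
    ("Oregon", "Male", 2, 5478.6, 4936.4),
    ("Oregon", "Male", 3, 5574.8, 5059.9),
]


def percent_reduction(non_informative, informative):
    """Percentage reduction in mean tests from using informative group testing."""
    return 100.0 * (non_informative - informative) / non_informative


reductions = [
    round(percent_reduction(non_inf, inf), 1)
    for _, _, _, non_inf, inf in table_1
]
for (state, gender, stages, _, _), reduction in zip(table_1, reductions):
    print(f"{state:6s} {gender:6s} {stages} stages: {reduction:5.1f}%")
```

A negative value, as for Iowa females over three stages, means that the informative algorithm required more tests on average than the non-informative one.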


Performance differences between males and females in Table 1 are explained by examining the distributions of the joint probabilities of disease. We provide summaries of these distributions in Web Appendix F.1. Overall, the variability of these probabilities is much larger for males than for females, leading to better algorithm performance for males. This variability result matches what was shown in Section 4 relative to Var(P_i1+) and Var(P_i+1). Also, it is interesting to note that the reduction in the mean number of tests is generally smaller for Iowa than for Idaho and Oregon. This occurs because less information is available on individuals in Iowa than on those in Idaho and Oregon (see Web Tables 1 and 2).

Web Appendix F.4 provides the observed accuracy for the informative and non-informative algorithms. Overall, the numerical values are very similar among the algorithms, indicating that accuracy is not reduced by using informative group testing.

6. Discussion

Our research shows that substantial reductions in the number of tests can occur when using informative group testing with multiplex assays. These reductions are directly related to the joint probabilities of disease: as the variability in the marginal probabilities increases across individuals, the overall number of tests decreases. Also, these reductions occur without loss of accuracy in comparison to non-informative algorithms. To make our proposals readily available, we provide R functions to find E(T|M), accuracy measures, and the OTC in Web Appendix D and at our research website www.chrisbilder.com/grouptesting. These functions can also be useful to decide if a potential reduction in tests from three or more stages is enough, relative to practicality of implementation, to forgo using only two stages.

The use of informative group testing with multiplex assays leads to a number of intriguing future research problems. First, McMahan et al. (2012b) proposed informative group testing algorithms for single-disease assays applied to specimens configured in a two-dimensional array structure. We expect that a development similar to our hierarchical approach could be realized for multiplex assays applied in arrays. Second, there are multiple modeling methods that can be used to estimate the joint probabilities of disease. While we applied a multinomial regression model in Section 5, more flexible types of models, like generalized additive models or random forests, could be used instead. One would expect better estimates of the disease probabilities; however, better estimates may not lead to significantly fewer tests. The reason is that informative group testing algorithms rely on the ordering of individuals, which may be the same regardless of the modeling method. Further research should explore the potential benefits from different types of models. Finally, multiplex assays can be discriminatory or non-discriminatory. Our paper examines discriminatory assays that produce positive/negative results for each disease. Non-discriminatory assays, such as the Cobas TaqScreen MPX Test, provide one positive/negative outcome overall regardless of the number of diseases. Future research should examine how best to use group testing with this type of assay.

Supplementary Material

Web-based Supplementary Materials

NIHMS1675331-supplement-Web-based_Supplementary_Materials.pdf (789.9 KB, pdf)

Acknowledgments

The authors thank Lucy DesJardin, Jeff Benfer, and Kristopher Eveland at the SHL for their consultation and desire to improve infectious disease testing through group testing. The authors also thank Cardea Services and the state public health laboratories in Idaho and Oregon for providing access to their testing data. This research was supported by Grant R01 AI121351 from the National Institutes of Health.

Footnotes

Supplementary materials

The referenced Web Appendices are included with this paper at the Biometrics website on Wiley Online Library. Our R programs are available from this website as well.

References

American Red Cross (2018). Blood testing. https://www.redcrossblood.org/biomedical-services/blood-diagnostic-testing/blood-testing.html. Retrieved September 20, 2018.
Bilder C, Tebbs J, and Chen P (2010). Informative retesting. Journal of the American Statistical Association 105, 942–955.
Black M, Bilder C, and Tebbs J (2012). Group testing in heterogeneous populations by using halving algorithms. Journal of the Royal Statistical Society: Series C 61, 277–290.
Black M, Bilder C, and Tebbs J (2015). Optimal retesting configurations for hierarchical group testing. Journal of the Royal Statistical Society: Series C 64, 693–710.
Dorfman R (1943). The detection of defective members of large populations. Annals of Mathematical Statistics 14, 436–440.
Gildow F, Shah D, Sackett W, Butzler T, Nault B, and Fleischer S (2008). Transmission efficiency of cucumber mosaic virus by aphids associated with virus epidemics in snap bean. Phytopathology 98, 1233–1241.
Hou P, Tebbs J, Bilder C, and McMahan C (2017). Hierarchical group testing for multiple infections. Biometrics 73, 656–665.
Hwang F (1975). A generalized binomial group testing problem. Journal of the American Statistical Association 70, 923–926.
Kainkaryam R and Woolf P (2009). Pooling in high-throughput drug screening. Current Opinion in Drug Discovery & Development 12, 339.
Lewis J, Lockary V, and Kobic S (2012). Cost savings and increased efficiency using a stratified specimen pooling strategy for Chlamydia trachomatis and Neisseria gonorrhoeae. Sexually Transmitted Diseases 39, 46–48.
Liu T, Hogan J, Daniels M, Coetzer M, Xu Y, Bove G, DeLong A, Ledingham L, Orido M, Diero L, and Kantor R (2017). Improved HIV-1 viral load monitoring capacity using pooled testing with marker-assisted deconvolution. Journal of Acquired Immune Deficiency Syndromes 75, 580–587.
McMahan C, Tebbs J, and Bilder C (2012a). Informative Dorfman screening. Biometrics 68, 287–296.
McMahan C, Tebbs J, and Bilder C (2012b). Two-dimensional informative array testing. Biometrics 68, 793–804.
Nebraska Veterinary Diagnostic Center (2018). General policies and fee schedule. http://vbms.unl.edu/VDC/Information/VDCFeeSchedule.pdf. Retrieved September 20, 2018.
Quinn T, Brookmeyer R, Kline R, Shepherd M, Paranjape R, Mehendale S, Gadkari D, and Bollinger R (2000). Feasibility of pooling sera for HIV-1 viral RNA to diagnose acute primary HIV-1 infection and estimate HIV incidence. AIDS 14, 2751–2757.
Saá P, Proctor M, Foster G, Krysztof D, Winton C, Linnen J, Gao K, Brodsky J, Limberger R, Dodd R, and Stramer S (2018). Investigational testing for Zika virus among US blood donors. New England Journal of Medicine 378, 1778–1788.
Sherlock M, Zetola N, and Klausner J (2007). Routine detection of acute HIV infection through RNA pooling: Survey of current practice in the United States. Sexually Transmitted Diseases 34, 314–316.
Tebbs J, McMahan C, and Bilder C (2013). Two-stage hierarchical group testing for multiple infections with application to the Infertility Prevention Project. Biometrics 69, 1064–1073.
