Determination of Varying Group Sizes for Pooling Procedure

Wenjun Xiong; Hongyu Lu; Juan Ding

doi:10.1155/2019/4381084

2019 Apr 1;2019:4381084.


PMCID: PMC6466917 PMID: 31065292

Abstract

Pooling is an attractive strategy for screening specimens for infection, especially for rare diseases. An essential step in performing a pooled test is determining the group size. An equal group size is sometimes inappropriate because of population heterogeneity. In that case, varying group sizes are preferred and can be determined when individual information is available. In this study, we propose a sequential procedure that determines varying group sizes by fully utilizing the available information. The procedure is data driven. Simulations show that it performs well in estimating parameters.

1. Introduction

Routine monitoring or large-scale screening is common in biomedical research to identify infected specimens [1–4]. However, some test kits, e.g., the nucleic acid amplification test (NAAT), are expensive [2, 5]. The expense of large-scale monitoring is therefore usually a financial burden when resources are limited [6–8]. Pooling biospecimens, a strategy first used during World War II to screen for syphilis [12], is an attractive way to address this issue [9–11]. Specimens are first pooled into groups, and the groups are then screened. If a group tests negative, all specimens in the group are declared negative; otherwise, the specimens in the group are tested individually. When the prevalence is low, the total number of tests under pooling is far smaller than under individual testing. Because of its efficiency and cost savings, pooling is now applied in many fields, such as agriculture [13], genetics [14, 15], HIV/AIDS [16, 17], blood screening [18], and environmental epidemiology [19, 20].
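To make the saving concrete, the following minimal sketch computes the expected number of tests per specimen under the two-stage scheme just described. It assumes a perfect assay, independent specimens, and a common prevalence p; these simplifications, and the function name `expected_tests_per_specimen`, are illustrative assumptions rather than part of the procedure studied in this paper.

```python
def expected_tests_per_specimen(p: float, k: int) -> float:
    """Expected tests per specimen under two-stage pooling.

    Assumes a perfect assay and independent specimens that share
    the same prevalence p (illustrative simplifications).
    """
    if k == 1:
        return 1.0  # individual testing: exactly one test per specimen
    prob_pool_positive = 1.0 - (1.0 - p) ** k
    # One pooled test is shared by k specimens; every specimen in a
    # positive pool is then retested individually.
    return 1.0 / k + prob_pool_positive


for k in (1, 5, 10, 20):
    print(k, round(expected_tests_per_specimen(0.01, k), 4))
```

Values below 1 indicate that pooling needs fewer tests than individual testing; the advantage shrinks as p grows.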

The gain from pooling depends mainly on the pooling algorithm. Assuming a homogeneous population, dozens of papers have investigated how to design an efficient algorithm [21–25]. However, this assumption might be violated in practice [26–28]. When individual information is available, it is of interest to estimate individual-level prevalence by incorporating such information. Note that only the group-level status, e.g., positive or negative, is observed. This problem has been studied in a parametric context through the framework of binary regression models [29–31], and also in semiparametric [32, 33] and nonparametric contexts [34, 35]. However, the aforementioned work mostly uses a single group size that is determined in advance.

A set of pool sizes might be more appropriate when the population is heterogeneous. For example, varying pool sizes were used to estimate the infection prevalence of Myxobolus cerebralis, which causes whirling disease, among free-ranging salmonid fish collected from the Truckee River in Nevada and California [36]. In a study estimating the prevalence of several viruses in carnations grown in nursery glasshouses in Victoria, sequential pooled testing involving several pool sizes was adopted [37]. A single group size might be optimal for some estimates but far from optimal for others, especially when little information is available before the experiment [37, 38]. More work on this issue is warranted, since the benefit of a pooling algorithm depends mainly on the choice of pool size [38–40]. In this study, we propose a pooling strategy with varying pool sizes that takes advantage of individual information. Our procedure is a data-driven pooling algorithm in which groups are formed sequentially. Its performance is extensively investigated by simulations and a real data set.

2. Methods

2.1. Notations and Background

Suppose $N$ specimens are assigned to $m$ groups, the $i$th of size $k_i$, for $i = 1, 2, \dots, m$. Let $z_i$ denote the observed status of the $i$th group and $X_{ij}$ the covariates of the $j$th specimen in the $i$th group, for $j = 1, \dots, k_i$ and $i = 1, \dots, m$. The observations are $\{z_i, X_{ij}, j = 1, \dots, k_i, i = 1, \dots, m\}$, where $X_{ij} = \{1, x_{1,ij}, \dots, x_{d-1,ij}\}^{T}$. Here, $A^{T}$ denotes the transpose of the matrix $A$. The sensitivity and specificity of the screening tool are denoted by $S_e$ and $S_p$, respectively. The full likelihood function is

$$L(\beta; z, X) = \prod_{i=1}^{m} \Bigl[S_e - r \prod_{j=1}^{k_i} (1 - p_{ij})\Bigr]^{z_i} \Bigl[1 - S_e + r \prod_{j=1}^{k_i} (1 - p_{ij})\Bigr]^{1 - z_i}, \tag{1}$$

where $r = S_e + S_p - 1$ and $p_{ij} = g(\beta_0 + \beta_1 x_{1,ij} + \dots + \beta_{d-1} x_{d-1,ij}) = g(X_{ij}^{T}\beta)$. The parameter $\beta$ is defined by $\beta = \{\beta_0, \beta_1, \dots, \beta_{d-1}\}^{T}$, and the function $g^{-1}(\cdot)$ is a known, monotone, and differentiable link function.
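As an illustration of likelihood (1), the sketch below evaluates its logarithm for the logistic choice of g. The data layout (a list of pools, each a list of covariate vectors with a leading 1) and the function names are assumptions made for this example only.

```python
import math


def logistic(u: float) -> float:
    return 1.0 / (1.0 + math.exp(-u))


def pooled_log_likelihood(beta, pools, z, se, sp):
    """Logarithm of likelihood (1) with a logistic g.

    pools: list of pools; each pool is a list of covariate vectors
           (leading 1 for the intercept).  Layout is an assumption.
    z:     observed pool statuses (0 or 1), one per pool.
    """
    r = se + sp - 1.0
    total = 0.0
    for pool, z_i in zip(pools, z):
        prod_negative = 1.0  # product over j of (1 - p_ij)
        for x in pool:
            p_ij = logistic(sum(b * v for b, v in zip(beta, x)))
            prod_negative *= 1.0 - p_ij
        prob_positive = se - r * prod_negative  # P(z_i = 1)
        total += math.log(prob_positive if z_i == 1 else 1.0 - prob_positive)
    return total


example_pools = [[[1.0, 2.0], [1.0, 1.5]], [[1.0, 0.5], [1.0, 3.0]]]
print(pooled_log_likelihood([-3.0, 0.4], example_pools, [0, 1], 0.95, 0.95))
```

Maximizing this function over beta with any general-purpose optimizer would give the maximum likelihood estimator referred to in the text.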

Sometimes there is a maximum admissible group size $k^{\max}$; for example, a large group size might introduce a dilution effect. We should therefore choose group sizes no larger than $k^{\max}$. Define the set $\mathcal{K} = \{1, 2, \dots, k^{\max}\}$, and denote the vector of group sizes by $k = \{k_1, \dots, k_m\}$ with $k_i \in \mathcal{K}$, $i = 1, \dots, m$. Once the group sizes $k$ are determined, we can obtain the estimator of $\beta$ by maximizing the likelihood function $L(\beta; z, X)$. The Fisher information matrix of the parameter $\beta$ can be written as follows:

$$I(\beta, k) = \sum_{i=1}^{m} \frac{G_i(k_i, \beta)\, G_i^{T}(k_i, \beta)}{C_i(\beta, k_i)}, \tag{2}$$

where

$$\begin{aligned} H_i(k_i, \beta) &= -\frac{1}{k_i} \sum_{j=1}^{k_i} \log\bigl(1 - g(X_{ij}^{T}\beta)\bigr), \\ G_i(k_i, \beta) &= \frac{\partial}{\partial\beta} H_i(k_i, \beta), \\ C_i(\beta, k_i) &= \bigl(S_e - r e^{-k_i H_i(k_i, \beta)}\bigr)\bigl(1 - S_e + r e^{-k_i H_i(k_i, \beta)}\bigr)\, r^{-2} k_i^{-2} e^{2 k_i H_i(k_i, \beta)}. \end{aligned} \tag{3}$$

The calculation of the Fisher information $I(\beta, k)$ is presented in the Supplementary Materials. To obtain a better estimator $\hat{\beta}$, we try to find the $k$ that maximizes the Fisher information $I(\beta, k)$. However, individual-level measurements make it difficult to achieve this goal.
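The quantities in (2) and (3) can be evaluated numerically. The following sketch does so for the logistic choice of g, for which the derivative of $-\log(1 - g(x^{T}\beta))$ with respect to $\beta$ simplifies to $g(x^{T}\beta)\,x$. The data layout and function names are assumptions for illustration.

```python
import math


def logistic(u: float) -> float:
    return 1.0 / (1.0 + math.exp(-u))


def fisher_information(beta, pools, se, sp):
    """Fisher information (2)-(3) for a logistic g.

    pools: list of pools, each a list of covariate vectors with a
           leading 1 (layout assumed for this example).
    """
    r = se + sp - 1.0
    d = len(beta)
    info = [[0.0] * d for _ in range(d)]
    for pool in pools:
        k = len(pool)
        probs = [logistic(sum(b * v for b, v in zip(beta, x))) for x in pool]
        h = -sum(math.log(1.0 - p) for p in probs) / k          # H_i
        # For logistic g, d/dbeta[-log(1 - g(x'beta))] = g(x'beta) * x.
        grad = [sum(p * x[a] for p, x in zip(probs, pool)) / k  # G_i
                for a in range(d)]
        theta = math.exp(-k * h)        # probability that the pool is truly negative
        prob_positive = se - r * theta  # probability that the pool tests positive
        c = prob_positive * (1.0 - prob_positive) / (r ** 2 * k ** 2 * theta ** 2)  # C_i
        for a in range(d):
            for b in range(d):
                info[a][b] += grad[a] * grad[b] / c
    return info


print(fisher_information([-3.0, 0.4], [[[1.0, 2.0], [1.0, 1.5]]], 0.95, 0.95))
```

With a single specimen per pool and a perfect assay, the result reduces to the familiar logistic-regression information p(1 − p) x xᵀ, which is a convenient sanity check.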

The Fisher information $I(\beta, k)$ defined in (2) involves the quantity $H_i(k_i, \beta)$, along with its functions $G_i(k_i, \beta)$ and $C_i(\beta, k_i)$. According to Delaigle and Hall [41], $\prod_{j=1}^{k_i} (1 - g(X_{ij}^{T}\beta))$ is generally close to $(1 - g(\bar{X}_i^{T}\beta))^{k_i}$, where $\bar{X}_i = (1/k_i) \sum_{j=1}^{k_i} X_{ij}$. With this approximation, the Fisher information reduces to $I(\beta, k) = \sum_{i=1}^{m} Z_i(\beta) Z_i(\beta)^{T} / C_i(\beta, k_i)$, where $Z_i(\beta) = g'(\bar{X}_i^{T}\beta)\, \bar{X}_i / (1 - g(\bar{X}_i^{T}\beta))$. We therefore propose to determine the group sizes by minimizing each $C_i(\beta, k_i)$ with respect to $k_i$, for $i = 1, \dots, m$.
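A small numerical check can show when the approximation is reasonable. The sketch below compares the exact product with its pool-mean approximation for a nearly homogeneous pool and for a heterogeneous one, using a logistic g; the covariate values are made up for illustration.

```python
import math


def logistic(u: float) -> float:
    return 1.0 / (1.0 + math.exp(-u))


def exact_negative_prob(beta, pool):
    """Exact product over the pool of (1 - g(x'beta))."""
    prod = 1.0
    for x in pool:
        prod *= 1.0 - logistic(sum(b * v for b, v in zip(beta, x)))
    return prod


def approx_negative_prob(beta, pool):
    """Approximation (1 - g(xbar'beta))^k based on the pool mean xbar."""
    k = len(pool)
    x_bar = [sum(x[a] for x in pool) / k for a in range(len(beta))]
    return (1.0 - logistic(sum(b * v for b, v in zip(beta, x_bar)))) ** k


beta = [-3.0, 0.4]
homogeneous = [[1.0, 1.9], [1.0, 2.0], [1.0, 2.1]]    # illustrative values
heterogeneous = [[1.0, 0.0], [1.0, 2.0], [1.0, 4.0]]  # illustrative values
for name, pool in (("homogeneous", homogeneous), ("heterogeneous", heterogeneous)):
    print(name, exact_negative_prob(beta, pool), approx_negative_prob(beta, pool))
```

The gap between the two quantities is much smaller for the homogeneous pool, which is why the pools need to be formed from similar specimens.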

Note that the aforementioned approximation requires the pools to be homogeneous. There are two ways to obtain homogeneous pools: reorder the specimens according to the similarity of their covariates, or according to their individual risk probabilities. The latter is adopted in this study. Following the method of McMahan et al. [42], homogeneous pools are formed as follows. First, use training data or prior knowledge to obtain an initial estimator $\beta^{(0)}$ [42]. Second, sort the specimens by their risk probabilities. Let $G$ denote the set containing the covariates of all enrolled specimens, $G = \{x_1, \dots, x_N\}$, where $N$ is the number of specimens and $x_i$ is the covariate vector of the $i$th specimen. Sort $G$ by the risk probability $p_i = g(x_i^{T}\beta^{(0)})$ in descending order to obtain the sorted set $G^{s} = \{x_1^{s}, \dots, x_N^{s}\}$. The remaining procedure is performed directly on this sorted set.

2.2. Sequential Adaptive Pooling Algorithm

Our strategy is an adaptive design, an approach often adopted in biological experiments and in pooled testing [22]. Before stating the algorithm, we need the following result. Suppose specimens have been assigned to the first $l - 1$ groups, with group sizes $\{k_1, \dots, k_{l-1}\}$. Let $n_l = \sum_{j=1}^{l} k_j$ for $l \ge 1$ and $n_0 = 0$, and define $W_l(\beta) = -\log\bigl(1 - g((x^{s}_{n_{l-1}+1})^{T}\beta)\bigr)$. Then the size of the next group, $k_l$, equals $k^{\max}$ if $k^{\max} \le \phi_0 / W_l(\beta^{(0)})$. Here, $\phi_0$ is the root of the equation $2S_e(1 - S_e)(\phi - 1)e^{2\phi} + r(2S_e - 1)(\phi - 2)e^{\phi} + 2r^{2} = 0$ and is approximately 1.8414. The proof of this result is presented in the Supplementary Materials. Our pooling strategy is described as follows:

Step 1. Label the specimens according to their order in $G^{s}$; for example, the specimen with covariates $x_1^{s}$ receives label 1. Set $l = 1$ and tentatively assign the specimens with labels up to $k^{\max}$ to the $l$th group.

Step 2. Calculate the function $C_l(\beta^{(0)}, k)$ for $k \in \mathcal{K}$ and the bound $c_0 = \phi_0 / W_l(\beta^{(0)})$. If $k^{\max} \le c_0$, set $k_l = k^{\max}$; otherwise, choose the group size that minimizes $C_l(\beta^{(0)}, k)$, that is, $k_l = \arg\min_{k \in \mathcal{K}} C_l(\beta^{(0)}, k)$. Define the set of covariates $G_l = \{x^{s}_{n_{l-1}+1}, \dots, x^{s}_{n_l}\}$.

Step 3. Let $G^{s} = G^{s} \setminus G_l$ and $l = l + 1$. Repeat Step 2 to form the next group in the same way until all specimens are assigned.

Step 4. Screen the groups and obtain the maximum likelihood estimator of $\beta$.

Note that this is a data-driven pooling strategy. Additionally, the procedure does not strictly require all specimens to be enrolled before screening, since the set $G^{s}$ is dynamic and can be updated with newly enrolled specimens.
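Steps 1 to 3 can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: g is taken to be logistic, `phi0` defaults to the value 1.8414 quoted above, the last group is capped at the number of specimens still unassigned, and the function names are hypothetical.

```python
import math


def logistic(u: float) -> float:
    return 1.0 / (1.0 + math.exp(-u))


def pool_c_value(risks, se, sp):
    """C_l from (3) for a pool whose members have the given risks."""
    k = len(risks)
    r = se + sp - 1.0
    theta = math.prod(1.0 - p for p in risks)  # equals exp(-k * H_l)
    prob_positive = se - r * theta
    return prob_positive * (1.0 - prob_positive) / (r ** 2 * k ** 2 * theta ** 2)


def sequential_group_sizes(covariates, beta0, se, sp, k_max, phi0=1.8414):
    """Return the group sizes chosen sequentially on the sorted specimens."""
    # Sort by initial risk estimate in descending order (the set G^s).
    risks = sorted(
        (logistic(sum(b * v for b, v in zip(beta0, x))) for x in covariates),
        reverse=True,
    )
    sizes = []
    start = 0
    while start < len(risks):
        # Assumption: never ask for more specimens than remain.
        k_cap = min(k_max, len(risks) - start)
        w = -math.log(1.0 - risks[start])  # W_l at the first unassigned specimen
        if k_cap <= phi0 / w:
            k = k_cap
        else:
            k = min(
                range(1, k_cap + 1),
                key=lambda kk: pool_c_value(risks[start:start + kk], se, sp),
            )
        sizes.append(k)
        start += k
    return sizes


xs = [[1.0, 0.5 * i] for i in range(40)]  # made-up covariates
print(sequential_group_sizes(xs, [-3.0, 0.4], 0.95, 0.95, k_max=12))
```

Because the specimens are sorted by decreasing risk, early groups tend to be smaller and later groups larger. Step 4 (screening and maximum likelihood estimation) is not covered by this sketch.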

2.3. Numerical Results

In this section, we evaluate the performance of the proposed procedure, which we call PSV (pooling strategy with varying group sizes). For comparison, we also present the results of the pooling strategy with a single group size $k$, called PSF($k$). The group size $k$ for PSF($k$) is either given in advance, e.g., $k = 5$ or $10$, or determined from the average prevalence of the enrolled samples. For the latter, we determine the optimal single group size $k^{*}$ by minimizing the variance of $\hat{p}$.
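One way such a single group size could be computed is sketched below. It uses the standard large-sample variance of the prevalence estimator from m pools of common size k under a homogeneous prevalence p, and it assumes that the number of pools m is held fixed while k varies; the paper does not spell out these details, so the criterion and the function names here are assumptions.

```python
def prevalence_variance_factor(p, k, se, sp):
    """m times the large-sample variance of the prevalence estimator.

    Assumes m pools of equal size k and a common prevalence p.
    """
    r = se + sp - 1.0
    theta = (1.0 - p) ** k
    prob_positive = se - r * theta
    return prob_positive * (1.0 - prob_positive) / (
        r ** 2 * k ** 2 * (1.0 - p) ** (2 * k - 2)
    )


def optimal_single_group_size(p, se, sp, k_max):
    """Group size in 1..k_max with the smallest variance factor."""
    return min(
        range(1, k_max + 1),
        key=lambda k: prevalence_variance_factor(p, k, se, sp),
    )


print(optimal_single_group_size(0.1, 1.0, 1.0, k_max=30))
```

With imperfect sensitivity or specificity the minimizer generally moves, so the value should be recomputed for each setting of the assay.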

To investigate the performance of these methods, define the link function g(·) as the logistic function g(u)=1/(1+exp(−u)). Then, individual prevalence is obtained through the following model:

$$\log \frac{p_{ij}}{1 - p_{ij}} = \beta_0 + \beta_1 x_{1,ij} + \dots + \beta_{d-1} x_{d-1,ij}, \quad i = 1, \dots, m, \; j = 1, \dots, k_i. \tag{4}$$

We first consider a single covariate ($d = 2$) that follows the normal distribution $N(2, 1.5)$ or the gamma distribution $\Gamma(2.5, 0.8)$. The parameters are set to $\beta_0 = -3$ and $\beta_1 = 0.4$. Samples are generated under these settings, and the procedures are repeated $M = 5000$ times. Table 1 reports the estimators $\hat{\beta}_0$ and $\hat{\beta}_1$, along with their mean squared errors (MSE), under different settings of sensitivity, specificity, and the number of groups. Figure 1 further reports the relative bias of the parameters.
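The data-generating step of such a simulation might look like the sketch below. Several details are assumptions made only for this example: the gamma parameters are read as shape 2.5 and scale 0.8, the pools all have size 10, and the seed and function names are arbitrary.

```python
import math
import random


def logistic(u: float) -> float:
    return 1.0 / (1.0 + math.exp(-u))


def simulate_pool_statuses(pools, beta0, beta1, se, sp, rng):
    """Simulate the observed status z_i of each pool of covariate values."""
    statuses = []
    for pool in pools:
        # A pool is truly positive if at least one specimen is infected.
        infected = [rng.random() < logistic(beta0 + beta1 * x) for x in pool]
        truly_positive = any(infected)
        # The assay reports positive with probability Se for a truly positive
        # pool and 1 - Sp for a truly negative one.
        prob_observed_positive = se if truly_positive else 1.0 - sp
        statuses.append(1 if rng.random() < prob_observed_positive else 0)
    return statuses


rng = random.Random(2019)  # arbitrary seed
# Assumed reading of the gamma setting: shape 2.5, scale 0.8.
pools = [[rng.gammavariate(2.5, 0.8) for _ in range(10)] for _ in range(1000)]
z = simulate_pool_statuses(pools, -3.0, 0.4, 0.95, 0.95, rng)
print(sum(z) / len(z))  # observed fraction of positive pools
```

Repeating this step M times and refitting the model each time would yield the means and MSEs summarized in Table 1.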

Table 1. The performance of estimators using different pooling procedures.

Covariate	(S_e, S_p)	Procedure	m=1000 β₀ Mean	m=1000 β₀ MSE	m=1000 β₁ Mean	m=1000 β₁ MSE	m=500 β₀ Mean	m=500 β₀ MSE	m=500 β₁ Mean	m=500 β₁ MSE
X ∼ N(2, 1.5)	(0.99, 0.99)	PSV	−3.003	0.020	0.401	0.002	−3.001	0.043	0.401	0.004
X ∼ N(2, 1.5)	(0.99, 0.99)	PSF(k*)	−3.002	0.010	0.402	0.002	−3.006	0.022	0.403	0.004
X ∼ N(2, 1.5)	(0.99, 0.99)	PSF(5)	−3.007	0.134	0.402	0.010	−3.018	0.289	0.405	0.021
X ∼ N(2, 1.5)	(0.99, 0.99)	PSF(10)	−3.006	0.021	0.403	0.003	−3.009	0.042	0.403	0.005
X ∼ N(2, 1.5)	(0.95, 0.95)	PSV	−3.002	0.026	0.402	0.003	−2.999	0.050	0.401	0.005
X ∼ N(2, 1.5)	(0.95, 0.95)	PSF(k*)	−3.006	0.022	0.403	0.003	−3.009	0.041	0.406	0.006
X ∼ N(2, 1.5)	(0.95, 0.95)	PSF(5)	−3.008	0.162	0.403	0.012	−2.997	0.317	0.400	0.023
X ∼ N(2, 1.5)	(0.95, 0.95)	PSF(10)	−3.004	0.026	0.403	0.003	−2.998	0.052	0.401	0.006
X ∼ N(2, 1.5)	(0.9, 0.9)	PSV	−3.001	0.034	0.402	0.003	−2.991	0.071	0.395	0.007
X ∼ N(2, 1.5)	(0.9, 0.9)	PSF(k*)	−3.004	0.035	0.404	0.004	−3.007	0.074	0.404	0.010
X ∼ N(2, 1.5)	(0.9, 0.9)	PSF(5)	−2.974	0.225	0.394	0.016	−2.993	0.418	0.399	0.031
X ∼ N(2, 1.5)	(0.9, 0.9)	PSF(10)	−3.004	0.038	0.404	0.005	−3.008	0.077	0.404	0.010
X ∼ Γ(2.5, 0.8)	(0.99, 0.99)	PSV	−2.991	0.041	0.397	0.004	−2.997	0.020	0.399	0.002
X ∼ Γ(2.5, 0.8)	(0.99, 0.99)	PSF(k*)	−3.006	0.020	0.404	0.003	−3.002	0.010	0.402	0.002
X ∼ Γ(2.5, 0.8)	(0.99, 0.99)	PSF(5)	−2.973	0.281	0.393	0.020	−3.002	0.136	0.400	0.010
X ∼ Γ(2.5, 0.8)	(0.99, 0.99)	PSF(10)	−3.002	0.042	0.402	0.005	−3.004	0.021	0.402	0.002
X ∼ Γ(2.5, 0.8)	(0.95, 0.95)	PSV	−3.000	0.053	0.401	0.005	−2.998	0.026	0.400	0.003
X ∼ Γ(2.5, 0.8)	(0.95, 0.95)	PSF(k*)	−3.010	0.041	0.404	0.006	−3.007	0.020	0.404	0.003
X ∼ Γ(2.5, 0.8)	(0.95, 0.95)	PSF(5)	−3.060	0.324	0.416	0.023	−3.015	0.171	0.405	0.012
X ∼ Γ(2.5, 0.8)	(0.95, 0.95)	PSF(10)	−3.003	0.053	0.402	0.007	−3.006	0.027	0.403	0.003
X ∼ Γ(2.5, 0.8)	(0.9, 0.9)	PSV	−2.989	0.072	0.398	0.007	−2.992	0.034	0.399	0.004
X ∼ Γ(2.5, 0.8)	(0.9, 0.9)	PSF(k*)	−3.017	0.075	0.408	0.010	−3.001	0.033	0.402	0.004
X ∼ Γ(2.5, 0.8)	(0.9, 0.9)	PSF(5)	−3.012	0.379	0.403	0.028	−2.995	0.198	0.398	0.014
X ∼ Γ(2.5, 0.8)	(0.9, 0.9)	PSF(10)	−3.018	0.075	0.409	0.010	−3.003	0.035	0.402	0.005


Figure 1. The relative bias of the parameters $\beta_0$ and $\beta_1$. The covariate distribution is $N(2, 1.5)$ (top two panels) or $\Gamma(2.5, 0.8)$ (bottom two panels), with the number of groups fixed at $m = 1000$.

Table 1 shows that all procedures perform similarly except PSF(5). With the PSF procedure, a group size has to be chosen in advance. This choice is crucial for a group testing algorithm, since the precision of the estimators depends strongly on the group size. In our setting, the average individual prevalence is about 0.0997, and the corresponding optimal single group size is mostly $k^{*} = 13, 12, 11$ for $(S_e, S_p) = (0.99, 0.99)$, $(0.95, 0.95)$, and $(0.9, 0.9)$, respectively. Consequently, PSF(10) performs better than PSF(5), since the latter uses too small a group size. Figure 1 further shows the relative bias of the parameters $\beta_0$ and $\beta_1$. Our procedure with varying group sizes, PSV, performs very well under the different scenarios. PSF(5) again has the poorest performance in terms of relative bias. As data-driven pooling strategies, PSV and PSF($k^{*}$) both perform well, but PSV has the smaller bias, which is a desirable characteristic.

We proceed to consider model (4) with $d = 4$. Denote the single covariate in the above setting by $x_1$. We add two more covariates: $x_2$ follows the binomial distribution $B(0.3)$, and $x_3$ follows the normal distribution $N(1, 0.5)$. Model (4) then becomes

$$\operatorname{logit}(p_{ij}) = \beta_0 + \beta_1 x_{1,ij} + \beta_2 x_{2,ij} + \beta_3 x_{3,ij}, \quad i = 1, \dots, m, \; j = 1, \dots, k_i. \tag{5}$$

Specifically, “Model I” denotes $x_1 \sim \Gamma(2.5, 0.8)$, $x_2 \sim B(0.3)$, $x_3 \sim N(1, 0.5)$, and “Model II” denotes $x_1 \sim N(2, 1.5)$, $x_2 \sim B(0.3)$, $x_3 \sim N(1, 0.5)$. The parameters are set to $\beta_0 = -3$, $\beta_1 = 0.4$, $\beta_2 = 1$, and $\beta_3 = -0.5$. In Figure 2, we report the relative bias of the estimators $\hat{\beta}_0, \dots, \hat{\beta}_3$ under Model I. Furthermore, we define $R = (1/4) \sum_{l=0}^{3} |(\hat{\beta}_l - \beta_l) / \beta_l|$ to measure the overall relative bias. The results are reported in Figure 3.
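The overall relative bias R defined above is simply the average absolute relative error across the regression coefficients. A short sketch follows; the function name and the estimate used in the example are hypothetical.

```python
def overall_relative_bias(beta_hat, beta_true):
    """Average of |(estimate - truth) / truth| over all coefficients."""
    terms = [abs((bh - b) / b) for bh, b in zip(beta_hat, beta_true)]
    return sum(terms) / len(terms)


beta_true = [-3.0, 0.4, 1.0, -0.5]
beta_hat = [-3.3, 0.44, 1.1, -0.55]  # hypothetical estimate, 10% off everywhere
print(overall_relative_bias(beta_hat, beta_true))
```

The measure is undefined when a true coefficient equals zero, which does not occur in the settings considered here.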

Figure 2. The relative bias of the parameters $\beta_0, \dots, \beta_3$ under Model I: $x_1 \sim \Gamma(2.5, 0.8)$, $x_2 \sim B(0.3)$, and $x_3 \sim N(1, 0.5)$, with the number of groups $m = 1000$.

Figure 3. The overall relative bias of the parameters, defined as $R = (1/4) \sum_{l=0}^{3} |(\hat{\beta}_l - \beta_l) / \beta_l|$. Model I: $x_1 \sim \Gamma(2.5, 0.8)$, $x_2 \sim B(0.3)$, and $x_3 \sim N(1, 0.5)$. Model II: $x_1 \sim N(2, 1.5)$, $x_2 \sim B(0.3)$, and $x_3 \sim N(1, 0.5)$.

Figure 2 shows that our procedure PSV performs best among the four procedures, a result similar to that shown in Figure 1. The overall relative bias of the estimators reported in Figure 3 confirms this property. It also reveals that pooling procedures using a single group size are not desirable for a heterogeneous population, even when the group size is carefully chosen, e.g., $k^{*}$.

2.4. An Illustrative Application

Verstraeten et al. conducted a surveillance study in Kenya to monitor the trend in HIV risk over time [43]. The samples were collected from pregnant women, along with potential risk covariates such as age, parity, and education level. They used a common group size of 10 to estimate the seroprevalence of HIV. However, the individual prevalence of HIV is related to those risk covariates; e.g., the risk of HIV might tend to increase with age. For this data set, Vansteelandt et al. reported a set of group sizes varying between 5 and 12 under a cost-precision trade-off [40].

We proceed to illustrate our pooling strategy using the part of these data published in [44]. They reported $N = 428$ individuals enrolled in the experiment, including their age ($x_1$) and education level ($x_2$). Using model (4), the individual prevalence $p_{ij}$ follows $\operatorname{logit}(p_{ij}) = \beta_0 + \beta_1 x_{1,ij} + \beta_2 x_{2,ij}$, $i = 1, \dots, m$, $j = 1, \dots, k_i$, with $N = \sum_{i=1}^{m} k_i$. Let the initial estimator be $\beta^{(0)} = [-2, -0.05, 0.5]$. The group sizes obtained with the pooling strategies PSV and PSF($k^{*}$) are listed in Table 2. Correspondingly, we obtain the estimators $\hat{\beta} = [-2.909, -0.033, 0.473]$ using PSV and $\hat{\beta} = [-3.011, -0.028, 0.443]$ using PSF($k^{*}$).

Table 2. The group sizes chosen using the PSV and PSF(k*) procedures for the Kenyan example.

Procedure	Group size	Number of groups
PSV	6	2
PSV	7	2
PSV	8	2
PSV	9	4
PSV	10	3
PSV	11	4
PSV	12	23
PSF(k*)	11	39


3. Discussion

In biological and epidemiological studies, there is growing interest in methods that deliver more accurate results at lower cost. Group testing is one such cost-saving strategy. In this study, we developed a pooling strategy that uses varying group sizes when individual information is available. The strategy is attractive because it depends only on the information of the enrolled specimens and does not require a group size to be chosen in advance. Because it is data driven and theoretically justified, the proposed procedure, “PSV,” performs robustly under different settings. It is convenient in practice, since there is no need to worry about how to choose an appropriate group size.

It is reasonable to use varying group sizes when the target population is diverse. For example, a sequential testing procedure with several group sizes was adopted to estimate virus infection levels of carnation populations grown in glasshouses, since different carnation populations were expected to have a wide range of infection levels [45]. More specimens can be pooled into one group if the probability of testing positive is small. It seems reasonable to balance the probability of testing positive across groups, which mimics the situation in which all enrolled specimens are homogeneous.

In this study, we also propose a procedure using a single group size $k^{*}$, determined by minimizing the variance of the prevalence estimator. This procedure can be chosen if a simple procedure is preferred or if the diversity among the specimens to be screened is negligible. We did not consider the cost of collecting specimens. If a test is much more expensive than collecting a specimen, then the cost of testing is the main consideration in a project involving large-scale screening. Otherwise, the overall cost of both collection and testing must be taken into account when using the pooling strategy.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 11801102 and 11501134), Guangxi Scholarship Fund of Guangxi Education Department, Guangxi Natural Science Foundation (no. 2018GXNSFAA138161), and Research Projects of Guangxi Colleges (no. 2018KY0081).

Data Availability

The Kenya data supporting this study are from previously reported studies and datasets, which have been cited. The data are available at https://cran.r-project.org/package=binGroup.

Conflicts of Interest

The authors declare no conflicts of interest.

Supplementary Materials

This article contains additional information on some technical aspects of the research, including the detailed calculation of the Fisher information matrix of the regression parameter and theoretical justification of Step 2 of our sequential adaptive pooling algorithm.


References

1. Behets F., Bertozzi S., Kasali M., et al. Successful use of pooled sera to determine HIV-1 seroprevalence in Zaire with development of cost-efficiency models. AIDS. 1990;4(8):737–742. doi: 10.1097/00002030-199008000-00004.
2. Westreich D. J., Hudgens M. G., Fiscus S. A., Pilcher C. D. Optimizing screening for acute human immunodeficiency virus infection with pooled nucleic acid amplification tests. Journal of Clinical Microbiology. 2008;46(5):1785–1792. doi: 10.1128/jcm.00787-07.
3. Zhou Z., Mitchell R. M., Gutman J., et al. Pooled PCR testing strategy and prevalence estimation of submicroscopic infections using bayesian latent class models in pregnant women receiving intermittent preventive treatment at Machinga District Hospital, Malawi, 2010. Malaria Journal. 2014;13(1):p. 509. doi: 10.1186/1475-2875-13-509.
4. Leong D., NicAogáin K., Luque-Sastre L., et al. A 3-year multi-food study of the presence and persistence of Listeria monocytogenes in 54 small food businesses in Ireland. International Journal of Food Microbiology. 2017;249:18–26. doi: 10.1016/j.ijfoodmicro.2017.02.015.
5. Hutchinson A. B., Patel P., Sansom S. L., et al. Cost-effectiveness of pooled nucleic acid amplification testing for acute HIV infection after third-generation HIV antibody screening and rapid testing in the United States: a comparison of three public health settings. PLoS Medicine. 2010;7(9):e1000342. doi: 10.1371/journal.pmed.1000342.
6. Emmanuel J. C., Bassett M. T., Smith H. J., Jacobs J. A. Pooling of sera for human immunodeficiency virus (HIV) testing: an economical method for use in developing countries. Journal of Clinical Pathology. 1988;41(5):582–585. doi: 10.1136/jcp.41.5.582.
7. Linauts S., Saldanha J., Strong D. M. PRISM hepatitis B surface antigen detection of hepatits B virus minipool nucleic acid testing yield samples. Transfusion. 2008;48(7):1376–1382. doi: 10.1111/j.1537-2995.2008.01698.x.
8. Mester P., Witte A. K., Robben C., et al. Optimization and evaluation of the qPCR-based pooling strategy DEP-pooling in dairy production for the detection of Listeria monocytogenes. Food Control. 2017;82:298–304. doi: 10.1016/j.foodcont.2017.06.039.
9. Lindan C., Mathur M., Kumta S., et al. Utility of pooled urine specimens for detection of Chlamydia trachomatis and Neisseria gonorrhoeae in men attending public sexually transmitted infection clinics in Mumbai, India, by PCR. Journal of Clinical Microbiology. 2005;43(4):1674–1677. doi: 10.1128/jcm.43.4.1674-1677.2005.
10. Saha-Chaudhuri P., Weinberg C. R. Specimen pooling for efficient use of biospecimens in studies of time to a common event. American Journal of Epidemiology. 2013;178(1):126–135. doi: 10.1093/aje/kws442.
11. Mitchell E. M., Lyles R. H., Manatunga A. K., Schisterman E. F. Semiparametric regression models for a right-skewed outcome subject to pooling. American Journal of Epidemiology. 2015;181(7):541–548. doi: 10.1093/aje/kwu301.
12. Dorfman R. The detection of defective members of large populations. Annals of Mathematical Statistics. 1943;14(4):436–440. doi: 10.1214/aoms/1177731363.
13. Tebbs J., Bilder C. Confidence interval procedures for the probability of disease transmission in multiple-vector-transfer designs. Journal of Agricultural, Biological, and Environmental Statistics. 2004;9(1):79–90. doi: 10.1198/1085711043127.
14. Gastwirth J. L. The efficiency of pooling in the detection of rare mutations. American Journal of Human Genetics. 2000;67(4):1036–1039. doi: 10.1086/303097.
15. Ozerov M., Vasemägi A., Wennevik V., et al. Finding markers that make a difference: DNA pooling and SNP-arrays identify population informative markers for genetic stock identification. PLoS One. 2013;8(12):e82434. doi: 10.1371/journal.pone.0082434.
16. Pilcher C. D., Price M. A., Hoffman I. F., et al. Frequent detection of acute primary HIV infection in men in Malawi. AIDS. 2004;18(3):517–524. doi: 10.1097/00002030-200402200-00019.
17. Kim S. B., Kim H. W., Kim H.-S., et al. Pooled nucleic acid testing to identify antiretroviral treatment failure during HIV infection in Seoul, South Korea. Scandinavian Journal of Infectious Diseases. 2014;46(2):136–140. doi: 10.3109/00365548.2013.851415.
18. Seo D. H., Whang D. H., Song E. Y., et al. Occult hepatitis B virus infection and blood transfusion. World Journal of Hepatology. 2015;7(3):600–606. doi: 10.4254/wjh.v7.i3.600.
19. Heffernan A. L., Aylward L. L., Toms L.-M. L., Sly P. D., Macleod M., Mueller J. F. Pooled biological specimens for human biomonitoring of environmental chemicals: opportunities and limitations. Journal of Exposure Science and Environmental Epidemiology. 2014;24(3):225–232. doi: 10.1038/jes.2013.76.
20. Ramos M., Heffernan A. L., Toms L., et al. Concentrations of phthalates and DINCH metabolites in pooled urine from Queensland, Australia. Environment International. 2016;88:179–186. doi: 10.1016/j.envint.2015.12.016.
21. Swallow W. H. Relative mean squared error and cost considerations in choosing group size for group testing to estimate infection rates and probabilities of disease transmission. Phytopathology. 1987;77(10):1376–1381. doi: 10.1094/phyto-77-1376.
22. Hughes-Oliver J. M., Swallow W. H. A two-stage adaptive group-testing procedure for estimating small proportions. Journal of the American Statistical Association. 1994;89(427):982–993. doi: 10.2307/2290924.
23. Tu X. M., Litvak E., Pagano M. On the informativeness and accuracy of pooled testing in estimating prevalence of a rare disease: application to HIV screening. Biometrika. 1995;82(2):287–297. doi: 10.1093/biomet/82.2.287.
24. Liu A., Liu C., Zhang Z., Albert P. S. Optimality of group testing in the presence of misclassification. Biometrika. 2011;99(1):245–251. doi: 10.1093/biomet/asr064.
25. Xiong W., Ding J. Robust procedures for experimental design in group testing considering misclassification. Statistics & Probability Letters. 2015;100:35–41. doi: 10.1016/j.spl.2015.01.021.
26. Chen P., Tebbs J. M., Bilder C. R. Group testing regression models with fixed and random effects. Biometrics. 2009;65(4):1270–1278. doi: 10.1111/j.1541-0420.2008.01183.x.
27. Zhang Z., Liu A., Lyles R. H., Mukherjee B. Logistic regression analysis of biomarker data subject to pooling and dichotomization. Statistics in Medicine. 2012;31(22):2473–2484. doi: 10.1002/sim.4367.
28. Li Q., Liu A., Xiong W. D-optimality of group testing for joint estimation of correlated rare diseases with misclassification. Statistica Sinica. 2017;27(2):823–838. doi: 10.5705/ss.202015.0178.
29. Gastwirth J. L., Hammick P. A. Estimation of the prevalence of a rare disease, preserving the anonymity of the subjects by group testing: application to estimating the prevalence of AIDS antibodies in blood donors. Journal of Statistical Planning and Inference. 1989;22(1):15–27. doi: 10.1016/0378-3758(89)90061-x.
30. Xie M. Regression analysis of group testing samples. Statistics in Medicine. 2001;20(13):1957–1969. doi: 10.1002/sim.817.abs.
31. Bilder C. R., Tebbs J. M. Bias, efficiency, and agreement for group-testing regression models. Journal of Statistical Computation and Simulation. 2009;79(1):67–80. doi: 10.1080/00949650701608990.
32. Li M., Xie M. Nonparametric and semiparametric regression analysis of group testing samples. International Journal of Statistics in Medical Research. 2012;1(1):60–72. doi: 10.6000/1929-6029.2012.01.01.06.
33. Wang D., McMahan C. S., Gallagher C. M., Kulasekera K. B. Semiparametric group testing regression models. Biometrika. 2013;101(3):587–598. doi: 10.1093/biomet/asu007.
34. Delaigle A., Meister A. Nonparametric regression analysis for group testing data. Journal of the American Statistical Association. 2011;106(494):640–650. doi: 10.1198/jasa.2011.tm10520.
35. Delaigle A., Zhou W.-X. Nonparametric and parametric estimators of prevalence from group testing data with aggregated covariates. Journal of the American Statistical Association. 2015;110(512):1785–1796. doi: 10.1080/01621459.2015.1054491.
36. Williams C. J., Moffitt C. M. Estimation of fish and wildlife disease prevalence from imperfect diagnostic tests on pooled samples with varying pool sizes. Ecological Informatics. 2010;5(4):273–280. doi: 10.1016/j.ecoinf.2010.04.003.
37. Hepworth G. Confidence intervals for proportions estimated by group testing with groups of unequal size. Journal of Agricultural, Biological, and Environmental Statistics. 2005;10(4):478–497. doi: 10.1198/108571105x81698.
38. Haber G., Malinovsky Y. Random walk designs for selecting pool sizes in group testing estimation with small samples. Biometrical Journal. 2017;59(6):1382–1398. doi: 10.1002/bimj.201700004.
39. Haber G., Malinovsky Y., Albert P. S. Sequential estimation in the group testing problem. Sequential Analysis. 2018;37(1):1–17. doi: 10.1080/07474946.2017.1394716.
40. Vansteelandt S., Goetghebeur E., Verstraeten T. Regression models for disease prevalence with diagnostic tests on pools of serum samples. Biometrics. 2000;56(4):1126–1133. doi: 10.1111/j.0006-341x.2000.01126.x.
41. Delaigle A., Hall P. Nonparametric regression with homogeneous group testing data. The Annals of Statistics. 2012;40(1):131–158. doi: 10.1214/11-aos952.
42. McMahan C. S., Tebbs J. M., Bilder C. R. Informative Dorfman screening. Biometrics. 2012;68(1):287–296. doi: 10.1111/j.1541-0420.2011.01644.x.
43. Verstraeten T., Farah B., Duchateau L., Matu R. Pooling sera to reduce the cost of HIV surveillance: a feasibility study in a rural Kenyan district. Tropical Medicine & International Health. 1998;3(9):747–750. doi: 10.1046/j.1365-3156.1998.00293.x.
44. Bilder C. R., Zhang B., Schaarschmidt F., Tebbs J. M. binGroup: a package for group testing. The R Journal. 2010;2(2):56–60. doi: 10.32614/rj-2010-016.
45. Hepworth G., Watson R. Debiased estimation of proportions in group testing. Journal of the Royal Statistical Society: Series C (Applied Statistics). 2009;58(1):105–121. doi: 10.1111/j.1467-9876.2008.00639.x.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Click here for additional data file.^{(35KB, pdf)}

Data Availability Statement

The Kenya data supporting this study are from previously reported studies and datasets, which have been cited. The data are available at https://cran.r-project.org/package=binGroup.
