Author manuscript; available in PMC 2022 Aug 30.
Published in final edited form as: Stat Med. 2021 May 28;40(19):4185–4199. doi: 10.1002/sim.9022

Prediction-Driven Pooled Testing Methods: Application to HIV Treatment Monitoring in Rakai, Uganda

Adam Brand 1, Susanne May 2, James P Hughes 2, Gertrude Nakigozi 5, Steven J Reynolds 3,4, Erin E Gabriel 1
PMCID: PMC8487918  NIHMSID: NIHMS1706371  PMID: 34046930

Summary

Chronic medical conditions often necessitate regular testing for proper treatment. Regular testing of all afflicted individuals may not be feasible due to limited resources, as is true of HIV monitoring in resource-limited settings. Pooled testing methods have been developed to allow regular testing for all while reducing resource burden. However, the most commonly used methods do not make use of covariate information predictive of treatment failure, which could improve performance. We propose and evaluate four prediction-driven pooled testing methods that incorporate covariate information to improve pooled testing performance. We then compare these methods to current methods in the HIV treatment management setting with respect to testing efficiency, sensitivity and number of testing rounds, using simulated data and data collected in Rakai, Uganda. Results show that the prediction-driven methods increase efficiency by up to 20% compared to current methods while maintaining equivalent sensitivity and reducing the number of testing rounds by up to 70%. When predictions were incorrect, the performance of the prediction-based matrix methods remained robust. The best performing method on our motivating data from Rakai was a prediction-driven hybrid method, maintaining sensitivity over 96% and efficiency over 75% in likely scenarios. If these methods perform similarly in the field, they may contribute to reducing mortality and transmission in resource-limited settings. Although we evaluate our proposed pooling methods in the HIV treatment setting, they can be applied in any setting that necessitates testing of a quantitative biomarker for a threshold-based decision.

Keywords: pooled testing, prediction-based testing, treatment failure, mini pool, matrix-based testing

1 |. INTRODUCTION

Proper management of treatable, chronic medical conditions often relies on measurement of one or more biomarkers to assess the effectiveness of current treatment and disease progression. Based on regular blood testing, a patient with chronic kidney disease may have their treatment strategy changed, or an anemic patient’s diet and/or vitamin regimen may be altered. Some chronic, infectious diseases become resistant to a patient’s current treatment, and a change in medication is needed to maintain disease suppression, as is the case in HIV. Proper treatment for HIV that includes regular monitoring for treatment failure using a continuous measure of HIV viral load (VL) can reduce mortality and cut transmission by up to 96%.1,2 Many people infected with HIV live in resource-limited regions where regular individual monitoring can pose a financial burden, potentially leading to a lost opportunity to slow the spread of disease.3 Efficient, cost-effective methods for the measurement of continuous biomarkers are needed in resource-limited regions to control the course of treatable, chronic diseases, thereby reducing mortality and preventing transmission of infectious diseases. Incorporating patients’ covariate information predictive of the continuous biomarker can potentially increase the efficiency and cost-effectiveness of testing.

A natural approach for incorporating covariate information is to use prediction to decide which patients to test more or less often, and there is a large literature on risk factors for treatment failure in HIV treatment monitoring.4,5,6,7,8,9,10,11,12 In most cases, it is suggested that monitoring be increased for those at high risk. This is possible in resource-rich settings, but in resource-limited settings monitoring can be increased for some patients only if it is reduced for others at lower risk. This approach of reducing monitoring for patients at lower risk of treatment failure, called adaptive frequency monitoring, was investigated by Ssempijja et al.13 They showed via simulation that using previous viral load measurements to predict patients at low risk of treatment failure substantially reduced the total number of HIV tests needed for monitoring. This reduction in the number of tests, however, resulted in more months of undetected treatment failure and more deaths.

Pooled testing methods may offer better sensitivity than frequency monitoring because all samples are tested in some manner. The goal of pooled testing is to identify, using the fewest tests possible, every individual in a population whose biomarker value is larger than a predefined threshold (quantitative methods), e.g. those experiencing HIV treatment failure, or who is positive for some infection or disease (binary methods). To classify someone as above the threshold or positive, they must be tested individually. Pooled testing methods use pooled test results to choose which individual samples to test, with the goal of individually testing only those above the threshold.

Much work has been done developing pooled testing methods for identification and monitoring of disease. Blood banks have used pooled testing to successfully screen out blood infected with transmittable diseases.14,15,16 Pooled testing methods have also been developed and evaluated for identifying acute HIV infection.15,17,18,19,20,21,22,23,24,25,26,14,27,28,29 These methods rely on a binary biomarker for detection of disease, but a continuous measure is needed to determine if disease is properly suppressed in infected patients. Pooled testing methods allowing for a continuous biomarker measure have been developed for assessing treatment failure in patients infected with HIV.

May et al. developed the mini pool + algorithm (Mini+alg) method and the matrix-based, simple search (SS) method for assessing HIV treatment failure, and showed that these methods improve testing efficiency while maintaining high negative predictive value at prevalence of treatment failure up to 20%.30 Multiple researchers have validated the utility of the Mini+alg and SS methods in clinical settings, and have established these methods as the most commonly-used pooled testing methods for detecting treatment failure in an infected population.31,32,33,34,35 Hanscom developed a modified simple search (MSS) method that improved on the simple search method in both efficiency and number of testing rounds.36 However, none of these methods incorporate potentially useful covariate information.

In addition to developing the MSS method, Hanscom developed a model-based search (MBS) matrix-based pooled testing method.36 The MBS method incorporates covariate information, but only works well when covariates are highly predictive of treatment failure. Hanscom also developed an EM-algorithm (EM) matrix-based pooled testing method that performed well when HIV viral load (VL) was normally distributed, but not when VL was realistically skewed. Methods relying heavily on statistical modelling of, or assumptions about, the biomarker do not work as well as the currently used methods in most realistic scenarios because, unless the modelling assumptions are accurate, the most pertinent information about a sample’s value is the pool value(s) to which it contributed.

We propose and evaluate four prediction-driven pooled testing methods incorporating predictions about the value of the continuous biomarker that balance the use of the predictions with the information from the pool measurements. These include a mini pool method (MiniPred), two matrix-based methods (Linreg and LRSOE) and a method (I10MiniPred90-HyPred) from a new class of pooled testing methods we introduce and name the HyPred class. The first three methods were initially introduced in the Master’s Thesis of the first author.37 The MiniPred method was also developed independently by Tao et al., therein named the “Marker-assisted mini-pooling with algorithm”, but was only compared to the Mini+alg method.38

We evaluate and compare the proposed prediction-driven pooled testing methods to the current most commonly-used methods using simulations based on and using the same metrics as those used in May et al. and Hanscom,30,36 which are more general versions of those proposed by Wang et al.39 Wang et al. proposed a statistical framework for evaluating pooled testing (or group testing) methods and provided closed form solutions to early methods in simple settings, but recommended using simulation to estimate pooled testing operating characteristics for more complex methods and biomarker distributions.39 Wang et al. stated that even calculating the density of the measured biomarker in a pooled testing setting without a normality assumption is not feasible, and may not be possible.

Our motivating setting uses the same data source as Ssempijja et al.,13 the Rakai Health Sciences Program in Rakai, Uganda. We present results of applying our proposed methods both to simulated data and to real-world data, using simulated pooling of actual viral load measures collected in Rakai, and illustrate the potential to reduce the burden of regular individual monitoring. In simulations, we demonstrate method performance under a variety of scenarios with differing strengths of prediction and varying levels of measurement error, using highly skewed simulated data mirroring the distribution seen in Rakai.

In Section 2 we introduce the necessary notation and outline the commonly-used pooled testing methods as well as our proposed prediction-driven pooled testing methods. In Section 3 we compare our proposed methods in simulations that follow our real data application in HIV VL monitoring in Rakai. We evaluate method performance using actual data from Rakai in Section 4. In Section 5 we present a Shiny application with which a user can easily implement each method presented. Finally, in Section 6 we outline some limitations of our methods before discussing future research possibilities.

2 |. METHODS

There are currently two classes of pooled testing methods in the literature for pooled testing of a quantitative biomarker: mini pool and matrix. Mini pool methods combine samples into a single pool and test the pool to obtain a pool average. Matrix methods arrange samples into a matrix, typically square, and pool across each row and column, providing row and column averages. Below are notation and descriptions of the mini pool + algorithm (Mini+alg) method, the modified simple search (MSS) method, and our four proposed prediction-driven methods: MiniPred, Linreg, LRSOE and the HyPred class of methods.

Let there be $N$ patients indexed by $i$. Let $V_i$ be the continuous measure of interest for patient $i$, such as HIV, Epstein-Barr or hepatitis C viral load. Without loss of generality, we assume higher levels of $V$ are worse. Let $C$ be the level of $V$ that we are interested in detecting, i.e. $V \ge C$. Let $q$ denote a mini pool, and let $\bar{V}_q$ be the pool average for mini pool $q$.

The Mini+alg method pools $n$ samples from $N$ into pool $q$, where the pool size $n$ is selected such that the lower limit of detection for the assay being used is $C/n$ or less. Each pool is tested to obtain $\bar{V}_q$. If $\bar{V}_q < C/n$, then $V_i < C$ for each $i$ in $q$. If $\bar{V}_q \ge C/n$, samples are tested one at a time (in random order), and the result of each test divided by the pool size is subtracted from the pool average, i.e. $\bar{V}_q = \bar{V}_q - (V_i/n)$, letting $\bar{V}_q$ represent the updated pool average. Random individual testing repeats until the sequentially updated pool average is less than $C/n$.
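As an illustration, a minimal sketch of this testing loop in R, assuming an error-free assay; the function name and interface are ours, not part of the published algorithm:

```r
# Sketch of the Mini+alg loop for a single pool.
# v: vector of individual biomarker values in the pool
# C: decision threshold; order: sequence in which individuals are tested
minipool_test <- function(v, C, order = sample(seq_along(v))) {
  n <- length(v)
  pool_avg <- mean(v)                 # one pooled test
  tests <- 1
  failures <- integer(0)
  for (i in order) {
    if (pool_avg < C / n) break       # remaining samples classified as < C
    tests <- tests + 1                # individual test of sample i
    if (v[i] >= C) failures <- c(failures, i)
    pool_avg <- pool_avg - v[i] / n   # sequentially updated pool average
  }
  list(tests = tests, failures = failures)
}
```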

The MSS method is a matrix pooled testing method that arranges samples into an $n \times n$ matrix. Let $j$ and $k$ be the row and column indices, respectively, for a matrix $M$. Let $V_{jk}$ be the individual sample, $V_i$, arranged in the $j$th row and $k$th column of matrix $M$. Let $\bar{V}_{j\cdot}$ and $\bar{V}_{\cdot k}$ be the pool averages for row $j$ and column $k$ of $M$, respectively. After obtaining $\bar{V}_{j\cdot}$ and $\bar{V}_{\cdot k}$ for each row and column in $M$, MSS tests the individual samples corresponding to the highest sums $\bar{V}_{j\cdot} + \bar{V}_{\cdot k}$, among all pairwise sums, while ensuring that no two samples are tested from the same row or column in a given round of testing. During each round, one tests and classifies samples based on whether $V_{jk} < C$, and updates the row and column averages, $\bar{V}_{j\cdot} = \bar{V}_{j\cdot} - (V_{jk}/n)$ and $\bar{V}_{\cdot k} = \bar{V}_{\cdot k} - (V_{jk}/n)$, respectively. After each round of testing, if a sequentially updated row or column average falls below $C/n$, all unclassified samples in that row or column are classified as $V_{jk} < C$. Testing continues until all samples in $M$ are classified. Hanscom developed the MSS method to test a maximum of $\lfloor n/2 \rfloor$ samples per round to improve the testing turnaround time of the SS method while maintaining efficiency.36 When few rows or columns remain to be classified, the number of tests per round may be less than $n/2$, and the restriction that tested samples lie in different rows and columns is relaxed.
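The round-by-round sample selection is the heart of MSS. Below is a sketch of one selection round in R under our reading of the rule (greedy choice of the largest pairwise sums with no repeated rows or columns within a round); names are illustrative:

```r
# Pick up to floor(n/2) unclassified cells with the largest row+column
# average sums, no two sharing a row or column within the round.
# rbar, cbar: current row/column averages; open: logical matrix marking
# unclassified cells.
mss_select <- function(rbar, cbar, open) {
  n <- length(rbar)
  score <- outer(rbar, cbar, `+`)       # all pairwise sums
  score[!open] <- -Inf
  picks <- matrix(integer(0), ncol = 2)
  for (s in seq_len(floor(n / 2))) {
    if (all(score == -Inf)) break
    jk <- which(score == max(score), arr.ind = TRUE)[1, ]
    picks <- rbind(picks, jk)
    score[jk[1], ] <- -Inf              # block the chosen row ...
    score[, jk[2]] <- -Inf              # ... and column for this round
  }
  picks                                 # each row is a (j, k) cell to test
}
```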

We propose four prediction-driven pooled testing methods incorporating a prediction about $V_i$, denoted $\hat{V}_i$: a mini pool method (MiniPred), two matrix-based methods (Linreg and LRSOE) and a class of hybrid methods (HyPred).

Let $\hat{V}_i$ be the quantitative prediction for $V_i$, and let $\hat{V}_{jk}$ be the prediction corresponding to the sample in the $j$th row and $k$th column of matrix $M$. Let $R_{\hat{V}_i}$ be the descending rank of $\hat{V}_i$ and $R_{\hat{V}_{jk}}$ be the descending rank of $\hat{V}_{jk}$. Let $\bar{\hat{V}}_{j\cdot}$ and $\bar{\hat{V}}_{\cdot k}$ be the averaged predictions for a row and column pool, respectively, obtained by averaging the $\hat{V}_{jk}$ in row $j$ and column $k$.

The Prediction-driven Mini-pool (MiniPred) Method

A straightforward extension of the Mini+alg method incorporating covariates is the MiniPred method. It operates like Mini+alg, but instead of testing individual samples at random from a pool $q$ with $\bar{V}_q \ge C/n$, MiniPred tests samples one at a time in descending order of $\hat{V}_i$ until the sequentially updated pool average is less than $C/n$, i.e. $\bar{V}_q < C/n$. The performance of this method relies solely on the ranking of the quantitative predictions: the accuracy of the numeric prediction matters only insofar as it preserves the ranks of the actual values. A detailed algorithm is presented in Table S1 of the Supplementary Materials.
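Because MiniPred changes only the testing order, it can be expressed as a one-line variant of the Mini+alg sketch above, with `vhat` denoting the vector of predictions for the pool (illustrative names):

```r
# MiniPred: same loop as Mini+alg, but test in descending prediction order.
res <- minipool_test(v, C = 1000, order = order(vhat, decreasing = TRUE))
```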

The Linear Regression (Linreg) Method

An extension of matrix-based pooled testing incorporating predictions is the Linreg method. Samples are arranged into an $n \times n$ matrix and pooled across rows and columns to obtain $\bar{V}_{j\cdot}$ and $\bar{V}_{\cdot k}$ for each $j$ and $k$ in $M$. A simple use of predictions would be to test the samples with the highest $\hat{V}_{jk}$ in $M$; however, this ignores the information in $\bar{V}_{j\cdot}$ and $\bar{V}_{\cdot k}$. The Linreg method incorporates this information by ‘fitting’ the predictions, $\hat{V}_{jk}$, to $\bar{V}_{j\cdot}$ and $\bar{V}_{\cdot k}$ in an iterative fashion. The predictions are scaled to the pool averages because, as Hanscom found, methods relying too heavily on predictions perform poorly when prediction accuracy is not high.36 Fitting the predictions to the pool averages incorporates predictions without ignoring the information in the pool averages and, as we show later through simulation, produces robust performance even in settings with unrealistically poor predictions. After scaling, the samples with the highest scaled predictions are tested individually, as detailed below. The Linreg method extends the MSS method in that the selection of individual samples for testing is driven by the highest row/column averages while being informed by predictions. Like the MSS method, Linreg limits the number of samples tested per round; this ensures that the information in the matrix is updated more frequently, with the intention of increasing efficiency at the cost of more testing rounds.

In practice there is often measurement error, which affects the pool averages and the individual sample tests separately. Therefore it is unlikely that $\sum_{j=1}^{n} \bar{V}_{j\cdot} = \sum_{k=1}^{n} \bar{V}_{\cdot k}$, as would be the case in a matrix of true values, so we define $\bar{V}_{j\cdot}$ and $\bar{V}_{\cdot k}$ to be the adjusted row and column averages obtained by proportionally scaling each $\bar{V}_{j\cdot}$ and $\bar{V}_{\cdot k}$ to the average of $\sum_{j=1}^{n} \bar{V}_{j\cdot}$ and $\sum_{k=1}^{n} \bar{V}_{\cdot k}$. This adjustment allows for possible sets of values $V_{jk}$ that satisfy $\bar{V}_{j\cdot}$ and $\bar{V}_{\cdot k}$ for all $j$ and $k$ in $M$. It is essential for fitting the predictions to the observed row and column averages, described below and used in both the Linreg and LRSOE methods.

In fitting the predictions to the adjusted row and column averages, each $\hat{V}_{jk}$ is scaled proportionally to the corresponding $\bar{V}_{j\cdot}$ and $\bar{V}_{\cdot k}$ in an iterative fitting procedure to obtain a final, scaled prediction $\hat{V}_{jk}^{\#}$, so that the final scaled-prediction row and column averages, $\bar{\hat{V}}_{j\cdot}^{\#}$ and $\bar{\hat{V}}_{\cdot k}^{\#}$, equal $\bar{V}_{j\cdot}$ and $\bar{V}_{\cdot k}$, respectively. The iterative fitting procedure continues until either $\bar{\hat{V}}_{j\cdot}^{\#} = \bar{V}_{j\cdot}$ and $\bar{\hat{V}}_{\cdot k}^{\#} = \bar{V}_{\cdot k}$ for all $j$ and $k$ in $M$, or until a pre-selected maximum number of iterations is reached (20 in our simulations).

One then tests the $\lfloor n/2 \rfloor$ samples with the highest $\hat{V}_{jk}^{\#}$ and subtracts $V_{jk}/n$ for each tested sample from the corresponding $\bar{V}_{j\cdot}$ and $\bar{V}_{\cdot k}$. The tested samples are classified, and the process repeats, starting with re-adjusting the row and column averages, until all samples are classified. After each testing round, if a sequentially updated pool average is below $C/n$, i.e. $\bar{V}_{j\cdot} < C/n$ or $\bar{V}_{\cdot k} < C/n$, all unclassified samples in that row or column are classified as $V_{jk} < C$. Once a sample is classified, its predicted value is set to zero in the iterative fitting procedure for all future testing rounds. The Linreg method combines the predicted continuous values with the observed row and column averages to test the samples most likely to exceed $C$, without relying solely on predictions, which protects against poor predictive accuracy. A detailed algorithm for Linreg (Table S2), including equations for the row and column adjustments as well as the iterative fitting procedure, is presented in Section 1 of the Supplementary Materials.
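A compact sketch of the adjustment and iterative fitting steps in R, under our reading of the procedure (the authors' exact implementation may differ); `P` holds the prediction matrix and `rbar`, `cbar` the observed row and column averages:

```r
# Proportionally scale rbar and cbar to a common total, then alternately
# rescale the rows and columns of P until its row/column means match them.
fit_predictions <- function(P, rbar, cbar, max_iter = 20, tol = 1e-6) {
  target <- (sum(rbar) + sum(cbar)) / 2
  rbar <- rbar * target / sum(rbar)        # adjusted row averages
  cbar <- cbar * target / sum(cbar)        # adjusted column averages
  for (it in seq_len(max_iter)) {
    rm <- rowMeans(P); rm[rm == 0] <- 1    # guard rows of classified samples
    P <- P * rbar / rm                     # scale rows toward rbar
    cm <- colMeans(P); cm[cm == 0] <- 1
    P <- t(t(P) * cbar / cm)               # scale columns toward cbar
    if (max(abs(rowMeans(P) - rbar)) < tol &&
        max(abs(colMeans(P) - cbar)) < tol) break
  }
  P                                        # scaled predictions V-hat^#
}
```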

The Linear Regression System of Equations (LRSOE) Method

Another extension of matrix-based pooled testing using predictions is the LRSOE method. LRSOE combines predictions and pool averages to assign values to each sample in each round of testing by assuming the minimum number of treatment failures, i.e. the minimum number of occurrences of $V_{jk} \ge C$. LRSOE chooses which individual samples to test based on those assigned values. This method was designed specifically for settings where the prevalence of $V_{jk} \ge C$ is low and the distribution of $V$ is highly skewed, as is often the case in HIV. In these settings, a high row or column average is likely the result of a single occurrence of $V_{jk} \ge C$, with the rest of the samples in that row or column having low $V_{jk}$. Choosing the individual sample with high $V_{jk}$ is aided by the opposing column and row averages, but incorporating predictions increases the likelihood that the individual samples with the highest $V_{jk}$ are tested first. This method is intended to minimize the number of testing rounds, reducing turnaround time and the cost of additional rounds of testing.

Let $V_{jk}^{*}$ be an individual sample value assigned by LRSOE, LLD be the lower limit of detection of the assay, and $U_j$ and $U_k$ be the numbers of unclassified samples in a row and column, respectively. Let $\bar{V}_{j\cdot}^{*}$ and $\bar{V}_{\cdot k}^{*}$ be the row and column averages, respectively, of the assigned values. Like the Linreg method, LRSOE performs the adjustment of the row and column averages and the iterative fitting procedure to fit the predictions to the observed row and column averages. Unlike Linreg, LRSOE then assigns a set of values such that $\bar{V}_{j\cdot} = \bar{V}_{j\cdot}^{*}$ and $\bar{V}_{\cdot k} = \bar{V}_{\cdot k}^{*}$ for all $j$ and $k$ in $M$ while minimizing occurrences of $V_{jk} \ge C$. Beginning with the sample with the largest $\hat{V}_{jk}^{\#}$ after the iterative fitting procedure, LRSOE assigns $V_{jk}^{*}$ the maximum value allowed by the corresponding row and column averages, taking into account the lower limit of detection of the assay.

All unclassified samples in the same row and column as the sample with the largest $\hat{V}_{jk}^{\#}$ are assigned LLD. Once a value is assigned to a sample, that value is set and cannot be changed by further assignments until the next round of testing. The assignment process repeats in order of $R_{\hat{V}_{jk}^{\#}}$ until all samples have been assigned a value. This process ensures that the assigned row and column averages, $\bar{V}_{j\cdot}^{*}$ and $\bar{V}_{\cdot k}^{*}$, equal $\bar{V}_{j\cdot}$ and $\bar{V}_{\cdot k}$, respectively. One then tests all samples with $V_{jk}^{*} > \mathrm{LLD}$, classifies the tested samples, and subtracts $V_{jk}/n$ for each tested sample from the corresponding, sequentially updated $\bar{V}_{j\cdot}$ and $\bar{V}_{\cdot k}$. The process repeats, beginning with re-adjusting the row and column averages, until all samples are classified. After each testing round, if $\bar{V}_{j\cdot} < C/n$ or $\bar{V}_{\cdot k} < C/n$, all unclassified samples in that row or column are classified as $V_{jk} < C$. Once a sample is classified, its predicted value is set to zero in the iterative fitting procedure and the assignment process. A detailed algorithm for LRSOE (Table S3), along with equations for the assignment process, is presented in Section 1 of the Supplementary Materials.
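A sketch of the assignment step in R, again under our reading of the procedure; `Phat` is the matrix of fitted predictions $\hat{V}_{jk}^{\#}$ and the bookkeeping is illustrative, not the authors' exact code:

```r
# Walk cells in descending fitted-prediction order; give each the largest
# value its remaining row/column totals allow, leaving at least LLD for
# every other unassigned cell, then fill the rest of its row and column
# with LLD. Cells assigned a value above LLD are tested this round.
lrsoe_assign <- function(Phat, rbar, cbar, LLD) {
  n <- nrow(Phat)
  A <- matrix(NA_real_, n, n)                  # assigned values V*
  row_left <- n * rbar; col_left <- n * cbar   # remaining pool totals
  for (idx in order(Phat, decreasing = TRUE)) {
    j <- (idx - 1) %% n + 1; k <- (idx - 1) %/% n + 1
    if (!is.na(A[j, k])) next
    u_row <- sum(is.na(A[j, ])); u_col <- sum(is.na(A[, k]))
    A[j, k] <- max(LLD, min(row_left[j] - (u_row - 1) * LLD,
                            col_left[k] - (u_col - 1) * LLD))
    row_left[j] <- row_left[j] - A[j, k]
    col_left[k] <- col_left[k] - A[j, k]
    fill <- is.na(A[j, ])                      # rest of row j gets LLD
    A[j, fill] <- LLD
    row_left[j] <- row_left[j] - sum(fill) * LLD
    col_left[fill] <- col_left[fill] - LLD
    fill <- is.na(A[, k])                      # rest of column k gets LLD
    A[fill, k] <- LLD
    col_left[k] <- col_left[k] - sum(fill) * LLD
    row_left[fill] <- row_left[fill] - LLD
  }
  A
}
```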

The Hybrid Prediction (HyPred) Class of Methods

HyPred is a hybrid class of pooled testing methods that, based on predictions, combines individual testing with any prediction-driven or classical pooled testing method; components may include, but are not limited to, individual testing, mini pool methods and matrix-based pooled testing methods. This class of methods is flexible and can be optimized for different settings.

The general algorithm is to generate predictions, $\hat{V}_i$, and then classify samples into tiers of risk. The optimal number of tiers, the cutoffs for those tiers, and the pooled testing method used within each tier depend on the distribution of $V$ in the population as well as the quality of the predictions. Methods of the HyPred class will have different operating characteristics depending on these operator choices, so the method for each tier should be selected based on simulations mirroring the intended application. We selected individual testing for patients in the top 10% of $\hat{V}_i$ and the MiniPred method for the remaining 90%, based on simulation results mirroring our application (Supplement Section 2). To distinguish this method from the class, we call it the I10MiniPred90-HyPred method.

The HyPred class increases the influence of the prediction information. By classifying subjects into risk tiers based on predicted viral load, the goal is to place all treatment failures in the highest risk tier, which then undergoes individual testing. The remaining risk tiers are pooled using one of the pooling methods. If all treatment failures are correctly placed in the highest risk tier, the pooled subjects are classified using a minimal number of tests and all treatment failures are tested individually, yielding 100% sensitivity. If predictions are accurate, this approach is expected to substantially increase efficiency; when predictions are poor, it is expected to suffer a greater performance loss than the other methods.
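A sketch of the I10MiniPred90-HyPred split in R, reusing the `minipool_test` sketch above; the threshold handling and names are illustrative:

```r
# Individually test the top 10% of predictions; run MiniPred (pools of
# size n) on the remaining 90%.
hypred_split <- function(v, vhat, C, n = 10) {
  top <- rank(-vhat) <= ceiling(0.10 * length(v))   # highest risk tier
  tests <- sum(top)                                 # individual tests
  failures <- which(top & v >= C)
  rest <- which(!top)
  pools <- split(rest, ceiling(seq_along(rest) / n))
  for (q in pools) {                                # MiniPred on the rest
    res <- minipool_test(v[q], C, order = order(vhat[q], decreasing = TRUE))
    tests <- tests + res$tests
    failures <- c(failures, q[res$failures])
  }
  list(tests = tests, failures = failures)
}
```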

We compare the four prediction-based methods to the benchmark Mini+alg and MSS methods in the HIV treatment management setting through simulation. The testing characteristics of interest are efficiency (the proportion of tests saved relative to individual testing), sensitivity (the proportion of treatment failures, $V_i \ge C$, detected relative to individual testing), and the average number of testing rounds per 100 samples, which represents the turnaround time of results. We assume a maximum of 20 tests per round of testing. Efficiency and sensitivity are defined as in May et al., Hanscom and Wang et al.30,36,39
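For concreteness, a sketch of how these metrics reduce to counts from a completed run (variable names are ours):

```r
# tests_used: total tests; n_samples: samples classified;
# found / found_ind: failures detected by the method / by individual testing
efficiency     <- 100 * (1 - tests_used / n_samples)  # % tests saved
sensitivity    <- 100 * found / found_ind             # % failures detected
rounds_per_100 <- 100 * total_rounds / n_samples      # mean rounds per 100
```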

As these methods are intended for use in resource-limited regions, prediction models may need to be simple and based on inexpensive, readily available covariates. Examples of such covariates in the HIV setting are baseline (defined at initiation of therapy) CD4 count, treatment regimen, patient-reported adherence and prior treatment failure status. As pooled testing is intended for routine monitoring and not for patients starting treatment, baseline covariates are often available through individual testing at treatment initiation. Prediction capability and accuracy will vary by setting, because different settings will have varying access to data, equipment and expertise. Therefore, this paper does not discuss the potential prediction methods that could be used, but rather evaluates how our proposed methods perform under varying strengths of prediction of VL. To obtain predictions we use linear prediction models throughout.

3 |. SIMULATION STUDY

In our simulation setting, $V_i$ = HIV viral load in copies/mL (denoted VL), $C = 1000$ copies/mL, $n = 10$ and LLD = 50 copies/mL. Two data sets of 50,000 independent, highly skewed patient records were generated similar to the data generation used by May et al.,30 altered to follow the VL distribution seen in our motivating example and to allow for incorporating two informative covariates. The $\log_{10}(VL)$ values generated in (1) were exponentiated to obtain VL, the values used in our simulated evaluation procedures.

$\log_{10}(VL) = 0.5 + 0.1X_1 + 2X_2 + 0.1X_1 \times X_2 + \epsilon$  (1)

In one data set,40 $X_1 \sim \mathrm{Beta}(70, 1.5)$, $X_2 \sim \mathrm{Bernoulli}(0.1)$ and $\epsilon \sim N(0, SD = 1.0)$. Values were truncated at 3,000,000 copies/mL. The random variable $\epsilon$ represents variation of the individual VLs from a perfect predictive model, and is added on the log scale to reflect that individual variation depends on VL. As VL is the exponentiated result of the linear model in (1), the resulting distributions of VL in both data sets are highly skewed, which agrees with our motivating example and the viral load distribution seen in May et al.30 Setting SD = 1.0 resulted in a correlation of 0.57 between $\log_{10}(\widehat{VL})$, based on the true model, and $\log_{10}(VL)$. The correlation after exponentiating, i.e. between $\widehat{VL}$ and VL, was 0.22. Therefore, when SD = 1.0 the noise in the data generating mechanism is strong enough to preclude strong predictive accuracy even when the model is correctly specified, reflecting a realistic scenario with correct model specification. The percentage of treatment failures, VL ≥ 1000, was 5.7%.
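A sketch of this data generation in R, following the distributions stated above (the seed is arbitrary):

```r
set.seed(1)
N  <- 50000
x1 <- rbeta(N, 70, 1.5)                # X1 ~ Beta(70, 1.5)
x2 <- rbinom(N, 1, 0.1)                # X2 ~ Bernoulli(0.1)
log10_vl <- 0.5 + 0.1 * x1 + 2 * x2 + 0.1 * x1 * x2 + rnorm(N, 0, 1)
vl <- pmin(10^log10_vl, 3e6)           # truncate at 3,000,000 copies/mL
mean(vl >= 1000)                       # empirical prevalence of failure
```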

The pooled testing methods are compared in two prediction scenarios in the main text: the ‘As Good as it Gets’ scenario (Table 1) and the ‘Reverse’ scenario (Table 2). In the ‘As Good as it Gets’ scenario, a separate training set of 5,000 records was generated as in (1) with SD = 1.0,41 and ridge regression was applied to the correct model to obtain the predictive model in (2).

Table 1.

Simulation Results for the ‘As Good as it Gets’ Scenario (50,000 samples classified for each level of SDME)

Mini+alg MiniPred MSS Linreg LRSOE HyPred* Mini+alg MiniPred MSS Linreg LRSOE HyPred*
SD = 0, SDME = 0, Assay Sens. = 100, Assay Spec. = 100 SD = 1, SDME = 0, Assay Sens. = 100, Assay Spec. = 100
Sensitivity (%) 100 100 94.7 94.5 97.8 100 100 100 96.9 96.9 98.2 100
Efficiency (%) 60.3 83.3 69.9 74.0 71.2 81.0 60.5 78.0 70.6 73.2 70.2 74.5
Rounds (per 100 samples) 9.9 3.0 3.7 3.1 2.3 1.4 9.8 6.0 3.5 3.3 2.5 5.5
SD = 0, SDME = 0.05, Assay Sens. = 96.7, Assay Spec. = 99.8 SD = 1, SDME = 0.05, Assay Sens. = 97.8, Assay Spec. = 99.9
Sensitivity (%) 94.8 94.6 87.1 88.1 93.0 100 96.2 95.8 89.1 90.9 94.8 98.8
Efficiency (%) 59.5 79.6 68.7 72.9 70.7 81.0 58.0 72.0 68.3 70.9 69.3 74.2
Rounds (per 100 samples) 10.0 5.3 3.9 3.6 2.4 1.4 10.1 8.0 4.0 4.4 2.7 5.8
SD = 0, SDME = 0.10, Assay Sens. = 93.7, Assay Spec. = 99.6 SD = 1, SDME = 0.10, Assay Sens. = 95.8, Assay Spec. = 99.7
Sensitivity (%) 90.0 89.1 80.7 81.8 87.9 100 93.3 93.1 84.7 87.7 92.4 97.6
Efficiency (%) 58.3 76.5 68.2 72.4 70.4 81.0 56.7 69.4 67.7 69.9 68.6 73.6
Rounds (per 100 samples) 10.1 6.2 4.0 3.8 2.5 1.4 10.2 8.2 4.2 4.8 2.8 6.0
SD = 0, SDME = 0.15, Assay Sens. = 90.7, Assay Spec. = 99.3 SD = 1, SDME = 0.15, Assay Sens. = 93.7, Assay Spec. = 99.5
Sensitivity (%) 86.3 85.8 76.6 77.8 85.0 100 90.8 89.9 80.6 83.9 89.7 96.0
Efficiency (%) 57.6 74.0 67.5 71.7 69.9 81.0 55.6 67.6 67.2 69.3 68.1 73.0
Rounds (per 100 samples) 10.0 6.6 4.1 4.2 2.5 1.4 10.2 8.3 4.3 5.1 2.9 6.0
SD = 0, SDME = 0.20, Assay Sens. = 88.0, Assay Spec. = 99.2 SD = 1, SDME = 0.20, Assay Sens. = 92.1, Assay Spec. = 99.4
Sensitivity (%) 83.1 82.2 71.9 73.5 80.7 100 87.6 87.0 77.1 80.6 86.0 94.8
Efficiency (%) 56.9 71.9 67.2 71.2 69.6 81.0 54.9 65.9 67.0 68.9 67.6 72.1
Rounds (per 100 samples) 10.0 7.0 4.2 4.4 2.6 1.4 10.1 8.4 4.3 5.3 2.9 6.1
SD = 0, SDME = 0.25, Assay Sens. = 85.4, Assay Spec. = 99.0 SD = 1, SDME = 0.25, Assay Sens. = 90.2, Assay Spec. = 99.2
Sensitivity (%) 80.5 79.2 68.5 69.6 76.4 100 84.8 84.5 73.8 76.8 81.8 92.9
Efficiency (%) 56.7 70.3 67.0 70.7 69.0 81.0 54.2 64.6 66.6 68.6 67.2 71.1
Rounds (per 100 samples) 9.9 7.2 4.2 4.6 2.7 1.4 10.0 8.4 4.4 5.4 3.0 6.1
SD = 0, SDME = 0.50, Assay Sens. = 76.6, Assay Spec. = 98.5 SD = 1, SDME = 0.50, Assay Sens. = 82.7, Assay Spec. = 98.0
Sensitivity (%) 69.7 69.1 53.2 54.1 61.3 99.8 71.3 70.9 56.7 58.3 63.3 84.5
Efficiency (%) 56.3 66.2 67.3 70.0 68.3 79.6 52.1 60.5 65.5 67.3 65.3 65.5
Rounds (per 100 samples) 9.4 7.3 4.1 4.8 2.8 1.6 9.8 8.1 4.6 5.9 3.1 6.0
* HyPred here is the I10MiniPred90-HyPred method, which individually tests samples in the top 10% of treatment failure risk and applies the MiniPred method to the remaining 90%. Sensitivity is defined as the percentage of treatment failures (VL ≥ 1000) detected compared with individual testing. Efficiency is the percentage of tests saved over individual testing; for example, if 30 tests are used to classify 100 samples, the efficiency is 70%. Rounds is the mean number of testing rounds needed to classify 100 samples (assuming a maximum of 20 tests per round). Assay sens. is the percentage of true treatment failures detected by individual testing when incorporating measurement error. Assay spec. is the percentage of true non-failures correctly classified by individual testing when incorporating measurement error. SD denotes the data set used for each simulation (the left half uses the SD = 0 data and the right half the SD = 1.0 data). SDME denotes the standard deviation of measurement error on the log10 scale.

TABLE 2.

Simulation Results for the ‘Reverse’ Scenario (50,000 samples classified for each level of SDME)

Mini+alg MiniPred MSS Linreg LRSOE HyPred* Mini+alg MiniPred MSS Linreg LRSOE HyPred*
SD = 0, SDME = 0 SD = 1, SDME = 0
Sensitivity (%) 100 100 94.7 94.5 96.3 100 100 100 96.9 96.6 97.7 100
Efficiency (%) 60.3 40.8 69.9 64.9 62.2 39.1 60.5 44.8 70.6 65.6 63.5 42.3
Rounds (per 100 samples) 9.9 11.0 3.7 5.4 3.8 7.4 9.8 11.0 3.5 5.4 3.7 8.6
SD = 0, SDME = 0.05 SD = 1, SDME = 0.05
Sensitivity (%) 94.8 95.9 87.1 86.8 90.0 97.1 96.2 96.8 89.1 89.6 92.2 96.2
Efficiency (%) 59.5 40.6 68.7 64.7 61.6 39.4 58.0 44.1 68.3 64.2 61.9 42.4
Rounds (per 100 samples) 10.0 11.0 3.9 5.6 3.9 7.4 10.1 11.0 4.0 6.0 3.9 8.7
SD = 0, SDME = 0.10 SD = 1, SDME = 0.10
Sensitivity (%) 90.0 91.9 80.7 81.1 83.7 94.2 93.3 93.9 84.7 85.8 88.6 93.1
Efficiency (%) 58.3 40.6 68.2 64.8 61.6 39.6 56.7 43.9 67.7 63.9 61.7 42.4
Rounds (per 100 samples) 10.1 11.0 4.0 5.7 3.9 7.4 10.2 11.0 4.2 6.1 4.0 8.7
SD = 0, SDME = 0.15 SD = 1, SDME = 0.15
Sensitivity (%) 86.3 88.7 76.6 76.5 80.2 91.3 90.8 90.8 80.6 81.6 84.8 90.0
Efficiency (%) 57.6 40.9 67.5 64.8 61.4 39.7 55.6 43.7 67.2 63.8 61.5 42.3
Rounds (per 100 samples) 10.0 11.0 4.1 5.7 3.9 7.4 10.2 11.0 4.3 6.3 3.9 8.7
SD = 0, SDME = 0.20 SD = 1, SDME = 0.20
Sensitivity (%) 83.1 85.2 71.9 71.6 75.8 88.1 87.6 87.7 77.1 77.9 81.7 86.9
Efficiency (%) 56.9 41.4 67.2 65.1 61.5 40.2 54.9 43.6 67.0 63.8 61.4 42.3
Rounds (per 100 samples) 10.0 11.0 4.2 5.7 3.9 7.4 10.1 10.9 4.3 6.4 3.9 8.7
SD = 0, SDME = 0.25 SD = 1, SDME = 0.25
Sensitivity (%) 80.5 82.3 68.5 67.8 71.2 85.3 84.8 84.9 73.8 74.3 78.0 84.4
Efficiency (%) 56.7 42.0 67.0 65.1 61.6 40.7 54.2 43.4 66.6 63.8 61.1 42.2
Rounds (per 100 samples) 9.9 10.9 4.2 5.8 3.9 7.4 10.0 10.9 4.4 6.4 3.9 8.7
SD = 0, SDME = 0.50 SD = 1, SDME = 0.50
Sensitivity (%) 69.7 70.1 53.2 52.3 56.6 73.6 71.3 71.9 56.7 56.9 60.3 74.3
Efficiency (%) 56.3 45.2 67.3 66.3 62.9 43.5 52.1 43.4 65.5 64.0 60.6 41.5
Rounds (per 100 samples) 9.4 10.5 4.1 5.6 3.6 7.1 9.8 10.7 4.6 6.6 3.8 8.6
* HyPred here is the I10MiniPred90-HyPred method, which individually tests samples in the top 10% of treatment failure risk and applies the MiniPred method to the remaining 90%. Sensitivity is defined as the percentage of treatment failures (VL ≥ 1000) detected compared with individual testing. Efficiency is the percentage of tests saved over individual testing; for example, if 30 tests are used to classify 100 samples, the efficiency is 70%. Rounds is the mean number of testing rounds needed to classify 100 samples (assuming a maximum of 20 tests per round). SD denotes the data set used for each simulation (the left half uses the SD = 0 data and the right half the SD = 1.0 data). SDME denotes the standard deviation of measurement error on the log10 scale.

$\log_{10}(VL) = 0.44 + 0.12X_1 + 1.93X_2 + 0.14X_1 \times X_2$  (2)

In the ‘Reverse’ scenario, the direction of association between VL and the covariates $X_1$ and $X_2$ was reversed in the training set where SD = 1.0,42 and ridge regression was again applied to the model to obtain the predictive model in (3).

$\log_{10}(VL) = 5.82 - 0.24X_1 - 3.72X_2 + 0.13X_1 \times X_2$  (3)

By reversing the direction of association between VL and the covariates in the training set, the samples with the highest VLs were generally predicted to have the lowest VLs and vice versa. The ‘Reverse’ scenario therefore represents a worst case, in which prediction-driven methods are expected to perform worse than if the covariates were simply non-predictive of VL. These results serve as a sensitivity analysis demonstrating the maximum performance losses that might occur from using the prediction-based methods instead of methods that do not use predictions.

We also generated a data set without individual variation,43 where $X_1 \sim \mathrm{Beta}(56, 2.0)$, $X_2 \sim \mathrm{Bernoulli}(0.1)$ and $\epsilon \sim N(0, SD = 0)$. This data set represents a setting where the data generating mechanism accounts for all individual variation, allowing for a perfect prediction model. This scenario, while unrealistic, serves as an upper bound for method performance if prediction were perfect. The percentage of treatment failures, VL ≥ 1000, was 5.8% in this data set. We use the perfect prediction model in (4) to obtain predictions for this data set in the ‘As Good as it Gets’ scenario and the reverse-specified model in (5) in the ‘Reverse’ scenario.

$\log_{10}(VL) = 0.5 + 0.1X_1 + 2X_2 + 0.1X_1 \times X_2$  (4)
$\log_{10}(VL) = 6 - 0.1X_1 - 2X_2 - 0.1X_1 \times X_2$  (5)

The VLs obtained by exponentiating the outcomes of (1) represent the true underlying VL values, which are not observable in practice due to measurement error. When carrying out testing in simulations, measurement error, $ME \sim N(0, SD_{ME})$, is added to VL on the log10 scale (by multiplying VL by $10^{ME}$) to reflect that the absolute measurement error on the raw scale depends on VL. Measurement error is added to individual samples and then separately to pools, by averaging VL within each pool and multiplying the pool average by $10^{ME}$. Because measurement error is applied to the individual samples and the pools separately, the mean of the observed individual values in a pool will generally not equal the observed pool average. Higher levels of measurement error decrease the sensitivity of individual testing. This impact is compounded when employing pooled testing methods, because the errors in individual samples and pooled samples can differ and even go in opposing directions. Therefore, it is important to evaluate these methods at varying degrees of measurement error and to consider the measurement error of the assay before using pooled testing in a new setting.
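A sketch of this error mechanism in R (function names are ours):

```r
# Multiplicative 10^ME noise applied separately to individual tests and
# to pool averages, so the observed individual values in a pool need not
# average to the observed pool result. sd_me corresponds to SDME.
observe_individual <- function(v, sd_me) v * 10^rnorm(length(v), 0, sd_me)
observe_pool <- function(v_pool, sd_me) mean(v_pool) * 10^rnorm(1, 0, sd_me)
```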

All 50,000 samples in each data set were classified using each of the methods. Samples were randomly arranged into 10 × 10 matrices, and seeds were set to ensure that each matrix-based method analyzed the same 500 matrices. The mini pool methods also analyzed the same 500 matrices, pooling only across the rows of the matrices to represent 10 separate mini pools per matrix. Because the I10MiniPred90-HyPred method splits the population into tiers based on predicted VL, analyzing the same matrices is not possible; I10MiniPred90-HyPred individually tested samples in the top 10% of predicted VL and applied the MiniPred method to the remaining 90%. Our method selection for each I10MiniPred90-HyPred risk tier was based on simulation results evaluating the prediction-driven methods within the medium and low risk tiers in the ‘As Good as it Gets’ scenario; results are provided in Supplement Section 2. Efficiency is presented as the percentage of tests saved compared with individual testing. Sensitivity is presented as the percentage of treatment failures detected compared with individual testing. Number of testing rounds is presented as the average number of rounds needed to classify 100 samples, assuming a maximum of 20 tests per round. All code to reproduce the simulations is provided on GitHub at https://github.com/Adam-Brand/Pooled_Testing_HIV.

As Table 1 shows, in the ‘As Good as it Gets’ scenario the MiniPred method achieves superior efficiency and fewer testing rounds than the Mini+alg method while maintaining similar sensitivity at all combinations of SD and SDME. The gain in efficiency using MiniPred over Mini+alg is largest, up to 23%, at low levels of SDME and decreases to an 8% gain when SDME = 0.5 and SD = 1. LRSOE has the highest sensitivity and the fewest testing rounds of all the matrix-based methods while maintaining efficiency superior to the MSS method in almost every scenario. The Linreg method is the most efficient matrix-based method, but is less sensitive than LRSOE and requires more testing rounds. The I10MiniPred90-HyPred method is superior to all other methods in every category when SD = 0. When SD = 1.0, I10MiniPred90-HyPred uses more testing rounds, but maintains sensitivity and efficiency superior to all other methods. The matrix-based methods in general achieve better efficiency and fewer testing rounds than the mini pool methods as SDME increases, but at the cost of inferior sensitivity. At SDME ≥ 0.20 the loss in sensitivity is such that the matrix-based methods may not be usable. However, an assay with SDME ≥ 0.20 is unlikely to be used, as the sensitivity of the assay itself under individual testing is 90% or less.

Table 2 illustrates the effect on method performance of reversing the direction of association between VL and the predictors. The effects of the reversed predictions are most pronounced in the MiniPred and I10MiniPred90-HyPred methods. While maintaining sensitivity superior or similar to all other methods in every scenario, MiniPred and I10MiniPred90-HyPred suffer a 25%-43% efficiency loss and up to 8 more rounds of testing per 100 samples when the predictions are reversed. The prediction-based matrix methods also suffer an efficiency loss, but a less pronounced one of 5%-10%. Although the MSS method (which does not use predictions) is the most efficient method when predictions are reversed, LRSOE is more sensitive and requires a similar or lower number of testing rounds than the other matrix-based methods while sacrificing only 5%-8% efficiency relative to MSS.

As expected, the methods that do not use predictions outperform those that do with respect to efficiency when the direction of association between predictive covariates and outcome is reversed. What may not be expected is that the performance of the prediction-based matrix methods remains robust in this unrealistically poor scenario, sacrificing a small amount of efficiency while maintaining sensitivity. Even the methods that suffer the greatest performance losses, MiniPred and I10MiniPred90-HyPred, maintain high sensitivity and efficiency over 40%. Tables 1 and 2 show that LRSOE can improve pooled testing characteristics by incorporating covariate information while remaining robust even when the direction of association with the predictors is reversed. When confidence in a prediction model is high, MiniPred and I10MiniPred90-HyPred can be used to increase sensitivity and efficiency over the matrix-based methods.

Additional results are provided in Section 2 of the supplementary materials, including summary results from a ‘No Association’ scenario and a ‘Misspecified’ scenario. Both scenarios use the same data sets as the scenarios above. In the ‘No Association’ scenario, the viral load values were permuted to sever any association between the outcome and the data generating mechanism, and the predictive model from the ‘As Good as it Gets’ scenario was used to generate the predictions. In the ‘Misspecified’ scenario, linear regression was performed on the training set where SD = 1.0 to obtain predictions from (6), which were used to evaluate the SD = 1.0 data set. Predictions for the SD = 0 data set in the ‘Misspecified’ scenario were obtained from (7).

$\log_{10}(VL) = 0.62 + 0.13X_1$  (6)
$\log_{10}(VL) = 2 + 0.3X_1$  (7)

In the ‘No Association’ scenario, the benchmark methods and the proposed methods incorporating covariates performed similarly, reflecting that the random predictions added no information beyond the pool averages when selecting which individual samples to test. In the ‘Misspecified’ scenario, performance was slightly improved by incorporating the predictions, though the improvement was less pronounced than in the ‘As Good as it Gets’ scenario. In neither scenario was performance hurt by incorporating predictions.

The tables in Supplement Section 2 following the summary tables for the two additional scenarios each present expanded results from one scenario at one level of SDME and SD. For example, Table S6 presents expanded results for the first three rows of Table 1 where SD = 0 (the left half of the table). The tables are organized by scenario (‘As Good as it Gets’, ‘Reverse’, ‘No Association’, then ‘Misspecified’), and within each scenario by data set (SD = 0, then SD = 1.0). These tables add further detail to the summary tables but support the same conclusions: our proposed methods perform better than methods that do not use predictions when the predictions carry some information, and exhibit only minor losses in the key metrics when the predictions are even worse than random selection.

4 |. RHSP HIV MONITORING IN RAKAI

The Rakai Health Sciences Program (RHSP) has provided free antiretroviral therapy (ART) since June 2004. Starting in 2005, routine biannual VL monitoring was introduced for all patients on ART. To emulate a real testing setting, in which a single patient’s sample would appear only once in any given testing batch, we treat all samples as independent conditional on the previous measurements included in the predictive model. Our study uses 14,360 HIV treatment follow-up records collected from December 2004 to May 2012. Starting in 2005, most patients were followed and tested for HIV VL every 6 months for up to 72 months. The prevalence of treatment failure, $V_i \ge 1000$, was 5.7%. HIV-1 VL testing was performed on plasma using the Roche Amplicor 1.5 Monitor assay (Roche Diagnostics, Indiana, USA) until September 2010, when the Abbott real-time m2000 assay was adopted (Abbott Laboratories, Illinois, USA).

Records were split into a training set (10,976 records collected up to July 1st, 2010) and a test set (3384 records collected after July 1st, 2010). To be included in either set, records must have had a baseline CD4 result, a baseline HIV VL result, sex, age at baseline, baseline WHO score, clinic and treatment regimen. Initial treatment regimens consisted of two NRTIs (zidovudine or stavudine, plus lamivudine) and nevirapine or efavirenz.

Ridge regression was performed (using the R package glmnet) on the training set, modelling $\log_{10}(VL)$ on the previously mentioned covariates as well as previous failure status, whether the patient was enrolled within the prior year, and whether they had an HIV VL result within the last 6 months. The resulting linear predictive model had an $R^2 = 0.11$ between $\log_{10}(\widehat{VL})$ and $\log_{10}(VL)$ within the training set, suggesting weak predictive accuracy for the continuous HIV VL measure. However, the AUC for classifying those above the threshold for treatment failure, $V_i \ge 1000$, within the training set was 0.84, suggesting that the predictions were adequate to preserve the ranks of HIV VL in most cases. Interestingly, the prediction model was more accurate when applied to the test set, yielding $R^2 = 0.17$ and AUC = 0.98. This likely reflects that samples were split between training and test sets not randomly, but temporally.
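A sketch of this fit with glmnet, assuming a data frame `train` containing the listed covariates; the column names here are illustrative, not the RHSP variable names:

```r
library(glmnet)
# Design matrix of baseline and monitoring covariates; drop the intercept.
X <- model.matrix(~ cd4_base + log10_vl_base + sex + age_base + who_score +
                    clinic + regimen + prev_failure + enrol_1yr + vl_6mo,
                  data = train)[, -1]
fit <- cv.glmnet(X, train$log10_vl, alpha = 0)    # alpha = 0: ridge penalty
# Predicted viral load on the raw scale for a test design matrix X_test:
vhat <- 10^predict(fit, newx = X_test, s = "lambda.min")
```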

The exponentiated results from the linear predictive model were used to predict HIV VL for each record in the test set, and 3300 randomly selected records were chosen to be classified with each of the pooled testing methods. Of those 3300 records, 76 (2.3%) were treatment failures, i.e., VL ≥ 1000. Seeds were set to ensure each method classified the same set of matrices, except for the I10MiniPred90-HyPred method. For I10MiniPred90-HyPred, the 300 samples with the highest predicted VL were placed in the high risk tier and tested individually, while the remaining records were classified using the MiniPred method, as in the simulations. Although all individual samples were tested and observed VLs collected, no actual pooling was done. Pooling was simulated by averaging the observed VLs in each pool and adding measurement error by multiplying the pool averages by $10^{ME}$.

Table 3 shows that incorporating covariates even weakly predictive of the continuous HIV VL measure can improve pooled testing performance on real data from a resource-limited region. MiniPred has superior sensitivity to all other methods except I10MiniPred90-HyPred at all levels of SDME while maintaining high efficiency. I10MiniPred90-HyPred and MiniPred perform similarly until SDME ≥ 0.25, where I10MiniPred90-HyPred gains superior efficiency and sensitivity over MiniPred. The matrix-based methods perform similarly at low levels of SDME, with LRSOE having slightly higher sensitivity and fewer testing rounds until SDME ≥ 0.20. Even at a high SDME of 0.25, Table 3 shows that I10MiniPred90-HyPred can maintain over 96% sensitivity while increasing testing efficiency by over 70%.

TABLE 3.

RHSP Pooled Testing Results (3,300 samples classified for each level of SDME)

Mini+alg MiniPred MSS Linreg LRSOE HyPred*
SDME = 0
Sensitivity (%) 100 100 100 100 100 100
Efficiency (%) 76.8 85.4 76.5 76.8 76.3 80.2
Rounds (per 100 samples) 8.1 4.0 2.2 2.7 2.0 2.8
SDME = .05
Sensitivity (%) 100 100 96.2 96.2 97.5 100
Efficiency (%) 72.8 81.1 76.5 76.5 76.0 79.2
Rounds (per 100 samples) 9.2 6.0 2.2 2.8 2.0 3.5
SDME = .10
Sensitivity (%) 98.7 98.7 92.4 91.1 93.7 100
Efficiency (%) 72.2 80.2 76.4 76.4 76.0 79.0
Rounds (per 100 samples) 9.4 6.2 2.2 2.8 2.1 3.6
SDME = .15
Sensitivity (%) 97.5 98.7 89.9 88.6 91.1 98.7
Efficiency (%) 70.8 78.3 76.1 76.5 75.5 75.1
Rounds (per 100 samples) 9.4 6.3 2.2 2.8 2.1 4.2
SDME = .20
Sensitivity (%) 96.2 97.5 89.9 86.1 88.6 98.7
Efficiency (%) 66.2 73.2 76.2 76.5 75.3 72.2
Rounds (per 100 samples) 9.5 6.5 2.2 2.8 2.1 4.7
SDME = .25
Sensitivity (%) 94.9 97.5 86.1 83.5 86.1 96.2
Efficiency (%) 62.0 68.7 75.8 76.1 74.6 72.1
Rounds (per 100 samples) 9.7 6.7 2.2 2.9 2.0 4.8
SDME = .5
Sensitivity (%) 91.1 91.1 79.7 77.2 79.7 96.2
Efficiency (%) 49.0 55.0 72.0 73.0 70.8 63.8
Rounds (per 100 samples) 9.5 6.4 2.9 3.6 2.2 5.1
SDME = .75
Sensitivity (%) 89.9 89.9 72.2 70.9 70.9 96.2
Efficiency (%) 45.6 51.0 71.7 72.7 69.8 53.0
Rounds (per 100 samples) 9.5 6.5 2.9 3.6 2.3 5.6
* HyPred here is the I10MiniPred90-HyPred method, which individually tests samples in the top 10% of treatment failure risk and applies the MiniPred method to the remaining 90%. Sensitivity is defined as the percentage of treatment failures (VL ≥ 1000) detected compared with individual testing. Efficiency is the percentage of tests saved over individual testing; for example, if 30 tests are used to classify 100 samples, the efficiency is 70%. Rounds is the mean number of testing rounds needed to classify 100 samples (assuming a maximum of 20 tests per round). SDME denotes the standard deviation of measurement error on the log10 scale.

5 |. SHINY APPLICATION

We have created an R Shiny application, “Pooled Testing Tool”, to serve both as an interactive visual aid demonstrating how each method works and as a way to facilitate use of these methods in the field. Pooled Testing Tool can be accessed at https://adambrand.shinyapps.io/shiny_pooled_testing. A tutorial video for the tool is included in the “App Tutorial” folder of the GitHub repository, along with a simulated example data set, at https://github.com/Adam-Brand/Pooled_Testing_HIV. Users can upload the example data to the app to create their own example testing matrices and follow each method step-by-step through all testing rounds to classify a matrix. Examples differ only in the tested values the user enters and in the initial predicted values, which the user can control by entering different predicted values in the example data prior to uploading.

Although it is the responsibility of the user to ensure that the proper samples are pooled together and that accurate results are entered, Pooled Testing Tool is designed to be user-friendly. The tool provides easy visualization of each row/column pool, which samples to test, and where to enter each result. The tool allows users to upload their own predictions, because prediction methods vary between testing settings. This flexibility allows the tool to be used across a variety of disease areas and regions.

Because the I10MiniPred90-HyPred method is flexible and can be calibrated to optimize performance for a particular setting, the tool does not include an I10MiniPred90-HyPred option. I10MiniPred90-HyPred can be implemented by first arranging the patient population into risk tiers and then applying the desired method to each risk tier separately using the tool.

6 |. DISCUSSION

We illustrate that pooled testing methods for a continuous biomarker can increase testing efficiency over individual testing while maintaining a high level of sensitivity in a realistic HIV treatment monitoring setting. Further, incorporating covariates even weakly predictive of the continuous biomarker of interest can improve testing performance while remaining robust to incorrect predictions. Choice of optimal method depends on the setting in which it is used. Method performance is greatly dependent on the amount of measurement error for the assay used. In settings with relatively small measurement error and strong predictive accuracy, the I10MiniPred90-HyPred and MiniPred methods can increase testing efficiency by over 80% compared to individual testing while maintaining close to 100% sensitivity. In settings where predictive accuracy is questionable, LRSOE can improve pooled testing performance while remaining robust to poor predictions.

A limitation of the evaluation of our proposed methods is the use of a simple prediction model, which, as can be seen in the Rakai data, can lead to poor predictive performance. We did not investigate or discuss methods for developing predictions, as the results from the optimal prediction scenario (Table 1) and the reverse prediction scenario (Table 2) serve as upper and lower bounds on method performance when incorporating predictions of varying strengths. Thus, regardless of how predictions are obtained, our proposed methods’ performance should fall within the range of these results. The results from the real-world application in Rakai, Uganda (Table 3) reflect the expected pooled testing performance when incorporating predictions from a simple linear model and applying the methods to actual data from a resource-limited region.

Throughout we use a pool size of 10 samples. May et al. developed and evaluated the current most-commonly used pooled testing methods and found that pool sizes of 10 are optimal for likely settings in HIV, i.e., skewed distributions and prevalence of treatment failure from 1% to 20%.30 Multiple studies have validated the clinical feasibility of pool sizes up to 10 when using pooled testing methods in the HIV setting.31,33,34 Different settings may require the use of other pool sizes, and feasibility studies should be performed in new settings before committing to a pool size.

Although the methods are evaluated in the HIV treatment management setting, they can be applied in any setting requiring measurement of a continuous biomarker to classify patients according to some cutoff value. An immediate potential application of these pooled testing methods is the current COVID-19 pandemic. Liu et al. suggest that SARS-CoV-2 viral load levels “…might be a useful marker for assessing disease severity and prognosis.”44 The time and resources required to test all infected patients for SARS-CoV-2 viral load individually may be prohibitive, but pooled testing methods may enable viral load testing for all infected patients, potentially improving prognostic accuracy and leading to improved outcomes.

Future research should focus on evaluating the methods in specific settings where predictive accuracy will vary. In particular, given the results for the I10MiniPred90-HyPred method, investigation of other methods from the HyPred class optimized for a particular setting or strength of prediction is likely to be very useful. Once an optimal method is identified for a specific setting, a clinical trial should be run to assess the benefit of introducing pooled testing in that setting. For example, in the HIV treatment setting in Rakai, patients could be randomized to receive either standard-of-care individual testing, which may be less frequent, or MiniPred, LRSOE or a HyPred pooled method in a non-inferiority trial. If any survival or adverse event differences between arms remain within the non-inferiority margin over the course of the trial and the pooled methods increase testing efficiency as expected, it will be clear that prediction-driven pooled testing can be used safely. Pooled testing may be one way that more frequent HIV treatment monitoring can become affordable in resource-limited regions.

Other avenues of future research include regression methods using pooled biomarkers (not to be confused with pooled testing methods), such as those proposed by Mitchell et al., to potentially increase predictive accuracy efficiently.45 Because our proposed methods will generate a large amount of pooled data, such regression methods could be used in conjunction with previous testing rounds to improve future predictions by more accurately modelling population distributions from the pooled and individual test data. Finally, in addition to the randomly arranged matrices used to evaluate the pooled testing methods, we also explored arranging samples in matrices based on the predictions. Although we did not find a prediction-driven matrix arrangement that improved performance for any of the methods, this remains a potential area for refining the matrix pooling methods.

Supplementary Material


ACKNOWLEDGEMENT

The authors gratefully acknowledge the RHSP clinical cohort participants and study staff. Retrospective use of routinely collected de-identified clinical data was approved by the Uganda Virus Research Institute, Research Ethics Committee, The Institutional Review Board of Johns Hopkins University School of Medicine, and the Uganda National Council for Science and Technology.

EEG and AB were funded in part by grants from The Swedish Research Council (Vetenskapsrådet) 2017-01898, 2018-06156 and 2019-00227. SJR was funded by the Division of Intramural Research, National Institute of Allergy and Infectious Diseases (NIAID), National Institutes of Health, Bethesda, MD (AI001040). JPH and SM were funded in part by NIAID grant AI029168. This research was also supported by a Center for AIDS Research grant AI036214.


DATA AVAILABILITY

The simulated data generated for method evaluation are openly available on GitHub at https://github.com/Adam-Brand/Pooled_Testing_HIV/tree/master/SimData. The data collected by the Rakai Health Sciences Program (RHSP) may be available on request from RHSP; they are not publicly available due to privacy and ethical restrictions.

References

1. Cohen MS, Chen YQ, McCauley M, et al. Prevention of HIV-1 infection with early antiretroviral therapy. New England Journal of Medicine 2011; 365(6): 493–505.
2. The INSIGHT START Study Group. Initiation of antiretroviral therapy in early asymptomatic HIV infection. New England Journal of Medicine 2015; 373(9): 795–807.
3. US Department of Health and Human Services. The Global HIV/AIDS Epidemic. https://www.hiv.gov/hiv-basics/overview/data-and-trends/global-statistics; 2019. Accessed July 31, 2020.
4. Bezabih YM, Beyene F, Bezabhe WM. Factors associated with first-line antiretroviral treatment failure in adult HIV-positive patients: a case-control study from Ethiopia. BMC Infectious Diseases 2019; 19(1): 537.
5. Khienprasit N, Chaiwarith R, Sirisanthana T, Supparatpinyo K. Incidence and risk factors of antiretroviral treatment failure in treatment-naïve HIV-infected patients at Chiang Mai University Hospital, Thailand. AIDS Research and Therapy 2011; 8(1): 42.
6. Fätkenheuer G, Theisen A, Rockstroh J, et al. Virological treatment failure of protease inhibitor therapy in an unselected cohort of HIV-infected patients. AIDS 1997; 11(14): F113–F116.
7. Sebunya R, Musiime V, Kitaka SB, Ndeezi G. Incidence and risk factors for first line anti retroviral treatment failure among Ugandan children attending an urban HIV clinic. AIDS Research and Therapy 2013; 10(1): 25.
8. Burger DM, Hoetelmans R, Hugen P, et al. Low plasma concentrations of indinavir are related to virological treatment failure in HIV-1-infected patients on indinavir-containing triple therapy. Antiviral Therapy 1998; 3(4): 215–220.
9. Robbins GK, Daniels B, Zheng H, Chueh H, Meigs JB, Freedberg KA. Predictors of antiretroviral treatment failure in an urban HIV clinic. Journal of Acquired Immune Deficiency Syndromes 2007; 44(1): 30.
10. Ayalew MB, Kumilachew D, Belay A, et al. First-line antiretroviral treatment failure and associated factors in HIV patients at the University of Gondar Teaching Hospital, Gondar, Northwest Ethiopia. HIV/AIDS (Auckland, NZ) 2016; 8: 141.
11. Bacha T, Tilahun B, Worku A. Predictors of treatment failure and time to detection and switching in HIV-infected Ethiopian children receiving first line anti-retroviral therapy. BMC Infectious Diseases 2012; 12(1): 197.
12. Glass TR, De Geest S, Hirschel B, et al. Self-reported non-adherence to antiretroviral therapy repeatedly assessed by two questions predicts treatment failure in virologically suppressed patients. Antiviral Therapy 2008; 13(1): 77.
13. Ssempijja V, Nason M, Nakigozi G, et al. Adaptive viral load monitoring frequency to facilitate differentiated care: a modeling study from Rakai, Uganda. Clinical Infectious Diseases 2019; 71(4): 1017–1021.
14. Pilcher CD, McPherson JT, Leone PA, et al. Real-time, universal screening for acute HIV infection in a routine HIV counseling and testing population. Journal of the American Medical Association 2002; 288(2): 216–221.
15. Westreich DJ, Hudgens MG, Fiscus SA, Pilcher CD. Optimizing screening for acute human immunodeficiency virus infection with pooled nucleic acid amplification tests. Journal of Clinical Microbiology 2008; 46(5): 1785–1792.
16. Bilder CR, Tebbs JM, Chen P. Informative retesting. Journal of the American Statistical Association 2010; 105(491): 942–955.
17. Behets F, Bertozzi S, Kasali M, et al. Successful use of pooled sera to determine HIV-1 seroprevalence in Zaire with development of cost-efficiency models. AIDS 1990; 4(8): 4.
18. Brookmeyer R. Analysis of multistage pooling studies of biological specimens for estimating disease incidence and prevalence. Biometrics 1999; 55(2): 608–612.
19. Busch MP, Glynn SA, Stramer SL, et al. A new strategy for estimating risks of transfusion transmitted viral infections based on rates of detection of recently infected donors. Transfusion 2005; 45(2): 254–264.
20. Cahoon-Young B, Chandler A, Livermore T, Gaudino J, Benjamin R. Sensitivity and specificity of pooled versus individual sera in a human immunodeficiency virus antibody prevalence study. Journal of Clinical Microbiology 1989; 27(8): 1893–1895.
21. Gastwirth JL, Hammick PA. Estimation of the prevalence of a rare disease, preserving the anonymity of the subjects by group testing: Application to estimating the prevalence of AIDS antibodies in blood donors. Journal of Statistical Planning and Inference 1989; 22(1): 15–27.
22. Hammick PA, Gastwirth JL. Group testing for sensitive characteristics: Extension to higher prevalence levels. International Statistical Review 1994; 62(3): 319–331.
23. Hudgens MG. Rejoinder to “Reader reaction: A note on the evaluation of group testing algorithms in the presence of misclassification”. Biometrics 2016; 72(1): 304.
24. Kim HY, Hudgens MG, Dreyfuss JM, Westreich DJ, Pilcher CD. Comparison of group testing algorithms for case identification in the presence of test error. Biometrics 2007; 63(4): 1152–1163.
25. Kline RL, Brothers TA, Brookmeyer R, Zeger S, Quinn TC. Evaluation of human immunodeficiency virus seroprevalence in population surveys using pooled sera. Journal of Clinical Microbiology 1989; 27(7): 1449–1452.
26. Patterson KB, Leone PA, Fiscus SA, et al. Frequent detection of acute HIV infection in pregnant women. AIDS 2007; 21(17): 2303–2308.
27. Pilcher CD, Fiscus SA, Nguyen TQ, et al. Detection of acute infections during HIV testing in North Carolina. New England Journal of Medicine 2005; 352(18): 1873–1883.
28. Quinn TC, Brookmeyer R, Kline R, et al. Feasibility of pooling sera for HIV-1 viral RNA to diagnose acute primary HIV-1 infection and estimate HIV incidence. AIDS 2000; 14(17): 2751–2757.
29. Tu XM, Litvak E, Pagano M. On the informativeness and accuracy of pooled testing in estimating prevalence of a rare disease: application to HIV screening. Biometrika 1995; 82(2): 287–297.
30. May S, Gamst A, Haubrich R, Benson C, Smith DM. Pooled nucleic acid testing to identify antiretroviral treatment failure during HIV infection. Journal of Acquired Immune Deficiency Syndromes 2010; 53(2): 194.
31. Omooja J, Nannyonjo M, Sanyu G, et al. Rates of HIV-1 virological suppression and patterns of acquired drug resistance among fisherfolk on first-line antiretroviral therapy in Uganda. Journal of Antimicrobial Chemotherapy 2019; 74(10): 3021–3029.
32. Kim SB, Kim HW, Kim HS, et al. Pooled nucleic acid testing to identify antiretroviral treatment failure during HIV infection in Seoul, South Korea. Scandinavian Journal of Infectious Diseases 2014; 46(2): 136–140.
33. Smith DM, May SJ, Perez-Santiago J, et al. The use of pooled viral load testing to identify antiretroviral treatment failure. AIDS 2009; 23(16): 2151.
34. Van Zyl G, Preiser W, Potschka S, Lundershausen A, Haubrich R, Smith D. Pooling strategies to reduce the cost of HIV-1 RNA load monitoring in a resource-limited setting. Clinical Infectious Diseases 2011; 52(2): 264–270.
35. Kim H, Ku NS, Kim SB, et al. Simulation of pooled nucleic acid testing to identify antiretroviral treatment failure during HIV infection in Seoul, South Korea. Journal of Acquired Immune Deficiency Syndromes 2013; 62(3).
36. Hanscom B. Biostatistical Methods for HIV Monitoring and Prevention. PhD thesis. University of Washington; 2014. https://digital.lib.washington.edu/researchworks/handle/1773/27423.
37. Brand A. Evaluating New Matrix Pooled Testing Methods for Detecting HIV Treatment Failure with and without Covariate Information. Master’s thesis. University of Washington; 2016. https://digital.lib.washington.edu/researchworks/handle/1773/37043.
38. Tao L, Hogan JW, Daniels MJ, et al. Improved HIV-1 viral load monitoring capacity using pooled testing with marker-assisted deconvolution. Journal of Acquired Immune Deficiency Syndromes 2017; 75(5): 580.
39. Wang D, McMahan CS, Tebbs JM, Bilder CR. Group testing case identification with biomarker information. Computational Statistics & Data Analysis 2018; 122: 156–166.
40. [dataset] Brand A. Uganda_SimData_SD1.0.rds. https://github.com/Adam-Brand/Pooled_Testing_HIV/tree/master/SimData; 2020.
41. [dataset] Brand A. Uganda_SimData_train_SD1.0.rds. https://github.com/Adam-Brand/Pooled_Testing_HIV/tree/master/SimData; 2020.
42. [dataset] Brand A. Uganda_SimData_train_SD1.0_reverse.rds. https://github.com/Adam-Brand/Pooled_Testing_HIV/tree/master/SimData; 2020.
43. [dataset] Brand A. Uganda_SimData_SD0.rds. https://github.com/Adam-Brand/Pooled_Testing_HIV/tree/master/SimData; 2020.
44. Liu Y, Yan LM, Wan L, et al. Viral dynamics in mild and severe cases of COVID-19. The Lancet Infectious Diseases 2020; 20(6): 656–657.
45. Mitchell EM, Lyles RH, Manatunga AK, Danaher M, Perkins NJ, Schisterman EF. Regression for skewed biomarker outcomes subject to pooling. Biometrics 2014; 70(1): 202–211.
