Variation of quantified infection rates of mixed samples to enhance rapid testing during an epidemic

Usama Kadri

doi:10.1080/20476965.2020.1817801

2020 Sep 13;10(1):24–30. doi: 10.1080/20476965.2020.1817801

PMCID: PMC7946003 PMID: 33763226

ABSTRACT

Rapid testing of appropriate samples from patients suspected of having a disease during an epidemic, such as the current Coronavirus outbreak, is of great importance for disease management and control. We propose a method to enhance the processing of large numbers of collected samples. The method is based on mixing samples in testing tubes (pooling) in a specific configuration, as opposed to testing a single sample in each tube, and recognising infected samples from variations of the total infection rates in each tube. To illustrate the efficiency of the suggested method, we carry out numerical tests for actual scenarios under various test conditions. Applying the proposed method allows testing many more patients using the same number of testing tubes: all positives are identified with no false negatives and with no need for independent testing, and the effective testing time can be reduced drastically even when the uncertainty in the test is relatively high.

KEYWORDS: Rapid testing, quantified infections, COVID-19

1. Introduction

The World Health Organization has declared the growing epidemic of the novel coronavirus infectious disease (COVID-19) a global pandemic. The virus emerged in Wuhan, China, at the end of 2019, and as of August 19, 2020, over 22 million cases had been identified in 213 countries and territories, with over three-quarters of a million deaths reported (WHO, 2020). In most countries around the world, the actual number of cases is believed to be larger than reported. The relatively low reported number is attributed to a number of factors, including but not limited to mismanagement of the epidemic at the political level (Ruiu, 2020), high ER visit costs (Konrad, 2020), and a lack of resources that drastically limits the number of tests. For example, in the UK, 25,000 tests were carried out between January 2020 and March 11, 2020, which is equivalent to the number of tests carried out in South Korea in 2.5 days, according to the World Health Organization. Thus, while in some countries there is a (front-end) problem of sample collection, in other countries the main concern is processing the collected samples (back-end problem). Here, we are concerned with the back-end problem, namely processing a large number of collected samples.

Rapid testing of appropriate specimens from suspected patients during an epidemic in general, and COVID-19 in particular, is of great importance for clinical management and outbreak control. The current outbreak has prompted researchers and experts from various fields to re-evaluate the feasibility of multi-sample pools, e.g., Kadri (2020a, 2020b) and Yelin et al. (2020), where samples from a number of patients are mixed together, as opposed to testing individual samples. In standard pooling (e.g., see Bilder and Tebbs (2012), Dorfman (1943) and references within), samples from a number of patients are mixed together in a single tube. If the test result is negative, all samples within must be negative (with some uncertainty). However, if the result is positive, it indicates that at least one patient is infected; in this case, each patient is then tested individually. In this work, we propose an advanced testing method where samples from each patient are mixed in multiple tubes in a unique configuration; the variation of quantified infection rates in the tubes is then employed to calculate all possible positives, i.e., without the need for repeat testing. The first part of the method is by itself powerful when the percentage of infected patients is extremely low, as long as proper mixing is done, keeping in mind the dilution threshold (due to mixing) required for identifying the disease. However, as the percentage of infected patients increases, it becomes much more challenging to determine positives without performing new tests. To this end, an accurate quantitative approach can be employed to determine all positives without risking false negatives. For this approach to be effective, an accurate method for quantification of PCR is required (Boulter et al., 2016). The proposed method takes into account the uncertainty in the test, which covers all technical, personal, and conceptual issues; the sensitivity of the test is one important factor. 
Even when the uncertainty increases (i.e., for less accurate tests), all positives are still obtained with no false negatives, though false positives start to arise as well. A qualitative description of the method followed by the mathematical model and algorithms for samples’ distribution and test results analysis are presented in section 2. Numerical tests for actual scenarios under various test conditions are presented in section 3, which is followed by a discussion in section 4.

2. Variation of quantified infection rates

Testing patients for infection can be carried out using various methods and often requires a number of stages, from collecting samples to providing test results. Here, attention is focused solely on processing collected samples, which can be divided into three main phases: preparation, testing, and analysis (Figure 1). In particular, the proposed method involves the preparation and analysis phases. For clarity, we first present a qualitative description of the proposed method, and later provide a detailed description of the mathematical models involved.

Figure 1. — **Processing collected samples**. Main phases to process collected samples: distribution of samples, parallel testing, and analysis of test results

2.1. Phase 1: preparation

The preparation of samples prior to testing is a subtle task, which requires the following steps (Figure 2): (a) test tubes are ordered and labelled; (b) all patients are allocated an equal number of test tubes that they share with other patients – thus each tube will comprise samples from different patients (pooling); (c) each patient is allocated a unique combination of tubes; (d) equal volumes from patients’ samples are added to the allocated tubes; (e) all tubes are filled to the same level – if required a neutral sample (say water) is added. These steps can be automated using a robot where a distribution algorithm can be integrated (section 2.5).
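Steps (b) to (e) can be illustrated with a small sketch. It is only one possible way to automate the bookkeeping: the helper names `allocate` and `filler_units`, the 0-based tube numbering, and the choice of lexicographic order for the combinations are assumptions made here for illustration.

```python
from collections import Counter
from itertools import combinations, islice
from math import comb


def allocate(n, m, l):
    """Give each of n patients a distinct combination of l tubes out of m (0-based)."""
    if n > comb(m, l):
        raise ValueError("not enough distinct tube combinations for n patients")
    return list(islice(combinations(range(m), l), n))


def filler_units(allocations, m):
    """Units of neutral sample to add per tube so every tube holds equally many units.

    One unit is the (equal) volume taken from a single patient's sample.
    """
    counts = Counter(tube for tubes in allocations for tube in tubes)
    fullest = max(counts.values())
    return [fullest - counts[j] for j in range(m)]


# Five patients, four tubes, two tubes per patient (hypothetical numbers).
allocations = allocate(5, 4, 2)
fill = filler_units(allocations, 4)
print(allocations)
print(fill)
```

When the group is filled to its maximum size, every tube receives the same number of samples and no neutral filler is needed.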

2.2. Phase 2: parallel testing

The proposed method does not require a specific testing technique. If infection rates are relatively low, any standard testing technique that returns positive or negative results may suffice. However, quantitative results become essential as the infection rates rise. Thus, multiple testing techniques can be employed as the pandemic progresses. The role of the quantitative results will become clear below.

2.3. Phase 3: data analysis

Tubes that return negative results indicate that all samples within are negative and thus all associated patients test negative. At low infection rates, the combination of test tubes that return positive results will directly indicate the patient whose samples were distributed to these tubes. However, at higher infection rates, a positive test result might be associated with more than one patient. In order to identify all infected patients, there is a need for quantified results. Here, the quantification technique used becomes very important. It is essential that we apply a technique that preserves the uniqueness of the infection rate of each patient. For example, if a PCR technique is used, calculations must be made at identical time frames and before a threshold has been reached (alternatively one can make use of the time duration required for a threshold to be reached). Even though we only know the total infection rate in each tube, since each patient has a unique (though unknown) infection rate, we can calculate all possible combinations of infected patients that lead to the observed test results by solving a set of simple linear algebraic equations.

2.4. Mathematical model

Let $n$ be the number of patients (sample size), $m$ the size of the test tube set, and $l$ the size of a subset of the test tube set. Each patient sample is distributed to a different configuration of $l$ tubes. Thus, the relation between the maximum number of patients, the total number of tubes, and the number of tubes allocated to each patient is given by

n = \frac{m!}{l! (m - l)!} .

(1)

Relation (1) can be used to calculate any of the three parameters $n$ , $m$ , and $l$ , given the other two. However, the current form is particularly useful as we anticipate $l$ to be determined from dilution considerations (i.e., the maximum number of samples allowed together), whereas $m$ could be a technical limitation, e.g., of resources. Thus, given such limitations, relation (1) provides the maximum possible number of patients that can be tested simultaneously. If the maximum number of mixed samples allowed in a tube due to dilution is $N_{\max}$ , then any combination of $l$ , $m$ , and $n$ should satisfy the condition

\frac{(m - 1)!}{(l - 1)! (m - l)!} \leq N_{\max} .

(2)
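As a quick numerical check of relations (1) and (2), the sketch below evaluates the maximum group size and the number of samples that share one tube when the group is full. The function names and the dilution limit `n_max = 40` are illustrative assumptions, not values taken from the study.

```python
from math import comb


def max_patients(m: int, l: int) -> int:
    """Relation (1): number of distinct l-tube combinations out of m tubes."""
    return comb(m, l)


def samples_per_tube(m: int, l: int) -> int:
    """Left-hand side of condition (2): samples sharing one tube at full capacity."""
    return comb(m - 1, l - 1)


def satisfies_dilution(m: int, l: int, n_max: int) -> bool:
    """Condition (2) for a given dilution limit n_max (hypothetical value)."""
    return samples_per_tube(m, l) <= n_max


if __name__ == "__main__":
    n_max = 40  # assumed dilution limit, for illustration only
    for m, l in [(8, 4), (10, 3), (10, 5)]:
        print(m, l, max_patients(m, l), samples_per_tube(m, l),
              satisfies_dilution(m, l, n_max))
```

For instance, eight tubes with four tubes per patient accommodate 70 patients, with 35 samples mixed in each tube.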

The test results in each tube $j = 1 \dots m$ can be described by

R_{j} = \sum_{i = 1}^{n} r_{i j} δ_{i}, \quad (j = 1 \dots m),

(3)

where $r_{i j}$ is the contribution of the $i$ th patient to the infection rate of the $j$ th tube, and $δ_{i}$ is an indicator of the true status of the $i$ th individual, being zero for healthy and unity for infected patients. We require all samples to be of the same volume, and to avoid unbalanced dilution, we also require that all tubes contain identical volumes, i.e., neutral samples might need to be added to some tubes. Thus, we can now construct a set of $m$ algebraic equations, one per tube,

(\begin{matrix} r_{1} & r_{1} & r_{1} & \dots & \dots & 0 & 0 \\ r_{2} & r_{2} & 0 & r_{2} & \dots & 0 & 0 \\ ⋮ & ⋱ & ⋱ & ⋱ & ⋱ & ⋱ & ⋮ \\ 0 & \dots & \dots & \dots & r_{k} & \dots & 0 \\ ⋮ & ⋱ & ⋱ & ⋱ & ⋱ & ⋱ & ⋮ \\ 0 & \dots & \dots & \dots & r_{n} & r_{n} & r_{n} \end{matrix})^{T} (\begin{matrix} δ_{1} \\ δ_{2} \\ ⋮ \\ δ_{k} \\ ⋮ \\ δ_{n} \end{matrix}) = (\begin{matrix} R_{1} \\ R_{2} \\ ⋮ \\ R_{j} \\ ⋮ \\ R_{m} \end{matrix}),

(4)

or, for simplicity, $r^{T} δ = R$ , where the superscript $T$ denotes the transpose. The right-hand-side vector $R$ represents the test results and is thus known up to some degree of test uncertainty, $Δ R$ , which we shall account for. On the other hand, while the matrix $r$ is unknown, the distribution of the samples in the tubes is our choice; it is simply the matrix $r$ with nonzero elements replaced by ones,

I_{i, j} = (\begin{matrix} 1 & 1 & 1 & \dots & \dots & 0 & 0 \\ 1 & 1 & 0 & 1 & \dots & 0 & 0 \\ ⋮ & ⋱ & ⋱ & ⋱ & ⋱ & ⋱ & ⋮ \\ 0 & \dots & \dots & \dots & 1 & 1 & 1 \end{matrix}),

(5)

which is the distribution matrix $(n \times m)$ that shows how samples from each patient (rows) are added to the tubes (columns). Finally, the vector $δ$ contains the information we seek, namely which patients are positive and which are negative. Thus, our objective is to find $δ$ .

2.4.1. Step 1: identifying immediate negatives

The test results vector $R$ may contain zero elements (i.e., tests that return negative). For each test tube $j$ that returns zero, i.e., $R_{j} = 0$ , all patients $i$ that have $I_{i, j} = 1$ should test negative, otherwise the test tube would not have returned zero ( $R_{j} \neq 0$ ). Thus, at this phase, we are able to identify $q$ patients who test negative, with $0 \leq q \leq n$ .
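Step 1 amounts to a simple scan over the tubes. A minimal sketch, assuming the distribution matrix is stored as a list of rows and that a reading within a tolerance `tol` of zero counts as negative (both are choices made here, not specified in the text):

```python
def immediate_negatives(dist, readings, tol=0.0):
    """Return the indices of patients who have a sample in at least one negative tube."""
    negatives = set()
    for j, reading in enumerate(readings):
        if abs(reading) <= tol:
            # Every patient with I[i][j] == 1 must be negative.
            negatives.update(i for i, row in enumerate(dist) if row[j] == 1)
    return negatives


# Six patients sharing four tubes, two tubes each (illustrative layout).
dist = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]
readings = [50.0, 0.0, 50.0, 0.0]
cleared = immediate_negatives(dist, readings)
print(sorted(cleared))
```

Here tubes 1 and 3 return zero, which clears five of the six patients ($q = 5$) and leaves patient 1 as the only candidate.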

2.4.2. Step 2: identifying all positives

We rewrite Equation (4), excluding the negatives identified in the previous step, as ${\tilde{r}}^{T} \tilde{δ} = \tilde{R}$ . Thus, all elements of $\tilde{R}$ are now non-zero. In order to identify all positives, we seek all possible solutions $\tilde{δ}$ of the remaining $(n - q)$ algebraic equations. We carry out the following procedure: (i) Consider all combinations that have a single positive, i.e., exactly $(n - q)$ combinations. Note that this can be the case only when the elements of $\tilde{R}$ are identical, within the test uncertainty $Δ R$ . (ii) Consider all combinations with two positives, which are also easy to identify or exclude, as they require $\tilde{R}$ to have either two or three different elements: two values that each correspond to one of the infected patients, and possibly a third in case the two are combined in a shared tube. (iii) In general, we keep adding one more possible positive (i.e., increasing the number of 1’s in $\tilde{δ}$ ) and seek a solution.

2.4.3. Step 3: identifying more negatives

All patients who were identified neither as positive in step 2 nor as negative in step 1 must be negative, as all positives have been identified. Thus, all patients $i$ with ${\tilde{δ}}_{i} = 0$ are identified as negative.
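The three steps can be sketched end to end for a small group. This is a simplified illustration, not the analysis code used for the results: the consistency check `fit_rates` uses a "peeling" heuristic (repeatedly resolve a tube that holds only one unresolved candidate), the search stops at the smallest number of positives that explains the readings, and `max_positives` caps the search. All three are assumptions made here, and the peeling check can miss candidate sets in which every tube is shared by several candidates.

```python
from itertools import combinations


def fit_rates(candidate, tubes_of, readings, tol):
    """Return {patient: rate} if `candidate` reproduces the readings within tol, else None."""
    residual = list(readings)
    pending = set(candidate)
    rates = {}
    while pending:
        for j in range(len(readings)):
            alone = [i for i in pending if j in tubes_of[i]]
            if len(alone) == 1:
                break
        else:
            return None  # no tube isolates a single unresolved candidate
        i = alone[0]
        if residual[j] <= tol:
            return None  # an infected patient must contribute a positive amount
        rates[i] = residual[j]
        for t in tubes_of[i]:
            residual[t] -= rates[i]
        pending.remove(i)
    return rates if all(abs(x) <= tol for x in residual) else None


def identify(tubes_of, readings, tol=1e-9, max_positives=3):
    """Return (positives, negatives, unresolved) as sets of patient indices."""
    n = len(tubes_of)
    # Step 1: anyone sharing a tube that reads zero is negative.
    negatives = {i for i in range(n)
                 if any(abs(readings[j]) <= tol for j in tubes_of[i])}
    remaining = [i for i in range(n) if i not in negatives]
    # Step 2: try one positive, then two, ... and keep every consistent set.
    for k in range(1, max_positives + 1):
        solutions = [c for c in combinations(remaining, k)
                     if fit_rates(c, tubes_of, readings, tol) is not None]
        if solutions:
            positives = set().union(*solutions)
            # Step 3: whoever is left over is negative.
            return positives, negatives | (set(remaining) - positives), set()
    # No solution found: the remaining patients need a repeated test.
    return set(), negatives, set(remaining)


tubes_of = list(combinations(range(5), 2))   # 10 patients, 5 tubes, l = 2
readings = [30.0, 30.0, 80.0, 80.0, 0.0]     # patients 0 and 7 infected
result = identify(tubes_of, readings)
print(result)
```

With several consistent sets of the same size (as can happen when `tol` is large), the union of those sets is reported, which is how false positives can enter while true positives are retained.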

2.5. Distribution and analysis algorithms

In order to process the collected samples effectively, we introduce distribution and analysis algorithms that can be employed to automate the process. A flow chart of the distribution and analysis algorithms is given in Figure 3, which is also listed in the following steps:

Figure 3. — Distribution and analysis flow chart

• Each patient is allocated a binary number with $l$ ones and $(m - l)$ zeros, e.g., (0 0 1 0 1 1 0). The position of each digit of the binary number corresponds to a single tube ( $m$ tubes in total), e.g., the first digit from the left (0) corresponds to the first tube, the second digit (0) corresponds to the second tube, the third digit (1) corresponds to the third tube, etc.

• Samples from each patient are added to all tubes that correspond to digits with number “1”. For example, samples from patient (0 0 0 1 1 1) are added to tubes number 4, 5, and 6 (counting from left).

• All $m$ tubes are sent for testing. It is important that testing results are evaluated at a cycle where no saturation has been reached, so that the value of each positive test represents a summation of unique values for each positive sample in that tube. A quantitative result ( $R^{T}$ ) is obtained in the following form:

0 34.0972 159.3790 125.2818 159.3790 0 0 159.3790 \dots

• All tubes that return negative indicate that all patients who have samples in those tubes are not infected (negatives).

• Given the test results and the uncertainty within, we then calculate all possible combinations of solutions $\tilde{δ}$ that return $\tilde{R}$ , which, when found, return all positives, and possibly additional negatives.

• In the unlikely scenario of no solution, a repeated test for all unidentified samples is required. This can be done by reshuffling samples or splitting them between other test groups.
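The binary labelling in the first two steps can be sketched as follows. The function names are hypothetical, and generating the labels in the lexicographic order of `itertools.combinations` is an arbitrary choice; any one-to-one assignment of labels to patients would serve.

```python
from itertools import combinations


def binary_labels(m, l):
    """One m-digit label per patient, each with exactly l ones."""
    return ["".join("1" if j in combo else "0" for j in range(m))
            for combo in combinations(range(m), l)]


def tubes_for(label):
    """Tube numbers (1-based, counting from the left) whose digit is '1'."""
    return [pos + 1 for pos, digit in enumerate(label) if digit == "1"]


labels = binary_labels(6, 3)
print(len(labels), labels[0], labels[-1])
print(tubes_for("000111"))
print(tubes_for("0010110"))
```

The label `000111` maps to tubes 4, 5, and 6, matching the example in the list above.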

2.6. Numerical example

An analysis code has been developed in Matlab R18a. In order to run numerical examples, the user is required to input the number of patients $n$ , number of tubes $m$ , and the number of tubes related to each patient, $l$ . The code randomly selects the infected patients, where a range of the infection percentage can be adjusted by the user, say between 5% and 15% of patients. An example of the output for a single test is presented in Figure 4. In this example, we considered 70 patients and 8 tubes. The code randomly selects infected patients as an input of the test, then evaluates all positives and negatives using the programme above. In the example, the two infected patients were successfully identified.

Figure 4. — **Numerical example**. Matlab output example of a single test. With $n = 70$ , $m = 8$ , $l = 4$ . There are two infected patients (randomly selected), that are identified successfully
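The input side of such a numerical example can be mimicked in a few lines. The sketch below is written in Python rather than Matlab and is not the author's code: the function name, the uniform draw of the number of infected patients between the two percentages, and the additive noise model are all assumptions, while the 0 to 220 reading range follows the range quoted for the numerical tests.

```python
import random
from itertools import combinations
from math import ceil, floor


def simulate_test(n, m, l, min_frac, max_frac, max_rate=220.0, noise=0.0, seed=None):
    """Randomly infect a fraction of n patients and return (rates, tube readings)."""
    rng = random.Random(seed)
    tubes_of = list(combinations(range(m), l))[:n]
    if len(tubes_of) < n:
        raise ValueError("n exceeds the number of distinct tube combinations")
    # Number of infected patients, drawn between the two requested fractions.
    k = rng.randint(ceil(min_frac * n), floor(max_frac * n))
    infected = rng.sample(range(n), k)
    rates = {i: rng.uniform(0.0, max_rate) for i in infected}
    readings = [0.0] * m
    for i, rate in rates.items():
        for j in tubes_of[i]:
            readings[j] += rate
    # Assumed noise model: perturb each positive reading by at most +/- noise.
    readings = [x + rng.uniform(-noise, noise) if x > 0 else x for x in readings]
    return rates, readings


rates, readings = simulate_test(70, 8, 4, 0.05, 0.15, seed=1)
print(sorted(rates))
print([round(x, 2) for x in readings])
```

The readings produced this way can then be passed to an analysis routine to check whether the randomly selected patients are recovered.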

3. Results

To gain a more quantitative understanding of the proposed method, we performed numerical tests with actual scenarios under various conditions. We considered group sizes of 28, 56, and 70 patients that were tested using 8 tubes only (Figure 5), and group sizes of 120, 210, and 252 patients using 10 tubes (Figure 6). Infected patients were selected randomly, with a percentage that ranged from $0.8\%$ to $21.43\%$ . Each result point is an average of a hundred test repetitions, which is important when discussing uncertainties in the tests, which were allowed to be in the range $Δ R = 0.05\% \dots 30\%$ . Without loss of generality, each of the positively tested patients was allocated a random number between 0 and 220 ng, which mimics a possible reading, e.g., using a PCR technique, though any other range could equally be implemented. Note that the chosen range is arbitrary and was introduced for illustration purposes, with 0 and 220 representing the lowest and highest possible readings, respectively.

Figure 5. — **Fixed number of positives**. Uncertainty analysis in tests using eight tubes $(m = 8)$ , for a fixed number of randomly selected positives, ranging from $2.86\%$ to $21.43\%$ of total patients. Each calculation point is an average of 100 repetitions. Left column: percentage of negative patients successfully identified. Right column: solutions found for positive patients (note that once a solution is found $100\%$ of positives are identified). Rows top to bottom: $l = 2$ , $3$ , and $4$ , with $n = 28$ , $56$ , and $70$ , respectively, following Equation (1)

Figure 6. — **Random number of positives**. Uncertainty analysis in tests for a random number of positives (randomly selected), ranging from $0.8\%$ to $5\%$ of total patients. Using 10 tubes $(m = 10)$ , and $l = 3$ , $4$ , and $5$ , corresponding to a total number of patients $n = 120$ , $210$ , and $252$ , respectively. Left plot: percentage of negative patients successfully identified. Right plot: solutions found for positive patients (note that once a solution is found $100\%$ of positives are identified)

Uncertainty in the test results, $Δ R$ , depends on a number of factors, among which are the accuracy and precision of the test method. The less sensitive the test, the larger $Δ R$ becomes; e.g., an uncertainty of 10% of the 220 ng range is equivalent to 22 ng, so that we are unable to distinguish between two readings that differ by less than 22 ng. Therefore, as $Δ R$ increases, the size of the solved matrix increases and false solutions may appear, and thus false-positive results are expected to rise. However, the elements that comprise the true matrix (which contains all positive solutions) always remain part of any enlarged matrix, and thus regardless of how large the uncertainty is, false negatives are not expected, which is extremely important for disease management and control. If the uncertainty is very small, we are always able to obtain a single solution $δ$ that accurately determines all positively and negatively tested patients. Unfortunately, current technology is associated with a degree of uncertainty that requires optimising the solution using other factors such as the tube set size $m$ and the subset size $l$ . The smaller the percentage of infected patients, the higher the percentage of negatives identified (Figure 5). As the uncertainty increases, the size of the subset $l$ becomes important. A smaller $l$ results in a higher percentage of negatives identified, though the size of the group $n$ becomes smaller (Figure 6 and Equation (1)). When $m$ and $l$ increase, the length of $\tilde{δ}$ becomes larger, which requires searching a greater number of possible solutions. If the uncertainty in the tests is large, finding a solution and determining the positives is not always possible, though if solutions are found we always determine all positives, even if there are false positives. 
For example, for 25,200 patients, we performed 100 independent tests with 10 tubes each, and allowed each patient's samples to be distributed to 5 tubes (see right subplot of Figure 6, blue-cross curve), i.e., in each test there are $n = 252$ patients; even at a high uncertainty of $32.77$ ng (14.9% error), we were able to determine all positives in 90 of the 100 tests that were carried out. Thus, all positives from 22,680 patients were determined. In the remaining ten tests, no solution was found, and thus there is no risk of false negatives in any of the tests. However, in this case, we are only able to determine, on average, 67.67% of all negatives in all 100 tests (see left subplot of Figure 6, blue-cross curve). If a more accurate test is applicable, say with an uncertainty of 2.048 ng (~1% error), 99.6% of all negatives are identified in all 100 tests. Note that in Figure 6, the percentage of positives is variable and selected randomly to mimic an actual scenario where we do not know the exact percentage of infected patients a priori.
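The figures quoted in this example follow from simple arithmetic, sketched below. The constant and function names are illustrative; the only inputs are the 220 ng reading range and the test counts stated in the text.

```python
FULL_SCALE_NG = 220.0  # highest possible reading used in the numerical tests


def percent_of_full_scale(uncertainty_ng):
    """Express an absolute uncertainty in ng as a percentage of the reading range."""
    return 100.0 * uncertainty_ng / FULL_SCALE_NG


patients_per_test = 252   # n for m = 10 tubes and l = 5 tubes per patient
total_tests = 100
solved_tests = 90

total_patients = total_tests * patients_per_test
patients_resolved = solved_tests * patients_per_test

print(total_patients, patients_resolved)
print(round(percent_of_full_scale(32.77), 1))
print(round(percent_of_full_scale(2.048), 2))
```

The second percentage comes out slightly below 1%, consistent with the "~1% error" quoted above.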

4. Discussion

We propose an advanced quantified pooling method for determining infected samples. The proposed method does not require the traditional two-stage Dorfman approach (Dorfman, 1943), with its second stage of individual testing. However, until laboratory tests are carried out, a second stage of testing might be considered to increase confidence in the proposed method; this would also enhance the overall sensitivity and specificity of the results. Moreover, for clinical use, additional factors need to be considered, including sample types, viral load, prevalence, and inhibitor substances.

Although attention was focused here on enhancing lab tests for COVID-19 in particular, and epidemics in general, we believe that the proposed method can equally be implemented in a variety of lab tests. In fact, once the solution vector $δ$ is obtained, one could reconstruct the quantities in each of the infected samples, i.e., $r = R^{T} δ$ . Such a quantitative method could be complementary to a wide range of environmental, healthcare and safety, and engineering applications, e.g., for testing contaminated water or food, the spread of diseases in population and sewage, and the proper concentration of substances in chemical products.

Acknowledgements

The author is grateful to M. Abu-Khalaf, A. Kadri, A. Mansour, and R. Asleh for fruitful discussions.

Additional information

No human participants, human data, or human samples were used in the study.

Disclosure statement

The author declares that he has no competing financial interests.

References

Bilder, C. R., & Tebbs, J. M. (2012). Pooled-testing procedures for screening high volume clinical specimens in heterogeneous populations. Statistics in Medicine, 31(27), 3261–3268. doi:10.1002/sim.5334
Boulter, N., Suarez, F. G., Schibeci, S., Sunderland, T., Tolhurst, O., Hunter, T., Hodge, G., Handelsman, D., Simanainen, U., Hendriks, E., & Duggan, K. (2016). A simple, accurate and universal method for quantification of PCR. BMC Biotechnology, 16(27). doi:10.1186/s12896-016-0256-y
Dorfman, R. (1943). The detection of defective members of large populations. Annals of Mathematical Statistics, 14(4), 436–440. doi:10.1214/aoms/1177731363
Kadri, U. (2020a). Enhancing the number of lab tests with a “poisoned wine” approach. Preprint at https://vixra.org/abs/2004.0198
Kadri, U. (2020b). Variation of positiveness to enhance testing of specimens during an epidemic. medRxiv.
Konrad, W. (2020). After battling COVID-19, survivors may have to fight big medical bills. CBS News, May 15.
Ruiu, M. L. (2020). Mismanagement of COVID-19: Lessons learned from Italy. Journal of Risk Research, 1–14. doi:10.1080/13669877.2020.1758755
WHO. (2020). Coronavirus disease 2019 (COVID-19) situation report-93. World Health Organization.
Yelin, I., Aharony, N., Shaer-Tamar, E., Argoetti, A., Messer, E., Berenbaum, D., ... & Kishony, R. (2020). Evaluation of COVID-19 RT-qPCR test in multi-sample pools. Clinical Infectious Diseases. doi:10.1093/cid/ciaa531
