PLOS ONE. 2020 Oct 21;15(10):e0240775. doi: 10.1371/journal.pone.0240775

Is “no test is better than a bad test”? Impact of diagnostic uncertainty in mass testing on the spread of COVID-19

Nicholas Gray 1,*,#, Dominic Calleja 1,#, Alexander Wimbush 1,#, Enrique Miralles-Dolz 1, Ander Gray 1, Marco De Angelis 1, Elfriede Derrer-Merk 1, Bright Uchenna Oparaji 1, Vladimir Stepanov 1, Louis Clearkin 2, Scott Ferson 1
Editor: Jishnu Das
PMCID: PMC7577497  PMID: 33085693

Abstract

Testing is viewed as a critical aspect of any strategy to tackle epidemics. Much of the dialogue around testing has concentrated on how countries can scale up capacity, but the uncertainty in testing has not received nearly as much attention beyond asking if a test is accurate enough to be used. Even for highly accurate tests, false positives and false negatives will accumulate as mass testing strategies are employed under pressure, and these misdiagnoses could have major implications for the ability of governments to suppress the virus. The present analysis uses a modified SIR model to understand the implication and magnitude of misdiagnosis in the context of ending lockdown measures. The results indicate that increased testing capacity alone will not provide a solution to ending lockdown measures. The progression of the epidemic and peak infections are shown to depend heavily on test characteristics, test targeting, and prevalence of the infection. Antibody based immunity passports are rejected as a solution to ending lockdown, as they can put the population at risk if poorly targeted. Similarly, mass screening for active viral infection may only be beneficial if it can be sufficiently well targeted, otherwise reliance on this approach for protection of the population can again put them at risk. A well targeted active viral test combined with a slow release rate is a viable strategy for continuous suppression of the virus.

Introduction

During the early stages of the United Kingdom's SARS-CoV-2 epidemic, the British government's COVID-19 epidemic management strategy was influenced by epidemiological modelling conducted by a number of research groups [1, 2]. The analysis of the relative impact of different mitigation and suppression strategies concluded that the “only viable strategy at the current time” is to suppress the epidemic with all available measures, including the lockdown of the population with schools closed [3, 4]. Similar analyses in other countries led to over half the world population being in some form of lockdown by April 2020 and over 90% of global schools closed [5, 6]. These analyses highlighted from the beginning that the eventual relaxation of lockdown measures would be problematic [3]. Without a considered cessation of the suppression strategies the risk of a second wave becomes significant, possibly of greater magnitude than the first as the SARS-CoV-2 virus is now endemic in the population [7, 8].

Although much attention was focused on the number of tests being conducted and the effect that testing could have in suppressing the disease [9–11], not enough attention has been given to the issues of imperfect testing, beyond Matt Hancock, UK Secretary of State for Health and Social Care, stating in a press conference on 2nd April 2020 that “No test is better than a bad test” [12]. In this paper we will explore the validity of this claim.

The failure to detect the virus in infected patients can be a significant problem in high-throughput settings operating under severe pressure, with evidence suggesting that this is indeed the case [13–17]. The public are rapidly becoming aware of the difference between the ‘have you got it?’ tests for detecting active cases, and the ‘have you had it?’ tests for the presence of antibodies, which imply some immunity to COVID-19. What may be less obvious is that these different tests need to maximise different test characteristics.

To be useful in ending lockdown measures, active viral tests need to maximise the sensitivity. High sensitivity reduces the chance of missing people who have the virus who may go on to infect others. There is an additional risk that an infected person who has been incorrectly told they do not have the disease, when in fact they do, may behave in a more reckless manner than if their disease status were uncertain.

The second testing approach, seeking to detect the presence of antibodies to identify those who have had the disease would be used in a different strategy. This strategy would involve detecting those who have successfully overcome the virus, and are likely to have some level of immunity (or at least reduced susceptibility to more serious illness if they are infected again), so are relatively safe to relax their personal lockdown measures. This strategy would require a high test specificity, aiming to minimise how often the test tells someone they have had the disease when they haven’t [18]. A false positive tells people they have immunity when they don’t, which may be worse than if people are uncertain about their viral history.

Evidence that testing is flawed

The successes of South Korea, Singapore, Taiwan and Hong Kong in limiting the impact of the SARS-CoV-2 virus have been attributed to their ability to deploy widespread testing, with digital surveillance, and impose targeted quarantines in some cases [13]. This testing has predominantly been based on reverse transcription polymerase chain reaction (RT-PCR) testing. During the 2009 H1N1 pandemic, rapidly developed high-sensitivity PCR assays were employed early with some success in that global pandemic [19]. These tests, when well targeted, clearly provide a useful tool for managing and tracking pandemics.

These tests form the basis of much of the research into the incidence, dynamics and comorbidities of SARS-CoV-2, but few, if any, of these studies give consideration to the impact of false test results [20–24]. Increasing reliance on lower-sensitivity tests to address capacity concerns is likely to make available data on confirmed cases more difficult to accurately utilise [19]. It may be the case that false test results contribute to some of the counter-intuitive disease dynamics observed [25].

There is evidence that both active infection [26–30] and antibody [31–33] tests lack perfect sensitivity and specificity even in best-case scenarios. Alternative screening methods such as chest x-rays may be found to have high sensitivity based on biased data [34] or may simply perform poorly even compared to imperfect RT-PCR tests [29]. The Foundation for Innovative New Diagnostics (FIND) conducted an independent evaluation of five RT-PCR tests which scored highly out of 17 candidate tests on criteria such as regulatory status and availability [35]. Even under ideal laboratory conditions specificity could be as low as 90%, and the practical specificity is likely to be lower still.

The rapid development and scaling of new diagnostic systems invites error, particularly as labs are converted from other purposes and technicians are placed under pressure, and as test collection quality, reagent quality, sample preservation and storage, and sample registration and provenance vary. Assessing the magnitude of these errors on the performance of tests is challenging in real time. Point-of-care tests are not immune to these errors and are often seen as less accurate than laboratory-based tests [36, 37].

Introduction to test statistics: What makes a ‘good’ test?

In order to answer this question there are a number of important statistics:

  • Sensitivity σ—Out of those who actually have the disease, the fraction that received a positive test result.

  • Specificity τ—Out of those who did not have the disease, the fraction that received a negative test result.

The statistics that characterise the performance of the test are computed from a confusion matrix (Table 1). We test n_infected people who have COVID-19, and n_healthy people who do not have COVID-19. In the first group, a people correctly test positive and c falsely test negative. Among healthy people, b will falsely test positive, and d will correctly test negative.

Table 1. Confusion matrix.

                  Infected             Not Infected         Total
Tested Positive   a                    b                    a + b
Tested Negative   c                    d                    c + d
Total             a + c = n_infected   b + d = n_healthy    N

From this confusion matrix the sensitivity is given by (1) and the specificity by (2).

σ = a / n_infected  (1)
τ = d / n_healthy.  (2)

Sensitivity is the ratio of correct positive tests to the total number of infected people involved in the study characterising the test. The specificity is the ratio of the correct negative tests to the total number of healthy people. Importantly, these statistics depend only on the test itself and do not depend on the population the test is intended to be used upon.
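As a concrete sketch, Eqs 1 and 2 can be computed directly from the four confusion-matrix counts of Table 1; the study numbers below are illustrative, not from the paper.

```python
# Sensitivity (Eq 1) and specificity (Eq 2) from confusion-matrix counts.

def sensitivity(a, c):
    """sigma = a / n_infected, with n_infected = a + c."""
    return a / (a + c)

def specificity(b, d):
    """tau = d / n_healthy, with n_healthy = b + d."""
    return d / (b + d)

# 95 true positives, 5 false negatives, 5 false positives, 95 true negatives
sigma = sensitivity(a=95, c=5)  # 0.95
tau = specificity(b=5, d=95)    # 0.95
```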

When the test is used for diagnostic purposes, the characteristics of the population being tested become important for interpreting the test results. To interpret the diagnostic value of a positive or negative test result the following statistics must be used:

  • Prevalence P—The proportion of people in the target population that have the disease tested for.

  • Positive Predictive Value PPV—How likely one is to have the disease given a positive test result.

  • Negative Predictive Value NPV—How likely one is to not have the disease, given a negative test result.

The PPV and NPV depend on the prevalence, and hence depend on the population you are focused on. This may be an entire nation or region, a sub-population with COVID-19 compatible symptoms, or any other population you may wish to target. The PPV and NPV can be calculated using Bayes’ rule:

PPV = Pσ / (Pσ + (1 − P)(1 − τ)),  (3)
NPV = τ(1 − P) / (τ(1 − P) + (1 − σ)P).  (4)

To illustrate the impact of prevalence on PPV, for a test with σ = τ = 0.95, if prevalence P = 0.05, then PPV = 0.5. Therefore, a positive result only indicates a 50% chance that an individual has the disease given that they have tested positive, even though the test is highly accurate. Fig 1 shows why: for 1000 test subjects there will be similar numbers of true and false positives even with high sensitivity and specificity of 95%. In contrast, using the same test on a sample with a higher prevalence P = 0.5 we find PPV = 0.95, see Fig 2. Similarly, the NPV is lower when the prevalence is higher.
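Eqs 3 and 4 and the worked example above can be checked directly; a minimal sketch:

```python
# PPV (Eq 3) and NPV (Eq 4) via Bayes' rule, reproducing the worked example:
# sigma = tau = 0.95 with prevalence 0.05 gives PPV = 0.5; prevalence 0.5 gives 0.95.

def ppv(P, sigma, tau):
    return P * sigma / (P * sigma + (1 - P) * (1 - tau))

def npv(P, sigma, tau):
    return tau * (1 - P) / (tau * (1 - P) + (1 - sigma) * P)

low = ppv(P=0.05, sigma=0.95, tau=0.95)   # ≈ 0.5: half of all positives are false
high = ppv(P=0.50, sigma=0.95, tau=0.95)  # ≈ 0.95
```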

Fig 1. If the prevalence of a disease amongst those being tested is 0.05 then with σ = τ = 0.95 the number of false positives will outnumber the true positives, resulting in PPV = 0.5.


Fig 2. If the prevalence of a disease amongst those being tested is 0.50 then with σ = τ = 0.95 the number of true positives will outnumber the number of false positives, resulting in a high PPV of 0.95.


SIR model with testing

SIR models offer one approach to explore infection dynamics and the prevalence of a communicable disease. In the generic SIR model, there are S people susceptible to the illness, I people infected, and R people who are recovered with immunity. The infected people are able to infect susceptible people at rate β, and they recover from the disease at rate γ [38]. Fig 3 shows how people move between the different states of an SIR model. Once infected persons have recovered from the disease they are unable to become infected again or infect others. This may be because they now have immunity to the disease or because they have unfortunately died.

R_0 = β / γ  (5)
δ_S,I = βIS / N  (6a)
δ_I,R = γI  (6b)
ΔS = −δ_S,I  (6c)
ΔI = δ_S,I − δ_I,R  (6d)
ΔR = δ_I,R  (6e)

Fig 3. Diagram for a basic SIR model.


The black arrows show how people move between the different states.
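The dynamics of Eqs 5 and 6 can be sketched in a few lines. This is a deterministic toy version (the full model below uses stochastic sampling and quarantine states), using the paper's β = 0.32, γ = 0.1 and a UK-sized population.

```python
# Deterministic discrete-time SIR step (Eqs 5 and 6); a minimal sketch,
# not the full stochastic model with quarantine states.

def sir_step(S, I, R, beta=0.32, gamma=0.1):
    N = S + I + R
    new_infections = beta * I * S / N  # delta_{S,I}
    new_recoveries = gamma * I         # delta_{I,R}
    return S - new_infections, I + new_infections - new_recoveries, R + new_recoveries

r0 = 0.32 / 0.1  # R0 ≈ 3.2 (Eq 5)

S, I, R = 6.7e7 - 1000, 1000.0, 0.0  # seed the epidemic with 1000 cases
for day in range(100):
    S, I, R = sir_step(S, I, R)
```

With R0 above 1 the infected count grows until the susceptible pool is depleted, which is the second-wave behaviour discussed in the scenarios below.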

To explore the effect of imperfect testing on the disease dynamics when testing regimes are employed to relax lockdown measures, three new classes were added to the model. The first is a quarantined susceptible state, QS, the second is a quarantined infected state, QI, and the third is people who have recovered but are in quarantine, QR, as shown in Fig 4.

Fig 4. SIRQ model used to simulate the effect of mass testing to leave quarantine.


The present model is similar to other SIR models that take into account the effect of quarantining regimes on disease dynamics, such as Lipsitch et al. (2003) [39] or Giordano et al. (2020) [23]. Lipsitch et al. implement quarantine in their model but do not incorporate the effects on the dynamics from imperfect testing, nor do they consider how the quality and scale of an available test affect the spread of a disease. Diagnostic uncertainty plays no part in the model they present. Likewise, Giordano et al. reduce population based diagnostic strategies to two parameters which confound test capacity, test targeting, and diagnostic uncertainty. Again, they do not investigate the role that diagnostic uncertainty plays in the spread of a disease. The intent of this model is not to create a more sophisticated SIR model, but to investigate how diagnostic uncertainty affects the dynamics of an epidemic.

The model evaluates each day’s population-level state transitions. There are two possible tests that can be performed:

  • An active virus infection test that is able to determine whether or not someone is currently infectious. This test is performed on some proportion of the un-quarantined population (S + I + R). It has a sensitivity of σA and a specificity of τA.

  • An antibody test that determines whether or not someone has had the infection in the past. This is used on the fraction of the population that is currently in quarantine but not infected (QS + QR) to test whether they have had the disease or not. This test has a sensitivity of σB and a specificity of τB.

Each test is defined by a number of parameters. Testing each day is limited by the test capacity C, the maximum number of tests that can be performed each day. Each day a population N will be submitted for testing. The targeting capability of the test, T, indicates the probability that an individual submitted for testing is positive; this is effectively the PPV of the initial screening effort. This results in a number of individuals M being considered for screening who are negative, of which K will be tested. Targeting must be imperfect, as if it were perfect there would be no need for testing. Unless otherwise stated, scenarios consider a default targeting of T = 0.8, representing an extremely effective screening capability that is nonetheless imperfect.

If daily testing targets are a goal regardless of the prevalence of the illness, T can be overruled to ensure N = C, for example. This condition is referred to as Strict Capacity and is denoted with the boolean parameter G, defaulting to true for all scenarios. Tests can also be conducted periodically by changing the test interval parameter D, which defaults to 1, i.e. daily testing.

Each test has unique parameters, so for example test A (active virus infection test) has a targeting parameter TA whilst test B (antibody test) has TB. The parameters σ, τ, T, C, G and D define a test.
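The six defining parameters can be gathered in a small structure; the field names and example values here are our own illustrative shorthand, not identifiers from the paper.

```python
from dataclasses import dataclass

# The six parameters sigma, tau, T, C, G and D that define a test.

@dataclass
class TestParams:
    sigma: float          # sensitivity
    tau: float            # specificity
    targeting: float      # T: probability a submitted individual is positive
    capacity: int         # C: maximum tests per day
    strict: bool = True   # G: strict capacity, fill C regardless of targeting
    interval: int = 1     # D: days between testing rounds (1 = daily)

# Example: an active virus test (A) and an antibody test (B) with default targeting
test_A = TestParams(sigma=0.9, tau=0.9, targeting=0.8, capacity=100_000)
test_B = TestParams(sigma=0.9, tau=0.9, targeting=0.8, capacity=200_000)
```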

A person in any category who tests positive in an active virus test transitions into the corresponding quarantine state, where they are unable to infect anyone else. A person in QS or QR who tests positive in an antibody test transitions to S or R respectively. Any person within I or QI who recovers transitions to R, on the assumption that the end of the illness is clear and they will know when they have recovered.

For this parameterisation, being in the susceptible quarantined state, QS, makes an individual insusceptible to being infected. Similarly, individuals in the infected quarantined state, QI, are unable to infect anyone else. In practice there is always leakage, no quarantine is entirely effective, but for the sake of exploring the impact of testing uncertainty these effects are neglected from the model. Other situations may require including this effect.

The SIR model used in this paper uses discrete-time binomial sampling for calculating movements of individuals between states. For a defined testing strategy these rates are defined as follows:

M_A = min(S, C_A − I, max(0, I/T_A − I, C_A − I))  (7a)
N_A = min(C_A, M_A + I)  (7b)
K_A = H(M_A, I, N_A)  (7c)
δ_S,QS = Bin(K_A, 1 − τ_A)  (7d)
δ_I,QI = Bin(N_A − K_A, σ_A)  (7e)
δ_S,I = min(S − δ_S,QS, Bin(I, β(S − δ_S,QS) / (S + I + R − δ_I,QI − δ_S,QS)))  (7f)
δ_I,R = Bin(I − δ_I,QI, γ)  (7g)
M_B = min(Q_S, C_B − Q_R, max(0, Q_R/T_B − Q_R, C_B − Q_R))  (7h)
N_B = min(C_B, M_B + Q_R)  (7i)
K_B = H(M_B, Q_R, N_B)  (7j)
δ_QS,S = Bin(K_B, 1 − τ_B)  (7k)
δ_QI,R = Bin(Q_I, γ)  (7l)
δ_QR,R = Bin(N_B − K_B, σ_B)  (7m)
ΔS = δ_QS,S − δ_S,QS − δ_S,I  (7n)
ΔI = δ_S,I − δ_I,QI − δ_I,R  (7o)
ΔR = δ_I,R + δ_QI,R + δ_QR,R  (7p)
ΔQS = δ_S,QS − δ_QS,S  (7q)
ΔQI = δ_I,QI − δ_QI,R  (7r)
ΔQR = −δ_QR,R  (7s)

In Eq 7, Bin(n, p) refers to a binomial distribution with count n and rate p, H(n, k, m) refers to a hypergeometric distribution with populations n and k and a sample size m.

The model must be initialised with a defined population split between the six states. At each time step t, the model calculates the number of persons moving between each state in the order defined above. The use of binomial and hypergeometric sampling was prompted by a desire to incorporate aleatory uncertainty in each movement. The current approach does not account for epistemic uncertainty, fixing the model parameters σ, τ, C, T and D. A discrete time model was selected to allow for comparisons against available published data detailing recorded cases and recoveries on a day-by-day basis.
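The active-virus half of the update can be implemented almost line for line with NumPy's binomial and hypergeometric samplers. This is a sketch under assumptions: the antibody flows (7h–7k, 7m) are omitted, strict capacity is taken as always on, and the rounding, integer casts and zero-test guard are our additions for integer arithmetic.

```python
import numpy as np

rng = np.random.default_rng(0)

def active_test_day(S, I, R, QS, QI, QR,
                    sigma_A, tau_A, T_A, C_A, beta, gamma):
    # 7a: susceptibles submitted alongside the I infected, set by targeting T_A
    # (or by filling spare capacity under strict capacity)
    M_A = max(0, min(S, C_A - I, max(0, round(I / T_A) - I, C_A - I)))
    # 7b: tests actually run today, capped by capacity
    N_A = min(C_A, M_A + I)
    # 7c: hypergeometric draw for how many tested people are susceptible
    K_A = int(rng.hypergeometric(M_A, I, N_A)) if N_A > 0 else 0
    # 7d: false positives -> quarantined susceptible
    d_S_QS = int(rng.binomial(K_A, 1 - tau_A))
    # 7e: true positives -> quarantined infected
    d_I_QI = int(rng.binomial(N_A - K_A, sigma_A))
    # 7f: new infections among the population still at large
    free = S + I + R - d_I_QI - d_S_QS
    p_inf = min(1.0, beta * (S - d_S_QS) / free) if free > 0 else 0.0
    d_S_I = min(S - d_S_QS, int(rng.binomial(I, p_inf)))
    # 7g: recoveries among infected who were not quarantined today
    d_I_R = int(rng.binomial(I - d_I_QI, gamma))
    # 7l: quarantined infected recover at the same rate
    d_QI_R = int(rng.binomial(QI, gamma))
    return (S - d_S_QS - d_S_I,
            I + d_S_I - d_I_QI - d_I_R,
            R + d_I_R + d_QI_R,
            QS + d_S_QS,
            QI + d_I_QI - d_QI_R,
            QR)

state = (66_000_000, 670_000, 263_000, 0, 67_000, 0)
new_state = active_test_day(*state, sigma_A=0.9, tau_A=0.9,
                            T_A=0.8, C_A=100_000, beta=0.32, gamma=0.1)
```

Note that each flow appears once as an outflow and once as an inflow, so the total population is conserved exactly, mirroring Eqs 7n–7s.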

If the tests were almost perfect, then we can imagine how the epidemic would die out very quickly under either widespread infection or antibody testing with a coherent management strategy. A positive test on the former and the person is removed from the population; a positive test on the latter and the person, unlikely to contract the disease again, can rejoin the population.

More interesting are the effects of incorrect test results on the disease dynamics. If someone falsely tests positive in the antibody test, they enter the susceptible state. Similarly, if an infected person receives a false negative for the disease they remain active in the infected state and hence can continue the disease propagation and infect further people.

What part will testing play in relaxing lockdown measures?

In order to explore the possible impact of testing strategies on the relaxation of lockdown measures several scenarios have been analysed. These scenarios are illustrative of the type of impact, and the likely efficacy of a range of different testing configurations.

  • Immediate end to lockdown scenario: This baseline scenario is characterised by a sudden relaxation of lockdown measures.

  • Immunity passports scenario: A policy that has been discussed in the media [40–42]. Analogous to the International Certificate of Vaccination and Prophylaxis, antibody based testing would be used to identify those who have some level of natural immunity.

  • Incremental relaxation scenario: A phased relaxation of lockdown is the most likely policy that will be employed. To understand the implications of such an approach this scenario has explored the effect of testing capacity and test performance on the possible disease dynamics under this type of policy. Under the model parameterisation this analysis has applied an incremental transition rate from the QS state to the S state, and QR to R.

Whilst the authors are sensitive to the sociological and ethical concerns of any of these approaches, the analysis presented is purely on the question of efficacy.

For the purpose of the analysis we have selected a population similar in size to the United Kingdom's, 6.7 × 10^7 people. β and γ were set to 0.32 and 0.1 respectively; this ensured that the R0 value of the model was broadly in line with other models [43, 44].

Immediate end to lockdown scenario

Under the baseline scenario, characterised by the sudden and complete cessation of lockdown measures, we explored the impact of infection testing. Under this formulation the initial conditions of the model in this scenario are that all of the population in QS transition to S in the first iteration. The impact of infection testing under this scenario was analysed in Fig 5 using the parameters shown in Table 2.

Fig 5. A comparison of different infection test sensitivities σA shown from red to blue.


Three different infection test capacities are considered. Left: test capacity = 1 × 10^5. Centre: test capacity = 1.5 × 10^5. Right: test capacity = 2 × 10^5. Top: The number of infected individuals (I + QI population) over 100 days. Bottom: The proportion of the population that has been released from quarantine (S + I + R population) over 100 days. Model parameters are shown in Table 2.

Table 2. Fixed parameters used for Fig 5 analysis.

Antibody tests were disabled for this analysis.

Model Parameters
σA τA TA CA GA β γ
- 0.9 0.8 - True 0.32 0.1
Initial Population split
Population S I R QS QI QR
6.7 × 10^7 0.984 0.01 0.001 0 0.004 0.001

These scenarios consider the impact of attempts to control the disease through increased testing capacity and a more sensitive test. A test capacity range between 1 × 10^5 and 2 × 10^5 was considered as representative of the capabilities of a country such as the UK. To illustrate the sensitivity of the model to testing scenarios an evaluation was conducted with a range of infection test sensitivities, from 50% (i.e. of no diagnostic value) to 98%. The specificity of these tests has a negligible impact on the disease dynamics in these scenarios. A false positive would mean people are unnecessarily removed from the susceptible population, but the benefit of a reduction in susceptible population is negligibly small.

As would be expected the model indicates a second wave is an inevitability, and as many as 20 million people could become infected within 30 days. A high-sensitivity test has little impact beyond quarantining a slightly higher percentage of the population if capacities are low. At higher capacities this pattern remains, though peak infection counts are marginally reduced. Overall it is clear that reliance on infection testing, even with a highly sensitive test and high capacities, is not enough to prevent widespread infection.

Immunity passports scenario

The immunity passport is an idiom describing an approach to the relaxation of lockdown measures that focuses heavily on antibody testing. Wide-scale screening for antibodies in the general population promises significant scientific value, and targeted antibody testing is likely to have value for reducing risks to NHS and care-sector staff, and other key workers who will need to have close contact with COVID-19 sufferers. The authors appreciate these other motivations for the development and roll-out of accurate antibody tests. This analysis however focuses on the appropriateness of this approach to relaxing lockdown measures by mass testing the general population. Antibody testing has been described as a ‘game-changer’ [45]. Some commentators believe this could have a significant impact on the relaxation of lockdown measures [41], but others note that there are severe ethical, logistical and medical concerns which need to be resolved before antibody testing could support a strategy such as this [46].

Much of the discussion around antibody testing in the media has focused on the performance and number of these tests. The efficacy of this strategy however is far more dependent on the prevalence of antibodies (seroprevalence) in the general population. Without wide-scale antibody screening it is impossible to know the seroprevalence in the general population, so there is scientific value in such an endeavour. However, the seroprevalence is the dominant factor to determine how efficacious antibody screening would be for relaxing lockdown measures.

Presumably, only people who test positive for antibodies would be allowed to leave quarantine. The more people in the population with antibodies, the more people will get a true positive, so more people would be correctly allowed to leave quarantine (under the paradigms of an immunity passport).

The danger of such an approach is false positives. We demonstrate the impact of people re-entering the susceptible population who have no immunity. We assume their propensity to contract the infection is the same as those without the false sense of security a positive test may engender. On an individual basis, and even at the population level, behavioural differences between those with false security from a positive antibody test and those who are uncertain about their viral history could be significant. The model parametrisation here does not include this additional confounding effect.

To simulate the seroprevalence in the general population the model is preconditioned with different proportions of the population in the QS and QR states. This is analogous to the proportion of people that are currently in quarantine who have either had the virus and developed some immunity, and the proportion of the population who have not contracted the virus and have no immunity. Of course the individuals in these groups do not really know their viral history, and hence would not know which state they begin in. The model evaluations explore a range of sensitivity and specificities for the antibody testing. These sensitivity and specificities, along with the capacity for testing, govern the transition of individuals from QR to R (true positive tests), and from QS to S (false positive tests).
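The preconditioning amounts to splitting the quarantined 95% of the population between QS and QR according to the assumed seroprevalence P0, matching the initial-population rows of Tables 3 and 4; a minimal sketch:

```python
# Preconditioning for the immunity-passport runs: 95% of the population starts
# in quarantine, split between QS and QR by the assumed seroprevalence P0, with
# the remaining 5% covering S, I, R and QI (Tables 3 and 4).

def initial_split(population, p0):
    return {
        "S": 0.035 * population,
        "I": 0.01 * population,
        "R": 0.001 * population,
        "QS": 0.95 * (1 - p0) * population,
        "QI": 0.004 * population,
        "QR": 0.95 * p0 * population,
    }

split = initial_split(67_000_000, p0=0.1)  # 10% seroprevalence
```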

Figs 6 and 7 show the results of the model evaluations; the parameters for these runs are shown in Tables 3 and 4. The top row of each figure corresponds to the number of infections in time, the bottom row of each figure is the proportion of the population that are released from quarantine and hence are now in the working population. Maximising this rate of reentry into the population is of course desirable, and it is widely appreciated that some increase in the numbers of infections is unavoidable. The desirable threshold in the trade-off between societal activity and number of infections is open to debate.

Fig 6. A comparison of different antibody test sensitivities σB, with varying levels of seroprevalence (P0).


Top: The number of infected individuals (I + QI population) over one year. Bottom: The proportion of the population that has been released from quarantine (S + I + R population) over one year. Model parameters are shown in Table 3.

Fig 7. A comparison of different antibody test specificities τB shown from left to right, with varying levels of seroprevalence (P0) shown from red to blue.


Top: The number of infected individuals (I + QI population) over one year. Bottom: The proportion of the population that has been released from quarantine (S + I + R population) over one year. Model parameters are shown in Table 4.

Table 3. Fixed parameters used for Fig 6 analysis.

Infection tests were disabled for this analysis.

Model Parameters
σB τB TB CB GB β γ
- 0.9 0.8 2 × 10^5 True 0.32 0.1
Initial Population split
Population S I R QS QI QR
6.7 × 10^7 0.035 0.01 0.001 0.95(1 − P0) 0.004 0.95P0

Table 4. Fixed parameters used for Fig 7 analysis.

Infection tests were disabled for this analysis.

Model Parameters
σB τB TB CB GB β γ
0.9 - 0.8 2 × 10^5 True 0.32 0.1
Initial Population split
Population S I R QS QI QR
6.7 × 10^7 0.035 0.01 0.001 0.95(1 − P0) 0.004 0.95P0

Each of the plots in Figs 6 and 7 show the effect of different seroprevalence in the population. To be clear, this is the proportion of the population that has contracted the virus and recovered but are in quarantine. The analysis has explored a range of seroprevalence from 0.1% to 50%. Fig 6 explores the impact of a variation in sensitivity, from a test with 50% sensitivity to tests with a high sensitivity of 98%.

It can be seen, considering the top row of Fig 6, that the sensitivity of the test has no discernible impact on the number of infections. The seroprevalence entirely dominates. This is possibly counterintuitive, but as was discussed above, even a highly accurate test produces a very large number of false positives when seroprevalence is low. In this case that would mean a large number of people are allowed to re-enter the population, placing them at risk, with a false sense of security that they have immunity.

The bottom row of Fig 6 shows the proportion of the entire population leaving quarantine over a year of employing this policy. At low seroprevalence there is no benefit to better performing tests. This again may seem obscure to many readers. If you consider the highest seroprevalence simulation, where 50% of the population have immunity, higher sensitivity tests are of course effective at identifying those who are immune, and get them back into the community much faster.

A more concerning story can be seen when considering the graphs in Fig 7, where we consider a range of antibody test specificities, going from 50% to 98%. Low specificities (τ < 0.9) lead to extreme second peaks, and could possibly lead to more. This is due to the progressive release of false positives from the quarantined population, which eventually swells the susceptible population to a size where the infection count can resume exponential growth. High specificities avoid this at the cost of a prolonged lockdown, which is naturally limited by the lower false-positive rate. Clearly some means of release beyond immunity passports would be required to avoid this scenario. Notably, a reasonably specific test (τB = 0.9) is capable of restraining a second peak to reasonably low levels regardless of seroprevalence. This may allow for other means of reducing lockdown measures, though with very low seroprevalence this could still be a potentially risky strategy. The dangers of neglecting uncertainties in medical diagnostic testing are pertinent to this decision [47].

Incremental relaxation scenario

Considering the above, some form of incremental relaxation of lockdown seems appropriate. This could take many forms: an incremental restoration of certain activities such as school openings, permission for the reopening of some businesses, the relaxation of stay-at-home messaging, etc. Under the parameterisation chosen for this analysis the model is not sensitive to any particular policy change. We consider a variety of rates of phased relaxations to quarantine. To model these rates we consider a weekly incremental transition rate from QS to S, and QR to R. In Fig 8, three weekly transition rates have been applied: 1%, 5% and 10% of the quarantined population. Whilst in practice the rate is unlikely to be uniform, as decision makers would have the ability to update their timetable as the impact of relaxations becomes apparent, it is useful to illustrate the interaction of testing capacity and release rate.
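The paper drives this weekly release through the antibody-test channel (Table 5: σB = 1, τB = 0, CB = Rate × Population, DB = 7). The sketch below abstracts that to a direct weekly draw; the pro-rata split between QS and QR is our assumption for illustration.

```python
# Weekly incremental release: a fixed percentage of the *initial* quarantined
# population leaves quarantine each week (QS -> S, QR -> R).

def weekly_release(QS, QR, initial_quarantined, rate):
    """Return (QS -> S, QR -> R) flows for one week at the given release rate."""
    release = min(QS + QR, rate * initial_quarantined)
    if QS + QR == 0:
        return 0.0, 0.0
    frac_QS = QS / (QS + QR)  # assumed pro-rata split between the two states
    return release * frac_QS, release * (1 - frac_QS)

# 5% of the initial quarantined population released in one week
d_QS_S, d_QR_R = weekly_release(QS=63_650_000, QR=67_000,
                                initial_quarantined=63_717_000, rate=0.05)
```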

Fig 8. Total active infections each day over the year after relaxing lock-down, under different testing intensities (columns) and various epidemiologic conditions.


The per-day testing capacity is varied across the five columns of graphs. Rate, the percentage of the initial quarantined population being released each week is varied among rows. The prevalence of infections in the tested population is varied among different colours. To facilitate comparison within each column of graphs, the gray curves show the results observed for other Rates and Prevalences with the same testing intensity. Model parameters are shown in Table 5.

The model simulates these rates of transition for a year, with a sensitivity and specificity of 90% for active virus tests. The specifics of all the runs are detailed in Table 5. Fig 8 shows five analyses, with increasing capacity for the active virus tests. In each, the three incremental transition rates are applied with a range of targeting capabilities. The value of 0.8 used previously represents an unrealistically extreme case of effective targeting. The PPV, as discussed above, has a greater dependence on the prevalence (at lower values) in the tested population than it does on the sensitivity of the tests; the same is true of the specificity and the NPV.

Table 5. Fixed parameters used for Fig 8 analysis.

Model Parameters

σA    τA    TA    CA    GA    DA    β     γ
0.9   0.9   -     -     True  1     0.32  0.1

σB    τB    TB    CB                 GB    DB
1     0     0     Rate × Population  True  7

Initial Population split

Population   S      I     R      QS    QI     QR
6.7 × 10⁷    0.034  0.01  0.001  0.95  0.004  0.001

It is important to notice that higher test capacities cause a higher peak of infections at higher release rates. This has a counterintuitive explanation. When the susceptible population rises sharply (i.e., at a high transition rate), the virus rapidly infects a large number of people. When these people recover after around two weeks they become immune and can no longer spread the virus. However, when infection testing is conducted at a higher capacity, up to 150,000 tests per day, some active viral carriers are moved into quarantine, so the peak is slightly delayed. This provides more opportunity for those released from quarantine later to be infected, leading to a higher peak. The process continues until the model reaches effective herd immunity, after which the number of infected in the population decays very quickly. Higher testing capacities therefore delay, but can actually worsen, the peak number of infections.

At a 10% release rate, up to a testing capacity of 150,000 tests per day, these outcomes are insensitive to the prevalence of the disease in the tested population. This analysis indicates that a relatively fast cessation of lockdown measures and stay-home advice would lead to a large resurgence of the virus. Testing capacity of the magnitude stated as the goal of the UK government would not be sufficient to flatten the curve in this scenario.

The 1% release rate scenario indicates that a slow release by itself is sufficient to lower peak infections, but potentially extends the duration of elevated infections. The first graph of the top row in Fig 8 shows that the slow release rate produces a plateau at a significantly lower number of infections than the other release rates. Poorly targeted tests at capacities below 100,000 show similarly consistent levels of infection. However, with a test targeted to a prevalence of 30% or more, the 1% release rate indicates that continuous suppression of the infection may be possible even with 50,000 tests per day.

At the rate of 5% of the population in lockdown released incrementally each week, the infection peak is suppressed compared to the 10% rate, but the number of infections remains around this level for a significantly longer period of time, up to 6 months. Testing has negligible impact below a capacity of 100,000 tests per day. However, with a capacity of 150,000 tests, the duration of elevated infection levels could be reduced if the test is extremely well targeted (TA = 0.7), shortening the necessary wide-scale lockdown. If this level of targeting is not achieved, increasing capacity may again increase peak infections, so care must be taken to ensure a highly targeted testing strategy.
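The qualitative interaction of release rate, capacity and targeting can be reproduced with a deliberately stripped-down discrete-time sketch. This is our own simplification for illustration, not the published implementation: it omits the QR compartment, antibody testing and false positives, and uses the Table 5 parameters only loosely.

```python
def simulate(days=365, weekly_release=0.05, capacity=150_000,
             sens=0.9, test_prev=0.3, beta=0.32, gamma=0.1,
             population=6.7e7):
    """Return the peak active infections (as a population fraction)
    under a phased release from quarantine with imperfect testing."""
    # Initial split loosely following Table 5 (fractions of the population)
    s, i, r, qs, qi = 0.034, 0.01, 0.001, 0.95, 0.004
    daily_release = weekly_release * qs / 7   # fixed fraction of the *initial* QS
    daily_tests = capacity / population       # capacity as a population fraction
    peak = i + qi
    for _ in range(days):
        active = s + i + r
        new_inf = beta * s * i / active if active > 0 else 0.0
        rec_i, rec_qi = gamma * i, gamma * qi
        # True positives found and quarantined: capacity * targeting * sensitivity,
        # capped so the infected compartment cannot go negative.
        found = min(daily_tests * test_prev * sens, i - rec_i)
        released = min(daily_release, qs)
        s, qs = s + released - new_inf, qs - released
        i = i + new_inf - rec_i - found
        qi = qi + found - rec_qi
        r = r + rec_i + rec_qi
        peak = max(peak, i + qi)
    return peak

# Slower release lowers the peak, consistent with the ordering in Fig 8:
peak_slow = simulate(weekly_release=0.01)
peak_fast = simulate(weekly_release=0.10)
```

Even this crude version shows the ordering of the three Fig 8 rows; the published model additionally tracks quarantined-recovered individuals and both test types.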

Conclusions

This analysis does support the assertion that a bad test is potentially worse than no test, but a good test is only effective as part of a carefully designed strategy. More testing is not necessarily better, and overestimation of test accuracy could be extremely detrimental.

This analysis is not a prediction; the numbers used in this analysis are estimates and the SIRQ model used is unlikely to be detailed enough to inform policy decisions. As such, the authors are not drawing firm conclusions about the absolute necessary capacity of tests. Nor do they wish to make specific statements about the necessary sensitivity or specificity of tests or the recommended rate of release from quarantine. The authors do, however, propose some conclusions that would broadly apply when testing and quarantining regimes are used to suppress epidemics, and therefore believe they should be considered by policy makers when designing strategies to tackle COVID-19.

  • Diagnostic uncertainty can have a large effect on the dynamics of an epidemic. Sensitivity, specificity, and testing capacity alone are not sufficient to design effective testing procedures. Policy makers need to be aware of the accuracy of the tests, the prevalence of the disease at increased granularity, and the characteristics of the target population when deciding on testing strategies.

  • Caution should be exercised in the use of antibody testing. Assuming that the prevalence of antibodies is low, it is unlikely that antibody testing at any scale will support the end of lockdown measures, and untargeted antibody screening at the population level could cause more harm than good.

  • Antibody testing with high specificity may be useful on an individual basis; it has scientific value and could reduce risk for key workers. But any belief that these tests would be useful for relaxing lockdown measures for the majority of the population is misguided.

  • Incremental relaxation of lockdown measures, all else being equal, would significantly dampen the increase in peak infections: by one order of magnitude with a faster relaxation, and two orders of magnitude with a slower relaxation.

  • As the prevalence of the disease is suppressed in different regions, small spikes in cases could be the result of false positives. This problem is potentially exacerbated by increased testing in localities responding to small increases in positive tests. Policy decisions that depend on small changes in the number of positive tests may therefore be flawed.

  • For infection screening to be used to relax quarantine measures, the capacity needs to be sufficiently large and well targeted to be effective; this could be achieved, for example, through effective contact tracing. Untargeted mass screening at any capacity would be ineffectual and may prolong the necessary implementation of lockdown measures.
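The false-positive spike noted in the points above is simple arithmetic: at low prevalence, nearly all positives are false. The numbers below are illustrative, not taken from the paper.

```python
def expected_positives(daily_tests, sens, spec, prev):
    """Split a day's expected positive results into true and false positives."""
    tp = daily_tests * prev * sens           # infected people correctly flagged
    fp = daily_tests * (1 - prev) * (1 - spec)  # healthy people incorrectly flagged
    return tp, fp

# 100,000 daily tests with a 90%/90% test at 0.1% prevalence:
tp, fp = expected_positives(100_000, 0.9, 0.9, 0.001)
# 90 true positives are swamped by 9,990 false positives,
# so an apparent "spike" can be pure testing noise.
```

Any policy trigger based on raw positive counts should therefore be corrected for the expected false-positive background at the local testing volume.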

Epidemiological models used for policy making in real time will need to take into account the diagnostic uncertainty of testing, alongside the dynamical behaviour and sensitivity of modelled parameters. An appropriately complex model may need to include quarantining, contact tracing and other surveillance strategies, test availability and targeting, and multiple subpopulations of susceptible, infected and recovered categories.

Data Availability

https://github.com/Institute-for-Risk-and-Uncertainty/SIRQ-imperfect-testing.

Funding Statement

This work has been partially funded through the following grants: UK Engineering and Physical Science Research Council (EPSRC) IAA exploration award, EP/R511729/1 (NG, AW, SF, MDA); EPSRC and Economic and Social Research Council (ESRC) Centre for Doctoral Training in Quantification and Management of Risk and Uncertainty in Complex Systems and Environments, EP/L015927/1 (AW, DC, EMD, AG, VS, EDM); UK Medical Research Council (MRC) "Treatment According to Response in Giant cEll arTeritis (TARGET)", MR/N011775/1 (BUO, LC, SF); and EPSRC programme grant "Digital twins for improved dynamic design", EP/R006768/1 (NG, MDA, SF). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The authors would like to thank EPSRC, ESRC and MRC for their continued support.

References

Decision Letter 0

Jishnu Das

7 Jul 2020

PONE-D-20-13370

"No test is better than a bad test'': Impact of diagnostic uncertainty in mass testing on the spread of COVID-19

PLOS ONE

Dear Dr. Gray,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The manuscript is timely, and the reviewers agreed with the overall conclusions of the manuscript. However, they raised a few key concerns that should be addressed in a revised manuscript.

Please submit your revised manuscript by Aug 21 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Jishnu Das, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please provide  the full details of your models in  your Methods section, and not as Supplementary files; and ensure that all parameters have been described in sufficient detail to meet our reproducibility criteria.

3. Thank you for stating the following in your Competing Interests section: 'NO'

a. Please complete your Competing Interests statement to state any Competing Interests. If you have no competing interests, please state "The authors have declared that no competing interests exist.", as detailed online in our guide for authors at http://journals.plos.org/plosone/s/submit-now

b. This information should be included in your cover letter; we will change the online submission form on your behalf.

Please know it is PLOS ONE policy for corresponding authors to declare, on behalf of all authors, all potential competing interests for the purposes of transparency. PLOS defines a competing interest as anything that interferes with, or could reasonably be perceived as interfering with, the full and objective presentation, peer review, editorial decision-making, or publication of research or non-research articles submitted to one of the journals. Competing interests can be financial or non-financial, professional, or personal. Competing interests can arise in relationship to an organization or another person. Please follow this link to our website for more details on competing interests: http://journals.plos.org/plosone/s/competing-interests

4. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

5. Please ensure that you refer to Figure 3 in your text as, if accepted, production will need this reference to link the reader to the figure.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: N/A

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In this manuscript, the authors set out to explore the important issue of the coupled impact of diagnostic uncertainty, limited testing capacity and various quarantine relaxation strategies on the overall spread of COVID19. Definitions of standard diagnostic quality parameters (e.g. sensitivity, specificity, PPV, NPV) are first reproduced in detail. This is followed by the presentation of a modified simple SIR model including quarantine states. This is then used to model three quarantine relaxation scenarios, using assumed numbers and parameters. Notably, no comparison with real world data is presented. A final conclusion is eventually drawn in favor of slow release of quarantine and targeted use of imperfect tests.

Overall, this manuscript does present some interesting and potentially useful analysis and conclusions. However a significant body of COVID19 epidemiological modeling literature is now available which also considers some of the same questions. This work can thus be significantly improved by placing it better in the context of the existing literature and clearly highlighting the unique aspects of its analysis and conclusions. Some specific comments and suggestions for improvement are below.

Major comments:

1. The title of the article seems to make a provocative claim that could be interpreted as – performing no diagnostic testing at all would be more beneficial than using imperfect diagnostic testing. Given that this is not what is proven by the analysis shown in the rest of the manuscript, this reviewer would strongly recommend modifying the title to drop this rather misleading claim.

2. This article would benefit significantly from placing it in the context of existing COVID19 epidemiological modeling literature. Much more detailed models (Lipsitch et al – DOI: 10.1126/science.1086616; Giordano et al – DOI: 10.1038/s41591-020-0883-7 etc) along with comparisons to real world data are available now. How and why is the arguably more simplistic model presented here still valuable needs to be clearly justified.

3. A critical aspect of almost all current quarantine strategies is isolation not only of those who test positive for the virus but tracing and isolation of their close contacts as well. The effect of this is neglected here. Can the authors comment on how this might affect their conclusions?

4. A large amount of real world data is now available about COVID19. Can the authors test their analysis and conclusions using any of these data sets? The robustness of their conclusions can be significantly improved if even a partial comparison is presented.

5. In lines 280-281, a 10-fold lower viral testing capacity (10,000/day) is used compared to that (100,000/day) used earlier in the article without providing a justification. As of now, it seems the 100,000 number has in fact been surpassed in the UK. Does this change conclusions?

6. An assumed UK context is implicitly used at a number of places in the article. For clarity, these should be explicitly stated – including all numbers of population or parameters used that depend on this context.

Minor comments/suggestions:

1. The amount of detail in which textbook definitions of diagnostic quality parameters is reproduced here – while potentially useful to the lay reader – is not necessary. Citations to relevant texts or other sources can suffice.

2. Similarly, Figure 1 and 2 – to the extent that they are needed at all – would benefit from being converted to a plot showing variation of PPV with prevalence instead.

2. In general, in this reviewer’s opinion, the authors here adopt a more journalistic or colloquial tone in their writing than is usual in scientific literature. A significant fraction of the citations are from popular media as well. All of this only ends up distracting the reader from the scientific content. This is avoidable and can be easily rectified.

Reviewer #2: The developed modified SIR model presents a simple yet powerful model of the dynamics of susceptible, infected and recovered proportions of the population in quarantine or active. Three useful scenarios were tested, and several intuitive or interesting dynamical behaviors were reported. Overall, with some improvements, this manuscript can become more accessible and impactful:

- The availability of the model implementation codes (which platform/language?) would improve the impact by enabling the testing of new strategies for the relaxation of current social distancing measures outside the 3 tested scenarios by the readers. Additionally, use of tables to fully report the parameters that are kept constant in each scenario/plot are required for reproducibility of the results.

- While the temporal plots and the choice of parameters were selected wisely to emphasize the key points of the paper and interesting dynamics, the complex interactions of the key variables of the model call for more rigorous analysis to fully capture the nonlinear dynamics. As hinted in the top left two panels of Figure 6, nonlinear dynamics are observed such as oscillatory and dampening dynamics. These call for additional rigorous analysis such as sensitivity analysis to key parameters of the system in each scenario, or parameter sweeps, phase portraits, or depicting the phenotypic spaces (e.g. key dynamical behaviors in the two-dimensional space of p and tau_B).

- Minor edits/typo:

o Typo in text line 128, PPV=0.95, 0.8 is not correct. (as also noted in Figure caption).

o Although expected from the definition of Bayes formula, I think it would be beneficial to emphasize from the beginning that prevalence means different things in the viral and antibody tests.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Sepideh Dolatshahi


While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Oct 21;15(10):e0240775. doi: 10.1371/journal.pone.0240775.r002

Author response to Decision Letter 0


28 Aug 2020

We would like to thank you, and the reviewers, for considering our manuscript and suggesting revisions among their helpful comments.

All additional journal requirements have been met.

Reviewer #1 - Comment 1. The title of the article seems to make a provocative claim that could be interpreted as – performing no diagnostic testing at all would be more beneficial than using imperfect diagnostic testing. Given that this is not what is proven by the analysis shown in the rest of the manuscript, this reviewer would strongly recommend modifying the title to drop this rather misleading claim.

Reviewer 1 (R1) takes issue with the title of the manuscript, arguing it is provocative and that the claim “no test is better than a bad test” isn’t borne out in the conclusions of the paper. The title of this manuscript is a direct quote from Matt Hancock, UK Secretary of State for Health and Social Care, and the manuscript itself is intended to investigate this statement. That this has been interpreted as a statement by the authors is a mistake on our part, and we have amended the title and the manuscript to make it more clear that this is a contested statement that we are intending to investigate. Broadly, we do suggest that bad tests are potentially counterproductive if they are used to justify the removal of measures that prevent the spread of a disease, and the analysis bears this out. We do not suggest that no testing at all is the ideal approach, and have made this clear in the revision.

Reviewer #1 - Comment 2. This article would benefit significantly from placing it in the context of existing COVID19 epidemiological modeling literature. Much more detailed models (Lipsitch et al – DOI: 10.1126/science.1086616; Giordano et al – DOI: 10.1038/s41591-020-0883-7 etc) along with comparisons to real world data are available now. How and why is the arguably more simplistic model presented here still valuable needs to be clearly justified.

We believe R1 has not fully understood the research questions we are trying to explore, and in this confusion asks for further justification of the simple model we present compared with more detailed SIR models that also include the dynamics of diagnosis and quarantine strategies. R1 suggests two papers, Lipsitch et al. (2003) and Giordano et al. (2020), as having better models than the one we employ; however, neither of these models would be able to answer the question we are trying to answer.

Lipsitch et al. implement quarantine in their model but do not incorporate the effects on the dynamics from imperfect testing, nor do they consider how the quality and scale of an available test affect the spread of a disease. Diagnostic uncertainty plays no part in the model they present. Likewise, Giordano et al. reduce diagnosis to two parameters, ε and θ, which confound test capacity, test targeting, and diagnostic uncertainty. Again, they do not investigate the role that diagnostic uncertainty plays in the spread of a disease. The analysis presented in our manuscript could be considered an in-depth look into these specific parameters using a simpler model than the SIDARTHE model used by Giordano et al. The intent of our paper is not to create a more sophisticated SIR model, but to investigate how diagnostic uncertainty affects the dynamics of an epidemic.

The value of our simple model is in demonstrating the impact of a higher-quality test (i.e. one with greater sensitivity and specificity) with better targeting (prevalence in the latter case). If such an approach could be incorporated into a more sophisticated model such as SIDARTHE this could certainly prove more informative, but the paper’s contribution is the demonstration that diagnostic uncertainty can have significant effect on epidemics. We have added text that more clearly spells out the contributions and its value within the manuscript.

Reviewer #1 - Comment 3. A critical aspect of almost all current quarantine strategies is isolation not only of those who test positive for the virus but tracing and isolation of their close contacts as well. The effect of this is neglected here. Can the authors comment on how this might affect their conclusions?

R1 notes that contact tracing is not directly included within our model and asks us to comment on what effect adding it would have on the conclusions. Although we accept that contact tracing is an important factor in controlling the spread of the disease, it is not something that we have directly included in our model. A good contact tracing strategy would increase the prevalence of infection in the tested population in Figure 7, but studying the issue would require significant changes to the model, because isolating individuals for a period of time requires tracking which of them have been isolated as a result of proximity to an infected individual. This could be modelled without an agent-based model, but would require a separate state in the system to represent 'susceptible-isolated'. This would be a more sophisticated model, and would indeed be an interesting additional factor to consider. But our model is intended to demonstrate the impact of diagnostic uncertainty, and adding this degree of complexity at this point is unlikely to affect the finding that diagnostic uncertainty has a significant effect on disease dynamics. We have added additional text to the paper to explain this.

Reviewer #1 - Comment 4. A large amount of real world data is now available about COVID19. Can the authors test their analysis and conclusions using any of these data sets? The robustness of their conclusions can be significantly improved if even a partial comparison is presented.

R1 also asks whether the model could be updated for comparison with real-world data sets. This is something we explored when developing the model but decided was not required, as the model is not intended to be a true reflection of the dynamics of the COVID-19 epidemic. The model is intended to demonstrate the impact of diagnostic uncertainty on the spread of an epidemic and show that such effects are non-negligible, which we feel it achieves. Additionally, any calibration of the model performed on the data available in April, when the paper was written, is likely to be out of date by now (August), and any calibration performed now would be out of date at the time of the paper's publication. Neither calibration would affect the conclusions of the paper. We have added text when introducing the model, as well as in the conclusions section, to explain this point.

Reviewer #1 - Comment 5. In lines 280-281, a 10-fold lower viral testing capacity (10,000/day) is used compared to that (100,000/day) used earlier in the article without providing a justification. As of now, it seems the 100,000 number has in fact been surpassed in the UK. Does this change conclusions?

Test capacities have been aligned across the different scenarios to aid comparison. The impact of increasing the test capacity on the affected figures is negligible, though naturally a higher rate of testing affects the dynamics quite significantly. The general conclusion from Figure 6 was that sensitivity doesn’t affect the peak infection count for the immunity passports scenario, though it did allow for a more rapid rate of release from quarantine, and this holds with the updated figure though now a second peak is evident in all cases of low prevalence (Prev<0.01). Figure 7 changes more significantly, as now a second peak is evident in more cases. But again, the general conclusion that a low specificity test for ‘immunity passports’ has the potential to exacerbate peak infections unless prevalence is already extraordinarily high still holds, which an increased test capacity only makes more pronounced. The figure still supports great caution in using antibody testing to justify easing lockdown measures.

Reviewer #1 - Comment 6. An assumed UK context is implicitly used at a number of places in the article. For clarity, these should be explicitly stated – including all numbers of population or parameters used that depend on this context.

We believe that the manuscript holds more potential when considered beyond the scope of the UK, and we do agree that there is a strong implicit UK context throughout. We have attempted to relieve this somewhat, and have made assumptions of populations and prevalence etc. more explicit. We have included tables of parameters for each analysis case for reproducibility.

Reviewer #1 - Minor Comment 1. The amount of detail in which textbook definitions of diagnostic quality parameters is reproduced here – while potentially useful to the lay reader – is not necessary. Citations to relevant texts or other sources can suffice.

R1 takes issue with the amount of review material in the paper, something about which we are acutely self-conscious. But readers of the preprint have praised the fact that PPV and NPV are so clearly explained, and we believe this is essential to motivate the implications of the uncertainty about the model parameters. For instance, a journal club at Manchester University including Paul Klapper, Professor of Clinical Virology, strongly lauded the clarity of re-stating the definitions of these terms, which are so important to the intent of the manuscript (https://youtu.be/IvHYzuKZFRs?t=1819). We feel this is a reasonable justification to retain the explanation of these terms, which amounts to less than a page of the text.

Reviewer #1 - Minor Comment 2a. Similarly, Figure 1 and 2 – to the extent that they are needed at all – would benefit from being converted to a plot showing variation of PPV with prevalence instead.

Finally, R1 believes that Figures 1 and 2 would benefit from being converted to a plot showing variation of PPV with prevalence instead. Figures 1 & 2 mirror the pedagogical approach of Gigerenzer in presenting Bayes’ rule in terms of natural frequencies, which are a highly effective means of conveying conditional probabilities (https://doi.org/10.1136/bmj.d6386). A graph could not do this. Again, we appreciate that there is a lot of review information in the manuscript, but we believe it significantly improves the readability of the manuscript and the interpretability of our findings.
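For readers wishing to check the arithmetic behind the natural-frequency presentation, the calculation can be sketched in a few lines. The sensitivity, specificity, and prevalence values below are arbitrary illustrations, not the manuscript's parameters:

```python
# Natural-frequency (Gigerenzer-style) calculation of PPV and NPV.
# All parameter values here are illustrative, not taken from the paper.

def ppv_npv(sensitivity, specificity, prevalence, population=10_000):
    """Compute PPV and NPV by counting outcomes in a notional population."""
    infected = population * prevalence
    healthy = population - infected
    true_pos = infected * sensitivity        # infected who test positive
    false_neg = infected - true_pos          # infected who test negative
    true_neg = healthy * specificity         # healthy who test negative
    false_pos = healthy - true_neg           # healthy who test positive
    ppv = true_pos / (true_pos + false_pos)  # P(infected | positive test)
    npv = true_neg / (true_neg + false_neg)  # P(healthy | negative test)
    return ppv, npv

ppv, npv = ppv_npv(sensitivity=0.95, specificity=0.95, prevalence=0.01)
```

At 1% prevalence, even this nominally accurate test gives a PPV well below 0.5, which is the point the natural-frequency presentation is designed to make vivid.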

Reviewer #1 - Minor Comment 2b. In general, in this reviewer’s opinion, the authors here adopt a more journalistic or colloquial tone in their writing than is usual in scientific literature. A significant fraction of the citations are from popular media as well. All of this only ends up distracting the reader from the scientific content. This is avoidable and can be easily rectified.

We agree that the paper takes a more colloquial tone, and that popular media citations were frequent and distracting. We have taken pains to remove all of these citations other than those which we felt were important for the context of the manuscript. Tonally we feel that the manuscript achieves the desired purpose, and again we may point to the digestibility noted by the Manchester University journal club as supportive of the approach taken.

Reviewer #2 - Comment 1: The availability of the model implementation codes (which platform/language?) would improve the impact by enabling the testing of new strategies for the relaxation of current social distancing measures outside the 3 tested scenarios by the readers. Additionally, use of tables to fully report the parameters that are kept constant in each scenario/plot are required for reproducibility of the results.

R2 also asks us to make the model available to readers of the paper and to show what parameters were used to generate the figures in the paper. We have made the implemented code available via Github (https://github.com/Institute-for-Risk-and-Uncertainty/SIRQ-imperfect-testing) and introduced tables describing the parameter values employed.

Reviewer #2 - Comment 2: While the temporal plots and the choice of parameters were selected wisely to emphasize the key points of the paper and interesting dynamics, the complex interactions of the key variables of the model call for more rigorous analysis to fully capture the nonlinear dynamics. As hinted in the top left two panels of Figure 6, nonlinear dynamics are observed such as oscillatory and dampening dynamics. These call for additional rigorous analysis such as sensitivity analysis to key parameters of the system in each scenario, or parameter sweeps, phase portraits, or depicting the phenotypic spaces (e.g. key dynamical behaviors in the two-dimensional space of p and tau_B).

Reviewer 2, Sepideh Dolatshahi (R2), thought the model in the paper was “simple yet powerful”, and she asks for more analysis to be performed on the model to fully capture the non-linear dynamics shown, as well as a sensitivity analysis to explore the relationships between key parameters. This is a very interesting point, and analysis of this aspect of epidemiological dynamics would certainly be a very fruitful area of research when considering future strategies and modelling. However, the intent of this paper was to demonstrate that diagnostic uncertainty can have a significant impact on disease dynamics when used to justify quarantine and release strategies. Although rigorous analysis of the dynamics of this model is not within the scope of the current paper, we have added text in the conclusions mentioning the need for dynamical analysis and sensitivity analysis in future modelling.
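A parameter sweep of the kind R2 suggests could, in outline, integrate the model over a grid of parameter values and record a summary statistic such as peak infections. The sketch below uses a plain SIR model with Euler integration as a simplified stand-in; it is not the authors' SIRQ implementation, and all parameter values are illustrative:

```python
# Illustrative parameter sweep over a plain SIR model -- a simplified
# stand-in for the manuscript's SIRQ model, with arbitrary parameters.

def sir_peak(beta, gamma=0.1, i0=1e-4, dt=0.1, days=365):
    """Euler-integrate SIR dynamics and return the peak infected fraction."""
    s, i = 1.0 - i0, i0
    peak = i
    for _ in range(int(days / dt)):
        ds = -beta * s * i            # susceptibles infected
        di = beta * s * i - gamma * i # infections minus recoveries
        s += ds * dt
        i += di * dt
        peak = max(peak, i)
    return peak

# Sweep the transmission rate beta; peak infections grow with beta.
peaks = {round(b, 2): sir_peak(b) for b in (0.15, 0.25, 0.35)}
```

Extending such a sweep to two dimensions (e.g. over test specificity and release rate) would produce the kind of phenotypic-space plot R2 describes.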

Reviewer #2 - Minor Comment 1: Typo in text line 128, PPV=0.95, 0.8 is not correct. (as also noted in Figure caption).

This has been resolved; the figures were indeed incorrect.

Reviewer #2 - Minor Comment 2: Although expected from the definition of Bayes formula, I think it would be beneficial to emphasize from the beginning that prevalence means different things in the viral and antibody tests.

We agree that this is the cause of some confusion. We have altered the terminology to use ‘seroprevalence’ when referring to the presence of antibodies in the population.

We thank the reviewers for their thoughtful comments. We feel the changes made to respond to their suggestions have significantly improved the manuscript, which we are pleased to resubmit for your consideration.

Kind regards

Nicholas Gray, et al.

Attachment

Submitted filename: Response to Reviewers.pdf

Decision Letter 1

Jishnu Das

5 Oct 2020

Is "No test is better than a bad test''? Impact of diagnostic uncertainty in mass testing on the spread of COVID-19

PONE-D-20-13370R1

Dear Dr. Gray,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Jishnu Das, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Please fix the typo noted by Reviewer 2. The figures are fine as is - Figs 1/2 and 3/4 do not need to be combined.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: N/A

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: Previous minor Comment 1 (Typo in text line 128, PPV=0.95, and not 0.8) was acknowledged, although the text was incorrect, not the figure. However, it was not fixed (now line 110).

Other than this, the authors have addressed my comments.

Minor comment: Maybe this is an editorial decision, but in my opinion the flow of the manuscript can benefit from combining Figures 1 and 2 (New Fig. 1 A,B) and combining Figures 3 and 4 (New Fig. 2A,B).

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Sepideh Dolatshahi

Acceptance letter

Jishnu Das

12 Oct 2020

PONE-D-20-13370R1

Is “no test is better than a bad test”? Impact of diagnostic uncertainty in mass testing on the spread of COVID-19

Dear Dr. Gray:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Jishnu Das

Academic Editor

PLOS ONE

