Philosophical Transactions of the Royal Society B: Biological Sciences. 2019 May 20;374(1776):20180262. doi: 10.1098/rstb.2018.0262

Translating surveillance data into incidence estimates

Y Bourhis 1, T Gottwald 2, F van den Bosch 1,3
PMCID: PMC6558556  PMID: 31104599

Abstract

Monitoring a population for a disease requires the hosts to be sampled and tested for the pathogen. This results in sampling series from which we may estimate the disease incidence, i.e. the proportion of hosts infected. Existing estimation methods assume that disease incidence does not change between monitoring rounds, resulting in an underestimation of the disease incidence. In this paper, we develop an incidence estimation model accounting for epidemic growth with monitoring rounds that sample varying incidence. We also show how to accommodate the asymptomatic period that is characteristic of most diseases. For practical use, we produce an approximation of the model, which is subsequently shown to be accurate for relevant epidemic and sampling parameters. Both the approximation and the full model are applied to stochastic spatial simulations of epidemics. The results demonstrate their consistency for a very wide range of situations. The estimation model is made available as an online application.

This article is part of the theme issue ‘Modelling infectious disease outbreaks in humans, animals and plants: epidemic forecasting and control’. This theme issue is linked with the earlier issue ‘Modelling infectious disease outbreaks in humans, animals and plants: approaches and important themes’.

Keywords: disease surveillance, sampling theory, spatial epidemiology

1. Introduction

Monitoring programmes are used to keep track of the invasion and spread of human, animal and plant pathogens. They are often structured in discrete rounds of inspection, during which subsamples of the host population are assessed for disease status [1]. Given a sequence of monitoring rounds, a key question in interpreting these data is the estimation of the incidence (see endnote 1) of the disease in the host population. There are two special cases of this general question that have received some attention.

Firstly, monitoring is often motivated by the need for early responses to enable eradication or containment. For example, early detection of the disease permits reduced culling of animal and plant hosts [2–4], as well as limited deployments of emergency quarantines or travel restrictions for human hosts (applied e.g. for SARS [5]). Secondly, monitoring is frequently motivated by the desire to prove disease absence from a host population [6], which is important for the transport and trade of hosts. The main question then concerns the sufficient sample size [7]. An example of this is the practical ‘rule of three’ [8,9]. It gives the upper bound of the 95% confidence interval (CI) of the incidence when all of the N sampled hosts are assessed as healthy: Q95 = 3/(N + 1). Estimating disease incidence (denoted q hereafter), or proving its absence, is of most interest during the early stages of epidemics, i.e. when incidences are low and containment measures are still promising.
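As a quick numerical illustration of the rule of three, the sketch below compares 3/(N + 1) with the exact upper quantile obtained when no sampled host is positive. It assumes a uniform prior on the incidence, in which case the posterior density is proportional to (1 − q)^N; the function names and the sample sizes are illustrative choices, not part of the original study.

```python
def rule_of_three(n_samples: int) -> float:
    """Rule-of-three upper bound of the 95% CI when all samples are negative."""
    return 3.0 / (n_samples + 1)


def exact_upper_bound(n_samples: int, level: float = 0.95) -> float:
    """Exact posterior quantile for zero positives (assumes a uniform prior).

    With M = 0 the posterior density is proportional to (1 - q)**N, so its
    cumulative distribution is 1 - (1 - q)**(N + 1), which can be inverted.
    """
    return 1.0 - (1.0 - level) ** (1.0 / (n_samples + 1))


# Illustrative sample sizes only.
for n in (30, 100, 300):
    print(n, round(rule_of_three(n), 4), round(exact_upper_bound(n), 4))
```

Under these assumptions the rule of three sits slightly above the exact quantile, so it errs on the conservative side.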

Simple practices like the ‘rule of three’ make the assumption that the samples are independent binomial draws with probability q and size N. However, as a pathogen spreads across the surveyed population, our samples will carry dependencies on the underlying epidemic process. For example, by pooling all the samples together, we neglect the fact that early monitoring rounds have most likely sampled a lower incidence q than the current one, resulting in an underestimation of the incidence. An alternative and unbiased solution is to estimate q only from the last round to date. But obviously, such a poor use of data would only be tolerable in cases where the monitoring interval and epidemic growth rate are both very large, so that the previous monitoring rounds can be deemed uninformative. The temporal dependence of samples has been addressed by Metz et al. [10] in the design of appropriate monitoring programmes, as well as by Parnell et al. [11] and Bourhis et al. [12] for the estimation of the disease incidence after the disease’s first discovery or before its discovery (disease absence), respectively.

We propose here a generalized solution to the incidence estimation problem. Making use of all monitoring data, it applies to the previously addressed first discovery and disease absence cases but extends to any monitoring outcome. Building on the simple logistic equation, our estimation model accounts for the progression of the disease during the monitoring period. Following the idea of the rule of three, and similar to Parnell et al. [13] and Alonso Chavez et al. [14], we also produce an approximation of this model. Its derivation only requires simple algebraic operations, which makes it more suitable for practitioners than the full estimation model. Both the estimation model and its approximation are then tested against epidemic simulations: first with non-spatial and deterministic simulations, and secondly with spatially explicit and stochastic simulations of epidemics running on contrasting distributions of hosts. The results support the accuracy and practical usefulness of the estimation model, which has subsequently been made available as an online software application (https://yo-b.shinyapps.io/incidence-estimation/).

2. Material and methods

Monitoring a population for a disease results in sampling series as shown in table 1. We define K as the number of monitoring rounds iterated in time. Nk is the sampling size of monitoring round k, i.e. the number of hosts whose pathological status is assessed at time tk. Mk is the number of infected hosts detected during round k. Finally, Δk is the time interval between monitoring rounds k and k + 1.

Table 1.

Variables and structure of a sampling series.

monitoring round  | 1  | 2  | … | k  | … | K − 1 | K
no. of samples    | N1 | N2 | … | Nk | … | NK−1  | NK
no. of positives  | M1 | M2 | … | Mk | … | MK−1  | MK
time interval     | Δ1 | Δ2 | … | Δk | … | ΔK−1  |

(a). One monitoring round

Considering q the disease incidence in the population, the probability of M positive observations out of a sample of size N is given by the binomial probability distribution

$$P(M \mid q; N) = \binom{N}{M} (1 - q)^{N - M} q^{M}. \tag{2.1}$$

A more general form, accounting for the occurrences of false positives and negatives in the detection process, is

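$$P(M \mid q; N) = \binom{N}{M} \left[(1 - q)(1 - \theta_{fp}) + q\,\theta_{fn}\right]^{N - M} \left[(1 - q)\,\theta_{fp} + q\,(1 - \theta_{fn})\right]^{M}, \tag{2.2}$$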

where θfn and θfp are, respectively, the rates of false negatives and false positives [15]. For simplicity, the following developments do not explicitly incorporate those rates, which are nonetheless part of the estimation model provided in the application.

In a practical context, q is the variable that we want to estimate from samples characterized by their size N and their outcome M. To this end, we use Bayes’ rule:

$$P(q \mid M; N) = \frac{P(q)\,P(M \mid q; N)}{\int_0^1 P(q)\,P(M \mid q; N)\,\mathrm{d}q}, \tag{2.3}$$

where P(q|M; N) is the probability distribution of q given M and N. Assuming no information on the incidence before sampling, we set a uniform prior P(q), resulting in P(q|M; N) ∝ P(M|q; N) [16].
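The single-round posterior can be evaluated numerically on a regular grid of incidence values. The sketch below is one minimal way to do so in plain Python, assuming a uniform prior as in the text; the function names, the grid size and the example sample (no positives out of 100) are illustrative assumptions rather than the implementation used in the online application.

```python
import math


def binomial_likelihood(q, m, n, fn_rate=0.0, fp_rate=0.0):
    """Equation (2.2); reduces to equation (2.1) when both error rates are zero."""
    p_pos = (1.0 - q) * fp_rate + q * (1.0 - fn_rate)  # P(a sampled host tests positive)
    return math.comb(n, m) * (1.0 - p_pos) ** (n - m) * p_pos ** m


def posterior_on_grid(m, n, grid_size=2001, **rates):
    """Normalized posterior P(q | M; N) under a uniform prior, on a regular grid."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    weights = [binomial_likelihood(q, m, n, **rates) for q in grid]
    total = sum(weights)
    return grid, [w / total for w in weights]


def upper_quantile(grid, probs, level=0.95):
    """Smallest grid value whose cumulative posterior mass reaches `level`."""
    cumulative = 0.0
    for q, p in zip(grid, probs):
        cumulative += p
        if cumulative >= level:
            return q
    return grid[-1]


# Illustrative sample: 100 hosts tested, none positive, perfect detection.
grid, probs = posterior_on_grid(m=0, n=100)
print(upper_quantile(grid, probs))
```

With zero positives and perfect detection, the resulting quantile should land close to the rule-of-three value 3/(N + 1).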

(b). K monitoring rounds

To account properly for the dynamic incidence between monitoring rounds, we inform the binomial probability distribution with an epidemiological component Zk (as in [17], whose maximum-likelihood approach is equivalent to our Bayesian formulation with flat priors). Zk gives the relation between qK, the incidence to estimate and qk, the incidence at sampling time tk, as qk = ZkqK. Hence,

$$P(\mathbf{M} \mid q_K; \mathbf{N}) = \prod_{k=1}^{K} \binom{N_k}{M_k} (1 - Z_k q_K)^{N_k - M_k} (Z_k q_K)^{M_k}, \tag{2.4}$$

where M and N on the left-hand side represent the whole sampling series, i.e. M1, M2, … , MK and N1, N2, … , NK.

We assume that the disease incidence, q, evolves logistically [18,19] between times tk and tK:

$$q_K = \frac{q_k\, e^{r (t_K - t_k)}}{1 + q_k \left(e^{r (t_K - t_k)} - 1\right)}, \tag{2.5}$$

where r is the epidemic growth rate, while qk and qK, respectively, stand for q(tk) and q(tK). For simplicity, we hereafter express time relative to tK = 0, the time of both the last sampling round and the estimation. Hence, we define Zk as:

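$$Z_k = \frac{q(t_k)}{q(t_K)} = \frac{q_K\, e^{r t_k}}{1 + q_K \left(e^{r t_k} - 1\right)} \Big/ q_K = \frac{e^{r t_k}}{1 + q_K \left(e^{r t_k} - 1\right)}, \tag{2.6}$$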

where tk < 0 as tK = 0.

Similarly to the case of one monitoring round, we use Bayes’ rule to get the unnormalized posterior distribution P(qK|M; N). Practically, it is given by equation (2.4), which is computed for a discretized array of q ∈ [0, 1], and from which quantiles QX can be derived (a method called grid approximation, see e.g. [20]).
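The grid approximation of the multi-round posterior can be sketched as follows. This is a minimal plain-Python illustration of equations (2.4) and (2.6) under a flat prior, not the code of the online application; the function names, the grid size and the example series are assumptions made for the illustration. Likelihoods are accumulated on the log scale to avoid numerical underflow with large samples.

```python
import math


def logistic_z(q_final, r, t_k):
    """Equation (2.6): ratio q(t_k) / q(t_K), with t_K = 0 and t_k <= 0."""
    growth = math.exp(r * t_k)
    return growth / (1.0 + q_final * (growth - 1.0))


def log_likelihood(q_final, positives, samples, times, r):
    """Logarithm of equation (2.4), dropping the constant binomial coefficients."""
    total = 0.0
    for m_k, n_k, t_k in zip(positives, samples, times):
        q_k = logistic_z(q_final, r, t_k) * q_final  # incidence at sampling time t_k
        if m_k > 0:
            total += m_k * math.log(q_k) if q_k > 0 else -math.inf
        if n_k > m_k:
            total += (n_k - m_k) * math.log1p(-q_k) if q_k < 1 else -math.inf
    return total


def upper_quantile(positives, samples, intervals, r, level=0.95, grid_size=2001):
    """Grid approximation of the posterior quantile Q_X of the final incidence."""
    # Sampling times relative to t_K = 0: t_k = -(Delta_k + ... + Delta_{K-1}).
    times = [-sum(intervals[k:]) for k in range(len(samples))]
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    log_w = [log_likelihood(q, positives, samples, times, r) for q in grid]
    peak = max(log_w)
    weights = [math.exp(lw - peak) for lw in log_w]  # rescaled before exponentiation
    target = level * sum(weights)
    cumulative = 0.0
    for q, w in zip(grid, weights):
        cumulative += w
        if cumulative >= target:
            return q
    return grid[-1]


# Hypothetical series: three rounds of 100 samples, 30 days apart, no detection.
print(upper_quantile([0, 0, 0], [100, 100, 100], [30, 30], r=0.02))
```

Setting r = 0 in this sketch recovers the pooled binomial estimate, which gives a lower upper bound than the same series analysed with r > 0.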

(c). A useful approximation

The upper bound of the CI is a useful measure of the highest, still likely, incidence we can expect in the population given the outcome of a monitoring programme. We propose in this section an approximation of this quantity not requiring the derivation of the full probability distribution of P(qK|M; N). Various methods exist for approximating the CI of a binomial parameter [21]. After preliminary testing of those methods against the binomial-shaped probability density given by equation (2.4), we choose the Agresti–Coull interval for its accuracy for low incidences [22]. Therefore, the approximated upper limit of the X% CI is defined as

$$\tilde{Q}_X = \min\left(1,\; \tilde{p} + z \sqrt{\max\left(0,\; \frac{\tilde{p}}{N + z^2}\,(1 - \tilde{p})\right)}\right), \tag{2.7}$$

where

$$\tilde{p} = \frac{1}{N + z^2}\left(M + \frac{z^2}{2}\right), \tag{2.8}$$

and where z is the corresponding quantile of the standard normal distribution: the 1 − α/2 quantile for a two-sided interval, or the 1 − α quantile for a one-sided one (with α the probability of type I error). For the one-sided 95% CI, we derive Q~95 by setting z = 1.645.

The approximated Q~X also needs to account for the epidemic growth. As previously with Zk, we now define Z~k to quantify the disease evolution between rounds. In this case, we are unable to use the logistic model because its non-linearity makes the derivation of Q~X intractable. This, however, was no concern for the full model and the grid approximation method used to derive P(qK|M; N). Consequently, approximating the logistic growth model by its exponential variant, Z~k is given by

$$\tilde{Z}_k = e^{r t_k} = \exp\left(-r \sum_{i=k}^{K-1} \Delta_i\right). \tag{2.9}$$

In practice, the exponential assumption is realistic as, during early infection, the epidemic growth is exponential, even according to the logistic model [19]. Finally, we aggregate the samples together with respect to the epidemic growth via Z~k:

$$M = \sum_{k=1}^{K} M_k \quad \text{and} \quad N = \sum_{k=1}^{K} N_k \tilde{Z}_k. \tag{2.10}$$

These aggregated values of M and N are then substituted in equations (2.8) and (2.7) to derive Q~X. Scaling only Nk with Z~k has two effects: (1) the historic sampling rounds k contribute less than the recent ones to the reduction of the uncertainty (reduced sample size N); and (2) the sampling rounds k that include detection events (Mk > 0) see their contribution to Q~X increased (larger M/N), hence accounting for the putative spread of the disease from those Mk infected hosts between times tk and tK. The min and max operators in equation (2.7) are added to deal with the possibility of having N < M for some values of Z~k.
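The approximation only involves a few arithmetic steps, which the following sketch makes explicit. It is a minimal illustration of equations (2.7) to (2.10), assuming that the sampling times are rebuilt from the intervals with tK = 0; the function name and the example series are hypothetical.

```python
import math


def approx_upper_bound(positives, samples, intervals, r, z=1.645):
    """Approximated upper bound of the CI, following equations (2.7)-(2.10)."""
    # Sampling times relative to t_K = 0: t_k = -(Delta_k + ... + Delta_{K-1}).
    times = [-sum(intervals[k:]) for k in range(len(samples))]
    # Exponential weights of equation (2.9): older rounds weigh less.
    weights = [math.exp(r * t_k) for t_k in times]
    # Aggregation of equation (2.10): only the sample sizes are scaled.
    m_total = sum(positives)
    n_eff = sum(n_k * w for n_k, w in zip(samples, weights))
    # Agresti-Coull centre and half-width, equations (2.8) and (2.7).
    p_tilde = (m_total + z ** 2 / 2.0) / (n_eff + z ** 2)
    spread = max(0.0, p_tilde * (1.0 - p_tilde) / (n_eff + z ** 2))
    return min(1.0, p_tilde + z * math.sqrt(spread))


# Hypothetical series: three rounds of 100 samples, 30 days apart, no detection.
print(approx_upper_bound([0, 0, 0], [100, 100, 100], [30, 30], r=0.02))
```

For a single fully negative round of 100 samples, this sketch returns a value of about 0.032, close to the rule-of-three bound 3/(N + 1) ≈ 0.030.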

As mentioned in the introduction, this estimation model and its approximation cover the two specific contexts of first discovery and disease absence addressed, respectively, by Parnell et al. [13] and Bourhis et al. [12] (see electronic supplementary material for details). Their strength is, however, that they extend to any sampling series, no matter its outcome Mk and regularity in sampling size or frequency.

(d). Asymptomatic period

For most diseases, infected hosts develop symptoms after an asymptomatic (or incubation) period [23]. Often, asymptomatic hosts contribute to the epidemic dynamics by spreading the disease while still undetectable (cryptic) when sampled. The logistic equation handles this period, noted σ, as in Alonso Chavez et al. [14]:

$$q_T(t_K) = \frac{q(t_k)\, e^{r (t_K - t_k + \sigma)}}{1 + q(t_k) \left(e^{r (t_K - t_k + \sigma)} - 1\right)}. \tag{2.11}$$

This relates the total incidence qT at the last sampling round tK (i.e. the quantity to estimate) to the detectable incidences at the different sampling times q(tk) (i.e. the sampled quantities). Hence, Zk becomes:

$$Z_k = \frac{q(t_k)}{q_T(t_K)} = \frac{e^{r (t_k - \sigma)}}{1 + q_T(t_K) \left(e^{r (t_k - \sigma)} - 1\right)}. \tag{2.12}$$

For the exponential approximation, equation (2.9) simply becomes

$$\tilde{Z}_k = e^{r (t_k - \sigma)}. \tag{2.13}$$
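The effect of the asymptomatic period on the weights can be checked numerically. The sketch below implements equations (2.11) to (2.13) with tK = 0 and verifies that equations (2.11) and (2.12) are inverses of each other; the function names and parameter values are illustrative assumptions.

```python
import math


def detectable_ratio(q_total, r, t_k, sigma):
    """Equation (2.12): detectable incidence at t_k over total incidence at t_K = 0."""
    growth = math.exp(r * (t_k - sigma))
    return growth / (1.0 + q_total * (growth - 1.0))


def detectable_ratio_exp(r, t_k, sigma):
    """Equation (2.13): exponential approximation of the same ratio."""
    return math.exp(r * (t_k - sigma))


def total_incidence(q_detectable, r, t_k, sigma, t_final=0.0):
    """Equation (2.11): total incidence at t_K from the detectable incidence at t_k."""
    growth = math.exp(r * (t_final - t_k + sigma))
    return q_detectable * growth / (1.0 + q_detectable * (growth - 1.0))


# Illustrative values: 10% total incidence, r = 0.02 per day, sampling 30 days
# before the last round, asymptomatic period of 30 days.
q_total, r, t_k, sigma = 0.10, 0.02, -30.0, 30.0
q_detectable = detectable_ratio(q_total, r, t_k, sigma) * q_total
print(q_detectable, total_incidence(q_detectable, r, t_k, sigma))
```

A longer asymptomatic period lowers the ratio, so each round counts for less in the effective sample size, in the same way as a longer sampling interval does.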

(e). Testing the model

The consistency of the full model and the accuracy of its approximation are first tested against simulations of stochastic sampling on non-spatial logistic epidemics. We consider a uniform distribution of incidences qT that we want to estimate individually. For each one of them, an epidemic is simulated until incidence qT is reached and a monitoring programme is designed with Nk and Δk drawn from Poisson distributions of mean N¯ and Δ¯, respectively. From the logistic equation (equation (2.5)), the detectable incidence q is derived for every sampling date tk. Then binomial draws with probability p = q(tk) and size n = Nk simulate the sampling process of the hosts, resulting in Mk. For every qT, an exact upper bound of its CI, QX, is derived with the full model, while an approximated one, Q~X, is derived with the approximation. To test our model, we check that the upper limits of the X% CI are above qT in X% of cases. This test is done for contrasted values of the sampling (N¯ and Δ¯) and epidemic parameters (r and σ).
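A reduced version of this coverage test can be sketched in a few lines. The sketch below is a simplification and not the protocol used for the figures: it keeps Nk and Δk fixed instead of drawing them from Poisson distributions, restricts qT to low incidences (below 0.25), and only evaluates the approximation. All names, the seed and the parameter values are assumptions made for the illustration.

```python
import math
import random


def simulate_series(q_total, r, sigma, n_per_round, interval, n_rounds, rng):
    """Binomial sampling of a logistic epidemic that reaches q_total at t_K = 0."""
    positives = []
    for k in range(n_rounds):
        t_k = -interval * (n_rounds - 1 - k)
        growth = math.exp(r * (t_k - sigma))
        # Detectable incidence at t_k, i.e. Z_k * q_T with Z_k from equation (2.12).
        q_k = q_total * growth / (1.0 + q_total * (growth - 1.0))
        positives.append(sum(rng.random() < q_k for _ in range(n_per_round)))
    return positives


def approx_upper_bound(positives, r, sigma, n_per_round, interval, z=1.645):
    """Approximated upper bound, equations (2.7), (2.8), (2.10) and (2.13)."""
    n_rounds = len(positives)
    n_eff = sum(
        n_per_round * math.exp(r * (-interval * (n_rounds - 1 - k) - sigma))
        for k in range(n_rounds)
    )
    p_tilde = (sum(positives) + z ** 2 / 2.0) / (n_eff + z ** 2)
    spread = max(0.0, p_tilde * (1.0 - p_tilde) / (n_eff + z ** 2))
    return min(1.0, p_tilde + z * math.sqrt(spread))


def coverage(n_trials=500, r=0.02, sigma=30.0, n_per_round=100, interval=30.0,
             n_rounds=5, q_max=0.25, seed=1):
    """Fraction of simulated series whose upper bound lies above the true q_T."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        q_total = rng.uniform(0.0, q_max)
        positives = simulate_series(q_total, r, sigma, n_per_round, interval,
                                    n_rounds, rng)
        bound = approx_upper_bound(positives, r, sigma, n_per_round, interval)
        hits += bound >= q_total
    return hits / n_trials


print(coverage())
```

If the approximation behaves as described in the text, the printed fraction should be close to or above 0.95 for the one-sided 95% bound.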

The full model and its approximation are also tested against spatially explicit and stochastic epidemic simulations (as in [24]). In this case, the epidemics are no longer modelled with the logistic equation but through a transmission rate and a dispersal kernel of the pathogens. To this end, the hosts are distributed in a two-dimensional space and aggregated randomly in field-like structures mimicking the distribution of the trees in an orchard. Details of this landscape model are given as electronic supplementary material. Transmission is governed by an exponential power kernel [25]. The probability of a susceptible individual becoming infected in a unit of time is then given by

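$$p(s \in S) = \frac{\beta\, b\, A}{2 \pi \theta^2\, \Gamma(2/b)} \sum_{i \in I} \exp\left(-\frac{|x_i - x_s|^{b}}{\theta^{b}}\right), \tag{2.14}$$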

where s is a susceptible host among the set of all susceptible hosts S. Similarly, i and I represent the infected hosts. A is the area occupied by one host and Γ is the gamma function. β is the transmission rate, θ is the dispersal scale and b is a shape parameter (producing fat-tailed kernels for b < 1). The coordinates x mark the locations of the hosts. Following Klein et al. [26], the mean dispersal distance for this two-dimensional kernel is given by:

$$\delta = \theta\, \frac{\Gamma(3/b)}{\Gamma(2/b)}. \tag{2.15}$$

These epidemics are simulated with the τ-leap version of the Gillespie stochastic simulation algorithm (see e.g. [27]). The estimation model and its approximation are evaluated in the same way as the non-spatial case.
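The kernel quantities of equations (2.14) and (2.15) are straightforward to evaluate. The sketch below computes the right-hand side of equation (2.14) for one susceptible host and the mean dispersal distance; it does not implement the τ-leap simulation itself. The function names, the host coordinates and the unit host area are hypothetical, and only β = 75 and b = 0.45 are taken from the values reported for the spatial experiments.

```python
import math


def infection_pressure(susceptible_xy, infected_xy, beta, theta, b, area):
    """Right-hand side of equation (2.14) for one susceptible host."""
    norm = beta * b * area / (2.0 * math.pi * theta ** 2 * math.gamma(2.0 / b))
    total = 0.0
    for x_i, y_i in infected_xy:
        distance = math.hypot(x_i - susceptible_xy[0], y_i - susceptible_xy[1])
        total += math.exp(-((distance / theta) ** b))
    return norm * total


def mean_dispersal_distance(theta, b):
    """Equation (2.15): mean dispersal distance of the two-dimensional kernel."""
    return theta * math.gamma(3.0 / b) / math.gamma(2.0 / b)


# Hypothetical layout: one susceptible host and two infected hosts (metres).
infected = [(10.0, 0.0), (0.0, 50.0)]
print(infection_pressure((0.0, 0.0), infected, beta=75.0, theta=5.0, b=0.45, area=1.0))
print(mean_dispersal_distance(theta=5.0, b=0.45))
```

For b = 1 the kernel is a plain exponential and the mean dispersal distance reduces to 2θ, which gives a convenient sanity check.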

3. Results

(a). Model behaviours

Figure 1 illustrates the effects of the epidemic and sampling parameters on the resulting probability distributions of the incidence and upper quantiles Q95. Increasing Mk, the number of detected infected hosts in the sample, unsurprisingly increases the estimated incidence. Increasing the sample size Nk reduces the uncertainty in the estimates. Increasing the sampling interval Δ decreases the impact of the historic samples on the estimation. This reflects the fact that samples taken further back in time are less informative of current disease incidence. As for the epidemic parameters, the growth rate r and the asymptomatic period σ (not shown in figure 1 for dimensional reasons) have very similar effects to Δ. Increasing them increases the estimated incidence by decreasing the impact of the historic samples (which are the ones sampling lower incidences q). Increasing any of the parameters Δ, r or σ also reduces the effective sample size (i.e. $\sum_{k=1}^{K} N_k Z_k$), which increases the uncertainty on the estimates (i.e. producing probability distributions with larger variance).

Figure 1.


Probability distributions of the incidence q given by equations (2.4) and (2.3). The vertical lines mark Q95, the upper limit of the 95% CI. The distributions result from a sampling series composed of K = 3 monitoring rounds, of which the first two are fully negative (i.e. M1 = M2 = 0) and the last varies from M3 = 0 (i.e. all sampled hosts are negative) to M3 = N (i.e. all sampled hosts are positive). These probability distributions are represented for varying values of epidemic growth rate r, sampling size N and sampling interval Δ. (Online version in colour.)

(b). Test against logistic epidemics

Figure 2 shows the distribution of the exact and approximated upper bounds of the 95% CI, Q95 and Q~95, for uniform distributions of qT and different values of the epidemic and sampling parameters. The full model, which like the simulations builds on the logistic equation, behaves exactly as expected: it ensures that 95% of the Q95 are above their respective qT, for every set of parameters tested. The approximation, on the other hand, behaves differently because of its underlying exponential growth model. For the low incidences that are relevant to practice (say qT < 0.25), the approximation is accurate (the distributions of Q95 and Q~95 overlap). For higher incidences, i.e. when the logistic growth decelerates unlike the exponential growth, the approximation overestimates the incidence (increasingly with r, σ and Δ).

Figure 2.


Estimation of Q95 and Q~95 from sampling series of non-spatial epidemics, i.e. simulated with the logistic equation (equation (2.5)). These estimations are made for contrasted values of sampling and epidemic parameters (and for K = 5 monitoring rounds). Using here the 95% CI, we expect 95% of the estimated Q95 and Q~95 to be above the actual incidence in the field at the end of monitoring qT, i.e. above the oblique black line. The inserted text summarizes these scores for the full model (in red) and its approximation (in blue). (Online version in colour.)

Another model behaviour of particular interest occurs when r and σ are large (see the rightmost column of figure 2). We observe that the estimated Q95 and Q~95 do not align well with the diagonal for small incidences qT. For those cases of very hazardous pathogens with high epidemic growth rates and long asymptomatic periods, the sampling size N is too small to allow discrimination between the non-detection cases (i.e. the ones for which all the Mk = 0), and a larger sampling effort is needed for the estimation to be informative.

Although increasing r and σ accelerates the divergence between the logistic and the exponential curves, the approximation appears accurate for early infections even considering very high values of epidemic parameters such as r = 0.1 day⁻¹ or σ = 100 days.

(c). Test against spatial epidemics

When locating the hosts in space, the epidemic becomes driven by two new elements: the dispersal range of the pathogen and the intensity of host clustering [28]. Both determine how easily the pathogen spreads across the landscape or remains restricted to a local group of hosts. Random distributions of hosts and long dispersal ranges result in smooth progressions of the pathogen across the landscape, following a logistic-like curve. However, as the dispersal range decreases and host aggregation increases, the simulated epidemics will tend to include interruptions between periods of seemingly logistic growth within host clusters. Questions then arise regarding the performance of our estimation model on such epidemics.

The estimation model and its approximation are tested for varying host aggregations and dispersal ranges. Host aggregation is summarized by μ, the number of hosts in a field (sensu host cluster). For a given landscape-scale population of hosts, more hosts per field means fewer but more populated fields (see the electronic supplementary material for an illustration). The dispersal scale θ is translated in terms of mean dispersal distance δ (see equation (2.15)), while μ is translated in terms of d¯, a landscape metric measuring the mean minimal distance between the fields within a landscape (see Euclidean nearest distance in [29]).

Similar to figure 2, figure 3 shows the performance of the model and its approximation for gradients of dispersal scales θ (columns) and host aggregations μ (rows). For each parameter set θ and μ (i.e. each panel in figure 3), 50 epidemics are first simulated for 50 different landscapes in order to identify the value of r that produces the best fitting logistic curve (with least squares). This r then informs the incidence estimation model and its approximation for the subsequent testing set of 2000 epidemics and landscapes. Most of figure 3 agrees with expectations: the estimated Q95 align neatly above the diagonal, showing in practice the accuracy of the estimation model. The approximation appears to be a good simplification of the full model for early detection. However, the estimation model also produces overestimations of the incidence, specifically in the bottom row and left column (i.e. where the dots do not align above the diagonal). These are cases of epidemics for which the distance between host clusters (quantified by d¯) is too large for the pathogen dispersal range (quantified by δ), hence producing unsteady progressions of the pathogens across the landscapes. This illustrates that our model is of limited interest in such cases where d¯/δ ≳ 0.5.
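The least-squares identification of r can be illustrated with a simple grid search. The sketch below is an assumption about how such a fit could be set up, not the procedure used to produce figure 3: it fits a single incidence curve with a known initial incidence q0, whereas the study fits r on 50 simulated epidemics per parameter set. All names and values are hypothetical.

```python
import math


def logistic_curve(t, q0, r):
    """Logistic incidence at time t, starting from incidence q0 at t = 0."""
    growth = math.exp(r * t)
    return q0 * growth / (1.0 + q0 * (growth - 1.0))


def fit_growth_rate(times, incidences, q0, r_grid):
    """Value of r on the grid that minimizes the sum of squared errors."""
    def sum_of_squares(r):
        return sum((logistic_curve(t, q0, r) - q) ** 2
                   for t, q in zip(times, incidences))
    return min(r_grid, key=sum_of_squares)


# Synthetic check: data generated with r = 0.03 per day should return r = 0.03.
times = list(range(0, 300, 10))
observed = [logistic_curve(t, 0.001, 0.03) for t in times]
candidates = [i / 1000.0 for i in range(1, 101)]
print(fit_growth_rate(times, observed, q0=0.001, r_grid=candidates))
```

A grid search is used here only for transparency; any one-dimensional optimizer would serve the same purpose.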

Figure 3.


Estimation of Q95 and Q~95 from sampling series realized on spatially explicit epidemics, i.e. simulated with the dispersal kernel (equation (2.14)). These estimations are made for varying dispersal ranges θ and host aggregations μ, while maintaining constant values of the non-spatial parameters (N = 100, Δ = 30, σ = 30, K = 5, as well as β = 75 and b = 0.45 for the remaining kernel parameters). For better understanding, θ and μ are shown with their distance translation in metres, δ and d¯. The identified logistic growth rate r is given for each experiment. The resulting distributions of Q95 and Q~95 are qualitatively similar for other realistic values of the fixed parameters. (Online version in colour.)

We note also that p(qT < Q95) can be below the 95% expectation. This results from the fact that the stochasticity of the simulations scatters the realized epidemic curves symmetrically around the fitted logistic one (whose identified parameter r is subsequently used by the estimation model). This is no concern in practice where the epidemic parameters are taken conservatively from previous observations of similar outbreaks (e.g. highest observed values of r or σ). Here we choose central estimates (through least squares) for illustrative purposes. Nonetheless, parameter uncertainty can be accounted for in the online application assuming that r and σ can be described with normal distributions. On how to deal with epidemic parameter uncertainty, see Neri et al. [30] and Hyatt-Twynam et al. [24].

4. Discussion

The model developed in this paper is suitable for many monitoring designs, including those with irregular sampling sizes and time intervals between rounds. The model weights the monitoring outcomes according to an estimate of the population incidence at their respective sampling time, before aggregating them into a single binomial-shaped probability distribution of the incidence. The quantiles of this distribution have practical interests for policy makers. The model is directly applicable for situations in which surveillance does not depend on the self-reporting of symptomatic hosts, which makes it appropriate for most animal and plant species. Our model is also appropriate for certain monitoring schemes aimed at pathogens of humans, for example, visitations of rural villages to find Ebola infections where access to healthcare is limited [31,32].

Calculating the probability density of the incidence from the sampling series is computationally inexpensive, but still requires technical proficiency. Therefore, we have produced an online application interfacing the full model as exhaustively as possible, as well as an approximation of the model that can be derived with simple algebraic operations (https://yo-b.shinyapps.io/incidence-estimation/). Our intention is to equip the widest audience of practitioners with this incidence estimation capability. The approximation is as flexible as the original model, and we have shown that its inaccuracies are restricted to high levels of incidences that are less relevant when dealing with emerging epidemics. However, in case such high incidence estimation is needed, we have seen that the approximation is conservative, i.e. biased towards an overestimation of disease progress (which is not always acceptable, since it might lead to overzealous control, see e.g. [33]).

The model relies on the simple and deterministic logistic equation. That it is consistent with more complex systems is not obvious. The tests presented here against spatial and stochastic simulations of epidemics show that our non-spatial model is robust to the significant deviations from the logistic equation, products of both spatiality and stochasticity. The model gives accurate estimates of the disease incidence for most simulated epidemics considered here. However, for highly aggregated host distributions and short distance dispersing pathogens, the deviation from the logistic equation can be too great. In those contexts, the disease progression across the landscape is not steady but punctuated by rare events: the pathogen jumps between distant host clusters. Then, the very distinctive trajectories this epidemic can take do not simplify well into a single logistic curve. In such cases, reduced pathogen dispersal and increased host aggregation result in habitat fragmentation for the pathogen. The estimation should then be attempted on individual clusters, or a multiscale approach considered (as in [34,35]).

From plants to animals, the major shift regarding epidemics lies in individual movement. In many cases, this can be overlooked as it does not necessarily imply movement of the sampling units (e.g. herds/farms in [36]). When sampling individuals, however, our model is applicable to well-mixed populations, i.e. where the pathogen spread is steady and not too impacted by spatio-temporal structure in the host population. We saw the limits of this assumption in figure 3, where highly clustered distributions of hosts cause significant deviations from model predictions. Such deviations may, for example, be increased if clustering correlates with heterogeneous susceptibility of hosts (e.g. age-related aggregation like schools), or attenuated by mutable clusters (e.g. commuting).

Recent technological innovations are changing epidemiological surveillance for more timely and exhaustive censuses. For example, the monitoring of human epidemics is already augmented by the supervision of social networks [37] and internet search queries [38,39]. Tree monitoring could also be assisted by satellite high-resolution imagery [40,41]. Those innovations will still need robust and epidemiologically informed estimation methods and, even if monitoring is conducted continuously, there is no reason to see them incompatible with an adaptation of our model. However, in any foreseeable future, most contagions will still be monitored through discrete and censored inspections and hence remain within the immediate scope of the estimation model presented here.

Supplementary Material

Supplementary information
rstb20180262supp1.pdf (236.1KB, pdf)

Acknowledgements

We are grateful to Francisco Lopez-Ruiz for the idea to make the model available as an application. We are also grateful to Robin Thompson and three anonymous reviewers whose time and efforts greatly improved the manuscript.

Endnote

1. We use here the plant pathology definition where incidence is the fraction of host units infected. In human and other animal pathology, this is termed prevalence.

Data accessibility

The estimation model is available at https://gitlab.com/Yo-B/estimationApp.

Competing interests

We declare we have no competing interests.

Funding

The work at Rothamsted forms part of the Smart Crop Protection (SCP) strategic programme (BBS/OS/CP/000001) funded through the Biotechnology and Biological Sciences Research Council (BBSRC) Industrial Strategy Challenge Fund and the BBSRC Newton Fund project 'Real time deployment of pathogen resistance genes in rice', grant no. BB/N01362X/1. The authors are also thankful to the US Department of Agriculture (USDA) for funding support.

References

  • 1.Parnell S, van den Bosch F, Gottwald T, Gilligan CA. 2017. Surveillance to inform control of emerging plant diseases: an epidemiological perspective. Annu. Rev. Phytopathol. 55, 591–610. ( 10.1146/annurev-phyto-080516-035334) [DOI] [PubMed] [Google Scholar]
  • 2.Carpenter TE, O’Brien JM, Hagerman AD, McCarl BA. 2011. Epidemic and economic impacts of delayed detection of foot-and-mouth disease: a case study of a simulated outbreak in California. J. Vet. Diagn. Invest. 23, 26–33. ( 10.1177/104063871102300104) [DOI] [PubMed] [Google Scholar]
  • 3.Cunniffe NJ, Cobb RC, Meentemeyer RK, Rizzo DM, Gilligan CA. 2016. Modeling when, where, and how to manage a forest epidemic, motivated by sudden oak death in California. Proc. Natl Acad. Sci. USA 113, 5640–5645. ( 10.1073/pnas.1602153113) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Cunniffe NJ, Stutt ROJH, DeSimone RE, Gottwald TR, Gilligan CA. 2015. Optimising and communicating options for the control of invasive plant disease when there is epidemiological uncertainty. PLoS Comput. Biol. 11, e1004211 ( 10.1371/journal.pcbi.1004211) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Smith RD. 2006. Responding to global infectious disease outbreaks: lessons from SARS on the role of risk perception, communication and management. Soc. Sci. Med. 63, 3113–3123. ( 10.1016/j.socscimed.2006.08.004) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Caporale V, Giovannini A, Zepeda C. 2012. Surveillance strategies for foot and mouth disease to prove absence of disease and absence of viral circulation. Sci. Tech. Rev. 31, 747–759. ( 10.20506/rst.issue.31.3.51) [DOI] [PubMed] [Google Scholar]
  • 7.Cannon RM. 2002. Demonstrating disease freedom – combining confidence levels. Prevent. Vet. Med. 52, 227–249. ( 10.1016/S0167-5877(01)00262-8) [DOI] [PubMed] [Google Scholar]
  • 8.Hanley JA, Lippman-Hand A. 1983. If nothing goes wrong, is everything all right? Interpreting zero numerators. JAMA 249, 1743–1745. ( 10.1001/jama.1983.03330370053031) [DOI] [PubMed] [Google Scholar]
  • 9.Louis TA. 1981. Confidence intervals for a binomial parameter after observing no successes. Am. Stat. 35, 154–154 ( 10.2307/2683985) [DOI] [Google Scholar]
  • 10.Metz JAJ, Wedel M, Angulo AF. 1983. Discovering an epidemic before it has reached a certain level of prevalence. Biometrics 39, 765–770. ( 10.2307/2531106) [DOI] [PubMed] [Google Scholar]
  • 11.Parnell S, Gottwald TR, Cunniffe NJ, Alonso Chavez V, van den Bosch F. 2015. Early detection surveillance for an emerging plant pathogen: a rule of thumb to predict prevalence at first discovery. Proc. R. Soc. B 282, 20151478 ( 10.1098/rspb.2015.1478) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Bourhis Y, Gottwald TR, Lopez-Ruiz FJ, Patarapuwadol S, van den Bosch F. 2018. Sampling for disease absence – deriving informed monitoring from epidemic traits. J. Theor. Biol. 416, 8–16. ( 10.1016/j.jtbi.2018.10.038) [DOI] [PubMed] [Google Scholar]
  • 13. Parnell S, Gottwald T, Gilks W, van den Bosch F. 2012. Estimating the incidence of an epidemic when it is first discovered and the design of early detection monitoring. J. Theor. Biol. 305, 30–36. (doi:10.1016/j.jtbi.2012.03.009)
  • 14. Alonso Chavez V, Parnell S, van den Bosch F. 2016. Monitoring invasive pathogens in plant nurseries for early-detection and to minimise the probability of escape. J. Theor. Biol. 407, 290–302. (doi:10.1016/j.jtbi.2016.07.041)
  • 15. Cameron AR, Baldock FC. 1998. A new probability formula for surveys to substantiate freedom from disease. Prevent. Vet. Med. 34, 1–17. (doi:10.1016/S0167-5877(97)00081-0)
  • 16. Gelman A, Carlin JB, Stern HS, Rubin DB. 2003. Bayesian data analysis, 2nd edn. Boca Raton, FL: CRC Press.
  • 17. Hamelin FM, Bisson A, Desprez-Loustau M-L, Fabre F, Mailleret L. 2016. Temporal niche differentiation of parasites sharing the same plant host: oak powdery mildew as a case study. Ecosphere 7, e01517. (doi:10.1002/ecs2.1517)
  • 18. Murray JD. 2002. Mathematical biology: I. An introduction. Interdisciplinary applied mathematics, 3rd edn. New York, NY: Springer-Verlag.
  • 19. van der Plank JE. 1963. Plant diseases: epidemics and control. New York, NY: Academic Press.
  • 20. Kruschke J. 2014. Doing Bayesian data analysis: a tutorial with R, JAGS, and Stan. New York, NY: Academic Press.
  • 21. Wallis S. 2013. Binomial confidence intervals and contingency tests: mathematical fundamentals and the evaluation of alternative methods. J. Quant. Linguist. 20, 178–208. (doi:10.1080/09296174.2013.799918)
  • 22. Agresti A, Coull BA. 1998. Approximate is better than ‘exact’ for interval estimation of binomial proportions. Am. Stat. 52, 119–126. (doi:10.1080/00031305.1998.10480550)
  • 23. Thompson RN, Gilligan CA, Cunniffe NJ. 2016. Detecting presymptomatic infection is necessary to forecast major epidemics in the earliest stages of infectious disease outbreaks. PLoS Comput. Biol. 12, e1004836. (doi:10.1371/journal.pcbi.1004836)
  • 24. Hyatt-Twynam SR, Parnell S, Stutt ROJH, Gottwald TR, Gilligan CA, Cunniffe NJ. 2017. Risk-based management of invading plant disease. New Phytol. 214, 1317–1329. (doi:10.1111/nph.14488)
  • 25. Rieux A, Soubeyrand S, Bonnot F, Klein EK, Ngando JE, Mehl A, Ravigne V, Carlier J, Bellaire LDLD. 2014. Long-distance wind-dispersal of spores in a fungal plant pathogen: estimation of anisotropic dispersal kernels from an extensive field experiment. PLOS ONE 9, e103225. (doi:10.1371/journal.pone.0103225)
  • 26. Klein EK, Lavigne C, Gouyon P-H. 2006. Mixing of propagules from discrete sources at long distance: comparing a dispersal tail to an exponential. BMC Ecol. 6, 3. (doi:10.1186/1472-6785-6-3)
  • 27. Keeling MJ, Rohani P. 2008. Modeling infectious diseases in humans and animals. Princeton, NJ: Princeton University Press.
  • 28. Brown DH, Bolker BM. 2004. The effects of disease dispersal and host clustering on the epidemic threshold in plants. Bull. Math. Biol. 66, 341–371. (doi:10.1016/j.bulm.2003.08.006)
  • 29. Leitao AB, Miller J, Ahern J, McGarigal K. 2006. Measuring landscapes: a planner’s handbook. Washington, DC: Island Press.
  • 30. Neri FM, Cook AR, Gibson GJ, Gottwald TR, Gilligan CA. 2014. Bayesian analysis for inference of an emerging epidemic: citrus canker in urban landscapes. PLoS Comput. Biol. 10, e1003587. (doi:10.1371/journal.pcbi.1003587)
  • 31. Namukose E et al. 2018. Active case finding for improved Ebola virus disease case detection in Nimba County, Liberia, 2014/2015: lessons learned. Adv. Public Health 2018, 6753519. (doi:10.1155/2018/6753519)
  • 32. Thompson RN, Morgan OW, Jalava K. 2019. Rigorous surveillance is necessary for high confidence in end-of-outbreak declarations for Ebola and other infectious diseases. Phil. Trans. R. Soc. B 374, 20180431. (doi:10.1098/rstb.2018.0431)
  • 33. Thompson RN, Gilligan CA, Cunniffe NJ. 2018. Control fast or control smart: when should invading pathogens be controlled? PLoS Comput. Biol. 14, e1006014. (doi:10.1371/journal.pcbi.1006014)
  • 34. Cameron AR, Baldock FC. 1998. Two-stage sampling in surveys to substantiate freedom from disease. Prevent. Vet. Med. 34, 19–30. (doi:10.1016/S0167-5877(97)00073-1)
  • 35. Coulston JW, Koch FH, Smith WD, Sapio FJ. 2008. Invasive forest pest surveillance: survey development and reliability. Can. J. Forest. Res. 38, 2422–2433. (doi:10.1139/X08-076)
  • 36. Bates TW, Thurmond MC, Carpenter TE. 2003. Description of an epidemic simulation model for use in evaluating strategies to control an outbreak of foot-and-mouth disease. Am. J. Vet. Res. 64, 195–204. (doi:10.2460/ajvr.2003.64.issue-2)
  • 37. Chen L, Hossain KSMT, Butler P, Ramakrishnan N, Prakash BA. 2015. Flu gone viral: syndromic surveillance of flu on Twitter using temporal topic models. In Proc. 2014 IEEE Int. Conf. Data Mining, Shenzhen, PR China, 14–17 December 2014, pp. 755–760. Piscataway, NJ: IEEE. (doi:10.1109/ICDM.2014.137)
  • 38. Yang S, Santillana M, Kou SC. 2015. Accurate estimation of influenza epidemics using Google search data via ARGO. Proc. Natl Acad. Sci. USA 112, 14 473–14 478. (doi:10.1073/pnas.1515373112)
  • 39. Yuan Q, Nsoesie EO, Lv B, Peng G, Chunara R, Brownstein JS. 2013. Monitoring influenza epidemics in China with search query from Baidu. PLOS ONE 8, e64323. (doi:10.1371/journal.pone.0064323)
  • 40. Li H, Lee WS, Wang K, Ehsani R, Yang C. 2014. ‘Extended spectral angle mapping (ESAM)’ for citrus greening disease detection using airborne hyperspectral imaging. Precision Agric. 15, 162–183. (doi:10.1007/s11119-013-9325-6)
  • 41. Salgadoe A, Robson A, Lamb D, Dann E, Searle C. 2018. Quantifying the severity of phytophthora root rot disease in avocado trees using image analysis. Remote Sens. 10, 226. (doi:10.3390/rs10020226)

Associated Data

Supplementary Materials

Supplementary information
rstb20180262supp1.pdf (236.1 KB, PDF)

Data Availability Statement

The estimation model is available at https://gitlab.com/Yo-B/estimationApp.

