
This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2021 Apr 27:2021.04.23.21255958. [Version 2] doi: 10.1101/2021.04.23.21255958

Estimation of local time-varying reproduction numbers in noisy surveillance data

Wenrui Li 1,*, Katia Bulekova 2, Brian Gregor 2, Laura F White 3, Eric D Kolaczyk 1,4
PMCID: PMC8095231  PMID: 33948612

Abstract

A valuable metric in understanding infectious disease local dynamics is the local time-varying reproduction number, i.e. the expected number of secondary local cases caused by each infected individual. Accurate estimation of this quantity requires distinguishing cases arising from local transmission from those imported from elsewhere. Realistically, we can expect identification of cases as local or imported to be imperfect. We study the propagation of such errors in estimation of the local time-varying reproduction number. In addition, we propose a Bayesian framework for estimation of the true local time-varying reproduction number when identification errors exist. And we illustrate the practical performance of our estimator through simulation studies and with outbreaks of COVID-19 in Hong Kong and Victoria, Australia.

Introduction

Epidemic modeling, while not at all new, has taken on renewed importance due to the COVID-19 pandemic. The local time-varying reproduction number, R*local(t), is an important quantity for monitoring the infectiousness and transmissibility of a disease and, therefore, for designing and adjusting public health responses during an outbreak. Recent examples include monitoring transmission of COVID-19 and demonstrating the efficacy of non-pharmaceutical interventions in more than 100 countries [1–4]. The value of R*local(t) represents the expected number of secondary local cases arising from a primary case infected at time t. Different formal definitions of R*local(t) have been proposed, and a number of methods are available to estimate this quantity. The most widely used is an estimator of the instantaneous reproduction number, defined as the ratio of the expected number of incident locally infected cases at time t to the expected total infectiousness of infected individuals at time t [5, 6].

Distinguishing local cases from imported cases is essential to estimation of the local time-varying reproduction number. However, surveillance data are generally subject to some level of error. For example, if the correct source of infection cannot be identified from contact tracing or genetic information, imported cases might be misclassified as local cases, and vice versa. Such misclassification error is recognized as one limitation of estimating R*local(t) in the COVID-19 outbreak [7, 8]. We investigate how identification error affects the estimation of the instantaneous reproduction number and, thus, our understanding of disease transmission dynamics.

Extensive work has been done on improving inference of time-varying reproduction numbers. For instance, there have been efforts to estimate the serial interval that is used to compute the total infectiousness for R*local(t) estimation, including Bayesian parametric estimation using data augmentation Markov chain Monte Carlo [9], and a cure model for limited follow-up data [10]. Many studies have explored the effects of imperfect detection and estimated the true infection prevalence [8, 11–13]. But, to the best of our knowledge, little attention has been given to date to accounting for identification errors of local and imported cases.

Our contribution in this paper is to quantify how such errors propagate to the local time-varying reproduction number, and to provide estimators for R*local(t) when contact tracing survey information is available. Adopting the definition of R*local(t) proposed by [5], we characterize the impact of identification errors on the bias of noisy local time-varying reproduction numbers. Our work shows that, in general, the bias can be expected to be nontrivial. Accordingly, we propose a Bayesian framework to estimate the true local time-varying reproduction number. Numerical simulation suggests that high accuracy is possible for estimating local time-varying reproduction numbers in outbreaks of even modest size. We illustrate the practical use of our estimators in the context of the COVID-19 pandemic in Hong Kong and Victoria, Australia.

The organization of this paper is as follows. In the Methods section, we show the bias of the noisy local time-varying reproduction number, and propose a Bayesian hierarchical framework to estimate the true local time-varying reproduction number with imperfect knowledge. The Results section reports the practical performance of our estimators through simulation studies and with SARS-CoV-2 infections in Hong Kong and Victoria, Australia. Finally, we conclude in the Discussion section with future directions for this work.

Methods

In this section, we first quantify the bias of the noisy local time-varying reproduction number when misidentification occurs in the surveillance data. We then build a Bayesian hierarchical framework to estimate true local time-varying reproduction numbers. We also propose a method to estimate misidentification rates based on contact tracing survey data, which informs the prior distribution in the model.

Notation

We provide essential notation and background here. The number of newly infected cases at time t, I*(t), is the sum of the numbers of local (I*local (t)) and imported (I*imported (t)) cases. If one assumes independence between calendar time and the generation interval, g(s), then the local time-varying reproduction number is defined as [5]

$$R^{*}_{\text{local}}(t) = \frac{\mu^{*}_{\text{local}}(t)}{\int_{0}^{\infty} g(s)\,\mu^{*}(t-s)\,ds}, \tag{1}$$

where μ*local (t)=E[I*local (t)] and μ*(t)=E[I*(t)].

In reality, we only know the serial interval and the number of diagnosed cases. Let I(t), Ilocal(t) and Iimported(t) be the numbers of total diagnosed cases, local diagnosed cases, and imported diagnosed cases at time t, respectively. Then, we define a realistic local time-varying reproduction number as

$$R_{\text{local}}(t) = \frac{\mu_{\text{local}}(t)}{\int_{0}^{\infty} w(s)\,\mu(t-s)\,ds}, \tag{2}$$

where w(s) is the serial interval, μlocal (t)=E[Ilocal (t)] and μ(t)=E[I(t)]. Note that the serial interval corresponds to date of symptom onset. One can estimate symptom onset dates by back calculation of report dates [14].

Realistically, we can expect identification of cases as local or imported to be imperfect. Let Ĩlocal(t) and Ĩimported(t) be the number of new local and imported cases reported at time t, with identification error. Thus, we define a noisy local time-varying reproduction number as

$$\tilde{R}_{\text{local}}(t) = \frac{\tilde{\mu}_{\text{local}}(t)}{\int_{0}^{\infty} w(s)\,\mu(t-s)\,ds}, \tag{3}$$

where μ˜local (t)=E[I˜local (t)]. The definition of R˜local (t) in (3) comes from an argument that mimics the original argument using Poisson arrivals in [15]. Specifically, we suppose that we observe a Poisson stream I˜local (t) that is a function of calendar time t in terms of the transmissibility, denoted β˜local (t,s), an arbitrary function of calendar time t and time since infection s. Then, μ˜local (t) follows the so-called renewal equation

$$\tilde{\mu}_{\text{local}}(t) = \int_{0}^{\infty} \tilde{\beta}_{\text{local}}(t,s)\,\mu(t-s)\,ds. \tag{4}$$

Following [15], we have

$$\tilde{\beta}_{\text{local}}(t,s) = \tilde{R}_{\text{local}}(t)\,w(s). \tag{5}$$

Inserting (5) into (4) yields the definition of R˜local (t) in (3).

Our interest is in characterizing the manner in which the uncertainty in Ĩlocal(t) and Ĩimported(t) propagates to the local time-varying reproduction number, and providing estimators of Rlocal(t) to account for identification errors.

Bias of the noisy local time-varying reproduction number

We quantify the bias of the noisy local time-varying reproduction number in (3) when misidentification occurs. We begin by defining a model for Ĩlocal(t) and Ĩimported(t). Let α0 denote the probability that an imported case is misidentified as local, and α1 the probability that a local case is misidentified as imported. Then, a simple model is

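$$\begin{aligned}
\tilde{I}_{\text{local}}(t) \mid I_{\text{local}}(t), I_{\text{imported}}(t), \alpha_0, \alpha_1 &\sim \mathrm{Bin}\big(I_{\text{local}}(t),\, 1-\alpha_1\big) + \mathrm{Bin}\big(I_{\text{imported}}(t),\, \alpha_0\big),\\
\tilde{I}_{\text{imported}}(t) &= I_{\text{local}}(t) + I_{\text{imported}}(t) - \tilde{I}_{\text{local}}(t).
\end{aligned} \tag{6}$$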

Under independence, the first relationship in (6) is directly obtained by the definition of α0 and α1. And the second equation in (6) is due to the fact that the total number of cases reported at time t is not affected by the misidentification.
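As an illustration of model (6), the following minimal Python sketch draws noisy counts for a single day. The function name `add_identification_error` and the example counts are hypothetical and not taken from the paper; each binomial draw is built from independent Bernoulli trials using only the standard library.

```python
import random


def add_identification_error(i_local, i_imported, alpha0, alpha1, rng):
    """Draw noisy (local, imported) counts following model (6).

    alpha0: probability that an imported case is labelled local
    alpha1: probability that a local case is labelled imported
    """
    # Local cases that keep the correct label: Bin(i_local, 1 - alpha1)
    kept_local = sum(rng.random() < 1.0 - alpha1 for _ in range(i_local))
    # Imported cases wrongly labelled local: Bin(i_imported, alpha0)
    flipped_imported = sum(rng.random() < alpha0 for _ in range(i_imported))
    noisy_local = kept_local + flipped_imported
    # Relabelling does not change the total number of reported cases
    noisy_imported = i_local + i_imported - noisy_local
    return noisy_local, noisy_imported


rng = random.Random(0)  # fixed seed so the sketch is reproducible
noisy_local, noisy_imported = add_identification_error(40, 60, 0.1, 0.2, rng)
print(noisy_local, noisy_imported)
```

With α0 = α1 = 0 the function returns the true counts unchanged, which is a quick sanity check on the construction.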

By (6), the relationship between μ˜local (t) and μlocal(t) is

$$\tilde{\mu}_{\text{local}}(t) = (1-\alpha_1)\,\mu_{\text{local}}(t) + \alpha_0\,\mu_{\text{imported}}(t), \tag{7}$$

where μimported (t)=E(Iimported (t)). Direct computation yields

$$\tilde{R}_{\text{local}}(t) = \left(1 - \alpha_1 + \alpha_0\,\frac{\mu_{\text{imported}}(t)}{\mu_{\text{local}}(t)}\right) R_{\text{local}}(t) \tag{8}$$

when μlocal(t) ≠ 0. From (8), we can see that the bias of R̃local(t) depends on α0, α1 and the ratio of μimported(t) to μlocal(t). When μimported(t)/μlocal(t) = 1, we have R̃local(t) > Rlocal(t) if α0 > α1, and R̃local(t) < Rlocal(t) if α0 < α1. More generally, R̃local(t) > Rlocal(t) exactly when α0 μimported(t) > α1 μlocal(t).
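The multiplicative factor in (8) is easy to evaluate numerically. The short Python sketch below computes it for a few illustrative settings; the function name `noisy_to_true_ratio` and the chosen rates are assumptions made for this example only.

```python
def noisy_to_true_ratio(alpha0, alpha1, imported_to_local):
    """Factor in (8): the ratio of the noisy to the true local reproduction number.

    imported_to_local is mu_imported(t) / mu_local(t).
    """
    return 1.0 - alpha1 + alpha0 * imported_to_local


# Equal expected numbers of imported and local cases
under = noisy_to_true_ratio(alpha0=0.10, alpha1=0.33, imported_to_local=1.0)
over = noisy_to_true_ratio(alpha0=0.33, alpha1=0.10, imported_to_local=1.0)

# Twice as many imported as local cases amplifies the effect of alpha0
over_more_imports = noisy_to_true_ratio(alpha0=0.33, alpha1=0.10, imported_to_local=2.0)

print(round(under, 2), round(over, 2), round(over_more_imports, 2))
```

A factor below 1 means the noisy quantity understates Rlocal(t), and a factor above 1 means it overstates it.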

Bayesian hierarchical modeling to account for misidentification

We propose a Bayesian framework to estimate Rlocal(t) using noisy surveillance data. Following [5, 6, 15], we specify

$$I_{\text{local}}(t) \mid R_{\text{local}}(t), n(t-1), w(s) \sim \mathrm{Pois}\big(R_{\text{local}}(t)\,\Lambda(t)\big), \quad \text{for } t > 0, \tag{9}$$

where $\Lambda(t) = \sum_{s=1}^{t} w(s)\,I(t-s)$ is the total infectiousness of infected individuals at time t, and n(t−1) represents the historical data up to time t−1 (i.e., Ilocal(0), Iimported(0), · · ·, Ilocal(t − 1), Iimported(t − 1)). Note that Λ(t) is undefined for t = 0. So, we assume that

$$I_{\text{local}}(0) \mid \mu_{\text{local}}(0) \sim \mathrm{Pois}\big(\mu_{\text{local}}(0)\big). \tag{10}$$

And we assume the imported case counts follow a Poisson distribution:

$$I_{\text{imported}}(t) \mid \mu_{\text{imported}}(t) \sim \mathrm{Pois}\big(\mu_{\text{imported}}(t)\big). \tag{11}$$
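The total infectiousness Λ(t) that enters (9) is a discrete convolution of past incidence with the serial interval. A minimal Python sketch follows; the function name, the toy incidence series and the toy serial interval weights are hypothetical, and lags beyond the supplied weights are assumed to carry zero mass.

```python
def total_infectiousness(incidence, w):
    """Return Lambda(t) = sum_{s=1}^{t} w(s) * I(t - s) for each t.

    incidence[t] is the total count I(t); w[s] is the serial interval
    mass at lag s, with w[0] unused. Lambda(0) is undefined, so the
    first entry of the result is None.
    """
    lam = [None]
    for t in range(1, len(incidence)):
        max_lag = min(t, len(w) - 1)  # lags past the end of w contribute nothing
        lam.append(sum(w[s] * incidence[t - s] for s in range(1, max_lag + 1)))
    return lam


incidence = [1, 2, 4, 8]      # toy total daily counts I(0), ..., I(3)
w = [0.0, 0.5, 0.3, 0.2]      # toy serial interval over lags 1..3
lam = total_infectiousness(incidence, w)
print(lam)
```

Here Λ(3), for example, is 0.5·4 + 0.3·2 + 0.2·1, i.e. approximately 2.8.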

Next, we define relevant prior distributions. We assume a distribution for Rlocal(t) of the form

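$$R_{\text{local}}(t) \mid n(t-1), w(s) \sim \mathrm{Gamma}\big(a^{\text{local}}_{t \mid t-1},\, b^{\text{local}}_{t \mid t-1}\big), \quad \text{for } t > 0. \tag{12}$$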

This choice is similar to that in [5], but differs in that we specify a gamma distribution conditioned on the history, rather than marginally. The conditioning reflects the expectation that the evolution of Rlocal(t) is likely to depend on the course of infection in the population and on any intervention measures that result. Analogously, we also assume gamma distributed priors for μimported(t) and μlocal(0), that is,

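$$\mu_{\text{imported}}(t) \sim \mathrm{Gamma}\big(a^{\text{imported}}_{t},\, b^{\text{imported}}_{t}\big), \qquad \mu_{\text{local}}(0) \sim \mathrm{Gamma}\big(a^{\text{local}}_{0},\, b^{\text{local}}_{0}\big). \tag{13}$$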

In addition, we assume the convention that the misidentification rates are beta distributed, and hence given by

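$$\alpha_0 \sim \mathrm{Beta}\big(\zeta_{\alpha_0},\, \xi_{\alpha_0}\big), \qquad \alpha_1 \sim \mathrm{Beta}\big(\zeta_{\alpha_1},\, \xi_{\alpha_1}\big). \tag{14}$$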

Using Markov chain Monte Carlo (MCMC) simulation, we obtain both estimates of Rlocal(t) and their uncertainty. We implement MCMC using the R package NIMBLE [16–18] with the default assignment of sampler algorithms. The samplers assigned to the variables are as follows: Gibbs samplers are assigned to μlocal(0) and μimported(t), t ≥ 0, which have conjugate relationships between their prior distribution and the distributions of their stochastic dependents; slice samplers [19] are used for Ilocal(t) and Iimported(t), t ≥ 0; Metropolis-Hastings adaptive random-walk samplers are assigned to α0, α1 and Rlocal(t), t > 0.
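The conjugate relationship that makes a Gibbs step available for μimported(t) can be sketched in a few lines of Python. This is only an illustration of gamma-Poisson conjugacy, not the NIMBLE implementation; it assumes the gamma prior is parameterized by shape and rate, and the function name and numerical values are hypothetical.

```python
import random


def gibbs_draw_mu(count, shape, rate, rng):
    """One Gibbs draw of a Poisson mean under a Gamma(shape, rate) prior.

    If count ~ Pois(mu) and mu ~ Gamma(shape, rate), then the full
    conditional of mu is Gamma(shape + count, rate + 1).
    """
    post_shape = shape + count
    post_rate = rate + 1.0
    # random.gammavariate expects (shape, scale), and scale = 1 / rate
    draw = rng.gammavariate(post_shape, 1.0 / post_rate)
    return draw, post_shape, post_rate


rng = random.Random(1)
draw, post_shape, post_rate = gibbs_draw_mu(count=7, shape=2.0, rate=0.5, rng=rng)
print(post_shape, post_rate, post_shape / post_rate)  # posterior mean is shape / rate
```

In the full model the count Iimported(t) is itself latent, so a draw of this kind would condition on its current sampled value at each iteration.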

Estimating misidentification rates

Without any information on the misidentification rates, it is difficult to get an accurate estimator of Rlocal(t). However, contact tracing data could provide adequate information to estimate the misidentification rates.

Let $p_i$ be the probability that we think individual i is a local case based on the survey. Then, $p_i$ can be modeled as a mixture of α0 and 1 − α1. Note that $\alpha_1 \sim \mathrm{Beta}(\zeta_{\alpha_1}, \xi_{\alpha_1})$ implies $1 - \alpha_1 \sim \mathrm{Beta}(\xi_{\alpha_1}, \zeta_{\alpha_1})$. We thus model the distribution of $p_i$ as a mixture of two beta distributions:

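$$p_i \sim \pi_0\,\mathrm{Beta}\big(\zeta_{\alpha_0},\, \xi_{\alpha_0}\big) + (1-\pi_0)\,\mathrm{Beta}\big(\xi_{\alpha_1},\, \zeta_{\alpha_1}\big), \tag{15}$$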

where π0 can be interpreted as the fraction of the diagnosed cases that are imported. By using the expectation–maximization (EM) algorithm, we can obtain estimators $\hat{\zeta}_{\alpha_0}$, $\hat{\xi}_{\alpha_0}$, $\hat{\zeta}_{\alpha_1}$ and $\hat{\xi}_{\alpha_1}$.

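Note that, if $1 - \zeta_{\alpha_0}/(\zeta_{\alpha_0}+\xi_{\alpha_0}) - \zeta_{\alpha_1}/(\zeta_{\alpha_1}+\xi_{\alpha_1}) \neq 0$, we obtain unbiased estimators of Ilocal(t) and Iimported(t):

$$\begin{aligned}
\hat{I}_{\text{local}}(t) &= \frac{\left[1 - \frac{\zeta_{\alpha_0}}{\zeta_{\alpha_0}+\xi_{\alpha_0}}\right]\tilde{I}_{\text{local}}(t) - \frac{\zeta_{\alpha_0}}{\zeta_{\alpha_0}+\xi_{\alpha_0}}\,\tilde{I}_{\text{imported}}(t)}{1 - \frac{\zeta_{\alpha_0}}{\zeta_{\alpha_0}+\xi_{\alpha_0}} - \frac{\zeta_{\alpha_1}}{\zeta_{\alpha_1}+\xi_{\alpha_1}}},\\[4pt]
\hat{I}_{\text{imported}}(t) &= \frac{\left[1 - \frac{\zeta_{\alpha_1}}{\zeta_{\alpha_1}+\xi_{\alpha_1}}\right]\tilde{I}_{\text{imported}}(t) - \frac{\zeta_{\alpha_1}}{\zeta_{\alpha_1}+\xi_{\alpha_1}}\,\tilde{I}_{\text{local}}(t)}{1 - \frac{\zeta_{\alpha_0}}{\zeta_{\alpha_0}+\xi_{\alpha_0}} - \frac{\zeta_{\alpha_1}}{\zeta_{\alpha_1}+\xi_{\alpha_1}}}.
\end{aligned} \tag{16}$$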

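Thus, good initial values for Ilocal(t) and Iimported(t) in the MCMC are the estimates Îlocal(t) and Îimported(t) computed from the estimated misidentification rates, i.e., with $\zeta_{\alpha_0}$, $\xi_{\alpha_0}$, $\zeta_{\alpha_1}$, $\xi_{\alpha_1}$ in (16) replaced by $\hat{\zeta}_{\alpha_0}$, $\hat{\xi}_{\alpha_0}$, $\hat{\zeta}_{\alpha_1}$, $\hat{\xi}_{\alpha_1}$.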
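The correction in (16) can be sketched in Python as follows. The helper names `beta_mean` and `corrected_counts` and the example counts are hypothetical; the noisy counts used here are the expected values implied by (6) and (7) for 300 true local and 100 true imported cases, so the correction recovers those values up to floating-point error.

```python
def beta_mean(zeta, xi):
    """Mean of a Beta(zeta, xi) distribution."""
    return zeta / (zeta + xi)


def corrected_counts(noisy_local, noisy_imported, mean_alpha0, mean_alpha1):
    """Apply (16) to noisy counts, given the mean misidentification rates."""
    denom = 1.0 - mean_alpha0 - mean_alpha1
    if denom == 0.0:
        raise ValueError("correction undefined when the mean rates sum to 1")
    local_hat = ((1.0 - mean_alpha0) * noisy_local
                 - mean_alpha0 * noisy_imported) / denom
    imported_hat = ((1.0 - mean_alpha1) * noisy_imported
                    - mean_alpha1 * noisy_local) / denom
    return local_hat, imported_hat


a0 = beta_mean(2, 18)  # 0.1, imported labelled as local
a1 = beta_mean(2, 8)   # 0.2, local labelled as imported

# Expected noisy counts for 300 true local and 100 true imported cases:
# local: 0.8 * 300 + 0.1 * 100 = 250, imported: 0.2 * 300 + 0.9 * 100 = 150
local_hat, imported_hat = corrected_counts(250, 150, a0, a1)
print(local_hat, imported_hat)
```

With real data the corrected values need not be integers and can even be negative on days with few cases, so some rounding or truncation at zero would presumably be needed before using them as MCMC initial values; that step is not specified in the text and is left out here.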

Results

In this section, we conduct simulations to illustrate the performance of the proposed estimation methods, and we apply our method to two real data sets. One is surveillance data of COVID-19 in Hong Kong that includes contact tracing information, such as travel history data [20]. The authors collected information on 1,038 SARS-CoV-2 cases confirmed between 23 January and 28 April 2020, and identified 355 local cases and 683 imported cases. The other data set is from the COVID-19 pandemic in Victoria, Australia, studied in [21], which comprised 1,333 laboratory-confirmed cases of COVID-19 between 6 January and 14 April 2020. After excluding duplicate patients, the authors identified 345 local cases and 558 imported cases.

We consider two settings, a simulation setting and an application setting. In the simulation setting, we first use surveillance data from Hong Kong and Victoria to create realistic simulated data. We then add identification errors to the ‘true’ local and imported cases derived from the simulated epidemics. Finally, we estimate the local time-varying reproduction number using the noisy local and imported case counts. In the application setting, we assume that the identified local and imported cases in the real data sets are subject to some error. The former results allow us to understand what properties can be expected of our estimators, while the latter are reflective of what would be observed in practice with such data.

Simulation study

In this simulation study, we used Covasim [22], a stochastic individual-based model for transmission of SARS-CoV-2, calibrated to the epidemics in Hong Kong and Victoria. Fig 1 shows the average daily local and imported diagnosed counts over 1,000 trials. The noisy Ĩlocal(t) and Ĩimported(t) are generated according to (6). We set α0 ~ Beta(2, 18) (mean of 0.1), and α1 ~ Beta(2, 8) (mean of 0.2), Beta(4, 8) (mean of 0.33), or Beta(8, 8) (mean of 0.5) to see the effect of small α0 and large α1. This might happen if the definition of imported cases relies on travel history collected in the case investigation and some people are infected locally, even though they have a travel history within 14 days prior to symptom onset. We also consider α1 ~ Beta(2, 18), and α0 ~ Beta(2, 8), Beta(4, 8), or Beta(8, 8), corresponding to small α1 and large α0, which might occur if cases are defined as local when we are not sure about their source of infection. We assume that both α0 and α1 are unknown.

Fig 1.


The means of daily local and imported diagnosed counts in 1,000 simulation trials for epidemics in Hong Kong and Victoria.

We evaluate the estimate of Rlocal(t) in terms of its posterior distribution and 95% credible intervals. Figs 2 and 3 show the simulation results, in which we run MCMC chains of 10,000 samples for each of 1,000 simulated epidemic trials. Fig 2 assumes that we are more likely to misclassify local cases as imported cases and Fig 3 assumes that we are more likely to misclassify imported cases as local cases. For comparison purposes, we compute R*local(t) and Rlocal(t), defined in (1) and (2), by approximating μ*local(t), μ*(t), g(s), μlocal(t), μ(t), w(s) using 1,000 simulation trials. We also calculate the most widely used estimator of R̃local(t) defined in (3), which is implemented in the R package EpiEstim [23]. We view it as a representative estimator that does not account for misidentification, i.e., it treats the noisy local and imported cases as true.

Fig 2.


Estimates of local time-varying reproduction numbers in simulated epidemics for Hong Kong and Victoria under three sets of misidentification rates: α0 ~ Beta(2, 18), and α1 ~ Beta(2, 8), Beta(4, 8), or Beta(8, 8). The error bands are the averages of 95% credible intervals over 1,000 trials. Note that the differences between the blue curve (R*local(t)) and the purple curve (Rlocal(t)) are due to the differences among infection dates, symptom onset dates, and diagnosis dates.

Fig 3.


Estimates of local time-varying reproduction numbers in simulated epidemics for Hong Kong and Victoria under three sets of misidentification rates: α1 ~ Beta(2, 18), and α0 ~ Beta(2, 8), Beta(4, 8), or Beta(8, 8). The error bands are the averages of 95% credible intervals over 1,000 trials.

In the simulated epidemics for both Hong Kong and Victoria, if we ignore the misidentification, we underestimate Rlocal(t) when the mean of α0 is small and the mean of α1 is relatively large (Fig 2), and overestimate Rlocal(t) when the mean of α1 is small and the mean of α0 is relatively large (Fig 3), with the biases increasing as the larger of the two means increases. The results are consistent with (8), and they imply that such biases could lead to an inappropriate public health response, i.e., inadequate interventions or overreaction. Our Bayesian hierarchical framework corrects the bias: the biases of our estimators are close to zero in all cases. The 95% credible intervals of our estimators are wide in the first two months because the number of incident cases is very low. For the last month or so, when the diagnosed counts are relatively high, the 95% credible intervals are narrow.

Application

We apply our proposed methods to surveillance data of COVID-19 in Hong Kong and Victoria. Fig 4 (a) and (b) show the daily local and imported cases counts in Hong Kong and Victoria. For Hong Kong data, [20] calculated the serial intervals using a gamma distribution and estimated shape and rate parameters of 2.23 and 0.37, respectively (corresponding to a mean of around 6 days and standard deviation of around 4 days). There is no specific serial interval that has been calculated for Victoria. Considering the epidemic curve in Victoria is relatively similar to that in Hong Kong, we use the same serial interval distribution when we estimate Rlocal(t) in Victoria.
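The quoted mean and standard deviation follow directly from the gamma shape and rate, as the short Python check below shows. The sketch also builds a crude discrete serial interval by evaluating the gamma density at integer lags and renormalizing; that discretization, the 30-day truncation and the function name are assumptions made for illustration, and they are not necessarily what EpiEstim or the authors use.

```python
import math

shape, rate = 2.23, 0.37          # gamma parameters reported in [20]
mean = shape / rate               # about 6.03 days
sd = math.sqrt(shape) / rate      # about 4.04 days


def discretized_serial_interval(shape, rate, max_lag):
    """Approximate w(s) for s = 1..max_lag from the gamma density.

    The unnormalized density s**(shape - 1) * exp(-rate * s) is evaluated
    at integer lags and rescaled to sum to 1; w[0] is set to 0.
    """
    dens = [s ** (shape - 1.0) * math.exp(-rate * s) for s in range(1, max_lag + 1)]
    total = sum(dens)
    return [0.0] + [d / total for d in dens]


w = discretized_serial_interval(shape, rate, max_lag=30)
print(round(mean, 2), round(sd, 2), round(sum(w), 6))
```

A weight vector of this form is what a discrete total infectiousness calculation would take as its serial interval input.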

Fig 4.


Epidemic curves of COVID-19 cases and estimations of local time-varying reproduction numbers in Hong Kong and Victoria. (a) The epidemic curve of daily cases of laboratory-confirmed SARS-CoV-2 infection in Hong Kong by symptom onset date and colored by case category. Asymptomatic cases are included here by date of confirmation. (b) The epidemic curve of the coronavirus disease cases in Victoria by sample collection date and colored by case category. (c) and (d) Estimations of local time-varying reproduction numbers under three scenarios: 1) no identification error, 2) α0 ~ Beta(2, 18) and α1 ~ Beta(4, 8) (around 10% imported cases are misclassified as local and around 33.3% local cases are misclassified as imported), 3) α0 ~ Beta(4, 8) and α1 ~ Beta(2, 18) (around 33.3% imported cases are misclassified as local and around 10% local cases are misclassified as imported). The bands are the 95% credible intervals.

Fig 4 (c) and (d) show estimates for Rlocal(t) under three scenarios: 1) no identification error, 2) small α0 and large α1, 3) small α1 and large α0. We run MCMC chains of 10,000 samples and the error bands are the 95% credible intervals. We can see that the estimated local time-varying reproduction numbers are quite different when the two identification error rates are about 10% and 33%. If we think we are more likely to misclassify local cases as imported, then we should trust the curve corresponding to scenario 2). If imported cases are more likely to be misidentified as local, then the curve corresponding to scenario 3) is the more reliable one. And if we believe the identification error is close to zero, we should trust the estimate under scenario 1).

Ultimately, we see that the ability to account for identification error appropriately in reporting the local time-varying reproduction number can lead to substantially different conclusions than use of the original, noisy local time-varying reproduction number. These differences can then in turn be translated to decision making for public health response.

Discussion

We have developed a general framework for estimation of the true local time-varying reproduction numbers in contexts wherein one has identified local and imported case counts with some error. Simulations demonstrate that substantial inferential accuracy by our estimators is possible when nontrivial error is present. And our application to epidemics in Hong Kong and Victoria shows that the gains offered by our approach over presenting the noisy local instantaneous reproduction number can be pronounced.

We have shown examples on a state/province level, but our method could be useful for cities, or more local settings, such as a university trying to determine if there is substantial local transmission occurring. Our approach requires daily numbers of local and imported cases, serial interval, and contact tracing data or other data to provide adequate information to estimate the misidentification rates.

We have pursued a Bayesian approach to the problem of estimating the local instantaneous reproduction number. The credible intervals are relatively wide when the number of cases is low. To improve the performance at low case incidence, Kalman filtering is a natural approach. Estimating the time-varying reproduction number by Kalman filtering is an emerging topic. For instance, [24] constructed a recursive Bayesian smoother for estimating the effective reproduction number from the incidence of an infectious disease in real time and retrospectively. However, one typically does not distinguish between local and imported cases in this setting.

The identification errors are informed by contact tracing survey data in our approach. If the data from the survey are categorical (e.g., we ask people where they were infected and attach some qualitative measure of our confidence that they are local cases), we can transform them into numerical values. For example, [25] proposed a method that converts categorical variables to numerical data for a Gaussian distribution. We could modify the method to convert categorical variables to Beta distributed data. If the survey data are unavailable, using genomic data is a natural alternative. Genomic surveillance has been used to detect transmission clusters and to provide information on the possible source of individual cases [26–31].

We have shown the results of retrospective estimation. It is also computationally feasible to run MCMC on each day to obtain real-time estimators; an MCMC chain of 10,000 samples takes about 5 minutes. To reduce the computational cost, one approach is adaptive MCMC methods [32, 33], which use the covariance structure of the posterior distribution to design proposal distributions. Other methods include stochastic Newton [34] and Riemannian manifold MCMC [35], which construct efficient proposals using local derivative information.

Funding

This work was supported in part by ARO award W911NF1810237. This work was also supported by National Institutes of Health, R01 GM122878.

Footnotes

Data Accessibility

No primary data are used in this paper. Secondary data sources are taken from [20, 21]. These data and the code necessary to reproduce the results in this paper are available at https://github.com/KolaczykResearch/EstimLocalRt.

References

  • 1. You C, Deng Y, Hu W, Sun J, Lin Q, Zhou F, et al. Estimation of the time-varying reproduction number of COVID-19 outbreak in China. International Journal of Hygiene and Environmental Health. 2020; p. 113555. doi: 10.1101/2020.02.08.20021253
  • 2. Li Y, Campbell H, Kulkarni D, Harpur A, Nundy M, Wang X, et al. The temporal association of introducing and lifting non-pharmaceutical interventions with the time-varying reproduction number (R) of SARS-CoV-2: a modelling study across 131 countries. The Lancet Infectious Diseases. 2020; doi: 10.1016/s1473-3099(20)30785-4
  • 3. Rubin D, Huang J, Fisher BT, Gasparrini A, Tam V, Song L, et al. Association of social distancing, population density, and temperature with the instantaneous reproduction number of SARS-CoV-2 in counties across the United States. JAMA network open. 2020;3(7):e2016099–e2016099. doi: 10.1001/jamanetworkopen.2020.16099
  • 4. Abbott S, Hellewell J, Thompson RN, Sherratt K, Gibbs HP, Bosse NI, et al. Estimating the time-varying reproduction number of SARS-CoV-2 using national and subnational case counts. Wellcome Open Research. 2020;5(112):112. doi: 10.12688/wellcomeopenres.16006.1
  • 5. Thompson RN, Stockwin JE, van Gaalen RD, Polonsky JA, Kamvar ZN, Demarsh PA, et al. Improved inference of time-varying reproduction numbers during infectious disease outbreaks. Epidemics. 2019; doi: 10.1016/j.epidem.2019.100356
  • 6. Cori A, Ferguson NM, Fraser C, Cauchemez S. A New Framework and Software to Estimate Time-Varying Reproduction Numbers During Epidemics. Am J Epi. 2013;178(9). doi: 10.1093/aje/kwt133
  • 7. Chong KC, Cheng W, Zhao S, Ling F, Mohammad KN, Wang M, et al. Transmissibility of coronavirus disease 2019 in Chinese cities with different dynamics of imported cases. PeerJ. 2020;8:e10350. doi: 10.7717/peerj.10350
  • 8. Arroyo Marioli F, Bullano F, Kučinskas S, Rondón-Moreno C. Tracking R of COVID-19: A New Real-Time Estimation Using the Kalman Filter. Available at SSRN 3581633. 2020; doi: 10.1101/2020.04.19.20071886
  • 9. Reich N, Lessler J, Cummings D, Brookmeyer R. Estimating incubation period distributions with coarse data. Stat Med. 2009;28(22).
  • 10. Ma Y, Jenkins HE, Sebastiani P, Ellner JJ, Jones-Lòpez EC, Dietze R, et al. Using cure models to estimate the serial interval of tuberculosis with limited follow-up. Am J Epidemiol. 2020;189(11):1421–1426. doi: 10.1093/aje/kwaa090
  • 11. Miller DA, Talley BL, Lips KR, Campbell Grant EH. Estimating patterns and drivers of infection prevalence and intensity when detection is imperfect and sampling error occurs. Methods in Ecology and Evolution. 2012;3(5):850–859. doi: 10.1111/j.2041-210x.2012.00216.x
  • 12. McClintock BT, Nichols JD, Bailey LL, MacKenzie DI, Kendall WL, Franklin AB. Seeking a second opinion: uncertainty in disease ecology. Ecology letters. 2010;13(6):659–674. doi: 10.1111/j.1461-0248.2010.01472.x
  • 13. Cui N, Chen Y, Small DS. Modeling parasite infection dynamics when there is heterogeneity and imperfect detectability. Biometrics. 2013;69(3):683–692. doi: 10.1111/biom.12050
  • 14. Li T, White LF. Bayesian back-calculation and nowcasting for line list data during the COVID-19 pandemic. medRxiv. 2020; doi: 10.1101/2020.12.08.20238154
  • 15. Fraser C. Estimating Individual and Household Reproduction Numbers in an Emerging Epidemic. PlosOne. 2007;2(8). doi:10.1371/.
  • 16. de Valpine P, Turek D, Paciorek C, Anderson-Bergman C, Temple Lang D, Bodik R. Programming with models: writing statistical algorithms for general model structures with NIMBLE. Journal of Computational and Graphical Statistics. 2017;26:403–413. doi: 10.1080/10618600.2016.1172487
  • 17. de Valpine P, Paciorek C, Turek D, Michaud N, Anderson-Bergman C, Obermeyer F, et al. NIMBLE: MCMC, Particle Filtering, and Programmable Hierarchical Modeling; 2020. Available from: https://cran.r-project.org/package=nimble.
  • 18. de Valpine P, Paciorek C, Turek D, Michaud N, Anderson-Bergman C, Obermeyer F, et al. NIMBLE User Manual; 2020. Available from: https://r-nimble.org.
  • 19. Neal RM. Slice sampling. Annals of statistics. 2003; p. 705–741. doi: 10.1214/aos/1056562461
  • 20. Adam DC, Wu P, Wong JY, Lau EHY, Tsang TK, Cauchemez S, et al. Clustering and superspreading potential of SARS-CoV-2 infections in Hong Kong. Nature Medicine. 2020;26(11):1714–1719. doi: 10.1038/s41591-020-1092-0
  • 21. Seemann T, Lane C, Sherry N, Duchene S, da Silva AG, Caly L, et al. Tracking the COVID-19 pandemic in Australia using genomics. medRxiv. 2020; doi: 10.1101/2020.05.12.20099929
  • 22. Kerr CC, Stuart RM, Mistry D, Abeysuriya RG, Hart G, Rosenfeld K, et al. Covasim: an agent-based model of COVID-19 dynamics and interventions. medRxiv. 2020; doi: 10.1101/2020.05.10.20097469
  • 23. Cori A, Kamvar ZN, Stockwin JE, Jombart T, Thompson RN, Dahlqwist E. EpiEstim; 2020.
  • 24. Parag KV. Improved estimation of time-varying reproduction numbers at low case incidence and between epidemic waves. medRxiv. 2020; doi: 10.1101/2020.09.14.20194589
  • 25. Patki N, Wedge R, Veeramachaneni K. The synthetic data vault. In: 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA). IEEE; 2016. p. 399–410. Available from: 10.1109/dsaa.2016.49
  • 26. Leavitt SV, Lee RS, Sebastiani P, Horsburgh CR, Jenkins HE, White LF. Estimating the relative probability of direct transmission between infectious disease patients. International journal of epidemiology. 2020; doi: 10.1093/ije/dyaa031
  • 27. Meredith LW, Hamilton WL, Warne B, Houldcroft CJ, Hosmillo M, Jahun AS, et al. Rapid implementation of SARS-CoV-2 sequencing to investigate cases of healthcare associated COVID-19: a prospective genomic surveillance study. The Lancet infectious diseases. 2020;20(11):1263–1272. doi: 10.1016/s1473-3099(20)30562-4
  • 28. Deng X, Gu W, Federman S, du Plessis L, Pybus OG, Faria N, et al. Genomic surveillance reveals multiple introductions of SARS-CoV-2 into Northern California. Science. 2020;.
  • 29. Poon AF, Gustafson R, Daly P, Zerr L, Demlow SE, Wong J, et al. Near real-time monitoring of HIV transmission hotspots from routine HIV genotyping: an implementation case study. The lancet HIV. 2016;3(5):e231–e238. doi: 10.1016/s2352-3018(16)00046-1
  • 30. Sansone M, Andersson M, Gustavsson L, Andersson LM, Nordén R, Westin J. Extensive hospital in-ward clustering revealed by molecular characterization of influenza A virus infection. Clinical Infectious Diseases. 2020; doi: 10.1093/cid/ciaa108
  • 31. Peters PJ, Pontones P, Hoover KW, Patel MR, Galang RR, Shields J, et al. HIV infection linked to injection use of oxymorphone in Indiana, 2014–2015. New England Journal of Medicine. 2016;375(3):229–239. doi: 10.1056/nejmoa1515195
  • 32. Haario H, Saksman E, Tamminen J, et al. An adaptive Metropolis algorithm. Bernoulli. 2001;7(2):223–242. doi: 10.2307/3318737
  • 33. Roberts GO, Rosenthal JS. Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms. Journal of applied probability. 2007;44(2):458–475. doi: 10.1017/s0021900200117954
  • 34. Martin J, Wilcox LC, Burstedde C, Ghattas O. A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion. SIAM Journal on Scientific Computing. 2012;34(3):A1460–A1487. doi: 10.1137/110845598
  • 35. Girolami M, Calderhead B. Riemann manifold langevin and hamiltonian monte carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2011;73(2):123–214. doi: 10.1111/j.1467-9868.2010.00765.x

Articles from medRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
