Abstract
Researchers and policymakers have proposed systems to detect novel pathogens earlier than existing surveillance systems by monitoring samples from hospital patients, wastewater, and air travel, in order to mitigate future pandemics. How much benefit would such systems offer? We developed, empirically validated, and mathematically characterized a quantitative model that simulates disease spread and detection time for any given disease and detection system. We find that hospital monitoring could have detected COVID-19 in Wuhan 0.4 weeks earlier than it was actually discovered, at 2,300 cases (standard error: 76 cases) compared to 3,400 (standard error: 161 cases). Wastewater monitoring would not have accelerated COVID-19 detection in Wuhan, but provides benefit in smaller catchments and for asymptomatic or long-incubation diseases like polio or HIV/AIDS. Air travel monitoring does not accelerate outbreak detection in most scenarios we evaluated. In sum, early detection systems can substantially mitigate some future pandemics, but would not have changed the course of COVID-19.
Subject terms: Epidemiology, Health policy, SARS-CoV-2
Monitoring samples from hospital patients, wastewater, or air travel may enable early detection of pathogens. Here, the authors assess how these surveillance systems could have impacted detection of COVID-19 and their potential benefits for detection of other emerging pathogens.
Introduction
It has been widely debated which policies, if any, could have mitigated the health impacts of the initial stages of the COVID-19 pandemic in late 2019 and early 2020 as community transmission became established and widespread. Early studies compared non-pharmaceutical interventions (NPIs) such as mobility restrictions1,2, school closures3,4, voluntary home quarantine5 and testing policies6, and optimized NPI parameters like testing frequency7, quarantine length8, testing modality9, test pooling10 and intervention timing and ordering11. While such NPIs undoubtedly slowed the early spread of COVID-19 (ref. 12) and previous outbreaks13,14, there has been little investigation of whether a separate strategy focused on earlier detection of COVID-19 would have enabled more successful mitigation. In theory, earlier detection enables a response when the outbreak is smaller: thus, resource-intensive mitigation strategies like test-trace-isolate become less costly, and the earlier interventions are applied, the larger the number of infections and deaths that can be delayed until healthcare capacity is increased15. However, the relevant question is not whether early detection helps, but quantitatively, how much of a difference it would make. This question is especially urgent given current international and national policy proposals to invest billions of dollars in such systems16,17.
Researchers and policymakers have proposed immediate investments in systems to continuously monitor for novel pathogens in (i) patients with infectious symptoms in hospitals and clinics18,19, (ii) community wastewater treatment plants20,21, and (iii) airplane sewage or bridge air on international flights22–24, as well as other sites25–29. These three sites have attracted interest because they have been frequent testing sites in COVID-19: hospitals since the pandemic’s beginning30, and wastewater (including wastewater at treatment plants20, within the sewershed31, and locally near individual buildings32) and air travel more recently33,34 because hospital cases can lag community cases35. COVID-19 also spurred methodological innovation and characterization of sampling from these sites36, particularly wastewater37–39. Detecting novel pandemics at these sites has occasionally been piloted21,40 but has not been implemented at scale, in part because it is unclear if these proposed systems sufficiently expedite detection of outbreaks. The systems under consideration would use multiplex testing for conserved nucleic acid sequences of known pathogen families, exploiting the fact that many past emerging diseases belonged to such families, including SARS-CoV-2 (2019), Ebola (2013), MERS-CoV (2012), and pandemic flu (2009). Proposed technologies include multiplex PCR41–44, CRISPR-based multiplex diagnostics45, and metagenomic sequencing46, possibly implemented with pooling10.
In this work, to determine whether early detection of novel pathogens at these sites could be effective in changing the course of a pandemic, we first examine whether COVID-19 could have been detected earlier in Wuhan if systems had been in place in advance to monitor hospitals, wastewater or air travel. To do this, we develop, empirically validate, and mathematically characterize a simulation-based model that predicts the number of cases at the time of detection given a detection system and a set of outbreak epidemiological parameters. We then use this model and COVID-19 epidemiological parameters47 to estimate how early COVID-19 would initially have been detected in Wuhan by the three early-detection systems, and compare this to the actual date of COVID-19 detection. Finally, we use our model to estimate detection times of infectious agents with different epidemiological properties, such as mpox and polio in recent outbreaks48,49, to inform pathogen-agnostic surveillance for future pandemics.
Results
Model to estimate earliness of detection
Previous research15 and our analysis (Supplementary text, Figs. S1–S4 and Table S1) suggest that earlier COVID-19 lockdowns could have delayed cases and deaths. Thus, it is critical to understand which early-detection systems, if any, could have effectively enabled earlier response. To do this, we built a model that simulates outbreak spread and earliness of detection for a given outbreak and detection system (Materials and methods, Supplementary materials). This builds upon branching process models that have previously been used to model the spread of COVID-19 (refs. 50, 51) and other infectious diseases52. A traditional branching process model starts from an index case and iteratively simulates each new generation of infections. Our model follows this pattern, but with each new infection, we also simulate whether the infected person is detected by the detection system with some probability (Fig. 1a), and the simulation stops when the number of detected individuals equals the detection threshold and the detection delay has passed. Thus, each detection system is characterized by these three parameters: detection probability, threshold, and delay (Table S2). For example, in hospital monitoring, an infected individual’s detection probability is the probability they are sick enough to enter the hospital, which is the hospitalization rate (assuming testing has a negligible false negative rate). In systems that test individuals (hospital and air travel individual monitoring), the threshold is measured in an absolute number of cases. In systems that test wastewater (wastewater monitoring), the threshold is measured in terms of outbreak prevalence because wastewater monitoring can only sample a small percentage of sewage flows, depending on the sampling capacity53; thus, a higher number of cases is required to trigger detection in a bigger community (Materials and methods).
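The simulation loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' released code: the function names, the gamma-Poisson sampler, and every parameter value in the usage lines (R0 of 2.5, dispersion of 0.5, detection probability of 0.03, threshold of 5, delay of 2 generations) are assumptions chosen for illustration rather than values from Tables S2 and S3.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate (normal approximation when lam is large)."""
    if lam > 500:
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    limit, count, prod = math.exp(-lam), 0, 1.0
    while True:  # Knuth's multiplication method
        prod *= rng.random()
        if prod <= limit:
            return count
        count += 1


def cases_at_detection(r0, dispersion, p_detect, threshold, delay_gens, rng, max_gens=40):
    """Cumulative cases when the outbreak is detected, or None if it never is."""
    current, cumulative, detected = 1, 1, 0  # start from a single index case
    gens_since_threshold = None
    for _ in range(max_gens):
        if gens_since_threshold is None:
            # Each case of the newest generation independently enters the detection system.
            detected += sum(rng.random() < p_detect for _ in range(current))
            if detected >= threshold:
                gens_since_threshold = 0
        if gens_since_threshold is not None:
            if gens_since_threshold >= delay_gens:
                return cumulative  # threshold reached and delay has passed
            gens_since_threshold += 1
        if current == 0:
            # Outbreak died out; it counts as detected only if the threshold was reached.
            return cumulative if gens_since_threshold is not None else None
        # Total offspring of `current` cases: a gamma-Poisson mixture, i.e. a
        # negative binomial with mean current * r0 (ASSUMED sampling scheme).
        rate = rng.gammavariate(current * dispersion, r0 / dispersion)
        current = sample_poisson(rng, rate)
        cumulative += current
    return None


rng = random.Random(0)
runs = [cases_at_detection(2.5, 0.5, 0.03, 5, 2, rng) for _ in range(200)]
detected_runs = [r for r in runs if r is not None]
print(sum(detected_runs) / len(detected_runs))  # mean cases at detection (illustrative)
```

Runs in which the outbreak goes extinct before detection return None and are dropped here; how such runs are handled in the published model is not stated in this section.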
We gathered literature estimates of detection system and outbreak parameters (Tables S2 and S3) and validated wastewater monitoring sensitivity in independent data (Fig. S5 and Materials and methods, Supplementary materials). We then empirically validated the model by testing its ability to predict the detection times for the first COVID-19 outbreaks in 50 US states in 2020. We gathered the dates of the first COVID-19 case reported by the public health department of each US state (Table S4) as well as literature estimates of true (tested and untested) statewide COVID-19 case counts in early 2020 (ref. 54). Using our model, we were able to predict the number of weeks until travel-based detection in each US state within a mean absolute error of 0.97 weeks (Figs. S6 and S7). To check the robustness of our results, we implemented a second, more complex model with varying reproduction numbers using a Monte Carlo simulation-based package (EpiNow2 v1.3.5; ref. 55). A list of model assumptions can be found in Table S5.
Early detection’s impact on COVID-19 detection in Wuhan
Next, we use our model to examine the detection systems’ ability to detect the first major COVID-19 outbreak in Wuhan (Fig. 1b and Table S2). To estimate cases at detection in the actual pandemic, we used literature estimates of total (tested and untested) COVID-19 case counts in Wuhan in late 2019 and early 2020 (ref. 56). Our model shows that, on average, hospital monitoring could have detected COVID-19 after 2292 cases (standard error: 76 cases). In reality, the pandemic was identified after 3413 cases on average (standard error: 161 cases). Thus, hospital monitoring would have caught the outbreak 1121 cases earlier (~0.43 weeks earlier), a statistically significant difference with p = 1.9e-09 and t = −6.3 (df = 141) in a one-sided Welch two-sample t test. Wastewater monitoring would have lagged detection in the actual pandemic: it would have caught the outbreak after 4575 cases (standard error: 523 cases), or 1162 cases later, on average (p = 0.018; t = 2.1; df = 118). We tested this wastewater prediction empirically by calculating the cases until COVID-19 wastewater detection in Massachusetts in early 2020, using literature-estimated Massachusetts COVID-19 cases (ref. 54) and Massachusetts wastewater SARS-CoV-2 PCR data (ref. 57); our model prediction was consistent with this analysis (Fig. S8). Because we model wastewater monitoring to detect later in larger communities (Materials and methods, Supplementary materials), the Wuhan result is in part due to Wuhan’s 650,000-person catchments. Wastewater monitoring would lead status quo detection of COVID-19 in catchments smaller than 480,000 people, well above the global median catchment size of 30,000 people (ref. 58). Air travel monitoring did not provide any acceleration of detection because of the low probability of simultaneously traveling and being sick.
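As a check on the arithmetic, the reported test statistics can be recomputed from the summary statistics alone. The sketch below assumes 100 values per group (the number of draws and simulations stated in Methods) and uses the standard Welch statistic and Welch–Satterthwaite degrees of freedom; the function name is hypothetical, and the p-values are not recomputed.

```python
import math


def welch_from_se(mean_a, se_a, mean_b, se_b, n_a=100, n_b=100):
    """Welch t statistic and Welch-Satterthwaite df from group means and standard errors."""
    var_a, var_b = se_a ** 2, se_b ** 2  # squared standard errors, i.e. s^2 / n
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df


# Hospital monitoring vs. the actual pandemic
t_h, df_h = welch_from_se(2292, 76, 3413, 161)
# Wastewater monitoring vs. the actual pandemic
t_w, df_w = welch_from_se(4575, 523, 3413, 161)
print(round(t_h, 1), round(df_h))  # about -6.3 and 141
print(round(t_w, 1), round(df_w))  # about 2.1 and 118
```

Both pairs agree with the values reported in the text after rounding.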
Early detection for other diseases: formula and simulation
To make our model easily usable for pathogenic outbreaks beyond COVID-19, we derived a compact formula that approximates the model’s simulations. We observed that, without accounting for the delay of g generations between the threshold case’s infection and detection, the number of cases until detection is a random variable that follows a negative binomial distribution by definition: each infected case is a Bernoulli trial, “success” in that trial occurs when that case enters the detection system (with a fixed per-case detection probability), and we count the number of cases until the number of successes equals the detection threshold d. After accounting for g and the basic reproduction number R0, we derived a formula approximating the mean number of cases until detection when the outbreak starts in a community covered by the detection system (see Supplementary Text for full derivation):
1 |
We confirmed our formula approximates the simulation model closely by comparing the detection times predicted by both for all the detection systems for multiple diseases (Fig. S9). Thus, the formula allows us to interpret the model and the quantitative relationships between detection times and various variables: the formula shows that the number of cases until detection increases linearly with the detection threshold, increases polynomially with R0 and exponentially with the delay g as R0^g, and decreases as the fraction of cases being tested increases. This formula also makes the model easily usable for detection systems beyond the ones studied here.
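The scaling relationships listed above can be motivated with a short heuristic. The following is a leading-order sketch under stated assumptions, not the published Eq. (1), which is derived in the Supplementary Text and may contain additional terms. The symbols p (per-case detection probability), X_0 (cases until the threshold is reached, ignoring the delay) and X (cases at detection) are notation assumed here; d, g and R_0 are as in the text.

```latex
% Number of Bernoulli(p) trials needed to reach d successes (negative binomial):
\[
  \mathbb{E}[X_0] = \frac{d}{p}
\]
% During a delay of g generations, cumulative cases in a growing outbreak
% increase by roughly a factor R_0 per generation, giving to leading order:
\[
  \mathbb{E}[X] \sim \frac{d}{p}\, R_0^{\,g}
\]
```

This reproduces the stated dependencies: linear in d, inversely proportional to the fraction of cases tested, and growing as R_0^g with the delay.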
We applied our model to several outbreaks of recent interest, including COVID-19, mpox (2022), polio (2013–2014), Ebola (2013–2016) and flu (2009 pandemic), and found that the detection systems vary in their success depending on the epidemiological parameters of the agent (Figs. 2, S10 and S11, and Table S3). For example, in our model hospital monitoring tends to outperform wastewater monitoring when the hospitalization rate is high, as in the case of Ebola, but tends to underperform for diseases like polio, in which the hospitalization rate is low and there is high asymptomatic spread during the delay from infection to hospitalization. This is consistent with Eq. (1), as well as previous observations that Ebola was first detected in hospitals59 and that wastewater monitoring has been more effective than hospital monitoring for detecting polio60. Wastewater monitoring performs even better for smaller, 30,000-person catchments (Figs. S12 and S13). We also modeled the status quo detection times for these outbreaks: the number of cases until these outbreaks were detected in the status quo, without the proposed detection systems in place. We found that early-detection systems can catch outbreaks when they are up to 52% smaller (wastewater for polio) or 110 weeks earlier (hospital for HIV/AIDS) (Figs. S14–S17). Similar results hold for the more complex model: the relative median detection times of the three systems remain the same 97% of the time across the five main diseases (29/30 pairwise comparisons) (Fig. S18).
Because future infectious diseases are likely to have different epidemiological parameters, we generalized the previous analysis and calculated detection times for many possible diseases spanning the epidemiological parameter space (Fig. 3 and S19). As expected, hospital monitoring is the best system for diseases with higher hospitalization rates and lower times to hospitalization. For diseases with higher R0s and times to hospitalization, wastewater monitoring emerges as the best system more often, because hospital monitoring has a longer detection delay (mainly the time from infection to hospitalization) than wastewater (mainly the time from infection to fecal shedding), during which cases grow exponentially with R0. However, this holds mainly for diseases with high probability of fecal shedding and low hospitalization rate. Air travel monitoring, which did not perform well in the previously modeled diseases (Figs. 1 and 2), actually performed best for a few diseases for which fecal shedding is low (disadvantaging wastewater monitoring) and the time to hospitalization and R0 are too large (disadvantaging hospital monitoring).
Discussion
Our results show that the benefits of early-detection systems vary from marginal (0.4 weeks earlier for COVID-19) to significant (110 weeks earlier for HIV/AIDS) (Figs. 1b, 2, and S17). Our detection time model (Fig. 1a) can be used for many diseases and detection systems, including other systems beyond this study25,26, by varying the fraction of the infected population being tested in each system. Some further points are worth emphasizing. First, early detection only aids mitigation if it leads to a coordinated early response. Many factors beyond detection affect the pace of response, including the economic and political feasibility of lockdowns, the availability of medicines and personal protective equipment, and whether there are pre-determined policies to be implemented upon detection. Second, when deciding to invest in these systems, one must consider factors such as cost-effectiveness and whether the system provides evidence of disease severity. Although wastewater monitoring gives earlier detection than hospital monitoring in multiple diseases (Fig. 3a), it does not discriminate between mild and severe disease (although sequencing could detect lineages known to cause severe illness). In contrast, hospital monitoring provides evidence that the detected pathogen produces symptoms that require hospital treatment. Third, our model is meant to be used now, in advance of future pandemics, and not in the early months of a novel pandemic, because early-detection systems must be set up in advance of the next pandemic to be effective. Because we do not know the epidemiological parameters of the next pandemic, our study assesses how these systems would perform for a wide, representative variety of diseases with different epidemiological parameters, in order to quantify these systems’ benefits in general.
These results can inform ongoing international and national policy debates about which policies are needed to mitigate future pandemics. In the wake of COVID-19, the World Health Organization Intergovernmental Negotiating Body is actively negotiating a new treaty on international pandemic preparedness which updates the International Health Regulations (2005). Drafts of this treaty highlight “early warning and alert systems” as key measures16. Similarly, the presidential administration of the United States has proposed investing $5.3 billion over 7 to 10 years in early warning and real-time monitoring systems, including in hospitals and wastewater17. In this study, we have assessed detection systems’ detection times and have developed a model to assess current and future detection system proposals. Along with additional cost-effectiveness analysis and technical pilots21, these results can help inform which detection systems are most effective and thus worth funding in pandemic preparedness efforts.
Methods
Description of model predicting cases at detection
Our branching process-based model predicts the cumulative number of cases at the time of detection for a given detection system and outbreak. It follows the approach of branching process simulation models used previously to model the spread of COVID-19 (refs. 50, 51) and other infectious diseases52, but with the main added step of simulating each infected person’s chance of being detected by the detection system. The values for parameters for detection systems can be found in Table S2. The values for epidemiological parameters for outbreaks (R0, serial interval, dispersion, hospitalization rate, and time to hospitalization) can be found in Fig. 2 and Table S3. As in previous models52, we assume the offspring distribution (the number of secondary cases infected by each primary case) is negative binomial with mean R0.
We generally follow past detection system proposals19,21 to determine the implementation details of each system in our model. Our model assumes the following. In hospital monitoring, hospitals would test for high-priority pathogen families (e.g., coronaviruses) in patients presenting with severe infectious symptoms in hospital emergency departments19. Similarly, in wastewater monitoring, governments would test for pathogens in city wastewater treatment plants daily, and monitor for high and increasing levels of high-priority pathogen families21. In air travel monitoring, we model testing of individual symptomatic passengers (differs from proposals to monitor airplane sewage22 or bridge air) on incoming international flights for the same pathogens. The parameters of these systems are shown in Table S2.
Our model also accounts for different delays involved in different detection systems. For example, if the 500th case of a COVID-19-like outbreak triggers the detection threshold in both the hospital and wastewater monitoring systems, because of the significant delay from infection to hospitalization compared to the delay from infection to fecal shedding, the wastewater system would catch the outbreak earlier.
In systems that test individuals (hospital and air travel individual monitoring), the threshold is measured in an absolute number of cases. In systems that test wastewater (community and air travel wastewater monitoring), the threshold or sensitivity is measured in prevalence (cases as a percentage of the population)53,61,62. To predict the number of cases and time to detection, we need to convert this percentage back to a number of cases, so the wastewater detection time depends on the catchment population size.
To estimate wastewater sensitivity measured in prevalence, we used data from ref. 53. This study conducted PCR testing for SARS-CoV-2 on 1687 longitudinal wastewater samples from 353 sampling locations in 40 US states in early 2020, and synced these with publicly reported local daily new COVID-19 case counts. This enables us to estimate a distribution of the wastewater sensitivity: the lowest case count required to trigger positive detection in wastewater. Of the 353 sampling locations, 47 had both SARS-CoV-2-positive and negative samples such that local case counts on days of positive samples were all higher than those on days of negative samples. We thus knew that each sampling location’s sensitivity lies between the maximum of case counts on negative sample days and the minimum of case counts on positive sample days. We took the midpoint of this maximum and minimum as the location’s sensitivity; this gave us 47 local sensitivities. We fitted these to a log-normal distribution, yielding a median of 2.5 daily new cases per 100,000 people. As expected, this distribution is similarly shaped but slightly left-shifted from the distribution in Fig. 2b of ref. 53 (median 3.7 per 100,000), because the latter distribution is an upper bound of the former.
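The per-location midpoint rule and the log-normal summary can be sketched as follows. This is an illustration under assumptions: the function names and the toy input are hypothetical, and the log-normal fit is taken to be the maximum-likelihood fit, whose median is the exponential of the mean log value.

```python
import math


def location_sensitivity(samples):
    """samples: list of (daily_cases_per_100k, is_positive) pairs for one sampling location.

    Returns the midpoint between the highest case count on a negative-sample day and
    the lowest case count on a positive-sample day, or None if the location lacks
    either kind of sample or the two sets of counts are not cleanly separated.
    """
    negative_counts = [count for count, positive in samples if not positive]
    positive_counts = [count for count, positive in samples if positive]
    if not negative_counts or not positive_counts:
        return None
    if max(negative_counts) >= min(positive_counts):
        return None
    return (max(negative_counts) + min(positive_counts)) / 2


def lognormal_median(values):
    """Median of the maximum-likelihood log-normal fit: exp(mean of log values)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Toy data (made up for illustration only)
toy_location = [(1.0, False), (2.0, False), (4.0, True), (6.0, True)]
print(location_sensitivity(toy_location))  # midpoint of 2.0 and 4.0
```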
To use this distribution in our model, in each simulation run, we first randomly drew a wastewater sensitivity from this distribution, and then we needed to convert this reported incidence i to the true (reported and unreported) number of cases shedding fecally into public wastewater systems up to the time of wastewater detection. We converted as follows. Let day T be the day on which the incidence i is reported. First, we assumed the wastewater SARS-CoV-2 level on day T is proportional to the number of COVID-19 cases who are fecally shedding on day T, which we estimate as the number of fecal shedders infected 2 days before, given the dominant peak in fecal shedding on day 2 of infection (ref. 61). We infer the number of fecal shedders infected on day T−2 from the incidence as follows. To account for underreporting, we first estimate a true daily incidence of 5.7 × i with symptom onset on day T, based on estimates of the ratio of true (dated by symptom onset) to reported (dated by reporting date) COVID-19 cases in the United States in early 2020 (ref. 54). (This study’s abstract reports true cases are 5–50× reported ones, but this refers to the early March 1–April 4, 2020, period. We calculated the factor of 5.7 from the study’s data using the fuller March 1–May 16 period, which overlaps better with the February–June 2020 period in ref. 53 and reflects less underreporting as the pandemic developed and testing capacity increased. We calculated this underreporting factor as an average of state-level underreporting factors, weighted by frequency of each state among the wastewater samples in ref. 53.) Finally, we multiply by (a) the fraction of cases who shed fecally (0.5; ref. 63) and (b) the fraction of people connected to central sewage (0.8 in the US (ref. 64), which is the area from which the ref. 53 threshold is derived). This gives us the one-time prevalence of cases p who contribute to the wastewater SARS-CoV-2 level on day T.
For a given catchment with population c, this one-time number of cases is cp, and we estimate the cumulative number of fecal shedders up to this time by summing the daily exponential outbreak incidence curve over the number of days it takes to grow from 1 to cp cases.
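The conversion from reported incidence to a case threshold can be sketched numerically. The multipliers (5.7, 0.5, 0.8), the median sensitivity of 2.5 per 100,000 and the 650,000-person catchment come from the text. The geometric form of the incidence curve and the value of t_days in the usage lines are assumptions made for illustration, so the exact cumulative expression used by the authors may differ.

```python
import math

UNDERREPORTING = 5.7   # true / reported cases (from the text)
FRAC_SHEDDING = 0.5    # fraction of cases who shed fecally (from the text)
FRAC_SEWERED = 0.8     # fraction of people connected to central sewage (from the text)


def shedding_prevalence(reported_incidence):
    """One-time prevalence p of cases contributing to the wastewater signal."""
    return reported_incidence * UNDERREPORTING * FRAC_SHEDDING * FRAC_SEWERED


def cumulative_shedders(one_time_cases, t_days):
    """Sum a daily incidence curve that grows from 1 to one_time_cases over t_days.

    ASSUMPTION: incidence on day s is one_time_cases ** (s / t_days), i.e. pure
    exponential growth between the two endpoints.
    """
    return sum(one_time_cases ** (day / t_days) for day in range(t_days + 1))


p = shedding_prevalence(2.5 / 100_000)   # median sensitivity from the text
cp = 650_000 * p                          # Wuhan-sized catchment from the text
print(cp)                                 # about 37 one-time cases
print(cumulative_shedders(cp, t_days=14)) # t_days = 14 is a hypothetical value
```

Because cp scales with the catchment population c, the same sensitivity implies a larger case threshold in a larger community, which is why wastewater detection is later in large catchments.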
To check this estimate, we identified studies that compared wastewater and hospital COVID-19 trends (refs. 20, 53). Ref. 20 found that trends in wastewater SARS-CoV-2 values led trends in hospital admissions by 1–4 days in New Haven (catchment size 200,000). We estimate that wastewater detection would lead hospital detection of COVID-19 in New Haven by −0.8 to 3 weeks (90% CI). This is consistent with the 1–4 day lead estimate from ref. 20. Similarly, ref. 53 found that trends in wastewater led those in clinical data by 4 days in Massachusetts (catchment size 2,300,000). Their clinical data are dated by date of reporting rather than sample gathering; assuming that hospital admissions are 5 days ahead of tests by date of reporting (ref. 20), wastewater is 5 − 4 = 1 day behind hospital admissions. We estimate that wastewater detection would lead hospital detection of COVID-19 in Massachusetts by −4 to −0.09 weeks (90% CI). This is consistent with the 1-day lag estimate from ref. 53.
Validation of model in US states
We gathered two sources of data for each state: dates of COVID-19 detection and COVID-19 case counts in early 2020. For the former, we searched media reports and US state public health press releases to determine the dates of the first COVID-19 case reported in each US state. Sources for each state’s detection date are listed in Table S4. We were able to identify such dates for all 50 states.
For the latter, we used literature estimates of true (tested and untested) COVID-19 case counts, which incorporate COVID-19 mortality data to deal with variation in testing capacity among states (ref. 54). We received a time series of weekly symptomatic COVID-19 case estimates for March 1–May 10, 2020 and divided by a symptomatic rate of 0.55 (ref. 47) to get an estimate of total (symptomatic and asymptomatic) cases. We specifically used estimates from the adjusted mMAP (mortality maximum a posteriori) method because ref. 54 had mMAP estimates for all 50 states, whereas other methods from the same study were missing estimates for various states. We fit an exponential curve of case counts in each state to extrapolate cases back to January 2020. In the data we received, all states had case data for all weeks from March 1–May 10, 2020.
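The back-extrapolation step can be illustrated with a log-linear least-squares fit. The fitting method is an assumption (the text says only that an exponential curve was fit), the function names are hypothetical, and the weekly counts below are synthetic values generated from a known curve, not data from ref. 54.

```python
import math


def total_cases(symptomatic_cases, symptomatic_rate=0.55):
    """Convert symptomatic cases to total (symptomatic + asymptomatic) cases."""
    return symptomatic_cases / symptomatic_rate


def fit_exponential(weeks, cases):
    """Least-squares fit of log(cases) = a + b * week; returns (a, b)."""
    logs = [math.log(c) for c in cases]
    n = len(weeks)
    mean_w, mean_l = sum(weeks) / n, sum(logs) / n
    slope = sum((w - mean_w) * (l - mean_l) for w, l in zip(weeks, logs))
    slope /= sum((w - mean_w) ** 2 for w in weeks)
    return mean_l - slope * mean_w, slope


# Synthetic weekly symptomatic counts for weeks 0..10 (week 0 = March 1 in this toy setup)
weeks = list(range(11))
symptomatic = [100 * math.exp(0.3 * w) for w in weeks]
a, b = fit_exponential(weeks, [total_cases(s) for s in symptomatic])

# Extrapolate total cases back to 8 weeks before the first observed week
print(math.exp(a + b * -8))
```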
We used our model to predict the weeks until detection in each US state (y axis in Fig. S6). Because most US states detected their first case by travel (Table S4), we modeled a travel-based detection system similarly to how we modeled the aforementioned detection systems. We simulated a growing stream of imported travel cases (R0^i cases for the ith generation and global R0 = 2.5), and as for the other detection systems, we simulated infection and detection steps for each generation, except that we only allowed travel-associated cases to be detected. We assumed that the state COVID-19 outbreaks had the same values for all epidemiological parameters except for R0, which we allowed to vary by state to account for state-specific conditions. We obtained state-specific R0 values from ref. 65. The values for shared parameters were obtained from literature (Table S3). We used a detection delay of 12 days (5-day incubation period (ref. 47) plus 7-day test and reporting turnaround in early 2020 in the US (ref. 66)) because many first cases were detected following symptoms. The only parameter we were unable to precisely estimate from literature was the probability of a travel case being detected. We noted that this rate was at most the COVID-19 symptomatic rate (0.55; ref. 47) and at least the hospitalization rate (0.03; ref. 47): in the highest-detecting scenario, every symptomatic case would volunteer to be tested; in the lowest-detecting scenario, only hospitalized travel cases would get flagged for testing. So we chose a rate of 0.1, near the two rates’ geometric mean. The predicted detection time for each state (the y-value reported in Fig. S6) was the mean of 100 simulations.
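The parameter choices in this paragraph involve a few small calculations, sketched here for concreteness. All numeric inputs come from the text; the function and variable names are hypothetical.

```python
import math

SYMPTOMATIC_RATE = 0.55      # upper bound on the travel-case detection probability
HOSPITALIZATION_RATE = 0.03  # lower bound on the travel-case detection probability

# Geometric mean of the two bounds; the text rounds this to a rate of 0.1
geometric_mean = math.sqrt(SYMPTOMATIC_RATE * HOSPITALIZATION_RATE)

# Detection delay: incubation period plus test and reporting turnaround, in days
detection_delay_days = 5 + 7


def imported_cases(generation, global_r0=2.5):
    """Size of the imported travel-case stream in a given generation (R0^i)."""
    return global_r0 ** generation


print(round(geometric_mean, 3), detection_delay_days, imported_cases(2))
```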
We compared these predictions to ground truth estimates in each state (x axis in Fig. S6). These ground truth estimates were calculated by summing the aforementioned weekly case counts from the first week of January 2020 until the date of detection in that state (Fig. S7).
Early detection’s impact on COVID-19 detection in Wuhan
We used our model to examine whether the early-detection systems could have detected COVID-19 earlier than in the actual pandemic. To do this, we used two data sources: (1) literature estimates of total (tested and untested) COVID-19 case counts in late 2019 and early 2020 (ref. 56) and (2) simulation output from our model. We then used (1) to calculate the cumulative number of cases when COVID-19 was actually detected, and compared this to results from (2).
For (1), we chose to use estimates from ref. 56, which quantifies both the time of SARS-CoV-2 introduction into humans and the time series of cases following said introduction. These estimates are based on phylodynamic rooting methods applied to SARS-CoV-2 sequence data, combined with epidemic simulations and accounting for epidemiological data on the first known cases of COVID-19. These estimates improve upon previous attempts to time SARS-CoV-2’s introduction into humans, which were based solely on phylodynamic rooting methods to quantify the time to the most recent common ancestor of SARS-CoV-2 sequences67.
As instructed by ref. 56, we used ‘BEAST.primary.IH.Dec10_16.linB.Dec15_25.linA.cumulativeInfections.timedGEMF_combined.stats.pickle’ from GitHub (ref. 68) to obtain the distribution of daily case counts. Based on the fact that there were at least six COVID-19-related hospitalizations by 2019-12-29 (ref. 69), we narrowed the distribution to those epidemic simulations with the top 25 percent of hospitalizations and case counts. We simulated 100 draws from this distribution, and then took the number of cases on 2019-12-29 in each simulation to get 100 values for the distribution of cumulative cases at detection in the actual pandemic (‘Actual pandemic’ boxplot in Fig. 1b). We chose 2019-12-29 as the date that COVID-19 was detected in the actual pandemic, because this was the date of the first report of an outbreak of pneumonia cases to health authorities in Wuhan (ref. 70).
For (2), we ran our model for COVID-19 (see Table S3 for the epidemiological parameters used) and all three detection systems (100 simulations for each system). For each detection system, this gave us the estimated number of cases until the detection of COVID-19 if that system had been in place at the start of the pandemic. We assumed the system was present in the community in which COVID-19 originated. We compared each system to the actual pandemic, and determined that detection could have occurred earlier with the system if there was a statistically significant difference in cases until detection between the actual pandemic and the simulated world with the system (Fig. 1b). Statistical significance was assessed by a 1-sided t test in which the alternative hypothesis was that the detection system performed better.
We could empirically test our model predictions for the cases until wastewater detection by using literature-estimated total COVID-19 cases in Massachusetts (ref. 54) and Massachusetts wastewater SARS-CoV-2 data (ref. 57) in early 2020. We aimed to use these to estimate the cases until COVID-19 wastewater detection in Massachusetts in early 2020. However, Massachusetts wastewater sampling for COVID-19 started only after the Massachusetts outbreak was underway, so wastewater samples were already positive for SARS-CoV-2 on the first day of testing. This first day of testing was therefore later than when wastewater detection could have caught SARS-CoV-2 if it had been in place in advance. Thus, we could only calculate an upper bound on the true cases until detection. We used the wastewater time series from the Massachusetts Water Resources Authority (MWRA) website and synced it with the COVID-19 case count time series (Fig. S8). We multiplied the Massachusetts statewide cases by 0.33 (equal to 2,300,000/6,900,000) because the MWRA data covers 2,300,000 people, out of 6,900,000 people in Massachusetts in 2020. We then summed these case counts up to the date of apparent wastewater detection to get an upper bound for cases at detection, and checked whether our model prediction was consistent with this bound.
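The upper-bound calculation amounts to a population scaling followed by a cumulative sum. In this sketch the function name is hypothetical, the weekly counts are made-up placeholders rather than the ref. 54 estimates, and including the detection week itself in the sum is an assumption.

```python
def upper_bound_cases(statewide_weekly_cases, detection_week_index,
                      covered_population=2_300_000, state_population=6_900_000):
    """Upper bound on cases in the sewershed up to apparent wastewater detection.

    Scales statewide cases by the share of the state covered by the wastewater
    data, then sums through the detection week (inclusive, by assumption).
    """
    coverage = covered_population / state_population  # about 0.33
    cases_through_detection = statewide_weekly_cases[:detection_week_index + 1]
    return coverage * sum(cases_through_detection)


# Placeholder weekly statewide case counts (illustrative only)
weekly = [300, 600, 900, 1200]
print(upper_bound_cases(weekly, detection_week_index=2))
```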
Simulated versus mathematically approximate detection times
We compared the model simulations of cases until detection with our derived mathematical formula, Eq. (1) (Fig. S9). The points in Fig. S9 are the same as in Fig. 2a. The dashed lines are generated by plugging values into Eq. (1) for each detection system: we plugged the detection threshold, detection probability, outbreak R0, and detection delay (measured in number of generations, i.e., serial intervals) into the formula’s threshold d, per-case detection probability, R0, and delay g, respectively.
Comparison of detection systems for different diseases
We applied our model to several outbreaks of recent interest: COVID-19, mpox (2022), polio (2013–2014), Ebola (2013–2016) and flu (2009 pandemic) (Fig. 2a). Because data on the number of cases at the time of detection are lacking for previous outbreaks (except for the COVID-19 data used in Fig. 1b), we used our model to estimate status quo detection times for these outbreaks. Because many recent outbreaks have been detected in healthcare settings59,69,71,72, we assumed status quo detection was similar to hospital monitoring, except with a lower detection probability per case. The lower probability reflects that, without the proposed systematic, proactive testing scheme, symptomatic cases are less likely to be tested for a panel of diseases. The per-case detection probability for status quo was set to 0.67 times that of hospital monitoring so that our modeled status quo detection times for COVID-19 matched those estimated independently by ref. 56 (Fig. 1b).
Software
Analyses and figures were generated using the code at https://github.com/abliu/early-detection/releases and tidyverse (v1.3.1).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
We gratefully acknowledge Mauricio Santillana, Nicholas B. Link, Fred S. Lu, and Andre T. Nguyen for sharing their estimates on COVID-19 incidence in the US states. We also acknowledge Jonathan Pekar, Joel Wertheim, and Michael Worobey for sharing their estimates on COVID-19 incidence in late 2019 and early 2020. We acknowledge Michael McLaren, Becky Ward, and Quincey Justman for feedback on the manuscript. We thank the following for their financial support: Lynch Foundation Fellows Program in Systems Biology at Harvard Medical School (A.B.L.), National Library of Medicine grant T15LM007092 (D.L.), National Institutes of Health grant R01GM120122 (A.P.J.), CDC contract 200-2016-91779 (W.P.H.), and National Institutes of Health grant R01GM120122 (M.S.). This publication was also made possible by the New England Pathogen Genomics Center of Excellence (Cooperative Agreement NU50CK000629). This project has been funded (in part) by contract 200-2016-91779 with the Centers for Disease Control and Prevention. Disclaimer: The findings, conclusions, and views expressed are those of the author(s) and do not necessarily represent the official position of the Centers for Disease Control and Prevention (CDC).
Author contributions
A.B.L. and M.S. conceived of the study. A.B.L., A.P.J., and M.S. developed the methodology. A.B.L. wrote the software and validated the results. A.B.L. and D.L. performed the mathematical analysis. A.B.L. gathered data on epidemiological parameters, wastewater sensitivities, case counts, and dates of detection. A.B.L. and M.S. wrote the original draft. A.B.L., D.L., A.P.J., W.P.H., and M.S. reviewed and edited the manuscript. A.B.L. and M.S. developed the figures. A.B.L., W.P.H. and M.S. supervised the study. M.S. acquired funding for the study.
Peer review
Peer review information
Nature Communications thanks Matthew Wade and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Data availability
Detection time data are available at https://github.com/abliu/early-detection/releases. Estimated COVID-19 case counts in late 2019 and early 2020 are available from Pekar et al.56 (https://github.com/sars-cov-2-origins/multi-introduction), and the US wastewater threshold data are from Wu et al.53 (https://www.sciencedirect.com/science/article/pii/S0043135421005984?via%3Dihub#sec0019). In the supplementary analyses, the Massachusetts Water Resources Authority wastewater data are from https://www.mwra.com/biobot/biobotdata.htm, national COVID-19 case counts in early 2020 are from the Johns Hopkins Center for Systems Science and Engineering 2023 (https://github.com/CSSEGISandData/COVID-19), and US state COVID-19 case counts in early 2020 are from Lu et al.54 (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008994).
Code availability
Code is available at https://github.com/abliu/early-detection/releases (ref. 73).
Competing interests
W.P.H. is a member of the scientific advisory board and has stock options in BioBot Analytics. M.S. is a cofounder of Rhinostics and consults for the diagnostic consulting company Vectis Solutions LLC. The other authors declare that they have no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Andrew Bo Liu, Email: andrewliu@g.harvard.edu.
Michael Springer, Email: michael_springer@hms.harvard.edu.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-023-44199-7.
References
- 1. Kraemer MUG, et al. The effect of human mobility and control measures on the COVID-19 epidemic in China. Science. 2020;368:493–497. doi: 10.1126/science.abb4218.
- 2. Meyerowitz-Katz G, et al. Is the cure really worse than the disease? The health impacts of lockdowns during COVID-19. BMJ Glob. Health. 2021;6:e006653. doi: 10.1136/bmjgh-2021-006653.
- 3. Yamey G, Walensky RP. Covid-19: Re-opening universities is high risk. BMJ. 2020;370:m3365. doi: 10.1136/bmj.m3365.
- 4. Brauner JM, et al. Inferring the effectiveness of government interventions against COVID-19. Science. 2021;371:eabd9338. doi: 10.1126/science.abd9338.
- 5. Ferguson, N. M. et al. Report 9 - Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand. https://www.imperial.ac.uk/medicine/departments/school-public-health/infectious-disease-epidemiology/mrc-global-infectious-disease-analysis/covid-19/report-9-impact-of-npis-on-covid-19/ (2020).
- 6. Levine-Tiefenbrun M, et al. SARS-CoV-2 RT-qPCR test detection rates are associated with patient age, sex, and time since diagnosis. J. Mol. Diagnostics. 2022;24:112–119. doi: 10.1016/j.jmoldx.2021.10.010.
- 7. Larremore, D. B. et al. Test sensitivity is secondary to frequency and turnaround time for COVID-19 screening. Sci. Adv. eabd5393 10.1126/sciadv.abd5393 (2020).
- 8. Liu AB, et al. Association of COVID-19 quarantine duration and postquarantine transmission risk in 4 university cohorts. JAMA Netw. Open. 2022;5:e220088. doi: 10.1001/jamanetworkopen.2022.0088.
- 9. Wyllie AL, et al. Saliva or nasopharyngeal swab specimens for detection of SARS-CoV-2. N. Engl. J. Med. 2020;383:1283–1286. doi: 10.1056/NEJMc2016359.
- 10. Yelin I, et al. Evaluation of COVID-19 RT-qPCR test in multi sample pools. Clin. Infect. Dis. 2020;71:2073–2078. doi: 10.1093/cid/ciaa531.
- 11. Karin, O. et al. Cyclic exit strategies to suppress COVID-19 and allow economic activity. 10.1101/2020.04.04.20053579 (2020).
- 12. Flaxman S, et al. Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe. Nature. 2020;584:257–261. doi: 10.1038/s41586-020-2405-7.
- 13. Hatchett RJ, Mecher CE, Lipsitch M. Public health interventions and epidemic intensity during the 1918 influenza pandemic. Proc. Natl Acad. Sci. 2007;104:7582–7587. doi: 10.1073/pnas.0610941104.
- 14. Peak CM, Childs LM, Grad YH, Buckee CO. Comparing nonpharmaceutical interventions for containing emerging epidemics. Proc. Natl Acad. Sci. 2017;114:4023–4028. doi: 10.1073/pnas.1616438114.
- 15. Pei S, Kandula S, Shaman J. Differential effects of intervention timing on COVID-19 spread in the United States. Sci. Adv. 2020;6:eabd6370. doi: 10.1126/sciadv.abd6370.
- 16. Bureau of the Intergovernmental Negotiating Body, World Health Organization. Conceptual zero draft for the consideration of the Intergovernmental Negotiating Body at its third meeting. 32 (2022).
- 17. Lander, E. & Sullivan, J. American pandemic preparedness: transforming our capabilities. 27 (2021).
- 18. Botti-Lodovico Y, et al. The origins and future of sentinel: an early-warning system for pandemic preemption and response. Viruses. 2021;13:1605. doi: 10.3390/v13081605.
- 19. Ecker, D. J. How to snuff out the next pandemic. Scientific American Blog Network (2020).
- 20. Peccia J, et al. Measurement of SARS-CoV-2 RNA in wastewater tracks community infection dynamics. Nat. Biotechnol. 2020;38:1164–1167. doi: 10.1038/s41587-020-0684-z.
- 21. The Nucleic Acid Observatory Consortium. A global nucleic acid observatory for biodefense and planetary health. arXiv https://arxiv.org/abs/2108.02678 (2021).
- 22. Hjelmsø MH, et al. Metagenomic analysis of viruses in toilet waste from long distance flights—a new procedure for global infectious disease surveillance. PLoS One. 2019;14:e0210368. doi: 10.1371/journal.pone.0210368.
- 23. Muntean, J., Howard, K. & Atwood, P. CDC has tested wastewater from aircraft amid concerns over Covid-19 surge in China. CNN (2023).
- 24. Nordahl Petersen T, et al. Meta-genomic analysis of toilet waste from long distance flights; a step towards global surveillance of infectious diseases and antimicrobial resistance. Sci. Rep. 2015;5:11444. doi: 10.1038/srep11444.
- 25. Mina MJ, et al. A global immunological observatory to meet a time of pandemics. eLife. 2020;9:e58989. doi: 10.7554/eLife.58989.
- 26. Chu HY, et al. Early detection of Covid-19 through a citywide pandemic surveillance platform. N. Engl. J. Med. 2020;383:185–187. doi: 10.1056/NEJMc2008646.
- 27. Brownstein JS, Freifeld CC, Madoff LC. Digital Disease Detection — Harnessing the Web for Public Health Surveillance. N. Engl. J. Med. 2009;360:2153–2157. doi: 10.1056/NEJMp0900702.
- 28. Dugas AF, et al. Influenza forecasting with google flu trends. PLOS One. 2013;8:e56176. doi: 10.1371/journal.pone.0056176.
- 29. Bajema, N., Beaver, W. & Parthemore, C. Toward a global pathogen early warning system: building on the landscape of biosurveillance today. https://councilonstrategicrisks.org/wp-content/uploads/2021/07/Toward-A-Global-Pathogen-Early-Warning-System_2021_07_20-1.pdf (2021).
- 30. Lee, V. J., Chiew, C. J. & Khong, W. X. Interrupting transmission of COVID-19: lessons from containment efforts in Singapore. J. Travel Med. 27, taaa039 (2020).
- 31. Keshaviah A, et al. Wastewater monitoring can anchor global disease surveillance systems. Lancet Glob. Health. 2023;11:e976–e981. doi: 10.1016/S2214-109X(23)00170-5.
- 32. Petros, B. A. et al. Multimodal surveillance of SARS-CoV-2 at a university enables development of a robust outbreak response framework. Med 3, 883–900.e13 (2022).
- 33. Boehm AB, et al. Regional replacement of SARS-CoV-2 variant omicron BA.1 with BA.2 as observed through wastewater surveillance. Environ. Sci. Technol. Lett. 2022;9:575–580. doi: 10.1021/acs.estlett.2c00266.
- 34. Kim JE, Lee JH, Lee H, Moon SJ, Nam EW. COVID-19 screening center models in South Korea. J. Public Health Policy. 2021;42:15–26. doi: 10.1057/s41271-020-00258-7.
- 35. Meakin S, et al. Comparative assessment of methods for short-term forecasts of COVID-19 hospital admissions in England at the local level. BMC Med. 2022;20:86. doi: 10.1186/s12916-022-02271-x.
- 36. Arizti-Sanz J, et al. Streamlined inactivation, amplification, and Cas13-based detection of SARS-CoV-2. Nat. Commun. 2020;11:5921. doi: 10.1038/s41467-020-19097-x.
- 37. Gregory DA, Wieberg CG, Wenzel J, Lin C-H, Johnson MC. Monitoring SARS-CoV-2 populations in wastewater by amplicon sequencing and using the novel program SAM refiner. Viruses. 2021;13:1647. doi: 10.3390/v13081647.
- 38. Li, C. et al. Population normalization in SARS-CoV-2 wastewater-based epidemiology: implications from statewide wastewater monitoring in Missouri. 10.1101/2022.09.08.22279459 (2022).
- 39. Robinson CA, et al. Defining biological and biophysical properties of SARS-CoV-2 genetic material in wastewater. Sci. Total Environ. 2022;807:150786. doi: 10.1016/j.scitotenv.2021.150786.
- 40. Bibby K, Peccia J. Identification of viral pathogen diversity in sewage sludge by metagenome analysis. Environ. Sci. Technol. 2013;47:1945–1951. doi: 10.1021/es305181x.
- 41. Creager HM, et al. Clinical evaluation of the BioFire® Respiratory Panel 2.1 and detection of SARS-CoV-2. J. Clin. Virol. 2020;129:104538. doi: 10.1016/j.jcv.2020.104538.
- 42. Edin A, Eilers H, Allard A. Evaluation of the biofire filmarray pneumonia panel plus for lower respiratory tract infections. Infect. Dis. 2020;52:479–488. doi: 10.1080/23744235.2020.1755053.
- 43. Murphy CN, et al. Multicenter evaluation of the biofire filmarray pneumonia/pneumonia plus panel for detection and quantification of agents of lower respiratory tract infection. J. Clin. Microbiol. 2020;58:e00128–20. doi: 10.1128/JCM.00128-20.
- 44. Quick J, et al. Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples. Nat. Protoc. 2017;12:1261–1276. doi: 10.1038/nprot.2017.066.
- 45. Ackerman CM, et al. Massively multiplexed nucleic acid detection with Cas13. Nature. 2020;582:277–282. doi: 10.1038/s41586-020-2279-8.
- 46. Chiu CY, Miller SA. Clinical metagenomics. Nat. Rev. Genet. 2019;20:341–355. doi: 10.1038/s41576-019-0113-7.
- 47. Bar-On, Y. M., Sender, R., Flamholz, A. I., Phillips, R. & Milo, R. A quantitative compendium of COVID-19 epidemiology. https://arxiv.org/abs/2006.01283 (2020).
- 48. Du, Z. et al. Reproduction number of monkeypox in the early stage of the 2022 multi-country outbreak. J. Travel Med. taac099 10.1093/jtm/taac099 (2022).
- 49. U.S. Centers for Disease Control and Prevention (CDC). United States confirmed as country with circulating vaccine-derived poliovirus. CDC (2022).
- 50. Bradshaw WJ, Alley EC, Huggins JH, Lloyd AL, Esvelt KM. Bidirectional contact tracing could dramatically improve COVID-19 control. Nat. Commun. 2021;12:232. doi: 10.1038/s41467-020-20325-7.
- 51. Hellewell J, et al. Feasibility of controlling COVID-19 outbreaks by isolation of cases and contacts. Lancet Glob. Health. 2020;8:e488–e496. doi: 10.1016/S2214-109X(20)30074-7.
- 52. Lloyd-Smith JO, Schreiber SJ, Kopp PE, Getz WM. Superspreading and the effect of individual variation on disease emergence. Nature. 2005;438:355–359. doi: 10.1038/nature04153.
- 53. Wu F, et al. Wastewater surveillance of SARS-CoV-2 across 40 U.S. States from February to June 2020. Water Res. 2021;202:117400. doi: 10.1016/j.watres.2021.117400.
- 54. Lu FS, et al. Estimating the cumulative incidence of COVID-19 in the United States using influenza surveillance, virologic testing, and mortality data: four complementary approaches. PLOS Comput. Biol. 2021;17:e1008994. doi: 10.1371/journal.pcbi.1008994.
- 55. Abbott, S. et al. Epiforecasts/EpiNow2: 1.3.4 release. 10.5281/zenodo.7611804 (2023).
- 56. Pekar JE, et al. The molecular epidemiology of multiple zoonotic origins of SARS-CoV-2. Science. 2022;377:960–966. doi: 10.1126/science.abp8337.
- 57. Massachusetts Water Resources Authority. MWRA–Wastewater COVID-19 Tracking. (2022).
- 58. Adhikari S, Halden RU. Opportunities and limits of wastewater-based epidemiology for tracking global health and attainment of UN sustainable development goals. Environ. Int. 2022;163:107217. doi: 10.1016/j.envint.2022.107217.
- 59. Sack, K., Fink, S., Belluck, P. & Nossiter, A. How ebola roared back. The New York Times (2014).
- 60. Brouwer AF, et al. Epidemiology of the silent polio outbreak in Rahat, Israel, based on modeling of environmental surveillance data. Proc. Natl Acad. Sci. 2018;115:E10625–E10633. doi: 10.1073/pnas.1808798115.
- 61. Wu F, et al. SARS-CoV-2 RNA concentrations in wastewater foreshadow dynamics and clinical presentation of new COVID-19 cases. Sci. Total Environ. 2022;805:150121. doi: 10.1016/j.scitotenv.2021.150121.
- 62. Soller J, et al. Modeling infection from SARS-CoV-2 wastewater concentrations: promise, limitations, and future directions. J. Water Health. 2022;20:1197–1211. doi: 10.2166/wh.2022.094.
- 63. Jones DL, et al. Shedding of SARS-CoV-2 in feces and urine and its potential role in person-to-person transmission and the environment-based spread of COVID-19. Sci. Total Environ. 2020;749:141364. doi: 10.1016/j.scitotenv.2020.141364.
- 64. Biobot Analytics. The effect of septic systems on wastewater-based epidemiology. http://biobot.io/wp-content/uploads/2022/09/BIOBOT_WHITEPAPER_EFFECT_OF_SEPTIC_V01-1.pdf (2022).
- 65. Mallela A, et al. Bayesian inference of state-level COVID-19 basic reproduction numbers across the United States. Viruses. 2022;14:157. doi: 10.3390/v14010157.
- 66. Goldberg, C. Mass. Public health lab can now test for new coronavirus, Speeding Results. https://www.wbur.org/news/2020/02/28/mass-public-health-coronavirus-testing (2020).
- 67. Pekar J, Worobey M, Moshiri N, Scheffler K, Wertheim JO. Timing the SARS-CoV-2 index case in Hubei province. Science. 2021;372:412–417. doi: 10.1126/science.abf8003.
- 68. Pekar, J. E. et al. Sars-cov-2-origins/multi-introduction. GitHub https://github.com/sars-cov-2-origins/multi-introduction (2022).
- 69. Yuan, Y., Yujie, M., Jialu, Z. & Wenkun, H. Xinhua headlines: Chinese doctor recalls first encounter with mysterious virus - Xinhua English.news.cn. Xinhua (2020).
- 70. The 2019-nCoV Outbreak Joint Field Epidemiology Investigation Team. Li Q. An outbreak of NCIP (2019-nCoV) infection in China — Wuhan, Hubei Province, 2019−2020. China CDC Wkly. 2020;2:79–80. doi: 10.46234/ccdcw2020.022.
- 71. Rosenthal, E. The SARS epidemic: the path; From China’s Provinces, a Crafty Germ Breaks Out. The New York Times (2003).
- 72. Garrett, L. The coming plague: newly emerging diseases in a world out of balance. (1994).
- 73. Liu, A. B., Lee, D., Jalihal, A. P., Hanage, W. P. & Springer, M. Code and data for “Quantitatively assessing early detection strategies for mitigating COVID-19 and future pandemics.” Repository abliu/early-detection. 10.5281/zenodo.10145998 (2023).