2020 Sep 7;220(1):106–129. doi: 10.1016/j.jeconom.2020.07.047

Estimating the fraction of unreported infections in epidemics with a known epicenter: An application to COVID-19

Ali Hortaçsu, Jiarui Liu, Timothy Schwieg
PMCID: PMC7476454  PMID: 32921876

Abstract

We develop an analytically tractable method to estimate the fraction of unreported infections in epidemics with a known epicenter and estimate the number of unreported COVID-19 infections in the U.S. during the first half of March 2020. Our method utilizes the covariation in initial reported infections across U.S. regions and the number of travelers to these regions from the epicenter, along with the results of an early randomized testing study in Iceland. Using our estimates of the number of unreported infections, which are substantially larger than the number of reported infections, we also provide estimates for the infection fatality rate using data on reported COVID-19 fatalities from U.S. counties.

1. Introduction

The global pandemic COVID-19 is here in the United States. The number of confirmed cases is rising rapidly, reaching 398,809 as of April 7 with 12,895 reported deaths. The coronavirus outbreak was declared a national emergency on March 1.1 More than half of U.S. states have imposed some level of lockdown measures.2 In addition to the public health crisis, the country is certainly looking at a deep and possibly long-lasting economic recession, according to Ben Bernanke and Janet Yellen in a recent Financial Times article.3

Given the severity of current conditions, a basic yet important question remains to be answered: How many people are actually infected with COVID-19 in the U.S. and what is the true fatality rate? Because of the shortage of testing kits, hospitals and disease control centers were only able to test the subsample of people with severe symptoms or travel history. The number of reported infections, especially early on in the course of the pandemic, is likely much lower than the actual number of infections in the U.S. Indeed, unreported infections may go unrecognized because those infected often experience mild or no symptoms (Nishiura et al., 2020a, Andrei, 2020). If not hospitalized or quarantined, these individuals can infect a large proportion of the population. Thus, estimating the number of unreported infections can inform policy-makers about the proper scale of virus control policies (Alvarez et al., 2020, Eichenbaum et al., 2020). These estimates can also help to assess the effectiveness of public health policies such as social distancing in slowing the spread of the epidemic.

Estimating the number of unreported infections may also give a more accurate measure of the infection fatality rate (IFR). The widely reported case fatality rate (CFR) is the rate of fatalities among reported cases of infection. The infection fatality rate is the proportion of those actually infected who die, not of those reported or confirmed infected. The reported case fatality rate is likely an overestimate of the true infection fatality rate, due to selection bias in testing.

Ideally, a randomized testing experiment will give an unbiased estimate of the IFR. However, given the limited supply of testing kits and surging demand by people with symptoms, randomized testing may be infeasible, especially in the early periods of the outbreak. Therefore, it may be of great value to estimate the fraction of unreported infections with observational data at hand. With that knowledge, policy-makers will be better equipped to assess the proper level and duration of virus control policies.

In this paper we develop an analytically tractable method that utilizes data on travel patterns to identify and estimate the fraction of unreported infections for situations where the epidemic has a known epicenter. Our methodological strategy, described in Section 3, exploits the covariation between the number of initial reported infections in locations away from the epicenter, and the number of travelers from the epicenter to these locations. While we do not see our method as a substitute for the “gold-standard” of well defined randomized/universal testing studies, we believe our method can be useful towards providing estimates of unreported infections when results of randomized testing studies are not available for a given location of interest.

To begin illustrating the idea, consider a time period when the epicenter is the only location with infections, and that the only way another city/country can be infected is through travelers. Also assume, as in Section 3.1, that any infected travelers can only come from the unreported infected population in the epicenter — an assumption we find reasonable (as reported infected individuals would not be allowed to travel), but are able to relax in Section 3.2. Suppose now the hypothetical situation where we know the reporting rate of infections in the epicenter (the fraction of reported infections to the true number of infections), and that we know the number of travelers from the epicenter to another city/country. Assuming travelers resemble the population of the epicenter, we can calculate the expected number of infected (but unreported) travelers entering other cities/countries. Assuming further that we know the rate of transmission of the disease, we can then calculate the expected number of infections these travelers will have generated in these locations. Comparing the expected number of infections that arise from travelers to reported cases of the infection, we can estimate the reporting rate.

What can we do in the realistic case if the reporting rate in the epicenter is unknown? In Section 3.1, we propose the following: suppose we make the assumption that the reporting rate at the epicenter and the previously uninfected city/country are the same (or directly proportional). We can then start with a guess on the unknown rate of reporting at the epicenter, which allows us to calculate the implied reporting rate at the previously uninfected city/country, and check whether these are equal (or satisfy the proportion). If not, we update our guess, and try again. In other words, we can solve for the reporting rate(s) balancing the expected number of infections from travel and the number of infected that are being reported in both locations.

While the above strategy, outlined in Section 3.1, is in principle implementable using only data on travel patterns and reported cases, it is crucially dependent on the assumption that reporting rates are the same across the epicenter and destination locations (or at least proportional). Moreover, its results are very sensitive to knowing the transmission rate of the infection from travelers, as this allows us to project the number of infections in the destination city/country of interest. However, suppose now that we have access to the reporting rate of infections from another destination city/country, e.g. through universal or randomized testing, as has been done in Iceland.4 This allows us to estimate how infectious the travelers from the epicenter are. Assuming that this transmission rate from travelers is the same as the transmission rate at the destination city/country of interest, we can then calculate the expected number of infections we would expect from travel. Intuitively, the ratio between number of travelers to two destination cities/countries from the epicenter should tell us the ratio of total infections between the two cities/countries. Randomized or universal testing at one of the destinations, Iceland in our case, will give us its number of total infections, so total infections at the other destination can be computed. This strategy is laid out in detail in Section 3.2.

We would like to be very upfront that the estimation strategies outlined above are dependent on strong assumptions and reliable data on travel patterns, and that any results are very sensitive to these assumptions. However, our hope is that our approach is clear in terms of its assumptions and its corresponding limitations; we hope that future research can improve upon these limitations. We have attempted to account for some of the limitations. For example, in Section 3.3, we discuss how to correct for the fact that infections are often reported with a delay, as there is a delay to the outset of symptoms that are often a prerequisite for testing for the infection, as well as a delay in laboratory testing. Another important limitation is the assumption that the city/country with randomized test results has the same transmission rate as the city/country of interest. Section 5.4 discusses what may be done to address violations of this assumption.

Our data consists of detailed daily reported infections/deaths for all infected U.S. counties and Iceland collected by Johns Hopkins University of Medicine Coronavirus Resource Center from January 22 to April 13, 2020; international travel data to U.S. in January and February 2020 from I-94 travels data by National Travel and Tourism Office; international travel data to Iceland by Icelandic Tourist Board in January and February 2020.

Our model generates a range of estimates that depend on the traveler data that is incorporated, the date range considered, and assumptions regarding the lags associated with reported case data. We report this range of estimates in Table 3. Across these estimates, we find that 4% to 14% of cases were reported across the U.S. up to March 16, when social distancing measures began to be applied in major metropolitan areas and travel declined significantly (Thompson et al., 2020). This estimate assumes that cases are reported with a lag of 8 days as in Table 3(a); that is, we do not treat reported cases of today as the appropriate measure of true infections today.5,6 This suggests that for each case reported in late February/early March, between 6 and 24 cases remained unreported (after accounting for an 8 day lag from infection to reporting of a case).
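The conversion from a reporting rate to "unreported cases per reported case" can be checked with a few lines of Python. This is only an illustrative sketch: the function name `unreported_per_reported` is our own label, and the formula used is the ratio (1 − α)/α of unreported to reported infections.

```python
def unreported_per_reported(alpha: float) -> float:
    """Unreported infections per reported infection, given reporting rate alpha."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return (1.0 - alpha) / alpha


# Endpoints of the 4% to 14% range of reporting rates quoted in the text.
low = unreported_per_reported(0.14)   # roughly 6 unreported per reported case
high = unreported_per_reported(0.04)  # 24 unreported per reported case
print(round(low, 1), round(high, 1))
```

A lower reporting rate maps to a larger number of unreported cases, which is why the 4% end of the range corresponds to the figure of 24.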

Table 3. Mean fraction of unreported infections with different cutoffs.

(a) 8 day reporting lag (rows: $T_1$ date; columns: $T_0$ date)

| $T_1$ date | Feb 1 | Feb 5 | Feb 10 | Feb 15 | Feb 20 | Feb 25 | Feb 29 |
|---|---|---|---|---|---|---|---|
| Mar 9 | 13.71% | 13.75% | 13.73% | 13.54% | 13.35% | 13.05% | 12.48% |
| Mar 10 | 9.35% | 9.38% | 9.37% | 9.24% | 9.11% | 8.92% | 8.61% |
| Mar 11 | 8.76% | 8.79% | 8.78% | 8.66% | 8.55% | 8.40% | 8.17% |
| Mar 12 | 3.83% | 3.85% | 3.84% | 3.78% | 3.72% | 3.64% | 3.53% |
| Mar 13 | 4.44% | 4.46% | 4.46% | 4.38% | 4.31% | 4.24% | 4.13% |
| Mar 14 | 5.58% | 5.61% | 5.60% | 5.51% | 5.43% | 5.34% | 5.22% |
| Mar 15 | 4.94% | 4.97% | 4.96% | 4.88% | 4.81% | 4.74% | 4.64% |
| Mar 16 | 7.96% | 8.00% | 7.99% | 7.86% | 7.76% | 7.65% | 7.52% |
| Mar 17 | 11.71% | 11.78% | 11.76% | 11.57% | 11.43% | 11.29% | 11.11% |
| Mar 18 | 20.63% | 20.74% | 20.72% | 20.40% | 20.17% | 19.94% | 19.66% |
| Mar 19 | 44.76% | 44.93% | 44.90% | 44.42% | 44.07% | 43.74% | 43.33% |

(b) 5 day reporting lag (rows: $T_1$ date; columns: $T_0$ date)

| $T_1$ date | Feb 1 | Feb 5 | Feb 10 | Feb 15 | Feb 20 | Feb 25 | Feb 29 |
|---|---|---|---|---|---|---|---|
| Mar 6 | 9.64% | 9.65% | 9.67% | 9.61% | 9.53% | 9.40% | 9.05% |
| Mar 7 | 5.22% | 5.23% | 5.24% | 5.20% | 5.16% | 5.09% | 4.94% |
| Mar 8 | 3.05% | 3.05% | 3.06% | 3.03% | 3.01% | 2.98% | 2.90% |
| Mar 9 | 1.59% | 1.59% | 1.60% | 1.59% | 1.58% | 1.56% | 1.53% |
| Mar 10 | 1.66% | 1.66% | 1.67% | 1.66% | 1.65% | 1.63% | 1.60% |
| Mar 11 | 2.26% | 2.26% | 2.27% | 2.25% | 2.24% | 2.22% | 2.19% |
| Mar 12 | 2.01% | 2.02% | 2.02% | 2.01% | 1.99% | 1.97% | 1.94% |
| Mar 13 | 3.05% | 3.06% | 3.07% | 3.04% | 3.02% | 2.99% | 2.95% |
| Mar 14 | 4.18% | 4.19% | 4.21% | 4.17% | 4.14% | 4.11% | 4.06% |
| Mar 15 | 3.62% | 3.63% | 3.64% | 3.61% | 3.59% | 3.56% | 3.52% |
| Mar 16 | 4.73% | 4.74% | 4.76% | 4.72% | 4.69% | 4.66% | 4.62% |
| Mar 17 | 6.67% | 6.68% | 6.71% | 6.66% | 6.62% | 6.58% | 6.52% |
| Mar 18 | 10.75% | 10.77% | 10.81% | 10.73% | 10.67% | 10.61% | 10.53% |
| Mar 19 | 24.61% | 24.63% | 24.70% | 24.58% | 24.48% | 24.40% | 24.28% |

This table displays the value of α for different choices of both $T_0$ and $T_1$. We vary $T_0$ across the month of February, and $T_1$ across early March. Very early March and February dates for $T_1$ are not available since confirmed infections in Iceland only begin on February 28. Travel data is assumed uniform across days throughout and is not reweighted as $T_0$ or $T_1$ change. We include Italy, Spain, Germany, and the United Kingdom as epicenters as well as China.

How do these estimates compare to other estimates in the literature? A very recent study by Bendavid et al. (2020) tested a representative sample of Santa Clara county residents in early April and reports that 48,000–81,000 people are infected as of April 1, whereas only 956 are reported that day. This leads to their reported ratio of 50–85 of total infections to reported infections. Importantly, this calculation does not account for the lag in reporting infections. Our data allows us to compute a similar statistic for San Francisco County, which we report in Table 2; we obtain this by dividing our estimate of the true infected by the reported infected on that day. For March 13, this yields a ratio of 85, which is at the upper bound of the 50–85 range reported by Bendavid et al. (2020).
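The 50–85 range quoted above follows directly from dividing the estimated total infections by the reported count. The short check below uses only the three figures stated in the text; the helper name `infection_ratio` is ours.

```python
def infection_ratio(total_infected: float, reported: float) -> float:
    """Ratio of total (estimated) infections to reported infections."""
    return total_infected / reported


reported_santa_clara = 956  # reported infections on April 1, as quoted in the text
lower = infection_ratio(48_000, reported_santa_clara)
upper = infection_ratio(81_000, reported_santa_clara)
print(round(lower, 1), round(upper, 1))  # about 50.2 and 84.7
```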

Table 2. Estimated county-level α and infection fatality rates. The first U-A/mIFR/cIFR group uses the national α = 4.27%; the second group uses each county's own α.

| County | α | UR | U-A (national α) | mIFR (national α) | cIFR (national α) | U-A (county α) | mIFR (county α) | cIFR (county α) |
|---|---|---|---|---|---|---|---|---|
| Broward, FL | 3.29% | 29.36 | 349 | 0.16% | 0.26% | 453 | 0.13% | 0.20% |
| Clark, NV | 9.84% | 9.16 | 184 | 0.33% | 0.34% | 80 | 0.76% | 0.78% |
| Cook, IL | 11.31% | 7.84 | 321 | 0.21% | 0.22% | 121 | 0.55% | 0.58% |
| Dallas, TX | 3.56% | 27.05 | 278 | 0.35% | 0.40% | 333 | 0.29% | 0.34% |
| Essex, NJ | 0.65% | 153.60 | 834 | 0.38% | 0.28% | 5514 | 0.06% | 0.04% |
| Fulton, GA | 2.59% | 37.59 | 290 | 0.28% | 0.52% | 478 | 0.17% | 0.31% |
| Harris, TX | 3.28% | 29.50 | 177 | 0.15% | 0.13% | 230 | 0.11% | 0.10% |
| Hillsborough, FL | 6.70% | 13.93 | 367 | 0.11% | 0.18% | 234 | 0.17% | 0.29% |
| Honolulu, HI | 0.29% | 342.85 | 328 | 0.02% | <.01% | 4814 | <.01% | <.01% |
| Los Angeles, CA | 3.20% | 30.26 | 171 | 0.37% | 0.38% | 228 | 0.28% | 0.28% |
| Maricopa, AZ | 11.65% | 7.59 | 382 | 0.33% | 0.44% | 140 | 0.89% | 1.19% |
| Miami-Dade, FL | 0.13% | 776.68 | 1977 | 0.04% | 0.05% | 65714 | <.01% | <.01% |
| Multnomah, OR | 7.08% | 13.12 | 421 | 0.46% | 0.47% | 254 | 0.77% | 0.79% |
| New York City, NY | 9.80% | 9.21 | 1144 | 0.66% | 0.78% | 499 | 1.51% | 1.79% |
| Philadelphia, PA | 5.10% | 18.61 | 663 | 0.17% | 0.25% | 556 | 0.20% | 0.30% |
| Ramsey, MN | 6.58% | 14.20 | 133 | 0.42% | 0.50% | 86 | 0.65% | 0.77% |
| San Diego, CA | 18.79% | 4.32 | 371 | 0.14% | 0.20% | 84 | 0.60% | 0.89% |
| San Francisco, CA | 3.60% | 26.76 | 85 | 0.14% | 0.20% | 101 | 0.12% | 0.17% |
| Suffolk, MA | 10.46% | 8.56 | 97 | 0.11% | 0.12% | 40 | 0.28% | 0.29% |
| Wayne, MI | 1.48% | 66.63 | 1193 | 1.81% | 1.93% | 3449 | 0.63% | 0.67% |
| Median | 3.60% | 26.76 | 328 | 0.28% | 0.28% | 254 | 0.28% | 0.31% |

We estimate each county's death rate on March 13 using several measures of both the infection fatality rate and the estimated number infected. α is the estimated fraction of reported infections for each county, accounting for an 8-day lag. $\mathrm{UR} = \frac{1-\alpha}{\alpha}$ gives the ratio of unreported to reported infections for that county, again accounting for the fact that observed reported infections have an 8-day lag. U-A gives the under-ascertainment rate on Mar 13, given by the total infected on Mar 13 (accounting for lag) divided by the reported infected on Mar 13. mIFR matches cohorts of infected using a log-normal fatality lag distribution to determine the death rate, and cIFR compares cumulative deaths 15 days later. All of these calculations are based on estimating the number of total infected on March 13 as the reported infected on March 21 divided by α, to account for an 8-day reporting lag. National α is taken from Table 1 Full EU travel. County α uses each county's individually computed α rather than the nationally computed α. <.01% indicates positive numbers that round down to 0.00%.
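The two ratios defined in the note to Table 2 can be sketched as follows. The function names and the county counts (10 reported on March 13, 40 reported on March 21) are hypothetical and chosen only for illustration; the α values are taken from Table 2.

```python
def unreported_to_reported(alpha: float) -> float:
    """UR: ratio of unreported to reported infections, (1 - alpha) / alpha."""
    return (1.0 - alpha) / alpha


def under_ascertainment(reported_today: float,
                        reported_after_lag: float,
                        alpha: float) -> float:
    """U-A: estimated total infected today divided by reported infected today.

    Total infected today is approximated by the reported count one reporting
    lag later (8 days in Table 2) divided by alpha.
    """
    total_today = reported_after_lag / alpha
    return total_today / reported_today


# UR for Clark, NV (alpha = 9.84% in Table 2).
ur_clark = unreported_to_reported(0.0984)

# U-A with hypothetical counts and the national alpha of 4.27%.
ua_example = under_ascertainment(reported_today=10,
                                 reported_after_lag=40,
                                 alpha=0.0427)
print(round(ur_clark, 2), round(ua_example, 1))
```

U-A exceeds 1/α whenever reported cases grow between the two dates, which is why the U-A column in Table 2 is much larger than the UR column.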

Our estimates of the number of total infections also allow us to estimate the infection fatality rates implied by observed data on fatalities. These calculations are described in detail in Section 5.5 and reported in Table 2. We estimate a median cumulative infection fatality rate (cIFR) of 0.28–0.31% across U.S. counties. We note that a representative sample study in Santa Clara County by Bendavid et al. (2020) found an infection fatality rate of 0.12–0.2%. Our estimates show, however, potentially substantial dispersion of IFR across U.S. counties. Once again, we would like to stress that our estimates are highly dependent on model assumptions and on the data used to inform the model. We discuss how our results depend on these assumptions in some detail in Section 5 and in the Appendix.

In the economic literature, Berger et al. (2020) and Stock (2020) study the importance of unreported cases in the context of the coronavirus pandemic. Our paper contributes to the growing literature in epidemiology on estimating the true number of infections using observational data and structural model assumptions. Notably, Li et al. (2020b), Wu et al. (2020), Flaxman et al. (2020), Liu et al. (2020b), Liu et al. (2020), and Nishiura et al. (2020) utilize simulated epidemiological models to estimate the fraction of unreported infections in China and European countries. As Zhao et al. (2020) note, it is often difficult to identify the fraction of unreported infections alongside the growth of the infection purely by measures of fit. Our paper complements these extant papers: we provide what we believe is a transparent identification argument and a very light computational strategy that allows researchers to assess the sensitivity of model estimates to modeling and data assumptions. That said, our model may miss important components of disease dynamics that these more sophisticated epidemiological models incorporate. These richer models may also allow one to estimate a richer set of model parameters than we have been able to.7 Another related recent paper is Imai et al. (2020), who estimate potential total cases in Wuhan, China from the confirmed cases in other countries due to international travel, assuming that all cases outside of China are reported correctly.8 Korolev (2020) discusses non-identification in SEIRD models and proposes an estimation strategy conditional on knowing the infectious period and the incubation period.

Section 2 introduces our model of infection, which describes the early stages of the dynamics of the epidemic. Section 3 presents our two estimation/identification strategies. Section 4 describes the data we are using for estimation. Section 5 lays out the estimation results and our robustness checks.

2. Model

Our model is based on the classic SIR model in epidemiology. We consider the evolution of the virus both in the epicenter c and in target city i over a period of time $T_0 \le t \le T_1$. We consider a relatively short period of time in the early stage of the epidemic. Thus, the “recovered” population at the epicenter, which is a small fraction of the population, is assumed not to play a significant role during this period.

2.1. What happens at the epicenter c

We denote infected, reported infected, and unreported infected at time t in epicenter c as $I_{c,t}$, $R_{c,t}$, $U_{c,t}$ respectively. In all locations, total infected is given by the sum of reported infected and unreported infected: $I_{\cdot,t} = R_{\cdot,t} + U_{\cdot,t}$.

The epicenter starts with some initial infections $I_{c,0}$. We are considering a short period of time between $T_0$ and $T_1$, so the number of susceptibles at the epicenter remains relatively constant throughout this period. There are also no infected cases traveling into the epicenter. Since we are interested in cumulative cases, assuming recovery plays no role is not a restrictive assumption. So at time t, the total infections at the epicenter with transmission rate β is given by

$$I_{c,t} = I_{c,0}\exp(\beta(t - T_0)) \tag{1}$$

It is worth noting that β includes spread minus recovery rate, since we do not model a changing number of susceptibles. β should be viewed as the net spread of infections over time.

At each time t, there is a cohort of travelers $M_{i,t}$ going from the epicenter to target city i and potentially bringing the virus to the target city.9

2.2. What happens in target city i

We denote infected, observed reported infected, and unreported infected at time t in city i as $I_{i,t}$, $R_{i,t}$, $U_{i,t}$ respectively. At period $T_0$, target city i has zero infections, so $I_{i,T_0} = R_{i,T_0} = U_{i,T_0} = 0$.

At each time $t \in [T_0, T_1]$, the target city receives a cohort of incoming travelers $M_{i,t}$ from the epicenter. Among these travelers, $I^{\mathrm{inc}}_{i,t}$ are infected. Each cohort of incoming infected $I^{\mathrm{inc}}_{i,t}$ will transmit the virus in the target city at rate β over the period $[t, T_1]$. We assume that the transmission rate in the target city is the same as in the epicenter, and that locally infected people are also infectious. Thus, by period $T_1$, this cohort will account for $I^{\mathrm{inc}}_{i,t}\exp(\beta(T_1 - t))$ infections in city i. The total infections in the target city at $T_1$ caused by all cohorts of incoming infected travelers will be

$$I_{i,T_1} = \int_{T_0}^{T_1} I^{\mathrm{inc}}_{i,t}\exp(\beta(T_1 - t))\,dt \tag{2}$$
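To make Eq. (2) concrete, the sketch below approximates the integral with a trapezoid rule and compares it with the closed form available when the inflow of infected travelers is constant. The function name `total_infections`, the step count, and the values of β and the time window are illustrative assumptions, not quantities from the paper.

```python
import math


def total_infections(incoming, beta, t0, t1, steps=10_000):
    """Trapezoid-rule approximation of Eq. (2).

    incoming: function mapping time t to the number of infected arrivals.
    Each arriving cohort grows by exp(beta * (t1 - t)) until time t1.
    """
    h = (t1 - t0) / steps
    total = 0.0
    for s in range(steps + 1):
        t = t0 + s * h
        weight = 0.5 if s in (0, steps) else 1.0  # endpoints count half
        total += weight * incoming(t) * math.exp(beta * (t1 - t))
    return total * h


beta, t0, t1 = 0.2, 0.0, 30.0  # hypothetical growth rate and window (days)
approx = total_infections(lambda t: 1.0, beta, t0, t1)

# With a constant inflow of one infected traveler per day, the integral
# has the closed form (exp(beta * (t1 - t0)) - 1) / beta.
closed = (math.exp(beta * (t1 - t0)) - 1.0) / beta
print(approx, closed)
```

Because early cohorts have the longest time to spread, most of the total comes from arrivals near $T_0$, which is one reason the results are sensitive to the chosen start date and to β.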

We define α to be the expected percentage of cases that are reported. α provides a mapping between the expected reported number of cases – which may be small – and the true number of infected spreading in the destination city. Formally, α is defined as:

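$$\alpha = E\left[\frac{R_{i,T_1}}{I_{i,T_1}} \,\middle|\, I_{i,T_1}\right] \tag{3}$$

One can interpret the above equation as the projection of $R_{i,T_1}$ onto $I_{i,T_1}$.10 Expanding Eq. (2) using the definition of α:

$$E\left[R_{i,T_1} \mid I_{i,T_1}\right] = \alpha I_{i,T_1} = \alpha\int_{T_0}^{T_1} I^{\mathrm{inc}}_{i,t}\exp(\beta(T_1 - t))\,dt \tag{4}$$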

3. Estimating the reporting rate α

The estimation/identification question is: can we recover α, the reporting rate, when we only observe reported infections $R_{i,T_1}$ but not $I^{\mathrm{inc}}_{i,t}$, the total incoming infected in Eq. (4)? In the following Sections 3.1 and 3.2, we provide a complete treatment of how one can recover α under different scenarios of data availability. We consider two sets of data that could potentially be available: (i) data on travel from the epicenter to the U.S., and (ii) data from randomized testing implemented outside of the U.S. In Section 3.3, we extend our model and estimation strategy to incorporate reporting lags.

3.1. Travel data available but randomized testing data unavailable

When only travel data is available, we must take a stance on how many infected people are leaving the epicenter. We assume that the reported infected are unable to travel, so the spread of the virus is caused by those who are infected but unreported. This allows us to determine how infectious a traveler from the epicenter is (conditional on knowing the parameters of the model). This is a reasonable assumption, especially in the case of COVID-19, because the vast majority of reported infected individuals would be quarantined and not allowed to travel.

Our main assumption in this scenario is:

Assumption 3.1

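$$\frac{I^{\mathrm{inc}}_{i,t}}{M_{i,t}} = \frac{U_{c,t}}{N_c - R_{c,t}} \quad \text{for any time } t \in [T_0, T_1], \text{ city } i \text{ and epicenter } c \tag{5}$$

$N_c$ is the population of the epicenter. In other words, we are assuming that the fraction of unreported infections among incoming travelers from the epicenter is the same as the fraction of unreported infections among people capable of leaving the epicenter. (We will relax this assumption in Section 3.2.) Let $\alpha_c$ be the reporting rate at the epicenter, defined as $\alpha_c = R_{c,t}/I_{c,t}$.11 This implies that $U_{c,t} = (1-\alpha_c)I_{c,t} = \frac{1-\alpha_c}{\alpha_c}R_{c,t}$. Therefore, Assumption 3.1 becomes:

$$\frac{I^{\mathrm{inc}}_{i,t}}{M_{i,t}} = \frac{(1-\alpha_c)I_{c,t}}{N_c - R_{c,t}} \quad \forall\, t \in [T_0, T_1], \text{ city } i, \text{ epicenter } c \tag{6}$$

Plugging Eq. (1) in, we get

$$I^{\mathrm{inc}}_{i,t} = \frac{(1-\alpha_c)I_{c,0}\exp(\beta(t - T_0))}{N_c - R_{c,t}}\,M_{i,t} \quad \forall\, t \in [T_0, T_1], \text{ city } i, \text{ epicenter } c \tag{7}$$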

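Substituting into Eq. (4), we get

\begin{align}
E\left[R_{i,T_1} \mid I_{i,T_1}\right] &= \alpha I_{i,T_1} \tag{8} \\
&= \alpha\int_{T_0}^{T_1} \frac{(1-\alpha_c)I_{c,0}\exp(\beta(t - T_0))}{N_c - R_{c,t}}\,M_{i,t}\exp(\beta(T_1 - t))\,dt \tag{9} \\
&= \alpha(1-\alpha_c)I_{c,0}\exp(\beta(T_1 - T_0))\int_{T_0}^{T_1} \frac{M_{i,t}}{N_c - R_{c,t}}\,dt \tag{10} \\
&= \alpha(1-\alpha_c)\frac{R_{c,0}}{\alpha_c}\exp(\beta(T_1 - T_0))\int_{T_0}^{T_1} \frac{M_{i,t}}{N_c - R_{c,t}}\,dt \tag{11} \\
&= \alpha\frac{1-\alpha_c}{\alpha_c}R_{c,0}\exp(\beta(T_1 - T_0))\int_{T_0}^{T_1} \frac{M_{i,t}}{N_c - R_{c,t}}\,dt \tag{12}
\end{align}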
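Two steps in the chain from Eq. (9) to Eq. (11) may be worth spelling out. The first is that the two exponentials combine into a term that no longer depends on t and can therefore leave the integral; the second uses the definition of the epicenter reporting rate at the initial date. The block below is our own restatement of these steps, not an additional result.

```latex
% Step (9) -> (10): the t-dependence of the exponentials cancels.
\exp(\beta(t - T_0))\,\exp(\beta(T_1 - t))
  = \exp\bigl(\beta(t - T_0 + T_1 - t)\bigr)
  = \exp(\beta(T_1 - T_0))

% Step (10) -> (11): the definition \alpha_c = R_{c,t} / I_{c,t},
% evaluated at the initial date, gives
I_{c,0} = \frac{R_{c,0}}{\alpha_c}
```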

We assume the reporting rate at the epicenter to be the same as in the target city, i.e. $\alpha_c = \alpha$ (this will be relaxed in Section 3.2). Let $\epsilon_{i,T_1}$ be the error term in prediction, i.e. $\epsilon_{i,T_1} = R_{i,T_1} - E\left[R_{i,T_1} \mid I_{i,T_1}\right]$. Then we have

$$R_{i,T_1} = (1-\alpha)R_{c,0}\exp(\beta(T_1 - T_0))\int_{T_0}^{T_1} \frac{M_{i,t}}{N_c - R_{c,t}}\,dt + \epsilon_{i,T_1} \tag{13}$$

Regressing $R_{i,T_1}$ on $R_{c,0}\int_{T_0}^{T_1} \frac{M_{i,t}}{N_c - R_{c,t}}\,dt$ will give us a consistent estimate of $(1-\alpha)\exp(\beta(T_1 - T_0))$ since $E(\epsilon_{i,T_1} \mid I_{i,T_1}) = 0$ by definition of projection.12,13 Note that even when $\alpha_c$ is a known linear function of α, we can still obtain consistent estimates of α from Eq. (12) conditional on β. We can estimate β from the growth of reported infections in the epicenter because there is no influx of infected people from other regions. Given that β is now determined, we can solve for α. However, there is substantial variation in estimates of β within the literature, and our estimate of α varies with point estimates of β (Liu et al., 2020a, Read et al., 2020, Shen et al., 2020).
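The regression step can be sketched in a few lines of Python. This is a minimal illustration on synthetic, noise-free data, assuming a regression without an intercept (as Eq. (13) is written) and a known β. The helper names, the regressor values, and the parameter values are all hypothetical.

```python
import math


def ols_through_origin(x, y):
    """Least-squares slope b for the model y = b * x (no intercept)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def alpha_from_slope(slope, beta, t0, t1):
    """Invert slope = (1 - alpha) * exp(beta * (t1 - t0)) for alpha."""
    return 1.0 - slope / math.exp(beta * (t1 - t0))


# Synthetic example: pick a "true" alpha, generate reported cases from
# Eq. (13) without noise, and check that the procedure recovers alpha.
true_alpha, beta, t0, t1 = 0.10, 0.15, 0.0, 30.0
x = [0.5, 1.0, 2.0, 4.0]  # hypothetical values of R_c0 * integral, one per city
true_slope = (1.0 - true_alpha) * math.exp(beta * (t1 - t0))
y = [true_slope * xi for xi in x]  # reported infections in each city

slope_hat = ols_through_origin(x, y)
alpha_hat = alpha_from_slope(slope_hat, beta, t0, t1)
print(alpha_hat)
```

Because the slope is divided by $\exp(\beta(T_1 - T_0))$, a small change in the assumed β shifts the implied α substantially, which is the sensitivity noted in the text.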

3.2. When travel data and a random testing benchmark are available

In this scenario, we will be leveraging the same fact that the number of incoming unreported infections is informed by the travelers from the epicenter. Now we can also allow for selection in traveling. More specifically, if we think that e.g. urban areas are likely to have a higher infection rate than rural areas and travel abroad more,14 then Assumption 3.1 might not hold. Therefore, we introduce a bias correction term γ in the relation between the fraction of infected among travelers and the fraction of unreported infected individuals in the general population. This bias correction term γ can also account for the fact that a fraction of the unreported infected people might be too sick to travel. Our relaxed assumption in this scenario is:

Assumption 3.2

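$$\frac{I^{\mathrm{inc}}_{i,t}}{M_{i,t}} = \gamma\,\frac{U_{c,t}}{N_c - R_{c,t}} \quad \text{for any time } t \in [T_0, T_1], \text{ city } i \text{ and epicenter } c, \quad \gamma \ge 0 \tag{14}$$

We can further allow for the fact that the reporting rate in the epicenter $\alpha_c$ can be different from that of region i, so $U_{c,t} = (1-\alpha_c)I_{c,t} = \frac{1-\alpha_c}{\alpha_c}R_{c,t}$. We can now rewrite Assumption 3.2 as:

$$\frac{I^{\mathrm{inc}}_{i,t}}{M_{i,t}} = \gamma\,\frac{(1-\alpha_c)I_{c,t}}{N_c - R_{c,t}} \quad \forall\, t \in [T_0, T_1], \ \gamma \ge 0, \text{ city } i, \text{ epicenter } c \tag{15}$$

Plugging into Eqs. (1) and (4), we get

$$R_{i,T_1} = \alpha\,\frac{1-\alpha_c}{\alpha_c}\,\gamma\exp(\beta(T_1 - T_0))\,R_{c,0}\int_{T_0}^{T_1} \frac{M_{i,t}}{N_c - R_{c,t}}\,dt + \epsilon_{i,T_1} \tag{16}$$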

The additional parameters for the bias correction term γ and the different reporting rate for the epicenter complicate the estimation of α using travel data alone. However, having data from a country that has implemented randomized or complete testing greatly helps overcome this challenge. In our case, we are able to identify α using additional information given by the randomized testing benchmark provided by Iceland. Since the Icelandic company deCODE genetics implemented random testing of COVID-19 for a representative sample of the island population,15 we are able to observe the infection rate of a representative sample of Iceland’s population at time $T_1$. Assuming that this is the true infection rate, multiplying it by the population of region j in Iceland will give us the actual number of infections in region j at time $T_1$, which is $I_{j,T_1}$. Thus, for any region j in Iceland we observe $I_{j,T_1} - I_{j,T_0}$, which in turn equals the infections generated by travelers from the epicenter:

$$I_{j,T_1} - I_{j,T_0} = \frac{1-\alpha_c}{\alpha_c}\,\gamma\exp(\beta(T_1 - T_0))\,R_{c,0}\int_{T_0}^{T_1} \frac{M_{j,t}}{N_c - R_{c,t}}\,dt \tag{17}$$

If there were repeated observations in countries with randomized testing that allows estimation of the true infection rate, we could allow for an idiosyncratic error term in Eq. (17). However, when there is only one observation with randomized testing, we must treat it as observed without error. Estimating Eq. (17) returns $\frac{1-\alpha_c}{\alpha_c}\gamma\exp(\beta(T_1 - T_0))R_{c,0}$. Estimating Eq. (16) gives a consistent estimate of $\alpha\frac{1-\alpha_c}{\alpha_c}\gamma\exp(\beta(T_1 - T_0))R_{c,0}$. Taking the ratio, we have identified α.

One intuition for this strategy is the following: the ratio between travel to the U.S. and travel to Iceland from the epicenter should tell us the ratio of total infections between the U.S. and Iceland. Iceland’s randomized testing gives us its number of total infections, so U.S. total infections can be computed. In other words, we observe the outcome in the U.S. with under-reporting, and the unobserved counterfactual outcome with full reporting is given by the benchmark Iceland. An additional advantage of this estimation/identification strategy, as opposed to the previous strategy in Section 3.1, is that now we do not need an estimate of β in order to recover α. We also allow for the fact that $R_{c,0}$ could be observed with error. Identifying α does not require observing $R_{c,0}$ perfectly because $R_{c,0}$ appears identically in both equations.
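The ratio argument can be illustrated with a small synthetic example. In the sketch below, `exposure` is our shorthand for the travel integral $\int M/(N_c - R_{c,t})\,dt$, `common` stands for the shared factor $\frac{1-\alpha_c}{\alpha_c}\gamma\exp(\beta(T_1-T_0))R_{c,0}$, and all numbers are hypothetical. It assumes a no-intercept regression for Eq. (16) and a single benchmark observation for Eq. (17).

```python
def ols_through_origin(x, y):
    """Least-squares slope b for the model y = b * x (no intercept)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def alpha_from_benchmark(us_reported, us_exposure,
                         bench_infections, bench_exposure):
    """Ratio of the Eq. (16) slope to the Eq. (17) coefficient."""
    us_slope = ols_through_origin(us_exposure, us_reported)  # alpha * common
    bench_coef = bench_infections / bench_exposure           # common
    return us_slope / bench_coef


# Synthetic data: the common factor is never used by the estimator itself.
true_alpha, common = 0.08, 500.0
us_exposure = [1.0, 2.0, 3.0]  # hypothetical travel integrals per U.S. region
us_reported = [true_alpha * common * x for x in us_exposure]
bench_exposure = 0.5           # hypothetical travel integral for the benchmark
bench_infections = common * bench_exposure

alpha_hat = alpha_from_benchmark(us_reported, us_exposure,
                                 bench_infections, bench_exposure)
print(alpha_hat)
```

The estimator never needs β, γ, $\alpha_c$, or $R_{c,0}$ individually, only that their product is the same in both locations.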

We should be clear that for terms with β to cancel out, the argument does assume that β is the same across Iceland and the U.S. We believe this might be a reasonable assumption for the early periods of the infection when social distancing or other widespread measures had not yet been implemented (in a potentially differential fashion). In the case of heterogeneous transmission rates, we show in Appendix A.2 how to estimate them if we have high quality travel data. We also need the bias term γ to be the same for the U.S. and Iceland; this means that the proportion of (unreported) infected travelers from China to the U.S. is the same as the proportion to Iceland. More detailed micro-data on travelers may be used to assess the validity of this assumption.

This estimation strategy also works when a complete testing benchmark exists. If the whole population of region j is tested, then we observe $I_{j,T_1} - I_{j,T_0}$ trivially. Eq. (17) still gives a consistent estimate of $\frac{1-\alpha_c}{\alpha_c}\gamma\exp(\beta(T_1 - T_0))R_{c,0}$ and the rest of the argument follows.

Note that if in the model reported infections and unreported infections have different transmission rates, then our strategy would not be able to capture the differential rates. We would need other sources of information to help us pin down these differential rates.

3.3. Incorporating reporting lags

In this section, we show how our model can incorporate a fixed reporting lag in reported infections and derive identification equations. Reporting lags are important, because if people are tested for the virus only after symptoms show up, there will be a lag in reported infections. Another major reason for reporting lag is the lag in testing results. The turnaround time for testing results in U.S. major laboratory companies could be 2 to 3 days (Kaplan and Thomas, 2020).

We denote true infected, true reported infected, and true unreported infected at time t in target city i as $I_{i,t}$, $R_{i,t}$, $U_{i,t}$ respectively, and those for epicenter c as $I_{c,t}$, $R_{c,t}$, $U_{c,t}$. Let k be the reporting lag. At time t in city i, denote the lagged reported infected by $LR_{i,t} = R_{i,t-k}$. For epicenter c, lagged reported infected is $LR_{c,t} = R_{c,t-k}$.

Define the reporting rate at city i as $\alpha = E\left[\frac{R_{i,t-k}}{I_{i,t-k}} \,\middle|\, I_{i,t-k}\right] = E\left[\frac{LR_{i,t}}{I_{i,t-k}} \,\middle|\, I_{i,t-k}\right]$ and at epicenter c as $\alpha_c = \frac{R_{c,t-k}}{I_{c,t-k}} = \frac{LR_{c,t}}{I_{c,t-k}}$. This means that we are considering the reporting rate of lagged reported cases as a fraction of the lagged total infections, not the current total infections. We derive an alternative definition of the reporting rate as lagged reported cases over current total infections in Appendix A.3.
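The lag notation can be illustrated with a short sketch. The function name `lagged_reported` and the daily counts are hypothetical; the only rule implemented is $LR_t = R_{t-k}$, with `None` marking days for which no lagged observation exists yet.

```python
def lagged_reported(true_reported, k):
    """Return the observed series LR_t = R_{t-k} for a fixed lag of k days.

    The first k entries are None because nothing has been reported yet.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    n = len(true_reported)
    k = min(k, n)
    return [None] * k + list(true_reported[: n - k])


# Hypothetical daily series of true reported infections R_t.
r = [1, 2, 4, 8, 16, 32]
lr = lagged_reported(r, k=2)
print(lr)  # [None, None, 1, 2, 4, 8]

# Under the definition above, the reporting rate compares LR_t with total
# infections at t - k, so the count observed on day 5 (lr[5] == 8) is set
# against infections on day 3, not day 5.
```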

When travel data are available but randomized testing data are unavailable, we still maintain Assumption 3.1. For city i at time $T_1$, we estimate α using the following equation.

$$LR_{i,T_1} = (1-\alpha)\exp(\beta(T_1 - T_0 - k))\,LR_{c,k}\int_{T_0}^{T_1 - k} \frac{M_{i,t}}{N_c - R_{c,t}}\,dt + \epsilon_{i,T_1} \tag{18}$$

where $\epsilon_{i,T_1}$ is the prediction error defined analogously to Eq. (13). We assume that $\alpha_c = \alpha$ as before. α is identified conditional on β and k.

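When both travel data and randomized testing data are available, we maintain Assumption 3.2. In U.S. city i at time $T_1$,

$$LR_{i,T_1} = \alpha\gamma\,\frac{1-\alpha_c}{\alpha_c}\exp(\beta(T_1 - T_0 - k))\,LR_{c,k}\int_{T_0}^{T_1 - k} \frac{M_{i,t}}{N_c - R_{c,t}}\,dt + \epsilon_{i,T_1} \tag{19}$$

In Iceland region j at time $T_1$, the estimating equation is

$$I_{j,T_1 - k} = \gamma\,\frac{1-\alpha_c}{\alpha_c}\exp(\beta(T_1 - T_0 - k))\,LR_{c,k}\int_{T_0}^{T_1 - k} \frac{M_{j,t}}{N_c - R_{c,t}}\,dt \tag{20}$$

If we know k, then we can compute $I_{j,T_1 - k}$ from the randomized testing data. Estimating Eq. (20) gives $\gamma\frac{1-\alpha_c}{\alpha_c}\exp(\beta(T_1 - T_0 - k))LR_{c,k}$. Estimating Eq. (19) by regression gives a consistent estimate of $\alpha\gamma\frac{1-\alpha_c}{\alpha_c}\exp(\beta(T_1 - T_0 - k))LR_{c,k}$. Taking the ratio, we can identify α if we know k. Again, we can also allow for the situation where we do not observe $LR_{c,k}$ perfectly. Details of how we derive the estimating equations are in the Appendix.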

3.4. Identification of α, β, γ and αc

In this section we recap our identification arguments for both strategies presented above. We summarize which parameters are identified and estimated, and which equations are used to identify them.

3.4.1. No randomized testing available

For simplicity, we will discuss identification of α and β under the assumption that αc=α. We note that without randomized testing, our model is identified under Assumption 3.1, so γ is not present during this specification. The remaining two parameters in Eq. (12) are α and β. Conditional on knowing β, α is identified and can be estimated by ordinary least squares estimation.

How do we identify β separately from α? In the early onset of the pandemic, data on the growth of the virus in the epicenter exclude travelers with the virus entering the epicenter. Applying Rc,t=αcIc,t to Eq. (1), we see that

$$\frac{1}{\alpha_c}R_{c,t} = \frac{1}{\alpha_c}R_{c,0}\exp\big(\beta(t-T_0)\big) \tag{21}$$

From this, we can identify β by dividing through and taking logarithms.

$$\beta = \frac{1}{t-T_0}\log\frac{R_{c,t}}{R_{c,0}} \tag{22}$$

Data on reported infections in the epicenter thus identify β separately from α. Conditional on an estimate of β, the only remaining unknown coefficient of Eq. (12) is $(1-\alpha)$. Thus α is identified conditional on β, and $\alpha_c$ is identified by the assumption that $\alpha_c=\alpha$.
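As a minimal sketch of Eq. (22), assuming purely exponential growth in reported epicenter cases, β can be backed out from two observations of the reported count. The function name and the numbers below are hypothetical illustrations, not the paper's data; the epicenter reporting rate cancels in the ratio, which is why reported counts alone suffice.

```python
import math

def growth_rate(reported_start, reported_end, days_elapsed):
    """Eq. (22): beta = log(R_{c,t} / R_{c,0}) / (t - T0)."""
    if reported_start <= 0 or reported_end <= 0 or days_elapsed <= 0:
        raise ValueError("counts and elapsed time must be positive")
    return math.log(reported_end / reported_start) / days_elapsed

# Hypothetical illustration: reported cases grow from 100 to 800 in 10 days.
beta_hat = growth_rate(100, 800, 10)

# Implied doubling time under the same exponential-growth assumption.
doubling_time = math.log(2) / beta_hat
print(beta_hat, doubling_time)
```

An eightfold increase is three doublings, so the implied doubling time in this made-up example is one third of the window.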

3.4.2. Randomized testing available

When randomized testing is available, the model allows for a more complex relationship between the reporting rates of the epicenter and the destination. We no longer have to assume that αc=α; moreover, we introduce an additional parameter γ. Identification is based on Assumption 3.2. While this specification features more parameters, identification of α requires separately identifying and estimating fewer parameters. The reason for this is the inclusion of additional information from randomized testing. We assume that γ and β are constant across all countries.

Estimation of Eq. (17) yields $\frac{1-\alpha_c}{\alpha_c}\gamma\exp(\beta(T_1-T_0))R_{c,0}$, and estimation of Eq. (16) yields $\alpha\frac{1-\alpha_c}{\alpha_c}\gamma\exp(\beta(T_1-T_0))R_{c,0}$. By taking the quotient, $\frac{1-\alpha_c}{\alpha_c}\gamma\exp(\beta(T_1-T_0))R_{c,0}$ cancels out, leaving only α. Because of this cancellation, with randomized testing, we need not identify β, γ or $\alpha_c$ to identify α.
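The cancellation can be checked numerically. The sketch below is only an illustration under made-up parameter values: `common_factor` is a hypothetical helper that builds the nuisance term shared by Eqs. (16) and (17), and the loop shows that the quotient returns the same α whatever values the nuisance parameters take.

```python
import math

def common_factor(alpha_c, gamma, beta, t0, t1, r_c0):
    """Nuisance term shared by both estimating equations:
    ((1 - alpha_c) / alpha_c) * gamma * exp(beta * (T1 - T0)) * R_{c,0}."""
    return (1 - alpha_c) / alpha_c * gamma * math.exp(beta * (t1 - t0)) * r_c0

alpha_true = 0.05  # hypothetical destination reporting rate

recovered = []
# Two very different (hypothetical) sets of nuisance parameters.
for alpha_c, gamma, beta in [(0.2, 0.5, 0.10), (0.6, 1.5, 0.25)]:
    coef_randomized = common_factor(alpha_c, gamma, beta, 0, 18, 1000.0)  # Eq. (17)
    coef_reported = alpha_true * coef_randomized                          # Eq. (16)
    recovered.append(coef_reported / coef_randomized)

print(recovered)
```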

4. Data

4.1. COVID-19 data

Daily reported infections, recovery, and death data are collected by Johns Hopkins University of Medicine Coronavirus Resource Center from January 22 to April 13, 2020. We use data for all U.S. counties as well as epicenters.

Randomized testing data in Iceland are obtained from the website maintained by the Directorate of Health and the Department of Civil Protection and Emergency Management in Iceland (see footnote 16). We have the daily number of tests conducted by deCODE genetics and the daily number of confirmed cases. We use the first half of testing by deCODE, which spans March 15–19, 2020. During this round of testing deCODE performed 5490 tests and confirmed 48 cases, which implies an infection rate of 0.874%.
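The implied infection rate is simple arithmetic on the two counts quoted above. The short sketch below reproduces it; the binomial standard error is an addition for illustration only and is meaningful only if the sample is treated as a simple random sample, which is the randomization assumption discussed in this section.

```python
import math

tests_performed = 5490  # deCODE tests, March 15-19, 2020 (from the text)
cases_confirmed = 48    # positives in that round (from the text)

infection_rate = cases_confirmed / tests_performed

# Binomial standard error; an illustrative assumption of simple random sampling.
std_error = math.sqrt(infection_rate * (1 - infection_rate) / tests_performed)

print(f"infection rate: {100 * infection_rate:.3f}%")
print(f"standard error: {100 * std_error:.3f} percentage points")
```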

Testing in Iceland conducted by deCODE genetics featured open invitations for testing among individuals who were not confirmed infected at the time. We believe that the confirmed infected population was very small at the onset of infections in Iceland. As a result, excluding them from the sample does not contaminate the testing results very much.

There is also a risk of selection into testing: people who were in contact with infected individuals are more likely to select into testing than socially isolated individuals. We assume that there is no selection into testing by the more at-risk. Those who are at risk of contracting the virus were not aware of their high risk due to several reasons. The incubation time of the virus was not understood this early into its spread. Another aspect of the virus that was not understood was the high number of asymptomatic but infectious individuals in the population. Under this assumption, we treat open invitations of the unconfirmed as randomized testing.

Moreover, we cannot consider the Iceland data as representative testing because of a sampling issue. The deCODE testing does not attempt to re-weight its sample in order to be representative of the demographics of the population. However, as long as the previous assumption of randomization holds, this does not affect our results.

If the randomization assumption were incorrect, and the deCODE data was biased upwards in the number of infections present in Iceland, this would mean that the infectiousness of travelers from the epicenter was actually lower than what our model predicts. This would mean that reporting rate is higher, and there are fewer total infected in the United States. This would bias our estimates of the infection fatality rate upwards as well.

In Gudbjartsson et al. (2020) more data from Iceland is considered. The paper considers two waves of studies: one set is the open invitation from deCODE testing, and the other is a set of randomized invitations sent out via text-message. This is a much larger set of testing than our data, but uses a later time-frame than our method. Using the entire first wave of deCODE data, they find an infection rate of .8% rather than our .87%. They also consider a second wave of randomly invited individuals which gives a .6% infection rate. However this second sample is restrictive in terms of age, only featuring individuals from ages 20–70, which omits the very young and the elderly, two high-risk groups. This second wave is still vulnerable to the same selection concerns as the deCODE testing data as well.

4.2. Travel data

We obtain monthly data on international arrivals to the U.S. by port of entry and country of origin from the I-94 Arrivals data of the National Travel and Tourism Office. We use the number of visitors from China, Italy, Spain, UK, and Germany in January and February 2020 as the measure of incoming travelers to U.S. states. For international arrivals to Iceland, we get the number of visitors from China, Italy, Spain, UK, and Germany in January and February 2020 from the Icelandic Tourist Board. We have not been able to obtain March travel data for either country.

The National Travel and Tourism Office of the United States provides monthly data for entry by port of entry, as well as a separate data set for country of origin. We construct the number of visitors from China, Italy, Spain, Germany, and the UK by scaling the port-of-entry data by the percentage of total visitors that are from these countries. This introduces error, as we cannot observe directly the number of e.g. Chinese travelers into a particular city or state. It is also important to note that we do not observe inter-state travel. While this may not be important for the immediate infections caused by travelers from the epicenter, our projections for the number of infections for T1 that are far removed from T0 will be less accurate due to interstate travel.
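A minimal sketch of the scaling step described above, under the assumption that a single national share is applied to every port. The function name, port labels and numbers are hypothetical and are not I-94 figures.

```python
def impute_origin_arrivals(port_arrivals, origin_share):
    """Scale each port's total arrivals by the national share of visitors
    who come from the epicenter countries."""
    if not 0.0 <= origin_share <= 1.0:
        raise ValueError("origin_share must be a fraction in [0, 1]")
    return {port: total * origin_share for port, total in port_arrivals.items()}

# Hypothetical monthly totals by port of entry.
port_arrivals = {"Port A": 200_000, "Port B": 50_000}
# Hypothetical share of all visitors who come from the epicenter countries.
national_share = 0.08

imputed = impute_origin_arrivals(port_arrivals, national_share)
print(imputed)
```

Because the same share is applied everywhere, this construction cannot capture ports that receive disproportionately many travelers from a given country, which is the source of error noted in the text.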

To attempt to alleviate this error, we note that The National Travel and Tourism Office of the United States also provides a market profile of travelers from each of the epicenters. This data however appears to be flawed. In particular several major ports such as Portland, Oregon have no recorded travelers from any of our epicenters. This data also only contains travel for 2019 and the years before, and we have concerns over its stationarity. We report our results using this data in Appendix A.2, but maintain our use of the 2020 data for our main results.

For Icelandic data, 99% of international travelers arrive through Keflavik airport into Iceland. The data contains a breakdown of arrival by country of origin, broken down by month of arrival. We use January and February arrival data from China, Italy, Spain, UK, and Germany for estimation.

Our travel data for both countries do not control for connecting flights. However, the United States data are limited to the top 30 ports of entry, many of which are large cities for which there will be fewer connecting flights. Further work that can obtain more precise estimates of entry may be able to control for this. In the case of Iceland, a survey conducted by the Icelandic Tourist Board suggests that 2%–5% of international travelers are aboard connecting flights, suggesting it is less of a problem for this set of data.

4.3. Population data

Estimates of U.S. State and county population data come from the U.S. Census Bureau. Data for the populations of China, Iceland, Italy, Spain, UK, and Germany as of 2020 are obtained from the United Nations Population Division.

5. Empirical application

5.1. Implementation

We now consider estimation of $\alpha_{US}$ using randomized sampling in Iceland, as described in Section 3.2. Randomized sampling done by deCODE genetics gives a percentage of the population that has contracted the virus. We estimate Eq. (17) using the randomized testing data to construct $I_{j,T_1-k}$. We do not have city-level travel data into Iceland. 99% of all international travel arrives through a single airport, and while the data provided are accurate, this gives only a single data point for estimation. As a result, $\exp(\beta(T_1-T_0))\,\gamma\frac{1-\alpha_c}{\alpha_c}R_{c,0}$ is estimated without error by the ratio of $I_{j,T_1}-I_{j,T_0}$ to $\int_{T_0}^{T_1}\frac{M_{j,t}}{N_c-R_{c,t}}\,dt$ (see footnote 17).

For estimation of reporting rates in the U.S., we need two pieces of data: first $M_{j,t}$, and second $I_{j,T_1}-I_{j,T_0}$. We discuss the imputation of these here.

We observe only monthly travel data to construct $M_{j,t}$, and to maintain robustness to January travels and infections, we average February and January travel into both the United States and Iceland. We assume that $M_{j,t}$ is uniform over the entire time period such that $\int_{\text{Jan 1}}^{\text{Feb 29}} M_{j,t}\,dt$ is equal to the sum of all travel into the city from January and February. Thus the integrand of $\int_{T_0}^{T_1}\frac{M_{j,t}}{N_c-R_{c,t}}\,dt$ varies over time only through confirmed infections in the epicenter.

Estimation of $I_{j,T_1}$ is complicated because randomized testing in Iceland was conducted only on certain dates. To resolve this problem, we scale the Iceland randomized testing results by the ratio of confirmed cases on the date of interest to confirmed cases on March 15. This means that if there were half as many confirmed cases on March 5 as on March 15, the total infections on March 5 would be half of the randomized testing percentage times the population of Iceland. This allows us to consider $T_1$ closer to the onset of the infection than the randomized testing dates.

We also remove the number of infected in Wuhan, China from our data on confirmed infected in China because of the lock-down restrictions placed on this city. We use the first wave of deCODE testing to determine the percentage of the population that has contracted the disease. This testing took place from March 15 through March 19. The results show that 0.874% of the population of Iceland had contracted the disease as of March 15 (see footnote 18).
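The two imputations can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the integral is approximated by a daily sum, the integrand is taken to be $M_{j,t}/(N_c-R_{c,t})$, daily travel is the January plus February total spread evenly over 60 days, and all function names and numeric inputs (including the population figure) are hypothetical.

```python
DAYS_JAN_FEB_2020 = 60  # 31 days in January plus 29 in February 2020

def travel_exposure(total_jan_feb_travelers, epicenter_population, reported_by_day):
    """Daily-sum approximation of the integral of M_t / (N_c - R_{c,t}) over
    the estimation window, with travel spread uniformly over Jan-Feb."""
    daily_travel = total_jan_feb_travelers / DAYS_JAN_FEB_2020
    return sum(daily_travel / (epicenter_population - r) for r in reported_by_day)

def scaled_infections(random_test_rate, population, confirmed_on_date,
                      confirmed_on_test_date):
    """Infections implied by randomized testing, scaled back to an earlier
    date in proportion to confirmed cases."""
    return random_test_rate * population * confirmed_on_date / confirmed_on_test_date

# Hypothetical inputs: 6000 travelers over Jan-Feb and a two-day window.
exposure = travel_exposure(6000, 1_000_100, [100, 100])

# Half as many confirmed cases as on the testing date gives half the infections.
# The population value is an assumed round figure used only for illustration.
early_infections = scaled_infections(0.00874, 364_000, 50, 100)
print(exposure, early_infections)
```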

We estimate Eq. (19) using multiple data points from U.S. states and counties. We obtain our estimate of $\alpha\gamma\frac{1-\alpha_c}{\alpha_c}\exp(\beta(T_1-T_0-k))\,LR_{c,k}$ via OLS without a constant term. One important note is that if the measurement error in the travel data were large, the resulting problem may be alleviated via an instrumental variables strategy using other travel data measured with error.

We then construct our estimate of α by dividing the two estimates. It is important to note that as a result of the division, this method is not reliant on population data from the epicenter of infection. As long as γ and β are the same between Iceland and the United States we will have identified α. It is likely that at the onset of the infection similar preventative measures have been taken in these two countries, meaning that β will be reasonably close for each country.
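The two-step construction described above can be sketched in a few lines: a least-squares slope through the origin for the U.S. regions, a plain ratio for the single Iceland observation, and their quotient as the estimate of α. The helper name and every number below are hypothetical; the sketch does not reproduce the paper's data or standard errors.

```python
def ols_through_origin(x, y):
    """Slope of a least-squares fit y = b * x with no constant term."""
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("regressor is identically zero")
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx

# Hypothetical U.S. inputs: travel-exposure integrals and lagged reported cases.
us_exposure = [1.0, 2.0, 4.0, 8.0]
us_reported = [0.9, 2.1, 4.2, 7.8]
us_slope = ols_through_origin(us_exposure, us_reported)

# Iceland contributes a single data point, so its coefficient is a plain ratio
# of infections implied by randomized testing to travel exposure.
iceland_infections = 50.0
iceland_exposure = 2.0
iceland_slope = iceland_infections / iceland_exposure

alpha_hat = us_slope / iceland_slope
print(alpha_hat)
```

Because both coefficients share the same epicenter terms, epicenter population data drop out of the quotient, which is the point made in the paragraph above.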

Is China the only epicenter for the United States? While the first confirmed infection in Seattle came from a visitor from China, our data on the United States and Iceland cover a later stage in the global progression of the virus than our Chinese data. By the time these countries were experiencing infections, Italy had also experienced an outbreak. To this end, we also allow for a second epicenter: Italy. Italy is located much closer to Iceland and accounts for a substantial amount of travel to the country. However, to maintain identification, we require that α, β and $T_0$ be the same for both China and Italy, and that we observe $LR_{c,k}$ for both epicenters with no error. We find that allowing $T_0$ to vary does not affect our estimates by much. We also consider a broader collection of epicenters: China, Italy, Spain, Germany and the UK. For a collection of epicenters $L$, our estimating equation for the United States is given below.

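$$LR_{i,T_1} = \alpha\,\frac{1-\alpha_c}{\alpha_c}\,\gamma\exp\big(\beta(T_1-T_0-k)\big)\sum_{c\in L}\int_{T_0}^{T_1-k} LR_{c,k}\,\frac{M_{i,t}}{N_c-R_{c,t}}\,dt + \epsilon_{i,T_1} \tag{23}$$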

A similar equation is also estimated with multiple epicenters (China and Italy, also with Spain, Germany, and UK) for Iceland. For our multiple-epicenter results, we run OLS without a constant on this equation and on the corresponding Iceland equation to estimate α.

Table 1.

Estimated average fraction of reported infections.

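| Lag | Travel data | $\alpha_{US}$ | $(1-\alpha_{US})/\alpha_{US}$ |
|---|---|---|---|
| 8 day | China and Italy Travel Data | .0416 | 23.1 |
| 8 day | Only Chinese Travel Data | .0458 | 20.8 |
| 8 day | China, Italy, Spain, Germany, UK | .0427 | 22.4 |
| 5 day | China and Italy Travel Data | .0161 | 61.2 |
| 5 day | Only Chinese Travel Data | .0169 | 58.3 |
| 5 day | China, Italy, Spain, Germany, UK | .0164 | 60.1 |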

We report estimated α by OLS without a constant for several specifications of the model. We use $T_0$ as Feb 23 and $T_1$ as March 13 for the 8 day lag and March 10 for the 5 day lag. For the versions including European data, European travel to both Iceland and the United States is considered. King County, WA is omitted from the calculation.

5.2. Results: Illustration

As a first illustration of our approach, we estimate α using February 23 as $T_0$ and March 13 as $T_1$ with a lag of 8 days. We start on February 23 because there were very few infections in January and early February. We check robustness to different time periods in Section 5.3. We choose $T_1$ to include the beginning of the growth of infections in the United States, while still being early in the progression of COVID-19 so that travel is still important. Using traveler data from China, Italy, Spain, Germany, and the UK, in Table 1, we estimate α = 0.0427 (robust s.e. 0.0211). This would mean that for every case confirmed in the United States in early March, there are still $\frac{1-\alpha}{\alpha}\approx 22$ unconfirmed cases (assuming a reporting lag of 8 days). We also consider a 5 day lag model, the median time for symptoms to appear, but prefer 8 days in order to capture the testing lag in addition to symptom onset (Lauer et al., 2020, Kaplan and Thomas, 2020, Li et al., 2020a). For the 5 day lag, $T_1$ was set to March 10 for comparison. See Fig. 1 for county-level results.
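The conversion from a reporting rate to unreported cases per reported case is a one-line calculation. The sketch below applies it to the two full-epicenter point estimates in Table 1; the function name is hypothetical.

```python
def unreported_per_reported(alpha):
    """Unreported infections per reported infection, (1 - alpha) / alpha."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    return (1 - alpha) / alpha

# Point estimates from Table 1 (China, Italy, Spain, Germany, UK travel data).
for label, alpha in [("8 day lag", 0.0427), ("5 day lag", 0.0164)]:
    print(label, round(unreported_per_reported(alpha), 1))
```

Small differences from the second column of Table 1 can arise because the tabulated α values are rounded.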

Fig. 1.

Estimated reported infections by county.

This plot shows the ratio of confirmed cases to estimated infected in each county on $T_1$, which is March 10 for the 5 day lag and March 13 for the 8 day lag. $T_0$ is February 23. We use the full European entry. Estimated infected are given by the reported cases on $T_1$ + Lag divided by α.

How does this estimate compare to other estimates in the literature? A very recent study by Bendavid et al. (2020) tests for COVID-19 antibodies in a representative sample of Santa Clara county residents and reports that 48,000–81,000 people are infected as of April 1, whereas only 956 cumulative infected are reported that day. This leads to their reported ratio of 50–85 of total infections to reported infections. Importantly, this calculation does not account for the lag in reporting infections. While we do not have travel data for Santa Clara county, we can compute a similar statistic for San Francisco County: Our estimate of the true infected on March 13, divided by the reported cumulative infected on that day. This yields a ratio of 85, which is at the upper bound of the 50–85 range reported by Bendavid et al. (2020). We report these ratios for the counties in our data set in Table 2.
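The 50–85 range quoted from Bendavid et al. (2020) follows directly from the figures in the paragraph above, as this short check shows.

```python
reported_cases = 956                     # cumulative reported on April 1 (from the text)
estimated_infections = (48_000, 81_000)  # range reported by Bendavid et al. (2020)

# Ratio of estimated total infections to reported infections.
ratios = [total / reported_cases for total in estimated_infections]
print([round(r) for r in ratios])
```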

We note that there is one city present in our data that is a huge outlier. Seattle featured very early infections and, unlike other cities in the United States, was unable to contain their spread. We believe that for King County, $T_0$ may be much earlier than for the other cities. This means that within our time interval, a substantial number of infections are caused by residents of the city, not only visitors. As a result, this city has a substantially higher (3700%) number of confirmed cases per visitor than any other city at the current time, so we exclude it. Results from the profile travel data are also provided in Appendix A.2.

Our approach is also sensitive to the travel data magnitudes, which may not be well estimated for the United States due to data limitations. In particular, connecting flights after port of entry may lead to underestimates of international arrivals into smaller cities and counties. We also lack data on inter-state travel within the United States, which would be important for estimating α later in the spread of the virus.

Have we considered all epicenters of the virus for the United States and Iceland? Other countries, such as South Korea, had also seen substantial infections. Their exclusion biases the estimates from both Iceland and the United States, but as long as the magnitudes of travel from these countries were similar for the two, it will not bias α. If these other epicenters had more travel to the United States relative to Iceland, this would downward bias our estimates of α, and vice versa. However, we see little change in our estimates from adding Spain, Germany and the UK. If the travel patterns between an omitted set of epicenter countries and the United States and Iceland are not very different, we do not believe their omission will substantially alter our results.

5.3. Results: Range of estimates and robustness checks

Our dates for $T_0$ and $T_1$ are chosen such that they capture the onset of the infection for the United States. As Table 3 shows, our α estimate is reasonably stable across choices of $T_1$, and very stable across choices of $T_0$ throughout February. For the average reporting rate across the U.S., we estimate a range of 4%–14% with a reporting lag of 8 days and a range of 1.5%–10% with a reporting lag of 5 days. Using only China as the epicenter, we observe similar patterns in α. For early March we note a relatively stable α over $T_1$. For very early choices of $T_1$, our Iceland estimates of confirmed cases are very small, and this could create very noisy estimates of α (the first case in Iceland was confirmed February 28). As we increase $T_1$, we see an increase in α. This may be due to increases in the availability of test kits, which lead to higher reporting rates. However, this result may in part be due to unobserved/unaccounted travel, particularly within the United States, along with the fact that we do not have data on March travel into the U.S. Both of these factors would lead to under-reporting of travel for late March, and cause estimates of α to be upward biased. Moreover, as we progress later into March, social distancing/health policy measures began to be applied across Iceland and the U.S., leading to differential changes in the transmission rate.

Throughout the analysis above, we have excluded King County, Washington which contains Seattle. Table A.2 displays our estimates including this county, which heavily skews the data. We believe this may be due to significant community infections occurring in the county during our time period, as the city was infected much earlier than other cities.

Correct estimation of the reporting lag parameter is essential, as our estimates of α are sensitive to it. We examine the robustness of our estimates to the reporting lag in Table A.3. Our estimates of α appear reasonably robust over a range of lag lengths, increasing as the lag becomes longer.

We note that large lags ($k>10$) pose a problem for estimation in our model. For estimation purposes, we hold $T_1-k$ at a constant date as we consider changes in the lag parameter. This means that for large lags, we must consider $T_1$ dates deep into March. However, the further we get into March, the more interstate travel and carrying of infections between cities and states matters, which may lead to overstating α.

5.4. Heterogeneity in β

The above argument is predicated on β being constant across Iceland and the United States. However, along with potential differences in normal social interaction patterns leading to virus transmission, Icelandic and U.S. policies for handling the spread of the virus may have diverged significantly. This means that our assumption of β being the same across the U.S. and Iceland may not hold. Evidence from Kucharski et al. (2020) suggests that β is very sensitive to changes in policy, which could lead to an upward bias in α.

In Appendix A.4, we attempt to relax that assumption. In our model, β cannot be directly estimated from the growth of the infected trajectory due to the majority of infected at the start arriving from travel. The details for our estimation procedure are given in Appendix. A major problem with this approach is that it requires variation in travel over time which we do not measure well. In essence, we use the evolution of the infected in each country over time to learn about how infectious travelers are once they have arrived. With constant daily travel, this variation is poor and makes identification of β difficult.

As a coarse estimate, we find that the difference in β is about .06. The estimate of α resulting from this procedure yields a value of .007, suggesting that our estimates using the assumption of same β may be upward biased. However, we note our monthly travel data does not allow for variation in travel over time, making it difficult to estimate β in a robust manner, so any bias in measuring travel leads to bias in α. We thus note that our main estimates of α may be upward biased, suggesting that there may be more total infections than we have estimated. However, better travel data showing adequate variation in time is required for reliable estimation of separate βs in our model.

5.5. Infection fatality rate

Given our estimated reporting rate over the time period of interest, we can compute the implied infection fatality rate (IFR). Let $D_t$ be the number of deaths observed on day $t$. Let $I_t$ be the number of cumulative total infections on day $t$. Let $p$ be the period from illness onset to death, i.e. the fatality lag. Let $f(p)$ be the density of $p$ over $[0,P]$ with mean $\bar{p}$.

We consider two alternative definitions of IFR:

  1. Cumulative Infection Fatality Rate (cIFR): The cIFR at $t$ is the ratio of cumulative deaths, adjusted by the mean fatality lag, to cumulative total infections as of $t$.

  2. Cohort-matching Infection Fatality Rate (mIFR): For the cohorts of new infections up to time $t$, the mIFR is the ratio of all deaths attributed to those cohorts to the total size of those cohorts. This method accounts for a random fatality lag rather than a fixed interval.

More specifically, we define

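$$\text{cIFR}_t = \frac{\sum_{s=0}^{t+\bar{p}} D_s}{I_t} \tag{24}$$

$$\text{mIFR}_t = \frac{\sum_{s=0}^{t}\int_0^P D_{s+p}\,f(p)\,dp}{\sum_{s=0}^{t}\left(I_s-I_{s-1}\right)} \tag{25}$$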

We follow Linton et al. (2020) when estimating the distribution of the fatality lag and fit a log-normal distribution. $I_{-1}$ is the number infected on the day before $T_0$, and may be zero.
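A discrete-time sketch of Eqs. (24) and (25) follows. It is an illustration under several assumptions that are not taken from the paper: the fatality-lag density is discretized to whole days and renormalized on a truncated support, deaths beyond the end of the observed series are treated as zero, and $I_{-1}$ is set to zero. The log-normal mean and standard deviation passed in the usage lines (14.5 and 6.7) are the values quoted in the Fig. 2 caption; all other numbers and all function names are hypothetical.

```python
import math

def lognormal_params(mean, sd):
    """Convert the mean and sd of a log-normal variable into the (mu, sigma)
    of the underlying normal distribution."""
    sigma2 = math.log(1 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2, math.sqrt(sigma2)

def lag_weights(mean, sd, max_lag):
    """Discretized fatality-lag probabilities for p = 0..max_lag,
    renormalized to sum to one on the truncated support."""
    mu, sigma = lognormal_params(mean, sd)

    def cdf(x):
        if x <= 0:
            return 0.0
        return 0.5 * (1 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2))))

    raw = [cdf(p + 0.5) - cdf(p - 0.5) for p in range(max_lag + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def cifr(daily_deaths, cum_infections, t, mean_lag):
    """Eq. (24): deaths summed through t + mean lag (rounded to whole days),
    divided by cumulative infections at t."""
    horizon = t + int(round(mean_lag))
    return sum(daily_deaths[: horizon + 1]) / cum_infections[t]

def mifr(daily_deaths, cum_infections, t, weights):
    """Eq. (25): lag-weighted deaths for cohorts 0..t over summed cohort sizes."""
    def deaths_on(day):
        # Days past the end of the observed series contribute zero deaths.
        return daily_deaths[day] if day < len(daily_deaths) else 0.0

    lagged_deaths = sum(
        w * deaths_on(s + p) for s in range(t + 1) for p, w in enumerate(weights)
    )
    # New infections per day; I_{-1} is assumed to be zero here.
    new_infections = [cum_infections[0]] + [
        cum_infections[s] - cum_infections[s - 1] for s in range(1, t + 1)
    ]
    return lagged_deaths / sum(new_infections)

# Hypothetical series with all lag mass placed on exactly two days.
daily_deaths = [0, 0, 1, 0, 2]
cum_infections = [100, 200, 400]
example_cifr = cifr(daily_deaths, cum_infections, t=2, mean_lag=2)
example_mifr = mifr(daily_deaths, cum_infections, t=2, weights=[0.0, 0.0, 1.0])

# Lag distribution using the mean and sd quoted in the Fig. 2 caption.
weights = lag_weights(14.5, 6.7, max_lag=60)
print(example_cifr, example_mifr)
```

With a degenerate lag the two definitions coincide in this toy example; they differ once the lag distribution has spread, because the mIFR weights each day's deaths by the probability that they belong to an earlier cohort.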

We estimate cIFR and mIFR for each U.S. county in our data set, computing death rates as of March 13. County-level results are shown in Table 2; we have estimated the cIFR and mIFR using both county-specific α estimates and the national estimate of 4.27%. We estimate that the median cohort-matching infection fatality rate (mIFR) across U.S. counties is 0.28%, and that the median cIFR is also 0.28%.

Fig. 2 shows the results for each U.S. county. Wayne MI, where Detroit is located, has a low estimated number of infections, potentially due to low travel inflow, and therefore a high estimated mIFR.

Fig. 2.

Reported deaths per estimated infections by county.

This plot shows the ratio of cohort-matched confirmed deaths to estimated infected in each county on Mar 13th, taking into account an 8 day lag in reporting. We use the full European Entry. We estimate fatality-lag time as a log-normal distribution with mean 14.5 and standard deviation 6.7. Estimated infected are computed for each daily cohort using the National-α of 4.27%. We omit New York City for visual clarity.

How do these estimates compare to estimates of the IFR in the literature? Russell et al. (2020) estimate a measure comparable to the cIFR of 1.2% (see footnote 19) with complete testing data on the Diamond Princess cruise ship, using a population that is weighted towards the elderly. Since the elderly have a substantially higher case-fatality rate, this suggests the real cIFR may be lower than 1.2%.

Using the estimates of the true infection rate from their representative antibody testing study, Bendavid et al. (2020) compute a cumulative infection fatality rate with projected deaths accounting for a 3 week fatality lag, and obtain a 0.12–0.2% infection fatality rate. Accounting for lags, our cumulative infection fatality rates for San Francisco County are 0.17–0.20%, which lie at the upper bound of their estimates.

Streeck et al. (2020) estimate an IFR of 0.36% (95% CI: [0.29%, 0.45%]) with a randomized testing study in a German town. Our estimates of the median mIFR (0.28%) and cIFR (0.28%) with the national reporting rate fall approximately within the 95% CI of their estimates. With county-specific reporting rates, our estimated median mIFR (0.28%) and cIFR (0.31%) are even closer to their estimates.

While our estimates of the IFR are substantially lower than reported case fatality rates, we note that there is still substantial variation in our estimated IFRs across counties in the early stages of the epidemic. This means that results from a single county, regardless of the quality of the methodology, may not be indicative of the IFR elsewhere. Along with many other factors, the variation in demographic composition, and the variation in the quality and capacity of the health care sector, can lead to drastically different infection fatality rates across the country.

6. Conclusion

In this paper, we lay out an analytically tractable model of early-period disease transmission across a known epicenter and target cities. Using this model, we provide analytical arguments to demonstrate identification of reporting rates in target cities away from the epicenter. Our preferred estimation strategy utilizes variation of travel patterns from epicenter to destination cities and available randomized testing results from elsewhere in the world. The empirical implementation of our model generates a range of estimates for the percentage of infections that have been reported. Using international travel data to the U.S. and randomized testing data from Iceland, for a February to early March window, we estimate an average reporting rate in the U.S. of 4.3%. This estimate leads to an estimated median infection fatality rate of 0.28–0.31% across U.S. counties. Our estimates suggest that a large number of infections in the U.S. have not been reported in this early period.

We are not offering or endorsing any policy recommendations based on our estimates. Nor do we suggest that any of our analysis should be taken as a substitute for well designed randomized/universal testing programs, which will provide the most reliable estimates of the true infection rate in the population. However, we believe our method can be useful towards providing estimates of unreported infections when results of randomized testing studies are not available for a given location of interest.

Another aim in this paper has been to obtain tractable analytic results showing how to identify the reporting rate from available data. Our model is a substantially stripped down version of epidemiological models considered by Li et al., 2020b, Wu et al., 2020, Flaxman et al., 2020. These more complex models may allow additional sources of variation in the data to pin down the key parameters of interest. Importantly, we do want to emphasize that our identification and estimation results rely quite sensitively on model assumptions and the (un)availability of high quality data on travel. We hope future research can improve on these important limitations.

Acknowledgments

We thank the Becker Friedman Institute, United States of America for financial support. We also thank Fernando Alvarez, Susan Athey, Patrick Bayer, Jaroslav Borovicka, Rana Choi, Liran Einav, Jeremy Fox, Mikhail Golosov, Austan Goolsbee, Philip Haile, Jakub Kastl, Magne Mogstad, Casey Mulligan, Derek Neal, Robert Shimer, Jose Scheinkman, Chad Syverson, Raphael Thomadsen, Harald Uhlig, Theodore Vassilakis, and Alessandra Voena for their helpful comments.

Footnotes

4

Of course, another strategy is to assume that the reporting rate discovered through randomized testing in Iceland is the same in the destination city/country of interest.

5

A shorter assumed reporting lag of e.g. 5 days generates a range of estimated reporting rates between 1.5% and 10%.

6

The estimates use travel data from China, Italy, Spain, Germany, and the UK. We have excluded King county, Washington in these results because this county, containing Seattle, shows much earlier community infections than other regions in U.S.

7

For example, Li et al. (2020b) estimate different transmission rates for reported vs. unreported infections, which we are unable to identify with our strategy. Li et al. (2020b) assume that unreported infected individuals transmit the disease at a slower rate than reported infected individuals. However, since most reported infections are either hospitalized or self-quarantined, it is not clear whether this assumption is a priori reasonable.

8

Bogoch et al. (2020) and Lai et al. (2020) calculate how vulnerable countries are to the virus by the magnitude of travelers from Wuhan, and correlate these vulnerability/risk measures with reported cases in these countries.

9

We assume that the magnitude of unreported infected among travelers is inconsequential to the dynamics of the virus in the epicenter because the virus has progressed for a substantial period of time in the epicenter.

10

In reality α may be varying over time, due to e.g. changes in the extensiveness of testing. If this is the case, as we vary the [T0,T1] window, we will obtain window-specific estimates of α, which can be thought of as a weighted average of α during this period.

11

Because αc is unobserved and not a parameter of interest to be estimated, we define it in terms of the true Rc,t and Ic,t for analytical simplicity. This modeling choice does not affect the analysis since αc does not affect our estimates of α once randomized testing is introduced.

12

Assuming $E(\epsilon_{i,T_1}\mid I_{i,T_1})=0$ is equivalent to assuming $E(\epsilon_{i,T_1}I_{i,T_1})=0$ and $E(\epsilon_{i,T_1})=0$. We assume $E(\epsilon_{i,T_1})=0$ because there is no constant term in Eq. (4).

13

There is another interpretation of ϵ: measurement error. We observe in data $\hat{R}_{i,t}=\alpha I_{i,t}+\epsilon_{i,t}$ where $\epsilon_{i,t}$ is idiosyncratic measurement error such that $E(\epsilon_{i,t}\mid I_{i,t})=0$. Eq. (4) still holds in this specification. Both interpretations lead to the same estimator.

14

This is likely to be true in epicenters such as China.

15

We will describe the randomized testing in detail in Section 4.

17

Since $\exp(\beta(T_1-T_0))\,\gamma\frac{1-\alpha_c}{\alpha_c}$ is measured without error from Iceland, we report the robust standard error for α from running the regression on Eq. (16). For the case when Iceland is measured with error, Fieller’s method or the delta method can be used to compute the standard errors.

18

Stock et al. (2020) also estimate the undetected rate and total infection rate in the Iceland study.

19

Emery et al. (2020) note 50% of infections on the ship went undetected, so the cIFR may be closer to .6%.

20

We still maintain the assumption of constant susceptible population throughout March. Growth rate of susceptible population depends on the ratio of infected over total population. Since the number of infected as a fraction of total population is still small in March, we deem constant susceptible population as a reasonable assumption.

Appendix.

A.1. Tables

See Table A.1, Table A.2, Table A.3.

Table A.1.

Summary statistics of fraction of reported infections by county.

| Version | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| China Travel Only | 0.001404 | 0.027434 | 0.037345 | 0.060896 | 0.100897 | 0.203500 |
| China and Italian Travel | 0.001254 | 0.025142 | 0.034784 | 0.055746 | 0.094906 | 0.183048 |
| China and EU Travel | 0.001286 | 0.025915 | 0.036020 | 0.057454 | 0.097979 | 0.187902 |

Summary statistics reported on the distribution of α for each county in the data. α is estimated for T0 Feb 23, T1 Mar 13, and a lag of eight days. EU travel includes traveler data from Italy, Spain, UK, and Germany.

Table A.2.

Reporting rate (α) estimates including King County.

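| Lag | Travel data | $\alpha_{US}$ | $(1-\alpha_{US})/\alpha_{US}$ |
|---|---|---|---|
| 5 day | China and Italy Travel Data | 0.0200 | 48.9 |
| 5 day | Only Chinese Travel Data | 0.0211 | 46.5 |
| 5 day | China, Italy, Spain, Germany, UK | 0.0204 | 48.1 |
| 8 day | China and Italy Travel Data | 0.0486 | 19.6 |
| 8 day | Only Chinese Travel Data | 0.0539 | 17.6 |
| 8 day | China, Italy, Spain, Germany, UK | 0.0500 | 19.0 |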

We report estimated α by OLS without a constant for several specifications of the model. We use T0 as Feb 23 and T1 as Mar 10 and Mar 13 for the 5-day and 8-day lags, respectively. For the versions including European data, European travel to both Iceland and the United States is considered. King County is included in this data.

Table A.3.

Robustness to Lag.

| Lag | α |
|---|---|
| 0 | 0.0055 |
| 1 | 0.0086 |
| 2 | 0.00927 |
| 3 | 0.00984 |
| 4 | 0.0121 |
| 5 | 0.0164 |
| 6 | 0.028 |
| 7 | 0.0284 |
| 8 | 0.0427 |
| 9 | 0.0675 |
| 10 | 0.0699 |
| 11 | 0.113 |
| 12 | 0.194 |

This table shows estimates of α as the reporting lag period is varied. We use T0 as Feb 23, and T1 as March 5 + Lag days. King County is omitted. We include Italy, Spain, Germany, and the United Kingdom as epicenters as well as China.

A.2. Profile travel data

The United States National Travel and Tourism Office also provides profile travel data for each of our epicenter countries. As part of this data, each U.S. port of entry is listed with population weights for each year. We assume stationarity over time, and estimate travel into the United States from each epicenter by weighting the total travel data for each country in 2020 by the 2019 port-specific weights. We believe there may be flaws in this data, since several large ports appear to have zero travel. We attempt to resolve this by dropping these ports and continuing with the ports for which we have data. However, dropping these ports may downward-bias our estimates of α.
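The weighting step described above can be sketched as follows. This is a minimal illustration only: the port names, shares, and traveler total are hypothetical, and the choice not to renormalize the remaining weights after dropping zero-travel ports is an assumption of the sketch, not something the text specifies.

```python
# Hypothetical illustration: port names and all numbers are made up.
def allocate_travel(total_travel_2020, port_weights_2019):
    """Split a country's 2020 traveler total across ports using 2019 shares.

    Ports whose 2019 weight is zero are dropped. The remaining weights are
    NOT renormalized here (an assumption of this sketch).
    """
    kept = {port: w for port, w in port_weights_2019.items() if w > 0}
    return {port: total_travel_2020 * w for port, w in kept.items()}


weights_2019 = {"Port A": 0.5, "Port B": 0.3, "Port C": 0.0, "Port D": 0.2}
travel_2020 = allocate_travel(10_000, weights_2019)
print(travel_2020)  # "Port C" is absent because its weight is zero
```

Regions served only by a dropped port end up with no travel regressor at all, which is one way to read the county exclusions listed under Table A.4.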

Results below are for the model specification with T1 at Mar 13 and an 8 day reporting lag. We exclude Seattle, WA from all of the estimations below (see Table A.4 and Table A.5).

Our 8-day lag results for the full European entry are similar to before; however, results using only the Chinese data show a severe downward bias relative to the earlier estimates. This leads to higher total infected predictions and slightly lower death rates, particularly using the cohort-matching technique. We note that our fit to estimated infected with this data is significantly worse than before (see Fig. A.1).

Fig. A.1.


Estimated infected and deaths using profile travel.

The estimated infected plot shows the ratio of confirmed cases to estimated infected in each county on T1 = March 13. T0 is February 23. We use the full European entry. Estimated infected are given by the reported infections on T1 + lag, divided by α. We use an eight-day lag.

The second plot shows the ratio of cohort-matched confirmed deaths to estimated infected in each county on Mar 13th, taking into account an 8 day lag in reporting. We use the full European Entry. We estimate fatality-lag time as a log-normal distribution with mean 14.5 and standard deviation 6.7. Estimated infected are computed for each daily cohort using the National-α of 4.09%. We omit New York City for visual clarity.
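The fatality-lag distribution in the caption can be made concrete with a short sketch. It assumes that 14.5 and 6.7 are the mean and standard deviation (in days) of the lag itself rather than of its logarithm, and it discretizes the distribution by day; both choices are assumptions of this illustration, and the function names are hypothetical.

```python
import math


def lognormal_params(mean, sd):
    """Convert the mean and sd of a log-normal variable to (mu, sigma) of its log."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)


def lognormal_cdf(x, mu, sigma):
    """CDF of a log-normal distribution with log-scale parameters mu, sigma."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def daily_death_weights(mu, sigma, max_days):
    """Probability that the infection-to-death lag falls in day d, d = 1..max_days."""
    return [lognormal_cdf(d, mu, sigma) - lognormal_cdf(d - 1, mu, sigma)
            for d in range(1, max_days + 1)]


mu, sigma = lognormal_params(14.5, 6.7)
weights = daily_death_weights(mu, sigma, 60)
# Sanity check: the implied mean should reproduce the 14.5 days in the caption.
implied_mean = math.exp(mu + 0.5 * sigma ** 2)
print(round(implied_mean, 6), round(sum(weights), 4))
```

Given such weights, the deaths expected on a given day from earlier infection cohorts are a weighted sum of those cohorts' sizes times the fatality rate, which is the idea behind matching deaths to cohorts.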

Table A.4.

Reporting rate (α) using profile travel data.

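| Lag | Travel data | $\alpha_{US}$ | $\frac{1-\alpha_{US}}{\alpha_{US}}$ |
|---|---|---|---|
| 5 day lag | China and Italy Travel Data | 0.0108 | 91.3 |
| 5 day lag | Only Chinese Travel Data | 0.0106 | 93.6 |
| 5 day lag | China, Italy, Spain, Germany, UK | 0.0112 | 88.1 |
| 8 day lag | China and Italy Travel Data | 0.0385 | 25.0 |
| 8 day lag | Only Chinese Travel Data | 0.0330 | 29.3 |
| 8 day lag | China, Italy, Spain, Germany, UK | 0.0409 | 23.4 |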

We report estimated α by OLS without a constant for several specifications of the model. We use T0 as Feb 23 and T1 as Mar 10 and Mar 13 for the 5-day and 8-day lags, respectively. For the versions including European data, European travel to both Iceland and the United States is considered. Hamilton, OH; Allegheny, PA; Multnomah, OR; Seminole, FL; Santa Clara, CA; and Baltimore, MD counties are all excluded because they show zero travel.

Table A.5.

Estimated county-level α and IFR using profile travel data.

| County | α | UR | U-A (National α = 4.09%) | mIFR (National α = 4.09%) | cIFR (National α = 4.09%) | U-A (County α) | mIFR (County α) | cIFR (County α) |
|---|---|---|---|---|---|---|---|---|
| Broward, FL | 31.13% | 2.21 | 364 | 0.16% | 0.25% | 48 | 1.20% | 1.90% |
| Clark, NV | 12.69% | 6.88 | 192 | 0.32% | 0.32% | 62 | 0.98% | 1.01% |
| Cook, IL | 6.54% | 14.30 | 335 | 0.20% | 0.21% | 210 | 0.32% | 0.33% |
| Dallas, TX | 3.20% | 30.27 | 290 | 0.33% | 0.39% | 371 | 0.26% | 0.30% |
| Denver, CO | 38.02% | 1.63 | 158 | 0.13% | 0.13% | 17 | 1.23% | 1.18% |
| Essex, NJ | 0.45% | 222.55 | 871 | 0.36% | 0.27% | 7973 | 0.04% | 0.03% |
| Fulton, GA | 3.27% | 29.58 | 302 | 0.27% | 0.50% | 378 | 0.21% | 0.40% |
| Harris, TX | 5.97% | 15.76 | 185 | 0.14% | 0.12% | 127 | 0.21% | 0.18% |
| Honolulu, HI | 1.03% | 96.40 | 342 | 0.02% | <.01% | 1364 | <.01% | <.01% |
| Los Angeles, CA | 1.89% | 51.89 | 178 | 0.35% | 0.36% | 386 | 0.16% | 0.17% |
| Mecklenburg, NC | 2.18% | 44.89 | 1881 | 0.02% | <.01% | 3533 | 0.01% | <.01% |
| Miami-Dade, FL | 0.32% | 307.31 | 2064 | 0.04% | 0.05% | 26053 | <.01% | <.01% |
| New York City, NY | 6.69% | 13.95 | 1194 | 0.63% | 0.75% | 731 | 1.03% | 1.22% |
| Philadelphia, PA | 4.04% | 23.75 | 692 | 0.16% | 0.24% | 701 | 0.16% | 0.24% |
| Ramsey, MN | 10.02% | 8.98 | 138 | 0.40% | 0.48% | 57 | 0.98% | 1.18% |
| San Francisco, CA | 2.23% | 43.80 | 89 | 0.13% | 0.19% | 164 | 0.07% | 0.11% |
| Suffolk, MA | 6.52% | 14.34 | 101 | 0.11% | 0.11% | 64 | 0.17% | 0.18% |
| Wayne, MI | 0.86% | 115.87 | 1246 | 1.74% | 1.85% | 5960 | 0.36% | 0.39% |
| Median | 3.66% | 26.67 | 318.5 | 0.18% | 0.24% | 374.5 | 0.21% | 0.27% |

We estimate each county’s death rate on March 13 using several measures of both the infection fatality rate and estimated infected. α is the estimated fraction of reported infections for each county accounting for an 8-day lag. $UR = \frac{1-\alpha}{\alpha}$ gives the ratio of unreported to reported infections for that county, again accounting for the fact that observed reported infections have an 8 day lag. U-A gives the under-ascertainment rate on Mar 13, given by the total infected on Mar 13 (accounting for lag) divided by the reported infected on Mar 13. mIFR matches cohorts of infected using a log-normal fatality lag distribution to determine the death rate, and cIFR compares cumulative deaths 15 days later. All of these calculations are based on estimating the number of total infected on March 13 by reported infected on March 21 divided by α, to account for an 8-day reporting lag. National α is taken from Table A.4, full EU travel. County α uses each county’s individually computed α rather than the nationally computed α. <.01% indicates positive numbers that round down to 0.00%.
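The arithmetic behind the UR column and the estimated-infected step can be sketched in a few lines. The function names are hypothetical, and the reported-case count used for `estimated_infected` is made up for illustration; the α values are taken from Table A.4 and Table A.5.

```python
def unreported_ratio(alpha):
    """UR = (1 - alpha) / alpha: unreported infections per reported infection."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return (1.0 - alpha) / alpha


def estimated_infected(reported_after_lag, alpha):
    """Total infected on a date: cases reported one lag later, divided by alpha."""
    return reported_after_lag / alpha


ur_broward = unreported_ratio(0.3113)    # county alpha for Broward, FL
ur_national = unreported_ratio(0.0409)   # national alpha, full EU travel, 8 day lag
total = estimated_infected(409, 0.0409)  # 409 is a hypothetical reported count

print(round(ur_broward, 2), round(ur_national, 1), round(total))
```

The two ratios reproduce the 2.21 in Table A.5 and the 23.4 in Table A.4. Other rows can differ in the last digit because the table's α values are themselves rounded.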

A.3. Lags

In this appendix, we derive our model incorporating reporting lags and show how we obtain the estimating equations in Section 3.3.

Recall that we denote true infected, true reported infected, and true unreported infected at time $t$ in target city $i$ by $I_{i,t}$, $R_{i,t}$, and $U_{i,t}$, respectively, and the corresponding quantities for epicenter $c$ by $I_{c,t}$, $R_{c,t}$, and $U_{c,t}$. Let $k$ be the reporting lag. At time $t$, the lagged reported infected in city $i$ is $LR_{i,t} = R_{i,t-k}$. For epicenter $c$, the lagged reported infected is $LR_{c,t} = R_{c,t-k}$.

Define the reporting rate in city $i$ as $\alpha = E\left[\frac{R_{i,t-k}}{I_{i,t-k}} \,\middle|\, I_{i,t-k}\right] = E\left[\frac{LR_{i,t}}{I_{i,t-k}} \,\middle|\, I_{i,t-k}\right]$ and in epicenter $c$ as $\alpha_c = \frac{R_{c,t-k}}{I_{c,t-k}} = \frac{LR_{c,t}}{I_{c,t-k}}$. That is, we consider the reporting rate of lagged reported cases relative to lagged total infections.

We know that in the epicenter c, we have the following:

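$$I_{c,t} = I_{c,0}\exp(\beta(t - T_0)) \tag{A.1}$$

$$R_{c,t} = \alpha_c I_{c,t} \tag{A.2}$$

$$U_{c,t} = (1-\alpha_c) I_{c,t} \tag{A.3}$$

$$\phantom{U_{c,t}} = (1-\alpha_c) I_{c,0}\exp(\beta(t - T_0)) \tag{A.4}$$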

When only travel data is available, our Assumption 3.1 is

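$$\frac{I^{inc}_{i,t}}{M_{i,t}} = \frac{U_{c,t}}{N_c - R_{c,t}} \quad \text{for any time } t \in [T_0, T_1], \text{ region } i \text{ and epicenter } c \tag{A.5}$$

Applying Eq. (A.4) and solving for $I^{inc}_{i,t}$ gives

$$I^{inc}_{i,t} = \frac{M_{i,t}}{N_c - R_{c,t}} (1-\alpha_c) I_{c,0}\exp(\beta(t - T_0)) \tag{A.6}$$

In city $i$, at time $T_1$ we observe $LR_{i,T_1}$. We have

$$I_{i,T_1-k} = \int_{T_0}^{T_1-k} I^{inc}_{i,t}\exp(\beta(T_1 - k - t))\,dt$$

$$\begin{aligned}
LR_{i,T_1} &= \alpha I_{i,T_1-k} + \epsilon_{i,T_1} \\
&= \alpha \int_{T_0}^{T_1-k} I^{inc}_{i,t}\exp(\beta(T_1 - k - t))\,dt + \epsilon_{i,T_1} \\
&= \alpha \int_{T_0}^{T_1-k} \frac{M_{i,t}}{N_c - R_{c,t}} (1-\alpha_c) I_{c,0}\exp(\beta(t - T_0))\exp(\beta(T_1 - k - t))\,dt + \epsilon_{i,T_1} \\
&= \alpha (1-\alpha_c) I_{c,0}\exp(\beta(T_1 - T_0 - k)) \int_{T_0}^{T_1-k} \frac{M_{i,t}}{N_c - R_{c,t}}\,dt + \epsilon_{i,T_1} \\
&= \alpha (1-\alpha_c) \frac{R_{c,0}}{\alpha_c}\exp(\beta(T_1 - T_0 - k)) \int_{T_0}^{T_1-k} \frac{M_{i,t}}{N_c - R_{c,t}}\,dt + \epsilon_{i,T_1} \\
&= \alpha \frac{1-\alpha_c}{\alpha_c}\exp(\beta(T_1 - T_0 - k))\, LR_{c,k} \int_{T_0}^{T_1-k} \frac{M_{i,t}}{N_c - R_{c,t}}\,dt + \epsilon_{i,T_1}
\end{aligned}$$

For $\alpha = \alpha_c$, this equation simplifies to:

$$LR_{i,T_1} = (1-\alpha)\exp(\beta(T_1 - T_0 - k))\, LR_{c,k} \int_{T_0}^{T_1-k} \frac{M_{i,t}}{N_c - R_{c,t}}\,dt + \epsilon_{i,T_1} \tag{A.7}$$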
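The key step in the derivation above is that the growth term from the epicenter, $\exp(\beta(t - T_0))$, and the growth term in the target city, $\exp(\beta(T_1 - k - t))$, multiply to a constant that does not depend on $t$, so it can be pulled out of the integral. The sketch below checks this numerically with a trapezoid rule. All parameter values and the travel-share function are hypothetical and chosen only for illustration.

```python
import math


def trapezoid(f, a, b, n=2000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for step in range(1, n):
        total += f(a + step * h)
    return total * h


# Illustrative values only: T0 = day 0 (Feb 23), T1 = day 19 (Mar 13), 8-day lag.
beta, T0, T1, k = 0.2, 0.0, 19.0, 8.0


def travel_share(t):
    """Hypothetical stand-in for M_{i,t} / (N_c - R_{c,t})."""
    return 1e-4 * (1.0 + 0.05 * t)


# Integrand before simplification (third line of the chain of equalities).
lhs = trapezoid(
    lambda t: travel_share(t)
    * math.exp(beta * (t - T0))
    * math.exp(beta * (T1 - k - t)),
    T0, T1 - k,
)
# After pulling exp(beta * (T1 - T0 - k)) out of the integral.
rhs = math.exp(beta * (T1 - T0 - k)) * trapezoid(travel_share, T0, T1 - k)

print(math.isclose(lhs, rhs, rel_tol=1e-9))
```

Because the product of the two exponentials is constant in $t$, the regressor in Eq. (A.7) depends on the data only through the travel integral, which is what makes a linear regression without a constant feasible.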

When both travel data and randomized testing data are available, we maintain Assumption 3.2:

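$$\frac{I^{inc}_{i,t}}{M_{i,t}} = \gamma\frac{U_{c,t}}{N_c - R_{c,t}} \quad \text{for any time } t \in [T_0, T_1], \text{ region } i \text{ and epicenter } c \tag{A.8}$$

We can write it as

$$I^{inc}_{i,t} = \gamma\frac{M_{i,t}}{N_c - R_{c,t}} (1-\alpha_c) I_{c,0}\exp(\beta(t - T_0)) \tag{A.9}$$

In U.S. city $i$, at time $T_1$ we observe $LR_{i,T_1}$. Following the same derivation as above, we have

$$LR_{i,T_1} = \alpha\gamma\frac{1-\alpha_c}{\alpha_c}\exp(\beta(T_1 - T_0 - k))\, LR_{c,k} \int_{T_0}^{T_1-k} \frac{M_{i,t}}{N_c - R_{c,t}}\,dt + \epsilon_{i,T_1} \tag{A.10}$$

A similar derivation shows that for region $j$ in Iceland at time $T_1$, we have

$$I_{j,T_1-k} = \gamma\frac{1-\alpha_c}{\alpha_c}\exp(\beta(T_1 - T_0 - k))\, LR_{c,k} \int_{T_0}^{T_1-k} \frac{M_{j,t}}{N_c - R_{c,t}}\,dt \tag{A.11}$$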

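We also consider an alternative definition of the reporting rate, namely lagged reported cases as a fraction of current total infections, i.e. $\tilde{\alpha}_c = \frac{LR_{c,t}}{I_{c,t}} = \frac{R_{c,t-k}}{I_{c,t}}$ and $\tilde{\alpha} = E\left[\frac{LR_{i,t}}{I_{i,t}} \,\middle|\, I_{i,t}\right] = E\left[\frac{R_{i,t-k}}{I_{i,t}} \,\middle|\, I_{i,t}\right]$.

In the epicenter, we have

$$I_{c,t} = I_{c,0}\exp(\beta(t - T_0)) \tag{A.12}$$

$$R_{c,t} = \tilde{\alpha}_c I_{c,t+k} \tag{A.13}$$

$$R_{c,t-k} = \tilde{\alpha}_c I_{c,t} \tag{A.14}$$

$$R_{c,-k} = \tilde{\alpha}_c I_{c,0} \tag{A.15}$$

$$U_{c,t} = I_{c,t} - R_{c,t} \tag{A.16}$$

$$\phantom{U_{c,t}} = I_{c,t} - \tilde{\alpha}_c I_{c,t+k} \tag{A.17}$$

$$\phantom{U_{c,t}} = I_{c,0}\exp(\beta(t - T_0)) - \tilde{\alpha}_c I_{c,0}\exp(\beta(t + k - T_0)) \tag{A.18}$$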

When only travel data is available, our Assumption 3.1 is

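$$\frac{I^{inc}_{i,t}}{M_{i,t}} = \frac{U_{c,t}}{N_c - R_{c,t}} \quad \text{for any time } t \in [T_0, T_1], \text{ region } i \text{ and epicenter } c \tag{A.19}$$

Then we have

$$I^{inc}_{i,t} = \frac{M_{i,t}}{N_c - R_{c,t}} \left[ I_{c,0}\exp(\beta(t - T_0)) - \tilde{\alpha}_c I_{c,0}\exp(\beta(t + k - T_0)) \right] \tag{A.20}$$

In the target city, at time $T_1$, it is known that $E\left[LR_{i,T_1} \mid I_{i,T_1}\right] = \tilde{\alpha} I_{i,T_1}$. We know

$$I_{i,T_1} = \int_{T_0}^{T_1} I^{inc}_{i,t}\exp(\beta(T_1 - t))\,dt$$

$$\begin{aligned}
LR_{i,T_1} &= \tilde{\alpha} I_{i,T_1} + \epsilon_{i,T_1} \\
&= \tilde{\alpha} \int_{T_0}^{T_1} I^{inc}_{i,t}\exp(\beta(T_1 - t))\,dt + \epsilon_{i,T_1} \\
&= \tilde{\alpha} \int_{T_0}^{T_1} \frac{M_{i,t}}{N_c - R_{c,t}} \left[ I_{c,0}\exp(\beta(t - T_0)) - \tilde{\alpha}_c I_{c,0}\exp(\beta(t + k - T_0)) \right] \exp(\beta(T_1 - t))\,dt + \epsilon_{i,T_1} \\
&= \tilde{\alpha} I_{c,0}\exp(\beta(T_1 - T_0)) \int_{T_0}^{T_1} \frac{M_{i,t}}{N_c - R_{c,t}}\,dt - \tilde{\alpha}\tilde{\alpha}_c I_{c,0}\exp(\beta(T_1 - T_0 + k)) \int_{T_0}^{T_1} \frac{M_{i,t}}{N_c - R_{c,t}}\,dt + \epsilon_{i,T_1} \\
&= \tilde{\alpha}\frac{R_{c,-k}}{\tilde{\alpha}_c}\exp(\beta(T_1 - T_0)) \int_{T_0}^{T_1} \frac{M_{i,t}}{N_c - R_{c,t}}\,dt - \tilde{\alpha}\tilde{\alpha}_c\frac{R_{c,-k}}{\tilde{\alpha}_c}\exp(\beta(T_1 - T_0 + k)) \int_{T_0}^{T_1} \frac{M_{i,t}}{N_c - R_{c,t}}\,dt + \epsilon_{i,T_1} \\
&= \left[ \frac{\tilde{\alpha}}{\tilde{\alpha}_c}\exp(\beta(T_1 - T_0)) - \tilde{\alpha}\exp(\beta(T_1 - T_0 + k)) \right] R_{c,-k} \int_{T_0}^{T_1} \frac{M_{i,t}}{N_c - R_{c,t}}\,dt + \epsilon_{i,T_1} \\
&= \tilde{\alpha} \left[ \frac{1}{\tilde{\alpha}_c}\exp(\beta(T_1 - T_0)) - \exp(\beta(T_1 - T_0 + k)) \right] R_{c,-k} \int_{T_0}^{T_1} \frac{M_{i,t}}{N_c - R_{c,t}}\,dt + \epsilon_{i,T_1} \\
&= \tilde{\alpha} \left[ \frac{1}{\tilde{\alpha}_c}\exp(\beta(T_1 - T_0)) - \exp(\beta(T_1 - T_0 + k)) \right] LR_{c,0} \int_{T_0}^{T_1} \frac{M_{i,t}}{N_c - R_{c,t}}\,dt + \epsilon_{i,T_1}
\end{aligned}$$

If $\tilde{\alpha} = \tilde{\alpha}_c$, then $\tilde{\alpha}$ is identified conditional on $\beta$ and $k$.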

When both travel data and randomized testing data are available, we maintain Assumption 3.2:

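$$\frac{I^{inc}_{i,t}}{M_{i,t}} = \gamma\frac{U_{c,t}}{N_c - R_{c,t}} \quad \text{for any time } t \in [T_0, T_1], \text{ region } i \text{ and epicenter } c \tag{A.21}$$

Then we have

$$I^{inc}_{i,t} = \gamma\frac{M_{i,t}}{N_c - R_{c,t}} \left[ I_{c,0}\exp(\beta(t - T_0)) - \tilde{\alpha}_c I_{c,0}\exp(\beta(t + k - T_0)) \right] \tag{A.22}$$

For U.S. city $i$, we have

$$LR_{i,T_1} = \tilde{\alpha}\gamma \left[ \frac{1}{\tilde{\alpha}_c}\exp(\beta(T_1 - T_0)) - \exp(\beta(T_1 - T_0 + k)) \right] R_{c,-k} \int_{T_0}^{T_1} \frac{M_{i,t}}{N_c - R_{c,t}}\,dt + \epsilon_{i,T_1} \tag{A.23}$$

$$\phantom{LR_{i,T_1}} = \tilde{\alpha}\gamma \left[ \frac{1}{\tilde{\alpha}_c}\exp(\beta(T_1 - T_0)) - \exp(\beta(T_1 - T_0 + k)) \right] LR_{c,0} \int_{T_0}^{T_1} \frac{M_{i,t}}{N_c - R_{c,t}}\,dt + \epsilon_{i,T_1} \tag{A.24}$$

In Iceland region $j$ at time $T_1$, we can compute $I_{j,T_1}$. We have

$$\begin{aligned}
I_{j,T_1} &= \int_{T_0}^{T_1} I^{inc}_{j,t}\exp(\beta(T_1 - t))\,dt \\
&= \int_{T_0}^{T_1} \gamma\frac{M_{j,t}}{N_c - R_{c,t}} \left[ I_{c,0}\exp(\beta(t - T_0)) - \tilde{\alpha}_c I_{c,0}\exp(\beta(t + k - T_0)) \right] \exp(\beta(T_1 - t))\,dt \\
&= \gamma \left[ \exp(\beta(T_1 - T_0)) - \tilde{\alpha}_c\exp(\beta(T_1 - T_0 + k)) \right] I_{c,0} \int_{T_0}^{T_1} \frac{M_{j,t}}{N_c - R_{c,t}}\,dt \\
&= \gamma \left[ \exp(\beta(T_1 - T_0)) - \tilde{\alpha}_c\exp(\beta(T_1 - T_0 + k)) \right] \frac{R_{c,-k}}{\tilde{\alpha}_c} \int_{T_0}^{T_1} \frac{M_{j,t}}{N_c - R_{c,t}}\,dt \\
&= \gamma \left[ \frac{1}{\tilde{\alpha}_c}\exp(\beta(T_1 - T_0)) - \exp(\beta(T_1 - T_0 + k)) \right] R_{c,-k} \int_{T_0}^{T_1} \frac{M_{j,t}}{N_c - R_{c,t}}\,dt \\
&= \gamma \left[ \frac{1}{\tilde{\alpha}_c}\exp(\beta(T_1 - T_0)) - \exp(\beta(T_1 - T_0 + k)) \right] LR_{c,0} \int_{T_0}^{T_1} \frac{M_{j,t}}{N_c - R_{c,t}}\,dt
\end{aligned}$$

Estimating the last equation gives a consistent estimate of

$$\gamma \left[ \frac{1}{\tilde{\alpha}_c}\exp(\beta(T_1 - T_0)) - \exp(\beta(T_1 - T_0 + k)) \right] LR_{c,0}$$

and estimating Eq. (A.24) gives

$$\tilde{\alpha}\gamma \left[ \frac{1}{\tilde{\alpha}_c}\exp(\beta(T_1 - T_0)) - \exp(\beta(T_1 - T_0 + k)) \right] LR_{c,0}$$

Taking the ratio, we are left with $\tilde{\alpha}$.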
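The ratio step can be illustrated with simulated data. The sketch below is not the authors' code: it assumes the regressor in both regressions is the travel integral, uses OLS through the origin (the table notes state that α is estimated by OLS without a constant), and draws all numbers, including the "true" reporting rate and the shared coefficient, from hypothetical values.

```python
import random


def ols_no_constant(x, y):
    """Slope from regressing y on x with the line forced through the origin."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


random.seed(0)

alpha_true = 0.05   # hypothetical reporting rate to be recovered
common = 2.5e4      # hypothetical value of the coefficient shared by both equations

# Hypothetical travel integrals for U.S. regions and Iceland regions.
x_us = [random.uniform(0.1, 2.0) for _ in range(200)]
x_iceland = [random.uniform(0.1, 2.0) for _ in range(50)]

# U.S.: lagged reported infections, observed with idiosyncratic error.
lr_us = [alpha_true * common * x + random.gauss(0.0, 20.0) for x in x_us]
# Iceland: total infections, treated here as measured without error.
i_iceland = [common * x for x in x_iceland]

slope_us = ols_no_constant(x_us, lr_us)
slope_iceland = ols_no_constant(x_iceland, i_iceland)
alpha_hat = slope_us / slope_iceland

print(round(alpha_hat, 3))
```

The shared coefficient cancels in the ratio, so neither $\gamma$, $\tilde{\alpha}_c$, nor $\beta$ has to be known to recover the reporting rate in this setup.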

A.4. Heterogeneous transmission rates

In this section, we show how our model can be modified to allow the transmission rate β to differ among target cities and the epicenter. The transmission rate of the virus could differ across locations due to population density or other reasons (Sajadi et al., 2020). Our identification strategy in Section 3.2 relies on the assumption that transmission rates in Iceland and the U.S. are the same. We relax this assumption in this section. We are able to capture differential transmission rates through the variation in virus evolution trends across locations, controlling for travel.

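Let $\beta_c$ be the transmission rate in the epicenter, $\beta_i$ the rate for U.S. city $i$, and $\beta_j$ the rate for Iceland city $j$. Define $\tilde{\beta}_i = \beta_i - \beta_c$ and $\tilde{\beta}_j = \beta_j - \beta_c$ as relative transmission rates for the U.S. and Iceland. Maintaining Assumption 3.2, for any U.S. city $i$ and end period $T_1$ we have

$$R_{i,T_1} = \alpha \int_{T_0}^{T_1} I^{inc}_{i,t}\exp(\beta_i(T_1 - t))\,dt \tag{A.25}$$

$$I^{inc}_{i,t} = \frac{\gamma (1-\alpha_c) I_{c,0}\exp(\beta_c(t - T_0))}{N_c - R_{c,t}} M_{i,t} \tag{A.26}$$

Therefore, for U.S. city $i$ we have:

$$R_{i,T_1} = \alpha \int_{T_0}^{T_1} \frac{(1-\alpha_c)\gamma I_{c,0}\exp(\beta_c(t - T_0))}{N_c - R_{c,t}} M_{i,t}\exp(\beta_i(T_1 - t))\,dt \tag{A.27}$$

$$\phantom{R_{i,T_1}} = \alpha\frac{1-\alpha_c}{\alpha_c} R_{c,0}\gamma \int_{T_0}^{T_1} \exp(\beta_c(t - T_0))\exp(\beta_i(T_1 - t)) \frac{M_{i,t}}{N_c - R_{c,t}}\,dt \tag{A.28}$$

$$\phantom{R_{i,T_1}} = \alpha\frac{1-\alpha_c}{\alpha_c} R_{c,0}\gamma\exp(\beta_i T_1 - \beta_c T_0) \int_{T_0}^{T_1} \exp(t(\beta_c - \beta_i)) \frac{M_{i,t}}{N_c - R_{c,t}}\,dt \tag{A.29}$$

Similarly, for Iceland city $j$ we have:

$$I_{j,T_1} = \frac{1-\alpha_c}{\alpha_c} R_{c,0}\gamma\exp(\beta_j T_1 - \beta_c T_0) \int_{T_0}^{T_1} \exp(t(\beta_c - \beta_j)) \frac{M_{j,t}}{N_c - R_{c,t}}\,dt \tag{A.30}$$

Taking logs and differencing, we get

$$\begin{aligned}
\log R_{i,T_1} - \log I_{j,T_1} &= \log\alpha + (\beta_i - \beta_j) T_1 + \log \int_{T_0}^{T_1} \exp(t(\beta_c - \beta_i)) \frac{M_{i,t}}{N_c - R_{c,t}}\,dt - \log \int_{T_0}^{T_1} \exp(t(\beta_c - \beta_j)) \frac{M_{j,t}}{N_c - R_{c,t}}\,dt \\
&= \log\alpha + (\tilde{\beta}_i - \tilde{\beta}_j) T_1 + \log \int_{T_0}^{T_1} \exp(-t\tilde{\beta}_i) \frac{M_{i,t}}{N_c - R_{c,t}}\,dt - \log \int_{T_0}^{T_1} \exp(-t\tilde{\beta}_j) \frac{M_{j,t}}{N_c - R_{c,t}}\,dt
\end{aligned}$$

Assume that we observe $\log R_{i,t}$ and $\log I_{j,t}$ with idiosyncratic measurement errors $\eta_{i,t}$ and $\eta_{j,t}$, with $E[\eta_{\cdot,t}] = 0$. Denote these observations as $\log\hat{R}_{i,t}$ and $\log\hat{I}_{j,t}$. Let $u_{i,j,t} = \eta_{i,t} - \eta_{j,t}$, so $u_{i,j,t}$ is also iid.

$$\begin{aligned}
\log\hat{R}_{i,T_1} - \log\hat{I}_{j,T_1} &= \log R_{i,T_1} - \log I_{j,T_1} + \eta_{i,T_1} - \eta_{j,T_1} \\
&= \log R_{i,T_1} - \log I_{j,T_1} + u_{i,j,T_1} \\
&= \log\alpha + (\tilde{\beta}_i - \tilde{\beta}_j) T_1 + \log \int_{T_0}^{T_1} \exp(-\tilde{\beta}_i t) \frac{M_{i,t}}{N_c - R_{c,t}}\,dt - \log \int_{T_0}^{T_1} \exp(-\tilde{\beta}_j t) \frac{M_{j,t}}{N_c - R_{c,t}}\,dt + u_{i,j,T_1}
\end{aligned}$$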
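The right-hand side of the log-difference equation can be evaluated numerically for given parameter values, which is the building block a nonlinear least squares routine would need. The sketch below takes the relative rates to be $\tilde{\beta}_i = \beta_i - \beta_c$, so that $\exp(t(\beta_c - \beta_i)) = \exp(-t\tilde{\beta}_i)$. The function names, the trapezoid-rule integration, and all numerical values are assumptions made for illustration.

```python
import math


def log_travel_integral(beta_rel, travel_share, T0, T1, n=2000):
    """log of the trapezoid-rule value of
    int_{T0}^{T1} exp(-beta_rel * t) * travel_share(t) dt."""
    h = (T1 - T0) / n

    def integrand(t):
        return math.exp(-beta_rel * t) * travel_share(t)

    total = 0.5 * (integrand(T0) + integrand(T1))
    total += sum(integrand(T0 + step * h) for step in range(1, n))
    return math.log(total * h)


def predicted_log_gap(alpha, beta_rel_i, beta_rel_j, share_i, share_j, T0, T1):
    """Model-implied log R_{i,T1} - log I_{j,T1}, without measurement error."""
    return (
        math.log(alpha)
        + (beta_rel_i - beta_rel_j) * T1
        + log_travel_integral(beta_rel_i, share_i, T0, T1)
        - log_travel_integral(beta_rel_j, share_j, T0, T1)
    )


def constant_share(t):
    """Hypothetical constant M_t / (N_c - R_{c,t})."""
    return 1e-4


# With identical relative rates and identical travel, only log(alpha) remains.
gap_equal = predicted_log_gap(0.05, 0.03, 0.03, constant_share, constant_share, 0.0, 19.0)
# With a faster relative rate in city i, the predicted gap is larger.
gap_faster = predicted_log_gap(0.05, 0.09, 0.03, constant_share, constant_share, 0.0, 19.0)
print(gap_equal, gap_faster)
```

When the two locations share the same relative rate and travel profile, every term except $\log\alpha$ cancels, which matches the homogeneous case.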

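With a $k$ period lag, we have for U.S. city $i$:

$$\begin{aligned}
LR_{i,T_1} &= \alpha \int_{T_0}^{T_1-k} \frac{(1-\alpha_c)\gamma I_{c,0}\exp(\beta_c(t - T_0))}{N_c - R_{c,t}} M_{i,t}\exp(\beta_i(T_1 - k - t))\,dt \\
&= \alpha\frac{1-\alpha_c}{\alpha_c} R_{c,0}\gamma \int_{T_0}^{T_1-k} \exp(\beta_c(t - T_0))\exp(\beta_i(T_1 - k - t)) \frac{M_{i,t}}{N_c - R_{c,t}}\,dt \\
&= \alpha\frac{1-\alpha_c}{\alpha_c} R_{c,0}\gamma\exp(\beta_i(T_1 - k) - \beta_c T_0) \int_{T_0}^{T_1-k} \exp(t(\beta_c - \beta_i)) \frac{M_{i,t}}{N_c - R_{c,t}}\,dt
\end{aligned}$$

Similarly, for Iceland city $j$ we have:

$$I_{j,T_1-k} = \frac{1-\alpha_c}{\alpha_c} R_{c,0}\gamma\exp(\beta_j(T_1 - k) - \beta_c T_0) \int_{T_0}^{T_1-k} \exp(t(\beta_c - \beta_j)) \frac{M_{j,t}}{N_c - R_{c,t}}\,dt$$

Taking logs and differencing, we get

$$\begin{aligned}
\log LR_{i,T_1} - \log I_{j,T_1-k} &= \log\alpha + (\beta_i - \beta_j)(T_1 - k) + \log \int_{T_0}^{T_1-k} \exp(t(\beta_c - \beta_i)) \frac{M_{i,t}}{N_c - R_{c,t}}\,dt - \log \int_{T_0}^{T_1-k} \exp(t(\beta_c - \beta_j)) \frac{M_{j,t}}{N_c - R_{c,t}}\,dt \\
&= \log\alpha + (\tilde{\beta}_i - \tilde{\beta}_j)(T_1 - k) + \log \int_{T_0}^{T_1-k} \exp(-t\tilde{\beta}_i) \frac{M_{i,t}}{N_c - R_{c,t}}\,dt - \log \int_{T_0}^{T_1-k} \exp(-t\tilde{\beta}_j) \frac{M_{j,t}}{N_c - R_{c,t}}\,dt
\end{aligned}$$

Taking into account measurement error, we have

$$\log\widehat{LR}_{i,T_1} - \log\hat{I}_{j,T_1-k} = \log\alpha + (\tilde{\beta}_i - \tilde{\beta}_j)(T_1 - k) + \log \int_{T_0}^{T_1-k} \exp(-\tilde{\beta}_i t) \frac{M_{i,t}}{N_c - R_{c,t}}\,dt - \log \int_{T_0}^{T_1-k} \exp(-\tilde{\beta}_j t) \frac{M_{j,t}}{N_c - R_{c,t}}\,dt + u_{i,j,T_1-k}$$

When we consider multiple epicenters $L$, the estimating equation becomes:

$$\log\widehat{LR}_{i,T_1} - \log\hat{I}_{j,T_1-k} = \log\alpha + (\tilde{\beta}_i - \tilde{\beta}_j)(T_1 - k) + \log \int_{T_0}^{T_1-k} \sum_{c \in L} LR_{c,k} \frac{M_{i,t}}{N_c - R_{c,t}}\exp(-\tilde{\beta}_i t)\,dt - \log \int_{T_0}^{T_1-k} \sum_{c \in L} LR_{c,k} \frac{M_{j,t}}{N_c - R_{c,t}}\exp(-\tilde{\beta}_j t)\,dt + u_{i,j,T_1-k}$$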

We parameterize the transmission rate as linear in urban population density and an indicator for social distancing. We can then estimate the equations above with nonlinear least squares, pooling different end periods $T_1$. We consider European epicenters as well as China. Because our data only contains constant travel, terms that appear in both $\beta_i$ and $\beta_j$ are very difficult to estimate. In particular, the constant term is very difficult to estimate, and is not identified for very large magnitudes. This problem occurs because with constant travel, infected arrivals vary over time only with $R_{c,t}$, which causes very minor changes in arrivals. As a result, time variation within the same cities provides little identification of β.

To circumvent these issues, we attempted to estimate $\beta_j$ separately using differences in infected in Iceland over time. However, since travel is still constant for Iceland, the same identification problems were present. We estimate a difference in β of 0.06, indicating that over roughly 10 days infections in the United States will double about one more time than in Iceland. However, we note that our α estimate is biased downwards because of the travel data, and this leads to an upward bias on β. We consider this to be an upper bound on the difference in β between the two countries. With further investigation and better travel data, we believe that this framework can be used to identify the spread of infection with different infection rates.

This strategy is better than naively studying growth rates of infected between countries because it accounts for the origin of the infection: travelers from infected epicenters. While the infection is spreading within the country, it is not isolated, and there are continual arrivals from the epicenter that are also spreading the infection. Failing to take these arrivals into account will lead to estimates of the spread of infection that are too high in the early stages of the progression of the virus.20
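The interpretation of the estimated gap in β can be checked with a short calculation. The sketch assumes pure exponential growth and that β is measured per day; the variable names are hypothetical.

```python
import math

beta_gap = 0.06  # estimated difference in beta between the U.S. and Iceland (from the text)
days = 10

# Extra growth factor accumulated by the faster-growing location over `days`.
relative_growth = math.exp(beta_gap * days)
# Number of days needed for the gap to amount to exactly one extra doubling.
extra_doubling_time = math.log(2) / beta_gap

print(round(relative_growth, 2), round(extra_doubling_time, 1))
```

Under these assumptions the relative factor after 10 days is about 1.82, slightly short of a full doubling, and one extra doubling takes about 11.6 days. The "one more doubling in 10 days" reading is therefore an approximation.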

References

  1. Alvarez F.E., Argente D., Lippi F. National Bureau of Economic Research; 2020. A Simple Planning Problem for COVID-19 Lockdown: Working Paper, Working Paper Series. 26981, http://www.nber.org/papers/w26981.
  2. Andrei M. Iceland’s testing suggests 50% of COVID-19 cases are asymptomatic. ZME Science. 2020.
  3. Bendavid E., Mulaney B., Sood N., Shah S., Ling E., Bromley-Dulfano R., Lai C., Weissberg Z., Saavedra R., Tedrow J. COVID-19 antibody seroprevalence in Santa Clara County, California. medRxiv. 2020. doi: 10.1093/ije/dyab010.
  4. Berger D.W., Herkenhoff K.F., Mongey S. National Bureau of Economic Research; 2020. An SEIR Infectious Disease Model with Testing and Conditional Quarantine: Working Paper, Working Paper Series. http://www.nber.org/papers/w26901.
  5. Bogoch I.I., Watts A., Thomas-Bachli A., Huber C., Kraemer M.U.G., Khan K. Pneumonia of unknown aetiology in Wuhan, China: potential for international spread via commercial air travel. J. Travel Med. 2020;27(2):taaa008. doi: 10.1093/jtm/taaa008.
  6. Eichenbaum M.S., Rebelo S., Trabandt M. National Bureau of Economic Research; 2020. The Macroeconomics of Epidemics: Working Paper, Working Paper Series. 26882, http://www.nber.org/papers/w26882.
  7. Emery J.C., Russell T.W., Liu Y., Hellewell J., Pearson C.A., Knight G.M., Eggo R.M., Kucharski A.J., Funk S., Flasche S. The contribution of asymptomatic SARS-CoV-2 infections to transmission - a model-based analysis of the Diamond Princess outbreak. medRxiv. 2020. doi: 10.7554/eLife.58699.
  8. Flaxman S., Mishra S., Gandy A. Estimating the number of infections and the impact of non-pharmaceutical interventions on COVID-19 in 11 European countries. Imperial College London. 2020.
  9. Gudbjartsson D., Helgason A., Jonsson H., Magnusson O., Melsted P., Norddahl G., Saemundsdottir J., Sigurdsson A., Sulem P., Agustsdottir A., Eiriksdottir B., Fridriksdottir R., Gardarsdottir E., Georgsson G., Gretarsdottir O., Gudmundsson K., Gunnarsdottir T., Gylfason A., Holm H., Stefansson K. Spread of SARS-CoV-2 in the Icelandic population. New England J. Med. 2020;382:2302–2315. doi: 10.1056/NEJMoa2006100.
  10. Imai N., Dorigatti I., Cori A., Riley S., Ferguson N.M. Report 1: estimating the potential total number of novel coronavirus cases in Wuhan City, China. Imperial College London. 2020.
  11. Kaplan S., Thomas K. Delays and shortages exacerbate coronavirus testing gaps in the U.S. The New York Times. 2020.
  12. Korolev I. Identification and estimation of the SEIRD epidemic model for COVID-19. J. Econometrics. 2020. doi: 10.1016/j.jeconom.2020.07.038. http://www.sciencedirect.com/science/article/pii/S0304407620302621
  13. Kucharski A.J., Russell T.W., Diamond C., Liu Y., Edmunds J., Funk S., Eggo R.M., Sun F., Jit M., Munday J.D., Davies N., Gimma A., van Zandvoort K., Gibbs H., Hellewell J., Jarvis C.I., Clifford S., Quilty B.J., Bosse N.I., Abbott S., Klepac P., Flasche S. Early dynamics of transmission and control of COVID-19: a mathematical modelling study. Lancet Infectious Diseases. 2020;20(5):553–558. doi: 10.1016/S1473-3099(20)30144-4. http://www.sciencedirect.com/science/article/pii/S1473309920301444
  14. Lai S., Bogoch I., Ruktanonchai N., Watts A., Lu X., Yang W., Yu H., Khan K., Tatem A.J. Assessing spread risk of Wuhan novel coronavirus within and beyond China, January-April 2020: a travel network-based modelling study. medRxiv. 2020. https://www.medrxiv.org/content/early/2020/03/09/2020.02.04.20020479
  15. Lauer S.A., Grantz K.H., Bi Q., Jones F.K., Zheng Q., Meredith H., Azman A.S., Reich N.G., Lessler J. The incubation period of 2019-nCoV from publicly reported confirmed cases: estimation and application. Ann. Internal Med. 2020;172:577–582. doi: 10.7326/M20-0504. https://pubmed.ncbi.nlm.nih.gov/32150748/
  16. Li Q., Guan X., Wu P., Wang X., Zhou L., Tong Y., Ren R., Leung K.S., Lau E.H., Wong J.Y., Xing X., Xiang N., Wu Y., Li C., Chen Q., Li D., Liu T., Zhao J., Liu M., Tu W., Chen C., Jin L., Yang R., Wang Q., Zhou S., Wang R., Liu H., Luo Y., Liu Y., Shao G., Li H., Tao Z., Yang Y., Deng Z., Liu B., Ma Z., Zhang Y., Shi G., Lam T.T., Wu J.T., Gao G.F., Cowling B.J., Yang B., Leung G.M., Feng Z. Early transmission dynamics in Wuhan, China, of novel coronavirus infected pneumonia. New England J. Med. 2020;382(13):1199–1207. doi: 10.1056/NEJMoa2001316.
  17. Li R., Pei S., Chen B., Song Y., Zhang T., Yang W., Shaman J. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2). Science. 2020;368(6490):489–493. doi: 10.1126/science.abb3221.
  18. Linton N.M., Kobayashi T., Yang Y., Hayashi K., Akhmetzhanov A.R., Jung S.-m., Yuan B., Kinoshita R., Nishiura H. Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: a statistical analysis of publicly available case data. J. Clin. Med. 2020;9(2):538. doi: 10.3390/jcm9020538.
  19. Liu T., Hu J., Kang M., Lin L., Zhong H., Xiao J., He G., Song T., Huang Q., Rong Z., Deng A., Zeng W., Tan X., Zeng S., Zhu Z., Li J., Wan D., Lu J., Deng H., He J., Ma W. Transmission dynamics of 2019 novel coronavirus (2019-nCoV). bioRxiv. 2020. https://www.biorxiv.org/content/early/2020/01/26/2020.01.25.919787
  20. Liu Z., Magal P., Seydi O., Webb G. Understanding unreported cases in the COVID-19 epidemic outbreak in Wuhan, China, and the importance of major public health interventions. Biology. 2020;9(3):50. doi: 10.3390/biology9030050.
  21. Liu Z., Magal P., Seydi O., Webb G. Predicting the cumulative number of cases for the COVID-19 epidemic in China from early data. medRxiv. 2020. doi: 10.3934/mbe.2020172. https://www.medrxiv.org/content/early/2020/03/13/2020.03.11.20034314
  22. Nishiura H., Kobayashi T., Miyama T., Suzuki A., Jung S., Hayashi K., Kinoshita R., Yang Y., Yuan B., Akhmetzhanov A.R. Estimation of the asymptomatic ratio of novel coronavirus infections (COVID-19). Internat. J. Infect. Dis. 2020;94:154–155. doi: 10.1016/j.ijid.2020.03.020. https://www.ijidonline.com/article/S1201-9712(20)30139-9/pdf
  23. Nishiura H., Kobayashi T., Yang Y., Hayashi K., Miyama T., Kinoshita R., Linton N., Jung S.-M., Yuan B., Suzuki A., Akhmetzhanov A. The rate of underascertainment of novel coronavirus (2019-nCoV) infection: estimation using Japanese passengers data on evacuation flights. J. Clin. Med. 2020;9:419. doi: 10.3390/jcm9020419.
  24. Read J.M., Bridgen J.R., Cummings D.A., Ho A., Jewell C.P. Novel coronavirus 2019-nCoV: early estimation of epidemiological parameters and epidemic predictions. medRxiv. 2020. doi: 10.1098/rstb.2020.0265. https://www.medrxiv.org/content/early/2020/01/28/2020.01.23.20018549
  25. Russell T.W., Hellewell J., Jarvis C.I., Van-Zandvoort K., Abbott S., Ratnayake R., Flasche S., Eggo R.M., Kucharski A.J., nCov working group C. Estimating the infection and case fatality ratio for COVID-19 using age-adjusted data from the outbreak on the Diamond Princess cruise ship. Eurosurveillance. 2020;25(12). doi: 10.2807/1560-7917.ES.2020.25.12.2000256.
  26. Sajadi M.M., Habibzadeh P., Vintzileos A., Shokouhi S., Miralles-Wilhelm F., Amoroso A. Temperature, humidity, and latitude analysis to estimate potential spread and seasonality of coronavirus disease 2019 (COVID-19). JAMA Network Open. 2020;3(6):e2011834. doi: 10.1001/jamanetworkopen.2020.11834.
  27. Shen M., Peng Z., Xiao Y., Zhang L. Modelling the epidemic trend of the 2019 novel coronavirus outbreak in China. bioRxiv. 2020. doi: 10.1016/j.xinn.2020.100048. https://www.biorxiv.org/content/early/2020/01/25/2020.01.23.916726
  28. Stock J.H. National Bureau of Economic Research; 2020. Data Gaps and the Policy Response to the Novel Coronavirus: Working Paper, Working Paper Series. http://www.nber.org/papers/w26902.
  29. Stock J.H., Aspelund K.M., Droste M., Walker C.D. Identification and estimation of undetected COVID-19 cases using testing data from Iceland. medRxiv. 2020. https://www.medrxiv.org/content/early/2020/06/23/2020.04.06.20055582
  30. Streeck H., Schulte B., Kuemmerer B., Richter E., Höller T., Fuhrmann C., Bartok E., Dolscheid R., Berger M., Wessendorf L. Infection fatality rate of SARS-CoV-2 infection in a German community with a super-spreading event. medRxiv. 2020. doi: 10.1038/s41467-020-19509-y.
  31. Thompson S., Serkez Y., Kelley L. How has your state reacted to social distancing? New York Times. 2020.
  32. Wu J.T., Leung K., Leung G.M. Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study. The Lancet. 2020;395(10225):689–697. doi: 10.1016/S0140-6736(20)30260-9.
  33. Zhao S., Lin Q., Ran J., Musa S.S., Yang G., Wang W., Lou Y., Gao D., Yang L., He D. Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: a data-driven analysis in the early phase of the outbreak. Internat. J. Infect. Dis. 2020;92:214–217. doi: 10.1016/j.ijid.2020.01.050.

Further Reading

  1. Baud D., Qi X., Nielsen-Saines K., Musso D., Pomar L., Favre G. Real estimates of mortality following COVID-19 infection. The Lancet Infectious Diseases. 2020;20(7):773. doi: 10.1016/S1473-3099(20)30195-X. http://www.sciencedirect.com/science/article/pii/S147330992030195X
  2. Bernanke B., Yellen J. The Federal Reserve must reduce long-term damage from coronavirus. Financ. Times. 2020.
  3. Magal P., Webb G. The parameter identification problem for SIR epidemic models: identifying unreported cases. J. Math. Biol. 2018;77(6-7):1629–1648. doi: 10.1007/s00285-017-1203-9.
  4. Riou J., Althaus C.L. Pattern of early human-to-human transmission of Wuhan 2019 novel coronavirus (2019-nCoV), December 2019 to January 2020. Eurosurveillance. 2020;25(4). doi: 10.2807/1560-7917.ES.2020.25.4.2000058.
  5. Wang H., Wang Z., Dong Y., Chang R., Xu C., Yu X., Zhang S., Tsamlag L., Shang M., Huang J. Phase-adjusted estimation of the number of coronavirus disease 2019 cases in Wuhan, China. Cell Discovery. 2020;6(1):1–8. doi: 10.1038/s41421-020-0148-0.

Articles from Journal of Econometrics are provided here courtesy of Elsevier
