2023 Mar 1;59(3):259–279. doi: 10.1007/s11123-023-00664-5

Estimating the propagation of both reported and undocumented COVID-19 cases in Spain: a panel data frontier approximation of epidemiological models

Inmaculada C Álvarez 1, Luis Orea 2, Alan Wall 2
PMCID: PMC9975832  PMID: 37143450

Abstract

We use a stochastic frontier analysis (SFA) approach to model the propagation of the COVID-19 epidemic across geographical areas. The proposed models permit reported and undocumented cases to be estimated, which is important as case counts are overwhelmingly believed to be undercounted. The models can be estimated using only epidemic-type data but are flexible enough to permit these reporting rates to vary across geographical cross-section units of observation. We provide an empirical application of our models to Spanish data corresponding to the initial months of the original outbreak of the virus in early 2020. We find remarkable rates of under-reporting that might explain why the Spanish Government took its time to implement strict mitigation strategies. We also provide insights into the effectiveness of the national and regional lockdown measures and the influence of socio-economic factors in the propagation of the virus.

Keywords: SIR models, Stochastic frontier analysis, Panel data, COVID-19, Spain

Introduction

The COVID-19 pandemic, which began in China in December 2019, spread worldwide in a short time. Faced with the threat of their public health systems being overwhelmed, several countries were forced to implement national lockdowns of the population, with Italy and Spain at the forefront as they were the most affected at the initial stage of the pandemic. In the specific case of Spain, this gave rise to heated debates, which would be repeated in other countries (notably the UK), over the timing and duration of the lockdown. There was fierce criticism from some opposition parties over the Spanish national government’s handling of the first wave of the pandemic, and it is noteworthy that the institutional response in Spain to the following waves of COVID-19, up to and including the sixth wave which began in late 2021, has been delegated to regional governments, which are charged with implementing measures at local or regional level. A consequence of the regional nature of the new institutional response, however, is that much less attention may be paid to the propagation of the coronavirus across the Spanish provinces and regions.

The propagation of the COVID-19 epidemic and the effectiveness of institutional responses have given rise to a rapidly-evolving literature. Most existing empirical research has focused on the Chinese COVID-19 epidemic. Chinazzi et al. (2020) show that travel limitations had modest effects on containing the spread of the disease in Wuhan (unless they were complemented with additional public health interventions and behavioural changes). Fang et al. (2020), using a difference-in-differences estimator, find that the lockdown was highly effective in reducing the total infection cases outside the city. Regarding the relaxation of the control measures in China, Leung et al. (2020) find that it would increase the cumulative number of coronavirus cases and conclude that it is necessary to monitor the increase in new cases due to the effects of relaxing control measures in order for policy makers to be able to readjust their decisions.

One of the first studies that aimed to examine the effectiveness of the control measures implemented in several European countries was carried out by Flaxman et al. (2020). They find that the Spanish lockdown averted about 67% of potential deaths by the 31st of March. Saez et al. (2020) and Orea and Álvarez (2022) also find that the Spanish national lockdown was effective in attenuating the propagation of the virus during the first wave of COVID-19 contagion. Orea and Álvarez (2022) conclude, however, that this control measure should be implemented at the very early stages of the epidemic because a rapid institutional response to the COVID-19 outbreak not only saves lives but also attenuates the economic impact of the Spanish coronavirus epidemic.

The effectiveness of institutional control measures while controlling for spatial propagation effects has been treated marginally in the literature, though there are notable exceptions. Thus, Gross et al. (2020) study the spatio-temporal propagation of COVID-19 in China and compare it to other countries. They conclude that early action may attenuate the disease, given the strong relation between population migration and the spreading of disease. Giuliani et al. (2020) also use data disaggregated by provinces to implement an epidemiological model explaining the propagation of COVID-19 across the Italian provinces. The origin of this spatial dimension of propagation is the high inter-provincial mobility of people, and they conclude that the control measures were more successful in those provinces with more effective enforcement. Dickson et al. (2020) find that in the northern Italian provinces the Government containment measures not only succeeded in drastically reducing the transmission of COVID-19 amongst individuals within these provinces, but also avoided contagions between neighbouring areas. Another exception is Gutiérrez et al. (2021), who show that part of the heterogeneity in the incidence of the disease found in Spain is due to differences in mobility flows across the Spanish territories. Orea and Álvarez (2022) find similar results using a simple but novel empirical strategy that loosely mimics the popular reproduction-based models used in the epidemiological literature.

Aside from spatial propagation effects, another important issue that has often been overlooked or not controlled for in this literature is the number of undocumented coronavirus cases. The relevance of this lies in the fact that the proportion of coronavirus infections not detected by the health system during the first wave of contagion of COVID-19 was likely much larger than the proportion of laboratory-confirmed coronavirus cases (see Flaxman et al. 2020), with the result that the official number of coronavirus cases likely falls short of the true number of cases, perhaps significantly so. As Korolev (2021) points out, if we do not take underreporting into account and estimate models from data on confirmed cases under the assumption that all cases are reported, our estimates might be seriously biased. In addition, underreporting may dampen public and political support for more stringent measures such as investments in medical equipment, mandatory masks or mandatory lockdowns. As the undocumented cases facilitate the rapid dissemination of coronavirus (see Li et al. 2020), the reported cases at the first stages of the coronavirus epidemic were likely unable to anticipate the fast development of the coronavirus epidemic in the following weeks.

To account simultaneously for geographical propagation of the virus, the prevalence of undocumented cases and the effectiveness of institutional control measures, in this paper we propose a stochastic frontier analysis (SFA) approach to estimating epidemic curves. The SFA approach can be used to control for the existence of undocumented coronavirus cases because these cases are not observed by the econometrician and the reported cases are always lower than the total number of COVID-19 infections. Therefore, the unobserved cases can be proxied using a one-sided random term in the same fashion as firms’ inefficiency in production economics. In this sense, our work can also be considered as contributing to the line of research initiated by Millimet and Parmeter (2021), which highlighted that the stochastic frontier framework can usefully be extended into the measurement error literature when researchers consider that outcomes are measured with asymmetric error. COVID cases certainly fall into this category.

The model we propose can be seen as an extension to a frontier setting of previous work by Orea and Álvarez (2022), who advocate using a third-order function of the so-called ‘epidemic time’ of the outbreak (i.e., the number of days since the onset date) to capture the typical S-shaped temporal pattern of the virus epidemic. Their non-frontier epidemic-time model can be viewed as a reduced-form model that simply aims to fit the observed epidemic curve of cumulative cases, and for this reason, it does not make assumptions about the incubation period or other critical parameters that determine the contagion of COVID-19. This appealing feature can obviously be applied to our stochastic frontier specification of the epidemic curve. For robustness analysis, and as an alternative to the epidemic-time specification of Orea and Álvarez (2022), we have also estimated a stochastic SIR-based frontier specification, inspired by the non-frontier econometric SIR model proposed by Chudik et al. (2020) which replaces the epidemic time variables with time-varying epidemiological regressors.

Our epidemic stochastic frontier analysis (ESFA) model has other attractive features. First, the stochastic frontier model can be estimated using epidemic-type data only, i.e., the rates of growth of coronavirus cases depend in our models on own and neighbours’ epidemic times, lagged cases of COVID-19, date of implementation of control measures, and so on. However, the model is flexible enough to include other covariates if deemed appropriate. Another advantage of our model is that it permits reporting rates to be estimated rather than assumed and is flexible enough to permit these reporting rates to vary across geographical cross-section units of observation. As such, our ESFA model can be thought of as complementary to existing epidemiological models, such as Chudik et al. (2020), which often assume common reporting rates across areas.1

As the volatility of the rates of growth of reported cases is typically much larger at the beginning of the epidemic than when the epidemic has advanced, our ESFA model must be estimated using time-varying heteroskedastic noise terms. To capture this feature, we propose a stochastic frontier specification which can be interpreted as a heteroskedastic version of the model introduced by Wang and Ho (2010), whose aim was to control for individual effects in a production economics setting. Therefore, our paper also makes a methodological contribution for practitioners aiming to estimate firms’ efficiency using the Wang and Ho (2010) approach.

As our epidemic-time model can be extended to include other covariates, in our empirical application we take advantage of this feature to incorporate a series of socio-economic and environmental variables to test their influence on the evolution of total and under-reported cases. We also carry out a series of robustness checks on our epidemic-time stochastic frontier model, including an analysis of the effects of changes to the distributional assumptions and of changes to the panel data set used, in order to check the effect of dropping observations with zeroes in the variables.

Overall, the empirical strategy used in this paper can be said to rely on several different but related assumptions, which are supported by previous literature: i) the propagation of the virus across areas (Spanish provinces in our application) depends on people’s mobility (Giuliani et al. 2020); ii) this mobility can be modelled using spatial econometrics techniques (Eliasson et al. 2003; Orea and Álvarez 2022); iii) the undocumented cases represent a large proportion of total cases of infection (Flaxman et al. 2020); and iv) the proportion of undocumented cases through the epidemic development varies over time (Li et al. 2020).

A final assumption, which opens the way for a stochastic frontier-based approach, is that unobserved cases can be proxied using a one-sided random term in the same fashion as Millimet and Parmeter (2022). A comparison of our model with that of Millimet and Parmeter (2022) is instructive, as the two approaches have both similarities and differences. As in Millimet and Parmeter (2022), our model permits underreporting to be modelled as a function of a set of covariates, and it allows the impact of non-pharmaceutical interventions (in our case, lockdown measures) on COVID-19 cases to be assessed. Spatial spillover effects are incorporated into both models, in the sense that the spread of the virus is modelled not only as a function of an area’s own cases but also as a function of cases in neighbouring areas. It is worth noting that whereas Millimet and Parmeter (2022) use country-level data, we use more disaggregated spatial units (i.e. provinces). As human mobility across our Spanish provinces is larger than across countries, the spatial propagation of the disease is likely more intense in our application than in their application to countries. Note also that whereas in Millimet and Parmeter (2022) these spillover effects are incorporated into the frontier, in our model they are incorporated into both the frontier and the one-sided error term capturing undocumented cases. Another contrast to Millimet and Parmeter (2022) is that our model, based as it is on Wang and Ho (2010), explicitly controls for individual (fixed) effects. A further contrast is the feature, mentioned above, that we explicitly model heteroscedasticity in the idiosyncratic noise term to capture differences in the temporal evolution of volatility of the growth rates of reported cases, and in particular the fact that volatility is much larger at the beginning of the epidemic. Millimet and Parmeter (2022) control for this volatility indirectly by using weekly data. When a sufficiently long data set is available, this is a perfectly valid solution. However, when the time series is relatively short, as in our case, aggregating daily data to weekly level will not be feasible. In these settings where daily data must be used, explicit modelling of heteroscedasticity of the noise term becomes a necessary and attractive feature. Finally, note that we follow Orea and Álvarez (2022) and use a simple epidemic-time model, whereas Millimet and Parmeter (2022) use a SIR-based model, where current coronavirus cases depend on a set of geographic, demographic and political characteristics of each country. For comparison purposes, however, we also estimate a SIR-based model that instead uses lagged values of coronavirus cases to explain current cases in the same fashion as Chudik et al. (2020). As lagged cases are also measured with error due to the existence of undocumented coronavirus cases, the epidemic-time model, which can be viewed as a reduced form of a SIR-based model, is our preferred specification. In summary, we see our model as complementary to that of Millimet and Parmeter (2022), where one or the other may be more suitable depending on the data available and the assumptions the researcher is willing to make.

The paper proceeds as follows. Section 2 defines the three epidemic curves we use, namely the total epidemic curve, the reported cases epidemic curve, and the undocumented cases epidemic curve. In Section 3 we present the stochastic frontier representation of the epidemic curves. Distributional assumptions about the error terms and the maximum likelihood procedure for the general specification of the model are discussed. Section 4 presents our empirical application to Spanish provinces at the outset of the COVID-19 epidemic in the spring of 2020, where we estimate a basic version of our preferred frontier model, namely the epidemic-time model, with a spatial lag specification. In Section 5 we present a series of extensions and robustness checks to the basic model. Section 6 provides a discussion of the empirical results and Section 7 concludes and provides some pointers for future research.

Total and partial epidemic curves

In this section we define three epidemic curves that resemble the popular reproduction-based models used in the epidemiological literature, which often ignore the existence of undocumented coronavirus cases.

Consider a panel of i = 1,…, N provinces observed on t = 1,…, T days, where t is calendar time. Let Ei denote the onset date of the epidemic, namely the date on which province i reports its first coronavirus case. We then analyse the development of the epidemic in each province, i.e., the temporal evolution of coronavirus cases once each province reports its first coronavirus case. A key variable for this analysis is the epidemic time, Kit = t − Ei, which denotes the number of days since the onset date. Next, let Yit* denote the cumulative number of both laboratory-confirmed (Yit) and undocumented (Uit) coronavirus cases until day t in province i. Thus:

$$Y_{it}^{*} = Y_{it} + U_{it} \tag{1}$$

In Orea and Álvarez (2022), the epidemic curve of reported cases (Yit) is represented by an autoregressive relationship:2

$$Y_{it} = \beta_{it} Y_{it-1} \tag{2}$$

where βit can be interpreted as an autoregressive parameter (function) that depends on a set of covariates. We label this the epidemic curve.3 We have found in our application that Yit is not a stationary variable, in which case estimating (2) might give spurious results. This issue vanishes if we use rates of growth of reported coronavirus cases. In order to get a simple empirical specification of (2), we take natural logarithms and first-difference the model. This yields the following expression:

$$Rate_{it} = \ln Y_{it} - \ln Y_{it-1} = \ln \beta_{it} \tag{3}$$

where lnβit simply measures the daily rate of growth of reported cases. Two alternative specifications (linear vs exponential) for lnβit are used in our empirical application. While the linear specification might yield negative rates of growth of cumulative cases, the so-called exponential specification imposes the theoretical restriction βit ≥ 1. We expect rates of growth of coronavirus cases to vary with the epidemic time, Kit, because the traditional epidemic curve has an S-shaped form. If this is indeed the case, the epidemic curve βit can be modelled empirically as a third-order function of the (logged) epidemic time variable, conditional on other control variables.4
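As a minimal sketch of how the epidemic time and the growth rate defined above could be computed for a single province, consider the following. The input series, the function names and the polynomial coefficients are illustrative assumptions, not taken from the paper or its data.

```python
import math

def epidemic_time_and_rates(cumulative_cases):
    """Return a list of (K, rate) pairs for one province.

    cumulative_cases holds cumulative reported cases Y_t by calendar day.
    K is the number of days since the first reported case (the onset date)
    and rate is ln(Y_t) - ln(Y_{t-1}), computed only once Y_{t-1} > 0.
    """
    onset = next(t for t, y in enumerate(cumulative_cases) if y > 0)
    pairs = []
    for t in range(onset + 1, len(cumulative_cases)):
        k = t - onset
        rate = math.log(cumulative_cases[t]) - math.log(cumulative_cases[t - 1])
        pairs.append((k, rate))
    return pairs

def cubic_log_time(k, b0, b1, b2, b3):
    """Third-order polynomial in ln(K): a hypothetical linear form for ln(beta_it)."""
    x = math.log(k)
    return b0 + b1 * x + b2 * x ** 2 + b3 * x ** 3

# Hypothetical cumulative counts; the first case is reported on day index 2.
cases = [0, 0, 1, 2, 4, 8, 12, 15, 15]
rates = epidemic_time_and_rates(cases)
for k, rate in rates:
    print(k, round(rate, 4))
```

Because cumulative counts cannot fall, every computed rate is non-negative in the data, whereas an unrestricted polynomial such as `cubic_log_time` can take negative values. This is the concern that motivates the exponential specification mentioned in the text.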

Similar autoregressive expressions can be written for undocumented and total coronavirus cases. That is, each variable measuring coronavirus cases has its own epidemic curve. While the epidemic curve of reported cases is given by (3), the epidemic curves of undocumented and total coronavirus cases can be written as follows:

$$U_{it} = \theta_{it} U_{it-1} \tag{4}$$

$$Y_{it}^{*} = \beta_{it}^{*} Y_{it-1}^{*} \tag{5}$$

Figure 1 illustrates our three hypothetical epidemic curves. By construction, we have assumed in this figure that Yit* is the sum of Yit and Uit for each epidemic time Kit. Note that while the epidemic curve of reported cases has the traditional S-shaped form, the epidemic curve of undocumented cases is depicted using a log form from the beginning of the epidemic onwards. This allows the proportion of undocumented cases to decrease over time as in Li et al. (2020). The shape of the total epidemic curve is thus a combination of the shapes of the two partial epidemic curves.

Fig. 1. Epidemic curve of total, reported and undocumented cases

We now examine this feature analytically. Taking into account (1), the autoregressive parameter βit* can be decomposed as follows:

$$\beta_{it}^{*} = \beta_{it} + \left(\theta_{it} - \beta_{it}\right) \frac{U_{it-1}}{Y_{it-1}^{*}} \tag{6}$$

This equation shows that the slope of the overall epidemic curve coincides with that of the epidemic curve of reported cases if both reported and undocumented cases have the same temporal patterns (i.e., θit = βit).5 In order to link both epidemic curves, let uit denote the log difference between total and reported coronavirus cases:

$$u_{it} = \ln Y_{it}^{*} - \ln Y_{it} \tag{7}$$

Given the above definition, the proportion of undocumented cases can be expressed as an increasing function of uit because Uit/Yit* = 1 − exp(−uit). uit can therefore be viewed as a relative measure of the undocumented cases in an epidemic outbreak: loosely speaking, we can interpret uit as the “proportion of undocumented cases”. Equation (7) also allows us to link the reported and undocumented cases as follows:

$$U_{it} = Y_{it}\left(e^{u_{it}} - 1\right) \tag{8}$$
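The mapping between uit, the share of undocumented cases and the number of undocumented cases can be illustrated with a short sketch. The function names and the numerical values are hypothetical and serve only to show the arithmetic implied by the definition of uit as a log difference.

```python
import math

def undocumented_share(u):
    """Share of undocumented cases in total cases, U / Y* = 1 - exp(-u)."""
    return 1.0 - math.exp(-u)

def undocumented_cases(reported, u):
    """Number of undocumented cases implied by reported cases, U = Y (exp(u) - 1)."""
    return reported * (math.exp(u) - 1.0)

# Hypothetical example: total cases are four times the reported cases.
reported = 1000.0
u = math.log(4.0)
total = reported * math.exp(u)

print(undocumented_share(u))            # share of total cases that is undocumented
print(undocumented_cases(reported, u))  # implied count of undocumented cases
```

In this example three out of four cases are undocumented, so uit is only loosely a "proportion": it is unbounded above, while the share 1 − exp(−uit) always lies between 0 and 1.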

If we plug (8) into (4) in both consecutive periods and use (2), we get:

$$\beta_{it} = \theta_{it}\, \frac{e^{u_{it-1}} - 1}{e^{u_{it}} - 1} \tag{9}$$

This equation states that βit = θit if, and only if, the log difference between total and reported coronavirus cases (uit) is time invariant, i.e., when ∆uit = uit − uit−1 = 0. Using (2) and (5) and the definition of uit in (7), the previous decomposition in (6) collapses to:

$$\beta_{it} = \beta_{it}^{*}\, e^{-\Delta u_{it}} \tag{10}$$

This equation shows that the total epidemic curve coincides with the epidemic curve of reported cases when the proportion of undocumented cases does not change over time, i.e., when ∆uit = 0. On the other hand, Eq. (10) suggests that the epidemic curve of reported cases (i.e. βit) can be estimated using two approaches: i) from an econometric specification of Eq. (2) that does not provide any information about the relative importance of undocumented cases, as in Orea and Álvarez (2022);6 or ii) from a stochastic frontier specification of (10) that is able to estimate both the total epidemic curve (βit*) and the temporal changes in the proportion of undocumented cases (∆uit), as we will see in the next section.
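The decompositions in Eqs. (6), (9) and (10) can be checked numerically with a one-step example. The values below are hypothetical and chosen only so that the arithmetic is easy to follow; they are not estimates from the paper.

```python
import math

# Hypothetical values at t-1 and growth factors between t-1 and t.
Y_prev, U_prev = 100.0, 400.0   # reported and undocumented cumulative cases
beta, theta = 1.20, 1.05        # growth factors of reported and undocumented cases

# Eqs. (2) and (4): one-step updates of the two partial curves.
Y_now, U_now = beta * Y_prev, theta * U_prev

# Eqs. (1) and (5): total cases and their growth factor.
total_prev, total_now = Y_prev + U_prev, Y_now + U_now
beta_star = total_now / total_prev

# Eq. (7): log difference between total and reported cases.
u_prev = math.log(total_prev) - math.log(Y_prev)
u_now = math.log(total_now) - math.log(Y_now)
delta_u = u_now - u_prev

eq6 = beta + (theta - beta) * U_prev / total_prev                 # should equal beta_star
eq9 = theta * (math.exp(u_prev) - 1.0) / (math.exp(u_now) - 1.0)  # should equal beta
eq10 = beta_star * math.exp(-delta_u)                             # should equal beta

print(beta_star, eq6)
print(beta, eq9, eq10, delta_u)
```

Here undocumented cases grow more slowly than reported cases, so the change in uit is negative and reported cases grow faster than total cases.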

The latter empirical strategy is developed in detail in the next section. In a nutshell, this strategy involves estimating the epidemic curve of total cases using a stochastic frontier specification of the model where the undocumented cases are proxied using a one-sided random term in the same fashion as firms’ inefficiency in production economics. The two partial epidemic curves (i.e., the epidemic curves of reported and undocumented cases) can be obtained once the epidemic curve of total cases has been appropriately adjusted using the estimated proportions of undocumented cases that appear in (9) and (10).

Frontier specification of our epidemic curves

Frontier specification

This section discusses estimation of the epidemic curve of reported cases using a stochastic frontier model, an econometric specification widely used in production economics to measure firms’ efficiency. The stochastic frontier analysis approach can be used to control for the existence of undocumented coronavirus cases because these cases are not observed by the econometrician and the reported cases are always lower than the total number of COVID-19 infections. This is illustrated in Fig. 2, where we have simplified our previous figure by dropping the two partial epidemic curves. This figure shows that the total epidemic curve can be viewed as a function that envelops the observed number of coronavirus cases from above. The gap between Y* and Y is the number of undocumented cases, which never takes negative values. The stochastic frontier analysis approach uses one-sided random terms to control for non-negative (or non-positive) unobserved variables, such as firm inefficiency in production economics.

Fig. 2. Overall epidemic curve and undocumented cases

As lnβit = Rateit by definition, the stochastic frontier model that is finally estimated can be obtained once we take natural logarithms in (10) and add a traditional noise term:

$$Rate_{it} = \ln \beta_{it}^{*} + v_{it} - \Delta u_{it} = \ln \beta_{it}^{*} + \varepsilon_{it} \tag{11}$$

where lnβit* is a function of a set of covariates determining the temporal evolution of total coronavirus cases. The idiosyncratic feature of our frontier specification of the model is the existence of two random terms. The first one is the traditional noise term (vit) capturing random shocks, measurement or specification errors and other unobservable variables not correlated with the set of explanatory variables determining the rate of growth of coronavirus cases. The second random term is the difference of two one-sided random terms and captures changes over time in the proportion of undocumented cases (∆uit). Note that the cumulative number of unreported cases obviously increases over time but that the proportion of undocumented cases, uit, may either increase or decrease as the pandemic evolves, so that ∆uit may be either positive or negative. We show in Section 3.3 that this does not preclude the use of a stochastic frontier model if we impose some structure on the distribution of uit.

Our empirical strategy thus relies on three assumptions: i) the epidemic nature of this disease can be best represented by a total epidemic curve, regardless of whether researchers observe all COVID-19 cases or not; ii) the unobserved cases can be proxied using a one-sided random term in the same fashion as firm inefficiency in production economics; and iii) the proportion of undocumented cases varies over time during the evolution of the epidemic.

Our epidemic frontier model in (11) looks similar to a (panel) stochastic production frontier model. It is common in this literature to estimate the following model in levels:

$$\ln Y_{it} = \alpha_i + f(X_{it}, \beta) + v_{it} - u_{it} \tag{12}$$

where the subscript i stands for firm, Xit is a vector of exogenous production drivers, β is a vector of technological parameters, vit is a noise term capturing production shocks, and uit is a non-negative random term capturing firm inefficiency. αi is a firm-specific intercept aiming to capture characteristics that affect firms’ production but that are unobserved or omitted variables. In our setting, this captures time-invariant unobserved effects that affect the levels of COVID-19 cases and that Millimet and Parmeter (2022) controlled directly using a set of geographic and demographic characteristics of each country. Estimation of the model in (12) using the so-called True Fixed Effects (TFE) model introduced by Greene (2005)7 is not easy due to the incidental parameter problem.8 Wang and Ho (2010) solve this problem using temporal transformations of (12). If we take first differences in Eq. (12) to remove the time-invariant firm-specific effects, we get:

$$\Delta \ln Y_{it} = \Delta f(X_{it}, \beta) + v_{it}^{*} - \Delta u_{it} \tag{13}$$

where vit* = Δvit follows a (multivariate) normal distribution. The production frontier model in (13) is similar to our epidemic frontier model in (11). There are, however, two main differences. First, while ∆f(Xit, β) can be negative in a production economics setting, we need to impose the theoretical restriction ln βit* ≥ 0 due to the cumulative nature of Yit. Second, while the production frontier function represents a “technology” (i.e. an unknown combination of production processes), our frontier represents an underlying epidemic process that involves both confirmed and undocumented cases.

Finally, it is worth highlighting that the aforementioned stochastic epidemic frontier model recently proposed by Millimet and Parmeter (2022) aims to explain new coronavirus cases. Our model, on the other hand, focuses on rates of growth of cumulative cases. Despite this difference, both approaches are similar if we take into account that Rateit ≈ (Yit − Yit−1)/Yit−1 = Nit/Yit−1, where Nit stands for new cases on day t in province i. For this reason, there are no clear advantages of using rates of growth instead of new cases. Moreover, Orea and Álvarez (2022) show that our parameter estimates can be interpreted as a semi-elasticity of the number of new cases with respect to an explanatory variable, in the same fashion as in count regression models. They point out, however, that the growth rate of cumulative cases is much less volatile than the number of new cases or its growth rate. Our empirical strategy therefore provides more accurate predictions than a count-type model. This is a feature of the model that is important in our application because we use predicted values to carry out our counterfactual analyses aimed at examining the effect of the Spanish lockdown.
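The approximation linking the log growth rate of cumulative cases to new cases relative to lagged cumulative cases can be illustrated as follows. This is a small sketch with hypothetical numbers and function names; it relies only on the standard result that ln(1 + x) is close to x when x is small.

```python
import math

def log_growth(y_now, y_prev):
    """Rate_it = ln(Y_it) - ln(Y_it-1)."""
    return math.log(y_now) - math.log(y_prev)

def new_case_ratio(y_now, y_prev):
    """N_it / Y_it-1, where N_it = Y_it - Y_it-1 is the number of new cases."""
    return (y_now - y_prev) / y_prev

# Hypothetical late-epidemic day: cumulative cases grow by 2%.
small_gap = abs(log_growth(1020.0, 1000.0) - new_case_ratio(1020.0, 1000.0))

# Hypothetical early-epidemic day: cumulative cases double.
large_gap = abs(log_growth(2000.0, 1000.0) - new_case_ratio(2000.0, 1000.0))

print(small_gap, large_gap)
```

The two measures nearly coincide when daily growth is modest and diverge when cumulative cases grow very quickly, which tends to happen in the first days after the onset date.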

In order to estimate the above model using ML, we are forced to choose a distribution for both the noise term (vit) and the one-sided random term capturing the proportion of undocumented cases (uit). In what follows, we discuss the distribution of vit, the distribution of uit, and the likelihood function.

Distribution of the noise term

We have added a noise term (vit) in Eq. (11) in order to directly capture measurement errors in the rate of growth of coronavirus cases. As is customary in the stochastic frontier literature, we assume that the vit’s are independent of the ui’s. If we next assume that vit is independently distributed over time and follows a normal distribution with zero mean, the noise vector vi = (vi1,…,viT) will follow a multivariate normal distribution with a diagonal covariance matrix. Using the notation from Wang and Ho (2010), the density function of the vector vi is:

$$g(v_i) = (2\pi)^{-T/2}\, |\Pi|^{-1/2} \exp\left(-\frac{1}{2}\, v_i' \Pi^{-1} v_i\right) \tag{14}$$

where Π is the variance-covariance matrix of vi. We then assume that the noise vector vi = (vi1,…,viT) follows a multivariate normal distribution with a diagonal but heteroskedastic variance-covariance matrix because the volatility of the rates of growth of reported cases decreases throughout the epidemic development:

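$$\Pi = \begin{pmatrix} \sigma_{v1}^{2} & 0 & \cdots & 0 \\ 0 & \sigma_{v2}^{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{vT}^{2} \end{pmatrix} \tag{15}$$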

This specification of the variance-covariance matrix of vi differs from that used in Wang and Ho (2010) in two important respects. On the one hand, our noise term is heteroskedastic, whereas it follows a homoskedastic distribution in Wang and Ho (2010). On the other, while we assumed that vit is not autocorrelated over time, the first-differences and within transformations carried out by Wang and Ho (2010) to remove time-invariant firm-specific effects introduce negative correlations between two consecutive (transformed) noise terms.

An autocorrelated specification can be obtained if we introduce the noise terms before computing the rates of growth of coronavirus cases, in the spirit of Chudik et al. (2020) and Millimet and Parmeter (2022). Let us rewrite (7) as follows:

$$\ln Y_{it}^{*} = \ln Y_{it} + u_{it} - v_{it} \tag{16}$$

where vit is a two-sided error term that now captures non-systematic variations in total coronavirus cases (Millimet and Parmeter 2022). If we next take natural logarithms in (5) and replace ln Yit* and ln Yit−1* with (16) evaluated at t and t−1, we get:

$$\Delta \ln Y_{it} = \ln \beta_{it}^{*} + \tilde{v}_{it} - \Delta u_{it} \tag{17}$$

where ṽit = Δvit. The new noise term is no longer independently distributed over time. If we assume that vit follows a heteroskedastic normal distribution, the noise vector ṽi = (ṽi1,…, ṽiT) follows a multivariate normal distribution with the following variance-covariance matrix:

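$$\Pi = \begin{pmatrix}
\sigma_{v1}^{2} + \sigma_{v0}^{2} & -\sigma_{v1}^{2} & 0 & \cdots & 0 \\
-\sigma_{v1}^{2} & \sigma_{v2}^{2} + \sigma_{v1}^{2} & -\sigma_{v2}^{2} & \cdots & 0 \\
0 & -\sigma_{v2}^{2} & \sigma_{v3}^{2} + \sigma_{v2}^{2} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & -\sigma_{v,T-1}^{2} \\
0 & 0 & \cdots & -\sigma_{v,T-1}^{2} & \sigma_{vT}^{2} + \sigma_{v,T-1}^{2}
\end{pmatrix} \tag{18}$$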

If vit is homoskedastic as in Wang and Ho (2010), we get the variance-covariance matrix of their first-differences transformed noise term (see their Eq. 12). It is an empirical question whether specification (15) or (18) of the noise term is better. However, it should be mentioned that estimation of a frontier epidemic model using (18) is more problematic if the panel dataset is not continuous and there are missing observations between t = 1 and t = T. This happens, for instance, if we drop the observations with zero rates of growth of coronavirus cases that led to convergence problems9 when maximizing the likelihood functions in most of our estimated models.10
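To make the difference between the two noise specifications concrete, the sketch below builds both variance-covariance matrices from a list of day-specific variances. The function names and the variance values are hypothetical, and plain Python lists are used instead of a matrix library to keep the example self-contained.

```python
def diagonal_cov(variances):
    """Eq. (15): diagonal, heteroskedastic covariance of (v_1, ..., v_T).

    variances = [s2_1, ..., s2_T].
    """
    T = len(variances)
    return [[variances[t] if s == t else 0.0 for s in range(T)] for t in range(T)]

def first_difference_cov(variances):
    """Eq. (18): covariance of first-differenced noise v~_t = v_t - v_{t-1}.

    variances = [s2_0, s2_1, ..., s2_T] are the variances of the level noise,
    assumed independent over time. The result is a T x T tridiagonal matrix.
    """
    T = len(variances) - 1
    cov = [[0.0] * T for _ in range(T)]
    for t in range(1, T + 1):
        cov[t - 1][t - 1] = variances[t] + variances[t - 1]
        if t < T:
            # Consecutive differences share v_t with opposite signs.
            cov[t - 1][t] = cov[t][t - 1] = -variances[t]
    return cov

# Hypothetical variances that decline as the epidemic advances.
level_variances = [4.0, 3.0, 2.0, 1.0]
print(diagonal_cov(level_variances[1:]))
print(first_difference_cov(level_variances))
print(first_difference_cov([1.0, 1.0, 1.0, 1.0]))  # homoskedastic special case
```

The tridiagonal structure also shows why gaps in the panel are awkward for the second specification: each off-diagonal entry ties an observation to its immediate predecessor, so dropping a day breaks the pattern, whereas the diagonal matrix simply loses one row and column.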

Distribution of uit

We now turn to the part of the likelihood function related to the proportion of undocumented cases. Estimating (11) is far from straightforward because the distribution of ∆uit is generally not known if we assume that uit is independently distributed across provinces and over time (see, for instance, Wang 2003, and Orea and Álvarez 2019). To deal with this issue, we follow Wang and Ho (2010) and assume that uit possesses the so-called scaling property so that it can be multiplicatively decomposed into two components as follows:

$$u_{it} = h(z_{it}, \tau)\, u_i \tag{19}$$

where hit = h(zit, τ) ≥ 0 is a deterministic (scaling) function, zit is a set of undocumented-cases determinants (often labelled as contextual or z-variables), and ui is a homoskedastic one-sided random variable. For notational ease, we assume hereafter that the panel dataset is balanced in the sense that we have not dropped observations along the epidemic development. The preceding implies that the first temporal difference of uit in (11) can be rewritten as:

$$\Delta u_{it} = (h_{it} - h_{it-1})\, u_i = \Delta h_{it}\, u_i \tag{20}$$

where ∆hit can be positive or negative. Notice that if the scaling function hit is not constant, the one-sided random variable is identified after the first-difference, and that the distribution of ui is not affected by the first-differences transformation. This key aspect of their model enabled Wang and Ho (2010) to get a tractable likelihood function for their transformed model. The same applies to our stochastic frontier epidemic model. Consequently, as the density function of εi = (εi1,…, εiT) has a closed form, Eq. (11) can be estimated by Maximum Likelihood (ML), provided that the scaling function hit is not constant. As Wang and Ho (2010) point out, this condition requires that zit contains at least one variable which changes values over time. Obviously, this happens if we include the epidemic time $K_{it} = t - E_i$ as a determinant of the proportion of undocumented cases.

Our frontier model in (20) essentially mimics the one proposed by Wang and Ho (2010) to get a tractable likelihood function for their transformed model. It also looks like the specification introduced by Kumbhakar (1990) and Battese and Coelli (1992) except for the first-differencing transformation of the scaling function. In this sense, we have basically replaced $\eta_{it} = e^{\eta(t-T)}$ in Battese and Coelli (1992, Eq. 2) with $\eta_{it} = \Delta h_{it} = e^{\tau z_{it}} - e^{\tau z_{it-1}}$, where zit is a set of undocumented-cases determinants that include a time-trend variable (e.g., t or Kit).
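The identification condition can be illustrated with a short sketch, assuming an exponential scaling function hit = exp(zit τ). The helper name, the single determinant and the value of τ are hypothetical. When τ = 0 the scaling function is constant, every ∆hit is zero, and the one-sided term drops out of the first-differenced model.

```python
import numpy as np

def delta_scaling(z, tau):
    """First difference of an exponential scaling function h_it = exp(z_it @ tau).

    z has one row per period (t = 0, ..., T) and one column per determinant.
    Returns the vector (h_i1 - h_i0, ..., h_iT - h_iT-1).
    """
    z = np.asarray(z, dtype=float)
    h = np.exp(z @ np.asarray(tau, dtype=float))
    return np.diff(h)

# Hypothetical single determinant: epidemic time K = 1, 2, 3, 4.
K = np.arange(1, 5).reshape(-1, 1)

dh_declining = delta_scaling(K, [-0.05])  # h falls over time, so each difference is negative
dh_constant = delta_scaling(K, [0.0])     # constant h: all differences are zero
```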

Likelihood function

For simplicity, we will assume that $u_i \sim N^+(0, \sigma_u)$. We recall that the half-normal distribution of ui is not affected by the first-differencing transformation of the idiosyncratic one-sided error term, so that ∆uit = ∆hitui is distributed as a heteroskedastic half-normal. Wang and Ho (2010) showed that the aforementioned assumptions on vit and uit yield the following log-likelihood function for province i:

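$$
\ln L_i = -\frac{N}{2}\ln(2\pi) - \frac{1}{2}\ln|\Pi| - \frac{1}{2}\varepsilon_i' \Pi^{-1} \varepsilon_i + \frac{1}{2}\left(\frac{\mu_*^2}{\sigma_*^2} - \frac{\mu^2}{\sigma_u^2}\right) + \ln\left[\sigma_* \Phi\left(\frac{\mu_*}{\sigma_*}\right)\right] - \ln\left[\sigma_u \Phi\left(\frac{\mu}{\sigma_u}\right)\right] \tag{21}
$$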

where Φ is the standard normal cumulative distribution function, $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{iT})$, $\varepsilon_{it} = \Delta \ln Y_{it} - \ln \beta_{it}^*(\cdot)$, and

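$$\mu_* = \frac{\mu/\sigma_u^2 - \varepsilon_i' \Pi^{-1} \Delta h_i}{\Delta h_i' \Pi^{-1} \Delta h_i + 1/\sigma_u^2} \tag{22}$$

$$\sigma_*^2 = \frac{1}{\Delta h_i' \Pi^{-1} \Delta h_i + 1/\sigma_u^2} \tag{23}$$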

where $\Delta h_i = (\Delta h_{i1}, \ldots, \Delta h_{iT})$. Consistent parameter estimates can be obtained by numerically maximizing $\ln L = \sum_{i=1}^{N} \ln L_i$.
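The province-level log-likelihood in (21) to (23) can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the code used for the estimates reported here: the function names are hypothetical, the first term uses the length of εi as the number of observations, and Φ is evaluated with `math.erfc`, which is adequate for moderate arguments but underflows in the far left tail.

```python
import math
import numpy as np

def log_norm_cdf(x):
    """Log of the standard normal CDF (simple version, not tail-safe)."""
    return math.log(0.5 * math.erfc(-x / math.sqrt(2.0)))

def loglik_province(eps, dh, Pi, sigma_u, mu=0.0):
    """Log-likelihood of one province following Eqs. (21)-(23).

    eps     : residual vector eps_i (length n)
    dh      : first-differenced scaling function, Delta h_i (length n)
    Pi      : n x n covariance matrix of the differenced noise term
    sigma_u : scale of the one-sided term; mu = 0 gives the half-normal case
    """
    eps = np.asarray(eps, dtype=float)
    dh = np.asarray(dh, dtype=float)
    n = eps.size
    pi_inv_eps = np.linalg.solve(Pi, eps)
    pi_inv_dh = np.linalg.solve(Pi, dh)
    _, logdet = np.linalg.slogdet(Pi)

    denom = dh @ pi_inv_dh + 1.0 / sigma_u**2
    mu_star = (mu / sigma_u**2 - eps @ pi_inv_dh) / denom   # Eq. (22)
    sigma_star = math.sqrt(1.0 / denom)                     # Eq. (23)

    return (-0.5 * n * math.log(2.0 * math.pi) - 0.5 * logdet
            - 0.5 * eps @ pi_inv_eps
            + 0.5 * (mu_star**2 / sigma_star**2 - mu**2 / sigma_u**2)
            + math.log(sigma_star) + log_norm_cdf(mu_star / sigma_star)
            - math.log(sigma_u) - log_norm_cdf(mu / sigma_u))
```

The full log-likelihood is the sum of `loglik_province` over provinces, which could then be passed (with a minus sign) to a numerical optimizer. When ∆hi is a vector of zeros the last four terms cancel and the expression collapses to a multivariate normal log-density, which is the non-identification case discussed above.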

Empirical illustration

Sample and data

We have used several sources to construct a dataset of coronavirus cases across Spain. As most control measures began on the days of March 13th and 14th, 2020, we analyse data on coronavirus cases about three weeks before and three weeks after those dates. In particular, our data set covers the period between the onset of the epidemic in each province and the 4th of April.

The daily evolution of laboratory-confirmed COVID-19 cases in the Spanish mainland provinces was collected manually by the authors from the official press releases of the Spanish regional governments, the Ministry of Health and Wikipedia. These information sources had to be consulted to extend backwards in time the provincial data published by Datadista on GitHub under a free license. Datadista extracts its data from a variety of documents published by the Ministry of Health but only published data from March 13th on.11 For the 28th of March onwards we collected the data directly using RTVE Flourish.12 We used the regional online data released by the Ministry of Health13 and the province-level data released by the Spanish regional governments to correct typos and fill in missing information on coronavirus cases in some provinces.

Figure 3 shows the boxplots of the growth rates of cumulative reported cases by epidemic time, from which two features are evident. First, the rates of growth of reported cases are much larger at the beginning of the epidemic than when the epidemic has advanced. That is, our dependent variable tends to decrease over the epidemic time. Second, the volatility is much larger when Kit is small, and declines as Kit increases. This calls for a time-varying heteroskedastic specification of our symmetric error term.

Fig. 3. Growth rates of cumulative cases

Parameter estimates

Table 1 shows the parameter estimates of several epidemic-time specifications of Eq. (11). All four specifications in this table use a third-order function of lnKit to capture the temporal evolution of coronavirus cases. As the likelihood function of these models has a closed form, they have all been estimated by ML. The first two specifications assume that the epidemic curve of total coronavirus cases (i.e., $\ln \beta_{it}^*$) is a linear function of a set of covariates, whereas the last two specifications assume that $\ln \beta_{it}^*$ is an exponential function in order to impose the theoretical restriction $\beta_{it}^* \geq 1$. The non-frontier models assume that ∆uit = 0, thereby ignoring the one-sided random term that appears in Eq. (11), which is equivalent to assuming that the proportion of undocumented cases does not change over time. These non-frontier models therefore impose the strong assumption that the epidemic curves of both total and reported coronavirus cases coincide (see Eq. 10). The frontier models relax this assumption by adding the first difference of a one-sided error term that can be multiplicatively decomposed into an exponential scaling function (that is, we assume that the scaling function that appears in Eq. (12) is $h_{it} = e^{z_{it}\tau}$) and a homoskedastic half-normal random variable.

Table 1. MLE: Epidemic-time (lnKit) specification

| Variable | Linear, non-frontier: Coef. | s.e. | Linear, frontier: Coef. | s.e. | Exponential, non-frontier: Coef. | s.e. | Exponential, frontier: Coef. | s.e. |
|---|---|---|---|---|---|---|---|---|
| Overall epidemic curve | | | | | | | | |
| Intercept | 0.8083** | 0.3734 | 0.8950 | 0.7857 | 4.1081*** | 1.1877 | 5.1474 | 3.2768 |
| lnKit | −0.3775 | 0.4130 | −0.4959 | 0.8168 | −7.0363*** | 1.5375 | −8.8926** | 3.8386 |
| (lnKit)^2 | 0.1492 | 0.1518 | 0.1602 | 0.2869 | 3.2610*** | 0.6438 | 4.0997*** | 1.5123 |
| (lnKit)^3 | −0.0248 | 0.0183 | −0.0227 | 0.0333 | −0.5010*** | 0.0863 | −0.6332*** | 0.1964 |
| WilnKt | 0.0360* | 0.0185 | 0.0830*** | 0.0201 | 0.0835*** | 0.0499 | 0.1278* | 0.0724 |
| Dt | −0.1815*** | 0.0304 | −0.1376*** | 0.0295 | −0.5964*** | 0.0875 | −0.4977*** | 0.1306 |
| WilnKt·Dt | −0.0466*** | 0.0187 | −0.0782*** | 0.0175 | −0.1964*** | 0.0541 | −0.2879*** | 0.0592 |
| Noise term (lnσv) | | | | | | | | |
| Intercept | 0.7954*** | 0.1375 | 1.1092*** | 0.0924 | 0.8430*** | 0.1333 | 1.1247*** | 0.1111 |
| lnKit | −1.0215*** | 0.0485 | −1.1673*** | 0.0322 | −1.0411*** | 0.0468 | −1.1714*** | 0.0390 |
| Scaling function | | | | | | | | |
| Kt | | | −0.0383*** | 0.0112 | | | −0.0437*** | 0.0145 |
| WilnKt | | | −0.1376*** | 0.0429 | | | −0.0796 | 0.0501 |
| Kt·Dt | | | −0.0020* | 0.0011 | | | −0.0026* | 0.0014 |
| WilnKt·Dt | | | −0.0281** | 0.0136 | | | −0.0350* | 0.0194 |
| u-term (lnσu) | | | | | | | | |
| Intercept | | | 1.2122*** | 0.5060 | | | 1.1527*** | 0.4280 |
| Day of the week effects | | | | | | | | |
| Tuesday | 0.0154 | 0.0108 | 0.0182 | 0.0210 | 0.1009 | 0.0722 | 0.1992 | 0.1478 |
| Wednesday | 0.0128 | 0.0106 | 0.0173 | 0.0180 | 0.0689 | 0.0720 | 0.1599 | 0.1281 |
| Thursday | 0.0130 | 0.0103 | 0.0177 | 0.0156 | 0.1083 | 0.0699 | 0.2243* | 0.1220 |
| Friday | 0.0047 | 0.0103 | 0.0127 | 0.0185 | 0.0216 | 0.0780 | 0.1421 | 0.1288 |
| Saturday | 0.0020 | 0.0101 | 0.0062 | 0.0132 | −0.0109 | 0.0734 | −0.0124 | 0.1463 |
| Sunday | 0.0213 | 0.0114 | 0.0167 | 0.0201 | 0.1729** | 0.0711 | 0.2571** | 0.1432 |
| Mean log LF | 0.6572 | | 1.6373 | | 0.6648 | | 1.6379 | |
| Pseudo R-sq | 0.3442 | | 0.3735 | | 0.3291 | | 0.3507 | |
| Mean RR | | | 0.3490 | | | | 0.4220 | |
| Obs. | 1290 | | 1290 | | 1290 | | 1290 | |

***Significant at 1% level

**Significant at 5% level

*Significant at 10% level

All models include day-of-the-week effects that aim to capture reporting lags by regional and national governments. They also all include a dummy variable Dt that takes the value 1 from the 14th of March, 2020, the day marking the imposition of most of the coronavirus control measures by the Spanish government. The coefficient of this dummy variable allows us to test whether the Spanish lockdown and other public control measures implemented around the 14th of March were able to attenuate the spread of the virus within each province.14

Following the scant epidemiology literature that controls for spatial spillover effects, we use a spatial lag of X specification (SLX) to measure the propagation effect of mobility of people across provinces. In particular, we include Wi lnKt as an epidemic frontier driver, where lnKt is an N × 1 vector of epidemic times of the Spanish provinces, and Wi is a 1 × N spatial weight vector where the weights measure the degree of mobility (connectivity) between provinces. We follow Giuliani et al. (2020) and Gross et al. (2020) and use a contiguity or binary Wi vector, where the weights equal one for adjacent units and zero for non-bordering units.15 Therefore, we assume that lnβit* depends on the epidemic time of neighbouring provinces. We have selected the epidemic time to capture the potential propagation effects between provinces for two reasons. First, this variable is exogenous by construction. Second, Orea and Álvarez (2022) found that the SLX spatial specification captured all the spatial dependence in the dependent variable using a set of spatial autocorrelation tests on the model’s residuals.16
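To make the construction of the spatially lagged regressor concrete, the following sketch computes Wi lnKt for a hypothetical set of four provinces arranged in a chain. The adjacency pattern and the epidemic times are invented for illustration, and the weights are left as raw zeros and ones, which is an assumption about how the binary contiguity matrix enters the model.

```python
import numpy as np

# Hypothetical contiguity matrix: provinces 0-1, 1-2 and 2-3 share a border.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Hypothetical epidemic times of the four provinces on one calendar day.
K_t = np.array([12.0, 8.0, 5.0, 1.0])

# SLX regressor: row i of W picks out the log epidemic times of i's neighbours.
slx = W @ np.log(K_t)
```

Province 1, which borders provinces 0 and 2, receives ln 12 + ln 5, whereas province 0 only receives ln 8. Interacting `slx` with the lockdown dummy gives the WilnKt·Dt regressor.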

The specification of both random terms is also common to all models. On the one hand, all models have been estimated assuming that the logarithm of the standard deviation of vit depends on the logarithm of Kit because the volatility of growth rates of reported cases decreases throughout the evolution of the epidemic. On the other hand, in order to capture temporal changes in uit, we assume that the scaling function depends on two time-varying contextual variables: i) the epidemic time of each province (Kit), in the same fashion as Battese and Coelli (1992); and ii) the logged epidemic time of neighbouring provinces (WilnKt), because we believe that the mobility of people across provinces might also have a significant effect on the proportion of undocumented cases. Finally, we interact our lockdown dummy variable Dt with both Kit and WilnKt in order to examine whether the Spanish lockdown and other control measures (such as an increase in testing) reduced the proportion of undocumented cases.
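The heteroskedastic noise specification can be illustrated with the exponential frontier estimates of the noise equation in Table 1 (intercept 1.1247, slope −1.1714). The sketch below simply evaluates ln σv = γ0 + γ1 lnKit over the sample range of epidemic times. The function and constant names are hypothetical.

```python
import numpy as np

# Noise-equation estimates of the exponential frontier model in Table 1.
GAMMA_0, GAMMA_1 = 1.1247, -1.1714

def noise_sd(K, gamma_0=GAMMA_0, gamma_1=GAMMA_1):
    """Standard deviation of v_it implied by ln(sigma_v) = gamma_0 + gamma_1 * ln(K)."""
    return np.exp(gamma_0 + gamma_1 * np.log(np.asarray(K, dtype=float)))

# Epidemic times 3 to 40, the range covered by the estimation sample.
sd_path = noise_sd(np.arange(3, 41))
```

Because the slope is negative, the implied standard deviation falls steadily as the epidemic advances, in line with the pattern in Fig. 3.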

The intercepts estimated in the linear models are close to unity, indicating that the initial rates of growth of coronavirus cases are relatively large. The exponential models yield much larger initial growth rates, a result that might explain why all the coefficients of the third-order function of lnKit are statistically significant using this specification. In contrast, we do not find significant lnKit coefficients using the linear specification, a result that implies that the rates of growth of coronavirus cases do not change during the epidemic. This result would appear to be incorrect, however, as Fig. 3 suggests that these rates of growth decrease rapidly in the early stages of the epidemic. This feature is better captured by the exponential model, as the large negative coefficient of lnKit found using this specification indicates that these growth rates rapidly decreased a short time after the beginning of the epidemic. Moreover, the previous result, together with the positive and negative coefficients found respectively for the squared and cubed terms of lnKit, is consistent with the traditional S-shaped epidemic curves. For all these reasons, the exponential specifications of our epidemic curve are the preferred ones.
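A rough way to visualise the estimated temporal pattern is to evaluate the cubic index in lnKit at the exponential frontier estimates of Table 1. The sketch below does this under simplifying assumptions that are ours: the spatial, lockdown and day-of-the-week terms are all set to zero, and lnβit* is taken to be the exponential of the cubic index. It is meant only to show the shape of the baseline curve, not to reproduce fitted values.

```python
import numpy as np

# Epidemic-curve estimates of the exponential frontier model in Table 1.
B0, B1, B2, B3 = 5.1474, -8.8926, 4.0997, -0.6332

def baseline_log_beta(K):
    """exp of the cubic index in ln(K), with all other regressors set to zero."""
    x = np.log(np.asarray(K, dtype=float))
    return np.exp(B0 + B1 * x + B2 * x**2 + B3 * x**3)

# Baseline growth component over epidemic times 3 to 40.
curve = baseline_log_beta(np.arange(3, 41))
```

With these coefficients the derivative of the cubic index with respect to lnKit is negative everywhere, so the baseline component declines monotonically over the sample range.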

Another key result of our empirical exercise is the positive and statistically significant coefficient found for the spatially-lagged variable, WilnKt. This result provides evidence supporting the belief that people’s mobility did spread the virus across the country, as it indicates that the growth rate of COVID-19 cases in a province depends on the evolution of the pandemic in other provinces. Notice that we have interacted Dt with WilnKt. This implies that the coefficient of WilnKt actually measures propagation effects before the implementation of the Spanish lockdown. The coefficient of WilnKt·Dt is negative and statistically significant, indicating that the lockdown was quite effective in preventing the propagation of the coronavirus between provinces. Another issue is whether the lockdown was effective in reducing the propagation of the virus within each province. This within-province impact of the Spanish lockdown can be examined using the estimated coefficient of Dt.17 We find a negative and statistically significant effect of the Spanish lockdown on the rates of growth of coronavirus cases, regardless of whether we use linear or exponential specifications for the epidemic curve.

In summary, these results allow us to conclude that the lockdown was effective in both preventing the propagation of the coronavirus between provinces and in attenuating the propagation of the virus within each province. We carried out a counterfactual exercise using the parameter estimates of our preferred model to simulate what the situation would have been on April 4th if the lockdown had not been implemented around March 14th. We found that the lockdown reduced the number of potential COVID-19 cases by 65.2%. Using a similar approach, Cho (2020) found that the cases of infection in Sweden would have been reduced by almost 75% had its policymakers followed stricter containment policies.

We now focus our attention on the distribution of both random terms. As expected, we find that the standard deviation of the noise term decreases with the logarithm of Kit. Regarding the one-sided random term, using the exponential frontier model we find that the average reporting rate (RR = Y/Y*) is 42.2%. This rate changes over time as we find that the coefficients of the scaling function are negative, with most of them being statistically significant. We also find very different (under)reporting rates across the Spanish provinces, which is one of the contributions of the paper as the previous epidemiological literature often relies on common rates. For illustrative purposes, in Appendix B, the estimated reported, unreported and total cases for each Spanish province on April 4th, 2020, are presented in Table 4 and the temporal evolution of reported and total cases by province is presented in Fig. 7.

Table 4. Reported, undocumented and total cases by province (April 4th, 2020)

| Region | Province | Reported (A) | Undocumented (B) | Total (C = A + B) | RR (D = A/C) |
|---|---|---|---|---|---|
| Andalucía | Almería | 346 | 327 | 673 | 51.4 |
| | Cádiz | 846 | 236 | 1082 | 78.2 |
| | Córdoba | 974 | 394 | 1368 | 71.2 |
| | Granada | 1477 | 351 | 1828 | 80.8 |
| | Huelva | 279 | 65 | 344 | 81.1 |
| | Jaén | 914 | 651 | 1565 | 58.4 |
| | Málaga | 1863 | 717 | 2580 | 72.2 |
| | Sevilla | 1602 | 2185 | 3787 | 42.3 |
| Aragón | Huesca | 396 | 106 | 502 | 78.9 |
| | Teruel | 371 | 154 | 525 | 70.7 |
| | Zaragoza | 2409 | 1062 | 3471 | 69.4 |
| Asturias | Asturias | 1605 | 691 | 2296 | 69.9 |
| Cantabria | Cantabria | 1441 | 1500 | 2941 | 49.0 |
| Castilla La Mancha | Albacete | 2653 | 3269 | 5922 | 44.8 |
| | Ciudad Real | 3854 | 3418 | 7272 | 53.0 |
| | Cuenca | 497 | 164 | 661 | 75.2 |
| | Guadalajara | 858 | 168 | 1026 | 83.6 |
| | Toledo | 2169 | 1994 | 4163 | 52.1 |
| Castilla León | Ávila | 679 | 242 | 921 | 73.7 |
| | Burgos | 985 | 265 | 1250 | 78.8 |
| | León | 1261 | 1087 | 2348 | 53.7 |
| | Palencia | 472 | 198 | 670 | 70.4 |
| | Salamanca | 1659 | 2226 | 3885 | 42.7 |
| | Segovia | 1148 | 1659 | 2807 | 40.9 |
| | Soria | 803 | 262 | 1065 | 75.4 |
| | Valladolid | 1403 | 2652 | 4055 | 34.6 |
| | Zamora | 339 | 84 | 423 | 80.1 |
| Cataluña | Barcelona | 27484 | 34557 | 62041 | 44.3 |
| | Girona | 2072 | 1538 | 3610 | 57.4 |
| | Lleida | 1176 | 274 | 1450 | 81.1 |
| | Tarragona | 958 | 756 | 1714 | 55.9 |
| Extremadura | Badajoz | 672 | 526 | 1198 | 56.1 |
| | Cáceres | 1375 | 1337 | 2712 | 50.7 |
| Galicia | A Coruña | 2180 | 715 | 2895 | 75.3 |
| | Lugo | 561 | 344 | 905 | 62.0 |
| | Ourense | 921 | 460 | 1381 | 66.7 |
| | Pontevedra | 1519 | 528 | 2047 | 74.2 |
| La Rioja | La Rioja | 2592 | 516 | 3108 | 83.4 |
| Madrid | Madrid | 37584 | 27553 | 65137 | 57.7 |
| Murcia | Murcia | 1235 | 194 | 1429 | 86.4 |
| Navarra | Navarra | 3073 | 2859 | 5932 | 51.8 |
| País Vasco | Álava | 2639 | 544 | 3183 | 82.9 |
| | Vizcaya | 4489 | 2481 | 6970 | 64.4 |
| | Guipúzcoa | 1500 | 729 | 2229 | 67.3 |
| Valencia | Alicante | 2627 | 1042 | 3669 | 71.6 |
| | Castellón | 852 | 2056 | 2908 | 29.3 |
| | Valencia | 3701 | 2838 | 6539 | 56.6 |

Note: Reporting rate (RR) in percentage

Fig. 7. Temporal evolution of reported and total cases by province

Regarding the temporal path of reporting rates, Fig. 4 shows the province-specific reporting rates by epidemic time computed using our preferred exponential frontier model. Several comments are in order regarding this figure. First, we observe that all rates tend to increase throughout the evolution of the epidemic because we have found before that uit tends to decline over time. Second, the sample mean varies from 25.3 to 52.5%. These averages reveal that the multiplication factor is on average close to 4 at the beginning of the epidemic and close to 2 at later stages.18 Third, the minimum RR values suggest that there are (many) provinces with very low reporting rates, and hence extremely large multiplication factors, especially at the very beginning of their epidemic episodes. In this sense, our estimated reporting rates are in line with Li et al. (2020), who also find very low reporting rates (14%) before the implementation of the Chinese travel restrictions.19
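The link between reporting rates and multiplication factors used in this discussion is simple arithmetic: if RR = Y/Y*, the number of total cases per reported case is 1/RR. The following sketch, with a hypothetical helper name, applies this to the two sample means quoted above.

```python
def multiplication_factor(reporting_rate):
    """Total cases per reported case, i.e. Y*/Y = 1 / RR, for RR in (0, 1]."""
    if not 0.0 < reporting_rate <= 1.0:
        raise ValueError("reporting_rate must lie in (0, 1]")
    return 1.0 / reporting_rate

early = multiplication_factor(0.253)  # mean RR at the start of the epidemic
late = multiplication_factor(0.525)   # mean RR at later stages
```

These values come out just below 4 and just below 2 respectively, matching the orders of magnitude stated in the text.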

Fig. 4. Temporal evolution of reporting rates (epidemic-time model)

The geographical distribution of reporting rates across the Spanish provinces is shown in Fig. 5. As the reporting rates vary over time, we have depicted this map using the provincial reporting rates evaluated at the epidemic time 20. Figure 5 seems to suggest the existence of two groups of provinces, one with relatively large reporting rates and the other with relatively low reporting rates. It can be seen that most, but not all, of the provinces with small reporting rates are located in the regions of Castilla-León, Extremadura, and Valencia, and the two main epicentres in Spain (Madrid and Barcelona). The multiplication factors in these provinces (not shown) are on average close to 8. The largest reporting rates are found in coastal Andalucía and several provinces located in the Iberian and Pyrenees mountain ranges. Consequently, their multiplication factors are much smaller than those computed for the previously-mentioned provinces (close to 1.7 on average).

Fig. 5. Geographical distribution of reporting rates

Robustness analyses

In this section we provide some extensions to the base model. We do not show the new parameter estimates for reasons of space but they can be found in Orea et al. (2021). Here, we simply summarize the main results of these robustness analyses.

Alternative specifications: SIR-based models

First, we compared our results with those obtained using a frontier specification inspired by the SIR theoretical epidemic model of Chudik et al. (2020), where we replace the third-order function of lnKit with the first and second-order lagged values of lnYit and their interaction. The derivation of a SIR-based frontier model can be found in Appendix A. There we show that some strong simplifying assumptions need to be made in order to estimate a SIR-based model once undocumented cases are incorporated, so that its results are likely to be biased.

To examine the relative performance of the SIR-based and epidemic-time frontier specifications, we carried out several simulation exercises, which can be found in a previous version of this work published as a working paper (Orea et al. 2021). In summary, we found that in all cases the frontier specification performs better than a non-frontier model in terms of goodness-of-fit. Regarding the frontier specifications, we find that the SIR specification provides a better goodness of fit than the epidemic-time specification because lnYit exhibits greater cross-sectional heterogeneity than lnKit. However, the SIR specifications tend to significantly overestimate the (proportion of) unreported cases. Finally, when the cross-section dimension of the panel dataset is small, both models perform particularly poorly if they do not account for heteroskedasticity in the symmetric error term. When the cross-section dimension is increased, the estimates are much more accurate. In any case, it appears appropriate to model the symmetric error term as heteroskedastic.20

The parameter estimates from the linear and exponential specifications of the non-frontier and frontier models are reported in Table 2. As with the epidemic-time models, we find using the SIR-based models that the mobility of people across provinces did clearly spread the virus across the country. We find a larger coefficient (in absolute value) for the interaction of WilnKt with the lockdown dummy variable, indicating that, according to the SIR specification, the lockdown was even more effective in preventing the propagation of the coronavirus between provinces. In contrast, the within-province impact of the Spanish lockdown is smaller than in the epidemic-time models. Overall, both the epidemic-time and SIR-based specifications suggest the existence of significant spatial spillovers and provide evidence that the Spanish lockdown was effective in reducing the propagation of COVID-19 both within and between provinces.

Table 2. MLE: SIR specification

| Variable | Linear, non-frontier: Coef. | s.e. | Linear, frontier: Coef. | s.e. | Exponential, non-frontier: Coef. | s.e. | Exponential, frontier: Coef. | s.e. |
|---|---|---|---|---|---|---|---|---|
| Overall epidemic curve | | | | | | | | |
| Intercept | 0.2897*** | 0.0301 | 0.1842*** | 0.0385 | −1.4418*** | 0.1058 | −2.0487*** | 0.1978 |
| lnYt−1 | 0.1948*** | 0.0230 | 0.0370 | 0.0369 | 0.2666*** | 0.0679 | −0.0901 | 0.1467 |
| lnYt−2 | −0.2380*** | 0.0220 | −0.1062*** | 0.0330 | −0.5327*** | 0.0618 | −0.3994*** | 0.1238 |
| lnYt−1·lnYt−2 | 0.0053*** | 0.0007 | 0.0020 | 0.0020 | −0.0105** | 0.0053 | −0.0334** | 0.0149 |
| WilnKt | 0.0550*** | 0.0181 | 0.0972*** | 0.0218 | 0.1941*** | 0.0491 | 0.4492*** | 0.1442 |
| Dt | −0.1121*** | 0.0296 | −0.0574 | 0.0363 | −0.3384*** | 0.0875 | −0.0790 | 0.1639 |
| WilnKt·Dt | −0.0591*** | 0.0182 | −0.0863*** | 0.0187 | −0.2344*** | 0.0524 | −0.3795*** | 0.1157 |
| Noise term (lnσv) | | | | | | | | |
| Intercept | 0.8743*** | 0.1352 | 1.0180*** | 0.0926 | 0.6446*** | 0.1288 | 0.9607*** | 0.1537 |
| lnKt | −1.0710*** | 0.0476 | −1.1553*** | 0.0326 | −0.9807*** | 0.0453 | −1.1295*** | 0.0554 |
| Scaling function | | | | | | | | |
| Kt | | | −0.0091 | 0.0233 | | | −0.0538*** | 0.0188 |
| WilnKt | | | −0.0185 | 0.0395 | | | −0.0688* | 0.0370 |
| Kt·Dt | | | 0.0005 | 0.0012 | | | 0.0002 | 0.0018 |
| WilnKt·Dt | | | 0.0044 | 0.0103 | | | 0.0042 | 0.0284 |
| u-term (lnσu) | | | | | | | | |
| Intercept | | | 2.5189 | 2.1028 | | | 1.2793*** | 0.2587 |
| Day of the week effects | | | | | | | | |
| Tuesday | 0.0192** | 0.0100 | 0.0193 | 0.0215 | 0.1316* | 0.0712 | 0.1892 | 0.1300 |
| Wednesday | 0.0124 | 0.0098 | 0.0173 | 0.0149 | 0.1018 | 0.0718 | 0.1756* | 0.1005 |
| Thursday | 0.0103 | 0.0096 | 0.0182 | 0.0144 | 0.1077 | 0.0714 | 0.2436** | 0.1171 |
| Friday | 0.0006 | 0.0094 | 0.0125 | 0.0193 | 0.0061 | 0.0788 | 0.1842 | 0.1433 |
| Saturday | −0.0064 | 0.0092 | 0.0042 | 0.0151 | −0.0632 | 0.0718 | −0.1121 | 0.1688 |
| Sunday | 0.0234** | 0.0106 | 0.0177 | 0.0248 | 0.2262*** | 0.0695 | 0.2796** | 0.1332 |
| Mean log LF | 0.7174 | | 1.6843 | | 0.6934 | | 1.6817 | |
| Pseudo R-sq | 0.3799 | | 0.4531 | | 0.3827 | | 0.4344 | |
| Mean RR | | | 0.0340 | | | | 0.4060 | |
| Obs. | 1290 | | 1290 | | 1290 | | 1290 | |

***Significant at 1% level

**Significant at 5% level

*Significant at 10% level

Regarding the two random terms, in the SIR-based models we find a decreasing standard deviation for the noise term, as occurred with the epidemic-time models. The parameter estimates of the scaling function, on the other hand, differ notably from those obtained in the epidemic-time models when we use a linear specification of the model, but not when using an exponential specification. Moreover, whereas the linear SIR-based and epidemic-time models provide very different average reporting rates, the exponential specification of the SIR-based model provides quite similar average reporting rates to its epidemic-time equivalent. This occurs in spite of our finding in the simulation exercises that the SIR specifications tended to underestimate the reporting rates. The exponential form of the SIR-based frontier epidemic curve therefore tends to attenuate the bias in the estimation of the one-sided error term using the linear SIR-based specification. Regarding the temporal path of reporting rates, Fig. 6 shows the province-specific reporting rates by epidemic time computed using the exponential SIR-based model. As in our epidemic-time model, all rates tend to increase throughout the evolution of the epidemic.

Fig. 6. Temporal evolution of reporting rates (SIR model)

Finally, it is worth highlighting the larger (mean) log likelihood and (pseudo) R-squared values of both linear and non-linear SIR-based models. This is an expected result because the temporal lags of reported cases explain a larger proportion of the current cross-sectional heterogeneity of reported cases than the simple polynomial of epidemic times.21 While the temporal path of reporting rates is robust to this issue, the provinces’ reporting rates might change if their “true” frontier is not properly captured using a polynomial of epidemic-time variables. We do not find a systematic positive or negative bias in reporting rates. We instead find that the differences between epidemic-time and SIR-based reporting rates have to do with differences in reported cases across provinces that have not been perfectly captured by the epidemic-time variables. Indeed, although the correlation is not strong, we find that the epidemic-time model tends to overestimate (underestimate) the SIR-based reporting rates of provinces with large (small) numbers of reported cases. In this sense, our reporting rates can be viewed as an upper (lower) bound of the true reporting rates in these provinces.

In summary, both epidemic-time and SIR-based models provide similar frontier and distributional results and confirm our hypothesis that the proportion of reported cases increases over the course of the epidemic, in line with Li et al. (2020). We simply note that the undocumented cases are likely higher (lower) than those estimated using our epidemic-time model in the more (less) affected provinces.

Additional variables: socio-economic determinants

An appealing feature of both epidemic-time and SIR-based specifications is that they can be estimated using epidemic-type data only, i.e., the rates of growth of coronavirus cases depend in our models on own and neighbours’ epidemic times, lagged cases of COVID-19, date of implementation of control measures, etc. However, this does not preclude adding other covariates. The introduction of crucial socio-economic determinants not only provides an estimate of their potential impact but may also offer guidance for future policies aimed at preventing the emergence of epidemics.

To examine this issue, we estimated our preferred model (exponential epidemic-time specification) by adding, one at a time, a series of socio-economic variables to both the overall epidemic frontier and the proportion of under-reported cases through the scaling function.22 None of these variables had a significant effect on the proportion of under-reported cases, perhaps due to the fact that the random u-term we are modelling here is time-invariant. Similarly, most of the demographic and weather variables do not have a significant frontier effect. However, we did find that the most-populated provinces have had more intensive coronavirus epidemics, most likely due to agglomeration of individuals and the fact that the use of public transport is more prevalent in these provinces. We also found that the COVID-19 epidemic was more intense in provinces with a relatively large share of workers in the service sector. In contrast, the epidemic was weaker in provinces with a relatively large share of workers in the agriculture sector. The risk of contagion in the service sector, where many jobs are indoors, is likely much larger than in the agricultural sector, where work is mainly outdoors.

Temporal windows

As most control measures began on March 14th, the data used in our empirical analysis on coronavirus cases corresponded to a temporal window defined between the onset of the epidemic in each province and the 4th of April (i.e., about three weeks before and three weeks after mid-March). The sample epidemic time ranges from Kit = 3 to Kit = 40 in this window, labelled hereafter as W0340. The first two days of the epidemic of each province are not used because we need two temporal lags to estimate the SIR-based models.

As mentioned above, zero rates of growth of coronavirus cases often appear at the beginning of outbreaks. We estimated our epidemic models dropping these observations, for two reasons. First, we found convergence problems when estimating the frontier specifications of our epidemic curves, even when we added a dummy variable à la Battese (1997) to identify the observations with zero rates of growth. The huge volatility of the dependent variable caused by the presence of zero rates of growth was not sufficiently captured by this dummy variable to achieve convergence of the maximization procedures. Second, when using non-frontier econometric techniques we found that only the initial temporal patterns were biased once we dropped observations with zero rates of growth of coronavirus cases (we use a third-order function of lnKit).

In order to partially address this issue of dropping observations, we re-estimate our models using two additional alternative temporal windows. The epidemic time ranges from Kit = 7 to Kit = 44 in the second window (W0744 hereafter) and ranges from Kit = 10 to Kit = 47 in the third window (W1047 hereafter). As we move from the first through to the third window, there is a fall in the number of zero rates of growth dropped from the sample. Whereas in the first (original) window we dropped 134 observations with zero rates of growth of coronavirus cases, this figure falls by half in the second window (67 observations were dropped in W0744), and falls by half again in the third and final window (only 30 observations were dropped in W1047).

While the panel datasets for each window are highly unbalanced due to the widely-differing epidemic onset dates across provinces, the second and third windows use more complete panel datasets. They do, however, reduce the number of pre-lockdown observations, which is problematic in that these are needed not only to measure the effectiveness of the Spanish lockdown in battling the COVID-19 pandemic but also to estimate spatial propagation effects across the Spanish provinces. As such, there are advantages and disadvantages to using windows that begin at later dates. To assess these trade-offs, Table 3 presents the parameter estimates of the exponential epidemic-time specification for the three different temporal windows (W0340, W0744, W1047).

Notice that the third window is estimated with a second-order function of lnKit because the epidemic curve in this window is properly captured with the first two epidemic-time variables. As the volatility of the rates of growth of reported cases in the third window is relatively small, we have also estimated this specification with zero rates of growth of reported cases. Our results in Table 3 show that our parameter estimates are robust to this issue when both models, with and without zero rates of growth, converge. As the volatility of the rates of growth of reported cases is much larger in the earlier stages of the epidemic, the goodness-of-fit increases notably in the second and third windows. We find similar provincial reporting rates, with correlation coefficients close to 0.90 in all cases. The temporal patterns of these reporting rates are also similar, although the reporting rates are larger in the later windows.

On the other hand, we do not find significant spatial propagation effects across provinces when we use the second and third windows because they include far fewer pre-lockdown observations, a result that is to be expected. As the national lockdown of the population basically halted the mobility of people across provinces, this effect can only be measured if there is a relatively large dispersion of epidemic developments across provinces before the implementation of the Spanish lockdown. Using the final window (W1047), we do not find a significant effect of the lockdown on the rates of growth of coronavirus cases. Again, this is to be expected because W1047 includes fewer of the pre-lockdown observations that are needed to identify a differential temporal pattern before and after the policy measure.

Table 3. MLE: Epidemic-time (lnKit) specification with different temporal windows

| Variable | W0340: Coef. | s.e. | W0744: Coef. | s.e. | W1047: Coef. | s.e. | W1047 (with zero rates of growth): Coef. | s.e. |
|---|---|---|---|---|---|---|---|---|
| Overall epidemic curve | | | | | | | | |
| Intercept | 5.1474 | 3.2768 | 31.3080 | 20.0315 | −24.5151*** | 5.9564 | −29.0482*** | 6.1906 |
| lnK | −8.8926** | 3.8386 | −39.1261* | 21.3358 | 16.4949*** | 3.9424 | 19.7441*** | 4.0948 |
| (lnK)^2 | 4.0997*** | 1.5123 | 15.4096** | 7.5498 | −3.0285*** | 0.6332 | −3.5628*** | 0.6556 |
| (lnK)^3 | −0.6332*** | 0.1964 | −2.0246** | 0.8826 | | | | |
| WlnK | 0.1278* | 0.0724 | 0.1070 | 0.1116 | −0.0299 | 0.1813 | 0.2430 | 0.1629 |
| D | −0.4977*** | 0.1306 | −0.4942** | 0.2522 | −0.3842 | 0.4481 | −0.6655* | 0.3471 |
| WlnK·D | −0.2879*** | 0.0592 | −0.3453*** | 0.1054 | −0.2528 | 0.1961 | −0.4942*** | 0.1612 |
| Noise term | | | | | | | | |
| Intercept | 1.1247*** | 0.1111 | 2.9305*** | 0.1539 | 4.1414*** | 0.2183 | 4.3246*** | 0.2278 |
| lnK | −1.1714*** | 0.0390 | −1.7956*** | 0.0474 | −2.1849*** | 0.0661 | −2.2389*** | 0.0698 |
| Scaling function | | | | | | | | |
| K | −0.0437*** | 0.0145 | −0.0655*** | 0.0079 | −0.0711*** | 0.0062 | −0.0634*** | 0.0058 |
| WlnK | −0.0796 | 0.0501 | 0.0119 | 0.0433 | 0.0348 | 0.0484 | 0.0471 | 0.0462 |
| K·D | −0.0026* | 0.0014 | −0.0021* | 0.0012 | −0.0020 | 0.0015 | −0.0035** | 0.0015 |
| WlnK·D | −0.0350* | 0.0194 | −0.0298 | 0.0211 | −0.0296 | 0.0251 | −0.0441* | 0.0242 |
| Undocumented cases term | | | | | | | | |
| Intercept | 1.1527*** | 0.4280 | 1.7139*** | 0.4982 | 1.8568*** | 0.4922 | 1.7275*** | 0.5083 |
| Day of the week effects | | | | | | | | |
| Tuesday | 0.1992 | 0.1478 | 0.3095 | 0.2062 | 0.3604 | 0.2611 | 0.3110 | 0.2754 |
| Wednesday | 0.1599 | 0.1281 | 0.2727 | 0.1731 | 0.3215* | 0.1884 | 0.2928 | 0.1823 |
| Thursday | 0.2243* | 0.1220 | 0.3945* | 0.1530 | 0.4357*** | 0.1580 | 0.4263*** | 0.1688 |
| Friday | 0.1421 | 0.1288 | 0.2897* | 0.1639 | 0.3035* | 0.1768 | 0.3110 | 0.2045 |
| Saturday | −0.0124 | 0.1463 | 0.1232 | 0.1813 | 0.1877 | 0.1841 | 0.1570 | 0.1978 |
| Sunday | 0.2571* | 0.1432 | 0.3280 | 0.2162 | 0.3510 | 0.2526 | 0.3215 | 0.2588 |
| Mean log LF | 1.6379 | | 1.9594 | | 2.2119 | | 2.1735 | |
| Obs. | 1290 | | 1357 | | 1394 | | 1424 | |
| Pseudo R-sq | 0.3507 | | 0.4157 | | 0.4575 | | 0.3457 | |
| Epidemic time: minimum | 3 | | 7 | | 10 | | 10 | |
| Epidemic time: maximum | 40 | | 44 | | 47 | | 47 | |
| Mean RR | 0.422 | | 0.438 | | 0.480 | | 0.470 | |
| Zero rates of growth | No | | No | | No | | Yes | |

***Significant at 1% level

**Significant at 5% level

*Significant at 10% level
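As a reading aid for the significance markers, the sketch below maps a coefficient and its standard error to stars. It assumes two-sided tests with standard normal critical values (1.645, 1.960, 2.576); the paper does not state the exact rule used, so borderline entries in Table 3 may differ because of rounding. The function name `stars` is illustrative.

```python
def stars(coef, se):
    """Return significance stars from the ratio coef/se, assuming
    two-sided tests with standard normal critical values."""
    z = abs(coef / se)
    if z >= 2.576:
        return "***"  # 1% level
    if z >= 1.960:
        return "**"   # 5% level
    if z >= 1.645:
        return "*"    # 10% level
    return ""

# A few W0340 entries from Table 3: (label, coefficient, standard error).
examples = [
    ("lnK", -8.8926, 3.8386),
    ("WlnK", 0.1278, 0.0724),
    ("WlnK·D", -0.2879, 0.0592),
]
for label, coef, se in examples:
    print(label, round(coef / se, 2), stars(coef, se))
```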

Variance-covariance matrix specification

A final robustness analysis has to do with the specification of the variance-covariance matrix of our noise term. We assume in this subsection that the noise term is autocorrelated over time, in the same fashion as Wang and Ho (2010). However, estimation of a frontier epidemic model using (14) is problematic if the panel dataset is not continuous and there are missing observations. When varying the temporal windows above, we did not find severe convergence issues when we estimated the model using all observations of the last (third) window. For this reason, this robustness analysis is performed using the epidemic times from Kit = 10 to Kit = 47.

Generally speaking, the frontier coefficients were robust to different specifications of the variance-covariance matrix of the noise term. Moreover, the diagonal variance-covariance matrix outperforms the alternative specification in terms of goodness-of-fit. On the other hand, regardless of whether we use a diagonal or an autocorrelated variance-covariance matrix, we find that the standard deviation of the noise term decreases over time. This feature is thus robust to the specification of the noise term as autoregressive. Finally, we find that most of the coefficients of the scaling function are negative using both specifications, indicating again that the proportion of undocumented (reported) cases decreases (increases) over time.
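The difference between the two specifications can be sketched as follows. This is not the paper's Eq. (14): it assumes a log-linear standard deviation in epidemic time, sigma(K) = exp(a + b·lnK), and an AR(1)-type correlation rho^|K_s − K_t| between observations of the same province. The parameter values and function names are hypothetical. Setting rho = 0 gives the diagonal case.

```python
import math

def noise_sd(k, a, b):
    """Assumed log-linear standard deviation: sigma(K) = exp(a + b*ln K)."""
    return math.exp(a + b * math.log(k))

def noise_covariance(epidemic_times, a, b, rho=0.0):
    """Covariance matrix of one province's noise terms with time-varying
    standard deviations and AR(1)-type correlation rho**|K_s - K_t|."""
    sds = [noise_sd(k, a, b) for k in epidemic_times]
    return [
        [(rho ** abs(ks - kt)) * sd_s * sd_t
         for kt, sd_t in zip(epidemic_times, sds)]
        for ks, sd_s in zip(epidemic_times, sds)
    ]

times = [10, 11, 12, 13]  # consecutive epidemic times, no gaps
diagonal = noise_covariance(times, a=0.5, b=-0.3)            # rho = 0
autocorr = noise_covariance(times, a=0.5, b=-0.3, rho=0.4)   # hypothetical rho
```

With b < 0 the diagonal entries shrink as epidemic time increases, mirroring the finding that the standard deviation of the noise term decreases over time. Because the correlation depends on the distance between epidemic times, gaps in a province's series change the matrix structure, which is one way to see why missing observations complicate the autocorrelated specification.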

Discussion of empirical results

In all estimated models, we find very different reporting rates across the Spanish provinces. The large cross-sectional heterogeneity in reporting rates found in our empirical application is one of the contributions of the paper, as previous epidemiological literature has often assumed common rates (an exception to this is Millimet and Parmeter 2022, who also have differential reporting rates). For instance, the strength of the government mitigation policy is modelled in Chudik et al. (2020) in terms of the proportion of population that is exposed to COVID-19. To estimate this proportion, they need to make an assumption regarding the reporting rate. In particular, they use the data from the Diamond Princess cruise ship reported by Moriarty et al. (2020) to calibrate this rate and assume that the average reporting rate is equal to 50% in all Chinese provinces. They find a very large exposure rate in Hubei province (the epicentre of the epidemic), where reducing this exposure required time due to the novelty of the virus. The estimated exposure rates in other provinces ranged between 9 and 87%, indicating that the Chinese control measures had very different effects in each province. This somewhat unexpected result might be caused by the common value used by these authors to calibrate the reporting rate. On average, most of our reporting rates range from 10 to 79%, a variation similar to that found for the exposure rates in Chudik et al. (2020). Therefore, it may be the case that the wide variety of exposure rates they estimate is caused by the fact that their econometric model ignores systematic variations in reporting rates across provinces.

Most of our estimated models provide evidence supporting the belief that human mobility did spread the virus across the country before the implementation of the Spanish lockdown. Therefore, restricting people’s mobility (between or within provinces) seems to be a reasonable measure to attenuate the propagation of the coronavirus. In this sense, our results show that the lockdown was effective both in preventing the propagation of the coronavirus between provinces as well as in attenuating the propagation of the virus within each province. Hence, we find that the Spanish lockdown, together with other control measures, was an effective measure to battle COVID-19 in the absence of pharmaceutical measures (e.g., vaccines).

The average contraction in the rates of growth of coronavirus cases attributed to the lockdown is around 6.8 percentage points (from 18.2% with no lockdown to 11.4% with the lockdown). The largest reductions were found in provinces that are either close to the epicentres of the coronavirus or adjacent to provinces with more advanced epidemics. The reductions in the rates of growth of coronavirus cases attributed to the lockdown in these provinces are much larger than the average value. For instance, we find notable effects in Ávila, Segovia and Cuenca, which neighbour Madrid, the Spanish province hardest-hit by coronavirus. Large effects are also found in Tarragona and Lérida, which neighbour Barcelona, the second hardest-hit Spanish province. We also find large effects of the lockdown in Ciudad Real and Albacete, two adjacent provinces that are local hotspots of the coronavirus in the centre of Spain. In southern Spain, we find large effects in Córdoba, which neighbours Málaga, the main epicentre of the coronavirus in this area. We also find important effects for sparsely-populated provinces such as León, Soria, Palencia, Burgos and Teruel. It is worth mentioning that the epidemic in many of these provinces began almost one week later than it did in neighbouring provinces. Therefore, while local and national lockdowns of the population are effective measures to battle COVID-19, they should be implemented at the very early stages of the epidemic.
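To give a sense of scale for the average effect, the following sketch converts the two growth rates into doubling times. It assumes that the reported rates are daily log growth rates of cumulative cases held constant over time, which is a simplification made only for this illustration.

```python
import math

def doubling_time(daily_log_growth):
    """Days needed for cumulative cases to double at a constant
    daily log growth rate."""
    return math.log(2.0) / daily_log_growth

rate_no_lockdown = 0.182  # 18.2%, average rate with no lockdown (from the text)
rate_lockdown = 0.114     # 11.4%, average rate with the lockdown (from the text)

contraction_pp = 100.0 * (rate_no_lockdown - rate_lockdown)
print(f"Contraction: {contraction_pp:.1f} percentage points")
print(f"Doubling time without lockdown: {doubling_time(rate_no_lockdown):.1f} days")
print(f"Doubling time with lockdown: {doubling_time(rate_lockdown):.1f} days")
```

Under these assumptions, the contraction lengthens the doubling time from just under four days to roughly six days.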

It is worth mentioning here that, although the lockdown was very strict, this control measure did not completely eliminate human mobility. Apart from movement within and across provinces due to the existence of essential work, there was also an exodus from the epicentres of the Spanish coronavirus crisis of people wishing to spend the lockdown in provinces with few or no reported cases of COVID-19.

We also extended our pure frontier epidemic models by including a set of socio-economic factors that might influence the evolution of the epidemic in each province. This information can be very useful for policy makers and health authorities planning the relaxation of a lockdown. We find that the most-populated provinces had more intensive coronavirus epidemics. More (less) intensive coronavirus epidemics were also found in provinces with a relatively large share of workers in the service (agricultural) sector. These results, together with the strong propagation effects estimated for provinces close to the main epicentre of the coronavirus in Spain, point to the suitability of carrying out a gradual, focused relaxation of the control measures. Thus, the relaxation of the lockdown should be slow in the most-populated provinces, in provinces with a higher share of the workforce in the service sector, and in the main epicentres of the coronavirus in Spain. Control measures could be lifted earlier in provinces mainly engaged in primary-sector production.23

Conclusions and future research

This paper attempts to bridge the epidemiological modelling and production economics literatures by proposing stochastic frontier analysis as a useful tool with which the epidemic curves of COVID-19 can be estimated. We have proposed two different types of stochastic epidemic frontier specifications, one based on the econometric SIR specification of Chudik et al. (2020) and the other based on previous work by Orea and Álvarez (2022) which approximates the epidemic curves with functions of the epidemic times, i.e., the time since the onset of the pandemic. The most appealing feature of these models is that they can both be estimated using standard stochastic frontier techniques. One of the specifications of the model can be interpreted as a heteroskedastic version of the model introduced by Wang and Ho (2010). As such, the model we propose should prove useful for practitioners to control for individual effects in a production economics context under time-varying heteroskedasticity.

The models presented permit undocumented cases to be estimated, rather than assumed, and also allow spatial propagation of the virus across geographical areas to be modelled. A simulation exercise indicated that the epidemic-time model performed better, and in an empirical application to the case of the original outbreak of the pandemic in Spain we provide estimates from several different specifications of this model. The results from our models provided insights into the effectiveness of the national and regional lockdown measures and the influence of socio-economic factors in the propagation of the virus.

Our work can be extended in several directions. We have found convergence problems when the model included observations with zero rates of growth. These observations tended to generate huge volatility in the dependent variable. Our application of the SFA approach to examine a non-traditional issue seems to uncover a weakness of this approach when the target variable is highly volatile. Practitioners aiming to estimate firms’ efficiency should expect the appearance of convergence problems if firms’ production is highly volatile. As this situation is often observed (e.g., in agricultural economics), an interesting topic for future research would be to examine how to deal with this issue properly from a methodological perspective.

In the empirical application in this paper we availed of data at provincial level that allowed us to analyse the effectiveness of national and regional institutional responses at this level of disaggregation. However, several regions in Spain, including Andalusia, Asturias, the Basque Country, Cantabria, Catalonia, Madrid and Murcia have also provided data on coronavirus cases at municipal level. By adapting our empirical strategy to this more disaggregated data we will be able to evaluate the local control measures established by the regional governments during the second and successive waves of contagion of COVID-19.

Another extension would be to explore the possibility of different collectives within the population having different proportions of asymptomatic or undocumented cases. For example, data at provincial level by gender would allow us to examine whether the proportion of undocumented cases among women is larger or smaller than that among men. If this were the case, public health authorities should be particularly aware of gender-based channels of transmission of the virus in sectors of the economy where one gender or the other makes up a substantial majority of the workforce. These types of differences between collectives can be modelled with a system of epidemic spatial stochastic frontier equations, one for each collective. The copula-based maximum likelihood (ML) approach introduced by Lai and Huang (2013) is well-suited for such an analysis.

Finally, the relationship between reported and undocumented cases could be explored in greater depth. Li et al. (2020) have indicated that undocumented (asymptomatic) cases facilitate the dissemination of COVID-19. It is not clear how to explore this cross-group propagation effect using a frontier analysis approach because it tends to “reverse” the sign of the one-sided error term capturing the proportion of undocumented cases. A candidate is the latent class frontier model approach of Kumbhakar et al. (2007), as this model allows the sample to be split into two groups that differ in how the one-sided error term enters the model.

Appendix A

SIR-based specification of our epidemiological frontier model

We derive in this appendix an epidemiological model using the Susceptible-Infected-Recovered (SIR) specification of Chudik et al. (2020) and discuss how it can be expressed in a frontier setting. These authors derive the following second-order non-linear difference equation specification of the SIR model (see their Eq. 11):

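$$\tilde{Y}_{it} = \frac{\tilde{Y}_{it-1}^{2}}{\tilde{Y}_{it-2}} + \theta\left[\tilde{Y}_{it-1}\tilde{Y}_{it-2}(1-\gamma) - \tilde{Y}_{it-1}^{2}\right] \tag{A1}$$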

where Ỹit denotes the true number of infected in province i at time t, θ is the effective transmission rate, and γ is the rate of recovery. If we divide both sides of (A1) by Ỹit−1 and take logs, we get:

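$$\Delta\ln\tilde{Y}_{it} = \ln\left[\frac{\tilde{Y}_{it-1}}{\tilde{Y}_{it-2}} + \theta\left(\tilde{Y}_{it-2}(1-\gamma) - \tilde{Y}_{it-1}\right)\right] \tag{A2}$$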

As can be seen, the true rate of growth of coronavirus cases depends on first- and second-order lagged values and their interaction. Ignoring other random errors, Chudik et al. (2020) assume that the ratio of confirmed to true cases at time t can be written as:

$$\frac{Y_{it}}{\tilde{Y}_{it}} = \pi_{it} = e^{-u_{it}}, \quad u_{it} \ge 0 \tag{A3}$$

so that

$$Y_{it}\,e^{u_{it}} = \tilde{Y}_{it}, \quad u_{it} \ge 0 \tag{A4}$$

where the one-sided term uit in (A4) simply measures the gap between the true and confirmed number of cases, such that:

$$\ln\tilde{Y}_{it} = \ln Y_{it} + u_{it} \tag{A5}$$

If we use (A5) to replace the true number of cases on the left-hand side of (A2) with their “observed” counterparts, we get:

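$$\mathit{Rate}_{it} = \ln f(q_{it},\beta) - \Delta u_{it} = \ln\left[\frac{\tilde{Y}_{it-1}}{\tilde{Y}_{it-2}} + \theta\left(\tilde{Y}_{it-2}(1-\gamma) - \tilde{Y}_{it-1}\right)\right] - \Delta u_{it} \tag{A6}$$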

Note that the term in brackets depends on the true, but unobserved, number of cases in periods t−1 and t−2. If we follow Chudik et al. (2020) and replace them with their “observed” counterparts, Eq. (A6) becomes:

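$$\mathit{Rate}_{it} = \ln\left[\frac{Y_{it-1}}{Y_{it-2}}\,e^{\Delta u_{it-1}} + \theta\left(Y_{it-2}\,e^{u_{it-2}}(1-\gamma) - Y_{it-1}\,e^{u_{it-1}}\right)\right] - \Delta u_{it} \tag{A7}$$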

Several comments are in order regarding this SIR-based frontier model. First, if we assume that the one-sided random term uit is i.i.d. and follows, say, a half-normal distribution, the distribution of the error structure in (A7) is not known, so the model cannot be estimated using the standard stochastic frontier (SF) estimators. Second, we need to make some simplifying assumptions if we are to estimate (A7) using standard SF techniques. For instance, we might assume that the u-terms inside the brackets balance each other out and that lnf(qit, β) is a linear specification of lnYit−1, lnYit−2 and lnYit−1lnYit−2, as is customary in the production economics literature (squares of both lnYit−1 and lnYit−2 can also be included if a translog specification is preferred). However, estimating such a model likely provides biased results, not only because we are ignoring u-terms but also because the lagged values of reported cases might be correlated with the time-invariant part of the error term capturing the proportion of undocumented cases (ui). This could occur if undocumented (asymptomatic) cases facilitate the dissemination of COVID-19 and thereby increase the reporting rates.
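The algebra behind (A1) and (A7) can be checked numerically with a minimal sketch. It assumes a textbook discrete-time SIR recursion in population shares as the data-generating process, illustrative values for θ and γ, and a hypothetical path for the under-reporting gap uit. The function names are introduced here for illustration and are not taken from the paper or from Chudik et al. (2020).

```python
import math

def simulate_sir(theta, gamma, s0, i0, steps):
    """Discrete-time SIR in population shares; returns the infected path."""
    s, i = s0, i0
    path = [i]
    for _ in range(steps):
        new_infections = theta * s * i
        s, i = s - new_infections, (1.0 - gamma) * i + new_infections
        path.append(i)
    return path

def a1_rhs(y_lag1, y_lag2, theta, gamma):
    """Right-hand side of (A1), using true cases at t-1 and t-2."""
    return y_lag1 ** 2 / y_lag2 + theta * (
        (1.0 - gamma) * y_lag1 * y_lag2 - y_lag1 ** 2
    )

def a7_rhs(y_lag1, y_lag2, u, u_lag1, u_lag2, theta, gamma):
    """Right-hand side of (A7), using reported cases and the u-terms."""
    bracket = (y_lag1 / y_lag2) * math.exp(u_lag1 - u_lag2) + theta * (
        (1.0 - gamma) * y_lag2 * math.exp(u_lag2) - y_lag1 * math.exp(u_lag1)
    )
    return math.log(bracket) - (u - u_lag1)

theta, gamma = 0.4, 0.1  # illustrative parameter values
true_cases = simulate_sir(theta, gamma, s0=0.999, i0=0.001, steps=30)

# Hypothetical under-reporting gaps u_t >= 0 that shrink over time.
u = [1.0 / (1.0 + 0.1 * t) for t in range(len(true_cases))]
reported = [y * math.exp(-gap) for y, gap in zip(true_cases, u)]  # (A3)

periods = range(2, len(true_cases))
max_gap_a1 = max(
    abs(true_cases[t] - a1_rhs(true_cases[t - 1], true_cases[t - 2], theta, gamma))
    for t in periods
)
max_gap_a7 = max(
    abs(math.log(reported[t] / reported[t - 1])
        - a7_rhs(reported[t - 1], reported[t - 2], u[t], u[t - 1], u[t - 2], theta, gamma))
    for t in periods
)
print(max_gap_a1, max_gap_a7)
```

Both gaps are zero up to floating-point error, so the growth rate of reported cases satisfies (A7) exactly when the u-terms are known. The estimation difficulty discussed above arises because they are unobserved and enter the expression non-linearly.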

Appendix B

Figure 7 and Table 4

Compliance with ethical standards

Conflict of interest

The authors declare no competing interests.

Footnotes

1

For example, Chudik et al. (2020) use the data from the Diamond Princess cruise ship reported by Moriarty et al. (2020) to calibrate the proportion of the population exposed to COVID-19 and assume an average reporting rate in all Chinese provinces of 50%. They find large variations in exposure rates across Chinese provinces, ranging from 9 to 87%. The fact that their econometric model ignores systematic variations in reporting rates across provinces may well be causing this wide variety of exposure rates.

2

The model that describes the expected number of infections at time (day) t in Giuliani et al. (2020) is also allowed to depend on the number of infections reported at time t−1.

3

Notice that this specification resembles the popular reproduction-based models used in the epidemiological literature in the sense that our beta parameter plays the same role as the so-called “reproductive number of the infection” (R), a fundamental epidemiological quantity that represents the average number of infections per infected case over the course of their infection. The key aim of the coronavirus control measures is to reduce βit. If βit is equal to one, there are no new infections, and the pandemic has therefore been controlled. The same would happen if the reproductive number of the infection is equal to unity in an epidemiological model.

4

Orea and Álvarez (2022) show that the theoretical SIR and SEIR epidemiological models yield time-varying growth rates of cumulative cases, regardless of whether daily or longer temporal lags are used. Moreover, the SIR model and its variants produce S-shaped epidemic curves of cumulative cases that can be accurately predicted using a third-order function of lnKit.

5

Of course, if there are no undocumented cases (Uit = Uit−1 = 0), then the actual curves themselves coincide.

6

This simple empirical strategy might provide biased results as it ignores the potential correlation with the undocumented cases, which constitute an omitted variable in this analysis.

7

This estimator treats αi as fixed parameters. If they are treated instead as time-invariant random variables, we get the so-called True Random Effects (TRE) panel stochastic frontier model.

8

This problem appears when the number of parameters to be estimated increases with the number of cross-sectional observations in the data. In this situation, consistency of the parameter estimates is not guaranteed even if N → ∞.

9

Their inclusion makes the rates of growth of coronavirus cases extremely volatile, especially at the beginning of the epidemic outbreaks. This extremely high volatility is difficult to capture using the standard distributions for both the noise term (vit) and the one-sided random term capturing the proportion of undocumented cases (uit).

10

The number of zero rates of growth decreases notably if we use more recent temporal windows (i.e. not centred around the start of the lockdown) to carry out our empirical analysis. For this reason, we will try to deal with this issue in our empirical application by using a temporal window that begins one week later, at the expense of a fall in the number of pre-lockdown observations.

14

It is worth mentioning that the third-order function of lnKit captures the temporal pattern of the virus epidemic, conditional on Dt. In other words, the epidemic curve associated with this function can be interpreted as our “as if” scenario with no control measures.

15

Other spatial specifications based on human mobility across all the Spanish provinces were used in Orea and Álvarez (2022). They found very similar results due to 77% of the variation of the weights of the mobility-based W matrix being explained by the binary values of the weights of the contiguity W matrix.

16

Our spatial SLX specification does not distinguish between reported and undocumented propagation across provinces. A SAR specification with WilnYt and Wiut allows us to deal with this issue. However, estimating this model is far from simple because the distribution of Wiut is generally not known if uit is independently distributed across provinces, as assumed above. As estimating this model presents important methodological challenges, we leave an examination of this issue for future research.

17

As WilnKt is measured in deviations with respect to the post-lockdown sample mean, the coefficient of Dt can be interpreted as an average effect.

18

If we use individual reporting rates to compute individual multiplication factors, we get values on the order of two or three digits, with a mean value of 8, which are consistent with the large attack rates (i.e. proportions of infected people) found for Spain by Flaxman et al. (2020) in their study using 11 European countries.

19

The fraction of all infections that were documented after the travel restrictions was estimated to be 65%, a slightly larger reporting rate than that found in our paper after the implementation of the Spanish lockdown.

20

A separate but related matter has to do with the onset date of the pandemic used in our paper. Our epidemic time variable is defined as the number of days since the observed onset date of the pandemic, which relies on reported cases. In order to see whether in practice the gap between observed and true onset dates is an important issue, Orea and Álvarez (2022) simulated several scenarios with different observed onset dates due to underreporting. They found that the goodness-of-fit of the model only deteriorated when underreporting is extremely large and the gap between observed and true onset dates varies notably across provinces. In this case, however, a model with fixed effects retrieves the predictive capabilities of the model. Estimating such a model in a frontier setting will be a topic of future research.

21

We have carried out a Vuong test in order to examine whether the (non-nested) epidemic-time and SIR models are equivalent in terms of goodness-of-fit. Although the sign of the Vuong statistics tended to suggest that the SIR models provide a better goodness-of-fit, the results of these tests were not fully conclusive. We cannot reject that the non-frontier epidemic-time and SIR models are equivalent (the absolute value of the Vuong statistic was 1.23). For the frontier models, the Vuong statistic (2.46 in absolute value) suggests a similar performance at the 1% significance level, but a better fit of the frontier SIR model at the 5% significance level.

22

The socio-economic environment is measured through the provincial GDP per capita and the shares of the services and agricultural sectors in total provincial employment. The demographic structure is measured using population size, population density, and three population age variables. As there is an active debate regarding the influence of the natural environment, we also included two weather variables (temperature and rainfall).

23

As most tasks in the construction sector are performed outdoors, this sector might also be restarted before other sectors.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Battese G. A note on the estimation of Cobb‐Douglas production functions when some explanatory variables have zero values. J Agric Econ. 1997;48(1‐3):250–252. doi: 10.1111/j.1477-9552.1997.tb01149.x.
  2. Battese G, Coelli T. Frontier production functions, technical efficiency and panel data: with application to paddy farmers in India. J Product Anal. 1992;3:153–169. doi: 10.1007/BF00158774.
  3. Chinazzi M, Davis JT, Ajelli M, Gioannini C, Litvinova M, Merler S, Pastore y Piontti A, Mu K, Rossi L, Sun K, Viboud C, Xiong X, Yu H, Halloran ME, Longini IM, Vespignani A (2020) The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science 368(6489):395–400. doi: 10.1126/science.aba9757.
  4. Cho SW. Quantifying the impact of nonpharmaceutical interventions during the COVID-19 outbreak: the case of Sweden. Econom J. 2020;23(3):323–344. doi: 10.1093/ectj/utaa025.
  5. Chudik A, Pesaran MH, Rebucci A (2020) Voluntary and mandatory social distancing: evidence on Covid-19 exposure rates from Chinese provinces and selected countries. NBER Working Paper 27039. http://www.nber.org/papers/w27039.
  6. Dickson MM, Espa G, Giuliani D, Santi F, Savadori L. Assessing the effect of containment measures on the spatio-temporal dynamic of COVID-19 in Italy. Nonlinear Dynamics. 2020;101(3):1833–1846. doi: 10.1007/s11071-020-05853-7.
  7. Eliasson K, Lindgren U, Westerlund O. Geographical labour mobility: migration or commuting. Reg Stud. 2003;37(8):827–837. doi: 10.1080/0034340032000128749.
  8. Fang H, Wang L, Yang Y. Human mobility restrictions and the spread of the Novel Coronavirus (2019-nCoV) in China. J Public Econ. 2020;191:104272. doi: 10.1016/j.jpubeco.2020.104272.
  9. Flaxman S, Mishra S, Gandy A, Unwin HJT, Mellan TA, Coupland H, Whittaker C, Zhu H, Berah T, Eaton JW, Monod M. Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe. Nature. 2020;584(7820):257–261. doi: 10.1038/s41586-020-2405-7.
  10. Giuliani D, Dickson MM, Espa G, Santi F (2020) Modelling and predicting the spatio-temporal spread of Coronavirus disease 2019 (COVID-19) in Italy. Available at SSRN: https://ssrn.com/abstract=3559569. doi: 10.2139/ssrn.3559569.
  11. Greene W. Reconsidering heterogeneity in panel data estimators of the stochastic frontier model. J Econom. 2005;126(2):269–303. doi: 10.1016/j.jeconom.2004.05.003.
  12. Gross B, Zheng Z, Liu S, Chen X, Sela A, Li J, Li D, Havlin S (2020) Spatio-temporal propagation of COVID-19 pandemics. medRxiv preprint. doi: 10.1101/2020.03.23.20041517.
  13. Gutiérrez MJ, Inguanzo B, Orbe S. Distributional impact of COVID-19: regional inequalities in cases and deaths in Spain during the first wave. Appl Econ. 2021;53(31):3636–3657. doi: 10.1080/00036846.2021.1884838.
  14. Korolev I. Identification and estimation of the SEIRD epidemic model for COVID-19. J Econom. 2021;220(1):63–85. doi: 10.1016/j.jeconom.2020.07.038.
  15. Kumbhakar SC. Production frontiers, panel data, and time-varying technical inefficiency. J Econom. 1990;46:201–211. doi: 10.1016/0304-4076(90)90055-X.
  16. Kumbhakar SC, Orea L, Rodríguez-Álvarez A, Tsionas EG. Do we estimate an input or an output distance function? An application of the mixture approach to European railways. J Prod Anal. 2007;27(2):87–100. doi: 10.1007/s11123-006-0031-5.
  17. Lai H-P, Huang CJ. Maximum likelihood estimation of seemingly unrelated stochastic frontier regressions. J Prod Anal. 2013;40(1):1–14. doi: 10.1007/s11123-012-0289-8.
  18. Leung K, Wu JT, Liu D, Leung GM. First-wave COVID-19 transmissibility and severity in China outside Hubei after control measures, and second-wave scenario planning: a modelling impact assessment. Lancet. 2020;395:1382–1393. doi: 10.1016/S0140-6736(20)30746-7.
  19. Li R, Pei S, Chen B, Song Y, Zhang T, Yang W, Shaman J. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV2). Science. 2020;368(6490):489–493. doi: 10.1126/science.abb3221.
  20. Millimet DL, Parmeter CF. Accounting for skewed or one-sided measurement error in the dependent variable. Political Analysis. 2021;30(1):66–88. doi: 10.1017/pan.2020.45.
  21. Millimet DL, Parmeter CF. COVID-19 severity: a new approach to quantifying global cases and deaths. J R Stat Soc Series A. 2022;185(3):1178–1215. doi: 10.1111/rssa.12826.
  22. Moriarty L, Plucinski M, Marston B, et al. (2020) Public health responses to COVID-19 outbreaks on cruise ships - worldwide, February–March 2020. Morbidity and Mortality Weekly Report (MMWR) 69(Mar):347–352. doi: 10.15585/mmwr.mm6912e3.
  23. Orea L, Álvarez IC. A new stochastic frontier model with cross-sectional effects in both noise and inefficiency terms. J Econom. 2019;213(2):556–577. doi: 10.1016/j.jeconom.2019.07.004.
  24. Orea L, Álvarez I (2022) How effective has the Spanish lockdown been to battle COVID-19? A spatial analysis of the coronavirus propagation across provinces. Health Econ 31(1):154–173. doi: 10.1002/hec.4437.
  25. Orea L, Álvarez I, Wall A (2021) Estimating the propagation of the COVID-19 virus with a stochastic frontier approximation of epidemiological models: a panel data econometric model with an application to Spain. Efficiency Series Paper, 01/2021, Oviedo Efficiency Group, University of Oviedo. http://www.unioviedo.es/oeg/ESP/esp_2021_01.pdf.
  26. Saez M, Tobias A, Varga D, Barceló MA. Effectiveness of the measures to flatten the epidemic curve of COVID-19. The case of Spain. Sci Total Environ. 2020;727:138761. doi: 10.1016/j.scitotenv.2020.138761.
  27. Wang HJ, Ho CW. Estimating fixed-effect panel stochastic frontier models by model transformation. J Econom. 2010;157(2):286–296. doi: 10.1016/j.jeconom.2009.12.006.
  28. Wang H-J. A stochastic frontier analysis of financing constraints on investment: the case of financial liberalization in Taiwan. J Bus Econ Stat. 2003;21:406–419. doi: 10.1198/073500103288619016.

Articles from Journal of Productivity Analysis are provided here courtesy of Nature Publishing Group
