Scientific Reports. 2022 Apr 30;12:7077. doi: 10.1038/s41598-022-10185-0

Revisiting the standard for modeling the spread of infectious diseases

Michael Nikolaou
PMCID: PMC9056532  PMID: 35490159

Abstract

The COVID-19 epidemic brought to the forefront the value of mathematical modelling for infectious diseases as a guide to help manage a formidable challenge for human health. A standard dynamic model widely used for a spreading epidemic separates a population into compartments—each comprising individuals at a similar stage before, during, or after infection—and keeps track of the population fraction in each compartment over time, by balancing compartment loading, discharge, and accumulation rates. The standard model provides valuable insight into when an epidemic spreads or what fraction of a population will have been infected by the epidemic’s end. A subtle issue, however, with that model, is that it may misrepresent the peak of the infectious fraction of a population, the time to reach that peak, or the rate at which an epidemic spreads. This may compromise the model’s usability for tasks such as “Flattening the Curve” or other interventions for epidemic management. Here we develop an extension of the standard model’s structure, which retains the simplicity and insights of the standard model while avoiding the misrepresentation issues mentioned above. The proposed model relies on replacing a module of the standard model by a module resulting from Padé approximation in the Laplace domain. The Padé-approximation module would also be suitable for incorporation in the wide array of standard model variants used in epidemiology. This warrants a re-examination of the subject and could potentially impact model-based management of epidemics, development of software tools for practicing epidemiologists, and related educational resources.

Subject terms: Infectious diseases, Computational models

Introduction

The global epidemic of COVID-19 has brought to the forefront the importance of mathematical modelling in the development of strategies for managing the spread of infectious diseases [1–7]. Terms such as flattening the curve, $R_0$, or herd immunity, which entered public discourse [8], emerge from mathematical models that purport to provide useful predictions and thus to help guide effective management strategies [9]. A basic class of such models separates a population into various compartments (each comprising individuals at a similar stage before, during, or after infection) and keeps track of the population fraction in each compartment over time, by balancing loading, discharge, and accumulation rates. The archetype for this modelling approach is the celebrated SIR model structure [10–17], which splits a population into three compartments: susceptible (S) to the infection, infectious (I), and the rest (R), being immune or removed from the infectious compartment by recovery or death. The dynamics of how individuals move from S to I to R were developed almost a century ago in a mathematical modelling tour-de-force by Kermack and McKendrick [18], who derived a general, if elaborate, model structure in Eqs. (11)–(15) of their landmark paper. In the same publication (Eq. (29) ibid.) these authors also presented a well-characterized special case in the form of the following three simple ordinary differential equations (ODEs) comprising the widely used standard SIR model:

$$s'(t) = -\beta\, s(t)\, i(t) \tag{1}$$
$$i'(t) = \beta\, s(t)\, i(t) - \gamma\, i(t) \tag{2}$$
$$r'(t) = \gamma\, i(t) \tag{3}$$

where s,i,r are the susceptible, infectious, and removed fractions of a fixed-size population, respectively; β,γ are infectivity and discharge constants, respectively; and each of the Eqs. (1)–(3) can be derived from the remaining two using the compatibility condition

$$s(t) + i(t) + r(t) = 1 \tag{4}$$
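The standard SIR model of Eqs. (1)–(3) can be integrated numerically in a few lines. The following is a minimal sketch rather than the author's code: the fixed-step Runge-Kutta integrator, the function names, the initial infectious fraction `i0 = 1e-4`, and the choice $R_0 = 2$ are all illustrative assumptions; the value $1/\gamma = 8.4$ days is the one quoted later in the text.

```python
def sir_rhs(state, beta, gamma):
    """Right-hand side of Eqs. (1)-(3) for state = (s, i, r)."""
    s, i, r = state
    return (-beta * s * i, beta * s * i - gamma * i, gamma * i)


def rk4_step(f, state, dt, *params):
    """One classical fourth-order Runge-Kutta step for an autonomous ODE system."""
    k1 = f(state, *params)
    k2 = f(tuple(x + 0.5 * dt * k for x, k in zip(state, k1)), *params)
    k3 = f(tuple(x + 0.5 * dt * k for x, k in zip(state, k2)), *params)
    k4 = f(tuple(x + dt * k for x, k in zip(state, k3)), *params)
    return tuple(
        x + dt * (a + 2 * b + 2 * c + d) / 6
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_sir(r0=2.0, gamma=1 / 8.4, i0=1e-4, t_end=400.0, dt=0.05):
    """Integrate the SIR model; returns a list of (t, s, i, r) tuples (t in days)."""
    beta = r0 * gamma                      # R0 = beta / gamma
    state = (1.0 - i0, i0, 0.0)            # assumed start: no prior immunity
    t, path = 0.0, [(0.0, *state)]
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(sir_rhs, state, dt, beta, gamma)
        t += dt
        path.append((t, *state))
    return path


path = simulate_sir()
t_peak, _, i_peak, _ = max(path, key=lambda row: row[2])
print(f"peak infectious fraction {i_peak:.3f} at t = {t_peak:.1f} days")
print(f"final removed fraction   {path[-1][3]:.3f}")
```

Because the three right-hand sides sum to zero, the compatibility condition of Eq. (4) is preserved by the integrator up to rounding, which makes it a convenient sanity check.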

The great value of the SIR model is not merely that it can fit data (as already shown by Kermack and McKendrick in the same publication) but that it can also provide two deep and insightful conclusions about the dynamics governing the course of infectious disease epidemics. The first conclusion concerns the Threshold Theorem:

there exists a critical or threshold density of population. If the actual population density be equal to (or below) this threshold value the introduction of one (or more) infected person does not give rise to an epidemic, whereas if the population be only slightly more dense a small epidemic occurs (ibid., p. 701).

The second conclusion concerns the long-term behavior of s,i,r at the asymptotic end of an epidemic:

… the course of an epidemic is not necessarily terminated by the exhaustion of the susceptible members of the community. … the termination of an epidemic may result from a particular relation between the population density, and the infectivity, recovery, and death rates. (ibid., pp. 701, 702, and Eq. (20))

These conclusions are fairly robust, whether the general or the simplified version (Eqs. (1)–(3)) of the Kermack-McKendrick model is considered [15,18]. In fact, it immediately follows from stability analysis of Eqs. (1) and (2) that the threshold value for $s$ implied by the SIR model is

$$s_{\text{threshold}} = \frac{\gamma}{\beta} \overset{\text{def}}{=} \frac{1}{R_0} \tag{5}$$

where $R_0$ (introduced as such later [15,19,20]) is the basic reproductive ratio, widely considered “one of the most critical epidemiological parameters” [11,21]. It also follows [18] from Eqs. (1)–(3) that the total fraction of individuals infected throughout an epidemic, $r_\infty$, is the real solution of the transcendental algebraic equation

$$\ln\frac{1 - r_\infty}{s(0)} - \bigl(1 - r_\infty - s(0)\bigr) R_0 = 0 \tag{6}$$

as depicted in Fig. 1. That figure shows the rapid escalation of $r_\infty$ as $R_0$ rises above 1, given an initially susceptible population. (In fact, making time dimensionless as $\eta \overset{\text{def}}{=} \gamma t$ immediately transforms Eqs. (1) and (2) to $s'(\eta) = -R_0\, s(\eta)\, i(\eta)$, $i'(\eta) = \bigl(R_0\, s(\eta) - 1\bigr)\, i(\eta)$, whose only parameter is $R_0$.)
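The final-size relation can be solved numerically without special functions. Dividing Eq. (1) by Eq. (3) and integrating gives $\ln\bigl(s_\infty/s(0)\bigr) = R_0\bigl(s_\infty - s(0)\bigr)$ with $s_\infty = 1 - r_\infty$ and $i(0) \approx 0$. The sketch below finds the nontrivial root by bisection; the function name, the tolerance, and the bracketing interval are illustrative assumptions.

```python
import math


def final_size(r0, s0=1.0, tol=1e-12):
    """Total fraction infected, r_inf, from ln(s_inf/s0) = R0 (s_inf - s0)."""
    if r0 * s0 <= 1.0:
        # At or below the threshold of Eq. (5): no outbreak, s stays at s0.
        return 1.0 - s0

    def g(s_inf):
        return math.log(s_inf / s0) - r0 * (s_inf - s0)

    # g -> -infinity as s_inf -> 0+, and g(1/R0) > 0 whenever R0 * s0 > 1,
    # so the nontrivial root lies in (0, 1/R0).
    lo, hi = 1e-300, 1.0 / r0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 1.0 - 0.5 * (lo + hi)


for r0 in (0.9, 1.2, 1.5, 2.0, 3.0):
    print(f"R0 = {r0:.1f}  ->  r_inf = {final_size(r0):.4f}")
```

Evaluating this over a range of $R_0$ reproduces the qualitative shape described for the bottom panel of Fig. 1: nothing below the threshold, then a steep rise just above $R_0 = 1$.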

Figure 1.

Top: Qualitative trends of individual (left) and stacked (right) profiles for $s, i, r$ fractions of a population in a spreading epidemic, from initiation to termination. Bottom: Total fraction of a population infected by the end of an epidemic, $r_\infty = 1 - s_\infty$, as a function of the basic reproductive ratio $R_0 \overset{\text{def}}{=} \beta/\gamma$, according to Eq. (6).

The above two quantitative predictions by Eqs. (5) and (6) lend exceptional value to the SIR model, both conceptually and computationally. For instance, they can be used to assess herd immunity [11] for a population, corresponding to an estimated value of $R_0$ achieved by non-pharmaceutical or pharmaceutical interventions [22]. Or, conversely, for an epidemic that has run its course or is still developing, data can be used to gauge an overall or temporary value of $R_0$ [21].

However, as we will substantiate in the next section, there are another two important quantitative predictions of the standard SIR model that, we argue, can be problematic (see Fig. 1 for visualization):

  1. The peak value, $i^*$, of the infectious fraction, $i$, which may be misrepresented by as much as a factor of about 2, and

  2. the exponential growth rate of the infectious fraction, $i$, which may also be misrepresented by as much as a factor of about 2, with corresponding misrepresentation of the time to peak, $t^*$.

The above two shortcomings are not confined to the standard SIR model but, as we elaborate in the next section, are far more pervasive and reach a wide area in compartment-based epidemiology modeling spanned by SIR variants.

To start with, good prediction of both the infectious peak, $i^*$, and the time to that peak, $t^*$, is of paramount importance when considering management strategies for an epidemic. This is because $i^*$ and $t^*$ significantly affect the resources needed for care of infectious patients. To wit, calls for Flattening the Curve [22] during the COVID-19 epidemic aimed precisely at lowering $i^*$ and thus averting the overwhelming of medical care resources [23].

In addition, good estimates of $R_0$ from data of exponential growth during the spread of an epidemic are critical for assessing the situation and for designing effective interventions [9,21].

Furthermore, and more importantly, to the extent that predictions of the infectious peak and time-to-peak by the standard SIR model may be problematic, the problem is not confined to the I compartment of the standard SIR model. Rather, it may be endemic (no pun intended) in the numerous possible variants of compartment-based epidemiology models with loading and discharge terms. Such variants include a variety of compartments with corresponding arrangements and interactions (e.g., SEIR, SI, SIS, or similar [24]), multiple subpopulations (e.g., of different age and/or social contact structure [11,16,17,25]), spatial variation in addition to temporal (entailing partial differential equations [11]), and any combinations thereof, which collectively lead to diverse stratification patterns [26]. In the voluminous literature dealing with such models, the discharge rate from a compartment is virtually always represented by a term similar to the term $-\gamma i(t)$ of Eq. (2) [27]. In fact, this practice is so widespread in the entire literature of epidemiology [11,15,28–30] that it is selected, perhaps uncritically, even in advanced modeling efforts which employ sophisticated tools (e.g., automated algorithmic discovery [31]) in attempts to uncover more realistic expressions for infection dynamics. It is plausible, therefore, to claim that peak and time-to-peak predictions for a related compartment in any of these models may be as problematic as the corresponding predictions of the simple SIR model, with similarly adverse consequences.

Of course, for more accurate predictions, one could forego the simplifying assumptions leading to Eqs. (1)–(3) and their variants, in favor of the general time-varying integrodifferential equation patterns introduced by Kermack and McKendrick [12,29,32]. This, however, would significantly increase complexity of analysis and use [32], which partly explains the popularity and underscores the importance of simplified models such as SIR.

Consequently, a natural question arises: Given the aforementioned shortcomings of the standard SIR model, is there a mathematical model of comparable simplicity to Eqs. (1)–(3) that retains the two sound conclusions about the epidemic threshold (Eq. (5)) and long-term epidemic course (Eq. (6)) while avoiding the two issues mentioned above, namely misrepresentation of the infectious peak, $i^*$, and time to that peak, $t^*$?

Here, we constructively answer this question in the affirmative. Using a combination of Laplace transforms and Padé approximations to describe compartment discharge dynamics, we develop in the “The Padé SIR model structure” section a model structure (the Padé SIR model, Eqs. (12) and (13)) of similar simplicity to SIR. The Padé SIR structure produces exactly the same threshold and long-term values (Eqs. (5) and (6)) as Eqs. (1)–(3), while predicting a more realistic infectious peak and time to peak for a wide range of practically significant cases. More importantly, because the proposed structure relies on replacement of the discharge term $-\gamma i(t)$ in the I module of the SIR Eq. (2) without increasing complexity, it can be used widely in the large array of compartment-based epidemiology models to realistically represent the dynamics of compartment discharge. This immediately prompts a re-evaluation and possible revision of the wide literature on compartment-based modeling in epidemiology inspired by the SIR model. It is emphasized that this re-evaluation is not motivated by a mere intent for higher accuracy; rather, the aim is to offer higher utility, in the spirit of George Box’s dictum “all models are wrong, but some are useful” [33].

In the rest of the paper, we first present the significant merits and subtle issues of the SIR structure. Subsequently, we offer a remedy to these shortcomings, in the form of a new class of SIR variants (the Padé SIR model structure) whose properties and implications we explore for epidemiology modeling and epidemic management. Discussion and extensions follow, pointing to the usability of the proposed modeling approach and its applicability to the wide class of compartment-based epidemiology models.

Methods

To provide context and intuition for the developments that follow, we will rely on the basic schematic of Fig. 2. Directly inspired by the original Kermack-McKendrick ideas, Fig. 2 shows how the stacked fractions $s, i, r$ of a fixed-size population change during an epidemic, as individuals move from compartment S to I to R over time. Infectious individuals in the I compartment are discharged (to enter the R compartment) at times $T \ge 0$ after becoming infectious, where the infectious period $T$ follows a cumulative distribution and corresponding density [18,29] defined in the standard way as $F(\theta) \overset{\text{def}}{=} P(T \le \theta)$ and $f(\theta) = F'(\theta)$.

Figure 2.

Schematic of time-varying susceptible ($s$, green), infectious ($i$, orange), and removed ($r$, blue) fractions of a fixed-size population after an initial infection, $i_0$, at discretized time $t_0$. Each new part of the infectious fraction $i$ (thick-framed orange rectangles) moves to the removed fraction, $r$ (thick-framed blue rectangles), piecewise in a number of time steps following a certain distribution. The population eventually reaches a steady state at $s_\infty$, $r_\infty = 1 - s_\infty$, and $i_\infty = 0$. The pattern analogy with the stacked chart in Fig. 1 is evident.

Simple balances around the boxed areas in Fig. 2 for a time-invariant cumulative distribution $F(\theta)$ (APPENDIX A) yield the equation

$$r(t) = r(0) + \int_0^t \bigl[1 - r(0) - s(t - \theta)\bigr]\, \underbrace{F'(\theta)}_{f(\theta)}\, d\theta \tag{7}$$

which, combined with the infectivity Eq. (1) and the consistency Eq. (4), forms a general representation of the SIR system dynamics [29,32].

Selecting $F$ in Eq. (7) to be the cumulative distribution function of the exponential distribution, $F(\theta) = 1 - \exp(-\gamma\theta) \overset{\text{def}}{=} F(\gamma\theta)$, one immediately gets the SIR model, Eqs. (1)–(3) (APPENDIX A). For that model, the parameter $\gamma$ is the inverse of

the average infectious period … estimated relatively precisely from epidemiological data [11].

As will be detailed in what follows, it is at this point where issues with peak and time-to-peak misrepresentations by the SIR model may originate:

Heeding the above suggestion to use epidemiological data for direct estimation of the average, $1/\gamma$, of the infectious period, $T$, is indeed sensible (in fact, necessary for a reasonable estimate). However, the associated distribution of $T$ is typically far from exponential (because an exponential distribution would suggest, inter alia, that infectious individuals are most likely to leave the I compartment immediately after entering it, an untenable assumption). Rather, $T$ follows distributions with peak not near zero [34], as shown in Fig. 3 by the curves indexed by $n > 1$. The exact shape of these curves is not important; rather, these curves serve as examples of distributions $f(\gamma\theta)$ with peak not near 0.

Figure 3.

Sample cumulative distribution functions $F(\gamma\theta)$ (left) and corresponding probability density functions $f(\gamma\theta) = F'(\gamma\theta)$ (right) for discharge time, $T$, from the I compartment of a population. Curves follow the formulas $F(\gamma\theta) = 1 - \frac{\Gamma(n,\, n\gamma\theta)}{\Gamma(n,\, 0)}$ and $f(\gamma\theta) = F'(\gamma\theta)$ (see APPENDIX B). The exponential distribution with $F(\gamma\theta) = 1 - e^{-\gamma\theta}$ (left) and $f(\gamma\theta) = e^{-\gamma\theta}$ (right) corresponds to $n = 1$, whereas the impulse distribution with $F(\gamma\theta) = H(\gamma\theta - 1)$ (unit step, left) and $f(\gamma\theta) = \delta(\gamma\theta - 1)$ (unit impulse, right) corresponds to $n = \infty$. All distributions are shown in terms of the dimensionless variable $\gamma\theta$ and have the same average equal to 1.
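For integer $n$, the incomplete-gamma ratio in the caption above reduces to a finite sum, $\Gamma(n, z)/\Gamma(n, 0) = e^{-z}\sum_{k=0}^{n-1} z^k/k!$, so the family of curves can be evaluated with the standard library alone. The sketch below assumes integer $n$ and uses a hypothetical function name; it is meant only to illustrate how the curves move from the exponential case ($n = 1$) toward a unit step at $\gamma\theta = 1$ as $n$ grows.

```python
import math


def erlang_cdf(n, x):
    """F(x) = 1 - Gamma(n, n x) / Gamma(n, 0) for integer n >= 1, x = gamma * theta."""
    if x <= 0.0:
        return 0.0
    term = math.exp(-n * x)          # k = 0 term of exp(-n x) * (n x)^k / k!
    total = term
    for k in range(1, n):
        term *= n * x / k            # build the k-th term from the (k-1)-th
        total += term
    return 1.0 - total


for n in (1, 2, 4, 16, 64):
    row = ", ".join(f"{erlang_cdf(n, x):.3f}" for x in (0.5, 1.0, 1.5))
    print(f"n = {n:3d}: F(0.5), F(1.0), F(1.5) = {row}")
```

As $n$ increases, $F(0.5)$ falls toward 0 and $F(1.5)$ rises toward 1, while the mean stays at 1 for every $n$.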

The assumption of exponential distribution for the infectious period “has appeared in many epidemic models but has seldom been questioned” [27], yet it would be conveniently acceptable if it did not lead to inadvertent outcomes. Unfortunately it does, in the following subtle yet important way: While the same threshold and long-term values (Eqs. (5) and (6), respectively) would result from Eqs. (1), (7), and (4) for practically any reasonable distribution of $T$ with the same average, $D \overset{\text{def}}{=} 1/\gamma$ (an insight already provided by Kermack and McKendrick [18]), the estimated infectious peak and time to peak would be significantly affected by the kind of distribution considered, as demonstrated in Fig. 4. This figure shows the profiles of the infectious fraction, $i(t)$, for the infectious period distributions in Fig. 3, with time in both dimensional and dimensionless form. The latter is in terms of dimensionless time $t/D$, because this simple transformation makes the dynamics of all considered models dependent on $R_0$ alone and allows for meaningful comparisons without loss of generality. The dimensional time is in days, to provide some context for epidemics such as COVID-19 with related values [34–36]

$$\frac{1}{\gamma} = 8.4\ \text{days}, \qquad \beta = \gamma R_0\ \text{days}^{-1} \tag{8}$$

Figure 4.

Response of the infectious fraction, $i(t)$, according to the model of Eqs. (1), (4), and (7), for distributions shown in Fig. 3. Note that the distribution for $n = 4$ is closer to the exponential distribution ($n = 1$) than to the impulse distribution ($n = \infty$) in Fig. 3, yet the response of $i(t)$ for $n = 4$ is much closer to the response for $n = \infty$ than to that for $n = 1$.

What is remarkable in Fig. 4 is that while different distributions of $T$ sharing the same average, $D \overset{\text{def}}{=} 1/\gamma$, expectedly yield different profiles of $i(t)$ [27], these profiles quickly approach the profile corresponding to the unit-impulse distribution shown in Fig. 3. For that distribution of $T$, it immediately follows from Eqs. (1), (7), and (4) (APPENDIX A) that the resulting dynamic model, which we will term dSIR, comprises the delay differential equation (DDE)

$$s'(t) = \beta\, s(t)\, \bigl[s(t) - s(t - D)\bigr] \tag{9}$$

and the delay algebraic equations

$$i(t) = s(t - D) - s(t) \tag{10}$$
$$r(t) = 1 - s(t - D) \tag{11}$$

with $D \overset{\text{def}}{=} 1/\gamma$. Therefore, the dSIR model of Eqs. (9)–(11) constitutes a more realistic representation of spreading epidemic dynamics than the standard SIR model.
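The dSIR equations can be integrated with a fixed step that divides the delay $D$ evenly, so that the delayed value $s(t - D)$ is always available on the grid. The sketch below uses Heun's method and is not the author's implementation. It assumes a pre-history $s(t) = 1$ for $t < 0$ with a jump to $1 - i_0$ at $t = 0$ (so that Eq. (10) gives $i(0) = i_0$), and the values $R_0 = 2$, $i_0 = 10^{-4}$, and the step count per delay are illustrative choices.

```python
def simulate_dsir(r0=2.0, D=8.4, i0=1e-4, t_end=400.0, steps_per_delay=200):
    """Integrate Eqs. (9)-(11) with Heun's method; returns lists (t, s, i, r) for t >= 0."""
    n = steps_per_delay
    dt = D / n
    beta = r0 / D                                   # R0 = beta * D

    def rhs(s_now, s_delayed):
        return beta * s_now * (s_now - s_delayed)   # Eq. (9)

    # hist[k] stores s at time (k - n) * dt.
    # Assumed pre-history: s = 1 for t < 0, then s(0) = 1 - i0.
    hist = [1.0] * n + [1.0 - i0]
    for step in range(int(round(t_end / dt))):
        k = step + n                                # index of the current time
        s_now, s_del, s_del_next = hist[k], hist[k - n], hist[k - n + 1]
        slope = rhs(s_now, s_del)
        predictor = s_now + dt * slope
        hist.append(s_now + 0.5 * dt * (slope + rhs(predictor, s_del_next)))

    s = hist[n:]
    t = [j * dt for j in range(len(s))]
    i = [hist[j] - hist[j + n] for j in range(len(s))]   # Eq. (10)
    r = [1.0 - hist[j] for j in range(len(s))]           # Eq. (11)
    return t, s, i, r


t, s, i, r = simulate_dsir()
i_peak = max(i)
print(f"dSIR peak infectious fraction {i_peak:.3f} at t = {t[i.index(i_peak)]:.1f} days")
print(f"dSIR final removed fraction   {r[-1]:.3f}")
```

For the same $R_0$ and mean infectious period, the peak obtained this way is markedly higher and earlier than that of the standard SIR model, while the final removed fraction is the same.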

Delay differential equations (DDEs) such as the above have been a classic subject of study in biology [37,38]. DDEs are generally perceived as more difficult to analyze than ODEs [32, p. 5], perhaps because of infinite spectra (for linear DDEs) or discontinuities in the derivatives of DDE solutions, although the corresponding theory for DDEs such as the above “does not present substantial additional difficulties” compared to ODEs [39, p. 6]. Nevertheless, even though Eqs. (9)–(11) have long been known [29], they are typically bypassed in favor of their ODE counterparts, Eqs. (1)–(3), along with their misrepresentations of the infectious peak and time to peak already discussed.

To address this issue, in the next section we derive novel approximations of the dSIR Eqs. (9)–(11) in the form of the Padé SIR ODEs, which have a number of advantages: While the Padé SIR model structure is as simple as that of the standard SIR model, Eqs. (1)–(3), and produces the same threshold and long-term values captured by Eqs. (5) and (6), it produces more realistic representations for the infectious peak and time to peak than the standard SIR ODEs. As such, the Padé SIR model structure not only creates an alternative to the standard SIR model but also provides a general module that can be immediately incorporated in the wide variety of compartment-based models used in epidemiology.

Main results

The Padé SIR model structure

Combining Laplace transforms with first-order Padé approximation (a popular approach for approximating transcendental transfer functions by polynomial rational fractions in automatic control [40,41]), one can show (APPENDIX C) that Eqs. (9)–(11) of the dSIR model can be approximated by the first-order Padé SIR model, comprising Eqs. (1), (4), and the novel ODE

$$i'(t) = \frac{2}{D}\Bigl(\underbrace{D\beta}_{R_0}\, s(t) - 1\Bigr)\, i(t) \tag{12}$$

where D is the value of the infectious period, T. Note that the only difference between the above Eq. (12) and its standard SIR counterpart, Eq. (2), is simply the factor 2. Yet this difference has significant implications, to be highlighted shortly.
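The origin of the factor 2 can be seen in a short heuristic derivation. This is only a sketch of the idea, not a substitute for APPENDIX C: initial-condition terms are ignored, and the Laplace variable is written as $\sigma$ (an assumed notation, chosen to avoid a clash with the susceptible fraction $s$).

```latex
\begin{aligned}
I(\sigma) &= \left(e^{-D\sigma} - 1\right) S(\sigma)
  && \text{(Laplace transform of } i(t) = s(t-D) - s(t)\text{)} \\
e^{-D\sigma} &\approx \frac{1 - \tfrac{D}{2}\sigma}{1 + \tfrac{D}{2}\sigma}
  \;\Longrightarrow\;
  e^{-D\sigma} - 1 \approx \frac{-D\sigma}{1 + \tfrac{D}{2}\sigma}
  && \text{(first-order Pad\'e approximant)} \\
\left(1 + \tfrac{D}{2}\sigma\right) I(\sigma) &\approx -D\sigma\, S(\sigma)
  \;\Longrightarrow\;
  i(t) + \tfrac{D}{2}\, i'(t) \approx -D\, s'(t) = D\beta\, s(t)\, i(t)
  && \text{(time domain, then } s' = -\beta s i\text{)} \\
i'(t) &\approx \tfrac{2}{D}\left(D\beta\, s(t) - 1\right) i(t)
  && \text{(first-order Pad\'e SIR)}
\end{aligned}
```

Repeating the same steps with the second-order approximant $e^{-x} \approx \frac{1 - x/2 + x^2/12}{1 + x/2 + x^2/12}$ gives $i + \tfrac{D}{2} i' + \tfrac{D^2}{12} i'' \approx D\beta\, s\, i$, a second-order ODE in $i(t)$.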

For better approximation of Eqs. (9)–(11) one can use a second-order Padé approximation to obtain (APPENDIX C) the second-order Padé SIR model, which comprises Eqs. (1), (4), and the second-order ODE

$$i''(t) = \frac{12}{D^2}\Bigl[\underbrace{D\beta}_{R_0}\, s(t)\, i(t) - i(t) - \frac{D}{2}\, i'(t)\Bigr] \tag{13}$$

in place of the SIR model’s Eq. (2).
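A side-by-side integration makes the effect of the two Padé ODEs concrete. The sketch below integrates the standard SIR model and both Padé SIR models with a fixed-step Runge-Kutta scheme and reports the infectious peak and the final removed fraction. The helper names, $R_0 = 2$, $i_0 = 10^{-4}$, and in particular the initial slope $i'(0) = 0$ for the second-order model are illustrative assumptions, not values taken from the paper.

```python
def rk4(f, y, dt):
    """One classical Runge-Kutta step for y' = f(y)."""
    k1 = f(y)
    k2 = f([a + 0.5 * dt * b for a, b in zip(y, k1)])
    k3 = f([a + 0.5 * dt * b for a, b in zip(y, k2)])
    k4 = f([a + dt * b for a, b in zip(y, k3)])
    return [a + dt * (p + 2 * q + 2 * u + v) / 6
            for a, p, q, u, v in zip(y, k1, k2, k3, k4)]


def simulate(model, r0=2.0, D=8.4, i0=1e-4, t_end=400.0, dt=0.02):
    """Return (peak of i, final r) for model in {"sir", "pade1", "pade2"}."""
    beta = r0 / D

    def sir(y):                       # Eqs. (1), (2) with gamma = 1/D
        s, i = y
        return [-beta * s * i, (r0 * s - 1.0) * i / D]

    def pade1(y):                     # Eqs. (1), (12)
        s, i = y
        return [-beta * s * i, 2.0 * (r0 * s - 1.0) * i / D]

    def pade2(y):                     # Eqs. (1), (13); v stands for i'(t)
        s, i, v = y
        return [-beta * s * i, v,
                12.0 / D**2 * (r0 * s * i - i - 0.5 * D * v)]

    rhs = {"sir": sir, "pade1": pade1, "pade2": pade2}[model]
    # Assumption: i'(0) = 0 for the second-order model.
    y = [1.0 - i0, i0] + ([0.0] if model == "pade2" else [])
    peak = i0
    for _ in range(int(round(t_end / dt))):
        y = rk4(rhs, y, dt)
        peak = max(peak, y[1])
    return peak, 1.0 - y[0] - y[1]    # r from Eq. (4)


results = {m: simulate(m) for m in ("sir", "pade1", "pade2")}
for name, (peak, r_final) in results.items():
    print(f"{name:6s} peak i = {peak:.3f}, final r = {r_final:.3f}")
```

With these settings the first-order Padé peak is about twice the SIR peak, while all three models end at essentially the same removed fraction.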

Why Padé SIR?

A basic merit of the Padé SIR model is illustrated in Fig. 5, which shows that the profiles of i(t) obtained by numerically integrating the (first- or second-order) Padé SIR models are close to that produced by the dSIR model for a range of values of R0, but far from the corresponding profile produced by the standard SIR model.

Figure 5.

Comparison of the infectious fraction profiles, $i(t)$, resulting from the dSIR, Padé SIR (first- and second-order), and SIR models for different values of $R_0$. Note that for relatively moderate values of $R_0$ both Padé SIR models approximate the dSIR model well, whereas for large values of $R_0$ the Padé-2 SIR remains a usable approximator while the Padé-1 SIR model approaches its usefulness limits.

Note that the approximation in Fig. 5 depends on $R_0 \overset{\text{def}}{=} \beta D$ and deteriorates as $R_0$ takes values farther away from 1, as expected from the properties of Padé approximants. In fact, the first-order Padé SIR model should be used with caution for $R_0 > 2$, because it would yield negative early values of $r(t)$, as can be immediately deduced by linear analysis of the corresponding third ODE, $r'(t) = -s'(t) - i'(t) = -\frac{R_0}{D}\, s(t)\, i(t) + \frac{2}{D}\, i(t)$, which implies $\bigl(r(t) - \bar{r}\bigr)' \approx \frac{2 - R_0}{D}\, i(t)$; and the same model, for larger $R_0 \gg 2$, would produce peak values of $i(t) > 1$, which is clearly meaningless. However, as shown in Fig. 5, the predictions of $i$ by the first-order Padé SIR remain remarkably close to those of the dSIR model, even for fairly large $R_0$ well above 2. This behavior of the approximation accentuates the value of the Padé SIR model, as values of $R_0$ close to (or lower than) 1 would be far more desirable than values well above 1 (Fig. 1). Of course, one could easily extend the Padé SIR model to yield $r(t)$ values in the interval $[0, 1]$ through the simple modification $r'(t) = \max\left\{-\frac{R_0}{D}\, s(t)\, i(t) + \frac{2}{D}\, i(t),\ 0\right\}$, as indicated for $R_0 = 5, 6$ in Fig. 5.

Figure 5 also shows profiles of $i(t)$ by the second-order Padé SIR model, and indicates that Padé approximations of third or higher order could be used in a similar way, but the point of diminishing returns would be quickly reached, as model complexity would increase a lot more quickly than quality of approximation.

Before discussing the important consequences implied by the Padé SIR model, relevant properties of that model are briefly summarized next, to better support the consequences established thereafter.

Comparative summary of important properties of the Padé SIR models

The models considered can be analyzed using standard ODE or DDE theory, as already mentioned. Therefore, only aspects that bear insight or novelty will be discussed and corresponding comparisons will be made.

Instability at equilibrium and epidemic outbreak

It can be shown (APPENDIX D) that an equilibrium point, $\{s = \bar{s},\ i = 0,\ r = 1 - \bar{s}\}$, of the dSIR or of the Padé SIR model is stable, and an epidemic outbreak does not occur, iff $\bar{s}$ is below the threshold in Eq. (5). This result is in fact anticipated by the original Kermack and McKendrick analysis.

Final values of s,i,r

It can be shown (APPENDIX E) that at the end of an epidemic that started at $s(0) \approx \bar{s}$, $i(0) \approx 0$, and $r(0) \approx 1 - \bar{s}$, the total fraction of infected throughout the epidemic is

$$r_\infty = 1 + \frac{W\bigl(-R_0\, s(0)\, \exp(-R_0\, s(0))\bigr)}{R_0} \tag{14}$$

for all four models, where $R_0 \overset{\text{def}}{=} \beta D = \beta/\gamma$ and $W$ is the Lambert function [42], whose importance in epidemiology modeling appears to have been recognized only recently [43] (note that $\bar{s}\beta D \overset{\text{def}}{=} \bar{s}R_0 > 1$ is required for the epidemic to spread). Equation (14) is the analytical solution of Eq. (6) and is precisely what is depicted in the graph of Fig. 1 for $s(0) \approx 1$.

Exponential rate of epidemic spread

For the early part of a spreading epidemic, it can be shown (APPENDIX F) that the infectious fraction, i(t), follows the approximately exponential growth

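$$i(t) \approx i(0)\, \exp\Bigl(\underbrace{2\,(R_0\bar{s} - 1)}_{p_{0,\text{Padé-1 SIR}}}\ \underbrace{t/D}_{\eta}\Bigr) \tag{15}$$
$$i(t) \approx i(0)\, \exp\Bigl(\underbrace{\bigl(-3 + \sqrt{12 R_0\bar{s} - 3}\,\bigr)}_{p_{0,\text{Padé-2 SIR}}}\ \underbrace{t/D}_{\eta}\Bigr) \tag{16}$$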

according to the two Padé SIR models, or

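$$i(t) \approx i(0)\, \Bigl[a + b\, \exp\Bigl(\underbrace{\bigl(\bar{s}R_0 + W(-\bar{s}R_0\, e^{-\bar{s}R_0})\bigr)}_{p_{0,\text{dSIR}}}\ \underbrace{t/D}_{\eta}\Bigr)\Bigr] \tag{17}$$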

according to the dSIR model, where the constants $a, b$ in Eq. (17) are in terms of $R_0 \overset{\text{def}}{=} \beta D$ (APPENDIX F). By comparison, the early growth of $i(t)$ according to the standard SIR model in Eqs. (1)–(3) is

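$$i(t) \approx i(0)\, \exp\Bigl(\underbrace{(R_0\bar{s} - 1)}_{p_{0,\text{SIR}}}\ \underbrace{\gamma t}_{\eta}\Bigr) \tag{18}$$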

where $R_0 \overset{\text{def}}{=} \beta/\gamma$. Note that in all four Eqs. (15)–(18) the rates $p_0$ depend on $R_0\bar{s}$ alone, as is anticipated by the corresponding dSIR, Padé SIR, and SIR models, for which introduction of the dimensionless time $\eta = \gamma t \overset{\text{def}}{=} t/D$ leaves $R_0$ as the only remaining parameter in the corresponding equations. Therefore, the above exponential rates $p_0$ are shown as functions of $\bar{s}R_0$ in Fig. 6. Note that $\bar{s} = 1$ for an epidemic without prior immunity in the population.
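The four dimensionless growth rates can be tabulated directly as functions of $x = \bar{s}R_0$. In the sketch below the dSIR rate is obtained by bisection on the characteristic equation $p = x\,(1 - e^{-p})$ of the linearized delay equation, which is equivalent to the Lambert-function expression $x + W(-x e^{-x})$ and avoids the need for a Lambert $W$ routine. Function names and the sample values of $x$ are illustrative assumptions.

```python
import math


def p0_sir(x):
    return x - 1.0                               # standard SIR


def p0_pade1(x):
    return 2.0 * (x - 1.0)                       # first-order Pade SIR


def p0_pade2(x):
    return -3.0 + math.sqrt(12.0 * x - 3.0)      # second-order Pade SIR


def p0_dsir(x, tol=1e-12):
    """Positive root of p = x (1 - exp(-p)); returns 0 when x <= 1 (no growth)."""
    if x <= 1.0:
        return 0.0

    def h(p):
        return p + x * math.expm1(-p)            # p - x (1 - e^{-p})

    # h has its minimum (negative) at p = ln x and h(x) = x e^{-x} > 0.
    lo, hi = math.log(x), x
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print("  x    SIR   Pade-1  Pade-2   dSIR   td/D (dSIR)")
for x in (1.1, 1.5, 2.0, 3.0):
    p = p0_dsir(x)
    print(f"{x:4.1f} {p0_sir(x):6.3f} {p0_pade1(x):7.3f} {p0_pade2(x):7.3f} "
          f"{p:7.3f} {math.log(2.0) / p:8.3f}")
```

Near $x = 1$ the dSIR and Padé rates are close to each other and about twice the SIR rate; for larger $x$ the first-order Padé rate drifts away from the dSIR rate while the second-order one stays closer.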

Figure 6.

Dimensionless exponential rate, $p_0$, or doubling period, $t_d/D$ (with respect to dimensionless time, $\eta = \gamma t \overset{\text{def}}{=} t/D$) for the dSIR, Padé SIR, and SIR models, by Eqs. (17), (15), (16), and (18), respectively. The first three rates approach each other as $\bar{s}R_0$ approaches 1, whereas the SIR rate remains about half of the others. The dashed portion of the Padé-1 SIR line is included only to indicate the trend, as the corresponding model would not be used in that range. Recall that $p_0 = (\ln 2)\, D/t_d$ and that $\bar{s} = 1$ for an epidemic with no prior immunity.

It is evident in Fig. 6 that the rates (or doubling periods) corresponding to the Padé SIR and dSIR models differ from their SIR counterparts by a factor of about 2, for $\bar{s}R_0$ not much higher than 1. This agrees well with the more rapid early rise of $i(t)$ from numerical integration of the dSIR and Padé SIR models compared to that of the SIR model, as shown in Fig. 5. Note again that despite these rate differences shown in Fig. 6, all four models considered eventually reach the same steady-state values, as captured by Eq. (14).

The importance of these discrepancies for estimation of $R_0$ from early epidemic data will be made clear shortly.

Peak of infectious fraction

While an analytical solution for $i^*$ according to the dSIR model is not obvious to the author, a good approximation can be easily obtained (APPENDIX G) through the first-order Padé SIR model, following the same approach taken for the SIR model, to get

$$i^*_{\text{Padé-1 SIR}} = 2\left[s(0) - \frac{\ln\bigl(R_0\, s(0)\bigr)}{R_0} - \frac{1}{R_0}\right] \tag{19}$$

The above $i^*$, for the same $R_0$, is exactly double the $i^*$ of the standard SIR model, which is known to be

$$i^*_{\text{SIR}} = s(0) - \frac{\ln\bigl(R_0\, s(0)\bigr)}{R_0} - \frac{1}{R_0} \tag{20}$$

(APPENDIX G). This discrepancy accounts for the differences observed in Fig. 5 between the $i^*$ produced by the SIR and by the other three models considered. Obviously, this approximation breaks down for values of $R_0$ that yield $i^*_{\text{Padé-1 SIR}} > 1$, a situation that would be expected for large values of $R_0$, as illustrated in the last plot ($R_0 = 6$) of Fig. 5.
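The doubling of the peak can be checked with the usual phase-plane argument. The following is a sketch under the assumption $i(0) \approx 0$, using the first-order Padé ODE $i' = \tfrac{2}{D}(R_0 s - 1)\, i$ together with $s' = -\beta s i$ and $R_0 = \beta D$; the detailed derivation is the one referenced in APPENDIX G.

```latex
\begin{aligned}
\frac{di}{ds} &= \frac{\tfrac{2}{D}\,(R_0 s - 1)\, i}{-\beta\, s\, i}
              = -2 + \frac{2}{R_0\, s}
  && \text{(divide the two ODEs)} \\
i(s) - i(0) &= -2\,\bigl(s - s(0)\bigr) + \frac{2}{R_0}\,\ln\frac{s}{s(0)}
  && \text{(integrate from } s(0) \text{ to } s\text{)} \\
i'(t) = 0 \;&\Longleftrightarrow\; s = \frac{1}{R_0}
  && \text{(location of the peak)} \\
i^* &\approx 2\left[s(0) - \frac{1}{R_0} - \frac{\ln\bigl(R_0\, s(0)\bigr)}{R_0}\right]
  && \text{(substitute } s = 1/R_0,\ i(0) \approx 0\text{)}
\end{aligned}
```

Carrying out the same steps for the standard SIR model, where $di/ds = -1 + 1/(R_0 s)$, gives exactly half of this value.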

The discrepancy between the SIR and Padé SIR models also manifests itself in using them for model-based predictions that depend on parameter estimates driven by epidemiological data, as discussed in the next section.

Discussion and extensions

Model-based predictions from fitting epidemiological data

An immediate and important discrepancy for the models discussed is in the estimation of $R_0$ from epidemiological data on daily new cases during exponential growth, i.e. from $i(t)$ or $i'(t)$, and from the average infectious period, $D \overset{\text{def}}{=} 1/\gamma$. Figure 6 captures the relationship between the exponential growth rate $p_0$ and the corresponding $\bar{s}R_0$. Therefore, for $s(t) \approx \bar{s} = 1$, it is standard to use a simple log-plot of daily new cases vs. time to estimate the slope $p_0 = (\ln 2)\, D/t_d$ of $\exp(p_0 t/D)$ (where $t_d$ is the doubling period) and from that the resulting $R_0$. Following this procedure for $\bar{s} = 1$ (no prior immunity), the two Padé SIR models, Eqs. (15) and (16), yield the novel $R_0$ estimates

$$R_{0,\text{Padé-1 SIR}} = \frac{p_0}{2} + 1 \tag{21}$$
$$R_{0,\text{Padé-2 SIR}} = \frac{(p_0 + 3)^2 + 3}{12} \tag{22}$$

the dSIR model yields

$$R_{0,\text{dSIR}} = \frac{p_0}{1 - \exp(-p_0)} \tag{23}$$

whereas the standard SIR model, Eq. (2), yields the well-known estimate [9,21,44]

$$R_{0,\text{SIR}} = p_0 + 1 \tag{24}$$

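The above Eqs. (21)–(24) can be visualized in Fig. 6 with $p_0$ considered the independent variable. Note that $R_{0,\text{Padé-1 SIR}} - 1 = \frac{R_{0,\text{SIR}} - 1}{2}$ and $R_{0,\text{Padé-2 SIR}} - 1 = \frac{R_{0,\text{SIR}} - 1}{2}\left(1 + \frac{R_{0,\text{SIR}} - 1}{6}\right)$, and that $R_{0,\text{dSIR}} \approx R_{0,\text{Padé-1 SIR}}$ for small $p_0$.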

The important message of Fig. 6 is that systematic error may arise in the estimation of $R_0$ when using the standard SIR model. For example, taking $D = 8.4$ days (Eq. (8) for COVID-19) and $t_d = 2.3$ days (corresponding to early COVID-19 spread in the US [45]) yields $R_{0,\text{dSIR}} \approx R_{0,\text{Padé-2 SIR}} = 3.2$ vs. $R_{0,\text{SIR}} = 4$ for $t_d/D = 0.23$ in Fig. 6. As $t_d$ increases, the discrepancy between $R_{0,\text{dSIR}}$ or $R_{0,\text{Padé-2 SIR}}$ on one hand and $R_{0,\text{SIR}}$ on the other becomes more pronounced.
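The four $R_0$ estimates are simple enough to evaluate directly from a doubling period. The sketch below takes the ratio $t_d/D = 0.23$ quoted in the example above as its input; the function name and dictionary keys are illustrative assumptions.

```python
import math


def r0_estimates(td_over_D):
    """R0 implied by a doubling period t_d (in units of D) under each model, for s_bar = 1."""
    p0 = math.log(2.0) / td_over_D                       # p0 = ln(2) * D / t_d
    return {
        "SIR": p0 + 1.0,                                 # Eq. (24)
        "Pade-1 SIR": p0 / 2.0 + 1.0,                    # Eq. (21)
        "Pade-2 SIR": ((p0 + 3.0) ** 2 + 3.0) / 12.0,    # Eq. (22)
        "dSIR": p0 / (1.0 - math.exp(-p0)),              # Eq. (23)
    }


estimates = r0_estimates(0.23)
for model, value in estimates.items():
    print(f"{model:11s} R0 = {value:.2f}")
```

For this input the dSIR and second-order Padé estimates both round to 3.2, against 4.0 for the standard SIR model, in line with the values quoted in the text.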

Systematic errors in estimates of $R_0$ have important implications. For example, the conceptual anticipation of total infected through the pandemic, as shown in Fig. 1, following Eq. (14), is going to be significantly affected. In addition, the infectious peak is also going to be affected in a non-trivial way, as shown in Fig. 7. In that figure, profiles of $i^*$ are plotted as functions of the exponential growth rate, $p_0$, through the following procedure: Given $p_0$, the corresponding values of $R_0$ are computed according to the dSIR, Padé-1 SIR, Padé-2 SIR, and SIR models (Eqs. (21)–(24)) and, subsequently, values of $i^*$ are computed using Eq. (19) (Padé-1 SIR model) for the first three $R_0$ values and Eq. (20) for the fourth value of $R_0$. For calibration, the dots in Fig. 7 represent calculation of $i^*$ through direct numerical integration of the dSIR Eqs. (9)–(11) for values of $R_0$ computed using Eq. (23). There is remarkable closeness of $i^*$ values produced by the Padé SIR models to the ideal values produced by the dSIR model, contrasted to the distance of $i^*$ values produced by the standard SIR model.

Figure 7.

Predicted maximum infectious fraction, $i^*$, based on the exponential rate, $p_0$, of an epidemic spread. The values of $i^*$ are calculated by (a) the analytical expression of the Padé-1 SIR model fed with estimates of $R_0$ from $p_0$ according to the dSIR, Padé-1 SIR, and Padé-2 SIR models, (b) the analytical expression of the standard SIR model fed with an estimate of $R_0$ from $p_0$ according to the same model, and (c) numerically by integration of the dSIR model fed with an estimate of $R_0$ from $p_0$ according to the same model. The top graph is a portion of the bottom graph at higher resolution.

The message from this exercise is that although adjusting the parameter $R_0$ of the standard SIR model can fit data from exponential epidemic growth well, there will remain two significant problems, namely neither the estimated $R_0$ nor the predicted $i^*$ will be represented well. The proposed model structures offer a better representation.

Analytical calculation of R0 to observe an upper bound on i

Of practical interest is the situation where an upper bound, $i^*$, is placed on $i$, to avoid the overwhelming of hospitalization facilities during an epidemic. For that situation, Eq. (19) of the Padé-1 SIR model has an explicit analytical solution for the corresponding $R_0 \overset{\text{def}}{=} \beta D$ as

$$s(0)\,\beta D \overset{\text{def}}{=} s(0)\,R_0 = \frac{2\, W_{-1}\!\left(\dfrac{i^*/s(0) - 2}{2e}\right)}{i^*/s(0) - 2} \tag{25}$$

where $W_{-1}$ is the Lambert function of order $-1$ and typically $s(0) \approx 1$ without prior immunity. By comparison, the standard SIR model yields

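$$s(0)\,\frac{\beta}{\gamma} \overset{\text{def}}{=} s(0)\,R_0 = \frac{W_{-1}\!\left(\dfrac{i^*/s(0) - 1}{e}\right)}{i^*/s(0) - 1} \tag{26}$$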

The values of $R_0$ indicated by Eqs. (25) and (26), with corresponding definitions, are shown in Fig. 8. It is evident that the Padé SIR model places roughly twice as tight a restriction on $R_0 - 1$ as the standard SIR model, if $i$ is not to exceed the specified $i^*$ value. The implications of this result for tasks such as Flattening the Curve through interventions that adjust $R_0$ are clear.
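The same bound can be obtained numerically, without a Lambert $W_{-1}$ routine, by inverting the peak formulas with bisection. The sketch below relies on the fact that the peak $m\,[s(0) - (\ln(R_0 s(0)) + 1)/R_0]$ (with $m = 1$ for SIR and $m = 2$ for Padé-1 SIR) increases monotonically with $R_0$ above the threshold. The function names, the `factor` argument, and the cap of 10% are illustrative assumptions.

```python
import math


def peak(r0, s0=1.0, factor=1.0):
    """Predicted peak of i: factor = 1 for standard SIR, factor = 2 for Pade-1 SIR."""
    return factor * (s0 - (math.log(r0 * s0) + 1.0) / r0)


def max_r0(i_cap, s0=1.0, factor=1.0, tol=1e-12):
    """Largest R0 for which the predicted peak does not exceed i_cap."""
    if not 0.0 < i_cap < factor * s0:
        raise ValueError("i_cap must lie strictly between 0 and factor * s0")
    lo = 1.0 / s0                     # threshold: peak(lo) = 0
    hi = 2.0 * lo
    while peak(hi, s0, factor) < i_cap:
        hi *= 2.0                     # expand until the cap is bracketed
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if peak(mid, s0, factor) < i_cap:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


cap = 0.10
r0_sir = max_r0(cap, factor=1.0)
r0_pade1 = max_r0(cap, factor=2.0)
print(f"cap on i of {cap:.2f}: SIR allows R0 <= {r0_sir:.3f}, "
      f"Pade-1 SIR allows R0 <= {r0_pade1:.3f}")
print(f"ratio of allowed (R0 - 1): {(r0_pade1 - 1.0) / (r0_sir - 1.0):.2f}")
```

For a cap of 10%, the admissible $R_0 - 1$ under the Padé-1 SIR model comes out at a little over half of what the standard SIR model would allow; the ratio varies with the chosen cap.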

Figure 8.

Maximum value of $R_0$ indicated by the Padé SIR and SIR models for $i$ not to exceed $i^*$ when $s(0) \approx 1$.

How does the Padé SIR model work?

Underlying the Padé SIR model are constructs for approximating the unit-step cumulative distribution of the infectious time period, $T$, shown in Fig. 3 ($n = \infty$), as explained in APPENDIX C. Graphs of these approximations and their corresponding formulas are presented in Fig. 9, along with the exponential and unit-step distributions for comparison. Note that the two Padé SIR distributions in Fig. 9 might appear absurd, as they involve negative values. However, this pattern turns out to yield acceptable values for the fractions $s, i, r$.

Figure 9.

Distributions $F(\theta/D)$ of the dimensionless infectious period $\theta/D$ with average 1. The curves shown imply that the newly infected are removed from the infectious compartment, I, according to the formulas shown (cf. Fig. 5). All distributions have the same average equal to 1, as also indicated by the shaded areas.

It should also be noted that Eq. (12) of the first-order Padé SIR suggests that the infectious loading rate remains $\frac{R_0}{D}s(t)i(t)$, whereas the infectious discharge rate appears as $\frac{R_0 s(t)-2}{D}i(t)$ rather than $-i(t)/D$, as suggested by Eq. (2). This is illustrated visually in Fig. 10 in two ways, both of which underscore the significant differences between the SIR model on one hand and the dSIR and Padé SIR models on the other. First (top), a time-varying $\gamma(t) \overset{\text{def}}{=} r'(t)/i(t)$ (following Eq. (3)) is shown, with the values of $r'(t)$ and $i(t)$ calculated by the first- or second-order Padé SIR model with a fixed $D$. Note that the discrepancy between $D$ and $1/\gamma(t)$ (shown as values of $\gamma(t)D$ in Fig. 10) remains appreciable even for values of $R_0$ close to 1. Second (middle and bottom), Fig. 10 shows in a stacked plot the differences between the fractions $s(t)$, $i(t)$, $r(t)$ produced by the (first- or second-order) Padé SIR models and the SIR model. In addition to the clear difference in the time profiles and infectious fraction peaks, note that the horizontal slices of the orange segments, corresponding to the infectious period for each newly infected fraction (Fig. 2), remain constant (equal to $D$) over time for the Padé SIR model, in contrast to the SIR model, for which the infectious period increases (Fig. 10, top).

Figure 10.

Top: Comparison between the inferred time-varying $\gamma(t) \overset{\text{def}}{=} r'(t)/i(t)$ and the corresponding time-invariant $1/D = 0.12\ \text{day}^{-1}$ for various $R_0 \overset{\text{def}}{=} \beta D$ in numerical integration of the first-order ($r'(t)/i(t) = 2/D - \beta s(t)$) and second-order ($r'(t)/i(t) = \beta s(t) - i'(t)/i(t)$) Padé SIR equations. Middle and bottom: Stacked fractions $s$, $i$, $r$ of a population through an epidemic for $R_0 = 1.5$ according to the first- and second-order Padé SIR model, superimposed on the standard SIR model (cf. Figure 1). The horizontal slices of equal length $D$ shown in the orange area for $i(t)$ are the continuous counterparts of the same area in the discretized plot of Fig. 2.
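The loading and discharge rates discussed above can be explored with a short numerical sketch. It integrates the standard SIR and first-order Padé SIR equations with a classical Runge-Kutta scheme and evaluates $\gamma(t)D$ from $r'(t) = -(s'(t)+i'(t))$. The initial condition $i(0)=10^{-4}$, the step size, the time horizon, and all function names are assumptions chosen for illustration; only $1/D = 0.12\ \text{day}^{-1}$ and $R_0 = 1.5$ come from the text.

```python
import math

D = 1.0 / 0.12            # days; 1/D = 0.12 per day as quoted for Fig. 10
R0 = 1.5
BETA = R0 / D
I0 = 1e-4                 # assumed initial infectious fraction
S0 = 1.0 - I0

def sir_rhs(s, i):
    """Standard SIR with gamma = 1/D; returns (s', i')."""
    return -BETA * s * i, BETA * s * i - i / D

def pade1_rhs(s, i):
    """First-order Pade SIR: loading (R0/D) s i plus discharge ((R0 s - 2)/D) i."""
    return -BETA * s * i, (R0 / D) * s * i + ((R0 * s - 2.0) / D) * i

def gamma_times_d(rhs, s, i):
    """gamma(t) * D, where gamma(t) = r'(t)/i(t) and r' = -(s' + i')."""
    ds, di = rhs(s, i)
    return D * (-(ds + di)) / i

def peak_infectious(rhs, s0, i0, t_end=600.0, dt=0.05):
    """Integrate with classical RK4 and return the largest i(t) seen on the grid."""
    s, i, peak = s0, i0, i0
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(s, i)
        k2 = rhs(s + 0.5 * dt * k1[0], i + 0.5 * dt * k1[1])
        k3 = rhs(s + 0.5 * dt * k2[0], i + 0.5 * dt * k2[1])
        k4 = rhs(s + dt * k3[0], i + dt * k3[1])
        s += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        i += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        peak = max(peak, i)
    return peak

if __name__ == "__main__":
    print("gamma*D at t=0:", gamma_times_d(sir_rhs, S0, I0), gamma_times_d(pade1_rhs, S0, I0))
    print("peak i:", peak_infectious(sir_rhs, S0, I0), peak_infectious(pade1_rhs, S0, I0))
```

Under these assumptions, $\gamma(t)D$ stays at 1 for the standard SIR model by construction, whereas for the first-order Padé SIR model it equals $2 - R_0 s(t)$ and therefore varies as $s(t)$ falls.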

The proposed approach in the context of Kermack and McKendrick

In the sentence right before they present their SIR model in Eq. (29) of their paper, Kermack and McKendrick18 explain that this is a

special case in which ϕ and ψ are constants κ and l respectively.

with κ, l referring to β, γ of Eqs. (1)–(3), respectively. The assumption about constant ϕ is plausible, as it refers to the rate of spread of the epidemic (cf. Eq. (1)). While that parameter might change over time as a result of interventions taken to curb an epidemic, such changes could easily be reflected in the SIR model by a time-varying ϕ (cf. β in Eqs. (1) and (2)). The assumption about constant ψ, however, as widely as it may have been used, is chosen for mathematical convenience rather than for reasonableness of representation:

If $\psi_\theta$ denotes the rate of removal, …, then the number who are removed from each θ group at the end of the interval t is $\psi_\theta v_{t,\theta}$ (ibid., p. 703).

where

$v_{t,\theta}$ shall denote the number of individuals in unit area at the time t who have been infected for θ intervals (ibid., p. 702).

However, the rate of removal depends more on how long individuals have remained infected than on the size of that group. It is the neglect of this dependence by the constant-ψ assumption that is critiqued here, and for which alternatives are proposed.

Finally, it is fitting to quote Kermack and McKendrick’s remarks on fitting field data from a plague outbreak. Along with using the SIR model, thereby assuming an exponential distribution of infectious time after infection, these authors explicitly state five additional simplifying assumptions (p. 715, ibid.) and warn that

deductions as to the actual values of the various constants should not be drawn. It may be said, however, that the calculated curve, …, conforms roughly to the observed figures.

Indeed, all four models considered in this study (dSIR, first- and second-order Padé SIR, and SIR) fit the data mentioned well. Yet, were these models to be used for fitting the early exponential spread of the epidemic, their projections would be quite different, as already elaborated on.

Extensions

As already mentioned, the proposed approach to compartment-based epidemiological modeling is applicable to model structures with a variety of compartments and flows among them. For these structures, the corresponding ODE models resulting from compartment discharge rates proportional to the load of each corresponding compartment14 can be immediately translated (a) from ODEs to DDEs with each compartment delay equal to the average residence time of that compartment, and (b) from DDEs to (first- or second-order) Padé approximations, which retain an ODE structure.
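Steps (a) and (b) can be sketched for a single infectious compartment as follows. The sketch assumes the delay relation $i(t) = s(t-D) - s(t)$ for the DDE form, retains $s'(t) = -\beta s(t) i(t)$, ignores initial-condition terms in the Laplace transform, and writes $\sigma$ for the Laplace variable (a symbol introduced here only to avoid a clash with $s(t)$):

$$\begin{aligned}
i(t) = s(t-D) - s(t) \;&\Longrightarrow\; I(\sigma) = \left(e^{-D\sigma} - 1\right) S(\sigma),\\
e^{-D\sigma} \approx \frac{1 - D\sigma/2}{1 + D\sigma/2} \;&\Longrightarrow\; \left(1 + \tfrac{D}{2}\sigma\right) I(\sigma) \approx -D\sigma\, S(\sigma),\\
i(t) + \tfrac{D}{2}\, i'(t) = -D\, s'(t) = D\beta\, s(t)\, i(t) \;&\Longrightarrow\; i'(t) = \tfrac{2}{D}\left(R_0 s(t) - 1\right) i(t), \qquad R_0 = \beta D.
\end{aligned}$$

The last line retains an ODE structure, and it splits into the loading term $\frac{R_0}{D}s(t)i(t)$ and the discharge term $\frac{R_0 s(t)-2}{D}i(t)$.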

To substantiate these claims by an example, we briefly discuss next an extension of the ideas developed for the SIR structure to the SPIR variant that includes a compartment P between S and I (APPENDIX H). Individuals in the P compartment (equivalent to the E compartment in the standard SEIR structure10,11,46,47) are asymptomatic infectious, that is, they can transmit the disease before they enter the I compartment as symptomatic infectious48, a trait observed on several occasions, notably in the current COVID-19 epidemic23,49,50. Corresponding equations are shown in Table 1.

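Table 1.

Equations for SPIR, dSPIR, Padé-1 SPIR and Padé-2 SPIR models.

SPIR:

$$\begin{aligned}
s'(t) &= -\beta_i s(t) i(t) - \beta_p s(t) p(t)\\
p'(t) &= \beta_i s(t) i(t) + \beta_p s(t) p(t) - \alpha p(t)\\
i'(t) &= \alpha p(t) - \gamma i(t)\\
r'(t) &= \gamma i(t)
\end{aligned}$$

dSPIR:

$$\begin{aligned}
s'(t) &= -\beta_i s(t) i(t) - \beta_p s(t) p(t)\\
p(t) &= s(t-D_p) - s(t)\\
i(t) &= s(t-D_i) - s(t-D_p) = s(t-D_i) - s(t) - p(t)\\
r(t) &= 1 - s(t-D_i)
\end{aligned}$$

Padé-1 SPIR:

$$\begin{aligned}
s'(t) &= -\beta_i s(t) i(t) - \beta_p s(t) p(t)\\
p'(t) &= 2\left(\beta_i s(t) i(t) + \beta_p s(t) p(t) - \frac{p(t)}{D_p}\right)\\
i'(t) &= 2\left(\left(\frac{1}{D_p} - \frac{1}{D_i}\right) p(t) - \frac{1}{D_i}\, i(t)\right)\\
r(t) &= 1 - s(t) - p(t) - i(t)
\end{aligned}$$

Padé-2 SPIR:

$$\begin{aligned}
s'(t) &= -\beta_i s(t) i(t) - \beta_p s(t) p(t)\\
p''(t) &= \frac{12}{D_p^2}\left(D_p\left(\beta_i s(t) i(t) + \beta_p s(t) p(t)\right) - p(t) - \frac{D_p}{2}\, p'(t)\right)\\
i''(t) &= -p''(t) + \frac{12}{D_i^2}\left(D_i\left(\beta_i s(t) i(t) + \beta_p s(t) p(t)\right) - i(t) - p(t) - \frac{D_i}{2}\left(i'(t) + p'(t)\right)\right)\\
r(t) &= 1 - s(t) - p(t) - i(t)
\end{aligned}$$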

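Note the correspondence $D_p \overset{\text{def}}{=} \frac{1}{\alpha}$, $D_i \overset{\text{def}}{=} \frac{1}{\alpha} + \frac{1}{\gamma} > D_p$. Also note that if $\beta_i = \beta_p$, treating $p(t) + i(t)$ as a single variable renders the SPIR structures similar to the SIR structures with similar dynamics.

Figure 11 presents a comparison of profiles of $i(t)$ which result from numerical solution of the dSPIR, Padé SPIR, and SPIR models. The values

$$D_p = \frac{1}{\alpha} = 5.1\ \text{days}, \qquad D_i - D_p = \frac{1}{\gamma} = 3.3\ \text{days} \tag{27}$$

relevant to COVID-19 (refs. 35, 36) are used in all simulations with $R_0 \overset{\text{def}}{=} \beta_i (D_i - D_p) + \beta_p D_p$.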

Figure 11.

Comparison of the infectious fraction, i, produced by the dSPIR, Padé-1 SPIR, Padé-2 SPIR, and SPIR models for a number of values of R0 (cf. Figure 5). Note that the total infected fraction at any moment would comprise the sum of p and i fractions.
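A minimal numerical sketch of the SPIR and Padé-1 SPIR equations of Table 1, with the values of Eq. (27), is given next. It assumes equal transmission rates $\beta_i = \beta_p$ (so that $R_0 = \beta D_i$), an initial condition $p(0) = 10^{-4}$, $i(0) = 0$, and a classical Runge-Kutta integrator. These choices and all function names are illustrative assumptions and are not claimed to match the settings behind Fig. 11.

```python
import math

D_P = 5.1                      # days, Eq. (27): D_p = 1/alpha
D_I = D_P + 3.3                # days, Eq. (27): D_i - D_p = 1/gamma
R0 = 1.5
BETA_I = BETA_P = R0 / D_I     # assumption: beta_i = beta_p, hence R0 = beta * D_i
P0, I0 = 1e-4, 0.0             # assumed initial fractions
S0 = 1.0 - P0 - I0

def spir_rhs(s, p, i):
    """Standard SPIR (alpha = 1/D_p, gamma = 1/(D_i - D_p)); returns (s', p', i')."""
    new = BETA_I * s * i + BETA_P * s * p
    return -new, new - p / D_P, p / D_P - i / (D_I - D_P)

def pade1_spir_rhs(s, p, i):
    """Pade-1 SPIR equations of Table 1; returns (s', p', i')."""
    new = BETA_I * s * i + BETA_P * s * p
    dp = 2.0 * (new - p / D_P)
    di = 2.0 * ((1.0 / D_P - 1.0 / D_I) * p - i / D_I)
    return -new, dp, di

def integrate_peaks(rhs, s0, p0, i0, t_end=600.0, dt=0.05):
    """Classical RK4; returns (largest i, largest p + i) seen on the time grid."""
    y = (s0, p0, i0)
    peak_i, peak_total = i0, p0 + i0
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(*y)
        k2 = rhs(*(a + 0.5 * dt * b for a, b in zip(y, k1)))
        k3 = rhs(*(a + 0.5 * dt * b for a, b in zip(y, k2)))
        k4 = rhs(*(a + dt * b for a, b in zip(y, k3)))
        y = tuple(a + dt * (b1 + 2 * b2 + 2 * b3 + b4) / 6.0
                  for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4))
        peak_i = max(peak_i, y[2])
        peak_total = max(peak_total, y[1] + y[2])
    return peak_i, peak_total

if __name__ == "__main__":
    print("SPIR        (peak i, peak p+i):", integrate_peaks(spir_rhs, S0, P0, I0))
    print("Pade-1 SPIR (peak i, peak p+i):", integrate_peaks(pade1_spir_rhs, S0, P0, I0))
```

With equal transmission rates, summing the Padé-1 SPIR equations for $p$ and $i$ gives a first-order Padé SIR equation in $p+i$ with delay $D_i$, so the peak of $p+i$ can be checked against the corresponding closed-form SIR-type expression.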

Additional properties of the proposed SPIR models can be established in a similar manner51 and will be explored in more detail elsewhere.

Finally, in situations where there are data to warrant it, one can relax the basic premise of the preceding discussion, namely that the dynamics of a system with S, I, R compartments will likely be close to the dynamics of a system with a step function as cumulative distribution $F$ of infectious period (Figs. 3 and 4). In such situations (e.g. models by Anderson et al.52) a corresponding SIR-like model structure can be developed that employs the ODE $i'(t) = \frac{\alpha}{D}\left(R_0 s(t) - 1\right) i(t)$ in place of Eq. (12), where the parameter $\alpha$ ($1 \le \alpha \le 2$) is associated with the sigmoidicity of $F$. A full development of that case is presented in a separate publication53.
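As a brief worked consequence, and assuming that $s'(t) = -\frac{R_0}{D} s(t) i(t)$ is retained alongside this ODE, dividing the two equations and integrating from the initial state to the peak at $s = 1/R_0$ gives the peak infectious fraction, denoted here by $i^*$ (a notation assumed for this sketch):

$$\begin{aligned}
\frac{di}{ds} &= -\alpha\left(1 - \frac{1}{R_0 s}\right),\\
i^* &= i(0) + \alpha\left(s(0) - \frac{1 + \ln\left(R_0 s(0)\right)}{R_0}\right).
\end{aligned}$$

Under these assumptions, $\alpha = 1$ recovers the peak of the standard SIR model, and $\alpha = 2$ gives that of the first-order Padé SIR model, whose excess over $i(0)$ is twice as large.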

Conclusion

We have made a case for revisiting the standard SIR model that describes the spread of infectious disease epidemics. While that model features valuable insights, it also has fundamental shortcomings related to quantifying the spread of an epidemic, as detailed in the main text. Therefore, use of that model to manage an epidemic could have adverse consequences. A remedy to this problem is proposed in the form of the Padé SIR model structure, which retains all qualitative features of the standard SIR structure as well as its simplicity, yet mitigates its systematic errors. It is also noted that the remedy proposed is not confined to the standard SIR model, but is applicable to the numerous compartment-based epidemiological models that constitute SIR variants, a re-examination of which would be warranted. The tools developed here can be easily and transparently incorporated in related software for practitioners or researchers44,54,55. Related formulas, derived in the main text, can be used both for epidemiological data processing to guide decision making and for theoretical analysis to advance the mathematical theory of epidemics.

Acknowledgement

The author gratefully acknowledges constructive comments and encouragement by Professors Peter Vekilov and Navin Varadarajan during early stages of the manuscript.

Author contributions

M.N. is solely responsible for all activity associated with creation and submission of this manuscript.

Funding

The Institute of Allergy and Infectious Diseases of the National Institutes of Health under award (grant number R01AI140287) partially supported the research reported in this publication. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Funding sources had no involvement in study design; in the collection, analysis and interpretation of data, nor in the writing of the report nor in the decision to submit the article for publication.

Competing interests

The author declares no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-022-10185-0.

References

1. Jewell NP, Lewnard JA, Jewell BL. Predictive mathematical models of the COVID-19 pandemic: Underlying principles and value of projections. JAMA. 2020;323:1893–1894. doi: 10.1001/jama.2020.6585.
2. Giordano G, et al. Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy. Nat. Med. 2020. doi: 10.1038/s41591-020-0883-7.
3. Adam D. The simulations driving the world’s response to COVID-19. Nature. 2020;580:316–318. doi: 10.1038/d41586-020-01003-6.
4. Wang C, et al. Evolving epidemiology and impact of non-pharmaceutical interventions on the outbreak of coronavirus disease 2019 in Wuhan, China. medRxiv. 2020. doi: 10.1101/2020.03.03.20030593.
5. Kucharski AJ, et al. Early dynamics of transmission and control of COVID-19: A mathematical modelling study. Lancet. Infect. Dis. 2020;20:553–558. doi: 10.1016/S1473-3099(20)30144-4.
6. Rahimi I, Chen F, Gandomi AH. A review on COVID-19 forecasting models. Neural Comput. Appl. 2021. doi: 10.1007/s00521-020-05626-8.
7. Wynants L, et al. Prediction models for diagnosis and prognosis of covid-19: Systematic review and critical appraisal. BMJ. 2020;369:m1328. doi: 10.1136/bmj.m1328.
8. Tufekci, Z. This overlooked variable is the key to the pandemic. The Atlantic 30. https://www.theatlantic.com/health/archive/2020/09/k-overlooked-variable-drivingpandemic/616548/ (2020).
9. Fraser C, Riley S, Anderson R, Ferguson N. Factors that make an infectious disease outbreak controllable. Proc. Natl. Acad. Sci. USA. 2004;101:6146–6151. doi: 10.1073/pnas.0307506101.
10. Murray JD. Mathematical Biology: I. An Introduction. Springer; 2002.
11. Keeling MJ, Rohani P. Modeling Infectious Diseases in Humans and Animals. Princeton University Press; 2008.
12. Brauer F, Castillo-Chavez C. Mathematical Models in Population Biology and Epidemiology. Springer; 2012.
13. Diekmann, O., Heesterbeek, H. & Metz, H. In Epidemic Models: Their Structure and Relation to Data (ed Mollison, D.) (1995).
14. Anderson RM, May RM. Population biology of infectious diseases: Part I. Nature. 1979;280:361–367. doi: 10.1038/280361a0.
15. Brauer F. Mathematical epidemiology: Past, present, and future. Infect. Dis. Model. 2017;2:113–127. doi: 10.1016/j.idm.2017.02.001.
16. Hethcote, H. W. In Models for Infectious Human Diseases: Their Structure and Relation to Data (eds Isham, C. & Medley, G.) 215–238 (Publications of the Newton Institute, 1996).
17. Anderson RM, May RM. Infectious Diseases of Humans. Dynamics and Control. Oxford University Press; 1991.
18. Kermack WO, McKendrick AG. A contribution to the mathematical theory of epidemics. Proc. R. Soc. Lond. Ser. A. 1927;115:700. doi: 10.1098/rspa.1927.0118.
19. Heesterbeek JA. A brief history of R0 and a recipe for its calculation. Acta. Biotheor. 2002;50:189–204. doi: 10.1023/a:1016599411804.
20. MacDonald G. The Epidemiology and Control of Malaria. Oxford Univ. Pr.; 1957.
21. Dietz K. The estimation of the basic reproduction number for infectious diseases. Stat. Methods Med. Res. 1993;2:23–41. doi: 10.1177/096228029300200103.
22. Qualls NL, et al. Community mitigation guidelines to prevent pandemic influenza—United States, 2017. MMWR Recomm. Rep. 2017;66:1. doi: 10.15585/mmwr.rr6601a1.
23. Ferguson NM, et al. Report 9—Impact of Non-pharmaceutical Interventions (NPIs) to Reduce COVID-19 Mortality and Healthcare Demand. Imperial College; 2020.
24. Hethcote HW. Frontiers in theoretical biology. In: Levin SA, editor. A Thousand and One Epidemic Models. Springer; 1994. pp. 504–515.
25. Ferguson NM, et al. Strategies for mitigating an influenza pandemic. Nature. 2006;442:448–452. doi: 10.1038/nature04795.
26. Kestenbaum B. Epidemiology and Biostatistics. Springer; 2019.
27. Kemper JT. On the identification of superspreaders for infectious disease. Math. Biosci. 1980;48:111–127. doi: 10.1016/0025-5564(80)90018-8.
28. Siettos CI, Russo L. Mathematical modeling of infectious disease dynamics. Virulence. 2013;4:295–306. doi: 10.4161/viru.24041.
29. Hethcote HW. The mathematics of infectious diseases. SIAM Rev. 2000;42:599–653. doi: 10.1137/S0036144500371907.
30. Rock K, Brand S, Moir J, Keeling MJ. Dynamics of infectious diseases. Rep. Prog. Phys. 2014;77:026602. doi: 10.1088/0034-4885/77/2/026602.
31. Horrocks J, Bauch CT. Algorithmic discovery of dynamic models from infectious disease data. Sci. Rep. 2020;10:7061. doi: 10.1038/s41598-020-63877-w.
32. Cushing JM. Integrodifferential Equations and Delay Models in Population Dynamics. Springer; 1977.
33. Box, G. E. P. Robustness in Statistics (eds Launer, R. L. & Wilkinson, G. N.) 201–236 (Academic Press, 1979).
34. Byrne AW, et al. Inferred duration of infectious period of SARS-CoV-2: Rapid scoping review and analysis of available evidence for asymptomatic and symptomatic COVID-19 cases. BMJ Open. 2020;10:e039856. doi: 10.1136/bmjopen-2020-039856.
35. Lauer SA, et al. The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: Estimation and application. Ann. Intern. Med. 2019. doi: 10.7326/M20-0504.
36. Boldog P, et al. Risk assessment of novel coronavirus COVID-19 outbreaks outside China. J. Clin. Med. 2020;9:571. doi: 10.3390/jcm9020571.
37. Kuang Y. Delay Differential Equations with Applications in Population Dynamics. Academic Press; 1993.
38. Gopalsamy K. Stability and Oscillations in Delay Differential Equations of Population Dynamics. Springer; 1992.
39. Bellen A, Zennaro M. Numerical Methods for Delay Differential Equations. Oxford; 2003.
40. Stephanopoulos G. Chemical Process Control: An Introduction to Theory and Practice. Prentice Hall; 1984.
41. Baker GA, Graves-Morris P. Pade Approximants. 2. Cambridge University Press; 1996.
42. Corless RM, Gonnet GH, Hare DE, Jeffrey DJ, Knuth DE. On the Lambert W function. Adv. Comput. Math. 1996;5:329–359. doi: 10.1007/BF02124750.
43. Kesisoglou I, Singh G, Nikolaou M. The Lambert function should be in the engineering mathematical toolbox. Comput. Chem. Eng. 2021;148:107259. doi: 10.1016/j.compchemeng.2021.107259.
44. Obadia T, Haneef R, Boëlle P-Y. The R0 package: a toolbox to estimate reproduction numbers for epidemic outbreaks. BMC Med. Inform. Decis. Mak. 2012;12:147. doi: 10.1186/1472-6947-12-147.
45. Centers for Disease Control and Prevention—COVID-19 Response. COVID-19 Case Surveillance Public Data Access, Summary, and Limitations (version date: October 31, 2020). (2020).
46. Blackwood JC, Childs LM. An introduction to compartmental modeling for the budding infectious disease modeler. Lett. Biomath. 2018;5:195–221. doi: 10.1080/23737867.2018.1509026.
47. Anderson RM, Anderson B, May RM. Infectious Diseases of Humans: Dynamics and Control. Oxford University Press; 1992.
48. Zhang J, et al. Evolving epidemiology and transmission dynamics of coronavirus disease 2019 outside Hubei province, China: A descriptive and modelling study. Lancet. Infect. Dis. 2020;20:793–802. doi: 10.1016/S1473-3099(20)30230-9.
49. Institute for Health Metrics and Evaluation (IHME). COVID-19 Projections Assuming Full Social Distancing Through May 2020 (2020).
50. Kucharski AJ, et al. Early dynamics of transmission and control of COVID-19: a mathematical modelling study. Lancet Infect. Dis. 2020. doi: 10.1016/S1473-3099(20)30144-4.
51. Nikolaou M. Using feedback on symptomatic infections to contain the coronavirus epidemic: Insight from a SPIR model. medRxiv. 2020. doi: 10.1101/2020.04.14.20065698.
52. Anderson R, Medley G, May R, Johnson A. A preliminary study of the transmission dynamics of the human immunodeficiency virus (HIV), the causative agent of AIDS. IMA J. Math. Appl. Med. Biol. 1986;3:229–263. doi: 10.1093/imammb/3.4.229.
53. Nikolaou M. Ziegler and Nichols meet Kermack and McKendrick: Parsimony in dynamic models for epidemiology. Comput. Chem. Eng. 2022;157:107615. doi: 10.1016/j.compchemeng.2021.107615.
54. Jalali MS, DiGennaro C, Sridhar D. Transparency assessment of COVID-19 models. Lancet Glob. Health. 2020;8:e1459–e1460. doi: 10.1016/S2214-109X(20)30447-2.
55. Sills J, et al. Call for transparency of COVID-19 models. Science. 2020;368:482–483. doi: 10.1126/science.abb8637.

Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
