Abstract
The Omicron variant has led to a new wave of the COVID-19 pandemic worldwide, with unprecedented numbers of daily confirmed new cases in many countries and areas. To analyze the impact of societal or policy changes on the development of the Omicron wave, the stochastic susceptible-infected-removed (SIR) model with change points is proposed to accommodate situations where the transmission rate and the removal rate may vary significantly at change points. Bayesian inference based on a Markov chain Monte Carlo algorithm is developed to estimate both the locations of change points and the transmission rate and removal rate within each stage. Experiments on simulated data demonstrate the effectiveness of the proposed method, and several stages are detected in the analysis of the Omicron wave data in Singapore.
Subject terms: Infectious diseases, Statistics
Introduction
The ongoing worldwide COVID-19 pandemic caused by the SARS-CoV-2 virus has spread to over 200 countries and areas, with more than 523 million confirmed cases and around 7 million deaths by May 2022. Although more than 11 billion doses of vaccine have been administered, several variants of SARS-CoV-2 have appeared with faster transmission and greater virulence, diminishing the effectiveness of developed vaccines and resulting in multiple COVID-19 pandemic waves. Among the five variants of concern listed by the World Health Organization, the Omicron variant has a higher level of transmissibility and immune escape capability [1–6], leading to unprecedented numbers of daily confirmed new cases in many countries and areas. Due to the unique characteristics of Omicron (high transmissibility but low rates of mortality and severe cases), existing methods for analyzing the spread of other variants are not applicable to the Omicron wave of the COVID-19 pandemic.
In epidemiology, the susceptible-infected-removed (SIR) model [7] is the most popular approach to analyzing the transmission of infectious diseases. Since the beginning of the Omicron wave, a large amount of research on the SIR model or its extensions has been conducted for inference and prediction. For example, Van Wees et al. [8] applied the standard SIR model to predict the infection rate and hospitalization rate in the South African province of Gauteng, the United Kingdom and the Netherlands. To incorporate local transmission into the SIR model, Götz [9] developed the global–local SIR model with a locality-adjusted basic reproduction number. The multi-wave SIR model proposed by Ghosh and Ghosh [10] allows researchers to investigate the nonperiodicity of COVID-19 pandemic waves, while Khan and Atangana [11] separated the Omicron variant from other variants in order to estimate its basic reproduction number accurately. However, all the aforementioned approaches assume that both the transmission rate and the removal rate remain unchanged throughout a COVID-19 pandemic wave. This assumption is unrealistic because both individual actions for self-protection and government policies in response to the outbreak would decrease the transmission rate. In addition, more medical resources would be allocated to combat the surge of new cases, leading to an increase of the removal rate during a pandemic wave.
To take the effect of societal changes into consideration, the stochastic SIR model with change points is proposed to analyze the evolution of the Omicron wave of the COVID-19 pandemic. Assuming that both the transmission rate and the removal rate are time-varying, binomial models are proposed for the daily reported numbers of confirmed cases and removed cases. A latent indicator vector is introduced to partition a pandemic wave into several stages. Compared to existing works [12–16] that incorporate change points into compartmental models, our model assumes that both the transmission rate and the removal rate are time-varying and can change multiple times during the study period. Instead of being constant between any pair of adjacent change points, the time-varying parameters in our model are homogeneous in distribution within each stage and are thus more flexible for model fitting. We develop a Markov chain Monte Carlo (MCMC) algorithm to draw posterior samples of parameters and make Bayesian inference on the development of a pandemic wave. Experiments on simulated datasets suggest the effectiveness of the proposed method in detecting change points of a pandemic wave, and the Omicron data in Singapore are analyzed to corroborate the major changes during the Omicron wave.
The rest of this paper is organized as follows. Section “The SIR model” reviews the standard SIR model. In section “Methodology”, we propose the stochastic SIR model with change points and develop an MCMC algorithm for making Bayesian inference. Experiments on simulated datasets are conducted in section “Simulations” to illustrate the performance of the proposed method. In section “Analysis of the Omicron wave in Singapore”, we analyze the Omicron wave of COVID-19 pandemic in Singapore and find several stages with different transmission rates and removal rates which match the major societal changes in Singapore. We conclude this paper in section “Conclusion”.
The SIR model
The susceptible-infected-removed (SIR) model [7] is the most widely used mathematical tool for modeling the spread of infectious diseases. Given a closed population of $N$ individuals and three possible states, susceptible (S), infectious (I) and removed (R, either recovered or dead), each individual is assumed to be in exactly one state at any time. Over time, an individual moves from S to I and then from I to R, that is, the individual gets infected and then recovers or dies. Let $S(t)$, $I(t)$ and $R(t)$ denote the numbers of susceptible, infectious and removed individuals in the population at time point $t$ ($t \ge 0$), respectively. Define $P(t) = I(t)/N$ as the proportion of infectious individuals at time $t$. Assuming that in-person contacts among individuals follow the Erdős–Rényi model [17,18], the standard SIR model describes the flow of individuals from S to I and then from I to R by a set of ordinary differential equations, without requiring access to individual records of infection and removal,
$$\frac{dS(t)}{dt} = -\beta S(t) P(t), \qquad \frac{dI(t)}{dt} = \beta S(t) P(t) - \gamma I(t), \qquad \frac{dR(t)}{dt} = \gamma I(t), \tag{1}$$
where $\beta$ and $\gamma$ are the transmission rate parameter and the removal rate parameter, respectively. It is clear that
$$\frac{dS(t)}{dt} + \frac{dI(t)}{dt} + \frac{dR(t)}{dt} = 0,$$
and thus the population size is fixed as $S(t) + I(t) + R(t) = N$ for $t \ge 0$.
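As a concrete illustration of the standard SIR dynamics, the following minimal sketch integrates the three differential equations with a forward Euler step. The function name `simulate_sir_ode`, the step size and the parameter values are illustrative assumptions and are not taken from the paper.

```python
def simulate_sir_ode(n, i0, beta, gamma, days, steps_per_day=100):
    """Forward-Euler integration of the standard SIR equations.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I
    Returns the state (S, I, R) at the end of each day, starting with day 0.
    """
    s, i, r = float(n - i0), float(i0), 0.0
    dt = 1.0 / steps_per_day
    path = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * (i / n) * dt
            new_removals = gamma * i * dt
            s, i, r = s - new_infections, i + new_infections - new_removals, r + new_removals
        path.append((s, i, r))
    return path


# Hypothetical values chosen only for illustration.
path = simulate_sir_ode(n=1000, i0=10, beta=0.4, gamma=0.1, days=60)
peak_day = max(range(len(path)), key=lambda d: path[d][1])
print("day of peak infections:", peak_day)
```

Because every unit leaving one compartment enters the next, the three components sum to the population size at every step, which mirrors the conservation property stated above.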
Although the SIR model has been widely used due to its simplicity and interpretability, it cannot fit real Omicron data well. The main reason is that the SIR model assumes that both the transmission rate and the removal rate remain unchanged throughout the whole study period. This assumption is unrealistic for the ongoing COVID-19 pandemic, where the transmission rate would decrease as individual strategies of self-protection or government policies of social distancing are implemented. In addition, the removal rate would also increase as more medical resources are allocated to cope with the pandemic and help to cure patients.
Methodology
The stochastic SIR model with change points
To model the spreading of infectious diseases with time-varying parameters, we propose the stochastic SIR model with change points. Because the numbers of confirmed cases and removed cases are reported only on a daily basis during the COVID-19 pandemic, we assume that the study period is , where T is the length of the study period. Let and be sequences of daily reported numbers of newly infected (confirmed) cases and removed cases respectively. Given the initial state of the population , we modify the SIR model in (1) and develop the discrete-time stochastic SIR model with change points as follows. For , we assume
| 2 |
where is the proportion of infectious individuals at day t and are updated as shown in Fig. 1a. Time-varying parameters and represent the transmission rate and the removal rate at day t respectively.
Figure 1.
Graphical illustration of (a) the update of ; (b) change-points in the expected transmission rates and expected removal rates of different stages.
The second equation of the proposed model (2) corresponds to the term in (1), which indicates the number of infectious individuals being removed. Assuming that the removals of different infectious individuals are mutually independent, the number of infectious individuals being removed at day t () follows . In contrast to the term in model (1), we make the following assumptions for each susceptible individual at day : (i) The number of in-person contacts with other individuals at day t, , follows ; (ii) Given , the number of in-person contacts with infectious individuals at day t follows ; and (iii) The probability of transmission for each in-person contact with infectious individuals is . As a result, at day t, the probability of a susceptible individual getting infected is . For identifiability, we reparameterize as the disease transmission rate at day t.
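The step from assumptions (i)–(iii) to a single infection probability can be traced under one common choice, stated here as an assumption because the distributional details are not reproduced above: contacts are Poisson with mean $\lambda_t$, each contact is with an infectious individual with probability $P_{t-1}$, and each such contact transmits with probability $\alpha_t$. The symbols $C_t$, $\lambda_t$, $\alpha_t$ and $P_{t-1}$ are illustrative.

```latex
\begin{align*}
C_t &\sim \mathrm{Poisson}(\lambda_t), \\
\Pr(\text{no transmission at day } t \mid C_t) &= (1 - \alpha_t P_{t-1})^{C_t}, \\
\Pr(\text{infected at day } t)
  &= 1 - \mathbb{E}\!\left[(1 - \alpha_t P_{t-1})^{C_t}\right]
   = 1 - \exp(-\lambda_t \alpha_t P_{t-1}).
\end{align*}
```

The last equality uses the Poisson probability generating function $\mathbb{E}[z^{C_t}] = \exp\{-\lambda_t (1 - z)\}$. Under this assumption only the product $\lambda_t \alpha_t$ enters the likelihood, which is why a single transmission rate $\beta_t = \lambda_t \alpha_t$ would be identifiable while the two factors are not.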
Unlike the original SIR model (1), where the transmission and removal rate parameters are constant throughout the study period, our model (2) assumes that the transmission rate and the removal rate may change significantly at several change points in the study period. Change points arise because societal changes, such as the implementation of new social distancing policies, restrictions on individual activities, and the allocation of more medical resources to patients, can cause sudden changes in the transmission rate, the removal rate, or both. To depict such dynamic patterns of pandemic evolution, we introduce a latent binary vector $\boldsymbol{\delta} = (\delta_1, \ldots, \delta_T)$, where $\delta_t = 1$ indicates that day t is a change point and $\delta_t = 0$ otherwise, and $\delta_1$ is fixed as 1 for convenience. The index of the stage that contains day t is $\eta_t = \sum_{s=1}^{t} \delta_s$ and the stage index vector is $\boldsymbol{\eta} = (\eta_1, \ldots, \eta_T)$, as illustrated in Fig. 1b. Thus, the study period is partitioned into $K = \sum_{t=1}^{T} \delta_t$ stages. Within each stage, parameters are assumed to be homogeneous with the hierarchical priors,
| 3 |
which are denoted as , , , and respectively, with and . The expected transmission rate and the expected removal rate in stage k () are and , respectively. The hyperparameter p expresses one’s prior belief on how often a change point occurs in a pandemic wave, because p is the prior probability that each time point is a change point. We fix $p = 0.01$ in our experiments and also conduct a sensitivity analysis with respect to p.
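The relation between the change-point indicators and the stages can be sketched as follows, assuming that the stage index of a day is the running count of change points up to that day and that the indicators are a priori independent with probability p (the first one being fixed at 1). The function names `stage_indices` and `log_prior_indicators` are hypothetical.

```python
import math
from itertools import accumulate


def stage_indices(delta):
    """Map a binary change-point vector to stage indices 1, 2, ..., K."""
    if delta[0] != 1:
        raise ValueError("the first indicator is fixed at 1")
    return list(accumulate(delta))


def log_prior_indicators(delta, p):
    """Log prior of the free indicators under independent Bernoulli(p) draws.

    The first entry is fixed at 1 and therefore contributes nothing.
    """
    free = delta[1:]
    n_ones = sum(free)
    return n_ones * math.log(p) + (len(free) - n_ones) * math.log(1.0 - p)


delta = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
eta = stage_indices(delta)
print(eta)                      # stage index of each day
print(eta[-1])                  # number of stages
print(log_prior_indicators(delta, p=0.01))
```

A smaller p penalizes each additional change point more heavily, which is the sense in which p encodes the prior belief about how often change points occur.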
Bayesian analysis with MCMC
To make Bayesian inference on parameters , , , and , we develop an MCMC algorithm to draw samples from the posterior distribution,
| 4 |
where the functions on the right-hand side correspond to the priors in (3), and and represent the binomial likelihood functions in (2). At each iteration, parameters are sequentially updated with the Gibbs sampling procedure detailed as follows.
- Update : Given current values and , is updated via an add–delete–swap algorithm. Initializing , an operation is selected from {add, delete, swap} with probabilities,
where . If the add (or delete) operation is selected, we randomly select a which is 0 (or 1) and update its value as 1 (or 0). If the swap operation is selected, we randomly select a pair of with different values and exchange their values. Examples of the candidate obtained by different operations are shown in Fig. 2. Given the candidate , we compute the Metropolis–Hastings ratio as
| 5 |
where following (3) we can derive
with $\Gamma(\cdot)$ being the gamma function, and the proposal of the Metropolis–Hastings algorithm is
We then obtain the updated value as
Correspondingly, we also update and ().
- Update and : Given current values , and , we sample and from
for , where is the indicator function.
- Update and : Given current values , and , we sample and from posterior densities,
for .
Figure 2.

Illustration of the add–delete–swap algorithm. Based on the initialization (1, 0, 1, 0, 0, 1, 0, 0, 0, 0), the candidate in an update can be obtained as: (i) (1, 0, 1, 0, 0, 1, 0, 0, 1, 0) by selecting the add operation and updating the ninth entry as 1; (ii) (1, 0, 0, 0, 0, 1, 0, 0, 0, 0) by selecting the delete operation and updating the third entry as 0; (iii) (1, 0, 1, 0, 0, 0, 1, 0, 0, 0) by selecting the swap operation and exchanging the values of the sixth and seventh entries.
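The three proposal moves can be sketched as below. The starting vector is the one implied by the three candidates listed in the Fig. 2 caption. The helper names are hypothetical, and choosing uniformly among the feasible moves is an assumption made here for illustration, since the operation probabilities are not reproduced above.

```python
import random


def add_point(delta, t):
    """Make day t (1-based, t >= 2) a change point."""
    cand = list(delta)
    cand[t - 1] = 1
    return cand


def delete_point(delta, t):
    """Remove the change point at day t (1-based, t >= 2)."""
    cand = list(delta)
    cand[t - 1] = 0
    return cand


def swap_points(delta, t, u):
    """Exchange the indicator values of days t and u (1-based)."""
    cand = list(delta)
    cand[t - 1], cand[u - 1] = cand[u - 1], cand[t - 1]
    return cand


def propose(delta, rng):
    """Draw one candidate; the first entry stays fixed at 1."""
    zeros = [t for t in range(2, len(delta) + 1) if delta[t - 1] == 0]
    ones = [t for t in range(2, len(delta) + 1) if delta[t - 1] == 1]
    moves = []
    if zeros:
        moves.append("add")
    if ones:
        moves.append("delete")
    if zeros and ones:
        moves.append("swap")
    if not moves:
        raise ValueError("no free indicator to update")
    move = rng.choice(moves)  # assumption: equal weight on feasible moves
    if move == "add":
        return move, add_point(delta, rng.choice(zeros))
    if move == "delete":
        return move, delete_point(delta, rng.choice(ones))
    return move, swap_points(delta, rng.choice(zeros), rng.choice(ones))


current = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
print(add_point(current, 9))       # candidate (i)
print(delete_point(current, 3))    # candidate (ii)
print(swap_points(current, 6, 7))  # candidate (iii)
print(propose(current, random.Random(0)))
```

In a full sampler each candidate would then be accepted or rejected with the Metropolis–Hastings ratio, which also has to account for the fact that the forward and reverse proposal probabilities generally differ.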
Using the MCMC algorithm, a set of posterior samples is obtained for inference. Because the main interest lies in detection of change points, we aggregate to obtain a point estimate (or the corresponding ). As the indicator vector is a binary vector with possible values in total, its posterior mean does not imply a partition of the study period and thus is difficult to interpret. Taking the sequential structure of into consideration, we interpret each (or the corresponding ) as a cluster of time points and obtain by solving a clustering aggregation problem as follows.
- For each pair of time points t and (), we estimate the posterior probability that no change points exist in the time period as
where is the indicator function.
- The Bayes estimator is then obtained as
The Bayes estimator can also be obtained correspondingly.
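The first step above can be sketched as follows, assuming the posterior draws are stored as stage index vectors. Because stages are contiguous, two days share a stage index in a given draw exactly when no change point separates them, so the estimated probability is the fraction of draws in which the two indices agree. The function name `same_stage_probability` and the toy draws are hypothetical; the loss used to turn this matrix into a single partition is not reproduced here.

```python
def same_stage_probability(eta_samples):
    """Estimate, for every pair of days, the posterior probability that
    both days fall in the same stage (no change point between them)."""
    n_samples = len(eta_samples)
    n_days = len(eta_samples[0])
    counts = [[0] * n_days for _ in range(n_days)]
    for eta in eta_samples:
        for t in range(n_days):
            for u in range(n_days):
                if eta[t] == eta[u]:
                    counts[t][u] += 1
    return [[c / n_samples for c in row] for row in counts]


# Four toy posterior draws of the stage index vector for a 4-day period.
draws = [
    [1, 1, 2, 2],
    [1, 1, 1, 2],
    [1, 2, 2, 2],
    [1, 1, 2, 2],
]
q = same_stage_probability(draws)
for row in q:
    print(row)
```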
Based on , we can also compute probability regions of change points. With , let denote the k-th estimated change point (). The highest posterior density (HPD) interval of the k-th change point is computed as , where
As the number of stages varies among posterior samples, we use and as smoothed estimators of the expected transmission rate and expected removal rate at day t respectively.
Simulations
To evaluate the performance of the proposed method in detecting change points of the transmission rate and removal rate during a pandemic wave, we conduct simulated experiments by applying the developed MCMC algorithm for making Bayesian inference.
Data generating mechanism
Considering a study period of length and a fixed population size , we partition the study period into four stages evenly. In other words, there are three true change points at , and . To mimic the dynamics of the transmission rate and removal rate during a pandemic wave, we design three scenarios with the time-varying transmission rate and removal rate as follows,
- Scenario 1:
- Scenario 2:
- Scenario 3:
With initial numbers , we generate 100 datasets for each of three scenarios. Within each dataset, daily confirmed cases and daily removed cases are generated sequentially as Fig. 1a under the model (2) from to . Given and , we exhibit () in Fig. 3. It is clear that the severity of the pandemic wave is the strongest under Scenario 1 and the total number of infections is the smallest under Scenario 3.
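A data-generating loop of this kind can be sketched as below. It assumes the binomial form for daily new cases and removals with infection probability $1 - \exp(-\beta_t I_{t-1}/N)$, and it uses piecewise-constant rates with hypothetical values rather than the scenario settings of the paper; `simulate_wave` and `binomial_draw` are illustrative names.

```python
import math
import random


def binomial_draw(rng, n, p):
    """Binomial(n, p) draw as a sum of Bernoulli trials (fine for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_wave(n, i0, betas, gammas, seed=0):
    """Simulate daily states (S, I, R) from a discrete-time stochastic SIR model."""
    rng = random.Random(seed)
    s, i, r = n - i0, i0, 0
    states = [(s, i, r)]
    for beta_t, gamma_t in zip(betas, gammas):
        p_infect = 1.0 - math.exp(-beta_t * i / n)  # assumed infection probability
        new_cases = binomial_draw(rng, s, p_infect)
        new_removed = binomial_draw(rng, i, gamma_t)
        s, i, r = s - new_cases, i + new_cases - new_removed, r + new_removed
        states.append((s, i, r))
    return states


# Two hypothetical stages of 20 days each: transmission drops and removal rises.
betas = [0.6] * 20 + [0.2] * 20
gammas = [0.05] * 20 + [0.15] * 20
states = simulate_wave(n=2000, i0=20, betas=betas, gammas=gammas, seed=1)
print(states[0], states[20], states[-1])
```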
Figure 3.
Average proportions of susceptible (S), infectious (I) and removed (R) individuals in the population at time t () under different scenarios.
Bayesian analysis
We apply the MCMC algorithm proposed in section “Methodology” with hyperparameter $p = 0.01$ to analyze simulated datasets. For each dataset, we discard the first 5000 iterations as burn-in and draw one posterior sample for every 10 iterations in the next 10000 iterations, leading to 1000 posterior samples in total.
As our main interest lies in the detection of change points, we first investigate the performance of the proposed Bayes estimator . We present the proportion of 100 simulated datasets where a change point is detected at time point t () in Fig. 4a–c. Under all three scenarios, the estimated change points are near the true ones, suggesting that the proposed Bayesian method can accurately locate the change points of a pandemic wave. Compared with the first two scenarios, the estimated change points under Scenario 3 are more concentrated around the truth. We also present the coverage probability of probability regions (i.e., HPD intervals) of change points for each t under different scenarios and different values of in Fig. 4d–f. The average length of HPD intervals is the shortest under Scenario 3, while intervals of the second and third change points sometimes intersect under Scenarios 1 and 2.
Figure 4.
(a–c) Proportions that a change point is estimated at time t () under different scenarios; (d–f) Coverage probability of the HPD intervals of change points at each t under different scenarios ( in red; in green; in blue).
To quantitatively measure the agreement between the true (or ) and the estimator (or ), we recast the stage allocation of time points as a clustering problem and adopt the adjusted Rand index [19] and the mutual information [20] as evaluation metrics.
- Adjusted Rand index (ARI):
where the proportions of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN) in stage allocation are given by
The ARI is at most 1 (and can fall below 0 when the agreement is worse than expected by chance), where a larger ARI suggests that is more similar to and the maximum possible value is obtained when .
- Mutual information (MI):
where
The range of MI is , where a larger MI suggests that is more similar to and the maximum possible value is obtained when . Under our three scenarios, the maximum possible value of MI is $\log 4 \approx 1.386$, because each of the four true stages contains one quarter of the time points.
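Both metrics can be computed from two stage index vectors as sketched below. The ARI is written in its standard pair-counting form in terms of TP, FP, FN and TN, and the MI uses natural logarithms; the function names and the convention that a "positive" pair is one placed in the same stage by the estimate are assumptions of this sketch.

```python
import math
from collections import Counter
from itertools import combinations


def pair_proportions(true, est):
    """Proportions of day pairs that are TP, FP, FN, TN for 'same stage'."""
    tp = fp = fn = tn = 0
    for a, b in combinations(range(len(true)), 2):
        same_true = true[a] == true[b]
        same_est = est[a] == est[b]
        if same_true and same_est:
            tp += 1
        elif same_est:
            fp += 1
        elif same_true:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    return tp / total, fp / total, fn / total, tn / total


def adjusted_rand_index(true, est):
    tp, fp, fn, tn = pair_proportions(true, est)
    denom = (tp + fp) * (fp + tn) + (tp + fn) * (fn + tn)
    if denom == 0:  # both partitions are trivial and identical
        return 1.0
    return 2.0 * (tp * tn - fp * fn) / denom


def mutual_information(true, est):
    n = len(true)
    joint = Counter(zip(true, est))
    count_true, count_est = Counter(true), Counter(est)
    return sum(
        (c / n) * math.log(c * n / (count_true[a] * count_est[b]))
        for (a, b), c in joint.items()
    )


truth = [1] * 3 + [2] * 3 + [3] * 3 + [4] * 3
shifted = [1] * 4 + [2] * 2 + [3] * 3 + [4] * 3  # first change point one day late
print(adjusted_rand_index(truth, truth), mutual_information(truth, truth))
print(adjusted_rand_index(truth, shifted), mutual_information(truth, shifted))
```

With four equally sized true stages, the MI of the truth with itself equals the entropy of the true partition, $\log 4$, which is the upper bound against which the MI values in Table 1 can be read.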
The performance of the Bayes estimator under different scenarios is summarized in the first row of Table 1. It is clear that our estimator can accurately estimate the underlying , especially under Scenario 3 with large changes of the transmission rate and removal rate between adjacent stages.
Table 1.
Average values of accuracy measures of the Bayes estimator with standard deviations in parentheses.
| ARI, Scenario 1 | ARI, Scenario 2 | ARI, Scenario 3 | MI, Scenario 1 | MI, Scenario 2 | MI, Scenario 3 |
|---|---|---|---|---|---|
| 0.855 (0.092) | 0.862 (0.094) | 0.960 (0.035) | 1.198 (0.107) | 1.206 (0.107) | 1.327 (0.049) |
| 0.954 (0.039) | 0.953 (0.037) | 0.977 (0.023) | 1.319 (0.054) | 1.316 (0.052) | 1.351 (0.035) |
| 0.625 (0.040) | 0.618 (0.042) | 0.709 (0.050) | 1.144 (0.057) | 1.146 (0.051) | 1.213 (0.058) |
We also vary the hyperparameter p to investigate how sensitive the performance of is to the choice of prior probability. Both the ARI and MI of are larger under than those under , but the gaps are not significant. This is because is closer to the true proportion of change points than 0.01, so true change points can be selected with more certainty, even though the signals of the true change points are still strong enough to be detected using . When , the ARI of becomes smaller due to more false positives in change point detection. In general, the performance of is stable as long as the prior probability p is not far larger than the true proportion of change points. A guideline for choosing p is to first inspect the naive estimators of the transmission rate and removal rate (as discussed in the next section) and then take the ratio of the apparent number of change points to the study length T.
Analysis of the Omicron wave in Singapore
At the beginning of 2022, Singapore witnessed the outbreak of the Omicron wave of COVID-19 pandemic. To investigate change points of the Omicron wave, we collect the daily reported number of confirmed cases and removed cases in Singapore from January 3 to April 21, 2022, during which 893854 confirmed cases and 453 deaths of COVID-19 were reported in days. Let be the population size. The daily reported number of confirmed and removed cases during the study period is presented in Fig. 5a.
Figure 5.
(a) The numbers of daily confirmed and removed COVID-19 cases, (b) the naive estimator of the time-varying transmission rate, and (c) the naive estimator of the time-varying removal rate during the Omicron wave in Singapore.
To gain insight into the trend of and , we compute the naive estimator of model (2),
As shown in Fig. 5b,c, the transmission rate was very high in the first several days and then became stable between January 18 and March 19, 2022. After March 19, the transmission rate decreased to almost zero. The removal rate was low before January 18, 2022. Between January 18 and March 23, the removal rate increased steadily; it then dropped sharply and rebounded suddenly because recovery cases were not reported from March 24 to April 1, 2022.
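One way to obtain day-by-day naive estimates is to invert the binomial means, as sketched below. This assumes that the infection probability has the form $1 - \exp(-\beta_t I_{t-1}/N)$ and that removals are binomial with probability $\gamma_t$; the function name `naive_rates` and the toy numbers are hypothetical, and the estimator used in the paper may differ in detail.

```python
import math


def naive_rates(n, s0, i0, new_cases, new_removed):
    """Per-day naive estimates of the transmission and removal rates.

    beta_t  = -log(1 - new_cases_t / S_{t-1}) / (I_{t-1} / N)
    gamma_t = new_removed_t / I_{t-1}
    Days on which the estimate is undefined are reported as None.
    """
    s, i = s0, i0
    betas, gammas = [], []
    for d_cases, d_removed in zip(new_cases, new_removed):
        if i > 0 and s > d_cases:
            betas.append(-math.log(1.0 - d_cases / s) / (i / n))
            gammas.append(d_removed / i)
        else:
            betas.append(None)
            gammas.append(None)
        s, i = s - d_cases, i + d_cases - d_removed
    return betas, gammas


# Toy counts, not the Singapore data.
betas_hat, gammas_hat = naive_rates(
    n=1000, s0=990, i0=10, new_cases=[99, 50], new_removed=[2, 20]
)
print(betas_hat)
print(gammas_hat)
```

Such estimates use a single day of data each, so they are noisy and react strongly to reporting gaps, which is consistent with the sharp drop and rebound in the removal rate described above.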
Although the naive estimator suggests the possible existence of change points in the study period, it cannot provide estimates of change points with a quantitative measure of uncertainty. We therefore apply the proposed MCMC algorithm with hyperparameter to draw posterior samples of parameters for Bayesian inference. We discard the first 5000 iterations as burn-in and draw one posterior sample for every 10 iterations in the next 10000 iterations, leading to 1000 posterior samples in total. The Bayes estimator is presented in Fig. 6a, with estimated posterior probabilities () and the corresponding probability regions of change points. Four change points are detected in the Omicron wave in Singapore. The probability regions of the third (March 23, 2022) and fourth (April 2, 2022) change points are shorter, while the uncertainties of the first two change points (January 28 and February 25, 2022) are greater. With the smoothed estimators of the expected transmission rate and expected removal rate on day t () shown in Fig. 6b,c, we can interpret the different stages as follows.
January 3 to January 27: The transmission rate was high while the removal rate was low, indicating that the Omicron wave had broken out and that medical resources in Singapore were not yet ready to cope with the Omicron variant.
January 28 to February 24: The transmission rate decreased slightly, indicating that people in Singapore had recognized the outbreak of the Omicron wave and begun self-protection activities. In addition, the removal rate increased to a high level as the Ministry of Health in Singapore began deploying medical resources to COVID-19 patients.
February 25 to March 22: The Omicron wave reached the peak and then passed it due to the sharp decrease in the transmission rate.
March 23 to April 1: An abnormal period was detected when recovery cases were not reported for several days, leading to a sharp drop in the estimated removal rate.
April 2 to April 21: The Omicron wave became stabilized and under control.
Figure 6.
The Bayes estimator of (a) and 95% probability regions (HPD intervals) for change points; (b) the expected transmission rate; and (c) the expected removal rate.
Conclusion
The stochastic SIR model with change points is developed to quantify the evolution of the Omicron wave of the COVID-19 pandemic. The proposed model assumes that the numbers of daily confirmed cases and removed cases follow binomial distributions and that the probability of a susceptible individual getting infected is determined by the proportion of currently infectious individuals and the time-varying transmission rate. We develop an MCMC algorithm to draw posterior samples and provide both the Bayes estimators and probability regions of change points, which are corroborated by experiments on simulated data. In the analysis of the Omicron wave in Singapore, four change points are detected, corresponding to the increase of medical resources, the decrease of the transmission rate, and the beginning and the end of an abnormal period with no removal cases reported. The proposed model is applicable to a pandemic wave of any infectious disease under the impact of societal changes. It is also possible to make inference on the spread of COVID-19 and compare the effect of disease-control policies in different countries, as long as daily numbers of both confirmed cases and removal cases are available. In addition, our model can be easily extended to incorporate other interpersonal contact patterns into disease transmission. Although our model (2) is also applicable to sequential (real-time) change point detection in an ongoing epidemic, the goals of real-time detection are to minimize both the detection delay and the false detections of change points [21]. Timely detection of changes in the transmission rate and removal rate is critically important for policy making and for identifying possible new variants of the virus. To accommodate such objectives, the construction of the posterior loss and of Bayes estimators of change points warrants further investigation.
Acknowledgements
We thank the two referees for their careful reviews of our manuscript. This research was partially supported by funding from the Research Grants Council of Hong Kong (17308321).
Author contributions
Both J.G. and G.Y. wrote and revised the manuscript, and J.G. conducted all experiments and drew all plots using R-3.6.1.
Data availability
All data generated and analyzed during this study are included in the supplementary file “Code.zip”.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-022-25473-y.
References
- 1. Lyngse FP, et al. Household transmission of the SARS-CoV-2 Omicron variant in Denmark. Nat. Commun. 2022;13:5573. doi: 10.1038/s41467-022-33328-3.
- 2. Pearson CAB, et al. Bounding the levels of transmissibility and immune evasion of the Omicron variant in South Africa. medRxiv. 2021. doi: 10.1101/2021.12.19.21268038.
- 3. Lu L, et al. Neutralization of severe acute respiratory syndrome coronavirus 2 omicron variant by sera from BNT162b2 or CoronaVac vaccine recipients. Clin. Infect. Dis. 2021. doi: 10.1093/cid/ciab1041.
- 4. Zhang L, et al. The significant immune escape of pseudotyped SARS-CoV-2 variant Omicron. Emerg. Microbes Infect. 2021;11:1–5. doi: 10.1080/22221751.2021.2017757.
- 5. Gozzi N, et al. Preliminary modeling estimates of the relative transmissibility and immune escape of the Omicron SARS-CoV-2 variant of concern in South Africa. medRxiv. 2022. doi: 10.1101/2022.01.04.22268721.
- 6. Viana R, et al. Rapid epidemic expansion of the SARS-CoV-2 Omicron variant in southern Africa. Nature. 2022;603:679–686. doi: 10.1038/s41586-022-04411-y.
- 7. Kermack WO, McKendrick AG. A contribution to the mathematical theory of epidemics. Proc. R. Soc. Lond. Ser. A Contain. Pap. Math. Phys. Character. 1927;115:700–721. doi: 10.1098/rspa.1927.0118.
- 8. van Wees J-D, et al. SIR model for assessing the impact of the advent of Omicron and mitigating measures on infection pressure and hospitalization needs. medRxiv. 2021. doi: 10.1101/2021.12.25.21268394.
- 9. Götz T. Analysis of an SIR-model with global and local infections. arXiv. 2022.
- 10. Ghosh K, Ghosh AK. Study of COVID-19 epidemiological evolution in India with a multi-wave SIR model. Nonlinear Dyn. 2022. doi: 10.1007/s11071-022-07471-x.
- 11. Khan MA, Atangana A. Mathematical modeling and analysis of COVID-19: A study of new variant Omicron. Physica A Stat. Mech. Appl. 2022;599:127452. doi: 10.1016/j.physa.2022.127452.
- 12. Dehning J, et al. Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions. Science. 2020;369:eabb9789. doi: 10.1126/science.abb9789.
- 13. Kim Y-J, Seo MH, Yeom H-E. Estimating a breakpoint in the pattern of spread of COVID-19 in South Korea. Int. J. Infect. Dis. 2020;97:360–364. doi: 10.1016/j.ijid.2020.06.055.
- 14. Dass SC, et al. A data driven change-point epidemic model for assessing the impact of large gathering and subsequent movement control order on COVID-19 spread in Malaysia. PLoS One. 2021;16:e0252136. doi: 10.1371/journal.pone.0252136.
- 15. Jiang S, Zhou Q, Zhan X, Li Q. BayesSMILES: Bayesian segmentation modeling for longitudinal epidemiological studies. J. Data Sci. 2021;19:365–389. doi: 10.6339/21-jds1009.
- 16. Perakis G, Singhvi D, Lami OS, Thayaparan L. COVID-19: A multiwave SIR-based model for learning waves. Prod. Oper. Manag. 2022. doi: 10.1111/poms.13681.
- 17. Erdős P, Rényi A. On random graphs I. Publ. Math. Debrecen. 1959;6:290. doi: 10.5486/PMD.1959.6.3-4.12.
- 18. Gilbert EN. Random graphs. Ann. Math. Stat. 1959;30:1141–1144. doi: 10.1214/aoms/1177706098.
- 19. Hubert L, Arabie P. Comparing partitions. J. Classif. 1985;2:193–218. doi: 10.1007/bf01908075.
- 20. Steuer R, Kurths J, Daub CO, Weise J, Selbig J. The mutual information: Detecting and evaluating dependencies between variables. Bioinformatics. 2002;18:S231–S240. doi: 10.1093/bioinformatics/18.suppl_2.s231.
- 21. Polunchenko AS, Tartakovsky AG. State-of-the-art in sequential change-point detection. Methodol. Comput. Appl. Probab. 2011;14:649–684. doi: 10.1007/s11009-011-9256-5.