PLOS One. 2026 Mar 6;21(3):e0341667. doi: 10.1371/journal.pone.0341667

Integrated disease model considering mutation-induced infection waves with COVID-19 cases

Seungho Baek 1,2, Haneol Cho 3, SangChul Lee 1, Myeongsu Yoo 4, Donghyok Kwon 4, KyuHwan Lee 1, Yeonju Kim 5,*, Chansoo Kim 1,2,*
Editor: Yury E Khudyakov 6
PMCID: PMC12965675  PMID: 41790946

Abstract

COVID-19, an unprecedented global pandemic, has caused successive waves that pose unique challenges to public health and epidemiological research. Traditional Susceptible–Infected–Recovered (SIR) models often struggle to capture these complex dynamics, especially given that the virus has spawned multiple sub-variants. To tackle these challenges, we adopt an empirical modeling approach by integrating real-world data from Our World in Data (daily confirmed COVID-19 cases) and GISAID (variant prevalence) into a newly proposed integrated model. Specifically, we sum multiple sigmoidal (logistic) curves, each representing the cumulative infections of a distinct dominant variant, and recalibrate the model whenever a new variant emerges. By incorporating variant-specific parameters, our framework effectively captures the biological and epidemiological characteristics of COVID-19 in a dynamic, data-driven manner. Most importantly, we employ the Pruned Exact Linear Time (PELT) algorithm to provide rigorous mathematical justification for when separate variant models can be legitimately summed: specifically, when variant dominance reaches approximately 50%. This establishes the theoretical foundation that separate logistic models can be additively combined under specific dominance conditions. We evaluate our model using Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). Our empirical findings confirm that integrating dominant variants is associated with markedly improved accuracy compared to a single-strain approach, based on data from thirteen countries (including South Korea, the United States, and the United Kingdom) and global aggregates. We also provide theoretical motivation for approximating SIR dynamics via logistic functions and discuss how this integrated framework can be refined to enhance predictive performance. 
In doing so, we demonstrate the feasibility and advantages of iteratively updating epidemiological models in response to emerging variants, thereby offering actionable insights for ongoing and future pandemic management.

Introduction

Since the outbreak of the novel coronavirus disease (COVID-19), which was first reported in Wuhan, China, in December 2019 [1–3], a vast array of mathematical and computational models have been developed to understand and predict the pandemic’s course. The canonical Susceptible–Infected–Recovered (SIR) model, described in Eq (1), served as a cornerstone in early analyses, offering insight into the basic dynamics of infectious diseases [4–8].

$$\frac{dS}{dt} = -\beta S I, \qquad \frac{dI}{dt} = \beta S I - \gamma I, \qquad \frac{dR}{dt} = \gamma I, \tag{1}$$

In the above formulation, S, I, and R represent the susceptible, infected, and recovered (or removed) populations, respectively; β is the transmission rate, and γ is the recovery rate. Various extensions to the SIR framework have been proposed to incorporate practical elements such as social distancing [4–7,9–11] and vaccination strategies [12–14], and these models have proved valuable for preparedness and risk management [15–18]. Despite these advances, many early predictions underestimated the pandemic’s duration, failing to account for ongoing waves of infection driven by newly emerging variants [19,20]. In particular, the Omicron variant exhibited a remarkable transmission rate, approximately threefold higher than previous strains, even among fully vaccinated populations [21–23], and drastically reduced vaccine effectiveness [14,24–26]. The repeated mutation of the SARS-CoV-2 virus has thus challenged the assumption of a single dominant strain, revealing limitations of traditional single-strain SIR models that do not explicitly incorporate variant dynamics [27–34]. Fig 1 shows how each distinct wave from late 2021 to mid 2022 closely aligns with the emergence of a specific variant (Delta, BA.1, BA.2, BA.5, and so forth).

Fig 1. Worldwide confirmed cases and variant ratios.

$I_d$ (left) is the number of global confirmed COVID-19 cases from October 2021 to August 2022; ϕ (right) is the variant ratio over the same period.

The emergence of virus mutations plays a significant role in driving new pandemic waves [35,36]. As shown in Fig 1, from October 2021 to August 2022, four distinct waves were observed, each moving in tandem with the emergence of a different variant of the virus. A single pandemic can therefore be viewed as a series of viral waves driven by successive mutations. Epidemiological models that are predicated on a single strain, or that do not include viral mutations, may not adequately express the dynamics of disease propagation. As viruses mutate over time and spread rapidly, the predictive power of the SIR model may decrease [27,37], since traditional SIR models do not take into account the effects of new variants [28–30,38]. In response to changes in the pandemic, many researchers expanded their models to a dynamic SIR model that accounted for vaccine uptake and effectiveness and new variants in estimating cases, hospitalizations, deaths, and the number of necessary beds [31,39].

In recent research, the subsequent waves of COVID-19 were represented through a combination of five wavelets [40], demonstrating superior results compared to predictions from the ARIMA model. Research by Kaxiras [29] added a 14-day cycle S-I-R model to represent sub-variants of COVID-19. However, these precedents lacked evidence for representing sub-variants with a single SIR or wavelet. Vasconcelos’s research [41] modeled the long-term waves of COVID-19 by summing single SIR models for each sub-variant and utilized time parameters to accurately represent peaks, though concerns of overfitting arose due to the SIR model having six parameters. Finally, Gerardo’s [28] study examined the modeling of sub-variants for Ebola and SARS infections but showed potential overfitting and insufficient biological evidence for simply adding peaks of various SIR models. In addition, Lazebnik [42] proposed a generalized multi-strain SIR framework that highlights theoretical challenges in modeling more than two competing pathogens. While analytically interesting, their model does not utilize real-world prevalence data, which limits its practical applicability to observed variant transitions. Here, we consider the question: if a bimodal approach is superior to a unimodal approach, why not fit all sub-variants of Omicron using a multimodal model? As mentioned, there are over 50 reported sub-variants of the Omicron virus. However, this approach has been previously explored, and fitting all these variants into a single model risks creating an unproductive overfitted model. In this research, we demonstrate through calculations that a model incorporating only two dominant variants can sufficiently describe the actual data. In this study, we approximate each variant’s SIR wave by a logistic function, and sum multiple logistic curves to capture successive waves. We present a model that effectively captures the ongoing waves by incorporating the characteristics of mutant viruses. 
Additionally, we establish the biological basis for this model, providing an interpretation of the continuous waves. This approach is expected to enable more accurate and realistic predictions of infectious disease spread and support the development of effective response strategies.

Materials and methods

Heuristic model

In this study, we employ the following logistic form as the foundational basis for modeling infectious diseases:

$$I_c(t) = \frac{a}{1 + \exp\{-c(t - b)\}}, \tag{2}$$

where $I_c(t)$ denotes the cumulative number of confirmed cases at time t. The parameter a represents the final epidemic size (i.e., the upper bound of the cumulative infected), b is the inflection point at which the epidemic growth is most rapid, and c is a rate parameter that governs the speed of infection spread. We fit Eq (2) to the observed cumulative cases for each dominant variant period using a least-squares approach (see Statistical analysis). This logistic equation can also be viewed as an approximation to the cumulative infections derived from a classical SIR framework under certain simplifying assumptions. Specifically, if we define $I_c'(t)$ as the rate of new infections, then under the approximation $S(t) \approx N$ (where N is the total population) we have

$$I_c'(t) = \frac{\beta S(t) I(t)}{N} \approx \beta I(t),$$

which, upon integration, yields a logistic-type solution for Ic(t) [28,29]. Hence, a, b, and c can be interpreted, respectively, as the eventual maximum of cumulative infections, the time at which growth transitions (inflection point), and the overall infection growth rate.
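
As a complementary check on this interpretation (a standard property of the logistic function, not a step taken from the source derivation), differentiating Eq (2) gives the daily incidence implied by the fitted curve. The sketch below shows that the curve satisfies the logistic growth equation and that new cases peak at t = b with height ac/4, which is why b marks the point of fastest growth in cumulative cases.

```latex
\begin{align}
\frac{dI_c}{dt}
  &= \frac{a\,c\,e^{-c(t-b)}}{\left(1 + e^{-c(t-b)}\right)^{2}}
   = c\,I_c(t)\left(1 - \frac{I_c(t)}{a}\right), \\
\left.\frac{dI_c}{dt}\right|_{t=b}
  &= \frac{a\,c}{4}, \qquad I_c(b) = \frac{a}{2}.
\end{align}
```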

However, real-world COVID-19 data often exhibit multiple waves due to the emergence of distinct variants over time. To capture such successive waves, we integrate two (or more) logistic terms, each corresponding to a dominant variant. For example, if two dominant variants drive consecutive waves, an integrated model can be written as:

$$I_c(t) = \frac{a_1}{1 + \exp\{-c_1(t - b_1)\}} + \frac{a_2}{1 + \exp\{-c_2(t - b_2)\}}. \tag{3}$$

Each logistic term models the cumulative infections of a specific variant with its own parameters $a_i$, $b_i$, and $c_i$. In the sections that follow, we show how this logistic function is fitted to the COVID-19 data and how each parameter (a, b, c) reflects the epidemic characteristics. We then extend the model by incorporating additional logistic terms to account for new dominant variants over time.
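
The summation in Eq (3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `logistic` and `integrated_cases` and the parameter values are hypothetical, chosen only to show how one term per dominant variant is added.

```python
import math

def logistic(t, a, b, c):
    """Cumulative cases of one variant wave, Eq (2)."""
    return a / (1.0 + math.exp(-c * (t - b)))

def integrated_cases(t, variants):
    """Sum of per-variant logistic curves, Eq (3), for any number of variants."""
    return sum(logistic(t, a, b, c) for (a, b, c) in variants)

# Hypothetical (a, b, c) triples for two consecutive waves.
variants = [(100_000.0, 30.0, 0.20), (250_000.0, 90.0, 0.15)]

for day in (0, 30, 90, 200):
    print(day, round(integrated_cases(day, variants)))
```

Adding a third dominant variant only requires appending another (a, b, c) triple to the list, which mirrors the extension described above.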

Data extraction and selection criteria

The heuristic model is based on daily COVID-19 confirmed case data from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University [43] and Our World in Data (OWiD) [44]. The study period spans from October 2021 to September 2022, encompassing four major variant transitions (Delta, BA.1, BA.2, BA.5). This period was selected to ensure sufficient temporal coverage for model validation while maintaining data reliability before the global scale-down of COVID-19 surveillance systems [45,46].

The data of the viral mutations in confirmed cases were obtained from GISAID [47], providing data that details the distribution of mutated viruses in various countries.

We selected thirteen countries (United States, United Kingdom, Japan, Australia, Canada, France, Israel, South Korea, Chile, Denmark, Germany, South Africa, Singapore) and global aggregates, giving fourteen cases in total, based on two criteria: (1) availability of consistent genomic sequencing data from GISAID, and (2) reliability of daily case reporting. Countries with known data quality issues (e.g., China) were excluded from the analysis.

Data were smoothed using a 7-day moving average to mitigate reporting fluctuations (e.g., weekend effects). For countries with weekly reporting schedules (France, and Canada and Australia after June 2022), linear interpolation was applied to estimate daily values before smoothing.
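
The preprocessing described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes weekly reports are given as (day, value) pairs, and it uses a trailing 7-day window (the source does not say whether the window is trailing or centered). The helper names are hypothetical.

```python
def interpolate_daily(weekly):
    """Linearly interpolate (day, value) reports onto every day in between."""
    daily = []
    for (d0, v0), (d1, v1) in zip(weekly, weekly[1:]):
        for d in range(d0, d1):
            daily.append(v0 + (v1 - v0) * (d - d0) / (d1 - d0))
    daily.append(float(weekly[-1][1]))
    return daily

def moving_average(values, window=7):
    """Trailing moving average; early days use the shorter window available."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Made-up weekly reports: interpolate to daily values, then smooth.
weekly = [(0, 0), (7, 700), (14, 2100)]
daily = interpolate_daily(weekly)
smoothed = moving_average(daily)
```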

Statistical analysis

Model parameters (a, b, c) in Eq (2) were estimated using nonlinear least squares optimization (Levenberg–Marquardt algorithm) implemented in Python’s SciPy library (version 1.7.3). Initial parameter values were set heuristically: the epidemic size a was initialized as the maximum cumulative cases in the observation window, the inflection point b as the midpoint of the observation period, and the growth rate c as 0.1.
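
A minimal sketch of this fitting step, assuming SciPy's `curve_fit` with `method="lm"` as the Levenberg–Marquardt entry point (the source names the library and algorithm but not the exact call). The data here are synthetic and the variable names are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, a, b, c):
    """Eq (2): cumulative confirmed cases at time t."""
    return a / (1.0 + np.exp(-c * (t - b)))

# Synthetic cumulative cases standing in for one dominant-variant period.
t = np.arange(120, dtype=float)
observed = logistic(t, 1.0e6, 60.0, 0.12)

# Heuristic starting values described in the text:
# a = maximum cumulative cases, b = midpoint of the window, c = 0.1.
p0 = [observed.max(), (t[0] + t[-1]) / 2.0, 0.1]

# method="lm" selects the Levenberg-Marquardt solver (unbounded problems only).
(a_hat, b_hat, c_hat), _ = curve_fit(logistic, t, observed, p0=p0, method="lm")
print(a_hat, b_hat, c_hat)
```

For the integrated model, the same call can be repeated with a two-term function and six starting values, one (a, b, c) triple per variant.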

To rigorously evaluate model performance, we employed three complementary error metrics: Mean Absolute Percentage Error (MAPE, Eq 5), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE):

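$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{4}$$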

where $y_i$ denotes observed values, $\hat{y}_i$ represents model estimates, and n is the number of observations. MAPE provides scale-independent comparison across countries, RMSE emphasizes larger deviations during peak periods, and MAE offers interpretable absolute error in case counts.

Mean absolute percentage error

In our study, we employed the Mean Absolute Percentage Error (MAPE) to evaluate the accuracy of our integrated model. MAPE is a standard measure of forecast accuracy in various analytical fields, quantifying the average absolute percentage difference between observed values and those predicted by the model. This measure is particularly beneficial for its interpretability as it presents errors as a percentage. The formula for MAPE is expressed as follows:

$$\mathrm{MAPE} = \left(\frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|\right) \times 100\% \tag{5}$$

where $y_i$ denotes the actual data values, $\hat{y}_i$ represents the estimates provided by the integrated model, and n is the total number of observations. This metric allows for a straightforward comparison of model performance across different datasets or modeling approaches.
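
The three error metrics in Eqs (4) and (5) translate directly into code. The sketch below uses plain Python and made-up numbers; the function names are illustrative, and MAPE assumes every observed value is nonzero.

```python
import math

def mape(y, y_hat):
    """Eq (5): mean absolute percentage error, in percent (requires y_i != 0)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Eq (4), left: root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y))

def mae(y, y_hat):
    """Eq (4), right: mean absolute error."""
    return sum(abs(a - p) for a, p in zip(y, y_hat)) / len(y)

observed = [100.0, 200.0, 400.0]   # made-up observations
fitted = [110.0, 180.0, 400.0]     # made-up model estimates

print(mape(observed, fitted), rmse(observed, fitted), mae(observed, fitted))
```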

Mathematical justification for additive model combination

The additive combination of multiple logistic models requires theoretical justification, particularly regarding the timing of model integration. Let $I_{\mathrm{total}}(t)$ denote the total cumulative incidence. When a single variant $V_1$ dominates ($\phi_{V_1} \approx 1$), the dynamics are well-approximated by $I_{\mathrm{total}}(t) \approx I_{V_1}(t)$. As a new variant $V_2$ emerges and gains prevalence, the total incidence transitions to a superposition:

$$I_{\mathrm{total}}(t) = I_{V_1}(t) + I_{V_2}(t) \tag{6}$$

The critical question is: at what prevalence threshold ϕ does this additive formulation become necessary?

Our empirical analysis using the PELT algorithm (Fig 3) identifies this change point at approximately ϕ ≈ 50% (ranging from 50% for BA.1 to 56% for BA.5). This aligns with the theoretical intuition that when a new variant comprises the majority of incident cases (ϕ > 50%), the epidemiological dynamics are no longer adequately captured by the declining wave of the previous variant alone; instead, the superposition of both variant-specific curves is required.

Fig 3. Model application limit point.

The graph displays the difference in error values between the two models for each country on the y-axis, represented as Δα. The prevalence of new variant strains is represented by ϕ on the x-axis. As a new variant emerges and triggers a new wave, the single model increasingly fails to fit the data. This leads to a sharp rise in its error compared to the integrated model. The average Δα across the nine cases is represented by a bold red line. The green vertical line indicates the abrupt change point in the time series data (the red line), as detected by the PELT algorithm.

This additive formulation assumes that infections from different variants can be treated as largely independent during their respective dominance periods. During the early-to-mid pandemic phase studied (October 2021 to September 2022), each major variant exhibited significant antigenic differences from its predecessors, and population immunity remained limited, supporting this independence assumption. The implications and limitations of this assumption are discussed in the Limitations section.

PELT algorithm

The PELT (Pruned Exact Linear Time) algorithm was employed to detect changes in the accuracy of our model. PELT is an advanced change point detection method that identifies shifts in statistical properties within a dataset. The algorithm optimizes the detection process by minimizing a cost function, which combines a measure of fit (such as the residual sum of squares) and a penalty term for the number of change points. The cost function is given by:

$$C(t) = \sum_{i=1}^{t}\left(y_i - \hat{y}_i\right)^{2} + \beta K \tag{7}$$

where C(t) is the cost up to time t, $y_i$ are the observed values, $\hat{y}_i$ are the predicted values, β is a penalty constant determined via the Bayesian Information Criterion (BIC) (not to be confused with the transmission rate β in Eq (1)), and K is the number of change points. The algorithm prunes unnecessary calculations to achieve linear time complexity, ensuring efficiency even with large datasets. In our study, we applied PELT to identify significant change points in the accuracy of the model over time. By comparing the MAPE values before and after the detected change points, we demonstrated that the integrated model significantly outperformed the single model after the change point.
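
To make the pruning idea concrete, here is a compact pure-Python sketch of PELT for a piecewise-constant mean with a squared-error segment cost. It is an illustration under assumptions, not the authors' implementation (their procedure is given in S6 Text): the penalty is passed in directly rather than derived from BIC, the minimum segment length is one point, and the input series is made up.

```python
def pelt(signal, penalty):
    """Change points of a piecewise-constant mean via PELT with squared-error cost."""
    n = len(signal)
    s1, s2 = [0.0], [0.0]                      # prefix sums of y and y^2
    for y in signal:
        s1.append(s1[-1] + y)
        s2.append(s2[-1] + y * y)

    def cost(i, j):                            # residual sum of squares of signal[i:j]
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [-penalty] + [0.0] * n              # best[t]: optimal cost of signal[:t]
    last = [0] * (n + 1)                       # last[t]: start of the final segment
    candidates = [0]
    for t in range(1, n + 1):
        scores = [best[s] + cost(s, t) + penalty for s in candidates]
        k = min(range(len(scores)), key=scores.__getitem__)
        best[t], last[t] = scores[k], candidates[k]
        # Pruning step: drop segment starts that can never be optimal again.
        candidates = [s for s, v in zip(candidates, scores) if v - penalty <= best[t]]
        candidates.append(t)

    points, t = [], n
    while last[t] > 0:                         # backtrack through segment starts
        points.append(last[t])
        t = last[t]
    return sorted(points)

# Made-up accuracy-gap series: flat at first, then a jump at index 20.
series = [0.5 + 0.1 * ((i % 3) - 1) for i in range(20)] \
       + [8.0 + 0.1 * ((i % 3) - 1) for i in range(20)]
change_points = pelt(series, penalty=5.0)
```

A larger penalty yields fewer change points, so the choice of β controls how abrupt a shift must be before it is reported.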

Results

Comparative evaluation of integrated and single models

Our integrated model is designed to capture the unique epidemiological properties of each variant by fitting separate SIR models (approximated by a logistic function) to the data from the dominance period of each variant. These individual models are then combined to create a comprehensive representation of the pandemic’s trajectory over time. To evaluate the performance of our integrated model, we conducted a comprehensive analysis using data from fourteen cases: the United States, the United Kingdom, Japan, Australia, Canada, France, Israel, South Korea, Chile, Denmark, Germany, South Africa, Singapore, and global data. Accuracy was measured using the Mean Absolute Percentage Error (MAPE), a widely used metric for assessing the accuracy of models against non-negative actual time-series data [48]. Additionally, RMSE and MAE were computed to provide complementary perspectives on model accuracy (S1 Table). We focused on the four most prevalent variants in the study period: Delta and the Omicron subvariants BA.1, BA.2, and BA.5. In all cases, our integrated model outperformed the single SIR model in its ability to capture the observed dynamics of the pandemic. To illustrate these accuracy improvements, we present examples in Fig 2 that showcase diverse epidemiological contexts. This improved performance was evident in the model’s ability to better fit the data during periods of transition between dominant variants, as well as in its overall accuracy throughout the study period.

Fig 2. Integrated and single models for variants in Worldwide, South Korea, the US, the UK and Japan.

Each figure’s bar chart represents daily confirmed cases. The black dashed line indicates the estimates from the single model, and the colored segments represent the estimates from the integrated model, which are the sum of the two variant viruses.

The error reduction rates shown in Table 1 exhibit varying levels of improvement across different regions as new variants emerge. Globally, the reduction rates increased steadily from 12.89%p (Delta–BA.1) to 19.84%p (BA.1–BA.2) and 25.46%p (BA.2–BA.5). In Korea, notable improvement was observed between BA.2 and BA.5 (42.24%p), whereas the reduction was more modest (3.51%p) between BA.1 and BA.2, reflecting the temporal overlap of these variants within a two-week interval. The USA showed balanced improvements across all variant transitions, ranging from 22.88%p to 25.49%p. The UK exhibited significant gains, especially during the later transitions (43.91%p for BA.2–BA.5). Japan displayed strong performance in both early (Delta–BA.1: 49.15%p) and later transitions (BA.2–BA.5: 49.23%p). In France, the smallest reduction (5.03%p) was noted initially, but larger gains (62.65%p) appeared from BA.2 to BA.5. Australia similarly demonstrated considerable improvement (49.29%p) for the BA.2–BA.5 transition. Canada recorded a higher error reduction (31.44%p) for the BA.1–BA.2 transition, while Israel displayed its largest improvement (58.33%p) in the same transition. Overall, these findings underscore the consistent benefit of incorporating emerging variants into the model, even though the magnitude of improvement varies according to local epidemiological and demographic factors. Additionally, we extended our analysis to five more countries—Chile, Denmark, Germany, South Africa, and Singapore—and present those results in Supporting Information. These supplementary findings confirm similar patterns of improvement, further supporting the robustness of our integrated model.

Table 1. Error reduction rates (%p) for each case (MAPE). Additional error metrics (RMSE, MAE) are presented in S1 Table.

Case              Delta–BA.1   BA.1–BA.2   BA.2–BA.5
World                  12.89       19.84       25.46
Korea (kr)             21.41        3.51       42.24
USA (us)               25.49       24.36       22.88
UK (uk)                31.34       35.94       43.91
Japan (jp)             49.15       18.34       49.23
France (fr)             5.03       30.64       62.65
Australia (au)         36.56       15.70       49.29
Canada (ca)            11.45       31.44       22.92
Israel (is)            35.52       58.33       34.96

Model fit assessments and applying criteria by PELT algorithm

We computed the model’s error using MAPE for three cases: the emergence of the BA.1, BA.2, and BA.5 variants. MAPE measures the mean percentage error between the fitted and observed values, indicating how much the model deviates from the actual data. As a new variant emerges and triggers a new wave, the error of the single model grows, rising sharply relative to that of the integrated model. In Fig 3, we denote the difference in MAPE between the single model and the integrated model as Δα. As the prevalence ϕ of a new variant increases, the rising Δα indicates that the error of the single model increases relative to that of the integrated model. We applied the PELT algorithm to the red line, which represents the average of all Δα values, to identify the point where the error of the single model increases most sharply. Fitting a single model beyond a prevalence of approximately 50% for BA.1, 54% for BA.2, and 56% for BA.5 resulted in a sharp increase in error. Therefore, when the prevalence of a new variant exceeds about 50% (that is, when it becomes dominant), it is essential to use a model that accounts for the variant rather than relying on a standard SIR model.
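
The switching rule described above can be sketched as follows. Two points are assumptions rather than statements from the source: the sign convention Δα = MAPE(single) − MAPE(integrated), chosen so that Δα rises as the single model degrades, and the helper names and prevalence values, which are purely illustrative.

```python
def delta_alpha(mape_single, mape_integrated):
    """Error gap in percentage points; positive when the integrated model fits better."""
    return mape_single - mape_integrated

def first_dominance_day(phi, threshold=0.5):
    """Index of the first day the new variant's share reaches the threshold, else None."""
    for day, share in enumerate(phi):
        if share >= threshold:
            return day
    return None

# Made-up daily prevalence of an emerging variant (fraction of sequenced cases).
phi = [0.02, 0.10, 0.31, 0.48, 0.55, 0.72, 0.90]
switch_day = first_dominance_day(phi)
```

From `switch_day` onward, a second logistic term would be added to the fit; before it, the single-variant curve is retained.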

Sensitivity analysis

To assess the robustness of our findings, we conducted sensitivity analyses on key modeling parameters.

Dominance threshold. Based on the error improvement patterns shown in Fig 3, we examined model performance across different prevalence thresholds for integrating additional variant components. The integrated model consistently outperforms the single model across thresholds from approximately 40% to 60%. The PELT algorithm identified thresholds around 50% (BA.1), 54% (BA.2), and 56% (BA.5), all falling within this optimal range, confirming that our empirically-derived criterion is robust to reasonable variations in threshold selection.

Error metrics. The superiority of the integrated model was confirmed across all three metrics (MAPE, RMSE, MAE), with consistent patterns of improvement observed both globally and across individual countries (S1 Table). This consistency across metrics with different mathematical properties strengthens confidence in the robustness of our findings.

Discussion

Summary

We have demonstrated that emerging and dominant variants are significantly associated with the epidemiological waves of COVID-19. Incorporating these variants into one integrated model markedly enhances performance relative to a single-strain SIR approach. Using data from various countries (e.g., South Korea, the United Kingdom), we confirmed that variant-specific modeling captures the real-world trajectory more accurately. Moreover, we found that a new wave generally appears every 2–3 months when a new variant emerges, suggesting that policymakers can utilize this interval to design and implement public health strategies.

Country-specific case analysis

In Fig 2, while variations exist across different regions, an important observation is that Omicron sublineages often manifest a new peak every two to three months. This timing allows local authorities to respond proactively. Whenever major variants (Delta, BA.1, BA.2, BA.5, etc.) were dominant, our integrated model provided superior fits compared to a single logistic curve. Significant improvements occurred when the timing of variant emergence was clearly distinct, producing bimodal or multimodal distributions in the data. For example, the worldwide shift from BA.2 to BA.5, as well as the United Kingdom’s shifts from BA.1 to BA.2 and from BA.2 to BA.5, yielded error reductions of 25.46, 35.94, and 43.91 percentage points, respectively (Table 1). Conversely, in South Korea, where the development periods of BA.1 and BA.2 overlapped within a two-week interval, the wave resembled that of a single dominant strain, resulting in only modest improvement (a 3.51 percentage-point reduction). Additionally, there are special cases, such as the shifts from Delta to BA.1 in South Korea and Japan, where stringent public health measures initially suppressed the pandemic, delaying the first peak until BA.1 eventually spread widely. Despite this suppression, our integrated model still captured the subsequent sharp increase, confirming its robustness against such irregular wave patterns. From these findings, we conclude that clearly separated variant waves yield the greatest enhancement in modeling accuracy. Based on Fig 3, focusing on new variants once they reach about 50% prevalence is sufficient to reflect real data without overfitting. Minor sub-variants only marginally affect the model estimates and do not undermine the approach.

In the realm of infectious disease modeling, conventional models such as SIR often encounter challenges when applied to pathogens like SARS-CoV-2, which exhibit frequent mutations and recurrent waves. While combining single heuristic models may appear straightforward, such an approach can risk overfitting and lacks solid biological justification. Previous work has introduced parameters for reinfection [49] or viewed recurrent waves as a train of smaller wavelets [29]. Although these attempts capture some nuances, our model explicitly tracks dominant variants, which allows for improved interpretability and offers enhanced predictive performance in contexts where variant-driven dynamics are prominent. Compared to Lazebnik’s multi-strain SIR framework [42], which provides theoretical insights but lacks real-world prevalence data integration, our approach directly incorporates observed variant frequencies from GISAID. Similarly, while Vasconcelos’s summed SIR approach [41] risks overfitting with six parameters per wave, our logistic approximation achieves comparable flexibility with only three parameters per variant, reducing overfitting risk while maintaining biological interpretability. A comprehensive quantitative comparison with SEIR models and machine learning methods remains an important direction for future research.

Future improvements

Our analysis deals with a period from October 2021 to September 2022. This cutoff reflects data availability constraints rather than methodological limitations. Following the decline of the pandemic emergency phase, global COVID-19 surveillance systems were substantially reduced: the European Centre for Disease Prevention and Control (ECDC) discontinued non-EU/EEA data collection in June 2022 [45]; the World Health Organization declared in August 2023 that daily reporting was no longer required [46]; and the U.S. Centers for Disease Control and Prevention transitioned multiple surveillance systems following the end of the public health emergency in May 2023 [50]. These changes resulted in inconsistent data quality after mid-2022, precluding reliable model extension. Nevertheless, our study period captures four distinct variant-driven waves, providing sufficient temporal coverage for demonstrating the utility of our integrated approach.

Our integrated model can be fitted quickly to data from diverse regions and can be extended to account for additional factors such as super-spreader events, policy relaxations, or changes in vaccine efficacy. In subsequent studies, extending this methodology to alternative models could enhance forecasting reliability and offer deeper insights into multi-variant disease dynamics under various real-world scenarios [51–53]. Additionally, incorporating explicit vaccination dynamics, cross-immunity parameters, and real-time variant surveillance data could improve the model’s applicability to endemic-phase COVID-19 and other respiratory pathogens.

Conclusion

In this study, we demonstrated that incorporating variant-specific data into an integrated SIR model significantly enhances its performance in representing the dynamics of COVID-19 outbreaks. Our analysis confirms that the emergence and dominance of new variants are closely associated with the epidemiological waves observed during the pandemic. The integrated model effectively captures these dynamics, especially when the timing of variant emergence is distinct, resulting in improved accuracy over traditional single-strain models. Our findings highlight that the integrated model excels when the periods of variant dominance are temporally separated, allowing for clear bimodal distributions. This approach reduces MAPE by more than 40 percentage points in several transitions with clearly separated variant emergence (Table 1), underscoring the importance of accurately modeling distinct waves of infections. Conversely, in scenarios where variants emerge and overlap in time, such as in South Korea, the model still shows improvement, albeit modest, demonstrating its robustness even under less ideal conditions. Comparisons with prior works suggest that indiscriminate aggregation of multiple SIR curves (or multi-wave heuristics) without considering biological evidence can lead to inaccuracies. Our integrated approach, by focusing on dominant variants and avoiding unnecessary complexity, addresses these limitations and provides a more biologically grounded framework. Moreover, the model’s adaptability and ease of application to various regions and data sets make it a valuable tool for ongoing and future pandemic response strategies. Overall, this study contributes to the evolving landscape of infectious disease modeling by offering a more precise and adaptable approach to understanding and estimating the spread of COVID-19 and its variants. The integrated model not only improves our comprehension of the pandemic’s trajectory but also supports the development of targeted and effective public health interventions. 
Future work should focus on expanding the model’s application to a wider range of scenarios and incorporating additional data sources to further refine its predictive capabilities.

Supporting information

S1 Text. Integrated and Single Models for Variants in Australia, Canada, France and Israel.

Each figure’s bar chart represents daily confirmed cases. The black dashed line indicates the estimates from the single model, and the colored segments represent the estimates from the integrated model, which are the sum of the two variant viruses.

(PDF)

pone.0341667.s001.pdf (311.8KB, pdf)
S2 Text. Integrated and Single Models for Variants in Chile, Denmark, Germany, South Africa and Singapore.

Each figure’s bar chart represents daily confirmed cases. The black dashed line indicates the estimates from the single model, and the colored segments represent the estimates from the integrated model, which are the sum of the two variant viruses.

(PDF)

pone.0341667.s002.pdf (376.4KB, pdf)
S3 Text. Residual time series for single and integrated models in global total and the United States.

Each panel shows residuals (observed minus fitted daily cases) for the single logistic model (dashed lines) and the integrated model (solid lines) across three variant transitions (Delta–BA.1, BA.1–BA.2, and BA.2–BA.5). Columns correspond to variant transitions and rows correspond to regions (World and USA).

(PDF)

pone.0341667.s003.pdf (309.2KB, pdf)
S4 Text. Comparison of error metrics across countries.

MAPE, RMSE, and MAE values for the single model and integrated model across all study countries and variant transitions.

(PDF)

pone.0341667.s004.pdf (66.9KB, pdf)
S5 Text. Error reduction rates for additional countries.

MAPE improvement (%p) for Chile, Denmark, Germany, South Africa, and Singapore across three major variant transitions (Delta–BA.1, BA.1–BA.2, BA.2–BA.5).

(PDF)

pone.0341667.s005.pdf (72.4KB, pdf)

S6 Text. Algorithm of Detection of structural transition in accuracy improvement.

Detailed algorithm for applying PELT to averaged accuracy improvement curves.

(PDF)

pone.0341667.s006.pdf (126.2KB, pdf)

Data Availability

All data and analysis code used in this study are publicly available. The analysis code and processed datasets are deposited at GitHub: https://github.com/alcyonesha/COVID19_variant_wave_model. Raw COVID-19 case data were obtained from Our World in Data (https://ourworldindata.org/). Viral variant data were obtained from GISAID (https://gisaid.org/); access requires registration and acceptance of the GISAID Database Access Agreement. We gratefully acknowledge the contributors of sequences to GISAID for making viral genomic data openly available.

Funding Statement

This research and the authors (SB, HC, SL, and CK) are supported by research grants Nos. RS-2023-00262155, 2024-00339583, 2024-00460980, and 2025-02304717 (IITP), funded by the Korea government (Ministry of Science and ICT).

Articles from PLOS One are provided here courtesy of PLOS
