American Journal of Epidemiology. 2025 Jun 6;194(9):2489–2498. doi: 10.1093/aje/kwaf119

Inferring temporal trends of multiple pathogens, variants, subtypes or serotypes from routine surveillance data

Oliver Eales 1,2, Saras M Windecker 3, James M McCaw 4,5, Freya M Shearer 6,7
PMCID: PMC12799599  PMID: 40481640

Abstract

Estimating the temporal trends in infectious disease activity is crucial for monitoring disease spread and the impact of interventions. Surveillance indicators routinely collected to monitor these trends are often a composite of multiple pathogens. For example, “influenza-like illness”—routinely monitored as a proxy for influenza infections—is a symptom definition that could be caused by a wide range of pathogens, including multiple subtypes of influenza, SARS-CoV-2, and RSV. Inferred trends from such composite time series may not reflect the trends of any one of the component pathogens, each of which can exhibit distinct dynamics. Although many surveillance systems routinely test a subset of individuals contributing to a surveillance indicator—providing information on the relative contribution of the component pathogens—trends may be obscured by time-varying testing rates or substantial noise in the observation process. Here we develop a general statistical framework for inferring temporal trends of multiple pathogens from routinely collected surveillance data. We demonstrate its application to three different surveillance systems covering multiple pathogens (influenza, SARS-CoV-2, dengue), locations (Australia, Singapore, USA, Taiwan, UK), scenarios (seasonal epidemics, non-seasonal epidemics, pandemic emergence), and temporal reporting resolutions (weekly, daily). This methodology is applicable to a wide range of pathogens and surveillance systems.

Keywords: influenza subtypes, SARS-CoV-2 variants, dengue serotypes, Bayesian time-series analysis, statistical modeling, pathogen dynamics

Introduction

The time series of infection incidence is an important metric for monitoring the spread of an infectious disease and the impact of interventions. The true time series of infection incidence cannot be measured; to do so would require that all infections (including asymptomatic and mild infections) were detected and their timing known. Instead, epidemic dynamics are inferred using the time series of quantities more amenable to surveillance that are expected to correlate with infection levels (eg, the daily number of symptomatic cases).1 However, these surveillance indicators often reflect trends in a composite of multiple pathogens, variants, subtypes, or serotypes that each exhibit their own distinct dynamics2-4 (hereafter composite time series). For example, the surveillance of respiratory pathogens such as influenza relies on the monitoring of “influenza-like illness” (ILI) and “acute respiratory infections” (ARI). Both surveillance indicators are based on individuals reporting to healthcare with symptoms, which could be caused by (infection with) a wide range of pathogens (eg, SARS-CoV-2, influenza, RSV), including multiple influenza types and subtypes.5,6 Additionally, even when a surveillance indicator is only composed of confirmed infections (ie, cases), it may contain multiple distinct signals. For example, SARS-CoV-2 case time series have often been composed of multiple highly (genetically) distinct variants7; and dengue virus infections can be caused by any of four dengue serotypes.

Inferring epidemic dynamics from composite time series, under the assumption that they correlate with the infection levels of only a single pathogen, can lead to biased results. For example, during a period of variant replacement (eg, the replacement of the SARS-CoV-2 Alpha variant during the emergence of the Delta variant), the incidence of one variant may be increasing, while that of the other declines. Analysis of the combined signal would obscure the risk of a future resurgence in infections. Disease prediction could be improved—enhancing the effectiveness of public health responses—if the epidemic dynamics of individual pathogens could be disentangled5 from a composite time series.

Many countries collect additional data on component pathogens that contribute to the composite time series. For example: individuals presenting with ILI may be tested for influenza with positive cases typed and subtyped; SARS-CoV-2 cases may be sequenced and the variant determined; and laboratory confirmed dengue virus infections may be serotyped. These data are currently underutilized in public health surveillance reports; the time series of component data is regularly presented separately from the composite time series, as a stacked bar chart.8,9 Although this may allow visual inspection of trends in the pathogen represented by the bottom bar, it obscures trends of the other “stacked” pathogens. Additionally, virological testing rates can vary over time, obscuring temporal trends in infections, and when testing rates are low, substantial noise will be present in daily or even weekly data, hampering visual interpretation. If statistical methods existed for combining composite time series with data on the component pathogens, the individual dynamics of each pathogen could be inferred and visualized. For example, during the COVID-19 pandemic models were developed to infer the relative-growth trends of novel variants of concern, particularly during their emergence.3,10,11 Additionally, some mechanistic models have been fitted to time series of ILI and testing data simultaneously.12,13 However, these models have often been disease and context specific and so may not be generalizable.

We present a general methodological framework for inferring the epidemic dynamics of multiple pathogens, variants, subtypes or serotypes from routinely collected pathogen surveillance data. Individual epidemic curves of each pathogen are modeled as distinct stochastic processes and fit to the component time series, which describe the relative contribution of each pathogen. Simultaneously, the sum of all epidemic curves is fit to the composite time series. The methodological framework is designed to be applicable to a broad range of pathogens and surveillance systems. Here we apply it to data from three distinct pathogen surveillance systems to: (1) infer the dynamics of influenza subtypes in Australia, Singapore, and the United States of America (USA) from 2012 to 2024; (2) quantify the role of SARS-CoV-2 variants in driving epidemic dynamics from 2020 to 2023 in the United Kingdom (UK); and (3) describe the contribution of dengue virus serotypes to dengue dynamics in Taiwan (province of China) from 2006 to 2016 and during a recent large outbreak in late 2023. For each of these case studies, we present the inferred temporal trends of the composite time series and all component pathogens. Our analysis provides greater insight into the epidemiological dynamics of the case study pathogens, and demonstrates how our methodological framework could be widely used to augment public health surveillance reporting.

Methods

Statistical modeling framework

We extend an existing Bayesian statistical modeling framework for inferring the trends of up to two pathogens from infection prevalence data.3,14 The modeling framework has been extended in this study to handle any number of pathogens; fit to time series data of counts (eg, daily number of cases); incorporate influenza testing data in which the subtype for influenza A samples may be undetermined; account for day-of-the-week effects in daily data; include options for fitting penalized splines or random walks; support additional (optional) correlation structures in the parameters describing the smoothness of the penalized splines (or random walks); and account for additional (optional) sources of noise in the observation process. The model is described in full in the Supplementary Methods (Appendix S1). The model takes as input:

  1. a time series of count data (eg, the daily number of influenza-like illness cases) where it is expected that the signal is composed of multiple pathogens (ie, the composite time series); and

  2. a time series of count data describing the number of tests positive for each component pathogen of interest and the number of negative tests where applicable (ie, the component time series).
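The link between these two inputs and the per-pathogen curves can be sketched as follows. This is only an illustration of the deterministic structure described in the text (the expected composite is the sum of the component curves, and the expected share of positive tests for a pathogen is its curve divided by that sum). The function name, the dictionary layout, and the numbers are assumptions made for illustration; the actual likelihoods and noise terms are those given in Appendix S1 and are not reproduced here.

```python
def expected_observations(component_curves):
    """Map per-pathogen expected counts to the two quantities the data inform.

    component_curves: dict of pathogen name -> list of expected counts per
    time step (hypothetical layout, chosen for this sketch only).
    Returns (expected composite counts, expected proportion per pathogen).
    """
    names = list(component_curves)
    n_steps = len(component_curves[names[0]])
    # Expected composite time series: sum of all component curves.
    total = [sum(component_curves[k][t] for k in names) for t in range(n_steps)]
    # Expected relative contribution of each pathogen at each time step.
    proportions = {
        k: [component_curves[k][t] / total[t] if total[t] > 0 else 0.0
            for t in range(n_steps)]
        for k in names
    }
    return total, proportions


# Illustrative (made-up) curves for two components over three weeks.
curves = {"H3N2": [10.0, 30.0, 20.0], "unknown": [90.0, 70.0, 80.0]}
total, props = expected_observations(curves)
```

In this toy example the composite stays flat while the H3N2 share changes, which is the kind of signal a composite-only analysis would miss.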

The model then estimates the expected value of the time series (eg, a smoothed trend in the daily number of cases accounting for noise) for each individual pathogen. The model is non-parametric, modeling the epidemic curves of each pathogen as distinct stochastic processes: random walks for weekly data (ILI) and penalized splines for daily data (dengue and SARS-CoV-2). For the penalized-spline implementation, we placed adjacent knots five days apart. This has been demonstrated to be an appropriate choice when fitting to SARS-CoV-2 infection prevalence time series14 as it captures most of the dynamical effects (fine enough temporal resolution) while not being too computationally expensive, the main benefit of using penalized splines over random walks. From the modeled trend for each pathogen, we can estimate the pathogen's growth rate and effective reproduction number, $R_t$, over time.14 By comparing the modeled trends of two pathogens, we can estimate the relative growth rate advantage (additive) and $R_t$ advantage (multiplicative) over time.3 All code is publicly available (see Data and code availability).
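As a rough illustration of how a growth rate and an additive growth rate advantage follow from modeled trends, the sketch below uses a simple finite difference of the log trend. This is an assumption made for illustration: the paper's estimates come from the fitted stochastic processes (see Appendix S1 and references 3 and 14), and the function names and example curves here are hypothetical.

```python
import math


def growth_rate(trend, dt=1.0):
    """Finite-difference growth rate: r_t = (log I_{t+1} - log I_t) / dt."""
    return [(math.log(b) - math.log(a)) / dt for a, b in zip(trend, trend[1:])]


def growth_advantage(trend_a, trend_b, dt=1.0):
    """Additive growth rate advantage of pathogen A over pathogen B."""
    return [ra - rb
            for ra, rb in zip(growth_rate(trend_a, dt), growth_rate(trend_b, dt))]


# Made-up trends: an emerging variant growing at 0.10/day and a resident
# variant declining at 0.05/day.
emerging = [1.0 * math.exp(0.10 * t) for t in range(5)]
resident = [100.0 * math.exp(-0.05 * t) for t in range(5)]
advantage = growth_advantage(emerging, resident)
```

Applying the same functions to each posterior draw of the modeled trends, rather than to a single curve, would give a distribution for the growth rate and the advantage.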

Data

We applied the statistical modeling framework to datasets from three different pathogen surveillance systems. We demonstrate the broad applicability and utility of the methodology by inferring the temporal dynamics of all component pathogens for each case study, highlighting key methodological features and epidemiological insights. We investigate: how sampling rates for component pathogens affect inferred trends in their temporal dynamics (influenza-like illness case study); how important epidemic signals can be identified by incorporating component pathogen data in real-time analysis (SARS-CoV-2 case study); and how the posterior distribution of the model for previous seasons can inform estimates of temporal dynamics in subsequent seasons (dengue case study). For all three pathogens, we report the epidemic dynamics as inferred from the case data, while noting that we are unable to validate against the underlying infection dynamics due to the absence of data on infections (eg, infection prevalence studies15).

Influenza-like illness

Influenza-like illness data were retrieved from the World Health Organization’s Global Influenza Programme.16 We collated weekly data for Australia, Singapore, and the USA from the week starting January 2, 2012 to the week starting December 25, 2023 inclusive. The data described: (1) the weekly number of cases of influenza-like illness; and (2) the weekly number of specimens positive for influenza by subtype (and the number of negative tests). We grouped the influenza specimens into: influenza A subtype not determined; influenza A H3N2; influenza A H1N1 (influenza A H1N1, influenza A H1N1pdm09); influenza B (influenza B Yamagata, influenza B Victoria, influenza B lineage not determined); and negative tests (Figures S1-S3). Negative tests likely reflect individuals infected with other ILI-causing pathogens or other unknown causes which we do not have data to determine. We fit the statistical model to these weekly data assuming that the ILI case time series was composed of cases of influenza A H3N2, influenza A H1N1, influenza B, and “unknown”.
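The specimen grouping described above can be expressed as a simple lookup. The raw labels on the left are hypothetical (they are not the column names of the source dataset); only the five target groups are taken from the text.

```python
from collections import Counter

# Hypothetical raw labels -> groups named in the text.
SUBTYPE_GROUP = {
    "A_H1N1": "influenza A H1N1",
    "A_H1N1pdm09": "influenza A H1N1",
    "A_H3N2": "influenza A H3N2",
    "A_not_subtyped": "influenza A subtype not determined",
    "B_Yamagata": "influenza B",
    "B_Victoria": "influenza B",
    "B_lineage_not_determined": "influenza B",
    "negative": "negative",
}


def group_week(raw_counts):
    """Aggregate one week of raw specimen counts into the grouped categories."""
    grouped = Counter()
    for label, n in raw_counts.items():
        grouped[SUBTYPE_GROUP[label]] += n
    return dict(grouped)


# Made-up counts for a single week.
week = {"A_H1N1pdm09": 12, "A_H1N1": 1, "A_H3N2": 7, "A_not_subtyped": 4,
        "B_Victoria": 3, "B_lineage_not_determined": 2, "negative": 71}
grouped = group_week(week)
```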

SARS-CoV-2 cases

SARS-CoV-2 case data for the United Kingdom were retrieved from the United Kingdom Health Security Agency’s data dashboard.17 We downloaded the daily SARS-CoV-2 case numbers by specimen date for the United Kingdom from 2020 to 2022.

Additionally, we downloaded data describing the daily number of variants detected by collection date7 (Figure S4). The data classified all sequences based on “major lineage calls”. We considered 11 groupings of the variants based on their major lineage calls: B.1.177; B.1.1.7 (Alpha variant); B.1.617.2 (Delta variant); BA.1 (Omicron BA.1 variant); BA.2 (Omicron BA.2 variant); BA.4 (Omicron BA.4 variant); BA.5 (Omicron BA.5 variant); BA.2.75 (Omicron BA.2.75 variant); BQ.1 (Omicron BQ.1 variant); XBB (a recombinant of Omicron sub-variants); and other. The other category included all lineages with a designation not consistent with any of the major lineage calls. The lineages grouped in the other category are entirely wild type lineages until late-March 2021, and after March 2021 are almost entirely recombinant variants (excluding XBB). Note that B.1.177 was a variant that became dominant in Europe but was not estimated to have any transmission advantage.18

We considered the period from September 23, 2020 to December 31, 2022. The earliest date considered was chosen as the first date on which there were 10 samples in the variant dataset. We fit the statistical model to these daily data assuming that the SARS-CoV-2 case time series was composed of cases from all 11 variant groupings. Note that in the periods before a variant had emerged no detections of that variant would be present in the data, and so modeled estimates of the variant’s activity would be close to zero (ie, <<1 modeled cases).

Dengue cases

Dengue case data were retrieved from the Taiwan Centers for Disease Control.19 We downloaded the daily number of dengue fever cases detected. For a subset of cases, the serotype of infection (1, 2, 3 or 4) was known. We aggregated the data into: (1) a time series describing the daily number of dengue cases by onset date; and (2) a time series describing the daily number of cases for which the serotype was determined, by serotype (Figure S5).

We considered the period from April 1, 2006 to March 31, 2016, and the period from April 1, 2023 to March 31, 2024. We fit the statistical model to each period of daily data separately assuming that the dengue case time series contained contributions from the four dengue serotypes. We considered periods running from April to March so that we only include full dengue seasons (dengue activity was typically at a minimum around April). We considered the 2023 season in isolation (see Appendix S1: Supplementary Methods) as there was very limited dengue activity from April 2016 to April 2023.
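The April-to-March season convention can be made explicit with a small helper. The function name and the choice to label a season by the year in which it starts are assumptions, although the latter matches the "2006–2015 dengue seasons" and "2023 dengue season" wording used in the figure captions.

```python
from datetime import date


def dengue_season(d):
    """Label a date with its dengue season, running April 1 to March 31.

    The season is labeled by the calendar year in which it starts.
    """
    return d.year if d.month >= 4 else d.year - 1


first_season = dengue_season(date(2006, 4, 1))
last_early_season = dengue_season(date(2016, 3, 31))
recent_season = dengue_season(date(2024, 3, 31))
```

Under this labeling, the first study period spans seasons 2006 through 2015 and the second is the single 2023 season.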

Results

We applied our statistical modeling framework to three pathogen surveillance systems monitoring influenza-like illness, SARS-CoV-2, and dengue cases. Below we present the inferred temporal trends in all component pathogens for each case study.

Temporal trends in influenza

We estimated the long-term dynamics of influenza subtypes in Australia, Singapore and the USA (Figure 1) from 2012 to 2023 using weekly ILI case numbers and data describing influenza testing/subtyping (ie, the weekly number of tests determining influenza infection and subtype for a subset of ILI cases). In the USA and Australia, there were distinct seasonal peaks of ILI during (their respective) winter months. Inferred influenza subtype dynamics also exhibited distinct seasonal peaks with similar peak timings to ILI (Figure 1, Figures S6 and S7). Peaks in influenza activity were sharper compared to ILI; influenza epidemic activity lasted between 3 and 6 months, with little to no activity between seasons. In contrast, ILI activity was observed all year round with increases in ILI activity often occurring months before any (inferred) increases in influenza. In some influenza seasons, epidemic activity was only observed for a single influenza subtype (eg, 2012–2013, 2013–2014, 2014–2015 for the USA), but in many seasons there were multiple overlapping influenza epidemics. During these multi-subtype seasons, there was a high degree of synchrony in modeled estimates of epidemic onset and peak timing between influenza subtypes. In Singapore, the timing of epidemics of each influenza subtype was far more variable (compared to the USA and Australia), often with sustained levels of cases between epidemic waves (eg, H3N2 2016–2018). There were also periods in Singapore with simultaneous epidemic activity of multiple influenza subtypes (eg, 2016–2020).

Figure 1.

Temporal trends in influenza-like illness and influenza subtypes in the USA, Australia and Singapore. Modeled weekly number of cases attributable to influenza A H3N2 (red), influenza A H1N1 (blue), influenza B (yellow), and not attributable to influenza, “unknown”, (green). Weekly number of cases of influenza-like illness (black points) and modeled total number of cases of influenza-like illness (black line). All modeled estimates are shown with median (line) and central 50% (dark shaded region) and 95% (light shaded region) credible intervals.

A high proportion of ILI cases were not attributable to influenza infections (unknown etiological agent) in all three countries. The proportion of ILI attributable to influenza infections was greatest (relatively consistently) in Singapore (Figures S8-S11), where the influenza positive proportion regularly reached greater than 60% (maximum value of 86%) at the peak of influenza activity. In contrast, in Australia and the USA the proportion of ILI attributable to influenza infections regularly peaked at values between 25% and 35% (maximum value of 40% for Australia and 38% for the USA), even in years when modeled influenza activity was very different (eg, compare 2015–2016 influenza season in USA to other seasons in Figures S6 and S8). For much of 2020 and 2021 (during the SARS-CoV-2 pandemic), almost all ILI cases were unattributable to influenza infections across all three countries. Following the pandemic period, the proportion of ILI attributable to influenza infections has been lower (on average) in all three countries relative to the pre-pandemic period.

When using lower sampling rates for influenza testing/subtyping than occurred in practice (Figures S12-S14), our modeled estimates of subtype dynamics display an increased degree of uncertainty (wider credible intervals) with some features of the epidemic curves—identified at higher sampling rates—obscured. For example, in Singapore from late-2017 to early-2018, at a sampling rate of 20 tests/week a single broad peak of influenza H3N2 was inferred, whereas at higher sampling rates (ie, 50 tests/week max) a double peak was identified. There was a large degree of agreement between modeled estimates of the total ILI and ILI not attributable to influenza infections (and conversely ILI attributable to influenza), made at all sampling rates considered.
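One way to emulate a lower sampling rate is to cap the number of tests per week by subsampling the observed test results without replacement. The sketch below takes that approach as an assumption; this section does not state the exact subsampling procedure used for Figures S12-S14, and the function name and counts are hypothetical.

```python
import random


def downsample_week(counts, max_tests, rng):
    """Randomly keep at most max_tests results from one week of test counts.

    counts: dict of category -> number of tests with that result.
    Weeks with max_tests or fewer tests are returned unchanged.
    """
    total = sum(counts.values())
    if total <= max_tests:
        return dict(counts)
    # Expand counts into one entry per test, then sample without replacement.
    pool = [label for label, n in counts.items() for _ in range(n)]
    kept = rng.sample(pool, max_tests)
    return {label: kept.count(label) for label in counts}


rng = random.Random(1)  # fixed seed so the sketch is reproducible
week = {"H3N2": 30, "H1N1": 5, "B": 5, "negative": 60}  # made-up counts
sub = downsample_week(week, 20, rng)
```

Sampling without replacement keeps each category's subsampled count at or below its observed count, so the reduced dataset is always a plausible subset of the tests actually performed.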

Temporal trends of SARS-CoV-2 variants

We estimated the dynamics of competing SARS-CoV-2 variants in the United Kingdom (Figure 2, Figure S15) from September 23, 2020 to December 31, 2022 using daily case numbers and the daily number of variants detected (from a subset of cases that underwent sequencing). We captured the dynamics of several periods of variant replacement as novel variants emerged. In November 2020, the Alpha variant (B.1.1.7) emerged with a relatively higher estimated growth rate than existing wildtype variants and the B.1.177 lineage, and Alpha became the sole variant circulating by mid-March 2021. By late-2022, there was a greater variant diversity with multiple variants circulating in the population simultaneously. Omicron BA.1 exhibited the fastest growth rate: on November 29, 2021 (the first day Omicron BA.1 was detected in the UK in the dataset) the growth rate was estimated to be 0.36 (95% CrI: 0.28, 0.45) corresponding to a doubling time of 1.92 (95% CrI: 1.55, 2.47) days. The growth rate of BA.1 then rapidly decreased, becoming less than 0 (the threshold for epidemic decline) only a month later. The replacement of Delta with Omicron BA.1 also reflected the highest growth rate advantage of one variant over the previously dominant variant (Figure S16). Similarly, this period of replacement reflected the highest multiplicative $R_t$ advantage, even when estimated under the assumption that the generation interval declined (Figures S16 and S17) with each successive variant (see Appendix S1: Supplementary Methods).
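The reported doubling times follow from the standard relation between an exponential growth rate r (per day) and doubling time, T = ln(2) / r. The short sketch below applies it to the rounded growth rates quoted above; the function name is an assumption. Because the quoted growth rates are rounded to two decimal places, the results agree with the reported doubling times only to within about 0.01 days.

```python
import math


def doubling_time(r):
    """Doubling time in days for an exponential growth rate r (per day)."""
    if r <= 0:
        raise ValueError("doubling time is only defined for positive growth")
    return math.log(2) / r


# Rounded values quoted in the text for Omicron BA.1 on November 29, 2021.
central = doubling_time(0.36)
# A faster growth rate gives a shorter doubling time, so the interval flips:
# the upper growth bound maps to the lower doubling-time bound.
lower = doubling_time(0.45)
upper = doubling_time(0.28)
```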

Figure 2.

Temporal trends of SARS-CoV-2 variants in the UK. (A) Daily number of cases of SARS-CoV-2 (black points) and modeled total number of cases of SARS-CoV-2 (black line). Shaded box is magnified in the top-right panel. (B) Modeled daily number of cases attributable to each SARS-CoV-2 variant considered (colored). Shaded box is magnified in the bottom-right panel. (C) Daily epidemic growth rate inferred for each variant. The dashed black line highlights an epidemic growth rate of 0 (the threshold for epidemic growth or decline). Estimates for the growth rate of each variant are only shown for times between their first and last detection in the UK. Estimates for the growth rate of “other” are shown up to January 19, 2021 (when the rolling 7 day average first dropped below one) and then shown after March 5, 2022 (when the rolling 7 day average next exceeded one). All lineages categorized as other prior to January 19, 2021 were wild type strains. All estimates are shown with median (line) and central 50% (dark shaded region) and 95% (light shaded region) credible intervals.

Our model can identify increasing components in a composite time series (eg, SARS-CoV-2 cases), even when the composite time series overall is not increasing. We estimated SARS-CoV-2 case dynamics at six different timepoints (each a week apart) from May 2021–June 2021 (Figure 3) during the emergence of the Delta variant (the first time point was the final day of the week in which the Delta variant was first recorded in the dataset). We compared estimates of growth rates incorporating the component time series (the daily number of each variant detected) and using only the composite time series. When variant data were included in the model, we estimated that Delta cases were likely increasing, although with high uncertainty, at the first two time points, and by the third time point we estimated that Delta cases were increasing with a high degree of certainty (95% CrI for growth rate was greater than 0). In contrast, when only the composite time series was included, there was no clear signal of SARS-CoV-2 cases increasing until the fifth time-point (ie, five weeks after the Delta variant was first recorded in the dataset) and the overall case growth rate estimated was less than the inferred growth rate of the Delta variant.
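The statement that a growth rate is positive "with a high degree of certainty" corresponds to the 95% credible interval lying above 0. Given posterior draws of a growth rate, this can be checked as sketched below. The draws here are simulated from a normal distribution purely for illustration (they are not model output), and the simple empirical-quantile rule and function name are assumptions.

```python
import random


def summarize_growth(draws):
    """Median, central 95% interval, and P(r > 0) from posterior draws."""
    s = sorted(draws)
    n = len(s)

    def quantile(p):
        # Simple empirical quantile: the draw at position floor(p * n).
        return s[min(n - 1, int(p * n))]

    return {
        "median": quantile(0.5),
        "lower95": quantile(0.025),
        "upper95": quantile(0.975),
        "prob_positive": sum(x > 0 for x in s) / n,
    }


# Simulated stand-in for posterior draws of a variant's growth rate.
rng = random.Random(0)
draws = [rng.gauss(0.05, 0.04) for _ in range(4000)]
summary = summarize_growth(draws)
# Growth is "certain" in the sense used in the text when lower95 > 0.
growing_with_certainty = summary["lower95"] > 0
```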

Figure 3.

Competing SARS-CoV-2 variant dynamics during the introduction of the Delta variant in the UK. (A) Daily number of cases of SARS-CoV-2 (points) and modeled cases of SARS-CoV-2 attributable to the Alpha variant (blue), attributable to the Delta variant (red) and overall (“Total”, green). (B) Daily number of cases of SARS-CoV-2 (points) and modeled total cases of SARS-CoV-2 (black) using a model that does not consider the dynamics of different variants (ie, variant data are not included in the model). (C) Daily epidemic growth rates inferred for the Delta variant (red), the Alpha variant (blue), and for all SARS-CoV-2 cases not accounting for competing variant dynamics (from the modeled estimates in B). Estimates for the growth rate of the Delta variant are only shown for times after the first Delta variant had been detected. All estimates are shown with median (line) and central 50% (dark shaded region) and 95% (light shaded region) credible intervals. Models were fit at six different timepoints (see Figure S4) reflecting 1-6 weeks following the first detected Delta variant in the UK.

Temporal trends of dengue serotypes

We estimated the multi-season dynamics of dengue serotypes in Taiwan (province of China) (Figures 4 and 5) using daily dengue (confirmed) case numbers and serotyping data (ie, the confirmed serotype of infection for a subset of cases) from 2006 to 2016 and 2023 to 2024. The model was fit to the 2023 season in isolation using prior distributions informed by the posterior distribution of the model fit to data from 2006 to 2016 (see Appendix S1: Supplementary Methods). There was a seasonal structure to the estimated epidemic dynamics (Figures 4 and 5; Figure S18) with limited epidemic activity from January–June each year, and epidemic peaks occurring in approximately October–November and in some years a second peak in July–August (eg, in 2007). We estimated the proportion of dengue cases caused by each serotype for each season (Figure S19). In most seasons our estimates were consistent with a crude approach (calculating the proportion from the raw serotyping data) that did not account for different dynamics between serotypes, but there were significant differences in 2014 and 2015. The dengue dynamics for a single season were often dominated by a single serotype that was responsible for the majority of cases. The main exception was the 2010 season in which all four dengue serotypes contributed substantially, and to a lesser extent the 2012 season. Even for seasons where we estimated a dominant serotype, there was still limited circulation of at least one, and at times two, other serotype(s) (eg, in 2011).

Figure 4.

Temporal trends of dengue serotypes in Taiwan (province of China) for the 2006–2015 dengue seasons. Top panel: Daily number of cases of dengue (black points) and modeled total number of cases of dengue (black line). The y-axis is scaled by a factor of 10 for April 1, 2014 onwards. Middle panel: Modeled daily number of cases attributable to each dengue serotype (colored). The y-axis is scaled by a factor of 10 for April 1, 2014 onwards. Bottom panel: Daily epidemic growth rate inferred for each serotype. Estimates of the growth rate for a serotype are only shown when the modeled serotype case numbers were greater than 0.5. All estimates are shown with median (line) and central 50% (dark shaded region) and 95% (light shaded region) credible intervals.

Figure 5.

Temporal trends of dengue serotypes in Taiwan (province of China) for the 2023 dengue season. Top panel: Daily number of cases of dengue (black points) and modeled total number of cases of dengue (black line) and cases attributable to each dengue serotype (colored). Bottom panel: Daily epidemic growth rate inferred for each serotype. Estimates of the growth rate for a serotype are only shown when the modeled serotype case numbers were greater than 0.5. All estimates are shown with median (line) and central 50% (dark shaded region) and 95% (light shaded region) credible intervals.

Discussion

We have developed a method for inferring the epidemic dynamics of multiple pathogens, variants, subtypes or serotypes from routinely collected surveillance data. The method is pathogen-agnostic and we have demonstrated its application to three different pathogen surveillance systems, each of which produce data on a composite time series (eg, daily number of cases) and component time series (eg, daily number of laboratory confirmed infections by pathogen). The systems considered cover multiple pathogens (influenza, SARS-CoV-2, dengue), locations (Australia, Singapore, USA, Taiwan, UK), scenarios (seasonal epidemics, non-seasonal epidemics, pandemic emergence) and data structures (weekly, daily, daily with day-of-the-week effects).

Routine surveillance of influenza predominantly relies on the symptom-based surveillance of influenza-like illness. We have demonstrated that a significant proportion of ILI is not attributable to influenza infections and have estimated the true underlying dynamics of each influenza subtype. The quantity estimated by our model is equivalent to the “subtype-specific ILI+”,20 a surveillance indicator(s) that is crudely calculated by multiplying the number of ILI cases by the proportion of tests positive for a specific subtype. The subtype-specific ILI+ has previously been suggested to be more representative of the overall and subtype-specific influenza dynamics compared to ILI.1 However, trends in crude estimates of subtype-specific ILI+ may be obscured (limiting its utility) when there is noise in the time series of ILI (the composite time series) and/or the time series describing influenza testing and subtyping data (the component time series). Our statistical model can estimate the expected smooth trends (and credible intervals) in the subtype-specific ILI+ accounting for noise in the observation processes. Improved estimation of the underlying dynamics of influenza infections (overall and by subtype) can help refine estimates of influenza season onset, which is important for optimizing the timing of vaccination campaigns. In all three settings investigated, we observed that increases in ILI cases can occur notably earlier than actual increases in influenza infections, suggesting that ILI is a poor indicator for determining influenza season onset.
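For comparison with the modeled estimates, the crude subtype-specific ILI+ described above (ILI cases multiplied by the proportion of tests positive for a subtype) can be computed directly. The function name and the example numbers are made up for illustration.

```python
def crude_ili_plus(ili_cases, subtype_positives, tests):
    """Crude subtype-specific ILI+ for one week.

    ILI cases multiplied by the proportion of tests positive for the subtype.
    Returns None when no tests were performed, since the proportion is
    undefined in that case.
    """
    if tests == 0:
        return None
    return ili_cases * subtype_positives / tests


# Made-up week: 1200 ILI cases, 150 tests, 30 positive for H3N2.
h3n2_ili_plus = crude_ili_plus(1200, 30, 150)
```

With few tests per week the positive proportion is noisy (and undefined with none), which is the limitation of the crude indicator that the statistical model is intended to address.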

Influenza-like illness is caused by a wide range of non-influenza pathogens with similar symptom profiles. As one would expect, the proportion of ILI attributable to influenza appeared to have decreased following the emergence of SARS-CoV-2; the continued circulation of (ILI causing) SARS-CoV-2 will likely continue to influence trends in the time series of ILI into the future.6 The time series of ILI has likely always been biased as an influenza surveillance indicator due to the circulation of other respiratory pathogens such as RSV, rhinovirus, and parainfluenza,21 but now those biases may have changed due to the emergence of SARS-CoV-2. If a subset of ILI cases were tested for a wide array of pathogens through virological testing (as opposed to just testing for influenza and subtyping) the positive proportion of many more ILI-causing pathogens (including SARS-CoV-2) could be measured4,13 and underlying trends inferred using the methods we have presented.

Even when all cases in a time series are confirmed infections (as opposed to those with a symptomatic diagnosis) the overall signal detected may be a composition of multiple signals. For example, we demonstrated that the composite time series of detected dengue and SARS-CoV-2 infections concealed distinct serotype and variant dynamics respectively. Similarly, if a composite time series of laboratory confirmed influenza infections was used as a surveillance indicator for influenza, it would obscure distinct epidemic dynamics of influenza subtypes. Disentangling these distinct dynamics can be critical for predicting future epidemic trends. For example, patterns of population immunity may be dependent on the population’s past exposure levels to each pathogen.22,23 Additionally, disentangling each pathogen’s dynamics can be important for assessing the effectiveness of past interventions—pharmaceutical (eg, antivirals, vaccines) and non-pharmaceutical (eg, school closures)—against each individual pathogen so that their effectiveness at later times (when a different mixture of pathogens may be circulating) can be anticipated.

Improved knowledge of the individual dynamics of pathogens contributing to a composite time series can improve estimation of current trends5 (ie, real-time analysis). In general, if current trends continue into the future, then the composite time series will be dominated by the pathogen with the largest growth rate at the current time point. Outputs from our model can be used to estimate the growth rates of all component pathogens and therefore predict which pathogens are increasing at the fastest rate, and so which pathogens are likely to dominate the composite time series in the future. We have retrospectively demonstrated how the model might have been used for real-time analysis during the emergence of the SARS-CoV-2 Delta variant in the United Kingdom. While trends in modeled Delta cases suggested an increasing component in the SARS-CoV-2 case time series, increasing cases would not have been predicted (or identified) until five weeks after the first Delta sequence was isolated if sequencing data were not included in the analysis. Our estimates of current growth rates (ie, estimates up to the most recent day) appeared to be robust to the addition of future data points with credible intervals overlapping with estimates made using additional data points. However, a formal performance evaluation is needed to further establish the model’s utility as a tool for real-time analysis.
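The claim that the fastest-growing component eventually dominates the composite can be illustrated by projecting each component forward under constant exponential growth. The case numbers and growth rates below are hypothetical and are not estimates from the paper; holding growth rates constant is the "if current trends continue" assumption stated in the text.

```python
import math


def project_shares(current, growth, days):
    """Project each component forward by `days` under constant exponential
    growth and return its share of the projected composite."""
    projected = {k: current[k] * math.exp(growth[k] * days) for k in current}
    total = sum(projected.values())
    return {k: v / total for k, v in projected.items()}


# Hypothetical current daily cases and growth rates (per day).
current = {"Alpha": 2000.0, "Delta": 50.0}
growth = {"Alpha": -0.03, "Delta": 0.10}

shares_now = project_shares(current, growth, 0)
shares_6wk = project_shares(current, growth, 42)
```

In this toy example Delta makes up under 3% of cases at the start but the majority six weeks later, even though the composite would initially be declining.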

Our model has limitations. The model assumes that infection incidence increases or decreases exponentially over the short term; this is an appropriate assumption for many pathogens that exhibit exponential growth in the early phase of an epidemic but may not be appropriate for pathogens, such as measles, that have been observed to grow sub-exponentially.24 For the penalized-spline implementation of the model, the number of days between adjacent knots must be set in advance. If this number is set too high, then the model will not be able to capture temporal trends at a finer resolution. We assumed that five days between knots would be a fine enough temporal resolution to capture dynamical effects in the time series for dengue and SARS-CoV-2, but the model may have smoothed over any dynamics occurring over shorter time scales. Our model infers pathogen-specific epidemic dynamics at the population level over an entire region. However, there may be differences in epidemic dynamics between sub-regions and sub-populations (eg, by age group). The model in its current form could be fit to data for a single sub-region or sub-population in isolation; in the future, the model could be extended to infer trends in pathogen-specific epidemic dynamics stratified across sub-groups simultaneously, allowing common parameters (eg, overdispersion in case data) to be estimated using all available data and reducing uncertainty in modeled estimates.25 Finally, the model can only infer trends that exist in the data. If the data are not a reasonable correlate of the underlying epidemic dynamics, then our modeled estimates will not reflect the true epidemic dynamics.
The symptomatic- and case-based surveillance data systems used in this paper have previously been shown to be biased by (for example) circulating pathogens with similar symptom profiles, and changes in healthcare-seeking or test-seeking behavior of the general population through time.1,26 There are limited data available without such biases and so we are unable to validate our estimates against the true values. In the future, a simulation-estimation study could be used for further testing and validation of the methodology.
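
The exponential-growth limitation noted above can be made concrete with a small comparison. The sketch below assumes the generalized-growth form dC/dt = r C^p (with 0 < p ≤ 1), a phenomenological description often used for sub-exponential epidemics. This form, the function names, and all parameter values are assumptions introduced here for illustration and are not part of the model presented in this paper.

```python
import math


def generalized_growth(c0, r, p, t):
    """Closed-form solution of dC/dt = r * C**p with C(0) = c0.

    p = 1 gives ordinary exponential growth; 0 < p < 1 gives
    sub-exponential (polynomial) growth."""
    if p == 1.0:
        return c0 * math.exp(r * t)
    m = 1.0 - p
    return (r * m * t + c0 ** m) ** (1.0 / m)


def log_growth_rate(c0, r, p, t):
    """Instantaneous growth rate d(log C)/dt = r * C**(p - 1) at time t."""
    return r * generalized_growth(c0, r, p, t) ** (p - 1.0)


# Hypothetical parameter values, chosen only for illustration.
c0, r = 5.0, 0.3

for day in (0, 10, 20, 30):
    exponential = log_growth_rate(c0, r, 1.0, day)      # stays equal to r
    sub_exponential = log_growth_rate(c0, r, 0.7, day)  # declines over time
    print(day, round(exponential, 4), round(sub_exponential, 4))
```

Under sub-exponential growth the per-day log growth rate declines as cases accumulate even though r and p are fixed, so a constant exponential rate can only approximate such a trajectory over short windows.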

Virological testing is a highly important component of pathogen surveillance. However, to date there have been few routine analyses that integrate virological testing data with analysis of other composite epidemic time series. One exception is the set of analyses performed during the SARS-CoV-2 pandemic that incorporated genomic sequencing data into time series analysis of cases and infection prevalence (which motivated the approach presented here3). We have developed a statistical method to better utilize routine surveillance data and demonstrated its utility for three pathogen surveillance systems measuring (1) influenza-like illness cases; (2) SARS-CoV-2 cases; and (3) dengue cases. However, the method is pathogen-agnostic and likely has wider applications across other pathogen monitoring systems. For example, RSV strains are separated into two groups (A and B), and measuring the dynamics of each individually may be important for assessing whether pharmaceuticals (such as maternal vaccines or monoclonal antibodies) are equally effective against infection with both groups.27 Similarly, monkeypox virus, which causes mpox, comprises two distinct clades (I and II), each with two distinct subclades (Ia, Ib, IIa, and IIb). Measuring the dynamics of each (sub)clade individually will be crucial for estimating the spread and transmission advantage, if present, of the newly emergent clade Ib.28 In general, our methodology can enhance the epidemic intelligence obtained from these routine surveillance data, improving our understanding of the dynamics of multiple pathogens, and ultimately improving public health responses.

Supplementary material

Supplementary material is available at American Journal of Epidemiology online.

Contributor Information

Oliver Eales, Infectious Disease Dynamics Unit, Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, Australia; School of Mathematics and Statistics, The University of Melbourne, Parkville, Australia.

Saras M Windecker, Infectious Disease Ecology and Modelling, The Kids Research Institute, Perth, Australia.

James M McCaw, Infectious Disease Dynamics Unit, Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, Australia; School of Mathematics and Statistics, The University of Melbourne, Parkville, Australia.

Freya M Shearer, Infectious Disease Dynamics Unit, Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, Australia; Infectious Disease Ecology and Modelling, The Kids Research Institute, Perth, Australia.

Funding

OE is supported by a University of Melbourne McKenzie fellowship. FMS is supported by the National Health and Medical Research Council of Australia through the Investigator Grant Scheme (Emerging Leader Fellowship, 2021/GNT2010051). JMM is supported by the Australian Research Council through the Laureate Fellowship Scheme (FL240100126). FMS and JMM’s research is also supported by an Australian Research Council Discovery Project Grant (DP240102286). This research is supported by the Australian Consortium of Epidemic Forecasting and Analytics (ACEFA), a National Health and Medical Research Council of Australia Centre of Research Excellence (2035303).

Conflict of interest

The authors have no conflicts of interest to declare.

Data availability

The code that produced this analysis is publicly available at: https://github.com/acefa-hubs/EpiStrainDynamics/tree/preprint (Zenodo DOI: https://doi.org/10.5281/zenodo.14015867). All the underlying data are publicly available and are also provided at the same location as the code.

References

  • 1. Eales O, McCaw JM, Shearer FM. Biases in routine influenza surveillance indicators used to monitor infection incidence and recommendations for improvement. Influenza Other Respi Viruses. 2024;18(12):e70050. doi:10.1111/irv.70050
  • 2. Reiner RC Jr, Stoddard ST, Forshey BM, et al. Time-varying, serotype-specific force of infection of dengue virus. Proc Natl Acad Sci U S A. 2014;111(26):E2694-E2702. doi:10.1073/pnas.1314933111
  • 3. Eales O, de Oliveira ML, Page AJ, et al. Dynamics of competing SARS-CoV-2 variants during the Omicron epidemic in England. Nat Commun. 2022;13(1):1-11. doi:10.1038/s41467-022-32096-4
  • 4. Perofsky AC, Hansen CL, Burstein R, et al. Impacts of human mobility on the citywide transmission dynamics of 18 respiratory viruses in pre- and post-COVID-19 pandemic years. Nat Commun. 2024;15(1):4164. doi:10.1038/s41467-024-48528-2
  • 5. Pei S, Shaman J. Aggregating forecasts of multiple respiratory pathogens supports more accurate forecasting of influenza-like illness. PLoS Comput Biol. 2020;16(10):e1008301. doi:10.1371/journal.pcbi.1008301
  • 6. Eales O, Plank MJ, Cowling BJ, et al. Key challenges for respiratory virus surveillance while transitioning out of acute phase of COVID-19 pandemic. Emerg Infect Dis. 2024;30(2):30. doi:10.3201/eid3002.230768
  • 7. Lythgoe KA, Golubchik T, Hall M, et al. Lineage replacement and evolution captured by 3 years of the United Kingdom coronavirus (COVID-19) infection survey. Proc Biol Sci. 2023;290(2009):20231284. doi:10.1098/rspb.2023.1284
  • 8. Australian Government Department of Health and Aged Care. Australian Respiratory Surveillance Report 12 – 26 August to 8 September 2024. Australian Government Department of Health and Aged Care; 2024. https://www.health.gov.au/resources/publications/australian-respiratory-surveillance-report-12-26-august-to-8-september-2024?language=en
  • 9. UK Health Security Agency. National influenza and COVID-19 surveillance report Week 37 report (up to week 36 2024 data). UK Health Security Agency; 2024. https://www.gov.uk/government/statistics/national-flu-and-covid-19-surveillance-reports-2024-to-2025-season
  • 10. Volz E, Mishra S, Chand M, et al. Assessing transmissibility of SARS-CoV-2 lineage B.1.1.7 in England. Nature. 2021;593(7858):266-269. doi:10.1038/s41586-021-03470-x
  • 11. Bhatia S, Wardle J, Nash RK, et al. Extending EpiEstim to estimate the transmission advantage of pathogen variants in real-time: SARS-CoV-2 as a case-study. Epidemics. 2023;44:100692. doi:10.1016/j.epidem.2023.100692
  • 12. Baguelin M, Flasche S, Camacho A, et al. Assessing optimal target populations for influenza vaccination programmes: an evidence synthesis and modelling study. PLoS Med. 2013;10(10):e1001527. doi:10.1371/journal.pmed.1001527
  • 13. Pei S, Teng X, Lewis P, et al. Optimizing respiratory virus surveillance networks using uncertainty propagation. Nat Commun. 2021;12(1):222. doi:10.1038/s41467-020-20399-3
  • 14. Eales O, Ainslie KEC, Walters CE, et al. Appropriately smoothing prevalence data to inform estimates of growth rate and reproduction number. Epidemics. 2022;40:100604. doi:10.1016/j.epidem.2022.100604
  • 15. Chadeau-Hyam M, Tang D, Eales O, et al. Omicron SARS-CoV-2 epidemic in England during February 2022: a series of cross-sectional community surveys. Lancet Reg Health Eur. 2022;21:100462. doi:10.1016/j.lanepe.2022.100462
  • 16. World Health Organization. Global Influenza Programme: Influenza Surveillance Outputs. In: World Health Organisation Global Influenza Programme. [cited 2 Feb 2024]. https://www.who.int/teams/global-influenza-programme/surveillance-and-monitoring/influenza-surveillance-outputs
  • 17. UKHSA data dashboard (COVID-19 archived data). [cited 1 Nov 2024]. https://ukhsa-dashboard.data.gov.uk/covid-19-archive-data-download
  • 18. Hodcroft EB, Zuber M, Nadeau S, et al. Spread of a SARS-CoV-2 variant through Europe in the summer of 2020. Nature. 2021;595(7869):707-712. doi:10.1038/s41586-021-03677-y
  • 19. Taiwan Centers for Disease Control. Daily confirmed case statistics of dengue fever since 1998. https://data.gov.tw/dataset/21025
  • 20. Kandula S, Yang W, Shaman J. Type- and subtype-specific influenza forecast. Am J Epidemiol. 2017;185(5):395-402. doi:10.1093/aje/kww211
  • 21. Reis J, Shaman J. Simulation of four respiratory viruses and inference of epidemiological parameters. Infect Dis Model. 2018;3:23-34. doi:10.1016/j.idm.2018.03.006
  • 22. Reich NG, Shrestha S, King AA, et al. Interactions between serotypes of dengue highlight epidemiological impact of cross-immunity. J R Soc Interface. 2013;10(86):20130414. doi:10.1098/rsif.2013.0414
  • 23. Perez-Guzman PN, Knock E, Imai N, et al. Epidemiological drivers of transmissibility and severity of SARS-CoV-2 in England. Nat Commun. 2023;14(1):4279. doi:10.1038/s41467-023-39661-5
  • 24. Chowell G, Viboud C, Simonsen L, et al. Characterizing the reproduction number of epidemics with early subexponential growth dynamics. J R Soc Interface. 2016;13(123):20160659. doi:10.1098/rsif.2016.0659
  • 25. Eales O, Haw D, Wang H, et al. Dynamics of SARS-CoV-2 infection hospitalisation and infection fatality ratios over 23 months in England. PLoS Biol. 2023;21(5):e3002118. doi:10.1371/journal.pbio.3002118
  • 26. Eales O, McCaw JM, Shearer FM. Challenges in the case-based surveillance of infectious diseases. R Soc Open Sci. 2024;11(8):240202. doi:10.1098/rsos.240202
  • 27. Nuttens C, Moyersoen J, Curcio D, et al. Differences between RSV A and RSV B subgroups and implications for pharmaceutical preventive measures. Infect Dis Ther. 2024;13(8):1725-1742. doi:10.1007/s40121-024-01012-2
  • 28. Vakaniaki EH, Kacita C, Kinganda-Lusamaki E, et al. Sustained human outbreak of a new MPXV clade I lineage in eastern Democratic Republic of the Congo. Nat Med. 2024;30(10):2791-2795. doi:10.1038/s41591-024-03130-3

Articles from American Journal of Epidemiology are provided here courtesy of Oxford University Press
