2022 Oct 23;90(Suppl 1):S82–S95. doi: 10.1111/insr.12529

Global seasonal and pandemic patterns in influenza: An application of longitudinal study designs

Elena N Naumova 1,2, Ryan B Simpson 1, Bingjie Zhou 1, Meghan A Hartwick 2
PMCID: PMC9874745  PMID: 38607896

Summary

The confluence of growing analytic capacities and global surveillance systems for seasonal infections has created new opportunities to further develop statistical methodology and advance the understanding of the global disease dynamics. We developed a framework to characterise the seasonality of infectious diseases for publicly available global health surveillance data. Specifically, we aimed to estimate the seasonal characteristics and their uncertainty using mixed effects models with harmonic components and the δ‐method and develop multi‐panel visualisations to present complex interplay of seasonal peaks across geographic locations. We compiled a set of 2 422 weekly time series of 14 reported outcomes for 173 Member States from the World Health Organization's (WHO) international influenza virological surveillance system, FluNet, from 02 January 1995 through 20 June 2021. We produced an analecta of data visualisations to describe global travelling waves of influenza while addressing issues of data completeness and credibility. Our results offer directions for further improvements in data collection, reporting, analysis and development of statistical methodology and predictive approaches.

Keywords: data quality, influenza, mixed effects models, surveillance, time series, visualisation, workflow optimization, World Health Organization

1. Introduction

As Nan Laird pointed out in her seminal work, ‘The essential feature of a longitudinal study is taking multiple measurements on the same subjects repeatedly over time’ [Laird, 2022 (submitted for this special issue)]. Several decades of epidemiological research have clearly demonstrated that statistical methodology is capable of powering longitudinal studies and accommodating a variety of research designs (Laird, 1988; Fitzmaurice et al., 2012). The methodology of longitudinal data analysis is helping to resolve the major conundrum of properly treating individual observations that vary in quantity and quality and to address common challenges of data collection schemes (Laird & Ware, 1982; Laird, 1988; Cnaan et al., 1997; Fitzmaurice et al., 2012). The methodology developed to capture changes in a health outcome for an individual over time is now being adapted for systems that monitor the health of a group of people, a nation or the world as a whole.

Repeated measurements, like daily cases of infections, are the cornerstone of modern disease surveillance systems aiming to provide real‐time information for decision makers to act (Fefferman & Naumova, 2010). For a global surveillance system tracking an infectious agent with high pandemic potential, like influenza or coronavirus, each country represents an individual entity with a unique disease trajectory and a set of features influencing changes to that trajectory (Caini et al., 2017; Bloom‐Feshbach et al., 2013; Alonso et al., 2015). Collectively, all countries participating in global disease monitoring form a global profile of the spread, prevalence and seasonal oscillations of the infectious agent. The ability to detect seasonality depends on the temporal resolution of data collected over time (specifically, the frequency and regularity of reporting) and on whether the statistical methods and models are suitably applied (Alarcon Falconi et al., 2020; Simpson et al., 2022a). Modern infectious disease surveillance systems typically report data weekly, and even daily for the ongoing 2019 novel coronavirus (COVID‐19) pandemic (United States Centers for Disease Control and Prevention, 2022; Dong et al., 2020). Well established systems, like FluNet, have reported weekly cases of influenza for over two decades with systematic updates to the list of circulating influenza virus subtypes (World Health Organization, 2021). To characterise global trends and periodicities in processes prone to variations, classic time series methods have been adapted and deployed, yet the unavoidable missingness and irregularities in global surveillance reporting create challenges for a broad implementation of these methods. The methodology developed for longitudinal studies offers new insights and flexibility to accommodate the challenges of surveillance data.

Capitalising on the advantages of a mixed linear model, in this paper, we proposed a framework to characterise seasonal behaviours of reported cases of diseases recognising several important aspects. In the presented framework, we stressed the need for assessing high‐order moments and their relevance for understanding differences and similarities in seasonal patterns in influenza across causative pathogens in time and space. First, after developing a traditional model for a set of time‐referenced observations for each country and exploring temporal patterns for variables of interest, we described ‘the model in terms of the entire vector of all observations’ (Laird & Ware, 1982). Second, we formulated a set of research hypotheses and emphasised critical aspects of the model, specifically the interplay of random and fixed effects. Finally, we clarified the context for using the term linear in the model, which refers not only to the additivity of the fixed and random effects but also to our application of harmonic functions reflecting non‐linear seasonal periodic changes over time. We demonstrated the ability of the framework to simultaneously characterise seasonal behaviours of several outcomes reported by a public global surveillance system. We combined the widely accepted methodology for longitudinal studies with the δ‐method to examine national weekly records of influenza for 14 reported indicators in all 173 participating countries and territories (referred to as Member States) collected by FluNet over a 25‐year period. We presented the findings as a compilation of statistical visualisations to enhance scientific communication and outlined directions for further improvements in data collection, reporting and analysis.

2. Seasonality and The δ‐Method

We defined disease seasonality as a systematic periodic oscillation of an outcome observed over the course of a calendar year (Naumova, 2006). In our early work, we demonstrated the use of the δ‐method to efficiently utilise regression model results and derive meaningful characteristics of seasonality, such as peak timing and amplitude (Naumova & MacNeill, 2007). This approach allowed us to fit periodic fluctuations in a time series with harmonic terms and transform regression coefficients of harmonic terms to quantify the uncertainties of peak timing and amplitude estimates. We applied the δ‐method to implement the formal statistical comparisons of seasonality characteristics and thus better understand spatio‐temporal patterns of disease trajectories by location and infection subtype (Wenger & Naumova, 2010; Chui et al., 2011a; Moorthy et al., 2012). We have documented distinct preferences for seasonal peaks and nadirs of infectious diseases that vary by etiological (Naumova et al., 2007), by geographical (Castronovo et al., 2009; Chui et al., 2009) and by environmental factors (Stashevsky et al., 2019; Ureña‐Castro et al., 2019), yet tend to be synchronised (Wenger & Naumova, 2010; Simpson et al., 2020a).

The increasing quality and quantity of publicly available records permit detailed description of temporal changes. These surveillance improvements have promoted research on travelling waves of infections, which capture how infection peak timing shifts earlier or later at differing latitudes or at increasing distances from the equator to a country's geographic centroid (Caini et al., 2017; Bloom‐Feshbach et al., 2013; Alonso et al., 2015). By applying the δ‐method, we assessed whether shifts could occur in the seasonal characteristics of an infection and judged the extent of detected shifts (Wenger & Naumova, 2010; Alarcon Falconi et al., 2018).

For meaningful seasonality comparisons across locations, time periods or populations, we typically correct for time‐varying population size by presenting weekly or daily counts as rates or using population offsets. We also assume right skewness in the distribution of weekly cases due to the general nature of count data and relatively rare yet pronounced spikes of cases associated with local epidemics. For infections with strong seasonality, spikes often occur close to seasonal peaks. In addition, disease seasonality could be exaggerated by prolonged periods of low incidence, which further contribute to a distribution's skewness. We previously noted that periods of low incidence and restricted pathogen testing capacities are associated with low data completeness (Simpson et al., 2021). Therefore, preliminary data checking and reporting of distributional properties with statistical characteristics (like skewness and kurtosis coefficients) and data completeness (like proportion and structure of missingness) are necessary in epidemiological investigations to determine analytic rigour (Alarcon Falconi et al., 2020). In the presented framework, we stressed the need for assessing high‐order moments and their relevance for understanding differences and similarities in seasonal patterns of influenza across time and space.

A thorough examination of temporal patterns of multiple outcomes across multiple locations requires modellers to design graphical presentations that cover large volumes of information. Dynamic maps and heatmaps are complex graphical presentations that can compress large amounts of spatially and temporally refined data into an animation or a single image and serve as an effective communication tool. In our early work, we created several graphical templates that utilised shared axes and various graph types to reveal hidden trends and patterns and to better recognise underlying heterogeneities and uncertainties (Castronovo et al., 2009; Moorthy et al., 2012; Chui et al., 2011b; Simpson et al., 2022b). In this paper, we expanded the use of complex heatmaps and dynamic maps by collating our findings into an analecta of visualisations to demonstrate the detected global patterns and seasonality of influenza over two decades (Naumova et al., 2022).

3. Data and Models

3.1. Selection of Primary Variables of Interest

We extracted national weekly records for 173 Member States reported to WHO's FluNet from 02 January 1995 through 20 June 2021 (World Health Organization, 2021). Records included six testing outcomes (specimens processed, tests, negative tests, and total, influenza A and influenza B positive tests) and eight circulating subtypes (positive tests for A(H1), A(H1N1)pdm09, A(H3), A(H5), A (Unsubtyped), B (Yamagata), B (Victoria) and B (Undetermined)). We extracted each country's annual population estimates from the United States Census Bureau International Data Base (IDB) and the Institut National de la Statistique et des Etudes Economiques (INSEE) (United States Census Bureau, 2021; Institut National de la Statistique et des Etudes Economiques, 2021). We calculated rates per million persons by dividing weekly cases by annual population estimates and multiplying by 1 000 000. We reported rates in cases per million persons, or ‘cpm’.
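As a minimal sketch of the rate calculation described above, the following Python function converts a weekly count into cases per million persons. The function name and the example numbers are hypothetical and are not taken from FluNet or from the authors' code.

```python
def rate_per_million(weekly_cases, annual_population):
    """Weekly rate in cases per million persons (cpm)."""
    if annual_population <= 0:
        raise ValueError("annual_population must be positive")
    # Multiply first so that integer inputs stay exact until the final division.
    return weekly_cases * 1_000_000 / annual_population


# Hypothetical example: 250 positive tests in one week, population of 5 million.
print(rate_per_million(250, 5_000_000))  # 50.0
```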

We conducted modelling using three spatial allocation schemes for Member States. First, we abstracted latitudes of the most populous city for each Member State using SimpleMaps.com, which aggregates spatial information from various governmental sources (SimpleMaps.com, 2021). In addition, we used WHO‐defined regions, African (AFRO), Eastern Mediterranean (EMRO), European (EURO), Pan‐American (PAHO), Southeast Asian (SEARO) and Western Pacific (WPRO), and 17 WHO‐defined influenza transmission zones for exploring spatio‐temporal patterns (Supplementary Table S1).

The WHO defined an epidemiologic week as Monday through Sunday beginning with the first full week of each year, resulting in 52 or 53 weeks annually (depending on the start of the full week). Each time series reflected the full study period duration of 651–1 381 weeks and captured 12–26 annual cycles of seasonal influenza and the well‐documented pandemic of 2009. We defined effective time series length (ETSL) as the number of weeks with meaningful information, thereby distinguishing between weeks with no reported cases (0 cases) and weeks with incomplete records (blanks). Because countries may record weeks with no cases and weeks with missing records differently, we explored this issue in detail elsewhere (Simpson et al., 2021). We calculated the annual completeness, $C_{i,j,k}$, as the ratio of the time series length for which reliable data are available to the overall length of the considered time series (the number of full weeks between the start and end of the time period), multiplied by 100:

$$C_{i,j,k} = \frac{n_{i,j,k}}{L_1} \times 100\%, \tag{1}$$

where $C_{i,j,k}$ is completeness for i‐outcome (i = 1–14), j‐country (j = 1–173) and k‐year (k = 1–25); $n_{i,j,k}$ is the number of time units (weeks) in the time series when records are available (e.g. weeks with reported counts ≥0) for i‐outcome, j‐country and k‐year; $L_1$ is the number of full weeks (52 or 53) for k‐year. For each outcome and location, the full study period duration may vary depending on allocated resources and circulating strains; for example, the A(H1N1)pdm09 subtype only began circulating at the start of the pandemic in 2009. The results are shown in Supplementary Table S2.
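The completeness calculation in equation (1) can be sketched in a few lines of Python. This is an illustration only: it assumes that a blank (missing) week is encoded as `None` and that a reported zero is a valid record, and the function name and example series are hypothetical.

```python
def completeness(weekly_records, full_weeks):
    """Annual completeness (%) for one outcome, country and year, as in equation (1).

    weekly_records: one entry per week; None marks a blank (missing) record,
                    whereas 0 is a valid report of no cases.
    full_weeks:     number of full weeks in the year (52 or 53).
    """
    if full_weeks not in (52, 53):
        raise ValueError("full_weeks is expected to be 52 or 53")
    # Count weeks with an available record (reported count >= 0).
    n_available = sum(1 for r in weekly_records if r is not None and r >= 0)
    return n_available / full_weeks * 100.0


# Hypothetical year: 26 reported weeks (zeros included) followed by 26 blanks.
records = [0, 3, 5, 0] * 6 + [1, 2] + [None] * 26
print(completeness(records, 52))  # 50.0
```

Keeping zeros and blanks distinct in the input is the key design choice here, because the two are treated differently when the effective time series length is determined.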

Completeness reflected the total amount of usable time series records that influence sample size and statistical power when modelling. After evaluating completeness, we found that 12 Member States reported fewer than 10% of records and 3 Member States reported fewer than 5% of records for all influenza outcomes, warranting their exclusion from our analysis. Additionally, both specimens processed and negative tests had <15% completeness across Member States, preventing assessment of their seasonality characteristics. Thus, we compiled 2 040 time series of weekly rates (12 outcomes for 170 Member States) for a detailed analysis, out of which we identified 1 660 time series with ETSL of 3+ weeks to examine seasonality features.

3.2. Characterisation of Outcomes and Their Seasonality Features

We estimated values for median weekly rate, interquartile range and the robust version of four moments (L‐mean, L‐scale, L‐skewness and L‐kurtosis) for each time series. For a random variable $X$, the $r$th population L‐moment is defined as

$$\lambda_r = r^{-1} \sum_{k=0}^{r-1} (-1)^k \binom{r-1}{k} E[X_{r-k:r}], \tag{2}$$

where $X_{k:n}$ is the $k$th order statistic in an independent sample of size $n$ from the distribution of $X$ and $E$ is the expectation operator (Hosking, 1990). We estimated sample L‐moments indirectly, via probability weighted moments. The results are shown in Supplementary Table S2. We also estimated unadjusted average weekly rates for each outcome of interest by applying a generalised linear model with a negative binomial distributional assumption and log‐link function (GLM‐NB):

$$\ln(E[Y_{k,j,l}]) = \beta_0, \tag{3}$$

where $Y_{k,j,l}$ is the time series of weekly rates for k‐outcome for j‐Member State for l‐time interval. To estimate rates and their confidence interval (CI) bounds, we exponentiated regression coefficient estimates: $R_{k,j} = e^{\beta_0}$ and $CI(R_{k,j}) = e^{\beta_0 \pm M \sigma_{\beta_0}}$, where $M$ is a relevant constant. The results are shown in Supplementary Table S2. We explored the relationship between average rates, L‐moments and overall completeness metrics by expanding on traditional moment plots (Johnson et al., 1994).
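To make the L‐moment summaries in equation (2) concrete, the sketch below estimates the first four sample L‐moments from probability weighted moments, using the standard unbiased estimators given by Hosking (1990). It is not the authors' code: the function name, the returned labels and the example values are illustrative assumptions, and missing weeks are assumed to have been removed beforehand.

```python
from math import comb


def sample_l_moments(values):
    """L-mean, L-scale, L-skewness and L-kurtosis of a sample (needs n >= 4)."""
    x = sorted(values)
    n = len(x)
    if n < 4:
        raise ValueError("at least four observations are needed")

    def pwm(r):
        # Unbiased probability weighted moment b_r; i is the 0-based rank.
        return sum(comb(i, r) * xi for i, xi in enumerate(x)) / (n * comb(n - 1, r))

    b0, b1, b2, b3 = (pwm(r) for r in range(4))
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    if l2 == 0:
        raise ValueError("L-scale is zero (constant series); ratios are undefined")
    return {
        "L-mean": l1,
        "L-scale": l2,
        "L-skewness": l3 / l2,   # ratio of third L-moment to L-scale
        "L-kurtosis": l4 / l2,   # ratio of fourth L-moment to L-scale
    }


# Hypothetical weekly rates with one pronounced spike (right-skewed).
print(sample_l_moments([0, 0, 0, 1, 10]))
```

For a symmetric sample such as `[1, 2, 3, 4, 5]` the L‐skewness is zero, whereas a series with rare spikes yields a clearly positive value, which is the behaviour the moment plots are meant to expose.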

Outbreaks of influenza appeared seasonal, except for occasional pandemics when increased rates lasted longer than a typical 10–25‐week period. We estimated seasonal peak timing and its CI for each Member State and outcome using the δ‐method applied to the results of harmonic generalized linear models adapted for negative binomial assumptions (HGLM‐NB) (Table 1) (Naumova & MacNeill, 2007). When applied to each outcome of interest individually, the harmonic terms are used to fit seasonal periodic oscillations as follows:

$$\ln(E[Y_{k,j,t}]) = \beta_0 + \beta_s \sin(\varphi t) + \beta_c \cos(\varphi t), \tag{4}$$

where $Y_{k,j,t}$ is the time series of weekly rates for k‐outcome in j‐Member State; $\beta_s$ and $\beta_c$ are coefficients of the sine and cosine terms, respectively; $t$ is the consecutive week; and $\varphi = 2\pi\omega$ with $\omega = 1/52.25$ reflects the annual cycle in weeks (Table 1). When applying the δ‐method, the $\beta_s$ and $\beta_c$ coefficients allowed us to estimate the phase angle, $\psi$, which reflected the seasonal peak timing relative to the start of the calendar year, and the peak amplitude, $\gamma$, which reflected the difference between outcome values at the peak and nadir of a seasonal curve. Most importantly, the δ‐method allowed us to quantify uncertainties for the estimates of peak timing and amplitude (Naumova & MacNeill, 2007). We then recalibrated the phase angle to match the temporal unit of the analysis, in the presented case, an epidemiological week. The CI for peak timing also could be calibrated to reflect bounds of a calendar year with respect to the temporal unit of assessment such as 1–365.25 days, 1–52.25 weeks and 1–12 months. The results are shown in Supplementary Table S2.
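The variance expressions used with the δ‐method follow from a first‐order Taylor expansion of the transformed coefficients. The derivation below is a reading aid rather than the authors' own presentation; it assumes that $\psi = \arctan(\beta_s/\beta_c)$ and $\gamma = \sqrt{\beta_c^2 + \beta_s^2}$, and that $\sigma_s^2$, $\sigma_c^2$ and $\sigma_{\beta_s\beta_c}$ denote the variances and the covariance of the estimated harmonic coefficients.

```latex
% First-order (delta-method) approximation for a smooth function g(\beta_s, \beta_c)
\operatorname{Var}\big(g\big) \approx
    \left(\frac{\partial g}{\partial \beta_s}\right)^{2} \sigma_s^{2}
  + \left(\frac{\partial g}{\partial \beta_c}\right)^{2} \sigma_c^{2}
  + 2\,\frac{\partial g}{\partial \beta_s}\,\frac{\partial g}{\partial \beta_c}\,
    \sigma_{\beta_s\beta_c}

% Phase angle: g = \psi = \arctan(\beta_s / \beta_c)
\frac{\partial \psi}{\partial \beta_s} = \frac{\beta_c}{\beta_c^{2}+\beta_s^{2}},
\qquad
\frac{\partial \psi}{\partial \beta_c} = -\frac{\beta_s}{\beta_c^{2}+\beta_s^{2}}
\;\Longrightarrow\;
\operatorname{Var}(\psi) \approx
  \frac{\beta_c^{2}\sigma_s^{2} + \beta_s^{2}\sigma_c^{2}
        - 2\,\sigma_{\beta_s\beta_c}\,\beta_s\beta_c}
       {\left(\beta_c^{2}+\beta_s^{2}\right)^{2}}

% Amplitude: g = \gamma = \sqrt{\beta_c^{2}+\beta_s^{2}}
\frac{\partial \gamma}{\partial \beta_s} = \frac{\beta_s}{\gamma},
\qquad
\frac{\partial \gamma}{\partial \beta_c} = \frac{\beta_c}{\gamma}
\;\Longrightarrow\;
\operatorname{Var}(\gamma) \approx
  \frac{\beta_c^{2}\sigma_c^{2} + \beta_s^{2}\sigma_s^{2}
        + 2\,\sigma_{\beta_s\beta_c}\,\beta_s\beta_c}
       {\beta_c^{2}+\beta_s^{2}}
```

The opposite signs of the covariance terms come from the partial derivatives: they have opposite signs for the phase angle and the same sign for the amplitude.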

Table 1.

The δ‐method equations and notations for estimating peak timing of any single‐peaking infection using harmonic generalised linear models with a negative binomial distributional assumption and log‐link function (HGLM‐NB) applied to outcomes' time series.

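| Feature | Equations and notations | Comments |
|---|---|---|
| Phase angle, $\psi$ | $\psi = \arctan(\beta_s / \beta_c)$ | The phase angle describes seasonal peaks according to radial coordinates. |
| Variance of $\psi$, $Var(\psi)$ | $Var(\psi) = \dfrac{\beta_c^2 \sigma_s^2 + \beta_s^2 \sigma_c^2 - 2 \sigma_{\beta_s\beta_c} \beta_s \beta_c}{(\beta_c^2 + \beta_s^2)^2}$ | Estimates are derived from the standard error of harmonic terms and their covariance. |
| Peak timing, $PT$ | if $\beta_s > 0$ and $\beta_c > 0$, then $PT = \psi / \varphi$; if $\beta_c < 0$, then $PT = (\psi + \pi) / \varphi$; if $\beta_s < 0$ and $\beta_c > 0$, then $PT = (\psi + 2\pi) / \varphi$ | The peak timing is recalibrated to match the temporal unit of assessment with respect to a quadrant of the radial coordinates. |
| Standard error of $PT$, $\sigma_{PT}$ | $\sigma_{PT} = \sqrt{Var(\psi)} / \varphi$ | The standard error is recalibrated to match the temporal unit of assessment. |
| Confidence interval of $PT$, 95% CI ($PT$) | $95\%\,CI(PT) = PT \pm 1.96\,\sigma_{PT}$ | The 95% CI is calibrated to reflect the bounds of a calendar year. |
| Amplitude, $\gamma$ | $\gamma = \sqrt{\beta_c^2 + \beta_s^2}$ | The mathematical amplitude, or value midpoint between peak and nadir values |
| Variance of $\gamma$, $Var(\gamma)$ | $Var(\gamma) = \dfrac{\beta_c^2 \sigma_c^2 + \beta_s^2 \sigma_s^2 + 2 \sigma_{\beta_s\beta_c} \beta_s \beta_c}{\beta_c^2 + \beta_s^2}$ | Estimates are derived from the standard error of harmonic terms and their covariance. |
| Intensity, $\vartheta$ | $\vartheta = e^{\gamma}$ | The intensity is exponentiated to match the units of the outcome. |
| Confidence interval of $\vartheta$, CI ($\vartheta$) | $CI(\vartheta) = e^{\gamma \pm M \sigma_{\gamma}}$ | The CI is calculated to reflect the M‐based bounds of intensity. |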
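The quantities in Table 1 can be computed directly from fitted harmonic coefficients. The Python sketch below is an illustration rather than the authors' implementation: the function name, argument names and example coefficients are hypothetical, the variances and covariance are assumed to come from the covariance matrix of a previously fitted model, a zero sine coefficient is handled with the first‐quadrant rule (an assumption, since Table 1 uses strict inequalities), and the CI is not wrapped to the bounds of the calendar year.

```python
import math

WEEKS_PER_YEAR = 52.25
PHI = 2 * math.pi / WEEKS_PER_YEAR  # phi = 2 * pi * omega, omega = 1 / 52.25


def peak_timing(beta_s, beta_c, var_s, var_c, cov_sc, phi=PHI, z=1.96):
    """Peak timing, amplitude and their delta-method uncertainty (Table 1)."""
    if beta_c == 0:
        raise ValueError("beta_c = 0 is not covered by the rules in Table 1")
    denom = beta_c**2 + beta_s**2

    # Phase angle and its variance.
    psi = math.atan(beta_s / beta_c)
    var_psi = (beta_c**2 * var_s + beta_s**2 * var_c
               - 2 * cov_sc * beta_s * beta_c) / denom**2

    # Quadrant correction so that the phase lies in [0, 2*pi).
    if beta_c < 0:
        shift = math.pi
    elif beta_s < 0:
        shift = 2 * math.pi
    else:
        shift = 0.0  # assumption: beta_s == 0 is treated like the first case

    pt = (psi + shift) / phi            # peak timing in weeks
    se_pt = math.sqrt(var_psi) / phi    # standard error in weeks

    # Amplitude and its variance.
    gamma = math.sqrt(denom)
    var_gamma = (beta_c**2 * var_c + beta_s**2 * var_s
                 + 2 * cov_sc * beta_s * beta_c) / denom

    return {
        "peak_timing": pt,
        "se_peak_timing": se_pt,
        "ci_peak_timing": (pt - z * se_pt, pt + z * se_pt),
        "amplitude": gamma,
        "var_amplitude": var_gamma,
        "intensity": math.exp(gamma),
    }


# Hypothetical coefficients: equal sine and cosine terms, uncorrelated estimates.
print(peak_timing(0.5, 0.5, 0.01, 0.01, 0.0))
```

With equal positive sine and cosine coefficients the phase angle is π/4, so the estimated peak falls one eighth of the way through the 52.25‐week cycle.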

We then applied mixed effects model designs to estimate the global, regional and transmission zone‐specific seasonal peak timing:

$$\ln(E[Y_{k,j,t}]) = \beta_{0,j} + \beta_{s,j} \sin(\varphi t) + \beta_{c,j} \cos(\varphi t) + b_0 + b_s \sin(\varphi t) + b_c \cos(\varphi t) + aZ, \tag{5}$$

where random effects capture seasonal oscillations for each Member State and fixed effects define general seasonality features for various spatial allocation schemes, including global, regional and zonal estimates, $Z$. The estimates of peak timing and country‐specific rates were compiled in Supplementary Table S3 and used to produce visualisations. The model parameters can be estimated using standard statistical software, for example the lmer function (lme4 package) in R.

3.3. Creation of Data Visualisations

To communicate the findings that contain modelling results for each of 12 outcomes for 170 countries under three spatial allocation schemes along with varying degrees of data completeness, we produced visualisation templates that illustrate the essential characteristics and spatio‐temporal patterns of disease seasonality globally and regionally. This compilation, or analecta of visuals, integrates information using a variety of plot types, including boxplots, scatterplots, time series plots, heatmaps and cartographic maps. The visuals also use various statistical graphing techniques, such as shared axes, multi‐panel organisation and colour gradation and coordination, to help readers examine multiple spatio‐temporal relationships simultaneously.

We performed data preparation, conducted statistical analyses and created data visualisations using Stata (SE/16.1) and R (3.6.3) software. The analecta of images for all 12 outcomes, R and Stata software codes and Supplementary tables are available in our public repository: https://doi.org/10.6084/m9.figshare.19583908.v1 (Naumova et al., 2022).

4. Visualisations as a Science Communication Tool

Figure 1 captures the distribution of seasonal peaks with respect to Member State latitude and the overall rate of reported influenza positive cases (Supplementary tables). Adjacent to the scatterplot is a Miller cylindrical projection map to ease the interpretation of well‐delineated clusters. Latitude serves as the shared axis to align clusters visible in the scatterplot with well‐recognised geographical locations for a Member State and region. For example, EURO Member States are clustered at ~3–6 weeks and ~40°N–60°N latitude. In contrast, PAHO Member States have a diffused spread of seasonal peaks near 20°N latitude with relatively high rates of influenza whereas WPRO Member States cluster at ~20–30 weeks and ~0°S–40°S latitude. By animating the sequence of annual peak timing and rate estimates, this template can be transformed into a dynamic map to visualise shifts in peak timing and rates over the 26‐year study period (Castronovo et al., 2009; Moorthy et al., 2012; Chui et al., 2011b; Simpson et al., 2020b).

Figure 1.

Seasonal peak timing estimates, derived from the mixed effects model, for the overall rate of reported influenza (Flu) positive cases per million (cpm) by the Member State‐assigned latitude and WHO regions. The scatterplot (left panel) illustrates the complex relationship between seasonal peak timing and location while emphasising region‐specific clusters. The map (right panel) serves to clarify the locations and regions as well as provide the outcome's spatial distribution. The data and codes to produce visualisations for all outcomes can be found in Supplementary Table S3 and on figshare: https://doi.org/10.6084/m9.figshare.19583908.v1 (Naumova et al., 2022). AFRO, African; EMRO, Eastern Mediterranean; EURO, European; NA, Not Available; PAHO, Pan‐American; SEARO, Southeast Asian; WPRO, Western Pacific.

To assess the statistical properties of available time series, we examined the pair‐wise relationships between L‐mean, L‐scale, L‐skewness and L‐kurtosis as a plot of moments with shared axes (Figure 2). As in Figure 1, we used coloured markers to discern WHO regions and created a standardised template replicable for all outcomes. Our moments plot depicts the overall behaviour of a selected outcome across Member States and can be used to identify situations when outcome distributions could be subject to a high degree of skewness given high or low overall average values. We selected A(H3) and A(H1N1)pdm09 positive tests to demonstrate differences in behaviours of seasonal and pandemic influenza subtypes, respectively. For example, A(H1N1)pdm09 positive tests show rapid spikes in infection illustrated by high L‐skewness and L‐kurtosis values, which may be handled poorly by traditionally applied distributional assumptions and call for advanced statistical methodologies.

Figure 2.

Plots of four L‐moments (L‐mean, L‐scale, L‐skewness and L‐kurtosis) for all WHO Member States for seasonal A(H3) positive tests (left panel) and A(H1N1)pdm09 positive tests (right panel). Marker colours help delineate region‐specific relationships. The data and codes to produce visualisations for all outcomes can be found in Supplementary Table S4 and on figshare: https://doi.org/10.6084/m9.figshare.19583908.v1 (Naumova et al., 2022). AFRO, African; EMRO, Eastern Mediterranean; EURO, European; PAHO, Pan‐American; SEARO, Southeast Asian; WPRO, Western Pacific.

We developed a visual template consisting of aligned boxplots and heatmaps to emphasise the extent of missingness in reported surveillance data (Figure 3). We presented outcomes in descending order by average overall completeness, which ranged from 40% to 5% across outcomes. This visual indicates, for each outcome of interest, how reliable the estimates of seasonality characteristics are likely to be. For example, the graph shows that the total numbers of tests and positive tests are the most reliable outcomes to conduct a seasonality analysis. To compare by geographical location and region, we applied a monochromatic heatmap with the sorting order based on Member State‐specific average overall completeness. This arrangement depicts the general patterns of missingness for outcomes, Member States and regions simultaneously. For example, some Member States achieve over 80% data completeness, which makes them strong candidates for an in‐depth data analysis.

Figure 3.

A composite of boxplots and heatmaps of completeness for all 14 available influenza outcomes including specimens processed (SPE), tests (TES), negative tests (NEG), and total (POS), influenza A (FLUA), influenza B (FLUB), A(H1), A(H1N1)pdm09 (A(H1N1)), A(H3), A(H5), A (Unsubtyped) (A (UNS)), B (Yamagata) (B (YAM)), B (Victoria) (B (VIC)), and B (Undetermined) (B (UND)) positive tests. Outcomes are presented in descending order by average overall completeness whereas regions and Member States are presented in descending order by regional and State‐specific average overall completeness. The data and codes to produce visualisations for all outcomes can be found in Supplementary Table S5 and on figshare: https://doi.org/10.6084/m9.figshare.19583908.v1 (Naumova et al., 2022). AFRO, African; EMRO, Eastern Mediterranean; EURO, European; PAHO, Pan‐American; SEARO, Southeast Asian; WPRO, Western Pacific.

We found that completeness was ~10‐times greater for general outcomes, like weekly rates of tests and total positive tests (41.42%, 95% CI = [20.93, 53.66] and 40.04%, 95% CI = [18.90, 55.32]) as compared to the rate of negative tests (3.48%, 95% CI = [0.00, 16.51]) or rarely tested strains. There are clear differences in attitudes to reporting negative tests; whereas PAHO Member States adhered well to reporting negative tests, all other regions were reluctant to report perhaps due to redundant reporting. A combination of both positive and negative tests could offer an additional check on data quality and reliability.

By extending this approach to the annualised data, researchers can achieve higher granularity and further define and detect temporal structural missingness when records are limited or missing during a specific time of the year. With further rearrangement, this template can be transformed into a sequence of animated annual estimates to demonstrate changes over time. These annual trends and fluctuations can illuminate potential deterioration of surveillance data due to reduced resources or improvements in surveillance capacity after implementing targeted activities.

To further elaborate on global spatial patterns and the seasonal behaviour of influenza, we created a composite of a line graph for global weekly rate estimates and heatmaps of average weekly rates by Member State and region (Figure 4). While this template allows researchers to change the sorting order based on any organisational principle or variable, the presented heatmap is arranged to reflect the degree of completeness (from high to low) for influenza positive tests. As shown in Figure 4, the global early spring peak in influenza positive tests appeared to be driven by EURO Member States and countries of the northern hemisphere. A substantial drop in reporting, especially in the Central America and the Caribbean transmission zone within PAHO, triggers concerns over the data analysis quality when records barely reached 20–30% completeness (shown in grey colour in Figure 4).

Figure 4.

A composite of a line graph for global weekly rate estimates for influenza positive tests and heatmaps of average weekly rates by Member State and WHO‐defined region indicating periods with missing data (shown in grey colour). The data and codes to produce visualisations for all outcomes can be found in Supplementary Table S6 and on figshare: https://doi.org/10.6084/m9.figshare.19583908.v1 (Naumova et al., 2022). AFRO, African; EMRO, Eastern Mediterranean; EURO, European; PAHO, Pan‐American; SEARO, Southeast Asian; WPRO, Western Pacific.

5. Discussion

With the presented study, we illustrate how linear mixed effects models with harmonic components can be applied to describe, explain and predict seasonal influenza profiles across varying spatial locations. The proposed framework allows us to adjust for spatial features, temporal dependencies and data completeness within individual locations and to capture differences among Member States within the same geographic context. We also emphasised the value of developing comprehensive visualisations to communicate complex relationships and spatio‐temporal patterns to broad audiences.

Data quality and credibility affect inferences and conclusions of data analysis that in turn affect public health recommendations and preparedness plans (Rosenbaum et al., 2021; Long et al., 2022). Given the economic, social, political and public health repercussions of global infections including COVID‐19, the need for improved data and model quality and timeliness will continue to grow. From the onset of the pandemic, many sources provided daily records that allowed researchers to detect a complex day‐of‐the‐week effect, a feature valuable to better anticipate fluctuations in SARS‐CoV‐2 testing and manage appropriate workflow for health care workers and public health officials (Simpson et al., 2022c). Weekly aggregates mask such effects.

Furthermore, the need for detailed analysis of granular data at refined temporal and spatial scales requires refinement of the methodology for producing and assembling modelling results. The analysis should involve directional statistics and wrapped distributions dealing with temporal periodicity and spatial spherical data representations. As statisticians and public health practitioners work together, we should place greater attention on expanding the range of statistical models to fully utilise publicly available surveillance data. We should also examine the best practices in developing and implementing analytical workflows that assess the usability of reported surveillance records for establishing global and local spatio‐temporal patterns (Simpson et al., 2021).

Finally, these approaches offer ways for researchers and practitioners to align with the Findability, Accessibility, Interoperability and Reusability (FAIR) principles that are proactively supported by funding agencies (Wilkinson et al., 2016). These four foundational principles guide data curators in maximising the added value gained by contemporary data sharing opportunities. To improve the infrastructure supporting the reuse of scholarly data, we advocate for supplementing public data with rigorously structured metadata so that diverse groups of potential users can recognise the anticipated quality of findings, select methods suitable for analyses and offer recommendations for further data collection improvements. We offer nuanced aspects for implementing FAIR principles for surveillance data, for which completeness and structural missingness must be recognised and addressed. In addition, the FAIR principles emphasise enhancing capabilities of digital technologies for automatic identification and usage of data. Adoption of automation processes will help ensure that the process of data collection, analysis, visualisation and dissemination is both credible and reliable.

6. Concluding Remarks

With growing capacities for disease surveillance demonstrated during the COVID‐19 pandemic, the global research community should pay greater attention to statistical models and analytical tools that make better use of publicly available surveillance data. Future directions could include the evaluation of national and global dashboards with a focus on the use of proper, well‐tailored statistical tools and data quality metrics. As more threatening novel pathogens appear, we implore the global health community to use more sophisticated analytical workflows to model trends of infections, forecast outbreak events and inform preparedness and response plans.

Funding information

This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via award 2017‐17072100002. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA or the US Government. The US Government is authorised to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein. The United States Department of Agriculture (USDA) National Institute of Food and Agriculture (NIFA) Cooperative State Research, Education, and Extension Service Fellowship supported Ryan B. Simpson via award 2020‐38420‐30724. This work was supported in part by the STOP Spillover project through the United States Agency for International Development (USAID) via cooperative agreement 7200AA20CA00032. The contents are the responsibility of STOP Spillover and do not necessarily reflect the views of USAID or the US Government. The Tufts University Data Intensive Studies Center (DISC) Seed Grant and the National Science Foundation's Innovations of Graduate Education Program's SOLution‐oriented, Student‐Initiated, Computationally‐Enriched (SOLSTICE) approach, via award 1855886, also supported this research.

Conflict of Interest

The authors declare no conflict of interest.

Supporting information

Table S1. Supporting information.

Table S2. Supporting information.

Table S3. Supporting information.

Table S4. Supporting information.

Table S5. Supporting information.

Table S6. Supporting information.

Naumova, E. N. , Simpson, R. B. , Zhou, B. , and Hartwick, M. A. (2022) Global seasonal and pandemic patterns in influenza: An application of longitudinal study designs. International Statistical Review, 90: S82–S95. 10.1111/insr.12529.

Data Availability Statement

The authors have shared time series data, model results, analytical and visualisation codes, and Supplementary tables and visualisations on figshare: https://doi.org/10.6084/m9.figshare.19583908.v1 (Naumova et al., 2022).

References

  1. Alarcon Falconi, T.M., Cruz, M.S. & Naumova, E.N. (2018). The shift in seasonality of legionellosis in the USA. Epidemiol. Infect., 146(14), 1824–1833.
  2. Alarcon Falconi, T.M., Estrella, B., Sempértegui, F. & Naumova, E.N. (2020). Effects of data aggregation on time series analysis of seasonal infections. Int. J. Environ. Res. Public Health, 17(16), 5887.
  3. Alonso, W.J., Yu, C., Viboud, C., Richard, S.A., Schuck‐Paim, C., Simonsen, L., Mello, W.A. & Miller, M.A. (2015). A global map of hemispheric influenza vaccine recommendations based on local patterns of viral circulation. Sci. Rep., 5(1), 1–6.
  4. Bloom‐Feshbach, K., Alonso, W.J., Charu, V., Tamerius, J., Simonsen, L., Miller, M.A. & Viboud, C. (2013). Latitudinal variations in seasonal activity of influenza and respiratory syncytial virus (RSV): A global comparative review. PLoS ONE, 8(2), e54445.
  5. Caini, S., Alonso, W.J., Séblain, C.E.G., Schellevis, F. & Paget, J. (2017). The spatiotemporal characteristics of influenza A and B in the WHO European Region: Can one define influenza transmission zones in Europe? Euro. Surveill., 22(35), 30606.
  6. Castronovo, D.A., Chui, K.K. & Naumova, E.N. (2009). Dynamic maps: A visual‐analytic methodology for exploring spatio‐temporal disease patterns. J. Environ. Health, 8(1), 61.
  7. Chui, K.K., Cohen, S.A. & Naumova, E.N. (2011a). Snowbirds and infection—New phenomena in pneumonia and influenza hospitalizations from winter migration of older adults: A spatiotemporal analysis. BMC Public Health, 11(1), 444.
  8. Chui, K.K., Webb, P., Russell, R.M. & Naumova, E.N. (2009). Geographic variations and temporal trends of Salmonella‐associated hospitalization in the U.S. elderly, 1991–2004: A time series analysis of the impact of HACCP regulation. BMC Public Health, 9(1), 447.
  9. Chui, K.K., Wenger, J.B., Cohen, S.A. & Naumova, E.N. (2011b). Visual analytics for epidemiologists: Understanding the interactions between age, time, and disease with multi‐panel graphs. PLoS ONE, 6(2), e14683.
  10. Cnaan, A., Laird, N.M. & Slasor, P. (1997). Using the general linear mixed model to analyse unbalanced repeated measures and longitudinal data. Stat. Med., 16(20), 2349–2380.
  11. Dong, E., Du, H. & Gardner, L. (2020). An interactive web‐based dashboard to track COVID‐19 in real time. Lancet Infect. Dis., 20(5), 533–534.
  12. Fefferman, N. & Naumova, E.N. (2010). Innovation in observation: a vision for early outbreak detection. Emerg. Health Threats J., 3(1), 7103.
  13. Fitzmaurice, G.M., Laird, N.M. & Ware, J.H. (2012). Applied Longitudinal Analysis. John Wiley & Sons.
  14. Hosking, J.R.M. (1990). L‐moments: Analysis and estimation of distributions using linear combinations of order statistics. J. R. Stat. Soc. Ser. B Stat Methodol., 52(1), 105–124.
  15. Institut National de la Statistique et des Etudes Economiques (2021). Statistics and Studies. Available at https://www.insee.fr/en/statistiques. Accessed December 2021.
  16. Johnson, N.L., Kotz, S. & Balakrishnan, N. (1994). Continuous Univariate Distributions, 1, 2nd ed. India: Wiley Series in Probability and Statistics.
  17. Laird, N.M. (1988). Missing data in longitudinal studies. Stat. Med., 7(1–2), 305–315.
  18. Laird, N.M. (2022). The analysis of longitudinal studies. Int. Stat. Rev. 10.1111/insr.12523
  19. Laird, N.M. & Ware, J.H. (1982). Random‐effects models for longitudinal data. Biometrics, 38(4), 963–974.
  20. Long, S., Loutfi, D., Kaufman, J.S. & Schuster, T. (2022). Limitations of Canadian COVID‐19 data reporting to the general public. J. Public Health Policy, 43, 203–221.
  21. Moorthy, M., Castronovo, D., Abraham, A., Bhattacharyya, S., Gradus, S., Gorski, J., Naumov, Y.N., Fefferman, N.H. & Naumova, E.N. (2012). Deviations in influenza seasonality: Odd coincidence or obscure consequence? Clin. Microbiol. Infect., 18(10), 955–962.
  22. Naumova, E.N. (2006). Mystery of seasonality: Getting the rhythm of nature. J. Public Health Policy, 27(1), 2–12.
  23. Naumova, E.N., Jagai, J.S., Matyas, B., DeMaria, A., MacNeill, I.B. & Griffiths, J.K. (2007). Seasonality in six enterically transmitted diseases and ambient temperature. Epidemiol. Infect., 135(2), 281–292.
  24. Naumova, E.N. & MacNeill, I.B. (2007). Seasonality assessment for biosurveillance systems. In Advances in Statistical Methods for the Health Sciences (pp. 437–450). Birkhäuser Boston; Boston, MA.
  25. Naumova, E.N., Simpson, R.B., Zhou, B. & Hartwick, M.A. (2022). Global Seasonal and Pandemic Patterns in Influenza: An Application of Longitudinal Study Designs. figshare. Available at 10.6084/m9.figshare.19583908.v1
  26. Rosenbaum, J.E., Stillo, M., Graves, N. & Rivera, R. (2021). Timeliness of provisional United States mortality data releases during the COVID‐19 pandemic: Delays associated with electronic death registration system and weekly mortality. J. Public Health Policy, 42(4), 536–549.
  27. SimpleMaps.com (2021). World Cities Database. Available at https://simplemaps.com/data/world‐cities. Accessed July 2021.
  28. Simpson, R.B., Babool, S., Tarnas, M.C., Kaminski, P.M., Hartwick, M.A. & Naumova, E.N. (2022b). Dynamic mapping of cholera spread and conflict severity during the Yemeni Civil War, 2016–2019. J. Public Health Policy, 45(2), 185–202.
  29. Simpson, R.B., Gottlieb, J., Zhou, B., Hartwick, M.A. & Naumova, E.N. (2021). Completeness of open access FluNet influenza surveillance data for Pan‐America in 2005–2019. Sci. Rep., 11(1), 1–17.
  30. Simpson, R.B., Kulinkina, A.V. & Naumova, E.N. (2022a). Investigating seasonal patterns in enteric infections: A systematic review of time series methods. Epidemiol. Infect., 150, 1–25.
  31. Simpson, R.B., Lauren, B.N., Schipper, K.H., McCann, J.C., Tarnas, M.C. & Naumova, E.N. (2022c). Critical periods, critical time points and day‐of‐the‐week effects in COVID‐19 surveillance data: An example in Middlesex County, Massachusetts, USA. Int. J. Environ. Res. Public Health, 19(3), 1321.
  32. Simpson, R.B., Zhou, B., Alarcon Falconi, T.M. & Naumova, E.N. (2020b). An analecta of visualizations for foodborne illness trends and seasonality. Sci. Data, 7(1), 1–15.
  33. Simpson, R.B., Zhou, B. & Naumova, E.N. (2020a). Seasonal synchronization of foodborne outbreaks in the United States, 1996–2017. Sci. Rep., 10(1), 1–16.
  34. Stashevsky, P.S., Yakovina, I.N., Alarcon Falconi, T.M. & Naumova, E.N. (2019). Agglomerative clustering of enteric infections and weather parameters to identify seasonal outbreaks in cold climates. Int. J. Environ. Res. Public Health, 16(12), 2083.
  35. United States Census Bureau (2021). International Database: World Population Estimates and Projections. Available at https://www.census.gov/programs‐surveys/international‐programs/about/idb.html. Accessed December 2021.
  36. United States Centers for Disease Control and Prevention (2022). National Notifiable Diseases Surveillance System. Available at https://data.cdc.gov/browse?category=NNDSS. Accessed March 2022.
  37. Ureña‐Castro, K., Ávila, S., Gutierrez, M., Naumova, E.N., Ulloa‐Gutierrez, R. & Mora‐Guevara, A. (2019). Seasonality of rotavirus hospitalizations at Costa Rica's National Children's Hospital in 2010–2015. Int. J. Environ. Res. Public Health, 16(13), 2321.
  38. Wenger, J.B. & Naumova, E.N. (2010). Seasonal synchronization of influenza in the United States older adult population. PLoS ONE, 5(4), e10187.
  39. Wilkinson, M.D., Dumontier, M., Jan Aalbersberg, I., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.W., da Silva Santos, L.B., Bourne, P.E., Bouwman, J., Brookes, A.J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C.T., Finkers, R., Gonzalez‐Beltran, A., Gray, A.J.G., Groth, P., Goble, C., Grethe, J.S., Heringa, J., Hoen, P.A.C., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S.J., Martone, M.E., Mons, A., Packer, A.L., Persson, B., Rocca‐Serra, P., Roos, M., van Schaik, R., Sansone, S.A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M.A., Thompson, M., van der Lei, J., van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J. & Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data, 3(1), 1–9.
  40. World Health Organization (2021). Global Influenza Programme. Available at https://www.who.int/tools/flunet. Accessed June 2021.

Articles from International Statistical Review are provided here courtesy of Wiley