2024 Sep 25;13:RP91849. doi: 10.7554/eLife.91849

Figure 2. Antigenic and genetic evolution of seasonal influenza A(H3N2) viruses, 1997–2019.

(A–B) Temporal phylogenies of the (A) hemagglutinin (H3) and (B) neuraminidase (N2) gene segments. Tip color denotes the Hamming distance from the root of the tree, based on the number of substitutions at epitope sites in H3 (N=129 sites) and N2 (N=223 sites). Black ‘X’ marks indicate the phylogenetic positions of U.S. recommended vaccine strains. (C–D) Seasonal genetic and antigenic distances are the mean distances between A(H3N2) viruses circulating in the current season t and viruses circulating in the prior season (t – 1), measured by (C) five sequence-based metrics: HA epitope (N=129 sites), HA receptor binding site (RBS; N=7 sites), HA stalk footprint (N=34 sites), and NA epitope (N=223 or N=53 sites); and (D) hemagglutination inhibition (HI) titer measurements. (E) The Shannon diversity of H3 and N2 local branching index (LBI) values in each season. Vertical bars in (C), (D), and (E) are 95% confidence intervals of seasonal estimates from five bootstrapped phylogenies.
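
As an illustration of the sequence-based distances in (C), the following minimal sketch counts substitutions at a chosen set of epitope sites and averages that count over all pairs of viruses from two consecutive seasons. It assumes aligned amino-acid sequences of equal length; the function names, toy sequences, and site indices are hypothetical, and the study's own pipeline may compute the seasonal mean differently.

```python
from itertools import product
from statistics import mean

def epitope_hamming(seq_a, seq_b, epitope_sites):
    """Count the epitope sites (0-based indices) at which two aligned sequences differ."""
    return sum(seq_a[i] != seq_b[i] for i in epitope_sites)

def seasonal_distance(current, previous, epitope_sites):
    """Mean epitope Hamming distance over all (current, previous) sequence pairs."""
    return mean(
        epitope_hamming(a, b, epitope_sites)
        for a, b in product(current, previous)
    )

# Toy aligned sequences and site indices (hypothetical, not real H3 data).
season_prev = ["MKTIIALSYI", "MKTIIALSYI"]
season_curr = ["MKTILALSYF", "MKTIIALSYF"]
toy_epitope_sites = [4, 9]

print(seasonal_distance(season_curr, season_prev, toy_epitope_sites))  # prints 1.5
```

Restricting the comparison to a site list is what distinguishes the epitope, RBS, and stalk footprint metrics from one another: the same function applies, with a different set of indices.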

Figure 2—source data 1. A/H3 sequence counts in five subsampled datasets.
We downloaded all H3 sequences and associated metadata from the GISAID EpiFlu database and focused our analysis on complete H3 sequences that were sampled between January 1, 1997, and October 1, 2019. To account for variation in sequence availability across global regions, we subsampled the selected sequences five times to representative sets of no more than 50 viruses per month, with preferential sampling of North America. Each month, up to 25 viruses were selected from North America (when available) and up to 25 viruses were selected from nine other global regions (when available), with even sampling across those regions (China, Southeast Asia, West Asia, Japan and Korea, South Asia, Oceania, Europe, South America, and Africa).
Figure 2—source data 2. A/N2 sequence counts in five subsampled datasets.
We downloaded all N2 sequences and associated metadata from the GISAID EpiFlu database and focused our analysis on complete N2 sequences that were sampled between January 1, 1997, and October 1, 2019. To account for variation in sequence availability across global regions, we subsampled the selected sequences five times to representative sets of no more than 50 viruses per month, with preferential sampling of North America. Each month, up to 25 viruses were selected from North America (when available) and up to 25 viruses were selected from nine other global regions (when available), with even sampling across those regions (China, Southeast Asia, West Asia, Japan and Korea, South Asia, Oceania, Europe, South America, and Africa).
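
The monthly subsampling scheme described for the H3 and N2 datasets can be sketched as follows. This is a simplified illustration and not the authors' code: it assumes each record is a (strain, region) pair for a single month, and it implements "even sampling" across the nine other regions as a round-robin draw, which is one plausible reading of the description. The function name and toy records are hypothetical.

```python
import random
from collections import defaultdict

OTHER_REGIONS = [
    "China", "Southeast Asia", "West Asia", "Japan and Korea", "South Asia",
    "Oceania", "Europe", "South America", "Africa",
]

def subsample_month(records, rng, per_group=25):
    """Select up to 25 North American strains and up to 25 from the other regions."""
    by_region = defaultdict(list)
    for strain, region in records:
        by_region[region].append(strain)
    for strains in by_region.values():
        rng.shuffle(strains)  # random order within each region

    north_america = by_region["North America"][:per_group]

    # Round-robin over the other regions so each contributes as evenly as possible.
    pools = [by_region[region] for region in OTHER_REGIONS]
    rng.shuffle(pools)  # avoid always favoring the same region when truncating
    others = []
    depth = 0
    while len(others) < per_group and any(depth < len(pool) for pool in pools):
        for pool in pools:
            if depth < len(pool) and len(others) < per_group:
                others.append(pool[depth])
        depth += 1
    return north_america + others

# Toy month of records (hypothetical strain names).
toy = (
    [(f"NA{i}", "North America") for i in range(40)]
    + [(f"EU{i}", "Europe") for i in range(30)]
    + [(f"AF{i}", "Africa") for i in range(3)]
)
picked = subsample_month(toy, random.Random(1))
print(len(picked))
```

Calling the function with five different random seeds would give five replicate datasets, analogous to the five subsampled datasets summarized in the source data.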

Figure 2—figure supplement 1. The number of A/H3 sequences in five subsampled datasets in each month and in each influenza season.

In each panel, the five subsampled datasets are plotted individually, but individual time series are difficult to distinguish because sequence counts differ only slightly across the datasets. (A) The number of sequences in subsampled datasets in each month collected in North America (purple) versus nine other world regions combined (dark green). (B) The total number of sequences in subsampled datasets collected in each month in all world regions combined. (C) The number of sequences in subsampled datasets in each season collected in North America (purple) versus nine other world regions combined (dark green). (D) The total number of sequences in subsampled datasets collected in each season in all world regions combined.
Figure 2—figure supplement 2. The number of A/N2 sequences in five subsampled datasets in each month and in each influenza season.

In each panel, the five subsampled datasets are plotted individually, but individual time series are difficult to distinguish because sequence counts differ only slightly across the datasets. (A) The number of sequences in subsampled datasets in each month collected in North America (purple) versus nine other world regions combined (dark green). (B) The total number of sequences in subsampled datasets collected in each month in all world regions combined. (C) The number of sequences in subsampled datasets in each season collected in North America (purple) versus nine other world regions combined (dark green). (D) The total number of sequences in subsampled datasets collected in each season in all world regions combined.
Figure 2—figure supplement 3. Comparison of seasonal antigenic drift measured by substitutions at H3 epitope sites and HI log2 titer measurements, from seasons 1997–1998 to 2018–2019.

Spearman’s rank correlations between H3 epitope distance and HI log2 titer distance at (A) one-season lags and (B) two-season lags. Correlation coefficients and associated p-values are shown in the top right section of each plot. Seasonal antigenic distance is the mean distance between viruses circulating in the current season t and viruses circulating in the prior season (t – 1 year, one-season lag) or two seasons earlier (t – 2 years, two-season lag). Seasonal distances are scaled because H3 epitope distance and HI log2 titer distance use different units of measurement. Point labels indicate the current influenza season, and point color denotes the relative timing of influenza seasons, with earlier seasons shaded dark purple (e.g. 1997–1998) and later seasons shaded light yellow (e.g. 2018–2019). H3 epitope distance and HI log2 titer distance at two-season lags capture expected ‘jumps’ in antigenic drift during key seasons previously associated with major antigenic transitions (Smith et al., 2004), such as the SY97 cluster seasons (1997–1998, 1998–1999, 1999–2000), the FU02 cluster season (2003–2004), and the CA04 cluster season (2004–2005).
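
The following minimal sketch shows how a Spearman's rank correlation between two seasonal distance series can be computed, and that z-score scaling changes the plotted units but not the rank correlation. The distance values are hypothetical placeholders rather than the study's estimates, the helper names are illustrative, and p-values are not computed here (a statistics library such as SciPy's `scipy.stats.spearmanr` would return both).

```python
from statistics import mean, pstdev

def average_ranks(values):
    """Rank values from 1 to n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

def zscore(values):
    """Center and scale values to zero mean and unit standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical seasonal distances in different units.
epitope_distance = [1.0, 4.0, 2.0, 6.0, 3.0]
hi_titer_distance = [0.2, 0.9, 0.5, 1.4, 0.4]

rho_raw = spearman(epitope_distance, hi_titer_distance)
rho_scaled = spearman(zscore(epitope_distance), zscore(hi_titer_distance))
print(round(rho_raw, 3), round(rho_scaled, 3))  # prints 0.9 0.9
```

Because ranks are unchanged by any order-preserving rescaling, scaling affects only how the points are displayed on shared axes.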
Figure 2—figure supplement 4. Pairwise correlations between H3 and N2 evolutionary indicators (one-season lags).

Spearman’s rank correlations between seasonal measures of H3 and N2 evolution, including H3 RBS distance, H3 epitope distance, H3 non-epitope distance, H3 stalk footprint distance, HI log2 titer distance, N2 epitope distance based on 223 or 53 epitope sites, N2 non-epitope distance, and the standard deviation (s.d.) and Shannon diversity of H3 and N2 local branching index (LBI) values in the current season t. Seasonal distances were estimated as the mean distance between viruses circulating in the current season t and viruses circulating in the prior season (t – 1). The color of each circle indicates the strength and direction of the association, from dark red (strong positive correlation) to dark blue (strong negative correlation). Stars within circles indicate statistical significance (adjusted p<0.05). The Benjamini and Hochberg method was used to adjust p-values for multiple testing.
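
For readers unfamiliar with the Benjamini and Hochberg adjustment, a minimal sketch of the standard step-up procedure is given below. The p-values are hypothetical and the function name is illustrative; the study's own analysis software is not specified here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Walk from the largest p-value to the smallest, tracking a running minimum.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values from five pairwise correlations.
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
print(significant)  # prints [True, True, False, False, False]
```

In this toy example, two raw p-values below 0.05 (0.039 and 0.041) are no longer significant after adjustment, which is why stars in the correlation matrices are based on adjusted rather than raw p-values.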
Figure 2—figure supplement 5. Pairwise correlations between H3 and N2 evolutionary indicators (two-season lags).

We measured Spearman’s rank correlations between seasonal measures of H3 and N2 evolution, including H3 RBS distance, H3 epitope distance, H3 non-epitope distance, H3 stalk footprint distance, HI log2 titer distance, N2 epitope distance based on 223 or 53 epitope sites, N2 non-epitope distance, and the standard deviation (s.d.) and Shannon diversity of H3 and N2 local branching index (LBI) values in the current season t. Seasonal distances were estimated as the mean distance between viruses circulating in the current season t and viruses circulating two seasons earlier (t – 2). The color of each circle indicates the strength and direction of the association, from dark red (strong positive correlation) to dark blue (strong negative correlation). Stars within circles indicate statistical significance (adjusted p<0.05). The Benjamini and Hochberg method was used to adjust p-values for multiple testing.
Figure 2—figure supplement 6. Pairwise correlations between H3 and N2 evolutionary indicators (one- and two-season lags).

We measured Spearman’s rank correlations between seasonal measures of H3 and N2 evolution, including H3 RBS distance, H3 epitope distance, H3 non-epitope distance, H3 stalk footprint distance, HI log2 titer distance, N2 epitope distance based on 223 or 53 epitope sites, N2 non-epitope distance, and the standard deviation (s.d.) and Shannon diversity of H3 and N2 local branching index (LBI) values in the current season t. Seasonal distances were estimated as the mean distance between viruses circulating in the current season t and viruses circulating in the prior season (t – 1) or two seasons earlier (t – 2). The color of each circle indicates the strength and direction of the association, from dark red (strong positive correlation) to dark blue (strong negative correlation). Stars within circles indicate statistical significance (adjusted p<0.05). The Benjamini and Hochberg method was used to adjust p-values for multiple testing.
Figure 2—figure supplement 7. Comparison of seasonal antigenic drift measured by substitutions at H3 and N2 epitope sites, from seasons 1997–1998 to 2018–2019.

Spearman’s rank correlations between H3 epitope distance and N2 epitope distance at (A) one-season lags and (B) two-season lags. Correlation coefficients and associated p-values are shown in the top right section of each plot. Seasonal epitope distance is the mean distance between viruses circulating in the current season t and viruses circulating in the prior season (t – 1, one-season lag) or two seasons earlier (t – 2, two-season lag). Point labels indicate the current influenza season, and point color denotes the relative timing of influenza seasons, with earlier seasons shaded dark purple (e.g. 1997–1998) and later seasons shaded light yellow (e.g. 2018–2019). H3 epitope distance at two-season lags and N2 epitope distance at one-season lags capture expected ‘jumps’ in antigenic drift during key seasons previously associated with major antigenic transitions (Smith et al., 2004), such as the SY97 cluster seasons (1997–1998, 1998–1999, 1999–2000), the FU02 cluster season (2003–2004), and the CA04 cluster season (2004–2005).