Journal of Virology. 2023 Oct 26;97(11):e01329-23. doi: 10.1128/jvi.01329-23

Deep mutational scanning reveals the functional constraints and evolutionary potential of the influenza A virus PB1 protein

Yuan Li 1, Sarah Arcos 1, Kimberly R Sabsay 2,3, Aartjan J W te Velthuis 2, Adam S Lauring 1,4
Editor: Anice C Lowen 5
PMCID: PMC10688322  PMID: 37882522

ABSTRACT

The influenza virus polymerase is central to influenza virus evolution. Adaptive mutations within the polymerase are often a prerequisite for efficient spread of novel animal-derived viruses in human populations. The polymerase also determines fidelity and, therefore, the rate at which the virus will acquire mutations that lead to host range expansion, drug resistance, or antigenic drift. Despite its importance to viral replication and evolution, our understanding of the mutational effects and associated constraints on the influenza RNA-dependent RNA polymerase (RdRp) is relatively limited. We performed deep mutational scanning of the A/WSN/1933(H1N1) polymerase basic 1 (PB1), generating a library of 95.4% of amino acid substitutions at 757 sites. After accuracy filters, we were able to measure replicative fitness for 13,354 (84%) of all possible amino acid substitutions, and 13 were validated by results from pairwise competition assays. Functional and structural constraints were better revealed by individual sites involved in RNA or protein interactions than by major subdomains defined by sequence conservation. Mutational tolerance, as defined by site entropy, was correlated with evolutionary potential, as captured by diversity in the available H1N1 sequences. Of the 29 beneficial sites, many have either been identified in the natural evolution of PB1 or shown experimentally to have important impacts on replication and adaptation. Accessibility of amino acid substitutions by single nucleotide mutation was a key factor in determining whether mutations appeared in natural PB1 evolution. Our work provides a comprehensive map of mutational effects on a viral RdRp and a valuable resource for subsequent studies of influenza replication and evolution.

IMPORTANCE

The influenza virus polymerase is important for adaptation to new hosts and, as a determinant of mutation rate, for the process of adaptation itself. We performed a deep mutational scan of the polymerase basic 1 (PB1) protein to gain insights into the structural and functional constraints on the influenza RNA-dependent RNA polymerase. We find that PB1 is highly constrained at specific sites that are only moderately predicted by the global structure or larger domain. We identified a number of beneficial mutations, many of which have been shown to be functionally important or observed in influenza virus’ natural evolution. Overall, our atlas of PB1 mutations and their fitness impacts serves as an important resource for future studies of influenza replication and evolution.

KEYWORDS: influenza virus, polymerase, deep mutational scanning, evolution

INTRODUCTION

Viral RNA-dependent RNA polymerases (RdRp) are central to RNA virus replication and evolution. The RdRp replicates the genome and is a key determinant for replicative fitness and viral mutation rates. For negative-strand RNA viruses, the RdRp is also responsible for transcription, thereby regulating protein expression. The RdRp has been directly linked to virulence (1). Mutations within the RdRp influence host adaptation (2–4), replication fidelity (5–8), post-translational modifications (9), and host immune responses (10, 11).

The evolution of viral RdRp is functionally and structurally constrained. Functional constraints include requirements for interactions with RNAs and other proteins, adaptation to new replication environments (12), the deleterious impact of low fidelity (5), and viral codon abundance (13–15). Residues that are involved in obligatory interactions tend to be less tolerant to mutation and evolve at a slower rate (16–19). The primary structural constraints are solvent accessibility (20), maintenance of molecular flexibility (21–23), intermolecular interactions (24, 25), and key protein secondary structures (26). For example, the establishment of secondary structures requires certain biochemical characteristics conferred by a limited number of amino acids (27), and mutations in buried residues often have a bigger fitness effect, as their change will impact nearby residues (27, 28).

The influenza virus RdRp is a heterotrimer that consists of three subunits: polymerase basic 1 (PB1), polymerase basic 2 (PB2), and polymerase acidic (PA), in which PB1 functions as the catalytic subunit. The PB1 subunit may have additional functional and structural constraints, because it cooperates with the two other polymerase subunits and viral nucleoproteins (NP) in transcription and genome replication. During transcription, PB1 guides the capped primer cleaved from a host pre-mRNA by PB2 and PA into the polymerase active site and stabilizes it on the 3′ end of the viral RNA (vRNA) template in the active site (29). The PB1 RdRp then extends the capped primer through the incorporation of nucleoside triphosphates, separates the template-product duplex downstream of the active site, and extrudes the viral mRNA through the product exit channel and the copied template through the template exit channel (29). The interactions among PB1, PB2, and PA shift at every stage of transcription (30). During replication, a vRNA is copied into a complementary RNA (cRNA). Next, the cRNA product serves as the template for negative-strand vRNA synthesis. The process of vRNA and cRNA synthesis not only requires the coordination of polymerase subunits but also interactions with an encapsidating RdRp and host protein ANP32 to form an RdRp dimer, a trans-activating RdRp to induce correct replication initiation, conformational changes to transfer the nascent vRNA or cRNA to the additional RdRp, and recruitment of viral nucleoprotein to encapsidate the nascent vRNA and cRNA molecules (31).

Given the importance of PB1 to influenza virus replication and evolution, defining the fitness effects of amino acid substitutions can elucidate the relevant functional and structural constraints. Deep mutational scanning (DMS), which combines saturation mutagenesis with deep sequencing, is a massively parallel approach that has recently been used to explore the fitness landscapes of viral proteins (19, 32–35). Here, we applied deep mutational scanning to the influenza virus A/WSN/1933(H1N1) (abbreviated WSN33) PB1 RdRp subunit, identifying constrained regions of the protein and relating beneficial mutations to those observed in natural evolution. Overall, our study provides a comprehensive resource for studies of influenza virus replication and evolution.

MATERIALS AND METHODS

Cell lines and media

MDCK-SIAT1-TMPRSS2 and HEK293T-CMV-PB1 cells were provided by Dr. Jesse Bloom (Fred Hutchinson Cancer Research Center) and maintained in D10 media [Dulbecco’s modified Eagle medium (DMEM; Invitrogen, 11995-065) with 10% heat-inactivated fetal bovine serum (FBS, Gibco, 26140-079), 1% L-Glutamine (100×, Gibco, 25030-081), and 1% Pen+Strep (10,000 U/mL P, 10,000 µg/mL S, Invitrogen, 15140-122)]. A549 cells were maintained in A549 growth media [DMEM, high glucose, with L-glutamine, without Na pyruvate (Invitrogen, 11965-092), with 10% FBS, 1% Pen+Strep, 0.1875% bovine albumin fraction V (7.5%, Invitrogen, 15260-037), and 2.5% HEPES (1 M, Invitrogen, 15630-080)]. We used IGM+ media [Opti-MEM1 Reduced Serum Media (Gibco, 31985-070), with 0.5% heat-inactivated FBS, 1% Pen+Strep, 0.3% bovine albumin fraction V, and 500 µL of 100 mg/mL CaCl2] for 24 hours following transfection. We used WNM media [Medium 199 (Gibco, 11043-023, no phenol red) with 0.5% heat-inactivated FBS, 1% Pen+Strep, 0.3% bovine albumin fraction V, 2.5% HEPES, and 500 µL of 100 mg/mL CaCl2] for TCID50 assays. We used A549 growth media for seeding cells for viral passages and A549 viral media [DMEM, high glucose, with L-glutamine, without Na pyruvate (Invitrogen, 11965-092), with 1% Pen+Strep, 0.1875% bovine albumin fraction V, 2.5% HEPES, and TPCK-trypsin at a final concentration of 4 µg/mL] for virus infections.

Construction of PB1 codon mutant plasmid libraries

PB1 codon mutant libraries were generated using the overlapping PCR strategy described in reference (32), with reference (36) as an additional guide. We used the code in reference (37), first described in reference (38) and modified as in reference (39), to generate tiled primers for mutagenesis, and code from reference (40) to determine how library diversity would be affected by the restriction enzymes used in cloning. We performed 10 cycles of fragment PCR (round one) with 1.2 µg of plasmid (pHW2000) containing the wild-type (WT) PB1 sequence from WSN33 and 20 cycles of joining PCR (round two). The lengths of PCR products were checked by gel electrophoresis. In a pilot experiment in which we generated PB1 variants for 96 of the 758 sites, we randomly picked PCR products from 24 clones for Sanger sequencing to evaluate the library mutation rate. Twenty of the 24 clones had only a single codon mutation at the target site, and four clones were wild type.

We pooled an equal volume from the 758 PCR reactions into 16 pools. Each pool was digested by restriction enzyme AarI and the 16 pools combined into one variant insert pool. We used T4 DNA ligase (NEB, #M0202L) to ligate the variant insert pool into BsmBI-digested pHW2000 plasmid and transformed Stellar Competent Cells (TaKaRa, #636763) according to the manufacturer’s instructions. We independently performed the ligation and transformation three times to create three libraries. We plated the transformed cells onto Nunc Square BioAssay Dishes (Thermo Scientific, #240845) and obtained 82,800–118,800 colonies for each library replicate. Plasmid DNA was extracted directly from the pooled colonies using a QIAGEN Plasmid Maxi Kit (QIAGEN, #12162).

Transfection

We generated variant virus libraries by transfecting HEK293T-CMV-PB1 cells, which constitutively express the wild-type PB1 protein from WSN33. For each variant plasmid library, we seeded 36 wells of 6-well plates with 5 × 10⁵ MDCK-SIAT1-TMPRSS2 cells and 5 × 10⁵ HEK293T-CMV-PB1 cells. Seventeen hours later, we transfected each well with 1 µg of each of the seven plasmids containing the other seven wild-type WSN33 genome segments and 1 µg of the PB1 variant library using TransIT-LT1 Transfection Reagent (MIR 2300). We used the same procedure to make the wild-type WSN33 viruses as a control, only on a smaller scale (six wells) and using the wild-type WSN33 PB1 plasmid in place of the variant plasmid library. At 24 hours post-transfection, we replaced the transfection media with fresh IGM+ and then incubated for an additional 24 hours. At 48 hours post-transfection, we harvested viral supernatants by centrifuging at 200 × g for 5 minutes. The three virus variant libraries and the wild-type virus control were aliquoted and snap frozen in 0.5% glycerol prior to storage at −80°C.

Determination of virus titer

Viruses were titered by median Tissue Culture Infective Dose (TCID50) on MDCK-SIAT1-TMPRSS2 cells. For each assay, we seeded 6 × 10³ MDCK-SIAT1-TMPRSS2 cells in 100 µL of WNM media in each well of a 96-well plate. Seventeen hours later, we serially diluted the virus samples 1:10 in WNM media supplemented with 4 µg/mL TPCK-trypsin (reconstituted in phosphate-buffered saline [PBS] to a 1 mg/mL working stock) and added 100 µL of diluted virus per well. We incubated the plates at 37°C and monitored them daily for cytopathic effect (CPE) for up to 4 days.

Viral passages

Each passage had 1 × 10⁶ infectious viral particles on 1 × 10⁸ A549 cells to achieve an approximate multiplicity of infection (MOI) of 0.01 TCID50/cell. We seeded 8 × 10⁷ A549 cells in a total of 60 mL A549 growth media in three T182 flasks. Seventeen hours later, we suspended 1 × 10⁶ TCID50 of virus in 45 mL of A549 viral media with 4 µg/mL freshly added TPCK-trypsin. We aspirated the overnight A549 growth media, rinsed the cells gently with pre-warmed PBS, and added 15 mL of viral dilution to each flask. Three hours after infection, we removed the inoculum, rinsed the cells again with pre-warmed PBS, and replaced the inoculum with 20 mL of fresh A549 viral media per flask with 4 µg/mL TPCK-trypsin. We harvested viral supernatants by centrifugation at 400 × g for 4 minutes, 48 hours after infection, and snap-froze the supernatant in 0.5% glycerol prior to storage at −80°C.

Barcoded subamplicon sequencing

Passaged viruses were concentrated by ultracentrifugation at 27,000 rpm, using Thermo Scientific Sorvall WX Ultra Series Centrifuge with rotor Sorvall AH-629 (DuPont Instruments), for 2 hours at 4°C using Beckman Coulter Centrifuge Tubes (25 × 89 mm, 344058). We then resuspended the viruses in 500 µL of residual media and extracted viral RNA using a QIAamp Viral RNA Mini Kit (QIAGEN, 52906). To accurately measure mutation frequencies, we used a barcoded-subamplicon sequencing strategy described in reference (41) that adds unique sequence barcodes to every DNA molecule in a sample, as follows.

We reverse transcribed the extracted RNA using SuperScript III First-Strand Synthesis System (Invitrogen, 18080-051) and performed PCR to amplify the entire PB1 open reading frame (PCR0). For plasmid samples, we used 2 µL of plasmid DNA at 10 ng/µL as template in PCR0. We cleaned up the PCR0 products using GeneJet PCR clean up kit (GeneJet, K0702) and gel isolated the bands corresponding to full PB1 genome length (~2,341 bp).

Next, we PCR amplified the PB1 gene in eight subamplicons (PCR1). The subamplicons were designed to start and end in full codons, and each subamplicon starts precisely after the previous subamplicon ends. In this way, the nucleotides in any one codon of a PB1 DNA molecule are counted only once. Forward and reverse primers for PCR1 contained random 8N barcodes at their 5′ termini to uniquely label every cDNA molecule in the template. Theoretically, there would be 4¹⁶ = 4.29 × 10⁹ unique barcodes. The template input for PCR1 was limited to ~8 × 10⁷ molecules such that each was uniquely barcoded. Illumina-compatible, sample-specific adapters were added in a subsequent PCR reaction, PCR2. The eight subamplicons for each sample were pooled together, and we used ~1 × 10⁶ uniquely barcoded molecules from PCR1 as template and unique dual (UD) indexed primers to diminish the issue of index hopping. Finally, we gel isolated the PCR2 products before sequencing on an Illumina NextSeq 1000, P2 600 cycle (2 × 300 PE), with 20% PhiX. We conducted two sequencing runs with 60 µL of the combined PCR2 products at 5 nM, 30 µL for each run, and merged the reads for analysis. We used KOD Hot Start Master Mix (EMD Millipore, 71842) to perform all PCRs. Primers and cycling programs can be found in a Supplementary File (Supplemental Text).
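
As a rough check on the barcode arithmetic, the sketch below computes the size of the barcode space for two 8N barcodes and, under the simplifying assumption that barcodes are drawn uniformly at random, the chance that a given template molecule shares its barcode pair with at least one other molecule. The function names and the uniform-sampling model are illustrative assumptions and are not taken from the study's pipeline.

```python
import math


def barcode_space(n_per_primer=8, primers=2, bases=4):
    """Number of distinct barcode pairs for random N bases on each primer."""
    return bases ** (n_per_primer * primers)


def shared_barcode_fraction(n_molecules, space):
    """Probability that one molecule shares its barcode with at least one other.

    Assumes every barcode is equally likely (an idealization; real primer
    synthesis and PCR are not perfectly uniform).
    """
    # 1 - (1 - 1/space)^(n - 1), written with log1p/expm1 for numerical accuracy.
    return -math.expm1((n_molecules - 1) * math.log1p(-1.0 / space))


if __name__ == "__main__":
    space = barcode_space()
    print(space)
    print(shared_barcode_fraction(80_000_000, space))
```

With an input of 8 × 10⁷ molecules, this idealized model predicts that fewer than 2% of molecules would share a barcode pair with another molecule.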

Analysis of deep sequencing data

Sequence files were analyzed using dms_tools2 (42), which groups the paired-end reads with the same PCR1 barcodes. Sequences were discarded if the Q-score of any nucleotide in the barcode was <15. Consensus sequences were generated for barcodes with at least two reads and aligned to the reference genome to record the codon at each site for that molecule. Because mutations are defined at a subamplicon level, it is possible that rare secondary mutations on the same PB1 haplotype could be present on distinct subamplicons.
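
The barcode-grouping logic described above can be sketched as follows. This is a simplified illustration and not the dms_tools2 implementation: the input tuple layout, the function name, and the strict-majority rule for calling each consensus base are assumptions made for the example. Only the barcode Q-score cutoff of 15 and the minimum of two reads per barcode come from the text.

```python
from collections import Counter, defaultdict

MIN_BARCODE_Q = 15         # discard reads with any barcode base below this Q-score
MIN_READS_PER_BARCODE = 2  # need at least two reads to build a consensus


def build_consensus(reads):
    """Group reads by barcode and return {barcode: consensus sequence}.

    reads: iterable of (barcode, barcode_quality_scores, sequence) tuples.
    The strict-majority base call is an assumption for this sketch.
    """
    groups = defaultdict(list)
    for barcode, barcode_quals, sequence in reads:
        if min(barcode_quals) < MIN_BARCODE_Q:
            continue  # low-quality barcode: read is discarded
        groups[barcode].append(sequence)

    consensus = {}
    for barcode, sequences in groups.items():
        if len(sequences) < MIN_READS_PER_BARCODE:
            continue
        length = min(len(s) for s in sequences)
        called = []
        for pos in range(length):
            base, count = Counter(s[pos] for s in sequences).most_common(1)[0]
            # Call the base only if more than half of the reads agree.
            called.append(base if 2 * count > len(sequences) else "N")
        consensus[barcode] = "".join(called)
    return consensus
```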

We calculated the fitness of each mutation based on the enrichment ratio method described in reference (35) with modifications. We calculated the frequency of mutation i at site s as:

$$\text{frequency}_{i,s} = \frac{\text{read count}_{i,s} + \text{pseudo count}}{\sum_{k \in s} \text{read count}_{k,s} + \text{pseudo count}}$$

where the pseudo count was added to ensure a non-zero denominator and fixed as 1 by default. To offset frequency inflation by the pseudo count, we discarded a mutation if its read count in the variant plasmid library was less than 10. We then discarded the mutations whose frequency in the variant plasmid library was not at least sixfold higher than that in the wild-type plasmid library. With these filters, we calculated the enrichment ratio as:

$$\text{enrichment ratio}_{i,s} = \frac{\text{frequency}^{\text{postpassage}}_{i,s}}{\text{frequency}^{\text{prepassage}}_{i,s}}$$

and we defined “fitness” as log10(enrichment ratio) normalized by the average fitness of silent mutations in the corresponding subamplicon in each individual library:

$$\text{fitness}_{i,s} = \log_{10}(\text{enrichment ratio}_{i,s}) - \frac{\sum_{k \in \text{silent, amp}} \log_{10}(\text{enrichment ratio}_{k,s})}{N_{\text{silent, amp}}}$$

The fitness of a mutation at a certain site used for subsequent analyses was the average fitness of that in all replicates where it was available.
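
The frequency, enrichment ratio, and fitness definitions above can be illustrated with a minimal sketch. The function names and the dictionary-of-counts input are hypothetical, and the sketch assumes the pseudo count is added once to the numerator and once to the summed denominator; it is not the authors' analysis code.

```python
import math

PSEUDOCOUNT = 1       # default pseudo count from the text
MIN_PLASMID_READS = 10
MIN_FOLD_OVER_WT = 6


def frequency(site_counts, mutation, pseudocount=PSEUDOCOUNT):
    """Frequency of one mutation at a site; site_counts maps variant -> reads."""
    total = sum(site_counts.values()) + pseudocount
    return (site_counts.get(mutation, 0) + pseudocount) / total


def passes_filters(plasmid_reads, variant_lib_freq, wt_lib_freq):
    """Keep mutations with >=10 plasmid reads and >=6-fold signal over wild type."""
    return (plasmid_reads >= MIN_PLASMID_READS
            and variant_lib_freq >= MIN_FOLD_OVER_WT * wt_lib_freq)


def log_enrichment(pre_counts, post_counts, mutation):
    """log10 of post-passage frequency divided by pre-passage frequency."""
    ratio = frequency(post_counts, mutation) / frequency(pre_counts, mutation)
    return math.log10(ratio)


def fitness(mutation_log_enrichment, silent_log_enrichments):
    """Subtract the mean log10 enrichment of silent mutations in the subamplicon."""
    baseline = sum(silent_log_enrichments) / len(silent_log_enrichments)
    return mutation_log_enrichment - baseline
```

For example, a variant that falls from 19 of 999 reads before passage to 1 of 999 reads after passage has a log10 enrichment of −1 under these assumptions, before normalization to silent mutations.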

Analysis of naturally occurring influenza sequences

We downloaded the influenza sequences from Global Initiative on Sharing All Influenza Data (GISAID) from 1918 to 2023, with the filtering conditions of “type A,” “H1N1,” “human host,” “required segment PB1,” and “complete sequences only.” According to CDC’s timeline for the 2009 H1N1 pandemic (43), we classified pre-09 strains as all sequences collected before 14 April 2009 and post-09 strains as sequences collected after 12 August 2010. We discarded the sequences collected during the pandemic to avoid the time period when pre- and post-09 strains might be co-circulating. We downloaded the amino acid sequences along with the corresponding metadata and filtered out any sequences that had been passaged in eggs. We aligned the sequences to the wild-type WSN33 amino acid sequence using MAFFT (44). The entropy of a site was measured as the Shannon entropy (45) of all amino acids that appeared at that site:

$$\text{site entropy}_{s} = -\sum_{x \in s} p(x) \log p(x)$$

To adjust for uneven sampling over time, we adopted the weighted entropy method described in reference (46). Briefly, we grouped the sequences by collection year, calculated the frequencies of the amino acids in each year, and used the average of amino acid frequencies over all years for the entropy calculation.
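
A minimal sketch of the year-weighted entropy calculation follows. The input layout (one `(year, amino_acid)` pair per sequence for a single alignment column), the function name, and the use of the natural logarithm are assumptions made for illustration; the method in reference (46) may differ in detail.

```python
import math
from collections import Counter, defaultdict


def weighted_site_entropy(observations):
    """Shannon entropy of one alignment column with each year weighted equally.

    observations: iterable of (collection_year, amino_acid) pairs.
    """
    counts_by_year = defaultdict(Counter)
    for year, amino_acid in observations:
        counts_by_year[year][amino_acid] += 1

    n_years = len(counts_by_year)
    mean_freq = defaultdict(float)
    for counts in counts_by_year.values():
        total = sum(counts.values())
        for amino_acid, n in counts.items():
            # Average the within-year frequency over all sampled years.
            mean_freq[amino_acid] += n / total / n_years

    return -sum(p * math.log(p) for p in mean_freq.values() if p > 0)
```

With this weighting, a densely sampled year contributes no more to the entropy than a sparsely sampled one. For instance, 99 K and 1 R in one year plus a single R in another year yields averaged frequencies of 0.495 and 0.505 rather than roughly 0.98 and 0.02.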

Protein structure visualization and analysis

We used UCSF ChimeraX (47) for protein visualizations, including movies. To visualize site entropy on the PB1 protein, we replaced the b-factor column with site entropy data in the PDB files. We identified protein-RNA contacts using LigPlot Plus with default thresholds for the maximum distance between interacting atoms (48). The protein structures used are as follows: 5D9A (apoenzyme), 7NHX (template binding, early), 6T0N (template binding, late), 5M3H (cap-snatching), 6RR7 (pre-initiation), 6QCW (mixed pre-initiation), 6QCV (mixed pre-catalysis), 6QCX (mixed post-incorporation), 6SZV (elongation), and 6SZU (termination).
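
Replacing the B-factor column with a per-residue value can be sketched as below, using the fixed-width ATOM/HETATM layout of the PDB format (chain identifier in column 22, residue number in columns 23–26, temperature factor in columns 61–66). The function name and the default of chain B are assumptions for this example; insertion codes and alternate numbering are not handled.

```python
def set_bfactors(pdb_lines, value_by_residue, chain="B", default=0.0):
    """Return PDB lines with the B-factor field overwritten for one chain.

    value_by_residue: dict mapping residue number -> value (e.g., site entropy).
    Residues absent from the dict receive `default`.
    """
    updated = []
    for line in pdb_lines:
        is_atom = line.startswith(("ATOM", "HETATM"))
        if is_atom and len(line) >= 66 and line[21] == chain:
            residue_number = int(line[22:26])
            value = value_by_residue.get(residue_number, default)
            # Columns 61-66 (0-based slice 60:66) hold the temperature factor.
            line = f"{line[:60]}{value:6.2f}{line[66:]}"
        updated.append(line)
    return updated
```

The resulting file can then be colored by B-factor in a structure viewer so that the color scale reflects the substituted values.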

Measurement of accessible surface area

We measured the Accessible Surface Area (ASA) using PDBePISA (“Protein interfaces, surfaces and assemblies” service PISA at the European Bioinformatics Institute, http://www.ebi.ac.uk/pdbe/prot_int/pistart.html) (49), with the influenza A/Brevig Mission/1/1918(H1N1) polymerase heterotrimer structure (PDB: 7NHX). We chose to perform this and subsequent analyses with 7NHX because this is the only resolved structure for the H1N1 polymerase complex, which may be a closer approximation to WSN33 RdRp. We used a default water probe of 1.4 Å in diameter to roll over the surface of the entire polymerase complex and added up all points in contact with the probe.

Molecular dynamics simulation and measurement of root mean square fluctuation

We performed a molecular dynamics simulation of A/Brevig Mission/1/1918(H1N1) RdRp (PDB: 7NHX) to measure the relative structural flexibility of the heterotrimer. We removed the RNA molecules in the structure and modeled the missing residues (Chain B, PB1: 187–204 and 645–653) using SWISS-MODEL template-based homology using full sequences from UNIPROT (PA: Q3HM39, PB1: Q3HM40, and PB2: Q3HM41). The global model quality estimate (GMQE) for this homology model is 0.88. Molecular dynamics were simulated using GROMACS on the Princeton University HPC Tiger GPU. The system build parameters used a cubic tip3p water box, charmm27 force field, neutralizing NaCl ions, temperature of 310.15 K (37°C), and time steps of 0.002 ps. The total system had 2,148 protein residues, 166,370 water residues, and 1,002 ion residues. Energy minimization was performed for a total of 100 ps and converged to a maximum force of less than 1,000 kJ/mol in 2,102 steps. Equilibration (both constant number of particles, volume, and temperature [NVT] and constant number of particles, pressure, and temperature [NPT] ensembles) was performed for 200 ps. A 20-ns (10,000,000 steps) production simulation took roughly 19 hours. We analyzed the resulting trajectory for the root mean square fluctuations (RMSF) of the atomic positions at every time point and calculated the RMSF of each residue within the structure from an average of RMSF values of each atom within the residue.
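
The per-residue RMSF averaging and the step-count arithmetic can be illustrated with a short sketch. The function names and input layouts are hypothetical, and the sketch omits the superposition of each frame onto a reference structure that a full analysis would normally perform before computing fluctuations.

```python
import math
from collections import defaultdict


def atom_rmsf(positions):
    """RMSF of one atom from its (x, y, z) positions over all frames."""
    n = len(positions)
    mean = [sum(p[i] for p in positions) / n for i in range(3)]
    mean_sq_dev = sum(
        sum((p[i] - mean[i]) ** 2 for i in range(3)) for p in positions
    ) / n
    return math.sqrt(mean_sq_dev)


def residue_rmsf(atom_values):
    """Average atomic RMSF values within each residue.

    atom_values: iterable of (residue_id, atom_rmsf) pairs.
    """
    grouped = defaultdict(list)
    for residue_id, value in atom_values:
        grouped[residue_id].append(value)
    return {res: sum(vals) / len(vals) for res, vals in grouped.items()}


def simulated_time_ns(n_steps, dt_ps):
    """Total simulated time in nanoseconds for a given step count and time step."""
    return n_steps * dt_ps / 1000.0
```

As a consistency check, 10,000,000 steps at 0.002 ps per step correspond to 20 ns of simulated time.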

RESULTS

A comprehensive library of single amino acid substitutions in PB1

We used overlap PCR mutagenesis to create a PB1 plasmid library in which every codon in the WSN33 PB1 open reading frame is mutated to code for every other amino acid. We cloned the mutagenized plasmid library three times, independently, to make three replicate plasmid libraries (Fig. 1). High depth-of-coverage sequencing demonstrated that the three plasmid libraries covered 82%–93% of 24,224 possible codon mutations and 89%–96% of 15,897 possible amino acid substitutions at 757 residues across 758 (stop codon included) sites in PB1 (Table 1). After excluding mutations whose frequencies might have been inflated by mutational hotspots during sequencing library preparation, each replicate library covered 64%–70% of all possible amino acid substitutions with 84% of all possible amino acid substitutions present in at least one replicate library.

Fig 1.


Deep mutational scanning of influenza PB1 protein. Scheme of major steps for generating variant virus libraries. We mutagenized wild-type PB1 by overlap PCR, using primers encoding NNS in the codon for the targeted residue. N refers to an equal mixture of A, T, G, and C nucleotides, while S refers to a mixture of only G and C. This scheme generates 32 codons that encode all 20 amino acids and one stop codon (TAG). The PB1 variant library was ligated and transformed independently three times to make variant plasmid library replicates. Each plasmid library was then transfected independently along with plasmids expressing the other seven influenza segments to make three variant virus library replicates.
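
The coding capacity of the NNS scheme can be checked by enumerating the codons against the standard genetic code. The sketch below is illustrative only; the variable and function names are chosen for this example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO_ACIDS[i]
    for i, codon in enumerate(product(BASES, repeat=3))
}


def nns_codons():
    """All codons with any base at positions 1 and 2 and G or C at position 3."""
    return [a + b + c for a, b, c in product(BASES, BASES, "GC")]


def summarize(codons):
    """Return (number of codons, encoded amino acids, stop codons)."""
    encoded = {CODON_TABLE[c] for c in codons}
    stops = sorted(c for c in codons if CODON_TABLE[c] == "*")
    return len(codons), sorted(encoded - {"*"}), stops


if __name__ == "__main__":
    n, amino_acids, stops = summarize(nns_codons())
    print(n, len(amino_acids), stops)
```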

TABLE 1.

Codon and amino acid variant diversity in plasmid libraries before and after filtering out mutations with low codon counts or under the influence of PCR errors.

Library | Number of codons | Percentage of codons | Number of amino acids | Percentage of amino acids
Before filtering, Replicate 1 | 22,820 | 92.6% | 15,168 | 95.4%
Before filtering, Replicate 2 | 21,995 | 89.2% | 14,786 | 93.0%
Before filtering, Replicate 3 | 20,264 | 82.2% | 14,154 | 89.0%
After filtering, Replicate 1 | – | – | 11,008 | 69.3%
After filtering, Replicate 2 | – | – | 10,565 | 66.5%
After filtering, Replicate 3 | – | – | 10,194 | 64.1%
After filtering, present in all replicates | – | – | 7,351 | 46.2%
After filtering, present in at least one replicate | – | – | 13,354 | 84.0%

–, not applicable.

We rescued the corresponding viral variant libraries by transfecting HEK293T cells that stably express PB1 with the plasmid libraries and bidirectional expression plasmids containing the other seven genomic segments from WSN33. The passage 0 (P0) viral stocks exhibited titers of 3.51 × 10⁶ to 5.27 × 10⁷ TCID50/mL after 48 hours, slightly lower than those from “wild-type” WSN33 rescues. There were 482 mutations in Replicate 1 (2.1% of total mutations in Replicate 1), 1,371 mutations in Replicate 2 (6.2%), and 175 mutations in Replicate 3 (0.86%) that were present in the plasmid library but not in the P0 viral library, which may indicate lethal mutations. The experimental lethal mutation rate was lower than the expected ~25%–30% lethal mutation fraction (50), because the wild-type PB1 protein expressed by the cells partially rescued the variant PB1 proteins with lethal mutations.

We examined the fitness effects of the mutations through serial passage of the variant virus libraries. We passaged the three libraries independently on A549 human lung epithelial carcinoma cells at an MOI of 0.01 for four passages, during which viruses carrying different PB1 substitutions competed against each other. The titers of viruses at each passage decreased slightly to 5 × 10⁶ to 5 × 10⁷ TCID50/mL (Fig. 2A). Forty-three percent of codons and 57% of unique amino acids on average were detected through four passages (Fig. 2B). We used barcoded-subamplicon sequencing to correct for PCR and sequencing errors and measured the frequencies of individual mutations in each library at passages 1 and 4 (Fig. S1). Throughout passaging, we observed signs of purifying selection, reflected by a relative reduction in the number of non-synonymous mutations (Fig. 2C) and in codons with two or three nucleotide changes (Fig. 2D).

Fig 2.


Change in codon and amino acid mutations throughout passaging. (A) Titers of variant virus libraries before and after each passage. (B) Percentage of codon and amino acid variants remaining at each passage. Pla: in plasmid library, before rescue; P0: after rescue, before passaging; P1: after the first passage; P4: after four passages. (C) Frequency of synonymous, non-synonymous, and nonsense mutations in replicate (Rep) plasmid libraries, virus libraries after passages, and the wild-type plasmid and virus samples as controls. (D) Frequency of codon mutations with 1-, 2-, and 3-nucleotide changes in plasmid libraries, virus libraries after passages, and the wild-type plasmid and virus samples. Frequencies in panels C and D were averaged across the PB1 gene and were calculated prior to the filtering and adjustment used in fitness calculations, as described in Materials and Methods.

Replicative fitness of amino acid substitutions in PB1

We quantified the fitness of viral mutants at the amino acid level based on an amino acid’s frequency before and after passage. All fitness values were measured after passage four unless stated otherwise (Fig. 3; Supplemental Data set). Here, the fitness of an amino acid at a site is the log10 enrichment ratio normalized by the average fitness of silent mutations in the same amplicon (see Materials and Methods). Since >99% of the codons at any given site in the libraries encoded the wild-type amino acid, the change in the frequency of wild-type variants was negligible, and the measured fitness of the wild type (log10 of an enrichment ratio of ~1, i.e., ~0) was fixed by the experimental design. As expected, the frequency of most mutations decreased after four passages, indicating that most mutations in the influenza virus RdRp are detrimental (fitness < 0; Fig. 3 and 4A). Nonsense mutations never increased in frequency. Fitness measurements were well correlated across biological replicates, with Pearson correlation coefficients between 0.788 and 0.864 (Fig. 4B).

Fig 3.


Replicative fitness of amino acid substitutions on PB1. The replicative fitness of individual amino acid variants in PB1, with subdomains annotated by the colored bar above the heatmap. Mutations in gray were excluded from the analysis due to low counts in the plasmid library or high occurrence in the wild-type sample, as described in Materials and Methods. Wild-type amino acids are marked by black dots.

Fig 4.


Precision and accuracy of replicative fitness, as measured by deep mutational scanning. (A) The fitness distribution of missense, nonsense, and silent mutations, after filtering out mutations caused by potential PCR errors. (B) Correlations of variant fitness in three replicates. The upper right panels show the Pearson correlation coefficients of corresponding replicates with the significance level. Diagonal panels show the overall fitness distribution, disregarding the types of mutation. The lower left panels show the fitness values for individual mutants in the indicated replicates. (C) The fitness values of 13 selected mutations were measured by deep mutational scanning or pairwise competition with the wild-type virus. Lethal mutations in the competition assay are shown on the x-axis. R indicates the Pearson correlation coefficient among viable variants, while ρ indicates the Spearman correlation coefficient in all variants, including the lethal mutations. The red line shows the trendline using a linear regression model. The gray zone indicates the 95% CI for predictions from the linear model.

We validated our fitness measurements by comparing the deep mutational scanning fitness of 13 amino acid substitutions to the fitness values we have measured previously by pairwise competition and quantitative RT-PCR (50). These 13 PB1 substitutions were measured in the same genomic background (A/WSN/33/H1N1) with pairwise competition assays performed in the same cells (A549), at the same MOI (0.01), and for the same number of passages (four). The fitness values in two experiments were well correlated with a Pearson correlation coefficient of 0.98 (P < 0.005) for viable variants and a Spearman correlation coefficient of 0.62 (P = 0.023) for all variants including the lethal mutants (Fig. 4C; Fig. S2A and B). The fitness of two non-lethal (R192K and E751D) and one lethal (E519D) substitutions in the targeted mutagenesis (50) could not be measured in deep mutational scanning after filtering for mutations caused by potential PCR errors. Five other lethal substitutions were identified in passaged DMS libraries, but with very low fitness values.
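
The two correlation measures used in this comparison can be sketched in plain Python. Spearman's coefficient is computed here as the Pearson correlation of average ranks, which handles ties such as several lethal variants sharing the same lowest value. The function names are chosen for this example, and the sketch does not compute P values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation, computed as Pearson on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because the rank-based measure depends only on ordering, it can include lethal variants that have no finite fitness value in one assay, provided they are assigned a common lowest value.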

Site entropy defines constraints

We calculated site entropy, or Shannon entropy at each site, based on the enrichment of all amino acid variants at a site. The enrichment of each amino acid variant in the calculation was determined by its enrichment ratio after four passages and normalized to sum to 1 (see Materials and Methods). High site entropy indicates that variation at the amino acid level does not substantially impact viral fitness and/or that several amino acids are equally tolerated at a site. Because the site entropy calculation would be misleading if some amino acids were absent in the initial libraries, we marked and excluded 16 sites with fewer than 40% of amino acid variants (fewer than 9 out of 21 possible variants) generated in the plasmid libraries (Fig. S3).
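
The site entropy calculation and the coverage filter can be sketched as follows. The function names, the dictionary input, and the use of the natural logarithm are assumptions for illustration; only the normalization of enrichment ratios to sum to 1 and the 40% coverage threshold come from the text.

```python
import math


def dms_site_entropy(enrichment_ratios):
    """Shannon entropy of a site from its per-variant enrichment ratios.

    enrichment_ratios: dict mapping amino acid (or stop) -> enrichment ratio.
    Ratios are normalized to sum to 1 before computing the entropy.
    """
    total = sum(enrichment_ratios.values())
    probabilities = [v / total for v in enrichment_ratios.values()]
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def has_enough_variants(n_observed, n_possible=21, min_fraction=0.4):
    """True if at least 40% of possible variants were present in the library."""
    return n_observed / n_possible >= min_fraction
```

A site where four variants are equally enriched has entropy ln 4 ≈ 1.39 under these assumptions, whereas a site where only the wild-type residue persists has entropy close to 0.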

Site entropy varied across PB1 subdomains. Structural mapping revealed lower site entropy at buried sites and at interfaces between PB1 and RNA and between PB1 and either PA or PB2 (ChimeraX file available at DOI: 10.5061/dryad.p2ngf1vxm). Consistent with this observation, there was a modest but statistically significant correlation (ρ = 0.28, P < 0.005) between site entropy and a residue’s Accessible Surface Area (Fig. S4A). Residues that are more flexible are often more tolerant to mutation and evolve at a higher rate (23). The correlation between site entropy and residue flexibility, captured by the root mean square fluctuation from a 20-ns molecular dynamics simulation of A/Brevig Mission/1/1918(H1N1) RdRp, was also weak but significant (ρ = 0.21, P < 0.005) (Fig. S4B). Using the subdomains defined in reference (46), we grouped site entropy by subdomain. Residues in the fingertips subdomain exhibited lower entropy (P < 0.005 compared with β-hairpin, C-terminal, fingers, prime loop, and ribbon subdomains; P < 0.05 compared with palm and thumb subdomains; and P > 0.05 compared with N-terminal subdomain), residues in the prime loop and ribbon subdomains exhibited higher entropy (for prime loop subdomain: P < 0.005 compared with fingertips, N-terminal, palm, and thumb subdomains; for ribbon subdomain: P < 0.005 compared with fingertips and thumb subdomains), and the distribution of entropy values across other subdomains was largely similar (Fig. S4C).

Because ASA, RMSF, and simple subdomain identity may mask important differences by averaging over a number of high and low entropy sites, we focused subsequent analyses on specific sites with defined functions. The PB1 active site consists of the evolutionarily conserved motifs A–G, with the catalytic metal ions being coordinated by motifs A and C at the edge of the central cavity (29, 51). We used logo plots to display the enrichment of each amino acid substitution at residues in motif C (52). The site entropy for the active site was quite low, and there were few alternatives to the wild-type amino acid (Fig. 5A). Similarly, we evaluated PB1 residues that are bound to RNA by hydrogen bonds or interact with RNA due to proximity at each stage (e.g., apo-enzyme, early template binding, late template binding, cap-snatching, pre-initiation, mixed-initiation, initiation-to-catalysis, catalysis-to-nucleotide-incorporation, elongation, and termination). Here, residues interacting with the template RNA (3′ vRNA) and product (mRNA) had lower site entropy than others. Site entropy at residues that bind to RNA (5′ vRNA) but are not involved in transcription was not significantly different from that at other sites (Fig. 5B).

Fig 5.


Site entropy of key residues. (A) Enrichment of amino acid substitutions at each residue in motif C. Residues conserved in all negative sense RNA viruses are marked with the light-yellow box. Amino acids are colored by their biochemical characteristics. A stop codon is represented by “X.” (B) Site entropy of sites based on their direct interaction with mRNA, 3′ vRNA, and 5′ vRNA, visualized by Tukey boxplot. The line in each box represents the median, and the bottom and top of each box represent the 25th and 75th percentiles, respectively. Data points greater than the 75th percentile + 1.5 × interquartile range (IQR) or less than the 25th percentile – 1.5 × IQR are shown beyond the whiskers. Wilcoxon test. *P < 0.05 and **P < 0.005; NS: non-significant.
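The outlier rule used in a Tukey boxplot can be sketched as follows. This is a minimal illustration rather than the plotting code used for the figure. The quartile convention (linear interpolation, the "inclusive" method of Python's statistics module) is an assumption, and the function names and example values are hypothetical.

```python
from statistics import quantiles


def tukey_fences(values):
    """Return (lower, upper) fences at Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    # Assumption: quartiles by linear interpolation over the observed range.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(values):
    """Return the points that a Tukey boxplot would draw beyond the whiskers."""
    low, high = tukey_fences(values)
    return [v for v in values if v < low or v > high]


# Hypothetical values, not site entropies from the study.
example = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
print(tukey_fences(example))
print(outliers(example))
```

Different software packages compute quartiles slightly differently, so fences for small groups may not match a given plotting library exactly.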

Beneficial residues observed in natural evolution

To gain insights into the relationship between mutational tolerance and the long-term evolution of PB1, we compared our measured site entropy to the “natural” amino acid Shannon diversity of each site. The calculation of Shannon diversity in natural sequences is slightly different from that of site entropy for deep mutational scanning; they use the same equation (see Materials and Methods), but the former uses the frequency of each amino acid variant, while the latter uses the enrichment ratio. We divided the records of naturally evolved PB1 sequences from human hosts available on GISAID into pre- and post-2009 subsets, separated by the time period of the 2009 H1N1 pandemic, to minimize the impact of co-circulation of pre- and post-pandemic viruses. After filtering, we evaluated 1,491 PB1 sequences in the pre-2009 data set and 35,501 in the post-2009 data set. Since Shannon diversity is biased toward mutations observed in years that have been sampled more densely, we corrected for the uneven sampling of PB1 sequences over time by calculating weighted Shannon diversity as previously described (46; see Materials and Methods). In general, there was greater Shannon diversity in the pre-2009 data set (Fig. S5). We found a moderate correlation between DMS site entropy and natural Shannon diversity in the pre- and post-2009 data sets (pre-2009: ρ = 0.40, P < 0.005; post-2009: ρ = 0.31, P < 0.005; Fig. 6A and B).
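The shared Shannon calculation can be sketched as follows, as a minimal illustration under stated assumptions. The natural logarithm is assumed (the base is given in Materials and Methods, not here), enrichment ratios are assumed to be normalized to sum to 1 at each site, and the function names and toy inputs are hypothetical. The year-weighting correction described in (46) is not reproduced.

```python
import math
from collections import Counter


def shannon_entropy(weights):
    """Shannon entropy H = -sum(p * ln p) of nonnegative weights.

    Weights are normalized to proportions first; zero weights contribute 0.
    """
    total = sum(weights)
    proportions = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in proportions)


def natural_site_diversity(residues):
    """Diversity of one alignment column, from amino acid frequencies."""
    return shannon_entropy(Counter(residues).values())


def dms_site_entropy(enrichment_ratios):
    """Site entropy from per-amino-acid enrichment ratios at one site.

    Assumption: the ratios are normalized to sum to 1 before applying H.
    """
    return shannon_entropy(enrichment_ratios)


# Hypothetical alignment column: three sequences with K, one with R.
print(natural_site_diversity("KKKR"))
# Hypothetical enrichment ratios for four amino acids at one site.
print(dms_site_entropy([1.0, 1.0, 1.0, 1.0]))
```

A fully conserved column gives a diversity of 0, and a site where all variants are equally enriched gives the maximum value for that number of variants.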

Fig 6.


Impacts of DMS fitness and mutational tolerance on natural PB1 evolution. Correlation between the Shannon diversity of naturally occurring sequences (A) before and (B) after 2009 and the site entropy measured by deep mutational scanning. Five hundred five residues in the pre-2009 and one residue in the post-2009 natural sequences are completely conserved. The difference between the number of conserved residues before and after 2009 is potentially due to insufficient sampling prior to 2009. ρ indicates the Spearman correlation coefficient. (C) The minimum nucleotide differences between the wild type (or dominant amino acid) in naturally occurring PB1 sequences and beneficial mutations identified by deep mutational scanning. Each dot represents a beneficial mutation.

Similarly, we determined whether mutations identified as beneficial in deep mutational scanning forecast those that appear in the natural evolution of PB1 in human hosts. We defined 29 mutations as beneficial based on a measured fitness greater than two standard deviations (Z-score > 2) above the mean fitness of silent mutations (the neutral, null model) (Table 2). All beneficial amino acid mutations had one or two nucleotide changes compared with the corresponding wild-type codon. Fourteen of 29 beneficial mutations have occurred during the evolution of H1N1 PB1, and of these, many have appeared multiple times independently. The other 15 were not observed in the available sequences. The majority of beneficial mutations that did appear in natural evolution are accessible by a single nucleotide substitution from the wild-type codon, while those that did not appear in natural sequences usually required two nucleotide substitutions in the codon (Fig. 6C). Of the beneficial mutations, M317V, T323M, I637V, K653R, K691R, and M744A have appeared in >0.1% of all sequences collected in at least 1 year when the mutation was present. Notably, although 691K was the wild type in WSN33 at site 691, arginine (R) was the dominant amino acid in pre-2009 strains (Fig. S6), suggesting a true fitness advantage of arginine over lysine at this site. Lysine was again the dominant amino acid in the 2009 pandemic strain, but 691R has been detected every year.
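The Z-score criterion can be sketched as follows. This is a minimal illustration, not the analysis code used in this study. It assumes the sample standard deviation of the silent-mutation fitness values is used, and the function names and toy fitness values are hypothetical.

```python
from statistics import mean, stdev


def z_scores_vs_silent(fitness_by_mutation, silent_fitness):
    """Z-score of each mutation relative to the silent-mutation distribution."""
    mu = mean(silent_fitness)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    sigma = stdev(silent_fitness)
    return {m: (f - mu) / sigma for m, f in fitness_by_mutation.items()}


def beneficial_mutations(fitness_by_mutation, silent_fitness, threshold=2.0):
    """Mutations whose Z-score exceeds the threshold (Z-score > 2 by default)."""
    scores = z_scores_vs_silent(fitness_by_mutation, silent_fitness)
    return {m: z for m, z in scores.items() if z > threshold}


# Hypothetical fitness values; the mutation labels are placeholders.
silent = [-0.1, 0.0, 0.1]
missense = {"mutA": 0.25, "mutB": 0.15, "mutC": -0.5}
print(beneficial_mutations(missense, silent))
```

Here only the placeholder mutation with a fitness more than two standard deviations above the silent mean is retained.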

TABLE 2.

Natural occurrence of beneficial mutations identified by deep mutational scanning

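Natural frequency                     Mutation   DMS fitness Z-score   Nucleotide change from wild type
Co-existed as dominant amino acids    K691R      3.162489              1
Above 0.1% in natural sequences       T323M      2.386425              1
Above 0.1% in natural sequences       M317V      2.209847              1
Above 0.1% in natural sequences       K653R      2.047385              1
Above 0.1% in natural sequences       I637V      2.025235              1
Above 0.1% in natural sequences       M744A      3.443046              2
Appeared but below 0.1%               L108R      2.372813              1
Appeared but below 0.1%               K577Q      3.535065              1
Appeared but below 0.1%               V255A      3.021593              1
Appeared but below 0.1%               Q116V      2.247724              2
Appeared but below 0.1%               I164L      2.747743              1
Appeared but below 0.1%               P701L      2.360081              1
Appeared but below 0.1%               I674L      2.073289              1
Appeared but below 0.1%               K578T      2.050551              1
Did not appear in natural data set    L108Y      5.126652              2
Did not appear in natural data set    P647N      3.608524              2
Did not appear in natural data set    V255S      3.490547              2
Did not appear in natural data set    V255T      2.050218              2
Did not appear in natural data set    Q116M      3.409729              2
Did not appear in natural data set    Q116T      2.281114              2
Did not appear in natural data set    Q679N      2.979386              2
Did not appear in natural data set    P510A      2.890052              1
Did not appear in natural data set    P510G      2.233921              2
Did not appear in natural data set    R151L      2.793400              1
Did not appear in natural data set    L351R      2.513735              1
Did not appear in natural data set    T105R      2.454788              2 (a)
Did not appear in natural data set    S261F      2.241466              1
Did not appear in natural data set    M646A      2.150846              2
Did not appear in natural data set    N654D      2.020037              2 (b)

(a) Although the wild-type amino acid at site 105 for WSN33 is threonine (T), the dominant amino acid at this site in natural PB1 population is asparagine (N). Therefore, the number of nucleotide change(s) needed for most natural PB1 to have arginine (R) at site 105 should be 2 (from N) rather than 1 (from T).

(b) Wild-type amino acid at site 654 was asparagine (N) before 2009 and became serine (S) after 2009. The minimum nucleotide change needed from N to D is 1, and from S to D is 2.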
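The minimum number of nucleotide changes between amino acids, as used in the last column of Table 2 and in its footnotes, can be sketched with the standard genetic code. This is a minimal illustration with hypothetical function names. Because the specific codons used in WSN33 or in natural strains are not listed here, the amino-acid-level function takes the minimum over all synonymous codons of the starting amino acid.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid):
    """All codons that encode the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def hamming(a, b):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def min_changes_from_codon(codon, target_aa):
    """Fewest nucleotide changes from a specific codon to any target codon."""
    return min(hamming(codon, target) for target in codons_for(target_aa))


def min_changes_between_amino_acids(source_aa, target_aa):
    """Fewest nucleotide changes, minimized over all codons of the source."""
    return min(min_changes_from_codon(c, target_aa) for c in codons_for(source_aa))


# Checks against the footnotes of Table 2.
print(min_changes_between_amino_acids("T", "R"))  # threonine to arginine
print(min_changes_between_amino_acids("N", "R"))  # asparagine to arginine
print(min_changes_between_amino_acids("N", "D"))  # asparagine to aspartate
print(min_changes_between_amino_acids("S", "D"))  # serine to aspartate
```

When the actual wild-type codon is known, `min_changes_from_codon` gives the codon-specific count, which can exceed the amino-acid-level minimum.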

Sites where we identified beneficial mutations have also been found to be relevant to polymerase activity and viral fitness. Mutations at site 317 were identified in the 1997 Hong Kong H5N1 outbreak (53) and were found to be functionally significant for virulence in mammals (54, 55). Site 744 is located in the vRNA-binding region, and M744V was found to be a canine-adaptive mutation of avian H3N2 (56). Site 674 forms part of the contact points between the PB1 C-terminal and PB2 N-terminal subdomains and also interacts with the 3′ end of the vRNA promoter; mutations to T, L, and S all increased polymerase activity (57). At the polymerase dimer interface, residue 577 interacts with PA and PB2 (58, 59), and residue 578 is oriented toward a residue in the PB2 unstructured loop; K577E in avian H9N2 increases polymerase activity at a lower replication temperature (60), and serial passage of A/Hong Kong/1/68 (H3N2) in mice also gave rise to K577E/M/Q (61). Lysine 578, the wild type, is a ubiquitination site, and mutations from K578 to both non-charged alanine (A) and positively charged arginine (R) increase polymerase activity but are harmful to viral fitness (62). The neutral side chain of A578 reduced polymerase dimerization, while the positively charged R578 aborted cRNA synthesis and led to the premature assembly of the dimer.

DISCUSSION

We performed a near-complete deep mutational scan of the WSN33 PB1 RdRp subunit, defining the impacts of nearly all amino acid substitutions on replicative fitness in A549 cells. Most substitutions are detrimental, and we identified mutational constraints at sites involved in key polymerase interactions, specifically at sites interacting with the RNA template and product. In contrast, mutations in other regions of the protein are better tolerated. Diversity at these sites was moderately correlated with site diversity as defined in available influenza sequences. A small number of mutations are beneficial, and many of these have been observed in natural evolution. Those that were not observed in natural evolution were generally inaccessible by single nucleotide mutation. Our study was comprehensive, as we interrogated a much larger number of codon and amino acid variants compared with studies that evaluate mutations occurring in natural sequences or generated by error-prone PCR. While prior work on the functional domains and evolutionary constraints on RdRp has largely relied on the analyses of sequence conservation (51, 63, 64), our DMS identified significant, site-specific heterogeneity in the influenza virus polymerase.

Through deep mutational scanning, we find that the fitness of mutations on influenza virus PB1 is moderately and positively correlated with site entropy (Fig. S7). The rise of most beneficial mutations requires some degree of mutational flexibility, and highly detrimental mutations are more commonly seen in sites with low mutational tolerance.

Unlike in hemagglutinin (65) and neuraminidase (35), the evolutionary constraints on the influenza virus RdRp are not well defined by protein subdomain. Instead, each subdomain has some sites that are under strict purifying selection and other sites that are more tolerant to mutation. Similar phenomena were observed in naturally occurring genomes, where conserved and variable residues were distributed relatively evenly across major subdomains (66). These findings highlight the importance of local structures and functional interactions in influenza virus replication. As expected, mutations to amino acids with side chains of similar biochemical properties (e.g., charged/uncharged, polar/non-polar) are usually better tolerated. This is consistent with the impact of these biochemical properties on higher-level protein structures: large and non-polar amino acids are more likely to form hydrophobic cores, while polar or charged amino acids are more likely to be surface residues.

We identified beneficial mutations that have been observed in the natural evolution of influenza virus RdRp and found accessibility by single nucleotide substitution to be a key factor determining whether a beneficial mutation can arise naturally. We also identified several adaptive mutations that arose in nature with more than one nucleotide change, which could imply an indirect evolutionary path involving gain and subsequent loss of intermediate mutations (67). Many of the beneficial mutations identified in our study not only increase polymerase activity but have also been shown to be functionally important for host adaptation or to alter post-translational modification. In addition, mutations with higher fitness had a moderate but significant association with sites that have higher mutational tolerance.

Our work is subject to several limitations. First, while our deep mutational scan provides comprehensive fitness measurements in the WSN33 genetic background, the measured mutational effects may not be recapitulated in the genetic background of other H1N1 strains. Second, we performed our DMS in A549 cells, which allow for high-volume infections and the robust viral replication necessary for a comprehensive screen with a large library. It is possible that fitness values may differ in a more physiologically relevant replication system, such as primary airway epithelial cells. Third, we focused on the mutational effects of single amino acid substitutions and did not account for epistatic interactions within PB1 and between PB1 and other viral proteins. In natural evolution, interacting sites often co-evolve (46, 68), and an adaptive mutation in response to a stimulus is commonly accompanied by compensatory mutations that maintain effective replication (5). Finally, we only examined fitness in terms of replication; various treatments can be applied to the variant virus library, and future research can examine mutational fitness under specific conditions such as drug selection or altered baseline mutation rates.

Overall, we have developed a comprehensive map of the local fitness landscape for the influenza A virus PB1 protein. In doing so, we identified how specific amino acid substitutions affect the replicative fitness of the virus and the degree of evolutionary constraint at each site. Our work provides a foundation for subsequent studies of influenza virus replication and host adaptation and may prove to be a valuable addition to genomic surveillance efforts.

ACKNOWLEDGMENTS

We thank Jesse Bloom and Shirleen Soh for making their analysis code available and for the helpful suggestions, and Aaron King, Gideon Bradburd, and Kayla Peck for the helpful discussion. We further acknowledge the contributions of all submitters to GISAID. We performed molecular graphics with UCSF ChimeraX, developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco, with support from the National Institutes of Health (NIH) R01-GM129325 and the Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases. The MD simulations were performed on computational resources managed and supported by Princeton Research Computing, a consortium of groups including the Princeton Institute for Computational Science and Engineering (PICSciE) and the Office of Information Technology's High Performance Computing Center and Visualization Laboratory at Princeton University.

This work was supported by NIH R01 AI170520 and a Burroughs Wellcome Fund Investigator in the Pathogenesis of Infectious Diseases Award, both to A.S.L., and NIH DP2 AI175474 to A.T.V.

Contributor Information

Adam S. Lauring, Email: alauring@med.umich.edu.

Anice C. Lowen, Emory University School of Medicine, Atlanta, Georgia, USA

DATA AVAILABILITY

Raw sequence reads are available in the NCBI Sequence Read Archive under Bioproject #PRJNA1009589.

SUPPLEMENTAL MATERIAL

The following material is available online at https://doi.org/10.1128/jvi.01329-23.

Figure S1 (jvi.01329-23-s0001.tif). Full description of deep mutational scanning libraries. DOI: 10.1128/jvi.01329-23.SuF1

Figure S2 (jvi.01329-23-s0002.tif). Fitness comparison between deep mutational scanning and direct competition in early passages. DOI: 10.1128/jvi.01329-23.SuF2

Figure S3 (jvi.01329-23-s0003.tif). Sites with varying mutational representation. DOI: 10.1128/jvi.01329-23.SuF3

Figure S4 (jvi.01329-23-s0004.tif). Correlation between site entropy and defined features on RdRp. DOI: 10.1128/jvi.01329-23.SuF4

Figure S5 (jvi.01329-23-s0005.tif). Amino acid diversity at sites of naturally occurring influenza H1N1 PB1 sequences. DOI: 10.1128/jvi.01329-23.SuF5

Figure S6 (jvi.01329-23-s0006.tif). Frequency change of amino acid variants at site 691. DOI: 10.1128/jvi.01329-23.SuF6

Figure S7 (jvi.01329-23-s0007.tif). Correlation between site entropy and mutational fitness. DOI: 10.1128/jvi.01329-23.SuF7

Supplemental data set (jvi.01329-23-s0008.csv). Fitness values for all mutants. DOI: 10.1128/jvi.01329-23.SuF8

Supplemental text (jvi.01329-23-s0009.docx). Supplemental figure legends, primers, PCR conditions. DOI: 10.1128/jvi.01329-23.SuF9

ASM does not own the copyrights to Supplemental Material that may be linked to, or accessed through, an article. The authors have granted ASM a non-exclusive, world-wide license to publish the Supplemental Material files. Please contact the corresponding author directly for reuse.

REFERENCES

  • 1. Pappas C, Aguilar PV, Basler CF, Solórzano A, Zeng H, Perrone LA, Palese P, García-Sastre A, Katz JM, Tumpey TM. 2008. Single gene reassortants identify a critical role for PB1, HA, and NA in the high virulence of the 1918 pandemic influenza virus. Proc Natl Acad Sci U S A 105:3064–3069. doi: 10.1073/pnas.0711815105
  • 2. Zhang X, Li Y, Jin S, Zhang Y, Sun L, Hu X, Zhao M, Li F, Wang T, Sun W, Feng N, Wang H, He H, Zhao Y, Yang S, Xia X, Gao Y. 2021. PB1 S524G mutation of wild bird-origin H3N8 influenza A virus enhances virulence and fitness for transmission in mammals. Emerg Microbes Infect 10:1038–1051. doi: 10.1080/22221751.2021.1912644
  • 3. Feng X, Wang Z, Shi J, Deng G, Kong H, Tao S, Li C, Liu L, Guan Y, Chen H. 2016. Glycine at position 622 in PB1 contributes to the virulence of H5N1 avian influenza virus in mice. J Virol 90:1872–1879. doi: 10.1128/JVI.02387-15
  • 4. Xu C, Hu W-B, Xu K, He Y-X, Wang T-Y, Chen Z, Li T-X, Liu J-H, Buchy P, Sun B. 2012. Amino acids 473V and 598P of PB1 from an avian-origin influenza A virus contribute to polymerase activity, especially in mammalian cells. J Gen Virol 93:531–540. doi: 10.1099/vir.0.036434-0
  • 5. Goldhill DH, Te Velthuis AJW, Fletcher RA, Langat P, Zambon M, Lackenby A, Barclay WS. 2018. The mechanism of resistance to favipiravir in influenza. Proc Natl Acad Sci U S A 115:11613–11618. doi: 10.1073/pnas.1811345115
  • 6. Pauly MD, Lyons DM, Fitzsimmons WJ, Lauring AS. 2017. Epistatic interactions within the influenza A virus polymerase complex mediate mutagen resistance and replication fidelity. mSphere 2:e00323-17. doi: 10.1128/mSphere.00323-17
  • 7. Naito T, Shirai K, Mori K, Muratsu H, Ushirogawa H, Ohniwa RL, Hanada K, Saito M. 2019. Tyr82 amino acid mutation in PB1 polymerase induces an influenza virus mutator phenotype. J Virol 93:e00834-19. doi: 10.1128/JVI.00834-19
  • 8. Cheung PPH, Watson SJ, Choy K-T, Fun Sia S, Wong DDY, Poon LLM, Kellam P, Guan Y, Malik Peiris JS, Yen H-L. 2014. Generation and characterization of influenza A viruses with altered polymerase fidelity. Nat Commun 5:4794. doi: 10.1038/ncomms5794
  • 9. Li J, Liang L, Jiang L, Wang Q, Wen X, Zhao Y, Cui P, Zhang Y, Wang G, Li Q, Deng G, Shi J, Tian G, Zeng X, Jiang Y, Liu L, Chen H, Li C, Yount JS. 2021. Viral RNA-binding ability conferred by SUMOylation at PB1 K612 of influenza A virus is essential for viral pathogenesis and transmission. PLoS Pathog 17:e1009336. doi: 10.1371/journal.ppat.1009336
  • 10. Varga ZT, Ramos I, Hai R, Schmolke M, García-Sastre A, Fernandez-Sesma A, Palese P. 2011. The influenza virus protein PB1-F2 inhibits the induction of type I interferon at the level of the MAVS adaptor protein. PLoS Pathog 7:e1002067. doi: 10.1371/journal.ppat.1002067
  • 11. Hai R, Schmolke M, Varga ZT, Manicassamy B, Wang TT, Belser JA, Pearce MB, García-Sastre A, Tumpey TM, Palese P. 2010. PB1-F2 expression by the 2009 pandemic H1N1 influenza virus has minimal impact on virulence in animal models. J Virol 84:4442–4450. doi: 10.1128/JVI.02717-09
  • 12. Laporte M, Stevaert A, Raeymaekers V, Boogaerts T, Nehlmeier I, Chiu W, Benkheil M, Vanaudenaerde B, Pöhlmann S, Naesens L. 2019. Hemagglutinin cleavability, acid stability, and temperature dependence optimize influenza B virus for replication in human airways. J Virol 94:e01430-19. doi: 10.1128/JVI.01430-19
  • 13. Goñi N, Iriarte A, Comas V, Soñora M, Moreno P, Moratorio G, Musto H, Cristina J. 2012. Pandemic influenza A virus codon usage revisited: biases, adaptation and implications for vaccine strain development. Virol J 9:263. doi: 10.1186/1743-422X-9-263
  • 14. Fan RLY, Valkenburg SA, Wong CKS, Li OTW, Nicholls JM, Rabadan R, Peiris JSM, Poon LLM, Perlman S. 2015. Generation of live attenuated influenza virus by using codon usage bias. J Virol 89:10762–10773. doi: 10.1128/JVI.01443-15
  • 15. Kumar N, Bera BC, Greenbaum BD, Bhatia S, Sood R, Selvaraj P, Anand T, Tripathi BN, Virmani N. 2016. Revelation of influencing factors in overall codon usage bias of equine influenza viruses. PLOS ONE 11:e0154376. doi: 10.1371/journal.pone.0154376
  • 16. Mintseris J, Weng Z. 2005. Structure, function, and evolution of transient and obligate protein-protein interactions. Proc Natl Acad Sci U S A 102:10930–10935. doi: 10.1073/pnas.0502667102
  • 17. Aharoni A, Gaidukov L, Khersonsky O, McQ Gould S, Roodveldt C, Tawfik DS. 2005. The “evolvability” of promiscuous protein functions. Nat Genet 37:73–76. doi: 10.1038/ng1482
  • 18. Andreeva A, Murzin AG. 2006. Evolution of protein fold in the presence of functional constraints. Curr Opin Struct Biol 16:399–408. doi: 10.1016/j.sbi.2006.04.003
  • 19. Hom N, Gentles L, Bloom JD, Lee KK. 2019. Deep mutational scan of the highly conserved influenza A virus M1 matrix protein reveals substantial intrinsic mutational tolerance. J Virol 93:e00161-19. doi: 10.1128/JVI.00161-19
  • 20. Goldman N, Thorne JL, Jones DT. 1998. Assessing the impact of secondary structure and solvent accessibility on protein evolution. Genetics 149:445–458. doi: 10.1093/genetics/149.1.445
  • 21. Velázquez-Muriel JA, Rueda M, Cuesta I, Pascual-Montano A, Orozco M, Carazo J-M. 2009. Comparison of molecular dynamics and superfamily spaces of protein domain deformation. BMC Struct Biol 9:6. doi: 10.1186/1472-6807-9-6
  • 22. Friedland GD, Lakomek N-A, Griesinger C, Meiler J, Kortemme T. 2009. A correspondence between solution-state dynamics of an individual protein and the sequence and conformational diversity of its family. PLOS Comput Biol 5:e1000393. doi: 10.1371/journal.pcbi.1000393
  • 23. Marsh JA, Teichmann SA. 2014. Parallel dynamics and evolution: protein conformational fluctuations and assembly reflect evolutionary changes in sequence and structure. Bioessays 36:209–218. doi: 10.1002/bies.201300134
  • 24. Mintseris J, Weng Z. 2005. Structure, function, and evolution of transient and obligate protein-protein interactions. Proc Natl Acad Sci U S A 102:10930–10935. doi: 10.1073/pnas.0502667102
  • 25. Eames M, Kortemme T. 2007. Structural mapping of protein interactions reveals differences in evolutionary pressures correlated to mRNA level and protein abundance. Structure 15:1442–1451. doi: 10.1016/j.str.2007.09.010
  • 26. Worth CL, Gong S, Blundell TL. 2009. Structural and functional constraints in the evolution of protein families. Nat Rev Mol Cell Biol 10:709–720. doi: 10.1038/nrm2762
  • 27. Franzosa EA, Xia Y. 2009. Structural determinants of protein evolution are context-sensitive at the residue level. Mol Biol Evol 26:2387–2395. doi: 10.1093/molbev/msp146
  • 28. Ramsey DC, Scherrer MP, Zhou T, Wilke CO. 2011. The relationship between relative solvent accessibility and evolutionary rate in protein evolution. Genetics 188:479–488. doi: 10.1534/genetics.111.128025
  • 29. te Velthuis AJW, Fodor E. 2016. Influenza virus RNA polymerase: insights into the mechanisms of viral RNA synthesis. Nat Rev Microbiol 14:479–493. doi: 10.1038/nrmicro.2016.87
  • 30. Kouba T, Drncová P, Cusack S. 2019. Structural snapshots of actively transcribing influenza polymerase. Nat Struct Mol Biol 26:460–470. doi: 10.1038/s41594-019-0232-z
  • 31. York A, Hengrung N, Vreede FT, Huiskonen JT, Fodor E. 2013. Isolation and characterization of the positive-sense replicative intermediate of a negative-strand RNA virus. Proc Natl Acad Sci U S A 110:E4238–E4245. doi: 10.1073/pnas.1315068110
  • 32. Soh YS, Moncla LH, Eguia R, Bedford T, Bloom JD. 2019. Comprehensive mapping of adaptation of the avian influenza polymerase protein PB2 to humans. Elife 8:e45079. doi: 10.7554/eLife.45079
  • 33. Sourisseau M, Lawrence DJP, Schwarz MC, Storrs CH, Veit EC, Bloom JD, Evans MJ. 2019. Deep mutational scanning comprehensively maps how zika envelope protein mutations affect viral growth and antibody escape. J Virol 93:e01291-19. doi: 10.1128/JVI.01291-19
  • 34. Starr TN, Greaney AJ, Hilton SK, Ellis D, Crawford KHD, Dingens AS, Navarro MJ, Bowen JE, Tortorici MA, Walls AC, King NP, Veesler D, Bloom JD. 2020. Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding. Cell 182:1295–1310. doi: 10.1016/j.cell.2020.08.012
  • 35. Lei R, Hernandez Garcia A, Tan TJC, Teo QW, Wang Y, Zhang X, Luo S, Nair SK, Peng J, Wu NC. 2023. Mutational fitness landscape of human influenza H3N2 neuraminidase. Cell Rep 42:111951. doi: 10.1016/j.celrep.2022.111951
  • 36. Doud MB, Hensley SE, Bloom JD. 2017. Complete mapping of viral escape from neutralizing antibodies. PLOS Pathog 13:e1006271. doi: 10.1371/journal.ppat.1006271
  • 37. Bloom JD, Dingens A. 2019. Tiling primers for codon mutagenesis. Github. https://github.com/jbloomlab/CodonTilingPrimers.
  • 38. Bloom JD. 2014. An experimentally determined evolutionary model dramatically improves phylogenetic fit. Mol Biol Evol 31:1956–1978. doi: 10.1093/molbev/msu173
  • 39. Dingens AS, Haddox HK, Overbaugh J, Bloom JD. 2017. Comprehensive mapping of HIV-1 escape from a broadly neutralizing antibody. Cell Host Microbe 21:777–787. doi: 10.1016/j.chom.2017.05.003
  • 40. Peck K. 2017. RE_check. Github. https://github.com/kmpeck/RE_check.
  • 41. Doud MB, Bloom JD. 2016. Accurate measurement of the effects of all amino-acid mutations on influenza hemagglutinin. Viruses 8:155. doi: 10.3390/v8060155
  • 42. Bloom JD. 2015. Software for the analysis and visualization of deep mutational scanning data. BMC Bioinformatics 16:168. doi: 10.1186/s12859-015-0590-4
  • 43. Centers for Disease Control and Prevention. 2019. 2009 H1N1 pandemic timeline. Centers for Disease Control and Prevention. https://www.cdc.gov/flu/pandemic-resources/2009-pandemic-timeline.html.
  • 44. Katoh K, Misawa K, Kuma K, Miyata T. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast fourier transform. Nucleic Acids Res 30:3059–3066. doi: 10.1093/nar/gkf436
  • 45. Shannon CE. 1948. A mathematical theory of communication. Bell Syst Tech J 27:379–423. doi: 10.1002/j.1538-7305.1948.tb01338.x
  • 46. Arcos S, Han AX, Te Velthuis AJW, Russell CA, Lauring AS. 2023. Mutual information networks reveal evolutionary relationships within the influenza A virus polymerase. Virus Evol 9:vead037. doi: 10.1093/ve/vead037
  • 47. Pettersen EF, Goddard TD, Huang CC, Meng EC, Couch GS, Croll TI, Morris JH, Ferrin TE. 2021. UCSF ChimeraX: structure visualization for researchers, educators, and developers. Protein Sci 30:70–82. doi: 10.1002/pro.3943
  • 48. Laskowski RA, Swindells MB. 2011. LigPlot+: multiple ligand–protein interaction diagrams for drug discovery. J Chem Inf Model 51:2778–2786. doi: 10.1021/ci200227u
  • 49. Krissinel E, Henrick K. 2007. Inference of macromolecular assemblies from crystalline state. J Mol Biol 372:774–797. doi: 10.1016/j.jmb.2007.05.022
  • 50. Visher E, Whitefield SE, McCrone JT, Fitzsimmons W, Lauring AS. 2016. The mutational robustness of influenza A virus. PLoS Pathog 12:e1005856. doi: 10.1371/journal.ppat.1005856
  • 51. Chu C, Fan S, Li C, Macken C, Kim JH, Hatta M, Neumann G, Kawaoka Y, Poon LLM. 2012. Functional analysis of conserved motifs in influenza virus PB1 protein. PLoS ONE 7:e36113. doi: 10.1371/journal.pone.0036113
  • 52. Wagih O. 2017. ggseqlogo: a versatile R package for drawing sequence logos. Bioinformatics 33:3645–3647. doi: 10.1093/bioinformatics/btx469
  • 53. Katz JM, Lu X, Tumpey TM, Smith CB, Shaw MW, Subbarao K. 2000. Molecular correlates of influenza A H5N1 virus pathogenesis in mice. J Virol 74:10807–10810. doi: 10.1128/jvi.74.22.10807-10810.2000
  • 54. Lycett SJ, Ward MJ, Lewis FI, Poon AFY, Kosakovsky Pond SL, Brown AJL. 2009. Detection of mammalian virulence determinants in highly pathogenic avian influenza H5N1 viruses: multivariate analysis of published data. J Virol 83:9901–9910. doi: 10.1128/JVI.00608-09
  • 55. Nao N, Kajihara M, Manzoor R, Maruyama J, Yoshida R, Muramatsu M, Miyamoto H, Igarashi M, Eguchi N, Sato M, Kondoh T, Okamatsu M, Sakoda Y, Kida H, Takada A. 2015. A single amino acid in the M1 protein responsible for the different pathogenic potentials of H5N1 highly pathogenic avian influenza virus strains. PLoS One 10:e0137989. doi: 10.1371/journal.pone.0137989
  • 56. Li X, Liu J, Qiu Z, Liao Q, Peng Y, Chen Y, Shu Y. 2021. Host-adaptive signatures of H3N2 influenza virus in canine. Front Vet Sci 8:740472. doi: 10.3389/fvets.2021.740472
  • 57. Welkers MRA, Pawestri HA, Fonville JM, Sampurno OD, Pater M, Holwerda M, Han AX, Russell CA, Jeeninga RE, Setiawaty V, de Jong MD, Eggink D. 2019. Genetic diversity and host adaptation of avian H5N1 influenza viruses during human infection. Emerg Microbes Infect 8:262–271. doi: 10.1080/22221751.2019.1575700
  • 58. Fan H, Walker AP, Carrique L, Keown JR, Serna Martin I, Karia D, Sharps J, Hengrung N, Pardon E, Steyaert J, Grimes JM, Fodor E. 2019. Structures of influenza A virus RNA polymerase offer insight into viral genome replication. Nature 573:287–290. doi: 10.1038/s41586-019-1530-7
  • 59. Chen K-Y, Santos Afonso ED, Enouf V, Isel C, Naffakh N. 2019. Influenza virus polymerase subunits co-evolve to ensure proper levels of dimerization of the heterotrimer. PLoS Pathog 15:e1008034. doi: 10.1371/journal.ppat.1008034
  • 60. Kamiki H, Matsugo H, Kobayashi T, Ishida H, Takenaka-Uema A, Murakami S, Horimoto T. 2018. A PB1-K577E mutation in H9N2 influenza virus increases polymerase activity and pathogenicity in mice. Viruses 10:653. doi: 10.3390/v10110653
  • 61. Ping J, Keleta L, Forbes NE, Dankar S, Stecho W, Tyler S, Zhou Y, Babiuk L, Weingartl H, Halpin RA, Boyne A, Bera J, Hostetler J, Fedorova NB, Proudfoot K, Katzel DA, Stockwell TB, Ghedin E, Spiro DJ, Brown EG, Zhang L. 2011. Genomic and protein structural maps of adaptive evolution of human influenza A virus to increased virulence in the mouse. PLoS ONE 6:e21740. doi: 10.1371/journal.pone.0021740
  • 62. Günl F, Krischuns T, Schreiber JA, Henschel L, Wahrenburg M, Drexler HCA, Leidel SA, Cojocaru V, Seebohm G, Mellmann A, Schwemmle M, Ludwig S, Brunotte L. 2023. The ubiquitination landscape of the influenza A virus polymerase. Nat Commun 14:787. doi: 10.1038/s41467-023-36389-0
  • 63. Wu NC, Olson CA, Du Y, Le S, Tran K, Remenyi R, Gong D, Al-Mawsawi LQ, Qi H, Wu T-T, Sun R. 2015. Functional constraint profiling of a viral protein reveals discordance of evolutionary conservation and functionality. PLoS Genet 11:e1005310. doi: 10.1371/journal.pgen.1005310
  • 64. Chan J-S, Tang Y-S, Lo C-Y, Shaw P-C. 2023. Functional importance of the hydrophobic residue 362 in influenza A PB1 subunit. Viruses 15:396. doi: 10.3390/v15020396
  • 65. Lee JM, Huddleston J, Doud MB, Hooper KA, Wu NC, Bedford T, Bloom JD. 2018. Deep mutational scanning of hemagglutinin helps predict evolutionary fates of human H3N2 influenza variants. Proc Natl Acad Sci U S A 115:E8276–E8285. doi: 10.1073/pnas.1806133115
  • 66. Figueiredo-Nunes I, Trigueiro-Louro J, Rebelo-de-Andrade H. 2023. Exploring new antiviral targets for influenza and COVID-19: mapping promising hot spots in viral RNA polymerases. Virology 578:45–60. doi: 10.1016/j.virol.2022.11.001
  • 67. Wu NC, Dai L, Olson CA, Lloyd-Smith JO, Sun R. 2016. Adaptation in protein fitness landscapes is facilitated by indirect paths. Elife 5:e16965. doi: 10.7554/eLife.16965
  • 68. Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW. 2002. Evolutionary rate in the protein interaction network. Science 296:750–752. doi: 10.1126/science.1068696

Associated Data


Supplementary Materials

Figure S1. jvi.01329-23-s0001.tif.

Full description of deep mutational scanning libraries.

DOI: 10.1128/jvi.01329-23.SuF1

Figure S2. jvi.01329-23-s0002.tif.

Fitness comparison between deep mutational scanning and direct competition in early passages.

DOI: 10.1128/jvi.01329-23.SuF2

Figure S3. jvi.01329-23-s0003.tif.

Sites with varying mutational representation.

DOI: 10.1128/jvi.01329-23.SuF3

Figure S4. jvi.01329-23-s0004.tif.

Correlation between site entropy and defined features on RdRp.

DOI: 10.1128/jvi.01329-23.SuF4

Figure S5. jvi.01329-23-s0005.tif.

Amino acid diversity at sites of naturally occurring influenza H1N1 PB1 sequences.

DOI: 10.1128/jvi.01329-23.SuF5

Figure S6. jvi.01329-23-s0006.tif.

Frequency change of amino acid variants at site 691.

DOI: 10.1128/jvi.01329-23.SuF6

Figure S7. jvi.01329-23-s0007.tif.

Correlation between site entropy and mutational fitness.

DOI: 10.1128/jvi.01329-23.SuF7

Supplemental data set. jvi.01329-23-s0008.csv.

Fitness values for all mutants.

DOI: 10.1128/jvi.01329-23.SuF8

Supplemental text. jvi.01329-23-s0009.docx.

Supplemental figure legends, primers, PCR conditions.

DOI: 10.1128/jvi.01329-23.SuF9

Data Availability Statement

Raw sequence reads are available in the NCBI Sequence Read Archive under BioProject accession number PRJNA1009589.


Articles from Journal of Virology are provided here courtesy of American Society for Microbiology (ASM)
