Briefings in Bioinformatics. 2022 Feb 20;23(2):bbac028. doi: 10.1093/bib/bbac028

Librator: a platform for the optimized analysis, design, and expression of mutable influenza viral antigens

Lei Li 1,2,#, Siriruk Changrob 3,4,#, Yanbin Fu 5,6,#, Olivia Stovicek 7, Jenna J Guthmiller 8, Joshua J C McGrath 9, Haley L Dugan 10, Christopher T Stamper 11, Nai-Ying Zheng 12,13, Min Huang 14, Patrick C Wilson 15,16,17,3
PMCID: PMC8921739  PMID: 35183062

Abstract

Artificial mutagenesis and protein engineering have laid the foundation for antigenic characterization and universal vaccine design for influenza viruses. However, many methods used in this process require manual sequence editing and protein expression, limiting their efficiency and utility in high-throughput applications. More streamlined in silico tools allowing researchers to properly analyze and visualize influenza viral protein sequences with accurate nomenclature are necessary to improve antigen design and productivity. To address this need, we developed Librator, a system for analyzing and designing custom protein sequences of influenza virus hemagglutinin (HA) and neuraminidase (NA) glycoproteins. Within Librator’s graphical interface, users can easily interrogate viral sequences and phylogenies, visualize antigen structures and conservation, mutate target residues and design custom antigens. Librator also provides optimized fragment design for Gibson Assembly of HA and NA expression constructs based on peptide conservation of all historical HA and NA sequences, ensuring fragments are reusable and compatible across related subtypes, thereby promoting reagent savings. Finally, the program facilitates single-cell immune profiling, epitope mapping of monoclonal antibodies and mosaic protein design. Using Librator-based antigen construction, we demonstrate that antigenicity can be readily transferred between HA molecules of H3, but not H1, lineage viruses. Altogether, Librator is a valuable tool for analyzing influenza virus HA and NA proteins and provides an efficient resource for optimizing recombinant influenza antigen synthesis.

Keywords: protein engineering, influenza, sequence design, artificial mutagenesis, system biology, Gibson Assembly

Introduction

Influenza viruses continue to be a major pandemic threat in addition to causing highly burdensome annual epidemics. Vaccination has been proven to be an effective approach to prevent infection and global spread of influenza viruses [1]. However, due to frequent mutations in the influenza virus genome, particularly including alterations to the hemagglutinin (HA) and neuraminidase (NA) surface glycoproteins, vaccine-induced immunity is short-lived and frequently ineffective when viral mismatches occur between the vaccine and circulating strain [2, 3]. In this context, developing universal influenza vaccine candidates that induce broadly neutralizing immunity, particularly including antibodies against conserved epitopes of influenza virus surface proteins, is critical to limit the influenza disease burden [1, 4–11].

The use of artificial mutagenesis to design immunogens is a crucial step in vaccine development. To date, several tools for sequence analysis, editing and cloning have been used in the influenza research community. Among these, the user-friendly MEGA program has been the most widely used since 1993 [12]. This program enables efficient and convenient analysis, including nucleotide sequence translation, multiple sequence alignment and phylogenetic analysis for sequences from multiple species (including influenza). In addition to MEGA, BioEdit has also been frequently used because of its convenient features and graphical user interfaces (UIs) [13], whereas SnapGene software (Insightful Science; available at snapgene.com) is likely the most commonly used tool for protein cloning.

Each of the above programs can manage a specific step in the complete process of influenza protein design and cloning. However, current workflows for generating HA and NA antigens for research purposes still face several challenges that make the process expensive, fallible and time-consuming. First, none of the above sequence analysis tools implement specific configurations for influenza virus proteins. Three residue numbering systems for HA protein sequences, namely the coding sequence (CDS) position, crystal structure-based H1/H3 numbering [14–16] and Burke and Smith HA numbering (see methods for details), have been commonly but disparately used in the influenza virus research community and confound sequence analysis and mutagenesis [17, 18]. Notably, existing tools do not provide any compatibility with these HA numbering systems, forcing users to put significant effort into ensuring correct residue identification to avoid errors in analysis. Second, nucleotide and amino acid sequences are difficult to read, and editing them manually is inefficient and fallible. Third, current cloning of HA and NA proteins is expensive and error-prone. Although Gibson Assembly-based cloning can assemble multiple linear deoxyribonucleic acid (DNA) fragments for protein cloning and is highly efficient [19], it is also cost-intensive. Automated Gibson fragment prediction and databasing of fragments conserved between similar HA and NA constructs would allow accurate and cost-effective production of variant influenza virus protein libraries for a variety of applications. Finally, there are no comprehensive tools to develop individual influenza sequence databases (DBs) and readily compare varied influenza virus sequences and phylogenies, or to immediately export annotated sequences for visualization of epitopes and mutations on representative HA and NA structures. These broad challenges in current influenza virus studies limit the efficiency and breadth of the immunology and virology research that is possible. Improving the accuracy and cost-effectiveness of these processes will aid the development of innovative approaches to study and develop vaccines against influenza viruses.

To overcome these challenges, we have developed an all-in-one graphical user interface (GUI) tool called ‘Librator’ to facilitate the efficient and comprehensive sequence analysis, editing and cost-effective cloning of influenza virus HA and NA antigens. Librator is an integrated processing platform for influenza viral sequences that seamlessly connects public sequence DBs and laboratory-based sources. Moreover, Librator contains a variety of functions to facilitate the management, analysis and editing of, and access to, influenza sequences, all to improve the efficiency of antigen design and expression. With Librator, users can complete all operations graphically in an integrated system, avoiding difficult-to-read raw sequences or switching between different software applications. Here, to demonstrate the utility of Librator, we use it in conjunction with downstream molecular cloning to generate novel recombinant H1 and H3 proteins with mosaic antigenic sites. Using this approach, we demonstrate that although antigenicity is readily transferred between disparate H3 proteins, this is not the case for H1 strains. For H3, alterations of mostly surface amino acids readily transfer antigenicity, whereas for H1 proteins, transfer of most of the core structure is required to maintain antigenicity, highlighting challenges in H1 vaccine antigen design. Librator is available on a public data repository for users worldwide: https://wilsonimmunologylab.github.io/Librator/.

Results

Librator is an integrative and user-friendly platform for analysis, design and cloning of influenza virus sequences

Librator is designed to provide an integrated, user-friendly platform solution for time-intensive, repetitive and fallible sequence editing work, specifically in the context of influenza virus antigen construction. The Librator system is composed of three layers: the UI layer, the logic layer and the data layer. To optimize the user experience, we developed a GUI in the UI layer and integrated all information display and function calling into the main interface (Figure 1). We entrust all error-prone sequence editing and data processing tasks to the background algorithm, so that users can accurately design their sequences with a few clicks on the GUI.

Figure 1.

System structure of Librator. Librator comprises three layers: UI layer (GUI), logic layer (all functions) and data layer (SQL DBs). All functions can be divided into two broad categories: basic functions and advanced functions. Basic functions include I/O operations and DB operations, whereas advanced functions include sequence design/editing, fragment design, phylogenetic analysis and structure visualization.

The logic layer is the core of the system and is responsible for handling all user queries, function implementations and data processing. In this layer, we developed robust and effective data input/output (I/O) functions that support multiple widely used data formats and resources, and integrated a comprehensive set of functions that allows users to analyze, edit and clone their influenza virus sequences. First, we developed an HA sequence alignment function with a graphical viewer to help users identify amino acid residues using CDS, crystal structure-based or Burke and Smith numbering systems with ease. We also developed a multiple sequence alignment viewer, a protein structure viewer, a phylogenetic tree builder and a key residue identifier to help users analyze their data visually and rapidly. Furthermore, we developed multiple functions for users to edit their sequences. With Librator, users can accurately generate mutations on demand, including HA variants that lack sialic acid binding (Y98F) and complicated chimeric/mosaic proteins, in minutes, whereas manual generation of mutated sequences is error-prone and time-consuming. To clone and express influenza virus proteins economically, Librator implements a Gibson Clone fragment designer that splits influenza virus sequences into several standardized gene fragments, promoting reagent savings by reusing gene fragments that are shared by multiple constructs of similar HA or NA genes. Lastly, in the data layer, Librator is designed around two Structured Query Language (SQL)-driven DBs for reliable and efficient storage of and access to sequence and fragment data.

Taken together, Librator is a user-friendly program with an efficient workflow for the analysis, design and cloning of influenza virus sequences, saving both user effort and resources.

Librator enables convenient, rapid and comprehensive analysis of influenza virus sequences

Multiple common functions were built into Librator to help users analyze influenza sequences more efficiently (Figure 2A). First, an HA numbering aligner was integrated into Librator’s sequence viewer and editor. This function aligns given HA sequences to crystal structures of an H1 (PDB ID: 4JTV) and/or an H3 template (PDB ID: 4HMG) to identify corresponding H1/H3 numberings for each residue. Known major antigenic sites and epitopes are automatically labeled based on this numbering; for example, receptor binding sites (RBSs), classical stalk epitopes and the H1 (Ca1, Ca2, Cb, Sa, Sb) and H3 (A, B, C, D, E) major antigenic sites are color-coded on the H1/H3 numbering rulers in Figure 2B. Notably, epitope definitions in Librator are highly customizable, allowing users to annotate HA sequences according to their specific research interest and focus (Supplementary Figure S1A). Since glycosylation on the HA protein was reported to have an important impact on antigenic drift [20, 21], the ‘N-X-S/T’ pattern that indicates potential N-linked glycosylation sites is also highlighted. This viewer is also capable of displaying fully annotated multiple sequence alignments with two informative modes, original sequence mode and template mode, enabling convenient investigation of evolutionary sequence patterns and mutations (Figure 2C). For example, users characterizing escape mutant sequences induced by selective pressure with monoclonal antibodies (mAbs) or sera can align the HA or NA sequences and immediately visualize and export graphics of regions that were mutated.
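
As a rough illustration of the glycosylation-site highlighting described above, the following Python sketch scans a peptide for the ‘N-X-S/T’ pattern. It is not Librator’s implementation: the function name `find_sequons` and the example peptide are hypothetical, and excluding proline at the X position is a widely used convention that is assumed here rather than taken from the text.

```python
import re

# N-X-S/T motif. The zero-width lookahead lets overlapping motifs
# (e.g. "NNSS") all be reported. Excluding proline at X is an assumption.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(peptide: str) -> list[int]:
    """Return 1-based positions of the asparagine in each N-X-S/T motif."""
    return [m.start() + 1 for m in SEQUON.finditer(peptide.upper())]


# Hypothetical toy peptide, not a real HA sequence.
print(find_sequons("MKNGTAANPSLNNSS"))  # [3, 12, 13]
```

The positions returned are raw sequence positions; mapping them to H1/H3 numbering would still require the alignment to a structural template that the text describes.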

Figure 2.

Librator enables efficient analysis of influenza HA and NA sequences and structures. (A) Librator seamlessly connects nucleotide sequences from public DBs and lab sources, providing a variety of functions for sequence editing and design. Some icons are created with BioRender.com. (B) Librator’s graphical sequence viewer with HA numbering aligner. Three numbering rulers (a CDS position ruler, an H1 numbering ruler and an H3 numbering ruler) indicate positional information for selected residues. (C) Multiple sequence alignment viewer. Users can choose to show all nucleotides and peptides (original mode) or to show only sequence differences compared with a user-defined template sequence (template mode). Original sequence mode is displayed on the top, and template mode is displayed on the bottom. (D) Phylogenetic analysis function and tree viewer. Users can generate phylogenetic trees using either nucleotide sequences or peptide sequences. (E) Librator allows users to assess nucleotide conservation and peptide conservation by generating sequence logos. Librator can also visualize the peptide conservation on HA 3D structures with the help of PyMOL or UCSF Chimera (right). (F) Librator allows users to visualize peptides on 3D structures of HA proteins with color-annotated amino acids at all antigenic regions and user-defined sequence labels with the help of PyMOL and UCSF Chimera.

To help users quickly infer phylogenetic relationships among a group of sequences, Librator also allows users to generate and visualize maximum-likelihood trees from either nucleotide sequences or peptide sequences (Figure 2D). Furthermore, powered by WebLogo, Librator allows users to assess nucleotide and peptide conservation among groups of sequences [22] (Figure 2E). By automatically exporting amino acid sequences and labeling instructions to PyMOL or University of California San Francisco (UCSF) Chimera, Librator allows users to visualize peptides on 3D structures of HA proteins with color-annotated amino acids according to either peptide conservation score (Figure 2E) or all antigenic regions and user-defined sequence labels (Figure 2F) [23, 24]. Librator uses an H1 structure (PDB ID: 4JTV) for visualization of all Group 1 HA structures and an H3 structure (PDB ID: 4HMG) for visualization of all Group 2 HA structures [25–27]. For example, with a single click, users can immediately evaluate whether an escape mutation is predicted to alter a surface amino acid or occurs deeper in the structure, potentially driving conformational changes. In addition, a function was developed that allows users to identify potential key residues between two groups of sequences by ranking residues by their amino acid difference. For example, by comparing pre-1994 and post-1994 human H1N1 seasonal viruses, Librator assigned a high ranking score to the deletion ‘Δ130’, whose importance has been validated by experiments [28]. This tool helps users focus on important sequence elements driving influenza virus evolution. Lastly, an independent viewer was implemented for users to easily access the HA numbering scheme proposed by Burke and Smith, since it is commonly used in the influenza virus research community [17] (Supplementary Figure S1B). It should be noted, however, that all functions in Librator, including the alignment viewer, sequence editing and sequence design, are based on structure-based numbering systems.
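
The key-residue ranking described above can be sketched in a few lines of Python. The text does not specify how Librator scores the amino acid difference between two groups, so the score used here (the total variation distance between the per-column residue frequencies of the two groups) is an assumption chosen for illustration, as are the function names and the toy alignment. Sequences are assumed to be pre-aligned to equal length, with ‘-’ marking a deletion.

```python
from collections import Counter


def column_freqs(seqs, i):
    """Residue frequencies at 0-based alignment column i."""
    counts = Counter(s[i] for s in seqs)
    return {aa: c / len(seqs) for aa, c in counts.items()}


def rank_key_residues(group_a, group_b):
    """Rank 1-based alignment columns by how strongly two groups differ.

    The score is the total variation distance between the two groups'
    residue frequencies (0 = identical, 1 = completely different).
    This scoring rule is an illustrative assumption.
    """
    length = len(group_a[0])
    if any(len(s) != length for s in group_a + group_b):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for i in range(length):
        fa, fb = column_freqs(group_a, i), column_freqs(group_b, i)
        residues = set(fa) | set(fb)
        score = 0.5 * sum(abs(fa.get(r, 0) - fb.get(r, 0)) for r in residues)
        scores.append((i + 1, score))
    # Highest score first; ties broken by position.
    return sorted(scores, key=lambda t: (-t[1], t[0]))


# Hypothetical toy alignment: column 3 is deleted in every group-B sequence.
early = ["MKTA", "MKTA", "MRTA"]
late = ["MK-A", "MK-A", "MK-A"]
print(rank_key_residues(early, late)[0])  # (3, 1.0)
```

A fixed deletion, analogous in spirit to the Δ130 example, receives the maximum score because the two groups share no residue at that column.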

Accurate and efficient artificial mutagenesis and sequence editing using Librator

To improve the efficiency and accuracy of mutagenesis and sequence editing, we developed multiple functions to help users design and edit influenza virus sequences. With the use of the HA numbering aligner, users can easily locate target residues and mutate them by simply typing a mutation code using whichever numbering system they prefer. For example, for an H3 sequence (A/England/80740425/2018), typing ‘Y177M’ in the CDS position input will mutate the 177th residue of the CDS from tyrosine (Y) to methionine (M). Because Librator automatically translates between influenza HA numbering systems, this is equivalent to typing ‘Y164M’ in the H1 numbering HA1 input or ‘Y161M’ in the H3 numbering HA1 input (Figure 3A). By translating between the various numbering schemes, Librator avoids confusion and mistakes that are common in analyzing influenza sequence data. For NA sequences, only CDS position input is available since it is the only numbering system typically used for NA sequences. To avoid mistakes, Librator checks that the original amino acid in the mutation code matches the amino acid at that position of the raw sequence in the numbering system used.
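
The mutation-code handling described above, including the check on the original amino acid, can be illustrated with the following Python sketch. It is a minimal approximation under stated assumptions: `apply_mutation` and the `to_cds` lookup are hypothetical names, the peptide is a toy sequence, and the one-entry map from H3 numbering to CDS position simply restates the Y161/Y177 correspondence given in the text. In practice such a map would come from aligning the query to a structural template and is not a constant offset.

```python
import re

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutation(peptide, code, to_cds=None):
    """Apply a mutation code such as 'Y177M' to a peptide.

    `to_cds` optionally maps a position in another numbering scheme
    to the 1-based CDS position (hypothetical lookup table).
    """
    match = MUTATION.match(code.strip().upper())
    if match is None:
        raise ValueError(f"malformed mutation code: {code!r}")
    old, pos, new = match.group(1), int(match.group(2)), match.group(3)
    if to_cds is not None:
        if pos not in to_cds:
            raise ValueError(f"position {pos} has no CDS equivalent")
        pos = to_cds[pos]
    if not 1 <= pos <= len(peptide):
        raise ValueError(f"position {pos} is outside the sequence")
    if peptide[pos - 1] != old:
        raise ValueError(
            f"expected {old} at CDS position {pos}, found {peptide[pos - 1]}"
        )
    return peptide[: pos - 1] + new + peptide[pos:]


# Toy peptide with a tyrosine at CDS position 177 (not a real HA sequence).
toy = "A" * 176 + "Y" + "A" * 10
h3_to_cds = {161: 177}  # single illustrative entry

by_cds = apply_mutation(toy, "Y177M")
by_h3 = apply_mutation(toy, "Y161M", to_cds=h3_to_cds)
print(by_cds == by_h3, by_cds[176])  # True M
```

Rejecting a code whose original residue does not match the sequence catches the most common numbering mistake, which is entering a position in one scheme while the input expects another.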

Figure 3.

Librator enables efficient design of HA and NA influenza virus proteins. (A) Demonstration of mutating a residue on an HA sequence using each of the three common numbering systems. (B) Librator designs antigen probes for given HA sequences by generating the mutation Y98F (H3 numbering) and replacing the flexible linker and transmembrane region with a Trimerization-Avitag-H6 sequence. This process is demonstrated using an HA structure of A/duck/Alberta/35/76 (H1N1, PDB ID: 6HJR). (C) Scanning all amino acid differences between two antigenically distinct sequences (A/Switzerland/9715293/2013 [SWZ/13] and A/HongKong/4801/2014 [HK/14]). Librator generates a series of sequences, each with a single mutation, to identify key residues of the antigenic drift. (D) Demonstration of chimeric sequence design. Users can replace regions on the target sequence with regions from multiple donor sequences. Details of the product can be previewed on a graphical viewer. This function is designed to transplant epitopes from one sequence (or multiple sequences) to another (mosaic protein) or to combine the HA1 (HA head) from one sequence and HA2 (HA stalk) from another (chimeric protein).

Expression of HA soluble proteins for experimental purposes is an important tool for characterizing influenza virus immunity or mAb specificity. Building on this mutagenesis function, we also developed a function to design HA expression constructs for most HA subtypes (H1–H15, see Methods section for details) with one click that replaces the flexible linker and transmembrane region with a stabilizing T4 fibritin foldon trimerization domain [29], an Avitag for mono-biotinylation and a histidine tag (H6) sequence for nickel-based purification (Figure 3B) [30]. Selecting the ‘probe option’ of this function also introduces a ‘Y98F’ mutation (H3 numbering) that reduces binding to sialic acid for probes to be used in cellular assays such as for flow cytometric sorting of antigen-specific B cells [31] or Libra-seq [32, 33].

In addition to the mutagenesis function, Librator also provides several sequence editing modes. For example, the Cocktail mode allows users to compare a donor sequence to a template sequence and scan all amino acid differences between them. Then Librator will automatically generate multiple sequences based on the template sequence with each identified mutation or their combinations (Figure 3C). This function improves the efficiency of identifying key residues between antigenically distinct viruses. Users can instantaneously generate a library of point mutant variants for expression, for example, to identify which amino acids differing between two HA molecules are important for mAb binding. With these functions, users can generate mutant libraries within minutes compared with manual generation of mutated sequences that can take hours.
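
The Cocktail mode described above can be approximated by the following Python sketch, which scans the differences between a template and a donor and builds one variant per difference (and, optionally, per combination of differences). The function names, the `max_combo` parameter and the toy peptides are hypothetical, and the sequences are assumed to be pre-aligned, gap-free and of equal length; Librator’s own handling of alignments is not reproduced here.

```python
from itertools import combinations


def scan_differences(template, donor):
    """List (1-based position, template residue, donor residue) differences."""
    if len(template) != len(donor):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, t, d)
        for i, (t, d) in enumerate(zip(template, donor))
        if t != d
    ]


def mutant_library(template, donor, max_combo=1):
    """Build template-based variants carrying up to `max_combo` donor residues."""
    diffs = scan_differences(template, donor)
    library = {}
    for k in range(1, max_combo + 1):
        for combo in combinations(diffs, k):
            seq = list(template)
            for pos, _old, new in combo:
                seq[pos - 1] = new
            name = "+".join(f"{old}{pos}{new}" for pos, old, new in combo)
            library[name] = "".join(seq)
    return library


# Hypothetical toy peptides, not real HA sequences.
print(mutant_library("MKTAYL", "MRTAFL"))
# {'K2R': 'MRTAYL', 'Y5F': 'MKTAFL'}
```

With n differences, the number of combinations grows as 2^n − 1, so restricting `max_combo` keeps the library to a size that can realistically be expressed.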

We also implemented a feature in Librator to facilitate complicated sequence design. Current influenza vaccine-antigen design efforts aim to retarget immunity away from some epitopes and focus it on others through the production of chimeric and mosaic HA (mHA) and NA proteins. Compared with individual mutagenesis, chimeric and mosaic sequence design usually requires mutating multiple regions (groups of residues) or even splicing sequences together from multiple influenza strains. Large numbers of mutations coupled with complex structures make manual design of chimeric/mHA proteins difficult and prone to error. To overcome this challenge, Librator includes an interactive GUI to enable easy and efficient design of chimeric and mosaic proteins (Figure 3D). Using the graphical sequence viewer, users can easily specify and highlight regions to be replaced on a template sequence and regions to be inserted from a donor sequence. A dedicated viewer displays the current product with information about all replacements. After users review and confirm the current product in the product viewer, Librator will generate a new record of the user-designed product, with nucleotide sequence, subtype (same as template) and mutated residues. Altogether, users can easily design complicated chimeric and mHA/NA sequences with extensive mutations or replacement of entire epitopes or regions. Use of Librator in our laboratory has greatly improved the efficiency and accuracy of sequence design.
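
The region-replacement step behind mosaic and chimeric design can be illustrated with a short Python sketch. It is only a simplified model of the workflow described above: `graft_regions` is a hypothetical name, regions are given as 1-based inclusive coordinates on sequences that are assumed to be pre-aligned and of equal length, and length-changing replacements, which a real design tool must also handle, are out of scope.

```python
def graft_regions(template, donor, regions):
    """Replace template regions with the aligned donor residues.

    `regions` is a list of 1-based inclusive (start, end) pairs.
    Returns the product sequence and the list of resulting mutations.
    """
    if len(template) != len(donor):
        raise ValueError("sequences must be aligned to the same length")
    product = list(template)
    mutated = []
    for start, end in regions:
        if not 1 <= start <= end <= len(template):
            raise ValueError(f"invalid region: {(start, end)}")
        for i in range(start - 1, end):
            if product[i] != donor[i]:
                mutated.append(f"{template[i]}{i + 1}{donor[i]}")
                product[i] = donor[i]
    return "".join(product), mutated


# Hypothetical toy peptides; two regions are transplanted from the donor.
product, mutated = graft_regions("MKTAYLQNPS", "MRSAFLENPT", [(2, 3), (7, 7)])
print(product, mutated)  # MRSAYLENPS ['K2R', 'T3S', 'Q7E']
```

Regions from several donors could be transplanted by calling the function repeatedly, passing each product as the next template; the returned mutation list corresponds to the record of mutated residues that the text says is stored with each designed product.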

Librator enables significant reagent savings in influenza virus sequence cloning

Librator’s cloning function also uses a recipe-based generator to maximize the cost-effectiveness, practicality and accuracy of synthesizing nucleotide sequences for expression by Gibson cloning. For this, Librator capitalizes on two facts: Gibson cloning relies on sequence homology in a short overlap/joint region (usually 20–25 bp) between neighboring fragments, and most HA and NA sequences of a type have highly conserved and homologous regions interspersed with the variable sequence elements. Natural mutations in these proteins are enriched in only a few highly variable regions (e.g. epitopes, antibody binding sites) (Figure 4A). The cloning algorithm of Librator optimizes fragment design for HA and NA sequences to maximize the reusability of gene fragments. Librator typically produces HA as four fragments (user customizable) or NA as three fragments, and databases all previous fragments generated by a lab. Therefore, HA molecules differing in only one fragment can be synthesized by replacing only that single fragment, based on an automatically generated recipe specifying the existing fragments in the laboratory’s inventory and the new sequence to be synthesized (Figure 4B). For example, an antibody escape mutation on the HA of a particular strain may contain only several amino acid changes within a single antigenic site, which may be on one fragment of the construct. If the original variant was designed by Librator and expressed in the lab, the escape variant can now be synthesized at only one-quarter of the cost. This function becomes particularly cost-effective when libraries of point mutants are generated. Because Gibson assembly requires overlapping ends, the sequence of each joint region is included in both of the neighboring fragments that share that joint. Because joint regions are very short, placing them in the most conserved regions is relatively easy and maximizes the reusability of the accumulated Gibson clone fragments. For this, Librator identifies potential overlapping regions by locating highly conserved regions based on peptide conservation of all historical HA and NA sequences. These regions are then used to define fragments on a template sequence for each subtype or group of subtypes, ensuring that end compatibility of fragments is unaffected by sporadic mutations, insertions or deletions. In Librator, all query sequences are aligned to the appropriate template sequence to ensure fragments from different batches are subject to the same design, guaranteeing their reusability. Users can clone and express their HA and NA sequences at a reduced cost by reusing fragments in their inventory (Figure 4C). The more sequences users clone, the more comprehensive a fragment inventory they will amass, enabling more fragment reuse and reagent savings.
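
The fragment logic described above, in which each joint is shared by two neighboring fragments and only fragments absent from the inventory need to be ordered, can be sketched in Python as follows. This is a toy model under stated assumptions: the function names are hypothetical, joints are given as fixed 0-based half-open coordinates rather than being derived from conservation or template alignment, and the 4-nt joints and 52-nt sequence are far shorter than the 20–25 bp overlaps and full-length genes described in the text.

```python
def split_at_joints(seq, joints):
    """Split `seq` into fragments that overlap at each joint region.

    `joints` is a sorted list of 0-based half-open (start, end) regions;
    every joint is included in both of its neighboring fragments.
    """
    fragments, frag_start, prev_end = [], 0, 0
    for j_start, j_end in joints:
        if not prev_end <= j_start < j_end <= len(seq):
            raise ValueError("joints must be sorted, disjoint and in range")
        fragments.append(seq[frag_start:j_end])
        frag_start, prev_end = j_start, j_end
    fragments.append(seq[frag_start:])
    return fragments


def build_recipe(fragments, inventory):
    """Separate fragments already in the inventory from those to order."""
    reuse = [f for f in fragments if f in inventory]
    order = [f for f in fragments if f not in inventory]
    return reuse, order


# Toy construct: four fragments joined by three 4-nt joints.
wild_type = "A" * 10 + "CCCC" + "G" * 10 + "TTTT" + "AC" * 5 + "GGCC" + "T" * 10
JOINTS = [(10, 14), (24, 28), (38, 42)]
inventory = set(split_at_joints(wild_type, JOINTS))

# A point mutant that falls inside the second fragment only.
variant = wild_type[:18] + "A" + wild_type[19:]
reuse, order = build_recipe(split_at_joints(variant, JOINTS), inventory)
print(len(reuse), len(order))  # 3 1
```

Here only one of four fragments must be synthesized for the variant, mirroring the one-quarter cost example above. The sketch assumes that the mutation does not fall inside a joint; a change in a joint would alter two fragments, which is why the text places joints in the most conserved regions.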

Figure 4.

Librator reduces reagent costs by designing optimized Gibson Clone fragments for HA and NA sequences. (A) Natural mutations on HA protein are enriched in a few highly variable regions. Peptide conservation was visualized on an H1 protein structure (A/California/04/2009 H1N1, PDB ID: 4JTV). Peptide conservation was calculated from HAs of 58 representative H1N1 viruses from 1918 to 2018. (B) Illustration of fragment designs for a group 1 HA (based on an H1 template) and a group 2 HA (based on an H3 template). Joint regions were determined by locating highly conserved regions on H1/H3 peptides and balancing the length of each fragment. (C) Librator determines overlapping regions based on peptide conservation of all historical HA and NA sequences and then defines fragment design on a template sequence for each subtype. (D) Three modes of customizing the C-terminal domain/tag region for the Gibson cloning downstream end. Default mode directly links the last fragment with the Gibson cloning downstream sequence; however, users can also add a customizable C-terminal domain/tag region for HA proteins: Trimerization domain + Purification tag (e.g. 6xHisTag) or Trimerization domain + AviTag + Purification tag. (E) Graphical viewer of fragments for users to preview their products. (F) Librator users can communicate with a local fragment DB or a remote fragment DB managed by their lab manager. (G) Customized fragment design function for any given sequence. This function allows users to add up to 12 joint regions in their sequences and split their sequences into fragments for Gibson Assembly.

To generate overlapping fragments, we designed uniform fragments on the basis of a classic H1 sequence (A/California/7/2009, H1N1) and a classic H3 sequence (A/Aichi/2/1968, H3N2). We aligned all group 1 HAs (H1, H2, H5, H6, H8, H9, H11, H12, H13, H16, H17 and H18) to the H1 template and all group 2 HAs (H3, H4, H7, H10, H14 and H15) to the H3 template for fragment design (Supplementary Tables S1 and S2, Supplementary Figure S2). For NAs, we designed uniform fragments for each subtype by aligning each of the NA sequences to the template of their respective subtypes (Supplementary Table S1, Supplementary Figure S3). Using sequence alignment, Librator always locates the correct residue at which to slice a complete sequence into uniform fragments. This template mapping-based fragment design ensures that all fragments are standardized and not affected by either different batches or sporadic insertion/deletion events (e.g. a deletion of Δ130 between pre-1994 and post-1994 human seasonal H1N1, or insertions in the cleavage site of highly pathogenic avian H5 and H7) [28, 34, 35]. We applied this system to several applications to validate its effectiveness and compatibility. In laboratory use, this tool helped to clone and express proteins at a reduced cost. For example, reagent cost was reduced by 54% when expressing proteins with single mutations to investigate the key residues of the antigenic drift between A/HongKong/4801/2014 (H3N2) and A/Switzerland/9715293/2013 (H3N2) influenza viruses (Supplemental Data S1). Even in an extreme case of expressing HAs of 39 representative H3N2 viruses from 1968 to 2018, using Librator design only increased the reagent cost by 4% while generating many reusable gene fragments for future projects (Supplemental Data S2).

We further developed several supporting functions to enable an efficient workflow and a smooth user experience. Users can customize the Gibson upstream connector and downstream connector to fit any cloning vector. Furthermore, we also designed a user function to customize C-terminal domain/tag regions for HA proteins: Trimerization domain + Purification tag (e.g. 6xHisTag) or Trimerization domain + AviTag + Purification tag for better end-compatibility (Figure 4D). To reduce the risk of error, we designed an interactive GUI on which users can preview their designed fragments before generating all products (Figure 4E). All generated fragments are archived in an SQL-driven DB for better data access and management. To facilitate lab reagent stock management, Librator also allows multiple users to connect to a remote MySQL fragment DB (Figure 4F). Once fragments are generated by users, Librator searches the current fragment inventory and then generates a list of reusable fragments already in the inventory and of novel fragments that need to be ordered, together with recipes for all sequences engineered. An Excel file containing fragment names and sequences in the format of a 96-well plate is also generated and can be sent directly to a DNA synthesis company. FASTA format files that contain the fragments of each sequence are generated as well, enabling users to validate their compatibility using sequence analysis software. Lastly, we also developed a general fragment design feature that allows users to split any nucleotide sequence into a few customized fragments, which is most applicable when reusability is not a priority (Figure 4G).
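
The 96-well plate export mentioned above can be illustrated with a small Python sketch. The layout rules are assumptions made for illustration: wells are filled row by row from A1 to H12, a new plate is started after 96 fragments, and the output is written as CSV text rather than as the Excel file that Librator produces. The function and column names are hypothetical.

```python
import csv
import io

ROWS = "ABCDEFGH"
WELLS = [f"{row}{col}" for row in ROWS for col in range(1, 13)]  # A1..H12


def plate_layout(fragments):
    """Assign named fragments to wells, 96 per plate, row by row.

    `fragments` maps fragment names to sequences. Returns a list of
    plates, each a list of (well, name, sequence) tuples.
    """
    items = list(fragments.items())
    plates = []
    for start in range(0, len(items), len(WELLS)):
        chunk = items[start:start + len(WELLS)]
        plates.append([(w, name, seq) for w, (name, seq) in zip(WELLS, chunk)])
    return plates


def plates_to_csv(plates):
    """Serialize plates as CSV text with one row per occupied well."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["plate", "well", "name", "sequence"])
    for number, plate in enumerate(plates, start=1):
        for well, name, seq in plate:
            writer.writerow([number, well, name, seq])
    return buffer.getvalue()


# Hypothetical order of 98 fragments: fills one plate and two wells of a second.
order = {f"frag{i}": "ACGT" for i in range(98)}
plates = plate_layout(order)
print(len(plates), plates[0][12][0], plates[1][-1][0])  # 2 B1 A2
```

Synthesis providers differ in the plate templates they accept, so the column names and fill order would need to be adapted to the chosen vendor.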

Design of recombinant H1/H3 mosaic HA proteins using Librator

Librator allows users to design a variety of influenza virus protein antigens, including mutated, chimeric, mosaic and partial protein antigens for different virology and immunology studies. To demonstrate the utility of Librator, we designed recombinant mHA proteins for H1 and H3 subtypes and screened them with a panel of H1/H3-reactive mAbs using enzyme-linked immunosorbent assay (ELISA) to investigate antigenicity (Figure 5A). The H1 mHA was generated using the ‘Fusion’ function of Librator, where the major antigenic sites of the H1 head (A/Hawaii/70/2019) were replaced with the major antigenic sites of a pre-2009-pandemic H1N1 virus, A/Brisbane/59/2007 (Figure 5B). Specifically, these mHAs were designed to determine whether antigenic sites transferred between two different HA strains retain their antigenicity. For this, we used Librator to design H1-based mHA antigens, including partial and complete mosaic antigen molecules (Figure 5B). The partial mosaic antigen was made by replacing the antigenic sites Cb, Sa and Sb of the recipient HA strain (A/Hawaii/70/2019) with those from the donor HA strain (A/Brisbane/59/2007). The complete mosaic antigen was generated by transferring all five antigenic sites of the donor HA to the recipient HA (Figure 5C). To determine the antigenicity of the mHAs, we utilized a panel of 15 mAbs that have been shown to be reactive to the HA head of the donor strain [36] (Figure 5D). Results generated by ELISA showed that 14 out of 15 mAbs (93.3%) lost nearly all binding to both the partial mosaic antigen and the complete mosaic antigen. The mAb 217 14A 1F6 has higher affinity to the recipient HA than to the donor HA and remained strongly reactive to the partial mosaic antigen but lost binding to the complete mosaic antigen. We speculate that this is because this clone targets an epitope located within the Ca1 and Ca2 antigenic area of the recipient HA. In summary, we conclude that, because of the complicated evolutionary path of H1N1 strains, the antigenic sites from the HA of pre-2009 pandemic H1N1 strains cannot be fully maintained when transferred to the HA of a 2009-pdm-like H1N1 strain.

Figure 5.

mHA protein design for universal vaccine development using Librator. (A) Overall workflow of mHA generation and ELISA screening. (B) Design of H1 mosaic proteins using Librator. Sequences were aligned between the donor (A/Brisbane/59/2007) and recipient sequence (A/Hawaii/70/2019) (top). Alignments of antigenic sites of Cb, Sa, Sb, Ca1 and Ca2 among donor (A/Brisbane/59/2007), recipient (A/Hawaii/70/2019), partial mosaic and complete mosaic sequences (bottom). (C) Structure illustration showing that three and five antigenic sites were replaced with those from the donor HA strain in partial and complete mosaics, respectively. (D) Heatmap displaying area under curve (AUC) for a panel of mAbs that are reactive to the HA head of A/Brisbane/59/2007, tested for reactivity to donor, recipient and mosaic antigens. (E) Amino acid sequences on major antigenic sites A and B of recipient/parental HA strain (A/South Australia/34/2019) replaced by sites A and B of A/Perth/16/2009 (mHA Perth/2019), A/New York/680/1995 (mHA New York/1995) and A/Canine/Illinois/12191/2015 (mHA Canine/2015). (F) Five antigenic sites of H3 HA are indicated by color code corresponding to each site. (G) Binding profiles of 20 H3N2 head-binding mAbs against three mHAs and their HA donors. Heatmap illustrating AUC of the binding for each mAb.

We also generated H3 mHAs using the same strategy in Librator: the major antigenic sites A and B of A/South Australia/34/2019 (HA recipient), the H3N2 component of the Southern Hemisphere’s 2020 influenza virus vaccines, were replaced with those from historical and zoonotic H3N2 strains, including A/New York/680/1995, A/Perth/16/2009 and A/canine/Illinois/12191/2015 (HA donors) (Figure 5E and F). We focused our studies on antigenic sites A and B because these epitopes have the largest footprint on the HA head and have previously been shown to be immunodominant [37]. Three mHAs and the four wildtype HAs were expressed as soluble trimeric recombinant HAs and screened with a panel of HA head-binding mAbs. Of 20 HA head-binding mAbs, 13 bound to the HA recipient, whereas 7 HA recipient-negative mAbs were included to determine whether the site A and B epitopes of the HA donors were successfully transferred onto the mHAs. Based on binding data, 15 mAbs, 13 mAbs and 5 mAbs were able to recognize mHA Perth/South Australia, mHA New York/South Australia and mHA Canine/South Australia, respectively. However, reduced binding affinity to mHA Perth/South Australia and mHA Canine/South Australia was noted for several of the mAbs. Among HA recipient-negative mAbs, 83% and 100% retained their binding to mHA Perth/South Australia and mHA New York/South Australia, respectively, at a level comparable to that of the corresponding HA donor. These data indicate that antigenic epitopes on the HA donor protein were transposed onto the mHA even though the transfer involved more than 10 amino acid substitutions within sites A and B. In addition, we found five mAbs that bound broadly to all HA and mHA proteins tested, which will allow us to further identify broadly cross-reactive and protective epitopes on the HA head region.

Taken together, our results not only show that Librator is able to design influenza virus proteins in batches to be tested for antigenicity in a cost-effective and efficient approach to optimize antigen and immunogen design, but also demonstrate that refocusing immunity using mosaic antigens appears to be much more precise for H3 influenza strains and readily possible with each individual antigenic site of the H3 head. Conversely, H1 strains require transfer of much larger domains of the HA head and so will be more difficult to precisely target particular antigenic sites using mosaic antigen technologies.

Discussion

Herein we report on the development of an efficient and user-friendly in silico platform for the analysis, design and cost-effective cloning of influenza virus HA and NA glycoproteins. Its three primary unique features are: (i) specific configurations for influenza HA and NA proteins, including HA numbering systems, HA structure modeling and influenza antigen probe design; (ii) a user-friendly GUI and one-stop workflow for sequence analysis, design and cloning; and (iii) a cost-effective cloning strategy for influenza HA and NA proteins. Together, these features make Librator a powerful tool for influenza research beyond what is available using existing software.

Librator shows clear advantages in improving the efficiency, accuracy and economy of influenza research when compared with existing tools. As discussed, comparing sequences and locating correct residues are difficult using these tools and are further complicated by the disparate use of three unique HA numbering systems. In comparison to MEGA 11, a commonly used genetic analysis tool, users can locate residues more rapidly and accurately by using the automatic numbering system and graphical sequence viewer in Librator (Supplementary Figure S4). In particular, MEGA 11 does not allow users to easily identify known epitopes of HA proteins using the alignment viewer and is not compatible with HA numbering systems (Supplementary Figure S4A). Librator automatically aligns given sequences to H1 and H3 templates and reveals residues under each numbering system with an interactive ruler (Supplementary Figure S4B). In addition, all known epitopes, including antigenic sites, RBSs, and N-linked glycosylation sites, are also highlighted in Librator’s alignment viewer. For optimal clarity, Librator presents nucleotide sequences and their peptide sequences in a pairwise manner, enabling a convenient investigation of individual peptides and their codons (Supplementary Figure S4C). Taken together, these unique and influenza-specific features make Librator an ideal tool for the analysis of influenza virus sequences.

Correctly identifying residues of interest is a critical step for artificial mutagenesis. Compared with the inconvenient and error-prone direct editing of raw sequences using existing tools (e.g. MEGA 11 and BioEdit), Librator implements a more secure and precise approach that modifies sequences using a mutation code. Results showed that this approach is compatible with multiple HA numbering systems and can avoid mistakes by validating the peptide of the targeted residue. Furthermore, several additional influenza-exclusive elements, including an antigen-probe designer and sequence fusion function, help users to customize antigens efficiently. In aggregate, these features establish Librator as a premier, state-of-the-art resource for modifying influenza viral sequences among existing software.

The optimized Gibson clone fragment design of HA and NA proteins of influenza viruses is a unique feature of Librator. Some existing tools, for example SnapGene, are able to design gene fragments for the Gibson cloning of any given sequences, including influenza virus sequences. However, only optimized fragment design that is based on a comprehensive investigation of peptide variation of the research subject (e.g. HA) can generate reusable gene fragments across batches and lead to budget conservation. Results showed that cloning with Librator can save up to 50% of budget allocated for cloning purposes (Supplemental Data S1). In practice, we suggest users split the template sequence of interest (e.g. representative strains of genetic clades, representative strains of antigenic clusters and historical vaccine strains) into four optimized gene fragments. As a result, for most of the newly generated sequences, users only need to order one or two new gene fragments given that most known epitopes are located on the first two fragments.
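The reuse argument above can be illustrated with a minimal sketch. It assumes each design has already been split into four peptide fragments at fixed joint regions, and that the inventory is a set of (position, peptide) pairs; the function name, the toy sequences and the inventory layout are hypothetical and are not taken from Librator's source code.

```python
def fragments_to_order(design_fragments, inventory):
    """Return 1-based positions of fragments whose peptide is not already in stock."""
    return [
        position
        for position, peptide in enumerate(design_fragments, start=1)
        if (position, peptide) not in inventory
    ]


# Toy four-fragment template (hypothetical peptides, far shorter than real HA fragments).
template = ["MKAIL", "VVLLY", "TFATA", "NADTL"]
inventory = {(i, frag) for i, frag in enumerate(template, start=1)}

# A new design carrying one substitution in the second fragment.
mutant = ["MKAIL", "VVLKY", "TFATA", "NADTL"]
needed = fragments_to_order(mutant, inventory)
```

In this toy case only the second fragment has to be ordered; the other three are taken from stock, which is the source of the savings described above.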

To date, Librator has provided significant demonstrable benefit to our laboratory in facilitating a diverse array of research outcomes. For instance, using the program we designed and cloned influenza virus HA and SARS-CoV-2 spike antigen-probes for the purpose of single-cell profiling of antigen-specific memory B cells in peripheral blood, finding that non-protective memory B cells targeting NP and ORF8 adapt over time and are increased in older patients [33]. Secondly, using the peptide conservation model, which allows users to visualize amino acid variation, highlight variable and conserved regions and infer evolutionary patterns on a representative HA structure, we evaluated and visualized the peptide conservation of H1 proteins over the past 40 years, unraveling conserved and broadly protective epitopes important for universal vaccine development [38, 39]. Finally, and most recently, using Librator we also examined and quantified peptide conservation across influenza HA subtypes, which ultimately aided in identifying a novel HA anchor epitope frequently targeted by broadly neutralizing antibodies [40].

Looking to the future, Librator has the potential to expand toward additional antigens from influenza and other viruses. In recent years, numerous studies have revealed broadly protective epitopes on NA glycoproteins and highlighted the importance of NA as a target of human antibodies [41–44]. Compared with HA, there is still a lack of knowledge of the conservation and antigenicity of NA epitopes; the diverse features of Librator can provide significant benefit in resolving this knowledge gap. In this regard, Librator will be continuously updated with the latest research progress on NA, as well as for antigens from lineages such as influenza B. Lastly, this template-based and standardized fragment design also has the potential to be extended to other mutable viruses, such as human immunodeficiency virus, hepatitis C virus, hantaviruses and coronaviruses. Because of the broad applicability of this software, all source code is provided. In summary, Librator provides a user-friendly tool with strong demonstrated potential to enhance the efficiency and throughput of influenza viral antigen analysis and construction.

Methods

Dataset

All the HA and NA sequences used in this study were downloaded from the National Center for Biotechnology Information (NCBI) FLU DB (https://www.ncbi.nlm.nih.gov/genomes/FLU/) [45] and Global Initiative on Sharing Avian Influenza Data (GISAID) DB (https://www.gisaid.org/) [46]. H1 protein: 2243 seasonal H1 sequences and 31 575 pdm09 sequences. H3 protein: 61 798 sequences. NA protein: 28 747 N1 sequences, 15 194 N2 sequences, 1430 N3 sequences, 291 N4 sequences, 382 N5 sequences, 2420 N6 sequences, 1188 N7 sequences, 2446 N8 sequences and 2446 N9 sequences. All sequences are peptide sequences.

Gibson clone fragment design for HA and NA proteins

Gibson clone fragments should be designed according to a uniform criterion that is unaffected by sporadic insertions/deletions in different strains, and all the joint regions of neighboring fragments should be located in the most conserved regions. Furthermore, an optimized fragment design should balance the reusability of each single fragment against the total number of fragments. The shorter a single fragment is, the lower the probability that it contains a mutation, which makes each fragment more reusable; however, fragments that are too short result in a larger number of fragments, which substantially increases the total reagent cost.

To determine the optimized fragment design (including number of fragments and joint region location), we investigated amino acid variations of all residues of human H1, human H3 and NA (all hosts), and we quantified the amino acid variations by an amino acid variation entropy function:

$$H = -\sum_{i=1}^{N} f_i \log_2 f_i$$
$$f_i = \frac{n_i}{\sum_{j=1}^{N} n_j}$$

$H$ denotes the entropy of the observed symbol. $f_i$ denotes the frequency of the $i$-th amino acid of this residue, $N$ denotes the total number of all possible amino acids ($N = 20$), and $n_i$ denotes the total number of the $i$-th amino acid of this residue. This entropy function was proposed to quantitatively evaluate the variation level of nucleotide and amino acid sequences [22] and has been widely used since it was first published. Here, we adopted this scoring function to quantify the conservation level of each residue of influenza proteins, so that users can quickly identify highly conserved regions (or residues) and highly variable residues. Because Gibson cloning requires an overlap region between any two neighboring fragments, highly conserved regions are the best candidates for these overlap regions; rapidly locating highly conserved regions is therefore critical to a reusable Gibson clone fragment design.
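As an illustration of the per-residue variation score, the sketch below computes the entropy of each column of a peptide alignment. It assumes the standard Shannon form with a base-2 logarithm, in line with the sequence-logo convention of [22], and it skips any symbol outside the 20 standard amino acids; the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (N = 20)


def column_entropy(column):
    """Shannon entropy in bits of one alignment column; other symbols are skipped."""
    counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for n_i in counts.values():
        f_i = n_i / total  # frequency of this amino acid at this residue
        entropy -= f_i * math.log2(f_i)
    return entropy


def alignment_entropy(sequences):
    """Per-position entropy for equal-length aligned peptide sequences."""
    return [column_entropy(column) for column in zip(*sequences)]


# Toy alignment: positions 1 and 4 are fully conserved, position 5 is split evenly.
toy_alignment = ["MKAIL", "MKTIL", "MKAIV", "MRAIV"]
scores = alignment_entropy(toy_alignment)
```

Positions with a score of 0 are fully conserved in the input set, so runs of such positions are the natural candidates for joint regions.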

By comprehensively considering commercial DNA fragment sizes and prices and the distribution of conserved regions in the HA/NA sequences, we propose an optimized fragment design that divides HA into four fragments and NA into three fragments. The length of joint regions was set to nine amino acids (27 bp in nucleotides) because Gibson Clone Assembly requires a joint region of at least 25 bp. Joint regions of group 1 HA were set at 123–131, 264–272 and 403–411 (CDS position on an A/California/7/2009[H1N1] HA). Joint regions of group 2 HA were set at 123–131, 265–273 and 403–411 (CDS position on an A/Aichi/2/1968[H3N2] HA). Joint regions of NA were set at 131–139 and 292–301 (for each subtype, all positions refer to the CDS position on a representative template of this subtype). Joint regions and templates of all HA and NA subtypes are shown in Supplementary Table S1. Furthermore, to maximize the compatibility of joint regions, Librator revised all nucleotide sequences of joint regions by translating them from peptide sequences using a dictionary in which each amino acid has only one corresponding codon.
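The fragment layout described above can be sketched as follows. The joint positions are the group 1 HA values quoted in the text; the one-codon-per-amino-acid table, the function names and the placeholder peptide are assumptions made for illustration, since the dictionary actually used by Librator is not given here.

```python
# Hypothetical single-codon table; Librator's actual dictionary is not given in the text.
SINGLE_CODON = {
    "A": "GCC", "C": "TGC", "D": "GAC", "E": "GAG", "F": "TTC",
    "G": "GGC", "H": "CAC", "I": "ATC", "K": "AAG", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCC", "Q": "CAG", "R": "AGG",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAC",
}

# Joint regions of group 1 HA (1-based, inclusive CDS positions) as stated in the text.
JOINTS_GROUP1_HA = [(123, 131), (264, 272), (403, 411)]


def back_translate(peptide):
    """Translate a peptide to DNA using exactly one codon per amino acid."""
    return "".join(SINGLE_CODON[aa] for aa in peptide)


def split_into_fragments(peptide, joints):
    """Split a peptide so that neighbouring fragments share each joint region."""
    fragments = []
    start = 1
    for joint_start, joint_end in joints:
        fragments.append(peptide[start - 1:joint_end])
        start = joint_start  # the next fragment begins at the start of this joint
    fragments.append(peptide[start - 1:])
    return fragments


# Placeholder peptide of HA-like length (not a real HA sequence).
peptide = ("ACDEFGHIKLMNPQRSTVWY" * 29)[:566]
fragments = split_into_fragments(peptide, JOINTS_GROUP1_HA)
joint_dna = back_translate(fragments[0][-9:])
```

Because every amino acid maps to a single codon, two fragments that agree at the peptide level in a joint region also agree at the nucleotide level, which is what keeps fragments from different batches compatible.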

Pipeline design

To optimize the user experience, especially for biologists without a computer science background and not familiar with command-line tools, we developed a highly interactive GUI for Librator. Function calling, parameter setting and information display were integrated into one main interface with multiple tabs. All functions can be divided into two broad categories: basic function and advanced function. Basic function includes I/O operations and DB operations: parameter setting, create new sequence DB, open existing sequence DB, import sequences and export sequences. Advanced function includes sequence design/editing, fragment design, phylogenetic analysis and structure visualization: the specific functions are sequence information editing, HA numbering, mutation identification, antigen probe design, multiple sequence alignment, phylogenetic analysis, sequence editing, chimeric HA design, structure visualization and Gibson Clone fragments design (Figure 1).

Multiple HA numbering schemes in influenza research field

As discussed in the introduction section, there are three different numbering systems commonly used in the influenza research field: (a) CDS position, (b) crystal-structure-based H1/H3 numbering and (c) the Burke and Smith HA numbering scheme.

Residue number on CDS is usually counted from the first amino acid of the CDS (Methionine). For a given sequence, CDS position can cover all residues of given sequence regardless of sporadic insertion and/or deletion. The crystal-structure-based H1/H3 numbering aligns given sequences against a classic H1/H3 template and assigns position numbers for all residues that can map to the template crystal structures. Thus, inserted residues and non-structural residues (e.g. signal peptides) will not be assigned a residue number because they cannot be aligned to the template crystal structures. Furthermore, numbers of residues in HA1 and HA2 are counted independently. The Burke and Smith HA numbering scheme proposed by Burke et al. aligns given sequences against 26 templates of different subtypes to determine the residue numbers [17]. Different from the structure-based HA numbering scheme, the Burke and Smith HA numbering scheme is based on amino acid sequences without considering structural information, and it counts from the first amino acid of the CDS after signal peptide removal. This numbering scheme has been implemented by FLUDB (https://www.fludb.org/brc/haNumbering.spg?method=ShowCleanInputPage&decorator=influenza) recently. We compared three different HA numbering systems using H1 and H3 template sequences (Supplementary Figure S5; Supplementary Tables S3 and S4).
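The relationship between CDS positions and template-based numbers can be sketched from a single pairwise alignment. This is a simplified illustration rather than Librator's aligner: it assumes the two gapped rows are already available, numbers the template continuously from a chosen start, and does not restart the count for HA2 as the structure-based scheme does. The function name and toy rows are hypothetical.

```python
def number_by_template(aligned_query, aligned_template, first_number=1):
    """Map each query CDS position (1-based) to a template number, or None.

    Both inputs are rows of one pairwise alignment with '-' as the gap symbol.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned rows must have equal length")
    mapping = {}
    cds_position = 0
    template_number = first_number - 1
    for query_symbol, template_symbol in zip(aligned_query, aligned_template):
        if template_symbol != "-":
            template_number += 1
        if query_symbol != "-":
            cds_position += 1
            # Residues facing a template gap (insertions, signal peptide) get no number.
            mapping[cds_position] = template_number if template_symbol != "-" else None
    return mapping


# Toy rows: the query has a two-residue leader and one insertion relative to the template.
numbering = number_by_template("MKTDAIC", "--TD-IC")
```

Every query residue keeps a CDS position, whereas the leader and the inserted residue receive no template number, mirroring the behaviour described above for signal peptides and insertions.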

Because protein structures play an important role in antigen phenotypes, all functions in Librator, including alignment viewer, sequence editing and sequence designing were based on structure-based HA numbering systems. Users can only access the universal HA numbering scheme in the ‘Burke and Smith HA numbering’ viewer.

Antigen probe design

The antigen probe design function makes HA probes for a given HA sequence by generating a ‘Y98F’ mutation (H3 numbering) and replacing the flexible linker and transmembrane region with a Trimerization-Avitag-H6 sequence. Residue 98 under H3 numbering is located automatically by the built-in HA numbering aligner system. The transmembrane region is identified by aligning given sequences to an H3 template. This function is not available for most H16 HAs and all H17 and H18 HAs because these sequences are isolated from avian and bat sources, and their residue 98 under H3 numbering is already ‘F’.
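The substitution step can be sketched as applying a mutation code with validation of the original residue. The sketch assumes a precomputed map from scheme numbers (for example H3 numbering) to 0-based indices in the sequence; the function name, the toy sequence and the map are hypothetical and do not reproduce Librator's implementation.

```python
import re


def apply_mutation(sequence, code, numbering):
    """Apply a code such as 'Y98F'; `numbering` maps scheme numbers to 0-based indices."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"malformed mutation code: {code}")
    original, number, replacement = match.group(1), int(match.group(2)), match.group(3)
    if number not in numbering:
        raise KeyError(f"position {number} is not numbered in this sequence")
    index = numbering[number]
    # Validate the residue before editing, so a wrong position is caught early.
    if sequence[index] != original:
        raise ValueError(
            f"expected {original} at position {number}, found {sequence[index]}"
        )
    return sequence[:index] + replacement + sequence[index + 1:]


toy_sequence = "MKTYIAL"        # hypothetical peptide
toy_numbering = {98: 3}         # hypothetical: residue 98 sits at index 3
probe = apply_mutation(toy_sequence, "Y98F", toy_numbering)
```

Applying the same code to a sequence whose residue 98 is already F raises an error in this sketch, which corresponds to the case described above for H17 and H18 HAs.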

Identification of key residues between two groups of sequences

In this function, we first align all sequences from both groups together and then investigate peptide differences between the two groups for every residue independently. We developed two metrics to evaluate the amino acid differences between two groups:

Non-weighted scoring function

For each residue, we convert amino acid composition of two groups into numerical amino acid vectors:

graphic file with name DmEquation3.gif

Inline graphic denotes the Inline graphic-th amino acid of a total of 21 different amino acid options (the 20 amino acids plus one option for any other symbol, e.g. an alignment gap or the unclear amino acid X). Inline graphic denotes the total number of appearances of the Inline graphic-th amino acid. We then defined a score to represent the difference in amino acid composition between the two groups at a specific residue:

graphic file with name DmEquation4.gif

Inline graphic and Inline graphic denote amino acid vectors of residue Inline graphic. Inline graphic denotes peptide difference of residue Inline graphic.
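The construction of the 21-element amino acid vectors can be sketched as below. The comparison step is a stand-in: the exact form of the score is not recoverable from the text here, so the sketch uses the total variation distance between the two frequency vectors, which is 0 for identical compositions and 1 for disjoint ones. Function names are hypothetical.

```python
from collections import Counter

SYMBOLS = "ACDEFGHIKLMNPQRSTVWY"  # 20 amino acids; any other symbol goes to slot 21


def amino_acid_vector(column):
    """Count vector of length 21 for one alignment column of one group."""
    counts = Counter(aa if aa in SYMBOLS else "other" for aa in column)
    return [counts[aa] for aa in SYMBOLS] + [counts["other"]]


def composition_difference(column_1, column_2):
    """Stand-in score (assumption): total variation distance between frequency vectors."""
    v1, v2 = amino_acid_vector(column_1), amino_acid_vector(column_2)
    n1, n2 = sum(v1), sum(v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one sequence")
    # Normalizing by group size lets groups of different sizes be compared.
    return 0.5 * sum(abs(a / n1 - b / n2) for a, b in zip(v1, v2))


same = composition_difference("KKKK", "KKK")
mixed = composition_difference("KKRR", "KKKK")
disjoint = composition_difference("KKKK", "RRRR")
```

Working with frequencies rather than raw counts is a design choice of this sketch; it keeps the score independent of how many sequences each group contains.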

Pattern-induced multi-sequence alignment score-based weighted scoring function

Pattern-induced multi-sequence alignment (PIMA) score is a commonly used scoring matrix in amino acid sequence comparison [47]. Compared with the non-weighted scoring function, PIMA scoring matrices assign different scores to different amino acid pairs to reflect the differences among amino acids with different properties. This scoring system has been shown to be effective at quantifying amino acid differences among influenza protein sequences [48].

For each residue, because numbers of sequences in two groups could be different, we perform pair-wise comparisons between amino acids, quantify a PIMA score for each pair and summarize all PIMA scores.

graphic file with name DmEquation5.gif

Inline graphic denotes peptide differences at residue Inline graphic, and Inline graphic denotes the PIMA score of the m-th amino acid in group 1 and the n-th amino acid in group 2. N1 and N2 denote the numbers of sequences in group 1 and group 2.

Using these systems, a score = 0 indicates no peptide difference at this residue between two groups. The higher the score is, the bigger the peptide difference will be. Then all residues will be ranked by the score from high to low to facilitate users’ further analysis. In summary, this function gives suggestions of key residues to narrow the candidate range and accelerate biological studies.
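The pairwise, weighted comparison and the final ranking can be sketched as follows. Two points are assumptions: the PIMA matrix is not reproduced here, so a toy score based on broad property classes stands in for it, and the pairwise scores are averaged over the N1 × N2 pairs, whereas the text only states that they are summarized. All names are hypothetical.

```python
from itertools import product

# Hypothetical property classes standing in for the PIMA matrix.
CLASSES = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYGP",
    "positive": "KRH",
    "negative": "DE",
}
CLASS_OF = {aa: name for name, members in CLASSES.items() for aa in members}


def toy_pair_score(a, b):
    """0 for identical symbols, 1 within a property class, 2 otherwise."""
    if a == b:
        return 0
    class_a, class_b = CLASS_OF.get(a), CLASS_OF.get(b)
    return 1 if class_a is not None and class_a == class_b else 2


def weighted_difference(column_1, column_2):
    """Mean pairwise score over all N1 x N2 residue pairs at one position."""
    total = sum(toy_pair_score(a, b) for a, b in product(column_1, column_2))
    return total / (len(column_1) * len(column_2))


def rank_residues(group_1, group_2):
    """Return (position, score) pairs sorted from most to least different."""
    columns = zip(zip(*group_1), zip(*group_2))
    scores = [
        (position, weighted_difference(c1, c2))
        for position, (c1, c2) in enumerate(columns, start=1)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy groups taken from one alignment (equal-length rows).
ranked = rank_residues(["MKA", "MKA"], ["MRD", "MRD", "MKD"])
```

In this toy input the third position ranks first because the two groups share no residue there, while the fully conserved first position scores 0 and ranks last.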

Nomenclature of Gibson clone fragments

We defined a nomenclature for all fragments for easier inventory management. Each fragment name is composed of three parts: gene segment subtype (H1–H18, N1–N11), fragment number (F1–F4 for HA, F1–F3 for NA) and a unique numerical ID. For example, H3-F1-0001 denotes a gene fragment at position 1 (first fragment) generated from an H3 sequence with the ID 0001. We designed a SQL table for Librator for inventory management of all gene fragments. There are nine keys in the fragment table: Name (primary key), Segment (HA/NA), Fragment (F1–F4), Subtype, ID, Template (template sequence name), AAseq (amino acid sequence), NTseq (nucleotide sequence) and Instock (yes/no). We also designed an interface for users to manage their fragment inventory.
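A minimal sketch of such a fragment table, using Python's built-in sqlite3 module, is given below. The nine column names follow the text; the table name, the column types, the constraints and the example row are assumptions for illustration and are not copied from Librator's schema.

```python
import sqlite3


def fragment_name(subtype, fragment_number, numeric_id):
    """Compose a name such as 'H3-F1-0001' from its three parts."""
    return f"{subtype}-F{fragment_number}-{numeric_id:04d}"


conn = sqlite3.connect(":memory:")  # in-memory DB for the example
conn.execute(
    """
    CREATE TABLE fragments (
        Name     TEXT PRIMARY KEY,
        Segment  TEXT CHECK (Segment IN ('HA', 'NA')),
        Fragment TEXT,
        Subtype  TEXT,
        ID       TEXT,
        Template TEXT,
        AAseq    TEXT,
        NTseq    TEXT,
        Instock  TEXT CHECK (Instock IN ('yes', 'no'))
    )
    """
)

# Toy row: the peptide and nucleotide strings are placeholders, not real HA sequence.
name = fragment_name("H3", 1, 1)
conn.execute(
    "INSERT INTO fragments VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    (name, "HA", "F1", "H3", "0001", "A/Aichi/2/1968", "MKTII", "ATGAAGACCATCATC", "yes"),
)

in_stock = conn.execute(
    "SELECT Name FROM fragments WHERE Subtype = ? AND Instock = 'yes'", ("H3",)
).fetchall()
```

Using the fragment name as the primary key means that an attempt to register the same fragment twice fails at insertion, which helps keep the inventory free of duplicates.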

MAb production

MAbs were generated as previously described [49]. Briefly, variable regions of human antibody heavy and light chains were cloned from human B cells isolated at different time points after influenza virus vaccination or natural infection, and these fragments were sequenced and inserted into human IgG1, human kappa chain or human lambda chain expression vectors. The heavy and light chain expression vectors were then cotransfected into HEK-293FT cells. The secreted mAbs in the supernatant were harvested 4 days posttransfection and purified using protein A agarose beads according to the manufacturer’s instructions (Thermo Fisher). The purified mAbs were buffer exchanged into phosphate-buffered saline and stored at 4 °C.

Recombinant proteins

The recombinant HA proteins designed by Librator were generated in-house with gBlock fragments synthesized by Integrated DNA Technologies and cloned into an HA expression vector with a C-terminal 6 × His tag. The recombinant HA proteins were then expressed in Expi293F cells (Thermo Fisher) using the ExpiFectamine 293 Transfection kit (Thermo Fisher). The secreted proteins in the medium were harvested 4–5 days posttransfection and purified with Ni-NTA agarose (QIAGEN).

ELISA

High-protein binding microtiter plates (Costar) were coated with 50 μl of recombinant proteins at 2 μg/ml in phosphate-buffered saline (PBS) solution overnight at 4 °C. The plates were washed three times the next day with PBS supplemented with 0.05% Tween 20 and blocked with PBS containing 20% fetal bovine serum (FBS) for 1 h at 37 °C. MAbs were serially diluted 1:3 starting at 10 μg/ml and incubated for 1 h at 37 °C. The plates were then washed three times and incubated with horseradish peroxidase-conjugated goat anti-human IgG antibody (Jackson ImmunoResearch) diluted 1:1000 for 1 h at 37 °C, and plates were subsequently developed with Super AquaBlue ELISA substrate (eBioscience). Absorbance was measured at 405 nm on a microplate spectrophotometer (Bio-Rad). To standardize the assays, control antibodies with known binding characteristics were included on each plate and the plates were developed when the absorbance of the control reached 3.0 units at 405 nm. The experiments were repeated at least twice.
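For readers reproducing the analysis, the dilution scheme and an area-under-curve summary can be sketched as follows. The 1:3 dilution from 10 μg/ml is taken from the protocol above; the number of dilution steps (eight) and the use of a trapezoidal rule over log10 concentration are assumptions, because the text does not state how AUC was calculated. Function names are hypothetical.

```python
import math


def dilution_series(start, factor, steps):
    """Concentrations for a serial dilution, highest first."""
    return [start / factor ** i for i in range(steps)]


def auc_trapezoid(concentrations, absorbances):
    """Trapezoidal area under absorbance versus log10(concentration)."""
    points = sorted(zip((math.log10(c) for c in concentrations), absorbances))
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


# 1:3 serial dilution starting at 10 ug/ml; eight steps is a hypothetical choice.
concentrations = dilution_series(10.0, 3, 8)

# A flat curve at absorbance 1.0 gives an AUC equal to the log10 span of the series.
flat_auc = auc_trapezoid(concentrations, [1.0] * len(concentrations))
```

Integrating over log concentration weights each dilution step equally, which is a common choice for titration curves but should be matched to whatever convention the plotted AUC values actually use.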

Implementation

The pipeline was primarily implemented in Python3 (version 3.7.3 for MacOS, version 3.9 for Windows 10) using the PyQt5 library (version 5.13.0). The executable application was compiled from source code using PyInstaller (version 4, https://www.pyinstaller.org/). The jQuery JavaScript library (version 3.4.1, https://jquery.com/), pyecharts library (version 1.8.1, https://pyecharts.org/) and matplotlib library (version 3.1.1, https://matplotlib.org/) were used to generate figures and integrative HTML sequence viewers. Local DBs were generated by sqlite3 (https://docs.python.org/3/library/sqlite3.html), the Python interface to SQLite (version 3.33.0, https://www.sqlite.org/); remote DBs were generated by MySQL (version 8.0, https://www.mysql.com/). The entire project was developed using the PyCharm Community Edition (version 2019.2, https://www.jetbrains.com/pycharm/) integrated development environment. We integrated two sequence aligners, MUSCLE (version 3.8.31, https://www.drive5.com/muscle/) and Clustal Omega (version 1.2.3, http://www.clustal.org/omega/), for multiple sequence alignment, H1/H3 numbering alignment, fragment alignment and mutation identification [50, 51]. We also implemented an interface for users to visualize their sequences on 3D structures using PyMOL (version 2.3.2, https://pymol.org/) and UCSF Chimera (https://www.cgl.ucsf.edu/chimera/), to generate a maximum-likelihood tree using RAxML (version 8.0.0, https://raxml-ng.vital-it.ch/) and to visualize a phylogenetic tree using the phylotree.js library (http://phylotree.hyphy.org/) [23, 24, 52]. Sequence logos of selected sequences were generated by WebLogo (version 3.7.1, https://weblogo.berkeley.edu/) [22]. Landscapes of multiple sequence alignments were generated by html2canvas (https://html2canvas.hertzen.com/) and Python3. The codon optimization function is powered by DNA Chisel (version 3.2.6, https://github.com/Edinburgh-Genome-Foundry/DnaChisel). The crystal-structure-based HA numbering system was adopted from a public repository (https://github.com/bloomlab/HA_numbering).

Software and code availability

Librator is freely hosted online (https://wilsonimmunologylab.github.io/Librator/). Tutorials are available from a Wilson Lab GitHub page (https://wilsonimmunologylab.github.io/Librator/), and a .pdf format user guide is also available for downloading. The source code is also available from GitHub (https://github.com/WilsonImmunologyLab/Librator).

We provide executable versions of this software for two operating systems: Windows 10 and macOS. The macOS version of this software is compiled under macOS Mojave (version 10.14.6) and has been tested under macOS Mojave (version 10.14.6), macOS Catalina (version 10.15.2) and macOS Big Sur (version 11.2.3). The Windows 10 version of this software is compiled under Windows 10 Home (OS build 19042.867) and has been tested under the same system.

This Python-based software is also portable and can be compiled from source code under other systems (e.g. Ubuntu).

Key Points

  • Librator is an all-in-one program which enables rapid analysis of sequences, phylogenies and structures for key influenza viral antigens, including hemagglutinin (HA) and neuraminidase (NA)

  • The program features a graphical interface which is user friendly to biologists without extensive programming knowledge

  • Implementation of Librator reduces reagent costs associated with Gibson cloning of influenza virus proteins

  • Applications of Librator include the generation of chimeric/mosaic antigens; using Librator-based construction, we show that antigenicity can be readily transferred between HA molecules of H3 lineage virus, but not between H1 viruses

Supplementary Material

Supplementary_Figures_And_Tables_bbac028
SupplementaryData_bbac028

Acknowledgements

We would like to thank Dr. Jesse Bloom and Dr. Julianna Han for their assistance and suggestions for this project.

Lei Li is currently a senior bioinformatics analyst at the Drukier Institute for Children's Health, Weill Cornell Medicine, New York, NY, USA. His work focuses on developing computational algorithms for single cell multi-modal data processing and applying bioinformatics approaches to the study of B cell biology.

Siriruk Changrob is currently a postdoctoral researcher at the Drukier Institute for Children's Health, Weill Cornell Medicine, New York, NY, USA. She studies the novel influenza vaccine concept based on mosaic influenza proteins to protect against infection from most seasonal and pandemic influenza strains.

Yanbin Fu is currently a postdoctoral researcher at the Drukier Institute for Children's Health, Weill Cornell Medicine, New York, NY, USA. He studies design and development of a universal influenza vaccine that could induce broadly neutralizing antibodies.

Olivia Stovicek is a research technician at the Department of Medicine, University of Chicago, Chicago, IL, USA. She studies B cell biology which revolves around the specificity of expressed antibody molecules.

Jenna J. Guthmiller is currently a postdoctoral researcher at the University of Chicago. She is an incoming assistant professor at Department of Immunology & Microbiology, University of Colorado Anschutz Medical Campus. She studies the features of broadly neutralizing antibodies against influenza viruses and how vaccines can induce them.

Joshua JC McGrath is a postdoctoral researcher at the Drukier Institute for Children's Health, Weill Cornell Medicine, New York, NY, USA. He is working to engineer novel H3N2 influenza vaccine candidates, and is investigating the relationship between mucosal/systemic B cell responses.

Haley L. Dugan is currently a scientist at Adimab, LLC, Lebanon, NH, USA. Her work focuses on identifying antibody-based drugs that can protect against divergent influenza and SARS-CoV-2 viruses.

Christopher T. Stamper is currently a postdoctoral researcher at the Center for Infectious Medicine within the Department of Medicine Huddinge at the Karolinska Institute in Stockholm, Sweden. His research focuses on the role of ILCs and innate like T cells in human gut homeostasis and inflammatory bowel disease.

Nai-Ying Zheng is a research technician at the Drukier Institute for Children's Health, Weill Cornell Medicine, New York, NY, USA. She studies B cell biology which revolves around the specificity of expressed antibody molecules.

Min Huang is a research technician at the Department of Medicine, University of Chicago, Chicago, IL, USA. She studies B cell biology which revolves around the specificity of expressed antibody molecules.

Patrick C. Wilson is a professor at the Drukier Institute for Children's Health, Weill Cornell Medicine, New York, NY, USA. His research is focused on B cell biology which revolves around the specificity of expressed antibody molecules.

Contributor Information

Lei Li, Section of Rheumatology, Department of Medicine, University of Chicago, Chicago, IL 60637, USA; Gale and Ira Drukier Institute for Children's Health, Weill Cornell Medicine, New York, NY, 10065, USA.

Siriruk Changrob, Section of Rheumatology, Department of Medicine, University of Chicago, Chicago, IL 60637, USA; Gale and Ira Drukier Institute for Children's Health, Weill Cornell Medicine, New York, NY, 10065, USA.

Yanbin Fu, Section of Rheumatology, Department of Medicine, University of Chicago, Chicago, IL 60637, USA; Gale and Ira Drukier Institute for Children's Health, Weill Cornell Medicine, New York, NY, 10065, USA.

Olivia Stovicek, Section of Rheumatology, Department of Medicine, University of Chicago, Chicago, IL 60637, USA.

Jenna J Guthmiller, Section of Rheumatology, Department of Medicine, University of Chicago, Chicago, IL 60637, USA.

Joshua J C McGrath, Gale and Ira Drukier Institute for Children's Health, Weill Cornell Medicine, New York, NY, 10065, USA.

Haley L Dugan, Committee on Immunology, University of Chicago, Chicago, IL 60637, USA.

Christopher T Stamper, Committee on Immunology, University of Chicago, Chicago, IL 60637, USA.

Nai-Ying Zheng, Section of Rheumatology, Department of Medicine, University of Chicago, Chicago, IL 60637, USA; Gale and Ira Drukier Institute for Children's Health, Weill Cornell Medicine, New York, NY, 10065, USA.

Min Huang, Section of Rheumatology, Department of Medicine, University of Chicago, Chicago, IL 60637, USA.

Patrick C Wilson, Section of Rheumatology, Department of Medicine, University of Chicago, Chicago, IL 60637, USA; Committee on Immunology, University of Chicago, Chicago, IL 60637, USA; Gale and Ira Drukier Institute for Children's Health, Weill Cornell Medicine, New York, NY, 10065, USA.

Author contributions

L.L. designed the model, developed the software, performed computational analyses and wrote the manuscript. S.C. and Y.F. tested software, improved software design, designed and performed experiment validation, and wrote the manuscript. O.S. and J.J.G. tested software, improved software design and revised the manuscript. J.M. revised the manuscript. H.L.D. and C.T.S. tested software and improved software design. N.Z. and M.H. performed experimental validations. P.C.W. initiated and supervised the work, designed the model, developed the software and wrote the manuscript.

Funding

National Institute of Allergy and Infectious Disease (NIAID); National Institutes of Health (NIH) grant numbers U19AI082724 (P.C.W.), U19AI109946 (P.C.W.), U19AI057266 (P.C.W.) and the NIAID Centers of Excellence for Influenza Research and Surveillance (CEIRS) grant numbers HHSN272201400005C (P.C.W.).

References

  • 1. Carter  DM, et al.  Design and characterization of a computationally optimized broadly reactive hemagglutinin vaccine for H1N1 influenza viruses. J Virol  2016;90:4720–34. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Flannery  B, et al.  Interim estimates of 2017–18 seasonal influenza vaccine effectiveness—United States, February 2018. Morb Mortal Wkly Rep  2018;67:180. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Xie  H, et al.  H3N2 mismatch of 2014–15 northern hemisphere influenza vaccines and head-to-head comparison between human and ferret antisera derived antigenic maps. Sci Rep  2015;5:1–10. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Chiu  C, et al.  Cross-reactive humoral responses to influenza and their implications for a universal vaccine. Ann N Y Acad Sci  2013;1283:13–21. [DOI] [PubMed] [Google Scholar]
  • 5. Hagan  T, et al.  Antibiotics-driven gut microbiome perturbation alters immunity to vaccines in humans. Cell  2019;178:1313–1328. e1313. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Henry  C, et al.  Monoclonal antibody responses after recombinant hemagglutinin vaccine versus subunit inactivated influenza virus vaccine: a comparative study. J Virol  2019;93:e01150–19. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Staneková  Z, Varečková  E. Conserved epitopes of influenza a virus inducing protective immunity and their prospects for universal vaccine development. Virol J  2010;7:1–13. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Krammer  F, Palese  P. Universal influenza virus vaccines that target the conserved hemagglutinin stalk and conserved sites in the head domain. J Infect Dis  2019;219:S62–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Sun  W, et al.  Development of influenza B universal vaccine candidates using the “mosaic” hemagglutinin approach. J Virol  2019;93e00333–19. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Kosikova  M, et al.  Imprinting of repeated influenza a/H3 exposures on antibody quantity and antibody quality: implications for seasonal vaccine strain selection and vaccine performance. Clin Infect Dis  2018;67:1523–32. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Knight  M, Changrob  S, Li  L, et al.  Imprinting, immunodominance, and other impediments to generating broad influenza immunity. Immunol Rev  2020;296:191–204. [DOI] [PubMed] [Google Scholar]
  • 12. Tamura  K, Stecher  G, Kumar  S. MEGA11: molecular evolutionary genetics analysis version 11. Mol Biol Evol  2021;38:3022–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Hall  T. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 1999;41:95–8. [Google Scholar]
  • 14. Deem  MW, Pan  K. The epitope regions of H1-subtype influenza a, with application to vaccine efficacy. Protein Eng Des Sel  2009;22:543–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Hai  R, et al.  Influenza viruses expressing chimeric hemagglutinins: globular head and stalk domains derived from different subtypes. J Virol  2012;86:5774–81. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Steel  J, et al.  Influenza virus vaccine based on the conserved hemagglutinin stalk domain. MBio  2010;1:e00018–10. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Burke  DF, Smith  DJ. A recommended numbering scheme for influenza a HA subtypes. PLoS One  2014;9:e112302. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Kirkpatrick  E, Qiu  X, Wilson  PC, et al.  The influenza virus hemagglutinin head evolves faster than the stalk domain. Sci Rep  2018;8:1–14. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Gibson  DG, et al.  Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods  2009;6:343–5. [DOI] [PubMed] [Google Scholar]
  • 20. Medina  RA, et al.  Glycosylations in the globular head of the hemagglutinin protein modulate the virulence and antigenic properties of the H1N1 influenza viruses. Sci Transl Med  2013;5:187ra170–0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Li  L, et al.  Multi-task learning sparse group lasso: a method for quantifying antigenicity of influenza a (H1N1) virus using mutations and variations in glycosylation of hemagglutinin. BMC Bioinformatics  2020;21:1–22. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Crooks  GE, Hon  G, Chandonia  J-M, et al.  WebLogo: a sequence logo generator. Genome Res  2004;14:1188–90. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Schrödinger  L. The PyMOL molecular graphics system, version 2.0 Schrödinger, LLC (2017). Google Scholar There is no corresponding record for this reference.
  • 24. Pettersen  EF, et al.  UCSF chimera—a visualization system for exploratory research and analysis. J Comput Chem  2004;25:1605–12. [DOI] [PubMed] [Google Scholar]
  • 25. Abola  EE, Bernstein  FC, Koetzle  TF. The protein data bank. Nucleic acids research  2000;28:235–42. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Weis  WI, Brunger  AT, Skehel  JJ, et al.  Refinement of the influenza virus hemagglutinin by simulated annealing. J Mol Biol  1990;212:737–61. 10.1016/0022-2836(90)90234-d. [DOI] [PubMed] [Google Scholar]
  • 27. Zhang  W, et al.  Molecular basis of the receptor binding specificity switch of the hemagglutinins from both the 1918 and 2009 pandemic influenza a viruses by a D225G substitution. J Virol  2013;87:5949–58. 10.1128/jvi.00545-13. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. McDonald  NJ, Smith  CB, Cox  NJ. Antigenic drift in the evolution of H1N1 influenza a viruses resulting from deletion of a single amino acid in the haemagglutinin gene. J Gen Virol  2007;88:3209–13. [DOI] [PubMed] [Google Scholar]
  • 29. Krammer  F, et al.  A carboxy-terminal trimerization domain stabilizes conformational epitopes on the stalk domain of soluble recombinant hemagglutinin substrates. PLoS One  2012;7:e43603. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Benton  DJ, et al.  Influenza hemagglutinin membrane anchor. Proc Natl Acad Sci U S A  2018;115:10112–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Whittle  JR, et al.  Flow cytometry reveals that H5N1 vaccination elicits cross-reactive stem-directed antibodies from multiple Ig heavy-chain lineages. J Virol  2014;88:4047–57. 10.1128/JVI.03422-13. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Setliff  I, et al.  High-throughput mapping of B cell receptor sequences to antigen specificity. Cell  2019;179:1636–1646 e1615. 10.1016/j.cell.2019.11.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Dugan  HL, et al.  Profiling B cell immunodominance after SARS-CoV-2 infection reveals antibody evolution to non-neutralizing viral targets. Immunity  2021;54:1290–1303. e1297. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. Harvey  WT, et al.  Identification of low-and high-impact hemagglutinin amino acid substitutions that drive antigenic drift of influenza a (H1N1) viruses. PLoS Pathog  2016;12:e1005526. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Li  L, DeLiberto  TJ, Killian  ML, et al.  Evolutionary pathway for the 2017 emergence of a novel highly pathogenic avian influenza a (H7N9) virus among domestic poultry in Tennessee, United States. Virology  2018;525:32–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Dugan  HL, et al.  Preexisting immunity shapes distinct antibody landscapes after influenza virus infection and vaccination in humans. Sci Transl Med  2020;12:eabd3601. 10.1126/scitranslmed.abd3601. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Popova  L, et al.  Immunodominance of antigenic site B over site a of hemagglutinin of recent H3N2 influenza viruses. PLoS One  2012;7:e41895. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Guthmiller  JJ, et al.  First exposure to the pandemic H1N1 virus induced broadly neutralizing antibodies targeting hemagglutinin head epitopes. Sci Transl Med  2021;13:eabg4535. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. Guthmiller  JJ, Utset  HA, Wilson  PC. B cell responses against influenza viruses: short-lived humoral immunity against a life-long threat. Viruses  2021;13:965. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Guthmiller  JJ, et al.  Broadly neutralizing antibodies target a hemagglutinin anchor epitope. Nature  2021;326:1–10. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Henry  C, et al.  Monoclonal antibody responses after recombinant hemagglutinin vaccine versus subunit inactivated influenza virus vaccine: a comparative study. J Virol  2019;93:e01150–19. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Zhu  X, et al.  Structural basis of protection against H7N9 influenza virus by human anti-N9 neuraminidase antibodies. Cell Host Microbe  2019;26:729–738. e724. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Gilchuk  IM, et al.  Influenza H7N9 virus neuraminidase-specific human monoclonal antibodies inhibit viral egress and protect from lethal influenza infection in mice. Cell Host Microbe  2019;26:715–728.e8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44. Chen  Y-Q, et al.  Influenza infection in humans induces broadly cross-reactive and protective neuraminidase-reactive antibodies. Cell  2018;173:417–429.e10. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Bao  Y, et al.  The influenza virus resource at the National Center for biotechnology information. J Virol  2008;82:596–601. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Bogner  P, Capua  I, Lipman  DJ, et al.  A global initiative on sharing avian flu data. Nature  2006;442:981–1. [Google Scholar]
  • 47. Smith  RF, Smmith  TF. Pattern-induced multi-sequence alignment (PIMA) algorithm employing secondary structure-dependent gap penalties for use in comparative protein modelling. Protein Eng Des Select  1992;5:35–41. [DOI] [PubMed] [Google Scholar]
  • 48. Sun  H, et al.  Using sequence data to infer the antigenicity of influenza virus. MBio  2013;4:e00230–13. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. Guthmiller  JJ, Dugan  HL, Neu  KE, et al.  Human Monoclonal Antibodies. New York, NY: Humana Press, 2019, 109–45. [DOI] [PubMed] [Google Scholar]
  • 50. Edgar  RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res  2004;32:1792–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51. Sievers  F, et al.  Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal omega. Mol Syst Biol  2011;7:539. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52. Stamatakis  A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics  2014;30:1312–3. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Supplementary_Figures_And_Tables_bbac028
SupplementaryData_bbac028

Articles from Briefings in Bioinformatics are provided here courtesy of Oxford University Press
