The Journal of Biological Chemistry. 2020 Feb 6;295(12):3826–3836. doi: 10.1074/jbc.RA119.011258

Structure-based group A streptococcal vaccine design: Helical wheel homology predicts antibody cross-reactivity among streptococcal M protein–derived peptides

Michelle P Aranha, Thomas A Penfound, Jay A Spencer, Rupesh Agarwal, Jerome Baudry, James B Dale, Jeremy C Smith
PMCID: PMC7086045  PMID: 32029479

Abstract

Group A streptococcus (Strep A) surface M protein, an α-helical coiled-coil dimer, is a vaccine target and a major determinant of streptococcal virulence. The sequence-variable N-terminal region of the M protein defines the M type and also contains epitopes that promote opsonophagocytic killing of streptococci. Recent studies have reported considerable cross-reactivity among different M types, suggesting the prospect of identifying cross-protective epitopes that would constitute a broadly protective multivalent vaccine against Strep A isolates. Here, we have used a combination of immunological assays, structural biology, and cheminformatics to construct a recombinant M protein–based vaccine that included six Strep A M peptides that were predicted to elicit antisera that would cross-react with an additional 15 nonvaccine M types of Strep A. Rabbit antisera against this recombinant vaccine cross-reacted with 10 of the 15 nonvaccine M peptides. Two of the five nonvaccine M peptides that did not cross-react shared high sequence identity (≥50%) with the vaccine peptides, implying that high sequence identity alone was insufficient for cross-reactivity among the M peptides. Additional structural analyses revealed that the sequence identity at corresponding polar helical-wheel heptad sites between vaccine and nonvaccine peptides accurately distinguishes cross-reactive from non–cross-reactive peptides. On the basis of these observations, we developed a scoring algorithm based on the sequence identity at polar heptad sites. When applied to all epidemiologically important M types, this algorithm should enable the selection of a minimal number of M peptide–based vaccine candidates that elicit broadly protective immunity against Strep A.

Keywords: Streptococcus pyogenes (S. pyogenes), structural biology, bioinformatics, vaccine development, humoral response, vaccine, coiled coil, cross-reactive epitope, heptad site identity, M proteins, multivalent vaccine

Introduction

Streptococcus pyogenes, or group A streptococcus (Strep A), causes noninvasive infections such as pharyngitis, impetigo, and cellulitis, as well as invasive and life-threatening infections such as bacteremia, streptococcal toxic shock syndrome, and necrotizing fasciitis (1). Poststreptococcal sequelae that may follow Strep A infections include acute rheumatic fever, rheumatic heart disease, and glomerulonephritis (2). Rheumatic heart disease and invasive Strep A infections are estimated to cause over 500,000 deaths/year, with the vast majority of the disease burden in low- and middle-income countries where the mortality rate is also disproportionately high (3).

To reduce the health burden and the economic impact of these infections, efforts to develop an effective vaccine against Strep A have been ongoing (4). There are a number of potential vaccine candidates (4–8), and in particular, there has been considerable progress in the preclinical and clinical development of multivalent M protein–based vaccines (9–13). These vaccines have been formulated using recombinant fusion proteins containing multiple peptides from the variable N terminus of epidemiologically prevalent M types of Strep A (10, 11). One obstacle to this approach has been the perception that protection against Strep A infections is type-specific and that the diversity of M types (>200) may prevent the development of broadly efficacious M protein–based vaccines. However, there is recent evidence that N-terminal M peptides evoke antibodies that cross-react with heterologous M types of Strep A (11), and natural infection with Strep A elicits cross-opsonic antibodies (14). This is supported by the observation that the majority of M proteins are members of structurally and functionally related clusters, indicating that immunity may be a combination of cluster-specific and type-specific antibody responses (15).

We previously showed that the relatedness of M peptides within a single M cluster could be exploited by using a computational structure-based approach to select five N-terminal M peptides that elicited antibodies that reacted with all 17 M peptides in the cluster and opsonized 15 of 17 M types of Strep A (16). In the present study we examined 117 M proteins, responsible for 92% of Strep A infections globally (15, 17). The N-terminal hypervariable region (HVR) of the M protein (residues ∼1–50) contains epitopes that elicit antibodies with the greatest bactericidal activity and is least likely to elicit antibodies that cross-react with host tissues (4, 9–11, 18). Therefore, we confine our analysis to residues 1–50 and have redefined the sequence-based M clusters (15) to include seven N-terminal clusters (NTCs) containing all 117 M peptides. In this initial study, we focused on a cluster of 21 peptides (NTC6) that shared significant sequence similarity. Using a structure-based computational approach, we designed an NTC6 vaccine that elicited antibodies in rabbits that cross-reacted with 10 of 15 nonvaccine peptides. A posteriori analysis of the nonreactive versus cross-reactive peptides revealed that sequence identity within the polar heptad sites of the predicted α-helical domains within the N-terminal region is a strong predictor of cross-reactivity. The application of this new approach to the structure-based design of multivalent vaccines may result in more broadly cross-reactive and efficacious M protein vaccines.

Results

Sequence-based clustering

The 117 M types were divided into M peptide NTCs (Fig. 1) by constructing a phylogenetic sequence-based tree of the N-terminal 50 amino acids of the mature proteins that define the HVR (Geneious, version 9.1.6). The seven N-terminal clusters were designated based on the calculated common branches. The overall phylogenetic relationships of the N-terminal peptides bear some resemblance to the previous description of M clusters based on the whole M sequences (15). In this study we limited the analysis to the NTC6 cluster, which contains 21 different M types that collectively accounted for 33% of all Strep A isolates from children with pharyngitis in North America (19), many of which are prevalent globally (17).

Figure 1.

N-terminal (residues 1–50) sequence–based clusters of 117 M peptides.

Subclusters of immunologically similar M peptides

A functional matrix of antibody binding and cross-reactivity among the NTC6 peptides, which describes the inhibition of antibody binding to 12 NTC6 M peptides by all 21 peptides in the cluster, was developed by performing ELISA inhibition experiments. The relational matrix of experimentally obtained antibody binding between NTC6 peptides (Table S1) was subclustered using k means into seven immunologically related peptide groups (Fig. 2). To resolve the optimal number of clusters, k was varied from 2 to 9, and the maximal average silhouette coefficient was obtained for k = 7 (Fig. S1). The silhouette coefficient is a measure of the quality of a cluster's structure; in other words, it indicates how closely related the objects in a cluster are and how distinct or well separated a cluster is from other clusters (20). Clusters with high silhouette coefficients are well separated and were considered to contain M peptides more likely to cross-react than clusters with low silhouette coefficients. For example, from Fig. 2, M84 and M89 belonging to FC2 (s = 0.46) would be predicted to cross-react with greater probability than M1, M9, and M227 belonging to FC1 (s = 0.19).
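The silhouette coefficient described above can be sketched in a few lines. This is a minimal illustration rather than the authors' pipeline: it assumes Euclidean distance, and the points and labels are hypothetical stand-ins for the 12-dimensional binding profiles.

```python
from math import dist

def silhouette_scores(points, labels):
    """Silhouette coefficient s = (b - a) / max(a, b) for every point.

    a: mean distance to the other members of the point's own cluster.
    b: smallest mean distance to the members of any other cluster.
    Requires at least two distinct labels.
    """
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:  # singleton cluster: conventionally scored 0
            scores.append(0.0)
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return scores

def mean_silhouette(points, labels):
    scores = silhouette_scores(points, labels)
    return sum(scores) / len(scores)

# Hypothetical 2-D points; the study clustered 12-dimensional binding profiles.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
tight = [0, 0, 1, 1]   # groups nearby points together
loose = [0, 1, 0, 1]   # groups distant points together
print(mean_silhouette(points, tight) > mean_silhouette(points, loose))  # True
```

Choosing k then amounts to repeating the clustering for each candidate k and keeping the value with the highest mean silhouette.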

Figure 2.

Antibody-binding function-based clusters obtained from k-means clustering of the 12-dimensional experimental antibody-binding functional matrix. FC refers to function-based cluster. s refers to the silhouette coefficient and is given for each individual cluster.

Clusters of structurally and immunologically similar M peptides

The structures of the 21 M peptides were calculated using the de novo computational framework, PEP-FOLD3 (21) (Fig. S2A). To identify structural features that are most relevant to correctly predicting antibody binding, we used the 44 structure-based protein descriptors from the Molecular Operating Environment program (22) as independent variables to describe the PEP-FOLD3 models and the columns of the relational matrix as the dependent variables in a multiple regression analysis with feature ranking. The feature ranking resulted in a subset of 20 top-ranked descriptors from the supervised regression approach that were then used in k-means clustering of the NTC6 M peptides to obtain experimentally informed structure-based clusters (Fig. 3). An overlay of the models that have been grouped together for each cluster shows that models belonging to the same cluster share greater structural similarity than models belonging to different clusters (Fig. S2B).

Figure 3.

k-means clusters based on 20 top-ranked features identified from multiple regression using structure-based computational cheminformatic features and experimental antibody-binding data. The tick size on the bottom axis represents one 3D PEP-FOLD3 model (five were generated for each sequence). For example, cluster 5 contains two models of M112, five models of M102, and three models of M77. s refers to the silhouette coefficient and is reported for each cluster.

There was considerable overlap between the experimentally informed clusters and antibody-binding function-based clusters (Rand index = 0.77). FC3, FC4, FC6, and FC7 in Fig. 2 corresponded, respectively, to experimentally informed clusters 1, 5, 4, and 7 in Fig. 3. This demonstrates that the experimentally informed structure-based top 20 descriptors can adequately group together different M peptides with similar antibody-binding function.
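For readers unfamiliar with the Rand index, the sketch below shows the standard pairwise definition: the fraction of item pairs on which two clusterings agree. The label vectors are hypothetical; how the authors mapped five models per peptide onto peptide-level labels is not stated, so this only illustrates the metric.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of item pairs treated the same way by both clusterings.

    A pair counts as an agreement when both clusterings put the two items
    together, or both put them apart.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    agreements = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        agreements += same_in_a == same_in_b
        total += 1
    return agreements / total

# Hypothetical cluster labels for five peptides under two clusterings.
function_based = [1, 1, 2, 2, 3]
structure_based = [1, 1, 2, 3, 3]
print(rand_index(function_based, structure_based))  # 0.8
```

A value of 1 means the two clusterings are identical up to relabeling.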

Next, the experimentally informed clusters (Fig. 3) were considered together with the functional data (Fig. 2) to select a minimal number of peptides predicted to elicit broad cross-reactivity against the remaining 15 M peptides in NTC6. The s coefficient was used as the main criterion in this selection. However, in Figs. 2 and 3, for some of the clusters the values of s do not indicate strong clustering of data. Therefore, additional information was taken into account in selecting the final vaccine candidates. In the case of the experimentally informed clusters, if three or more PEP-FOLD3 models of different M types each shared a cluster, they were considered to be more likely to cross-react than those with fewer than three models in the same cluster. As an example, in Fig. 3 we observed that cluster 6 contained only one model each of M114, M112, M73, M50, and M49, and these M types were considered unlikely to be immunologically related. In contrast, M1, M238, and M239 had all five models placed in cluster 4 and were considered likely to cross-react. Based on these considerations and also taking into account epidemiological prevalence, six M types were predicted to elicit cross-reactive antibodies against the remaining 15 M peptides in NTC6 (Table 1). The vaccine construct is shown in Fig. 4A.
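The model-count rule can be written as a small filter. The sketch below is one reading of that rule (each M type needs at least three of its five models in the cluster), and it uses only the model counts quoted in the text and in the Fig. 3 legend; the full cluster contents are not reproduced here.

```python
MIN_MODELS = 3  # threshold stated in the text

def likely_cross_reactive_groups(model_counts, min_models=MIN_MODELS):
    """Map each cluster to the M types with at least `min_models` models in it.

    model_counts: {cluster_id: {m_type: number_of_models_in_cluster}}
    Clusters with fewer than two qualifying M types are dropped, because
    cross-reactivity needs at least one pair of M types.
    """
    groups = {}
    for cluster, counts in model_counts.items():
        members = sorted(m for m, n in counts.items() if n >= min_models)
        if len(members) >= 2:
            groups[cluster] = members
    return groups

# Counts quoted in the text and the Fig. 3 legend (partial, for illustration).
fig3_examples = {
    4: {"M1": 5, "M238": 5, "M239": 5},
    5: {"M112": 2, "M102": 5, "M77": 3},
    6: {"M114": 1, "M112": 1, "M73": 1, "M50": 1, "M49": 1},
}
print(likely_cross_reactive_groups(fig3_examples))
# {4: ['M1', 'M238', 'M239'], 5: ['M102', 'M77']}
```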

Table 1.

Selected vaccine M peptides and predicted antibody cross-reactivity against nonvaccine M peptides

FC stands for antibody binding function-based cluster described in Fig. 2, and SC stands for experimentally informed structure-based clusters using the 20 top-ranked protein features described in Fig. 3.

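| Vaccine type | Nonvaccine type predicted to be covered | Coverage predicted based on FC or SC | M types covered |
| M89 | M84 | FC | 5 |
| | M232 | SC | |
| | M175 | SC | |
| | M227 | SC | |
| M1 | M238 | FC/SC | 3 |
| | M239 | SC | |
| M2 | M114 | SC/FC | 3 |
| | M124 | FC | |
| M77 | M102 | SC/FC | 4 |
| | M112 | SC | |
| | M15 | SC | |
| M118 + M73 | M183 | SC | 6 |
| | M49 | SC | |
| | M9 | SC | |
| | M50 | SC | |
| Total number of M types predicted to be covered | | | 21 |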

Figure 4.

A, schematic representation of the hexavalent NTC6 recombinant experimental vaccine containing the N-terminal 50 amino acids of the indicated M proteins linked in tandem without spacers. B, NTC6 peptide-specific and cross-reactive antibodies evoked in rabbits by the NTC6 vaccine. Antibody titers from the immune sera of three rabbits are shown. For this analysis, the nonvaccine peptides were considered to be cross-reactive when two of the three immune sera displayed antibody titers of at least 800, which is an 8-fold increase over preimmune antibody levels. As a group, the marginally cross-reactive peptides resulted in geometric mean titers <700 (range 126–635), whereas the cross-reactive peptides resulted in geometric mean titers of >1,200 (range 1,270–25,600).

Hexavalent NTC6 vaccine evoked antibodies against vaccine and nonvaccine peptides

In three rabbits immunized with the hexavalent NTC6 vaccine, immune sera contained significant levels of antibodies against all vaccine peptides as well as significant levels of antibodies against 10 of the 15 nonvaccine peptides (Fig. 4B). All preimmune sera resulted in antibody titers of 100 against all 21 peptides. For this analysis, the nonvaccine peptides were considered to be cross-reactive when two of the three immune sera displayed antibody titers of at least 800, which is an 8-fold increase over preimmune antibody levels. Using these criteria, the NTC6 vaccine antisera were only marginally cross-reactive with five nonvaccine peptides: M15, M50, M112, M183, and M238 (Fig. 4B).
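The cross-reactivity criterion stated above (at least two of three immune sera with titers of at least 800) and the geometric mean titer used in Fig. 4B can be expressed directly. The titers in this sketch are hypothetical; only the cutoff and the two-of-three rule come from the text.

```python
from statistics import geometric_mean

TITER_CUTOFF = 800       # 8-fold over the preimmune titer of 100
MIN_POSITIVE_SERA = 2    # two of the three immune sera

def is_cross_reactive(titers):
    """Apply the two-of-three rule to the titers of one nonvaccine peptide."""
    positives = sum(t >= TITER_CUTOFF for t in titers)
    return positives >= MIN_POSITIVE_SERA

# Hypothetical titers from three rabbits for two nonvaccine peptides.
peptide_x = [1600, 800, 400]
peptide_y = [800, 400, 200]

print(is_cross_reactive(peptide_x))          # True
print(is_cross_reactive(peptide_y))          # False
print(round(geometric_mean(peptide_x)))      # 800
```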

Cross-reactivity and sequence identity were correlated but not without exceptions

Given the experimental results in Fig. 4B, it was of interest to conduct an a posteriori analysis of factors that distinguish antibody cross-reactivity with nonvaccine peptides. To understand the extent of correlation between sequence identity and the antibody cross-reactivity, we performed a pairwise sequence alignment between the 21 M peptides and calculated their mutual sequence identities using the EMBOSS Needle program (23) (Table 2). The maximum pairwise sequence identities between nonvaccine peptides and vaccine peptides extracted from Table 2 are listed in Table 3.
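Once two sequences have been globally aligned, percent identity is a simple count. The sketch below assumes the alignment already exists (it does not perform the Needleman-Wunsch alignment itself) and assumes identity is reported as identical positions divided by alignment length, gaps included. The two aligned strings are hypothetical, not real M peptide sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns holding the same residue in both rows.

    Columns containing a gap never count as identical, but they do count
    toward the alignment length in the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        a == b and a != gap for a, b in zip(aligned_a, aligned_b)
    )
    return 100.0 * matches / len(aligned_a)

# Hypothetical aligned fragments (not taken from Table S3).
row_1 = "NGDG-NPREV"
row_2 = "NGDGKNPKEV"
print(percent_identity(row_1, row_2))  # 80.0
```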

Table 2.

The pairwise sequence identity matrix between 21 M types

The asterisk denotes a vaccine M type. The highest sequence identity that an M type can share with a vaccine M type is highlighted with a gray background. The five marginally cross-reactive M types are shown with a black background.


Table 3.

Maximum pairwise sequence identity between NTC6 nonvaccine peptides and vaccine peptides

| Nonvaccine peptide | Maximum pairwise sequence identity (%) | Corresponding vaccine peptide |
| M84 | 73 | M89 |
| M124 | 71 | M2 |
| M114 | 67 | M73 |
| M175 | 67 | M2 |
| M232 | 62 | M89 |
| M49 | 57 | M118 |
| M239 | 56 | M1 |
| M102 | 55 | M77 |
| M238 (a) | 54 | M1 |
| M112 (a) | 50 | M77 |
| M9 | 47 | M118 |
| M227 | 46 | M1 |
| M183 (a) | 46 | M118 |
| M15 (a) | 35 | M118 |
| M50 (a) | 25 | M2 |

(a) The M types against which marginal cross-reactivity was observed.

Five nonvaccine M types (M84, M124, M114, M175, and M232) have pairwise sequence identities greater than 60% with at least one vaccine M peptide and exhibited cross-reactivity with the antibodies elicited by the vaccine. The nonvaccine M types whose highest pairwise sequence identity with any vaccine peptide was low (<40%) were M15 and M50, and these M peptides displayed marginal cross-reactivity with antibodies elicited by the vaccine. Thus, there is a moderate positive correlation between the cross-reactivity and the pairwise sequence identity between the vaccine peptides and the nonvaccine peptides, as indicated by the Spearman correlation (ρ = 0.56, p value = 0.05) (Fig. 5). However, when nonvaccine peptides shared between 40 and 60% sequence identity with any of the vaccine peptides, sequence identity could not be used to infer whether the antibodies raised by the vaccine peptide would cross-react with the nonvaccine peptide. For instance, M238, M112, and M183 share 45–60% sequence identities with vaccine peptides and were only marginally cross-reactive with the vaccine antisera. On the other hand, nonvaccine peptides M9, M49, M102, M227, and M239 share a similar degree of sequence identity with at least one of the vaccine M peptides and did cross-react with the NTC6 vaccine antisera. Thus, although significant sequence identity is a useful consideration for antibody cross-reactivity, it alone cannot reliably predict cross-reactivity.

Figure 5.

Maximum sequence identity moderately correlates with cross-reactive immunogenicity of NTC6.1 vaccine. The range of Spearman rank correlation is between −1 and +1, with +1 indicating the Y variable as a perfectly increasing monotone of the X variable and −1 indicating the Y variable as a perfectly decreasing monotone of the X variable. A Spearman correlation of 0 signifies no correlation between the X and Y variables.
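The behavior of the Spearman coefficient at its extremes can be checked with a short, dependency-free sketch. It computes the Pearson correlation of average ranks, which is the usual tie-aware definition; the input values are hypothetical and are not the titers plotted in the figure.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman rho as the Pearson correlation of the rank vectors.

    Undefined (division by zero) if either variable is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical data: any monotone increasing relation gives rho = +1,
# even when it is far from linear.
identity = [25, 35, 46, 54]
print(round(spearman(identity, [2, 4, 8, 16]), 6))   # 1.0
print(round(spearman(identity, [9, 7, 5, 1]), 6))    # -1.0
```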

Coiled-coil heptad repeat sequence identity

Although overall sequence identity was correlated with cross-reactivity, one might expect the three-dimensional structure of the antigen in vivo to also be of importance. The monomer structures predicted by PEP-FOLD3 have extensive helical content, and in many cases the peptide was predicted to fold on itself in a manner resembling coiled-coil structures. The α-helical coiled-coil structure is also evidenced by available crystal structures (24, 25) of M1, M2, M22, M28, and M49. Therefore in a further analysis, we assume that this coiled-coil extends into the N terminus. Heptad repeats form coiled coils, and the positions in the heptad repeat are labeled a–g. The core-forming positions of the coiled-coil (a and d) are usually occupied by hydrophobic residues whereas the remaining, solvent-exposed positions (b, c, e, f, and g) are dominated by hydrophilic residues.
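The heptad labeling described above can be illustrated with a toy register assignment. This sketch assumes an uninterrupted repeat with a known starting position, whereas the study used the position assignments from MARCOIL; the sequence is a hypothetical idealized coiled coil, not an M peptide.

```python
HEPTAD = "abcdefg"
CORE_POSITIONS = {"a", "d"}  # usually hydrophobic, buried in the coiled coil

def assign_register(sequence, first_position="a"):
    """Label each residue with a heptad position, cycling a to g.

    Assumes no stutters or skips in the repeat (a simplification).
    """
    start = HEPTAD.index(first_position)
    return [
        (residue, HEPTAD[(start + i) % 7])
        for i, residue in enumerate(sequence)
    ]

def residues_by_position(sequence, first_position="a"):
    """Collect the residues found at each heptad position."""
    groups = {position: "" for position in HEPTAD}
    for residue, position in assign_register(sequence, first_position):
        groups[position] += residue
    return groups

# Hypothetical two-heptad sequence with leucine at the core positions.
toy = "LKELEDKLEELKSK"
groups = residues_by_position(toy)
print(groups["a"], groups["d"])   # LL LL
print({p: r for p, r in groups.items() if p not in CORE_POSITIONS})
```

Grouping residues this way is what makes a per-position comparison between two peptides possible.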

The probability of shared epitopes between vaccine and nonvaccine M types increases with increased sequence identity. Therefore, instead of simply comparing overall sequence identity, we considered only the region within the N-terminal 1–50 residues that is predicted to be significantly coiled-coil (MARCOIL (26) assigned coiled-coil probability ≥ 2%) and calculated the sequence identity between each of the corresponding heptad sites of vaccine and nonvaccine M types. The heptad repeat projected on a helical wheel generated using DRAWCOIL 1.0 (27) for M types that share high overall sequence identity with vaccine type M1 (>40%) is shown in Fig. 6.

Figure 6.

MARCOIL heptad position assignment of the N-terminal regions of M1, M238, M227, and M239 projected onto a helical wheel generated using DRAWCOIL 1.0. Only heptad repeat regions with probability percentages of >2% were considered for helical wheel representation. The view is from the N-terminal region to the C-terminal region. Heptad repeat positions are labeled a–g. The following color scheme is followed: polar positive, blue; polar negative, red; polar neutral, orange; and nonpolar aliphatic, gray.

Empirical scoring scheme for predicting cross-reactivity

We developed an empirical scoring scheme that penalizes low sequence identity and rewards high pairwise sequence identity at the corresponding heptad sites between the vaccine and nonvaccine M types. Pairwise alignments of the vaccine and nonvaccine peptides at the heptad positions were performed. In calculating the empirical pair score between vaccine and nonvaccine M types, we left out the hydrophobic positions (a and d) of the heptad. This was done in part because little sequence variation is found at the hydrophobic core sites of the heptad (a and d) that are restricted in terms of the types of residue that can occupy those positions, favoring hydrophobic residues such as leucine, isoleucine, valine, and alanine. Heptad positions a and d are therefore conserved among many M types and do not serve as good discriminants between cross-reactive and non–cross-reactive types. Most of the NTC6 M types are leucine zippers, i.e. they contain an abundance of leucines at the d site. Another reason for not taking a and d into account is that on the surface of the bacterium, and possibly in the vaccine, these residues are buried and are presumably least accessible for antibody contact and recognition.

For the remaining polar heptad positions, the following scores were given when comparing vaccine and nonvaccine peptides: sequence identity < 20, score −1.0; 20 ≤ sequence identity < 30, score 1.0; 30 ≤ sequence identity < 40, score 1.5; 40 ≤ sequence identity < 50, score 2.0; 50 ≤ sequence identity < 60, score 3.0; and sequence identity ≥ 60, score 3.5. An empirical score was then calculated by summing up the scores at each of the polar heptad positions. As an example, the pairwise sequence identity of nonvaccine M238 with vaccine peptides at each of the heptad positions is shown in Table 4. The empirical pairwise score for M238 with vaccine peptide M1 is thus calculated: (2 × 1) + (1 × 1.5) + (0 × 2) + (0 × 3) + (1 × 3.5) + (1 × −1) = 6.

Table 4.

Example of heptad position identity scoring between nonvaccine and vaccine M types

The nonvaccine type was M238. The vaccine types were M1, M2, M77, M118, M73, and M89.

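| Heptad position (sequence identity, %) | M238–M1 | M238–M2 | M238–M73 | M238–M77 | M238–M89 | M238–M118 |
| a | 50 | 10 | 10 | 29 | 0 | 7 |
| d | 43 | 42.9 | 43 | 38 | 25 | 29 |
| b | 67 | 11.1 | 9 | 29 | 13 | 9 |
| c | 33 | 11.1 | 0 | 11 | 9 | 14 |
| e | 29 | 9.1 | 22 | 14 | 11 | 8 |
| f | 14 | 0 | 13 | 8 | 8 | 9 |
| g | 25 | 11.1 | 11 | 25 | 43 | 29 |

| Sequence identity range (number of polar heptad sites within the range) | M238–M1 | M238–M2 | M238–M73 | M238–M77 | M238–M89 | M238–M118 |
| ≥20 and <30 | 2 | 0 | 1 | 2 | 0 | 1 |
| ≥30 and <40 | 1 | 0 | 0 | 0 | 0 | 0 |
| ≥40 and <50 | 0 | 0 | 0 | 0 | 1 | 0 |
| ≥50 and <60 | 0 | 0 | 0 | 0 | 0 | 0 |
| ≥60 | 1 | 0 | 0 | 0 | 0 | 0 |
| <20 | 1 | 5 | 4 | 3 | 4 | 4 |
| Empirical pairwise score | 6 | −5 | −3 | −1 | −2 | −3 |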
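The scoring scheme can be reproduced from the per-position identities in Table 4. The sketch below encodes the score bands given in the text and recomputes the bottom row of the table; the function and variable names are illustrative, not taken from the authors' code.

```python
POLAR_POSITIONS = ("b", "c", "e", "f", "g")  # core positions a and d are ignored

def site_score(identity_percent):
    """Score one polar heptad position from its sequence identity (%)."""
    if identity_percent < 20:
        return -1.0
    if identity_percent < 30:
        return 1.0
    if identity_percent < 40:
        return 1.5
    if identity_percent < 50:
        return 2.0
    if identity_percent < 60:
        return 3.0
    return 3.5

def empirical_pair_score(identities):
    """Sum the site scores over the five polar heptad positions."""
    return sum(site_score(identities[p]) for p in POLAR_POSITIONS)

# Per-position identities for M238 against each vaccine peptide (Table 4).
table_4 = {
    "M1":   {"a": 50, "d": 43,   "b": 67,   "c": 33,   "e": 29,  "f": 14, "g": 25},
    "M2":   {"a": 10, "d": 42.9, "b": 11.1, "c": 11.1, "e": 9.1, "f": 0,  "g": 11.1},
    "M73":  {"a": 10, "d": 43,   "b": 9,    "c": 0,    "e": 22,  "f": 13, "g": 11},
    "M77":  {"a": 29, "d": 38,   "b": 29,   "c": 11,   "e": 14,  "f": 8,  "g": 25},
    "M89":  {"a": 0,  "d": 25,   "b": 13,   "c": 9,    "e": 11,  "f": 8,  "g": 43},
    "M118": {"a": 7,  "d": 29,   "b": 9,    "c": 14,   "e": 8,   "f": 9,  "g": 29},
}
scores = {vaccine: empirical_pair_score(ids) for vaccine, ids in table_4.items()}
print(scores)
# {'M1': 6.0, 'M2': -5.0, 'M73': -3.0, 'M77': -1.0, 'M89': -2.0, 'M118': -3.0}
print(max(scores, key=scores.get))  # M1
```

The maximum over the vaccine peptides (6 for M238, against M1) is the value carried forward as the peptide's heptad identity score.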

Low sequence identity at heptad positions is correlated with low antibody cross-reactivity against M peptides that share high overall sequence identity

Empirical pairwise scores based on the sequence identity at the heptad sites between all vaccine and nonvaccine peptides for NTC6 are given in Table S2. Table 5 shows the maximum empirical pairwise score between nonvaccine and vaccine M types for the NTC6 peptides based on heptad homologies. The score distinguished between cross-reactive and marginally cross-reactive M peptides. When the score for a pair was >10.5, antibody cross-reactivity was observed. Conversely, when a score was ≤10.5, cross-reactivity was not observed. Fig. 7 shows the geometric mean antibody titers obtained after immunizing three rabbits with the NTC6 vaccine plotted against the maximum heptad identity score of each nonvaccine peptide. This indicates that a score of ≤10.5 predicted all five marginally cross-reactive peptides in the NTC6 cluster. The Spearman rank-order correlation was ρ = 0.75 (p value = 0.001).

Table 5.

Maximum empirical pairwise score between a nonvaccine M type and a vaccine M type for NTC6 peptides

| Nonvaccine M type | Max empirical heptad identity score | Vaccine M type |
| M114 | 17.5 | M73 |
| M124 | 16.0 | M2 |
| M175 | 16.5 | M2 |
| M232 | 15.0 | M89 |
| M84 | 14.0 | M89 |
| M239 | 13.5 | M1 |
| M227 | 12.0 | M1 |
| M102 | 12.5 | M77 |
| M49 | 11.5 | M118 |
| M9 | 11.0 | M118 |
| M183 (a) | 10.5 | M118 |
| M112 (a) | 9.5 | M89 |
| M15 (a) | 7.0 | M77 |
| M238 (a) | 6.0 | M1 |
| M50 (a) | 5.5 | M89 |

(a) M types that did not cross-react.
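The cutoff can be checked against the Table 5 values. This sketch treats a maximum score of ≤10.5 as predicting marginal cross-reactivity, as stated for Fig. 7, and compares the prediction with the footnoted M types; the dictionary simply transcribes the table.

```python
MARGINAL_CUTOFF = 10.5  # scores at or below this predict marginal cross-reactivity

# Maximum empirical heptad identity scores transcribed from Table 5.
max_scores = {
    "M114": 17.5, "M124": 16.0, "M175": 16.5, "M232": 15.0, "M84": 14.0,
    "M239": 13.5, "M227": 12.0, "M102": 12.5, "M49": 11.5, "M9": 11.0,
    "M183": 10.5, "M112": 9.5, "M15": 7.0, "M238": 6.0, "M50": 5.5,
}
# M types footnoted in Table 5 as not cross-reacting.
observed_marginal = {"M183", "M112", "M15", "M238", "M50"}

predicted_marginal = {m for m, s in max_scores.items() if s <= MARGINAL_CUTOFF}
print(predicted_marginal == observed_marginal)  # True
print(len(max_scores) - len(predicted_marginal))  # 10
```

Because M183 scores exactly 10.5 and did not cross-react, the boundary value belongs on the marginal side of the cutoff.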

Figure 7.

Maximum heptad identity score strongly correlates with cross-reactive immunogenicity of NTC6 vaccine.

An illustration of the use of heptad identity is the correct prediction of lack of cross-reactivity of M238 and vaccine peptide M1, despite high overall sequence identity between the two (54%). These results indicate that mutual heptad identity between the vaccine and nonvaccine M types is an important indication of the degree of shared epitopes, as well as an important determinant of antibody cross-reactivity among sequence-similar M peptides.

Discussion

The M protein of Strep A is a major protective antigen and a leading vaccine target. A significant challenge to the development of M protein–based Strep A vaccines has been the number of different M types (>200) identified to date. Each emm type is defined by the 5′ sequence that encodes the variable N terminus of the mature protein (28). Recently it has been shown that the majority of M proteins can be clustered based on similar structural and functional characteristics (15), leading to a new paradigm suggesting that M antibody responses may be M cluster–specific as well as type-specific. The overall goal of structure-based approaches to vaccine design is to focus on the structural and functional similarities of M proteins to identify the fewest number of M peptides needed to formulate vaccines that will elicit immune responses against the majority of epidemiologically important M types of Strep A. In the present study, we have used both 3D structure-based and sequence-based approaches to analyze the antibody cross-reactivity among Strep A N-terminal M peptides from one sequence-related cluster. Because our subunit vaccines contain peptides from the N-terminal regions of the M proteins (11), we have redefined the M clusters using only the first 50 amino acid residues, as opposed to our previous studies that clustered the sequences of the entire protein (15).

In principle, if a complete antibody-binding matrix were available, it could simply be clustered to obtain immunologically related M types. However, Table S1 is incomplete because the human intravenous immunoglobulin (IVIG) antibodies did not react with all peptides, so we identified structural features that correlated with immunological data and then used those features to cluster M types. The end result was the identification of clusters of peptides that shared immunologically relevant structural features. This also allowed us to test the hypothesis that M types that share similar structural features are immunologically related.

The starting point for the current structure-based analysis was sequence similarities among M peptides within the same cluster. However, because sequence similarity does not directly take into account conformational similarity of putative epitopes, we then extended the analysis to determine 3D structures of the peptides and to compare their shape-dependent properties using cheminformatics methods. All 21 peptides within the NTC6 cluster were then subclustered based on similarities among 3D models and functional antibody inhibition studies. Six peptides, each representing one of the subclusters, were selected to construct the recombinant NTC6.1 vaccine. The vaccine elicited significant levels of cross-reactive antibodies against 10 of the 15 nonvaccine peptides within NTC6.

In an a posteriori analysis of the cross-reactivity results, we found that a method distinguishing cross-reactive and non–cross-reactive pairs that only considers the coiled-coil domain within the N-terminal region and that calculates the homology between the residues at the corresponding polar heptad sites was more discriminating than predicting cross-reactivity from calculated 3D monomer structures. This method makes the assumption that the peptide epitopes are in a helical conformation. Although coiled-coil structures are strongly predicted for the M proteins, there is some evidence that the N-terminal ∼15 residues are less likely to be coiled coils (24). The heptad identity–based scoring scheme incorporates simple structural data into sequence-based considerations and is a computationally efficient and effective way of predicting antibody cross-reactive immunogenicity among Strep A M peptides.

Two factors that must be considered while determining antibody cross-reactivity are that the primary requirement for antibody binding or antigenicity is surface accessibility (29, 30) and that 80% of all naturally occurring antibody epitopes studied so far are discontinuous in sequence, as expected because the antibody sees a contiguous surface. The heptad identity–based scoring scheme takes into account both of these factors by assigning precedence to surface-accessible polar residues in forming the dominant epitopes that stimulate the immune response and by searching for homology in the surface-accessible regions between nonvaccine peptides and vaccine peptides.

The determination of the helical wheel homology helps ascertain the probability of a nonvaccine M type sharing the same epitopes as the vaccine M type and the extent to which it will be recognized by the same antibody that recognizes the vaccine M type. We have shown that an analysis of the conservation of exposed residues between the vaccine and nonvaccine peptides aided by the construction of a helical wheel yields a positive correlation with observed antibody cross-reactivity. Future Strep A structural vaccinology work will include expanding epitope analysis with solution biophysical experiments and detailed molecular dynamics simulation of the native M proteins and vaccines. Our previous studies of multivalent vaccines containing N-terminal peptides of M proteins have shown that these complex vaccines elicit type-specific and cross-reactive antibodies that promote opsonization and killing of Strep A bacteria (11, 16). Future experiments will rely on several structure-based approaches to identify the fewest number of M peptides to include in broadly protective vaccines that will be tested for functional opsonic antibodies against multiple epidemiologically important M types of Strep A.

Experimental procedures

Vaccine design strategy

The workflow for the selection of vaccine candidates is presented in Fig. 8. Sequence-based clustering of the N-terminal 50 residues was performed, and one cluster, NTC6, containing 21 M types was selected for study. We aimed to select a minimum number of M peptides that would elicit antibody responses against the majority of streptococcal M peptides within NTC6. The sequences of the HVRs of the 21 M types are provided in Table S3, and their pairwise sequence identities were obtained using the EMBOSS program Needle (Tables 2 and 3). 3D structural models were created of monomeric forms of these peptides upon which cheminformatic analysis was performed. Partial data on experimental antibody binding were also derived (antibody cross-reactivity and the development of a functional matrix). Combining the experimental antibody-binding data with the cheminformatic results involved multiple regression calculations and resulted in the generation of experimentally informed cheminformatics descriptors. The candidate experimental vaccine was chosen based on the experimental and computational results.

Figure 8.

Workflow for selection of vaccine candidates among NTC6 M types.

Antibody cross-reactivity and the development of a functional matrix

A functional matrix of antibody binding and cross-reactivity among the NTC6 peptides was developed using naturally acquired human antibodies in commercially available IVIG (Gammagard liquid; Baxalta US Inc., Westlake Village, CA). IVIG contains purified IgG that is pooled from the serum of multiple donors and contains antibodies against many of the common M antigens. Functional relatedness was assessed by ELISA-inhibition experiments performed in duplicate (16) using all 21 NTC6 peptides as inhibitors of IVIG antibody binding to 12 of the NTC6 peptides (Table S1). The 12 peptides were selected based on direct ELISA results, indicating that the IVIG contained levels of antibodies sufficient to perform inhibition assays. Relative functional “distances” were expressed as 100 − (percentage of inhibition), where 0 = complete functional identity and 100 = no functional activity, i.e. no inhibition. k-means clustering of antibody binding between NTC6 peptides was used to identify “subclusters” of functionally related peptides, and the optimal number of clusters was determined by a silhouette analysis (20).
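The distance conversion is a one-line formula. In the sketch below, averaging the duplicate readings before conversion is an assumption (the text only says the assays were run in duplicate), and the inhibition percentages are hypothetical.

```python
from statistics import mean

def functional_distance(percent_inhibition):
    """Distance = 100 - % inhibition.

    0 means complete functional identity; 100 means no inhibition.
    """
    return 100.0 - percent_inhibition

def distance_from_duplicates(duplicate_readings):
    # Assumption: duplicate wells are averaged before conversion.
    return functional_distance(mean(duplicate_readings))

# Hypothetical duplicate % inhibition readings for one inhibitor peptide
# against two coated peptides.
readings = {"coated_peptide_1": [80, 90], "coated_peptide_2": [5, 15]}
row = {name: distance_from_duplicates(vals) for name, vals in readings.items()}
print(row)  # {'coated_peptide_1': 15.0, 'coated_peptide_2': 90.0}
```

Each inhibitor peptide thus contributes one row of 12 distances, which is the 12-dimensional profile that k-means clusters.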

Structural models of monomeric M peptides

In designing the vaccine, we used our previous approach (16) of modeling the structures of the linear peptides in monomeric form in aqueous solution and identifying physicochemical similarities in the resulting computed structures as an indication of cross-reactivity. PEP-FOLD3 (21) is a computational framework that allows de novo structure prediction for linear peptides between 5 and 50 amino acids. PEP-FOLD3 was used with default parameters to predict the 3D structures of the N-terminal regions (50 residues long) of all 21 NTC6 M peptides as monomers (Fig. S2A). Five top-scoring conformations per M peptide were retained for calculation of structural descriptors. The Molecular Operating Environment program package (22) was used to calculate structure-based physicochemical descriptors (44 nonzero variance descriptors), such as peptide surface area, patches of excess charge, hydrophobicity, volume, and shape, from the PEP-FOLD3 models of residues 1–50 of the M peptides. As an initial quality check, clusters based on all the nonzero variance descriptors are shown in Fig. S3.
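The restriction to nonzero variance descriptors can be sketched as a simple column filter. The sketch below also standardizes each retained descriptor to zero mean and unit variance; that particular scaling is an assumption made for illustration, and the descriptor names and values are hypothetical placeholders rather than Molecular Operating Environment output.

```python
from statistics import mean, pstdev


def nonzero_variance(table):
    """Keep descriptors whose values vary across the peptide models.

    table maps a descriptor name to one value per model.
    """
    return {name: vals for name, vals in table.items() if pstdev(vals) > 0}


def zscore(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(x - mu) / sigma for x in values]


# Hypothetical descriptor table for three peptide models.
descriptors = {
    "descriptor_a": [410.0, 455.5, 398.2],
    "descriptor_b": [3.0, 3.0, 3.0],  # constant, so it carries no information
    "descriptor_c": [-1.0, 0.0, 2.0],
}
kept = nonzero_variance(descriptors)
scaled = {name: zscore(vals) for name, vals in kept.items()}
```

Constant columns such as `descriptor_b` are dropped before scaling, which also avoids a division by zero in `zscore`.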

Machine learning to identify clusters of structurally and immunologically related M types

Instead of using all of the nonzero variance descriptors as in our previous study (16), we chose only the top descriptors that correlated the PEP-FOLD3–predicted structures with antibody binding. We assumed that the use of these descriptors would improve the probability of identifying shared structural similarities and shared epitopes. We used supervised machine learning employing multiple regression to rank and identify the descriptors that contributed the most to the M peptide structure–activity relationship. The experimental inhibition of antibody binding to each of the 12 NTC6 M types in the relational matrix (Table S1) was used as a response or dependent variable, and the 44 scaled protein descriptors were used as predictors/independent variables in univariate multiple regression models. The univariate regression allowed us to rank the 44 descriptors per regression model and identify a subset of higher-ranked descriptors that appeared more frequently than others in the 12 regression models.
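The step of identifying descriptors that rank highly in many of the regression models can be sketched as a frequency count. This is a minimal illustration: how descriptors are ordered within each model is not specified here and is taken as given, the cutoff `top_n` is an assumed parameter, and the descriptor names are hypothetical.

```python
from collections import Counter


def frequent_top_descriptors(rankings, top_n, keep):
    """Return the `keep` descriptors that most often rank in the top `top_n`.

    rankings holds one list per regression model, ordered best first.
    """
    counts = Counter()
    for ranked in rankings:
        counts.update(ranked[:top_n])
    # Sort by descending frequency, then by name so that ties are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ordered[:keep]]


# Hypothetical rankings from three regression models.
rankings = [
    ["d1", "d2", "d3", "d4"],
    ["d2", "d1", "d4", "d3"],
    ["d2", "d3", "d1", "d4"],
]
selected = frequent_top_descriptors(rankings, top_n=2, keep=2)
```

In this toy input, `d2` is in the top two of all three models and `d1` in two of them, so `selected` is `["d2", "d1"]`.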

Following the multiple regression analysis, the subset of the 20 top-ranked cheminformatic descriptors from the supervised regression approach was used in k-means clustering of the NTC6 M peptides to obtain subclusters. The new subclusters thus obtained were compared with those that were obtained from clustering experimental antibody-binding data, i.e. the clusters based on the 12 response variables. We observed some correlation between the subgroups of M peptides defined by the k-means clustering based on the 20 physicochemical descriptors and those based on immunological relatedness. We were thus able to use antibody-binding data in a supervised learning approach to identify important descriptors that were then used in an unsupervised k-means clustering scheme to identify groups of M peptides that were predicted to be both structurally and functionally related.
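One way to put a number on the agreement between the descriptor-based and the antibody-based subclusters is the Rand index, the fraction of peptide pairs that the two clusterings treat the same way. The text reports only that some correlation was observed and does not state that this measure was used, so the sketch below is an assumption-laden illustration with hypothetical cluster labels.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs grouped consistently by two clusterings."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    # A pair agrees if it is together in both clusterings or apart in both.
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical subcluster labels for five peptides.
structural = [0, 0, 1, 1, 2]
functional = [1, 1, 0, 0, 0]
agreement = rand_index(structural, functional)
```

Cluster label values need not match between the two inputs, because only co-membership of pairs is compared. For the labels above, 8 of the 10 pairs agree, giving 0.8.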

NTC6 vaccine construction, immunization of rabbits, and detection of antibodies

Six NTC6 peptides containing residues 1–50 of M1, M77, M89, M2, M73, and M118 were selected as vaccine components based on the structural and antibody-binding predictions described above. A recombinant hybrid protein vaccine containing the six M peptides joined in tandem without linkers was produced from extracts of Escherichia coli containing pUC57, into which was inserted a chemically synthesized hybrid gene (Genscript, Piscataway, NJ), using methods previously described (11). The synthetic gene was also designed to encode an upstream T7 promoter and a 3′ polyhistidine motif followed by a stop codon. Three rabbits were immunized with 200 μg of protein on alum via the intramuscular route at time 0, 4 weeks, and 8 weeks. Serum was obtained prior to immunization and 2 weeks after the final injection. Antibody levels in preimmune and immune sera were assayed using the 21 NTC6 M peptides as solid-phase antigens, as previously described (16). Assays were repeated once to confirm the results of the initial ELISA. Antibody titer was defined as the reciprocal of the last serum dilution resulting in an A450 of ≥0.2. All research involving animals was reviewed and approved by the University of Tennessee Health Science Center Institutional Animal Care and Use Committee.
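The titer definition can be expressed as a short function. This is a sketch under the assumption that each reading is recorded as a dilution factor paired with its A450 value and that the signal falls steadily as the serum is diluted; the function name and the readings are hypothetical.

```python
def endpoint_titer(readings, cutoff=0.2):
    """Reciprocal of the highest serum dilution with A450 >= cutoff.

    readings is an iterable of (dilution_factor, a450) pairs, for example
    (800, 0.62) for a 1:800 dilution. Returns None if no reading reaches
    the cutoff.
    """
    positive = [factor for factor, a450 in readings if a450 >= cutoff]
    return max(positive) if positive else None


# Hypothetical twofold dilution series for one serum.
series = [(200, 1.85), (400, 1.10), (800, 0.62), (1600, 0.31), (3200, 0.14)]
titer = endpoint_titer(series)
```

For this series the 1:1600 dilution is the last to reach an A450 of 0.2, so `titer` is 1600.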

Prediction of coiled-coil regions of M peptides to determine helical wheel homology among M types

After the vaccine design and testing phase, we searched for 3D structural correlates distinguishing cross-reactive from nonreactive peptide pairs. A simple approach makes use of the predominantly α-helical coiled-coil dimeric structure of the M protein (18, 31). A standard, canonical coiled-coil structure consists of two α-helices twisting around each other with their side chains interlocking in a “knobs-into-holes” packing. The regular meshing of knobs into holes requires recurrence of the side-chain residue types every seven residues along the helix interface (32). Various tools exist to predict the structures of coiled-coil regions in proteins to a level of detail that permits the assignment of the individual residues to the positions of the heptad repeat. One such tool is MARCOIL, which calculates posterior probabilities of hidden Markov models and has been reported to offer the best combination of sensitivity and speed compared with other similar tools (26, 33). Knowledge of the heptad register from tools like MARCOIL can eliminate the need for homology-based methods for modeling coiled-coil proteins, because the structure of a coiled coil, unlike almost any other known protein fold, can be computed from parametric equations if the heptad assignment is known.

We confirmed the MARCOIL prediction of coiled-coil domains and the assignment of individual residues in a sequence to the heptad by comparing it with the knobs-into-holes interactions recognized by the SOCKET program (34) in experimental X-ray crystallographic structures. The SOCKET program identifies the repeating knobs-into-holes structural motif and uses this information to assign oligomer order (number of helices), orientation (parallel, anti-parallel, or mixed), and heptad register for the coiled coil. We found the prediction by MARCOIL and the actual heptad assignment calculated by SOCKET for the three available M dimer crystal structures (PDB entries 2OTO, 5HZP, and 5HYT) to be identical, thus validating further use of MARCOIL (Table S4 and Fig. S4, A–C). Therefore, the heptad assignment of the HVR of all 21 M types was made using MARCOIL. Table S5 provides the sequences predicted to be coiled-coil within residues 1–50, along with their starting heptad register and the sequence length. Additionally, the sequences at the heptad sites for each of the M peptides are provided in Table S6.
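Given a starting heptad register, residues can be labeled a through g and compared site by site. The following sketch illustrates the idea and is not the scoring algorithm developed in this work. It assumes two ungapped peptides of equal length that share the same starting register, and it treats the non-core positions b, c, e, f, and g as the sites of interest, which is an assumption based on the canonical coiled coil in which a and d form the core. The sequences are hypothetical and not taken from Tables S5 or S6.

```python
HEPTAD = "abcdefg"


def assign_heptad(sequence, start_register):
    """Pair each residue with its heptad position, starting at start_register."""
    start = HEPTAD.index(start_register)
    return [(res, HEPTAD[(start + i) % 7]) for i, res in enumerate(sequence)]


def site_identity(seq_a, seq_b, start_register, sites):
    """Percent identity restricted to residues at the given heptad sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch expects ungapped peptides of equal length")
    labels = [lab for _, lab in assign_heptad(seq_a, start_register)]
    picked = [(x, y) for x, y, lab in zip(seq_a, seq_b, labels) if lab in sites]
    if not picked:
        return 0.0
    return 100.0 * sum(x == y for x, y in picked) / len(picked)


# Hypothetical 14-residue peptides spanning two heptads, both starting at "a".
peptide_1 = "LKELEEKLKELEEK"
peptide_2 = "LRELEDKLKDLEEK"
score = site_identity(peptide_1, peptide_2, "a", sites="bcefg")
```

Ten residues fall on the selected sites and seven of them are identical, so `score` is 70.0, even though the core a and d positions are fully conserved. Peptides whose predicted registers start at different positions would first need to be brought into a common register, which this sketch does not attempt.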

Author contributions

M. P. A., J. B. D., and J. C. S. conceptualization; M. P. A., T. A. P., J. A. S., R. A., J. B., and J. B. D. data curation; M. P. A., J. B. D., and J. C. S. formal analysis; M. P. A. validation; M. P. A., J. B. D., and J. C. S. investigation; M. P. A. visualization; M. P. A., T. A. P., J. A. S., R. A., J. B., and J. B. D. methodology; M. P. A. writing-original draft; M. P. A., T. A. P., J. A. S., R. A., J. B., J. B. D., and J. C. S. writing-review and editing; J. B. D. and J. C. S. resources; J. B. D. and J. C. S. supervision; J. B. D. and J. C. S. funding acquisition; J. B. D. and J. C. S. project administration; J. C. S. software.

Acknowledgment

We thank Zachary Crockett for assisting with PEP-FOLD3.

This work was supported by National Institutes of Health Grant AI-R01AI132117. J. B. D. is the inventor of certain technologies related to the development of Strep A vaccines. The technology has been licensed from the University of Tennessee Research Foundation to Vaxent, LLC, of which J. B. D. is a member and Chief Scientific Officer. All other authors declare that they have no conflicts of interest with the contents of this article. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

This article contains Tables S1–S6 and Figs. S1–S4.

The abbreviations used are: Strep A, group A streptococcus; HVR, hypervariable region; NTC, N-terminal cluster; IVIG, intravenous immunoglobulin.

References

  • 1. Lamagni T. L., Darenberg J., Luca-Harari B., Siljander T., Efstratiou A., Henriques-Normark B., Vuopio-Varkila J., Bouvet A., Creti R., Ekelund K., Koliou M., Reinert R. R., Stathi A., Strakova L., Ungureanu V., et al. (2008) Epidemiology of severe Streptococcus pyogenes disease in Europe. J. Clin. Microbiol. 46, 2359–2367 10.1128/JCM.00422-08
  • 2. Haggar A., Nerlich A., Kumar R., Abraham V. J., Brahmadathan K. N., Ray P., Dhanda V., Joshua J. M., Mehra N., Bergmann R., Chhatwal G. S., and Norrby-Teglund A. (2012) Clinical and microbiologic characteristics of invasive Streptococcus pyogenes infections in north and south India. J. Clin. Microbiol. 50, 1626–1631 10.1128/JCM.06697-11
  • 3. Carapetis J. R., Steer A. C., Mulholland E. K., and Weber M. (2005) The global burden of group A streptococcal diseases. Lancet Infect. Dis. 5, 685–694 10.1016/S1473-3099(05)70267-X
  • 4. Dale J. B., Batzloff M. R., Cleary P. P., Courtney H. S., Good M. F., Grandi G., Halperin S., Margarit I. Y., McNeil S., and Pandey M. (2016) Current approaches to group A streptococcal vaccine development. In Streptococcus pyogenes: Basic Biology to Clinical Manifestations, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma
  • 5. Vekemans J., Gouvea-Reis F., Kim J. H., Excler J.-L., Smeesters P. R., O'Brien K. L., Van Beneden C. A., Steer A. C., Carapetis J. R., and Kaslow D. C. (2019) The path to group A Streptococcus vaccines: World Health Organization research and development technology roadmap and preferred product characteristics. Clin. Infect. Dis. 69, 877–883 10.1093/cid/ciy1143
  • 6. Davies M. R., McIntyre L., Mutreja A., Lacey J. A., Lees J. A., Towers R. J., Duchêne S., Smeesters P. R., Frost H. R., Price D. J., Holden M. T. G., David S., Giffard P. M., Worthing K. A., Seale A. C., et al. (2019) Atlas of group A streptococcal vaccine candidates compiled using large-scale comparative genomics. Nat. Genet. 51, 1035–1043 10.1038/s41588-019-0417-8
  • 7. Fritzer A., Senn B. M., Minh D. B., Hanner M., Gelbmann D., Noiges B., Henics T., Schulze K., Guzman C. A., Goodacre J., von Gabain A., Nagy E., and Meinke A. L. (2010) Novel conserved group A streptococcal proteins identified by the antigenome technology as vaccine candidates for a non–M protein–based vaccine. Infect. Immun. 78, 4051–4067 10.1128/IAI.00295-10
  • 8. Steer A. C., Carapetis J. R., Dale J. B., Fraser J. D., Good M. F., Guilherme L., Moreland N. J., Mulholland E. K., Schodel F., and Smeesters P. R. (2016) Status of research and development of vaccines for Streptococcus pyogenes. Vaccine 34, 2953–2958 10.1016/j.vaccine.2016.03.073
  • 9. Dale J. B. (1999) Multivalent group A streptococcal vaccine designed to optimize the immunogenicity of six tandem M protein fragments. Vaccine 17, 193–200 10.1016/S0264-410X(98)00150-9
  • 10. Hu M. C., Walls M. A., Stroop S. D., Reddish M. A., Beall B., and Dale J. B. (2002) Immunogenicity of a 26-valent group A streptococcal vaccine. Infect. Immun. 70, 2171–2177 10.1128/IAI.70.4.2171-2177.2002
  • 11. Dale J. B., Penfound T. A., Chiang E. Y., and Walton W. J. (2011) New 30-valent M protein–based vaccine evokes cross-opsonic antibodies against non-vaccine serotypes of group A streptococci. Vaccine 29, 8175–8178 10.1016/j.vaccine.2011.09.005
  • 12. Kotloff K. L., Corretti M., Palmer K., Campbell J. D., Reddish M. A., Hu M. C., Wasserman S. S., and Dale J. B. (2004) Safety and immunogenicity of a recombinant multivalent group a streptococcal vaccine in healthy adults: phase 1 trial. JAMA 292, 709–715 10.1001/jama.292.6.709
  • 13. McNeil S. A., Halperin S. A., Langley J. M., Smith B., Warren A., Sharratt G. P., Baxendale D. M., Reddish M. A., Hu M. C., Stroop S. D., Linden J., Fries L. F., Vink P. E., and Dale J. B. (2005) Safety and immunogenicity of 26-valent group A streptococcus vaccine in healthy adult volunteers. Clin. Infect. Dis. 41, 1114–1122 10.1086/444458
  • 14. Frost H. R., Laho D., Sanderson-Smith M. L., Licciardi P., Donath S., Curtis N., Kado J., Dale J. B., Steer A. C., and Smeesters P. R. (2017) Immune cross-opsonization within emm clusters following group A Streptococcus skin infection: broadening the scope of type-specific immunity. Clin. Infect. Dis. 65, 1523–1531 10.1093/cid/cix599
  • 15. Sanderson-Smith M., De Oliveira D. M., Guglielmini J., McMillan D. J., Vu T., Holien J. K., Henningham A., Steer A. C., Bessen D. E., Dale J. B., Curtis N., Beall B. W., Walker M. J., Parker M. W., Carapetis J. R., et al. (2014) A systematic and functional classification of Streptococcus pyogenes that serves as a new tool for molecular typing and vaccine development. J. Infect. Dis. 210, 1325–1338 10.1093/infdis/jiu260
  • 16. Dale J. B., Smeesters P. R., Courtney H. S., Penfound T. A., Hohn C. M., Smith J. C., and Baudry J. Y. (2017) Structure-based design of broadly protective group a streptococcal M protein–based vaccines. Vaccine 35, 19–26 10.1016/j.vaccine.2016.11.065
  • 17. Steer A. C., Law I., Matatolu L., Beall B. W., and Carapetis J. R. (2009) Global emm type distribution of group A streptococci: systematic review and implications for vaccine development. Lancet Infect. Dis. 9, 611–616 10.1016/S1473-3099(09)70178-1
  • 18. Fischetti V. A. (1989) Streptococcal M protein: molecular design and biological behavior. Clin. Microbiol. Rev. 2, 285–314 10.1128/CMR.2.3.285
  • 19. Shulman S. T., Tanz R. R., Dale J. B., Beall B., Kabat W., Kabat K., Cederlund E., Patel D., Rippe J., Li Z., Sakota V., and North American Streptococcal Pharyngitis Surveillance Group (2009) Seven-year surveillance of North American pediatric group a streptococcal pharyngitis isolates. Clin. Infect. Dis. 49, 78–84 10.1086/599344
  • 20. Rousseeuw P. J. (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 10.1016/0377-0427(87)90125-7
  • 21. Lamiable A., Thévenet P., Rey J., Vavrusa M., Derreumaux P., and Tufféry P. (2016) PEP-FOLD3: faster de novo structure prediction for linear peptides in solution and in complex. Nucleic Acids Res. 44, W449–454 10.1093/nar/gkw329
  • 22. Vilar S., Cozza G., and Moro S. (2008) Medicinal chemistry and the molecular operating environment (MOE): application of QSAR and molecular docking to drug discovery. Curr. Top. Med. Chem. 8, 1555–1572 10.2174/156802608786786624
  • 23. Madeira F., Park Y. M., Lee J., Buso N., Gur T., Madhusoodanan N., Basutkar P., Tivey A. R. N., Potter S. C., Finn R. D., and Lopez R. (2019) The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Res. 47, W636–W641 10.1093/nar/gkz268
  • 24. McNamara C., Zinkernagel A. S., Macheboeuf P., Cunningham M. W., Nizet V., and Ghosh P. (2008) Coiled-coil irregularities and instabilities in group A Streptococcus M1 are required for virulence. Science 319, 1405–1408 10.1126/science.1154470
  • 25. Buffalo C. Z., Bahn-Suh A. J., Hirakis S. P., Biswas T., Amaro R. E., Nizet V., and Ghosh P. (2016) Conserved patterns hidden within group A Streptococcus M protein hypervariability recognize human C4b-binding protein. Nat. Microbiol. 1, 16155–16155 10.1038/nmicrobiol.2016.155
  • 26. Delorenzi M., and Speed T. (2002) An HMM model for coiled-coil domains and a comparison with PSSM-based predictions. Bioinformatics 18, 617–625 10.1093/bioinformatics/18.4.617
  • 27. Grigoryan G., and Keating A. E. (2008) Structural specificity in coiled-coil interactions. Curr. Opin. Struct. Biol. 18, 477–483 10.1016/j.sbi.2008.04.008
  • 28. Beall B., Facklam R., and Thompson T. (1996) Sequencing emm-specific PCR products for routine and accurate typing of group A streptococci. J. Clin. Microbiol. 34, 953–958 10.1128/JCM.34.4.953-958.1996
  • 29. Benjamin D. C., Berzofsky J. A., East I. J., Gurd F. R., Hannum C., Leach S. J., Margoliash E., Michael J. G., Miller A., Prager E. M., Reichlin M., Sercarz E. E., Smith-Gill S. J., Todd P. E., and Wilson A. C. (1984) The antigenic structure of proteins: a reappraisal. Annu. Rev. Immunol. 2, 67–101 10.1146/annurev.iy.02.040184.000435
  • 30. Novotný J., Handschumacher M., Haber E., Bruccoleri R. E., Carlson W. B., Fanning D. W., Smith J. A., and Rose G. D. (1986) Antigenic determinants in proteins coincide with surface regions accessible to large probes (antibody domains). Proc. Natl. Acad. Sci. U.S.A. 83, 226–230 10.1073/pnas.83.2.226
  • 31. Smeesters P. R., McMillan D. J., and Sriprakash K. S. (2010) The streptococcal M protein: a highly versatile molecule. Trends Microbiol. 18, 275–282 10.1016/j.tim.2010.02.007
  • 32. Lupas A. N., and Gruber M. (2005) The structure of α-helical coiled-coils. In Advances in Protein Chemistry, pp. 37–38, Elsevier Science Publishing Co., Inc., New York
  • 33. Gruber M., Söding J., and Lupas A. N. (2006) Comparative analysis of coiled-coil prediction methods. J. Struct. Biol. 155, 140–145 10.1016/j.jsb.2006.03.009
  • 34. Walshaw J., and Woolfson D. N. (2001) Socket: a program for identifying and analysing coiled-coil motifs within protein structures. J. Mol. Biol. 307, 1427–1450 10.1006/jmbi.2001.4545

Articles from The Journal of Biological Chemistry are provided here courtesy of American Society for Biochemistry and Molecular Biology
