PLOS Computational Biology. 2023 Jan 10;19(1):e1010360. doi: 10.1371/journal.pcbi.1010360

Diversity and evolution of computationally predicted T cell epitopes against human respiratory syncytial virus

Jiani Chen 1,2,3,4, Swan Tan 1,3,4,5, Vasanthi Avadhanula 6, Leonard Moise 3,7,¤, Pedro A Piedra 6, Anne S De Groot 3,7, Justin Bahl 1,2,3,4,5,8,*
Editor: Michiel van Boven 9
PMCID: PMC9870173  PMID: 36626370

Abstract

Human respiratory syncytial virus (RSV) is a major cause of lower respiratory infection. Despite more than 60 years of research, there is no licensed vaccine. While B cell response is a major focus for vaccine design, the T cell epitope profile of RSV is also important for vaccine development. Here, we computationally predicted putative T cell epitopes in the Fusion protein (F) and Glycoprotein (G) of RSV wild circulating strains by predicting Major Histocompatibility Complex (MHC) class I and class II binding affinity. We limited our inferences to conserved epitopes in both F and G proteins that have been experimentally validated. We applied multidimensional scaling (MDS) to construct T cell epitope landscapes to investigate the diversity and evolution of T cell profiles across different RSV strains. We find the RSV strains are clustered into three RSV-A groups and two RSV-B groups on this T epitope landscape. These clusters represent divergent RSV strains with potentially different immunogenic profiles. In addition, our results show a greater proportion of F protein T cell epitope content conservation among recent epidemic strains, whereas the G protein T cell epitope content was decreased. Importantly, our results suggest that RSV-A and RSV-B have different patterns of epitope drift and replacement and that RSV-B vaccines may need more frequent updates. Our study provides a novel framework to study RSV T cell epitope evolution. Understanding the patterns of T cell epitope conservation and change may be valuable for vaccine design and assessment.

Author summary

Lower respiratory infections caused by human respiratory syncytial virus (RSV) are a global health challenge. B cell epitope immune response has been the major focus of RSV vaccine and therapeutic development. However, T cell epitope induced immunity plays an important role in the resolution of RSV infection. While RSV genetic diversity has been widely reported, few studies focus on RSV T cell epitope diversity, which can influence vaccine effectiveness. Here, we use computationally predicted T cell epitope profiles of circulating strains to characterize the diversity and evolution of the T cell epitopes of RSV A and B. We systematically evaluate the T cell epitope profile of RSV F and G proteins. We provide a T cell epitope landscape visualization that shows co-circulation of three RSV-A groups and two RSV-B groups, suggesting potentially distinct T cell immunity. Furthermore, our study shows different levels of F and G protein T cell epitope content conservation, which may be important to correlate with duration of vaccine protection. This study provides a novel framework to study RSV T cell epitope evolution, infer RSV T cell immunity at population levels and monitor RSV vaccine effectiveness.

Introduction

Human respiratory syncytial virus (RSV) is a negative-strand RNA virus that is classified in the Orthopneumovirus genus of the family Pneumoviridae. It is a major cause of lower respiratory disease in young infants, immunocompromised individuals, and elderly people, resulting in annual epidemics worldwide [1]. The single-stranded RNA genome of RSV is approximately 15.2 kb and encodes 11 viral proteins [2]. The Fusion (F) and Glycoprotein (G) proteins are the two major surface proteins [3]. F protein is generally thought to be conserved and therefore it is the focus of most current RSV vaccine designs. Although G protein is highly variable, its contribution to disease pathogenesis and its role in the biology of infection suggest it can also be an effective RSV vaccine antigen [4]. Despite the significant burden of RSV infection worldwide, there is no licensed vaccine. The only approved intervention is passive immuno-prophylaxis with palivizumab, a monoclonal antibody (mAb) administered to a highly restricted group of infants under the age of 24 months; treatment must be repeated monthly during the RSV season due to the relatively short half-life of the antibody [5,6]. Due to the high cost of monoclonal antibody treatments, this intervention is limited to high-risk infants and is generally unavailable in developing countries. An RSV vaccine is an urgent global healthcare priority, and it is likely that different strategies are needed for the various high-risk groups.

A number of research teams have worked on the development of an RSV vaccine since the isolation and characterization of the virus in 1956 [7,8]. However, vaccination with the formalin-inactivated, alum-precipitated RSV (FI-RSV) vaccine in RSV-naïve infants and young children led to the development of vaccine enhanced disease (VED) that hampered vaccine development for decades to follow [9]. Many studies have been conducted to explain this undesirable outcome. It is likely that formalin fixation led to a vaccine that mostly presented the post-fusion conformation of RSV F protein, leading to an excess of non-neutralizing antibodies and immune complex formation [10–12]. Other studies indicated that an impaired T cell response with Th2 skewing [13,14], as well as complement deposition in the lungs, contributed to enhanced neutrophil recruitment [12]. Recent developments, including the resolution of the F protein structure [15] and the development of RSV rodent models [16], have contributed to a number of vaccine candidates with novel designs and formulations currently in clinical trials [3,17,18].

While most current RSV vaccination strategies focus on a B-cell-induced neutralization immune response, T cell immunity also plays a major role in the resolution of virus infection and is essential for RSV vaccine development [17,18]. Once RSV infection of the lower airways is established, CD8 T cells play an important part in viral clearance and CD4 helper T cells can orchestrate cellular immune responses and stimulate B cells to produce antibodies. However, Th2-biased responses have been associated with animal models of RSV VED, and measurement of Th1 and Th2 responses is considered important to predict the safety of vaccine candidates [12]. Therefore, induction of a balanced cell-mediated immune response through vaccination would promote RSV clearance, but caution must be taken to avoid the potential for immunopathology. Taken together, a closer examination of T cell immunity and of the virus sequences that induce T cell responses is needed for RSV vaccine development.

Human respiratory syncytial virus has a complex circulation pattern in the human population. Within two antigenic groups, RSV-A and RSV-B, different genotypes can co-circulate within the same community, while novel RSV genotypes with high genomic diversity may arise and potentially replace the previously dominant genotypes [19]. In recent years, several unique genetic modifications in RSV have been identified, including a 72-nucleotide (nt) duplication (ON genotype) in the RSV-A G gene and a 60-nt duplication (BA genotype) in RSV-B at a similar region [20]. The observed RSV genetic diversity has raised the question of whether an RSV vaccine needs to include several different strains to be effective. Most current RSV vaccine developments are based on an RSV A2 laboratory strain, which is a chimeric strain that belongs to subtype A [21]. While these treatments hold promise, there is the possibility of viral strains developing escape mutations. For example, palivizumab-resistant strains have been isolated from both RSV rodent models and humans [17,22]. Several lines of evidence also suggest antigenic variation may play a role in the ability of RSV to escape the immune response and establish infections [23]. While highly conserved T cell epitopes in an RSV vaccine may not provide complete protection against infection when cross-protective antibody responses are lacking, they may still reduce the severity of the illness and limit the spread of the virus. However, amino-acid variation at the T cell epitope level and the potential emergence of novel T cell epitopes in recent RSV circulating strains have been reported [24], and further studies are needed to illustrate the effects of amino acid variations on T cell recognition. Hence, characterizing T cell epitope profiles across different strains is important for understanding RSV evolution and for RSV vaccine development.

In this study, we utilize immunoinformatic approaches that are implemented in the iVAX toolkit [25] to predict T cell epitopes in RSV across different strains with a focus on the two major surface proteins F and G. With the analysis of a comprehensive dataset, we evaluate the lineage-specific T cell epitope profile of RSV. We also create sequence-based T cell epitope landscapes based on epitope content comparison across different strains and further correlate RSV T cell immunity change with virus evolution. The proportion of cross-conserved T cell epitope content between vaccine candidate strains that developed earlier and RSV circulating strains with different isolation years and locations was also calculated. These analyses may aid in understanding RSV T cell immunity across different strains and contribute to current vaccine design efforts.

Results

Distribution of T cell epitopes in RSV surface proteins

We evaluated the T cell immunogenic potential across RSV surface proteins by scanning 9-residue regions to predict the binding probability to MHC class I and class II molecules (Fig 1). The epitope density of RSV surface proteins was evaluated using a normalized epitope density score, which is computed by summing the predicted peptide-MHC binding scores across the protein and normalizing the sum by the protein length. The score for randomly generated proteins is set to zero and vaccine antigens generally score above +20 on this scale [25]. F protein has an epitope density score greater than +20 in both the class I and class II immunogenicity scale analyses, indicating significant immunogenic potential [25]. This contrasts with the lower class I and class II epitope density scores of G protein in both RSV subtypes. The class I epitope density score of G protein was greater than +10 in both subtypes, but the class II density was lower than random expectation in the analysis of RSV-B (Fig 1A). This result suggests that RSV surface proteins are likely to have the potential to stimulate T cells that are required for protective immunity. We then investigated the distribution of T cell immunogenicity across the proteins and found that there are regions with relatively high T cell immunogenic potential (Fig 1B). The distribution of T cell immunogenicity of F protein was mapped onto its protein structure, and overlap between protein sequence regions with high T cell immunogenic potential and the antibody neutralizing targets was observed at antigenic site Φ and site II.
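The length-normalized density score described above can be illustrated with a minimal sketch. This is not the EpiMatrix implementation used in iVAX: the function names, the per-1000-residue scaling, and the `random_expectation` offset are hypothetical choices made only to show the "sum, normalize by length, center on random proteins" idea, and the per-frame scores would come from a binding predictor that is not reproduced here.

```python
def nine_mer_frames(protein):
    """Return every overlapping 9-residue frame of a protein sequence."""
    return [protein[i:i + 9] for i in range(len(protein) - 8)]


def epitope_density(frame_scores, protein_length, random_expectation=0.0, per=1000):
    """Length-normalized epitope density (illustrative only).

    frame_scores: predicted peptide-MHC binding scores, one per scored frame
                  (hypothetical input; the real scores come from a predictor).
    random_expectation: density expected for random proteins, subtracted so
                  that random sequences score zero (assumed centering step).
    per:          scaling constant (assumption: express density per 1000 residues).
    """
    raw = sum(frame_scores) * per / protein_length
    return raw - random_expectation
```

With this convention, a protein whose density equals the random expectation scores 0, and positive values indicate more predicted binders than expected by chance.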

Fig 1. T cell immunogenic potential for RSV surface proteins based on MHC binding prediction.

(A) T cell immunogenic potential of RSV major surface proteins. T cell epitope density scores for RSV major surface proteins and other pathogen proteins are labeled on a scale bar. Low-scoring proteins are known to engender little to no immunogenicity while higher-scoring proteins are known immunogens. Proteins scoring above +20 on this scale are considered to have significant immunogenic potential. (B) Distribution of RSV T cell immunogenic potential across F and G protein in RSV reference strain A2 and the main antigenic sites of RSV F protein determined in previous studies [26]. The prefusion or post-fusion F protein surface is colored by antigenic site and by relative immunogenic potential at each location. Analyses are based on the RSV-A reference sequence.

Lineage specific T cell epitope profiles

We then extended T cell epitope predictions from RSV representative strains to multiple wild-circulating strains. The distribution and diversity of T cell epitopes across different strains are illustrated in heatmaps with the corresponding time-scaled phylogenies (S1 and S2 Figs). Both F and G proteins contain epitopes that were conserved across all RSV strains in almost 100% of sampled isolates, suggesting that they could serve as high-quality T cell epitope candidates for vaccine design. In contrast, some epitopes were mutated in selected strains, and those epitopes that only occurred in certain clades within the phylogeny could be interpreted as clade-specific “fingerprints”.

The G gene duplication events in RSV, which are unique gene signatures, can either shift the position of epitopes (locations are different but the amino acids of epitopes are identical to the G protein isolates without duplication) or cause the emergence of novel epitopes. Two novel class I epitopes, (no. 31 and no. 40 in S2A Fig), were found in RSV-A strains that contain G gene duplication. In addition, an emergent class II epitope (no. 25 in S2A Fig) was identified in RSV-A sequences that contain G gene duplication, which was a shift from an epitope (no. 24) that has been observed in other strains. From RSV-B strains that contain the G gene duplication event, we also observed multiple lineage specific class I T cell epitopes, which are caused by a 2-amino acid (aa) deletion (aa157 and aa158) in these strains instead of directly due to the 60-nt duplication event. RSV-B G proteins that have the duplication event contain multiple novel epitopes (no. 22, 23, 26, 28, 30, 37) but do not contain several epitopes (no. 24, 25, 27, 29, 31, 38) that are identified in other strains (S2B Fig).

To further determine whether the computationally predicted T cell epitopes with high MHC binding potential are immunogenic, we utilized the JanusMatrix [27] algorithm to identify the T cell epitopes that are likely to be cross-conserved with human peptides and thereby tolerated by the immune system. Based on this analysis, 6.45% of putative class I epitopes and 1.12% of putative class II epitopes of RSV major surface proteins are cross-conserved with human proteome-derived epitopes at T cell receptor (TCR)-facing residues. As these peptides have similar HLA binding preferences that are contained in human proteins (S3 Fig), they were therefore assumed not to be immunogenic. After excluding the high-JanusMatrix score epitopes identified above, we were able to identify T cell epitopes that were conserved in more than 60% of currently circulating RSV strains. We searched the IEDB epitope database to determine if these epitopes were related to experimentally validated RSV T cell epitopes or HLA ligands (Table A in S1 Text). The conserved RSV T cell epitope sequences that may be important for future vaccine development are shown in Tables 1 and 2 (Table B and C in S1 Text).
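As a minimal sketch of the conservation filter described above, the fraction of strains that carry a given epitope peptide can be computed by exact substring matching and compared with a threshold. The function names and the toy sequences in the usage note are hypothetical, and the sketch uses an inclusive 60% cutoff (matching the "at least 60%" wording of the table footnotes); it does not reproduce the JanusMatrix screening step.

```python
def conservation(epitope, sequences):
    """Fraction of protein sequences that contain the epitope peptide exactly."""
    if not sequences:
        return 0.0
    return sum(epitope in seq for seq in sequences) / len(sequences)


def conserved_epitopes(epitopes, sequences, threshold=0.60):
    """Keep epitopes present in at least `threshold` of the sequences.

    Returns a dict mapping each retained epitope to its conservation fraction.
    """
    kept = {}
    for epitope in epitopes:
        fraction = conservation(epitope, sequences)
        if fraction >= threshold:
            kept[epitope] = fraction
    return kept
```

For example, with four toy sequences of which three contain YMLTNSELL, `conservation` returns 0.75 and the peptide passes the 60% filter.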

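Table 1. Experimentally validated conserved MHC class I epitope peptides in RSV major surface proteins (a).

| Subgroup | Protein | Epitope address | Epitope sequence (b) | Binding HLAs (c) | Conservation (d) | Number of human matches (e) | Epitope id in IEDB |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RSV-A & RSV-B | F | 45–53 | LSALRTGWY | A0101 | 99.55%(A) & 74.24%(B) | 1 | 158982 |
| RSV-A & RSV-B | F | 140–148 | FLLGVGSAI | A0201 | 99.59%(A) & 97.98%(B) | 0 | 156869 |
| RSV-A & RSV-B | F | 250–258 | YMLTNSELL | A0201, A2402 | 99.59%(A) & 99.33%(B) | 0 | 156979 |
| RSV-A & RSV-B | F | 272–280 | KLMSSNVQI | A0201 | 66.64%(A) & 96.08%(B) | 3 | 156902 |
| RSV-A & RSV-B | F | 273–281 | LMSSNVQIV | A0201 | 66.56%(A) & 96.08%(B) | 1 | 156915 |
| RSV-A & RSV-B | F | 449–457 | TVSVGNTLY | A0101 | 99.75%(A) & 99.33%(B) | 0 | 97017 |
| RSV-A | F | 10–18 | AITTILAAV | A0201 | 84.69% | 3 | 156844 |
| RSV-A | F | 111–119 | LPRFMNYTL | B0702 | 91.18% | 0 | 158975 |
| RSV-A | F | 170–178 | ALLSTNKAV | A0201 | 99.67% | 2 | 156847 |
| RSV-A | F | 383–391 | NIDIFNPKY | A0101 | 95.86% | 0 | 159045 |
| RSV-A | G | 25–33 | FISSCLYKL | A0201 | 99.26% | 0 | 158759 |
| RSV-A | G | 61–69 | FIASANHKV | A0201 | 82.08% | 0 | 158751 |
| RSV-B | F | 525–533 | IMITAIIIV | A0201 | 89.25% | 0 | 156892 |
| RSV-B | F | 540–548 | SLIAIGLLL | A0201 | 97.65% | 5 | 156960 |
| RSV-B | G | 25–33 | VISSCLYKL | A0201 | 90.91% | 0 | 158759 |
| RSV-B | G | 61–69 | FIISANHKV | A0201 | 99.02% | 0 | 158751 |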

a. This table contains putative MHC class I epitopes that have already been experimentally validated in publications. Only putative class I epitopes that have positive results in MHC class I ligand assays with the same computationally predicted binding HLAs are shown in the table.

b. Epitope sequences that are conserved in both RSV-A and RSV-B are in bold.

c. HLAs that have the top 1% binder scores in EpiMatrix for epitope sequence.

d. The conservation is evaluated by the presence of epitope peptides across all RSV-A or RSV-B sequences that are publicly available (only epitope sequences with at least 60% conservation are shown in the table).

e. Count of human peptides found in the search database. JanusMatrix was used to search human peptides that are predicted to bind to the same allele as the RSV epitope and share TCR-facing contacts with the RSV epitope.

Table 2. Experimentally validated conserved MHC class II epitope peptides in RSV major surface proteins (a).

| Subtype | Protein | Epitope address | Epitope sequence (b) | Conservation (c) | Number of human matches (d) | Epitope id in IEDB |
| --- | --- | --- | --- | --- | --- | --- |
| RSV-A | F | 29–44 | TEEFYQSTCSAVSKGY | 98.53% | 3 | 956680 |
| RSV-A | F | 50–70 | TGWYTSVITIELSNIKENKCN | 97.75% | 1 | 153700 |
| RSV-A | F | 167–192 | IKSALLSTNKAVVSLSNGVSVLTSKV | 93.14% | 4 | 545502 |
| RSV-A | F | 218–234 | ETVIEFQQKNNRLLEIT | 98.86% | 3 | 1087566 |
| RSV-A | F | 247–268 | VSTYMLTNSELLSLINDMPITN | 98.98% | 8 | 99471 |
| RSV-A | F | 288–310 | IMSIIKEEVLAYVVQLPLYGVID | 98.57% | 5 | 99334 |
| RSV-A | F | 399–418 | KTDVSSSVITSLGAIVSCYG | 99.14% | 0 | 545603 |
| RSV-A | F | 453–470 | GNTLYYVNKQEGKSLYVK | 98.37% | 1 | 99691 |
| RSV-A | F | 492–510 | ISQVNEKINQSLAFIRKSD | 80.32% | 1 | 153713 |
| RSV-A | F | 543–560 | AVGLLLYCKARSTPVTLS | 79.26% | 6 | 153641 |
| RSV-A | G | 19–43 | TLNHLLFISSCLYKLNLKSIAQITL | 93.13% | 8 | 1087567 |
| RSV-B | F | 29–44 | TEEFYQSTCSAVSRGY | 99.78% | 3 | 956680 |
| RSV-B | F | 50–70 | TGWYTSVITIELSNIKETKCN | 93.95% | 1 | 153700 |
| RSV-B | F | 192–218 | VLDLKNYINNQLLPIVNQQSCRISNIE | 83.43% | 4 | 153636 |
| RSV-B | F | 247–268 | LSTYMLTNSELLSLINDMPITN | 98.54% | 8 | 99471 |
| RSV-B | F | 399–418 | KTDISSSVITSLGAIVSCYG | 98.88% | 0 | 545603 |
| RSV-B | F | 453–470 | GNTLYYVNKLEGKNLYVK | 98.77% | 0 | 99691 |
| RSV-B | F | 492–510 | ISQVNEKINQSLAFIRRSD | 97.42% | 1 | 153713 |
| RSV-B | F | 543–560 | AIGLLLYCKAKNTPVTLS | 94.96% | 4 | 153641 |
| RSV-B | G | 51–74 | STSLIIAAIIFIISANHKVTLTTV | 94.66% | 8 | 158751 |

a. This table contains putative MHC class II epitopes that share the identical binding groove sequence, which represent the nine-mer frames with the greatest potential to bind class II HLA (epitope sequences with underlines), with the RSV class II epitopes that have already been experimentally validated in publications. Only the putative class II epitopes that have positive results in MHC class II ligand assays with the same computationally predicted binding HLAs are shown in the table.

b. Underlined sequences represent the nine-mer frames with the greatest potential to bind class II HLA. Epitope sequences that are in bold indicate sequences are predicted to bind class II HLA and are conserved in both RSV-A and RSV-B.

c. Conservation is evaluated by the presence of epitope peptides across all RSV-A or RSV-B sequences that are publicly available (Only epitope sequences with at least 60% conservation are shown in the table).

d. Count of human peptides found in the search database. JanusMatrix was used to search human peptides that are predicted to bind to the same allele as the RSV epitope and share TCR-facing contacts with the RSV epitope.

Predicted RSV T cell epitope landscapes

To investigate the evolution of RSV T cell immunity profiles, we used a multidimensional scaling (MDS) approach to visualize the T cell immunity profile of multiple RSV strains on a landscape. We performed a T cell epitope content pairwise comparison between RSV strains using in silico predicted peptide-HLA allele binding affinity. The pairwise T cell epitope distances were then calculated using the algorithm reported in this study (Eq 1). We then applied MDS to these estimated pairwise T cell epitope distances to map RSV strains onto a landscape that characterizes their T cell immunity profile. We found that both class I and class II T cell immunity profiles of F and G proteins of different RSV strains were clustered into groups on these T cell epitope landscapes (S4 Fig). Combining the class I and class II T cell epitope binding profiles, RSV-A major surface protein isolates can be divided into three clusters and RSV-B major surface protein isolates can be divided into two clusters (Figs 2 and S5). We observe that the G gene sequence isolates that contain 72-nt (RSV-A) or 60-nt (RSV-B) duplications clustered together with other sequences instead of forming isolated groups. To further investigate the T cell epitope diversity, we correlated this clustering pattern with the phylogenetic histories (Fig 2B). The phylogenetic tree topologies of the RSV-A F gene and G gene are similar. The F gene cluster 1 is paraphyletic, while clusters 2 and 3 are monophyletic. Cluster 1 is the closest to the ancestral sequence, and mapping this group onto the phylogeny shows that this cluster has a basal relationship with clusters 2 and 3, indicating that the phylogenetic divergence occurred prior to epitope drift. The RSV-B F and G gene genealogies are very different. In particular, the RSV-B F gene topology is indicative of strong immune selection, similar to phylogenies observed for human influenza A virus or within-host HIV [28]. In contrast, the RSV-B G gene phylogeny shows the co-circulation of multiple lineages, though this could reflect the sequencing bias of G genes (Fig 2B). We then calculated the T cell epitope immune distance of each strain from a reconstructed ancestral sequence (Fig 2C). These distances were then plotted against the year of isolation and colored according to the cluster identified in Fig 2A. RSV-A shows that multiple predicted immune phenotypes co-circulate and persist for long periods (>2 decades). Analysis of RSV-B shows a turnover of the predicted immune phenotypes with short periods of co-circulation (<5 years) for F and G protein T cell epitopes. The limited periods of co-circulation are again consistent with phenotype patterns observed for viruses under strong immune selection (e.g., H3N2 influenza A virus) [29,30]. In contrast, genetic distances from the reconstructed ancestral sequence plotted against year of isolation show patterns typical of gradual genetic drift, except in the G gene where a 72-nt or 60-nt insertion is present (Fig 2D). Taken together, these results suggest that genetic diversity and predicted T cell epitope immune diversity are different, which may be an important factor to consider when evaluating RSV vaccine efficacy.
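The mapping from a pairwise distance matrix to landscape coordinates can be sketched with classical (Torgerson) MDS. The text does not state which MDS variant or software was used, so classical MDS is an assumption here, and the distance matrix is taken as given (the Eq 1 epitope distance is not reproduced). The sketch uses only the standard library, with power iteration and deflation standing in for a full eigendecomposition.

```python
import math


def classical_mds(dist, dims=2, iters=500):
    """Classical (Torgerson) MDS for a symmetric distance matrix (nested lists)."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double-centred Gram matrix: B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Deterministic, non-constant start vector for power iteration
        v = [math.sin(i + 1 + d) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining matrix is numerically zero
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the signed eigenvalue associated with v
        lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        if lam > 1e-9:  # only positive eigenvalues yield real coordinates
            for i in range(n):
                coords[i][d] = v[i] * math.sqrt(lam)
        # Deflate so the next pass converges to the next eigenpair
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords
```

When the input distances are Euclidean, the embedded points reproduce them up to rotation and reflection; for non-Euclidean distances such as an epitope content distance, the two-dimensional map is an approximation, and clusters on it could then be assigned with a method such as k-means.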

Fig 2. Predicted T cell epitope landscapes and genetic evolution of RSV surface proteins.

Filled circles indicate RSV F protein isolates or G protein isolates without duplication. Diamonds indicate G protein isolates with gene duplication. (A) Epitope landscapes of RSV major surface proteins are built from MHC class I and class II epitope content comparison across different strains. T cell immunity clusters are determined with the k-means method and are used to color the sequenced isolates in the following panels. (B) The corresponding time-scaled phylogenies are reconstructed with the Maximum Likelihood (ML) approach. (C) T cell epitope immune distance and (D) genetic Hamming distance from the estimated TMRCA are plotted against the isolation time of each sequence.

There are multiple methods available to predict T cell epitopes [31], which may result in different reconstructed landscapes if there is a systematic bias in the prediction method. We used the NetMHCpan method [32] to predict T cell epitopes and performed the same landscape reconstruction using MHC class I binding predictions for RSV-A F protein. Our analysis showed a consistent clustered pattern of the RSV T cell epitope profile on the landscape regardless of the T cell epitope prediction method (S7 Fig).

Assessment of vaccine candidate strains with T cell epitope content

T cell epitopes that are similar between vaccine strains and wild strains (cross-conserved T cell epitopes, which are defined as epitopes that share identical T cell receptor-facing residues and are restricted by the same alleles [33]) may be responsible for the T cell immune protection of the vaccine. To quantitatively evaluate whether it might be necessary to include multiple RSV strains to prepare an effective vaccine, we included two live attenuated RSV strains that were previously considered as vaccine candidates: CP248, a recombinant virus that belongs to subtype A, and CP52, a recombinant RSV-B strain. We evaluated their T cell epitope conservation with different RSV wild-type strains. We calculated the average proportion of cross-conserved T cell epitope content between the selected vaccine strains and wild-circulating strains from different isolation years and WHO regional groups (Figs 3 and S8). Different proportions of cross-conserved T cell epitope content against isolates from the two subtypes, A and B, were observed in both the F and G protein analyses. In the comparison of the vaccine strains and wild strains belonging to the same subtype, the proportion of cross-conserved T cell epitopes in RSV F protein is relatively stable across groups: all values are higher than 78% for RSV-A and higher than 85% for RSV-B. In contrast, changes in the proportion of cross-conserved T cell epitopes were detected among groups within the same RSV subtype, especially among temporal groups in the G protein analysis (Fig 3). Vaccine strain CP248 appears to have a relatively higher proportion of cross-conserved T cell epitopes within G protein when compared to the RSV-A strains that were isolated before 1991 (> 70%) and a relatively lower degree of conservation against recently isolated strains. A similar decrease in T cell epitope conservation with time was identified for vaccine strain CP52 among circulating RSV-B strains.
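The definition of cross-conservation given above (identical TCR-facing residues, same restricting allele) can be sketched as a simple set comparison. This is an illustration, not the authors' calculation: the choice of TCR-facing positions (zero-based 3 to 7 of a class I nine-mer) is an assumption, epitopes are represented as hypothetical `(peptide, allele)` pairs, and the proportion is taken relative to the vaccine strain's epitopes.

```python
# Assumption: positions 4-8 (one-based) of a class I nine-mer face the TCR.
TCR_FACING_CLASS_I = (3, 4, 5, 6, 7)


def tcr_face(peptide, positions=TCR_FACING_CLASS_I):
    """Residues of a nine-mer at the assumed TCR-facing positions."""
    return "".join(peptide[p] for p in positions)


def cross_conserved_fraction(vaccine_epitopes, wild_epitopes,
                             positions=TCR_FACING_CLASS_I):
    """Fraction of vaccine epitopes with a cross-conserved partner in a wild strain.

    Each epitope is a (peptide, allele) pair. Two epitopes are treated as
    cross-conserved when they share the restricting allele and have identical
    residues at the TCR-facing positions.
    """
    if not vaccine_epitopes:
        return 0.0
    wild_keys = {(allele, tcr_face(pep, positions)) for pep, allele in wild_epitopes}
    hits = sum((allele, tcr_face(pep, positions)) in wild_keys
               for pep, allele in vaccine_epitopes)
    return hits / len(vaccine_epitopes)
```

Averaging this fraction over the wild strains in a given isolation-year or regional group would give one value per group, which is the kind of quantity a radar plot can display.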

Fig 3. Evaluation of previously used RSV vaccine candidate strains with T cell epitope content of circulating strains.

RSV-A and RSV-B major surface protein sequences are subsampled and then grouped by isolation year and by the 6 WHO regions of isolation: African Region (AFRO), Region of the Americas (PAHO), South-East Asia Region (SEARO), European Region (EURO), Eastern Mediterranean Region (EMRO) and Western Pacific Region (WPRO). The proportion of cross-conserved T cell epitope content between live attenuated strains (CP248 or CP52) and wild circulating strains is displayed as radar plots.

Discussion

Although both CD4 and CD8 T cells contribute to protection against RSV-induced disease following primary infection [16,34], T cell epitopes have received limited attention in the RSV research effort. Using a computational approach, we demonstrate that RSV surface proteins appear to have significant potential to drive T cell immunity, based on their T cell epitope density scores as determined by MHC binding prediction. The relatively high putative T cell epitope density might make F protein a good target for an RSV vaccine. In addition to the analysis of T cell epitope density and distribution in RSV major surface proteins, we also demonstrated lineage-specific variations in T cell epitope content. Even though RSV F protein is believed to be well conserved, epitope mutations are observed across different lineages within the F protein, suggesting that studying the lineage-specific T cell epitopes in RSV can provide insight into the impact of immune selection on viral diversity and persistence. Although RSV G protein, in contrast to the conserved F protein, is reported to be highly variable, we still observed potentially conserved T cell epitopes across different strains, which suggests that conserved domains of the G protein are of interest as potential vaccine targets for T cell immune protection. While experimental validation is needed, this analysis highlights the importance of understanding population-level epitope conservation, as it may provide important insight into the development of T cell epitope-driven vaccines against RSV infection.

A major focus of our work is the development of a sequence-based method to map the evolution of T cell immunity across different strains. Following a pivotal earlier study that used the MDS method to map the evolutionary adaptation of influenza A virus induced by CD8 T cells using the presence and absence of MHC class I epitopes [35], we constructed RSV T cell immunity landscapes using immune distances that were generated by T cell epitope cross-conservation analyses, which allows for easy visualization and intuitive understanding of potential T cell immunity relationships among different strains. When comparing across strains, we found that the T cell epitope content of RSV surface proteins from different strains can be clustered, as has been observed for the antigenic relationships reported in other pathogens [36,37]. Our results also demonstrate the correspondence between RSV T cell immunity clusters and their corresponding phylogeny, with sequences in the same clade generally belonging to the same T cell immune cluster. Importantly, we also observe different patterns of T cell epitope evolution of RSV wild strains compared with their genetic evolution. We find that RSV strains with G gene duplications can still cluster with previous RSV isolates in the T cell epitope MDS space. Our results highlight the importance of characterizing T cell epitope changes in RSV.

We identified highly conserved RSV T cell epitopes in this study, some of which have already been experimentally validated and published in the IEDB database. However, we also identified several other conserved T cell epitopes that have not been previously described. These may be valuable for vaccine design, although experimental validation will be needed. Furthermore, the homology of selected RSV epitopes to human peptides suggests that some predicted RSV T cell epitopes might be tolerated by the human immune system, or could induce a harmful cross-reactive immune response against human proteins when administered with an adjuvant [27]. Certain aspects of immunity to RSV were not addressed by this study. For example, neutralizing antibody responses are currently considered to be the most important correlate of immunity. While neutralizing antibodies would not directly be elicited by a T cell epitope-driven vaccine, helper (CD4) T cell epitopes are required to generate high affinity, high specificity antibodies. We also note that we have limited our focus on the two major RSV surface proteins in our current analysis, but other RSV proteins like N, M, or M2-2 proteins might also contribute to vaccine efficacy [38].

An effective vaccine against variable viruses should contain T cell epitopes that are highly conserved among circulating strains [39]. Vaccine efficacy can be diminished if T cell epitopes in a vaccine strain do not match those of newly emerging strains. In this study, we used an immunoinformatic-based approach to estimate cross-conserved T cell epitope content between two live attenuated vaccine candidate strains and RSV circulating wild strains. We found that there was a low proportion of cross-conserved T cell epitope content with vaccine strains that belonged to different antigenic groups, which indicates the risk of using a single-subtype strain in RSV vaccines. In addition, we observed a lower proportion of cross-conserved G-protein T cell epitope content between vaccine strains and recent circulating strains in the same antigenic group, which suggests that including T cell epitopes from different strains in the same antigenic group might also be important for RSV vaccine development. Although we did not observe a significant change in cross-conserved T cell epitope content in F protein, we cannot rule out the possibility that variation of F protein in the future could render a single-strain-based vaccine less effective. Our current analysis is based on reduced datasets due to the heavy computational capacity required to perform epitope content comparison. We constructed these representative datasets by randomly subsampling the complete datasets according to geographical region and isolation year. Our findings may reflect the T cell epitope diversity of publicly available RSV strains; however, additional RSV surveillance efforts may be required to get a full picture of the T cell epitope variability of RSV.
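The subsampling by geographical region and isolation year can be sketched as stratified random sampling. The stratum size, the seed, and the record fields (`region`, `year`) are hypothetical; the text does not report how many sequences were drawn per group.

```python
import random
from collections import defaultdict


def stratified_subsample(records, per_stratum, seed=0):
    """Randomly keep up to `per_stratum` records from each (region, year) group.

    records: list of dicts with at least 'region' and 'year' keys
             (hypothetical metadata layout).
    seed:    fixed seed so that the subsample is reproducible.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[(rec["region"], rec["year"])].append(rec)
    sample = []
    for key in sorted(strata):  # sorted for a deterministic group order
        group = strata[key]
        k = min(per_stratum, len(group))
        sample.extend(rng.sample(group, k))
    return sample
```

Capping each stratum rather than sampling the pooled dataset keeps sparsely sequenced regions and years represented, at the cost of no longer reflecting the raw sampling intensity.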

Our current study is limited by the lack of experimental validation of the T cell epitope predictions. We focus on computationally predicted MHC binding to identify T cell epitopes. Although the strength of MHC binding is a key parameter that determines a peptide’s immunogenicity, it is not sufficient for a molecule to be immunogenic [40]. Other aspects associated with the pathogen-induced T cell immune response, such as appropriate antigen processing [41] and T cell receptor recognition [40], are not considered in this study, which might bias the computationally derived T cell epitope landscapes. However, the observed clustered pattern of RSV surface proteins on the T cell epitope landscape in this study reflects the diversity of T cell epitopes within different strains. This finding provides valuable insights into virus evolution with respect to T cell immunity and can contribute to strain selection for vaccine design.

Overall, this study provides a focused analysis of T cell epitopes in RSV major surface proteins using computational tools. We performed a comprehensive T cell epitope prediction for RSV showing the immunological relationship of T cell epitopes in RSV surface proteins. This study demonstrates that T cell epitope evolution may differ from genetic variation and provides a framework for developing an integrated epitope-based RSV vaccine and evaluation methods that could be used to optimize vaccination strategies.

Materials and methods

Dataset

RSV GenBank record files were retrieved from NCBI’s GenBank nucleotide database (https://www.ncbi.nlm.nih.gov/nucleotide/) using the search term “HRSVA” or “HRSVB” on June 22, 2020. F and G gene nucleotide sequences and metadata, including country of isolation and collection date, were extracted using customized Python scripts. Genotype assignments were made with the program “LABEL”, using a customized RSV module [42,43]. Countries of isolation were grouped into 6 WHO regions: African Region, Region of the Americas, South-East Asia Region, European Region, Eastern Mediterranean Region, and Western Pacific Region [44]. The following inclusion and exclusion criteria were applied: (i) each sequence needed to have a known geographic location and year of isolation, (ii) each sequence had to be at least 80% of the complete gene sequence in length, (iii) identical sequences from the same country of isolation were removed, and (iv) vaccine-derived and recombinant sequences were removed. Using these criteria, comprehensive datasets of RSV F and G genes were defined (RSV-A F gene = 1010, RSV-B F gene = 894, RSV-A G gene = 1488, RSV-B G gene = 1120). Nucleotide sequences from each dataset were aligned using MAFFT.v7 [45] and were translated into amino acids using EMBOSS.v6.6.0 [46] for immunoinformatic analyses. In addition, two artificial sequences, CP248 and CP52 (cold passage live RSV strains that were previously evaluated as vaccine candidates, Accession No: U63644, AF0132551 respectively), were downloaded from NCBI’s GenBank nucleotide database [47].
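The four inclusion and exclusion criteria can be illustrated with a minimal Python sketch. This is not the authors' script: the `Record` fields, the `filter_records` function, and the choice to keep the first occurrence of each duplicated sequence within a country are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical container for one GenBank-derived gene sequence."""
    accession: str
    sequence: str
    country: Optional[str]
    year: Optional[int]
    is_vaccine_or_recombinant: bool = False


def filter_records(records: List[Record], full_gene_length: int,
                   min_fraction: float = 0.8) -> List[Record]:
    """Apply criteria (i) to (iv) in a single pass over the records."""
    kept = []
    seen = set()  # (sequence, country) pairs already retained
    for rec in records:
        # (i) known location and year of isolation
        if rec.country is None or rec.year is None:
            continue
        # (ii) at least 80% of the complete gene length
        if len(rec.sequence) < min_fraction * full_gene_length:
            continue
        # (iv) drop vaccine-derived and recombinant sequences
        if rec.is_vaccine_or_recombinant:
            continue
        # (iii) drop identical sequences from the same country
        # (assumption: the first occurrence is the one retained)
        key = (rec.sequence, rec.country)
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept
```

In this sketch an identical sequence isolated in a different country is retained, which follows the wording of criterion (iii).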

Phylogenetic inference

The nucleotide sequences of RSV major surface proteins were used to reconstruct the maximum-likelihood (ML) phylogeny of RSV using RAxML.v8 with the GTR+GAMMA substitution model [48]. The best-scoring ML tree was automatically generated from five runs by RAxML. Time-scaled phylogenies were further reconstructed from the best-scoring ML trees using the program “TreeTime” [49]. The phylogenies were visualized with the R package “ggtree” [50].

T cell epitope prediction

RSV major surface protein sequences were scored for binding potential against a globally representative panel of Human Leukocyte Antigen (HLA) class I and class II alleles using the EpiMatrix algorithm. This algorithm as well as the ClustiMer, JanusMatrix, and EpiCC algorithms discussed below are part of the iVAX toolkit developed by EpiVax, which is available for use under a license or through academic collaborations [25].

Evaluation of class I epitopes was made based on predictions for four HLA-A and two HLA-B supertype alleles: A*01:01, A*02:01, A*03:01, A*24:02, B*07:02, B*44:03. Class II epitopes were identified for nine HLA-DR supertype alleles: DRB1*01:01, DRB1*03:01, DRB1*04:01, DRB1*07:01, DRB1*08:01, DRB1*09:01, DRB1*11:01, DRB1*13:01, and DRB1*15:01. These are HLA allele supertypes (alleles sharing common binding preferences) that cover the genetic diversity of more than 95% of human populations globally [51,52]. EpiMatrix parsed 9-mer sequence frames (each one overlapping the previous one by eight amino acids) from the antigen sequence and assigned a score for each nine-mer/allele pair on a normalized Z distribution. Nine-mer sequences that had Z-scores of at least 1.64 are considered to be in the top 5% of any randomly generated set of 9-mer sequences and to have a high likelihood of binding to HLA molecules and being presented to T cells. Sequences that score above 2.32 on the Z-scale (top 1%) are extremely likely to bind to a particular HLA allele and to be immunogenic. For this analysis, HLA-class I restricted 9-mer sequences that had top 1% binder scores to at least one HLA class I supertype allele were considered to be putative class I epitopes [25].
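The frame parsing and the Z-score cutoffs described above can be sketched in Python. EpiMatrix itself is a licensed tool and its scoring is not reproduced here; the function names `nine_mer_frames` and `classify_z` are hypothetical, and the sketch only shows how overlapping 9-mer frames are enumerated and how the 1.64 and 2.32 thresholds relate to the upper tail of a standard normal distribution.

```python
from statistics import NormalDist


def nine_mer_frames(protein: str, k: int = 9) -> list:
    """Return every k-mer frame; each frame overlaps the previous one by k-1 residues."""
    return [protein[i:i + k] for i in range(len(protein) - k + 1)]


def classify_z(z: float) -> str:
    """Label a normalized binding score with the thresholds quoted in the text."""
    if z > 2.32:
        return "top 1%"
    if z >= 1.64:
        return "top 5%"
    return "below threshold"


# One-sided cutoffs of a standard normal distribution, for comparison
# with the rounded values 1.64 and 2.32 used in the text.
Z_TOP5 = NormalDist().inv_cdf(0.95)
Z_TOP1 = NormalDist().inv_cdf(0.99)
```

A protein of length L yields L - 8 frames, so an 11-residue toy sequence produces three 9-mers.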

To identify putative class II epitopes, we used an algorithm called ClustiMer [25] to screen EpiMatrix scoring results for the nine class II alleles. ClustiMer identifies contiguous regions of 15–30 amino acids that have a high density of MHC class II binding potential. Epitope density within a cluster is reported as an EpiMatrix Cluster Score, where scores of 10 and above are likely to be recognized in the context of multiple class II alleles and to be high-quality class II epitopes.

Identification of cross-conservation between putative RSV epitopes and human peptides

We also applied an analysis of human homology in this study. After identifying putative T cell epitope sequences in RSV major surface proteins, the JanusMatrix algorithm [27] was used to assess the potential cross-conservation of T cell epitopes with epitopes restricted by the same HLA alleles in the human proteome (UniProt-sourced human proteins [53]). JanusMatrix scans input peptides and takes the 9-mer epitope regions identified by EpiMatrix to find human peptides with a compatible HLA-facing agretope (i.e. the agretopes of both the input peptide and its human counterpart are predicted to bind to the same HLA allele) and the same TCR-facing epitope; from these matches it computes a JanusMatrix Human Homology Score. As defined in retrospective studies, foreign class I epitopes that score greater than 2 and class II epitopes that score greater than 5 may be less immunogenic due to T cell tolerance [25].

Protein-level T cell immunogenic potential evaluation

RSV reference sequences (RSV-A: NC_038235, RSV-B: NC_001781) were downloaded from the NCBI RefSeq database and were used to evaluate the protein-level immunogenic potential of RSV major surface proteins. The protein-level immunogenic potential, represented by the EpiMatrix-defined T cell epitope density score, was computed by summing the top 5% binder scores across HLA alleles and normalizing for a 1000-amino acid protein length. Zero on this scale is set to indicate the average number of top 5% binders that would be observed in 10,000 random protein sequences with natural amino acid frequencies. Proteins scoring above +20 have been observed to have significant immunogenic potential [54]. Fully human proteins generally score lower than zero on the EpiMatrix immunogenicity scale.

To investigate the distribution of T cell immunogenic potential across RSV protein sequence regions, we summed up the binding scores of HLA alleles for each nine-mer frame, to get a frame-specific immunogenic potential score and standardized this score to a relative scale. The relative immunogenic potential across protein structure was represented by a color scale and the visualization of F protein structure was built with PyMOL Molecular Graphics System, Version 2.0 (Schrödinger, LLC). Protein data bank (PDB) files 5UDE [55] and 3RRR [56] were used for the pre-fusion and post-fusion forms.

Subsampling strategy

Considering the heavy computational load that would be required to evaluate all available RSV sequences and to correct the overrepresentation of recently sampled strains, the comparative analysis for T cell epitope content was conducted with datasets in which overrepresented groups were reduced. A maximum of five sequences of each isolation year from different WHO region groups were subsampled randomly from the original datasets (RSV-A F gene = 402, RSV-B F gene = 319, RSV-A G gene = 390, RSV-B G gene = 359).
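A minimal sketch of this subsampling step is given below. It assumes that "a maximum of five sequences of each isolation year from different WHO region groups" means at most five sequences per (year, WHO region) pair; the record keys `year` and `who_region`, the function name `subsample`, and the fixed random seed are hypothetical choices, not taken from the authors' code.

```python
import random
from collections import defaultdict


def subsample(records, max_per_group=5, seed=0):
    """Keep at most `max_per_group` randomly chosen records per (year, WHO region)."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["year"], rec["who_region"])].append(rec)

    sampled = []
    for key in sorted(groups):  # deterministic group order
        members = groups[key]
        if len(members) > max_per_group:
            members = rng.sample(members, max_per_group)
        sampled.extend(members)
    return sampled
```

Groups with five or fewer sequences are kept in full, so sparsely sampled years and regions are not thinned further.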

T cell epitope content comparison

The Epitope Content Comparison (EpiCC) algorithm, which is implemented in iVAX, was used to compare T cell epitope content within each subsampled dataset by evaluating the content of cross-conserved T cell epitopes (9-mer peptides that have identical TCR-facing residues and are predicted to bind to the same MHC allele) between different virus strains [33]. We reasoned that epitopes with identical T cell receptor-facing residues (TCRf; positions 4, 5, 6, 7, and 8 of the class I epitope binding core and positions 2, 3, 5, 7, and 8 of the class II epitope binding core) that are also predicted to bind to the same MHC allele, regardless of differences in their MHC-facing (MHCf) amino acids, are more likely to induce cross-reactive memory T cells; these epitopes are called cross-conserved T cell epitopes. To simplify the analysis, the binding of 9-mer epitopes within protein sequences is assumed to be mutually exclusive and uniform, which means that the T cell immune response to the antigen protein can be represented by summing over all T cell epitopes within the protein sequence.
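The definition of a cross-conserved epitope can be made concrete with a short Python sketch. The TCR-facing positions are those stated above; the helper names `tcr_face` and `is_cross_conserved`, and the representation of predicted binders as sets of allele names, are assumptions for illustration and do not describe the EpiCC implementation.

```python
# TCR-facing positions (1-based) within the 9-mer binding core, as stated in the text.
CLASS_I_TCRF = (4, 5, 6, 7, 8)
CLASS_II_TCRF = (2, 3, 5, 7, 8)


def tcr_face(nine_mer: str, positions) -> str:
    """Return the residues at the TCR-facing positions of a 9-mer."""
    if len(nine_mer) != 9:
        raise ValueError("expected a 9-mer peptide")
    return "".join(nine_mer[p - 1] for p in positions)


def is_cross_conserved(pep1: str, pep2: str, positions,
                       alleles1: set, alleles2: set) -> bool:
    """True if both peptides share a TCR face and at least one predicted allele.

    `alleles1` and `alleles2` are the sets of alleles each peptide is
    predicted to bind (hypothetical inputs from any binding predictor).
    """
    same_face = tcr_face(pep1, positions) == tcr_face(pep2, positions)
    return same_face and bool(alleles1 & alleles2)
```

Two peptides that differ only at MHC-facing positions, for example positions 1, 2, 3, and 9 of a class I core, therefore count as cross-conserved as long as both are predicted to bind a common allele.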

We use u to represent 9-mer peptides with similar MHCf capable of binding the same MHC alleles but bearing different TCRf (non-cross-conserved T cell epitopes) in two wild circulating strains (w1 and w2). Because the T cell immune response to a virus is directly related to its T cell epitope content, the T cell immune distance (D) between two strains can be represented by the sum of the binding probabilities of these unique 9-mer peptides for a set of HLA alleles (Eq 1). Here, p(u)_a is the predicted binding probability of the unique 9-mer peptide u against a single class I or class II allele a that is a member of A, the set of HLA alleles.

$$D(w_1, w_2) = \sum_{u \in (w_1, w_2)} \sum_{a \in A} p(u)_a \tag{1}$$
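A simplified reading of Eq 1 can be sketched in Python. In this sketch each strain is a dictionary mapping a 9-mer to its per-allele binding probabilities, and a peptide counts as unique for an allele when the other strain has no peptide with the same TCR face that binds that allele. This data layout, the function names, and the omission of any check on MHCf similarity are assumptions; the sketch is not the EpiCC algorithm.

```python
def tcr_face(nine_mer, positions=(4, 5, 6, 7, 8)):
    """Residues at the TCR-facing positions (class I positions by default)."""
    return "".join(nine_mer[p - 1] for p in positions)


def _faces(strain, positions):
    """Set of (TCR face, allele) pairs with nonzero binding probability."""
    return {
        (tcr_face(pep, positions), allele)
        for pep, probs in strain.items()
        for allele, p in probs.items()
        if p > 0
    }


def immune_distance(w1, w2, positions=(4, 5, 6, 7, 8)):
    """Sum p(u)_a over peptides u that are not cross-conserved between w1 and w2."""
    faces1, faces2 = _faces(w1, positions), _faces(w2, positions)
    total = 0.0
    for strain, other_faces in ((w1, faces2), (w2, faces1)):
        for pep, probs in strain.items():
            for allele, p in probs.items():
                if p > 0 and (tcr_face(pep, positions), allele) not in other_faces:
                    total += p
    return total
```

Under these assumptions the distance of a strain to itself is zero and the measure is symmetric, which is what a pairwise distance matrix requires.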

Since the calculation of T cell epitope immune distance relies on the predicted epitope binding affinity, we used another T cell epitope prediction tool to evaluate the T cell epitope immune distance generated by the EpiCC algorithm. We applied Eq (1) to re-calculate the T cell epitope immune distance with customized Python scripts (available at https://github.com/JianiC/RSV_Epitope/tree/master/NetMHCpan_reproduce) using MHC binding prediction results generated by a publicly available T cell epitope prediction tool, the NetMHCpan EL 4.1 method in the Immune Epitope Database (IEDB) [57]. Eigenvalues for each sequence, calculated from the pairwise distance matrix with the “RSpectra” package, were used to statistically examine the correlation between the epitope distances computed by the two methods, and a Pearson correlation test was used to test the correlation hypothesis.

The capacity of a vaccine to induce a T cell immune response that could be recalled by a wild circulating strain is related to the cross-conservation of T cell epitopes between the vaccine strain (v) and the wild circulating strain (w). For each pair of 9-mer peptides i (from strain v) and j (from strain w) that are cross-conserved (i.e. bearing identical residues that face the TCR), the probability of recalling cross-reactive T cell memory by those two 9-mer peptides via a single HLA allele a can be represented by the joint binding probability of the two 9-mer peptides (p(i)_a * p(j)_a).

Therefore, a T cell epitope similarity score (S) between two sequences can be represented by summing the probability of recalling cross-reactive memory T cells over all paired 9-mer peptides that are cross-conserved between the vaccine strain (v) and the wild circulating strain (w) against a set of HLA alleles A (Eq 2.1).

$$S(v, w) = \sum_{i \in v,\, j \in w} \sum_{a \in A} \left( p(i)_a \cdot p(j)_a \right) \tag{2.1}$$

We further normalized the T cell epitope similarity score between the vaccine strain and wild circulating strain by the maximum T cell epitope similarity score for the vaccine strain in comparison with itself (Eq 2.2):

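$$P(v, w) = \frac{\sum_{i \in v,\, j \in w} \sum_{a \in A} \left( p(i)_a \cdot p(j)_a \right)}{\sum_{i \in v,\, j \in v} \sum_{a \in A} \left( p(i)_a \cdot p(j)_a \right)} \tag{2.2}$$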
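Eqs 2.1 and 2.2 can be illustrated with the following Python sketch. As an assumption for illustration, each strain is a dictionary mapping a 9-mer to its per-allele binding probabilities, and a pair (i, j) contributes only when the two peptides share the same TCR face. The function names are hypothetical and the sketch is not the EpiCC implementation.

```python
def tcr_face(nine_mer, positions=(4, 5, 6, 7, 8)):
    """Residues at the TCR-facing positions (class I positions by default)."""
    return "".join(nine_mer[p - 1] for p in positions)


def similarity(v, w, positions=(4, 5, 6, 7, 8)):
    """Eq 2.1: sum of p(i)_a * p(j)_a over cross-conserved pairs and shared alleles."""
    total = 0.0
    for pep_i, probs_i in v.items():
        for pep_j, probs_j in w.items():
            if tcr_face(pep_i, positions) != tcr_face(pep_j, positions):
                continue  # not cross-conserved: no contribution
            for allele in probs_i.keys() & probs_j.keys():
                total += probs_i[allele] * probs_j[allele]
    return total


def conserved_proportion(v, w, positions=(4, 5, 6, 7, 8)):
    """Eq 2.2: similarity to w normalized by the vaccine strain's self-similarity."""
    self_score = similarity(v, v, positions)
    if self_score == 0:
        raise ValueError("vaccine strain has no predicted epitope content")
    return similarity(v, w, positions) / self_score
```

With this normalization a wild strain identical to the vaccine strain scores 1, and the score falls toward 0 as fewer of the vaccine strain's epitopes are cross-conserved in the wild strain.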

Dimension reduction

The equation to calculate T cell epitope immune distance was applied iteratively to the subsampled dataset, so that the pairwise T cell epitope immune distances were structured into an n × n square distance matrix. Given that each protein is described by its relative distance to the remaining n-1 proteins, the data must be dimensionally reduced to be graphed. Classic (metric) multidimensional scaling (MDS) can be used to preserve the distances between a set of observations in a way that allows the distances to be represented in a two-dimensional space. MDS was performed as previously described by Gower [58]. The MDS method first constructs an n-dimensional Euclidean space using the distance matrix in which all distances are conserved, and then principal component analysis is performed. MDS [59] was carried out using the cmdscale function in R [58]. K-means clustering was performed using the kmeans function in base R. Due to the lack of previous characterizations of RSV T cell immunity clusters, the number of T cell immunity groups was determined from the within-cluster sum of squares (wss) using the Elbow method [60]. To evaluate whether applying k-means clustering to classify RSV strains in two-dimensional space can reflect their T cell epitope profile, we calculated the stress of the MDS using the smacof package in R [61]. We also compared the performance of k-means clustering on MDS spaces with different numbers of dimensions (S6 Fig).
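Classical MDS can be sketched in a few lines of Python with NumPy. The study itself used cmdscale in R; the function below is only an illustrative equivalent of the standard procedure (double-centering the squared distance matrix, then eigendecomposition), and the name `classical_mds` is hypothetical.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Embed an n x n distance matrix into `n_components` dimensions."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances: B = -1/2 * J * D^2 * J
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # B is symmetric, so eigh returns real eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip small negative eigenvalues that arise from non-Euclidean input
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)
```

When the input distances are exactly Euclidean and the data are at most two-dimensional, the two-dimensional embedding reproduces the pairwise distances up to rotation and reflection; for epitope distances the embedding is an approximation, which is why the stress is evaluated separately.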

Calculation of genetic Hamming distance

Genetic Hamming distance, which is defined as the number of bases by which two nucleotide sequences differ, was calculated by comparing the number of different bases between each pair of sequences in the subsampled datasets. The reconstructed most recent common ancestor (MRCA) sequences for each dataset (subsampled F and G protein sequences of subtype A and subtype B, respectively) were estimated using the program “TreeTime” and were used as the root in our analysis [49].
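The Hamming distance calculation can be sketched as follows. The sketch assumes that sequences come from the same alignment and therefore have equal length, and it counts a gap character like any other mismatch; how gaps and ambiguous bases were handled in the study is not stated, so that choice is an assumption. The function names are hypothetical.

```python
def hamming_distance(seq1: str, seq2: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2))


def pairwise_hamming(seqs):
    """Symmetric matrix of Hamming distances between all sequences."""
    n = len(seqs)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = hamming_distance(seqs[i], seqs[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

Including the reconstructed root sequence as one entry of `seqs` gives the distance of every sampled sequence from the root in the corresponding row of the matrix.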

Supporting information

S1 Text. Supplementary Materials.

Table A: Number of computationally predicted conserved RSV T cell epitopes and experimentally identified RSV T cell epitopes. Table B: Conservation of experimentally validated conserved MHC class I epitopes peptides in RSV major surface proteins in subsampled dataset. Table C: Conservation of experimentally validated conserved MHC class II epitopes peptides in RSV major surface proteins in subsampled dataset.

(DOCX)

S1 File. Accession numbers of the RSV sequences used in this study.

(CSV)

S1 Fig. Distribution and diversity of T cell epitopes in RSV F protein.

The tree panel on the left is a time-scaled phylogeny built with RSV-A (A) or RSV-B (B) F gene nucleotide sequences using the ML approach. Determined genotypes are labeled on the right with black bars. Each color column on the right side represents the presence of an MHC class I or class II epitope. Only the epitopes that are present in more than 1% of sampled isolates are displayed. The column color indicates different numbers of epitope sequences at the same location.

(TIF)

S2 Fig. Distribution and diversity of T cell epitopes in RSV G protein.

The tree panel on the left is a time-scaled phylogeny built with RSV-A (A) or RSV-B (B) G gene nucleotide sequences using the ML approach. The clades that contain the novel 72-nt or 60-nt duplication in the second hypervariable region of the G gene are highlighted in red. Determined genotypes are labeled on the right with black bars. Each color column on the right side represents the presence of an MHC class I or class II epitope. Only the epitopes that are present in more than 1% of sampled isolates are displayed. The column color indicates different numbers of epitope sequences at the same location.

(TIF)

S3 Fig. Distribution of JanusMatrix Human Homology score for putative RSV MHC class I and class II epitopes.

The cross-reactive potential between the identified putative T cell epitopes and the human host is represented by a JanusMatrix Human Homology score. Of the identified putative epitopes, 6.45% of class I epitopes and 1.12% of class II epitopes are cross-conserved on the TCR face with human peptides.

(TIF)

S4 Fig. Predicted T cell epitope landscapes of RSV surface proteins.

RSV T cell epitope landscapes were built with sequence-based MHC class I epitope binding prediction (left), MHC class II epitope binding prediction (middle), or combined class I and class II epitope binding prediction (right). Sequences are colored by the epitope cluster determined from the epitope landscapes built with combined class I and class II epitope prediction.

(TIF)

S5 Fig. Total within sum of squares (wss) using k-means algorithm.

Total within-cluster sums of squares in epitope topographies were calculated after clustering into k (from 1 to 10) groups with k-means. Using the Elbow method, the optimal number of clusters was determined to be 3 in the analysis of RSV-A F and G proteins and 2 in the analysis of RSV-B F and G proteins.

(TIF)

S6 Fig. Sensitivity analysis for MDS and k-means clustering.

(A) Stress evaluation under different numbers of dimensions for the RSV distance matrix. Stress less than 0.15 (red dashed line) indicates an acceptably precise MDS solution. (B) Performance of k-means clustering under different numbers of dimensions; the number of clusters was determined in two-dimensional space. There is no difference in cluster grouping in higher-dimensional space (orange). The ratio of the between-cluster sum of squares to the total sum of squares (BSS/TSS) indicates that the total variance in the data is explained well in higher-dimensional space (green).

(TIF)

S7 Fig. Validation of T cell epitope distance estimation using the IEDB analysis resource.

Validation was performed with MHC class I epitope binding prediction for the RSV-A F protein. (A) Heatmaps of pairwise MHC class I epitope distances estimated with the iVAX toolkit or calculated with custom Python scripts using the MHC class I binding prediction implemented in IEDB. (B) Eigenvalues for each sequence were calculated from the pairwise distance matrices using the “RSpectra” package in R. The Pearson correlation test significantly supports a non-zero correlation between the T cell epitope distance estimated with EpiCC and the T cell epitope distance estimated with IEDB. (C) T cell epitope topographies were built with pairwise epitope distances estimated with EpiCC or IEDB. Both methods resulted in a similar cluster pattern for the CD8 T cell epitope profile of the RSV-A F protein.

(TIF)

S8 Fig. Evaluation of RSV vaccine candidate strains with class I and class II T cell epitope content in different WHO regions.

RSV-A and RSV-B major surface protein sequences were grouped by isolation year and by the six WHO regions of isolation: African Region (AFRO), Region of the Americas (PAHO), South-East Asia Region (SEARO), European Region (EURO), Eastern Mediterranean Region (EMRO), and Western Pacific Region (WPRO). The proportions of cross-conserved T cell epitope content between vaccine strains (CP248 or CP52) and wild circulating strains in different isolation years and different WHO regions are shown.

(TIF)

Data Availability

Data and materials availability: Sequence data were retrieved from NCBI’s GenBank nucleotide database (https://www.ncbi.nlm.nih.gov/nucleotide/). Accession numbers of the RSV sequences used in this study are available in the supplementary materials. Code to generate the T cell epitope landscapes is deposited on GitHub (https://github.com/JianiC/RSV_Epitope).

Funding Statement

This work was supported by the National Institute of Allergy and Infectious Diseases, a component of the NIH, Department of Health and Human Services, under Contract No. 75N93019C00052 (J.C. and J.B.) and Contract No. 75N93021C00018 (NIAID Centers of Excellence for Influenza Research and Response, CEIRR) (J.B.). J.B. and S.T. are funded in part by the Centers for Disease Control under Contract Nos. 75D30119C06826 and 75D30121C11990. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Crowe JE Jr., Williams JV. Paramyxoviruses: Respiratory Syncytial Virus and Human Metapneumovirus: Viral Infections of Humans. 2014. Feb 27:601–27. doi: 10.1007/978-1-4899-7448-8_26.
  • 2. Yun MR, Kim AR, Lee HS, Kim DW, Lee WJ, Kim K, et al. Complete genome sequences of human respiratory syncytial virus genotype A and B isolates from South Korea. Genome Announc. 2015;3(2). Epub 2015/04/25. doi: 10.1128/genomeA.00332-15; PubMed Central PMCID: PMC4408341.
  • 3. McLellan JS, Ray WC, Peeples ME. Structure and function of respiratory syncytial virus surface glycoproteins. Curr Top Microbiol Immunol. 2013;372:83–104. Epub 2013/12/24. doi: 10.1007/978-3-642-38919-1_4; PubMed Central PMCID: PMC4211642.
  • 4. Lee J, Klenow L, Coyle EM, Golding H, Khurana S. Protective antigenic sites in respiratory syncytial virus G attachment protein outside the central conserved and cysteine noose domains. PLOS Pathogens. 2018;14(8):e1007262. doi: 10.1371/journal.ppat.1007262.
  • 5. Updated guidance for palivizumab prophylaxis among infants and young children at increased risk of hospitalization for respiratory syncytial virus infection. Pediatrics. 2014;134(2):e620–38. Epub 2014/07/30. doi: 10.1542/peds.2014-1666.
  • 6. Palivizumab, a humanized respiratory syncytial virus monoclonal antibody, reduces hospitalization from respiratory syncytial virus infection in high-risk infants. The IMpact-RSV Study Group. Pediatrics. 1998;102(3 Pt 1):531–7. Epub 1998/09/17.
  • 7. Schwarz TF, Johnson C, Grigat C, Apter D, Csonka P, Lindblad N, et al. Three Dose Levels of a Maternal Respiratory Syncytial Virus Vaccine Candidate Are Well Tolerated and Immunogenic in a Randomized Trial in Nonpregnant Women. J Infect Dis. 2022;225(12):2067–76. Epub 2021/06/20. doi: 10.1093/infdis/jiab317; PubMed Central PMCID: PMC9200160.
  • 8. Biagi C, Dondi A, Scarpini S, Rocca A, Vandini S, Poletti G, et al. Current State and Challenges in Developing Respiratory Syncytial Virus Vaccines. Vaccines (Basel). 2020;8(4). Epub 2020/11/15. doi: 10.3390/vaccines8040672; PubMed Central PMCID: PMC7711987.
  • 9. Kim HW, Canchola JG, Brandt CD, Pyles G, Chanock RM, Jensen K, et al. Respiratory syncytial virus disease in infants despite prior administration of antigenic inactivated vaccine. Am J Epidemiol. 1969;89(4):422–34. Epub 1969/04/01. doi: 10.1093/oxfordjournals.aje.a120955.
  • 10. Murphy BR, Sotnikov AV, Lawrence LA, Banks SM, Prince GA. Enhanced pulmonary histopathology is observed in cotton rats immunized with formalin-inactivated respiratory syncytial virus (RSV) or purified F glycoprotein and challenged with RSV 3–6 months after immunization. Vaccine. 1990;8(5):497–502. Epub 1990/10/01. doi: 10.1016/0264-410x(90)90253-i.
  • 11. Killikelly AM, Kanekiyo M, Graham BS. Pre-fusion F is absent on the surface of formalin-inactivated respiratory syncytial virus. Scientific Reports. 2016;6(1):34108. doi: 10.1038/srep34108.
  • 12. Polack FP, Teng MN, Collins PL, Prince GA, Exner M, Regele H, et al. A role for immune complexes in enhanced respiratory syncytial virus disease. J Exp Med. 2002;196(6):859–65. Epub 2002/09/18. doi: 10.1084/jem.20020781; PubMed Central PMCID: PMC2194058.
  • 13. Connors M, Giese NA, Kulkarni AB, Firestone CY, Morse HC 3rd, Murphy BR. Enhanced pulmonary histopathology induced by respiratory syncytial virus (RSV) challenge of formalin-inactivated RSV-immunized BALB/c mice is abrogated by depletion of interleukin-4 (IL-4) and IL-10. J Virol. 1994;68(8):5321–5. Epub 1994/08/01. doi: 10.1128/JVI.68.8.5321-5325.1994; PubMed Central PMCID: PMC236482.
  • 14. Waris ME, Tsou C, Erdman DD, Zaki SR, Anderson LJ. Respiratory synctial virus infection in BALB/c mice previously immunized with formalin-inactivated virus induces enhanced pulmonary inflammatory response with a predominant Th2-like cytokine pattern. J Virol. 1996;70(5):2852–60. Epub 1996/05/01. doi: 10.1128/JVI.70.5.2852-2860.1996; PubMed Central PMCID: PMC190142.
  • 15. Gilman MSA, Furmanova-Hollenstein P, Pascual G, BvtW A, Langedijk JPM, McLellan JS. Transient opening of trimeric prefusion RSV F proteins. Nat Commun. 2019;10(1):2105. Epub 2019/05/10. doi: 10.1038/s41467-019-09807-5; PubMed Central PMCID: PMC6506550.
  • 16. Taylor G. Animal models of respiratory syncytial virus infection. Vaccine. 2017;35(3):469–80. Epub 2016/12/03. doi: 10.1016/j.vaccine.2016.11.054; PubMed Central PMCID: PMC5244256.
  • 17. Blunck BN, Rezende W, Piedra PA. Profile of respiratory syncytial virus prefusogenic fusion protein nanoparticle vaccine. Expert Rev Vaccines. 2021;20(4):351–64. Epub 2021/03/19. doi: 10.1080/14760584.2021.1903877.
  • 18. Mazur NI, Higgins D, Nunes MC, Melero JA, Langedijk AC, Horsley N, et al. The respiratory syncytial virus vaccine landscape: lessons from the graveyard and promising candidates. Lancet Infect Dis. 2018;18(10):e295–e311. Epub 2018/06/20. doi: 10.1016/S1473-3099(18)30292-5.
  • 19. Liang X, Liu DH, Chen D, Guo L, Yang H, Shi YS, et al. Gradual replacement of all previously circulating respiratory syncytial virus A strain with the novel ON1 genotype in Lanzhou from 2010 to 2017. Medicine (Baltimore). 2019;98(19):e15542. Epub 2019/05/15. doi: 10.1097/MD.0000000000015542; PubMed Central PMCID: PMC6531141.
  • 20. Ahmed A, Haider SH, Parveen S, Arshad M, Alsenaidy HA, Baaboud AO, et al. Co-Circulation of 72bp Duplication Group A and 60bp Duplication Group B Respiratory Syncytial Virus (RSV) Strains in Riyadh, Saudi Arabia during 2014. PLOS ONE. 2016;11(11):e0166145. doi: 10.1371/journal.pone.0166145.
  • 21. Tian D, Battles MB, Moin SM, Chen M, Modjarrad K, Kumar A, et al. Structural basis of respiratory syncytial virus subtype-dependent neutralization by an antibody targeting the fusion glycoprotein. Nature Communications. 2017;8(1):1877. doi: 10.1038/s41467-017-01858-w.
  • 22. Hashimoto K, Hosoya M. Neutralizing epitopes of RSV and palivizumab resistance in Japan. Fukushima J Med Sci. 2017;63(3):127–34. Epub 2017/09/05. doi: 10.5387/fms.2017-09; PubMed Central PMCID: PMC5792496.
  • 23. Sullender WM. Respiratory syncytial virus genetic and antigenic diversity. Clin Microbiol Rev. 2000;13(1):1–15, table of contents. Epub 2000/01/11. doi: 10.1128/CMR.13.1.1; PubMed Central PMCID: PMC88930.
  • 24. Chen X, Xu B, Guo J, Li C, An S, Zhou Y, et al. Genetic variations in the fusion protein of respiratory syncytial virus isolated from children hospitalized with community-acquired pneumonia in China. Scientific Reports. 2018;8(1):4491. doi: 10.1038/s41598-018-22826-4.
  • 25. De Groot AS, Moise L, Terry F, Gutierrez AH, Hindocha P, Richard G, et al. Better Epitope Discovery, Precision Immune Engineering, and Accelerated Vaccine Design Using Immunoinformatics Tools. Front Immunol. 2020;11:442. Epub 2020/04/23. doi: 10.3389/fimmu.2020.00442; PubMed Central PMCID: PMC7154102.
  • 26. Gilman MS, Castellanos CA, Chen M, Ngwuta JO, Goodwin E, Moin SM, et al. Rapid profiling of RSV antibody repertoires from the memory B cells of naturally infected adult donors. Sci Immunol. 2016;1(6). Epub 2017/01/24. doi: 10.1126/sciimmunol.aaj1879; PubMed Central PMCID: PMC5244814.
  • 27. He L, De Groot AS, Gutierrez AH, Martin WD, Moise L, Bailey-Kellogg C. Integrated assessment of predicted MHC binding and cross-conservation with self reveals patterns of viral camouflage. BMC Bioinformatics. 2014;15 Suppl 4(Suppl 4):S1. Epub 2014/08/12. doi: 10.1186/1471-2105-15-S4-S1; PubMed Central PMCID: PMC4094998.
  • 28. Grenfell BT, Pybus OG, Gog JR, Wood JL, Daly JM, Mumford JA, et al. Unifying the epidemiological and evolutionary dynamics of pathogens. Science. 2004;303(5656):327–32. Epub 2004/01/17. doi: 10.1126/science.1090727.
  • 29. Smith DJ, Lapedes AS, de Jong JC, Bestebroer TM, Rimmelzwaan GF, Osterhaus AD, et al. Mapping the antigenic and genetic evolution of influenza virus. Science. 2004;305(5682):371–6. Epub 2004/06/26. doi: 10.1126/science.1097211.
  • 30. Bedford T, Suchard MA, Lemey P, Dudas G, Gregory V, Hay AJ, et al. Integrating influenza antigenic dynamics with molecular evolution. eLife. 2014;3:e01914. doi: 10.7554/eLife.01914.
  • 31. Korber B, LaBute M, Yusim K. Immunoinformatics Comes of Age. PLOS Computational Biology. 2006;2(6):e71. doi: 10.1371/journal.pcbi.0020071.
  • 32. Reynisson B, Alvarez B, Paul S, Peters B, Nielsen M. NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data. Nucleic Acids Research. 2020;48(W1):W449–W54. doi: 10.1093/nar/gkaa379.
  • 33. Gutiérrez AH, Rapp-Gabrielson VJ, Terry FE, Loving CL, Moise L, Martin WD, et al. T-cell epitope content comparison (EpiCC) of swine H1 influenza A virus hemagglutinin. Influenza Other Respir Viruses. 2017;11(6):531–42. Epub 2017/10/21. doi: 10.1111/irv.12513; PubMed Central PMCID: PMC5705686.
  • 34. Russell CD, Unger SA, Walton M, Schwarze J. The Human Immune Response to Respiratory Syncytial Virus Infection. Clin Microbiol Rev. 2017;30(2):481–502. Epub 2017/02/10. doi: 10.1128/CMR.00090-16; PubMed Central PMCID: PMC5355638.
  • 35. Woolthuis RG, van Dorp CH, Keşmir C, de Boer RJ, van Boven M. Long-term adaptation of the influenza A virus by escaping cytotoxic T-cell recognition. Scientific Reports. 2016;6(1):33334. doi: 10.1038/srep33334.
  • 36. Katzelnick LC, Fonville JM, Gromowski GD, Arriaga JB, Green A, James SL, et al. Dengue viruses cluster antigenically but not as discrete serotypes. Science. 2015;349(6254):1338–43. doi: 10.1126/science.aac5017.
  • 37. Tuju J, Mackinnon MJ, Abdi AI, Karanja H, Musyoki JN, Warimwe GM, et al. Antigenic cartography of immune responses to Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1). PLOS Pathogens. 2019;15(7):e1007870. doi: 10.1371/journal.ppat.1007870.
  • 38. Liu J, Ruckwardt TJ, Chen M, Johnson TR, Graham BS. Characterization of Respiratory Syncytial Virus M- and M2-Specific CD4 T Cells in a Murine Model. Journal of Virology. 2009;83(10):4934–41. doi: 10.1128/JVI.02140-08.
  • 39. Viboud C, Gostic K, Nelson MI, Price GE, Perofsky A, Sun K, et al. Beyond clinical trials: Evolutionary and epidemiological considerations for development of a universal influenza vaccine. PLOS Pathogens. 2020;16(9):e1008583. doi: 10.1371/journal.ppat.1008583.
  • 40. Schaap-Johansen AL, Vujović M, Borch A, Hadrup SR, Marcatili P. T Cell Epitope Prediction and Its Application to Immunotherapy. Front Immunol. 2021;12:712488. Epub 2021/10/05. doi: 10.3389/fimmu.2021.712488; PubMed Central PMCID: PMC8479193.
  • 41. Mettu RR, Charles T, Landry SJ. CD4+ T-cell epitope prediction using antigen processing constraints. J Immunol Methods. 2016;432:72–81. Epub 2016/02/20. doi: 10.1016/j.jim.2016.02.013; PubMed Central PMCID: PMC5321161.
  • 42. Chen J, Qiu X, Avadhanula V, Shepard SS, Kim D-K, Hixson J, et al. Novel and extendable genotyping system for human respiratory syncytial virus based on whole-genome sequence analysis. Influenza Other Respir Viruses. 2022;16(3):492–500. doi: 10.1111/irv.12936.
  • 43.Shepard SS, Davis CT, Bahl J, Rivailler P, York IA, Donis RO. LABEL: Fast and Accurate Lineage Assignment with Assessment of H5N1 and H9N2 Influenza A Hemagglutinins. PLOS ONE. 2014;9(1):e86921. doi: 10.1371/journal.pone.0086921 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Organization UNbtWH. World Health Statistics 2011 2011.
  • 45.Rozewicki J, Li S, Amada KM, Standley DM, Katoh K. MAFFT-DASH: integrated protein sequence and structural alignment. Nucleic Acids Research. 2019;47(W1):W5–W10. doi: 10.1093/nar/gkz342 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Rice P, Longden I, Bleasby A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000;16(6):276–7. Epub 2000/05/29. doi: 10.1016/s0168-9525(00)02024-2 . [DOI] [PubMed] [Google Scholar]
  • 47.Whitehead SS, Hill MG, Firestone CY, St Claire M, Elkins WR, Murphy BR, et al. Replacement of the F and G proteins of respiratory syncytial virus (RSV) subgroup A with those of subgroup B generates chimeric live attenuated RSV subgroup B vaccine candidates. J Virol. 1999;73(12):9773–80. Epub 1999/11/13. doi: 10.1128/JVI.73.12.9773-9780.1999 ; PubMed Central PMCID: PMC113024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–3. Epub 2014/01/24. doi: 10.1093/bioinformatics/btu033 ; PubMed Central PMCID: PMC3998144. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Sagulenko P, Puller V, Neher RA. TreeTime: Maximum-likelihood phylodynamic analysis. Virus Evolution. 2018;4(1). doi: 10.1093/ve/vex042 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Yu G, Smith DK, Zhu H, Guan Y, Lam TT-Y. ggtree: an r package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution. 2017;8(1):28–36. doi: 10.1111/2041-210X.12628 [DOI] [Google Scholar]
  • 51.Southwood S, Sidney J, Kondo A, del Guercio MF, Appella E, Hoffman S, et al. Several common HLA-DR types share largely overlapping peptide binding repertoires. J Immunol. 1998;160(7):3363–73. Epub 1998/04/08. . [PubMed] [Google Scholar]
  • 52.Sette A, Sidney J. Nine major HLA class I supertypes account for the vast preponderance of HLA-A and -B polymorphism. Immunogenetics. 1999;50(3–4):201–12. Epub 1999/12/22. doi: 10.1007/s002510050594 . [DOI] [PubMed] [Google Scholar]
  • 53.UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021;49(D1):D480–d9. Epub 2020/11/26. doi: 10.1093/nar/gkaa1100 ; PubMed Central PMCID: PMC7778908. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Moise L, Gutierrez A, Kibria F, Martin R, Tassone R, Liu R, et al. iVAX: An integrated toolkit for the selection and optimization of antigens and the design of epitope-driven vaccines. Human Vaccines & Immunotherapeutics. 2015;11(9):2312–21. doi: 10.1080/21645515.2015.1061159 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Zhu Q, McLellan JS, Kallewaard NL, Ulbrandt ND, Palaszynski S, Zhang J, et al. A highly potent extended half-life antibody as a potential RSV vaccine surrogate for all infants. Sci Transl Med. 2017;9(388). Epub 2017/05/05. doi: 10.1126/scitranslmed.aaj1928 . [DOI] [PubMed] [Google Scholar]
  • 56.McLellan JS, Yang Y, Graham BS, Kwong PD. Structure of respiratory syncytial virus fusion glycoprotein in the postfusion conformation reveals preservation of neutralizing epitopes. J Virol. 2011;85(15):7788–96. Epub 2011/05/27. doi: 10.1128/JVI.00555-11 ; PubMed Central PMCID: PMC3147929. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Karosiene E, Lundegaard C, Lund O, Nielsen M. NetMHCcons: a consensus method for the major histocompatibility complex class I predictions. Immunogenetics. 2012;64(3):177–86. doi: 10.1007/s00251-011-0579-8 [DOI] [PubMed] [Google Scholar]
  • 58.GOWER JC. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika. 1966;53(3–4):325–38. doi: [DOI] [Google Scholar]
  • 59.Sánchez J. MARDIA K. V., J. T. KENT J. M. BIBBY: Multivariate Analysis. Academic Press, London-New York-Toronto-Sydney-San Francisco: 1979. xv, 518 pp., $ 61.00. Biometrical Journal. 1982;24(5):502-. 10.1002/bimj.4710240520. [DOI] [Google Scholar]
  • 60.Charrad M, Ghazzali N, Boiteau V, Niknafs A. NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set. Journal of Statistical Software. 2014;61(6):1–36. doi: 10.18637/jss.v061.i06 [DOI] [Google Scholar]
  • 61.de Leeuw J, Mair P. Multidimensional Scaling Using Majorization: SMACOF in R. Journal of Statistical Software. 2009;31(3):1–30. doi: 10.18637/jss.v031.i03 [DOI] [Google Scholar]
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010360.r001

Decision Letter 0

Thomas Leitner, Michiel van Boven

Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

19 Sep 2022

Dear Dr. Bahl,

Thank you very much for submitting your manuscript "Diversity and Evolution of Computationally Predicted T Cell Epitopes against Human Respiratory Syncytial Virus" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

Your manuscript has been reviewed by three knowledgeable reviewers. Although all three reviewers agree that your manuscript contains interesting information, they also had important major comments. I urge you to provide a precise and fully transparent point-by-point response to the comments by the reviewers, and revise your manuscript accordingly. As everyone is strongly time-constrained I may opt not to resend a revised manuscript to the reviewers, but rather make a decision based on your revised manuscript and response to the comments of the reviewers.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Michiel van Boven

Guest Editor

PLOS Computational Biology

Thomas Leitner

Section Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The paper by Chen et al presents an analysis of T cell epitope evolution in RSV. Though an interesting study, the results remain much more descriptive than conclusive. Moreover, a number of issues are not clear, and it is therefore rather difficult to reproduce the results presented in this paper. For example:

1) The paper makes use of a non-publicly available tool

2) Are only the epitopes presented in Tables I and II used to make the MDS maps? How many epitopes were predicted by the method but not (yet) validated experimentally?

3) Vaccine section: what is the definition of being conserved? If an epitope still binds to its MHC, is it then a conserved epitope? T cell binding can still be totally lost.

More major points to re-consider:

1) At line 224 it is suggested that visualising T cell epitope evolution via MDS is a new approach. As the authors state in the discussion, this is not the case.

2) The JanusMatrix algorithm is used extensively in the paper, but the reader gets very little information on what this model does.

3) How were the alleles included in this study decided? Why are there only two HLA-B alleles and many more HLA-DRB alleles? Clearly allele choice might have a big impact on the results.

4) The analysis makes use of a lot of thresholds to identify a T cell epitope. How are these thresholds defined? Are the results sensitive to the thresholds?

5) Why are the Class I and II epitopes combined in this analysis?

Minor points:

1) In both Table I and Table II it would be nice to add what kind of experimental validation was found: in vitro binding, elution, or T cell response?

2) Table II: are the predicted and validated epitopes the same, or do they only have the same core?

3) Figure 2 caption: indicate what A, B, C and D panels are.

4) Which thresholds are used for the NetMHCpan analysis?

5) The sentence starting on line 78 does not sound correct to me.

6) Line 232: biding --> binding

Reviewer #2: Chen et al investigate the diversity of RSV from the perspective of T-cell epitopes. These epitopes might be an interesting target for future vaccine development, both because they may provide broad immunity, but also because of the problem of vaccine-induced disease. The authors make use of off-the-shelf bioinformatic tools to find candidate CD4 and CD8 T cell epitopes for common HLA alleles. They filter out epitopes that are similar to human-protein derived peptides and compare the predicted epitopes with experimentally validated ones from IEDB. While I do think the approach can be useful and interesting, I have a couple of concerns that should be addressed. Most importantly, the method section has to be re-written and made much clearer.

Major comments:

The introduction was well written and is very informative, except for the last part of the fourth paragraph. I’m not sure what the relevance of cross-conserved epitopes is. I’m not sure what cross-conservation means. The final sentence does not follow naturally from this paragraph. Try making a better argument for why characterizing T-cell epitope profiles is important.

Identifying epitopes after a gene duplication event: On page 10 you mention that gene duplication events can “shift” the position of epitopes, or create novel epitopes. One could argue that a “shifted” epitope is the same epitope as the original. How is this counted when you compare strains in T-cell antigenic space? Are shifted epitopes identified, or counted as separate? How does this influence the clustering?

Clustering in low-dimensional space. As MDS is a non-linear dimension reduction method, clustering the data in the low-dimensional “MDS-space” could result in artifacts. For instance, points might cluster together because they are projected together, not because they are actually close in the original “epitope space”. This may not be an issue in this case, but I suggest checking this by first clustering the data, and then applying MDS. You might need a clustering method that works directly on the distance matrix (Leiden clustering perhaps?). Alternatively, you could increase the dimension of the MDS space, and use k-means on this higher-dimensional MDS space.
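The suggestion to cluster before projecting can be sketched with any method that operates directly on a pairwise distance matrix. The example below uses a small k-medoids loop rather than the Leiden clustering the reviewer mentions, purely because it fits in a few lines and needs no external library. The toy distance matrix, the initial medoids, and the function names are hypothetical and are not taken from the manuscript.

```python
def assign(dist, medoids):
    """Assign every item to the cluster of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, medoids, max_iter=100):
    """Cluster items using only a symmetric pairwise distance matrix.

    dist    : list of lists, dist[i][j] is the distance between items i and j
    medoids : initial medoid indices, one per cluster (distinct items assumed)
    """
    medoids = list(medoids)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(len(medoids)):
            members = [i for i, lab in enumerate(labels) if lab == c]
            # The new medoid minimises the summed distance to its cluster members.
            new_medoids.append(min(members,
                                   key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(dist, medoids), medoids


# Hypothetical distances between five strains: 0-2 are close, 3-4 are close.
toy_dist = [
    [0.0, 0.1, 0.2, 0.9, 1.0],
    [0.1, 0.0, 0.1, 0.8, 0.9],
    [0.2, 0.1, 0.0, 0.7, 0.8],
    [0.9, 0.8, 0.7, 0.0, 0.1],
    [1.0, 0.9, 0.8, 0.1, 0.0],
]
labels, medoids = k_medoids(toy_dist, [0, 4])
print(labels, medoids)
```

Because the labels come from the original distances, they can be overlaid on any MDS projection afterwards to check whether the low-dimensional clusters agree with them.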

The definition of distance between strains. I don’t understand the logic behind equations 1.1 and 1.2. Why is this a good distance measure between strains? I also don’t understand the notation in 1.1. What is p(i)a? Are you multiplying p(i) with a? I thought that a is an HLA allele. You mention that you take z-scores of the probabilities p. At what point does this happen? Can distances be negative? It appears to me that something is wrong here. This has to be explained in a more rigorous manner.

The same holds for equations 2.1 and 2.2. Why switch to a different definition here (why don’t you use a single distance measure?). I don’t understand what you mean by “T-cell cross-conservation between two epitopes can be represented by a joint probability estimation and therefore …”. Why is 2.1 a product, while 1.1 was a sum?

Several sets of epitopes are mentioned: predicted epitopes (1%, 5%), then human peptides are filtered out, and then you list a set of experimentally confirmed epitopes. In the end, which set of epitopes is used for the MDS analysis and clustering?

Minor comments:

In the case of Influenza, some evidence exists that T-cell epitopes are subject to positive selection in humans. For instance Machkovech et al (J. Virol. 2015) compared the substitution rate in epitope sites with that of swine flu, and in Woolthuis et al (ref 32) the number of epitopes appears to decrease with time. It would be interesting to do a similar analysis for RSV. Have you tried this? Perhaps you can at least comment on this.

You give a list of experimentally verified epitopes. However, are there any epitopes in IEDB that are not present in the predicted dataset? If so, why are they not predicted?

Line 20: the B-cell response

Line 38: The B-cell epitope

Line 80: add some references? Or the names of some of these studies?

Line 105-108: I don’t understand this sentence. Please re-phrase.

Line 112: What does cross-conserved mean?

Line 125: do you mean “Isolation years”?

Line 133: Depending on the final order of sections (Introduction, Results, Discussion, Methods), at this point it is not clear how this score is generated. Perhaps give a one-sentence explanation and mention the +20 significance level.

Line 144: You mention that T-cell epitopes and Ab-epitopes overlap, but figure 1B does not show the location of Ab-epitopes. Would it be possible to indicate those in the 3d structures?

Fig S1: Is the “dominant” epitope the most common epitope or the epitope with the highest score? Somewhat confusing terminology.

Line 179: EpiMatrix has not been mentioned before.

Line 181: cross-conserved? Do you mean cross-reactive?

Table 1: Conservation is calculated using all available sequences. But did you account for sampling bias for more recent strains? If an epitope is present in a more recent strain, and many similar strains are also isolated, then the conservation is artificially inflated.

Line 203: perhaps say “human peptides” as these are (hopefully) not actual epitopes.

Line 238: Cluster 1 is paraphyletic, which is quite interesting, but might this also depend on the parameters/settings used for clustering? With alternative settings, cluster 1 could perhaps be split into 2 separate clusters that might become monophyletic. Can you briefly comment on this?

Line 281: Has EpiCC been introduced?

Figure 3: This is not a very clear representation of the data. Would a simple bar plot be better?

Line 363: “Possibility” instead of “probability”

Line 371: This paragraph is a bit weak. Can you say what these biases might be? And what are these valuable insights?

Line 423: “Each one overlapping the previous one by one amino acid”: surely you mean overlapping by 8 amino acids?

Line 480: What does this mean: “are assumed to be mutually exclusive and uniform”? Please explain this more clearly.

Line 496: where in the main text are you using this? Also, please explain more clearly what you are doing here (and why)

Line 521: what was the result of the goodness-of-fit analysis?

Line 528: How did you take these insertions into account in the genetic distance?
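The arithmetic behind the Line 423 comment above is easy to check: frames that advance one residue at a time share all but one residue with their neighbour, so 9-mer frames overlap by 8 amino acids. A minimal sketch, assuming a frame length of 9 and a step of 1 as the reviewer's remark implies; the peptide string and function name are hypothetical.

```python
def sliding_frames(sequence, length=9, step=1):
    """Return every frame of `length` residues, advancing `step` residues each time."""
    return [sequence[i:i + length]
            for i in range(0, len(sequence) - length + 1, step)]


# Hypothetical 12-residue peptide, used only for illustration.
peptide = "ACDEFGHIKLMN"
frames = sliding_frames(peptide)

# With step 1, each frame shares its last 8 residues with the next frame's first 8.
shared = [a[1:] == b[:-1] for a, b in zip(frames, frames[1:])]
print(frames, shared)
```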

Reviewer #3: The paper is an interesting read and describes an elegant computational analysis that provides novel insights that improve understanding of immunity to RSV. I have some minor questions that may be addressed by the authors to broaden the discussion a bit:

1. The work takes a number of HLA types into account to find relevant epitopes. Considering the diversity of HLA alleles in a population of individuals that will be vaccinated, how would an individual's HLA typing impact the success of a vaccine that has been designed by this computational approach? In other words: does a selected peptide work out well for a large part of the whole population of vaccinees? Or should a wide selection be made from a large pool of different peptides? How do the authors envision such an approach towards vaccine design?

2. The authors stress the variations among RSV G proteins. However, the RSV G protein contains a central conserved domain that has been shown to provide T-cell immunogenicity. What would such conserved domain mean for predicting T-cell epitopes and immunity to G protein?

3a. Many epitopes are left out of the analysis due to similarity to human sequences, since this would generate tolerance. That is a fair point. However, these epitopes may also actively induce Treg cells that may control excessive T-cell responses causing unwanted inflammation. Can the authors exclude that such 'active control' as means of tolerance may be involved in the overall immune response to RSV? In other words, would neglecting such 'tolerant' epitopes pose a risk for excessive immune responses when regulatory T cells are not allowed to be activated sufficiently?

3b. Were epitopes in tables 1 and 2 indeed validated to be tolerated experimentally, where indicated? And was immunogenicity of these sequences checked and never observed?

Technical notes:

4. In figure 1B it might be helpful to the reader to indicate antigenic sites ø and II in the 3D structures of pre-F and post-F.

5. Figure 2A describes 'Tenaus toxin'. Is this Tetanus toxin?

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No: The authors are using a non-publicly available tool and their analysis is hard to reproduce because not all details are given. There is some code available on GitHub, but I wonder if one can use it without having the predictions.

Reviewer #2: No: The authors use publicly available data, but did not share any of their scripts.

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010360.r003

Decision Letter 1

Thomas Leitner, Michiel van Boven

7 Dec 2022

Dear Dr. Bahl,

We are pleased to inform you that your manuscript 'Diversity and Evolution of Computationally Predicted T Cell Epitopes against Human Respiratory Syncytial Virus' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Michiel van Boven

Guest Editor

PLOS Computational Biology

Thomas Leitner

Section Editor

PLOS Computational Biology

***********************************************************

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010360.r004

Acceptance letter

Thomas Leitner, Michiel van Boven

4 Jan 2023

PCOMPBIOL-D-22-01014R1

Diversity and Evolution of Computationally Predicted T Cell Epitopes against Human Respiratory Syncytial Virus

Dear Dr Bahl,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zsofi Zombor

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    Supplementary Materials

    S1 Text. Supplementary Materials.

    Table A: Number of computationally predicted conserved RSV T cell epitopes and experimentally identified RSV T cell epitopes. Table B: Conservation of experimentally validated conserved MHC class I epitope peptides in RSV major surface proteins in the subsampled dataset. Table C: Conservation of experimentally validated conserved MHC class II epitope peptides in RSV major surface proteins in the subsampled dataset.

    (DOCX)

    S1 File. Accession numbers of the RSV sequences used in this study.

    (CSV)

    S1 Fig. Distribution and diversity of T cell epitopes in RSV F protein.

    The tree panel on the left is a time-scaled phylogeny built with RSV-A (A) or RSV-B (B) F gene nucleotide sequences using the ML approach. Determined genotypes are labeled on the right with black bars. Each colored column on the right side represents the presence of an MHC class I or class II epitope. Only the epitopes that are present in more than 1% of sampled isolates are displayed. The column color indicates the number of different epitope sequences at the same location.

    (TIF)

    S2 Fig. Distribution and diversity of T cell epitopes in RSV G protein.

    The tree panel on the left is a time-scaled phylogeny built with RSV-A (A) or RSV-B (B) G gene nucleotide sequences using the ML approach. The clades that contain the novel 72-nt or 60-nt duplication in the second hypervariable region of the G gene are highlighted in red. Determined genotypes are labeled on the right with black bars. Each colored column on the right side represents the presence of an MHC class I or class II epitope. Only the epitopes that are present in more than 1% of sampled isolates are displayed. The column color indicates the number of different epitope sequences at the same location.

    (TIF)

    S3 Fig. Distribution of JanusMatrix Human Homology score for putative RSV MHC class I and class II epitopes.

    The cross-reactive potential between the identified putative T cell epitopes and the human host is represented by a JanusMatrix Human Homology score. Of the identified putative epitopes, 6.45% of class I epitopes and 1.12% of class II epitopes are cross-conserved on the TCR face with human peptides.

    (TIF)

    S4 Fig. Predicted T cell epitope landscapes of RSV surface proteins.

    RSV T cell epitope landscapes were built with sequence-based MHC class I epitope binding prediction (left), MHC class II epitope binding prediction (middle), or combined class I and class II epitope binding prediction (right). Sequences are colored by the epitope cluster determined from the landscapes built with combined class I and class II epitope prediction.

    (TIF)

    S5 Fig. Total within sum of squares (wss) using k-means algorithm.

    The total within-cluster sum of squares in the epitope topographies was calculated after clustering into k (from 1 to 10) groups with k-means. Using the Elbow method, the optimal number of clusters was determined to be 3 for the RSV-A F and G proteins and 2 for the RSV-B F and G proteins.

    (TIF)
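The Elbow method described in the S5 Fig caption can be illustrated with a short sketch: run k-means for increasing k, record the total within-cluster sum of squares (wss), and look for the k after which the decrease flattens. This is not the authors' pipeline. The toy 2-D points stand in for MDS coordinates, and the deterministic farthest-point initialisation and the function names are assumptions made so that the example is reproducible.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_wss(points, k, n_iter=50):
    """Run Lloyd's k-means and return the total within-cluster sum of squares."""
    # Deterministic farthest-point initialisation (an assumption made for
    # reproducibility; standard implementations use random starts).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(sq_dist(p, c) for c in centers)))

    def assign(ctrs):
        return [min(range(k), key=lambda c: sq_dist(p, ctrs[c])) for p in points]

    for _ in range(n_iter):
        labels = assign(centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the previous centre if a cluster ends up empty.
            new_centers.append(
                tuple(sum(v) / len(members) for v in zip(*members))
                if members else centers[c]
            )
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(centers)
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


# Hypothetical 2-D coordinates forming three tight groups.
points = [(0, 0), (0, 1), (1, 0),
          (10, 0), (10, 1), (11, 0),
          (0, 10), (0, 11), (1, 10)]
wss_by_k = {k: kmeans_wss(points, k) for k in range(1, 5)}
for k, value in wss_by_k.items():
    print(k, round(value, 3))
```

On this toy data the wss falls sharply up to k = 3 and only marginally afterwards, which is the pattern the Elbow method looks for.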

    S6 Fig. Sensitivity analysis for MDS and k-means clustering.

    (A) Stress evaluation under different numbers of dimensions for the RSV distance matrix. Stress less than 0.15 (red dashed line) indicates an acceptably precise MDS solution. (B) Performance of k-means clustering under different numbers of dimensions; the number of clusters is determined in 2-dimensional space. There is no difference in cluster grouping in higher-dimensional space (orange). The ratio of the between-cluster sum of squares to the total sum of squares (BSS/TSS) indicates that the total variance in the data is explained well in higher-dimensional space (green).

    (TIF)
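The stress criterion in the S6 Fig caption compares the input distances with the distances between points in the embedding. The sketch below uses Kruskal's stress-1, normalised by the embedded distances. That choice of formula is an assumption, since the caption does not state which stress variant was used, and the three-point distance matrix and coordinates are hypothetical.

```python
import math


def stress1(dist, coords):
    """Kruskal stress-1 between a distance matrix and an embedding.

    dist   : symmetric matrix of input distances
    coords : one coordinate tuple per item in the embedding
    """
    num = den = 0.0
    n = len(dist)
    for i in range(n):
        for j in range(i + 1, n):
            d_hat = math.dist(coords[i], coords[j])  # distance in the embedding
            num += (dist[i][j] - d_hat) ** 2
            den += d_hat ** 2
    return math.sqrt(num / den)


# Hypothetical distances of a 3-4-5 right triangle.
toy_dist = [[0.0, 3.0, 4.0],
            [3.0, 0.0, 5.0],
            [4.0, 5.0, 0.0]]
coords_2d = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]  # reproduces all distances
coords_1d = [(0.0,), (3.0,), (-4.0,)]             # distorts the 5.0 distance

stress_2d = stress1(toy_dist, coords_2d)
stress_1d = stress1(toy_dist, coords_1d)
print(stress_2d, stress_1d, stress_1d < 0.15)
```

Here the one-dimensional layout exceeds the 0.15 threshold quoted in the caption while the two-dimensional layout reproduces the distances, mirroring how stress is used to choose the number of dimensions.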

    S7 Fig. Validation of T cell epitope distance estimation using the IEDB analysis resource.

    Validation is performed with MHC class I epitope binding prediction of the RSV-A F protein. (A) Heatmaps of pairwise MHC class I epitope distances estimated in the iVAX toolkit or calculated with custom Python scripts using the MHC class I binding prediction implemented in IEDB. (B) Eigenvalues for each sequence are calculated from the pairwise distance matrices using the “RSpectra” package in R. The Pearson correlation test significantly supports a non-zero correlation between the T cell epitope distance estimated with EpiCC and the T cell epitope distance estimated with IEDB. (C) T cell epitope topographies are built with pairwise epitope distances estimated from EpiCC or IEDB. Both methods resulted in a similar cluster pattern for the CD8 T cell epitope profile of the RSV-A F protein.

    (TIF)

    S8 Fig. Evaluation of RSV vaccine candidate strains with class I and class II T cell epitope content in different WHO regions.

    RSV-A and RSV-B major surface protein sequences were grouped by isolation year and by the six WHO regions of isolation: African Region (AFRO), Region of the Americas (PAHO), South-East Asia Region (SEARO), European Region (EURO), Eastern Mediterranean Region (EMRO) and Western Pacific Region (WPRO). The proportion of cross-conserved T cell epitope content between vaccine strains (CP248 or CP52) and wild circulating strains is shown for each isolation year and WHO region.

    (TIF)

    Attachment

    Submitted filename: Chen[CompBio]_RTR_1116.pdf

    Data Availability Statement

    Data and materials availability: Sequence data were retrieved from NCBI’s GenBank nucleotide database (https://www.ncbi.nlm.nih.gov/nucleotide/). Accession numbers for the RSV sequences used in this study are available in the supplementary materials. Code to generate the T cell epitope landscapes is deposited on GitHub at https://github.com/JianiC/RSV_Epitope.


    Articles from PLOS Computational Biology are provided here courtesy of PLOS
